Cortical neurons display two fundamental nonlinear response characteristics: contrast-set gain control (also termed contrast normalization) and response expansion (also termed half-squaring). These nonlinearities could play an important role in forming and maintaining stimulus selectivity during natural viewing, but only if they operate well within the time frame of a single fixation. To analyze the temporal dynamics of these nonlinearities, we measured the responses of individual neurons, recorded from the primary visual cortex of monkeys and cats, as a function of the contrast of transient stationary gratings that were presented for a brief interval (200 ms). We then examined 1) the temporal response profile (i.e., the post stimulus time histogram) as a function of contrast and 2) the contrast response function throughout the course of the temporal response. We found that the shape and complexity of the temporal response profile varies considerably from cell to cell. However, within a given cell, the shape remains relatively invariant as a function of contrast and appears to be simply scaled and shifted. Stated quantitatively, approximately 95% of the variation in the temporal responses as a function of contrast could be accounted for by scaling and shifting the average poststimulus time histogram. Equivalently, we found that the overall shape of the contrast response function (measured every 2 ms) remains relatively invariant from the onset through the entire temporal response. Further, the contrast-set gain control and the response expansion are fully expressed within the first 10 ms after the onset of the response. Stated quantitatively, the same, scaled Naka-Rushton equation (with the same half-saturation contrast and expansive response exponent) provides a good fit to the contrast response function from the first 10 ms through the last 10 ms of the temporal response. 
Based upon these measurements, it appears as though the two nonlinear properties, contrast-set gain control and response expansion, are present in full strength, virtually instantaneously, at the onset of the response. This observation suggests that response expansion and contrast-set gain control can influence the performance of visual cortex neurons very early in a single fixation, based on the contrast within that fixation. In the discussion, we consider the implications of the results within the context of 1) slower types of contrast gain control, 2) discrimination performance, 3) drifting steady-state measurements, 4) functional models that incorporate response expansion and contrast normalization, and 5) structural models of the biochemical and biophysical neural mechanisms.
Over the past several decades, many different laboratories have measured the responses of primary visual cortex neurons as a function of luminance contrast, and as a consequence, we have acquired a rich understanding of the basic properties of the contrast response function (e.g., Albrecht 1978, 1995; Albrecht and Geisler 1991; Albrecht and Hamilton 1982; Bonds 1991; Carandini and Heeger 1994; Carandini et al. 1997; DeAngelis et al. 1993; Geisler and Albrecht 1992, 1997; Geisler et al. 1991; Li and Creutzfeldt 1984; Ohzawa et al. 1985; Sclar and Freeman 1982; Sclar et al. 1990; Tolhurst and Heeger 1997; for recent reviews, see Albrecht et al. 2002; Carandini et al. 1999; Geisler and Albrecht 2000). Although there is a great deal of heterogeneity from cell to cell, it is possible to provide a description of the basic properties of the contrast response function that applies to the overwhelming majority of neurons: As the contrast increases from zero, the response increases in an accelerating fashion, remains dynamic over some limited range of contrasts, and then saturates. Studies of this contrast response relationship have revealed two important nonlinear properties of cortical neurons: 1) a contrast-set gain control (also referred to as “contrast normalization”) and 2) an expansive response exponent (also referred to as “half-squaring;” i.e., half-wave rectification followed by an expansive exponent of 2.0).
There is a wealth of evidence to indicate that the two nonlinearities have important consequences on the overall stimulus selectivity of cortical neurons: The expansive response exponent enhances stimulus selectivity and the contrast-set gain control maintains stimulus selectivity independent of contrast (see references and reviews cited in the preceding paragraph as well as Albrecht and Geisler 1994; Bradley et al. 1987; Ferster and Miller 2000; Gardner et al. 1999; Heeger 1991, 1992a,b, 1993; McLean and Palmer 1994; Murthy et al. 1998; Skottun et al. 1986; Troyer et al. 1998). However, in order for these two nonlinear properties to enhance and maintain stimulus selectivity during normal saccadic inspection of a visual scene, their temporal dynamics would have to be rapid enough to occur within a single fixation (i.e., within a few hundred milliseconds, or less).
All of the studies of the contrast response function referenced in the preceding paragraphs have been performed using prolonged (“steady-state”) stimuli with durations far in excess of a few hundred milliseconds, and there was little or no analysis of what occurs over the course of the first few hundred milliseconds. Similarly, investigations of the temporal dynamics of contrast gain control and contrast adaptation have generally measured and analyzed the responses over the course of many seconds (e.g., Albrecht et al. 1984; Bonds 1991; McLean and Palmer 1996; Movshon and Lennie 1979; Ohzawa et al. 1985; Saul and Cynader 1989a,b; Sclar et al. 1989). Although several studies have shown that contrast adaptation (and/or contrast gain control) can have a rapid onset (Bonds 1991; Frazor et al. 1997; Gawne et al. 1996; Geisler and Albrecht 1992; Müller et al. 1999; Reich et al. 2001; Tolhurst et al. 1980), there have been no comprehensive evaluations of how the basic characteristics of the contrast response function (specifically, the expansive response exponent and the contrast-set gain control) develop during the first few hundred milliseconds after presentation of a given contrast. Therefore it is uncertain whether the important consequences of the contrast nonlinearities on stimulus selectivity operate within the time frame that would be useful for perception during normal saccadic inspection.
The goal of the present study was to measure the temporal dynamics of the contrast response function over the course of a very brief interval in order to assess whether the contrast-set gain control and response expansion operate rapidly enough to be effective within a single fixation. To accomplish this goal, we presented stationary sinusoidal gratings for 200 ms at 10 different levels of contrast and 8 different spatial positions. The responses as a function of contrast and spatial position were analyzed throughout the entire course of the response in time bins that varied from 2 to 20 ms (depending on the analysis). In general, we found a great diversity from cell to cell in the temporal pattern of the response, the temporal response profile (i.e., in the shape of the poststimulus time histogram); however, within a given cell, the shape of the temporal response profile was relatively invariant as a function of contrast and position. Further, the basic characteristics of the contrast response function (in particular, the response expansion and the contrast-set gain control) were fully expressed within the first 10 ms after the onset of the response and they remained relatively invariant throughout the entire time course of the response, for both optimal and nonoptimal spatial positions.
Recording and physiology
The procedures for the paralyzed anesthetized preparation, the electrophysiological recording, the stimulus display, and the measurement of neural responses using systems analysis were similar to those described elsewhere (Albrecht and Geisler 1991; Geisler and Albrecht 1997; Geisler et al. 2001; Hamilton et al. 1989; Metha et al. 2001). All experimental procedures were approved by the University of Texas at Austin Institutional Animal Care and Use Committee and conform to the National Institutes of Health guidelines. In brief, young adult cats (Felis domesticus) and monkeys (Macaca fascicularis or M. mulatta) were prepared for recording under deep isoflurane anesthesia. Following the surgical procedures, isoflurane anesthesia was discontinued. Anesthesia and paralysis were maintained throughout the duration of the experiment using the following pharmaceuticals. For cats, anesthesia was maintained with pentothal sodium (2–6 mg · kg−1 · h−1). For monkeys, anesthesia was maintained with sufentanil citrate (2–8 μg · kg−1 · h−1). For both species, paralysis was maintained with gallamine triethiodide (10 mg · kg−1 · h−1) as well as pancuronium bromide (0.1 mg · kg−1 · h−1). The physiological state of the animal was monitored throughout the experiment by continuous measurement of the following quantitative indices: body temperature, inhaled/exhaled respiratory gases, pressure in the airway, fluid input, urine output, urinary pH, caloric input, blood glucose level, electroencephalogram, and electrocardiogram. Microelectrodes were inserted into regions of the primary visual cortex such that the receptive fields of the neurons were located within 5° of the visual axis. Three different types of microelectrodes were utilized: varnish-insulated tungsten, glass pipette, or glass-coated platinum-iridium. The impedances of the microelectrodes ranged from 8 to 21 MΩ.
Action potentials were collected with a temporal accuracy of 0.1 ms and then binned to produce a poststimulus time histogram (PSTH). Note that within this report, we refer to the PSTH as the temporal response profile. For some analyses, the responses were averaged across 2-, 10-, 20-, or 200-ms time bins; for other analyses, the responses were averaged across 10-ms bins and this average was computed every ms: a 10-ms running average evaluated at every ms throughout the entire time course of the response (cf. De Valois and Pease 1973; Enroth-Cugell and Robson 1966; Gerstein 1960; Levick and Zacks 1970).
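The 10-ms running average evaluated every millisecond can be illustrated with a minimal sketch. The function name, the pooling of spike times across trials, the 500-ms trial duration, and the convention of time-stamping each value at the start of its window are assumptions made for illustration; they are not taken from the original analysis software.

```python
import numpy as np

def running_average_psth(spike_times_ms, n_trials, duration_ms=500, window_ms=10):
    """Firing rate (spikes/s) in a running window, evaluated every 1 ms.

    spike_times_ms: spike times (ms re: stimulus onset) pooled over all trials.
    All names and defaults here are hypothetical.
    """
    # Count spikes in 1-ms bins, pooled over trials.
    edges = np.arange(0, duration_ms + 1)
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    # Sum the counts over a sliding window; "valid" keeps only complete windows.
    window_counts = np.convolve(counts, np.ones(window_ms), mode="valid")
    # Convert to spikes per second per trial.
    rate = window_counts / (n_trials * window_ms / 1000.0)
    # Assumed convention: each value is stamped with the start of its window.
    t_start = edges[: len(rate)]
    return t_start, rate
```

Because adjacent windows overlap by 9 ms, neighboring values of such a running average are not statistically independent, which is one reason to use nonoverlapping bins when independent samples are needed.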
The stimuli were presented on a monochromatic Image Systems monitor at a frame rate of 100 Hz, with a mean luminance of 27.4 cd/m2. To overcome the nonlinearities inherent in visual displays, both hardware and software methods were used to ensure a linear relationship between the requested luminance and the measured luminance. The precision of these methods to overcome the nonlinearities in the visual display was assessed through quantitative measurements that were performed before, during, and after each experiment.
Prior to the main experimental protocol, several qualitative determinations were performed by simply listening to the firing rate of the cell as a function of different stimulus manipulations. Specifically, the optimum orientation, spatial frequency, and temporal frequency were determined by varying the stimuli along these dimensions while listening to the firing rate of the cell. For the dimension of contrast, the minimum detectable contrast, half-saturation contrast, and saturation contrast were determined. In all of the experiments reported here, the stimuli were confined to the conventional receptive field, which was determined by expanding the size (the length and the width separately) of an optimal drifting sine wave grating until the neuron's response stopped increasing (DeAngelis et al. 1994; De Valois et al. 1985). Following these qualitative determinations, the responses of each cell were quantitatively and systematically measured as a function of orientation, spatial frequency, and contrast. Cells were classified as simple cells or complex cells using the criteria described by De Valois et al. (1982; see also Skottun et al. 1991).
Stimulus protocol: stationary gratings at different contrasts and phases
Following the preliminary experiments, stationary grating patterns (of the optimum spatial frequency and orientation) were presented at 8 different spatial phase/positions, each separated by 45°, and at 10 different levels of contrast (in linear increments), making a total of 80 unique combinations of phase and contrast. Each unique combination was turned on (flashed) for 200 ms and then turned off for 300 ms, with a minimum of 40 (and a maximum of 80) repeated presentations. (During the 300-ms interval, when the stationary grating was turned off, the animal viewed the mean luminance.) The stimuli were presented in a counterbalanced fashion such that all stimulus conditions occurred an equal number of times, in a random order. The range of contrasts (starting at 0%) was chosen to include the cell's dynamic region as well as the cell's saturated region, both of which can vary from cell to cell. Presenting the grating in eight different spatial phase/positions ensures that 1) the space average luminance remains equivalent throughout the course of the experiment over the entire receptive field, 2) both optimal and nonoptimal spatial positions are sampled, and 3) many different regions of the receptive field are sampled with different luminance increments and decrements.
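As a rough sketch of such a protocol, the 80 combinations of contrast and phase can be enumerated and presented in shuffled blocks so that every condition occurs equally often. Block-wise shuffling is an assumption; the text states only that the presentation was counterbalanced and in random order. The function name, the seed, and the 0–90% contrast range are illustrative.

```python
import itertools
import random

def make_stimulus_sequence(n_contrasts=10, max_contrast=90.0,
                           n_phases=8, n_repeats=40, seed=0):
    """Return (conditions, sequence) for a block-randomized protocol (assumed design)."""
    # Contrasts in linear increments starting at 0%.
    contrasts = [max_contrast * i / (n_contrasts - 1) for i in range(n_contrasts)]
    # Spatial phases separated by 45 deg.
    phases = [45.0 * k for k in range(n_phases)]
    conditions = list(itertools.product(contrasts, phases))
    rng = random.Random(seed)
    sequence = []
    for _ in range(n_repeats):
        block = conditions[:]      # every condition exactly once per block
        rng.shuffle(block)         # random order within the block
        sequence.extend(block)
    return conditions, sequence
```

With 40 repeats this yields 3,200 presentations; at 200 ms on plus 300 ms off per presentation, that corresponds to 1,600 s of stimulation.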
Assessing goodness of fit
To assess goodness of fit, we indexed the percentage of the variation in the data that was accounted for, using the following standard procedure. First, we computed the sum of the squared deviations from the mean (the “total variation”). Second, we computed the sum of the squared deviations between the data and the predictions (the “residual variation”). Finally, we subtracted the residual variation from the total variation, divided the result by the total variation, and multiplied this ratio by 100 to express the variance accounted for as a percentage of the total variation.
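The three steps of this index translate directly into code. The following is a minimal sketch; the function name is hypothetical.

```python
import numpy as np

def percent_variation_accounted_for(data, predictions):
    """Percentage of the total variation in `data` accounted for by `predictions`."""
    data = np.asarray(data, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    # Total variation: sum of squared deviations from the mean of the data.
    total = np.sum((data - data.mean()) ** 2)
    # Residual variation: sum of squared deviations between data and predictions.
    residual = np.sum((data - predictions) ** 2)
    return 100.0 * (total - residual) / total
```

For example, for data of 0 and 10 with predictions of 1 and 9, the total variation is 50, the residual variation is 2, and the index is 96%. Note that the index can be negative when the predictions fit worse than the mean of the data.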
Analysis of the variation in the PSTHs at different spatial positions
We have shown that the amplitude of the response as a function of contrast is similar in shape at different spatial phases when the stimulus is a counterphase flickering grating (Albrecht and Geisler 1991, 1994). In this report, we will demonstrate the same basic fact when the stimulus is a stationary flashed grating (see Figs. 11-13). Because the primary focus of this study was the temporal dynamics of the contrast response function of the receptive field as a whole, the responses at the different spatial locations were averaged together, for a given level of contrast. It is well known that there is variation in the amplitude of the PSTH at different spatial positions, particularly for simple cells (e.g., Albrecht and Geisler 1991; De Valois et al. 1982; Movshon et al. 1978). Using stationary flashed gratings, it has also been demonstrated that PSTHs of both simple and complex cells can show variations with spatial phase (Victor and Purpura 1998). Given that we are averaging the PSTHs across all of the spatial positions (for a given level of contrast), we performed a preliminary analysis to assess the degree of any systematic variation in the overall shape of the PSTHs at the different spatial positions.
To accomplish this goal, we determined what percentage of the variation in the shape of the PSTHs, at the different positions, could not be accounted for by the average across all of the positions. We also determined what percentage of the residual variation was a consequence of systematic variation as opposed to the stochastic variation inherent in the responses of cortical neurons. Specifically, we took the average PSTH and determined the scale factor (for the amplitude of the PSTH) that was required to fit the average of the eight positions to each of the eight positions. We then determined what percentage of the variation across all eight positions could be accounted for by simply scaling the amplitude of the average for each position. Finally, we determined what would be expected by chance alone (i.e., by stochastic variation).1
We found that approximately two-thirds of the cells did show systematic residual variation (at the 0.01 level) that was not accounted for by the average PSTH or by chance alone. However, this systematic residual variation was relatively small. The median value was 4.2%; 8 of the 50 neurons showed a value larger than 10%; 2 showed a value larger than 20%; the largest value was 24%. There were no obvious trends across cell type or animal type. As might be expected, among the 15 simple cells there was a small positive correlation (ρ = 0.35) between the systematic residual variation and the direction selectivity (but it was not significant, P = 0.1). Based on this preliminary analysis, it therefore seemed reasonable to average the responses across spatial phase.
PSTH as a function of contrast
We measured and analyzed the responses of 26 neurons recorded from the cat visual cortex and 24 neurons recorded from the monkey visual cortex. Figure 1 A shows the responses of a neuron (a complex cell recorded from the visual cortex of a monkey) through time, for 10 levels of contrast. The stimulus was an optimal stationary grating pattern, which was turned on for 200 ms and then turned off for 300 ms. The 10 different levels of contrast (spanning 0–90% in linear increments) and the eight different spatial phase positions (separated by 45°) resulted in 80 unique stimulus configurations. Each of these 80 stimulus configurations was presented on 40 different occasions. The PSTH at any given contrast is the average across the eight spatial phases and the 40 repetitions at each spatial phase (i.e., the average of 320 stimulus presentations). For ease of viewing the differences between the PSTHs as a function of time and contrast, the responses in the running 10-ms time bins (which are evaluated every millisecond; see methods) are only plotted at 4-ms intervals, and only 100 ms of the response is shown.
As can be seen, the responses through time (illustrated in Fig. 1 A) are systematic as a function of contrast. Further, the overall shape of the temporal response profile (i.e., the PSTH) appears to be qualitatively similar across contrast and can be described as follows: the response increases from the spontaneous level of firing to the peak very rapidly and then gradually falls back toward the spontaneous level, reaching a rather sustained plateau. Finally, note that as the contrast decreases, the temporal response profile scales downward and shifts rightward (i.e., the amplitude of the response decreases and the latency of the response increases).
Figure 1 B plots the responses from the same cell as a function of contrast at six time intervals during the course of the temporal response: 58 (♦), 62 (•), 70 (▴), 78 (●), 86 (▾), and 102 ms (×). As can be seen, response expansion (evident in the accelerating response at low contrasts) is apparent within every time interval, and response saturation (at high contrasts) can be seen in every time interval, except the first (♦). Note, however, that when plotted in this fashion, the shape of the contrast response function changes through time (cf. Fig. 2 B). Figure 1, C and D, shows the same measurements for a monkey simple cell. For this neuron, the range of contrasts spanned 0–80%. The contrast response functions were sampled at six time intervals following the onset of the stimulus: 38 (♦), 42 (•), 50 (▴), 58 (●), 66 (▾), and 82 ms (×). Figure 1, E and F, shows the same measurements for another monkey complex cell. For this neuron, the range of contrasts spanned 0–30%. The contrast response functions were sampled at six time intervals: 50 (♦), 54 (•), 62 (▴), 70 (●), 78 (▾), and 94 ms (×). The overall pattern of results is very similar for all three cells: the shapes of the PSTH appear similar at the different levels of contrast; however, as the contrast decreases, the PSTH scales downward and shifts rightward.
Average PSTH as a function of contrast: scaled and shifted
As noted in the previous paragraph, the shape of the temporal response profile appears to be qualitatively similar across contrast. Further, as contrast decreases, the profile appears to scale downward and shift rightward. To evaluate quantitatively the similarity of the temporal response profiles at the different levels of contrast, we measured the percentage of the variation that could be accounted for by simply scaling and shifting the average PSTH. Figure 2 A shows the results for the same set of measurements illustrated in Fig. 1 A: the solid line through each PSTH is the average PSTH after scaling and shifting. For ease of viewing, the time shifts have been removed so that the PSTHs are aligned across all of the different levels of contrast. Thus the onset of the response, the peak, the decline after the peak, the plateau, and so forth, all occur at the same time. (Note that the values of the scalar and the values of the time shift provide estimates of the effect of contrast on the amplitude and the latency of the response, respectively.)
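One simple way to fit a scaled and shifted template to the PSTH at a single contrast is sketched below. This is an illustrative least-squares procedure, not necessarily the one used for the reported fits: it assumes the spontaneous rate has been subtracted, that shifts are whole samples, and that the template (for example, the average PSTH) is supplied by the caller. All function names are hypothetical.

```python
import numpy as np

def shift_profile(profile, shift):
    """Delay (shift > 0) or advance (shift < 0) a profile, padding with its edge value."""
    if shift > 0:
        return np.concatenate([np.full(shift, profile[0]), profile[:-shift]])
    if shift < 0:
        return np.concatenate([profile[-shift:], np.full(-shift, profile[-1])])
    return profile.copy()

def fit_scale_and_shift(template, response, max_shift=30):
    """Return (scale, shift, sse) minimizing sum((response - scale * shifted_template)**2)."""
    template = np.asarray(template, dtype=float)
    response = np.asarray(response, dtype=float)
    best_sse, best_scale, best_shift = np.inf, 0.0, 0
    for shift in range(-max_shift, max_shift + 1):
        shifted = shift_profile(template, shift)
        denom = np.dot(shifted, shifted)
        # For a fixed shift, the least-squares scale factor has a closed form.
        scale = np.dot(shifted, response) / denom if denom > 0 else 0.0
        sse = np.sum((response - scale * shifted) ** 2)
        if sse < best_sse:
            best_sse, best_scale, best_shift = sse, scale, shift
    return best_scale, best_shift, best_sse
```

Applied to each contrast in turn, the fitted scale and shift give the amplitude and latency estimates mentioned above, and the pooled residuals can be converted into a percentage of variation accounted for.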
Figure 2 B plots the responses as a function of contrast at six time intervals within the PSTHs shown in Fig. 2 A. As in Fig. 1 B, the six contrast response functions were sampled at the following time intervals: 58 (♦), 62 (•), 70 (▴), 78 (●), 86 (▾), and 102 ms (×). As can be seen, when plotted with respect to the onset of the response at each level of contrast, the overall shape of the contrast response function appears to be very similar for these six time intervals. This is, of course, exactly what one would expect if the shape of the temporal response profile is relatively invariant as a function of contrast. The smooth curves through the data points, which show the fits of the Naka-Rushton equation (see ), quantitatively demonstrate that the shapes are similar at the six time intervals: The expansive exponent and the half-saturation contrast are identical across the curves, and only the maximum response at each level of contrast was allowed to vary. Further, the data points and the curves demonstrate that the expansive response exponent and the contrast-set gain control are fully and equivalently expressed at all the time intervals, even within the first 10 ms after the onset of the response. Figure 2, C–F, illustrates the same analysis for the other two neurons shown in Fig. 1. Figure 3 shows the same type of analysis performed on three neurons recorded from the cat visual cortex.
For the cells shown in Fig. 2, A, C, and E, scaling and shifting the average PSTH accounts for 99, 99, and 98% of the variation in the responses, respectively. For the cells shown in Fig. 3, A, C, and E, scaling and shifting the average PSTH accounts for 98, 98, and 95% of the variation in the responses, respectively. Overall, the pattern of results is very similar for these six neurons. The temporal response profiles are reasonably well fitted by simply scaling and shifting the average profile. The results of this analysis indicate that the shape of the temporal response profile remains relatively invariant with contrast.
As noted, the smooth curves in the righthand panels of Figs. 2 and 3 show the best fit of a single Naka-Rushton equation with a different amplitude scale factor for each time interval. This amounts to an eight-parameter equation: the exponent, the half-saturation, and six scale factors. (Note that the spontaneous discharge is measured at 0% contrast and therefore it is not a free parameter in the fitting process.) If a unique/independent Naka-Rushton equation is fitted to the responses as a function of contrast for each time interval separately, the number of parameters increases from 8 to 18: 3 for each of the six time intervals. As can be seen in Figs. 2 and 3, the eight-parameter equation captures a large percentage of the variation in the data. When the 18 parameters are free to vary, the increase in the variance accounted for is rather modest. For the cell shown in 2B, the value increases from 99.5 to 99.8%; for the cell shown in 2D, the value increases from 99.0 to 99.2%; for the cell shown in 2F, the value increases from 98.9 to 99.6%.
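The shared-shape fit can be sketched as follows, assuming the common form of the Naka-Rushton equation, R(c) = A · c^n / (c^n + c50^n), applied to responses from which the spontaneous discharge has been subtracted. The grid search over the exponent and half-saturation contrast, the function names, and the parameter grids are assumptions chosen to keep the example self-contained; the actual fitting routine is not specified here.

```python
import numpy as np

def naka_rushton(contrast, exponent, c50):
    """Unit-amplitude Naka-Rushton function: c^n / (c^n + c50^n)."""
    c_n = np.power(contrast, exponent)
    return c_n / (c_n + c50 ** exponent)

def fit_shared_naka_rushton(contrast, responses, exponents, c50s):
    """Fit one exponent and one c50 to all time intervals, with one amplitude each.

    responses: array of shape (n_intervals, n_contrasts), spontaneous rate subtracted.
    """
    contrast = np.asarray(contrast, dtype=float)
    responses = np.asarray(responses, dtype=float)
    best = None
    for n in exponents:
        for c50 in c50s:
            f = naka_rushton(contrast, n, c50)
            # For a fixed shape, each interval's amplitude is a linear least-squares fit.
            amps = responses @ f / np.dot(f, f)
            sse = np.sum((responses - np.outer(amps, f)) ** 2)
            if best is None or sse < best["sse"]:
                best = {"exponent": n, "c50": c50, "amplitudes": amps, "sse": sse}
    # Free parameters: exponent + c50 + one amplitude per interval (8 for six intervals).
    best["n_free_parameters"] = 2 + responses.shape[0]
    return best
```

Fitting each interval separately with the same routine (one row at a time) gives the independent model with three parameters per interval, so the two fits can be compared on the percentage of variation accounted for.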
Across the population as a whole (50 cells), the average percentage of variation that is accounted for by a single scaled Naka-Rushton equation is 94.7 ± 0.61% (mean ± SD); the median value is 95.5%. In comparison, the mean value for the 18-parameter equation is 95.9 ± 0.49%; the median value is 96.9%. This comparison suggests that the eight-parameter equation provides a reasonably good fit to the responses for all of the cells. This analysis indicates that the contrast response function is relatively invariant throughout the temporal response.
Diversity in the shapes of the PSTHs
Although the shapes of the temporal response profiles illustrated in Figs. 1-3 are representative of many cells, there are some cells that have shapes that are quite different. Figure 4 illustrates the diversity in the shapes that we encountered within the sample of monkey cells. The profiles were grouped into 12 arbitrary categories (shown in each panel) based on a qualitative visual evaluation of similarity. The number of cells within each category is given within each panel, and the panels are organized from top to bottom and left to right based upon the number within each category. Thus for example, the profiles illustrated in A–F are similar to the profiles illustrated in Figs. 1-3: There is a rapid rise in the response to a peak and then a more gradual decline to a sustained plateau. The cells in A–E are sorted primarily on the basis of the magnitude of the plateau. The cells illustrated in C are separated from those in B because of the small amplitude oscillations on the sustained plateau. For the cells illustrated in F, there is no sustained plateau; further, the rise and decline around the peak are approximately symmetric. The cells illustrated in G–L are considerably more heterogeneous and, as indicated by the counts in each panel, they occurred less frequently in the sample.
Figure 5 illustrates the diversity in the shapes of the temporal profiles that we encountered within the sample of cat cells. As can be seen, although there is a good deal of overlap between the shapes we measured in the cats versus the monkeys, the variety of shapes within the cat sample seems even more diverse. There appear to be some obvious qualitative similarities and differences between the cat and monkey profiles; however, given the diversity that we observed in the shapes (particularly within the cat cells), it might be necessary to perform these measurements on a very large sample of cells before drawing any firm conclusions. Nonetheless, consider the following observations: 1) The profiles illustrated in 5A are very similar to the shapes illustrated in 4A. 2) The shapes illustrated in 5F are very similar to the shapes illustrated in 4F, and so forth. 3) On the other hand, the cat cells shown in H, I, and L had no obvious counterparts in the monkey sample.
Figure 6 shows the responses as a function of time and contrast for three cells with more complex and unusual profiles in comparison to the cells illustrated in Figs. 2 and 3. The cell shown in A illustrates what might be termed “secondary oscillations.” The neuron begins responding at approximately 30 ms (after the onset of the stimulus), and the response rapidly increases to a maximum at approximately 50 ms. The response rapidly declines to a trough at approximately 80 ms and then rises again to a secondary peak (with an amplitude that is almost half the value of the 1st peak) at approximately 110 ms. The response then trails off to a relatively sustained plateau that remains larger than the response at the initial trough, even at 190 ms. The cell shown in C begins responding approximately 45 ms after stimulus onset. The response rapidly increases to a peak at approximately 75 ms and then declines to a trough (that is only slightly larger than the spontaneous discharge) at approximately 130 ms. After this low point the response begins to increase once again.
The analysis illustrated here is the same as in Figs. 1-3. As can be seen, the average temporal response profile (scaled and shifted) provides a reasonably good description of the basic trends in the responses even for those cells that have profiles with more complex shapes. For the cell shown in A, the percentage of the variance accounted for by the average profile is 97.0; for the cell shown in C, the percentage is 97.2; and, for the cell shown in E, the percentage is 97.1. Nonetheless, there are some deviations from the average profile that appear to be systematic. For example, in A, the responses during the initial rise and decline appear to undershoot (by a small amount) the average profile at lower contrasts and overshoot the average at higher contrasts. On the other hand, the responses during the second rise and decline undershoot the average profile at higher contrasts. Further, in E, the shape of the temporal response clearly changes as the contrast increases: At higher contrasts there is a pronounced dip in the response at approximately 80 ms and then a secondary oscillation that peaks at approximately 120 ms. (Such oscillations were first noted by Tolhurst et al. 1980.) This pattern of results is not apparent in the responses at the lower contrasts and thus at 80 ms into the temporal response (i.e., at the dip), the average profile overshoots the responses at higher contrasts and undershoots the responses at lower contrasts.
Figure 6 B shows the corresponding contrast response functions (for the responses illustrated in A) at several time intervals. The smooth curves show the fit of a single scaled Naka-Rushton equation (with the same exponent and half-saturation contrast). As can be seen, the shape of the contrast response function is relatively invariant throughout the time course of the temporal response. The single scaled function accounts for approximately 98% of the variation in the responses. If a separate Naka-Rushton equation is fitted to each contrast response function, the percentage of the variance accounted for only increases by approximately 1%. Figure 6, D and F, show the comparable analysis for the cells shown in C and E, respectively. Once again, a single scaled Naka-Rushton equation accounts for approximately 98% of the variation, thus indicating that the shape of the contrast response function is relatively invariant through time even for cells with temporal response profiles that are more complex than those shown in Figs. 1-3.
Figure 7 shows the percentage of the variation that can be accounted for by scaling and shifting the average temporal response profile for the entire sample of cells. This histogram shows that the average profile for each cell provides a reasonably good description of the responses as a function of time and contrast. The percentage is more than 90% for 43 of the 50 cells, with the lowest value being 86%. The mean value is 94 ± 3.8%; the median value is 95%. The high values of this index show that the responses through time are relatively invariant as a function of contrast. Note that the cells with the lower percentages do not reveal any obvious systematic variations in the temporal response profile as a function of contrast; instead, the low percentages appear to be more related to the stochastic properties of the cells, given the low firing rates and the high variance proportionality constants.
The results of these measurements and analyses indicate that the contrast response function remains relatively invariant as a function of time, even for those cells that have other types of complex temporal dynamics in the responses as a function of time (e.g., those cells with unusual and complex shaped PSTHs, with multiple oscillations).
Correlation between the time to peak and the width of the PSTH
Visual inspection of the temporal response profiles illustrated in Figs. 2, 3, 5, and 6 reveals a trend across the population as a whole (see also Fig. 18): There is a positive correlation between the time to the peak of the response and the width of the profile. Figure 8 shows a scatter plot that illustrates this trend: Width at half-height is plotted along the vertical axis and time to the peak of the response is plotted on the horizontal axis. The positive correlation is clear; the slope of the best fitting straight line (shown in the figure) is 1.06. (Note that 6 of the cells are not plotted because their responses did not decrease to half of the maximum response, even after 200 ms.) The reason for this correlation remains unclear; further experimentation would be required to explore this phenomenon. Nonetheless, it is worth noting that there are hypothetical mechanisms that could potentially account for this type of behavior. For example, increasing the number of low-pass filters in a cascade would produce a correlation between the width and the latency.
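The low-pass cascade idea can be illustrated numerically. The sketch below assumes identical first-order low-pass stages with an arbitrary 10-ms time constant; these choices, and the function names, are hypothetical and are not fitted to the measured profiles. It shows only that adding stages increases both the time to peak and the width at half-height of the impulse response; it makes no attempt to reproduce the measured slope.

```python
import numpy as np

def cascade_impulse_response(n_stages, tau_ms=10.0, dt_ms=0.1, duration_ms=400.0):
    """Impulse response of n identical first-order low-pass filters in series."""
    t = np.arange(0.0, duration_ms, dt_ms)
    stage = np.exp(-t / tau_ms)          # impulse response of one stage
    h = stage.copy()
    for _ in range(n_stages - 1):
        # Each additional stage convolves the response with one more exponential.
        h = np.convolve(h, stage)[: len(t)] * dt_ms
    return t, h / h.max()                # normalized to a peak of 1

def time_to_peak_and_width(t, h):
    """Return (time of the maximum, full width at half of the maximum)."""
    peak_index = int(np.argmax(h))
    above = np.where(h >= 0.5 * h.max())[0]
    return t[peak_index], t[above[-1]] - t[above[0]]
```

Calling `time_to_peak_and_width(*cascade_impulse_response(n))` for n = 1, 2, 3, ... yields time-to-peak and width values that both grow with n, which is the qualitative relationship described above.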
First 16 ms after the onset of the response: 2-ms time bins
Figure 9 shows the growth of the response of a monkey neuron as a function of the contrast of a stationary grating during the first 16 ms of the neuron's response: Each set of data points plots the responses in sequential 2-ms time bins after the onset of the response. The sequential order of the symbols is as follows: (•), (▴), (♦), and (●) with - - -, and (•), (▴), (♦), and (●) with —. For example, • connected by - - - plot the response of the cell during the first 2 ms, and ● connected by — plot the response of the cell during the last 2 ms (of the 16-ms time period). Note that in comparison to the responses obtained with the 10-ms running average, the responses obtained with nonoverlapping 2-ms time bins are more variable. Nonetheless, the pattern of results is systematic because the responses are the average of 320 presentations.
The smooth curves through the data points plot the fit of a single Naka-Rushton function with the same expansive response exponent (3.1) and the same half-saturation contrast (31.0%); only the maximum response was allowed to vary. As can be seen, this single scaled Naka-Rushton equation provides a good fit to the responses across all of the 2-ms time bins. The fit is based upon 10 parameters: the exponent, the half-saturation, and 8 amplitude scale factors. (As noted in the preceding text, the spontaneous discharge is measured at 0% contrast and therefore it is not a free parameter in the fitting process.) If a unique/independent Naka-Rushton equation is fitted to the responses as a function of contrast for each time interval separately, the number of parameters increases from 10 to 24: 3 for each of the 8 time intervals. The 10-parameter fit accounts for 96% of the variation, whereas the 24-parameter fit accounts for 97%.
Figure 10 shows the same analysis applied to six additional neurons. The three panels on the left (A, C, and E) plot the responses of three cat neurons; the three panels on the right (B, D, and F) plot the responses of three monkey neurons. C and D plot the responses of two simple cells; the other four panels plot the responses of complex cells. The sequential order of the symbols and fitted curves is the same as the order described for Fig. 9. As can be seen, the overall trend for these six neurons is similar to the trend described for the cell shown in Fig. 9.
Across a sample of 22 neurons, the 10-parameter fit accounts for 92.9% of the variation; the 24-parameter fit accounts for 94.2%. (The 2-ms time bins were so small that the resulting low signal-to-noise ratio precluded performing the analysis on all of the cells.) In sum, this analysis of the first 16 ms suggests that the nonlinear characteristics of the contrast response function are present as soon as the cell responds.
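The 10-parameter fit can be sketched as follows. This is not the fitting procedure used for the figures (which is not specified in this section); it is a minimal illustration that assumes the spontaneous discharge has already been subtracted, searches a coarse grid of candidate half-saturation contrasts and exponents, and solves for each time bin's amplitude in closed form. The function names and the synthetic, noise-free data are hypothetical.

```python
def saturation(contrast, c50, exponent):
    """Unit-amplitude Naka-Rushton term: c^n / (c^n + c50^n)."""
    cn = contrast ** exponent
    return cn / (cn + c50 ** exponent)


def fit_shared_shape(contrasts, responses_by_bin, c50_grid, exponent_grid):
    """One shared (c50, n) pair for all bins; each bin gets its own amplitude."""
    best = None
    for c50 in c50_grid:
        for n in exponent_grid:
            basis = [saturation(c, c50, n) for c in contrasts]
            energy = sum(b * b for b in basis)
            sse, amplitudes = 0.0, []
            for resp in responses_by_bin:
                # Closed-form least-squares amplitude for this time bin.
                amp = sum(b * r for b, r in zip(basis, resp)) / energy
                amplitudes.append(amp)
                sse += sum((r - amp * b) ** 2 for b, r in zip(basis, resp))
            if best is None or sse < best["sse"]:
                best = {"c50": c50, "exponent": n,
                        "amplitudes": amplitudes, "sse": sse}
    return best


def percent_accounted(responses_by_bin, sse):
    """Percentage of the variation about the grand mean explained by the fit."""
    values = [r for resp in responses_by_bin for r in resp]
    mean = sum(values) / len(values)
    return 100.0 * (1.0 - sse / sum((v - mean) ** 2 for v in values))


# Synthetic example: eight 2-ms bins that share one shape but differ in amplitude.
contrasts = [5.0, 10.0, 20.0, 40.0, 80.0]
true_amps = [4.0, 9.0, 15.0, 22.0, 30.0, 26.0, 21.0, 18.0]
data = [[a * saturation(c, 31.0, 3.1) for c in contrasts] for a in true_amps]
fit = fit_shared_shape(contrasts, data,
                       c50_grid=[20.0, 25.0, 31.0, 40.0],
                       exponent_grid=[2.0, 2.5, 3.1, 3.5])
free_parameters = 2 + len(fit["amplitudes"])  # exponent, half-saturation, 8 amplitudes
```

Fitting each bin separately would instead use three parameters per bin (24 for eight bins). The comparison reported above asks how much additional variation those 14 extra parameters explain.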
Response amplitude: optimal and nonoptimal spatial phases
Figures 1-3, 6, 9, and 10 demonstrate that the response saturation and all of the other gain related scaling that occurs throughout the entire contrast response function are fully present within 10 ms after the onset of the response. Further, the saturation and other gain related changes appear to occur at the same contrast throughout the entire time course of the response. Specifically, the responses as a function of contrast are well fitted by a single scaled Naka-Rushton equation, with the same half-saturation contrast and response exponent, for all times during the response. Therefore if the responses are summed across the entire 200 ms of the PSTH, the responses as a function of contrast must be well fitted by the same, scaled Naka-Rushton equation. These facts suggest that the response saturation and other gain changes apparent throughout the entire temporal response, including the first 10 ms, are the same type of contrast-set gain changes that have been extensively studied over the past 20 years (see references in the introduction). If so, then the scaling and saturation should be determined only by the magnitude of the contrast, independent of the magnitude of the response.
In order to provide a more direct assessment of this hypothesis, we evaluated the amplitude of the response as a function of contrast during the first 20 ms after the onset of the response for 1) a grating positioned at an optimal spatial phase and 2) a grating positioned at a nonoptimal spatial phase. The logic of this analysis has been described in detail elsewhere (e.g., Albrecht and Geisler 1991; Albrecht and Hamilton 1982; Bonds 1991; Carandini et al. 1997; Geisler and Albrecht 1997; Heeger 1991, 1992a; Sclar and Freeman 1982; for general reviews, see Albrecht and Geisler 1994; Albrecht et al. 2002; Carandini et al. 1999; Geisler and Albrecht 2000). In brief, if the scaling and saturation are determined by the contrast, then the two contrast response functions should have the same shape and half-saturation contrast, but they should have different maximum firing rates. On the other hand, if the scaling and saturation are determined by the response, then the two contrast response functions should have different shapes (in linear coordinates) and the same maximum firing rates, but they should have different half-saturation contrasts (cf. Figs. 7 and 9; see also related discussion in Albrecht and Hamilton 1982 and Fig. 9 and related discussion in Albrecht and Geisler 1991).
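The two predictions can be illustrated with a small sketch. The formulation is an assumption made for illustration: the nonoptimal phase is represented by a hypothetical attenuation factor `phase_gain`, applied either to the output of the contrast nonlinearity (contrast-set) or to the input of a fixed saturating nonlinearity (response-set). The parameter values are arbitrary.

```python
def naka_rushton(contrast, r_max, c50, exponent):
    """Naka-Rushton function: r_max * c^n / (c^n + c50^n)."""
    cn = contrast ** exponent
    return r_max * cn / (cn + c50 ** exponent)


def contrast_set_response(contrast, phase_gain, r_max=40.0, c50=30.0, exponent=3.0):
    """Saturation set by contrast; the spatial phase only scales the output."""
    return phase_gain * naka_rushton(contrast, r_max, c50, exponent)


def response_set_response(contrast, phase_gain, r_max=40.0, c50=30.0, exponent=3.0):
    """Saturation set by the drive; the phase attenuates the input to a fixed nonlinearity."""
    return naka_rushton(phase_gain * contrast, r_max, c50, exponent)


contrasts = [10.0, 20.0, 30.0, 60.0, 90.0]
# Contrast-set: nonoptimal/optimal ratio is the same at every contrast.
contrast_set_ratios = [contrast_set_response(c, 0.5) / contrast_set_response(c, 1.0)
                       for c in contrasts]
# Response-set: the ratio changes with contrast because the curve shifts rightward.
response_set_ratios = [response_set_response(c, 0.5) / response_set_response(c, 1.0)
                       for c in contrasts]
```

Under the contrast-set sketch, the nonoptimal curve is a scaled copy of the optimal curve with the same half-saturation contrast and a lower maximum. Under the response-set sketch, the maximum is unchanged and the half-saturation contrast moves from `c50` to `c50 / phase_gain`.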
Figure 11 shows the results of this analysis for 12 neurons. Each panel plots the amplitude of the response as a function of contrast, during the first 20 ms, for a stationary grating that was presented at an optimal and a nonoptimal spatial phase. The smooth curves in each panel plot the best fitting Naka-Rushton equation, with the same response exponent and half-saturation contrast; only the maximum response (an amplitude scale factor) was free to vary. As can be seen, the responses as a function of contrast for the optimal and the nonoptimal stimuli appear to be scaled versions of the same contrast response function.
This analysis was performed on a total of 23 neurons. For these cells, a single scaled Naka-Rushton equation accounted for 97% of the variation (on average) in the responses as a function of contrast. If the responses for the optimal and nonoptimal phase are fitted with two unique Naka-Rushton equations, the percentage of the variation that is accounted for is improved by less than 1%. (The analysis could not be performed on the entire sample of 50 neurons because many of the complex cells showed little or no variation with spatial phase.) These results are consistent with the hypothesis that the scaling and saturation are set by the magnitude of the contrast and not by the magnitude of the response.
An alternative method for comparing and quantifying the similarity of the shapes and half-saturation contrasts of the contrast response function is to plot the responses for the optimal phase on the horizontal axis and the nonoptimal phase on the vertical axis. If the shapes and half-saturation contrasts are similar, then the responses should cluster around a straight line through the origin. Figure 12 shows the results of this analysis for the 12 neurons illustrated in Fig. 11; the solid line in each panel shows the best fitting straight line through the origin. These cells illustrate that the shape of the contrast response function is quite similar at the optimal and the nonoptimal spatial phase. This observation indicates that the saturation occurs at the same contrast, even though the response magnitude is quite different at that contrast. In other words, the saturation nonlinearity (which is apparent within the first 20 ms of the response) is determined by the magnitude of the contrast and not the magnitude of the response.
Finally, if the responses to the nonoptimal stimulus are scaled by simply dividing the y coordinate of each data point by the slope of the best fitting straight line, it is then possible to plot all of the responses for all of the cells on the same scatter plot. The results of this analysis for all of the responses from all 23 cells are shown in Fig. 13. As can be seen, the responses to the optimal and nonoptimal stimuli cluster around the unity line. This observation indicates that the shape of the contrast response function is similar at both the optimal and the nonoptimal spatial phase and that the saturation occurs at the same contrast for the optimal and nonoptimal stimuli, even though the response magnitude is different. In sum, the results of these measurements and these analyses are consistent with the hypothesis that the scaling and saturation during the first 20 ms are set by the magnitude of the contrast, independent of the magnitude of the response.
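The line-through-the-origin fit and the rescaling used to pool cells can be sketched as below. The sketch assumes an ordinary least-squares slope with no intercept term; the function names and the example values are hypothetical.

```python
def slope_through_origin(x, y):
    """Least-squares slope of y = slope * x, with no intercept term."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def normalize_nonoptimal(optimal, nonoptimal):
    """Divide the nonoptimal responses by the fitted slope so points fall near unity."""
    slope = slope_through_origin(optimal, nonoptimal)
    return slope, [r / slope for r in nonoptimal]


# Hypothetical responses (spikes/s) at five contrasts for one cell.
optimal = [2.0, 8.0, 20.0, 32.0, 38.0]
nonoptimal = [0.4 * r for r in optimal]
slope, rescaled = normalize_nonoptimal(optimal, nonoptimal)
```

When the nonoptimal responses are a scaled copy of the optimal responses, the rescaled values coincide with the optimal values, which is why the points from different cells can share one scatter plot and cluster around the unity line.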
Response latency: optimal and nonoptimal spatial phases
Figure 1 illustrates that as the contrast decreases, the temporal response profile shifts rightward in time. This shift in the response latency could potentially be a consequence of either the response magnitude or the contrast magnitude. Previous studies of the phase transfer function of simple cells have shown that as the contrast increases, the phase of the response advances, and that this advance depends upon the magnitude of the contrast as opposed to the magnitude of the response (Albrecht 1978, 1995; Carandini and Heeger 1994; Carandini et al. 1997). It seems likely that the shifts in latency demonstrated in the present experiments (for both complex cells and simple cells) also depend upon the magnitude of the contrast (see also Gawne et al. 1996; Reich et al. 2001; Richmond et al. 1997).
To provide a direct assessment of this hypothesis, we measured the latency shifts for 1) a grating that was positioned at an optimal spatial phase and 2) a grating that was positioned at a nonoptimal spatial phase. In both cases, the variations in the magnitude of the contrast are identical; however, the variations in the magnitude of the response are quite different. If the latency shift is determined by the magnitude of the contrast, then the shifts should be similar for the optimal and the nonoptimal phase. On the other hand, if the latency shift is determined by the magnitude of the response, then the magnitude of the shift should be larger for the optimal stimulus. Figure 14 shows the results of this analysis for the same 12 neurons shown in Figs. 11-13. The ▴ plot the response latencies for the optimal phase and the ○ plot the latencies for the nonoptimal phase; the solid line shows the fit of a single inverted Naka-Rushton equation (which has been shown to provide a good fit to the latency shift that occurs as a function of contrast; see Albrecht 1995).
As can be seen, the latency shifts as a function of contrast are very similar for both the optimal and the nonoptimal phase, even though the magnitude of the response is quite different (cf. Fig. 11). Thus a single inverted Naka-Rushton equation provides a good fit to the latency shifts for both the optimal and the nonoptimal stimulus. This observation indicates that the latency shift is induced by the magnitude of the contrast and not the magnitude of the response. Figure 15 shows the response latencies for all 23 neurons. The response latency for the optimal phase is plotted along the horizontal axis and the response latency for the nonoptimal phase is plotted along the vertical axis. If the latencies are the same for optimal and nonoptimal phases, then the latencies should cluster around a diagonal line through the origin with a slope of 1.0. The solid line shows the best fitting straight line passing through the origin; this indicates that the response latencies at the different levels of contrast are similar for the optimal and nonoptimal stimuli even though the response amplitudes for the two stimuli are quite different.
The results of these measurements and these analyses are consistent with the hypothesis that the shifts in the latency are determined by the magnitude of the contrast independent of the magnitude of the response.
Not all cells show nonlinear contrast response characteristics
It is important to note that not all of the cells that one finds in the striate cortex have a nonlinear contrast response function. In particular, not all of the cells have an expansive response exponent and not all of the cells saturate at higher contrasts. For example, a small fraction of the cells in this study (approximately 5%) demonstrated a nearly linear relationship between the magnitude of the response and the magnitude of the contrast.2 This percentage is very similar to the percentage of linear contrast response functions reported in the past (Albrecht and Hamilton 1982; Sclar et al. 1990).
Contrast response nonlinearities: temporal dynamics
The primary goal of the present set of experiments was to measure the temporal dynamics of the contrast response function. We were particularly interested in the dynamics of two nonlinear characteristics of cortical neurons, contrast-set gain control and response expansion. There is a substantial literature (see introduction) providing evidence for a contrast-set gain control mechanism (a contrast normalization) that maintains stimulus selectivity despite the limited response range of cortical neurons, and a response expansion mechanism (an accelerating nonlinearity, e.g., half-squaring) that sharpens the stimulus selectivity of cortical neurons. Most of the evidence for these mechanisms derives from measurements that used extended (steady-state) stimulus presentations. However, it is important to measure the temporal dynamics over short time intervals because under many circumstances, the retinal stimulus changes frequently due to eye movements; for example, during saccadic inspection, the average duration of a fixation is approximately 200 ms. In general, the local retinal contrast and spatial pattern can vary greatly from one fixation to the next, and hence, the contrast-set gain control and response expansion mechanisms must operate very quickly if they are to maintain and enhance selectivity during any single fixation within the rapid progression of images created by eye movements.
To characterize the dynamics of the contrast-set gain control and response expansion mechanisms, over a time scale comparable to a single fixation, we measured the response as a function of contrast using transiently presented stationary gratings. We then analyzed the development and the dynamics of the contrast response function during the course of the temporal response, based upon PSTHs with time bins that varied in different analyses from 2 to 20 ms. We found that the basic properties of the contrast response function, including the contrast-set gain control and the response expansion, were fully operational within the first 10 ms after the onset of the response. Further, we found that the contrast response function was relatively invariant throughout the entire course of the temporal response in spite of the considerable heterogeneity in the shape of the temporal response profile from cell to cell and the complexity of the profile within an individual cell.
Generalizing the invariance to other stimulus dimensions
In this report, we have shown that the shape of the contrast response function is relatively invariant from the onset of the response through the entire temporal response, and that the shape of the temporal response profile is relatively invariant with contrast. Such invariance may not hold for all stimulus dimensions. For example, Ringach et al. (1997) reported that orientation tuning can change over the course of the initial temporal response (see, however, Gillespie et al. 2001; Müller et al. 2001).
Using the same methods described in this report, we measured the temporal dynamics of the spatial frequency response function (Albrecht and Geisler, unpublished observations): Spatial tuning was measured in small time intervals (2–20 ms) throughout the 200 ms after stimulus onset. We found that for many cells the shape of the spatial frequency tuning changes through time. For these cells, the responses to lower spatial frequencies appear to occur sooner than the responses to higher spatial frequencies, in a systematic progression (see also Mazer et al. 2002). Thus over time the preferred frequency shifts from lower frequencies to higher frequencies, the high-frequency cutoff increases, and the overall bandwidth increases.
Latency of the response
The latency of the response of V1 neurons decreases as contrast increases; further, the shift is determined by the magnitude of the contrast (Albrecht 1978, 1995; Carandini and Heeger 1994; Carandini et al. 1997; Dean and Tolhurst 1986; Gawne et al. 1996; Reich et al. 2001; Reid et al. 1992; Richmond et al. 1997). This contrast-induced latency shift is what one would expect from the shunting inhibition proposed in the normalization model (Carandini and Heeger 1994; Carandini et al. 1999) (see discussion in the following text) and in earlier models of temporal dynamics (e.g., Craik 1938). Within the context of such models, the changes in the temporal dynamics may have beneficial consequences, because at higher contrasts, the responses arrive sooner and the temporal resolution is improved. It has also been suggested that the variation in the latency as a function of contrast could potentially be utilized by subsequent neurons to signal the magnitude of the contrast (Gawne 1999; Gawne et al. 1996; Reich et al. 2001; Richmond et al. 1997).
It is interesting to note that there is a good deal of variation in the latency of the response, from cell to cell, even at the highest contrasts (see Figs. 1-6 and 8). Given that the output of V1 neurons is utilized by subsequent brain regions for many different visual functions and that the time constraints for each of these functions are different, it seems likely that those functional decisions that must be made very rapidly (e.g., directing the next saccadic eye movement) would have to depend on the output of those neurons with the shortest latencies (e.g., Fig. 5 B), whereas those functional decisions that can integrate over the entire fixation interval (e.g., discrimination and identification performance; see Fig. 16) could rely upon the output of all of the neurons, even those with the longest latencies (e.g., Fig. 5 J).
Contrast discrimination as a function of integration interval
Although the duration of a single fixation is, on average, approximately 200 ms, the temporal response to a stationary grating is often quite transient: most of the action potentials occur at the onset of the response (see for example Figs. 1-6 in this report; or see Fig. 1 in Müller et al. 1999). It is worth considering the implications of this temporal response pattern for the functional performance of single neurons. Here we consider the implications for contrast discrimination performance.
We know from standard signal detection analysis that a longer temporal integration interval improves discrimination performance, assuming that 1) there are no time constraints on decision making, 2) the variance is proportional to the mean, 3) the relationship of the variance to the mean does not depend on the duration of the integration interval, and 4) the base rate is zero. Discrimination performance improves with a longer integration interval because the sum of the spikes increases more rapidly than the standard deviation of the sum and thus the signal-to-noise ratio increases.
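Under these four assumptions the argument reduces to simple arithmetic, sketched below for two steady firing rates. The discriminability index, the rates, and the variance proportionality constant `variance_constant` are illustrative assumptions, not values from the present measurements.

```python
import math


def d_prime(rate_low, rate_high, interval_s, variance_constant=1.5):
    """Discriminability of two spike counts when variance = K * mean and base rate = 0."""
    mean_low = rate_low * interval_s
    mean_high = rate_high * interval_s
    # Pooled standard deviation of the two counts under variance = K * mean.
    pooled_sd = math.sqrt(variance_constant * 0.5 * (mean_low + mean_high))
    return (mean_high - mean_low) / pooled_sd


# The mean difference grows with T, the standard deviation only with sqrt(T),
# so lengthening the interval fourfold doubles the signal-to-noise ratio.
short = d_prime(20.0, 30.0, 0.05)
long = d_prime(20.0, 30.0, 0.20)
```

With steady rates the index grows as the square root of the integration interval, which is the sense in which longer integration always helps when the assumptions hold.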
There are various properties of visual cortex neurons that, to some extent, violate the assumptions noted in the preceding paragraph. First of all, there are constraints on the time available for decision making: 1) many natural tasks require rapid decisions; 2) single fixations are on average only 200 ms; 3) responses to stationary stimuli often decay in less than 200 ms; and 4) other constraints might impose even shorter integration intervals (e.g., planning the next eye movement). Second, although the variance is generally proportional to the mean in cortical cells (e.g., Snowden et al. 1992; Softky and Koch 1993; Tolhurst et al. 1983; Vogels et al. 1989; for a general review, see Geisler and Albrecht 1997, 2000), the value of this ratio generally increases as a function of the integration interval. Also, at high firing rates (when optimal stimuli are presented at high contrasts), the refractory effects tend to regularize the inter-spike interval (Geisler et al. 1991), which would decrease the value of the variance proportionality constant. Third, the base rate may not always be zero: integrating when the response to the stimulus is small relative to the base rate decreases the signal-to-noise ratio.
Given all of the potential violations of the assumptions in the standard signal detection analysis and the uncertainty regarding how they might interact, it is not obvious how discrimination performance will be affected by the temporal integration interval. The measurements reported here provide an opportunity to determine how contrast discrimination performance is affected by the integration interval (for related work, see Frazor et al. 1998; Müller et al. 1999, 2001).
Figure 16 shows the contrast discrimination thresholds for 12 different visual cortex neurons as a function of the integration interval. (Refer to the figure caption for details.) As can be seen, for most of the cells in this figure, the threshold decreases rapidly as the integration interval increases. For some of the cells (B, F, and J), the discrimination threshold continues to decrease over the entire 200 ms. For other cells (C, D, G, H, and L), the discrimination threshold reaches an asymptotic value before 200 ms. The points at which these cells reach their asymptotic values vary from 50 ms (C) to 150 ms (D). For still other cells (A, E, I, and K), the discrimination threshold decreases to a minimum value and then begins to increase beyond a certain point in time (cf. Müller et al. 2001). The point of minimum varies from approximately 50 ms (I) to 150 ms (E), and the magnitude of the rise at 200 ms varies from 10% of the minimum value (E) to 50% of the minimum value (A). For the entire group of cells, the value of the minimum contrast threshold varies from approximately 3% contrast to 20% contrast, which is similar to the contrast discrimination performance reported previously using steady-state drifting gratings and a 200 ms integration interval (Geisler and Albrecht 1997).
These results are similar to those recently reported by Müller et al. (2001). The initial rapid decrease in contrast threshold occurs because small increases in the integration interval around the peak of the response produce large increases in the total number of spikes. However, because the response decays rapidly there is little value (or at least diminishing returns) in integrating the responses over longer time periods. Further, integrating over the entire 200 ms can sometimes be deleterious (e.g., the cell shown in A) because, as noted in the preceding text, the variance proportionality constant generally increases as the integration interval increases and because of the detrimental effects of integrating when the response is small relative to the base rate.
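The effect of integrating a transient response in the presence of a base rate can be illustrated with a toy calculation. Everything in the sketch is an assumption made for illustration: the response is modeled as an exponential decay riding on a constant base rate, the variance is a fixed multiple of the mean count, and the parameter values are arbitrary. Only the base-rate effect is modeled; the growth of the variance proportionality constant with the integration interval is left out.

```python
import math


def mean_count(amplitude, interval_s, base_rate, tau_s):
    """Expected spikes in [0, T] for rate(t) = base_rate + amplitude * exp(-t / tau)."""
    return (base_rate * interval_s
            + amplitude * tau_s * (1.0 - math.exp(-interval_s / tau_s)))


def transient_d_prime(interval_s, amp_low=50.0, amp_high=60.0,
                      base_rate=5.0, tau_s=0.02, variance_constant=1.5):
    """Discriminability of two contrasts that differ only in transient amplitude."""
    low = mean_count(amp_low, interval_s, base_rate, tau_s)
    high = mean_count(amp_high, interval_s, base_rate, tau_s)
    return (high - low) / math.sqrt(variance_constant * 0.5 * (low + high))


intervals_ms = list(range(10, 201, 10))
with_base = [transient_d_prime(t / 1000.0) for t in intervals_ms]
no_base = [transient_d_prime(t / 1000.0, base_rate=0.0) for t in intervals_ms]
best_interval_ms = intervals_ms[with_base.index(max(with_base))]
```

With a nonzero base rate, the discriminability in this sketch peaks well before 200 ms and then declines, because late in the interval the count gains noise from the base rate but almost no additional signal. With a zero base rate it rises toward an asymptote instead, which parallels the cells whose thresholds level off before 200 ms.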
Contrast gain control and contrast adaptation
We (Albrecht et al. 1984) and others (Bonds 1991; Müller et al. 1999; Ohzawa et al. 1985) have used the terms “gain control” and “adaptation” interchangeably to describe a wide variety of contrast-induced changes in the gain of visual cortex neurons. These various gain changes span a wide range of different time scales from a few milliseconds to as much as a minute. Each of these various gain changes has a unique set of properties. Although the use of these terms is generally accurate within the context of any specific phenomenon under consideration, the terms might become confusing at best, and misleading at worst, when they are used virtually interchangeably to describe qualitatively different phenomena within a more general context.
Consider, for example, a comparison of two specific phenomena that have both been referred to as contrast gain control: 1) the very rapid contrast gain control that occurs within a matter of milliseconds [as demonstrated in this report and elsewhere: for example, Geisler and Albrecht (1992)] and 2) the slower contrast gain control that occurs over the course of seconds (e.g., Albrecht et al. 1984; Bonds 1991; Ohzawa et al. 1985). For the purposes of this discussion alone, we will term the former the “fast” type and the latter the “slow” type. These two phenomena might be manifestations of the same basic mechanism viewed over very different time frames; on the other hand, the two might be qualitatively different. The terms adaptation and gain control have been applied to both phenomena, and in principle, this is not inaccurate. However, if the two 1) are qualitatively different, 2) depend on very different biophysical mechanisms, and 3) ultimately have very different functional roles, then using the same terms to describe both types of gain control could potentially be confusing and misleading.
Now consider some of the differences between the fast and slow types of gain control. First, the time courses are quite different. As demonstrated in this report (e.g., Figs. 9 and 10), the fast type achieves full strength within 10 ms after the onset of the response. On the other hand, for the slow type, it takes approximately 15 s to achieve two-thirds of its full strength (e.g., Albrecht et al. 1984). Second, the effects on stimulus selectivity are quite different: for example, the fast type preserves the spatial frequency tuning (Albrecht et al. 1982; Geisler and Albrecht 1997), whereas the slow type distorts spatial frequency tuning (Albrecht et al. 1984; Movshon and Lennie 1979). Third, the effects of optimal and nonoptimal inducing stimuli are equivalent for one type of gain control but not for the other. For the slow type, the magnitude of the gain change is smaller for nonoptimal stimuli (see Fig. 8 in Albrecht et al. 1984), whereas for the fast type, the magnitude of the change is approximately equal for both optimal stimuli and nonoptimal stimuli (e.g., Figs. 11-15). In fact, the fast type of gain change can be evoked by a contrast stimulus that produces no action potentials whatsoever (Geisler and Albrecht 1992).
Finally, consider two of the functional consequences of the two types of gain control. First, consider the effects on contrast discrimination. The slow type adjusts the gain slowly enough (on the order of seconds) that it could potentially reposition the neuron's limited dynamic response range so as to maintain high sensitivity to transient contrast changes, analogous to the slow types of light adaptation that maintain sensitivity to transient intensity changes (see, however, Barlow et al. 1976). On the other hand, the fast type changes the gain so rapidly that it is of no value for discriminating transient contrast changes that occur on a time frame greater than 10 ms. Further, temporal integration in the retina removes transient contrast changes that occur in a time frame of less than 10 ms. In other words, because the fast type of gain control is so rapid, it is only deleterious for contrast discrimination. (It is important to note, however, that this fast type of gain control is very beneficial for discrimination along other stimulus dimensions, such as orientation, spatial position, spatial frequency, and so forth.) Second, consider the effects of the two gain controls in the context of eye movements. The slow type builds up and decays so slowly it cannot adjust to the contrast within a single fixation; whereas the fast type adjusts to the contrast within the first few milliseconds of a single fixation.
Taken together, these differences seem to indicate that the slow and fast types are probably not manifestations of the same phenomenon, even though we use the same set of terms to describe them both. The unique properties of the fast type perhaps warrant a unique name. It might be reasonable to term it “instantaneous contrast gain control” because in our experiments it appears to be present at full strength as soon as there is a measurable response from the cell. Of course, this gain control cannot be truly instantaneous because it must build up and decay over some time interval, however brief. For example, at low contrasts the first measurable response occurs after a greater time delay than at high contrasts, and thus the contrast gain control could well develop with a substantially slower time course for low contrast stimuli and still be at full strength by the time there is a measurable response.
Implications for cooperative cortical networks
The responses of visual cortex neurons are the result of a vast network of neural interactions, from the retina through the cortex. It seems likely that some of the contrast-set gain control is occurring within the retina as well as the LGN (for a review, see Shapley and Enroth-Cugell 1984). However, the phenomena associated with the instantaneous type of gain control described within this report and many previous reports (see the references in the introduction) suggest that there might be a specific neural network within the visual cortex that is primarily responsible for this type of gain control. This leads one to wonder and to speculate about what kinds of neural interactions could produce very fast gain adjustments.
Within the cortex itself one might speculate that the gain control is the result of lateral connections from a local pool of similar neurons or perhaps from a local group of specialized fast-responding neurons whose sole purpose is to regulate the gain of the “other” neurons, based upon contrast. Such lateral connections could very well occur within 10 ms. However, if the delay associated with the lateral connections takes more than a few milliseconds to occur, one can eliminate these neural mechanisms as likely candidates for instantaneous contrast gain control; for example, the measured delay associated with the effects of lateral recurrent excitation is on the order of 100 ms (Douglas and Martin 1991; Douglas et al. 1995). It also seems possible that feedback connections (from other cortical areas) could potentially regulate the gain, given that it has been demonstrated that some types of cortical feedback (to the striate cortex) can occur within 10 ms (Hupe et al. 2001). However, if the delay associated with the feedback connections takes more than a few milliseconds to occur, one can eliminate these neural mechanisms as likely candidates for instantaneous contrast gain control; for example, the delay associated with some types of feedback is more than 50 ms (Lamme et al. 1998; Zipser et al. 1996). There are, of course, many other potential neural mechanisms that one could postulate (see following text). To the extent that one can determine the time course of these other possible neural mechanisms, one can evaluate their viability based on the known time constraints.
Models at different levels of analysis
There are many different models, at different levels of analysis, that have been proposed to describe various phenomena, structures, and functions associated with the behavior of visual cortex neurons. In this discussion, we define three basic classes of models, at three different levels of analysis. Each class of model has a different goal and each has a unique and important role to play in our understanding of visual cortex neurons.
At one end of the continuum, there are models that might be termed descriptive models. The goal of a descriptive model is to provide atheoretical mathematical equations that accurately describe the pattern of responses within the context of specific stimulus situations without any consideration of potential theoretical neuronal structures or functions. At the other end of the continuum, from this perspective, there are models that might be termed structural models. The goal of a structural model is to specify the detailed biophysical and biochemical mechanisms within and between neurons. Somewhere in between these two ends of the continuum, there are models that might be termed functional models. The goal of a functional model is to characterize the operations that are being performed by the neuron or neural circuit at an algorithmic, information processing level of analysis without committing to any specific biophysical or biochemical mechanisms. An important role of descriptive models is to guide the development of functional models, and an important role of functional models is to guide the development of structural models.
Examples of descriptive models include: the Naka-Rushton equation, used to describe the responses as a function of contrast (e.g., Albrecht and Hamilton 1982; DeAngelis et al. 1993; Kayser et al. 2001; McLean and Palmer 1994; Sclar et al. 1990); the four-parameter linear phase equation, used to describe the spatiotemporal phase transfer function (Albrecht 1995; Hamilton et al. 1989; Lee et al. 1981; Reid et al. 1992); the ellipse equation, used to describe the responses as a function of orientation (Movshon et al. 1978; Reid et al. 1991); the Gaussian equation, used to describe the time course of the responses to stationary gratings (Müller et al. 2001); and the sinusoidal equation, used to describe the responses to binocular stimuli (Freeman and Ohzawa 1990).
Examples of functional models for visual cortex neurons include: the standard linear spatiotemporal model (e.g., Adelson and Bergen 1985; DeAngelis et al. 1993, 1994; De Valois et al. 1982; Hamilton et al. 1989; Hubel and Wiesel 1962, 1968; Movshon et al. 1978; Palmer et al. 1991; Watson and Ahumada 1985; for general reviews, see Carandini et al. 1999; Robson 1975; Shapley and Lennie 1985); the contrast-gain exponent model (Albrecht and Geisler 1991; see Fig. 12 in Geisler and Albrecht 2000); and the functional half-squaring normalization model (Heeger 1991). These functional models evolved from earlier descriptive models.
Examples of structural models for response expansion and contrast gain control include: expansive voltage-spike transduction, noisy membrane potential, recurrent excitation, intracortical inhibition, correlation-based inhibition, synaptic depression, nonspecific suppression, shunting inhibition, tonic hyperpolarization, strong push-pull inhibition, and changes in membrane conductance. Some of these proposed models rely upon feedforward inputs, some rely upon feedback inputs, and others rely upon lateral inputs. Some of these inputs come through local connections, while others come through the far-reaching interconnectivity among cortical neurons. For recent discussions and reviews of this literature and related issues, see Abbott et al. (1997), Adorjan et al. (1999), Anderson et al. (2000a,b), Carandini et al. (1999), Chance et al. (1998), Douglas et al. (1995), Ferster and Miller (2000), Gilbert et al. (1990), Hirsch et al. (1998), Kayser et al. (2001), Miller and Troyer (2002), Murthy and Humphrey (1999), Nelson et al. (1994), Somers et al. (1995), Stetter et al. (2000), and Troyer et al. (1998). These proposed models might not be mutually exclusive. Further, it is possible that many of the mechanisms in these structural models might operate in a cooperative fashion to accomplish the important functional operations of contrast normalization and response expansion.
Descriptive model: invariant response model
Given that the temporal response profiles reported here are relatively invariant as a function of contrast, and the contrast response functions are relatively invariant as a function of time, it is possible to develop a descriptive model by combining the three functions illustrated in Fig. 17: response amplitude as a function of time, response amplitude as a function of contrast, and response latency as a function of contrast. The equations for the model are given in the appendix. In this model, the shape of the temporal response profile is scaled by the contrast response function and shifted by the contrast latency function.
The smooth curves in Fig. 18 illustrate the fit of this “invariant response model” for three neurons. As can be seen in these examples, this descriptive model provides a reasonably good description of the overall magnitude of the response as a function of time and contrast. However, there are several obvious deviations that are worth noting: The model tends to overshoot the responses in the earliest time bins, peak shortly after the maximum firing rate, and overshoot the responses in the last set of time bins. Further, for the cell shown in A, there is a secondary oscillation that increases as a function of contrast (see also Fig. 6E). Nonetheless, the descriptive function accounts for some 95% of the variation in the response. In comparison, the scaled and shifted average temporal response profile accounts for 99% of the variation. In B, the descriptive function accounts for 97% and the average profile accounts for 99%. In C, the descriptive function accounts for 95% and the average profile accounts for 99%.
We have fitted the invariant response model to all of the cells, and we have compared the results to the scaled and shifted average temporal response profile. Figure 19 shows a scatter plot that compares the percentage of the variance accounted for: The descriptive model is plotted along the horizontal axis and the average profile is plotted along the vertical axis. As can be seen, the descriptive model does not account for as much of the variance as the scaled and shifted average profile. Across all 50 cells, the scaled and shifted profile accounts for approximately 94% of the variance (on average, with a median value of 95%) and the descriptive model accounts for 91% of the variance (with a median value of 92%).
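A minimal sketch of how such a percentage might be computed is given below. It assumes a least-squares scale factor for each contrast and a search over whole-bin shifts of the average PSTH. The function name `percent_variance_accounted`, the `max_shift` parameter, and the definition of total variance (squared deviations from the grand mean) are illustrative assumptions, not the fitting procedure used in this study.

```python
import numpy as np

def percent_variance_accounted(psths, max_shift=20):
    """Percent of variance explained by scaled and shifted copies of the mean PSTH.

    psths: array of shape (n_contrasts, n_bins), one PSTH per contrast.
    max_shift: largest shift (in bins) tried in either direction (assumed value).
    """
    psths = np.asarray(psths, dtype=float)
    template = psths.mean(axis=0)          # average PSTH across contrasts
    ss_res = 0.0
    for row in psths:
        best = np.sum(row ** 2)            # error when the fitted scale is zero
        for shift in range(-max_shift, max_shift + 1):
            shifted = np.roll(template, shift)
            # np.roll wraps around, so blank the bins that wrapped
            if shift > 0:
                shifted[:shift] = 0.0
            elif shift < 0:
                shifted[shift:] = 0.0
            energy = shifted @ shifted
            if energy == 0.0:
                continue
            scale = (row @ shifted) / energy   # least-squares scale factor
            best = min(best, np.sum((row - scale * shifted) ** 2))
        ss_res += best
    ss_tot = np.sum((psths - psths.mean()) ** 2)
    return 100.0 * (1.0 - ss_res / ss_tot)
```

Applied to a set of PSTHs that differ only by a scale factor, this measure returns 100%; noise or contrast-dependent changes in shape lower it.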
It is undoubtedly possible to develop a somewhat more accurate descriptive model using a function for the temporal response profile that has more parameters. However, the invariant response model provides a fairly accurate description of each neuron's response as a function of time and contrast. Descriptive models can be very useful in many different applications (see appendix). For example, descriptive models can be used for simulating the responses of populations of cortical neurons, and for implementing “neuron sampling” models of visual performance (Geisler and Albrecht 1997).
Functional model: contrast-gain exponent model
Some years ago we (Albrecht and Geisler 1991; Geisler and Albrecht 1997) proposed a functional model of cortical neuron responses, the “contrast-gain exponent model,” that is quite similar to the half-squaring normalization model proposed by Heeger (1991, 1992a,b). The contrast-gain exponent model consists of four functional components: 1) a linear filter that determines the basic tuning characteristics of the neuron in space and time, 2) a contrast-set gain control mechanism (a divisive normalization mechanism) that reduces the response gain of the neuron as contrast increases, 3) an expansive response exponent (with a value greater than 1.0) that sharpens the neuron's tuning in space and time, and 4) a multiplicative noise source that causes the variance of the response to be proportional to the mean response. Heeger and we found that this class of models accounts for a wide range of phenomena that have been observed in the responses of striate cortex neurons and that it often successfully predicts the results of novel experiments.
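The four components listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published implementation: the linear filter is represented only by its output to a unit-contrast stimulus (`unit_drive`), the gain control divides by (c50^n + c^n)^(1/n), contrast is expressed as a fraction, and the noise is Gaussian with variance equal to `variance_ratio` times the mean. All function names, parameter names, and default values are hypothetical.

```python
import numpy as np

def mean_response(contrast, unit_drive, c50=0.3, n=2.4, r_max=80.0):
    """Mean firing rate of a sketched contrast-gain exponent model.

    contrast:   stimulus contrast as a fraction (0 to 1).
    unit_drive: linear filter output to the same stimulus at unit contrast.
    """
    unit_drive = np.asarray(unit_drive, dtype=float)
    linear = contrast * unit_drive                                 # 1) linear filter
    normalized = linear / (c50 ** n + contrast ** n) ** (1.0 / n)  # 2) divisive gain control
    return r_max * np.maximum(normalized, 0.0) ** n                # 3) rectify and expand

def noisy_response(mean_rate, variance_ratio=1.5, rng=None):
    """4) Add noise whose variance is proportional to the mean (Gaussian by assumption)."""
    rng = np.random.default_rng() if rng is None else rng
    mean_rate = np.asarray(mean_rate, dtype=float)
    sample = rng.normal(mean_rate, np.sqrt(variance_ratio * mean_rate))
    return np.maximum(sample, 0.0)   # firing rates cannot be negative
```

With this form, a stimulus that drives the filter maximally (`unit_drive` of 1) at a contrast equal to `c50` evokes half of `r_max`, and the ratio of responses to two different stimuli is the same at every contrast.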
The results of the experiments reported here are generally consistent with the predictions of the contrast-gain exponent model and the half-squaring normalization model. According to these models, the contrast gain control mechanism is responsible for the saturation and the scaling in the contrast response function as well as the maintenance of stimulus selectivity. To be effective, this kind of contrast gain control would need to operate quickly; for simplicity, it was assumed to operate instantaneously. The results reported here suggest that this may be a reasonable assumption within the context of a functional model even though it cannot truly be instantaneous at a biophysical level.
The contrast-gain exponent and normalization models also predict that the temporal response profiles to transient stimulation should be invariant with contrast. This prediction follows directly from the assumptions that the basic tuning characteristics are determined by a linear filter and that the contrast gain control and expansive response exponent operate instantaneously relative to the temporal linear filter. This prediction is consistent with the results reported here, although in some neurons, there are systematic changes in the shape of the profile with contrast (e.g., Figs. 6E and 18A). Neither the contrast-gain exponent model nor the half-squaring normalization model contains mechanisms that can account for the systematic changes in response latency observed as a function of contrast or for the systematic changes in the temporal frequency tuning of cortical cells measured with steady-state drifting gratings (Albrecht 1995; Hawken et al. 1992; Holub and Morton-Gibson 1981). The structural model described in the following section is an elaboration of the half-squaring normalization and contrast-gain exponent models.
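The invariance prediction can be made explicit with a short worked equation. The sketch below assumes a stationary grating of contrast c switched on at t = 0, a linear filter whose output is c h(t), a gain control that divides by (c_50^n + c^n)^(1/n), and half-rectification (written with a subscript +) before the exponent n. The symbol h(t) and this particular form of the divisor are assumptions chosen for illustration.

```latex
\begin{aligned}
r(t, c) &= r_{\max} \left[ \frac{c\,h(t)}{\left(c_{50}^{\,n} + c^{\,n}\right)^{1/n}} \right]_{+}^{\,n} \\
        &= r_{\max}\;
           \underbrace{\frac{c^{\,n}}{c_{50}^{\,n} + c^{\,n}}}_{\text{depends only on } c}\;
           \underbrace{\bigl[\,h(t)\,\bigr]_{+}^{\,n}}_{\text{depends only on } t}
\end{aligned}
```

Because the response factors into a contrast term and a time term, changing the contrast rescales the temporal profile without changing its shape. No latency shift appears in this expression, in line with the limitation noted above.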
Structural model: biophysical normalization model
It is useful to consider how contrast gain control and response expansion are implemented within the framework of one rather comprehensive structural model that appears to be consistent with the rapid timing constraints described in this report as well as many other phenomena described in this and other reports: the “biophysical normalization model” of Carandini and Heeger (1994). This elegant model has recently been thoroughly described and evaluated by Carandini et al. (1999).
Within the framework of the normalization model, consider first the proposed biophysical mechanism for contrast-set gain control: it is a cooperative cortical network that is implemented through lateral input from a local pool of surrounding neurons, the normalization pool. The lateral input from the normalization pool is applied at the level of the cell membrane through the well-established neural mechanism of shunting inhibition (Coombs et al. 1955; Fatt and Katz 1953). Specifically, the equilibrium potential of the shunting conductance is approximately equal to the resting potential, thus as the contrast increases, the responses from the normalization pool increase, and the magnitude of the shunting conductance increases. As a result, the voltage potential across the cell membrane is scaled (or normalized) by the local luminance contrast. Consider next the proposed biophysical mechanism for the response expansion: This nonlinearity is implemented through an expansive transduction from the voltage across the cell membrane to action potentials.
The biophysical structural mechanisms of the normalization model appear to be consistent with many of the measurements described within this report, including our finding that contrast-set gain control and response expansion have a rapid onset. However, within the context of the normalization model, it is somewhat surprising that the temporal response profile (to a 200-ms temporal transient pulse) did not show obvious variations with contrast, given that, in principle, as contrast increases and the shunting inhibition increases, the temporal frequency tuning should shift to higher frequencies. This shift in the temporal dynamics should be reflected in the response to a temporal transient (see the next section). Finally, note that shunting inhibition (untuned for orientation) was not observed in intracellular recordings (Anderson et al. 2000a) and that membrane time constants may not have the dynamic range required for the normalization model's explanations of the temporal effects of contrast. (See Chance et al. 1998; Kayser et al. 2001 for a discussion of these issues and for alternative proposals.)
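The link between shunting inhibition, gain, and temporal tuning can be illustrated with a single-compartment passive membrane. This is a textbook simplification, not the full normalization model: C is the membrane capacitance, g_L the leak conductance, g_s(c) a shunting conductance that grows with contrast and whose equilibrium potential is taken to equal the resting potential, and I(t) the synaptic drive. All of these symbols are assumptions introduced here for illustration.

```latex
\begin{aligned}
C \frac{dv}{dt} &= -\bigl(g_{L} + g_{s}(c)\bigr)\, v + I(t),
  \qquad v = V - E_{\text{rest}} \\[4pt]
v_{\infty} &= \frac{I}{g_{L} + g_{s}(c)}, \qquad
\tau_{m}(c) = \frac{C}{g_{L} + g_{s}(c)}, \qquad
f_{\text{corner}}(c) = \frac{1}{2\pi\,\tau_{m}(c)}
\end{aligned}
```

In this simplified picture, an increase in g_s(c) divides the steady-state voltage (the gain control) and shortens the membrane time constant by the same factor, which raises the low-pass corner frequency. That is why a contrast-dependent change in the temporal profile would be expected.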
Temporal dynamics: transient versus steady-state measurements
Quantitative measurements of both the amplitude and the phase of the temporal frequency transfer function of V1 neurons have been made using drifting gratings, which were presented for relatively long durations (e.g., many seconds) to approximate a steady-state condition (e.g., Albrecht 1995; Hamilton et al. 1989). There is a growing body of evidence to indicate that the temporal dynamics of the neurons under steady-state conditions may be different from (possibly even qualitatively different from) the temporal dynamics under transient conditions. Consider several apparent differences. First, Tolhurst et al. (1980) found that the temporal response profile for transient stationary stimuli was not linearly related to the temporal frequency tuning measured using steady-state stimuli: the responses to transient stationary stimuli decay more rapidly than expected from the steady-state temporal frequency transfer function. Second, the steady-state temporal frequency tuning changes substantially as the contrast increases (e.g., Albrecht 1995; Hawken et al. 1992; Holub and Morton-Gibson 1981): the temporal frequency transfer function does not simply change by a scale factor. In comparison, the measurements reported here indicate that the temporal response profile for transient stationary gratings is relatively invariant as contrast increases: the temporal response profile appears to change simply by a scale factor. Third, using steady-state stimuli it has been demonstrated that the variability of cortical neurons is approximately proportional to the mean firing rate (e.g., Geisler and Albrecht 1997; Softky and Koch 1993; Tolhurst et al. 1983). In comparison, Müller et al. (2001) have reported that this relationship does not hold for the initial transient response to stationary gratings.
Finally, temporal responses to drifting stimuli appear to be different from temporal responses to stationary stimuli, even if they are both presented in a transient fashion. Consider several apparent differences. First, Müller et al. (2001) have reported that the responses to stationary gratings are more transient than the responses to drifting gratings. Second, Müller et al. (2001) reported that both detectability and discriminability are better for moving gratings as opposed to stationary gratings. Third, we have measured and compared the responses of cortical cells to drifting and stationary gratings, both of which were presented for a transient 200-ms duration (Frazor 2002; Frazor et al. 1997, 1998). What we found was that 1) in agreement with Müller et al., the responses to transient drifting gratings are much more sustained (over the 200-ms interval) than the responses to transient stationary gratings. 2) Transient stationary gratings generally produce large off-responses (sometimes as large as, or even larger than, the on-responses), whereas the responses to transient drifting gratings generally do not. 3) Responses to transient stationary gratings often evoke complex secondary oscillations that are not generally seen in the responses to transient drifting gratings. Some of these differences between drifting and stationary gratings might be explained by fast local adaptation mechanisms (in the retina, LGN, and cortex) that are induced by stationary stimuli (Müller et al. 2001; see also Shapley and Enroth-Cugell 1984) and latency differences between various subpopulations of inputs (Sherman et al. 1984).
These unexplained differences between steady-state versus transient measurements and drifting versus stationary measurements indicate that it is important to make measurements using transient stationary stimuli (as well as drifting steady-state stimuli). Further, measuring responses with transient stationary stimuli is important because such stimulus conditions are more similar to what occurs during natural viewing of stationary stimuli. Finally, these unexplained differences among the various measurements indicate that much remains to be learned about the temporal dynamics of cortical neurons.
We thank the anonymous reviewers whose efforts greatly improved this report.
This research was supported by the National Eye Institute (EY-02688) and the University of Texas.
Address for reprint requests: D. G. Albrecht (E-mail:).
1 Given the inherent variability within the responses of cortical cells and the finite number of repeated presentations in the stimulus protocol, it is possible that some fraction of the residual variation is a consequence of stochastic variation and not a consequence of any systematic variation in the PSTHs (at the different spatial phases). To determine what value of the residual variation would not be expected from chance alone (with a 99% level of confidence), we performed the following analysis: (1) We measured the mean-variance relationship for a given cell. (2) We took the average PSTH (the “original” average PSTH) and scaled the amplitude of the average for a given spatial position. (3) We introduced random variation around the scaled average PSTH, given the mean-variance relationship for the specific cell and the number of repeated presentations. (4) We repeated the second and third steps for each of the eight positions. (5) We then fitted the “new” randomized values as described in the text (i.e., we determined eight scale factors for the “new” average PSTH). (6) We computed the percentage of variation accounted for (by the new average) and the percentage of residual variation. Steps 2 through 6 were repeated on 10,000 occasions to obtain the sampling distribution for the estimate of the average residual variation that would be expected based on the specific properties for each cell, under the null hypothesis that there was no systematic variation in the shape of the PSTH at the different spatial positions. In this exercise, we know that there is no systematic variation in the shapes of the PSTH at the different spatial phases, and therefore any residual variation should be what one would expect from chance alone (i.e., stochastic variation). The sampling distribution was evaluated in order to determine how large the percentage of the residual variation would have to be to occur with a probability of <0.01; that is, a 99% level of confidence.
In other words, one can conclude that the residual variation is systematic, with a 99% level of confidence.
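The randomization procedure in this footnote can be sketched as follows. This is a minimal illustration under several assumptions that are not taken from the study: the mean-variance relationship is modeled as variance = `var_ratio` × mean, the trial-averaged noise is Gaussian, and residual variation is expressed relative to the total sum of squares about the grand mean. The function name and all parameter names are hypothetical.

```python
import numpy as np

def null_residual_threshold(avg_psth, scales, n_repeats,
                            var_ratio=1.5, n_sim=10_000, seed=0):
    """99th percentile of the residual variation (%) expected from chance alone.

    avg_psth:  the "original" average PSTH (non-negative rates), shape (n_bins,).
    scales:    one amplitude scale factor per spatial position (e.g., eight).
    n_repeats: number of repeated presentations averaged into each PSTH.
    """
    rng = np.random.default_rng(seed)
    avg_psth = np.asarray(avg_psth, dtype=float)
    scales = np.asarray(scales, dtype=float)
    means = scales[:, None] * avg_psth[None, :]    # steps 2 and 4: scaled averages
    sd = np.sqrt(var_ratio * means / n_repeats)    # step 1: assumed mean-variance law
    residuals = np.empty(n_sim)
    for i in range(n_sim):
        sim = rng.normal(means, sd)                # step 3: add random variation
        new_avg = sim.mean(axis=0)                 # the "new" average PSTH
        fit_scales = sim @ new_avg / (new_avg @ new_avg)   # step 5: refit the scales
        fit = fit_scales[:, None] * new_avg[None, :]
        ss_res = np.sum((sim - fit) ** 2)
        ss_tot = np.sum((sim - sim.mean()) ** 2)
        residuals[i] = 100.0 * ss_res / ss_tot     # step 6: percent residual variation
    return np.percentile(residuals, 99)
```

A measured residual variation larger than the returned threshold would be unlikely (probability below 0.01) under the null hypothesis of no systematic change in PSTH shape.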
2 If the response is linearly related to the contrast, then a simple two-parameter linear function provides a good description of the contrast response function. Nonetheless, the Naka-Rushton equation will also provide a good fit; however, the half-saturation parameter and the response maximum parameter may not provide an accurate estimate of the contrast that evokes half of the maximum response or the response at the maximum contrast, respectively. Therefore, to estimate these two statistics as accurately as possible, one can determine their values by reading them off from the parameter-optimized Naka-Rushton equation. For a related discussion of this issue, see Müller et al. (2001).
Copyright © 2002 The American Physiological Society
Invariant response descriptive model
Figure 17, A–C, graphically illustrates the three-component descriptive functions of the invariant response model. First, consider quantifying the shape of the PSTH (Fig. 17C). For most cells, a simple Gaussian function provides a good description of the initial transient rise and fall of the PSTH (as noted by Müller et al. 2001). However, as shown in Figs. 1-6, in addition to the transient portion of the response, many cells have a sustained plateau that is not captured by a simple Gaussian. This sustained portion can be quantified by incorporating an additional “½ Gaussian.” The second ½ Gaussian is added to the whole Gaussian after the temporal response reaches the point of maximum

[Equation A1]

Equation (A1) describes the relative response as a function of time, $r_t$, where $\sigma_a$ is the Gaussian half-bandwidth, $\sigma_b$ is the ½ Gaussian half-bandwidth, $\alpha$ gives the relative magnitude of the Gaussian to ½ Gaussian, $t$ is time, and $\tau_c(c)$ is the latency to the peak of the response (which may depend on the contrast). Although this descriptive function (which we term a “1½ Gaussian”) will not incorporate all of the diversity, from cell to cell, illustrated in Figs. 4 and 5 (e.g., those cells with secondary oscillations, etc.), Fig. 19 shows that it provides a good description for most cells. It would, of course, be possible to incorporate additional components to describe the more unusual characteristics of some cells (e.g., the secondary oscillations); however, this would increase the number of free parameters and the complexity of the overall resulting descriptive function.
Second, consider quantifying the shape of the contrast response function (Fig. 17B). For most cells, the relative response as a function of contrast, $r_c$, can be described using the Naka-Rushton equation

$$r_c(c) = \frac{c^{n}}{c^{n} + c_{50}^{n}} \tag{A2}$$

where $n$ is the response exponent, $c_{50}$ is the half-saturation contrast, and $c$ is contrast. Many studies have shown that this function provides a good fit to the contrast response function of striate cortex neurons (e.g., Albrecht and Hamilton 1982; DeAngelis et al. 1993; Geisler and Albrecht 1997; McLean and Palmer 1996; Sclar et al. 1990; Tolhurst and Heeger 1997).
Third, consider quantifying the latency to the peak of the response as a function of contrast (Fig. 17A). For most cells the latency can be described using an inverted Naka-Rushton equation

$$\tau_c(c) = \tau_{\max} - \tau_{\text{shift}}\,\frac{c^{\varepsilon}}{c^{\varepsilon} + s_{50}^{\varepsilon}} \tag{A3}$$

where $\tau_{\max}$ is the maximum latency to the peak of the response at the lowest contrast, $\tau_{\text{shift}}$ is the maximum possible shift (decrease) in latency, $\varepsilon$ is the latency shift exponent, $s_{50}$ is the latency shift half-saturation, and $c$ is contrast. It has been demonstrated that this function provides a good fit to the contrast-induced latency shifts (Albrecht 1995) (see also Fig. 14 in this study). Substituting Eq. A3 into Eq. A1 and then multiplying by Eq. A2 expresses the relative response as a function of time and contrast; this is graphically illustrated in Fig. 17D. Finally, the maximum firing rate, $r_{\max}$, and the spontaneous discharge, $r_0$, can be incorporated to express the absolute response, $r$, as a function of time and contrast

$$r(t, c) = r_{\max}\, r_c(c)\, r_t(t) + r_0 \tag{A4}$$

The mean ± SD and median of the parameters for the invariant response descriptive model, for the cells in this study, are given as follows: $\sigma_a$ = 19.0 ± 1.91, 13.6; $\sigma_b$ = 761 ± 76.5, 543; $\alpha$ = 0.27 ± 0.03, 0.23; $n$ = 2.4 ± 0.18, 2.2; $c_{50}$ = 38.7 ± 3.51, 32.3; $\tau_{\max}$ = 121 ± 4.53, 114; $\tau_{\text{shift}}$ = 65.3 ± 3.48, 61.2; $\varepsilon$ = 1.80 ± 0.28, 1.18; $s_{50}$ = 24.6 ± 3.27, 23.1; $r_{\max}$ = 81.8 ± 12.2, 50.9.
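As a rough illustration of how the components of the invariant response model combine, the sketch below evaluates the model with the mean parameter values listed above. Several points are assumptions rather than statements from this appendix: the exact parameterization of the 1½ Gaussian (Eq. A1) is not reproduced in this text, so the form used here (a unit Gaussian before the peak, then a weighted sum of the narrow and broad Gaussians with weights 1 − α and α) is a guess chosen to be continuous at the peak; times are taken to be in milliseconds and contrast in percent; and the spontaneous rate defaults to zero because no value is listed. All function names are hypothetical.

```python
import numpy as np

# Mean parameter values from the appendix (units assumed: ms and % contrast).
P = dict(sigma_a=19.0, sigma_b=761.0, alpha=0.27, n=2.4, c50=38.7,
         tau_max=121.0, tau_shift=65.3, eps=1.80, s50=24.6, r_max=81.8)

def naka_rushton(c, exponent, half_sat):
    """Relative response as a function of contrast (Naka-Rushton form)."""
    c = np.asarray(c, dtype=float)
    return c ** exponent / (c ** exponent + half_sat ** exponent)

def peak_latency(c, p):
    """Latency to the response peak: an inverted Naka-Rushton function of contrast."""
    return p["tau_max"] - p["tau_shift"] * naka_rushton(c, p["eps"], p["s50"])

def temporal_profile(t, tau, p):
    """ASSUMED 1 1/2 Gaussian: whole Gaussian up to the peak, mixture afterwards."""
    t = np.asarray(t, dtype=float)
    whole = np.exp(-0.5 * ((t - tau) / p["sigma_a"]) ** 2)
    half = np.exp(-0.5 * ((t - tau) / p["sigma_b"]) ** 2)
    return np.where(t <= tau, whole, (1 - p["alpha"]) * whole + p["alpha"] * half)

def invariant_response(t, c, p, r0=0.0):
    """Absolute response versus time t (array or scalar) at a single contrast c."""
    tau = peak_latency(c, p)   # the temporal profile is shifted by the contrast
    gain = naka_rushton(c, p["n"], p["c50"])   # and scaled by the contrast
    return p["r_max"] * gain * temporal_profile(t, tau, p) + r0
```

For example, `invariant_response(np.arange(0, 300, 2.0), 50.0, P)` returns a simulated PSTH in 2-ms bins for a 50% contrast grating under these assumptions.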
We note that descriptive models, such as the one described here, have proven to be useful in a variety of different applications, including 1) summarizing descriptive statistics, in a unified fashion, across a sample of cells; 2) testing various competing hypotheses; 3) assessing the performance of neurons in tasks such as discrimination or identification; 4) performing randomization tests of a particular null hypothesis; and 5) developing functional models, in concert with descriptive functions for other dimensions, to predict the responses of single neurons or populations of neurons under a wide variety of stimulus situations.