Studies on processing in primary visual areas often use artificial stimuli such as bars or gratings. As a result, little is known about the properties of activity patterns for the natural stimuli processed by the visual system on a daily basis. Furthermore, in the cat, a well-studied model system for visual processing, most results are obtained from anesthetized subjects and little is known about neuronal activations in the alert animal. Addressing these issues, we measure local field potentials (LFP) and multiunit spikes in the primary visual cortex of awake cats. We compare changes in the LFP power spectra and multiunit firing rates for natural movies, movies with modified spatiotemporal correlations, and gratings. The activity patterns elicited by drifting gratings are qualitatively and quantitatively different from those elicited by natural stimuli, and this difference arises from both the spatial and the temporal properties of the stimuli. Furthermore, both local field potentials and multiunit firing rates are most sensitive to the second-order statistics of the stimuli and not to their higher-order properties. Finally, responses to natural movies show a large variability over time because of activity fluctuations induced by rapid stimulus motion. We show that these fluctuations are not dependent on the detailed spatial properties of the stimuli but depend on their temporal jitter. These fluctuations are important characteristics of visual activity under natural conditions and impose limitations on the readout of possible differences in mean activity levels.
How are sensory stimuli represented and processed in cortical networks? Which stimulus properties determine the neural activity at a given processing stage and which properties of an activity pattern are relevant for the representation of the stimulus? These questions have been central to neuroscience research for a long time. However, we do not have a conclusive answer at hand. In cat visual cortex, one of the best-studied sensory systems, most of our knowledge is based on the activity of single neurons recorded in anesthetized subjects stimulated with artificial stimuli. This experimental paradigm, however, has several limitations.
First, most experiments recording in cat primary visual cortex (V1) use artificial stimuli such as bars, gratings, or texture fields. Although these are mathematically well defined and easy to generate, they contrast with the natural scenes encountered by a mammalian visual system on a daily basis. Natural stimuli are very complex and their statistical structure differs markedly from that of the stimuli used in most experiments. It is presently unclear whether results obtained using artificial stimuli can be extrapolated to the processing of natural scenes. Often, the stimuli are stationary or smoothly varying and their parameters, such as the orientation of a grating, are matched to the preferred properties of the neuron recorded. The response properties are then inferred from the steady-state responses to these stimuli. Under natural conditions, however, stimuli are not optimal for most neurons and the visual system cannot wait for a steady-state response, but must use whatever activity pattern the stimuli elicit to build a representation of the visual scene.
Second, the anesthesia used in many experiments makes electrophysiological recording convenient and prolongs the time available for recording. However, anesthetics can have profound influences on the response properties (Lamme et al. 1998; Lee 1970; Robertson 1965), and especially in the cat visual system little is known about activity in awake animals.
Third, most experiments record spikes from isolated single cells, usually one or a few cells at a time. For statistical analysis, data are not averaged across a large population of neurons, but over repeated trials with identical stimulation. This is in contrast to the situation of a single neuron: it receives input from a large population and does not average over repeated trials, but enacts its input–output transformation continuously. The statistical properties of the activity in a population of neurons might, however, differ from those of the particular single unit under investigation.
The goal of the present study is to advance our understanding of the processing of natural scenes in awake animals. To do so, we record population activity in the primary visual cortex of alert cats viewing natural movies. More specifically, we measure stimulus-induced changes in the power spectrum of local field potentials and firing rates of multiunit spikes. Whereas the multiunit spikes reflect the activity of a small number of nearby neurons, the local field potential includes both pre- and postsynaptic potentials around the electrode tip. The natural movies used as stimuli were captured by a camera mounted on the head of a freely moving cat exploring different environments. These movies resemble the world as seen from a cat's perspective and are a good approximation to the animal's natural visual input during everyday vision. To quantify the impact of different stimulus properties on the activity, we compare responses to modified stimuli with altered statistical properties and altered motion. Finally, we quantify the temporal fluctuations imposed by the natural movies onto the activity pattern.
Surgical and recording procedures
Recordings were performed in 4 adult female cats. In each animal a microdrive was implanted under aseptic conditions. The animals were initially anesthetized using ketamine hydrochloride (20 mg/kg; Narketan, Chassot, Bern, Switzerland) and xylazine hydrochloride (1.1 mg/kg; Rompun, Bayer, Leverkusen, Germany). They were intubated, ventilated (30% O2–70% N2O), and continuously anesthetized with isoflurane (0.4–1.5%). Body temperature, end-expiratory CO2, and blood oxygenation were continuously monitored and kept in the desired physiological range (37–38°C, 3–4%, 90–100%, respectively). The animals were infused with Ringer's lactate solution (40 ml/h) and received intramuscular injections of steroids and analgesics. Titanium screws (7–8) were fixed in the skull to later hold the implant. Two small holes were drilled and reference and ground electrodes were placed between dura and bone. Two small craniotomies (roughly 4 mm diameter) were made over areas 17/18 and 21 of one hemisphere according to stereotaxic coordinates (AP: –3, L: +2 and +8, respectively). After removing the dura the microdrive was positioned and fixed to the skull and the screws using dental acrylic (Stoelting Physiology). The cavity was filled with silicone oil. Nuts, later used to restrain the animal in the recording setup, were inserted into the implant and fixed in the acrylic. Recording sessions began only after the animal had fully recovered, usually after 4 days.
Each microdrive contained 4 movable electrodes (500–1,000 kΩ impedance), 2 of which were placed over the primary visual area 18 and 2 over area 21. The present project concentrates on the analysis of neuronal activity in primary visual cortex only. Signals were first passed through a 24-channel preamplifier (Neurotrack, 10× amplification) and then amplified and digitized using a Synamps system (Neuroscan, El Paso, TX) at a sampling rate of 20 kHz.
Recordings were made at sites of different depths. The depth was estimated from the reading of the micromanipulator and the first and last site of a penetration where any visual stimulus evoked significant activity. The activity measures used in this project (i.e., spectrograms of local field potential power) yielded qualitatively similar results at recording sites of different depths. Furthermore, there was no qualitative change in the power spectrum of the local field potential between supra and infra granular sites (see also results). Therefore most results were averaged across all recording sites within one subject.
For recordings the animals were placed in a sleeve equipped with adjustable Velcro fasteners. This served to restrain the animal while keeping it in a comfortable position. The sleeve was placed in an acrylic tube, allowing stable and accurate positioning of the animal in front of the monitor and within a Faraday cage. To ensure stable visual stimulation, the head of the animal was fixed using screws inserted into the chronic implant holding the microdrive. Each recording session lasted roughly 15 min and each animal performed one or two sessions a day. We regularly checked the state of alertness of the subject either by direct visual inspection or using an infrared camera system. All procedures were in accordance with the national guidelines for use of experimental animals and conformed to the National Institutes of Health and Society for Neuroscience (U.S.) regulations.
To investigate activity patterns for natural time-varying stimuli we used a set of movies closely resembling the visual input to the cat's eye under natural conditions. To further determine the effect of specific properties of natural movies, such as motion or higher-order statistics, on the activity, we used a set of modified movies.
Natural scenes differ from classical lab stimuli in both their spatial and temporal properties. A uniformly drifting sine wave grating, for example, is characterized by a single spatial and a single temporal frequency. Time-varying natural scenes, on the other hand, contain a wide range of spatial and temporal frequencies. The contrast of the different frequencies can be computed from the amplitudes of the Fourier spectrum of the stimulus. Natural scenes have a characteristic amplitude distribution of both spatial and temporal frequencies. The properties determined by the amplitudes of the Fourier spectrum are also known as the second-order structure, or second-order correlations. The phases of the Fourier spectrum characterize the alignment of the different frequencies and determine the higher-order structure inherent to natural scenes. Artificial images constructed from the amplitude spectrum of a natural image but with random phases of the different frequencies have a quite different appearance compared with the original image from which the amplitude spectrum was taken. Such images, known as pink noise, have a foggy appearance, lacking any visible object because of their random higher-order statistics.
We applied phase randomizations to natural movies in both the spatial and temporal domain simultaneously. The obtained stimuli have the same spatiotemporal power spectrum and thus the same spatial and temporal frequency distribution but lack the higher-order structure of the original natural movies.
The same principle, using an original stimulus and one with altered higher-order structure, was also applied to natural movies filtered with Gabor wavelets. These stimuli, which are based on a reconstruction of a natural movie from a wavelet representation, have a reduced content of spatial frequencies. The corresponding manipulated movies have the property that the alignment of different wavelets defining the stimuli is changed. In this way the local contrast edges defined by the Gabor wavelets are left unchanged but their global alignment is randomized.
The following gives a detailed presentation of the stimuli used in the present study (Fig. 2, left panel).
1) Square and sine wave gratings (Fig. 2A). Gratings were oriented either horizontally or vertically. The parameters of the sine wave grating (spatial frequency: 0.2 cycles/deg; temporal frequency: 4 Hz) were chosen to elicit strong responses in the recorded local field potentials. These parameters match the tuning properties of single units in area 18 (Movshon et al. 1978). The properties of the square-wave grating (1.2-deg width of a bar; drifting at 6.6°/s) were adapted to the statistical properties of the natural stimuli.
2) Natural movies (Fig. 2B). These were recorded from a camera mounted to a cat's head while the animal was exploring different local environments such as forests and meadows (for details see Kayser et al. 2003). Thus these videos incorporate the specific body and head movements of a cat. However, they do not include the animal's eye movements. Given that cats move their eyes infrequently and more slowly than do primates (Crommelinck and Roucoux 1976; Evinger and Fuchs 1978; Möller et al. 2002), this does not prevent these movies from closely reproducing the spatial and temporal structure of a natural visual stimulus.
3) Pink pixel noise (Fig. 2C). For each natural movie we created a stimulus with the same second-order distribution of spatial and temporal frequencies but random higher-order correlations. This was done by computing the space–time Fourier transform over all movie frames and replacing the phase at each frequency by a random value between 0 and 2π. The inverse Fourier transform was applied to obtain the new stimulus. In total, 3 pixel noise stimuli were used, each constructed from one of the 3 natural movies.
4) Wavelet-filtered movies (Fig. 2D). These stimuli were constructed from a natural movie by applying a set of Gabor filters. The set of filters contained 6 equally spaced orientations, 3 spatial frequencies (0.6, 1.25, 2.4 cycles/deg), and a bandwidth of 1.1 octaves. For each frame of a video the amplitudes of all filters were computed at the critical spatial resolution to allow reconstruction of the original frame (Mallat 1989). Then each frame was reconstructed from these amplitudes. Applying these Gabor filters effectively corresponds to a band-pass filter in the spatial domain. As before, our stimulus set consisted of 3 wavelet-filtered movies, each obtained from one of the natural movies.
5) Pink wavelet noise (Fig. 2E). Similarly to the wavelet-filtered movie, this stimulus was reconstructed from the wavelet amplitudes obtained from the natural movies. However, before reconstruction, the relative alignment of different wavelets in space and time was altered by eliminating their higher-order correlations. The wavelet amplitudes computed from a natural movie form a 3D (two spatial and one temporal) matrix. In total there are 6 (orientations) × 3 (frequencies) wavelets and thus 18 such matrices for a movie. For each of these matrices the space–time Fourier transform was computed and the phases at each frequency were replaced by random numbers equally distributed between 0 and 2π. The matrices were then transformed back. The wavelet noise stimulus was obtained by reconstruction from these manipulated wavelet amplitudes.
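The phase randomization used for stimulus types 3 and 5 can be illustrated with a minimal sketch. It is not the code used in the study: it works on a short one-dimensional signal instead of a three-dimensional space–time volume, it uses a naive discrete Fourier transform from the Python standard library, and the function names (`dft`, `idft`, `phase_randomize`) are hypothetical. The sketch also assumes that the random phases are assigned in conjugate-symmetric pairs so that the inverse transform is real valued, a detail the text does not spell out.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform, O(n^2); adequate for a short demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def phase_randomize(signal, rng):
    """Keep the amplitude spectrum, draw new phases uniformly from [0, 2*pi)."""
    n = len(signal)
    spectrum = dft(signal)
    out = list(spectrum)  # the DC bin (k = 0) is kept unchanged
    for k in range(1, (n + 1) // 2):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        out[k] = abs(spectrum[k]) * cmath.exp(1j * phase)
        # Assumption: mirror each bin so the result is a real signal.
        out[n - k] = out[k].conjugate()
    # For even n the Nyquist bin (k = n / 2) is left unchanged.
    return [value.real for value in idft(out)]


rng = random.Random(1)
original = [math.sin(0.7 * t) + 0.3 * rng.gauss(0.0, 1.0) for t in range(32)]
surrogate = phase_randomize(original, rng)
```

The surrogate has the same amplitude (second-order) spectrum as the original but a different waveform. For a movie, the same idea would be applied to the three-dimensional transform over both spatial axes and time.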
Stimuli were presented in a block design. In each session one of 3 possible blocks was chosen randomly. Each block contained all 5 stimulus types listed above: the sinusoidal and the square-wave grating of either horizontal or vertical orientation (chosen randomly), i.e., 2 stimuli of type 1; all 3 clips of natural movies, i.e., 3 stimuli of type 2; and one modification of each kind, each based on one natural movie, i.e., one stimulus each of types 3, 4, and 5. This resulted in a total of 8 stimuli within a block. Each stimulus lasted 2 s. The stimuli were separated by a uniform screen (blank) having the same mean luminance as the stimuli, and also lasting 2 s. Each block was repeated 30 times within one recording session. Stimuli were presented full screen on a 19-in. Hitachi CRT monitor (120 Hz refresh rate) 50 cm in front of the animal, thus covering 40 × 30° of visual angle. The room was otherwise darkened. The color lookup table was manipulated to obtain a linear transformation between pixel intensity and luminance on the monitor. This was verified with a photometer (J1800 Luma Color; Tektronix, Wilsonville, OR) under monitor settings with 30 cd/m2 mean luminance. For recordings, we used a somewhat lower mean luminance value and a monitor radiation shield (3M, Switzerland), resulting in a mean luminance of 8 cd/m2. Note, however, that the effective gamma value is of no great importance, given that the different stimuli have a similar distribution of pixel intensities. The stimulus presentation was controlled by a Macintosh computer (Apple, Cupertino, CA) running custom-written MATLAB (Mathworks, Natick, MA) code based on the psychophysics toolbox extensions (Brainard 1997; Pelli 1997).
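As a quick consistency check on these numbers, the following sketch works out the session length and the approximate pixel scale. It assumes that every stimulus is followed by exactly one 2-s blank and that the 640 × 480 pixel frames map uniformly onto the 40 × 30° screen, which ignores the tangent distortion of a flat monitor. The function names are hypothetical.

```python
def session_duration_s(n_stimuli=8, stimulus_s=2.0, blank_s=2.0, repeats=30):
    """Total presentation time, assuming one blank after every stimulus."""
    return n_stimuli * (stimulus_s + blank_s) * repeats


def pixels_per_degree(width_px=640, width_deg=40.0):
    """Approximate pixel scale, assuming a uniform pixel-to-angle mapping."""
    return width_px / width_deg


duration_min = session_duration_s() / 60.0   # 8 * 4 s * 30 = 960 s = 16 min
scale = pixels_per_degree()                  # 640 px / 40 deg = 16 px per deg
print(f"session: {duration_min:.0f} min, scale: {scale:.0f} px/deg")
```

Under these assumptions a session takes 16 min, in line with the session length of roughly 15 min stated above, and one degree of visual angle corresponds to about 16 pixels.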
Motion-altered stimuli. In a second series of recordings we used stimuli with altered motion properties. The flow field describing the motion in a stimulus movie was estimated as described below. This flow field was decomposed into 2 components: first, a linear drift, obtained as the linear component of the flow field; second, the residual after the linear drift was removed from the flow field, termed jitter. These two components of the flow field were used to generate a set of stimuli. A single frame taken from a natural movie and a grating patch were used as the base images. The motion pattern imposed on the two base images matched either the linear drift or the jitter component of the flow field, thus giving a total of 4 motion-altered stimuli.
All stimuli were encoded as a series of 8-bit gray-scale frames at a resolution of 640 × 480 pixels. The pixel values in each frame were additively scaled so that each frame had the same mean intensity and multiplicatively scaled so that each frame had the same root mean square contrast.
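The frame normalization can be sketched as follows. This is only an illustration: root mean square contrast is taken here as the standard deviation of the pixel values, which is an assumption because the text does not give the exact definition. The function name `normalize_frame` is hypothetical, and clipping to the 8-bit range is left out.

```python
import math


def normalize_frame(pixels, target_mean, target_rms):
    """Shift and scale a flat list of pixel values.

    The additive step sets the mean intensity, the multiplicative step
    sets the RMS contrast (assumed to be the SD of the pixel values).
    """
    n = len(pixels)
    mean = sum(pixels) / n
    rms = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if rms == 0:
        raise ValueError("a uniform frame has no contrast to rescale")
    return [target_mean + (p - mean) * target_rms / rms for p in pixels]


frame = [10.0, 20.0, 30.0, 40.0]
normalized = normalize_frame(frame, target_mean=128.0, target_rms=20.0)
```

Applying the same targets to every frame gives all frames the same mean intensity and the same contrast, whatever their original values.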
Estimation of flow fields
The flow field characterizing the motion was measured on a frame by frame basis using standard techniques (see e.g., Beauchemin and Barron 1995). The optical flow was measured at each point of a grid covering the central 20° of the visual field. A patch of 30 × 30 pixels centered on the grid point was compared with patches at different positions in a range of 70 pixels in each direction in the next video frame. The comparison was based on the mean square difference after removal of the overall mean luminance of each patch. The best match defined the local optical flow. The global motion vector of the video was computed from this locally defined flow field as the arithmetic average.
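The block-matching procedure described above can be sketched in a few lines. This is a toy version on small frames stored as nested lists. The function names are hypothetical, and the caller is assumed to keep every candidate patch inside the frame. In the study, the patches were 30 × 30 pixels and the search range was 70 pixels.

```python
import random


def patch(frame, cy, cx, half):
    """Flattened square patch of side 2 * half + 1 centered on (cy, cx)."""
    return [frame[y][x]
            for y in range(cy - half, cy + half + 1)
            for x in range(cx - half, cx + half + 1)]


def demean(values):
    """Remove the mean luminance of a patch."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def local_flow(frame_a, frame_b, cy, cx, half, search):
    """Shift (dy, dx) that minimizes the mean square patch difference."""
    ref = demean(patch(frame_a, cy, cx, half))
    best, best_err = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            cand = demean(patch(frame_b, cy + dy, cx + dx, half))
            err = sum((r - c) ** 2 for r, c in zip(ref, cand)) / len(ref)
            if err < best_err:
                best, best_err = (dy, dx), err
    return best


def global_motion(flows):
    """Arithmetic average of the local flow vectors."""
    n = len(flows)
    return (sum(f[0] for f in flows) / n, sum(f[1] for f in flows) / n)


# Toy example: the second frame is the first one moved 1 px down, 2 px right.
rng = random.Random(0)
size = 24
first = [[rng.random() for _ in range(size)] for _ in range(size)]
second = [[first[(y - 1) % size][(x - 2) % size] for x in range(size)]
          for y in range(size)]
flow = local_flow(first, second, cy=12, cx=12, half=3, search=4)
```

In this toy example the recovered shift is the imposed displacement of one pixel down and two pixels to the right. Averaging such vectors over all grid points gives the global motion vector.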
Separate analysis was carried out for the local field potentials and multiunit activity, both implemented in MATLAB. The local field potential was extracted by low-pass filtering the recorded signals below 500 Hz. Each recording session was cut into single trials, with a trial consisting of one stimulus plus a part of the blank before and after it. Sometimes the signals were contaminated by movement artifacts. If in any channel the maximal amplitude of the recorded signals exceeded 5 times the SD of the signal, taken over the entire session, the trial was discarded. Usually <5% of the trials had to be discarded. The activity in the local field potential was quantified by computing the time-localized Fourier spectrum (spectrogram) using windows of 160-ms length, overlaid with a Hanning window and zero-padded to 256 ms. The overlap of neighboring windows was 152 ms, leading to a nominal temporal resolution of 8 ms.
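A minimal sketch of such a spectrogram follows. It is not the MATLAB code used in the study. The sketch assumes, purely for illustration, that the field potential has been resampled to 1 kHz, so that the 160-ms window, 8-ms step, and 256-ms padded length correspond to 160, 8, and 256 samples. The function names are hypothetical and a naive Fourier sum replaces a fast transform.

```python
import cmath
import math


def hann(n):
    """Hanning window of n samples."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(segment, nfft):
    """Power at the non-negative frequencies of a zero-padded segment."""
    # Zero padding adds no terms to the sum, so only the segment is summed.
    return [abs(sum(segment[t] * cmath.exp(-2j * math.pi * k * t / nfft)
                    for t in range(len(segment)))) ** 2
            for k in range(nfft // 2 + 1)]


def spectrogram(signal, win_len=160, step=8, nfft=256):
    """Time-localized power spectrum; result is indexed [window][frequency]."""
    window = hann(win_len)
    frames = []
    for start in range(0, len(signal) - win_len + 1, step):
        tapered = [s * w for s, w in zip(signal[start:start + win_len], window)]
        frames.append(power_spectrum(tapered, nfft))
    return frames


FS_HZ = 1000.0  # assumed sampling rate of the resampled signal
```

With these values, neighboring windows share 152 of their 160 samples and frequency bin k lies at k × 1000/256 ≈ 3.9 × k Hz.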
To determine the activity induced by the visual stimulus, changes in the power spectrum locked to the stimulus onset were computed. The power at each frequency was normalized by the SD of the power during 600 ms of the leading blank. This emphasizes changes in the power compared with the blank at each frequency separately. The resulting measure corresponds to a z-score and is a measure of the reliability of neuronal activation by the respective stimulus (de Oliveira et al. 1997; Logothetis et al. 2001). Qualitatively similar results were obtained by computing the percentage change of the signal amplitude at each frequency from before the stimulus to during the stimulus (data not shown).
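The normalization can be sketched as a per-frequency z-score. The text states only the division by the SD of the blank period. Subtracting the mean of the blank, as done below, is an assumption made so that decreases in power come out negative. The function name `baseline_zscore` is hypothetical.

```python
import statistics


def baseline_zscore(spec, n_baseline):
    """Z-score a spectrogram indexed [time][frequency], one frequency at a time.

    The first n_baseline time bins are taken as the leading blank.
    """
    n_freq = len(spec[0])
    out = [[0.0] * n_freq for _ in spec]
    for f in range(n_freq):
        base = [spec[t][f] for t in range(n_baseline)]
        mu = statistics.mean(base)      # assumption: baseline mean is removed
        sd = statistics.pstdev(base)
        if sd == 0:
            raise ValueError("baseline power is constant at frequency bin %d" % f)
        for t in range(len(spec)):
            out[t][f] = (spec[t][f] - mu) / sd
    return out


# Four blank bins followed by one stimulus bin, two frequency bins.
toy = [[1.0, 10.0], [3.0, 14.0], [1.0, 10.0], [3.0, 14.0], [5.0, 12.0]]
z = baseline_zscore(toy, n_baseline=4)
```

With an 8-ms step between spectrogram columns, the 600 ms of blank would span 75 time bins.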
For the analysis of the multiunit activity, the signals were high-pass filtered at 500 Hz. Spikes were identified by applying a threshold of 3 SDs of the signal and stored at a resolution of 1 ms. Visual inspection showed the reliability of this automated measure. Units that did not show a modulation for any stimulus of at least a factor of 2 compared with the blank (i.e., spontaneous activity) were discarded.
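Threshold-based spike detection of this kind can be sketched as follows. The sketch assumes that a spike is an upward crossing of the mean plus 3 SDs of the high-pass filtered signal. The text does not state the polarity or any refractory rule, so both are simplifications, and the function name `detect_spikes` is hypothetical.

```python
import statistics


def detect_spikes(signal, fs_hz, n_sd=3.0):
    """Return spike times in whole milliseconds.

    A spike is counted at each upward crossing of mean + n_sd * SD
    (assumed polarity; no refractory period is enforced).
    """
    threshold = statistics.mean(signal) + n_sd * statistics.pstdev(signal)
    times_ms = []
    for i in range(1, len(signal)):
        if signal[i - 1] < threshold <= signal[i]:
            times_ms.append(int(i * 1000 / fs_hz))   # 1-ms resolution
    return times_ms


# Toy trace sampled at 20 kHz with two large deflections.
trace = [0.0] * 200
trace[40] = 10.0
trace[120] = 10.0
spikes = detect_spikes(trace, fs_hz=20000)
```

In the toy trace the two deflections at samples 40 and 120 are reported at 2 ms and 6 ms.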
The effectiveness of different types of stimuli
A total of 165 sessions were recorded in 4 animals (cat L: 55; cat S: 49; cat F: 42; cat M: 19 sessions). An example of raw data from the local field potential during presentation of a natural movie is shown in Fig. 1A. Evidently, the recorded signal was modulated by the stimulus. The spectrogram resulting from these data is shown in Fig. 1B. After stimulus onset an increase in power at most frequencies is observed. To emphasize the activity elicited by the stimulus compared with the leading blank, each frequency axis was normalized by its SD during the blank (Fig. 1C; see methods). This normalized spectrogram reveals that the strongest increase of power does not occur in the gamma range (30–80 Hz) but at frequencies above 80 Hz. A decrease is observed below frequencies of 20 Hz.
Local field potential activity for different stimuli
The following section compares the activity elicited by the different stimuli in one subject in detail.
Figure 2 summarizes the local field potential activity averaged across recording sites (n = 48) at different depths. Common to all stimuli is evoked activity, characterized by an increase in power at all frequencies, just after the stimulus onset. In the following this evoked activity is ignored and the analysis concentrates on the “steady-state” activity after the onset response. Figure 2A shows the activity elicited by the gratings. It reaches a steady state after about 150 ms characterized by an increase in power in the range of 30–60 Hz, a moderate increase in the range of 80–130 Hz, and decrease below 20 Hz. The response is stationary and can be well summarized by averaging the 2D spectrogram over time. We define a modulation curve by averaging the spectrogram over the interval of 400–1,900 ms. This modulation curve shown on the right of the spectrogram quantifies the relative contribution of different frequencies to the response. For gratings it reveals a prominent increase around 40 Hz and a decrease at low frequencies.
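The modulation curve defined above is a time average of the normalized spectrogram. A minimal sketch, with the hypothetical function name `modulation_curve` and the assumption that both interval borders are included, is:

```python
def modulation_curve(zspec, times_ms, t_start=400.0, t_stop=1900.0):
    """Average a spectrogram indexed [time][frequency] over a time interval.

    times_ms holds the time of each spectrogram column relative to
    stimulus onset.
    """
    rows = [row for row, t in zip(zspec, times_ms) if t_start <= t <= t_stop]
    if not rows:
        raise ValueError("no spectrogram columns fall inside the interval")
    return [sum(row[f] for row in rows) / len(rows)
            for f in range(len(rows[0]))]


# Toy input: four time bins, two frequency bins.
toy_spec = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
toy_times = [0.0, 400.0, 1900.0, 2000.0]
curve = modulation_curve(toy_spec, toy_times)
```

Only the two bins at 400 and 1,900 ms enter the average in this toy input. The onset response and the period after stimulus offset are excluded.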
We now compare this response pattern to that elicited by natural movies. Figure 2B shows the spectrogram averaged across all presentations of the different natural movies. The response shows a phasic pattern, with large variations of activity over the time course of the stimulus. Again, by averaging over time a modulation curve is obtained showing a decrease for frequencies below 20 Hz, a strong increase in the gamma range, and an increase in activity at frequencies well beyond 80 Hz. Thus the activity pattern for natural movies differs from that for gratings: strong activation at frequencies above 80 Hz and large fluctuations during the “steady-state” response.
The response to the pink pixel noise stimuli includes a large range of frequencies similar to the response to natural movies (Fig. 2C). Furthermore, pink noise also elicits an irregular activity pattern over time. For this stimulus the modulation curve is similar to that for natural movies (6.8% deviation over the whole frequency range). The pink pixel noise activates the same frequencies with similar amplitudes despite the different higher-order statistics and the resulting very different appearances of the two stimuli. Thus the local field potential is most sensitive to the second-order statistics of the pixel intensities, but not to their higher-order correlations.
In contrast to natural movies and the pixel noise, the wavelet manipulations show a much weaker stimulus-induced change in the local field potential (Fig. 2, D and E). However, their modulation curves reveal a similar pattern as for natural movies: a decrease below 20 Hz, an increase in the gamma range, and a second peak of activity above 80 Hz. Comparing the wavelet-filtered movie with the wavelet noise, the difference between the modulation curves is small (4.7% deviation over the whole frequency range). Thus the response of the local field potential seems influenced only by the local contrast edges of the filters, but not by their global alignment.
Magnitude of the response for the different stimuli
Comparing the modulation curves in Fig. 2 shows that most of their shapes are similar. Figure 3 demonstrates this similarity more quantitatively and compares the total response magnitudes. In Fig. 3A all modulation curves from Fig. 2 are redrawn in one graph. Scaling each curve by the area under the curve makes the similarity of the activity for natural stimuli and all their modifications evident (Fig. 3B). The scaling considerably reduces the mean absolute differences between all these curves (13.1% before vs. 4.5% after scaling). The grating stimuli, however, induce little activity beyond 80 Hz, resulting in a qualitatively different form of the modulation curve. For natural movies, pixel noise, the wavelet-filtered and the wavelet noise stimuli, the relative contribution of different frequencies to the response pattern is very similar, but differs qualitatively from that for gratings.
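The scaling step can be illustrated with a short sketch. The exact definition of the area and of the percentage difference is not given in the text. Here the area is assumed to be the sum of the absolute curve values, and the difference is reported in the units of the curves rather than as a percentage. The function names are hypothetical.

```python
def scale_by_area(curve):
    """Divide a modulation curve by its area (assumed: sum of |values|)."""
    area = sum(abs(v) for v in curve)
    if area == 0:
        raise ValueError("a flat zero curve cannot be scaled")
    return [v / area for v in curve]


def mean_abs_difference(curve_a, curve_b):
    """Mean absolute difference between two curves on the same frequency axis."""
    return sum(abs(a - b) for a, b in zip(curve_a, curve_b)) / len(curve_a)


# Two toy curves with the same shape but different amplitude.
weak = [-1.0, 2.0, 3.0]
strong = [-2.0, 4.0, 6.0]
before = mean_abs_difference(weak, strong)
after = mean_abs_difference(scale_by_area(weak), scale_by_area(strong))
```

Curves that differ only in overall amplitude become identical after scaling, so any remaining difference reflects a difference in shape.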
The only difference between the modulation curves for the natural stimuli and all their modifications relates to the mean amplitudes. These are a measure of the total response amplitude induced by the stimulus. The distribution of these amplitudes for the same subject is shown in Fig. 3E. The amplitudes for pixel noise and the natural movies are largest. In particular, the pixel noise lacking higher-order structure is at least as efficient as the natural stimuli. A similar observation holds for the wavelet-filtered and the wavelet noise stimuli.
Comparison of different subjects
Similar results as those described above were found in all subjects (n = 4). For two additional subjects the averaged modulation curves are shown in Fig. 3, C and D. The overall shape of the modulation curves differs between animals, an issue discussed further below. However, the activity patterns for natural movies and pink pixel noise are similar within all subjects. The same result holds for the comparison of the wavelet-filtered stimulus and wavelet noise. The scaling behavior described in the previous section holds for all subjects as well. Figure 3F summarizes the distribution of modulation amplitudes across all 4 subjects. The natural movies and the pixel noise lead to modulations of the same strength (P > 0.45, Wilcoxon test). The difference between the wavelet-filtered stimulus and the wavelet noise is significant (P < 0.05, Wilcoxon test). Also consistently across all subjects, the activity pattern for gratings differs in every respect from that for all other stimuli. Gratings elicit a steady-state response and mostly activate frequencies in the gamma range with a similar peak of activity in different subjects (cat L: 43 Hz; F: 51 Hz; S: 51 Hz; M: 45 Hz). Thus although the local field potential activity elicited by one particular stimulus can differ between subjects, the relative activity of different stimuli is similar across subjects and the results described above for one subject hold in general.
In each subject we recorded at sites of different depths (see methods). The modulation curves for each subject were averaged across all recording sites from this subject. It is known that the evoked potential in the local field potential changes shape, amplitude, and sign depending on the depth of the recording site (Freeman 1975). However, the spectrograms and the modulation curves are derived from the power spectrum of the signal. Therefore these measures are invariant with respect to a change in sign of the local field potential. The normalization of the spectrogram further eliminates an overall scaling affecting the stimulus as well as the blank period. We separately analyzed recording sites that clearly were in supra and infra granular layers (data not shown). Within all subjects, the shapes of the averaged supra and infra granular modulation curves for a given stimulus were quite similar. Thus the relative activity induced by different stimuli at supra and infra granular sites is similar. Additionally, for the 3 subjects for which we had data from ≥5 supra and ≥5 infra granular sites, we compared these quantitatively: averaged across all frequencies and stimuli, the modulation was stronger at the infra granular sites (4.2, 5.2, and 5.8%; subjects L, F, and S, respectively). For comparison, the difference of the modulation curves between subjects, averaged across stimuli and frequencies, is 13.6%. Although we do not have an explanation for the difference in shape of the modulation curves between subjects, we can confidently exclude that this difference is the result of a sampling bias in the recording sites.
As a complement to the local field potential, which measures the activity in a region around the electrode tip and emphasizes synchronous signals (Abeles 1982), we recorded the spiking activity of multiunit sites. The example shown in Fig. 4A illustrates the responses of one recording site to a natural movie. The response is clearly modulated by the presentation of the stimulus and varies over time. Figure 4B shows the averaged firing rates during the tonic part (300 ms after stimulus onset until stimulus offset) for the different stimuli and this recording site. Natural stimuli are clearly effective in driving neurons in primary visual cortex. Comparing the stimuli with different higher-order statistics shows that natural movies and the pixel noise elicit similar firing rates. The same holds for the comparison of wavelet-filtered movie versus wavelet noise. To average across different recording sites one has to account for the differences in mean firing rate between sites. To do so, the mean rate over all stimuli was computed for each site and the rates for the individual stimuli were divided by this mean. The average across all sites (n = 33, 2 subjects) is shown in Fig. 4C. The natural movies elicit slightly but significantly stronger firing rates than the pixel noise (1.28 and 1.20 times the mean rate, respectively; P < 0.05, Wilcoxon test). This quantitative difference (6.6%) roughly matches the result from the local field potential amplitudes (0.1%, Fig. 3F) and is much smaller than the difference from the amount of activity induced by the other stimuli. Thus both the firing rate of multiunits and the local field potential amplitudes are only weakly sensitive to the higher-order statistical properties of the stimuli.
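The per-site normalization of firing rates can be sketched as follows. The data layout (one dictionary of rates per site) and the function names are hypothetical, and the numbers are made up for illustration.

```python
def normalize_rates(rates_by_site):
    """Divide each site's rates by that site's mean rate over all stimuli."""
    normalized = []
    for rates in rates_by_site:
        site_mean = sum(rates.values()) / len(rates)
        normalized.append({stim: r / site_mean for stim, r in rates.items()})
    return normalized


def average_across_sites(normalized):
    """Mean normalized rate per stimulus across sites."""
    n_sites = len(normalized)
    return {stim: sum(site[stim] for site in normalized) / n_sites
            for stim in normalized[0]}


# Made-up rates (spikes/s) for two sites with very different overall activity.
sites = [
    {"natural": 2.0, "grating": 4.0},
    {"natural": 10.0, "grating": 30.0},
]
relative = average_across_sites(normalize_rates(sites))
```

After normalization each site contributes rates relative to its own mean, so a site with a high overall firing rate does not dominate the average.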
Temporal structure of the activity pattern
The above results show that natural movies elicit a variable and irregular temporal activity pattern (Fig. 2B, Fig. 4A). The following section investigates the cause of this temporal structure and compares it quantitatively to modulations of the activity reported in previous studies on visual representations.
The natural movies used for stimulation contain many irregularities in their temporal structure such as short and rapid translations. To quantify their impact on cortical activity, cross-correlations between the amplitude of the stimulus motion and the activity at different frequencies of the local field potential were computed. An approximation for the global stimulus motion was obtained from the average amplitude of the flow field (see methods). An example is shown in Fig. 5A together with the spectrogram of the local field potential. Visual inspection indicates a close correlation of neuronal activity and stimulus motion. The correlogram averaged over all recording sites in one subject is shown in Fig. 5B. As can be seen, the activity in the local field potential is correlated with the motion in the stimulus. This correlation is particularly prominent at frequencies beyond 80 Hz. The correlations are strongest at a time lag where the stimulus leads the activity by 50–100 ms, which is in agreement with known latencies of visual responses. This finding is consistent across subjects and stimulus movies (Fig. 5C).
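A lagged correlation of this kind can be sketched as follows. The sketch uses the Pearson correlation between the motion amplitude and the power in one frequency band, both sampled on the same time grid. A positive lag means that the stimulus leads the activity. The function names and the toy data are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def lagged_correlation(stimulus, response, max_lag):
    """Correlation for each lag in samples; positive lag = stimulus leads."""
    n = len(stimulus)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            s, r = stimulus[:n - lag], response[lag:]
        else:
            s, r = stimulus[-lag:], response[:n + lag]
        result[lag] = pearson(s, r)
    return result


# Toy data: the response is the motion amplitude delayed by 3 samples.
rng = random.Random(0)
motion = [rng.random() for _ in range(200)]
power = [0.0, 0.0, 0.0] + motion[:-3]
correlogram = lagged_correlation(motion, power, max_lag=10)
best_lag = max(correlogram, key=correlogram.get)
```

With a time step of 8 ms between samples, a peak at a lag of 6 to 12 samples would correspond to the 50–100 ms reported above.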
The strength of the temporal variations can be quantified by the coefficient of variation (CV) of the power in the local field potential computed over time. The coefficients averaged across frequencies lie between 0.23 and 0.25 for the different subjects. Given that our stimuli closely match the visual input to a cat's eye under natural conditions these results highlight an important characteristic of activity patterns under natural conditions. Body and head motion lead to temporal changes in the stimulus, bringing repeatedly new stimuli onto the retina. These continuous changes in the retinal image evoke a series of visual transients. Therefore under real-world conditions, we do not observe what classically would be called a steady-state response, but response strengths are continuously changing.
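For concreteness, the coefficient of variation averaged across frequencies could be computed as in the sketch below. The spectrogram layout (one row per frequency, one column per time bin), the use of the population standard deviation, and the toy values are assumptions for illustration; the text does not specify these details.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation over time divided by the mean over time."""
    return pstdev(values) / mean(values)

def mean_cv_across_frequencies(spectrogram):
    """Average the per-frequency CV over all frequency rows.

    spectrogram: list of rows, one per frequency, each holding power over time.
    """
    return mean(coefficient_of_variation(row) for row in spectrogram)

# Toy spectrogram: a constant band (CV = 0) and a fluctuating band (CV = 0.5).
toy_spectrogram = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 3.0, 1.0, 3.0],
]
cv = mean_cv_across_frequencies(toy_spectrogram)
print(cv)
```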
In the following we analyze the magnitude of the evoked transients in the local field potential and the multiunit activity in a more quantitative way. A number of studies report modulations of the mean activity in the tonic part of neural responses. Examples of such modulations in primary visual cortex are contextual effects (Nothdurft et al. 2000) and figure-ground modulations (Lamme 1995; Roelfsema et al. 1998), both of which are especially relevant for the processing of natural scenes. These studies report an average modulation of single-unit firing rates of the order of 25%. To quantify the impact of the fluctuations in our population responses we measure the length of the interval necessary to reliably detect a 25% modulation of the recorded mean activity.
Taking a limited sample of the response gives an estimate of the average activity with a certain precision. Using extended windows can increase this precision. The following analysis seeks the length of the interval necessary to reliably (e.g., on the 5% level) detect a 25% modulation of the mean activity. This implies that our method should give a positive result in only 5% of the cases when no modulation is present. Simultaneously, the detection threshold should be low enough that a 25% modulation is detected in half the cases. Spectrograms of local field potentials were computed as described above. From these the mean and the variance of the total energy in a window were determined as a function of the length of the window and the width of the frequency band. Using t-statistics we determined the minimum window length at which one sample is enough to satisfy the requirements for detection (i.e., P < 0.05) as explained above. Averaging over subjects and stimuli, the minimum window length in the gamma frequency domain (30–80 Hz) was 380 ms. At higher frequencies (80–250 Hz) the necessary window length was roughly 380 ms as well. By pooling different frequency bands the sensitivity could not be increased because the variations in power were correlated across different frequencies. Obviously, higher modulations (40%) are quicker to detect (230 ms, gamma frequency range) and lower degrees of modulation (12%) require more time for significant detection (590 ms, gamma frequency range).
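The logic of this detection criterion can be sketched in a simplified form. The sketch below is not the procedure used in the study: it replaces the t-statistics with a one-sided normal approximation, uses non-overlapping windows, and all names and the toy signal are hypothetical. Under these assumptions, a single window detects a modulation of size m in half the cases, with a 5% false-positive rate, once m times the mean window energy is at least 1.645 times the standard deviation of the window energy, that is, once the coefficient of variation of the windowed energy drops to about 0.25/1.645 ≈ 0.15 for a 25% modulation.

```python
from statistics import NormalDist, mean, pstdev

def window_means(power, length):
    """Mean power in consecutive non-overlapping windows of `length` samples."""
    n_windows = len(power) // length
    return [mean(power[i * length:(i + 1) * length]) for i in range(n_windows)]

def min_detection_window(power, modulation=0.25, alpha=0.05, min_windows=4):
    """Smallest window length (in samples) meeting the detection criterion.

    Criterion (normal approximation, one-sided): the threshold sits at the
    modulated mean, so a true modulation is detected in half the cases, and
    it must lie at least z(1 - alpha) standard deviations above the
    unmodulated mean. Returns None if no tested length qualifies.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    for length in range(1, len(power) // min_windows + 1):
        samples = window_means(power, length)
        mu, sigma = mean(samples), pstdev(samples)
        if modulation * mu >= z * sigma:
            return length
    return None

# Toy signal: power alternating between 1 and 3 in successive time bins.
toy_power = [1.0, 3.0] * 20
short = min_detection_window(toy_power, modulation=0.9)
longer = min_detection_window(toy_power, modulation=0.25)
print(short, longer)
```

In this toy case a large modulation is detectable from a single bin, whereas a 25% modulation requires averaging over two bins, mirroring the trade-off between modulation depth and window length reported above.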
The temporal structure of natural movies elicits not only a highly variable activity in the local field potential, but also induces fluctuations in the spiking response of units (Fig. 4A). For these we performed a similar analysis searching for the shortest window allowing the discrimination of a given mean firing rate and an increased or decreased mean firing rate. The different percentages of offsets in mean firing rate resulted in similar window lengths as for the local field potential (12%: 460 ms; 25%: 420 ms; 40%: 400 ms).
The above analysis investigates the minimal temporal interval needed to reliably detect a modulation of the local field potential and multiunit activity. These signals do not necessarily match the input to a decoding neuron. Whereas the multiunit activity constitutes the activity of a local group of neurons and the local field potential reflects the neuronal activity in a region of a few hundreds of micrometers, neurons in the cortex may integrate signals from a larger domain. However, the fluctuations in the neuronal activity observed are correlated over large distances, thus making it unlikely that these fluctuations average out in the input to a decoding neuron. Recent results, furthermore, strengthen the link between population activity and behavior (Supèr et al. 2003). Thus fluctuations in the population activity are likely to affect the processing of incoming stimuli as well as behavioral responses. The intervals necessary to detect changes in the average activity reported above are long compared with visual latencies of cells in higher visual areas (Oram and Perrett 1992; Perrett et al. 1982) and reaction times of experimental subjects (Thorpe et al. 1996; van Rullen and Thorpe 2001). Thus the transient fluctuations of the activity evoked by the motion of the natural stimuli impose constraints on theories of cortical representations relying on differences in mean activations.
Stimuli with altered motion patterns
The difference in local field potential activity between gratings and natural movies could have several reasons. On the one hand, the stimuli contain different motion patterns, uniform and irregular, respectively. On the other hand, the stimuli have widely different spatial structures, simple and complex, respectively. To investigate which of these aspects are dominant we constructed 4 artificial stimuli. The flow field inherent to a natural movie was decomposed into 2 parts and imposed on a grating and a single frame of a natural movie (see methods). The resulting stimuli are a uniform drifting grating, a uniform drifting natural image, a jittering grating, and a jittering natural image.
The activity patterns for these stimuli were recorded in 2 animals (cat L: 7 sessions; cat S: 7 sessions) and are summarized in Fig. 6. The response to the uniformly drifting natural image is stationary over time as is the response to the drifting grating (Fig. 6, A and C). In contrast, the response to the jittering stimuli is irregular in both cases (Fig. 6, B and D). To quantify the temporal irregularity we compute the CV of the power averaged over frequencies. For the grating the difference between uniform (CV = 0.186) and jitter (CV = 0.208) motion is significant in all animals recorded (P < 0.05, Wilcoxon test). For the natural image the difference between uniform (CV = 0.186) and jitter (CV = 0.203) motion is significant in both subjects as well (P < 0.1). Comparing the effect of the spatial structure, there is no significant difference either for the 2 uniformly moving stimuli (P > 0.90) or for the 2 jittering stimuli (P > 0.50). Furthermore, the difference between the jittering natural image and the natural movie is not significant (P > 0.30). Thus the irregular motion pattern is the dominant factor of the high temporal variability of the power in the local field potential.
For the 2 jittering stimuli, Fig. 6, E and F, shows the cross-correlation of the motion amplitude and the power of the local field potential at different frequencies. Both the gratings and the natural image lead to strong correlations extending over a large range of frequencies. Furthermore, for the more complex stimuli, the jittering natural image and the natural movie, the locking of the local field potential is most prominent at frequencies above 80 Hz (Figs. 5C and 6F).
However, the different stimuli cause not only differences in the temporal response but also in the frequency range activated. The uniform grating leads to a decrease below 20 Hz and an increase in the gamma range. In contrast, the jittering grating leads to an increase above as well as below 20 Hz (Fig. 6B). Interestingly, the peaks of high activity elicited by the jitter also recruited frequencies above 80 Hz. However, this is not obvious in the modulation curve because of temporal averaging. In fact, the two modulation curves are significantly different at frequencies below 31 Hz, but not above (2-way ANOVA, P < 0.05). The jittering natural image leads to a similar shape of the modulation curve as the real movie (Fig. 6D). The uniformly drifting natural image in contrast causes a stronger increase in power than the natural movie. The difference between the two modulation curves is significant at all frequencies above 70 Hz and most frequencies below 70 Hz. However, the basic shape of the modulation curve is the same.
In conclusion, the above results show that temporally irregular stimuli induce locking of the local field potential to the stimulus motion independent of the detailed spatial structure of the stimulus. Thus temporal properties and irregularities of the responses reported above for the natural movies are not bound to the spatial characteristics of natural images but occur for other reasonably complex stimuli. The detailed pattern of the response, like the frequencies of the induced local field potential, depends on both the spatial and the temporal properties of the stimulus in a complex way.
In this study we investigate the processing of the global structure of natural scenes in the primary visual cortex of alert cats. We measured local field potentials as well as multiunit firing rates for natural movies and modified movies with different statistical properties as well as gratings. The activity pattern for drifting gratings differs strongly from that for natural movies and this difference is contingent on both the spatial and the temporal properties of the natural movies. Furthermore, the activity elicited by pink noise stimuli and the activity elicited by natural stimuli do not differ significantly, indicating only a weak impact of the higher-order stimulus statistics on the responses. Natural movies elicit a very irregular temporal activity pattern. We show that the temporal structure is dominated by the motion in the visual input and quantify this irregularity that strongly limits the concept of a steady-state response.
Limitations of the present study
We recorded from awake animals that, although their head was fixed, were freely watching the stimuli and moving their eyes. However, instead of excluding some eye movements (e.g., saccades) from analysis or performing separate analysis during fixations and eye movements we averaged across all eye positions. Thus we cannot directly exclude that eye movements as such, independent of the stimulus, are the cause of some of the response properties described. However, from other studies in the same recording setup it is known that cats move their eyes much more infrequently upon stimulation with natural movies than, say, humans do (Möller et al. 2002). The average intersaccade interval (>3 s) is much longer than the window used for analysis in the present study. Furthermore, we measured eye movements in a few of our recording sessions and found no relation of eye movements to the timing of the stimulus presentation.
A second potential shortcoming is our limited knowledge about the nature of the local field potential and its relation to the activity of single neurons. Supposedly the local field potential is determined by electrical activity approximately 500 μm around the electrode tip and is influenced not only by the spiking response of neurons but also by somatic and dendritic potentials, especially emphasizing synchronous components (Abeles 1982). However, because it is a population measure it is not susceptible to noise in the firing of single neurons and might give a better approximation to the relevant activity in a patch of cortex. Recently evidence was put forward showing a close relationship between the local field potential and measures from magnetic resonance imaging, especially the blood oxygen level-dependent (BOLD) signal (Lauritzen 2001; Logothetis et al. 2001; Mathiesen et al. 2000). In view of the growing body of studies using functional MRI, especially with primate subjects, one can expect a larger number of studies with results comparable to those presented here. Thus although the sources of the local field potential are not known in detail it seems a well-suited measure for the study of population activity.
Are the results compatible with previous studies?
Several properties of the local field potential activity reported in this study are in agreement with results previously published. We find that stimulation with drifting gratings leads to strong activation in the classical gamma range. Using natural stimuli, however, we find a strong increase of activity at higher frequencies above 100 Hz as well. In a previous study, also recording from awake cats, the strongest increase in power on stimulation with flashed gratings was also found in the gamma range (Siegel and König 2003). However, the optimally orientation tuned frequency band determined in the latter study extended beyond 100 Hz. Together, these results consistently point to the relevance of high-frequency activity in the local field potential.
Recently a study recording in V1 of anesthetized cats reported locking of the local field potential activity to the time course of uniform moving and randomly accelerated gratings (Kruse and Eckhorn 1996). The present study extends this finding to awake animals and stimuli of different spatial structure. In particular we show that natural movies induce activity fluctuations locked to the stimulus motion. The study by Kruse and Eckhorn further reports a decrease of induced oscillations with increasing irregularity of the stimulus motion. The induced oscillations, as measured with the power spectrum of the local field potential, were most prominent for the uniform drifting grating. In the present study we do find this effect as well, albeit only on stimulation with natural stimuli.
Little is known about how activity patterns in the visual cortex depend on the global structure of the stimuli. So far only a small number of studies compared stimuli with different statistical properties. Lehky et al. (1992) compared firing rates in the primary visual cortex of anesthetized monkeys to image patches of different complexity. They found that complex stimuli such as random textures or 3D surfaces elicit larger firing rates than simple stimuli such as gratings. This result is in good agreement with our data showing stronger activity for natural movies and pink noise than for gratings.
Baddeley et al. (1997) measured responses in V1 of anesthetized cats to different natural movies and white noise stimuli. They reported larger firing rates for natural movies than for the white noise, although the differences were small (4 Hz vs. 2.5 Hz). Given these small differences in activity and the fact that they used a different type of noise, this difference is not a contradiction to the results reported here.
A further comparison of cortical activity to natural stimuli and pixel noise comes from functional imaging. Measuring BOLD responses in anesthetized monkeys, Rainer et al. (2001) reported significantly stronger V1 activation for natural images than for pink pixel noise.
Despite the relevance of natural stimuli to everyday life, most studies use highly simplified stimuli and it is presently unclear which results generalize to processing of natural scenes. However, it seems clear that neurons in the primary visual cortex are well adapted to processing of natural scenes. In seminal studies Vinje and Gallant (2000, 2002) showed that V1 neurons in awake monkeys are well driven by natural movies, that information rate and transmission increase with increasing stimulus size and that response sparseness is increased by stimulation of the receptive field surround. Another study reporting efficient coding of natural scenes in V1 comes from recordings in anesthetized ferrets. Weliky and coworkers (2003) report high population and lifetime sparseness on stimulation with natural images. Interestingly this study reports that single-cell responses are not enough to provide a reliable estimate of the local contrast in natural images but shows how a population encodes this information efficiently. Although a direct comparison of the results reported in these and the present study is impossible, the above studies provide support for the paradigm used here: First, to obtain response properties matching those under natural conditions large field stimulation with complex scenes is required. Second, a population measure might be more relevant than single-unit activity.
Implications for theories of object binding
Natural scenes contain many structures a human observer would classify as objects. The question of how the visual system achieves this segmentation of an image into distinct objects, and what the underlying neuronal mechanisms are, has led to a number of theories. Two prevailing mechanisms have been proposed to explain how the responses of neurons responding to different features of the same object are bound together. The first, binding by synchrony, proposes that the spikes of two such neurons have a fixed temporal relationship (Eckhorn 1994; Singer and Gray 1995). Such response properties have been demonstrated using artificial bars (Gray et al. 1989) and figures defined by textural differences (Castelo-Branco et al. 2000; Gail et al. 2000; Woelbern et al. 2002). Following the second mechanism, a global visual structure is represented by the modulation of firing rates. Two neurons falling on parts of the same object show similar increases or decreases of activity. Evidence for this hypothesis comes from studies reporting differences in the tonic part of the response depending on whether the receptive field lies on an object or the background (Lamme 1995; Nothdurft et al. 2000; Roelfsema et al. 2002; Zipser et al. 1996). However, both mechanisms are hotly debated and no final conclusion has been reached so far (Lamme and Spekreijse 1998; Shadlen and Movshon 1999; Singer 1999).
What are the predictions of these mechanisms for the responses to the stimuli used in this study? The noise stimuli have a random phase spectrum and do not contain structures a human observer would classify as an object whereas the natural movies do. Thus following either binding mechanism there should be a significant difference in the response properties for these two stimuli. Following the binding by synchrony hypothesis we should see an increased level of synchronization for the natural movies compared with the noise. Assuming that this synchronization affects nearby neurons within a few hundred micrometers, such an increase in synchronization should affect the power of the local field potential. However, this is not what we observe. The modulation curves for pink pixel noise and natural movies are very similar. Following the binding by modulation hypothesis, one could expect a change in the overall level of activity between noise and natural stimulus. In the local field potential we do not see this, and the firing rates of the multiple units show an increase for natural movies by only a small amount. Furthermore, the variance of the activity over time during the “steady state” is large and severely limits a readout of changes in the mean activity. Our estimates of the necessary readout times to detect such differences are on the order of several hundreds of milliseconds. This questions whether such a mechanism can operate fast enough under real-world conditions. These considerations, however, have to be taken with a grain of salt. Whereas previous key experiments supporting either hypothesis (Gray et al. 1989; Lamme 1995) have been conducted using artificial stimuli, neither hypothesis has been elaborated in sufficient detail to allow a quantitative prediction and a hard experimental test under natural stimulus conditions. 
For example, we do not know enough about the statistics of “objects” in natural stimuli, about their spatial and temporal scale, to be certain that the methods applied are actually sensitive enough to elucidate the responsible mechanisms in the brain.
This work was supported by the Center of Neuroscience Zurich (C. Kayser), the Swiss National Foundation (Grant 31-65415.01 to P. König and R. Salazar), and the EU/BBW (IST-2000-28127, 01.0208-1 to P. König).
We thank G. Möller for help during the experiments, M. Siegel for inspiring discussions, and the referees for prompt and helpful comments on the manuscript.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 2003 by the American Physiological Society