Mareschal, Isabelle and Curtis L. Baker, Jr. Temporal and spatial response to second-order stimuli in cat area 18. J. Neurophysiol. 80: 2811–2823, 1998. Approximately one-half of the neurons in cat area 18 respond to contrast envelope stimuli, consisting of a sinewave carrier whose contrast is modulated by a drifting sinewave envelope of lower spatial frequency. These stimuli should fail to elicit a response from a conventional linear neuron because they are designed to contain no spatial frequency components within the cell's luminance-defined frequency passband. We measured neurons' responses to envelope stimuli by varying both the drift rate and spatial frequency of the contrast modulation. These data were then compared with the same neurons' spatial and temporal properties obtained with luminance-defined sinewave gratings. Most neurons' responses to the envelope stimuli were spatially and temporally bandpass, with bandwidths comparable with those measured with luminance gratings. The temporal responses of these neurons (temporal frequency tuning and latency) were systematically slower when tested with envelope stimuli than with luminance gratings. The simplest kind of model that can accommodate these results is one having separate, parallel streams of bandpass processing for luminance and envelope stimuli.
Numerous nonlinearities arise in cortical processing, examples of which include neurons' contrast response, response thresholds, and adaptation (see Bonds 1992; Carandini et al. 1998; Shapley and Lennie 1985 for review). However, these “trivial” nonlinearities primarily affect the magnitude of neurons' responses and do not principally determine the neuron's spatiotemporal selectivity. Instead, many cells exhibit stimulus selectivity that can be understood in terms of spatially and temporally linear summation of luminance inputs over their receptive fields (DeAngelis et al. 1993; DeValois et al. 1979; Movshon et al. 1978). Consequently, neurons in the early stages of the mammalian visual system were likened to bandpass filters. For example, when tested with luminance sinewave gratings, they display selective tuning to spatial frequency (Campbell et al. 1969; DeValois et al. 1982; Foster et al. 1985; Maffei and Fiorentini 1973; Tolhurst and Movshon 1975), which can be predicted from the neuron's measured receptive field profile (Field and Tolhurst 1986; Jones and Palmer 1987; Kulikowski et al. 1982).
However, a qualitatively different type of nonlinearity was reported, whereby neurons respond selectively to stimuli that contain no Fourier frequency components overlapping their luminance-defined passband (Albright 1992; Mareschal and Baker 1998; Zhou and Baker 1994, 1996). These stimuli were termed “second-order” (or “non-Fourier”) because they contain attributes or features that are not defined by luminance variations but rather by a second-order image statistic such as, for example, contrast (Cavanagh and Mather 1989; Chubb and Sperling 1988). Neurons that respond to second-order stimulus attributes are thus “nonlinear” in a profound sense because a nontrivial nonlinearity must be invoked to account for their selective detection of stimuli outside their luminance defined passband.
By using contrast envelope stimuli (a static spatial frequency “carrier” multiplied by a drifting low frequency “envelope”), previous studies demonstrated nonlinear neurons whose responses were contingent on the carrier frequency falling within a narrow range of high spatial frequencies (Zhou and Baker 1994, 1996) and orientations (Mareschal and Baker 1998). These results could not be accounted for by an early nonlinearity applied to the stimulus before frequency filtering; such a nonlinearity would introduce spectral components at the frequency of the envelope but would not produce a narrow bandwidth for carrier frequency and orientation. These findings are consistent with psychophysical experiments with second-order stimuli demonstrating that subjects do not rely on the detection of “distortion products” arising from a nonlinearity before filtering (Badcock and Derrington 1989; Turano and Pantle 1989).
Most results of psychophysical and physiological experiments examining the processing of second-order motion were accounted for with a model having parallel streams to process first- and second-order stimuli (Graham et al. 1992; Ledgeway and Smith 1994; Mareschal and Baker 1998; Nishida et al. 1997; Solomon and Sperling 1994; Wilson et al. 1992; Zhou and Baker 1994; but see Johnston et al. 1992). Psychophysically, the spatial resolution (or bandwidth) of the second-order stream was explored with selective adaptation paradigms (Nishida et al. 1997); however, its temporal characterization (bandwidth and optimum) remains unclear (Gegenfurtner and Hawken 1996; Holliday and Andersen 1994; Ledgeway and Smith 1994).
We examine the spatiotemporal characteristics of envelope-responsive neurons in area 18 by measuring both spatial and temporal frequency tuning. Area 18 was chosen for its higher proportion of envelope-responsive neurons (Zhou and Baker 1996). Assessing neurons' spatiotemporal response function to envelope stimuli is important because it can be compared with their luminance responses in an attempt to characterize the mechanisms involved in the processing of the different types of stimuli. In addition, if these neurons underlie the perception of second-order motion, these findings will have important implications for the design of psychophysical experiments as well as the generation of appropriate models.
Animal preparation was conventional and was described in detail previously (Zhou and Baker 1994). Briefly, experiments were carried out on paralyzed adult cats (gallamine triethiodide) under nitrous oxide/oxygen anesthesia supplemented with intravenous barbiturate. Electroencephalogram, electrocardiogram, expired CO2, and body temperature were monitored and maintained at normal levels throughout the experiment. Penetrations were made with platinum-iridium microelectrodes (Frederick Haer) in area 18 (A3/L4). Each eye was refracted with a retinoscope and fitted with gas-permeable neutral contact lenses. Artificial pupils and spectacle lenses were inserted such that stimuli presented at a viewing distance of 57 cm were in focus.
Two types of spatially one-dimensional stimuli were used in these experiments: conventional sinewave gratings and contrast envelope stimuli. The luminance profile of a drifting sinewave grating is
$$L(x,t) = L_o\{1 + C \sin[2\pi(f_s x - f_t t)]\} \qquad (1)$$
where $L_o$ is the mean luminance, $C$ the contrast, $f_s$ the spatial frequency, and $f_t$ the temporal frequency (the direction of motion is determined by the sign of $f_t$). The luminance profile of a drifting envelope stimulus is
$$L(x,t) = L_o\{1 + C \sin(2\pi f_c x)\,\tfrac{1}{2}[1 + \sin(2\pi(f_e x - f_t t))]\} \qquad (2)$$
where $f_c$ is the carrier spatial frequency, $f_e$ is the envelope spatial frequency ($f_e \ll f_c$), and $f_t$ the envelope temporal frequency.
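The two stimulus profiles can be sketched numerically. The following is a minimal illustration, assuming a raised-sinusoid envelope scaled so that local contrast varies between 0 and C; that normalization, the phase conventions, and all function names are assumptions made for illustration and are not the authors' display code. The mean luminance (28.6 cd/m2) and envelope contrast (70%) are taken from the display description; the spatial frequencies are arbitrary example values.

```python
import math

def grating(x, t, L0, C, fs, ft):
    """Luminance of a drifting sinewave grating at position x (deg) and time t (s)."""
    return L0 * (1.0 + C * math.sin(2.0 * math.pi * (fs * x - ft * t)))

def envelope_stimulus(x, t, L0, C, fc, fe, ft):
    """Stationary carrier (fc) whose contrast is modulated by a drifting envelope (fe, ft).

    Assumed form: the envelope is a raised sinusoid ranging from 0 to 1.
    """
    carrier = math.sin(2.0 * math.pi * fc * x)
    envelope = 0.5 * (1.0 + math.sin(2.0 * math.pi * (fe * x - ft * t)))
    return L0 * (1.0 + C * carrier * envelope)

def fourier_amplitude(samples, cycles):
    """Amplitude of the component with an integer number of cycles across the samples."""
    n = len(samples)
    re = sum(s * math.cos(2.0 * math.pi * cycles * i / n) for i, s in enumerate(samples))
    im = sum(s * math.sin(2.0 * math.pi * cycles * i / n) for i, s in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

# Example: a 10-deg window sampled at 1000 points, frozen at t = 0.
# fc = 1.0 cycles/deg (10 cycles per window), fe = 0.1 cycles/deg (1 cycle per window).
width_deg, n_samples = 10.0, 1000
xs = [i * width_deg / n_samples for i in range(n_samples)]
profile = [envelope_stimulus(x, 0.0, 28.6, 0.7, 1.0, 0.1, 2.0) for x in xs]

amp_envelope = fourier_amplitude(profile, 1)    # at the envelope frequency
amp_carrier = fourier_amplitude(profile, 10)    # at the carrier frequency
amp_lower = fourier_amplitude(profile, 9)       # lower sideband (fc - fe)
amp_upper = fourier_amplitude(profile, 11)      # upper sideband (fc + fe)
```

Under these assumptions the spectrum contains energy at the carrier and at the two sidebands, but none at the envelope frequency itself, which is why a linear filter tuned to the envelope frequency should not respond.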
Figure 1 A shows the luminance profile of a leftward drifting sinewave grating as a space-time plot. Each pixel is assigned a gray level value corresponding to the luminance at that point in space. As time proceeds, the gray level of a given pixel is modulated sinusoidally at frequency $f_t$. Figure 1 B shows the luminance profile of an envelope stimulus consisting of a stationary carrier with a leftward drifting contrast envelope. Idealized magnitude spectra of the stimuli are shown in Fig. 1, C and D, along with the spatiotemporal luminance passband of the neuron (ovals). A drifting sinewave grating has two rotationally symmetric spectral components (Fig. 1 C) at frequencies $f_s$ and $f_t$ (Eq. 1). Figure 1 D shows the Fourier spectrum of the envelope stimulus, consisting of three spectral components (and their symmetric counterparts in opposite quadrants): the stationary carrier and two sidebands having frequencies equal to that of the carrier plus and minus the envelope spatiotemporal frequency and drifting in opposite directions (at the envelope temporal frequency) (Zhou and Baker 1994). There is no Fourier component at the envelope spatiotemporal frequency.
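The three-component spectrum follows from a product-to-sum identity. As a sketch, assume the contrast-modulated term is a carrier multiplied by a raised-sinusoid envelope (the factor of one-half and the sine phases are assumptions made for illustration); then

```latex
\begin{aligned}
\sin(2\pi f_c x)\,\tfrac{1}{2}\bigl[1 + \sin\bigl(2\pi(f_e x - f_t t)\bigr)\bigr]
 ={}& \tfrac{1}{2}\sin(2\pi f_c x) \\
 &+ \tfrac{1}{4}\cos\bigl(2\pi[(f_c - f_e)x + f_t t]\bigr) \\
 &- \tfrac{1}{4}\cos\bigl(2\pi[(f_c + f_e)x - f_t t]\bigr)
\end{aligned}
```

The first term is the stationary carrier. The second and third terms are the lower and upper sidebands, which carry temporal frequencies of opposite sign and therefore drift in opposite directions. No term has spatial frequency $f_e$, so the envelope frequency is absent from the spectrum regardless of the assumed scaling.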
The stimuli were generated with a 66-MHz 80486 microcomputer, with a VSG 2/2 graphics card (Cambridge Research Systems) and displayed on a NEC XP-17 monitor with a frame refresh rate of 160 Hz, a raster of 512 × 379 pixels, and a mean luminance of 28.6 cd/m2. The luminance nonlinearity of the display was measured with a photometer (United Detector Technology, S-370) and then linearized following the method of Pelli and Zhang (1991) with the VideoToolbox software and an ISR Video Attenuator (Institute for Sensory Research, Syracuse University, NY). The contrast of the luminance gratings was set to 30% and that of the envelope stimuli to 70% unless specified otherwise. Stimulus motion was generated by look-up table (LUT) animation, which, for the envelope stimuli, consisted of digitally multiplying on each frame the profiles of the stationary carrier and the drifting envelope in the host computer. These were then used as indices to the Pelli-Zhang LUTs and downloaded to the graphics card LUTs in real time for each frame.
Single unit signals from area 18 were amplified and isolated with a window discriminator (Frederick Haer) and monitored on a backward-triggered digital storage oscilloscope. Preliminary receptive field mapping was done with a hand projector to determine the location, ocular dominance, preferred orientation, and eccentricity. Subsequently, the monitor was centered over the receptive field, and computer-generated stimuli were presented to the neuron's dominant eye. The centering of the receptive field was confirmed with line-weighting functions (Movshon et al. 1978) or white noise analysis. Test conditions were randomly interleaved, and spontaneous activity measured with an initial “blank time” at the onset of each condition. Spike collection was computer controlled (0.1-ms accuracy) and synchronized with the frame rate of the graphics board.
Drifting luminance-defined sinewave gratings were used to measure the neuron's tuning to spatial frequency, temporal frequency, and orientation. Initial testing with the envelope stimuli was done by setting the envelope spatial frequency to the neuron's optimal luminance spatial frequency. The envelope temporal frequency was set lower than the optimal luminance temporal frequency, and a series of carrier frequencies much higher than the neuron's luminance passband was tested. These initial settings for the envelope parameters were used because, on average, they represented the optimal conditions to probe for envelope-responsive neurons. Subsequently, envelope spatial and temporal frequency response functions were measured independently with the measured optimal carrier spatial frequency.
Poststimulus time histograms (PSTHs) were collected, and neurons were classified as simple if they displayed a strong temporal modulation of response to luminance gratings (Movshon et al. 1978). The PSTHs were integrated to obtain an average spike frequency as a function of the stimulus parameter being varied. Estimates of bandwidth and optimal frequency (spatial or temporal) were obtained by fitting Gaussian functions to the response curves.
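As an illustration of how an optimum and bandwidth can be read from a Gaussian fit, the sketch below fits a Gaussian on a log2 frequency axis by fitting a parabola to the logarithm of the firing rate. The choice of a log2 axis, the log-parabola shortcut (which requires positive rates and weights weak responses differently from a direct nonlinear fit), and the function name are all assumptions; the fitting procedure actually used is not specified here.

```python
import math

def fit_log_gaussian(freqs, rates):
    """Fit rate = A * exp(-(u - mu)^2 / (2 sigma^2)) with u = log2(freq).

    Uses least squares on ln(rate) = a*u^2 + b*u + c.
    Returns (optimal frequency, full width at half height in octaves, peak rate).
    """
    u = [math.log2(f) for f in freqs]
    y = [math.log(r) for r in rates]
    # Power sums for the 3x3 normal equations of the quadratic fit.
    s = [sum(ui ** k for ui in u) for k in range(5)]
    t = [sum(yi * ui ** k for ui, yi in zip(u, y)) for k in range(3)]
    m = [[s[4], s[3], s[2], t[2]],
         [s[3], s[2], s[1], t[1]],
         [s[2], s[1], s[0], t[0]]]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    if a >= 0:
        raise ValueError("tuning curve is not band-pass; no Gaussian peak")
    mu = -b / (2.0 * a)                      # peak position in log2 units
    sigma = math.sqrt(-1.0 / (2.0 * a))      # standard deviation in octaves
    peak = math.exp(c - b * b / (4.0 * a))
    bandwidth_octaves = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma
    return 2.0 ** mu, bandwidth_octaves, peak

# Synthetic tuning curve (not data from the paper): optimum 0.1 cycles/deg,
# sigma 0.6 octaves, peak 40 spikes/s.
test_freqs = [0.025, 0.05, 0.1, 0.2, 0.4]
test_rates = [40.0 * math.exp(-(math.log2(f / 0.1)) ** 2 / (2 * 0.6 ** 2)) for f in test_freqs]
opt_freq, bw_oct, peak_rate = fit_log_gaussian(test_freqs, test_rates)
```

For a Gaussian on a log2 axis, the full width at half height in octaves is 2·sqrt(2·ln 2) times the standard deviation, so a standard deviation of 0.6 octaves corresponds to a bandwidth of about 1.4 octaves.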
In envelope-responsive simple cells displaying a strongly modulated response at the frequency of the stimulus, a measure of visual latency was calculated by Fourier analysis of the PSTHs obtained from the temporal frequency experiments. The phase of the first harmonic was plotted as a function of temporal frequency on linear axes, and the slope of the line fit to the data was taken as a measure of latency (Hamilton et al. 1989; Lee et al. 1981; Saul and Humphrey 1990). The position of the drifting stimuli relative to the cell's receptive field cannot be estimated with drifting gratings, thus introducing an additional offset in the temporal phase. However, because envelope and luminance stimuli were presented in consistent initial spatial phases, any difference in temporal phase between the responses to the two types of stimuli would reflect an underlying difference in the processing properties. Thus we were able to estimate temporal latency but not absolute phase.
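The phase-slope estimate of latency can be sketched as follows. This is a minimal illustration on synthetic responses, assuming each PSTH spans exactly one stimulus period and that consecutive temporal frequencies are close enough for the phase to change by less than half a cycle between them; the function names and the simple unwrapping rule are assumptions, not the analysis code used for the recordings.

```python
import cmath
import math

def first_harmonic_phase(psth):
    """Phase, in cycles, of the first harmonic of a PSTH spanning one stimulus period."""
    n = len(psth)
    z = sum(r * cmath.exp(-2j * math.pi * k / n) for k, r in enumerate(psth))
    return cmath.phase(z) / (2.0 * math.pi)

def unwrap_cycles(phases):
    """Remove whole-cycle jumps so that successive phases differ by less than 0.5 cycle."""
    out = [phases[0]]
    for p in phases[1:]:
        out.append(p - round(p - out[-1]))
    return out

def latency_from_phase(temporal_freqs, phases_cycles):
    """Least-squares slope of phase (cycles) against temporal frequency (Hz).

    A pure delay of d seconds adds a phase of -f*d cycles, so latency = -slope (s).
    """
    n = len(temporal_freqs)
    mf = sum(temporal_freqs) / n
    mp = sum(phases_cycles) / n
    sxy = sum((f - mf) * (p - mp) for f, p in zip(temporal_freqs, phases_cycles))
    sxx = sum((f - mf) ** 2 for f in temporal_freqs)
    return -sxy / sxx

def synthetic_psth(freq_hz, delay_s, n_bins=64):
    """Modulated response delayed by delay_s, sampled over one period (illustrative only)."""
    return [1.0 + math.cos(2.0 * math.pi * (k / n_bins - freq_hz * delay_s))
            for k in range(n_bins)]

freqs = [1.0, 2.0, 3.0, 4.0, 5.0]
raw = [first_harmonic_phase(synthetic_psth(f, 0.2)) for f in freqs]
latency_s = latency_from_phase(freqs, unwrap_cycles(raw))
```

A constant phase offset, such as one introduced by the unknown position of the stimulus relative to the receptive field, changes the intercept of the fitted line but not its slope, which is why the slope can be interpreted as latency even when absolute phase cannot.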
A total of 30 cats were used in these experiments. Of the 128 cells recorded, 59 were envelope responsive, but only 29 could be analyzed completely. Where appropriate, statistical tests were carried out (2-sided t-test) using P = 0.05 as the criterion for significance. Error bars on the graphs for individual neurons represent the standard error of the mean (SE).
Figure 2, A and B, illustrates the responses of two cortical neurons to luminance gratings (filled squares) and to envelope stimuli (open squares). Both neurons gave robust responses to luminance-defined gratings at low spatial frequencies. However, when tested with luminance gratings having higher spatial frequencies similar to the carrier, responses were minimal and showed no tuning. In both panels, the optimum obtained with the envelope stimulus was significantly different from the response obtained with a luminance grating at the carrier spatial frequencies.
The relative strength of responses to envelope stimuli versus luminance-defined stimuli is plotted in Fig. 2 C, where each data point corresponds to a given neuron's optimal response for the two types of stimuli. The solid line depicts the equality ratio, the dashed line represents the one-half strength ratio, and the dotted line represents the quarter strength ratio. For sixty percent of neurons, the response to envelope stimuli was greater than one-half the strength of the response to luminance gratings.
Spatial and temporal tuning to envelope stimuli
Responses of a simple type cell to luminance and envelope stimuli are shown as PSTHs in Fig. 3. When tested with a luminance grating at different temporal frequencies, this neuron gave a modulated response that was strongest at ∼12 Hz (Fig. 3 A, top row). The neuron was only moderately direction selective for luminance gratings, as evidenced by its response to stimuli presented in the nonpreferred direction (Fig. 3 A, bottom row). The cell's response to envelope stimuli was also modulated at the temporal frequency of the contrast modulation (Fig. 3 B, top row) but differed in other respects. The optimal temporal frequency was lower than when measured with luminance gratings; the response was band-pass with an optimum at ∼3 Hz and a clear high-frequency cutoff; and the response was strongly direction selective (Fig. 3 B, bottom row).
To quantitatively compare the spatial and temporal tuning to luminance and envelope stimuli, normalized firing rates measured with these two types of stimuli were plotted for each cell (Fig. 4). Figure 4 A shows the spatial frequency responses, and B shows the temporal frequency responses. The spatial and temporal frequency response curves when tested with luminance-defined stimuli (▪) were bandpass, consistent with previous studies (DeValois et al. 1982; Hawken et al. 1996; Tolhurst and Movshon 1975). When tested with envelope stimuli the neuron's spatial frequency response (Fig. 4, □) was bandpass, approximating the spatial response measured with luminance gratings. The neuron's temporal response displayed a high frequency cutoff with a lower optimal temporal frequency, characteristic of most neurons' responses to the envelope stimuli.
Figure 5 A shows the optimal spatial frequency measured with luminance gratings against that measured with envelope stimuli for 29 neurons. The straight line represents a unity ratio. Neurons were tuned to significantly higher luminance spatial frequencies (average luminance spatial frequency = 0.1 ± 0.05 cycles/deg, average envelope spatial frequency = 0.08 ± 0.04 cycles/deg). Figure 5 B shows the optimal temporal frequencies for the luminance and envelope stimuli, also revealing a significant difference in the temporal tuning to the two types of stimuli (average luminance temporal frequency = 6.57 ± 3.28 Hz, average envelope temporal frequency = 3.78 ± 2.05 Hz).
To further characterize the relationship between temporal and spatial parameters in envelope stimuli, we calculated the optimal velocity for each neuron in Fig. 5 C. This can be estimated by dividing the neuron's optimal temporal frequency by its optimal spatial frequency (Baker 1990). Although the average optimal velocity to luminance stimuli (average = 100.4°/s) was higher than to envelope stimuli (average = 70.95°/s), the difference was not statistically significant.
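The velocity estimate is a simple ratio, sketched below with made-up optima for three hypothetical neurons (the numbers and names are illustrative assumptions, not recorded data). The example also shows that averaging per-neuron velocities is not the same as dividing the average temporal optimum by the average spatial optimum, which may be why the reported mean velocities differ from the ratio of the mean optima (for instance, 6.57 Hz divided by 0.1 cycles/deg is about 66°/s).

```python
def optimal_velocity(temporal_hz, spatial_cpd):
    """Optimal velocity (deg/s) = optimal temporal frequency (Hz) / optimal spatial frequency (cycles/deg)."""
    return temporal_hz / spatial_cpd

# Hypothetical (temporal optimum, spatial optimum) pairs; not data from the paper.
cells = [(8.0, 0.05), (4.0, 0.10), (6.0, 0.20)]

velocities = [optimal_velocity(tf, sf) for tf, sf in cells]
mean_of_ratios = sum(velocities) / len(velocities)

mean_tf = sum(tf for tf, _ in cells) / len(cells)
mean_sf = sum(sf for _, sf in cells) / len(cells)
ratio_of_means = optimal_velocity(mean_tf, mean_sf)
```

Because neurons with low spatial optima contribute very high velocities, the mean of the per-neuron ratios is pulled above the ratio of the means.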
Previous studies with luminance gratings in area 17 (Baker 1990; DeAngelis et al. 1993; Holub and Morton-Gibson 1981) reported a systematic covariation of spatial and temporal frequency tuning such that neurons tuned to lower spatial frequencies prefer higher temporal frequencies. We examined this in area 18 by plotting each neuron's preferred spatial frequency against its preferred temporal frequency for luminance gratings (Fig. 6 A) and envelope stimuli (Fig. 6 B). The straight lines represent log–log regression fits that have slopes of −0.46 ± 0.25 for Fig. 6 A (Pearson r correlation of −0.4) and −0.104 ± 0.36 for Fig. 6 B (r = −0.06). The relationship between low spatial frequency and high temporal frequency holds for area 18 neurons when using luminance gratings but not when envelope stimuli are employed.
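A log-log regression of this kind can be sketched as below. The example assumes that preferred temporal frequency is regressed on preferred spatial frequency after taking base-10 logarithms of both; which variable was treated as dependent is an assumption here, and the function name and the synthetic data are illustrative only.

```python
import math

def loglog_fit(spatial_cpd, temporal_hz):
    """Slope and Pearson r of log10(temporal frequency) against log10(spatial frequency)."""
    xs = [math.log10(s) for s in spatial_cpd]
    ys = [math.log10(t) for t in temporal_hz]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    return slope, r

# Synthetic population obeying an exact power law, tf = 2 * sf^(-0.5).
sf_opt = [0.03, 0.05, 0.08, 0.12, 0.2]
tf_opt = [2.0 * s ** -0.5 for s in sf_opt]
slope, r = loglog_fit(sf_opt, tf_opt)
```

A negative slope with a sizeable negative correlation corresponds to the covariation described for luminance gratings, whereas a slope and correlation near zero, as reported for the envelope stimuli, indicate no systematic relationship.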
One concern that may arise from comparing the temporal properties of neurons with first- and second-order stimuli is the role of contrast. It has been shown that reducing the contrast of a stimulus may lower the preferred temporal frequency for a neuron (Albrecht 1995; Hawken et al. 1997). In our experiments, the strength of a neuron's response to the envelope stimuli was usually lower than to luminance stimuli, possibly because of the envelope stimuli being less efficient (e.g., having a lower effective contrast). Although this would not account for the temporal differences measured for neurons whose strength of response was the same to both types of stimuli (e.g., Fig. 3), we tested the possibility of contrast biasing our temporal results by measuring temporal frequency responses with sinewave gratings at a series of contrasts on six cells, three of which are shown in Fig. 7. Figure 7 A plots temporal frequency tuning curves for one neuron with the optimal sinewave grating at three different contrast levels (▪) and with the envelope stimulus (□). When the stimuli were equated for effective contrast (taken as the grating contrast at which the response amplitude matched that obtained with envelope stimuli) the optimal temporal frequency for envelope stimuli (4 Hz) was still lower than that obtained with the sinewave grating (6 Hz). The same pattern can be seen in Fig. 7, B and C, where the reduced contrast gratings matched the effective strength of the envelope stimuli, yet their preferred temporal frequencies were still higher. Although contrast can shift the preferred temporal frequencies of neurons it is not sufficient to account for the differences between envelope and luminance stimuli.
Measurements of bandwidth reflect a neuron's response range and provide information about its selectivity. Figure 8 shows histograms of measured spatial and temporal bandwidths, taken as full width at one-half height, for luminance gratings (A and B), and envelope stimuli (C and D). For the luminance data, the mean spatial frequency bandwidth was 1.43 ± 0.59 octaves, and the mean temporal frequency bandwidth was 2.2 ± 0.8 octaves. For the envelope data, the mean spatial frequency bandwidth was 1.39 ± 0.86 octaves, and the mean temporal frequency bandwidth was 1.84 ± 0.94 octaves. The luminance data are consistent with previous findings in that the temporal frequency bandwidths are significantly broader than the spatial bandwidths (Holub and Morton-Gibson 1981; Tolhurst and Movshon 1975). Although the average temporal frequency bandwidths for the envelope data were broader than those for spatial frequency, these differences were not statistically significant. The luminance spatial and temporal bandwidths were not significantly different from the envelope spatial and temporal bandwidths.
Envelope spatiotemporal separability
To test whether the measured optimal envelope spatial frequency depended on the temporal frequency at which it was measured, we obtained spatial frequency tuning curves at a series of test temporal frequencies. This was carried out on seven neurons, six of which are presented in Fig. 9. For the cell in Fig. 9 A, five different temporal frequencies, spaced an octave apart, were tested. Data from preferred and preferred-minus-null directions were similar, so only data for the preferred direction are shown. At 2, 4, and 8 Hz, there was little variation in the optimal spatial frequency (∼0.07–0.08 cycles/deg). At both 1 and 16 Hz, the neuron's response was very weak and the tuning was quite broad, making it difficult to estimate an optimal spatial frequency. Figure 9, B–D, shows similar data for three other neurons and demonstrates that the peak spatial frequency is relatively invariant over the temporal frequencies used for testing, except in D, where an optimal spatial frequency could not be measured at 2.8 Hz.
Figure 9, E and F, depicts envelope temporal frequency tuning at a series of fixed envelope spatial frequencies for two additional neurons. The data from Fig. 9 E show very little variation over the three different spatial frequencies tested. For the neuron in Fig. 9 F, there is more variation in the measured optimal temporal frequency, mainly because of measurements obtained at 0.2 cycles/deg.
A more quantitative index of envelope spatiotemporal separability was obtained by plotting the measured frequency optima against the fixed test frequencies. Figure 10 A shows the measured optimal spatial frequency against the test temporal frequencies for five neurons (neuron F2507 was shifted vertically for clarity). If a neuron demonstrated spatiotemporal separability to the envelope stimuli, a linear function fit to the data would have a slope of zero. Figure 10 B plots the data in a similar manner for the two neurons of Fig. 9, E and F, but with the measured optimal temporal frequency plotted against the test spatial frequency. Statistical analysis reveals that only F1904 (Fig. 9 D) shows a slope significantly different from zero. Although some neurons do show deviations from strict separability, the impact on measured optima in Figs. 5 and 6 is minimal.
A different characterization of a neuron's temporal processing is obtained by measuring the latency of its response, or integration time (Hamilton et al. 1989; Lee et al. 1981; Saul and Humphrey 1990). We measured latency on four simple cells displaying a strongly modulated response to both envelope stimuli and sinewave gratings (Fig. 11). The results for luminance gratings are shown with filled symbols, and those for envelope stimuli with open symbols. In Fig. 11 A, the linear regression had a slope of 320 ± 3.8 ms for the luminance grating and 570 ± 8.2 ms for the envelope stimulus. The results for the other cells were 180 ± 1.0 and 350 ± 4.2 ms (Fig. 11 B), 155 ± 5.8 and 600 ± 11.6 ms (C), and 75 ± 0.5 and 200 ± 5.5 ms (D) for the luminance grating and envelope stimuli, respectively. Although the low proportion of modulated simple cells responding to the envelope stimuli limited our sample size, these data indicate for each neuron a significantly longer integration time for the processing of the envelope stimuli.
Previous studies (Zhou and Baker 1994, 1996) characterized the carrier spatial frequency selectivity of envelope-responsive neurons in areas 17 and 18 of the cat; however, because of limitations in their graphics display, the spatial frequency tuning of the envelope could rarely be fully measured. Here we have shown that cortical neurons' responses to the stimuli were contingent on both the spatial and temporal frequencies of the envelope modulation falling within a narrow range. For both parameters, the optimal frequency was significantly lower than that measured with luminance gratings. In addition, the latency measured in simple cells was always longer for envelope stimuli (2- to 3-fold).
Spatial and temporal differences in the processing of luminance gratings and envelope stimuli
Envelope responses are contingent on the spatial frequency of the carrier varying from 5 to 36 times the optimal luminance spatial frequency (Zhou and Baker 1994, 1996). Here we show that responses also depend on the envelope spatial and temporal frequencies being lower than those measured with luminance gratings.
The longer latencies required in the processing of envelope stimuli compared with the luminance gratings were another important difference. Latency gives an estimate of the delay in processing of visual information and was measured in cat lateral geniculate nucleus and area 17 (Saul and Humphrey 1990, 1992) and monkey V1 (Hamilton et al. 1989; Hawken et al. 1996). Our latency estimates for luminance stimuli were on the higher end of the range obtained by Saul and Humphrey, but this may simply reflect differences between processing speeds in area 18 compared with area 17. Whether the substantially longer latencies estimated for the envelope stimuli can be attributed to a specific component of the processing stream cannot be determined from our results.
Contrast has been shown to affect the perceived speed of stimuli (Ledgeway and Smith 1995; Thompson 1982) as well as modify neurons' temporal frequency response (Albrecht 1995; Hawken et al. 1997). For example, second-order stimuli generally elicit weaker responses in neurons, which might be interpreted as reflecting a lower effective contrast. Albrecht (1995) measured neurons' temporal properties for luminance gratings at different contrast levels and found that temporal phase, temporal latency, and temporal frequency tuning curves (peak and bandwidths) were shifted by varying the stimulus contrast. For example, halving contrast could shift the optimal temporal frequency approximately one octave and increase latency ∼45 ms. Although we cannot rule out a role of contrast in the neurons' responses to the envelope stimuli, the magnitude of its effects is not sufficient to account for the differences in temporal frequency dependence for the two kinds of stimuli.
Spatial and temporal similarities in the processing of luminance gratings and envelope stimuli
Despite the differences in spatial and temporal tuning, certain characteristics in the neurons' responses were invariant to the type of stimulus used. All envelope-responsive neurons demonstrated bandpass tuning to both luminance and envelope stimuli, often covering a similar frequency range and having similar bandwidths. Such tuned responses support the idea that envelope-responsive neurons may represent a selective, specialized mechanism of information processing.
The preferred direction of motion (Zhou and Baker 1994) or orientation (Mareschal and Baker 1998) for a neuron was stimulus invariant, although the relative strength of the directional response often differed (usually stronger for the luminance stimuli). Similar preferred direction of motion between luminance- and contrast-defined stimuli (“form cue invariance”) was also reported in the monkey (Albright 1992), with different second-order stimuli. The finding that envelope-responsive neurons maintain fundamental components of their behavior (directionality, separability, bandpass spatiotemporal response) implies that neurons may rely on second-order cues in addition to or in place of luminance cues when these are absent or unreliable.
Relationship to earlier psychophysical and physiological research
Psychophysical studies examining second-order stimuli suggest processing by multiple band-pass spatial channels (e.g., Badcock and Derrington 1985, 1989; Henning et al. 1975; Nishida et al. 1997). However, studies of temporal processing of second-order stimuli yielded conflicting results. First- and second-order stimuli appeared to drift at the same speed when they were equated for visibility (Ledgeway and Smith 1994). Lu and Sperling (1995) proposed that the second-order processing mechanism was as fast and as sensitive to high temporal frequencies as the first-order, luminance-based mechanism. However, other experiments suggested a slower nonlinear stream for both motion processing (Derrington 1994; Werkhoven and Boulton 1994; Wilson et al. 1992) and texture segregation (Graham et al. 1992; Sutter and Graham 1995).
Physiological studies reveal that neurons responding to different types of second-order stimuli (e.g., envelope stimuli, texture-defined stimuli, short-range illusory contours, or abutting gratings) display relatively selective tuning to the second-order stimulus attributes (Albright 1992; Grosof et al. 1993; O'Keefe and Movshon 1996; Sheth et al. 1996; von der Heydt et al. 1984; Zhou and Baker 1993, 1994, 1996). Whether these different types of second-order stimuli are processed via the same nonlinear mechanism remains unknown.
Implications for models of second-order motion processing
Most models of second-order motion processing are based on psychophysical results and posit two streams of information processing, one for luminance defined stimuli and one for second-order stimuli (Graham et al. 1992; Ledgeway and Smith 1994; Nishida et al. 1997; Wilson et al. 1992). The outputs from these two streams are thought to be combined at a later stage (area MT in primate) (Wilson et al. 1992). However, two other types of second-order models were proposed.
In the first type, there are two streams of information processing whose outputs are combined at the level of the second-stage filter. In this scheme, there is one filter that is both the second-stage filter of the nonlinear stream and the luminance filter (Henning et al. 1975). This model differs from the two-stream model in two ways: there are no neurons that respond exclusively to second-order stimuli (“nonlinear-only”), and a neuron's spatiotemporal tuning to a luminance grating is similar to its tuning to the envelope. We never found nonlinear-only neurons; however, this may be entirely because of our search stimulus, which consists of luminance bars or sinewave gratings and would preclude finding nonlinear-only neurons. Although we found no nonlinear-only neurons, the significant differences in the spatiotemporal response characteristics to luminance and envelope stimuli that we report are difficult to reconcile with this model.
In the second type, there is only one stream of motion processing for both first-order luminance-defined stimuli and second-order stimuli (Johnston et al. 1992). This model is based on spatiotemporal filters that calculate luminance gradients over the image to estimate motion. This model can accurately extract a motion signal from the second-order stimulus; however, this is achieved independently of the carrier content (spatial frequency and orientation). Like the “early-nonlinearity” one-stream model, this model predicts that envelope responses do not depend on the two-dimensional spatial characteristics of the carrier, a finding that is in discord with the physiology (Mareschal and Baker 1998; Zhou and Baker 1993–1995).
In light of the previous discussion, we suggest a two-stream model for motion processing to account for our results. Despite its requirement for nonlinear-only neurons, this model accounts for the differences in processing to first- and second-order stimuli while remaining simple to implement. The luminance stream consists of linear spatiotemporal filters responding to luminance-defined stimuli. The nonlinear stream consists of an initial filter (tuned to the carrier spatial frequency) whose output is subjected to a nonlinearity and then processed by a second filter (tuned to the envelope spatiotemporal frequency) (Zhou et al. 1994, 1996). The envelope spatial and temporal selectivity in this model arises at the level of the second filter. The source of the longer processing latencies could arise from sluggish geniculate precursors, such as the longer latency W cells (Irvin et al. 1986) or lagged cells (Mastronarde et al. 1991; Saul and Humphrey 1990). Alternatively, slow feedback mechanisms might be involved in the production, or modulation, of neurons' responses to envelope stimuli (Lamme 1995).
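The filter, nonlinearity, filter sequence of the nonlinear stream can be illustrated with a purely spatial toy simulation. In this sketch the first-stage filter is a Gaussian band-pass gain applied in the frequency domain, the nonlinearity is squaring, and the second stage simply measures amplitude at the envelope frequency; these choices, the parameter values, and the function names are all assumptions made for illustration, since the model as described does not commit to a particular nonlinearity or filter shape.

```python
import cmath
import math

def envelope_profile(n, carrier_cycles, envelope_cycles, contrast=0.7):
    """Contrast-modulated carrier with the mean luminance removed, sampled at n points."""
    return [contrast * math.sin(2 * math.pi * carrier_cycles * i / n)
            * 0.5 * (1 + math.sin(2 * math.pi * envelope_cycles * i / n))
            for i in range(n)]

def dft(x, sign=-1):
    """Naive discrete Fourier transform (sign=-1) or unnormalized inverse (sign=+1)."""
    n = len(x)
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * i / n) for i, v in enumerate(x))
            for k in range(n)]

def stage1_filter(signal, centre_cycles, sigma_cycles):
    """First-stage linear filter: Gaussian band-pass gain centred on the carrier frequency."""
    n = len(signal)
    spectrum = dft(signal)
    gained = [c * math.exp(-((min(k, n - k) - centre_cycles) ** 2) / (2 * sigma_cycles ** 2))
              for k, c in enumerate(spectrum)]
    return [v.real / n for v in dft(gained, sign=+1)]

def amplitude(signal, cycles):
    """Amplitude of the component with the given number of cycles per window."""
    n = len(signal)
    z = sum(v * cmath.exp(-2j * math.pi * cycles * i / n) for i, v in enumerate(signal))
    return 2 * abs(z) / n

def nonlinear_stream(signal, carrier_centre, sigma, envelope_cycles):
    """Filter (carrier band) -> squaring nonlinearity -> read-out at the envelope frequency."""
    filtered = stage1_filter(signal, carrier_centre, sigma)
    rectified = [v * v for v in filtered]
    return amplitude(rectified, envelope_cycles)

n = 256
matched = envelope_profile(n, carrier_cycles=32, envelope_cycles=2)
mismatched = envelope_profile(n, carrier_cycles=96, envelope_cycles=2)

linear_only = amplitude(matched, 2)                       # no filter or nonlinearity
response_matched = nonlinear_stream(matched, 32, 6.0, 2)  # carrier inside first-stage passband
response_mismatched = nonlinear_stream(mismatched, 32, 6.0, 2)  # carrier outside passband
```

In this toy version, a linear read-out at the envelope frequency gives essentially nothing, the full nonlinear stream gives a clear response when the carrier falls within the first-stage passband, and the response vanishes when the carrier lies outside it. That pattern mirrors the carrier selectivity that an early nonlinearity applied before any filtering would not produce.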
It is important to note that in the neuronal implementation of this type of model, the structure, location, and connections (feedforward, feedback, or lateral connections) between the neurons at different stages are left open. The notion of “streams” in this model is not intended to correspond to segregated cortical areas (although we cannot rule out this possibility). Instead, the term is used loosely to refer to connections between two different sets of neurons. The model is simply intended to provide a simple, unified working framework and is not to be taken as a definitive biological mechanism. In addition, this model does not imply that information in the two streams be independently accessible (e.g., that the neuron's responses are “labeled” as first-order or second-order, based for example on differences in firing rate).
Independent of the model involved and its anatomic structure, we propose that the function of neurons responding to second-order stimuli could be to encode visual information in situations where luminance cues are absent or unreliable. In the natural environment, second-order cues (particularly contrast) might be useful to break camouflage of textured objects or handle transparency (Daugman and Downing 1995; Derrington and Henning 1993).
We are grateful to S. Dakin for helpful comments on this manuscript. We thank K. Charles and M. Moskovich for contributions to the computer software, L. Domazet for technical assistance, and Rhone-Poulenc Rorer for generous donation of Gallamine Triethiodide.
This research was supported by Canadian Medical Research Council Grant MA 9685 to C. L. Baker, Jr. and a Fonds pour la Formation de Chercheurs et l'Aide à la Recherche fellowship to I. Mareschal.
Address for reprint requests: I. Mareschal, McGill Vision Research Unit, 687 Pine Ave. West (H4-14), Montreal PQ, H3A 1A1 Canada.
Copyright © 1998 the American Physiological Society