Abstract
Most models of disparity selectivity consider only the spatial properties of binocular cells. However, the temporal response is an integral component of real neurons' activities, and time-varying stimuli are often used in experiments on disparity tuning. To understand the temporal dimension of V1 disparity representation, we incorporate a specific temporal response function into the disparity energy model and demonstrate that the binocular interaction of complex cells is separable into a Gabor disparity function and a positive time function. We then investigate how the model simple and complex cells respond to widely used time-varying stimuli, including motion-in-depth patterns, drifting gratings, moving bars, moving random-dot stereograms, and dynamic random-dot stereograms. We find that both model simple and complex cells show more reliable disparity tuning to time-varying stimuli than to static stimuli, but similarities in the disparity tuning between simple and complex cells depend on the stimulus. Specifically, the disparity tuning curves of the two cell types are similar to each other for either drifting sinusoidal gratings or moving bars. In contrast, when the stimuli are dynamic random-dot stereograms, the disparity tuning of simple cells is highly variable, whereas the tuning of complex cells remains reliable. Moreover, cells with similar motion preferences in the two eyes cannot be truly tuned to motion in depth regardless of the stimulus type. These simulation results are consistent with a large body of extant physiological data, and provide some specific, testable predictions.
INTRODUCTION
Numerous physiological studies have documented disparity-tuned cells in V1 (Barlow et al. 1967; Freeman and Ohzawa 1990; Poggio and Poggio 1984). To understand the mechanism of tuning, many researchers have also investigated how the disparity responses of a cell may be explained by the underlying binocular receptive field (RF) structure. Since disparity is a spatially defined property, nearly all stereo models are based solely on spatial considerations and leave out the temporal dimension as irrelevant. Specifically, most models (Fleet et al. 1996; Nomura et al. 1990; Ohzawa et al. 1990; Qian 1994; Sanger 1988; Zhu and Qian 1996) only consider how the spatial RFs of binocular cells may respond to static stimuli and generate the physiologically observed disparity tuning curves, such as the tuned, near, and far types found in V1 (Poggio and Fischer 1977; Poggio et al. 1988). However, the spatial and temporal response properties always come together for real neurons. More importantly, physiological studies of disparity tuning often use time-varying stimuli such as motion-in-depth patterns, drifting gratings, moving bars, moving random-dot stereograms, or dynamic random-dot stereograms in addition to static images. To fully understand these data, the temporal response properties of cortical cells must be considered.
There is also a functional reason to include time in stereo modeling: consistent with the physiological finding that many visual cortical cells are tuned to both disparity and motion (Bradley et al. 1995; Maunsell and Van Essen 1983; Ohzawa et al. 1996), there is increasing psychophysical evidence indicating that motion and stereo interact with each other in generating our perception (Anstis and Harris 1974; Nawrot and Blake 1989; Qian et al. 1994a; Regan and Beverley 1973). We have already proposed a model for motion-stereo integration based on the general properties of binocular, spatiotemporal RFs of visual cortical cells (Qian 1994; Qian and Andersen 1997; Qian et al. 1994b). However, we did not explicitly model the disparity tuning curves of cortical cells to specific time-varying stimuli. In this paper, we first present a simple function that conveniently describes the temporal response profiles of real V1 cells and incorporate this function into the disparity energy model (Ohzawa et al. 1990; Qian 1994). We then apply the model to investigate V1 disparity responses to a variety of time-varying stimuli used in physiological experiments. Some of the results were reported previously in abstract form (Chen et al. 2000).
METHODS
It is well established that the spatial RFs of V1 simple cells can be accurately fit by Gabor functions (Daugman 1985; Jones and Palmer 1987; Marčelja 1980; Ohzawa et al. 1990). Since we are concerned with disparity tuning instead of orientation tuning in this paper, we only consider vertically oriented binocular cells whose left and right RFs are given by (DeAngelis et al. 1991; Ohzawa et al. 1990, 1996)
Unlike the spatial RFs, the temporal response of cortical cells is not Gabor-like (DeAngelis et al. 1993a, 1999; Ohzawa et al. 1996). We examined the temporal profiles of real V1 cells and found that they can be conveniently described by an envelope of the gamma probability density function, multiplied by a sinusoidal modulation
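The verbal description above (a gamma-density envelope multiplied by a sinusoid) can be sketched numerically. This is only an illustration under stated assumptions, not the paper's Eq. 3: the function name `temporal_response`, the parameter names `alpha`, `tau`, `omega_t`, and `phi_t`, the cosine carrier, and all default values are hypothetical choices.

```python
import math

def temporal_response(t, alpha=2.5, tau=0.02, omega_t=2 * math.pi * 4.0, phi_t=0.0):
    """Gamma-density envelope times a cosine carrier (assumed form).

    t is in seconds; the response is taken to be zero for t <= 0 (causal).
    alpha (shape) and tau (scale, s) set the envelope; omega_t (rad/s) and
    phi_t (rad) set the sinusoidal modulation. All defaults are illustrative.
    """
    if t <= 0.0:
        return 0.0
    envelope = (t ** (alpha - 1.0) * math.exp(-t / tau)
                / (math.gamma(alpha) * tau ** alpha))
    return envelope * math.cos(omega_t * t + phi_t)

def envelope_peak_time(alpha, tau):
    """Time at which the gamma envelope peaks (its mode), valid for alpha > 1."""
    return (alpha - 1.0) * tau

if __name__ == "__main__":
    for ms in range(0, 201, 20):
        print(ms, "ms:", temporal_response(ms / 1000.0))
```

Setting `omega_t` to zero leaves only the envelope, which gives a sustained (low-pass) profile, whereas a nonzero `omega_t` gives a more transient (band-pass) profile.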
The frequency tuning of Eq. 3 is determined by its Fourier transform, which can be calculated analytically as
The temporal function h(t) can then be combined with the spatial function g(x, y) to model three-dimensional spatiotemporal RFs of simple cells (Adelson and Bergen 1985; Watson and Ahumada 1985). For binocular simple cells, this can be done for the left and right RFs separately
The response of simple cells to a stereo image pair I_{l}(x, y, t) and I_{r}(x, y, t) can be approximated by linear spatiotemporal filtering (DeAngelis et al. 1993b; Jones and Palmer 1987; Ohzawa et al. 1990), followed by half-squaring (Anzai et al. 1999a,b; Heeger 1992)
Under the assumption that the RF size is much larger than the horizontal disparity D of the stimulus, it can be shown that the simple cell response is approximately (see appendix)
We model complex cell responses using the well-known quadrature-pair method for disparity energy computation (Adelson and Bergen 1985; Emerson et al. 1992; Ohzawa et al. 1990; Pollen 1981; Qian 1994; Watson and Ahumada 1985). The complex cells derive both their spatial and temporal properties from the constituent simple cells. Because of the half-wave rectification contained in the half-squaring operation for each complex cell, we need to sum the responses of four simple cells (Ohzawa et al. 1990), all with identical φ_{−} but with their φ_{+}/2 differing in steps of π/2. (This is exactly equivalent to summing the squared responses of two simple cells without the half-squaring.) The resulting complex cell response is approximately
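The equivalence stated in parentheses above can be checked numerically. The sketch below is a toy illustration, not the model itself: it assumes that each simple cell's linear filter output has the form A cos(ψ + p), where p is the cell's phase offset, and the function names are hypothetical.

```python
import math

def half_square(x):
    """Half-wave rectification followed by squaring."""
    return max(x, 0.0) ** 2

def linear_response(amplitude, psi, phase_offset):
    """Hypothetical stand-in for one simple cell's linear filter output."""
    return amplitude * math.cos(psi + phase_offset)

def complex_from_four_half_squared(amplitude, psi):
    """Sum of four half-squared simple cells whose phases step by pi/2."""
    offsets = (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)
    return sum(half_square(linear_response(amplitude, psi, p)) for p in offsets)

def complex_from_two_squared(amplitude, psi):
    """Sum of two fully squared simple cells in quadrature (no rectification)."""
    return sum(linear_response(amplitude, psi, p) ** 2 for p in (0.0, math.pi / 2))

if __name__ == "__main__":
    for psi in (0.0, 0.7, 2.1, 4.0):
        print(psi,
              complex_from_four_half_squared(1.5, psi),
              complex_from_two_squared(1.5, psi))
```

The cells at offsets 0 and π respond with opposite signs, so exactly one of them survives rectification and contributes the full squared response; the same holds for the π/2 and 3π/2 pair. Both sums therefore equal A² for any ψ.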
Previously, we pointed out that for both physiological and computational reasons, a spatial pooling step should be added after the quadrature-pair construction to better simulate complex cell responses (Qian and Zhu 1997; Zhu and Qian 1996). We add this step for modeling complex cell responses to the random-dot type of stimuli, as such pooling significantly improves the reliability of disparity tuning (Fleet et al. 1996; Qian and Zhu 1997; Zhu and Qian 1996). The pooling step is omitted for bar and grating stimuli because it does not make any difference for those stimuli. The weighting function for the spatial pooling is a normalized, circularly symmetric two-dimensional Gaussian with a σ equal to σ_{x} in Eqs. 1 and 2.
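A minimal sketch of such a pooling stage follows, assuming the complex cell responses are sampled on a square grid. The grid extent, the sampling step, and the function names are illustrative assumptions rather than values from the simulations.

```python
import math

def gaussian_pooling_weights(sigma, half_width, step):
    """Normalized, circularly symmetric 2-D Gaussian weights on a square grid.

    sigma, half_width, and step share the same spatial unit (e.g., degrees).
    The weights are normalized so that they sum to 1 over the grid.
    """
    n = int(round(half_width / step))
    coords = [i * step for i in range(-n, n + 1)]
    raw = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma)) for x in coords]
           for y in coords]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]

def pool(responses, weights):
    """Weighted sum of a grid of complex cell responses."""
    return sum(r * w
               for r_row, w_row in zip(responses, weights)
               for r, w in zip(r_row, w_row))

if __name__ == "__main__":
    w = gaussian_pooling_weights(sigma=1.0, half_width=3.0, step=0.5)
    flat_map = [[2.0] * len(w) for _ in w]
    print(pool(flat_map, w))  # a uniform response map is left unchanged
```

Because the weights are normalized, pooling leaves a spatially uniform response unchanged, which is consistent with the statement that the step makes no difference for bar and grating stimuli whose complex cell responses do not depend on RF position.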
RESULTS
Binocular interaction RFs of complex cells
Equations 5 and 6 can be used to model simple cells' binocular, spatiotemporal RFs (results not shown), which are first-order kernels of the white noise analysis (Adelson and Bergen 1985; Anzai et al. 1999a; DeAngelis et al. 1999; Ohzawa et al. 1996). One cannot obtain similar first-order RFs for complex cells because complex cells do not have separated on and off subregions. However, as Ohzawa, DeAngelis, and Freeman (1997) have shown, real complex cells have well-defined binocular interaction RFs, which are the impulse response functions obtained by flashing a line at the preferred orientation at time t at locations x_{l} and x_{r} in the two eyes, respectively. It is a first-order temporal and second-order spatial kernel. Previously, Ohzawa et al. (1997) have modeled the second-order spatial kernel. Here we add the time variable and compare our simulations with the experimental data.
It can be shown that the binocular interaction RF defined by Ohzawa et al. (1997) for a complex cell can be written as (see appendix)
Equation 13 is plotted in Fig. 2 for four model complex cells. The time-integrated tuning curves are also shown at the bottom of each panel, indicating that these cells are tuned-excitatory (TE), tuned-inhibitory (TI), near (NE), and far (FA) types, respectively, according to Poggio's classification. The disparity-time separability in Eq. 13 is clearly exhibited in the figure for both the nondirectional cell (η = 0, Fig. 2 A) and the strongly directional cell (η = 1, Fig. 2 B).
Another feature in Fig. 2 is that the D − T profiles of nondirectional or weakly directional complex cells (Fig. 2, A and C) have two peaks along the time axis, while strongly directional complex cells (Fig. 2, B and D) are unimodal over time. This originates from Eq. 15. When the directional factor η = 0, the complex cell temporal response function becomes
Motion in depth
When an object is moving toward or away from an observer, the binocular disparity of the object changes over time, and the motion speeds or directions in the two eyes are different. The fact that the disparity tuning of complex cells does not vary with time (Fig. 2) implies that these cells are not tuned to motion in depth (Ohzawa et al. 1997; Qian 1994; Qian and Andersen 1997). Consistent with this, most V1 cells have the same motion preference for the two eyes, and give the strongest response to the frontoparallel motion at the preferred disparity (Ohzawa et al. 1996, 1997; Poggio and Talbot 1981). In addition, Maunsell and Van Essen (1983) reported that no MT (V5) cells were found to be truly tuned for motion in depth when the motion trajectories of the stimuli were properly positioned (see following text).
We have simulated motion-in-depth tuning curves under a variety of conditions (Figs. 3–5). The format of each plot in each figure is identical to that used by Maunsell and Van Essen (1983). Twelve motion trajectories, represented “around the clock,” were considered for each tuning curve. The 0 and 180° paths represent the rightward and leftward motions, respectively, in a frontoparallel plane; the 90 and 270° paths represent motions straight away from and toward the observer, respectively. The remaining eight trajectories represent intermediate, oblique paths in depth. Maunsell and Van Essen (1983) pointed out that to properly assess the motion-in-depth tuning, the midpoints of all trajectories should meet at a point with the preferred disparity of the cell. In this case, the 0 and 180° trajectories are on the cell's preferred disparity plane if it exists.
The 12 trajectories for the moving stimuli are specified by the horizontal speeds for the two eyes (Maunsell and Van Essen 1983). Starting from the 0° path and going counterclockwise, the 12 speed pairs for the left and right eyes used in our simulations are (1.8, 1.8), (0.6, 1.8), (−0.6, 1.8), (−1.8, 1.8), (−1.8, 0.6), (−1.8, −0.6), (−1.8, −1.8), (−0.6, −1.8), (0.6, −1.8), (1.8, −1.8), (1.8, −0.6), and (1.8, 0.6), in deg/s.
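The speed pairs can be generated rather than typed in, which makes their regular structure explicit. The sketch below assumes that the pairs step uniformly around the perimeter of a square in (left-eye, right-eye) velocity space, three steps per side, with one trajectory every 30°. The function name and the rounding to six decimals are illustrative choices.

```python
def motion_in_depth_speed_pairs(v=1.8):
    """Return 12 (angle_deg, v_left, v_right) tuples, speeds in deg/s.

    Starts at the 0 deg path (both eyes at +v) and walks around a square
    in (v_left, v_right) space, three equal steps per side.
    """
    step = 2.0 * v / 3.0
    # Per-side direction of change for (v_left, v_right).
    sides = [(-1, 0), (0, -1), (1, 0), (0, 1)]
    left, right = v, v
    pairs = []
    for d_left, d_right in sides:
        for _ in range(3):
            angle = 30 * len(pairs)
            pairs.append((angle, round(left, 6), round(right, 6)))
            left += d_left * step
            right += d_right * step
    return pairs

if __name__ == "__main__":
    for angle, v_left, v_right in motion_in_depth_speed_pairs():
        print(f"{angle:3d} deg: left {v_left:+.1f}, right {v_right:+.1f}")
```

In this layout the two frontoparallel paths (0 and 180°) have equal velocities in the two eyes, and the two pure depth paths (90 and 270°) have equal and opposite velocities.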
MOVING BARS.
Figure 3 shows the results for a directional simple cell (A) and the corresponding complex cell (B) in response to a moving bar stimulus. The two rows are for the cases without and with a threshold term in Eq. 7, respectively. Since both the left and right RFs of the model cells prefer leftward motion, it is not surprising that the tuning curves are peaked in the left, frontoparallel direction, indicating that these cells are not tuned to motion in depth. We have also performed simulations with nondirectional model cells (results not shown). In this case, the tuning curves usually had two peaks pointing at 0 and 180° directions, and for simple cells, there were additional, smaller peaks at 90 and 270° directions, again indicating the absence of motion-in-depth tuning. These results are consistent with the physiological data for the majority of visual cortical cells (Maunsell and Van Essen 1983; Poggio and Talbot 1981). The inclusion of a threshold term (2nd row) makes the tuning curves sharper because it suppresses small responses from the nonpreferred paths. This could explain some sharp tuning curves found experimentally (Maunsell and Van Essen 1983; Poggio and Talbot 1981).
Although most cortical cells are like those shown in Fig. 3, preferring frontoparallel motion with fixed disparity, there is evidence that some cells in areas V1 and V2 are tuned to motion toward or away from the observer (Cynader and Regan 1978; Poggio and Talbot 1981). However, cells preferring frontoparallel motion may appear to be tuned to motion in depth if the midpoints of the stimulus trajectories meet at a point outside the preferred disparity plane (Maunsell and Van Essen 1983). Under this condition, the 0 and 180° trajectories are not in the cell's preferred disparity plane and thus may not evoke the strongest responses. By contrast, the cell may be most excited by the oblique depth path that happens to have the best overlap with the preferred disparity plane. The tuning curves under this “off-preferred-plane” situation for the same simple and complex cells in Fig. 3 are shown in the top row of Fig. 4. Here, the midpoints of all paths meet at a point with a disparity of −0.04° while the cells' preferred disparity is 0.04°. As predicted by Maunsell and Van Essen (1983), now the cells appear to prefer motion along oblique paths in depth. Thus some cells may appear tuned to motion in depth simply because of the improper choice of the test paths in an experiment. However, this possibility does not rule out the existence of cortical cells that are truly tuned to motion in depth. These cells should have different preferred directions or speeds in the two eyes (Cynader and Regan 1978; Poggio and Talbot 1981) and can thus show motion-in-depth tuning even when the stimulus paths are properly chosen. Our simulation results for a simple and a complex cell preferring opposite directions of motion in the two eyes are shown in the bottom row of Fig. 4. The cells are tuned to motion straight away from the observer. Unlike the cells in the top row, these true motion-in-depth cells have a single prominent peak in their tuning curves.
RANDOM-DOT STEREOGRAMS.
We have also simulated motion-in-depth tuning curves of the same simple and complex cells in Fig. 3 (with threshold) to coherently moving random-dot stereograms (MRDSs) and dynamic random-dot stereograms (DRDSs), and examined the effect of spatial pooling (see methods) for the complex cell responses. The dots of an MRDS are all on the same disparity plane at a given time and the whole plane moves along each of the 12 motion paths mentioned in the preceding text. Each MRDS is large enough so that it covers the cells' RFs at all times without the edge effect. A DRDS is identical to the corresponding MRDS in terms of disparity change over time, but the dot positions are randomly replotted for each frame. To investigate the reliability of the tuning curves, we simulated two tuning curves for each case, with two sets of independently generated MRDSs or DRDSs. The results are shown in Fig. 5. It can be seen that the tuning for MRDSs is very similar to that for moving bars (Fig. 3), except that the curves are narrower because there are more weak responses for MRDSs than for moving bars that are suppressed by the threshold. The curves for DRDSs, on the other hand, are quite different. First, because DRDSs, by definition, can only have disparity changes over time, but no directions of motion, the tuning curves are symmetrical with respect to the 90–270° axis. This is independent of the direction selectivity of the cell. Second, the two curves from the two independent simulations are very different from each other for the simple cell but are quite similar to each other for the complex cell with spatial pooling. This indicates that complex cells have more reliable tuning to DRDSs than do simple cells. Finally, the tuning curves for DRDSs are not as narrow as those for moving bars or MRDSs. For the simple cell, the main peak is often located outside the preferred disparity plane.
These specific features of motion-in-depth tuning to MRDSs and DRDSs can be tested experimentally, and have implications for some relevant psychophysical observations (see discussion).
Similar to Fig. 4 for the bar stimuli, MRDSs and DRDSs can also give false motion-in-depth tuning if the motion paths are not properly chosen, and real motion-in-depth tuning can only be obtained with cells preferring opposite directions in the two eyes.
Disparity tuning curves
DRIFTING SINUSOIDAL GRATINGS AND BARS.
Unlike the motion-in-depth stimuli discussed in the preceding text, all stimuli in this and subsequent subsections have a constant disparity over time. Ohzawa and Freeman (1986a,b) used binocular drifting sinusoidal gratings to test the disparity tuning of V1 cells in the cat. Figure 6 shows the response time courses and disparity tuning curves of a model simple and complex cell stimulated by drifting sinusoidal gratings of various interocular phase differences. The parameters are chosen to simulate the data shown in Fig. 3 of Ohzawa and Freeman (1986b) for the simple cell, and Fig. 1 of Ohzawa and Freeman (1986a) for the complex cell. Since that particular simple cell had shorter active half-cycles than silent half-cycles, we include a threshold equal to 20% of the maximum value of the linear filtering result in Eq. 7. The spatial and temporal frequencies of gratings match the preferred frequencies of the cells, as in the actual experiments. Ohzawa and Freeman (1986b) used the first harmonic amplitude of the simple cell response for plotting the tuning curve. We simply use the time-integrated total response because it is proportional to the first harmonic in the context of our model. Figure 6 shows that the responses of both the simple and complex cells depend on the interocular phase difference (proportional to disparity) of the gratings. The simple cell's responses are modulated sinusoidally in time followed by rectification, while the complex cell responses are sustained. These features agree with the experimental data (Ohzawa and Freeman 1986a,b).
Another feature in Fig. 6 A is that the temporal responses of the simple cell are tilted to the right as the interocular phase difference increases. This is also consistent with the physiological results in Fig. 3 of Ohzawa and Freeman (1986b). It can be shown that this tilt stems from the specific way of introducing binocular disparity. In both the experiments (Ohzawa and Freeman 1986a,b) and our simulations, the disparity is generated by keeping the grating phase of one eye's image fixed while varying the phase in the other eye. If the disparity is symmetrically divided between the two eyes, then the tilt disappears (results not shown). The reason is that the asymmetric disparity generates a small positional change that leads to a temporal delay in the simple cell's response.
The model cells used in the preceding simulations are ocularly balanced. However, similar results can be obtained when one eye is more dominant than the other. There are two ways to introduce ocular dominance into the model. The first method is to introduce a weighting factor in front of one of the two RF profiles in Eq. 7. Mathematically, this is equivalent to presenting a stereogram with different contrast scales (but of the same contrast sign) to the two eyes. As we have shown previously (Qian 1994; Qian and Mikaelian 2000), the tuning curves will maintain the same shape under this condition although the pedestal will be higher and the amplitude will be smaller. The second method for introducing ocular dominance is to assume that one eye has a higher response threshold than the other. We find through simulations that again similar tuning curves can be obtained unless one of the thresholds is so high that the corresponding eye does not respond (results not shown).
We have also simulated response time courses and disparity tuning curves of simple and complex cells to moving bars (results not shown). Like the grating case, the tuning curves for both simple and complex cells peak at locations predicted by Eq. 12, and the vertical alignment of the response time courses depends on whether the disparities are introduced symmetrically in the two eyes or not. For directional cells, the disparity tuning curves for the preferred and anti-preferred directions have the same peak locations although the response amplitudes differ markedly. These features are consistent with the experimental data in Fig. 4 of Poggio and Fischer (1977). For each bar sweep, the complex cells give longer responses than the corresponding simple cells because the former do not have the discrete on and off RF subregions.
RANDOM-DOT STEREOGRAMS.
Poggio et al. (1985, 1988) also applied DRDSs to measure disparity tuning curves. In their experiments, each stereogram maintained a constant disparity during a trial, but the actual dot locations were randomly replotted from frame to frame. They found that simple cells do not show reliable disparity tuning to DRDSs but that complex cells do.
To investigate how reliably our model simple and complex cells were disparity-tuned to DRDSs, we computed, for each cell type, 1,000 disparity tuning curves from 1,000 independent sets of DRDSs, all generated from the same parameters. All DRDSs had a refresh rate of 100 Hz as in Poggio et al.'s experiments. Figure 7 shows the results. We also considered the effect of adding a spatial pooling stage to the complex cell responses (Fig. 7 C, see methods). For clarity, only 30 randomly picked curves for each cell are shown in the top panels. The distribution histograms of the preferred disparities (bottom panels) are compiled from all 1,000 curves. It is clear from the figure that the peak location of the tuning curves is much more variable for the simple cells than for the complex cells and that spatial pooling helps to further improve the reliability of the complex cell responses. Specifically, 40, 77, and 99% of the tuning curves peak within 0.02° of the predicted preferred disparity for the simple cell, the complex cell without pooling, and the complex cell with pooling, respectively. Additional simulations show that for complex cells, the standard deviation of the peak locations is inversely proportional to the σ of the two-dimensional Gaussian used for the spatial pooling. Since the number of cells (N) pooled is proportional to σ^{2}, the variability of the peak locations follows the inverse √N law.
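The inverse square-root scaling can be illustrated with a toy Monte Carlo experiment. This is a deliberately simplified stand-in for the full simulation: it assumes that each pooled cell contributes an independent Gaussian error to the estimated peak location and that pooling acts as a plain average. The function name, the per-cell standard deviation, and the trial count are hypothetical.

```python
import math
import random

def pooled_peak_sd(n_cells, n_trials=2000, single_sd=0.05, seed=0):
    """Standard deviation of the peak-location error after averaging n_cells.

    Each cell's error is drawn independently from N(0, single_sd^2), which
    is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_trials):
        errors = [rng.gauss(0.0, single_sd) for _ in range(n_cells)]
        pooled.append(sum(errors) / n_cells)
    mean = sum(pooled) / n_trials
    return math.sqrt(sum((p - mean) ** 2 for p in pooled) / n_trials)

if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, pooled_peak_sd(n), 0.05 / math.sqrt(n))
```

Quadrupling the number of pooled cells roughly halves the spread, which matches the statement that the standard deviation is inversely proportional to σ when N is proportional to σ^{2}. Real pooled cells have overlapping RFs and correlated responses, so the independence assumed here is only an approximation.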
Our simulation result, that disparity tuning curves to DRDSs are more reliable in complex cells than in simple cells, is in qualitative agreement with the experimental data of Poggio and coworkers (Poggio et al. 1985, 1988). Quantitatively, however, there may be some discrepancies. Although they did not publish any simple cell tuning curves to DRDSs, Poggio et al. (1985, 1988) reported that nearly all neurons responding to DRDSs are complex cells and that simple cells are not tuned to these stimuli. In contrast, the simulated tuning curves in Fig. 7 A are not completely random but show a tendency to peak around the preferred disparity of the corresponding complex cell (marked by the vertical line in the figure). A close examination reveals that the disparity tuning trend of the model simple cell results from the fact that a small number of frames in each DRDS generate relatively reliable tuning because they happen to contain dot distributions that excite the cell strongly.
A closely related problem in Fig. 7 A is that the response amplitudes of the simple cell to different sets of DRDSs fluctuated over a very large range (because some DRDSs happen to contain more frames that strongly excite the cell than other DRDSs). However, experimental data show that although some V1 cells occasionally give a strong response to one random-dot pattern and a weak response to another pattern, most cells have comparable responses to different random-dot stimuli (Qian and Andersen 1995; Skottun et al. 1988; Snowden et al. 1992).
The preceding two problems can be resolved by introducing the following contrast response function to replace the half-squaring operation in Eq. 8
We next simulated the responses of the cells used for Fig. 8 to MRDSs. The results are shown in Fig. 9. The simple cell's disparity tuning to MRDSs is clearly much more reliable than to DRDSs. The reason is that θ(t) in Eq. 9 varies randomly over time for DRDSs, while it changes smoothly for MRDSs. Since the temporal averaging of a continuous θ(t) is much closer to a constant than is the averaging of some random values, coherently moving stereograms should always generate more reliable disparity tuning curves than the random frames unless a very large number of frames (>200) is used (in which case both types of tuning curves become reliable). This is a specific prediction that can be tested physiologically. Poggio et al. measured disparity tuning of some V1 cells to MRDSs (Poggio et al. 1985, 1988). Unfortunately, they did not systematically compare the cells' responses to DRDSs and MRDSs but instead appeared to group the two types of stereograms together as the “cyclopean stimuli.”
Finally, for the purpose of comparison, we also simulated the disparity tuning of the cells in Fig. 8 to static random-dot stereograms (SRDSs). The results are shown in Fig. 10. Consistent with our previous simulations with spatial RFs only (Qian 1994; Qian and Zhu 1997; Zhu and Qian 1996), the simple cell showed completely random disparity tuning curves when different sets of SRDSs were used, while the complex cell maintained reasonable tuning reliability when the spatial pooling was applied. Moreover, for all cell types, disparity tuning to SRDSs is not as reliable as that to DRDSs, which in turn is not as reliable as the tuning to MRDSs. This is easy to understand because for static patterns there is only a single value for θ(t) in Eq. 9, and therefore temporal integration does not help to reduce the influence of the first cosine term in the equation.
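The ordering of reliability for static, dynamic, and coherently moving stereograms can be illustrated with a toy calculation. The sketch assumes that the stimulus-dependent nuisance term behaves like cos θ(t): θ advances uniformly through whole cycles for coherent motion, is drawn independently on every frame for DRDSs, and takes a single fixed value for SRDSs. The function names, the frame counts, and the uniform phase distribution are illustrative assumptions, not part of the model equations.

```python
import math
import random

def mean_cos_smooth(n_frames, cycles=1, theta0=0.3):
    """Time average of cos(theta) when theta sweeps `cycles` full cycles."""
    return sum(math.cos(theta0 + 2.0 * math.pi * cycles * k / n_frames)
               for k in range(n_frames)) / n_frames

def mean_cos_random(n_frames, rng):
    """Time average of cos(theta) when theta is redrawn on every frame."""
    return sum(math.cos(rng.uniform(0.0, 2.0 * math.pi))
               for _ in range(n_frames)) / n_frames

def rms_random(n_frames, n_trials=2000, seed=1):
    """Root-mean-square residual of the random-phase average across trials."""
    rng = random.Random(seed)
    return math.sqrt(sum(mean_cos_random(n_frames, rng) ** 2
                         for _ in range(n_trials)) / n_trials)

if __name__ == "__main__":
    print("static (1 frame):", rms_random(1))
    print("random, 50 frames:", rms_random(50))
    print("random, 5000 frames:", rms_random(5000))
    print("smooth, 50 frames:", abs(mean_cos_smooth(50)))
```

Under these assumptions the smooth sweep averages the term to essentially zero, the random frames leave a residual that shrinks roughly as 1/√(2n) with the number of frames n, and a single static frame leaves it unattenuated.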
DISCUSSION
The main goal of this paper is to understand how V1 cells respond to binocular disparity in time-varying stimuli. We introduced a specific function that conveniently describes temporal response profiles of real cortical cells including the transient (or band-pass) and the sustained (low-pass) types. We then incorporated this temporal function into the disparity energy model (Ohzawa et al. 1990; Qian 1994) and found that the binocular interaction RFs of V1 complex cells, with the typical disparity-time separability in the D − T plot (Ohzawa et al. 1997), can be explained. The disparity part is a Gabor function and the time part is always positive. Finally, we investigated how the model simple and complex cells respond to various time-varying stimuli, including motion-in-depth patterns, drifting gratings, moving bars, MRDSs, and DRDSs. We found that the simulated tuning curves agree with the extant experimental data quite well (Cynader and Regan 1978; Ohzawa and Freeman 1986a,b; Poggio and Fischer 1977; Poggio and Talbot 1981; Poggio et al. 1985). Our results indicate that both spatial pooling and temporal averaging can significantly improve the reliability of disparity tuning and that in general, complex cells are much better disparity detectors than simple cells (Ohzawa et al. 1990; Qian 1994), although the difference between the two cell types depends on the stimuli (see following text).
Tuning reliability
We pointed out previously that for static stereograms, simple cells do not have reliable disparity tuning since their responses are highly dependent on the Fourier phases of the stimuli (Qian 1994, 1997; Qian and Zhu 1997; Zhu and Qian 1996). For example, simple cells' tuning curves vary with the spatial phase of sinusoidal gratings and with the lateral position and contrast polarity of bars (Ohzawa et al. 1990). For the coherently moving stimuli considered in this paper, this Fourier-phase dependence is manifested as the temporal modulation of the response: as a stimulus such as a bar or a grating sweeps through the RFs of a cell, its Fourier phase changes continuously, and the response therefore changes accordingly in time. If the tuning curve of a simple cell is calculated by integrating the responses over time, the phase dependence will be averaged out, and simple cells will then have reliable disparity tuning curves to moving stimuli. Indeed, we found that for moving bars and gratings, simple and complex cells show equally reliable disparity tuning curves. However, the situation is quite different for DRDSs. Here the simple cells' disparity tuning is still highly unreliable even with temporal integration of 50 different frames, and this lack of reliability is consistent with the experimental reports (Poggio et al. 1985, 1988). Intuitively, a DRDS only contains random samples of the possible Fourier phase values, while for coherently moving stimuli, the Fourier phase changes smoothly so that the full range of phase values can be quickly covered for every stimulus used in an experiment. Therefore temporal integration of simple cell responses is much more effective in improving disparity tuning for coherently moving stimuli than for DRDSs.
In contrast to simple cells, complex cells have reliable disparity tuning to all of the stimulus types mentioned above, including DRDSs, and this is particularly true when the spatial pooling step is included for modeling complex cell responses. The simulated reliability of complex cell tuning is consistent with experimental data (Ohzawa et al. 1990; Poggio et al. 1985, 1988). The pooling reduces variability according to the expected inverse √N law.
One might conclude, based on the preceding discussion, that simple cells can reliably extract disparity for coherently moving stimuli but not for static patterns and DRDSs, whereas complex cells can do so for all stimulus types. This conclusion requires some qualification because for simple cells, the reliable tuning to coherently moving stimuli is only obtained after integrating the responses over a certain period of time. The brain, however, may not have the luxury of waiting for the temporal integration to complete before responding to stimuli in the real world. In fact, disparity-triggered vergence eye movement has a latency of less than 60 ms in monkeys (Masson et al. 1997), only about 10 or 20 ms longer than the V1 response latency. Therefore the brain might have to extract disparity based on the responses over a time slice of only 10 or 20 ms. If this is the case, then simple cells may not be able to extract disparity reliably even for moving stimuli. Consider, for example, the simple cell response time courses to gratings (Fig. 6 A). It is clear that tuning curves calculated from different brief time slices will have different peak locations. This problem does not exist for the complex cell in Fig. 6 B because its responses are more sustained in time. We conclude that in general, complex cells are better suited than simple cells for disparity extraction.
Motion in depth
We have also shown that a cell with identical motion preference for its left and right RFs is not truly tuned to motion in depth. As Maunsell and Van Essen (1983) predicted, such a cell may give a false impression of motion-in-depth tuning if the stimulus paths are not properly aligned with the preferred disparity plane. True motion-in-depth tuning, however, can only be obtained for cells with different left and right motion preferences. Our simulations may help explain some relevant psychophysical findings. Westheimer (1990) reported that with line stimuli, the threshold for detecting disparity motion in depth is much higher than that for detecting the disparity difference of frontoparallel motions. This agrees with the fact that most visual cortical cells have the same motion preference in the two eyes (Maunsell and Van Essen 1983; Ohzawa et al. 1996, 1997; Poggio and Talbot 1981) and therefore are not tuned to motion in depth. Cumming and Parker (1994) found that stereomotion is primarily detected by means of the temporal change of binocular disparity, instead of the interocular velocity difference. Again, this is consistent with physiology because cells with identical motion preference in the two eyes cannot be sensitive to the interocular velocity difference. Finally, Harris and Watamaniuk (1995) concluded that the rate of pure disparity change is not a good cue for speed discrimination of DRDSs moving in depth. This could be due to the poor reliability and broad widths of the motion-in-depth tuning curves under this condition, as shown in Fig. 5.
Alternative methods
Although we used the phase-difference RF model and the quadrature pair construction proposed by Ohzawa et al. (1990) in all analyses and simulations presented here, similar results can be obtained for the position-shift RF model and for some other methods of constructing complex cell responses. As we demonstrated previously (Zhu and Qian 1996), there is little difference in disparity tuning between the phase-difference and the position-shift RF models for broadband stimuli such as bars and random-dot patterns when the disparity range is smaller than the preferred spatial period (the inverse of preferred spatial frequency) of the RFs. For narrowband stimuli like sinusoidal gratings, the main difference is a small horizontal shift of disparity tuning curves. But even this difference disappears when the grating frequency matches the cell's preferred spatial frequency, which is the case for the simulations reported here. We have also shown previously that the quadrature pair construction is exactly equivalent to a phase averaging procedure that integrates the responses of all simple cells with their φ_{+} uniformly distributed in the entire 4π range (Qian and Mikaelian 2000). We can further demonstrate that squaring in the quadrature pair method is also not important because similar results can be obtained if the exponent of 2 in Eq. 8 is replaced by a positive number n (Albrecht and Hamilton 1982; Sclar et al. 1990), and if the phase averaging procedure is used. In this case, Eq. 11 for complex cell response simply becomes something very similar.
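The insensitivity to the exponent can be checked numerically with a minimal sketch. It does not reproduce Eq. 8 or Eq. 11. It assumes unit-amplitude responses to a grating at the preferred spatial frequency, full-wave rectification before raising to the power n, and a discrete average over φ_{+}; all names (`phase_averaged_response`, `quadrature_energy`, `delta_phi`) are hypothetical.

```python
import numpy as np

# Illustrative assumptions, not parameters taken from the paper.
period = 1.0
omega = 2 * np.pi / period
delta_phi = np.pi / 2           # phi_r - phi_l, sets the preferred disparity
disparities = np.linspace(-period / 2, period / 2, 401)
step = disparities[1] - disparities[0]

# phi_plus = phi_l + phi_r, sampled uniformly over the entire 4*pi range.
phi_plus = np.linspace(0.0, 4 * np.pi, 2048, endpoint=False)


def phase_averaged_response(d, n, phase_t=0.0):
    """Average of |binocular linear sum|**n over simple cells with all phi_plus."""
    phi_l = (phi_plus[:, None] - delta_phi) / 2
    phi_r = (phi_plus[:, None] + delta_phi) / 2
    linear = np.cos(phase_t + phi_l) + np.cos(phase_t + omega * d[None, :] + phi_r)
    return np.mean(np.abs(linear) ** n, axis=0)


def quadrature_energy(d, phase_t=0.0):
    """Standard quadrature-pair energy (exponent 2) for comparison."""
    phi_l, phi_r = -delta_phi / 2, delta_phi / 2
    a = np.cos(phase_t + phi_l) + np.cos(phase_t + omega * d + phi_r)
    b = (np.cos(phase_t + phi_l + np.pi / 2)
         + np.cos(phase_t + omega * d + phi_r + np.pi / 2))
    return a ** 2 + b ** 2


# Peak disparity of the phase-averaged tuning curve for several exponents.
peaks = {
    n: disparities[np.argmax(phase_averaged_response(disparities, n))]
    for n in (1.0, 2.0, 3.0)
}
averaged_n2 = phase_averaged_response(disparities, 2.0)
energy = quadrature_energy(disparities)
print(peaks)
```

In this simplified setting the phase-averaged curve for n = 2 is a constant multiple of the quadrature-pair energy. Other exponents change the sharpness of the tuning curve but not its peak location.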
Predictions
Several specific, testable predictions can also be made based on our analyses and simulations. First, strongly directional complex cells should only have a single peak along the time axis in the D-T plot. Nondirectional cells should have more than one peak unless their temporal frequency bandwidths are so large (i.e., small τ and ω
Problems with the disparity energy model
The disparity energy model has been highly successful in explaining a wide range of physiological and perceptual observations as demonstrated by this and numerous previous publications (Anzai et al. 1999b; Fleet et al. 1996; Mikaelian and Qian 2000; Ohzawa et al. 1990, 1997; Qian 1994; Qian and Andersen 1997; Qian and Zhu 1997; Qian et al. 1994b; Zhu and Qian 1996). This is quite remarkable given that the model is a relatively high-level abstraction that does not include detailed morphology, connectivity, and membrane biophysics of the visual cells. However, there are also some experimental findings that are inconsistent with the model. Ohzawa et al. (1997) noted that the spatial elongation of the binocular interaction RF of real complex cells is significantly larger than that predicted by a single quadrature-pair mechanism. This problem may be alleviated by adding a spatial pooling procedure for computing complex cell responses (Fleet et al. 1996; Qian and Zhu 1997; Zhu and Qian 1996), which also accounts for the larger RFs of complex cells compared with simple cells at the same eccentricity (Hubel and Wiesel 1962; Schiller et al. 1976). Another problem noted by Ohzawa et al. (1997) is that for real complex cells, the disparity frequency (obtained from the disparity tuning curves to broadband stimuli) is usually lower than the preferred spatial frequency (especially for high-frequency cells), while the energy model predicts equality of the two frequencies (Ohzawa et al. 1990; Qian 1994; Zhu and Qian 1996). However, the discrepancy may be, at least partially, due to something unrelated to the model: the disparity frequency was measured with the white-noise method while the preferred spatial frequency was measured with drifting sinusoidal gratings (Ohzawa et al. 1997). Since the spatial frequency measured with noise stimuli is lower than that measured with drifting gratings (Gaska et al. 1994), perhaps the white-noise method also underestimates the disparity frequency (Ohzawa et al. 1997). Indeed, because the white-noise method is time-consuming, one might tend to choose a lower spatial sampling density for the noise stimuli than for the grating stimuli. We found through simulations that an insufficient spatial sampling density (which would be more likely to happen for cells with high spatial frequencies) can indeed lead to an underestimation of the measured disparity frequency (results not shown).
The energy model also predicts that when stimuli presented to the two eyes have opposite signs of contrast, the disparity tuning curve of a complex cell should be inverted in shape, with the same amplitude as the same-contrast-sign case (Ohzawa et al. 1990; Qian 1994; Qian and Mikaelian 2000). In reality, while many complex cells do show the predicted tuning curve inversion, the amplitude of tuning is typically reduced (Cumming and Parker 1997; Ohzawa et al. 1997). It has been suggested that an introduction of monocular thresholds at the simple cell stage may explain the reduced amplitude (Read et al. 2000). Finally, there are cells that appear monocular when the two eyes are tested separately but show a large binocular interaction (either disparity- or nondisparity-selective) when the two eyes are stimulated together (Ohzawa and Freeman 1986a,b; Poggio and Fischer 1977). This is presumably due to some subthreshold events and may be partially explained by adding a binocular threshold in Eq. 7 after the summation of the monocular contributions. A full account, however, may require a highly nonlinear summation mechanism for combining the two monocular inputs. How to modify the energy model to resolve these and other problems without completely sacrificing its simplicity will be a challenge to future research.
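The inversion predicted by the unmodified energy model for opposite contrast signs can be illustrated with a minimal sketch. It assumes unit-amplitude responses to a grating at the preferred spatial frequency and includes no thresholds. The function `complex_energy` and its `right_contrast` argument are hypothetical names introduced only for this illustration.

```python
import numpy as np

# Illustrative assumptions, not parameters taken from the paper.
period = 1.0
omega = 2 * np.pi / period
phi_l, phi_r = 0.0, np.pi / 2
disparities = np.linspace(-period / 2, period / 2, 401)


def complex_energy(d, right_contrast=1.0, phase_t=0.0):
    """Quadrature-pair energy; right_contrast = -1 inverts the right-eye stimulus."""
    energy = np.zeros_like(d)
    for shift in (0.0, np.pi / 2):          # the two simple cells of the pair
        left = np.cos(phase_t + phi_l + shift)
        right = right_contrast * np.cos(phase_t + omega * d + phi_r + shift)
        energy += (left + right) ** 2
    return energy


same = complex_energy(disparities, right_contrast=+1.0)
opposite = complex_energy(disparities, right_contrast=-1.0)

# Sum of the two monocular energies for unit-amplitude responses.
baseline = 2.0

print("same-sign amplitude:    ", same.max() - same.min())
print("opposite-sign amplitude:", opposite.max() - opposite.min())
```

In this sketch the opposite-sign curve is the mirror image of the same-sign curve about the monocular baseline, with identical amplitude. For a grating, this is equivalent to a half-period disparity shift. The reduced amplitude seen in real cells therefore requires an additional mechanism, such as the monocular thresholds mentioned above.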
Acknowledgments
We thank Drs. Nestor Matthews and Izumi Ohzawa and anonymous reviewers for helpful discussions and comments.
This work was supported by National Institute of Mental Health Grant MH54125 and a Sloan Research Fellowship, both to N. Qian. Y. Wang was supported by Grants 69835020, 39670186, and 3989334006 from the National Natural Science Foundation of China.
Footnotes

Address for reprint requests: N. Qian, Center for Neurobiology and Behavior, Columbia University, P.I. Annex Rm. 730, 722 W. 168th St., New York, NY 10032 (Email: nq6@columbia.edu).
 Copyright © 2001 The American Physiological Society
Appendix
Derivation of Eq. 9
We derive the simple cell responses (Eq. 9) under the general assumption that the size of the RFs is much larger than the image disparity. First, rewrite g_{l} and
Derivation of Eq. 13
The binocular interaction RF for complex cells is the impulse response function obtained by flashing a line with preferred orientation (vertical in our case) at time t to locations x_{l} and x_{r} in the two eyes, respectively. Because for vertical line stimuli, the Y dimension of Eq. 7 simply integrates to a constant, we can ignore the Y dimension.
First, the response of the linear filtering of the dichoptically flashed line through the binocular simple cell RFs is given by