The temporal properties of disparity-sensitive neurons place important temporal constraints on stereo matching. We examined these constraints by measuring the responses of disparity-selective neurons in striate cortex of awake behaving monkeys to random-dot stereograms that contained interocular delays. Disparity selectivity was gradually abolished by increasing interocular delay (when the delay exceeds the integration time, the inputs from the 2 eyes become uncorrelated). The amplitude of the disparity-selective response was a Gaussian function of interocular delay, with a mean of 16 ms (±5 ms, SD). Psychophysical measures of stereoacuity, in both monkey and human observers, showed a closely similar dependency on time, suggesting that temporal integration in V1 neurons is what determines psychophysical matching constraints over time. There was a slight but consistent asymmetry in the neuronal responses, as if the optimum stimulus is one in which the right stimulus leads by about 4 ms. Because all recordings were made in the left hemisphere, this probably reflects nasotemporal differences in conduction times; psychophysical data are compatible with this interpretation. In only a few neurons (5/72), interocular delay caused a change in the preferred disparity. Such tilted disparity/delay profiles have been invoked previously to explain depth perception in the stroboscopic version of the Pulfrich effect (and other variants). However, the great majority of the neurons did not show tilted disparity/delay profiles. This suggests that either the activity of these neurons is ignored when viewing Pulfrich stimuli, or that current theories relating neuronal properties to perception in the Pulfrich effect need to be reevaluated.
In computing depth from binocular disparity, it is first necessary to deduce which image feature in one eye corresponds to a given feature in the other eye. Most studies of this problem have focused on spatial properties of the image (Hayashi et al. 2004; Marr and Poggio 1979; Pollard et al. 1985; Qian 1994; Read 2002; Tsai and Victor 2003). However, when the visual scene is changing, temporal information can provide important constraints (Burr and Ross 1979; Chen et al. 2001; Julesz and White 1969; Qian and Andersen 1997; Ross 1974). The images of a stationary object should appear simultaneously in both eyes. Moving objects can give rise to identical images appearing with an interocular delay, a principle that forms the basis of classical explanations for the Pulfrich effect (Julesz and White 1969; Pulfrich 1922). However, how these temporal constraints are implemented in the brain remains unclear. In this study, we aimed to relate temporal aspects of stereo psychophysics to the properties of disparity-selective neurons in primary visual cortex.
Psychophysical studies using interocular delay suggest that the stereo system integrates information over a period of about 50 ms (Julesz and White 1969; Lee 1970; Morgan 1979; Ross and Hogben 1974, 1975). It is currently unclear how this relates to neuronal properties. One problem is that the psychophysical studies have generally not expressed their results in terms of a quantitative neuronal model (e.g., if neurons had a Gaussian temporal integration kernel, it is unclear what SD would be implied by the psychophysics). Physiological studies have either not quantified binocular integration time at all or not provided a population mean. Finally, there is considerable variation between cells and across species (Anzai et al. 2001; Gardner et al. 1985; Pack et al. 2003; Pettigrew et al. 1968). For all these reasons, it is currently difficult to assess the extent to which neuronal responses account for psychophysical behavior.
A second point of interest concerns the underlying mechanisms of temporal integration. In standard models of V1 neurons, such as the energy model (Ohzawa et al. 1990), the binocular integration time, derived from the response to cyclopean stimuli with an interocular delay, follows straightforwardly from the monocular integration time calculated from the response to monocular contrast stimuli (Chen et al. 2001). It is not clear whether this relationship holds in real neurons.
Finally, much work on the temporal aspects of stereopsis has been stimulated by the observation that viewing a moving object with interocular delay causes it to appear in depth (the Pulfrich effect). Modern explanations of this illusion invoke disparity detectors in which interocular delays cause changes in the preferred disparity (Anzai et al. 2001; Carney et al. 1989; Morgan and Castet 1995; Morgan and Fahle 2000; Morgan and Tyler 1995; Pack et al. 2003; Qian 1997; Qian and Andersen 1997). Such neurons are common in cat area 17/18 (Anzai et al. 2001) and monkey MT (Pack et al. 2003), but appear to be less common in monkey V1 (Pack et al. 2003). However, the significance of these neurons remains unclear, for several reasons. First, the studies measured receptive fields with a reverse-correlation technique, using one-dimensional dichoptic noise (Anzai et al. 2001) or bar stimuli (Pack et al. 2003). This depends on the assumption that the cells are linear in space and time; the effect of delay on disparity tuning has not been tested directly. Second, because the stimuli were oriented parallel to the neuron's preferred orientation, cells tuned to horizontal orientations were probed with vertical disparity. The likely effect on depth perception of shifts in preferred vertical disparity is unclear. Third, for those cells where interocular delay does cause shifts in preferred disparity, it is unclear how this relates to motion sensitivity. In standard linear models, such as the binocular energy model (Ohzawa et al. 1990), neurons whose preferred disparity changes with interocular delay must have tilted receptive fields, i.e., they must encode direction of motion as well as disparity (Chen et al. 2001; Qian and Andersen 1997). For this reason such cells are commonly referred to as joint disparity/motion sensors. 
The studies by Anzai, Pack, and colleagues did not quantify whether the delay-induced shifts in preferred disparity in V1 could be predicted from direction selectivity, so it is unclear whether their results are compatible with standard models. If they are, then the finding that delay-induced shifts in preferred disparity are more common in cat A17/18 and monkey MT than in monkey V1 may simply reflect the well-documented fact that direction-selective cells are less common in monkey striate cortex (Casanova et al. 1992; DeValois et al. 1982; Gizzi et al. 1990; Hamilton et al. 1989; Hawken et al. 1988).
Thus the current data permit a very simple interpretation: that joint encoding of disparity and motion is found only with direction selectivity. If correct, this would raise an interesting puzzle about the role of disparity-selective neurons in V1 that are not direction selective. Modern theories of the Pulfrich effect use only joint motion/disparity sensors, ignoring disparity-selective cells that are nondirectional. The implication is that these cells do not contribute to depth perception, despite the disparity signal they carry. Before this puzzle can be addressed, it is first necessary to substantiate this simple interpretation of the physiological data.
To explore all these issues, we examined the interaction between temporal delay and disparity tuning in neurons' responses to random-dot stereograms. We aimed to answer 3 main questions. 1) What is the temporal window over which neurons integrate binocular information? 2) Does this explain the temporal integration observed psychophysically? 3) Does interocular delay shift tuning for horizontal disparity, and is this as expected from direction selectivity? Importantly, we compared neuronal properties with psychophysical results in the same animals. This enables us to explore both whether the results can be explained mechanistically with simple models and also whether they are compatible with psychophysical performance. In this way, physiological recording helps bridge the gap between our understanding of early visual mechanisms and perceptual experience.
Two adult male macaque monkeys were implanted under general anesthesia with scleral search coils in both eyes, a head-restraining post, and a recording chamber placed over the operculum of V1. Glass-coated platinum–iridium electrodes (FHC) were placed transdurally each day. All protocols were approved by the Institute Animal Care and Use Committee and complied with Public Health Service policy on the humane care and use of laboratory animals.
Stimuli were generated on a Silicon Graphics Octane workstation and presented on 2 Eizo Flexscan F980 monitors (mean luminance 41.1 cd/m², contrast 99%, frame rate 72 Hz) viewed through a Wheatstone stereoscope. At the viewing distance used (89 cm) each pixel in the 1,280 × 1,024 display subtended 1.1 min arc, and antialiasing was used to render with subpixel accuracy. The monkeys initiated a stimulus presentation by maintaining fixation on a binocularly presented spot to within ±1°. They were required to maintain fixation for 2.1 s to earn a fluid reward. During each such trial, 4 stimuli were presented, each lasting 420 ms, separated by 100 ms.
In the experiments probing disparity tuning, the stimuli were random-dot stereograms composed of black and white dots in equal proportions (dot size 0.1° square), presented against a gray background. The dot density was sufficient to cover 50% of the gray background but, because the dots were allowed to overlap one another, the total coverage was somewhat <50%. A central disparate region was presented within a larger surround, to remove monocular clues to disparity. The stimulus size was almost always 3 × 3° for the central disparate region and 4.5 × 4.5° for the surround; for a few cells, these values were altered slightly to optimize the cell's response. On each new video frame, a new pattern of random dots was presented. A single 420-ms stimulus therefore contained a sequence of 30 different random-dot patterns. Interocular delay was manipulated by shifting the sequence of dot patterns shown in one eye. Thus if frames 0–29 were shown to the left eye while frames 1–30 were shown to the right eye, this is described as an interocular delay of one frame (positive delays indicate that the right eye is shown any one dot pattern first). The first few frames of such a sequence are shown in Fig. 1. Note that on the first frame of such a sequence, the right eye was shown frame 1, whereas the left eye was shown a dot pattern that was never presented to the right eye. This ensured that the stimulus onset and offset did not change with interocular delay.
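The frame-shifting scheme described above can be sketched in a few lines of Python. This is only an illustration of the indexing, not the stimulus code actually used; the function names `frame_schedule` and `delay_ms` are hypothetical, and the pattern indices are simply labels for the independent random-dot patterns.

```python
def frame_schedule(n_frames=30, delay_frames=1):
    """Return (left, right): the dot-pattern index shown to each eye on each video frame.

    Assumed convention (from the text): with a delay of +1 frame the left eye sees
    patterns 0..29 while the right eye sees patterns 1..30, so the right eye is
    shown any given pattern one frame before the left eye.
    """
    left = list(range(n_frames))
    right = [k + delay_frames for k in range(n_frames)]
    return left, right


def delay_ms(delay_frames, frame_rate_hz=72.0):
    """Convert a delay in video frames to milliseconds at the given frame rate."""
    return 1000.0 * delay_frames / frame_rate_hz


left, right = frame_schedule(30, 1)
# Pattern 0 is seen only by the left eye and pattern 30 only by the right eye,
# so both eyes are stimulated for the same 30 frames regardless of the delay.
unmatched_left = sorted(set(left) - set(right))
unmatched_right = sorted(set(right) - set(left))
one_frame_delay = delay_ms(1)  # one frame at 72 Hz is about 13.9 ms
```

Under this scheme the available delays are integer multiples of the frame period, which is why delays of roughly 14 and 28 ms appear in the data.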
Drifting grating stimuli were used to measure the cell's direction selectivity. The orientation and spatial frequency were varied to find the optimum grating stimulus, and then the cell's direction selectivity was assessed by recording responses to this grating as it drifted in either direction. The grating stimuli were usually 3 × 3°, although for nearly half of cells this had to be reduced to maintain responsiveness. The stimulus was always kept larger than 1 × 1°, to minimize the disruption caused by small fixational eye movements.
The variance of neuronal spike counts is typically proportional to the mean spike count (Dean 1981). To avoid having to correct for the changing variance, we performed all our analysis on the square root of neuronal firing rates. (Because the stimulus duration was the same for all stimuli, this is equivalent to using spike counts.) The variance of the transformed firing rates is roughly independent of the mean, greatly simplifying the analysis (Prince et al. 2002). We write r_i(δ, τ) for the square-root firing rate obtained on the ith trial with disparity δ and interocular delay τ, and r(δ, τ) for the mean square-root firing rate, averaged across all trials with this disparity/delay combination.
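The variance-stabilizing effect of the square-root transform can be illustrated with simulated counts. The sketch below assumes Poisson spike counts, which is a stronger assumption than the proportionality stated above and is used here only for illustration; `poisson_sample` and `variance_raw_and_sqrt` are hypothetical helper names.

```python
import math
import random
import statistics


def poisson_sample(rng, mean):
    """Draw one Poisson count by Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def variance_raw_and_sqrt(mean, n_trials=20000, seed=0):
    """Return (variance of raw counts, variance of square-root counts)."""
    rng = random.Random(seed)
    counts = [poisson_sample(rng, mean) for _ in range(n_trials)]
    roots = [math.sqrt(c) for c in counts]
    return statistics.variance(counts), statistics.variance(roots)


# The raw variance grows with the mean, whereas the variance of the
# square-rooted counts stays close to 1/4 for both mean rates.
raw_low, root_low = variance_raw_and_sqrt(5.0)
raw_high, root_high = variance_raw_and_sqrt(40.0)
```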
Disparity discrimination index
The strength of disparity tuning for zero interocular delay was determined with the disparity discrimination index (DDI; Prince et al. 2002)

$$\mathrm{DDI} = \frac{r_{\max} - r_{\min}}{r_{\max} - r_{\min} + 2\,\mathrm{RMS}_{\mathrm{error}}} \tag{1}$$

where r represents the mean square-root firing rate as a function of disparity at zero interocular delay, $r_{\max}$ is the mean at the preferred disparity, and $r_{\min}$ is the mean at the null disparity. RMSerror is the square root of the residual variance around the means across the whole tuning curve. This is a contrast measure in which the range of the response is compared with the range plus its variability. Prince et al. (2002) showed in some detail that the use of square-root firing rates ensures that the index is not distorted by changes in mean firing rate; without this, cells with smaller mean firing rates would appear more disparity-selective.
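A minimal sketch of the DDI calculation follows, assuming the form DDI = (r_max − r_min)/(r_max − r_min + 2 RMSerror) given by Prince et al. (2002). The function name `ddi`, the input layout, and the choice of N − M degrees of freedom for the residual variance (N trials, M disparities) are illustrative assumptions rather than details taken from the text.

```python
import math


def ddi(trials_by_disparity):
    """Disparity discrimination index from raw firing rates.

    trials_by_disparity: dict mapping disparity -> list of firing rates (one per trial).
    All statistics are computed on square-root firing rates.
    """
    roots = {d: [math.sqrt(x) for x in xs] for d, xs in trials_by_disparity.items()}
    means = {d: sum(v) / len(v) for d, v in roots.items()}

    # Residual variance around the per-disparity means, pooled over the tuning curve.
    # Assumption: divide by (number of trials - number of disparities).
    ss = sum((x - means[d]) ** 2 for d, v in roots.items() for x in v)
    n_trials = sum(len(v) for v in roots.values())
    rms_error = math.sqrt(ss / (n_trials - len(roots)))

    r_max, r_min = max(means.values()), min(means.values())
    return (r_max - r_min) / (r_max - r_min + 2.0 * rms_error)


# Toy data: three disparities, two trials each (firing rates in spikes/s).
example = ddi({-0.2: [4, 9], 0.0: [16, 25], 0.2: [9, 9]})
```

A noiseless tuning curve gives a DDI of 1, and the index falls toward 0 as trial-to-trial variability grows relative to the response range.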
Direction selectivity index
The strength of direction tuning was measured with a direction selectivity index (DSI) defined analogously to the DDI

$$\mathrm{DSI} = \frac{r_{\mathrm{pref}} - r_{\mathrm{null}}}{r_{\mathrm{pref}} - r_{\mathrm{null}} + 2\,\mathrm{RMS}_{\mathrm{error}}}$$

where $r_{\mathrm{pref}}$ and $r_{\mathrm{null}}$ represent the mean square-root firing rate in the preferred and null directions, respectively, for the binocular grating stimulus. RMSerror is the square root of the residual variance around the means across both conditions. The DSI was assigned a positive sign if the direction closest to rightward was preferred and negative if the direction closest to leftward was preferred.
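The sign convention for the DSI can be made explicit in code. This sketch assumes the same ratio form as the DDI, with a factor of 2 on RMSerror, and assumes that the two drift directions have already been labeled as closest to rightward or closest to leftward; `signed_dsi` is a hypothetical name.

```python
import math


def signed_dsi(rightward_trials, leftward_trials):
    """Signed direction selectivity index from raw firing rates in two drift directions.

    Positive when the direction closest to rightward gives the larger mean
    square-root firing rate, negative when the leftward direction does.
    """
    right = [math.sqrt(x) for x in rightward_trials]
    left = [math.sqrt(x) for x in leftward_trials]
    mean_r = sum(right) / len(right)
    mean_l = sum(left) / len(left)

    # Residual variance around the two condition means (assumed N - 2 degrees of freedom).
    ss = sum((x - mean_r) ** 2 for x in right) + sum((x - mean_l) ** 2 for x in left)
    rms_error = math.sqrt(ss / (len(right) + len(left) - 2))

    diff = abs(mean_r - mean_l)  # r_pref - r_null
    magnitude = diff / (diff + 2.0 * rms_error)
    return magnitude if mean_r >= mean_l else -magnitude


rightward_preferring = signed_dsi([16, 25], [4, 9])
leftward_preferring = signed_dsi([4, 9], [16, 25])
```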
Tilt direction index.
Disparity tuning curves were obtained for several different values of interocular delay. In this way, the cell's response can be plotted as a delay/disparity profile. This is a cell's firing rate as a function both of disparity and of interocular delay (see Fig. 3 for examples). To quantify changes in the cell's preferred disparity as a function of interocular delay, we use the tilt direction index (TDI) introduced by Anzai et al. (2001). The TDI is obtained from the Fourier transform of the delay/disparity profile. First we compute the DC component

$$R_0 = \left\langle r(\delta,\tau) \right\rangle_{\delta,\tau}$$

where the angle bracket with subscript δ,τ indicates averaging over all disparities δ and delays τ, and r(δ, τ) is the mean square-root firing rate at disparity δ and interocular delay τ. After subtraction of this DC component, the Fourier amplitude at disparity frequency f and delay frequency ν is

$$R(f,\nu) = \left| \left\langle \left[ r(\delta,\tau) - R_0 \right] e^{-2\pi i (f\delta + \nu\tau)} \right\rangle_{\delta,\tau} \right|$$

We calculated the Fourier amplitude on a finely spaced grid of disparity frequencies f and delay frequencies ν, up to the Nyquist limits implied by the sampling of disparity and delay. We define Rp to be the peak Fourier amplitude, and fp and νp to be the disparity and delay frequencies at which this occurs. The TDI contrasts the amplitude of this dominant component with the amplitude Rn of the component with the opposite direction in space–time, Rn = R(fp, −νp). Thus the magnitude of the TDI is

$$\left|\mathrm{TDI}\right| = \frac{R_p - R_n}{R_p + R_n}$$

and its sign is assigned according to the sense of the tilt, as described next.
Note several subtle differences between the present TDI and that of Anzai et al. (2001). First, their method of analysis automatically eliminated any DC component, so they did not need to explicitly subtract it. Second, they used the fast Fourier transform, so frequencies were sampled relatively coarsely; we interpolated to sample more finely in f and ν. Finally, their TDI was unsigned, whereas we have introduced a sign to enable us to relate the sense of the tilt, clockwise or counterclockwise, to the cell's direction preference (not reported by Anzai et al.). A positive TDI means that the cell shifts from preferring near disparities when the right eye is leading, to preferring far disparities when the left eye is leading (Fig. 2A). We expect a positive TDI to be associated with a preference for rightward motion [positive direction selectivity index (DSI)]. To see why, note that an object with a far disparity appears to the left of the fixation point in the left eye and to the right of fixation in the right eye. If the object stimulates the left-eye receptive field first (negative interocular delay), then the object appears first to the left and then to the right of fixation; in other words it appears to move right. Thus a cell that is tuned for rightward motion will respond well to far disparities when the left eye is leading. By a similar argument, the cell will also respond well to near disparities when the right eye is leading. Figure 2, B–D shows this diagrammatically: the pattern of stimulation produced by a zero-disparity object moving to the right resembles that produced by a stationary near object when the right eye is leading, or that produced by a stationary far object when the left eye is leading. Thus a cell that prefers rightward motion (positive DSI) is expected to change its disparity preference from near at positive interocular delays to far at negative interocular delays (i.e., have a positive TDI).
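The signed TDI described above can be sketched as a direct evaluation of the Fourier amplitude on a fine frequency grid. This is an illustrative reconstruction, not the original analysis code: the names `fourier_amplitude` and `signed_tdi`, the grid resolution, the assumption of uniformly spaced samples, and the rule that the sign is positive when fp and νp have the same sign (preferred disparity becoming nearer as delay becomes more positive) are all assumptions chosen to match the sign convention stated in the text.

```python
import cmath
import math


def fourier_amplitude(profile, disparities, delays, f, nu):
    """Amplitude of the (f, nu) Fourier component of profile[i][j] (DC already removed)."""
    total = 0j
    for i, d in enumerate(disparities):
        for j, t in enumerate(delays):
            total += profile[i][j] * cmath.exp(-2j * math.pi * (f * d + nu * t))
    return abs(total) / (len(disparities) * len(delays))


def signed_tdi(profile, disparities, delays, n_grid=41):
    """Signed tilt direction index of a delay/disparity profile.

    profile[i][j] is the mean square-root rate at disparities[i] and delays[j].
    Positive values mean the preferred disparity shifts from near at positive
    delays (right eye leading) to far at negative delays (left eye leading).
    """
    n = len(disparities) * len(delays)
    dc = sum(sum(row) for row in profile) / n
    centred = [[v - dc for v in row] for row in profile]

    # Nyquist limits implied by the (assumed uniform) sample spacing.
    f_max = 0.5 / (disparities[1] - disparities[0])
    nu_max = 0.5 / (delays[1] - delays[0])

    # For real data R(f, nu) = R(-f, -nu), so searching f >= 0 is sufficient.
    best_amp, f_p, nu_p = -1.0, 0.0, 0.0
    for a in range(n_grid):
        f = f_max * a / (n_grid - 1)
        for b in range(n_grid):
            nu = nu_max * (2.0 * b / (n_grid - 1) - 1.0)
            amp = fourier_amplitude(centred, disparities, delays, f, nu)
            if amp > best_amp:
                best_amp, f_p, nu_p = amp, f, nu

    r_p = best_amp
    r_n = fourier_amplitude(centred, disparities, delays, f_p, -nu_p)
    if r_p + r_n == 0.0:
        return 0.0  # completely flat profile: no tilt can be defined
    magnitude = (r_p - r_n) / (r_p + r_n)
    return magnitude if f_p * nu_p >= 0 else -magnitude
```

A space/time-separable profile yields Rp = Rn and hence a TDI of zero, whereas a profile dominated by a single oriented component yields a TDI near ±1.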
Fitting the delay/disparity profiles.
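Delay/disparity profiles were fitted with a 2-dimensional (2D) Gabor function G, which was a function of horizontal disparity δ and interocular delay τ. Because neuronal firing rates cannot be negative, the Gabor function was half-wave rectified

$$G(\delta,\tau) = \mathrm{Pos}\!\left[ B + A \exp\!\left( -\frac{\delta'^2}{2\sigma_\perp^2} - \frac{\tau'^2}{2\sigma_\parallel^2} \right) \cos\!\left( 2\pi f \delta' + \phi \right) \right] \tag{2}$$

where Pos denotes half-wave rectification and

$$\delta' = (\delta - \delta_0)\cos\theta + (\tau - \tau_0)\sin\theta, \qquad \tau' = -(\delta - \delta_0)\sin\theta + (\tau - \tau_0)\cos\theta \tag{3}$$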
The Gabor function has 9 free parameters: B, A, σ‖, σ⊥, f, δ0, τ0, φ, and θ. B represents the baseline firing rate and, equivalently, the response to uncorrelated random-dot patterns. A controls the amplitude of the disparity response. The angle θ controls the orientation of the Gabor relative to the space–time axes. The other parameters have a simple interpretation when θ is zero: then, G(δ, τ) is the product of a Gaussian along the delay axis, and a (half-wave rectified) one-dimensional (1D) Gabor function along the disparity axis. δ0 is the disparity of the center of the Gaussian envelope of the 1D Gabor, whereas σ⊥ is its SD in degrees of disparity. f is the frequency of the 1D Gabor carrier in cycles per degree disparity and φ is its phase. Similarly, τ0 is the temporal delay at which the response is maximal and σ‖ is the SD of the temporal Gaussian, both in milliseconds. When θ is nonzero, these units are not valid because then the 1D Gabor and 1D Gaussian are no longer aligned with the space–time axes. σ‖ and σ⊥ are the SD values parallel and orthogonal to the carrier cosine, respectively. To obtain a general measure of how the cell's disparity selectivity decays as a function of time, valid for all θ, we integrate the Gaussian envelope of the 2D Gabor (Eq. 2) over disparity. This gives a Gaussian function of interocular delay, whose SD is

$$\sigma_\tau = \sqrt{\sigma_\parallel^2 \cos^2\theta + \sigma_\perp^2 \sin^2\theta} \tag{4}$$
In practice, the value of θ was usually so small that στ was effectively the same as σ‖, but this correction ensures that στ remains a valid measure of the sensitivity to interocular delay even for large θ.
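The relation between the rotated Gabor and the temporal SD στ can be checked numerically. The sketch below assumes a particular form for the rectified Gabor (baseline plus a rotated Gaussian envelope times a cosine carrier, with a specific sign convention for θ) and assumes that disparity and delay have been scaled to comparable units before rotation; none of these details should be read as the exact parameterization used for the fits. `gabor_2d` and `sigma_tau` are hypothetical names.

```python
import math


def gabor_2d(delta, tau, B, A, sigma_par, sigma_perp, f, delta0, tau0, phi, theta_deg):
    """Half-wave rectified 2D Gabor over disparity (delta) and interocular delay (tau)."""
    th = math.radians(theta_deg)
    dd, dt = delta - delta0, tau - tau0
    u = dd * math.cos(th) + dt * math.sin(th)    # axis along which the carrier oscillates
    v = -dd * math.sin(th) + dt * math.cos(th)   # axis parallel to the carrier stripes
    envelope = math.exp(-u * u / (2 * sigma_perp ** 2) - v * v / (2 * sigma_par ** 2))
    return max(0.0, B + A * envelope * math.cos(2 * math.pi * f * u + phi))


def sigma_tau(sigma_par, sigma_perp, theta_deg):
    """SD along the delay axis of the Gaussian envelope after integrating over disparity."""
    th = math.radians(theta_deg)
    return math.sqrt((sigma_par * math.cos(th)) ** 2 + (sigma_perp * math.sin(th)) ** 2)


# For small theta the correction is negligible: sigma_tau is close to sigma_par.
nearly_aligned = sigma_tau(16.0, 0.2, 2.0)
aligned = sigma_tau(16.0, 0.2, 0.0)
```

The expression in `sigma_tau` is the standard marginal SD of a rotated bivariate Gaussian, which is why it reduces to σ‖ when θ is zero.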
To weight responses according to variance, we first took the square root of the firing rates, and fitted this with $\sqrt{G}$ by the method of least-squares (Prince et al. 2002). The parameters were constrained as follows: B was forced to be positive and less than the maximum square-mean-root firing rate. A was forced to be positive and less than twice the range of the square-mean-root firing rate. The frequency f was forced to be positive and less than the Nyquist limit implied by the sampling along the disparity axis. The SD terms σ⊥ and σ‖ were forced to be positive and less than the range of sampled disparities/delays, respectively. δ0 and τ0 were not allowed to lie outside the range of sampled disparities/delays, respectively. θ was forced to lie between −45 and +45°.
The significance level was set to P = 0.05 throughout. Confidence intervals were obtained by bootstrap resampling (Efron 1979). This is a means of generating representative new data sets from a given experimental data set. For the physiological data, the residuals were calculated by subtracting the mean square-root firing rate at each disparity and delay from the square-root firing rates obtained on individual trials

$$\varepsilon_i(\delta,\tau) = r_i(\delta,\tau) - r(\delta,\tau)$$

Then residuals were pooled across all disparities and delays. A new square-root firing rate for a particular disparity/delay combination was generated by taking the original mean, r(δ, τ), and adding on a value drawn at random from the pool of residuals. This procedure was repeated as many times as there were trials at that disparity/delay in the original data set. These were then averaged to obtain a new, “resampled” mean square-root firing rate. A similar procedure was also used to generate resampled responses to grating stimuli. This pooling was essential to avoid biases because we had on average 12 repetitions at a single disparity and delay (for some cells the number was as low as 4). Resampling with so few samples underestimates the true variance of the population. Pooling across residuals substantially increases the number of samples, to over 500 on average, and produces more conservative confidence intervals (Read and Cumming 2003). The validity of this procedure depends critically on variance being comparable across all disparity/delay combinations. Because the variance of neuronal firing rates tends to vary with the mean, this pooling would not be valid if the resampling procedure were applied to raw neuronal firing rates. This is why we first stabilized the variance by taking the square root of firing rates before applying the bootstrap.
Derived quantities such as TDI and DSI were calculated for each of 1,000 resampled data sets, and the 2.5th and 97.5th percentiles of the 1,000 values were taken as estimates of the 95% confidence interval of the quantity in question, given the variance in the original data. For example, if the 95% confidence interval for the TDI included zero, we concluded that the TDI did not differ significantly from zero. In the figures, error bars for these derived quantities show the 68% confidence intervals, again obtained by resampling. This is equivalent to showing ±1 SE for a normally distributed quantity.
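The pooled-residual bootstrap can be sketched as follows. This is a minimal illustration under stated assumptions: the input is a dictionary of square-root firing rates per stimulus condition, the derived quantity is any function of the resampled condition means, and percentiles are taken by a simple nearest-rank rule. The names `pooled_residual_bootstrap` and `statistic` are hypothetical.

```python
import random


def pooled_residual_bootstrap(trials, statistic, n_resamples=1000, seed=0):
    """95% percentile confidence interval for statistic(means) by pooled-residual resampling.

    trials: dict mapping condition (e.g. a (disparity, delay) pair) -> list of
            square-root firing rates, one per trial.
    statistic: function taking a dict condition -> mean and returning a float.
    """
    rng = random.Random(seed)
    means = {c: sum(v) / len(v) for c, v in trials.items()}
    # Pool residuals across all conditions (valid only if variance is comparable).
    pool = [x - means[c] for c, v in trials.items() for x in v]

    values = []
    for _ in range(n_resamples):
        resampled = {}
        for c, v in trials.items():
            # As many draws as there were trials for this condition, then average.
            draws = [means[c] + rng.choice(pool) for _ in v]
            resampled[c] = sum(draws) / len(draws)
        values.append(statistic(resampled))

    values.sort()
    lower = values[int(0.025 * (n_resamples - 1))]
    upper = values[int(0.975 * (n_resamples - 1))]
    return lower, upper


# Toy example: confidence interval on the difference between two condition means.
toy = {"A": [4.0, 4.2, 3.8, 4.1, 3.9], "B": [2.0, 2.1, 1.9, 2.2, 1.8]}
ci = pooled_residual_bootstrap(toy, lambda m: m["A"] - m["B"])
```

If the resulting interval excludes zero, the derived quantity would be classed as significantly different from zero at the 5% level.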
The stimuli were the same dynamic random-dot stereograms as used in the physiology experiments, containing both disparity and interocular delay. They were presented at the mean of the locations used for recording, 1.65° below the horizontal meridian and 5.33° either to the right or left of the vertical meridian. Monkeys viewed the stereogram for 2 s, and then made a forced-choice judgment as to whether the disparate region appeared in front of or behind the surround. Monkeys indicated their choices by making a saccade as described in Prince et al. (2000), and received a water reward for a correct judgment. The definition of correct referred to the spatial disparity of the stimulus. Adding interocular delay to the dynamic random-dot patterns would be expected to introduce an additional percept of a swirling cloud rotating in depth (Ross 1974; Tyler 1974, 1977). However, this cloud should be symmetric about the depth defined by the spatial disparity, so should not bias subjects' reports. Because of previous measurements of their stereoacuity, before this study began, the monkeys were already fully trained on the front/back discrimination task for dynamic random-dot stereograms with zero interocular delay. Nevertheless, we still spent several months making sure their performance had asymptoted for stereograms with an interocular delay. Initially, delays of opposite signs but the same magnitude were randomly interleaved during a given block of presentations. However, because one sign of delay was often easier than the other, this led to an exaggeration of the difference between delays: We found that, when the harder delay was presented separately, the monkeys were capable of better performance than they had shown when both were interleaved. The data presented in this paper were therefore gathered in blocks where only one delay was presented. 
The disparity for each presentation was picked at random from a set of 8 disparities; the interocular delay and the location of the stimulus (left or right) were kept constant within each block.
Human subjects used the same stimuli as the monkeys, offset 5° to right or left of fixation. Because eye position was not monitored in the human subjects, we used short presentations, lasting 200 ms, and presented stimuli randomly on the left or right, to keep fixation centered. Delays of opposite sign but the same magnitude were also randomly interleaved within a block.
For each interocular delay, we obtained psychometric curves giving the proportion of correct judgments P as a function of disparity (see Fig. 10). These were fitted with cumulative Gaussians by the method of maximum likelihood. The disparity threshold for that interocular delay was defined to be the SD of the fitted cumulative Gaussian. Confidence intervals on the thresholds were generated by resampling from a binomial distribution. Simply resampling the subject's responses is unsatisfactory for such binary data. For example, if the subject judges “behind” 10 times on 10 presentations, then resampling will always yield a “behind” judgment for that disparity. Yet the 95% confidence interval for the true probability P of a “behind” judgment includes P as low as 0.7. Thus simply resampling the subject's judgments would underestimate the variability. We dealt with this by picking a new random P on each resampling run, from a probability density function reflecting the uncertainty in the value of P. If the subject made m “behind” judgments out of n presentations of a stimulus, then P was picked from the distribution proportional to $P^m(1 - P)^{n-m}$. Using Bayes' rule, it can be shown that this distribution specifies the likelihood that the true probability was P, given that there were m “behind” judgments in n presentations. The distribution peaks at the observed proportion m/n and its width decreases with the number of repetitions n.
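The resampling scheme for binary judgments can be sketched with the Python standard library. A density proportional to P^m (1 − P)^(n−m) is a Beta distribution with parameters m + 1 and n − m + 1, so each resampling run can draw P with `random.betavariate` and then simulate n judgments. The function name `resample_judgments` is hypothetical, and the subsequent refitting of the cumulative Gaussian is omitted.

```python
import random


def resample_judgments(m, n, rng):
    """One resampling run for a stimulus with m 'behind' judgments out of n presentations.

    Draws P from the density proportional to P^m (1 - P)^(n - m), i.e. Beta(m + 1, n - m + 1),
    then simulates n new binary judgments with that probability.
    Returns (P, number of simulated 'behind' judgments).
    """
    p = rng.betavariate(m + 1, n - m + 1)
    new_m = sum(1 for _ in range(n) if rng.random() < p)
    return p, new_m


rng = random.Random(1)
# Even with 10 'behind' judgments out of 10, the drawn P values spread well below 1,
# so the simulated data sets are not all identical to the original.
draws = sorted(resample_judgments(10, 10, rng)[0] for _ in range(2000))
lower_bound = draws[int(0.025 * len(draws))]  # close to 0.7, as noted in the text
```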
Several of the quantities that we discuss in this paper have arbitrary sign conventions. For convenience, we here group these together for reference. Disparity: negative values mean crossed (near) relative to the fixation point, positive mean uncrossed (far). Interocular delay: positive values mean that the right eye sees a given image before the left eye, and negative values vice versa (cf. Fig. 2A). Orientation of drifting grating stimuli: 0° means the bars are horizontal and moving down; 90° means the bars are vertical and moving to the left, and so on around the clock. Tilt direction index (TDI): a positive TDI means that the cell's disparity preference shifts from near disparities when the right eye is leading, to far disparities when the left eye is leading. Direction selectivity index (DSI): a positive DSI means that the cell responds more to stimuli moving to the right than to the left.
We recorded from 72 disparity-selective cells, 28 in monkey D and 44 in monkey R. Disparity-selective cells were defined as those in which disparity had a significant effect (P < 0.05, ANOVA), and whose disparity discrimination index (DDI, Eq. 1) was ≥0.3 for stimuli with no interocular delay. Cells that passed this test were probed with random-dot stereograms with ≥7 disparities and 5 interocular delays, and ≥4 trials at each disparity/delay combination. Each of these 72 data sets therefore contains ≥140 stimulus presentations, and the majority contain many more (mean over the 72 cells = 593). Figure 3A shows example results for one cell, r142. The dots show mean firing rate at different disparities; error bars show the SE. The curves show the 2D Gabor function fitted to all data together. Black shows the standard disparity tuning curve, obtained with random-dot stereograms with no interocular delay. The colors show disparity tuning curves obtained for different interocular delays, as indicated in the legend. The disparity tuning curves have roughly the same shape, independent of interocular delay. However, their amplitude decreases as the magnitude of interocular delay increases. For interocular delays of 28 ms, the disparity tuning is essentially abolished. In Fig. 3B, the same data are displayed as a delay/disparity profile. Now, the vertical axis shows interocular delay; firing rate is represented as color, as indicated in the color bar. The Gabor is now shown with contour lines. Figure 2A shows how to interpret the signs of disparity and delay.
We found that the cells fell into 2 broad groups. The larger group behaved like the example just considered. Interocular delay reduced the amplitude of disparity tuning curves, but did not substantially change their shape. Figure 3, C and D shows data from 2 more cells of this type. However, in a few cells, interocular delay systematically shifted the preferred disparity. Two examples of this type are shown in Fig. 4. In both these cells, the peak response shifts from far (positive) disparities when the left eye is leading the right (negative delays) to near (negative) disparities when the right eye leads the left (positive delays). In the color plot, this shift shows up as a diagonal structure. For this second group of cells, the delay/disparity profile is tilted relative to the space–time axes; it is space/time-inseparable. In contrast, the cells in Fig. 3, where preferred disparity is independent of interocular delay, show no such tilt: the delay/disparity profile is space/time-separable.
We quantified the amount of tilt using the tilt direction index (TDI) introduced by Anzai et al. (2001). Although the absolute value of this index can range from 0 to 1, the distribution was highly skewed to small values. The mean magnitude of the TDI was 0.094 (SD = 0.17, SE = 0.020, n = 72) and the median was just 0.022. The TDI was significantly different from zero in only 5/72 cells (including both the examples shown in Fig. 4, r148 and r499). Thus the cells shown in Fig. 3 are much more typical of the population than those shown in Fig. 4. Space/time-separable profiles are far more common in monkey V1 than the inseparable, tilted profiles.
The advantage of Anzai et al.'s TDI as a measure of tilt is that it is model independent. For the 58/72 cells where the fitted Gabor explained more than 60% of the variance, the fit parameter θ provided an alternative measure of tilt (see Eq. 3). In 9/58 cells, θ is significantly different from zero, indicating that the delay/disparity profile is tilted relative to the space–time axes. This includes all 5 cells classed as tilted by the TDI; however, there are another 4 cells that have significant nonzero θ but not significant TDI. These are cells where θ differs only slightly from zero, although the data are sufficiently reliable that essentially the same θ is found on every resampling run. Two examples are d177 (Fig. 3C) and d294 (Fig. 6C); in both of these θ was significantly different from zero even though the TDI was not, and very little tilt is apparent on inspecting the delay/disparity profile. We felt therefore that Anzai's TDI was more appropriate for classifying cells as space/time-separable or -inseparable, and we use this measure in the rest of the paper. Note that whichever measure is chosen, the overwhelming majority of cells are classified as nontilted, space/time-separable.
SIMPLE AND COMPLEX CELLS.
The classification of cells into simple and complex by means of their response to drifting gratings (ratio of fundamental to DC; Movshon et al. 1978a) presents problems in the awake monkey due to eye movements. We recently developed a classification using the response to counterphase-modulating stimuli to extract a complexity index (Cumming, unpublished observations). This information was available for 67/72 cells. Of the 67 cells, 52 (78%) were classed as complex and 15 as simple. There did not seem to be any difference between the space/time-separability of the simple and complex types. All 5 cells with significantly tilted profiles were classed as complex, compared with 47/67 cells where the tilt direction index was not significantly different from zero (70%) (NS, Fisher's exact test), nor was there a significant correlation between complexity index and the magnitude of the tilt (r = 0.16, n = 67, P = 0.20). This is not surprising; standard models such as the energy model do not lead us to expect any difference in separability between simple and complex cells.
For 55/72 cells, direction selectivity was measured with drifting gratings, presented binocularly and monocularly in both eyes. Because the orientation tuning measured in each eye was generally similar (cf. Bridge and Cumming 2001), gratings were presented at the same, optimal orientation in each of the 3 cases, and the response was compared for opposite directions of drift. The direction selectivity index (DSI) obtained with a monocular grating in the dominant eye was generally similar to that obtained with a grating presented binocularly (correlation coefficient r = 0.661, n = 55, P < 10⁻⁶). In what follows, the DSI refers to the results with either a binocular grating or a monocular grating in the dominant eye, whichever gave the stronger response at the preferred direction. Cells were classified as “direction selective” if the response to the 2 directions of motion was significantly different (P < 0.05) under the t-test: 22/55 (40%) cells were direction-selective with the optimal grating stimulus. This proportion is large compared with previously published estimates in V1 (35% in Schiller et al. 1976; 27% in DeValois et al. 1982; 27% in Orban et al. 1986; 28% in Hawken et al. 1988). This reflects a conscious selection bias. Because we were interested in the relationship between tilt and direction selectivity, toward the end of the study we would run the disparity/delay experiment whenever we encountered a direction-selective cell. Thus direction-selective cells were more likely to be included in this study.
RELATIONSHIP BETWEEN DIRECTION SELECTIVITY AND SPACE/TIME-INSEPARABILITY.
In the energy model and other models with an initially linear stage, tilted delay/disparity profiles must arise from tilted (space/time-inseparable) receptive fields (Anzai et al. 2001; Morgan and Fahle 2000; Qian and Andersen 1997). These tilted receptive fields would in turn make the cell sensitive to the direction of motion (Adelson and Bergen 1985); it would jointly encode motion and disparity. The preferred direction (leftward or rightward) can be predicted from the direction of the shift that interocular delay causes in preferred disparity. We investigated whether this expectation was borne out in our data by looking for a correlation between the DSI and the TDI.
With the sign convention we have chosen (Fig. 2), linear models predict that the signed TDI should be positively correlated with the signed DSI. If delaying the left eye's image shifts the cell's disparity tuning toward nearer disparities (positive TDI), then the cell is expected to prefer rightward-moving stimuli (positive DSI). As noted in methods, this is because the image of a near object falls to the left of fixation in the right eye and to the right of fixation in the left eye. If its image in the left eye is artificially delayed (positive interocular delay), then it is seen first to the left of fixation, then to the right (i.e., it appears to move to the right). Similarly, a far object with the same interocular delay will be seen as moving to the left. If the receptive field is tuned to rightward motion, this means that the near object will elicit larger responses, i.e., the positive interocular delay has shifted the cell's disparity tuning toward near disparities. This is defined as a positive TDI (Fig. 2A).
Figure 5 shows that this expected correlation is borne out. Figure 5A shows the correlation between DSI and TDI for all cells whose direction preference was assessed, apart from 3 cells whose orientation preference was exactly horizontal. These 3 were tested only with gratings drifting up or down, so it was not possible to assess whether they preferred leftward or rightward motion and no sign could be assigned to the DSI. None of the 3 showed a significant TDI. For the remaining 52 cells, TDI and DSI are significantly correlated (r = 0.46, n = 52, P < 0.001): cells with tilted delay/disparity profiles are, as expected, more likely to be direction selective. The filled symbols indicate the 5 cells with a statistically significant TDI; the 2 examples shown in Fig. 4, r148 and r499, are labeled. The diamonds indicate cells with a significant DSI. All 5 tilted cells are extremely direction-selective.
The correlation in Fig. 5A is weakened by other cells that are also direction selective but not tilted. Two examples are r089 and r066, labeled in Fig. 5A. These cells do not in fact provide convincing evidence against the simple linear models that require a correlation between DSI and TDI. Both r089 and r066 were so sensitive to interocular delay that their disparity tuning was essentially abolished by a delay of just one frame (see data for r066 in Fig. 9B). It is possible, therefore, that they might have revealed a tilted response if it had been possible to probe them with shorter delays. A different problem affects some other cells, such as ruf144. This cell's preferred orientation was about 15° from the horizontal, so it was tested with gratings drifting in near-vertical directions and found to be selective for upward versus downward motion. Thus according to standard linear models its receptive field should be inseparable on (y, t) axes, predicting that we would have found a tilted profile if we had measured the cell's response as a function of delay and vertical disparity. However, because we did not measure direction selectivity for horizontal motion, it remains possible that the cell's receptive field is separable on (x, t) axes, in which case the lack of a tilted profile for delay and horizontal disparity is unsurprising.
To avoid both problems, we reanalyzed the correlation between TDI and DSI excluding 21 cells whose preferred orientation was within 45° of horizontal, meaning that the relevant direction tuning was not measured, and a further 3 cells that were so sensitive to interocular delay that tilt could not reliably be assessed (στ < 10 ms, Eq. 4, meaning that an interocular delay of one 14-ms frame reduced the amplitude of disparity tuning to <40% of its peak amplitude). Figure 5B shows the correlation between DSI and TDI after excluding both sets of cells; the error bars show the 68% confidence interval estimated by resampling. The correlation is now even stronger (r = 0.64, n = 28, P < 0.001), and there are no cells that obviously violate the expected relationship. Three examples, labeled in Fig. 5B, are shown in Fig. 6. r549 (Fig. 6A) is strongly direction selective and has, as predicted, a tilted (inseparable) delay/disparity tuning surface. r233 (Fig. 6B) is a cell that responds equally to leftward/rightward motion and has, as predicted, a separable delay/disparity tuning surface. d294 is the closest we come to an exception to the prediction. It is direction selective but not significantly tilted. However, weak tilt is visible in the surface (the orientation of the fitted Gabor, θ in Eq. 3, is significantly different from zero) and the sign of the tilt is consistent with the sign of the direction selectivity (TDI and DSI are both positive). Thus across the population, more pronounced tilt measured with horizontal disparity is associated with more pronounced left/right direction tuning and the sign of the tilt is as expected from the direction tuning. This suggests that standard linear models are essentially accurate in predicting an association between space/time-inseparable disparity profiles and direction selectivity.
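For readers who wish to run this kind of population analysis on their own data, the sketch below computes a Pearson correlation between two signed indices and a 68% percentile interval for r by resampling cells with replacement. It is illustrative only: the index values are synthetic (generated from a seeded random number generator, not taken from this study), the function names are hypothetical, and the resampling here is applied to the correlation coefficient rather than to the per-cell error bars plotted in Fig. 5B.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def bootstrap_r_interval(x, y, n_boot=2000, level=0.68, seed=0):
    """Percentile interval for r, resampling (x, y) pairs with replacement."""
    if len(set(x)) < 2 or len(set(y)) < 2:
        raise ValueError("need at least two distinct values in each variable")
    rng = random.Random(seed)
    n = len(x)
    samples = []
    while len(samples) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) > 1 and len(set(ys)) > 1:  # skip degenerate resamples
            samples.append(pearson_r(xs, ys))
    samples.sort()
    lower = samples[int(n_boot * (1.0 - level) / 2.0)]
    upper = samples[int(n_boot * (1.0 + level) / 2.0) - 1]
    return lower, upper

# Synthetic stand-ins for signed TDI and DSI values of 28 cells (not real data).
rng = random.Random(1)
tdi = [rng.gauss(0.0, 0.2) for _ in range(28)]
dsi = [1.5 * t + rng.gauss(0.0, 0.3) for t in tdi]

r = pearson_r(tdi, dsi)
lo, hi = bootstrap_r_interval(tdi, dsi)
print(f"r = {r:.2f}, 68% interval = [{lo:.2f}, {hi:.2f}]")
```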
OPTIMAL INTEROCULAR DELAY.
In the delay/disparity profiles shown in Fig. 3, even though they are space/time-separable, it is noticeable that delays of opposite sign do not have the same effect on the cell's response. For example, in r142 (Fig. 3A), when the right eye's image sequence is delayed 14 ms relative to the left (blue curve), the amplitude of the disparity tuning curve is reduced to only about 80% of its zero-delay value. When the left eye is delayed 14 ms relative to the right (red), however, the amplitude is halved. Similar asymmetries are visible for the other cells shown in Fig. 3. The natural conclusion is that, had we been able to apply interocular delays in much smaller increments, we would have seen the response peak at a small but nonzero delay. We can estimate this optimal interocular delay from the fitted Gabor; the asymmetries visible at our coarse sampling have the effect of shifting the maximum amplitude of the fit away from zero interocular delay.
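The logic of estimating a nonzero optimum from coarsely sampled delays can be illustrated with a simplified calculation. The sketch below passes a one-dimensional Gaussian exactly through the amplitudes at −14, 0, and +14 ms and returns its peak position and SD. This is an assumption-laden stand-in for the 2D Gabor fit actually used: the function name is hypothetical, and the input amplitudes (0.8, 1.0, 0.5) are the approximate values quoted above for r142, with negative delays meaning that the right eye is delayed.

```python
import math

def gaussian_from_three_points(amp_minus, amp_zero, amp_plus, step_ms):
    """Peak position and SD of the Gaussian through amplitudes at -step, 0, +step."""
    y_minus, y_zero, y_plus = (math.log(a) for a in (amp_minus, amp_zero, amp_plus))
    curvature = y_plus + y_minus - 2.0 * y_zero  # equals -step^2 / sigma^2
    if curvature >= 0:
        raise ValueError("amplitudes are not consistent with a single-peaked Gaussian")
    sigma_sq = -step_ms ** 2 / curvature
    peak_ms = (y_plus - y_minus) * sigma_sq / (2.0 * step_ms)
    return peak_ms, math.sqrt(sigma_sq)

peak, sigma = gaussian_from_three_points(0.8, 1.0, 0.5, 14.0)
print(f"optimal delay ~ {peak:.1f} ms, SD ~ {sigma:.1f} ms")
```

With these rough inputs the Gaussian peaks at a delay of about −3.6 ms with an SD of about 14.6 ms, showing how an asymmetry between the two one-frame delays translates into a small negative optimum.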
We define τp to be the interocular delay at which the fitted function attains its maximum departure from baseline, within the range of disparities/delays actually sampled. Because this estimate of optimal interocular delay relies on the fit, we restricted our analysis to the 58/72 cells for which the fitted function explained more than 60% of the variance, although the results were essentially the same when all 72 cells were included. The distribution of τp for these 58 cells is shown in the red histogram at the top of Fig. 7. The mean is −4.1 ms (±5.4 ms SD), marked in Fig. 7 with a vertical arrow and flanking broken lines. The distribution is clearly shifted toward negative delays (in which the right eye's image sequence is delayed relative to the left), lying well to the left of the black vertical line marking zero, and we can confidently reject the null hypothesis that the population mean is zero (P < 10⁻⁷, t-test). Most cells in our study respond best when there is a small interocular delay, such that a given image is presented first in the left eye.
This asymmetry between left and right probably represents a nasotemporal difference. Because we were recording from the left hemisphere in both animals, the stimuli were all presented in the right visual field, i.e., projecting to nasal retina in the right eye and temporal retina in the left. Thus the retinal images fell closer to the optic disk in the right eye than in the left (Fig. 8). Although the distances are small, fibers in the retina are unmyelinated, so conduction velocity is slow: about 60 cm/s at 5° eccentricity (Sutter and Bearse 1999). This introduces a latency difference of a few milliseconds (Auerbach et al. 1961; Hood et al. 2000; Lee 1970; Sutter and Bearse 1999). If this time lag is not corrected for in the brain, but binocular neurons simply tend to respond best to changes occurring at the same time in inputs from both eyes, then this could explain the asymmetry in the neuronal data. Delaying the right eye's image sequence by a few milliseconds means that corresponding images from the 2 eyes reach the cortex at the same time. The optimal interocular delay in our neuronal data, around 4 ms, is commensurate with estimates of retinal conduction latency in humans (5 ms; Hood et al. 2000; Lee 1970).
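The arithmetic linking conduction velocity to latency can be made explicit. The sketch below assumes a single constant velocity of 60 cm/s along the whole intraretinal path, which is a simplification (the cited figure applies at 5° eccentricity), and converts a latency difference into the implied difference in path length; the function name is hypothetical.

```python
def path_difference_mm(velocity_cm_per_s, latency_ms):
    """Extra unmyelinated path length (mm) implied by a latency difference (ms)."""
    velocity_mm_per_s = velocity_cm_per_s * 10.0
    return velocity_mm_per_s * latency_ms / 1000.0

# A 4.1-ms latency difference at an assumed 60 cm/s.
print(f"{path_difference_mm(60.0, 4.1):.2f} mm")
```

Under this assumption, the mean optimal delay of about 4 ms corresponds to roughly 2.5 mm of additional intraretinal path in the left eye.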
TOLERANCE OF INTEROCULAR DELAY.
As the interocular delay moves further away from the optimal value for each cell, the amplitude of the disparity tuning curve decreases. To quantify the rate of this fall-off for both tilted and nontilted cells, we integrated the Gaussian envelope of the 2D Gabor fits over disparity and measured the SD of the resulting Gaussian, στ (Eq. 4). (In fact, as explained in methods, this correction for tilt was negligible: for all but one cell the corrected value στ was within 0.2 ms of the uncorrected value σ‖.)
For the 58/72 cells for which the Gabor fit explained more than 60% of the variance, the distribution of στ is shown in the blue histogram in Fig. 7. The mean value was 15.5 ms (±5.1 ms SD); again, this is shown with a horizontal arrow and dashed lines in Fig. 7. Space/time-inseparable delay/disparity profiles tended to extend over a longer range of delays: 〈στ〉 = 22.5 ms (±6.7 ms SD) for the 5/58 cells with a significant TDI, and 14.9 ms (±4.5 ms SD) for the 53/58 cells that were not significantly tilted. The 2 groups had significantly different values of στ (P < 10⁻³, 2-sample t-test). Note that the larger temporal extent of the tilted profiles may partly reflect a selection effect: if neurons respond strongly over a wider range of delays, tilt is more readily detected. However, even when we restricted the analysis to the 50/58 cells for which στ > 10 ms (same criterion as applied to Fig. 5B), there was still a significant correlation between TDI and στ (r = 0.51, n = 50, P < 0.0002). We conclude that most V1 cells can detect disparities between correspondences that are separated in time by ≲15 ms, although a minority can detect disparities beyond 20 ms.
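To give a feel for what these values of στ mean in practice, the sketch below evaluates a Gaussian fall-off of disparity-tuning amplitude with interocular delay. It assumes a purely Gaussian dependence on delay (ignoring the disparity dimension of the Gabor fit); the function name and the optional `optimal_delay_ms` parameter are hypothetical.

```python
import math

def relative_amplitude(delay_ms, sigma_ms, optimal_delay_ms=0.0):
    """Disparity-tuning amplitude at a given delay, relative to the peak."""
    return math.exp(-((delay_ms - optimal_delay_ms) ** 2) / (2.0 * sigma_ms ** 2))

# One 14-ms frame of delay for a short and for the mean integration time.
for sigma in (10.0, 15.5):
    print(f"sigma = {sigma} ms: {relative_amplitude(14.0, sigma):.2f} of peak")

# Population means (sigma = 15.5 ms, optimum = -4.1 ms): asymmetric one-frame delays.
print(f"-14 ms: {relative_amplitude(-14.0, 15.5, -4.1):.2f}, "
      f"+14 ms: {relative_amplitude(14.0, 15.5, -4.1):.2f}")
```

With στ = 10 ms a single 14-ms frame leaves less than 40% of the peak amplitude, which is the exclusion criterion used for Fig. 5B; with the population mean values, delays of −14 and +14 ms leave roughly 80% and 50% of the peak, respectively.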
Comparison with temporal frequency tuning
The tolerance to interocular delay in random-dot stereograms implies a temporal integration time στ of around 15 ms. This would be expected to impose an upper limit on the cell's ability to respond to modulations of contrast at high temporal frequency. We looked to see whether this was reflected in the high-cut fhi of the contrast temporal frequency tuning curve, where the response falls to 61% of the maximum. The energy model predicts that fhi = 1/(2πστ). To our surprise, no such correlation was apparent. We experimented with several different ways of measuring temporal frequency tuning. We used drifting gratings at the cell's optimal spatial frequency and orientation, counterphase-modulating gratings at the cell's optimal spatial frequency and orientation, and counterphase-modulating random-dot patterns. These could be presented either monocularly or binocularly, and with presentations to different eyes either in blocks or interleaved. In cells where several measures were used, the high-cuts obtained by the different methods were generally correlated, but with considerable scatter. There was no correlation between fhi and 1/στ. In general, cells responded to higher temporal frequencies than would be predicted from their tolerance of relatively long interocular delays. The mean value of 1/(2πστ) was 12 Hz; the mean contrast high-cut was around 20 Hz.
Three examples are shown in Fig. 9. The top row shows delay/disparity profiles, with the integration time στ indicated with a blue arrow. The bottom row shows temporal frequency tuning for each cell. This was assessed for every cell with a counterphase-modulating binocular random-dot pattern (squares), and for 2 cells also with a drifting monocular grating (triangles). The curves show the fits made to the data. The high-frequency cutoff derived from each fit is marked with an arrow. The high-cut predicted from the delay/disparity profile, fhi = 1/(2πστ), is indicated with the blue arrow descending from the top row. The first column shows a rare cell whose high-cut temporal frequency is as predicted from its delay/disparity profile. Its temporal integration time στ, estimated from the fitted Gabor, was 7.7 ms, implying a high-cut of around 21 Hz. This is roughly what was obtained both with binocular counterphase-modulating random-dot patterns and with monocular drifting gratings (although the low-pass characteristics were very different with these 2 stimuli). r066 (Fig. 9B) is one of the few cells whose high-cut temporal frequency was lower than predicted from its delay/disparity profile. It was highly sensitive to interocular delay, its disparity tuning being completely abolished by interocular delays of just 14 ms, suggesting a short temporal integration time of perhaps 4 ms. It would thus be expected to continue responding to counterphase modulation of a random-dot stereogram up to frequencies >20 Hz, but in fact it gave its maximum response to modulations at just 2 Hz, and its response had fallen to 61% of this maximum by 6 Hz. D407 (Fig. 9C) shows one of the majority of cells whose high-cut temporal frequency was higher than predicted from its delay/disparity profile. It responded to disparity over a very wide range of interocular delays (fitted στ 27 ms), suggesting a high-cut of just 6 Hz. In fact, it carried on increasing its firing as the temporal frequency rose to 18 Hz, both for counterphase-modulating random-dot patterns and for drifting gratings.
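The predicted high-cuts quoted for the example cells follow directly from fhi = 1/(2πστ). The sketch below reproduces that arithmetic and also confirms that, for a Gaussian temporal kernel of SD στ, the gain at fhi is exp(−1/2), i.e., about 61% of maximum. The function names are hypothetical, and the Gaussian kernel is the simplifying assumption behind the prediction rather than a claim about real neurons.

```python
import math

def predicted_high_cut_hz(sigma_tau_ms):
    """High-cut frequency (Hz) predicted from an integration time in ms."""
    return 1000.0 / (2.0 * math.pi * sigma_tau_ms)

def gaussian_kernel_gain(freq_hz, sigma_tau_ms):
    """Relative gain at freq_hz of a Gaussian temporal kernel with SD sigma_tau_ms."""
    sigma_s = sigma_tau_ms / 1000.0
    return math.exp(-0.5 * (2.0 * math.pi * freq_hz * sigma_s) ** 2)

for sigma in (7.7, 4.0, 27.0):  # integration times quoted for the example cells
    f_hi = predicted_high_cut_hz(sigma)
    gain = gaussian_kernel_gain(f_hi, sigma)
    print(f"sigma = {sigma} ms -> f_hi = {f_hi:.1f} Hz, gain at f_hi = {gain:.2f}")
```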
This failure to find the correlation predicted by simple models suggests that, as indicated by previous studies in the cat (Dean et al. 1982; Reid et al. 1991, 1992; Tolhurst et al. 1980), V1 neurons contain temporal nonlinearities. These enable the cells to respond to high-frequency contrast modulation, while nevertheless tolerating relatively long interocular delays when comparing inputs from the 2 eyes.
We looked to see whether these temporal properties of V1 cells' disparity tuning were reflected in perceptual performance. We obtained horizontal disparity thresholds for random-dot stereograms at various interocular delays, and in both the left and right visual hemifields. Figure 10 shows example psychometric functions for monkey D. The curves are cumulative Gaussians fitted to the data. The SD of this cumulative Gaussian was taken to be the disparity threshold for this delay, i.e., the change in disparity needed to lift performance from chance to 84% correct. The magnitude of the interocular delay is 14 ms. The 4 curves show results for stimuli presented in left and right visual hemifields, and for both positive and negative delays. In both hemifields, the threshold depends on the sign of delay, with opposite signs in opposite hemifields giving the best performance. When stimuli are presented in the left hemifield, the threshold is lowest (stereoacuity highest) when the left eye sees a given image after the right (positive interocular delay). When stimuli are presented in the right hemifield, the reverse is true. This is exactly what one would expect based on the interocular latency difference attributed to retinal conduction. Suppose that binocular neurons are most sensitive to disparity, and stereoacuity is thus highest, when signals from both eyes simultaneously reach the cortex. When the stimulus is in the left visual hemifield, the image in the left eye falls closer to the optic disk. A small delay applied to the image in the left eye gives the image in the right eye a chance to “catch up,” so the 2 signals simultaneously reach the cortex. Stereoacuity is thus actually improved by a small positive interocular delay. The optimal delay is the difference in the times taken by the signals in the 2 retinae to reach the optic disk.
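The threshold definition used here can be made concrete with a short calculation. The sketch below evaluates a cumulative Gaussian psychometric function and shows that performance one SD above its mean is about 84%; the function name, parameter names, and the example SD are hypothetical, chosen only to illustrate the definition.

```python
import math

def cumulative_gaussian(disparity, mean, sd):
    """Proportion of correct/'near' responses predicted at a given disparity."""
    return 0.5 * (1.0 + math.erf((disparity - mean) / (sd * math.sqrt(2.0))))

mean, sd = 0.0, 0.05  # hypothetical values, in degrees of disparity
print(f"at the mean:      {cumulative_gaussian(mean, mean, sd):.3f}")       # chance level
print(f"one SD above it:  {cumulative_gaussian(mean + sd, mean, sd):.3f}")  # threshold level
```

Because the function rises from 50% at its mean to about 84% one SD away, the fitted SD can be read directly as the disparity threshold, and its reciprocal as the sensitivity.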
Figure 11 summarizes many similar psychometric functions for monkeys D and R, and for 2 human observers, BC and HN. The symbols show sensitivity (1/threshold) as a function of interocular delay. Results for the left visual hemifield are shown with red symbols and those for the right hemifield with blue. The 68% confidence interval for each threshold was estimated by resampling and is marked with an error bar. The curves show a Gaussian fitted to these data by the method of maximum likelihood. We know that the sensitivity must fall to zero as the interocular delay rises indefinitely, so the baseline of the Gaussian was set to zero. The fitting was performed in logarithmic coordinates (i.e., log-Gaussian was fitted to log-sensitivity) because the error bars increase as a function of sensitivity (and are nearly constant for log-sensitivity).
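Fitting in logarithmic coordinates has a convenient consequence: the logarithm of a zero-baseline Gaussian is a parabola in delay, so the fit reduces to a quadratic regression. The sketch below uses ordinary least squares on log-sensitivity, which coincides with maximum likelihood only under the added assumption of equal, Gaussian errors in log-sensitivity; it is not the authors' fitting code, the function names are hypothetical, and the example sensitivities are synthetic.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def solve3(m, b):
    """Solve the 3x3 linear system m @ c = b by Cramer's rule."""
    d = det3(m)
    solution = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = b[r]
        solution.append(det3(mc) / d)
    return solution

def fit_log_gaussian(delays_ms, sensitivities):
    """Fit log(s) = c0 + c1*t + c2*t^2; return (peak_ms, sd_ms, amplitude)."""
    y = [math.log(s) for s in sensitivities]
    power_sums = [sum(t ** k for t in delays_ms) for k in range(5)]
    rhs = [sum(yi * t ** k for t, yi in zip(delays_ms, y)) for k in range(3)]
    normal = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    c0, c1, c2 = solve3(normal, rhs)
    if c2 >= 0:
        raise ValueError("fitted parabola has no maximum; data are not Gaussian-like")
    peak_ms = -c1 / (2.0 * c2)
    sd_ms = math.sqrt(-1.0 / (2.0 * c2))
    amplitude = math.exp(c0 - c1 ** 2 / (4.0 * c2))
    return peak_ms, sd_ms, amplitude

# Synthetic sensitivities sampled every 14 ms from a Gaussian peaking at -5 ms.
delays = [-42, -28, -14, 0, 14, 28, 42]
sens = [2.0 * math.exp(-((t + 5.0) ** 2) / (2.0 * 15.0 ** 2)) for t in delays]
peak, sd, amp = fit_log_gaussian(delays, sens)
print(f"peak = {peak:.2f} ms, SD = {sd:.2f} ms, amplitude = {amp:.2f}")
```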
Optimal interocular delay.
The peak of the Gaussian is our estimate of the optimal interocular delay: the delay that would result in the greatest stereoacuity for stimuli presented in that hemifield. This is marked with a vertical arrow for each fit. For the right hemifield, we can compare these psychophysical results with the neuronal data. From the psychophysics, the optimal interocular delay for stimuli in the right hemifield was −5.7 ms for D and −6.1 ms for R. From the physiology, the optimal interocular delay was −4.1 ms (mean for 58 neurons from both monkeys; ±5.4 ms SD, 0.7 ms SE). These numbers are in good agreement.
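The standard error quoted for the neuronal estimate follows from the SD and the number of cells. The sketch below reproduces that step and the corresponding one-sample t statistic against zero, assuming independent cells; the function names are hypothetical.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean from a sample SD and sample size."""
    return sd / math.sqrt(n)

def one_sample_t(mean, sd, n):
    """t statistic for the null hypothesis that the population mean is zero."""
    return mean / standard_error(sd, n)

se = standard_error(5.4, 58)
t_stat = one_sample_t(-4.1, 5.4, 58)
print(f"SE = {se:.2f} ms, t = {t_stat:.2f} (57 degrees of freedom)")
```

The psychophysical optima of −5.7 and −6.1 ms thus lie within about 2 to 3 standard errors of the neuronal mean, and well within one SD of the distribution across neurons.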
The monkey observers, in particular, display a bias: their optimal delays are negative for both hemifields. The origin of this bias is unclear. Some humans do have a noticeable latency difference between their eyes (Harker and O'Neal 1967). In the case of our monkeys, the bias may be related to their long history of attending to and making psychophysical judgments about stimuli in the right hemifield. However, whatever the reason for the bias, we note that in every case the optimal delay is a few milliseconds more negative for stimuli presented in the right hemifield. This difference was significant for 3 of the subjects individually (P < 0.05, by bootstrap resampling; the exception being D); in Fig. 11 the width of each vertical line indicates the 68% confidence interval for the optimal delay. Although not definitive, this is nevertheless in the same direction as expected from retinal conduction delays. Thus it is possible that the small difference in optimal interocular delay for the 2 hemifields represents a perceptual consequence of conduction delays in the retina.
Tolerance of interocular delay.
The observer's tolerance of interocular delay is obtained from the rate at which the sensitivity to disparity falls off as the interocular delay moves away from the optimal value. This is given by the temporal SD of the Gaussian fits shown in Fig. 11. As is apparent from the figure, the SD was slightly narrower for the human observers (mean over both subjects and hemifields, 11.4 ms) than for the monkeys (15.1 ms). Once again, we can compare the psychophysical and neuronal results. The SD for psychophysical stimuli in the right hemifield is 14.7 ms for monkey D and 15.9 ms for monkey R; this compares with a neuronal estimate of 15.5 ms (mean over 58 neurons in both monkeys; ±5.1 ms SD, 0.7 ms SE). This is visualized in Fig. 11 by the black and blue curves, representing neuronal and psychophysical results respectively for stimuli in the right hemifield. The black curves are Gaussians whose SD (15.5 ms) and peak (at −4.1 ms) were taken from the neuronal data, and whose amplitude was set to match the peak performance in that hemifield. Clearly, the decline in psychophysical performance as a function of interocular delay mirrors the decline in neuronal sensitivity. This agreement suggests that the limits on perceptual performance in this task are set by the temporal properties of disparity-sensitive neurons in V1.
We measured responses of disparity-selective V1 neurons as a function of interocular delay in random-dot stereograms. The resulting disparity/delay profiles revealed 3 major findings. First, disparity selectivity is diminished by interocular delays, suggesting a binocular integration time of about 15 ms. Second, this is closely similar to the effect of interocular delay on psychophysical performance. Third, the preferred disparity for most neurons did not change as a function of interocular delay, suggesting that they do not jointly encode disparity and motion in the way that has been postulated to explain the Pulfrich effect (Anzai et al. 2001; Carney et al. 1989; Morgan and Castet 1995; Morgan and Fahle 2000; Morgan and Tyler 1995; Pack et al. 2003; Qian 1997; Qian and Andersen 1997).
Binocular integration time and tuning to interocular delay
Previous experimental and theoretical studies have raised the possibility that tuning to a range of interocular delays may exist. In agreement with Anzai et al. (2001) and Pack et al. (2003), we found that stimuli with no interocular delay almost always elicited stronger responses than stimuli delayed by one or more frames. However, the attenuation produced by delay was not symmetric: stimuli in which the right eye experienced the delay generally elicited stronger responses than those in which the left eye did. This suggested that, if we had been able to apply delays smaller than our 14-ms frame, the optimal delay might not have been exactly zero. Our fitting procedures suggested that, on average, the maximum response would have been obtained if the right eye had experienced a delay of 4–5 ms. Because all stimuli were presented in the right hemifield, this may reflect conduction delays in the retina (Fig. 8). Images in the right hemifield fall closer to the optic disk in the right eye than in the left. Because these fibers are unmyelinated, this is sufficient to introduce a relative lag of about 5 ms. If cortical neurons respond best when inputs from the 2 eyes arrive simultaneously, the optimum stimulus would be presented to the left eye before the right eye. Thus our results do not indicate a specialized encoding of interocular delay. Rather, it appears that, in general, cells simply fire most strongly when inputs from the 2 eyes are coincident. Intriguingly, this suggests that the cortical encoding has failed to adapt to these consistent interocular delays. Although this explanation accounts for the data, recordings from both hemispheres of one animal would be required to make it compelling.
As interocular delay increased, disparity tuning in each cell was gradually abolished. This is explained by the fact that, when the delay exceeds the period over which these neurons integrate information, the signals become uncorrelated. The time constant with which disparity tuning decays with delay gives us an estimate of the binocular integration time. The mean value was about 15 ms. This is in reasonable agreement with recent data from the monkey (Pack et al. 2003; although see Perez et al. 1999), but considerably shorter than previous estimates in the cat (Anzai et al. 2001; Gardner et al. 1985; Pettigrew et al. 1968). This difference between the cat and monkey probably reflects more sluggish temporal integration early in the cat's visual processing, since a similar difference is evident in tuning for contrast temporal frequency (DeAngelis et al. 1993; Hawken et al. 1996; Movshon et al. 1978b). The neuronal integration times seen in the monkey closely matched the psychophysical integration time we found in both monkey and human observers, suggesting that the temporal constraints used in stereo matching are implemented by these early stages of processing.
To explore the underlying mechanism, we compared this with the monocular integration time inferred from responses to stimuli of varying temporal frequency. The 2 measures of integration time were poorly correlated. In general, cells responded to higher temporal frequencies than would be predicted from their observed binocular integration times, although discrepancies were observed in both directions. These discrepancies seem to require a more complex model than a linear kernel followed by a static nonlinearity. The solution may be a nonlinearity in temporal processing, similar to that documented by several studies in the cat (Dean et al. 1982; Reid et al. 1991, 1992; Tolhurst et al. 1980). Alternatively the discrepancies could reflect temporal filtering that is somehow applied only to binocular responses (Julesz and White 1969).
Either way, the effect is to allow V1 neurons to modulate their response quickly when the monocular stimulus changes, while nevertheless integrating signals from the 2 eyes over relatively long periods (about 15 ms). Functionally, it is unclear why this long integration of binocular information should be beneficial. One possibility is that it is related to the nearly 5-ms latency difference between the eyes in normal viewing, noted above. If cortical neurons are unable to compensate for this latency difference, but always respond best to simultaneous inputs from the 2 eyes, then the binocular integration time of 15 ms ensures a good disparity response when the delay is 5 ms. Thus it may be that conduction delays in the retina place a lower limit on the binocular integration times of cortical neurons.
Joint encoding of motion and depth
In most cases, interocular delay reduced the amplitude of the disparity tuning curve without altering its shape. Preferred disparity did not usually change as a function of interocular delay, as indicated by low values of the tilt direction index (TDI) (Anzai et al. 2001). The median value of the TDI was only 0.02 and the mean was 0.09. This is fairly similar to the mean of 0.17 reported by Pack et al. (2003) in macaque V1, but much lower than the value reported by Anzai et al. (2001) in cat area 17/18 (mean 0.44). Tilted delay/disparity profiles imply space/time-inseparable receptive fields. Cells with tilted profiles are therefore expected to be tuned to the direction of motion as well as to disparity (DeAngelis et al. 1995; Qian and Andersen 1997). This expectation was borne out in our data. Thus the lack of tilted profiles in macaque V1 (Pack et al. 2003; this study) compared with cat A17/18 (Anzai et al. 2001) is probably at least partly explained by the scarcity of direction selectivity in monkey V1 relative to cat A17/18 (Casanova et al. 1992; DeValois et al. 1982; Gizzi et al. 1990; Hawken et al. 1988).
Thus our results are compatible with the existing physiological literature: it appears that joint encoding of motion and depth is seen only in direction-selective neurons. This simple statement summarizes our data, that of Anzai et al. and that of Pack et al. for both V1 and MT, and agrees with the model of Qian and Andersen (1997). However, it represents a challenge to existing explanations of the Pulfrich effect (Anzai et al. 2001; Carney et al. 1989; Morgan and Castet 1995; Morgan and Fahle 2000; Morgan and Tyler 1995; Pack et al. 2003; Qian 1997; Qian and Andersen 1997) that rely exclusively on cells jointly encoding both motion and disparity. Our results were obtained with a version of the dynamic-noise-with-delay stimulus (Ross 1976, 1974; Tyler 1977, 1974), widely regarded as the most compelling evidence for joint encoding (Morgan and Fahle 2000; Morgan and Tyler 1995; Morgan and Ward 1980). The absence of joint-encoding responses in single neurons, in response to the very stimulus that led to their theoretical adoption, highlights the need to reevaluate theories of the Pulfrich effect. It is now clear that most disparity-selective cells in monkey V1 do not jointly encode motion and disparity. Thus the joint-encoding model implies that the perception of depth caused by Pulfrich-like stimuli is supported only by the small minority of cells with tilted delay/disparity profiles. Although not impossible—for example, the brain area that is the neural correlate of depth perception may receive projections preferentially from direction-selective V1 neurons, perhaps via MT (Movshon and Newsome 1996; Pack et al. 2003)—this would mean that the majority of disparity-selective neurons in V1, although encoding substantial information about the binocular disparity of the stimulus, make no contribution to stereo depth perception.
Before drawing such a surprising conclusion, note that there are (at least) 2 schemes under which the nontilted disparity-selective neurons in V1 do contribute to depth perception. First, it may be that the outputs of V1 neurons with nontilted profiles are combined in extrastriate cortex with outputs from other neurons, so as to produce tilted profiles from which perception is derived. Second, it is possible that joint motion/disparity sensors are not solely responsible for depth perception in Pulfrich-like stimuli. Although in recent years there has been an emphasis on explanations that invoke joint encoding (Anzai et al. 2001; Morgan and Castet 1995; Morgan and Fahle 2000; Morgan and Tyler 1995; Qian 1997; Qian and Andersen 1997), earlier theories explained depth perception in terms of spatial disparities physically present in the stimulus (Morgan 1979; Tyler 1977, 1974). These earlier theories have been discarded after the success of simulations based on the joint encoding of disparity and motion in model neurons (Qian and Andersen 1997). However, the possible role of nontilted disparity/delay profiles in explaining Pulfrich-like phenomena has not been explored in explicit models. Thus disparity-related signals in V1 may not need to undergo any transformation to explain depth perception in the Pulfrich effect. In this view, the disparity information contained in the activity of V1 neurons with nontilted disparity/delay profiles would contribute to depth perception. We stress that we are here discussing only the depth percept. Obviously, because such cells are generally not direction selective, they would not contribute to the perception of motion. Direction-selective cells, characterized by tilted disparity/delay profiles, must be crucial for perceiving stimulus motion in the Pulfrich effect.
A thought experiment helps distinguish these hypotheses. Imagine a lesion that destroyed all joint motion/disparity sensors everywhere in the brain. If the illusory depth in the stroboscopic Pulfrich effect stems solely from joint motion/disparity encoding, then clearly this lesion will abolish the perception of depth. If, on the other hand, the sensation of depth in this stimulus arises partially from the activity of pure disparity sensors, then an animal with this lesion might still be able to discriminate depth in the stroboscopic Pulfrich stimulus—even though the lesion would be expected to abolish the apparent motion usually associated with the depth. Needless to say, this lesion will be unfeasible for the foreseeable future. A more practical approach may be to investigate models based on an initially separate encoding of disparity and motion, and then test quantitative psychophysical predictions. A key element of such models is the use of appropriate binocular integration times for the neuronal elements, made possible by the data presented here. Together, physiology, psychophysics, and computational modeling may then finally yield a definitive understanding of the Pulfrich effect.
This work was supported by the National Eye Institute.
Thanks to M. Szarowicz and C. Hillman for excellent animal care.
Copyright © 2005 by the American Physiological Society