The Journal of Neurophysiology Vol. 77 No. 6 June 1997, pp. 2879-2909
Copyright ©1997 by the American Physiological Society
Encoding of Binocular Disparity by Complex Cells in the Cat's Visual Cortex
Izumi Ohzawa, Gregory C. DeAngelis, and Ralph D. Freeman
Group in Vision Science, School of Optometry, University of California, Berkeley, California 94720-2020
ABSTRACT
Ohzawa, Izumi, Gregory C. DeAngelis, and Ralph D. Freeman. Encoding of binocular disparity by complex cells in the cat's visual cortex. J. Neurophysiol. 77: 2879-2909, 1997. To examine the roles that complex cells play in stereopsis, we have recorded extracellularly from isolated single neurons in the striate cortex of anesthetized paralyzed cats. We measured binocular responses of complex cells using a comprehensive stimulus set that encompasses all possible combinations of positions over the receptive fields for the two eyes. For a given position combination, stimulus contrast could be the same for the two eyes (2 bright or 2 dark bars) or opposite (1 bright and 1 dark). These measurements provide a binocular receptive field (RF) profile that completely characterizes complex cell responses in a joint domain of left and right stimulus positions. Complex cells typically exhibit a strong selectivity for binocular disparity, but are only broadly selective for stimulus position. For most cells, selectivity for disparity is more than twice as narrow as that for position. These characteristics are highly desirable if we assume that a disparity sensor should exhibit position invariance while encoding small changes in stimulus depth. Complex cells have nearly identical binocular RFs for bright and dark stimuli as long as the sign of stimulus contrast is the same for the two eyes. When stimulus contrast is opposite, the binocular RF also is inverted such that excitatory subregions become suppressive. We have developed a disparity energy model that accounts for the behavior of disparity-sensitive complex cells. This is a hierarchical model that incorporates specific constraints on the selection of simple cells from which a complex cell receives input. Experimental data are used to examine quantitatively predictions of the model. Responses of complex cells generally agree well with predictions of the disparity energy model. 
However, various types of deviations from the predictions also are found, including a highly elongated excitatory region beyond that supported by a single energy mechanism. Complex cells in the visual cortex appear to provide a next level of abstraction in encoding information for stereopsis based on the activity of a group of simple-type subunits. In addition to exhibiting narrow disparity tuning and position invariance, these cells seem to provide a partial solution to the stereo correspondence problem that arises in complex natural scenes. Based on their binocular response properties, these cells provide a substantial reduction in the complexity of the correspondence problem.
INTRODUCTION
One of the most remarkable features of the visual system is the ability to see the world in three-dimensional depth. The visual system reconstructs depth from the pair of two-dimensional images projected on the retinas of the two eyes. These two images are very similar, but they contain small variations in the position of corresponding features in the visual scene, because the two eyes see the world from slightly different viewpoints. This positional variation is called binocular disparity. Although there are other means for estimating depth, stereopsis, the process of recovering depth information from binocular disparity, is usually the most robust and accurate for near distances (Howard and Rogers 1995; Pierce and Benton 1975). It has been demonstrated that binocular disparity alone can give rise to a vivid sensation of depth without the presence of any other depth cues (Julesz 1960, 1971; Wheatstone 1838).
The neural analysis of visual information for stereopsis is thought to begin in the primary visual cortex because it is the first stage along the visual pathway where neurons may be activated by stimulation of either eye and because extensive binocular interactions occur between stimuli presented to the two eyes simultaneously (Barlow et al. 1967; Ferster 1981; Hubel and Wiesel 1962, 1968; LeVay and Voigt 1988; Nikara et al. 1968; Ohzawa and Freeman 1986a,b; Ohzawa et al. 1990; Pettigrew et al. 1968; Poggio and Fischer 1977; von der Heydt et al. 1978). Although there are numerous studies that present descriptions of how neurons respond to binocular stimuli, little is known about the specific neural circuitry that endows these neurons with the ability to respond to stereoscopic stimuli. We do not yet know, for example, the roles that simple and complex cells play with respect to stereopsis. Although both types of neurons clearly are tuned for binocular disparities (Ferster 1981; Joshua and Bishop 1970; Pettigrew et al. 1968), only complex cells appear to respond selectively to dynamic random dot stereograms (DRDS), which are defined by binocular disparity alone (Poggio et al. 1985, 1988; Poggio and Poggio 1984; Poggio 1995). These findings suggest that complex cells may perform more advanced and specialized processing of binocular information for stereopsis than simple cells.
Because the overall size of a complex cell receptive field (RF) is much larger than its optimal width for a bar-shaped stimulus (Emerson et al. 1987; Gaska et al. 1994; Movshon et al. 1978a), it is expected that multiple image features in the visual scene (with sizes optimally excitatory to the cell) will fall into the RF of each complex cell (Fig. 1A). Unlike simple cells, which have multiple discrete flanks and appear capable of signaling the presence of multiple features within the RF (Fig. 1B) by a linear transform (Gabor 1946; Geisler and Hamilton 1986; Ohzawa et al. 1996; Robson 1983; Watson 1991), complex cells seem to face a more difficult binocular correspondence problem, i.e., identification of corresponding image features in left and right images. This is one of the key problems in stereopsis (Julesz 1968, 1971; Marr and Poggio 1976). As illustrated in Fig. 1C (Julesz 1968, 1971), the correspondence problem arises because there is inherent ambiguity in attempts to match corresponding features in left and right images. Without an appropriate filtering mechanism, false targets (dots in Fig. 1, C and D) may elicit as much excitation as correctly matched targets (open and filled squares in Fig. 1, C and D). An examination of Fig. 1C indicates that one of the appropriate filtering operations may be the selection of the matches contained within the horizontally elongated ellipse. When plotted in a familiar Cartesian coordinate system (XL, XR), the desired region of sensitivity for such a filter is an elongated diagonal region as shown in Fig. 1D. Such a sensitivity map may be described as the binocular receptive field of the cell. We examine quantitatively whether complex cells possess binocular RF profiles similar to that shown in Fig. 1D. It is also of interest to know the effects of stimuli falling outside the elongated diagonal area. They could cause suppression or have no effect. We also examine the responses of models in the same (XL, XR) domain, and compare predicted responses with the binocular RFs obtained from cells.

FIG. 1.
Binocular correspondence problem is illustrated in relation to 2 major receptive field (RF) types: simple and complex. A: complex cells generally have large RFs that may contain several targets (bars) as illustrated (open and filled squares). B: simple cells, on the other hand, have discrete RF subregions. C: when multiple targets are present as in A, there is a large number of possible binocular matches (all intersections of rays) including false matches (dots) and correct matches (enclosed in a horizontal ellipse) (redrawn after Julesz 1971). Number of possible matches grows with square of number of targets in each image. Any stereo vision system must be able to solve the binocular correspondence problem, i.e., that of finding correct matches for left and right targets. If a complex cell responds in a nondiscriminatory manner to any conjunction of left and right targets within its RFs, it will not be able to tell the difference between correct and false matches. D: configuration of C is shown again in a Cartesian coordinate system.
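The growth in candidate matches described in the caption can be illustrated with a minimal sketch. It assumes four targets at a common depth, a disparity defined as x_left - x_right, and a hypothetical tolerance window; none of these values come from the study.

```python
from itertools import product


def candidate_matches(left_positions, right_positions):
    """Every pairing of a left-eye target with a right-eye target."""
    return list(product(left_positions, right_positions))


def filter_by_disparity(matches, preferred=0.0, tolerance=0.25):
    """Keep pairings whose disparity (x_left - x_right, an assumed sign
    convention) lies within a tolerance of the preferred disparity."""
    return [(xl, xr) for xl, xr in matches
            if abs((xl - xr) - preferred) <= tolerance]


# Hypothetical example: four bars at one depth, so both eyes see the
# same positions (in degrees).
targets = [-1.5, -0.5, 0.5, 1.5]
matches = candidate_matches(targets, targets)   # 4 x 4 = 16 candidates
kept = filter_by_disparity(matches)             # only the 4 true matches
print(len(matches), len(kept))
```

In the (XL, XR) plane the retained pairings lie on the 45° diagonal, which is the elongated region of sensitivity sketched in Fig. 1D.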
In addition to studying response properties of complex cells as outlined above, we wish to devise a physiologically realistic model for the role of these cells in binocular processing. Such a model must be consistent with the physiological data we obtain, as well as with findings from previous studies. We know that the RFs of complex cells appear generally broad in spatial extent, and nonspecific with respect to the sign of contrast (bright or dark) of a bar or edge stimulus (DeAngelis et al. 1995b; Hubel and Wiesel 1962; Movshon et al. 1978a; Ohzawa et al. 1990). However, they possess similar spatial frequency tuning properties to those of simple cells and only slightly broader orientation tuning characteristics (Gizzi et al. 1990; DeValois et al. 1982; Movshon et al. 1978b). When studied with binocular stimuli, their disparity tuning is often narrower than the overall RF size predicts (Joshua and Bishop 1970; Pettigrew et al. 1968). This high degree of selectivity is thought to originate from multiple underlying RF subunits (Gaska et al. 1987, 1994; Movshon et al. 1978a; Ohzawa and Freeman 1986b; Spitzer and Hochstein 1985; Szulborski and Palmer 1990). Properties of these RF subunits appear to be similar in every respect to those of simple cells: 1) Within a monocular RF of a complex cell, spatial antagonism between neighboring subregions may be demonstrated by studying interactions between two stimuli that are presented at a variety of spatial separations (Movshon et al. 1978a; Szulborski and Palmer 1990). 2) These two-stimulus interaction profiles accurately predict the orientation and spatial frequency tuning of the complex cell (Gaska et al. 1994; Movshon et al. 1978a; Szulborski and Palmer 1990). 3) RF subunits of complex cells combine input from the two eyes in a linear manner (Ohzawa and Freeman 1986b). Furthermore, electrical stimulation of LGN afferents evokes mostly polysynaptic excitation in pyramidal neurons, especially in layers 2+3 (Douglas and Martin 1991). These findings strongly suggest a hierarchical model of a complex cell that consists of a simple cell subunit stage and an output stage at which multiple subunits are combined.
On the basis of physiological findings outlined above, a number of monocular hierarchical models of complex cells have been proposed (Adelson and Bergen 1985; Emerson et al. 1992; Gaska et al. 1994; Pollen et al. 1989). In this study, we develop a model that is suitable for the binocular case (Ohzawa et al. 1990) and derive a theoretical framework in which physiological results may be compared quantitatively with model predictions. For this purpose, we obtain a complete characterization of response properties of complex cells in the joint space-disparity-time domain and then compare these properties with predictions of specific models (Fleet et al. 1996, 1997; Ohzawa et al. 1990; Qian 1994, 1997).
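The two-stage hierarchy described above (linear binocular subunits followed by a combining output stage) can be sketched as an energy-type computation. This is only an illustrative simplification: the Gabor-shaped subunit RFs, the full-squaring nonlinearity, the use of exactly two subunits in quadrature, and the values of SIGMA and FREQ are all assumptions made here, and the model developed in the paper may differ in these details.

```python
import math

SIGMA = 1.0   # envelope width in deg (hypothetical value)
FREQ = 0.3    # carrier spatial frequency in c/deg (hypothetical value)


def gabor(x, phase):
    """One-dimensional Gabor profile used as a monocular subunit RF."""
    envelope = math.exp(-x * x / (2.0 * SIGMA ** 2))
    return envelope * math.cos(2.0 * math.pi * FREQ * x + phase)


def simple_subunit(x_left, x_right, c_left, c_right, phase):
    """Linear binocular subunit: left- and right-eye contributions add.
    c_left and c_right are +1 for a bright bar and -1 for a dark bar."""
    return c_left * gabor(x_left, phase) + c_right * gabor(x_right, phase)


def complex_response(x_left, x_right, c_left, c_right):
    """Output stage: sum of squared outputs of two subunits whose RF
    phases differ by 90 deg (a quadrature pair)."""
    return sum(simple_subunit(x_left, x_right, c_left, c_right, p) ** 2
               for p in (0.0, math.pi / 2.0))


# Matched contrast at zero disparity versus opposite contrast.
print(complex_response(0.0, 0.0, +1, +1), complex_response(0.0, 0.0, +1, -1))
```

Expanding the squares shows that the binocular interaction term is proportional to c_left · c_right · cos(2π · FREQ · (x_left − x_right)). In this sketch the response therefore depends on disparity rather than on absolute position along the diagonal, is identical for bright-bright and dark-dark pairs, and is inverted when the contrast signs are opposite.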
METHODS
Surgical methods, experimental apparatus, and neurophysiological recording procedures have been described in detail elsewhere (DeAngelis et al. 1993a; Ohzawa et al. 1996). Brief descriptions and procedures not described previously are presented here.
Surgical procedure
Adult cats (2-4 kg) were prepared for electrophysiological recording as follows. First, a subcutaneous injection of atropine sulfate (0.2 mg/kg) and acepromazine (1 mg/kg) was given. Anesthesia was induced and maintained during surgery with halothane (2.5-3% in oxygen). Electrocardiogram (ECG) electrodes and a rectal temperature probe were installed. ECG and core temperature were monitored using a PC-based physiological monitoring system (Ghose et al. 1995), which logs heart rate and temperature automatically every 5 min. Catheters were inserted into femoral veins on two limbs for infusion of drugs and fluids. A glass tracheal cannula was inserted immediately after tracheostomy. A stereotaxic apparatus was used to position the animal's head securely. Lidocaine ointment (5%) was used at pressure points. The skull was exposed, and two small machine screws were inserted for use as electroencephalogram (EEG) electrodes. Then, a craniotomy was performed to access the central representation of the visual field in the striate cortex (Horsley-Clarke P4 L2.5). The dura was removed carefully to allow insertion of microelectrodes. After this point, anesthesia was administered by intravenous injection of sodium thiamylal (Surital). Then, paralysis was induced with an initial dose of gallamine triethiodide (Flaxedil, 7-10 mg/kg), and the animal was placed under artificial respiration. Anesthesia for the rest of the recording session was maintained by a combination of nitrous oxide (70% mixed with oxygen) and Surital (1 mg·kg⁻¹·hr⁻¹). Paralysis was maintained by continuous infusion of Flaxedil (10 mg·kg⁻¹·hr⁻¹) in lactated Ringer solution containing 5% dextrose. To maintain a proper level of respiration, a CO2 sensor (Hewlett-Packard 47210A) was used. For the remainder of the experiment, four physiological parameters (heart rate, temperature, end-tidal CO2 level, and EEG amplitude) were displayed continuously and logged by the monitoring system (Ghose et al. 1995). The system provides voice warnings if any of these parameters exceeds preset limits. Pupils were dilated with 1% atropine sulfate solution, and nictitating membranes were retracted with 5% phenylephrine HCl. Contact lenses of appropriate power with 4 mm artificial pupils were placed over the corneas. Locations of optic disks and the areae centrales were mapped onto a tangent screen using a reversible ophthalmoscope.
Stimuli and data acquisition
A tangent screen with a back-projected bar stimulus was used for initial exploration of RFs. The position and orientation of the bar were controlled by a joystick to facilitate manual exploration. Visual stimuli for quantitative measurements were generated on a matched pair of cathode ray tube displays and presented dichoptically through half-silvered front-surface mirrors angled at 45° in front of the animal's eyes. The mean luminance of the displays was 45 cd·m⁻² by direct viewing and 17 cd·m⁻² as viewed through the half-silvered mirrors. The screens were placed at 57 cm from the cat's eyes, at which distance they subtended 28 × 22°.
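As a worked check of the viewing geometry, the sketch below converts visual angle to physical extent for a flat screen centered on the line of sight. The function name is hypothetical, and the screen dimensions it yields are inferred from the stated angles rather than reported in the text.

```python
import math


def extent_cm(angle_deg, distance_cm):
    """Extent of a flat, centered screen that subtends angle_deg when
    viewed from distance_cm (hypothetical helper)."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


one_deg = extent_cm(1.0, 57.0)    # about 1 cm per degree at 57 cm
width = extent_cm(28.0, 57.0)     # roughly 28.4 cm (inferred, not reported)
height = extent_cm(22.0, 57.0)    # roughly 22.2 cm (inferred, not reported)
print(round(one_deg, 3), round(width, 1), round(height, 1))
```

At 57 cm, 1 cm on the screen corresponds to very nearly 1° of visual angle, which makes the conversion between screen distance and degrees convenient.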
The displays were driven by a PC-based dichoptic visual stimulator with two graphics adapters (Imagraph) and had a spatial resolution of 1,024 × 804 pixels. The two displays were refreshed at a frame rate of 76 Hz. The timing of video frames for the two displays was synchronized by hardware so that the dichoptic stimuli might be delivered without onset asynchrony. The stimulator generated sync pulses that indicated stimulus onset and timing of temporal modulation. These pulses were recorded by the data acquisition system along with the spike data.
Conventional amplifiers, oscilloscopes, and audio speakers were used to monitor raw signals from the microelectrodes. Spike data and stimulus sync pulses were recorded by custom-built data acquisition systems. A separate computer was used to control experiments and to perform preliminary real-time analysis of incoming data. The PC-based visual stimulator was controlled via a serial port, and the acquired spike and stimulus sync data were received from the data acquisition systems via high-speed interfaces. Sufficient information to reconstruct each trial, together with the times of occurrence of all spikes and sync pulses, was saved to a file to allow flexible and complete reanalysis of data. The data were recorded with 1-ms (previous system) or 40-µs (current system) resolution.
Recording procedures
Tungsten-in-glass microelectrodes (Levick 1972) were used for extracellular recording from neurons in the striate cortex. To increase the chance of encountering cells, two electrodes were mounted in a protective guide tube. They were driven in parallel with a single microelectrode drive (Inchworm, Burleigh). To minimize tissue damage, the two electrodes were not glued together, which allowed cortical tissue to pass between them. After confirming under a microscope that the electrodes did not penetrate blood vessels on the cortical surface, a small amount of agar in warm Ringer solution was applied to stabilize the cortex. Then molten wax was applied over the agar and the surrounding cranial bone to form a sealed chamber. This provided additional stability and protected the agar from drying.
After isolation of spike waveforms, the position and approximate preferred orientation of the RFs (for each eye) were recorded using a bar stimulus projected on the tangent screen. Then, an interactive search program (DeAngelis et al. 1993a) was used to determine optimal parameters with a small circular patch of drifting sinusoidal grating. Initial estimates of the optimal spatial frequency, orientation, and the center location of RFs were obtained. These values were refined by subsequent quantitative measurements under computer control. As a routine procedure, orientation tuning and direction selectivity were measured first for each eye. Then, spatial and temporal frequency tuning curves were obtained. Each curve was defined by 7-11 points, and a cubic spline procedure (Press et al. 1992) was used to locate the peaks of the tuning curves.
Binocular RF measurement and analysis
A reverse correlation procedure (DeBoer and Kuyper 1968; Eggermont et al. 1983; Jones and Palmer 1987a; Sutter 1974, 1975) was used to study binocular RFs. Details of our monocular reverse correlation method have been described previously (DeAngelis et al. 1993a, 1995a; Freeman and Ohzawa 1990; Ohzawa et al. 1996). The binocular version of the reverse correlation method was similar to the monocular one except that each stimulus consisted of a pair of bars presented dichoptically, as illustrated schematically in Fig. 2.

FIG. 2.
A method of obtaining detailed binocular RF maps is illustrated. An exhaustive set of dichoptic stimuli is presented in which all possible combinations of left and right stimulus positions are included for each left-right permutation of contrast sign: dark-dark, bright-bright, dark-bright, and bright-dark. Stimuli in the set are presented in random order, 1 by 1, at a rate of 20-25 stimuli/s. Stimulus presentations are repeated after reshuffling the sequence. Resulting spike trains are cross-correlated with the stimulus sequence using a procedure known as reverse correlation (see text).
A binocular RF profile is defined in a joint two-dimensional domain (XL, XR) that includes the conjunction of left and right RFs (see Fig. 1D). Bright and dark bars of optimal orientations were presented in randomized order, at all possible combinations of left and right stimulus positions. Typically, 20 stimulus locations were used for each eye. This defined a 20 × 20 point stimulus grid in the (XL, XR) domain. Because each eye may be shown either a bright or dark bar, there were four permutations of bright and dark stimuli at each grid point. Therefore each binocular RF measurement tallied responses to 1,600 (20 × 20 × 4) distinct stimuli. The sheer size of the binocular stimulus set limited the stimulus configuration for each eye to a long bar that was moved along the axis perpendicular to the preferred orientation. Each stimulus was presented for three to four video frames (40-53 ms/stimulus or 19-25 stimuli/s) in a randomized sequence without any blank frames. Even at this rapid presentation rate, a complete stimulus sequence lasted 64-85 s.
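The stimulus-set arithmetic in this paragraph can be checked with a short sketch. The constants restate values from the text (20 positions per eye, 4 contrast permutations, 76-Hz frame rate); the function name is hypothetical.

```python
POSITIONS_PER_EYE = 20     # stimulus locations per eye (from the text)
CONTRAST_PAIRS = 4         # dark-dark, bright-bright, dark-bright, bright-dark
FRAME_RATE_HZ = 76.0       # nominal display refresh rate (from the text)

n_stimuli = POSITIONS_PER_EYE ** 2 * CONTRAST_PAIRS   # 20 x 20 x 4 = 1,600


def sequence_timing(frames_per_stimulus):
    """Return (ms per stimulus, stimuli per second, seconds per sequence)."""
    duration_ms = 1000.0 * frames_per_stimulus / FRAME_RATE_HZ
    rate_per_s = 1000.0 / duration_ms
    sequence_s = n_stimuli * duration_ms / 1000.0
    return duration_ms, rate_per_s, sequence_s


for frames in (3, 4):
    d, r, s = sequence_timing(frames)
    print(f"{frames} frames: {d:.1f} ms/stimulus, {r:.1f} stimuli/s, {s:.1f} s")
```

With the nominal 76-Hz rate this gives about 39.5 and 52.6 ms per stimulus and about 63 and 84 s per sequence, consistent with the rounded ranges quoted in the text (40-53 ms, 64-85 s).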
Presentation of such a stimulus sequence elicited a train of spikes. For each spike generated, a causal stimulus pair that was likely to have elicited the spike was identified by looking up the stimulus that preceded the spike by a given delay (Fig. 2, top). For the real-time analysis during measurements, we used a delay of 40-70 ms, empirically determined to be effective for most neurons. However, the choice of this delay was not critical because we reanalyzed the data for delays ranging from 0 to 300 ms or more as soon as a run was completed. For example, for the right-most spike shown in Fig. 2, a look-up in the stimulus sequence identifies k as the causal stimulus, which happens to be a pair of bright and dark bars presented to the left and right eyes, respectively, at the locations indicated. An element in the two-dimensional map at the corresponding location (XL, XR) is incremented. In this example, only the map for the bright-dark stimulus combination is shown, but note that there are a total of four such maps, one for each permutation of bright and dark stimuli for the two eyes. Stimulus sequences were repeated, with random reshuffling of the stimulus set, until smooth profiles were obtained (or until the unit was lost). Typically, a total of 20-40 sequences was used, which took 20 min to 1 h. This process yielded a complete binocular RF map for a given correlation delay. However, as we have emphasized for the monocular case, RFs should be considered in the joint space-time domain because spatial and temporal profiles are clearly interdependent in many cases (DeAngelis et al. 1995b). For this reason, we computed binocular RFs in the joint domain of space and time (XL, XR, T), or in the related domain of binocular disparity and time (D, T), where T is the correlation delay.
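The look-up-and-increment step described above can be sketched as follows. This is a minimal illustration, not the analysis software used in the study: it assumes stimuli are shown back to back from time 0 with a fixed duration, and the function name, data layout, and example values are hypothetical.

```python
from collections import defaultdict


def reverse_correlate(stimuli, spike_times_ms, delay_ms, duration_ms):
    """Count spikes per causal stimulus for one correlation delay.

    stimuli[i] is a tuple (contrast_pair, left_index, right_index) for the
    stimulus shown during [i * duration_ms, (i + 1) * duration_ms).
    Returns a dict mapping each such tuple to its spike count.
    """
    counts = defaultdict(int)
    for t in spike_times_ms:
        # Stimulus that was on screen delay_ms before the spike.
        index = int((t - delay_ms) // duration_ms)
        if 0 <= index < len(stimuli):
            counts[stimuli[index]] += 1
    return counts


# Hypothetical three-stimulus sequence, 50 ms per stimulus.
sequence = [("bright-dark", 3, 7), ("dark-dark", 5, 5), ("bright-bright", 1, 2)]
spikes = [30, 70, 115, 120, 400]
rf_counts = reverse_correlate(sequence, spikes, delay_ms=60, duration_ms=50)
print(dict(rf_counts))
```

Calling the same function for a series of delay values and stacking the resulting maps gives the (XL, XR, T) representation mentioned in the text; spikes whose causal time falls outside the sequence are simply ignored here.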
Histology and laminar analysis
For each electrode track, electrolytic lesions (5 µA, 10 s) were made at 700- to 1,500-µm intervals while the electrodes were retracted. The animal then was given an overdose of pentobarbital sodium (Nembutal) and perfused through the heart with Formalin (4% in buffered saline). Coronal sections (40 µm thickness) of the visual cortex were made and stained with thionin. Electrode tracks then were reconstructed. Based on lesions and the depth information for each recorded cell, the laminar locations of the cells were identified. Histological analyses confirmed that all cells were recorded from area 17.
RESULTS
We recorded from a total of 257 neurons in the striate cortex of 18 normal adult cats. Of these, 115 were classified as complex on the basis of subjective criteria (Hubel and Wiesel 1962) and on the degree of temporal modulation of responses to drifting sinusoidal gratings (Skottun et al. 1991). The remaining 142 cells were classified as simple, and results for many of these cells are reported elsewhere (DeAngelis et al. 1991, 1995a; Ohzawa et al. 1996). Of the 115 complex cells, quantitative binocular RF measurements were completed for 46 disparity selective cells and 8 disparity insensitive cells. Cells were sometimes lost during preliminary grating measurements for spatial frequency tuning or dichoptic relative-phase sensitivity (Ohzawa and Freeman 1986a). We also should note that we focused on cells that showed clear relative-phase sensitivity with dichoptically presented gratings. We did not always perform complete RF mapping on cells that were nonphase-specific because they are not likely to be directly involved with stereopsis (Ohzawa and Freeman 1986b; Ohzawa et al. 1996). Therefore, our complex cell sample is biased toward those that exhibited some binocular interaction.
Figure 3A shows the measured binocular RF for a representative complex cell. There are four separate RF maps for different permutations of bright and dark stimuli presented to the two eyes. The basic characteristics of these RF maps may be summarized in three main points. First, there is a clear elongated region of strong excitation along a 45° diagonal when stimuli shown to the two eyes have the same contrast sign (i.e., bright-bright and dark-dark combinations). This pattern of response translates into a region of narrow selectivity for depth (Fig. 1C), thus eliminating many possible false matches that could occur within the cell's RF (Fig. 1D). Second, responses to bright-bright or dark-dark stimuli are nearly identical, indicating that the cell is not sensitive to the sign of contrast as long as it is matched for the two eyes. Third, for combinations of opposite stimulus contrast for the two eyes (bright-dark and dark-bright), there are two parallel regions of excitation on each side of the diagonal. These are nearly equally strong but are weaker than the excitation to the matched combinations of contrast. Note that responses near the margins of the XL-XR domains represent monocular excitation profiles, because one eye's stimulus is outside the RF for that eye. For example, the profile near the left margin of each domain (i.e., the left-most vertical cross-section) represents a monocular RF profile for the right eye. Compared with the excitation level at the peak of the monocular profile, there is a suppression of the response along the diagonal for the bright-dark and dark-bright combinations. This indicates that the response is suppressed when opposite contrasts are presented at the optimal binocular disparity (for same contrast combinations). The above pattern of binocular responses is observed frequently among complex cells. 
These results indicate that many complex cells respond in a nearly ideal manner, filtering out responses to false targets as outlined in Fig. 1, C and D, when the sign of contrast is matched for the two eyes.

FIG. 3.
Representative data from 2 cells in striate cortex are shown. A: complex cell (layer 2+3). Binocular RFs in XL-XR domain are shown as contour plots for 4 permutations of stimulus contrast. The darker the shading within a contour, the greater the response. Peak response (darkest contours) for this cell was 2.0 spikes per stimulus presentation. Contour levels are equally spaced, and the same scale is used for all 4 domains. Dashed curves near the margins of each domain represent monocular RF profiles. Stimulus size, optimal spatial frequency, and orientation were 4 × 0.5°, 0.31 c/deg, and 75° (0° is horizontal and 90° is vertical), respectively. Stimulus duration and correlation delay were 52.8 and 65 ms, respectively. B: simple cell (layer 4). Monocular responses plotted near the margins of each panel (dashed curves) show the difference of bright and dark responses. Peak response for this cell was 3.9 spikes per stimulus. Stimulus size, optimal spatial frequency, and orientation were 5 × 0.5°, 0.4 c/deg, and 155°, respectively. Stimulus duration and correlation delay were 52.8 and 60 ms, respectively.
For comparison, results of a similar experiment on a simple cell are presented in Fig. 3B. The response pattern is clearly different from that of the complex cell of Fig. 3A. First, all four panels show different patterns. In particular, patterns for bright-bright and dark-dark stimulus combinations are not the same. In fact, bright-bright and dark-dark RF maps are complementary to each other: where there is a peak in one, there is a blank area in the other. This also applies to bright-dark and dark-bright conditions. The structure of the binocular RF profile for the simple cell is well predicted from the monocular response profiles given along the margins (dashed curves). The binocular pattern is approximately the sum of monocular excitation from the two eyes, i.e., those cross-points at which excitation (peaks of monocular profiles) from the left and right eyes coincide show highly enhanced responses. On the other hand, at those locations where a trough in one eye's monocular profile meets a peak for the other eye, there is little response. This indicates that a bright-excitatory subregion is indeed inhibitory to a dark stimulus and a dark-excitatory subregion is inhibitory to a bright stimulus. This simple cell behavior is consistent with linear spatial summation of inputs from the two eyes (Ohzawa and Freeman 1986a). Note that the behavior of the complex cell (Fig. 3A) cannot be explained by linear mechanisms. The basic pattern of binocular responses shown in Fig. 3B is duplicated for other simple cells we have studied. These results suggest that simple cells are involved in general purpose processing and are not specialized specifically for stereopsis. An array of simple cells represents a general linear transformation of the retinal images. As such, they are useful for a variety of visual functions (Ohzawa et al. 1996).
Disparity tuning curves and disparity-time RFs
The format of data presentation in Fig. 3 is somewhat unusual in the sense that most binocular data in previous studies are presented in the form of a disparity tuning curve (i.e., a one-dimensional function of disparity) and not as a two-dimensional profile (Pettigrew et al. 1968; Poggio and Fischer 1977). This is primarily because of methodological limitations in previous studies. It would have taken too long with traditional peristimulus time histogram techniques to measure responses to 400 (20 × 20) combinations of stimulus positions. By virtue of its experimental efficiency, the reverse correlation technique largely eliminates this difficulty, allowing point-by-point measurements of binocular responses in depth. To allow comparisons, we can reduce our data to obtain disparity tuning curves. This process is illustrated in Fig. 4. A disparity tuning curve is derived by integrating the two-dimensional XL-XR profile along constant disparity lines parallel to the 45° diagonal. The resulting curve is shown at the top-right of Fig. 4. [Note that the Pythagorean geometrical distance relationship does not hold in the XL-XR domain. For example, the upper left and bottom right corners of the XL-XR domain are separated by 10 degrees of disparity as depicted in the tuning curve, not by 7.07 degrees (5·√2). This is because the XL-XR domain is distorted from real space as illustrated in Fig. 1C.] Because we integrate over all combinations of positions that correspond to a particular disparity, this derivation process is essentially equivalent to measuring a disparity tuning curve with a slow moving bar stimulus, in which the total number of spikes generated by a swept bar is plotted as a function of disparity. Note that the original XL-XR profile is obtained using stationary flashed stimuli. Therefore, the disparity tuning curve generated in this manner is likely to be somewhat different from that obtained by swept bar stimuli. The difference will probably be more pronounced for direction selective complex cells for which second and higher order (sequential) stimulus effects make a large contribution (Baker 1990; Emerson et al. 1987). The disparity tuning curves that we obtain represent pure disparity sensitivity profiles without the confounding effects produced by moving stimuli. Similar problems are present when attempting to interpret monocular RF profiles measured using moving bar stimuli (DeAngelis et al. 1995a; Maske et al. 1984; Ohzawa et al. 1996). Direct comparisons of our disparity tuning curves to those obtained by moving bar stimuli are not possible because we did not use such stimuli. In any case, neither our method nor the use of moving bar stimuli is appropriate for studying sequential effects. A nonlinear analysis technique (Anzai et al. 1995) is required to adequately address this problem.

FIG. 4.
A process is shown by which a traditional form of disparity tuning curve is derived from data in XL-XR domain. Oblique lines indicate loci along which binocular disparity of stimuli is constant. Integrating XL-XR data along lines of constant disparity yields the disparity tuning curve (top right).
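The reduction from an XL-XR map to a disparity tuning curve can be sketched as a sum along each diagonal of the map. The function below is a minimal illustration under assumed conventions (map indexed as rf_map[left][right], disparity taken as x_left - x_right, uniform grid spacing); it is not the authors' analysis code, and the example map is made up.

```python
def disparity_tuning(rf_map, step_deg):
    """Sum an n x n map along lines of constant disparity.

    rf_map[i_left][i_right] holds the response at one grid point and
    step_deg is the grid spacing. Returns (disparities_deg, summed
    responses), each with 2n - 1 entries.
    """
    n = len(rf_map)
    disparities = [k * step_deg for k in range(-(n - 1), n)]
    tuning = [0.0] * (2 * n - 1)
    for i_left in range(n):
        for i_right in range(n):
            # All grid points with the same index difference share a disparity.
            tuning[(i_left - i_right) + (n - 1)] += rf_map[i_left][i_right]
    return disparities, tuning


# Hypothetical 3 x 3 map with excitation concentrated on the diagonal.
example_map = [[2.0, 1.0, 0.0],
               [1.0, 2.0, 1.0],
               [0.0, 1.0, 2.0]]
disp, curve = disparity_tuning(example_map, step_deg=0.25)
print(disp, curve)
```

Because the disparity axis runs from one corner of the map to the opposite corner, its span is the sum of the two axis extents, which is why a 5° × 5° domain covers about 10° of disparity rather than 7.07°. Note also that in this simple sum fewer grid points contribute at extreme disparities than at zero disparity.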
|
|
In addition to facilitating comparisons between our data and those of previous studies, reducing the XL-XR profile to a disparity tuning curve allows us to examine the temporal behavior of disparity selectivity. We can compute disparity tuning curves (as in Fig. 4) for a range of correlation delays between stimulus and response, thus producing a disparity-time (D-T) plot. Just as the monocular space-time RF is a key predictor of direction selectivity for simple cells (Burr and Ross 1986; DeAngelis et al. 1993a,b; McLean and Palmer 1989; McLean et al. 1994; Reid et al. 1991; Watson and Ahumada 1985), the disparity-time RFs can describe how neurons respond to changes in disparity of binocular stimuli, i.e., motion-in-depth (Cynader and Regan 1978, 1982; Spileers et al. 1990). Figure 5 shows binocular RF profiles as well as disparity-time profiles for a complex cell. XL-XR profiles (Fig. 5A) show characteristics similar to those of the complex cell in Fig. 3A. Almost identical response patterns are observed for the two matched contrast sign conditions (bright-bright and dark-dark) each having a single diagonally elongated region of excitation. The two diagonal regions of excitation for opposite contrast conditions (bright-dark and dark-bright) are also similar to those for the cell of Fig. 3A, except that the responses are stronger. Disparity tuning profiles, derived as outlined above, are shown in Fig. 5B. These profiles clearly exhibit even-symmetric disparity tuning. Figure 5C shows D-T profiles for each left-right contrast sign combination. The D-T profiles have a relatively simple structure in that the shape of the disparity tuning curve remains constant over the time course of the response. Only the response amplitude appears to change over time, with the peak response occurring at 60 ms for this cell (dashed horizontal line). Note, in particular, that the preferred disparity remains constant over time, i.e., there is no tilt of excitatory regions in the disparity-time domain. This indicates that the cell is not particularly sensitive to motion-in-depth (Cynader and Regan 1978, 1982; Spileers et al. 1990).

FIG. 5. Data from another complex cell (layer 2+3) are shown. A: data are presented in same format as in Fig. 3A. Peak response for this cell was 3.4 spikes per stimulus. Stimulus size, optimal spatial frequency, and orientation were 20 × 0.4°, 0.35 c/deg, and 145°, respectively. Stimulus duration and correlation delay were 52.8 and 60 ms, respectively. B: disparity tuning curves, obtained by procedure illustrated in Fig. 4, are shown for respective XL-XR plots above. C: time courses of disparity tuning are shown as disparity-time (D-T) plots. Time delay used for A and B is shown by a horizontal dashed line.
Figure 6 presents data from another cell. Again, XL-XR maps for the two matched-contrast conditions exhibit nearly identical response patterns, each having a 45° diagonal region of excitation. However, the disparity tuning curves shown in Fig. 6B are not even symmetric. There is a prominent dip in the response below the plateau level (monocular excitation level) on the left side of the excitatory peak, but not on the right side. This disparity tuning pattern is again constant over time as indicated by the D-T profiles shown in Fig. 6C. Curiously, for this cell, opposite contrast conditions (dark-bright and bright-dark) produced little binocular interaction. For these conditions, the disparity tuning curves in Fig. 6B do not show clear peaks, and the D-T profiles in Fig. 6C exhibit very little structure.

FIG. 6. Data from another complex cell (layer 2+3) are presented in same format as Fig. 5. A: XL-XR plots are shown. Peak response for this cell was 4.4 spikes per stimulus. Stimulus size, optimal spatial frequency, and orientation were 20 × 0.4°, 0.93 c/deg, and 85°, respectively. Stimulus duration and correlation delay were 52.8 and 70 ms, respectively. B: disparity tuning curves for this cell are nearly odd-symmetric with respect to center of envelope. C: disparity-time plots are shown.
Not all complex cells exhibit clear binocular interactions as shown in Figs. 3, 5, and 6. Figure 7 presents results from a cell that had no apparent binocular interaction. In Fig. 7A, excitation due to left and right eye monocular stimuli extends as vertical and horizontal bands, respectively, forming a cross-shaped profile. At the intersection of these bands, the excitation level is generally higher. This is most pronounced for the left-most panel (dark-dark), but it is also visible for the middle two panels. This pattern occurs because excitation from the two eyes adds together at the intersection. However, the transformed disparity tuning curves are flat, as shown in Fig. 7B. This may be understood by examining the procedure for deriving a disparity tuning curve, as depicted in Fig. 4. A diagonal line at any location in Fig. 4 will cross the peaks of excitation for the left and right eyes, given a sufficient path length. Therefore the cumulative response remains constant regardless of whether the monocular excitation coincides or occurs at separate positions along a given path. The existence of these cells is expected, because complex cells that are not selective to disparity have been reported previously (Chino et al. 1994; Joshua and Bishop 1970; Ohzawa and Freeman 1986b). The cell shown in Fig. 7 did not exhibit interocular phase tuning to drifting sinusoidal grating stimuli presented dichoptically (Ohzawa and Freeman 1986b).
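The argument for a flat tuning curve can be checked numerically. The sketch below assumes a hypothetical cell whose XL-XR map is just the sum of two monocular terms, with a made-up triangular monocular profile (`bump`) on a 21 × 21 grid; none of these values come from the recorded data.

```python
def diagonal_sums(profile):
    """Sum profile[i][j] along each line of constant disparity index d = j - i."""
    n = len(profile)
    sums = {d: 0.0 for d in range(-(n - 1), n)}
    for i in range(n):
        for j in range(n):
            sums[j - i] += profile[i][j]
    return sums


N, CENTER = 21, 10


def bump(i):
    """Hypothetical monocular RF: a triangular bump, zero beyond 3 steps from center."""
    return float(max(0, 4 - abs(i - CENTER)))


# No binocular interaction: the response is the plain sum of the two monocular terms.
additive = [[bump(i) + bump(j) for j in range(N)] for i in range(N)]
flat = diagonal_sums(additive)

# Diagonals with |d| <= 7 are long enough to cross both monocular bands completely.
print([flat[d] for d in (-7, 0, 7)])
```

Every diagonal that fully crosses both bands collects the same total, whether or not the bands intersect on that diagonal. The sum only falls off for diagonals near the corners, where the path is too short to cross both bands, which corresponds to the "sufficient path length" condition in the text.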

FIG. 7. Data are illustrated for a complex cell (layer 5) that is not sensitive to binocular disparity. A: there is no diagonal structure in these XL-XR plots unlike cells in previous figures. Peak response for this cell was 2.7 spikes per stimulus. Stimulus size, optimal spatial frequency, and orientation were 20 × 0.25°, 0.95 c/deg, and 160°, respectively. Stimulus duration and correlation delay were 52.8 and 100 ms, respectively. B: disparity tuning curves are nearly flat for all permutations of stimulus contrast. D-T plots are not shown because there is no structure in any of the plots.
Disparity energy model
We have proposed a model for disparity-selective complex cells based on a combination of simple-cell subunits (Ohzawa et al. 1990). Here, we present the model's behavior in detail and examine quantitatively whether it provides an adequate description of experimental data. Figure 8A illustrates the model for a complex cell that is tuned to zero disparity. The model consists of a minimum of four simple-cell subunits that are combined to produce the output of a complex cell. In this sense, the model is hierarchical and gives a concrete functional design to the original scheme proposed by Hubel and Wiesel (1962). Our model is also a natural binocular extension of monocular complex cell models (Adelson and Bergen 1985; Emerson et al. 1992; Pollen et al. 1989). Each subunit is binocular and linearly combines inputs from the two eyes (Ohzawa and Freeman 1986a). The output of the subunits passes through a half-squaring nonlinearity before converging onto a complex cell (Emerson et al. 1989, 1992; Heeger 1992a,b; Ohzawa et al. 1990; Pollen et al. 1989). The output of the subunit S1 at the top of Fig. 8A to a line stimulus (or a thin bar) is given by
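$$S_1(X_L, X_R) = \left\{\mathrm{Pos}\left[e^{-kX_L^2}\cos(2\pi f X_L) + e^{-kX_R^2}\cos(2\pi f X_R + \phi)\right]\right\}^2 \tag{1}$$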
where k is the factor that determines the width of the subunit RFs and f is the spatial frequency. These two parameters are assumed to be equal for the two eyes, and this assumption is supported by our data for simple cells (Ohzawa et al. 1996). The parameter φ is the phase difference between left and right RFs. Pos[ ] is a half-rectifying function
$$\mathrm{Pos}[x] = \begin{cases} x, & x > 0 \\ 0, & \text{otherwise} \end{cases} \tag{2}$$
This is a relatively straightforward model of a simple cell with linear binocular convergence (Ohzawa and Freeman 1986a) and a half-squaring nonlinearity (Emerson et al. 1989, 1992; Heeger 1992a,b; Pollen et al. 1989). This latter component may be considered as a form of "soft threshold" (Carandini et al. 1996). The phase difference, φ, accounts for the observation that the left and right RFs of simple cells may have different shapes (DeAngelis et al. 1991, 1995a; Freeman and Ohzawa 1990; Ohzawa et al. 1996). For simplicity, we use even- and odd-symmetric subunit RFs as shown in Fig. 8A. This nonbiological restriction (DeAngelis et al. 1993a; Field and Tolhurst 1986) is removed later.

FIG. 8. A disparity energy model is illustrated. A: a complex cell (Cx) tuned to 0 disparity is modeled as consisting of 4 simple subunits (S). Each subunit combines input from 2 eyes linearly according to left and right RFs (left). Output of each subunit goes through a half-squaring nonlinearity that represents the fact that only postsynaptic potentials that exceed a threshold value elicit action potentials. "Tap points" (S1-S4) are included for reference in B. B: binocular responses in XL-XR domain are shown as contour plots for 4 subunits. The darker the shading, the stronger the response. Responses are normalized for each plot to show details of weaker responses such as those for S2. C: binocular responses of complex cell are shown. This is simply a point-by-point sum with appropriate scaling of all 4 subunit RFs in B. The plot represents responses to bright or dark bar stimuli presented to both eyes. Compare this to experimental data in Fig. 3A. D: binocular responses to opposite contrast conditions are shown, i.e., a bright bar to 1 eye and a dark bar to the other. E: responses are shown of a model that employs a half-rectifier instead of a half-squarer as the output nonlinearity for simple subunits. Note that substantial ripples remain in the complex cell response profile, but the overall response pattern is similar to that of C.
In Fig. 8A, we show the case of φ = 0. Plots for φ ≠ 0 are shown later. Because the subunits S1 and S2 have inverted RFs and for any function g(a)
$$\{\mathrm{Pos}[g(a)]\}^2 + \{\mathrm{Pos}[-g(a)]\}^2 = [g(a)]^2 \tag{3}$$
the sum of contributions from the top two subunits S1 and S2 is given by
$$S_1 + S_2 = \left[e^{-kX_L^2}\cos(2\pi f X_L) + e^{-kX_R^2}\cos(2\pi f X_R + \phi)\right]^2 \tag{4}$$
The response of complex cell C1 is then given by adding the contribution from subunits S3 and S4 to Eq. 4
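$$C_1 = \left[e^{-kX_L^2}\cos(2\pi f X_L) + e^{-kX_R^2}\cos(2\pi f X_R + \phi)\right]^2 + \left[e^{-kX_L^2}\sin(2\pi f X_L) + e^{-kX_R^2}\sin(2\pi f X_R + \phi)\right]^2 \tag{5}$$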
This function is shown as a contour plot in Fig. 8C and is very similar to the responses of the complex cells shown in Figs. 3A and 5A for the matched contrast conditions. A traditional disparity tuning curve also is obtained for the model, using the procedure described in Fig. 4, and plotted below the two-dimensional profile in Fig. 8C. The disparity tuning curve is quite similar to those shown in Figs. 4 and 5B. The case described in Fig. 8A (φ = 0) produces a cell that is tuned to zero disparity as indicated by the dashed diagonal line in Fig. 8C.
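The construction described above, four half-squared binocular subunits summed into one complex cell, can be written out as a short sketch. It assumes Gabor-shaped subunit RFs with a Gaussian envelope of width factor k and carrier frequency f, as described in the text; the numeric values of `K` and `F`, the function names, and the particular choice of absolute phases in `THETAS` are illustrative assumptions rather than parameters taken from the paper.

```python
import math

K = 0.5   # hypothetical envelope width factor k
F = 0.5   # hypothetical spatial frequency f (cycles per degree)


def linear_stage(xl, xr, theta, phi=0.0):
    """Linear binocular sum for a subunit whose RF carrier has absolute phase theta."""
    left = math.exp(-K * xl * xl) * math.cos(2 * math.pi * F * xl + theta)
    right = math.exp(-K * xr * xr) * math.cos(2 * math.pi * F * xr + phi + theta)
    return left + right


def half_square(v):
    """Half-wave rectification followed by squaring."""
    return max(v, 0.0) ** 2


# S1..S4: an even subunit, its inverted copy, and the quadrature (odd) pair.
THETAS = (0.0, math.pi, -math.pi / 2, math.pi / 2)


def complex_from_subunits(xl, xr, phi=0.0):
    """Complex cell response as the sum of four half-squared subunit outputs."""
    return sum(half_square(linear_stage(xl, xr, t, phi)) for t in THETAS)


def complex_sum_of_squares(xl, xr, phi=0.0):
    """Same response written directly as a sum of two full squares."""
    even = linear_stage(xl, xr, 0.0, phi)
    odd = linear_stage(xl, xr, -math.pi / 2, phi)
    return even ** 2 + odd ** 2


print(complex_from_subunits(0.3, -0.2), complex_sum_of_squares(0.3, -0.2))
```

The two functions agree to rounding error at every stimulus position. Each inverted pair of half-squared subunits rebuilds one full square, so four rectifying units are enough to implement a sum of squares.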
Models that compute the sum-of-squares (e.g., Eq. 5) are called "energy models" on the basis of a formal definition of energy (Adelson and Bergen 1985). For example, in physics, the integral over time of the square of a voltage waveform across a resistor is proportional to the energy dissipated within the resistor. This notion may be generalized to neural signals. Simple-cell subunits that feed into a complex cell (i.e., a binocular energy unit) must meet specific requirements to produce a sufficiently smooth binocular profile. First, all monocular parameters must be the same among the four subunits, including spatial frequency, orientation, size, and position of the RF envelopes. This has been shown to be true for most simple cells (Ohzawa et al. 1996; Skottun and Freeman 1984). The requirement does not apply to phase, however. Second, all subunits must share a common preferred disparity as measured with bar or grating stimuli (Ohzawa and Freeman 1986b). Although simple cells do not possess a unique preferred disparity to noise stimuli and hence they tend not to respond to dynamic noise stereograms (Poggio et al. 1985; Qian 1994; Zhu and Qian 1996), they do exhibit a clear disparity tuning to bar or grating stimuli (Ferster 1981; LeVay and Voigt 1988; Ohzawa and Freeman 1986b). Third, the phases of the four subunits (not left and right RF phases) must differ from each other by multiples of 90° (Pollen et al. 1989). Our model assumes that these conditions are fulfilled.
To illustrate how each of the four subunits contributes to the final complex cell response, XL-XR maps for the individual subunits are shown in Fig. 8B. The binocular response profile of each subunit exhibits the pattern of excitation and inhibition that is well predicted by monocular RFs of the subunit. For example, even-symmetric monocular RFs of the subunit at the top of Fig. 8A yield the binocular response at S1, which is even-symmetric along both the horizontal and vertical axes. The response is strongest at the center of the profile, where the peaks of monocular excitation coincide. Responses of the second subunit, at S2, are shown in the second panel of Fig. 8B. Again, the binocular response pattern is related closely to the monocular RF structure shown in Fig. 8A (second subunit from top). The binocular response exhibits four peaks at locations where peaks in the two monocular RF profiles coincide. Note the similarity of the binocular responses of the model's subunits to those of the simple cell shown in Fig. 3B. This is not surprising given that binocular simple cells combine input from the two eyes in a linear manner (LeVay and Voigt 1988; Ohzawa and Freeman 1986a), as do the subunits in the model. The binocular response of the complex cell, as an XL-XR map, is given in Fig. 8C. This is the sum of the four subunit profiles shown in Fig. 8B. Note that the four subunit profiles, each of which lacks any elongation along the diagonal, combine to produce a remarkably smooth, diagonally elongated complex cell binocular RF profile.
Binocular responses to opposite contrast stimuli to the two eyes are shown in Fig. 8D (Ohzawa et al. 1990). The response pattern may be expressed as
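$$C_1 = \left[e^{-kX_L^2}\cos(2\pi f X_L) - e^{-kX_R^2}\cos(2\pi f X_R + \phi)\right]^2 + \left[e^{-kX_L^2}\sin(2\pi f X_L) - e^{-kX_R^2}\sin(2\pi f X_R + \phi)\right]^2 \tag{6}$$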
This equation is identical to Eq. 5, except for the sign inversion of the right-eye terms. There is prominent suppression along the diagonal, exactly at the location where there is an excitatory region in Fig. 8C. In this case, the cell exhibits two diagonal bands of excitation for opposite contrast conditions, again at the disparities where there was suppression in Fig. 8C.
The form of nonlinearity that follows binocular convergence in the linear subunits is important. The model shown in Fig. 8A employs a squaring nonlinearity that produces a smooth binocular response profile as shown in Fig. 8C. If, instead, a simple half-wave-rectifying nonlinearity is used, the otherwise identical model produces a binocular response as shown in Fig. 8E. For both monocular and binocular portions of the responses (responses near the edges and the central diagonal, respectively), substantial ripples are observed in the profile. Except for these ripples, however, the basic pattern of the response is similar to that of the squaring configuration shown in Fig. 8C. Additional subunits may be used in the model to smooth out the ripples to obtain a final smooth profile. However, the nonlinearity must be of a squaring form if the number of subunits is to be minimized.
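The effect of the output nonlinearity can be illustrated along the zero-disparity diagonal, where both eyes see the bar at the same position. The sketch below uses hypothetical values for the envelope width factor and spatial frequency and the same four-phase subunit set assumed for the minimal model. It divides out the Gaussian envelope so that any remaining variation is ripple.

```python
import math

K, F = 0.5, 0.5  # hypothetical envelope width factor and spatial frequency
THETAS = (0.0, math.pi, -math.pi / 2, math.pi / 2)  # four subunits in quadrature


def pooled(x, nonlinearity):
    """Pooled subunit output for identical bars at X_L = X_R = x (phi = 0)."""
    env = math.exp(-K * x * x)
    return sum(nonlinearity(2 * env * math.cos(2 * math.pi * F * x + t)) for t in THETAS)


def half_rectify(v):
    return max(v, 0.0)


def half_square(v):
    return max(v, 0.0) ** 2


xs = [-2.0 + 0.05 * i for i in range(81)]
# Divide out the Gaussian envelope so that only the ripple remains.
squared_ripple = [pooled(x, half_square) / (2 * math.exp(-K * x * x)) ** 2 for x in xs]
rectified_ripple = [pooled(x, half_rectify) / (2 * math.exp(-K * x * x)) for x in xs]

print(min(squared_ripple), max(squared_ripple))
print(min(rectified_ripple), max(rectified_ripple))
```

With half-squaring, the normalized response stays at 1 because cos² + sin² = 1. With half-rectification it follows |cos| + |sin|, which swings between 1 and √2 along the diagonal. This is one way to see why four rectifying subunits leave ripples that four squaring subunits do not.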
Phase model
We previously have proposed an efficient scheme for encoding binocular disparity information by a population of simple cells in the striate cortex (DeAngelis et al. 1991, 1995a; Freeman and Ohzawa 1990; Ohzawa et al. 1996). This scheme is called a phase model because disparity information is encoded by binocular simple cells that have different RF phases (i.e., shapes) for the two eyes. The traditional notion of binocular disparity encoding is based on a position model, in which disparity is encoded via positional offsets of left and right RFs. We now examine predictions of our complex-cell model whose subunits have phase or position shifts. If the model provides different results for the phase and position schemes, we will be able to evaluate the schemes by examining data from complex cells.
A predicted response of a binocular energy unit based on the phase model is shown in Fig. 9A. Phases of the subunit RFs (right column) differ by 90° for the two eyes. We previously have shown that a substantial fraction of simple cells have RF profiles that are different for the two eyes (DeAngelis et al. 1991, 1995a; Freeman and Ohzawa 1990; Ohzawa et al. 1996). Therefore the scheme shown in Fig. 9A is quite reasonable with respect to the existence of the required subunits. Note that the only difference between the model complex cells in Figs. 8A and 9A is that the phases of the right eye RFs are shifted by 90° in the same direction for all four subunits. With this construction, the binocular response (Fig. 9A, left) no longer shows a symmetric profile with respect to the diagonal (compare with Fig. 8C). Instead, a region of excitation is shifted from and parallel to the 45° diagonal indicated by a dashed line. The odd-symmetry of the binocular response is clear in the disparity tuning curve shown below the two-dimensional profile in Fig. 9A. This response pattern is very similar to that of the cell shown in Figs. 6, A and B (left 2 panels). It will be shown below that the phase of this disparity tuning curve is φ, the phase difference between the left and right RFs of the subunits.

FIG. 9. Two possible mechanisms are shown for creating complex cells that are tuned to a non-0 binocular disparity. A: phase model. Four subunits (right) have RF structures that differ in phase. All 4 units have a left-right phase difference of 90°. Left: vertical and horizontal dashed lines represent peaks of monocular RF profiles for left and right eyes, respectively. Peak binocular response (center of darkest contour) does not fall on diagonal dashed line. Disparity tuning curve (solid curve below XL-XR plot) is odd-symmetric. B: position model. Organization of subunits is identical to that of Fig. 8A, except that all subunits have a positional shift of their right RFs. The XL-XR plot is also a shifted version of Fig. 8C. Disparity tuning curve is even-symmetric.
Note that the intersection of the peaks of monocular excitation (intersection of vertical and horizontal dashed lines) does not coincide with the peak of the binocular response, indicated by the contours with the darkest shading. In other words, a combination of monocularly optimal stimuli does not result in a binocularly optimal stimulus. The maximum binocular response is obtained when stimuli for the two eyes are at nonoptimal locations within the monocularly measured RFs. This may be how a binocular complex cell that prefers a non-0 disparity is constructed from subunits that have different RF structures for the two eyes, and it provides an extension of the phase model to complex cells (Fleet et al. 1996, 1997).
Position model
The position model combines four simple-cell subunits exactly as in Fig. 8A, except with a common positional offset. This is shown in Fig. 9B (right). Because the underlying structure of the subunits is the same as that for Fig. 8A, the binocular response pattern is also the same. The whole response pattern is shifted downward, reflecting the offset of the right RFs of the subunits. The disparity tuning curve obtained from the two-dimensional map has exactly the same shape as that for Fig. 8C and is even symmetric. The cell has a preferred disparity that is non-0 as indicated by the shift of the excitatory peak from the diagonal dashed line. The position of the peak of the binocular response is completely predictable from the intersection of the peaks of monocular responses, as shown by the fact that the peak of the binocular response lies exactly at the crossing of the horizontal and vertical dashed lines. Note that the binocular energy units shown in Figs. 9B and 8A have identical shapes, despite the fact that the units are tuned to different disparities. Thus these two cases cannot be distinguished unless one has an accurate measurement of zero disparity (i.e., corresponding points). This information is not readily available in our paralyzed preparation.
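The contrasting symmetry of the two schemes can be checked numerically. The sketch below assumes Gabor-shaped subunit RFs with hypothetical parameter values, takes disparity as X_R minus X_L (an arbitrary sign convention), and integrates the energy response along constant-disparity lines over a range wide enough that truncation is negligible. The `shift` argument displaces the right-eye RF for the position model, and `phi` sets the left-right phase difference for the phase model.

```python
import math

K, F = 0.5, 0.5  # hypothetical envelope width factor and spatial frequency


def energy(xl, xr, phi=0.0, shift=0.0):
    """Disparity energy response; `shift` displaces the right-eye RF (position model)."""
    xr = xr - shift
    el, er = math.exp(-K * xl * xl), math.exp(-K * xr * xr)
    even = el * math.cos(2 * math.pi * F * xl) + er * math.cos(2 * math.pi * F * xr + phi)
    odd = el * math.sin(2 * math.pi * F * xl) + er * math.sin(2 * math.pi * F * xr + phi)
    return even * even + odd * odd


def tuning(d, phi=0.0, shift=0.0, half_width=8.0, dx=0.05):
    """Integrate the energy response along the constant-disparity line X_R = X_L + d."""
    n = int(round(2 * half_width / dx))
    return dx * sum(
        energy(-half_width + i * dx, -half_width + i * dx + d, phi, shift)
        for i in range(n + 1)
    )


# Plateau level contributed by the two monocular terms alone.
PLATEAU = 2 * math.sqrt(math.pi / (2 * K))

u = 0.7
# Position model: the curve is mirror-symmetric about the shifted peak at d = 1.
print(tuning(1.0 + u, shift=1.0), tuning(1.0 - u, shift=1.0))
# Phase model (phi = 90 deg): deviations from the plateau are equal and opposite.
print(tuning(u, phi=math.pi / 2) - PLATEAU, tuning(-u, phi=math.pi / 2) - PLATEAU)
```

Under these assumptions the position model returns the zero-disparity tuning curve displaced by the RF offset, so its shape alone cannot reveal the offset. The 90° phase model produces a curve that is odd-symmetric about the plateau.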
Hybrid phase and position model
It is possible that neither the phase nor position model accurately describes the binocular RFs of cortical cells. Disparity information may be encoded by a hybrid mechanism based on both phase and positional differences (Anzai et al. 1995; Fleet et al. 1996, 1997; Jacobson et al. 1993; Qian and Zhu 1997; Zhu and Qian 1996). Preliminary evidence from our lab indicates that, in fact, both phase and positional offsets contribute to a simple cell's disparity preference (Anzai et al. 1995). In the absence of absolute eye position information in our paralyzed preparation, the positional offset component cannot be measured. However, the phase component may be estimated exactly by determining the phase of the disparity tuning curve. In other words, for the energy model, the phase of the disparity tuning profile exactly reflects the phase difference, φ, of the RF profiles for the two eyes, regardless of the degree of positional offset. Therefore, the symmetry of binocular response profiles provides a signature for the phase model. However, the hybrid model cannot be ruled out even if asymmetry is found in the profiles.
Evaluation of the models
With the limitations described above, we are able to evaluate the validity of the models. Specifically, we can determine experimentally the contribution of the phase difference between left and right subunit RFs to a given complex cell's disparity tuning. This is because there is a direct relationship between the left-right phase difference of the subunit RFs and the symmetry of the binocular response profile for the complex cell that combines these subunits. We have seen this graphically for the cases of phase difference φ = 0° (Fig. 8) and φ = 90° (Fig. 9A). In general, for any value of φ, the expression in Eq. 5 may be simplified as
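$$C_1 = e^{-2kX_L^2} + e^{-2kX_R^2} + 2e^{-k(X_L^2 + X_R^2)}\cos\left(2\pi f (X_L - X_R) - \phi\right) \tag{7}$$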
Equation 7 shows that the binocular response of a disparity energy unit defined by Eq. 5 may be expressed as the sum of three terms: two monocular response terms and a binocular term (Fleet et al. 1996, 1997). The monocular terms describe Gaussian-shaped profiles as would be obtained by monocular mappings of the left and right RFs. The third (binocular) term is a two-dimensional Gabor function (Daugman 1985; Gabor 1946; Jones and Palmer 1987a,b; Marcelja 1980) that is oriented at 45°. This relationship is described graphically in Fig. 10A for the case of φ = 0°. It is clear from Fig. 10A that the diagonal region of excitation in the leftmost panel is due to the binocular term plotted in the rightmost panel. Monocular responses (middle 2 panels) appear as horizontal and vertical bands of excitation and determine the response near the margins of the leftmost panel. Two suppressive bands (shown with dashed contours in the rightmost panel) on each side of the excitatory diagonal region create interruptions in the binocular response in the leftmost panel. For an intuitive interpretation of this binocular response (the rightmost panel) in actual three-dimensional space, it may be helpful to visualize a sandwich held out in space, with layers of the sandwich perpendicular to the cyclopean direction of gaze. A stimulus of appropriate orientation will be excitatory if it falls within the central (filling) layer and inhibitory if it falls within the surrounding (bread) layers.
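The three-term decomposition follows from expanding the two squares of the energy expression. The steps below are a worked sketch under the assumption that the subunits have the Gabor form described in the text (Gaussian envelope with width factor k, carrier frequency f, left-right phase difference φ). The shorthand symbols E_L, E_R, a, and b are introduced here only for the derivation.

```latex
\begin{align*}
&\text{Let } E_L = e^{-kX_L^2},\quad E_R = e^{-kX_R^2},\quad a = 2\pi f X_L,\quad b = 2\pi f X_R + \phi. \\
C_1 &= (E_L\cos a + E_R\cos b)^2 + (E_L\sin a + E_R\sin b)^2 \\
    &= E_L^2(\cos^2 a + \sin^2 a) + E_R^2(\cos^2 b + \sin^2 b)
       + 2E_LE_R(\cos a\cos b + \sin a\sin b) \\
    &= E_L^2 + E_R^2 + 2E_LE_R\cos(a - b) \\
    &= e^{-2kX_L^2} + e^{-2kX_R^2}
       + 2e^{-k(X_L^2 + X_R^2)}\cos\bigl(2\pi f(X_L - X_R) - \phi\bigr).
\end{align*}
```

The first two terms depend on one eye each and carry no φ. The cross term depends on position only through the Gaussian envelope and on disparity through the cosine, so it is constant in phase along lines of constant X_L − X_R, which is why it appears as a Gabor function oriented at 45°.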

FIG. 10. Responses of complex cell energy model in XL-XR domain may be decomposed into 3 terms: left and right monocular response terms and a purely binocular term. This decomposition is shown graphically for 2 phase differences of left and right subunit RFs, A: φ = 0° and B: φ = 90°. Monocular components are identical and independent of phase difference. Binocular term is a Gabor function that is oriented at 45°. Phase difference, φ, determines symmetry of binocular component.
Note that the monocular terms are independent of the phase difference, φ, between left and right RFs, and that the symmetry of the binocular response profile is determined solely by the third (binocular) term of Eq. 7. In Fig. 10B, this binocular term is shown for the phase difference φ = 90°. One may see readily how the pattern of response presented in the leftmost panel is produced by the sum of the monocular responses and the binocular term (the rightmost panel), which, here, is odd symmetric.
Observe also that the subunit RFs do not have to be even or odd symmetric as shown in Fig. 8A, i.e., their phases do not have to be multiples of 90°. The absolute phases of subunit RFs do not affect the complex cell response because they are canceled in the third term of Eq. 7 (Fleet et al. 1996; Qian 1994; Zhu and Qian 1996). Only the phase differences between subunits (not φ, the phase difference between left and right RFs) must be multiples of 90° for a minimum configuration. If more subunits are allowed, even this quadrature constraint may be removed.
To examine how well the energy model fits the data from cells, we have performed decompositions of binocular responses from the cells shown in Figs. 5 and 6. Results are illustrated in Fig. 11. Figure 11A shows a decomposition of the binocular response (dark-dark) for the cell of Fig. 5. The original data in the leftmost panel may be decomposed into left eye, right eye, and binocular responses in the three panels to the right. The rightmost panel shows the residual error of the fit. Fitting of the two-dimensional profile was performed by a modified Levenberg-Marquardt optimization algorithm using Matlab (MathWorks). To allow for variations in the data such as inexact centering of RFs, ocular dominance (Hubel and Wiesel 1962), and other scaling factors, the actual function used for the fit is given by