

     


J Neurophysiol 95: 2768-2786, 2006. First published January 4, 2006; doi:10.1152/jn.00955.2005
0022-3077/06 $8.00

Encoding of Three-Dimensional Surface Slant in Cat Visual Areas 17 and 18

Takahisa M. Sanada¹ and Izumi Ohzawa¹,²

¹Graduate School of Engineering Science and ²Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan

Submitted 9 September 2005; accepted in final form 29 December 2005


 ABSTRACT
 
 
How are surface orientations of three-dimensional objects and scenes represented in the visual system? We have examined an idea that these surface orientations are encoded by neurons with a variety of tilts in their binocular receptive field (RF) structure. To examine whether neurons in the early visual areas are capable of encoding surface orientations, we have recorded from single neurons extracellularly in areas 17 and 18 of the cat using standard electrophysiological methods. Binocular RF structures are obtained using a binocular version of the reverse correlation technique. About 30% of binocularly responsive neurons have RFs with statistically significant tilts from the frontoparallel plane. The degree of tilts is sufficient for representing the range of surface slants found in typical visual environments. For a subset of neurons having significant RF tilts, the degrees of tilt are correlated with the preferred spatial frequency difference between the two eyes, indicating that a modified disparity energy model can account for the selectivity, at least partially. However, not all cases could be explained by this model, suggesting that multiple mechanisms may be responsible. Therefore an alternative hypothesis is also examined, where the tilt is generated by pooling of multiple disparity detectors whose preferred disparities progressively shift over space. Although there is evidence for extensive spatial pooling, this hypothesis was not satisfactory either, in that the neurons with extensive pooling tended to prefer an untilted surface. Our results suggest that encoding of surface orientations may begin with the binocular neurons in the early visual cortex.


 INTRODUCTION
 
 
One of the fundamental roles of the visual system is to reconstruct a three-dimensional (3D) model of the external world from a pair of two-dimensional images on the two retinae. Horizontal displacement of the eyes causes small differences between the retinal images. This difference of the retinal images is called binocular disparity, and stereopsis is the process of determining depth from binocular disparity. Visual information processing for stereopsis begins in the primary visual cortex and neurons found in this area are known to encode binocular disparities of stimuli for a small area of visual field (Barlow et al. 1967; Ferster 1981; Hubel and Wiesel 1962, 1968; LeVay and Voigt 1988; Nikara et al. 1968; Ohzawa and Freeman 1986a,b; Ohzawa et al. 1990, 1996, 1997).

How does the processing of stereoscopic information proceed once binocular disparity for small localized areas is available? Is a possible next stage of processing that of detecting the rate of change of binocular disparity, i.e., detecting 3D orientations of surfaces in depth? Some recent studies have examined these possibilities and report that a subset of neurons in higher visual areas such as MT, V4, and CIPs encode information regarding slant/tilt of surfaces (Hinkle and Connor 2002; Nguyenkim and DeAngelis 2003; Taira et al. 2000). Response to 3D curvature is also reported in the inferotemporal cortex (IT) (Janssen et al. 1999, 2000; Liu et al. 2004). It is not known, however, whether such surface slant/tilt sensitivity is a unique feature of these higher-order visual areas. Because neurons in these areas receive inputs from primary visual cortex, selectivity for 3D surface slant/tilt may be inherited from the early visual areas. Historically, the role of interocular orientation difference has been examined in some detail (Blakemore et al. 1972; Nelson et al. 1977). More recent work has examined whether a subset of V1 neurons encode surface tilt by orientation disparity based on physiological experiments in the monkey and a computational study (Bridge and Cumming 2001; Bridge et al. 2001). However, possible roles of interocular spatial frequency difference have not been examined physiologically.

As illustrated in Fig. 1A, projection of a slanted surface onto the two retinae produces a spatial frequency difference, such that the eye closer to the nearer end of the slanted surface sees a higher spatial frequency than the other eye. Such a difference in spatial frequency across the eyes is a potent cue for perceiving surface slants. With psychophysical experiments, Blakemore and later investigators reported that a difference of spatial frequency across the eyes produces a perception of slant-in-depth (Blakemore 1970; Fiorentini and Maffei 1971; Wilson 1976). Binocular disparity caused by interocular spatial frequency difference is designated dif-frequency disparity (Tyler and Sutter 1979). As expected, the angle of perceived surface slant depends on the interocular ratio of spatial frequencies. Despite these psychophysical results, we are not aware of any physiological study that has systematically examined possible roles of dif-frequency disparity for encoding surface slant in the early visual cortex. In this study, we will thus address this question using modern receptive field-mapping techniques.


 
FIG. 1. A possible neural encoding scheme for 3-dimensional (3D) surface slant is illustrated. A: difference of spatial frequency across the eyes (dif-frequency) produces a perception of slant-in-depth. Angle of perceived surface slant depends on the interocular ratio of spatial frequencies. B: disparity energy model is modified to allow encoding for interocular spatial frequency difference. All subunits (S) share the same receptive field position, orientation, and size for left and right eyes as in the standard model, but their preferred spatial frequencies differ across the eyes. C–F: prediction of dif-frequency version of disparity energy model. Equal preferred spatial frequencies for the 2 eyes produce a frontoparallel binocular receptive field (RF) (C, E). Unequal preferred spatial frequencies cause a tilt of the binocular receptive field (D, F). Interocular SF ratio of 1.5 (left:right = 3:2) represents binocular RF tilt of about 10°, which corresponds to a slant in real space of about 70° from the frontoparallel plane at 57 cm viewing distance.

 
To provide a framework within which we design our experiments and analyze data, we start with the standard disparity energy model (Ohzawa et al. 1990). With the standard disparity energy model, parameter values for orientation, spatial frequency, position, and size of the monocular receptive fields are the same across the eyes. Only the receptive field phase is allowed to be different across the eyes, and this difference in phase determines preferred binocular disparity. One obvious way to incorporate dif-frequency disparity sensitivity is to modify the standard disparity energy model and to allow spatial frequency to be different between left- and right-eye receptive fields of all subunits (S) (Fig. 1B). This idea was suggested in a previous computational study (Qian and Mikaelian 2000). Otherwise, the new model is identical to the standard model. Comparisons of predictions from the standard disparity energy model and those from the dif-frequency model are shown in middle and bottom rows, respectively, of Fig. 1. Figure 1C illustrates a binocular receptive field (RF) predicted from the standard disparity energy model where the spatial frequency-tuning curves are matched exactly between the two eyes (Fig. 1D). Notice in Fig. 1C that the strong region of excitation is exactly horizontal, indicating selectivity to the frontoparallel plane. However, a clear tilt in the binocular RF is predicted from the dif-frequency case as shown in Fig. 1, E and F. Note that, herein, we refer to the rotation angle of binocular RF from the frontoparallel axis as the "tilt" of binocular RF. The term "slant" is used exclusively for referring to angles of surfaces in the visual stimuli. Such differences in preferred spatial frequencies for the two eyes are not unreasonable assumptions. Actual neurons do not always prefer the same spatial frequencies for the two eyes (Hammond and Pomfrett 1991; Read and Cumming 2003).
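
The prediction of the dif-frequency model can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes Gabor-shaped subunit RFs with a shared Gaussian envelope and a single quadrature pair, and the function names (`gabor`, `binocular_interaction`, `predicted_tilt_deg`) and parameter values are hypothetical. It evaluates only the binocular cross term of the energy model, whose ridge orientation in the (XL, XR) domain sets the tilt.

```python
import math

def gabor(x, freq, phase, sigma=1.0):
    """1D Gabor profile (assumed subunit RF shape); x in deg, freq in cycles/deg."""
    envelope = math.exp(-x * x / (2.0 * sigma ** 2))
    return envelope * math.cos(2.0 * math.pi * freq * x + phase)

def binocular_interaction(xl, xr, f_left, f_right, sigma=1.0):
    """Cross term of (L + R)^2, summed over a quadrature pair of subunits."""
    total = 0.0
    for phase in (0.0, -math.pi / 2.0):  # cosine-phase and sine-phase subunits
        left = gabor(xl, f_left, phase, sigma)
        right = gabor(xr, f_right, phase, sigma)
        total += 2.0 * left * right
    return total

def predicted_tilt_deg(f_left, f_right):
    """Rotation of the excitatory ridge away from the XL = XR diagonal.

    The ridge follows f_left * XL = f_right * XR, so its direction in the
    (XL, XR) plane is (f_right, f_left); the frontoparallel axis is at 45 deg.
    """
    return math.degrees(math.atan2(f_left, f_right)) - 45.0

# Hypothetical preferred frequencies (cycles/deg): matched, and a 3:2 ratio.
tilt_matched = predicted_tilt_deg(0.5, 0.5)
tilt_unequal = predicted_tilt_deg(0.6, 0.4)
print(f"matched: {tilt_matched:.1f} deg, 3:2 ratio: {tilt_unequal:.1f} deg")
```

Under these assumptions, matched frequencies give no tilt, and a 3:2 ratio gives a tilt of roughly 11°, in line with the "about 10°" quoted in the legend of Fig. 1.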

Considering the predictions illustrated above from a modified version of the disparity energy model, we will first examine the extent to which neurons in areas 17 and 18 of the cat visual cortex exhibit tilted binocular RFs. We will also examine the validity of the dif-frequency disparity energy model by comparing the degree of tilt of binocular RF and monocular spatial frequency-tuning curves for the two eyes.


 METHODS
 
 
All animal care and experimental procedures conformed to those established by the National Institutes of Health and were approved by the Osaka University Animal Care and Use Committee.

Surgical procedure and animal maintenance

Forty-four adult cats (1.5–4 kg) were prepared for electrophysiological recording as follows. First, atropine sulfate (0.017 mg/kg) and hydroxyzine hydrochloride (Atarax, 0.83 mg/kg) were injected subcutaneously. Anesthesia was induced and maintained during surgery with isoflurane (2.5–3.5% in O₂). Cefotiam hydrochloride (Panspolin, 2.8 mg/kg) and dexamethasone sodium phosphate (Decadron, 0.13 mg/kg) were administered. Electrocardiogram (ECG) electrodes and a rectal temperature probe were installed. The rectal temperature probe was coated with lidocaine ointment. ECG and core temperature were monitored using a custom-built PC-based physiological monitoring system. Catheters were inserted into femoral veins of two limbs for infusion of drugs and fluids. A glass tracheal cannula was inserted after tracheostomy. A stereotaxic apparatus was used to securely position the animal’s head. Lidocaine ointment was used at pressure points of ear bars. After securing the animal to the stereotaxic apparatus, anesthesia was switched to thiopental sodium (Ravonal, administered continuously in infusion, 1.0–1.5 mg·kg⁻¹·h⁻¹). Then, paralysis was induced with an initial dose of gallamine triethiodide and the animal was placed under artificial respiration at the rate of 20–30 strokes/min. The respiration rate and stroke volume were adjusted to maintain the end-tidal CO₂ between 3.5 and 4.3%. A CO₂ sensor (Datex-Ohmeda) was used to maintain a proper level of respiration. Anesthesia for the rest of recording session was maintained by a combination of 70% N₂O-30% O₂ and thiopental sodium as noted above. Paralysis was maintained by continuous infusion of Ravonal, gallamine triethiodide (10 mg·kg⁻¹·h⁻¹) in lactated Ringer solution containing 50% glucose (40 mg·kg⁻¹·h⁻¹). Body temperature was maintained near 38.3°C with the use of a servo-controlled heating pad.
After securing the animal, a craniotomy was performed to access the central representation of the visual area 17 or 18 (Horsley–Clarke P4 L2.5 for recordings of A17, A3 L3 for A18). The dura was carefully removed to allow insertion of microelectrodes. Pupils were dilated with atropine (1%), and nictitating membranes were retracted with phenylephrine hydrochloride (Neosynesin, 5%). Contact lenses of appropriate power with 4-mm artificial pupil were placed over the corneas.

The area of recording was primarily determined by the coordinate of electrode penetrations, although histological confirmation of recorded areas was conducted for the majority of animals. There is a possibility that a small fraction of neurons, especially from long penetrations, may be classified into a wrong cortical area. However, we did not eliminate those neurons (for which we were not completely certain of the area) from our analyses because they still represent important and valid samples for purposes of this study and there were no obvious areal differences.

Experimental apparatus

Tungsten microelectrodes (5 MΩ, A-M Systems) were used to record spike activities extracellularly. To increase the chance of encountering cells, two electrodes were mounted in parallel in a protective single guide tube and driven by a common microelectrode drive (Narishige). After confirming under a microscope that the electrodes did not penetrate blood vessels on the cortical surface, agar in warm Ringer solution was applied to stabilize and protect the cortex. Then, melted wax was applied over the agar to form a sealed chamber. An oscilloscope and audio speakers were used to monitor raw signals from the microelectrodes. Electrical signals from the microelectrodes were amplified (10,000×) and band-pass filtered (300–5,000 Hz). Then spike sorting was achieved using a custom-built spike sorter (Ohzawa et al. 1996), where spikes were sorted by their waveforms and time stamped with 40-µs resolution.

Visual stimulation and recording procedures

Experiment control and generation of visual stimuli were performed using custom-built software. Visual stimuli were generated by a dedicated PC and displayed on a CRT display (Sony GDM-FW900; resolution 1,600 × 1,024 pixels, covering a display area of 46.6 × 29.9 cm, 34.3 dot/deg; refresh rate: 76 Hz). A custom-built mirror haploscope was used to present stimuli to left and right eyes separately (Fig. 2A). To preclude projection of a stimulus to the contralateral eye, a separator was placed between the left and right visual fields. Distance between the screen and the eyes was set to 57 cm, subtending a visual field of 23.3° (horizontal) × 29.9° (vertical) for each eye. Because we were examining interocular differences in neuronal responses, we carefully set up the haploscope and adjusted distances to the screen to equate the viewing conditions for two eyes as much as possible. The display surface of the CRT monitor was carefully set perpendicular to the lines of sight for the subject.


 
FIG. 2. Experimental setup and binocular reverse correlation analysis are illustrated. A: one-dimensional sparse bar noise stimuli were presented to left and right eyes simultaneously by a mirror haploscope setup. B: all possible combinations of left and right eye stimulus position are included for each left–right permutation of contrast sign (dark–dark, bright–bright, dark–bright, bright–dark). Spike trains were cross-correlated with stimulus sequences, and results are displayed as binocular receptive field maps for the 4 permutations (only the map for bright–dark is depicted for clarity).

 
After isolation of spike waveforms from one or more cells, approximate receptive field locations, preferred orientations, and spatial frequencies were determined manually by a mouse-controlled search program. Then, a standard reverse correlation procedure (DeAngelis et al. 1993a,b; Jones and Palmer 1987; Jones et al. 1987) was performed to obtain the accurate position and size of the receptive fields of the two eyes. Subspace reverse correlation (Ringach et al. 1997) was then conducted for the dominant eye to obtain the preferred orientation and spatial frequency. Peaks of orientation and spatial frequency tuning correspond well to the peak values obtained by tests using drifting sinusoidal gratings (Nishimoto et al. 2005). After these preliminary tests, a binocular receptive field map was measured by a binocular reverse correlation procedure (Ohzawa et al. 1990, 1997). To compare interocular spatial frequency difference and tilt of binocular receptive field, spatial frequency and orientation tests using drifting sinusoidal gratings were performed for both left and right eyes, while using optimal values for other stimulus parameters for each cell. However, when using grating stimuli, contrast and temporal frequency were generally set to 50% and 2 Hz, respectively.

Binocular receptive field mapping

Binocular reverse correlation procedure was equivalent to that used by Ohzawa et al. (1997). A pair of one-dimensional bar stimuli was simultaneously presented to left and right eyes by a mirror haploscope setup (Fig. 2A). Twenty stimulus locations of the bar were used to stimulate receptive fields for each eye. This defined a 20 × 20-point stimulus grid in the (XL, XR) domain (Fig. 2B). Therefore the binocular receptive field was measured by tallying up responses to 1,600 (20 × 20 × 4) different dichoptic pairs of stimuli. The orientation of the bar stimuli was set to the preferred orientation for each eye and for each cell. All possible combinations of left and right eye stimulus positions were included for each left–right permutation of contrast sign (dark–dark, bright–bright, dark–bright, bright–dark). All pairs of positions and combinations of stimulus contrast were presented in a random order, each stimulus lasting for 26 ms (two video frames) or 53 ms (four video frames) without any blank stimulus. Stimulus sequence was reshuffled for each set. A complete stimulus sequence lasted 42 s. Typically 20–40 sequences were used, which took 20–40 min in all. The response map for each contrast subset was calculated by cross-correlating spike trains with stimulus sequences (Fig. 2B). The binocular receptive field is the sum of response maps for matched polarity (bright–bright and dark–dark) conditions minus those for mismatched polarity (dark–bright and bright–dark) conditions. Monocular responses are cancelled by this computation and do not appear in the binocular RF (Ohzawa et al. 1997). We calculated the binocular RF for correlation delays from –100 to +300 ms in 5-ms steps. Because there is no correlation between spike train and stimulus sequence for negative time delays, we defined the response at negative time delays as noise.
To obtain the optimal correlation delay, the sum of squared values of all data points in the RF was computed at each correlation delay over the range of delays, and the peak delay was determined. A binocular receptive field was constructed at this optimal correlation delay. To evaluate the signal-to-noise ratio, we calculated the SD of the response at the optimal correlation delay divided by the average SD for negative correlation delays (–100 to –5 ms in 5-ms steps). We rejected data when the total spike count was <1,000 impulses and the peak response at the optimal delay did not exceed the mean response at negative correlation delays + 10 SD.
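
The map arithmetic described in this section can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the function names, the toy 4 × 4 maps, and the use of the population SD are all hypothetical choices, and the spike-triggered cross-correlation that produces the four polarity maps is taken as given.

```python
import random
import statistics

def binocular_rf(bb, dd, bd, db):
    """Matched-polarity maps minus mismatched-polarity maps, point by point."""
    n = len(bb)
    return [[bb[i][j] + dd[i][j] - bd[i][j] - db[i][j] for j in range(n)]
            for i in range(n)]

def energy(rf):
    """Sum of squared values over all points of one map."""
    return sum(v * v for row in rf for v in row)

def map_sd(rf):
    """SD over all points of one map (population SD is an assumption)."""
    return statistics.pstdev([v for row in rf for v in row])

def optimal_delay(rf_by_delay):
    """Correlation delay (ms) whose map has the largest sum of squares."""
    return max(rf_by_delay, key=lambda d: energy(rf_by_delay[d]))

def signal_to_noise(rf_by_delay):
    """SD at the optimal delay divided by the mean SD at negative delays."""
    best = optimal_delay(rf_by_delay)
    noise = [map_sd(rf) for d, rf in rf_by_delay.items() if d < 0]
    return map_sd(rf_by_delay[best]) / statistics.mean(noise)

# Toy check 1: purely monocular responses cancel in the binocular RF.
n = 4
lb, ld = [0.3, 1.0, 0.2, 0.0], [0.1, 0.4, 0.9, 0.2]  # left eye: bright, dark
rb, rd = [0.5, 0.2, 0.7, 0.1], [0.0, 0.6, 0.3, 0.8]  # right eye: bright, dark
bb = [[lb[i] + rb[j] for j in range(n)] for i in range(n)]
dd = [[ld[i] + rd[j] for j in range(n)] for i in range(n)]
bd = [[lb[i] + rd[j] for j in range(n)] for i in range(n)]
db = [[ld[i] + rb[j] for j in range(n)] for i in range(n)]
cancelled = binocular_rf(bb, dd, bd, db)
print("largest residual:", max(abs(v) for row in cancelled for v in row))

# Toy check 2: noise-only maps at negative delays, one structured map at +50 ms.
rng = random.Random(0)
rf_by_delay = {d: [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(n)]
               for d in range(-100, 0, 5)}
rf_by_delay[50] = [[1.0 if i == j else -0.5 for j in range(n)] for i in range(n)]
best_delay = optimal_delay(rf_by_delay)
snr = signal_to_noise(rf_by_delay)
print("optimal delay (ms):", best_delay, "SNR:", round(snr, 1))
```

The first check shows why additive monocular contributions drop out of the matched-minus-mismatched sum; the second shows the delay selection and the SD ratio on synthetic maps.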

Spatial frequency-tuning test

Left and right spatial frequency tunings were obtained by using drifting sinusoidal gratings in a separate test. Orientations of grating stimuli were fixed at the optimal value for each eye because preferred orientations were typically different by 5–15° for the two eyes, probably arising from cyclorotation of the eye after paralysis (Nelson et al. 1977). The gratings were presented in a random order and each presentation lasted for 4 s interspersed with 1 s of interstimulus intervals. Mean firing rates were calculated at each spatial frequency. One-dimensional Gaussian functions were fitted to each spatial frequency tuning. Preferred spatial frequencies were obtained by the peak position of the fitted Gaussian function.
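
As a rough illustration of extracting a preferred spatial frequency from a tuning curve, the sketch below fits a Gaussian by least squares on the logarithm of the firing rates, since a Gaussian is a parabola in log space. This is a stand-in technique chosen to keep the example dependency-free. The fitting procedure actually used is not specified here, and the linear frequency axis, the function name `fit_gaussian_peak`, and the synthetic tuning data are assumptions.

```python
import math

def det3(a):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def fit_gaussian_peak(sfs, rates):
    """Peak of a Gaussian fitted as ln(rate) = a + b*sf + c*sf^2.

    Requires strictly positive rates; zero rates would need special handling.
    """
    ys = [math.log(r) for r in rates]
    s = [sum(x ** k for x in sfs) for k in range(5)]          # power sums
    t = [sum(y * x ** k for x, y in zip(sfs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]

    def with_column(col):
        # Normal-equation matrix with one column replaced (Cramer's rule).
        return [[t[r] if c == col else m[r][c] for c in range(3)] for r in range(3)]

    d = det3(m)
    b = det3(with_column(1)) / d
    c = det3(with_column(2)) / d
    return -b / (2.0 * c)   # vertex of the parabola = Gaussian peak

def synthetic_tuning(sf, peak, width=0.3, amplitude=30.0):
    """Hypothetical noise-free Gaussian tuning curve (spikes/s)."""
    return amplitude * math.exp(-((sf - peak) ** 2) / (2.0 * width ** 2))

sfs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1.2, 1.6]               # cycles/deg
left_peak = fit_gaussian_peak(sfs, [synthetic_tuning(f, 0.6) for f in sfs])
right_peak = fit_gaussian_peak(sfs, [synthetic_tuning(f, 0.4) for f in sfs])
print(f"left {left_peak:.2f}, right {right_peak:.2f}, ratio {left_peak / right_peak:.2f}")
```

With real, noisy firing rates a nonlinear least-squares fit would usually be preferable, because the log transform over-weights low-rate points.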


 RESULTS
 
 
Binocular tests were conducted for a total of 271 neurons that were recorded from both areas 17 and 18 of 44 cats. Of these, binocular RFs could be obtained with sufficient signal-to-noise ratio (see METHODS) for 177 neurons. These neurons are further classified into two groups. Sixty-four neurons are classified as separable type and 113 neurons are classified as inseparable.

Binocular RFs for representative examples from the separable and inseparable types are illustrated in Fig. 3. Figure 3A depicts a binocular RF for a simple cell recorded in area 18. The binocular RF appears to be described reasonably well by a product of left and right monocular receptive field profiles for simple cells, as reported by Anzai et al. (1999a). Correlation between the standard simple/complex RF types and separability of binocular RF is high, but these classifications are not identical. This issue, including the basis for our choice of using the separability, will be described later (see following text). An exemplar complex cell recorded in area 18 is illustrated in Fig. 3C. The binocular RF showed a horizontally elongated structure like that in previous studies (Anzai et al. 1999b; Ohzawa et al. 1990, 1997). Complex cells tend to exhibit binocular RFs that are not left–right separable. Such inseparable receptive fields are well described by a disparity energy model where the sum of output of quadrature pairs of separable RFs constructs an inseparable RF (Anzai et al. 1999a,b; Ohzawa et al. 1990, 1997). On closer examination of this binocular RF, we noticed a small amount of tilt in the binocular RF from the frontoparallel axis in the clockwise direction. We wished to determine whether these small tilts are reliable properties of the neurons or arise from experimental noise or variability. Note that a small degree of tilt in the (XL, XR) domain translates into a substantially larger surface slant in real object space in front of the animal. This is because, under realistic viewing conditions, the lines of sight from the two eyes to a fixation point cross at a much more acute angle than the 90° angle for the (XL, XR) domain. For example, given a viewing distance of 57 cm and interpupillary distance of 3 cm, a tilt of 5° in the (XL, XR) domain is equal to the surface slant that is 73.3° from the frontoparallel plane (see APPENDIX). Therefore even a small visible tilt in the (XL, XR) domain may have a large perceptual significance.


 
FIG. 3. Binocular receptive fields and their Fourier spectra are shown for simple (A, B) and complex (C, D) cells, respectively. Binocular RF of simple cells tended to be separable in the (XL, XR) domain with 4 peaks in the spectrum, whereas those for complex cells tended to be inseparable and with 2 peaks in the frequency domain. There is a small but apparent tilt (θ) of the binocular RF in C.

 
To estimate quantitatively the tilt of binocular RFs, we analyzed binocular RFs in the frequency domain. Frequency analysis is highly effective for evaluating the orientation of binocular RF without regard to specific local features of the RF, phase, or position. It also uses the entire set of RF data. Representative Fourier spectra of binocular RFs are shown in the right column. Figure 3, B and D shows Fourier spectra of the binocular RFs shown in Fig. 3, A and C, respectively. The axes (along oblique edges) of the domain are now left and right frequencies. The spectrum for separable binocular RF (Fig. 3B) has four peaks, whereas that for the inseparable neuron (Fig. 3D) shows a pair of strong peaks.

Alternatively, the same frequency domain may be referenced by a pair of orthogonal axes, along the vertical and horizontal directions corresponding to the diagonals of the square domain (Fig. 3B). These dimensions are defined as the disparity frequency and the frontoparallel frequency for vertical and horizontal axes, respectively (see APPENDIX). Interestingly, the four quadrants of the domain may be assigned to either disparity frequency tuning or frontoparallel frequency tuning. Top and bottom quadrants represent tuning for disparity, as indicated by two spectral peaks in Fig. 3D. The locations of the peaks in these domains allow extraction of such parameters as the optimal disparity frequency and binocular RF tilt. Left and right quadrants, on the other hand, will have substantial peaks only for separable neurons, and represent spatial frequency tuning of combined input from the two eyes. Therefore the peaks in these quadrants define the optimal frontoparallel frequency.

The process of determining binocular RF parameters in the frequency domain is illustrated further in Fig. 4. Based on the observation that substantial peaks are present in the left and right quadrants only for separable RFs, we define an index of separability of receptive field in the XL, XR domain, the binocular separability index (BSI), as follows

$$\mathrm{BSI} = \frac{R_F}{R_D} \tag{1}$$
where $R_D$ is the peak response amplitude in the bottom quadrant and $R_F$ is the response in the left quadrant, taken along the cross section parallel to the left frequency axis that goes through the peak in the bottom quadrant, i.e., at the same right frequency (inset of Fig. 4A). The left rather than the right quadrant is selected arbitrarily because the profiles in the left and right quadrants are symmetrical about the origin. The value of BSI ranges from 0 to 1. Based on the disparity energy model, simple cells will exhibit high BSI and complex cells will show BSI close to 0. Therefore neurons with BSI >0.73 are defined as the separable type, and otherwise, the inseparable type. The cutoff criterion for the BSI (0.73) gave the most consistent agreement with our visual inspection; neurons that have BSI values >0.73 have visually separable profiles for binocular RF and vice versa.
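
To make the index concrete, here is a small sketch that computes a BSI for two idealized model RFs. It assumes, following the description above and the legend of Fig. 4, that BSI is the ratio of the spectral amplitude at the mirrored left frequency to the amplitude at the disparity peak. The model RFs, the sampling grid, the sign convention for the frequency axes, and all function names are hypothetical.

```python
import math

def model_rf(xs, freq, sigma, separable):
    """Toy binocular RF on an (XL, XR) grid: product form or disparity-tuned form."""
    rf = []
    for xl in xs:
        row = []
        for xr in xs:
            env = math.exp(-(xl * xl + xr * xr) / (2.0 * sigma ** 2))
            if separable:
                carrier = (math.cos(2.0 * math.pi * freq * xl)
                           * math.cos(2.0 * math.pi * freq * xr))
            else:
                carrier = math.cos(2.0 * math.pi * freq * (xl - xr))
            row.append(env * carrier)
        rf.append(row)
    return rf

def amplitude(rf, xs, fl, fr):
    """Magnitude of the 2D Fourier sum of rf at left/right frequencies (fl, fr)."""
    re = im = 0.0
    for i, xl in enumerate(xs):
        for j, xr in enumerate(xs):
            ang = -2.0 * math.pi * (fl * xl + fr * xr)
            re += rf[i][j] * math.cos(ang)
            im += rf[i][j] * math.sin(ang)
    return math.hypot(re, im)

def bsi(rf, xs, f0l, f0r):
    """Assumed BSI: amplitude at the mirrored left frequency over the disparity peak."""
    r_d = amplitude(rf, xs, f0l, f0r)    # disparity-quadrant peak
    r_f = amplitude(rf, xs, -f0l, f0r)   # same right frequency, mirrored left
    return r_f / r_d

xs = [-2.85 + 0.3 * k for k in range(20)]   # 20 positions per eye (deg)
freq, sigma = 0.5, 1.0                      # hypothetical cycles/deg and deg
bsi_separable = bsi(model_rf(xs, freq, sigma, True), xs, freq, -freq)
bsi_inseparable = bsi(model_rf(xs, freq, sigma, False), xs, freq, -freq)
for label, value in (("separable", bsi_separable), ("inseparable", bsi_inseparable)):
    kind = "separable" if value > 0.73 else "inseparable"
    print(f"{label} model: BSI = {value:.2f}, classified as {kind}")
```

In this sketch the product-form RF yields a BSI near 1 and the disparity-tuned RF a BSI near 0, the two ideal cases that the 0.73 cutoff separates.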


 
FIG. 4. Procedures are shown for computing binocular separability index (BSI) and binocular RF tilt angle (θ) from the spectra (see text). A: BSI is determined by a ratio of 2 spectral peak amplitudes, $R_F$ and $R_D$, taken from a cross-sectional profile through the highest peak of the spectrum. When BSI is >0.73, neurons are classified as separable, and inseparable otherwise. BSI and θ for this cell (same as that for Fig. 3, A and B) are 0.92 and 3°, respectively. B: BSI for this complex cell (same as that for Fig. 3, C and D) is low (BSI = 0.37), indicating clear inseparability. C: tilt angle (θ) of the binocular RF is calculated from the angular position of the peak in the frequency domain, as the arctangent of the ratio of the peak frequencies for left and right eyes. Tilt angle was –4° for this cell. Same procedure is used for both separable and inseparable types. Cross sections going through the spectral peak that are parallel to the left and right frequency axes depict monocular spatial frequency-tuning curves, as estimated from the binocular RF data. Monocular spatial frequency tunings for left and right eyes are drawn as solid and dashed curves, respectively. Line that goes through the spectral peak and the origin is defined as the cardinal disparity axis for the neuron.

 
Using the same spectral profile of the binocular RF as above, the "tilt" of the binocular RF (θ) may be defined as the angular deviation of the spectral peak from the disparity frequency axis, connecting the top and bottom corners of Fig. 4C. If the spectral peak is exactly on the disparity frequency axis, the original binocular RF has zero tilt. A nonzero θ indicates a corresponding tilt of binocular RF.

To estimate peak frequencies with greater accuracy, we interpolated the Fourier spectrum by cubic spline before evaluating the binocular RF tilt. A spatial frequency step of 0.005 (cycles/deg) is used as the resolution of interpolation for all neurons. To determine the step size for interpolation, we calculated percentage errors for binocular RF tilts for various resolutions of interpolation by simulations. Fourier transforms are performed on simulated binocular RF data obtained from model binocular complex cells with various interocular spatial frequency ratios (fL/fR = 0.66 to 1.5), and various disparity frequencies (fdisparity = 0.07 to 0.5; see APPENDIX). The data array is set to the same size as that in our experiments (20 × 20 grid). Then, interpolations are tested for various final resolutions (0.005 to 0.1 cycle/deg). On average, a sufficiently small error level (0.96 ± 0.03% error) for binocular RF tilt is obtained with the interpolation resolution of 0.005 cycle/deg. The percentage error increased to 13.43 ± 0.17% at the 0.1 cycle/deg resolution.

Because there is always a spectral peak in the bottom quadrant regardless of binocular RF separability, the calculations outlined above are applicable both to separable and to inseparable types of neurons. Note that cross sections going through the spectral peak that are parallel to the left and right frequency axes depict monocular spatial frequency-tuning curves, as estimated from the binocular RF data. These tuning curves are illustrated at the bottom left and right insets of Fig. 4C. The "tilt" angle of the binocular RF (θ) may be determined from the peak coordinate of the binocular RF ($f_{0L}$, $f_{0R}$), as follows

$$\theta = \arctan\!\left(\frac{f_{0L}}{f_{0R}}\right) - 45^\circ \tag{2}$$
The line that goes through the spectral peak and the origin is defined as the cardinal disparity axis for the neuron.
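
The peak-based tilt estimate can be sketched as below. Two assumptions are made loudly: the tilt is taken as the arctangent of the left/right peak-frequency ratio measured from the 45° disparity axis (as the legend of Fig. 4C suggests), and the spectrum is evaluated directly on a fine frequency grid instead of by cubic-spline interpolation of a coarse FFT, which is a swapped-in technique. The model RF, grid, and names are hypothetical.

```python
import cmath
import math

def dif_frequency_rf(xs, f_left, f_right, sigma=1.0):
    """Toy inseparable binocular RF with unequal left/right carrier frequencies."""
    return [[math.exp(-(xl * xl + xr * xr) / (2.0 * sigma ** 2))
             * math.cos(2.0 * math.pi * (f_left * xl - f_right * xr))
             for xr in xs] for xl in xs]

def peak_frequencies(rf, xs, freqs):
    """Grid search for the spectral peak in the disparity quadrant.

    Returns the magnitudes (f0_left, f0_right) of the peak frequencies.
    """
    n = len(xs)
    best_amp, best_fl, best_fr = -1.0, None, None
    for fl in freqs:
        # Partial Fourier sum over XL for this left frequency.
        cols = [sum(rf[i][j] * cmath.exp(-2j * math.pi * fl * xs[i]) for i in range(n))
                for j in range(n)]
        for fr in freqs:
            # The right frequency enters with the opposite sign in this quadrant.
            amp = abs(sum(cols[j] * cmath.exp(2j * math.pi * fr * xs[j]) for j in range(n)))
            if amp > best_amp:
                best_amp, best_fl, best_fr = amp, fl, fr
    return best_fl, best_fr

def tilt_deg(f0_left, f0_right):
    """Assumed tilt: angular deviation of the peak from the 45-deg disparity axis."""
    return math.degrees(math.atan2(f0_left, f0_right)) - 45.0

xs = [-2.85 + 0.3 * k for k in range(20)]      # 20 positions per eye (deg)
freqs = [k / 50.0 for k in range(10, 41)]      # 0.2 to 0.8 cycles/deg, 0.02 steps
rf = dif_frequency_rf(xs, f_left=0.6, f_right=0.4)
f0l, f0r = peak_frequencies(rf, xs, freqs)
theta = tilt_deg(f0l, f0r)
print(f"peak at ({f0l:.2f}, {f0r:.2f}) cycles/deg, tilt = {theta:.1f} deg")
```

A finer frequency grid trades computation time for resolution, which parallels the interpolation-step trade-off discussed above.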

Binocular RF tilt θ is transformed into disparity gradient, which is more commonly used to quantify slants of oriented surfaces in 3D. Disparity gradient represents surface slant independent of viewing distance. It is usually defined as

$$\text{disparity gradient} = \frac{d_A - d_B}{\gamma} \tag{3}$$
where $d_A$ and $d_B$ are binocular disparities for two observed objects and $\gamma$ is the angular separation between the directions for the two objects as viewed from the cyclopean eye, i.e., the midpoint between the two eyes (Burt and Julesz 1980). Therefore a slant in actual space can be represented as disparity gradient, which may take on a value between –2.0 and 2.0. Disparity gradients at these limiting values indicate the cases where two objects lie on a common line of sight for one eye. It was reported that the absolute value of disparity gradient for two dots must be <1–2 for binocular fusion depending on exact dot parameters (Burt and Julesz 1980; Prazdny 1985; Trivedi and Lloyd 1985). For this reason, we would expect most neurons to be encoding disparity gradient within these limits, if neural encoding of surface slants is constructed in an efficient manner. Note that the disparity gradient in Eq. 3 defines a property of the stimulus configuration. What we wish to estimate here instead is a property of a binocular neuron, i.e., its preferred disparity gradient given the cell’s binocular RF profile. This may be obtained from the binocular RF tilt θ, as described in the following equation

Formula 4(4)
To intuitively grasp the relationship between these metrics of surface slants, consider the following realistic example. When the binocular RF tilt θ is 10°, the disparity gradient is 0.35, which corresponds to about 80° of physical surface slant at 57 cm of viewing distance. Using the disparity gradient as defined above, we will quantify and summarize RF slant for all neurons below.
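The tilt and disparity-gradient values quoted in this section can be checked against each other with a short script. The sketch below assumes the relation disparity gradient = 2 tan θ, which is inferred here from the reported number pairs rather than taken from a stated formula; the function name is hypothetical.

```python
import math


def disparity_gradient_from_tilt(theta_deg: float) -> float:
    """Assumed relation (inferred from the reported values): DG = 2 * tan(theta)."""
    return 2.0 * math.tan(math.radians(theta_deg))


# (tilt in degrees, disparity gradient) pairs quoted in the text and captions.
reported_pairs = [
    (10.0, 0.35),
    (-7.0, -0.25),
    (-6.3, -0.22),
    (-8.5, -0.30),
    (10.2, 0.36),
    (-3.8, -0.13),
    (6.8, 0.24),
    (3.2, 0.11),
    (-3.7, -0.13),
]

for theta, reported in reported_pairs:
    computed = disparity_gradient_from_tilt(theta)
    print(f"tilt {theta:6.1f} deg  computed {computed:6.3f}  reported {reported:5.2f}")
```

Under this assumption, the limiting gradients of ±2.0 correspond to tilts of ±45°.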

As illustrated in Figs. 3 and 4, simple and complex cells tended to show different binocular RF profiles, binocularly separable and inseparable, respectively. However, simple/complex and separable/inseparable classifications are not the same. There are simple cells that are classified as inseparable, and vice versa. For the reasons outlined below, we will use the separable/inseparable type classification throughout the paper. However, before we set out to perform all the analyses based on this classification, we should examine the correlation between the two classification methods.

Note that an ideal complex cell based exactly on the disparity energy model will have a BSI of exactly 0 (Anzai et al. 1999b; Ohzawa et al. 1990, 1997). On the other hand, ideal binocular simple cells that linearly sum left and right eye input will have a BSI of 1 (Anzai et al. 1999a; Ohzawa et al. 1990, 1996). The actual population of neurons we have recorded exhibited substantial deviations from the ideal cases as illustrated in Fig. 5.


FIG. 5. Comparison of conventional simple/complex-type classification and classification by separability of binocular RF is illustrated. A: correlation is shown between left and right F1/F0 ratios. This ratio is used in conventional classification of simple (F1/F0 > 1) and complex cells based on the degree of response modulation to drifting sinusoidal gratings. Ratios for left and right eyes showed significant correlation (Pearson’s r = 0.87, P < 0.01). However, for some neurons, classified types were mismatched between the eyes. B: scatterplot is shown of F1/F0 ratio vs. binocular separability index (BSI). These 2 parameters showed a significant correlation (Pearson’s r = 0.76 and 0.78 for left and right eyes, respectively, P < 0.001, n = 135). F1/F0 ratios were obtained from responses to drifting sinusoidal gratings of optimal spatial frequency for each eye. Open and filled symbols depict data from left and right eyes, respectively. Therefore each cell has 2 symbols for F1/F0 ratios, connected by a line segment for indicating paired data. C: F1/F0 ratios show a bimodal distribution. Filled and open bars indicate right and left eyes, respectively. D: distribution of BSI is shown. Majority of neurons have inseparable binocular RF (as indicated by BSI <0.73), many of which are classified as simple based on the F1/F0 ratio.

 
First, for the simple/complex classification, we use the standard criteria based on the F1/F0 ratio, the ratio of the amplitude modulation (AM) in response to an optimal drifting sinusoidal grating stimulus to the average firing rate for the same response (Li et al. 2003; Skottun et al. 1991). Relationships between left and right F1/F0 ratios are plotted in Fig. 5A. The ratios were evaluated at the optimal spatial frequency for each eye. Circle and triangle symbols indicate data recorded from areas 17 and 18, respectively. The correlation of F1/F0 ratios for the left and right eyes is highly significant (for area 17, r = 0.9, P < 0.001, N = 66; for area 18, r = 0.83, P < 0.001, N = 69). However, there are several neurons with a large mismatch in the F1/F0 ratios between the eyes. That is, some neurons had highly modulated responses to sinusoidal drifting gratings for one eye, but practically no modulation was observed for the opposite eye. The relationship between F1/F0 ratio and BSI is illustrated in Fig. 5B. Open and filled symbols depict data for the left and right eyes, respectively. Each cell has two symbols for F1/F0 ratios (for the two eyes), connected by a line segment to indicate paired data. Although these two parameters show significant correlations (r = 0.76 and 0.78 for left and right, respectively; P < 0.001, n = 135), there are many cases where the predictions of ideal model cases break down. For example, neurons with BSI values close to zero had a wide variety of F1/F0 ratios, indicating that binocularly inseparable RFs may be observed commonly in both simple and complex cells. Figure 5, C and D show the distributions for left and right F1/F0 ratios and the distribution of BSI, respectively. Filled and open bars in Fig. 5C indicate data for the right and left eyes, respectively. The F1/F0 ratios show a bimodal distribution as reported previously (Li et al. 2003; Mechler and Ringach 2002).
Note also that BSI is derived directly from the data from a key binocular measurement in this study, whereas F1/F0 ratios are obtained from monocular tests and therefore are expected to be less directly linked to binocular properties. There have also been questions about a multitude of factors that influence F1/F0 ratios (Mata and Ringach 2005). Considering further that the use of the classical criteria in simple/complex classification can sometimes result in discrepant types between the eyes, the use of binocular separability of the RF offers a better classification method overall for the purposes of this study.
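The F1/F0 criterion described above can be illustrated with a minimal sketch. It assumes the response is available as mean firing rates in equal-width bins spanning exactly one cycle of the drifting grating, takes F1 as the amplitude of the first Fourier harmonic and F0 as the mean rate, and does not subtract spontaneous activity; these choices and all names are assumptions for illustration, not the procedure used in the study.

```python
import cmath
import math


def f1_f0_ratio(cycle_response):
    """Ratio of first-harmonic amplitude (F1) to mean rate (F0).

    cycle_response: firing rates in equal-width bins covering one grating cycle.
    """
    n = len(cycle_response)
    f0 = sum(cycle_response) / n
    if f0 == 0:
        raise ValueError("mean rate is zero; F1/F0 is undefined")
    # Discrete Fourier coefficient at the stimulus temporal frequency.
    first = sum(r * cmath.exp(-2j * math.pi * k / n)
                for k, r in enumerate(cycle_response))
    f1 = 2.0 * abs(first) / n
    return f1 / f0


def classify(ratio):
    """Conventional criterion: simple if F1/F0 > 1, otherwise complex."""
    return "simple" if ratio > 1.0 else "complex"


n_bins = 64
# Half-wave rectified sinusoid: strongly modulated response.
modulated = [max(math.sin(2 * math.pi * k / n_bins), 0.0) for k in range(n_bins)]
# Constant elevation of firing rate: unmodulated response.
unmodulated = [20.0] * n_bins

for name, resp in (("modulated", modulated), ("unmodulated", unmodulated)):
    ratio = f1_f0_ratio(resp)
    print(name, round(ratio, 2), classify(ratio))
```

Applying the same function to each eye separately makes it easy to see how a cell can receive discrepant labels for the two eyes.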

Recall that one of the purposes of this study is to examine whether the apparent tilt of binocular RF profile is based on the difference in the optimal spatial frequencies across the eyes (Fig. 1). The question is addressed in the next several figures based on results of binocular RF and spatial frequency-tuning measurements from both binocularly separable and inseparable neurons. Data from representative examples of binocularly separable neurons are illustrated in Fig. 6. Binocular RF profiles are shown in the left column. In the middle column, monocular Fourier spectra derived from the binocular RF are shown as solid and dashed curves for the left and right eyes, respectively. These are cross sections through the peak of the Fourier spectrum as illustrated in Fig. 4C, taken parallel to the left and right frequency axes. Actual spatial frequency-tuning curves obtained by drifting sinusoidal grating stimuli are illustrated in the right column. The predicted tuning curves in the middle column and those in the right column should be comparable directly under certain linearity assumptions (DeAngelis et al. 1993a,b). Open and filled symbols depict responses for the left and right eyes, respectively. Error bars represent the SE. A horizontal dashed line indicates the spontaneous firing rate. A Gaussian function of the following form is fitted to each tuning curve

Formula 5(5)
Only those cells that had significantly modulated responses as a function of spatial frequency (ANOVA, P < 0.05) are included in further analyses of spatial frequency tunings. From these fits, preferred spatial frequencies were obtained from the peak of Gaussian function (f0). We used two criteria for selection of spatial frequency-tuning curves that have gone into the summary. First, the goodness of fit is >60%. Second, we selected only those responses that showed a band-pass tuning for the two eyes. Cells exhibiting low-pass spatial frequency tuning are excluded because it is difficult to determine the peak spatial frequency accurately for these neurons. For a spatial frequency tuning to be considered as band-pass, there must be at least two data points below f0, the peak of the fitted Gaussian function.
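The two selection criteria for tuning curves can be expressed as a simple filter. The sketch below assumes the goodness of fit is supplied as a fraction between 0 and 1 and that the fitted peak f0 has already been obtained; the ANOVA screening step is not included, and the function and parameter names are hypothetical.

```python
def passes_selection(tested_freqs, f0, fit_quality,
                     min_fit=0.60, min_points_below=2):
    """Return True if a tuning curve meets both selection criteria.

    tested_freqs: spatial frequencies (c/deg) at which responses were measured.
    f0: peak of the fitted Gaussian (c/deg).
    fit_quality: goodness of fit as a fraction (assumed scale 0..1).
    """
    # Band-pass criterion: at least two tested frequencies lie below the peak.
    points_below = sum(1 for f in tested_freqs if f < f0)
    return fit_quality > min_fit and points_below >= min_points_below


freqs = [0.1, 0.2, 0.4, 0.8, 1.6]
print(passes_selection(freqs, f0=0.5, fit_quality=0.85))   # band-pass, good fit
print(passes_selection(freqs, f0=0.15, fit_quality=0.85))  # effectively low-pass
print(passes_selection(freqs, f0=0.5, fit_quality=0.40))   # poor fit
```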


FIG. 6. Examples are shown of 3 separable type neurons that have different frequency tunings across the eyes. A, left: separable binocular RF of a simple cell is depicted. A, middle: left and right Fourier spectra of the binocular RF are shown as solid and dashed curves, respectively. Although the tilt of binocular RF is small (binocular RF tilt = –7.0°), it is statistically significant (P < 0.05, bootstrap test). Predicted disparity gradient is –0.25. A, right: spatial frequency-tuning curves obtained by drifting sinusoidal grating stimuli are illustrated. Open and filled symbols depict responses for the left and right eyes, respectively. Error bars depict SEs. Horizontal dashed line indicates the spontaneous firing rate. Gaussian functions were fitted to the tuning curves. Vertical thin lines indicate the peaks of fitted Gaussian function for left and right eyes, respectively. Preferred spatial frequencies are significantly different between the eyes (bootstrap test, P < 0.05). Spatial frequency ratio is 0.71. B: data from another separable binocular RF is shown for a simple cell in the same format as A. Binocular RF is significantly tilted from frontoparallel plane (binocular RF tilt = –6.3°; P < 0.05, bootstrap test). Predicted disparity gradient is –0.22. Spatial frequency ratio is 0.69. C: additional example of separable binocular RFs is shown (binocular RF tilt = –8.5°). Predicted disparity gradient and spatial frequency ratio are –0.3 and 0.7, respectively.

 
For the cell presented in Fig. 6A, tilt of the binocular RF (as measured by the displacement of spectral peaks illustrated in Fig. 4C) is statistically significant (tilt = –7°, P < 0.05, bootstrap test). The bootstrap test for estimating significance of tilt is conducted as follows. Binocular RF mapping consists of trials, each of which contains a randomized sequence of complete permutations of left and right stimuli (Ohzawa et al. 1990, 1997). From spike data for a total of N (typically 40) trials, N trials are randomly drawn while allowing duplications, from which a new binocular RF is constructed. For each neuron, this process was repeated 1,000 times to obtain the estimates of variability in the RF measurements (Efron 1982; Efron and Tibshirani 1993). When the mean tilt of the distribution of resampled binocular RFs deviated from zero by more than 1.96SD, the RF tilt was judged to be significant. With this criterion, the probability of RF tilt being on the opposite side of zero is <5%.
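The resampling procedure described above can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: `estimate_tilt` is a hypothetical callable standing in for the full pipeline (constructing a binocular RF from the resampled trials and measuring its tilt), and the toy data simply attach one tilt-like value to each trial.

```python
import random
import statistics


def bootstrap_tilt_test(trials, estimate_tilt, n_resamples=1000, z=1.96, seed=0):
    """Resample trials with replacement and test whether the tilt differs from zero.

    trials: per-trial data (any objects accepted by estimate_tilt).
    estimate_tilt: hypothetical function mapping a list of trials to a tilt (deg).
    Returns (mean tilt, SD of resampled tilts, significant?).
    """
    rng = random.Random(seed)
    n = len(trials)
    tilts = []
    for _ in range(n_resamples):
        # Draw N trials from the N recorded trials, allowing duplications.
        resample = [trials[rng.randrange(n)] for _ in range(n)]
        tilts.append(estimate_tilt(resample))
    mean_tilt = statistics.fmean(tilts)
    sd_tilt = statistics.stdev(tilts)
    return mean_tilt, sd_tilt, abs(mean_tilt) > z * sd_tilt


# Toy stand-in: each "trial" carries one noisy tilt value; the estimator averages them.
tilted_trials = [-7.0 + ((i % 5) - 2) for i in range(40)]
flat_trials = [3.0 * ((i % 5) - 2) for i in range(40)]

print(bootstrap_tilt_test(tilted_trials, statistics.fmean))
print(bootstrap_tilt_test(flat_trials, statistics.fmean))
```

Passing the estimator as an argument keeps the resampling logic separate from the RF reconstruction, so the same routine could be reused for the interocular spatial frequency comparison.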

The binocular RF tilt determined as above is automatically reflected as a difference in the predicted spatial frequency-tuning curves shown in Fig. 6A (middle). The predicted disparity gradient for this cell is –0.25, as calculated by Eq. 4. A similar statistically significant difference in the optimal spatial frequencies for the two eyes is also observed for the actual tuning curves measured by drifting sinusoidal gratings (Fig. 6A, right; P < 0.05, bootstrap test) in that the optimal spatial frequency for the right eye (vertical dashed line) is higher than that for the left eye (vertical solid line). Therefore for this neuron, there is a good correspondence between the tilt of the binocular RF (measured by reverse correlation) and the interocular difference between the optimal spatial frequencies (measured by drifting gratings).

Similar additional data from two separable binocular RFs are shown in Fig. 6, B and C in the same format as that of Fig. 6A. For these two cells (both of which were simple), tilt angles θ of binocular RFs were significantly different from zero (P < 0.05, bootstrap test). The tilt angles of binocular RF for Fig. 6, B and C are –6.3 and –8.5°, which correspond to predicted preferred disparity gradients of –0.22 and –0.3, respectively. Again, for these additional cells, the actual spatial frequency-tuning curves measured by drifting gratings (right column) also show statistically significant difference between the eyes (P < 0.05, bootstrap test). The ratios of optimal spatial frequencies (left/right) are 0.71, 0.69, and 0.70 for cells in Fig. 6, A–C, respectively. Again, the direction of the difference in predicted spatial frequency-tuning profiles (middle column) corresponds well to that for the measured data (right column) for each neuron. Therefore these results for binocularly separable neurons indicate that the tilt angles of their binocular RFs and their predicted disparity gradients correspond well with the left–right differences of optimal spatial frequencies measured by monocularly presented drifting gratings.

Data from representative examples of inseparable neurons are illustrated in Fig. 7 in the same format as that of Fig. 6. Spatial frequency-tuning curves are not available for B and D either because spikes for one of the cells appeared after the initial tuning tests were already completed or, for the case of D, data for the frequency-tuning test did not show significantly modulated responses as a function of spatial frequency (P > 0.05, ANOVA). Binocular RFs shown in Fig. 7, pairs A and B, C and D are from neurons that were recorded simultaneously. The neuron shown in A is an example for which the binocular RF was significantly tilted from the frontoparallel plane (P < 0.05, bootstrap test). In fact, all of the examples except for that in Fig. 7C had statistically significant tilt for their binocular RFs. Note that the neuron illustrated in B had a statistically significant tilt in the opposite direction from the other member of the pair shown in A. The opposite tilt directions for the pairs of neurons clearly indicate that the tilts of binocular RFs do not arise from optical factors such as errors in the eye-display distances or magnification differences between the eyes. Because there are significant differences in the degree of tilt among simultaneously recorded neurons, these variations must be neural in origin.


FIG. 7. Examples are shown of 6 inseparable type neurons. A and B and C and D are pairs of neurons recorded simultaneously, and the data represent responses to the same stimuli. Binocular receptive fields are shown in the left column and their Fourier spectra are shown in the middle column. Dotted and white lines in the left panels indicate tilt angle of binocular RF and frontoparallel line, respectively. Right column depicts monocular spatial frequency-tuning curves obtained by drifting sinusoidal grating stimuli. Format of the figure is otherwise identical to that of Fig. 6. A and B: binocular RFs of both neurons are tilted significantly from frontoparallel plane (P < 0.05, bootstrap test) but their tilts are in the opposite directions (tilt angles are A: –4.0°, B: 7.7°). Spatial frequency ratio for A is 0.62. C: binocular RF of this neuron is not significantly tilted. D: binocular RF of this neuron is tilted significantly from frontoparallel plane (10.2°; P < 0.05, bootstrap test). Spatial frequency-tuning data for this cell are not available. E and F: 2 additional examples are shown of inseparable type neurons. Both of these neurons exhibited significant tilt from the frontoparallel plane (tilt angles are E: –3.8°, F: 6.8°; P < 0.05, bootstrap test). Preferred spatial frequencies are different for the 2 eyes for both neurons. Spatial frequency ratios are 0.76 and 1.28 for E and F, respectively. All of the neurons in this figure were classified as complex, except for B and D for which spatial frequency tunings are not available.

 
Another pair of simultaneously recorded neurons also exhibited a clear difference in the tilts of binocular RFs. Although the neuron depicted in Fig. 7C did not have a significant RF tilt, the other member of the pair had its RF tilted significantly from the frontoparallel plane. Tilt angles for D, E, and F are 10.2, –3.8, and 6.8°, which correspond to 0.36, –0.13, and 0.24 as disparity gradients, respectively.

As with binocularly separable neurons presented in Fig. 6, independent measurements of spatial frequency-tuning curves are also conducted using drifting sinusoidal gratings. Preferred spatial frequencies, shown as vertical solid and dashed thin lines in the right column, differ significantly across the eyes for Fig. 7, A, E, and F (P < 0.05, bootstrap test), but not for Fig. 7C (P > 0.05, bootstrap test). This is consistent with the lack of significant tilt of binocular RF for this neuron. Therefore for all cases shown in Fig. 7, A, C, E, and F, directions of interocular spatial frequency difference correspond well to the frequency difference of binocular RFs.

Paired recordings are also possible between neurons of different binocular separability. Such an example is shown in Fig. 8. Binocular RFs shown in Fig. 8, A and B are separable and inseparable RFs, respectively. For both neurons, binocular RFs exhibit significant tilts from the frontoparallel plane (P < 0.05, bootstrap test). Furthermore, the tilts are in opposite directions between the two neurons. The tilt angles for cells in Fig. 8, A and B are 3.2 and –3.7°, with the corresponding preferred disparity gradients of 0.11 and –0.13, respectively. Actual spatial frequency-tuning curves measured with drifting gratings are shown in the right column. As expected from the binocular RF tilts, the interocular differences in the preferred spatial frequencies are opposite for the two neurons. The left preferred spatial frequency is significantly higher than the right frequency (frequency ratio = 1.23, P < 0.05, bootstrap test) for A; the difference is significant and opposite (frequency ratio = 0.87, P < 0.05, bootstrap test) for B. Taken together with the results from the previous figure, both separable and inseparable binocular RFs show a variety of tilts that are consistent with the interocular difference in the monocularly measured preferred spatial frequencies. Therefore, based on these examples, the mechanism illustrated in Fig. 1 appears quite likely to be a basis of 3D surface tilt representation.


FIG. 8. Binocular RFs are illustrated for a pair of neurons. Format of figure is equivalent to Figs. 6 and 7. A: a separable neuron is shown for which the left peak spatial frequency is significantly higher than that for the right (P < 0.05, bootstrap test). Spatial frequency ratio is 1.23. B: an inseparable neuron is depicted similarly. Left and right spatial frequencies are significantly different, and the binocular RF tilted significantly from frontoparallel (tilt angle is –3.7°; P < 0.05, bootstrap test). Spatial frequency ratio is 0.87. Preferred spatial frequencies for the left and right eyes are significantly different for both neurons (bootstrap test, P < 0.05). Note that the directions of interocular spatial frequency shifts are opposite between the 2 neurons. Data from these neurons are recorded about 100 min apart, but without any optical disturbances such as contact lens or eye manipulations or display movements across the measurements.

 
What is the range of binocular RF tilts observed for cells in areas 17 and 18? Distributions of disparity gradients of both separable and inseparable cells are illustrated in Fig. 9. Most neurons had disparity gradients in the range of –0.5 to 0.5. The SDs of the distributions were 0.19 and 0.14 for separable and inseparable types, respectively. Black bars indicate cells whose binocular RFs are tilted significantly from the frontoparallel plane (P < 0.05, bootstrap test), whereas white bars indicate those with nonsignificant tilt. About 30% of neurons exhibited significant tilts (28%, 18/64 for separable; 33%, 37/113 for inseparable). Therefore the distributions of disparity gradients in areas 17 and 18 are capable of supporting slant-in-depth encoding.


FIG. 9. Distributions of disparity gradients are shown of separable (A) and inseparable neurons (B). Most neurons had disparity gradients within a range of ±0.5. SD was 0.19 for the separable and 0.14 for the inseparable neurons. Black bars indicate cells for which the binocular RF showed significant tilt from a frontoparallel plane; white bars indicate those for which tilt were not significant.

 
Although paired recordings of multiple neurons are ideal for demonstrating variations of binocular RF tilts (see Figs. 7 and 8), such recordings are not always possible. The majority of the data in our sample must be analyzed as individual binocular RF recordings. Therefore we have analyzed the effects of potential artifactual sources that may contribute to apparent tilts of measured binocular RF profiles. One such possibility is a difference in viewing distances between left and right eyes that may be caused by positioning errors of the CRT monitor and the mirrors used in the haploscope setup (Fig. 2A). Another possibility is a magnification difference between left and right eyes that may result from improper corrections for refractive errors. Both of these optical factors produce apparent differences in the spatial frequency content as imaged on the retina for the two eyes. Contributions of viewing distance errors to disparity gradients are illustrated in Fig. 10. We are confident that our positioning error of optical elements in the setup is well within 5 cm. Given this assumption, what is the limit of erroneous change in the disparity gradient? Figure 10A shows that a 5-cm distance error translates into a disparity gradient of about 0.1. Distributions of disparity gradients for both separable and inseparable binocular RFs, and that of viewing distance errors, are illustrated in Fig. 10B. The SD (σ) of the error distribution is set such that 1.96σ = 0.1. Statistical tests for data and artifactual distributions are carried out by the F-test. Distribution of preferred disparity gradients is significantly wider than that of the error distribution (test for equal variance, F = 13.4, P < 0.001 for separable type; F = 7.42, P < 0.001 for inseparable type). Similarly, we also calculated possible contributions of interocular magnification differences.
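The variance comparison can be approximated from the summary numbers alone. The sketch below assumes the F statistic is the ratio of the data variance to the error variance, with the error SD set by 1.96σ = 0.1 and the data SDs taken as the rounded values 0.19 and 0.14 reported for Fig. 9; because those SDs are rounded, the results only approximate the reported F values. P values are not computed, since they would require the degrees of freedom and an F distribution.

```python
def variance_ratio(sd_data: float, dg_limit: float = 0.1, z: float = 1.96) -> float:
    """Ratio of data variance to artifact variance, with sigma_err = dg_limit / z."""
    sigma_err = dg_limit / z
    return (sd_data / sigma_err) ** 2


# Rounded SDs of the disparity gradient distributions (separable, inseparable).
for label, sd, reported_f in (("separable", 0.19, 13.4), ("inseparable", 0.14, 7.42)):
    print(f"{label}: F approx {variance_ratio(sd):.2f} (reported {reported_f})")
```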


FIG. 10. Potential contributions are examined of artifacts that produce apparent tilt of binocular RF and estimated disparity gradient. These factors include mismatch between the distances from the 2 eyes to the display arising from errors in positioning the display and haploscope mirrors, and image magnification difference arising from refractive errors. A: relationship between display positioning error (abscissa) and predicted disparity gradient (ordinate) is shown. Limits of possible positioning errors are estimated to be well below ±5 cm, which translate into disparity gradient errors of less than ±0.1. B: comparisons are shown of distribution of disparity gradients generated by artifacts and those of our data. Data distributions are shown for separable and inseparable binocular RFs by dashed and dotted curves, respectively. Disparity gradient distributions of actual data are significantly wider than that of experimental artifact (test for equal variance, separable: F = 13.4, P < 0.001; inseparable: F = 7.42, P < 0.001).

 
A 3% magnification difference between the two eyes (assuming the power of the cat’s eye of 78D, and |error in refractive correction in diopters| <2D) will result in a disparity gradient of ±0.03 (Hughes 1979). The SD for this distribution is so small that we can essentially ignore the effect of refractive errors. Even considering the simultaneous contributions of the two factors, the variations of binocular RF tilts observed in our data cannot be accounted for by these artifactual sources (test for equal variance, F = 6.7, P < 0.001 for separable type; F = 3.71, P < 0.001 for inseparable type). These results suggest that the tilts of binocular RFs and spatial frequency differences are intrinsic neuronal characteristics and are able to carry signals regarding 3D orientations of surfaces in visual scenes.
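The size of these optical artifacts can be estimated with a small geometric sketch. It assumes that an interocular magnification ratio m maps positions as XR = m·XL, so that the apparent disparity gradient (disparity change per unit cyclopean position) is 2(m − 1)/(m + 1); this formula and the 57-cm nominal distance used for the positioning example are assumptions made here for illustration, not values confirmed for the recording setup.

```python
def dg_from_magnification(m: float) -> float:
    """Apparent disparity gradient for an interocular magnification ratio m.

    Assumes XR = m * XL: disparity = (m - 1) * XL and cyclopean
    position = (m + 1) * XL / 2, so their ratio is 2 * (m - 1) / (m + 1).
    """
    return 2.0 * (m - 1.0) / (m + 1.0)


# Refractive error: 2 D out of an assumed 78 D eye is about a 3% size change.
print(round(2.0 / 78.0, 3))
print(round(dg_from_magnification(1.03), 3))

# Positioning error: 5 cm at an assumed 57-cm nominal distance (hypothetical).
nominal_cm, error_cm = 57.0, 5.0
m_distance = nominal_cm / (nominal_cm - error_cm)
print(round(dg_from_magnification(m_distance), 3))
```

With these assumptions the refractive term comes out near 0.03 and the positioning term just under 0.1, in line with the limits quoted in the text.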

Relationship between disparity gradient and spatial frequency ratio

In representative examples shown in Figs. 6–8, the spatial frequency differences across the eyes were generally qualitatively correlated with the tilt of binocular RFs. How does this correlation hold for the entire population of neurons? In general, how do other binocular tuning characteristics correlate with monocular tuning properties? Figure 11 summarizes the results relevant for addressing these questions.


FIG. 11. Goodness of predictions of binocular RF tilts (expressed as disparity gradients) based on the ratios of left and right optimal spatial frequencies are examined. A and B: illustration of correlations of left and right optimal spatial frequencies for separable and inseparable binocular RFs, respectively. Optimal frequencies are obtained from the peak of fitted Gaussian functions. Dotted lines indicate 1 octave difference in the optimal frequencies for the 2 eyes. C and D: disparity gradients from separable and inseparable binocular RFs are plotted against the frequency ratios obtained from A and B. Frequency ratios are defined as fL/fR, where fL and fR, respectively, are left and right optimal spatial frequencies obtained from drifting sinusoidal grating tests. For inseparable neurons shown in D, there is a significant correlation (all data: r = 0.26, n = 90, P < 0.05, Spearman’s correlation coefficient). Correlation analysis on the subset of neurons that exhibited significant tilt (see Fig. 9B) showed somewhat improved correlation coefficient (r = 0.5, n = 29, P < 0.01). Correlation for separable type was not significant. Black and gray symbols indicate data from neurons that exhibited significant and nonsignificant tilts of binocular RFs. Circles and triangles depict cells recorded from areas 17 and 18, respectively. Solid line represents prediction of the dif-frequency version of disparity energy model, as derived from Eq. A8 (see APPENDIX). Labeled data points represent the example cells shown in Figs. 6 and 7. E and F: comparisons of spatial frequency and disparity frequency are illustrated for both separable (E) and inseparable RFs (F). Spatial frequency and disparity frequency are highly correlated in separable RFs. Dashed lines represent regression lines through the data and their slopes are 0.92 and 0.71 for E and F, respectively.

 
First, left and right preferred spatial frequencies, as obtained from tests using drifting sinusoidal gratings, are compared in Fig. 11, A and B for separable and inseparable binocular RF, respectively. Peak spatial frequencies are obtained from the peak of fitted Gaussian functions (Eq. 5). The identity relationship and 1-octave difference between the eyes are illustrated as solid and dotted lines, respectively. Circles and triangles indicate cells recorded from areas 17 and 18, respectively. Preferred spatial frequencies for the left and right eyes are well correlated (Pearson’s r = 0.96, n = 45 for separable type; r = 0.96, n = 90 for inseparable type, P < 0.05). Differences of left and right spatial frequencies were within the range of +1 to –1 octave regardless of separability.

To examine the correlation between the interocular frequency difference and the tilt of binocular RF, the ratios of preferred spatial frequencies were computed as follows and compared with the disparity gradients. The frequency ratio is given by

\[ \text{frequency ratio} = \frac{f_L}{f_R} \tag{6} \]
where fL and fR are left and right preferred spatial frequencies from measurements with drifting gratings. The results of comparisons are illustrated in Fig. 11, C and D for separable and inseparable cells, respectively. Cells recorded from areas 17 and 18 are plotted as circles and triangles, respectively. Black and gray symbols indicate cells that exhibited significant and nonsignificant tilts of binocular RF, respectively, as shown in Fig. 9. Labeled symbols in Fig. 11, C and D indicate example cells shown in Fig. 6, A–C and Fig. 7, A, C, E, and F, respectively. Error bars depict the SDs of disparity gradient. The frequency ratio and the disparity gradient were significantly correlated for inseparable neurons with significant binocular RF tilts (black symbols in Fig. 11D; r = 0.5, n = 29, P < 0.01, Spearman’s correlation coefficient). The correlation was not significant for separable neurons (black symbols in Fig. 11C; r = 0.27, n = 14, P > 0.05, Spearman’s correlation coefficient). Because our initial interest was primarily on disparity-selective complex cells, which tend to be inseparable binocularly, the number of separable cells is small in our sample, which may have affected the results. A solid line depicts the prediction based on the dif-frequency version of disparity energy model as illustrated in Fig. 1. The relationship for the theoretical curve is given as Eq. A8 (see APPENDIX). The limits of artifactual variations of disparity gradient about the predicted value are illustrated as dotted lines (prediction ±0.1, 1.96SD of artifactual distribution as shown in Fig. 10). Although the significance of correlation between two parameters suggests that the interocular difference in spatial frequency tuning underlies the tilted binocular RF structure, not all neurons lie on the theoretical line. Considering the variance in the data, 35.6% (16/45) of neurons fall within the dotted line for separable cells and 42.2% (38/90) for inseparable cells.
Therefore only about 40% of neurons overall (54/135) behave in a manner consistent with the prediction of the dif-frequency model, and the responses of many neurons cannot be accounted for by the dif-frequency disparity energy model. There are neurons that have a clear and statistically significant interocular spatial frequency difference and yet possess a clearly frontoparallel binocular RF, and vice versa. Therefore in sections further below, we will examine possibilities of additional factors that may contribute to tilts of binocular RFs.
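The counting step behind these percentages can be sketched as a comparison of each cell's disparity gradient with a model prediction, using the ±0.1 artifact band. Eq. A8 is not reproduced in this section, so the prediction used below, 2(fL − fR)/(fL + fR), is a hypothetical stand-in based on a simple geometric reading of the dif-frequency idea; the function names are also assumptions, and the input pairs are the example cells quoted for Figs. 6–8.

```python
def assumed_prediction(freq_ratio: float) -> float:
    """Hypothetical stand-in for Eq. A8: DG = 2 * (r - 1) / (r + 1), r = fL / fR."""
    return 2.0 * (freq_ratio - 1.0) / (freq_ratio + 1.0)


def fraction_within_band(cells, predict=assumed_prediction, band=0.1):
    """Fraction of (frequency ratio, disparity gradient) pairs within +/- band."""
    hits = sum(1 for ratio, dg in cells if abs(dg - predict(ratio)) <= band)
    return hits, len(cells), hits / len(cells)


# (frequency ratio fL/fR, disparity gradient) for example cells in Figs. 6-8.
example_cells = [
    (0.71, -0.25),  # Fig. 6A
    (0.69, -0.22),  # Fig. 6B
    (0.70, -0.30),  # Fig. 6C
    (0.76, -0.13),  # Fig. 7E
    (1.28, 0.24),   # Fig. 7F
    (1.23, 0.11),   # Fig. 8A
    (0.87, -0.13),  # Fig. 8B
]

hits, total, fraction = fraction_within_band(example_cells)
print(f"{hits}/{total} example cells within the band ({100 * fraction:.1f}%)")
```

Because the prediction is passed in as a function, the same counting could be rerun with the actual Eq. A8 once it is available.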

An additional point was examined in relation to predicting binocular properties from monocular tuning characteristics. Figure 11, E and F shows the relationship between binocular disparity frequency and monocular preferred spatial frequency. The disparity energy model predicts identity between the two frequencies. Regarding this question, Ohzawa et al. (1997) reported a discrepancy between the model prediction and the data. They reported that the disparity frequency tended to be lower than the monocular spatial frequency as measured by drifting grating stimuli. Because their analyses were performed only for complex cells, it is not clear at which stage of binocular processing this discrepancy occurs. Based on a new set of data and a more robust analysis method, we have addressed this issue. In our analysis, we use Fourier analysis both for separable and inseparable RFs to obtain disparity frequencies. For the monocular preferred spatial frequency, the average of left and right preferred spatial frequencies (from data in Fig. 11, A and B) is used. Cells recorded from areas 17 and 18 are plotted as circles and triangles. The scatterplot for the inseparable binocular RFs showed a discrepancy between the disparity frequency and the spatial frequency, in that the disparity frequency tends to be lower than the monocularly measured preferred spatial frequency (Fig. 11F). Deviations of actual data from the identity line tended to be larger for high spatial frequencies (slope = 0.71). Because most inseparable binocular RFs are from complex cells (Fig. 5B), our data show a trend similar to that reported in previous work (Ohzawa et al. 1997). In contrast, separable binocular RFs show a much better fit with the identity relationship between the two frequencies. The slope of separable RF is close to 1 (slope = 0.92) (Fig. 11E).
These results suggest that separable cells sum monocular inputs through linear processing, whereas neurons with inseparable RFs have substantial nonlinearities in their processing. The source of the deviation must therefore lie between the linear subunits of complex cells and the final complex cell stage if we assume the hierarchical organization similar to that in the disparity energy model.

Aspect ratio of binocular receptive field

Although the dif-frequency version of the disparity energy model accounts for the trend in the data as we have seen in the previous section, we wondered whether there are additional mechanisms by which tilted binocular RFs are constructed. Another possibility we examine is a hierarchical organization as illustrated in Fig. 12. A tilt in the binocular RF profile may be generated if the outputs of multiple disparity energy units are combined, where each unit is tuned to a specific disparity without tilt and its preferred disparity progressively shifts as a function of its frontoparallel position (Fig. 12A). Such a hierarchical pooling produces a binocular RF, shown in Fig. 12B. This neuron (Fig. 12B) will have a highly elongated and tilted binocular RF. The angle of tilt depends on the rate at which subunits’ preferred disparities shift with the frontoparallel position. Such an organization predicts a substantial elongation of the binocular RF in the frontoparallel dimension. The degree of pooling may be quantified by an aspect ratio of binocular RF. If the hierarchical organization underlies slant sensitivity of binocular neurons, there should be a correlation between the tilts of binocular RF and their aspect ratios.
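The pooling scheme described above can be illustrated numerically. The following Python sketch is a simplified illustration and not the analysis used in this study: each subunit is reduced to an isotropic Gaussian envelope in the (frontoparallel position, disparity) plane, so the carrier of a disparity energy unit is ignored. The function names `pooled_rf` and `tilt_and_aspect`, the grid, and all parameter values are assumptions chosen for the example.

```python
import numpy as np

def pooled_rf(centers_x, gradient, sigma, grid):
    """Sum isotropic Gaussian subunit envelopes whose preferred disparity
    shifts linearly with frontoparallel position (d_i = gradient * x_i)."""
    X, D = np.meshgrid(grid, grid)  # X: frontoparallel position, D: disparity
    rf = np.zeros_like(X)
    for x_i in centers_x:
        d_i = gradient * x_i
        rf += np.exp(-((X - x_i) ** 2 + (D - d_i) ** 2) / (2 * sigma ** 2))
    return X, D, rf

def tilt_and_aspect(X, D, rf):
    """Estimate the major-axis slope and the major/minor aspect ratio of the
    pooled envelope from its second moments."""
    w = rf / rf.sum()
    mx, md = (w * X).sum(), (w * D).sum()
    cxx = (w * (X - mx) ** 2).sum()
    cdd = (w * (D - md) ** 2).sum()
    cxd = (w * (X - mx) * (D - md)).sum()
    vals, vecs = np.linalg.eigh(np.array([[cxx, cxd], [cxd, cdd]]))
    major = vecs[:, 1]              # eigh returns eigenvalues in ascending order
    slope = major[1] / major[0]     # disparity change per unit frontoparallel shift
    aspect = np.sqrt(vals[1] / vals[0])
    return slope, aspect

# Hypothetical example: nine subunits, preferred disparity shifting by 0.3 per unit position.
grid = np.linspace(-5.0, 5.0, 201)
X, D, rf = pooled_rf(np.linspace(-2.0, 2.0, 9), gradient=0.3, sigma=0.5, grid=grid)
slope, aspect = tilt_and_aspect(X, D, rf)
print(f"major-axis slope = {slope:.2f}, aspect ratio = {aspect:.2f}")
```

Under these assumptions the major axis of the pooled envelope follows the rate at which the subunits' preferred disparities shift, and the envelope becomes elongated, which is the pair of properties the hierarchical hypothesis predicts should go together.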


FIG. 12. Diagram of a possible alternative hierarchical organization is shown, whereby the tilt of binocular RF is generated by convergence from multiple neurons with untilted binocular RFs. A: each subunit is constructed according to the original disparity energy model. Each subunit’s binocular RF is not tilted, but the optimal disparity progressively shifts depending on their RF’s frontoparallel position. B: a neuron in the next stage will have a highly elongated and tilted binocular RF. Angle of the tilt depends on the rate at which subunits’ preferred disparities shift with frontoparallel position. Such an organization predicts a substantial elongation of the binocular RF in the frontoparallel dimension. Degree of pooling may be quantified by an aspect ratio. Aspect ratio of a binocular RF is defined by the ratio of SDs along the major (a) and minor (b) axes of the RF envelope. C: frequency analysis of binocular RF. Sizes of binocular RFs are proportional to the inverse of SDs of spectral amplitude profiles.

 
To obtain structural parameters of binocular RFs such as the RF sizes and aspect ratios, we conducted frequency analysis (Fig. 12C). Although RF sizes may be obtained by direct measurements in the spatial domain (Fig. 12B), we have found that estimating RF size in the frequency domain (Fig. 12C) is more robust. The procedure for computing spectral data was described earlier (Fig. 4), except that a two-dimensional Gaussian function is fitted to the amplitude spectrum, and its SDs are used. Binocular RF sizes, 2a and 2b for frontoparallel and disparity directions, respectively, are calculated as the inverse of SDs of fitted spectral amplitude profiles

Formula 6
where σ_d and σ_f are the SDs of the fitted Gaussian in the disparity and frontoparallel frequencies, respectively. The aspect ratio of a binocular RF is defined by the ratio of SDs as

$$\text{aspect ratio} = \frac{a}{b} = \frac{\sigma_d}{\sigma_f}$$
An aspect ratio <1 indicates that the envelope of the binocular RF is elongated along the disparity axis. If it is >1, the envelope is elongated along the frontoparallel axis. Therefore if there are neurons with the hierarchical organization illustrated in Fig. 12, A and B, we would expect aspect ratios of RFs for those neurons to be substantially >1. The disparity energy model with no such pooling predicts an aspect ratio equal to 1.
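The frequency-domain estimate of the aspect ratio can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the procedure used in this study: the RF is a synthetic Gaussian envelope without a carrier, and the spectral SDs are estimated from second moments of the amplitude spectrum rather than by fitting a two-dimensional Gaussian. For a measured RF with a carrier, the spectrum peaks away from the origin, so a fit around one peak would be needed instead. The names `gaussian_envelope`, `spectral_sds`, and `aspect_ratio` are hypothetical.

```python
import numpy as np

def gaussian_envelope(n, dx, a, b):
    """Synthetic binocular RF envelope: SD `a` along the frontoparallel axis
    (array axis 1) and SD `b` along the disparity axis (array axis 0)."""
    coords = (np.arange(n) - n // 2) * dx
    X, D = np.meshgrid(coords, coords)
    return np.exp(-X ** 2 / (2 * a ** 2) - D ** 2 / (2 * b ** 2))

def spectral_sds(rf, dx):
    """Return (sigma_f, sigma_d), the SDs of the amplitude spectrum along the
    frontoparallel and disparity frequency axes, estimated from moments."""
    amp = np.abs(np.fft.fftshift(np.fft.fft2(rf)))
    freq_d = np.fft.fftshift(np.fft.fftfreq(rf.shape[0], d=dx))
    freq_f = np.fft.fftshift(np.fft.fftfreq(rf.shape[1], d=dx))
    p_d = amp.sum(axis=1) / amp.sum()   # marginal over disparity frequency
    p_f = amp.sum(axis=0) / amp.sum()   # marginal over frontoparallel frequency
    sigma_d = np.sqrt((p_d * (freq_d - (p_d * freq_d).sum()) ** 2).sum())
    sigma_f = np.sqrt((p_f * (freq_f - (p_f * freq_f).sum()) ** 2).sum())
    return sigma_f, sigma_d

def aspect_ratio(sigma_f, sigma_d):
    """Aspect ratio a/b, expressed through the spectral SDs as sigma_d / sigma_f."""
    return sigma_d / sigma_f

# Hypothetical envelope twice as wide in the frontoparallel direction as in disparity.
rf = gaussian_envelope(n=128, dx=0.1, a=1.0, b=0.5)
sigma_f, sigma_d = spectral_sds(rf, dx=0.1)
print(aspect_ratio(sigma_f, sigma_d))
```

Because a wide spatial envelope gives a narrow spectrum, the envelope that is elongated along the frontoparallel axis has the smaller σ_f, and the ratio σ_d/σ_f comes out >1.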

Distributions of aspect ratios for separable and inseparable RFs are shown in Fig. 13, A and B. For most neurons, aspect ratios were >1 for inseparable RFs (Fig. 13B). Mean aspect ratios are 1.15 and 1.67 for separable and inseparable RFs, respectively. The result for inseparable RFs, the majority of which are complex cells, indicates a substantial degree of spatial pooling, deviating substantially from prediction of the disparity energy model. The relationship between the aspect ratio and the disparity gradient is presented in Fig. 13, C and D. If the hierarchical organization hypothesis (Fig. 12A) is correct as a basis for slant selectivity, neurons with highly elongated receptive fields should possess a wide range of disparity gradients. In contrast, neurons with aspect ratios close to 1 should show a narrow distribution for disparity gradients near zero. However, our data show the opposite: Disparity gradients tended to be highly variable for neurons with low aspect ratios, but were relatively small for those with high aspect ratios for inseparable RFs (separable: n = 45, P = 0.09, Mann–Whitney U test; inseparable: n = 90, P < 0.05, Mann–Whitney U test).


FIG. 13. A and B: distributions of aspect ratios for separable (A) and inseparable (B) binocular RFs are illustrated, respectively. Disparity energy model predicts an aspect ratio of 1 (solid vertical line). However, most inseparable RFs had aspect ratios >>1. Mean aspect ratio was 1.16 for separable RFs and 1.76 for inseparable RFs (vertical dashed line). C and D: relationships are shown between the aspect ratio and disparity gradient of separable (C) and inseparable (D) binocular RFs. Disparity gradients tended to be variable for neurons with low aspect ratios but were relatively small for those with high aspect ratio for inseparable RFs (separable: Mann–Whitney U test, P = 0.09, n = 45; inseparable: Mann–Whitney U test, P < 0.05, n = 90). E and F: relationships are illustrated between the sampling noise quantified in terms of SD of the disparity gradient and the aspect ratio. SDs of the disparity gradients are obtained from the distribution of disparity gradients by bootstrap resampling of 1,000 times. Same vertical scaling is used for C–F for comparison.

 
It may be argued that the increased range of disparity gradients for neurons with small aspect ratios may simply reflect poorer reliability for estimating tilts for these cells. For example, orientations of ellipses may be determined more reliably for highly elongated ellipses than for nearly circular ones, given a constant level of noise or measurement errors. To examine this factor, we show the relationship between the confidence (SD, i.e., the length of error bars in Fig. 13, C and D) for estimates of the disparity gradient and the aspect ratio in Fig. 13, E and F. The scales of the vertical axes are equal for Fig. 13, C and D for comparison. Although there is a tendency for cells with smaller aspect ratios to have longer error bars (n = 18, P < 0.01, Pearson’s r = –0.77: black symbols in Fig. 13E), the error-bar length is much smaller than the mean value of disparity gradient. It is nearly a constant fraction (20%) of the absolute value of the disparity gradient for neurons with significant tilts (black symbols). It is therefore probably accurate to say that the variability of disparity gradient is almost independent of the aspect ratio. These results indicate that highly tilted binocular RFs are not constructed by the spatial pooling process shown in Fig. 12. Nevertheless, 21.6% (8/37) of significantly tilted inseparable cells have substantially elongated binocular RFs (aspect ratio ≥2), indicating that pooling may play at least some role in constructing slightly tilted binocular RFs.
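The error-bar length discussed above is a bootstrap SD. The following Python sketch shows the general idea under stated assumptions: it resamples a set of hypothetical per-repeat disparity-gradient estimates, whereas the resampling unit and the RF recomputation used in this study are not reproduced here. The function name `bootstrap_sd` and the synthetic data are illustrative only.

```python
import numpy as np

def bootstrap_sd(samples, statistic=np.mean, n_boot=1000, seed=0):
    """SD of a statistic across bootstrap resamples (drawn with replacement)."""
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        resample = samples[rng.integers(0, n, size=n)]
        estimates[i] = statistic(resample)
    return estimates.std(ddof=1)

# Hypothetical disparity-gradient estimates for one neuron (synthetic data).
rng = np.random.default_rng(1)
gradients = rng.normal(loc=0.2, scale=0.1, size=50)
sd = bootstrap_sd(gradients)
print(f"bootstrap SD = {sd:.3f}, fraction of |mean| = {sd / abs(gradients.mean()):.2f}")
```

Expressing the SD as a fraction of the absolute disparity gradient, as in the last line, corresponds to the comparison made in the text between error-bar length and the magnitude of the gradient.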

The fact that the model of Fig. 12 is rejected should not be interpreted to mean that pooling is not important. Rather, it may play an important role in slant discrimination. It is possible that the role of pooling for constructing RF with a high aspect ratio is to create neurons that can signal near-zero surface slants with greater accuracy, allowing fine slant discriminations for surfaces near frontoparallel. Our results are certainly consistent with such a possibility.

Having defined the aspect ratio of binocular RF, we return to the question of the relationship between the interocular spatial frequency difference and binocular RF tilt. Do neurons with untilted binocular RF with a clear interocular spatial frequency difference have elongated RFs (with high aspect ratios)? These neurons cannot be explained by either of the models we have examined so far. However, one possibility we have not considered is the opposite of the model in Fig. 12 where the spatial pooling is performed over highly tilted subunits but along the exact frontoparallel direction. That is, although individual unpooled units possess tilted binocular RFs, the spatial pooling produces a counteracting effect, thereby canceling the tilts of pooled members. We therefore examined the aspect ratios of representative neurons of this type. Four neurons in the rightmost part of the scatterplot in Fig. 11D have been selected. These neurons have a frequency ratio >1.5. Three of the four neurons had large aspect ratios of 1.91-2.53, and one of them had an aspect ratio of 1.08. The results are not conclusive, but there is a tendency for these neurons to have highly elongated binocular RFs.

Does the aspect ratio relate to other parameters of binocular RF? Figure 14 illustrates relationships among the depth-domain aspect ratios, RF sizes, and preferred spatial frequencies. The relationships between the depth-domain aspect ratio and RF sizes are illustrated in Fig. 14, A and B. Sizes of binocular RFs are defined both in the disparity and frontoparallel directions as illustrated in Fig. 12. There is no correlation between the aspect ratio and the RF size in the frontoparallel direction (Fig. 14A) (separable: r = 0.14, P > 0.05, n = 45; inseparable: r = –0.02, P > 0.05, n = 90). In contrast, there is a significant negative correlation between the aspect ratio and the RF size in the disparity direction (Fig. 14B) (separable: r = –0.1, P > 0.05, n = 45; inseparable: r = –0.48, P < 0.001, n = 90). These results indicate that RFs with high aspect ratios tended to have narrow absolute RF sizes in the depth dimension.


FIG. 14. A and B: illustration of relationships between the aspect ratio of binocular RFs and their sizes in the frontoparallel and disparity directions, respectively. (A: separable: r = 0.09, P > 0.05, n = 45; inseparable: r = –0.02, P > 0.05; n = 90; B: separable: r = –0.1, P > 0.05, n = 45; inseparable: r = –0.48, P < 0.001, n = 90.) C and D: illustration of relationships between the preferred spatial frequency of neurons and their binocular RF sizes in the frontoparallel and disparity directions, respectively. (C: separable: r = –0.88, P < 0.001, n = 45; inseparable: r = –0.68, P < 0.001, n = 90; D: r = –0.86 P < 0.001, n = 45 for separable, r = –0.82, P < 0.001, n = 90 for inseparable.) E: relationship between preferred spatial frequency and aspect ratio is illustrated (Pearson’s separable: r = –0.1, P > 0.05, n = 45; inseparable: r = 0.5, P < 0.001, n = 90). Circles and triangles indicate cells recorded from areas 17 and 18, respectively. Open and filled symbols depict separable and inseparable RFs, respectively.

 
It is known that monocular RF sizes are inversely correlated with the preferred spatial frequency (DeAngelis et al. 1993b; De Valois et al. 1982). Because the RF size in the frontoparallel direction is essentially the average of monocular RF sizes, it is expected to show similar correlation with the preferred spatial frequency. How, then, is the RF size in the depth dimension related to the preferred spatial frequency? Figure 14, C and D depicts relationships between sizes of binocular RF and the preferred spatial frequency. As expected, RF sizes in the frontoparallel direction are inversely correlated with the preferred spatial frequency, as shown in Fig. 14C (C: r = –0.88, P < 0.001, n = 45 for separable; r = –0.68, P < 0.001, n = 90 for inseparable). Figure 14D illustrates the relationship between the RF size in disparity direction and the preferred spatial frequency. Again, significant correlations are observed (D: r = –0.86, P < 0.001, n = 45 for separable, r = –0.82, P < 0.001, n = 90 for inseparable). The results of Fig. 14D present evidence for a size–disparity correlation at the single-cell response level, indicating that neurons tuned to fine features tend to have a correspondingly small range of disparities for which they are sensitive (Ohzawa et al. 1997). These correlations for the binocular RF sizes appear to be natural results of the correlation found for the monocular RF size and the spatial frequency. Finally, in Fig. 14E, we present the relationship between the aspect ratio and the preferred spatial frequency. There is a significant correlation between aspect ratios and spatial frequencies for inseparable RFs (separable: r = –0.1, P > 0.05, n = 45; inseparable: r = 0.5, P < 0.001, n = 90). It is interesting that the neurons tuned to high spatial frequencies tended to have RFs highly elongated in the frontoparallel direction. 
These results suggest that the spatial pooling of basic disparity energy units (whose aspect ratios are 1) is not uniform in the binocular domain. The pooling occurs more for neurons tuned to high spatial frequencies and tends to occur only along the frontoparallel direction but not in the disparity direction.

Relationship between orientation and disparity gradient

Orientation bias was found for encoding of binocular disparity in that neurons with dissimilar RF profiles (RF phases) between the two eyes tended to prefer near-vertical orientations (DeAngelis et al. 1991, 1995; Ohzawa et al. 1996). Is there a similar orientation bias for the neural representation of slant-in-depth? The relationship between the preferred orientation and disparity gradient is illustrated in Fig. 15. The preferred orientation was evaluated by the peak of a Gaussian function fitted to the orientation tuning data measured by drifting sinusoidal gratings. The average orientation for the two eyes was used. Each preferred orientation is represented as an angle from the horizontal. Black and gray symbols depict binocular RFs tilted significantly and nonsignificantly from the frontoparallel plane, respectively. Circles and triangles indicate cells recorded from areas 17 and 18, respectively. If the slant-in-depth encoding depends on the preferred orientation of RFs, there should be a positive correlation between these parameters. However, no correlation is observed between the two parameters (Pearson’s r = 0.09, n = 168, P > 0.05).


FIG. 15. Relationship between the preferred orientation and the disparity gradient is illustrated. Preferred orientations are evaluated by the peak of orientation tuning measured by drifting grating test. Average of left and right preferred orientations is used. Each preferred orientation is normalized from 0 to 90°, for horizontal and vertical orientations. There are no correlations between two parameters (Pearson’s r = 0.09, P > 0.05, n = 168).

 
Relationship between SF/DF ratio and RF structures

As illustrated in Fig. 11F, there is a discrepancy between the preferred spatial frequency and the disparity frequency for inseparable RFs as originally reported by Ohzawa et al. (1997) and confirmed in the present study. That the discrepancy was found for inseparable binocular RFs, but not for separable ones, suggests that the discrepancy originates from a stage that pools the output of multiple simple-type subunits of complex disparity energy units. Moreover, nonlinearities in these pooling processes may be the source of the discrepancy. If this is the case, there may be a correlation between the size of discrepancy and the degree of pooling quantified by the aspect ratio. To examine this, we compared the aspect ratio with the size of discrepancy, which is quantified by the ratio of the preferred spatial frequency to the disparity frequency (SF/DF ratio). Because the preferred spatial frequency tended to be higher than the disparity frequency (Fig. 11F), the SF/DF ratio is >1 for most neurons.

Figure 16A illustrates the relationship between the aspect ratio and the SF/DF ratio. Filled and open symbols depict separable and inseparable RFs, respectively. Circles and triangles indicate cells recorded from areas 17 and 18, respectively. There is a positive significant correlation between the two parameters for inseparable RFs but not for separable RFs (separable: r = 0.09, P > 0.05, n = 45; inseparable: r = 0.55, P < 0.001, n = 90). Figure 16, B and C illustrates relationships between the RF size in the frontoparallel direction and the SF/DF ratio, and between RF size in the disparity direction and the SF/DF ratio, respectively. No significant correlations are observed from these scatters (B: r = 0.006, P > 0.05, n = 45 for separable; r = 0.08, P > 0.05, n = 90 for inseparable; C: r = –0.03, P > 0.05, n = 45 for separable; r = –0.2, P > 0.05, n = 90 for inseparable). The significant correlation between the SF/DF ratio and the aspect ratio suggests that the pooling process is responsible for the discrepancy between the monocular preferred spatial frequency and the binocular disparity frequency.


FIG. 16. Relationships are shown of binocular RF characteristics and ratio of spatial frequency to disparity frequency (SF/DF ratio). Because the disparity energy model predicts SF/DF = 1, the ratio reflects discrepancy between the data and predictions by the model. A: relationship between aspect ratio and SF/DF ratio is illustrated. Significant correlation (separable: r = 0.09, P > 0.1, n = 45; inseparable: r = 0.55, P < 0.001, n = 90) is observed for inseparable RFs. B and C: relationships between the SF/DF ratio and the binocular RF sizes in frontoparallel and disparity directions are illustrated, respectively. Circles and triangles indicate data for neurons recorded from areas 17 and 18, respectively. Open and filled symbols depict separable and inseparable RFs, respectively.

 

 DISCUSSION
 
 
We approached the possible mechanisms for encoding of the 3D surface slant by analyzing characteristics of binocular RFs of neurons in early visual cortex. One of our key findings is that the neurons in the early visual cortex possess a variety of binocular RF tilts that are sufficient for encoding the range of physical surface slants that occur under normal viewing conditions. Therefore our results suggest a possibility that surface slant encoding as reported by previous studies in higher-order visual areas such as MT, CIP, V4, and IT (Hinkle and Connor 2002; Janssen et al. 1999, 2000; Liu et al. 2004; Nguyenkim and DeAngelis 2003; Taira et al. 2000) may originate at least partially in the early visual areas. In this section, we discuss potential problems in our experimental procedures and interpretations of our data, as well as relationships to previous findings in the literature.

Selectivity of neural responses for 3D surface slant was estimated by the tilt of binocular RF from the frontoparallel plane. Because the degrees of tilts are small, approximately 10° at most, we had to show that these tilts do not arise from noise, variabilities of experimental calibrations, or other nonneural factors. We have presented three pieces of evidence to establish that observed variations of binocular RF tilts are real and neural in origin. First, bootstrap tests were performed to show that measured binocular RFs possess significant nonzero tilt in the presence of neural response variability. Second, we have shown that pairs of neurons recorded simultaneously had significantly different RF tilts. If the RF tilts are attributable to optical factors such as errors in distance or magnification adjustments for the two eyes, the degree and direction of tilt should be similar for the pair of neurons. Our examples show tilts of binocular RFs that are in opposite directions between the pair of cells (Figs. 7 and 8). Third, we have determined that the contribution of artifactual sources of errors is much smaller than the variance we observe in the data. Distributions of actual RF tilts are significantly wider than that for artifacts (Fig. 10). Based on these pieces of evidence, we conclude that there are true variations in the tilt of binocular RFs that are neural in origin.

The range of distribution of disparity gradients, which were converted from the tilts of binocular RF maps, is ±0.5 both for separable and inseparable binocular RFs (Fig. 9). These ranges are capable of representing actual surface slants in the real world of >70° assuming the 57-cm viewing distance. The range of disparity gradient representation we have found (for the cat) is also similar to that for neurons in area MT of the monkey (Nguyenkim and DeAngelis 2003).
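As a rough check on the conversion quoted above, the standard small-angle approximation for a surface slanted about a vertical axis can be used. This is a sketch under that approximation and is not the exact expression derived in the APPENDIX; a (interocular separation) and b (viewing distance) follow the APPENDIX notation, and φ is the slant angle.

```latex
\Delta d \approx \frac{a}{b}\,\tan\phi
\quad\Longrightarrow\quad
\phi \approx \arctan\!\left(\frac{b\,\Delta d}{a}\right)
```

With Δd = 0.5 and b = 57 cm, the product b Δd is 28.5 cm, so φ exceeds 70° whenever a < 28.5 / tan 70° ≈ 10.4 cm. The interocular separation is not stated in this section, but the bound is loose enough that the approximation agrees with the >70° figure for any plausible value.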

Although the possible ranges of slant encoding are similar across visual areas, there is a critical difference between the slant representation in the early visual cortex and higher-order areas. Neurons in extrastriate areas are known to have selectivities to surface orientations that are invariant with respect to positional disparity (Hinkle and Connor 2002; Nguyenkim and DeAngelis 2003; Taira et al. 2000). Although we have not tested explicitly for the disparity invariance of slant-in-depth selectivity, it is clear that a model based on the tilted binocular RF by itself cannot be position-disparity invariant. Therefore neurons in areas 17 and 18 are likely to be highly sensitive to position disparity, but they also carry additional information on surface slant because the RF model predicts the maximum firing for the neuron when both the disparity and the surface slant match the binocular RF.

Dif-frequency organization for slant-in-depth encoding

We have examined a modified disparity energy model based on the interocular spatial frequency difference (dif-frequency). When viewing a slanted 3D surface, the spatial frequency contents for the corresponding areas of the surface are different between the eyes (Blakemore 1970; Fiorentini and Maffei 1971; Tyler and Sutter 1979; Wilson 1976). It was known that for some neurons in early visual cortex, the preferred spatial frequencies for the two eyes were not always the same (Hammond and Pomfrett 1991; Read and Cumming 2003). However, no examination of corresponding predicted tilt in the binocular RF (Fig. 1), which would be more direct evidence for surface slant representation, was available. Comparison of the preferred disparity gradient and the ratio of optimal spatial frequencies for the two eyes as measured by drifting gratings shows a significant correlation between these two parameters. However, the correlation was significant only for neurons with inseparable RFs. In general, the correlation was not as good as we initially expected. In fact, only about one third of the neurons show responses consistent with the theoretical prediction. Other neurons are distributed outside the range of prediction for the dif-frequency disparity energy model. Thus we have explored additional alternative possibilities.

Hierarchical organization for slant-in-depth encoding

Another obvious possibility for generating slant-in-depth selectivity is by spatially pooling multiple neurons with progressive shifts of their preferred disparities (Fig. 12A). To examine this possibility, we analyzed the aspect ratio of binocular RFs. The prediction based on the pooling model of Fig. 12 was not fulfilled despite the evidence for extensive spatial pooling. On the contrary, the aspect ratio and the disparity gradient were related in the direction opposite to our expectation. The neurons with little pooling (aspect ratio near 1) tended to have a variety of preferred disparity gradients, whereas RFs of those with substantial pooling were not tilted (Fig. 13D). However, the opposite result does not necessarily rule out possible roles for the spatial pooling. For example, we should also note the possibility that spatial pooling actively generates neurons with high aspect ratios and tuned to near-frontoparallel surface slants to enhance slant discrimination performance for near-frontoparallel surfaces. Such a possibility is consistent with our findings.

In addition, our findings may have a possible basis in the way disparity gradients and lateral spatial extents are negatively correlated. If we assume equal average physical spatial extents for depth and frontoparallel directions for a large number of objects in the physical world, slanted surfaces on average should occupy a narrower frontoparallel extent than that of nonslanted surfaces. It would be of interest to examine stereoimage statistics of natural scenes to determine the exact form of such a correlation.

Two kinds of dif-frequency models

Psychophysically, Halpern et al. (1996) reported that dif-frequency organization per se does not provide a robust slant-in-depth signal. Such a result appears to contradict the premise of the dif-frequency notion. However, we must note that there are two distinct levels of dif-frequency models. One is the strong form of the dif-frequency model that was examined and ruled out by Halpern et al. This model is based on the notion that a spatial frequency difference as such (without consistent local binocular correlations) is sufficient to signal surface slant. The other, weaker form of dif-frequency model, which we have examined, is an extension to the disparity energy model where the spatial frequency difference provides additional information regarding slant on top of local disparity information. Neurons with tilted binocular RFs will respond maximally when both the local binocular disparity and the surface slant simultaneously match the RF parameters. Therefore such a neuron is tuned to both the disparity and the interocular frequency difference.

Our findings on the effects of the interocular frequency difference are highly analogous to those reported by Bridge and Cumming (2001) with respect to the interocular orientation difference. They have found that monkey V1 neurons show responses to interocular orientation difference in a predictable manner based on the "dif-orientation" disparity energy model, and that the neural responses depend on both the binocular disparity and orientation difference. This is exactly what we find for spatial frequency. They have also found that the V1 neurons are not tuned for the relative orientation difference. Tuning for the relative orientation difference means that the optimal orientation difference is invariant regardless of the absolute orientations of the stimuli. Similarly, the model based on tilted binocular RF predicts no tuning for the relative spatial frequency difference. As with the disparity invariance of surface orientation tunings found in higher-order visual areas, tunings for the relative orientation or spatial frequency difference may be found in those cortical areas.

Is there an orientation bias for slant-in-depth encoding?

We investigated the possible orientation bias for slant-in-depth encoding because such an orientation bias has been found for phase-disparity encoding (DeAngelis et al. 1991, 1995). As apparent from Fig. 15, there is no orientation bias in the distributions of disparity gradients. This difference may perhaps be explained by the fact that the ratio of spatial frequencies across the eyes is independent of the orientation. In other words, neurons with any preferred orientation, except those tuned to the exact horizontal, can make equal contributions for signaling a given surface slant (i.e., by signaling a given spatial frequency ratio). The situation is quite different for the RF phase disparity because the key parameter, the horizontal disparity (the primary determinant of depth), is dependent on the orientation. To produce neurons tuned to a given horizontal disparity, the required RF phase difference is smaller for neurons tuned to orientations closer to horizontal (Ohzawa et al. 1996). Therefore although there is no need for neurons having large phase difference at near horizontal orientations, neurons tuned to any orientation are equally important and useful for signaling slant information. Admittedly, this is highly speculative, but the results presented in Fig. 15 appear quite natural based on these considerations.

Discrepancies between the monocular and the disparity-tuning properties

The discrepancies between the optimal spatial and disparity frequencies (Fig. 11F) are similar to those reported previously (Ohzawa et al. 1997; Read and Cumming 2003). However, we have found substantial discrepancies only for neurons with inseparable RFs but not those with separable RFs. These results suggest that the discrepancy originates from some form of nonlinearity in the pooling process that underlies a hierarchical chain of processing where outputs of units with separable RFs are used to construct neurons with inseparable RFs. This notion is strengthened by the results presented in Fig. 16 in that neurons with larger aspect ratios (thus more pooling) tended to have a greater degree of discrepancy. Unfortunately, from our study it is not possible to determine details of where exactly the presumed nonlinearity lies. For example, it is still not known whether neurons with large aspect ratios receive input from complex cells organized as a disparity energy unit (with aspect ratio = 1) or if they directly collect input from neurons with separable RFs without the intermediate units. Further studies will be needed to address these questions.

In conclusion, neurons in areas 17 and 18 appear to encode slant-in-depth of 3D surfaces by having a variety of tilts in their binocular RFs. There are sufficient variations in the RF tilt angles for representing the range of 3D surface slants that occur in the real world. However, there may be multiple mechanisms by which tilted binocular RFs are generated. RF tilts for a subset of neurons could be accounted for by the dif-frequency model. However, neither the dif-frequency model nor the hierarchical pooling model could completely explain the entire data. It is possible that these neurons in the early visual areas contribute to surface slant selectivity of neurons in higher-order visual areas.


 APPENDIX
 
 
The relationship between disparity gradient and slant in the actual world is described by Blakemore (1970). Here we extend the description of surface slant into the frequency domain, and derive the relationships among disparity gradient, tilt angle of binocular receptive field, disparity frequency, frontoparallel frequency, and monocular spatial frequencies.

When two objects, A and B, are separated in 3D space as shown in Fig. A1, the disparity gradient for the line connecting the two objects (Δd) is defined as the difference of binocular disparities for the two objects (d_A − d_B) divided by their spatial separation in cyclopean space (γ). Thus the disparity gradient is defined as

$$\Delta d = \frac{d_A - d_B}{\gamma} \tag{A1}$$

Formula A2(A2)
where visual angles α and β indicate the separation between the objects in the monocular retinal space for the left and right eyes, respectively. α and β are calculated from the viewing distance and the separation of eyes as

Formula A3(A3)
where a indicates the separation between left and right eyes, b is the distance to the fixation point from the subject, and c is the distance between the fixation point and the objects. φ is the slant angle in the real-world space.


FIG. A1. A: schematic diagram illustrating the definition of disparity gradient. Because disparity gradient is a slope in the depth space, the horizontal distance between objects A and B is set equal to the distance between the eyes, without loss of generality, to simplify the derivation. B: diagram illustrating the geometrical relationship between the sizes of the binocular RF as viewed by the 2 eyes (α, β) and the tilt angle θ of the binocular RF.

 
Similarly, the slant of a binocular RF may be expressed in terms of disparity gradient. When the sizes of the binocular RF for the left and right eyes are α and β, the binocular RF size is described as a function of the slant of the binocular RF, θ

Formula A4(A4)
Therefore by substituting α and β into Eqs. A1 and A2, we obtain the disparity gradient as

Formula A5(A5)
If a slanted surface contains n cycles of a grating, the spatial frequencies as viewed by the left and right eyes are written as

Formula A6(A6)
Therefore the spatial frequency ratio is

Formula A7(A7)
The disparity gradient is rewritten in terms of the spatial frequency ratio as

Formula A8(A8)
The theoretical curve shown by the solid line in Fig. 11, C and D, is obtained from this equation.
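
As a numerical illustration of how a disparity gradient can be tied to an interocular spatial frequency ratio, the following minimal sketch assumes that the disparity gradient is the difference of the monocular extents α and β divided by their mean, and that an n-cycle grating gives f_L = n/α and f_R = n/β. These forms and the sign convention are assumptions made here for illustration and may differ from Eqs. A2 and A6 to A8; the function names are hypothetical.

```python
def disparity_gradient_from_angles(alpha, beta):
    """Assumed form: difference of the monocular extents over their mean."""
    return (alpha - beta) / ((alpha + beta) / 2.0)


def disparity_gradient_from_frequencies(f_left, f_right):
    """Same quantity in terms of monocular spatial frequencies.

    Assumes f_left = n / alpha and f_right = n / beta, so the number of
    cycles n cancels and only the two frequencies remain.
    """
    return 2.0 * (f_right - f_left) / (f_right + f_left)


if __name__ == "__main__":
    n_cycles = 3.0            # illustrative value
    alpha, beta = 2.0, 1.6    # monocular extents in degrees (illustrative)
    f_left, f_right = n_cycles / alpha, n_cycles / beta
    print(disparity_gradient_from_angles(alpha, beta))
    print(disparity_gradient_from_frequencies(f_left, f_right))
```

Under these assumptions the two functions return the same value, and equal left and right frequencies give a disparity gradient of zero, which corresponds to a frontoparallel surface.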

The disparity frequency and frontoparallel frequency of a tilted binocular RF of a complex cell are derived as follows. We begin with a model of complex cells based on a generalized disparity energy model (Ohzawa et al. 1990, 1997; Qian and Mikaelian 2000) in which the left and right spatial frequencies may differ. According to this model, a complex cell receives input from quadrature pairs of simple cells. Members of the quadrature pairs may be modeled as having left and right monocular RFs that are even (W^even) and odd (W^odd) symmetric

Formula A9(A9)

Formula A10(A10)
where σ is the envelope width and f_L, f_R are the spatial frequencies of the left and right RFs, respectively. ϕ denotes the phase disparity. The response of the complex cell is the sum of squared sums of the left and right RF profiles

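\[ R = (W_L^{\mathrm{even}} + W_R^{\mathrm{even}})^2 + (W_L^{\mathrm{odd}} + W_R^{\mathrm{odd}})^2 = (W_L^{\mathrm{even}})^2 + (W_R^{\mathrm{even}})^2 + (W_L^{\mathrm{odd}})^2 + (W_R^{\mathrm{odd}})^2 + 2\,(W_L^{\mathrm{even}} W_R^{\mathrm{even}} + W_L^{\mathrm{odd}} W_R^{\mathrm{odd}}) \tag{A11} \]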
Figure 1, C and E, is derived from this equation.
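
The structure of this response can be checked numerically. The sketch below is a minimal illustration, assuming Gabor-shaped monocular RFs (a Gaussian envelope of width σ times a sinusoid) with the phase disparity assigned entirely to the right eye. The exact normalization and phase convention of Eqs. A9 and A10 are not reproduced here, and all names (`gabor`, `complex_cell_terms`) are hypothetical.

```python
import numpy as np


def gabor(x, sigma, freq, phase=0.0):
    """Gaussian-windowed cosine; phase = -pi/2 turns the cosine into a sine."""
    return np.exp(-x**2 / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * freq * x + phase)


def complex_cell_terms(x_left, x_right, sigma, f_left, f_right, phase_disparity=0.0):
    """Return (energy, monocular, interaction) on a grid of left/right positions.

    Assumption: the even and odd members form a quadrature pair, and the
    phase disparity is added to the right-eye RFs only.
    """
    w_l_even = gabor(x_left, sigma, f_left)
    w_l_odd = gabor(x_left, sigma, f_left, -np.pi / 2)
    w_r_even = gabor(x_right, sigma, f_right, phase_disparity)
    w_r_odd = gabor(x_right, sigma, f_right, phase_disparity - np.pi / 2)

    # Sum of squared sums of the left and right RF profiles
    energy = (w_l_even + w_r_even) ** 2 + (w_l_odd + w_r_odd) ** 2
    # Terms that depend on one eye only
    monocular = w_l_even**2 + w_l_odd**2 + w_r_even**2 + w_r_odd**2
    # Cross term: the binocular interaction component
    interaction = 2.0 * (w_l_even * w_r_even + w_l_odd * w_r_odd)
    return energy, monocular, interaction


if __name__ == "__main__":
    x = np.linspace(-2.0, 2.0, 81)
    xl, xr = np.meshgrid(x, x, indexing="ij")
    energy, monocular, interaction = complex_cell_terms(
        xl, xr, sigma=0.8, f_left=0.5, f_right=0.6
    )
    print(np.allclose(energy, monocular + interaction))
```

With unequal `f_left` and `f_right`, the interaction map is tilted in the left/right position plane, which is the property the dif-frequency account relies on.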

To present the binocular RF data, we remove the contributions of monocular terms by taking the difference of binocular RFs (measured with contrast-matched and mismatched stimuli) as described by Ohzawa et al. (1997), thereby extracting the pure binocular interaction component. The last term of Eq. A11, 2(W_L^even W_R^even + W_L^odd W_R^odd), is the binocular interaction component.

Therefore

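\[ R_{\mathrm{interaction}}(x_L, x_R) = 2\,\left[ W_L^{\mathrm{even}}(x_L)\, W_R^{\mathrm{even}}(x_R) + W_L^{\mathrm{odd}}(x_L)\, W_R^{\mathrm{odd}}(x_R) \right] \tag{A12} \]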
We now rewrite R_interaction in disparity and frontoparallel dimensions by converting the monocular positions, x_L and x_R, into binocular disparity (d) and frontoparallel position (h) (Tanabe et al. 2005). In performing this conversion, care must be taken because standard geometrical rules such as the Pythagorean theorem cannot be applied directly. That is, the disparity and frontoparallel dimensions are scaled unequally, as shown in Fig. A2. For instance, when the left and right spaces span α degrees of visual angle, the frontoparallel space, which is oriented 45° from the left position axis in Fig. A2B, also spans α degrees. In contrast, the disparity dimension is expanded twofold relative to the frontoparallel space, spanning 2α degrees. This unevenness follows from the definition of binocular disparity. The binocular disparity is defined as the difference of the left and right positions, and the frontoparallel position is defined as the average of the left and right positions

\[ d = x_L - x_R, \qquad h = \frac{x_L + x_R}{2} \tag{A13} \]
Therefore left and right positions are written as

\[ x_L = h + \frac{d}{2}, \qquad x_R = h - \frac{d}{2} \tag{A14} \]
By substituting Eq. A14 into Eq. A12, R_interaction is expressed as

Formula A15(A15)
Therefore based on this equation, frequencies in the frontoparallel and disparity dimensions are

Formula A16(A16)
When the left and right spatial frequencies are equal, f_L = f_R = f, the frontoparallel term within the cosine of Eq. A15 is zero. This case is identical to that presented in previous studies (Ohzawa et al. 1990, 1997).
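
The change of variables can be illustrated with a short numerical check. This is a sketch under stated assumptions: disparity is taken as d = x_L − x_R and frontoparallel position as h = (x_L + x_R)/2, and the carrier of the interaction term is taken to be a cosine of 2π(f_L x_L − f_R x_R) − ϕ, which is what Gabor-shaped RFs with the phase disparity in the right eye would give. Under those assumptions the frontoparallel frequency comes out as f_L − f_R and the disparity frequency as (f_L + f_R)/2. The signs and function names are illustrative choices, not taken from Eqs. A15 and A16.

```python
import numpy as np

# Linear map from (x_L, x_R) to (d, h); its rows are orthogonal but have
# unequal lengths, so it is not a pure rotation.
TO_DH = np.array([[1.0, -1.0],
                  [0.5, 0.5]])


def to_disparity_frontoparallel(x_left, x_right):
    """Convert monocular positions to disparity d and frontoparallel position h."""
    return x_left - x_right, 0.5 * (x_left + x_right)


def to_monocular(d, h):
    """Inverse conversion back to left and right positions."""
    return h + 0.5 * d, h - 0.5 * d


def carrier_monocular(x_left, x_right, f_left, f_right, phase=0.0):
    """Assumed carrier of the interaction term in monocular coordinates."""
    return np.cos(2.0 * np.pi * (f_left * x_left - f_right * x_right) - phase)


def carrier_dh(d, h, f_left, f_right, phase=0.0):
    """Same carrier written with frontoparallel and disparity frequencies."""
    f_frontoparallel = f_left - f_right
    f_disparity = 0.5 * (f_left + f_right)
    return np.cos(2.0 * np.pi * (f_frontoparallel * h + f_disparity * d) - phase)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xl, xr = rng.uniform(-2.0, 2.0, size=(2, 1000))
    d, h = to_disparity_frontoparallel(xl, xr)
    same = np.allclose(carrier_monocular(xl, xr, 0.5, 0.6, 0.3),
                       carrier_dh(d, h, 0.5, 0.6, 0.3))
    print(same)
    print(TO_DH @ TO_DH.T)
```

The product `TO_DH @ TO_DH.T` is diagonal with unequal entries rather than the identity, which is one way to see that the two coordinate systems are not related by a standard rotation and that the disparity axis is stretched twofold relative to the frontoparallel axis.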


FIG. A2. Relationships between the x_L, x_R domain and the disparity-frontoparallel dimension are shown. A: geometry of the binocular viewing condition is illustrated. Gray diamond-shaped area is the region of real space that is jointly covered by the left and right receptive fields. B: area corresponding to the diamond region in A is illustrated as the Cartesian x_L, x_R domain. When the monocular viewing angle is α, the disparity axis spans 2α and the frontoparallel axis spans α. Because of this asymmetry, the x_L, x_R domain and the disparity-frontoparallel domain cannot be transformed directly by standard coordinate rotations.

 

 GRANTS
 
This work was supported by Ministry of Education, Culture, Sports, Science and Technology Grant 15029230 and the Project on Neuroinformatics Research in Vision through special coordination funds for promoting science and technology and by Japan Society for the Promotion of Science Grant 13308048.


 ACKNOWLEDGMENTS
 
We thank laboratory members H. Tanaka, S. Nishimoto, R. Kimura, K. Sasaki, M. Fukui, M. Iida, M. Arai, T. Ninomiya, and T. Ishida, who participated in recording sessions.


 FOOTNOTES
 
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Address for reprint requests and other correspondence: I. Ohzawa, Graduate School of Frontier Biosciences and School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531 Japan (E-mail: ohzawa@fbs.osaka-u.ac.jp).


 REFERENCES
 
Anzai A, Ohzawa I, and Freeman RD. Neural mechanisms for processing binocular information I. Simple cells. J Neurophysiol 82: 891–908, 1999a.

Anzai A, Ohzawa I, and Freeman RD. Neural mechanisms for processing binocular information II. Complex cells. J Neurophysiol 82: 909–924, 1999b.

Barlow HB, Blakemore C, and Pettigrew JD. The neural mechanism of binocular depth discrimination. J Physiol 193: 327–342, 1967.

Blakemore C. A new kind of stereoscopic vision. Vision Res 10: 1181–1199, 1970.

Blakemore C, Fiorentini A, and Maffei L. A second neural mechanism of binocular depth discrimination. J Physiol 226: 725–749, 1972.

Bridge H and Cumming BG. Responses of macaque V1 neurons to binocular orientation differences. J Neurosci 21: 7293–7302, 2001.

Bridge H, Cumming BG, and Parker AJ. Modeling V1 neuronal responses to orientation disparity. Vis Neurosci 18: 879–891, 2001.

Burt P and Julesz B. A disparity gradient limit for binocular fusion. Science 208: 615–617, 1980.

DeAngelis GC, Ohzawa I, and Freeman RD. Depth is encoded in the visual cortex by a specialized receptive field structure. Nature 352: 156–159, 1991.

DeAngelis GC, Ohzawa I, and Freeman RD. Spatiotemporal organization of simple-cell receptive fields in the cat’s striate cortex. I. General characteristics and postnatal development. J Neurophysiol 69: 1091–1117, 1993a.

DeAngelis GC, Ohzawa I, and Freeman RD. Spatiotemporal organization of simple-cell receptive fields in the cat’s striate cortex. II. Linearity of temporal and spatial summation. J Neurophysiol 69: 1118–1135, 1993b.

DeAngelis GC, Ohzawa I, and Freeman RD. Neuronal mechanisms underlying stereopsis: how do simple cells in the visual cortex encode binocular disparity? Perception 24: 3–31, 1995.

De Valois RL, Albrecht DG, and Thorell LG. Spatial frequency selectivity of cells in macaque visual cortex. Vision Res 22: 545–559, 1982.

Efron B. The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia, PA: Society for Industrial and Applied Mathematics, 1982.

Efron B and Tibshirani R. An Introduction to the Bootstrap. New York: Chapman & Hall, 1993.

Ferster D. A comparison of binocular depth mechanisms in areas 17 and 18 of the cat visual cortex. J Physiol 311: 623–655, 1981.

Fiorentini A and Maffei L. Binocular depth perception without geometrical cues. Vision Res 11: 1299–1305, 1971.

Halpern DL, Wilson HR, and Blake R. Stereopsis from interocular spatial frequency differences is not robust. Vision Res 36: 2263–2270, 1996.

Hammond P and Pomfrett CJ. Interocular mismatch in spatial frequency and directionality characteristics of striate cortical neurones. Exp Brain Res 85: 631–640, 1991.

Hinkle DA and Connor CE. Three-dimensional orientation tuning in macaque area V4. Nat Neurosci 5: 665–670, 2002.

Hubel DH and Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160: 106–154, 1962.

Hubel DH and Wiesel TN. Receptive fields and functional architecture of monkey striate cortex. J Physiol 195: 215–243, 1968.

Hughes A. A useful table of reduced schematic eyes for vertebrates which includes computed longitudinal chromatic aberrations. Vision Res 19: 1273–1275, 1979.

Janssen P, Vogels R, and Orban GA. Macaque inferior temporal neurons are selective for disparity-defined three-dimensional shapes. Proc Natl Acad Sci USA 96: 8217–8222, 1999.

Janssen P, Vogels R, and Orban GA. Three-dimensional shape coding in inferior temporal cortex. Neuron 27: 385–397, 2000.

Jones JP and Palmer LA. The two-dimensional spatial structure of simple receptive fields in cat striate cortex. J Neurophysiol 58: 1187–1211, 1987.

Jones JP, Stepnoski A, and Palmer LA. The two-dimensional spectral structure of simple receptive fields in cat striate cortex. J Neurophysiol 58: 1212–1232, 1987.

LeVay S and Voigt T. Ocular dominance and disparity coding in cat visual cortex. Vis Neurosci 1: 395–414, 1988.

Li B, Peterson MR, and Freeman RD. Oblique effect: a neural basis in the visual cortex. J Neurophysiol 90: 204–217, 2003.

Liu Y, Vogels R, and Orban GA. Convergence of depth from texture and depth from disparity in macaque inferior temporal cortex. J Neurosci 24: 3795–3800, 2004.

Mata ML and Ringach DL. Spatial overlap of ON and OFF subregions and its relation to response modulation ratio in macaque primary visual cortex. J Neurophysiol 93: 919–928, 2005.

Mechler F and Ringach DL. On the classification of simple and complex cells. Vision Res 42: 1017–1033, 2002.

Nelson JI, Kato H, and Bishop PO. Discrimination of orientation and position disparities by binocularly activated neurons in cat striate cortex. J Neurophysiol 40: 260–283, 1977.

Nguyenkim JD and DeAngelis GC. Disparity-based coding of three-dimensional surface orientation by macaque middle temporal neurons. J Neurosci 23: 7117–7128, 2003.

Nikara T, Bishop PO, and Pettigrew JD. Analysis of retinal correspondence by studying receptive fields of binocular single units in cat striate cortex. Exp Brain Res 6: 353–372, 1968.

Nishimoto S, Arai M, and Ohzawa I. Accuracy of subspace mapping of spatiotemporal frequency domain visual receptive fields. J Neurophysiol 93: 3524–3536, 2005.

Ohzawa I, DeAngelis GC, and Freeman RD. Stereoscopic depth discrimination in the visual cortex: neurons ideally suited as disparity detectors. Science 249: 1037–1041, 1990.

Ohzawa I, DeAngelis GC, and Freeman RD. Encoding of binocular disparity by simple cells in the cat’s visual cortex. J Neurophysiol 75: 1779–1805, 1996.

Ohzawa I, DeAngelis GC, and Freeman RD. Encoding of binocular disparity by complex cells in the cat’s visual cortex. J Neurophysiol 77: 2879–2909, 1997.

Ohzawa I and Freeman RD. The binocular organization of complex cells in the cat’s visual cortex. J Neurophysiol 56: 243–259, 1986a.

Ohzawa I and Freeman RD. The binocular organization of simple cells in the cat’s visual cortex. J Neurophysiol 56: 221–242, 1986b.

Prazdny K. On the disparity gradient limit for binocular fusion. Percept Psychophys 37: 81–83, 1985.

Qian N and Mikaelian S. Relationship between phase and energy methods for disparity computation. Neural Comput 12: 279–292, 2000.

Read JC and Cumming BG. Testing quantitative models of binocular disparity selectivity in primary visual cortex. J Neurophysiol 90: 2795–2817, 2003.

Ringach DL, Sapiro G, and Shapley R. A subspace reverse-correlation technique for the study of visual neurons. Vision Res 37: 2455–2464, 1997.

Skottun BC, De Valois RL, Grosof DH, Movshon JA, Albrecht DG, and Bonds AB. Classifying simple and complex cells on the basis of response modulation. Vision Res 31: 1079–1086, 1991.

Taira M, Tsutsui KI, Jiang M, Yara K, and Sakata H. Parietal neurons represent surface orientation from the gradient of binocular disparity. J Neurophysiol 83: 3140–3146, 2000.

Tanabe S, Doi T, Umeda K, and Fujita I. Disparity-tuning characteristics of neuronal responses to dynamic random-dot stereograms in macaque visual area V4. J Neurophysiol 94: 2683–2699, 2005.

Trivedi HP and Lloyd SA. The role of disparity gradient in stereo vision. Perception 14: 685–690, 1985.

Tyler CW and Sutter EE. Depth from spatial frequency difference: an old kind of stereopsis? Vision Res 19: 859–865, 1979.

Wilson HR. The significance of frequency gradients in binocular grating perception. Vision Res 16: 983–989, 1976.






Copyright © 2006 by the American Physiological Society.