J Neurophysiol 95: 2768-2786, 2006. First published January 4, 2006; doi:10.1152/jn.00955.2005
0022-3077/06 $8.00

Encoding of Three-Dimensional Surface Slant in Cat Visual Areas 17 and 18

Takahisa M. Sanada1 and Izumi Ohzawa1,2

1Graduate School of Engineering Science and 2Graduate School of Frontier Biosciences, Osaka University, Osaka, Japan

Submitted 9 September 2005; accepted in final form 29 December 2005


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 METHODS
 RESULTS
 DISCUSSION
 APPENDIX
 GRANTS
 ACKNOWLEDGMENTS
 REFERENCES
 
How are surface orientations of three-dimensional objects and scenes represented in the visual system? We have examined an idea that these surface orientations are encoded by neurons with a variety of tilts in their binocular receptive field (RF) structure. To examine whether neurons in the early visual areas are capable of encoding surface orientations, we have recorded from single neurons extracellularly in areas 17 and 18 of the cat using standard electrophysiological methods. Binocular RF structures are obtained using a binocular version of the reverse correlation technique. About 30% of binocularly responsive neurons have RFs with statistically significant tilts from the frontoparallel plane. The degree of tilts is sufficient for representing the range of surface slants found in typical visual environments. For a subset of neurons having significant RF tilts, the degrees of tilt are correlated with the preferred spatial frequency difference between the two eyes, indicating that a modified disparity energy model can account for the selectivity, at least partially. However, not all cases could be explained by this model, suggesting that multiple mechanisms may be responsible. Therefore an alternative hypothesis is also examined, where the tilt is generated by pooling of multiple disparity detectors whose preferred disparities progressively shift over space. Although there is evidence for extensive spatial pooling, this hypothesis was not satisfactory either, in that the neurons with extensive pooling tended to prefer an untilted surface. Our results suggest that encoding of surface orientations may begin with the binocular neurons in the early visual cortex.


    INTRODUCTION
 
One of the fundamental roles of the visual system is to reconstruct a three-dimensional (3D) model of the external world from a pair of two-dimensional images on the two retinae. Horizontal displacement of the eyes causes small differences between the retinal images. This difference between the retinal images is called binocular disparity, and stereopsis is the process of determining depth from binocular disparity. Visual information processing for stereopsis begins in the primary visual cortex, and neurons found in this area are known to encode binocular disparities of stimuli for a small area of the visual field (Barlow et al. 1967; Ferster 1981; Hubel and Wiesel 1962, 1968; LeVay and Voigt 1988; Nikara et al. 1968; Ohzawa and Freeman 1986a,b; Ohzawa et al. 1990, 1996, 1997).

How does the processing of stereoscopic information proceed once binocular disparity for small localized areas is available? Is a possible next stage of processing that of detecting the rate of change of binocular disparity, i.e., detecting 3D orientations of surfaces in depth? Some recent studies have examined these possibilities and report that a subset of neurons in higher visual areas such as MT, V4, and CIPs encode information regarding slant/tilt of surfaces (Hinkle and Connor 2002; Nguyenkim and DeAngelis 2003; Taira et al. 2000). Response to 3D curvature is also reported in the inferotemporal cortex (IT) (Janssen et al. 1999, 2000; Liu et al. 2004). It is not known, however, whether such surface slant/tilt sensitivity is a unique feature of these higher-order visual areas. Because neurons in these areas receive inputs from primary visual cortex, selectivity for 3D surface slant/tilt may be inherited from the early visual areas. Historically, the role of interocular orientation difference has been examined in some detail (Blakemore et al. 1972; Nelson et al. 1977). More recent work has examined whether a subset of V1 neurons encode surface tilt by orientation disparity based on physiological experiments in the monkey and a computational study (Bridge and Cumming 2001; Bridge et al. 2001). However, possible roles of interocular spatial frequency difference have not been examined physiologically.

As illustrated in Fig. 1A, projection of a slanted surface onto the two retinae produces a spatial frequency difference, such that the eye closer to the nearer end of the slanted surface sees a higher spatial frequency than the other eye. Such a difference in spatial frequency across the eyes is a potent cue for perceiving surface slants. With psychophysical experiments, Blakemore and later investigators reported that a difference of spatial frequency across the eyes produces a perception of slant-in-depth (Blakemore 1970; Fiorentini and Maffei 1971; Wilson 1976). Binocular disparity caused by interocular spatial frequency difference is designated dif-frequency disparity (Tyler and Sutter 1979). As expected, the angle of perceived surface slant depends on the interocular ratio of spatial frequencies. Despite these psychophysical results, we are not aware of any physiological study that has systematically examined possible roles of dif-frequency disparity for encoding surface slant in the early visual cortex. In this study, we will thus address this question using modern receptive field-mapping techniques.


FIG. 1. A possible neural encoding scheme for 3-dimensional (3D) surface slant is illustrated. A: difference of spatial frequency across the eyes (dif-frequency) produces a perception of slant-in-depth. Angle of perceived surface slant depends on the interocular ratio of spatial frequencies. B: disparity energy model is modified to allow encoding for interocular spatial frequency difference. All subunits (S) share the same receptive field position, orientation, and size for left and right eyes as in the standard model, but their preferred spatial frequencies differ across the eyes. C–F: prediction of dif-frequency version of disparity energy model. Equal preferred spatial frequencies for the 2 eyes produce a frontoparallel binocular receptive field (RF) (C, E). Unequal preferred spatial frequencies cause a tilt of the binocular receptive field (D, F). An interocular SF ratio of 1.5 (left:right = 3:2) gives a binocular RF tilt of about 10°, which corresponds to a slant in real space of about 70° from the frontoparallel plane at 57 cm viewing distance.

 
To provide a framework within which we design our experiments and analyze data, we start with the standard disparity energy model (Ohzawa et al. 1990). In the standard disparity energy model, parameter values for orientation, spatial frequency, position, and size of the monocular receptive fields are the same across the eyes. Only the receptive field phase is allowed to be different across the eyes, and this difference in phase determines preferred binocular disparity. One obvious way to incorporate dif-frequency disparity sensitivity is to modify the standard disparity energy model and to allow spatial frequency to be different between left- and right-eye receptive fields of all subunits (S) (Fig. 1B). This idea was suggested in a previous computational study (Qian and Mikaelian 2000). Otherwise, the new model is identical to the standard model. Comparisons of predictions from the standard disparity energy model and those from the dif-frequency model are shown in middle and bottom rows, respectively, of Fig. 1. Figure 1C illustrates a binocular receptive field (RF) predicted from the standard disparity energy model where the spatial frequency-tuning curves are matched exactly between the two eyes (Fig. 1D). Notice in Fig. 1C that the strong region of excitation is exactly horizontal, indicating selectivity to the frontoparallel plane. However, a clear tilt in the binocular RF is predicted from the dif-frequency case as shown in Fig. 1, E and F. Note that, herein, we refer to the rotation angle of binocular RF from the frontoparallel axis as the "tilt" of binocular RF. The term "slant" is used exclusively for referring to angles of surfaces in the visual stimuli. Such differences in preferred spatial frequencies for the two eyes are not unreasonable assumptions. Actual neurons do not always prefer the same spatial frequencies for the two eyes (Hammond and Pomfrett 1991; Read and Cumming 2003).
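The prediction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Gabor subunit profile, the envelope width `sigma`, the function names, and the sign convention of the tilt are all assumptions made for illustration. Each subunit squares the sum of a left-eye and a right-eye Gabor, a quadrature pair is summed, and the monocular terms are subtracted so that only the binocular interaction remains. That interaction reduces to an envelope multiplied by cos(2π(fL·XL − fR·XR)), so its ridges run along the direction (fR, fL) in the (XL, XR) plane and rotate away from the XL = XR diagonal when the two frequencies differ.

```python
import math

def gabor(x, freq, phase, sigma=1.0):
    """1-D Gabor profile standing in for a monocular subunit RF (assumed shape)."""
    envelope = math.exp(-x * x / (2.0 * sigma ** 2))
    return envelope * math.cos(2.0 * math.pi * freq * x + phase)

def binocular_interaction(x_left, x_right, f_left, f_right, sigma=1.0):
    """Binocular interaction of a dif-frequency energy unit at one (XL, XR) point."""
    total = 0.0
    for phase in (0.0, math.pi / 2.0):  # quadrature pair of subunits
        left = gabor(x_left, f_left, phase, sigma)
        right = gabor(x_right, f_right, phase, sigma)
        # Squared binocular sum minus the two monocular terms = 2 * left * right.
        total += (left + right) ** 2 - left ** 2 - right ** 2
    return total

def ridge_tilt_deg(f_left, f_right):
    """Rotation of the ridges away from the XL = XR diagonal (sign is an assumption)."""
    return math.degrees(math.atan2(f_left, f_right)) - 45.0

# Walking along (f_right, f_left) keeps the ridge phase constant, so the
# interaction divided by twice the envelope stays at 1 along that direction.
f_left, f_right = 0.6, 0.4
for step in (0.0, 0.5, 1.0):
    x_l, x_r = step * f_right, step * f_left
    envelope = math.exp(-(x_l ** 2 + x_r ** 2) / 2.0)
    print(step, round(binocular_interaction(x_l, x_r, f_left, f_right) / (2.0 * envelope), 6))
print(round(ridge_tilt_deg(f_left, f_right), 1))
```

With equal frequencies the computed tilt is zero, and with a 3:2 ratio it comes out slightly above 11°, in line with the "about 10°" quoted in the legend of Fig. 1.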

Considering the predictions illustrated above from a modified version of the disparity energy model, we will first examine the extent to which neurons in areas 17 and 18 of the cat visual cortex exhibit tilted binocular RFs. We will also examine the validity of the dif-frequency disparity energy model by comparing the degree of tilt of binocular RF and monocular spatial frequency-tuning curves for the two eyes.


    METHODS
 
All animal care and experimental procedures conformed to those established by the National Institutes of Health and were approved by the Osaka University Animal Care and Use Committee.

Surgical procedure and animal maintenance

Forty-four adult cats (1.5–4 kg) were prepared for electrophysiological recording as follows. First, atropine sulfate (0.017 mg/kg) and hydroxyzine hydrochloride (Atarax, 0.83 mg/kg) were injected subcutaneously. Anesthesia was induced and maintained during surgery with isoflurane (2.5–3.5% in O2). Cefotiam hydrochloride (Panspolin, 2.8 mg/kg) and dexamethasone sodium phosphate (Decadron, 0.13 mg/kg) were administered. Electrocardiogram (ECG) electrodes and a rectal temperature probe were installed. The rectal temperature probe was coated with lidocaine ointment. ECG and core temperature were monitored using a custom-built PC-based physiological monitoring system. Catheters were inserted into femoral veins of two limbs for infusion of drugs and fluids. A glass tracheal cannula was inserted after tracheostomy. A stereotaxic apparatus was used to securely position the animal’s head. Lidocaine ointment was used at pressure points of ear bars. After securing the animal to the stereotaxic apparatus, anesthesia was switched to thiopental sodium (Ravonal, administered by continuous infusion, 1.0–1.5 mg·kg⁻¹·h⁻¹). Then, paralysis was induced with an initial dose of gallamine triethiodide and the animal was placed under artificial respiration at the rate of 20–30 strokes/min. The respiration rate and stroke volume were adjusted to maintain the end-tidal CO2 between 3.5 and 4.3%. A CO2 sensor (Datex-Ohmeda) was used to maintain a proper level of respiration. Anesthesia for the rest of the recording session was maintained by a combination of 70% N2O-30% O2 and thiopental sodium as noted above. Paralysis was maintained by continuous infusion of Ravonal, gallamine triethiodide (10 mg·kg⁻¹·h⁻¹) in lactated Ringer solution containing 50% glucose (40 mg·kg⁻¹·h⁻¹). Body temperature was maintained near 38.3°C with the use of a servo-controlled heating pad.
After securing the animal, a craniotomy was performed to access the central representation of the visual area 17 or 18 (Horsley–Clarke P4 L2.5 for recordings of A17, A3 L3 for A18). The dura was carefully removed to allow insertion of microelectrodes. Pupils were dilated with atropine (1%), and nictitating membranes were retracted with phenylephrine hydrochloride (Neosynesin, 5%). Contact lenses of appropriate power with 4-mm artificial pupils were placed over the corneas.

The area of recording was primarily determined by the coordinates of electrode penetrations, although histological confirmation of recorded areas was conducted for the majority of animals. There is a possibility that a small fraction of neurons, especially from long penetrations, may have been assigned to the wrong cortical area. However, we did not eliminate those neurons (for which we were not completely certain of the area) from our analyses because they still represent important and valid samples for purposes of this study and there were no obvious areal differences.

Experimental apparatus

Tungsten microelectrodes (5 MΩ, A-M Systems) were used to record spike activities extracellularly. To increase the chance of encountering cells, two electrodes were mounted in parallel in a protective single guide tube and driven by a common microelectrode drive (Narishige). After confirming under a microscope that the electrodes did not penetrate blood vessels on the cortical surface, agar in warm Ringer solution was applied to stabilize and protect the cortex. Then, melted wax was applied over the agar to form a sealed chamber. An oscilloscope and audio speakers were used to monitor raw signals from the microelectrodes. Electrical signals from the microelectrodes were amplified (10,000×) and band-pass filtered (300–5,000 Hz). Then spike sorting was achieved using a custom-built spike sorter (Ohzawa et al. 1996), where spikes were sorted by their waveforms and time stamped with 40-µs resolution.

Visual stimulation and recording procedures

Experiment control and generation of visual stimuli were performed using custom-built software. Visual stimuli were generated by a dedicated PC and displayed on a CRT display (Sony GDM-FW900, a resolution of 1,600 × 1,024 pixels, covering the display area of 46.6 × 29.9 cm, 34.3 dot/deg; refresh rate: 76 Hz). A custom-built mirror haploscope was used to present stimuli to left and right eyes separately (Fig. 2A). To preclude projection of the stimulus to the contralateral eye, a separator was placed between the left and right visual fields. Distance between the screen and the eyes was set to 57 cm, subtending a visual field of 23.3 (horizontal) × 29.9 (vertical) degrees for each eye. Because we were examining interocular differences in neuronal responses, we carefully set up the haploscope and adjusted distances to the screen to equate the viewing conditions for the two eyes as much as possible. The display surface of the CRT monitor was carefully set perpendicular to the lines of sight for the subject.
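The display figures quoted above can be checked with a few lines of arithmetic. This is a sketch under two assumptions that the text does not state outright: that the reported values rely on the rule of thumb that 1 cm subtends about 1° at a 57-cm viewing distance, and that each eye views one half of the screen width. The constant names are hypothetical.

```python
import math

SCREEN_WIDTH_CM = 46.6
SCREEN_HEIGHT_CM = 29.9
HORIZONTAL_PIXELS = 1600
VIEWING_DISTANCE_CM = 57.0

# Angle subtended by 1 cm at the screen center; very nearly 1 deg at 57 cm.
deg_per_cm = math.degrees(math.atan(1.0 / VIEWING_DISTANCE_CM))

# Pixel density; under the 1 cm ~ 1 deg rule this is also pixels per degree.
pixels_per_cm = HORIZONTAL_PIXELS / SCREEN_WIDTH_CM

# Assumed: the separator splits the screen into two equal half-fields.
field_width_cm = SCREEN_WIDTH_CM / 2.0

print(round(deg_per_cm, 3))
print(round(pixels_per_cm, 1))
print(round(field_width_cm, 1), SCREEN_HEIGHT_CM)
```

Under these assumptions the pixel density rounds to the quoted 34.3 dot/deg and the half-screen width matches the 23.3° horizontal field.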


FIG. 2. Experimental setup and binocular reverse correlation analysis are illustrated. A: one-dimensional sparse bar noise stimuli were presented to left and right eyes simultaneously by a mirror haploscope setup. B: all possible combinations of left and right eye stimulus position are included for each left–right permutation of contrast sign (dark–dark, bright–bright, dark–bright, bright–dark). Spike trains were cross-correlated with stimulus sequences, and results are displayed as binocular receptive field maps for the 4 permutations (only the map for bright–dark is depicted for clarity).

 
After isolation of spike waveforms from one or more cells, approximate receptive field locations, preferred orientations, and spatial frequencies were determined manually by a mouse-controlled search program. Then, a standard reverse correlation procedure (DeAngelis et al. 1993a,b; Jones and Palmer 1987; Jones et al. 1987) was performed to obtain the accurate position and size of the receptive fields of the two eyes. Subspace reverse correlation (Ringach et al. 1997) was then conducted for the dominant eye to obtain the preferred orientation and spatial frequency. Peaks of orientation and spatial frequency tuning correspond well to the peak values obtained by tests using drifting sinusoidal gratings (Nishimoto et al. 2005). After these preliminary tests, a binocular receptive field map was measured by a binocular reverse correlation procedure (Ohzawa et al. 1990, 1997). To compare the interocular spatial frequency difference with the tilt of the binocular receptive field, spatial frequency and orientation tests using drifting sinusoidal gratings were performed for both left and right eyes, using optimal values for the other stimulus parameters for each cell. Contrast and temporal frequency of the gratings, however, were generally set to 50% and 2 Hz, respectively.

Binocular receptive field mapping

The binocular reverse correlation procedure was equivalent to that used by Ohzawa et al. (1997). A pair of one-dimensional bar stimuli was simultaneously presented to left and right eyes by a mirror haploscope setup (Fig. 2A). Twenty stimulus locations of the bar were used to stimulate receptive fields for each eye. This defined a 20 × 20-point stimulus grid in the (XL, XR) domain (Fig. 2B). Therefore the binocular receptive field was measured by tallying up responses to 1,600 (20 × 20 × 4) different dichoptic pairs of stimuli. The orientation of the bar stimuli was set to the preferred orientation for each eye and for each cell. All possible combinations of left and right eye stimulus positions were included for each left–right permutation of contrast sign (dark–dark, bright–bright, dark–bright, bright–dark). All pairs of positions and combinations of stimulus contrast were presented in a random order, each stimulus lasting for 26 ms (two video frames) or 53 ms (four video frames) without any blank stimulus. The stimulus sequence was reshuffled for each set. A complete stimulus sequence lasted 42 s. Typically 20–40 sequences were used, which took 20–40 min in all. The response map for each contrast subset was calculated by cross-correlating spike trains with stimulus sequences (Fig. 2B). The binocular receptive field is the sum of response maps for matched polarity (bright–bright and dark–dark) conditions minus those for mismatched polarity (dark–bright and bright–dark) conditions. Monocular responses are cancelled by this computation and do not appear in the binocular RF (Ohzawa et al. 1997). We calculated the binocular RF for correlation delays from –100 to +300 ms in 5-ms steps. Because there is no correlation between spike train and stimulus sequence for negative time delays, we defined the response at negative time delays as noise.
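The bookkeeping and the polarity arithmetic described above can be sketched as follows. The response function here is a hypothetical stand-in (a squared sum of signed left- and right-eye drives), not the spike-train cross-correlation used in the experiments; it is included only to show why the matched-minus-mismatched combination removes monocular terms. All names are illustrative.

```python
N_POSITIONS = 20      # bar positions per eye
N_POLARITY_PAIRS = 4  # bright-bright, dark-dark, dark-bright, bright-dark
REFRESH_HZ = 76.0

n_stimuli = N_POSITIONS * N_POSITIONS * N_POLARITY_PAIRS
# Duration of one full sequence when each stimulus lasts two video frames.
sequence_seconds = n_stimuli * 2 / REFRESH_HZ

def toy_response(sign_left, drive_left, sign_right, drive_right):
    """Hypothetical subunit: squared sum of the signed inputs from the two eyes."""
    return (sign_left * drive_left + sign_right * drive_right) ** 2

def binocular_rf_value(drive_left, drive_right):
    """Matched-polarity responses minus mismatched-polarity responses."""
    matched = (toy_response(+1, drive_left, +1, drive_right)
               + toy_response(-1, drive_left, -1, drive_right))
    mismatched = (toy_response(+1, drive_left, -1, drive_right)
                  + toy_response(-1, drive_left, +1, drive_right))
    return matched - mismatched

print(n_stimuli, round(sequence_seconds, 1))
print(binocular_rf_value(0.5, 0.0))   # one eye drives the cell: nothing survives
print(binocular_rf_value(0.5, 0.25))  # both eyes drive the cell: interaction remains
```

For this toy subunit the combination equals eight times the product of the two monocular drives, so any location where only one eye contributes yields zero.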
To obtain the optimal correlation delay, the sum of squared values of all data points in the RF was computed at each correlation delay over the range of delays, and the delay giving the peak was determined. The binocular receptive field was constructed at this optimal correlation delay. To evaluate the signal-to-noise ratio, we calculated the SD of the response at the optimal correlation delay divided by the average SD for negative correlation delays (–100 to –5 ms in 5-ms steps). We rejected data when the total number of spikes was <1,000 impulses and the peak response at the optimal delay did not exceed the mean of the response at negative correlation delays +10SD.
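The delay selection and the signal-to-noise measure can be sketched as below, assuming the binocular RF maps are stored in a dictionary keyed by correlation delay in milliseconds. The data layout, the function names, and the use of the population SD are assumptions; the text does not specify which SD estimator was used.

```python
import statistics

def flatten(rf_map):
    """Return all data points of a 2-D map as a flat list."""
    return [value for row in rf_map for value in row]

def signal_energy(rf_map):
    """Sum of squared values over the whole map."""
    return sum(value * value for value in flatten(rf_map))

def optimal_delay(rf_by_delay):
    """Delay whose map has the largest sum of squares."""
    return max(rf_by_delay, key=lambda delay: signal_energy(rf_by_delay[delay]))

def signal_to_noise(rf_by_delay):
    """SD at the optimal delay divided by the mean SD over negative delays."""
    best = optimal_delay(rf_by_delay)
    noise_sds = [statistics.pstdev(flatten(rf_map))
                 for delay, rf_map in rf_by_delay.items() if delay < 0]
    return statistics.pstdev(flatten(rf_by_delay[best])) / statistics.mean(noise_sds)

# Tiny made-up maps, only to exercise the functions.
toy_maps = {
    -10: [[0.1, -0.1], [0.1, -0.1]],
    -5: [[-0.1, 0.1], [0.1, -0.1]],
    0: [[0.2, -0.2], [0.2, -0.2]],
    50: [[1.0, -1.0], [1.0, -1.0]],
}
print(optimal_delay(toy_maps), round(signal_to_noise(toy_maps), 2))
```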

Spatial frequency-tuning test

Left and right spatial frequency tunings were obtained by using drifting sinusoidal gratings in a separate test. Orientations of grating stimuli were fixed at the optimal value for each eye because preferred orientations were typically different by 5–15° for the two eyes, probably arising from cyclorotation of the eye after paralysis (Nelson et al. 1977). The gratings were presented in a random order and each presentation lasted for 4 s, interspersed with 1-s interstimulus intervals. Mean firing rates were calculated at each spatial frequency. One-dimensional Gaussian functions were fitted to each spatial frequency tuning. Preferred spatial frequencies were obtained from the peak position of the fitted Gaussian function.


    RESULTS
 
Binocular tests were conducted for a total of 271 neurons that were recorded from both areas 17 and 18 of 44 cats. Of these, binocular RFs could be obtained with sufficient signal-to-noise ratio (see METHODS) for 177 neurons. These neurons are further classified into two groups. Sixty-four neurons are classified as separable type and 113 neurons are classified as inseparable.

Binocular RFs for representative examples from the separable and inseparable types are illustrated in Fig. 3. Figure 3A depicts a binocular RF for a simple cell recorded in area 18. The binocular RF appears to be described reasonably well by a product of left and right monocular receptive field profiles for simple cells, as reported by Anzai et al. (1999a). Correlation between the standard simple/complex RF types and separability of binocular RF is high, but these classifications are not identical. This issue, including the basis for our choice of using the separability, will be described later (see following text). An exemplar complex cell recorded in area 18 is illustrated in Fig. 3C. The binocular RF showed a horizontally elongated structure like that in previous studies (Anzai et al. 1999b; Ohzawa et al. 1990, 1997). Complex cells tend to exhibit binocular RFs that are not left–right separable. Such inseparable receptive fields are well described by a disparity energy model where the sum of output of quadrature pairs of separable RFs constructs an inseparable RF (Anzai et al. 1999a,b; Ohzawa et al. 1990, 1997). On closer examination of this binocular RF, we noticed a small amount of tilt in the binocular RF from the frontoparallel axis in the clockwise direction. We wished to determine whether these small tilts are reliable properties of the neurons or arise from experimental noise or variability. Note that a small degree of tilt in the (XL, XR) domain translates into a substantially larger surface slant in real object space in front of the animal. This is because, under realistic viewing conditions, the lines of sight from the two eyes to a fixation point cross at a much more acute angle than the 90° angle for the (XL, XR) domain. For example, given a viewing distance of 57 cm and interpupillary distance of 3 cm, a tilt of 5° in the (XL, XR) domain is equal to the surface slant that is 73.3° from the frontoparallel plane (see APPENDIX). Therefore even a small visible tilt in the (XL, XR) domain may have a large perceptual significance.
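The worked example above can be reproduced with a short sketch. The exact derivation is in the APPENDIX, which is not shown here, so the relation used below is an assumption: the tilt θ is converted to a disparity gradient of 2·tan θ (consistent with the 10° and 0.35 example given later in the text), and the slant follows from the small-angle geometry tan(slant) = gradient × viewing distance / interpupillary distance. The function name and defaults are hypothetical.

```python
import math

def slant_from_tilt_deg(tilt_deg, viewing_distance_cm=57.0, interpupillary_cm=3.0):
    """Approximate surface slant (deg from frontoparallel) for a given RF tilt."""
    # Assumed conversion from RF tilt to disparity gradient.
    disparity_gradient = 2.0 * math.tan(math.radians(tilt_deg))
    # Assumed small-angle geometry linking disparity gradient and slant.
    tan_slant = disparity_gradient * viewing_distance_cm / interpupillary_cm
    return math.degrees(math.atan(tan_slant))

print(round(slant_from_tilt_deg(5.0), 1))
```

Under these assumptions a 5° tilt gives the 73.3° slant quoted in the text, and a zero tilt gives a frontoparallel surface.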


FIG. 3. Binocular receptive fields and their Fourier spectra are shown for simple (A, B) and complex (C, D) cells, respectively. Binocular RF of simple cells tended to be separable in the (XL, XR) domain with 4 peaks in the spectrum, whereas those for complex cells tended to be inseparable and with 2 peaks in the frequency domain. There is a small but apparent tilt (θ) of the binocular RF in C.

 
To estimate the tilt of binocular RFs quantitatively, we analyzed binocular RFs in the frequency domain. Frequency analysis is highly effective for evaluating the orientation of binocular RF without regard to specific local features of the RF, phase, or position. It also uses the entire set of RF data. Representative Fourier spectra of binocular RFs are shown in the right column. Figure 3, B and D show Fourier spectra of the binocular RFs shown in Fig. 3, A and C, respectively. The axes (along oblique edges) of the domain are now left and right frequencies. The spectrum for the separable binocular RF (Fig. 3B) has four peaks, whereas that for the inseparable neuron (Fig. 3D) shows a pair of strong peaks.

Alternatively, the same frequency domain may be referenced by a pair of orthogonal axes, along the vertical and horizontal directions corresponding to the diagonals of the square domain (Fig. 3B). These dimensions are defined as the disparity frequency and the frontoparallel frequency for vertical and horizontal axes, respectively (see APPENDIX). Interestingly, the four quadrants of the domain may be assigned to either disparity frequency tuning or frontoparallel frequency tuning. Top and bottom quadrants represent tuning for disparity, as indicated by two spectral peaks in Fig. 3D. The locations of the peaks in these domains allow extraction of such parameters as the optimal disparity frequency and binocular RF tilt. Left and right quadrants, on the other hand, will have substantial peaks only for separable neurons, and represent spatial frequency tuning of combined input from the two eyes. Therefore the peaks in these quadrants define the optimal frontoparallel frequency.

The process of determining binocular RF parameters in the frequency domain is illustrated further in Fig. 4. Based on the observation that substantial peaks are present in the left and right quadrants only for separable RFs, we define an index of separability of the receptive field in the (XL, XR) domain, the binocular separability index (BSI), as follows

$$\mathrm{BSI} = \frac{R_F}{R_D} \qquad (1)$$
where $R_D$ is the peak response amplitude in the bottom quadrant and $R_F$ is the response in the left quadrant along the cross section parallel to the left frequency axis going through the peak in the bottom quadrant, taken at the same right frequency (inset of Fig. 4A). The left rather than the right quadrant is selected arbitrarily because the profiles in the left and right quadrants are symmetrical about the origin. The value of BSI ranges from 0 to 1. Based on the disparity energy model, simple cells will exhibit high BSI and complex cells will show BSI close to 0. Therefore neurons with BSI >0.73 are defined as the separable type, and otherwise, the inseparable type. The cutoff criterion for the BSI (0.73) gave the most consistent agreement with our visual inspection; neurons that have BSI values >0.73 have visually separable profiles for binocular RF and vice versa.
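The classification rule can be written as a small sketch. It assumes that the BSI is the ratio of the left-quadrant amplitude to the bottom-quadrant peak amplitude, as the legend of Fig. 4 describes ("a ratio of 2 spectral peak amplitudes"); the function and argument names are hypothetical.

```python
BSI_CUTOFF = 0.73  # cutoff stated in the text

def binocular_separability_index(left_quadrant_amp, bottom_quadrant_peak):
    """Ratio of the two spectral amplitudes (assumed form of the index)."""
    if bottom_quadrant_peak <= 0:
        raise ValueError("bottom-quadrant peak amplitude must be positive")
    return left_quadrant_amp / bottom_quadrant_peak

def classify_by_bsi(bsi):
    """Separable if the index exceeds the cutoff, inseparable otherwise."""
    return "separable" if bsi > BSI_CUTOFF else "inseparable"

# The two BSI values quoted in the legend of Fig. 4.
for bsi in (0.92, 0.37):
    print(bsi, classify_by_bsi(bsi))
```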


FIG. 4. Procedures are shown for computing binocular separability index (BSI) and binocular RF tilt angle (θ) from the spectra (see text). A: BSI is determined by a ratio of 2 spectral peak amplitudes, R_F and R_D, taken from a cross-sectional profile through the highest peak of the spectrum. When BSI is >0.73, neurons are classified as separable, and inseparable otherwise. BSI and θ for this cell (same as that for Fig. 3, A and B) are 0.92 and 3°, respectively. B: BSI for this complex cell (same as that for Fig. 3, C and D) is low (BSI = 0.37), indicating clear inseparability. C: tilt angle (θ) of the binocular RF is calculated from the angular position of the peak in the frequency domain, as the arctangent of the ratio of the peak frequencies for left and right eyes. Tilt angle was –4° for this cell. Same procedure is used for both separable and inseparable types. Cross sections going through the spectral peak that are parallel to the left and right frequency axes depict monocular spatial frequency-tuning curves, as estimated from the binocular RF data. Monocular spatial frequency tunings for left and right eyes are drawn as solid and dashed curves, respectively. Line that goes through the spectral peak and the origin is defined as the cardinal disparity axis for the neuron.

 
Using the same spectral profile of the binocular RF as above, the "tilt" of the binocular RF (θ) may be defined as the angular deviation of the spectral peak from the disparity frequency axis, which connects the top and bottom corners of Fig. 4C. If the spectral peak is exactly on the disparity frequency axis, the original binocular RF has zero tilt. A nonzero θ indicates a corresponding tilt of the binocular RF.

To estimate peak frequencies with greater accuracy, we interpolated the Fourier spectrum by cubic spline before evaluating the binocular RF tilt. A spatial frequency step of 0.005 cycles/deg was used as the resolution of interpolation for all neurons. To determine this step size, we calculated by simulation the percentage errors in binocular RF tilt for various interpolation resolutions. Fourier transforms were performed on simulated binocular RF data obtained from model binocular complex cells with various interocular spatial frequency ratios (fL/fR = 0.66 to 1.5) and various disparity frequencies (fdisparity = 0.07 to 0.5; see APPENDIX). The data array was set to the same size as that in our experiments (20 × 20 grid). Then, interpolations were tested for various final resolutions (0.005 to 0.1 cycles/deg). On average, a sufficiently small error level (0.96 ± 0.03% error) for binocular RF tilt was obtained with the interpolation resolution of 0.005 cycles/deg. The percentage error increased to 13.43 ± 0.17% at the 0.1 cycles/deg resolution.

Because there is always a spectral peak in the bottom quadrant regardless of binocular RF separability, the calculations outlined above are applicable to both separable and inseparable types of neurons. Note that cross sections going through the spectral peak that are parallel to the left and right frequency axes depict monocular spatial frequency-tuning curves, as estimated from the binocular RF data. These tuning curves are illustrated at the bottom left and right insets of Fig. 4C. The "tilt" angle of the binocular RF (θ) may be determined from the peak coordinate of the binocular RF $(f_{0L}, f_{0R})$, as follows

$$\theta = \arctan\!\left(\frac{f_{0L}}{f_{0R}}\right) - 45^\circ \qquad (2)$$
The line that goes through the spectral peak and the origin is defined as the cardinal disparity axis for the neuron.

Binocular RF tilt θ is then transformed into a disparity gradient, which is more commonly used to quantify slants of oriented surfaces in 3D. Disparity gradient represents surface slant independent of viewing distance. It is usually defined as

$$\text{disparity gradient} = \frac{d_A - d_B}{\gamma} \qquad (3)$$
where $d_A$ and $d_B$ are binocular disparities for two observed objects and $\gamma$ is the angular separation between the directions for the two objects as viewed from the cyclopean eye, i.e., the midpoint between the two eyes (Burt and Julesz 1980). Therefore a slant in actual space can be represented as a disparity gradient, which may take on a value between –2.0 and 2.0. Disparity gradients at these limiting values indicate the cases where two objects lie on a common line of sight for one eye. It was reported that the absolute value of disparity gradient for two dots must be <1–2 for binocular fusion depending on exact dot parameters (Burt and Julesz 1980; Prazdny 1985; Trivedi and Lloyd 1985). For this reason, we would expect most neurons to be encoding disparity gradient within these limits, if neural encoding of surface slants is constructed in an efficient manner. Note that the disparity gradient in Eq. 3 defines a property of the stimulus configuration. What we wish to estimate here instead is a property of a binocular neuron, i.e., its preferred disparity gradient given the cell’s binocular RF profile. This may be obtained from the binocular RF tilt θ, as described in the following equation

$$\text{preferred disparity gradient} = 2\tan\theta \qquad (4)$$
To intuitively grasp the relationship between these metrics of surface slants, consider the following realistic example. When the binocular RF tilt θ is 10°, the disparity gradient is 0.35, which corresponds to about 80° of physical surface slant at 57 cm of viewing distance. Using the disparity gradient as defined above, we will quantify and summarize RF tilt for all neurons below.
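The chain from spectral peak to preferred disparity gradient can be sketched in a few lines. Two relations are assumed here rather than taken verbatim from the text: the tilt is the arctangent of the left/right peak-frequency ratio measured from the 45° disparity frequency axis (as the legend of Fig. 4 describes, with the sign convention chosen arbitrarily), and the disparity gradient is 2·tan θ, which reproduces the 10° and 0.35 example and maps the ±2 limits onto tilts of ±45°. Function names are hypothetical.

```python
import math

def rf_tilt_deg(f0_left, f0_right):
    """Tilt of the binocular RF from its spectral peak (assumed sign convention)."""
    return math.degrees(math.atan2(f0_left, f0_right)) - 45.0

def preferred_disparity_gradient(tilt_deg):
    """Disparity gradient corresponding to an RF tilt (assumed relation)."""
    return 2.0 * math.tan(math.radians(tilt_deg))

print(round(preferred_disparity_gradient(10.0), 2))
print(round(preferred_disparity_gradient(45.0), 2))
print(round(preferred_disparity_gradient(rf_tilt_deg(0.3, 0.3)), 2))
```

Equal peak frequencies for the two eyes give zero tilt and therefore a zero preferred disparity gradient, i.e., a frontoparallel preference.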

As illustrated in Figs. 3 and 4, simple and complex cells tended to show different binocular RF profiles, binocularly separable and inseparable, respectively. However, simple/complex and separable/inseparable classifications are not the same. There are simple cells that are classified as inseparable, and vice versa. For the reasons outlined below, we will use the separable/inseparable type classification throughout the paper. However, before we set out to perform all the analyses based on this classification, we should examine the correlation between the two classification methods.

Note that an ideal complex cell based exactly on the disparity energy model will have a BSI of exactly 0 (Anzai et al. 1999b; Ohzawa et al. 1990, 1997). On the other hand, ideal binocular simple cells that linearly sum left and right eye input will have a BSI of 1 (Anzai et al. 1999a; Ohzawa et al. 1990, 1996). The actual population of neurons we have recorded exhibited substantial deviations from the ideal cases as illustrated in Fig. 5.


FIG. 5. Comparison of conventional simple/complex-type classification and classification by separability of binocular RF is illustrated. A: correlation is shown between left and right F1/F0 ratios. This ratio is used in conventional classification of simple (F1/F0 > 1) and complex cells based on the degree of response modulation to drifting sinusoidal gratings. Ratios for left and right eyes showed significant correlation (Pearson’s r = 0.87, P < 0.01). However, for some neurons, classified types were mismatched between the eyes. B: scatterplot is shown of F1/F0 ratio vs. binocular separability index (BSI). These 2 parameters showed a significant correlation (Pearson’s r = 0.76 and 0.78 for left and right eyes, respectively, P < 0.001, n = 135). F1/F0 ratios were obtained from responses to drifting sinusoidal gratings of optimal spatial frequency for each eye. Open and filled symbols depict data from left and right eyes, respectively. Therefore each cell has 2 symbols for F1/F0 ratios, connected by a line segment for indicating paired data. C: F1/F0 ratios show a bimodal distribution. Filled and open bars indicate right and left eyes, respectively. D: distribution of BSI is shown. Majority of neurons have inseparable binocular RF (as indicated by BSI <0.73), many of which are classified as simple based on the F1/F0 ratio.

 
First, for the simple/complex classification, we use the standard criteria based on the F1/F0 ratio, the ratio of the amplitude modulation (AM) in response to an optimal drifting sinusoidal grating stimulus to the average firing rate for the same response (Li et al. 2003; Skottun et al. 1991). Relationships between left and right F1/F0 ratios are plotted in Fig. 5A. The ratios were evaluated at the optimal spatial frequency for each eye. Circle and triangle symbols indicate data recorded from areas 17 and 18, respectively. The correlation of F1/F0 ratios for the left and right eyes is highly significant (for area 17, r = 0.9, P < 0.001, N = 66; for area 18, r = 0.83, P < 0.001, N = 69). However, there are several neurons with a large mismatch in the F1/F0 ratios between the eyes. That is, some neurons had highly modulated responses to sinusoidal drifting gratings for one eye, but practically no modulation was observed for the opposite eye. The relationship between F1/F0 ratio and BSI is illustrated in Fig. 5B. Open and filled symbols depict data for the left and right eyes, respectively. Each cell has two symbols for F1/F0 ratios (for the two eyes), connected by a line segment for indicating paired data. Although these two parameters show significant correlations (r = 0.76 and 0.78 for left and right, respectively; P < 0.001, n = 135), there are many cases where the predictions of ideal model cases break down. For example, neurons with BSI values close to zero had a wide variety of F1/F0 ratios, indicating that binocularly inseparable RFs may be observed commonly in both simple and complex cells. Figure 5, C and D show the distributions for left and right F1/F0 ratios and the distribution of BSI, respectively. Filled and open bars in Fig. 5C indicate data for the right and left eyes, respectively. The F1/F0 ratios show a bimodal distribution as reported previously (Li et al. 2003; Mechler and Ringach 2002).
Note also that BSI is derived directly from the data from a key binocular measurement in this study, whereas F1/F0 ratios are obtained from monocular tests and therefore are expected to be less directly linked to binocular properties. There have also been questions about a multitude of factors that influence F1/F0 ratios (Mata and Ringach 2005). Considering further that the use of the classical criteria in simple/complex classification can sometimes result in discrepant types between the eyes, the use of binocular separability of the RF offers a better classification method overall for the purposes of this study.
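For readers unfamiliar with the F1/F0 measure, the following is a minimal sketch of how such a ratio may be computed from a response histogram folded over one temporal cycle of the drifting grating. The input format and function name are assumptions made for illustration; details of the original analysis, such as any subtraction of spontaneous activity, are not represented here.

```python
import cmath
import math

def f1_f0_ratio(cycle_histogram):
    """Ratio of first-harmonic amplitude (F1) to mean firing rate (F0).

    cycle_histogram: firing rates in equal-width bins spanning exactly one
    temporal cycle of the grating (an assumed input format).
    """
    n = len(cycle_histogram)
    f0 = sum(cycle_histogram) / n
    if f0 <= 0:
        raise ValueError("mean firing rate must be positive")
    # Discrete Fourier coefficient at the stimulus temporal frequency
    c1 = sum(r * cmath.exp(-2j * math.pi * k / n)
             for k, r in enumerate(cycle_histogram)) / n
    f1 = 2.0 * abs(c1)  # peak amplitude of the fundamental component
    return f1 / f0

n_bins = 64
# Half-wave rectified sinusoid: strongly modulated, simple-like response
simple_like = [max(0.0, 40.0 * math.sin(2 * math.pi * k / n_bins))
               for k in range(n_bins)]
# Elevated mean rate with weak modulation: complex-like response
complex_like = [10.0 + 2.0 * math.sin(2 * math.pi * k / n_bins)
                for k in range(n_bins)]

for name, hist in (("simple-like", simple_like), ("complex-like", complex_like)):
    ratio = f1_f0_ratio(hist)
    print(name, round(ratio, 2), "simple" if ratio > 1.0 else "complex")
```

The threshold of 1 follows the conventional criterion noted in the legend of Fig. 5.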

Recall that one of the purposes of this study is to examine whether the apparent tilt of binocular RF profile is based on the difference in the optimal spatial frequencies across the eyes (Fig. 1). The question is addressed in the next several figures based on results of binocular RF and spatial frequency-tuning measurements from both binocularly separable and inseparable neurons. Data from representative examples of binocularly separable neurons are illustrated in Fig. 6. Binocular RF profiles are shown in the left column. In the middle column, monocular Fourier spectra derived from the binocular RF are shown as solid and dashed curves for the left and right eyes, respectively. These are cross sections through the peak of the Fourier spectrum as illustrated in Fig. 4C, taken parallel to the left and right frequency axes. Actual spatial frequency-tuning curves obtained by drifting sinusoidal grating stimuli are illustrated in the right column. The predicted tuning curves in the middle column and those in the right column should be comparable directly under certain linearity assumptions (DeAngelis et al. 1993a,b). Open and filled symbols depict responses for the left and right eyes, respectively. Error bars represent the SE. A horizontal dashed line indicates the spontaneous firing rate. A Gaussian function of the following form is fitted to each tuning curve

Formula 5(5)
Only those cells that had significantly modulated responses as a function of spatial frequency (ANOVA, P < 0.05) are included in further analyses of spatial frequency tunings. From these fits, preferred spatial frequencies were obtained from the peak of Gaussian function (f0). We used two criteria for selection of spatial frequency-tuning curves that have gone into the summary. First, the goodness of fit is >60%. Second, we selected only those responses that showed a band-pass tuning for the two eyes. Cells exhibiting low-pass spatial frequency tuning are excluded because it is difficult to determine the peak spatial frequency accurately for these neurons. For a spatial frequency tuning to be considered as band-pass, there must be at least two data points below f0, the peak of the fitted Gaussian function.
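The two selection criteria can be sketched as follows. The exact form of Eq. 5 is not recoverable here, so the code assumes a Gaussian on a linear frequency axis with an additive baseline, and it assumes that goodness of fit means the percentage of variance explained; both are illustrative assumptions, as are all function and parameter names.

```python
import math

def gaussian(f, amplitude, f0, sigma, baseline=0.0):
    """Assumed tuning function: Gaussian centered on f0 plus a baseline."""
    return baseline + amplitude * math.exp(-((f - f0) ** 2) / (2.0 * sigma ** 2))

def goodness_of_fit(freqs, rates, params):
    """Percent variance explained by the fitted curve (assumed definition)."""
    mean_rate = sum(rates) / len(rates)
    ss_total = sum((r - mean_rate) ** 2 for r in rates)
    ss_resid = sum((r - gaussian(f, *params)) ** 2 for f, r in zip(freqs, rates))
    return 100.0 * (1.0 - ss_resid / ss_total)

def is_band_pass(freqs, f0):
    """Band-pass criterion: at least two tested frequencies lie below f0."""
    return sum(1 for f in freqs if f < f0) >= 2

def accept_tuning_curve(freqs, rates, params, min_fit=60.0):
    """Apply both criteria; params = (amplitude, f0, sigma, baseline)."""
    f0 = params[1]
    return goodness_of_fit(freqs, rates, params) > min_fit and is_band_pass(freqs, f0)

# Toy data: tested spatial frequencies (cycles/deg) and noisy responses
freqs = [0.1, 0.2, 0.4, 0.8, 1.6]
fit_params = (20.0, 0.4, 0.2, 2.0)
rates = [gaussian(f, *fit_params) + e
         for f, e in zip(freqs, [0.5, -0.4, 0.3, -0.2, 0.1])]

print(accept_tuning_curve(freqs, rates, fit_params))
print(is_band_pass(freqs, 0.15))  # peak near the low end: treated as low-pass
```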


FIG. 6. Examples are shown of 3 separable type neurons that have different frequency tunings across the eyes. A, left: separable binocular RF of a simple cell is depicted. A, middle: left and right Fourier spectra of the binocular RF are shown as solid and dashed curves, respectively. Although the tilt of binocular RF is small (binocular RF tilt = –7.0°), it is statistically significant (P < 0.05, bootstrap test). Predicted disparity gradient is –0.25. A, right: spatial frequency-tuning curves obtained by drifting sinusoidal grating stimuli are illustrated. Open and filled symbols depict responses for the left and right eyes, respectively. Error bars depict SEs. Horizontal dashed line indicates the spontaneous firing rate. Gaussian functions were fitted to the tuning curves. Vertical thin lines indicate the peaks of fitted Gaussian function for left and right eyes, respectively. Preferred spatial frequencies are significantly different between the eyes (bootstrap test, P < 0.05). Spatial frequency ratio is 0.71. B: data from another separable binocular RF is shown for a simple cell in the same format as A. Binocular RF is significantly tilted from frontoparallel plane (binocular RF tilt = –6.3°; P < 0.05, bootstrap test). Predicted disparity gradient is –0.22. Spatial frequency ratio is 0.69. C: additional example of separable binocular RFs is shown (binocular RF tilt = –8.5°). Predicted disparity gradient and spatial frequency ratio are –0.3 and 0.7, respectively.

 
For the cell presented in Fig. 6A, tilt of the binocular RF (as measured by the displacement of spectral peaks illustrated in Fig. 4C) is statistically significant (tilt = –7°, P < 0.05, bootstrap test). The bootstrap test for estimating significance of tilt is conducted as follows. Binocular RF mapping consists of trials, each of which contains a randomized sequence of complete permutations of left and right stimuli (Ohzawa et al. 1990, 1997). From spike data for a total of N (typically 40) trials, N trials are randomly drawn while allowing duplications, from which a new binocular RF is constructed. For each neuron, this process was repeated 1,000 times to obtain the estimates of variability in the RF measurements (Efron 1982; Efron and Tibshirani 1993). When the mean tilt of the distribution of resampled binocular RFs deviated from zero by more than 1.96 SD, the RF tilt was judged to be significant. With this criterion, the probability of RF tilt being on the opposite side of zero is <5%.
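The resampling logic of this test may be sketched as below. In the actual procedure each resample yields a reconstructed binocular RF from which the tilt is measured; here that step is represented by a caller-supplied function, `estimate_tilt`, which is a hypothetical placeholder. The toy data simply assign one tilt-like value to each trial so that the example runs on its own.

```python
import random
import statistics

def bootstrap_tilt_test(trials, estimate_tilt, n_resamples=1000, z=1.96, seed=0):
    """Bootstrap significance test for RF tilt.

    trials: list of per-trial data (any objects).
    estimate_tilt: hypothetical function mapping a list of trials to a tilt
        estimate (in the study, via reconstruction of the binocular RF).
    Returns (mean tilt, SD of tilt, significance flag).
    """
    rng = random.Random(seed)
    n = len(trials)
    tilts = []
    for _ in range(n_resamples):
        # Draw N trials with replacement, so duplicates are allowed
        resample = [trials[rng.randrange(n)] for _ in range(n)]
        tilts.append(estimate_tilt(resample))
    mean_tilt = statistics.fmean(tilts)
    sd_tilt = statistics.stdev(tilts)
    # Significant when the mean deviates from zero by more than z SDs
    return mean_tilt, sd_tilt, abs(mean_tilt) > z * sd_tilt

# Toy stand-in data: one tilt-like value (deg) per trial, N = 40 trials
spread = [((i * 37) % 21) - 10 for i in range(40)]
tilted_trials = [-7.0 + s for s in spread]
untilted_trials = [float(s) for s in spread]

print(bootstrap_tilt_test(tilted_trials, statistics.fmean))
print(bootstrap_tilt_test(untilted_trials, statistics.fmean))
```

A fixed seed is used only to make the illustration repeatable.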

The binocular RF tilt determined as above is automatically reflected as a difference in the predicted spatial frequency-tuning curves shown in Fig. 6A (middle). The predicted disparity gradient for this cell is –0.25, as calculated by Eq. 4. A similar statistically significant difference in the optimal spatial frequencies for the two eyes is also observed for the actual tuning curves measured by drifting sinusoidal gratings (Fig. 6A, right; P < 0.05, bootstrap test) in that the optimal spatial frequency for the right eye (vertical dashed line) is higher than that for the left eye (vertical solid line). Therefore for this neuron, there is a good correspondence between the tilt of the binocular RF (measured by reverse correlation) and the interocular difference between the optimal spatial frequencies (measured by drifting gratings).

Similar additional data from two separable binocular RFs are shown in Fig. 6, B and C in the same format as that of Fig. 6A. For these two cells (both of which were simple), tilt angles θ of binocular RFs were significantly different from zero (P < 0.05, bootstrap test). The tilt angles of binocular RF for Fig. 6, B and C are –6.3 and –8.5°, which correspond to predicted preferred disparity gradients of –0.22 and –0.3, respectively. Again, for these additional cells, the actual spatial frequency-tuning curves measured by drifting gratings (right column) also show statistically significant difference between the eyes (P < 0.05, bootstrap test). The ratios of optimal spatial frequencies (left/right) are 0.71, 0.69, and 0.70 for cells in Fig. 6, A–C, respectively. Again, the direction of the difference in predicted spatial frequency-tuning profiles (middle column) corresponds well to that for the measured data (right column) for each neuron. Therefore these results for binocularly separable neurons indicate that the tilt angles of their binocular RFs and their predicted disparity gradients correspond well with the left–right differences of optimal spatial frequencies measured by monocularly presented drifting gratings.

Data from representative examples of inseparable neurons are illustrated in Fig. 7 in the same format as that of Fig. 6. Spatial frequency-tuning curves are not available for B and D either because spikes for one of the cells appeared after the initial tuning tests were already completed or, for the case of D, data for the frequency-tuning test did not show significantly modulated responses as a function of spatial frequency (P > 0.05, ANOVA). Binocular RFs shown in Fig. 7, pairs A and B, C and D are from neurons that were recorded simultaneously. The neuron shown in A is an example for which the binocular RF was significantly tilted from the frontoparallel plane (P < 0.05, bootstrap test). In fact, all of the examples except for that in Fig. 7C had statistically significant tilt for their binocular RFs. Note that the neuron illustrated in B had a statistically significant tilt in the opposite direction from the other member of the pair shown in A. The opposite tilt directions for the pairs of neurons clearly indicate that the tilts of binocular RFs do not arise from optical factors such as errors in the eye-display distances or magnification differences between the eyes. Because there are significant differences in the degree of tilt among simultaneously recorded neurons, these variations must be neural in origin.


FIG. 7. Examples are shown of 6 inseparable type neurons. A and B and C and D are pairs of neurons recorded simultaneously, and the data represent responses to the same stimuli. Binocular receptive fields are shown in the left column and their Fourier spectra are shown in the middle column. Dotted and white lines in the left panels indicate tilt angle of binocular RF and frontoparallel line, respectively. Right column depicts monocular spatial frequency-tuning curves obtained by drifting sinusoidal grating stimuli. Format of the figure is otherwise identical to that of Fig. 6. A and B: binocular RFs of both neurons are tilted significantly from frontoparallel plane (P < 0.05, bootstrap test) but their tilts are in the opposite directions (tilt angles are A: –4.0°, B: 7.7°). Spatial frequency ratio for A is 0.62. C: binocular RF of this neuron is not significantly tilted. D: binocular RF of this neuron is tilted significantly from frontoparallel plane (10.2°; P < 0.05, bootstrap test). Spatial frequency-tuning data for this cell are not available. E and F: 2 additional examples are shown of inseparable type neurons. Both of these neurons exhibited significant tilt from the frontoparallel plane (tilt angles are E: –3.8°, F: 6.8°; P < 0.05, bootstrap test). Preferred spatial frequencies are different for the 2 eyes for both neurons. Spatial frequency ratios are 0.76 and 1.28 for E and F, respectively. All of the neurons in this figure were classified as complex, except for B and D for which spatial frequency tunings are not available.

 
Another pair of simultaneously recorded neurons also exhibited a clear difference in the tilts of binocular RFs. Although the neuron depicted in Fig. 7C did not have a significant RF tilt, the other member of the pair had its RF tilted significantly from the frontoparallel plane. Tilt angles for D, E, and F are 10.2, –3.8, and 6.8°, which correspond to 0.36, –0.13, and 0.24 as disparity gradients, respectively.

As with binocularly separable neurons presented in Fig. 6, independent measurements of spatial frequency-tuning curves are also conducted using drifting sinusoidal gratings. Preferred spatial frequencies, shown as vertical solid and dashed thin lines in the right column, differ significantly across the eyes for Fig. 7, A, E, and F (P < 0.05, bootstrap test), but not for Fig. 7C (P > 0.05, bootstrap test). This is consistent with the lack of significant tilt of binocular RF for this neuron. Therefore for all cases shown in Fig. 7, A, C, E, and F, directions of interocular spatial frequency difference correspond well to the frequency difference of binocular RFs.

Paired recordings are also possible between neurons of different binocular separability. Such an example is shown in Fig. 8. Binocular RFs shown in Fig. 8, A and B are separable and inseparable, respectively. For both neurons, binocular RFs exhibit significant tilts from the frontoparallel plane (P < 0.05, bootstrap test). Furthermore, the tilts are in opposite directions between the two neurons. The tilt angles for cells in Fig. 8, A and B are 3.2 and –3.7°, with the corresponding preferred disparity gradients of 0.11 and –0.13, respectively. Actual spatial frequency-tuning curves measured with drifting gratings are shown in the right column. As expected from the binocular RF tilts, the interocular differences in the preferred spatial frequencies are opposite for the two neurons. The left preferred spatial frequency is significantly higher than the right frequency (frequency ratio = 1.23, P < 0.05, bootstrap test) for A; the difference is significant and opposite (frequency ratio = 0.87, P < 0.05, bootstrap test) for B. Taken together with the results from the previous figure, both separable and inseparable binocular RFs show a variety of tilts that are consistent with the interocular difference in the monocularly measured preferred spatial frequencies. Therefore the notion of the basis of 3D surface tilt representation, as illustrated in Fig. 1, appears quite likely based on these examples.


FIG. 8. Binocular RFs are illustrated for a pair of neurons. Format of figure is equivalent to Figs. 6 and 7. A: a separable neuron is shown for which the left peak spatial frequency is significantly higher than that for the right (P < 0.05, bootstrap test). Spatial frequency ratio is 1.23. B: an inseparable neuron is depicted similarly. Left and right spatial frequencies are significantly different, and the binocular RF tilted significantly from frontoparallel (tilt angle is –3.7°; P < 0.05, bootstrap test). Spatial frequency ratio is 0.87. Preferred spatial frequencies for the left and right eyes are significantly different for both neurons (bootstrap test, P < 0.05). Note that the directions of interocular spatial frequency shifts are opposite between the 2 neurons. Data from these neurons are recorded about 100 min apart, but without any optical disturbances such as contact lens or eye manipulations or display movements across the measurements.

 
What is the range of binocular RF tilts observed for cells in areas 17 and 18? Distributions of disparity gradients of both separable and inseparable cells are illustrated in Fig. 9. Most neurons had disparity gradients in the range of –0.5 to 0.5. The SDs of the distributions were 0.19 and 0.14 for separable and inseparable types, respectively. Black bars indicate cells whose binocular RFs are tilted significantly from the frontoparallel plane (P < 0.05, bootstrap test), whereas white bars indicate those with nonsignificant tilt. About 30% of neurons exhibited significant tilts (28%, 18/64 for separable; 33%, 37/113 for inseparable). Therefore the distributions of disparity gradients in areas 17 and 18 are capable of supporting slant-in-depth encoding.


FIG. 9. Distributions of disparity gradients are shown of separable (A) and inseparable neurons (B). Most neurons had disparity gradients within a range of ±0.5. SD was 0.19 for the separable and 0.14 for the inseparable neurons. Black bars indicate cells for which the binocular RF showed significant tilt from a frontoparallel plane; white bars indicate those for which tilt was not significant.

 
Although paired recordings of multiple neurons are ideal for demonstrating variations of binocular RF tilts (see Figs. 7 and 8), such recordings are not always possible. The majority of the data in our sample must be analyzed as individual binocular RF recordings. Therefore we have analyzed the effects of potential artifactual sources that may contribute to apparent tilts of measured binocular RF profiles. One such possibility is a difference in viewing distances between left and right eyes that may be caused by positioning errors of the CRT monitor and the mirrors used in the haploscope setup (Fig. 2A). Another possibility is a magnification difference between left and right eyes that may result from improper corrections for refractive errors. Both of these optical factors produce apparent differences in the spatial frequency content as imaged on the retina for the two eyes. Contributions of viewing distance errors to disparity gradients are illustrated in Fig. 10. We are confident that our positioning error of optical elements in the setup is well within 5 cm. Given this assumption, what is the limit of erroneous change in the disparity gradient? Figure 10A shows that a 5-cm distance error translates into a disparity gradient of about 0.1. Distributions of disparity gradients for both separable and inseparable binocular RFs, and that of viewing distance errors, are illustrated in Fig. 10B. The SD (σ) of the error distribution is set such that 1.96σ = 0.1. Statistical tests for data and artifactual distributions are carried out by the F-test. Distribution of preferred disparity gradients is significantly wider than that of the error distribution (test for equal variance, F = 13.4, P < 0.001 for separable type; F = 7.42, P < 0.001 for inseparable type). Similarly, we also calculated possible contributions of interocular magnification differences.
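As a rough arithmetic check of the comparison above, the variance ratio can be approximated from the rounded SDs quoted earlier (0.19 and 0.14) and the error SD implied by 1.96σ = 0.1. This is only a sketch: it uses rounded summary values rather than the underlying data, so the ratios come out near, but not exactly at, the reported F values, and no P values are computed. The function names are illustrative.

```python
def error_sd_from_limit(limit, z=1.96):
    """SD of the artifact distribution, given that z * SD equals the limit."""
    return limit / z

def variance_ratio(sd_data, sd_error):
    """Ratio of data variance to artifact variance (F-like statistic)."""
    return (sd_data / sd_error) ** 2

sigma_error = error_sd_from_limit(0.1)

# Rounded SDs of the disparity gradient distributions quoted in the text
for label, sd in (("separable", 0.19), ("inseparable", 0.14)):
    print(label, round(variance_ratio(sd, sigma_error), 2))
```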


FIG. 10. Potential contributions are examined of artifacts that produce apparent tilt of binocular RF and estimated disparity gradient. These factors include mismatch between the distances from the 2 eyes to the display arising from errors in positioning the display and haploscope mirrors, and image magnification difference arising from refractive errors. A: relationship between display positioning error (abscissa) and predicted disparity gradient (ordinate) is shown. Limits of possible positioning errors are estimated to be well below ±5 cm, which translate into disparity gradient errors of less than ±0.1. B: comparisons are shown of distribution of disparity gradients generated by artifacts and those of our data. Data distributions are shown for separable and inseparable binocular RFs by dashed and dotted curves, respectively. Disparity gradient distributions of actual data are significantly wider than that of experimental artifact (test for equal variance, separable: F = 13.4, P < 0.001; inseparable: F = 7.42, P < 0.001).

 
A 3% magnification difference between the two eyes (assuming the power of the cat’s eye of 78D, and |error in refractive correction in diopters| <2D) will result in a disparity gradient of ±0.03 (Hughes 1979). The SD for this distribution is so small that we can essentially ignore the effect of refractive errors. Even considering the simultaneous contributions of the two factors, the variations of binocular RF tilts observed in our data cannot be accounted for by these artifactual sources (test for equal variance, F = 6.7, P < 0.001 for separable type; F = 3.71, P < 0.001 for inseparable type). These results suggest that the tilts of binocular RFs and spatial frequency differences are intrinsic neuronal characteristics and are able to carry signals regarding 3D orientations of surfaces in visual scenes.

Relationship between disparity gradient and spatial frequency ratio

In representative examples shown in Figs. 6–8, the spatial frequency differences across the eyes were generally qualitatively correlated with the tilt of binocular RFs. How does this correlation hold for the entire population of neurons? In general, how do other binocular tuning characteristics correlate with monocular tuning properties? Figure 11 summarizes the results relevant for addressing these questions.


FIG. 11. Goodness of predictions of binocular RF tilts (expressed as disparity gradients) based on the ratios of left and right optimal spatial frequencies are examined. A and B: illustration of correlations of left and right optimal spatial frequencies for separable and inseparable binocular RFs, respectively. Optimal frequencies are obtained from the peak of fitted Gaussian functions. Dotted lines indicate 1 octave difference in the optimal frequencies for the 2 eyes. C and D: disparity gradients from separable and inseparable binocular RFs are plotted against the frequency ratios obtained from A and B. Frequency ratios are defined as fL/fR, where fL and fR, respectively, are left and right optimal spatial frequencies obtained from drifting sinusoidal grating tests. For inseparable neurons shown in D, there is a significant correlation (all data: r = 0.26, n = 90, P < 0.05, Spearman’s correlation coefficient). Correlation analysis on the subset of neurons that exhibited significant tilt (see Fig. 9B) showed somewhat improved correlation coefficient (r = 0.5, n = 29, P < 0.01). Correlation for separable type was not significant. Black and gray symbols indicate data from neurons that exhibited significant and nonsignificant tilts of binocular RFs. Circles and triangles depict cells recorded from areas 17 and 18, respectively. Solid line represents prediction of the dif-frequency version of disparity energy model, as derived from Eq. A8 (see APPENDIX). Labeled data points represent the example cells shown in Figs. 6 and 7. E and F: comparisons of spatial frequency and disparity frequency are illustrated for both separable (E) and inseparable RFs (F). Spatial frequency and disparity frequency are highly correlated in separable RFs. Dashed lines represent regression lines through the data and their slopes are 0.92 and 0.71 for E and F, respectively.

 
First, left and right preferred spatial frequencies, as obtained from tests using drifting sinusoidal gratings, are compared in Fig. 11, A and B for separable and inseparable binocular RF, respectively. Peak spatial frequencies are obtained from the peak of fitted Gaussian functions (Eq. 5). The identity relationship and 1-octave difference between the eyes are illustrated as solid and dotted lines, respectively. Circles and triangles indicate cells recorded from areas 17 and 18, respectively. Preferred spatial frequencies for the left and right eyes are well correlated (Pearson’s r = 0.96, n = 45 for separable type; r = 0.96, n = 90 for inseparable type, P < 0.05). Differences of left and right spatial frequencies were within the range of +1 to –1 octave regardless of separability.

To examine the correlation between the interocular frequency difference and the tilt of binocular RF, the ratios of preferred spatial frequencies were computed as follows and compared with the disparity gradients. The frequency ratio is given by

$$\text{frequency ratio} = \frac{f_L}{f_R} \tag{6}$$
where fL and fR are left and right preferred spatial frequencies from measurements with drifting gratings. The results of comparisons are illustrated in Fig. 11, C and D for separable and inseparable cells, respectively. Cells recorded from areas 17 and 18 are plotted as circles and triangles, respectively. Black and gray symbols indicate cells that exhibited significant and nonsignificant tilts of binocular RF, respectively, as shown in Fig. 9. Labeled symbols in Fig. 11, C and D indicate example cells shown in Fig. 6, A–C and Fig. 7, A, C, E, and F, respectively. Error bars depict the SDs of disparity gradient. The frequency ratio and the disparity gradient were significantly correlated for inseparable neurons with significant binocular RF tilts (black symbols in Fig. 11D; r = 0.5, n = 29, P < 0.01, Spearman’s correlation coefficient). The correlation was not significant for separable neurons (black symbols in Fig. 11C; r = 0.27, n = 14, P > 0.05, Spearman’s correlation coefficient). Because our initial interest was primarily on disparity-selective complex cells, which tend to be inseparable binocularly, the number of separable cells is small in our sample, which may have affected the results. A solid line depicts the prediction based on the dif-frequency version of disparity energy model as illustrated in Fig. 1. The relationship for the theoretical curve is given as Eq. A8 (see APPENDIX). The limits of artifactual variations of disparity gradient about the predicted value are illustrated as dotted lines (prediction ±0.1, 1.96SD of artifactual distribution as shown in Fig. 10). Although the significance of correlation between two parameters suggests that the interocular difference in spatial frequency tuning underlies the tilted binocular RF structure, not all neurons lie on the theoretical line. Considering the variance in the data, 35.6% (16/45) of neurons fall within the dotted lines for separable cells and 42.2% (38/90) for inseparable cells.
Therefore only about 40% of neurons overall (54/135) behave in a manner consistent with the prediction of the dif-frequency model, and the responses of many neurons cannot be accounted for by it. There are neurons with a clear and statistically significant interocular spatial frequency difference that nevertheless possess a clearly frontoparallel binocular RF, and vice versa. Therefore in sections further below, we will examine possibilities of additional factors that may contribute to tilts of binocular RFs.
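The quantities used in this comparison are simple to compute. The sketch below expresses a left/right frequency ratio in octaves, tests whether a measured disparity gradient lies within ±0.1 of a model prediction, and recomputes the fractions of neurons quoted above. The predicted value itself comes from Eq. A8 in the APPENDIX, which is not reproduced here, so it is passed in as an argument; all function names are illustrative.

```python
import math

def octaves_from_ratio(frequency_ratio):
    """Interocular frequency difference in octaves for a ratio fL/fR."""
    return math.log2(frequency_ratio)

def within_artifact_band(measured_gradient, predicted_gradient, half_width=0.1):
    """True if the measured gradient lies within the band about the prediction.

    predicted_gradient must be supplied by the caller (Eq. A8 is not
    implemented here).
    """
    return abs(measured_gradient - predicted_gradient) <= half_width

def percent(count, total):
    return 100.0 * count / total

# Frequency ratios of example cells quoted in the text
for ratio in (0.71, 0.87, 1.23, 1.28):
    print(ratio, round(octaves_from_ratio(ratio), 2), "octaves")

# Fractions of neurons falling within the band, from the counts in the text
print(round(percent(16, 45), 1), "% separable")
print(round(percent(38, 90), 1), "% inseparable")
print(round(percent(16 + 38, 45 + 90), 1), "% pooled")
```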

An additional point was examined in relation to predicting binocular properties from monocular tuning characteristics. Figure 11, E and F shows the relationship between binocular disparity frequency and monocular preferred spatial frequency. The disparity energy model predicts identity between the two frequencies. Regarding this question, Ohzawa et al. (1997) reported the discrepancy between the model prediction and the data. They reported that the disparity frequency tended to be lower than the monocular spatial frequency as measured by drifting grating stimuli. Because their analyses were performed only for complex cells, it is not clear at which stage of binocular processing this discrepancy occurs. Based on a new set of data and a more robust analysis method, we have addressed this issue. In our analysis, we use Fourier analysis both for separable and inseparable RFs to obtain disparity frequencies. For the monocular preferred spatial frequency, the average of left and right preferred spatial frequencies (from data in Fig. 11, A and B) is used. Cells recorded from areas 17 and 18 are plotted as circles and triangles. The scatterplot for the inseparable binocular RFs showed a discrepancy between the disparity frequency and the spatial frequency, in that the disparity frequency tends to be lower than the monocularly measured preferred spatial frequency (Fig. 11F). Deviations of actual data from the identity line tended to be larger for high spatial frequencies (slope = 0.71). Because most inseparable binocular RFs are from complex cells (Fig. 5B), our data show a trend similar to that reported in previous work (Ohzawa et al. 1997). In contrast, separable binocular RFs show a much better fit with the identity relationship between the two frequencies. The slope of separable RF is close to 1 (slope = 0.92) (Fig. 11E).
These results suggest that separable cells sum monocular inputs through linear processing, whereas neurons with inseparable RFs have substantial nonlinearities in their processing. The source of the deviation must therefore lie between the linear subunits of complex cells and the final complex cell stage if we assume the hierarchical organization similar to that in the disparity energy model.

Aspect ratio of binocular receptive field

Although the dif-frequency version of the disparity energy model accounts for the trend in the data as we have seen in the previous section, we wondered whether there are additional mechanisms by which tilted binocular RFs are constructed. Another possibility we examine is a hierarchical organization as illustrated in Fig. 12. A tilt in the binocular RF profile may be generated if the outputs of multiple disparity energy units are combined, where each unit is tuned to a specific disparity without tilt and its preferred disparity progressively shifts as a function of its frontoparallel position (Fig. 12A). Such a hierarchical pooling produces a binocular RF, shown in Fig. 12B. This neuron (Fig. 12B) will have a highly elongated and tilted binocular RF. The angle of tilt depends on the rate at which subunits’ preferred disparities shift with the frontoparallel position. Such an organization predicts a substantial elongation of the binocular RF in the frontoparallel dimension. The degree of pooling may be quantified by an aspect ratio of binocular RF. If the hierarchical organization underlies slant sensitivity of binocular neurons, there should be a correlation between the tilts of binocular RF and their aspect ratios.


FIG. 12. Diagram of a possible alternative hierarchical organization is shown, whereby the tilt of binocular RF is generated by convergence from multiple neurons with untilted binocular RFs. A: each subunit is constructed according to the original disparity energy model. Each subunit’s binocular RF is not tilted, but the optimal disparity progressively shifts depending on its RF’s frontoparallel position. B: a neuron in the next stage will have a highly elongated and tilted binocular RF. Angle of the tilt depends on the rate at which subunits’ preferred disparities shift with frontoparallel position. Such an organization predicts a substantial elongation of the binocular RF in the frontoparallel dimension. Degree of pooling may be quantified by an aspect ratio. Aspect ratio of a binocular RF is defined by the ratio of SDs along the major (a) and minor (b) axes of the RF envelope. C: frequency analysis of binocular RF. Sizes of binocular RFs are proportional to the inverse of SDs of spectral amplitude profiles.

 
To obtain structural parameters of binocular RFs such as the RF sizes and aspect ratios, we conducted frequency analysis (Fig. 12C). Although RF sizes may be obtained by direct measurements in the spatial domain (Fig. 12B), we have found that estimating RF size in the frequency domain (Fig. 12C) is more robust. The procedure for computing spectral data was described earlier (Fig. 4), except that a two-dimensional Gaussian function is fitted to the amplitude spectrum, and its SDs are used. Binocular RF sizes, 2a and 2b for frontoparallel and disparity directions, respectively, are calculated as the inverse of SDs of fitted spectral amplitude profiles

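$$2a \propto \frac{1}{\sigma_f}, \qquad 2b \propto \frac{1}{\sigma_d}$$

where $\sigma_d$ and $\sigma_f$ are the SDs of the fitted Gaussian in the disparity and frontoparallel frequencies, respectively. The aspect ratio of a binocular RF is defined by the ratio of SDs as

$$\text{aspect ratio} = \frac{a}{b} = \frac{\sigma_d}{\sigma_f}$$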
An aspect ratio <1 indicates that the envelope of the binocular RF is elongated along the disparity axis. If it is >1, the envelope is elongated along the frontoparallel axis. Therefore, if there are neurons with the hierarchical organization illustrated in Fig. 12, A and B, we would expect the aspect ratios of their RFs to be substantially >1. The disparity energy model with no such pooling predicts an aspect ratio equal to 1.
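The inverse relation between RF size and spectral SD, and the resulting aspect ratio, can be checked with a short Python sketch. This is a simplified illustration, not the analysis pipeline used in the study: it treats each axis of the envelope as a separate one-dimensional Gaussian, estimates the spectral SD from the second moment of the amplitude spectrum rather than by fitting a two-dimensional Gaussian, and uses the standard Fourier relation for a Gaussian (spatial SD = 1/(2π × spectral SD)) as an assumed proportionality constant. All function names, grid settings, and envelope SDs are hypothetical.

```python
import cmath
import math

def amplitude_spectrum(profile, dx):
    """Naive DFT amplitude spectrum; returns (frequencies, amplitudes)."""
    n = len(profile)
    freqs, amps = [], []
    for k in range(-n // 2, n // 2):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * i / n)
                    for i, v in enumerate(profile))
        freqs.append(k / (n * dx))      # cycles/deg
        amps.append(abs(coeff))
    return freqs, amps

def spectral_sd(freqs, amps):
    """Moment-based SD of an amplitude profile (stand-in for a Gaussian fit)."""
    total = sum(amps)
    mean = sum(f * a for f, a in zip(freqs, amps)) / total
    var = sum((f - mean) ** 2 * a for f, a in zip(freqs, amps)) / total
    return math.sqrt(var)

def gaussian_envelope(sd, n=128, dx=0.1):
    """Gaussian envelope with spatial SD `sd` (deg), centred on the grid."""
    centre = n // 2
    return [math.exp(-(((i - centre) * dx) ** 2) / (2 * sd ** 2))
            for i in range(n)]

DX = 0.1  # hypothetical sampling step (deg)

# Hypothetical envelope: SD 1.0 deg along the frontoparallel axis,
# 0.6 deg along the disparity axis.
sigma_f = spectral_sd(*amplitude_spectrum(gaussian_envelope(1.0), DX))
sigma_d = spectral_sd(*amplitude_spectrum(gaussian_envelope(0.6), DX))

# RF sizes as the inverse of the spectral SDs (assumed Gaussian constant).
size_frontoparallel = 1 / (math.pi * sigma_f)   # 2a
size_disparity = 1 / (math.pi * sigma_d)        # 2b
aspect_ratio = sigma_d / sigma_f                # equals a / b

print(f"2a = {size_frontoparallel:.2f} deg, 2b = {size_disparity:.2f} deg")
print(f"aspect ratio = {aspect_ratio:.2f}")
```

Because the aspect ratio is a ratio of the two SDs, it does not depend on the proportionality constant assumed for the RF sizes. A value above 1, as in this toy envelope, corresponds to elongation along the frontoparallel axis.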

Distributions of aspect ratios for separable and inseparable RFs are shown in Fig. 13, A and B. For most neurons with inseparable RFs, aspect ratios were >1 (Fig. 13B). Mean aspect ratios are 1.15 and 1.67 for separable and inseparable RFs, respectively. The result for inseparable RFs, the majority of which are from complex cells, indicates a substantial degree of spatial pooling, deviating substantially from the prediction of the disparity energy model. The relationship between the aspect ratio and the disparity gradient is presented in Fig. 13, C and D. If the hierarchical organization hypothesis (Fig. 12A) is correct as a basis for slant selectivity, neurons with highly elongated receptive fields should possess a wide range of disparity gradients. In contrast, neurons with aspect ratios close to 1 should show a narrow distribution of disparity gradients near zero. However, our data show the opposite: for inseparable RFs, disparity gradients tended to be highly variable for neurons with low aspect ratios, but were relatively small for those with high aspect ratios (separable: n = 45, P = 0.09, Mann–Whitney U test; inseparable: n = 90, P < 0.05, Mann–Whitney U test).
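A comparison of this kind can be sketched as follows. The text does not state how neurons were divided into groups for the Mann–Whitney U test, so this sketch simply assumes two groups (low and high aspect ratio) and compares their absolute disparity gradients. The function names are hypothetical, the input values are invented for illustration and are not data from the study, and the P value uses a two-sided normal approximation without tie or continuity corrections.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided P) using the normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u = min(u_a, n_a * n_b - u_a)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, p

# Invented absolute disparity gradients, for illustration only.
low_aspect = [0.9, 1.4, 0.2, 1.1, 0.7]
high_aspect = [0.1, 0.3, 0.05, 0.25, 0.15]

u, p = mann_whitney_u(low_aspect, high_aspect)
print(f"U = {u}, P = {p:.3f}")
```

For small samples an exact test is usually preferable to the normal approximation used here.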


FIG. 13. A and B: distributions of aspect ratios for separable (A) and inseparable (B) binocular RFs are illustrated, respectively. Dispa