J Neurophysiol 90: 2795-2817, 2003. First published July 16, 2003; doi:10.1152/jn.01110.2002
0022-3077/03 $5.00

Testing Quantitative Models of Binocular Disparity Selectivity in Primary Visual Cortex

Jenny C. A. Read and Bruce G. Cumming

Laboratory of Sensorimotor Research, National Eye Institute, National Institutes of Health, Bethesda, Maryland 20892-4435

Submitted 11 December 2002; accepted in final form 10 July 2003


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 METHODS
 RESULTS
 DISCUSSION
 APPENDIX A: FOURIER TRANSFORMS
 APPENDIX B: DISPARITY TUNING...
 APPENDIX C: MONOCULAR SPATIAL...
 APPENDIX D: DISPARITY TUNING...
 APPENDIX E: INEQUALITY RELATING...
 APPENDIX F: EFFECT OF...
 APPENDIX G: GLOSSARY OF...
 ACKNOWLEDGMENTS
 REFERENCES
 
Disparity-selective neurons in striate cortex (V1) probably implement the initial processing that supports binocular vision. Recently, much progress has been made in understanding the computations that these neurons perform on retinal inputs. The binocular energy model has been highly successful in providing a simple theory of these computations. A key feature of the energy model is that it is linear until after inputs from the two eyes are combined. Recently, however, a modified version of the energy model, incorporating threshold nonlinearities before binocular combination, has been proposed to account for the weaker disparity tuning observed with anticorrelated stimuli. In this study, we present new data needed for a critical assessment of these two models. We compare two key predictions of the models with responses of disparity-selective neurons recorded from V1 of awake fixating monkeys. We find that the original energy model, and a family of generalizations retaining linear binocular combination, are quantitatively inconsistent with the response of V1 neurons. In contrast, the modified version incorporating threshold nonlinearities can explain both sets of observations. We conclude that the energy model can be reconciled with experimental observations by adding a threshold before binocular combination. This gives us the clearest picture yet of the computation being carried out by disparity-selective V1 neurons.


    INTRODUCTION
The separation of the two eyes introduces disparities between the images received by the left and right eyes. The visual system is somehow able to fuse the images so as to produce a unified percept of the visual world, while using the stereoscopic disparities to extract information about how far away viewed objects are. The neural circuits specific to this ability begin in primary visual cortex (V1), the first place in the visual system where inputs from the two eyes converge on individual cells. Many V1 cells modulate their firing rate according to the stereoscopic disparity of the stimulus (Barlow et al. 1967; Nikara et al. 1968). These disparity-tuned cells are believed to perform the initial processing of retinal inputs that eventually, in higher visual areas, gives rise to stereoscopic depth perception and to binocular fusion (single vision). Thus, a detailed understanding of the computations carried out by these cells represents the first step toward a complete description of stereoscopic vision.

The current best description of the operation of these cells is provided by the energy model (Adelson and Bergen 1985; Fleet et al. 1996; Ohzawa 1998; Ohzawa et al. 1990; Qian 1994), sketched in Fig. 1A and described more fully below. This elegant model has been extremely successful in explaining qualitatively the properties of disparity-tuned neurons in V1, for example, the shape of the binocular receptive field obtained with disparate bar stimuli and the shape of the disparity tuning curve obtained with random-dot stereograms (Anzai et al. 1999a; Cumming and Parker 1997; Ohzawa et al. 1996, 1997; Prince et al. 2002b).



FIG. 1. Block diagrams of the energy model (A) and our modified version (B). Grayscale plots represent receptive fields, which are shown differing in phase. Arrows show results vL and vR of a linear operation performed on image in each eye. The models do not specify physiological details of how this linear operation is calculated, so they are shown simply with arrows. Subsequently, triangles represent excitatory synapses, and disks inhibitory synapses. A: in the energy model, linear inputs from both eyes converge onto a single binocular simple cell. Each binocular simple cell computes the linear sum of its left and right inputs, and outputs the half-squared sum to the complex cell. B: possible implementation of our modified version: linear inputs from left and right eyes pass through monocular simple cells, and are thus half-wave rectified, before converging on a binocular simple cell. After this rectification, the type of synapse (excitatory/inhibitory) at the binocular simple cell has a profound influence on the type of disparity tuning observed. In B, top binocular subunit is shown receiving excitatory synaptic input from both eyes; bottom subunit is shown with one excitatory and one inhibitory synapse.

 

Although the energy model was originally intended as a qualitative description, its success to date suggests that it may be possible to elaborate the model so as to provide a good quantitative description of neuronal behavior. Extending the energy model requires identifying those quantitative discrepancies that have not so far been reconciled with the original structure of the model. First, the response to anticorrelated random-dot stimuli (when contrast polarity is inverted in one eye) must be accounted for. In real cells, anticorrelation inverts the disparity tuning curve and also reduces the amplitude, whereas the energy model predicts inversion only (Cumming and Parker 1997; Ohzawa et al. 1997). The amplitude reduction can be explained if we modify the energy model by incorporating threshold nonlinearities before binocular combination (Read et al. 2002). This modified version of the energy model is shown in Fig. 1B.

Second, the energy model predicts that monocular stimulation in either eye should always have an excitatory effect. In this study, by making a quantitative comparison of monocular and binocular responses, we confirm that there are many cells in which input from one eye always seems to suppress the cell's response, as previously reported by others (Ohzawa and Freeman 1986a; Prince et al. 2002a). Such behavior is inconsistent with linear binocular combination, but is predicted by our modified version (Read et al. 2002).

Third, the energy model predicts that the response to binocularly uncorrelated random-dot patterns should equal the sum of the responses to monocular random-dot patterns. In fact, it is generally much closer to their mean. It has been suggested (Prince et al. 2002b) that this is a consequence of a contrast-normalizing mechanism that tends to boost the response to monocular stimuli. We show here that the relative size of the monocular and binocular responses can be explained by our modification to the energy model, without the need to invoke a normalization process.

Fourth, a key prediction of the energy model is that the shape of monocular receptive fields determines the shape of the disparity-tuned response. Although this has been verified for simple cells in the cat (Anzai et al. 1999b), the situation for complex cells is less clear because it is difficult to estimate the receptive fields of the subunits. Fortunately, the model can also be tested in the frequency domain: the Fourier power spectrum of the disparity-tuning curve should match the shape of the monocular spatial frequency-tuning curves. Preliminary testing of this prediction has indicated a conflict with the energy model prediction (Ohzawa et al. 1997; Prince et al. 2002b); however, for a number of reasons, the seriousness of this conflict is hard to assess. Prince et al. (2002b) found that their disparity tuning curves often had spatial frequency bandwidths substantially larger than those estimated from luminance gratings in other studies (de Valois et al. 1982). However, Prince et al. measured tuning for horizontal disparity, so these data are not directly comparable with selectivity for the spatial frequency (SF) of luminance gratings at the preferred orientation. Ohzawa et al. (1997) found that the frequency of the disparity-tuning curve tended to be lower than the preferred spatial frequency revealed with monocular luminance gratings in the dominant eye, apparently contradicting the energy model. However, their definition of disparity frequency could potentially obscure an underlying agreement with the energy model (see below); also, confidence intervals were not presented. Most important, neither of these studies reported measures of spatial frequency tuning in both eyes. The original energy model assumes that spatial frequency tuning is identical in the two eyes, so it is possible that the discrepancies could be attributable to binocular differences in spatial frequency tuning. If this were the case, it would be easy to extend the energy model to take account of this difference, by allowing different spatial frequency tuning between subunits, either within an eye or between eyes. The data should then agree with this generalized version of the energy model. Thus, a more complete comparison of the spatial frequency tuning and the power spectrum of the disparity tuning is necessary to test the model.

To resolve this important question, we recorded the monocular spatial frequency and orientation tuning in both eyes. This is compared with the selectivity for disparity applied to random-dot patterns along an axis orthogonal to the preferred orientation. Both comparisons systematically violate the predictions of the energy model, even after it has been generalized to allow for differences between subunits. The disparity-tuning curves show more power at lower frequencies than is possible within these models, even allowing for the presence of several subunits that may differ in position and/or spatial frequency tuning. However, once again, the results may be explained by our modified version of the energy model incorporating a threshold nonlinearity before binocular combination.

In summary, therefore, we have compared two families of models of disparity selectivity: 1) the energy model and a set of generalizations of it, all postulating linear binocular summation; and 2) our modified version incorporating threshold nonlinearities before binocular combination. For a wide range of observations, the data are quantitatively at odds with the linear model and can be accounted for by the threshold model. We conclude that adding thresholds to the energy model, before inputs from the two eyes are combined, represents a substantial step forward in our understanding of disparity selectivity in V1.


    METHODS
Detailed descriptions of the general procedures have appeared elsewhere (Cumming and Parker 1999; Prince et al. 2002b). Briefly, single-unit activity was recorded from primary visual cortex (V1) of two awake macaques trained to maintain fixation while viewing stimuli for fluid reward. All protocols were approved by the Institute Animal Care and Use Committee and complied with Public Health Service policy on the humane care and use of laboratory animals.

Stimuli were generated on a Silicon Graphics Octane workstation and presented on two Eizo Flexscan F980 monitors (mean luminance 41.1 cd/m², contrast 99%, frame rate 72 Hz) viewed through a Wheatstone stereoscope, in which the monitors are viewed through mirrors positioned in front of the animal's eyes. At the viewing distance used (89 cm) each pixel in the 1,280 × 1,024 display subtended 1.1 min arc. Antialiasing was used to render with subpixel accuracy [pixels are colored intermediate shades of gray to represent edges that only partially cover the pixel (Foley et al. 1990)]. Glass-coated platinum–iridium electrodes (FHC) were placed transdurally each day. Electrode position was controlled with a custom-made microdrive that used an ultralight stepper motor mounted directly onto the recording chamber.

The monkeys initiated a stimulus presentation by maintaining fixation on a binocularly presented spot to within ±1°. They were required to maintain fixation at this accuracy for 2.1 s to earn a fluid reward. During each such trial, 4 stimuli were presented, each lasting 420 ms, separated by 100 ms.

Stimuli

Sinusoidal luminance gratings were used to determine the minimum response field, spatial frequency, and orientation tuning of the cell. After an initial determination of the preferred spatial frequency and orientation, the monocular orientation-tuning curve in each eye was obtained using a circular patch of grating with spatial frequency reasonably close to optimal. Quantitative orientation-tuning curves usually spanned a range of 180° centered around the preferred orientation (or direction, for direction-selective cells). The spatial frequency tuning curve (SFTC) was then obtained using a large rectangular grating patch at the preferred orientation. The frequencies generally spanned 0.0625 to 16 cycles per degree (cpd) in steps of 1 octave. A pseudo-random sequence interleaving frequencies and eye of presentation was used in both cases. During monocular trials, the nonstimulated eye viewed a uniform screen of the same mean luminance.

Dynamic random-dot stereograms were composed of black and white dots, scattered at random on a gray background. The dots were usually 5 × 5 pixels (0.1° × 0.1°); for some cells, a different size was used if this enhanced the response rate. A new random stereogram was generated every frame (72 Hz). The dot density was sufficient to cover 50% of the gray background but, because the dots were allowed to overlap one another (dot location was randomly assigned with subpixel precision using antialiasing), the total coverage was slightly less. On average, 20% of pixels were black, 20% white, and 60% gray. Figure 2 shows an example stereogram, together with a circle indicating the size of typical V1 minimum response fields for comparison.



FIG. 2. Random-dot stereogram of the type used in our experiments. Dots are 0.1° × 0.1° squares. Circle indicates size of a typical V1 receptive field for comparison. Diameter is 1.2°, based on a subset of cells in which we obtained the one-dimensional receptive field envelope using a long luminance-modulating bar stimulus at the preferred orientation. The mean SD of Gaussians fitted to receptive field envelopes was 0.3°, and we took 4 SDs as a suitable estimate of receptive field diameter.

 

The energy model assumes that all receptive fields feeding into a cell have the same orientation. Its predictions are therefore most easily framed in terms of disparities parallel and orthogonal to this orientation, rather than horizontal and vertical disparities. Accordingly, to facilitate testing of energy model predictions, experimental disparities were applied along the axis orthogonal to each neuron's preferred orientation. These covered the range from –1.2° to +1.2° in the initial test for disparity selectivity, with the range –0.6° to +0.6° covered in steps of 0.1°, and steps of 0.2° outside this range. A larger range of disparities was used if necessary to ensure that there was no modulation at the extremes of the tuning curve (i.e., that the full response range had been explored). In neurons with preferred SFs >4 cpd, the central region of the curve was sampled more finely, to ensure that sampling exceeded the Nyquist limit predicted from the monocular SF tuning.

Data analysis

Data analysis such as curve fitting is greatly simplified if we can make the assumption that variance is constant across the data set. This assumption is invalid for neuronal firing rates, whose variance tends to increase in proportion with the mean (Dean 1981). However, the square root of firing rates has variance that is roughly constant, independent of the mean (Cumming and Parker 2000; Prince et al. 2002b). This variance-stabilizing transformation greatly simplifies the analysis of neuronal data. For this reason, we performed all our analysis on the square root of the recorded firing rates.
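
A small simulation can make the variance-stabilizing property concrete. The sketch below assumes Poisson spike counts as an idealized stand-in for responses whose variance grows in proportion to the mean; the sampler, the function names, and the two mean counts are illustrative choices and are not taken from the recorded data or from the analysis code used in the study.

```python
import math
import random

def poisson_sample(mean, rng):
    """Draw one Poisson count (Knuth's multiplication method; fine for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def raw_and_sqrt_variance(mean, n_trials, rng):
    """Variance of simulated counts and of their square roots."""
    counts = [poisson_sample(mean, rng) for _ in range(n_trials)]
    return variance(counts), variance([math.sqrt(c) for c in counts])

rng = random.Random(0)  # fixed seed so the sketch is repeatable
raw_low, sqrt_low = raw_and_sqrt_variance(5.0, 5000, rng)     # weak response (assumed)
raw_high, sqrt_high = raw_and_sqrt_variance(50.0, 5000, rng)  # strong response (assumed)
print(raw_low, raw_high)    # raw variance tracks the mean
print(sqrt_low, sqrt_high)  # square-root variance stays near 1/4 in both cases
```

Under this Poisson assumption the raw variance changes roughly tenfold between the two conditions, whereas the variance of the square-rooted counts stays close to 0.25, which is why unweighted least squares becomes reasonable after the transformation.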

To quantify the strength of disparity tuning, we used the disparity discrimination index (DDI) introduced by Prince et al. (2002b):

\[
\mathrm{DDI} = \frac{R_{\max} - R_{\min}}{R_{\max} - R_{\min} + 2\,\mathrm{RMS}_{\mathrm{error}}} \tag{1}
\]

where Rmax and Rmin are the maximum and minimum mean responses on the tuning curve (in square-root firing rate), respectively, and RMSerror is the square root of the residual variance around the mean recorded across the whole tuning curve, including the response to uncorrelated stimuli (effectively, infinite disparity). Like the more familiar binocular interaction index, (Rmax − Rmin)/(Rmax + Rmin), this is a contrast measure, except that here the difference in response between the preferred and null disparity is contrasted not with the mean response, but with the variability of the firing rate RMSerror. This means that cells in which the range in firing rates is largely the result of random fluctuations are not wrongly classified as being highly sensitive to disparity; equally, cells in which the change in firing as a function of disparity is relatively small but highly reliable are correctly described as strongly disparity-tuned. The term (Rmax − Rmin) in the denominator of Eq. 1 ensures that the index is bounded at 1 when the variability is small.
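
The index can be computed directly from single-trial firing rates. The following is a minimal sketch, assuming that the residual variance is pooled over all stimulus conditions with N − k degrees of freedom (N trials, k conditions); that normalization, the function name `ddi`, and the dictionary input format are assumptions made for illustration rather than details given in the text.

```python
import math

def ddi(trials_by_condition):
    """Disparity discrimination index from raw firing rates (spikes/s).

    trials_by_condition maps each stimulus condition (a disparity, or a label
    for the uncorrelated stimulus) to a list of single-trial firing rates.
    """
    # All quantities are computed on square-root firing rates.
    sqrt_trials = {c: [math.sqrt(r) for r in rates]
                   for c, rates in trials_by_condition.items()}
    means = {c: sum(v) / len(v) for c, v in sqrt_trials.items()}
    r_max, r_min = max(means.values()), min(means.values())
    # Residuals around each condition's mean, pooled across the tuning curve.
    residuals = [x - means[c] for c, v in sqrt_trials.items() for x in v]
    dof = len(residuals) - len(means)  # assumed normalization: N - k
    rms_error = math.sqrt(sum(e * e for e in residuals) / dof)
    return (r_max - r_min) / (r_max - r_min + 2.0 * rms_error)

# Perfectly reliable tuning gives the upper bound of 1.
print(ddi({0.0: [4.0, 4.0], 0.2: [16.0, 16.0]}))  # 1.0
# Variable responses with no difference between conditions give 0.
print(ddi({0.0: [1.0, 9.0], 0.2: [1.0, 9.0]}))    # 0.0
```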

For most cells, monocular random-dot stimuli were also presented, in trials interleaved with binocular stimuli. Blank stimuli were also usually interleaved, where both eyes viewed a blank screen of the same mean luminance as the random-dot patterns. These were used to obtain an estimate of the spontaneous firing rate.

To allow a cell into the study, we required that binocular random dots at the optimal disparity elicit a response of at least 10 spikes/s. To proceed to quantitative analysis of the response's shape, we further required i) ANOVA indicates a significant (P < 0.05) main effect of disparity; ii) the disparity discrimination index exceeds 0.375. The second condition removes neurons with weak but significant disparity tuning because these tend to produce noisy estimates in the quantitative analysis that follows. Including these weakly tuned neurons did not change any of the substantial results; it only increased the scatter. To subject monocular spatial frequency data to quantitative analysis, we required that i) the optimal drifting grating in that eye elicits at least 10 spikes/s; ii) ANOVA indicates a significant (P < 0.05) main effect of spatial frequency. We do not require tuning to be band-pass, and our sample included a few neurons that showed a low-pass spatial frequency tuning.

Fitting tuning curves

We summarized our tuning curves by fitting them with analytical functions. If we fitted the function directly to the mean firing rates, we would have to reduce the weight given to residuals at higher firing rates, to take account of the higher variance there. As explained above, we avoided this complication by, instead, fitting the square root of our chosen fit function to the mean of the square-root firing rates. Given that the square-root firing rate has approximately constant variance, we could then just minimize the sum of the squared residuals, without needing to weight them differently.

SFTCs were fitted with Gaussians, in either log or linear frequency space, whichever minimized the residuals. These had 4 free parameters: frequency of the peak f0, standard deviation σ, baseline, and amplitude above the baseline. The baseline was assumed to represent the spontaneous firing rate; thus, it was not allowed to be negative. The peak frequency was constrained to lie within the range of stimulus frequencies. The amplitude was not allowed to exceed twice the range of the response.

These fitted curves were used to extract a peak frequency, a low-frequency cutoff, and a high-frequency cutoff, defined as the positions where the tuning curve falls to half its maximum. Where the SFTC was fitted with a Gaussian in linear frequency, with peak at f0 and standard deviation σ, the high and low cuts are $f_0 \pm \sigma\sqrt{2\ln 2}$. Where the tuning curve was fitted with a Gaussian in log frequency, with standard deviation σ in log space, the high and low cuts are given by $\log f = \log f_0 \pm \sigma\sqrt{2\ln 2}$.
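
For a Gaussian profile exp[−(f − f0)²/(2σ²)], the half-height points lie √(2 ln 2) ≈ 1.18 standard deviations either side of the peak. The sketch below applies that fact to both fit types. It assumes that half height is measured on the Gaussian component (above the baseline) and that the log-space σ is expressed in natural-log units; both are assumptions, and the function names are illustrative.

```python
import math

HALF_HEIGHT_FACTOR = math.sqrt(2.0 * math.log(2.0))  # about 1.177 SDs

def cutoffs_linear(f0, sigma):
    """Low and high half-height frequencies for a Gaussian in linear frequency."""
    return f0 - HALF_HEIGHT_FACTOR * sigma, f0 + HALF_HEIGHT_FACTOR * sigma

def cutoffs_log(f0, sigma_ln):
    """Same for a Gaussian in log frequency; sigma_ln is the SD in natural-log units."""
    return (f0 * math.exp(-HALF_HEIGHT_FACTOR * sigma_ln),
            f0 * math.exp(HALF_HEIGHT_FACTOR * sigma_ln))

def gaussian(x, centre, sigma):
    """Unit-height Gaussian, used here only to check the cutoffs."""
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))

low, high = cutoffs_linear(4.0, 1.5)       # hypothetical fit: peak 4 cpd, SD 1.5 cpd
low_log, high_log = cutoffs_log(4.0, 0.8)  # hypothetical fit in log frequency
print(gaussian(high, 4.0, 1.5))            # half height, up to rounding
print(gaussian(math.log(low_log), math.log(4.0), 0.8))
```

Note that for a broad Gaussian in linear frequency the low cutoff can fall at or below zero, which corresponds to low-pass tuning.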

Disparity-tuning curves were fitted with half-wave rectified one-dimensional Gabor functions (the product of a sinusoid with a Gaussian envelope; cf. APPENDIX B). The original energy model predicts a Gabor disparity-tuning curve, provided that the monocular receptive fields are narrow-band Gabor functions differing only in their position and phase. However, our main motivation for using Gabor functions is that they provide a succinct description of most experimental tuning curves (Cumming and Parker 1997; Ohzawa et al. 1990; Prince et al. 2002a). In the RESULTS section, we verify that our conclusions do not depend on the use of a fitted Gabor. One-dimensional Gabors have 6 free parameters: the spatial frequency f and phase φ of the carrier cosine, the standard deviation σ, amplitude A and center δ0 of the Gaussian envelope, and the baseline firing rate B about which the sinusoid oscillates. Uncorrelated responses, if available, were included in the fitting; the expected response to uncorrelated stimuli is just B. δ0 was constrained to lie within the range of stimulus disparities and the amplitude A was not allowed to exceed twice the difference between the maximum and the minimum response. The spatial frequency of the fit was not allowed to exceed half the Nyquist limit (i.e., one-quarter of the maximum spatial sampling rate of the data). Although these curves generally gave good descriptions of the tuning curves, the parameters of the fitted Gabor must be interpreted with care (see Prince et al. 2002b). When using these fits to summarize some property of the tuning curve, we therefore used appropriate measures applied to the fitted curve (illustrated by our measurement of disparity peak frequency in the next section), rather than using the parameters of the fit.
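
The fit function and the quantity minimized can be sketched as follows. This assumes the carrier phase is referenced to the envelope center δ0; the exact parameterization is given in APPENDIX B and may differ. The function names and the example parameter values are hypothetical, and the constrained optimizer itself is omitted.

```python
import math

def gabor(d, f, phi, sigma, amp, d0, base):
    """Unrectified 1-D Gabor tuning curve at disparity d (deg).

    f: carrier frequency (cpd), phi: carrier phase (rad), sigma: envelope SD,
    amp: envelope amplitude, d0: envelope center, base: baseline firing rate.
    """
    envelope = math.exp(-((d - d0) ** 2) / (2.0 * sigma ** 2))
    return base + amp * envelope * math.cos(2.0 * math.pi * f * (d - d0) + phi)

def fit_residual(params, disparities, mean_sqrt_rates):
    """Unweighted sum of squared residuals in square-root firing rate."""
    total = 0.0
    for d, target in zip(disparities, mean_sqrt_rates):
        rectified = max(gabor(d, *params), 0.0)  # firing rates cannot be negative
        total += (math.sqrt(rectified) - target) ** 2
    return total

# Hypothetical parameters: f, phi, sigma, amp, d0, base.
params = (2.0, 0.0, 0.3, 20.0, 0.0, 10.0)
disparities = [-0.6 + 0.1 * i for i in range(13)]
# Noise-free synthetic "data" generated from the same curve.
data = [math.sqrt(max(gabor(d, *params), 0.0)) for d in disparities]
print(fit_residual(params, disparities, data))  # 0.0 for noise-free data
```

A bounded least-squares routine would minimize `fit_residual` subject to the constraints on δ0, A, and f described above.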

Disparity Fourier spectrum

The original energy model predicts that the disparity peak frequency, the frequency at which the disparity modulation has most power, should be the same as the preferred spatial frequency observed with monocular gratings. In making this comparison, the disparity must be applied at right angles to the cell's preferred orientation. Most previous work using random-dot stimuli in awake animals has employed only horizontal disparities. To enable a test of the energy model prediction, all disparities in the present study were applied orthogonal to the cell's preferred orientation (Cumming 2002). It is also important that the disparity tuning is measured with a broadband stimulus such as random dots, to ensure that the disparity tuning curve shape is not trivially determined by the stimulus. If disparity tuning were measured with a grating, for instance, the periodicity of the stimulus would guarantee a periodic response (Cumming and Parker 2000).

The disparity peak frequency is slightly different from the "disparity frequency," a term used by two previous authors in related but distinct senses. Ohzawa et al. (1997) used a bar as the broadband stimulus to test this prediction in the anesthetized cat. They used the term "disparity frequency" to mean the carrier frequency of the Gabor fitted to the disparity tuning curve, which they then compared with the monocular spatial frequency tuning. Note, however, that this carrier frequency does not necessarily equal the disparity peak frequency. Thus their finding that the carrier frequency of fitted Gabors was systematically lower than the preferred spatial frequency in the dominant eye is not necessarily at odds with the energy model. For sufficiently narrow-band Gabor functions, the carrier frequency f and disparity peak frequency coincide (APPENDIX B), but many of the disparity tuning curves presented by Ohzawa et al. appear to be fairly broadband (e.g., their Fig. 15). In this case, the disparity peak frequency and the carrier frequency diverge. Which is higher depends on the phase of the disparity-tuning curve (APPENDIX B). The disparity-tuning curves presented by Ohzawa et al. display phases across the spectrum: thus, both situations occur. Furthermore, for Gabors that are not narrow-band, the fitted carrier frequency is often poorly constrained by data (Prince et al. 2002a; Fig. 6). For these reasons, it is not clear that the data presented by Ohzawa et al. (1997) necessarily violate the energy model.



FIG. 6. Scatter plot of preferred spatial frequency in left eye against that in right, on log axes. Preferred frequency is defined by the peak of the Gaussian fitted to the monocular tuning curve obtained with sinusoidal luminance gratings. Solid line shows identity; dotted lines mark difference in spatial frequency (SF) tuning of 1 octave. Filled symbols indicate 25 cells whose preferred frequencies in left and right eyes differed significantly (P < 0.05, resampling). Circles indicate cells from monkey Duf, squares from monkey Ruf. Cells shown in Fig. 7 are indicated. Two cells that had very low preferred spatial frequencies fall outside the range shown in the figure.

 



FIG. 7. Two example cells (A: duf156; B: ruf127) that respond well to grating stimulation in either eye, but show a particularly extreme difference between spatial frequencies evoking maximum response. Triangles show mean firing rate to monocular drifting luminance gratings, as a function of spatial frequency. Error bars are ±SE. Curves are Gaussians fitted to these data; peaks are indicated with vertical lines. Shaded regions show 95% confidence interval for this peak (estimated by refitting resampled data sets). Left eye: leftward open arrowheads (◁), dotted line; right eye: rightward filled arrowheads (▶), solid line.

 
Prince et al. (2002b) used the term "disparity frequency" to refer to the peak frequency of the Fourier transform of the disparity-tuning curve after subtraction of the mean, meaning that for these authors a disparity frequency of zero is impossible by definition. The disparity frequency of Prince et al. (2002b) was designed as a way of extracting a measure of the spatial scale of disparity tuning that would work for both band-pass and low-pass tuning curves. Neither sense of disparity frequency provides the appropriate measure for comparison with monocular spatial frequency tuning: hence our use of the disparity peak frequency.

We compared two different ways of extracting the disparity peak frequency. The first was completely model-independent; here we used the response to uncorrelated stimuli as an estimate of the baseline firing rate, subtracted that from the disparity tuning curve, and took the continuous Fourier transform of the result (by trapezoidal integration). We also estimated the disparity spectrum from the Gabor function fitted to the tuning curve. When performing the fit, the Gabor function was half-wave rectified; that is, negative values of the fit function were replaced with zeros for the purpose of evaluating the residual (given that firing rates could not be lower than zero). When obtaining the disparity spectrum, we used the unrectified Gabor, and solved numerically for the peak and half-maximal points of the Fourier spectrum.
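
The model-independent estimate can be sketched as below: subtract the baseline, evaluate the continuous Fourier transform by trapezoidal integration over the sampled disparities, and take the frequency with the most power. The function names, the frequency grid, and the synthetic tuning curve are illustrative assumptions; the trapezoid rule is applied interval by interval so that unevenly spaced disparities are handled.

```python
import cmath
import math

def fourier_power(disparities, responses, baseline, freq):
    """Power |F(freq)|^2 of the baseline-subtracted tuning curve.

    F is the trapezoidal estimate of the continuous Fourier transform,
    so the disparity samples need not be evenly spaced.
    """
    total = 0.0 + 0.0j
    for i in range(len(disparities) - 1):
        d0, d1 = disparities[i], disparities[i + 1]
        g0 = (responses[i] - baseline) * cmath.exp(-2j * math.pi * freq * d0)
        g1 = (responses[i + 1] - baseline) * cmath.exp(-2j * math.pi * freq * d1)
        total += 0.5 * (g0 + g1) * (d1 - d0)
    return abs(total) ** 2

def disparity_peak_frequency(disparities, responses, baseline, freqs):
    """Frequency on the grid `freqs` at which the disparity spectrum has most power."""
    return max(freqs, key=lambda f: fourier_power(disparities, responses, baseline, f))

# Synthetic narrow-band tuning curve: 2 cpd carrier, 0.3 deg envelope SD (assumed values).
disparities = [-1.2 + 0.05 * i for i in range(49)]
baseline = 10.0  # stands in for the response to uncorrelated dots
responses = [baseline + 8.0 * math.exp(-d * d / (2.0 * 0.3 ** 2))
             * math.cos(2.0 * math.pi * 2.0 * d) for d in disparities]
freqs = [0.05 * i for i in range(101)]  # 0 to 5 cpd, below the 10 cpd Nyquist limit
peak = disparity_peak_frequency(disparities, responses, baseline, freqs)
print(peak)
```

Because this synthetic curve is narrow-band, the peak falls at the carrier frequency; for broadband curves the two can differ, as discussed above.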

Bootstrap resampling

To interpret scientific results, it is important to have an estimate of significance, to be sure that features we observe in our data are not merely the result of the vagaries of finite sampling. Throughout this study we have used bootstrap resampling (Efron 1979) to estimate significances. Given a data set consisting of n samples of the random variable, one generates a "new data set" by randomly selecting a member of this data set n times (with replacement). This provides a convenient, nonparametric way to estimate the distribution of some function of a random variable, avoiding the normality assumptions buried in many standard statistical tests. For resampling to be reliable, n must be large. This was one motivation for presenting stimuli for relatively short periods: it provided a large number of independent samples. To increase n further, we pooled the data across all disparities (or spatial frequencies, for the grating stimuli) and resampled the residuals. For this pooling to be valid, the SD must be the same at each disparity, so, as before, we transformed each datum by taking its square root. That is, for each disparity we calculated the mean of the square root of the firing rate, and the residual difference between this mean and the square root of the firing rate on each stimulus presentation. We then pooled all these residuals into a single population. To generate a resampled datum, we picked a residual at random from this pool, added it to the mean square-root firing rate, and squared it to obtain the resampled firing rate. We also explored resampling the data for each stimulus condition separately and found that this gave closely similar results. In the few cases where the results were different, the method of resampling the residuals generally gave the wider confidence interval. Because this yields more conservative estimates for significance testing, resampling of residuals was adopted throughout. This meant that the effective n was always >80 for the SFTCs and always >200 for the disparity tuning. All quoted significances are at the 5% level.
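
The pooled-residual resampling described above can be sketched as follows. The data layout (a dictionary from stimulus condition to single-trial firing rates), the function names, and the simple percentile rule used for the 95% interval are assumptions for illustration; the text does not specify how intervals were read off the resampled distribution.

```python
import math
import random

def resample_tuning_curve(trials_by_condition, rng):
    """One bootstrap data set built by resampling pooled square-root residuals."""
    sqrt_trials = {c: [math.sqrt(r) for r in rates]
                   for c, rates in trials_by_condition.items()}
    means = {c: sum(v) / len(v) for c, v in sqrt_trials.items()}
    # Pool residuals from every condition into a single population.
    pooled = [x - means[c] for c, v in sqrt_trials.items() for x in v]
    resampled = {}
    for c, v in sqrt_trials.items():
        # Mean sqrt rate plus a random pooled residual, squared back to a firing rate.
        resampled[c] = [(means[c] + rng.choice(pooled)) ** 2 for _ in v]
    return resampled

def bootstrap_interval(trials_by_condition, statistic, n_boot, rng):
    """95% percentile interval for `statistic` (assumed percentile method)."""
    values = sorted(statistic(resample_tuning_curve(trials_by_condition, rng))
                    for _ in range(n_boot))
    return values[int(0.025 * n_boot)], values[int(0.975 * n_boot) - 1]

def mean_rate_at_zero(trials):
    """Example statistic: mean firing rate at zero disparity."""
    return sum(trials[0.0]) / len(trials[0.0])

# Hypothetical firing rates (spikes/s) at three disparities.
data = {-0.2: [9.0, 16.0, 12.0, 10.0],
        0.0: [36.0, 49.0, 40.0, 44.0],
        0.2: [4.0, 9.0, 6.0, 5.0]}
low, high = bootstrap_interval(data, mean_rate_at_zero, 1000, random.Random(0))
print(low, high)
```

In practice the statistic would be whatever is being tested, for example the peak of a Gaussian refitted to each resampled SFTC.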

Classification as simple or complex

Within the energy model, complex cells are viewed as being made up from the summed output of several simple cells (Eq. 3 below). Our analysis holds for both simple and complex cells and our conclusions do not depend on a classification of cells as either simple or complex. For this reason we have not treated simple and complex cells differently, and hence avoided the complications of attempting to make the classification in awake animals in the face of small eye movements.

Data set

We recorded monocular and binocular responses to random-dot stimuli in 210 neurons, at eccentricities between 2° and 10°. Of these, 180 produced a maximum firing rate of at least 10 spikes/s; 138/180 were disparity-selective. Adequate data on spatial frequency tuning were available for 101/138 disparity-selective cells, and in an additional 23 disparity-selective neurons we had data on spatial frequency tuning but not monocular responses to random dots.

The energy model and our modified version

This study represents a critical comparison of the energy model (Adelson and Bergen 1985; Fleet et al. 1996; Ohzawa 1998; Ohzawa et al. 1990; Qian 1994) and our modified version of it, introduced to explain the weaker response to anticorrelated stimuli (Read et al. 2002). In this section, we lay out the key features of both models and explain how they differ. Detailed calculations are given in the APPENDICES.

The building blocks of all the models considered in this study are binocular subunits characterized by a receptive field in each eye, which performs a linear operation on the retinal image in that eye. The input from each eye, νL or νR, is the result of this operation (for details, see APPENDIX C, Eq. C2). The distinctive feature of the energy model is that the inputs from the two eyes are combined linearly: the response of a binocular subunit is a function of the sum (νL + νR) of the inputs from each eye separately. If this sum is negative, the binocular cell is silent because it cannot signal firing rates below zero. If this sum is positive, the energy model postulates that the binocular cell outputs the square of this sum. Thus, writing C for the output of the disparity-selective cell

$$C = \left[\mathrm{Pos}(\nu_L + \nu_R)\right]^2 \tag{2}$$
where Pos denotes half-wave rectification. A complex cell is assumed to receive input from several of these half-squared linear binocular subunits, and its response is assumed to be the linear sum of its inputs

$$C_x = \sum_i \left[\mathrm{Pos}(\nu_{Li} + \nu_{Ri})\right]^2 \tag{3}$$
This is shown schematically in Fig. 1A. Binocular subunits ("BS") are shown receiving input from left and right eye receptive fields, which for illustration are shown with different phases. Several of these subunits feed into a single complex cell ("Cx").

Our modified version (Read et al. 2002) differs from the energy model in postulating that inputs from the two eyes are half-wave rectified before being combined

$$C = \left[\mathrm{Pos}(\nu_L) + \mathrm{Pos}(\nu_R)\right]^2 \tag{4}$$

Figure 1B shows one physiologically plausible implementation of this nonlinearity. In the figure, inputs from the left and right eyes initially synapse onto monocular simple cells ("MS"), which impose an output threshold, before being combined in a binocular subunit. If the inputs are combined with an inhibitory synapse, as in the lower binocular subunit in Fig. 1B, we obtain units like

$$C = \left\{\mathrm{Pos}\left[\mathrm{Pos}(\nu_L) - \mathrm{Pos}(\nu_R)\right]\right\}^2 \tag{5}$$
(the additional Pos means that the cell does not fire when suppression from the right eye exceeds excitation from the left). Once again, complex cells are constructed from the sums of several binocular units of the type given in Eq. 4 and Eq. 5 (Read et al. 2002). The distinction between the two types does not matter in the energy model: there is no need to explicitly include subtypes based on (νL − νR) as well as (νL + νR), because (νL − νR) is equivalent to (νL + νR) with a phase change of π in the right eye's receptive field.
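
The subunit types described in this section can be written out as a short Python sketch. The function names are hypothetical. Only the inhibitory unit (Eq. 5) is given explicitly in the text, so the forms used here for Eqs. 2-4 are read off the verbal descriptions above and should be treated as assumptions.

```python
def pos(x):
    """Half-wave rectification: pass positive values, clip negatives to zero."""
    return x if x > 0.0 else 0.0


def energy_subunit(v_left, v_right):
    """Energy model (Eq. 2): linear binocular sum, then half-squaring."""
    return pos(v_left + v_right) ** 2


def threshold_subunit(v_left, v_right):
    """Modified model (Eq. 4): each eye is rectified before summation."""
    return (pos(v_left) + pos(v_right)) ** 2


def inhibitory_subunit(v_left, v_right):
    """Modified model (Eq. 5): rectified right-eye input is subtracted."""
    return pos(pos(v_left) - pos(v_right)) ** 2


def complex_cell(subunit_outputs):
    """Complex cell (Eq. 3): linear sum of its subunits' outputs."""
    return sum(subunit_outputs)


# In the energy model, subtracting the right-eye input is the same as adding
# the input of a right-eye receptive field with inverted sign (phase shift of pi).
v_l, v_r = 0.8, -0.3
same_in_energy_model = pos(v_l - v_r) ** 2 == energy_subunit(v_l, -v_r)
# With monocular thresholds the two unit types are genuinely different.
differ_with_threshold = inhibitory_subunit(v_l, v_r) != threshold_subunit(v_l, -v_r)
```

The final two lines illustrate why the inhibitory subtype has to be listed separately only in the thresholded model.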


    RESULTS
 
Overview

DISPARITY SELECTIVITY AND MONOCULARITY. We present evidence that some cells receive purely suppressive input from one eye. We show that this is inconsistent with the linear binocular combination of the energy model, but can be explained in our nonlinear model.

SPATIAL FREQUENCY TUNING IN THE TWO EYES. Motivated by the assumption of the original energy model that receptive fields are identical up to phase, we investigate whether there is evidence for differences in spatial frequency tuning between eyes. We find that tuning in most cells agrees well, but a minority show significant differences.

DISPARITY FREQUENCY AND SPATIAL FREQUENCY TUNING. If the assumptions of the original energy model hold, then the disparity frequency should equal the preferred spatial frequency in the dominant eye. We show that this prediction is systematically violated, and that vergence movements cannot account for the difference. However, this prediction applied only to the original energy model, which included strong constraints on the receptive field profiles in addition to linear binocular combination.

GENERALIZING THE ENERGY MODEL. We therefore generalize the energy model to allow for receptive fields with different phases, positions, and spatial frequency tuning (both across subunits, and across eyes within a subunit). We derive a constraint that even this generalized model must fulfill. We show that the data systematically violate this constraint.

THRESHOLDING BEFORE BINOCULAR COMBINATION. We finally show that our modified version of the energy model, in which a threshold precedes binocular combination, can account for the observations on disparity and spatial frequency tuning.

Disparity selectivity and monocularity

SOME CELLS RECEIVE PURELY SUPPRESSIVE INPUT FROM ONE EYE. Cells that are sensitive to binocular disparity must receive information from both eyes. It is tempting to extrapolate from this that the cells that are most sensitive to binocular disparity must be those that respond most nearly equally to input in either eye. However, previous investigators (Ohzawa and Freeman 1986b; Poggio and Fischer 1977; Prince et al. 2002b; Smith et al. 1997) have found little support for this idea. In agreement with these studies, we find no relationship between monocularity and disparity selectivity. Many cells that respond nearly equally to monocular stimulation in either eye are not disparity selective, whereas many cells that show little or no response to monocular stimulation in one of the eyes nevertheless show clear disparity tuning. Examples are shown in Fig. 3 (see also Fig. 8). The response to monocular stimulation is shown by the broken horizontal lines labeled L and R (and marked with a leftward- or rightward-pointing arrowhead, respectively: ◁ and ▷). In duf096 (Fig. 3A), monocular stimulation in the left eye evokes almost no response; in duf099 (Fig. 3B), it is the right eye that is silent. Yet the black dots show the cells' responses as a function of disparity (curve = fitted Gabor); clearly both cells are selective to disparity, and so must be receiving information from both eyes. Thus, it is important to distinguish between two common uses of the term "monocular": the classical sense, "responsive to monocular stimulation in one eye and not the other," must not be interpreted to mean "receiving input from only one eye" (Ohzawa and Freeman 1986a,b; Smith et al. 1997).



FIG. 3. Two tuned-inhibitory cells that show evidence of an inhibitory input from one eye. Stimulation in the nondominant eye seems always to reduce firing rate: the response to monocular random dots in the nondominant eye is less than that to a blank screen, whereas the response to binocularly uncorrelated random dots is less than that to monocular random dots in the dominant eye. Disparity discrimination index = 0.61 for duf096 (A), 0.64 for duf099 (B). Filled circles (•) represent the mean firing rate as a function of disparity; the black curve is the fitted Gabor. Horizontal lines represent responses to stimuli without disparity: line labeled L and marked with leftward arrowhead ◁ indicates response to monocular random dots in left eye (R▷: right eye); U□ indicates response to binocularly uncorrelated random dots; S○ indicates spontaneous rate (response to gray screen of same mean luminance). All error bars are ±SE.

 


FIG. 8. Two disparity-tuned cells (A, C: duf092; B, D: ruf065) that seem to show inhibitory influence from one eye. A, B: spatial frequency tuning (symbols as in Fig. 7); C, D: disparity tuning (symbols as in Fig. 3). For ruf065.0, disparity-tuning curve is not well fitted by a Gabor (fit explained <60% of variance); fit is therefore not shown. Disparity discrimination index = 0.58 for duf092 (C), 0.52 for ruf065 (D). For both cells, the maximum binocular response is less than the response to monocular stimulation in the dominant eye. Thus, adding dots to the nondominant eye, at any disparity, always reduces the response.

 

One natural way to explain the phenomenon of disparity selectivity in "monocular" neurons is to propose that the input from one eye always has a net inhibitory effect, and thus no spikes are produced by stimulation in that eye alone. In the absence of complications such as response normalization (which could adjust the response to monocular stimuli relative to binocular), such a scheme makes two predictions. First, binocularly uncorrelated dots should produce a weaker response than monocular dots in the dominant eye (because adding dots to the other eye produces net inhibition). This was the case for 86/138 disparity-tuned cells (significant in 44). Second, the monocular response in the nondominant eye should not be significantly greater than the spontaneous response (it is rarely possible to observe a monocular response less than spontaneous, given that the latter is so frequently indistinguishable from zero); 30/138 disparity-selective neurons showed both these phenomena. In 9/30 cells, the spontaneous response was significantly greater than zero, so that if one eye had an inhibitory influence, it would be possible to observe suppression of the spontaneous response when this eye was stimulated. In 5 of these 9, the response to random-dot stimulation in the nondominant eye was smaller than the spontaneous response. The cells shown in Fig. 3 are two examples. The broken line labeled U and marked with a square (□) shows the response to uncorrelated stimuli; in both cells this is below the response to monocular stimulation in the dominant eye (i.e., adding stimulation to the nondominant eye has reduced the response). The broken line labeled S and marked with a circle (○) shows the spontaneous rate, estimated from the response to a blank screen: in both cells, the cell fires more to a blank screen than to monocular stimulation in the nondominant eye. Such examples are clearly indicative of a predominantly inhibitory input from one eye. We conclude that in many cells, stimulation in one of the eyes always has a suppressive effect.

THIS INDICATES A NONLINEARITY BEFORE BINOCULAR COMBINATION. It is important to note that this represents a substantial deviation from any model in which binocular summation is linear. By definition, a model with linear binocular summation is of the form C = f(νL + νR), where f is an arbitrary function and νL and νR represent the inputs from left and right eyes, respectively. If f(νL) is never positive for any value of the input νL, either positive or negative (no possible stimulus in the left eye elicits a positive response), then f(νR) can never be positive either, and so the cell would never respond. In a linear model, if the cell responds at all, then stimulation in each eye can exert either a suppressive or an enhancing effect, depending on the stimulus. To obtain the situation where one eye always exerts a suppressive effect, we must postulate some nonlinearity before binocular combination, such as half-wave rectification followed by an inhibitory synaptic connection. This is exactly what is proposed by our modified version of the energy model. Looking at Eq. 5, C = {Pos[Pos(νL) − Pos(νR)]}², it is obvious that stimulation in the right eye always has a suppressive effect. For monocular right-eye stimulation, the response C is zero, and yet with disparate binocular stimuli, this unit is disparity-selective. Figure 4 shows simulations of two subunits described by Eq. 5. The solid line shows the disparity tuning curve. In A, the left and right receptive fields are identical, so, because input from one eye is inhibitory, the disparity tuning curve is of the tuned-inhibitory class. In B, the left and right receptive fields are 180° out of phase. When combined with the inhibitory synapse in Eq. 5, this results in tuned-excitatory disparity tuning. This demonstrates that an inhibitory synapse at binocular combination does not necessarily result in tuned-inhibitory tuning. Thus, our thresholding model explains the existence of cells that would classically be called "monocular" and yet are disparity-selective.
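
The contrast drawn above can be checked numerically. The sketch below is illustrative only: it draws random input pairs from an arbitrary uniform range (an assumption, not a model of real receptive-field outputs), using the half-squared energy unit of Eq. 2 as the example of linear binocular summation and the inhibitory unit of Eq. 5.

```python
import random


def pos(x):
    """Half-wave rectification."""
    return max(x, 0.0)


def inhibitory_unit(v_left, v_right):
    """Eq. 5: thresholded inputs, right eye inhibitory."""
    return pos(pos(v_left) - pos(v_right)) ** 2


def energy_unit(v_left, v_right):
    """Eq. 2: linear binocular sum, then half-squaring."""
    return pos(v_left + v_right) ** 2


rng = random.Random(0)
inputs = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(10000)]

# Eq. 5: adding right-eye input never raises the response above the
# response to the same left-eye input presented alone.
right_always_suppressive = all(
    inhibitory_unit(vl, vr) <= inhibitory_unit(vl, 0.0) for vl, vr in inputs
)

# Energy unit: the right eye can either raise or lower the response,
# depending on the sign of its input.
raises = any(energy_unit(vl, vr) > energy_unit(vl, 0.0) for vl, vr in inputs)
lowers = any(energy_unit(vl, vr) < energy_unit(vl, 0.0) for vl, vr in inputs)
```

With these definitions the Eq. 5 unit is also silent for any purely right-eye input, since `inhibitory_unit(0.0, v)` is zero for every `v`.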



FIG. 4. Our new model, incorporating a threshold nonlinearity before binocular combination, can explain disparity selectivity in classically monocular cells, and cells in which the response to binocularly uncorrelated dots (U□) is close to the mean of the responses to monocular stimuli (L◁, R▷), rather than to their sum as predicted by the energy model. Both plots show simulations for a single binocular subunit receiving inhibitory input from the right eye (Eq. 5); mean response over 100,000 different random-dot patterns. A: monocular receptive fields are identical, so the inhibitory synapse in Eq. 5 results in a tuned-inhibitory cell. B: left eye's receptive field is inverse of right eye's receptive field, so with the inhibitory synapse this results in a tuned-excitatory cell.

 

A THRESHOLDING NONLINEARITY CAN EXPLAIN THE RELATIVE AMPLITUDE OF MONOCULAR AND BINOCULAR RESPONSES. We now investigate the extent to which this model can account quantitatively for the relative amplitude of monocular and binocular responses. Prince et al. (2002b) observed that the response to binocularly uncorrelated dot patterns was often close to the mean of the responses to monocular stimulation in the two eyes, whereas the energy model predicts that it should be their sum. Prince et al. suggested that this could be attributable to a normalization process that lowers the response to binocular stimuli. However, our modification to the energy model already allows us to build cells in which the uncorrelated response is the mean of the 2 monocular responses, without incorporating any normalization. The horizontal lines in Fig. 4 show the response to monocular stimuli (L◁, R▷) and binocularly uncorrelated stimuli (U□). In both cases, the uncorrelated response is close to the mean of the monocular responses, demonstrating that our model can explain this phenomenon, for both tuned-excitatory and tuned-inhibitory cells.

These simulations portray something of an extreme case: in both these examples, inhibition from the suppressive eye is much stronger than excitation from the excitatory eye, so that the response to monocular stimulation in the dominant eye, M, is nearly twice the response U to uncorrelated stimuli. In fact, 2U is an upper bound for M: our model predicts that M can never exceed 2U. The energy model has a similar upper bound: it predicts that M can never exceed U. We have seen that the energy model's upper bound is violated by most cells (86/138). We now investigate whether the upper bound predicted by our model is similarly violated. Figure 5 shows the distribution of M/U for the 138 disparity-selective cells in our data set. The vertical lines mark the upper bounds predicted by the energy model (dashed) and our model (solid). The mode of the distribution is close to M/U = 1, so over half the cells exceed the energy model upper bound. However, the distribution begins to fall off after M/U = 2, so that the upper bound predicted by our model is violated in only 23/138 cells. We used resampling to estimate the 95% confidence interval for M/U. If this interval lies entirely above 1, we can be 95% confident that the upper bound predicted by the energy model is violated; this was the case for 44/138 cells (32%), shaded gray in Fig. 5. If this interval lies entirely above 2, we can be 95% confident that the upper bound predicted by our model is violated; this was so for only 4/138 cells (3%), shaded black in Fig. 5. We conclude that almost all cells respond to monocular stimulation in the dominant eye at less than twice the rate for uncorrelated stimuli, and can therefore be accommodated within our modified model. Thus, our model can explain the observed spectrum of monocular and binocular response rates, without needing to invoke other mechanisms such as contrast normalization.
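
The test against the two upper bounds can be sketched as follows. This is a simplified illustration: it resamples trials directly with a percentile bootstrap rather than resampling pooled square-root residuals as described in METHODS, and the names `ratio_interval` and `classify`, the number of bootstrap draws, and the example rates are all hypothetical.

```python
import random


def ratio_interval(mono, uncorr, n_boot=2000, seed=0):
    """Percentile-bootstrap 95% interval for M/U = mean(mono) / mean(uncorr)."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(n_boot):
        m = [rng.choice(mono) for _ in mono]
        u = [rng.choice(uncorr) for _ in uncorr]
        mean_u = sum(u) / len(u)
        if mean_u > 0.0:  # skip draws where the ratio is undefined
            ratios.append((sum(m) / len(m)) / mean_u)
    if not ratios:
        raise ValueError("uncorrelated response is zero in every resample")
    ratios.sort()
    lo = ratios[int(0.025 * (len(ratios) - 1))]
    hi = ratios[int(0.975 * (len(ratios) - 1))]
    return lo, hi


def classify(lower_limit):
    """Compare the lower confidence limit of M/U with the two predicted bounds."""
    if lower_limit > 2.0:
        return "violates both bounds"
    if lower_limit > 1.0:
        return "violates energy-model bound only"
    return "consistent with both bounds"


# Made-up single-trial rates (spikes/s) for one cell.
lo, hi = ratio_interval([30.0] * 20, [20.0] * 20)
verdict = classify(lo)
```

A cell counts as a significant violation only when the whole interval, and hence its lower limit, lies above the bound in question.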



FIG. 5. Frequency histogram for ratio of response to random-dot stimulation presented monocularly to dominant eye (M) to response to uncorrelated random-dot patterns presented binocularly (U). Dashed vertical line marks M/U = 1, upper bound predicted by energy model; solid vertical line marks M/U = 2, upper bound predicted by our modified version. Shading indicates cells for which we can be 95% confident that ratio exceeds upper bound: gray shows cells where the 2.5th percentile exceeded 1, black where it exceeded 2. Thus, gray + black regions indicate 44/138 cells that significantly violate upper bound predicted by energy model; black regions indicate 4/138 cells that significantly violate upper bound predicted by our model.

 

Spatial frequency tuning in the two eyes

The original implementation of the energy model (Ohzawa et al. 1990) assumed that all receptive fields have the same spatial frequency and orientation tuning and bandwidth. They differ only in their amplitude, their position, and phase, and even so, the position and phase disparity between left and right receptive fields of a single subunit is assumed to be the same for all subunits. These constraints on the receptive fields have been assumed by all implementations of the energy model we are aware of (e.g., Fleet et al. 1996; Lippert and Wagner 2001; Ohzawa et al. 1997; Qian 1994; Read 2002; Tsai and Victor 2003). We shall use the phrase original energy model to denote Eq. 3 with these additional constraints on the receptive fields. (Later, we shall consider a generalized energy model in which many of these constraints are relaxed.)

The available evidence suggests that these constraints are generally observed in simple cells (Anzai et al. 1999b; Ohzawa et al. 1996). In complex cells, the situation is harder to assess. Preferred orientation is observed to be closely matched between the two eyes (Bridge and Cumming 2001), supporting the view that all receptive fields share the same orientation. However, there is some evidence from the cat suggesting that there may be a population of cells in which spatial frequency differs between the two eyes (Hammond and Pomfrett 1991; Ohzawa et al. 1996). In this section, we investigate the agreement in spatial frequency tuning for our monkey data.

For 151 cells, the spatial frequency tuning to monocular drifting sinusoidal luminance gratings at the cell's preferred orientation was probed in both eyes; 84 of these were sufficiently responsive and selective to permit fitting in both eyes. We defined the preferred spatial frequency to be the frequency at which the Gaussian fitted to the tuning curve had its peak. To ensure this is meaningful, we required the fits in each eye to explain more than 60% of the variance of the tuning curve data. Figure 6 compares the preferred spatial frequency in the two eyes for the remaining 73/84 cells. The solid line shows the identity; the dotted lines mark a difference in SF tuning of 1 octave. Clearly, spatial frequency tuning is usually well matched between eyes. The correlation coefficient is 0.87 (P < 10⁻⁵). Nonetheless, 25/73 cells showed a significant difference (P < 0.05, by resampling) in preferred spatial frequency between the eyes; these are colored black in Fig. 6. There was no correlation between the difference in preferred spatial frequency and the difference in peak response between the two eyes. The figure of 25 includes some cells where the difference in preferred frequency was small (but turned out to be significant because the peak positions were robust under resampling). However, for 6/25, the peak firing rates in the two eyes occurred for gratings differing in frequency by over an octave. Two examples are shown in Fig. 7. The arrowheads show the response of the cell to monocular grating stimuli as a function of the grating spatial frequency (L: ◁, R: ▶); the curves show the fit. The 95% confidence interval for the peak of the fitted function is shaded. The confidence intervals for the two eyes do not come close, indicating significant and substantial differences in spatial frequency tuning between the two eyes. About 10% of cells showed evidence of such a difference.

The selection criteria applied in obtaining Fig. 6 exclude an interesting class of cells in which the response in the nondominant eye was very weak, but was maximal at those frequencies that produced the weakest responses in the dominant eye. Two examples are illustrated in Fig. 8, A and B. On the face of it, these cells show a severe mismatch in spatial frequency tuning, with the nondominant eye being tuned to frequencies an order of magnitude lower than the dominant eye. However, we believe a more plausible explanation is that the spatial structure of the receptive fields is really similar in the two eyes (as in the vast majority of cells, Fig. 6), but that the nondominant eye exerts a suppressive effect. This interpretation is supported by the experiments with random-dot patterns. Both these cells are disparity-selective, but also show virtually no response to random dots in the nondominant eye (Fig. 8, C and D). Thus, such cases are further evidence for purely inhibitory input from one eye.

Disparity frequency and spatial frequency tuning

We now turn to possibly the most important prediction of the energy model: the shape of monocular receptive fields determines the shape of the disparity-tuned response (Anzai et al. 1999b; Ohzawa et al. 1997). Because most cells do indeed show similar spatial frequency and orientation tuning in the two eyes, we shall assume in this section that the assumptions of the original energy model hold true. Then, the original energy model predicts that the disparity-tuning curve is simply the cross-correlation of the receptive fields in the left and right eyes.

For simple cells, which are single binocular subunits (Eq. 2), this prediction can be tested directly. For complex cells, which represent the sum of several binocular subunits (Eq. 3), the disparity tuning curve is predicted to be the sum of the cross-correlations of the receptive fields in the component subunits. This makes the prediction hard to test in complex cells because it is difficult to obtain the receptive fields of the component subunits experimentally. Fortunately, provided all subunit receptive fields have the same preferred orientation, the comparison can be made without a direct measurement of receptive field profile. We simply need to obtain 1) the cell's response to binocular random-dot patterns as a function of disparity along an axis orthogonal to this preferred orientation, and 2) the cell's response to monocular sinusoidal gratings oriented parallel to this preferred orientation, as a function of spatial frequency. The energy model predicts that the shape of the Fourier amplitude spectrum of the disparity tuning curve measured in 1) will be given by the monocular spatial frequency tuning curves measured in 2). In particular, their peaks should coincide: that is, the disparity peak frequency, defined as the position of the peak in the Fourier amplitude spectrum, should be the preferred spatial frequency of the cell. This key prediction of the original energy model, which depends critically on its linear properties, holds for both simple and complex cells. Previous work (Ohzawa et al. 1997; Prince et al. 2002b) has suggested that this prediction is not fulfilled, but, as discussed above, these studies leave open a number of possible ways in which the data could be reconciled with the energy model. We carried out a detailed comparison, using bootstrap resampling to estimate the significance of any discrepancy.
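
The disparity peak frequency defined above can be estimated from a sampled tuning curve along the following lines. This is a sketch under stated assumptions: the continuous Fourier transform is approximated by a Riemann sum over evenly spaced disparities, the function names are hypothetical, and the tuning curve is a synthetic Gabor rather than recorded data.

```python
import math


def amplitude_spectrum(disparities, responses, baseline, freqs):
    """Fourier amplitude of (response - baseline) at each frequency in freqs.

    Approximates the continuous transform by a Riemann sum, assuming evenly
    spaced disparities (deg) and frequencies in cycles/deg.
    """
    step = disparities[1] - disparities[0]
    amps = []
    for f in freqs:
        re = sum((r - baseline) * math.cos(2 * math.pi * f * d)
                 for d, r in zip(disparities, responses))
        im = sum((r - baseline) * math.sin(2 * math.pi * f * d)
                 for d, r in zip(disparities, responses))
        amps.append(step * math.hypot(re, im))
    return amps


def peak_frequency(freqs, amps):
    """Frequency at which the amplitude spectrum is largest."""
    return freqs[max(range(len(amps)), key=lambda i: amps[i])]


# Synthetic Gabor tuning curve: 2 cycles/deg carrier, 0.4 deg envelope SD,
# on a baseline of 20 spikes/s (all values made up for illustration).
disparities = [-1.5 + 0.05 * k for k in range(61)]
responses = [
    20 + 15 * math.exp(-d * d / (2 * 0.4 ** 2)) * math.cos(2 * math.pi * 2.0 * d)
    for d in disparities
]
freqs = [0.05 * k for k in range(101)]
spectrum = amplitude_spectrum(disparities, responses, 20.0, freqs)
disparity_peak = peak_frequency(freqs, spectrum)
```

Under the original energy model, the value of `disparity_peak` obtained from real data would be expected to match the peak of the monocular spatial frequency tuning curve.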

Figure 9 shows the comparison for 3 neurons, illustrating the common patterns observed. The left-hand column shows the disparity tuning curves. On the right, the Fourier amplitude spectrum of the disparity-modulated component is compared with the spatial frequency tuning in the dominant eye. For both these quantities, two estimates are shown: one from the raw data and one from the fitted function. The raw SFTCs are shown with filled circles (•) in the plots on the right, whereas the fits are drawn with the black curve. The disparity-modulated component can be estimated from the raw data by subtracting the mean response to uncorrelated stimuli (horizontal line labeled with the letter U and the symbol □ in the left-hand plots) from the mean response of the cell to random-dot stereograms at different disparities [filled circles (•) in the plots on the left]. The Fourier spectrum of this is shown on the right with a dotted gray line ["FT-DMC (data)" in the legend]. Alternatively, the disparity-modulated component can be obtained from the fitted Gabor (solid curve in the left-hand plots). The Fourier spectrum of this is shown on the right with a dashed gray line ["FT-DMC (fit)"].



FIG. 9. Comparing disparity and spatial frequency tuning. Left: disparity-tuning curves (symbols as in Fig. 3). For ruf072 (A), dashed curve shows unrectified Gabor, which dips below zero. Right: Fourier transform of disparity-modulated component of disparity-tuning curve (FT-DMC; gray) compared with spatial frequency tuning curve in dominant eye (SFTC; black), scaled to same peak value. Black dots: mean firing rate as function of spatial frequency; black curve: log-Gaussian fitted to these data. Dotted gray line ["FT-DMC (data)"]: Fourier amplitude spectrum of raw disparity-tuning curve minus mean response to uncorrelated stimuli. Dashed gray line ["FT-DMC (fit)"]: Fourier amplitude spectrum of fitted Gabor minus fitted baseline. A: ruf072: FT-DMC resembles SFTC. B: duf065: FT-DMC appears shifted toward lower frequencies than SFTC. C: duf067: FT-DMC is low-pass, SFTC is band-pass.

 

In a few cases (Fig. 9A) the Fourier transform of the disparity-modulated component (FT-DMC) did closely resemble the SFTC, but for the majority of cases there were substantial discrepancies, of two types. First, the peak of the FT-DMC was often at a lower frequency than the peak of the SFTC (Fig. 9B). Second, the FT-DMC was often close to low-pass in form, despite a clear band-pass SFTC (Fig. 9C).

We had 105 disparity-selective neurons that were sufficiently responsive to gratings in the dominant eye, selective for spatial frequency, and adequately described (>60% variance explained) by the Gaussian fit. To avoid making the assumption that all disparity tuning curves were well described by Gabors, we first used a model-independent estimate of the disparity peak frequency, using the response to uncorrelated stimuli as an estimate of the baseline of the disparity tuning curve, and taking the continuous Fourier transform of the raw data. We compared this estimate of disparity peak frequency with the SFTC peak frequency of the Gaussian fit, for the 105/112 cells in which the response to uncorrelated stimuli was available and in which the Gaussian fitted to the SFTC explained >60% of the variance. The disparity peak frequency was less than the SFTC peak frequency in 84/105 cells [P < 10⁻⁹ under the null hypothesis that the estimated disparity peak frequency is as likely to be above the SFTC peak frequency as below it (binomial distribution)]. The frequency difference was individually significant in 43/84 cells.
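
The binomial (sign-test) probability quoted above can be computed exactly with integer arithmetic. The sketch below evaluates the upper tail of a Binomial(n, 0.5) distribution; the function name is hypothetical, and whether the quoted P value was one- or two-sided is not stated in the text, so the one-sided tail shown here is an assumption.

```python
from math import comb


def binomial_upper_tail(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5), computed exactly from integer counts."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


# Probability of 84 or more cells out of 105 falling on one side of the
# identity line if either side were equally likely.
p_one_sided = binomial_upper_tail(84, 105)
# Doubling gives a two-sided value, because the null distribution is symmetric.
p_two_sided = min(1.0, 2 * p_one_sided)
```

The same function applies to the other binomial comparisons in this section by changing the two counts.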

This model-independent method of extracting the disparity peak frequency has two disadvantages. First, in about 10% of cells, the disparity tuning curve appeared to be truncated by the lower limit of 0 spikes/s. These cells may represent an energy model unit followed by an output threshold. It is possible that the discrepancy between the disparity peak frequency obtained from the raw data, and the SFTC peak frequency, may reflect distortions introduced into the Fourier spectrum by the threshold. For these cells, a better estimate of the underlying response may be gained from the unrectified Gabor corresponding to the half-wave rectified Gabor fitted to the data (shown in Fig. 9A). Second, because the Fourier spectra of raw data are usually noisy and multimodal, it is hard to extract measures of bandwidth. Again, this is solved by using the fitted Gabor.

For those 99 cells in which the Gabor fitted to the disparity tuning curve explained >60% of the variance, we therefore repeated the analysis using the estimate of disparity peak frequency derived from the fit. The results are shown in Fig. 10A, which plots the disparity peak frequency against the preferred spatial frequency in the dominant eye, both derived from the fitted functions. The solid line marks the identity line; according to the energy model, all points should lie on this line. In fact, the SFTC peak frequency was greater than the disparity peak frequency in 80/99 cells (P < 10⁻⁹, binomial), and the difference was significant in 51/80 individual cells (resampling; these are the filled symbols in Fig. 10A). Thus, we obtain very similar results whether we use the fitted Gabor or the raw disparity tuning curve.



FIG. 10. These 4 panels compare the Fourier spectrum of the disparity tuning curve (FT-DMC), after subtraction of the fitted baseline, with the grating spatial frequency tuning in the dominant eye. In each case, the identity line is marked, and filled symbols show cells where properties of disparity tuning curve differed significantly from those obtained with gratings (P < 0.05, resampling). Circles indicate cells from monkey Duf, squares from Ruf. All quantities were estimated from fits to data (99 cells). A: frequency at which Fourier spectrum peaks. B, C: low cut and high cut, i.e., frequencies on either side of peak at which fitted spectrum falls to half its maximum value. In A–C, one cell falls outside range of axes; in each case its FT-DMC values were higher than SFTC values, but this difference was not significant. D: relative power at lowest frequency tested monocularly (i.e., power at this lowest frequency, normalized by the power at the peak of the frequency spectrum).

 

We also examined the high and low cutoff frequencies of the fitted functions. Figure 10, B and C, plots the cutoff frequencies for the disparity tuning curve against those for the SFTCs. Again, the energy model predicts that all points should lie on the identity line (marked with the solid line). In fact, the low cuts differed significantly in 43/99 cells, whereas the high cuts differed in 67/99 (filled symbols). Once again these significant differences nearly all reflect relatively more power at low frequencies in the FT-DMC than in the SFTC.

In many cases, there is so little attenuation of the FT-DMC at low frequencies that the low-cut frequency is not defined (plotted as a low cut of zero). The discrepancy in the response at very low frequencies is made clearer in Fig. 10D, which compares the relative power at the lowest frequency tested monocularly. In 77/99 cells, the FT-DMC contains more power at these low frequencies than the SFTC (P < 10⁻⁶, binomial). In 45/77 cases, this difference is significant (filled symbols). In many cases, the disparity tuning curve is close to Gaussian in form (relative power = 1; i.e., no attenuation at low frequencies). That this occurs in the presence of a band-pass SFTC is a dramatic deviation from the energy model. The band-pass SFTC implies that the receptive fields of the subunits contain both "ON" and "OFF" subregions. At a disparity equal to the separation of the "ON" and "OFF" regions, the contributions from the left and right eyes should be negatively correlated, producing a response that is smaller than the response to uncorrelated dots. These suppressive side lobes in the disparity-tuning curves are often not found. Prince et al. (2002b) also noted that many disparity-tuning curves were Gaussian in form. However, those data were not clearly at odds with the energy model for two reasons. First, disparity was applied horizontally, regardless of receptive field orientation, so it remained possible that suppressive side lobes would have emerged if disparity had been applied orthogonally to the preferred orientation. Second, their data on spatial frequency tuning were not generally sufficient to exclude the possibility of a substantial low-pass component in the monocular SFTC. The present data eliminate both of these difficulties, and clearly indicate a need for more complex processing than the original energy model can provide.

VERGENCE EYE MOVEMENTS ARE UNLIKELY TO EXPLAIN THE MISMATCH. One possible explanation of the mismatch between disparity tuning and spatial frequency tuning is that the monkeys may be making small vergence movements. This would have the effect of introducing jitter into the disparity of each stimulus. Effectively, we would be summing several disparity tuning curves of the form predicted by the energy model, each with a random disparity offset. This tends to smear out the sidelobes, shifting the peak of the disparity power spectrum to lower frequencies. This process is illustrated in Fig. 11. Consider an energy-model binocular subunit, whose receptive fields in both eyes are identical, with no position disparity, both having the profile shown in Fig. 11A. The thin line in Fig. 11B shows the disparity tuning curve that would be obtained for this subunit in the absence of vergence movements. Now suppose the monkey makes random vergence movements, so that the disparity of his actual fixation point relative to the fixation target at any moment is a Gaussian centered on zero. This means that the disparity-tuning curve actually measured is the true curve convolved with this Gaussian. The result is shown with the thick line in Fig. 11B. Figure 11C shows the effect on the Fourier amplitude spectrum of the disparity-modulated component. The thin line shows the power spectrum for the original tuning curve, and the thick line for the observed version contaminated by vergence. The vergence has had two effects: it has shifted the peak of the disparity power spectrum toward DC, and it has greatly reduced the amplitude. For clarity, therefore, the broken line shows the observed disparity power spectrum scaled up to the same amplitude as the uncontaminated one. The same effect would be obtained if the neuron being recorded from represented the sum of several subunits that differed in their position disparities.
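
The size of the shift produced by vergence jitter can be worked out in closed form for an idealized case. The sketch below assumes a Gabor-shaped disparity tuning curve with carrier frequency `f0` and Gaussian envelope SD `sigma`, blurred by Gaussian vergence jitter of SD `jitter_sd`, and considers only the positive-frequency lobe of the spectrum. Convolution with the jitter distribution multiplies the spectrum by a Gaussian centered on zero frequency, which moves the peak to f0·σ²/(σ² + s²). The parameter values are hypothetical and are not taken from the study.

```python
import math


def smeared_peak(f0, sigma, jitter_sd):
    """Peak frequency of the disparity spectrum after Gaussian vergence jitter."""
    return f0 * sigma ** 2 / (sigma ** 2 + jitter_sd ** 2)


def peak_attenuation(f0, sigma, jitter_sd):
    """Peak amplitude after jitter, relative to the jitter-free peak."""
    exponent = (-2 * math.pi ** 2 * f0 ** 2 * sigma ** 2 * jitter_sd ** 2
                / (sigma ** 2 + jitter_sd ** 2))
    return math.exp(exponent)


# Hypothetical cell: 2 cycles/deg carrier, 0.4 deg envelope, 0.2 deg jitter.
shifted = smeared_peak(2.0, 0.4, 0.2)
remaining = peak_attenuation(2.0, 0.4, 0.2)
```

Both effects described in the text appear together: as the jitter grows, the peak moves toward DC and the amplitude at the peak falls.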


