## Abstract

Binocular disparity is an important cue for depth perception. To correctly represent disparity, neurons must find corresponding visual features between the left- and right-eye images. The visual pathway ascending from V1 to inferior temporal cortex solves the correspondence problem. An intermediate area, V4, has been proposed to be a critical stage in the correspondence process. However, the distinction between V1 and V4 is unclear, because accumulating evidence suggests that the process begins within V1. In this article, we report that the pooled responses in macaque V4, but not responses of individual neurons, represent a solution to the correspondence problem. We recorded single-unit responses of V4 neurons to random-dot stereograms of varying degrees of anticorrelation. To achieve gradual anticorrelation, we reversed the contrast of an increasing proportion of dots as in our previous psychophysical studies, which predicted that the neural correlates of the solution to the correspondence problem should gradually eliminate their disparity modulation as the level of anticorrelation increases. Inconsistent with this prediction, the tuning amplitudes of individual V4 neurons quickly decreased to a nonzero baseline with small anticorrelation. By contrast, the shapes of individual tuning curves changed more gradually so that the amplitude of population-pooled responses gradually decreased toward zero over the entire range of graded anticorrelation. We explain these results by combining multiple energy-model subunits. From a comparison with the population-pooled responses in V1, we suggest that disparity representation in V4 is distinctly advanced from that in V1. Population readout of V4 responses provides disparity information consistent with the correspondence solution.

- binocular vision
- correspondence problem
- stereopsis
- V1 and MT
- visual area V4

Stereoscopic perception of surface in depth relies on a matching process that solves the binocular correspondence problem (i.e., which parts of the left-eye image correspond to which parts of the right-eye image; Julesz 1971). Anticorrelated random-dot stereograms (aRDSs; Fig. 1*A*) are used to identify the visual areas that solve the problem. In aRDSs, all the dots in one eye's image are contrast-reversed to eliminate the solution to the correspondence problem. Therefore, disparity selectivity for aRDSs is evidence against the neuron's explicit representation of the correspondence solution. Many individual neurons in the primary visual (V1), middle temporal (MT), and medial superior temporal cortices invert their disparity tuning functions for aRDSs relative to those for correlated RDSs (Cumming and Parker 1997; Krug et al. 2004; Takemura et al. 2001; see Parker 2007 for review). The inverted tuning is consistent with correlation-based disparity encoding such as the disparity energy model (Fig. 1, *B* and *C*; Ohzawa et al. 1997), and neurons with such tuning are unlikely to support surface-in-depth perception. In contrast, neurons in inferior temporal (IT) and anterior intraparietal (AIP) areas lose disparity selectivity for aRDSs almost completely (Janssen et al. 2003; Theys et al. 2012), suggesting that the problem is solved before or at the level of these higher areas in the cortical hierarchy.

Neurons in area V4, an intermediate ventral visual area, show disparity tuning characteristics intermediate between V1 and IT (Kumano et al. 2008; Tanabe et al. 2004). However, close inspection of disparity tuning amplitude to aRDSs reveals a large overlap between V1 and V4 (Cumming and Parker 1997; Tanabe et al. 2004). Furthermore, recent studies show that over half of V1 neurons exhibit response patterns that deviate from the energy model, and that this deviation is systematically biased so that responses to highly unnatural anticorrelated stimuli are attenuated (Haefner and Cumming 2008; Samonds et al. 2013). Therefore, it is unclear how much the processing from V1 to V4 contributes to solving the stereo correspondence problem.

In this study, we examined the disparity representation in V4 by expanding the conventional anticorrelation technique: using not only fully anticorrelated RDSs but also several intermediates between correlated and anticorrelated RDSs (see Doi et al. 2011, 2013 for human psychophysics experiments with the same graded-anticorrelation technique). An example of such an intermediate is an RDS where half of the dots have matched contrasts between the two eyes and the other half are contrast-reversed (Fig. 1*A*, *middle*). This half-matched RDS offers zero binocular correlation but carries the disparity information detectable to a process that operates exclusively on binocularly matched elements (Doi and Fujita 2014). Thus the graded anticorrelation is useful to dissociate two types of disparity representations: one resulting from a simple correlation-based mechanism (correlation-based representation), and the other created by a more sophisticated correspondence process (match-based representation). In correlation-based disparity representation, the amplitude of disparity tuning should reflect the strength (i.e., the absolute value) of the stimulus correlation level, whereas the tuning shape should reflect the sign of binocular correlation (i.e., inverted tuning for anticorrelation; Fig. 1*B*, *left*). In match-based representation, the disparity tuning should decrease its amplitude gradually as the level of anticorrelation is increased, while retaining the tuning shape (Fig. 1*B*, *right*).

Previous neurophysiological studies with anticorrelated RDSs quantified the disparity representation of individual neurons using the amplitude ratio, which is the tuning amplitude for an anticorrelated RDS divided by the amplitude for a normal, correlated RDS (Cumming and Parker 1997; Tanabe et al. 2004). The amplitude ratio as a function of graded anticorrelation should have distinct patterns for correlation-based and match-based representations (Fig. 1*C*). The correlation-based representation will have a V-shaped profile with zero amplitude ratio at 0% correlation, whereas the match-based representation should have a monotonic decrease. The key feature of the latter prediction is a monotonic decrease over a wide range of binocular correlation, not a linear decrease. This prediction is based on our psychophysical observation that the performance of fine depth discrimination deteriorates monotonically over a wide range of negative binocular correlation (Doi et al. 2011) and is also approximately consistent with the signal strength from a computation that encodes disparity using only contrast-matched elements (Doi and Fujita 2014).

We found that the disparity tuning amplitudes of individual V4 neurons decreased sharply to a nonzero baseline level when a small amount of graded anticorrelation was applied. This was inconsistent with both match-based and correlation-based disparity representations. However, the disparity tuning shapes changed more gradually in response to graded anticorrelation. We combined the tuning curves of a population of neurons according to a readout model for near/far discrimination. The amplitude of this population readout gradually decreased toward zero, consistent with match-based representation. We suggest that population readout of the disparity-selective responses in V4 represents a solution to the correspondence problem.

## MATERIALS AND METHODS

#### Subjects and surgery.

The experiments were carried out on two adult male monkeys, one *Macaca mulatta* and one *Macaca fuscata* weighing 6.5 and 7.5 kg, respectively. We prepared the monkeys for experiments by surgically installing a head post, two scleral search coils, and a recording chamber (for details, see Kumano et al. 2008; Uka et al. 2000). Before each operation, the monkeys were injected with atropine to reduce salivation and were temporarily sedated with ketamine (5 mg/kg im). Inhalational anesthetic, isoflurane (0.3–2% Forane; Abbott), was then administered via a tracheal tube in a mixture with nitrous oxide (66%) and oxygen (33%) during surgery. A plastic head holder was fixed onto the skull using stainless steel screws and acrylic resin. The center of a plastic recording chamber was placed at 25 mm dorsal and 5 mm posterior to an external ear canal to allow recordings from V4. We drilled the skull in the center of the chamber for transdural electrode penetrations. Search coils for monitoring the eye positions were implanted between conjunctiva and the sclera in both eyes. Postoperatively, we administered general antibiotics (piperacillin sodium), antibiotics for eyes (ofloxacin), glucocorticoid (betamethasone), and analgesics (ketoprofen). The Animal Experiment Committee of Osaka University approved the care, surgical, and experimental procedures in accordance with the National Institutes of Health “Guide for the Care and Use of Laboratory Animals” [DHEW Publication No. (NIH) 85-23, Revised 1996, Office of Science and Health Reports, DRR/NIH, Bethesda, MD 20205].

#### Task and visual stimulation.

Monkeys sat in a primate chair with their heads fixed. Their eyes were placed 57 cm away from a 21-in. cathode ray tube (CRT) monitor (Trinitron Multiscan G520; Sony) in a dark booth. The display covered a visual field of 40° × 30° and had a refresh rate of 85 Hz. A pair of ferroelectric liquid crystal shutter glasses (DR-95; Displaytech) enabled dichoptic presentation of stimuli. The shutter glasses alternated so that each eye saw 12 ms of stimulus followed by 12 ms of darkness. A commercial system (TEMPO; Reflective Computing) was used for task control and data acquisition. We continuously monitored eye positions during experiments (Eye Position Detector; DATEL). When a fixation point (0.2° × 0.2°) appeared at the center of the screen at the beginning of a trial, the monkeys were required to make an eye movement toward it within 350 ms. A visual stimulus was presented on the display 600 ms after the fixation point onset and lasted 700 ms for one monkey and 500 ms for the other. The monkeys were required to keep fixation within a 1° × 1° window and the vergence angle within ±0.5° of the screen plane until the trial was completed (200 ms after stimulus offset). Successful fixation was rewarded with water at the end of each trial.

We generated visual stimuli using a graphics application programming interface (OpenGL Utility Toolkit). The stimuli were center-surround RDSs (Fig. 1*A*) containing an equal number of bright (2.28 cd/m^{2}) and dark (0.008 cd/m^{2}) dots randomly positioned on a mid-luminance background (1.14 cd/m^{2}). The luminance was measured through active shutter glasses and linearized (gamma corrected). We minimized interocular cross talk by using only red phosphor, which had the shortest decay time among the three phosphors. Cross talk for the left-to-right and right-to-left eyes was 1.10% and 0.89%, respectively. The dots (0.17° × 0.17°) were anti-aliased at a subpixel resolution. The dot density was 25%, and the dot pattern was refreshed at 10.6 Hz. We varied binocular disparity and correlation only in the center patch of RDSs (see below). The diameter of the center patch was determined so that it covered the classical receptive field (RF). During manual estimation of RF, we used a zero-disparity small RDS patch (∼2° in diameter), or in a few cases, a drifting sinusoidal grating. We kept the size of the center patch constant among different disparity values for each neuron. The surrounding annulus was always 0.5° in width for all neurons.

Graded anticorrelation of RDSs was achieved by gradually increasing the percentage of binocularly contrast-reversed dots in the center patch of RDSs (Fig. 1*A*; see Doi et al. 2011 for details). On each trial, one of seven correlation levels (±100%, ±70%, ±30%, and 0%) and one of seven horizontal disparities (±1.2°, ±0.8°, ±0.4°, 0°) were randomly assigned to the center patch. Negative and positive disparities indicate crossed (near) and uncrossed (far) disparities, respectively. The surround annulus always had 100% correlation level and 0° disparity. The total number of stimuli was 52, including monocular right, monocular left, and uncorrelated RDSs. These stimuli were randomly ordered in a block of 52 trials to average out possible response modulation by stimulus history and non-stimulus-driven factors such as top-down attention and arousal. We tested each stimulus on average for 9.9 repetitions (SD 2.7; minimum 6).
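The relationship between the proportion of contrast-reversed dots and the notional binocular correlation can be illustrated with a minimal sketch. It assumes dot contrasts of +1 (bright) and −1 (dark), so that a reversed fraction *p* gives a correlation of 1 − 2*p*; the function names and the dot count of 200 are hypothetical, and, unlike the actual stimuli, the sketch balances bright and dark dots only in the left-eye image.

```python
import random

def make_dot_contrasts(n_dots, correlation, rng):
    """Return (left, right) dot contrasts for one center patch.

    correlation is the notional binocular correlation in [-1, 1];
    the fraction of contrast-reversed dots is (1 - correlation) / 2.
    """
    n_reversed = round(n_dots * (1.0 - correlation) / 2.0)
    # Equal numbers of bright (+1) and dark (-1) dots in the left eye.
    left = [1] * (n_dots // 2) + [-1] * (n_dots - n_dots // 2)
    rng.shuffle(left)
    reversed_idx = set(rng.sample(range(n_dots), n_reversed))
    right = [-c if i in reversed_idx else c for i, c in enumerate(left)]
    return left, right

def binocular_correlation(left, right):
    # With contrasts of +/-1, the correlation is the mean of the products.
    return sum(l * r for l, r in zip(left, right)) / len(left)

rng = random.Random(0)
levels = [1.0, 0.7, 0.3, 0.0, -0.3, -0.7, -1.0]
measured = [binocular_correlation(*make_dot_contrasts(200, c, rng))
            for c in levels]

# Stimulus count: 7 correlations x 7 disparities, plus monocular left,
# monocular right, and uncorrelated RDSs.
disparities = [-1.2, -0.8, -0.4, 0.0, 0.4, 0.8, 1.2]
n_stimuli = len(levels) * len(disparities) + 3
```

In this sketch the half-matched case (0% correlation) corresponds to reversing exactly half of the dots.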

#### Simulation of effective correlation.

We simulated an effective binocular correlation of our RDSs for the case of being viewed through a haploscope setup or a shutter glasses setup. In the simulation, RDSs had 10 × 10 pixels. The dot density was 24%, and dot size was 1 pixel. We used zero disparity to calculate the effective correlation for corresponding pixels in the left-eye and right-eye images. In every dot pattern, we balanced the numbers of dark and bright dots, and determined the numbers of contrast-matched and contrast-reversed dots according to the notional correlation level of the RDS. However, the dots were allowed to overlap each other (occluded dots were completely invisible, because the dot size was 1 pixel). The contrast values for the dark dot, background, and bright dot were −1, 0, and 1, respectively, for the haploscope setup and −1, 1, and 3, respectively, for the shutter glasses setup. The contrast value for shutter closure was −1. Thus the average contrast was zero for both setups. On each trial, we generated 128 frames of RDSs, which corresponded to a stimulus duration of 1.5 s at the frame rate of 85 Hz. The dot pattern was refreshed at 10.6 Hz. For the shutter glasses setup, every other frame of the stimulus sequence was replaced with shutter closure, alternately for the left-eye and right-eye images (Fig. 2*A*, *right*). Twenty-two frames of mid-luminance background were added to the beginning and end of the stimulus sequence to model the prestimulus and poststimulus periods. To calculate physiologically plausible effective correlation, we convolved these stimulus sequences with the temporal kernel of either the monophasic (low pass/sustained) or biphasic (bandpass/transient) cells in macaque V1 (Fig. 2, *B* and *C*, *left*, *insets*; De Valois et al. 2000; Hawken et al. 1996; see Doi et al. 2013 for the definitions and characteristics of these kernels). The temporal filtering was performed separately for each pixel of the RDSs. 
We calculated binocular interaction by multiplying the left-eye and right-eye filtered signals at corresponding pixels (Fig. 2*D*). We averaged the binocular interaction across frames, pixels, and trials (100 trials for each condition) to obtain an average interaction. The average interaction was normalized by the value at 100% notional correlation with the haploscope setup, separately for the monophasic and biphasic filters (Fig. 2*E*). Finally, we subtracted the averaged interaction obtained for uncorrelated RDSs (Fig. 2*E*, dotted lines) from the averaged interaction obtained for graded anticorrelation RDSs (solid lines). The effective correlation calculated this way corresponds to the amplitude of a neuronal disparity tuning (the peak height at preferred disparity minus baseline level for uncorrelated RDSs).
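The logic of this simulation can be sketched in a few lines, under strong simplifying assumptions. Each pixel is treated independently (no dot overlap or balancing), and a first-order recursive low-pass filter stands in for the V1 temporal kernels; the filter constant `a = 0.7`, the function names, and the choice to keep the shutters alternating during the padding frames are all hypothetical and are not taken from the original simulation.

```python
import random

DOT_DENSITY = 0.24       # probability that a pixel holds a dot
FRAMES_PER_PATTERN = 8   # 85-Hz frame rate / 10.6-Hz pattern refresh
N_STIM_FRAMES = 128
N_PAD_FRAMES = 22

def pixel_sequences(correlation, shutter, rng):
    """Left/right contrast sequences at one pixel (None = uncorrelated)."""
    dark, bg, bright = (-1.0, 1.0, 3.0) if shutter else (-1.0, 0.0, 1.0)
    left, right = [bg] * N_PAD_FRAMES, [bg] * N_PAD_FRAMES
    for _ in range(N_STIM_FRAMES // FRAMES_PER_PATTERN):
        l = r = bg
        if rng.random() < DOT_DENSITY:
            l = bright if rng.random() < 0.5 else dark
            if correlation is not None:
                matched = rng.random() < (1.0 + correlation) / 2.0
                r = l if matched else (dark if l == bright else bright)
        if correlation is None and rng.random() < DOT_DENSITY:
            r = bright if rng.random() < 0.5 else dark  # independent right eye
        left += [l] * FRAMES_PER_PATTERN
        right += [r] * FRAMES_PER_PATTERN
    left += [bg] * N_PAD_FRAMES
    right += [bg] * N_PAD_FRAMES
    if shutter:
        # Shutter closure (contrast -1) alternates between the two eyes.
        left = [v if t % 2 == 0 else -1.0 for t, v in enumerate(left)]
        right = [v if t % 2 == 1 else -1.0 for t, v in enumerate(right)]
    return left, right

def low_pass(seq, a=0.7):
    """First-order recursive low-pass filter (stand-in temporal kernel)."""
    out, state = [], 0.0
    for v in seq:
        state = a * state + (1.0 - a) * v
        out.append(state)
    return out

def mean_interaction(correlation, shutter, rng, n_sequences=1000):
    """Average left x right product of filtered signals over frames and pixels."""
    total, count = 0.0, 0
    for _ in range(n_sequences):
        left, right = pixel_sequences(correlation, shutter, rng)
        for l, r in zip(low_pass(left), low_pass(right)):
            total += l * r
            count += 1
    return total / count

def effective_correlation(correlation, shutter, seed=0):
    """Interaction for a graded RDS minus the uncorrelated baseline."""
    rng = random.Random(seed)
    signal = mean_interaction(correlation, shutter, rng)
    baseline = mean_interaction(None, shutter, rng)
    return signal - baseline

results = {
    shutter: {c: effective_correlation(c, shutter) for c in (1.0, 0.0, -1.0)}
    for shutter in (False, True)
}
```

Here `results[False]` and `results[True]` hold the baseline-subtracted values for the haploscope-like and shutter-like sequences, respectively. The sketch omits the normalization by the 100% haploscope value and is not expected to reproduce the reported slopes.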

#### Electrophysiology and experimental protocol.

We mapped area V4 based on the estimated locations of the lunate and superior temporal sulci and the RF size-eccentricity relationship (Desimone and Schein 1987; Gattass et al. 1988; Watanabe et al. 2002). For one monkey, we anatomically confirmed the locations of the penetrations to V4 in the dorsal prelunate gyrus after transcardial perfusion of the brain (Fig. 3). The location of the recording area and the RF eccentricity of the recorded neurons (9° ± 3°, mean ± SD) indicate that our samples were from the dorsal part of V4 (V4d; Felleman and Van Essen 1991), not from V4A (Kolster et al. 2014). A custom-made, glass-coated tungsten microelectrode (0.3–1.4 MΩ at 1 kHz) was inserted into the brain with a micromanipulator (MO-95S; Narishige). Extracellular signals were amplified (MDA-4I; Bak Electronics) and filtered (3624; NF Corporation; 500 Hz–5 kHz). Action potentials were isolated with either a window discriminator (DDIS-1; Bak Electronics) or template-matching software (Multi Spike Detector; Alpha-Omega Engineering), and recorded at a 1-kHz sampling rate. We presented a zero-disparity RDS patch during the search for single-neuron activities. The correlation level of the probe RDS was chosen randomly in each recording session to avoid sampling bias toward a particular correlation level.

#### Data analysis.

For each trial, we counted the number of spikes from 80 ms after stimulus onset to 80 ms after stimulus offset to calculate the firing rate of the V4 neuron (Schmolesky et al. 1998; Tanabe et al. 2004). The spontaneous activity was calculated for a 250-ms period preceding the stimulus onset. The firing rates were averaged across repeated trials to obtain the mean firing rate for each correlation level and disparity. We quantified the disparity tuning functions by both model-fitting and model-free methods.
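The counting windows can be made concrete with a short sketch. It assumes spike times in milliseconds relative to an arbitrary clock; the function names and the example spike train are hypothetical.

```python
def firing_rate(spike_times_ms, stim_onset_ms, stim_offset_ms, latency_ms=80.0):
    """Spikes/s in the window from onset + 80 ms to offset + 80 ms."""
    start = stim_onset_ms + latency_ms
    end = stim_offset_ms + latency_ms
    n_spikes = sum(1 for t in spike_times_ms if start <= t < end)
    return 1000.0 * n_spikes / (end - start)

def spontaneous_rate(spike_times_ms, stim_onset_ms, window_ms=250.0):
    """Spikes/s in the 250 ms preceding stimulus onset."""
    start = stim_onset_ms - window_ms
    n_spikes = sum(1 for t in spike_times_ms if start <= t < stim_onset_ms)
    return 1000.0 * n_spikes / window_ms

# Hypothetical trial: stimulus on at 0 ms, off at 500 ms.
spikes = [-200.0, -50.0, 90.0, 150.0, 300.0, 520.0, 560.0, 600.0]
evoked = firing_rate(spikes, 0.0, 500.0)   # 5 spikes in 500 ms
baseline = spontaneous_rate(spikes, 0.0)   # 2 spikes in 250 ms
```

Because both window edges are shifted by the same latency, the window length always equals the stimulus duration (700 or 500 ms, depending on the monkey).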

In model fitting, we described the mean firing rate (*R*) as a function of disparity (*x*) using a Gabor function, a product of Gaussian envelope and a cosine carrier, defined as follows:
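$$R(x) = A \exp\left(-\frac{(x - x_0)^2}{2\sigma^2}\right) \cos\left(2\pi f (x - x_0) + \phi\right) + y_0 \tag{1}$$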

where *A*, φ, *y*_{0}, *x*_{0}, σ, and *ƒ* are amplitude, phase, baseline response, horizontal position of the Gaussian envelope, width of the Gaussian envelope, and disparity frequency, respectively. We half-wave rectified *R* because the firing rate cannot be negative. Only the amplitude and phase parameters were independently estimated across different correlation levels, since how the amplitude and shape of the disparity tuning curve depend on anticorrelation was the focus of the present and earlier studies (Cumming and Parker 1997; Krug et al. 2004; Tanabe et al. 2004). The other parameters were shared among correlation levels. We used the “fmincon” function in the MATLAB Optimization Toolbox (The MathWorks) to find the parameter combination that minimized the summed squared error between single-trial firing rates and Gabor functions. The goodness of fit was evaluated by taking the *R*^{2} between the mean firing rates and the best fitted Gabor function separately for each correlation level.
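The fitting objective can be sketched as follows. The sketch assumes a common Gabor parameterization (a Gaussian envelope centered at *x*_{0} multiplying a cosine carrier with phase φ, plus the baseline *y*_{0}), with half-wave rectification applied to the output. The function names are hypothetical, and the optimizer itself is omitted; the function below is only the quantity a constrained minimizer would reduce.

```python
import math

def gabor_rate(x, amp, phase, y0, x0, sigma, freq):
    """Half-wave rectified Gabor: Gaussian envelope x cosine carrier + baseline."""
    envelope = math.exp(-((x - x0) ** 2) / (2.0 * sigma ** 2))
    carrier = math.cos(2.0 * math.pi * freq * (x - x0) + phase)
    return max(0.0, amp * envelope * carrier + y0)

def summed_squared_error(shared, per_level, trials):
    """Objective for the joint fit across correlation levels.

    shared    = (y0, x0, sigma, freq), common to all correlation levels
    per_level = {correlation: (amp, phase)}, estimated per level
    trials    = list of (correlation, disparity, single_trial_rate)
    """
    y0, x0, sigma, freq = shared
    sse = 0.0
    for corr, disparity, rate in trials:
        amp, phase = per_level[corr]
        sse += (rate - gabor_rate(disparity, amp, phase, y0, x0, sigma, freq)) ** 2
    return sse

# Hypothetical parameters and noise-free "trials" generated from them.
shared = (5.0, 0.0, 0.5, 0.5)
per_level = {1.0: (10.0, 0.0), -1.0: (4.0, math.pi)}
disparities = [-1.2, -0.8, -0.4, 0.0, 0.4, 0.8, 1.2]
trials = [(c, d, gabor_rate(d, *per_level[c], *shared))
          for c in per_level for d in disparities]
```

Sharing the baseline, position, width, and frequency across levels means that only the amplitude and phase can absorb the effect of anticorrelation, which is what the amplitude ratio and phase difference are meant to capture.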

For the model-free characterization of the disparity tuning function, we calculated the difference between the maximum and minimum mean firing rates (peak-to-trough amplitude) and the disparity that elicited the maximum mean firing rate (preferred disparity) for each correlation level. We then obtained the ratio of peak-to-trough amplitudes and the difference of preferred disparities between each correlation level and 100% correlation level.
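These model-free measures reduce to a few list operations, sketched below. The tuning values are invented for illustration and the function name is hypothetical.

```python
DISPARITIES = [-1.2, -0.8, -0.4, 0.0, 0.4, 0.8, 1.2]

def tuning_summary(mean_rates):
    """Return (peak-to-trough amplitude, preferred disparity) for one level."""
    amplitude = max(mean_rates) - min(mean_rates)
    preferred = DISPARITIES[mean_rates.index(max(mean_rates))]
    return amplitude, preferred

# Hypothetical mean firing rates (spikes/s) at two correlation levels.
tuning = {
    1.0: [5.0, 8.0, 20.0, 30.0, 18.0, 9.0, 6.0],
    -1.0: [12.0, 14.0, 10.0, 6.0, 9.0, 13.0, 12.0],
}

amp_ref, pref_ref = tuning_summary(tuning[1.0])
amp_anti, pref_anti = tuning_summary(tuning[-1.0])
amplitude_ratio = amp_anti / amp_ref        # relative to 100% correlation
preferred_shift = pref_anti - pref_ref      # in degrees of disparity
```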

To quantify the shape of the disparity tuning curves, we used symmetry phase, which directly quantifies the relative contributions of even and odd symmetric components to the shape of the fitted tuning curve (for details, see Read and Cumming 2004; Tanabe et al. 2005). We calculated the ratio of amplitude parameters and the difference of symmetry phases between 100% correlation and each of the other correlations. For correlation-based representation, the amplitude ratio should have a V-shaped profile, with a linear decrease from 1 to 0 as the correlation level is decreased from 100% to 0% and a linear increase from 0 to 1 as the correlation level is further decreased from 0% to −100% (Fig. 1*C*, black line). The phase difference should be zero and π for any positive and negative correlations, respectively. For match-based representation, the amplitude ratio should gradually decrease from 1 to 0 as the correlation level is decreased from 100% to −100% (Fig. 1*C*, gray line). The gradual decrease does not have to be in a linear fashion if the decreasing range spans over a wide range of correlation level (Doi et al. 2011; Doi and Fujita 2014). The phase difference should be zero for any correlation level. Appropriate circular statistics were used to quantify the mean and standard deviation (SD) of phase differences (Berens 2009).
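The two sets of predictions, and the circular statistics applied to phase differences, can be sketched as follows. The linear match-based profile is only one hypothetical instance of a monotonic decrease, and the circular SD uses the definition sqrt(−2 ln *R*), where *R* is the mean resultant length; whether this or the angular deviation was used in the analysis is an assumption here.

```python
import math

def correlation_based_prediction(rho):
    """V-shaped amplitude ratio; phase flips by pi for negative correlation."""
    if rho > 0:
        phase_difference = 0.0
    elif rho < 0:
        phase_difference = math.pi
    else:
        phase_difference = None  # no tuning, so the phase is undefined
    return abs(rho), phase_difference

def match_based_prediction(rho):
    """Monotonic decrease from 1 to 0 (linear form chosen for illustration)."""
    return (1.0 + rho) / 2.0, 0.0

def circular_mean(angles):
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    return math.atan2(s, c)

def circular_sd(angles):
    s = sum(math.sin(a) for a in angles)
    c = sum(math.cos(a) for a in angles)
    r = min(1.0, math.hypot(s, c) / len(angles))  # guard against rounding
    return math.sqrt(-2.0 * math.log(r)) if r > 0.0 else math.inf

# Phase differences near +pi and -pi average to pi, not to zero.
wrapped_mean = circular_mean([math.pi - 0.1, -math.pi + 0.1])
```

The last line shows why circular statistics are needed: an arithmetic mean of those two phase differences would be 0, which would misrepresent a pair of nearly inverted tuning curves as unchanged.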

We directly compared how the graded anticorrelation influenced the amplitude ratio and the SD of the phase-difference distribution. First, we took the negative of the SD so that it decreased with anticorrelation as the amplitude ratio did. Second, we normalized both amplitude ratio and phase-difference SD such that the normalized value varied from 1 to 0 as the correlation level decreased from 100% to −100%. We estimated the SD of phase difference at 100% correlation using a bootstrap method. We constructed an artificial phase-difference distribution by taking the difference between the phase of the original tuning curve and that of the bootstrap tuning curve. We generated the bootstrapped tuning curve by resampling trial-by-trial firing rates with replacement. With the use of the bootstrap method, we took into account the contribution of firing-rate variability to the SD of phase-difference distributions, which was present at every correlation level. Third, we fitted a sum of exponential functions and a Weibull function to the amplitude ratio and the SD of phase-difference distributions, respectively. We chose these functions because they described the data well; we did not intend any mechanistic interpretation from the fittings. The squared error was minimized during the fitting. Finally, we calculated the correlation level at which the normalized amplitude ratio or normalized SD of the phase-difference distributions had a value of 0.5 (the half-decay correlation level). The sum of exponential functions was defined as follows: (2)
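The normalization and the half-decay correlation level can be illustrated with a simplified sketch. It replaces the fitted functions with linear interpolation between sampled correlation levels, which is a substitution made here for brevity and not the procedure described above; the function names and example values are hypothetical.

```python
LEVELS = [1.0, 0.7, 0.3, 0.0, -0.3, -0.7, -1.0]

def normalize(values):
    """Rescale so the value is 1 at +100% and 0 at -100% correlation.

    values must be ordered from +100% to -100% correlation.
    """
    top, bottom = values[0], values[-1]
    return [(v - bottom) / (top - bottom) for v in values]

def half_decay_level(levels, values, criterion=0.5):
    """Correlation level at which the normalized value crosses the criterion."""
    points = list(zip(levels, normalize(values)))
    for (x_hi, y_hi), (x_lo, y_lo) in zip(points, points[1:]):
        if y_hi >= criterion >= y_lo and y_hi != y_lo:
            frac = (y_hi - criterion) / (y_hi - y_lo)
            return x_hi + frac * (x_lo - x_hi)
    return None  # the curve never crosses the criterion

# A sharp early drop to a plateau versus a gradual decline.
sharp = half_decay_level(LEVELS, [1.0, 0.5, 0.3, 0.2, 0.2, 0.2, 0.2])
gradual = half_decay_level(LEVELS, [1.0, 0.85, 0.65, 0.5, 0.35, 0.15, 0.0])
```

A curve that collapses with small anticorrelation yields a half-decay level close to 100%, whereas a curve that declines steadily across the whole range yields a level near 0%.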

where ρ denotes the correlation level, *a* and *b* denote decay parameters, and *w* is the relative weight for the two exponents. The Weibull function was defined as follows:
(3)

where *c* denotes the scaling factor, α denotes the threshold parameter, and β denotes the slope parameter.

We used a bootstrap method to calculate the 95% confidence interval (CI) of the symmetry phase difference (see Fig. 8*A*). The CI measures the estimation reliability of a particular curve feature, phase difference in this case, and thus is different from *R*^{2}, which quantifies the overall match between the fitted curve and data points. To calculate the CI, we randomly resampled from the observed trial-by-trial responses to create an artificial data set for each stimulus condition (binocular disparity and correlation). The resample was made with replacement for the number of trials actually repeated in the experiment. We applied the same analyses that we used for the recorded data to the artificial data and obtained an artificial phase difference. We repeated the procedure 100 times to obtain a distribution of the symmetry phase difference at each correlation level and calculated its 95% CI. This resampling method from trial-by-trial responses was also used to estimate the phase difference at 100% correlation (see Fig. 9).
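The resampling scheme can be sketched generically. For brevity, the statistic below is a difference in mean firing rate between two conditions, standing in for the symmetry phase difference, which would require refitting the Gabor function on every resample. The function names and the data are hypothetical, and the simple percentile rule is an assumption about how the interval is read from the bootstrap distribution.

```python
import random

def bootstrap_ci(trials_by_condition, statistic, n_boot=100, seed=0):
    """Percentile 95% CI from resampling trials with replacement.

    Each condition is resampled separately, keeping its original trial count.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resampled = {cond: [rng.choice(rates) for _ in rates]
                     for cond, rates in trials_by_condition.items()}
        estimates.append(statistic(resampled))
    estimates.sort()
    lower = estimates[round(0.025 * (n_boot - 1))]
    upper = estimates[round(0.975 * (n_boot - 1))]
    return lower, upper

def mean(values):
    return sum(values) / len(values)

def rate_difference(data):
    # Stand-in statistic: mean rate at -0.4 deg minus mean rate at +0.4 deg.
    return mean(data[-0.4]) - mean(data[0.4])

# Hypothetical single-trial firing rates (spikes/s) for two disparities.
data = {
    -0.4: [22.0, 25.0, 19.0, 24.0, 21.0, 23.0, 20.0, 26.0],
    0.4: [8.0, 11.0, 9.0, 12.0, 10.0, 7.0, 10.0, 9.0],
}
observed = rate_difference(data)
ci_low, ci_high = bootstrap_ci(data, rate_difference)
```

For a circular quantity such as phase difference, the percentile step would have to be computed relative to the circular mean, which this sketch does not attempt.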

We also used bootstrap methods to estimate the range of error or confidence in various results. For the errors in SD of the symmetry phase-difference results (see Figs. 7*B* and 8*B*), we resampled from the symmetry phase-difference distribution at each correlation level. The number of resamples was the same as the number of cells (*n* = 88). Similarly, we resampled from the distribution of the amplitude ratio and symmetry phase difference to calculate the range of confidence in their respective half-decay correlation levels (see Fig. 9).

We calculated “population readout” by combining the responses of all the disparity selective cells in our data set (*n* = 92). The population readout represents population disparity information in the case when responses of individual neurons are decoded to make a near vs. far decision. A theory for such a two-alternative decision was developed in human psychophysics as signal detection theory and later applied to monkey physiology (Britten and Newsome 1998; Green and Swets 1988; Shadlen et al. 1996). According to the theory, the decoding process involves pooling the activities of near-preferring neurons and those of far-preferring neurons separately. The information irrelevant to near/far discrimination is averaged out. The two pooled activities are compared, and whichever pool has the larger response determines the choice (Prince and Eagle 2000; Shiozaki et al. 2012; see Shadlen et al. 1996 for the theory originally proposed for motion direction discrimination). To calculate the population readout, first we normalized the mean firing rates of each neuron by the firing rate at its preferred disparity at 100% correlation. Second, we reflected the tuning curves of far-preferring neurons (*n* = 15/92) at zero disparity so that they would become near-preferring neurons. For example, the responses of far-preferring neurons to +0.4° disparity were considered to be the responses to −0.4° disparity, and vice versa. The assumption underlying this operation is that for every far-preferring neuron, there is a near-preferring neuron with the same response characteristics in V4. Third, we averaged the mean firing rates across individual neurons at corresponding disparity values. Finally, we fitted Gabor functions to the population-averaged firing rates to calculate the amplitude ratio, the Gabor amplitude for gradually anticorrelated RDSs divided by the amplitude for correlated RDSs.
In addition to this main readout model, we calculated the population readout in three alternative ways. In the first case, the calculation was the same as the main model, except that we performed no response normalization. In the second case, the responses of individual neurons were multiplied by readout weights, instead of being normalized, to make the population readout suitable for near/far discrimination. We calculated the readout weights by subtracting the mean firing rate for +0.4° disparity from that for −0.4° disparity at 100% correlation. These weights are optimal for discriminating ±0.4° disparities if responses are uncorrelated among neurons (Chen et al. 2006). In the third case, individual tuning curves were horizontally shifted to be aligned at their preferred disparities after far-to-near conversion (with no response normalization). We used a resampling method to estimate the variability of the population readout's amplitude ratio. We constructed an artificial data set by resampling 92 neurons from the original data set with replacement and calculated the population readouts using the same methods. We reported the standard deviations of the amplitude ratios across 100 artificial data sets.
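The main readout (normalize, reflect, average) and the readout weights of the second alternative can be sketched as follows. The sketch assumes that a neuron is classed as far preferring when its preferred disparity at 100% correlation is positive, which is an assumption about the classification rule; the function names and the two example neurons are hypothetical.

```python
DISPARITIES = [-1.2, -0.8, -0.4, 0.0, 0.4, 0.8, 1.2]

def population_readout(neurons, level):
    """Average normalized tuning curves after far-to-near reflection.

    neurons: list of dicts mapping correlation level -> mean rates,
    ordered as in DISPARITIES.
    """
    pooled = [0.0] * len(DISPARITIES)
    for tuning in neurons:
        reference = tuning[1.0]
        peak = max(reference)  # rate at preferred disparity, 100% correlation
        curve = [rate / peak for rate in tuning[level]]
        if DISPARITIES[reference.index(peak)] > 0:
            # Far-preferring: reflect at zero disparity. The sampled
            # disparities are symmetric, so this reverses the list.
            curve = curve[::-1]
        pooled = [p + c for p, c in zip(pooled, curve)]
    return [p / len(neurons) for p in pooled]

def readout_weight(tuning):
    """Rate at -0.4 deg minus rate at +0.4 deg, at 100% correlation."""
    reference = tuning[1.0]
    return reference[DISPARITIES.index(-0.4)] - reference[DISPARITIES.index(0.4)]

# Two hypothetical neurons: one near preferring, one far preferring.
near_cell = {1.0: [2.0, 10.0, 6.0, 4.0, 3.0, 2.0, 2.0]}
far_cell = {1.0: [4.0, 4.0, 6.0, 8.0, 12.0, 20.0, 4.0]}
pooled = population_readout([near_cell, far_cell], 1.0)
weights = [readout_weight(near_cell), readout_weight(far_cell)]
```

In this toy example the far-preferring cell is an exact mirror image of the near-preferring cell at twice the firing rate, so after normalization and reflection the pooled curve equals either normalized curve. The negative weight for the far-preferring cell plays the same role in the weighted readout that reflection plays in the main model.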

We performed similar analyses on the populations of V1 (*n* = 72) and MT (*n* = 140) neurons recorded in the previous studies (Cumming and Parker 1997; Krug et al. 2004). The V1 neurons were selected on the basis of the disparity selectivity for correlated RDSs (ANOVA, *P* < 0.05) and correspond to the data plotted in Fig. 4 of Cumming and Parker (1997). The MT neurons were selected on the basis of the disparity selectivity for both correlated RDSs and correlated cylinder stereograms (ANOVA, *P* < 0.05). This was similar to the selection procedure that we used for our V4 data set, in that the disparity selectivity for anticorrelated RDSs was not considered. Thus the MT population analyzed in the present study was much wider than the data analyzed in Fig. 3 of Krug et al. (2004), where the cells not disparity selective for anticorrelated RDSs were discarded (ANOVA, *P* > 0.05). We also calculated the population readout for a subset of MT neurons that showed significant disparity selectivity not only for correlated RDSs but also for anticorrelated RDSs (*n* = 65). This subpopulation was similar to that analyzed in Fig. 3 of Krug et al. (2004; note that this figure included 6 more neurons without significant disparity selectivity for correlated cylinder stereograms).

For V1, we calculated the population readout by aligning the tuning curves of individual neurons based on their preferred disparities for correlated RDSs. The tuning curves of individual V1 neurons each cover only a small range of disparity and overlap each other less than V4 or MT tuning curves. Thus, without tuning curve alignment, the V1 population readout was noisy, being based on the responses of only a small number of neurons at each disparity. Although this method was similar to the third alternative readout we used for the V4 data set, we aligned and averaged the fitted curves, not raw data points, for the V1 data set. This was because the range and interval of sampled disparities differed greatly from cell to cell in the V1 data set. Individual curves were horizontally reflected around zero disparity and normalized by their peak values before the population average was taken. For the MT data set, we calculated the population readout using the main readout model that we applied to the V4 data set. We excluded some data points from the MT population readout, where the firing rate was recorded from fewer than 100 neurons. The excluded data points were based on disproportionally smaller numbers of neurons (mean: 6.6).

## RESULTS

#### Effective correlation of our RDSs.

We ran simulations to examine whether the use of shutter glasses would introduce artifactual distortion of effective binocular correlation to our stimuli, compared with the use of a haploscope, which represents the ideal case without any distortion. Figure 2*A* shows example sequences of luminance contrast at a given pixel of a correlated RDS with the haploscope setup (*left*) and the shutter glasses setup (*right*). In both cases, the sequence is background → bright dot → background → dark dot → background. These sequences were filtered with two types of temporal kernels modeled after the monophasic (low pass) and biphasic (bandpass) cells in the primate V1 (De Valois et al. 2000; Hawken et al. 1996). The monophasic filter removed the temporal oscillation of contrast signal generated by shutter glasses (Fig. 2*B*, *right*), whereas the biphasic filter passed a small amount of oscillation, which was binocularly anticorrelated (Fig. 2*C*, *right*). We calculated binocular interaction by multiplying the left-eye and right-eye filtered signals. The binocular interaction of the monophasic filter outputs was well matched between the haploscope and shutter glasses setups, whereas the binocular interaction of the biphasic filter outputs was slightly more anticorrelated for the shutter glasses setup than for the haploscope setup (Fig. 2*D*). We then averaged the binocular interaction across space and time separately for the uncorrelated RDS (dotted lines in Fig. 2*E*) and nine levels of gradually anticorrelated RDSs (solid lines). For the monophasic filter output, the average interaction with the two setups was linearly related to notional correlation with virtually identical slope and *Y*-intercept (Fig. 2*E*, *left*). For the biphasic filter output, the average interaction with the shutter glasses setup had a shallower slope than that with the haploscope setup (the ratio of the 2 slopes was 0.66; Fig. 2*E*, *right*).
However, the average interaction at 0% notional correlation had the same level as that for uncorrelated RDSs (horizontal dotted lines in Fig. 2*E*). Once the baseline level for uncorrelated RDSs was subtracted, the effective correlation for the biphasic filter output had an unbiased, linear profile even with the shutter glasses setup (Fig. 2*F*, *right*). V4 neurons presumably receive the outputs of both monophasic and biphasic cells in V1, because their response depends on the outputs of both magnocellular and parvocellular layers of the lateral geniculate nucleus (Ferrera et al. 1992). Therefore, the effective correlation for V4 neurons may be a mixture of the results shown in Fig. 2*F*, *left* and *right*. The shutter glasses would introduce a small amount of reduction in the effective correlation, but importantly, they should not distort the linearity of the effective correlation.

#### Database.

We examined the activities of 171 V4 cells, of which 158 cells (82 from the first and 76 from the second monkey) showed visual responses to at least one combination of binocular correlation and binocular disparity (Wilcoxon's signed-rank test, *P* < 0.05, with Bonferroni correction). Of the visually responsive cells, 92 were disparity selective for correlated RDSs (Kruskal-Wallis test; *P* < 0.05). All the disparity-selective cells were included in the analyses without model fitting. For the analyses based on model fitting, we further discarded four cells based on the low fitting quality of the responses to correlated RDSs (*R*^{2} < 0.6). We chose these disparity-selective neurons on the basis of their disparity selectivity to correlated RDS, the most natural binocular image in our stimulus set to which binocular neurons were expected to exhibit the strongest selectivity (Haefner and Cumming 2008). In a control analysis, we also analyzed the tuning amplitude for the population of visually responsive cells (66/158) that were excluded from the main analysis. The vergence angles of the monkeys did not depend on disparity during the recordings, except for 1 of the 92 disparity-selective cells (ANOVA, *P* > 0.05), indicating that vergence eye movements cannot explain the disparity-dependent modulation of V4 responses described in this report. The classical RF of the recorded neurons was located in the lower visual field and contralateral to the recorded hemisphere. The average eccentricity of the RF center was 9.0° (SD 3.0°).

#### Example responses of V4 neurons to graded anticorrelation of RDS.

The graded anticorrelation of RDSs influenced the amplitude and shape of disparity tuning in V4. The responses of three V4 neurons are shown in Fig. 4. These neurons responded maximally to the RDSs with 100% correlation (black curves in Fig. 4, *A–C*). As the correlation level decreased to 0%, the tuning amplitudes decreased (green curves in Fig. 4, *A–C*) but the tuning curves did not become flat (i.e., zero amplitude). The shape of the disparity tuning remained consistent in this range of correlation (i.e., from 100% to 0%). Further decreases in correlation level had different effects on the disparity tunings of the example neurons. For the neurons in Fig. 4, *A* and *B*, the disparity tuning for −100% correlation (red curves) had an amplitude as large as that for 0% correlation (green curves), whereas the tuning shape was inconsistent between −100% correlation and higher (0–100%) correlations. The tuning shape changed from even symmetric to odd symmetric for one neuron (Fig. 4*A*) and from tuned excitatory to tuned inhibitory for another (Fig. 4*B*). For the neuron in Fig. 4*C*, the disparity tuning was almost completely flat for −30% correlation and below (brown, orange, and red curves). These diverse response patterns were also visible in the two-dimensional color plots as a function of disparity and correlation (Fig. 4, *D–F*). The first two examples modulated their responses even at −100% correlation, although the preferred disparity shifted with decreasing binocular correlation level (Fig. 4, *D* and *E*). The neuron in Fig. 4*F* did not modulate its response to the RDSs with low correlation.

We describe below in detail two basic changes of disparity tuning with graded anticorrelation: the decrease in tuning amplitude and an increase in the population variability of the tuning phase. We then describe the population disparity information from the point of view of a readout mechanism that makes a near vs. far decision.

#### Tuning amplitude of individual V4 neurons sharply decreased by graded anticorrelation, inconsistent with both match-based and correlation-based representations.

We examined how graded anticorrelation influenced the amplitude of disparity tuning for individual neurons. We fitted a set of Gabor functions (*Eq. 1*) to the responses of each neuron. The change in amplitude was quantified as the ratio of the amplitude parameters between each correlation level and 100% correlation (Cumming and Parker 1997). We found that the distribution of the amplitude ratios shifted toward a smaller value as the correlation level decreased (Fig. 5*A*, *left*). The mean amplitude ratio across cells sharply decreased as the correlation level decreased from 100% to 70% (Fig. 5*B*, solid line, filled circles). The decrease rate became gradual for further anticorrelation, and the amplitude ratio remained stable over a range of correlation levels from −30% to −100% (Fig. 5*B*). These basic features were preserved even when we used the ratio of peak-to-trough amplitudes of the mean responses without model fitting (Fig. 5*A*, *right*; Fig. 5*B*, dashed line, open circles).
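The model-free variant of this measure can be sketched in a few lines of Python. The responses below are made-up numbers rather than recorded data, and `peak_to_trough` and `amplitude_ratios` are hypothetical helper names. The Gabor-based ratio would instead use the fitted amplitude parameter of *Eq. 1*.

```python
def peak_to_trough(curve):
    """Peak-to-trough amplitude of a mean disparity tuning curve."""
    return max(curve) - min(curve)


def amplitude_ratios(tuning):
    """Ratio of the amplitude at each correlation level to that at 100%.

    tuning: dict mapping correlation level (%) -> list of mean responses,
    one per tested disparity.
    """
    reference = peak_to_trough(tuning[100])
    return {level: peak_to_trough(curve) / reference
            for level, curve in tuning.items()}


# Illustrative (invented) mean firing rates for one neuron, spikes/s.
example_tuning = {
    100: [5.0, 12.0, 30.0, 14.0, 6.0],
    50: [7.0, 10.0, 16.0, 11.0, 7.0],
    0: [8.0, 9.0, 12.0, 10.0, 8.0],
    -100: [10.0, 11.0, 8.0, 7.0, 9.0],
}
ratios = amplitude_ratios(example_tuning)
```

By construction the ratio is 1 at 100% correlation, so only the profile across the remaining correlation levels is informative.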

The observed effect of graded anticorrelation on tuning amplitude was inconsistent with both correlation-based and match-based representations of disparity. The correlation-based representation predicts a V-shaped profile with the reflection point at zero correlation (Fig. 1*C*, black line). This is because the amplitude of the tuning curve should linearly decrease from a finite value to 0, as the correlation decreases from 100% to 0%. As the correlation further decreases from 0% to −100%, the amplitude of the tuning curve should increase from 0 to the same finite value with an inverted tuning shape (Fig. 1*B*, *left*). The match-based representation predicts that the amplitude ratio should decrease from 1 to 0 as the correlation decreases from 100% to −100% (Fig. 1*C*, gray line), because the tuning amplitude should reflect the proportion of binocularly contrast-matched dots (Fig. 1*B*, *right*). We found that the mean amplitude ratio of V4 neurons neither followed the V-shaped profile of the correlation-based prediction nor decreased from 1 to 0 as in the match-based prediction.

#### The mixture of intermediates between correlation-based and match-based representations cannot explain the observed mean amplitude ratio.

We examined whether a mixture of various intermediate cells between the correlation-based and match-based representations could explain the observed change in the mean amplitude ratio. We modeled those intermediate cells as having inverted but attenuated disparity tunings for −100% correlation (Fig. 6, *A* and *B*, dashed lines). We assumed that the tuning amplitudes of these intermediate cells linearly change as a function of correlation level. That is, as the correlation level decreases from 100% to 0%, the tunings of these intermediate cells attenuate at different rates and become flat (zero amplitude) at different correlation levels. As the correlation level further decreases, the tunings inversely grow (inverted shapes) to have different amplitudes at −100% correlation. When a signed amplitude ratio is plotted (Fig. 6*A*), where a negative sign indicates an inverted tuning, the intermediate cells (dashed lines) correspond to various weighted averages of the correlation-based (black solid line) and match-based (gray solid line) representations. If the absolute value is taken, the intermediate lines become V shapes that are reflected at different correlation levels between −100% and 0% (dashed lines in Fig. 6*B*). A weighted average of these profiles (dashed and solid lines in Fig. 6*B*) can reproduce a mean amplitude ratio qualitatively similar to that of the observed data (compare Figs. 6*C* and 5*B*). This explanation predicts that the absolute amplitude ratio at −100% correlation and that at 0% correlation should be negatively correlated among individual cells, because cells with larger amplitude ratios at −100% correlation should have smaller amplitude ratios at 0% correlation, and vice versa (compare the absolute value of amplitude ratio at −100% and 0% correlation for different lines in Fig. 6*B*).

Mathematically, the signed amplitude ratio of the correlation-based representation can be expressed as *S*_{c}(ρ) = ρ/100, where ρ indicates binocular correlation in units of percentage. The signed amplitude ratio for the match-based representation can be expressed as *S*_{m}(ρ) = (ρ + 100)/200. The signed amplitude ratio for an intermediate cell, which is a weighted average of the two representations, can be expressed as *S*_{i}(ρ, *g*) = *g*·*S*_{c}(ρ) + (1 − *g*)·*S*_{m}(ρ), where *g* indicates a relative weight (0 < *g* < 1). When binocular correlation is 0% and −100%, the signed amplitude ratios of intermediate cells are *S*_{i}(0, *g*) = 0.5(1 − *g*) and *S*_{i}(−100, *g*) = −*g*, respectively. Because |*S*_{i}(−100, *g*)| = *g*, the absolute amplitude ratios at 0% and −100% correlations have the following relationship: |*S*_{i}(0, *g*)| = 0.5[1 − |*S*_{i}(−100, *g*)|]. The absolute amplitude ratios at the two correlations should therefore be negatively correlated across cells with different *g*.
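The relationship can be checked numerically. The sketch below implements the three expressions as written and evaluates them for a hypothetical set of weights *g*. The function names and the choice of weights are ours and serve only as an illustration.

```python
import math


def s_corr(rho):
    """Signed amplitude ratio, correlation-based representation (rho in %)."""
    return rho / 100.0


def s_match(rho):
    """Signed amplitude ratio, match-based representation (rho in %)."""
    return (rho + 100.0) / 200.0


def s_inter(rho, g):
    """Intermediate cell: weighted average of the two representations."""
    return g * s_corr(rho) + (1.0 - g) * s_match(rho)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical population of intermediate cells with weights 0.1 ... 0.9.
weights = [i / 10.0 for i in range(1, 10)]
abs_at_zero = [abs(s_inter(0.0, g)) for g in weights]
abs_at_minus100 = [abs(s_inter(-100.0, g)) for g in weights]

# The linear relationship implies a perfectly negative correlation here.
r = pearson_r(abs_at_minus100, abs_at_zero)
```

In this noise-free toy population the correlation coefficient is −1. Measurement noise would weaken it, but its sign should remain negative under the mixture hypothesis.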

Contrary to this prediction, we found positive correlations between −100% and 0% correlations for both amplitude ratio and peak-to-trough ratio (Fig. 6, *D* and *E*; *r* = 0.27, *P* = 0.02 for amplitude ratio; *r* = 0.38, *P* = 9 × 10^{−6} for peak-to-trough ratio). The positive correlation is predicted if *g* is negative (i.e., inhibitory input from the correlation-based unit to an intermediate cell). However, this scenario is implausible because a mixture of cells constructed with negative values of *g* does not reproduce the observed mean amplitude ratio (Fig. 5*B*). Thus it is unlikely that a mixture of intermediates between the correlation-based and match-based representations can explain the observed tuning amplitude of single V4 neurons. In addition, Fig. 6, *D* and *E*, reveals that our data contained no distinct groups of cells that have pure match-based representation or pure correlation-based representation (gray and black stars, respectively). We suggest that the tuning amplitude of single V4 neurons is not consistent with correlation-based representation or match-based representation, nor with intermediate representations between them (dashed lines in Fig. 6, *D* and *E*).

#### Graded anticorrelation gradually widened the distribution of phase difference.

We examined how the shape of the disparity tuning curve changed as a function of correlation level. Unlike previous studies (Cumming and Parker 1997; Tanabe et al. 2004), we used symmetry phase because it reflects the shape of a fitted function more accurately than the Gabor phase parameter (Read and Cumming 2004; Tanabe et al. 2005). We calculated the difference in symmetry phase between each correlation and 100% correlation to measure the change in the disparity tuning shape. The distribution of the phase difference had a mean around zero consistently across correlation levels (arrowheads in Fig. 7*A*, *left*). This was also the case for the distribution of the preferred disparity difference, calculated from the mean responses without model fitting (arrowheads in Fig. 7*A*, *right*). In the match-based representation, the tuning should have the same shape for any correlation levels (a phase difference of 0; Fig. 1*B*, *right*; note that the phase cannot be defined for the flat disparity tuning at −100% correlation). In the correlation-based representation, the tuning should have the same shape for any positive correlation but an inverted shape for any negative correlation (a phase difference of π; Fig. 1*B*, *left*). Therefore, the mean phase difference was more consistent with the match-based representation than with the correlation-based representation.

A noticeable feature of the distributions of the phase difference and peak disparity difference was that their width gradually increased as the correlation level decreased (Fig. 7*A*). The standard deviations of the distributions increased as a function of graded anticorrelation (Fig. 7*B*). These results may indicate a fundamental nature of V4 depth representation but also could arise from more trivial reasons. One such possibility is that phase differences were estimated less reliably at lower correlation levels, where the tuning amplitude was smaller (Fig. 5*B*). We calculated a 95% CI of phase difference using a bootstrap method (see materials and methods) and found that the CI tended to be larger for lower correlation levels, on average (Fig. 8*A*). However, the reliability difference alone cannot explain the widening of the phase-difference distribution with graded anticorrelation. We split the data set into two groups at each correlation level with respect to the grand median of all CI values (horizontal dashed line, Fig. 8*A*) and recalculated the SD of the phase-difference distribution for each group. We confirmed that the SD gradually increased with decreasing correlation level, even when we only used relatively reliable estimates of phase differences (Fig. 8*B*, filled circles). Moreover, there is no simple relationship between the SDs and the median CIs (Fig. 8*C*). Therefore, we suggest that wider phase-difference distributions at lower correlations reflect a genuine property of the population of V4 neurons. V4 neurons gradually change the shape of their disparity tunings with graded anticorrelation, a change that is incoherent among individual cells.
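The reliability control can be sketched as follows. We assume a circular standard deviation for the phase differences, which is an assumption of this sketch and not necessarily the estimator used for Figs. 7*B* and 8*B*. All values are invented for illustration, and the function names are hypothetical.

```python
import math
import statistics


def circular_sd(angles):
    """Circular SD (radians) from the mean resultant length."""
    if not angles:
        return float("nan")
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    r = min(math.hypot(c, s), 1.0)  # guard against rounding above 1
    if r == 0.0:
        return float("inf")
    return math.sqrt(-2.0 * math.log(r))


def split_sd_by_reliability(phase_diffs, ci_widths, grand_median):
    """SD of phase differences for cells with CI at or below vs. above the grand median."""
    reliable = [p for p, w in zip(phase_diffs, ci_widths) if w <= grand_median]
    unreliable = [p for p, w in zip(phase_diffs, ci_widths) if w > grand_median]
    return circular_sd(reliable), circular_sd(unreliable)


# Invented phase differences (rad) and bootstrap CI widths at two correlation levels.
data = {
    50: ([0.1, -0.1, 0.2, -0.2], [0.2, 0.3, 0.8, 0.9]),
    -100: ([1.0, -1.2, 2.0, -2.5], [0.4, 0.5, 1.5, 2.0]),
}
# Grand median across all cells and correlation levels.
grand_median = statistics.median(w for _, widths in data.values() for w in widths)
sd_by_level = {level: split_sd_by_reliability(phases, widths, grand_median)
               for level, (phases, widths) in data.items()}
```

If the widening were purely an estimation artifact, the SD of the reliable group would stay roughly constant across correlation levels. In this toy data set it does not.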

We showed that graded anticorrelation decreased the tuning amplitude and changed the tuning shape. We next examined whether these two changes reflect distinct influences of graded anticorrelation on the depth representation in V4. To this end, we compared the rates of amplitude reduction and phase shift. The mean amplitude ratio decreased but the SD of the phase-difference distribution increased with anticorrelation. To facilitate a direct comparison, we took the negative of the SD so that both amplitude ratio and the SD decreased with anticorrelation. We then normalized them into values between 0 and 1.

We found that the SD of the phase-difference distribution changed more gradually than the amplitude ratio (Fig. 9). The normalized amplitude ratio decreased to 0.5 at 77% correlation, whereas the normalized negative SD of the phase-difference distribution did so at 34% correlation (95% CIs are ±1.1% and ±4.0% for amplitude ratio and negative SD, respectively; gray horizontal error bars in Fig. 9). We suggest that graded anticorrelation influences the shape and amplitude of disparity tuning curves in a distinct way. The shape changes more gradually than the amplitude decreases.
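The comparison can be sketched with min-max normalization and a linear interpolation of the 0.5 crossing. Both choices are assumptions of this sketch, because the exact procedure is given in materials and methods rather than here. The numbers are invented.

```python
def normalise(values):
    """Rescale values linearly so that the minimum is 0 and the maximum is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def half_crossing(levels, normalised, criterion=0.5):
    """Correlation level at which a decreasing curve first falls to the criterion.

    levels must be sorted from high to low correlation; the crossing is
    found by linear interpolation between neighbouring levels.
    """
    points = list(zip(levels, normalised))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 >= criterion > y1:
            return x0 + (criterion - y0) * (x1 - x0) / (y1 - y0)
    return None


levels = [100, 50, 0, -50, -100]
mean_amplitude_ratio = [1.0, 0.4, 0.3, 0.25, 0.25]  # invented values
phase_sd = [0.2, 0.6, 1.0, 1.3, 1.4]                # invented values (rad)

# Negate the SD so that both measures decrease with anticorrelation.
negative_sd = [-s for s in phase_sd]

amplitude_half = half_crossing(levels, normalise(mean_amplitude_ratio))
phase_half = half_crossing(levels, normalise(negative_sd))
```

A higher crossing level for the amplitude than for the negative SD corresponds to the pattern reported above: the amplitude falls early, whereas the phase dispersion keeps growing at lower correlations.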

#### Population readout of V4 responses agrees with match-based representation of disparity.

To characterize disparity representation at the level of neural population in V4, we combined the tuning curves of individual neurons to construct the population readout. The population readout represents the disparity information in the case when individual responses are read out by a decision mechanism to make near vs. far choice. Briefly, we normalized responses for each neuron, converted far-preferring tuning curves into near-preferring ones, and averaged tuning curves across all disparity-selective neurons (see materials and methods for details). The width of the phase-difference distributions should have an impact on the amplitude of the population readout. Suppose a population of neurons with similar tuning shapes for correlated RDSs. A narrow distribution of the phase difference means that these neurons share similar tuning shapes also for anticorrelated RDSs. In this case, the responses averaged across cells would retain their disparity selectivity for anticorrelated RDSs. In contrast, a wide distribution of the phase difference means that anticorrelation changes tuning shapes incoherently among the population. In the latter case, the population readout would average out the disparity tuning of individual neurons.
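The averaging argument can be illustrated with a minimal sketch of the three steps named above. Several details are assumptions of the sketch: responses are normalized by each neuron's maximum, near is taken as negative disparity, and a far-preferring curve is converted by mirroring it about zero disparity. The two toy neurons are invented.

```python
DISPARITIES = [-0.8, -0.4, 0.0, 0.4, 0.8]  # deg; negative = near (assumed convention)


def prefers_far(curve):
    """True if the mean response to far disparities exceeds that to near ones."""
    near = [r for d, r in zip(DISPARITIES, curve) if d < 0]
    far = [r for d, r in zip(DISPARITIES, curve) if d > 0]
    return sum(far) / len(far) > sum(near) / len(near)


def population_readout(neurons):
    """Average normalised tuning curves after converting far cells to near cells.

    neurons: list of dicts mapping correlation level (%) -> responses per disparity.
    """
    levels = sorted(neurons[0])
    pooled = {c: [0.0] * len(DISPARITIES) for c in levels}
    for tuning in neurons:
        peak = max(max(curve) for curve in tuning.values())
        flip = prefers_far(tuning[100])  # preference defined at 100% correlation
        for c in levels:
            curve = [r / peak for r in tuning[c]]
            if flip:
                curve = curve[::-1]  # mirror about zero disparity
            for i, r in enumerate(curve):
                pooled[c][i] += r / len(neurons)
    return pooled


# Two toy neurons with identical tuning at 100% but opposite tuning at -100%.
cell_a = {100: [20.0, 16.0, 10.0, 4.0, 0.0], -100: [12.0, 10.0, 10.0, 10.0, 8.0]}
cell_b = {100: [20.0, 16.0, 10.0, 4.0, 0.0], -100: [8.0, 10.0, 10.0, 10.0, 12.0]}
readout = population_readout([cell_a, cell_b])
```

In this example both cells remain tuned at −100% correlation, yet the pooled curve is flat at that level because their tuning shapes changed in opposite directions.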

We found that the population readout for V4 neurons was consistent with the match-based disparity representation. The strong tuning at 100% correlation gradually attenuated to be flat at −100% correlation (compare Fig. 10*A* and Fig. 1*B*, *right*). The amplitude ratio of the population readout decreased toward zero over almost the entire range of correlation levels (compare Fig. 10*C* and Fig. 1*C*, gray). The decrease was not linear as in the simplest prediction we presented in Fig. 1*C*. However, the important feature of the data is that a small but nontrivial amount of the decrease occurred between 0% and −70% correlation. This feature agrees with the human psychophysical data, where the performance of fine depth discrimination decreased between 0% and −75% correlation (Doi et al. 2011; see Fig. 7*D* of Doi and Fujita 2014 for the subject average data). Moreover, the amplitude ratio of the population readout decreased to near zero, consistent with the performance decreasing to near chance level. By contrast, the mean amplitude ratio of individual neurons quickly decreased to a baseline level well above zero before reaching −30% correlation (Fig. 5*B*). The shape of the population readout was consistent across correlation levels, and the preferred disparity was consistently close to 0° for any correlation level (Fig. 10*A*). The gradual reduction of tuning amplitude and the invariance of tuning shape were also clear in a two-dimensional plot, where the response field was elongated vertically across binocular correlations (Fig. 10*B*). We also examined three alternative readout models in Fig. 10*D*: *1*) no decoding weights or response normalization (gray line), *2*) decoding weights beneficial for near/far discrimination (solid black line), and *3*) tuning-peak alignment (dashed black line). Like the main results (Fig. 10*C*), the amplitude ratio of all these alternatives gradually reduced toward zero over nearly the entire range of correlation levels. 
These results suggest that, unlike tuning curves of individual neurons, the readout of V4 population response is consistent with a solution to the stereo correspondence problem.

The population distribution of phase difference has noticeable dispersion in both V1 and MT, like the phase-difference distribution that we observed in V4 (Cumming and Parker 1997; Krug et al. 2004). This raises the possibility that, as in V4, a pooling process may average out the disparity tuning for anticorrelated RDSs in these areas. To examine this possibility, we constructed the population readout for disparity-selective cells in V1 and MT (*n* = 72 and 140, respectively). Briefly, we averaged individual Gabor fitted tuning curves after we aligned the curves at their preferred disparities for the V1 data set (see materials and methods for details). To the MT data set we applied the same method we used to construct the main population readout for the V4 data set.

The population readout of V1 neurons showed weak but clear inversion for the anticorrelated RDS (Fig. 11*A*). By visual inspection, the amplitude ratio of the V1 population readout appears smaller than that of individual neurons (0.52 on average; Cumming and Parker 1997). Thus it is likely that the readout process can diminish the disparity selectivity of individual V1 neurons to some extent. However, the results are in contrast to the population readout in V4, which showed very little, if any, modulation for the fully anticorrelated RDS. We suggest that the population representation of disparity in V4 advances from that in V1 toward a solution to the correspondence problem.

The population readout of MT neurons showed a subtle hint of tuning-curve inversion for the anticorrelated RDS (Fig. 11*B*), but the amplitude ratio (0.13) was much smaller than the average amplitude ratio of individual neurons (0.48). The distribution of the phase difference was bimodal (open bars in Fig. 11*C*). The highest peak was at around π (inverted tuning shape), and the second peak was around zero (same tuning shape). Thus the two groups of neurons directly cancelled out their disparity tunings when the population readout was calculated. Although the population readout presented so far was similar between V4 and MT, these two areas have differential disparity representations. A sizable fraction of MT neurons had significant disparity selectivity for both correlated and anticorrelated RDSs (65/140), whereas only a small fraction (9/92) of V4 neurons met the same criterion (∼46% in MT but ∼10% in V4). The difference between the two fractions was significant (χ^{2} test, *P* = 4.67 × 10^{−9}). When we focused on the disparity-selective subpopulation in the MT data set, the phase difference was distributed more unimodally at around π (Fig. 11*C*, filled bars), as reported in the original paper (Krug et al. 2004). The population readout of the MT subpopulation showed a clearer inversion between the correlated and anticorrelated RDSs (the amplitude ratio of 0.31; Fig. 11*D*). These results suggest that the population readout averages out the disparity tuning for anticorrelated RDSs in both V4 and MT. However, MT, but not V4, contains a subpopulation of neurons whose population readout is reminiscent of the correlation-based representation of disparity.
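The comparison of the two fractions can be reproduced from the counts given above. The sketch assumes a Pearson χ^{2} test on the 2 × 2 table without continuity correction, because the variant of the test is not specified in the text. The function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 df, no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p


# Rows: MT (65 of 140 selective for both stimuli) and V4 (9 of 92).
chi2, p = chi_square_2x2(65, 140 - 65, 9, 92 - 9)
```

Under this assumption the statistic is about 34 and the *P* value is of the order of 10^{−9}, in line with the value reported above.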

We also analyzed the population readout for a group of discarded cells in our V4 data set (*n* = 66). Despite their significant visual responses, we discarded these cells because of the lack of significant disparity selectivity at 100% correlation (Kruskal-Wallis test; *P* > 0.05). One can argue that observing the largest tuning amplitude at 100% correlation in the previous analysis (Fig. 10*A*) may have resulted from this selection procedure. We addressed this issue by constructing the population readout for the discarded cells, as we did for our main data set. We found that the discarded cells and the main data set shared an important feature: the tuning amplitude was the largest at 100% correlation. At least some of the 66 discarded cells had disparity selectivity that was too weak to reach our statistical criterion on an individual neuron basis. However, the selectivity became obvious after aggregation into the population readout, particularly for correlated RDSs. The results suggest that larger disparity tuning amplitudes at higher correlation levels reflect the genuine nature of visually responsive V4 neurons.

## DISCUSSION

We examined the representation of stereoscopic depth in macaque area V4 using the graded anticorrelation of RDSs. The amplitude of individual disparity tuning curves decreased sharply with graded anticorrelation (Fig. 5), whereas the width of the phase-difference distribution increased more gradually (Figs. 7 and 8). The amplitude of the population-pooled tuning curve, which combined both factors, gradually decreased toward zero with graded anticorrelation (Fig. 10), closely reflecting the proportion of binocularly matched features (Fig. 1*C*, gray). We suggest that the V4 population-pooled response, but not the single-cell response, represents a solution to the stereo correspondence problem.

#### Effects of our stimulus manipulation on visual perception other than stereoscopic depth.

We manipulated the binocular disparity and correlation of RDSs to examine the neuronal processing of depth perception. In addition to depth perception, other aspects of visual perception may also have varied. First, the surrounding annulus might be perceptually segregated from the center disk when the center disk had a nonzero disparity or a correlation level smaller than 100%, because the surrounding annulus always had 0° disparity and 100% correlation. Second, the perceived size of the center disk might be larger for a more crossed disparity, because the angular size of the center disk was kept constant (Tanaka and Fujita 2015). The neuronal responses that we observed might be exclusively associated with these aspects other than depth perception. However, this is unlikely because the responses of V4 neurons measured with similar RDSs are causally linked to the perceptual judgment of stereoscopic depth (Shiozaki et al. 2012). Also, we used a similar set of RDSs and graded anticorrelation in previous psychophysical studies of depth perception (Doi et al. 2011, 2013) and derived the predictions shown in Fig. 1*C*. Thus our predictions have already taken into account epiphenomenal changes in perception that our stimulus manipulation may cause.

#### Advancement from a previous V4 study that used correlated and anticorrelated RDSs.

The disparity selectivity of V4 neurons was previously examined with correlated and anticorrelated RDSs (Tanabe et al. 2004): the mean amplitude ratio among individual V4 neurons (0.38) is slightly smaller than that among V1 neurons (0.52; Cumming and Parker 1997). The phase difference is uniformly distributed without any apparent peak, in contrast to the distribution with a clear peak at π in V1.

We used not only correlated and anticorrelated RDSs but also graded mixtures of these two extremes and revealed new characteristics of V4. First, the tuning phase shifted (phase-difference distribution became wider) more gradually than the tuning amplitude reduced with graded anticorrelation (Fig. 9). Importantly, human observers decrease their disparity discrimination performance over the range of negative correlations (Doi et al. 2011). The amplitude of individual neurons' tuning curves decreased mostly over the positive range, whereas the phase shifted over not only the positive but also the negative range of binocular correlation, suggesting that phase shift is a better correlate of the change in perceptual performance.

Second, the population but not individual tuning curves had an invariant shape across correlation levels (Fig. 10, *A* and *B*). The tuning invariance is a key feature for the flexible readout of sensory information (Jazayeri 2008; also see Anderson et al. 2000; Cadieu et al. 2007; Finn et al. 2007; Ison and Quiroga 2008; Rust and DiCarlo 2012). Our data indicate that even when tuning invariance is not maintained at the single-neuron level, the population representation can retain it, provided that individual neurons change their tuning shapes in different yet balanced ways.

#### A possible mechanism underlying observed amplitude reduction and phase shift.

The disparity tuning curve of V4 neurons gradually shifted phase (shape) and quickly reduced amplitude with graded anticorrelation (Fig. 4). We propose that a combination of modified disparity energy models (subunits) can reproduce the observed tuning curves (Fig. 12*C*). The original energy model (Ohzawa et al. 1990) decreases the tuning amplitude without changing the tuning shape as the correlation level decreases from 100% to 0%. The tuning curve is flat at 0% correlation and grows in an inverted shape as the correlation level further decreases from 0% to −100% (Fig. 1*B*, *left*). Threshold nonlinearity operating on the output of the energy model can discard the selectivity for aRDSs (Fig. 12*A*; Lippert and Wagner 2001; Nieder and Wagner 2001; but see Samonds et al. 2013 for the involvement of recurrent processing). The same nonlinearity can also explain the reduced but nonzero tuning amplitude at 0% correlation (Doi and Fujita 2014), because the additional nonlinearity can convert disparity-dependent variability in energy-model responses into the modulation of mean firing rate (Doi et al. 2013). A limitation of this thresholded energy model is that the tuning curve must be even symmetric. To explain the reduced amplitude for odd-symmetric tuning, we need to combine two even-symmetric subunits (Haefner and Cumming 2008). In our example shown in Fig. 12*B*, we subtracted the responses of a far-preferring subunit from those of a near-preferring subunit to realize an odd-symmetric subunit. These subunits had lower thresholds than the subunit shown in Fig. 12*A*, so the response pattern was intermediate between match based and correlation based: reduced but clear selectivity at −100% correlation and nonzero but weak selectivity at 0% correlation. The subunit model still has either the original or inverted tuning shape, but not other shapes, at any correlation level.

We propose that the tuning curves observed in V4 can be explained by combining the units shown in Fig. 12, *A* and *B* (Fig. 12*C*; see Read et al. 2002 for a similar model using a unit with amplitude ratio >1). Our model is equivalent to a combination of three energy-model subunits, a straightforward extension of the combination of two (Haefner and Cumming 2008). As the correlation level decreased from 100% to 0%, the tuning amplitude quickly decreased, but the tuning shape was preserved. As the correlation level further decreased, the amplitude did not change, but the tuning shape shifted from even symmetric to odd symmetric. These response patterns agreed well with those observed for the example cell shown in Fig. 4*A*. Combining different kinds of subunits would reproduce the observed population data: graded anticorrelation quickly decreased the average amplitude ratio, whereas it gradually increased the width of phase-difference distribution (Fig. 9). Overall, we suggest that an extended subunit model that combines more than two subunits can explain disparity tuning of V4 neurons as a function of graded anticorrelation.
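The qualitative behavior of such a combination can be illustrated with a deliberately simplified toy model. It is not the simulation shown in Fig. 12. Each subunit is reduced to a mean drive (a baseline plus a Gaussian tuning term scaled by correlation) followed by a threshold, and all parameter values and function names are assumptions. Because the sketch ignores response variability, it cannot reproduce the weak residual tuning at 0% correlation.

```python
import math


def gauss(d, center, sigma=0.3):
    """Gaussian disparity tuning term (peak 1 at the preferred disparity)."""
    return math.exp(-0.5 * ((d - center) / sigma) ** 2)


def thresholded_subunit(d, rho, center, threshold, baseline=1.0, gain=1.0):
    """Mean drive of a simplified subunit followed by half-wave rectification.

    rho is binocular correlation in percent (-100 to 100).
    """
    drive = baseline + (rho / 100.0) * gain * gauss(d, center)
    return max(drive - threshold, 0.0)


def combined_unit(d, rho):
    """Even-symmetric high-threshold subunit plus a near-minus-far pair."""
    even = thresholded_subunit(d, rho, center=0.0, threshold=1.0)
    near = thresholded_subunit(d, rho, center=-0.3, threshold=0.5)
    far = thresholded_subunit(d, rho, center=0.3, threshold=0.5)
    return even + (near - far)


def tuning_curve(rho):
    """Response of the combined unit across disparities from -1 to 1 deg."""
    return [combined_unit(i / 10.0, rho) for i in range(-10, 11)]


amplitude = {rho: max(tuning_curve(rho)) - min(tuning_curve(rho))
             for rho in (100, 0, -100)}
```

In this toy unit the high-threshold subunit is silent for negative correlations, so the tuning at −100% correlation is carried by the near-minus-far pair alone and is odd symmetric, whereas the tuning at 100% correlation contains both even and odd components.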

#### Comparison with other visual areas.

Our results indicate that the population readout of V4 responses is consistent with a solution to the stereo correspondence problem and distinctively advanced from the population readout of V1 responses. For anticorrelated RDS, the population readout was nearly completely flat in V4, but had clear tuning in V1. The difference between V4 and MT was more nuanced. When we selected neurons based on the same criterion (significant disparity selectivity for correlated RDSs), the MT population readout showed nearly flat tuning for the anticorrelated RDS with a hint of tuning-shape inversion. However, V4 and MT responses had several important differences. First, the distribution of phase difference had a peak around zero in V4, whereas the distribution had two peaks in MT: the main peak around π and the secondary peak around zero. Second, only a small fraction of V4 cells had disparity selectivity for anticorrelated RDSs (∼10%), whereas nearly half the neurons did so in MT (∼46%). Moreover, the population readout for these MT neurons showed clear tuning inversion from anticorrelation, consistent with an intermediate representation between correlation based and match based. These results suggest that V4 is unique in that disparity-selective responses in V4 do not contain primitive, correlation-based representation of disparity, whereas the responses in V1 and MT do. A more rigorous comparison between these areas will be possible when graded anticorrelation is applied to V1 and MT. Similarly to V4 and MT population responses, single-neuron responses in the anterior part of the IT cortex and AIP area lose disparity selectivity by anticorrelation (Janssen et al. 2003; Theys et al. 2012). Single neurons in these areas may integrate the responses from multiple V4 or MT neurons that have different tuning shapes for aRDS via direct or, more likely, indirect connections (Baizer et al. 1991; Borra et al. 2008; Nakamura et al. 2001; Ungerleider et al. 2008).

#### Roles of V4 and other visual areas in stereopsis.

Area V4 may underlie fine stereopsis. The population readout of V4 responses is consistent with the match-based representation of disparity, which underlies fine depth discrimination in humans (Doi et al. 2011). The population readout had a slope at around zero disparity (Fig. 10*A*; also see Tanabe et al. 2005 for a similar curve constructed with a larger number of cells and finer sampling points), which is useful in the discrimination of fine depth. This view is consistent with previous studies: V4 neurons are selective for relative disparity (Umeda et al. 2007), a prerequisite for contributing to fine stereopsis (Uka and DeAngelis 2006); electrical microstimulation of V4 biases fine depth discrimination toward the disparity preference of the stimulated site (Shiozaki et al. 2012).

It remains unclear whether the population readout of V4 quantitatively matches fine depth discrimination in humans. When the correlation level decreased to 0%, the amplitude ratio of the population readout decreased to a small value (0.15), although the discrimination performance of humans remained perfect (Doi et al. 2011). The difference in stimulus eccentricity can explain at least part of the discrepancy: the average eccentricity in this study was 9.0°, whereas RDSs were presented foveally in the human study. The discrimination performance at 0% correlation decreases with stimulus eccentricity (Doi et al. 2013), possibly because the construction of the match-based disparity representation relies on small receptive fields (Doi et al. 2013; Doi and Fujita 2014). Recording from V4 while monkeys are discriminating the depth of gradually anticorrelated RDSs will be necessary to directly examine the issue.

Although direct evidence is lacking, V4 may also partially contribute to coarse stereopsis, along with area MT (DeAngelis et al. 1998; Uka and DeAngelis 2003, 2004, 2006), because human coarse discrimination relies on both match-based and correlation-based representations (Doi et al. 2011). Monkeys trained on fine discrimination retain the ability to discriminate coarse depth even when MT is inactivated (Chowdhury and DeAngelis 2008), possibly reflecting the contribution of V4 to coarse depth.

Individual neurons in IT and AIP represent a solution to the stereo correspondence problem (Janssen et al. 2003; Theys et al. 2012), like the population readout of V4 responses. However, the behavioral roles of these three areas may be different. The response fluctuation correlated with fine depth judgment emerges soon after stimulus onset in V4, suggesting a bottom-up contribution of V4 to fine stereopsis (130 ms; Shiozaki et al. 2012). By contrast, such responses emerge later during stimulus presentation in IT (360 ms; Uka et al. 2005), likely reflecting a top-down signal from decision-making areas (Nienborg and Cumming 2009). In a three-dimensional shape discrimination task, IT neurons show choice-related response fluctuations soon after stimulus onset (140 ms; Verhoef et al. 2010), whereas AIP neurons show such activity later during stimulus presentation (300–790 ms). The match-based representations in V4, IT, and AIP may serve different functions for stereopsis, such as fine depth discrimination, three-dimensional object recognition, and grasping (Theys et al. 2012), respectively.

In conclusion, we constructed a population readout for the disparity-selective responses of V4 neurons. The amplitude of the population readout gradually decreased with graded anticorrelation. The tuning-shape invariance was maintained at the level of the population readout, but not at the single-neuron level. We suggest that the disparity-selective response in area V4 is consistent with a solution to the stereo correspondence problem when combined across a population of neurons, but not individually. It remains to be determined which part of the human visual cortex represents disparity information in a manner similar to that revealed here in macaque V4, particularly because the human homologue of area V4 is not well established (Brewer et al. 2005; Hansen et al. 2007; Kolster et al. 2014). We also note that the present study and related psychophysical and computational studies (Doi et al. 2011, 2013; Doi and Fujita 2014) focused on one aspect of depth perception and the correspondence problem: near vs. far discrimination. The role of area V4 in solving the correspondence problem in a more general sense (Read and Cumming 2007) may be an important open question for future research.

## GRANTS

This work was supported by Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT) Grants 23240047, 23135522, and 15H01437 (to I. Fujita).

## DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the authors.

## AUTHOR CONTRIBUTIONS

M.A., T.D., and I.F. conception and design of research; M.A. performed experiments; M.A., T.D., and H.M.S. analyzed data; M.A., T.D., H.M.S., and I.F. interpreted results of experiments; M.A. prepared figures; M.A. drafted manuscript; M.A., T.D., H.M.S., and I.F. edited and revised manuscript; M.A., T.D., H.M.S., and I.F. approved final version of manuscript.

## ACKNOWLEDGMENTS

We thank B. G. Cumming, K. Krug, and A. J. Parker for giving us an opportunity to share their data on responses of areas V1 and MT, and S. Aoki and K. Ikezoe for technical assistance. A monkey was provided by the National Institute of Natural Sciences (NINS) through the National Bio-resource Project (NBRP) of the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan.

Present address of M. Abdolrahmani: Laboratory for Neural Circuits and Behavior, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.

Present address of T. Doi: Department of Neuroscience, University of Pennsylvania, Philadelphia, PA 19104-6074.

Present address of H. M. Shiozaki: Laboratory for Circuit Mechanisms of Sensory Perception, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.

Copyright © 2016 the American Physiological Society