Stereo processing begins in the striate cortex and involves several extrastriate visual areas. We quantitatively analyzed the disparity-tuning characteristics of neurons in area V4 of awake, fixating monkeys. Approximately half of the analyzed V4 cells were tuned for horizontal binocular disparities embedded in dynamic random-dot stereograms (RDSs). Their response preferences were strongly biased for crossed disparities. To characterize the disparity-tuning profile, we fitted a Gabor function to the disparity-tuning data. The distribution of V4 cells showed a single dense cluster in a joint parameter space of the center and the phase parameters of the fitted Gabor function; most V4 neurons were maximally sensitive to fine stereoscopic depth increments near zero disparity. Comparing single-cell responses with background multiunit responses at the same sites showed that disparity-sensitive cells were clustered within V4 and that nearby cells possessed similar preferred disparities. Consistent with a recent report by Hegdé and Van Essen, the disparity tuning for an RDS drastically differed from that for a solid-figure stereogram (SFS). Disparity-tuning curves were generally broader for SFSs than for RDSs, and there was no correlation between the fitted Gabor functions' amplitudes, widths, or peaks for the two types of stereograms. The differences were partially attributable to shifts in the monocular images of an SFS. Our results suggest that the representation of stereoscopic depth in V4 is suited for detecting fine structural features protruding from a background. The representation is not generic and differs when the stimulus is broad-band noise or a solid figure.
Tests of neuronal responses with a dynamic random-dot stereogram (RDS) isolate neuronal sensitivity to horizontal binocular disparity. Cells with disparity-tuned responses to RDSs are found in various visual cortical areas of the monkey brain, from as early as the striate cortex (V1) to as high as the posterior parietal and the inferior temporal (IT) cortices (Janssen et al. 2001; Poggio et al. 1985; Taira et al. 2000). Disparity-selective responses in extrastriate areas are more closely linked to the perception of stereoscopic depth than responses in V1 (Bradley et al. 1998; DeAngelis et al. 1998; Dodd et al. 2001; Janssen et al. 2003; Tanabe et al. 2004; Thomas et al. 2002; Tsutsui et al. 2002; Uka and DeAngelis 2003). Characterizing the disparity-tuning properties of neurons in each cortical area advances our understanding of the neuronal processes underlying stereoscopic depth perception in two ways. First, for the encoding of stereoscopic depth, systematic differences between areas help us understand how responses in higher areas of the primate visual processing hierarchy are derived from responses in lower areas. Second, for the decoding of depth signals, population characteristics indicate how the representation of stereoscopic depth in a given area can be used to form a perceptual judgment about stereoscopic depth. Quantitative analyses of neuronal disparity-tuning characteristics revealed several differences in the stereoscopic depth representations in V1 and the middle temporal area (MT/V5) (DeAngelis and Uka 2003; Prince et al. 2002a). In this study, we quantitatively characterize the disparity selectivity of neurons in area V4.
Area V4 is a major stage in the ventral visual processing pathway, which projects from V1 to IT of the monkey extrastriate visual cortex. It is involved in object and color vision (Heywood et al. 1992; Merigan 1996; Schiller 1993), as well as more cognitive functions such as attention and visual search (Connor et al. 1997; Mazer and Gallant 2003; McAdams and Maunsell 1999; Moore and Armstrong 2003; Motter 1994; Ogawa and Komatsu 2004). V4 neurons are also sensitive to binocular disparity, suggesting that they carry information about stereoscopic depth (Hegdé and Van Essen 2005; Hinkle and Connor 2001, 2002; Tanabe et al. 2004; Watanabe et al. 2002).
Some studies used solid-figure stereograms (SFSs) to test the disparity selectivity of V4 neurons (Hinkle and Connor 2001, 2002; Watanabe et al. 2002). An SFS is a reasonable stimulus choice for V4 cells because these neurons are selective for the form and pattern of a stimulus (Desimone and Schein 1987; Gallant et al. 1996; Kobatake and Tanaka 1994; Pasupathy and Connor 1999). Additionally, local features of natural images often contain contrast edges similar to those in solid figures. Tests with SFSs may thus be relevant for understanding depth coding in V4 in some respects, but do not probe genuine disparity selectivity. On the other hand, dynamic RDSs provide a test to isolate neuronal sensitivity to binocular disparity from neuronal modulation caused by other factors.
When viewing an RDS, the stereoscopic system extracts a globally consistent match from the numerous possible matches between the visual patterns projected to the left and right eyes (Julesz 1971). A shift of the depth of the visual target is visible only with the stereoscopic system and is invisible with a monocular system because there is neither a positional shift of the random-dot patch nor a coherent motion of dots because of the constant renewal of the dot pattern. On the other hand, much less computational complexity is required when viewing an SFS. The matching of the corresponding patterns in the two eyes is unique for an SFS. Even a monocular system is sensitive to a shift in the depth of an SFS because the shift is always associated with a positional shift of the monocular image. Thus with an SFS, it is difficult to isolate responses to binocular disparity from responses to positional shifts in monocular features.
To fully characterize stereo processing in V4, it is thus important to study the disparity tuning of V4 neurons using both an RDS and an SFS. V1 and V2 neurons reportedly exhibit similar disparity tuning for RDSs and for SFSs (Gonzales and Perez 1998; Poggio 1990; Poggio et al. 1985), although no quantitative population analyses have been published. A recent study examined the disparity tuning of V4 neurons with both RDSs and SFSs (Hegdé and Van Essen 2005), and showed that the type of stereogram influences the disparity tuning of most V4 cells. In the present study, we addressed two issues regarding the comparison of disparity tuning for an RDS and for an SFS. First, we examined the differences in the parameters that characterize the disparity-tuning curves. Second, we examined whether the effect of the positional shifts in the monocular images of an SFS can explain the differences between these curves. We also addressed the functional organization of disparity-selective cells in V4 by analyzing the single-unit and multiunit responses recorded simultaneously at the same sites.
Two female and one male Japanese macaque monkeys (Macaca fuscata) were used. Details of the surgical procedure have been published elsewhere (Uka et al. 2000). Briefly, under full anesthesia and aseptic conditions, stainless steel and plastic bolts were screwed into holes drilled through the skull. Acrylic resin was mounted over the skull, firmly connecting the bolts with a plastic post for head restraint. A custom-made plastic recording chamber was placed 5 mm posterior and 25 mm dorsal to the ear canal. After at least 1 wk of recovery, scleral search coils made of Teflon-insulated stainless steel wires (Cooner Wire, Chatsworth, CA) were implanted into both eyes. All surgical, experimental, and care protocols conformed to the National Institutes of Health Guide for the Care and Use of Laboratory Animals (1996), and were approved by the Animal Experiment Committee of Osaka University.
Task and visual stimulation
A monkey was seated with its head restrained in a primate chair. A computer display (NuVision 21MX, MacNaughton, Beaverton, OR) was set 57 cm away from the monkey's eyes. The display subtended 40 × 30° of the monkey's visual field. We trained the monkeys to perform a simple fixation task, which was controlled using a commercially available system (TEMPO, Reflective Computing, St. Louis, MO). When a white fixation point (0.2 × 0.2°) appeared at the center of the screen, the monkey was required to make an eye movement toward it within 500 ms. During the next 2 s, the monkey had to maintain fixation within an invisible window that was typically a 1.4 × 1.4° square centered at the fixation point. The vergence was restricted to be within ±0.5° relative to the fixation plane. After successful trials, the monkey was rewarded with a drop of juice or water. When the monkey's eyes moved away from the fixation or the vergence window, the task was aborted immediately and no reward was delivered. Intertrial intervals were randomly selected to be 0.5, 1.0, or 1.5 s at an equal probability.
We developed a visual stimulation program using the OpenGL Utility Toolkit (GLUT). A dynamic RDS was presented for 1 s from 500 ms after the onset of the fixation period to 500 ms before the offset (i.e., the RDS period was centered on the 2-s fixation period). The random-dot pattern was composed of 50% bright (3.5 cd/m²) and 50% dark (0.4 cd/m²) dots (Fig. 1A). All the dots had a size of 0.17 × 0.35° and were positioned following a uniform distribution. The dot density was 26%. Antialiasing was accomplished by the hardware of the video board. The random-dot pattern was renewed every five frames (12 Hz). The RDS was a bipartite patch consisting of a center disk whose disparity was varied within a range of ±1.6 or ±1.2° and a surrounding annulus whose disparity was fixed at zero. The width of the annulus was 1°. Because the largest disparity tested in the present experiments was 1.6°, the largest shift of the center disk was +0.8° in one eye and −0.8° in the other eye. The width of the annulus ensures that no dots composing the disk appear outside the random-dot patch (i.e., no shift occurs in the patch position). The background was a uniform field of midlevel luminance (1.5 cd/m²). For visual stimulation, we illuminated only the red phosphors because their relatively quick decay time resulted in minimal interocular cross talk. With our dichoptic display device in which a liquid-crystal filter alternated the polarity of the display every frame (120 Hz), the left-to-right cross talk was 10% and the right-to-left cross talk was 0% (Tanabe et al. 2004).
After the monkeys were sufficiently trained, a hole for electrode insertion was drilled through the skull inside the recording chamber.
On recording sessions, a micromanipulator (MO-95S, Narishige, Tokyo, Japan) with a tungsten-in-glass microelectrode was attached to the recording chamber. Extracellular voltage signals were amplified and filtered with custom-made instruments. Action potentials of single units were isolated by either a custom-made window-discriminator or a template-matching spike-sorting system (Multi Spike Detector, Alpha-Omega Engineering, Nazareth, Israel) and were recorded at 1-ms resolution. In parallel, background multiunit activity and the v-sync pulses of the visual stimulation were recorded. Eye positions were monitored with the magnetic search coil technique (MEL-25, Enzansi Kogyou, Tokyo, Japan). Eye position signals were recorded at a 1-kHz sampling rate. Before recording, we located area V4 inside the recording chamber of each monkey based on the retinotopic map, the size-eccentricity relationship of receptive fields (RFs), and the surrounding sulci (Gattass et al. 1988; Watanabe et al. 2002).
When single-unit spikes were isolated, we determined the position of the cell's RF and the effective stimulus. The probe stimulus was selected from a small bright bar, a small RDS patch, and a drifting sinusoidal grating, based on which stimulus evoked the highest response. We used this stimulus to map the cell's RF. In the initial experiments, we mapped the RF with only a probe stimulus at zero disparity. In later experiments, we tried to maximally drive the cell by testing both crossed and uncrossed disparities in addition to zero disparity for the initial survey of the cell's receptive field. After the RF was determined, we presented an RDS that covered the entire RF. If an RDS of this size evoked a sufficiently strong response, we went on to the recording session; otherwise, we reduced the size of the RDS patch until a sufficient response was evoked. The eccentricity of the RFs ranged from 3.7 to 15.0°. The minimum patch diameter of the RDSs used in the experiments was 4°. In the recording sessions, the tested disparity values and the left and right monocular presentations were randomly ordered. All stimulus conditions were tested at least once before going on to the next block of trials.
In a subset of the tested cells, we examined the horizontal disparity tuning for a bright-bar SFS (Fig. 1B). The length, width, and orientation of the bar were manually adjusted to effectively drive the cell at zero disparity. The bar was typically slightly shorter than the diameter of the RF (range of bar length: 2.3 to 7.4°). A typical bar width was 0.4° (range: 0.2 to 1.4°). In addition to the disparity-tuning test, we examined the responses to monocular presentations of the SFS. The monocular images were the same images for one of the eyes in the disparity-tuning test. For instance, the left monocular image corresponding to an uncrossed disparity of 0.2° was a bar shifted 0.1° to the left from the center. The trials for the disparity tuning for an RDS, the disparity tuning for an SFS, the left monocular shift tuning for an SFS, and the right monocular shift tuning for an SFS were interleaved randomly in the same recording session. Neither orientation disparities nor vertical disparities were presented in this study.
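The relation between a disparity value and the monocular image shifts can be sketched in a few lines. This is only an illustration of the arithmetic stated in the text (an uncrossed disparity of 0.2° corresponds to a left-eye image shifted 0.1° to the left); the function name and the sign convention (uncrossed disparities positive, rightward shifts positive) are assumptions made for the example.

```python
def monocular_shifts(disparity_deg):
    """Split a horizontal disparity into left- and right-eye image shifts.

    Assumed convention (not stated as code in the source): uncrossed
    disparities are positive, crossed disparities are negative, and
    positive shifts are rightward on the screen.
    """
    half = disparity_deg / 2.0
    # The two eyes receive equal and opposite shifts.
    return -half, +half  # (left-eye shift, right-eye shift)


left_shift, right_shift = monocular_shifts(0.2)  # uncrossed 0.2 deg
print(left_shift, right_shift)
```

With the largest tested disparity of 1.6°, each monocular shift is 0.8°, which is smaller than the 1° annulus width used for the RDS.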
After recording experiments were completed, we histologically reconstructed the recording site in one of the three monkeys. The monkey was overdosed with pentobarbital sodium (64 mg/kg of BW), and then transcardially perfused with phosphate-buffered saline followed by 4% paraformaldehyde. A metal pin was inserted into the brain at each corner of the recording chamber (four black dots in Fig. 1C). We removed the brain from the skull and cut out a block of brain tissue circumscribed by the four pins. The brain tissue was immersed in a graded series of sucrose solutions (10–30%) for 3 days. Then it was frozen, cut into 50-μm sections, and mounted on gelatin-coated glass slides. After the sections were dried, they were stained with standard Nissl staining methods. We found scars from electrode penetrations only in the prelunate gyrus (shaded area in Fig. 1C). The recording site was histologically identified as area V4.
The spike train of each trial was aligned to the onset of the visual stimulus using the presentation command signal and the following v-sync pulses. The response was evaluated as the firing rate from 80 ms after the onset to 80 ms after the offset of the visual stimulus. The 80-ms delay compensated for the typical neuronal firing latency in V4 (Tanabe et al. 2004). For reliable statistics, cells were discarded from the analysis unless at least six trials with good spike isolation were accumulated for each stimulus condition. For most cells, data were accumulated for ten trials.
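As a minimal sketch of the response window described above, the firing rate of a single trial can be computed by counting spikes in a window delayed by 80 ms relative to the stimulus period. The function name, the millisecond units, and the half-open interval are assumptions for illustration only.

```python
def response_rate(spike_times_ms, stim_on_ms, stim_off_ms, latency_ms=80.0):
    """Firing rate (spikes/s) in the stimulus window shifted by the latency.

    The window runs from (onset + latency) to (offset + latency); treating
    it as half-open [start, stop) is an assumption of this sketch.
    """
    start = stim_on_ms + latency_ms
    stop = stim_off_ms + latency_ms
    n_spikes = sum(1 for t in spike_times_ms if start <= t < stop)
    return n_spikes / ((stop - start) / 1000.0)


# Hypothetical spike times (ms) for a 1-s stimulus starting at t = 0.
print(response_rate([50, 100, 500, 1050, 1100], 0.0, 1000.0))
```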
For an assessment of disparity selectivity, we calculated the disparity discrimination index (DDI) (DeAngelis and Uka 2003; Prince et al. 2002a). The firing rate from each trial was square-root transformed, and the DDI was calculated as follows

$$\mathrm{DDI} = \frac{R_{\max} - R_{\min}}{R_{\max} - R_{\min} + 2\sqrt{SSE/(N - M)}}$$

where Rmax and Rmin are the maximum and minimum mean responses among the disparities tested, SSE is the sum of the squared error of the responses at each disparity, N is the total number of trials, and M is the number of disparities tested.
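The DDI computation can be sketched as follows. The sketch assumes the standard form of the index from the cited studies, DDI = (Rmax − Rmin) / (Rmax − Rmin + 2·sqrt(SSE/(N − M))), applied to square-root-transformed firing rates; the function name and the dictionary input layout are hypothetical.

```python
import math


def ddi(rates_by_disparity):
    """Disparity discrimination index from per-trial firing rates.

    rates_by_disparity maps each tested disparity to a list of firing
    rates (spikes/s), one per trial. Assumes the standard DDI formula.
    """
    # Square-root transform every trial, as described in the text.
    sq = {d: [math.sqrt(r) for r in trials]
          for d, trials in rates_by_disparity.items()}
    means = {d: sum(v) / len(v) for d, v in sq.items()}
    # SSE: squared deviations of trials from the mean at their own disparity.
    sse = sum((x - means[d]) ** 2 for d, v in sq.items() for x in v)
    n_trials = sum(len(v) for v in sq.values())  # N
    n_disp = len(sq)                             # M
    rms_error = math.sqrt(sse / (n_trials - n_disp))
    spread = max(means.values()) - min(means.values())
    return spread / (spread + 2.0 * rms_error)


example = {-0.4: [16.0, 25.0], 0.0: [4.0, 4.0], 0.4: [1.0, 1.0]}
print(ddi(example))
```

The index approaches 1 when the response modulation is large relative to trial-to-trial variability and approaches 0 when variability dominates. The sketch does not guard against the degenerate case of zero modulation and zero variability.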
We fitted Gabor functions to quantitatively analyze the disparity-tuning characteristics of V4 cells. A Gabor function is expressed as follows

$$R(x) = y_0 + A \exp\left[-\frac{(x - x_0)^2}{2\sigma^2}\right] \cos\left[2\pi f (x - x_0) + \phi\right]$$

where x is the disparity and R is the firing rate. All fitting calculations were done with the square-root values of both the response data and the function to reduce the proportionality of the variance to the mean of the firing rates (Prince et al. 2002a). The merit function of the fitting procedure was the summed squared error of the responses from each trial to a half-wave–rectified Gabor function. The rectification allowed appropriate fitting results even when neuronal responses were clipped at a zero firing rate.
The parameter combination that minimized the merit function was obtained with the “fmincon” function of the Optimization Toolbox in MATLAB (The MathWorks, Natick, MA). This function constrained the parameter values to prevent unreasonable fitting results. The vertical offset (y0) was constrained to values between zero and the maximum response of all the trials. The amplitude of the Gaussian envelope (A) was constrained between zero and twice the difference between the maximum and minimum responses of all the trials. The horizontal offset of the Gaussian envelope (x0) was constrained to values within the disparity range being tested. The width of the Gaussian envelope (σ) was constrained to values between 0.1° and the total range of tested disparities. The frequency of the cosine carrier (f) was constrained to ±10% of the frequency at the primary peak of the power spectrum of the raw tuning curve. The phase of the cosine carrier (φ), relative to the center of the Gaussian envelope, was constrained to be within ±3π.
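The model and merit function described above can be sketched as below. This is an illustration under stated assumptions, not the original MATLAB code: the Gabor is written in the usual six-parameter form (y0, A, x0, σ, f, φ) with the phase defined relative to the envelope center, the function names are hypothetical, and the constrained minimization itself (performed with "fmincon" in the study) is not reproduced.

```python
import math


def gabor(x, y0, A, x0, sigma, f, phi):
    """Gabor tuning function: baseline plus Gaussian envelope times cosine."""
    envelope = math.exp(-((x - x0) ** 2) / (2.0 * sigma ** 2))
    return y0 + A * envelope * math.cos(2.0 * math.pi * f * (x - x0) + phi)


def merit(params, disparities, rates):
    """Summed squared error between sqrt(rate) and sqrt(rectified Gabor).

    disparities and rates hold one entry per trial. The model is
    half-wave rectified before the square-root transform.
    """
    total = 0.0
    for x, r in zip(disparities, rates):
        model = max(gabor(x, *params), 0.0)  # half-wave rectification
        total += (math.sqrt(r) - math.sqrt(model)) ** 2
    return total


# Hypothetical parameter set: peak of 30 spikes/s at a crossed disparity.
params = (10.0, 20.0, -0.2, 0.4, 0.5, 0.0)
print(gabor(-0.2, *params))
```

Any bounded optimizer could then minimize `merit` over `params` subject to the six constraints listed in the text.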
Direct comparison of the fitted Gabor function parameters of the disparity-tuning curves for RDSs and for SFSs was made by simultaneously fitting the two sets of data. x0 and φ both reflect the horizontal position, whereas f and σ both reflect the width of a disparity-tuning curve. Thus very similar curves can be obtained with different combinations of these parameter values (Prince et al. 2002a). To improve comparisons of the fitted parameters of the two curves, both x0 and σ were shared by both curves. The other four parameters, y0, A, f, and φ, were fitted independently using the six constraints described above. For the merit function, we calculated the summed squared error of the raw firing-rate values with respect to the raw values of the Gabor function, rather than their square-root values because a square-root transformation would not be defined for the negative values we obtained when we subtracted responses to monocular presentations from responses to binocular presentations. Details of the subtraction are described in the results section.
The phase parameter φ indicates how much the disparity-tuning curve is displaced from the center of the Gaussian envelope and gives an estimate of the symmetry of the curve. Estimation of symmetry with φ, however, is sometimes misleading when the cycle of the carrier 1/f is much larger than the envelope width σ (Prince et al. 2002a). The symmetry phase is a better estimate to evaluate the symmetry of the fitted function itself (Read and Cumming 2004). To obtain the symmetry phase, we first calculated the centroid of the fitted function, which is given by

$$\bar{x} = \frac{\int x\,|R(x)|\,dx}{\int |R(x)|\,dx}$$

where R(x) is the fitted function after subtraction of the baseline parameter y0. This centroid was the point along the disparity axis where symmetry was evaluated (Fig. 2 in Read and Cumming 2004). As the reflection of R(x) with respect to x̄ is Rrefl(x) = R(−x + 2x̄), a completely even-symmetric tuning would satisfy R(x) = Rrefl(x), whereas a completely odd-symmetric tuning would satisfy R(x) = −Rrefl(x). Thus the contributions of the ideally even and the ideally odd components of R(x) were calculated as Reven(x) = [R(x) + Rrefl(x)]/2 and Rodd(x) = [R(x) − Rrefl(x)]/2, respectively. We then weighed the contributions of Reven(x) as the maximum deviation from zero E and of Rodd(x) as the peak value O. E was assigned a positive or a negative sign depending on whether the maximum deviation of Reven(x) was a positive or a negative peak, respectively. O was assigned a positive or a negative sign depending on whether the position of the peak of Rodd(x) was on the negative or the positive side of the abscissa with respect to the centroid, respectively. Finally, the symmetry phase was obtained as the angle (rad) formed from the projections of E and O onto two perpendicular axes for the even and odd components.
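The steps above can be sketched numerically. Several choices here are assumptions rather than details confirmed by the text: the centroid is weighted by |R(x)|, integrals are replaced by sums on a uniform grid, the even and odd components are evaluated on the largest window symmetric about the centroid that fits inside the tested range, and the final angle is taken as atan2(O, E). The function name is hypothetical.

```python
import math


def symmetry_phase(R, lo, hi, n=2001):
    """Symmetry phase (rad) of a baseline-subtracted tuning function R.

    R is a callable of disparity; [lo, hi] is the disparity range.
    Assumes the centroid is weighted by |R(x)| (an assumption).
    """
    dx = (hi - lo) / (n - 1)
    xs = [lo + i * dx for i in range(n)]
    weights = [abs(R(x)) for x in xs]
    centroid = sum(x * w for x, w in zip(xs, weights)) / sum(weights)

    # Offsets u >= 0 from the centroid, limited to a symmetric window.
    half = min(centroid - lo, hi - centroid)
    offsets = [i * half / (n - 1) for i in range(n)]
    even = [(R(centroid + u) + R(centroid - u)) / 2.0 for u in offsets]
    odd = [(R(centroid + u) - R(centroid - u)) / 2.0 for u in offsets]

    # E: signed maximum deviation of the even component.
    E = max(even, key=abs)
    # O: peak of the odd component, positive when that peak lies on the
    # negative side of the centroid. Because the odd component is
    # antisymmetric, this equals minus its extreme value on the positive side.
    O = -max(odd, key=abs)
    return math.atan2(O, E)


# A single symmetric peak should give a phase near zero.
print(symmetry_phase(lambda x: math.exp(-x * x / (2 * 0.3 ** 2)), -2.0, 2.0))
```

Under this convention a curve with a peak at crossed disparities and an equally large dip at uncrossed disparities gives a phase near π/2.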
Our database included 260 cells recorded from the dorsal V4 (i.e., lower visual representation) in three monkeys (76, 29, and 155 from monkeys 1, 2, and 3, respectively). Cells that significantly responded to at least one of the tested disparities (t-test with correction for multiple comparisons) in an RDS were selected for assessment of their disparity selectivity. Over half of the visually responsive cells (142/224, 63%) exhibited significant selectivity for disparity (Kruskal–Wallis test, P < 0.05). We calculated the DDI values for all the visually responsive cells (n = 224; 62, 22, and 140 from monkeys 1, 2, and 3, respectively). The DDI values were distributed unimodally with a mean of 0.48 (SD 0.14) (Fig. 2A). V4 neurons did not fall into discrete classes of disparity-sensitive or -insensitive cells, and thus it might be appropriate to fairly quantify disparity tuning characteristics of all cells. However, it is difficult, if not meaningless, to interpret the disparity tuning of cells that were not significantly disparity sensitive. We thus selected cells with significant disparity tuning for further analysis.
To determine the preferred disparity of each significantly disparity-sensitive cell, we interpolated the disparity-tuning data with a spline curve. The disparity at the peak of the curve was determined to be the preferred disparity of that cell. Sixteen out of the 142 disparity selective cells (11%) whose preferred disparities were at either end of the tested disparity range were discarded from the analysis because the preferred disparities were likely to lie outside the tested range. This method for determining the preferred disparity was used throughout the rest of this paper. The distribution of preferred disparities exhibited a sharp peak near −0.4° (Fig. 2B). A large portion of V4 cells preferred small crossed disparities (−0.6° < disparity < 0°), whereas a small population of neurons preferred uncrossed disparities. The overall bias of cells in V4 for crossed disparities was consistent with previous studies that examined disparity selectivity with SFSs (Hinkle and Connor 2001; Watanabe et al. 2002).
Description of disparity-tuning characteristics
All cells with significant disparity sensitivity were fitted with a Gabor function (n = 142). To evaluate the goodness-of-fit, we calculated the R² value for each fitted tuning curve. The disparity-tuning curve of most cells demonstrated an R² value >0.6 (n = 133). The Gabor function provided a fairly good description of the disparity-tuning curve for most V4 cells. Nine cells with R² <0.6 were discarded from the following analysis because a Gabor function was not adequate to describe their disparity-tuning profiles. Low R² values were associated with poor disparity sensitivity rather than with poor fitting quality. This was supported by a highly significant correlation between DDI and R² (Spearman's rank correlation rS = 0.48, P = 10⁻⁹; data not shown). Therefore the presence of cells with low R² values reflects noisy data, not disparity-tuning curves that cannot be captured by a Gabor function (see also Tanabe et al. 2004).
Typical disparity-tuning curves possessed a pronounced peak at a small crossed disparity and a shallow dip at a small, uncrossed disparity. Although a few cells gave a low baseline response (Fig. 3A), most cells responded strongly even at nonpreferred disparities (Fig. 3B). Disparity-tuning curves with a clear dip (Fig. 3C) or a peak at an uncrossed disparity (Fig. 3D) were relatively rare. To capture the overall distribution of the disparity-tuning profiles from the population of V4 cells, we analyzed two Gabor function parameters: the center x0 and the phase φ. The center values were distributed tightly around a small crossed disparity (mean −0.21°) (Fig. 3E, top histogram). There was, however, a notable fraction of cells whose center value lay at either end of the tested disparities. These tuning curves were very broad or nearly monotonic. The distribution of the phases was relatively broad with a peak near π/3 and a dip near zero (Fig. 3E, right histogram). A scatter plot of φ versus x0 shows a single dense region, indicating that many cells had similar disparity-tuning profiles (Fig. 3E, center plot). The example cells, A through D, are indicated inside the plot to demonstrate that the first two examples were typical profiles of V4 cells and that the next two examples were relatively rare cases.
A neuron's functional significance is implied by the shape of its disparity-tuning curve. For example, cells with odd-symmetric disparity-tuning curves may be involved in controlling vergence eye movement to reduce the ambiguity of the binocular matching (Marr and Poggio 1979). We replotted the distribution of the phases φ on a polar coordinate graph, where the azimuth is the phase and the deviation from the center is the distribution density of cells (Fig. 4A). According to the conventional classification of cells based on the Gabor phase of the disparity-tuning curve, most V4 cells would be classified as “near-cells” because of their odd-symmetric disparity-tuning curves (DeAngelis et al. 1991; Prince et al. 2002b). This classification, however, did not fit with our intuitive impression of the symmetry of the disparity-tuning curves (e.g., Fig. 3A). Although the Gabor phase distribution indicated a bias toward odd-symmetric tuning curves, our observations found only tiny dips compared with the pronounced peaks in the curves. This discrepancy occurs when the fitted Gabor function has a long carrier period (1/f) compared with the envelope width (σ) (Prince et al. 2002a). For a better estimation of the symmetry of the actual curve, we calculated the symmetry phase (Read and Cumming 2004; see methods). Most cells whose Gabor phase lay between zero and π/2 were estimated to have a symmetry phase near zero (Fig. 4B). The other cells' phases were consistent across both measures of symmetry, although a sign inversion took place in the symmetry of one cell whose phase was near π. Consequently, the distribution of the symmetry phases of V4 cells exhibited a strong bias toward even symmetry (Fig. 4C). The distribution of the data points in the joint parameter space of the symmetry phase versus the centroid (Fig. 4D) is a better description of the overall disparity-tuning profile from the population of V4 cells than the distribution of the data points shown in Fig. 3E (the Gabor phase vs. the center value). Even in the joint parameter space of Fig. 4D, the examples shown in Fig. 3, A and B represent typical cases, and the examples shown in C and D are among the rare cases.
The characteristic disparity-tuning curve of V4 cells was even-symmetric with only a small trough. The disparity-tuning function may not necessarily require the periodic component given by the cosine carrier in the Gabor function. A Gaussian function, instead of a Gabor function, might suffice to describe the disparity tuning of many V4 cells. We fitted a Gaussian function to the disparity-tuning data and compared the quality of the fit to that of the Gabor function (Fig. 5). The R² value was lower for the Gaussian fit (median 0.72) than for the Gabor fit (median 0.85). The better fit with a Gabor function simply reflects that the Gaussian function is a special case of a Gabor function in which the carrier's period 1/f is very large. This analysis, however, revealed that only half of the cells (74/142, 52%) were fitted significantly better with a Gabor function (sequential F-test, P < 0.05). This comparison, in turn, showed that a Gaussian function was sufficient for the description of the disparity-tuning curve for the other half of the V4 cells.
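For readers unfamiliar with nested-model comparison, the F statistic of a sequential F-test can be sketched as follows. The sketch assumes the textbook form of the test, in which the Gaussian is the reduced model (assumed here to have four parameters: y0, A, x0, σ) and the Gabor is the full model (six parameters); the function name and the numerical values are hypothetical, and the conversion of F to a P value, which requires the F distribution, is not shown.

```python
def sequential_f(sse_reduced, sse_full, k_reduced, k_full, n_obs):
    """F statistic for comparing a nested (reduced) model with a full model.

    sse_*: summed squared errors of the two fits
    k_*:   number of free parameters of each model
    n_obs: number of data points used in the fits
    """
    # Improvement in fit per extra parameter of the full model.
    numerator = (sse_reduced - sse_full) / (k_full - k_reduced)
    # Residual variance of the full model.
    denominator = sse_full / (n_obs - k_full)
    return numerator / denominator


# Hypothetical values: Gaussian (4 parameters) vs. Gabor (6 parameters).
print(sequential_f(120.0, 100.0, 4, 6, 106))
```

The statistic is compared with an F distribution with (k_full − k_reduced, n_obs − k_full) degrees of freedom; a large value indicates that the extra carrier parameters improve the fit more than expected by chance.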
The dependency of disparity tuning on RF eccentricity
To compare disparity-tuning characteristics between cortical areas, it is better to focus on neuronal populations with similar RF eccentricities. To expedite this analysis, we examined the dependency of three parameters that characterize the disparity-tuning curve on RF eccentricity. Because RF size increases with RF eccentricity, neurons with larger RF eccentricities should cover a larger range of disparities and have broader disparity-tuning curves. The eccentricities of the RFs of the analyzed neurons had a mean of 6.8° (SD 1.6°). The absolute value of the preferred disparity was weakly correlated with the RF eccentricity (Spearman's rank correlation, rS = 0.19, P = 0.03) (Fig. 6A). The distribution of the symmetry phases, however, did not change with the RF eccentricity (Fig. 6B). Unlike neurons in V1 and MT (DeAngelis and Uka 2003; Prince et al. 2002b; but see Durand et al. 2002), the Gabor frequency of V4 neurons did not correlate with RF eccentricity (Fig. 6C; rS = −0.08, P = 0.35), at least in the range of eccentricities tested in this study. We also examined the correlation between the DDI and RF eccentricity of the 142 cells with statistically significant disparity tuning. However, we found no correlation between them (r = −0.02, P = 0.76; data not shown).
Size-disparity correlation of human stereo performance holds only in relatively coarse spatial scales. The correlation does not exist at fine spatial scales exceeding 2.4 cycles/deg (Schor et al. 1984). Thus neurons involved in fine stereo information may not necessarily be size-disparity correlated. The absence of correlation between Gabor frequency and RF eccentricity may be reconciled if V4 neurons are involved primarily in fine stereo processing.
The range for optimal disparity discriminability
The typical disparity-tuning profile of V4 cells implies that most of these neurons are most sensitive to subtle disparity differences near zero disparity. To check this point, we calculated the slopes of each of the fitted disparity-tuning curves. It is difficult to directly compare how well a neuron can detect a disparity increment (i.e., how well a neuron discriminates between two adjacent disparities) at a given disparity pedestal with its discrimination between two adjacent disparities at another disparity pedestal because both the mean firing rate and the firing-rate variance depend on the disparity. To normalize the variance across different disparity pedestals, we square-root transformed the fitted Gabor function and then calculated its first derivative (Prince et al. 2002b). Four cells whose fitted curves were clipped at zero were discarded because derivatives are not defined for broken curves.
For the remaining cells (n = 129), we determined the disparity of the maximum absolute slope value. The distribution of the disparities at the maximum absolute slope exhibited a markedly sharp peak very close to zero disparity with the mean at −0.11° (SD 0.66°) (Fig. 7A). Most of the slope values composing the peak in the maximum slope distribution were negative values (Fig. 7B). Moreover, the absolute slope value averaged across the population of V4 cells demonstrated a conspicuous peak at a crossed disparity close to zero (Fig. 7C). The characteristics of the disparity-tuning profiles of typical V4 cells were preserved in the average disparity-tuning function for the population of V4 neurons (Fig. 7D). Unlike the conventional distributed representation scheme, the pooled response of a population of V4 neurons had a surprisingly strong response bias for a crossed disparity in an RDS.
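The slope analysis can be sketched numerically: square-root transform a fitted Gabor function, differentiate it, and locate the disparity of the steepest slope. The Gabor form, the central-difference derivative, the grid resolution, and the parameter values are all assumptions for illustration. As in the text, the sketch applies only to curves that stay above zero over the tested range.

```python
import math


def gabor(x, y0, A, x0, sigma, f, phi):
    """Six-parameter Gabor tuning function (assumed standard form)."""
    envelope = math.exp(-((x - x0) ** 2) / (2.0 * sigma ** 2))
    return y0 + A * envelope * math.cos(2.0 * math.pi * f * (x - x0) + phi)


def max_slope_disparity(params, lo, hi, n=3201, h=1e-4):
    """Disparity and signed slope where |d sqrt(R)/dx| is largest."""
    def g(x):
        # math.sqrt raises ValueError if the curve dips below zero,
        # mirroring the exclusion of clipped curves in the text.
        return math.sqrt(gabor(x, *params))

    best_x, best_slope = lo, 0.0
    for i in range(n):
        x = lo + i * (hi - lo) / (n - 1)
        slope = (g(x + h) - g(x - h)) / (2.0 * h)  # central difference
        if abs(slope) > abs(best_slope):
            best_x, best_slope = x, slope
    return best_x, best_slope


# Hypothetical cell tuned to a small crossed disparity.
x_max, slope = max_slope_disparity((10.0, 20.0, -0.2, 0.3, 0.5, 0.0), -1.6, 1.6)
print(x_max, slope)
```

For a tuning curve peaking at a small crossed disparity, the flank on the uncrossed side has a negative slope and falls close to zero disparity.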
Clustering of disparity selective cells
During the recording experiments, we monitored both single-unit (SU) and background multiunit (MU) spike waveforms. The detection threshold for the MU spikes was adjusted on-line so that the tails of the SU spikes with triphasic waveforms were not mixed with MU spikes. If this separation was not successful on-line, we subtracted the SU firing rate from the MU firing rate during off-line analyses. We discarded any MU data if the subtraction yielded negative MU firing rates. At 118 of the 224 sites where SUs gave significant responses, we succeeded in separating the MU activity from the SU activity (Fig. 8A). At recording sites where the SU displayed a high magnitude of disparity sensitivity, the MU also displayed a high magnitude of disparity sensitivity, and vice versa (Fig. 8B). To evaluate this tendency across the 118 sites, we calculated the DDI values for both SUs and MUs. Indeed, the DDI values of MUs strongly correlated with the DDI values of SUs (r = 0.64, P = 10⁻¹⁴) (Fig. 8C). This correlation indicates that the magnitude of disparity selectivity is shared by nearby cells within V4.
To examine whether nearby cells have similar disparity preferences, we obtained the preferred disparities for SUs and MUs recorded from the sites where both had significant disparity sensitivities (Kruskal–Wallis test, P < 0.05). The preferred disparities of SUs and MUs were weakly, but significantly, correlated (r = 0.30, P = 0.02) (Fig. 8D). The correlation between the preferred disparities of SUs and MUs suggests that adjacent cells inside the local clusters of disparity selective cells within V4 have similar disparity preferences. The correlations in Fig. 8, C and D do not contain monocular cue components because we used dynamic RDSs as stimuli. These results reinforced previous conclusions based on responses to SFSs (Watanabe et al. 2002) that V4 neurons with similar disparity selectivity tend to cluster.
Comparison of disparity tuning for RDSs and SFSs
Among the 260 cells in the initial database, we examined 57 cells for disparity tuning for both an RDS and an SFS. Of these cells, 56 cells were responsive to at least one of the binocular stimuli (t-test, P < 0.05 divided by the number of binocular stimuli).
Direct comparisons were made between the disparity-tuning curves obtained using the two types of stereograms. After superimposing the two disparity-tuning curves, we observed a number of differences. Many cells had a broader tuning curve and a higher baseline response for an SFS than for an RDS (Fig. 9A). For many cells, the end of the disparity-tuning curve for the SFSs did not fall near the baseline response level; thus the overall shape of the curve was nearly monotonic. Furthermore, many cells had different preferred disparities when they were examined with RDSs and with SFSs (Fig. 9B). Some cells exhibited large differences in the magnitude of their responses elicited by the two types of stereograms. We encountered cells that gave much stronger responses to an SFS than to an RDS (Fig. 9C) as well as cells that preferred an RDS to an SFS. Other cells exhibited an equal baseline response level to both types of stereograms, but modulated their responses depending on the disparity value only for one of the two types of stereograms (Fig. 9D). The disparity tunings measured with an RDS and an SFS can thus be drastically different from each other.
We compared the disparity discriminability of each cell tested using the two types of stereograms. There was a weak correlation between the DDI values for the responses to RDSs and the DDI values for the responses to SFSs (r = 0.31, P = 0.022) (Fig. 10). Based on a statistical test of disparity sensitivity (Kruskal–Wallis test, P < 0.05), we divided the cells into four groups. The first group was disparity selective for both RDSs and SFSs. The second and third groups were disparity selective only for RDSs or for SFSs, respectively. The fourth group was not disparity selective for either RDSs or SFSs. A DDI value was obtained separately for each of the responses to the two types of stereograms. The DDIs are plotted with different symbols to distinguish the four groups of disparity selectivity. Although there was a noticeable fraction of cells that were disparity selective for only one of the two types of stereograms, our results indicate that if a cell is disparity selective for one type of stereogram, it also tends to be disparity selective for the other type of stereogram.
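The grouping can be made concrete with a short sketch. The DDI formula used here is an assumption: it is the commonly used definition (Rmax − Rmin)/(Rmax − Rmin + 2·RMSerror) computed on square-rooted firing rates, which is not restated in this section. The function names and the P values passed to the grouping function are hypothetical.

```python
import math

def ddi(trials_by_disparity):
    """Disparity discrimination index from {disparity: [rates in spikes/s]}.

    Assumed definition: (Rmax - Rmin) / (Rmax - Rmin + 2 * RMSerror),
    evaluated on square-rooted rates.
    """
    sq = {d: [math.sqrt(r) for r in rs] for d, rs in trials_by_disparity.items()}
    means = {d: sum(v) / len(v) for d, v in sq.items()}
    ss_err = sum((x - means[d]) ** 2 for d, v in sq.items() for x in v)
    n_trials = sum(len(v) for v in sq.values())
    rms = math.sqrt(ss_err / (n_trials - len(sq)))  # residual SD around the means
    span = max(means.values()) - min(means.values())
    if span == 0.0:
        return 0.0  # flat tuning curve: no discriminability
    return span / (span + 2.0 * rms)

def selectivity_group(p_rds, p_sfs, alpha=0.05):
    """Assign a cell to one of the four groups from its two test P values."""
    rds, sfs = p_rds < alpha, p_sfs < alpha
    if rds and sfs:
        return "both"
    if rds:
        return "RDS only"
    if sfs:
        return "SFS only"
    return "neither"

example = {-0.4: [16.0, 36.0], 0.0: [4.0, 4.0]}  # hypothetical responses
print(ddi(example), selectivity_group(0.001, 0.20))
```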
For a quantitative assessment of the differences in the disparity-tuning curves obtained using the two types of stereograms, we fitted the disparity-tuning curves with Gabor functions. Two of the 51 cells that had disparity selectivity for either RDS or SFS were discarded based on low R2 values (R2 < 0.6). The R2 in this section measured the goodness-of-fit of the combination of both curves; that is, R2 values were calculated from the two curves fitted simultaneously. Therefore these values cannot be compared with R2 values in the previous sections. We examined three parameters of the fitted Gabor function (Fig. 11, A–C). The amplitude parameter A, the frequency parameter f, and the phase parameter φ did not correlate between the disparity-tuning curves obtained using the two types of stereograms (Spearman's rank correlation rS = 0.013, P = 0.93; rS = −0.14, P = 0.35; rS = −0.031, P = 0.84 for A, f, and φ, respectively). The Gabor frequency parameter f was higher for the disparity-tuning curve obtained with an RDS than for that obtained with an SFS (Wilcoxon's signed-rank test, P = 0.00023) (Fig. 11B). It is unlikely that our fitting constraint (i.e., the two curves shared the values of the parameters x0 and φ) confounded the results because all the results presented here were reproduced even when the two disparity-tuning curves were fitted independently.
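One plausible reading of a goodness-of-fit measure for "the combination of both curves" is a single R2 computed over the pooled RDS and SFS data points. The sketch below follows that reading, which is an assumption; the function name and the example values are hypothetical.

```python
def combined_r2(obs_rds, fit_rds, obs_sfs, fit_sfs):
    """R2 over the pooled data points of two simultaneously fitted curves."""
    obs = list(obs_rds) + list(obs_sfs)
    fit = list(fit_rds) + list(fit_sfs)
    mean = sum(obs) / len(obs)
    ss_res = sum((o - f) ** 2 for o, f in zip(obs, fit))  # unexplained variation
    ss_tot = sum((o - mean) ** 2 for o in obs)            # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical mean responses and fitted values for one cell.
r2 = combined_r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [4.0, 5.0, 7.0])
keep_cell = r2 >= 0.6  # cells below the criterion would be discarded
print(round(r2, 3), keep_cell)
```

Because the total variation is computed around the pooled mean, a difference in baseline between the two curves contributes to the denominator, which is one reason why such values are not comparable with R2 values from single-curve fits.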
We examined the preferred disparities of the two disparity-tuning curves for an RDS and for an SFS only if the cell was statistically disparity selective for both types of stereograms (Kruskal–Wallis test, P < 0.05). Most data points are located in the lower left quadrant of Fig. 11D, which indicates that most cells preferred crossed disparities in both SFSs and RDSs. The tendency of these neurons to prefer crossed disparities in an SFS is consistent with previous reports (Hinkle and Connor 2001; Watanabe et al. 2002). Because there was only one outlying point in the right-side quadrant, the preferred disparity in SFSs had a stronger bias toward crossed disparities than in RDSs. Overall, there was no correlation between the preferred disparities identified using the two types of stereograms (rS = −0.17, P = 0.45).
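Spearman's rank correlation, used for the parameter comparisons in this section, can be sketched as the Pearson correlation of the ranks, with tied values given their average rank. The function names and the example values are hypothetical.

```python
import math

def ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman_rs(xs, ys):
    """Spearman's rS as the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical preferred disparities (deg) for five cells.
pref_rds = [-0.40, -0.20, -0.10, -0.30, 0.10]
pref_sfs = [-0.80, -0.40, -1.20, -0.60, -0.20]
print(f"rS = {spearman_rs(pref_rds, pref_sfs):.2f}")
```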
Correction for shifts in the monocular images of SFS
The disparity energy model is a simple description of neuronal selectivity for disparity at the early stages of stereo processing (Ohzawa et al. 1990). A static nonlinearity after binocular summation is the key factor for the disparity-tuned responses to dynamic RDSs (Anzai et al. 1999; see appendix). Because of this nonlinearity, responses to a binocular stimulus involve a binocular interaction component in addition to the sum of the respective signals from the two eyes (Fig. 12A; Ohzawa et al. 1997). Because a dynamic RDS is a spatiotemporal white noise, any neuronal sensitivity to a monocular feature is averaged out when the stimulus is a dynamic RDS. The monocular response components are constant across all disparities. The response modulation to an RDS directly reflects the binocular interaction component (Fig. 12A, top row). On the other hand, the monocular image of an SFS shifts its position depending on the disparity. Neuronal sensitivity to monocular features affects the monocular components as the disparity is changed. The response modulation to an SFS is the sum of the modulation of the monocular component and the modulation of the binocular interaction component (Fig. 12A, bottom row). To eliminate the effects of the monocular component in the response to an SFS for the recorded V4 neurons, we mimicked the binocular interaction component of the disparity energy model. This was done by subtracting the trial-averaged response to a monocularly presented SFS from the trial-by-trial responses to a binocularly presented SFS at each corresponding disparity.
The disparity energy model predicts that the baseline response level to a binocular stimulus is equal to the sum of the responses to the respective monocular stimuli. V1 neurons do not follow this prediction, and exhibit a baseline response level to a binocular stimulus that is the average of the responses to the respective monocular stimuli (Prince et al. 2002a). We examined how the responses to monocular stimuli compare with the responses to binocular stimuli in V4 by plotting the average response to all of the monocular presentations against the vertical offset, y0, of the Gabor function fitted to the disparity-tuning curve for an SFS (Fig. 12B). y0 followed the average level of monocular responses (dashed line), which was exactly half the level predicted by the disparity energy model (solid line). Thus the binocular interaction of V4 neurons was primarily negative, as has been reported for V1 neurons.
Among the 57 cells whose disparity tunings were examined with both an RDS and an SFS, we recorded the responses of 50 cells to monocular presentations of an SFS to the left and right eyes in the same block as the binocular presentations. The disparity-tuning data for an SFS were corrected for monocular features by subtracting the sum of the mean firing rates elicited by the left and right monocular presentations at each corresponding disparity. Only one out of the 37 cells that originally had statistically significant disparity sensitivity for an SFS lost its sensitivity after correction of monocular shifts (Kruskal–Wallis test, P < 0.05). Using the resulting surrogate binocular interaction component for an SFS, we refitted the Gabor function for both RDS- and SFS-induced responses (Fig. 12C) of 44 cells that were statistically disparity sensitive for either RDSs or SFSs (Kruskal–Wallis test, P < 0.05). Seven cells were discarded because of inadequate fitting results (R2 < 0.6). Because of the subtraction of the responses to monocular stimuli, many of the surrogate binocular interactions for an SFS were negative. Although negative responses are not realistic, this does not affect the following analysis because we focus only on the modulation, and not on the baseline level of the disparity-tuning data.
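The correction can be sketched as follows, assuming that responses are stored per disparity as lists of single-trial firing rates. The function name, the data layout, and the example values are hypothetical.

```python
def surrogate_interaction(binoc_trials, left_trials, right_trials):
    """Subtract the summed mean monocular responses from each binocular trial.

    Each argument maps a disparity to a list of single-trial firing rates.
    The monocular trials for a disparity are those in which the monocular
    image had the position it occupies in the binocular stimulus.
    """
    corrected = {}
    for d, trials in binoc_trials.items():
        mono = (sum(left_trials[d]) / len(left_trials[d])
                + sum(right_trials[d]) / len(right_trials[d]))
        corrected[d] = [r - mono for r in trials]
    return corrected

# Hypothetical responses (spikes/s) at two disparities.
binoc = {-0.4: [42.0, 46.0], 0.0: [30.0, 34.0]}
left = {-0.4: [12.0, 14.0], 0.0: [10.0, 12.0]}
right = {-0.4: [18.0, 20.0], 0.0: [20.0, 22.0]}
print(surrogate_interaction(binoc, left, right))
```

The corrected values can be negative, as in the zero-disparity entry of this example; only their modulation across disparities is used in the subsequent analysis.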
We compared the parameters of the fitted Gabor function between the surrogate binocular interaction components obtained using RDSs and SFSs. The amplitude A and the frequency f did not correlate between the two types of stereograms (rS = 0.13, P = 0.45; rS = −0.20, P = 0.24 for A and f, respectively) (Fig. 13, A and B). The frequency f was significantly larger for an RDS than for an SFS (Wilcoxon's signed-rank test, P = 0.03). Although there was no correlation between the Gabor frequency f for the surrogate binocular interaction and the width of the bar in the SFS (rS = −0.21, P = 0.22), the difference in the frequency magnitude may partially be explained by the fact that the bar in the SFS was wider than the dots in the RDS (see appendix). These results were essentially identical to those obtained before subtracting the contribution of the monocular features (Fig. 11, A and B). In contrast, we found a weak, but significant, correlation in the Gabor phase φ between the tuning curves for the two types of stereograms (rS = 0.41, P = 0.012) (Fig. 13C). This correlation held for the symmetry phase as well (rS = 0.41, P = 0.012) (not shown). Although the phases of the disparity-tuning curves were correlated, the preferred disparity of the surrogate binocular interaction for an SFS did not correlate with that for an RDS (rS = −0.075, P = 0.77) (Fig. 13D). Because the correlation in the phase was absent before subtracting the contribution of the monocular features (Fig. 11C), our data suggest that the modulations of the responses arising from positional shifts in the monocular images of an SFS contribute, at least in part, to the discrepancy in the disparity-tuning profiles obtained with an RDS or an SFS.
In this study, we examined the disparity-tuning properties of neurons in area V4 of alert, fixating monkeys. We found that the characteristic disparity-tuning curve of V4 neurons possessed a pronounced peak at a small crossed disparity and a steep slope at zero disparity. Cells with similar disparity-tuning properties were clustered within V4. The disparity-tuning profile obtained with an SFS was discrepant with the profile obtained with an RDS. Positional shifts in the monocular images of an SFS partially accounted for this discrepancy. The findings from this study demonstrate several striking differences between the disparity-tuning properties of V4 neurons and those of V1 and MT neurons. Comparisons of disparity-tuning properties across cortical areas will help uncover the relative contributions of these areas to stereo processing.
Functional architecture of stereo processing
The functional architecture of disparity tuning was addressed by analyzing the similarities between SU and MU responses. This approach revealed that disparity-selective neurons are columnarly organized in MT (DeAngelis and Newsome 1999). The V4 data of the present study showed a strong correlation between the SU and MU DDIs (rS = 0.64), which is similar to the correlation reported in MT (rS = 0.66) and stronger than the correlation reported in V1 (rS = 0.37) (Prince et al. 2002a). Therefore as in MT, clustering of disparity-sensitive cells and clustering of disparity-insensitive cells exist in the organization of V4, although this clustering was more continuous than discrete because the DDIs exhibited a unimodal instead of a bimodal distribution. Because we did not examine whether the clusters were arranged perpendicular to the surface of the cortex, we cannot address the existence of disparity columns in V4.
Within the clusters of disparity-sensitive cells, there was a weak correlation between the SU and MU preferred disparities (r = 0.30). The magnitude of this correlation is more similar to the correlation observed in V1 (r = 0.30) than the correlation observed in MT (r = 0.91). The small correlation found in V4 does not directly indicate that nearby cells possess a variety of preferred disparities. Neither the distribution of SU preferred disparities nor the distribution of MU preferred disparities covered the full range of tested disparities (Fig. 8D). Thus in V4, the low correlation between the SU and MU preferred disparities reflected only the prevalence of cells preferring a limited range of disparities.
The analysis of SU and MU responses was very similar to the one in Watanabe et al. (2002). However, Watanabe et al. used an SFS as their stimulus. The present results show that the disparity tuning of V4 cells differs drastically depending on whether the stimulus is an SFS or an RDS. It was therefore an empirical question whether the changes in disparity tuning that occur when the stimulus is switched from an RDS to an SFS are coherent among nearby cells. The changes appear to be coherent, because an SU/MU correlation was observed with both RDSs (this study) and SFSs (Watanabe et al. 2002).
Even-symmetric disparity tuning in V4
Disparity-tuning profiles of cortical neurons are conventionally classified into distinct subtypes (Poggio and Fischer 1977). One qualitative characteristic of these subtypes is the symmetry of the disparity-tuning curve, which can be captured by the phase parameter of the fitted Gabor function (DeAngelis et al. 1991). A “near”- or a “far”-type neuron has an odd-symmetric disparity-tuning curve. These classes of neurons are implicated in vergence eye movements that bring the left and right retinal images into registration (Marr and Poggio 1979). A “tuned-near,” a “tuned-zero,” or a “tuned-far” neuron has an even-symmetric disparity-tuning curve. These classes of neurons are thought to act as disparity detectors (Marr and Poggio 1979). Few studies, however, have inferred the functional significance of neurons from their disparity-tuning profiles, especially because quantitative examination failed to find these discrete classes of neurons in V1 (see Cumming and DeAngelis 2001 for a review; Prince et al. 2002b).
Neurons in MT have a strong bias for odd symmetry, and neurons in the medial superior temporal area (MST) apparently have an even stronger bias for odd symmetry (see Cumming and DeAngelis 2001 for a review; DeAngelis and Uka 2003; Takemura et al. 2001). The disparity-tuning curves of neurons in V4 had Gabor phases between π/4 and π/2, which would classify them as odd symmetric if they were evaluated solely based on this parameter. The reevaluation based on symmetry phase, however, indicated that the actual V4 tuning curves were closer to even symmetric. Although the symmetry of MT and MST cells has not been evaluated based on symmetry phase, it is likely that the majority of their disparity-tuning curves are indeed odd symmetric because the published disparity-tuning curves have a clear trough together with a peak. In contrast to areas MT and MST, neurons with odd-symmetric disparity-tuning profiles were rare in V4. The symmetry estimate in the present study was not confounded by truncation of the tuning curve at zero firing rates. Our fitting procedure was allowed to search for the best-fitting truncated function. Nevertheless, it yielded only four cells that had clipped tuning curves. The observation that V4 neurons have predominantly even-symmetric disparity tuning does not directly indicate that V4 neurons serve as disparity detectors in the classical sense. Such disparity detectors are each assigned to signal the presence of a target with a particular binocular disparity (Marr and Poggio 1976). To sufficiently encode visual information with this interval-encoding scheme, the archetypal disparity detectors should cover an adequately wide range of stereoscopic depth (Lehky and Sejnowski 1990).
The preferred disparities of V4 neurons, however, were strongly biased toward a narrow range of disparities. Although there are more MT neurons that prefer crossed disparities than uncrossed disparities, the bias for crossed disparities in V4 appears to be stronger than is the bias observed in MT or V1 (DeAngelis and Uka 2003; Prince et al. 2002b). Our results suggest that the role of V4 neurons is different from the classical disparity detection. Rather than an interval-encoding scheme, we suggest revisiting a rate-encoding scheme for stereoscopic depth representation in V4. According to the rate-encoding scheme, depth is nearly proportional to the pooled firing rate of a population of relevant neurons, at least within a particular range of disparities. This scheme accounts for some aspects of psychophysical performance during stereoacuity judgments (Badcock and Schor 1985), but was rejected only because no physiological studies had reported disparity-tuning characteristics suitable for the rate-encoding scheme (Lehky and Sejnowski 1990). Because the pooled disparity-tuning curve exhibited a steep slope near zero disparity in this study, the population of V4 neurons had a tuning profile that should allow for rate encoding of fine stereoscopic depth.
What information is signaled by V4 neurons?
Because the preferred disparities were confined to a narrow range at a small crossed disparity, the positions on the disparity-tuning curves with the steepest slopes were confined to a narrow range near zero disparity. The sensitivity for subtle disparity increments is thus highest near a zero disparity pedestal. For an observer or a downstream neural system that has access only to the firing of V4 neurons, the largest benefit is attained when these responses are used to detect a subtle protrusion from the background.
This finding agrees with area V4 lesion studies. Ablation or pharmacological inactivation of area V4 leads to mild deficits in a number of visual detection and discrimination tasks and severe impairments in form discrimination and the detection of structured patterns embedded in a random background (Heywood et al. 1992; Merigan 1996; Schiller 1993). Our results demonstrated that many V4 cells were strongly activated when a stereoscopic percept of a disk is slightly protruding from a background. A protrusion is a salient and structured visual feature, and neuronal responses in V4 are suited to detect this. Because both stereoscopic depth contrast and chromatic contrast can trigger large responses in V4 (Schein and Desimone 1990; Zeki 1983), the visual attribute that renders the salience may not be important to V4 cells. Because neuronal firing in V4 predicts a saccadic eye movement toward the location of the neuron's RF (Mazer and Gallant 2003), V4 responses to salient features are likely used by the oculomotor system in more natural viewing conditions.
The validity of the neuron–orthoneuron hypothesis in V4 for stereoscopic depth discrimination
In principle, our results only suggest that V4 neurons are suited for the detection of stereoscopic depth. To address the role of V4 neurons more directly, it is important for future studies to examine neuronal responses while monkeys are engaged in a specific stereoscopic task. The design of future studies will critically rely on the framework of how V4 responses are read-out by a downstream system that is involved in decision making during a stereoscopic task. The data of the present study are informative for that framework, and in turn for devising a task that is appropriate for exploring the function of V4 neurons.
A recent study compared neuronal performance of MT cells recorded while a subject performed a task with the psychophysical performance (Uka and DeAngelis 2003). The monkey was trained to indicate whether it perceived a near or a far plane embedded in binocularly uncorrelated dots. The near and far planes were presented in successive trials, and the two planes were never presented simultaneously. As a consequence of the task design, this study evaluated the neuronal performance by assuming the existence of a hypothetical antineuron that had an exactly opposite disparity preference. The receiver-operating characteristic (ROC) analysis was based on the estimation of an ideal observer who randomly samples responses from the recorded neuron and the hypothetical antineuron (Britten et al. 1992). The assumption is supported by an earlier report that the preferred disparities of MT neurons are distributed sufficiently widely, ranging from crossed to uncrossed, although there is a small bias toward neurons preferring crossed disparities (DeAngelis and Uka 2003). This scheme could at least explain the psychophysical observation of a stereoscopic depth aftereffect (Blakemore and Julesz 1971). When a neuron decreases its responsiveness as a result of adaptation, an observer's percept of stereoscopic depth is dominated by the antineuron's signal. The assumption of an antineuron, however, may not hold for V4 in a stereoscopic depth discrimination task because cells in area V4 have a marked bias toward crossed disparities.
To reconcile the lack of antineurons in V4, a task in which monkeys discriminate the stereoscopic depth difference of two planes, test and background, presented simultaneously (stereoacuity task) may be more appropriate (Prince et al. 2000). The neuronal performance should be evaluated by assuming the existence of a hypothetical orthoneuron, rather than an antineuron (Prince et al. 2000). An orthoneuron has exactly the same disparity-tuning properties as the recorded neuron, but possesses an RF at a different position. The orthoneuron assumption holds even for area V4, where disparity preference is biased to crossed disparity. In the ROC analysis, an ideal observer samples the recorded neuron's response to the test disparity and the orthoneuron's response to the background disparity. Within this framework, the neuronal performance of most V1 neurons falls short of the behavioral performance (Prince et al. 2000). Our results showed that the steepest slopes on the V4 disparity-tuning curves were near zero disparity. Neuronal responses in V4 are beneficial signals during the stereoacuity task. If stereopsis operates in two modes as previously suggested (Tyler 1990), V4 may be involved in fine, global stereo processing (Neri 2005; Tanabe et al. 2004).
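The ROC comparison under the orthoneuron assumption can be sketched as follows. The sketch assumes that the area under the ROC curve is estimated as the probability that a response drawn from the test-disparity trials exceeds a response drawn from the background-disparity trials of the same cell, with ties counted as one half. The function name and the spike counts are hypothetical.

```python
def roc_area(test_counts, background_counts):
    """Area under the ROC curve from two samples of spike counts.

    Equals the probability that a random test response exceeds a random
    background response (ties count one half).
    """
    wins = 0.0
    for t in test_counts:
        for b in background_counts:
            if t > b:
                wins += 1.0
            elif t == b:
                wins += 0.5
    return wins / (len(test_counts) * len(background_counts))

# Hypothetical spike counts from one cell. The orthoneuron is assumed to
# have the same tuning, so its responses to the background disparity are
# taken from the recorded cell's own trials at that disparity.
test_trials = [14, 17, 12, 19, 15]        # test plane at a small crossed disparity
background_trials = [11, 13, 12, 10, 14]  # background plane at zero disparity
print(roc_area(test_trials, background_trials))
```

A value of 0.5 corresponds to chance performance, and a value of 1 corresponds to perfect discrimination of the test plane from the background.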
Ample psychophysical evidence indicates perceptual asymmetry between crossed and uncrossed disparity conditions. Crossed disparity requires less stimulus duration than uncrossed disparity for stereoscopic depth perception (Finlay et al. 1989; Landers and Cormack 1997; Manning et al. 1987; Patterson et al. 1995; Tam and Stelmach 1998). Therefore for a given stimulus duration, stereo threshold is lower for crossed than uncrossed disparity. To date, this difference in perceptual performance between crossed and uncrossed disparities has not been satisfactorily accounted for. The prevalence of cells preferring crossed disparity in V4 is, at least, consistent with the higher perceptual sensitivity to crossed disparity, and might underlie the perceptual bias.
Different stereoscopic depth representations for RDSs and SFSs in V4
Some neurons early in the stereo processing pathway have qualitatively similar disparity-tuning properties when they are presented with an RDS or an SFS (Gonzalez and Perez 1998; Poggio 1990). In V4, on the other hand, a recent study demonstrated that neurons have drastically different disparity tuning when they are presented with an RDS or an SFS (Hegdé and Van Essen 2005). A straightforward interpretation would be that the difference in the disparity tuning observed with the two types of stereograms arises from the processing performed by cortical networks that reside between V1 and V4. Analysis of the disparity energy model, however, gave a quantitative prediction that the monocular components of the V1 responses are modulated by disparity in an SFS (see appendix). The binocular interaction components obtained with the two types of stereograms could be identical. If they were identical, then the difference in the disparity-tuning curves for the two types of stereograms is superficial, and provides little evidence for additional processing beyond V1.
To eliminate the effects of monocular features on neuronal sensitivity, Hegdé and Van Essen (2005) jittered the position of the bar in the SFS. Ideally, the neuronal sensitivity to the position of the bar in the monocular images was averaged out in their study. In this study, to quantify the contribution of the positional shifts of the monocular images, we calculated the surrogate binocular interaction component of the V4 responses to an SFS. Our results confirmed that the surrogate binocular interaction component of a response to an SFS is different from the disparity tuning in response to an RDS, although the symmetry of the tuning was weakly preserved. This response property strongly deviates from the predictions of the disparity energy model.
In the disparity energy model, the binocular interaction component of the responses is shaped mainly by the underlying monocular RF profiles, and the monocular features of the stimulus only contribute to its amplitude. Thus monocular features of the stimulus are separable from the binocular interaction. The binocular interaction component of the responses of V4 neurons, however, was dependent on the monocular features of the stimulus, i.e., a broad-band noise or a solid figure. This type of nonlinearity either arises within the neuronal network residing between V1 and V4, or is already present in V1 neurons.
Our primary goal was to characterize the difference in the disparity tuning obtained with an RDS and an SFS. Besides the computational complexity and the cyclopean nature we considered, there are other differences involved between the RDS and the SFS used in the present study. For example, the RDS contains an adjacent depth plane at zero disparity, whereas the SFS does not. Sensitivity to relative disparity of V4 neurons might be involved in shaping the disparity tuning (Umeda et al. 2004). Also, the shape of the stimulus might affect the disparity tuning because area V4 plays a role in shape processing (Desimone and Schein 1987; Gallant et al. 1996; Kobatake and Tanaka 1994; Pasupathy and Connor 1999). In this study, the RDS patch was circular, whereas the SFS was rectangular. A more thorough understanding of the cause of the RDS/SFS differences will certainly require further studies.
In terms of using the neuronal responses to make a perceptual judgment in a stereoscopic task, the V4 responses are detrimental when the stereogram is switched from an RDS to an SFS. The downstream system that decodes stereoscopic depth would inevitably arrive at a different estimate of the depth when the type of stereogram is changed, even when the disparity remains unchanged. In some cases, the two types of stereograms have quite different effects on perception. For example, anticorrelated RDSs evoke little depth percept, whereas anticorrelated SFSs evoke vivid depth percepts (Cogan et al. 1995; Cumming et al. 1998). Furthermore, adaptation to left–right reversing spectacles has a different effect when the subject is presented with an SFS or an RDS (Shimojo and Nakajima 1981). Stereo processing of an RDS and that of an SFS may be performed in different channels, although the contributing neurons could overlap.
In summary, we studied the binocular disparity tuning of V4 neurons to dynamic RDS stimuli. The disparity preferences of V4 cells were biased to crossed disparities, and the steepest slopes of the disparity-tuning curves were located near zero disparity. Nearby cells had similar disparity selectivities. The tuning profiles differed when the stimulus was an SFS instead of an RDS. These disparity-tuning properties suggest a role for V4 in stereoscopic tasks. Future studies should address the role of V4 more directly using a stereoscopic-discrimination task.
This section formulates the responses of the disparity energy model proposed by Ohzawa et al. (1990). Many of the mathematical details have been previously described for other stimuli such as sinusoidal gratings (Fleet et al. 1996; Qian 1994; Qian and Zhu 1997; Zhu and Qian 1996), static RDSs (Fleet et al. 1996; Qian 1994; Qian and Zhu 1997; Zhu and Qian 1996), dynamic RDSs (Prince et al. 2002a), SFSs containing a thin static bar (Mikaelian and Qian 2000; Qian 1994), and SFSs containing a thin sweeping bar (Anzai et al. 1999; Ohzawa et al. 1997). Here, we derive the analytical solution for the disparity tuning of a model complex cell, in response to a dynamic RDS and a thin static bar SFS. Unlike the analysis by Qian (1994) where the positional shift of a thin bar SFS occurred in only one of the monocular images, we assume that the positional shift of the thin bar in the SFS occurs symmetrically in both eyes relative to the center of the RF. The tested SFSs are projections of a bar modulated only in its depth. Previously derived solutions were based on a different way of applying disparity (Qian 1994). Our assumption fits with the way the SFS was presented in our experiments, and requires novel derivations of the solutions for the disparity-tuning curve. The solutions derived in this section were used in Fig. 12A. We discuss how the width of the bar affects the disparity tuning.
Binocular simple cells have RFs on both the left and right eyes. Let us denote the left and right RF profiles as WL and WR, respectively. Each eye drives the model simple cell with an intensity SL and SR. These values are the spatial summation of the stimulus contrast patterns (IL and IR) weighted by the RF profiles. Thus they are written as the integration across the visual field

$$S_L = \iint W_L(x, y)\, I_L(x, y)\, dx\, dy, \qquad S_R = \iint W_R(x, y)\, I_R(x, y)\, dx\, dy \tag{A1}$$

The binocular simple cell sums the inputs from the two eyes. This signal is then passed through a static nonlinear filter, which is a half-wave rectification followed by a squaring filter. For a positive binocular input, (SL + SR) > 0, the response R is

$$R = (S_L + S_R)^2 = S_L^2 + S_R^2 + 2 S_L S_R \tag{A2}$$

Otherwise [i.e., (SL + SR) ≤ 0], R = 0.
Response to a dynamic RDS
In an RDS, the contrast patterns projected onto the left and right retinae are identical, except for the horizontal disparity d. Thus IL and IR can be rewritten as (A3) By substituting IL and IR in Eq. A1 with I, and the variable x as X = x + d, the response is given as (A4) where the variable Y was introduced to separate the integrals of WL and WR. When the stimulus pattern is a dynamic RDS, the mean discharge rate of this simple cell is the time average of R. Because the RF profile is constant over time, the time-averaged response is (A5) where 〈 · 〉 denotes the time-averaged value. 〈I(x, y)I(X, Y)〉 is the cross-correlation between the time series of the contrast values of the stimulus image at two points in space, (x, y) and (X, Y), at zero time delay. The special property of an RDS is that it is a spatiotemporal white noise. The cross-correlation is zero between any two points in space, except for x = X and y = Y, where 〈I(x, y)I(X, Y)〉 is the autocorrelation at zero time lag. For any point in space, this value is the signal energy (C²) of the stimulus image. We must remember, however, that we have assumed that the binocular summation is positive. The binocular summation is actually either positive or negative depending on the phases of the Fourier components of the stimulus image. Because of the white-noise characteristic of the RDS, the binocular summation has a positive value in half of the frames and a negative value in the other half of the frames. Thus the time-averaged response in Eq. A5 is rewritten as (A6) where δ_{x=X, y=Y} is the Kronecker delta. Here, we see that the monocular components in the first two terms of Eq. A6 are constants, whereas the binocular interaction component in the third term is a cross-correlation of the left- and right-eye RF profiles as a function of the horizontal disparity d. Only the amplitudes and not the disparity modulations of all the terms depend on the image I.
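One way to write out Eqs. A3 to A6 that is consistent with the description above is sketched below. It assumes that the disparity is applied entirely to the right-eye image, so that X = x + d, and that every integral runs over the whole visual field. The shorthand K_d for the product of RF profiles is introduced here only for compactness and is not part of the original notation.

```latex
\begin{align}
I_L(x, y) &= I(x, y), \qquad I_R(x, y) = I(x + d, y) \tag{A3} \\
R &= \iiiint K_d(x, y, X, Y)\, I(x, y)\, I(X, Y)\; dx\, dy\, dX\, dY \tag{A4} \\
K_d(x, y, X, Y) &= W_L(x, y) W_L(X, Y) + W_R(x - d, y) W_R(X - d, Y)
                   + 2\, W_L(x, y) W_R(X - d, Y) \notag \\
\langle R \rangle &= \iiiint K_d(x, y, X, Y)\,
                     \langle I(x, y)\, I(X, Y) \rangle\; dx\, dy\, dX\, dY \tag{A5} \\
\langle R \rangle &= \frac{C^2}{2} \iiiint K_d(x, y, X, Y)\,
                     \delta_{x = X,\, y = Y}\; dx\, dy\, dX\, dY \notag \\
                  &= \frac{C^2}{2} \left[ \iint W_L(x, y)^2\, dx\, dy
                     + \iint W_R(x, y)^2\, dx\, dy
                     + 2 \iint W_L(x, y)\, W_R(x - d, y)\, dx\, dy \right] \tag{A6}
\end{align}
```

The three terms of K_d correspond to the expansion of (SL + SR)² in Eq. A2. The factor 1/2 in Eq. A6 reflects the half of the frames in which the binocular summation is negative and the response is zero. In the second term, the shift by d disappears after integration, which is why both monocular terms are constant across disparities.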
For algebraic simplicity, we reduce our model simple cell from a two-dimensional model to a one-dimensional one. We model the monocular RF profiles of the model simple cell by an even-symmetric Gabor function for both the left and right eyes as (A7) (A8) where σ is the envelope width and f is the carrier frequency of the RF profile. A model complex cell sums the outputs of four simple-cell subunits that form quadrature pairs; the phases of the cosine carrier describing the monocular RF structures are φ = 0, π/2, π, and 3π/2, respectively. Substituting WL and WR in Eq. A6 with Eqs. A7 and A8, and summing 〈R〉 in Eq. A6 across all four simple cells, we obtain the analytical solution for the disparity tuning of the model complex cell in response to an RDS as (A9) The first constant term is the sum of the monocular components. The second term is the binocular interaction component. The disparity modulation is present only in the binocular interaction component and reflects the monocular RF profile plotted as a function of disparity, apart from differences in amplitude and vertical offset.
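The summation over the four subunits can be checked numerically. The sketch below assumes unnormalized one-dimensional Gabor RFs, W(x) = exp(-x²/(2σ²)) cos(2πfx + φ), identical in the two eyes, and evaluates the time-averaged response of each subunit as half the signal energy times the sum of the two monocular terms and twice the interaction term. The parameter values, the function names, and the closed form used for comparison follow from this assumed normalization and are not a quotation of Eq. A9.

```python
import math

PHASES = (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)  # quadrature subunits


def gabor(x, sigma, freq, phase):
    """Assumed monocular RF: Gaussian envelope times a cosine carrier."""
    return math.exp(-x * x / (2 * sigma ** 2)) * math.cos(2 * math.pi * freq * x + phase)


def rds_tuning(d, sigma=0.2, freq=1.5, energy=1.0, dx=0.005, half_width=2.0):
    """Mean complex-cell response to a dynamic RDS at disparity d (numerical)."""
    n = int(round(half_width / dx))
    xs = [i * dx for i in range(-n, n + 1)]
    total = 0.0
    for phase in PHASES:
        # Monocular terms: constant in d; equal in both eyes because the RFs match.
        mono = sum(gabor(x, sigma, freq, phase) ** 2 for x in xs) * dx
        # Interaction term: cross-correlation of the two RF profiles at lag d.
        inter = sum(gabor(x, sigma, freq, phase) * gabor(x - d, sigma, freq, phase)
                    for x in xs) * dx
        total += 0.5 * energy * (mono + mono + 2.0 * inter)
    return total


def rds_tuning_closed_form(d, sigma=0.2, freq=1.5, energy=1.0):
    """Closed form under the same assumptions: a constant plus a Gabor in d."""
    envelope = math.exp(-d * d / (4 * sigma ** 2))
    return 2.0 * energy * sigma * math.sqrt(math.pi) * (
        1.0 + envelope * math.cos(2 * math.pi * freq * d))


if __name__ == "__main__":
    for d in (0.0, 0.1, 0.25, 0.5):
        print(d, rds_tuning(d), rds_tuning_closed_form(d))
```

Under these assumptions the carrier phase drops out of the sum, leaving a constant offset plus a cosine at the RF carrier frequency under a Gaussian envelope, in line with the two-term structure described for Eq. A9.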
Response to a thin bar SFS
The response of the model complex cell to a thin bar SFS is the sum of the responses of all four simple cells. This is given by (A10) If we denote the horizontal position in the binocular visual field along the frontoparallel plane as h, and the disparity as d, these values are related to xL and xR as (A11) This relation can be rewritten as (A12) By substituting xL and xR in Eq. A10 with Eq. A12, we have (A13) We assume that the bar is presented at the center of the monocular RFs at zero disparity. Thus the disparity tuning is defined as (A14) The first term is the sum of the monocular components. In contrast to the disparity tuning for a dynamic RDS in Eq. A9, the monocular components of the disparity tuning for an SFS depend on the disparity. The modulation by the monocular component reflects the Gaussian envelope of the underlying monocular RF profile (Fig. 12A). The interaction component in the second term has the same form as that for a dynamic RDS in Eq. A9, although the coefficient is different.
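The thin-bar case can be illustrated with the same assumed Gabor subunits. The sketch treats the bar as a point stimulus of unit contrast, so each eye's drive is the RF value at the bar position, and it assumes the convention xL = h + d/2, xR = h - d/2 with h = 0 for a bar centered on the RFs. The function names, the parameter values, and this sign convention are assumptions made for illustration.

```python
import math

PHASES = (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)  # quadrature subunits


def gabor(x, sigma, freq, phase):
    """Assumed monocular RF: Gaussian envelope times a cosine carrier."""
    return math.exp(-x * x / (2 * sigma ** 2)) * math.cos(2 * math.pi * freq * x + phase)


def thin_bar_response(x_left, x_right, sigma=0.2, freq=1.5):
    """Complex-cell response to a thin bar at x_left (left eye) and x_right (right eye)."""
    total = 0.0
    for phase in PHASES:
        drive = gabor(x_left, sigma, freq, phase) + gabor(x_right, sigma, freq, phase)
        total += max(drive, 0.0) ** 2  # half-wave rectification, then squaring
    return total


def thin_bar_tuning(d, h=0.0, sigma=0.2, freq=1.5):
    """Disparity tuning for a thin bar; assumes xL = h + d/2 and xR = h - d/2."""
    return thin_bar_response(h + d / 2, h - d / 2, sigma, freq)


if __name__ == "__main__":
    for d in (-0.4, -0.2, 0.0, 0.2, 0.4):
        print(d, thin_bar_tuning(d))
```

With h = 0 this reduces to 2 exp(-d²/(4σ²)) [1 + cos(2πfd)]: the monocular part now falls off with the Gaussian envelope, whereas in the RDS case it is constant, and the interaction part keeps the same Gabor form.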
Response to a wide-bar SFS
When the solid figure is a wide bar instead of a thin one, the disparity-tuning curve does not simply increase in its gain. The area of spatial summation of each eye increases with the width of the bar w. The region in the binocular RF that is associated with the monocular images is a w × w square that is centered at a given disparity. Thus the response is computed by spatial summation of the binocular RF (Eq. A10) within the w × w square. The disparity-tuning curve for a wide-bar SFS is similar to a moving average of the disparity-tuning curve for a thin-bar SFS (Eq. A14). Consequently, the tuning curve for a wide-bar SFS, including its binocular interaction component, has a lower frequency content than the tuning curve for either a thin-bar SFS or an RDS (Eq. A9). This analysis of the disparity energy model partially accounts for our finding that the disparity-tuning curves of V4 cells had a lower frequency for an SFS than for an RDS (Figs. 11B and 13B).
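The smoothing effect of a wide bar can be sketched by averaging the thin-bar binocular response over a w × w square of left- and right-eye positions, following the description above. The Gabor RF, the parameter values, the midpoint sampling grid, and the use of a mean instead of a sum are assumptions made for illustration; the function names are hypothetical.

```python
import math

PHASES = (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)  # quadrature subunits


def gabor(x, sigma, freq, phase):
    """Assumed monocular RF: Gaussian envelope times a cosine carrier."""
    return math.exp(-x * x / (2 * sigma ** 2)) * math.cos(2 * math.pi * freq * x + phase)


def thin_bar_response(x_left, x_right, sigma=0.2, freq=1.5):
    """Binocular RF value: complex-cell response to a thin bar at (x_left, x_right)."""
    total = 0.0
    for phase in PHASES:
        drive = gabor(x_left, sigma, freq, phase) + gabor(x_right, sigma, freq, phase)
        total += max(drive, 0.0) ** 2
    return total


def wide_bar_tuning(d, width, sigma=0.2, freq=1.5, n=21):
    """Mean of the binocular RF over a width-by-width square centered at disparity d."""
    # Midpoint samples spanning [-width/2, +width/2] in each eye.
    offsets = [(-0.5 + (i + 0.5) / n) * width for i in range(n)]
    total = 0.0
    for u in offsets:
        for v in offsets:
            total += thin_bar_response(d / 2 + u, -d / 2 + v, sigma, freq)
    return total / (n * n)


if __name__ == "__main__":
    trough = 1.0 / (2 * 1.5)  # disparity at which the thin-bar curve reaches zero
    for w in (0.0, 0.15, 0.3):
        print(w, wide_bar_tuning(0.0, w), wide_bar_tuning(trough, w))
```

As the width grows, the trough of the curve fills in relative to the peak, which is the loss of carrier modulation that the text attributes to the moving-average effect.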
This work was supported by grants to I. Fujita from the Ministry of Education, Culture, Sports, Science and Technology (13308046, 15016067) and the Toyota Physical and Chemical Institute. S. Tanabe was supported by the Japan Society for the Promotion of Science Research Fellowship for Young Researchers.
We thank I. Ohzawa, H. Tanaka, and T. Uka for helpful comments on the manuscript.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 2005 by the American Physiological Society