The ventral pathway in visual cortex is responsible for the perception of shape. Area V4 is an important intermediate stage in this pathway, and provides the major input to the final stages in inferotemporal cortex. The role of V4 in processing shape information is not yet clear. We studied V4 responses to contour features (angles and curves), which many theorists have proposed as intermediate shape primitives. We used a large parametric set of contour features to test the responses of 152 V4 cells in two awake macaque monkeys. Most cells responded better to contour features than to edges or bars, and about one-third exhibited systematic tuning for contour features. In particular, many cells were selective for contour feature orientation, responding to angles and curves pointing in a particular direction. There was a strong bias toward convex (as opposed to concave) features, implying a neural basis for the well-known perceptual dominance of convexity. Our results suggest that V4 processes information about contour features as a step toward complex shape recognition.
Visual shape information is processed in the ventral cortical pathway, which runs from V1 to V2, V4, and finally into various subregions of inferotemporal (IT) cortex (Felleman and Van Essen 1991; Ungerleider and Mishkin 1982). At lower levels in this pathway (V1 and V2), shape is represented at least partly in terms of local orientation (Baizer et al. 1977; Burkhalter and Van Essen 1986; Hubel and Livingstone 1987; Hubel and Wiesel 1959, 1965, 1968). At the final stages in IT, cells are often selective for complex objects like faces and hands (Desimone et al. 1984; Gross et al. 1972; Perrett et al. 1982; Tanaka et al. 1991). To understand how lower-level orientation signals are transformed into complex object representations, it is important to study shape processing at intermediate stages like area V4.
Only a few studies have addressed shape processing in area V4. Desimone and Schein (1987) showed that many V4 cells are tuned for orientation, width, and length of bar stimuli and for orientation and spatial frequency of gratings, as in V1 and V2. Kobatake and Tanaka (1994) found that some V4 cells respond better to complex shapes than to simple bar stimuli. Gallant and colleagues (1993, 1996) demonstrated selectivity for curvilinear as well as linear gratings. These studies indicate that V4 encodes both orientation and higher-level shape information. The exact nature of the higher-level information remains to be determined.
A primary goal in the study of shape processing at intermediate levels like area V4 is to identify the shape primitives or basic features represented at those levels. Many shape processing theories invoke contour features (angles and curves) as intermediate shape primitives (Attneave 1954; Biederman 1987; Dickinson et al. 1992; Milner 1974; Poggio and Edelman 1990; Ullman 1989). Contour features constitute a simple geometric step beyond individual oriented edges (simpler, in some sense, than rectangular bars, which comprise 4 edges and 4 right angles in a specific arrangement). They are ubiquitous visual elements with high information content (Attneave 1954), their presence can be derived by combining individual edge orientation signals (Milner 1974), and they form natural parts for constructing more complex representations. Psychological findings imply the existence of specialized mechanisms for perception of contour features (Andrews et al. 1973; Chen and Levi 1996; Fahle 1997; Heeley and Buchanan-Smith 1996; Regan et al. 1996; Treisman and Gormican 1988; Watt and Andrews 1982; Wilson et al. 1997; Wolfe et al. 1992). Physiologists have studied responses to contour features at earlier stages in the ventral pathway (V1 and V2) (Dobbins et al. 1987; Hammond and Andrews 1978; Hegde and Van Essen 1997; Heggelund and Hohmann 1975; Hubel and Wiesel 1965; Versavel et al. 1990). It has been proposed that contour feature extraction is the ultimate purpose of endstopping (i.e., preference for terminated edges or lines) (Hubel and Livingstone 1987; Hubel and Wiesel 1965). For these reasons, we chose to study responses to contour features in area V4.
We designed a large parametric set of contour feature stimuli, illustrated in Figs. 1 and 2A. Each stimulus consisted of a single contour feature (angle or curve) or straight edge centered on the receptive field (RF) of the cell under study. Outside the RF, the stimulus edges continued and stimulus color gradually faded into the background gray, as if a spotlight was illuminating one portion of a larger object. In this way, a single contour feature could be presented essentially in isolation. This allowed us to examine whether some cells in V4 that might appear to be selective for more complex stimuli are actually sensitive to individual corners or curve segments. We found that a substantial fraction of V4 cells exhibit such lower-order specificity, suggesting that in some cases responses to complex shapes can be understood in terms of their constituent contour features.
We recorded spike activity from isolated V4 cells in the lower parafoveal representation on the surface of the prelunate gyrus and adjoining banks of the lunate and superior temporal sulci. Recording locations were initially based on skull landmarks and then adjusted on the basis of response properties, retinotopy, and inferred positions of the sulci. Other technical details have been described previously (Connor et al. 1997). All animal procedures conformed to National Institutes of Health and USDA guidelines and were carried out under an institutionally approved animal protocol.
Stimuli were presented on a computer monitor while the animal maintained fixation (on a small white dot) within a 0.5° radius window. Continuous fixation for 4.5 s was rewarded with a drop of juice. Each isolated cell was initially characterized by handplotting with colored rectangular bars and ellipses to find the approximate RF center and optimum bar orientation.
Color and width tuning were tested by presenting optimally oriented bars (with rounded endcaps) at the handplotted RF center in eight colors and five widths. The colors were red, green, blue, yellow, cyan, magenta, white, and black. All colors were adjusted to an approximate luminance of 20 cd/m², except for blue (15 cd/m²) and black, and presented against a background gray of 2.5 cd/m². The widths were 0.025, 0.05, 0.075, 0.1, and 0.125 times the average V4 RF diameter at the handplotted eccentricity [based on the relation between RF diameter and eccentricity reported by Gattass et al. (1988); their data suggest that average diameter equals approximately 1° + 0.625 × eccentricity]. The bar stimuli were flashed for 500 ms each and separated by 250-ms interstimulus intervals. During each trial a sequence of five stimuli was presented. Stimuli were presented in random order until each stimulus had been presented a total of three times. When cells were unresponsive to bar stimuli, the optimum color was determined by handplotting.
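The scaling rule quoted above can be written out as a short calculation. The following Python sketch is illustrative only: the function names and the example eccentricity are hypothetical, and the only quantities taken from the text are the approximate relation (diameter ≈ 1° + 0.625 × eccentricity) and the five width fractions.

```python
# Illustrative sketch; function names are hypothetical, not from the original study.

# Bar widths used in the color/width test, as fractions of average RF diameter.
WIDTH_FRACTIONS = (0.025, 0.05, 0.075, 0.1, 0.125)


def average_rf_diameter(eccentricity_deg):
    """Approximate average V4 RF diameter (deg) at a given eccentricity (deg).

    Uses the approximate relation quoted in the text:
    diameter = 1 deg + 0.625 * eccentricity.
    """
    return 1.0 + 0.625 * eccentricity_deg


def bar_widths(eccentricity_deg):
    """Return the five test bar widths (deg) for a cell at this eccentricity."""
    diameter = average_rf_diameter(eccentricity_deg)
    return [fraction * diameter for fraction in WIDTH_FRACTIONS]


if __name__ == "__main__":
    ecc = 4.0  # example eccentricity in degrees (arbitrary)
    print("RF diameter:", average_rf_diameter(ecc))
    print("bar widths:", bar_widths(ecc))
```

For a cell at 4° eccentricity, for example, the relation gives an average RF diameter of 3.5°, so the widest test bar would be about 0.44° wide.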
When cells were responsive to bars, orientation tuning was tested using optimum color and width values derived from the previous test. Bar length was set to 0.5 times the average RF diameter and bar endcaps were rounded. Twelve orientations (15° intervals) were tested (5 repetitions each).
The RF was then plotted more precisely with small bar stimuli flashed at locations in a square grid covering a circular area with a diameter of 2.0 times the handplotted RF diameter and a spacing of 0.125 times the handplotted diameter. The bars were of optimum color, width, and orientation, with length equal to 0.25 times the handplotted diameter. Bars were flashed for 250 ms each with 500-ms interstimulus intervals. The grid locations were sampled once each in random order. The response plot was smoothed by means of local spatial averaging. The RF center was estimated by calculating the center of mass for all responses >50% of the maximum (75% for highly asymmetric plots). Many cells failed to respond in this test; in these cases, handplotting was used to estimate the RF center. Handplotting also was used in some cases where the remaining recording time for the day was limited.
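The center-of-mass estimate can be illustrated with a minimal Python sketch. It assumes that each grid location contributes in proportion to its (already smoothed) response and that only locations exceeding the cutoff fraction of the maximum are included; the weighting scheme and the function name are assumptions, since the text does not specify them.

```python
# Illustrative sketch; the response-weighted average is an assumption.

def rf_center_of_mass(samples, threshold_fraction=0.5):
    """Estimate the RF center from grid responses.

    samples: iterable of (x, y, response) tuples, one per grid location.
    threshold_fraction: cutoff as a fraction of the maximum response
        (0.5 by default; the text mentions 0.75 for highly asymmetric plots).
    Returns (x_center, y_center), weighted by response.
    """
    samples = list(samples)
    peak = max(r for _, _, r in samples)
    if peak <= 0:
        raise ValueError("no positive response; center cannot be estimated")
    cutoff = threshold_fraction * peak
    # Keep only locations whose response exceeds the cutoff.
    kept = [(x, y, r) for x, y, r in samples if r > cutoff]
    total = sum(r for _, _, r in kept)
    x_center = sum(x * r for x, _, r in kept) / total
    y_center = sum(y * r for _, y, r in kept) / total
    return x_center, y_center


if __name__ == "__main__":
    grid = [(0.0, 0.0, 10.0), (1.0, 0.0, 10.0), (5.0, 5.0, 2.0)]
    print(rf_center_of_mass(grid))
```

In the usage example the weak response at (5, 5) falls below half-maximum and is ignored, so the estimate lies midway between the two strongly responsive locations.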
Contour feature test
Figure 1 shows four example contour feature stimuli. The small white dot represents the fixation point, and the dashed circle (which was not part of the actual display) represents the estimated RF. In Fig. 1, A–C, the contour features are 90° sharp angles pointing toward the right. The 90° angle is rendered in white as a projecting convex corner (A), an outline (B), or a concave indentation (C). Smooth curve stimuli were B-spline approximations to the angles. Figure 1 D shows a 90° curve pointing to the right, with the B-spline control points indicated by diamonds. In all cases, stimulus color and brightness were constant within the RF and then gradually faded into the background gray over a distance equal to the RF radius, giving the impression of a spotlight illuminating one corner of a larger object. In this way, a single contour feature could be presented in isolation.
The stimuli were scaled according to average V4 RF diameter at the cell's eccentricity (dashed circles in Fig. 1), based on Gattass et al. (1988) (see preceding text). Scaling with eccentricity in this manner ensures a generally consistent relationship between stimulus size, RF size, and acuity. In any case, stimulus size is not a major concern in this experiment because the stimuli consist of individual edges and corners, which have no real size, and the rest of the stimulus fades gradually into the background.
The full set of contour feature stimuli is shown in Fig. 2 A. Stimuli were presented in the optimum color for the cell under study (shown here as white). Each stimulus consisted of a single contour feature defined by a sharp luminance/color boundary or a line of width equal to 1/16 the average V4 RF diameter at the cell's eccentricity. Stimulus luminance was constant within the RF (20 cd/m², except for blue and black), then gradually faded into the background gray (2.5 cd/m²) over a distance of 0.5 times the average RF diameter (the full extent of fading is not shown in Fig. 2 A). The stimulus set had four dimensions (the first 3 plotted horizontally and the 4th vertically):
CONVEXITY.
The stimuli were rendered as convex projections (Fig. 2 A, left), concave indentations (right), or outlines (middle). Convexity/concavity was defined by considering the stimulus (shown in white here) to be the figure, based on its smaller size relative to the homogeneous gray background (see Fig. 1).
CURVATURE.
The stimuli were either sharp angles (on the left within each block of Fig. 2 A) or smooth curved B-spline approximations to the angles (on the right within each block; see Fig. 1 for details of B-spline construction).
ACUTENESS.
The angles and their corresponding curves had three levels of acuteness (45, 90, and 135°), with the straight edges (180°) representing the limit at the obtuse end of the scale for both types of stimuli.
CONTOUR FEATURE ORIENTATION.
The features point in eight directions: upward (90°) in the top row, upper left (135°) in the second row, etc. Contour feature orientation is a circular dimension, and has been arbitrarily split in Fig. 2 between 90° (top) and 45° (bottom). For most cells, stimuli were presented at the eight orientations shown in Fig. 2. In cases where the preliminary bar orientation test revealed a strong tuning peak, the orientations of all the stimuli were rotated so that there would be straight edge stimuli at the preferred orientation.
Stimuli were flashed for 500 ms each and separated by interstimulus intervals of 250 ms. A sequence of five stimuli was presented in each trial. The entire stimulus set was presented in random order without replacement five times, except in one case where only three repetitions were completed.
Some cells were tested with a subset of the stimuli at five positions: at the RF center, and offset to the right, left, top, and bottom. The offsets were 0.175 times the average RF diameter, so that the total span in the horizontal and vertical directions was 0.35 times the average RF diameter. The selected stimuli included the contour feature evoking the strongest response and at least one other contour feature that contained the same component orientations (or a similar range of orientations for smooth curves) but elicited a weak response (see Fig. 12 for examples).
Response rates were calculated by counting spike occurrences within a 500-ms window beginning at stimulus onset. Background rate was derived in a similar way from null stimulus periods interspersed randomly among stimulus presentations in all tests. Background rates were typically low (average = 1.9 spikes/s), and analyses with and without background subtraction yielded similar results. The results presented here are based on subtraction of average background rate from the response rates for each individual repetition of each stimulus.
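The rate calculation and background subtraction can be sketched as follows. This is a minimal illustration assuming spike times are available in milliseconds on the same clock as stimulus onset; the function names and data layout are hypothetical.

```python
# Illustrative sketch; names and data layout are hypothetical.

def response_rate(spike_times_ms, onset_ms, window_ms=500.0):
    """Spike rate (spikes/s) in a window beginning at stimulus onset."""
    count = sum(1 for t in spike_times_ms if onset_ms <= t < onset_ms + window_ms)
    return count / (window_ms / 1000.0)


def subtract_background(repetition_rates, background_rates):
    """Subtract the average background rate from each repetition's rate."""
    background = sum(background_rates) / len(background_rates)
    return [rate - background for rate in repetition_rates]


if __name__ == "__main__":
    spikes = [10.0, 200.0, 499.0, 500.0, 700.0]
    print(response_rate(spikes, onset_ms=0.0))  # three spikes fall inside the window
    print(subtract_background([6.0, 8.0], [2.0, 2.0]))
```

Note that individual background-subtracted values can be negative when a repetition evokes fewer spikes than the average background period.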
We used quantitative indices of tuning strength and breadth to assess what kind of information, if any, cells might convey about contour features. We first averaged responses to each stimulus across repetitions to get a 3 (convexity) × 2 (curvature) × 3 (acuteness) × 8 (contour feature orientation) response function. (The 180° acuteness single edge and line stimuli were excluded from this analysis so as not to confound tuning for angles and curves with tuning for edge orientation. Exclusion of the edge stimuli made little difference; see results.) We next applied a peak-finding algorithm that identified compact, contiguous regions of the four-dimensional response function in which all stimuli evoked responses greater than half the maximum response (>HM). The region with the largest summed >HM response was designated as the primary peak. (The >HM sum was based on just the portions of the response rates above the half-maximum cutoff.)
Primary peak strength was defined as the primary peak's >HM sum divided by total >HM responses across all stimuli. A cell with a single large peak would have a primary peak strength of 1.0, whereas a cell with many separate small peaks would have a primary peak strength closer to 0. In contrast to some measures of tuning strength, like those based on the difference between maximum and minimum values, primary peak strength reflects specifically unimodal tuning. This was important for our data because multimodal tuning was likely to represent sensitivity to other dimensions such as edge orientation. Primary peak size was defined as the fraction of stimuli (of a total of 144, excluding single straight edges and lines) included within the >HM primary peak. This index is analogous to peak width at half height in a one-dimensional response function.
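The peak analysis can be illustrated with a short Python sketch. The exact contiguity rule is not specified in the text, so the sketch makes an explicit assumption: two stimuli are neighbors if they differ by one step along exactly one dimension, with contour feature orientation treated as circular. The function names and the dictionary layout are likewise hypothetical.

```python
# Illustrative sketch; the adjacency rule and all names are assumptions.
from itertools import product

SHAPE = (3, 2, 3, 8)                      # convexity, curvature, acuteness, orientation
CIRCULAR = (False, False, False, True)    # only orientation wraps around


def neighbors(idx):
    """Yield stimuli one step away from idx along a single dimension."""
    for dim, (size, circular) in enumerate(zip(SHAPE, CIRCULAR)):
        for step in (-1, 1):
            j = idx[dim] + step
            if circular:
                j %= size
            elif not 0 <= j < size:
                continue
            yield idx[:dim] + (j,) + idx[dim + 1:]


def primary_peak(resp):
    """Find the primary peak of a mean response function.

    resp: dict mapping (convexity, curvature, acuteness, orientation) index
          tuples to mean background-subtracted response.
    Returns (peak_stimuli, primary_peak_strength, primary_peak_size).
    """
    maximum = max(resp.values())
    if maximum <= 0:
        raise ValueError("no positive response")
    half_max = 0.5 * maximum
    # Portion of each response above the half-maximum cutoff (>HM).
    above = {i: r - half_max for i, r in resp.items() if r > half_max}
    seen, regions = set(), []
    for start in above:
        if start in seen:
            continue
        seen.add(start)
        stack, region = [start], []
        while stack:                       # flood fill over contiguous >HM stimuli
            current = stack.pop()
            region.append(current)
            for nb in neighbors(current):
                if nb in above and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        regions.append(region)
    best = max(regions, key=lambda reg: sum(above[i] for i in reg))
    strength = sum(above[i] for i in best) / sum(above.values())
    size = len(best) / len(resp)
    return best, strength, size


if __name__ == "__main__":
    resp = {idx: 0.0 for idx in product(*(range(n) for n in SHAPE))}
    resp[(0, 0, 0, 0)] = 10.0
    resp[(0, 0, 0, 7)] = 8.0   # adjacent to the first via orientation wrap-around
    resp[(2, 1, 2, 4)] = 6.0   # isolated secondary peak
    peak, strength, size = primary_peak(resp)
    print(sorted(peak), strength, size)
```

In the usage example the two adjacent stimuli form the primary peak, which carries 8 of the 9 spikes/s above half-maximum and covers 2 of the 144 stimuli.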
For those cells with clear overall tuning (based on high primary peak strength and low primary peak size values), we also characterized responses in each of the four stimulus dimensions separately. For each dimension, we generated a one-dimensional response function by summing across the other three dimensions. (The sums included all responses to individual stimulus repetitions that exceeded background, with no thresholding at half-maximum as in the peak determination.) The summed values were normalized by dividing by their average, rather than their maximum, so that response variation could be compared visually in terms of peak values (stronger tuning corresponds to higher peaks). A similar procedure was used to generate edge orientation tuning functions (for all cells), collapsing across the three convexity values to get four summed values that were again normalized by dividing by their average.
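The one-dimensional tuning functions can be sketched as follows. The sketch assumes that single-repetition responses are stored per stimulus after background subtraction, so that "exceeded background" corresponds to a positive value; the names and data layout are hypothetical.

```python
# Illustrative sketch; names and data layout are hypothetical.

def marginal_tuning(trials, dim, n_levels):
    """Collapse a four-dimensional data set onto one stimulus dimension.

    trials: dict mapping 4-tuples of stimulus indices to lists of
            background-subtracted single-repetition responses.
    dim: index of the dimension to keep (0..3).
    n_levels: number of values along that dimension.
    Returns the summed responses per level, divided by their average.
    """
    sums = [0.0] * n_levels
    for idx, repetitions in trials.items():
        # Only repetitions that exceeded background contribute to the sum.
        sums[idx[dim]] += sum(r for r in repetitions if r > 0)
    mean = sum(sums) / n_levels
    if mean == 0:
        raise ValueError("no responses above background")
    return [s / mean for s in sums]


if __name__ == "__main__":
    trials = {
        (0, 0, 0, 0): [2.0, 4.0, -1.0],
        (0, 0, 0, 1): [1.0, 1.0, 0.0],
    }
    print(marginal_tuning(trials, dim=3, n_levels=2))
```

Because the values are divided by their average rather than their maximum, a flat function sits at 1.0 everywhere and stronger tuning appears as a higher peak.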
Significance of response variation was measured with randomization ANOVA (Edgington and Bland 1993; Manly 1991). Randomization tests rely less on assumptions about sampling, and they can be used to test the significance of derived measures like tuning indices (Manly 1991; cf. Connor et al. 1997; Gallant et al. 1996). A main effect F ratio was calculated for the original data. Then the response rates for individual stimulus repetitions were randomly permuted across the dimension in question (but within the other 3 dimensions), and the test statistic was recalculated. This procedure was repeated 10,000 times to yield a distribution of values expected on the basis of the null hypothesis (that the dimension in question had no bearing on response rates). The level of significance (P) was the fraction of randomly generated values greater than or equal to the original value.
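The randomization procedure can be sketched in Python as follows. The sketch makes one simplifying assumption that should be stated clearly: the main effect F ratio is computed as a one-way F ratio over the levels of the tested dimension, pooling across the other three dimensions, which may differ from the multi-way statistic used in the original analysis. It also assumes a complete design (every level present in every stratum). All names are hypothetical.

```python
# Illustrative sketch; the one-way F ratio is a simplifying assumption.
import random


def f_ratio(groups):
    """One-way F ratio for a list of groups (each a list of values)."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def randomization_anova(trials, dim, n_levels, n_perm=10000, seed=0):
    """Permutation test for a main effect of one stimulus dimension.

    trials: dict mapping 4-tuples of stimulus indices to lists of
            single-repetition responses.
    Responses are shuffled across levels of `dim`, but only within each
    combination of the other three dimensions.
    Returns (observed_F, P).
    """
    rng = random.Random(seed)
    strata = {}
    for idx, reps in trials.items():
        key = idx[:dim] + idx[dim + 1:]
        strata.setdefault(key, {})[idx[dim]] = list(reps)

    def statistic(data):
        groups = [[] for _ in range(n_levels)]
        for cells in data.values():
            for level, reps in cells.items():
                groups[level].extend(reps)
        return f_ratio(groups)

    observed = statistic(strata)
    hits = 0
    for _ in range(n_perm):
        shuffled = {}
        for key, cells in strata.items():
            levels = sorted(cells)
            pooled = [v for level in levels for v in cells[level]]
            rng.shuffle(pooled)
            out, pos = {}, 0
            for level in levels:
                count = len(cells[level])
                out[level] = pooled[pos:pos + count]
                pos += count
            shuffled[key] = out
        if statistic(shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm
```

A call such as `randomization_anova(trials, dim=3, n_levels=8)` would test contour feature orientation under these assumptions; restricting the shuffle to within-stratum exchanges is what keeps effects of the other three dimensions out of the null distribution.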
Randomization was also used to test whether contour feature orientation tuning functions were consistent across different values of other dimensions, e.g., acuteness. This was done by first calculating the correlation coefficient between the two contour feature orientation functions in question. Then the pairing of values between the two functions was randomly permuted and the correlation coefficient recalculated 10,000 times. This procedure yielded a distribution of values expected on the basis of the null hypothesis (that there was no underlying correlation between the 2 functions). The level of significance was the fraction of randomized correlation values greater than or equal to the original value.
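A minimal sketch of this correlation test, assuming the two tuning functions are given as equal-length sequences sampled at the same contour feature orientations (function names are hypothetical):

```python
# Illustrative sketch; names are hypothetical.
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant function")
    return cov / math.sqrt(var_x * var_y)


def correlation_randomization(func_a, func_b, n_perm=10000, seed=0):
    """Test whether two tuning functions are correlated.

    The pairing between the two functions is randomly permuted n_perm times;
    P is the fraction of permuted correlations >= the observed correlation.
    Returns (observed_r, P).
    """
    rng = random.Random(seed)
    observed = pearson(func_a, func_b)
    shuffled = list(func_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson(func_a, shuffled) >= observed:
            hits += 1
    return observed, hits / n_perm


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    b = [1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1]
    print(correlation_randomization(a, b, n_perm=2000))
```

The test is one-sided, as in the text: only permuted correlations at least as large as the observed value count toward P.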
Another statistical question was whether the distribution of contour feature orientation tuning peaks was significantly nonuniform. This was assessed with a Monte Carlo version of Kuiper's test, which is a circular Kolmogorov-type analysis (Mardia 1972). The Kuiper's test statistic is the sum of the maximum positive and negative deviations of the observed cumulative distribution function from the hypothetical function (which in our case was the uniform distribution function). In each Monte Carlo simulation, a random function (with equivalent number of observations and discretization) was generated (under the assumption of uniformity), and the Kuiper's statistic was calculated. This produced a distribution of values expected on the basis of the null hypothesis (that the underlying distribution was uniform). The level of significance was the fraction of 10⁶ randomly generated Kuiper's values greater than or equal to the observed value.
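The Monte Carlo version of Kuiper's test can be sketched as follows. The sketch assumes the tuning peaks have been tallied into equally spaced orientation bins and that the cumulative functions are compared at the bin boundaries; both the binning and the function names are assumptions. Deviations are tracked as integers so that ties between simulated and observed statistics are compared exactly.

```python
# Illustrative sketch; binning scheme and names are assumptions.
import random


def kuiper_statistic(counts):
    """Kuiper's statistic for binned circular data against a uniform distribution.

    counts: number of observations in each equally spaced orientation bin.
    Returns the sum of the largest positive and largest negative deviations
    of the observed cumulative distribution from the uniform one.
    """
    n, n_bins = sum(counts), len(counts)
    cumulative = 0
    d_max = d_min = 0
    for k, c in enumerate(counts):
        cumulative += c
        # Deviation scaled by n * n_bins to stay in exact integer arithmetic.
        d = cumulative * n_bins - (k + 1) * n
        d_max = max(d_max, d)
        d_min = min(d_min, d)
    return (d_max - d_min) / (n * n_bins)


def kuiper_monte_carlo(counts, n_sim=1_000_000, seed=0):
    """Monte Carlo significance of Kuiper's statistic under uniformity.

    Each simulation draws the same number of observations uniformly over
    the same bins. Returns (observed_statistic, P).
    """
    rng = random.Random(seed)
    n, n_bins = sum(counts), len(counts)
    observed = kuiper_statistic(counts)
    hits = 0
    for _ in range(n_sim):
        simulated = [0] * n_bins
        for _obs in range(n):
            simulated[rng.randrange(n_bins)] += 1
        if kuiper_statistic(simulated) >= observed:
            hits += 1
    return observed, hits / n_sim


if __name__ == "__main__":
    # Hypothetical peak counts in 8 orientation bins; fewer simulations for speed.
    print(kuiper_monte_carlo([9, 5, 7, 4, 6, 8, 5, 6], n_sim=2000))
```

Because the statistic combines the largest deviations in both directions, its value does not depend on which orientation is chosen as the starting point of the circular scale.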
Contour feature tuning
We used the stimulus set shown in Fig. 2 A to test the responses of 152 V4 neurons with RF eccentricities ranging from 0.1 to 7.8°. Isolated contour features were generally more effective than edges or bars in driving V4 responses. For the large majority of cells (138/152 or 91%), the most effective stimulus in our test was a contour feature rather than a straight edge or line. On average, the strongest edge/line response was only about half the strongest contour feature response (average ratio 0.56). This must at least partially reflect the high degree of endstopping in V4 reported previously (Desimone and Schein 1987). But within a subsample of 61 cells tested with bar stimuli of length equal to half the estimated RF diameter, the majority still exhibited stronger responses to contour features. These cells were tested with bars of optimum color and width at 12 orientations (15° intervals) in a preliminary characterization of orientation tuning (see methods). Other cells were not tested in this way because preliminary handplotting and color/width tests disclosed little or no response to bars, because handplotting indicated an absence of bar orientation tuning, or because the remaining recording time was too short for extensive preliminary tests. Even in this subsample, which was to some degree preselected for stronger bar responses, 74% of the cells (45/61) had a higher maximum response in the contour feature test than in the oriented bar test.
Many cells exhibited clear, unimodal tuning for a particular range of contour features. An example is shown in Fig. 2 B. In this plot, average firing rate based on five stimulus repetitions is represented by the background gray level surrounding each stimulus icon. Response rates range from 0 (light gray) to 42 ± 2.3 (SE) spikes/s (black). This cell responded best to convex features oriented in the 135–180° range. Responses were stronger for sharp (vs. smooth) and acute (vs. obtuse) features. The results cannot be explained in terms of standard orientation tuning for individual edges, since many of the least effective stimuli (including the straight edges) contain the same edge orientations as the most effective stimuli. Another example of contour feature tuning is shown in Fig. 2 C. This cell responded best to convex and outline smooth curve features oriented in the 315–0° range.
A contrasting result is presented in Fig. 2 D. The response pattern for this cell reflects standard orientation tuning. The cell responded to a variety of sharp angle and smooth curve outline stimuli (middle) containing edges oriented near 75°. There is no clear single peak as in the other examples, and thus no indication of contour feature tuning.
To quantify contour feature tuning, we determined for each cell the primary peak in the 3 (convexity) × 2 (curvature) × 3 (acuteness, excluding the 180° single straight edges and lines) × 8 (contour feature orientation) stimulus space. A peak was defined as a contiguous set of stimuli evoking responses greater than half-maximum (see methods). In Fig. 2, B–D, the stimuli falling within the primary peak are indicated by asterisks. The peaks appear discontinuous because two of the dimensions (curvature and acuteness) are plotted recursively. The primary peaks were characterized by two indices, primary peak strength and primary peak size. Primary peak strength represents the fraction of response strength above the half-maximum level contained within the primary peak (see methods). Primary peak strength was high for the cells exhibiting contour feature tuning in Fig. 2, B (1.0) and C (0.83), but low for the cell exhibiting standard orientation tuning in Fig. 2 D (0.37). Primary peak size represents the fraction of stimulus space covered by the primary peak (analogous to width at half-height for a 1-dimensional tuning function). Primary peak sizes for the cells in Fig. 2, B–D, were 0.028, 0.042, and 0.014, respectively.
Figure 3 shows the primary peak index values for the entire sample of 152 V4 cells. Each cell is represented by a dot. Primary peak size is indicated by position with respect to the x axis and primary peak strength by position with respect to the y axis. Cells with strong, focused tuning peaks in contour feature space (as in Fig. 2, B and C) fall near the upper left. At the extreme upper left, several cells with primary peak strength = 1.0 and primary peak size corresponding to just a few stimuli are superimposed (↓). One of these narrowly tuned cells is shown in Fig. 4. Cells with multiple small peaks and no apparent contour feature tuning (e.g., Fig. 2 D) fall near the bottom of the plot. The shaded box marks the region of cells chosen for more detailed analysis (see following text). The cutoffs (primary peak strength = 0.7 and primary peak size = 0.15) are necessarily arbitrary, since the distribution is continuous, but the selected cells all had a single predominant and relatively focused tuning peak in contour feature space. The subsample comprises 50 cells (33% of the entire sample; 2 additional cells falling within the specified range were excluded because they responded best to a single straight edge or line). For these cells, on average, the strongest edge/line response was only about one-tenth of the strongest contour feature response (average ratio = 0.11). The peak analysis presented here excluded the edge/line stimuli so as not to confound contour feature tuning with edge orientation tuning, but inclusion of the edge/line stimuli made little difference: no further cells appeared in the shaded region, and four cells that were in the shaded region fell slightly below the primary peak strength cutoff of 0.7.
Contour feature orientation
Contour feature orientation tuning functions for individual cells are shown in Fig. 5. The tuning functions were derived by summing across the other three dimensions and normalizing. Contour feature orientation is plotted in the circular dimension and normalized response rate is plotted in the radial dimension. The inner ring in each plot corresponds to the normalized average response; successively larger rings correspond to twice the average and three times the average. Where necessary to avoid truncation, the scale was compressed so as to include four or five times the average. The plots are arranged in rows according to which contour feature orientation produced the strongest responses. The distribution of contour feature orientation peaks is uneven but not significantly different from a uniform distribution (Kuiper's test, P = 0.44). Shading denotes significant (P < 0.05) tuning based on randomization ANOVA. Tuning in this dimension was significant in all but one case, and response variance was higher than in any other dimension for most cells (38/50; 76%).
Contour feature orientation tuning was typically consistent across other dimensions. For example, the cell in Fig. 6 responded well to features oriented at 30° (and to a lesser extent 75°) across all three acuteness values and all three convexity values (see also Fig. 2, B and C). Analysis showed that this cell's contour feature orientation tuning functions were significantly correlated across all pairings of acuteness values, two pairings of convexity (convex/outline and concave/outline), and across curvature. The response patterns for most cells showed significant correlations across at least two values of acuteness (38/50; 76%), convexity (37/50; 74%), and curvature (36/50; 72%). This consistency argues against explanations of contour feature tuning in terms of lower level factors like contrast direction, spatial frequency, and component edge orientation, since these factors change across acuteness and convexity but cells continue to respond specifically to contour features pointing in a particular direction (see discussion).
Convexity response functions for individual cells are shown in Fig. 7 (for the definition of convexity, see Fig. 1 and methods). In each graph, the horizontal axis represents the three convexity values (in the arbitrary order convex/outline/concave) and the vertical axis represents normalized response summed across the other three dimensions. Cells with significant (P < 0.05) response variation in the convexity dimension are plotted in Fig. 7, bottom row; nonsignificant cases are plotted in Fig. 7, top row. Response variation across convexity was significant in all but two cases. Cells are plotted in Fig. 7, left, middle, and right columns, according to whether they responded best to convex, outline, or concave features. Most cells responded best to either convex (23/50) or outline (21/50) features. This bias against concave features is exemplified by the cell in Fig. 8, which responded well to convex and outline features oriented at 15° but not at all to the corresponding concave features (see also Fig. 2, B and C). The convexity bias is interesting in light of psychological studies showing that convex features are more perceptually significant than concave features (see discussion).
Acuteness tuning functions for individual cells are shown in Fig. 9. In each graph, the horizontal axis represents acuteness, and the vertical axis represents normalized response summed across the other three dimensions. Cells with significant (P < 0.05) acuteness tuning are plotted in Fig. 9, bottom row; nonsignificant cases are plotted in Fig. 9, top row. Tuning was significant for 72% (36/50) of the cells in our sample. Cells are plotted in the left, middle, and right columns according to whether they responded best to the 45, 90, or 135° acuteness levels. The majority of cells responded best to 45° features. However, this acuteness bias was much less pronounced than the convexity bias described above, in that the average response variance associated with acuteness was approximately one-fifth that associated with convexity (compare the steepness of the functions in Figs. 7 and 9).
Curvature results for individual cells are shown in Fig. 10. In each graph, the horizontal axis represents sharp angle versus smooth B-spline curve stimuli, and the vertical axis represents normalized response summed across the other three dimensions. Cells with significant differences between sharp and smooth stimuli are plotted in Fig. 10, bottom row. Cells responding better to sharp stimuli are plotted in the left column, and cells responding better to smooth stimuli are plotted in the right column. Curvature was the dimension of least influence in our data; only 56% of cells (28/50) exhibited significantly different responses to sharp and smooth stimuli, and the average response variance associated with curvature was lower than for any other dimension. Thus many cells responded in a similar fashion to both sharp angles and their B-spline curve counterparts, as can be seen to some extent in Figs. 2 B, 4, and 8. On the other hand, the responses of some cells were clearly biased toward either sharp or smooth stimuli (e.g., Fig. 2 C). These results do not necessarily imply anything about curvature representation in general. They only serve to contrast responses to angles with sharp corners and angles smoothed using the specific B-spline procedure shown in Fig. 1.
Standard orientation tuning was measured by analyzing responses to the single edge/line (180° acuteness) stimuli. Normalized tuning functions were created by collapsing across convexity to yield four values. (As discussed in methods, when preliminary tests revealed an orientation tuning peak for bar stimuli, the stimulus set was rotated so that one of the four edge orientations coincided with that peak.) ANOVA indicated significant edge/line orientation tuning in 57% of the entire sample (87/152) and 46% of the subsample from Fig. 3 (23/50). These percentages would presumably be higher if the cells had been tested with optimum bar stimuli rather than continuous edges and lines.
Since edge orientation is a standard tuning dimension, it provides a useful comparison with tuning for contour feature-related dimensions. The most relevant comparison is with contour feature orientation, which was the dimension of strongest tuning and is the most analogous to edge orientation. In Fig. 11, orientation tuning for edges and contour features is compared in terms of the differences between maximum and minimum values in the respective tuning functions. In Fig. 11 A, the tuning index for both edges and contour features is (maximum − minimum)/(maximum). (Thus a value of 0.75 indicates that the largest response difference was 75% of the maximum value in the tuning function.) Each cell is plotted with respect to its edge orientation index on the x axis and its contour feature orientation index on the y axis. The different symbols indicate cells with strong contour feature tuning from the subsample in Fig. 3 (●), cells that showed significant edge orientation tuning (▵), and the intersection of these two groups (▴; i.e., cells from the Fig. 3 subsample that also showed significant edge orientation tuning). Cells that belonged to neither group are not shown. Tuning strength in the two domains is roughly comparable, with the majority of cells showing index values >0.5. Edge orientation tuning is stronger than contour feature orientation tuning for the majority of cells overall (as indicated by the preponderance of cells to the right of - - -), though not for the majority of cells in the Fig. 3 subsample (● and ▴).
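The index used in Fig. 11 A is simple enough to state as a one-line calculation. The following Python sketch is a minimal illustration with a hypothetical function name; it assumes the tuning function has a positive maximum.

```python
# Illustrative sketch; the function name is hypothetical.

def tuning_index(tuning_function):
    """(maximum - minimum) / maximum for a one-dimensional tuning function."""
    highest, lowest = max(tuning_function), min(tuning_function)
    if highest <= 0:
        raise ValueError("index undefined when the maximum is not positive")
    return (highest - lowest) / highest


if __name__ == "__main__":
    # Four hypothetical normalized edge orientation responses.
    print(tuning_index([1.0, 0.25, 0.5, 0.75]))
```

Because the index is a ratio within a single tuning function, it carries no information about absolute response rates, so two cells with very different firing rates can share the same value.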
The analysis in Fig. 11 A ignores absolute differences between edge and contour feature tuning, since the two functions are normalized separately. For the Fig. 3 subsample cells, which had much lower responses to edge stimuli, this means that high edge index values may actually reflect relatively low response rate differences. This would explain why some cells without significant edge orientation tuning (●) still have edge index values around 1.0. To provide a more direct comparison, in Fig. 11 B the index values are scaled according to the maximum response to an individual stimulus within the relevant category (contour features for the contour feature orientation index, edges/lines for the edge orientation index; see legend for details). This greatly reduces the edge orientation index for many cells, especially those from the Fig. 3 subsample. Thus edge orientation tuning appears strong when the edge/line stimuli are considered separately, but less striking when considered relative to the typically higher responses to contour features. Orientation tuning for bars would probably be much stronger in an absolute sense, but bars were not included in our stimulus set. The most that can be said from the present results is that some cells show strong contour feature orientation tuning, others show strong edge orientation tuning, and tuning strength for these two groups in their respective domains is roughly comparable. By extension, tuning for convexity, acuteness and curvature may be somewhat weaker than standard orientation tuning.
A critical question in studies of shape representation is whether apparent tuning for complex stimuli actually depends on changes in the position of simpler components. In our study, for example, a particular contour feature might evoke a stronger response simply because it included an edge close to a particularly responsive region in the RF. This seems unlikely to explain the data presented in the preceding text because in every case the component edges in the optimum stimuli appeared in other stimuli at nearly the same positions but failed to evoke strong responses. As a further control, however, we tested 24 cells of the 50 in the Fig. 3 subsample with a selected subset of the contour feature stimuli presented at multiple positions. In each case, the selected stimuli included one optimum contour feature and at least one other contour feature that contained the same edge orientations but failed to strongly activate the cell. Example results for three cells are shown in Fig. 12. In each plot, the stimuli are shown at left, with the optimum feature at the top followed by three other features containing the same component orientations (or ranges of orientations in the smooth curve case). The other columns show the responses to these stimuli when presented at the center of the RF and offset to the right, top, left, and bottom. (In A and C, the 2nd row stimuli evoked moderate responses because they fell within the flanking regions of the contour feature orientation peaks for the cells.) The separation between the right/left and top/bottom positions was 0.35 times the average RF diameter (see methods). Larger displacements were found to drastically reduce responses overall, rendering the test less meaningful. Even in the examples shown in Fig. 12, some displacements produced lower responses to the optimum stimulus. But the important point is that there were no positions at which a previously ineffective stimulus evoked responses comparable to those evoked by the optimum stimulus at the center position.
Results for all 24 cells are presented in Fig. 13. In each plot, the normalized responses to the optimum stimulus are represented by ░, and responses to a nonoptimum stimulus containing the same edge orientations are represented by ■. Responses at the five different positions are represented by bar graphs at corresponding locations. In some cases, displacements from the center position strongly increased or decreased the response to the optimum stimulus, but the maximum response (across positions) to the nonoptimum stimulus never equaled the maximum response to the optimum stimulus.
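The comparison summarized for the 24 cells can be sketched as follows. This is an illustrative check only, assuming normalized responses stored per position; the function name, the dictionary layout, and the example values are hypothetical and are not taken from the recorded data.

```python
POSITIONS = ("center", "right", "top", "left", "bottom")


def passes_position_control(optimum, nonoptimum):
    """Return True if the nonoptimum stimulus, at its best position,
    still evokes a smaller response than the optimum stimulus at its best position."""
    best_optimum = max(optimum[p] for p in POSITIONS)
    best_nonoptimum = max(nonoptimum[p] for p in POSITIONS)
    return best_nonoptimum < best_optimum


# Hypothetical normalized responses for one cell at the five test positions.
optimum = {"center": 1.0, "right": 0.6, "top": 0.8, "left": 0.4, "bottom": 0.7}
nonoptimum = {"center": 0.2, "right": 0.3, "top": 0.1, "left": 0.25, "bottom": 0.15}

print(passes_position_control(optimum, nonoptimum))  # True
```

A cell whose nonoptimum stimulus matched or exceeded the optimum response at some displaced position would return False, which is the outcome a simple position-dependent edge explanation would predict.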
We have shown that many cells in area V4 exhibit systematic tuning for contour features, i.e., angles and curves. There is no simple explanation for contour feature tuning in terms of lower-level factors such as edge orientation, spatial frequency, and contrast direction. The dimensions of greatest response variation are contour feature orientation (the direction in which angles or curves are pointed) and convexity (whether the angle/curve is rendered as a convex projection, an outline, or a concave indentation). There is a strong bias toward convex (and outline) features and against concave features, consistent with psychological findings (see following text). Altogether, the results suggest that contour features are extracted as intermediate level shape primitives, as a step toward complex shape recognition.
It is important to consider whether apparent tuning for contour features might simply reflect standard tuning for lower-level factors such as edge orientation, spatial frequency, and contrast direction. Standard tuning for edge orientation fails to explain the present data on several grounds. In every case of contour feature tuning, the component edge orientations contained in the optimum features also appeared in many other stimuli that failed to evoke strong responses (e.g., Fig. 2 B); tuning for contour feature orientation typically remained consistent across acuteness despite changes in component edge orientations (e.g., Fig. 6); and almost all cells responded better to contour features than to any individual edges or lines. Spatial frequency tuning fails to explain the data for similar reasons: similar spatial frequencies appeared in optimum and nonoptimum stimuli, and tuning for contour feature orientation typically remained consistent across acuteness and convexity despite substantial changes in spatial frequency content (particularly between the outline and convex/concave features; see Fig. 6). Selectivity for color/luminance contrast direction is likewise inadequate to explain the response patterns, again because the same contrast edges are shared by both optimum and nonoptimum stimuli and tuning is consistent across different convexity values despite changes in contrast direction. Finally, the response patterns cannot be explained by differential surround stimulation. Although it is true that surround stimulation varied with stimulus type, tuning for contour feature orientation remained consistent across convexity despite the associated changes in surround stimulation (e.g., Figs. 2, 6, and 8).
Endstopping is another standard response characteristic that might be invoked to explain the present results. In fact, it has been proposed that the ultimate function of endstopping is to derive information about contour features (Hubel and Livingstone 1987; Hubel and Wiesel 1965), and our findings support that hypothesis. Endstopping by itself does not predict the contour feature tuning patterns described here, since the same endstopped edges or lines were typically contained in both optimum and nonoptimum stimuli (see, e.g., the outline stimuli in Fig. 8). However, contour feature tuning can be explained specifically in terms of combinations of endstopped orientation signals (and other lower-level information). For example, the tuning pattern in Fig. 8 could be explained as activation by the combination of an edge oriented near 70° (counterclockwise from horizontal) and endstopped at the top plus an edge oriented near 160° and endstopped at the right, with a preference for a specific contrast direction (brighter toward the left and darker toward the right). (Weak activation by individual component orientations is apparent.) The end result would be a signal related to the presence of a sharp corner pointing to the right. Theorists have proposed that contour feature information is derived by combining endstopped orientation signals precisely in this manner (Hummel and Biederman 1992; Milner 1974).
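The combination idea can be illustrated with a toy model. This is a sketch under strong simplifying assumptions, not the mechanism established by our data: each subunit is given Gaussian orientation tuning, endstopping and contrast direction are ignored, and the tuning width and mixing weight are arbitrary. All names and parameter values are hypothetical. The only values taken from the text are the two preferred orientations (70° and 160°).

```python
import math


def orientation_tuning(preferred_deg, edge_deg, sigma_deg=20.0):
    """Gaussian tuning on orientation, which is periodic over 180 degrees."""
    d = abs(preferred_deg - edge_deg) % 180.0
    d = min(d, 180.0 - d)
    return math.exp(-0.5 * (d / sigma_deg) ** 2)


def combination_response(edge_orientations, pref_a=70.0, pref_b=160.0, linear_weight=0.2):
    """Toy response: a small additive term (weak drive by single components)
    plus a multiplicative term that requires both components together."""
    a = max(orientation_tuning(pref_a, e) for e in edge_orientations)
    b = max(orientation_tuning(pref_b, e) for e in edge_orientations)
    return linear_weight * (a + b) / 2.0 + (1.0 - linear_weight) * a * b


corner = [70.0, 160.0]         # both preferred component orientations present
single_edge = [70.0]           # only one component present
rotated_corner = [20.0, 110.0] # same 90 degree angle, rotated by 50 degrees

for name, stim in [("corner", corner), ("single edge", single_edge), ("rotated corner", rotated_corner)]:
    print(name, round(combination_response(stim), 3))
```

In this sketch the full corner drives the unit strongly, a single component drives it weakly, and a rotated corner drives it even less, which is qualitatively the pattern described above. The multiplicative term is one convenient way to express a conjunction; other nonlinearities would serve the same illustrative purpose.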
A simpler, related explanation might be that cells are tuned for a single edge/bar orientation, again with endstopping in just one direction. This could explain contour feature orientation tuning for acute (45°) angles because acute angles contain two closely apposed edges of similar orientation and have a relatively narrow width, so that responses might reflect tuning for either edges or bars at a nearby orientation. However, this mechanism would not explain why contour feature orientation tuning remains consistent for the 90 or 135° angles (as it does for the majority of tuned cells; see results and Figs. 2, 4, 6, and 8). These more obtuse contour features contain dissimilar edge orientations that substantially overlap with edge orientations contained by contour features pointing in other (nonoptimum) directions. Moreover, they have no real “bar” orientation, i.e., no oriented section of relatively narrow, relatively constant width. Thus, consistency of contour feature orientation tuning across different levels of acuteness implies a slightly more complex mechanism.
Our data reveal a strong response bias toward convex features and against concave features. Convex features are defined here as angles and curves in which the figure projects into the background (see Fig. 1). More precisely, the region inside the angle (or curve) is continuous with the smaller image region, i.e., the figure (which in our experiment was filled with the optimum color for the cell), whereas the region outside the angle is continuous with the large homogeneous field that covers the rest of the screen, i.e., the ground (which in our experiment was dark gray).
Because the figure was always rendered in the optimum color against a dark gray background, we don't know which cue (figure/ground organization or color contrast direction) was critical for the convexity bias. The two alternatives are illustrated in Fig. 14, for the case where white is the optimum color. The stimuli observed or predicted to evoke the strongest responses are indicated by checks. One possibility (top row) is that cells responded because the figure was convex (i.e., occupied the interior of the angle). In this case, even if contrast direction were reversed (C and D), cells would continue to show a bias toward convex figures, responding better to C (assuming they respond at all under the contrast reversed condition). The other possibility (bottom row) is that cells responded because the optimum color (white) was convex (occupied the angle interior). In this case, if contrast direction were reversed, cells might respond best to D, where an angle of the white background projects into the dark concave figure. Note, however, that the cells in question (those responding to white) would then be representing the background, not the figure. Cells representing the figure (those responding to dark gray in conditions C and D) should always display a bias toward convex stimuli.
Whichever cue is critical for driving cells, a likely mechanism to explain the convexity bias is surround inhibition. The fadeout portions of the concave stimuli occupy more of the RF surround (see Fig. 1), which is known to be silently inhibitory in area V4 (Desimone et al. 1985). The situation would be similar for concave features within any real life object large enough to exceed V4 RF borders. Concave features are by their nature surrounded by other portions of the object and thus more subject to surround inhibition. The V4 response bias against concave features might be less pronounced for smaller shapes that fit within V4 classical RFs.
The neurophysiological convexity bias that we found in V4 parallels psychological results showing that convex features are perceptually dominant. Human observers favor figure/ground interpretations that emphasize convex projections over concave indentations (Kanizsa and Gerbino 1976). Convex features are more determinative than concave features in judgments of shape similarity (Subirana-Vilanova and Richards 1996). These results are consistent with our finding that convex features are more strongly represented in visual cortex. Hoffman and Richards (1984) predicted on theoretical grounds that segmentation of complex objects into parts (for the purpose of shape recognition) should occur along boundaries of maximum concavity, producing convex parts. This “curvature minima” rule is supported by psychophysical results showing that human observers are more likely to recognize parts from a previously viewed object if they are convex, i.e., segmented at points of concavity (Braunstein et al. 1989). Our results suggest that the Hoffman and Richards minima rule is instantiated in the neural circuitry of the ventral visual pathway.
Implications for cortical shape processing
Most current theories of shape processing are based on the idea of feature extraction, i.e., the identification of object parts. There are alternatives to feature extraction, including template matching and Fourier decomposition, but feature- or part-based mechanisms are better adapted to the real-world difficulties of three-dimensional viewpoint transformations, partial occlusion and plastic deformation (Hoffman and Richards 1984). Moreover, physiological results in higher level extrastriate cortex (e.g., Desimone et al. 1984; Gross et al. 1972; Perrett et al. 1982; Tanaka et al. 1991) seem more compatible with feature-based theories. The simplistic notion of all-or-nothing feature detectors arranged in a hierarchy leading up to “grandmother cells” has been justly criticized, but more reasonable models can be constructed on the basis of broadly tuned feature filters, feeding into higher-level units that are themselves broadly tuned and represent complex shapes through population coding (Barlow 1972; Poggio 1990).
A key question for feature-based theories is the nature of the elementary features or shape primitives on which recognition is based. The first-level feature in most models is local edge orientation, a choice dictated by the physiology of early stages in visual cortex (Baizer et al. 1977; Burkhalter and Van Essen 1986; Hubel and Livingstone 1987; Hubel and Wiesel 1959, 1965, 1968). The choice of intermediate-level features is not so constrained by physiology. Two general types of intermediate features have been considered, one relating to object boundaries and the other to solid volumes. Boundary-related features include two-dimensional angles and curves of the type studied here and homologous three-dimensional surface features (sharp corners, curved surface patches, and indentations). Solid or volumetric primitives (also referred to as generalized cones or geons) are defined by the orientation and shape of their medial axes along with various cross-section attributes. Some theories postulate a progression from local orientation to contour features to volumetric primitives to complete shape descriptions (aggregates of volumetric primitives), with each stage based on inputs from the preceding stage (e.g., Biederman 1987; Dickinson et al. 1992). According to other models, final shape representations could be based directly on contour features (Poggio and Edelman 1990). A third scheme involves direct progression from local orientation to volumetric primitives, with no intermediate description in terms of contour features (e.g., Marr and Nishihara 1978).
Distinguishing between these theoretical alternatives requires physiological data about what kinds of shape information are represented at various stages in the ventral cortical pathway. Our finding of systematic tuning for contour features in area V4 argues for the importance of boundary-related primitives and reinforces previous evidence for extraction of angles and curves at earlier levels. Most previous studies have focused on angles and curves pointing in the two directions orthogonal to the optimum bar orientation (i.e., stimuli that represent deformations of the optimum bar stimulus). These studies have shown that endstopped cells in cat area 17 respond well to small radius curves (Dobbins et al. 1987; Heggelund and Hohmann 1975; Versavel et al. 1990), as predicted by Hubel and Wiesel (1965). Moreover, some cells respond better to curves pointing in one direction or the other, which has been explained in terms of even- versus odd-symmetric RF substructure (Dobbins et al. 1987). There is little evidence for selective responses to angles, though Hammond and Andrews (1978) showed that for a few cells in cat area 18 small differences in bar orientation tuning in the two halves of the RF resulted in better responses to obtuse angles than to any straight line stimulus, and Hubel and Wiesel (1965) provided examples of endstopped cells responding to angles containing their optimum edge orientation. Tuning for angles and curves in monkey area V2 has been reported in abstract form (Hegde and Van Essen 1997). Our findings extend this line of research by showing systematic tuning for contour feature-related dimensions, especially contour feature orientation, at an intermediate level in the primate ventral pathway (V4). In addition, our data are consistent with reports of V4 selectivity for complex stimuli that contain angle and curve elements (Gallant et al. 1996; Kobatake and Tanaka 1994).
Our findings also relate to psychological studies suggesting the existence of specialized mechanisms for angle and curve perception. Three different groups have recently shown that angle perception acuity is higher than that predicted by line orientation acuity (Chen and Levi 1996; Heeley and Buchanan-Smith 1996; Regan et al. 1996). Observers are highly sensitive to the presence of curvature (Andrews et al. 1973; Wilson et al. 1997), and curvature appears to be a basic feature for visual search (Treisman and Gormican 1988; Wolfe et al. 1992). Moreover, there is no transfer of perceptual learning between curvature hyperacuity tasks and other hyperacuity tasks like orientation and vernier discrimination (Fahle 1997; Watt and Andrews 1982). On the basis of these results, psychologists have postulated specific neural mechanisms for detecting angles and curves. Our data provide convergent evidence for the existence of such neural mechanisms.
Contour feature extraction would support an efficient and flexible population code that could represent a variety of shapes with a limited number of units. A triangle could be represented by the activity of cells tuned for acute angles pointing in three specific directions, and the same cells could participate in the representation of any number of shapes containing similar acute angles. Contour feature signals from intermediate areas like V4 could be combined at subsequent processing stages to create selectivity for more complex patterns, of the sort that has been observed in IT cortex. Our demonstration of systematic tuning for contour features in V4 provides preliminary evidence for such a mechanism. Further studies are under way to investigate how contour feature tuning relates to complex shape responses in V4 and higher levels in the ventral pathway (Pasupathy and Connor 1998).
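The triangle example can be made concrete with a toy population code. This is a hypothetical sketch, not a model fitted to our data: it assumes eight units tuned only for contour feature orientation (the pointing direction of an acute angle), with Gaussian tuning of arbitrary width, and it ignores acuteness, convexity, and position. The unit spacing, tuning width, and example shapes are all assumptions made for illustration.

```python
import math

# Hypothetical bank of units, each preferring one pointing direction (degrees).
PREFERRED_DIRECTIONS = [0, 45, 90, 135, 180, 225, 270, 315]


def direction_tuning(preferred_deg, direction_deg, sigma_deg=30.0):
    """Gaussian tuning on pointing direction, which is periodic over 360 degrees."""
    d = abs(preferred_deg - direction_deg) % 360.0
    d = min(d, 360.0 - d)
    return math.exp(-0.5 * (d / sigma_deg) ** 2)


def population_response(feature_directions):
    """Each unit responds according to the best-matching feature in the shape.
    Expects a non-empty list of pointing directions."""
    return [
        max(direction_tuning(p, d) for d in feature_directions)
        for p in PREFERRED_DIRECTIONS
    ]


triangle = [90, 225, 315]    # acute angles pointing up, lower left, lower right
star = [0, 90, 180, 270]     # a different shape that shares the upward-pointing angle

for name, shape in [("triangle", triangle), ("star", star)]:
    print(name, [round(r, 2) for r in population_response(shape)])
```

In this sketch the unit preferring 90° is fully active for both shapes, while the overall activity pattern differs between them, which is the sense in which a limited set of broadly tuned units could represent many shapes.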
Technical help was provided by S. Patterson and H. Dong. We thank A. J. Bastian, S. L. Brincat, D. A. Hinkle, S. S. Hsiao, K. O. Johnson, V. B. Mountcastle, G. F. Poggio, R. von der Heydt, and M. A. Steinmetz for comments on earlier versions of the manuscript.
This work was supported by the Lucille P. Markey Charitable Trust.
Address for reprint requests: C. E. Connor, Zanvyl Krieger Mind/Brain Institute, Johns Hopkins University, 338 Krieger Hall, 3400 N. Charles St., Baltimore, MD 21218.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 1999 The American Physiological Society