Visual shape recognition in primates depends on a multi-stage pathway running from primary visual cortex (V1) to inferotemporal cortex (IT). The mechanisms by which local shape signals from V1 are transformed into selectivity for abstract object categories in IT are unknown. One approach to this issue is to investigate shape representation at intermediate stages in the pathway, such as area V4. We studied 109 V4 cells that appeared sensitive to complex shape in preliminary tests. To achieve a more complete picture of shape representation in V4, we tested each cell with a set of 366 stimuli, constructed by systematically combining convex and concave boundary elements into closed shapes. Using this large, diverse stimulus set, we found that all the cells in our sample responded to a wide variety of shapes and did not appear to encode any single type of global shape. However, for most cells the shapes evoking strongest responses were characterized by a consistent type of boundary conformation at a specific position within the stimulus. For example, a given cell might be tuned for shapes containing concave curvature at the right, with other parts of the shape having little or no effect on responses. Many cells were tuned for more complex boundary configurations (e.g., a convex angle adjacent to a concave curve). We quantified this kind of shape tuning with Gaussian functions on a curvature × position domain. These tuning functions fit the neural responses much better than tuning functions based on edge or axis orientation. Thus individual V4 cells appear to encode moderately complex boundary information at specific locations within larger shapes. This finding suggests that, at intermediate stages in the V1-IT transformation, complex objects are represented at least partly in terms of the configurations and positions of their contour components.
The ventral pathway in primate visual cortex is thought to be responsible for shape recognition (Felleman and Van Essen 1991; Ungerleider and Mishkin 1982). At early stages in this pathway, such as V1, shape is encoded by cells with small receptive fields (RFs) sensitive to simple features like edge orientation (Hubel and Wiesel 1968). Cells at the end of the pathway in inferotemporal cortex (IT) have large RFs and often appear selective for abstract object categories like faces and hands. The mechanisms by which local orientation signals in V1 are transformed into complex object selectivity in IT are not yet understood. One approach to this issue is to elucidate the nature of shape representation at intermediate stages in the ventral pathway such as area V4.
Prior research has shown that, as in areas V1 and V2, cells in area V4 can be selective for orientation, length, and width of bar stimuli, as well as orientation and spatial frequency of gratings (Desimone and Schein 1987). In addition, many V4 cells are responsive to more complex shapes (Kobatake and Tanaka 1994), and many are sensitive to curvature, as shown by their selective responses to curvilinear gratings (Gallant et al. 1993, 1996). We sought to study these more complex V4 cells, and specifically to provide a more complete picture of how they function in representing a wide range of shapes.
To do so, we created a large set of moderately complex shapes, by systematically combining convex and concave boundary elements. Our stimulus set design was partly motivated by prior results showing tuning for individual curves and angles in area V4 (Pasupathy and Connor 1999). Our scheme for combining individual boundary elements into complex closed shapes yielded a total of 366 stimuli. These stimuli varied widely in overall shape but also shared common boundary components.
We used these stimuli to study 109 V4 cells that appeared to have complex shape response properties based on preliminary tests. Each cell in this sample responded to a variety of very different shapes. No cell displayed a response pattern that could be characterized in terms of a single type of global shape. However, for most cells the effective stimuli showed some degree of shape consistency at one position (relative to the center of the object). In other words, these cells were tuned for boundary conformation in one part of the shape. This kind of position-specific tuning for boundary conformation was quantified with Gaussian functions on a curvature × position domain. Many cells were tuned for sequences of two or three curvature values. The curvature-based tuning functions fit the neural responses much better than functions based on linear edge or axial orientation (where axial denotes the axis of greatest elongation; see methods). The results suggest a parts-based representation of complex shape in V4, where the parts are boundary patterns defined by curvature and position relative to the rest of the object.
We recorded single-cell activity in two female rhesus monkeys (Macaca mulatta), weighing 7.5 and 5.6 kg, respectively. During training and recording sessions the animal was seated in front of a computer monitor at a distance of 50 cm, with the head immobilized by means of a custom-built titanium post surgically attached to the skull with orthopedic screws. The animal was trained to fixate a 0.1° white spot within 0.5° of visual angle for a period of 3.75 s to receive a juice reward. Eye position was monitored using the scleral search coil method (Robinson 1963). A wire coil was surgically implanted beneath the conjunctiva of one eye (Judge et al. 1980) and connected to a signal converter (Riverbend, Birmingham, AL). The analog signal from the converter was digitized and sampled at 100 Hz through an A/D interface (BG Systems, Palo Alto, CA) connected to a serial port of an Indy workstation (Silicon Graphics, Mountain View, CA). The workstation was also used for generating visual stimuli.
We studied V4 neurons in the lower parafoveal representation on the prelunate gyrus and adjoining banks of the lunate and superior temporal sulci. Recording locations were based on skull landmarks, response characteristics, retinotopy, and inferred positions of the sulci. Neural activity was recorded with 125-μm-diam epoxy-coated tungsten electrodes (A-M Systems, Carlsborg, WA) with impedances of 1–5 MΩ. Electrodes were inserted transdurally through a 5-mm-diam craniotomy by means of a custom guide tube system. Electrode position was controlled with a stepping motor microdrive (National Aperture, Salem, NH). Electrical waveforms were amplified and filtered, and single units were discriminated on the basis of 2 (occasionally 1) independently adjustable time/amplitude windows. The digital output of the window discriminator was collected through the audio input channel of the workstation at a sample rate of 8 kHz. All animal procedures conformed to National Institutes of Health and USDA guidelines and were carried out under an institutionally approved animal protocol.
Each cell was initially characterized with flashing and drifting bars, ellipses, and star-shaped stimuli under the experimenter's control. These stimuli were used to find the cell's RF center and to determine an effective stimulus color (used in all subsequent tests). We presented eight colors: red, green, blue, yellow, cyan, magenta, white, and black. Each color was adjusted to an approximate luminance of 20 cd/m², except for blue (15 cd/m²) and black, and displayed against a background gray of 2.5 cd/m². We also assessed tuning for curves and angles (Pasupathy and Connor 1999) and bar orientation. Since we specifically sought to study complex shape representation, and our complete testing procedure was extremely time-consuming, we frequently bypassed cells that appeared sensitive only to bar orientation. We isolated 409 neurons during the course of our experiments. Of these, we chose 222 for further study based on their responsiveness to curves, angles, ellipses or star-shapes during preliminary tests. In this paper we present results for 109 cells for which we completed at least 3 repetitions (usually 5) of the entire stimulus set (see Stimuli).
The stimulus set is shown in Fig. 1. Each stimulus is represented by a white icon positioned within a black disk that represents the cell's RF. The stimuli were constructed by systematically combining convex and concave boundary elements to form closed shapes. These boundary elements included sharp convex angles, medium and broad convex curves, and medium and broad concave curves. (Our description here assumes that the stimulus is perceived as figure and the rest of the display screen as ground, so that contour elements projecting outward from the center of the stimulus are convex and indentations toward the center are concave.) We did not include sharp concave angles because this would have further increased the size of an already large stimulus set, and our previous results had shown a strong bias in V4 toward convex contour features (Pasupathy and Connor 1999). The shapes in Fig. 1 constitute a complete combinatorial sampling based on a limited set of boundary elements and certain geometrical constraints (see legend). Obviously, a greater variety of stimuli could be constructed by allowing more variation in the curvatures and lengths of the boundary elements. The stimuli are arbitrarily arranged into blocks according to number and configuration of convex projections. Stimulus orientation varies along the rows.
Stimulus size was based on estimated RF size, which in turn was based on RF eccentricity. The average RF diameter at a given eccentricity was estimated as 1° + 0.625 × RF eccentricity, based on a study by Gattass et al. (1988). Stimulus size was scaled with eccentricity such that the outermost stimulus edges were offset from the RF center by 0.75 × estimated RF radius. Thus as a group, the stimuli covered the central three-quarters of the average V4 RF diameter. We based stimulus size on eccentricity rather than individually measured RF diameter so that the stimulus set would be consistent from cell to cell. In some cases stimulus size would have been nonoptimum for the cell being studied. Scaling with eccentricity also compensates for acuity changes and thus maintains the visibility of the stimuli. Stimulus shape remained clearly perceptible at all eccentricities, based on our subjective observations. We did not test stimulus size as a variable in any of our experiments, since smaller stimuli would have been difficult to see, and larger stimuli would have exceeded the RF borders of many V4 cells. Additional tests in which we varied stimulus position (see below) verified that response functions did not depend on the position of specific features with respect to the RF. Hereinafter, “RF diameter” and “RF radius” will be used to denote RF diameter and radius estimated on the basis of eccentricity.
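The scaling rule above can be sketched in a few lines of Python. The function names are hypothetical and chosen only for illustration; the constants (1°, 0.625, and 0.75) are the ones stated in the text.

```python
def estimated_rf_diameter(eccentricity_deg):
    """Estimated average V4 RF diameter (deg) at a given eccentricity (deg)."""
    return 1.0 + 0.625 * eccentricity_deg


def stimulus_max_offset(eccentricity_deg):
    """Distance (deg) from the RF center to the outermost stimulus edge."""
    rf_radius = estimated_rf_diameter(eccentricity_deg) / 2.0
    return 0.75 * rf_radius


# Example: a cell with an RF centered at 4 deg eccentricity.
print(estimated_rf_diameter(4.0))  # 3.5
print(stimulus_max_offset(4.0))    # 1.3125
```

At 4° eccentricity the estimated RF diameter is 3.5°, so the largest stimuli extend about 1.3° from the RF center.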
During each trial, following initiation of fixation and a 250-ms prestimulus interval, five randomly selected stimuli were flashed one at a time for 500 ms each, with interstimulus intervals of 250 ms. Total trial length was thus 3.75 s. The entire set of 366 stimuli was sampled without replacement 5 times for most cells (91/109). There were 9 cases in which only 4 repetitions were completed and 9 others in which only 3 repetitions were completed.
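The trial timing and the sampling scheme can be checked with a short sketch. It assumes that the five stimuli in a trial are separated by four interstimulus intervals, which is the reading that reproduces the stated 3.75-s trial length. The function names and the way repetitions are chunked into trials are illustrative assumptions, and the null stimulus periods used for background estimation are not modeled.

```python
import random

N_STIMULI = 366
STIM_PER_TRIAL = 5
PRESTIM_MS, STIM_MS, ISI_MS = 250, 500, 250


def trial_length_ms(n_stim=STIM_PER_TRIAL):
    """One prestimulus interval, n_stim flashes, and n_stim - 1 gaps between them."""
    return PRESTIM_MS + n_stim * STIM_MS + (n_stim - 1) * ISI_MS


def build_trials(n_repetitions, seed=0):
    """Shuffle the full stimulus set once per repetition, then cut into trials.

    Assumption: trials are filled consecutively, so a trial may straddle
    two repetitions because 366 is not a multiple of 5.
    """
    rng = random.Random(seed)
    order = []
    for _ in range(n_repetitions):
        block = list(range(N_STIMULI))
        rng.shuffle(block)  # sampling without replacement within a repetition
        order.extend(block)
    return [order[i:i + STIM_PER_TRIAL]
            for i in range(0, len(order), STIM_PER_TRIAL)]


print(trial_length_ms())      # 3750
print(len(build_trials(5)))   # 366
```

Five repetitions of 366 stimuli at five stimuli per trial give 366 trials of 3.75 s each under these assumptions.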
To verify that responses did not depend on some specific placement of stimuli, or parts of stimuli, relative to the RF, we performed post hoc control tests in which selected stimuli, including at least one effective and one ineffective stimulus, were presented at multiple positions. In some cases these stimuli were presented in five positions: at the RF center and offset to the left, right, above and below by 0.35 × RF radius. In other cases stimuli were presented at 25 positions in a 5 × 5 square grid centered on the RF, with a spacing of 0.5 × RF radius.
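The two position layouts described above can be generated as offsets from the RF center. This is a minimal sketch: the function names and the sign convention (positive x to the right, positive y upward) are assumptions made for illustration.

```python
def five_position_offsets(rf_radius, step=0.35):
    """RF center plus left, right, above, and below, offset by step x RF radius."""
    d = step * rf_radius
    return [(0.0, 0.0), (-d, 0.0), (d, 0.0), (0.0, d), (0.0, -d)]


def grid_offsets(rf_radius, n=5, spacing=0.5):
    """n x n square grid centered on the RF, listed row by row from the top."""
    half = (n - 1) / 2.0
    step = spacing * rf_radius
    return [((col - half) * step, (half - row) * step)
            for row in range(n) for col in range(n)]


grid = grid_offsets(rf_radius=2.0)
print(len(grid))   # 25
print(grid[0])     # (-2.0, 2.0), the top-left grid position
```

With a spacing of 0.5 × RF radius, the outermost grid positions lie one full RF radius from the center along each axis.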
Response rates were calculated by counting spike occurrences within the 500-ms stimulus presentation period. Background response rates were derived in the same way from null stimulus periods interspersed randomly among stimulus presentations in all tests. Background rates were low (average, 1.6 spikes/s), and analyses with and without background subtraction yielded similar results. The results presented here are based on subtraction of average background rate from the response rate for each repetition of each stimulus.
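The rate calculation can be illustrated with a small sketch. It assumes spike times are expressed in milliseconds relative to stimulus (or null period) onset; the function names and data layout are hypothetical.

```python
def spike_rate(spike_times_ms, window_ms=500.0):
    """Spikes per second, counting spikes inside [0, window_ms)."""
    n_spikes = sum(1 for t in spike_times_ms if 0.0 <= t < window_ms)
    return n_spikes / (window_ms / 1000.0)


def background_subtracted_rates(stim_spike_trains, null_spike_trains,
                                window_ms=500.0):
    """Subtract the average null-period rate from each presentation's rate."""
    null_rates = [spike_rate(t, window_ms) for t in null_spike_trains]
    background = sum(null_rates) / len(null_rates)
    return [spike_rate(t, window_ms) - background for t in stim_spike_trains]


# One stimulus presentation with three spikes in the window, two null periods.
print(background_subtracted_rates([[10, 200, 499, 600]], [[100], []]))  # [5.0]
```

Because the background is subtracted from every repetition, a stimulus that suppresses firing below the spontaneous rate yields a negative response value.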
We characterized each shape in our stimulus set in terms of its component boundary elements (angles and curves). For each boundary element we determined average curvature, orientation (of the perpendicular bisector, a perpendicular line intersecting the boundary segment's midpoint; i.e., the direction in which the angle or curve seems to point), and position. Curvature was defined as rate of change in tangent angle (in radians) with respect to contour length (in units of estimated RF radius). For angles, curvature is infinitely high, so we used a squashing function (see below) to map raw curvature values into a continuum that would encompass angles. Divisions between contour elements were defined as regions in which the rate of change in curvature exceeded 40 rad/radius². This threshold yielded four to eight elements per stimulus. The stimulus shapes were designed to have four to eight contour segments of relatively constant curvature, and the arbitrary cutoff value of 40 rad/radius² just serves to distinguish these segments. The results accord with subjective impressions of how many segments each shape has. For example, the star-shaped stimulus (stimulus 3 in Fig. 1) consisted of four convex angles (regions of extremely high curvature) and four intervening concavities (8 boundary elements altogether). Large continuous regions of constant convex curvature (as in the disk stimuli) were divided into 45° sections, since 45° was the sampling interval for contour segment orientation, and the angular extent of other contour segments was on the order of 45°. Dividing the shapes into fewer segments would confound multiple curvature values and thus reduce the power of the analysis. Dividing into more segments would not affect results, since the same approximate curvature values would just be represented redundantly. Position was defined as polar angle and radial eccentricity with respect to the center of mass of the shape.
Thus each boundary element was characterized by four numbers (curvature, orientation, angular position, and radial position) and could be considered a point in a multidimensional space. Each shape could be considered a collection of such points. This provided a metric stimulus domain in which we could characterize shape tuning. In practice, we found that two dimensions, curvature and polar angle, were sufficient to describe shape tuning in this experiment (see results).
For each cell, we characterized tuning in shape space by deriving multi-dimensional Gaussian functions based on neural responses. Assume that each stimulus is represented by P points in an n-dimensional shape space. X_ip represents the value of the ith stimulus dimension for the pth point. The response function along each dimension i was fit with a one-dimensional Gaussian with its peak at μ_i and a standard deviation of σ_i. The overall response function was fit by the product of the n Gaussians. Thus the predicted response r is given by
\[ r = k \prod_{i=1}^{n} \exp\left( -\frac{(X_{ip} - \mu_i)^2}{2\sigma_i^2} \right) \]
where k represents the amplitude of the n-dimensional Gaussian. The predicted response to a stimulus with P points was the maximum of the P responses associated with its component points (cf. Riesenhuber and Poggio 1999). Thus if a cell were strongly driven by a particular boundary element, the tuning function would predict high responses to all shapes containing that element, independent of other stimulus characteristics. Tuning function estimates were similar when predicted responses were based on the sum of all component responses instead of the maximum.
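The model described above (a product of one-dimensional Gaussians evaluated at each boundary element, followed by a maximum across elements) can be sketched as follows. The function names are hypothetical. The optional `periods` argument, which wraps differences on circular dimensions such as angular position, is an assumption added for illustration; the text does not state how circular dimensions were handled.

```python
import math


def point_response(point, mu, sigma, k, periods=None):
    """Amplitude k times the product of 1-D Gaussians, one per stimulus dimension."""
    periods = periods or [None] * len(point)
    exponent = 0.0
    for x, m, s, period in zip(point, mu, sigma, periods):
        diff = x - m
        if period is not None:
            # Wrap the difference into [-period/2, period/2) for circular dimensions.
            diff = (diff + period / 2.0) % period - period / 2.0
        exponent += (diff / s) ** 2
    return k * math.exp(-0.5 * exponent)


def predicted_response(points, mu, sigma, k, periods=None):
    """Predicted response to a shape: maximum over its boundary elements."""
    return max(point_response(p, mu, sigma, k, periods) for p in points)


# Hypothetical shape with two boundary elements given as (angular position, curvature).
shape = [(229.6, 1.0), (90.0, -0.31)]
print(predicted_response(shape, mu=(229.6, 1.0), sigma=(26.7, 0.42), k=1.0,
                         periods=(360.0, None)))  # 1.0
```

Because of the maximum, a shape containing one element at the tuning peak reaches the full amplitude k regardless of its other elements, which is the behavior the text describes.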
The parameters of the Gaussian tuning function were estimated by minimizing the sum of squared errors between observed and predicted values (across all stimuli) using the Gauss-Newton algorithm in MATLAB (MathWorks, Natick, MA). Since nonlinear regression solutions can be highly dependent on starting points, we derived solutions from multiple starting points uniformly spaced across a grid in the stimulus domain. The functions that provided the best fits are presented here. (For each neuron, the majority of starting points yielded similar tuning functions.) Goodness-of-fit was assessed by computing the coefficient of correlation between observed and predicted responses (r).
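The surrounding bookkeeping for this procedure (a grid of starting points, selection of the lowest-error solution, and the correlation used as goodness-of-fit) can be sketched in plain Python. This is not the MATLAB Gauss-Newton routine itself: `fit_from` stands in for any local optimizer and is a hypothetical callable, as are the other names.

```python
import math
from itertools import product


def starting_grid(ranges, steps):
    """Uniformly spaced starting points across each (low, high) parameter range."""
    axes = [[lo + (hi - lo) * j / (steps - 1) for j in range(steps)]
            for lo, hi in ranges]
    return list(product(*axes))


def multi_start(starts, fit_from, sse):
    """Run a local fit from every start and keep the lowest-error solution."""
    return min((fit_from(s) for s in starts), key=sse)


def pearson_r(observed, predicted):
    """Coefficient of correlation between observed and predicted responses."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    so = math.sqrt(sum((o - mo) ** 2 for o in observed))
    sp = math.sqrt(sum((p - mp) ** 2 for p in predicted))
    return cov / (so * sp)


# Starting points on an angular position x curvature grid (4 x 4 = 16 starts).
starts = starting_grid([(0.0, 270.0), (-1.0, 1.0)], steps=4)
print(len(starts), starts[0])  # 16 (0.0, -1.0)
```

Comparing the solutions reached from different starts also gives a quick check on whether the fit is stable, in the spirit of the observation that most starting points yielded similar tuning functions.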
Since curvature has an inverse relationship to radius, absolute curvature values become extremely high for small radius curves. Moreover, perceived curvature has to asymptote at radii well below the acuity threshold, which will be perceived as angles. Therefore in our analyses, we replaced absolute curvature, c, with squashed curvature, c′, based on the following formula
\[ c' = \frac{2.0}{1 + e^{-ac}} - 1.0 \]
Squashed curvature values range from −1.0 (sharp concave angles) to 0.0 (straight edge) to 1.0 (sharp convex angles). The value a dictates the slope of the sigmoidal squashing function. In the analyses presented here, a was 0.125 and the curvatures sampled in our stimulus set ranged from −0.31 to 1.0. As an example, in stimulus 1 (Fig. 1), the curvature of the two sharp convexities was 1.0, the concavity was −0.31, and the broad convexity was 0.2. In stimulus 2, the curvature of the two medium convexities was 0.75. Analytical results were similar with a = 0.075.
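A minimal sketch of the squashing step, assuming the sigmoid takes the logistic form c′ = 2/(1 + e^(−ac)) − 1, which matches the stated range (−1 to 1), the zero at a straight edge, and the role of a as a slope parameter. The code uses the algebraically identical tanh(ac/2) to avoid overflow for very large curvatures; the function names are hypothetical.

```python
import math


def squash(c, a=0.125):
    """Map raw curvature c (rad per RF radius) into (-1, 1); angles map to +/-1."""
    # 2 / (1 + exp(-a * c)) - 1 is identical to tanh(a * c / 2).
    return math.tanh(a * c / 2.0)


def unsquash(c_squashed, a=0.125):
    """Inverse mapping, valid for -1 < c_squashed < 1."""
    return 2.0 * math.atanh(c_squashed) / a


print(squash(0.0))            # 0.0 (straight edge)
print(squash(float("inf")))   # 1.0 (sharp convex angle)
print(round(unsquash(-0.31), 1))
```

Under this assumed form, the most concave sampled value of −0.31 would correspond to a raw curvature of roughly −5.1 rad per RF radius.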
We also investigated tuning based on edge orientation, hypothesizing that cells might respond to shapes with relatively flat contour segments at a preferred orientation. For this purpose, we decomposed the stimuli into component boundary segments with absolute curvature <6.0 rad/RF radius. This threshold yielded straight edge approximations for all boundary segments except the sharp and medium convexities. Edge orientation tuning was described with one-dimensional (1-D) Gaussian functions. As in the boundary curvature models, the predicted response to a stimulus was the maximum of the responses associated with its component boundary segments.
We also investigated tuning for axial orientation, i.e., orientation of the axis of greatest elongation (as in tuning for oriented bars). For this purpose, we used a standard analogy to mass, finding the axis of lowest rotational inertia and determining its orientation ϕ and elongation ε (Jahne 1993)
\[ \phi = \frac{1}{2} \tan^{-1}\left( \frac{2 m_{xy}}{m_{xx} - m_{yy}} \right) \]
\[ \varepsilon = \frac{(m_{x'x'} - m_{y'y'})^2}{(m_{x'x'} + m_{y'y'})^2} \]
where m_xx, m_yy, and m_xy are the second-order central moments along the Cartesian x and y axes, and m_x′x′ and m_y′y′ are second-order central moments along ϕ and ϕ + 90°, respectively. The elongation ε ranges from 0.0 (for a circular object) to 1.0 (for a line). We described axial orientation tuning with 2-D Gaussian functions on the ϕ × ε (orientation × elongation) domain. We also determined total extent in the ϕ and ϕ + 90° directions (equivalent to length and width) and fit three-dimensional (3-D) Gaussian functions on the orientation × length × width domain.
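The moment-based description can be sketched for a shape given as a list of (x, y) sample points. The orientation follows the standard principal-axis relation tan 2ϕ = 2m_xy/(m_xx − m_yy). The elongation is computed here as the squared normalized difference of the principal moments, which is an assumed form chosen because it runs from 0 (circle) to 1 (line); the function name and the point-sample representation are also assumptions.

```python
import math


def axis_orientation_and_elongation(points):
    """Return (phi in degrees, elongation) from second-order central moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mxx = sum((x - cx) ** 2 for x, _ in points) / n
    myy = sum((y - cy) ** 2 for _, y in points) / n
    mxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # atan2 keeps the correct quadrant, so phi is the axis of greatest spread.
    phi = 0.5 * math.degrees(math.atan2(2.0 * mxy, mxx - myy))
    spread = math.hypot((mxx - myy) / 2.0, mxy)
    m_major = (mxx + myy) / 2.0 + spread   # moment along phi
    m_minor = (mxx + myy) / 2.0 - spread   # moment along phi + 90 deg
    total = m_major + m_minor
    elongation = ((m_major - m_minor) / total) ** 2 if total > 0 else 0.0
    return phi, elongation


print(axis_orientation_and_elongation([(0, 0), (1, 1), (2, 2)]))
print(axis_orientation_and_elongation([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

Points along a diagonal line give an orientation of 45° with elongation near 1, and the four corners of a square give an elongation of 0.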
Position-specific tuning for boundary conformation
We used the stimulus set shown in Fig. 1 to study 109 area V4 neurons that appeared sensitive to complex shape in preliminary tests. RF eccentricities ranged from 0.0 to 6.62°. Each neuron responded to a diverse set of shapes. An example is shown in Fig. 2 A. For each stimulus icon in this figure, the background gray level indicates response rate averaged across five repetitions. Response rates ranged from −6.3 ± 0.0 (SE) spikes/s (light gray; below spontaneous rate) to 38.1 ± 7.0 spikes/s (black; see scale bar). Stimuli that evoked strong responses varied widely in overall structure and included crescents, triangles, teardrops, and four-pronged shapes. A common feature of these shapes, however, was the presence of a convex projection near the bottom left (relative to the object center). Stimuli with a sharp convex angle at this position were particularly effective (e.g., stimuli 1 and 2 in the middle column, bottom block; these stimuli are labeled with superscript numbers). Stimuli with a medium convex curve evoked moderate responses (e.g., stimuli 3 and 4). Thus this cell appears to encode information about the bottom left boundary region, responding well to sharp convexity at this location and poorly to broad convexity or concavity.
These response characteristics were quantified with the Gaussian tuning function shown in Fig. 2 B. The domain in this plot has two dimensions: angular position and curvature. In the angular position dimension, 0° corresponds to boundary elements on the right-hand side of the shape, 90° corresponds to the top of the shape, 180° to the left, etc. In the curvature dimension, positive values denote convex curvature, with larger numbers representing higher (sharper, smaller radius) curvature, and 1.0 corresponding to convex angles (the limit of sharp curvature). Negative values represent concave curvature, and 0.0 corresponds to straight lines. The predicted response for each combination of position and curvature is indicated by the height and color of the surface plot. For this cell, the best-fitting Gaussian had a peak at 229.6° in the angular position dimension (bottom left relative to the object center) and 1.0 in the curvature dimension (sharp convex). The standard deviation in the angular position dimension was 26.7°, implying that the cell was sensitive to convexity within a relatively narrow range of positions. The standard deviation in the curvature dimension was 0.42, indicating responsiveness to a range of convex curvatures. Thus the tuning function indicates that this cell represents convex curvature in the bottom left boundary region.
Figure 2 A shows that the most effective stimuli contained not just a convexity at the bottom left but also a concavity at the bottom; i.e., adjacent in the counterclockwise (CCW) direction (e.g., stimuli 1 and 2). Stimuli that instead contained a CCW-adjacent convexity evoked much weaker responses (e.g., stimuli 5-8 in the middle column, top block). In other words, this cell was tuned for boundary configurations comprising more than one curvature element.
This slightly more complex tuning pattern is represented in Fig. 3 A. Here, the stimulus domain has four dimensions, as follows. Each individual surface plot represents the same two dimensions as in Fig. 2 B: angular position and curvature. The peak in these two dimensions still corresponds to sharp convexity (1.0) near the bottom left (230.0°). The rows and columns of plots represent the other two dimensions. The rows correspond to different values for CW-adjacent curvature, and the columns correspond to different values of CCW-adjacent curvature. This cell exhibited strong tuning for concave CCW curvature, with a peak at −0.15 (2nd column; SD in this dimension was 0.21). There was no strong tuning for CW curvature, as shown by the similarity of tuning surfaces across rows. Thus the 4-D tuning function indicates that this cell was responsive to shapes containing sharp convex curvature at the bottom left flanked by concave curvature at the bottom.
Goodness-of-fit for these tuning functions is represented by the scatter plots in Fig. 3 B. For each stimulus, the average neural response is plotted against the response predicted by the 2-D (left) or 4-D (right) Gaussian function. The vertical banding in these plots is due to the fact that groups of stimuli shared similar boundary patterns and thus similar predicted response values. The correlation between neural responses and predicted responses appears stronger in the 4-D plot, and this difference is reflected by the correlation coefficients: 0.70 for the 2-D function and 0.82 for the 4-D function. (The correlation between predicted responses based on this cell's edge orientation tuning function and observed neural responses was 0.25; see Tuning for linear orientation.) A partial F test showed that CCW curvature had a significant effect on responses (P < 0.01). Thus the 4-D tuning function, which represents complex local boundary configurations, provides a better description of the responses.
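The partial F statistic used to compare nested tuning models can be written out as a short sketch. The parameter counts in the example are assumptions (adding one tuning dimension is taken to add a peak and an SD), the error sums are made-up numbers, and the function name is hypothetical. Converting F to a P value requires the F distribution and is not shown.

```python
def partial_f(sse_reduced, sse_full, n_params_reduced, n_params_full, n_obs):
    """Partial F statistic for nested least-squares models.

    Returns (F, numerator df, denominator df).
    """
    df_num = n_params_full - n_params_reduced
    df_den = n_obs - n_params_full
    if df_num <= 0 or df_den <= 0:
        raise ValueError("models must be nested with positive degrees of freedom")
    f_stat = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f_stat, df_num, df_den


# Illustrative numbers only: 366 stimuli, two extra parameters in the full model.
print(partial_f(sse_reduced=100.0, sse_full=80.0,
                n_params_reduced=5, n_params_full=7, n_obs=366))
# (44.875, 2, 359)
```

A large F indicates that the extra dimension reduces the residual error by more than would be expected from the added parameters alone.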
Even the 4-D scatter plot, however, still shows substantial variation not explained by the Gaussian tuning function. This variation may represent some combination of 1) more complex boundary conformation tuning not captured by a simple Gaussian function, 2) sensitivity to other shape factors besides boundary conformation, and 3) noise, due to our limited sample of five repetitions of each stimulus. These issues are further addressed below. In any case, the boundary conformation in a specific region of the object (bottom left and bottom) is clearly a major determinant of this cell's responses to complex shapes.
Another example is shown in Fig. 4. This cell was sensitive to boundary conformation on the right side of the object, responding best to concave curvature at that position. This is exemplified by stimuli 1 and 2 in the middle column, bottom block of Fig. 4 A. Stimulus 1, with a concavity at the right, evoked a stronger response. Stimulus 2 is almost identical, but with a convexity at the right, and it evoked no response. The 4-D curvature tuning function for this cell is shown in Fig. 4 B. The Gaussian peak for the center boundary element is at −0.29 (concave) and 6.3° (to the right of the object center; the peak is artifactually split along the angular position dimension). The cell also appears to be tuned for sharper convexities at the CCW-adjacent position (peak curvature = 1.0, SD = 0.33) and medium convexities at the CW-adjacent position (peak curvature = 0.70, SD = 0.66). This combination is exemplified by stimulus 3 in Fig. 4 A, which evoked a stronger response. The opposite combination (sharp CW and medium CCW) is exemplified by stimulus 4, which evoked a weaker response. (Compare also shapes 5 and 6, and similar pairs throughout the stimulus set.) However, stimuli with two sharp adjacent curvatures or two medium adjacent curvatures also produced stronger responses (e.g., stimuli 7 and 8). The correlation coefficient for the 4-D Gaussian tuning function was 0.81. The correlation coefficient based on edge orientation tuning was 0.38.
A third example is shown in Fig. 5. This cell was sensitive to boundary conformation at the top right, responding best to sharp convexity, especially when flanked by a concavity on one side or the other. The tuning function in Fig. 5 B reflects this response pattern, with a center curvature peak at 1.0 (sharp convex) and 44° (top right). Tuning for adjacent concavities was strong, with a CW-adjacent peak at −0.13 (SD = 0.19) and a CCW-adjacent peak at −0.21 (SD = 0.31). The correlation coefficient for the 4-D Gaussian was 0.85. The correlation coefficient based on edge orientation tuning was 0.31.
To ensure that these response patterns did not result from differential stimulation of an RF hotspot (or some other mechanism related to absolute position), we tested shape tuning at multiple positions. The position test for the Fig. 5 cell is shown in Fig. 6 A. We selected two stimuli based on the original test, one containing the boundary pattern that drove the cell (the star-shaped stimulus, Fig. 6 A, top) and another without this boundary pattern but otherwise equivalent in shape (bottom). We presented each stimulus at 25 positions arranged in a 5 × 5 grid centered on the RF, with a spacing of 0.5 × RF radius. The star-shaped stimulus evoked strong responses at multiple positions, while the other stimulus never evoked a strong response. We performed similar tests on 33 cells in our sample. As expected, given the limited size of V4 RFs, responses were not invariant with position. In all cases, however, the stimulus containing the critical boundary pattern evoked the strongest response across positions.
We performed other post hoc tests in which we varied the position of the critical boundary pattern relative to the rest of the object. The results of this test for the Fig. 5 cell are shown in Fig. 6 B. We tested teardrop-shaped stimuli in which we varied 1) the orientation of the convex projection (left, middle, and right blocks in Fig. 6 B), 2) the length of the convex projection in the direction parallel to its orientation (rows within each block), and 3) the offset of the convex projection in the direction orthogonal to its orientation (columns within each block). Figure 6 B shows that the cell responded best to shapes that contained a sharp convexity near the top right (relative to the object center). As a result, somewhat surprisingly, the optimum orthogonal offset changed with the orientation of the convex projection. As orientation rotated CCW (blocks, left to right), optimum orthogonal offset shifted in the opposite direction.
We performed equivalent tests on 29 cells tuned for sharp convexity (adjusting for optimum convex projection orientation of the individual cell). The majority of these cells showed a similar interaction between orientation and orthogonal offset. Two-factor ANOVA (orientation × offset) indicated a significant (P < 0.05) interaction effect for 26/29 cells. For these 26 cells, we measured direction of shift in optimum offset by regressing optimum offset on orientation. In 23 cases, the regression line sloped in the same direction as for the cell in Fig. 6 B. In other words, for these 23 cells, the optimum offset shifted opposite to orientation, so that the position of the convex extremity remained similar. This analysis suggests that the position of contour elements relative to the object center is an important tuning dimension for these cells.
In addition, we fit the observed responses with 1-D Gaussian functions on the angular position domain and on the orthogonal offset domain. This analysis was limited to stimuli with the longest convex projections (in the bottom row of Fig. 6 B), since these produced the strongest responses. The effects of orientation were partitioned out by normalizing responses to an average value of 1.0 within each orientation block. For the cell in Fig. 6 B, observed responses were more highly correlated with predicted responses based on angular position of the convex extremity (r = 0.72) than with predicted responses based on orthogonal offset (r = 0.08). Correlation was higher for angular position in 20/29 cells. Median correlation was 0.61 for angular position and 0.23 for orthogonal offset. These results further support the significance of relative position as a tuning dimension for V4 cells.
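The block-wise normalization mentioned above can be illustrated with a short sketch. The dictionary layout, block labels, and response values are hypothetical; only the rule (scale each orientation block so that its mean response is 1.0) comes from the text.

```python
def normalize_within_blocks(responses_by_block):
    """Scale responses so that each orientation block has a mean of 1.0."""
    normalized = {}
    for block, values in responses_by_block.items():
        mean = sum(values) / len(values)
        if mean == 0:
            raise ValueError("block %r has zero mean response" % (block,))
        normalized[block] = [v / mean for v in values]
    return normalized


# Made-up response rates (spikes/s) across orthogonal offsets in two blocks.
blocks = {"left": [2.0, 4.0, 6.0], "right": [10.0, 30.0, 20.0]}
print(normalize_within_blocks(blocks))
# {'left': [0.5, 1.0, 1.5], 'right': [0.5, 1.5, 1.0]}
```

After normalization, differences in overall response level between orientation blocks no longer contribute to the correlation with either predictor.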
Distribution of tuning parameters
Each cell in our sample responded to a variety of shapes, as in Figs. 2, 4, and 5, with strong activity distributed across the 3 major stimulus categories in Fig. 1, i.e., stimuli with 2, 3, and 4 convex projections. There were only 2 cases in which responses greater than half-maximum were restricted to just one category, and only 17 cases in which responses >75% of maximum were restricted to one category. Thus most cells responded to a diverse set of stimuli, including ellipses, crescents, teardrops, stars, etc. They were not selective for a single type of global shape. We therefore characterized their responses in terms of tuning for local boundary conformation.
We fit Gaussian tuning functions on the 2-D and 4-D boundary curvature × position domains for all 109 cells in our sample. We also fit tuning functions on domains that included other dimensions (in addition to curvature and position), specifically boundary element orientation (the direction in which the angle or curve seems to point) and radial position (with respect to the object center). These dimensions would be important for complete descriptions of some shapes, but in our stimulus set they were superfluous. Boundary element orientation was usually equivalent to angular position (i.e., most curves were pointed outward from the center) and hence redundant. As a result, we could not completely distinguish the relative importance of angular position and boundary element orientation. However, post hoc tests (see Fig. 6 B) demonstrated that tuning for angular position generalized across boundary element orientation. Radial position, for any given curvature type, was fairly standard across stimuli, and partial F-tests indicated that including radial position as a stimulus dimension did not substantially improve goodness-of-fit in most cases. For these reasons, we have focused our discussion on the curvature × position domain.
Figure 7 A shows the distribution of tuning peaks and SDs in the angular position dimension (for the 4-D Gaussian fits). The distribution of tuning peaks is represented on the vertical axis and summarized by the histogram at the left of the scatter plot. This distribution was not significantly different from a uniform distribution (P = 0.79) according to a Monte Carlo version of Kuiper's test (a circular Kolmogorov-type analysis) (Mardia 1972; Pasupathy and Connor 1999). Thus the full range of angular positions seems to be represented by our sample of V4 cells. SDs are shown on the horizontal axis. In most cases (86/109) SDs were <90°, indicating that V4 cells are sensitive to boundary conformation at fairly restricted locations.
Figure 7 B shows the distribution of tuning peaks and SDs in the curvature dimension for the 2-D fits. The 2-D fitting procedure provides a better estimate of which single curvature type had the greatest effect on responses. Curvature tuning peaks are represented on the vertical axis and summarized in the histogram at the left. The distribution covers the entire range of concave and convex curvatures, but there is a stronger representation of sharper convexities, in the 0.5 to 1.0 range, which includes cells like those in Figs. 2 and 5. A smaller number of cells was tuned for concavities, in the negative curvature range, like the example cell in Fig. 4. (Peak positions below the range of curvatures actually tested, i.e., less than −0.31, signify that neural responses were best fit by the flank of an off-center Gaussian.) Other cells were tuned for broad convexity, in the 0.0 to 0.25 range. The example cell in Fig. 8 fell within this range, responding to broad convex curvature at angular positions near 90°. SDs were large in some cases but covered less than one-half the sampled curvature range (i.e., <0.65) for the majority of cells (87/109). The distribution of curvature tuning peaks may be influenced by the fact that sharp concavities were not represented in our stimulus set. However, previous results with a stimulus set that included sharp concavities also showed a strong bias toward convexity (Pasupathy and Connor 1999). Also, the definitions of convexity and concavity depend on the assumption that the stimulus is perceived as figure and the rest of the display screen as ground.
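A Monte Carlo version of Kuiper's test can be sketched as follows. The details of the procedure used for Fig. 7 A are not given in this section, so the number of simulations, the random seed, and the function names below are assumptions; the sketch only shows the general logic of comparing the observed Kuiper statistic with its distribution under simulated uniform samples of the same size.

```python
import random


def kuiper_statistic(angles_deg):
    """Kuiper's V for a sample of angles against the circular uniform
    distribution (sum of the largest deviations above and below the CDF)."""
    u = sorted((a % 360.0) / 360.0 for a in angles_deg)
    n = len(u)
    d_plus = max((i + 1) / n - x for i, x in enumerate(u))
    d_minus = max(x - i / n for i, x in enumerate(u))
    return d_plus + d_minus


def monte_carlo_p(angles_deg, n_sim=2000, seed=0):
    """Fraction of simulated uniform samples whose V is at least as large
    as the observed V (with the usual +1 correction)."""
    rng = random.Random(seed)
    n = len(angles_deg)
    observed = kuiper_statistic(angles_deg)
    exceed = 0
    for _ in range(n_sim):
        sim = [rng.uniform(0.0, 360.0) for _ in range(n)]
        if kuiper_statistic(sim) >= observed:
            exceed += 1
    return (exceed + 1) / (n_sim + 1)


# Two hypothetical samples of 109 tuning peaks: one spread evenly around
# the circle, one confined to a 30-degree sector.
n_cells = 109
evenly_spaced = [360.0 * i / n_cells for i in range(n_cells)]
clustered = [30.0 * i / n_cells for i in range(n_cells)]
p_even = monte_carlo_p(evenly_spaced)
p_clustered = monte_carlo_p(clustered)
```

The evenly spaced sample yields a large P value (no evidence against uniformity), whereas the clustered sample yields a very small one. Kuiper's statistic is used instead of the ordinary Kolmogorov-Smirnov statistic because it does not depend on where the circle is cut to define 0°.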
Figure 7 C shows the distribution of curvature tuning parameters in the 4-D domain, which represents sequences of curvature elements. For each cell, center curvature tuning is represented by a red dot, CCW curvature tuning by a green dot, and CW curvature tuning by a blue dot. Tuning peaks for all three curvature values are summed into the stacked histogram at the left. The center curvature peaks (red) were again biased toward sharper convexities. The adjacent curvature peaks included more broad convex and concave points. SDs for adjacent curvature (green and blue) were often larger than the sampled range, indicating weak, shallow tuning. If 0.65 (½ the sampled curvature range) is considered as a threshold, there were 97 cells tuned in at least 1 curvature dimension, 49 cells tuned in at least 2 curvature dimensions, and 16 cells tuned in 3 curvature dimensions. Thus the influence of adjacent boundary elements varied across cells, but many cells appeared to represent complex boundary configurations comprising multiple curvature segments. This could be important for encoding relative positions of adjacent boundary features.
Goodness-of-fit was assessed by calculating the coefficient of correlation (r) between neural responses and responses predicted by the tuning functions (see Fig. 3). The distribution of r for Gaussian tuning functions on the 2-D curvature × position domain is shown in Fig. 9 A. The fit was significant (F test, P < 0.01) for 101/109 cells; these cells are plotted with filled bars. The median r value was 0.46. The distribution of r values for Gaussian functions on the 4-D domain (which represents sequences of contour elements) is shown in Fig. 9 B. The inclusion of the adjacent curvature dimensions significantly improved goodness-of-fit (partial F test, P < 0.01) in 94/109 cases. The median r value was 0.57. The fit was significant (F test, P < 0.01) for all but one of the cells. Thus many cells appear to encode information about more complex boundary configurations.
We further verified significance of the 4-D Gaussian tuning functions by randomly dividing the stimuli into two groups, fitting a 4-D Gaussian function to the training group (comprising 2/3 of the original stimulus set) and using this function to predict responses to the testing group (the remaining 1/3 of the stimuli). The median correlation (across cells) between predicted and observed responses for the testing group was 0.48. This was similar to the median correlation for the training group (0.61). Thus the 4-D tuning functions generalize to stimuli not used in the original fitting procedure.
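The training/testing procedure can be illustrated with a simplified sketch. To keep the example self-contained, it substitutes a 1-D Gaussian on curvature, fit by a coarse grid search, for the 4-D Gaussian fit by nonlinear regression; the synthetic "cell", the noise level, the grids, and all function names are assumptions made only for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian(x, amp, mu, sd):
    return amp * math.exp(-0.5 * ((x - mu) / sd) ** 2)


def fit_gaussian_grid(xs, ys, mus, sds):
    """Grid search over (mu, sd); for each pair the amplitude that
    minimizes squared error has a closed-form least-squares value."""
    best = None
    for mu in mus:
        for sd in sds:
            basis = [gaussian(x, 1.0, mu, sd) for x in xs]
            amp = sum(b * y for b, y in zip(basis, ys)) / sum(b * b for b in basis)
            sse = sum((y - amp * b) ** 2 for b, y in zip(basis, ys))
            if best is None or sse < best[0]:
                best = (sse, amp, mu, sd)
    return best[1:]


def holdout_correlations(xs, ys, train_fraction=2 / 3, seed=0):
    """Fit on a random training subset, then correlate predictions with
    observations separately for the training and testing subsets."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    n_train = round(train_fraction * len(idx))
    train, test = idx[:n_train], idx[n_train:]
    mus = [-0.3 + 0.05 * k for k in range(27)]  # candidate peaks, -0.3 to 1.0
    sds = [0.1 + 0.05 * k for k in range(19)]   # candidate SDs, 0.1 to 1.0
    amp, mu, sd = fit_gaussian_grid(
        [xs[i] for i in train], [ys[i] for i in train], mus, sds)
    r_train = pearson_r(
        [gaussian(xs[i], amp, mu, sd) for i in train], [ys[i] for i in train])
    r_test = pearson_r(
        [gaussian(xs[i], amp, mu, sd) for i in test], [ys[i] for i in test])
    return r_train, r_test, len(train), len(test)


# Synthetic "cell": 366 stimuli, true peak at curvature 0.6, additive noise.
rng = random.Random(1)
xs = [rng.uniform(-0.3, 1.0) for _ in range(366)]
ys = [gaussian(x, 30.0, 0.6, 0.2) + rng.gauss(0.0, 5.0) for x in xs]
r_train, r_test, n_train, n_test = holdout_correlations(xs, ys)
```

With 366 stimuli the split yields 244 training and 122 testing stimuli. A testing correlation close to the training correlation, as in the synthetic case, indicates that the fitted function captures tuning rather than noise specific to the training stimuli.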
The example cells in Figs. 2, 4, and 5 fell in the high end of the Fig. 9 B distribution, with r values of 0.82, 0.81, and 0.85, respectively. The Fig. 8 example had an r value of 0.71. Figure 9 B shows that many cells exhibited a significant amount of response variance not explained by the Gaussian curvature tuning functions. Three possible sources of variance are considered below.
Complex tuning functions
One possibility is that responses depend on boundary conformation in a more complex way that cannot be captured by a simple Gaussian surface. In particular, there might be some interaction, either facilitatory or inhibitory, between different boundary regions within the object. To assess this possibility we constructed models based on two Gaussian peaks in the 4-D curvature × position space. The amplitude of each Gaussian could be either positive or negative. The predicted response was the sum of the responses predicted by each Gaussian alone. The parameters of both Gaussians were simultaneously adjusted by nonlinear regression to minimize squared error between observed and predicted responses.
The addition of the second Gaussian tuning function increased the correlation between neural responses and predicted responses significantly (partial F test, P < 0.01) for the majority of cells (80/109). The amplitude of the second Gaussian was negative in 29/80 cases, suggesting an inhibitory interaction. The average increase in r was moderate (0.07). The distribution of r for the two-Gaussian models is shown in Fig. 9 C. The median r value of 0.64 was only slightly higher than the median of 0.57 for the single Gaussian tuning function. This result suggests that the single Gaussian tuning functions described most of the response variation associated with boundary curvature. However, the two-Gaussian analysis is just one fairly simple approach, and it may be that another, more complex analysis would provide a much better description of shape tuning.
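The partial F test compares the residual error of a reduced model with that of a fuller model containing it. The sketch below computes the standard statistic; the parameter counts (9 for one 4-D Gaussian, 18 for two) and the error sums are hypothetical values chosen for illustration, not figures taken from our fits.

```python
def partial_f_statistic(sse_reduced, sse_full, p_reduced, p_full, n_obs):
    """F statistic for the improvement of a full model over a nested
    reduced model.

    sse_*: residual sums of squares; p_*: numbers of fitted parameters;
    n_obs: number of observations (here, stimuli).
    """
    df_num = p_full - p_reduced   # extra parameters in the full model
    df_den = n_obs - p_full       # residual degrees of freedom, full model
    f_value = ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
    return f_value, df_num, df_den


# Hypothetical example: 366 stimuli, one Gaussian (assumed 9 parameters)
# versus two Gaussians (assumed 18 parameters).
f_value, df_num, df_den = partial_f_statistic(
    sse_reduced=1000.0, sse_full=900.0, p_reduced=9, p_full=18, n_obs=366)
# The P value is the upper tail of the F distribution with (df_num, df_den)
# degrees of freedom, e.g. scipy.stats.f.sf(f_value, df_num, df_den) if
# SciPy is available.
```

Because the statistic penalizes the added parameters through its degrees of freedom, a significant result indicates that the second Gaussian reduced error by more than would be expected from extra flexibility alone.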
Response measurement error
A second possibility is that unexplained variance represents noise in our response measurements. To explore a greater region of shape space, we opted for a large number of stimuli but a small number of repetitions (5). This approach yields more accurate estimates of overall tuning but less accurate estimates of the true mean responses to individual stimuli. As a result, much of the variance in our response patterns may be due to noise and thus unexplainable. In fact, the standard errors (SEs) of our mean response estimates tended to be high. For responses greater than half-maximum, the average SE (across all stimuli and all cells) was 25.8% of the mean response estimate. For each stimulus, the expected squared difference between the estimated mean response and the true mean response is SE². To estimate response variance due to noise for each cell, we summed the expected squared differences across stimuli and divided by the number of stimuli. The estimated noise variance averaged 41.6% of total variance, implying that a substantial fraction of the variance not captured by boundary curvature tuning functions was unexplainable. Based on this estimate, the median 4-D Gaussian tuning function explained about 55% of the explainable variance (r = 0.74), and the median two-Gaussian model explained about 70% of the explainable variance (r = 0.84).
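The arithmetic behind these estimates can be reproduced approximately as follows. The sketch applies the population-average noise fraction (41.6%) to the median r values, which is a simplification; the function names and the example standard errors are hypothetical.

```python
import math


def noise_variance(standard_errors):
    """Mean of SE^2 across stimuli: the expected squared difference between
    estimated and true mean responses, averaged over the stimulus set."""
    return sum(se ** 2 for se in standard_errors) / len(standard_errors)


def explainable_fit(r, noise_fraction):
    """Rescale explained variance (r^2) by the fraction of total variance
    that is not noise; return (fraction explained, equivalent r)."""
    fraction = r ** 2 / (1.0 - noise_fraction)
    return fraction, math.sqrt(fraction)


# Noise variance for three made-up standard errors (spikes/s).
example_noise = noise_variance([2.0, 3.0, 4.0])

# Median fits reported above, with noise at 41.6% of total variance.
frac_single, r_single = explainable_fit(0.57, 0.416)  # about 0.56 and 0.75
frac_double, r_double = explainable_fit(0.64, 0.416)  # about 0.70 and 0.84
```

The single-Gaussian value comes out near 0.56 (equivalent r near 0.75), close to the figures quoted above; small differences are expected because this sketch works from rounded summary values rather than from individual cells.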
Tuning for linear orientation
A third possibility is that some cells are sensitive to other aspects of shape besides boundary curvature. In particular, it has been shown that many V4 cells are tuned for linear orientation (Desimone and Schein 1987). We intentionally sampled cells that appeared more sensitive to complex shape properties and less sensitive to linear orientation, based on preliminary tests. It is possible, nevertheless, that some of the shape selectivity we observed reflected standard orientation tuning. We tested this by using Gaussian functions to describe tuning for edge orientation and axial orientation.
To test for edge orientation tuning, we first decomposed each stimulus into component contour segments with relatively flat curvature (i.e., the broad convex and concave segments; see methods). We hypothesized that these relatively flat segments might drive cells tuned for linear orientation. We fit each cell's responses with a 1-D Gaussian on a 0–180° edge orientation domain. The distribution of r values for these tuning functions is shown in Fig. 9 D. The majority of fits (73/109) were significant (F test, P < 0.01), but the r values were generally low (median 0.21). We also tested the possibility that some cells were sensitive to both orientation and contrast polarity of edges, by fitting 1-D Gaussians on a 0–360° domain. This produced a few more significant fits (90/109) and a slightly higher median r value (0.29). The distribution of r values is shown in Fig. 9 E. There was only one cell for which the edge-based r value was higher than the 4-D curvature-based r value.
To test for axial orientation tuning, we determined the major axis and degree of elongation of each shape, using a standard analogy to mass to sum contributions from all parts of the shape (see methods). These numbers are equivalent to orientation and aspect ratio for rectangular bars. We fit responses for each cell with a 2-D Gaussian tuning function on the orientation × elongation domain. The distribution of r values for these fits is shown in Fig. 9 F. The majority of fits (78/109) were significant (F test, P < 0.01), but r values were low (median 0.24). There was only one cell for which the axial orientation r value was higher than the 4-D curvature-based r value.
We also tested the possibility that cells might be sensitive to total extent along the main and orthogonal axes (comparable to bar length and width). We fit 3-D Gaussian tuning functions on the orientation × length × width domain. The resulting distribution of r values is shown in Fig. 9 G. The majority of fits were significant (81/109), but the correlation values were low (median 0.29). There were five cases in which correlation for 3-D axial orientation tuning was higher than correlation for 4-D curvature, but the differences were small (maximum 0.05).
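One common way to obtain a major axis and elongation by analogy to mass is through second central moments of points sampled from the shape. The sketch below takes that approach; it is an assumption about the general method, not a reproduction of the computation described in methods, and the helper `filled_bar` and all numerical values are hypothetical.

```python
import math


def axis_and_elongation(points):
    """Return (major-axis orientation in degrees, 0-180; elongation) from
    the second central moments of equally weighted (x, y) sample points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mxx = sum((x - cx) ** 2 for x, _ in points) / n
    myy = sum((y - cy) ** 2 for _, y in points) / n
    mxy = sum((x - cx) * (y - cy) for x, y in points) / n
    # Orientation of the principal axis of the moment matrix.
    theta = 0.5 * math.atan2(2.0 * mxy, mxx - myy)
    # Eigenvalues: variance along the major and minor axes.
    spread = math.sqrt((mxx - myy) ** 2 + 4.0 * mxy ** 2)
    lam_major = 0.5 * (mxx + myy + spread)
    lam_minor = 0.5 * (mxx + myy - spread)
    elongation = math.sqrt(lam_major / lam_minor) if lam_minor > 0 else math.inf
    return math.degrees(theta) % 180.0, elongation


def filled_bar(length, width, angle_deg, step=0.05):
    """Grid of points filling a rectangular bar rotated by angle_deg."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    nx, ny = round(length / step), round(width / step)
    pts = []
    for i in range(nx):
        for j in range(ny):
            x = (i + 0.5) * step - length / 2.0
            y = (j + 0.5) * step - width / 2.0
            pts.append((c * x - s * y, s * x + c * y))
    return pts


# A 4 x 1 bar tilted by 30 degrees: orientation should be near 30 and
# elongation near the aspect ratio of 4.
orientation, elongation = axis_and_elongation(filled_bar(4.0, 1.0, 30.0))
```

For a filled rectangle the variance along each axis is proportional to the squared side length, so the square root of the variance ratio recovers the aspect ratio, consistent with the statement that these measures are equivalent to orientation and aspect ratio for rectangular bars.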
Comparison of r distributions for boundary conformation tuning (Fig. 9, A–C) and linear orientation tuning (Fig. 9, D–G) indicates that boundary configurations consisting of one or more angles and curves were more relevant for most of the cells in our sample. This was not simply due to the number of fitting parameters, since the 2-D curvature fits (based on 5 parameters; Fig. 9 A) were better than the 3-D axial orientation fits (based on 7 parameters; Fig. 9 G). The stronger tuning for boundary conformation is probably specific to the subpopulation of cells that we studied here. In an unbiased sample, a substantial proportion of V4 cells would show strong tuning for linear orientation.
Position-specific tuning for boundary conformation
Our results indicate that many neurons in area V4 are sensitive to boundary information at a specific position relative to the object center. For example, a given cell may respond well to shapes with convex curvature at the right and poorly to shapes with concave curvature on the right, without being much affected by other parts of the shape. The effective boundary pattern often comprises a sequence of adjacent curves and angles. This kind of shape tuning is apparent at a qualitative level from inspection of response patterns (see, for example, Figs. 2 A, 4 A, and 5 A). It can also be quantified with Gaussian tuning functions on a multidimensional curvature × position domain (as in Figs. 2 B, 3 A, 4 B, and 5 B). These tuning functions seem to be biased toward sharper convex curvature, although this may reflect our choice of stimuli, and the definition of convexity depends on an assumption about figure/ground organization. The specific dimensions that we tested may not correspond exactly to the underlying dimensionality in area V4, and our limited stimulus set may have fallen well outside the true shape-tuning peak for many of the cells we studied. It seems clear, however, that some cells in area V4 represent complex shape in a parts-based fashion, and that the relevant parts, for these cells, are contour segments defined by their conformation and position relative to the rest of the object. In our experiment, which involved only simple, silhouette-like stimuli, these contour segments always formed part of the object boundary, but selectivity for contour curvature presumably extends to internal contours of more complex, realistic objects as well.
To our knowledge, this type of shape coding has not previously been demonstrated, but the results are consistent with previous data. Curvature is known to be an important dimension in area V4 (Gallant et al. 1993, 1996; Kobatake and Tanaka 1994; Pasupathy and Connor 1999; Wilkinson et al. 2000) and elsewhere in the ventral processing pathway (Dobbins et al. 1987; Hegde and Van Essen 2000; Heggelund and Hohmann 1975; Janssen et al. 1999; Schwartz et al. 1983; Tanaka et al. 1991; Versavel et al. 1990). Previous work suggests that some area V4 cells encode feature position in relation to objects lying wholly or partially outside their classical receptive fields (CRFs) (Connor et al. 1997; Zhou et al. 2000).
Other sources of response variance
Tuning for local boundary conformation is only one aspect of shape representation in V4. We specifically sampled cells that, in preliminary tests, appeared selective for more complex shapes rather than oriented bars. Our results therefore apply only to a subpopulation within area V4. Many V4 cells are tuned for orientation and other aspects of linearly extended shape elements (Desimone and Schein 1987; Gallant et al. 1996). Some V4 cells may respond to complex shapes in a manner more similar to IT neurons (Kobatake and Tanaka 1994). Our stimulus set represents only one class of shape stimuli (cf. Gallant et al. 1993; Kobatake and Tanaka 1994; Richmond et al. 1987), and many V4 cells must respond optimally to objects not represented in our experiment.
Even among the cells we tested, there were clearly other sources of response variance besides local boundary conformation. A substantial fraction of the remaining variance was due to noise. Standard error values suggest that this fraction was 41.6% of total variance on average. In addition, however, there must have been other shape-related factors affecting the responses of some cells. Our analysis showed little tuning for edge and axial orientation. This was not surprising, given our selective sampling and the nature of the stimulus set. There may be more complex shape factors that affected responses, especially interactions between shape elements that could not be described by Gaussian tuning functions.
In addition, some cells may have been selective along nonshape dimensions that we did not vary. For example, size is an important dimension in area V4 (Ghose and Ts'o 1997). The stimuli in this experiment were designed to be small enough to fit within the average V4 CRF at the cell's eccentricity, but our previous results imply that some V4 cells function to encode parts of larger shapes extending beyond the CRF (Pasupathy and Connor 1999). Tuning for binocular disparity (Hinkle and Connor 2001) and absolute distance (Dobbins et al. 1998) is also common in V4. The stimuli in this experiment were all presented at zero disparity and at a distance of 50 cm, which would be nonoptimum for many cells. We attempted to optimize color, but color and luminance were always uniform across the shape, whereas many ventral pathway cells appear to be selective for color gradients and textures within objects (Hanazawa and Komatsu 2001; Tanaka et al. 1991).
Shape recognition theories
Our data imply that shape representation in area V4 is distributed, with individual cells encoding smaller parts of larger objects. This is consistent with shape-processing theories based on the idea of “recognition by parts” (Biederman 1987; Dickinson et al. 1992; Hoffman and Richards 1984; Marr and Nishihara 1978; Riesenhuber and Poggio 1999). According to these theories, shapes are represented as combinations of simpler elements, called features or primitives. Shape recognition is envisioned as a hierarchical process, with progressively more complex features at each stage. Local orientation (of edges or medial axes) is considered to be the primary shape feature at early stages, based on the prevalence of orientation tuning in areas V1 and V2 (Hubel and Wiesel 1968). The final representation may involve structural descriptions based on volumetric primitives (Biederman 1987; Dickinson et al. 1992; Marr and Nishihara 1978) or interpolation between and alignment with canonical images in memory (Ullman 1989; Vetter et al. 1995).
The results presented here imply that boundary configurations at specific object-relative positions are important second-level shape features at intermediate processing stages like area V4. Of particular significance here is our finding that tuning for local boundary conformation can remain consistent across a variety of complex shapes. Thus an individual cell can participate in coding local boundary conformation within any number of shapes. This is an essential characteristic for units in a parts-based, distributed coding system.
Our results indicate that angles and curves, and combinations of angles and curves, are important boundary features in area V4. A number of theories posit angles and/or curves as intermediate shape features (Biederman 1987; Dickinson et al. 1992; Riesenhuber and Poggio 1999). Psychophysical experiments have demonstrated that human observers are highly sensitive to both angles (Chen and Levi 1996; Heeley and Buchanan-Smith 1996; Regan et al. 1996) and curvature (Watt and Andrews 1982; Wilson et al. 1997). Functional imaging has revealed a strong representation of curvature in human area V4 (Wilkinson et al. 2000).
In our data, curvature tuning peaks cover the range from convex to concave, but there appears to be a bias toward sharp convex curvature. This may be due to the fact that our stimulus set did not include sharp concave curvature. Moreover, our definition of convexity depends on the assumption that the stimulus is perceived as figure and the rest of the display screen as ground. Our previous results also suggested a bias toward convexity (again, assuming that the stimulus is perceived as figure) (Pasupathy and Connor 1999). Theoretical considerations and psychological findings favor the perceptual importance of convexity. As Hoffman and Richards (1984) pointed out, concave curvature is more likely to represent joints between object parts, while convex curvature is more likely to define the parts themselves. Psychological results support this postulate, showing that observers tend to parse shapes into convex elements (Braunstein et al. 1989; Singh et al. 1999). Convex features also dominate shape similarity and figure/ground judgments (Kanizsa and Gerbino 1976; Subirana-Vilanova and Richards 1996).
Ultimately, signals for the identities and positions of shape parts must be integrated for recognition to occur. Responses at later stages of the ventral pathway in IT show a high level of integration and selectivity for global shape. Cells at these later stages may synthesize V4 signals that define the curvature and position of individual boundary segments. Even in V4, however, we find some indication of integration in progress: many cells in our study exhibited tuning for multiple adjacent curvature segments. This kind of tuning may reflect gradual synthesis of global shape representations.
We thank S. Brincat, P. Fitzgerald, D. Hinkle, S. Hsiao, K. Johnson, R. Pasupathy, G. Poggio, and R. von der Heydt for comments and suggestions. Technical assistance was provided by B. Nash and B. Sorenson.
This work was supported by National Institute of Neurological Disorders and Stroke Grant NS-38034 and by the Pew Scholars Program in the Biomedical Sciences.
Address for reprint requests: C. E. Connor, Krieger Mind/Brain Institute, Johns Hopkins University, 338 Krieger Hall, 3400 N. Charles St., Baltimore, MD 21218 (E-mail:).
Copyright © 2001 The American Physiological Society