J Neurophysiol 97: 4310-4326, 2007. First published April 18, 2007; doi:10.1152/jn.00203.2007
0022-3077/07 $8.00

A Neural Model of Figure–Ground Organization

Edward Craft1,3, Hartmut Schütze3, Ernst Niebur2,3 and Rüdiger von der Heydt2,3

1Department of Biophysics, 2Department of Neuroscience, and 3The Zanvyl Krieger Mind/Brain Institute; Johns Hopkins University; Baltimore, Maryland

Submitted 24 February 2007; accepted in final form 8 April 2007


    ABSTRACT
Psychophysical studies suggest that figure–ground organization is a largely autonomous process that guides—and thus precedes—allocation of attention and object recognition. The discovery of border-ownership representation in single neurons of early visual cortex has confirmed this view. Recent theoretical studies have demonstrated that border-ownership assignment can be modeled as a process of self-organization by lateral interactions within V2 cortex. However, the mechanism proposed relies on propagation of signals through horizontal fibers, which would result in increasing delays of the border-ownership signal with increasing size of the visual stimulus, in contradiction with experimental findings. It also remains unclear how the resulting border-ownership representation would interact with attention mechanisms to guide further processing. Here we present a model of border-ownership coding based on dedicated neural circuits for contour grouping that produce border-ownership assignment and also provide handles for mechanisms of selective attention. The results are consistent with neurophysiological and psychophysical findings. The model makes predictions about the hypothetical grouping circuits and the role of feedback between cortical areas.


    INTRODUCTION
Visual perception begins at the retina, with cluttered two-dimensional (2D) projections of the three-dimensional (3D) world. In the early twentieth century, Gestalt psychologists observed that we tend to organize this clutter through a process of figure–ground segregation, i.e., by identifying those regions of the retinal images that are object-related (figures) for further processing, and relegating other regions to the background (Rubin E 1921, 2001; Wertheimer 1923, 2001). Borders appear where objects occlude what is behind them and can often be identified by discontinuities in color, luminance, or texture. However, simply locating these borders is not enough to distinguish figure from ground. It is also necessary to infer from the image context which of the two regions abutting along each border corresponds to the closer, occluding surface, and is defined by (or "owns") the border (Nakayama et al. 1989, 1995; Rubin E 1921, 2001). As the illustrations in Fig. 1 demonstrate, this assignment of border ownership critically determines how we perceive individual regions of a scene (Fig. 1, A and B) and how objects are recognized (Fig. 1, C and D).


FIG. 1. Border ownership influences shape perception and object recognition. After examining the scene in A, observers tend to remember the top shape in B, but not the bottom one. Although there are regions in A corresponding to both shapes, one region "owns" its borders and is perceived as a definite figure, whereas the other region is lumped into the background (modified from Rubin N 2001). In C, the shaded regions "own" all of their borders and appear to be dissociated and meaningless. When a plausible occluding surface is introduced (D), the fragments lose ownership of key borders and flow together perceptually in the background. Only then can we recognize them as pieces of familiar objects (modified from Kandel et al. 2000, after Bregman 1981).

 
Although there is evidence suggesting that high-level object understanding can influence figure–ground segregation (see, e.g., Peterson et al. 1991), most studies have concluded that figure–ground segregation precedes high-level processing. Recognition of objects or contour shapes depends on the assignment of border ownership (Driver and Baylis 1996; Nakayama et al. 1989). Such an effect can be observed in Fig. 1C, where manipulation of border ownership impedes the recognition of familiar objects. It has also been observed that, in cluttered displays, a shape that pops out in visual search when it "owns" its borders can only be found with scrutiny when the apparent depth ordering is changed so that some of its borders get assigned to an alternate foreground object (He and Nakayama 1992; Rensink and Enns 1998). Thus border-ownership assignment seems to precede the process of object recognition, and the deployment of attention (such as in a search task) seems to work on a structured representation in which borders are already assigned to the corresponding regions or objects. The explanation of these findings presents a conundrum, though: If higher-level processing of objects depends on figure–ground segregation, how can figure–ground segregation depend on understanding image context?

Background

The problem of figure–ground organization was previously addressed by several theoretical studies. In this section, we briefly review the models proposed by these studies, in the context of recent neurophysiological findings. Although some were quite effective in resolving the occlusion structure of images and successfully explained the perceptual phenomena, we argue that consideration of neural coding mechanisms, distance and speed limitations on cortical processing, and the need for an interface to central processes such as selective attention requires a different approach. In subsequent sections, we then present an alternative model architecture that satisfies these constraints.

PRINCIPLES OF CODING. Two general schemes have been proposed for figure–ground representation: region labeling and border-ownership coding. Whereas region labeling means that regions are differentiated by labeling the corresponding elements in an isomorphic surface representation (like labeling pixels in a bitmap), border-ownership coding involves the contour representation and thus orientation-selective neurons.

One class of models has used region labeling, assuming, for example, that color/brightness signals spread within a cortical sheet to fill in the regions within the boundaries given by the contour representation (Arrington 1994; Gerrits and Vendrik 1970; Grossberg 1994; Grossberg and Mingolla 1985; Roelfsema et al. 2002). Although such mechanisms may be appealing theoretically, it is not clear whether spreading of color/brightness signals in the visual cortex occurs (Roe et al. 2005; Rossi and Paradiso 1999; von der Heydt et al. 2003). Figure–ground organization has been studied in area V1, where neural responses representing a figure region were found to be enhanced compared with responses representing the ground region (Lamme 1995; Lee et al. 1998; Zipser et al. 1996). This could be interpreted as region labeling.

The present model is based on the data from recent studies of border-ownership coding. We chose to model these results because the border-ownership signals, particularly those in area V2, tend to be stronger and emerge earlier than the figure enhancement in V1, suggesting that the latter is the result of feedback from V2 or other extrastriate areas (cf. Lamme et al. 1998). Zhou et al. (2000) found that orientation-selective cortical neurons, which are usually thought to represent local contour properties such as edge contrast and orientation, are also sensitive to the global configuration of contours and code for border ownership. A neuron may respond to a contrast edge with a high firing rate if the edge is a contour of a figure on one side of the receptive field, but with a low firing rate if it is a contour of a figure on the other side. Many of these neurons combine side-of-figure selectivity with selectivity for the depth order of surfaces, as defined by stereoscopic cues (Qiu and von der Heydt 2005) or by dynamic occlusion (von der Heydt et al. 2003). This shows that side-of-figure selectivity is not just a random asymmetry in the wiring of receptive fields, but has to do with the striving for a 3D interpretation of the image. Thus the side-of-figure–related response modulation reflects figure–ground organization. The most intriguing aspect of the neurophysiological findings is the influence of the image context. Apparently, V2 neurons have some knowledge of global shape even when the shapes are much larger than their receptive fields.

In essence, border-ownership selectivity means that each contour element is represented by two pools of neurons, one for each side of ownership, whose differential activity codes for border ownership (more precisely, the degree of border-ownership assignment), whereas their common activity codes for the local contour attributes such as orientation, motion, color/luminance contrast, and so forth. It is interesting to see the assignment of borders to regions, which is relational information, encoded in the firing rate of neurons like other contour attributes. It was often assumed that coding of relations between features requires a qualitatively different mechanism such as synchronized oscillation across neurons (Singer and Gray 1995). The preceding results suggest that border ownership is represented by opponent channels, just as light and dark are represented by on- and off-center ganglion cells, and direction of motion by neurons in MT cortex. Although the evidence described earlier comes from recordings in monkey visual cortex, psychophysical experiments indicate that border-ownership–selective neurons also exist in the human visual cortex (von der Heydt et al. 2005). Taken together, these findings indicate that the visual system explicitly represents border ownership at an early cortical level following the stage of local feature representation.

Representing figure–ground relationships in terms of border ownership seems plausible, for theoretical as well as physiological reasons. The contours carry most of the information about the shape of objects. Consequently, as discussed earlier (Fig. 1), border-ownership assignment is critical for object recognition. The vast majority of neurons in V1 and V2 are edge selective and orientation tuned, and a comparison between the activity evoked by the borders and the interior of a figure shows that the border signals are five- to sixfold stronger than the surface signals (Friedman et al. 2003).

Several studies have modeled figure–ground organization in terms of border-ownership coding. Some of these assume that image context integration is achieved by lateral propagation of signals within the image representation, such as by horizontal fibers in area V2 (Baek and Sajda 2005; Kikuchi and Akashi 2001; Nishimura and Sakai 2004, 2005; Pao et al. 1999; Zhaoping 2005). As will be discussed in the next section, there are physiological constraints that limit the speed of image context integration in this type of architecture. Other models advocate feedback from higher cortical areas (Finkel and Sajda 1992; Kienker et al. 1986; Kikuchi and Fukushima 2003; Sajda and Finkel 1995; Yu et al. 2001). These models are not constrained by the limitations of feedforward and lateral connections.

Perhaps the most extensive of these models, in terms of its ability to integrate multiple cue types and bind features for visual surface construction, is that of Sajda and Finkel (1995). It uses algorithms that identify contour terminations, resolve junctions, and identify closed contours, enabling the system to "tag" complete contour segments. Border ownership, represented as a binary value, is then assigned for each segment. Although this model implements border-ownership coding, and thus parallels the physiology better than region-labeling models, questions about its neural implementation remain. For example, it is unclear how the identification of contour segments (which is a global operation) might be realized and how the tags might be represented. The authors suggest coherent oscillation of the neurons representing the same segment as a possible mechanism. However, the functional role of such oscillations in primate visual cortex under awake conditions has been debated (Bair et al. 1994; Young et al. 1992). Our own recordings from pairs of cells (n = 37) failed to show significant coherent oscillations, whether both cells were activated by a common contour segment or by different, unrelated contour segments (F Qiu, H Schütze, and R von der Heydt, unpublished observations). Also, because we know now that border ownership modulates firing rates, the coherent oscillation hypothesis appears as a rather remote possibility. Finally, the gradual nature of the neural border-ownership signals (Zhou et al. 2000) argues against binary coding and the global tagging scheme. The existence of neurons with a fixed border-ownership preference calls for a revision of the basic concept.

PHYSIOLOGICAL CONSTRAINTS. The recent physiological results place important constraints on modeling. First, the extent of visual context integration in border-ownership modulation is much larger than the classical receptive field. Second, it was found that the border-ownership signal emerges with a short latency. Figure 2 illustrates the time course of border-ownership signals (and edge signals, for comparison) in the critical tests. The stimulus configuration is shown schematically at the top. The receptive field (ellipse) was stimulated with a straight edge that could be the border of a square either on one side or on the other. The important point is that the entire region of visual field occupied by the two squares received identical stimulation in the two conditions. Thus any difference between the responses indicates an influence from stimulus features outside this region. The size of this region (and thus the minimum extent of spatial context that needs to be integrated for border-ownership assignment) is given by the size of the squares. In the experiment illustrated in Fig. 2, square sizes of 3 and 8° of visual angle were tested. Thus the region of identical stimulation was either 3 × 6° or 8 × 16°. The black curves show the time course of the edge signals (the average of the firing rates for the two figure locations), and the red curves show the border-ownership signals (the difference between the firing rates for the two figure locations). It can be seen that the border-ownership signal emerges well before 100 ms, and with only a small delay after the edge signals (which are representative of V2 neurons in general). Importantly, there also appears to be no difference in latency between the border-ownership signals for large and small figures. That is, context integration over larger distances in the visual field does not take more time than context integration over smaller distances.
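The two signal definitions used in Fig. 2 can be written out as a short sketch. The function name and the firing rates below are hypothetical and serve only to illustrate the arithmetic; they are not data from the recordings.

```python
def edge_and_bos(rates_side_a, rates_side_b):
    """Return (edge, bos) time courses from two equally long lists of firing rates.

    edge: mean of the responses for the two figure locations
    bos:  difference between the responses for the two figure locations
    """
    edge = [(a + b) / 2.0 for a, b in zip(rates_side_a, rates_side_b)]
    bos = [a - b for a, b in zip(rates_side_a, rates_side_b)]
    return edge, bos


# Hypothetical rates (spikes/s) in three successive time bins.
figure_on_preferred_side = [10.0, 30.0, 28.0]
figure_on_other_side = [10.0, 20.0, 14.0]
edge, bos = edge_and_bos(figure_on_preferred_side, figure_on_other_side)
```

In this toy case the edge signal is already present in the second bin, while the border-ownership signal is zero in the first bin and grows afterward, which is the qualitative pattern the text describes.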


FIG. 2. Timing of border-ownership signals does not depend on the spatial extent of image context integration. Neural signals (from single-unit recordings in macaque V2) representing the center of the edge of a square are shown for square sizes of 3° (full lines) and 8° (dashed lines). Border-ownership signals (red) were defined as the difference between the responses for the 2 sides of figure location (average of smoothed normalized firing rates of 42 V2 neurons). Edge signals (defined as the mean of the 2 responses) are also shown for comparison (black). Amplitudes of border-ownership–selective (BOS) signals and edge signals are scaled differently, but the signals for the 2 figure sizes are plotted on the same scale. Note that the BOS signals for 8° figures (dashed red line) are not delayed relative to the BOS signals for 3° figures (solid red line), and that both rise at approximately the same rate, whereas models based on within-area connectivity predict significant delays, because the cortical distance from the representation of the critical context features to these neurons is considerably longer for the 8° figures than for the 3° figures. Reproduced from Sugihara et al. (2003).

 
The integration of image context cannot be achieved by convergence in the afference because its spatial extent is much larger than the size of the classical receptive fields (8° is about tenfold the average size of the receptive fields of V2 neurons in the near-foveal region that was studied). It must involve either horizontal fibers within V2 cortex, or recurrent fibers from other extrastriate areas, or both.

A model of border ownership that aims to account for the recent neural data, as well as the perceptual phenomena, assumes that context integration occurs by horizontal fibers within V2 (Zhaoping 2005). This parsimonious model reproduces the observed border ownership data from assumptions about the lateral connectivity in V2. Briefly, it posits that neurons with nearby receptive fields are linked by excitatory and inhibitory connections depending on whether the corresponding border segments are consistent with being contours of the same figure.

The assumption made in this model (Zhaoping 2005) and others (Baek and Sajda 2005; Grossberg 1994; Kikuchi and Akashi 2001; Nishimura and Sakai 2004, 2005; Pao et al. 1999), that image context integration occurs within the area, through horizontal fibers, implies that the latency of the border-ownership signal would increase with the distance of the relevant image context from the receptive field under consideration. This is a consequence of the retinotopic representation of visual information in V2 cortex.

As pointed out before (Zhou et al. 2000), the distances in V2 cortex are considerable and the conduction through intracortical fibers is probably too slow to explain the short latencies of border-ownership signals. Conduction velocity estimates for these fibers range between 0.1 and 0.25 m/s in cat V1 (Bringuier et al. 1999; Grinvald et al. 1994) and 0.33 m/s in monkey V1 (Girard et al. 2001); we are not aware of corresponding data for V2. Note that these figures are median values; there is a range of conduction velocities of single fibers, and it has been argued that longer fibers might conduct much faster than shorter fibers (Zhaoping 2005).

To make a quantitative argument, it is necessary to consider the topography of area V2. A look at the well-known illustration of the unfolded cortical areas (Felleman and Van Essen 1991) shows that V2 is a large area whose elongated shape is quite different from that of V1. In V2, the visual field representation is split at the horizontal meridian into a ventral part and a dorsal part that are connected only by a narrow bridge. Detailed maps of V2 (Gattass et al. 1981) show that intracortical fibers would have to span considerable distances to explain the findings on border-ownership coding (Qiu and von der Heydt 2005; Zhou et al. 2000).

As an example, consider responses produced by a square figure of 8° side length. When an edge of the square is centered about the receptive field of the neuron under study, which was the condition used in the cited studies, the closest points that can provide border-ownership information are the corners on both ends of the edge, with a distance of 4° visual angle from the receptive field center. The representation of one of those corners is generally also the nearest point in cortex to the neuron where such information is available. From Fig. 9 of Gattass et al. (1981) it can be seen that two neurons with receptive fields located 2.5 and 6.5° below the center of gaze (cells 5 and 11) are separated by 21 mm [because no scale bars are provided, we estimated the scale based on brain sections of the monkeys of Zhou et al. (2000) to be 3:1]. Thus if the vertically oriented edge of the square were centered on the receptive field of cell 5, then the bottom corner would be represented at a distance of 21 mm. The representation of the top corner would be even farther away, in the ventral part of V2. For edges of horizontal orientation the situation is similar (consider cells 61 and 53): if the center of the edge is at 0.6° horizontal eccentricity, one corner would be at 4.6° eccentricity on the horizontal meridian and represented in cortex at a distance of about 27 mm, and the other corner would be represented in the opposite hemisphere. Because the maximal length of horizontal connections is only 3–4 mm (measured in V1; Table 1 in Angelucci et al. 2002), the signals would have to be relayed many times through a cascade of neurons. Moreover, activity cannot be relayed through cells that are not directly stimulated (this would contradict the notion of the classical receptive field: stimuli outside this receptive field generally do not elicit responses). Thus signals could propagate only through neurons that are excited by the given contrast borders.
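To give a rough sense of what "relayed many times" means, the following sketch computes a lower bound on the number of consecutive horizontal fibers needed to bridge the cortical distances quoted above. It assumes, for illustration only, that every fiber has the maximal length and runs straight toward the target; the function name is hypothetical.

```python
import math


def min_hops(cortical_distance_mm, max_fiber_length_mm):
    """Smallest number of consecutive fibers that can span the given distance."""
    return math.ceil(cortical_distance_mm / max_fiber_length_mm)


# Distances (21 and 27 mm) and maximal fiber lengths (3 and 4 mm) from the text.
hops = {(d, f): min_hops(d, f) for d in (21.0, 27.0) for f in (3.0, 4.0)}
```

Under these optimistic assumptions the chain already needs between 6 and 9 fibers in series, and each relay would add synaptic and integration delays on top of the conduction time.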


FIG. 9. Predictions made by the model for additional stimuli. Shown are border-ownership assignments made by the model for a modified C-shape (A) and an occluding bar (B). Model also predicts perceptual grouping of isolated contours by proximity (C), which reverses (D) when horizontal contours are added. Arrows as in Fig. 6.

 
Any model that attempts to explain border-ownership coding in V2 faces the problem that neurons have to communicate over those distances. If communication is assumed to occur within V2, and assuming a median conduction velocity for horizontal fibers of 0.3 m/s (Girard et al. 2001), distances of 21–27 mm would imply delays of 70–90 ms just from signal propagation, whereas the observed total delays are only about 30 ms (Fig. 2). These figures show that conduction delays are a serious problem for models based on intracortical connections, not to mention the problem of communication between points in left and right hemifields, for which intracortical connections do not exist, or between points in upper and lower quadrants of a hemifield, for which such connections may not exist. In contrast, the average length of white matter fibers connecting V2 and V4 is no more than about 20 mm (estimated from Gattass et al. 1988; Fig. 3), corresponding, at 3.5 m/s (Girard et al. 2001), to about 6-ms conduction delay, or 12 ms for the loop. Loops through V3 would be even faster. In addition, white-matter connections of comparable speeds exist within and between hemispheres (Swadlow 2000) and, presumably, the feedback connections from higher-order areas target the ventral and dorsal representations of V2 alike, without major discontinuities in the time course of signals.
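The delay estimates in this paragraph follow from dividing distance by conduction velocity. A minimal sketch of that arithmetic, using only the distances and velocities quoted in the text (the function and variable names are hypothetical):

```python
def conduction_delay_ms(distance_mm, velocity_m_per_s):
    """Pure conduction delay; 1 m/s equals 1 mm/ms, so mm divided by m/s gives ms."""
    return distance_mm / velocity_m_per_s


# Horizontal fibers within V2: 21 and 27 mm at 0.3 m/s.
horizontal_ms = [conduction_delay_ms(d, 0.3) for d in (21.0, 27.0)]

# White-matter fibers between V2 and V4: about 20 mm at 3.5 m/s.
feedback_one_way_ms = conduction_delay_ms(20.0, 3.5)
feedback_loop_ms = 2.0 * feedback_one_way_ms
```

The horizontal route gives about 70 and 90 ms, compared with roughly 5.7 ms one way (11.4 ms for the loop) through white matter, which the text rounds to 6 and 12 ms. Synaptic delays are ignored in both cases.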


FIG. 3. Model architecture. A: network overview, showing border-ownership selection for a stimulus of 2 overlapping rectangles (bottom). Receptive fields of B cells are shown as ellipses, where attached arrows indicate their preferred side of figure. B cells with opposite arrows compete, and this competition is decided by grouping cell input (receptive fields of active cells are shown in green and red; receptive fields of suppressed cells shown in gray). Output from the B cells is then passed on to higher-order areas, not the subject of this study (but see Fig. 11 and DISCUSSION). B: microcircuitry of the model network. Input to the model is provided by orientation-tuned edge detectors, Cθ, and end-stopped cells, Eθ±. At a particular location in the visual scene, neuron Cθ is activated by the presence of an edge with orientation θ. Each edge detector provides input to a pair of B cells, Bθ+ and Bθ−, which are mutually inhibitory (strength β). B cells project to grouping cells (G and G') with strength γ and receive inhibition from those with strength ρ. For clarity, feedback from only one size of G cells (see text) has been drawn, and connections between the G cells and other pairs of B cells have been omitted. Eθ± bias edges toward one side of ownership wherever line terminations occur (e.g., at T-junctions; see METHODS).

 
In summary, considerations of the V2 topography and the timing issue indicate that the intrinsic connectivity alone is not sufficient to explain the findings on border-ownership coding. In the case of area V1, detailed comparisons of the spatial range of surround effects with the existing anatomical connections have indicated that feedback connections from higher-order areas are responsible for producing the nonclassical surround effects (Angelucci and Bullier 2003; Angelucci et al. 2002). By analogy, it is plausible that image context integration for border-ownership assignment in area V2, which is a nonclassical surround effect of large spatial extent, also involves fast loops through higher-order areas.

FIGURE–GROUND ORGANIZATION AND MECHANISMS OF ATTENTION. Our motivation for the present study comes also from the need for a more general point of view. Most of the existing models leave open the question of how the figure–ground assignments they compute influence, and are influenced by, higher-level processes (some exceptions are Carpenter and Grossberg 1993; Grossberg et al. 1994; Vecera and O'Reilly 1998). The models transform one retinotopic representation into another in which regions are labeled as figure and ground, or contour segments are assigned border ownership. Although the result is an improved representation, it is essentially no more than an enhanced image. To avoid referring the further processing to a homunculus that views this internal image, we have to offer at least a hypothesis of how the figure–ground representation produced by the model will interact with higher-level processes, specifically processes of selective visual attention and form recognition. As pointed out, the shapes of figure regions capture attention and are remembered, whereas the shapes of ground regions often go unnoticed (Fig. 1B). This shows that figure–ground organization plays a role in selective attention and both must be closely related.

Kienker et al. (1986) modeled figure–ground segregation in a purely top-down fashion, with border ownership determined by the location of an attentional spotlight. In contrast, most recent models treat figure–ground segregation as a purely bottom-up process, relying on local, within-area interactions at the lower stages of the visual hierarchy (e.g., Zhaoping 2005). Neither of these approaches offers a satisfactory explanation of how the whole system can function, barring the unacceptable solution of a homunculus that, in the case of the first model, determines where to shine the top-down attentional spotlight without the benefit of an organization of the sensory information or, in the case of the latter model, a homunculus that interprets the transformed versions of the retinal image created by the autonomous, bottom-up circuits. One solution is to assume iterative interaction between bottom-up signals and memory-related top-down signals in a way that converges over time (Carpenter and Grossberg 1993; Vecera and O'Reilly 1998). Inasmuch as these algorithms rely on the filling-in hypothesis and lateral signal propagation in cortex, though, the earlier concerns regarding physiological plausibility and the latency problem remain the same as for the other models.

In the present model, we propose dedicated neural circuits for perceptual grouping and figure–ground organization that also provide handles for attentional selection. Because our circuits for image context integration are separate from those representing the visual sensory information, they may include neurons in a higher-order cortical area and use recurrent white-matter projections, explaining the size invariance of the latency of the border-ownership signal. Our framework is consistent with recent neurophysiological findings (Zhou et al. 2000) and makes specific, testable predictions regarding both the physiological mechanism of border-ownership determination and the functional role of this mechanism in higher-level visual processing. Portions of this report were previously presented in abstract form (Craft et al. 2004; Schütze et al. 2003).


    METHODS
Overview of network structure

Figure 3A shows the overall architecture of our model network and Fig. 3B highlights specific aspects of its connectivity. Input is provided by a stimulus map composed of oriented-edge detectors Cθ, akin to the topographic representation of a scene by complex cells in primary visual cortex (V1). Because an edge can be owned on either of its two sides by a figure, each edge detector Cθ provides input to a pair of mutually antagonistic border-ownership cells, one for each direction of ownership, Bθ+ and Bθ− ("B cells"; see Fig. 5A for direction notation). The B cells inhibit each other (connections labeled β in Fig. 3B). The current version of the model uses only horizontal and vertical edges, resulting in four border-ownership channels. The B cell pairs receive additional input from end-stopped cells, Eθ+ and Eθ− (see Border-ownership cells and Fig. 5A).


FIG. 5. Model implementation details. A: notation for directions of ownership and end-stopping at θ-oriented edges. B: cross-sectional profiles of receptive field annuli, showing radial variation of connection weights. Large cross-sections have small peak magnitudes because each annulus is normalized to a total weight of unity. Inset: illustration of relation of annulus parameters to the 2D view in Fig. 4A. C: resolution of G cell–receptive field into correlation kernels for each orientation of border ownership, as described in METHODS and by Eqs. A6–A10 in the APPENDIX.

 
The balance of activity in each pair of B cells is influenced by feedback from a third layer in the network, consisting of contour-grouping cells ("G cells"). Residing at a higher level of the visual hierarchy, G cells have larger receptive fields than those of the B cells and sample the activity of the B cells in such a way that they respond preferentially to combinations of cocircular contour segments. The annular patterns shown in Fig. 4 illustrate the relative distribution of B cells at different radii that provide input to each G cell. A given G cell is driven by the activity of those B cells that signal ownership in the direction of the center of its annulus; the G cell then provides feedback that inhibits the B cells signaling ownership in the opposing direction. For example, in Fig. 3B, cell Bθ+ provides input to grouping cell G (connection labeled γ), so G inhibits Bθ− (connection labeled ρ). This is functionally similar to a circuit in which G applies positive feedback directly to Bθ+, but does not require any additional mechanisms to preserve the classical receptive field property of the B cells (see RESULTS), allowing us to focus on the performance of the grouping mechanism. Note also that the inhibitory connections in Fig. 3 require inhibitory interneurons that we omitted in this schematic for the sake of clarity.
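The qualitative behavior of this circuit can be illustrated with a toy rate model of a single B-cell pair. This is only a sketch under strong assumptions and is not the set of equations given in the APPENDIX: the grouping-cell activities are held fixed rather than computed from the B cells (so the B-to-G weight γ does not appear), the rectified-linear dynamics, the parameter values, and all names are hypothetical.

```python
def relax_b_pair(c, g, g_prime, beta=0.5, rho=0.5, dt=0.1, steps=500):
    """Euler-integrate a toy rate model of one pair of B cells.

    c       : drive from the edge detector C, shared by both B cells
    g       : activity of the grouping cell fed by B+ (it inhibits B-)
    g_prime : activity of the grouping cell fed by B- (it inhibits B+)
    beta    : strength of mutual inhibition between the two B cells
    rho     : strength of inhibitory feedback from the grouping cells
    """
    def relu(v):
        return max(0.0, v)

    b_plus = b_minus = 0.0
    for _ in range(steps):
        # Targets are computed from the old state, then both cells are updated.
        target_plus = relu(c - beta * b_minus - rho * g_prime)
        target_minus = relu(c - beta * b_plus - rho * g)
        b_plus += dt * (target_plus - b_plus)
        b_minus += dt * (target_minus - b_minus)
    return b_plus, b_minus


# Grouping cell G is active, G' is silent: the pair is biased toward B+.
biased = relax_b_pair(c=1.0, g=0.6, g_prime=0.0)
# Equal grouping input on both sides: no border-ownership difference.
balanced = relax_b_pair(c=1.0, g=0.6, g_prime=0.6)
```

In this sketch both B cells remain driven by the edge, and the grouping feedback only shifts the balance between them, which mirrors the point that inhibiting the opposing cell leaves the classical receptive field intact.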



 
FIG. 4. Grouping cell connectivity. A: spatial distribution of connections; darker pixels indicate stronger connection weights. B: annular connectivity emphasizes convexity ("C") and proximity ("P") of contours.

 
This configuration causes the G cells to bias B cells with a Gestalt-like preference for convexity and proximity of stimulus contours, as illustrated in Fig. 4B. The preference for convexity is a direct consequence of grouping cocircular contour segments (see annulus "C"). A proximity preference (see annulus "P") results because the individual synaptic weights are larger for the smaller, less-diffuse annuli. Global border ownership is determined by the net balance of these biases as they interact in the recurrent network.

Grouping cell connections

Each pixel of the connection pattern images in Fig. 4A indicates the point of origin and strength (in grayscale coding; darker meaning stronger) of a single feedforward connection from a B cell at the perimeter of the annulus to the G cell at the center. These connection patterns are generated by convolving circles of the desired radii (r, Fig. 5B) with a normalized 2D Gaussian filter (parameter σ, Fig. 5B), resulting in the annuli shown. The annular regions become increasingly diffuse at larger radii to preserve scale invariance, and we normalize the strength of the synaptic inputs (connection weights) to give a common total weight (of unity) for all radii.
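The construction of a single annulus can be sketched as follows. This is a minimal illustration, not the implementation used for the simulations: the function names, the one-pixel ring width, the truncation of the Gaussian at 3σ, and the choice of σ proportional to r in the usage note are all assumptions made here for the example.

```python
import math

def gaussian_kernel(sigma, half_width):
    """Normalized 2D Gaussian on a (2*half_width+1)^2 grid (weights sum to 1)."""
    size = 2 * half_width + 1
    k = [[math.exp(-((x - half_width) ** 2 + (y - half_width) ** 2)
                   / (2.0 * sigma ** 2))
          for x in range(size)] for y in range(size)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def circle_image(radius, half_width):
    """Image containing a one-pixel-wide circle of the given radius (assumed width)."""
    size = 2 * half_width + 1
    img = [[0.0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            if abs(math.hypot(x - half_width, y - half_width) - radius) < 0.5:
                img[y][x] = 1.0
    return img

def convolve_same(img, ker):
    """2D convolution with zero padding; output has the same size as img."""
    h, w = len(img), len(img[0])
    kh, kw = len(ker), len(ker[0])
    cy, cx = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x] == 0.0:
                continue
            # Spread each nonzero pixel over its neighborhood.
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + j - cy, x + i - cx
                    if 0 <= yy < h and 0 <= xx < w:
                        out[yy][xx] += img[y][x] * ker[j][i]
    return out

def annulus_weights(radius, sigma):
    """Blurred circle, normalized so that the connection weights sum to unity."""
    half_width = int(math.ceil(radius + 3 * sigma))
    ring = circle_image(radius, half_width)
    blurred = convolve_same(ring, gaussian_kernel(sigma, int(math.ceil(3 * sigma))))
    total = sum(map(sum, blurred))
    return [[v / total for v in row] for row in blurred]
```

Calling `annulus_weights(3, 0.75)` and `annulus_weights(6, 1.5)` (hypothetical values with σ proportional to r) gives two annuli of equal total weight, the larger one having the smaller peak weight, which is the property that underlies the proximity preference.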

Next, we determine which border-ownership orientation channel must give rise to each of the connections along the perimeter of these annuli. The preferred direction of border ownership is defined at each point in the connection pattern image by a vector pointing toward the center of the annulus. Because our model currently implements only four orientations of border ownership, we resolve these vectors into positive and negative components along the horizontal and vertical axes of ownership. Based on these vector components, we then partition the annuli into separate sets of connections arising from each of the four border-ownership orientation channels (cocircular contour fragments Kθ±(r), Fig. 5C; see APPENDIX for details).
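One way to read this partitioning step is sketched below. The exact weighting is given by Eqs. A6–A10 of the APPENDIX, which are not reproduced here, so the half-wave projection onto the two axes, the channel names, and the convention that "up" means a smaller row index are assumptions of this example.

```python
import math

def partition_annulus(weights):
    """Split annulus weights into four ownership channels.

    weights: square 2D list of odd size, with the G cell at the center.
    For each pixel, the unit vector pointing toward the center is resolved
    into positive and negative horizontal and vertical components, and the
    pixel weight is shared among the channels accordingly (assumed rule).
    """
    size = len(weights)
    c = size // 2
    names = ("right", "left", "up", "down")
    chans = {n: [[0.0] * size for _ in range(size)] for n in names}
    for row in range(size):
        for col in range(size):
            dx, dy = col - c, c - row          # dy > 0 means above the center
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue                        # the center has no direction
            ux, uy = -dx / d, -dy / d           # unit vector toward the center
            w = weights[row][col]
            chans["right"][row][col] = w * max(ux, 0.0)
            chans["left"][row][col] = w * max(-ux, 0.0)
            chans["up"][row][col] = w * max(uy, 0.0)
            chans["down"][row][col] = w * max(-uy, 0.0)
    return chans
```

With this rule, a B cell location directly to the left of the G cell contributes only through the channel signaling ownership to the right, whereas a location on a diagonal contributes to two channels with reduced weight.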

All G cells receive their input through a translated version of this same relative pattern of connections. In such a homogeneously connected network, 2D cross-correlation (i.e., spatial filtering) provides a natural description of the interactions between layers, using the fixed connectivity pattern as the correlation kernel. The four annulus components, Kθ±(r), are correlated separately with the four border-ownership channels, Bθ±, as described by

Formula 1(1)
For brevity, we have used boldface symbols B and K to represent the 2D arrays B(x,y) of neuron firing rates and K(x,y) of connection strengths, and "⊗" denotes the 2D cross-correlation operation.
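For readers unfamiliar with the operation, a zero-padded 2D cross-correlation of a firing-rate map with a connection kernel can be written as below. This is a plain illustrative sketch with a hypothetical function name; it returns a map of the same size as the input and does not reproduce the expansion of the G cell maps at the boundaries mentioned later in METHODS.

```python
def cross_correlate(B, K):
    """Zero-padded 2D cross-correlation; output has the same shape as B.

    out[y][x] = sum over (j, i) of B[y + j - c][x + i - c] * K[j][i],
    where c is the kernel half-width (K must be square with odd size).
    Unlike convolution, the kernel is not flipped.
    """
    h, w = len(B), len(B[0])
    n = len(K)
    c = n // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for j in range(n):
                for i in range(n):
                    yy, xx = y + j - c, x + i - c
                    if 0 <= yy < h and 0 <= xx < w:   # zero padding outside the map
                        acc += B[yy][xx] * K[j][i]
            out[y][x] = acc
    return out
```

In this form each kernel entry K[j][i] is the weight of the connection from the B cell at offset (j − c, i − c) relative to the G cell, so a single active B cell drives exactly those G cells whose annulus passes over it.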

If we adopt the shorthand notation Sθ±(r) = Bθ± ⊗ Kθ±(r), we can regard the input from each kernel as representing individual curvature segments, "Sθ±(r)(x,y)," with center of curvature (x,y), radius of curvature r, and orientation of ownership θ±. We would like the G cell activity to reflect a preference for cooccurrences among these segments, so rather than summing them linearly, we combine them as follows

Formula 2(2)
Here, unique pairwise products of the cocircular inputs at each spatial location (x,y) are summed, and the square root is applied as a compressive nonlinearity to ensure that the result stays approximately in the same dynamic range as the individual inputs. Equation 2 represents just one way to model selectivity for cooccurrences among inputs. Our simulations have shown that other forms of input nonlinearity (e.g., logarithmic) can also be used to produce a similar selectivity.
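The verbal description of Eq. 2 can be illustrated for a single location (x,y) as follows. The function name is hypothetical, and which segment inputs enter the sum is an assumption of this sketch; the equation itself is the authority on that point.

```python
import math
from itertools import combinations

def cooccurrence_input(segments):
    """Square root of the sum of unique pairwise products of segment inputs.

    segments: non-negative segment responses at one location (x, y),
    e.g. the cocircular inputs from the different ownership channels.
    """
    return math.sqrt(sum(a * b for a, b in combinations(segments, 2)))
```

A single active segment yields no input (`cooccurrence_input([1, 0, 0, 0])` is 0), two cooccurring segments of strength 2 yield an input of 2, and the square root keeps the result in the range of the individual inputs rather than growing quadratically.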

The activity of the G cells is then governed by firing-rate equations of the form

Formula 3(3)
where the term in brackets on the right-hand side is the total input described earlier, τG is a time constant common to all G cells, and γ(r) scales the feedforward connection weights for each G cell receptive field size. Again, the boldface symbol G represents a 2D array, G(x,y), of neuron firing rates.
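A forward-Euler update for one G cell is sketched below. It assumes the standard leaky firing-rate form τG·dG/dt = −G + γ(r)·(total input); the leak term, the step size, and the function name are assumptions of this example and are not taken from Eq. 3 itself.

```python
def g_cell_step(G, total_input, gamma_r, tau_G=10.0, dt=0.1):
    """One forward-Euler step (times in ms) of an assumed leaky rate equation:

        tau_G * dG/dt = -G + gamma_r * total_input
    """
    return G + (dt / tau_G) * (-G + gamma_r * total_input)
```

Under this assumed form, a constant input drives the cell toward the steady state G = γ(r)·(total input) with time constant τG.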

Border-ownership cells

As was illustrated in Fig. 3, every feedforward connection from a cell Bθ± to a G cell is accompanied by an inhibitory feedback connection from that same G cell to the opposing border-ownership cell, Bθ∓. Because pixels in the annulus images shown in Fig. 4A represent border-ownership cell locations that a particular G cell "observes," we can characterize the feedback connections by asking the reciprocal question: What is the pattern of all G cells that a single B cell location observes? Mathematically, this pattern can be obtained directly from the feedforward correlation kernels Kθ±(r) described in the previous section (see APPENDIX). As with the grouping cells, these feedback connections to the B cells are implemented using spatial filtering [G(r) ⊗ Kθ±(r)]. Together with a term βBθ± that accounts for the mutual inhibition between a pair of border-ownership cells, the sum over these cross-correlations yields the total inhibitory input (see Eqs. 4 and 5, below) to the border-ownership cells.

Excitatory feedforward input to Bθ± is provided by orientation-selective edge detectors, Cθ (see Fig. 5A for directional notation). Additional contributions from local stimulus features can also be taken into account in the input stage. For instance, cues such as binocular disparity and dynamic occlusion contribute to the perception of border ownership and have been shown to influence the neural border-ownership signals accordingly (Qiu and von der Heydt 2005; von der Heydt et al. 2003). In the present study, we model the influence of T-junctions as one example of such local cues.

T-junctions are represented in visual cortex by the activity of end-stopped cells, which are common in V1 and V2 (Heider et al. 2000; Hubel and Wiesel 1968). Because these cells respond selectively to terminations of edges and lines, they are well suited for detecting occlusion features (Heitger et al. 1992). Model end-stopped cells have been used successfully in prior computational models for representing occluding contours (Heitger and von der Heydt 1993; Heitger et al. 1998).

In our model, oriented edge segments that form the "hat" of a T-junction (encoded by complex cells, Cθ) are biased toward ownership on the side where termination of the "stem" of the junction (encoded by end-stopped cells, Eθ±) suggests occlusion (see Fig. 5A). It is important to note that this T-junction bias does not depend on specialized detectors that classify image features. All that is needed are edge-selective elements that are also sensitive to the presence of an intersecting contour on one side, similar to what was proposed in the model of illusory contours of Heitger et al. (1998). B cells whose activity is influenced by orthogonally oriented, end-stopped cells achieve this naturally.

End-stopped cells may either excite or inhibit border-ownership cells to produce this bias (Fig. 3B). Thus the combined input from Cθ and Eθ± effectively yields three distinct levels of total feedforward input: each cell Bθ± may receive a baseline input (proportional to the activity of Cθ) that conveys local edge contrast; a stronger input (e.g., proportional to twofold the activity of Cθ, as a consequence of complementary excitation from Eθ±), where end-stopping indicates ownership in the B cell's preferred direction; or a weaker input (e.g., diminishing toward 0, as a result of inhibition from Eθ∓), where end-stopping indicates ownership in the opposite direction. For simplicity, these three levels of feedforward input were combined into a single term in the stimulus maps, as indicated in Eqs. 4 and 5 using the mathematical shorthand CEθ± (see APPENDIX for more details).
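The three input levels can be illustrated with a simple gain rule. The rule below is a hypothetical stand-in chosen only to reproduce the levels named in the text (baseline, roughly twofold, and near zero); the actual definition of CEθ± is given in the APPENDIX, and the normalization of the end-stopped signals to the range 0 to 1 is an assumption of this sketch.

```python
def feedforward_input(C, E_pref, E_opp):
    """Combined feedforward drive to one B cell (illustrative rule only).

    C      : complex-cell (edge) activity at this location.
    E_pref : end-stopped evidence, in [0, 1], for ownership in the cell's
             preferred direction (excitatory).
    E_opp  : end-stopped evidence, in [0, 1], for ownership in the opposite
             direction (inhibitory).
    """
    gain = 1.0 + E_pref - E_opp
    return C * min(max(gain, 0.0), 2.0)
```

With no end-stopped activity the cell receives the baseline input C, with full evidence on the preferred side it receives 2C, and with full evidence on the opposite side the input falls to 0.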

The activity of all complementary pairs of B cells is thus described by

Formula 4(4)

Formula 5(5)
where τB is the B cell time constant, β reflects the strength of mutual inhibition between opposing border-ownership cells, and ρ(r) scales the feedback from the grouping cells. The subscripted plus signs on the brackets indicate that B cell responses are rectified; i.e., if the net input to a B cell becomes negative, its firing rate will not decrease below zero.
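The competition within one pair of B cells can be sketched with a forward-Euler update. The form assumed here, τB·dB±/dt = −B± + [CE± − β·B∓ − feedback±]+, follows the verbal description of Eqs. 4 and 5 but is not copied from them; the grouping feedback (the sum over r of ρ(r)·G(r) ⊗ K) is passed in as a precomputed number, and all names are hypothetical.

```python
def rectify(x):
    """Half-wave rectification: negative net input is clipped to zero."""
    return x if x > 0.0 else 0.0

def b_pair_step(B_plus, B_minus, CE_plus, CE_minus, fb_plus, fb_minus,
                beta=0.5, tau_B=10.0, dt=0.1):
    """One forward-Euler step (times in ms) for a pair of opposing B cells.

    fb_plus / fb_minus: total inhibitory G cell feedback onto each cell,
    assumed here to be computed elsewhere.
    """
    dB_plus = (-B_plus + rectify(CE_plus - beta * B_minus - fb_plus)) / tau_B
    dB_minus = (-B_minus + rectify(CE_minus - beta * B_plus - fb_minus)) / tau_B
    return B_plus + dt * dB_plus, B_minus + dt * dB_minus
```

With equal edge input to both cells and grouping feedback applied only to the minus cell, iterating this step lets the plus cell win the competition while neither rate ever becomes negative.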

Based on Eqs. 3, 4, and 5, we implemented a model network of four orientation channels of border ownership and six sizes of G cell receptive fields (see Figs. 4A and 5B), with a resolution of one neuron per pixel in each visuotopic B cell map. To maintain uniform coverage of the B cell layers by each of the different sizes of G cell annuli, the number of G cells present at each scale (radius, r) was decreased in proportion to 1/r² (see DISCUSSION). Each pixel of the B cell map corresponds to the base of an arrow in Figs. 6 and 9 representing the vectorial modulation index, as defined below. For comparison with physiology, we assumed that the size of a pixel in the edge map corresponds to 0.5° of visual angle. Cross-correlations were computed using zero padding at the boundaries of the B cell maps and by expanding the G cell maps to preserve feedforward signals from boundary B cells.



 
FIG. 6. Model border-ownership assignments for stimuli used in neurophysiological experiments. Responses shown for single square (A), C-shape (B), and overlapping squares (C). Base of each arrow represents one pixel of the input edge map, and the arrows reflect the net activity of the B cells at each location according to vectorial modulation index (see METHODS). Scale bar indicates length of arrow corresponding to a modulation index of magnitude 1, which would correspond to a 100% certain assignment of border ownership in a particular direction. D: comparison of model performance to neurophysiological findings for border ownership in alert, behaving macaques. Top 2 rows: stimulus configuration with a cell's receptive field (indicated by small oval; cross indicates center of gaze). Third row: responses of a B cell recorded in area V2; bars indicate average firing rate of neuron for each stimulus condition. Bottom row: response of a model B cell to analogous stimulus conditions (location of cells indicated by circles in A–C). Part of this figure was adapted from Fig. 23 of Zhou et al. (2000), with permission.

 
Decreasing the number of G cells at larger scales weakens the contribution of these scales to the overall grouping feedback beyond the proximity preference discussed earlier. This is offset by constraining the weight parameters in Eqs. 3, 4, and 5 according to the expression γ(r) = ρ(r) = γ0·r, where γ0 is a fixed proportionality constant. This gives the recurrent loops between the B cells and the G cells a net weight of γ0·r² (the product of the separate feedforward and feedback weights) at each scale r, which counterbalances the similarly proportioned decrease in cell numbers. β and γ0 represent the only parameters that were used to tune the model. Our results were generated using values β = 0.5 and γ0 = 4.5, which were chosen based on an exploration of the model's parameter space (see APPENDIX, Fig. A1). Steady-state [dB/dt = 0 and dG(r)/dt = 0] border-ownership assignments and response time courses were then generated by numerically integrating the system of model equations with time constants τB = τG = 10 ms and a one-way conduction delay of 6 ms between the B cell and G cell layers (thus 12-ms "round-trip" for the loop from B cells to G cells and back), as estimated under PHYSIOLOGICAL CONSTRAINTS in the Background section. To keep our model simple and focused on the border-ownership computation, we assumed that the arrays of complex and end-stopped cell firing rates were given; that is, we did not compute them from responses of retinal cells.
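The counterbalancing argument can be checked numerically. In the sketch below the radii are hypothetical placeholders (the six radii actually used are shown in Fig. 5B), the function names are illustrative, and the effective loop strength is taken to be the product of the feedforward weight, the feedback weight, and the relative number of G cells at that scale.

```python
GAMMA0 = 4.5                      # value reported in the text
RADII = [2.0, 4.0, 8.0]           # hypothetical radii, in pixels

def scale_weights(r, gamma0=GAMMA0):
    """Feedforward and feedback weights at scale r: gamma(r) = rho(r) = gamma0 * r."""
    return gamma0 * r, gamma0 * r

def relative_cell_count(r):
    """Number of G cells at scale r, up to a constant factor (proportional to 1/r^2)."""
    return 1.0 / r ** 2

def effective_loop_strength(r, gamma0=GAMMA0):
    """Loop weight (feedforward x feedback) weighted by the relative cell count."""
    gamma_r, rho_r = scale_weights(r, gamma0)
    return gamma_r * rho_r * relative_cell_count(r)

strengths = [effective_loop_strength(r) for r in RADII]
```

Because the product γ(r)·ρ(r) grows as r² while the number of cells falls as 1/r², every entry of `strengths` has the same value, so no scale is disadvantaged by the thinning of G cells.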



 
FIG. A1. Parameter space of the model. For each of the 3 stimuli in Fig. 6 (A: single square; B: C-shape; C: overlapping squares), contours indicate the effect that varying parameters γ0 and β has on the modulation index mî of the circled model cells (see Eq. 7; mî < 0 indicates the "correct" direction of border ownership). Parameter values used in the current study are indicated by a cross in each plot. Note, however, that these exact values are not essential to the proper performance of the model. Despite variations in response magnitude (see DISCUSSION), the contour plots show that the model determines the direction of border ownership correctly for all 3 stimuli over the entire range of γ0 and β plotted. As can be seen in the top right corner of each plot, increasing either parameter value beyond the range shown results in saturated (i.e., |mî| = 1) border-ownership assignments in the correct direction.

 
Vectorial modulation index

The strength of the border-ownership signal is described by a generalization of the modulation index. We define a vector quantity by the expression

Formula 6(6)
where î and ĵ are unit vectors along the horizontal and vertical image axis, respectively, and the components mî(x,y) and mĵ(x,y) are the usual modulation indices along their respective axes, defined as

Formula 7(7)

Clearly, both components in Eq. 7 are limited to values between +1 and –1. For the x-component, for instance, a positive value of mî(x,y) signifies that the figure is to the right of position (x,y) and a negative value signifies that the figure is to the left. Its absolute value indicates the "strength" of the border-ownership signal, with zero being equivalent to ambivalence between left and right. The corresponding comments apply to the y-component, mĵ(x,y), regarding the figure's position upward or downward of (x,y). The direction of the vectorial modulation index m(x,y) defined in Eq. 6 indicates the position of the foreground figure in the 2D image plane relative to the point (x,y). For instance, positive values in both components [mî(x,y) > 0, mĵ(x,y) > 0] indicate that the figure is located upward and to the right of (x,y).
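As an illustration, the vectorial modulation index at one location can be computed from the four B cell responses as sketched below. The sketch assumes that the "usual" modulation index is the difference of the two opposing responses divided by their sum, and it assigns an index of zero where both cells are silent; the function and argument names are hypothetical.

```python
def modulation_index(pref, opp):
    """(pref - opp) / (pref + opp); defined as 0 when both responses are zero."""
    total = pref + opp
    return 0.0 if total == 0.0 else (pref - opp) / total

def vectorial_modulation_index(B_right, B_left, B_up, B_down):
    """Return (m_i, m_j): horizontal and vertical modulation indices.

    Positive m_i means the figure lies to the right of the location,
    positive m_j means it lies above.
    """
    return (modulation_index(B_right, B_left),
            modulation_index(B_up, B_down))
```

For non-negative firing rates both components stay within the range from –1 to +1, and a location at which only the rightward and upward cells respond yields the vector (1, 1), pointing to a figure located upward and to the right.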


    RESULTS
 
Comparison with neurophysiology

We began by testing our model with stimuli similar to those from the neurophysiological experiments detailed in Zhou et al. (2000). Figure 6, A–C shows edge maps for these stimuli, along with border-ownership assignments made by the model. Arrows on a border point toward the region determined by the model to own the border, and the length of the arrows indicates the "strength" of the border-ownership signal (vectorial modulation index; see METHODS, Eq. 6). As can be seen, the model correctly determines the direction of border ownership (i.e., it assigns the borders to those regions that are perceived as foreground) everywhere for all of these stimuli.

Figure 6D compares the behavior of the model B cells at a single location in each of these figures (indicated by a circle in Fig. 6, A–C) to neurophysiological responses. In all three cases, the direction of ownership indicated by the pair of model B cell responses is consistent with experimental findings. Notice that the magnitude of the model's side-of-figure distinction is considerably smaller along the inner arm of the C-shape than for the other two shapes (Fig. 6, B and D). Zhou et al. (2000) reported a similar trend in the proportion of neurons they observed with the correct border-ownership modulation for these three stimuli (cf. their Fig. 27), which they attributed to an incomplete use of available cues by B cells.

Zhou et al. (2000) also found that the responses of border-ownership cells for the single square were fairly insensitive to stimulus size. In our model, G cell feedback provides B cells with a broad range of contextual information about local edges. Accordingly, as Fig. 7 (left column) shows, our model is able to maintain its border-ownership distinction over a similar range of stimulus sizes as the neurons in area V2.



 
FIG. 7. Comparison of model performance to neurophysiological findings for border ownership. Left column: border-ownership responses of real and model cells persist over a range of sizes of the square stimulus. Right column: real and model cells have similar classical receptive field properties, demonstrating the lack of activity when the figure is outside the receptive field, despite the large amount of contextual modulation in the model. Left column, top 2 rows: stimulus configuration. Size of square increases >3-fold from left to right, with either right (A) or left (B) edge placed in cell's receptive field (indicated by small oval). Third row: responses of a B cell recorded in area V2 of an alert, behaving macaque; bars indicate average firing rate of neuron to each edge of square. Bottom row: response of a model B cell to analogous stimulus conditions. Right column, top row: stimulus configuration. Left and right: stimulus (white edge) is outside the classical receptive field. Center: stimulus is in the receptive field. Middle row: firing rate as function of distance between the center of the oriented edge and the center of the receptive field. Cell responds only when the stimulus is within the receptive field. Three positions in the top row are indicated by arrows. Bottom row: response of a model border-ownership cell to analogous stimulus conditions. Parts of this figure were reproduced from Figs. 5 and 13B of Zhou et al. (2000), with permission.

 
In our model, feedback is applied through inhibition to rectifying B cells. This scheme accounts naturally for experimental observations that surround stimulation has only a modulatory effect on responses to stimuli in the classical receptive field (Zhou et al. 2000; their Fig. 13). Figure 7 (right column) shows that despite a broad range of image context integration for border ownership, no response is elicited from either the observed cells (Fig. 7, middle right) or model cells (Fig. 7, bottom right) unless an edge is placed directly in their classical receptive field. It is important to show that the activity produced by the model is confined to the locations of the contours as defined in the stimulus. In neurophysiology, this rule corresponds to the observation of small, sharply defined minimum response fields. Figure 13, A and B of Zhou et al. (2000) illustrates this for a border-ownership–selective cell and Fig. 7 shows that the present model is consistent with this observation.

As discussed in the Background section, another important consideration is the timing of contextual integration. Figure 8A shows the time course of the model's responses for square stimuli at two different sizes. In agreement with physiological findings (Fig. 2), it can be seen that the model's border-ownership signals emerge with only a short delay following the onset of the edge responses (~20 ms in these simulations) and that this delay does not vary with the size of the square stimulus (i.e., the solid and dashed red curves emerge at the same time). Moreover, both signals rise at the same rate and reach their half-maximal height well before 100 ms, demonstrating that context integration in the reentrant circuits is rapid and independent of stimulus size.



 
FIG. 8. Timing of model border-ownership signals does not depend on the spatial extent of image context integration. Activity of model border-ownership cells centered on the edge of a square is shown for square sizes of 3° (full lines) and 8° (dashed lines). Border-ownership signals (red) were defined as the difference between the responses of a pair of opposing model cells. Edge signals (defined as the sum of the 2 responses) are also shown for comparison (black). Amplitudes of BOS signals and edge signals are scaled differently, but the signals for the 2 figure sizes are plotted on the same scale. A: responses of the model configured as depicted in Fig. 3B and described in METHODS. Note that the BOS signals for both square sizes (red lines) emerge after the same, short (20-ms) delay after the onset of the edge responses (black lines), and that both signals rise at the same rate, reaching their half-maximal value well before 100 ms. B: responses of the model with inhibitory self-feedback applied to the B cells through local interneurons ("I"; see inset), as a form of gain control. This additional feature is sufficient to account for the overshoot seen at the onset of the physiological border-ownership signals. It also introduces damped oscillations that are suggestive of those in the physiological responses. Parameter values used for this simulation were α = 2, τI = 10 ms, and γ0 = 18 (see METHODS); all other model parameters were identical to those of the simulations presented in the other figures and described in METHODS. For comparison to physiological data under similar stimulus conditions, see Fig. 2.

 
Figure 8B shows that modifying the model slightly, by including inhibitory self-feedback to the B cells through local interneurons (a common gain control mechanism in cortical circuits; cf. Wilson 1999), is sufficient to account for the additional overshoot seen at the onset of the physiological border-ownership signals. This improves the agreement of the model's time courses with the data without changing its steady-state assignments and makes it even more apparent that the model's border-ownership signals are rising at the same rate: note that the initial portions of the two red curves overlap and that the border-ownership signal for the large square (red, dashed curve) reaches its peak slightly earlier than the signal for the small square (red, solid curve). The inclusion of this gain control mechanism also introduces damped oscillations that are suggestive of those seen in the physiological border-ownership signals. We have not attempted to model the later structure of the time courses here because these are likely to reflect a mixture of additional influences (for instance, feedback from higher cortical areas) that are outside the scope of the present study.

Model results: predictions

In addition to using the stimuli that had been used in the neurophysiological recordings by Zhou et al. (2000), we tested our model with three new stimulus edge maps. Figure 9A shows a modified version of the C-shape, in which the contour that formed the inner arm of the "C" now appears to be owned by an occluding rectangle to the right of the contour. As is seen by the arrows plotted in the figure, the model correctly reverses its border-ownership assignment along this contour. Similarly, the stimulus in Fig. 9B is commonly perceived by viewers as a vertical bar occluding a horizontal bar of the same length, rather than as a vertical bar flanked by two squares. The model again produces results that are in agreement with perception, which is particularly remarkable given the strong model responses for isolated square figures shown earlier.

Although all stimuli considered thus far have been closed figures or combinations thereof, our model is not limited to this stimulus set. As Fig. 9C shows, the model is also able to account for more general perceptual grouping based only on the proximity of contours. Furthermore, if we introduce cues of an alternate figure–ground relationship (Fig. 9D), the model's assignments reverse, in agreement with perception. No electrophysiological results are as yet available for these stimuli so our model results are genuine predictions. The model response shown in Fig. 9C is important because it demonstrates a general grouping ability and shows that contour closure is not required. Thus the model will produce robust border-ownership assignment even in situations where the input edge map is incomplete, as is the rule when natural images are processed. Models based on collinear facilitation cannot relate distant, isolated contours as in Fig. 9C, and are thus unlikely to yield border-ownership assignment in such displays (see DISCUSSION).


    DISCUSSION
 
Our network is constructed from generic model neurons that communicate with each other using firing-rate–based mechanisms. We introduce two classes of neurons: B cells and G cells. The basic hypothesis is that at a given location and orientation, input from V1-like edge detector cells (that do not respond differentially to side-of-figure) is routed to a pair of B cells, with each member of this pair responding preferentially to a figure located on one or the other side of an edge. Groups of B cells project to G cells such that activation of a specific G cell represents the presence of a visual object at or near a given location. The two B cells forming a pair inhibit each other and their competition is biased by feedback from the G cells, resulting in border-ownership–sensitive responses. These ownership assignments are then passed on to subsequent stages of form analysis. As we discuss later, in addition to facilitating border-ownership assignment through contour grouping, our G cells also provide a specific, intermediate-level structure for the attentional selection of these assignments that is lacking in other models. The assumption that each physiological B cell has a counterpart with which it forms mutually inhibitory connections implements a processing strategy that exploits the physical law that an occluding contour can be "owned" in only one direction at a time. We propose that the B cells in our model reflect the average behavior of the border-ownership–selective neurons recorded in primate visual cortex (Zhou et al. 2000).

Reentrant versus intrinsic circuits

A key premise of our model is that contextual modulation of border-ownership cell responses occurs through recurrent interactions with subsequent levels of the cortical hierarchy (i.e., grouping cells), rather than within-area, lateral interactions. Our choice of mechanisms is supported by a number of physiological and functional considerations. Foremost among these is the relationship between border-ownership–modulation latencies and figure size.

Lateral interactions may well play a role in context integration. However, as we have argued in the Background section, models relying exclusively on signal propagation through horizontal fibers within V2 are unrealistic because they imply long conduction delays, in contradiction to the neurophysiological findings. This is in contrast to Zhaoping (2005), who presented a model based on within-V2 connections in which border-ownership signals emerge without excessive delays. We have estimated the delays produced by bridging the cortical distances in V2 corresponding to 4° of visual angle, the minimum required for generating a border-ownership signal in the center of the edge of a square figure of 8° size. Based on the published median conduction velocity measured for horizontal fibers in V1 (Girard et al. 2001) we estimated that axonal conduction alone (not counting synaptic transmission times) would produce delays of the border-ownership signal of 70–90 ms relative to the edge signals, whereas only 30 ms has been found (Fig. 2).

Zhaoping (2005), citing Angelucci et al. (2002), argues that it is reasonable to assume that the longest horizontal fibers in V2 bridge a distance corresponding to 3° and that the conduction for that length would take 8–10 ms. This means that, for a square of 8° size, transmitting information from a corner to the center of a side of the square (a cortical distance corresponding to 4°) would require only one relay of activity, and conduction would cause delays of little more than 10 ms. However, our calculations indicate that the visual angle corresponding to the longest horizontal fibers is much smaller. Angelucci et al. (2002) measured the extent of the fields that were labeled by tracer injections (in V1) and found 6–8 mm on average (see their Table 1), corresponding to a maximum length of fibers of 3–4 mm. According to our estimates of cortical distances in V2 cortex (see Background), a visual angle of 4° in a typical situation corresponds to 21- to 27-mm distance, which is five- to ninefold the maximum length of intracortical fibers (if one assumes, for lack of comparable measurements in V2, that this length is the same in V2 as in V1). The exact visual angular distance that can be bridged by intracortical fibers depends on the eccentricity. For example, Fig. 4 of Angelucci et al. (2002) shows that, at an eccentricity of 6.5°, the radius subtended by lateral connections in layer 4B of V1 is about 2°. Most of the data of Zhou et al. (2000) came from smaller eccentricities (median of 1.5° for V1, 2.0° for V2) where the cortical magnification factor is higher and the longest length of lateral connections corresponds to smaller visual angles. Clearly, more comprehensive studies of the neuroanatomy and topography of area V2 are needed to make a definitive argument.
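The arithmetic behind the five- to ninefold figure can be laid out explicitly. The sketch below uses only numbers quoted in the text; the conduction velocity is not taken from Girard et al. (2001) directly but is the value implied by pairing the 70–90 ms delay estimate with the 21- to 27-mm distance, which is an inference made here for illustration.

```python
import math

cortical_distance_mm = (21.0, 27.0)   # V2 distance for 4 deg of visual angle
max_fiber_length_mm = (3.0, 4.0)      # longest horizontal fibers (V1 data)
conduction_delay_ms = (70.0, 90.0)    # estimated axonal conduction delay

# Ratio of the distance to be bridged to the longest available fiber:
# smallest distance over longest fiber, largest distance over shortest fiber.
fold_low = cortical_distance_mm[0] / max_fiber_length_mm[1]
fold_high = cortical_distance_mm[1] / max_fiber_length_mm[0]

# Minimum number of fibers that must be chained end to end in each case.
fibers_low = math.ceil(fold_low)
fibers_high = math.ceil(fold_high)

# Conduction velocity implied by the quoted distances and delays (mm/ms = m/s).
implied_velocity = (cortical_distance_mm[0] / conduction_delay_ms[0],
                    cortical_distance_mm[1] / conduction_delay_ms[1])
```

The ratio ranges from 5.25 to 9, so at least six to nine fibers would have to be chained, in contrast to the single relay assumed by Zhaoping (2005); both ends of the delay range imply the same velocity of about 0.3 m/s.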

Besides the conduction delays, the scale of the retinotopic representation of V2 alone shows that models of figure–ground organization that rely exclusively on intracortical circuits are implausible. Signals would have to be relayed through several neurons, requiring that every neuron in the chain produce action potentials. This contradicts the small size of the classical receptive fields. Most neurons in foveal and parafoveal V2 do not respond to stimuli outside a radius of 2° from the center of their receptive field (even in situations where neurons signal illusory contours, activity spreads over only small distances; Peterhans and von der Heydt 1989). This feature of the neurons characterizes the precision of spatial localization. To be compatible with this basic feature, models can use propagation of border-ownership signals only along chains of neurons that are also directly activated by contrast borders. This means, for example, that intracortical network models cannot produce grouping between two parallel lines, as shown in Fig. 9C, unless the cortical representations of the two lines are within the reach of monosynaptic connections. Taking the measurements from V1 by Angelucci et al. (2002) (for lack of measurements in V2), this would be distances of <4 mm in cortex (corresponding to about 1° visual angle at 2° of eccentricity; Gattass et al. 1981).

Grouping cells

The grouping cell architecture offers a number of advantages over mechanisms that have been proposed in previous models. By forming recurrent circuits between different cortical areas, G cells are able to integrate and disseminate contextual information rapidly, over much greater distances than purely feedforward or within-area lateral connections (Baek and Sajda 2005; Grossberg and Raizada 2000