1Department of Biophysics, 2Department of Neuroscience, and 3The Zanvyl Krieger Mind/Brain Institute; Johns Hopkins University; Baltimore, Maryland
Submitted 24 February 2007; accepted in final form 8 April 2007
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Background
The problem of figure-ground organization was previously addressed by several theoretical studies. In this section, we briefly review the models proposed by these studies in the context of recent neurophysiological findings. Although some were quite effective in resolving the occlusion structure of images and successfully explained the perceptual phenomena, we argue that consideration of neural coding mechanisms, distance and speed limitations on cortical processing, and the need for an interface to central processes such as selective attention requires a different approach. In subsequent sections, we present an alternative model architecture that satisfies these constraints.
PRINCIPLES OF CODING. Two general schemes have been proposed for figure-ground representation: region labeling and border-ownership coding. Region labeling differentiates regions by labeling the corresponding elements in an isomorphic surface representation (like labeling pixels in a bitmap), whereas border-ownership coding operates on the contour representation and thus involves orientation-selective neurons.
One class of models has used region labeling, assuming, for example, that color/brightness signals spread within a cortical sheet to fill in the regions within the boundaries given by the contour representation (Arrington 1994; Gerrits and Vendrik 1970; Grossberg 1994; Grossberg and Mingolla 1985; Roelfsema et al. 2002). Although such mechanisms may be appealing theoretically, it is not clear whether spreading of color/brightness signals in the visual cortex occurs (Roe et al. 2005; Rossi and Paradiso 1999; von der Heydt et al. 2003). Figure-ground organization has been studied in area V1, where neural responses representing a figure region were found to be enhanced compared with responses representing the ground region (Lamme 1995; Lee et al. 1998; Zipser et al. 1996). This could be interpreted as region labeling.
The present model is based on the data from recent studies of border-ownership coding. We chose to model these results because the border-ownership signals, particularly those in area V2, tend to be stronger and emerge earlier than the figure enhancement in V1, suggesting that the latter is the result of feedback from V2 or other extrastriate areas (cf. Lamme et al. 1998). Zhou et al. (2000) found that orientation-selective cortical neurons, which are usually thought to represent local contour properties such as edge contrast and orientation, are also sensitive to the global configuration of contours and code for border ownership. A neuron may respond to a contrast edge with a high firing rate if the edge is a contour of a figure on one side of the receptive field, but with a low firing rate if it is a contour of a figure on the other side. Many of these neurons combine side-of-figure selectivity with selectivity for the depth order of surfaces, as defined by stereoscopic cues (Qiu and von der Heydt 2005) or by dynamic occlusion (von der Heydt et al. 2003). This shows that side-of-figure selectivity is not just a random asymmetry in the wiring of receptive fields, but reflects the visual system's striving for a 3D interpretation of the image. Thus the side-of-figure-related response modulation reflects figure-ground organization. The most intriguing aspect of the neurophysiological findings is the influence of the image context. Apparently, V2 neurons have some knowledge of global shape even when the shapes are much larger than their receptive fields.
In essence, border-ownership selectivity means that each contour element is represented by two pools of neurons, one for each side of ownership, whose differential activity codes for border ownership (more precisely, the degree of border-ownership assignment), whereas their common activity codes for the local contour attributes such as orientation, motion, color/luminance contrast, and so forth. It is interesting to see the assignment of borders to regions, which is relational information, encoded in the firing rate of neurons like other contour attributes. It was often assumed that coding of relations between features requires a qualitatively different mechanism such as synchronized oscillation across neurons (Singer and Gray 1995). The preceding results suggest that border ownership is represented by opponent channels, just as light and dark are represented by on- and off-center ganglion cells, and direction of motion by neurons in MT cortex. Although the evidence described earlier comes from recordings in monkey visual cortex, psychophysical experiments indicate that border-ownership-selective neurons also exist in the human visual cortex (von der Heydt et al. 2005). Taken together, these findings indicate that the visual system explicitly represents border ownership at an early cortical level following the stage of local feature representation.
Representing figure-ground relationships in terms of border ownership seems plausible, for theoretical as well as physiological reasons. The contours carry most of the information about the shape of objects. Consequently, as discussed earlier (Fig. 1), border-ownership assignment is critical for object recognition. The vast majority of neurons in V1 and V2 are edge selective and orientation tuned, and a comparison between the activity evoked by the borders and the interior of a figure shows that the border signals are five- to sixfold stronger than the surface signals (Friedman et al. 2003).
Several studies have modeled figure-ground organization in terms of border-ownership coding. Some of these assume that image context integration is achieved by lateral propagation of signals within the image representation, such as by horizontal fibers in area V2 (Baek and Sajda 2005; Kikuchi and Akashi 2001; Nishimura and Sakai 2004, 2005; Pao et al. 1999; Zhaoping 2005). As will be discussed in the next section, there are physiological constraints that limit the speed of image context integration in this type of architecture. Other models advocate feedback from higher cortical areas (Finkel and Sajda 1992; Kienker et al. 1986; Kikuchi and Fukushima 2003; Sajda and Finkel 1995; Yu et al. 2001). These models are not constrained by the limitations of feedforward and lateral connections.
Perhaps the most extensive of these models, in terms of its ability to integrate multiple cue types and bind features for visual surface construction, is that of Sajda and Finkel (1995). It uses algorithms that identify contour terminations, resolve junctions, and identify closed contours, enabling the system to "tag" complete contour segments. Border ownership, represented as a binary value, is then assigned for each segment. Although this model implements border-ownership coding, and thus parallels the physiology better than region-labeling models, questions about its neural implementation remain. For example, it is unclear how the identification of contour segments (which is a global operation) might be realized and how the tags might be represented. The authors suggest coherent oscillation of the neurons representing the same segment as a possible mechanism. However, the functional role of such oscillations in primate visual cortex under awake conditions has been debated (Bair et al. 1994; Young et al. 1992). Our own recordings from pairs of cells (n = 37) failed to show significant coherent oscillations, whether both cells were activated by a common contour segment or by different, unrelated contour segments (F Qiu, H Schütze, and R von der Heydt, unpublished observations). Also, because we now know that border ownership modulates firing rates, the coherent oscillation hypothesis appears to be a rather remote possibility. Finally, the gradual nature of the neural border-ownership signals (Zhou et al. 2000) argues against binary coding and the global tagging scheme. The existence of neurons with a fixed border-ownership preference calls for a revision of the basic concept.
PHYSIOLOGICAL CONSTRAINTS. The recent physiological results place important constraints on modeling. First, the extent of visual context integration in border-ownership modulation is much larger than the classical receptive field. Second, the border-ownership signal emerges with a short latency. Figure 2 illustrates the time course of border-ownership signals (and edge signals, for comparison) in the critical tests. The stimulus configuration is shown schematically at the top. The receptive field (ellipse) was stimulated with a straight edge that could be the border of a square either on one side or on the other. The important point is that the entire region of visual field occupied by the two squares received identical stimulation in the two conditions. Thus any difference between the responses indicates an influence from stimulus features outside this region. The size of this region (and thus the minimum extent of spatial context that needs to be integrated for border-ownership assignment) is given by the size of the squares. In the experiment illustrated in Fig. 2, square sizes of 3 and 8° of visual angle were tested. Thus the region of identical stimulation was either 3 × 6° or 8 × 16°. The black curves show the time course of the edge signals (the average of the firing rates for the two figure locations) and the red curves show the border-ownership signals (the difference between the firing rates for the two figure locations). It can be seen that the border-ownership signal emerges well before 100 ms, and with only a small delay after the edge signals (which are representative of V2 neurons in general). Importantly, there also appears to be no difference in latency between the border-ownership signals for large and small figures. That is, context integration over larger distances in the visual field does not take more time than context integration over smaller distances.
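The two signals plotted in Fig. 2 can be illustrated with a minimal sketch. It assumes two firing-rate traces recorded for the two figure locations; the function name and the example rates are hypothetical and are not data from the recordings.

```python
def edge_and_ownership_signals(rate_side_a, rate_side_b):
    """Split paired responses into an edge signal and a border-ownership signal.

    rate_side_a, rate_side_b: firing rates (spikes/s) over time for the two
    figure locations (figure on one side of the edge vs. the other).
    """
    if len(rate_side_a) != len(rate_side_b):
        raise ValueError("both traces must cover the same time bins")
    # Edge signal: average of the firing rates for the two figure locations.
    edge = [(a + b) / 2.0 for a, b in zip(rate_side_a, rate_side_b)]
    # Border-ownership signal: difference between the two firing rates.
    ownership = [a - b for a, b in zip(rate_side_a, rate_side_b)]
    return edge, ownership


# Hypothetical traces (illustrative numbers only).
figure_on_preferred_side = [0.0, 10.0, 30.0, 28.0]
figure_on_other_side = [0.0, 10.0, 18.0, 12.0]
edge, ownership = edge_and_ownership_signals(
    figure_on_preferred_side, figure_on_other_side
)
```

In this toy example the edge signal rises first and the difference signal follows, which is the qualitative pattern described for the recorded neurons.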
A model of border ownership that aims to account for the recent neural data, as well as the perceptual phenomena, assumes that context integration occurs by horizontal fibers within V2 (Zhaoping 2005). This parsimonious model reproduces the observed border-ownership data from assumptions about the lateral connectivity in V2. Briefly, it posits that neurons with nearby receptive fields are linked by excitatory and inhibitory connections depending on whether the corresponding border segments are consistent with being contours of the same figure.
The assumption made in this model (Zhaoping 2005) and others (Baek and Sajda 2005; Grossberg 1994; Kikuchi and Akashi 2001; Nishimura and Sakai 2004, 2005; Pao et al. 1999), that image context integration occurs within the area, through horizontal fibers, implies that the latency of the border-ownership signal would increase with the distance of the relevant image context from the receptive field under consideration. This is a consequence of the retinotopic representation of visual information in V2 cortex.
As pointed out before (Zhou et al. 2000), the distances in V2 cortex are considerable and the conduction through intracortical fibers is probably too slow to explain the short latencies of border-ownership signals. Conduction velocity estimates for these fibers range between 0.1 and 0.25 m/s in cat V1 (Bringuier et al. 1999; Grinvald et al. 1994) and 0.33 m/s in monkey V1 (Girard et al. 2001); we are not aware of corresponding data for V2. Note that these figures are median values; there is a range of conduction velocities of single fibers, and it has been argued that longer fibers might conduct much faster than shorter fibers (Zhaoping 2005).
To make a quantitative argument, it is necessary to consider the topography of area V2. A look at the well-known illustration of the unfolded cortical areas (Felleman and Van Essen 1991) shows that V2 is a large area whose elongated shape is quite different from that of V1. In V2, the visual field representation is split at the horizontal meridian into a ventral part and a dorsal part that are connected only by a narrow bridge. Detailed maps of V2 (Gattass et al. 1981) show that intracortical fibers would have to span considerable distances to explain the findings on border-ownership coding (Qiu and von der Heydt 2005; Zhou et al. 2000).
As an example, consider responses produced by a square figure of 8° side length. When an edge of the square is centered on the receptive field of the neuron under study, which was the condition used in the cited studies, the closest points that can provide border-ownership information are the corners on both ends of the edge, at a distance of 4° visual angle from the receptive field center. The representation of one of those corners is generally also the nearest point in cortex to the neuron where such information is available. From Fig. 9 of Gattass et al. (1981) it can be seen that two neurons with receptive fields located 2.5 and 6.5° below the center of gaze (cells 5 and 11) are separated by 21 mm [because no scale bars are provided, we estimated the scale based on brain sections of the monkeys of Zhou et al. (2000) to be 3:1]. Thus if the vertically oriented edge of the square were centered on the receptive field of cell 5, then the bottom corner would be represented at a distance of 21 mm. The representation of the top corner would be even farther away, in the ventral part of V2. For edges of horizontal orientation the situation is similar (consider cells 61 and 53): if the center of the edge is at 0.6° horizontal eccentricity, one corner would be at 4.6° eccentricity on the horizontal meridian and represented in cortex at a distance of about 27 mm, and the other corner would be represented in the opposite hemisphere. Because the maximal length of horizontal connections is only 3-4 mm (measured in V1; Table 1 in Angelucci et al. 2002), the signals would have to be relayed many times through a cascade of neurons. Moreover, activity cannot be relayed through cells that are not directly stimulated (this would contradict the notion of the classical receptive field: stimuli outside this receptive field generally do not elicit responses). Thus signals could propagate only through neurons that are excited by the given contrast borders.
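The scale of the problem can be made concrete with a rough calculation. The sketch below uses only the cortical distances and median conduction velocities quoted above and computes pure conduction time. It ignores synaptic relay delays and the possibility, noted above, that longer fibers conduct faster, so it is an order-of-magnitude illustration, not an estimate from the original study. The function name is hypothetical.

```python
def conduction_delay_ms(distance_mm, velocity_m_per_s):
    """Pure conduction time in ms (1 m/s is the same as 1 mm/ms)."""
    if velocity_m_per_s <= 0:
        raise ValueError("velocity must be positive")
    return distance_mm / velocity_m_per_s


# Cortical distances quoted in the text for an 8 deg square (mm).
distances_mm = {"vertical edge, bottom corner": 21.0,
                "horizontal edge, near corner": 27.0}
# Median conduction velocities quoted in the text (m/s).
velocities = {"cat V1, low": 0.1, "cat V1, high": 0.25, "monkey V1": 0.33}

delays = {
    (where, source): conduction_delay_ms(d, v)
    for where, d in distances_mm.items()
    for source, v in velocities.items()
}
for (where, source), delay in sorted(delays.items()):
    print(f"{where:30s} {source:12s} {delay:6.1f} ms")
```

Even at the fastest of these median velocities, 21 mm corresponds to roughly 64 ms of conduction time alone, which is difficult to reconcile with a border-ownership signal that follows the edge response after only a small delay.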
FIGURE-GROUND ORGANIZATION AND MECHANISMS OF ATTENTION. Our motivation for the present study comes also from the need for a more general point of view. Most of the existing models leave open the question of how the figure-ground assignments they compute influence, and are influenced by, higher-level processes (some exceptions are Carpenter and Grossberg 1993; Grossberg et al. 1994; Vecera and O'Reilly 1998). The models transform one retinotopic representation into another in which regions are labeled as figure and ground, or contour segments are assigned border ownership. Although the result is an improved representation, it is essentially no more than an enhanced image. To avoid referring the further processing to a homunculus that views this internal image, we have to offer at least a hypothesis of how the figure-ground representation produced by the model will interact with higher-level processes, specifically processes of selective visual attention and form recognition. As pointed out, the shapes of figure regions capture attention and are remembered, whereas the shapes of ground regions often go unnoticed (Fig. 1B). This shows that figure-ground organization plays a role in selective attention and that the two must be closely related.
Kienker et al. (1986) modeled figure-ground segregation in a purely top-down fashion, with border ownership being determined by the location of an attentional spotlight. In contrast, most recent models treat figure-ground segregation as a purely bottom-up process, relying on local, within-area interactions at the lower stages of the visual hierarchy (e.g., Zhaoping 2005). Neither of these approaches offers a satisfactory explanation of how the whole system can function, barring the unacceptable solution of a homunculus that, in the case of the first model, determines where to shine the top-down attentional spotlight without the benefit of an organization of the sensory information or, in the case of the latter model, a homunculus that interprets the transformed versions of the retinal image created by the autonomous, bottom-up circuits. One solution is to assume iterative interaction between bottom-up signals and memory-related top-down signals in a way that converges over time (Carpenter and Grossberg 1993; Vecera and O'Reilly 1998). Inasmuch as these algorithms rely on the filling-in hypothesis and lateral signal propagation in cortex, though, the earlier concerns regarding physiological plausibility and the latency problem remain the same as for the other models.
In the present model, we propose dedicated neural circuits for perceptual grouping and figure-ground organization that also provide handles for attentional selection. Because our circuits for image context integration are separate from those representing the visual sensory information, they may include neurons in a higher-order cortical area and use recurrent white-matter projections, explaining the size invariance of the latency of the border-ownership signal. Our framework is consistent with recent neurophysiological findings (Zhou et al. 2000) and makes specific, testable predictions regarding both the physiological mechanism of border-ownership determination and the functional role of this mechanism in higher-level visual processing. Portions of this report were previously presented in abstract form (Craft et al. 2004; Schütze et al. 2003).
| METHODS |
|---|
Figure 3A shows the overall architecture of our model network and Fig. 3B highlights specific aspects of its connectivity. Input is provided by a stimulus map composed of oriented-edge detectors Cθ, akin to the topographic representation of a scene by complex cells in primary visual cortex (V1). Because an edge can be owned on either of its two sides by a figure, each edge detector Cθ provides input to a pair of mutually antagonistic border-ownership cells, one for each direction of ownership, Bθ+ and Bθ− ("B cells"; see Fig. 5A for direction notation). The B cells inhibit each other (see the labeled connections in Fig. 3B). The current version of the model uses only horizontal and vertical edges, resulting in four border-ownership channels. The B cell pairs receive additional input from end-stopped cells, Eθ+ and Eθ− (see Border-ownership cells and Fig. 5A).
Bθ+ provides input to grouping cell G, and G in turn inhibits Bθ− (both connections are labeled in Fig. 3). This is functionally similar to a circuit in which G applies positive feedback directly to Bθ+, but does not require any additional mechanisms to preserve the classical receptive field property of the B cells (see RESULTS), allowing us to focus on the performance of the grouping mechanism. Note also that the inhibitory connections in Fig. 3 require inhibitory interneurons that we omitted in this schematic for the sake of clarity.
Grouping cell connections
Each pixel of the connection pattern images in Fig. 4A indicates the point of origin and strength (in grayscale coding; darker meaning stronger) of a single feedforward connection from a B cell at the perimeter of the annulus to the G cell at the center. These connection patterns are generated by convolving circles of the desired radii (r, Fig. 5B) with a normalized 2D Gaussian filter (parameter σ, Fig. 5B), resulting in the annuli shown. The annular regions become increasingly diffuse at larger radii to preserve scale invariance, and we normalize the strength of the synaptic inputs (connection weights) to give a common total weight (of unity) for all radii.
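The construction of one annular connection pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used for Fig. 4A: the one-pixel-wide circle, the window size, and the proportionality between the Gaussian width and the radius (`sigma = 0.25 * r`) are hypothetical choices, as is the function name `annulus_kernel`.

```python
import numpy as np


def annulus_kernel(radius, sigma):
    """Circle of the given radius blurred by a normalized 2D Gaussian.

    The result is scaled so that the connection weights sum to one.
    """
    half = int(np.ceil(radius + 3 * sigma))      # window half-width (assumed)
    size = 2 * half + 1
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # One-pixel-wide circle (assumed discretization).
    circle = (np.abs(np.hypot(x, y) - radius) < 0.5).astype(float)
    # Normalized 2D Gaussian filter.
    gauss = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    gauss /= gauss.sum()
    # Linear convolution via zero-padded FFTs, cropped back to the window.
    full = (2 * size - 1, 2 * size - 1)
    blurred = np.fft.irfft2(
        np.fft.rfft2(circle, full) * np.fft.rfft2(gauss, full), full
    )
    blurred = blurred[half:half + size, half:half + size]
    blurred = np.clip(blurred, 0.0, None)        # remove tiny FFT round-off
    return blurred / blurred.sum()               # common total weight of unity


# Larger radii get proportionally wider blur (hypothetical ratio).
kernels = {r: annulus_kernel(r, sigma=0.25 * r) for r in (4, 8)}
```

Scaling the blur with the radius keeps the annuli geometrically similar across sizes, and the final division gives every radius the same total input weight.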
Next, we determine which border-ownership orientation channel must give rise to each of the connections along the perimeter of these annuli. The preferred direction of border ownership is defined at each point in the connection pattern image by a vector pointing toward the center of the annulus. Because our model currently implements only four orientations of border ownership, we resolve these vectors into positive and negative components along the horizontal and vertical axes of ownership. Based on these vector components, we then partition the annuli into separate sets of connections arising from each of the four border-ownership orientation channels (cocircular contour fragments Kθ±(r), Fig. 5C; see APPENDIX for details).
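One way to carry out such a partition is sketched below. The exact weighting is specified in the APPENDIX, which is not reproduced here, so the rule used in this sketch (rectified components of the inward-pointing unit vector, normalized so that the four parts add up to the original kernel) is an assumption, and the channel names and the function `partition_by_ownership` are hypothetical.

```python
import numpy as np


def partition_by_ownership(kernel):
    """Split a square, odd-sized kernel into four ownership channels.

    Each pixel's weight is shared among channels according to the rectified
    horizontal and vertical components of the unit vector pointing from that
    pixel toward the kernel center (assumed weighting rule).
    """
    half = kernel.shape[0] // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    dist = np.hypot(x, y)
    dist[half, half] = 1.0                  # avoid division by zero at center
    ux, uy = -x / dist, -y / dist           # unit vector toward the center
    total = np.abs(ux) + np.abs(uy)
    total[half, half] = 1.0
    components = {
        "right": np.maximum(ux, 0.0),       # figure lies to the right
        "left": np.maximum(-ux, 0.0),       # figure lies to the left
        "down": np.maximum(uy, 0.0),        # row index increases downward
        "up": np.maximum(-uy, 0.0),
    }
    return {name: kernel * w / total for name, w in components.items()}


# Demonstration on a simple one-pixel-wide ring of radius 3.
yy, xx = np.mgrid[-4:5, -4:5]
ring = (np.abs(np.hypot(xx, yy) - 3) < 0.5).astype(float)
ring /= ring.sum()
parts = partition_by_ownership(ring)
```

With this rule, a connection on the left side of the annulus originates entirely from the channel whose preferred ownership direction is rightward, while connections on the diagonals are shared between a horizontal and a vertical channel.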
All G cells receive their input through a translated version of this same relative pattern of connections. In such a homogeneously connected network, 2D cross-correlation (i.e., spatial filtering) provides a natural description of the interactions between layers, using the fixed connectivity pattern as the correlation kernel. The four annulus components, Kθ±(r), are correlated separately with the four border-ownership channels, Bθ±, as described by
![]() | (1) |
where the operator in Eq. 1 denotes the 2D cross-correlation operation.
If we write Sθ±(r) as shorthand for the cross-correlation of Bθ± with Kθ±(r), we can regard the input from each kernel as representing individual curvature segments, Sθ±(r)(x,y), with center of curvature (x,y), radius of curvature r, and orientation of ownership θ±. We would like the G cell activity to reflect a preference for co-occurrences among these segments, so rather than summing them linearly, we combine them as follows
![]() | (2) |
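The cross-correlation step that produces each segment map can be sketched in a few lines. This illustrates only the spatial filtering of one border-ownership channel with one kernel, using zero padding at the map boundaries. It does not implement the nonlinear combination of Eq. 2, and the function name is hypothetical.

```python
import numpy as np


def cross_correlate_same(activity, kernel):
    """2D cross-correlation with zero padding; output matches `activity`.

    out[y, x] = sum over (dy, dx) of kernel[dy, dx] * activity[y + dy - ph,
    x + dx - pw], i.e., the kernel is centered on (y, x) and is not flipped.
    The kernel must have odd height and width.
    """
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    rows, cols = activity.shape
    padded = np.pad(activity, ((ph, ph), (pw, pw)), mode="constant")
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


# A single active B cell and a small asymmetric kernel, for illustration.
b_map = np.zeros((5, 5))
b_map[2, 2] = 1.0
k = np.arange(9, dtype=float).reshape(3, 3)
s_map = cross_correlate_same(b_map, k)
```

Each entry of `s_map` is the weighted input that a G cell centered at that location would receive from the B cell map through the given kernel.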
The activity of the G cells is then governed by firing-rate equations of the form
![]() | (3) |
where τG is a time constant common to all G cells, and a radius-dependent factor scales the feedforward connection weights for each G cell receptive field size. Again, the boldface symbol G represents a 2D array, G(x,y), of neuron firing rates.
Border-ownership cells
As was illustrated in Fig. 3, every feedforward connection from a cell Bθ± to a G cell is accompanied by an inhibitory feedback connection from that same G cell to the opposing border-ownership cell, Bθ∓. Because pixels in the annulus images shown in Fig. 4A represent border-ownership cell locations that a particular G cell "observes," we can characterize the feedback connections by asking the reciprocal question: What is the pattern of all G cells that a single B cell location observes? Mathematically, this pattern can be obtained directly from the feedforward correlation kernels Kθ±(r) described in the previous section (see APPENDIX). As with the grouping cells, these feedback connections to the B cells are implemented using spatial filtering [cross-correlation of G(r) with Kθ±(r)]. Together with a term, proportional to the activity of the paired cell, that accounts for the mutual inhibition between a pair of border-ownership cells, the sum over these cross-correlations yields the total inhibitory input (see Eqs. 4 and 5, below) to the border-ownership cells.
Excitatory feedforward input to Bθ± is provided by orientation-selective edge detectors, Cθ (see Fig. 5A for directional notation). Additional contributions from local stimulus features can also be taken into account in the input stage. For instance, cues such as binocular disparity and dynamic occlusion contribute to the perception of border ownership and have been shown to influence the neural border-ownership signals accordingly (Qiu and von der Heydt 2005; von der Heydt et al. 2003). In the present study, we model the influence of T-junctions as one example of such local cues.
T-junctions are represented in visual cortex by the activity of end-stopped cells, which are common in V1 and V2 (Heider et al. 2000; Hubel and Wiesel 1968). Because these cells respond selectively to terminations of edges and lines, they are well suited for detecting occlusion features (Heitger et al. 1992). Model end-stopped cells have been used successfully in prior computational models for representing occluding contours (Heitger and von der Heydt 1993; Heitger et al. 1998).
In our model, oriented edge segments that form the "hat" of a T-junction (encoded by complex cells, Cθ) are biased toward ownership on the side where termination of the "stem" of the junction (encoded by end-stopped cells, Eθ±) suggests occlusion (see Fig. 5A). It is important to note that this T-junction bias does not depend on specialized detectors that classify image features. All that is needed are edge-selective elements that are also sensitive to the presence of an intersecting contour on one side, similar to what was proposed in the model of illusory contours of Heitger et al. (1998). B cells whose activity is influenced by orthogonally oriented, end-stopped cells achieve this naturally.
End-stopped cells may either excite or inhibit border-ownership cells to produce this bias (Fig. 3B). Thus the combined input from Cθ and Eθ± effectively yields three distinct levels of total feedforward input: each cell Bθ± may receive a baseline input (proportional to the activity of Cθ) that conveys local edge contrast; a stronger input (e.g., proportional to twofold the activity of Cθ, as a consequence of complementary excitation from Eθ±), where end-stopping indicates ownership in the B cell's preferred direction; or a weaker input (e.g., diminishing toward 0, as a result of inhibition from Eθ∓), where end-stopping indicates ownership in the opposite direction. For simplicity, these three levels of feedforward input were combined into a single term in the stimulus maps, as indicated in Eqs. 4 and 5 using the mathematical shorthand CEθ± (see APPENDIX for more details).
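The three input levels can be illustrated with a small sketch. The multiplicative form below, with end-stopped activities normalized to the range 0 to 1, is an assumption chosen to reproduce the three example levels named above (baseline, twofold, and zero). The actual definition of the combined term is given in the APPENDIX, and the function name is hypothetical.

```python
def feedforward_input(c, e_preferred=0.0, e_opposite=0.0):
    """Combined edge and T-junction input to one border-ownership cell.

    c:            complex-cell (edge contrast) activity
    e_preferred:  end-stopped activity signaling ownership in the cell's
                  preferred direction (assumed normalized to 0..1)
    e_opposite:   end-stopped activity signaling the opposite direction
                  (assumed normalized to 0..1)
    """
    gain = 1.0 + e_preferred - e_opposite   # excitation raises, inhibition lowers
    return c * max(0.0, gain)               # input cannot become negative


baseline = feedforward_input(1.0)                    # edge only
enhanced = feedforward_input(1.0, e_preferred=1.0)   # T-junction favors this cell
suppressed = feedforward_input(1.0, e_opposite=1.0)  # T-junction favors its partner
```

Because both cells of a pair share the same edge input, a T-junction in this scheme raises the input to one cell and lowers the input to its partner, biasing the pair before any contextual feedback arrives.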
The activity of all complementary pairs of B cells is thus described by
![]() | (4) |
![]() | (5) |
where τB is the B cell time constant, a fixed weight sets the strength of mutual inhibition between opposing border-ownership cells, and a radius-dependent factor scales the feedback from the grouping cells. The subscripted plus signs on the brackets indicate that B cell responses are rectified; i.e., if the net input to a B cell becomes negative, its firing rate will not decrease below zero. Based on Eqs. 3, 4, and 5, we implemented a model network of four orientation channels of border ownership and six sizes of G cell receptive fields (see Figs. 4A and 5B), with a resolution of one neuron per pixel in each visuotopic B cell map. To maintain uniform coverage of the B cell layers by each of the different sizes of G cell annuli, the number of G cells present at each scale (radius, r) was decreased in proportion to 1/r² (see DISCUSSION). Each pixel of the B cell map corresponds to the base of an arrow in Figs. 6 and 9 representing the vectorial modulation index, as defined below. For comparison with physiology, we assumed that the size of a pixel in the edge map corresponds to 0.5° of visual angle. Cross-correlations were computed using zero padding at the boundaries of the B cell maps and by expanding the G cell maps to preserve feedforward signals from boundary B cells.
The radius-dependent factors that scale the feedforward and feedback connections are set equal to each other and proportional to r, through a fixed proportionality constant. This gives the recurrent loops between the B cells and the G cells a net weight proportional to r² (the product of the separate feedforward and feedback weights) at each scale r, which counterbalances the similarly proportioned decrease in cell numbers. The strength of mutual inhibition and this proportionality constant represent the only parameters that were used to tune the model. Our results were generated using values of 0.5 and 4.5, respectively, which were chosen based on an exploration of the model's parameter space (see APPENDIX, Fig. A1). Steady-state [dB/dt = 0 and dG(r)/dt = 0] border-ownership assignments and response time courses were then generated by numerically integrating the system of model equations with time constants τB = τG = 10 ms and a one-way conduction delay of 6 ms between the B cell and G cell layers (thus a 12-ms "round trip" for the loop from B cells to G cells and back), as estimated under PHYSIOLOGICAL CONSTRAINTS in the Background section. To keep our model simple and focused on the border-ownership computation, we assumed that the arrays of complex and end-stopped cell firing rates were given; that is, we did not compute them from responses of retinal cells.
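The effect of the conduction delay on the recurrent loop can be illustrated with a toy simulation of a single B cell pair and one G cell. This is not the full model: it does not reproduce Eqs. 3-5 or the spatial kernels. The leaky, rectified firing-rate form, the forward-Euler scheme, the unit loop weights, and all names are assumptions; only the 10-ms time constants, the 6-ms one-way delay, and the circuit layout (B+ excites G, G inhibits B−, the B cells inhibit each other) are taken from the text.

```python
def simulate_loop(duration_ms=300.0, dt=1.0, tau=10.0, delay_ms=6.0,
                  c_plus=1.0, c_minus=1.0, mutual=0.5, w_ff=1.0, w_fb=1.0):
    """Forward-Euler integration of a toy B+/B-/G loop with conduction delay."""
    steps = int(round(duration_ms / dt))
    lag = int(round(delay_ms / dt))
    b_plus, b_minus, g = [0.0], [0.0], [0.0]
    for t in range(steps):
        # Signals crossing between layers arrive one conduction delay later.
        b_plus_delayed = b_plus[t - lag] if t >= lag else 0.0
        g_delayed = g[t - lag] if t >= lag else 0.0
        # Rectified net inputs (firing rates cannot be driven below zero).
        in_plus = max(0.0, c_plus - mutual * b_minus[t])
        in_minus = max(0.0, c_minus - mutual * b_plus[t] - w_fb * g_delayed)
        in_g = w_ff * b_plus_delayed
        # Leaky firing-rate dynamics: tau * dx/dt = -x + input.
        b_plus.append(b_plus[t] + dt / tau * (-b_plus[t] + in_plus))
        b_minus.append(b_minus[t] + dt / tau * (-b_minus[t] + in_minus))
        g.append(g[t] + dt / tau * (-g[t] + in_g))
    return b_plus, b_minus, g


b_plus, b_minus, g = simulate_loop()
ownership_signal = [p - m for p, m in zip(b_plus, b_minus)]
```

In this toy loop both B cells initially respond equally to the edge, and the difference between them appears only after the grouping signal has completed the 12-ms round trip, independent of any distance within the B cell map.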
The strength of the border-ownership signal is described by a generalization of the modulation index. We define a vector quantity by the expression
$$\vec{m}(x,y) = m_{\hat{\imath}}(x,y)\,\hat{\imath} + m_{\hat{\jmath}}(x,y)\,\hat{\jmath} \qquad (6)$$
where î and ĵ are unit vectors along the horizontal and vertical image axis, respectively, and the components mî(x,y) and mĵ(x,y) are the usual modulation indices along their respective axes, defined as
![]() | (7) |
Clearly, both components in Eq. 7 are limited to values between +1 and −1. For the x-component, for instance, a positive value of mî(x,y) signifies that the figure is to the right of position (x,y) and a negative value signifies that the figure is to the left. Its absolute value indicates the "strength" of the border-ownership signal, with zero being equivalent to ambivalence between left and right. The corresponding comments apply to the y-component, mĵ(x,y), regarding the figure's position upward or downward of (x,y). The direction of the vectorial modulation index m(x,y) defined in Eq. 6 indicates the position of the foreground figure in the 2D image plane relative to the point (x,y). For instance, positive values in both components [mî(x,y) > 0, mĵ(x,y) > 0] indicate that the figure is located upward and to the right of (x,y).
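The computation of the vectorial modulation index at one map location can be sketched as follows. The sketch assumes the conventional modulation index, (a − b)/(a + b), for each axis, and assumes that the four channels can be addressed by the side on which they prefer the figure; the argument names and the convention of returning zero when both cells of a pair are silent are hypothetical.

```python
def modulation_index(preferred, opposite):
    """Conventional modulation index, bounded between -1 and +1."""
    total = preferred + opposite
    if total == 0:
        return 0.0          # no response: treat as ambivalent (assumed)
    return (preferred - opposite) / total


def vector_modulation(b_right, b_left, b_up, b_down):
    """Horizontal and vertical components of the vectorial modulation index.

    Positive x-component: figure to the right; positive y-component: figure
    upward, following the sign convention described in the text.
    """
    return (modulation_index(b_right, b_left),
            modulation_index(b_up, b_down))


# Hypothetical firing rates at one location on a vertical edge.
m_x, m_y = vector_modulation(b_right=3.0, b_left=1.0, b_up=1.0, b_down=1.0)
```

Here the vector points to the right with length 0.5, indicating a moderately strong assignment of the border to a figure on the right and no preference along the vertical axis.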
| RESULTS |
|---|
We began by testing our model with stimuli similar to those from the neurophysiological experiments detailed in Zhou et al. (2000). Figure 6, A-C shows edge maps for these stimuli, along with border-ownership assignments made by the model. Arrows on a border point toward the region determined by the model to own the border, and the length of the arrows indicates the "strength" of the border-ownership signal (vectorial modulation index; see METHODS, Eq. 6). As can be seen, the model correctly determines the direction of border ownership (i.e., it assigns the borders to those regions that are perceived as foreground) everywhere for all of these stimuli.
Figure 6D compares the behavior of the model B cells at a single location in each of these figures (indicated by a circle in Fig. 6, A-C) to neurophysiological responses. In all three cases, the direction of ownership indicated by the pair of model B cell responses is consistent with experimental findings. Notice that the magnitude of the model's side-of-figure distinction is considerably smaller along the inner arm of the C-shape than for the other two shapes (Fig. 6, B and D). Zhou et al. (2000) reported a similar trend in the proportion of neurons they observed with the correct border-ownership modulation for these three stimuli (cf. their Fig. 27), which they attributed to an incomplete use of available cues by B cells.
Zhou et al. (2000) also found that the responses of border-ownership cells for the single square were fairly insensitive to stimulus size. In our model, G cell feedback provides B cells with a broad range of contextual information about local edges. Accordingly, as Fig. 7 (left column) shows, our model is able to maintain its border-ownership distinction over a similar range of stimulus sizes as the neurons in area V2.
As discussed in the Background section, another important consideration is the timing of contextual integration. Figure 8A shows the time course of the model's responses for square stimuli at two different sizes. In agreement with physiological findings (Fig. 2), it can be seen that the model's border-ownership signals emerge with only a short delay following the onset of the edge responses (
20 ms in these simulations) and that this delay does not vary with the size of the square stimulus (i.e., the solid and dashed red curves emerge at the same time). Moreover, both signals rise at the same rate and reach their half-maximal height well before 100 ms, demonstrating that context integration in the reentrant circuits is rapid and independent of stimulus size.
Model results: predictions
In addition to using the stimuli that had been used in the neurophysiological recordings by Zhou et al. (2000), we tested our model with three new stimulus edge maps. Figure 9A shows a modified version of the C-shape, in which the contour that formed the inner arm of the "C" now appears to be owned by an occluding rectangle to the right of the contour. As is seen by the arrows plotted in the figure, the model correctly reverses its border-ownership assignment along this contour. Similarly, the stimulus in Fig. 9B is commonly perceived by viewers as a vertical bar occluding a horizontal bar of the same length, rather than as a vertical bar flanked by two squares. The model again produces results that are in agreement with perception, which is particularly remarkable given the strong model responses for isolated square figures shown earlier.
Although all stimuli considered thus far have been closed figures or combinations thereof, our model is not limited to this stimulus set. As Fig. 9C shows, the model is also able to account for more general perceptual grouping based only on the proximity of contours. Furthermore, if we introduce cues of an alternate figure–ground relationship (Fig. 9D), the model's assignments reverse, in agreement with perception. No electrophysiological results are as yet available for these stimuli, so our model results are genuine predictions. The model response shown in Fig. 9C is important because it demonstrates a general grouping ability and shows that contour closure is not required. Thus the model will produce robust border-ownership assignment even in situations where the input edge map is incomplete, as is the rule when natural images are processed. Models based on collinear facilitation cannot relate distant, isolated contours as in Fig. 9C, and are thus unlikely to yield border-ownership assignment in such displays (see DISCUSSION).
| DISCUSSION |
|---|
|
|
|---|
Reentrant versus intrinsic circuits
A key premise of our model is that contextual modulation of border-ownership cell responses occurs through recurrent interactions with subsequent levels of the cortical hierarchy (i.e., grouping cells), rather than within-area, lateral interactions. Our choice of mechanisms is supported by a number of physiological and functional considerations. Foremost among these is the relationship between border-ownership modulation latencies and figure size.
Lateral interactions may well play a role in context integration. However, as we have argued in the Background section, models relying exclusively on signal propagation through horizontal fibers within V2 are unrealistic because they imply long conduction delays, in contradiction to the neurophysiological findings. This is in contrast to Zhaoping (2005), who presented a model based on within-V2 connections in which border-ownership signals emerge without excessive delays. We have estimated the delays produced by bridging the cortical distances in V2 corresponding to 4° of visual angle, the minimum required for generating a border-ownership signal in the center of the edge of a square figure of 8° size. Based on the published median conduction velocity measured for horizontal fibers in V1 (Girard et al. 2001), we estimated that axonal conduction alone (not counting synaptic transmission times) would produce delays of the border-ownership signal of 70–90 ms relative to the edge signals, whereas only 30 ms has been found (Fig. 2).
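The arithmetic behind this estimate can be sketched as follows. This is only an illustration: the conduction velocity is not stated in this section, so the value of 0.3 mm/ms (0.3 m/s) used below is an assumption, chosen because it reproduces the quoted delay range from the 21- to 27-mm cortical distance given for 4° of visual angle later in the DISCUSSION. The function and constant names are hypothetical.

```python
def conduction_delay_ms(distance_mm: float, velocity_mm_per_ms: float) -> float:
    """Axonal conduction delay in ms, ignoring synaptic transmission times."""
    if velocity_mm_per_ms <= 0:
        raise ValueError("velocity must be positive")
    return distance_mm / velocity_mm_per_ms


# ASSUMPTION: 0.3 mm/ms is not given in the text; it is the velocity implied
# by the quoted distance and delay ranges.
ASSUMED_VELOCITY_MM_PER_MS = 0.3

# Cortical distance in V2 corresponding to 4 deg of visual angle (from the text).
V2_DISTANCE_RANGE_MM = (21.0, 27.0)

delays_ms = [
    conduction_delay_ms(d, ASSUMED_VELOCITY_MM_PER_MS)
    for d in V2_DISTANCE_RANGE_MM
]
rounded_delays_ms = [round(d) for d in delays_ms]
print(rounded_delays_ms)
```

Under this assumed velocity, the two ends of the distance range map onto the two ends of the reported delay range, both well above the latency difference observed physiologically (Fig. 2).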
Zhaoping (2005), citing Angelucci et al. (2002), argues that it is reasonable to assume that the longest horizontal fibers in V2 bridge a distance corresponding to 3° and that the conduction for that length would take 8–10 ms. This means that, for a square of 8° size, transmitting information from a corner to the center of a side of the square (a cortical distance corresponding to 4°) would require only one relay of activity, and conduction would cause delays of little more than 10 ms. However, our calculations indicate that the visual angle corresponding to the longest horizontal fibers is much smaller. Angelucci et al. (2002) measured the extent of the fields that were labeled by tracer injections (in V1) and found 6–8 mm on average (see their Table 1), corresponding to a maximum length of fibers of 3–4 mm. According to our estimates of cortical distances in V2 cortex (see Background), a visual angle of 4° in a typical situation corresponds to 21- to 27-mm distance, which is five- to ninefold the maximum length of intracortical fibers (if one assumes, for lack of comparable measurements in V2, that this length is the same in V2 as in V1). The exact visual angular distance that can be bridged by intracortical fibers depends on the eccentricity. For example, Fig. 4 of Angelucci et al. (2002) shows that, at an eccentricity of 6.5°, the radius subtended by lateral connections in layer 4B of V1 is about 2°. Most of the data of Zhou et al. (2000) came from smaller eccentricities (median of 1.5° for V1, 2.0° for V2) where the cortical magnification factor is higher and the longest length of lateral connections corresponds to smaller visual angles. Clearly, more comprehensive studies of the neuroanatomy and topography of area V2 are needed to make a definitive argument.
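The five- to ninefold figure follows directly from the two ranges quoted above, as the short sketch below shows. The names are hypothetical, and the final step, which converts the ratio into a minimum number of fiber lengths laid end to end, is our own illustrative assumption rather than a calculation reported in the text.

```python
import math

# Values quoted in the text.
V2_DISTANCE_MM = (21.0, 27.0)  # cortical distance for 4 deg of visual angle
MAX_FIBER_MM = (3.0, 4.0)      # maximum horizontal fiber length (V1 value)


def fold_range(distance_mm, fiber_mm):
    """Smallest and largest ratio of cortical distance to fiber length."""
    smallest = min(distance_mm) / max(fiber_mm)
    largest = max(distance_mm) / min(fiber_mm)
    return smallest, largest


fold_lo, fold_hi = fold_range(V2_DISTANCE_MM, MAX_FIBER_MM)

# ASSUMPTION: the distance is covered by a chain of maximum-length fibers,
# so the minimum number of fiber lengths needed is the ratio rounded up.
min_fiber_lengths = (math.ceil(fold_lo), math.ceil(fold_hi))

print(fold_lo, fold_hi, min_fiber_lengths)
```

Even in the most favorable case, the distance exceeds the span of a single fiber several times over, which is why a single relay of activity would not suffice under these estimates.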
Besides the conduction delays, the scale of the retinotopic representation of V2 alone shows that models of figure–ground organization that rely exclusively on intracortical circuits are implausible. Signals would have to be relayed through several neurons, requiring that every neuron in the chain produce action potentials. This contradicts the small size of the classical receptive fields. Most neurons in foveal and parafoveal V2 do not respond to stimuli outside a radius of 2° from the center of their receptive field (even in situations where neurons signal illusory contours, activity spreads over only small distances; Peterhans and von der Heydt 1989). This feature of the neurons characterizes the precision of spatial localization. To be compatible with this basic feature, models can use propagation of border-ownership signals only along chains of neurons that are also directly activated by contrast borders. This means, for example, that intracortical network models cannot produce grouping between two parallel lines, as shown in Fig. 9C, unless the cortical representations of the two lines are within the reach of monosynaptic connections. Taking the measurements from V1 by Angelucci et al. (2002) (for lack of measurements in V2), this would be distances of <4 mm in cortex (corresponding to about 1° visual angle at 2° of eccentricity; Gattass et al. 1981).
Grouping cells
The grouping cell architecture offers a number of advantages over mechanisms that have been proposed in previous models. By forming recurrent circuits between different cortical areas, G cells are able to integrate and disseminate contextual information rapidly, over much greater distances than purely feedforward or within-area lateral connections (Baek and Sajda 2005; Grossberg and Raizada 2000