The observation of figure-ground selectivity in neurons of the visual cortex shows that these neurons can be influenced by the image context far beyond the classical receptive field. To clarify the nature of the context integration mechanism, we studied the latencies of neural edge signals, comparing the emergence of context-dependent definition of border ownership with the onset of local edge definition (contrast polarity; stereoscopic depth order). Single-neuron activity was recorded in areas V1 and V2 of Macaca mulatta under behaviorally induced fixation. Whereas local edge definition emerged immediately (<13 ms) after the edge onset response, the context-dependent signal was delayed by about 30 ms. To see if the context influence was mediated by horizontal fibers within cortex, we measured the latencies of border ownership signals for two conditions in which the relevant context information was located at different distances from the receptive field and compared the latency difference with the difference predicted from horizontal signal propagation. The prediction was based on the increase in cortical distance, computed from the mapping of the test stimuli in the cortex, and the known conduction velocities of horizontal fibers. The measured latencies increased with cortical distance, but much less than predicted by the horizontal propagation hypothesis. Probability calculations showed that an explanation of the context influence by horizontal signal propagation alone is highly unlikely, whereas mechanisms involving back projections from other extrastriate areas are plausible.
- area V2
- border ownership
- Macaca mulatta
- signal latency
Most neurons in areas V1 and V2 of the visual cortex can be activated by patterned stimuli from a small region on the retina, called the classical receptive field (CRF), but not from outside this region. Nevertheless, in many neurons, patterns outside the CRF can influence the activity evoked by a pattern in the CRF. In other words, the response to the local stimulus depends on the image context. The context influence is particularly striking in situations that relate to perceptual figure-ground organization. This has been studied with two different paradigms, one focusing on the regional enhancement of responses and the other on border ownership modulation: When the receptive field of a neuron is located in the center of a texture-defined figure, responses tend to be enhanced compared with the responses evoked by the identical texture that is background or fills the entire display (Lamme 1995). When the receptive field of a neuron is located on the border of a figure, some neurons respond more strongly when the figure is on one side of the receptive field (the neuron's preferred side) than when the figure is on the other side, despite stimuli being locally identical (Fig. 1) (Zhou et al. 2000). These neurons are selective for direction of figure (DOF). The figure enhancement effect has been studied in area V1 (Lamme 1995; Lee et al. 1998; Zipser et al. 1996). DOF selectivity was found in areas V1, V2, and V4 (Zhou et al. 2000). In V2, slightly more than half of the orientation-selective neurons (which make up 80–90% of the neurons in this area) show DOF selectivity; in V1, this is about 20%. In V4, this property was found in about half of the neurons that can be driven with straight edges or lines (which is the case in <50% of the neurons). In the present study, we analyzed the emergence of the DOF influence.
DOF selectivity reveals a large range of context integration. Comparison of the two displays in Fig. 1A shows that they are identical not only in the CRF but over the entire region occupied by the two squares (Fig. 1C, dashed line). Therefore, finding a response difference means that the recorded neuron receives information from outside this region or from its perimeter. Thus, by varying this region of identical stimulation, one can probe the range of context integration. In DOF-selective neurons with near-foveal receptive fields, which have CRFs of <1 deg in diameter, this range can be 10 deg radial distance or more (Zhou et al. 2000). Because V1 and V2 are retinotopically organized and have magnification factors as large as 6 mm per degree of visual angle in the foveal representation (Dow et al. 1985; Gattass et al. 1981; Hubel and Wiesel 1968), this means that context information has to be transmitted over large distances in the cortex. In principle, there are three different ways that information can spread laterally in the visual cortex (Angelucci et al. 2002): by convergence of the forward connections [e.g., from lateral geniculate nucleus (LGN) to V1, and from V1 to V2], through lateral connections (horizontal fibers) within an area, and by divergence of backward projections between areas. The spreading of forward connections accounts for the CRFs, which are small and thus provide only little context information (Angelucci and Sainsbury 2006). The nonclassical surround modulation is thought to be mediated by horizontal fibers and by back-projecting fibers from higher level areas. Accordingly, models of DOF selectivity have proposed horizontal propagation (e.g., Baek and Sajda 2005; Finkel and Sajda 1992; Grossberg 1994; Kikuchi and Akashi 2001; Sajda and Finkel 1995; Zhaoping 2005) or spreading via back projections from higher level areas (e.g., Craft et al. 2007; Jehee et al. 2007).
Zhou et al. (2000) argued that horizontal fiber communication would be too slow to explain the short latency of the context effect, which reaches one-half its amplitude about 70 ms after stimulus onset. They concluded that the contextual modulation was likely a result of back projections. The argument was based on the fact that intracortical (horizontal) fibers have much slower conduction velocity than white matter fibers. However, other studies came to different conclusions (Zhaoping 2005). The arguments in both cases were only approximate, citing a typical cortical magnification factor (which actually depends on eccentricity) and using a median conduction velocity for horizontal fibers. In the present study, we used a more exact approach, considering the actual cortical distances involved in context integration in the specific experimental conditions and the range of horizontal signal propagation speeds that have been measured.
Zhou et al. (2000) only measured the latency of the DOF signal, which includes delays from processing in the retina and the visual pathways. In the present study, we compared the latencies of the context effect for different figure sizes, which allowed us to determine the time required by the context integration proper. Because the figure size determines the region of identical stimulation, the horizontal propagation scheme predicts that larger figures should produce longer latencies, since cortical signals have to travel a longer path. More precisely, the context effect should be delayed in proportion to the extra length of horizontal fibers (plus any delays caused by synaptic relays). On the other hand, in the back projection scheme, the length of the pathways is mainly given by the length of fibers connecting two areas, e.g., V2 and V-x, and does not vary much with the distance of points within V2.
For comparison with the DOF signals, we also measured the latency of the edge onset responses and the latencies of two classes of edge signals that do not depend on context integration: the signals for contrast polarity (Friedman et al. 2003) and stereoscopic depth order (Qiu and von der Heydt 2005; von der Heydt et al. 2000). Both are differential responses, like the DOF signal. We expected the contrast polarity signals to be fast, because edge contrast polarity is sensed already at the level of simple cells. We were particularly interested to see the latency of depth order signals, because stereoscopic depth order is an important border ownership cue that does not depend on the image context. A comparison of disparities across an edge requires a certain minimum size of neighborhood, but beyond that, the computation should not depend on the size of the figure.
We found that the latency of DOF signals tends to increase with the size of the region of identical stimulation, but much less than expected based on the horizontal propagation assumption. The different edge signals all emerged in <80 ms after stimulus onset and within a small range of latencies. Thus, even for computations that require extensive context integration, the cortex is surprisingly fast. Considering the cortical distances involved, we have concluded that an explanation of the context effect by signal propagation through horizontal fibers alone is highly unlikely and that an involvement of back projection loops is more plausible. This study of the speed of context integration complements a previous analysis of the spatial integration mechanism (Zhang and von der Heydt 2010).
Recordings were made in five adult rhesus monkeys (Macaca mulatta), two females and three males. The details of our general methods have been described previously (Qiu and von der Heydt 2005; Qiu et al. 2007). The animals were prepared by implanting, under general anesthesia, three small posts for head fixation and two recording chambers (1 over each hemisphere). Behavioral training was achieved by controlling fluid intake and using juice or water to reward correct responses. All animal procedures conformed to National Institutes of Health and U.S. Department of Agriculture guidelines as verified by the Animal Care and Use Committee of the Johns Hopkins University.
Single-neuron activity was recorded extracellularly with quartz-insulated platinum-tungsten (Pt-W) microelectrodes or with epoxy-insulated tungsten microelectrodes inserted through small (5 mm) trephinations. Area V1 was recorded right under the dura, V2 either in the posterior bank of the lunate sulcus, after passing through V1 and the white matter, or in the lip of the postlunate gyrus. The recorded neurons were assigned to the areas on the basis of histological reconstruction of the recording sites (see below). Neuronal signals were amplified by a set of conventional amplifiers using low-pass and high-pass filters. Action potentials were isolated by an online spike detector system (Alpha Omega). Times of action potentials were digitized with 0.1-ms resolution. Eye movements were recorded for one eye using a video-based infrared pupil tracking system (Iscan) and a 45° beam splitter, with the virtual camera axis aligned with the line of fixation.
The visual stimuli were generated on a Silicon Graphics O2 workstation or a Pentium 4 Linux workstation equipped with an Nvidia GeForce 6800 graphics card, using the antialiasing feature of the Open Inventor software. The stimuli were displayed on a Barco CCID 121 FS color monitor with 1280 × 1024 resolution and a 72-Hz refresh rate, or an Eizo FlexScan T965 color monitor with 1600 × 1200 resolution and a 100-Hz refresh rate. Stereograms were presented side by side and superimposed optically at 40-cm viewing distance. All stimuli were presented on a uniform background of 28 cd/m² (except when figure and ground colors were reversed, as explained below).
The responses of single neurons were studied. Before the main tests, the basic characteristics of a neuron (orientation, end stopping, color selectivity, and in some experiments disparity tuning) were examined with bars, and the CRF was determined using the method of the minimum response field (Barlow et al. 1967; see also Zhou et al. 2000). An optimally oriented bar of suitable dimensions and color was moved across the receptive field region at various locations while the experimenter listened to the responses and marked the limits by hand. In many cases, the hand map was verified by recording position-response curves perpendicular to and along the stimulus orientation. DOF selectivity was determined with square figures of two sizes, 3 deg and 8 deg. If there was notable color selectivity, the preferred color and neutral gray (28 cd/m²) were used for figure and ground; otherwise, white (53–93 cd/m²) and gray were used. Figures were presented for 500–2,000 ms, starting 300 ms after the beginning of fixation. The minimum interval between presentations was 800–1,500 ms (the actual duration depended on when the monkey resumed fixation). During this interval, a blank screen was presented whose color was the mean of figure and ground colors. One edge of a square figure was centered on the CRF at the preferred orientation of the neuron, and the four conditions shown in Fig. 1B were tested for each size of square. The resulting eight conditions were presented repeatedly in random order.
Stereoscopic figures were generated by means of dynamic random-dot stereograms (Julesz 1960) using randomly positioned white dots on gray with a dot size of 2 or 6 arc min, 14% coverage, and a pattern renewal frequency of 8 Hz. Edges of stereoscopic cyclopean squares and square windows were tested. The size of the squares was 4 deg, and the configuration was the same as for the contrast-defined figures (Fig. 1B). The preferred disparity (or zero, the depth of fixation, if the disparity tuning was flat) was used for the nearer plane, whereas the farther plane was placed at a distance of 10 or 24 arc min disparity behind the fixation target.
Histological Reconstruction and Retinocortical Mapping
After the recordings were completed, the animal was anesthetized, and thin, sharply pointed marker pins were inserted at known positions around the recording regions. This was done in the recording setup, using the same calibrated positioning device that had been used for placing the electrodes for recording. The animal was then given an overdose of pentobarbital and the brain perfused with buffered 4% formaldehyde. The pins were removed, the tissue was blocked and soaked in 30% sucrose, and 50-μm frozen sections were cut at right angles to the orientation of the pins (tangential sections) and stained for cytochrome oxidase. The sections were digitally photographed, the pinholes were identified in each section image, and the images were translated, rotated, and scaled to bring the pinholes into register. Knowing the location of the electrode tracks relative to the pinholes and knowing the depths of recording for each neuron, we were able to reconstruct the recording sites. Because the pins were inserted before the fixation and histological processing, the reconstruction was independent of any lateral shrinkage of the tissue that occurred during the processing.
To obtain mapping functions that describe the projection from visual space to cortical space, the cortical surface was reconstructed (Craft 2009). To facilitate the reconstruction, the section images were computationally “stacked” and the tissue block “resectioned” at an orientation perpendicular to the original tangential sections, cutting across the lunate sulcus. In the perpendicular sections, the cortex was traced at midcortical thickness (Van Essen and Maunsell 1980), and the three-dimensional (3-D) surface of the cortex was reconstructed from the traces using the software package Caret (Computerized Anatomical Reconstruction and Editing Toolkit; http://www.nitrc.org/projects/caret/). With the use of custom-written code, the reconstructed surface was combined with the electrode tracks and the positions of the recorded neurons were projected onto the nearest point on the 3-D surface. The surface was then flattened, again using Caret. The resulting positions of the neurons on the flattened surface were associated with the neurons' receptive field locations in visual space. This was achieved using a transformation similar to the “monopole mapping” proposed by Polimeni et al. (2006), which describes the projections of visual space into the flat map of areas V1 and V2. Our transformation included two additional parameters. The mapping parameters were determined that best fit (least squares) the positions predicted from the receptive field locations to the reconstructed neuron positions. The additional parameters improved the fit slightly and also resulted in a closer match of the predicted V1-V2 border to the anatomic V1-V2 border in the reconstructed region.
Spike Train Analysis
Neurons whose mean firing rate for the most effective edge stimulus was <4 Hz were excluded (∼20%). The selectivity of neurons was assessed by analysis of variance (ANOVA) performed on square-root-transformed spike counts in a fixed interval (500–2,000 ms after stimulus onset). The square-root transform served to stabilize the variance. The significance criterion was P < 0.05. For the test of Fig. 1B, an ANOVA with three binary factors was performed (DOF × contrast polarity × size of figure). For the stereoscopic edge test (which involved only 1 size of figure), a two-factor ANOVA was performed (DOF × depth order). If one of the factors DOF, contrast polarity, or depth order had a significant influence, the preferred condition for that factor was assigned according to its main effect. However, for DOF, separate main effects were calculated for the two figure sizes by performing separate two-way ANOVAs; the DOF preference for the small square was assigned according to the large-square responses, and the DOF preference for the large square was assigned according to the small-square responses. This cross-assignment was done to avoid introduction of a bias (see below).
The time course and latency of various neural population signals were analyzed. For visualization, peristimulus time histograms of the responses with 1-ms binwidth were computed and smoothed with a Gaussian kernel (σ = 5 ms). Latencies were determined from cumulative spike count histograms with 1-ms binwidth. The following signals were analyzed: 1) the edge onset response, defined as the mean across the eight conditions of Fig. 1B; 2) the DOF signals for small and large squares; 3) the contrast polarity signal; and 4) the depth order signal. The latter three signals were defined as the difference between the responses to preferred and nonpreferred conditions in the respective stimulus dimension, and the cumulative difference histograms were obtained by subtracting the number of spikes of the nonpreferred conditions from the number of spikes of the preferred conditions, averaging across all neurons that were selective for that dimension. In the case of the DOF signal, the two figure sizes were kept separate. The population signals were analyzed for each animal separately and for the pooled set of neurons from all animals.
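The histogram steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis code used in the study: the function names (`psth`, `gaussian_smooth`, `cumulative_difference`) are hypothetical, the kernel is truncated at 4σ, and bins beyond the ends of the histogram are treated as zero, all of which are assumptions.

```python
import math


def psth(spike_trains, t_start, t_stop):
    """Mean firing rate (spikes/s) in 1-ms bins over [t_start, t_stop) ms.

    spike_trains: list of trials, each a list of spike times in ms.
    """
    n_bins = t_stop - t_start
    counts = [0.0] * n_bins
    for train in spike_trains:
        for t in train:
            b = int(math.floor(t - t_start))
            if 0 <= b < n_bins:
                counts[b] += 1.0
    # counts per 1-ms bin per trial, converted to spikes per second
    return [1000.0 * c / len(spike_trains) for c in counts]


def gaussian_smooth(values, sigma=5.0):
    """Smooth a 1-ms-binned histogram with a normalized Gaussian kernel.

    Assumption: kernel truncated at 4 sigma, zero padding at the edges.
    """
    half = int(math.ceil(4 * sigma))
    kernel = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    norm = sum(kernel)
    kernel = [k / norm for k in kernel]
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:
                acc += k * values[idx]
        out.append(acc)
    return out


def cumulative_difference(pref_counts, nonpref_counts):
    """Running sum of (preferred - nonpreferred) spike counts per bin."""
    out, total = [], 0.0
    for p, q in zip(pref_counts, nonpref_counts):
        total += p - q
        out.append(total)
    return out
```

Smoothing is applied only for display; the cumulative difference histogram is built from the unsmoothed counts, in keeping with the text.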
To determine the latencies, we used two-phase regression on the cumulative spike count histograms. For the edge onset response, the cumulative spike count increases monotonically as a function of time. Generally, it first follows a straight line whose slope corresponds to the baseline firing rate and then assumes a steeper slope some time after the onset of the stimulus. The two-phase regression fits a pair of straight lines by minimizing the total sum of the squared deviations. The intersection of the lines, which indicates the time point when the firing rate changes, was taken as the latency estimate. For the difference signals (DOF, contrast polarity, and depth order), the two-phase regression was performed on cumulative differential spike count histograms. The intersection of lines thus estimates the time point when the differential firing rate changes. Note that this is not the time when it is first detectable, but rather the time when the signal reaches approximately one-half its amplitude. Our estimates are thus comparable to those of Zhou et al. (2000), who measured time at half-amplitude.
For DOF signals, the two-phase regression was performed on the interval from 30 to 150 ms after stimulus onset, with the first leg forced to zero. The first leg should be zero, because the expected value of the differential spike counts before response onset is zero (see next paragraph). The lower limit was chosen because the latencies of the population response onset were >30 ms, and the upper limit was chosen to encompass the linear range of the differential spike count histogram (see results). For edge onset response and contrast polarity signal, the interval from −100 to 80 ms was chosen, and −100 to 100 ms was chosen for the depth order signal. For these signals, the first leg was allowed a nonzero slope.
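A minimal sketch of the two-phase regression may help make the procedure concrete. The exact fitting algorithm is not specified in the text, so the version below is an assumption: it tries every breakpoint, fits an independent least-squares line on each side (or a fixed zero line on the left when `zero_first_leg` is set), keeps the split with the smallest total squared error, and returns the intersection of the two lines. The names `fit_line`, `two_phase_latency`, and `min_pts` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys); returns (slope, intercept, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = my - slope * mx
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def two_phase_latency(times, cum, zero_first_leg=False, min_pts=3):
    """Latency estimate (ms) from a cumulative (differential) spike count.

    times: bin times in ms; cum: cumulative counts at those times.
    zero_first_leg: force the first line to y = 0, as for the DOF signals.
    """
    best_sse, best_latency = None, None
    for k in range(min_pts, len(times) - min_pts + 1):
        if zero_first_leg:
            s1, b1 = 0.0, 0.0
            e1 = sum(y * y for y in cum[:k])
        else:
            s1, b1, e1 = fit_line(times[:k], cum[:k])
        s2, b2, e2 = fit_line(times[k:], cum[k:])
        if s2 == s1:
            continue  # parallel lines do not intersect
        if best_sse is None or e1 + e2 < best_sse:
            best_sse = e1 + e2
            # solve s1*t + b1 = s2*t + b2 for t
            best_latency = (b1 - b2) / (s2 - s1)
    if best_latency is None:
        raise ValueError("no valid breakpoint found")
    return best_latency
```

For a DOF signal one would pass the bins from 30 to 150 ms with `zero_first_leg=True`; for the edge onset response, the bins from −100 to 80 ms with the default free first leg.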
In principle, the difference signals should be zero before stimulus onset, but a bias can be introduced if the assignment of preferred and nonpreferred conditions is based on the same data (assigning “preferred” to the condition with the higher mean firing rate). The difference preferred minus nonpreferred is then biased toward positive values because of the inherent random variation of responses. This could also bias the latency estimates. We took three measures to avoid such a bias. 1) We selected only neurons with a significant response difference (main effect P < 0.05, ANOVA) to calculate the population difference signal. As a result, the influence of random variations on assignment is limited to a small number of cases, and these tend to have small differences. This reduces the bias but does not completely eliminate it. 2) For DOF signals, we assigned the side preference for small figures on the basis of the main effect for the large figures, and vice versa. Thus the assignment was based on independent data, eliminating any bias. Accordingly, the first leg of the two-phase regression was forced to zero. 3) For contrast polarity and depth order, we assigned preference according to the same data. For these estimates, we allowed the first leg of the regression to have a variable slope. This slope would absorb the trial-by-trial fluctuations of activity during the baseline period, and to the extent that the same fluctuations occurred in the subsequent analysis period, the assignment bias would be compensated. Any residual bias is probably negligible, because, as the results show, contrast polarity and depth order signals showed a rather abrupt onset, resulting in a sharp kink in the cumulative histogram. Hence, variations in the slopes of the lines had little influence on the time point of intersection.
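The assignment bias, and the way cross-assignment removes it, can be illustrated with a small simulation. The sketch below is purely illustrative and uses made-up Gaussian noise rather than recorded data: each simulated neuron has no true side preference, and the function name `assignment_bias_demo` and its parameters are hypothetical.

```python
import random


def assignment_bias_demo(n_neurons=2000, seed=0):
    """Mean 'preferred minus nonpreferred' difference for noise-only neurons.

    Returns (same_data_mean, cross_assigned_mean).
    """
    rng = random.Random(seed)
    same_total, cross_total = 0.0, 0.0
    for _ in range(n_neurons):
        # Responses to figure on side A and side B; no true preference.
        a_small, b_small = rng.gauss(0, 1), rng.gauss(0, 1)
        a_large, b_large = rng.gauss(0, 1), rng.gauss(0, 1)
        diff_small = a_small - b_small
        # Same-data assignment: 'preferred' is whichever side responded more
        # in the small-figure data, so the difference is always >= 0.
        same_total += abs(diff_small)
        # Cross-assignment: the preferred side comes from the independent
        # large-figure data, so the sign is unrelated to diff_small.
        sign = 1.0 if a_large >= b_large else -1.0
        cross_total += sign * diff_small
    return same_total / n_neurons, cross_total / n_neurons
```

With same-data assignment the mean difference is clearly positive even though no neuron has a preference, whereas the cross-assigned mean stays near zero, which is why the first leg of the regression can be forced to zero for the DOF signals.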
The latency measurements in this study refer to the beginning of the vertical scan of the CRT display. It took about 6 ms until the scan reached the average position of the receptive fields. To obtain correct absolute latencies, this amount should be subtracted from the latency figures in this report.
The standard deviations of the latency estimates were obtained by bootstrapping (Efron and Tibshirani 1993). The unit of random sampling was the spike train of a single response. The procedure involved two steps. 1) For each neuron and each stimulus condition, a number of spike trains were drawn randomly with replacement from the set of available responses, the number being the same as in the original data. 2) Cumulative histograms were then compiled from these spike trains, and a latency estimate was calculated by two-phase regression as described above. Steps 1 and 2 were repeated 1,000 times. The 68% limits of the resulting distributions are listed in Table 1 and shown as error bars in Fig. 8.
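The two bootstrap steps can be sketched as follows. This is an assumed implementation, not the original code: `bootstrap_latency` and the `estimator` callback are hypothetical names, the estimator stands in for the cumulative-histogram and two-phase-regression stage, and the 68% limits are taken here as the 16th and 84th percentiles of the sorted estimates, which is one plausible reading of the text.

```python
import random


def bootstrap_latency(trials, estimator, n_boot=1000, seed=0):
    """Bootstrap distribution of a latency estimate.

    trials: dict mapping (neuron, condition) -> list of spike trains,
            each spike train being a list of spike times in ms.
    estimator: function taking a dict of the same shape and returning
               one latency estimate (ms).
    Returns (sorted_estimates, lower_limit, upper_limit).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Step 1: for each neuron and condition, draw as many spike trains
        # as in the original data, with replacement.
        resampled = {
            key: [rng.choice(trains) for _ in trains]
            for key, trains in trials.items()
        }
        # Step 2: recompute the latency from the resampled spike trains.
        estimates.append(estimator(resampled))
    estimates.sort()
    lower = estimates[round(0.16 * (n_boot - 1))]
    upper = estimates[round(0.84 * (n_boot - 1))]
    return estimates, lower, upper
```

Resampling whole spike trains, rather than individual spikes, keeps the temporal structure of each response intact.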
We also derived probability distributions of the latency difference between the large and the small figure conditions. First, we generated 1,000 latency estimates for each figure size by bootstrapping, as described above. We then calculated the difference distribution by randomly drawing a pair of latencies from the two sets and taking the difference, repeating this, after replacement, 10,000 times. For an arbitrary time interval Δ, the proportion of difference values ≥Δ was taken as the probability that the present results would be compatible with a model that predicts a latency difference of Δ.
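The difference-distribution step amounts to a short Monte Carlo calculation, sketched below under the assumption that the two sets of bootstrap latency estimates are already available as lists. The function name `prob_difference_at_least` is hypothetical.

```python
import random


def prob_difference_at_least(lat_large, lat_small, delta, n_draws=10000, seed=0):
    """Proportion of (large - small) latency differences that are >= delta.

    lat_large, lat_small: bootstrap latency estimates (ms) for the large
    and small figure conditions. Pairs are drawn with replacement.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        diff = rng.choice(lat_large) - rng.choice(lat_small)
        if diff >= delta:
            hits += 1
    return hits / n_draws
```

Calling this with `delta` set to a model's predicted latency difference gives the proportion that the text interprets as the probability that the data are compatible with that model.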
To be able to compare the latencies of neuronal signals under different conditions, such as the DOF signal for different figure sizes, it was important to obtain accurate and unbiased latency estimates. We analyzed single-cell recording data from a large sample of neurons and computed the latencies of the population signals. Population signals were defined as the mean firing rates of all neurons in the sample that share the selectivity under consideration, for example, neurons selective for DOF, contrast polarity, or depth order. Thus, for each kind of signal, we obtained the time course of the mean firing rate and determined its latency. To enable statistical comparisons, we obtained distributions of the latency estimates by bootstrapping (see methods).
Because our task involves comparing latencies of signals of different amplitudes, we needed a latency estimate that is independent of amplitude and signal-to-noise ratio. A common method to estimate response latencies is to compile a spike time histogram and determine the time bin where the spike counts begin to differ significantly from those of the baseline (e.g., Maunsell and Gibson 1992). However, such estimates depend on the signal-to-noise ratio of the responses; of two signals with the identical mean time course, the signal with the higher signal-to-noise ratio is assigned a shorter latency.
To avoid this problem, we estimated latencies by calculating cumulative spike count histograms and fitting line segments by two-phase linear regression (see methods). For edge responses, the first leg of the fit shows a shallow slope, corresponding to the baseline firing rate, and the second leg shows a steeper slope, reflecting the new firing rate. For the DOF signal and the contrast polarity and depth order signals, which are response differences, the two-phase regression was performed on cumulative histograms of the spike count differences (see examples in Fig. 4). For the difference histograms, the slope and the intercept of the first leg have to be zero, because the firing rate before the onset of the stimulus should be the same regardless of the condition, and therefore the difference should be zero (but see discussion of exceptions in methods). The second leg estimates the differential firing rate after the onset of the signal, and the time point of the intersection is the latency estimate.
The following analysis is based on contrast edge responses of 892 orientation selective neurons (318 from V1, 574 from V2) recorded in 4 monkeys (identified by the 2-letter codes TE, RI, JI, and LA) and on stereoscopic edge responses of an additional 264 V2 neurons from RI and a fifth monkey, BA. In each neuron of the first sample, DOF and contrast polarity were tested with the four displays shown in Fig. 1, with squares of two sizes, 3 deg and 8 deg. Of these neurons, 447 (V1: 86, V2: 361) exhibited DOF selectivity, and 718 (V1: 289, V2: 429) showed contrast polarity selectivity, as indicated by significant main effects of the factors DOF and contrast polarity, respectively (ANOVA; see methods). The receptive field (RF) eccentricities of the DOF-selective cells of V2 ranged from 0.25 to 7.1 deg (mean 2.9 deg) and were similar in the four monkeys (means of 1.9, 2.6, 2.9, and 3.1 deg, respectively). The distribution of the RF centers is shown in Fig. 2. The RF eccentricities of the DOF-selective cells of V1 ranged from 0.35 to 11.8 deg (mean 2.8 deg). In the sample of neurons that were tested with stereoscopic edges (“cyclopean stimuli”; Julesz 1971), 74 were depth polarity-selective (33 from RI and 41 from BA).
Context-Defined Border Ownership
Calculation of cortical distances.
The DOF signal was defined as the difference between the responses to displays that have squares on opposite sides of the RF but are locally identical (Fig. 1C). If there is such a difference, the neuron must receive information from outside the region of identical stimulation or from its boundary. Figure 2 illustrates the situation for the case of an 8-deg stimulus and vertical preferred orientation. The rectangle shows the region of identical stimulation, which covers an area of 16 × 8 deg, centered on the mean location of RFs of the DOF-selective V2 cells, which is marked by a small circle. The size of the circle corresponds to the average size of CRF in V2 at that eccentricity. The two dashed circles indicate the distances of the nearest and the farthest discriminative features, respectively, for various stimulus orientations (we depicted the center at the mean RF position only for illustration; in the experiments, the region of identical stimulation was centered on the RF of the neuron and oriented according to the neuron's preferred orientation in each case).
By retinotopic mapping, the region of identical stimulation was projected onto a compact region in the visual cortex, as illustrated in Fig. 3. We calculated the projections into areas V1 and V2, as described in methods. The calculation is based on the “monopole mapping” proposed by Polimeni et al. (2006), which is an extension of the “complex log transform” used to describe V1 (Schwartz 1980). It provides a fairly accurate mathematical description of the projection of the visual field onto a flattened visual cortex (Van Essen and Maunsell 1980). The parameters for the projection in Fig. 3 were calculated for the right hemisphere of monkey RI. The red ellipse indicates an RF located 2 deg to the right and 2 deg below fixation (which is close to the mean RF position of our data). For the illustration, a preferred orientation of 75 deg was assumed. The corresponding cell locations in V1 and V2 are marked by red asterisks. The shaded areas show the projections of the regions of identical stimulation for the 3- and 8-deg squares used in our experiments. Note that the boundary of this region lies at a greater distance from the cells for the 8-deg figure than for the 3-deg figure. The boundary segments 1 and 2 come closest to the cells; segments 3 are projected far away into the contralateral hemisphere.
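To illustrate why cortical distance depends so strongly on eccentricity, the following sketch implements only the basic complex log transform, w = k·log(z + a), for V1. It is not the fitted mapping of this study: the V2 extension and the two additional parameters are not reproduced, and the values of `a` and `k` are illustrative placeholders rather than parameters reported here. The function names are hypothetical.

```python
import cmath
import math


def complex_log_map(ecc_deg, polar_deg, a=0.5, k=15.0):
    """Map a visual-field point to flat-map coordinates (mm) as a complex number.

    ecc_deg: eccentricity in degrees; polar_deg: polar angle in degrees.
    a (deg) and k (mm) are placeholder parameters, NOT values from the study.
    Intended for one hemifield (points with nonnegative horizontal coordinate).
    """
    z = cmath.rect(ecc_deg, math.radians(polar_deg))
    return k * cmath.log(z + a)


def cortical_distance_mm(p1, p2, a=0.5, k=15.0):
    """Straight-line distance in the flat map between two visual-field points.

    p1, p2: (eccentricity_deg, polar_angle_deg) tuples.
    """
    w1 = complex_log_map(p1[0], p1[1], a=a, k=k)
    w2 = complex_log_map(p2[0], p2[1], a=a, k=k)
    return abs(w1 - w2)
```

Under this transform, a 1-deg step along the horizontal meridian covers more cortex near the fovea than in the periphery, which is the logarithmic compression that makes the context distances larger for neurons with small eccentricities.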
In this example, going from the smaller to the larger figure increases the distance from the V2 neuron to the nearest discriminative context by 8 mm. Similar distances were obtained with other RF orientations. However, the distance estimates depend strongly on the eccentricity of the RF of the neuron under consideration, because the radial distance in the visual field is compressed about logarithmically in the cortex. The distance increase is smaller than the above estimate for neurons with larger eccentricity and larger for neurons with smaller eccentricity. The illustrated RF had 2.8-deg eccentricity (2, −2), and the median eccentricity of our sample was 2.7 deg. Thus slightly more than one-half of the cells had smaller eccentricities, and therefore larger context distances, than the example. Mapping functions were derived for two hemispheres of two of our monkeys (BA: 174 neurons; RI: 285 neurons) and the cortical distances that signals would have to travel to provide the context information to the DOF selective neurons were calculated (Craft 2009). The calculation assumed 1) that horizontal fibers can bridge a distance of 5 mm and 2) that larger distances are connected via chains of relays, with the constraint that all cells in the chain have CRFs located on the figure contour (cf. Zhaoping 2005). This constraint is in keeping with the basic notion that stimuli outside the CRF do not activate the neurons. Thus a neuron that is not activated by the figure contour cannot produce spikes and therefore cannot relay information. [The infrequent neurons that are activated by the interior of the figures (Friedman et al. 2003) probably do not participate, because experiments have shown that the DOF signal is essentially the same for contour-defined figures as for filled figures (Zhang and von der Heydt 2010), and the former do not activate interior neurons.] 
The distance to contextual information in the opposite hemisphere was calculated by adding one extra relay of 5 mm for the interhemispheric connection (being interested in the delays rather than the distances, we reasoned that the path through the corpus callosum, although longer, should take about the same time as a relay within the cortex, because the callosal fibers are faster). The shortest path under these conditions was taken as the distance estimate. The length of the shortest path depends on the length of the fibers, because the shortest path compatible with condition 2 may include line segments that do not coincide with the contour representation, e.g., segments cutting across the corners of the squares. We assumed a length of 5 mm, which is greater than the maximum measured length of horizontal fibers in V2 (2–4 mm; Amir et al. 1993; Levitt et al. 1994), to guard against the possibility that these measurements underestimate the maximum length. The results for the two monkeys agreed closely, and we assume that the distances in the other animals were similar, because their brains were similar in size. In this report we use the calculations based on the brain of monkey RI.
Prediction of conduction delays from horizontal signal propagation.
To derive predictions of the conduction delays under the horizontal propagation hypothesis, we also need to know the conduction velocities of horizontal fibers. Measurements in V1 showed a range of velocities (Bringuier et al. 1999; Girard et al. 2001; Grinvald et al. 1994; Hirsch and Gilbert 1991). Craft (2009) therefore calculated medians of the minimum expected delays by sampling randomly from his distributions of cortical distances and from the distribution of conduction velocity estimates given in Fig. 4 of Bringuier et al. (1999) (see discussion). For V2 neurons with RF eccentricities between 2 and 4 deg, the minimum conduction delays were 75 and 200 ms for the 3- and 8-deg squares, respectively, and thus the differential delay was 125 ms. For eccentricities of 0–2 deg, the corresponding differential delay was 133 ms, and for eccentricities of 4–6 deg, it was 103 ms. By averaging the predicted delays for the eccentricities of the neurons of our DOF-selective sample, we obtained a mean differential delay of 124 ms.
It could be argued that the earliest possible appearance of DOF modulation is determined by the distance of the closest discriminative feature rather than the average over all discriminative segments, as assumed above. Calculating the minimum latencies based on this assumption leads to a prediction of 55 ms for the mean differential delay in V2 between the two square sizes. Note, however, that there is experimental evidence against this assumption (see discussion). We are aware that there are a number of uncertainties about the prediction. These are addressed in the discussion.
Latencies of DOF signals for small and large figures.
The population responses of the DOF-selective V2 neurons from one animal (LA) are illustrated in Fig. 4. The graphs in Fig. 4, A and B, represent responses to 3-deg squares and 8-deg squares, respectively. The graphs at left show the mean firing rates in the form of peristimulus time histograms for the preferred (black line) and nonpreferred (gray line) locations of the square. The graphs at right show examples of cumulative histograms of the spike count difference (gray) with the fitted two-phase regression lines (black). For each square size, three of the resampled histograms with their latency estimates are shown to illustrate the bootstrap procedure. These plots show that the regression lines generally fit the cumulative histograms well.
The peristimulus time histograms indicate that the response difference (the DOF signal) was somewhat greater for the 3-deg squares than for the 8-deg squares. The distributions of the estimates obtained by resampling are shown in Fig. 5. For the 3-deg square, we obtained a latency of the DOF signal of 80 ± 4 ms (mean ± SD), and for the 8-deg square we obtained 87 ± 7 ms. The difference was not significant (z = 0.78, P = 0.44).
The results for the other animals are summarized in Table 1. In three animals (TE, RI, and LA), the latencies of the DOF signal for small and large squares were rather similar, but monkey JI showed a latency difference of 33 ms. For the small square, JI's latency was similar to those of the other animals, but for the large square, it was much longer. This probably reflects a genuine difference between the animals, because the methods of recording and testing were the same in all four animals, and the RF eccentricities in JI (range 0.25–7.1 deg, mean 2.9 deg) were comparable to those in LA (0.75–6.6 deg, mean 3.1 deg). JI also showed a somewhat unusual time course of responses (see Supplemental Material, Fig. S1). (Supplemental material for this article is available online at the Journal of Neurophysiology website.)
We also computed estimates of the signal latencies for the pooled data from all four monkeys (Table 1). Performing the same analysis on the combined set of neurons, we obtained population averages of 71 ± 5 and 77 ± 8 ms for the latencies of the DOF signals for small and large squares, respectively. The difference was not statistically significant.
Border ownership selectivity is rare in V1, but we have a sufficiently large sample of neurons to compare the results from V1 and V2 (Table 1). The same analysis as described above was carried out after data were pooled from four animals (86 DOF-selective neurons). We found virtually no difference between the onset latencies of DOF signals in V1 and V2, suggesting that those V1 neurons that are DOF selective are modulated by the same context integration mechanism as the neurons of V2.
Direction of Edge Contrast
The onset latency for contrast polarity signals was estimated from the same data as the DOF signal, by pooling the responses for the two DOF conditions. Four hundred twenty-nine cells in the four monkeys showed significant preference for contrast polarity, as measured by the main effect of contrast polarity (ANOVA; see methods). There was no latency difference between the contrast polarity signals for large and small squares (which is no surprise because contrast polarity is detected by simple cells and involves the classical receptive field). Hence, the responses for the two sizes were pooled.
The time course of the mean firing rates for preferred and nonpreferred contrasts is shown in Fig. 6, together with the differential spike count histogram, for monkey LA. The estimated onset latency of the contrast polarity signal for this animal was 53 ms with a SD of 1 ms. The estimates for the other animals and for the pooled data of V2 and V1 were similar (Table 1). A comparison with the latencies of response onset shows that in V2, the contrast polarity signal emerged virtually at the same time as the responses (50 vs. 48 ms in the pooled data). This was not so in V1, where the responses emerged 8 ms earlier (42 ms) than the contrast polarity signals (50 ms).
Stereoscopically Defined Depth Order
Random-dot stereograms produce vivid perception of border ownership based on correlations between the patterns of the two eyes. The correlations define the figure boundaries (cyclopean edges) and also the depth order. In principle, the depth order of an edge can be determined from the correlations in the vicinity of the edge, either from the disparities on either side of the edge (Julesz 1960) or from the presence of unmatched monocular segments (half occlusion; Nakayama and Shimojo 1990). Thus, stereoscopically, depth order can be determined without the broad image context. Indeed, studies of border ownership signals in V2 showed that when figures are contrast-defined and there is no disparity, the neurons use direction of figure (i.e., the global shape of contours) for border ownership assignment, but for disparity-defined figures without contrast borders, the neurons use the disparities (or the half occlusions), and the global shape is irrelevant (Qiu and von der Heydt 2005). It was of particular interest to see if the stereoscopic assignment of border ownership is faster than the context-dependent assignment. Our results show that this is indeed the case.
We calculated the mean responses of stereo edge-selective cells to preferred and nonpreferred depth order and their differential spike count histogram, the depth order signal (Fig. 7, neurons from monkeys RI and BA pooled). The depth order signal emerged with a latency of 61 ± 2 ms, which is only 13 ms later than the estimate for the onset of edge responses in V2. This is remarkably fast. By comparison, the context-dependent DOF signals lagged 23–29 ms behind the onset of responses, which is about twice as long (Table 1, V2 pooled).
Implications for the Mechanisms of Context Integration
To summarize these findings, we plotted all the latency estimates in Fig. 8. The contrast polarity signals (open circles) emerged almost simultaneously with the edge responses (open triangles), whereas the DOF signals (filled circles) were clearly delayed. In three animals, the latencies of the DOF signals for 3- and 8-deg figures (small and large filled circles, respectively) were similar. The exception was monkey JI, in which the 8-deg figure had a much longer latency. The latencies of the corresponding signals in V1, plotted on the left in Fig. 8, were quite similar to those of V2. The depth order signals in stereo edge-selective V2 neurons (filled diamond) emerged slightly later than the contrast edge signals but preceded the DOF signals.
The main goal of this study was to determine whether the latencies of DOF signals increase with figure size as predicted by the horizontal propagation hypothesis. Our data show that the latency clearly increased with figure size in one animal, but not in the other three (Fig. 8). Since the latency values are statistical estimates, the question we ask is, what is the probability that the present results are compatible with the horizontal propagation hypothesis?
As described above, one can in principle derive an exact prediction of the conduction delays produced by horizontal propagation. We found that, in the present data, there should be a mean difference of 124 ms between the latencies of the DOF signals for the 3- and 8-deg figures. Given the uncertainty about this prediction, we rephrase the question and ask, if a hypothesis predicts a latency difference Δ, what is the chance that it is compatible with our data? In other words, given our recordings, what is the probability that there was actually a latency difference greater than Δ in the population signals?2 We calculated this probability as a function of Δ by randomly resampling the individual responses within stimulus conditions and computing the relative frequency of observing a difference greater than Δ (see methods). We did this for the data from each monkey and for the combined results. The distributions are plotted in Fig. 9. They show that a difference of 50 ms would be highly unlikely (P < 0.02 in each of the animals). The prediction, of course, has to hold for each of the four animals. Combining the results by multiplying the four probability values for each Δ, we find that Δ > 10 ms would be incompatible with the present data (P < 0.05). Remember that the predictions were derived from the hypothesis that DOF signals are generated solely by horizontal signal propagation within V2. Rejecting this hypothesis does not call into question the involvement of horizontal signal propagation in general, but it means that another mechanism contributes to the modulation observed in DOF selectivity.
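The resampling logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study: the function names `prob_difference_exceeds` and `combined_probability` are hypothetical, and the bootstrap latency estimates are simulated as normal draws (using the mean ± SD reported for monkey LA) rather than recomputed from resampled spike trains.

```python
import math
import random

def prob_difference_exceeds(lat_small, lat_large, delta):
    """Relative frequency of resamples in which the latency for the large
    figure exceeds the latency for the small figure by more than delta (ms)."""
    diffs = [large - small for small, large in zip(lat_small, lat_large)]
    return sum(d > delta for d in diffs) / len(diffs)

def combined_probability(per_animal_probs):
    """Product of the per-animal probabilities, as described in the text.
    This treats the animals as independent."""
    return math.prod(per_animal_probs)

# Simulated bootstrap estimates (assumption: normally distributed), using the
# values reported for monkey LA: 80 +/- 4 ms (3 deg) and 87 +/- 7 ms (8 deg).
rng = random.Random(0)
lat_small = [rng.gauss(80.0, 4.0) for _ in range(10_000)]
lat_large = [rng.gauss(87.0, 7.0) for _ in range(10_000)]

for delta in (10, 50, 124):
    p = prob_difference_exceeds(lat_small, lat_large, delta)
    print(f"P(difference > {delta} ms) = {p:.4f}")
```

Evaluating the function over a range of Δ values yields a curve of the kind plotted in Fig. 9; multiplying the four per-animal values at each Δ gives the combined probability.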
To compare the onset latencies of a variety of neuronal signals, we developed a method based on the detection of a change in slope of the cumulative spike count histograms. By applying it to either spike counts or differential spike counts, this method can be used to measure latencies of responses as well as latencies of differential responses. Differential responses are common in the visual system because it often uses the opponent coding scheme to represent features that can take on positive and negative values, such as contrast, direction of motion, and border ownership. For comparing the latencies of different signals, it was important that the latency estimate not depend on the signal-to-noise ratio. For example, the edge contrast polarity signal (the difference between the responses to the 2 contrast polarities) has a smaller amplitude and larger variance than the mean of the two responses. Thus the contrast polarity signal has a lower signal-to-noise ratio than the mean response. Despite this, our method finds virtually the same latency (48 and 50 ms, Table 1, V2 pooled). Since a response difference can only emerge after there is a response, a 2-ms delay is about the shortest one can imagine. Methods that determine the time point where a signal first deviates significantly from the baseline tend to overestimate the latency of smaller or noisier signals.
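The idea of locating a change in slope in a cumulative histogram can be illustrated with a small sketch. The exact fitting procedure is specified in methods; the version below is an assumption made for illustration. It fits separate least-squares lines to the data before and after each candidate breakpoint, keeps the split with the smallest total squared error, and takes the intersection of the two lines as the latency. The names `two_phase_latency` and `_fit_line` and the simulated data are hypothetical.

```python
import random

def _fit_line(xs, ys):
    """Ordinary least-squares line; returns (intercept, slope, sum of squared errors)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse

def two_phase_latency(times, cum_counts, min_points=5):
    """Time at which the two best-fitting line segments intersect."""
    best = None
    for k in range(min_points, len(times) - min_points + 1):
        a1, b1, sse1 = _fit_line(times[:k], cum_counts[:k])
        a2, b2, sse2 = _fit_line(times[k:], cum_counts[k:])
        if b1 == b2:
            continue  # parallel segments have no intersection
        total = sse1 + sse2
        if best is None or total < best[0]:
            best = (total, (a1 - a2) / (b2 - b1))
    return None if best is None else best[1]

# Simulated differential spike counts (hypothetical): zero mean before 70 ms,
# a constant positive difference afterwards, plus Gaussian noise in every bin.
rng = random.Random(1)
times = list(range(200))  # 1-ms bins
cum, total = [], 0.0
for t in times:
    total += (0.5 if t >= 70 else 0.0) + rng.gauss(0.0, 0.1)
    cum.append(total)

print("estimated latency (ms):", two_phase_latency(times, cum))
```

Because the estimate depends on where the slope changes rather than on when the signal first exceeds a noise threshold, a weaker signal changes the quality of the fit but does not systematically shift the breakpoint.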
The Speed of Context Integration
We found latencies of 71 and 77 ms for the population DOF signals for 3- and 8-deg squares, respectively, and 48 ms for the edge onset response (Table 1, V2 pooled). This agrees well with the figures of Zhou et al. (2000), who found latencies of 68 ms for the DOF signal and 43 ms for the edge onset response in V2 (time at half-maximum amplitude for a square size of 4–6 deg). In the smaller sample from V1, the latencies of the DOF signals (70 and 75 ms) were statistically indistinguishable from those of V2. This suggests that those V1 neurons that show DOF selectivity are modulated by the same context integration mechanism as the neurons of V2.
The speed of context integration can be appreciated by comparing how much the DOF signal, which requires context integration, is delayed relative to the signals for contrast polarity and stereoscopic depth order that can be derived by local computations. The contrast polarity signals emerged with the same latency as the responses themselves. The stereoscopic depth order signal was delayed by 13 ms relative to response onset, and the DOF signals for small and large figures were delayed by 23 and 29 ms, respectively (Table 1, V2 pooled). The delay for the depth order signal is surprisingly short, considering that this signal requires the computation of disparity differences. Although the assignment of border ownership according to the context (global shape) takes slightly longer, it is also very fast.
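The delays quoted above follow directly from the latency estimates given in the text. The short sketch below simply recomputes them; the variable names are illustrative, and the latency values are those reported in this article.

```python
# Latency estimates (ms) as reported in the text.
onset_ms = 48  # edge response onset, V2 pooled
signal_latency_ms = {
    "contrast polarity": 50,
    "stereoscopic depth order": 61,
    "DOF, 3-deg square": 71,
    "DOF, 8-deg square": 77,
}

# Delay of each differential signal relative to response onset.
delays_ms = {name: lat - onset_ms for name, lat in signal_latency_ms.items()}

for name, delay in delays_ms.items():
    print(f"{name}: {delay} ms after response onset")
```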
The main goal of our study was to test the hypothesis that the context influence in V2 neurons is the result of horizontal signal propagation in V2. Theoretically, it is possible to explain DOF selectivity by such mechanisms (Baek and Sajda 2005; Finkel and Sajda 1992; Grossberg 1994; Kikuchi and Akashi 2001; Sajda and Finkel 1995; Zhaoping 2005). Long-range horizontal fibers (Gilbert 1992) could provide context information for smaller figures. For larger figures, it is assumed that edge signals are relayed along the representation of the figure contours (Zhaoping 2005). However, because of the relatively slow spike conduction in horizontal fibers, the context effect will be delayed, and this delay must increase with the distance between the recorded neuron and the site where the context information is represented in the cortex.
By comparing two situations in which the distance requirements differ by a certain amount, we calculated the minimum latency difference between context effects that should be observed if the hypothesis were true. We obtained a predicted mean difference of 124 ms for the present data. There are three sources of uncertainty in this calculation. One concerns our estimation of cortical distances, the second concerns the available data on the conduction velocity of horizontal fibers, and the third concerns the errors in our latency measurements.
We used calculations of cortical distances by Craft (2009) that are based on a transformation similar to that proposed by Polimeni et al. (2006). Polimeni et al. showed that this transformation is quite accurate, predicting the locations of cells in cortex from the RF positions with a root mean square error of 2–3 mm across the entire V1-V2 complex. The error was assessed for neurophysiological recordings that linked RF positions to cortical locations in a flat map of the visual cortex (Van Essen et al. 1984, 1990). It might be thought that the distortions produced by flattening increased the length estimates. However, we (Craft et al. 2007) have also measured cortical distances directly in the drawings of microscopic slices by Gattass et al. (1981), and these measurements (of the minimum distance for context integration in the case of 8-deg squares) agreed well with those predicted from the mapping function.
To estimate the horizontal propagation delays, we used the distribution of propagation velocities in Bringuier et al. (1999), who measured, in cat V1, the delays of synaptic potentials evoked by visual stimulation at various distances from the RF center. This distribution corresponds to a mean conduction delay rate of about 7.4 s/m. Studies in slice preparations (a design that rules out signal conduction through white matter loops) using electrical stimulation at various distances from the recorded cell found mean conduction delays of 3.5 s/m in cat V1 slices (Hirsch and Gilbert 1991) and 7.1 s/m (the reciprocal of the published velocity of 0.14 m/s) in monkey prefrontal cortex slices (Gonzalez-Burgos et al. 2000). The latter was measured at room temperature. Taking into account the temperature dependence of conduction velocity (Berg-Johnsen and Langmoen 1992), this corresponds to 3.6 s/m at body temperature, in good agreement with the measurements in cat V1 (Hirsch and Gilbert 1991). Using electrical stimulation and extracellular recording in vivo, Girard et al. (2001) found conduction delays with a median of 3.0 s/m (reciprocal of the published 0.33 m/s) for monkey V1. Importantly also, the horizontal propagation delays were found to increase linearly with distance (Bringuier et al. 1999; Gonzalez-Burgos et al. 2000; Grinvald et al. 1994). All these measurements agree well, showing that intracortical propagation involves delay rates of 3 s/m or more, which is an order of magnitude slower than the rate of 0.29 s/m that characterizes signals traveling through the white matter between V1 and V2 (3.5 m/s; Girard et al. 2001).
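Because some of the cited studies report velocities (m/s) and others report delay rates (s/m), the conversions used above are worth making explicit. The following sketch only takes reciprocals of the published velocities quoted in the text; the function name `delay_rate_s_per_m` is an illustrative choice.

```python
def delay_rate_s_per_m(velocity_m_per_s):
    """Conduction delay rate (s/m) is the reciprocal of conduction velocity (m/s)."""
    return 1.0 / velocity_m_per_s

# Published velocities (m/s) quoted in the text.
published_velocity = {
    "monkey prefrontal slices, room temperature (Gonzalez-Burgos et al. 2000)": 0.14,
    "monkey V1, horizontal fibers (Girard et al. 2001)": 0.33,
    "white matter between V1 and V2 (Girard et al. 2001)": 3.5,
}

for source, velocity in published_velocity.items():
    print(f"{source}: {delay_rate_s_per_m(velocity):.2f} s/m")

# Ratio of the intracortical to the white matter delay rate.
ratio = delay_rate_s_per_m(0.33) / delay_rate_s_per_m(3.5)
print(f"intracortical / white matter delay rate: {ratio:.1f}")
```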
It should not be overlooked that the cited studies generally found large variations between measurements. Some of this variation is due to conduction velocity differences between fibers. The latency calculations we used (Craft 2009) took this into account by sampling randomly from the velocity distribution published by Bringuier et al. (1999) and averaging the resulting conduction delays, which is equivalent to assuming that signals propagate with the mean conduction delay rate. Alternatively, it is conceivable that context information is mediated by a subpopulation of fibers that conduct significantly faster than the average. For instance, Hirsch and Gilbert (1991) reported conduction delay rates of 3.5 ± 1.2 s/m, which means that a small fraction of fibers, those that have rates two SDs below the mean, might propagate signals at a rate as fast as 1.1 s/m.
The Experimental Results
Our latency measurements indicate that the speed of context integration exceeds the limits set by conduction velocity of horizontal fibers. In three of the four animals studied, we found virtually no increase of the latencies of DOF signals with size of figure (Fig. 8). Because of the uncertainty about the horizontal propagation delays in V2, and taking into account the statistical variance in our latency measurements, we derived the probabilities that our data are compatible with the horizontal propagation hypothesis for a range of propagation delays (Fig. 9). These calculations show that finding a latency difference as large as the predicted 124 ms would be extremely unlikely. Even the more optimistic prediction of 55 ms, which is based on the fictitious assumption that the context influence comes mainly from the closest discriminative feature, is far outside the acceptable range. We call this assumption fictitious, because experiments show that all segments of the figure contours contribute about equally to the DOF signal and that the contribution of close segments is not faster than that of far segments (Zhang and von der Heydt 2010).
How would the hypothesis fare if horizontal fibers conducted faster than assumed in that calculation? Combining the results from all four monkeys, we found that latency differences >10 ms would be incompatible with the present data (P < 0.05). This shows that the context integration cannot be achieved through horizontal intracortical connections alone, unless one assumes that sufficient numbers of fibers exist that conduct much faster than the average. To produce a latency difference smaller than 10 ms, these fibers would have to conduct at ∼0.6 s/m or faster (this follows from the above because Δ varies in proportion to the conduction delay rates: Δ = 124 ms was obtained assuming 7.4 s/m; therefore, Δ = 10 ms requires 10/124 × 7.4 s/m). Given the measurements of horizontal fiber conduction reviewed above, this seems unlikely. According to Hirsch and Gilbert (1991), for example, 0.6 s/m would be 2.4 SDs below the mean value. If such fibers indeed exist, they make up only a very small fraction of the population. This would not be sufficient to produce the present results. Recall that our latency estimates correspond approximately to the time point when the DOF signal reaches one-half its maximal strength (see methods). This means that a substantial fraction of the context signals must have arrived at the recorded neuron by that time. Finally, we must not forget that context integration by horizontal fibers would generally require several relays, because the distances to be bridged are much larger than the maximal horizontal spread of axons. Each relay involves a synaptic delay and a postsynaptic integration time, which add to the conduction times. These relay delays were not included in our predictions.
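The scaling argument in this paragraph can be checked numerically. The sketch below recomputes the required delay rate and its distance from the Hirsch and Gilbert (1991) mean; the final step, which converts that distance into a fraction of fibers, assumes normally distributed delay rates. That is an assumption made here for illustration only and is not a claim of the cited study.

```python
import math

# Values taken from the text.
predicted_delta_ms = 124.0       # predicted differential delay
assumed_rate_s_per_m = 7.4       # delay rate underlying that prediction
max_compatible_delta_ms = 10.0   # largest difference compatible with the data

# The differential delay scales in proportion to the delay rate.
required_rate = max_compatible_delta_ms / predicted_delta_ms * assumed_rate_s_per_m
print(f"required delay rate: {required_rate:.2f} s/m")

# Distance from the Hirsch and Gilbert (1991) mean, in SD units,
# using the rounded value of 0.6 s/m as in the text.
hg_mean, hg_sd = 3.5, 1.2
sds_below_mean = (hg_mean - 0.6) / hg_sd
print(f"SDs below the mean: {sds_below_mean:.1f}")

# Assumption: normally distributed delay rates. Lower-tail fraction of fibers.
fraction_faster = 0.5 * math.erfc(sds_below_mean / math.sqrt(2.0))
print(f"fraction of fibers at least this fast: {fraction_faster:.4f}")
```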
Powerful Back Projection Mechanisms
We conclude that the only plausible way to explain the short and nearly invariant latencies of DOF signals is to assume that local feature signals (from V1 or V2) are integrated in a different area and that the results are sent back to V2, where they create DOF selectivity. This is plausible, because white matter fibers conduct an order of magnitude faster than intracortical fibers (Girard et al. 2001), and the lengths of loops through other prestriate areas, like V4 and MT, are not necessarily longer than the within-area distances. The average length of white matter fibers connecting parafoveal V2 with V4 is no more than about 20 mm (estimated from Gattass et al. 1988, Fig. 5). Assuming a conduction delay rate of 0.29 s/m (Girard et al. 2001), this would produce conduction delays of about 6 ms (20 mm × 0.29 s/m), or 12 ms for the loop. Between the middle temporal area (MT) and V1, the pure conduction delays are as short as 1.3 ms (Movshon and Newsome 1996). Most likely, the connections between MT and V2 are similarly fast. Neurons in higher level areas have larger receptive fields and thus integrate a broader context than V2 neurons. In addition, the back projections from a given point fan out over several millimeters in V2 (Rockland et al. 1994). That back projections produce surround modulation in V2 neurons has been demonstrated in the case of MT with an inactivation method (Bullier et al. 2001). Studies of surround influences in V1 neurons, using an entirely different paradigm, came to a similar conclusion, namely, that the “far surround” is the result of back projections from a higher level area (Angelucci et al. 2002).
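The loop estimate above is a simple product of path length and delay rate, since mm × s/m gives ms. The sketch below recomputes it and, purely for illustration, applies the lower bound of the intracortical delay rate quoted earlier (3 s/m) to the same 20-mm distance. The function name `conduction_delay_ms` is hypothetical, and the intracortical figure ignores the relays that such a distance would require.

```python
def conduction_delay_ms(distance_mm, delay_rate_s_per_m):
    """Pure conduction delay in ms; 1 mm x 1 s/m = 1e-3 s = 1 ms."""
    return distance_mm * delay_rate_s_per_m

# White matter path between parafoveal V2 and V4 (values from the text).
one_way_ms = conduction_delay_ms(20, 0.29)
loop_ms = 2 * one_way_ms
print(f"one way: {one_way_ms:.1f} ms, loop: {loop_ms:.1f} ms")

# Illustration only: the same distance at an intracortical rate of 3 s/m,
# without the synaptic and integration delays of the required relays.
horizontal_ms = conduction_delay_ms(20, 3.0)
print(f"same distance via horizontal fibers: {horizontal_ms:.0f} ms")
```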
As pointed out, the present results only show that DOF selectivity cannot be explained by horizontal signal propagation alone, but they do not rule out a combination of back projections and horizontal connections. A combination of mechanisms would explain the paradoxical finding that the influence of the most distant contour segments of a figure arrives earlier than the influence of the segments closest to the CRF (Zhang and von der Heydt 2010). Although the influence of distant features likely involves fast white matter loops, that of the closer features might be mediated predominantly by slow horizontal connections. Because surround suppression also exists at earlier stages (e.g., LGN), which may also include back projections (Webb et al. 2005), surround suppression in V1 and V2 neurons could be inherited from these earlier mechanisms. However, the context influence in DOF selectivity shows not only suppression but also enhancement and generally extends across the vertical meridian as illustrated in Fig. 3 (typically, contour segments on both sides contribute; Zhang and von der Heydt 2010). If DOF selectivity were inherited from subcortical circuits, this would not be possible.
Lamme and colleagues (Lamme 1995; Zipser et al. 1996; Lee et al. 1998) also suggested the involvement of feedback from higher level cortical areas, emphasizing the delayed onset of contextual modulation in their paradigm (about 100 ms after stimulus onset) as a result that is compatible with the feedback scheme. However, because this latency is relatively long, and the range of context influence in this paradigm is limited (maximum context distance of 4–5 deg; Zipser et al. 1996), these experiments do not rule out horizontal propagation mechanisms. To strengthen the argument for an involvement of higher level cortical areas, Lamme et al. (1998) showed that the figure enhancement effect disappears under anesthesia.
Visual context integration is often modeled on the basis of horizontal connectivity. Our results suggest that interareal feedback mechanisms must be involved. It is important to emphasize that assuming feedback from higher level visual areas does not imply influences of object shape memory or attention. Indeed, finding signal latencies of 70–80 ms rules out the possibility that shape-selective regions in inferotemporal cortex contribute, because these regions only begin to become active around this time (Bullier 2001). Attention does modulate DOF-selective neurons in V2, but DOF selectivity itself is independent of attention (Qiu et al. 2007). The feedback mechanisms could be simple, for example, based on concentric integration fields of neurons at the higher level and a broad fanning out of back projections (Craft et al. 2007; Roelfsema et al. 2002).
This research was supported by National Eye Institute Grants EY02966 and EY16281 and U.S. Office of Naval Research Grant N000141010278.
No conflicts of interest, financial or otherwise, are declared by the author(s).
We thank E. Niebur for critical comments on the manuscript, E. O. Craft for help in preparing Fig. 2, and O. Garalde for excellent technical assistance.
1 Zhou et al. (2000) used the term “border ownership selective” for this property. In the present study we use the more specific term “direction of figure selective” to distinguish border ownership selectivity that is based on the context (the distribution of contours) from border ownership selectivity based on local cues such as disparity variation or dynamic occlusion.
2 In comparing experimental results and prediction, we make the assumption that an increase in the mean conduction delay directly adds to the latency of the population signal.
- Copyright © 2011 the American Physiological Society