Neurons with surround suppression have been implicated in processing high-order visual features such as contrast- or texture-defined boundaries and subjective contours. However, little is known regarding how these neurons encode high-order visual information in a systematic manner as a population. To address this issue, we measured detailed spatial structures of the classical center and suppressive surround regions of receptive fields of primary visual cortex (V1) neurons and examined how a population of such neurons allows encoding of various high-order features and shapes in visual scenes. Using a novel method to reconstruct these structures, we found that the center and surround regions are often both elongated parallel to each other, reminiscent of on and off subregions of simple cells without surround suppression. These structures allow V1 neurons to extract high-order contours of various orientations and spatial frequencies, with a variety of optimal values across neurons. The results show that a wide range of orientations and widths of the high-order features are systematically represented by the population of V1 neurons with surround suppression.
Outside the classical receptive fields (CRFs), many cortical neurons have additional regions referred to as the surround, where stimuli typically reduce responses of neurons (Allman et al. 1985; Blakemore and Tobin 1972; Fitzpatrick 2000; Hubel and Wiesel 1965). The interactions between the CRF (center) and surround are suggested to signal higher-order visual features such as contrast and texture borders (Knierim and van Essen 1992; Nothdurft et al. 2000; Rossi et al. 2001; Shen et al. 2007; Zhou et al. 2000; Zipser et al. 1996) and subjective contours (von der Heydt et al. 1984). However, we do not yet have a systematic understanding of how such higher-order visual features are represented by the population of primary visual cortex (V1) neurons via these center–surround effects. One reason for this is that it has not previously been possible to measure the spatial structures of the center and surround organization with sufficient detail and accuracy. Although several studies investigated the spatial distribution of surround suppression (DeAngelis et al. 1994; Kapadia et al. 2000; Pack et al. 2003; Vinje and Gallant 2000; Walker et al. 1999), they were not able to describe the exact spatial forms of the center and surround. Without exact knowledge of the structure for a population of V1 neurons, it would be difficult to devise a testable hypothesis of encoding.
How is encoding of high-order features related to spatial organizations of the center and surround? The model neuron in Fig. 1A prefers horizontal gratings in the CRF (solid ellipses) and has suppressive surround in the end zones (dotted ellipses). An optimal stimulus for this neuron, a vertical strip of horizontal gratings, contains vertical contours defined by contrast differences, matched to the relative arrangement of the CRF and the suppressive zones. These contours are higher-order stimulus attributes, not detectable by the CRFs of simple and complex cells alone. The high-order borders, particularly their orientation, may be coded effectively if the CRF and surround are both elongated in parallel directions (Fig. 1B), although it is unclear whether neurons really have such structures.
In this study, we developed a new method for simultaneously obtaining the detailed maps of the CRF and surround and examined whether V1 neurons possess center–surround structures suitable for encoding orientation and width of the high-order features as depicted in Fig. 1. We also analyzed whether these structures show sufficient variations in orientations and widths (or spatial frequencies) necessary for systematically representing a wide range of high-order features as a population. We used a set of sinusoidal stimuli as illustrated in Fig. 1C. Intuitively, this stimulus set may be understood as a generalized form of the stimuli in Fig. 1, A and B, encompassing a wide range of widths and orientations of high-order contours. It is known that if a system is linear, the filter profile is fully reconstructed from the frequency-domain responses (Enroth-Cugell and Robson 1966; Movshon et al. 1978b; Ringach 2002; So and Shapley 1979). Here, we have extended this method to examine the center–surround organization of neurons in cat area 17.
Electrophysiological recordings and surgery
All animal care and experimental guidelines conformed to those established by the National Institutes of Health (Bethesda, MD) and were approved by the Osaka University Animal Care and Use Committee.
Subjects and surgeries
Twenty-nine normal adult cats (2–4 kg) were prepared for single-unit recording using standard procedures. After initial preanesthetic doses of hydroxyzine (Atarax; 2.5 mg) and atropine (0.05 mg), each cat was anesthetized with isoflurane (2–3.5% in oxygen). Cefotiam hydrochloride (Panspolin; 8.3 mg) and dexamethasone sodium phosphate (Decadron; 0.4 mg) were administered. Electrocardiogram electrodes and a rectal temperature probe were inserted and femoral veins were catheterized. A glass tracheal tube was inserted by tracheotomy. Subsequently, the animal was secured in a stereotaxic apparatus with the use of ear and mouth bars and clamps on the orbital rims. Tips of the ear bars were coated with local anesthetic gel (lidocaine). Anesthesia was then switched to sodium thiopental (Ravonal, given continuously at 1.0–1.5 mg·kg−1·h−1). After stabilization of anesthesia, paralysis was induced by a loading dose of gallamine triethiodide (Flaxedil, 10–20 mg) and the animal was placed under artificial respiration with a gas mixture of nitrous oxide (70%) and oxygen at the rate of 20–30 strokes/min. The respiration rate and stroke volume were adjusted to maintain the end-tidal CO2 between 3.5 and 4.3%. To maintain paralysis and anesthesia for the rest of the surgery and the following recording sessions, we continuously infused gallamine triethiodide (10 mg·kg−1·h−1) and sodium thiopental (1.0–1.5 mg·kg−1·h−1) contained in an infusion fluid (Ringer solution, 1 ml·kg−1·h−1) that also included glucose (40 mg·kg−1·h−1). A hole (typically 5–7 mm in diameter) was made over the representations of area 17 (Horsley–Clarke P4 L2). The dura was dissected away to allow insertion of microelectrodes. Pupils were dilated with atropine sulfate (1%), and nictitating membranes were retracted with phenylephrine hydrochloride (Neosynesin, 5%). Contact lenses of appropriate power with 4-mm artificial pupils were placed over the corneas.
Electrocardiogram, end-tidal CO2, intratracheal pressure, heart rate, and rectal temperature were continuously monitored and maintained at normal levels throughout the experiments.
Lacquer-coated tungsten microelectrodes (1–5 MΩ; A-M Systems, Sequim, WA) were used for extracellular recording from single cells in area 17. The signals from the electrodes were amplified, band-pass filtered (A-M Systems, Model 1800), and fed to a custom-made data acquisition system (Ohzawa et al. 1996) and an oscilloscope. The data acquisition system consisted of A-D converters and a spike-sorter that sorted signals from each electrode into a maximum of five different classes in real time. The isolated spike data (time resolution: 40 μs) were sent from the data acquisition system, along with stimulus timing information, to a separate computer that controlled trials and performed preliminary on-line analysis. The data were saved to a file to allow off-line analysis of the data.
Stimuli were produced by a Windows-based PC controlling a graphics card (Millennium G550, Matrox, Dorval, Quebec) and were displayed on a color CRT monitor (76 Hz, 1,600 × 1,024 pixels, mean luminance 47 cd·m−2; 46.6 cm in width and 29.9 cm in height; GDM-FW900, Sony, Tokyo). In each recording session, the luminance nonlinearity of the monitor was measured using a photometer (Minolta CS-100, Konica Minolta Photo Imaging, Mahwah, NJ) and linearized by gamma-corrected look-up tables. The animal saw the monitor screen through a custom-built haploscope, which allows visual stimuli to be presented to the left and right eyes separately. A black separator was placed between the left and right visual fields to preclude the projection of stimuli to unintended eyes. Distance (total length of light paths) between the monitor screen and the eyes was set to 57 cm, subtending the visual field of 23.3 × 29.9° for each eye.
The center–surround structure was probed by two-dimensional (2D) contrast-modulated gratings (Fig. 1C), in which the contrast of a high-spatial-frequency luminance grating (the carrier) is sinusoidally modulated by a low-spatial-frequency envelope. The spatiotemporal luminance profile of these stimuli, S(x, y, t), is the mean luminance Lmean modulated by a drifting sinusoidal carrier of contrast C, whose contrast is in turn scaled by a drifting sinusoidal envelope of modulation depth m. Here, fcx and fcy are the spatial frequencies of the carrier along the x- and y-axes; fex and fey are those of the envelope; and ωc and ωe are the temporal frequencies of the carrier and the envelope, respectively. The contrast of the carrier C was 50% and the modulation of the envelope m was 100%.
These gratings are also referred to as amplitude-modulated (AM) gratings. They belong to a class of stimuli referred to as “non-Fourier stimuli” or “second-order stimuli” (Cavanagh and Mather 1989; Chubb and Sperling 1988; Wilson 1999; Zhou and Baker Jr 1993).
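As an illustration of the stimulus described above, the following minimal sketch generates one frame of a contrast-modulated grating. The function name `cm_grating`, the grid, and the example frequencies are hypothetical, and the envelope scaling `(1 + m*cos)/2` is an assumption chosen so that local contrast never exceeds C; the normalization actually used in the experiments may differ.

```python
import numpy as np

def cm_grating(x, y, t, fc, fe, wc, we, C=0.5, m=1.0, L_mean=47.0):
    """Luminance of a contrast-modulated grating at (x, y) in deg and time t in s.

    fc = (fcx, fcy) and fe = (fex, fey) are carrier and envelope spatial
    frequencies (cycles/deg); wc and we are temporal frequencies in Hz.
    ASSUMPTION: the envelope is scaled as (1 + m*cos)/2, so local contrast
    ranges from 0 to C. The source's exact normalization may differ.
    """
    carrier = np.cos(2 * np.pi * (fc[0] * x + fc[1] * y - wc * t))
    envelope = 0.5 * (1.0 + m * np.cos(2 * np.pi * (fe[0] * x + fe[1] * y - we * t)))
    return L_mean * (1.0 + C * envelope * carrier)

# One 15-deg frame: horizontal carrier (luminance varies along y) with a
# vertical contrast envelope (contrast varies along x). Values are illustrative.
axis = np.linspace(-7.5, 7.5, 301)
X, Y = np.meshgrid(axis, axis)
frame = cm_grating(X, Y, t=0.0, fc=(0.0, 0.22), fe=(0.05, 0.0), wc=5.0, we=0.75)
```

Because the envelope multiplies the carrier rather than adding to it, the space-averaged luminance stays at Lmean everywhere, which is why such stimuli are not detectable by a purely linear luminance filter tuned to the envelope frequency.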
Once extracellular spikes of single neurons were isolated in recordings from anesthetized and paralyzed cat V1 (area 17), basic tuning parameters of the CRFs were determined initially using conventional stimuli under manual mouse control. By manually changing the stimulus parameters and the patch size, we determined the approximate position of the CRF, the preferred orientation, spatial frequency, and direction, and whether the neuron had any surround suppression. Then, a series of quantitative measurements were conducted under computer control. A typical sequence of measurements was as follows. Spatial frequency and orientation tunings were measured by subspace reverse correlation using rapidly flashed grating stimuli (Nishimoto et al. 2005; Ringach et al. 1997a). A dense-noise stimulus was presented to obtain receptive-field (RF) maps of the CRF (linear kernel) using standard reverse correlation (Sasaki and Ohzawa 2007). Orientation and spatial-frequency tunings were also obtained using conventional drifting grating stimuli. At this point, the accurate position of the CRF center could be determined for simple cells and some complex cells based on linear RF maps from the dense-noise data and via inverse Fourier reconstruction from the subspace-mapping data (Ringach et al. 1997b). To confirm the center position of the CRF, in particular for complex cells for which linear maps could not be measured, we also carefully adjusted the stimulus position using a small drifting grating patch under mouse control. The rest of the measurements were conducted entirely under computer control.
Then, to determine the degree of surround suppression, size-tuning of the neuron was measured by varying the diameter of a patch of drifting grating of optimal parameters. If neurons appeared to show consistent surround suppression, we then measured responses to a set of contrast-modulated gratings (Fig. 1C). Size of these grating patches was set to be larger than that at which responses dropped to an asymptotic level in the preceding size-tuning measurements. The center of stimuli was positioned near the center of the CRF of a main target neuron. The orientation, direction, and spatial frequency of the carrier were always set at the optimal parameters for the CRF. The temporal frequency of the envelope was 0.5 or 0.75 Hz; the temporal frequency of the carrier was 2 to 5 Hz.
Measurements of neuronal responses to the contrast-modulated gratings typically consisted of two runs. First, we recorded responses of the neuron while the orientation of the contrast envelope was varied in 30° steps. The spatial period of the contrast envelope was set at twice the peak diameter in the size-tuning curve. Next, we measured responses to various spatial frequencies of the contrast envelope for one envelope orientation or two or four evenly spaced envelope orientations. These orientations of the contrast envelopes included the optimal one determined in the first run. For each spatial frequency and orientation, the contrast envelope was drifted in two opposite directions. Testing with both directions of envelope movement is important for obtaining accurate phase information of the center–surround structure as well as determining direction selectivity of the center–surround mechanism. Null stimuli consisting of a uniform mean-luminance screen were included in both of the runs. Each stimulus was typically presented five times in a pseudorandom order. In each trial, a stimulus was presented for 4 s. For a small proportion of neurons, the first envelope orientation-tuning run was not conducted.
Responses to the contrast envelopes were quantified as the amplitude and phase of the fundamental frequency components (F1) at the temporal frequency of the envelope drift. F1 components were computed by applying the Fourier transform to peristimulus time histograms (PSTHs) summed over all repeated trials to obtain reliable values. With this method, however, trial variations of F1 responses for each stimulus were not obtained. When they were necessary (e.g., for a statistical test), we also computed F1 components on a trial-by-trial basis. ANOVA for the orientation tuning (Fig. 8A) was applied to these data in which each of six orientations typically had 10 data points (5 trials × 2 directions).
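The F1 computation described above can be sketched as a projection of the PSTH onto a complex exponential at the envelope temporal frequency. This is a minimal illustration, not the analysis code used in the study; the function name `f1_component`, the 10-ms bin width, and the synthetic PSTH are assumptions.

```python
import numpy as np

def f1_component(psth, bin_width, f_env):
    """Amplitude and phase (rad) of the PSTH component at frequency f_env (Hz).

    Assumes the PSTH spans an integer number of envelope cycles, so that
    projecting onto exp(-i*2*pi*f*t) isolates the fundamental exactly.
    """
    t = np.arange(len(psth)) * bin_width
    coeff = 2.0 / len(psth) * np.sum(psth * np.exp(-2j * np.pi * f_env * t))
    return np.abs(coeff), np.angle(coeff)

# Synthetic 4-s PSTH (10-ms bins): 8 spikes/s mean rate with a 5 spikes/s
# modulation at 0.75 Hz (three full cycles in 4 s) and a phase of 0.6 rad.
bin_width, f_env = 0.01, 0.75
t = np.arange(400) * bin_width
psth = 8.0 + 5.0 * np.cos(2 * np.pi * f_env * t + 0.6)
amp, phase = f1_component(psth, bin_width, f_env)
```

Applying the same function to each trial separately, rather than to the summed PSTH, would give the trial-by-trial F1 values needed for statistical tests.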
We analyzed neurons that produced >4 spikes/s to at least one of the contrast-modulated gratings (n = 180). For 81 of these neurons, more than two envelope orientations were tested in the run where envelope spatial frequency was varied. In this run, the carrier temporal frequency was set at 4.5 or 5 Hz. These neurons were used for reconstructing the full 2D center–surround structures and for further analyses based on these structures. Additional analyses were conducted for a larger population of neurons if a subset of data were sufficient for the given purpose.
Reconstruction of the center–surround structure
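The spatial profiles of the CRF (center) and suppressive surround may both be defined as spatial sensitivity functions for contrast using Gaussian functions. The center Gaussian gc(x, y) is defined as
$$g_c(x, y) = A_c \exp\left[-\frac{(x-\mu_{cx})^2}{2\sigma_{cx}^2} - \frac{(y-\mu_{cy})^2}{2\sigma_{cy}^2}\right] \tag{1}$$
where μcx, μcy, σcx, and σcy are position and width parameters for the x- and y-axes and Ac is its amplitude. Similarly, the surround Gaussian gs(x, y) is denoted as
$$g_s(x, y) = A_s \exp\left[-\frac{(x-\mu_{sx})^2}{2\sigma_{sx}^2} - \frac{(y-\mu_{sy})^2}{2\sigma_{sy}^2}\right] \tag{2}$$
where As, μsx, μsy, σsx, and σsy are corresponding parameters for the surround.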
We assume that responses of the combined center–surround RF are computed by subtracting responses of the surround from those of the center. Then, the combined center–surround RF is modeled by a difference-of-Gaussian (DOG) function
$$g(x, y) = g_c(x, y) - g_s(x, y) \tag{3}$$
Position and size of the two Gaussians were allowed to be independent. The two Gaussians were also allowed to rotate independently, although not expressed in the preceding equations. This model is conceived as a generalized version of the DOG function and is able to represent a variety of known RF structures from concentric center–surround organizations to simple-cell–like parallel elongated subregion structures.
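The generalized DOG model described above can be sketched as follows. This is an illustrative implementation under stated assumptions: each Gaussian is taken to have a peak amplitude and to rotate about its own center, and all names and parameter values are hypothetical rather than taken from the study.

```python
import numpy as np

def rotated_gaussian(x, y, amp, mu_x, mu_y, sigma_x, sigma_y, theta):
    """2D Gaussian with peak `amp`, centred at (mu_x, mu_y), axes rotated by theta (rad).

    ASSUMPTION: rotation is about the Gaussian's own centre.
    """
    dx, dy = x - mu_x, y - mu_y
    u = dx * np.cos(theta) + dy * np.sin(theta)    # coordinate along the rotated x-axis
    v = -dx * np.sin(theta) + dy * np.cos(theta)   # coordinate along the rotated y-axis
    return amp * np.exp(-u**2 / (2 * sigma_x**2) - v**2 / (2 * sigma_y**2))

def dog(x, y, center, surround):
    """Center Gaussian minus surround Gaussian; each is a dict of rotated_gaussian parameters."""
    return rotated_gaussian(x, y, **center) - rotated_gaussian(x, y, **surround)

# Hypothetical values: an elongated center flanked by a parallel, displaced surround.
center = dict(amp=1.0, mu_x=0.0, mu_y=0.0, sigma_x=1.0, sigma_y=3.0, theta=0.0)
surround = dict(amp=0.6, mu_x=2.5, mu_y=0.0, sigma_x=1.2, sigma_y=3.5, theta=0.0)
axis = np.linspace(-10, 10, 201)
X, Y = np.meshgrid(axis, axis)
rf = dog(X, Y, center, surround)
```

Setting the two centers equal and the surround wider than the center gives a concentric organization, whereas displaced, elongated Gaussians as in this example give the simple-cell-like parallel arrangement.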
We show in Supplemental Note S1 and the associated figures that F1 components of responses of V1 neurons to contrast-modulated gratings represent the local stimulus contrast that defines the contrast envelope and that the following linear system analyses may be applied to these responses to reconstruct the center–surround structure with accuracy.
The modulated components of the stimulus contrast of the 2D contrast-modulated gratings are written as
$$E(x, y, t) = \cos\left[2\pi(f_x x + f_y y) \pm \omega t\right] \tag{4}$$
where fx and fy are the spatial frequencies of the contrast envelope for the x- and y-axes and ω indicates the contrast-envelope temporal frequency. The plus-minus sign (±) indicates the two directions of the contrast-envelope drift. When we denote a spatial linear filter for contrast as h(x, y) and responses of the filter to the contrast envelopes as R(t), R(t) also becomes a sinusoidal function with the temporal frequency of the stimulus drift ω
$$R(t) = A(f_x, f_y, \omega)\cos\left[\omega t + P(f_x, f_y, \omega)\right] \tag{5}$$
where A(fx, fy, ω) and P(fx, fy, ω) indicate response amplitude and phase, respectively.
If we further assume that the response time course of the neuron is independent of its spatial RF structure h(x, y), these amplitudes and phases are written as
$$A(f_x, f_y, \omega) = c\,A_H(f_x, f_y) \tag{6}$$
$$P(f_x, f_y, \omega) = \mp P_H(f_x, f_y) + \rho \tag{7}$$
where AH and PH indicate the amplitude and phase of the Fourier transform of h(x, y), c and ρ are constants given a temporal frequency ω, and the upper and lower signs in Eq. 7 correspond to those in Eq. 4. It has been shown that the response time courses of the CRF and the surround are roughly matched (the surround response is on average delayed by 7 ms; Knierim and van Essen 1992), validating the aforementioned separability assumption. Notice also that if this separability assumption holds, the center–surround RF filter has no direction selectivity for the contrast envelope, which was actually observed in our data (Figs. 3C and 10D). See the discussion for more details on this issue.
Equations 6 and 7 state that responses of individual neurons to the contrast-modulated gratings can be used to specify functional shapes of h(x, y). When h(x, y) is a DOG function g(x, y), AH(fx, fy) and PH(fx, fy) in Eqs. 6 and 7 are defined by the amplitude and phase of its Fourier transform G(fx, fy), taken with the kernel exp[−i2π(fx x + fy y)], which is defined as
$$G(f_x, f_y) = 2\pi A_c \sigma_{cx}\sigma_{cy}\, e^{-2\pi^2(\sigma_{cx}^2 f_x^2 + \sigma_{cy}^2 f_y^2)}\, e^{-i2\pi(\mu_{cx} f_x + \mu_{cy} f_y)} - 2\pi A_s \sigma_{sx}\sigma_{sy}\, e^{-2\pi^2(\sigma_{sx}^2 f_x^2 + \sigma_{sy}^2 f_y^2)}\, e^{-i2\pi(\mu_{sx} f_x + \mu_{sy} f_y)} \tag{8}$$
where Ac, μcx, μcy, σcx, σcy, As, μsx, μsy, σsx, and σsy are the same parameters as those in Eqs. 1 and 2 and i = √−1 (although a rotation of the DOG functions is not expressed in this equation, it was taken into account in the actual analysis). Predicted responses of the DOG model to the contrast envelopes are thus expressed by Eqs. 6 and 7, using AH(fx, fy) and PH(fx, fy) defined as the amplitude and phase components of G(fx, fy). To specify the model, we fitted the obtained F1 amplitude and phase data by these predicted responses. Details of the fitting procedure are described in the following text.
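As a sanity check on the frequency-domain form of a DOG, the sketch below compares the analytic Fourier transform of a 1D difference of Gaussians with a direct numerical integration. It is a 1D analogue written for illustration only, assuming peak-amplitude Gaussians and the kernel exp(−i2πfx); the parameter values and function names are hypothetical.

```python
import numpy as np

def dog_1d(x, Ac, mu_c, sig_c, As, mu_s, sig_s):
    """1D difference of Gaussians with peak amplitudes Ac and As."""
    return (Ac * np.exp(-(x - mu_c)**2 / (2 * sig_c**2))
            - As * np.exp(-(x - mu_s)**2 / (2 * sig_s**2)))

def dog_1d_ft(f, Ac, mu_c, sig_c, As, mu_s, sig_s):
    """Analytic Fourier transform of dog_1d, using the kernel exp(-i*2*pi*f*x)."""
    def term(A, mu, sig):
        return (A * sig * np.sqrt(2 * np.pi)
                * np.exp(-2 * np.pi**2 * sig**2 * f**2)   # Gaussian in frequency, width 1/(2*pi*sig)
                * np.exp(-2j * np.pi * mu * f))           # linear phase from the spatial offset
    return term(Ac, mu_c, sig_c) - term(As, mu_s, sig_s)

params = (1.0, 0.0, 1.0, 0.6, 2.5, 1.2)   # hypothetical parameter values
x = np.linspace(-40, 40, 8001)
dx = x[1] - x[0]
freqs = np.array([0.0, 0.05, 0.1, 0.2])

# Riemann-sum approximation of the Fourier integral at each frequency.
numeric = np.array([np.sum(dog_1d(x, *params) * np.exp(-2j * np.pi * f * x)) * dx
                    for f in freqs])
analytic = dog_1d_ft(freqs, *params)
max_error = np.max(np.abs(numeric - analytic))
```

The two terms show how each parameter appears in the data: the widths σ set the falloff of amplitude with envelope spatial frequency, and the positions μ set the slope of the phase.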
For reconstruction of one-dimensional (1D) center–surround structures along a given axis by the inverse Fourier transform, a 1D spatial phase function of the envelope spatial frequency for a fixed envelope orientation was computed by subtracting the two response phases for the same contrast-modulated gratings drifted in two opposite directions and dividing the difference by 2, that is, $\phi(f) = [P_1(f) - P_2(f)]/2$, where P1(f) and P2(f) are the response phases for the two drift directions at envelope spatial frequency f.
A 1D amplitude function was also computed by averaging the two response amplitudes to the two gratings. Application of the inverse Fourier transform to the averaged amplitude function and the spatial phase function produced a 1D spatial profile along the axis perpendicular to the fixed envelope orientation. The relationship between response phases and spatial phases of a linear filter has been derived (Hamilton et al. 1989).
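The 1D reconstruction described above can be sketched as follows. This is a simplified illustration, assuming evenly spaced envelope spatial frequencies starting at 0, response phases that have already been unwrapped, and a real-valued spatial profile; the function names and the synthetic test profile are hypothetical.

```python
import numpy as np

def spatial_transfer(amp_1, phase_1, amp_2, phase_2):
    """Average the two amplitudes and halve the phase difference.

    The common temporal phase cancels in the subtraction.
    ASSUMPTION: the input phases are already unwrapped.
    """
    amp = 0.5 * (np.asarray(amp_1) + np.asarray(amp_2))
    phase = 0.5 * (np.asarray(phase_1) - np.asarray(phase_2))
    return amp, phase

def inverse_1d(freqs, amp, phase, x):
    """Real 1D profile from a one-sided transfer function sampled at evenly spaced freqs >= 0."""
    df = freqs[1] - freqs[0]
    H = amp * np.exp(1j * phase)
    weights = np.where(freqs == 0, 1.0, 2.0)   # negative frequencies mirror the positive ones
    basis = np.exp(2j * np.pi * np.outer(x, freqs))
    return df * np.real(basis @ (weights * H))

# Synthetic check: a unit Gaussian centred at 1 deg with a temporal phase of 0.4 rad.
freqs = np.linspace(0.0, 1.0, 51)
mu, sigma, rho = 1.0, 1.0, 0.4
amp_true = sigma * np.sqrt(2 * np.pi) * np.exp(-2 * np.pi**2 * sigma**2 * freqs**2)
phi_true = -2 * np.pi * mu * freqs
amp, phase = spatial_transfer(amp_true, phi_true + rho, amp_true, -phi_true + rho)
x = np.linspace(-5, 5, 101)
profile = inverse_1d(freqs, amp, phase, x)
```

In this synthetic case the recovered profile is the original Gaussian at its true position, which illustrates why testing both drift directions is needed: a single direction would leave the temporal phase mixed into the spatial phase and shift the profile.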
The fitting procedure is as follows. We measured the full data set of amplitude and phase of F1 components, plotted as functions of spatial frequency for the contrast envelope at several envelope orientations. An example of amplitude and phase functions for one selected orientation is shown in Fig. 2, A and B. These are functions for optimal envelope orientation for the neuron shown in Fig. 3 (see full data in Fig. 3, C and D). Next, the amplitude and phase data for each frequency component are plotted on a polar plane in which the radial length and angle of each point indicate the amplitude and phase of each frequency component, respectively (Fig. 2C). Figure 2D shows this data representation for amplitude and phase data in Fig. 2, A and B. Note that the fits were not performed in the space domain in which the DOG function is defined. Instead, the fits and the associated computation of fitting errors were conducted in the domain of Fig. 2D, representing the spatial-frequency response in the polar coordinate system.
We used the Levenberg–Marquardt algorithm (MATLAB lsqcurvefit function), which performs nonlinear model fitting based on a least-squares criterion. The model includes 13 parameters in total (Ac, σcx, σcy, μcx, μcy, As, σsx, σsy, μsx, and μsy in Eq. 8, two rotation parameters, and temporal delay ρ in Eq. 7). Since the error surfaces of nonlinear regression usually have multiple local minima, starting from good initial values is important for reaching a global minimum. The following procedure was conducted for this purpose.
First, we conducted a 1D inverse Fourier transform to the amplitude and phase function for the optimal envelope orientation, so that we obtained 1D spatial profiles along the axis perpendicular to this envelope orientation. The solid line of Fig. 2E shows the 1D spatial profile obtained by applying this method to data in Fig. 2, A and B. We also applied this procedure to data for the envelope orientation perpendicular to the optimal one, obtaining another spatial profile along an axis perpendicular to that of the former profile.
Note that these 1D profiles are determined model-free via direct inverse Fourier transformation. We now obtain some of the model parameters via 1D fits to these profiles because such fits may be conducted more reliably with only six free parameters. These fitted parameters can then be used as initial model parameters for the full 2D model. The spatial profile for the optimal envelope orientation was fit with a 1D DOG function along the x-axis
$$g'(x) = A'_c \exp\left[-\frac{(x-\mu_{cx})^2}{2\sigma_{cx}^2}\right] - A'_s \exp\left[-\frac{(x-\mu_{sx})^2}{2\sigma_{sx}^2}\right] \tag{9}$$
The other profile obtained at the envelope orientation orthogonal to the optimal was fit with another 1D DOG function along the y-axis
$$g''(y) = A''_c \exp\left[-\frac{(y-\mu_{cy})^2}{2\sigma_{cy}^2}\right] - A''_s \exp\left[-\frac{(y-\mu_{sy})^2}{2\sigma_{sy}^2}\right] \tag{10}$$
Initial values for these 1D fits were selected based on the peak and trough amplitudes of the 1D profiles and their locations. The fitted parameters for Eqs. 9 and 10 were combined to set initial values for 10 of the 13 2D DOG model parameters that appear in Eq. 8. For the amplitude parameters, the fitted values of A′c and A′s in Eq. 9, not A′′c and A′′s in Eq. 10, were used as the initial values for the 2D fitting. The remaining 3 initial parameters for rotations and temporal phase were set to 0. The gray broken curve in Fig. 2F is a 1D DOG function fitted to the spatial profile indicated by a black curve for the optimal envelope orientation. As shown in this example, we found that this preparatory fitting generally works well. The mean r² value of this fitting was 0.87 (SD = 0.19, n = 53 neurons for which 2D reconstruction was conducted; see results). When the 1D spatial profile for the nonoptimal envelope orientation is completely flat, a reliable fit may not be obtainable. Even in such cases, we used the fitted values from Eq. 10, together with the reasonable initial values from Eq. 9, as initial values for the subsequent 2D fitting.
As the final stage of fitting, we calculated the response predictions of the 2D DOG model (Eqs. 6 and 7) and expressed them on the polar plane, where the neurons' actual data are also plotted. Starting from the initial parameter values determined earlier, we then determined the parameters that produced the best fit to the data in the polar domain. All 13 parameters were fitted simultaneously. The procedure was repeated with minor variations of initial values, and the best fit was adopted. An example of fitting by a 2D DOG model is shown by gray curves in Fig. 2G, where predictions of a 2D DOG model are fitted to data shown in Fig. 2D. Although only data for one envelope orientation are shown here, the 2D fitting was actually conducted simultaneously for the full data set (as shown in Fig. 3F). The fitting was considered to be successful if the coefficient of determination (r²) was >0.65. If the width parameters of the fitted G(fx, fy), that is, 1/(2πσ), became larger than the highest value of the measured spatial frequencies, we considered that the range of measurement was not appropriate and the fitting was unsuccessful.
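Fitting in the polar domain amounts to treating each amplitude and phase pair as a point in the complex plane and minimizing the distance between data and model points. The sketch below shows such a residual function for a 1D DOG; it is an illustration only, with amplitude scaling constants absorbed into Ac and As, a generic `sign` of +1 or −1 standing for the two drift directions, and hypothetical names and values throughout. A residual vector of this kind could be handed to any Levenberg–Marquardt routine.

```python
import numpy as np

def predicted_response(f, sign, params):
    """Complex F1 prediction |G| * exp(i*(sign*arg(G) + rho)) for a 1D DOG model.

    sign is +1 or -1 for the two envelope drift directions (hypothetical convention).
    """
    Ac, mu_c, sig_c, As, mu_s, sig_s, rho = params
    def term(A, mu, sig):
        return A * np.exp(-2 * np.pi**2 * sig**2 * f**2) * np.exp(-2j * np.pi * mu * f)
    G = term(Ac, mu_c, sig_c) - term(As, mu_s, sig_s)
    return np.abs(G) * np.exp(1j * (sign * np.angle(G) + rho))

def polar_residuals(params, f, sign, amp, phase):
    """Stack real and imaginary differences between data and model in the polar plane."""
    diff = amp * np.exp(1j * phase) - predicted_response(f, sign, params)
    return np.concatenate([diff.real, diff.imag])

# Synthetic data generated from known parameters, for both drift directions.
f = np.array([0.0, 0.05, 0.1, 0.15, 0.2, 0.05, 0.1, 0.15, 0.2])
sign = np.array([1, 1, 1, 1, 1, -1, -1, -1, -1])
true_params = (1.0, 0.0, 1.0, 0.6, 2.5, 1.2, 0.3)
data = predicted_response(f, sign, true_params)

res_true = polar_residuals(true_params, f, sign, np.abs(data), np.angle(data))
perturbed = (1.0, 0.5, 1.0, 0.6, 2.5, 1.2, 0.3)   # center shifted by 0.5 deg
sse_perturbed = np.sum(polar_residuals(perturbed, f, sign, np.abs(data), np.angle(data))**2)
```

Working with real and imaginary parts avoids the wrap-around problem that arises when phase errors are computed directly, and it automatically gives less weight to phases measured at low response amplitudes.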
In addition to predetermining initial model parameters for fitting as described earlier, we also conducted full 2D fits with 10 independent sets of randomly chosen initial values for the 13 parameters within the range of possible values. As expected, some of these fits were trapped in completely different local minima, but for 36 of the 53 neurons for which 2D reconstructions were conducted (see results), at least one of the 10 random fitting attempts (median 3, maximum 10) converged to the same final fit as that initialized by the 1D fits. In only 2 of 53 cells did randomly initialized fits perform better than those with predetermined initialization, with r² values higher by >0.01. However, the 2D fittings for these two cells were not successful (r² <0.65) and they were not included in the 35 neurons that were used for analysis of 2D center and surround structures (see results).
Note that the main model includes temporal phase parameter ρ, in addition to the model spatial parameters. The temporal phase component in the response phase is absorbed in this parameter, being isolated from spatial phase components (Eq. 7). Therefore our method can determine the absolute localization of the center and surround mechanisms.
To evaluate the reliability of the final fitted parameters, we conducted a bootstrap analysis (Efron and Tibshirani 1993). For each stimulus presented five times, we randomly resampled five responses from the original five with replacement. We then fit the model to these data and obtained fitted model parameters. We conducted this computation 100 times and calculated the bootstrap SEs of the model parameters (SD of 100 fitted values for each model parameter) as a measure for evaluating reliability of the model parameters. We also calculated 100 samples of elongation indices of the reconstructed structures (see results) and used them to test whether the indices were significantly >1 (one-tailed t-test).
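The resampling scheme can be sketched as follows. For brevity the estimator here is simply the mean of the resampled responses, standing in for the full model fit that the analysis applies to each resampled data set; the function name `bootstrap_se` and the response values are hypothetical.

```python
import numpy as np

def bootstrap_se(trials, estimator, n_boot=100, rng=None):
    """SD of `estimator` over n_boot resamples of the trials, drawn with replacement."""
    rng = np.random.default_rng() if rng is None else rng
    trials = np.asarray(trials)
    n = len(trials)
    # Each resample draws n trial indices with replacement, so repeats are allowed.
    estimates = [estimator(trials[rng.integers(0, n, size=n)]) for _ in range(n_boot)]
    return np.std(estimates, ddof=1)

# Hypothetical F1 amplitudes (spikes/s) from five repeats of one stimulus.
f1_amplitudes = np.array([11.2, 9.8, 12.5, 10.1, 10.9])
se = bootstrap_se(f1_amplitudes, np.mean, rng=np.random.default_rng(0))
```

Replacing `np.mean` with a function that refits the model and returns one parameter would yield the bootstrap SE of that parameter.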
The DOG model assumes that the interaction between the center and surround is linear and subtractive. However, the interaction has often been described by a division between the responses from the two regions (Cavanaugh et al. 2002a; DeAngelis et al. 1994), reflecting the fact that the surround stimuli that have suppressive effects on the center stimuli may not affect or sometimes enhance neuronal responses when presented alone. For this reason, we must examine possible consequences of using the subtractive DOG model and applying it to the data obtained from an intrinsically divisive mechanism. We conducted this analysis by simulations. The results show that our method generates nearly identical receptive fields regardless of whether the underlying mechanism is subtractive or divisive, as we describe in Supplemental Note S2 and the associated figures (Supplemental Figs. S3 and S4). These results are consistent with the observation that the subtractive and divisive models yield similar estimates of the extent of the center and surround (Sceniak et al. 2001). We therefore use the subtractive DOG models throughout the present study.
Quantitative analysis of the size-tuning curves
Size-tuning curves were measured with optimal luminance sinusoidal gratings. The diameter of the grating typically varied from 0.5 to 18°. For each size, the grating was presented 5 to 12 times. To quantify the size-tuning curves, we performed spline interpolation on the data points and calculated the center diameter at which the spline curve first exceeded 95% of its peak amplitude, and the asymptotic diameter, at which the curve first dropped by >90% of the peak-to-asymptotic amplitude of the curve. The former represents an estimate of the CRF diameter, whereas the latter gives an estimate of the farthest point of the surround (outer diameter of the surround, if the surround is concentric).
To quantify the strength of suppression, we calculated the suppression index, defined as the ratio of the peak-to-asymptotic amplitude to the peak amplitude. An index value of 1 indicates complete suppression, whereas 0 indicates no suppression. An index value of 0.25 was used as the criterion for selecting neurons for the main analysis.
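The three size-tuning measures can be sketched as below. This is a simplified illustration: linear interpolation (`np.interp`) is swapped in for the spline interpolation described in the text, the asymptotic level is assumed to be the response at the largest tested diameter, and the data values are hypothetical.

```python
import numpy as np

def size_tuning_metrics(diameters, responses, n_dense=2000):
    """Return (center diameter, asymptotic diameter, suppression index)."""
    d = np.linspace(diameters[0], diameters[-1], n_dense)
    r = np.interp(d, diameters, responses)            # linear stand-in for the spline
    peak_idx = int(np.argmax(r))
    peak = r[peak_idx]
    asymptote = r[-1]                                 # ASSUMPTION: response at the largest size
    center_diam = d[np.argmax(r >= 0.95 * peak)]      # first crossing of 95% of the peak
    drop = peak - asymptote                           # peak-to-asymptotic amplitude
    dropped = r[peak_idx:] <= peak - 0.9 * drop
    asym_diam = d[peak_idx + np.argmax(dropped)]      # first 90% drop after the peak
    return center_diam, asym_diam, drop / peak

# Hypothetical size-tuning data (diameter in deg, response in spikes/s).
diameters = np.array([0.5, 1.0, 2.0, 3.0, 5.0, 8.0, 12.0, 18.0])
responses = np.array([2.0, 8.0, 16.0, 20.0, 12.0, 5.0, 2.0, 2.0])
center_diam, asym_diam, suppression_index = size_tuning_metrics(diameters, responses)
```

For these example data the response peaks at 3° and falls to one-tenth of its peak, so the suppression index is close to 0.9 and the neuron would comfortably pass a 0.25 criterion.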
To predict size-tuning curves based on the reconstructed center–surround structures, volumes of these structures within a given circle were calculated and plotted as a function of the circle diameter. The center of the circles was set to be the same as that in the real size-tuning measurement. The center (peak) and asymptotic diameters for these curves were determined the same way as the actual tuning curves were.
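The volume-within-a-circle prediction can be sketched with a simple grid approximation. The concentric DOG and its parameter values below are hypothetical and serve only to show that a center–surround structure with a positive net volume yields a peaked size-tuning curve.

```python
import numpy as np

def predicted_size_tuning(rf, X, Y, diameters, center=(0.0, 0.0)):
    """Sum of the RF map inside circles of the given diameters (grid approximation of volume)."""
    cell_area = (X[0, 1] - X[0, 0]) * (Y[1, 0] - Y[0, 0])
    dist = np.hypot(X - center[0], Y - center[1])
    return np.array([rf[dist <= d / 2].sum() * cell_area for d in diameters])

axis = np.linspace(-12, 12, 481)
X, Y = np.meshgrid(axis, axis)

# Hypothetical concentric DOG: narrow center (sigma 1 deg), broad weak surround (sigma 3 deg).
r2 = X**2 + Y**2
rf = np.exp(-r2 / (2 * 1.0**2)) - 0.08 * np.exp(-r2 / (2 * 3.0**2))

diameters = np.linspace(0.5, 18, 36)
curve = predicted_size_tuning(rf, X, Y, diameters)
peak_diameter = diameters[np.argmax(curve)]
```

The predicted curve rises while the circle covers mostly the center and declines as it recruits more of the surround, so its peak and asymptotic diameters can be compared directly with those of the measured size-tuning curve.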
Tests using small-grating patches in the surround
To directly measure suppressive effects of the surround, a conditioning patch was presented at the CRF while another patch of the same size was presented at one of eight locations around the CRF (Walker et al. 1999). These locations lay along the absolute vertical, horizontal, and oblique (±45°) axes extending from the CRF center and were placed at a distance equal to the diameter of the patch. The grating parameters for the CRF patch were set to be optimal for the CRF. Its diameter was set to the center diameter obtained from the size-tuning measurements. The surround patches had the same parameters as the CRF patch except that their locations were different and a part of the patch could be missing (so as not to invade the CRF region). The presentation of each stimulus lasted 2 s and was repeated 5 to 12 times.
Once one or more neurons were isolated, we first determined basic characteristics of the CRF using conventional stimuli including sinusoidal gratings with uniform contrast (see methods). We then examined the degree of the surround suppression by increasing the diameter of the optimal grating patch centered about the CRF. If neurons appeared to show consistent surround suppression, they were presented with a complete set of contrast-modulated gratings (Fig. 1C), in which the contrast of fine sinusoidal carrier patterns is modulated by yet another sinusoid envelope. They were presented to an area sufficiently large for covering both the CRF and the surround. The carrier was always set to optimal parameters for the CRF and was drifted at a relatively high temporal frequency in the best direction for the CRF. We also drifted the contrast envelope at a relatively low temporal frequency. Responses to these gratings were used to reconstruct the CRF and surround, described in the next section.
Reconstruction of the center–surround structures
To reconstruct the center (CRF) and surround structure, we fitted a variant of a previously published model to data from individual V1 neurons (DeAngelis et al. 1994; Sceniak et al. 1999, 2002). The center and surround were modeled as positive and negative Gaussian filters, respectively, which linearly sum the contrast amplitude of visual stimuli, weighted by their Gaussian profiles. Net responses of the center–surround organization model were calculated by subtracting the surround response from the center response. Position, size (length and width), and axis of the center Gaussian were independent of those of the surround. This model is conceived as a generalized version of the DOG model and is able to represent a variety of known RF structures from concentric center–surround organizations to simple-cell–like parallel elongated subregion structures. Gabor functions were not used as models for reconstruction because 1) they do not allow concentric organizations, which do occur for center–surround structures; 2) position, size, and axis of the center and surround cannot be independent; and 3) highly periodic structures containing multiple CRF and surround regions are unrealistic because the CRF is generally thought to be a single region.
To specify the model for individual cortical neurons, we used neurons' responses to the contrast-modulated gratings. In general, if stimulated by sinusoidal gratings of various spatial frequencies drifted at a given temporal frequency, a linear filter shows responses that are also sinusoidally modulated at the same temporal frequency. Amplitude and phase of these temporal-frequency components form frequency-transfer functions that fully determine spatiotemporal profiles of the filter (Hamilton et al. 1989). Therefore by fitting the DOG models to measured transfer functions for the contrast envelopes, we specified the center–surround structures for each recorded neuron.
To measure the full range of transfer functions, we varied the spatial frequency of the contrast envelope from 0 to relatively high spatial frequencies. When the spatial frequency of the contrast envelope is close to or higher than the carrier spatial frequencies, it is generally not possible to recover the envelope profile due to aliasing. However, we show in Supplemental Note S1 that, by analyzing the F1 components of responses of V1 neurons to the contrast-modulated gratings, it is indeed possible to recover the envelope profile, thereby allowing reconstruction of the center–surround structure.
Figure 3 shows an example of reconstruction from a complex V1 neuron whose optimal spatial frequency and orientation of the CRF were 0.22 cycle/degree (cpd) and 90°, respectively. This neuron revealed strong surround suppression with a size-tuning curve (for conventional luminance grating) peaking at 3° and dropping to a spontaneous firing level at about 10° (Fig. 3A). We then presented the contrast-modulated gratings with a patch diameter of 15°. First, the best orientation for the contrast envelope was determined by an orientation-tuning measurement (Fig. 3B) where the envelope orientation was varied in 30° steps over one cycle (180°) with a fixed envelope spatial period (about twice the CRF size). This neuron revealed a strong envelope-orientation selectivity, preferring 90°. Then, the spatial frequency of the contrast envelope was varied from 0 to near the carrier frequencies in nine steps for four envelope orientations, one of which was optimal. For each spatial frequency and orientation, the contrast envelope was drifted in two opposite directions. As expected, responses to these gratings were highly modulated at the temporal frequency of the envelope drift (0.75 Hz, Fig. 3E). Figure 3, C and D shows the amplitude and phase tuning curves of these modulated components of responses plotted as a frequency-transfer function for the contrast envelope. Red, green, blue, and black colors indicate the optimal envelope orientation (90° for this cell) and progressively increasing orientations in 45° steps, respectively. The positive and negative spatial frequencies indicate two opposite directions of the envelope drift. A spatial frequency of zero means that the contrast of the full patch of the carrier grating was modulated at the same temporal frequency as that for the envelope drift. This neuron responded equally well to either direction of the envelope movement. The neuron also showed a strong band-pass tuning. 
The band-pass spatial-frequency tuning, in particular the low-frequency falloff, is generally interpreted as evidence for the presence of antagonistic lateral inhibition (Enroth-Cugell and Robson 1966) for a luminance-defined receptive field. In this case, however, the lateral inhibition must be based on contrast entering suppressive zones outside the CRF.
We then fit the DOG model to the amplitude and phase transfer functions. The details of data fitting were described in methods and Fig. 2. Figure 3F shows a polar plane, in which the amplitude and phase data for each stimulus (Fig. 3, C and D) are represented by the length and angle of a position vector indicated by each dot, respectively. Solid and broken curves in this panel show predicted responses of a DOG model that were best fitted to these data. The 2D spatial profile of this DOG model producing the best fit was taken as a reconstructed center–surround structure and is shown in Fig. 3G. Note that this structure should not be confused with on–off subregions of a standard simple cell despite many superficial similarities. The red area indicates a net excitatory region for contrast signals and thus represents the CRF. The blue area represents the suppressive surround. (Note that shapes of the CRF and the surround regions with these definitions are generally different from positive and negative Gaussian functions of the DOG model, respectively.) The CRF prefers a vertical orientation for the carrier (90°), as indicated by a line drawn within the red area. The surround region of this neuron was limited to the left side of the CRF. The CRF and surround regions were both elongated along the vertical axis. The aspect ratios of the CRF and surround region were 1.4 and 2.3, respectively. Therefore for this cell, the center–surround organization was elongated and oriented parallel to the preferred orientation for the CRF. The structure was nearly odd-symmetric. This center–surround organization is suitable for detecting vertical high-order borders.
Although we found that the DOG model provides a good fit to the data (R² = 0.9 for this cell), one may wonder whether the same structure can be obtained without assuming a specific function. If we had sufficient data points densely covering the 2D spatial-frequency domain, we could have directly obtained 2D structures by the 2D inverse Fourier transform. Unfortunately, dense sampling over 2D domains was too time-consuming to conduct. Instead, as described in methods, we applied a 1D inverse Fourier transform to the data for a selected envelope orientation, which produced a 1D spatial profile for the axis orthogonal to this orientation. For example, since the optimal envelope orientation was vertical for this cell, the transform of the data for the optimal envelope orientation (red points in Fig. 3, C and D) produced a 1D profile for the horizontal axis (shown by green lines in the top margin of Fig. 3G). The same analysis applied to the data for the envelope orientation perpendicular to the optimal (blue points in Fig. 3, C and D) produced a spatial profile for the vertical axis (left margin of Fig. 3G). These profiles, reflecting center–surround structures without specific model assumptions, were used to select initial values of the model parameters of the 2D DOG model when this model was fitted to the full data set (see methods). Since final fits were not constrained in any way to be near the initial values and the model fitting also started from other randomly chosen initial values, the reconstructed 2D DOG structures could, in principle, become inconsistent with the 1D structures obtained by the inverse Fourier transform.
However, the 1D structures obtained by the inverse Fourier transform substantially overlapped the profiles obtained by collapsing the 2D profile along the two axes (broken lines), indicating that the reconstructed 2D profiles are highly consistent with the 1D profiles. The results indicate that the structures fitted using the DOG model are consistent with the center–surround structures obtained without specific model assumptions.
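The model-free 1D reconstruction can be illustrated with a short sketch. Here a transfer function sampled at positive and negative envelope spatial frequencies is inverted by a Riemann-sum inverse Fourier transform. The "measured" transfer function is synthesized from the analytic transform of two Gaussians with assumed parameters, and the sampling is much denser than the nine steps used in the experiments; both are assumptions made purely for illustration.

```python
import cmath
import math

def gaussian_transfer(freq, mu, sigma, gain):
    """Analytic Fourier transform of gain * exp(-(x - mu)**2 / (2 * sigma**2))."""
    envelope = gain * sigma * math.sqrt(2 * math.pi) * math.exp(-2 * (math.pi * sigma * freq) ** 2)
    return envelope * cmath.exp(-2j * math.pi * freq * mu)

def inverse_transform_1d(freqs, transfer, x):
    """Riemann-sum inverse Fourier transform at position x (uniform frequency spacing)."""
    df = freqs[1] - freqs[0]
    total = sum(h * cmath.exp(2j * math.pi * f * x) for f, h in zip(freqs, transfer))
    return (total * df).real

# Synthetic transfer function: center (mu=0, sigma=1) minus surround (mu=-2, sigma=2).
freqs = [k * 0.05 for k in range(-20, 21)]  # cycles/degree, both drift directions
transfer = [gaussian_transfer(f, 0.0, 1.0, 1.0) - gaussian_transfer(f, -2.0, 2.0, 0.5)
            for f in freqs]

# Recover the 1D spatial profile along the axis orthogonal to the envelope orientation.
positions = [i * 0.5 - 6.0 for i in range(25)]  # degrees
profile = [inverse_transform_1d(freqs, transfer, x) for x in positions]
```

The recovered profile is positive around 0° (center) and negative around −3° (surround), matching the spatial profile from which the synthetic transfer function was derived.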
Four more examples of center–surround profiles are shown in Fig. 4. The suppressive surround of the neuron in Fig. 4A (dark regions) is limited to the right side of the CRF (bright regions), whose optimal carrier orientation within the CRF is vertical. The combined center–surround organization shows a vertically elongated odd-symmetric structure, with the aspect ratios of the CRF and surround regions as high as 2.5 and 1.9, respectively. Note that this structure is also found in 1D reconstructions (gray solid curves). The neuron in Fig. 4B also showed an asymmetric structure with substantial elongations of CRF and surround regions. For this cell, the surround is limited to the area to the left of the CRF, which prefers a carrier orientation of about 15°. The neuron in Fig. 4C showed suppressive surrounds in both end zones of the CRF. Consequently, this neuron shows an even-symmetric structure. The neuron in Fig. 4D, which shows a concentric structure, possesses a structure suitable for detecting a circular region defined by contrast, but would not be suitable for encoding the orientation of the contrast envelope. As we will see in the following text, neurons with such concentric center–surround organizations are in the small minority.
Oriented center–surround structures
Of the entire population of recorded neurons (n = 180, see Procedures in methods), we could complete measurement of responses to the full set of contrast-modulated gratings (along two or four envelope orientations) for 81 neurons. For these neurons, the temporal frequency of the contrast envelope (0.5 or 0.75 Hz) was sufficiently slower than that of the carrier (4.5 or 5 Hz), desirable for obtaining responses to the contrast envelope (see Supplemental Note S1). The initial conventional size-tuning test (e.g., Fig. 3A) was also conducted for 79 of these neurons. We excluded 15 neurons with weak surround suppression by requiring that, during the conventional size-tuning test, responses to large-size stimulus patches were reduced by >25% of the peak response to the optimal-size stimulus. We also excluded neurons in which responses to the contrast-modulated gratings were not well modulated at the temporal frequency of the envelope drift (F1/F0 <0.75, n = 13). The mean F1/F0 ratio of the 81 neurons was 1.09 (SD = 0.30).
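For readers unfamiliar with the F1/F0 measure, the following minimal sketch computes it from a binned firing-rate histogram. It assumes that the histogram spans an integer number of envelope cycles and that no spontaneous rate is subtracted; the function name and the synthetic response are hypothetical.

```python
import cmath
import math

def f1_f0_ratio(rates, bin_s, temporal_freq_hz):
    """Ratio of the first-harmonic amplitude (F1) to the mean rate (F0)."""
    n = len(rates)
    f0 = sum(rates) / n
    if f0 == 0:
        return float("nan")
    # Fourier component at the drift frequency, evaluated at bin centers.
    component = sum(
        r * cmath.exp(-2j * math.pi * temporal_freq_hz * (i + 0.5) * bin_s)
        for i, r in enumerate(rates)
    ) / n
    f1 = 2 * abs(component)
    return f1 / f0

# Synthetic response: half-wave rectified sinusoid at 0.75 Hz, 4 s (3 cycles), 10-ms bins.
bin_s, drift_hz = 0.01, 0.75
rates = [max(0.0, 20.0 * math.sin(2 * math.pi * drift_hz * (i + 0.5) * bin_s))
         for i in range(400)]
ratio = f1_f0_ratio(rates, bin_s, drift_hz)  # analytically pi/2 for this waveform
keep_neuron = ratio >= 0.75                  # exclusion criterion stated in the text
```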
In all, we analyzed 2D center–surround structures for 53 neurons. Of these, 2D center–surround structures were reconstructed with sufficient reliability for 35 neurons (66%, r² >0.65). The reliability of the reconstruction was also assessed by bootstrap analysis (see methods). We calculated the bootstrap SE for each model parameter (SD of 100 fitted values of each parameter). The SEs of the position parameters of the center Gaussian for the x- and y-axes (μcx and μcy in Eq. 8 in methods) were, on average (across the 35 neurons), 0.29° (SD = 0.35) and 0.34° (SD = 0.42), respectively. Those for the surround Gaussian (μsx, μsy) were 0.58° (SD = 0.45) and 0.61° (SD = 0.52), respectively. These values are about 10 to 20% of the mean CRF diameter estimated from the conventional size-tuning curves (3.20°, SD = 1.5). The bootstrap SEs of the width parameters for the center (σcx, σcy) were 0.18° (SD = 0.20) and 0.26° (SD = 0.53), whereas those for the surround (σsx, σsy) were 0.50° (SD = 0.52) and 0.60° (SD = 0.66), respectively. These values are 20–50% of the mean width parameter values (0.88°, 1.01°, 1.12°, and 1.23°, respectively, n = 35). The SEs of the rotation parameters for the center and surround Gaussians were, on average, 17° (SD = 14) and 22° (SD = 11), respectively.
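The bootstrap SE used here is the SD of parameter values obtained by refitting resampled data. A generic sketch is given below. Because the DOG fit itself is beyond the scope of a short example, the estimator is replaced by a simple mean of hypothetical trial responses; in the actual analysis the estimator would be the model fit returning one parameter value.

```python
import random
import statistics

def bootstrap_se(trials, estimator, n_resamples=100, seed=0):
    """SD of the estimator across resamples drawn with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(trials) for _ in trials]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)

# Hypothetical trial-by-trial responses (spikes/s); the mean stands in for a fitted parameter.
trials = [10.0 + (7 * i) % 11 for i in range(40)]
se = bootstrap_se(trials, statistics.mean)
```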
Of the 35 neurons with reliable center–surround structures, 17 were simple cells and 18 were complex cells, based on the F1/F0 ratio obtained from responses to conventional drifting luminance gratings (e.g., conventional size-tuning test). Since we did not find any systematic differences in center–surround structures between these groups, we did not divide them for the following analysis. Note that the F1/F0 ratio here is different from that for contrast-modulated gratings described earlier.
To evaluate whether the surround is uniformly distributed around the CRF or is localized to limited portions, we calculated the angle subtended by the surround (regions with suppression stronger than 10% of maximum suppression) at the peak location of the CRF (Fig. 5A, inset). We define this value as the concentricness index and show its distribution in Fig. 5A. Larger values indicate more concentric structures. The neurons in Figs. 3 and 4, A–D had values of 126, 113, 153, 248, and 360°, respectively. We found that clearly concentric structures (concentricness of ∼360°) are in the minority. Rather, the surround of the reconstructed structures is often localized to limited regions around the CRFs (index <180°).
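One way to compute such an angle numerically is to cast rays from the CRF peak and count the directions along which suppression exceeds the 10% threshold. The sketch below does this in 1° steps for a hypothetical toy profile. The ray-casting scheme, the grid search for the peak, and all parameter values are assumptions of this illustration and may differ from the procedure actually used.

```python
import math

def toy_profile(x, y, surround_mu_x):
    """Hypothetical profile: unit center Gaussian minus a broader surround Gaussian."""
    center = math.exp(-0.5 * (x ** 2 + y ** 2))
    surround = 0.5 * math.exp(-0.5 * (((x - surround_mu_x) / 2.0) ** 2 + (y / 2.0) ** 2))
    return center - surround

def find_peak(profile, half_width=6.0, step=0.1):
    """Grid search for the most excitatory location (CRF peak)."""
    n = int(round(2 * half_width / step)) + 1
    grid = [(-half_width + i * step, -half_width + j * step)
            for i in range(n) for j in range(n)]
    return max(grid, key=lambda p: profile(*p))

def concentricness_deg(profile, peak_xy, max_radius=10.0, dr=0.1, threshold=0.10):
    """Angle (deg) subtended at the CRF peak by regions exceeding the suppression threshold."""
    px, py = peak_xy
    radii = [dr * (k + 1) for k in range(int(max_radius / dr))]
    rays = []
    for a in range(360):  # one ray per degree
        c, s = math.cos(math.radians(a)), math.sin(math.radians(a))
        rays.append([profile(px + r * c, py + r * s) for r in radii])
    strongest = min(min(ray) for ray in rays)  # most negative sampled value
    if strongest >= 0:
        return 0
    return sum(1 for ray in rays if min(ray) < threshold * strongest)

def side(x, y):
    return toy_profile(x, y, -3.0)  # surround displaced to the left

def ring(x, y):
    return toy_profile(x, y, 0.0)   # surround concentric with the center

side_index = concentricness_deg(side, find_peak(side))
ring_index = concentricness_deg(ring, find_peak(ring))
```

For the displaced surround the index stays below 180°, whereas the concentric toy profile yields the full 360°.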
We also characterized localization of the surround, using offset of positional parameters of the two fitted Gaussians, normalized by their width (1SD value). If the center and surround regions form a perfectly concentric structure, the offset value becomes 0. We found that 66% of neurons (23/35) took the value >0.5 (mean = 1.04, median = 0.84, SD = 0.84), indicating that most neurons are not concentric. As expected, there was a significant negative correlation between these values and the concentricness index (r = −0.48, P < 0.01, n = 35).
Besides the localization of the surround, elongation of the center and surround regions is a key factor for encoding orientations of the high-order features. The neurons in Figs. 3 and 4, A and B show substantial degrees of elongation. To quantify the degree of elongation, we defined the width of the CRF along the axis connecting the peaks of the CRF and surround (“a” in Fig. 5B, inset). The length of the CRF was measured along the axis perpendicular to the width axis (“b” in Fig. 5B, inset). The degree of elongation is defined by the aspect ratio b/a, or the elongation index. As shown in Fig. 5B, the majority of cells showed an elongation index >1, with a mean value of 1.5. Filled bars (n = 28) indicate that the index is significantly >1 (bootstrap resampling and one-tailed t-test, P < 0.025; see methods). The neurons in Fig. 6, A and B had the highest and the second-highest elongation indices in the present population (5.2 and 2.7, respectively). The elongation index of the surround region was similarly computed for the neurons with a concentricness index <330° (n = 32). If there were two separate suppressive regions, as in Fig. 4C, the stronger suppressive region was analyzed. As shown in Fig. 5C, most neurons had values >1 (mean = 1.9), showing that the surround was also substantially elongated in a manner similar to that of the CRF. The elongation indices of the CRF and surround were highly correlated (Fig. 5D, r = 0.67, P < 0.01, n = 32). These results show that the center–surround structures are elongated perpendicular to the axis connecting the peaks of the CRF and suppressive surround. A population of neurons with such elongated structures appears highly suitable for orientation-based encoding of high-order features.
We also confirmed that the CRF and surround are elongated parallel to each other, as follows. We computed the length of the CRF and of the surround along lines passing through the respective peak in 5° steps, so that we could determine the orientation of the longest axis for the CRF and for the surround. The two orientations were often very close, with a median absolute orientation difference of 10° (mean = 21°, SD = 21°, n = 35). Note that these orientations (obtained from the net positive and negative regions) were different in general from the orientations of the long axes of the two fitted Gaussians.
We next examined how the elongation axis of the CRF and surround relates to the preferred orientation of the CRF for the conventional luminance stimuli. Figure 5E shows the distribution of the difference between the orientation of the longest axis of the CRF region and the preferred orientation of the CRF for the luminance grating (n = 35). The distribution extends widely over 0 to 90°. Figure 5F shows the distribution of the difference between the longest axis of the surround region and the CRF preferred orientation, in which difference values are also widely distributed. This variety of relationships between the center–surround elongation and the preferred orientation of the CRF may allow neurons with surround suppression to be tuned for high-order borders with various carrier–envelope orientation relationships. We later show that the relationship between neurons' preferred envelope and carrier orientations is also highly variable (Fig. 8C).
We also examined how the elongation of the CRF and surround relates to the strength of the surround suppression of the neurons. For the 30 neurons for which the concentricness index was <330° and size-tuning curves were obtained, we plotted the CRF and surround elongation indices on the ordinate of Fig. 5, G and H, respectively, against the suppression index. We found significant positive correlations between the elongation index and the suppression index in both cases (r = 0.43, P < 0.05 for the CRF and r = 0.53, P < 0.01 for the surround, n = 30), indicating that neurons with stronger suppression have more elongated center and surround structures. The correlation for the CRF remained significant even when the top right point in Fig. 5G was excluded (it was in fact slightly stronger, r = 0.44, n = 29).
We were surprised at the high degrees of elongation of the CRF and surround regions found for some cells exemplified by those in Fig. 6. There is a possibility that these elongations are due to some anomalies in the fitting procedure used to derive the 2D center–surround structure, although results of the bootstrap analysis suggest it is unlikely. To further confirm the validity of the reconstructed structures, we compared predictions from the reconstructed center–surround profiles to an independent set of envelope-orientation tuning measurements (Fig. 3B). For predictions, we reconstructed envelope-orientation tuning curves from Fourier transform of the center–surround maps. Figure 7, A–C shows comparison of orientation-tuning index, peak envelope orientation, and tuning width (full width at half-height) of tuning curves, respectively. Orientation-tuning index is defined as (Rmax − Rmin)/(Rmax + Rmin), where Rmax and Rmin are the maximum and minimum of the above-cited tuning curve. The larger the index, the more modulated the tuning curves are as a function of orientation. Note that most of the points representing individual cells are distributed close to the diagonal line, including those for the neurons in Fig. 6 (indicated by arrows), supporting the validity of these data. Further tests of reliability of the reconstructed center–surround structures are described in the following text (see Comparison with direct surround measurements).
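The orientation-tuning index defined above is straightforward to compute from a measured tuning curve. The following sketch uses hypothetical response values sampled in 30° steps; only the formula (Rmax − Rmin)/(Rmax + Rmin) comes from the text.

```python
def orientation_tuning_index(responses):
    """(Rmax - Rmin) / (Rmax + Rmin) for an envelope-orientation tuning curve."""
    r_max, r_min = max(responses), min(responses)
    return (r_max - r_min) / (r_max + r_min)

# Hypothetical tuning curve (spikes/s) at six envelope orientations.
orientations_deg = [0, 30, 60, 90, 120, 150]
responses = [4.0, 6.0, 12.0, 20.0, 11.0, 5.0]

index = orientation_tuning_index(responses)                        # (20 - 4) / (20 + 4)
peak_orientation = orientations_deg[responses.index(max(responses))]
```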
We further analyzed how the elongation of the center–surround structures is related to orientation tunings for the contrast envelopes. It is expected that highly elongated structures should have narrow orientation tunings. This is true, as shown by a negative correlation between the CRF elongation index and tuning width (Fig. 7D, filled circles, r = −0.57, P < 0.05, n = 16). A similar tendency was found for surround elongation and tuning width (open circles, r = −0.47, 0.05 < P < 0.1, n = 16). Not reaching a significance level of 0.05 in the latter analysis is probably due to the small number of data samples because, when the tuning width at 66.6% of the peak height instead of the conventional half-height was used to analyze a larger population of neurons, the correlation between the tuning width and the surround elongation became highly significant (r = −0.59, n = 22, P < 0.01).
Moreover, we unexpectedly found that, as the center–surround structures are more elongated, responses of neurons are more modulated by the envelope orientation, with a significant positive correlation between the CRF elongation index and the orientation-tuning index (Fig. 7E, r = 0.71, P < 0.01, n = 31). A significant positive correlation was also found between the surround elongation index and the orientation-tuning index (Fig. 7F, r = 0.44, P < 0.05, n = 31). These results show that the elongation of the center–surround structures contributes considerably to the orientation tuning for the high-order features.
Using just the envelope-orientation tuning data, we further analyzed properties of orientation tunings of these neurons for high-order stimuli. Since the full set of measurements (for 2D reconstruction) is not required for this analysis, we analyzed neurons that showed ≥4 spikes/s in the initial envelope-orientation tuning measurement and showed >25% suppression in the conventional size-tuning measurement (n = 99). Figure 8A shows the distribution of the orientation-tuning index for this larger population of neurons. If we use a value of 0.3 as a criterion of selectivity, 48 of 99 analyzed neurons showed orientation selectivity for envelopes. The orientation tunings of these neurons were also judged to be statistically significant by one-way ANOVA (P < 0.05, filled portion of bars, n = 55; see methods), justifying the selection of the index criterion of 0.3.
Given the prevalence of orientation selectivity for the contrast envelope, we then must ask whether all orientations are represented. Figure 8B shows a distribution of peak envelope orientations for the 48 orientation-selective neurons. Although this distribution has a peak around 90° and is statistically different from a uniform distribution (χ² test, n = 48, df = 11, P < 0.05), there is a large scatter in the preferred orientation from 0 to 180°. Therefore the center–surround structures, as a population, can represent a wide range of orientations of the high-order boundaries.
The location of the surround, relative to the preferred orientation for the CRF, was also quite variable across neurons. For example, the surround of the neurons shown in Figs. 3 and 4, A and B lay in the side regions, whereas that of the neurons in Figs. 4C and 6A was located in the end regions. Here, to quantify the position of the surround relative to the preferred orientation of the CRF (for the carrier), we measured the absolute difference between the preferred orientation of the CRF (for the carrier) and the optimal envelope orientation for each neuron. Zero and 90° indicate that the surround lies in the exact side and end zones, respectively. Intermediate angles between them indicate that the surround is at an oblique location. Figure 8C shows a distribution of these orientation differences. The number of neurons with small angles (0–30°), intermediate angles (30–60°), and large angles (60–90°), are 12, 13, and 23, respectively (n = 48 envelope-orientation selective neurons). Therefore although neurons with a surround in the end regions were dominant, there was a substantial scatter in the location of the surround relative to the preferred orientation of the CRF. These scatters, together with a variety of relationships between the longest axis of center–surround orientations and preferred orientations of the CRFs for the carrier (Fig. 5, E and F), allow neurons with surround suppression to be tuned to orientation of contrast envelopes with various carrier components.
Spatial symmetry and frequency of the center–surround structure
We next analyzed whether the center–surround organization shows structures with various symmetries and widths. As in encoding by standard simple cells, the symmetry and width of receptive field structures are closely related to the phase and the spatial frequency, respectively. In addition to the orientation, both are important for encoding spatial form (DeAngelis et al. 1999). To analyze the symmetry of a center–surround structure, we calculated the spatial phase of a Gabor function fitted to the 1D section of the 2D structure along an axis that goes through the peak (most excitatory location in the CRF) and the trough (most inhibitory location in the surround): 0° indicates an even-symmetric structure and 90° indicates an odd-symmetric structure. It should be noted that neurons with a spatial phase <30°, including those with concentric or even-symmetric surrounds, were not in the majority (Fig. 9A). Rather, there is a wide variety in the spatial phase ranging from 0 to 90°. Note that, under the generalized DOG formulation, the spatial phase can in principle take values in the range of 0–180°. However, there is a conspicuous lack of cells with spatial phases between 150 and 180°. This reflects the reality of the center–surround organization, since filters with spatial phases near 180° would have two CRF regions. We did not find such a neuron.
Figure 9B shows the relationship between the spatial phases and the concentricness as described earlier. As expected from the fact that the odd-symmetric structures necessarily have oriented structures, there is a strong negative correlation between these two metrics (r = −0.56).
We next examined the spatial-frequency tunings of the center–surround structure for high-order stimuli, by analyzing the raw spatial-frequency tuning curves. Of the entire population of recorded neurons (n = 180), we analyzed neurons if the spatial-frequency tuning curves were measured at least for one orientation of the contrast envelope and if the suppression strength was >25% in the conventional size-tuning measurement (suppression index >0.25, n = 98). Since these tuning curves correspond to transfer functions of the antagonistic structures, we expect that they have band-pass tuning properties, as is the case for the neuron in Fig. 3. That is, responses are reduced at low spatial frequencies, because the excitatory center and suppressive surround regions are simultaneously stimulated by high-contrast regions of the stimuli. The responses should also drop at high spatial frequency, since high contrast regions are always within the CRF in this condition and responses are less modulated at the contrast-envelope temporal frequency. To examine whether this is the case, we calculated a band-pass tuning index, defined as the difference between responses to the peak spatial frequency and zero spatial frequency divided by their sum (Fig. 10A). The majority of neurons showed index >0.3 (n = 71 of 98, 72%), indicating that most neurons with surround suppression show band-pass tunings for the contrast envelope (mean = 0.44, SD = 0.23). For the 71 neurons with a clear band-pass tuning, we further analyzed peak spatial frequencies. They distributed from 0.05 to 0.8 cpd (Fig. 10B). This distribution is comparable to that of the preferred spatial frequency of cat area 18 neurons (Movshon et al. 1978a). Therefore these neurons encode a sufficient range of spatial frequencies of high-order borders.
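The band-pass tuning index can be computed directly from a spatial-frequency tuning curve for the contrast envelope. In the sketch below the response amplitudes are hypothetical and only one drift direction is considered; the index itself follows the definition given above (difference between the responses at the peak and at zero spatial frequency, divided by their sum).

```python
def band_pass_index(spatial_freqs, amplitudes):
    """Return ((R_peak - R_zero) / (R_peak + R_zero), peak spatial frequency)."""
    r_zero = amplitudes[spatial_freqs.index(0.0)]
    r_peak = max(amplitudes)
    peak_sf = spatial_freqs[amplitudes.index(r_peak)]
    return (r_peak - r_zero) / (r_peak + r_zero), peak_sf

# Hypothetical envelope spatial frequencies (cpd) and F1 response amplitudes (spikes/s).
sfs = [0.0, 0.05, 0.1, 0.2, 0.4, 0.8]
amps = [3.0, 6.0, 12.0, 9.0, 4.0, 1.0]

index, peak_sf = band_pass_index(sfs, amps)  # (12 - 3) / (12 + 3), peak at 0.1 cpd
is_band_pass = index > 0.3                   # criterion used in the text
```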
If we calculate band-pass tuning index and optimal spatial frequencies for the neurons excluded from the above-cited population due to weak surround suppression (suppression index <0.25, n = 34), they were both smaller than those for the neurons analyzed earlier (band-pass tuning index: mean = 0.1, SD = 0.14, optimal envelope spatial frequency: mean = 0.07, SD = 0.09), which is consistent with the idea that band-pass tuning for the contrast envelope is caused by surround suppression. With regard to the conventional CRF spatial-frequency tunings (i.e., spatial-frequency tunings for carrier grating), the two groups of neurons did not show a difference in optimal values [strong suppression group, mean 0.48 (SD = 0.33) vs. weak suppression group, mean 0.45 (SD = 0.26)].
We further analyzed the ratio of the carrier to envelope spatial frequency (Fig. 10C). The ratios ranged from 1 to 6, with an average of 2.1 (n = 71). This distribution is quite different from that reported by Baker and colleagues for envelope-responsive neurons in cat area 18 (8 to 33) using similar stimuli (Mareschal and Baker Jr 1999). We subsequently discuss the significance of this difference.
An additional important finding from our study is that these neurons did not show direction selectivity for high-order border movement. From the amplitude of the spatial-frequency tuning curves measured for the two opposite directions, the difference between the two peak amplitude values divided by their sum was calculated as a direction-tuning index for each neuron. Only a few neurons (n = 7 of 98) showed direction selectivity (index >0.3, Fig. 10D). As described in methods, this is consistent with the assumption that the spatiotemporal center–surround structures are separable, which was used in the reconstruction of the center–surround structures.
Comparison with direct surround measurements
In the present study, we developed a new method for reconstructing the center–surround structures of cat V1 neurons. There is a degree of indirectness in the reconstructed structures. Are these structures consistent with the results of other, more direct measurements of the center–surround structures? To examine this issue, we tested two neurons with successfully reconstructed structures to determine whether the reconstructed structures are consistent with response patterns to small localized stimulus patches presented outside the CRF (Fig. 11), as conducted in Walker et al. (1999). While the CRF region (solid circles in Fig. 11, A and C) was stimulated with a drifting grating of its optimal orientation, the surrounding region was stimulated by another grating patch presented at one of the eight locations indicated by the broken lines in Fig. 11, A and C. The grating in the surround patch had the same parameters as those for the CRF except that its location was different and a part of it could be missing (so as not to invade the CRF region). The reconstructed structure for the neuron shown in Fig. 11A (the same as that shown in Fig. 6B) had the dominant surround in the 11 to 2 o'clock region with respect to the CRF. Responses to the surround patches together with the center patch are shown by solid squares in Fig. 11B, whereas those predicted by the reconstructed structure are indicated by a broken curve. In agreement with this structure, dominant suppression of this neuron for the small surround patches was observed at these locations, as indicated by data points inside an inner circle representing responses to the center patch alone. Note that this neuron had a highly elongated CRF that extended into the regions where surround patches were presented. Moreover, the results in Fig. 11B exhibit apparent facilitation for the patch locations consistent with the CRF elongation.
In retrospect, the choice of the circular grating patch for the CRF was inappropriate for this cell, but there was no way of knowing this without the new method we have used. Similar results were obtained for another neuron (Fig. 11, C and D).
We also examined whether the reconstructed structures are consistent with conventional size-tuning curves. Predicted size-tuning curves were computed from the reconstructed center–surround structures by integrating the reconstructed structures within given diameters. Examples are shown for three neurons by thick gray curves in Fig. 12, A–C, together with actual size-tuning curves indicated by black solid curves. Note that the two curves have similar shapes. In particular, the peak (CRF) diameters determined from the two tuning curves (black and white squares) are highly similar. The diameters at the asymptotic responses for the two tuning curves (circles) are also similar.
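The prediction of a size-tuning curve from a reconstructed structure can be sketched as a numerical integration of the 2D profile within circular patches of increasing diameter. The concentric toy profile, the grid resolution, and the assumption that the patch is centered on the origin are illustrative choices for this sketch, not details taken from methods.

```python
import math

def toy_profile(x, y):
    """Hypothetical concentric structure: narrow center minus a weak, broad surround."""
    r2 = x * x + y * y
    return math.exp(-r2 / 2.0) - 0.08 * math.exp(-r2 / 18.0)

def predicted_size_tuning(profile, diameters, half_width=10.0, step=0.1):
    """Integrate the profile within circular patches centered on the origin."""
    n = int(round(2 * half_width / step)) + 1
    samples = []
    for i in range(n):
        for j in range(n):
            x, y = -half_width + i * step, -half_width + j * step
            samples.append((math.hypot(x, y), profile(x, y)))
    area = step * step  # area represented by each grid sample
    return [sum(w for r, w in samples if r <= d / 2.0) * area for d in diameters]

diameters = [0.5 * k for k in range(1, 33)]  # 0.5 to 16 deg
curve = predicted_size_tuning(toy_profile, diameters)

peak_diameter = diameters[curve.index(max(curve))]  # predicted CRF diameter
suppression_index = 1.0 - curve[-1] / max(curve)    # reduction at the largest patch
```

The predicted curve rises while the patch fills the center, peaks, and then declines as the patch increasingly recruits the suppressive surround, which is the qualitative shape of a conventional size-tuning curve.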
For 33 neurons, both the conventional size-tuning curves and the reconstructed structures were obtained. We conducted population analysis for these neurons except one neuron with a rather small peak diameter (1.5°), in which the reconstructed size-tuning curve showed an unrealistic form with negative values for small stimulus size so that peak diameter could not be defined. This exception occurred because the surround of reconstructed structures erroneously overlapped the center location of the stimulus patch used in the size-tuning measurement. In Fig. 12D, actual peak diameters of the size-tuning curves of individual neurons are plotted against the predicted values. As indicated by the cluster of data points concentrated along the diagonal, these two values are generally matched (r = 0.80, P < 0.001, n = 32). There appears to be a tendency that the predicted diameter of the CRF is slightly larger than the actual diameter. However, this difference was not statistically significant (P = 0.15, t-test). Although the differences are not significant, it is possible in general that small discrepancies between the actual and predicted size-tuning curves arise due to changes in eye position between the size-tuning measurements and the measurements using the contrast-modulated gratings. Alternatively, the difference could also be due to changes in CRF size or relative suppression strength between the two measurements arising from differences in stimulus configurations (Cavanaugh et al. 2002; Sceniak et al. 1999).
A similar comparison was conducted for the diameter at the asymptotic responses as shown in Fig. 12E. Although there are scatters of data points, there is a significant correlation between the two values (r = 0.64, P < 0.001, n = 32). These results show that the reconstructed structures are consistent with results of conventional measurements, validating our methods. Furthermore, these results support the view of the center–surround structures as spatial filters in a strict sense because the reconstructed structures successfully predicted responses to arbitrary stimuli unrelated to the reconstruction.
In the present study, we reconstructed spatial structures of the center–surround organization of cat V1 neurons, based on their responses to the contrast-modulated gratings. The reconstructed CRF and surround were generally elongated parallel to each other, so that they were suitable for coding the orientation of high-order contours. The orientation, symmetry, and width of the reconstructed structures differed across neurons, so that these structures can encode a wide range of parameters of high-order contours. The results show that the spatial organization of the center–surround of V1 neurons can be functionally viewed as spatial filters extracting high-order borders by their filter shapes. These structures make it possible for the V1 population to systematically represent orientations and spatial frequencies of high-order borders consisting of carriers preferred by neurons.
Considerations on the present method in comparison with other studies
Several previous studies have examined spatial organization of the CRF and surround by presenting one patch in the CRF and another patch at one of several locations outside the CRF (Cavanaugh et al. 2002b; Jones et al. 2001; Vinje and Gallant 2000; Walker et al. 1999). In these studies, the boundary between the CRF and the surround was typically determined from a size-tuning curve or as a circumference outside which a stimulus was ineffective if presented to the surround alone. However, it is known that the CRF and surround often substantially overlap and that the border between the two cannot be clearly defined (Cavanaugh et al. 2002a). Nonetheless, conventional methods make assumptions about the boundary position and configure subsequent experiments accordingly. In contrast, the method used in the present study does not assume any exact boundary between the CRF and surround in advance, measuring the two structures at the same time, and thus avoids such boundary problems.
Our method has another advantage in that strict centering of the stimulus patch is not necessary. Such centering is difficult and prone to error for neurons with a small RF, yet it is required when one tries to estimate the extent of the CRF and surround by a conventional size-tuning measurement. In the present method, the center position of the stimuli is hardly relevant because the stimuli cover a wide area that includes both the CRF and the surround. An offset of the stimulus center appears as phase shifts in the responses, but the reconstruction process automatically compensates for the miscentering, such that the shape of the reconstructed structure is unaffected by the position of the stimulus.
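Why a stimulus offset appears only as a phase shift can be seen in a minimal one-dimensional sketch. It assumes a linear spatial filter probed with a sinusoidal envelope; the profile `h`, the envelope frequency, the offset, and the function name `envelope_response` are hypothetical and do not reproduce the reconstruction procedure of this study.

```python
import numpy as np

x = np.linspace(-10, 10, 2001)                      # position in deg
# Hypothetical 1D center-surround profile: a narrow positive Gaussian
# minus a broader, laterally displaced negative Gaussian.
h = np.exp(-x**2 / 2) - 0.5 * np.exp(-(x - 1.5)**2 / (2 * 2.0**2))

def envelope_response(f_env, offset):
    """Complex response amplitude of the linear filter to a sinusoidal envelope
    of spatial frequency f_env (cpd) whose origin is displaced by offset (deg)."""
    return np.sum(h * np.exp(2j * np.pi * f_env * (x - offset)))

f_env = 0.2
offset = 0.7
r_centered = envelope_response(f_env, 0.0)
r_offset = envelope_response(f_env, offset)

# The offset leaves the response amplitude unchanged and shifts the phase
# by -2*pi*f_env*offset (Fourier shift property).
amplitude_ratio = abs(r_offset) / abs(r_centered)
phase_shift = np.angle(r_offset / r_centered)
```

Because the phase shift is linear in envelope spatial frequency, it corresponds to a pure translation of the recovered profile, leaving its shape intact under this linear assumption.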
We introduced the generalized DOG model to characterize the center and surround organizations. This model has several advantages. First, it is simple and convenient for quantitative analysis. Second, it can describe various common center–surround structures in a unified framework, including concentric structures (Fig. 4D), a CRF with two surrounds that lie in the two end or side regions (Fig. 4C), and two parallel elongated center–surround structures (Figs. 3 and 4, A and B). Moreover, the model can represent other somewhat unusual but possible forms of center–surround structures, such as a structure consisting of center and surround regions elongated along different orientations (see Supplemental Fig. S4) and a structure consisting of one CRF and two small surrounds that do not lie at opposite ends of the CRF (e.g., that lie asymmetrically split in the 0 and 3 o'clock directions with respect to the CRF). Although it may appear difficult for the generalized DOG model to represent the last structure, a DOG with the center of the positive Gaussian displaced from that of the negative Gaussian in the 7 to 8 o'clock direction can represent it. The model also allows an inverted configuration in which a central suppressive region is surrounded by a concentric CRF or sandwiched between two CRF regions. This wide range of structural representations allows us to examine the existence and nonexistence of center–surround structures more systematically than previous studies. Not all configurations allowed by the model were actually found. We found a small number of the asymmetrically split surround structures described earlier. On the other hand, we did not find a structure consisting of center and surround regions elongated along substantially different orientations, nor did we find any inverted center–surround structures.
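To make the range of configurations concrete, the following is a minimal sketch of one possible parametrization of such a model: a positive elliptical Gaussian (CRF) minus a scaled negative elliptical Gaussian (surround), each with its own center, widths, and orientation. The exact parametrization used in this study is given in METHODS; the function names, argument names, and all numerical values below are assumptions made for illustration only.

```python
import numpy as np

def elliptical_gaussian(x, y, x0, y0, sigma_long, sigma_short, theta):
    """Unit-peak 2D Gaussian centered at (x0, y0), elongated along angle theta (rad)."""
    dx, dy = x - x0, y - y0
    u = dx * np.cos(theta) + dy * np.sin(theta)     # coordinate along the long axis
    v = -dx * np.sin(theta) + dy * np.cos(theta)    # coordinate across the long axis
    return np.exp(-0.5 * ((u / sigma_long)**2 + (v / sigma_short)**2))

def generalized_dog(x, y, center, surround, k_surround):
    """Positive (CRF) Gaussian minus a scaled negative (surround) Gaussian."""
    return (elliptical_gaussian(x, y, **center)
            - k_surround * elliptical_gaussian(x, y, **surround))

grid = np.linspace(-5, 5, 201)
xs, ys = np.meshgrid(grid, grid)

# Concentric structure: both Gaussians share a center; the surround is broader.
crf_round = dict(x0=0.0, y0=0.0, sigma_long=1.0, sigma_short=1.0, theta=0.0)
sur_round = dict(x0=0.0, y0=0.0, sigma_long=3.0, sigma_short=3.0, theta=0.0)
concentric = generalized_dog(xs, ys, crf_round, sur_round, k_surround=0.5)

# Parallel elongated structure: same orientation, surround displaced sideways.
crf_long = dict(x0=0.0, y0=0.0, sigma_long=2.0, sigma_short=0.7, theta=np.pi / 2)
sur_side = dict(x0=1.5, y0=0.0, sigma_long=2.0, sigma_short=0.7, theta=np.pi / 2)
parallel = generalized_dog(xs, ys, crf_long, sur_side, k_surround=0.8)
```

In this sketch, an inverted configuration would correspond to a negative `k_surround` combined with a sign flip of the whole profile, and differently oriented regions to unequal `theta` values for the two Gaussians.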
Although the model is very versatile, it cannot deal with a center–surround structure whose carrier orientation tunings differ substantially between the CRF and surround regions, or with a structure consisting of more than two small surround regions (e.g., tiny surround regions at the 0, 3, 6, and 9 o'clock positions around the CRF). Note, however, that the former case is rarely found, since carrier orientation tunings for the CRF and surround are usually similar (Akasaki et al. 2002; Blakemore and Tobin 1972; Nelson and Frost 1978; Walker et al. 1999). Even if a neuron is more effectively suppressed by nonoptimal CRF orientations, it is likely to be suppressed to some degree by the CRF preferred orientation (Sengpiel et al. 1997).
Last, the present DOG model does not take the temporal latency difference between the CRF and surround into consideration. This is based on the report that the delay of suppression with respect to the CRF response was on average <10 ms (Knierim and van Essen 1992). Walker et al. (1999) give a surround delay of about 10–20 ms. We therefore consider the possible significance of a 10- to 20-ms delay with respect to our methods and results. Suppose that the contrast envelope is drifting at 0.5 Hz, which is typical in our experiments. With a delay of 10–20 ms, the response of the surround is delayed relative to that of the CRF by only 0.005–0.01 cycle. We confirmed that the response of a center–surround mechanism with such a delay (modeled by subtraction of the two PSTHs with the delay) is nearly identical to the one without any time delay between the center and surround. Therefore the reconstructed structure is essentially unaffected by the relative delay between the center and the surround.
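The size of this effect can be checked with a minimal sketch. It assumes idealized sinusoidal response modulations for the center and the surround in place of measured PSTHs; the surround gain `k`, the relative phase, and the function name `center_minus_surround` are hypothetical, and only the 0.5-Hz drift rate and the 10- and 20-ms delays come from the text.

```python
import numpy as np

f_env = 0.5                                  # envelope drift rate in Hz
delays = np.array([0.010, 0.020])            # surround delays in s
delay_in_cycles = f_env * delays             # fraction of one envelope cycle

t = np.arange(0.0, 2.0, 0.001)               # one full envelope period, 1-ms steps

def center_minus_surround(delay, k=0.8, phase=np.pi / 3):
    """Idealized center response minus a scaled, delayed surround response."""
    center = 1 + np.cos(2 * np.pi * f_env * t)
    surround = 1 + np.cos(2 * np.pi * f_env * (t - delay) - phase)
    return center - k * surround

reference = center_minus_surround(0.0)
modulation_depth = reference.max() - reference.min()

# Largest deviation from the no-delay response, relative to its modulation depth.
relative_error = max(
    np.max(np.abs(center_minus_surround(d) - reference)) for d in delays
) / modulation_depth
```

With these assumed parameters the delayed and undelayed responses differ by only a few percent of the modulation depth, in line with the statement that the two are nearly identical.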
Spatial organizations of the center and surround: comparison with other studies
Walker and colleagues (1999) showed that the surround was often asymmetrically located with respect to the CRF. Our data are consistent with their observations, despite large differences in the stimuli used. In our analysis, the majority of neurons showed a spatial RF phase >40° (Fig. 9), for which the structures were rather asymmetric, like those shown in Figs. 3, 4, A and B, and 6, A and B. The distribution of the location of the surround was also similar to that shown in Walker et al. (1999). They showed that the number of neurons with suppressive end zones was twofold that with side suppression or that with oblique inhibition (their Fig. 8). This proportion is highly consistent with our data shown in Fig. 9C, in which the number of neurons with a large orientation difference (60–90°) between the carrier and envelope (indicating end inhibition) was twice that of neurons with medium orientation differences (30–60°, oblique inhibition) or small orientation differences (0–30°, side inhibition). Cavanaugh and colleagues (2002b) also observed similar ratios.
High elongation of the CRF was reported by previous studies (Bolz and Gilbert 1986; DeAngelis et al. 1994; Gilbert 1977; Kapadia et al. 2000). Although parallel elongation may have been implicitly assumed for some neurons, these studies did not explicitly examine whether the CRF and surround are both elongated parallel to each other, as shown in this study. This is a key property on the basis of which we suggest that the CRF and surround jointly encode the orientation of high-order borders.
Comparison between the center–surround effects and envelope-responsive neurons in area 18
Superficially, the stimuli used in the present study of center–surround organization in area 17 neurons and those used to study contrast-envelope-responsive neurons primarily in area 18 (Mareschal and Baker Jr 1998; Tanaka and Ohzawa 2006; Zhou and Baker Jr 1993) are very similar. Are these two phenomena (the center–surround effects and envelope responses) essentially the same and based on a common neural mechanism? The following observations suggest that this is not the case. First, area 18 neurons also respond to conventional luminance stimuli and show the same orientation and spatial-frequency selectivity for the luminance and contrast-envelope cues. This cue invariance is not seen for the center–surround effects of area 17 neurons: the optimal spatial frequency of these neurons for the conventional luminance grating (through CRFs) (mean = 0.48 cpd, SD = 0.35, 71 neurons in Fig. 10B) is higher than that for the contrast envelopes (0.21 cpd, Fig. 10B). (This difference was revealed because we and Baker and colleagues used different stimulus settings, optimized for eliciting responses in each case. We set the carrier parameters to be optimal for the CRF, whereas Baker and his colleagues set the carrier to be much higher than the optimal CRF parameters.) Second, the signals from the center–surround are strongly sensitive to the carrier orientation (by the CRF orientation selectivity), whereas area 18 neurons are only weakly sensitive to it (Mareschal and Baker Jr 1999). Third, in the present study, almost all neurons had no or very weak direction selectivity for the contrast envelopes (Fig. 10D), as indicated by the fact that only one neuron had a direction tuning index >0.5. This is in contrast to responses of the CRF to luminance gratings, for which about half of the neurons show an index >0.5 (DeAngelis et al. 1993). Although we examined the direction selectivity at low temporal frequencies (0.5 and 0.75 Hz), many V1 neurons show direction selectivity for conventional luminance gratings at these rates (Saul and Humphrey 1992). Therefore it is unlikely that the center–surround interactions are strongly involved in motion processing for the contrast borders. On the other hand, envelope-responsive neurons in area 18 show direction selectivity as strong as that for conventional luminance gratings (Zhou and Baker Jr 1994). Fourth, whereas the average ratios of the optimal carrier and envelope spatial frequencies of neurons in the present study were 1 to 6 (Fig. 10C), those of the envelope-responsive neurons in area 18 were 8 to 33 (Mareschal and Baker Jr 1999). These results suggest that the two responses are likely to be based on different mechanisms.
A number of psychophysical studies have examined motion and orientation processing for contrast-defined borders (Chubb and Sperling 1988; Derrington and Badcock 1985, 1986; Lin and Wilson 1996). Do these psychophysical results reflect the center–surround organizations we found for area 17 neurons, or the envelope-responsive neurons in area 18 found by Baker and colleagues? As described earlier, the envelope-responsive neurons in area 18 seem to be more deeply involved in processing the motion of the contrast envelope than the center–surround mechanisms. As for orientation discrimination, Lin and Wilson (1996) showed that orientation discrimination of the contrast envelope was better at an envelope spatial frequency of 3 or 6 cpd than at 1 cpd. Since the carrier frequency was set at 12 cpd, this indicates that performance was better for a carrier–envelope spatial-frequency ratio of 2 to 4 than for a ratio of 12 (Lin and Wilson 1996). It is interesting that these optimal values are near the average ratio of the optimal carrier and envelope spatial frequencies of neurons in the present study (mean = 2.1), whereas, as described earlier, envelope-responsive neurons in area 18 prefer a high carrier–envelope spatial-frequency ratio (8 to 33) (Mareschal and Baker Jr 1999). This raises the possibility that the center–surround mechanisms play a key role in orientation discrimination for contrast-defined borders. At the same time, since discrimination is possible over a wide range of carrier–envelope spatial-frequency ratios, the center–surround mechanisms and envelope-responsive neurons may complement each other in processing contrast envelopes.
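For reference, the carrier–envelope ratios attributed to Lin and Wilson (1996) follow directly from the stimulus values quoted above (carrier at 12 cpd; envelopes at 6, 3, and 1 cpd). The symbols $f_c$ and $f_e$ for carrier and envelope spatial frequency are our shorthand here and do not appear in the original work:

```latex
\[
\frac{f_c}{f_e} = \frac{12}{6} = 2, \qquad
\frac{f_c}{f_e} = \frac{12}{3} = 4, \qquad
\frac{f_c}{f_e} = \frac{12}{1} = 12 .
\]
```

The first two ratios bracket the mean of 2.1 found in the present study, whereas all three fall at or below the lower part of the 8 to 33 range reported for area 18.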
Finally, we suggest that signals from the center and surround may have a unique role in the representation of textured surfaces. Since these signals are highly sensitive to both the texture carrier (by the CRF selectivity) and texture boundaries (by the center–surround interaction), convergence of such signals over a visual area could create selectivity for both the inner texture elements and the outer boundaries of textured surfaces, which is often found for neurons in the higher visual cortex of the monkey (Komatsu and Ideura 1993).
As described earlier, we found that the optimal carrier orientation and envelope orientation are generally different. This means that there is no cue invariance between the luminance-defined selectivity of the CRF and the envelope-defined selectivity of the combined center–surround receptive field. This lack of cue invariance, which may superficially appear undesirable for decoding, is in fact desirable for encoding a rich set of possible combinations of luminance carrier and envelope orientations, because all configurations shown in Fig. 1C must be represented. Signals carried by these neurons with the center–surround organization may be interpreted as the result of an AND operation on two independent conditions: the presence of the optimal carrier orientation and the presence of the optimal envelope orientation.
In conclusion, our results suggest that, viewed as a population, neurons with the center–surround organizations are able to encode oriented high-order contours (consisting of carriers preferred by neurons), using oriented, multifrequency representation. Therefore just as a population of simple cells encodes shapes defined by luminance by an array of Gabor-shaped receptive fields, neurons with surround suppression may play an important role in encoding shapes by an array of center–surround receptive fields.
This work was supported by Ministry of Education, Culture, Sports, Science and Technology Grants 19700290 and 18020017, a 21st Century/Global Center of Excellence (COE) Programs Grant from the Japan Society for the Promotion of Science, and the Core Research for Evolutionary Science and Technology/CREST Yoshioka Project of the Japan Science and Technology Agency. H. Tanaka was supported by grants from the Naito Foundation and the Uehara Foundation.
We thank laboratory members S. Nishimoto, T. Sanada, R. Kimura, K. Sasaki, M. Fukui, T. Ishida, T. Ninomiya, Y. Asada, Y. Tabuchi, and T. Arai for help in experiments and discussions and R. Freeman and P. Karagiannis for comments on the manuscript.
1 The online version of this article contains supplemental data.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- Copyright © 2009 the American Physiological Society
- Akasaki et al. 2002
- Allman et al. 1985
- Blakemore and Tobin 1972
- Bolz and Gilbert 1986
- Cavanagh and Mather 1989
- Cavanaugh et al. 2002a
- Cavanaugh et al. 2002b
- Chubb and Sperling 1988
- DeAngelis et al. 1994
- DeAngelis et al. 1999
- DeAngelis et al. 1993
- Derrington and Badcock 1985
- Derrington and Badcock 1986
- Efron and Tibshirani 1993
- Enroth-Cugell and Robson 1966
- Fitzpatrick 2000
- Gilbert 1977
- Hamilton et al. 1989
- Hubel and Wiesel 1965
- Jones et al. 2001
- Kapadia et al. 2000
- Knierim and van Essen 1992
- Komatsu and Ideura 1993
- Lin and Wilson 1996
- Mareschal and Baker 1998
- Mareschal and Baker 1999
- Movshon et al. 1978a
- Movshon et al. 1978b
- Nelson and Frost 1978
- Nishimoto et al. 2005
- Nothdurft et al. 2000
- Ohzawa et al. 1996
- Pack et al. 2003
- Ringach 2002
- Ringach et al. 1997a
- Ringach et al. 1997b
- Rossi et al. 2001
- Sasaki and Ohzawa 2007
- Saul and Humphrey 1992
- Sceniak et al. 2001
- Sceniak et al. 2002
- Sceniak et al. 1999
- Sengpiel et al. 1997
- Shen et al. 2007
- So and Shapley 1979
- Tanaka and Ohzawa 2006
- Vinje and Gallant 2000
- von der Heydt et al. 1984
- Walker et al. 1999
- Wilson 1999
- Zhou et al. 2000
- Zhou and Baker 1993
- Zhou and Baker 1994
- Zipser et al. 1996