Abstract
Maps of ocular dominance and orientation in primary visual cortex have a highly characteristic structure. The factors that determine this structure are still largely unknown. In particular, it is unclear how shortrange excitatory and inhibitory connections between nearby neurons influence structure both within and between maps. Using a generalized version of a wellknown computational model of visual cortical map development, we show that the number of excitatory and inhibitory oscillations in this interaction function critically influences map structure. Specifically, we demonstrate that functions that oscillate more than once do not produce maps closely resembling those seen biologically. This strongly suggests that local lateral connections in visual cortex oscillate only once and have the form of a Mexican hat.
INTRODUCTION
The primary visual cortex of cats, monkeys, and ferrets is characterized by a series of columnar maps, each representing particular features of the visual input. The most noticeable and wellcharacterized of these are visual field position (VF, topography), ocular dominance (OD), and orientation preference (OR). These maps coexist on the same neural substrate in highly stereotyped patterns. For instance, OD columns tend to intersect OR columns at steep angles, and OR pinwheels tend to lie at the center of OD columns (Bartfeld and Grinvald 1992; Crair et al. 1997; Hübener et al. 1997; Kim et al. 1999; Müller et al. 2000; Obermayer and Blasdel 1993). These maps and their interrelationships have provided a paradigm example for the investigation of how cortical structure arises during development.
In particular, theoretical models of columnar development have been remarkably successful at accounting for many of the observed features of cortical maps (reviewed in CarreiraPerpiñán and Goodhill 2002b; Erwin et al. 1995; Swindale 1996). Many of these models can be seen, explicitly or implicitly, as trading off coverage versus continuity (CarreiraPerpiñán and Goodhill 2002a; Swindale 1996; Swindale et al. 2000). Coverage refers to a desire to represent all input features uniformly, whereas continuity refers to the desire to represent all input features smoothly. In the cortex, continuity arises from the pattern of shortrange lateral connections. By specifying the degree to which neurons at different distances from a particular cortical neuron are excited or inhibited by that neuron, the continuity term in theoretical models effectively controls the type of surface defined by the sheet of cortical receptive fields through the higherdimensional space of input features, and thus the relations between maps.
A large number of experimental studies have investigated the form these lateral connections take in primary visual cortex (V1; reviewed in Callaway 1998; Lund et al. 2003). Anatomically, a localized injection of an axonal tracer such as biocytin (e.g., Amir et al. 1993; Bosking et al. 1997; Kisvárday et al. 1997) or GFP (green fluorescent protein) adenovirus (Stettler et al. 2002) leads to a small, densely labeled central region of diameter roughly 500 μm, surrounded by a halo of more patchy and diffuse connectivity. Physiological studies investigating the sign of the connections within the central region have generally found a mixture of excitation and inhibition (e.g., Hata et al. 1991; Roerig and Chen 2002). It has also been suggested that within this region excitation is more spatially restricted than inhibition (Hata et al. 1991), although this is not generally confirmed (see also Swindale 1996). Whereas the shortrange connections are largely nonspecific, the longerrange connections tend to link cells with similar orientation preferences (e.g., Bosking et al. 1997; Roerig and Chen 2002). Although the anatomical data do not directly support a lateralconnection pattern with a Mexicanhat form (central excitation surrounded by inhibition), other sources of evidence exist. For example, in mouse visual cortex, extracellular recordings and optical imaging (see Schuett et al. 2002 and references therein) have shown that stimulation of a limited region of the visual cortex also elicits surround inhibition, thus suggesting a Mexicanhat pattern. The range of inhibition also seems to extend farther than suggested by the purely anatomical evidence, and this has a role in map development (see Hensch and Stryker 2004; Fagiolini et al. 2004 and references therein).
Models of cortical pattern formation during development have generally considered only the effect of the dense, shortrange mixture of excitatory and inhibitory connections, and have ignored the patchy longerrange connections. In particular, most models have assumed that lateral interactions take the form of a radially symmetric disk of shortrange excitation surrounded by an annulus of longerrange inhibition (e.g., Goodhill 1993; Miller et al. 1989; von der Malsburg 1973; Willshaw and von der Malsburg 1976; reviewed in Swindale 1996). Even models that allow plasticity of lateral interactions have imposed the condition of shortrange excitation and longerrange inhibition (Sirosh and Miikkulainen 1997). Although as discussed above there is little direct biological evidence to support this assumption, such “Mexicanhat” functions for shortrange connectivity are very popular in models for cortical function as well as cortical development (e.g., Dragoi and Sur 2000; Ernst et al. 2001; Kang et al. 2003; Somers et al. 1998). One very appealing feature of Mexicanhat interactions is their powerful patternforming properties (Ball 1999; Meinhardt 1982; Turing 1952).
However, despite this popularity there has been very little investigation of how important the precise form of shortrange Mexicanhat interactions (i.e., the continuity term) is to theoretical models of cortical development. If it were the case that different types of continuity term led to maps with broadly similar properties, then models would not provide strong constraints on what patterns of lateral connections might exist in the real cortex. Here we show that, on the contrary, the precise form of the continuity term provides very sensitive control over the details of the maps formed. This greatly strengthens the prediction that lateral connections in the real cortex do indeed have the form of shortrange excitation and longerrange inhibition. Our results demonstrate the fragility of the interrelations between maps with respect to a class of Mexicanhat functions that could be implemented in biological hardware in terms of localized arbors (of different widths and amplitudes) of excitatory and inhibitory connections.
The particular model we investigate is the elastic net (Dayan 1993; Durbin and Mitchison 1990; Durbin and Willshaw 1987; Durbin et al. 1989; Goodhill and Willshaw 1990). Together with the Kohonen model, the elastic net exceeds all other models in the closeness with which the maps it produces match the details of real maps (Erwin et al. 1995; Swindale 1996). Unlike the Kohonen model, however, the elastic net has an objective function that explicitly implements the tradeoff between coverage and continuity. This property allows the effect of the continuity term to be cleanly separated from that of the coverage term. We first present a novel interpretation of the continuity term in terms of a pattern of lateral interactions in the cortex and show that the standard form of the continuity term, in effect a firstorder derivative, represents shortrange excitation and longerrange inhibition. We also generalize the continuity term to higherorder derivatives and show how these correspond to patterns of lateral interactions that oscillate more than once (i.e., the annulus of inhibition is now surrounded at longer range by an annulus of excitation, and so forth). Simulations of the joint development of OD and OR columns are then compared using these differing assumptions regarding lateral interactions. We find that as derivative order increases, normal map structure is gradually disrupted: in particular many OR pinwheels move to OD borders, and the histogram of intersection angles between OR and OD columns becomes flat, although the maps themselves remain columnar. These results lead to two testable biological predictions: 1) the sign of shortrange lateral interactions changes only once in visual cortices that have been wellcharacterized so far; 2) if cortices are found where columns do not intersect at right angles and OR pinwheels are not constrained to mostly lie on OD centers, these cortices may contain shortrange lateral interactions that change sign more than once.
METHODS
The elastic net (Durbin and Mitchison 1990; Durbin and Willshaw 1987; Goodhill and Willshaw 1990) produces maps that attempt to minimize a tradeoff between coverage C, the desire to represent all stimulus features uniformly in the cortex (Swindale 1991), and continuity R, the desire to produce a smooth mapping of stimulus features. This tradeoff is expressed in terms of minimizing an “energy function” E = C + (β/2)R, where β controls the weight of R with respect to C. Let x_{n} represent the stimulus space vector n (combination of preferred position, ocular dominance, and orientation), and y_{m} represent the center in stimulus space of the receptive field of cortical neuron m. [Note: Although it is convenient to refer to m as being a single cortical neuron, it should more properly be thought of as representing a small patch of neurons. We use the term “receptive field” to refer to all features jointly (i.e., visual field, ocular dominance, and orientation), and not only to the visual field.] In the elastic net the coverage term is then defined as follows where K is a parameter that gives the receptive field size in feature space (‖…‖ represents the Euclidean norm, or length, of a vector), such that large values of K mean cortical neurons are quite unselective for specific features, whereas small values of K mean they are quite selective. We reduce (anneal) K over the course of the simulation, corresponding to a gradual sharpening of the cortical receptive fields. This implements an efficient optimization procedure known as deterministic annealing (Rose 1998). This iterative minimization procedure can be interpreted in terms of Hebbian learning (Durbin and Mitchison 1990), although other interpretations are also possible. The stimulus space is densely populated with points along the dimensions of visual field (VFx, VFy), ocular dominance (OD), and orientation preference and selectivity (OR, ORr). In our case, the stimulus values (training set) are a grid of N_{x} × N_{y} × N_{OD} × N_{OR} × 1 in the rectangle [0, 1] × [0, 1] × [−l, l] × [−(π/2), (π/2)] × [0, r], where OR is conventionally represented by 2 variables in polar coordinates, with N_{OR} values uniformly arranged on a ring of radius r (see e.g., Swindale 1996). There are N = N_{x}N_{y}N_{OD}N_{OR} such 5dimensional points x_{n}. The net consists of a square lattice with M centroids, representing a square array of cortical neurons (the depth dimension of the cortex is not modeled).
The continuity term in the elastic net is defined as where for each neuron m the term f is a fixed linear combination of cortical neurons in a neighborhood of m. That is, f specifies the strength of the interaction between a cortical neuron and its neighbors. We call the coefficients of the linear combination the stencil. In the original formulation of the elastic net (Durbin and Willshaw 1987), R was defined as ‖y_{m}_{+1} − y_{m}‖^{2}, that is, the sum of squared distances in feature space between adjacent receptive field centers of neighboring cortical neurons. In the generalized elastic net (CarreiraPerpiñán and Goodhill, unpublished observations), each term f(y_{m}) = (y_{m}_{+1} − y_{m}) is viewed as a discretized derivative operator of order p = 1 (i.e., an approximate gradient). We can prove (CarreiraPerpiñán and Goodhill, unpublished observations) that the continuity term can be rewritten as a Mexicanhat function as follows (see p = 1 in Table 1). Define the cortical interaction function e_{k} and the local linear combination g with coefficients e_{k} as (1) Then, for a large number M of net centroids, it holds that (Exact expressions can be derived for any finite M, but are complicated and nearly coincide numerically with the limit approximation.) In other words, we can represent the continuity term R(y_{1},… , y_{M}) either in terms of the original linear combinations f(y_{m}) = y_{m}_{+1} − y_{m} or in terms of the new linear combinations g(y_{m}) given in Eq. 1. That is, the continuity term in the original elastic net is mathematically equivalent to a Mexicanhat function consisting of shortrange excitation and longrange inhibition. This demonstrates the equivalence in the elastic net model of the Mexicanhat cortical interaction and the interpretation of Durbin and Mitchison (1990) in terms of wirelength minimization (in the stimulus space). It also shows that the sumofsquareddistances equation, which involves only nearestneighbor connections, can be implemented with a denser net of connections (whose strength decays rapidly with distance).
Instead of choosing, for the function f(y_{m}), the firstorder derivative of sensory space over cortex, we could explore the effects of choosing higherorder derivatives. For this paper we chose derivatives of orders 2, 3, and 4. The discrete forms of these higherorder derivatives are shown in Table 1; they are obtained by repeatedly applying the firstorder derivative (y_{m}_{+1} − y_{m}). In each case, we can show that the sum ‖f(y_{m})‖^{2} is equivalent, in the limit of large M, to ‖g(y_{m})‖^{2} with an appropriate g interaction function; and that this g interaction function has a progressively larger number of distal oscillations, or alternating rings of excitatory and inhibitory connections, with increasing order of the derivative. Because the continuity term R should not depend on the location of the net, but only the relative arrangement of its centroids, the coefficients in the stencil should add to zero, which is the same as requiring that applying the stencil to a constant function give zero (by definition of differential stencil). 2D stencils can be readily obtained by passing a 1D stencil horizontally and vertically, resulting in 2 terms, R_{h} and R_{v}, that are then added. This is the natural extension of the firstorder 2D stencil used in the original elastic net, although it leads to a Fourier spectrum that is increasingly anisotropic as p increases. There are other ways to construct 2D stencils (e.g., the Laplacian stencil) of order p = 2, which has a more isotropic spectrum. Note however that in a discrete grid there is not a unique definition of the concept of isotropy. Although in reality it is likely that the lateral connections are maturing as the maps develop (Sur and Leamey 2001), the elastic net model (in common with most other models) simplifies this situation by assuming a fixed pattern of lateral connections.
Simulation setup
We trained nets for derivative orders p = 1, 2, 3, and 4 with the following configuration: training set: N_{x} = N_{y} = 20, N_{OD} = 2, N_{OR} = 12, l = 0.09, and r = 0.16 (totaling N = 9,600 points in 5 dimensions); net: 128 × 128 centroids; β = 10; nonperiodic boundary conditions. We reduced K with an annealing rate of 0.992 from 0.1 to the point at which the OD and OR maps have just arisen (K ≈ 0.05 for all values of p). We achieved the minimization of E at each value of K by Cholesky factorization (CarreiraPerpiñán and Goodhill, unpublished observations). This method is both more efficient and more robust than the numerical procedures used previously (e.g., Durbin and Mitchison 1990; Durbin and Willshaw 1987; Goodhill and Willshaw 1990), which were limited to very small β values. We ran 5 different simulations for each value of p to obtain error bars in Figs. 3 and 4, adding a small amount of random noise to the initial net and the training set in each case. The simulations were performed using custom software written in Matlab.
Several of the statistics reported are based on histograms; in all cases we used n = 10 bins and took each bin's center as class representative. Using a different number of bins (100 instead of 10) did not significantly alter any of the results. To quantitatively characterize the histograms (average and variability) as a function of p in a compact way in Fig. 3, we summarized each histogram by 3 statistics: 1) its Kullback–Leibler (KL) divergence (Cover and Thomas 1991) to the uniform histogram in the same domain, defined as KL(u) = p_{i} log (p_{i}/u_{i}), where p_{i} and u_{i} are the values (normalized to sum 1) at bin i of the histogram and the uniform histogram, respectively; KL(u) quantifies departure from uniformity and takes values from 0 for a uniform histogram to log n ≈ 2.303 when all p_{i} are zero except at a single bin; 2) its Fisher skewness γ_{1} = (where μ_{i} is the ith central moment), which is 0 for a histogram symmetric around its mean and negative (positive) for tails on the left (right) of the mean; 3) its mean μ. Both γ_{1} and μ are independent of the number of bins n; all 3 are independent of translations or scalings of the data. Other summary statistics are possible, but these 3 sufficed to characterize the histograms we obtained.
Stripe width
The power of the discrete Fourier transform of OD and OR angle maps was roughly isotropic and concentrated around a ring. We summarized it by the mean wavelength (mean stripe width), averaged over all directions.
Crossing angles
We disregarded the points lying 5 pixels or less from the boundaries of the net (a thin inner stripe framing the maps in Fig. 1) to eliminate border effects. At each pixel, we computed the angle between the gradient vectors of OD and OR (see following text) and mapped it to [0°, 90°], thus obtaining the angle between contours of OD and OR. Each such angle counted with a weight proportional to the product of the gradient moduli of OD and OR, ‖∇OD‖ × ‖∇OR‖, so that pixels lying in areas of either constant OD or constant OR (i.e., away from borders of OD or contours of OR) were effectively removed from the histogram. This is because, in an area where either map is nearly constant, the gradient vector is negligible in modulus and its direction basically arbitrary, unlike along borders, where the gradient vector is large and its direction well defined.
Pinwheel distributions
An OR singularity, or pinwheel, is defined positive if the orientation angle increases in a clockwise direction around the pinwheel and negative if anticlockwise. Pinwheels were automatically located in the OR maps as follows. First, the winding number of each pixel in the OR map was estimated by summing the increments of OR angle (in [−(π/2), (π/2)]) along a closed path (a square of radius 1 pixel centered in the pixel considered) in a clockwise direction and dividing the result by 2π; this results in 0 for nonpinwheel points and + (−) for pixels at or adjacent to a positive (negative) pinwheel, respectively. The exact pinwheel location was obtained by grouping clusters of nonzero winding number and computing their centers of mass. To quantify the layout of pinwheels with respect to OD stripes, we computed, for every pinwheel, the distribution of distances of a given pinwheel to its closest OD border. We prefer this to calculating distances to OD centers because the “center of an OD stripe” is generally not well defined, except for stripes that are translations of each other, and becomes difficult to use with stripes with irregular boundaries, forks, islands, or other complex shapes. In contrast, the OD borders are well defined in all these cases.
To compute the distances between a pinwheel and its closest OD border, we represented the OD borders by a finely spaced collection of points (the contours generated by Matlab for the OD map); the said distance was then given by the point in this collection closest to the given pinwheel. We considered that a pinwheel lay on an OD border if the distance between the 2 was 1 pixel or less. The distributions of random pinwheels were built by generating, uniformly over the cortex, the same number of pinwheels as in the actual map produced by the simulation, and then computing the distance histogram; this process was repeated 50 times to obtain an average histogram with error bars that is the one plotted along with the actual ones for comparison. Note that theoretical distributions (such as the Rayleigh distribution for nearestneighbor pinwheels; Müller et al. 2000) differ from the true one because they do not take into account the border effects of a finite cortex.
Gradient correlations
The gradient of a variable f (one of VFx, VFy, OD, OR) with respect to the cortical location (x, y) was approximated at each pixel (i, j) by a central finite difference where h = 1 pixel is the grid size. For points along the cortex boundaries, a forward difference was used at a small accuracy loss. In both cases, for OR the gradient values were corrected to have a maximal absolute value of (π/2). The 2D gradient vector ∇f was converted to polar coordinates (modulus, angle) for each variable separately, with the angle being used in the histogram of crossing angles. The moduli were used for the correlation plots (the modulus of VF gradient was defined operationally as the sum of the gradient moduli of VFx and VFy). The units of the gradient modulus of variable f are those of variable f per pixel. The linear correlation was quantified by Pearson's correlation coefficient r. The customary regression lines X → Y and Y → X minimize the vertical and horizontal errors, whereas the line X, Y minimizes the orthogonal distances to the line and is given by the principal component of the data covariance matrix. All 3 lines pass through the data mean but with different slopes in general. For highly correlated data (r^{2} ≈ 1), all 3 lines are very close, whereas for highly uncorrelated data (r^{2} ≈ 0) X → Y and Y → X become horizontal and vertical, respectively.
Gradient correlations have also been reported in the experimental literature (e.g., Das and Gilbert 1997; Hetherington and Swindale 1999), where the preferences of VF and OR were determined electrophysiologically at several cortical sites and then the difference in preference between neighboring sites divided by their cortical distance. However, for practical reasons “neighboring” sites are much further apart over the cortical surface in these studies than neighboring points in our simulated cortex.
RESULTS
We present the results obtained with a generalized elastic net model of the maps of visual field position (VF), ocular dominance (OD), and orientation preference (OR). The model attempts to represent the 5 dimensions of VF (x and y directions), OD and OR (preferred angle and selectivity) using a 2D array of cortical neurons (the elastic net), thus achieving a reduction of dimension. The maps are the result of maximizing the coverage of these stimuli while minimizing a continuity constraint on the shape of net (the geometry of the maps). The continuity constraint penalizes the squared magnitude of the derivative of order p integrated over the whole net, thus forcing a representation of the maps that is as smooth as possible. Biologically, the continuity constraint represents a cortical interaction function with the shape of a Mexican hat, in which the number of excitatory–inhibitory regions increases with the derivative order p (Table 1). We considered p = 1 to 4. Table 1 shows, for each value of p from 1 to 4, the corresponding form of the continuity term in the elastic net, and the equivalent (shortrange) cortical interaction function (for details see methods). For p = 1 (the standard elastic net) the cortical interaction function has the form of shortrange excitation surrounded by an annulus of longerrange inhibition, which decreases smoothly to zero. For p = 2, the magnitude of the longerrange inhibition decreases quickly to zero. For p = 3, the magnitude of the longerrange inhibition decreases quickly to zero, and then there is a small annulus of excitation surrounding this, which decreases smoothly to zero. For p = 4, the annulus of excitation is stronger but falls rapidly to zero. It should be emphasized that in each case these functions represent only radially symmetric shortrange cortical interactions (i.e., within a radius of about 500 microns) and longerrange patchy connections are not considered.
By simulating the development of VF, OD, and OR maps using the elastic net with each of these continuity terms, we demonstrate that these very subtle differences in cortical interactions, in the form of the Mexican hat, have profound effects on the detailed structure of the maps obtained. Figure 1 shows the maps obtained from one simulation for derivative orders p = 1 to 4. The maps obtained in simulations using different random numbers differed in the detailed layout but were qualitatively similar. Figures 2–5 quantitatively assess the specific maps of Fig. 1, whereas Figs. 3 and 4 summarize the various statistics as a function of the derivative order; the average value and error bars (SDs) result from 5 simulations per order.
Crossing angles
The distribution of crossing angles of the contours of OD and OR for the maps of Fig. 1 are shown in the left column of Fig. 2. For p = 1, this distribution is strongly skewed toward orthogonality, in agreement with experimental data (macaque: Bartfeld and Grinvald 1992; Obermayer and Blasdel 1993; cat: Crair et al. 1997; Hübener et al. 1997; Kim et al. 1999; Müller et al. 2000). However, as p increases, the distribution quickly loses its bias and becomes practically uniform for p ≥ 3. The loss of orthogonality is apparent by close inspection of the contours in Fig. 1 (right column). The transformation of the distribution from being sharply skewed toward orthogonality (mean 65°) to being practically uniform (mean ≈ 45°) is clearly shown in Fig. 3 (first row).
Stripe widths
Figure 4 (left) plots the mean wavelength of OD and OR in pixels, showing that the stripes become narrower as the derivative order increases. The stripe width depends for any derivative order on the variance along the OD and OR dimensions in the training set, and is related to the order in which both maps arise (Goodhill and Cimponeriu 2000). For the results shown in this paper we chose these variance parameters such that OD columns are slightly wider than OR columns, as is the case in macaques (e.g., Obermayer and Blasdel 1993). However, we also investigated the case where OR columns are wider than OD columns, corresponding to the case in cats (e.g., Löwel et al. 1988). No important differences were found of increasing derivative order in the effects we have quantified here (data not shown). Specifically, all histograms and curves in Figs. 2–5 remained basically the same (same increasing or decreasing pattern, similar location of peaks, etc.), and in Fig. 4 (left) the 2 curves cross over between p = 1 and p = 2 because the OR wavelength decreases faster than the OD one.
It is noticeable that the stripes become increasingly more aligned with the diagonals of the cortical region considered as p increases, as seen from Fig. 1. This is attributed to the fact that the 2D stencils (pattern of lateral interactions) we use are increasingly anisotropic in Fourier space as p increases (CarreiraPerpiñán and Goodhill, unpublished observations). To eliminate the possibility that some of the other effects on map structure we report might not appear using more isotropic 2D stencils representing equal derivative order, we also investigated a 2D Laplacian stencil (order p = 2), which has a more isotropic spectrum. The matching of pinwheels and OD borders and the uniformity of crossing angles still appeared with this stencil (results not shown), so we conclude that the anisotropy is not responsible for the effects we report.
Orientation pinwheels
Figure 2 (right column) shows the distribution of distances between nearestneighbor pinwheels, normalized by the wavelength of OR. The histograms are very similar for all orders of p. When pinwheels are considered independently of their sign, the histogram peaks quite sharply at half the OR wavelength and has a slight tail on the right toward higher distances, while quickly dropping to 0 for small distances. If the same number of pinwheels are uniformly distributed at random, then the histogram peaks at a smaller distance and has a much longer right tail. When only pinwheels of the same sign are considered, the histogram contains half the number of pinwheels and has the same shape, but peaks at about the OR wavelength. This is confirmed by the summary statistics of Fig. 3 (third row); the large error bars for p = 1 are attributed to the very small number of pinwheels present in the map (about 50), which results in a more variable histogram. A slight tendency for pinwheels to be more closely packed as p increases is visible in Fig. 3 (μ graph), consistent with the slight increase of pinwheels per OR module mentioned above.
This reflects the fact that pinwheels are laid out in a noisy gridlike arrangement (of grid length half the OR wavelength), with pinwheels repelling each other, so that samesign pinwheels alternate in the nodes of the grid. Thus whereas the numbers of pinwheels and stripe widths vary with p, the arrangement of pinwheels relative to each other remains the same for all p, in agreement with optical imaging data for squirrel and macaque monkeys (Obermayer and Blasdel 1997).
The effect on the distribution of the distance from a pinwheel to its closest OD border is more striking. In Fig. 2 (center column), for p = 1 the histogram shows a quite sharp peak around of the OD wavelength, with a long tail on the left. Thus pinwheels tend to locate at the centers of OD columns, almost always avoiding OD borders, as apparent in Fig. 1 and in agreement with optical imaging data (macaque: Bartfeld and Grinvald 1992; Obermayer and Blasdel 1993; cat: Hübener et al. 1997). However, for p = 2 the histogram becomes bimodal, preserving a now lesssharp peak at of the OD wavelength and developing a very sharp peak at distance 0, indicating that pinwheels generally lie either away from the OD borders (at a distance around 0.17 of the OD wavelength) or right on top of an OD border. As p increases, the peak at distance 0, which remains sharp, increases at the expense of the peak at distance 0.17 of the OD wavelength, which becomes broader. This remarkable effect is apparent in Fig. 1: note for p ≥ 2 how the OD borders meet runs of pinwheels like beads on a string. As Fig. 2 shows, this distribution is very different from that resulting from the same number of pinwheels but uniformly distributed at random, which would have a roughly flat value for distances ≤0.15 of the OD wavelength to then taper off at larger distances. The histograms also show that, for all orders, the distance to OD borders is equally distributed with respect to the sign of the pinwheels (naturally, the histograms of positive and negative pinwheels are half the height of the histogram for all signs). The summary statistics in Fig. 3 (second row) confirm that the histogram remains nonuniform for all p, its mean μ (normalized by the OD wavelength) monotonically decreases, and its skewness changes from negative for p = 1 (tail on the left) to positive for high p (tail on the right). Figure 4 (right) shows how the proportion of pinwheels that lie right on OD borders grows from barely 10% for p = 1 to almost 50% for p = 4. Note that the flattening of the angle histogram is consistent with the increasing proportion of pinwheels on OD borders, given that an OD border crossing a pinwheel must be parallel to some OR contours. In other words, an orthogonality bias requires that pinwheels avoid OD borders.
Gradient correlations
Figure 5 shows the pairwise correlation plots between the gradient moduli of VF, OD, and OR (cf. Bosking et al. 2002; Buzás et al. 2003; Das and Gilbert 1997; Hetherington and Swindale 1999; White et al. 2001) for orders p = 1 to 4. In general, although there is an absence of pixels where the gradients of two or more variables are simultaneously high, there is practically no correlation between gradients, as shown by the low r value and the horizontal and vertical regression lines X → Y and Y → X, respectively. As p increases, the maximal gradient values increase (note the increasing ranges of the axes) because of a small proportion of highgradient points, but by and large the bulk of the cloud changes little. In particular, there is neither a positive correlation between gradients nor a negative one, or very small if at all (as in Bosking et al. 2002; Buzás et al. 2003; Hetherington and Swindale 1999; White et al. 2001; and in contrast with Das and Gilbert 1997). What is remarkable, and confirms what we said earlier, is the correlation between the gradients of OD and OR at points at or near pinwheels (which is responsible for the apparent trend in the slope of the regression lines). For p = 1, pinwheels (by definition characterized by a high OR gradient) lie almost exclusively in areas of zero OD gradient (i.e., away from borders). As p increases, the pinwheel population separates into 2 groups, one at nearzero OD gradient and another at high OD gradient (i.e., the OD borders), with very few pinwheels lying in between.
DISCUSSION
Although a dense network of shortrange cortical connections is known to exist in visual cortex (e.g., Amir et al. 1993; Bosking et al. 1997; Callaway 1998; Kisvárday et al. 1997; Lund et al. 2003; Stettler et al. 2002), the precise form of the resulting intracortical interaction function is unclear. We have examined the effect of varying the form of the continuity term representing these shortrange interactions in the elastic net model of visual cortical map development. In particular, we have investigated continuity terms corresponding to 1st, 2nd, 3rd, and 4thorder derivative constraints on the rate of change of cortical receptive field location in a feature space consisting of visual field position, ocular dominance, and orientation preference. We have shown that this progression of derivatives corresponds to a progression of patterns of shortrange intracortical interactions with increasing numbers of oscillations in the sign of the interactions. Our simulations lead us to hypothesize that the greater the number of oscillations in the continuity term, the stronger the colocation between OR pinwheels and OD borders. In particular, we predict that lateral connections oscillate only once in the regions of cat and monkey cortex that have been well characterized by optical imaging so far. These results provide a way to infer the form of the lateral connections from the observed map structure. Thus if in other species, visual cortical areas, or at other developmental stages a strong colocation between OR pinwheels and OD borders is observed, this can be interpreted in our model in terms of lateral interactions that oscillate more than once.
In fact, a recent experimental report might be consistent with this. Kisvárday et al. (2001) optically imaged the maps of OD and OR in adult cats (>4 mo) and kittens (<4 mo). They found that the tendency of pinwheels to lie near OD centers and of OD and OR maps to intersect at steep angles was hardly present in the region between areas 17 and 18, and very weak in area 18. Further, the tendency was significantly weaker for adult cats than for kittens in all 3 regions. These results suggest that the arrangement of pinwheels with respect to OD domains may change with age and cortical region. It is also worth noting that published experimental data do show a proportion, though small, of pinwheels right on OD borders (e.g., Fig. 4A of Hübener et al. 1997).
The growing colocation of discontinuities of OR (pinwheels) and OD (borders) and the flattening of the angle histogram with increasing p seems to be an essential feature of the elastic net. However, although the dependency of these statistics on p is generally gradual and monotonic, there seem to be 2 qualitatively different classes of lateral interactions: one for p = 1 and another for p ≥ 2, with p = 2 often being a transition case, as evidenced by the sharp migration of pinwheels to OD borders for p > 1 (Figs. 1 and 2). Another characteristic feature of the generalized elastic net is the narrowing of the stripes and the fact that the ratio of OD to OR wavelength is smaller for the 1storder derivative than for the higherorder ones, given by Fig. 4 (left). In principle, this might suggest cats have cortical interaction functions of higher order than monkeys (consistent with there being less tendency toward pinwheels lying in OD centers in cats compared with monkeys). Also, this could suggest that any changes in the ratio of wavelengths in an individual between the fovea and periphery might be indicative of a change in lateral interactions over the cortex. However, the stripe widths are also strongly determined by the correlations in the input space (Goodhill and Cimponeriu 2000) and the strength of the continuity term β, which makes it difficult to separate the individual contributions of each of these factors.
Our results show that interrelations between OD and OR maps are caused by the specific form of lateral interactions used by the elastic net. From a structural perspective, what is special about the 1storder derivative (i.e., lateral interactions) that oscillate only once? This form was originally used in the elastic net because it is relevant to solving the Traveling Salesman Problem, that is, minimizing (squared) distances between the receptive fields in feature space of neurons that are neighboring in the cortex (Durbin and Mitchison 1990; Durbin and Willshaw 1987). This also had a flavor of wirelength minimization, although in the input feature space (Durbin and Mitchison 1990) rather than the cortical space (Koulakov and Chklovskii 2001). Durbin and Mitchison (1990) explored the effect of varying the exponent in the sumofdistances term, but did not explore higherorder derivatives as we have done. The motivation for exploring higherorder derivatives also comes from regularization theory (Tikhonov and Arsenin 1977). Derivativebased penalties can be used to represent biological or physical constraints and so obtain good solutions to inverse problems in areas as diverse as vision (Poggio et al. 1985), geophysics (Tarantola 1987), or statistical smoothing and spline approximation (Green and Silverman 1994; Wahba 1990). The effect of the derivative order on the stripe width is theoretically understood in some of these problems (Wahba 1990; CarreiraPerpiñán and Goodhill, unpublished observations). However, at the present such a theoretical understanding is lacking for the other aspects of the 2D structure of the maps that are more subtle but occur reliably in our simulations, such as crossing angles and pinwheel locations. Note also that the different maps are independent in the continuity term, so that these effects are attributed to the combined action with the coverage term. However, the uncertainty about the definition of coverage is smaller than that of continuity (CarreiraPerpiñán and Goodhill 2002a; Swindale et al. 2000) and we may expect similar effects with slightly different definitions of coverage. From a functional perspective, the consequences of any of these different forms for the lateral interactions remain unclear.
It should be noted that different models have somewhat different ways of mapping between the form of cortical continuity they implicitly or explicitly use and a pattern of lateral interactions. For instance, in correlational models, such as that of Miller et al. (1989), the effective cortical interaction I maps to a pattern of lateral interactions B by the relation I = (1 − B)^{−1} (where I and B are matrices and 1 represents the identity matrix; Miller and Stryker 1990; Miller et al. 1989). In this model, I values of Mexicanhat form generally thus correspond to patterns of lateral interactions with many oscillations, and when the joint development of OR and OD is simulated with this model, neither the orthogonality between the 2 columnar systems nor the positioning of OR pinwheels at the center of OD columns is robustly reproduced (Erwin and Miller 1998; see also Piepenbrock et al. 1997). Mathematical relationships between correlational and elasticnet–type models are discussed by Dayan (1993) and Yuille et al. (1996).
In general, we find a substantially weaker relationship in our simulations between the rate of change of different input features with cortical position than has generally been thought to occur in the elastic net (Durbin and Mitchison 1990). Also, by simulating tangential penetrations across pinwheels (data not shown) we do not generally find a jump in the VF position (as confirmed by the fact that pinwheels lie in areas of low to middle VF gradient in Fig. 5). The diffuse clouds of points in Fig. 5 demonstrate that many types of relationships between gradients of variables coexist in the simulated cortex. The elastic net (or dimensionreduction models in general) is not best characterized as producing purely negatively correlated representations where, in general, when one feature varies substantially, the remaining ones vary only slightly (Durbin and Mitchison 1990). Das and Gilbert (1997) claimed that the rate of change of orientation preference is positively correlated with that of visual field position in the cat, although this has not been reproduced by other investigators (cat: Buzás et al. 2003; Hetherington and Swindale 1999; ferret: White et al. 2001; tree shrew: Bosking et al. 2002). It is interesting to note that a model specifically formulated to obtain this result (Mitchison and Swindale 1999) yielded as a byproduct pinwheels on OD borders (corresponding to a positive correlation between the rate of change of OR and OD), which is contrary to what is observed in optical imaging images. Our results show that, as the derivative order increases, the discontinuities of OD and OR increasingly tend to colocate, but the gradients of OR and VF do not correlate.
However, it is important to note that the methodology of experimental studies is practically constrained to be quite different from the complete analysis of rates of changes between feature variables that is possible in theoretical models (Fig. 5). Our gradients are, up to the small discretization error induced by the grid, “true” gradients. Gradients computed from a set of electrode penetrations cannot be considered true gradients for 3 reasons: first, they are really directional derivatives along the line joining 2 sites (the gradient equation takes into account all directions); second, the separation between sites is usually so large (>100 μm, or about 10% of the OR wavelength) that the discretization error becomes very large and can corrupt any local correlation or lack of it; third, the number of sites is very small (a few dozens compared with M = 16,384 in our simulations). The significance of the location of pinwheels in areas of low to medium VF gradients with respect to coverage uniformity (Bosking et al. 2002; Das and Gilbert 1997) remains unclear.
Acknowledgments
We are very grateful to P. Dayan for helpful discussions.
Present address of M. Á. CarreiraPerpiñán: Dept. of Computer Science, University of Toronto, 6 King's Rd., Toronto, ON M5S 3H5, Canada (Email: [email protected]).
Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
 Copyright © 2004 by the American Physiological Society