Department of Neurology and Neuroscience, Medical College of Cornell University, New York, New York
Submitted 12 July 2006; accepted in final form 15 January 2007
| ABSTRACT |
|---|
|
|
|---|
| INTRODUCTION |
|---|
|
|
|---|
The possible dependence of feature tuning on velocity is important from several points of view. First, V1 neurons can be considered to signal the presence of these features only if they do so in a velocity-independent way. Second, psychophysical studies show various degrees of degradation of visual performance with increasing speed (Burr et al. 1986; Morgan and Castet 1995). Finally, an increasing number of neurophysiological studies suggest, contrary to previous assertions (Ungerleider and Haxby 1994) of parallel processing of shape and motion, that these two streams of scene analysis are not independent at various stages of extrastriate visual processing (Desimone and Schein 1987; Tolias et al. 2005). Our study fits in this context by seeking to elucidate the velocity dependence of how single V1 neurons and their ensembles represent the stimulus attributes that determine one-dimensional spatial features.
The view that single neurons function as feature detectors, which would imply speed invariance among other characteristics, enjoyed early but not uncontroversial popularity (Barlow 1972; Lettvin et al. 1959) and, when applied to the primary visual cortex, initially appeared to gain support from influential early experiments (Hubel and Wiesel 1962) on simple cells. However, decades of work consistently failed to turn up direct experimental evidence for the single-neuron-as-detector view in any cortical area examined. The evidence accumulated in V1, reviewed most recently by Carandini et al. (2005), instead favors the current consensus, according to which V1 neurons represent banks of variously tuned nonlinear filters that adapt to local contrast energy. The "adaptive filter" view is validated by results obtained mostly with stimuli confined to a narrow frequency band such as gratings and Gabor patches. However, salient features such as lines and edges are defined by phase coherences across a range of spatial frequencies (Morrone and Burr 1988). In fact, natural stimuli (which are natural because of, among other factors, their highly nonrandom local phase spectra) highlighted the weaknesses of the current adaptive filter model by pointing to the need for the incorporation of a pattern-selective modulatory influence (Felsen et al. 2005). In the absence of a vetted nonlinear model of sufficient accuracy (Rust and Movshon 2005; Wu et al. 2006), the sensitivity of cortical neurons to features defined by phase cannot be predicted from their sensitivity to sinusoidal gratings or Gabor patches, but rather, must be determined experimentally.
To this end, we use a family of compound gratings (whose spatial frequencies span a sevenfold range), parameterized by phase congruence (Morrone and Burr 1988). The stimuli are matched in spectrum and energy, to eliminate any confounding effects of spatiotemporal filtering on feature tuning. Using this stimulus set, we showed earlier (Mechler et al. 2002) that typical V1 neurons have nonlinearities that allow them to exhibit "feature tuning" to optimally oriented line-like, edge-like, and intermediate one-dimensional spatial profiles. Here we find that speed strongly influences the specificity and depth of feature tuning of individual neurons. These speed-induced changes in feature tuning are comparable in simple cells and complex cells. We also find that, although the feature tuning of individual V1 neurons is strongly speed dependent, the population as a whole retains a full suite of feature analyzers.
Finally, we analyze a simple model to see how well feature tuning is explained, in qualitative terms, by the known basic properties of V1. We consider a recurrent network model for V1 that was proposed to account for the range of behaviors across the simple-complex gamut observed in response to single gratings (Chance et al. 1999). In the model, feature selectivity essentially arises from the interaction between the phase-sensitive linear kernel and the static nonlinearity of the spike threshold. This "iceberg" effect can be either diluted by a phase-insensitive recurrent pooling or compounded by phase-biased recurrent pooling or inhomogeneity in the network. We show that this model accounts for several aspects of responses to compound gratings that we observed experimentally: at each speed, there is a full representation within the V1 population of the entire space of one-dimensional features; there is a comparable degree of feature tuning at different speeds and in simple and complex cells; moreover, this tuning has a comparable degree of speed dependence.
Our results are qualitatively consistent with the consensus view that V1 neurons are adapting nonlinear filters. Specifically, our experimental observations constitute direct evidence against the possibility that individual orientation-selective V1 simple cells function as detectors of oriented lines or edges. Rather, it appears V1 neurons provide an ensemble with selectivity and coding properties that depend dynamically on the stimulus.
| METHODS |
|---|
|
|
|---|
Standard acute preparation techniques were used for electrophysiological recordings from single units in the primary visual cortex (V1) of the primate (cynomolgus monkeys, Macaca fascicularis), as previously described in detail (Mechler et al. 1998, 2002). All procedures were in accordance with institutional and National Institutes of Health guidelines for the care and experimental use of animals.
In brief, extracellular recordings were made with tetrodes (quartz-coated platinum-tungsten fibers; Thomas Recording, Giessen, Germany) placed in the occipital cortex (near Horsley-Clarke 14 mm posterior, 14 mm lateral) of 14 adult animals under general opiate (sufentanil) anesthesia and muscle paralysis. The analogue signal from each tetrode channel was amplified, filtered (0.66 kHz), and digitized (25 kHz). Multiple single units were isolated by cluster analysis of spike waveforms initially performed on-line (Autocut, DataWave Technologies) then off-line (custom software; Reich 2001). Isolation criteria included stability of principal components of spike waveforms and a 1.2-ms minimum interspike interval consistent with a physiologic refractory period. Spike times were identified to 0.1-ms precision. Recording tracks and the laminar position of recording sites were anatomically reconstructed using standard histological techniques (Mechler et al. 2002).
Visual stimulation
The pupils were dilated with topical atropine and covered with gas-permeable contact lenses (Metro Optics, Houston, TX). Artificial pupils (2 mm) and corrective lenses were used to focus the stimulus on the retina. Optical correction was optimized with the aid of the responses of isolated single units to high spatial frequency visual stimuli.
Foveae and the receptive fields of isolated neurons were mapped on a tangent board. Visual stimuli were generated by a special-purpose stimulus generator (Milkman et al. 1978, 1980) under the control of a PDP-11/93 computer and displayed on a Tektronix 608 monochrome oscilloscope (green phosphor, 150 cd/m² mean luminance, 270.32 Hz frame refresh). Luminance of the display was linearized with lookup tables in the range 0 to 300 cd/m². At the 114-cm viewing distance of the animal, the stimuli appeared in a 4° circular aperture on a dark background.
The receptive fields of isolated single units fell between 3 and 6° eccentricity and were always fully covered by the stimulus patch. The receptive fields were characterized in a standard way using drifting sine gratings: tuning was measured first for orientation, then for spatial frequency, and finally for temporal frequency, each parameter optimized for subsequent tuning measurements. The contrast response function was measured using the optimal sine grating. With tetrodes, simultaneous isolation of two to eight (on average, three) single units per site was routine. To keep experimental time within practical limits, receptive field characterization (i.e., finding the optimal grating) was limited to the most responsive one or two units.
In each trial of the main experiment, taken at a fixed stimulus drift velocity, each of the eight compound gratings, each of the four component gratings, and one blank stimulus were presented for 4 s in a randomly interleaved sequence. Trials were rerandomized and repeated (typically 12 to 25 times) until a target signal-to-noise ratio was obtained for at least one isolated unit. The experiment was then repeated with a fourfold increase in the drift speed (by changing the temporal but not the spatial fundamental frequency).
Compound gratings
Compound gratings were of near-optimal orientation and drifting in the optimal direction for the V1 neurons. As in our previous study (Mechler et al. 2002), each of our compound-grating stimuli was a superposition of the first four odd harmonics of a common fundamental, each with a contrast inversely proportional to the harmonic number. Here, a brief formal description of the stimuli follows.
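Let κ denote the spatial frequency; f, the temporal frequency; and C, the Michelson contrast of the fundamental component. Thus formally, the spatiotemporal light intensity variation around its mean for the mth component grating is given by

$$s_m(x,t) = \frac{C}{m}\cos\left[2\pi m(\kappa x - f t) + \varphi\right] \qquad (1)$$

and that of the compound grating by

$$S(x,t;\kappa,f,\varphi) = \sum_{m=1,3,5,7} s_m(x,t) \qquad (2)$$

where φ is the phase of each component grating at the origin.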
CONGRUENCE PHASE.
Across a stimulus set, with the spatial and temporal frequencies and the contrasts of the four components fixed, the phase φ was varied systematically to specify the shape of the compound waveform. With the spatial origin (x = 0) centered on the display, all component gratings share the same phase φ at the center of the display at time t = 0. If φ = 0, each component peaks at x = 0. Because they reinforce each other, they produce a line-like shape. If φ = π/2, the components' sharpest rising parts coincide at x = 0 and, reinforcing each other, produce an edge-like shape, as expected because they constitute the truncated Fourier approximation of a square wave. Following Morrone and Burr (1988), we therefore designate φ the "congruence phase" of the compound grating.
The feature space, defined by the congruence phase, is periodic in π. Because compound gratings are sums of only odd harmonics, two stimuli whose congruence phases differ by π have identical spatial waveforms save for a half-cycle shift, which makes them equivalent as periodic stimuli. As shown in Fig. 1, we sampled the congruence phase in eight equal steps on the [0, π) phase interval to construct eight different rigidly drifting compound waveforms.
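The stimulus construction described above can be sketched numerically. This is a minimal illustration, not the authors' stimulus code: the function name `compound_grating`, the sign convention of the cosine argument, and the fundamental contrast of 0.5 are assumptions made for the example; the harmonics, the 1/m contrast scaling, the 0.25 c/deg fundamental, and the eight phase steps come from the text.

```python
import numpy as np

HARMONICS = np.array([1, 3, 5, 7])  # first four odd harmonics (from the text)

def compound_grating(x, phi, kappa=0.25, contrast=0.5, t=0.0, f=0.78):
    """Luminance modulation of a compound grating.

    x: positions in deg; phi: congruence phase in rad.
    contrast=0.5 and the (kappa*x - f*t) sign convention are assumptions.
    """
    m = HARMONICS[:, None]
    arg = 2 * np.pi * m * (kappa * x[None, :] - f * t) + phi
    return np.sum(contrast / m * np.cos(arg), axis=0)

# One period of the 0.25 c/deg fundamental spans 4 deg.
x = np.linspace(-2.0, 2.0, 801)
phases = np.arange(8) * np.pi / 8          # eight equal steps on [0, pi)
waveforms = np.array([compound_grating(x, p) for p in phases])

line = compound_grating(x, 0.0)            # components peak together at x = 0
edge = compound_grating(x, np.pi / 2)      # components' steepest parts coincide at x = 0
```

Under these assumptions, shifting the congruence phase by π reproduces the same waveform displaced by half a cycle of the fundamental (2 deg here), which is the periodicity argument made in the text.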
[Figure 1 caption, partially recovered:] … |cos (φ)|, with the maximum (0.84) realized by the line and the minimum (0.47) by the edge. The reader is referred to the preceding paper (Mechler et al. 2002) …
DRIFT VELOCITY.
Two drift velocities were used to determine how stimulus speed interacted with a neuron's sensitivity to congruence phase. Drift velocity, V = f/κ, was changed from V = 3.1 deg/s at "low" speed to V = 12.4 deg/s at "high" speed. This was done by increasing the temporal frequency of each component grating fourfold while keeping their spatial frequency fixed (the fundamental was at κ = 0.25 c/deg). The specific temporal frequencies used for the fundamental and the higher harmonics were (values in Hz) f = 0.78, 3f = 2.34, 5f = 3.90, and 7f = 5.46 at low speed; and f = 3.12, 3f = 9.36, 5f = 15.6, and 7f = 21.84 at high speed. Because all recordings were at approximately the same eccentricity, this choice allowed all four components of the compound grating to be within the spatiotemporal pass-band of each cell at "low" speed. A "data set" denotes recordings of responses of one cell to the eight compound gratings at a single drift velocity.
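The arithmetic behind the two speeds can be checked in a few lines. The helper names below are hypothetical; the numerical values are those quoted in the text.

```python
HARMONICS = (1, 3, 5, 7)
KAPPA = 0.25  # spatial frequency of the fundamental, c/deg

def drift_velocity(f_hz, kappa=KAPPA):
    """Drift velocity in deg/s for temporal frequency f (Hz) and spatial frequency kappa (c/deg)."""
    return f_hz / kappa

def component_temporal_frequencies(f_hz):
    """Temporal frequency (Hz) of each harmonic; all components share one velocity."""
    return [m * f_hz for m in HARMONICS]

low = component_temporal_frequencies(0.78)
high = component_temporal_frequencies(3.12)
v_low, v_high = drift_velocity(0.78), drift_velocity(3.12)
```

Because the mth harmonic has both m times the spatial frequency and m times the temporal frequency of the fundamental, every component drifts at the same velocity, so the compound waveform moves rigidly.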
Selection and classification of neurons
The 63 cells with 100 data sets (out of a total of 226 data sets recorded in 137 cells) selected for analysis were those that 1) maintained good spike isolation throughout the experiment and 2) passed a signal-to-noise criterion in the compound-grating experiments. Signal variance was defined for each Fourier component as the squared Fourier amplitude of the trial-averaged response to each compound grating summed over all stimuli. Noise was defined as the trial-by-trial variance of the same component summed over all stimuli. The selection criterion required that the median ratio of signal over noise variance taken over the first eight Fourier components of the response be >0.3.
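The selection criterion can be sketched as follows. This is an illustration under stated assumptions rather than the authors' analysis code: responses are taken to be binned firing rates with an integer number of fundamental stimulus cycles per trial, and "the first eight Fourier components" is read as harmonics 1 to 8 of the fundamental temporal frequency. The function name `median_snr` and the array layout are hypothetical.

```python
import numpy as np

def median_snr(responses, n_cycles, n_components=8):
    """Median signal-to-noise variance ratio over the first Fourier components.

    responses: array (n_stimuli, n_trials, n_bins) of binned firing rates.
    n_cycles:  fundamental stimulus cycles per trial (assumed integer).
    """
    spectrum = np.fft.rfft(responses, axis=-1) / responses.shape[-1]
    idx = n_cycles * np.arange(1, n_components + 1)   # harmonics 1..8 (assumption)
    comps = spectrum[..., idx]                        # (stimulus, trial, component)
    # Signal: squared amplitude of the trial-averaged component, summed over stimuli.
    signal = np.sum(np.abs(comps.mean(axis=1)) ** 2, axis=0)
    # Noise: trial-by-trial variance of the same component, summed over stimuli.
    noise = np.sum(comps.var(axis=1, ddof=1), axis=0)
    return np.median(signal / noise)

def passes_criterion(responses, n_cycles, threshold=0.3):
    return median_snr(responses, n_cycles) > threshold
```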
This data set substantially overlaps with that presented earlier (Mechler et al. 2002), but the two are not identical. The earlier paper, which focused on analyzing single-response harmonics but did not look into the influence of speed, used a different signal-to-noise criterion (it was based on a d' threshold placed on the Fourier components in comparison to the blank) and also included data sets that were obtained with stimuli of different fundamental frequencies. As a result, the 100 data sets analyzed here included 78 of the 121 presented in the earlier paper and 22 from the same pool that were not analyzed earlier.
Cell classification is based on the modulation ratio (Skottun et al. 1991). According to this convention, the fundamental (F1) of the response to a single drifting grating of near-optimal spatial parameters was compared with the DC component after subtraction of the maintained rate of firing (F0) and a cell was labeled simple if F1/F0 > 1 and complex otherwise. Accordingly, there were 24 complex and 13 simple cells in the speed-paired sample. We analyze and report dependence on the modulation ratio F1/F0 both categorically and parametrically.
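As an illustration of this convention, the modulation ratio can be computed from a cycle-averaged response histogram. The sketch assumes the histogram covers exactly one cycle of the drifting grating; the function names are hypothetical, and the case of a non-positive F0 is not handled.

```python
import numpy as np

def modulation_ratio(cycle_psth, maintained_rate=0.0):
    """F1/F0 from a firing-rate histogram spanning one stimulus cycle."""
    cycle_psth = np.asarray(cycle_psth, dtype=float)
    n = cycle_psth.size
    f0 = cycle_psth.mean() - maintained_rate            # DC minus maintained rate
    f1 = 2.0 * np.abs(np.fft.rfft(cycle_psth)[1]) / n   # amplitude of the fundamental
    return f1 / f0

def classify(cycle_psth, maintained_rate=0.0):
    """Label a cell 'simple' if F1/F0 > 1 and 'complex' otherwise."""
    return "simple" if modulation_ratio(cycle_psth, maintained_rate) > 1 else "complex"
```

For calibration, an ideal half-wave-rectified sinusoid has F1/F0 = π/2, whereas a response that is mostly an unmodulated elevation has a ratio near zero.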
Recurrent network model
Chance et al. (1999) introduced a network model for V1 with variable recurrent gain. In response to drifting gratings, this model produces phase-modulated, simple-like responses at low gain and phase-invariant, complex-like responses at high gain. We asked whether this model could account for various aspects of the feature tuning we observed experimentally. As detailed below, only minor changes to this model were made: we changed the time constant of the feedforward impulse response and we varied the nonlinearity to include nonzero firing thresholds and half-squaring.
In this model, the continuous firing rate of the ith neuron, ri, is instantaneously boosted by the sum of the input from its feedforward sources (Iiff) and those from its recurrent connections (Iirec) and relaxes with a time constant τr (set to 1 ms)

$$\tau_r \frac{dr_i}{dt} = -r_i + I_i^{\mathrm{ff}} + I_i^{\mathrm{rec}} \qquad (3)$$
Note that there is no spontaneous activity in the model. The effect of including spontaneous activity would be to allow for negative thresholds, but would not alter the simulation results.
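The rate dynamics can be illustrated with a forward-Euler integration. This is a sketch under assumptions: it takes the rate equation to be a leaky integration of feedforward plus recurrent input, with the recurrent input a weighted sum of the other units' rates; the function name, the step size, and the toy weight matrix are hypothetical and are not the network used in the paper.

```python
import numpy as np

def simulate_rates(i_ff, weights, tau_r=1e-3, dt=1e-4, n_steps=500):
    """Integrate tau_r * dr/dt = -r + i_ff + weights @ r with forward Euler.

    i_ff:    constant feedforward input to each unit.
    weights: recurrent weight matrix (gain already included).
    """
    r = np.zeros_like(i_ff, dtype=float)
    for _ in range(n_steps):
        i_rec = weights @ r                    # recurrent input pooled over units
        r += dt / tau_r * (-r + i_ff + i_rec)  # relaxation with time constant tau_r
    return r

# Toy example: 4 units, uniform pooling with total gain 0.5 (stable because < 1).
n_units, gain = 4, 0.5
weights = gain / n_units * np.ones((n_units, n_units))
i_ff = np.array([1.0, 2.0, 3.0, 4.0])
steady = simulate_rates(i_ff, weights)
```

For constant input and a stable weight matrix, the rates settle to the solution of (I − W) r = I_ff, which shows how recurrent gain amplifies the pooled component of the feedforward drive.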
A two-stage linear-nonlinear (LN) operator acting on the stimulus supplies the feedforward input Iiff

$$I_i^{\mathrm{ff}}(t) = A\left(\left[\iint G_i(x)\,H(t')\,S(x,t-t';\kappa,f,\varphi)\,dx\,dt' - \theta\right]_+\right)^{n} \qquad (4)$$

The linear operator is the convolution of the stimulus S(x, t; κ, f, φ) (Eq. 2), with the separable spatiotemporal kernel Gi(x)H(t). The scale factor A sets the absolute response magnitude. The nonlinear operator has two stages. The first is a static nonlinearity that consists of a threshold θ and a rectifier [x]+ = max (0, x); the second is a power function with an exponent n ≥ 1. As an example, θ = 0 and n = 1 represent perfect half-wave rectification and θ = 0 and n = 2, half-squaring. The value of θ was chosen to be zero for some networks; for other networks, a nonzero θ was chosen such that the response of the neurons with the smallest receptive field to the fundamental component (presented alone) was half-maximal.
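The nonlinear stage can be written compactly. The sketch below assumes that the threshold is subtracted before rectification, i.e., the output is A·([x − θ]+)^n; the function and parameter names are hypothetical.

```python
import numpy as np

def static_nonlinearity(drive, theta=0.0, n=1.0, scale=1.0):
    """Threshold, rectify, then raise to the power n (n >= 1).

    drive: output of the linear stage; theta: threshold; scale: the factor A.
    """
    return scale * np.maximum(0.0, np.asarray(drive, dtype=float) - theta) ** n

drive = np.array([-1.0, 0.5, 2.0])
half_wave = static_nonlinearity(drive, theta=0.0, n=1)     # half-wave rectification
half_square = static_nonlinearity(drive, theta=0.0, n=2)   # half-squaring
thresholded = static_nonlinearity(drive, theta=1.0, n=1)   # nonzero firing threshold
```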
Gi(x), the spatial filter of the ith cell, is a Gabor function

$$G_i(x) = \exp\left(-\frac{x^2}{2\sigma_i^2}\right)\cos\left(2\pi k_i x + \psi_i\right) \qquad (5)$$

with three parameters: the width of the envelope σi, the carrier (or Gabor) frequency ki, and the carrier (or Gabor) phase under the envelope ψi. The model included nk = 7 spatial frequency channels, with the Gabor frequency k sampled in equal steps of 0.5 c/deg from 0.5 to 3.5 c/deg, a 3-octave range. For each Gabor frequency, the Gabor phase was evenly sampled in steps of π/32 radians from the entire [−π, π] interval (nψ = 33). Thus the network size was N = nk nψ = 231.
If the shape of receptive field profiles were independent of their size, then σi would be proportional to 1/ki. That is, the dimensionless combination σk, which measures the average number of cycles of the optimal grating "seen" by the neuron within the aperture of its receptive field, would be constant. An alternative to this picture (σ = const/k) would be that receptive field size is independent of the optimal spatial frequency, i.e., σ = const. Macaque V1 neurons apparently represent a compromise between these two possibilities. This is based on the observation of a weak negative correlation between size (σ) and optimum spatial frequency (k) (D Xing, MJ Hawken, and RM Shapley, personal communication). To endow the model with a bit of realism but keep its details simple, we implemented the compromise between constant shape and constant size by allowing two shape factors: a smaller one that held at large scales (σi · 2π · ki = 2.5, ki ≤ 1.5 c/deg), and a slightly larger one that held for small scales (σi · 2π · ki = 2.7, ki ≥ 2.0 c/deg). In equivalent terms, the high spatial frequency channels in this model have somewhat narrower frequency bandwidths than the low spatial frequency channels.
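H(t), the temporal response, is a single-parameter biphasic function

$$H(t) = \alpha\, e^{-\alpha t}\left[\frac{(\alpha t)^5}{5!} - \frac{(\alpha t)^7}{7!}\right], \quad t \ge 0 \qquad (6)$$

where α sets the time scale. The time constant was set identical for each unit (α = 66 s⁻¹) except as noted.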
The recurrent input to each neuron is pooled from all other neurons in the entire network by a kernel defined as a difference of two Gaussians in the space of the Gabor frequencies ki of the feedforward inputs

$$I_i^{\mathrm{rec}} = \frac{g_i}{N}\sum_{j \neq i} W(k_i - k_j)\, r_j \qquad (7)$$

where W is the difference of a center and a surround Gaussian (of widths σc = 0.5 c/deg and σs = 1 c/deg, respectively). The bandwidth of the resulting spatial-frequency tuning curve is similar for all units because it is primarily determined together by σc and σs, and less dependent on σi, the width of the Gabor envelope of the feedforward input. The gain term gi, normalized by the network size, sets the strength of the recurrent input that each neuron receives. In homogeneous-gain networks, all cells behave like ideal simple cells when g = 0, and increasingly like ideal complex cells as g → gmax, where gmax denotes the maximum gain attainable in homogeneous-gain networks. For gains g ≥ gmax, recurrent amplification makes the network unstable. Numerical values of recurrent gain are presented, even for inhomogeneous-gain ("mixed-gain") networks, as g/gmax, relative to gmax of the homogeneous-gain networks. However, in mixed-gain networks, g is not bounded by gmax. This is because the true maximum gain is a parameter that depends on other network parameters, including the distribution of gains. In particular, the true maximum gain in a mixed-gain network can be made arbitrarily large if the number of units with very high gain is kept sufficiently low, and this in turn permits some units to have gains g ≥ gmax.

Data analysis
Firing rate responses for each neuron in the network were analyzed in exactly the same way as the spiking responses collected experimentally from V1 neurons. Off-line data analysis and statistical tests were performed using Matlab (The MathWorks, Natick, MA) toolbox functions and custom software written in Matlab.
| RESULTS |
|---|
|
|
|---|
Feature tuning and its dependence on speed
Earlier we showed (Mechler et al. 2002) that V1 neurons are tuned to the congruence phase of compound gratings, and that response energy and other response measures based on harmonics beyond the DC are especially sensitive to this tuning. Here we demonstrate that in most V1 neurons, feature tuning is dependent on the drift velocity of the compound gratings.
It is tempting to analyze the responses to compound gratings in terms of the responses to their components and a nonlinear response model. However, as indicated in our earlier study (Mechler et al. 2002), accounting for the compound-grating responses requires a highly nonlinear model; idealized rectifiers and energy mechanisms do not suffice. This is further illustrated in Fig. 2. It shows the time histograms of the responses of three representative V1 neurons to the compound gratings (arranged along the phase circle in the same way these stimuli were introduced in Fig. 1), as well as to the four component gratings presented alone (stacked in the center, as labeled in Fig. 2A). For each cell, the set of responses on the left corresponds to the stimuli drifting at low speed, and the set on the right, to stimuli drifting at high speed. Other examples (not paired for speed) can be found in Mechler et al. (2002).
On the other hand, local squaring operations (Burr and Morrone 1992) can provide some feature selectivity. Additionally, the behavior of Fourier components of the response as a function of congruence phase implies the presence of high-order nonlinearities (order ≥ 3), for both complex and simple cells (Mechler et al. 2002). Another way to rescue a linear-static nonlinear model would be to add phase-sensitive (Felsen et al. 2005) or strongly dynamic nonlinearities. However, specific forms for such nonlinearities have not yet been proposed, so it is difficult to test models of this kind from the data of individual cells.
The example cells of Fig. 2 typify another feature of our data. They exhibit, to various degrees, a more low-pass spatial sensitivity at the (fourfold) higher temporal frequency, indicating that spatiotemporal sensitivity of these neurons is not separable in the two frequency domains. On the other hand, their spatial frequency optimum does not seem to decrease in inverse proportion to the temporal frequency change, indicating that these neurons were not exactly tuned to velocity either. Cells like these, whose sensitivity was neither separable in spatial and temporal frequency nor tuned to velocity when assayed with single gratings, were found to constitute a large fraction of cells in V1 (Priebe et al. 2006). This mixed behavior in the responses to single gratings (spatiotemporal inseparability) further complicates predictions of the responses to compound gratings when their drift velocity is varied.
In sum, a cell-by-cell approach to fitting the compound grating responses from the single-grating responses is insufficiently constrained by existing models that could conceivably work (spatiotemporally inseparable models with high-order phase-sensitive and/or dynamic nonlinearities). For this reason, our analytical approach will consist of an attempt to account for the range of behaviors across the population from a minimal network model, rather than the details of individual cells.
The first step is the extraction of indices that describe the responses to the compound gratings. Figure 3 shows the tuning to congruence phase (feature tuning) for the three cells in Fig. 2. The three illustrate the observed range of behavior and are ordered (from top to bottom) by increasing difference between the optimal phases at the two stimulus speeds. Each panel shows the response (total energy) of a single cell at low speed (open symbols) and high speed (filled symbols). Total response energy is defined as the summed squared amplitudes of the DC (after subtracting the baseline level) and the first eight Fourier components of the mean response. It is one of many alternative scalar response measures that were shown in our earlier paper to be consistent in identifying the feature optimum and comparable in their sensitivity (depth) of feature tuning.
Tuning curves were fit with the family of functions

$$R(\varphi) = a_0 + a_1\cos\left[2(\varphi - \varphi_1)\right] + a_2\cos\left[4(\varphi - \varphi_2)\right] \qquad (8)$$

choosing the parameters a0, a1, a2, φ1, φ2 to minimize the mean squared error of the fit. This family is a natural choice for the empirical description of feature tuning because it encompasses contributions of nonlinearities up to and including fourth-order and captures much of the variance in the tuning. The best-fitting function from Eq. 8 (thick continuous lines in Fig. 3) was used to extract objective measures of tuning curves and their change for further analysis.
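A fit of this kind reduces to ordinary linear least squares once each cosine is expanded into cosine and sine terms. The sketch below assumes that the fitted family consists of a constant plus second- and fourth-harmonic terms in the congruence phase (period π); the function names are hypothetical, and this is not the authors' fitting code.

```python
import numpy as np

def _design(phases):
    """Regressors: constant, plus 2nd and 4th harmonics of the congruence phase."""
    phases = np.asarray(phases, dtype=float)
    return np.column_stack([np.ones_like(phases),
                            np.cos(2 * phases), np.sin(2 * phases),
                            np.cos(4 * phases), np.sin(4 * phases)])

def fit_feature_tuning(phases, responses):
    """Least-squares coefficients of the harmonic tuning model."""
    coef, *_ = np.linalg.lstsq(_design(phases), responses, rcond=None)
    return coef

def evaluate_fit(coef, phases):
    return _design(phases) @ coef

def optimal_phase(coef, n_grid=3600):
    """Congruence phase in [0, pi) at the peak of the fitted curve (grid search)."""
    grid = np.linspace(0.0, np.pi, n_grid, endpoint=False)
    return grid[np.argmax(evaluate_fit(coef, grid))]

phases = np.arange(8) * np.pi / 8
responses = 2.0 + np.cos(2 * (phases - 0.3))   # synthetic tuning curve peaking at 0.3 rad
coef = fit_feature_tuning(phases, responses)
phi_opt = optimal_phase(coef)
```

With eight equally spaced phases, the five regressors are mutually orthogonal, so the fit is well conditioned and each coefficient can be read independently of the others.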
We defined the optimal stimulus by its congruence phase, φopt, at the peak position of the tuning curve (Fig. 3, thin arrows for low speed, thick arrows for high speed). The congruence phase, φ, which parameterizes the feature space, is periodic with period π. φopt = 0 corresponds to a line-like stimulus; φopt = π/2 corresponds to an edge-like stimulus; and intermediate values of the congruence phase correspond to intermediate one-dimensional features.
To quantify the change in the optimal stimulus, Δφopt, induced by a change in the drift velocity, we determined

$$\Delta\varphi_{\mathrm{opt}} = \varphi_{\mathrm{opt,high}} - \varphi_{\mathrm{opt,low}} \pmod{\pi} \qquad (9)$$

Because the feature space is periodic in π, Δφopt must lie between −π/2 and π/2. A value of Δφopt = 0 indicates no speed-dependent change in optimal congruence phase; values of Δφopt = ±π/2 are the maximum possible changes. We also consider the unsigned quantity |Δφopt|, which indicates the change in feature selectivity independent of the direction of change (0 < |Δφopt| < π/2).
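The wrapping implied by the periodicity of the feature space can be sketched as follows. The function name is hypothetical, and the choice to return values in the half-open interval [−π/2, π/2) is an assumption made so that each shift has a unique representative.

```python
import numpy as np

def delta_phi_opt(phi_low, phi_high):
    """Speed-induced shift in optimal congruence phase, wrapped to [-pi/2, pi/2)."""
    d = phi_high - phi_low
    return (d + np.pi / 2) % np.pi - np.pi / 2

shift_small = delta_phi_opt(0.2 * np.pi, 0.3 * np.pi)   # a small positive shift
shift_wrapped = delta_phi_opt(0.1 * np.pi, 0.9 * np.pi) # wraps around the period
unsigned = abs(shift_wrapped)
```

In the second call, a raw difference of 0.8π is equivalent, modulo the period π, to a shift of −0.2π, which is the value the function returns.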
To quantify the overall similarity of two tuning curves measured at different velocities, we use the Pearson correlation coefficient, r, which is sensitive to the shape of the phase variation but not to the size of the untuned part (mean elevation) of the tuning curves. For a pair of sinusoidal tuning curves, maximum positive and negative correlation (r = ±1) correspond to minimum (Δφopt = 0) and maximum (|Δφopt| = π/2) phase shifts, respectively, and minimum correlation (r = 0) corresponds to the intermediate shifts (Δφopt = ±π/4). The latter are quarter-cycle shifts of tuning curves in this feature space, defining quadrature pairs. Although r = 1 implies that there is no change in the peak of the tuning curve (Δφopt = 0), the converse is not true because the tuning curve may peak in the same position (Δφopt = 0) yet change in shape (r < 1).
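The relation between phase shift and correlation for sinusoidal tuning curves can be verified directly. In this sketch the tuning curves are synthetic, sampled at the eight congruence phases used in the experiment; the function names and the particular means and depths are illustrative choices.

```python
import numpy as np

PHASES = np.arange(8) * np.pi / 8   # eight congruence phases on [0, pi)

def sinusoidal_tuning(phi_opt, mean=10.0, depth=3.0):
    """Synthetic tuning curve with period pi, peaking at phi_opt."""
    return mean + depth * np.cos(2 * (PHASES - phi_opt))

def tuning_correlation(curve_a, curve_b):
    """Pearson correlation coefficient between two sampled tuning curves."""
    return np.corrcoef(curve_a, curve_b)[0, 1]

low = sinusoidal_tuning(0.2)
r_same = tuning_correlation(low, sinusoidal_tuning(0.2, mean=25.0, depth=1.0))
r_quadrature = tuning_correlation(low, sinusoidal_tuning(0.2 + np.pi / 4))
r_opponent = tuning_correlation(low, sinusoidal_tuning(0.2 + np.pi / 2))
```

For sinusoidal curves the correlation equals cos(2Δφopt), independent of the mean elevation and of the modulation depth, which is why r isolates the shape of the phase variation.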
For most neurons, φopt depended on stimulus velocity, but the extent of this dependence varied widely across the population. The same was true for the relative size of the responses to a given spatial waveform. Exemplifying one extreme is the neuron shown in Fig. 3A. This cell responded about twice as vigorously at high velocity (filled symbols) as at low velocity (open symbols). Despite this overall change in responsiveness, the tuning curves at the two velocities were similar in shape (Pearson correlation coefficient, r > 0.8). Correspondingly, the optimal stimulus was line-like (φopt ≈ 0), at both stimulus speeds (|Δφopt| < 0.11π). Illustrating the other extreme, the neuron shown in Fig. 3C was tuned to almost perfectly opponent congruence phases at the two velocities (|Δφopt| ≈ 0.4π). Its tuning curves at the two speeds were strongly anticorrelated (r < −0.6). This neuron decreased, rather than increased, its response magnitude from low speed (open) to high speed (filled). The neuron shown in Fig. 3B was approximately halfway between these extremes. Its phase preference at the two stimulus speeds approximated a quadrature pair (|Δφopt| ≈ 0.25π), and the correlation coefficient (|r| < 0.3) was small, as expected for a quadrature shift. This neuron responded equally vigorously at both speeds.
The range of the speed-induced changes of the optimal phase and of the shape and size of tuning curves in the examples shown in Fig. 3 is representative of the range observed in the entire V1 sample. (The sign and magnitude of the velocity-induced change in response size were not correlated with the velocity-induced change in feature preference, although the three examples of Fig. 3 may give an impression of correlation.) These and other aspects of feature tuning are shown for the entire V1 sample in Fig. 4. The plot on the left (Fig. 4A) summarizes how the optimal congruence phase depends on the drift velocity of the compound gratings. Note that these scattergrams are periodic in π on both dimensions, corresponding to the periodicity of the stimulus space. In these plots, speed invariance would correspond to a concentration of data points near the diagonal and a constant phase shift from low to high speed would correspond to a concentration of data points on a line that is parallel to the diagonal. The pair of dotted off-diagonal lines traces the locus of maximum phase offset (|Δφopt| = 0.5π). In our sample, the optimal features obtained at low speed (φopt,low) and high speed (φopt,high) exhibited no significant (linear) circular association as measured by the circular correlation modulus (Fisher 1993) (|r| ≈ 0.1, P > 0.5). Because the modulus of the circular correlation is not significant, there is no observed tendency for an average speed-induced Δφopt. In sum, we find no evidence either for speed invariance or a net speed dependence of feature tuning in V1. Rather, we find a scattering of tuning at low and high velocities, which, from our finite data sample, is indistinguishable from random.
[Figure 4 caption lost in extraction.]

… (|Δφopt| on the vertical axis) varies with the F1/F0 modulation ratio, a traditional index of nonlinearity and the simple-complex type (Skottun et al. 1991) … |Δφopt| and the negative correlation between |Δφopt| and F1/F0 was not significant (Pearson correlation coefficient −0.3 < r < 0 and P > 0.08). Moreover, the distributions of the speed-induced phase shifts, both the signed and unsigned quantity, were statistically indistinguishable in simple and complex cells (Kolmogorov-Smirnov two-sample test, P > 0.05 for |Δφopt|, P > 0.2 for Δφopt). However, these statistical results are not robust given the rather small sample size. It is possible that with a larger sample size one would find a significantly stronger tendency among simple cells to maintain their phase preference or that the size of the speed-induced change in feature preference is negatively correlated with the index of cell type.

The meaning of the optimal feature parameter depends on the selectivity of tuning. Therefore we also analyzed the selectivity of the tuning, as measured by the circular variance (CV) of the tuning curve. (Here CV denotes 1 minus the usual measure. For calibration, a delta function of a circular variable has CV = 1 and the CV of a cosine raised to a constant pedestal is about half the modulation depth measured by the Michelson contrast.) The CV indicated that at both speeds, most cells were broadly tuned: CV < 0.3 for all but two cells. Unlike the preferred feature, tuning selectivity as measured by the CV was highly correlated at the two speeds (r = 0.71). The median CV at low speed was 0.11 and increased to 0.13 at high speed, a slight and marginally significant change (paired sign-rank test, P < 0.1). Also unlike the preferred feature, both the CV and the speed-induced change in the CV were uncorrelated with F1/F0 and these measures were similarly distributed in simple and complex cells (Kolmogorov-Smirnov two-sample test, P > 0.5).
The CV, unlike the bandwidth or the depth of modulation, is a good measure of the overall shape of a tuning curve. The above results were not dependent on the measure of selectivity, though: the same conclusions were reached when the measure was the depth of modulation of the tuning curve. Thus the relative magnitude of the feature-independent and the feature-modulated components of the compound-grating responses of V1 neurons are essentially independent of stimulus speed.
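The selectivity index can be sketched as the normalized length of the resultant vector of the tuning curve. Doubling the congruence phase before forming the resultant is an assumption made here because the feature space has period π; the function name is hypothetical.

```python
import numpy as np

def tuning_selectivity(phases, responses):
    """Selectivity index ("CV" in the text: 1 minus the usual circular variance).

    phases: congruence phases in rad (period pi, hence the factor of 2).
    responses: non-negative response measure at each phase.
    """
    phases = np.asarray(phases, dtype=float)
    responses = np.asarray(responses, dtype=float)
    resultant = np.sum(responses * np.exp(2j * phases))
    return np.abs(resultant) / np.sum(responses)

phases = np.arange(8) * np.pi / 8
delta_like = np.array([1.0, 0, 0, 0, 0, 0, 0, 0])      # response to one feature only
pedestal_cosine = 1.0 + 0.4 * np.cos(2 * (phases - 0.5))
flat = np.ones(8)
```

These three cases reproduce the calibration given in the text: the delta-like curve yields 1, the flat curve yields 0, and the cosine on a pedestal yields half its Michelson modulation depth (0.2 for a depth of 0.4).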
In principle, a speed-induced change in feature tuning could be attributable to a shift in optimal phase, a change in the shape of the tuning curve, or both. The third plot (Fig. 4C) examines this issue. If the tuning curves at low and high velocities were related by a pure shift in optimal phase 
(i.e., a translation, permitting a rescaling of the tuning curve), it follows that the correlation coefficient r of the two tuning curves is given by
![]() | (10) |

The dependence of r on |Δφopt| is dominated by a declining sigmoid. This accounts for the general shape of the scattergram in Fig. 4C. Thus a Δφopt shift accounts for a substantial component of the velocity-induced change in tuning. On the other hand, if a shift in φopt were the sole cause of the velocity-induced change in tuning, then an appropriate translation in the tuning curve measured at high velocity should bring it into coincidence with the tuning curve measured at low velocity (permitting rescaling). We determined this "corrective" phase shift, Δφcorr, as the phase shift that maximizes the correlation coefficient r between the tuning curve measured at low velocity and the tuning curve measured at high velocity after a translation by Δφcorr. Not surprisingly, Δφcorr is highly correlated with the speed-induced shift in the optimal congruence phase, Δφopt (r > 0.9). However, this translation does not bring the low- and high-velocity tuning curves into coincidence. Rather, the median correlation coefficient between the speed-paired tuning functions was r = 0.73. Thus a translation of the tuning curve accounts for only about half of the variance (r² ≈ 0.5). A change in shape of the tuning curve, as well as measurement error, accounts for the other half of the variance. As a final point, we mention that feature preference or tuning depth did not correlate with relative cortical depth. Laminar location was identified histologically for most cells, but possible laminar variations could not be studied because of the small sample size.
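The corrective-shift analysis described above can be sketched as a search over circular shifts. This is an illustration under stated assumptions rather than the procedure actually used: both tuning curves are assumed to be sampled at the same evenly spaced phases over one period of π, shifts are restricted to whole sample steps, and the names `pearson_r` and `corrective_shift` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def corrective_shift(low, high, period=math.pi):
    """Circular shift (radians) of `high` that maximizes its correlation with `low`.

    Returns (shift, r_at_best_shift). Shifts are multiples of period / len(low).
    """
    n = len(low)
    best_r, best_k = max((pearson_r(low, high[k:] + high[:k]), k)
                         for k in range(n))
    return best_k * period / n, best_r

# Illustrative data: a pure cosine tuning curve and a copy translated by 3 samples.
n = 16
phases = [math.pi * i / n for i in range(n)]
low = [1.0 + 0.5 * math.cos(2 * (p - 0.3)) for p in phases]
k_true = 3
high = low[-k_true:] + low[:-k_true]        # high[i] = low[i - 3]

shift, r_best = corrective_shift(low, high)
r_raw = pearson_r(low, high)                # correlation before any correction
```

In this idealized case the uncorrected correlation equals cos(2Δφ) for a translation Δφ, which falls from 1 to −1 as |Δφ| goes from 0 to π/2, and the corrective shift restores r = 1. Measured curves that also change shape would leave r below 1 after correction, as reported for the V1 data.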
A model of feature tuning
Many aspects of the behavior of real V1 neurons can be understood in terms of some variant of the "iceberg effect," i.e., in terms of the interaction between a linear filter (the spatiotemporal kernel of the receptive field) and a static nonlinearity (that of spike threshold). As we show later, this mechanism is also fundamental in endowing V1 neurons with feature tuning. We now examine to what extent this can account for our data.
A linear operator scales the amplitude and shifts the phase of the frequency components present in the stimulus but adds no new frequency components. Moreover, the amplitude in the output of a linear transform depends only on the frequency but not the phase of the input. Thus neither the amplitude nor its square (the energy), taken in any combination of output components, can exhibit feature tuning for the stimuli used here: feature tuning signifies nonlinearity.
By general considerations similar to those laid out in Mechler et al. (2002), one can show that an isolated linear–nonlinear (rectified) simple cell receptive field model is expected to exhibit feature tuning, that the tuning is periodic in twice the congruence phase, and that the dominant term in its harmonic expansion in phase is cos[2(φ − φopt)]. Furthermore, the energy model of complex cells that sums with equal weight the squared output of two quadrature pairs of simple cell (rectified linear) subunits (one even symmetric and one odd symmetric as well as their opposites in contrast polarity) will by design produce no phase tuning because the subunits' outputs combine to a phase-independent constant DC elevation. The key premise necessary to reach these conclusions is that, by design, the congruence phase is the same in each component of a given compound grating. The key observation in the analysis is that for a nonlinear contribution of order n, the output phase is the sum of the phases of the interacting components.
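The contrast between a linear (or energy) readout and a rectified readout can be illustrated with a toy calculation. The sketch below is not the authors' model: it assumes a compound grating made of the first and third harmonics with 1/m amplitudes, a linear filter described only by hypothetical per-harmonic gains and phase lags, and a threshold of 0.2 chosen for illustration.

```python
import math

HARMONICS = (1, 3)             # assumed odd-harmonic compound grating
AMPLITUDES = (1.0, 1.0 / 3)    # assumed 1/m amplitude falloff

def linear_response(u, phi, gains=(1.0, 1.0), lags=(0.0, 0.0)):
    """Linear filter output at drift phase u for congruence phase phi."""
    return sum(g * a * math.cos(m * u + phi + lag)
               for m, a, g, lag in zip(HARMONICS, AMPLITUDES, gains, lags))

def tuning(phi, n_steps=720, threshold=0.2):
    """Return (mean energy, mean thresholded-rectified output) over one drift cycle."""
    us = [2 * math.pi * i / n_steps for i in range(n_steps)]
    lin = [linear_response(u, phi) for u in us]
    energy = sum(v * v for v in lin) / n_steps
    rectified = sum(max(v - threshold, 0.0) for v in lin) / n_steps
    return energy, rectified

phis = [math.pi * j / 8 for j in range(16)]     # congruence phases over 0..2*pi
results = [tuning(phi) for phi in phis]
energies = [e for e, _ in results]
rectified = [r for _, r in results]
```

In this toy case the energy is the same at every congruence phase, whereas the rectified mean response varies with phase and repeats with period π, consistent with the qualitative argument above.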
However, simple LN models cannot account for the responses to compound gratings: for example, the peaking of the responses seen in Fig. 2 or the manner by which the response Fourier components depend on the congruence phase (Mechler et al. 2002). Adding phase-sensitive nonlinearities or dynamic gain controls might recover such features within the context of a feedforward model, but concisely parameterized models of this sort capable of predicting responses to moving stimuli are not yet in hand. An alternative approach to determine whether the critical features of our responses could be derived from a physiologically reasonable elaboration of idealized LN models is to incorporate idealized LN neurons into a simple recurrent network (Chance et al. 1999). This model departs from the Hubel and Wiesel (1962) hierarchical (feedforward) model of V1, in which complex cells pool their inputs from simple cells that have complementary receptive field profiles, and reflects the growing consensus that corticocortical interactions are critical to understanding responses of individual cortical neurons. Chance et al. (1999) proposed that complex-cell responses arise through recurrent amplification of simple-cell responses and that simple and complex cells represent the weakly and highly coupled regimes of the same basic cortical circuit. We now ask whether the same basic network model can account for the characteristics of feature tuning that we observe.
Although the isolated linear–nonlinear receptive field model is tractable (as outlined earlier), interconnection of such units requires numerical simulation to determine the contributions from single-cell receptive fields and network mechanisms that shape feature tuning.
We implemented several variants of the above network model (as detailed in METHODS). Briefly, the network consists of interconnected rectified Gabor units whose receptive fields are identically centered and oriented. Gabor frequency and phase, representing the linear feedforward input to the network, tile the space of spatial frequency and phase. The recurrent gain relative to the strength of the linear kernel can be varied. Previously, we showed that this model could account for much of the diversity of feature preference and selectivity seen in V1 responses to compound gratings (Ohiorhenuan et al. 2004). Here we report that this model captures most of the qualitative behavior of V1 neuronal responses to one-dimensional features and, specifically, that the model can explain the pattern of speed dependence of V1 responses to this stimulus set.
To develop an intuition for how the recurrent model leads to feature tuning, we begin with homogeneous-gain models, in which the gain of recurrent feedback is the same for every cell. Figure 5 shows tuning to compound gratings drifting at low and high speeds for model neurons in three homogeneous-gain networks that differed only in the gain parameter. In each data set, neurons are organized in rows by k, their Gabor spatial frequency, and in columns by ψ, their Gabor phase. The network of neurons is evenly subsampled for display. For each model neuron, tuning curves are plotted analogously to Fig. 3. In the simulated experiments, the fundamental grating component's spatial frequency was 0.25 c/deg and its temporal frequency was 1 Hz at low speed and 4 Hz at high speed; each parameter value was chosen to be similar to those used in our V1 experiments.
= 0 and n = 1), which uniquely precludes tuning to equal-energy compound gratings because its output preserves the equal-energy property of the input. Next, we describe the characteristics of feature tuning that are common to all model networks studied.
First, at any given stimulus speed, feature sensitivity in each simple cell varies approximately as a cos[2(φ − φopt)] function of the congruence phase φ, with a distinct feature preference, φopt. Thus the simulation of the zero-gain network affirms the qualitative inferences made earlier for the shape of feature tuning in an isolated rectified feedforward unit.
Second, at a given drift velocity, for any particular cell, the feature preference depends monotonically on the receptive field's Gabor phase ψ, i.e., φopt(ψ) ≈ (ψ + const) mod π. This dependence on Gabor phase survives increased recurrent interactions and points to the critical role that the symmetry of the feedforward kernel plays in shaping feature preference in V1. Furthermore, although the form of this dependence does not change with a change in stimulus velocity, the constant offset and thus the tuning optimum itself depends on speed: changing the drift speed V of the stimulus results in a drift-dependent shift, Δφopt(V), in the preferred stimulus, i.e., φopt(ψ, V) ≈ [ψ + Δφopt(V)] mod π.
The speed dependence of the constant offset is the signature of the complex multipliers of the spatiotemporal kernel. The kernel need not be separable in the frequency domain to have this effect. The phase offset depends on the complex amplitudes (and thus phases) of the spatial and temporal transfer functions of the feedforward kernel. In the simulations discussed so far, all units in a network had identical temporal integration properties, which translates into identical complex multipliers in the time domain. Model neurons in different Gabor channels are expected to differ in their spatial complex multipliers, but because of the similar overall shape of their spatial tuning functions this difference does not alter the phase dependence very much (its extent is reflected by the scatter in Fig. 7B); hence the approximately constant phase offset at a fixed stimulus velocity.
Third, within each spatial frequency channel corresponding to a fixed Gabor frequency k, the magnitude of the response varies regularly with ψ, the Gabor phase, approximately as cos(2ψ). Thus the units with the symmetric Gabor kernel (first column, labeled ψ = 0, in the plots shown) have the largest and the units with the asymmetric Gabor kernel (column labeled ψ = π/2) have the smallest responses. This pattern arises through the feedforward input because the even-symmetric linear component, taken after rectification, is larger than the odd-symmetric one. A similar pattern would arise in any family of kernels that sample a mixture of odd and even functions.
Because it arises from an interaction between the linear kernel and the static nonlinearity, this pattern is enhanced by an increase in the threshold or in the acceleration (i.e., the exponent) of the power function. This mechanism is especially prominent in the high spatial frequency channels. This is explained as follows. Stimulus energy, by construction, declines with component frequency. Thus cells of the highest Gabor frequency (largest k values) respond to the compound gratings with the smallest magnitude in the entire network, which, assuming a networkwide constant threshold, makes them the most sensitive to clipping.
Our simulations also indicate that changing the drift speed does not affect the cos(2ψ) dependence of the magnitude of the responses across units, but can affect the absolute magnitude of the responses as well as the selectivity of the feature-tuning curve in a spatial frequency-dependent manner.
Chance et al. (1999) showed that for homogeneous-gain networks, increasing the gain results in increasing phase-insensitive pooling and leads to single-grating responses that are progressively more complex-like. The same mechanism decreases the sensitivity (modulation depth) of feature tuning to compound gratings, as illustrated by Fig. 5, B and C for various (high) levels of gain. Thus when pooled phases are balanced, recurrent pooling acts against the static nonlinearity of the receptive field by making responses more complex-like. Underscoring the importance of the role that the rectified feedforward component plays in setting up feature tuning is the fact that the recurrent gain must be quite high to generate a noticeable change in the shape of the feature tuning curves. Specifically, feature tuning remains stable while the recurrent gain is raised from zero (g/gmax = 0, all feedforward simple cells) all the way up to an intermediate level (g/gmax = 0.5, a value that results in interacting model neurons that are all borderline simple–complex by the measure of the modulation ratio; not shown). Thus a point of special emphasis here is that intermediate gains generate complex cells that exhibit significant feature (phase) tuning. This is all the more notable because the F1/F0 ratio, the index of the simple–complex continuum, is also a measure of phase sensitivity.
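The effect of homogeneous recurrent gain on the modulation ratio can be illustrated with a deliberately reduced version of this kind of network. The sketch below is an assumption-laden toy, not the simulation reported here: it uses a single spatial frequency channel, half-wave rectified cosine drives tiled over eight preferred phases, uniform phase-insensitive pooling, and the quasi-static steady state r_i = drive_i + gain × mean(r), for which the maximum stable gain is 1. The function name and parameters are hypothetical.

```python
import cmath
import math

def modulation_ratio(gain, n_phases=8, n_steps=256, amplitude=1.0):
    """F1/F0 of one unit in a toy homogeneous-gain network (valid for 0 <= gain < 1)."""
    f0 = 0.0
    f1 = 0j
    for step in range(n_steps):
        wt = 2 * math.pi * step / n_steps
        # Rectified feedforward drive of each unit to a drifting sine grating.
        drive = [max(amplitude * math.cos(wt - 2 * math.pi * j / n_phases), 0.0)
                 for j in range(n_phases)]
        # Steady state of r_i = drive_i + gain * mean(r): mean(r) = mean(drive) / (1 - gain).
        pooled = sum(drive) / n_phases / (1.0 - gain)
        r0 = drive[0] + gain * pooled           # response of the unit with phase 0
        f0 += r0 / n_steps
        f1 += r0 * cmath.exp(-1j * wt) / n_steps
    return 2 * abs(f1) / f0

ratio_feedforward = modulation_ratio(0.0)
ratio_weak = modulation_ratio(0.2)
ratio_intermediate = modulation_ratio(0.5)
```

In this toy the ratio falls linearly with gain, approximately as (π/2)(1 − gain), so the conventional simple/complex boundary F1/F0 = 1 is crossed near a gain of 0.36. The exact crossing point depends on the thresholds, nonlinearity and tiling of the full model.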
Notice that the preferred feature in each unit is independent of the choice of the static nonlinearity or the recurrent gain, but only if the latter is not too high. At very high homogeneous gains (Fig. 5C), feature tuning becomes homogeneous because all units begin to behave independently of their own afferent input and similarly to the units that respond the most strongly. That is, in the high homogeneous-gain regime, these strongly coupled networks exhibit winner-take-all behavior, which is expected from strongly coupled recurrent networks in general. For these networks, the "winner" among Gabor units of the same spatial frequency k is the one with a symmetric kernel (Gabor phase ψ = 0 or ψ = π).
This winner-take-all behavior is more prominent when clipping by the rectifier is more severe. This accounts for the more prominent winner-take-all behavior in the higher spatial frequency channels (Fig. 5C, bottom row) because, in these channels (see above), the linearly filtered stimulus energy is smaller. The winner-take-all favoring of the symmetric Gabor is powerfully reinforced by the recurrent excitation from neighboring frequency channels, where this mechanism is similarly prominent.
Note that even though the high-gain regime of the model leads to cells with complex-like behavior in terms of F1/F0 (Chance et al. 1999), the high-gain regime does not lead to energy-like behavior in terms of feature tuning. This follows from the biases set up by the feedforward input as explained earlier, along with the winner-take-all behavior. The selectivity of tuning remains larger in the higher-frequency channels because of the relatively stronger effect of clipping in those channels.
INHOMOGENEOUS (MIXED-GAIN) NETWORKS.
Homogeneous-gain networks illuminate the genesis of feature tuning in model neurons. However, a single homogeneous gain can produce only one kind of behavior, not a simple–complex continuum. Moreover, a well-documented observation about the primate V1 (Ringach et al. 2002) is that simple and complex cells are both present in every cortical layer, with slight variation of their relative abundance across layers but no obvious spatial segregation within layers. Thus by virtue of its ability to generate an arbitrary simple–complex continuum, a random-gain network is likely to be a more realistic model of the V1 population.
Before proceeding to the presentation of the mixed-gain network simulations, a technical point about the behavior of the gain parameter needs to be made. In the preceding analysis of homogeneous-gain networks, we (following Chance et al. 1999) have referenced values of the homogeneous gain g to the maximum stable value of the gain, gmax. An inhomogeneous network can remain stable even if some cells have g > gmax, provided that there are not too many of them. Thus for inhomogeneous-gain networks g can be sampled in a wider range than the one limited by gmax of homogeneous networks of otherwise identical parameters.
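The stability point made above can be illustrated with a toy rate model. The sketch below makes several assumptions that are not taken from the model used here: each cell's gain scales a uniform pool of the network's activity, all cells receive the same constant positive drive (so rectification never engages), and the dynamics are integrated with a simple Euler step. Under these assumptions gmax = 1, and stability is governed by the mean gain rather than by the largest individual gain.

```python
def settles(gains, steps=4000, dt=0.05):
    """Integrate dr_i/dt = -r_i + 1 + gains[i] * mean(r); True if it converges."""
    n = len(gains)
    r = [0.0] * n
    change = float("inf")
    for _ in range(steps):
        pooled = sum(r) / n
        new_r = [ri + dt * (-ri + 1.0 + g * pooled) for ri, g in zip(r, gains)]
        change = max(abs(a - b) for a, b in zip(new_r, r))
        r = new_r
        if max(r) > 1e6:          # activity has run away
            return False
    return change < 1e-6

n_cells = 50
# Gains spread evenly over [0, 1.4] (mean 0.7), so some cells exceed gmax = 1.
mixed_gains = [1.4 * i / (n_cells - 1) for i in range(n_cells)]

mixed_is_stable = settles(mixed_gains)
homogeneous_low_is_stable = settles([0.9] * n_cells)
homogeneous_high_is_stable = settles([1.1] * n_cells)
```

In this toy network the mixed-gain population settles even though a sizable fraction of its cells have gains above 1, whereas a homogeneous network with every gain at 1.1 runs away.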
To illustrate this point and to examine how gain determines the simple–complex character in the mixed-gain network, we plotted in Fig. 6 the F1/F0 modulation ratio for the optimal sine grating as a function of the gain. To facilitate comparison with results for homogeneous-gain networks, we normalized gain with gmax of homogeneous networks of otherwise identical parameters (thus g/gmax > 1 could be realized). Gains were randomly chosen from a uniform distribution over the range g/gmax ∈ [0, 1.4]. The functional relationship is a slowly decaying one, with F1/F0 → 0 at very large gains. (Thus complex cells with F1/F0 < 0.2 can be realized by recurrent gains greater than the range sampled in Fig. 6.) The dependence of F1/F0 on gain is parametric in the Gabor frequency, as indicated by the fine thread-like densities in the scatterplot, each of which is composed of data from units of a particular spatial frequency channel. The asymptotic dependence is very different from the linear relationship (slanted dotted line) known for the homogeneous-gain networks (Chance et al. 1999). This difference is reflected in the range of gains associated with simple cells (triangles) and complex cells (squares). In homogeneous networks, the class boundary (horizontal dotted line) intersects the linear regression of the data at a single point, sharply dividing the continuum of gain between simple cells (g/gmax < 0.41) and complex cells (g/gmax ≥ 0.41). In mixed-gain networks, simple cells are confined to a narrower range of gains and the boundary is not sharp (the scatters of triangles and squares along the abscissa overlap in Fig. 6). This is because the location of the intersection of the class boundary (horizontal dotted line) with the data depends on the Gabor frequency.
Figure 7A shows the tuning curves for model units in a "mixed-gain" network in an arrangement similar to that in Fig. 5. As may be expected from the observations already made, unit by unit, feature preference in the mixed-gain population closely resembles that observed in the homogeneous intermediate-gain network (Fig. 5B), although there are differences. Selectivity and especially response magnitude, the response parameters that depend more strongly on recurrent gain, are more varied in the mixed-gain network. A case in point is the lawful variation of tuning magnitude with Gabor phase observed in homogeneous networks. That pattern, which survived even in strongly coupled units in a network of homogeneous high gain (Fig. 5C), is diluted here. The pattern is expected to be fully eliminated in a sufficiently inhomogeneous network.
To compare the mixed-gain model with the V1 population, Fig. 7, B–D presents the same population analyses as in Fig. 4. In V1 (Fig. 4A), the scattergram of optimal feature at low speed (φopt,low) versus high speed (φopt,high) showed no statistical association by linear circular correlation statistics. However, the simulations (Fig. 7B) for the recurrent network model show prominent "tracks," indicating strong correlation between feature tuning at the two speeds. They signify the monotonic dependence of feature preference on Gabor phase, a legacy of the linear kernel. Thus, not surprisingly, the tracks were also seen in homogeneous-gain networks and their pattern and position were preserved across gain levels (data not shown). The exact shape of that dependence, and thus the shape of the track (e.g., the degree of deviation of the data points from a line of unity slope), depends on the relative spatial frequency of the stimulus and the Gabor frequency; data from units of the same frequency channel form fine "fibers" within the track. We show later (Fig. 8) that the offset of the track along the axes s