J Neurophysiol 96: 3492-3505, 2006. First published September 20, 2006; doi:10.1152/jn.00575.2006
0022-3077/06 $8.00
A corrigendum to this article has been published.

Spectral Receptive Field Properties Explain Shape Selectivity in Area V4

Stephen V. David1, Benjamin Y. Hayden2 and Jack L. Gallant3,4

1Graduate Group in Bioengineering, 2Department of Molecular and Cellular Biology, 3Department of Psychology, and 4Program in Neuroscience, University of California, Berkeley

Submitted 31 May 2006; accepted in final form 12 September 2006


    ABSTRACT
 
Neurons in cortical area V4 respond selectively to complex visual patterns such as curved contours and non-Cartesian gratings. Most previous experiments in V4 have measured responses to small, idiosyncratic stimulus sets and no single functional model yet accounts for all of the disparate results. We propose that one model, the spectral receptive field (SRF), can explain many observations of selectivity in V4. The SRF describes tuning in terms of the orientation and spatial frequency spectrum and can, in principle, predict the response to any visual stimulus. We estimated SRFs for neurons in V4 of awake primates by linearized reverse correlation of responses to a large set of natural images. We find that V4 neurons have large orientation and spatial frequency bandwidth and often bimodal orientation tuning. For comparison, we estimated SRFs for neurons in primary visual cortex (V1). Consistent with previous observations, we find that V1 neurons have narrower bandwidth than that of V4. To determine whether estimated SRFs can account for previous observations of selectivity, we used them to predict responses to Cartesian gratings, non-Cartesian gratings, natural images, and curved contours. Based on these predictions, we find that the majority of neurons in V1 are selective for Cartesian gratings, whereas the majority of V4 neurons are selective for non-Cartesian gratings or natural images. The SRF describes visual tuning properties with a second-order nonlinear model. These results support the hypothesis that a second-order model is sufficient to describe the general mechanisms mediating shape selectivity in area V4.


    INTRODUCTION
 
Cortical area V4 lies near the middle of a hierarchical sequence of visual areas that mediate shape perception (Felleman and Van Essen 1991; Ungerleider and Mishkin 1982; Van Essen et al. 1994). Early in this pathway, in primary visual cortex (V1), neurons are selective for a small number of simple stimulus features such as position, orientation, and spatial frequency (De Valois et al. 1982a; Hubel and Wiesel 1968). At more central stages of processing, in the inferior temporal cortex (IT), neurons are selective for more complex patterns. Tuning in central areas is often related to object identity and invariant to stimulus position and size (Desimone et al. 1984; Kobatake and Tanaka 1994). V4 plays a crucial role in transforming simple physical stimulus features to the abstract form representation in IT; damage to V4 interferes with shape perception, color perception, and attention (De Weerd et al. 1996; Gallant et al. 2000; Merigan 1996; Merigan and Pham 1998; Schiller 1995; Schiller and Lee 1991).

Neurophysiological studies have not produced consistent descriptions of shape coding in V4. One early experiment reported that V4 neurons are tuned for size and invariant to stimulus position, properties not found in more peripheral areas (Desimone and Schein 1987). A later series of experiments compared selectivity for Cartesian gratings and for polar and hyperbolic (non-Cartesian) gratings in V4 (Gallant et al. 1993, 1996); V4 neurons are most selective for non-Cartesian gratings containing multiple orientations. A separate study reported that the optimal stimulus for single V4 neurons varied widely, but that most cells respond strongly to stimuli containing multiple orientations (Kobatake and Tanaka 1994). A more recent study used a parameterized set of contour features varying in angularity, curvature, and orientation (Pasupathy and Connor 1999, 2002). Among these stimuli, a large fraction of V4 neurons are tuned for angled or curved contour features.

These previous studies agree that single V4 neurons are tuned for multiple orientations and show position invariance, but they differ in their specific conclusions about shape selectivity in V4: Are V4 neurons tuned for non-Cartesian gratings, simple objects, or curved and angled contour elements? The most likely explanation is that, to some extent, V4 neurons are tuned for all of these patterns. Different studies have used different, limited stimulus sets to test specific hypotheses about shape coding in V4, and most have not systematically compared responses between classes of stimuli (but see Gallant et al. 1996). An experiment that uses a limited stimulus set can maximize statistical power for testing a specific hypothesis about tuning, but the conclusions that can be drawn about underlying mechanisms are ambiguous. The observed tuning might actually reflect tuning along untested dimensions correlated with those tested, and the observed tuning reveals nothing about tuning along dimensions orthogonal to those tested. This uncertainty can be resolved only with a general model whose scope is not restricted to a limited stimulus set (Wu et al. 2006).

We hypothesized that a single functional model, the spectral receptive field (SRF), can explain previous observations of shape tuning in V4. The SRF accounts for second-order nonlinear response properties, describing tuning in terms of the orientation and spatial frequency power spectrum, independent of spatial phase (Bredfeldt and Ringach 2002; David and Gallant 2005; Mazer et al. 2002). The power spectrum is a basic feature of all visual stimuli; thus the scope of the SRF is not limited to a particular stimulus set. Independence from spatial phase introduces a nonlinearity that enables the SRF to describe spectral tuning, even for neurons with position-invariant responses.

To determine whether the SRF provides an effective general description of shape selectivity in V4, we recorded the responses of single V4 neurons to a large set of natural images and estimated the SRF of each neuron using linearized reverse correlation (David and Gallant 2005; Theunissen et al. 2001; Wu et al. 2006). We then used the SRFs to predict how each neuron would respond to stimuli used in the studies described above (i.e., Cartesian and non-Cartesian gratings and curved-contour elements). Across the entire set of V4 SRFs, we observed a pattern of selectivity for non-Cartesian gratings and curved contours consistent with the conclusions of experiments using synthetic stimulus sets. Inspection of estimated SRFs revealed mechanisms that may underlie this selectivity for complex features. We performed a similar analysis using primary visual cortex (V1) neurons. Neurons in V1 have consistently simpler spectral tuning and are selective for Cartesian gratings rather than for the other stimulus classes. Therefore the tuning we observe in area V4 is an emergent property of the extrastriate cortical network.


    METHODS
 
Neurophysiological procedures and data acquisition

SUBJECTS AND PHYSIOLOGICAL PROCEDURES. Data were collected from four adult male macaques (Macaca mulatta; two animals used in V4 recordings and two in V1 recordings). All procedures were in accordance with National Institutes of Health and U.S. Department of Agriculture guidelines and were approved by University oversight committees. Details of neurophysiological procedures were previously published (V4: Hayden and Gallant 2005; V1: Vinje and Gallant 2002). During recording, V4 and V1 neurons were identified on the basis of both stereotaxic coordinates and receptive field properties (e.g., size/eccentricity ratios and latencies; Gallant et al. 1996; Gattass et al. 1988; Mazer and Gallant 2003; Vinje and Gallant 2002).

RECEPTIVE FIELD ESTIMATION. The boundaries of each classical receptive field (CRF; specifically, the minimum response field) were measured while each animal performed a passive fixation task. Bars, Cartesian gratings, and non-Cartesian gratings were presented under manual control to determine basic receptive field properties (Mazer and Gallant 2003; Vinje and Gallant 2002). Receptive field size, shape, and location were confirmed by reverse correlation using a dynamic sequence of small white, black, and textured squares flashed randomly in and around the CRF (V1: 72 Hz; V4: 10 Hz; Hayden and Gallant 2005; Vinje and Gallant 2002). CRF diameter was defined to be the diameter of a circle circumscribing the minimum response field. In the few cases where manual and automated estimates disagreed, the estimate from the automated procedure was used. V4 CRFs were centered 3–8° from the fovea (median 5.6°) and ranged from 5 to 10° in diameter (median 10.2°). V1 CRFs were centered 0.9–12° from the fovea (median 2.2°) and ranged from 0.3 to 3.0° in diameter (median 0.65°).

STIMULI. Stimuli were circular natural image patches cut out of black and white photos (Corel). Images were chosen at random by an automated algorithm that favored images with broad spatial frequency spectra. For V4 data, the size of each image patch corresponded to the measured CRF size. For V1 data, the size of each image patch ranged from two to four times the CRF diameter. In both cases, the outer 10% of each image was blended linearly into the mean-luminance gray background.

BEHAVIORAL TASK. Neuronal activity was recorded from single V4 neurons of two animals while they performed a delayed match-to-sample task (Hayden and Gallant 2005). Each trial was initiated when the animal grabbed a capacitive touch bar. A fixation spot then appeared at the center of the display. The animal was required to acquire and maintain fixation for the duration of the trial (fixation window radius, 0.5°). A feature cue and a spatial cue then appeared simultaneously for 150–600 ms. The feature cue was the target for that trial, a natural image the size of the CRF, centered at the fixation point. The spatial cue was a small red line (<1°) superimposed on the edge of the feature cue nearest the stream to be attended. After an 850-ms blank delay, two stimulus streams appeared simultaneously: one in the CRF and the other in the opposite hemifield at the same distance from the fovea, 180° away from the first. Images appeared at a constant rate (3.5–4.5 Hz, varying across cells), and there was no blank interval between successive images. The target image appeared 4–10 s after the onset of the image stream. To receive a reward, animals had to release the touch bar within 1 s after the onset of the target in the attended stream. Incorrect trials were aborted immediately after broken fixation or early bar release. Only data from correct trials (95%) were included in the analysis. Four attention conditions were constructed by crossing two spatial conditions (attend in and attend out) with two feature conditions (search for target A, search for target B). The data presented in this report were obtained by averaging across all four attention conditions. Responses to target stimuli were excluded from the data because of their behavioral relevance. This ensured that receptive field estimates reflected only visual tuning and were not influenced by attention.

Neuronal activity was recorded from area V1 of two different animals while they performed a fixation task (fixation duration, 5 s; fixation window radius, 0.35°), with no explicit manipulation of attention. While the animal fixated, a sequence of natural image patches was presented at 60 Hz in the receptive field of an isolated neuron. Only data from periods when fixation was successfully maintained were included in the analysis.

Based on previous studies of V1, the difference in stimulus presentation rates for recordings in V4 (3.5–4.5 Hz) and V1 (60 Hz) should be irrelevant for the current study. Our analysis focused on the spectral tuning of excitatory responses. In area V1, temporal stimulus dynamics do not affect the spectral tuning of excitatory responses (although tuning of inhibitory responses in V1 can depend on stimulus dynamics; see David et al. 2004).

By averaging V4 responses across attention conditions, we intended to remove the effects of attention and to preserve only the visual response. We assume that this averaging controls for differences in behavior between the V4 and V1 experiments. However, we cannot exclude the remote possibility that some of the differences in tuning between V4 and V1 neurons that we report here might be caused by differences in behavioral state.

DATA ACQUISITION. Behavioral control, stimulus presentation, and data collection were performed on a Linux workstation using custom software. For V4 data, eye movements were recorded with an infrared eye tracker (RK-801 at 120 Hz, ISCAN, Burlington, MA; or Eyelink II at 500 Hz, SR Research, Toronto, Canada). Eye tracker latency was corrected during subsequent analysis (Gawne and Martin 2000). For V1 data, eye movements were measured using a scleral search coil (Riverbend Instruments; Judge et al. 1980).

Single-neuron responses were recorded using high-impedance epoxy-coated tungsten microelectrodes (nominal impedance 10–25 MΩ, 125-µm diameter, 20–25° taper; FHC, New Brunswick, ME). For V4 data, neuronal signals were acquired using an integrated multichannel recording system (amplification, filtering, and spike detection; MAP, Plexon, Dallas, TX). For V1 data, signals were amplified (AM Systems, Everett, WA), band-pass filtered, and isolated with a hardware window discriminator. Only clearly isolated single units were included in the data set. Spike times were recorded with 0.1-ms resolution and synchronized with the behavioral task and eye recordings.

Spectral receptive field model and estimation procedure

THE FOURIER POWER MODEL. Simple cells in peripheral visual areas can be characterized by a linear spatial receptive field model (DeAngelis et al. 1993; Jones and Palmer 1987). According to the linear model, the response of a neuron is a weighted sum of stimulus luminance over space and time. However, the linear model cannot be used to characterize V4 neurons because these cells show nonlinear position invariance and visual selectivity does not depend on the precise position of the stimulus in the receptive field (Desimone and Schein 1987; Gallant et al. 1996). To account for position invariance in V4 we used a nonlinear Fourier power model. According to this model the response of a neuron is a weighted sum of the spatial Fourier power of the stimulus. The map of weightings is called the spectral receptive field (SRF; David and Gallant 2005; Theunissen et al. 2001).

A visual stimulus, s(x, y, t), can be described in terms of luminance sampled at N × N spatial positions (x, y) and at times t = 1...T. The Fourier power transform of the stimulus s(ωx, ωy, t) is

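$$ s(\omega_x, \omega_y, t) = \left| \sum_{x=1}^{N} \sum_{y=1}^{N} s(x, y, t)\, e^{-2\pi i (\omega_x x + \omega_y y)/N} \right|^2 \qquad (1) $$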
The value of s at each two-dimensional spatial frequency channel, (ωx, ωy), indicates how much power is present at a particular orientation and spatial frequency in a single stimulus frame (see Eq. 7, below, for interpretation of spatial frequency channels).

According to the Fourier power model, the response is the inner product of the Fourier power transform of the stimulus and the SRF (Bredfeldt and Ringach 2002; David and Gallant 2005; Mazer et al. 2002)

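$$ r(t) = \sum_{\omega_x, \omega_y} h(\omega_x, \omega_y)\, s(\omega_x, \omega_y, t) + r_0 + \varepsilon(t) \qquad (2) $$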
The response r(t) is the average firing rate during time bin t. The SRF, h(ωx, ωy), describes the weight that should be applied to each Fourier power channel to produce the minimum mean-squared error estimate of the response. The baseline r0 represents the response expected when no stimulus is present. The residual ε(t) represents observed deviations from Fourier power model predictions (i.e., unexplained variance). These deviations reflect both unmodeled nonlinear response properties and neuronal noise.
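As a rough illustration of the Fourier power model, the sketch below computes the power of one stimulus frame and the model prediction as an inner product with an SRF. It is a minimal NumPy sketch under stated assumptions: the random frame, the random weights, and the function names `fourier_power` and `predict_response` are hypothetical stand-ins, and no windowing or normalization is applied.

```python
import numpy as np

def fourier_power(frame):
    """Squared magnitude of the 2-D discrete Fourier transform of one frame."""
    return np.abs(np.fft.fft2(frame)) ** 2

def predict_response(power, srf, r0=0.0):
    """Model prediction without the residual: sum of SRF weights times power, plus baseline."""
    return float(np.sum(srf * power) + r0)

rng = np.random.default_rng(0)
frame = rng.standard_normal((20, 20))   # toy stand-in for one 20 x 20 stimulus frame
srf = rng.standard_normal((20, 20))     # hypothetical SRF weights, one per Fourier channel
power = fourier_power(frame)
rate = predict_response(power, srf, r0=5.0)
```

Each entry of `power` plays the role of one spectral channel, so the prediction is linear in the transformed stimulus even though it is nonlinear in luminance.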

The Fourier power transform linearizes the relationship between stimulus and response. That is, the stimulus is nonlinearly transformed so that a linear model more accurately describes the functional relationship between the transformed stimulus and the response (Aertsen and Johannesma 1981; David and Gallant 2005; Wu et al. 2006). The Fourier power model discards spatial phase but preserves information about stimulus orientation and spatial frequency. It is therefore related to the energy model used to describe complex cells in area V1 (Adelson and Bergen 1985). However, the Fourier power model is more general than the energy model because it can account for excitation and inhibition across any number of spatial frequency and orientation channels.
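The effect of discarding spatial phase can be checked numerically. The sketch below is illustrative only and assumes a circular shift of a toy random frame: the complex Fourier coefficients change with position, but their squared magnitudes do not. For real image patches, where content enters and leaves the window, the invariance would hold only approximately.

```python
import numpy as np

def fourier_power(frame):
    """Squared magnitude of the 2-D discrete Fourier transform of one frame."""
    return np.abs(np.fft.fft2(frame)) ** 2

rng = np.random.default_rng(1)
frame = rng.standard_normal((20, 20))            # toy stimulus frame
shifted = np.roll(frame, (3, 5), axis=(0, 1))    # same pattern at a different position

same_power = np.allclose(fourier_power(frame), fourier_power(shifted))
same_coefficients = np.allclose(np.fft.fft2(frame), np.fft.fft2(shifted))
```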

Some receptive field models include an additional nonlinear output term to account for spiking threshold and saturation (Albrecht and Geisler 1991; David et al. 2004). A sigmoidal output nonlinearity does lead to a modest improvement in the predictive power of the SRFs estimated in this study (data not shown). However, fitting the output nonlinearity has no effect on measurements of orientation and spatial-frequency tuning. Because this study focuses on spectral tuning, the SRFs reported here do not include an output nonlinearity.

FITTING THE FOURIER POWER MODEL BY LINEARIZED REVERSE CORRELATION. We estimated SRFs by linearized reverse correlation of neuronal responses and natural image stimuli. This procedure finds the minimum mean-squared error linear mapping between the Fourier power transform of the stimulus s(ωx, ωy, t) and the observed response r(t) (David and Gallant 2005; Theunissen et al. 2001). According to this solution, the SRF is the response-weighted average of the stimulus, normalized by the inverse of the stimulus autocorrelation function Css

Formula 3(3)
The stimulus autocorrelation function measures the correlation between each pair of spectral channels in the stimulus

Formula 4(4)

The autocorrelation function can be represented as a matrix with rows corresponding to spectral channels (ωx, ωy) and columns corresponding to channels (ω′x, ω′y). The inverse autocorrelation function is equivalent to the inverse of this matrix (Theunissen et al. 2001).

Normalization by the stimulus autocorrelation in Eq. 3 removes bias arising from the autocorrelation inherent in natural scenes (Field 1987; Zetzsche and Barth 1990). Although necessary for achieving a minimum mean-squared error estimate of the SRF, normalization can amplify noise at high spatial frequencies, overfitting the SRF to noise in the estimation data. To minimize this effect, we used singular-value decomposition (SVD) to estimate a pseudoinverse of the stimulus autocorrelation function (Theunissen et al. 2001). The pseudoinverse forces tuning on spectral dimensions to be zero if the stimulus variance along that dimension is not large enough to reliably estimate its effect on responses. This procedure requires selecting a parameter that determines the noise threshold, which was determined simultaneously with the shrinkage parameter (see following text).
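One way to picture normalized reverse correlation with an SVD pseudoinverse is the sketch below. It is a minimal illustration, not the procedure used in the study: the mean subtraction, the relative threshold `tol`, the synthetic data, and the function name `estimate_srf` are all assumptions made for the example.

```python
import numpy as np

def estimate_srf(S, r, tol=1e-3):
    """S: (T, K) Fourier power per frame, channels flattened; r: (T,) responses.
    tol: hypothetical relative cutoff on singular values of the autocorrelation."""
    S0 = S - S.mean(axis=0)              # assumed: remove the mean of each channel
    r0 = r - r.mean()
    T = S.shape[0]
    Css = S0.T @ S0 / T                  # correlation between each pair of channels
    Csr = S0.T @ r0 / T                  # stimulus-response cross-correlation
    U, sv, Vt = np.linalg.svd(Css)
    keep = sv > tol * sv[0]              # dimensions with enough stimulus variance
    inv_sv = np.where(keep, 1.0 / np.where(keep, sv, 1.0), 0.0)
    Css_pinv = (Vt.T * inv_sv) @ U.T     # pseudoinverse; discarded dimensions map to zero
    return Css_pinv @ Csr

rng = np.random.default_rng(0)
S = rng.standard_normal((2000, 8)) ** 2          # toy power-like stimulus, 8 channels
h_true = np.array([1.0, -0.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.3])
r = S @ h_true + 2.0 + 0.1 * rng.standard_normal(2000)
h_est = estimate_srf(S, r)
```

Raising `tol` discards more low-variance dimensions, trading bias for lower noise; in practice the threshold would be chosen by cross-validation rather than fixed by hand.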

A shrinkage filter was used to further reduce noise in the SRF estimate (Brillinger 1996; David and Gallant 2005). The shrinkage filter applies a soft threshold to each SRF parameter, based on its signal-to-noise level. Signal-to-noise was defined as the ratio of mean to standard error and was measured using a jackknife procedure: jackknife SRFs, hi(ωx, ωy), i = 1...N (N = 20), were estimated from subsets of the estimation data set, each excluding a different 5% of the available samples. The mean SRF was computed by averaging over the jackknife estimates, $\bar{h}(\omega_x, \omega_y) = \frac{1}{N} \sum_{i=1}^{N} h_i(\omega_x, \omega_y)$, and the standard error of each parameter was measured according to the jackknife theorem (Efron and Tibshirani 1986)

Formula 5(5)
The shrinkage filter was applied to the mean SRF to produce the final SRF estimate (Brillinger 1996)

Formula 6(6)
Applying the filter requires selecting a parameter, γ, that determines the filter threshold. Optimal pseudo-inverse and shrinkage parameters were chosen simultaneously by cross validation (David and Gallant 2005). This entire procedure (including cross validation) was completed using only the estimation data set. The validation data set (see below) was reserved only for testing SRF prediction accuracy.
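The jackknife and shrinkage steps might look roughly like the sketch below. The standard error uses the usual delete-a-group jackknife formula. The specific soft-threshold form in `shrink` is an assumption chosen for illustration (it zeroes parameters whose signal-to-noise falls below `gamma`), and it may differ from the filter actually used; the toy jackknife estimates are also hypothetical.

```python
import numpy as np

def jackknife_mean_se(h_jack):
    """h_jack: (N, K) array holding N jackknife estimates of K SRF parameters."""
    N = h_jack.shape[0]
    h_mean = h_jack.mean(axis=0)
    se = np.sqrt((N - 1) / N * np.sum((h_jack - h_mean) ** 2, axis=0))
    return h_mean, se

def shrink(h_mean, se, gamma):
    """Assumed soft threshold: scale each parameter by sqrt(max(0, 1 - (gamma * se / h)^2))."""
    ratio = (gamma * se) ** 2 / np.maximum(h_mean ** 2, 1e-30)
    return h_mean * np.sqrt(np.clip(1.0 - ratio, 0.0, None))

signs = np.tile([1.0, -1.0], 10)                 # 20 toy jackknife deviations
h_jack = np.column_stack([1.00 + 0.01 * signs,   # parameter with high signal-to-noise
                          0.01 + 0.01 * signs])  # parameter dominated by noise
h_mean, se = jackknife_mean_se(h_jack)
h_final = shrink(h_mean, se, gamma=1.0)
```

With this form, a parameter whose mean is many standard errors from zero passes almost unchanged, while one within `gamma` standard errors is set to zero.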

DATA PREPROCESSING. For SRF estimation, each stimulus frame was cropped to an area equivalent to one classical receptive field diameter. Each frame was then smoothed, downsampled to 20 × 20 pixel resolution, and multiplied by a Hanning window (ramped from 1 to 0) to reduce edge artifacts in the Fourier transform. This downsampling procedure preserves spatial frequencies ≤10 cycles per receptive field diameter (cyc/RF). In theory, a more accurate model might be obtained by including higher spatial frequencies. However, natural scenes have relatively low power at high spatial frequencies (Field 1987), which makes it difficult to obtain data sets large enough to characterize tuning at high frequencies. For V4 data, the response r(t), evoked by each 3.5- to 4.5-Hz stimulus frame s(x, y, t), was defined as the mean spike rate (spikes/s) from 50 to 250 ms after the onset of the frame.
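The preprocessing steps can be sketched as follows. This is an illustrative approximation: block averaging stands in for the unspecified smoothing and downsampling, the window is assumed to be a separable 2-D Hanning window, and the function names and spike times are hypothetical.

```python
import numpy as np

def downsample(frame, out=20):
    """Block-average a square frame to out x out pixels (assumed smoothing method)."""
    factor = frame.shape[0] // out
    trimmed = frame[:out * factor, :out * factor]
    return trimmed.reshape(out, factor, out, factor).mean(axis=(1, 3))

def window_frame(frame):
    """Taper the frame to zero at its edges with a separable Hanning window."""
    w = np.hanning(frame.shape[0])
    return frame * np.outer(w, w)

def mean_rate(spike_times, frame_onset, start=0.050, stop=0.250):
    """Mean spike rate (spikes/s) from 50 to 250 ms after frame onset; times in seconds."""
    t = np.asarray(spike_times) - frame_onset
    count = np.count_nonzero((t >= start) & (t < stop))
    return count / (stop - start)

small = window_frame(downsample(np.ones((80, 80))))
rate = mean_rate([0.06, 0.10, 0.30], frame_onset=0.0)   # two spikes fall in the window
```

A 20 × 20 frame has a Nyquist limit of 10 cycles per frame, which matches the stated upper bound of 10 cyc/RF.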

The stimuli used during V1 recordings had the same spatial statistics as those used for V4, but they were shown much more rapidly (60 Hz). We therefore used a slightly different procedure to estimate SRFs for V1 neurons. First, we estimated a complete spectro-temporal receptive field (STRF) for each neuron by repeating the SRF estimation procedure described above at 13 temporal delays (0–192 ms), with the same 20 × 20 pixel/CRF downsampling as for the V4 data. Separable spectral and temporal receptive fields were then extracted from each STRF by SVD (David et al. 2004; Mazer et al. 2002). The resulting SRF describes orientation and spatial frequency tuning in the same Fourier power parameter space as that used for V4.

EXCLUSION OF NEURONS WHOSE RECEPTIVE FIELDS COULD NOT BE CHARACTERIZED. The goal of this study was to determine whether the SRF can account for shape selectivity in V4. Therefore we used a cross-validation procedure to exclude neurons whose SRF failed to provide any information about visual response properties. For each neuron, a subset of the stimulus–response data (5%) was reserved before SRF estimation (validation data set). The SRF, including regularization parameters, was estimated using only the remaining 95% of the data (estimation data set). Predicted responses to stimuli in the validation data set were then generated from the SRF using Eq. 2. This procedure was repeated 20 times; each time a different 5% subset of the data was reserved for validation. The 20 predicted responses were concatenated into a single prediction of the entire response. Prediction accuracy was quantified in terms of the correlation (Pearson’s r) between predicted and observed responses. Because we strictly separated the estimation and validation data sets, measurements of prediction accuracy were not biased by overfitting to noise in the data. A neuron was included in further analyses only if its SRF predicted the observed responses in the validation data with greater accuracy than would be expected by chance (P < 0.05) (David et al. 2004).
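The 20-fold validation scheme can be sketched as below. This is a simplified illustration: ordinary least squares (`fit_least_squares`) stands in for the regularized SRF estimator, the held-out 5% subsets are assumed to be contiguous blocks, the data are synthetic, and the significance test against chance is omitted.

```python
import numpy as np

def fit_least_squares(S, r):
    """Stand-in estimator: ordinary least squares with a baseline term."""
    X = np.column_stack([S, np.ones(len(r))])
    coef, *_ = np.linalg.lstsq(X, r, rcond=None)
    return coef[:-1], coef[-1]

def cross_validated_r(S, r, fit, n_folds=20):
    """Hold out each fold in turn, predict it from the rest, then correlate
    the concatenated predictions with the observed responses."""
    T = len(r)
    pred = np.empty(T)
    for test_idx in np.array_split(np.arange(T), n_folds):
        train_idx = np.setdiff1d(np.arange(T), test_idx)
        h, r0 = fit(S[train_idx], r[train_idx])
        pred[test_idx] = S[test_idx] @ h + r0
    return np.corrcoef(pred, r)[0, 1]

rng = np.random.default_rng(0)
S = rng.standard_normal((400, 5))                      # toy stimulus channels
r = S @ np.array([1.0, -1.0, 0.5, 0.0, 2.0]) + 3.0 + 0.1 * rng.standard_normal(400)
r_cv = cross_validated_r(S, r, fit_least_squares)
```

Because every prediction comes from a model that never saw the corresponding samples, the resulting correlation is not inflated by overfitting.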

Of the 103 V4 neurons in our original sample, 87 had SRFs that significantly predicted responses in the reserved cross-validation data set. The mean prediction correlation was 0.29 for the entire sample of V4 neurons and 0.32 for the 87 significant cells. [Note that this measurement was not corrected to reflect the noise ceiling on predictions (David and Gallant 2005; Wu et al. 2006). Thus this value is smaller than the theoretical maximum for the Fourier power model in the absence of noise.] Of 56 V1 neurons in the sample, 45 had SRFs that significantly predicted responses in the cross-validation data set. The mean prediction correlation was 0.33 for the entire sample of V1 neurons and 0.37 for the 45 significant cells. (These figures were also not corrected to reflect the noise ceiling.) Excluding neurons whose SRFs did not predict with significant accuracy did not change any trends in the data reported here, but it did slightly increase the magnitude and significance of some effects.

The correlation coefficient indicates the portion of the response in the validation data set explained by the Fourier power model (David and Gallant 2005). The remaining, unexplained portion of the response results from two factors: visual tuning properties not described by the Fourier power model and nonvisual influences on the response. The latter category includes noise in the neuronal response and changes in attention state. The effect of nonvisual influences is reduced by averaging across stimulus presentations, but it is unlikely to be removed completely.

Analysis of tuning and selectivity

ORIENTATION AND SPATIAL FREQUENCY TUNING CURVES. To facilitate visualization of neuronal tuning and selectivity, each SRF was transformed from the Fourier power domain to an explicit representation of orientation and spatial frequency. This was accomplished by applying a Cartesian-to-polar transformation to the SRF

Formula 7(7)
Figure 1 shows several image patches that have been transformed into the orientation–spatial frequency representation; transformed SRFs are shown in Figs. 2–4.
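A transformation of this kind can be sketched by assigning each Fourier channel (ωx, ωy) an orientation and a radial spatial frequency and then averaging the SRF weights within bins. The sketch below is illustrative only: the orientation convention (angle of the frequency vector, folded into 0–180°), the 15° bins centered on multiples of 15°, the 1 cyc/RF frequency bins, and the function name `to_orientation_sf` are all assumptions rather than the transformation defined in the paper.

```python
import numpy as np

def to_orientation_sf(srf, n_orient=12):
    """Re-bin an n x n SRF (FFT layout) into a (spatial frequency, orientation) map."""
    n = srf.shape[0]
    freqs = np.fft.fftfreq(n, d=1.0 / n)              # cycles per frame: 0, 1, ..., -1
    wx, wy = np.meshgrid(freqs, freqs, indexing="xy")
    sf = np.hypot(wx, wy)                             # radial spatial frequency
    theta = np.degrees(np.arctan2(wy, wx)) % 180.0    # power spectra are symmetric
    o_idx = np.rint(theta / (180.0 / n_orient)).astype(int) % n_orient
    sf_edges = np.arange(0.5, n // 2 + 1)             # bins centered on 1 .. n/2 cycles
    s_idx = np.digitize(sf, sf_edges) - 1
    valid = (s_idx >= 0) & (s_idx < len(sf_edges) - 1)
    total = np.zeros((len(sf_edges) - 1, n_orient))
    count = np.zeros_like(total)
    np.add.at(total, (s_idx[valid], o_idx[valid]), srf[valid])
    np.add.at(count, (s_idx[valid], o_idx[valid]), 1)
    return total / np.maximum(count, 1)               # mean weight per bin

srf = np.zeros((20, 20))
srf[0, 3] = srf[0, 17] = 1.0     # weight at wx = +3 and wx = -3 cycles, wy = 0
polar = to_orientation_sf(srf)
```

In this toy example the two mirror-image channels land in the same bin (3 cycles, orientation bin 0), reflecting the symmetry of the power spectrum.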


 
FIG. 1. Responses of one V4 neuron to a set of natural images. A: 600 different natural images were presented in the receptive field in random order (4 Hz, 4 repeats). Response strength was calculated by averaging firing rate 50–250 ms after stimulus onset over the 4 repeats. Here, responses are sorted by magnitude and vary from 0 to nearly 100 spikes/s. B: Fourier power transform of a typical natural image, enlarged from C. To highlight distinctive spectral features, the mean power spectrum (averaged over the entire set of natural images) was subtracted from the Fourier-transformed images. Without this subtraction images would be dominated by the 1/f² spectrum typical of natural images. Red regions indicate power greater than the mean at the corresponding orientation (horizontal axis) and spatial frequency (vertical axis); blue indicates power less than the mean. In this example, the red region centered at 90° indicates high power at horizontal orientations. Image has been normalized so that power ranges between –1 and 1. C: 8 natural images that evoked the strongest responses (shaded bar at left of A) and their Fourier power spectra (2nd row). Stimuli that evoked strong responses tend to have power at orientations of 90 or 150°. D: 8 natural images that evoked moderate responses (bar in middle of A) and their Fourier power spectra. E: 8 natural images that evoked the weakest responses (bar at right of A) and their Fourier power spectra. These stimuli tend to have low power in most orientation and spatial frequency channels.

 

 
FIG. 2. Spectral receptive field (SRF) of a V4 neuron with bimodal orientation tuning. A: SRF describes selectivity in terms of a joint orientation–spatial frequency tuning surface. Red regions indicate orientations (horizontal axis) and spatial frequencies (vertical axis) correlated with increased responses (i.e., positive gain); blue indicates stimulus channels correlated with decreased responses. Contours mark 1 SD above zero after smoothing. This neuron is excited by patterns with horizontal orientation (90°) over a broad range of spatial frequencies and with oblique orientation (150°) at lower spatial frequencies. Stimulus power at low spatial frequencies (1 cycle/RF) tends to decrease responses. SRF was normalized so that gain ranges between –1 and 1. B: an orientation tuning curve is derived by singular value decomposition of the SRF. This measures marginal tuning of the SRF in A, collapsed along the spatial frequency axis (Mazer et al. 2002). Each point shows the relative excitatory or inhibitory contribution of stimuli at the respective orientation, collapsed across spatial frequency. Peak orientation tuning is 129° and orientation bandwidth is 73° (measured by fitting a circular Gaussian function). This neuron has 2 orientation tuning peaks (90 and 150°). Relative magnitude of the peaks is quantified by the bimodal tuning index, computed from the ratio of the height of the smaller peak (d2) to the larger peak (d1). Bimodal tuning index for this neuron is 0.23. C: a spatial frequency tuning curve is derived by the same singular value decomposition, effectively collapsing the SRF in A along the orientation axis. Peak spatial frequency tuning for this neuron is 2.5 cycles/RF and spatial frequency bandwidth is 1.1 octaves.

 

 
FIG. 4. SRF for a V1 neuron. Axes are as in Fig. 2. A: this neuron has simple spectral tuning and narrow orientation and spatial frequency bandwidth. B: orientation tuning curve (peak, 97°) has narrow bandwidth (29°) and is nearly unimodal (bimodal tuning index, 0.02). C: spatial frequency tuning curve has a peak at 2.5 cycles/RF and a bandwidth of just 0.9 octaves.

 
Tuning curves were obtained from SRFs transformed according to Eq. 7 by SVD (Mazer et al. 2002). Orientation and spatial frequency tuning curves [f(θ) and g(ω), respectively] were defined as the first eigenvectors of each decomposition matrix. According to the definition of the SVD, the product of these two vectors provides the minimum mean-squared-error estimate of the full, two-dimensional SRF

$$ h(\theta, \omega) \approx f(\theta)\, g(\omega) \qquad (8) $$
In Eq. 8, the sign of the orientation and spatial frequency tuning curves is ambiguous. We fixed the sign so that the orientation tuning curve produced a positive inner product with the mean of the SRF after averaging over all spatial frequencies.
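The separable decomposition and the sign convention can be sketched as follows. The layout of the input (rows as spatial frequencies, columns as orientations), the even split of the first singular value between the two curves, and the toy separable SRF are assumptions made for illustration.

```python
import numpy as np

def tuning_curves(srf_polar):
    """srf_polar: (n_sf, n_orient) SRF in orientation-spatial frequency coordinates.
    Returns orientation curve f and spatial frequency curve g from the first SVD component."""
    U, sv, Vt = np.linalg.svd(srf_polar, full_matrices=False)
    g = U[:, 0] * np.sqrt(sv[0])      # spatial frequency tuning curve
    f = Vt[0] * np.sqrt(sv[0])        # orientation tuning curve
    # Resolve the sign ambiguity: f should correlate positively with the
    # SRF averaged over all spatial frequencies.
    if f @ srf_polar.mean(axis=0) < 0:
        f, g = -f, -g
    return f, g

g_true = np.array([1.0, 2.0, 3.0])
f_true = np.array([0.5, 1.0, 0.2, 0.1])
srf_polar = np.outer(g_true, f_true)  # toy SRF that is exactly separable
f, g = tuning_curves(srf_polar)
```

Flipping both curves together leaves their outer product unchanged, so the sign rule affects only how the curves are displayed and compared.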

COMPARISON OF TUNING PROPERTIES. Several properties of the orientation and spatial frequency tuning curves for each neuron were used to compare spectral tuning across cells. Two common metrics used to describe orientation tuning curves are the peak and bandwidth (i.e., width at half height; Desimone and Schein 1987; De Valois et al. 1982b). We estimated the peak and bandwidth by fitting a circular Gaussian to the orientation tuning curve obtained for each neuron (Fisher 1993); the tuning peak and bandwidth were taken as the mean and width at half-height of the Gaussian, respectively.

Because many V4 neurons had more than one orientation tuning peak we also computed a bimodal tuning index. First we identified the orientations of the two largest peaks in the orientation tuning curve, p1 and p2, where f(p1) > f(p2). Two troughs were then defined as the orientations of the lowest points, t1 and t2, in either direction between the peaks, where f(t1) < f(t2). The bimodal tuning index b was taken as the ratio of the difference between the smaller peak and trough, d2 = f(p2) – f(t2), to the difference between the larger peak and trough, d1 = f(p1) – f(t1)

$$ b = \frac{d_2}{d_1} = \frac{f(p_2) - f(t_2)}{f(p_1) - f(t_1)} \qquad (9) $$
A neuron with two orientation tuning peaks and troughs of equal size will have a bimodal tuning index value of 1. As the relative size of one peak grows larger, index values grow smaller. Orientation tuning curves with only one peak have an index value of 0.
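
As a rough sketch of how this index could be computed from a sampled tuning curve, the following assumes the curve is a list of values at equally spaced orientations spanning the full circle. The peak-finding rule (a sample strictly above its left neighbor and not below its right neighbor) and the function name `bimodal_tuning_index` are assumptions of this example; the study does not specify how peaks were detected.

```python
def bimodal_tuning_index(f):
    """Bimodal tuning index of a circular tuning curve sampled at equal steps."""
    n = len(f)
    # Local maxima on the circle.
    peaks = [i for i in range(n)
             if f[i] > f[(i - 1) % n] and f[i] >= f[(i + 1) % n]]
    if len(peaks) < 2:
        return 0.0                       # a single peak gives an index of 0
    peaks.sort(key=lambda i: f[i], reverse=True)
    p1, p2 = peaks[0], peaks[1]          # f[p1] >= f[p2]

    def arc_min(a, b):
        """Lowest value strictly between positions a and b, walking forward."""
        values, i = [], (a + 1) % n
        while i != b:
            values.append(f[i])
            i = (i + 1) % n
        return min(values)

    # One trough in each direction between the peaks; t1 is the deeper one.
    f_t1, f_t2 = sorted([arc_min(p1, p2), arc_min(p2, p1)])
    d1 = f[p1] - f_t1                    # larger peak to deeper trough
    d2 = f[p2] - f_t2                    # smaller peak to shallower trough
    return d2 / d1

equal_peaks = bimodal_tuning_index([0, 1, 0, 1])
single_peak = bimodal_tuning_index([0, 1, 2, 1])
unequal_peaks = bimodal_tuning_index([0, 4, 1, 2, 0, 0])
```

In the third toy curve the larger peak rises 4 units above the deeper trough and the smaller peak rises 1 unit above the shallower trough, so the index is 0.25.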

Spatial frequency peak and bandwidth were measured by fitting a Gaussian function to the spatial frequency tuning curve on a logarithmic scale, g[log(ω)]. Peak spatial frequency was taken as the peak of the Gaussian fit. Spatial frequency bandwidth was taken as the width of the Gaussian at half-height, divided by peak spatial frequency (De Valois et al. 1982a).

SELECTIVITY FOR COMPLEX FEATURES. If the SRF accurately describes response characteristics of V4 neurons then it should predict responses to any stimulus. Previous work showed that V4 neurons are selective for non-Cartesian (polar and hyperbolic) gratings over Cartesian gratings (Gallant et al. 1996). To test the SRF model we therefore used estimated SRFs to predict responses to both Cartesian and non-Cartesian gratings.

Cartesian gratings were generated according to the function (Gallant et al. 1996)

Formula 10(10)
Each Cartesian grating was described by its orientation θ, spatial frequency ω, and spatial phase φ. Mean luminance L0 and contrast C0 were normalized to match the root mean-square (RMS) contrast of the natural image set used to fit the SRF. Cartesian gratings were generated at 12 orientations, eight spatial frequencies (1.0 to 9.0 cycles per receptive field diameter), and four spatial phases (0, 90, 180, and 270°).

Polar gratings were generated according to the function (Gallant et al. 1996)

Formula 11(11)
Each polar grating was described by its radial spatial frequency ωr, concentric spatial frequency ωc, and spatial phase φ. Mean luminance L0 and contrast C0 were normalized to match the RMS contrast of the natural image set used to fit the SRF. Polar gratings were generated at 12 radial frequencies (–5 to 6 cycles per rotation), eight concentric frequencies (1.0 to 9.0 cycles per receptive field diameter), and four spatial phases (0, 90, 180, and 270°).

Hyperbolic gratings were generated according to the function (Gallant et al. 1996)

Formula 12(12)
Each hyperbolic grating was described by its orientation θ, spatial frequency ω, and spatial phase φ. Mean luminance L0 and contrast C0 were normalized to match the RMS contrast of the natural image set used to fit the SRF. Hyperbolic gratings were generated at eight orientations (0 to 80°), 12 spatial frequencies (1.0 to 7.0 cycles per receptive field diameter), and four spatial phases (0, 90, 180, and 270°).
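
The sizes of the grating sets follow directly from the parameter grids given above. The short sketch below enumerates those grids; only the number of levels per parameter is taken from the text, and the level indices are placeholders rather than the actual parameter values (the spacing of levels within each range is not specified here).

```python
from itertools import product

phases_deg = (0, 90, 180, 270)

# Placeholder level indices; only the counts per parameter come from the text.
cartesian = list(product(range(12), range(8), phases_deg))    # orientation, SF, phase
polar = list(product(range(12), range(8), phases_deg))        # radial, concentric, phase
hyperbolic = list(product(range(8), range(12), phases_deg))   # orientation, SF, phase

n_cartesian = len(cartesian)
n_non_cartesian = len(polar) + len(hyperbolic)
print(n_cartesian, n_non_cartesian)  # 384 768
```

Each grating type therefore contributes 384 patterns, and the polar and hyperbolic sets together make up the 768 non-Cartesian gratings.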

In addition to Cartesian and non-Cartesian gratings, we also used the SRFs to predict responses to a large set of 20,000 natural images. This stimulus set was generated using the same procedure as for the neurophysiological experiments (see above). To compare expected responses to those for gratings, each natural image patch was normalized to have the same mean luminance and RMS contrast as the gratings. (Without normalization, a large fraction of response variance can be attributed to variability in stimulus contrast rather than spatial patterns within the stimulus. Stimulus contrast was not normalized in the neurophysiological experiments. For this reason, the variability of responses in the experimental data was greater than that in the predictions; e.g., compare Figs. 1B and 7A.)


FIG. 5. Comparison of V4 and V1 orientation tuning properties. A: histogram of orientation bandwidth for 87 V4 neurons with significant visual tuning (median 74.4°). White bar at right denotes neurons that respond equally well to all orientations and are not orientation tuned. Numbered gray arrows indicate the values of examples shown in Figs. 2, 3, and 7. B: histogram of orientation bandwidth for 45 V1 neurons (median 43.7°). Gray arrow indicates the value of the example in Fig. 4. C: comparison of median orientation bandwidth between V4 and V1 neurons (error bars computed by jackknifing). Median for V4 neurons is significantly higher than that for V1 (P < 0.01). D: histogram of bimodal tuning index for V4 neurons (median 0.09). Black bars indicate neurons with index values significantly greater than zero (P < 0.05, jackknifed t-test). E: histogram of bimodal tuning index for V1 neurons (median 0.01). F: comparison of median bimodal orientation tuning between V4 and V1 neurons. Median bimodal tuning index is significantly greater in V4 than in V1 (P < 0.01).

 

FIG. 6. Comparison of V4 and V1 spatial frequency tuning properties. A: histogram of peak spatial frequency tuning for 87 V4 neurons (median 2.6 cycles per receptive field diameter; cyc/RF). White bars indicate neurons (41, 47%) with spatial frequency tuning that extends above or below the range of spatial frequencies tested. Numbered gray arrows indicate the values of examples shown in Figs. 2, 3, and 7. B: histogram of peak spatial frequency tuning for 45 V1 neurons (median 2.5 cyc/RF). Gray arrow indicates the value of the example in Fig. 4. C: comparison of peak spatial frequency tuning between V4 and V1 neurons. Medians are not significantly different (P > 0.25). D: histogram of spatial frequency bandwidth for V4 neurons (median 1.2 octaves; shading as in A). E: histogram of spatial frequency bandwidth for V1 neurons (median 0.9 octaves). F: comparison of spatial frequency bandwidth between V4 and V1 neurons. Median bandwidth is significantly higher in V4 than in V1 (P < 0.01).

 

FIG. 7. Selectivity for Cartesian gratings, non-Cartesian gratings, and natural images of a single V4 neuron. A: SRF for this neuron indicates broad tuning for orientation (bandwidth 151°) and band-pass tuning for spatial frequency (peak, 3.1 cyc/RF; bandwidth 1.6 octaves). B: predicted responses to large sets of Cartesian gratings, non-Cartesian gratings, and natural images are shown sorted from strongest to weakest. Responses are predicted to be strongest for non-Cartesian gratings (NC; best response 35 spikes/s), nearly as strong for natural images (Nat; best response 34 spikes/s), but significantly weaker for Cartesian gratings (C; best response 27 spikes/s, P < 0.05). C: 5 Cartesian gratings predicted to give the strongest responses (left) and 5 predicted to give the weakest responses (right). Below each image is its Fourier power transform. Spectral energy of the preferred Cartesian gratings is aligned to the excitatory component of the SRF and nonpreferred gratings have power aligned to the high-frequency inhibitory component of the SRF. D: 5 most- and least-preferred non-Cartesian gratings. Strongest predicted responses are to polar and hyperbolic gratings with spectral structure similar to the SRF. E: 5 most- and least-preferred natural images. Preferred natural images tend to have localized, round structures also with spectra similar to the SRF.

 
Predicted responses were generated using the same method as in the cross-validation procedure used for measuring the significance of visual tuning. Test stimuli were cropped, downsampled to 20 × 20 pixels, Hanning windowed, and transformed into the Fourier power domain according to Eq. 1. Predicted responses (spikes/s) were then generated for each SRF according to Eq. 2.
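
The windowing and power-spectrum steps can be sketched as below. This is an illustration only: cropping and downsampling are omitted, the random patch and weights are stand-ins for a real stimulus and a fitted SRF, and `predict_response` is a hypothetical linear readout of the power spectrum used in place of Eq. 2, whose exact form (including any mapping onto orientation and spatial frequency axes in Eq. 1) is not reproduced in this section.

```python
import numpy as np

def fourier_power(patch):
    """Hanning-window a square patch and return its 2-D Fourier power spectrum."""
    n = patch.shape[0]
    window = np.outer(np.hanning(n), np.hanning(n))
    spectrum = np.fft.fftshift(np.fft.fft2(patch * window))
    return np.abs(spectrum) ** 2

def predict_response(srf_weights, power, baseline=0.0):
    """Hypothetical linear readout of the power spectrum (a stand-in for Eq. 2)."""
    return baseline + float(np.sum(srf_weights * power))

rng = np.random.default_rng(0)
patch = rng.standard_normal((20, 20))         # stand-in for a 20 x 20 stimulus
power = fourier_power(patch)
weights = rng.standard_normal(power.shape)    # stand-in for a fitted SRF
rate = predict_response(weights, power)
```

Because the readout operates on power rather than on pixel values, inverting the contrast of the patch leaves the predicted response unchanged, which is one sense in which the model is second order in the stimulus.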

Neurons were grouped according to the stimulus class that evoked the strongest predicted response: Cartesian gratings, non-Cartesian (polar and hyperbolic) gratings, or natural images. The three stimulus classes contained different numbers of exemplars. To ensure that this difference in sampling did not bias estimates of maximum expected response, we normalized responses according to the number of exemplars in each class. The smallest stimulus set was Cartesian gratings, containing 384 distinct patterns; the maximum Cartesian response was defined as the expected response to the single best Cartesian grating. The non-Cartesian grating class contained twice as many patterns (768); the maximum non-Cartesian response was defined as the average of expected responses to the two best non-Cartesian gratings. The natural image class contained 20,000 distinct images; the maximum natural image response was defined as the median of expected response to the 52 best images (0.26%, equivalent to 1/384).
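
A minimal sketch of this class-size normalization, assuming predicted responses are held in plain Python lists. The function name `normalized_max` and the toy response values are hypothetical; the choice of summary (single best, average of two, median of 52) follows the description above.

```python
from statistics import mean, median

def normalized_max(responses, n_reference=384, summary=mean):
    """Summarize the top n/n_reference responses so that classes of
    different size yield comparable maximum responses."""
    k = max(1, round(len(responses) / n_reference))
    top = sorted(responses, reverse=True)[:k]
    return summary(top)

# Toy responses (hypothetical): one value per exemplar in each class.
cartesian = list(range(384))
non_cartesian = list(range(768))
natural = list(range(20000))

best_cartesian = normalized_max(cartesian)                  # single best grating
best_non_cartesian = normalized_max(non_cartesian)          # average of the 2 best
best_natural = normalized_max(natural, summary=median)      # median of the 52 best
```

With 20,000 images, 20,000/384 rounds to 52, which matches the number of best images used for the natural image class.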

CONTOUR SELECTIVITY. We also used the SRFs to generate predicted responses to a set of curved contours that were used in a previous study of V4 (Pasupathy and Connor 1999). Contours were composed of two oriented segments (see Fig. 10A). The length of each segment was fixed to be one half the diameter of the classical receptive field. Segments were joined at one end and separated by an angle of 45, 90, 135, or 180°. The joint between segments was either sharp or smooth. Smooth joints were generated by introducing a spline function between the two segments. Because the sharp and smooth 180° contours were the same, this produced seven distinct contours. Eight absolute orientations were used for each contour, giving a total of 56 contour elements.


FIG. 8. Selectivity for Cartesian gratings, non-Cartesian gratings, and natural images for a V4 neuron with bimodal orientation tuning. A: SRF is repeated from Fig. 2 for reference. B: predicted responses to each stimulus class, sorted as in Fig. 7B. Strongest response is predicted for natural images (Nat; best response 50 spikes/s), significantly greater than for non-Cartesian gratings (NC; best response 46 spikes/s, P < 0.05), and Cartesian gratings (C; best response 38 spikes/s, P < 0.05). C: preferred Cartesian grating (Fourier power transform at right) has orientation and spatial frequency matched to the peak of the SRF. D: preferred non-Cartesian grating is a hyperbolic grating with spectral energy that overlaps and extends beyond the excitatory region of the SRF. E: preferred natural image is a curved pattern with strong similarity between its spectral structure and the excitatory tuning of the SRF.

 

FIG. 9. Comparison of selectivity for Cartesian gratings, non-Cartesian gratings, and natural images in V4 and V1 neurons. A: fraction of V4 and V1 neurons selective for each stimulus class. Neurons were categorized according to the class of the stimulus for which their SRF was predicted to give the strongest response. Only 21/87 V4 neurons (24%) are selective for Cartesian gratings, whereas 38 V4 neurons (44%) are selective for non-Cartesian gratings and 28 (32%) are selective for natural images (bars at left). In contrast, the majority of V1 neurons (30/45, 67%) are selective for Cartesian gratings, whereas only 8 neurons (18%) are selective for non-Cartesian gratings and 7 neurons (16%) are selective for natural images (bars at right). Patterns of selectivity observed in V4 and V1 are significantly different (P < 0.01, jackknifed Hotelling’s t-test). B: median orientation bandwidth of V4 neurons selective for each stimulus class. Neurons selective for non-Cartesian gratings have significantly greater bandwidth than neurons selective for either other class (P < 0.01). C: median bimodal tuning index of V4 neurons selective for each stimulus class. Neurons selective for natural images have a significantly greater bimodal tuning index than neurons selective for either other class (P < 0.05). D: median peak spatial frequency tuning of V4 neurons selective for each stimulus class. There are no significant differences between classes. E: median spatial frequency bandwidth of V4 neurons selective for each stimulus class. Neurons selective for natural images have significantly greater bandwidth than neurons selective for the other classes (P < 0.01).

 

FIG. 10. Spectral tuning properties explain selectivity for curved contours in V4. A: contours generated by joining 2 line segments with a sharp corner at different separation angles (left) and contours containing the same separation angles but joined by a rounded corner (right). Fourier power transform of each stimulus appears in the 2nd row. Entire stimulus set contained these 7 contours at 8 different absolute orientations. Neurons were categorized according to the separation angle (45, 90, 135, or 180°) and corner (sharp or smooth) of the contour that produced the strongest predicted response from the SRF at any absolute orientation. B: median orientation bandwidth for neurons in each contour category. Error bars: 1 SE (estimated by jackknifing). Orientation bandwidth is negatively correlated with preferred separation angle (r = –0.61, P < 0.01). C: median bimodal tuning index for neurons in each contour category. Bimodal tuning index is greater for neurons that prefer sharp corners (median 0.15) than for neurons that prefer round corners (median 0.05, P < 0.01).

 
SIGNIFICANCE TESTING. Unless otherwise specifically mentioned, we used a jackknifed t-test to verify the statistical significance of our findings (Efron and Tibshirani 1986). In many cases, a traditional t-test is sufficient to determine whether two mean values are significantly different. However, this test assumes that individual measurements follow a Gaussian distribution, and estimates of SE will be biased if the distributions are not Gaussian. The jackknifed t-test uses a bootstrapping procedure that avoids potential bias from non-Gaussian distributions in measurements of SE.

One situation in which a non-Gaussian distribution can be particularly problematic is when the sampled values lie near a hard boundary. We encountered this problem when testing the significance of the bimodal tuning index. If each jackknife estimate of the tuning index is generated independently, the distribution used to compute the SE will be biased toward positive values. This bias leads to artifactually small estimates of SE and can cause some neurons to appear to have significant bimodal tuning when they do not. To avoid this problem, we fixed the position of the peaks (p1 and p2) and troughs (t1 and t2) according to the orientation tuning curve averaged across jackknife estimates. Index values measured from the individual jackknifed tuning curves could then fluctuate below zero, leading to unbiased SE estimates.
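
For reference, a jackknife estimate of SE can be sketched as follows. This is the standard leave-one-out form applied to a list of values; how the jackknife resamples were formed in this study (e.g., which data were left out of each estimate) is not specified in this section, so the leave-one-out scheme, the function name `jackknife_se`, and the sample values are assumptions of the example.

```python
import math
from statistics import mean

def jackknife_se(values, statistic=mean):
    """Jackknife standard error of `statistic` from leave-one-out resamples."""
    n = len(values)
    # Recompute the statistic n times, each time omitting one observation.
    loo = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    center = mean(loo)
    return math.sqrt((n - 1) / n * sum((x - center) ** 2 for x in loo))

sample = [1.0, 2.0, 4.0, 7.0]      # hypothetical measurements
se_of_mean = jackknife_se(sample)
```

When the statistic is the mean, this estimate coincides with the usual standard error (sample SD divided by the square root of n); its value lies in applying the same recipe to statistics such as medians or tuning indices, for which no simple formula is available.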


    RESULTS
Diversity of spectral tuning properties among V4 neurons

We characterized the spectral tuning properties of 103 V4 neurons in two animals while they performed a delayed match-to-sample task. The stimuli were sequences of natural image patches selected at random from a large image database (see examples in Fig. 1) and flashed in the receptive field at a rate of 3.5–4.5 Hz.

Figure 1A shows the responses of one V4 neuron to 600 distinct natural images, sorted by response magnitude. The visual response is defined as the firing rate 50–250 ms after stimulus onset (averaged over four presentations). For this neuron, responses range from 0 to nearly 100 spikes/s. The eight natural images that evoke the strongest responses are shown in the top row of Fig. 1C. Most of these images contain contours with either horizontal or oblique orientations (90–150°). The images that evoke average or weak responses (Fig. 1, D and E, respectively) have little in common with each other, although the least-preferred stimuli tend to have very low contrast.

This neuron responds most strongly to images with salient horizontal or oblique contours, but the precise spatial position of the contour does not appear to be important (Fig. 1C, top row). This is consistent with previous studies reporting that the responses of V4 neurons are often position and phase invariant (Desimone and Schein 1987; Gallant et al. 1996). Therefore the patterns that evoke large responses from this neuron might be clearer if we discard information about the precise spatial position of image features while preserving information about orientation and spatial frequency. One efficient way to do this is to compute the Fourier power spectrum of each image patch, as illustrated in Fig. 1B. After transformation into the Fourier power domain, each stimulus channel indicates the relative energy at a single orientation and spatial frequency in the original image, regardless of spatial position or phase. The Fourier power spectra of the effective images for this neuron have consistent peaks at orientations between 90 and 150° (Fig. 1C, bottom row).
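
The position invariance of the power spectrum can be illustrated with a one-dimensional toy signal. The sketch below uses a direct discrete Fourier transform and a circular shift; the signal values are arbitrary. For real image patches that are windowed and shifted non-circularly, the invariance holds only approximately.

```python
import cmath

def power_spectrum(signal):
    """Squared magnitude of the discrete Fourier transform of a 1-D signal."""
    n = len(signal)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, x in enumerate(signal))) ** 2
            for k in range(n)]

signal = [0, 0, 1, 3, 1, 0, 0, 0]        # a small "feature" (toy values)
shifted = signal[-3:] + signal[:-3]      # the same feature moved by 3 samples

same_power = all(abs(a - b) < 1e-9
                 for a, b in zip(power_spectrum(signal), power_spectrum(shifted)))
print(same_power)  # True
```

Shifting the feature changes only the phase of each Fourier component, so the power at every frequency is unchanged.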

It is often difficult to determine the response characteristics of a neuron by simply examining effective and ineffective image patches. A better way to summarize the response properties of a single cell is to estimate the stimulus–response mapping function (Wu et al. 2006). We used linearized reverse correlation to estimate the spectral receptive field (SRF), a function that describes the mapping from the Fourier power transformation of the stimulus to the neural response (David and Gallant 2005; Theunissen et al. 2001). The SRF describes concisely which orientations and spatial frequencies tend to evoke responses. Figure 2A gives the SRF computed for the data in Fig. 1. Spectral domains shown in red indicate orientations and spatial frequencies that evoke strong responses (i.e., excitatory spectral channels); blue domains indicate orientations and spatial frequencies that suppress responses (i.e., inhibitory spectral channels). Consistent with the data in Fig. 1, the SRF reveals that this neuron is excited both by horizontal orientations and by oblique orientations near 150°. Furthermore, the SRF reveals that this neuron is sensitive to a higher and broader range of spatial frequencies at 90 than at 150°.

To visualize spectral tuning properties more clearly we extracted orientation and spatial frequency tuning curves from the SRF (Fig. 2, B and C, respectively; see Mazer et al. 2002). We measured three properties of orientation tuning: orientation peak, bandwidth, and bimodal tuning. The orientation tuning peak of the neuron illustrated in Fig. 2 is 129° and its orientation bandwidth is 73°. This neuron (and many others in our sample) has two distinct peaks in its orientation tuning curve. To measure bimodal orientation tuning we used a bimodal tuning index (Eq. 9); index values near 1.0 indicate that the secondary peak has the same height as the primary peak (with the height of the secondary peak measured between the shorter peak and the shallower trough), and values near 0 indicate just a single peak in the orientation tuning curve. For this neuron the bimodal tuning index is 0.23, indicating that the secondary peak is 23% of the height of the primary peak. We also measured two properties of spatial-frequency tuning: peak and bandwidth. For this neuron peak spatial frequency tuning is 2.5 cycles per receptive field diameter (cyc/RF; or 0.31 cycles per degree, cyc/deg) and bandwidth is 1.1 octaves.

Some V4 neurons have simpler spectral tuning properties. Figure 3 shows the SRF of one V4 neuron whose orientation tuning profile resembles that typically encountered in area V1 (De Valois et al. 1982b). The orientation tuning of this neuron is unimodal (bimodal tuning index, 0.01), its orientation peak is 143°, and its orientation bandwidth is 29°. However, the same is not true for spatial frequency tuning. This neuron has a spatial frequency bandwidth of 1.7 octaves, substantially higher than that typically reported for V1 (De Valois et al. 1982a).


FIG. 3. SRF for a V4 neuron with narrow orientation tuning. Axes are as in Fig. 2. A: this neuron is excited by a relatively narrow range of orientations but a broad range of spatial frequencies. B: orientation tuning curve (peak, 143°) has narrow bandwidth (29°) and nearly unimodal tuning (bimodal tuning index, 0.01). C: spatial frequency tuning curve has a peak at 3.1 cycles/RF and a bandwidth of 1.7 octaves.

 
Comparison of V4 and V1 spectral tuning

We compared the tuning properties of our sample of V4 neurons to those of 45 neurons in primary visual cortex (V1), where spectral tuning properties are better understood (David et al. 2004). Neurons in V1 generally have much narrower and simpler spectral tuning than V4 neurons. One V1 SRF is shown in Fig. 4. The orientation tuning peak is 97°, orientation bandwidth is 29°, and tuning is nearly unimodal (bimodal tuning index, 0.02). The spatial frequency tuning peak is 2.5 cyc/RF and the spatial frequency bandwidth is 0.9 octaves.

Across our sample of 103 V4 neurons, 87 (84%) had significant spectral tuning, and only this subset was used for comparison. Neurons without significant tuning either gave visual responses that could not be described by the Fourier power model or gave responses dominated by noise or other nonvisual inputs (see METHODS for selection criteria). Excluding these neurons increased the significance of some effects across the population but did not affect any trends.

Figure 5 compares orientation tuning properties in V4 and V1. The orientation bandwidth of V4 neurons varies widely and the median is 74.4° (Fig. 5A). A few V4 neurons (6%, 5/87) do not have measurable orientation tuning and instead respond equally to all orientations (white bar in Fig. 5A). In contrast, the median bandwidth across the sample of V1 neurons is just 43.7° (Fig. 5B), significantly lower than that in V4 (P < 0.01, Fig. 5C). Only a small number (5/45) of V1 neurons have orientation bandwidths >90°. These values are comparable to those reported in previous studies of V4 (Desimone and Schein 1987) and V1 (De Valois et al. 1982b; Ringach et al. 2002) that used sinusoidal gratings.

As noted above (see Fig. 2), many V4 neurons in our sample have bimodal orientation tuning. Across the sample, the median bimodal tuning index for V4 neurons is 0.09 (Fig. 5D). Of these neurons, 28% (24/87) have a bimodal tuning index significantly greater than zero (P < 0.05; black bars in Fig. 5D). In contrast, the median bimodal tuning index in V1 is only 0.01 and only 11% (5/45) of V1 neurons have significant bimodal tuning (P < 0.05; Fig. 5E). The median bimodal tuning index for V1 neurons is significantly lower than that for V4 (P < 0.01; Fig. 5F).

Figure 6 compares spatial frequency tuning properties of V4 and V1 neurons. The median peak spatial frequency in V4 and V1 is not significantly different when measured in cycles per receptive field (cyc/RF), although they are likely to differ when measured in cycles per degree (see following text). The median is 2.6 cyc/RF in V4 (Fig. 6A) and 2.5 cyc/RF in V1 (Fig. 6B; P > 0.25, see Fig. 6C). Despite being similar to V1 on average, peak spatial frequency tuning varies more widely in V4, from <1 cyc/RF to over 6 cyc/RF. In contrast, the tuning of most V1 neurons falls between 2.0 and 3.5 cyc/RF. These spatial frequency tuning properties are similar to those reported in previous studies of V4 (Desimone and Schein 1987) and V1 (De Valois et al. 1982a) that used sinusoidal gratings.

We also observed substantial differences in spatial frequency bandwidth between V4 and V1 neurons. The median spatial frequency bandwidth in V4 is 1.2 octaves (Fig. 6D), which is significantly greater than the median of 0.9 octaves in V1 (Fig. 6E; P < 0.01, see Fig. 6F). In fact, nearly half of the V4 neurons in our sample (41/87, 47%) have spatial frequency tuning curves that extend outside the range of our analysis, compared with only about one fifth of V1 neurons (10/45, 22%; white bars in Fig. 6; SRFs were estimated over 1–10 cyc/RF). For neurons whose spatial frequency tuning extends beyond the tested range, bandwidth could be substantially broader than measured. Because these neurons are more common in V4, the true difference in bandwidth between areas is likely to be even larger than our data suggest.

In this report spatial frequency tuning was measured in cycles per receptive field rather than cycles per degree. Because the spatial extent of receptive fields is much larger in V4 than in V1 (Gattass et al. 1988), the median peak spatial frequency data suggest that the V4 neurons in our sample have a substantially lower peak spatial frequency than the V1 neurons when measured in cycles per degree. However, V4 and V1 neurons were sampled at different eccentricities with different cortical magnification factors, so a direct comparison is not possible. In any case, the possibility that V4 neurons may have lower peak spatial frequency tuning does not imply that high spatial frequency information is absent from their responses. Instead, high spatial frequency information appears to be integrated into the responses of neurons with large bandwidth that spans both high and low spatial frequencies. The increased bandwidth of V4 neurons enables a representation of visual features that integrates over a wide range of spatial frequencies, rather than the band-limited representation in V1.
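
The relation between the two units is a division by receptive field diameter. As a worked illustration, the sketch below uses the example V4 neuron of Fig. 2 (2.5 cyc/RF, reported as 0.31 cyc/deg), which implies a receptive field diameter of about 8°. The 1° diameter used for V1 is purely hypothetical and serves only to show the direction of the effect; as noted above, the actual samples are not directly comparable.

```python
def cycles_per_degree(cycles_per_rf, rf_diameter_deg):
    """Convert spatial frequency from cycles per RF diameter to cycles per degree."""
    return cycles_per_rf / rf_diameter_deg

# RF diameter implied by the example V4 neuron (2.5 cyc/RF reported as 0.31 cyc/deg).
implied_rf_diameter = 2.5 / 0.31          # about 8.1 degrees

# Median peaks in cyc/RF from the text; the V1 diameter of 1 degree is hypothetical.
v4_cpd = cycles_per_degree(2.6, implied_rf_diameter)
v1_cpd = cycles_per_degree(2.5, 1.0)
```

Under these assumed diameters, nearly identical peaks in cyc/RF translate into a peak several times lower in cyc/deg for the V4 neuron.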

Spectral tuning properties and feature selectivity

Previous studies of shape representation in V4 characterized neuronal tuning using restricted stimulus sets such as non-Cartesian polar and hyperbolic gratings (Gallant et al. 1993, 1996), curved contours (Pasupathy and Connor 1999), and combinations of simple shape elements (Kobatake and Tanaka 1994). Because each of these studies probed a different part of shape parameter space it is difficult to draw any general conclusions from them about shape representation in V4. The SRF may provide a solution to this problem. Any visual stimulus can be described in terms of its orientation and spatial frequency spectrum, and responses to different spatial patterns can be interpreted in terms of the SRF.

To test the generality of SRFs, we used the SRF estimated for each V4 neuron in our sample to predict responses both to natural images and to synthetic stimuli that had been used in previous studies. We tested a stimulus set that was much larger than could be used in any actual physiology experiment. This consisted of 384 Cartesian gratings, 768 non-Cartesian (polar and hyperbolic) gratings, 20,000 random natural images, and 56 curved contour features. Selectivity for natural images and non-Cartesian gratings is described in this section; results obtained with curved contour features are presented in the following section.

Predictions for a representative V4 neuron are shown in Fig. 7. This neuron has very broad orientation tuning (bandwidth 151°; SRF shown in Fig. 7A) and is band-pass for spatial frequency (peak, 3.1 cyc/RF; bandwidth 1.6 octaves). In the experimental data, the average response of this neuron was 24 spikes/s. Based on its spectral tuning, this neuron is predicted to respond most strongly to non-Cartesian gratings (best response 35 spikes/s; Fig. 7B). This response is slightly, but not significantly, greater than the best predicted response to natural images (34 spikes/s) and significantly greater than the best predicted response to Cartesian gratings (27 spikes/s, P < 0.05). (Best predicted responses are normalized for stimulus class size; see METHODS.) The members of each stimulus class predicted to evoke the five strongest and five weakest responses are shown in Fig. 7, C–E. Stimuli whose spectral power is matched to the excitatory domain of the SRF should evoke the strongest responses (bottom row of each panel). The orientation and spatial frequency of the best Cartesian gratings are aligned to the peak excitatory region of the SRF (Fig. 7C), but their orientation bandwidths are much narrower than the SRF bandwidth. The most effective non-Cartesian gratings (Fig. 7D) and natural images (Fig. 7E) have broad orientation bandwidth that more closely matches the excitatory domain of the SRF.

Responses predicted for a different V4 neuron are shown in Fig. 8 (SRF repeated from Fig. 2). In the experimental data, the average response of this neuron was 26 spikes/s. This neuron is predicted to give a significantly stronger response to natural images (best response 50 spikes/s; Fig. 8B) than to either non-Cartesian gratings (46 spikes/s, P < 0.05) or Cartesian gratings (38 spikes/s, P < 0.05). The stimulus predicted to evoke the strongest response from each class is shown in Fig. 8, C–E. As in the previous example, the spectral energy of the best Cartesian grating is aligned to the excitatory region of the SRF, but the narrow bandwidth does not match the broad, bimodal orientation tuning of the SRF (Fig. 8C). The most effective non-Cartesian, hyperbolic grating has a power spectrum that matches the SRF more closely but spans a much wider range of orientations than the excitatory domain of the SRF (Fig. 8D). The most effective natural image has a power spectrum that matches the bimodal structure of the excitatory SRF even more closely and so should evoke the largest response (Fig. 8E).

We classified each neuron according to the stimulus class predicted to evoke the strongest response and compared the fraction of neurons preferring each stimulus class (Fig. 9A). Cartesian gratings are predicted to be the most effective stimuli for only one quarter of the V4 neurons (21/87, 24%). For only four of these neurons, the best response to Cartesian gratings is significantly greater than that to either other stimulus class (P < 0.05). In contrast, non-Cartesian gratings should evoke the largest response from almost half of the V4 neurons (38/87, 44%; 13 significantly greater than either other class, P < 0.05). Natural images should evoke the largest response from the rest (28/87, 32%; one significant, P < 0.05). We evaluated other measures of selectivity (the difference between maximum and minimum response; sparseness of responses; Vinje and Gallant 2000) and found similar results (data not shown).
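
The classification step amounts to taking, for each neuron, the class with the largest size-normalized predicted response. A minimal sketch, using the best predicted responses of the two example neurons (Figs. 7 and 8) and the V4 population counts given above; the dictionary layout and the function name `preferred_class` are assumptions of the example.

```python
from collections import Counter

def preferred_class(best_by_class):
    """Return the stimulus class with the largest predicted best response."""
    return max(best_by_class, key=best_by_class.get)

# Best predicted responses (spikes/s) for the example neurons of Figs. 7 and 8.
neurons = [
    {"cartesian": 27, "non_cartesian": 35, "natural": 34},
    {"cartesian": 38, "non_cartesian": 46, "natural": 50},
]
labels = [preferred_class(n) for n in neurons]
counts = Counter(labels)

# Population counts reported for V4 (n = 87), converted to percentages.
v4_counts = {"cartesian": 21, "non_cartesian": 38, "natural": 28}
v4_percent = {k: round(100 * v / 87) for k, v in v4_counts.items()}
```

Note that this rule assigns a class even when the difference between the two best classes is not significant, as for the neuron of Fig. 7; the significance counts quoted above address that separately.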

We used the same procedure to evaluate shape selectivity in our sample of 45 V1 neurons (Fig. 9A). In this case, we observed a much different pattern of selectivity. Cartesian gratings are predicted to be the most effective stimuli for the majority of V1 neurons (27/45, 60%). For 18 of these neurons, the predicted best response to Cartesian gratings is significantly greater than that to either other stimulus class (P < 0.05). Non-Cartesian gratings and natural images should each evoke the largest response from only a minority of V1 neurons (non-Cartesian: 7/45, 16%, two significantly greater than either other class, P < 0.05; natural images, 11/45, 24%, three significant, P < 0.05). The distribution of preferred stimulus class predicted across the sample of V1 neurons is significantly different from the distribution across V4 neurons (P < 0.01, jackknifed Hotelling’s t-test).

Our analysis of shape selectivity demonstrates that differences in spectral tuning properties between V4 and V1 neurons are sufficient to explain the selectivity for complex patterns observed only in V4 neurons. To determine which aspects of spectral tuning might influence stimulus selectivity, we compared the tuning properties of V4 neurons classified according to the predicted best stimulus. The orientation bandwidth of neurons in the non-Cartesian class (median 129°) is significantly broader than that of cells in the Cartesian and natural image classes (Cartesian median bandwidth: 39°; natural image bandwidth: 55°; P < 0.01; Fig. 9B). In contrast, the bimodal tuning index is higher for the neurons in the natural image class (median 0.14) than for those in the Cartesian and non-Cartesian classes (median non-Cartesian index: 0.08; Cartesian index: 0.05; P < 0.01; Fig. 9C). There are no significant differences in peak spatial frequency tuning between the three classes of neuron (Fig. 9D). However, spatial frequency bandwidth is significantly greater for neurons in the natural image class (median 1.8 octaves) than that for those in the other two classes (median Cartesian bandwidth: 0.98 octaves; non-Cartesian bandwidth: 1.2 octaves; P < 0.05; Fig. 9E). Thus the selectivity for non-Cartesian gratings and natural images observed in V4 (Gallant et al. 1993; Kobatake and Tanaka 1994) can be explained by broad tuning bandwidth and complex orientation tuning profiles, properties that appear in V4 SRFs but not V1 SRFs.

The selectivity analysis presented thus far is based on simulations in which stimuli were centered in the receptive field. In V4, visual selectivity is invariant to changes in stimulus position on the order of one-half receptive field diameter (Gallant et al. 1996). The Fourier power model is invariant to small changes in position and thus should explain this invariance. However, if the spectral structure of a stimulus varies across space, selectivity could be affected by large spatial offsets. To address this issue we repeated the comparison of selectivity for V4 neurons with an expanded stimulus set. In the expanded set, stimuli of all three classes were positioned either in the receptive field center or offset by one-half receptive field diameter (horizontally, vertically, and diagonally). The pattern of selectivity within the expanded stimulus set (neurons preferring Cartesian gratings: 13/87, 15%; non-Cartesian gratings: 44/87, 51%; natural images: 30/87, 34%) is not significantly different from the distribution for the original set, in which stimuli appeared only in the receptive field center (P > 0.5, jackknifed Hotelling’s t-test). Thus the pattern of selectivity predicted by the Fourier power model does not depend on the position of the stimulus in the receptive field.

Selectivity for curved-contour features in V4

One previous study of shape selectivity in area V4 used stimuli constructed by joining two oriented line segments in a sharp corner or curve (Pasupathy and Connor 1999). That study reported that many V4 neurons are selective for the angle separating the contour components and for the sharpness of the cor