Journal of Neurophysiology

Responses of V1 Neurons to Two-Dimensional Hermite Functions

Jonathan D. Victor, Ferenc Mechler, Michael A. Repucci, Keith P. Purpura, Tatyana Sharpee


Neurons in primary visual cortex are widely considered to be oriented filters or energy detectors that perform one-dimensional feature analysis. The main deviations from this picture are generally thought to include gain controls and modulatory influences. Here we investigate receptive field (RF) properties of single neurons with localized two-dimensional stimuli, the two-dimensional Hermite functions (TDHs). TDHs can be grouped into distinct complete orthonormal bases that are matched in contrast energy, spatial extent, and spatial frequency content but differ in two-dimensional form, and thus can be used to probe spatially specific nonlinearities. Here we use two such bases: Cartesian TDHs, which resemble vignetted gratings and checkerboards, and polar TDHs, which resemble vignetted annuli and dartboards. Of 63 isolated units, 51 responded to TDH stimuli. In 37/51 units, we found significant differences in overall response size (21/51) or apparent RF shape (28/51) that depended on which basis set was used. Because of the properties of the TDH stimuli, these findings are inconsistent with simple feedforward nonlinearities and with many variants of energy models. Rather, they imply the presence of nonlinearities that are not local in either space or spatial frequency. Units showing these differences were present to a similar degree in cat and monkey, in simple and complex cells, and in supragranular, infragranular, and granular layers. We thus find a widely distributed neurophysiological substrate for two-dimensional spatial analysis at the earliest stages of cortical processing. Moreover, the population pattern of tuning to TDH functions suggests that V1 neurons sample not only orientations, but a larger space of two-dimensional form, in an even-handed manner.


It is remarkable that a predictively accurate account of the responses of primary visual cortex (V1) neurons remains elusive, despite several decades of quantitative study (Olshausen and Field 2004). These studies used a multitude of simple stimuli, including bars (Hubel and Wiesel 1959, 1968; Kagan et al. 2002; Movshon et al. 1978a,b; Sun and Bonds 1994), gratings (Anderson et al. 2001; Bonds 1989; De Valois et al. 1979; Jagadeesh et al. 1997; Kagan et al. 2002a; Movshon et al. 1978a,b; Ringach et al. 1997a), annuli (Jones et al. 2001), Gabor functions (Bauer and Heinze 2002), random or pseudorandom noise, both dense and sparse (Chen et al. 1993; Hirsch et al. 1998; Jones and Palmer 1987; Palmer and Davis 1981; Reid et al. 1997), other geometric stimuli (Conway and Livingstone 2003; De Valois et al. 1979; Hammond and MacKay 1975; Mechler et al. 2002; Pollen et al. 1988; Purpura et al. 1994; Skottun et al. 1991a; Smith et al. 2002), and natural scenes (David et al. 2004; Ringach et al. 2002; Smyth et al. 2003; Vinje and Gallant 2002; Willmore and Smyth 2003). It is generally thought that response properties of at least some V1 cells can be accounted for by a linear filter, perhaps followed by a static nonlinearity such as a firing threshold, as reviewed by Simoncelli et al. (2004). However, such a linear–nonlinear (LN) model is recognized to be incomplete even for classic simple cells. The LN model's failure to predict responses to stimuli outside the set used to specify the model is usually attributed to modulatory influences such as gain controls and other influences from the nonclassical receptive field (Freeman et al. 2001; Heeger 1992a; Ohzawa et al. 1982; Sceniak et al. 1999, 2002; Smyth et al. 2003). For complex cells, energy models (Adelson and Bergen 1985) and their variants (David et al. 2004; Rust et al. 2003, 2005; Touryan et al. 2005) have been proposed to account for the relative lack of phase dependency of responses and for their on-off character.

Deviations between responses predicted from simple geometric stimuli and measured responses can be particularly prominent for natural scenes (David et al. 2004; Smyth et al. 2003). However, it is unclear whether these prediction failures are specific to natural scenes or, rather, reflect a more general failing of LN and energy models derived from simple stimuli. The latter might become apparent if neurons were examined with stimuli outside the usual analytic stimuli used to specify models. The usual analytic stimuli fall into three classes: uniform in space but localized in spatial frequency (e.g., gratings), localized in space but broadband (e.g., spots, bars, and edges), or uniform in space and broadband (e.g., spatiotemporal white noise). Additionally, standard analytic stimuli are typically unstructured in space (e.g., white noise) or structured along a single dimension, with a single dominant orientation (e.g., bars and gratings). In contrast, “features” are typically localized both in space and in spatial frequency (Morrone and Burr 1988). Moreover, some aspects of natural visual scenes, such as T-junctions, have two-dimensional structure and multiple orientations.

With these considerations in mind, we studied the responses of V1 neurons to another set of analytic visual stimuli, the two-dimensional Hermite functions (TDHs), shown in Fig. 1. These functions are localized in space and spatial frequency, in a manner that is precisely intermediate between the extremes of points (localized in space, uniform in spatial frequency) and gratings (uniform in space, localized in spatial frequency). The formal sense in which these functions achieve joint localization in space and spatial frequency (Victor and Knight 2003) is distinct from the sense of joint localization that leads to Gabor functions (Daugman 1985; Gabor 1946; Marcelja 1980), and does not require consideration of complex-valued profiles (Klein and Beutter 1992; Stork and Wilson 1990). Gabor functions optimize localization in space and spatial frequency in the sense that they minimize the product of the variances of the distribution of spatial sensitivity profile and its Fourier transform. TDH functions optimize localization in space and spatial frequency in the sense that their spatial profile is minimally altered by truncation of its power spectrum and windowing in space.

FIG. 1.

Two-dimensional Hermite (TDH) functions used in these experiments. Each family (Cartesian, left; polar, right) forms an orthonormal basis for 2-dimensional patterns and increases gradually in spatial extent and bandwidth as rank (row) increases. For the Cartesian functions, the indices j and k specify the number of zero-crossings along the x- and y-coordinates. Each index is constant along a set of parallel lines, as indicated by the arrows. Rank of a Cartesian function is equal to j + k. For the polar functions, the index ν specifies the number of zero-crossings along each radius and is constant along the inverted “vees” that begin at the bottom right, peak along the middle of the array, and then continue to the bottom left. Index μ specifies the number of zero-crossings along concentric circles and is constant along vertical lines as indicated by the down-pointing arrows. Rank of a polar function is equal to μ + 2ν; the “cosine” and “sine” halves of the array contain the functions whose dependency on polar angle θ is given by cos(μθ) and sin(μθ), respectively, where θ is measured clockwise from the horizontal (x-) axis. Midline of the polar array contains the functions that are independent of θ.

One consequence of the difference between the defining characteristics of Gabor functions and TDHs is that the latter [and their one-dimensional analogs, used previously in psychophysical (Yang and Reeves 2001) and VEP (Yang and Reeves 1995) studies] are readily organized into discrete orthogonal basis sets. Although a continuum of basis sets exists, we focus on basis sets that have Cartesian or polar symmetry. Gabor functions do not form basis sets in any natural way.

The two-dimensional structure of the TDHs depends on the choice of the basis set, but all basis sets are equated in contrast, spatial spread, and power spectra. Thus, the TDHs share with standard stimulus sets the ability to reconstruct linear receptive fields (because they form basis sets), but also can distinguish between the effects of two-dimensional structure and the effects of context-dependent modulation because they are equated for contrast, spatial spread, and power spectrum, yet differ in two-dimensional structure.

We find that the linear-static nonlinear picture fails to account for responses to TDH functions in the majority of V1 neurons. Rather, the apparent shape and strength of the reconstructed receptive field depend on the choice of TDH basis set. As described below, these failures are often striking and qualitative and are present in all cortical laminae. The analytic properties of the TDH functions also allow our data to exclude a wide class of generalized energy models as the source of these discrepancies. Moreover, because of these analytic properties, it is difficult to account for these discrepancies on the basis of modulatory influences. Rather, the findings suggest that our current picture of V1 receptive fields may be limited by the relatively simple kinds of stimuli typically used to investigate them. Secondarily, the parameterization of visual form provided by the TDH functions provides a new insight into the diversity of spatial selectivities of V1 neurons: the uniform coverage of orientation space (Blasdel 1992; Dragoi et al. 2000; Sirovich and Uglesich 2004) may be part of a more general coverage of a larger space of local form. Portions of this material were presented at the annual meetings of the Vision Sciences Society (2004) and Society for Neuroscience (Victor et al. 2004a,b).


Our methods for animal preparation, visual stimulation, and recording have been previously described in detail (Aronov 2003; Mechler et al. 2002); we summarize them here. All animal procedures were performed in accordance with NIH and local IACUC standards.

Physiologic preparation

Recordings were made after initial atropine [0.04 mg, administered intramuscularly (im)], anesthesia with ketamine 10 mg/kg im (cats) or telazol 2–4 mg/kg im (macaques), and placement of an endotracheal tube and catheters in both femoral veins, one femoral artery, and the urethra. During recording, anesthesia was maintained with propofol and sufentanil (mixture containing 10 mg/ml of propofol and 0.25 μg/ml sufentanil, initially at 2 mg · kg⁻¹ · h⁻¹ propofol then titrated) and neuromuscular blockade was provided by vecuronium 0.25 mg/kg intravenous (iv) bolus, 0.25 mg · kg⁻¹ · h⁻¹ iv. Heart rate and rhythm, arterial blood pressure, body temperature, end-expiratory pCO2, arterial oxygen saturation, urine output, and EEG were monitored during the course of the experiment. Animal maintenance included intravenous fluids (lactated Ringer solution with 5% glucose, 2–3 cm³ · kg⁻¹ · h⁻¹), administration of supplemental O2 every 6 h, antibiotics (procaine penicillin G 75,000 U/kg im prophylactically, gentamicin 5 mg/kg im daily if evidence of infection), application of 0.5% bupivacaine to wounds, and ocular instillation of atropine 1% and flurbiprofen 2.5% (and, for cats, Neosynephrine eyedrops 10% to retract the nictitating membranes), dexamethasone (1 mg/kg im daily), and periodic cleaning of the contact lenses. With these measures, the preparation remained physiologically stable for 2 or 3 days (cats) and 4 or 5 days (macaques).


After a craniotomy near P3, L1 (cats) or P15, L14 (macaques), a tetrode (Thomas Recording, Giessen, Germany), coated with DiI (Molecular Probes, Eugene, OR) to aid subsequent localization of the track, is inserted through a small durotomy. Once spiking activity from one or more units is encountered, the region of the receptive field(s) is hand-mapped and then centered on the display of a Sony GDM-F500 19-in. monitor (displaying a 1,024 × 768 raster at 100 Hz, 35 cd/m2), typically at a distance of 114 cm, directly or by a mirror. Real-time spike-sorting software (Datawave Technologies) is engaged to provide TTL pulses corresponding to the time of spikes of tentatively identified single units. Rapid, qualitative characterization of these units' ocularity and grating responses is accomplished by keyboard or mouse control of the visual stimulator.


Among the multiple spikes simultaneously recorded by the tetrode, one well-isolated spike (signal-to-noise >2:1 and usually >3:1, distinctive shape by on-line spike sorting) is selected as the “target” neuron. Beginning with the parameters determined by the qualitative characterization, computer-controlled stimulation paradigms are used to characterize the target neuron quantitatively with sine gratings. Orientation tuning is determined by the mean response (F0) and the fundamental modulated response (F1) to drifting gratings at orientations spaced in steps of 22.5 deg (or, for narrowly tuned units, 11.25 deg), presented at a contrast c = (Lmax − Lmin)/(Lmax + Lmin) of 0.5 or 1.0, with spatial and temporal frequency determined by the initial assessment. Next, spatial frequency tuning is determined by responses to drifting gratings at an eight- to 16-fold range of spatial frequencies straddling the value determined by the auditory assessment, a contrast 0.5 or 1.0, an orientation determined by the orientation tuning run, and a temporal frequency determined by the auditory assessment. Temporal tuning is then assessed by responses to 1-, 2-, 4-, 8-, and 16-Hz drifting gratings at the optimal orientation and spatial frequency. Finally, a contrast response function is determined by responses to drifting gratings at contrasts of 0, 0.0625, 0.125, 0.25, 0.5, and 1.0, with orientation, spatial frequency, and temporal frequency determined by the previous quantitative runs. The position of the receptive field (RF) of the target neuron is then assessed from online-generated poststimulus time histograms (PSTHs) of the response to either a bright or dark bar, moving slowly (≤1 deg/s) and symmetrically about the origin in both directions along the preferred axis.
To center the RF along the preferred axis, the stimulus coordinate system origin is digitally adjusted so that the mean of the times of the peak responses (to stimuli swept in each direction) occurs when the bar traverses the origin of the coordinate system. To center the RF in the orthogonal direction, the origin is digitally adjusted so that it lies halfway between the upper and lower edges of the RF, as determined by the appearance of a response to slowly swept patches along multiple trajectories parallel to the preferred axis.

Once centered, the size of the classical RF is determined from responses to a drifting grating (all parameters optimized) presented in discs of increasing diameter and in a series of annuli with fixed outer radius and decreasing inner radii. In each case, stimuli and blanks are presented for 3-s runs, and four to eight randomized repeats are obtained for adequate statistics on the Fourier components of the responses. The effective diameter D of the RF of the target neuron (used below to determine the size of the TDH patterns) was taken to be the smallest inner diameter of an annulus that did not produce a measurable response, as assessed by t-statistics for F0 or T²circ statistics (Victor and Mast 1991) for F1 (as diagrammed in Fig. 2A, unit 1). The set of annuli was chosen so that D was determined to within deg or, for smaller receptive fields, deg.

FIG. 2.

Relationship of scaling of the TDH stimuli to the classical receptive field (RF) size. Left: diameter D of the classical RF for the target unit (diagrammed as unit 1) is taken to be the smallest inner diameter of an annulus that did not produce a measurable response (bottom left); other units (diagrammed as unit 2) might lead to a somewhat different choice of D and units might show increasing responses to patches of diameter >D (top). See methods for additional details. Parameter σ that defines the spatial spread of the TDH stimuli (see Eqs. A1, A2, A4, and A5) is then chosen as σ = D/10, which produces spatial profiles that are confined to a disk of radius D for low ranks, but extend beyond it for high ranks (right). h0 indicates the radial dependency of the TDH stimulus of rank 0 (common to Cartesian and polar separations); h7 indicates the dependency of the rank-7 Cartesian TDH C0,7 along its long axis.

The ratio of the Fourier component at the modulation frequency to the mean, F1/F0, was calculated from the response to a drifting grating, and units were classified as “simple” if F1/F0 > 1 and “complex” if F1/F0 ≤ 1 (Skottun et al. 1991b). A direction selectivity index DSI = (Rpref − Ranti)/(Rpref + Ranti) was calculated from grating responses F1 or F0, depending on which component dominated the response.
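The classification rule and direction selectivity index described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the response is assumed to be a list of firing rates in equal-width bins spanning a whole number of grating cycles, and F1 is taken as the peak amplitude of the fundamental.

```python
import cmath
import math

def f0_f1(rates, cycles=1):
    """Return (F0, F1) for a binned response covering `cycles` whole grating cycles.

    F0 is the mean rate; F1 is the peak amplitude of the Fourier component
    at the grating's modulation frequency (assumed convention).
    """
    n = len(rates)
    f0 = sum(rates) / n
    fundamental = sum(r * cmath.exp(-2j * math.pi * cycles * i / n)
                      for i, r in enumerate(rates))
    f1 = 2.0 * abs(fundamental) / n
    return f0, f1

def classify(f0, f1):
    """'simple' if F1/F0 > 1, otherwise 'complex' (criterion stated in the text)."""
    return "simple" if f1 / f0 > 1 else "complex"

def dsi(r_pref, r_anti):
    """Direction selectivity index (Rpref - Ranti)/(Rpref + Ranti)."""
    return (r_pref - r_anti) / (r_pref + r_anti)
```

For example, a half-wave rectified sinusoidal response has F1/F0 = π/2 and would be labeled simple, whereas a response of 10 impulses/s modulated by ±2 impulses/s has F1/F0 = 0.2 and would be labeled complex.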

Usually, there are two to four simultaneously recorded neurons whose spikes are well isolated by the above criteria, and whose spike shapes across the tetrode are reliably discriminated. At some recording sites, some of these neurons differed substantially from the target neuron in RF position, spatial frequency, and/or orientation tuning. At approximately one third of recording sites, we repeated the quantitative characterizations above for one of these additional neurons, so that they could also serve as the “target.” Discriminated event pulses corresponding to the tentatively identified single units are logged by the PC that controls the visual stimulus (AS1b board on the VSG system, NI PCI-6602 on the OpenGL system) for on-line analysis. Timing pulses from the PC that controls the visual stimulus are also led to a PC that hosts the Datawave spike-sorting system and records event waveforms (32 samples at 0.04-ms resolution) for later analysis. Off-line spike sorting is performed with an in-house Matlab implementation (Reich 2000) of the methods of Fee (1996) and Sahani (1998). All the data below are derived from these off-line spike sorts. Because the stimulus lineup was performed on the basis of on-line discriminations and the definitive analysis was defined from an independent analysis of stored waveforms, the identification of the “target” neuron in the off-line analysis is only presumptive and plays no role in the quantitative analysis. Moreover, as will be shown, our main findings were present both for neurons in which the receptive field maps were well centered (a set that includes the presumptive target neurons and likely others) and also for neurons whose receptive field maps were off-center but still within the common envelope of the TDH functions.


After characterization and alignment of one or more target neurons, we recorded responses to patches whose spatial contrast was determined by a two-dimensional Hermite function (TDH) (see Fig. 1 and detailed description in the appendix). Each TDH is a polynomial in the coordinates (x, y), multiplied by a Gaussian envelope. Stimuli were rotated so that the x-axis was along the target neuron's preferred orientation and the positive y-axis was the preferred direction for drifting gratings, if any. We set the spatial scale parameter σ of the Gaussian envelope (see appendix Eqs. A1, A2, A4, and A5) at σ = D/10, where D is the diameter of the classical RF of the target neuron as determined by responses to disks and annuli containing the optimal drifting grating.

The reasoning behind this choice is as follows. The choice of σ simultaneously sets the spatial extent of the two-dimensional Gaussian envelope common to all TDH functions {exp[−(x² + y²)/4σ²]}, and the range of spatial frequencies explored at each rank. As illustrated in Fig. 2, choosing σ = D/10 provides for stimuli that have one, two, or three oscillations within a region of space that covers the receptive field, well-matched to sample (in the Nyquist sense) the typical sensitivity profiles of cortical neurons (Ringach 2002), which have two or three lobes. Had we chosen a substantially larger value of σ, most of the stimuli would have been relatively uniform over the receptive field, and thus would not have examined the spatial frequencies to which the neuron was likely to be tuned. Had we chosen a substantially smaller value of σ, most of the stimuli would have been confined only to a subregion of the receptive field.
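To make the construction concrete, the following sketch builds Cartesian TDH profiles as a polynomial times the Gaussian envelope exp[−(x² + y²)/4σ²] and checks orthonormality numerically. The appendix equations are not reproduced in this section, so two things here are assumptions rather than the authors' definitions: the polynomial family (probabilists' Hermite polynomials, which are orthogonal under the squared envelope) and the unit-energy normalization (the experiments instead scaled stimuli so that the maximum contrast was 1). All function names are hypothetical.

```python
import math

def hermite_e(n, u):
    """Probabilists' Hermite polynomial He_n(u), via He_{m+1} = u*He_m - m*He_{m-1}."""
    prev, cur = 0.0, 1.0
    for m in range(n):
        prev, cur = cur, u * cur - m * prev
    return cur

def cartesian_tdh(j, k, x, y, sigma):
    """Assumed form of C_{j,k}: He_j(x/sigma) * He_k(y/sigma) * Gaussian envelope,
    normalized to unit energy (integral of the square equals 1)."""
    norm = math.sqrt(2.0 * math.pi * sigma ** 2
                     * math.factorial(j) * math.factorial(k))
    envelope = math.exp(-(x * x + y * y) / (4.0 * sigma ** 2))
    return hermite_e(j, x / sigma) * hermite_e(k, y / sigma) * envelope / norm

def inner_product(f, g, sigma, half_width=10.0, n=201):
    """Riemann-sum approximation of the integral of f*g over a square
    extending half_width*sigma from the origin in each direction."""
    step = 2.0 * half_width * sigma / (n - 1)
    coords = [-half_width * sigma + i * step for i in range(n)]
    return sum(f(x, y) * g(x, y) for x in coords for y in coords) * step * step
```

With σ = 0.3 deg (the value used for a 3-deg receptive field), `inner_product` of C0,2 with itself is close to 1 and with C1,1 is close to 0, which is the property that lets a set of responses be read as coefficients of a receptive-field expansion.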

This choice is also supported by the sense in which linear combinations of TDH functions represent receptive fields. A linear combination of TDH functions converges to a target spatial profile in a least-squares sense as weighted by the square of the common envelope {i.e., exp[−(x² + y²)/2σ²]}. That is, if σ is large, the approximation will be inefficient in that it will be weighted by areas far removed from the receptive field. Conversely, if σ is small, the convergence will not be valid across the entire receptive field until an unreasonably large number of terms have been added. A choice of σ for which TDH profiles have an envelope that is similar to that of the receptive fields to be approximated avoids these difficulties. The fact that sensitivity profiles have relatively stereotyped shapes (Ringach 2002) allows a common universal choice to be made.

We carried out pilot experiments in one cat (two sites, seven units) and one monkey (three sites, nine units) in which we used the standard choice of σ and one or two values that differed from the standard choice. For values that differed from the standard choice by a factor of 1.5 or 1.66, corresponding features of the derived “L” and “E”-filter profiles (see below) could be identified within the common range of convergence. For values that differed from the standard choice by a factor of 2 or 3, only the coarsest commonalities of the maps could be seen, consistent with the above theoretical considerations.

Finally, we also note (as illustrated in Fig. 2) that with this choice of σ = D/10, the contrast profiles of the lowest-rank stimuli lie within the classical RF, although the contrast profiles of the higher-rank stimuli (by design) extend beyond the classical receptive field.
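As a worked check of this geometry, the arithmetic below uses only the Gaussian envelope exp[−(x² + y²)/4σ²] and the choice σ = D/10 given in the text; the symbols G and r_e are introduced here for illustration. It evaluates the envelope at the edge of the classical RF (radius D/2) and locates the radius at which the envelope falls to e⁻² of its peak.

```latex
\begin{align*}
G(r) &= \exp\!\left[-\frac{r^{2}}{4\sigma^{2}}\right], \qquad \sigma = \frac{D}{10},\\[4pt]
G(D/2) &= \exp\!\left[-\frac{(5\sigma)^{2}}{4\sigma^{2}}\right] = e^{-6.25} \approx 0.0019,\\[4pt]
G(r_e) = e^{-2} \;&\Longrightarrow\; r_e = 2\sqrt{2}\,\sigma,
\qquad 2r_e = 4\sqrt{2}\,\sigma = \frac{2\sqrt{2}}{5}\,D \approx 0.57\,D.
\end{align*}
```

The envelope alone (the rank-0 stimulus) is therefore negligible at the RF boundary; higher-rank stimuli extend farther because the polynomial factor grows with distance from the origin.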

The TDH patterns each have the same total power, and contrast was scaled by setting Math(see appendix Eqs. A1, A2, A4, and A5), so that the maximum contrast was 1.

Each pattern was presented with the polarity shown in Fig. 1, and in inverted contrast polarity. Up to rank 7, this amounted to 144 stimuli (36 Cartesian stimuli, 36 polar stimuli, and their contrast-inverses). Rank 0 and 1 Cartesian and polar stimuli (i.e., the first three stimuli of each set) were identical. These three stimuli and their contrast-inverted counterparts were not duplicated in the stimulus sequence, reducing the number of stimuli to 138 = 144 − (3 × 2). (There is also a single rank 2 duplication, $C_{1,1,\sigma} = A^{\sin}_{2,0,\sigma}$, but both stimuli were presented.) In addition, four stimulus periods of the “blank” stimulus, in which the contrast was held at zero, were added to the sequence. These 142 stimuli were each presented for 250 ms, each followed by 250 ms of a blank, in randomized order, for eight to 16 blocks.
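The stimulus count can be reproduced by enumerating the index pairs of each basis. The sketch below is illustrative only (function names are hypothetical); it uses the rank definitions from the Fig. 1 legend, rank = j + k for Cartesian functions and rank = μ + 2ν for polar functions, with a cosine and a sine member whenever μ > 0.

```python
def cartesian_indices(max_rank):
    """All (j, k) with j + k <= max_rank."""
    return [(j, rank - j)
            for rank in range(max_rank + 1)
            for j in range(rank + 1)]

def polar_indices(max_rank):
    """All (mu, nu, phase) with mu + 2*nu <= max_rank.

    Functions with mu = 0 are independent of angle and have no phase;
    functions with mu > 0 come in a cosine and a sine version.
    """
    out = []
    for rank in range(max_rank + 1):
        for nu in range(rank // 2 + 1):
            mu = rank - 2 * nu
            if mu == 0:
                out.append((mu, nu, None))
            else:
                out.append((mu, nu, "cos"))
                out.append((mu, nu, "sin"))
    return out

def stimulus_count(max_rank=7, n_blank=4):
    """Number of entries in the stimulus sequence, following the text's bookkeeping."""
    n_cart = len(cartesian_indices(max_rank))
    n_polar = len(polar_indices(max_rank))
    with_inverses = 2 * (n_cart + n_polar)
    shared = len(cartesian_indices(1))  # rank 0 and 1 patterns are common to both bases
    return with_inverses - 2 * shared + n_blank
```

Both bases contain rank + 1 functions at each rank, so each has 36 members through rank 7, and `stimulus_count()` returns 142.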

Visual stimulus generation

Control signals for the CRT display are provided by a PC-hosted VSG2/5 (8 Mb) for grating stimuli and by a separate PC-hosted system optimized for OpenGL (NVidia GeForce3 chipset) for the bar and TDH stimuli, both programmed in Delphi. For presentation, TDH stimuli were discretized as limited by the display resolution. This typically meant at least 64 × 64 display pixels across the stimulus, with each display pixel subtending approximately 1 min. At the edge of each patch, stimulus contrast was reduced to less than -th of its peak value.

Intensity linearization is separately performed for each display controller by VSG software or in-house software of comparable function.


At three locations along the electrode track bracketing the recording sites, lesions are made by current passage (typically 3 μA × 3 s, electrode negative). After all recordings, the animal was killed and perfused (4% paraformaldehyde) in phosphate-buffered saline. Digital microphotographs are first taken of histologically unstained 40-μm cryostatic sections under the fluorescence microscope to capture the DiI trace of the track. Digital microphotographs of the same sections are retaken under light microscopy after Nissl staining (Hevner and Wong-Riley 1990), to highlight laminar organization perpendicular to the track as well as the location of lesions. Laminar location of the recording sites is recovered by digital overlay of the image pairs corresponding to a section. Typically two to six consecutive sections fully contain a single track.


Waveform classification of the tetrode recordings yielded 45 units from 12 sites in three cats and 18 units from five sites in two macaques, whose spiking activity could be driven by drifting gratings or bars. All recordings were within 5 deg of the area centralis (cats) or fovea (macaques); 34/45 of the cat units and 17/18 of the macaque units had responses to TDH stimuli that were clearly distinguishable from their baseline activity. We restrict our further analysis to these 51 units.

Example responses to TDH stimuli

We begin by showing some example responses to TDH stimuli, initially describing them qualitatively, and then introducing quantitative approaches and illustrating their application.


Figure 3 shows PSTHs of responses of four simultaneously recorded units in upper layer III of cat V1. A fifth isolated neuron at this site was not responsive to TDH stimuli. Response histograms are laid out corresponding to the stimulus arrays of Fig. 1, with responses to Cartesian stimuli on the left and responses to polar stimuli on the right. For each stimulus, there is a pair of histograms: in the top histogram of each pair, the stimulus was presented as shown in Fig. 1; in the bottom histogram, the stimulus was presented with reversed polarity. Unit 3003t had a classical RF diameter of 3 deg; thus the size of the stimuli corresponded to σ = 0.3 deg (see methods). All units at this site had a similar preferred orientation (45 deg) and, with the exception of unit 3003s, were narrowly tuned. Stimulus coordinate axes were rotated to conform to this common orientation preference.

FIG. 3.

Poststimulus time histograms (PSTHs) of responses of 4 simultaneously recorded neurons in layer III of cat V1 to TDH functions (left: Cartesian stimuli; right: polar stimuli), each presented for 250 ms and followed by 250 ms of mean illumination. In each pair of histograms, the top histogram is the response to the stimulus shown in Fig. 1, and the bottom histogram is the response to the contrast-inverse of that stimulus. Four pseudocolor maps represent the spatial filters Lcart, Lpolar, Ecart, and Epolar for the model of Fig. 4, derived as described by Eqs. 2 and 4. Circle on each color map is of diameter $4\sqrt{2}\,\sigma = (2\sqrt{2}/5)D$ (D is the diameter of the circle in Fig. 2), which marks the point at which the Gaussian component of each Hermite function falls to e⁻² times its peak value. For each unit, a common linear pseudocolor scale (color bar as shown in top right) is used for the 4 filters, with green representing 0, red representing the highest positive value, and blue representing the lowest negative value. For the units of panels A and B, there is at least a qualitative similarity of the filters L and E deduced from the 2 basis sets. For the unit of panel C, the shapes of the filters differ substantially. For the unit of panel D, there is a difference in the relative strengths of the linear and nonlinear components (L < E for the Cartesian functions; L comparable to E for the polar functions). A, B, C, and D: units 3003t, s, u, and x. PSTH scale bar: 100 impulses/s in all panels. Range for pseudocolor maps of filters: ±10 impulses/s (A), ±37 impulses/s (B), ±10 impulses/s (C), and ±13 impulses/s (D).

When studied with gratings, unit 3003t was a nondirectional simple (F1/F0 = 1.8) cell with narrow orientation tuning. The unit responded only to Cartesian stimuli that had uninterrupted contrast bands along its orientation preference: the stimuli C0,k. These are the only stimuli that have uninterrupted contrast bands along the preferred orientation. The other Cartesian TDH stimuli Cj,k (j > 0) have j contrast-inversions along the preferred axis.

For some Cartesian stimuli C0,k (e.g., C0,3 and C0,4), this unit responded at stimulus onset, and was quiet at stimulus offset (top histogram of the pair). When the polarity of these stimuli was reversed (bottom histogram of the pair), it was quiet at onset, but produced a burst at stimulus offset. The opposite pattern was seen for Cartesian stimuli C0,2 and C0,5: response at offset for the first polarity, with response at onset for the opposite polarity. Responses to the polar-separated stimuli generally had this temporal pattern as well. This kind of behavior is qualitatively consistent with a linear filter that accounts for spatial selectivity, followed by temporal high-pass filtering and half-wave rectification (e.g., a low maintained firing rate) that accounts for the pattern of responses to a stimulus and its contrast-inverse. The unit responded robustly to some polar TDH stimuli and not to others; as we will see below, this spatial selectivity is fully consistent with that of an oriented filter-then-rectify (“LN”) model.

Unit 3003s was a simple (F1/F0 = 1.7) cell, more broadly tuned than unit 3003t, and also not directionally selective. It had a similar temporal pattern of responses to TDH stimuli of each polarity pair, both for the Cartesian and polar stimuli. In contrast to unit 3003t, however, there were also modest responses to stimuli C1,0, C2,0, and C3,0 (stimuli with contrast bands orthogonal to the preferred axis) and also to stimuli C1,2 and C1,3. The latter have contrast bands that run along the preferred axis, but contrast-reverse at the peak of the Gaussian and thus have no power in the preferred orientation. We will see below that these responses, and the selectivity for polar TDH stimuli, are also consistent with an LN model, but one with a broader orientation tuning associated with the initial linear stage.

Unit 3003u was a complex cell (F1/F0 = 0.6), and had a very different pattern of responses to TDH stimuli. Responses to the Cartesian stimuli were generally independent of stimulus polarity. Qualitatively consistent with a model consisting of an oriented filter followed by a mostly even nonlinearity, the largest responses in unit 3003u occurred for Cartesian stimuli C0,k, i.e., the stimuli whose contrast bands were aligned with the preferred orientation. Responses to polar stimuli, when present, were also independent of stimulus polarity. However, as we will see below, the pattern of selectivity for polar stimuli is inconsistent with the oriented filter that is implied by the selectivity for Cartesian stimuli.

Unit 3003x was also a complex cell (F1/F0 = 0.28), and, like unit 3003u, had responses to Cartesian stimuli that were generally independent of stimulus polarity and primarily responded to Cartesian stimuli C0,k. However, responses to polar stimuli, such as those in the middle of row 3 (A0,1, rank 2) and row 5 (A0,2, rank 4), were strongly dependent on stimulus polarity. Thus, although this neuron's polarity-dependency for Cartesian stimuli conformed to the expectations of a complex cell, many responses to polar stimuli were polarity-dependent, like those of units 3003t and 3003s above.


To make the above qualitative observations more precise, we introduce a modified filter-then-rectify model, as shown in Fig. 4. This model is not intended to correspond to anatomy or a wiring diagram, but rather to provide a means to compare the spatial selectivities of responses to Cartesian and polar stimuli. Below (see Energy models) we will also show that, as a consequence of some properties of the TDH functions, the measurements used to test the filter-then-rectify model can also be used to test several variants of energy models.

FIG. 4.

Filter-then-rectify framework for analyzing responses to Cartesian and polar TDH stimuli. L and E represent spatial filters; E is followed by full-wave rectification. This model is used to deduce the filter maps presented in Figs. 3, 5, and 6. For further details, see text.

Model description.

One branch of the model, characterized by a linear filter L, encompasses “on” and “off” inputs that behave in a linear fashion. A second branch, consisting of a linear filter E followed by full-wave rectification, generates onoff responses. The outputs of these branches are added together, along with a maintained firing rate Rm, to produce the neuron's output. The standard filter-then-rectify model makes specific predictions about the relationship between L and E, and conversely, combinations of L and E can be reinterpreted in terms of on and off inputs (see below).

To determine L and E from our data, we make the simplifying assumption that the neural response to each stimulus can be characterized by a scalar “response measure.” For this purpose, we will initially use the total spike count during stimulus presentation; dynamics will be considered later. With this initial simplification, the model response R(S) to a stimulus S is

$$R(S) = R_m + \iint L(x, y)\,S(x, y)\,dx\,dy + \left|\iint E(x, y)\,S(x, y)\,dx\,dy\right| \tag{1}$$

where L(x, y) represents the spatial weighting of the filter L in the “linear” branch and E(x, y) represents the spatial weighting of the filter E that precedes a full-wave rectification.

To determine the filters L and E from the responses to a set of TDH functions fk and their negatives −fk, we use the fact that each set of TDH functions (either Cartesian or polar) is an orthonormal basis. We therefore can express L and E as sums of TDH functions

$$\begin{aligned} L(x, y) &= \sum_a L_a f_a(x, y) \\ E(x, y) &= \sum_a E_a f_a(x, y) \end{aligned} \tag{2}$$

where La and Ea are the scalar coefficients in these two orthogonal expansions. It follows from the orthonormality of the functions fk that the responses of the model (Eq. 1) to the inputs fk and −fk are given by

$$\begin{aligned} R(f_k) &= R_m + L_k + |E_k| \\ R(-f_k) &= R_m - L_k + |E_k| \end{aligned} \tag{3}$$

From Eq. 3 it follows that

$$\begin{aligned} L_k &= \frac{R(f_k) - R(-f_k)}{2} \\ |E_k| &= \frac{R(f_k) + R(-f_k)}{2} - R_m \end{aligned} \tag{4}$$

This strategy of separating linear and nonlinear components based on responses to stimuli of opposite parity is similar to an approach suggested for sparse noise stimulation (Nykamp 2003); here we exploit the fact that the strategy does not require nonoverlapping stimuli, but merely orthogonal stimuli. The determination of L by Eq. 4 can be viewed as a variant of a “subspace reverse-correlation” approach (Ringach et al. 1997b), where we have chosen the subspace to consist of functions limited in spatial extent and bandwidth. Because we have two basis sets for the same subspace, one test of the model (see below) is that the determination of L must be the same for each basis set (Ringach et al. 1997b).

Equation 4 shows how the responses to either basis set specify the coordinates Lk and |Ek|. Conversely, as shown by Eq. 3, the coordinates Lk and |Ek|, along with the maintained firing rate Rm, fully and exactly specify the responses to each stimulus within the basis set used. Thus our strategy for testing the model of Fig. 4 is not to check consistency of filters L and E as determined with one basis set with the raw responses, but rather to check consistency of these filters across basis sets.
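The separation of the two pathways can be illustrated numerically. The following is a minimal sketch, assuming that Eqs. 3 and 4 take the forms R(±fk) = Rm ± Lk + |Ek|, Lk = [R(fk) − R(−fk)]/2, and |Ek| = [R(fk) + R(−fk)]/2 − Rm; the function names are hypothetical and do not come from the original analysis code.

```python
def split_linear_even(r_pos, r_neg, r_m):
    """Recover (L_k, |E_k|) from the responses to f_k and -f_k.

    r_pos, r_neg: response measures for the stimulus f_k and its negative.
    r_m: maintained firing rate (response to a blank).
    Assumed form of Eq. 4; see the lead-in.
    """
    l_k = (r_pos - r_neg) / 2.0          # odd (polarity-dependent) part
    e_abs = (r_pos + r_neg) / 2.0 - r_m  # even (polarity-independent) part
    return l_k, e_abs


def predicted_response(l_k, e_abs, r_m, polarity):
    """Assumed form of Eq. 3: polarity is +1 for f_k and -1 for -f_k."""
    return r_m + polarity * l_k + e_abs
```

Because the two coordinates reproduce the two responses exactly, a round trip through these functions returns the original pair of responses, which is why consistency must be tested across basis sets rather than within one.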

Relation to notions of linearity, “simple” and “complex.”

The model of Fig. 4 will behave in a linear fashion if, and only if, E = 0; in this case, the positive and negative lobes of L correspond to the on and off subfields. Special cases of the model for E ≠ 0 correspond to idealized “simple” and “complex” cells, as characterized by subfield organization (Hubel and Wiesel 1959, 1968). [We do not mean to imply that this distinction is identical to the simple vs. complex distinction based on the response to drifting gratings (Kagan et al. 2002b; Skottun et al. 1991b); the relationship of our model to models that focus on phase dependency is discussed below.] When L = E the model behavior is that of a linear filter, followed by half-wave rectification (i.e., negative signals are set to 0, positive signals are unchanged). This corresponds to an idealized “simple” cell with nonoverlapping on and off subfields, and linear combination of these signals before an output nonlinearity arising from the requisite nonnegativity of the firing rate. When L = 0 (but E ≠ 0) the model behavior is that of a linear filter, followed by full rectification (i.e., negative and positive signals are set to their absolute value). That is, the model produces on and off responses in coextensive areas of space, and is thus an idealized “complex” cell. If L and E are both nonzero but have a similar shape, the model of Fig. 4 simplifies into a one-pathway (LN) model, in which the nonlinearity is partially or asymmetrically rectifying (i.e., intermediate between linear and “simple” or between “simple” and “complex”). Models in which L and E have different shapes, the general case of Eq. 1, correspond to cells with a mixture of spatially distinct on, off, and onoff subfields.
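These special cases can be checked with a small numerical sketch. It assumes that the filters and the stimulus are represented by their coefficient vectors in a common orthonormal basis, so that the spatial integrals of Eq. 1 reduce to dot products; the function names are illustrative only.

```python
def dot(u, v):
    """Dot product of two equal-length coefficient vectors."""
    return sum(a * b for a, b in zip(u, v))


def two_pathway_response(L, E, S, r_m=0.0):
    """Assumed form of Eq. 1 in coefficient space:
    maintained rate + linear branch + full-wave rectified branch."""
    return r_m + dot(L, S) + abs(dot(E, S))


# Hypothetical filter and stimulus coefficients (not data from the paper).
F = [1.0, -2.0]
S = [3.0, 1.0]
S_neg = [-s for s in S]
ZERO = [0.0, 0.0]

linear = (two_pathway_response(F, ZERO, S), two_pathway_response(F, ZERO, S_neg))
simple = (two_pathway_response(F, F, S), two_pathway_response(F, F, S_neg))
complex_ = (two_pathway_response(ZERO, F, S), two_pathway_response(ZERO, F, S_neg))
```

With E = 0 the two responses are equal and opposite, with L = E the response to the nonpreferred polarity is clipped to zero (half-wave rectification, with a gain of 2 for the preferred polarity), and with L = 0 the two polarities give identical responses.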

Conversely, any feedforward neuron with a single nonlinearity consisting of half-wave, full-wave, or intermediate (asymmetric) rectification can be recast into the form of Eq. 1, by considering its responses to stimuli and their inverses. Once this has been done, the shapes and magnitudes of the filters L and E should be independent of the basis set used in Eqs. 2 and 4. [For L this is the argument of Ringach et al. (1997b); it extends to E by the symmetry-based separation of Eq. 4]. Because Cartesian and polar stimuli each constitute a basis set, the above model allows us to ask whether the responses to Cartesian and polar stimuli are consistent with a large category of feedforward models and, if not, the manner in which they deviate.

Some details.

The above procedure determines L uniquely (within the linear span of the fk), but the filter E is ambiguous because the data determine the magnitude of each coefficient in its orthogonal expansion, but not its sign (Eq. 4, bottom portion). Any assignment of signs to the coefficients for E will result in a filter that will lead to the same responses. For the purposes of graphical display, we choose the sign of each coefficient Ek to match that of the corresponding Lk. This is a conservative choice, in that it leads to a visual rendition for E that is as similar as possible (within the constraints of the data) to that of L. Other strategies for fixing the signs of the coefficients Ek, such as minimizing the spatial extent of E or making it as sparse as possible, might also be considered. All of our statistical analyses related to E are based on the absolute values of its coefficients in the orthogonal expansion (Eq. 4, bottom portion) and are thus unaffected by the method chosen to resolve the sign ambiguity.
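The sign convention used for display can be sketched as follows. This is an illustration only; the helper name is hypothetical, and the treatment of coefficients for which Lk is exactly zero (here they keep a positive sign) is an assumption, because the text does not specify it.

```python
import math


def assign_display_signs(L_coeffs, E_abs):
    """Give each |E_k| the sign of the corresponding L_k (display only).

    math.copysign(x, y) returns the magnitude of x with the sign of y,
    so a coefficient with L_k == 0.0 is left positive (an assumption).
    """
    return [math.copysign(e, l) for l, e in zip(L_coeffs, E_abs)]


def power(coeffs):
    """Sum of squared coefficients; unaffected by any choice of signs."""
    return sum(c * c for c in coeffs)
```

Because only the signs change, quantities that depend on the coefficient magnitudes alone, such as the sum of squares, are identical before and after the assignment.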

A second detail is that we set Ek = 0 if, on a trial-by-trial basis, the mean response [R(fk) + R(−fk)]/2 did not deviate from the response elicited by a blank, at a 95% confidence limit (by t-test). The implications of this manipulation are discussed below.

Calculations of the filters L and E were performed on a grid of 64 × 64 or larger with σ set equal to of the grid. On this grid, numerical approximations to orthogonality were better than one part in 10^5 and the largest values of the functions that lay beyond the grid were < of the peak. Thus the consequences of discretization, both in the display of the functions (see Visual stimulus generation) and subsequent analysis, are negligible. Because the basis functions are smooth, the finite linear combinations of these functions as specified in Eq. 2 are smooth as well and no further smoothing was applied.

The number of spikes used to calculate the maps ranged from 457 to 41,797, with a mean of 6,597 and a median of 4,135. Data sets with relatively few spikes were included only if the responses appeared reliable (e.g., PSTHs clearly modulated by stimulus appearance and disappearance).


To determine the extent to which the estimated filters L and E correspond to certain idealized notions of simple and complex cells (and to mitigate the difficulties related to the ambiguity of E), we construct two kinds of indices, Isym and Ishape. Isym, which is calculated separately for each basis set (denoted Isymcart or Isympolar), compares the strengths of the filters L and E, but ignores their shapes. Generically

$$I_{sym} = \frac{|E|^2 - |L|^2}{|E|^2 + |L|^2} \tag{5}$$

Here, |L|² and |E|² indicate spatial integrals of the squared response profiles

$$\begin{aligned} |L|^2 &= \iint L(x, y)^2\,dx\,dy = \sum_a L_a^2 \\ |E|^2 &= \iint E(x, y)^2\,dx\,dy = \sum_a E_a^2 \end{aligned} \tag{6}$$

where La and Ea are the coefficients determined by Eq. 4 from the basis set of interest. The second equality on each line is a consequence of the orthonormality of the basis functions. Note that this implies that |E|² is independent of the signs assigned to each coefficient Ea.

For an idealized complex cell (in the sense of overlapping, equally strong, on and off subregions), L = 0 and so Isym = 1. For an idealized simple cell (in the sense of separate on and off subregions) consisting of a linear filter followed by half-wave rectification, L = E and so Isym = 0. For a cell that is truly linear (e.g., has a sufficiently high firing rate to avoid rectification), E = 0 and so Isym = −1. Intermediate values of Isym correspond to asymmetric rectification; overrectification for 1 > Isym > 0 (negative and positive signals both transformed to signals of the same sign, but with unequal gains) and underrectification for 0 > Isym > −1 (negative and positive signals unchanged in sign, but transmitted with unequal gains).
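The three landmark values of the index can be reproduced with a short sketch, assuming that Eq. 5 has the form Isym = (|E|² − |L|²)/(|E|² + |L|²) with the powers computed as sums of squared expansion coefficients (Eq. 6). The function name and the example coefficients are hypothetical.

```python
def i_sym(L_coeffs, E_coeffs):
    """Symmetry index from expansion coefficients (assumed form of Eq. 5)."""
    power_L = sum(l * l for l in L_coeffs)
    power_E = sum(e * e for e in E_coeffs)
    total = power_L + power_E
    if total == 0:
        raise ValueError("index is undefined when both filters are zero")
    return (power_E - power_L) / total


# Hypothetical coefficients chosen only to illustrate the landmarks.
idealized_complex = i_sym([0.0, 0.0], [1.0, 2.0])   # L = 0
idealized_simple = i_sym([1.0, -2.0], [1.0, 2.0])   # |L_a| = |E_a| for all a
truly_linear = i_sym([1.0, -2.0], [0.0, 0.0])       # E = 0
```

Only the magnitudes of the E coefficients enter, so the index does not depend on how the sign ambiguity of E is resolved.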

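The model of Fig. 4 places no constraints on the shape of L, but requires that L is independent of basis set (Cartesian vs. polar). To test this prediction, we use an index Ishape, the spatial correlation coefficient of the estimates Lcart and Lpolar derived from the two basis sets. For Lcart and Lpolar expressed as maps

$$I_{shape}(L_{cart}, L_{polar}) = \frac{\iint L_{cart}(x, y)\,L_{polar}(x, y)\,dx\,dy}{\sqrt{\iint L_{cart}(x, y)^2\,dx\,dy \iint L_{polar}(x, y)^2\,dx\,dy}} \tag{7}$$

Equivalently, expressed in terms of the expansion coefficients of Eq. 2

$$I_{shape}(L_{cart}, L_{polar}) = \frac{\sum_{a,b} L_a^{cart}\,c_{a,b}\,L_b^{polar}}{\sqrt{\sum_a (L_a^{cart})^2 \sum_b (L_b^{polar})^2}} \tag{8}$$

where ca,b is the dot product of the ath Cartesian function and the bth polar function

$$c_{a,b} = \iint f_a^{cart}(x, y)\,f_b^{polar}(x, y)\,dx\,dy \tag{9}$$

For filter shape to be independent of basis set, Ishape(Lcart, Lpolar) = 1.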
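The coefficient-based form of the index can be illustrated with a toy example. The sketch below assumes that Eq. 8 is the normalized bilinear form of the two coefficient vectors with the cross-basis matrix ca,b, and it replaces the TDH bases with two orthonormal bases of the plane (the standard basis and a rotated copy), which stand in for the Cartesian and polar sets purely for illustration.

```python
import math


def i_shape_cross(L_cart, L_polar, c):
    """Assumed form of Eq. 8; c[a][b] is the dot product of the a-th
    'Cartesian' basis function with the b-th 'polar' basis function."""
    num = sum(L_cart[a] * c[a][b] * L_polar[b]
              for a in range(len(L_cart)) for b in range(len(L_polar)))
    den = math.sqrt(sum(x * x for x in L_cart) * sum(y * y for y in L_polar))
    return num / den


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


# Toy bases: standard basis and the same basis rotated by 45 deg.
theta = math.pi / 4
cart_basis = [(1.0, 0.0), (0.0, 1.0)]
polar_basis = [(math.cos(theta), math.sin(theta)),
               (-math.sin(theta), math.cos(theta))]
c = [[dot(fa, fb) for fb in polar_basis] for fa in cart_basis]

# One underlying filter expanded in both bases gives an index of 1 ...
v = (1.0, 2.0)
same = i_shape_cross([dot(v, f) for f in cart_basis],
                     [dot(v, f) for f in polar_basis], c)
# ... and an orthogonal filter in the second basis gives an index of 0.
w = (-2.0, 1.0)
different = i_shape_cross([dot(v, f) for f in cart_basis],
                          [dot(w, f) for f in polar_basis], c)
```

The first case corresponds to the model prediction that a single filter L underlies the responses to both basis sets.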

The model of Fig. 4 reduces to a single-pathway model when L and E have the same shape. Analogous indices Ishape(Lcart, Ecart) and Ishape(Lpolar, Epolar) express the similarity of the estimated shapes of these filters, as determined from each basis set. Because of the sign ambiguity in the determination of E, we choose the conservative definitions

$$I_{shape}(L_{cart}, E_{cart}) = \frac{\sum_a |L_a^{cart}|\,|E_a^{cart}|}{\sqrt{\sum_a (L_a^{cart})^2 \sum_a (E_a^{cart})^2}} \tag{10}$$

and similarly for Ishape(Lpolar, Epolar). This definition is conservative in that it makes Ishape(L, E) as close to 1 as possible, consistent with the data.
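A minimal sketch of this conservative index follows. It assumes that Eq. 10 is the normalized sum of products of coefficient magnitudes, which is the largest correlation attainable over all sign assignments for E; the function name is hypothetical.

```python
import math


def i_shape_LE(L_coeffs, E_abs):
    """Conservative similarity of L and E within one basis set
    (assumed form of Eq. 10): signs of E are chosen to match L."""
    num = sum(abs(l) * abs(e) for l, e in zip(L_coeffs, E_abs))
    den = math.sqrt(sum(l * l for l in L_coeffs) *
                    sum(e * e for e in E_abs))
    return num / den
```

For example, L = (3, −4) with |E| = (3, 4) gives an index of 1, whereas filters supported on different basis functions, such as L = (1, 0) and |E| = (0, 1), give 0 regardless of the signs chosen.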

Estimates of Isym and Ishape quoted below were calculated from Eqs. 5, 8, and 10 and debiased by a jackknife procedure (Efron and Tibshirani 1998) based on each block of trials; quoted SEs of measurement were determined in a similar fashion.
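For readers unfamiliar with the procedure, the following sketch shows the standard delete-one jackknife applied to blocks of trials. It is a generic textbook form, not the authors' code; the function names are hypothetical, and how the statistic is computed from the retained blocks is left to the caller.

```python
import math


def jackknife(blocks, statistic):
    """Delete-one-block jackknife.

    blocks: list with one entry per block of trials.
    statistic: function mapping a list of blocks to a number.
    Returns (debiased estimate, standard error).
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    full = statistic(blocks)
    # Recompute the statistic with each block dropped in turn.
    dropped = [statistic(blocks[:i] + blocks[i + 1:]) for i in range(n)]
    dropped_mean = sum(dropped) / n
    debiased = n * full - (n - 1) * dropped_mean
    se = math.sqrt((n - 1) / n * sum((d - dropped_mean) ** 2 for d in dropped))
    return debiased, se


def mean_stat(xs):
    return sum(xs) / len(xs)


def plugin_variance(xs):
    """Variance with division by n, which is biased downward."""
    m = mean_stat(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)
```

Applied to the mean, the procedure leaves the estimate unchanged and returns the usual standard error; applied to the plug-in variance, it recovers the unbiased variance, which illustrates the debiasing step.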

As described above, before the calculation of the maps and indices we set Ek = 0 if the raw estimates of Ek did not deviate significantly from zero. Nearly all the excluded values were slightly positive. This exclusion avoids the tendency of random spikes to bias the map of E toward that of L [i.e., removes a bias of Ishape(L, E) toward the null hypothesis value of 1] and also reduces random contributions to estimates of the overall size of E (i.e., removes a bias of Isym away from the null hypothesis value of 1). Although such biases would also be removed (in the asymptotic limit) by the jackknifing procedure, we considered it more appropriate to remove them at the source. Moreover, because the excluded responses are small, including or excluding them has only a small effect on the filter maps and the resulting statistics, as confirmed by reanalysis of the full data set from one animal without this exclusion.


We now use the above model to analyze the responses at Site 1 (Fig. 3). For unit 3003t (Fig. 3A), the filter L extracted from the Cartesian responses had several parallel lobes oriented along the preferred orientation, consistent with a Gabor-like spatial filter. Shapes of the L filters determined from the Cartesian and polar responses were similar but statistically distinguishable: Ishape(Lcart, Lpolar) = 0.74 ± 0.09. The overall size of the L filter was somewhat larger than that of the E filter, indicating that the “linear” responses dominated the “onoff” responses (3003t had F1/F0 = 1.8). Correspondingly, Isymcart = −0.63 ± 0.22 and Isympolar = −0.34 ± 0.16, consistent with underrectification. Finally, the even-order pathway filter (E) and the L filter had similar shapes, [Ishape(Lcart, Ecart) = 0.79 ± 0.17, Ishape(Lpolar, Epolar) = 0.89 ± 0.13], consistent with a reduction of the model of Fig. 4 to a single-pathway LN model. In sum, the indices show that, although a single-pathway feedforward model with an underrectifying nonlinearity might be considered as a first approximation, a more quantitative analysis reveals clear deviations from this picture.

Unit 3003s (Fig. 3B, F1/F0 = 1.7, broad orientation tuning) showed little deviation from a simple LN model, even when analyzed quantitatively. Consistent with its broader orientation tuning, the sensitivity profile of the L filter had only two lobes, and the lobes were less elongated than those of unit 3003t. The shapes of these filters were similar as determined from either Cartesian or polar stimuli: Ishape(Lcart, Lpolar) = 0.94 ± 0.02. As with unit 3003t, the “linear” responses dominated the onoff components: Isymcart = −0.59 ± 0.02 and Isympolar = −0.60 ± 0.02, consistent with underrectification. The L filters and E filters were similar in shape [Ishape(Lcart, Ecart) = 0.99 ± 0.01 and Ishape(Lpolar, Epolar) = 0.97 ± 0.02], consistent with a reduction to an LN model.

The other two units at this location had very different behavior. For unit 3003u (Fig. 3C), although both Cartesian and polar stimuli elicited responses that led to oriented, Gabor-like maps, the orientation of these maps differed by approximately 37 deg. Correspondingly, Ishape(Lcart, Lpolar) = 0.37 ± 0.15, a substantial deviation from 1. Consistent with its low F1/F0 ratio of 0.6, the E filter dominated the L filter: Isymcart = 0.48 ± 0.12 and Isympolar = 0.32 ± 0.22. For the Cartesian stimuli, the shapes of the L and E filters were similar [Ishape(Lcart, Ecart) = 0.86 ± 0.15]; there was a moderate difference for the polar stimuli [Ishape(Lpolar, Epolar) = 0.57 ± 0.16]. Thus if only the responses to Cartesian stimuli, or only the responses to polar stimuli, are considered, this neuron's response is generally consistent with an oriented filter followed by overrectification (producing an onoff response). However, the full set of responses is qualitatively inconsistent with this picture: the apparent orientation of the initial filter depends substantially on the basis set used.

Unit 3003x (Fig. 3D, F1/F0 = 0.28) showed yet another kind of behavior. The spatial maps of the L and E filters were similar across basis set [Ishape(Lcart, Lpolar) = 0.97 ± 0.14] and similar to each other [Ishape(Lcart, Ecart) = 0.98 ± 0.12 and Ishape(Lpolar, Epolar) = 1.0 ± 0.07]. However, confirming the impression that responses to Cartesian stimuli were more symmetrically onoff than responses to polar stimuli, Isymcart = 0.56 ± 0.11 (overrectification) but Isympolar = 0.13 ± 0.18 (half-wave rectification). In sum, although an LN model can give a reasonable account of the responses to either stimulus set alone, the apparent degree of nonlinearity for this unit is substantially higher for Cartesian than for polar stimuli.


Figure 5 shows TDH responses from three neurons in a cluster located in upper layer VI/lower layer V of cat V1, and emphasizes the heterogeneity of behavior encountered. Units 3301s (Fig. 5A) and 3301t (Fig. 5B) had similar orientation optima for drifting gratings (100 and 90 deg), whereas unit 3301u had an optimum orientation of 200 deg. Unit 3301t was strongly directionally selective; the other two neurons were not. The Cartesian responses of units 3301s and 3301t, as characterized by L and E filters, were oriented along their preferred orientation, and similar in shape: Ishape(Lcart, Ecart) = 0.99 ± 0.01 for 3301s; Ishape(Lcart, Ecart) = 0.94 ± 0.03 for 3301t. The relative sizes of the L and the E filters were also consistent with the degree of nonlinearity seen in the grating responses. That is, the relative sizes of the L and the E filters for both units were in the “underrectification” range: unit 3301t had a larger contribution from the E filter than unit 3301s (Isymcart = −0.62 ± 0.03 for 3301s, Isymcart = −0.10 ± 0.08 for 3301t), corresponding to the difference in their F1/F0 ratios (F1/F0 = 1 for 3301s, F1/F0 = 0.1 for 3301t). However, both units were nearly unresponsive to polar stimuli. This behavior is qualitatively inconsistent with an LN picture: a broadly tuned front end could not account for the absence of responses to the polar stimuli because they overlap extensively with the Cartesian stimuli in spatial frequency content, whereas a narrowly tuned front end could not account for the presence of responses to the Cartesian stimuli across many ranks. In the 15 other neurons recorded at four other infragranular recording sites in cat, we encountered one additional neuron that responded well to Cartesian stimuli but not to polar stimuli. No such neurons were encountered in the single infragranular site in macaque (layer V, four neurons).

FIG. 5.

PSTHs of responses of 3 simultaneously recorded neurons in upper layer VI/lower layer V of cat V1 to TDH functions. Data are displayed as in Fig. 3. Units of A and B respond nearly exclusively to the Cartesian stimuli; the unit of C responds in a similar fashion to both basis sets. A, B, and C: units 3303s, t, and u. PSTH scale bar: 75 impulses/s in A and B, 50 impulses/s in C. Range for pseudocolor maps of filters: ±6 impulses/s (A), ±8 impulses/s (B), and ±5 impulses/s (C).

Unit 3301u (Fig. 5C) had a somewhat smaller response, but was approximately equally responsive to Cartesian and polar stimuli. The maps of the L and E filters were consistent with an orientation preference nearly perpendicular to that of the other two units. Notably, this neuron, which would be classified as “complex” from its grating responses (F1/F0 = 0.3), had a predominantly linear response (Isymcart = −0.85 ± 0.15, Isympolar = −0.75 ± 0.10).


Figure 6A shows responses from unit 5013s, one of three units simultaneously recorded in layer IVb of macaque V1. All three neurons were poorly oriented simple cells (F1/F0 ≥ 1.6) and directionally biased (0.2 ≤ DSI ≤ 0.6), consistent with the preponderance of directional-selective neurons in layer IVb (Hawken et al. 1988). Responses were robust, reaching 150 impulses/s. For each pair of opposite-polarity stimuli, the neuron responded at onset to one stimulus, and at stimulus offset to the other. The other two neurons had largely overlapping receptive field profiles, but differed in the sizes of the spike waveform across the tetrode channels, and in response dynamics. As in unit 3003t of Fig. 3A, quantitative analysis indicated consistency of both the Cartesian and polar responses [Ishape(Lcart, Lpolar) = 0.99 ± 0.01] with a one-pathway simplification of Fig. 4 [Ishape(Lcart, Ecart) = 0.99 ± 0.01, Ishape(Lpolar, Epolar) = 0.98 ± 0.01] and an underrectifying nonlinearity: Isymcart = −0.71 ± 0.04, Isympolar = −0.69 ± 0.04.

FIG. 6.

PSTHs of responses of 3 neurons at separate locations in macaque V1. Data are displayed as in Fig. 3. For the unit of A, but not for the other units, the filters L and E deduced from the 2 basis sets are similar. A, B, and C: units 5013s, 5007t, and 5008s. PSTH scale bar: 150 impulses/s in A and B, 75 impulses/s in C. Range for pseudocolor maps of filters: ±60 impulses/s (A), ±40 impulses/s (B), and ±50 impulses/s (C).

Along this penetration at the layer IVb/c border, we isolated two units, a nonoriented complex unit 5007s (F1/F0 = 0.1) and the narrowly tuned directionally selective (DSI = 0.8) unit 5007t (F1/F0 = 0.8) shown in Fig. 6B. Responses of 5007s to TDH stimuli were consistent with the standard picture of a complex cell (small L, large but nonoriented E, for both Cartesian and polar stimuli). Corresponding to its intermediate F1/F0 ratio, unit 5007t (Fig. 6B) responded in an excitatory fashion to the appearance of both members of a polarity pair, although the sizes of these responses were often not equal, which is qualitatively consistent with a mixture of quasilinear and onoff inputs. The filters Lcart and Ecart consisted of elongated domains consistent with the orientation preference for grating stimuli; the orientation domains were similar [Ishape(Lcart, Ecart) = 0.88 ± 0.04] but were more clearly delineated for Ecart than for Lcart. The filter Lpolar was similar to Lcart [Ishape(Lcart, Lpolar) = 0.94 ± 0.05] and consisted of small, minimally elongated blobs, although the dominant orientation of the elongated components of Epolar was shifted approximately 20 deg with respect to that of Ecart. [We do not calculate an index Ishape(Ecart, Epolar) because of the ambiguities in the estimation of the E filters, as described above.] The symmetry index was shifted modestly in the direction of greater rectification for polar stimuli: Isymcart = 0.39 ± 0.06, Isympolar = 0.53 ± 0.06.

Near the border of layer IVcβ and layer V, we isolated three complex cells (F1/F0 ratio of 0.1 to 0.15), all of which were highly responsive to gratings and directionally biased or directionally selective (DSI ≥ 0.5). Histologically, this recording site was at the lower border of layer IVcβ. However, these response properties are more consistent with neurons in upper layer V, which would be within the likely recording sphere of the tetrode (Gray et al. 1995). One of these three units, 5008s (Fig. 6C), had predominantly even-order inputs for both basis sets: Isymcart = 0.78 ± 0.09, Isympolar = 0.88 ± 0.13. However, a clear oriented receptive field domain consistent with this neuron's orientation tuning for gratings was seen only for Ecart (the horizontal excitatory subregion). In contrast, Lcart and Lpolar were weak and nonoriented; Epolar was strong but its orientation was not consistent with the orientation tuning of the grating responses. Also at the same recording site, responses generated by unit 5008t (not shown) were partially consistent with the standard picture of a complex cell (small L, large E for both Cartesian and polar stimuli), but Ecart and Epolar differed substantially in shape. Unit 5008u (also not shown) was the only macaque unit that responded well to drifting gratings but not to TDH stimuli. This was the most directionally selective neuron we encountered (DSI ≈ 1.0).

Population summary

Above, we introduced indices derived from an extension of the standard linear–nonlinear model to analyze the responses to Cartesian and polar TDH stimuli. The index Ishape(Lcart, Lpolar) (Eq. 8) indicates the extent to which the linear filters that best account for the responses to the Cartesian and polar stimuli are similar. (An analogous index for the even-order responses is not straightforward to calculate because of the sign ambiguities described above in connection with Eq. 10.) The indices Ishape(Lcart, Ecart) and Ishape(Lpolar, Epolar) (Eq. 10) determine, for responses to each basis set, the extent to which the two-pathway model of Fig. 4 reduces to a single-pathway model. The indices Isym(Lcart, Ecart) and Isym(Lpolar, Epolar) (Eq. 5) determine, for responses to each basis set, whether the response is primarily full-wave rectifying (Isym = 1), consistent with linearity (Isym = −1), or intermediate. We now examine the distribution of these indices and related quantities across the population.


A value of 1 for the index Ishape(Lcart, Lpolar) corresponds to equality of the estimated Cartesian and polar filter shapes, but measurement errors would tend to bias estimates of Ishape downward away from 1. Therefore, as described in methods, we used the jackknife procedure (Efron and Tibshirani 1998) to debias the estimates and to determine confidence limits on them. Across the 51 neurons (Fig. 7A), the debiased estimate of Ishape had a mean of 0.76 ± 0.26, with f0.05 = 28/51 and f0.01 = 14/51 (here and below, population statistics are summarized as mean ± SD of the debiased estimates, along with f0.05, the fraction significantly <1 at P < 0.05, and f0.01, the fraction significantly <1 at P < 0.01).

FIG. 7.

Distribution of the index Ishape(Lcart, Lpolar) (Eq. 8). Values <1 indicate different effective filtering behavior for Cartesian and polar stimuli. Portions of the histograms shaded black represent units for which values were significantly (by jackknife) <1 at P < 0.01; portions shaded gray are significant at 0.01 < P < 0.05; unshaded portions correspond to P > 0.05. Each panel contains calculations based on a different response measure.

The cat subset (mean 0.73 ± 0.27, f0.05 = 21/34, f0.01 = 9/34) and the macaque subset (mean 0.81 ± 0.24, f0.05 = 9/17, f0.01 = 5/17) were similar to each other in this regard (P > 0.20 by Kruskal–Wallis test). The simple cell subset (mean 0.82 ± 0.19, f0.05 = 7/10, f0.01 = 4/10) and the complex cell subset (mean 0.75 ± 0.27, f0.05 = 21/41, f0.01 = 10/41) were also not statistically distinguishable (P > 0.20 by Kruskal–Wallis test). There was a suggestion (P = 0.07, Kruskal–Wallis test) that differences between the filters derived from Cartesian and polar stimuli were more prominent in the infragranular recordings (mean 0.66 ± 0.30, f0.05 = 14/22, f0.01 = 8/22) than in granular (mean 0.83 ± 0.19, f0.05 = 8/16, f0.01 = 3/16) or supragranular recordings (mean 0.84 ± 0.20, f0.05 = 6/13, f0.01 = 3/13).

Thus most neurons in cat and macaque V1 showed a difference in effective filtering behavior when tested with a Cartesian versus a polar stimulus set, and this phenomenon was not restricted to the input or output laminae.

The analysis of Fig. 7A and the subset analysis above used the total number of spikes during the stimulus presentation (0 to 250 ms) as a response measure (the “on response”). This simple but rather gross response measure may overlook a possibly smaller or greater degree of similarity between the maps over the response time course. To test this, we recalculated the index Ishape(Lcart, Lpolar) for other response measures (Fig. 7, B–E): the number of spikes from 0 to 100 ms after stimulus onset (the “on transient”), the number of spikes during the 250-ms off-period (the “off response”), the number of spikes during the first 100 ms of the off-period (the “off transient”), and the first principal component (“PC1”; see Response dynamics below).
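The four spike-count measures can be made concrete with a short sketch. It assumes that spike times are expressed in milliseconds relative to stimulus onset and that the 250-ms off-period begins immediately at stimulus offset (250 ms); the dictionary and function names are hypothetical, and the principal-component measure is not covered.

```python
# Analysis windows in ms relative to stimulus onset (half-open intervals).
# The placement of the off-period at 250-500 ms is an assumption.
WINDOWS_MS = {
    "on response": (0, 250),
    "on transient": (0, 100),
    "off response": (250, 500),
    "off transient": (250, 350),
}


def response_measures(spike_times_ms):
    """Count the spikes falling in each analysis window for one trial."""
    return {
        name: sum(1 for t in spike_times_ms if start <= t < stop)
        for name, (start, stop) in WINDOWS_MS.items()
    }


# Hypothetical spike train, for illustration only.
counts = response_measures([10, 50, 120, 260, 300, 400])
```

Each measure would then be passed through the same filter estimation (Eq. 4) and index calculation as the on response.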

As seen from Fig. 7, the distribution of Ishape(Lcart, Lpolar) and the number of units in which this index was significantly <1 were similar for the first five response measures. The significant deviations tended to occur in the same units (not shown). The similarity across these response measures (Fig. 7, A–E) indicates that the discrepancy between the effective filtering properties for Cartesian and polar stimuli is not a consequence of the temporal weighting of the response measure. As we will see below, the temporal aspects of the responses to Cartesian and polar stimuli are nearly identical and are heavily dominated by the first principal component, which corroborates the essentially spatial nature of this result.

Orientation and spatial frequency.

Ishape(Lcart, Lpolar) is an omnibus index of the difference in the shapes of the maps Lcart and Lpolar. We used this nonparametric approach because many of the maps do not conform closely to Gabor profiles or other shapes that are well described by a small number of parameters. To seek systematic trends in how these maps change, we now examine two parametric descriptors of the maps Lcart and Lpolar: the orientations ORpeakcart and ORpeakpolar and spatial frequencies SFpeakcart and SFpeakpolar of the peak in their Fourier transforms (Fig. 8). As with Ishape(Lcart, Lpolar), 95% confidence limits on these parameters were determined by a jackknife applied to maps determined with single blocks of trials dropped.

FIG. 8.

Best orientation (deg) (A) and spatial frequency (c/deg) (B) as determined by Fourier transformation of the maps of the spatial filters Lcart and Lpolar. Error bars are 95% confidence limits determined by jackknife, and data are plotted only for units in which there was a well-defined best orientation or spatial frequency. There was a modest (rcirc = 0.44, P < 0.02) correlation between the estimated best orientations and no correlation between the estimated best spatial frequencies. See text for details.

Values of ORpeak were considered significant if their confidence limits included less than the full range of 0 to 180 deg. By this criterion, 35 units (of 51 total) had a significant ORpeakcart; 25 units had a significant ORpeakpolar. In the 20 units in which ORpeakcart and ORpeakpolar were both significant, they differed from each other in 8/20 (P < 0.05, two-tailed t-test based on jackknifed SEs). The average angular difference between ORpeakcart and ORpeakpolar was 28 deg. Within units, values of ORpeakcart and ORpeakpolar were correlated [circular correlation (Fisher 1993) rcirc = 0.44, P < 0.02 by permutation test].
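Because orientation is an axial quantity with a period of 180 deg, differences between orientations must be wrapped before they are averaged. The sketch below shows one common way to do this; it is an assumption about how such a difference is computed, not a description of the authors' code, and the function names are hypothetical.

```python
def orientation_difference(a_deg, b_deg):
    """Smallest angle between two orientations (period 180 deg).

    The result lies between 0 and 90 deg.
    """
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


def mean_orientation_difference(pairs):
    """Average wrapped difference over (cart, polar) orientation pairs."""
    diffs = [orientation_difference(a, b) for a, b in pairs]
    return sum(diffs) / len(diffs)
```

For instance, orientations of 10 and 170 deg differ by 20 deg rather than 160 deg, and orientations of 200 and 20 deg are identical.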

Values of SFpeak were considered significant if their confidence limits did not include 0. By this criterion, 44 units had a significant SFpeakcart; 38 units had a significant SFpeakpolar. Of the 33 units in which SFpeakcart and SFpeakpolar were both significant, they differed from each other (P < 0.05) in 18/33. There was no significant difference of SFpeakcart versus SFpeakpolar across the population (paired t-test), but the differences within individual units were substantial: the average of the ratio difference {max [(SFpeakcart/SFpeakpolar), (SFpeakpolar/SFpeakcart)]} was 2.1. Within units, SFpeakcart and SFpeakpolar were uncorrelated (r = 0.16, P > 0.2). There was no significant correlation between Ishape(Lcart, Lpolar) and the change in SFpeak or ORpeak.
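The ratio difference defined in braces above is symmetric in the two spatial frequencies and equals 1 when they agree. A minimal sketch, with hypothetical function names and example values that are not taken from the data:

```python
def ratio_difference(sf_cart, sf_polar):
    """max(sf_cart / sf_polar, sf_polar / sf_cart) for positive frequencies."""
    if sf_cart <= 0 or sf_polar <= 0:
        raise ValueError("spatial frequencies must be positive")
    return max(sf_cart / sf_polar, sf_polar / sf_cart)


def mean_ratio_difference(pairs):
    """Average ratio difference over (cart, polar) frequency pairs."""
    ratios = [ratio_difference(a, b) for a, b in pairs]
    return sum(ratios) / len(ratios)
```

Because the measure ignores the direction of the change, a population can show a large average ratio difference even when, as reported here, there is no systematic difference between SFpeakcart and SFpeakpolar.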

In sum, this analysis suggests that the dependence of apparent spatial frequency on basis set is more prominent than that of orientation tuning. Caution must be exercised in interpreting this result because the parametric descriptors are not robust or complete descriptors of Lcart or Lpolar. Nevertheless, it is worth noting a parallel to the findings of Touryan et al. (2005), who found a very close match between the orientation of receptive field maps obtained from noise stimuli and natural stimuli, but modest discrepancies between their spatial frequency tunings.


The above analyses considered the shape of the receptive field, but ignored the overall responsiveness to Cartesian and polar stimuli, as well as the degree of nonlinearity. Overall responsiveness to Cartesian or polar stimuli was quantified by

$$I_{c-p} = \frac{P_{cart} - P_{polar}}{P_{cart} + P_{polar}} \tag{11}$$

where Pcart and Ppolar denote the total power elicited by the set of Cartesian stimuli and by the set of polar stimuli, respectively. This is essentially a Michelson contrast between these two quantities. It is 1 for a unit that responds only to Cartesian stimuli, −1 for a unit that responds only to polar stimuli, and 0 for a unit whose overall responses to the two classes are identical. Estimates of Ic–p were debiased by the jackknife procedure, and two-tailed tests of significance were used. For 13 of the 51 cells, responsiveness to Cartesian stimuli was significantly greater than responsiveness to polar stimuli; for eight of the 51 cells, the difference in the opposite direction was significant (Fig. 9A). Across the population, there was only a slight and borderline significant (P = 0.05) bias of Ic–p in favor of Cartesian stimuli (mean 0.087 ± 0.29, median = 0.041).
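The behavior of a Michelson-type index at its limits can be checked with a brief sketch. It assumes only that the index has the form (Pcart − Ppolar)/(Pcart + Ppolar) for two nonnegative power values; how the total power is computed from the responses is not specified here, and the function name is hypothetical.

```python
def michelson_index(power_cart, power_polar):
    """Michelson-type contrast between two nonnegative power values."""
    total = power_cart + power_polar
    if total <= 0:
        raise ValueError("index is undefined when there is no response")
    return (power_cart - power_polar) / total
```

A unit with three times as much power for Cartesian as for polar stimuli would have an index of 0.5 under this definition.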

FIG. 9.

Distribution of relative responsiveness to Cartesian and polar stimuli, Ic–p (Eq. 11). Values >0 indicate larger responses to Cartesian stimuli; values <0 indicate larger responses to polar stimuli. Significance levels were calculated by jackknife and are shown as in Fig. 7. Each panel contains calculations based on a different response measure.

Similar findings were obtained with the other response measures (Fig. 9, B–E), and the units that showed these differences were similarly prevalent in cat and macaque, simple and complex subsets, and across the laminae.

In sum, 28 of 51 units manifested a difference in receptive field shape when studied with Cartesian and polar basis sets (Fig. 7A), and 21 of 51 units manifested a difference in responsiveness (Fig. 9A). Twelve units manifested both differences and only 14 of the 51 units manifested neither difference.

Comparison of tuning within each basis set.

Overall responsiveness to the members of the two basis sets and selectivity of responses within each basis set need not covary. For example, a neuron might have generally larger responses to Cartesian stimuli than to polar stimuli, but might be tuned very sharply to specific polar stimuli. Thus we next determine the extent to which V1 neurons are selectively tuned to the stimuli within the two basis sets. A natural measure of the narrowness of tuning within each basis set is the kurtosis of the distribution of the responses to each of the stimuli. This measure has the advantage that it does not require an assumption about the nature of the tuning. Additionally, this measure (Olshausen and Field 1997; Vinje and Gallant 2000, 2002) can be taken as a measure of the sparseness of the responses to either stimulus set. With R(S) denoting the response to the stimulus S (as before), the kurtosis γ is defined as Math(12) where 〈 〉 indicates an average over all stimuli in one of the two basis sets (and their contrast-inverses).
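The kurtosis measure can be sketched as follows. This is an illustration only: it assumes the conventional excess kurtosis (fourth central moment divided by the squared variance, minus 3), and whether Eq. 12 includes the offset of 3 is an assumption here. The function name and input layout are hypothetical.

```python
import numpy as np

def response_kurtosis(responses):
    """Excess kurtosis of the responses across the stimuli of one basis set.

    `responses` is a hypothetical 1D array with one entry per stimulus
    (including contrast-inverses). The "- 3" offset is an assumption.
    """
    r = np.asarray(responses, dtype=float)
    d = r - r.mean()              # deviations from the mean response
    m2 = np.mean(d ** 2)          # second central moment
    m4 = np.mean(d ** 4)          # fourth central moment
    return m4 / m2 ** 2 - 3.0

# A unit that responds to a single stimulus is sparser (higher kurtosis)
# than a unit that responds equally strongly to all stimuli.
sparse = response_kurtosis([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
dense = response_kurtosis([1, -1, 1, -1])
```

Because the measure depends only on the distribution of response sizes, it requires no assumption about which stimuli are neighbors in the tuning space.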

Taking the spike count during stimulus presentation as the response measure, we found γc = 4.8 ± 6.5 and γp = 5.1 ± 8.4 for Cartesian and polar stimuli, respectively (mean ± SD, across N = 51 neurons, estimates for each unit debiased by the jackknife). As seen in Fig. 10, the distribution of the kurtosis across the population was very similar for the two basis sets. The kurtosis values measured with the two stimulus sets were highly correlated (r = 0.72, P < 0.001). There was no significant difference between the means of these distributions (P > 0.3, paired t-test) or their shape (P > 0.5, Kolmogorov–Smirnov test), and no significant dependency on the simple/complex distinction, cat versus monkey, or laminar location. Similar results were obtained for response measures consisting of the on-transient or the size of the first principal component of the response.

FIG. 10.

Distribution of kurtosis of responses to the Cartesian and polar stimuli, γc and γp (Eq. 12). These distributions are not significantly different by parametric or nonparametric tests, indicating the absence of an overall tendency for neurons to be more narrowly tuned to one stimulus set or the other.

Thus although individual neurons typically had different degrees of responsiveness to the two stimulus sets, neurons that were highly tuned to the Cartesian stimuli tended to be highly tuned to the polar stimuli (and vice versa). Moreover, there was no population difference in the size or sparseness of the responses to these two sets.

Linear and nonlinear, simple and complex.

The relative sizes of the linear and nonlinear components of the response (i.e., contributions of the two pathways of the model of Fig. 4) were similar for the two basis sets. In particular, the indices Isym(Lcart, Ecart) and Isym(Lpolar, Epolar) were significantly different at P < 0.05 (two-tailed) in only nine of the 51 units. There were no significant differences (P > 0.05) between cat and macaque subpopulations, nor between the supragranular, infragranular, or granular compartments.

However, there were significant differences between simple and complex cells. For simple cells, Isym(Lcart, Ecart) = −0.36 ± 0.33 but for complex cells, Isym(Lcart, Ecart) = 0.24 ± 0.53. Polar indices behaved similarly: Isym(Lpolar, Epolar) = −0.34 ± 0.46 for simple cells; Isym(Lpolar, Epolar) = 0.26 ± 0.55 for complex cells. Both differences were significant at P < 0.005. This is not very surprising considering that the simple versus complex classification of cells was based on the F1/F0 ratio, an index of the nonlinearity of the grating responses. Additionally, the correlation between Isym(Lcart, Ecart) and the F1/F0 ratio itself is just as strong within simple and complex classes as across the entire population (Fig. 11, A and B), at least for the Cartesian index. For Isym(Lcart, Ecart), r = −0.63 for all units (P < 0.001), r = −0.81 within the simple cell subset (P < 0.005), and r = −0.50 within the complex cell subset (P < 0.001). Thus, the correlation between Isym(L, E) and F1/F0 likely reflects the relationship between the two measures per se, rather than the simple versus complex dichotomy. Finally (Fig. 11, C and D), there is no evidence for a bimodal distribution of Isym(Lcart, Ecart) or Isym(Lpolar, Epolar).

FIG. 11.

Relationship of indices of overall nonlinearity Isym(L, E) (Eq. 5) determined from Cartesian (A) and polar (B) responses to the F1/F0 ratio used to classify cells as simple and complex. For both Cartesian and polar measurements, units with Isym close to 1 tended to have small values (“complex”) of the F1/F0 ratio. C and D: distribution of these indices across the population. The distributions for Cartesian and polar responses are similar.

The correlation between Isym(Lpolar, Epolar) and the F1/F0 ratio is somewhat weaker, but as with Isym(Lcart, Ecart), is consistent with a relationship between Isym(Lpolar, Epolar) and the F1/F0 ratio per se, and not between Isym(Lpolar, Epolar) and the simple versus complex dichotomy: r = −0.55 for all units (P < 0.001), r = −0.43 within the simple cell subset (P > 0.2), and r = −0.41 within the complex cell subset (P < 0.01).


The model of Fig. 4 generalizes the standard linear–nonlinear model, and reduces to the latter if the spatial profiles of the L and E filters are similar, i.e., if Ishape(Lcart, Ecart) and Ishape(Lpolar, Epolar) are close to 1. Both indices were significantly different from 1 in about half of the units (Fig. 12). For the index based on the on-response (Fig. 12A), Ishape(Lcart, Ecart) = 0.83 ± 0.28, and was significantly <1 at P < 0.05 in f0.05 = 24/51 of the neurons, and significantly less than 1 at P < 0.01 in f0.01 = 13/51 of the neurons. The distribution was similar for the polar index (Fig. 12D): Ishape(Lpolar, Epolar) = 0.79 ± 0.32, with f0.05 = 28/51 and f0.01 = 19/51. There were no noteworthy differences across the subsets considered above (simple vs. complex, macaque vs. cat, and laminar location). Similar results were obtained for response measures based on the transient components of the response (not shown).

FIG. 12.

Distribution of the indices Ishape(Lcart, Ecart) and Ishape(Lpolar, Epolar) (Eq. 10). Values near 1 indicate consistency with a single-pathway linear–nonlinear (LN) model; significant departures from 1 indicate that spatial differences in the 2 filters L and E of Fig. 4 are required to account for the data within a single basis set. Significance levels (for Ishape <1) indicated as in Fig. 7. Each column contains calculations based on a different response measure.

In contrast to the discrepancies between the shapes of the filters as determined from Cartesian and polar basis sets (Fig. 7), very few neurons showed large deviations from the LN model on the basis of Ishape(Lcart, Ecart) or Ishape(Lpolar, Epolar) (i.e., values of these indices <0.5). Moreover, there was no correlation between the degree of consistency with the single-pathway LN model within one basis set (i.e., the indices in Fig. 12) and the degree of consistency of the filters between basis sets (i.e., the indices in Fig. 7). That is, there were neurons whose responses for one basis set were consistent with the single-pathway LN model, but not across basis sets, such as that of unit 3003u (Fig. 3C). Conversely, there were neurons that were inconsistent with the LN model for both basis sets (i.e., required L and E filters of different shapes), but did not show a significant change in these filter shapes across basis sets.

Response dynamics

The similarity of the findings concerning the shape index Ishape(Lcart, Lpolar) for five response measures (Fig. 7, A–E) indicates that the choice of response measure was not crucial to our results, and that the two stimulus sets yield generally similar response profiles with a stereotypical temporal profile. The failure to find significant differences based on the second principal component (not shown) is also consistent with this idea because a single stereotypical temporal profile would result in only a small amount of power in higher principal components, and consequently poor ability to estimate values of the shape index reliably. We next test this interpretation directly.

Considering responses to each basis set separately, we extracted the first four principal components of the PSTHs elicited by each stimulus, after binning at 10-ms resolution. Figure 13A shows the first principal component obtained from each unit, and the averages within preparations. There is a notable transient response after stimulus appearance and a smaller transient of the same polarity after stimulus disappearance. However, there is no difference in the waveform across the two basis sets. Figure 13B shows the grand average across preparations of the first four principal components. The second principal component has transients of opposite polarity following stimulus appearance and stimulus disappearance (also seen in the individual responses and the averages within each preparation) but, as is the case for the first principal component, the time courses of the components derived from responses to the two basis sets were nearly identical. Third and fourth principal components, although somewhat noisier, show mixtures of these behaviors, but also nearly identical waveforms for the two basis sets. Averaged across all units, the first principal component accounted for 53% of the variance for the Cartesian responses and 50% of the variance for the polar responses.

FIG. 13.

Principal components analysis of response dynamics. A: first principal component in response to Cartesian (top row) and polar (bottom row) stimuli. Analyses are carried out separately for each unit and superimposed. Heavy lines in the top 2 rows are the averages within each preparation; the 3rd row compares these averages (Cartesian: black; polar: gray). B: first 4 principal components, averaged across all preparations, displayed as in the 3rd row of A. Although there is much cell-to-cell and preparation-to-preparation variation in the time course of the responses, the time courses of the responses elicited by Cartesian and polar stimuli are nearly identical, as seen by the near-superposition of their principal components.

If all responses had an identical time course, then further principal components would not contribute systematically to the response waveform. Thus, the amount of variance explained by the additional principal components is a measure of the degree of spatiotemporal inseparability manifest in the responses, and their waveforms indicate how changes in spatial pattern modulate the response time course. The second principal component accounted for an additional 8% of the variance for each basis set (i.e., 16% of the remaining variance). The third and fourth principal components each accounted for an additional 3.5 to 4% of the total variance.
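The variance bookkeeping in this paragraph can be sketched as follows. The sketch assumes that the PSTHs of one unit are stacked into a matrix (stimuli by time bins) and decomposed by singular value decomposition without mean subtraction; both the layout and the absence of centering are assumptions, and the function names are hypothetical.

```python
import numpy as np

def pc_variance_fractions(psth):
    """Fraction of total power carried by each principal component.

    `psth` is a hypothetical (n_stimuli, n_time_bins) array; components are
    taken from an uncentered SVD, which is an assumption of this sketch.
    """
    s = np.linalg.svd(np.asarray(psth, dtype=float), compute_uv=False)
    power = s ** 2
    return power / power.sum()

def fraction_of_remaining(fractions, k):
    """Variance of component k relative to what components 0..k-1 leave unexplained."""
    fractions = np.asarray(fractions, dtype=float)
    return fractions[k] / (1.0 - np.sum(fractions[:k]))

# Spatiotemporally separable responses (identical time course, different
# amplitudes) put all of the power into the first component.
separable = np.outer([1.0, 2.0, 3.0], [1.0, 0.0, 1.0])
first_fraction = pc_variance_fractions(separable)[0]

# A second component carrying 8% of the total variance, after a first
# component carrying about 50%, accounts for 16% of the remainder.
second_of_remaining = fraction_of_remaining([0.50, 0.08, 0.04, 0.04, 0.34], 1)
```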

Thus, the responses elicited by the two stimulus sets have nearly identical time courses and degrees of spatiotemporal inseparability.

Energy models

The generalized LN model above, although providing a convenient way to interpret the responses to the Cartesian and polar stimuli and consistent with some views of cortical neurons, does not have a form that is likely to generate phase-invariant responses often considered typical of complex cells (Movshon et al. 1978b). Such behavior is captured by “energy models” (Adelson and Bergen 1985) and their variants (e.g., David et al. 2004). As we now show, properties of the TDH functions lead to tests of these models based on indices identical or closely related to Isym.


We first show that the model of Fig. 4 subsumes models that include a linear filter G and two nonlinear pathways in parallel, one of which consists of linear filtering (by, say, H+) followed by positive half-wave rectification, and the other of which consists of linear filtering (by, say, H−) followed by negative half-wave rectification. By positive half-wave rectification, we mean the function Math(13) by negative half-wave rectification, we mean the function Math(14) Thus this model is formally specified by Math(15) To reduce this to the model of Fig. 4, we note that |u|+ = ½(u + |u|) and |u|− = ½(−u + |u|). With these relationships and the substitutions Math(16) Equation 15 reduces to Eq. 1. As a special case (see above), pure half-wave rectification (G = 0, H− = 0) corresponds, by Eq. 16, to L = E. More generally, this analysis shows that exclusion of the model of Fig. 4 necessarily excludes models with separate branches for positive and negative rectification, even though the three filters needed to characterize such models (G, H+, H−) cannot be uniquely determined from our data (Eq. 16).
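The rectifier identities that drive this reduction can be checked numerically. The sketch below assumes that positive half-wave rectification is max(u, 0) and negative half-wave rectification is max(−u, 0); the function names are hypothetical. Each rectifier is then one half of a linear term plus one half of a full-wave rectified term, which is why a half-wave rectifying pathway can be rewritten as the linear and full-wave pathways of Fig. 4.

```python
import numpy as np

def pos_rect(u):
    """Positive half-wave rectification, assumed to be max(u, 0)."""
    return np.maximum(u, 0.0)

def neg_rect(u):
    """Negative half-wave rectification, assumed to be max(-u, 0)."""
    return np.maximum(-u, 0.0)

def identities_hold(u):
    """Check |u|+ = (u + |u|)/2 and |u|- = (-u + |u|)/2 on the samples in u."""
    u = np.asarray(u, dtype=float)
    ok_pos = np.allclose(pos_rect(u), 0.5 * (u + np.abs(u)))
    ok_neg = np.allclose(neg_rect(u), 0.5 * (-u + np.abs(u)))
    # The two rectifiers sum to the full-wave rectifier and differ by the identity.
    ok_sum = np.allclose(pos_rect(u) + neg_rect(u), np.abs(u))
    ok_diff = np.allclose(pos_rect(u) - neg_rect(u), u)
    return ok_pos and ok_neg and ok_sum and ok_diff

all_ok = identities_hold(np.linspace(-2.0, 2.0, 9))
```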

We next make use of the fact (see appendix) that each TDH function is the Fourier transform of itself, other than a scale factor that depends only on the rank. For the even-rank functions, this scale factor is real, and the imaginary (“sine”) portions of the Fourier transforms of TDH functions are zero. For the odd-rank functions, this scale factor is imaginary, and the real (“cosine”) portions of the Fourier transforms of TDH functions are zero. Thus our entire analysis is symmetric under interchange of space and spatial frequency. In particular, because we can exclude models in which the filters H+ and H− are localized to single points in space, we can also exclude models in which these filters extract single spatial frequencies. This class of models includes the “phase-separated” model of David et al. (2004) (which is a model that is linear in the half-rectified Fourier components), provided that only a single spatial frequency enters into the nonlinearities.
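The self-transform property can be illustrated numerically in one dimension. The sketch below is not the construction used in the appendix: it assumes unnormalized Hermite functions H_n(x) exp(−x²/2) with unit spatial scale and the unitary Fourier convention, under which the scale factor is (−i)^n. A Cartesian TDH is a product of two such functions, so its scale factor depends only on the total rank, being real for even rank and imaginary for odd rank. The function names, the grid, and the normalization are assumptions.

```python
import numpy as np
from numpy.polynomial.hermite import hermval

def hermite_function(n, x):
    """Unnormalized 1D Hermite function H_n(x) * exp(-x^2 / 2) (physicists' H_n)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermval(x, coeffs) * np.exp(-np.square(x) / 2.0)

def fourier_transform(f_vals, x, k):
    """Unitary transform (1/sqrt(2*pi)) * integral f(x) exp(-i k x) dx, by a Riemann sum."""
    dx = x[1] - x[0]
    kernel = np.exp(-1j * np.outer(k, x))
    return kernel @ f_vals * dx / np.sqrt(2.0 * np.pi)

def self_transform_error(n):
    """Largest deviation between the transform of h_n and (-i)^n * h_n."""
    x = np.linspace(-12.0, 12.0, 4801)   # integration grid (assumed wide enough)
    k = np.linspace(-3.0, 3.0, 13)       # frequencies at which to compare
    transformed = fourier_transform(hermite_function(n, x), x, k)
    expected = (-1j) ** n * hermite_function(n, k)
    return np.max(np.abs(transformed - expected))

errors = [self_transform_error(n) for n in range(8)]
```

For even n the factor (−i)^n is ±1 and for odd n it is ±i, matching the statement that the sine portions vanish for even rank and the cosine portions vanish for odd rank.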


In a “true energy” model (Adelson and Bergen 1985), the frequency-domain nonlinearities are quadratic, in contrast to the rectification of the phase-separated model (David et al. 2004). As we next show, the fact that the Cartesian and polar TDH stimuli are related by an orthogonal transformation (Victor and Knight 2003) leads to a simple test of these models, independent of the bandwidth of the energy operators.

We consider a model that applies local quadratic nonlinearities in the space domain Math(17) Because none of the quadratic terms influences the difference between the response to a stimulus and its contrast-inverse, the prediction of the model of Fig. 4 that the Cartesian and polar estimates of L based on Eq. 4 agree applies here as well. However, the energy model makes different predictions concerning the even-order components of the response and how they relate to Q.

We isolate this relationship by averaging the responses to each stimulus and its contrast-inverse. For stimuli fk belonging to either the Cartesian or polar TDH functions Math(18) Note that the left-hand side of Eq. 18 is |Ek| (Eq. 4), the coordinate of the E filter determined by fitting the model of Fig. 4 to the responses from one basis set.

Because the Cartesian and polar TDH stimuli within each rank are related by an orthogonal transformation (Victor and Knight 2003), we can write Math(19) where T is an orthogonal matrix (∑b TabTa′b = δaa′). As a consequence Math(20) where the summation is over all stimuli of a particular rank. Combining Eqs. 4, 18, and 20, and denoting the common sum of Eq. 20 by Pn(x, y), yields a prediction of the local energy model (Eq. 17): the sum Math(21) must be independent of the choice of orthogonal basis set.
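The invariance that underlies this prediction can be checked with arbitrary functions. The sketch below uses random vectors as stand-ins for the stimuli of one rank and a random orthogonal matrix in place of the actual Cartesian-to-polar transformation; these stand-ins, the index convention g_b = ∑a Tab f_a, and the function names are assumptions. At every sample point, the sum of squares over the set is unchanged by the transformation.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Random n-by-n orthogonal matrix from the QR decomposition of a Gaussian matrix."""
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return q

def transform_basis(f, t):
    """New set g_b = sum_a T[a, b] * f_a; rows of `f` are functions sampled on a grid."""
    return t.T @ f

rng = np.random.default_rng(0)
f = rng.standard_normal((6, 50))   # 6 stand-in functions sampled at 50 points
t = random_orthogonal(6, rng)
g = transform_basis(f, t)

# Pointwise sum of squares over the set, before and after the transformation.
pointwise_gap = np.max(np.abs((f ** 2).sum(axis=0) - (g ** 2).sum(axis=0)))
```

The gap is zero up to rounding error, so any local weighting Q(x, y) of the squared stimuli gives the same total for the two sets.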


To test this prediction, we use an index similar to that of Eq. 11, in which squaring has been replaced by absolute value, and the L filter contributions have been removed Math(22) In essence, Jc–p quantifies whether the neuron reports unequal amounts of power in response to the two basis sets. Consistency with a pure energy model implies that Jc–p = 0, within each rank (or across any combination of ranks). Here we examine the index calculated across all ranks and, in view of the results described in Response dynamics, focus on the “on-response” measure.
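A toy example may help to show what this index distinguishes. The sketch assumes that the index has the form (∑|Ecart| − ∑|Epolar|)/(∑|Ecart| + ∑|Epolar|), that the local weighting Q of the energy model is nonnegative, and that the even-order response of the model of Fig. 4 is the absolute value of the projection of the stimulus onto the E filter. The four-point "stimuli", the Hadamard-type orthogonal matrix, and all names are hypothetical.

```python
import numpy as np

def j_index(e_cart, e_polar):
    """Assumed form of the index: contrast between summed absolute even-order responses."""
    a = np.sum(np.abs(e_cart))
    b = np.sum(np.abs(e_polar))
    return (a - b) / (a + b)

def even_response_energy(q, basis):
    """Local quadratic model: mean response to f and -f is sum_x Q(x) * f(x)^2."""
    return np.square(basis) @ np.asarray(q, dtype=float)

def even_response_rectifier(e_filter, basis):
    """Full-wave rectification after spatial integration: |sum_x E(x) * f(x)|."""
    return np.abs(basis @ np.asarray(e_filter, dtype=float))

# Two toy basis sets related by an orthogonal (Hadamard-type) matrix.
cart = np.eye(4)
t = 0.5 * np.array([[1, 1, 1, 1],
                    [1, -1, 1, -1],
                    [1, 1, -1, -1],
                    [1, -1, -1, 1]], dtype=float)
polar = t.T @ cart

q = [1.0, 2.0, 0.0, 3.0]   # nonnegative local weights (assumption)
j_energy = j_index(even_response_energy(q, cart), even_response_energy(q, polar))

e = [1.0, 0.0, 0.0, 0.0]   # filter matched to a single "Cartesian" function
j_rect = j_index(even_response_rectifier(e, cart), even_response_rectifier(e, polar))
```

In this toy case the energy model gives an index of 0, whereas the rectifier model gives −1/3, because the summed absolute projections are not preserved by an orthogonal change of basis.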

Results of this analysis, with significance levels determined as above by a jackknife procedure, are shown in Fig. 14. Of 51 cells, 26 falsify the model prediction Jc–p = 0 at P < 0.05 (two-tailed). Similar proportions of this discrepancy are seen for simple (7/10) and complex (19/41) cells. For 15 units, the polar responses are too large compared to the Cartesian responses for consistency with an energy model (Jc–p < 0); for 11 units, responses to the Cartesian stimuli are too large (Jc–p > 0). These differences included several units for which Jc–p was near −1 or 1, indicating nearly a complete failure of the energy model for these stimuli.

FIG. 14.

Distribution of response size index Jc-p (Eq. 22). Values >0 indicate that the response to Cartesian stimuli (relative to polar stimuli) is greater than predicted by an energy model; values <0 indicate the opposite. Significance levels calculated and shown as in Fig. 9.

Not surprisingly, there was substantial overlap (15 units, P < 0.03, χ2) between these 26 units with Jc–p ≠ 0 and the 21 units for which quadratic measures of the overall response size to these stimulus sets (Eq. 11) differed (Ic–p ≠ 0, see above). There was also substantial overlap (16 units, P < 0.02, χ2) between these 26 units and units for which quadratic measures of the even-order response differed (Eq. 11 with L-terms removed, data not shown). We also recalculated Jc–p including only ranks 3, 4, and 5, or only ranks 6 and 7. (Each of these subsets contains 15 stimuli; the lower-rank stimuli are shared, either entirely or in part, between Cartesian and polar basis sets.) This analysis did not yield trends, suggesting that violations of the prediction Jc–p = 0 were equally attributable to low-rank and high-rank responses.


The above analysis has two immediate extensions. First, it applies to modified energy models that contain local half-squarers (Heeger 1992b) Math(23) (This equation differs from Eq. 15 in that the nonlinearity is local, rather than after spatial integration.) The reason for this is that at each point (x, y), |S(x, y)|+ = |S(x, y)| for one of {fk, f−k} and |S(x, y)|+ = 0 for the other. A similar observation holds for |S(x, y)|−. Thus, the average of the responses to the two stimuli is Math(24) Thus, the prediction Jc–p = 0 holds for this model as well, with Q replaced by (Q+ + Q−)/2 in the above analysis.

Finally, we note that our analysis also excludes models based on squaring and half-squaring that are local in the frequency domain. This is because the TDH functions are each Fourier transforms of themselves; so the above expressions (Eqs. 17–24) can all be reinterpreted with x and y representing spatial frequencies, rather than points. Such models are equivalent to the phase-separated model of David et al. (2004), but with quadratic nonlinearities or half-squares replacing the half-wave rectifiers, and in addition allow for a parallel linear filter.



We describe here the responses of V1 neurons to simple localized visual stimuli, the two-dimensional Hermite functions (TDHs). These stimuli are intermediate between two kinds of stimuli traditionally used for receptive field analysis: spots, which are spatially highly localized but spectrally broad, and gratings, which are spatially uniform but spectrally narrow. The TDH functions are positioned precisely halfway between these extremes. Their formal definition is symmetric in space and spatial frequency (Victor and Knight 2003) and each function is its own Fourier transform (see appendix). Moreover, they can be organized into independent basis sets (the Cartesian and polar sets used here are just two examples) and each basis set can be used to characterize a neuron's receptive field characteristics. Because these basis sets differ in their two-dimensional spatial organization but are matched in many other respects, this approach provides a systematic way to probe spatial processing by visual neurons.

Our main finding is that most neurons (37/51) showed differences in apparent receptive field size and shape, when studied with the two basis sets. Some neurons also showed a difference in the apparent degree of nonlinearity. It is highly unlikely that these differences arose from eye movements or response instability during the course of the recording because stimuli from the two basis sets were randomly interleaved, and we frequently encountered clusters of neurons (e.g., Figs. 3 and 5) in which some showed dramatic effects, whereas others, recorded simultaneously, showed no differences.

To compare responses to the two basis sets, we made use of a model that incorporates the standard LN model: a parallel combination of a linear and a nonlinear component (Fig. 4). This model fits responses exactly if either basis set is considered in isolation, but not if both basis sets are considered together. Although this approach postulates a specific form for the nonlinearity (full-wave rectification), we have shown above that our analysis generalizes to models with separate positive and negative half-wave rectification. Moreover, comparable findings emerged (Sharpee, unpublished results) when the data were analyzed according to the MID approach (Sharpee et al. 2004), in which the nonlinearity is free to vary. In particular, our findings cannot be attributed to our choice of a specific form for the nonlinearity; were that the case, the MID analysis would have identified this nonlinearity and deduced the same spatial filters for both the Cartesian and polar basis sets.


We have also shown that, although our data cannot fully specify the components of an energy model, the discrepancies observed in response to the two basis sets cannot be accounted for by energy models and many of their variants. We have rigorously excluded “true energy” models (Adelson and Bergen 1985) with a quadratic nonlinearity and arbitrary frequency or spatial dependency, as well as generalizations that include half-squarers and parallel linear pathways. These models predict that the sum of the absolute values of the coefficients of the E filters is independent of basis set, in contrast to our observations (Fig. 14). Another variant is the phase-separated energy model (David et al. 2004), in which the positive and negative real and imaginary parts of each Fourier component of the stimulus are separately subjected to half-wave rectification. Although we cannot rigorously exclude this model in its full generality, it is unlikely to provide a robust account for our results. To the extent that the half-wave rectification can be approximated by a half-squarer, the model can be rigorously excluded (with arbitrary frequency dependency) because it is a generalized energy model of the sort described above. Approximation of half-wave rectification by half-squarers is least likely to be valid if there is only a single frequency channel (so that the effects of the cusp are not smoothed over by mixing with responses from other channels), but this extreme (half-wave rectification within a single frequency channel) can also be rigorously excluded because of the self-transform property of the TDH functions.

In general, a nonlinear transformation can be approximated arbitrarily well by a sufficient number of parallel linear and LN components (Wiener 1958). We have argued above that our data exclude a variety of special cases of this structure. Within this feedforward framework, the simplest models that we cannot exclude have two or more parallel components, and the linear front ends must be restricted (but not to a single point), in both space and spatial frequency, and these front ends must differ sufficiently in their characteristics so that they cannot be merged into a single pathway. Thus whereas it is possible that multichannel feedforward models deduced from noise stimuli (Rust et al. 2005; Touryan et al. 2005) can provide an account for our findings, it is at present unclear whether such models have the appropriate nonlinear characteristics to do so.


The Cartesian and polar basis sets are but two choices among a continuum of possible choices of bases within the two-dimensional Hermite functions. Within each rank, any linear combination of these functions will have the same degree of confinement in space and spatial frequency, although they will lack the symmetry of the Cartesian or the polar stimuli. Even though the symmetries that characterize the Cartesian and polar stimuli ensure that these sets are qualitatively different, it is unlikely that the observed discrepancies in apparent receptive field shape are specific to these two basis sets. One reason is that if symmetry per se were crucial, then only neurons that were perfectly lined up would manifest such discrepancies. However, this is not what is observed. Substantial differences between Cartesian and polar filter estimates were seen for neurons that were well centered (Figs. 3C and 5A) and for neurons that were measurably off-center but still within the common envelope of the TDH stimuli (Figs. 5B and 6C). [Conversely, the subset of cells for which L and E filters were consistent across basis sets also includes neurons that were well centered (Fig. 3B) and neurons that were not (Fig. 5C).] The lack of alignment of receptive field shapes with either Cartesian or polar basis sets (Fig. 10) is additional evidence that the symmetry properties of these basis sets do not play an important role in the phenomena observed. However, because we did not study the dependency of responses of individual neurons on spatial shifts of the stimuli, we cannot rigorously characterize neurons as being preferentially responsive to Cartesian versus polar stimuli per se, even though, fortuitously, one unit with nearly complete Cartesian selectivity was well centered (Fig. 5A), whereas another such unit recorded at the same location (Fig. 5B) was not.

Possible physiologic mechanisms

Although the data presented here do not allow one to deduce the neural basis of the discrepancies between the Cartesian and polar responses, they do provide some helpful evidence. The wide laminar distribution of neurons that show such discrepancies makes it unlikely that these discrepancies arise in V2 (Hegde and Van Essen 2000, 2003, 2004) or later visual areas (Gallant et al. 1993, 1996), and are seen in V1 only as a consequence of extrastriate processing. Our findings are fundamentally spatial. There is virtually no difference in the time course of the responses to the two basis sets, and the observed differences in spatial properties are robust across multiple different response measures (Fig. 13). That these discrepancies are just as prominent in the transient portion of the response as in the entire on-period response (Fig. 7A) supports this inference, although feedback from V2 can be quite rapid (Bullier and Nowak 1995; Nowak et al. 1995). Thus we consider possible physiologic mechanisms intrinsic to V1.


It is increasingly recognized that the linear–nonlinear model is at best a caricature for V1 neurons and thus it is not very surprising that, when tested with a novel stimulus set, V1 responses fail to conform to this model. However, the nature and functional implications of these deviations are incompletely understood. Generally, the main deviations are considered to be gain controls and modulatory influences from the nonclassical receptive field (Albright and Stoner 2002; Freeman et al. 2001) that adjust neuronal contrast-response functions based on spatiotemporal context (Cavanaugh et al. 2002a; Ohzawa et al. 1985; Reid et al. 1992; Sceniak et al. 1999). Although such mechanisms are likely to be active during stimulation with the patterns used here, they are unlikely to account for the basic phenomena we observed because 1) all elements in the two basis sets have identical contrast and 2) corresponding ranks within each basis set are identical in spatial coverage. These two characteristics make our study significantly unlike previous studies of Cartesian and polar stimuli in V1 (Mahon and De Valois 2001), and recommend these stimuli as tools to study the neural analysis of shape in extrastriate cortex. Characteristic 1) allows us to exclude nonspecific mechanisms and characteristic 2) allows for precise predictions from simple models.

The phenomena described here depend on the detailed spatial organization of the stimuli and not just their overall size, contrast, or spatial power spectra. This may be consistent with some degree of spatial selectivity within the nonclassical receptive field (Bair et al. 2003; Cavanaugh et al. 2002b; Freeman et al. 2001). However, the effects of such nonclassical inputs, which would have to include a shift in orientation tuning, or an overall difference in the level of responsiveness to Cartesian and polar basis sets as a whole, go beyond the phenomena that have been reported: namely, overall changes in receptive field size and sensitivity (Bair et al. 2003; Cavanaugh et al. 2002a, b; Freeman et al. 2001; Sceniak et al. 1999) and the degree of orientation selectivity (Chen et al. 2005).


Because our data imply that a satisfactory mechanistic account will require more than a minor “tweak” of a feedforward model such as a linear–nonlinear cascade, we consider the consequences of the recurrent interactions that feature heavily in the recent cortical models of Chance et al. (1999) and Tao et al. (McLaughlin et al. 2000; Tao et al. 2004). Fundamental neuronal properties such as orientation tuning (Shapley 2004; Shapley et al. 2003; Somers et al. 1995; Sompolinsky and Shapley 1997) and the simple versus complex categorization (Chance et al. 1999; Tao et al. 2004) likely arise from an interaction of feedforward signals with intracortical feedback. One concrete way in which these interactions might account for our observations is as follows. For neurons whose feedforward orientation tuning is similar to that of the local intracortical signals that they receive, orientation tuning will be relatively independent of the balance of these signals. However, the orientation tunings of neighboring neurons are not identical (DeAngelis et al. 1999; Hubel and Wiesel 1977) and a mismatch between the tuning of the feedforward signal and the intracortical signals of ≥20 deg is consequently inevitable in some neurons. In neurons with such mismatches, the apparent receptive field shape will depend strongly on the balance of these signals. This balance is likely to be sensitive to the two-dimensional organization of the stimulus because two distinct orientation tunings (that of the neuron and that of the population) are relevant.



Characterizations of V1 neurons deduced from analytically convenient stimuli, such as m-sequences and gratings, fail to predict responses to more natural stimuli, such as vignetted real-world movies, though qualitative correspondences are typically present (David et al. 2004; Smyth et al. 2003; Touryan et al. 2005). Because V1 neurons are nonlinear, even a catalog of responses to a complete basis set can generate out-of-set predictions only if incorporated into a nonlinear model. At present, such nonlinear models consist of one (or occasionally more) linear–nonlinear cascades, perhaps accompanied by modulatory influences driven by overall contrast, or parametric in the ambient power spectrum (David et al. 2004; Rust et al. 2003, 2005; Simoncelli et al. 2004; Touryan et al. 2005). This failure of predictions based on such models implies that there are aspects of natural scenes that are not well captured by such stimuli, and the deviations in actual responses from those predicted by these models reveal the presence of more specific kinds of spatial nonlinearities.

The two-dimensional Hermite functions explore local spatial patterns in a systematic fashion and, like features such as edges and corners in natural images, are local in space and in spatial frequency (Morrone and Burr 1988). As we have shown directly, linear–nonlinear cascades fail to account for the differences between responses to the Cartesian and polar basis sets. More elaborate models of the sort used to make out-of-set predictions of natural scene responses will likely fail as well, because the Cartesian and polar basis sets are matched for contrast and power spectrum, and this matching neutralizes the effect of the modulatory additions to the cascade models. This does not imply that the failure of predictions of natural scene responses is explained by the same mechanism(s) that underlie the discrepancy between responses to the Cartesian and polar basis sets. It does mean, however, that the failure to predict responses to natural scenes reveals a more general failing: our current understanding of V1 receptive fields is not sufficiently complete to account for responses to images with two-dimensional structure. Moreover, stimuli such as the two-dimensional Hermite functions may have an important role to play in this regard because their properties allow one to separate nonspecific influences based on overall contrast and spectrum from those of specific local features.


Our findings contain an interesting negative result. Though there are many neurons that respond more strongly to Cartesian stimuli than to polar stimuli, and vice versa, there is only a slight bias across all neurons (mean, 9%; median, 4%) in favor of overall responsiveness toward the Cartesian set, as measured by Ic–p. Thus, once stimuli are equated for contrast, spread, and spectral content, overall differences across the population's responses to Cartesian and polar stimuli are minimal, if any (cf. Mahon and De Valois 2001), even though differences within individual units are widespread.

Moreover, the distributions of the kurtosis of each unit's responses to the two sets, γc and γp, were nearly identical in both mean and shape. Note that the latter comparison has little relationship to nonlinear response modulations but, rather, is a measure of the sparseness of the response distribution within each basis set (Olshausen and Field 1997; Vinje and Gallant 2000, 2002). For example, γc > γp would hold for a linear neuron whose receptive field profile was well matched to a single Cartesian basis function. This is because each Cartesian function typically requires two or more polar basis functions for reconstruction as a linear combination. Conversely, the meaning of the similarity of γc and γp is that, across the population, receptive field shapes were equally well matched by the Cartesian basis functions and the polar basis functions.

This has important theoretical implications. The two-dimensional Hermite functions represent a sequence of functions that gradually increase in size and bandwidth. The product of these quantities (size measured in degrees, bandwidth measured in cycles per degree) is a dimensionless quantity, which we call the (space-bandwidth) aperture. The rank 0 function, the most localized in aperture, is a circularly symmetric Gaussian, which does not have any orientation. The rank 1 functions are the next most localized. In essence, their aperture is sufficiently small so that only one spatial frequency can be resolved.

The higher-rank functions constitute a set of shapes that become increasingly complex, as additional Fourier components can be resolved within their aperture. A conceivable (perhaps even anticipated) outcome of our studies would have been that to the extent that more than one Fourier component is required to define a receptive field shape, these multiple Fourier components would all have similar orientation. Were this the case, we would have found that the higher-order Cartesian stimuli, on average, are better matched to receptive field shapes than the higher-order polar stimuli. This is because only the Cartesian stimuli contain multiple spatial frequencies along the same orientation. Instead, we did not detect any biases in the kinds of receptive field shapes encountered in V1 neurons, given the limits set by their apertures.

Tuning to orientation is universally recognized as a preeminent feature of spatial processing by V1 neurons (Hubel and Wiesel 1968, 1977; Somers et al. 1995; Sompolinsky and Shapley 1997). All orientations are equally represented in V1 (Blasdel 1992; Dragoi et al. 2000; Sirovich and Uglesich 2004), at least to a first approximation (Li et al. 2003). As argued above, orientation tuning can also be viewed as a necessary geometrical consequence of enlarging the aperture of a receptive field just beyond that of a blob—it is the only kind of shape tuning possible for an aperture that admits only one spatial frequency. Thus, our results suggest that in V1, the representation of orientation is part of a more general evenhandedness that applies to shapes of higher space-bandwidth aperture. Beyond V1, the more obvious and intricate shape tuning (Brincat and Connor 2004; Gallant et al. 1993, 1996; Hegde and Van Essen 2000, 2003, 2004) requires a further increase in space-bandwidth aperture. Whether this evenhandedness persists or, alternatively, how it becomes biased to specific kinds of shapes, remains to be seen.


Definitions, examples, and basic properties

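Each two-dimensional Hermite function (Fig. 1) is a polynomial function in the spatial coordinates x and y, multiplied by a two-dimensional Gaussian envelope. The TDH of rank 0 is this Gaussian envelope and is common to the Cartesian and polar basis sets
$$C_{0,0,\sigma}(x,y) = A_{0,0,\sigma}(x,y) = \frac{K}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2+y^2}{4\sigma^2}\right) \tag{A1}$$
Cartesian TDH functions of higher rank are products of this envelope and one-dimensional Hermite polynomials of degree j in x and k in y
$$C_{j,k,\sigma}(x,y) = \frac{K}{\sigma\sqrt{2\pi\,j!\,k!}}\,h_j\!\left(\frac{x}{\sigma}\right)h_k\!\left(\frac{y}{\sigma}\right)\exp\left(-\frac{x^2+y^2}{4\sigma^2}\right) \tag{A2}$$
Here, hj(u) and hk(u) denote one-dimensional Hermite polynomials orthogonal with respect to a Gaussian of unit SD and with unit leading coefficient. hn(u) satisfies (Abramowitz and Stegun 1964) the generating function relationship
$$\sum_{n=0}^{\infty}\frac{z^n}{n!}\,h_n(u) = \exp\left(uz - \frac{z^2}{2}\right) \tag{A3}$$
For example, h0(u) = 1, h1(u) = u, h2(u) = u² − 1, and h3(u) = u³ − 3u.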
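The definitions above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes that Eq. A2 has the form C_{j,k,σ}(x, y) = K / (σ √(2π j! k!)) · h_j(x/σ) · h_k(y/σ) · exp[−(x² + y²)/(4σ²)], which is the normalization implied by the statements that the weight of Eq. A7 has SD σ and that the functions are orthonormal for K = 1. The helper names (`hermite_unit_lead`, `cartesian_tdh`, `gram_matrix`) and the grid parameters are hypothetical. NumPy's `hermite_e` module supplies the Hermite polynomials with unit leading coefficient.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def hermite_unit_lead(n, u):
    """Hermite polynomial h_n(u) with unit leading coefficient, orthogonal
    with respect to a unit-SD Gaussian (h_2(u) = u**2 - 1, and so on)."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return hermeval(u, coeffs)


def cartesian_tdh(j, k, x, y, sigma=1.0, K=1.0):
    """Cartesian TDH C_{j,k,sigma}(x, y) under the assumed form of Eq. A2."""
    norm = K / (sigma * math.sqrt(2.0 * math.pi * math.factorial(j) * math.factorial(k)))
    envelope = np.exp(-(x**2 + y**2) / (4.0 * sigma**2))
    return norm * hermite_unit_lead(j, x / sigma) * hermite_unit_lead(k, y / sigma) * envelope


def gram_matrix(max_rank, sigma=1.0, half_width=12.0, n_points=241):
    """Inner products of all Cartesian TDHs of rank 0..max_rank on a square grid
    (grid extent and resolution are illustrative choices)."""
    axis = np.linspace(-half_width * sigma, half_width * sigma, n_points)
    cell_area = (axis[1] - axis[0]) ** 2
    x, y = np.meshgrid(axis, axis, indexing="ij")
    indices = [(n - k, k) for n in range(max_rank + 1) for k in range(n + 1)]
    stack = np.array([cartesian_tdh(j, k, x, y, sigma).ravel() for j, k in indices])
    return indices, stack @ stack.T * cell_area
```

Calling `gram_matrix(3)` returns the ten index pairs of ranks 0 through 3 and a 10 × 10 matrix that should be close to the identity if the assumed normalization is correct.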

The rank n of the two-dimensional Hermite function Cj,k in Eq. A2 is the sum of the degrees of the one-dimensional polynomials, n = j + k. Thus, there are n + 1 Cartesian TDH functions of rank n, with indices (j, k) = (n, 0), (n − 1, 1), …, (0, n). As shown in the left half of Fig. 1, the TDH functions Cn,0,σ (left edge of the triangular array) are of constant sign in the y-direction and have n + 1 alternating lobes of bright and dark in the x-direction. The TDH functions C0,n (right edge of the triangular array) are of constant sign in the x-direction and have n + 1 alternating lobes of bright and dark in the y-direction. TDH functions Cj,k with both indices nonzero (interior of the array) have the appearance of a vignetted checkerboard, with j + 1 “checks” in the x-direction, and k + 1 “checks” in the y-direction.

Polar TDH functions are generically paired (a “cosine” function Aμ,ν,σcos and a “sine” function Aμ,ν,σsin), corresponding to the left and right sides of the right half of Fig. 1. For TDH functions of rank n, n = μ + 2ν. As seen in Fig. 1, a polar TDH Aμ,ν,σcos or Aμ,ν,σsin has ν + 1 alternating bands along any radius, and 2μ sectors of alternating light and dark in each of these circular bands. With x = R cos θ and y = R sin θ, the polar TDH functions take the form Math(A4) and Math(A5) For even ranks, there is an unpaired targetlike polar TDH A0,n/2,σ defined in like manner. The radial polynomials P in the above equations are defined by Math(A6) Thus, there are n + 1 polar TDH functions of rank n. If n is odd, there are (n + 1)/2 (cosine, sine) pairs, for ν = 0, 1, …, (n − 1)/2; if n is even, there are n/2 pairs, for ν = 0, 1, …, (n/2) − 1, and one unpaired function A0,n/2,σ that has no angular dependency.
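The counting argument for the two basis sets can be made concrete with a short enumeration. This is an illustrative sketch only; the function names and the tuple layout `(mu, nu, phase)` are hypothetical conveniences, and the code lists indices rather than evaluating the polar functions themselves.

```python
def cartesian_indices(n):
    """Index pairs (j, k) with j + k = n, ordered (n, 0), (n - 1, 1), ..., (0, n)."""
    return [(n - k, k) for k in range(n + 1)]


def polar_indices(n):
    """Index triples (mu, nu, phase) with mu + 2 * nu = n.

    phase is "cos" or "sin" for the paired functions (mu > 0) and None for
    the unpaired targetlike function (mu = 0, which occurs only for even n).
    """
    triples = []
    for nu in range(n // 2 + 1):
        mu = n - 2 * nu
        if mu == 0:
            triples.append((0, nu, None))
        else:
            triples.append((mu, nu, "cos"))
            triples.append((mu, nu, "sin"))
    return triples
```

For every rank n, both lists have n + 1 entries, so the Cartesian and polar sets of a given rank span subspaces of equal dimension. For example, `polar_indices(4)` ends with the unpaired entry `(0, 2, None)`.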

Further details on the above can be found in Victor and Knight (2003). The correspondence is as follows: Eq. A2 is derived from Eqs. 2.24, 2.26, 2.27, and 2.29 (in Victor and Knight 2003) in the limit c → ∞, Math, Math. Orthonormality of the functions Cj,k for K = 1 follows from the relationship of the Hermite polynomials hn of Eq. A3 to the standard Hermite polynomials Hn (Eq. 2.17). Equations A4 and A5 represent the real and imaginary parts of Eq. 2.48. Equation A6 corresponds to Eq. 2.49. Orthonormality of the functions Aμ,ν,σ for K = 1 follows from Eq. 2.62.

The dependency of the Cartesian TDH functions along a coordinate axis resembles a Gabor function (Gabor 1946), but Gabor functions are trigonometric functions under a Gaussian envelope, whereas TDH functions are polynomials under a Gaussian envelope. The Gaussian derivative functions used by Wilson et al. (Lin and Wilson 1996) are also polynomials under a Gaussian envelope, but for Gaussian derivatives, the envelope is narrower by a factor of $\sqrt{2}$. Consequently, the TDH functions have much heavier side lobes than the Gaussian derivatives. A second consequence of envelope width is that the TDH functions form an orthogonal set, whereas Gabor functions and Gaussian derivatives do not. This follows from the orthogonality of the polynomial portions of the Hermite functions with respect to the weight
$$w(x,y) \propto \exp\left(-\frac{x^2+y^2}{2\sigma^2}\right) \tag{A7}$$
This weight is the square of the Gaussian envelope in Eq. A1, and has a standard deviation σ along each axis. Were we to replace the Gaussian envelope in Eq. A1 by its square (Eq. A7), the Gaussian derivatives would be obtained.
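The effect of envelope width on orthogonality can be illustrated in one dimension. The sketch below assumes σ = 1 and compares two families built from the same polynomials h_n(x): one under the envelope exp(−x²/4) (SD √2, as for the Hermite functions) and one under its square, exp(−x²/2) (SD 1, as for Gaussian derivatives). The function names and grid are hypothetical, and each function is rescaled to unit norm so that the Gram matrices are directly comparable.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def normalized_family(max_order, envelope_sd, x):
    """Rows are h_n(x) * exp(-x**2 / (2 * envelope_sd**2)), n = 0..max_order,
    each scaled to unit L2 norm on the grid x."""
    dx = x[1] - x[0]
    rows = []
    for n in range(max_order + 1):
        poly = hermeval(x, [0.0] * n + [1.0])  # h_n with unit leading coefficient
        f = poly * np.exp(-x**2 / (2.0 * envelope_sd**2))
        rows.append(f / np.sqrt(np.sum(f**2) * dx))
    return np.array(rows)


def gram(rows, x):
    """Matrix of pairwise inner products, approximated by a Riemann sum."""
    return rows @ rows.T * (x[1] - x[0])


x = np.linspace(-15.0, 15.0, 3001)
gram_hermite = gram(normalized_family(4, np.sqrt(2.0), x), x)  # Hermite-function envelope
gram_gaussian_derivs = gram(normalized_family(4, 1.0, x), x)   # envelope narrower by sqrt(2)
```

Under these assumptions `gram_hermite` is close to the identity, whereas `gram_gaussian_derivs` has nonzero off-diagonal entries between functions of the same parity. The overlap between orders 0 and 2, for instance, works out analytically to −1/√3.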

Another important property of the TDH functions is that the Cartesian and polar functions of each rank are linear combinations of each other. For example, the threefold symmetric polar stimulus A3,0cos (the circled example of rank 3 in Fig. 1) is given by Math and the targetlike polar stimulus A0,2 (the circled example of rank 4 in Fig. 1) is given by Math. General formulae for these linear combinations are given in Victor and Knight (2003).
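The relationship between the two bases can be explored numerically. The sketch below is an illustration under stated assumptions, not a restatement of the published formulae. It takes σ = 1 and K = 1, assumes the Cartesian normalization 1/√(2π j! k!), and uses the fact that a rank-3 function with threefold symmetry and ν = 0 must be proportional to R³ cos 3θ = x³ − 3xy² under the envelope. The overall sign of each polar function is a convention and may differ from the authors' choice. The names `rank_coefficients` and `target` are hypothetical.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval


def cartesian_tdh(j, k, x, y):
    """Cartesian TDH with sigma = 1 and K = 1 (assumed normalization)."""
    hj = hermeval(x, [0.0] * j + [1.0])
    hk = hermeval(y, [0.0] * k + [1.0])
    norm = 1.0 / math.sqrt(2.0 * math.pi * math.factorial(j) * math.factorial(k))
    return norm * hj * hk * np.exp(-(x**2 + y**2) / 4.0)


def rank_coefficients(pattern, n, x, y, cell_area):
    """Project a pattern, scaled to unit norm, onto the n + 1 Cartesian TDHs of rank n."""
    pattern = pattern / math.sqrt(np.sum(pattern**2) * cell_area)
    return {
        (n - k, k): float(np.sum(pattern * cartesian_tdh(n - k, k, x, y)) * cell_area)
        for k in range(n + 1)
    }


axis = np.linspace(-12.0, 12.0, 241)
cell_area = (axis[1] - axis[0]) ** 2
x, y = np.meshgrid(axis, axis, indexing="ij")

# Threefold-symmetric pattern: R^3 cos(3 theta) = x^3 - 3 x y^2 under the envelope.
threefold = (x**3 - 3.0 * x * y**2) * np.exp(-(x**2 + y**2) / 4.0)
coeffs_threefold = rank_coefficients(threefold, 3, x, y, cell_area)


def target(px, py):
    """Candidate rank-4 combination that should have no angular dependence."""
    return (math.sqrt(3.0 / 8.0) * (cartesian_tdh(4, 0, px, py) + cartesian_tdh(0, 4, px, py))
            + 0.5 * cartesian_tdh(2, 2, px, py))


# Sample the candidate target pattern around a circle of arbitrary radius.
angles = np.linspace(0.0, 2.0 * np.pi, 13)
ring_values = target(1.7 * np.cos(angles), 1.7 * np.sin(angles))
```

With these conventions the threefold pattern projects onto the rank-3 Cartesian functions with weights 1/2 on C3,0 and −√3/2 on C1,2, and zero on the other two. The rank-4 combination takes the same value at every angle on the sampled circle, as expected of a targetlike function.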

The self-transform property

Each TDH function is the Fourier transform of itself, other than a scale factor that depends on rank. To show this, we consider the Fourier transforms of the one-dimensional orthonormalized Hermite functions with unit SD (σ = 1). These functions are defined by Math(A8) and, from the generating function for hn(x) (Eq. A3), it follows that Math(A9) The Fourier transforms of the one-dimensional Hermite functions are defined by Math(A10) We calculate the Fourier transforms (Eq. A10) by the generating function (Eq. A9). Math(A11) where the third equality results from completing the square in x, and the fourth equality results from the substitution u = x − 2z + 2iω. The final definite integral is standard, and is equal to Math(Abramowitz and Stegun 1964). This leads to Math(A12) where the second equality is an algebraic rearrangement and the third equality follows from the generating function (Eq. A9). Equating coefficients of zn in the above leads to the desired relationship between a one-dimensional Hermite function and its Fourier transform Math(A13) The above relationship immediately demonstrates the self-transform property for the Cartesian TDHs because the Cartesian functions of rank n (Eq. A2) can be written as Math(A14) with j + k = n. Consequently Math(A15) Because the polar functions are linear combinations of the Cartesian functions of the same rank, the self-transform property holds for them as well.
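The steps of this argument can be written out explicitly. What follows is a reconstruction consistent with the verbal description above, offered as a sketch rather than as the published equations. The symbol $f_n$ for the one-dimensional Hermite functions, the Fourier convention $\tilde f(\omega) = \int f(x)\,e^{-i\omega x}\,dx$, and the resulting numerical prefactors are assumptions. The convention was chosen because it yields the substitution $u = x - 2z + 2i\omega$ quoted in the text.

```latex
% Assumed definition of the orthonormal 1-D Hermite functions (sigma = 1)
f_n(x) = \frac{1}{(2\pi)^{1/4}\sqrt{n!}}\, h_n(x)\, e^{-x^2/4}

% Generating function for f_n, from the generating function for h_n
\sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, f_n(x)
  = \frac{1}{(2\pi)^{1/4}} \exp\!\left(xz - \frac{z^2}{2} - \frac{x^2}{4}\right)

% Fourier transform of the generating function
\begin{aligned}
\sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, \tilde f_n(\omega)
  &= \int_{-\infty}^{\infty} \sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, f_n(x)\, e^{-i\omega x}\, dx \\
  &= \frac{1}{(2\pi)^{1/4}} \int_{-\infty}^{\infty}
     \exp\!\left(xz - \frac{z^2}{2} - \frac{x^2}{4} - i\omega x\right) dx \\
  &= \frac{1}{(2\pi)^{1/4}} \exp\!\left((z - i\omega)^2 - \frac{z^2}{2}\right)
     \int_{-\infty}^{\infty} \exp\!\left(-\frac{(x - 2z + 2i\omega)^2}{4}\right) dx \\
  &= \frac{1}{(2\pi)^{1/4}} \exp\!\left((z - i\omega)^2 - \frac{z^2}{2}\right)
     \int_{-\infty}^{\infty} e^{-u^2/4}\, du ,
  \qquad \int_{-\infty}^{\infty} e^{-u^2/4}\, du = 2\sqrt{\pi}
\end{aligned}

% Rearrangement and use of the generating function with z -> -iz, x -> 2 omega
\begin{aligned}
\sum_{n=0}^{\infty} \frac{z^n}{\sqrt{n!}}\, \tilde f_n(\omega)
  &= \frac{2\sqrt{\pi}}{(2\pi)^{1/4}} \exp\!\left(\frac{z^2}{2} - 2i\omega z - \omega^2\right) \\
  &= \frac{2\sqrt{\pi}}{(2\pi)^{1/4}}
     \exp\!\left((2\omega)(-iz) - \frac{(-iz)^2}{2} - \frac{(2\omega)^2}{4}\right) \\
  &= 2\sqrt{\pi} \sum_{n=0}^{\infty} \frac{(-iz)^n}{\sqrt{n!}}\, f_n(2\omega)
\end{aligned}

% Equating coefficients of z^n
\tilde f_n(\omega) = 2\sqrt{\pi}\,(-i)^n f_n(2\omega)

% Two-dimensional case, with C_{j,k}(x,y) = f_j(x) f_k(y) and j + k = n
\tilde C_{j,k}(\omega_x, \omega_y) = 4\pi\,(-i)^{n}\, C_{j,k}(2\omega_x, 2\omega_y)
```

Under these assumed conventions, the rank enters only through the factor $(-i)^n$. The constant $4\pi$ and the rescaling of the frequency axis by 2 depend on the choice of Fourier convention and units.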


This work was supported by National Institutes of Health Grants R01EY-09314 to J. Victor and K25MH-068904 and the Schwartz Foundation to T. Sharpee.


Computing resources for MID calculations were provided in part by the National Partnership for Advanced Computing Infrastructure at the TeraGrid system. We thank P. Martin and the National Vision Research Institute, Melbourne, Victoria, Australia for hospitality during J. Victor's sabbatical.


  • The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

