A recent optical imaging study of primary visual cortex (V1) by Basole, White, and Fitzpatrick demonstrated that maps of preferred orientation depend on the choice of stimuli used to measure them. These authors measured population responses expressed as a function of the optimal orientation of long drifting bars. They then varied bar length, direction, and speed and found that stimuli of the same orientation can elicit different population responses and stimuli with different orientations can elicit similar population responses. We asked whether these results can be explained from known properties of V1 receptive fields. We implemented an “energy model” where a receptive field integrates stimulus energy over a region of three-dimensional frequency space. The population of receptive fields defines a volume of visibility, which covers all orientations and a plausible range of spatial and temporal frequencies. This energy model correctly predicts the population response to bars of different length, direction, and speed and explains the observations made with optical imaging. The model also readily explains a related phenomenon, the appearance of motion streaks for fast-moving dots. We conclude that the energy model can be applied to activation maps of V1 and predicts phenomena that may otherwise appear to be surprising. These results indicate that maps obtained with optical imaging reflect the layout of neurons selective for stimulus energy, not for isolated stimulus features such as orientation, direction, and speed.
The selectivity of neurons in primary visual cortex (V1) depends concurrently on a number of stimulus attributes. For example, a neuron's preferred orientation depends on the spatial frequency content of the stimulus (De Valois et al. 1979). Similarly, a neuron's preferred direction of motion depends on the spatial configuration of the stimulus (Gizzi et al. 1990; Movshon et al. 1986). Likewise, a neuron's preferred speed depends on the direction of motion of the stimulus (Geisler et al. 2001; Skottun et al. 1994). V1 neurons therefore do not act as feature detectors that isolate individual stimulus attributes.
The interdependency between stimulus attributes is easily explained by the widely held “energy model” of V1 responses. According to this model, cells in V1 perform image filtering, with filters given by their receptive fields (reviewed in Carandini et al. 1999; De Valois and De Valois 1988; Heeger 1992a,b). In the model, the responses of simple cells are the output of individual filters, and the responses of complex cells are the pooled output of filters with different spatial phase (Adelson and Bergen 1985; Emerson et al. 1992; Hubel and Wiesel 1962; Movshon et al. 1978a). The energy model correctly predicts the selectivity of V1 neurons for a number of stimulus attributes (e.g., DeAngelis et al. 1993b; Movshon et al. 1978c; Reid et al. 1991) as well as the interdependency between attributes (Adelson and Bergen 1985; Grzywacz and Yuille 1990; Skottun et al. 1994; Watson and Ahumada 1983). For example, the model predicts that orientation preference depends on spatial frequency content because stimulus components aligned with the receptive field can elicit a response only if they have the appropriate spatial frequency (De Valois et al. 1979).
A similar interdependency between stimulus attributes has been recently observed by Basole et al. (2003) at the level of cortical maps. Using optical imaging of intrinsic signals (Bonhoeffer and Grinvald 1996), these authors measured population responses in V1 to a variety of moving stimuli. They first recorded responses to long bars to measure a map of preferred orientation. They then recorded responses to short bars that moved obliquely and found them to be inconsistent with the map. Indeed, the map of preferred orientation measured with oblique bars depended both on bar length and on bar speed. Moreover, several different combinations of bar orientation, direction, length, and speed elicited the same pattern of activation on cortex. The authors concluded that the maps seen in V1 do not correspond to isolated features of the visual stimuli and suggested that their results would be predicted by the energy model.
We tested this hypothesis by simulating the response of a population of V1 neurons as predicted by the energy model. We chose model parameters so that the responses of model neurons are consistent with single-cell recordings and applied the model to stimuli that closely resemble those used in the imaging study. We found that the predictions of the model closely match the results obtained by Basole et al. (2003). Thinking about the model in the frequency domain, the axes of which are those of spatial and temporal frequency, provides a simple intuition for the responses of V1 to a complex stimulus (Simoncelli and Heeger 1998; Skottun et al. 1994; Watson and Ahumada 1983). In fact, it even provides a ready explanation for the effects of very fast stimuli known as “motion streaks” (Geisler 1999) or “speedlines” (Burr and Ross 2002). These results confirm that maps of V1 activation obtained with optical imaging represent stimulus energy filtered by the population of V1 neurons and that the energy model allows an intuitive explanation of seemingly complex phenomena.
We implemented the energy model for a population of V1 neurons, and simulated its responses to bars of various orientation, direction, length, and speed. We represented the receptive fields and the stimuli in frequency space, the axes of which are ωx, ωy, ωt, the frequencies in the spatial dimensions x and y, and the temporal frequency (Fig. 2A). We sampled this space with a 65 × 65 × 33 grid.
To simplify the simulations, we approximated the bars used by Basole et al. (2003) with two-dimensional Gaussian functions. The contours of these stimuli are ovals rather than rectangles. Had we used rectangular bars the results would have been similar but the figures would have been less clear.
A bar oriented parallel to one of the axes was therefore described (up to a scaling constant) as s(x,y) = exp[−x²/(2σx²) − y²/(2σy²)], with length and width dependent on σx and σy.
An advantage of this approximation is that the spatial frequency representation of the stimuli is simple: it is again a two-dimensional Gaussian (e.g., Fig. 1, B and C). Stimuli of other orientations were obtained by rotating s(x,y) and S(ωx,ωy) by the appropriate angle.
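As a minimal sketch of this frequency-domain description, the snippet below evaluates the amplitude spectrum of a Gaussian bar and rotates it. It assumes a unit-peak Gaussian with sizes in degrees and frequencies in cycles/°. The function name `bar_spectrum` and the example values are illustrative and are not taken from the original simulations.

```python
import numpy as np

def bar_spectrum(wx, wy, sigma_x, sigma_y, theta_deg=0.0):
    """Unit-peak amplitude spectrum of a Gaussian bar rotated by theta_deg."""
    th = np.deg2rad(theta_deg)
    # Rotate the frequency coordinates back into the frame of the unrotated bar.
    u = wx * np.cos(th) + wy * np.sin(th)
    w = -wx * np.sin(th) + wy * np.cos(th)
    # The transform of exp(-x^2 / (2 sigma^2)) is proportional to
    # exp(-2 pi^2 sigma^2 f^2) when f is expressed in cycles per unit of x.
    return np.exp(-2.0 * np.pi**2 * (sigma_x**2 * u**2 + sigma_y**2 * w**2))

# A bar that is long along y (sigma_y = 25 deg) has a spectrum that is
# narrow along wy and broad along wx.
along_wx = bar_spectrum(0.02, 0.0, sigma_x=5.0, sigma_y=25.0)
along_wy = bar_spectrum(0.0, 0.02, sigma_x=5.0, sigma_y=25.0)
```

Rotating the bar by 90° swaps the two axes of the spectrum, which is the sense in which stimuli of other orientations are obtained by rotation.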
Stimuli move with a constant speed in a constant direction. Therefore their full frequency representation is the projection of the spatial frequency representation S(ωx,ωy) onto a plane (Watson and Ahumada 1985). The plane is given by the expression ωt = −v (ωx cos α + ωy sin α), where the direction of motion α determines the orientation of the plane, and the speed v determines the slant. Because our stimuli move back and forth, their representation lies on two symmetrical planes, one for positive v and one for negative v. The preceding expression implies that the intersection between the stimulus planes and the ground plane (ωt = 0) is orthogonal to the direction of motion α and that the angle between the stimulus plane and the ground plane increases with speed v (Fig. 3).
To avoid artifacts due to the sampling of frequency space, stimulus energy was slightly smeared, so that it extended orthogonally to the plane. In particular, along any line perpendicular to the stimulus plane, the profile of the energy distribution was set to be a Gaussian with SD corresponding to the length of two voxel edges, a distance that is small compared with the size of the simulated receptive fields.
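The construction of the moving-stimulus energy can be sketched as follows. This is an illustration under stated assumptions, not the original code: the frequency ranges of the grid are hypothetical (only the 65 × 65 × 33 size comes from the text), the plane is taken as ωt = ∓v(ωx cos α + ωy sin α), and `moving_stimulus_energy` is a name introduced here.

```python
import numpy as np

# Hypothetical frequency ranges; only the grid size is given in the text.
wx = np.linspace(-0.4, 0.4, 65)   # cycles/deg
wy = np.linspace(-0.4, 0.4, 65)   # cycles/deg
wt = np.linspace(0.0, 16.0, 33)   # Hz
WX, WY, WT = np.meshgrid(wx, wy, wt, indexing="ij")
dx, dt = wx[1] - wx[0], wt[1] - wt[0]

def moving_stimulus_energy(spatial, v, alpha_deg, smear_voxels=2.0):
    """Spread a 2-D spatial spectrum onto the two planes of a bar moving back and forth."""
    a = np.deg2rad(alpha_deg)
    proj = WX * np.cos(a) + WY * np.sin(a)
    energy = np.zeros_like(WX)
    for s in (+v, -v):
        # Distance from the plane wt = -s * proj, measured in voxel edges.
        dist = np.abs(WT + s * proj) / np.hypot(s * dx, dt)
        # Gaussian smear perpendicular to the plane (SD of two voxel edges).
        energy += spatial[:, :, None] * np.exp(-dist**2 / (2.0 * smear_voxels**2))
    return energy

flat = np.ones((65, 65))  # flat spatial spectrum, used only to expose the planes
E = moving_stimulus_energy(flat, v=20.0, alpha_deg=0.0)
```

With motion along x (α = 0), the energy on the ground plane is concentrated along the ωy axis, that is, along a line orthogonal to the direction of motion.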
The width σx of the stimuli was fixed to 5°, the length σy varied between 5 and 25°, and the speed v varied between 10 and 280°/s. Additional simulations with stimuli that were five times smaller gave very similar results.
A model V1 neuron integrates stimulus energy over a small receptive field in frequency space (Adelson and Bergen 1985; Watson and Ahumada 1983, 1985). The shape of this receptive field is given by a three-dimensional Gaussian (Gaska et al. 1994; Jones and Palmer 1987; McLean and Palmer 1994). The angle between the center of the receptive field and the ωx axis represents the preferred orientation of the neuron, the distance from the ωt axis is the preferred spatial frequency, and the distance from the ground plane is the preferred temporal frequency (Fig. 2B). Responses were normalized for each neuron, such that the response elicited by the optimal grating was the same across neurons.
In results, we refer to “the orientation of” a receptive field. This expression is shorthand for the orientation of a full-field grating eliciting optimal responses from that receptive field.
We made receptive fields tile frequency space to match the basic features of measurements from V1 of several species and to match the specific attributes measured in ferret. Although clearly there are differences across species, the basic properties of the tiling are shared in cat (DeAngelis et al. 1993a; Holub and Morton-Gibson 1981; Ikeda and Wright 1975; Movshon et al. 1978b; Tolhurst and Thompson 1981), monkey (De Valois and De Valois 1988; De Valois et al. 1982) and ferret (Alitto and Usrey 2004; Baker et al. 1998; Chapman and Stryker 1993). Our population of neurons covered 24 preferred orientations, 4 spatial frequencies, and 5 temporal frequencies (Fig. 2C). The preferred orientations were uniformly distributed over 360°. The preferred spatial frequencies were logarithmically distributed between 0.05 and 0.2 cycles/°, consistent with measurements in ferret V1 (Baker et al. 1998), if one assumes that both area 17 and 18 contribute to the responses. The preferred temporal frequencies were logarithmically distributed between 2 and 8 Hz, consistent with measurements made in ferret V1 with high-contrast stimuli (Alitto and Usrey 2004). The SD of the receptive field Gaussian determines the bandwidth of each neuron's tuning curves. We chose it to be 1/3 of the preferred frequency, both along the ωx and ωy directions and along the ωt direction. This value corresponds to a bandwidth of 1.15 octaves, at the low end of the bandwidth measured for spatial frequency tuning in ferret V1 (Baker et al. 1998).
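The tiling described above can be written down compactly. The sketch below uses the parameter values quoted in the text (24 orientations, 4 spatial frequencies between 0.05 and 0.2 cycles/°, 5 temporal frequencies between 2 and 8 Hz, SD equal to 1/3 of the preferred frequency). The names `rf_sensitivity` and `population` are illustrative, and the unit-peak Gaussian is an assumption chosen so that the optimal grating elicits the same response in every model neuron.

```python
import numpy as np

orientations = np.arange(24) * 15.0            # deg, uniform over 360 deg
spatial_freqs = np.geomspace(0.05, 0.2, 4)     # cycles/deg, logarithmic spacing
temporal_freqs = np.geomspace(2.0, 8.0, 5)     # Hz, logarithmic spacing

def rf_sensitivity(wx, wy, wt, ori_deg, sf, tf):
    """Unit-peak 3-D Gaussian receptive field in frequency space."""
    th = np.deg2rad(ori_deg)
    cx, cy = sf * np.cos(th), sf * np.sin(th)  # center in the spatial frequency plane
    sd_s, sd_t = sf / 3.0, tf / 3.0            # SD is 1/3 of the preferred frequency
    d2 = ((wx - cx)**2 + (wy - cy)**2) / (2.0 * sd_s**2)
    d2 = d2 + (wt - tf)**2 / (2.0 * sd_t**2)
    return np.exp(-d2)

# One (orientation, spatial frequency, temporal frequency) triplet per model neuron.
population = [(o, f, t) for o in orientations
              for f in spatial_freqs for t in temporal_freqs]
```

Because the SD scales with the preferred frequency, every receptive field has the same bandwidth when expressed in octaves.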
To visualize how responses to a stimulus are distributed across receptive fields in frequency space, we estimated a volume of stimulated receptive fields. For each voxel in frequency space, we considered the sum of all receptive field strengths in that voxel, weighted by each receptive field's response to the stimulus. We visualize this cumulative response by plotting the surface containing voxels that yielded ≥65% of the maximum response.
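A minimal sketch of this visualization step, on a toy one-dimensional frequency axis with three receptive fields, is given below. The function name `stimulated_volume` and the toy values are assumptions made for illustration. Only the weighting by response and the 65% criterion come from the text.

```python
import numpy as np

def stimulated_volume(rf_profiles, responses, fraction=0.65):
    """Mask of voxels where the response-weighted sum of receptive fields
    reaches at least `fraction` of its maximum.

    rf_profiles: array (n_rf, ...) of receptive field sensitivities per voxel.
    responses:   array (n_rf,) of responses to one stimulus.
    """
    cumulative = np.tensordot(responses, rf_profiles, axes=1)
    return cumulative >= fraction * cumulative.max()

# Toy example: three Gaussian receptive fields, the middle one strongly driven.
freq = np.linspace(0.0, 1.0, 101)
centers = np.array([0.2, 0.5, 0.8])
profiles = np.exp(-(freq[None, :] - centers[:, None])**2 / (2 * 0.05**2))
responses = np.array([0.1, 1.0, 0.1])
mask = stimulated_volume(profiles, responses)
```

In the full model the same operation would run over the three-dimensional grid, and the plotted surface would be the boundary of the resulting mask.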
To compare the simulation results to the data of Basole et al. (2003), we pooled the responses of the population of neurons and expressed them as a function of preferred orientation (see e.g., their Fig. 1C and our Fig. 4B). We call the result a “population response.” As in Basole et al. (2003), to determine the peak of these distributions, we fitted them with a Gaussian function.
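The pooling and peak-finding steps can be sketched as follows. The fit shown here is a stand-in: it fits a parabola to the logarithm of the tuning curve, which is a Gaussian fit in the log domain, and it ignores the circular nature of orientation. It is not necessarily the fitting procedure used in the original analysis, and the function names are illustrative.

```python
import numpy as np

def population_response(responses):
    """Average responses of shape (n_ori, n_sf, n_tf) over spatial and temporal frequency."""
    return responses.mean(axis=(1, 2))

def gaussian_peak(oris_deg, tuning):
    """Peak of a Gaussian fitted by least squares to log(tuning).

    Requires strictly positive tuning values and ignores wrap-around.
    """
    x0 = oris_deg.mean()  # center the abscissa for better conditioning
    a, b, _ = np.polyfit(oris_deg - x0, np.log(tuning), 2)
    return x0 - b / (2.0 * a)

# Synthetic check: a Gaussian tuning curve centered on 60 deg.
oris = np.arange(0.0, 180.0, 15.0)
fake = np.exp(-(oris - 60.0)**2 / (2 * 20.0**2))[:, None, None] * np.ones((1, 4, 5))
peak = gaussian_peak(oris, population_response(fake))
```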
We describe the basic properties of the model by briefly reviewing the representation of receptive fields and stimuli in frequency space (Figs. 1–3). We then describe the model's response to a simple visual stimulus, a long bar moving orthogonally to its orientation (Fig. 4). We compare the model's behavior with the data of Basole et al. (2003) when direction of motion deviates from the orthogonal, as a function of stimulus length (Fig. 5), direction (Fig. 6), and speed (Fig. 7). Finally, we describe how fast stimuli give rise to motion streaks (Fig. 8), and we illustrate how stimuli that are different can give rise to patterns of neuronal activation that are similar (Fig. 9).
To understand the selectivity of model neurons and to gain an intuition for the model's predictions, it is advantageous to represent stimuli and receptive fields in frequency space (Watson and Ahumada 1983).
The frequency representation is particularly simple for static stimuli (Fig. 1). In frequency space, a static stimulus is represented on a plane the axes of which are the spatial frequencies ωx, ωy, with x and y coordinates of visual space (Fig. 1A). Each point on the plane corresponds to a grating with spatial frequency and orientation determined by the point's distance from the origin and angle with the abscissa. Because a grating's orientation is unchanged by a 180° rotation, each grating corresponds to two symmetrical points (Fig. 1A). The frequency space representation of any other stimulus follows from this simplest case because any stimulus can be represented as a sum of gratings. For example, a long bar (Fig. 1B) can be represented as the sum of many gratings with similar orientation (the orientation of the bar) but with different spatial frequencies. Among these gratings, the ones with highest contrast are contained in the ellipse in Fig. 1B whose short and long axes are inversely proportional to bar height and width. A different representation is obtained for a dot stimulus: this stimulus is composed of gratings covering all possible orientations and a wide range of frequencies, and hence its frequency representation is circularly symmetric (Fig. 1C).
Each stimulus thus corresponds to a distribution in frequency space, the distribution of “stimulus energy.” Strictly speaking, the representation in frequency space is made of complex numbers (the phase of the complex number indicates the phase of the corresponding grating). One can ignore this point by taking the absolute value of the complex number, effectively pooling across phases. In this paper, as in much of the literature, this absolute value is loosely termed stimulus energy. In other words, the gray regions for the stimuli in Fig. 1 indicate the locations in frequency space where those stimuli “have high energy.”
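The pooling across phases can be illustrated with a discrete Fourier transform. In the sketch below, which uses an arbitrary 64 × 64 pixel image and a hypothetical helper `stimulus_energy`, shifting the bar changes only the phases of its Fourier components, so the absolute value is unchanged.

```python
import numpy as np

def stimulus_energy(image):
    """Absolute value of the 2-D Fourier transform, with zero frequency at the center."""
    return np.abs(np.fft.fftshift(np.fft.fft2(image)))

# Gaussian bar that is long along y (rows) and narrow along x (columns).
y, x = np.mgrid[-32:32, -32:32]
bar = np.exp(-x**2 / (2 * 2.0**2) - y**2 / (2 * 8.0**2))

# Circularly shifting the bar alters phases only.
shifted = np.roll(bar, shift=(5, -7), axis=(0, 1))
E_bar, E_shifted = stimulus_energy(bar), stimulus_energy(shifted)
```

The energy of this bar extends further along the horizontal frequency axis than along the vertical one, in line with the inverse relation between bar dimensions and the axes of the ellipse.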
The frequency representation is also useful to describe the operation of receptive fields (Fig. 2A). In space, an idealized receptive field is a Gabor function (Hawken and Parker 1987; Jones and Palmer 1987). This spatial receptive field corresponds in the frequency plane to two symmetric disks (Fig. 2A). These disks contain all gratings having spatial frequency and orientation that elicit large responses in the neuron. The center of the disks corresponds to the optimal grating (the grating in Fig. 1A).
Consider now a neuron's full space-time receptive field (Fig. 2B). In particular, consider a receptive field with temporal preferences but no preference for direction of motion. This receptive field is represented in the three-dimensional (3-D) frequency space ωx, ωy, ωt by two balls, the vertical position ωt of which indicates preferred temporal frequency (Fig. 2B). In fact, the two balls can correspond to two receptive fields preferring opposite directions of motion (arrows) but identical in all other respects. As for the spatial receptive field, gratings that elicit strong responses are contained in the balls (here a grating is specified not only by orientation and spatial frequency but also by temporal frequency ωt).
Taken together, the entire population of V1 receptive fields defines in frequency space a “volume of visibility” (Fig. 2C). Each receptive field corresponds to a ball the size of which grows with preferred spatial frequency and temporal frequency, so that the bandwidth in octaves is constant (e.g., De Valois and De Valois 1988; Simoncelli 1993). The resulting volume in frequency space lies above the ground plane (there are no receptive fields that respond preferentially to 0 temporal frequency) and has a hole in the center (there are no receptive fields that respond preferentially to 0 spatial frequency). We call this volume the volume of visibility, in analogy to the window of visibility introduced by Watson and Ahumada (1983) for one-dimensional moving stimuli. Stimulus energy that lies outside the volume of visibility does not noticeably affect V1 responses. In particular, stimuli that have energy only outside the volume of visibility are largely invisible to V1.
For simplicity, we refer to the volume of visibility as if it had sharp borders so that energy can lie “inside” or “outside” it. In reality, the responses of each neuron (represented as discrete balls in Fig. 2B) decrease gradually as frequency departs from the preferred value.
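The soft borders of the volume of visibility can be made concrete by summing the sensitivities of all model receptive fields at a few points of frequency space. The sketch below assumes unit-peak Gaussian receptive fields with the tiling parameters quoted in the methods. The function `total_sensitivity` and the three probe points are illustrative choices.

```python
import numpy as np

orientations = np.deg2rad(np.arange(24) * 15.0)
spatial_freqs = np.geomspace(0.05, 0.2, 4)     # cycles/deg
temporal_freqs = np.geomspace(2.0, 8.0, 5)     # Hz

def total_sensitivity(wx, wy, wt):
    """Summed sensitivity of the whole population at one point of frequency space."""
    total = 0.0
    for th in orientations:
        for sf in spatial_freqs:
            for tf in temporal_freqs:
                d2 = ((wx - sf * np.cos(th))**2 + (wy - sf * np.sin(th))**2) / (2 * (sf / 3)**2)
                d2 += (wt - tf)**2 / (2 * (tf / 3)**2)
                total += np.exp(-d2)
    return total

inside = total_sensitivity(0.1, 0.0, 4.0)   # mid-range spatial and temporal frequency
hole = total_sensitivity(0.0, 0.0, 4.0)     # zero spatial frequency
ground = total_sensitivity(0.1, 0.0, 0.0)   # zero temporal frequency
```

Under these assumptions the summed sensitivity is several times lower in the central hole and on the ground plane than inside the volume, but it is not zero, which is why the borders are gradual rather than sharp.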
Before delving into the results, it might be useful to inspect the representation in frequency space of a moving bar and the effects of changing the bar's length, direction and speed (Fig. 3). Consider first a long bar moving back and forth perpendicularly to its orientation. The energy of this stimulus in frequency space lies in two wings emerging from the ground plane (Fig. 3A). These wings are obtained by projecting the energy of the static stimulus (Fig. 1B) onto two planes in 3-D frequency space (Watson and Ahumada 1985). There are two planes—and hence two wings—because the bar moves back and forth, and each direction corresponds to a plane. Consider now the effect of changing stimulus length and speed. If the bar is made shorter, the wings become broader (Fig. 3B). Indeed, the stimulus becomes more similar to a dot, whose spatial frequency representation is a disk (Fig. 1C). If the bar drifts faster, the wings become steeper (Fig. 3C) because the slant of the planes increases with speed. Finally, consider the effect of changing the direction of motion, from perpendicular to diagonal (Fig. 3, D–F). The direction of motion determines the intersection of the planes with the ground plane, so a diagonal motion corresponds to angled wings. This effect is subtle in the representation of the long bar (Fig. 3D) and is more evident in the representations of the short bars (Fig. 3, E and F).
Population responses to a moving bar
Following Basole et al. (2003), we start by measuring the responses of the model neurons to a long bar moving orthogonally. We have seen that the energy of this stimulus consists of two wings emerging from the ground plane (Fig. 3A). As one would expect, the bar elicits a distribution of responses that peaks for those neurons whose preferred orientation is parallel to the bar (45°). Strong responses are elicited only in neurons whose receptive fields in frequency space are close to the planes containing the stimulus energy (e.g., the wings in Fig. 3A). The resulting population response is represented by the colored surfaces in Fig. 4A that contain the volume of receptive fields that respond with ≥65% of the maximal response to that stimulus. These volumes represent the intersection between the distribution of stimulus energy (Fig. 3A) and the volume of visibility (Fig. 2C). Only the tips of the wings lie in the volume of visibility. The remainder of the stimulus energy lies in a region of low spatial and temporal frequencies (close to the origin), outside the volume of visibility.
To summarize the population response to this stimulus, we plot the average response as a function of receptive field orientation (Fig. 4B). Response is large for neurons whose receptive fields have the same orientation as the bar (45°) and minimal for neurons that prefer the perpendicular orientation (i.e., 135°). For this stimulus the population response thus just reflects the orientation of the underlying receptive fields. A similar result would be obtained in response to a grating of the same orientation (not shown).
The population response allows us to compare the predictions of the model with the results obtained by Fitzpatrick and collaborators with optical imaging methods (Basole et al. 2003). These authors measured maps of activation in response to grating stimuli. From these responses, they assigned a preferred orientation to each pixel in their maps. They then plotted the average activity as a function of preferred orientation. As expected, with long bars moving orthogonally to their orientation, they found population responses that peaked at the bar orientations, very similar to the one in Fig. 4B. Basole et al. (2003) then asked whether this distribution of responses was invariant to changes in stimulus attributes.
Effect of bar length
In a first set of experiments, Basole et al. (2003) measured the effect of varying bar length. Crucially, the direction of motion of the bars was oblique, not perpendicular, to their orientation. The results of their experiment (Fig. 5D) indicate that the population response depends on bar length. With long bars, the distribution peaks near 45°, the veridical bar orientation (Fig. 5D, light gray). With shorter bars, however, the distribution peaks at higher orientations (Fig. 5D, medium gray). With the shortest bar, the distribution peaks at 90° (which is orthogonal to the axis of motion), a full 45° away from the veridical orientation (Fig. 5D, dark gray).
To see if the energy model predicts these effects, we simulated its response to three similar stimuli (Fig. 5, A–C). For the long bar (Fig. 5A), the energy is distributed on two narrow wings similar to those seen with the bar of the previous example (Fig. 4A). Because the difference in the stimuli lies in the direction of motion—diagonal in Fig. 5A, and orthogonal in Fig. 4A—the difference in the energy distributions lies in the intersection of the wings with the ground plane. This intersection is always perpendicular to the direction of motion. In terms of population responses, however, the volumes of receptive fields stimulated in the two cases are similar, centered near 45° (between yellow and green).
The similarity of population responses obtained with long bars moving orthogonally (Fig. 4A) and diagonally (Fig. 5A) is a manifestation of the well-known “aperture problem” occurring for one-dimensional stimuli (Movshon et al. 1985; Wallach 1935; Wohlgemuth 1911). The aperture problem is readily seen if one considers the representation of stimuli in frequency space (Simoncelli 1993). The frequency representations of the two stimuli are extremely similar (compare Fig. 3, A and D). They would have been even more similar if the bar had been infinitely long: the wings would have become lines, compatible with an infinite number of plane angles, i.e., an infinite number of directions (Simoncelli 1993).
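A short worked example may help to see why a line of energy is compatible with many directions. For a one-dimensional stimulus, every component has its frequency vector along a single direction φ, and its temporal frequency depends only on the component of velocity along φ. The helper `temporal_frequency` and the numerical values below are illustrative assumptions.

```python
import numpy as np

def temporal_frequency(v, alpha_deg, f, phi_deg):
    """Unsigned temporal frequency (Hz) of a grating component.

    v: speed (deg/s); alpha_deg: direction of motion;
    f: spatial frequency (cycles/deg); phi_deg: direction of the frequency vector.
    """
    return abs(v * f * np.cos(np.deg2rad(alpha_deg - phi_deg)))

f, phi = 0.1, 0.0
# Motion orthogonal to the bar (along the frequency vector) at 20 deg/s.
orthogonal = temporal_frequency(v=20.0, alpha_deg=0.0, f=f, phi_deg=phi)
# Oblique motion, 45 deg away, at the correspondingly higher speed.
oblique = temporal_frequency(v=20.0 / np.cos(np.deg2rad(45.0)), alpha_deg=45.0, f=f, phi_deg=phi)
```

Both motions place the component at the same temporal frequency, so the two stimulus planes share the same line of energy and cannot be told apart from that line alone.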
When bar length is reduced, there is a dramatic change in the distribution of stimulus energy (Fig. 5B). The energy of a short bar lies on wider wings than that of a long bar (Fig. 3, D and E). Widening the wings recruits receptive fields whose orientation is almost perpendicular to the direction of motion (90°). Therefore the intersection between the wings and the volume of visibility contains activated receptive fields that cover a broader range of orientations and the center of which is 60° (green in our color scheme), intermediate between the orientation of the bar and the orientation perpendicular to the direction of motion. When bar length is reduced even further, the bar becomes a dot (Fig. 5C), which contains equal energy at all orientations (Fig. 1C). The wings thus become semi-circles, intersecting a volume of receptive fields with a very broad range of orientations, centered near 90° (cyan), which is orthogonal to the axis of motion.
The predictions of the energy model resemble the results of Basole et al. (2003) as can be verified by comparing the population responses derived from the simulations (Fig. 5E) with the measured ones (Fig. 5D). In both cases, the peak of the population response moves progressively from 45 to 90° as bar length is reduced. In the simulations, however, the width of the population response broadens as bar length is shortened (Fig. 5E), an effect that is not seen in the data (Fig. 5D).
The model predicts quantitatively how the orientation of the peak population response depends on stimulus length (Fig. 5F). For the speed tested here, the response to a long bar peaks at the orientation of the bar (Fig. 5F, rightmost values), whereas the response to a dot peaks at the orientation orthogonal to the axis of motion (Fig. 5F, leftmost values). A short bar is intermediate between a dot and a long bar and elicits a response that peaks at orientations intermediate between those two extremes (Fig. 5F, intermediate values).
Effect of bar direction
In a second set of experiments, Basole et al. (2003) measured the effect of varying direction of motion of short bars. Their results indicate that the population response depends on direction of motion. With bars moving perpendicularly to their orientation (45°), the population response peaked at the veridical orientation (45°). With bars moving horizontally, however, the response peaked closer to vertical (at 67°). With bars moving vertically the distribution peaked closer to horizontal (at 22°).
The model predicts these effects (Fig. 6). We simulated model responses to three short bars with different directions of motion (Fig. 6, A–C). Even though the orientation of the bar is fixed, the distribution of stimulus energy in frequency space depends on the direction of motion (Fig. 3). Changing the direction of the short bar shifts the intersection between stimulus energy and the volume of visibility (Fig. 6, A–C), resulting in very different population responses (Fig. 6G). In agreement with the data of Basole et al. (2003), the peak of the distribution lies between the orientation of the bar and the orientation that is perpendicular to the direction of motion (Figs. 6I and 5F).
In this set of experiments, Basole et al. (2003) also showed an intriguing result: for each direction of motion of the short bar they could choose a long bar of different orientation and direction of motion that elicited similar responses. The model predicts these results, as illustrated in Fig. 6, D–F. For example, the population response to a short diagonal bar (45°) moving horizontally (Fig. 6A) peaks at an orientation of 68° (Fig. 6G); by definition, a long bar oriented at 68° (Fig. 6D) will result in a population response that peaks at the same orientation (Fig. 6H). The direction of motion of the long bar has little effect on the distribution (Fig. 5A) and can be chosen perpendicular to its orientation. Thus the short bar (Fig. 6A) and the long bar (Fig. 6D), though they differ both in orientation and in direction of motion, elicit very similar population responses. Proceeding in a similar manner, one can find “equivalent long bars” also for the remaining two stimuli (Fig. 6, E and F). The short bars and the long bars give rise to population responses that peak at similar orientations and have a broadly similar shape (Fig. 6, G and H).
Effect of bar speed
In a third set of experiments, Basole et al. (2003) measured responses to short moving bars differing in speed. As in the previous experiments, the bars moved obliquely. Their results indicate that the population response depends on bar speed (Fig. 7D). At low speeds, the population response peaks near 60°, higher than the veridical bar orientation (Fig. 7D, light gray). This is the same behavior that was observed in the previous set of experiments. With faster bars, however, the distribution peaks near 45°, the veridical bar orientation (Fig. 7D, medium gray). With even faster bars, the distribution peaks around 20°, lower than the veridical bar orientation (Fig. 7D, dark gray).
To see if the energy model predicts these effects, we simulated its response to those three stimuli (Fig. 7, A–C). For the slowest bar (Fig. 7A), the results are identical to those seen in the previous simulations (Fig. 5B): The tips of the wings intersect the volume of visibility (Fig. 2C), so the volume of stimulated receptive fields is centered on 60° (green). For the faster bar (Fig. 7B), the wings containing stimulus energy become steeper: moving stimuli have energy on planes whose slant increases with speed. The higher slant causes the entire wings to intersect the volume of visibility (Fig. 2C). The intersection therefore includes a much larger portion of the volume with receptive fields that cover almost the entire range of orientations. The center of this range is around 45° (between yellow and green). When bar speed is increased even further (Fig. 7C), the planes become so steep that only the bases of the wings intersect the volume of visibility. The rest of the stimulus energy lies outside the volume of visibility because it occurs at temporal frequencies that are too large for the V1 receptive fields. The steep wings elicit strong responses only in receptive fields oriented around 0° (red). In summary, the three stimuli elicit responses that peak at widely different orientations, even though spatially they are identical.
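The progression from tips to entire wings to bases can be checked with simple arithmetic. On a stimulus plane, a component whose spatial frequency along the axis of motion is f has temporal frequency v·f, so the range of preferred temporal frequencies (2 to 8 Hz) selects a band of f that shrinks toward the origin as speed grows. The function `visible_band` and the three speeds are illustrative. The sketch considers only preferred frequencies and ignores receptive field bandwidth.

```python
def visible_band(v, tf_range=(2.0, 8.0)):
    """Band of spatial frequency along the axis of motion (cycles/deg)
    whose temporal frequency v * f falls within tf_range (Hz)."""
    return tf_range[0] / v, tf_range[1] / v

# Illustrative speeds within the simulated range of 10 to 280 deg/s.
bands = {v: visible_band(v) for v in (10.0, 40.0, 280.0)}
for v, (low, high) in bands.items():
    print(f"v = {v:5.0f} deg/s: {low:.3f} to {high:.3f} cycles/deg")
```

Compared with preferred spatial frequencies between 0.05 and 0.2 cycles/°, the slow stimulus reaches the volume only at its highest spatial frequencies (the tips of the wings), the intermediate speed spans the whole range, and the fast stimulus falls at spatial frequencies below the preferred range (the bases of the wings).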
Again, the predictions of the energy model resemble the results of Basole et al. (2003) as can be verified by comparing the population responses derived from the simulations (Fig. 7E) with the measured ones (Fig. 7D). In both cases, the peak of the population response moves progressively from near 60 to near 45 to near 0° as bar speed is increased. In the simulations, however, the baseline of the population response is raised substantially at intermediate speeds (Fig. 7E), an effect that is not seen in the data (Fig. 7D).
The model predicts quantitatively how the orientation of the peak population response depends on stimulus speed (Fig. 7F). For the short bar moving obliquely, the response to slow motion peaks above the veridical orientation of the bar (Fig. 7F, leftmost values), whereas the response to a fast motion peaks at the orientation parallel to the axis of motion (Fig. 7F, rightmost values). Between these two speeds, there is an intermediate speed where response peaks near the veridical orientation (Fig. 7F, intermediate values). This behavior is in broad agreement with the measurements of Basole et al. (2003) (their Fig. 3C) in that there is a wide range of stimulus speeds where the orientation of peak response depends almost linearly on stimulus speed.
The phenomenon observed with the fastest bar in the previous experiments is related to a perceptual phenomenon known as “motion streaks” or “speedlines” (Burr and Ross 2002; Geisler 1999). This phenomenon is commonly demonstrated with dot stimuli. When a dot moves very fast, it is perceived as a stationary line with orientation parallel to the dot's axis of motion. It has been proposed that this perception involves neurons whose receptive field is parallel to the dot's axis of motion (Burr and Ross 2002; Geisler 1999). In response to a long bar, these neurons would prefer the orthogonal direction of motion. The energy model exhibits a similar behavior (Skottun et al. 1994): a bar moving very fast elicits responses in receptive fields whose orientation is parallel to the axis of motion (Fig. 7C).
Indeed, the model provides a simple explanation of motion streaks (Fig. 8). A dot moving slowly (Fig. 8A) elicits the largest response in a volume of receptive fields oriented orthogonally to the axis of motion (between green and yellow). By contrast, a dot moving fast (Fig. 8C) elicits the largest response in a volume of receptive fields oriented parallel to the axis of motion (between blue and violet). These are the receptive fields that would signal the “motion streak” (Geisler 1999; Geisler et al. 2001). For comparison, the receptive fields stimulated by the slow dot (Fig. 8A) have a similar orientation to the receptive fields stimulated by a long bar moving along the dot's axis of motion (Fig. 8D). By contrast, the receptive fields stimulated by the fast dot (Fig. 8C) have a similar orientation to the receptive fields stimulated by an orthogonal bar, which moves at a right angle to the dot's axis of motion (Fig. 8E).
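The reversal between slow and fast dots can be reproduced with a small numerical sketch. It makes several simplifying assumptions that are not part of the original simulations: the dot is idealized as point-like, so its spatial spectrum is flat; a single pair of receptive fields (0.1 cycles/°, 4 Hz) is compared; and the response is computed by summing receptive field sensitivity over the two stimulus planes. The name `dot_response` is hypothetical.

```python
import numpy as np

def dot_response(v, alpha_deg, phi_deg, sf=0.1, tf=4.0, n=1201, wmax=0.6):
    """Response of one Gaussian receptive field to a point-like dot moving
    back and forth at speed v (deg/s) along alpha_deg.

    phi_deg is the angle of the receptive field center in the spatial
    frequency plane; phi = alpha means bars orthogonal to the motion.
    """
    w = np.linspace(-wmax, wmax, n)   # fine grid to resolve steep planes
    WX, WY = np.meshgrid(w, w, indexing="ij")
    a, p = np.deg2rad(alpha_deg), np.deg2rad(phi_deg)
    spatial = np.exp(-((WX - sf * np.cos(p))**2 + (WY - sf * np.sin(p))**2)
                     / (2 * (sf / 3)**2))
    proj = WX * np.cos(a) + WY * np.sin(a)
    response = 0.0
    for s in (+v, -v):
        wt = s * proj   # temporal frequency of each component on this plane
        response += np.sum(spatial * np.exp(-(wt - tf)**2 / (2 * (tf / 3)**2)))
    return response

slow_orth, slow_par = dot_response(10.0, 0.0, 0.0), dot_response(10.0, 0.0, 90.0)
fast_orth, fast_par = dot_response(280.0, 0.0, 0.0), dot_response(280.0, 0.0, 90.0)
```

Under these assumptions the slow dot drives the receptive field with bars orthogonal to the motion more strongly, whereas the fast dot drives the receptive field with bars parallel to the motion more strongly, which is the signature of a motion streak.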
The explanation of motion streaks in terms of the energy model finds further support in the data of Basole et al. (2003). In their supplementary Fig. 4, these authors reported responses to slow and fast dots and found the same results that we have obtained in the simulations. A possible further test of the appropriateness of the energy model lies in measuring responses to dots of intermediate speed. The model predicts that there is a critical speed that separates the two ranges of behavior seen in Fig. 8, A and C: at this critical speed, the dot stimulates receptive fields of every orientation, albeit of different spatial and temporal frequency (Fig. 8B). An analysis of the breadth of tuning of the population responses as a function of dot speed might reveal a similar phenomenon in the optical imaging data.
We have demonstrated that an energy model of V1 exhibits behaviors similar to those measured with optical imaging by Fitzpatrick and collaborators (Basole et al. 2003). In the model, V1 neurons are filters in frequency space whose receptive fields are selective for a small range of orientations, spatial frequencies, and temporal frequencies. Once they are combined across the population, these receptive fields define a volume of visibility. The population response elicited by a stimulus depends on the portion of stimulus energy contained in the volume of visibility. This simple model predicts how the population responses to bars of various lengths and speeds can peak at orientations that differ from the veridical. In particular, the model explains one of the central results of Basole et al. (2003): that there is no single, abstract map of orientation preference—the map depends on the stimulus used.
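The shift of the population peak away from the bar's orientation can be sketched in the same spirit. The example below is an illustration under stated assumptions, not the simulation reported here: bars are approximated by elongated two-dimensional Gaussians (as in our simulations), but the function names, the log-Gaussian density of the volume of visibility, and every numerical default are hypothetical.

```python
import numpy as np

def log_gaussian(x, peak, bw_octaves):
    """Log-Gaussian tuning; falls toward zero as x approaches zero."""
    x = np.maximum(x, 1e-9)  # avoid log2(0) at zero frequency
    return np.exp(-np.log2(x / peak) ** 2 / (2 * bw_octaves ** 2))

def bar_population_response(bar_ori, length_sigma, direction, speed,
                            width_sigma=0.2, sf_peak=0.25, tf_peak=4.0,
                            bw=1.5):
    """Toy population response to a drifting Gaussian "bar".

    bar_ori and direction are in degrees, sigmas in degrees of visual
    angle, speed in deg/s. Returns receptive field orientations (deg)
    and the normalized summed response at each orientation.
    """
    ori = np.arange(0.0, 180.0, 1.0)
    sf = np.logspace(-2, 1, 300)  # spatial frequency, cycles/deg
    # A receptive field oriented at `ori` has its frequency vector at
    # ori + 90 deg.
    phi = np.deg2rad(ori + 90.0)[:, None]
    fx, fy = sf * np.cos(phi), sf * np.sin(phi)
    # Frequency components along and across the long axis of the bar.
    a = np.deg2rad(bar_ori)
    f_along = fx * np.cos(a) + fy * np.sin(a)
    f_across = -fx * np.sin(a) + fy * np.cos(a)
    # Energy spectrum of the elongated Gaussian: narrow along the bar's
    # long axis in frequency space when the bar is long.
    energy = np.exp(-4 * np.pi ** 2 * ((length_sigma * f_along) ** 2
                                       + (width_sigma * f_across) ** 2))
    # Temporal frequency on the plane defined by the stimulus velocity.
    d = np.deg2rad(direction)
    tf = speed * np.abs(fx * np.cos(d) + fy * np.sin(d))
    # Assumed density of the volume of visibility.
    density = log_gaussian(sf, sf_peak, bw) * log_gaussian(tf, tf_peak, bw)
    response = (density * energy).sum(axis=1)
    return ori, response / response.max()

# Vertical bar (90 deg) drifting obliquely (direction 45 deg) at 4 deg/s.
ori, long_bar = bar_population_response(90.0, 5.0, 45.0, 4.0)
_, short_bar = bar_population_response(90.0, 0.5, 45.0, 4.0)
print("long bar peak:", ori[np.argmax(long_bar)])
print("short bar peak:", ori[np.argmax(short_bar)])
```

In this sketch the long bar confines stimulus energy to a narrow range of orientations, so the peak should stay at the bar's orientation whatever the direction of motion. The short bar spreads energy over many orientations, and the tilt of the velocity plane should then pull the peak toward the orientation orthogonal to the direction of motion.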
The model correctly predicts another aspect of the results of Basole et al. (2003), who found that different combinations of stimulus features can elicit the same population responses when measured in terms of preferred orientation. We illustrate this effect by replotting in Fig. 9 some results that were presented in Figs. 6 and 8. As far as receptive field orientation is concerned, the same responses are elicited by three different moving stimuli: a long bar moving orthogonally (Fig. 9A), a shorter bar moving more slowly and obliquely (Fig. 9B), or a dot moving at high speed (Fig. 9C).
This similarity in the population responses does not mean that V1 should confuse those three stimuli. The stimuli would elicit responses in receptive fields with different preferences for spatial and temporal frequency. Differences between the responses therefore should become visible if one expressed the population response not only in terms of preferred orientation but also in terms of preferred spatial frequency (Bonhoeffer et al. 1995; Hubener et al. 1997; Issa et al. 2000; Shoham et al. 1997; Tootell et al. 1981) and of preferred temporal frequency (DeAngelis et al. 1999; Shoham et al. 1997). However, whether these putative differences in the population responses can be measured experimentally depends on a number of factors, not only on how frequency space is mapped on the cortical surface. The expected differences in pattern of activation might be too small to be resolved by optical imaging either because of blur in signal acquisition or because the relevant neurons are intermingled. In fact, to obtain optical images of intrinsic signals (Bonhoeffer and Grinvald 1996), cortical activity is averaged not only over space (because of limits in resolution) but also over time (because intrinsic signals are slow). Therefore, even if two measured activity maps were identical, one should not conclude that the underlying neural responses were identical, because differences could occur at a finer spatial or temporal scale than optical imaging can resolve.
Our results and those of Basole et al. (2003) are in broad agreement with each other and with the literature on single-cell responses (the energy model stems from this literature), but they are at odds with much of the literature on functional architecture. This literature tends to contain the more or less overt assumption that the functional maps seen in V1 represent independent maps of stimulus features (Hubener et al. 1997; Issa et al. 2000; Shmuel and Grinvald 1996; Swindale 2000; Swindale et al. 2000; Weliky et al. 1996). According to this view, each of the feature maps represents how the selectivity to a given stimulus feature is mapped onto the cortical surface. A stimulus with a particular combination of features would elicit responses in a map given by the intersection of the corresponding feature maps. This prediction is contradicted by the results of Basole et al. (2003) and by the energy model. For example, the three stimuli in Fig. 7 have identical orientation but elicit responses in widely different portions of the population. These results can be reconciled with the idea of independent maps if what is mapped on the cortical surface is not the features of the stimulus but rather the receptive fields' tuning for stimulus energy.
The effects of stimulus length, direction, and speed that Basole et al. (2003) demonstrated in ferret are likely to occur also in cat and in primate. Indeed, as we have shown, these effects are a consequence of the shape of the volume of visibility, which, unlike its extent, is similar across species. The shape of the volume of visibility can be observed by taking a slice at a given orientation, and observing the density of the slice (Fig. 10). This slice through the volume of visibility closely resembles the “aggregate spatiotemporal frequency response” measured in cats (DeAngelis et al. 1993a) and seems consistent with observations made in primates (De Valois and De Valois 1988). The density of the volume of visibility is a smooth function of frequency. Crucial to the effects demonstrated here is the fact that this density peaks at intermediate values of spatial and temporal frequencies and decreases markedly when these frequencies approach zero. Because this property of the volume of visibility is common to ferret, cat and primate V1, one can expect qualitatively similar effects in all these species (Alitto and Usrey 2004; Baker et al. 1998; De Valois and De Valois 1988; De Valois et al. 1982; DeAngelis et al. 1993a; Holub and Morton-Gibson 1981; Ikeda and Wright 1975; Movshon et al. 1978b; Tolhurst and Thompson 1981).
While our implementation of the energy model succeeds in capturing the main phenomena seen by Basole et al. (2003), it cannot be considered to be entirely realistic in its details. First, we have assumed that a neuron's selectivity is entirely due to a linear operation based on the receptive field. This basic linearity holds only partially, as V1 neurons are known to exhibit nonlinearities such as contrast saturation, phase advance, surround suppression, contrast-dependent temporal frequency tuning, etc. (reviewed in Carandini et al. 1999). These nonlinearities would be likely to emerge if one were to use stimuli that engage them significantly, for example if one were to manipulate stimulus contrast. Moreover, we have assumed that all receptive fields have the same frequency bandwidth and that receptive field preferences are uniformly distributed across spatial and temporal frequency. These assumptions are simplistic. Both the spatial and temporal frequency bandwidth are known to vary across neurons. Moreover, the spatial frequency bandwidth of ferret V1 neurons correlates negatively with preferred spatial frequency (Baker et al. 1998), and the preferred spatial and temporal frequencies are negatively correlated in cat V1 neurons (Baker and Cynader 1986; DeAngelis et al. 1993a). Nonetheless, these differences between the model and the known properties of V1 receptive fields do not have a strong impact on the predicted responses. Indeed, even with our simplifying assumptions, the resulting shape of the volume of visibility (Fig. 10) is reassuringly similar to that measured in cat (DeAngelis et al. 1993a).
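The way a population of identical-bandwidth receptive fields yields a density that peaks at intermediate frequencies can be sketched numerically. The example below is a hypothetical illustration, not the code behind Fig. 10. It assumes that preferred frequencies are uniformly spaced on logarithmic axes, that tuning is log-Gaussian with a bandwidth of 1 octave, and that preferences span 0.05 to 1 cycles/deg and 1 to 16 Hz; none of these values is taken from our simulations.

```python
import numpy as np

def visibility_slice(sf, tf, sf_range=(0.05, 1.0), tf_range=(1.0, 16.0),
                     bw_octaves=1.0, n_rf=25):
    """Density of a toy volume of visibility on an (sf, tf) slice.

    Sums the energy sensitivity of n_rf x n_rf receptive fields that share
    one bandwidth and whose preferred frequencies are uniformly spaced in
    log2 frequency (an assumption). Returns an array of shape
    (len(sf), len(tf)).
    """
    sf_pref = np.logspace(np.log2(sf_range[0]), np.log2(sf_range[1]),
                          n_rf, base=2.0)
    tf_pref = np.logspace(np.log2(tf_range[0]), np.log2(tf_range[1]),
                          n_rf, base=2.0)

    def tuning(f, pref):
        # Amplitude tuning of every receptive field at every frequency,
        # shape (len(f), n_rf).
        f = np.maximum(np.asarray(f, dtype=float), 1e-9)
        return np.exp(-np.log2(f[..., None] / pref) ** 2
                      / (2 * bw_octaves ** 2))

    # Energy sensitivity is the squared amplitude tuning. Because every
    # sf preference is paired with every tf preference, the double sum
    # factorizes into a product of two one-dimensional sums.
    sf_sens = (tuning(sf, sf_pref) ** 2).sum(axis=-1)
    tf_sens = (tuning(tf, tf_pref) ** 2).sum(axis=-1)
    return sf_sens[:, None] * tf_sens[None, :]

sf = np.logspace(-3, 1, 50)  # cycles/deg
tf = np.logspace(-2, 2, 50)  # Hz
density = visibility_slice(sf, tf)
i, j = np.unravel_index(np.argmax(density), density.shape)
print(f"density peaks near {sf[i]:.2f} cycles/deg, {tf[j]:.1f} Hz")
```

Under these assumptions the density is a smooth function that is highest in the middle of the sampled range and falls off steeply toward zero frequency, which is the property that the effects described here depend on.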
Similarly, in our simulations we have made several simplifying assumptions about the visual stimuli. First, we simulated only the response of a population of receptive fields that are spatially centered on the stimuli, while neglecting neurons whose receptive fields contained only part of the stimuli. It is conceivable that a model in which the neurons see the ends of the stimuli or see more than one bar (the stimuli of Basole et al. contained a random arrangement of bars) might give somewhat different results. Second, we approximated the bar stimuli used in the experiments with two-dimensional Gaussians. The difference between these stimuli, however, is not likely to be substantial because optical and neural blur would be expected to remove the sharp edges and corners of the bars: the energy of these stimuli differs mostly outside the volume of visibility. Third, the simulated stimuli are relatively large with respect to the size of the receptive fields. This difference is not likely to play any significant role: In additional simulations we considered smaller stimuli (width = 1°) that more closely resemble those used by Basole et al. (2003) and obtained results similar to the ones shown here.
The energy model can be used to test whether a visual area performs simple image filtering—as is the case to a first approximation for area V1—or whether it performs more advanced image processing and scene analysis. For example, Sheth et al. (1996) suggested that the two subdivisions of cat V1 (areas 17 and 18) differ in their sensitivity to illusory contours. These authors argued that maps of activation in area 17 reflected the physical orientation of stimulus bars, whereas activation of area 18 reflected the orientation of illusory contours generated by the arrangement of the bars. These results might perhaps be explained in terms of the different selectivity of areas 17 and 18 for spatial and temporal frequency (Issa et al. 2000; Movshon et al. 1978b). Alternatively, these results could reflect a genuine difference between areas 17 and 18. If so, the energy model should fail in predicting the responses obtained in area 18. Although it makes no prediction as to the shape of functional maps, the energy model can predict which combinations of stimulus features should elicit similar maps of activation.
In conclusion, our results confirm the intuition of Basole et al. (2003), who had suggested that an energy model would be appropriate to explain their results. Moreover, our results indicate that the same concepts and models that have proven successful in explaining single unit responses can be fruitfully applied to understand the functional maps that are such a striking feature of primary visual cortex.
This work was supported by the James S. McDonnell Foundation 21st Century Research Award in Bridging Brain, Mind, and Behavior.
We thank A. Basole, R. Frazor, and D. Schoppik for helpful suggestions. N. Issa and T. Baker, working independently, have obtained results similar to ours. We thank them for coordinating with us the publication of their paper.
- Copyright © 2005 by the American Physiological Society