Volen Center for Complex Systems, Brandeis University, Waltham, Massachusetts 02254-9110
ABSTRACT
Salinas, Emilio and L. F. Abbott. Invariant visual responses from attentional gain fields. J. Neurophysiol. 77: 3267-3272, 1997. Inferotemporal (IT) neurons exhibit a substantial degree of invariance with respect to translation of images used as visual stimuli. Through theoretical and computer-modeling methods, we show how translation-invariant receptive fields, like those of IT neurons, can be generated from the responses of V4 neurons if the effects of attention are taken into account. The model incorporates a recently reported form of attention-dependent gain modulation in V4 and produces IT receptive fields that shift so they are centered at the point where attention is directed. Receptive fields of variable, attention-controlled spatial scale are obtained when the mechanism is extended to scale-dependent attentional gain fields. The results indicate that gain modulation may play analogous roles in the dorsal and ventral visual pathways, generating transformations from retinal coordinates to body- and object-centered systems, respectively.
The ability to recognize an object regardless of the precise location and scale of its retinal image is a striking feature of visual perception. Inferotemporal (IT) neurons in monkeys provide a neuronal correlate of this phenomenon by displaying translation- and scale-invariant responses to complex visual stimuli (Desimone et al. 1984
Model
Our model consists of a population of V4 neurons driving a single model IT neuron through feed-forward synaptic connections. In accordance with the data, the firing rates of the model V4 neurons are represented by the product of two terms: the output of a nonlinear filter acting on the luminance distribution of the visual scene and a gain field that depends on the location where attention is being directed. The detailed structure of the V4 receptive fields is not critical for the results, but the model works better when the visual responses are nonlinear in the luminosity, for reasons given below. To satisfy this requirement, visual responses of the model V4 neurons are generated using an "energy" model (Heeger 1991
Translation invariance
In Fig. 1, images appear on a 64 × 32 pixel array (1 pixel = 1°), and receptive field centers ai are distributed uniformly on a 32 × 16 grid, separated by 2 pixels in each direction. For each location, there are neurons with four orientation preferences, 0, 45, 90, and 135°, and three frequency selectivities, 1/8, 2/8 and 3/8 cycles per degree. Complex-cell-like responses Fi(ai; I) are generated using an energy model (Heeger 1991
Scale invariance
In Fig. 2, images appear on a 32 × 32 pixel array, and receptive field centers are arranged uniformly on a 16 × 16 grid. The same variety of orientations and spatial frequencies as in Fig. 1 is used. Unlike Fig. 1, the visual responses are modeled as rectified linear filters. Two kinds of visual filters are considered
In the computer simulations, an image is shown at a particular location, the model V4 neurons respond according to Eq. 1 and drive the model IT neuron as specified by Eq. 2. The synaptic connections are established first by translating the training image and enabling the Hebbian synaptic modification process described above. After training, the synaptic weights are no longer modified, and the model is then tested. During training, the value of y, corresponding to the position of the attentional locus, is equal to the position of the image presented; during testing, it is set to a variety of fixed locations. The results plotted in the figures show IT responses during the testing phase.
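The training and testing procedure can be sketched in one dimension. This is only an illustrative toy under stated assumptions, not the simulation used for the figures: positions lie on a periodic ring of `N` pixels, the visual "filter" is simply the pixel value, every receptive field center is paired with every preferred attentional locus, and the Hebbian rule is taken to be a plain sum of presynaptic V4 rates with the postsynaptic rate held at 1. All names (`train`, `translate`, `v4_rates`, `template`) are hypothetical.

```python
import math

N = 32        # pixels on a periodic ring (assumption, avoids edge effects)
SIGMA = 2.0   # gain-field standard deviation in pixels

def gain(y, b):
    """Gaussian gain G(y - b), using distance measured around the ring."""
    d = abs(y - b) % N
    d = min(d, N - d)
    return math.exp(-d * d / (2.0 * SIGMA ** 2))

def translate(template, p):
    """Return an N-pixel image with the template placed starting at pixel p."""
    image = [0.0] * N
    for k, value in enumerate(template):
        image[(p + k) % N] = value
    return image

def v4_rates(image, y):
    """Eq. 1 with an identity filter: rate of cell (a, b) is I[a] * G(y - b)."""
    return {(a, b): image[a] * gain(y, b) for a in range(N) for b in range(N)}

def train(template):
    """Assumed Hebbian rule: sum presynaptic rates while attention tracks the image."""
    w = {(a, b): 0.0 for a in range(N) for b in range(N)}
    for p in range(N):
        for key, rate in v4_rates(translate(template, p), y=p).items():
            w[key] += rate
    return w

def it_response(w, image, y, threshold=0.0):
    """Eq. 2: rectified, thresholded, weighted sum of V4 rates."""
    drive = sum(w[key] * rate for key, rate in v4_rates(image, y).items())
    return max(drive - threshold, 0.0)

template = [1.0, 0.2, 0.8, 0.0, 0.5]   # arbitrary 1-D "image"
w = train(template)

# Testing phase: attention is held fixed while the image is moved.
curve_16 = [it_response(w, translate(template, p), y=16) for p in range(N)]
curve_8 = [it_response(w, translate(template, p), y=8) for p in range(N)]
print(curve_16.index(max(curve_16)), curve_8.index(max(curve_8)))
```

In this toy setting the learned weight for cell (a, b) depends only on the difference a - b, and each response curve is a copy of the other displaced by the change in attended position, which is the qualitative behavior reported for the full model.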
There are two costs associated with a gain modulation mechanism for producing object-centered receptive fields. First, there is some loss of resolution in the relative placement of the different V4 filters because the synaptically weighted sum that determines the IT neuron response acts effectively as a convolution over the gain field profile (see Eq. 9). However, analytic calculations show that the attentional gain field causes no loss of resolution for features within the receptive field of a given V4 cell, provided that the visual filter is a nonlinear functional of the luminance distribution. Indeed we found that the simulated IT responses are more selective when the V4 neurons are modeled as nonlinear filters (like, for example, those of complex cells) than as linear filters. Nevertheless, not all nonlinearities are equally resistant to the "smearing" caused by the convolution over the gain field profile. In the case of translation, the complex-cell-like responses used to generate Fig. 1 (Eq. 3) result in a more pronounced IT selectivity than the simple-cell-like filters of Eq. 6, although, because of rectification, these too are nonlinear. The opposite happens in the case of scaling; the rectified linear filters produce IT receptive fields that are more selective than those obtained through the energy model. Thus each invariance is best achieved using a particular type of matched nonlinearity.
INTRODUCTION
; Hasselmo et al. 1989; Logothetis et al. 1995; Lueschow et al. 1994; Miyashita and Chang 1988; Tovee et al. 1994). Neurons at high levels of the object-recognition pathway of the visual system act as complex filters selective for specific patterns of shape and color (Desimone et al. 1984; Fujita et al. 1992; Gallant et al. 1993; Schwartz et al. 1983). For these cells to exhibit invariant responses, their filters need to be translated from a fixed retinal coordinate frame to a coordinate frame centered on an attended object (Anderson and Van Essen 1987; Hinton 1981a,b; Olshausen et al. 1993). Despite some interesting suggestions (Olshausen et al. 1993), a neuronal mechanism capable of producing this shift has not been verified experimentally.
; Schiller and Lee 1991). Attention produces a number of effects in this area (Connor et al. 1996; Desimone and Duncan 1995; Moran and Desimone 1985; Motter 1993). Recent observations (Connor et al. 1996) indicate that the visual responses of many V4 neurons are modulated by a multiplicative gain factor that depends on where attention is being directed. The gain modulation for each cell is maximal when attention is focused on a point that we call the preferred attentional locus, and it decreases when attention moves away from this point (Connor et al. 1996, 1997). Although the neurons were not tested with attention focused directly at the center of their receptive fields, in several cases the responses were shown to increase as attention was directed further away from the receptive field center. Interestingly, the preferred attentional loci were found in directions that appear to be unrelated to the preferred orientations or receptive field locations of the cells and that are uniformly distributed (Connor et al. 1996, 1997). As will be shown below, this surprising feature is the crucial element that allows V4 neurons to generate object-centered receptive fields further down the visual processing stream.
, 1992), similar to that used to describe the receptive fields of complex cells in primary visual cortex. The effect of contrast normalization (Carandini and Heeger 1994; Heeger 1991, 1992) is included by dividing all visual responses by the total power present in the image. Receptive field centers for the V4 neurons are distributed uniformly across the visual field. To keep the total number of model cells reasonable, the V4 receptive fields have four orientation and three spatial frequency preferences. The output of the visual filter for cell i is denoted by Fi(ai; I), where ai is the center of the cell's receptive field and I is the image shown.
). In accordance with these results, the gain fields in the model are represented by Gaussian functions, G. The modulatory term for cell i is denoted by G(y − bi), where y is the currently attended location and bi is the preferred attentional locus of cell i. The Gaussian attentional gain fields are approximately twice the size of the visual receptive fields. According to the experimental findings, there is no alignment or correlation between receptive field centers and preferred attentional loci, other than the fact that they are to some degree near each other. In particular, for a given neuron, the direction in which the preferred attentional locus is displaced relative to the receptive field center is random.
The response of model V4 neuron i is denoted by ri and is equal to the output of its visual filter times the corresponding modulatory factor
$$r_i = F_i(a_i; I)\, G(y - b_i). \qquad (1)$$
The response of the single model IT neuron, termed V, is determined by computing a synaptically weighted sum of V4 responses, subtracting a threshold θ, and rectifying the result
$$V = \Big[ \sum_i W_i\, r_i - \theta \Big]_+ . \qquad (2)$$
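As a minimal sketch of Eqs. 1 and 2, the following reduces the model to one spatial dimension and a handful of cells. The function names, the three example cells, and their weights are hypothetical; the gain-field standard deviation of 2 follows the value quoted in METHODS, and the filter outputs are simply set to 1 rather than computed from an image.

```python
import math

def gaussian_gain(y, b, sigma=2.0):
    """Attentional gain G(y - b): largest when attention y is at the preferred locus b."""
    return math.exp(-((y - b) ** 2) / (2.0 * sigma ** 2))

def v4_response(filter_output, y, b, sigma=2.0):
    """Eq. 1: visual filter output multiplied by the attentional gain factor."""
    return filter_output * gaussian_gain(y, b, sigma)

def it_response(weights, v4_rates, threshold=0.0):
    """Eq. 2: weighted sum of V4 rates, minus a threshold, then rectified."""
    drive = sum(w * r for w, r in zip(weights, v4_rates)) - threshold
    return max(drive, 0.0)

# Three hypothetical V4 cells with equal filter output and different preferred loci.
filter_outputs = [1.0, 1.0, 1.0]
preferred_loci = [0.0, 2.0, 4.0]
weights = [0.5, 1.0, 0.5]

# Attention directed at y = 2, the preferred locus of the middle cell.
rates = [v4_response(f, y=2.0, b=b) for f, b in zip(filter_outputs, preferred_loci)]
print(rates, it_response(weights, rates, threshold=0.2))
```

The middle cell responds at its full filter output because attention sits at its preferred locus, while the flanking cells are scaled down symmetrically; the rectification in `it_response` clips any net drive below threshold to zero.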
METHODS
, 1992) by adding the squared outputs of two linearly filtered versions of the image I
$$F_i(a_i; I) = S_i^2 + C_i^2. \qquad (3)$$
Si and Ci stand for the outputs of localized sine and cosine linear filters, i.e.,
(4)
The linear filters are similar to Gabor functions (Field and Tolhurst 1986; Jones and Palmer 1987) except that, for reasons of computational efficiency, half-cosine envelopes (rather than Gaussian) are used
(5)
The position vector x has components (x1, x2), and the half-cosine function hcos(x) is equal to cos(x) if −π/2 < x < π/2 and to 0 otherwise. Here the orientation angle of filter i and ki determine the preferred orientation and spatial frequency, respectively, and the envelope width parameter determines the receptive field width at baseline, which is 4° (= 4 pixels) for all cells. Preferred attentional loci are located at 24 positions uniformly spaced throughout the 64 pixels in the x direction. Each visual filter output Fi(ai; I) is combined with those preferred attentional loci within 8 pixels from the receptive field center ai, producing six or seven attention-modulated responses (originally we included all 24 combinations of preferred attentional loci for each visual filter output, but we found that only the 6 or 7 nearest the receptive field center actually were needed). The Gaussian attentional gain fields have a standard deviation σ = 2°, and therefore a baseline width of ~4σ = 8°. The result is a total of 32 × 16 × 4 × 3 × 6 V4 responses. The threshold θ in Eq. 2 serves to enhance the selectivity of the model IT neuron by eliminating the lower, typically broader, part of its response curve. It is set to 50% of the maximum response obtained when θ = 0.
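The energy computation and the size of the V4 population can be illustrated with a short one-dimensional sketch. The exact filter expressions are not reproduced here, so the envelope `hcos(pi * u / width)`, the use of cycles per pixel for `freq`, and the omission of contrast normalization are all assumptions made for illustration; `energy_response` and `n_v4` are hypothetical names.

```python
import math

def hcos(x):
    """Half-cosine: cos(x) for -pi/2 < x < pi/2, and 0 otherwise."""
    return math.cos(x) if -math.pi / 2 < x < math.pi / 2 else 0.0

def energy_response(image, center, freq, width=4.0):
    """1-D energy model: squared sine-filter output plus squared cosine-filter output.

    The envelope hcos(pi * u / width) is an assumed form; it vanishes for
    |u| >= width / 2, which gives a baseline width of `width` pixels.
    """
    s = 0.0
    c = 0.0
    for x, luminance in enumerate(image):
        u = x - center
        envelope = hcos(math.pi * u / width)
        s += envelope * math.sin(2.0 * math.pi * freq * u) * luminance
        c += envelope * math.cos(2.0 * math.pi * freq * u) * luminance
    return s * s + c * c

# A single bright pixel drives a filter centered on it but not one two pixels away.
delta = [0.0] * 16
delta[8] = 3.0
print(energy_response(delta, center=8, freq=0.125))
print(energy_response(delta, center=6, freq=0.125))

# Population size quoted in the text: grid x orientations x frequencies x loci.
n_v4 = 32 * 16 * 4 * 3 * 6
print(n_v4)
```

Squaring and summing the two filter outputs makes the response nonlinear in luminance, which is the property the model relies on; the product at the end evaluates to 36,864 gain-modulated responses.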

FIG. 1.
Computer simulation of model network for images translated across visual field. Center of circle indicates locus of attention. a: inferotemporal (IT) responses to translated versions of a letter R, which was presented previously during learning. Spike trains on right are generated for visualization purposes using a Poisson process based on mean firing rate of model IT cell. IT neuron response depends on location of R relative to current attentional locus. Examples are shown with attention focused at pixels 16 and 48. Scale bar is 4 pixels. b: IT response (normalized mean firing rate) plotted as a function of image location. As in d and f, filled circles indicate attention centered at pixel 16; open circles indicate attention at pixel 48. c and d: responses of same model neuron to an M. e and f: responses to a degraded version of R. In all cases, IT receptive field moves with attention.
(6)
where the brackets again indicate rectification. To model the modulation produced by the attended scale, each neuron is assigned a preferred attentional scale, analogous to the preferred attentional locus in the case of modulation by an attended location. The gain modulation is a Gaussian function of the difference between the current attended scale and the preferred attended scale of each neuron. A set of 20 preferred attended scales, varying from 3 to 30°, is used to modulate the visual responses; the standard deviation of the Gaussian gain field is σ = 1°. A total of 16 × 16 × 4 × 3 × 2 × 20 gain-modulated V4 responses are used. The threshold θ is set in this case to 40% of the maximum response obtained when θ = 0 (this value is slightly smaller than in the case of translation, so the resulting response curves are not excessively narrow).
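The scale-dependent gain can be sketched in the same way as the location-dependent one. The even spacing of the 20 preferred scales between 3 and 30 is an assumption (the text gives only the range and the count), and the names `preferred_scales`, `scale_gain`, and `n_v4_scale` are hypothetical.

```python
import math

SIGMA_SCALE = 1.0  # standard deviation of the scale gain field, in degrees

# 20 preferred attended scales from 3 to 30 degrees; even spacing is assumed.
preferred_scales = [3.0 + 27.0 * k / 19 for k in range(20)]

def scale_gain(attended_scale, preferred_scale, sigma=SIGMA_SCALE):
    """Gaussian gain as a function of attended minus preferred scale."""
    diff = attended_scale - preferred_scale
    return math.exp(-(diff * diff) / (2.0 * sigma ** 2))

# Which preferred scale is modulated most strongly when attending to a 9-degree scale?
best = max(preferred_scales, key=lambda s: scale_gain(9.0, s))
print(best, scale_gain(9.0, best))

# Population size quoted in the text:
# grid x orientations x frequencies x filter kinds x preferred scales.
n_v4_scale = 16 * 16 * 4 * 3 * 2 * 20
print(n_v4_scale)
```

Because the gain field is narrow relative to the 3 to 30 range, only the few cells whose preferred scale lies near the attended scale contribute appreciably at any one time.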

FIG. 2.
Computer simulation of model network for images shown at different scales. Letters E, previously presented during learning, are shown. a: IT responses to images of sizes 27, 25, and 19 pixels when attended scale is set at 27 pixels (- - -). Spike traces are produced for visualization purposes using a Poisson process based on resulting IT firing rates. b: responses when attended scale is equal to 9 pixels, for images of sizes 9, 11, and 17 pixels. In both a and b, neuron responds strongly when attended scale closely matches size of image. c: mean normalized IT response plotted as a function of image size. Filled circles, attended scale of 9 pixels; open circles, attended scale of 27 pixels. d: degraded version of an E and neural response when attended scale is 15 pixels. e: mean normalized IT response vs. attended size. Filled circles, responses to original E of size 15 pixels; open circles, responses to degraded E shown in d.
RESULTS
$$W_i = W(a_i - b_i). \qquad (7)$$
This expression indicates that the synaptic weight from a particular V4 cell depends on the displacement between its preferred attentional locus and receptive field center but not on these two locations independently. It also implies that, viewed as functions of ai, the synaptic weights for two groups of neurons with different bi are translated versions of one another. The weights also may depend on other parameters, such as preferred orientation, and no constraints are placed on those additional dependencies. For simplicity, we will ignore these additional dependencies in the following analysis. If condition 7 is satisfied, it can be shown, under fairly general assumptions, that gain modulation gives rise to shifting receptive fields. For clarity, we consider the simple case in which the visual responses are given by linear filters acting on the image I
$$F(a; I) = \int dx\, f(x - a)\, I(x). \qquad (8)$$
However, it should be stressed that nothing restricts the analysis to this case; similar results can be derived for nonlinear filters. Replacing the sum over V4 cells in Eq. 2 by integrals over a and b, and leaving aside the threshold and rectification, the response of the IT neuron becomes
$$\int da\, db\, W(a - b)\, G(y - b) \int dx\, f(x - a)\, I(x). \qquad (9)$$
Making the substitutions a → a + y and b → b + y, the integral takes the following form
$$\int dx\, \tilde{f}(x - y)\, I(x), \qquad (10)$$
with
$$\tilde{f}(x) = \int da\, db\, W(a - b)\, G(-b)\, f(x - a). \qquad (11)$$
The image is therefore seen through the filter $\tilde{f}$ displaced by y, so the receptive field of the IT neuron, $\tilde{f}$, will move with the attentional locus. The simulations confirm this result because the simple Hebbian synaptic modification scheme used produces synaptic weights that satisfy Eq. 7; this too can be shown analytically (for a related example see Salinas and Abbott 1995). The particular values of the synaptic weights determine the precise form of the final shifting filter $\tilde{f}$. This is not limited significantly by Eq. 7 because the single-variable function on the right side of Eq. 7 is entirely arbitrary. Furthermore, sets of weights Wi and wi projecting to two different IT neurons can satisfy simultaneously the conditions Wi = Wi(ai − bi) and wi = wi(ai − bi) and still be completely different from each other. Thus the same array of gain-modulated V4 neurons can serve as a basis for multiple, arbitrary shifting filters.
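The shifting property can be checked numerically in a discrete, one-dimensional setting. The sketch below is an illustration under stated assumptions rather than the analysis itself: positions lie on a periodic ring of `N` pixels, the linear filter and the weight function of a − b (Eq. 7) are drawn at random to emphasize that they are arbitrary, and only the drive before thresholding and rectification is computed. All names are hypothetical.

```python
import math
import random

N = 24                  # pixels on a periodic ring (assumption)
rng = random.Random(0)  # fixed seed so the example is reproducible

def gain(y, b, sigma=2.0):
    """Gaussian gain G(y - b), using distance measured around the ring."""
    d = abs(y - b) % N
    d = min(d, N - d)
    return math.exp(-d * d / (2.0 * sigma * sigma))

kernel = [rng.uniform(-1.0, 1.0) for _ in range(N)]  # arbitrary linear filter f
w_of = [rng.uniform(-1.0, 1.0) for _ in range(N)]    # arbitrary function W(a - b)

def filter_output(a, image):
    """Linear filter centered at a: sum over x of f(x - a) * I(x)."""
    return sum(kernel[(x - a) % N] * image[x] for x in range(N))

def it_drive(image, y):
    """Weighted sum of gain-modulated V4 responses, with W depending only on a - b."""
    outputs = [filter_output(a, image) for a in range(N)]
    return sum(w_of[(a - b) % N] * outputs[a] * gain(y, b)
               for a in range(N) for b in range(N))

def shift(image, s):
    """Translate the image by s pixels around the ring."""
    return [image[(x - s) % N] for x in range(N)]

image = [rng.random() for _ in range(N)]

# Moving the image and the attentional locus together leaves the drive unchanged.
for s in (1, 5, 11):
    print(s, it_drive(image, 3), it_drive(shift(image, s), (3 + s) % N))
```

Each printed pair should agree up to floating-point rounding, whatever random filter and weight function are used, because the drive depends on the image only through its position relative to the attended location.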
DISCUSSION
) and IT neurons also show some degree of rotation and perspective invariance (Logothetis et al. 1995). The combinatorial growth could require attentional modulation acting through successive stages in the ventral visual pathway, such that a sequential transformation gradually accumulates. There is some evidence that attentional effects are present in early visual cortical areas (Moran and Desimone 1985; Motter 1993). B. Olshausen has pointed out (personal communication) that the modest shifting effect seen in V4 neurons (Connor et al. 1996) could be due to attentional gain modulation acting at visual stages before V4.
,b; Olshausen et al. 1993), we envision that the responses of high-level visual neurons are fed back to guide the attentional signal, so that receptive fields are scaled accurately and centered on objects that produce robust responses. The mechanism described here is distinct from previous models that achieve translation invariance either through multilayered connectionist architectures engineered to produce "grandmother-cell"-like responses (Fukushima 1980) or by specifying a hypothesized learning or recall dynamics at single synapses (Anderson and Van Essen 1987; Földiák 1991; Hinton 1981a,b; Olshausen et al. 1993; Wallis 1994). The present model exploits the mechanism of gain modulation within a neuronal array in a way that is consistent with reported observations (Connor et al. 1996, 1997) and places a much looser constraint on the individual synapses. Our model is related closely to ideas developed during the study of parietal cortex, where gaze-direction-dependent gain modulation of visual responses has been reported (Andersen et al. 1985, 1990; Brotchie et al. 1995). Theoretical work (Andersen et al. 1990, 1993; Pouget and Sejnowski 1995, 1996; Salinas and Abbott 1995; Zipser and Andersen 1988) suggests that gain-modulated parietal activity forms the basis for transformations from retinal to body-centered coordinates useful in visually guided motor tasks. We propose here that a similar mechanism acts to transform images from a retinal basis to an object-centered form useful for invariant perception. Thus gain modulation may be used in a similar manner to perform coordinate transformations in both the dorsal-"where" and the ventral-"what" visual pathways.
ACKNOWLEDGEMENTS
We are grateful to D. Van Essen and E. Connor for enlightening discussions and for telling us about their experiments. We also thank C. Anderson and B. Olshausen for helpful comments and discussions.
This research was supported by the Sloan Center for Theoretical Neurobiology at Brandeis University, National Science Foundation Grant DMS-9503261, the W. M. Keck Foundation, and the Conacyt-Fulbright-IIE program.
FOOTNOTES
Address reprint requests to L. F. Abbott.
Received 11 September 1996; accepted in final form 7 February 1997.