1 Institute of Neurology, University College London, London, WC1N 3BG, United Kingdom; 2 Institute of Neuroinformatics, University of Zürich and ETH Zürich, 8057 Zürich, Switzerland
Submitted 17 February 2003; accepted in final form 1 August 2003
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
The properties of sensory neurons, including the complex cells, can be expected to be well adapted to the statistics of the stimuli they are exposed to under natural conditions.
The most prominent hypothesis of how neural properties should be adapted to the statistics of natural scenes is called "sparse coding." It states that sensory neurons should be selective to specific features, responding strongly only to a small subset of stimuli and otherwise showing low activity (Barlow 1961; Fyfe and Baddeley 1995; Olshausen and Field 1996). This theory could well explain the properties of simple cells in primary visual cortex (Bell and Sejnowski 1997; Olshausen and Field 1996; Van Hateren and van der Schaaf 1998).
Under what assumption about the objective of adaptation do simulated neurons develop the same properties as complex cells? To derive such an objective, we start with the insight that one of the tasks of the brain is to extract relevant sensory features (Barlow 1961). Relevant variables, such as the description of a visual scene in terms of objects, change on a slower time scale than low-level features, such as luminance in a small spatial region. If we see an animal such as a tiger, for example, it usually stays around for some time. However, the position of the image of its stripes on the retina changes on a shorter time scale. This insight has led to the development of criteria that measure the stability or temporal coherence of the responses of simulated neurons (Becker 1999; Einhäuser et al. 2002; Földiak 1991; Kayser et al. 2001; Klopf 1982; Stone and Harper 1999; Sutton and Barto 1981; Wallis and Rolls 1997; Wiskott and Sejnowski 2002). Some of these studies have successfully applied such a criterion to the representations of artificial stimuli such as moving bars, establishing that such a mechanism could lead to complex-type neurons (Földiak 1991; Wiskott and Sejnowski 2002). With such simple stimuli, however, the population of neurons does not acquire a rich enough distribution of properties to be compared thoroughly with physiology.
Here we apply a similar stability criterion to the representations of natural stimuli. We then compare the resulting neuronal response properties, i.e., their selectivity to orientation and spatial frequency as well as their response modulation and aspect ratio, to those of complex cells in primary visual cortex.
| METHODS |
|---|
We study the response properties of simulated neurons after adaptation to image sequences of natural scenes. A freely moving cat explores the forest located next to the campus in Zürich while carrying on its head a miniature CCD camera that samples the natural visual input (for details, see Einhäuser et al. 2002). This procedure is carried out in accordance with institutional and national guidelines of animal care. A video of 3000 frames, recorded at 25 frames/s, digitized at a resolution of 4.5 pixels/°, and converted to grayscale using the MATLAB rgb2gray function, is used for this study. Ideally we would like to take a single long sequence from the central region of the video. Such a sequence, however, would need to be prohibitively long to sample the stimulus material uniformly. We therefore take pairs of patches measuring 30 × 30 pixels from randomly selected but matching locations within two subsequent frames of the movie. Temporal coherence is evaluated between the patches of the same pair, approximating the optimal sampling process. The patches are first multiplied pointwise with a Gaussian kernel centered on the patch, with an SD (width) of 10 pixels. This procedure has a limited effect on the amount of information available in the input stream but avoids edge effects and the anisotropy inherent in square patches. Repeating the simulations below without this windowing leads to qualitatively similar results (data not shown). The receptive fields obtained in such simulations are also localized, do not cover the full patch, and are approximately round. The resulting patches are decomposed into their principal components. The first component, representing the mean patch brightness, is removed. Components 2–100 carry >95% of the variance and define a vector I, which is the input to the optimization algorithm. As the activity of each subunit depends linearly on the input, preprocessing the input by a principal component analysis, which is also a linear transformation, has no influence on the optimization process. Discarding the higher-order components, however, does have an effect. As these components carry only a small part of the total variance, we do not expect this step to influence the results obtained. Indeed, this assumption is supported by the results of a recent study (Kayser et al. 2001). On the positive side, as the number of dimensions of the optimization problem is reduced by a factor of 9, a significant increase in computational efficiency is achieved.
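The preprocessing steps described above (Gaussian windowing, principal component analysis, removal of the first component, and truncation to components 2–100) can be sketched as follows. This is a minimal illustration and not the original implementation: the function name `preprocess_patches`, its parameters, and the use of an SVD to obtain the principal components are assumptions made here.

```python
import numpy as np

def preprocess_patches(patches, sigma=10.0, n_keep=100):
    """Window patches with a Gaussian and project onto principal components 2..n_keep.

    patches: array of shape (n, h, w), e.g. (n, 30, 30) grayscale patches.
    Returns (coefficients, components). Hypothetical helper, not from the paper.
    """
    n, h, w = patches.shape
    # Gaussian window centered on the patch (SD of 10 pixels, as stated in the text)
    y, x = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    window = np.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
    X = (patches * window).reshape(n, h * w)

    # PCA via SVD of the mean-centered data (assumed implementation choice)
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)

    # Drop the first component and keep components 2..n_keep
    components = vt[1:n_keep]
    return Xc @ components.T, components
```

For a pair of patches taken from two subsequent frames, the same `components` matrix would be applied to both patches so that the two input vectors live in the same 99-dimensional space.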
Simulated neurons
Complex cells, in contrast to simple cells, display several strong nonlinear properties (Chance et al. 1999; Movshon et al. 1978; Ohzawa et al. 1997; Spitzer and Hochstein 1988). Hence, it is not possible to describe them adequately by linear models, and we have to consider nonlinear model neurons. As in a number of other studies (e.g., Hyvärinen and Hoyer 2000), we chose the two-subunit energy model (Adelson and Bergen 1985; Hyvärinen and Hoyer 2000).
Each such model neuron consists of two subunits (Fig. 1A). Each of the subunits computes the scalar product of the same input patch (I) with a weight vector (W1,i and W2,i, respectively). Hence each neuron is characterized by two linear receptive fields. Both outputs are subsequently squared and summed to define the neuron's activity:
$$A_i = (I \cdot W_{1,i})^2 + (I \cdot W_{2,i})^2$$
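The two-subunit energy model described above can be written in a few lines. The sketch below assumes that inputs and weights are stored as rows of NumPy arrays; the function name `energy_response` is hypothetical.

```python
import numpy as np

def energy_response(I, W1, W2):
    """Activity of two-subunit energy-model neurons.

    I:  (T, D) array, one preprocessed input vector per row.
    W1: (N, D) array, first subunit weight vector of each of N neurons.
    W2: (N, D) array, second subunit weight vector of each neuron.
    Returns a (T, N) array of activities.
    """
    # Each subunit computes a scalar product with the input;
    # the two outputs are squared and summed.
    return (I @ W1.T) ** 2 + (I @ W2.T) ** 2
```

Because both subunit outputs are squared, the activity is unchanged when the sign of the input is flipped, which is why such a neuron responds equally well to bright and dark edges.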
Optimization
The input consists of image patches that are extracted from successive frames of the movies. To simulate the adaptation process, we optimize the parameters of a population of 100 neurons so that their responses are maximally coherent over time while being decorrelated from one another. This is done by maximizing the following objective function
[Equation: objective function combining a stability term and a decorrelation term; the image is missing in the source]
In this objective, the average is taken over all stimuli and thus over time, and the activity of neuron i at time t enters with its mean over all times subtracted.
The stability term takes on large negative values if the output activities change fast. It thus punishes fast temporal variations. The 40-ms lag between two successive time points used in that objective function is well within the range of strong correlations of orientations in natural stimuli (Einhäuser et al. 2002).
The decorrelation term, on the other hand, takes on large negative values in the case of correlated activities of different neurons and thus punishes such correlations. The average squared value of each subunit's activity is multiplicatively normalized to one in each iteration of the algorithm.
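To make the qualitative behavior of the two terms concrete, the following sketch implements one plausible form of each. The exact expressions are assumptions made for illustration and are not taken from the text: the stability term is taken to be the negative mean squared temporal difference of the mean-subtracted activity, normalized by its variance, and the decorrelation term the negative mean squared correlation coefficient between different neurons. Function names are hypothetical.

```python
import numpy as np

def stability_term(A_t, A_next):
    """Assumed stability term: large negative values if activities change fast.

    A_t, A_next: (P, N) activities of N neurons for P patch pairs
    (first and second frame of each pair).
    """
    mean = 0.5 * (A_t.mean(axis=0) + A_next.mean(axis=0))
    a0, a1 = A_t - mean, A_next - mean          # subtract the mean activity
    var = 0.5 * ((a0 ** 2).mean(axis=0) + (a1 ** 2).mean(axis=0))
    return -np.sum(((a1 - a0) ** 2).mean(axis=0) / var)

def decorrelation_term(A):
    """Assumed decorrelation term: large negative values for correlated neurons.

    A: (P, N) activities with N >= 2 neurons.
    """
    c = np.corrcoef(A, rowvar=False)            # (N, N) correlation matrix
    off_diagonal = c[~np.eye(c.shape[0], dtype=bool)]
    return -np.mean(off_diagonal ** 2)
```

Under these assumed definitions, activities that are identical in both frames of every pair give a stability term of zero, temporally independent activities give roughly -2 per neuron, and two perfectly correlated neurons give a decorrelation term of -1.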
The parameters of the model neurons are optimized by scaled gradient descent. For the stability term, this leads to a local Hebb-type learning rule. The weight change is local to the synapse and depends only on pre- and postsynaptic activities at two subsequent points in time.
We furthermore compare our results to the work of Hyvärinen and Hoyer (2000), who simulate a set of optimally sparse neurons modeled as four-subunit energy models. All subunits are constrained to have uncorrelated outputs, thus effectively enforcing a phase shift of 90°. We repeat their simulations using their code with our data as input. In this simulation, 24 energy detector neurons with four subunits are used. We also perform a number of control simulations in which we substitute the stability term with one of a number of alternative definitions of sparseness.
Data analysis
In analogy to physiological experiments, we characterize the response properties of the model neurons by several indices. The orientation tuning width is calculated as the range of orientations for which the response to a bar of optimal position is above a fixed fraction of the maximal activity. The best orientation is defined as the stimulus orientation that leads to maximal responses. The selectivity for spatial frequency is defined via the range of spatial frequencies to which the response exceeds a fixed fraction of the maximal level (Schiller et al. 1976b). The difference between the lower and upper bound of this range is then multiplied by 100. We measure the responses of neurons to drifting sinusoidal gratings of optimal orientation and spatial frequency. The neuron's AC/DC ratio is the maximum minus the minimum divided by the mean of the resulting activity.
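Two of these indices are simple enough to sketch directly. In the code below, the function names are hypothetical and the threshold `fraction` is left as an explicit argument rather than given a default value. The tuning-width helper also ignores the circular wrap-around of orientation, which a full analysis would need to handle.

```python
import numpy as np

def ac_dc_ratio(response):
    """Maximum minus minimum of the response, divided by its mean."""
    r = np.asarray(response, dtype=float)
    return (r.max() - r.min()) / r.mean()

def tuning_width(orientations_deg, responses, fraction):
    """Range of orientations whose response is at least `fraction` of the maximum.

    Simplification: assumes the tuning curve does not wrap around the
    ends of the sampled orientation range.
    """
    o = np.asarray(orientations_deg, dtype=float)
    r = np.asarray(responses, dtype=float)
    above = o[r >= fraction * r.max()]
    return above.max() - above.min()
```

A constant response gives an AC/DC ratio of 0, and a response that swings between 0 and twice its mean gives a ratio of 2.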
The models used for complex cells, such as the two-subunit energy model used here, always respond to moving gratings at twice the temporal frequency of the grating because they respond equally well to bright and dark edges. This implies that the simulated neurons have a vanishing first harmonic (F1) while the second harmonic (F2) does not vanish. Real complex cells, however, show such frequency doubling only to a limited degree, and both components are small (Heeger 1992; Spitzer and Hochstein 1985). How should the AC/DC ratios of such simulated neurons be compared with the relative modulation of real neurons? We could compare the AC/DC ratio to the F2/F0 ratio of real neurons, assuming that the frequency doubling is just an artifact of the simulation method. Alternatively, we could compare the AC/DC ratio of the simulated neurons to the F1 of the real neurons, which is the preferred measure for distinguishing complex cells from simple cells. In the scenario followed in this paper, the simulated neurons should have a small AC/DC ratio compared with the relative modulation of real neurons.
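The frequency doubling follows from a short calculation. The derivation below is a sketch under the assumption that each linear subunit responds to a grating drifting at temporal frequency $\omega$ with a sinusoid; the amplitudes $a_1, a_2$ and phases $\varphi_1, \varphi_2$ are symbols introduced here for illustration.

```latex
\begin{align}
A(t) &= a_1^2\cos^2(\omega t + \varphi_1) + a_2^2\cos^2(\omega t + \varphi_2) \\
     &= \underbrace{\tfrac{1}{2}\left(a_1^2 + a_2^2\right)}_{F0}
      + \underbrace{\tfrac{1}{2}\left[a_1^2\cos(2\omega t + 2\varphi_1)
      + a_2^2\cos(2\omega t + 2\varphi_2)\right]}_{F2}
\end{align}
```

No term at frequency $\omega$ appears, so F1 vanishes. In general the F2 term remains. Only in the special case of equal amplitudes and a phase difference of 90° do the two second-harmonic terms cancel, because $\cos(2\omega t + 2\varphi_1 + 180^\circ) = -\cos(2\omega t + 2\varphi_1)$, and the AC/DC ratio then becomes zero.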
The envelope of the receptive field is defined as $E_i(x, y) = W_{1,i}(x, y)^2 + W_{2,i}(x, y)^2$. The length $L_i$ and width $V_i$ (defined via the SDs) of the receptive field are calculated using the abbreviation $[\cdot]_+ = \max(\cdot, 0)$:
[Equation: definitions of $L_i$ and $V_i$; the image is missing in the source]
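As a rough illustration of measuring receptive field extent via SDs, the sketch below computes the SDs of the envelope along its two principal axes. This is an assumed simplification and not the paper's definition: the text measures length and width relative to the receptive field's orientation and uses a rectification $[\cdot]_+$, neither of which is reproduced here, so this ratio is always at least 1. The function name is hypothetical.

```python
import numpy as np

def envelope_extent(W1, W2):
    """SDs of the envelope E = W1**2 + W2**2 along its principal axes.

    W1, W2: 2-D arrays holding the two subunit receptive fields.
    Returns (sd_long, sd_short, sd_long / sd_short).
    """
    E = W1 ** 2 + W2 ** 2
    h, w = E.shape
    y, x = np.mgrid[0:h, 0:w].astype(float)
    p = E / E.sum()                              # treat the envelope as a density
    mx, my = (p * x).sum(), (p * y).sum()
    cxx = (p * (x - mx) ** 2).sum()
    cyy = (p * (y - my) ** 2).sum()
    cxy = (p * (x - mx) * (y - my)).sum()
    # Eigenvalues of the spatial covariance, in ascending order
    evals = np.linalg.eigvalsh(np.array([[cxx, cxy], [cxy, cyy]]))
    sd_short, sd_long = np.sqrt(evals)
    return sd_long, sd_short, sd_long / sd_short
```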
Parametric studies
In parametric studies we characterize the dependence of the stability objective on the receptive field properties. To elucidate why sparse coding alone is not expected to result in complex cell type responses, we also measure the dependence of a specific definition of sparseness on the receptive field properties.
[Equations: sparseness definition and subunit parameterization, ending in "…, w), W2 = G(5, 90°, …, w)"; the images are missing in the source]
We vary the length and the width, w, between 0.5 and 4 in steps of 0.1. Aspect ratios are binned in steps of 0.2 between 0.2 and 5.
| RESULTS |
|---|
Next we compare real and simulated neurons on the basis of their response to moving gratings. In primary visual cortex, a bimodal distribution of relative modulation strengths is observed (Skottun et al. 1991) (Fig. 3C). Complex cells are defined as having a relative modulation <1.0, whereas simple cells are defined by larger values of the modulation ratio. In our simulations, a wide bimodal distribution of AC/DC values is also observed. The AC/DC ratios of the optimally adapted complex cells have a mean (0.41) that is not significantly larger than the experimentally observed relative modulations (0.40, P > 0.3 KS test).
AC/DC ratio and aspect ratio define the invariant processing performed by complex cells. Thus the simulated neurons with optimally stable activity result in good fits to the measured properties of complex cells in the primary visual cortex.
It has been proposed that combining sparse coding with appropriate boundary conditions also leads to complex cells (Hyvärinen and Hoyer 2000). We repeat that simulation using our stimulus database. This simulation yields neurons with an orientation selectivity of 37° and a spatial frequency selectivity of 40.5, both well in the range of the physiological values (56°, 46.9, respectively) and comparable to optimizing a stability objective (38°, 51.9, respectively). For the AC/DC ratio, however, this simulation results in a value of 0.65 that is far larger than the physiological value (0.40) and the result of optimizing a stability objective (0.41; P < 0.001 KS test). Thus combining a sparseness objective with additional boundary conditions does not result in sufficiently translation-invariant neurons. Furthermore, the aspect ratio of 1.73 is far larger than the one observed for real complex cells (1.02, P < 0.001 t-test). Similar results and equally significant deviations are found if we replace the stability term in our simulations with the objective function derived from a Cauchy prior, as used by Olshausen and Field (1996), or with the kurtosis. This suggests that only the objective of stability adequately explains the properties of complex cells.
The head-mounted camera does not register changes in gaze associated with movements of the eyes. However, recent results indicate that, under the conditions in which the stimuli were recorded, eye movements contribute little to stabilizing the retinal image (Möller et al. 2003). To control for possible residual stabilizing effects of eye movements, we perform two experiments: 1) we simulate eye movements that randomly stabilize 50% of the patches, and 2) we randomly shuffle 10% of the patches. The resulting receptive field properties are essentially unchanged in both cases. In particular, in both cases they are translation invariant and have AC/DC ratios close to the relative modulation of physiological data (P > 0.3 for both controls, KS test). Therefore we do not expect major changes of the reported results if eye movements of the cats under free viewing conditions were taken into account.
To investigate whether the results generalize to a more general nonlinear model or are due to the way we constructed our model neurons, we perform an additional simulation (Fig. 4A). Simulated neurons consisting of eight half-squaring subunits are modeled. The neural properties resulting from optimizing the stability objective are similar to those found for the two-subunit energy model described in the preceding text. Importantly, the AC/DC ratio distribution is not significantly larger than the relative modulations of real complex cells (P > 0.3, KS test). Thus the results do not critically depend on the constraints on the model neurons' nonlinear properties defined by the two-subunit energy model. The type of the nonlinearity is set in our simulations. For the neurons to exhibit complex cell properties, however, the subunits need to obtain identical orientation and spatial frequency as well as the right phase shift. This simulation thus shows that these properties can be obtained from natural scenes even for varied neuron models.
To better understand the preceding results, we proceed to characterize some important nonlinear statistical properties of videos of natural scenes. To do so, we measure the objective values of simulated neurons in response to the videos of natural scenes. We choose the subunits of the same model as in the preceding text to be Gabor wavelets of fixed orientation and spatial frequency, leaving the aspect ratio and the relative phase as free parameters. With this more restricted set of subunit receptive fields, we can systematically analyze the influence of the receptive field properties on various objective functions. Varying the relative phase of the subunits reveals that the stability objective is maximal if the simulated neuron is translation invariant and the wavelets have a relative phase of 90° (Fig. 4B). Neurons then represent localized oriented energy detectors and are translation invariant, as are real complex cells. We furthermore analyze the influence of the aspect ratio on the objective functions (Fig. 4C). The stability objective reaches its highest value for circular receptive fields with an aspect ratio of approximately 1, similar to the value of real complex cells (Ohzawa and Freeman 1997). For comparison with other studies, we also plot sparseness as a function of phase and aspect ratio, which peaks at values that are far from those found in physiology. It thus seems that stability is a good candidate for an adaptation criterion that links complex cells with the statistics of natural scenes.
| DISCUSSION |
|---|
Recently Hurri and Hyvärinen (2003) have proposed that optimizing stability of linear neurons in response to natural stimuli leads to receptive fields like those of simple cells. The stability of linear neurons, however, is always considerably lower than the stability of the nonlinear complex cells in our study. The authors furthermore use a slightly different objective that biases the neurons to be both stable and sparse. These results might still indicate that both simple and complex cell responses could be understood in a coherent framework derived from the idea of stability.
In our simulations, each neuron only saw the input stimulus windowed by a Gaussian. Some properties of the neurons, in particular the aspect ratio, could thus be affected by this preprocessing. Some of the simulated neurons, however, do have receptive fields that are smaller than the size of the Gaussian. There is a tendency for neurons to obtain localized receptive fields. It would be interesting for future studies to analyze whether the distribution of receptive field sizes can be obtained exclusively from optimizing stability. Such studies would, however, need very large numbers of simulated neurons, as they would need to jointly encode the retinal space in addition to the orientation and spatial frequency space.
Do neurons found in primary visual cortex exhibit sparse or stable or maybe both types of response properties? Both objectives seem useful for processing in the nervous system. The question of which objective links the properties of natural scenes to the properties of complex cells is experimentally accessible. For these analyses, recordings from neurons in response to natural scenes would need to be compared with responses to artificial stimuli such as bars or gratings. With respect to sparseness, some experiments have started to address this issue (Baddeley et al. 1997; Vinje and Gallant 2000). If a large set of natural visual patterns is presented in sequence, most of them do not effectively stimulate the recorded neuron. A small subset of stimuli, however, can activate the neuron strongly and elicit very high firing rates. Similar experiments could address how stable neural responses are.
The fact that complex cells of adult animals are well described as an adaptation to a stability objective raises the question whether this adaptation occurs on onto- or phylogenetic time scales. If there is an ontogenetic component to the development of complex cells, it allows the following experimental test of the stability hypothesis. Changing the environment during an animal's critical period (e.g., by strobe rearing) would impair the development of complex cell type receptive fields. In particular, there should be a range of strobe rates in which complex cells are severely affected, whereas simple cells are not. From measurements of correlation times in natural videos (Kayser et al. 2003), this rate is expected to be of the order of 10 Hz.
If simple cells optimize a sparseness criterion and complex cells optimize a stability criterion, it is tempting to speculate whether such a division of labor is repeated in higher areas. Indeed, a widely used architecture for invariant object recognition, the Neocognitron (Fukushima 1980), is a hierarchical network with an alternation of simple and complex type cells. Hence it is interesting to build larger systems consisting of several layers, each optimizing an adequate objective. This could result in a hierarchical system that allows the response properties of neurons in higher cortical areas to be predicted and related to the statistics of the real world.
| ACKNOWLEDGMENTS |
|---|
GRANTS
This work was supported by the Boehringer Ingelheim Fund and Collegium Helveticum (K. P. Körding), the Neuroscience Center Zurich (C. Kayser), Honda Research Institute Europe (W. Einhäuser), the Swiss National Science Foundation (P. König, 31-65415.01), and the European Union, Bundesamt für Bildung und Wissenschaft (IST-2000-28127/01.0208).
| FOOTNOTES |
|---|
Address for reprint requests and other correspondence: K. P. Körding, Institute of Neurology, UCL London, Queen Square, London WC1N 3BG, UK (E-mail: konrad{at}koerding.de).
| REFERENCES |
|---|
Baddeley R, Abbott LF, Booth MC, Sengpiel F, Freeman T, Wakeman EA, and Rolls ET. Responses of neurons in primary and inferior temporal visual cortices to natural scenes. Proc R Soc Lond B Biol Sci 264: 1775–1783, 1997.
Barlow HB. Possible principles underlying the transformation of sensory messages. In: Sensory Communication, edited by Rosenblith W. Cambridge, MA: MIT Press, 1961, p. 336–360.
Becker S. Implicit learning in 3D object recognition: the importance of temporal context. Neural Comput 11: 347–374, 1999.
Bell AJ and Sejnowski TJ. The "independent components" of natural scenes are edge filters. Vision Res 37: 3327–3338, 1997.
Chance FS, Nelson SB, and Abbott LF. Complex cells as cortically amplified simple cells. Nat Neurosci 2: 277–282, 1999.
Einhäuser W, Kayser C, König P, and Körding KP. Learning the invariance properties of complex cells from natural stimuli. Eur J Neurosci 15: 475–486, 2002.
Földiak P. Learning invariance from transformation sequences. Neural Comput 3: 194–200, 1991.
Fukushima K. Neocognitron: a self organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern 36: 193–202, 1980.
Fyfe C and Baddeley R. Finding compact and sparse-distributed representations of visual images. Network Comput Neural Syst 6: 333–344, 1995.
Heeger DJ. Half-squaring in responses of cat striate cells. Vis Neurosci 9: 427–443, 1992.
Hubel DH and Wiesel TN. Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. J Physiol 160: 106–154, 1962.
Hurri J and Hyvärinen A. Simple-cell-like receptive fields maximize temporal coherence in natural video. Neural Comput 15: 663–691, 2003.
Hyvärinen A and Hoyer P. Emergence of phase- and shift-invariant features by decomposition of natural images into independent feature subspaces. Neural Comput 12: 1705–1720, 2000.
Kayser C, Einhäuser W, Dümmer O, König P, and Körding K. Extracting slow subspaces from natural videos leads to complex cells. In: ICANN, edited by Dorffner G, Bischoff H, and Kornik K. Berlin, Germany: Springer, 2001, p. 1075–1080.
Kayser C, Einhäuser W, and König P. Temporal correlations of orientations in natural scenes. Neurocomputing 52: 117–123, 2003.
Kjaer TW, Gawne TJ, Hertz JA, and Richmond BJ. Insensitivity of V1 complex cell responses to small shifts in the retinal image of complex patterns. J Neurophysiol 78: 3187–3197, 1997.
Klopf AH. The Hedonistic Neuron: A Theory of Memory, Learning, and Intelligence. Washington, DC: Hemisphere, 1982.
Möller G, Einhäuser W, and König P. Cats' eye movements under natural conditions. Soc Neurosci Abstr 386.5, 2003.
Movshon JA, Thompson ID, and Tolhurst DJ. Receptive field organization of complex cells in the cat's striate cortex. J Physiol 283: 79–99, 1978.
Ohzawa I, DeAngelis GC, and Freeman RD. Encoding of binocular disparity by complex cells in the cat's visual cortex. J Neurophysiol 77: 2879–2909, 1997.
Ohzawa I and Freeman W. Spatial pooling of subunits in complex cell receptive fields. Soc Neurosci Abstr 23: 1669, 1997.
Olshausen B and Field D. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381: 607–609, 1996.
Schiller PH, Finlay BL, and Volman SF. Quantitative studies of single-cell properties in monkey striate cortex. II. Orientation specificity and ocular dominance. J Neurophysiol 39: 1320–1333, 1976a.
Schiller PH, Finlay BL, and Volman SF. Quantitative studies of single-cell properties in monkey striate cortex. III. Spatial frequency. J Neurophysiol 39: 1334–1351, 1976b.
Skottun BC, De Valois RL, Grosof DH, Movshon JA, Albrecht DG, and Bonds AB. Classifying simple and complex cells on the basis of response modulation. Vision Res 31: 1079–1086, 1991.
Spitzer H and Hochstein S. Simple and complex-cell response dependences on stimulation parameters. J Neurophysiol 53: 1244–1265, 1985.
Spitzer H and Hochstein S. Complex-cell receptive field models. Prog Neurobiol 31: 285–309, 1988.
Stone JV and Harper N. Temporal constraints on visual learning: a computational model. Perception 28: 1089–1104, 1999.
Sutton RS and Barto AG. An adaptive network that constructs and uses an internal model of its world. Cognit Brain Theory 3: 217–246, 1981.
Van Hateren JH and van der Schaaf A. Independent component filters of natural images compared with simple cells in primary visual cortex. Proc R Soc Lond B Biol Sci 265, 1998.
Vinje WE and Gallant JL. Sparse coding and decorrelation in primary visual cortex during natural vision. Science 287: 1273–1276, 2000.
Wallis G and Rolls ET. Invariant face and object recognition in the visual system. Prog Neurobiol 51: 167–194, 1997.
Webster M and De Valois RL. Relationship between spatial-frequency and orientation tuning of striate-cortex cells. J Opt Soc Am A 2: 1124–1132, 1985.
Wiskott L and Sejnowski TJ. Slow feature analysis: unsupervised learning of invariances. Neural Comput 14: 715–770, 2002.