1Laboratory for Integrative Neural Systems, RIKEN Brain Science Institute, Saitama; 2Division of Biological Sciences, Graduate School of Science, Hokkaido University, Sapporo; 3Laboratory of Visual Physiology, National Institute of Sensory Organs, Tokyo, Japan
Submitted 21 November 2005; accepted in final form 27 August 2006
| ABSTRACT |
|---|
| INTRODUCTION |
|---|
Early studies on visual responses of TE neurons showed that these neurons respond to various visual stimuli including natural object images (Bruce et al. 1981; Desimone et al. 1984; Gross et al. 1979; Perrett et al. 1982; Schwartz et al. 1983). More recently, a number of studies have attempted to identify the simplest visual features that activate individual neurons in area TE (Kobatake and Tanaka 1994; Tanaka et al. 1991). These studies have revealed that essential stimuli for TE neurons are visual features that are geometrically less complex than natural objects. Thus combinations of visual features are necessary for neural representation unique to individual object images in area TE.
As in the primary visual cortex, neurons in area TE with similar response properties are reported to be clustered into columns (Fujita et al. 1992; Gochin et al. 1991). The columns responding to visual stimuli have been visualized with intrinsic signal imaging as darkened spots scattered across the cortical surface (Tsunoda et al. 2001; Wang et al. 1996, 1998). In particular, Tsunoda and colleagues used this technique together with conventional extracellular recordings and showed that an object image activates multiple spots, each of which represents a particular visual feature of the object image (Tsunoda et al. 2001). They reported that some of the visual features represented by activated spots were local features of object images, that is, features that appear in a spatially localized part of the object image. Thus it remains unknown how spatial arrangements of these local features in an object image are specified. Neurons in area TE may represent global features, such as spatial configuration of local features, in addition to spatially localized features. In this paper, we address this question by combining data from intrinsic signal imaging and extracellular recordings.
| METHODS |
|---|
Four rhesus monkeys were artificially ventilated with a mixture of N2O, O2, and isoflurane for anesthesia and paralyzed with pancuronium bromide or vecuronium bromide (Tsunoda et al. 2001). The visual stimuli were presented on a 20-in CRT display placed 57 cm from the eye contralateral to the recording hemisphere. The pupil of the eye was dilated by local application of 0.5% tropicamide and 0.5% phenylephrine, and the cornea was covered with a contact lens of appropriate power to focus the visual stimuli onto the retina. The fovea was identified with a custom-made ophthalmoscope, and the position of the fovea was back-projected onto the center of the CRT screen. Except for three-dimensional (3D) objects for manual presentations, the visual stimuli were presented at the center of the CRT display. Electroencephalography (EEG), electrocardiography (ECG), expired CO2 concentration, and rectal temperature were monitored throughout the experiments. The experimental protocol was approved by the Experimental Animal Committee of the RIKEN Institute. All experimental procedures were done in accordance with the guidelines of the RIKEN Institute and the National Institutes of Health.
Intrinsic signal imaging
The dorsal part of area TE was exposed and illuminated by light with a wavelength of 605 nm through a glass cover slip window attached to a titanium chamber centered 15.0–17.5 mm anterior to the ear bar position (Tsunoda et al. 2001). Reflected light from the cortex was detected by a low-noise video camera (frame rate, 1/30 frames/s; S/N ratio, 60 dB; CS8310, Teli, Japan) and digitized by a 10-bit video capture board (Pulsar, Matrox). The light was focused to a depth of 500 µm below the cortical surface. The imaged area was 6.5 × 4.9 mm and contained 320 × 240 pixels. We presented a visual stimulus to the monkey for 2.0 s, and sequential images were acquired for 4.0 s (starting from 1.0 s before the stimulus onset). During the 2-s stimulus presentation period, a stimulus image appeared and moved in a circular path (with a radius of 0.4° at the rate of 1 cycle/s). The imaging experiments consisted of two sessions. In the first session, the visual stimuli were 10–20 object images together with two blank images as control. Then, on the basis of these results, we selected several stimuli that activated a large number of spots in the imaged region. In the second session, the selected stimuli ("the original"), their modifications, and two controls were used as visual stimuli. Each stimulus was randomly presented 15–30 times in one session. The same imaging session as the second session was repeated at least twice on different days to confirm the consistency of the observed spots.
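The imaging geometry above implies a spatial scale per pixel and a physical size for the stimulus motion. The following is a minimal sketch of that arithmetic, using only the numbers stated in the text (6.5 × 4.9 mm field, 320 × 240 pixels, 57-cm viewing distance, 0.4° path radius); the function and constant names are illustrative and not part of the original methods.

```python
import math

FIELD_MM = (6.5, 4.9)      # imaged cortical area stated in the text (x, y)
GRID_PX = (320, 240)       # pixel grid stated in the text (x, y)
VIEW_DISTANCE_CM = 57.0    # display distance stated in the text


def pixel_size_um(field_mm=FIELD_MM, grid_px=GRID_PX):
    """Return the (x, y) extent of one pixel on the cortex, in micrometres."""
    return tuple(1000.0 * f / n for f, n in zip(field_mm, grid_px))


def degrees_to_cm(angle_deg, distance_cm=VIEW_DISTANCE_CM):
    """Convert a visual angle to a length on the display surface, in cm."""
    return distance_cm * math.tan(math.radians(angle_deg))


if __name__ == "__main__":
    print("pixel size (um):", pixel_size_um())
    print("0.4 deg path radius on screen (cm):", degrees_to_cm(0.4))
```

At a 57-cm viewing distance, 1 cm on the screen corresponds to roughly 1° of visual angle, so the circular path has a radius of about 0.4 cm on the display, and each pixel covers about 20 µm of cortex.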
Identification of the active spots
The active spots were extracted as follows (Tsunoda et al. 2001): 1) images acquired during the 0.5- to 3.0-s period after the onset of stimulus presentation were divided by an average of images during the 1-s period just before the stimulus onset. 2) Gaussian spatial filtering was used to eliminate the global (stimulus nonspecific) darkening and high-frequency noise (cut-off frequencies: 0.04 mm⁻¹ for high cut and 2.1 mm⁻¹ for low cut). 3) The t-values were calculated by pixel-by-pixel comparison of signal intensity between the filtered images for the trials with a particular stimulus and those for the control trials. The filtered images with a stimulus were averaged for all the trials and a differential image was created by subtracting the averaged image for control trials. Localized dark regions of the differential image, which showed significant darkening (t-test, P < 0.05), were defined as active spots. 4) The contour of active spots was demarcated at the half-value of the peak absorption value. Representative images for each step are shown in Fig. 1.
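The four steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the two cut-offs are assumed to bound a pass band (0.04 to 2.1 mm⁻¹) implemented with Gaussian transfer functions in the frequency domain, response frames are averaged before division by the baseline, the t-value is a Welch two-sample statistic, and the critical value `t_crit` is supplied by the caller rather than derived from P < 0.05. All function names are hypothetical.

```python
import numpy as np


def normalize_trial(response_frames, baseline_frames):
    """Step 1: divide the averaged response frames by the averaged
    pre-stimulus frames (frames are stacked along axis 0)."""
    return response_frames.mean(axis=0) / baseline_frames.mean(axis=0)


def gaussian_bandpass(img, mm_per_px, f_lo=0.04, f_hi=2.1):
    """Step 2: keep spatial frequencies between f_lo and f_hi (cycles/mm).

    Assumption: Gaussian transfer functions; the exact filter definition
    is not given in the text.
    """
    fy = np.fft.fftfreq(img.shape[0], d=mm_per_px)
    fx = np.fft.fftfreq(img.shape[1], d=mm_per_px)
    f2 = fy[:, None] ** 2 + fx[None, :] ** 2
    lowpass = np.exp(-f2 / (2.0 * f_hi ** 2))          # removes fine noise
    highpass = 1.0 - np.exp(-f2 / (2.0 * f_lo ** 2))   # removes global darkening
    return np.real(np.fft.ifft2(np.fft.fft2(img) * lowpass * highpass))


def welch_t(stim, ctrl):
    """Step 3: pixel-wise two-sample t-values (trials along axis 0)."""
    var_s = stim.var(axis=0, ddof=1) / stim.shape[0]
    var_c = ctrl.var(axis=0, ddof=1) / ctrl.shape[0]
    return (stim.mean(axis=0) - ctrl.mean(axis=0)) / np.sqrt(var_s + var_c)


def active_spot_mask(stim, ctrl, t_crit):
    """Steps 3 and 4: significant darkening, outlined at half the peak value.

    Darkening is a decrease in reflected light, so significant pixels have
    t < -t_crit and the peak is the most negative differential value.
    """
    diff = stim.mean(axis=0) - ctrl.mean(axis=0)
    significant = welch_t(stim, ctrl) < -t_crit
    if not significant.any():
        return significant
    peak = diff[significant].min()
    return significant & (diff <= 0.5 * peak)
```

In this sketch the half-peak contour is taken relative to the single darkest significant pixel in the image; handling several spots per image would require labeling connected regions first and applying the half-peak rule within each region.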
The exposed cortex used for intrinsic signal imaging was covered with a transparent artificial dura made of silicone rubber (Arieli et al. 2002). Tungsten microelectrodes were inserted into the spots through the artificial dura. The surface blood vessel pattern was used as a mapping reference to identify the position of the spots. Extracellular action potentials were recorded for 3 s in each trial. Visual stimulus presentation started 1 s after the onset of a trial and lasted for 1 s. During the 1-s stimulus-presentation period, a stimulus image appeared and moved in a circular path (with a radius of 0.4° at the rate of 1 cycle/s). No intertrial interval was inserted, so that a blank period between two stimuli was 2 s. The different stimuli were presented in pseudo-random order, and the number of trials for each stimulus was between 10 and 20. For each stimulus, we applied the Wilcoxon test to the difference in the mean firing rate during and before the stimulus presentation. The amplitude of evoked responses for each stimulus was calculated by subtracting the mean firing rate during the 1-s period before the stimulus onset from the mean firing rate during the 1-s stimulus-presentation period, and by averaging for all the trials.
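The evoked-response amplitude defined above reduces to a per-trial rate difference averaged over trials. A minimal sketch, assuming each trial is given as a list of spike times in seconds measured from trial onset (a hypothetical data layout), with the stimulus starting at 1 s as stated in the text; the Wilcoxon test itself is not reproduced here.

```python
def evoked_amplitude(trials, stim_onset=1.0, window=1.0):
    """Mean over trials of (rate during stimulus - rate before stimulus).

    trials: list of trials, each a list of spike times in seconds from
            trial onset (hypothetical layout, not from the original methods).
    Returns the amplitude in spikes/s.
    """
    diffs = []
    for spikes in trials:
        # Spikes in the 1-s window just before stimulus onset.
        before = sum(stim_onset - window <= t < stim_onset for t in spikes)
        # Spikes in the 1-s stimulus-presentation window.
        during = sum(stim_onset <= t < stim_onset + window for t in spikes)
        diffs.append((during - before) / window)
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    example = [[0.5, 1.1, 1.2, 1.9], [1.5, 2.5]]
    print(evoked_amplitude(example))
```

The per-trial differences collected in `diffs` are the paired values to which a signed-rank test could be applied.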
To characterize individual cells, we determined visual features critical for the cells according to previous studies (Fujita 1993; Fujita et al. 1992; Tanaka et al. 1991) (Fig. 2): 1) we manually searched for the most effective visual stimulus among 96 hand-held 3D objects (Fig. 3), 2) we simplified the best stimulus by removing or modifying a particular visual feature of the stimulus, and 3) if the simplified image elicited significant responses (Wilcoxon test, P < 0.05) and also if the response amplitude for the simplified image exceeded a certain threshold, we used this image as the best stimulus in the next step. This procedure was repeated until further simplification failed to produce any response that exceeded the threshold. The threshold was set to 70% of the response elicited by the stimulus before simplification because there was no significant difference in evoked responses at this threshold. Typically, we started with the examination of a monochrome image and silhouette of the original as in Fig. 2. However, image simplifications in the intermediate levels were different from case to case even if the original object was the same. The average number of simplification steps before reaching the simplest visual feature was 4.8 ± 2.0 (mean ± SD).
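The stepwise simplification with a 70% acceptance threshold can be read as a greedy loop. The sketch below illustrates that logic with a toy stimulus model; `candidates` and `respond` are hypothetical stand-ins for the experimenter's choice of simplified images and for the recorded response (amplitude plus a significance flag), neither of which is specified algorithmically in the text.

```python
def find_critical_feature(best, candidates, respond, threshold=0.7):
    """Simplify `best` until no candidate keeps a significant response
    above `threshold` times the response to the current stimulus.

    Returns the final stimulus and the number of accepted steps.
    """
    steps = 0
    while True:
        ref_amp, _ = respond(best)
        accepted = None
        for cand in candidates(best):
            amp, significant = respond(cand)
            if significant and amp > threshold * ref_amp:
                accepted = cand
                break
        if accepted is None:
            return best, steps
        best = accepted
        steps += 1


# Toy model (assumption): a stimulus is a set of named features, and the
# cell responds strongly only when both "circle" and "rectangle" are present.
def toy_candidates(stimulus):
    return [stimulus - {feature} for feature in sorted(stimulus)]


def toy_respond(stimulus):
    amp = 20.0 if {"circle", "rectangle"} <= stimulus else 5.0
    return amp, amp > 10.0


if __name__ == "__main__":
    start = frozenset({"circle", "rectangle", "color", "texture"})
    print(find_critical_feature(start, toy_candidates, toy_respond))
```

In the toy model the loop discards color and texture and stops at the circle-plus-rectangle combination, mirroring the kind of outcome described for critical features; the real procedure depended on manual choices that differed from case to case.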
Definition of object parts
Throughout this manuscript, we took the simplest definition of object parts as the ones naturally distinguishable by discontinuities at minima of negative curvature of the object shape. For example, the minimum of negative curvature of stimulus 1 in Fig. 4A is the joint connecting head and body, and accordingly the stimulus is segmented into "head" (stimulus 3) and "body" (stimulus 2). Although this definition of parts is conventionally used in the field of object vision, there is no a priori reason to define object parts according to this definition for TE neurons. Our intention, however, was not to explore the optimal segmentation for TE neurons, but rather to search for a possible mechanism for representing the spatial relationship among local features.
| RESULTS |
|---|
First, using intrinsic signal imaging, we identified one or two visual stimuli that activated a large number of spots in an imaged region of area TE. Each of these "original" visual stimuli was segmented into parts containing local features. We then conducted another intrinsic signal imaging session with a stimulus set consisting of the original (stimulus 1), two individual parts of the original (stimuli 2 and 3), and the original with a gap between the two parts (stimulus 4; Fig. 4). The results revealed that each spot was activated differently by these four stimuli. For example, one spot (indicated by an arrowhead in Fig. 4A) was activated by stimuli 1, 3, and 4 but not by stimulus 2. We interpret activity in this spot as being related to local features in the upper part of the original image (Fig. 4A). Because we were interested in identifying spots related to the spatial arrangement of local features in parts, we restricted our analysis to spots that were activated by combinations of the two parts but not by individual parts. Among four examined hemispheres, we found three spots, A–C, that satisfied this criterion: these spots were activated by the original image (stimulus 1) and the original with a gap (stimulus 4) but not by either part alone (stimuli 2 and 3; Fig. 4). Because the original with a gap (stimulus 4) activated these spots, activity in these spots could not be caused by specific responses to particular local features at the junction between the parts, such as a sharp negative curvature. We therefore considered these spots to be spatial relationship relevant spots (SRR spots), where we would likely find neurons representing the spatial relationship between two parts or between features within these parts.
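The selection criterion for SRR spots is a simple logical rule over the four stimuli. A minimal sketch, assuming each spot is summarized by a mapping from stimulus number to whether it was significantly activated (a hypothetical representation):

```python
def is_srr_spot(active):
    """Return True if a spot meets the SRR criterion from the text.

    active: dict mapping stimulus number to a bool, where
            1 = original, 2 and 3 = individual parts, 4 = original with a gap.
    """
    return active[1] and active[4] and not active[2] and not active[3]


if __name__ == "__main__":
    # Spot activated only by the combined stimuli: meets the criterion.
    print(is_srr_spot({1: True, 2: False, 3: False, 4: True}))
    # Arrowhead spot in Fig. 4A (stimuli 1, 3, and 4): a local-feature spot.
    print(is_srr_spot({1: True, 2: False, 3: True, 4: True}))
```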
Responses of the cells in SRR spots to spatial arrangements of object parts
We then conducted extracellular recordings from 49 cells located within SRR spots to characterize responsiveness of individual cells (13, 14, and 22 cells in spots A–C, respectively). First, we examined visual responses of each cell with 96 real object stimuli including faces, hands, imitations of living animals, stuffed animals, tools, and plastic fruits and vegetables (Fig. 3). These objects were presented in various sizes, orientations, and views so that the actual number of two-dimensional images used as visual stimuli was three or four times larger than the number of real objects. The stimuli that elicited significant responses (Wilcoxon test, P < 0.05) were diverse in color, texture, and local shapes (Fig. 5, A, C, and E). We could not explain this visual diversity in effective stimuli by preferred stimuli being different from cell to cell in a spot because individual cells in these spots responded to different stimuli and the response amplitudes did not significantly differ from each other (1-way ANOVA, P > 0.25; Fig. 5, B, D, and F). One common aspect of these effective visual stimuli was that the objects tended to consist of at least two distinguishable parts (Fig. 5, B, D, and F). These results from extracellular recordings were in accordance with the observation that with optical imaging, stimulus selectivity of a spot was the same for stimulus sets that originated from different object images (Fig. 4, C and D).
The sensitivity of the cells to a particular spatial arrangement of parts could be due to the changes in local shapes of either part that occurred incidentally during spatial rearrangements of parts (for example, see Fig. 6, A and B). We conjectured that this was not the case because SRR spots were less sensitive to variations in local shapes (Fig. 5). To confirm this point, however, we determined the simplest visual feature that produced maximal activation ("critical feature"), and examined the cells' sensitivity to modifications of the critical feature for each cell in spots A and B. We systematically simplified the best object stimulus step by step to find critical features for 27 cells, following procedures from previous studies (Fujita 1993; Fujita et al. 1992; Kobatake and Tanaka 1994; Tanaka et al. 1991) (Fig. 2). Figure 7A shows the responses of a representative cell in spot A to its critical feature and to modifications of the critical feature. The critical feature was a combination of a circle and a rectangle (Fig. 7A, stimulus 1); the presentation of the upper or lower part alone caused a significant decrease in the evoked responses (t-test, P < 0.05; Fig. 7A, stimuli 2 and 3). The cell responded equally well to the original colored object image and a silhouette of the original, indicating that color and texture of the stimulus were not essential (for example, see Fig. 2). The cell was not sensitive to changes in the shape of individual parts as long as the combination was preserved (Fig. 7A, stimuli 4 and 5). The existence of two parts was required, but the cell was not sensitive to local features at the junction between the two parts. For example, evoked responses to the stimulus with a gap (stimulus 7) and to the original (stimulus 1) were not significantly different (Fig. 7A). If, however, the "two parts" distinction was made less evident by smoothing the sharp joints (Fig. 7A, stimulus 8), the response was significantly reduced. Thus stimulus 6 but not stimulus 5 caused a significant decrease in the evoked responses (Fig. 7A). The representative cell in spot B shows results consistent with the cell in spot A (Fig. 7C). In addition, we found that sharp joints between the two parts presented in isolation (Fig. 7C, stimulus 10) significantly reduced the responses, indicating that a sharp joint by itself was not sufficient.
| DISCUSSION |
|---|
Recently, Brincat and Connor examined visual responses of IT neurons with variations of two-dimensional (2D) silhouettes consisting of multiple curvatures and found that optimal features of these neurons were a combination of specific local curvatures arranged in particular positions in space (Brincat and Connor 2004). In this study, they examined the cells with 2D silhouettes but not with object images. Because representation of spatial configuration requires the cell to be insensitive to the visual attributes specific to particular parts, their results were not conclusive with respect to representation of spatial configuration of object parts.
From our findings in the present study, we suggest that neurons in area TE could represent a particular spatial arrangement of object parts based on two observations: 1) with intrinsic signal imaging, we found activity spots that responded to a combination of two parts but not to either part shown in isolation (Fig. 4) and 2) neurons recorded in these SRR spots were selectively activated by stimuli in which the parts were arranged in a specific spatial relationship (Figs. 6 and 9). In addition, our data show that cells in these spots are less sensitive to changes in visual attributes that are essential to characterize local features, such as color, texture, and local shape: 1) activity spots showed the same response selectivity for stimulus sets derived from different object images (Fig. 4, C and D), 2) neurons in these spots responded equally well to the stimuli including different colors, textures, and local shapes (Fig. 5), and 3) the critical features of these neurons did not include particular local features (Fig. 7, B and D; see also Fig. 2). These two sets of results suggest that neurons in these spots were activated when arbitrary local features were arranged in a particular spatial configuration. Further evidence supporting this view is provided by direct comparison between responses to variations in color, texture, and local shape of parts and those to variations in the spatial arrangements of parts: the cells in SRR spots were more selective to particular spatial arrangements of parts than to examined variations in color, texture, and shape (Fig. 10). Therefore in terms of representation of object images, the neurons characterized in this study could play a role in specifying spatial relationships between parts. Altogether we found only four SRR spots among 26 activity spots elicited by "original" object images (16.7%).
This relatively small proportion indicates that an object image consists of multiple local features and different types of spatial configurations. In the present study, we only investigated neural representation of one particular type of spatial configuration: two parts aligned vertically. It should be noted that these spots did not respond to the combination of parts when the upper part was rotated 180° (Figs. 6, C and D, and 9, C and E), indicating that these neurons are capable of differentiating two parts. Sensitivity of the cells to some unidentified local cues could be essential for differentiating two parts. Further investigations will be necessary to fully understand the neural representation of spatial relationships among parts or local features in general.
Finally, although it has been reported that many neurons in area TE respond to visual features less complex than natural objects, it has remained unclear whether these features are related to local features of object images or to more global features (Fujita 1993; Fujita et al. 1992; Ito et al. 1995; Kobatake and Tanaka 1994; Tanaka et al. 1991). Here, by global features we mean the combination of elementary components such as combinations of color and shape and local features. In particular, specification of the spatial relationship among parts is one such global feature. One important contribution of the present study is that it provides concrete evidence that critical features can be such global features of object images.
| GRANTS |
|---|
| ACKNOWLEDGMENTS |
|---|
| FOOTNOTES |
|---|
Address for reprint requests and other correspondence: M. Tanifuji, Laboratory for Integrative Neural Systems, RIKEN Brain Science Institute, 2-1 Hirosawa, Wako-shi, Saitama 351-0198, Japan (E-mail: tanifuji{at}riken.jp)
| REFERENCES |
|---|
Biederman I. Recognition-by-components: a theory of human image understanding. Psychol Rev 94: 115–147, 1987.
Brincat SL and Connor CE. Underlying principles of visual shape selectivity in posterior inferotemporal cortex. Nat Neurosci 7: 880–886, 2004.
Bruce C, Desimone R, and Gross CG. Visual properties of neurons in a polysensory area in superior temporal sulcus of the macaque. J Neurophysiol 46: 369–384, 1981.
Desimone R, Albright TD, Gross CG, and Bruce C. Stimulus-selective properties of inferior temporal neurons in the macaque. J Neurosci 4: 2051–2062, 1984.
DiCarlo JJ and Maunsell JHR. Anterior inferotemporal neurons of monkeys engaged in object recognition can be highly sensitive to object retinal position. J Neurophysiol 89: 3264–3278, 2002.
Fujita I. Columns in the inferotemporal cortex: machinery for visual representation of objects. Biomed Res 14: 21, 1993.
Fujita I, Tanaka K, Ito M, and Cheng K. Columns for visual features of objects in monkey inferotemporal cortex. Nature 360: 343–346, 1992.
Gochin PM, Miller EK, Gross CG, and Gerstein GL. Functional interactions among neurons in inferior temporal cortex of the awake macaque. Exp Brain Res 84: 505–516, 1991.
Gross CG. How inferior temporal cortex became a visual area. Cereb Cortex 5: 455–469, 1994.
Gross CG, Bender DB, and Rocha-Miranda CE. Visual receptive fields of neurons in inferotemporal cortex of the monkey. Science 166: 1303–1306, 1969.
Gross CG, Bender DB, and Gerstein GL. Activity of inferior temporal neurons in behaving monkeys. Neuropsychologia 17: 215–229, 1979.
Ito M, Tamura H, Fujita I, and Tanaka K. Size and position invariance of neuronal responses in monkey inferotemporal cortex. J Neurophysiol 73: 218–226, 1995.
Kobatake E and Tanaka K. Neuronal selectivities to complex object features in the ventral visual pathway of the macaque cerebral cortex. J Neurophysiol 71: 856–867, 1994.
Logothetis NK and Sheinberg DL. Visual object recognition. Annu Rev Neurosci 19: 577–621, 1996.
Marr D and Nishihara HK. Representation and recognition of the spatial organization of three-dimensional shapes. Proc R Soc Lond B Biol Sci 200: 269–294, 1978.
Op De Beeck H and Vogels R. Spatial sensitivity of macaque inferior temporal neurons. J Comp Neurol 426: 505–518, 2000.
Perrett DI, Rolls ET, and Caan W. Visual neurones responsive to faces in the monkey temporal cortex. Exp Brain Res 47: 329–342, 1982.
Schwartz EL, Desimone R, Albright TD, and Gross CG. Shape recognition and inferior temporal neurons. Proc Natl Acad Sci USA 80: 5776–5778, 1983.
Tanaka K, Saito H, Fukada Y, and Moriya M. Coding visual images of objects in the inferotemporal cortex of the macaque monkey. J Neurophysiol 66: 170–189, 1991.
Tsunoda K, Yamane Y, Nishizaki M, and Tanifuji M. Complex objects are represented in macaque inferotemporal cortex by the combination of feature columns. Nat Neurosci 4: 832–838, 2001.
Wang G, Tanaka K, and Tanifuji M. Optical imaging of functional organization in the monkey inferotemporal cortex. Science 272: 1665–1668, 1996.
Wang G, Tanifuji M, and Tanaka K. Functional architecture in monkey inferotemporal cortex revealed by in vivo optical imaging. Neurosci Res 32: 33–46, 1998.
Zahn CT and Roskies RZL. Fourier descriptors for plane closed curves. IEEE Trans Comput 21: 269–281, 1972.