Howard Hughes Medical Institute and Division of Neuroscience, Baylor College of Medicine, Houston, Texas 77030
Submitted 10 May 2002; accepted in final form 6 February 2003
| ABSTRACT |
|---|
|
|
|---|
60% response decrease between positions within ±1.5° of the center of gaze, and 52% of neurons were unresponsive to one or more of these positions. Consistent with previous studies, each neuron's rank order of target preferences was largely unaffected across position changes. Although we have not yet determined the conditions necessary to observe this marked position sensitivity in AIT responses, we rule out effects of spatial-frequency content, eye movements, and failures to include the RF center. To reconcile this observation with previous studies, we hypothesize that either AIT position sensitivity strongly depends on object size or that position sensitivity is sharpened by extensive visual experience at fixed retinal positions or by the presence of flanking distractors.
| INTRODUCTION |
|---|
|
|
|---|
Object position changes are a common source of image variation because they occur frequently when environments are explored with eye, head, or body movements. Yet even in the face of such position variation, we easily carry out behaviors that depend on recognition. Indeed, some studies suggest that recognition can tolerate changes of
5° (Biederman and Cooper 1991
; Ellis et al. 1989
). However, others indicate that the position tolerance of recognition depends on visual experience and the similarity of the objects to be distinguished (Dill and Edelman 2001
; Dill and Fahle 1997
, 1998
; Foster and Kahn 1985
; Nazir and O'Regan 1990
).
Any theory that can explain some range of position tolerance in recognition behavior must include mechanisms that transform retinal images to neuronal signals that are sensitive to object form but are largely insensitive to object position over that range. That is, neuronal signals that are at least as position tolerant as the behavior must exist somewhere in the brain because the behavior dictates their presence at the level of motor neurons. Such neurons could be described as having large receptive fields (RFs) in that they respond selectively to objects over all retinal positions at which recognition occurs. However, because it would be inappropriate to describe motor neurons as having large RFs, we use the term position sensitivity because it can be applied without confusion to the neuronal responses along the entire stimulus-motor chain of processing.
Although many mechanisms have been proposed to create object-selective, position-tolerant signals in the brain (e.g., Biederman 1987
; Mel 1997
; Olshausen et al. 1993
; Riesenhuber and Poggio 1999
; Salinas and Abbott 1997
; Ullman 1996
), the actual mechanisms are unknown, and the brain regions thought to contain these signals are poorly understood. The dominant hypothesis is that these mechanisms operate in the ventral visual processing stream of the cerebral cortex and produce position-tolerant patterns of neuronal activity at the highest level of that stream, the anterior inferotemporal cortex (AIT) (Gross 1973
; Logothetis and Sheinberg 1996
; Tanaka 1996
; Ungerleider and Mishkin 1982
). Indeed, inferotemporal cortex (IT) likely plays a central role in object recognition because IT lesions (Dean 1982
; Weiskrantz and Saunders 1984
) or inactivation (Horel 1996
) impair recognition, and IT neuronal responses are selective for complex stimulus forms (Logothetis and Sheinberg 1996
; Miyashita 1993
; Tanaka 1996
), such as faces (Desimone et al. 1984
; Perrett et al. 1982
).
The strongest statement of the IT position-tolerance hypothesis predicts that IT responses should be highly sensitive to stimulus form (i.e., identity) and completely insensitive to stimulus position (within the visual field). It is already well known that this strict interpretation is not true because previous studies show that IT neurons have finite RFs and that IT responses often decrease with changes in stimulus position away from the RF center (Boussaoud et al. 1991
; Desimone et al. 1984
; Gross et al. 1969
, 1972
; Ito et al. 1995
; Kobatake and Tanaka 1994
; Leuschow et al. 1994
; Logothetis et al. 1995
; Missal et al. 1999
; Op de Beeck and Vogels 2000
; Richmond et al. 1983
; Sary et al. 1993
; Schwartz et al. 1983
; Tovée et al. 1994
). Furthermore, IT neurons are often described as having only a relative form of position tolerance in which the neuron's overall responsiveness decreases with changes in position but its rank order of target preferences remains the same (e.g., Logothetis and Sheinberg 1996
). We do not yet know if or how this relative position tolerance supports nonrelative behavioral position tolerance. Nevertheless, IT neurons have been shown to maintain this relative position tolerance over visual regions
10° in diameter (Ito et al. 1995
; but see Logothetis et al. 1995
and discussion; Sary et al. 1993
; Schwartz et al. 1983
; Tovée et al. 1994
). Thus all of these studies suggest that IT neurons maintain responsivity over large regions of visual space; that is, they have large RFs. Indeed, standard RF mapping methods indicate that AIT neurons have very large RFs (10-30° in diameter) (Boussaoud et al. 1991
; Desimone et al. 1984
; Gross et al. 1969
, 1972
; Kobatake and Tanaka 1994
; Op de Beeck and Vogels 2000
; Richmond et al. 1983
).
Although previous studies indicate that AIT neurons maintain relative form selectivity over large RFs, it is not known if or how these neuronal responses compare with the position tolerance of the recognition behavior they are thought to support. We therefore sought to understand the neuronal responses to one or more recognition targets placed within the large RFs of form-selective AIT neurons while animals performed form-recognition tasks. To this end, we trained animals to recognize and report the identity of familiar objects and developed a technique that allowed presentation of visual stimuli to arbitrary retinal positions with an accuracy of
0.1°, even in free-viewing animals (DiCarlo and Maunsell 2000
). We first sought to confirm the large RF property of AIT neurons by presenting stimuli at three closely spaced retinal positions (-1.5, 0, and +1.5°). Based on the studies described in the preceding text, these positions should have all been well within the RFs of essentially all AIT neurons. Unexpectedly, most AIT neurons were highly sensitive to these small changes in stimulus position.
| METHODS |
|---|
|
|
|---|
Experiments were performed on two male rhesus monkeys (Macaca mulatta) weighing 4.5 and 4.7 kg. Before behavioral training, aseptic surgery was performed to attach a head post to the skull and to implant a scleral search coil in the right eye. After 2-3 mo of behavioral training (following text), a second surgery was performed to place a recording chamber (18 mm diam) to reach the anterior half of the left temporal lobe (chamber Horsley-Clark center = 15 mm A). All animal procedures were performed in compliance with the standards of the Baylor College of Medicine Animal Research Committee and the American Physiological Society.
Eye-position monitoring
Horizontal and vertical eye positions were monitored using the scleral search coil (Robinson 1963
). Each channel was low-pass filtered at a corner frequency of 400 Hz and was digitally sampled at 1 kHz with a resolution of
0.003°. The instrumentation time lag was <1.5 ms, the RMS noise in each channel was 0.025°, and accuracy was
0.1°. Saccades greater than
0.2° were reliably detected in real time using speed criteria (saccade start: speed >24°/s; saccade end: speed <16°/s). The methods for detecting saccades and calibrating retinal locations with monitor locations are described in detail elsewhere (DiCarlo and Maunsell 2000
).
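The two-threshold speed criterion described above can be sketched as follows. This is a minimal illustration only: the function name and the sample speed trace are hypothetical, the 0.2° amplitude criterion is not implemented, and the actual procedure (DiCarlo and Maunsell 2000) may differ in its details. Because the eye position was sampled at 1 kHz, sample indices correspond to milliseconds.

```python
def detect_saccades(speed_deg_per_s, start_thresh=24.0, end_thresh=16.0):
    """Return (start_index, end_index) pairs from a sampled eye-speed trace.

    A saccade starts at the first sample whose speed exceeds start_thresh
    and ends at the next sample whose speed falls below end_thresh.
    """
    saccades = []
    start = None
    for i, speed in enumerate(speed_deg_per_s):
        if start is None and speed > start_thresh:
            start = i
        elif start is not None and speed < end_thresh:
            saccades.append((start, i))
            start = None
    return saccades


# Hypothetical speed trace (deg/s), one sample per millisecond.
example_speeds = [0, 5, 30, 100, 50, 20, 10, 0]
example_saccades = detect_saccades(example_speeds)
```

Using a lower threshold for the end than for the start keeps a single saccade from being split when the speed briefly dips between 16 and 24°/s.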
Visual stimuli
Stimuli were presented on a video monitor (37.5 x 28.1 cm, 75 Hz frame rate, 1,600 x 1,200 pixels) positioned 62 cm from the monkey so that the display subtended ±17° (h) and ±13° (v) of visual angle. The background luminance of the monitor was 22 cd/m2; it was the only light source in the room. Both animals worked with the same fixed set of five achromatic forms (Fig. 1A). Each form was constructed by connecting line segments (0.02° width) to form the stimulus outline. This outline shape was then convolved with a difference-of-Gaussians spatial filter (0.01° SD positive, 0.02° SD negative) so that the average luminance over each form was the same as the monitor background (Fig. 10A). The peak luminance was set to the monitor maximal white (46 cd/m2). The size and spatial frequency content of the forms were tuned to allow us to study both the effects of free viewing (DiCarlo and Maunsell 2000
) and of stimulus position (current study). Specifically, based on the animal's performance with stimuli placed at a range of eccentricities, we chose the stimulus size so that recognition accuracy was good for stimuli at 1.5° eccentricity (Fig. 2) but was approaching chance levels for stimuli at
5° eccentricity (monkey 1 = 0.52° width, monkey 2 = 0.68° width). Although acuity limits depend on the forms to be distinguished, at 1.5° eccentricity acuity is reduced to 40-60% of that observed at the center of gaze (Ludvigh 1941
; Merigan and Katz 1990
), and retinal cone density is
40% of maximal (Curcio et al. 1987
; Perry and Cowey 1985
).
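As a check on the display geometry given above, the visual angle subtended by the monitor can be recomputed from its physical size and viewing distance. The sketch below assumes the eye is centered on the display and perpendicular to it; the function name is illustrative.

```python
import math


def half_angle_deg(extent_cm, distance_cm):
    """Visual angle (deg) from the display center to its edge."""
    return math.degrees(math.atan((extent_cm / 2.0) / distance_cm))


horizontal = half_angle_deg(37.5, 62.0)  # monitor width, viewing distance
vertical = half_angle_deg(28.1, 62.0)    # monitor height, viewing distance
```

Under these assumptions the half-angles come out at roughly 16.8° and 12.8°, consistent with the stated ±17° (h) and ±13° (v).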
Some neurons in monkey 1 were also studied with a second set of target objects that had the same shapes as the original stimuli but substantially different elemental spatial frequency content (Fig. 10, see RESULTS). These were constructed with the same outline shapes, except that the outlines were 0.04° wide and were not filtered with the difference-of-Gaussians spatial filter. Instead, to keep the average luminance over the stimulus near the background luminance, each of these stimulus shapes was added to a negative, (i.e., below the average luminance), circularly symmetric Gaussian (0.3° SD). The amplitude of this Gaussian was set so that the average luminance over a 2° square window centered on the stimulus was the same as the background luminance.
Basic form recognition task
Both animals performed a form recognition task. Four of the five stimulus forms were designated as targets; the remaining form was the distractor (Fig. 1A). Four response locations near the corners of the monitor (16.8° from the display center) were at all times indicated by identical white squares (0.6 x 0.6°, 46 cd/m2; Fig. 1B). For each animal, each target form was assigned a different response location, and this mapping never changed. When a target was presented, the animal was required to signal the target form by making a saccade to the appropriate response location. Saccades that ended within a window [±11.9° (h) and ±4° (v)] around any response location were scored as a response. The horizontal width of these windows was chosen to ensure that the animal would register a response if it produced the same saccade vector from a broad range of absolute horizontal eye positions where targets could be encountered during free viewing studies, described elsewhere (DiCarlo and Maunsell 2000
). Correct responses produced a juice reward and a brief tone. Reaction time was defined as the duration between target onset and the start of the response saccade.
Each trial began with the presentation of a small, white fixation point (0.1 x 0.1°) near the display center (Fig. 1B). The animal was required to bring and hold its gaze within ±0.5° of the point. The fixation point was extinguished 300 ms after acquisition, and one of the five forms was immediately presented in one of three positions: at the center of gaze, 1.5° to the left of the center of gaze (ipsilateral to the recorded hemisphere), or 1.5° to the right of the center of gaze (contralateral to the recorded hemisphere). Because we desired identical retinal stimulation for all trials within a condition and because position variability on the retina can produce neuronal response variability (Gur and Snodderly 1987
), the three positions were always specified relative to the animal's center of gaze at the end of the fixation period. That is, the three positions were specified in retinal coordinates rather than monitor coordinates. Over all recording sessions, the mean center of gaze at the end of the fixation period was 0.01° (h) and 0.13° (v) (monkey 1) and -0.02° (h) and 0.14° (v) (monkey 2) from the fixation point center (h and v SD
0.14° in both monkeys). On each trial, the stimulus form (4 target forms) and the position of the form (3 possible positions) were each randomly chosen with equal likelihood and were presented only briefly (mean:
290 ms, see following text). Thus the animal could not bias spatial or featural attention differently on each trial because it could not predict the position or form of the target. These 12 trial types were presented in blocks such that a correctly completed trial type was not presented again until all trial types were correctly completed.
After a target form was presented, the animal was allowed to respond as rapidly as it liked. If the animal made a saccade that ended >3° (h) or 1° (v) from the fixation point but did not reach one of the response windows, the trial was scored as a failed trial (
4% of trials). Any eye movement that brought the center of gaze out of the fixation window (±0.5° around the initial fixation point) caused the stimulus to be immediately extinguished. Indeed, on
97% of trials in which a form was presented to the left or right of the center of gaze, the animal made a small "adjustment" saccade (mean amplitude = 1.1°; mean duration = 23 ms; latency mean and SD = 140 ± 33 ms) toward the form, and the form was extinguished during this saccade. The animals generated these adjustment saccades without training, and we did not attempt to modify this behavior. Extinguishing the target during the adjustment saccade ensured that the animal could not acquire information about target form from a retinal position other than the initially stimulated position (see Fig. 11). In these trials, the monitor phosphors that comprised the form were last excited 22.5 ms (mean; 95% range = 10-36 ms) before the saccade out of the fixation window was completed. Because the phosphors decayed exponentially with a time constant of <1 ms, the extinguished form could not have been visible at the end of the adjustment saccade (Michelson contrast <10⁻⁹ on average; 95% upper bound = 5 x 10⁻⁵). After the adjustment saccade, the animal's gaze typically remained at the new, now empty, position (i.e., near the original target position) for
150 ms before the animal began its response saccade (i.e., to 1 of the 4 response locations). This pattern of eye movements was observed in essentially all correct trials in both animals (monkey 1: 93% of central position trials, 95% of eccentric position trials; monkey 2: 88%, 99%; see Fig. 3, top). In the remaining central position trials, the animals made a small saccade (typically <0.5°) before the response saccade. In the remaining eccentric position trials, no adjustment saccade was detected.
Additional task conditions
We also recorded data while the animal performed the basic recognition task in the presence of visual clutter. For these trials, the single target form was embedded in a horizontal row of 20 identical distractor forms with a 1.5° center-to-center separation (see Fig. 13) (see also Fig. 1 of DiCarlo and Maunsell 2000
). Trials run with clutter were run in separate blocks, and these blocks were interleaved with the primary behavioral task blocks.
52% accuracy; chance is 25%), indicating that the animal had generalized the task (i.e., shape identification regardless of retinal position). After
2 wk of training, performance gradually improved but was still not as good as at the central three positions (see RESULTS) and was very poor for some target shapes. Because of this, we did not force the animal to complete an equal number of correct trials for each target in each position but instead included neuronal response data from all trials in which the target was presented, regardless of the behavioral outcome (i.e., correct, wrong, or failed).
Recording and data collection
A guide tube (23 G) was used to reach AIT using a dorsal to ventral approach. Recordings were made using glass-coated Pt/Ir electrodes (0.5-1.5 MΩ at 1 kHz), and spikes from individual neurons were amplified, filtered, and isolated using conventional equipment. The superior temporal sulcus (STS) and the ventral surface were identified by comparing gray and white matter transitions and the depth of the skull base with atlas sections. Penetrations were made over a
10 x 10 mm area of the ventral STS and ventral surface (Horsley-Clark AP: 10-20 mm, ML: 14-24 mm) of the left hemisphere of each animal. In both animals, the penetrations were concentrated near the center of this region, where form-selective neurons were more reliably found. Using electrolytic lesions and fluorescent dye (DiI, Molecular Probes) to coat the electrode (DiCarlo et al. 1996
), we confirmed that the bulk of the recordings from the first animal were on the ventral surface, centered
10.5 mm posterior of the temporal pole, lateral of the anterior middle temporal sulcus (AMTS). Based on the anterior-posterior coordinates, and the sulci, this region is approximately the anterior third of IT and is contained in area TE (Felleman and Van Essen 1991
; Logothetis and Pauls 1995
; Logothetis and Sheinberg 1996
). We refer to this region as AIT (Felleman and Van Essen 1991
).
The animal cycled through behavioral blocks as the electrode was advanced into AIT. Responses from every isolated neuron were assessed with an audio monitor and on-line histograms, and data were collected from even marginally responsive cells under the assumption that longer periods of observation might reveal statistically detectable effects. Data from each recorded neuron were considered for further analysis if isolation was maintained for at least six presentations (mean = 8.5, maximum = 10) of each target form in each position during all task conditions (
20-35 min of recording). The responses of 220 AIT neurons (monkey 1 = 119, monkey 2 = 101) were recorded. Among these, 74 (33%) were not considered for further analysis because they failed to produce a statistically significant response to any of the three tested retinal positions (described in the following text). The presence of these 74 unresponsive neurons in the recorded data set is consistent with our low threshold for selecting neurons during the recording sessions. Most of the neurons were located on the ventral surface (127 of 146; 87%); the rest were in the ventral bank of the STS. For brevity, the data from both animals were combined in some plots, and summary values for each animal are indicated in the text and figure legends.
Analysis
Only neuronal responses collected during correctly completed behavioral trials were included in the analyses (88% of trials; except Fig. 8, see METHODS). We also excluded trials in which eye movements >0.3° occurred during the first 50 ms after target onset (<1% of all correct trials) or those in which the animal began its response saccade <100 ms after target onset (<<1% of all correct trials). We estimated the background firing rate of each neuron as the mean rate of firing over all trials in a 100-ms-duration window that directly preceded target onset. For the majority of the data (where only 3 positions were tested), we quantified the response of each neuron to each of the 12 stimulus conditions (4 forms x 3 positions) as the mean response in a 150-ms window that began 100 ms after target onset. One advantage of the behavioral task is that the choice of the temporal analysis window was constrained by both the start of the AIT responses (
100 ms after stimulus onset, see Fig. 13) (see also Baylis et al. 1987
; Vogels and Orban 1994
) and by the animal's reaction times (
300 ms after stimulus onset, see Fig. 2B). The results were largely unaffected by the details of the analysis time window (see RESULTS).
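The windowed response measure described above (background rate in the 100 ms preceding target onset, driven response in a 150-ms window starting 100 ms after onset) can be sketched as follows. Spike times are assumed to be in milliseconds relative to target onset; the function names and the two example trials are hypothetical and are not data from the study.

```python
def mean_rate(trials, t_start_ms, t_end_ms):
    """Mean firing rate (spikes/s) over trials in [t_start_ms, t_end_ms)."""
    window_s = (t_end_ms - t_start_ms) / 1000.0
    counts = [sum(1 for t in trial if t_start_ms <= t < t_end_ms)
              for trial in trials]
    return sum(counts) / (len(trials) * window_s)


def response_above_background(condition_trials, all_trials):
    """Driven response for one form/position condition, in spikes/s.

    Background is estimated over all trials of the neuron (-100 to 0 ms);
    the response window runs from 100 to 250 ms after target onset.
    """
    background = mean_rate(all_trials, -100.0, 0.0)
    response = mean_rate(condition_trials, 100.0, 250.0)
    return response - background


# Two hypothetical trials (spike times in ms relative to target onset).
demo_trials = [[-50.0, 120.0, 130.0, 200.0], [110.0, 240.0, 260.0]]
demo_response = response_above_background(demo_trials, demo_trials)
```

In this toy case the background rate is 5 spikes/s and the windowed response is about 16.7 spikes/s, giving a driven response of about 11.7 spikes/s.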
The mean response above background for each of the 12 stimulus conditions (4 target forms x 3 positions) was used to determine the form and position preferences of each neuron. Eight neurons that showed decreases in firing rates in all 12 conditions were excluded from further analyses. We defined the neuron's best and worst target forms as those that produced the largest and smallest mean response over all three positions. Likewise, we defined the neuron's best and worst positions as those that produced the largest and smallest mean response over all four target forms. Responsive neurons (n = 146 of 220) were defined as those that showed a statistically significant increase in firing rate (relative to background rate) to their best target form presented in any of the three positions (three t-tests, each run at P = 0.017). Because the neuron's best target was selected before these tests were run, the overall false-positive level was 0.075, as estimated by Monte Carlo simulation. The main result (Fig. 6) was unaffected when false-positive levels of 0.05 (n = 140), 0.01 (n = 128), and 0.001 (n = 101) were applied.
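The inflation of the false-positive level caused by choosing the best target before testing can be illustrated with a small simulation. The sketch below is not the simulation used in the study, whose details are not given here. It assumes pure-noise neurons, replaces each t-test with a one-sided z-test on a standardized mean (known variance), and uses hypothetical names throughout, so its output should not be expected to equal 0.075.

```python
import random
from statistics import NormalDist


def false_positive_rate(n_sim=20000, n_forms=4, n_pos=3, alpha=0.017, seed=0):
    """Fraction of pure-noise neurons that pass the responsiveness test.

    Each neuron has n_forms x n_pos standardized mean responses drawn from
    N(0, 1). The best form is the one with the largest summed response over
    positions; the neuron counts as responsive if that form exceeds the
    one-sided criterion at any position.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha)
    hits = 0
    for _ in range(n_sim):
        z = [[rng.gauss(0.0, 1.0) for _ in range(n_pos)]
             for _ in range(n_forms)]
        best = max(range(n_forms), key=lambda f: sum(z[f]))
        if any(value > z_crit for value in z[best]):
            hits += 1
    return hits / n_sim


# Three tests at 0.017 with no prior selection give about 0.05 overall.
no_selection_level = 1.0 - (1.0 - 0.017) ** 3
simulated_level = false_positive_rate()
```

Without selection, three tests at P = 0.017 give an overall level of about 0.05; selecting the best form first pushes the simulated level above that value, which is the direction of the effect reported above.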
In Fig. 6, we used the RF data of Op de Beeck and Vogels (2000) to predict the expected neuronal sensitivity to our tested positions. That report is the most quantitative study of IT RFs currently available. It showed that Gaussian sensitivity profiles fit most of the measured IT RFs, and it provided the distribution of RF sizes and RF centers. Based on those data, we simulated the position sensitivity of 10,000 randomly selected (normal), circularly symmetric Gaussian RFs using the following parameters: mean RF size (square root of RF area) = 10.3°; RF size SD = 5°; min RF size = 2°; mean RF center azimuth = 1.5° (contralateral), mean RF center elevation = 0.0°; RF center SD = 1.5° (azimuth and elevation).
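A minimal sketch of such a simulation is given below, using the parameters listed above. Several details are assumptions not stated in the text: RF area is taken as the area inside the half-maximum contour of the Gaussian, the minimum RF size is enforced by resampling, and position sensitivity is summarized as the ratio of the smallest to the largest response across the three tested positions. All names are illustrative.

```python
import math
import random
from statistics import median

# Tested positions (azimuth, elevation) in degrees; positive = contralateral.
POSITIONS = [(-1.5, 0.0), (0.0, 0.0), (1.5, 0.0)]


def size_to_sigma(rf_size_deg):
    """Gaussian SD for an RF whose size is the square root of its area.

    Assumption: the RF area is that of the circle at half-maximum response.
    """
    r_half = rf_size_deg / math.sqrt(math.pi)
    return r_half / math.sqrt(2.0 * math.log(2.0))


def simulate_relative_responses(n=10000, seed=0):
    """Worst/best response ratio across POSITIONS for n random Gaussian RFs."""
    rng = random.Random(seed)
    ratios = []
    while len(ratios) < n:
        size = rng.gauss(10.3, 5.0)
        if size < 2.0:
            continue  # assumed handling of the minimum RF size: resample
        sigma = size_to_sigma(size)
        cx = rng.gauss(1.5, 1.5)  # RF center azimuth
        cy = rng.gauss(0.0, 1.5)  # RF center elevation
        responses = [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x, y in POSITIONS
        ]
        ratios.append(min(responses) / max(responses))
    return ratios


simulated_ratios = simulate_relative_responses()
median_ratio = median(simulated_ratios)
```

One minus `median_ratio` is the predicted median response change across the three positions under these assumptions; it can be compared with the prediction reported in RESULTS, although exact agreement is not guaranteed given the assumptions above.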
| RESULTS |
|---|
|
|
|---|
4,500 trials for each animal in each position; accuracy: monkey 1: χ² = 3.0, P > 0.05; monkey 2: χ² = 22.2, P < 0.01, df = 2; reaction time: monkey 1: F = 53, P < 0.01; monkey 2: F = 287, P < 0.01). In sum, the behavior showed excellent position tolerance: both animals could rapidly and accurately identify each target form, regardless of its position, and without foreknowledge of precisely where it would appear.
If individual AIT neurons were underlying the animal's recognition, the behavioral observations suggested that these neuronal responses should be largely unaffected by these small position changes. Likewise, previous studies showing AIT RFs to be 10° or more in diameter (see INTRODUCTION) also predicted that the neuronal responses should be largely unaffected by our small position changes. To examine these predictions, we analyzed data from all 146 recorded neurons that were responsive in at least one position (72 from monkey 1, 74 from monkey 2; see METHODS). Consistent with previous studies (Logothetis and Sheinberg 1996; Miyashita 1993; Tanaka 1996), many of the recorded neurons were selective for stimulus form (n = 54 of 146, see later). However, the AIT neuronal responses in our animals were largely inconsistent with the large RFs previously reported in AIT (see INTRODUCTION). In particular, almost all neurons showed a stronger than expected sensitivity to small (1.5°) position changes, and some were exquisitely sensitive to these position changes. Responses from one such neuron are shown in Fig. 3. The middle panel shows that when targets were presented at the center of gaze, the neuron responded strongly to two of the target forms but gave little response to the other two. That is, this neuron was highly form selective at the center of gaze (ANOVA, P < 10⁻⁷). However, the neuron produced almost no response when the same target forms appeared either 1.5° ipsilateral or 1.5° contralateral to the center of gaze. Thus this neuron was selective for stimulus form but responded only over a very limited range of stimulus positions (assuming that positions more eccentric than the tested three would yield little or no response, see following text). It should be emphasized that all three tested retinal positions were within the fovea (±2°). One interpretation of these observations is that the neuron had a very small RF near the center of gaze (i.e., <2° in diameter). However, because we did not perform full RF mapping for most neurons and because some neurons showed more than one hot spot in their RF (e.g., Fig. 4), we use the term position sensitivity to describe the effect of our tested position changes on the neuronal responses.
The neuron in Fig. 3 could contribute to form discrimination at the central fovea, but it is poorly suited for the eccentric positions just 1.5° away. However, the animals were highly accurate at identifying target forms at all three retinal positions. If AIT supported recognition at all three positions, one would expect to find neurons that showed form selectivity at eccentric positions. Indeed, we also encountered many neurons that preferred stimuli at one or both of the eccentric locations. For example, the response pattern of the neuron shown in Fig. 4 was complementary to that of the previous neuron in that it was most responsive to stimuli presented in the contralateral position, with some response in the ipsilateral position, and almost no response in the central position.
In light of previous studies, the observation that AIT neuronal responses change with stimulus position is not surprising. Indeed, any neuron must show some position sensitivity, at least at the edges of its RF. However, the neuronal position sensitivity was typically much larger than that previously reported or expected based on reported RF sizes in AIT. Indeed, many neuronal responses were so strongly affected by retinal position that they failed to respond at one or two of the three tested locations (all were within the fovea). Among the neurons that were responsive in at least one location, 77 (52%) gave no statistically significant response for one or both of the remaining positions (t-test), and 18 of these gave no statistically significant response to the central fovea (using the best target form for all tests). This was not due to the neurons being poorly responsive overall because the mean driven response rate at preferred positions was 24.3 spikes/s (n = 146), comparable to rates previously reported in AIT (20-40 spikes/s) (Leuschow et al. 1994; Missal et al. 1999; Op de Beeck and Vogels 2000). The examples in Fig. 5 illustrate the range of position and form sensitivities seen in the recorded population.
To summarize the position sensitivity of each neuron, we plotted its reduction in response when its best target form was presented in its worst position (relative to the response in its best position; Fig. 6). The median relative response was 0.41. In other words, the response of the typical AIT neuron in our sample could be reduced by
60% when the neuron's preferred stimulus form was moved within a region of only ±1.5° around the center of gaze. If we only consider neurons that prefer the center of gaze (i.e., where we clearly included the RF center), assume 2D Gaussian shaped RFs, and define RF cutoff at 50% (as in previous studies, see Op de Beeck and Vogels 2000
), then this median decrease over a position change of 1.5° corresponds to a median RF diameter of 2.6°. This is not an artifact of noisy responses: the result was nearly identical when the data were split in half and one group was used to compute the best and worst targets and positions and the other group used to compute the position sensitivity.
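The position sensitivity measure summarized above can be written compactly for a single neuron. The sketch below takes a 4 x 3 table of mean responses above background (forms by positions), finds the best form and the best and worst positions as defined in METHODS, and returns the relative response. The function name and the example values are hypothetical; neurons with negative or zero responses are not handled.

```python
def relative_response(resp):
    """Response to the best form at the worst position, relative to the
    response to the best form at the best position.

    resp[form][position] holds mean responses above background (spikes/s).
    """
    n_forms, n_pos = len(resp), len(resp[0])
    # Best form: largest mean (equivalently, summed) response over positions.
    best_form = max(range(n_forms), key=lambda f: sum(resp[f]))
    # Best and worst positions: largest and smallest response over all forms.
    pos_totals = [sum(resp[f][p] for f in range(n_forms)) for p in range(n_pos)]
    best_pos = max(range(n_pos), key=lambda p: pos_totals[p])
    worst_pos = min(range(n_pos), key=lambda p: pos_totals[p])
    return resp[best_form][worst_pos] / resp[best_form][best_pos]


# Hypothetical neuron: rows are forms; columns are ipsi, center, contra.
example_neuron = [
    [2.0, 20.0, 8.0],
    [1.0, 10.0, 4.0],
    [0.0, 4.0, 2.0],
    [1.0, 6.0, 2.0],
]
example_ratio = relative_response(example_neuron)
```

For this made-up neuron the relative response is 0.1, that is, a 90% reduction between its best and worst positions. The split-half control mentioned above would apply the selection step to one half of the trials and the ratio to the other half.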
Because form-selective neurons are most likely to underlie the recognition behavior, it is possible that they have less position sensitivity (because the behavior showed virtually no position sensitivity). However, examination of the 54 neurons (37%) that were selective for stimulus form (ANOVA, P < 0.05) revealed even greater position sensitivity (median = 0.27) than that seen in the entire responsive population (Fig. 6). Under the RF assumptions described above, this corresponds to a median RF diameter of 2.2°.
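The conversion from a relative response to an RF diameter follows directly from the stated assumptions: a 2D Gaussian RF centered on the preferred position, a 1.5° shift to the worst position, and an RF boundary at 50% of the peak response. A short sketch (the function name is illustrative):

```python
import math


def rf_diameter_deg(relative_response, shift_deg=1.5):
    """RF diameter at half-maximum implied by a Gaussian RF.

    Solves exp(-shift^2 / (2 sigma^2)) = relative_response for sigma, then
    returns the full width at half-maximum, 2 * sigma * sqrt(2 ln 2).
    """
    sigma = shift_deg / math.sqrt(2.0 * math.log(1.0 / relative_response))
    return 2.0 * sigma * math.sqrt(2.0 * math.log(2.0))


all_responsive = rf_diameter_deg(0.41)  # median over responsive neurons
form_selective = rf_diameter_deg(0.27)  # median over form-selective neurons
```

Rounded to one decimal place, these evaluate to 2.6° and 2.2°, matching the diameters quoted in the text.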
To compare the distribution of position sensitivities of the recorded population (Fig. 6) with that predicted from previous studies, we estimated the expected AIT position sensitivity using the RF data from a recent, thorough study of AIT RFs (Op de Beeck and Vogels 2000
) (see METHODS). Those data predict that the median AIT neuron should have shown only an 18% maximal response change across our three tested positions, nearly fourfold less than we observed.
The stronger than expected position sensitivity could be due to changes in overall responsivity at some retinal positions (e.g., due to small RFs), changes in form preference at each retinal position, or both. The example neuronal data (Figs. 3, 4, 5) suggest the former hypothesis. This hypothesis also seemed most likely because previous studies have reported that the rank order of form preference is largely unaltered by changes in position (e.g., Desimone et al. 1984
; Ito et al. 1995
; Sary et al. 1993
; Schwartz et al. 1983
). However, because we found much greater position sensitivity than previous studies, we sought to confirm that it acted across all stimulus forms. Because the position sensitivity of the neuronal responses was so strong, we could not test this hypothesis for about half the neurons because the 1.5° position shifts eliminated the response (e.g., Fig. 3). Even when responses remained at non-preferred positions, they were so weak that most neurons were no longer significantly form selective at those positions. Specifically, 54 of the 146 responsive neurons (37%) were significantly form selective at their best position but less than half of these (25 of 54) were still significantly form selective at their second best position. Nevertheless, 24 of these 25 neurons maintained the rank order of their best and worst forms at their second best position.
To summarize the average effect of position changes on form selectivity, we split the 54 form-selective neurons into three groups, where each group preferred one of the three tested positions (n = 2, n = 35, n = 17 for the ipsi, central, and contra positions). We then rank-ordered the target forms for each neuron and averaged the normalized (to best response) responses of all neurons in the group for each rank-ordered form in each position (Fig. 7). This analysis showed that, on average, neurons that preferred the central position (Fig. 7, left) maintained their rank order of form preferences at the eccentric positions and showed a strong response reduction in each side position that operated largely as a decrease in response gain over all four target forms (gain of
0.4 across the 1.5° position changes). Results were similar for neurons that preferred the contralateral position, but the decrease in response gain was slightly weaker (Fig. 7, right). In summary, although we found much greater position sensitivity than most previous studies, the results were consistent with other studies in that, when it could be measured, the rank order of target form preference was largely unaffected by position. Thus the strong position sensitivity observed in this study is most consistent with the hypothesis that the neurons have small RFs (
2.5° diam), or that those RFs contain unresponsive locations (e.g., Fig. 4).
We could not fully characterize the spatial RFs of the neurons because we tested only three positions. Because the animal's task was to identify forms at these positions, our logic was that the position sensitivity of AIT neurons responding to any of these positions would provide the most appropriate measurement of the position sensitivity of AIT neurons that might support the behavior. Exploration of additional retinal positions could only show that we had underestimated the neuronal position sensitivity. However, we wondered if our measurements were on the edge of some RFs or if they always included the RF center (i.e., maximal response position). Although a thorough exploration of these RF issues is the focus of future studies, we have collected preliminary data from 17 responsive neurons in one animal (monkey 1). For these neurons we extended our measure of position sensitivity along the horizontal meridian by placing stimuli at four additional positions eccentric to those tested for the larger neuronal population. In particular, we tested horizontal eccentricities of -4.5 to +4.5° in 1.5° increments (Fig. 8). Although the animal performed well above chance the first day it saw these new positions, the animal received additional training to better acclimate it to the occurrence of targets at these new positions (see METHODS). After training, the animal's performance at these positions was reduced relative to the more central positions, but was well above chance (70 and 62% correct at 3.0 and 4.5° eccentricity, respectively). Each neuron's preferred target form was determined from the central three positions as before, and the response to that target plotted as a function of position. Of the 17 neurons tested, no neuron gave a significantly larger mean response to any of the more eccentric positions than it did to the best of the original, central three positions (t-test, P = 0.05). Data from four representative neurons are shown in Fig. 8. 
Thus although the RF shape varied from neuron to neuron, the extended field mapping suggests that the RF centers of the tested neurons were within the original three positions.
Time course of position sensitivity
We next sought to determine if the position sensitivity was present in the earliest part of the responses or if it developed over time. For example, perhaps the AIT neurons had different response latencies for different positions. Inspection of the data revealed little evidence of large differences in latency across stimulus position (e.g., Fig. 4), but we examined the time course for subtle effects. As a first step, we re-analyzed the entire data set using two other analysis windows (100-200 and 150-250 ms after stimulus onset) with little effect on any of the results. The median position sensitivity ratios using these time windows were similar (0.36 and 0.38, respectively; cf. Fig. 6). An ideal analysis would estimate each neuron's response latency for each position, but this is problematic because of the limited number of trials and because many neurons did not respond to nonpreferred positions. Instead we estimated the population time course of the position sensitivity by computing the population average response to each neuron's best target form presented in the neuron's best and worst positions (Fig. 9). For the best position, AIT neurons began to respond 100 ms after stimulus onset. For the worst position, the average response began slightly later, rose more slowly, and reached a lower peak. The plot suggests that latency differences across stimulus position account for only a small amount of the position sensitivity reported above. To quantify this, we found the temporal shift and scale factor that could be applied to the average response in the worst position to best match the average response in the best position (RMS error function). The fit was good (correlation coef = 0.976, 0-300 ms after stimulus onset; dashed line in Fig. 9), and it required a temporal shift of 19 ms and a vertical scale factor of 2.7. The scale factor is an estimate of the amount of position sensitivity not due to latency differences, and it shows that mean position sensitivity (worst/best position) was 0.37 (i.e., 1/2.7), which is comparable to the median effect of 0.41 already described. In summary, changes in response gain with position underlie almost all of the position sensitivity reported in this study.
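The shift-and-scale fit described above can be illustrated with a minimal sketch. It assumes 1-ms bins, a grid search over integer shifts, and a closed-form least-squares scale at each shift; the function name `fit_shift_and_scale`, the `max_shift` bound, and the synthetic traces are hypothetical and are not taken from the original analysis.

```python
import numpy as np

def fit_shift_and_scale(best, worst, max_shift=50):
    """Return (shift, scale, rms) that best maps `worst` onto `best`.

    `best` and `worst` are average firing-rate traces in equal bins
    (assumed 1 ms). For each candidate shift the worst-position trace is
    advanced by `shift` bins, the least-squares scale factor is computed
    in closed form, and the pair with the lowest RMS error is kept.
    """
    best = np.asarray(best, dtype=float)
    worst = np.asarray(worst, dtype=float)
    n = min(len(best), len(worst))
    result = None
    for shift in range(max_shift + 1):
        w = worst[shift:n]        # worst trace advanced by `shift` bins
        b = best[:n - shift]      # overlapping part of the best trace
        energy = np.dot(w, w)
        if energy == 0.0:
            continue              # no response in the overlap; scale undefined
        scale = np.dot(w, b) / energy
        rms = np.sqrt(np.mean((scale * w - b) ** 2))
        if result is None or rms < result[2]:
            result = (shift, scale, rms)
    return result

# Synthetic traces (hypothetical, not recorded data): the worst-position
# response is a copy of the best-position response delayed by 19 bins
# and reduced by a factor of 2.7.
t = np.arange(300)
best_trace = 40.0 * np.exp(-((t - 150) / 40.0) ** 2)
worst_trace = (40.0 / 2.7) * np.exp(-((t - 169) / 40.0) ** 2)
shift, scale, rms = fit_shift_and_scale(best_trace, worst_trace)
```

In this sketch the reciprocal of the recovered scale factor plays the role of the worst/best position sensitivity ratio after latency differences have been removed.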
Possible artifacts
Because we found much greater position sensitivity than almost all previous studies of AIT (but see DISCUSSION), we considered factors that might explain this finding. The most intriguing possibilities require further systematic study (see DISCUSSION). However, here we report our examination of three possible artifacts that might have contributed to our findings: stimulus spatial frequency content, differences in eye movements across position, and differences in stimulus duration across position.
The first factor we considered was the spatial frequency composition of the target forms. The target forms were made of line segments with a high spatial frequency content (25 cycles/°, see METHODS). Because stimulus form (identity) depended on the spatial arrangement of these line segments, the spatial frequencies that supported the animal's differentiation of the forms were much lower (5 cycles/°), near the maximal contrast sensitivity for primates (Merigan and Maunsell 1993). Indeed, the stimuli had spatial frequency content similar to that of individual letters during normal reading. Nevertheless, we considered the possibility that the spatial frequency content of the stimulus elements was responsible for the strong position sensitivity. We created a set of four new targets that had the same size and spatial layout as the original four targets, but whose line segments contained lower spatial frequencies (Fig. 10). One of the animals (monkey 1) was retrained to respond to these four modified targets using the same form-response mapping as the four original targets even when both target types were randomly interleaved across trials (1 wk of training). We recorded the responses of an additional 15 AIT neurons to each of the eight targets in each of the three original positions. We measured position sensitivity for each spatial-frequency condition exactly as before with the exception that each neuron's best target and best and worst positions were chosen after averaging the data from the two spatial-frequency conditions (results were nearly identical when each condition was considered separately). The analysis showed that some neurons were less sensitive to the position of the modified stimuli (Fig. 10C) but that other neurons were equally (Fig. 10D) or more position sensitive (Fig. 10E). Over the population (n = 15), the median position sensitivity for the original stimuli was nearly identical to that measured in the larger group of neurons (0.37) and was not significantly different from the population position sensitivity measured with the modified stimuli (median = 0.33; t-test, P = 0.60). Thus these data suggest that the strong position sensitivity cannot be simply explained by the spatial-frequency content of the stimulus elements per se (but see DISCUSSION).
The second and third potential artifacts we considered were differences in eye movements and differences in stimulus duration across target position. As described in METHODS, we did not place strong constraints on the animal's eye movements but ensured that the target was only presented at the intended retinal position. Because of this, the animal's pattern of eye movement and the stimulus duration were both confounded with the primary variable of retinal position. These confounds are illustrated in Fig. 11, A-C. We admitted these confounds in our design because we wanted the task to remain as natural as possible while still varying the retinal position of the target forms. As a result, it is possible that the shorter stimulus exposure durations used for eccentric stimuli (150 ms) relative to the central stimuli (300 ms) could affect response rate and cause apparent strong position sensitivity. This seemed unlikely because rapid presentation of stimuli indicates little peak response reduction for stimulus exposure durations greater than 50 ms (Keysers et al. 2001) and because the latency of AIT neurons to stimulus onset is 100 ms (Fig. 9) (Baylis et al. 1987; DiCarlo and Maunsell 2000; Vogels and Orban 1994). If stimulus offset requires the same latency as stimulus onset to alter AIT firing rates, then the offset of the target form would not alter the response until the end of the analysis window (i.e., 100 ms after the form offset is 250 ms). A second possibility is that the neuronal processes that produce eye movements toward the target ("adjustment saccades" in Fig. 11; see METHODS) could cause a change in ongoing AIT neuronal activity (e.g., a "reset" signal or saccadic suppression). The fact that the monkeys' reaction times were nearly identical for central and eccentric stimulus positions argues against this possibility (Fig. 2) but does not exclude it. Because the two confounding factors (stimulus exposure duration and time of adjustment saccade) were perfectly correlated in our design, we cannot distinguish their effects, so we considered them to be a single confound and performed analyses to isolate the effect of this confound from that of stimulus position.
One analysis is summarized in Fig. 11 (D and E). Each point in each panel is the response rate of one neuron on one trial relative to the average response rate of this neuron over all trials with the neuron's best form in one position. These normalized trial-by-trial responses are plotted relative to the time that the adjustment saccade (i.e., the confound) occurred for that trial. Thus these plots show the average effect of the confound on response rate (isolated from the effect of stimulus position). If the confound had a consistent effect across the population of AIT neurons (e.g., decrease in ongoing neuronal responses), the running averages in the plots should show a trend. Instead, no trends were apparent and the correlation coefficients were not significantly different from zero (-0.012, -0.030, P > 0.1). The two symbol types in the plots indicate data from the two monkeys, illustrating that monkey 2 tended to make adjustment saccades at shorter latencies than monkey 1. This difference in behavior does not obscure a relationship between the time of the adjustment saccade and response rate because the within-animal correlations are also not significantly different from zero (monkey 1: -0.051, -0.021; monkey 2: 0.013, -0.043; P > 0.1 all cases). In addition, the mean of the normalized responses on trials where no adjustment saccade occurred was not significantly different from that expected based on trials where an adjustment saccade was made (t-test against a value of 1, P > 0.1 for the ipsilateral and contralateral conditions). If the confound causes some neurons to increase their firing rates and others to decrease, the analysis in Fig. 11 might fail to detect these effects. However, a neuron-by-neuron analysis revealed that only 5% of neurons (8% for ipsilateral stimuli, 3% for contralateral stimuli) showed any significant correlation of response rate with adjustment saccade latency (Spearman ranked correlation, P < 0.05), which is approximately the number expected by chance. Furthermore, a mixture of positive and negative effects should increase the variability of relative response rates (i.e., the SD of the ordinate values in Fig. 11) relative to that which would have been observed without the effects. Instead, the observed SDs (ipsi: 0.50, contra: 0.47) were slightly below those obtained from simulated trial-by-trial responses using the average rates observed in the actual population and Poisson firing statistics (ipsi: 0.53, contra: 0.50) (see Shadlen and Newsome 1994 for Poisson assumption; Softky and Koch 1993). In summary, because these analyses failed to find a significant effect of the time of the adjustment saccade (and stimulus offset) on the response rate, we conclude that these factors did not significantly modify the AIT responses and thus they cannot explain the position sensitivity of those responses.
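The Poisson comparison above can be sketched as a small simulation. This is only an illustration under stated assumptions: the function name `simulated_normalized_sd`, the 100-ms counting window, the number of simulated trials, and the example rates are all hypothetical, and each simulated count is normalized by its expected value rather than by an observed mean.

```python
import numpy as np

def simulated_normalized_sd(mean_rates_hz, window_s=0.1,
                            trials_per_neuron=1000, seed=0):
    """SD of normalized trial-by-trial responses under Poisson firing.

    For each neuron, spike counts are drawn from a Poisson distribution
    whose mean is rate * window, then divided by that mean so that every
    neuron's normalized response averages about 1. The SD is taken over
    all simulated trials pooled across neurons.
    """
    rng = np.random.default_rng(seed)
    normalized = []
    for rate in mean_rates_hz:
        expected_count = rate * window_s
        counts = rng.poisson(expected_count, size=trials_per_neuron)
        normalized.append(counts / expected_count)
    return float(np.std(np.concatenate(normalized)))

# Hypothetical population: 20 neurons firing at 40 spikes/s. With a mean
# count of 4 per window, Poisson statistics predict an SD near 1/sqrt(4).
example_sd = simulated_normalized_sd([40.0] * 20, trials_per_neuron=5000)
```

An observed SD well above the simulated value would point to extra trial-to-trial variability, such as a mixture of positive and negative effects of the confound; an observed SD at or below it would not.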
Behavioral significance of neuronal position sensitivity
Unlike almost all previous studies of AIT RFs or AIT position tolerance, the current data were collected while the subjects performed recognition across changes in object position. Thus we were also able to examine position sensitivity in the context of that behavior. Here we present three such analyses.
In the first analysis, we adopt a standard view of AIT in which the purported role of AIT neurons is to extract object identity and to support the "perceptual equivalence" of the same object over changes in, for example, object position (e.g., Desimone et al. 1984; Gross and Mishkin 1977). This hypothesis predicts that individual AIT neurons should be capable of signaling object identity across changes in object position that are "perceptually equivalent." Testing this prediction depends on defining both perceptual equivalence and the manner in which AIT neurons signal or code object identity. The spirit of perceptual equivalence is that the subject's interpretation of the identity of the object remains the same over changes in, for example, object position. The animal's accurate identification of each object across changes in position (even for less trained positions, see METHODS) suggests that it treats each object as equivalent across position. Thus we assume that AIT neurons should signal object identity across these same position changes. We defined an AIT neuron's ability to signal object identity as its response to its best target form relative to a distractor response (d'). The distractor response was taken to be the maximal response to the neuron's worst target form over all three positions. We then asked, how well does each neuron continue to signal its preferred object across the tested position changes?
The results from the 54 form-selective neurons are shown in Fig. 12. Almost all of these neurons provided a strong signal of target identity at their preferred position. In particular, 41 of the 54 neurons (76%) had d' values >1.35 (discrimination performance of 75% correct) at their preferred position. However, only three of the neurons (6%) could continue to provide this target identity signal (d' > 1.35) at all three of the tested positions. Put another way, the typical form-selective neuron could correctly discriminate its best target from the distractor on 83% of the trials (median d' = 1.89), but a position change within 1.5° of the fovea caused that same neuron's performance to fall to near chance (median d' = 0.15; 53% correct discrimination; 50% is chance). In sum, these data show that only a few AIT neurons are individually capable of mediating perceptual (behavioral) equivalence.
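The d' values and percent-correct figures quoted above are consistent with the standard unbiased-criterion conversion, percent correct = Φ(d'/2), where Φ is the standard normal cumulative distribution. The sketch below illustrates that conversion together with one common way to compute d' from trial responses. The pooled-variance formula and the function names are assumptions made for illustration; the exact definition used for the recordings is the one given in METHODS.

```python
import math
import statistics

def d_prime(target_responses, distractor_responses):
    """d' as the difference in means divided by the pooled SD.

    This pooled-variance form is a common definition and is assumed
    here; it is not taken from the original analysis.
    """
    mean_diff = (statistics.mean(target_responses)
                 - statistics.mean(distractor_responses))
    pooled_var = (statistics.variance(target_responses)
                  + statistics.variance(distractor_responses)) / 2.0
    return mean_diff / math.sqrt(pooled_var)

def percent_correct(dprime):
    """Percent correct for an unbiased observer: 100 * Phi(d'/2)."""
    z = dprime / 2.0
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Conversions for the three d' values quoted in the text.
pc_criterion = percent_correct(1.35)   # criterion used for Fig. 12
pc_preferred = percent_correct(1.89)   # median at the preferred position
pc_shifted = percent_correct(0.15)     # median after the position change
```

Rounded to whole percentages, these three conversions reproduce the 75, 83, and 53% correct values given in the text.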
In the second analysis, we ask: were the AIT neurons better at signaling object identity or object position? Tovée et al. (1994) asked this question in passively fixating animals and showed that the median AIT neuron carried four times as much information about object identity as it did about object position. However, comparison of position sensitivity and form sensitivity is problematic because it depends on the tested range of objects and positions. The comparison is only meaningful in the context of a behavioral task. In particular, if the putative role of AIT neuronal responses is to inform the animal about object identity regardless of small changes in object position, then AIT responses must be more sensitive to an identity change that is critical to the animal's task than to a position change that is irrelevant in that task. Our behavioral task was specifically designed to test this hypothesis, because it required the animal to signal object identity (stimulus form) regardless of position.
We compared the position and form sensitivity of the population of AIT neurons. The median position sensitivity was 11.5 spikes/s (n = 146; best-worst position; monkeys 1 and 2 = 13.1 and 10.9) and the median form sensitivity was 10.4 spikes/s (best-worst form; monkeys 1 and 2 = 10.5 and 10.2). If we consider only the 94 neurons that showed a statistically significant effect of either identity or position or an interaction (2-way ANOVA, P < 0.05), the median sensitivity differences were 14.3 spikes/s (position) and 13.8 spikes/s (form) and the median sensitivity ratios were 3:1 (position) and 2.4:1 (form). In summary, the AIT neurons were slightly more sensitive to differences in position within the fovea that were irrelevant to the task than they were to differences in target form that were critical to the task. These data cannot rule out the possibility that the object position information conveyed in the AIT responses is completely ignored by downstream brain areas. However, these data suggest that the role of AIT neurons is to provide the animal with a representation of both object identity and object position and that the representation of object position can be of much higher spatial resolution than previously appreciated.
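The two sensitivity measures compared above can be illustrated for a single neuron with a short sketch. It assumes that position sensitivity is measured across positions for the neuron's best form and that form sensitivity is measured across forms at the neuron's best position; this choice, the function name `position_and_form_sensitivity`, and the example response matrix are hypothetical.

```python
import numpy as np

def position_and_form_sensitivity(mean_rates):
    """Best-minus-worst differences for one neuron.

    `mean_rates` is a forms x positions array of mean firing rates
    (spikes/s). The best cell fixes the best form and best position.
    Position sensitivity is computed along the best form's row and form
    sensitivity along the best position's column (an assumption made
    for this sketch).
    """
    rates = np.asarray(mean_rates, dtype=float)
    best_form, best_pos = np.unravel_index(np.argmax(rates), rates.shape)
    across_positions = rates[best_form, :]
    across_forms = rates[:, best_pos]
    position_diff = across_positions.max() - across_positions.min()
    form_diff = across_forms.max() - across_forms.min()
    return float(position_diff), float(form_diff)

# Hypothetical neuron: 4 target forms (rows) x 3 positions (columns).
example_rates = [[20.0, 10.0, 5.0],
                 [8.0, 6.0, 4.0],
                 [12.0, 9.0, 3.0],
                 [6.0, 5.0, 2.0]]
position_diff, form_diff = position_and_form_sensitivity(example_rates)
```

Taking the median of each measure over the recorded population would give values comparable in kind to the spikes/s differences reported above.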
So far we have focused on the idea that to perform position-tolerant recognition, the brain should seek large RFs and thus less neuronal position sensitivity. However, there may be competing behavioral demands for small RFs and thus more position sensitivity (i.e., as seen in this study). In this third analysis, we consider one of those behavioral demands: recognition in visual clutter. Before any recording began, both animals were successfully trained to recognize each target in each position even when the target form was flanked on both sides by a row of distractors (see Fig. 13A and METHODS; mean behavioral accuracy was 87% with clutter vs. 88% without clutter). We considered the hypothesis that small RFs (i.e., high position sensitivity) might have developed to protect each neuron's response, and thus the animal's behavior, from the influence of flanking visual clutter by limiting its intrusion into the RF.
We compared each neuron's responses at its best p