1The Salk Institute for Biological Studies, La Jolla, California; 2Department of Experimental Psychology, University of Oxford, Oxford, and 3School of Psychology, Birmingham University, Edgbaston, Birmingham, United Kingdom; and 4Max-Planck Institute for Biological Cybernetics, Tübingen, Germany
Submitted 1 July 2005; accepted in final form 2 August 2005
ABSTRACT
INTRODUCTION
Some implied motion cues operate at a cognitive level (for example, an athlete about to throw a javelin), so form and motion could interact at a cognitive level. Recent neurophysiological (Jellema and Perrett 2003) and human functional magnetic resonance imaging (fMRI) studies, however, suggest that at least the end result of this interaction is represented in the prototypical motion area of the human brain (hMT+/V5). This area is more active when presented with real-life images that imply motion than when similar images are shown that do not imply motion (Kourtzi and Kanwisher 2000a; Senior et al. 2000).
Other implied motion cues operate at a lower level. In a dynamic Glass pattern sequence (Glass 1969; Ross et al. 2000), oriented elements are aligned along a common trajectory. This alignment generates a global structure in these patterns (Fig. 1) that evokes a percept of coherent motion (Ross et al. 2000). For instance, orienting the elements along concentric trajectories evokes a percept of rotational motion. The direction of this rotation is ambiguous (clockwise or counterclockwise), but the fact that a coherent rotation is seen at all is surprising because, on average, the motion energy in these patterns is perfectly balanced in all directions (see APPENDIX and DISCUSSION). Krekelberg et al. (2003) found a neural correlate of the motion percept in the superior temporal sulcus of the macaque, where a subpopulation of motion-selective cells responds to Glass sequences as if they contained real coherent motion.
Our fMRI data show that the human motion complex indeed contains subpopulations of cells that are selective for both implied and real motion. Selectivity for global patterns of implied motion and global patterns of real motion was also observed in ventral areas [Vp, V4, and lateral occipital complex (LOC)]. In contrast with the dorsal areas, however, the subpopulations selective for the structure of Glass sequences did not show significant fMRI selectivity for the structure of real motion patterns. Finally, only a small part of the implied motion selectivity could be explained on the basis of selectivity for local orientation changes in early visual cortex (V1, V2).
These findings provide insight into the representation of form and motion in the human visual cortex. Dorsal areas have a high degree of cue invariance (Albright 1992); in these areas real and implied motion patterns drive similar subpopulations of neurons. The ventral areas, which presumably respond to the global structure rather than the motion in these sequences, do not show this cue invariance. This allows ventral areas to discriminate between global structure generated by motion cues and the same structure generated by form cues. Dorsal areas, on the other hand, do not make this distinction and extract only the (implied or real) motion information.
METHODS
Observers
Eleven observers participated in expt 1 (implied motion), 14 in expt 2 (real motion), 16 in expt 3 (implied and real motion interactions), and 13 in expt 4 (local vs. global implied motion). Three observers were excluded from the analysis in expt 2, three in expt 3, and two in expt 4 because of excessive head movement. All observers had normal or corrected-to-normal vision, were paid for participation, and gave their informed consent. All procedures were in accordance with international standards for research involving human subjects (Declaration of Helsinki).
Apparatus
An LCD projector (NEC GT950) displayed the visual stimuli on a tangent screen that the subjects viewed through a mirror. The refresh rate of the projector was set to 60 Hz for all stimuli. The visible screen subtended 21° of visual angle. A custom-made fiber-optics button box allowed the subjects to communicate their decisions in the perceptual tasks.
Stimuli
LOCALIZERS.
For the LOC localizer scans we used grayscale images of novel and familiar objects as well as scrambled versions of each set, as described previously (Kourtzi and Kanwisher 2001). We localized hMT+/V5 by using a moving-dot pattern (expanding and contracting for 9 s at a speed of 4°/s within a 21° aperture, reversal rate 1 Hz) and a stationary random-dot field (Huk and Heeger 2002; Watson et al. 1993). Area KO was localized by using kinetic boundaries and transparent motion stimuli (Dupont et al. 1997) that consisted of a field of random black (50%) and white (50%) dots (size: 3.1 arcmin; speed: 4.44°/s). To map the borders and the eccentricity of the retinotopic visual areas, we used rotating triangular wedge stimuli and concentric rings. These stimuli consisted of either gray-level natural images or black and white objects-from-texture images that were presented at a temporal frequency of 2 Hz as described in previous studies (Kourtzi et al. 2003).
RANDOM-DOT PATTERNS. The random-dot patterns used in the main experiments were created with in-house OpenGL software. Each pattern consisted of 200 dots. A single dot subtended 0.24°, and the whole stimulus pattern was contained within an 18° circular aperture around the central fixation point.
IMPLIED MOTION.
To generate implied motion, we used sequences of Glass patterns. In each Glass pattern, the dots were arranged in pairs and all pairs in a given pattern were aligned either along concentric circles around the fixation point or along radial lines emanating from the fixation point (Fig. 1A). In a sequence of Glass patterns, a new pattern of the same type, with a new set of randomly positioned pairs, was presented every 83 ms, which was close to the optimal range to generate the impression of motion (Ross et al. 2000). The duration of a single sequence was 300 ms (see Procedure, below). The distance between the dots in a pair, called the Glass shift, was set per subject to maximize their impression of motion (Ross et al. 2000; see Procedure). When the pairs were aligned along concentric circles, they gave an impression of clockwise or counterclockwise rotational motion. They will be referred to as concentric Glass sequences. Orienting the pairs along radial lines gave an impression of expanding or contracting motion; these will be referred to as radial Glass sequences. As the APPENDIX shows, the average motion energy in these sequences is balanced. For each motion energy component in one direction, there is on average an equal component with energy in the opposite direction. Thus the distribution of motion energy in these sequences does not predict the coherent motion percept. In our terminology, stimuli with balanced motion have no coherent motion energy. In contrast, stimuli in which the motion energy does have a clear peak in some direction are referred to as real motion stimuli.
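The pair construction described above can be sketched in a few lines. This is a minimal illustration, not the authors' in-house OpenGL code: the function name `glass_pattern`, the uniform placement of pair midpoints over the disk, and the default Glass shift of 0.3° are assumptions (in the study the shift was set per subject). Positions are in degrees of visual angle with fixation at the origin; the 200 dots and the 18° aperture (9° radius) are taken from the text.

```python
import math
import random


def glass_pattern(kind, n_dots=200, radius_deg=9.0, shift_deg=0.3, rng=None):
    """Return a list of (x, y) dot positions (deg) forming one Glass pattern.

    kind: "concentric" (pairs tangent to circles around fixation) or
          "radial" (pairs along lines through fixation).
    shift_deg: hypothetical Glass shift (distance between the two dots of a pair).
    """
    if kind not in ("concentric", "radial"):
        raise ValueError("kind must be 'concentric' or 'radial'")
    if rng is None:
        rng = random.Random()
    dots = []
    for _ in range(n_dots // 2):
        # Pair midpoint, uniform over a disk that leaves room for the shift
        # so that both dots stay inside the circular aperture.
        r = (radius_deg - shift_deg) * math.sqrt(rng.random())
        theta = rng.uniform(0.0, 2.0 * math.pi)
        cx, cy = r * math.cos(theta), r * math.sin(theta)
        # Pair axis: along the radius (radial) or rotated 90 deg (concentric).
        phi = theta if kind == "radial" else theta + math.pi / 2
        dx = 0.5 * shift_deg * math.cos(phi)
        dy = 0.5 * shift_deg * math.sin(phi)
        dots.append((cx - dx, cy - dy))
        dots.append((cx + dx, cy + dy))
    return dots
```

A Glass sequence would then draw a fresh `glass_pattern` of the same kind every 83 ms, which at the 60-Hz refresh rate corresponds to 5 frames per pattern.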
REAL MOTION. The motion sequences all had 200 dots that were identical to those in the Glass patterns. In the motion sequences, however, all dots were randomly positioned rather than arranged in pairs. Because Glass sequences implicitly contain two motion directions, we used motion sequences in which the direction of motion reversed every 83 ms. The motion trajectory could either be along concentric circles (concentric motion) or along lines emanating from the fixation point (radial motion). The speed of the dots in the motion sequences was chosen to perceptually match the (high) speed in the Glass sequences (see Procedure).
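A minimal sketch of such a reversing real-motion sequence is given below, assuming that dots keep a constant linear speed and that the direction flips every 5 frames (5 frames at 60 Hz is about 83 ms). The function name, the example speed, and the clamping of radial dots at the aperture edge (rather than any dot-replacement rule) are illustrative assumptions, not details from the study.

```python
import math
import random

FRAME_RATE_HZ = 60      # projector refresh rate given in the text
REVERSAL_FRAMES = 5     # 5 frames at 60 Hz is about 83 ms between reversals


def real_motion_frames(kind, n_frames, speed_deg_s, n_dots=200,
                       radius_deg=9.0, rng=None):
    """Return n_frames frames, each a list of (x, y) dot positions in degrees.

    kind: "concentric" (dots rotate about fixation) or "radial" (dots move
    along lines through fixation). The direction flips every REVERSAL_FRAMES.
    """
    if kind not in ("concentric", "radial"):
        raise ValueError("kind must be 'concentric' or 'radial'")
    if rng is None:
        rng = random.Random()
    # Each dot is stored in polar coordinates (r, theta) around fixation.
    dots = [(radius_deg * math.sqrt(rng.random()), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(n_dots)]
    step = speed_deg_s / FRAME_RATE_HZ  # displacement per frame (deg)
    frames = []
    for f in range(n_frames):
        frames.append([(r * math.cos(t), r * math.sin(t)) for r, t in dots])
        sign = 1 if (f // REVERSAL_FRAMES) % 2 == 0 else -1
        if kind == "concentric":
            # Constant linear speed along the circle: angular step = arc / radius.
            dots = [(r, t + sign * step / max(r, 1e-6)) for r, t in dots]
        else:
            # Radial step, clamped to the aperture (no dot replacement here).
            dots = [(min(max(r + sign * step, 0.0), radius_deg), t) for r, t in dots]
    return frames
```

Because the direction reverses every 5 frames, each dot oscillates around its starting position, so in this sketch the clamping only matters for dots very close to fixation or to the aperture edge.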
Local Orientation Controls
To investigate the influence of local orientation selectivity, we devised segmented Glass patterns. In a segmented Glass pattern, the 18° aperture was divided into a square grid of 8 × 8 segments (see Fig. 5). We assigned a random pair orientation to each segment. The orthogonal Glass sequences were chosen such that the local orientation per segment was orthogonal to the orientation of the previously presented segmented Glass sequence. Per segment, the change from a segmented sequence to the orthogonal sequence is locally the same as the switch from a radial sequence to a concentric sequence. This is most easily seen by comparing the orientation of the highlighted pairs in the example stimuli of Figs. 1 and 5. The difference between the transition from segmented to orthogonal sequences and the transition from concentric to radial Glass sequences lies solely in the global structure. Locally (at the scale of a single segment, about 2°), both transitions are from one orientation to the orthogonal orientation.
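The segmented and orthogonal patterns can be sketched as follows, assuming each pair takes the orientation of the grid segment that contains its midpoint. The helper names and the 0.3° shift are hypothetical; the 8 × 8 grid, the 18° aperture, and the 90° rotation per segment come from the text. With 18°/8 = 2.25° per segment, this matches the "about 2°" segment scale.

```python
import math
import random

GRID = 8                 # 8 x 8 segments (from the text)
APERTURE_DEG = 18.0      # aperture diameter (from the text)


def segment_orientations(rng):
    """Random pair orientation (radians, in [0, pi)) for each grid segment."""
    return [[rng.uniform(0.0, math.pi) for _ in range(GRID)] for _ in range(GRID)]


def orthogonal(orients):
    """Rotate every segment's orientation by 90 degrees (mod 180)."""
    return [[(o + math.pi / 2) % math.pi for o in row] for row in orients]


def segmented_glass(orients, n_dots=200, shift_deg=0.3, rng=None):
    """One segmented Glass pattern; each pair uses its segment's orientation."""
    if rng is None:
        rng = random.Random()
    half = APERTURE_DEG / 2
    seg = APERTURE_DEG / GRID            # 2.25 deg per segment
    dots = []
    while len(dots) < n_dots:
        cx, cy = rng.uniform(-half, half), rng.uniform(-half, half)
        if math.hypot(cx, cy) > half - shift_deg:
            continue                      # keep the pair inside the circular aperture
        col = min(int((cx + half) // seg), GRID - 1)
        row = min(int((cy + half) // seg), GRID - 1)
        phi = orients[row][col]
        dx = 0.5 * shift_deg * math.cos(phi)
        dy = 0.5 * shift_deg * math.sin(phi)
        dots += [(cx - dx, cy - dy), (cx + dx, cy + dy)]
    return dots
```

Usage: `o = segment_orientations(random.Random(0))` gives the orientation map for a segmented sequence, and `segmented_glass(orthogonal(o))` draws a pattern from the matching orthogonal sequence.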
Procedure
Observers participated in two LOC localizer scans, one hMT+/V5 localizer scan, one KO localizer scan, two retinotopic mapping scans, and four scans for each of the four event-related experiments. Before the relevant scanning sessions, the observers participated in a practice session, in which they were familiarized with the types of stimuli used in the experiment. In these sessions we used a simple adjustment procedure to determine the Glass shift that evoked the strongest coherent motion percept as well as the speed of the real motion sequences that matched the perceived speed in the Glass sequences.
During the scan sessions, the subjects performed one of two behavioral tasks that ensured that an equal amount of attention was allocated to the stimulus in all conditions. In the first task ("matching task"), the subjects pressed a key to report whether the two stimuli in a trial were the same or different. We analyzed the percentage of trials in which the decision was correct. In the second task ("change detection task"), the central fixation point briefly (250 ms) changed its color from red to blue at unpredictable times during the trial. Subjects pressed a key to indicate that they detected this change. We analyzed the percentage of correct detections as well as the reaction time for each correct detection.
Design
LOCALIZERS.
For the LOC localizer scans each stimulus condition was presented in a 16-s stimulus epoch (blocked design), as in previous studies (Kourtzi and Kanwisher 2000b). Each condition was repeated four times in a balanced order and with interleaved fixation periods. Twenty images were presented in each block, each for 300 ms with a blank interval of 500 ms between images. The observers fixated and performed a one-back matching task. In the hMT+/V5 localizer, a stationary-dot pattern was shown for 27 s and was then replaced by a moving (expanding and contracting) random-dot pattern. Each condition was repeated nine times. In the V3b/KO localizer scans, each stimulus condition (kinetic boundaries, transparent motion) was presented for seven 16-s epochs with interleaved fixation periods similar to the LOC localizer scan. For the retinotopic mapping scans, eight wedge positions and eight eccentricity rings were presented for 8 s each and repeated eight times. During the hMT+/V5, V3b/KO, and retinotopic scans observers performed the change detection task.
EVENT-RELATED ADAPTATION SCANS.
Each scan started with a 16-s fixation epoch and ended with an 8-s fixation epoch. After the initial fixation epoch, the experimental trials started. We used an event-related adaptation paradigm (Buckner et al. 1998; Grill-Spector and Malach 2001; Kourtzi and Kanwisher 2000b, 2001) in which two stimuli (e.g., A and B) were presented sequentially in 3-s trials. Each stimulus was presented for 300 ms with a 100-ms blank interval between stimuli and an intertrial interval of 2,300 ms (see Fig. 1B). This yielded four experimental conditions: 1) A followed by A, 2) B followed by B, 3) A followed by B, 4) B followed by A, and one fixation condition in which only the fixation point appeared throughout the trial. As in previous studies (Kourtzi and Kanwisher 2000b, 2001), the order of trials was counterbalanced across subjects and runs so that trials from each condition, including the fixation condition, were preceded (two trials back) equally often by trials from any of the other conditions. In each of the experiments each condition was repeated 25 times per scan (total of 125 trials across conditions per scan) and subjects were run in four scans in one scanning session.
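The trial arithmetic above can be checked directly: 300 + 100 + 300 + 2,300 ms gives the 3-s trial, and 25 repetitions of 5 conditions (4 experimental plus fixation) give 125 trials, so a scan lasts 16 + 125 × 3 + 8 = 399 s. The sketch below also counts two-back transitions in a trial sequence; the helper names are hypothetical, and treating "any of the other conditions" as all ordered pairs of conditions is an assumption.

```python
from collections import Counter

# Trial timing from the text (ms)
STIM_MS, GAP_MS, ITI_MS = 300, 100, 2300
TRIAL_MS = STIM_MS + GAP_MS + STIM_MS + ITI_MS   # 3000 ms = 3 s


def scan_duration_s(n_trials, pre_s=16, post_s=8):
    """Initial and final fixation epochs plus n_trials 3-s trials."""
    return pre_s + n_trials * TRIAL_MS / 1000 + post_s


def two_back_counts(sequence):
    """Count (condition two trials back, current condition) pairs."""
    return Counter((sequence[i - 2], sequence[i]) for i in range(2, len(sequence)))


def two_back_imbalance(sequence, conditions):
    """Max minus min two-back count over all ordered condition pairs (0 = balanced)."""
    counts = two_back_counts(sequence)
    values = [counts[(p, c)] for p in conditions for c in conditions]
    return max(values) - min(values)
```

A counterbalanced order would keep `two_back_imbalance` small when pooled across subjects and runs, as the design requires.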
EXPERIMENT 1 ("IMPLIED MOTION"). The standard event-related adaptation design was used with A = concentric Glass and B = radial Glass, resulting in four experimental conditions: 1) concentric-concentric, in which two concentric sequences were presented in a trial; 2) radial-radial, in which two radial sequences were presented in a trial; 3) concentric-radial, in which a radial sequence followed a concentric one; and 4) radial-concentric, in which a concentric sequence followed a radial one.
EXPERIMENT 2 ("REAL MOTION"). The same design as in expt 1 was used with A = real concentric motion, B = real radial motion.
EXPERIMENT 3 ("IMPLIED AND REAL MOTION INTERACTIONS"). The first stimulus in a trial was always a Glass sequence; the second was always a real motion sequence. The conditions were: 1) concentric-concentric: concentric Glass followed by concentric motion; 2) radial-radial: radial Glass followed by radial motion; 3) concentric-radial: concentric Glass followed by radial motion; 4) radial-concentric: radial Glass followed by concentric motion.
EXPERIMENT 4 ("LOCAL VS. GLOBAL IMPLIED MOTION"). The first stimulus in a trial was always a segmented Glass sequence, and we had the following four conditions: 1) random-random: segmented Glass followed by segmented Glass; 2) random-concentric: segmented Glass followed by concentric Glass; 3) random-radial: segmented Glass followed by radial Glass; 4) random-orthogonal: segmented Glass followed by orthogonal Glass.
Imaging
The experiments were recorded in a 3-Tesla Siemens scanner at the University Clinic, Tübingen, Germany. Data were collected with a head coil from 11 axial (3 × 3 × 5 mm³) slices that covered occipitotemporal regions using gradient-echo pulse sequences (localizer scans: TR = 2 s, TE = 90 ms; event-related scans: TR = 1 s, TE = 40 ms; block-design scans: TR = 2 s, TE = 90 ms).
Data analysis
The fMRI data were processed using the Brain Voyager software package. Preprocessing of all functional data included head movement correction, temporal high-pass filtering (cutoff frequency 0.0468 Hz), and removal of linear trends. The two-dimensional functional images were aligned to three-dimensional anatomical data with 1 × 1 × 1-mm resolution, and the complete data set was transformed to Talairach coordinates. Anatomical data were additionally inflated and unfolded.
Regions of interest
For each individual observer, early visual areas (V1, V2, Vp, V3, V3a, V4) were identified based on standard retinotopic mapping procedures (DeYoe et al. 1996; Engel et al. 1994; Sereno et al. 1995). The motion complex (hMT+/V5) was defined as the set of contiguous voxels in the ascending limb of the inferior temporal sulcus that showed significantly (P < 10⁻⁴, corrected) stronger activation for coherently moving (expanding, contracting) than stationary dots. The LOC was defined as the voxels in the ventral occipitotemporal cortex that showed significantly stronger activation (P < 10⁻⁴, corrected) to intact than scrambled images based on the averaged data of the two localizer scans. Area KO was defined as the set of voxels anterior to V3a and posterior to hMT+/V5 that showed significantly stronger activation (P < 10⁻⁴, corrected) for kinetic boundaries than transparent motion. All regions of interest (ROIs) are shown on a flattened representation of a single subject's cortex in Fig. 2.
For each observer, we extracted fMRI responses by averaging the data from all the voxels within each of the independently defined ROIs in the event-related scans. In each scan, we averaged the signal intensity across all the trials in each condition. We then calculated the percentage signal change for each condition in relation to the fixation baseline as described in previous studies (Kourtzi and Kanwisher 2000a, 2001). Finally, we averaged these time courses across scans and observers.
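The averaging steps above reduce to point-by-point means and a baseline normalization. The sketch below is a hedged illustration: the helper names are hypothetical, and taking the baseline as the mean signal of the fixation-condition time course is an assumption, since the exact definition is given only in the cited studies.

```python
def mean_over(time_courses):
    """Point-by-point mean of equally long time courses
    (voxels within an ROI, trials within a condition, or scans/observers)."""
    if not time_courses:
        raise ValueError("need at least one time course")
    n = len(time_courses)
    return [sum(samples) / n for samples in zip(*time_courses)]


def percent_signal_change(condition_tc, fixation_tc):
    """Percent signal change of a condition time course relative to the
    fixation baseline, taken here (an assumption) as the mean fixation signal."""
    baseline = sum(fixation_tc) / len(fixation_tc)
    return [100.0 * (v - baseline) / baseline for v in condition_tc]
```

In the order described in the text, `mean_over` would be applied first across voxels and trials, then `percent_signal_change` per condition, and finally `mean_over` again across scans and observers.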
The hemodynamic response function peaks several seconds after the onset of the stimulus (Boynton et al. 1996). To identify the peak of the fMRI responses in an ROI we fitted a Gaussian model (Kruggel et al. 1999) to the average fMRI responses for each condition across observers. This analysis showed average peak responses within the same time window (3-5 s after trial onset) across ROIs and experiments. Based on this analysis, the average fMRI response between 3 and 5 s after trial onset was taken as the measure of response magnitude for each condition in subsequent analyses. That is, all comparisons among conditions and fMRI response measures in the figures are based on this averaged signal.
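The two steps above (locating the response peak, then averaging a fixed window) can be sketched as follows. This is not the Gaussian model fit of Kruggel et al. (1999): as a simpler stand-in, it fits a Gaussian through the largest sample and its two neighbors (three-point log-parabola interpolation), which is exact for Gaussian-shaped samples at uniform spacing (here TR = 1 s, as in the event-related scans). The function names are hypothetical, and treating the 3-5 s window as inclusive is an assumption.

```python
import math


def gaussian_peak_time(times, values):
    """Peak latency from a Gaussian through the largest sample and its two
    neighbours (three-point log-parabola interpolation; uniform sampling)."""
    k = max(range(len(values)), key=values.__getitem__)
    if k == 0 or k == len(values) - 1:
        raise ValueError("maximum lies at the edge of the time course")
    y0, y1, y2 = values[k - 1], values[k], values[k + 1]
    if min(y0, y1, y2) <= 0:
        raise ValueError("samples around the peak must be positive")
    l0, l1, l2 = math.log(y0), math.log(y1), math.log(y2)
    denom = l0 - 2.0 * l1 + l2
    if denom >= 0:
        raise ValueError("samples are not peaked")
    dt = times[k + 1] - times[k]
    return times[k] + 0.5 * (l0 - l2) / denom * dt


def window_mean(times, values, start=3.0, stop=5.0):
    """Mean response in [start, stop] s after trial onset (inclusive here)."""
    window = [v for t, v in zip(times, values) if start <= t <= stop]
    if not window:
        raise ValueError("no samples in window")
    return sum(window) / len(window)
```

For example, samples of a Gaussian centered at 4.3 s taken once per second return a peak time of 4.3 s, and `window_mean` then gives the single response magnitude per condition used in all later comparisons.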
From these averaged fMRI signals we derived a measure of fMRI selectivity to stimulus changes per ROI and experiment. For instance, in an experiment with the four conditions (and corresponding fMRI signals) AA, BB, AB, and BA, the selectivity index (or rebound index) was defined as SI = [(AB + BA)/(AA + BB)] − 1. This represents the enhancement of activity obtained when two different sequences are shown compared with when two sequences of the same type are shown. This measure quantifies a rebound effect (i.e., the release from adaptation). Note that, even though we refer to SI as a selectivity index, we are aware that its relationship with true neuronal selectivity has not yet been demonstrated conclusively (see DISCUSSION). Nevertheless, it has proven to be a sensitive tool to investigate blood oxygenation level-dependent (BOLD) signal selectivity at a spatial resolution below that of the imaged voxels (Buckner et al. 1998; Grill-Spector and Malach 2001; Henson and Rugg 2003; Kourtzi and Kanwisher 2000b, 2001).
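The selectivity index defined above is a one-line computation. The sketch below uses a hypothetical function name and made-up response values purely to illustrate the arithmetic: equal responses in all four conditions give SI = 0, and different-type pairs that respond 20% more than same-type pairs give SI ≈ 0.2.

```python
def selectivity_index(aa, bb, ab, ba):
    """Rebound (selectivity) index SI = (AB + BA) / (AA + BB) - 1.

    Arguments are the response magnitudes (e.g. mean % signal change 3-5 s
    after trial onset) for the four conditions AA, BB, AB and BA.
    """
    same = aa + bb
    if same == 0:
        raise ValueError("AA + BB must be nonzero")
    return (ab + ba) / same - 1.0


# Hypothetical responses (not data from this study):
# different-type pairs respond 20% more than same-type pairs.
si = selectivity_index(aa=0.50, bb=0.50, ab=0.60, ba=0.60)  # approximately 0.2
```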
RESULTS
The human motion complex
In this section we present the analysis of the BOLD signal in the human motion complex (hMT+/V5) in some detail. This includes some of the intermediate steps in the analysis necessary to arrive at an index of selectivity for an ROI. In subsequent sections that report the results in other regions of interest we will skip these intermediate results.
Implied motion
In this experiment we presented either two Glass sequences with implied motion of the same type (e.g., concentric-concentric) or two Glass sequences with implied motion of different types (e.g., concentric-radial) (see Fig. 1B).
Figure 3A shows the fMRI time courses (averaged over all subjects) in the motion complex. At time 0 the two implied motion stimuli were presented sequentially. The BOLD signal responded with its typical delay of a few seconds and showed an undershoot after the peak of the response. Note that because of the slow time course of the BOLD response, the separate presentation of the two sequences (Fig. 1B) cannot be resolved. A comparison of the two time courses in this panel, however, shows that the response to two successive Glass sequences of the same type (concentric-concentric) was lower than the response to two successive Glass sequences of different types (concentric-radial). This was a statistically significant effect [repeated-measures ANOVA; F(1,10) = 5.06, P < 0.05]. Figure 3B shows the same analysis for the two conditions that started with a radial pattern. The BOLD response was lower when two Glass sequences of the same type (radial-radial) were shown than when two Glass sequences of different types (radial-concentric) were shown.
We interpret this as pattern-selective adaptation. That is, when a second sequence of the same type is presented, the response is reduced as a result of adaptation. When a second sequence of a different type does not show this reduction, it must have stimulated a different (nonadapted) set of neurons. Thus we infer from this so-called release from adaptation that separate subpopulations of cells in hMT+/V5 respond to radial and concentric Glass sequences (see DISCUSSION). In a control experiment described below we tested whether the local or the global differences between these categories could explain this selectivity.
To quantify the pattern selectivity we calculated an index that contrasts the BOLD signal in trials in which two sequences of the same type were presented with the BOLD signal in trials in which two sequences of different types were presented. This selectivity index (SI; see METHODS) condenses the analysis to a single number per experiment and ROI. This index is zero (reflecting no selectivity) when the response to two patterns of the same type equals the response to two patterns of different types. Index values significantly above zero correspond to the situation where the two patterns of different types evoke larger responses than two patterns of the same type; that is, a positive index reflects release from adaptation and thus pattern selectivity in the underlying population. The implied motion selectivity for the human motion complex is shown as the white bar in Fig. 3F.
Real motion
In this experiment, we presented either two real motion sequences of the same type (e.g., concentric-concentric) or two real motion sequences of different types (e.g., concentric-radial). The analysis was the same as that of the implied motion data. Figure 3D shows the bar plot of the average peak BOLD signal for the four conditions in this experiment. Here, as in the implied motion data, the BOLD signal in the conditions with two different types of motion sequences was significantly higher than that in the conditions with two motion sequences of the same type [F(1,10) = 42.76, P < 0.001]. Thus we infer that, not surprisingly and entirely consistent with single-cell findings, the motion complex has subpopulations of cells that are selective for the type of real motion.
From these average peak BOLD responses we determined the selectivity index of the human motion complex for real motion. The selectivity for real motion in hMT+/V5 was nearly 50% larger than that for implied motion and is represented by the black bar in Fig. 3F.
Interactions between real and implied motion
The analysis so far shows that hMT+/V5 has subpopulations selective for real motion and subpopulations selective for implied motion sequences. However, this does not necessarily mean that these subpopulations were the same. Experiment 3 was designed to test this hypothesis directly. If the same cells that respond to implied rotations also respond to real rotations, it should be possible to obtain pattern-selective adaptation for a concentric real motion pattern after the presentation of an implied rotation, and similarly for expansions. Thus in this experiment we presented concentric implied motion followed by concentric real motion and compared it to trials in which concentric implied motion was followed by radial real motion. We also compared radial implied motion followed by radial real motion to radial implied motion followed by concentric real motion.
Figure 3E shows the average peak BOLD responses for the four conditions. A real motion pattern after an implied motion pattern of the same type led to lower fMRI responses than a real motion pattern after an implied motion pattern of a different type [F(1,10) = 6.28, P < 0.05]. This was the case for both radial and concentric implied motion sequences. In terms of pattern-selective adaptation this suggests that the adaptation caused by an implied motion sequence affects the response to a real motion sequence, but only if it is of the same motion type (radial or concentric). This in turn suggests that the same subpopulations of cells respond (and therefore adapt) to both implied and real motion sequences of the same type.
From the activation measures in Fig. 3E we determined the selectivity index, shown as the gray bar in Fig. 3F. It measures the extent to which pattern-selective adaptation in the motion complex is invariant to changes in the sequence from implied motion to real motion. The fact that the index is nonzero shows that some of the cells selective for real motion were also selective for implied motion. If the gray bar were as tall as the black bar, that would mean that adapting with real motion had the same effect as adapting with implied motion (when tested with real motion). The fact that the selectivity index appears lower in this experiment than in the experiments in which only real motion was used suggests that not all cells selective for real motion are also selective for implied motion. From the relative sizes of the real motion selectivity index and interaction selectivity index we infer that about 45% of cells selective for real motion were also selective for implied motion.
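The "about 45%" estimate above is the ratio of the interaction index to the real-motion index. The sketch below makes that arithmetic explicit under the assumption, noted in METHODS as not yet demonstrated, that the rebound index scales linearly with the number of units adapted by the first stimulus. The function name and the index values are hypothetical and chosen only to illustrate the calculation.

```python
def shared_fraction(si_interaction, si_real):
    """Estimate the fraction of real-motion-selective units that are also
    implied-motion-selective, assuming the rebound index scales linearly
    with the number of units adapted by the first stimulus."""
    if si_real <= 0:
        raise ValueError("real-motion SI must be positive")
    return si_interaction / si_real


# Hypothetical indices chosen only to illustrate the arithmetic:
print(f"{shared_fraction(0.09, 0.20):.0%}")   # prints 45%
```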
The main points of our findings in the motion complex can be illustrated by Fig. 3F. The selectivity indices shown in this graph (together with their associated statistical tests of significance) make three points: 1) hMT+/V5 is selective for the type of implied motion (white bar), 2) hMT+/V5 is selective for the type of real motion (black bar), and 3) the selectivity for implied and real motion sequences is subserved by overlapping subpopulations (gray bar). For the other visual areas, we will present only these final steps of the analysis.
Retinotopic ventral and dorsal visual areas and the LOC
The same analysis shown in detail for hMT+/V5 in Fig. 3 was applied to the data from the other ROIs in the ventral and dorsal pathway.
Implied motion
The responses for two Glass sequences of the same type were significantly lower than the responses to two Glass sequences of different types in all early areas (V1, V2), dorsal retinotopic areas (V3, V3a, V3b/KO), as well as ventral retinotopic areas (Vp, V4) and the higher occipitotemporal area (LOC). Thus all areas showed some selectivity for these sequences; their selectivity indices are represented by the white bars in Fig. 4. Statistical analysis of selectivity gave the following results: V1: F(1,10) = 43.36, P < 0.001; V2: F(1,10) = 37.15, P < 0.001; V3: F(1,16) = 20.86, P < 0.01; V3a: F(1,16) = 30.33, P < 0.001; V3b/KO: F(1,16) = 53.76, P < 0.001; Vp: F(1,20) = 68.61, P < 0.001; V4: F(1,20) = 72.41, P < 0.001; LOC: F(1,20) = 70.21, P < 0.001.
Real motion
We also analyzed the responses to the real motion sequences (expt 2) and again found significant, albeit small, selectivity in early visual areas (V1, V2, V3), ventral areas (Vp, V4, LOC), and the dorsal stream (V3a, V3b/KO). Statistical analysis gave the following results: V1: F(1,10) = 122.12, P < 0.001; V2: F(1,10) = 97.35, P < 0.001; V3: F(1,12) = 5.79, P < 0.05; V3a: F(1,12) = 20.88, P = 0.001; V3b/KO: F(1,12) = 9.02, P = 0.01; Vp: F(1,20) = 12.76, P < 0.01; V4: F(1,20) = 6.89, P = 0.01; LOC: F(1,20) = 6.19, P < 0.05. The selectivity index per area is represented by the black bars in Fig. 4. Note that this selectivity does not necessarily imply motion selectivity; the direction of motion is one feature that distinguishes radial real motion patterns from concentric motion patterns, but each of these patterns also has a distinct structure or form. Given the known properties of cells in the ventral stream, the selectivity observed in V4 and the LOC is most likely attributable to selectivity for the form suggested by the motion, consistent with previous studies that show enhanced fMRI responses in ventral visual areas for moving compared with static forms (Grill-Spector et al. 1998; Kourtzi et al. 2002).
Interactions between real and implied motion
As in hMT+/V5, we then tested whether pattern-selective adaptation to an implied motion pattern transferred to a subsequent real motion pattern, reducing the response to it. The aim of this experiment was to determine whether the neural subpopulations selective for implied motion sequences were also selective for real motion sequences. We found (Fig. 4) that the selectivity indices in the dorsal areas (V3, V3a, V3b/KO) were typically at least twice as large as those in V1 and V2 and the ventral visual areas (Vp, V4, LOC). This suggests that the overlap between the neural populations responding to implied and real motion stimuli was much larger in the dorsal areas than in early or ventral areas.
Specifically, the BOLD response for a real motion sequence after an implied motion sequence of the same type was lower than the response to a real motion sequence after an implied motion sequence of a different type in the dorsal visual areas [V3: F(1,14) = 27.55, P < 0.001; V3a: F(1,14) = 52.07, P < 0.001; V3b/KO: F(1,14) = 2.64, P = 0.05]. However, no significant differences were observed between the fMRI responses for same or different types of sequences in V1 and V2 [F(1,10) = 1.11, P = 0.31] or ventral visual areas [Vp: F(1,20) = 1.61, P = 0.21; V4: F(1,20) <1, P = 0.85; LOC: F(1,20) = 1.68, P = 0.21].
Sensitivity to local versus global changes
Figure 4 shows that some selectivity for implied motion sequences is already present at the level of V1. This leads to two questions. First, how can area V1, with its small receptive fields, be selective for such large patterns? Second, if V1 is selective, does that mean that all selectivity in higher areas is simply inherited from V1, or can selectivity be observed in higher areas with patterns for which no selectivity is observed in V1?
The first question is easily answered; the sequences (concentric vs. radial) differ not only at the global scale, but they also have systematic differences at the scale of a V1 receptive field. In fact, if a given part of the screen contains a pair of dots oriented one way in a concentric pattern, that same part of the screen will contain the orthogonal orientation in a subsequent radial pattern. The highlighted pairs in Fig. 1A illustrate this. Thus (local) orientation selectivity in an area could be enough to lead to the selective adaptation we observed. Experiment 4 was designed to test this hypothesis (Fig. 5). We divided the screen into 64 segments and assigned a random orientation to each segment (see Fig. 5 and METHODS). In one condition, two segmented Glass sequences, with the same random orientation per segment, were shown in succession. In the other condition, a segmented Glass sequence was followed by another segmented sequence in which the orthogonal orientation was assigned to each segment. Locally (at the scale of the segments), the difference between these two sequences was a 90° orientation change, just like the transition from a concentric to a radial Glass pattern.
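The geometric claim above, that concentric and radial pairs at the same location are always orthogonal, can be checked numerically. In the sketch below (hypothetical helper names), a radial pair at polar angle θ around fixation has orientation θ, a concentric pair has orientation θ + 90°, and orientations are axial, so they are compared modulo 180°.

```python
import math


def pair_orientation_deg(kind, x, y):
    """Orientation (deg, mod 180) of a Glass pair centred at (x, y), fixation at origin."""
    polar = math.degrees(math.atan2(y, x))
    if kind == "radial":
        return polar % 180.0
    if kind == "concentric":
        return (polar + 90.0) % 180.0
    raise ValueError("kind must be 'concentric' or 'radial'")


def orientation_difference_deg(a, b):
    """Smallest difference between two axial orientations, in [0, 90]."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)
```

At any location away from fixation, `orientation_difference_deg(pair_orientation_deg("concentric", x, y), pair_orientation_deg("radial", x, y))` is 90°, which is the local change that the orthogonal segmented sequences reproduce without the global structure.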
Figure 6 shows that the selectivity index in V1 was about as large for these local changes as it was for the global changes documented in Fig. 4. In particular, fMRI responses were significantly stronger for segmented Glass sequences with orthogonal orientations than for sequences with the same orientation [V1: F(1,30) = 7.16, P = 0.01; V2: F(1,30) = 2.13, P = 0.05]. Thus it seems likely that the selectivity to the global changes of expt 1 in V1 was in fact a result of local orientation selectivity. This is in agreement with single-cell data demonstrating that the orientation signals present in Glass sequences are enough to drive orientation-selective V1 cells (Smith et al. 2002). As Fig. 6 shows, selectivity for local orientation was also found in other areas, both dorsal [V3: F(1,54) = 6.59, P = 0.01; V3a: F(1,54) = 6.27, P = 0.01; V3b/KO: F(1,54) = 2.98, P = 0.05; hMT+/V5: F(1,10) = 8.49, P < 0.01] and ventral [Vp: F(1,60) = 16.72, P < 0.01; V4: F(1,60) = 22.54, P < 0.001; LOC: F(1,60) = 12.21, P < 0.001]. It is possible that cells in areas such as hMT+/V5 picked up some of the implied motion signals at the scale of the segments, or that this reflects true local orientation tuning in these areas (Albright 1984). Nevertheless, the most parsimonious explanation of this selectivity is that it was inherited from the differential responses in V1.
This shows that higher areas in both the dorsal and the ventral stream, but not the early areas V1 and V2, were selective (as measured by the BOLD signal) for sequences that have similar local properties but distinct global structure. These findings suggest that at least part of the selectivity for the global organization of Glass sequences arises in areas beyond V1 and V2.
Control experiments and analysis
It is conceivable that attention could be engaged more during trials in which two different stimuli were presented than during trials in which the same stimulus was shown twice. Because attention is known to modulate responses in visual areas, this could then have influenced our results. To control for this possibility, subjects performed a matching task during all scans (see METHODS). This task drew the subject's attention toward the stimuli, even in trials in which the same stimulus was shown twice. A Kruskal-Wallis ANOVA on ranks of the subjects' performance on this task showed no significant difference in the percentage correct across conditions in any of the experiments (expt 1: P = 0.92; expt 3: P = 0.32; expt 4: P = 0.98; because of a software error, the behavioral responses from expt 2 could not be analyzed). The constant level of performance in these experiments indicates that a similar amount of attention was always devoted to the stimuli, regardless of condition. Moreover, it is highly unlikely that observers could selectively choose to attend to particular conditions, because trials were presented in quick succession and were randomly interleaved. However, the matching task was not particularly difficult, as shown by the high average level of performance (96% correct). This implies that some attentional capacity may have been left over for the subject to allocate differently in different conditions.
We therefore performed an additional control experiment. We repeated the implied motion experiment but instructed the subjects to detect changes in the color of the fixation point (see METHODS). This task was subjectively much more difficult than the matching task. Analysis of the behavioral data showed that the number of undetected changes in the fixation point did not vary significantly with condition [F(2,4) = 0.93, P = 0.49]. Moreover, the reaction times for correctly detected changes in the fixation spot were also not significantly modulated by the stimulus condition [F(2,4) = 2.28, P > 0.05]. These behavioral data suggest that attention was allocated similarly (to the fixation point) in all five conditions. Analysis of the fMRI data and the corresponding selectivity indices obtained in these sessions confirmed the adaptation effects that we reported above for implied motion in expt 1 across all areas [V1: F(1,3) = 58.08, P < 0.01; V2: F(1,3) = 104.97, P < 0.01; V3: F(1,3) = 16.46, P < 0.05; V3a: F(1,3) = 12.24, P < 0.05; Vp: F(1,3) = 134.04, P < 0.001; V4: F(1,3) = 119.92, P < 0.001; LOC: F(1,3) = 62.98, P < 0.001]. This control suggests that our findings were not confounded by a differential allocation of attention across conditions.
Eye movements
During the scans, observers were instructed to fixate the central fixation point. Eye movements of three subjects in each experiment were recorded (Eye-Link video-based system, 250-Hz sample rate). We compared eye position and saccades across conditions in each experiment. This analysis showed no significant differences in the average number of saccades, the vertical or horizontal eye position, or the vertical or horizontal saccade amplitude between experimental conditions and the fixation condition. This analysis was applied to all four experiments, and none of the statistical tests reached a value of P < 0.11 [F(2,8) ≤ 2.58]. This shows that the subjects were able to fixate for long periods of time and that our findings are unlikely to be confounded by differential eye movements across conditions.
Block design analysis
Region of interest analyses are sometimes criticized for preselecting areas and ignoring the rest of the brain. Our study tested very specific hypotheses about the involvement of typical motion and form areas in implied motion perception. In that context we believe the ROI approach to be appropriate. Nevertheless, to investigate whether brain areas outside the measured ROIs respond to implied motion sequences, we ran block design scans that presented real, implied, and random motion. Each scan consisted of 16-s blocks of a given visual pattern, presented in a counterbalanced order. The seven conditions were: fixation only, concentric Glass, radial Glass, segmented/random Glass, concentric motion, and radial motion. Thus all stimuli that were used in the event-related paradigms were presented here in separate blocks. In each block, 20 sequences were presented, each for 300 ms followed by a 500-ms blank. During this experiment, subjects performed the fixation point color-change detection task (five color changes of the fixation point per block).
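The block timing above is self-consistent: 20 sequences of 300 ms, each followed by a 500-ms blank, fill exactly one 16-s block. The sketch below checks that arithmetic and shows one common way to counterbalance block order, a cyclic Latin square. The text does not say which counterbalancing scheme was used, so the scheme and the function names here are assumptions for illustration.

```python
def block_duration_ms(n_sequences=20, stimulus_ms=300, blank_ms=500):
    """Block length when each sequence is shown for stimulus_ms and followed by a blank of blank_ms."""
    return n_sequences * (stimulus_ms + blank_ms)

def cyclic_orders(conditions):
    """A cyclic Latin square: scan i presents the conditions starting at condition i, so across
    len(conditions) scans every condition occupies every serial position exactly once."""
    n = len(conditions)
    return [[conditions[(start + pos) % n] for pos in range(n)] for start in range(n)]

print(block_duration_ms())  # 16000 ms, i.e. one 16-s block
for order in cyclic_orders(["fixation", "concentric Glass", "radial Glass"]):
    print(order)
```

A cyclic square balances serial position but not which condition precedes which; if carryover between successive blocks were a concern, a Williams-type design would also balance first-order sequence effects.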
Confirming our ROI analysis, we observed significantly stronger activations for Glass sequences than for segmented/random sequences (P < 0.01) in hMT+/V5, V3b/KO, and LOC. These activations did not cover the full extent of these areas as defined by independent localizers. Moreover, consistent with previous studies of motion-related areas, we also observed significantly stronger (P < 0.01) activations for real and implied than for random motion sequences anterior to hMT+/V5 (Senior et al. 2000; Zeki et al. 1993) and along the intraparietal sulcus (Claeys et al. 2003).
DISCUSSION
fMRI adaptation
The fMRI adaptation paradigm (Grill-Spector and Malach 2001) is being used in an increasing number of studies to uncover selective subpopulations of neurons at a resolution below that of the typical human fMRI voxel. The main assumption behind this paradigm is that a neuron that responds to a pattern (pattern selectivity) will respond less to the second presentation of that pattern than to the first. This assumption has considerable support in early sensory areas (e.g., V1: Movshon and Lennie 1979; MT: Petersen et al. 1985). In psychophysics (Blakemore and Campbell 1969) and functional imaging (Grill-Spector and Malach 2001), pattern-selective adaptation serves as a noninvasive measure from which neuronal selectivity is inferred. Specifically, to infer whether a population can distinguish pattern A from pattern B, one presents two identical patterns successively (AA as well as BB) and measures whether this leads to a smaller response than presenting two different patterns successively (AB and BA). If both the AA and BB responses are smaller than the AB and BA responses, then there must be two separate mechanisms that respond and adapt to A and B.
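The inference rule described above can be written down directly. The sketch below encodes the AA/BB versus AB/BA criterion and adds one possible normalized index of release from adaptation. The index formula and the example response values are hypothetical; the selectivity index actually used in this study is defined in METHODS and may be normalized differently.

```python
def distinguishes(r_aa, r_bb, r_ab, r_ba):
    """Inference rule stated in the text: both repeated pairs (AA, BB) must evoke
    smaller responses than both alternating pairs (AB, BA)."""
    return max(r_aa, r_bb) < min(r_ab, r_ba)

def release_from_adaptation(r_aa, r_bb, r_ab, r_ba):
    """Hypothetical normalized index: (different - same) / (different + same).
    Zero means no pattern-selective adaptation; positive values mean the response
    recovers when the second pattern differs from the first."""
    same = (r_aa + r_bb) / 2
    different = (r_ab + r_ba) / 2
    return (different - same) / (different + same)

# Hypothetical percent-signal-change values for one region of interest.
responses = dict(r_aa=0.8, r_bb=0.9, r_ab=1.1, r_ba=1.0)
print(distinguishes(**responses), round(release_from_adaptation(**responses), 3))  # True 0.105
```

The boolean rule mirrors the text's all-or-none criterion, whereas the graded index is the kind of quantity that can be averaged across subjects and compared across areas.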
The validity of adaptation fMRI has not yet been confirmed by detailed comparisons of adaptation effects at the neuronal and imaging levels. Initial reports (Sawamura et al. 2004), however, suggest that although successive presentation of identical stimuli does typically reduce responses, there are more complicated sequence effects. Some of these properties may be area specific, as witnessed by the qualitatively different adaptation effects found in, for instance, area MT compared with area V1 (Kohn and Movshon 2004; B Krekelberg, RJ Van Wezel, and TD Albright, unpublished observations). Moreover, there are still many uncertain steps in relating even neuronal activity to the BOLD response (Logothetis and Wandell 2004). Thus even if a reduced BOLD response is observed in an adaptation paradigm, the underlying mechanism need not be neuronal pattern-selective adaptation.
These caveats notwithstanding, adaptation fMRI has been successful in various contexts in which the results could be verified, at least indirectly, with intracortical recordings. Examples are orientation selectivity in visual areas (Boynton and Finney 2003; Kourtzi et al. 2003; Tootell et al. 1998) and motion selectivity in area MT (Huettel et al. 2004; Huk et al. 2001, 2002; Tolias et al. 2001). Our study is another example: monkey single-cell recordings showed implied motion selectivity in MT (Krekelberg et al. 2003), and our adaptation fMRI revealed selectivity in the human motion complex. However, although some studies, including the current one, could demonstrate orientation selectivity in V1 with adaptation techniques (Kourtzi et al. 2003; Tootell et al. 1998), others could not (Boynton and Finney 2003). These discrepancies may be attributable to differences in the stimuli (Boynton and Finney used low spatial frequency stimuli that were nonoptimal for typical V1 cells), but they also point to possible area differences in adaptation and in the effect it has on the BOLD signal. Thus until the adaptation fMRI paradigm has been more fully validated with intracortical recordings, statements regarding underlying neuronal selectivity made on the basis of imaging data alone should be treated with caution. Imaging data, however, can certainly be suggestive of and consistent with neuronal selectivity. Treated as one piece of evidence in favor of such selectivity, rather than as the final and conclusive answer, they can be highly valuable.
Glass patterns and implied motion
The percept of motion in a sequence of Glass patterns is so convincing that many observers find it difficult to believe that the motion energy in these sequences is just as balanced as the motion energy in a sequence of random-dot patterns. The APPENDIX provides a mathematical proof that, in both random-dot sequences and Glass sequences, the average motion energy spectrum is symmetric: the motion energy in any direction is on average matched by the motion energy in the opposite direction.
Motion energy detectors rely on asymmetries in the stimulus power spectrum (Adelson and Bergen 1985). Because the average power spectrum of a Glass pattern sequence is symmetric, a motion energy detector will not assign a consistent direction of motion to it. On the basis of the motion energy distribution alone, such a stimulus is therefore not expected to lead to a consistent, coherent motion percept. This is why we refer to these balanced motion energy patterns as containing no coherent motion. When we speak of implied motion in Glass patterns, we refer to the percept of coherent motion generated in the absence of coherent motion signals. By contrast, when we speak of real motion, we refer to the percept generated by stimuli whose average power spectrum is asymmetric and that therefore contain unambiguous coherent motion signals.
The theorem in the APPENDIX, however, applies to the average motion energy across all Glass pattern sequences of a particular type. Because the dots are placed stochastically, the motion energy at any given time can be larger in one direction than in the opposite one, just as in random-dot sequences. This leaves open the possibility that an individual Glass sequence does contain an asymmetry and that this asymmetry is detected by a simple motion energy detector. However, if these accidental coherent motion components were the underlying cause of the perceived direction of motion, one would expect a reversal of the sequence to reverse the perceived direction of motion. Krekelberg et al. (2003) demonstrated that this prediction is not borne out by the data. We conclude that the implied motion percept is not generated by stochastic fluctuations in the motion energy of the stimulus.
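Both points, the balance of the average spectrum and the sign flip of a single sequence's chance asymmetry under time reversal, can be illustrated with a simplified simulation. The sketch below assumes a periodic one-dimensional screen and a translational Glass sequence (concentric and radial patterns are not simulated), and it summarizes direction by the spectral power in the rightward versus leftward quadrants of the space-time Fourier plane. This is a toy stand-in for a motion energy analysis, not the model of Adelson and Bergen; all names and parameters are hypothetical.

```python
import numpy as np

def directional_index(stim):
    """(R - L) / (R + L) for a space-time stimulus of shape (n_frames, n_pixels)
    on a periodic 1-D screen, where R and L are the spectral power in the
    quadrants of rightward and leftward motion.

    With numpy's FFT sign convention, a pattern drifting toward larger pixel
    indices puts its power where k * w < 0. Bins on either axis and at the
    Nyquist frequency carry no direction and are ignored.
    """
    power = np.abs(np.fft.fft2(stim)) ** 2
    w = np.fft.fftfreq(stim.shape[0])[:, None]  # temporal frequency (cycles/frame)
    k = np.fft.fftfreq(stim.shape[1])[None, :]  # spatial frequency (cycles/pixel)
    valid = (np.abs(w) < 0.5) & (np.abs(k) < 0.5)
    right = power[valid & (k * w < 0)].sum()
    left = power[valid & (k * w > 0)].sum()
    return (right - left) / (right + left)

def random_dots(rng, n_frames=64, n_pixels=64, density=0.1):
    """A new random-dot pattern in every frame: no space-time correlations."""
    return (rng.random((n_frames, n_pixels)) < density).astype(float)

def glass_sequence(rng, shift=2, **kwargs):
    """1-D translational Glass sequence G = P + S(P): each frame's dots plus a copy shifted by `shift` pixels."""
    p = random_dots(rng, **kwargs)
    return p + np.roll(p, shift, axis=1)

def drifting_dots(rng, speed=1, n=64, density=0.2):
    """Real motion: one dot pattern displaced by `speed` pixels per frame (n frames, n pixels).
    Keeping n_frames == n_pixels and |speed| == 1 avoids spectral leakage and temporal aliasing."""
    base = (rng.random(n) < density).astype(float)
    return np.stack([np.roll(base, speed * t) for t in range(n)])

rng = np.random.default_rng(0)
trials = 200
print(np.mean([directional_index(random_dots(rng)) for _ in range(trials)]))    # close to 0
print(np.mean([directional_index(glass_sequence(rng)) for _ in range(trials)]))  # close to 0
print(directional_index(drifting_dots(rng)))                                      # close to +1

one = glass_sequence(rng)  # a single Glass sequence has a small, random asymmetry...
print(directional_index(one), directional_index(one[::-1]))  # ...which time reversal negates
```

Reversing the frame order maps the power at temporal frequency w onto -w, which swaps the rightward and leftward quadrants exactly. A percept driven by such chance asymmetries should therefore reverse with the sequence, which is the prediction that was not borne out.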
Some stage of processing beyond motion energy extraction is therefore required to explain why Glass patterns appear to move coherently. In this context it is instructive to note that some stimuli with balanced motion energy fail to evoke a coherent motion percept. In a random-dot sequence, for instance, the balanced motion energy does not lead to the percept of a globally coherent direction of motion. Instead, it results in a percept of directed motion that changes rapidly over time and is presumably driven by the stochastic fluctuations in dot placement. In the Glass pattern sequences, on the other hand, a globally coherent motion percept is evoked that alternates between only two directions over time (e.g., clockwise and counterclockwise for concentric Glass patterns). This perceptual difference is the phenomenon whose neural mechanisms we wish to understand. The difference between random sequences and Glass sequences is one of form: both global form (concentric patterns) and local form (oriented elements). Thus in this descriptive sense, the form of the Glass patterns generates the implied motion. Whether neural mechanisms make explicit use of this form information to enhance motion processing (Geisler 1999) is a topic of ongoing investigation.
Implied motion selectivity in the human motion complex
A recent imaging study reported no selectivity for Glass sequences in hMT+/V5 (Wade et al. 2004). We believe this discrepancy arose primarily from two factors. First, the authors used a restrictive definition of hMT+/V5: they chose voxels that responded better to coherent motion than to random motion. Many MT cells in the macaque, however, respond vigorously to random motion (Churan and Ilg 2002; Krekelberg and Albright 2005; Thiele et al. 2000). Our definition of hMT+/V5 included these cells. Second, the block design of Wade et al. looked for voxels that responded better to Glass sequences than to random motion. However, if cells responsive to random motion and cells responsive to implied motion are spatially interleaved, no difference in activation between the blocks would be expected. The adaptation technique, on the other hand, can resolve selectivity at a spatial resolution below that of a single voxel. Thus the findings of Wade et al. do not contradict ours: at the scale of MRI voxels there is no part of hMT+/V5 specialized for the detection of implied motion, but our study shows that hMT+/V5 does contain spatially intermingled subpopulations of neurons selective for implied motion.
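The argument about spatially interleaved cells can be made concrete with a toy voxel. The sketch below assumes two equally numerous subpopulations within one voxel, one preferring Glass sequences and one preferring random motion, with hypothetical drive values and a simple multiplicative form of stimulus-specific adaptation. None of these numbers comes from the data; the point is only that a block contrast can be zero while an adaptation contrast is not.

```python
import numpy as np

GLASS, RANDOM = 0, 1  # stimulus indices (hypothetical labels)

# Hypothetical drive of each subpopulation (rows) by each stimulus (columns).
DRIVE = np.array([[1.0, 0.5],   # subpopulation that prefers Glass (implied motion) sequences
                  [0.5, 1.0]])  # subpopulation that prefers random motion
# Both subpopulations are intermingled in equal numbers within a single voxel.
WEIGHTS = np.array([0.5, 0.5])

def block_response(stim):
    """Voxel response in a block design, ignoring adaptation: a weighted sum of drives."""
    return float(WEIGHTS @ DRIVE[:, stim])

def pair_response(first, second, alpha=0.5):
    """Voxel response to two successive stimuli. Each subpopulation's response to the
    second stimulus is scaled down by alpha times its response to the first
    (a simple, assumed form of stimulus-specific adaptation)."""
    r1 = DRIVE[:, first]
    r2 = DRIVE[:, second] * (1.0 - alpha * r1)
    return float(WEIGHTS @ (r1 + r2))

# Block design: Glass and random blocks drive the voxel equally (both 0.75).
print(block_response(GLASS), block_response(RANDOM))

# Adaptation design: repeated pairs evoke less than alternating pairs.
same = (pair_response(GLASS, GLASS) + pair_response(RANDOM, RANDOM)) / 2
different = (pair_response(GLASS, RANDOM) + pair_response(RANDOM, GLASS)) / 2
print(same, different)  # 1.1875 1.25
```

If the voxel instead held a single population driven equally by both stimuli, repeated and alternating pairs would produce identical responses, so a nonzero adaptation contrast points to distinct subpopulations even when the block contrast is null.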
Mechanisms for implied motion perception
Our data confirm and extend the findings of Krekelberg et al. (2003), who showed that a subpopulation of cells in the macaque superior temporal sulcus (MT and MST) responds to implied motion as if it were real motion. These cells responded best to the Glass patterns that evoked the strongest sense of motion, and their selectivity for real motion carried over to implied motion. This generalization suggests that these cells extract motion signals independently of whether a real or an implied motion cue delivers the signal. This cue invariance was also evident in the imaging data obtained from dorsal motion areas, but not in those from ventral areas.
The implied motion selectivity in the dorsal stream could arise from direct, judicious sampling of neurons in the early visual areas (V1, V2). Both the single-cell data (Smith et al. 2002) and our imaging data show that these areas contain the necessary local orientation information. Moreover, in MT a subset of cells responds to oriented features that are aligned along the preferred direction of motion of the cell (Albright 1984; Maunsell and Van Essen 1983). The oriented form features of Glass patterns could activate these cells and thereby signal motion along those oriented features. Consistent with the percept of motion in plaids, these type II MT cells respond to plaid stimuli containing two directions of motion as if they contained a single direction of motion (Rodman and Albright 1989). Thus they are promising candidates for the computation needed to extract coherent motion from the multiple balanced motion signals in Glass patterns. At the same time, however, it is important to note that the majority of MT cells (nearly 80%) respond preferentially to oriented features orthogonal to their preferred direction of motion (Albright 1984). Activation of these cells by Glass patterns would signal motion orthogonal to the motion implied by the pattern. Thus even though some MT cells are expected to respond simply to the oriented features in the Glass patterns, this by itself does not explain why motion is perceived along, and not orthogonal to, the oriented features.
Alternatively, implied motion selectivity in dorsal areas could arise from interactions with early form areas (V4, LOC). Our data support this view in that V4 and LOC contain subpopulations selective for the global structure in these patterns. Such an explicit use of orientation information could improve motion processing at high velocities (Geisler 1999). The temporal resolution of current functional imaging is not high enough to test whether selective responses in form areas precede those in the motion areas; thus we cannot resolve whether implied motion selectivity arises directly from V1 and V2 or through an interaction with V4 and the LOC. Our study does, however, provide guidance for electrode placement in intracortical recordings that could address this issue. Area V4 and the inferotemporal cortex (IT) of the monkey have form selectivity comparable to that of V4 and the LOC in humans (Denys et al. 2004; Gallant et al. 2000; Kobatake and Tanaka 1994). Using simultaneous single-cell recordings from MT and from areas that respond to complex form (such as V4 or IT), it should be possible to test the hypothesis that an interaction between form and motion areas underlies our perception of implied motion. It would be particularly interesting to determine whether the same interaction also underlies the perception of motion in the more cognitive implied motion images, such as a cup about to fall from a table (Kourtzi and Kanwisher 2000a; Senior et al. 2000).
Mechanisms for global form perception
We concentrated on the implied motion percept generated by Glass pattern sequences. These patterns, however, also generate a strong perception of global form. In fact, most of the work on Glass patterns has concentrated on how the local elements that carry the orientation information are bound together to generate the global form percept. Behavioral (Wilson and Wilkinson 1998) and recent ERP studies (Pei et al. 2005) show that the detection of structure in radial or concentric Glass patterns involves pooling of local information beyond the scale of typical V1 receptive fields. A case study by Gallant et al. (2000) strongly implicated V4, because a lesion involving V4 significantly disrupted a patient's ability to detect the global structure in Glass patterns. Single-cell studies in monkeys also point to areas beyond V1, because selectivity for so-called non-Cartesian gratings and complex object features arises in V4 (Gallant et al. 1993, 1996; Kobatake and Tanaka 1994). Moreover, a recent psychophysical study (Clifford and Weston 2005) provides evidence that adaptation to the global structure of Glass patterns also has two components, one likely to originate in the local orientation detectors of V1, the other likely to originate in cells with much larger receptive fields and therefore presumably in extrastriate areas. Our data are compatible with this view. Local orientation selectivity (even for the noisy oriented elements in a Glass pattern) was evident in V1, but selectivity for the global organization of a Glass pattern was observed only in later ventral areas (V4, LOC). In agreement with previous models, this suggests that global form selectivity starts to arise in V4 from appropriate pooling of V1 orientation detectors.
Interestingly, ventral areas were also selective for the structure of global motion patterns. Previous work has also documented this overlap of sensitivity for motion and form (Braddick et al. 2000; Denys et al. 2004). Our data suggest, however, that the selectivity for real motion is largely carried by a different subpopulation of cells than the one selective for the structure of Glass patterns. In other words, whereas the representation of motion information in the human motion complex has a significant degree of invariance with respect to implied and real motion cues, the representation of the form of those same patterns in V4 and LOC is not cue invariant. The reason may be that, whereas motion processing is essentially complete once the velocity is estimated, form information may have to be processed in much more detail to extract further information about the objects that generate it. For such a detailed analysis, an early stage of cue-invariant processing would be detrimental.
APPENDIX
). The power spectrum of P is written as F(P)F*(P), where * denotes the complex conjugate. Because dot patterns are chosen randomly in each frame of the sequence, there are, on average, no space-time correlations in P. Averaged over time, each direction of motion is equally likely to occur, and the motion signals are balanced. In Fourier space, the balance in directional signals results in a stimulus power spectrum that is symmetric around the ω-axis in (k, ω) space. We refer to such a stimulus as one that has no coherent motion energy. (By contrast, a stimulus whose power spectrum is consistently oriented in (k, ω) space is referred to as a real motion stimulus.)
To construct a Glass pattern sequence from a sequence of random-dot patterns, each element in the original sequence P is shifted along some one-dimensional coordinate. For translational Glass patterns this is a simple spatial translation; for concentric and radial Glass patterns it is a translation in polar coordinates (a rotation or an expansion, respectively). We represent this translation/rotation/expansion operation by the operator S, which shifts a pattern by an amount s. With this notation, a Glass pattern sequence G is simply the sum of a random-dot pattern sequence and that same sequence after the operation S, that is, G = P + S(P).
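For the simplest, translational case, the balance argument can be sketched with the Fourier shift theorem. The notation below is assumed for this sketch: x is position, t is time, k is spatial frequency, ω is temporal frequency, s is the spatial shift that S applies in every frame, and F uses the kernel exp[-i(k·x + ωt)]. It is not necessarily the derivation given in the full APPENDIX; for concentric and radial patterns the shift acts along a polar coordinate, where the Cartesian shift theorem does not apply directly.

```latex
\begin{align}
G(\mathbf{x},t) &= P(\mathbf{x},t) + P(\mathbf{x}-\mathbf{s},t), \\
F(G)(\mathbf{k},\omega) &= F(P)(\mathbf{k},\omega)\,\bigl(1 + e^{-i\,\mathbf{k}\cdot\mathbf{s}}\bigr), \\
F(G)\,F^{*}(G) &= F(P)\,F^{*}(P)\,\bigl|1 + e^{-i\,\mathbf{k}\cdot\mathbf{s}}\bigr|^{2}
  = 2\bigl[1 + \cos(\mathbf{k}\cdot\mathbf{s})\bigr]\,F(P)\,F^{*}(P).
\end{align}
```

The factor 2[1 + cos(k·s)] does not depend on ω and is unchanged when k is replaced by −k, so it scales the power at (k, ω) and at (−k, ω), which carry opposite directions of motion, by the same amount. If the average power spectrum of P is balanced between opposite directions, the average power spectrum of G is therefore balanced as well.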