We examined the role of temporal synchrony—the simultaneous appearance of visual features—in the perceptual and neural processes underlying object persistence. When a binding cue (such as color or motion) momentarily exposes an object from a background of similar elements, viewers remain aware of the object for several seconds before it perceptually fades into the background, a phenomenon known as object persistence. We showed that persistence from temporal stimulus synchrony, like that arising from motion and color, is associated with activation in the lateral occipital (LO) area, as measured by functional magnetic resonance imaging. We also compared the distribution of occipital cortex activity related to persistence to that of iconic visual memory. Although activation related to iconic memory was largely confined to LO, activation related to object persistence was present across V1 to LO, peaking in V3 and V4, regardless of the binding cue (temporal synchrony, motion, or color). Although persistence from motion cues was not associated with higher activation in the MT+ motion complex, persistence from color cues was associated with increased activation in V4. Taken together, these results demonstrate that although persistence is a form of visual memory, it relies on neural mechanisms different from those of iconic memory. That is, persistence not only activates LO in a cue-independent manner, it also recruits visual areas that may be necessary to maintain binding between object elements.
The importance of motion in figure–ground segregation was elegantly illustrated by Regan (2000). He showed that the outline of a bird, drawn in line segments, is invisible when placed over similar line segments forming a background. Strikingly, the bird becomes visible the moment it is moved relative to the background. Regan also used this to elicit another important percept—when motion stops, the bird does not disappear immediately into the background but takes a few seconds to do so. The percept is a form of short-term visual memory that, after the binding cue is removed, maintains a link between the elements that belong to the object and segregates them from those of the background. This percept, which we call persistence, is different from the low-level sensory representations of iconic visual memory (Di Lollo 1980; Hollingworth et al. 2005; Sperling 1960). The latter is the memory for the object elements themselves and is much shorter in duration—just fractions of a second (Tamura and Tanaka 2001). Determining where and how the binding of elements disintegrates (persistence) can be as important as determining the process by which this binding is formed.
The fact that persistence takes place over a few seconds is important because it allowed us to use functional magnetic resonance imaging (fMRI) to examine the process. With a high temporal sampling rate (500 ms/volume) we were the first to measure the time courses of brain activation during persistence to identify its neural substrates (Ferber et al. 2003). Perceptual persistence, relative to a control condition in which no persistence was perceived, was associated with increased activation in the lateral occipital (LO) area, which is part of the larger lateral occipital complex (LOC), an important intermediate processing level in object perception and recognition (Grill-Spector et al. 1998a,b, 2001; Kourtzi and Kanwisher 2001; Malach et al. 2002). Surprisingly, no activation related to motion-defined persistence was observed in the motion-selective middle temporal complex (MT+). Motion is only one of many binding cues that can be used in figure–ground segregation (Treisman and Kanwisher 1998; von der Malsburg 1995). Color is another binding cue that leads to both persistence and associated activation increases in LO (Large et al. 2005).
One goal in this study was to determine whether temporal synchrony also elicits activation that is related to persistence within LO. There is some debate whether the human visual system can use temporal synchrony for the perceptual grouping of image regions into unified objects (Farid 2002). Indeed, fMRI studies (Caplovitz et al. 2007) suggest that temporally synchronous displays elicit no greater activation of LO than asynchronous displays. Whereas our past research leads to the prediction that temporal synchrony, like the other binding cues of motion (Ferber et al. 2003) and color (Large et al. 2005), should lead to increased LO activation during persistence, the results of Caplovitz and colleagues (2007) suggest that LO may not be sensitive to temporal synchrony or persistence arising from it. To investigate this question, we generated stimuli in which segmented lines forming the outline of an object appeared simultaneously atop a preexisting background of scrambled lines. We found that, as with the motion cue used by Regan (2000), the simultaneous onset of object segments leads to a percept of the object that persists for several seconds after the appearance of the binding cue.
A second goal of this study was to examine the role of additional visual areas in the phenomenon of persistence versus iconic memory. Our previous studies used a high temporal resolution to measure persistence duration and thus we were able to sample only a small number of slices that always included LO but did not entirely sample earlier visual areas. Although LO activation was reliably prolonged during visual persistence, it is possible that structures preceding LO in the visual cortical hierarchy are the origin of this signal. To examine this possibility, we performed a second experiment using a technique developed by Mukamel et al. (2004) to study the neural substrates of iconic memory, the short-term storage of visual information lasting several tenths of a second (Sperling 1960). This technique examined activation over a 12-s period during which an object flashed for 250 ms at a frequency of 4 versus 1 Hz. The rationale was that in areas without an iconic memory trace, activation would be proportional to the repetition rate. Thus the 1-Hz rate should show considerably less activation than that of the 4-Hz rate. In contrast, for areas with an iconic memory trace, the representation of the object would be maintained even when it was not physically present and thus the activation levels should be similar at 1 and 4 Hz: the longer the memory trace, the less the difference. Mukamel et al. found that the drop in activation for 1 versus 4 Hz was smaller in area LOC than that in areas V1 and V2, suggesting that LOC may subserve iconic memory. By the same logic, persistence, which lasts several seconds rather than a fraction of a second, should show an even smaller drop in activation between 1 and 4 Hz in LOC and perhaps also in earlier areas. This indeed proved to be the case and we were able to show that persistence originated at earlier stages than previously thought.
All subjects were normal healthy young adults with normal or corrected-to-normal vision and provided informed consent, as approved by the Health Sciences Review Ethics Board at the University of Western Ontario. The number of subjects in each experiment is noted in the figure captions.
While lying in the magnet, subjects viewed visual stimuli displayed (model VT540 liquid crystal display projector, NEC Display Solutions, Itasca, IL) on a rear-projection screen placed 20 cm from the eyes. The visual stimuli consisted of an object made up of segmented lines that was displayed over a background composed of similar segmented lines subtending 17° × 16° of visual angle (see Fig. 1).
Experiments were performed in a 4.0-Tesla Varian-Siemens (Palo Alto, CA; Erlangen, Germany) whole body imaging system. A 15.5 × 11.5-cm quadrature radiofrequency surface coil placed at the occipital pole was used to attain a high signal-to-noise ratio in visual areas. Functional data based on the blood oxygenation level–dependent (BOLD) signal were aligned to high-resolution inversion-prepared three-dimensional T1-weighted anatomical images of the brain collected immediately after the functional images using the same in-plane field of view (FOV). The parameters for the spiral sequence anatomicals were: 96 slices; FOV 19.2 × 19.2 cm; in-plane pixel size = 0.75 × 0.75 mm; slice thickness = 1.25 mm; time to echo (TE) = 3 ms; time to repeat (TR) = 50 ms; inversion time (TI) = 1,300 ms. The parameters for functional T2*-weighted scans varied across the different types of scans, as subsequently detailed. Data were analyzed with Brain Voyager QX v. 1.9 (Brain Innovation, Maastricht, The Netherlands).
For each of the two experiments (see following text), we used a region of interest (ROI) approach in which we first conducted localizer scans, often on a different day, at standard temporal resolution to identify areas LO and MT+. To localize LO, we contrasted the activation for intact images versus their scrambled counterparts. In addition, to identify MT+, which lies immediately anterior to LO and is a useful aid for localizing LO, we contrasted the activation for moving versus stationary lines. For more details on the localizer paradigms, see Large et al. (2005). For LO and MT+ localizer scans, 11 oblique slices × 5-mm thickness with 3 × 3-mm in-plane resolution were sampled using a 2,000-ms volume acquisition time (based on 2 segments/plane × TR = 1,000 ms, spiral imaging; TE = 15 ms; flip angle [FA] = 45°). Slices were oriented parallel to the calcarine sulcus to provide a large volume that would be certain to contain LO.
As well, we mapped retinotopic regions using the horizontal and vertical meridians to identify the early visual areas V1, V2, V3, and V4 (for details see Large et al. 2005). We presented 16-s movie clips that played within wedges in one of four orientations: up, down, left, and right. By contrasting the up and down versus the left and right orientations, we were able to distinguish the meridians that divide the early visual areas.
Experiment 1: does LO show persistence for temporal synchrony cues?
The first experiment sampled the slow event-related time course for persistence from temporal synchrony with a high temporal sampling rate, as in earlier investigations of persistence from motion and color. To attain this high temporal resolution, the number of slices was limited. Once the localizer runs had been used to identify the slices containing MT+ and LO, experimental scans examined a subset of five contiguous slices with a volume acquisition time of 0.5 s (based on 2 segments/plane × TR = 250 ms; spiral imaging; FOV 19.2 × 19.2 cm; in-plane pixel size 3 × 3 mm; TE = 15 ms; FA = 45°; slice thickness = 5 mm).
We presented stimuli in three different forms of temporal synchrony (Fig. 1): two conditions that could elicit persistence, the Persistence conditions, and one that did not, the Vanish condition. In the Flash On Persistence condition (Fig. 1B), an object consisting of segmented lines was displayed synchronously on top of a preexisting background composed of similar segmented lines. In the Flash Off–On Persistence condition (Fig. 1C), the object was initially camouflaged in the background. The object then flashed off for 200 ms and then on again, making it visible and briefly persistent. Finally, in the Flash On–Off Vanish condition, the object elements were flashed on and then off after 200 ms. This elicited iconic visual memory but no persistence. Four runs consisting of 14 counterbalanced trials, each 24 s in duration, were presented. Each run contained four trials of each of the three different conditions. In addition, there were two control fixation periods, with a background but no object (Fig. 1A), at the beginning and end of each run, and the average of these was used as a baseline for computing percentage signal change. In each trial, subjects were required to press a button with their dominant hand after the object percept disappeared. This was a reasonably difficult task. In a block of 12 instances, the animal faced right only once. Subjects were not able to correctly identify this right-facing instance about 50% of the time.
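The baseline described above, the average of the two fixation periods at the beginning and end of each run, can be turned into a percentage signal change as in the following minimal Python sketch. The function name, argument names, and sample values are illustrative assumptions and not the authors' analysis code (the data were analyzed in Brain Voyager QX).

```python
from statistics import mean

def percent_signal_change(signal, fixation_start, fixation_end):
    """Express raw signal samples as % change from the fixation baseline.

    Assumption: the baseline is the mean of the two fixation-period means.
    """
    baseline = (mean(fixation_start) + mean(fixation_end)) / 2
    return [100.0 * (s - baseline) / baseline for s in signal]

# Made-up raw values in arbitrary scanner units (baseline works out to 1000)
psc = percent_signal_change([1010.0, 1020.0], [1000.0, 1002.0], [998.0, 1000.0])
print(psc)  # [1.0, 2.0]
```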
Experiment 2: do persistence and iconic memory have different neural substrates?
The second experiment used standard temporal resolution with a block design to examine activation for persistence and iconic memory across a wider range of visual areas. Imaging parameters were the same as those used during the localizer scans.
The three conditions used are illustrated in Fig. 2. Each began at time 0 ms with a change in the background (Fig. 2A). In the Persistence condition, the segmented lines of an object were added at 100 ms and remained on for the duration of the trial (Fig. 2B). The Vanish condition was the same except that the object appeared for only 50 ms (Fig. 2C). In this condition, once the stimulus vanishes, it should nevertheless be maintained in iconic memory for a fraction of a second. The Mask condition was the same as the Vanish condition, except that a different background masked the object as soon as it vanished (i.e., at the 150-ms mark) (Fig. 2D). The replacement of the background served as a backward mask that disrupted iconic memory. The condition consisting of a new background with no object served as the 0% baseline control (Fig. 2A). The data for each area and subject were normalized such that the 4-Hz Mask condition activation was defined as either 100% or 1; the other activations were scaled accordingly.
The three conditions were presented at a rate of 1 or 4 Hz in randomized blocks of 12-s duration. To maintain attention, subjects were required to press a button when they perceived a rightward-facing object presented at random. Each subject performed six runs, with each run consisting of 26 randomized blocks including fixation control periods at the beginning and end of each run.
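As a quick check on the timing of the two designs, the sketch below computes the volume acquisition time (segments per plane times TR) and the resulting number of volumes per run. It assumes that every trial or block in a run, including the fixation periods, has the same duration, which the text does not state explicitly; the function names are hypothetical.

```python
def volume_time_s(segments_per_plane, tr_s):
    """Volume acquisition time in seconds: segments per plane x TR."""
    return segments_per_plane * tr_s

def volumes_per_run(n_blocks, block_s, segments_per_plane, tr_s):
    """Number of volumes in a run of n_blocks equal-length blocks (assumed)."""
    run_s = n_blocks * block_s
    return round(run_s / volume_time_s(segments_per_plane, tr_s))

# Experiment 1: 14 trials x 24 s, 2 segments x 250 ms = 0.5 s per volume
exp1 = volumes_per_run(14, 24, 2, 0.25)
# Experiment 2: 26 blocks x 12 s, 2 segments x 1,000 ms = 2 s per volume
exp2 = volumes_per_run(26, 12, 2, 1.0)
print(exp1, exp2)  # 672 156
```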
We compared the Persistence condition at 1 Hz for three different binding cues: temporal synchrony, color, and motion. For each condition the background and object rotated ±15°, with a period of 2.5 s to avoid afterimages and retinotopic adaptation. All three cues were given 100 ms after the background was presented. For temporal synchrony cues, the object lines flashed on and remained on. For color cues, black object lines were embedded in the background at the beginning of the trial and changed to red for 50 ms, to distinguish the object from the black background lines. For motion cues, the object lines were again initially embedded in the background and then rotated at a higher velocity than that of the background for 50 ms. Each of these conditions was compared with the Mask condition as before.
To determine whether a neural correlate of persistence would be observed in LO when an object was cued by temporal synchrony, as shown in Fig. 1, we examined the time courses in LO for the two persistence conditions and the control condition (Fig. 3).
The three traces in Fig. 3 show the activation in LO for the three conditions measured from individually defined LO ROIs and then averaged across our eight subjects. Note that following the onset of the transient (time = 0 s), activation rises for all three conditions, peaking after 6–8 s, with a greater increase for the two Persistence conditions in which the object is perceived. Importantly, after this peak (time = 9–12 s), the activation for the Vanish condition, in which no persistence occurred, rapidly returns to baseline, whereas the activation in the two Persistence conditions remains high.
To determine, in a more quantitative manner, whether persistence did indeed produce larger late responses during the percept of longer persistence, we averaged, as did Large et al. (2005), the percentage signal change over the last 3 s of the persistence period (i.e., 9 to 12 s after the onset of the binding cue, as indicated in gray in Fig. 3). This period was chosen for the following reason. Given that behavioral persistence lasted about 5 s (Fig. 4) and that Fig. 3 shows that it takes about 6 s until the fMRI response peaks, we wanted to measure a response as late as possible. The last 3 s seemed reasonable.
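The late-window measure described above can be sketched as follows, assuming a time course of percentage signal change sampled every 0.5 s with the first sample at cue onset. Whether the window endpoints are inclusive is not stated in the text; this sketch treats the window as half-open (9 s up to, but not including, 12 s), and the function name is hypothetical.

```python
def late_window_mean(psc, volume_s=0.5, start_s=9.0, end_s=12.0):
    """Mean % signal change over samples with start_s <= t < end_s.

    Assumption: sample i was acquired at time i * volume_s after cue onset.
    """
    window = [v for i, v in enumerate(psc) if start_s <= i * volume_s < end_s]
    if not window:
        raise ValueError("time course does not reach the requested window")
    return sum(window) / len(window)

# Synthetic 24-s time course whose value equals its sample index
timecourse = [float(i) for i in range(48)]
print(late_window_mean(timecourse))  # samples 18..23 -> 20.5
```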
Figure 4 plots this percentage signal change against the behavioral measures of the duration in each of the three conditions for each subject and the group average. The percentage signal change responses of the two Persistence conditions are significantly larger than the responses to the Vanish condition [Vanish vs. Flash On Persistence, t(1,7) = 3.2, P = 0.015]. Although the activation during the Flash On Persistence condition appears larger than the Flash Off–On condition, this difference did not reach statistical significance [t(1,7) = 2, P = 0.08].
The behavioral measures of the durations for these three conditions show a similar pattern. In this case, however, the durations in the two Persistence conditions are significantly different [Vanish vs. Flash On Persistence, t(1,7) = 5.7, P < 0.001 and Flash On Persistence vs. Flash Off–On Persistence, t(1,7) = 3.42, P = 0.01].
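The comparisons above involve eight subjects measured in each condition, so the reported statistics with 7 degrees of freedom are presumably within-subject (paired) t-tests. A minimal sketch of that statistic is given below; the function name and the example values are made up for illustration and are not the study's data.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(cond_a, cond_b):
    """Return (t, degrees of freedom) for a paired comparison a - b."""
    if len(cond_a) != len(cond_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(cond_a, cond_b)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # sample SD of the differences
    return mean(diffs) / standard_error, n - 1

# Hypothetical per-subject durations (s) in two conditions, eight subjects
flash_on = [5.1, 4.8, 6.0, 5.5, 4.9, 5.7, 5.2, 6.1]
vanish = [1.0, 1.2, 0.9, 1.5, 1.1, 1.3, 0.8, 1.4]
t_value, df = paired_t(flash_on, vanish)
```

With eight subjects, `df` is 7, matching the degrees of freedom reported in the text.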
What could cause this difference? One possibility is that in the Flash Off–On condition (Fig. 1C), the presence of the object lines for 12 s caused adaptation earlier within the visual stream, so that when these lines came on again they produced a less-potent signal than that in the Flash On condition. To examine this possibility, we repeated the behavioral experiment, but this time with the background and the object rotating together ±15°. Under these rotating conditions, the durations of the rotating Flash On and Flash Off–On conditions became the same. This suggested that in the Flash Off–On condition, prior exposure to the camouflaged object elements resulted in adaptation in early visual areas and thus a weaker input to LO.
At the 1-Hz presentation frequency, in each of the three experimental conditions, different processes should be occurring during the interstimulus interval (ISI, 900 ms) following the stimulus at 100 ms. During the Persistence condition, the percept of the stimulus should be maintained throughout the ISI. During the Vanish condition, iconic memory should maintain features of the stimulus for several hundred milliseconds of the ISI, although no persistence should occur. During the Mask condition, the presentation of a new background should lead to backward masking that wipes out the contents of iconic memory. Thus by subtraction logic (Culham 2006; Donders 1868; Donders 1969), a subtraction of Persistence − Vanish should isolate activation specific to persistence, as in past experiments (Ferber et al. 2003; Large et al. 2005). In addition, a contrast of Vanish − Mask should isolate activation specific to iconic memory. Thus to contrast the neural substrates of persistence and iconic memory, we began by performing these two subtractions for stimuli presented at 1 Hz, as shown in Fig. 5.
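The subtraction logic described above amounts to two pairwise differences between condition activations. The sketch below expresses it for a single region; the dictionary keys, function name, and numbers are illustrative assumptions and not measured values.

```python
def subtraction_contrasts(activation):
    """Isolate persistence- and iconic-memory-specific activation.

    `activation` maps condition names to activation levels at 1 Hz
    for one region of interest (hypothetical layout).
    """
    return {
        "persistence": activation["Persistence"] - activation["Vanish"],
        "iconic_memory": activation["Vanish"] - activation["Mask"],
    }

# Made-up activation levels for one region
example = {"Persistence": 0.9, "Vanish": 0.5, "Mask": 0.3}
contrasts = subtraction_contrasts(example)
```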
For the temporal synchrony binding cues, the contrast of Vanish − Mask at 1 Hz reveals activation within or near area LO (Fig. 5A), consistent with the suggestion of Mukamel and colleagues (2004): that higher-level visual areas (i.e., LO) may play a role in iconic memory. By comparison, the contrast of Persistence − Vanish also shows activation in LO, consistent with experiment 1, but also within the early visual areas, particularly V3 and V4 (Fig. 5B). In contrast, MT+—consistent with the findings of Ferber et al. (2003) and Large et al. (2005)—shows no persistence.
Next, as did Mukamel et al. (2004), we compared the responses in these areas at a 1-Hz presentation rate to a 4-Hz rate. In the absence of sustained memory processes, one would expect the lower frequency to elicit a considerably smaller response than the higher frequency. However, if some form of visual memory (iconic memory or persistence) sustains the level of activation at the lower frequency, then the two frequencies should yield more similar responses.
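The expectation above can be illustrated with a toy calculation. It assumes, purely for illustration, that the response of an area is proportional to the fraction of time its representation is active, where each presentation keeps the representation active for the stimulus duration plus a memory trace, capped at the presentation period. This is not the analysis used by Mukamel et al. (2004) or in this study, and the function names and trace durations are hypothetical.

```python
def active_fraction(rate_hz, stim_s, trace_s):
    """Fraction of each presentation period with an active representation."""
    period_s = 1.0 / rate_hz
    return min(stim_s + trace_s, period_s) / period_s

def ratio_1_to_4_hz(stim_s, trace_s):
    """Toy 1-Hz response expressed as a fraction of the 4-Hz response."""
    return active_fraction(1.0, stim_s, trace_s) / active_fraction(4.0, stim_s, trace_s)

# A 50-ms stimulus with no trace, a 300-ms trace, and a 5-s trace
for trace_s in (0.0, 0.3, 5.0):
    print(trace_s, round(ratio_1_to_4_hz(0.05, trace_s), 2))
# 0.0 0.25
# 0.3 0.35
# 5.0 1.0
```

In this toy model the ratio rises toward 1 as the trace lengthens, which matches the qualitative prediction that longer-lasting memory yields more similar responses at the two frequencies.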
We computed the activation levels throughout ROIs in the visual hierarchy for each of our three experimental conditions across all subjects (Fig. 6A). To facilitate comparisons between conditions and areas, activation levels were normalized such that the signal for the background condition (Fig. 2A) was set to 0 and the signal for the masking condition (Fig. 2D) at 4 Hz was defined as 1. Values for all other conditions were scaled accordingly. A repeated-measures ANOVA with factors of Region (V1, V2, V3, V4, and LO), Persistence (Persistence, Vanish, Mask), and Frequency (1 Hz, 4 Hz) was performed on the normalized percentage signal change. There were significant main effects for each of the factors [Region: F(4,28) = 12.15, P < 0.001; Persistence: F(2,14) = 23.66, P < 0.001; and Frequency: F(1,7) = 210.38, P < 0.001]. Note that all conditions showed an increase in activation between 1 and 4 Hz. There were significant two-way interactions between Region and Persistence [F(8,56) = 9.58, P < 0.001] and Region and Frequency [F(4,28) = 8.44, P = 0.004]. There was also a three-way interaction between all three factors, Region × Persistence × Frequency [F(8,56) = 3.7, P = 0.002], suggesting that regions in the visual hierarchy showed different patterns of activation, depending on whether there was persistence present and the frequency at which stimuli were presented. Activation in the Vanish condition shows little difference from that of the Mask except in area LO, supporting the suggestion that iconic memory processes are limited to LO. In contrast, the Persistence condition begins to show a difference from the Mask in area V1 [t(1,7) = 3.3, P = 0.01] and this difference peaks in V3 and V4 [V3: t(1,7) = 7.22, P < 0.001; V4: t(1,7) = 4.76, P = 0.002], consistent with the suggestion that persistence processes have neural correlates throughout the early visual system, but particularly at later stages.
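The normalization described above, with the background condition set to 0 and the 4-Hz Mask condition set to 1, is a linear rescaling applied per area and subject. A minimal sketch follows; the function name and the sample values are illustrative assumptions.

```python
def normalize_activation(raw, background, mask_4hz):
    """Rescale so that background -> 0 and the 4-Hz Mask condition -> 1."""
    span = mask_4hz - background
    if span == 0:
        raise ValueError("4-Hz Mask and background signals are identical")
    return (raw - background) / span

# Made-up signal levels for one area in one subject
background, mask_4hz = 0.5, 1.0
print(normalize_activation(0.75, background, mask_4hz))  # 0.5
```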
To view the data in a more compressed way, the 1- to 4-Hz ratio (akin to the slope in Fig. 6A) is plotted in Fig. 6B. Recall that a higher ratio was attributed by Mukamel et al. (2004) to a greater role of that area in the storage of an object's visual memory. The Vanish condition shows a trend of increasing ratios from the early visual areas through to LO [linear trend analysis, F(1,7) = 12.65, P = 0.009]. It is only in LO, however, that this ratio first becomes significantly greater than zero [LO: t(1,7) = 11.32, P < 0.001; V4: t(1,7) = 1.911, P = 0.1], supporting the suggestion that iconic memory processes are limited to LO. In contrast, the Persistence condition is significantly greater than zero in all areas (one-sample t-test on Persistence ratios, P < 0.0001, in all five regions). Notably, the greatest differences between the Persistence and Mask ratios occur in areas V3 and V4.
In addition to examining temporal synchrony at different frequencies, we also compared activation produced by different binding cues. We compared, in a 1-Hz block design, the persistence activation produced by a synchronous onset (flash), a color cue, or a motion cue. These activations, normalized to the response produced by color, are shown for each area in Fig. 7. The three cues elicit equivalent persistence activation in V1, V2, V3, and LO. Only in area V4 does color elicit more activation than that of the other cues.
The results of both experiments clearly demonstrate that the synchronous onset of object elements is an effective binding cue for eliciting persistence in area LO. Taken together with similar results for motion and color binding cues (Ferber et al. 2003, 2005; Large et al. 2005), these new results suggest that perceptual persistence has a neural correlate in LO, regardless of the cue that is used to bind the elements. Although the duration of persistence may vary depending on the cue—persistence from temporal synchrony lasted longer (∼5 s) than that from motion (∼2–3 s) or color (3 s) (Ferber et al. 2003, 2005; Large et al. 2005)—binding during persistence activates LO regardless of the cue.
Our finding that LO is strongly implicated in persistence from temporal synchrony stands in contrast to the results of Caplovitz et al. (2007) who found that binding by temporal synchrony led to activation in early visual areas (V2, V3, and V4v), but not LO. However, there are two critical differences between our study and theirs. First, we explicitly studied persistence, whereas they studied the binding itself. Second, whereas the elements of the stimulus used by Caplovitz et al. were grouped only by temporal synchrony (four disks that flashed simultaneously vs. sequentially), our stimulus elements could also be grouped by line proximity and colinearity. Indeed, we previously found that persistence is longer for objects consisting of colinear elements than for scrambled objects that lack continuity (Ferber et al. 2005). In sum, the stimuli used by Caplovitz et al. (2007) may have been perceived as four separate objects that were grouped together rather than one single object consisting of four elements. Given that LO is activated by objects and the perceptual grouping of objects (Mendola et al. 1999), such stimulus differences may be critical. The finding by Caplovitz and colleagues—that these four disks produced greater activation in V2, V3, and V4v—is remarkable given that it is produced by the binding of neuronal responses across anatomically separated areas representing the four visual quadrants. It suggests that these early visual areas would also be capable of maintaining the binding between fragmented lines that form the contours of our objects, despite the removal of cues that segregate these from similar background fragments. Indeed our second experiment supports this.
Our second experiment demonstrated that visual persistence from temporal synchrony cues is associated with higher activation throughout the occipital cortex. This finding agrees with earlier findings from Large et al. (2005) who found a similar pattern of increasing activation from persistence arising from both motion and color cues as one moved through V1 to LO (see their Fig. 3). However, the present results not only extend the findings of Large and colleagues to a novel binding cue, they were also based on a larger number of slices and cover a greater extent of the visual cortex, particularly V2 and V3. Thus in Fig. 6 of Large et al., the extent of areas with more activation to persistence than to iconic memory is largely limited to LO and V4v; in contrast, however, our Fig. 5B shows that broad regions of the visual cortex are activated during persistence, peaking in V3 and V4 but extending to V2 and parts of V1. Except for a preference for the color binding cue in V4 [not unexpected, given the findings of Bartels and Zeki (2000) and Murphey et al. (2008)] these early visual areas also show persistence-related activity that is cue independent.
In addition, our second experiment clarifies the neural distinction between iconic memory and persistence. Specifically, we found, consistent with Mukamel et al. (2004), that iconic memory is associated with activation differences in higher-level object-selective areas, but not earlier visual areas. In contrast, persistence is associated with activation in LO as well as the early visual areas. Thus not only are there differences between persistence and iconic memory in behavioral duration, the neural substrates are also different. Why would persistence but not iconic memory activate early visual areas? We propose that the early visual areas are important for maintaining the binding between fragmented lines of our objects and their segregation from those of the background.
Considering the results of our research in the context of the larger literature on persistence, the phenomenon and its neural correlates do not appear to be merely a confound of attention. Focused attention seems to have little effect; for example, persistence is unaffected by the addition of an n-back task, which draws attentional resources to a second task (Ferber and Emrich 2007). Moreover, persistence cannot be accounted for by feature-based attention. Specifically, despite the fact that feature-based attention enhances activation in feature-selective brain areas (Corbetta et al. 1991; O'Craven et al. 1997), persistence does not show such feature-selective enhancement [e.g., motion-defined persistence does not enhance processing in motion-selective area MT+ (Ferber et al. 2005)]. In addition, it has also been shown behaviorally that persistence is unaffected by changes in the features of objects (Ferber and Emrich 2007). Specifically, the persistence effects remain strong even when the original set of line elements is replaced by their complements (Ferber et al. 2005). In other words, persistence does not require feature-based attention. Further, persistence cannot be accounted for by exogenous attention related to the occurrence of offset and onset transients because the relative activation levels are inconsistent with this hypothesis. For example, although the Mask condition includes both stimulus offset and onset, it yields no greater activation than the Vanish condition, with only a stimulus offset.
Taken together, our results show that persistence requires the following mechanisms: 1) segregation of the figure elements from the background elements by the use of binding cues, irrespective of whether they are motion cues (Ferber et al. 2003), color cues (Large et al. 2005), or the temporal synchrony of the figure elements, as shown in this study; and 2) binding between the figure elements, which must be maintained for some time after the binding cues disappear. Thus the neural mechanisms and networks subserving persistence must have 1) access to the binding cues (e.g., motion-, color-, or transient-sensitive cells); and 2) sufficient resolution to encode the specific line elements that form the shape and to distinguish these from the elements forming the cluttered background.
Regan (2000) suggested that the persistence related to structure from motion was a property of adaptation within MT+. However, Ferber et al. (2003) showed that MT+ has no persistence. An important aspect of our current findings is that we discovered evidence of persistence at some distance from MT+. To bind two lines that belong to the same object and to segregate these from lines that belong to the background may require a degree of visual acuity that is not present in MT+, but is present in V2 and V3.
This work was supported by Canadian Institutes of Health Research Grant 9335 to T. Vilis.
We thank S. Ferber and J. Snow for helpful comments on an earlier version of the manuscript and J. Gati and J. Williams for help in data acquisition.
Copyright © 2009 the American Physiological Society