The concerted action of saccades and fixational eye movements is crucial for seeing stationary objects in the visual world. We studied how these eye movements contribute to retinal coding of visual information using the archer fish as a model system. We quantified the animal's ability to distinguish among objects of different sizes and measured its eye movements. We recorded from populations of retinal ganglion cells with a multielectrode array, while presenting visual stimuli matched to the behavioral task. We found that the beginning of fixation, namely the time immediately after the saccade, provided the most visual information about object size, with fixational eye movements, which consist of tremor and drift in the archer fish, yielding only a minor contribution. A simple decoder that combined information from ≤15 ganglion cells could account for the behavior. Our results support the view that saccades impose not just difficulties for the visual system, but also an opportunity for the retina to encode high quality “snapshots” of the environment.
Eye movements are essential for vision: if gaze is stabilized, the retina quickly adapts to stationary images and visual perception fades away within a time window of 100 ms (Ditchburn and Ginsborg 1952; Martinez-Conde et al. 2004; Riggs et al. 1953; Yarbus 1967). Our gaze is structured into saccades, which are rapid, large-amplitude movements, and periods of fixation (Carpenter 1977). Within each fixation, the eyes continue to make relatively small, jitter-like movements (Martinez-Conde et al. 2004). During saccades, the retinal image is highly blurred, and vision is partially suppressed in higher brain regions (Burr et al. 1994; Volkmann 1986). Each fixation generates a new “snapshot” of the visual world that seems to be acquired within 400 ms (Tatler 2001). However, we do not know the relative contribution of saccades and fixational eye movements to retinal processing and coding. Wide field motion induced by saccades inhibits some ganglion cell types but drives strong responses in other cells (Roska and Werblin 2003). Similarly, fixational eye movements can generate informative retinal responses in the turtle retina (Greschner et al. 2002), but they can also suppress firing in some ganglion cells in the rabbit and salamander retina (Olveczky et al. 2003).
To quantify the different contributions of eye movements to visual perception, we have brought together observation of visual behavior, measurement of eye movements, and analysis of population neural codes using the archer fish as a model system. Archer fish (Toxotes chatareus) are famed for their ability to shoot down insects resting on foliage using a squirt of water from their mouth (Fig. 1A; Supplementary Video 1) (Allen 1978; Luling 1963; Schuster et al. 2006; Timmermans 2000). Archer fish are capable of many expert visual behaviors: they can accurately shoot stationary targets that cover only about one photoreceptor on the retina (Timmermans 2001), predict the location at which the insect will splash down in <100 ms (Rossel et al. 2002), and deduce the absolute size of an object presented at different viewing distances (Schuster et al. 2004).
We trained archer fish to accurately discriminate a medium-sized object from larger and smaller distracters and signal this choice by shooting a jet of water. We measured the eye movements made by freely swimming fish, finding that they use saccades and fixations in a manner very similar to humans. To connect this behavior to the retinal code, we recorded from populations of ganglion cells using a multielectrode array while presenting objects of different sizes that were moved to simulate saccades, drift, and tremor. Finally, we used simple decoding algorithms to extract information from retinal spike trains and discriminate among the different object sizes. We found that saccades were far more effective than fixational movements in driving informative responses in the ganglion cells. In addition, decoders needed to pool over small populations of ganglion cells to discriminate among objects as well as the animal did.
Training the archer fish to shoot at a target printed on paper
The training procedure started by placing a single naïve archer fish (T. chatareus) in a water tank. Live flies with removed wings were presented to the fish by placing them on a stick ∼25 cm above water level against a white background (a regular letter-sized paper, 30 cm above water level). The archer fish would usually shoot at the walking fly very quickly. After fish were acquainted with flies as a food source, the flies were immobilized in the center of the paper inside a flexible silicone tube connected to a manual basketball air pump (Toys R-Us, Princeton, NJ). Whenever the fish shot at the immobilized fly, the experimentalist rewarded it by ejecting the fly from the tube so that it fell to the water's surface. After a week, the immobilized fly was replaced by a single black disk (5.08 mm diam) printed on letter-sized paper. Again, an air pump was used to reward the fish after shooting at the printed target. Three fish were trained using this procedure. It took the fish between 120 and 250 trials before reaching the specific fish's best performance.
Measurement of the ability of the archer fish to discriminate between targets
At each session, three printed targets (2.54, 5.08, and 7.62 mm) were presented in randomized order 30 cm above water level. The fish was given 60–120 s to shoot at the middle-sized target (5.08 mm). In cases where the fish shot the correct target, it was rewarded by dropping a wingless fly to the surface of the water with the air pump. In cases where the fish missed, the paper with printed targets was replaced by another one with a different arrangement of targets. The target arrangements were selected randomly. On each day, we presented each fish with 5–15 different trials, with more trials for larger fish (see Supplementary Video 2). To determine whether to reward the fish for good performance, at each shot, we looked at the pattern of water created by the spit on the glass below the paper with disks. The pattern usually had a star-like shape with the center of the jet corresponding to the center of the star-like shape. The fish was rewarded only when the pattern was centered on top of the middle-size target.
This task was designed to allow us to study how the ganglion cells transfer information about retinal image size. Accordingly, we designed a behavioral task where the fish does not need to evaluate the distance to the target. One should note that there is evidence (Schuster et al. 2004) that the archer fish can evaluate both the size of the retinal image and the distance to the target to obtain an estimate of the absolute target size (i.e., the true size of the target regardless of its distance). This is of course a more difficult calculation that requires depth cues, such as the image disparity between the two eyes, and is outside the scope of our study.
Recording eye movements
Search coils were fabricated (24 loops, 4 mm ID) from thin copper wire (California Fine Wire Co., Grover Beach, CA). The coil was attached to one of the archer fish's eyes (n = 3 fish) with superglue. Coil leads were kept loose to allow the eyes to move without constraint. The fish was transferred to a narrow water tank, which allowed it to swim freely forward and backward but not to make large, sideways body movements. The water tank with the fish was mounted inside a magnetic coil system (Remmel Labs, Katy, TX), which generates three perpendicular oscillating magnetic fields with different frequencies. The signal from the search coil was amplified and separated using a phase lock amplifier (Remmel Labs) to vertical and horizontal components. The output of the phase-locked amplifier was proportional to the angle along the vertical and horizontal axis. This voltage was sampled (1,000 samples/s) and stored on a computer hard drive. Stable recordings were made for ≤2 h. The search coil resolution was better than 0.015°.
Calibration of the search coil
Before each experiment, the search coil was mounted on a three-dimensional protractor (Mensh et al. 2004). The coil was placed inside the full water tank that later was used to house the fish. First the offset of the phase lock amplifier was adjusted so that zero voltage corresponded with 0° in both vertical and horizontal axes. The amplification was adjusted to ensure that ±7 V in each direction corresponded to ±20°. A linear interpolation was used to transform the voltage output to the orientation of the search coil in space.
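The linear calibration described above can be sketched in a few lines. This is only an illustration under the stated settings (zero volts at 0°, ±7 V at ±20°); the function name and its parameters are hypothetical and do not come from the original acquisition software.

```python
def voltage_to_angle(voltage_v, full_scale_v=7.0, full_scale_deg=20.0, offset_v=0.0):
    """Convert amplifier output (V) to search-coil angle (deg).

    Assumes a purely linear relation, with offset_v already nulled so that
    0 V corresponds to 0 deg and +/-full_scale_v to +/-full_scale_deg.
    """
    return (voltage_v - offset_v) * full_scale_deg / full_scale_v


# Example: an output of half the full-scale voltage maps to half the angular range.
half_range_deg = voltage_to_angle(3.5)
```

The same function would be applied separately to the horizontal and vertical channels.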
Analysis of eye movements
The archer fish eye movements had two major components: saccades and fixations. Saccades were detected by calculating the angular velocity at each time step. A saccade was detected whenever the angular velocity exceeded 20°/s. For each saccade, we calculated the total angular rotation, the mean angular velocity, the maximal angular velocity, and duration. The intersaccade segment (fixation) had two components: drift and tremor. The velocity of the drift was evaluated by dividing the difference between the start and end angular displacements by the total duration. Tremor was defined as the eye movement remaining after subtracting drift, and its frequency content was evaluated by calculating the power spectral density of each segment (Fig. 3B) with the Welch method (pwelch, Matlab, Mathworks, Natick, MA) and detecting the frequency that contained the maximal energy (Fig. 3E). To measure the peak-to-peak amplitude of the tremor, we first applied a band-pass filter (0.01–12 Hz) to avoid misdetection of peaks caused by noise. The times of maximal and minimal eye positions in the filtered data were found by checking the first and second time derivatives. Finally, the peak-to-peak amplitudes were calculated from the raw data using these extremal time points.
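As an illustration of the velocity criterion and the drift estimate, the following minimal sketch operates on a single angular component sampled at 1,000 samples/s. It is not the analysis code used in the study: the function names are hypothetical, the velocity is approximated by a first difference, and the two-component (horizontal and vertical) case is reduced to one dimension for brevity.

```python
def detect_saccades(angle_deg, sample_rate_hz=1000.0, threshold_deg_per_s=20.0):
    """Return (start, end) index pairs where angular speed exceeds the threshold.

    Speed i is estimated from samples i and i + 1; `end` is exclusive.
    """
    dt = 1.0 / sample_rate_hz
    speed = [abs(b - a) / dt for a, b in zip(angle_deg, angle_deg[1:])]
    saccades, start = [], None
    for i, v in enumerate(speed):
        if v > threshold_deg_per_s and start is None:
            start = i                      # saccade onset
        elif v <= threshold_deg_per_s and start is not None:
            saccades.append((start, i))    # saccade offset
            start = None
    if start is not None:                  # trace ended during a saccade
        saccades.append((start, len(speed)))
    return saccades


def drift_velocity(angle_deg, start, end, sample_rate_hz=1000.0):
    """Drift velocity (deg/s) of a fixation: net displacement over duration."""
    duration_s = (end - start) / sample_rate_hz
    return (angle_deg[end] - angle_deg[start]) / duration_s
```

Fixations are then the segments between consecutive detected saccades, and tremor is what remains of each segment after the drift line is subtracted.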
Image movement caused by waves on the water surface
In addition to the eye movements, the image on the retina might move because of waves on the surface of the water. However, in our water tank, we had a barrier that suppressed waves coming from air and water pumps. To ensure that the contribution of surface movement was negligible, we used a laser beam to measure the image movement caused by refraction at the surface. We found that the image motion caused by surface waves was at least one order of magnitude smaller than the image motion caused by fixational eye movements.
Image movement caused by optic flow
Large body movements made by the fish will induce optic flow that cannot be compensated for by gaze stabilizing eye movements. Such optic flow makes the target bigger on the retina as the fish approaches the surface. We can place an upper bound on the size of image movement induced by optic flow in the following fashion. For the biggest target, the size of the image on the retina when the fish is at the bottom of the tank, ≥50 cm below the target, is ∼55 μm. When the fish is at the surface after 3 s of swimming, the size is ∼90 μm. Optic flow therefore generates movement of the target's edge of ∼12 μm/s. This is ∼15 times smaller than the movement caused by tremor, indicating that the optic flow does not make a big contribution to retinal encoding of these stimuli.
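The upper bound quoted above follows from simple arithmetic on the two image sizes and the swim time. A minimal sketch, using only the numbers given in the text (the function name is hypothetical):

```python
def image_growth_rate(size_start_um, size_end_um, duration_s):
    """Average rate of change of retinal image size (um/s)."""
    return (size_end_um - size_start_um) / duration_s


# Largest target: ~55 um at the tank bottom, ~90 um at the surface, 3 s of swimming.
rate = image_growth_rate(55.0, 90.0, 3.0)
print(round(rate, 1))  # 11.7, i.e., ~12 um/s
```

Treating the full change in image size as motion of one edge is deliberately conservative, which is why the result serves as an upper bound.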
Experiments were performed on the archer fish (T. chatareus), obtained from www.aquariumfish.net or Seven Star Tropical Fish (Philadelphia, PA), in conformity with all institutional animal care standards. Fish were anesthetized using MS222 (0.2 g/l; a-5040, Sigma, St. Louis, MO). Retinas were isolated from the eye in darkness. To allow good access of the multielectrode array to the ganglion cells, prior removal of the vitreous was essential. This was done by injecting the eye with 0.025 ml collagenase (25 mg/ml; C-9891, Sigma) dissolved in Ringer solution for 2 h before the experiment (Hirasawa et al. 2002). This procedure causes the vitreous to become a viscous fluid that can be removed by flushing with Ringer using a plastic pipette during the dissection. To remove concerns that the collagenase might affect synapses in the retina, we characterized the basic properties of the ganglion cells using on-off diffuse flash and random checkerboard stimuli to obtain the receptive field structure. We found that the ganglion cells in the archer fish retina have characteristics similar to those of the ganglion cells in all the other species we studied (mouse, salamander, and guinea pig). The majority (>90%) of cells were found to be off cells. Analysis of the response to a random checkerboard stimulus reveals that there are at least two distinct off types, transient and sustained, as indicated by the autocorrelation function. Finally, a small (2 × 2 mm) piece of the retina with the pigment epithelium attached was cut and placed over the array. In these experiments, the retina was taken from around the fovea region.
Retinas were placed with the ganglion cell layer facing a multielectrode array (Multichannel Systems) and superfused with oxygenated (97% O2-3% CO2) Ringer medium at room temperature. We developed the archer fish Ringer medium based on published data from the goldfish (Dmitriev and Mangel 2000; Wang and Mangel 1996). It contained the following (in mM): 120 NaCl, 2.5 KCl, 1 MgCl2, 0.7 CaCl2, 3 NaHCO3, and 11 glucose. Stable recordings of >8 h were achieved under these conditions. Extracellularly recorded signals were digitized at 10 kSamples/s and stored for off-line analysis. We used two types of multielectrode arrays: a dense planar array with 30-μm spacing and 10-μm electrode diameter (Multichannel Systems) and a three-dimensional (3D) array with 200-μm electrode spacing (Ayanda Biosystems). In the 3D array, the electrodes are fabricated with a needle-like shape that is elevated 60–80 μm above the substrate. This geometry allows the electrodes to penetrate the residue of the vitreous and to achieve good electrical contact with the ganglion cells. The dense array was used for a general characterization of the archer fish retina, and the 3D array was used for our study of how the retina encodes target size. Spike sorting was done using our novel method (Segev et al. 2004) based on the signal from many electrodes (for the dense array) or by MClust based on spike amplitude and width (for the 3D array).
The spike sorting procedure for the dense array was previously described in detail (Segev et al. 2004). Briefly, using the peak voltage on 30 electrodes, we identified examples where a ganglion cell fired an action potential without overlapping spikes from other cells. These isolated spikes were averaged together to form a 6.4-ms template of the voltage activity on the array. Templates having a range of time shifts were matched against putative spike patterns in the raw data in an iterative process using mean-squared error as a measure of goodness of fit. We recorded a total of 44 cells from the temporal region of six retinas for the flash data (Table 1, flash only condition) and 37 cells from five retinas for the saccade data (Table 1, all other conditions).
Stimuli were displayed on a CRT monitor at a frame rate of 120 Hz and focused onto the plane of the retina using standard optics (Meister et al. 1994; Puchalla et al. 2005). The mean intensity on the retina was 12 mW/m2, corresponding to photopic vision. The stimulus consisted of three disks with sizes (30.7, 61.4, and 92.0 μm on the retina) matched to the behavioral experiments. To use the ability of the multielectrode array to record from many neurons, we showed each one of the three disk sizes in a 3 × 3 matrix form so we could simultaneously stimulate more cells. The distance between targets, 500 μm, was selected to be large enough that a given cell (receptive field radius ∼125 μm) would respond to only one of the targets. To test that this spacing was adequate, we compared the activity in response to a flash of the whole matrix with the activity in response to a flash at each of the nine target locations. We found that 91% of the cells (n = 10/11) had responses that were statistically indistinguishable between the case of single flashed disks and the 3 × 3 matrix of disks. The average response of ganglion cells to the large disk was 8.0 ± 1.4 (SE) spikes (n = 11 cells) for the 3 × 3 matrix and 7.6 ± 1.2 spikes for a single disk, indicating that the average difference in the population was not statistically significant. Furthermore, relative comparisons, such as the performance of a decoder that uses the response to saccades versus a decoder that uses fixational eye movements, have even less bias.
Training the decoder
The decoder was trained using N − 1 presentations of the stimulus (where N is the total number of stimulus presentations) and tested on the remaining presentation. In this way, each event is actually decoded using information acquired during all the other events. This allows us to do cross-validation of the training (Lindzey and Aronson 1968).
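The leave-one-out scheme described above can be sketched as follows. This is a generic harness under our own naming: `leave_one_out`, `loo_success_rate`, and the `train`/`predict` callables are hypothetical placeholders for whatever decoder is being evaluated, not functions from the original analysis.

```python
def leave_one_out(trials):
    """Yield (training_trials, test_trial) pairs, holding out one trial at a time."""
    for i, test_trial in enumerate(trials):
        yield trials[:i] + trials[i + 1:], test_trial


def loo_success_rate(trials, train, predict):
    """Fraction of held-out trials decoded correctly.

    trials  : list of (response, stimulus) pairs
    train   : callable, train(training_trials) -> model
    predict : callable, predict(model, response) -> stimulus
    """
    correct = 0
    for training, (response, stimulus) in leave_one_out(trials):
        model = train(training)            # the test trial is never seen here
        correct += predict(model, response) == stimulus
    return correct / len(trials)
```

Because the model is rebuilt for every held-out presentation, the reported success rate is never inflated by a trial decoding itself.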
Retrograde labeling of ganglion cells
The density of the ganglion cells was measured by retrograde labeling through the optic nerve. Following Dacey et al. (2003) and Segev et al. (2004), the eyes were removed and placed in Ringer solution, leaving an optic nerve stump of ∼1 mm. A dye crystal (rhodamine dextran together with biotin, Molecular Probes, D-1817, 3,000 MW) was placed on the optic nerve stump for an overnight period. Then the retina was removed from the eyeball and placed on a coverslip. A filter paper was used to maintain the retina in place. Using this method, we produced a density map of the ganglion cells that ranged from ∼1,500 cells/mm2 in the far periphery to ∼5,000 cells/mm2 in the fovea (Fig. 2A).
Electron microscopy of optic nerve
Optic nerves were excised from animals and fixed for 24 h at 4°C in 3% glutaraldehyde and 6% tannic acid. Nerve tissue was postfixed with 1% reduced osmium tetroxide in 0.075 M sodium cacodylate buffer with 5% sucrose on ice for 2 h. To improve image contrast, tissues were en bloc stained with 1% aqueous uranyl acetate for 1 h. Finally, tissues were dehydrated by ethanol, followed by a 50/50 mix of ETOH/PO (propylene oxide) and infiltrated with Embed 812 resin. Blocks were polymerized for 24 h and sectioned with a Diatome 35° diamond knife. Sections were placed on formvar-coated 1 × 2-mm slot grids. All sections were examined with a Leo 912 AB Omega Energy Filtered TEM at either 80 or 100 kV. Digital micrographs were taken with an AMT XR-60B CCD. We counted the number of axons in 16–21 randomly chosen fields of view at ×4,000 magnification (Fig. 2C), giving us the density of axons per unit area. We next multiplied this density by the total cross-sectional area of the optic nerve, as measured by optical microscopy (Fig. 2D), finding 360,000 ± 49,000 (SD) axons in the entire optic nerve (n = 4 nerves). We used this number to uniformly scale the cell counts from our retrograde labeling, such that the cell density shown in Fig. 2A added up to the correct total number of ganglion cells.
Anatomy of the archer fish eye and retina
The archer fish have large eyes (∼9 mm diam for a fish of 8–10 cm head to tail length) located very near the mouth. The temporal part of the eye is directed forward, which allows binocular vision and may contribute to the ability of the archer fish to determine the distance of prey. We measured the size of the photoreceptors at the fovea by electron microscopy. We found that the distance between centers of two cones is ∼6 μm (Fig. 2B). In addition, we measured the ganglion cells' receptive field size using reverse correlation to spatially modulated random flicker (Puchalla et al. 2005; Segev et al. 2004). We fit the spatial profile of each cell's receptive field center with a 2D Gaussian (Segev et al. 2004), finding the receptive field radius to be 125 ± 70 (SD) μm (n = 34 cells).
Archer fish can distinguish between different target sizes
We trained archer fish to select a medium-sized disk out of three possible disk targets (diameters of 2.54, 5.08, and 7.62 mm) by shooting a jet of water at it (see methods) (Rossel et al. 2002; Schuster et al. 2004). Targets were dark disks presented on a piece of white paper 30 cm above the surface of the water. At this distance, the disk diameters were ∼5, 10, or 15 photoreceptors on the retina (see Fig. 2 and methods).
After the paper was placed above the tank, the archer fish would swim close to the surface of the water and fixate several times on successive targets for 1–3 s each. After looking at each target once or sometimes twice, the fish would orient its body toward one target, fixate for several seconds more, stick its snout out of the water, and shoot (see Supplementary Movie 2 and Fig. 1B). If it shot at the correct target, the fish was rewarded by being given a piece of food (see methods). Although shooting was very accurate, we did not use the fish's accuracy to judge whether to give a food reward or not. Archer fish quickly learned to perform this task, achieving a success rate of 86% on average (n = 3 animals) and 91% for the best fish. A success was considered only when the fish shot directly at the target.
One should note that the success rate obtained in this experiment is only a lower bound on the ability of the archer fish retina to encode information reliably. The fish's error may well be caused by some properties of central processing, like drifting attention, exploration of other options to see what the reward might be (i.e., the fish knows which dot is medium sized, but shoots at another dot to see if it gets a better reward), or imperfect training.
Archer fish eye movements
Eye movements were measured using a standard, magnetic search coil technique (see methods) (Mensh et al. 2004; Robinson 1963). Animals were freely swimming, but were kept in a narrow tank so that their body movements were small. In this condition, the animal stabilizes its body and scans the visual world using saccadic eye movements, closely approximating the manner in which it looks at visual targets in the behavioral task. The search coil measures the angle of the animal's eye in space, including both movements of the eye relative to the body and of the body in space. This is the appropriate measurement, because both eye and body movements affect the motion of the static visual world on the retina.
Archer fish eye movements were similar to those of most vertebrates, consisting of periods of fixation separated by saccades (Fig. 3A) (Land 1995; Land and Nilsson 2002). The saccades are fast (∼50–100°/s), smooth, and large movements that bring the center of gaze (which for the archer fish is the temporal region of the retina) onto a new location. Between saccades, the archer fish had smaller eye movements that can be divided into two basic components: a high-frequency tremor that wiggles the image on the retina and a slow drift that scans the entire image across the retina (Fig. 3B). We found that the archer fish made microsaccades rarely, and therefore we did not include them in the analysis. (When fixational eye movements in the archer fish are mentioned in the text, we refer to tremor and drift.)
To analyze eye movements quantitatively, we first identified saccades using an eye velocity criterion (see methods). Saccades occurred every 1–5 s with a mean time of 1.33 s (Fig. 3C). Within each fixation, we measured drift by fitting a line to the eye position trace. Velocities ranged up to 0.40°/s, corresponding to 4 photoreceptor diameters/s on the retina (assuming an eye radius ∼3.6 mm) (Luling 1958). Tremor was defined as the eye movement remaining after subtracting the drift. We studied its frequency content by calculating the Fourier transform of each segment, finding peaks at three frequencies: ∼2, ∼5, and ∼10 Hz (Fig. 3D). The contribution from the different frequency bands varied from one fixation to the next. The most prominent and reproducible component was the 5-Hz oscillation. The peak-to-peak tremor amplitude was 0.1–0.2° (Fig. 3E), which corresponds to an image displacement of one to two photoreceptors. To show how a static image moves across the retina because of fixational eye movements, we display a typical 1.5 s trace of eye position measured from one archer fish over a background of its photoreceptor mosaic (Fig. 3F).
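The conversions between eye rotation and retinal distance quoted above can be checked with a short calculation. This sketch assumes, as the text does, an eye radius of ∼3.6 mm and uses the ∼6-μm cone spacing measured at the fovea; treating retinal displacement as arc length (angle in radians times radius) is a simplifying assumption, and the function names are hypothetical.

```python
import math

EYE_RADIUS_UM = 3600.0   # ~3.6 mm eye radius, as assumed in the text
CONE_SPACING_UM = 6.0    # center-to-center cone distance at the fovea


def deg_to_retina_um(angle_deg, eye_radius_um=EYE_RADIUS_UM):
    """Arc length on the retina (um) swept by an eye rotation of angle_deg."""
    return math.radians(angle_deg) * eye_radius_um


def deg_to_photoreceptors(angle_deg):
    """Same displacement expressed in photoreceptor spacings."""
    return deg_to_retina_um(angle_deg) / CONE_SPACING_UM


print(round(deg_to_photoreceptors(0.4), 1))  # 4.2 photoreceptors/s for 0.40 deg/s drift
print(round(deg_to_retina_um(50.0)))         # 3142 um/s for a 50 deg/s saccade
```

The same conversion gives roughly one to two photoreceptor spacings for the 0.1–0.2° peak-to-peak tremor amplitude.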
To assess the applicability of these eye movement measurements to our size discrimination task, we videotaped the fish eye during many task trials. Close inspection revealed that the fish indeed used a series of saccades and fixations with similar durations prior to shooting at a target (see methods).
Encoding of target size by retinal ganglion cells
To evaluate how eye movements influence the ability of the retina to encode visual information, we presented a set of three targets to the isolated retina while moving the image to simulate the effect of eye movements (Fig. 3G). During this visual stimulus, we recorded spike trains from many ganglion cells with a multielectrode array (Puchalla et al. 2005; Segev et al. 2004). Disks were displayed on a computer monitor and had diameters of 30, 60, and 90 μm on the retina, matching the dimensions presented to the fish during the behavioral task.
Saccades were simulated by rapidly (50°/s, which corresponds to ∼3,000 μm/s on the retina) moving the disk from the edge of the monitor to a position over the multielectrode array. In the saccade-only condition, the object remained stationary for 2 s and was then moved off the retina using a reverse saccade with the same velocity. In an alternative, more idealized version, the disk simply appeared suddenly, remained for 2 s, and disappeared suddenly. This flash version corresponds to the limit of very fast saccades. We found retinal responses to be nearly identical in these two cases (Table 1, 1st and last rows). Fixational eye movements were simulated with two different kinds of motion. Tremor consisted of an oscillation of disk position at 5 Hz, which matches the primary component observed experimentally. Drift consisted of a constant velocity added on top of the tremor. Both kinds of fixational eye movements were presented for 2 s after the occurrence of a saccade and had amplitudes matched to experiment.
Figure 4 shows the peristimulus time histogram (PSTH) of 10 ganglion cells during the presentation of all eye movement components: saccades plus tremor and drift. The three columns show the results for the three different disk sizes. The saccade can be seen to elicit a strong, brief burst of firing at the beginning of the fixation with a peak rate as high as 900+ spikes/s. In general, rapid movement of the object off of the cell's receptive field did not evoke a very strong response. These results are expected, as all ganglion cells recorded in this experiment were off-type and the object was a dark disk. One can clearly see that the strength of the burst of spikes increased with the size of the disk. Different ganglion cells had considerably different response latencies (e.g., cells 5 and 6), but these differences did not necessarily convey information about disk size. During fixational eye movements, only a weak response was elicited. Where present (see cells 8 and 10), the firing rate was phase locked to the tremor and often appeared near the end of the 2-s fixational period.
Estimation of the target size from ganglion cell spike trains
To discriminate between different object sizes, the archer fish brain must “read out” the information about target size encoded by the retina. We studied and quantified this process by formulating different decoding algorithms that the brain might apply to retinal spike trains. All such decoders involve estimates of the conditional probability of the stimulus given a neural response. Elaborate specification of a neuron's response will be difficult to sample properly, even with hundreds of repeated presentations of the stimulus. In addition, a complicated decoder will be difficult for the central brain to actually implement. With these constraints in mind, we chose simple, biologically plausible decoding algorithms. Although ganglion cell spike trains may contain information that is not accessed by a simple decoder, these algorithms still provide a lower bound on the information about target identity available to the animal in retinal ganglion cell responses.
We first studied decoders based on the number of spikes fired by a ganglion cell, because this was the most apparent source of information about disk size. To ensure adequate sampling, we divided the time during the fixation into two windows. The first window was between 0 and 200 ms after the saccade's end and included the transient firing elicited by a saccade. The second time window was between 200 and 2,000 ms after the stimulus onset and included responses to the fixational eye movement (Fig. 5A). For each presentation of the stimulus, we counted the number of spikes that occurred in each time window. By repeating the stimulus many times, we estimated the probability of the stimulus given the number of spikes generated during each of the two time windows P(S|Ni), where Ni is the number of spikes generated during each window i by a single cell.
To combine stimulus information from the two time windows, we made the approximation that the spike counts in the two windows were uncorrelated given the stimulus: P(N1,N2|S) = P(N1|S)P(N2|S). Using Bayes' rule, we find P(S|N1,N2) = P(S|N1) × P(S|N2)/Z (Eq. 1), where S stands for the target size, which is one of three disks, and Z is a normalization factor. Under this assumption of no noise correlation, we could effectively sample the joint distribution P(S|N1,N2). Finally, we estimated the stimulus by finding the disk size that had the largest probability given the neural response on that trial (also known as maximum likelihood decoding). To avoid overfitting, we always excluded the test trial when compiling the decoding dictionary (see methods for more details).
When we applied this decoding algorithm, we found that the success rate for individual neurons was 47 ± 2% (SE) with values ranging up to 74% (Table 1). We can gain additional insight by limiting the information available to the decoder. Using only the second time window containing the response to fixational eye movements gave an average decoding success of only 38 ± 1% for tremor alone and 40 ± 1% for tremor plus drift. Although this value was significantly greater than the chance value (one third), it is clear that the response triggered by fixational eye movements was weak and not very informative about the spatial structure of the visual stimulus. Performance was similar for tremor alone and for tremor plus drift, suggesting that drift plays a very minor role in revealing spatial structure.
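A minimal sketch of such a spike-count decoder is given below. It is an illustration rather than the code used in the study: the function names are hypothetical, the three stimuli are assumed to be equally likely, and add-one smoothing of the count histograms is our own assumption to avoid zero probabilities for spike counts never seen in training.

```python
from collections import Counter


def fit(trials):
    """Build per-window spike-count histograms for each stimulus.

    trials: list of (counts, stimulus), where counts is a tuple holding one
    spike count per time window.
    """
    n_windows = len(trials[0][0])
    hist = [dict() for _ in range(n_windows)]   # hist[i][stimulus] -> Counter
    for counts, stim in trials:
        for i, n in enumerate(counts):
            hist[i].setdefault(stim, Counter())[n] += 1
    return hist


def posterior_single(hist_i, n, smoothing=1.0):
    """P(S | N_i = n) for one window, assuming equally likely stimuli."""
    n_bins = max(max(c) for c in hist_i.values()) + 2   # counts 0..max, plus one unseen bin
    like = {}
    for stim, counter in hist_i.items():
        total = sum(counter.values())
        like[stim] = (counter[n] + smoothing) / (total + smoothing * n_bins)
    z = sum(like.values())
    return {stim: p / z for stim, p in like.items()}


def decode(hist, counts):
    """Combine windows as a normalized product of per-window posteriors."""
    stimuli = list(hist[0])
    score = {s: 1.0 for s in stimuli}
    for hist_i, n in zip(hist, counts):
        post = posterior_single(hist_i, n)
        for s in stimuli:
            score[s] *= post[s]
    z = sum(score.values())                     # normalization factor Z
    posterior = {s: v / z for s, v in score.items()}
    return max(posterior, key=posterior.get), posterior
```

In use, `fit` would be called on all presentations except the one being decoded, and `decode` on the spike counts of the held-out presentation.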
In contrast, when we used only the first time window at the beginning of a fixation, which contains the response to the end of the saccade, the average decoding success was roughly the same as when we used the entire response (success = 46 ± 2%; Table 1). This implies that responses to fixational eye movements contained information that was largely redundant with that from the saccade. Finally, when we flashed the disks onto the retina, simulating the effect of a very fast saccade, we got similar performance to the saccade only condition (success = 50 ± 3%). This implies that it is the sudden appearance of the disk on a cell's receptive field that is important for encoding disk size rather than the detailed saccadic trajectory.
Time to first spike decoder.
Having found that a decoder based on spike counts in large time windows can successfully discriminate among disk sizes, we next explored the possibility that other response variables carry information about target identity. Because of the vigorous response to the end of a saccade at the beginning of a fixation, another intuitive choice is the time to first spike after the saccade ended, as larger disks cause ganglion cells to fire their spikes earlier than smaller disks (Fig. 5B). The simplest decoder that takes this information into account consists of two latency boundaries that separate small disks from medium and medium from large, respectively. We set these boundaries by finding the point in the PSTH where the firing rates of the two conditions cross each other. On average, the success rate of ganglion cells was 49 ± 2%, and the best cell achieved a 75% success rate (Table 1). This is nearly identical to the results for the spike count decoder.
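The two-boundary latency decoder can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the crossing point is taken as the first PSTH bin in which the earlier-responding condition stops exceeding the later one, and assigning trials without any spike to the smallest disk is our own assumption.

```python
def first_crossing_ms(psth_early, psth_late, bin_ms):
    """Time (ms) of the first bin where psth_early no longer exceeds psth_late,
    after having exceeded it. Returns None if the two never cross."""
    was_above = False
    for i, (a, b) in enumerate(zip(psth_early, psth_late)):
        if a > b:
            was_above = True
        elif was_above:
            return i * bin_ms
    return None


def decode_by_latency(latency_ms, large_medium_ms, medium_small_ms):
    """Classify disk size from time to first spike; shorter latency -> larger disk."""
    if latency_ms is None:              # no spike fired (assumption: weakest stimulus)
        return "small"
    if latency_ms < large_medium_ms:
        return "large"
    if latency_ms < medium_small_ms:
        return "medium"
    return "small"
```

Here `psth_early` would be the PSTH for the larger of two disks and `psth_late` that for the smaller, giving one boundary per adjacent pair of sizes.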
To study the respective roles of the spike count and time to first spike, we formed a decoder that combined both of these response variables. To do this, we evaluated the full conditional probability P(S|N,τ), where N is the number of spikes in the first 200 ms after the target onset, and τ is the time to first spike after the stimulus onset. We used a maximum likelihood algorithm to estimate the stimulus identity, as described above (Eq. 1). We found that the average success rate was 48 ± 2%, with the best cell achieving 75% success (Table 1). This result was roughly the same as the performance with either response variable alone, indicating that these two response variables are highly redundant.
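A combined decoder of this kind can be sketched as a lookup in a joint histogram of spike count and binned latency. The sketch below is illustrative only: the latency bin edges, the add-one smoothing, and the function names are assumptions and are not taken from the study. Trials without a spike can be passed as an infinite latency, which falls in the last latency bin.

```python
import numpy as np

def fit_joint_table(stimuli, counts, latencies_ms, n_stimuli, max_count, latency_edges_ms):
    """Estimate P(N, tau_bin | S) from training trials as a smoothed histogram.

    stimuli: integer stimulus label per trial (0 .. n_stimuli - 1)
    counts: spike count per trial (clipped to max_count)
    latencies_ms: time to first spike per trial (np.inf if no spike)
    """
    stimuli = np.asarray(stimuli, dtype=int)
    n = np.clip(np.asarray(counts, dtype=int), 0, max_count)
    tau_bin = np.digitize(latencies_ms, latency_edges_ms)  # values in 0 .. len(edges)
    n_bins = len(latency_edges_ms) + 1
    table = np.ones((n_stimuli, max_count + 1, n_bins))  # add-one smoothing (assumption)
    np.add.at(table, (stimuli, n, tau_bin), 1.0)
    return table / table.sum(axis=(1, 2), keepdims=True)

def decode_joint(table, count, latency_ms, latency_edges_ms):
    """Maximum likelihood stimulus given one trial's spike count and latency."""
    n = min(max(int(count), 0), table.shape[1] - 1)
    tau_bin = np.digitize(latency_ms, latency_edges_ms)
    return int(np.argmax(table[:, n, tau_bin]))
```

Because the joint table has many more entries than either marginal, it needs more trials per stimulus to be sampled adequately, which is one reason to prefer coarse latency bins in a sketch like this.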
Phase locking between stimulus and response decoder.
Our analysis showed that simply counting the number of spikes elicited by fixational eye movements gives poor discrimination among different target sizes. However, the firing rate for some cells was clearly locked to the phase of the tremor (Fig. 4). To study whether more detailed spike timing during tremor was informative, we calculated the firing rate relative to the beginning of each tremor cycle (Fig. 5C). Next, we divided this PSTH into four time bins, where the boundary between time bins corresponded to the beginning of each tremor period. We formed a decoder that counted spikes in these four bins and combined those counts to estimate disk size using the approximation given by Eq. 1. This choice allowed the decoder to make use of time locking to the tremor cycle while also allowing us to effectively sample the joint probability distribution of stimulus and response from a limited data set.
On average, the performance of this decoder was 40 ± 1%, which was slightly better than the decoder that simply counted spikes during the fixation. However, this improvement was small because most cells simply did not respond reliably at all during the fixation. The best cell achieved a success rate of 56% using time-locking to tremor compared with only 49% for spike counts alone (Table 1). Thus the brain can derive some additional information using more detailed spike timing during fixational eye movements.
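The response variable used by the phase-locking decoder can be illustrated with a short sketch. It assumes one particular reading of the binning, namely that the four bins divide each tremor cycle into equal phase quarters; the function name, the parameters, and the 5-Hz value in the usage note are illustrative and not taken from the analysis code.

```python
import numpy as np

def phase_bin_counts(spike_times_s, tremor_freq_hz, n_bins=4, tremor_onset_s=0.0):
    """Count spikes in n_bins equal phase bins of the tremor cycle.

    Assumption: tremor is periodic with a fixed frequency, and its first
    cycle starts at tremor_onset_s.
    """
    spike_times_s = np.asarray(spike_times_s, dtype=float)
    # Phase within the cycle, expressed as a fraction in [0, 1)
    phase = ((spike_times_s - tremor_onset_s) * tremor_freq_hz) % 1.0
    bins = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    return np.bincount(bins, minlength=n_bins)
```

With a 5-Hz tremor (200-ms period), spikes at 0.01, 0.21, and 0.41 s all fall in the first phase bin and a spike at 0.11 s falls in the third. The four counts can then be treated as four response variables and combined under the same conditional independence approximation used for the other decoders.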
Integrating information from multiple cells
Thus far, we showed that single ganglion cells do not provide enough information to estimate the target size with sufficient accuracy to account for the animal's behavioral ability (Fig. 1D; Table 1). This result holds for a broad class of decoders that use spike counts, spike timing, or combinations of both. Although these decoders represent a large variety of decoding schemes, we should emphasize that the brain may use a more sophisticated decoder that achieves a better result.
Obviously, the archer fish brain must combine information from multiple cells. We can make a naïve estimate of the number of ganglion cells that could contribute to this task. Assuming that any ganglion cell that has the smallest disk fall in its receptive field center can participate, we have an area of radius 140 μm (disk radius = 15 μm, receptive field radius = 125 μm; see methods). Multiplying by the ganglion cell density in the fovea (5,000 cells/mm²; see Fig. 2A), one finds that ∼300 ganglion cells could be involved. This indicates extensive potential for a population code. One should note that a binocular fish can use information from both eyes, and therefore we need to double this estimate. In addition, combining information from ganglion cells belonging to the two eyes does not require any conceptually different analysis from combining information from cells within a single eye.
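For reference, the arithmetic behind this estimate can be written out explicitly. All input values are those quoted in the text; only the intermediate rounding is ours.

```latex
\begin{align*}
r &= 15\,\mu\text{m} + 125\,\mu\text{m} = 140\,\mu\text{m} = 0.14\,\text{mm} \\
A &= \pi r^{2} = \pi\,(0.14\,\text{mm})^{2} \approx 0.0616\,\text{mm}^{2} \\
N_{\text{one eye}} &\approx 5{,}000\,\text{cells/mm}^{2} \times 0.0616\,\text{mm}^{2} \approx 308 \approx 300 \\
N_{\text{two eyes}} &\approx 2 \times 300 = 600
\end{align*}
```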
We constructed population decoders based solely on the spike count in the first 200 ms of a fixation. We made this choice because we have not identified any response variable that carries significantly more information at the single-cell level and also because this variable is easier to combine across cells than spike times. We again used Eq. 1 to combine spike counts from M cells, again assuming conditional independence.
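A population decoder of this type can be sketched as follows. We assume here that, under conditional independence, the likelihood of the population response is the product over cells of the single-cell spike count likelihoods, so that the maximum likelihood estimate is the stimulus with the largest summed log likelihood. The add-one smoothing, the clipping of large counts, and all function names are illustrative assumptions rather than details of the analysis.

```python
import numpy as np

def fit_count_likelihoods(train_counts, train_stimuli, n_stimuli, max_count):
    """Estimate log P(n_i | S) for every cell from training trials.

    train_counts: (n_trials, n_cells) spike counts in the analysis window
    train_stimuli: (n_trials,) integer stimulus labels (0 .. n_stimuli - 1)
    Returns an array of shape (n_cells, n_stimuli, max_count + 1).
    """
    train_counts = np.clip(np.asarray(train_counts, dtype=int), 0, max_count)
    train_stimuli = np.asarray(train_stimuli, dtype=int)
    n_cells = train_counts.shape[1]
    hist = np.ones((n_cells, n_stimuli, max_count + 1))  # add-one smoothing (assumption)
    for cell in range(n_cells):
        np.add.at(hist[cell], (train_stimuli, train_counts[:, cell]), 1.0)
    return np.log(hist / hist.sum(axis=2, keepdims=True))

def decode_population(log_lik, counts, cells=None):
    """Maximum likelihood stimulus for one trial, using the chosen subset of cells."""
    counts = np.clip(np.asarray(counts, dtype=int), 0, log_lik.shape[2] - 1)
    cells = np.arange(log_lik.shape[0]) if cells is None else np.asarray(cells, dtype=int)
    # Conditional independence: log likelihoods add across cells
    per_cell = log_lik[cells, :, counts[cells]]  # shape (n_selected_cells, n_stimuli)
    return int(np.argmax(per_cell.sum(axis=0)))
```

Passing a subset of cell indices through the `cells` argument gives the decoder for a subgroup of the recorded population, and the success rate is the fraction of held-out trials on which the decoded stimulus matches the presented one.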
To study the dependence of the success rate on the number of cells, we selected subgroups from all the cells using two methods. For average populations, we randomly selected subgroups for each population size and averaged the result over all subgroups. For the best population, we started with the best individual cell for group size M = 1. We formed two-cell decoders using all the remaining cells and chose the best one. Similarly, the best M-cell decoder was chosen using the best M − 1 cell decoder and trying all possible remaining cells. While this iterative method is not guaranteed to find the optimal subgroup, it should find subgroups with performance far above average.
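The iterative construction of the best population is a greedy forward selection and can be sketched as below. The `score` callable stands for any function that returns the decoding success of a subgroup of cells (for example, cross-validated performance of the population decoder); it, the toy numbers, and the function names are hypothetical and serve only to make the example runnable.

```python
def greedy_best_population(n_cells, score, max_size):
    """Grow a subgroup one cell at a time, always adding the cell that helps most.

    score: callable taking a list of cell indices and returning a success rate
    Returns a list of (selected_cells, success_rate), one entry per group size.
    """
    selected, history = [], []
    remaining = set(range(n_cells))
    while remaining and len(selected) < max_size:
        # Try every remaining cell together with the current best subgroup
        best_cell = max(sorted(remaining), key=lambda c: score(selected + [c]))
        selected.append(best_cell)
        remaining.remove(best_cell)
        history.append((list(selected), score(selected)))
    return history

# Toy example with made-up single-cell success rates (not data from this study)
toy_success = [0.40, 0.70, 0.50]

def toy_score(cells):
    """Hypothetical score: probability that at least one selected cell is correct."""
    miss = 1.0
    for c in cells:
        miss *= 1.0 - toy_success[c]
    return 1.0 - miss

history = greedy_best_population(3, toy_score, 2)
```

In the toy example, the procedure first picks cell 1 and then adds cell 2. Greedy selection evaluates on the order of M times the number of cells subgroups rather than all possible combinations, which is why it is tractable but not guaranteed to find the optimal subgroup.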
The performance of average populations increased steadily as more cells were added, rising to 78% for M = 9 cells (Fig. 6). This is not quite good enough to explain the animal's behavior, but if we linearly extrapolate the trend for more than four cells, we estimate that average populations with M ∼ 14 cells are sufficient. Selecting the best possible population gave much better performance: here, only two to three cells were needed to match the animal's behavior. In addition, the performance saturated at a success rate of ∼96% with populations of more than four cells. Thus the archer fish brain can integrate information from a handful of neurons to achieve the behavioral success rate. Finally, the same analysis was applied to the time window that contained the spikes caused by fixational eye movements. We found that the success rate was only marginally better than the chance value (Fig. 6). This result supports the finding obtained previously using single-cell analysis that the beginning of fixation (end of saccade) is the most informative part of the eye movement in this particular task.
We showed that single ganglion cells can discriminate among targets of different sizes with a success rate much better than chance when driven by realistic eye movements. Although fixational eye movements carried significant information about target size, we found that the beginning of fixation was far more informative. We considered several possible algorithms for using the information encoded in retinal spike trains to discriminate among targets and found that a very simple decoder based on counting the number of spikes elicited by the end of the saccade performed well. Individual ganglion cells could not account for the archer fish's behavioral ability, but small groups of randomly chosen ganglion cells could.
Role of saccadic eye movements
In humans, stabilizing gaze so that fixational eye movements are largely eliminated has the dramatic effect of causing visual perception to fade away in a few seconds (Ditchburn and Ginsborg 1952; Riggs et al. 1953). This has led many to argue that fixational eye movements provide much of our information about static visual scenes (Ahissar and Arieli 2001). However, gaze stabilization also removes saccades. How much visual information do we really derive during fixations? Clearly, we can perform some tasks based solely on fixational eye movements, such as being able to see during fixations lasting for tens of seconds. However, it may be the case that, during normal vision, most of our information actually comes from saccades.
Many studies have acknowledged the problems that saccades pose for vision, such as extreme blurring of the retinal image (Ross et al. 2001) and changing the relative position between external world and the retina (Ross et al. 2001; Thilo et al. 2004). However, saccades also allow the acquisition of information about static patterns in the visual world by “flashing” a new image onto the retina and eliciting high firing rates in many ganglion cells (Puchalla et al. 2005; Roska and Werblin 2003). Because the information rate of ganglion cells tends to increase with the firing rate (Koch et al. 2006; Puchalla et al. 2005), one might expect the end of the saccade to be very informative about the outside world. Can this retinal information escape the partial saccadic suppression and actually be used by the brain? In humans, the release from suppression is fast: if an image is stabilized during a saccade for 30 ms (i.e., after the saccade started and before it ends), the subject can perceive the image (Carpenter 1977; Ross et al. 2001). Furthermore, saccades suppress only a subset of visual signals, for example, those carried by the magnocellular pathway in primates (Ross et al. 2001). In the archer fish, ganglion cell responses occur with a latency of 80–100 ms after the end of the saccade, a latency long enough to suggest that saccadic suppression is no longer in effect.
The conclusions that can be drawn from this study are limited by the fact that we used a specific discrimination task. The relative importance of saccades and fixations as well as detailed timing information may be different for other kinds of visual discrimination. For instance, the response to the end of the saccade exhibited a variety of latencies, and some cells even had two peaks of firing (Fig. 4, cell 4). This variety was not useful for discriminating between different disk sizes, but it might prove useful for other tasks. Similarly, the response driven by tremor might help localize a spatial edge or give information about its contrast, but we did not explicitly vary the disk location or contrast in our experiments. The strength of our experimental design is that we tested how the retina encodes visual stimuli that were matched to a behavioral task, so that we know that the information we attempted to extract from retinal spike trains was, in fact, relevant to the animal.
An additional limitation of this study is that we only considered several simple decoding algorithms. It is possible that the brain might be able to use a more sophisticated decoding strategy with a longer integration time to extract additional information from retinal spike trains, especially during fixations. Again, such strategies might be more important for other visual tasks, such as localizing an object. However, simple decoding strategies set a lower bound on the information conveyed by retinal spike trains that can be far less sensitive to sampling problems than a full information calculation. Furthermore, the strategies that we considered are all easily implemented with neural hardware.
Comparison of the archer fish and human visual systems
How similar is vision in the archer fish to humans? Both archer fish and primates have a region in the retina—the fovea—that looks directly forward, has a binocular visual field, and is devoted to vision with high visual acuity. This anatomical fact has important consequences for the way that visual space is sampled with eye movements. In both the archer fish and primates, visual search shares a similar strategy: fast saccades move the center of gaze onto objects of interest and gaze remains relatively stable during periods of fixation.
While the basic strategies are similar, there are differences in the details. The archer fish saccades every 1–4 s, whereas humans make saccades as often as 2–4 times per second. In addition, humans perform microsaccades during the fixation period at a rate of up to 3–5 times/s, which makes them the most frequent type of saccade. The archer fish makes microsaccades only rarely according to our measurements. The fact that humans make saccades (and microsaccades) more frequently than archer fish suggests that saccades and microsaccades may provide even more visual information in humans than in archer fish. Furthermore, saccades and microsaccades are produced by a common physiological system in primates (Carpenter 1977; Zuber and Stark 1965), so that microsaccades in primates can be expected to generate effects similar to those produced by the nonfixational saccades in the archer fish. Thus fixational microsaccades in humans and other primates may well be involved in size discrimination tasks similar to the task we used and, in fact, may be as significant as nonfixational saccades in some conditions.
Recently, Rucci et al. (2007) showed that human fixational eye movements improve the visibility of high-spatial frequency information in the image. The results of Rucci et al. may seem to be in contrast with the results from this study. However, the conclusions from both studies can be reconciled by considering a differential role of microsaccades versus drift and tremor. That is, Rucci et al. found that fixational eye movements improve vision, but they did not address which part of the fixational eye movements is most important. Our study suggests that drift and tremor by themselves are not sufficient to improve visual perception. Thus a possible explanation is that microsaccades are the most important component of eye movements for improving vision during fixation.
An additional difference between human and archer fish vision is that the angular velocity during saccades in archer fish was typically 50–100°/s, whereas humans make saccades with velocities up to several hundred degrees per second (Carpenter 1977). We suspect that these differences in saccadic velocity may not be so important for deriving information about static spatial features, because our data showed that a 50°/s saccade and a sudden flash (corresponding to a very high velocity saccade) were equally informative about target size. Finally, archer fish, like humans, perform saccades with a linear relationship between saccade amplitude and mean velocity, which may indicate a common mechanism for saccade generation.
Fixational eye movements in archer fish have several quantitative differences from those in humans. Although the eye velocity is about the same in both species, the movement covers about four photoreceptors per second in archer fish but 10–30 foveal cones per fixation in humans (Ahissar and Arieli 2001; Martinez-Conde et al. 2004; Pritchard 1961; Steinman et al. 1973). This difference results from the fact that the human eye is larger and that its photoreceptors are smaller. In addition, midget ganglion cells in the human fovea have a receptive field equivalent to roughly one cone, whereas archer fish ganglion cells typically have a receptive field that spans tens of cones. This means that image motion during fixations is even smaller relative to the ganglion cell receptive field in archer fish than in humans.
This difference could be very important, because midget ganglion cells in the human fovea may fire much more strongly and with better time locking during fixational eye movements than we found in the archer fish. We do not actually know how strongly midget cells fire during fixations in natural stimulus conditions. Also, ganglion cells in the human periphery have much larger receptive fields, such that eye movements span a similar portion of their receptive field as for archer fish ganglion cells. Thus our results in the archer fish may apply more closely to the human periphery.
There are two more differences between human and archer fish eye movements that are harder to interpret. Tremor in humans is comparatively smaller (∼1 min vs. ∼10 min) and has a higher frequency (40+ vs. 5 Hz) than in archer fish (Ahissar and Arieli 2001). This suggests that tremor might be more important in archer fish than in humans. However, the velocity caused by tremor is again quite similar in both species (∼1°/s), and, interestingly, the differences in tremor frequency and fixational duration trade off, so that both humans and archer fish experience a similar number of tremor cycles per fixation (∼10). Another difference is that the archer fish eye moves mainly along the horizontal axis, a pattern that is characteristic of many teleost fish (Land and Nilsson 2002; Mensh et al. 2004). Humans also show asymmetric dynamics of horizontal and vertical eye movements (Liang et al. 2005), which may result from the fact that horizontal and vertical components of saccades are controlled by different brain stem nuclei (Sparks 2002). The significance of this bias in favor of horizontal eye movements is not known.
Relation to other studies
Greschner et al. (2002) made a detailed study of the contribution of eye movements to visual processing in the turtle retina. They showed that wobbling a static image (a square wave grating) in a fashion matched to measured eye movements in the turtle increased ganglion cell firing. They could decode the activity of a population of ganglion cells to decide which grating frequency was presented and concluded that image movement caused by eye movements can help distinguish spatial features in a visual scene. In our study, we found that fixational eye movements allowed the brain to extract some information about target size, similar to the results of Greschner et al. However, we also found that saccades were significantly more informative.
Finally, our finding that saccades elicit a more vigorous response from the retinal ganglion cells is consistent with the study by Martinez-Conde et al. (2002), who found that microsaccades drive the response of center-surround neurons in the awake monkey lateral geniculate nucleus (LGN). Furthermore, Donner and Hemila (2007) found in a theoretical study that microsaccades may improve the resolution of two closely spaced lines. These studies provide additional support for our conclusion that saccades present not only a problem to the visual system but also an opportunity for the retina to encode information.
Generalization of spike count for encoding different aspects of the stimulus
An important issue raised by our finding that a spike count code can be used by the fish brain to determine the target size is the extent to which spike count codes can be generalized to the more complicated situations a fish might cope with in the real world. That is, the number of spikes depends also on contrast, background, texture, adaptation state, and even target color. For example, a large target with low contrast might elicit the same number of spikes as a small target with high contrast. In our view, the solution to this ambiguity in a spike count neural code may come from using information available at the population level. Different ganglion cells have different sizes of receptive fields and different response characteristics to light level changes. This diversity of responses leads to a situation in which different aspects of the target are encoded by different combinations of ganglion cells firing together, such that ambiguity in the code of single neurons can be resolved at the level of populations. Future study of the properties of population codes will be needed to address this important issue.
This study was supported by the E. Mathilda Ziegler Foundation and National Eye Institute Grant EY-01496.
We thank J. Puchalla and R. Harris for helping in the initial stages of this project and G. Lewen for helping in data acquisition and training the fish.
The online version of this article contains supplemental data.
- Copyright © 2007 by the American Physiological Society