## Abstract

Sensory experience typically depends on the ensemble activity of hundreds or thousands of neurons, but little is known about how populations of neurons faithfully encode behaviorally important sensory information. We examined how precisely speed of movement is encoded in the population activity of magnocellular-projecting parasol retinal ganglion cells (RGCs) in macaque monkey retina. Multi-electrode recordings were used to measure the activity of ∼100 parasol RGCs simultaneously in isolated retinas stimulated with moving bars. To examine how faithfully the retina signals motion, stimulus speed was estimated directly from recorded RGC responses using an optimized algorithm that resembles models of motion sensing in the brain. RGC population activity encoded speed with a precision of ∼1%. The elementary motion signal was conveyed in ∼10 ms, comparable to the interspike interval. Temporal structure in spike trains provided more precise speed estimates than time-varying firing rates. Correlated activity between RGCs had little effect on speed estimates. The spatial dispersion of RGC receptive fields along the axis of motion influenced speed estimates more strongly than along the orthogonal direction, as predicted by a simple model based on RGC response time variability and optimal pooling. ON and OFF cells encoded speed with similar and statistically independent variability. Simulation of downstream speed estimation using populations of speed-tuned units showed that peak (winner take all) readout provided more precise speed estimates than centroid (vector average) readout. These findings reveal how faithfully the retinal population code conveys information about stimulus speed and the consequences for motion sensing in the brain.

## INTRODUCTION

An essential function of sensory systems is to extract specific information about the environment efficiently from the activity of peripheral neurons. Current understanding of this process is based mostly on examination of how faithfully the activity of an individual peripheral or central neuron represents a sensory variable, such as the number of incident photons or the direction of movement (e.g., Barlow et al. 1971; Baylor et al. 1979; Bialek et al. 1991; Britten et al. 1992; Copenhagen et al. 1987). However, in peripheral sensory structures, behaviorally important information is usually represented not by the activity of an individual neuron but by the concerted activity of many neurons. For example, visual motion, form, and texture are encoded by the ensemble activity of hundreds or thousands of retinal ganglion cells (RGCs) that do not individually signal these stimulus attributes. Yet little is known about how faithfully stimulus information is conveyed by sensory population codes, what limits the fidelity of the encoding, and what computations are required to extract the information efficiently downstream.

Approaching these problems poses a major challenge: recording from the entire population of cells relevant for a behaviorally important sensory task. Although modern techniques allow recording from several dozen neurons simultaneously, in most experimental systems it is unclear how to target the cells responsible for a specific neural computation and record from the entire population. A system with unusual promise is the encoding of visual motion in the primate retina (Chichilnisky and Kalmar 2003). Waves of activity in the population of parasol (magnocellular-projecting) RGCs carry information about visual motion to circuits in the brain responsible for motion sensing, and it has recently become possible to record from ∼100 parasol cells with receptive fields that almost completely tile a substantial region of visual space (Chichilnisky and Kalmar 2002; Litke et al. 2004). Because parasol cells are not individually direction selective, visual motion information is carried by population activity. The fidelity of this population code places the ultimate limits on cortical motion processing and behavioral motion sensing, which have been examined extensively in monkeys and humans. Finally, the wave of activity traversing the retina is an elementary representation likely to be recapitulated in other sensory structures. For these reasons, the encoding of motion in the primate retina provides an opportunity to understand fully a behaviorally important population code and the problems faced by the brain in reading it out.

Here we focus on estimating the speed of a moving object from retinal responses, which is required for visually guided behaviors such as tracking eye movements and target interception. To study the fidelity of the population code, we estimated stimulus speed directly from the responses of ∼100 simultaneously recorded ON and OFF parasol RGCs, using an efficient procedure. We then examined how the precision of speed estimates depended on several aspects of the retinal representation: the number and spatial arrangement of cells, detailed temporal patterns of spiking, correlated activity between cells, noise in retinal circuits, and the relative efficiency and independence of signals in different cell types. We provide a theoretical framework to explain the observed speed estimate precision in terms of optimal pooling of RGC responses with a given temporal precision. Finally, we examine the limits to motion sensing that would be imposed by different readout architectures in the brain. Together, these results reveal how retinal processing and signaling limit the fidelity of visual motion sensing and how downstream structures can most efficiently exploit the retinal population code for perception and behavior.

## METHODS

### Recordings

Eyes were obtained from two deeply and terminally anesthetized macaque monkeys (*Macaca mulatta, M. radiata*) used by other experimenters, in accordance with institutional guidelines for the care and use of animals. Immediately after enucleation the anterior portion of the eye and vitreous were removed in room light and the eye cup was placed in bicarbonate buffered Ames' solution (Sigma, St. Louis, MO) and stored in darkness at 35–36°C, pH 7.4, for ≥20 min prior to dissection. Under infrared illumination pieces of peripheral retina 3–5 mm in diameter, isolated from the retinal pigment epithelium, were placed flat against a planar array of 512 extracellular microelectrodes, covering an area of 1,890 × 900 μm, that were used to record action potentials from retinal ganglion cells (Litke et al. 2004). The preparation was perfused with Ames' solution bubbled with 95% O₂-5% CO₂ and maintained at 35–36°C, pH 7.4.

Retinal eccentricity was measured with a precision of ±2 mm. Eccentricity was converted to a temporal equivalent value because the contours of constant RGC density (and thus presumably dendritic and receptive field size) in the macaque monkey retina are approximately semicircular in the temporal half of the retina, but elliptical with an aspect ratio of 0.61 in the nasal half (Perry and Cowey 1985; Watanabe and Rodieck 1989). Thus a location $X$ mm nasal and $Y$ mm superior (or inferior) to the fovea was assigned an equivalent eccentricity of $[(0.61X)^2 + Y^2]^{1/2}$. A location $X$ mm temporal and $Y$ mm superior (or inferior) to the fovea was assigned an equivalent eccentricity of $(X^2 + Y^2)^{1/2}$. Visual angle, $A$, in degrees, was computed from temporal equivalent eccentricity, $E$, in mm, using the relation $A = 0.1 + 4.21E + 0.038E^2$ (Dacey and Petersen 1992; Perry and Cowey 1985). The temporal equivalent eccentricity (visual angle) of each of the three pieces of retina examined was: 9.7 mm (45°); 9.0 mm (41°); 8.4 mm (38°).
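The eccentricity conversion above can be sketched as follows. This is a minimal illustration assuming locations are given in millimeters from the fovea; the function names are hypothetical, and only the 0.61 nasal compression factor and the quadratic eccentricity-to-angle relation come from the text.

```python
import math


def temporal_equivalent_eccentricity(x_mm: float, y_mm: float, nasal: bool) -> float:
    """Temporal-equivalent eccentricity (mm) of a point x_mm horizontally and
    y_mm vertically from the fovea. Nasal horizontal offsets are scaled by 0.61
    to account for the elliptical iso-density contours of the nasal retina."""
    if nasal:
        x_mm *= 0.61
    return math.hypot(x_mm, y_mm)


def visual_angle_deg(ecc_mm: float) -> float:
    """Visual angle A (deg) from eccentricity E (mm): A = 0.1 + 4.21 E + 0.038 E^2."""
    return 0.1 + 4.21 * ecc_mm + 0.038 * ecc_mm ** 2


# Eccentricities of the three preparations reported in the text.
for ecc in (9.7, 9.0, 8.4):
    print(f"{ecc} mm -> {visual_angle_deg(ecc):.0f} deg")
```

Running the loop prints 45, 41, and 38 deg, matching the rounded visual angles quoted for the three preparations.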

Voltage waveforms recorded from each electrode were digitized at 20 kHz and stored for off-line analysis (Litke et al. 2004). Spikes were identified using a threshold equal to three times the typical noise level on each electrode, and spikes from different cells were segregated as follows (Litke et al. 2004). For each recorded spike on the reference electrode, the waveform of the spike and the simultaneous waveforms on six surrounding electrodes in the array were used as a signature of the spike. These signatures were reduced to five dimensions using principal components analysis, and clusters in this space were identified by fitting a collection of *N*-dimensional Gaussian distributions using expectation maximization. Duplicate cells were identified by temporal coincidence. The accuracy of spike sorting was checked by verifying the presence of refractoriness 0.5–1.0 ms after the spike.
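Two of the spike-sorting steps above, reduction of spike signatures to five principal components and the refractoriness check, might be sketched as below. This is an illustration under stated assumptions, not the sorting code of Litke et al. (2004): the signature layout (concatenated samples from the reference and neighboring electrodes), the 1-ms refractory cutoff, and the function names are hypothetical. The expectation-maximization Gaussian mixture fit is omitted; a library implementation such as scikit-learn's `GaussianMixture` could supply it.

```python
import numpy as np


def pca_features(signatures: np.ndarray, n_components: int = 5) -> np.ndarray:
    """Project spike signatures (n_spikes x n_samples, e.g. concatenated
    waveforms from the reference and surrounding electrodes) onto their
    leading principal components."""
    centered = signatures - signatures.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def refractory_violation_fraction(spike_times_s, refractory_s: float = 1e-3) -> float:
    """Fraction of interspike intervals shorter than refractory_s (assumed cutoff);
    a well-isolated cell should show almost none."""
    isis = np.diff(np.sort(np.asarray(spike_times_s, dtype=float)))
    return float(np.mean(isis < refractory_s)) if isis.size else 0.0
```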

Data from ON and OFF parasol cells recorded from three preparations are presented in Results. ON and OFF populations were analyzed separately because of their different response kinetics (Chichilnisky and Kalmar 2002). The following numbers of cells were analyzed: retina 1: 40 ON, 49 OFF; retina 2: 56 ON, 35 OFF; retina 3: 63 ON, 68 OFF. To exclude the possibility that a few unstable or subsampled cells would influence the results, additional cells with response properties differing substantially from other cells of the same functional type were identified and excluded as follows. Spike trains of all cells of the same type were aligned in time by circularly shifting by an amount equal to the location of the center of the receptive field divided by the stimulus speed. The inner product of the response of each cell with the mean response across all cells was computed. Cells for which the inner product was >2 SDs from the mean were excluded from further analysis (7, 12, and 10 cells in the 3 retinas examined).
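The exclusion criterion above can be sketched as follows, assuming binned spike counts, receptive field positions that increase along the direction of motion, and delays rounded to whole bins; these choices and the function name are hypothetical rather than taken from the original analysis.

```python
import numpy as np


def find_outlier_cells(responses, rf_centers, speed, dt, n_sd=2.0):
    """Flag cells whose aligned response departs from the population template.

    responses  : (n_cells, n_bins) binned spike counts from one cell type
    rf_centers : receptive-field centers along the motion axis
    speed      : stimulus speed (position units per second)
    dt         : bin width in seconds
    Returns a boolean mask, True for cells to exclude.
    """
    shifts = np.round(np.asarray(rf_centers, dtype=float) / speed / dt).astype(int)
    # Undo each cell's expected response delay by circular shifting,
    # so that all responses should line up with one another.
    aligned = np.array([np.roll(r, -k) for r, k in zip(responses, shifts)])
    template = aligned.mean(axis=0)
    scores = aligned @ template          # inner product with the mean response
    return np.abs(scores - scores.mean()) > n_sd * scores.std()
```

A cell with no stimulus-locked response has an inner product far below that of the others and is flagged; cells whose aligned responses match the template are kept.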

### Stimuli

The retina was stimulated with the optically reduced (2.9 mm diam) image of a cathode ray tube display refreshing at 120 Hz, focused on the photoreceptor layer by a microscope objective, and centered on the electrode array. Stimuli were attenuated to low photopic light levels using neutral density filters. Stimuli were presented as modulations around a mean gray background. The background photon absorption rate for the long (middle, short) wavelength-sensitive cones was approximately equal to the rate that would have been caused by a spatially uniform monochromatic light of wavelength 561 (530; 430) nanometers and intensity 9,200 (8,700; 7,100) photons·μm⁻²·s⁻¹, incident on the photoreceptors.

RGCs were characterized and classified on the basis of their responses to a spatiotemporal white noise stimulus presented for 30 min (see Chichilnisky 2001; Sakai et al. 1988). The stimulus was a square lattice of randomly flickering pixels. Random flicker was created by selecting the intensities of the red, green, and blue display phosphors at each pixel location independently from a Gaussian or binary (2-valued) distribution on each stimulus frame. The light response properties of each cell were summarized by the average stimulus on the display over 250 ms preceding a spike (spike-triggered average, STA). The STA is a measure of how effectively stimuli at different locations and with different colors are integrated by the cell over time to control firing. The structure of each receptive field was measured by fitting the STA with a difference of elliptical Gaussians (center-surround) spatial profile, a difference of low-pass filters temporal profile, and a relative sensitivity to modulation of each phosphor. The product of these terms provided accurate fits to the space-time-color STA (Chichilnisky and Kalmar 2002). The receptive field diameter was defined as the geometric mean of the lengths of the major and minor axes of the 1 SD ellipse of the center component of the fit to the STA. The mean receptive field diameters for the parasol cells in each of the three retinas recorded were 150, 128, and 109 μm, respectively.
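The spike-triggered average and the diameter definition above can be sketched as below. This is a minimal sketch assuming spikes have already been assigned to display frames and that the "length" of an axis of the 1 SD ellipse means its full extent (2 SD); the parametric space-time-color fit itself is omitted, and the function names are hypothetical.

```python
import numpy as np


def spike_triggered_average(stimulus, spike_frames, n_lags):
    """Average of the n_lags stimulus frames up to and including each spike frame.

    stimulus     : (n_frames, ...) array, one entry per display frame
    spike_frames : index of the frame during which each spike occurred
    Returns an (n_lags, ...) array; index 0 is the frame containing the spike,
    index k is k frames earlier.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    sta = np.zeros((n_lags,) + stimulus.shape[1:])
    n_used = 0
    for f in spike_frames:
        if f >= n_lags - 1:              # skip spikes without a full stimulus history
            sta += stimulus[f - n_lags + 1 : f + 1][::-1]
            n_used += 1
    return sta / max(n_used, 1)


def rf_diameter(sd_major, sd_minor):
    """Geometric mean of the full major and minor axis lengths (2 SD each)
    of the 1-SD ellipse of the fitted receptive-field center."""
    return float(np.sqrt((2.0 * sd_major) * (2.0 * sd_minor)))
```

For a 250-ms window, `n_lags` would be the number of stimulus frames spanning 250 ms, which depends on the (unstated here) stimulus frame rate.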

Moving bars were presented in blocks of trials with constant speed; direction of motion (0, 90, 180, and 270°) and contrast (±96%) were randomly interleaved within each block. The spatial profile of the bar in the direction of motion was a Gaussian function with a SD of 97 μm. The spatial profile of the bar orthogonal to the direction of motion was uniform and covered the entire area recorded. The speeds (number of trials) probed in each retina were: 7.3°/s (110–167 trials); 14.5°/s (144–214 trials); 29.0°/s (232–347 trials); 58.1°/s (338–505 trials). Stimulus dimensions and speeds were converted to degrees using the approximation 200 μm/° for the peripheral macaque retina (Perry and Cowey 1985).

The rasterization of the CRT display introduced a space-time sampled approximation of a moving bar. For example, a bar nominally moving at 58.1°/s (the highest speed tested) was in fact redrawn on the CRT every 8.33 ms, displaced by 97 μm at each refresh. The effect of this discretization was probably small. First, the refresh interval of the display was substantially shorter than the ∼60 ms excitatory portion of the parasol RGC impulse response (Chichilnisky and Kalmar 2002). Second, the spatial displacement of the bar at the highest speed tested was 1 SD of the bar profile and smaller than the receptive field diameter and the separation of ON and OFF parasol cells (e.g., see Fig. 1*A*).
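The per-frame displacement quoted above follows directly from the stimulus parameters in this section; the short check below uses only the 200 μm/° approximation, the 120-Hz refresh rate, and the 97-μm bar SD stated in the text (the function name is illustrative).

```python
UM_PER_DEG = 200.0   # retinal magnification approximation used in the text
REFRESH_HZ = 120.0   # CRT refresh rate
BAR_SD_UM = 97.0     # SD of the Gaussian bar profile


def step_per_frame_um(speed_deg_s: float) -> float:
    """Displacement of the bar between successive display frames, in micrometers."""
    return speed_deg_s * UM_PER_DEG / REFRESH_HZ


for speed in (7.3, 14.5, 29.0, 58.1):
    step = step_per_frame_um(speed)
    print(f"{speed:5.1f} deg/s: {step:5.1f} um per frame ({step / BAR_SD_UM:.2f} bar SD)")
```

At the highest speed the step is about 96.8 μm, i.e., roughly one bar SD, and it shrinks proportionally at the lower speeds.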

### Comparison to in vivo recordings

The maintained firing rate (mean ± SD across ON and OFF cells) during exposure to spatially uniform background light was: 5.7 ± 0.3 and 1.8 ± 1.4, 4.6 ± 2 and 0.2 ± 0.2, and 2.1 ± 1.6 and 1.8 ± 1.7 Hz in each of the three retinas, respectively. These values were low compared with 21 ± 9 Hz reported for magnocellular-projecting RGCs in anesthetized, paralyzed animals (Troy and Lee 1994). The reason for the discrepancy is unclear. However, peak stimulus-evoked firing rates were comparable to those observed in magnocellular-projecting cells recorded in vivo. The peak firing rate (mean ± SD across ON and OFF cells) elicited by bars moving at 14.5°/s, measured in 25-ms bins, was: 78 ± 17 and 95 ± 32, 80 ± 21 and 77 ± 23, and 95 ± 26 and 84 ± 30 Hz in each of the three retinas recorded, respectively. In a previous study (Kremers et al. 1993), as the contrast of a 1.22-Hz square-wave modulation approached 100%, the peak firing rate (computed in 25-ms bins and expressed as an increment above an assumed maintained rate of 20 Hz) approached a maximum of ∼100 Hz. Because the Gaussian bar used in the present experiments enters the receptive field gradually and continues moving, it would be expected to elicit a somewhat smaller peak response, as was observed.

## RESULTS

To understand the retinal population code for visual motion, we address three main issues. First, we determine what neural computation is required to extract the speed of a moving stimulus efficiently from RGC population activity. Second, we use this computation to characterize how faithfully the RGC population code specifies stimulus speed, and provide a simple explanatory model based on the timing precision of RGC responses. Third, we examine how essential aspects of retinal and central processing influence the precision of speed estimates.

### Extracting speed estimates from RGC population activity

The following four sections describe the foundation for measuring the retinal population code for motion and efficiently reading out the stimulus speed from measured spike trains.

#### • Measuring the entire population code

A challenging step in understanding a sensory population code is obtaining simultaneous recordings from the entire collection of relevant cells. The principal signals used by the visual cortex to sense motion are thought to be conveyed by the morphologically defined ON and OFF parasol RGCs (Polyak 1941), the axons of which project to the magnocellular layers of the lateral geniculate nucleus (see Merigan and Maunsell 1993; Van Essen 1985). The cell bodies of the ON and OFF parasol populations each form a regular mosaic with dendritic fields that tile the surface of the retina and thus uniformly sample visual space (Dacey and Brace 1992). To examine parasol cell population activity over a region of visual space, multi-electrode recordings were performed in pieces of peripheral primate retina. Visual responses of several hundred isolated RGCs, with receptive fields collectively covering ∼5° × 10° of visual angle, were recorded simultaneously using a 512-electrode system (Litke et al. 2004). Analysis was restricted to two functionally defined cell types having receptive field tiling and density, spectral sensitivity, response kinetics, and contrast gain that closely correspond to those of the ON and OFF parasol cells (Chichilnisky and Kalmar 2002). These two cell types will be referred to as parasol cells in what follows.

An example of the ensemble activity elicited by a moving bar superimposed on a photopic background is shown in Fig. 1. Figure 1*A* shows the receptive field outlines of a mosaic of 56 ON parasol cells obtained with white-noise stimulation and reverse correlation (see Methods), along with an image of a moving bar with a Gaussian intensity profile. The nearly complete mosaic of receptive fields provides strong evidence that in this region of retina, nearly every ON parasol cell was recorded, revealing the complete population code. Figure 1*B* shows, in raster format, the spike trains obtained from these cells in a single trial in which the bar drifted from left to right. As the bar crossed the receptive field of each cell, it elicited spikes in excess of background activity. The relative timing of responses in different cells reflects a wave of activity in the parasol cell population. This wave is the principal signal used by the cortex to sense visual motion.

#### • Speed estimation

To probe how faithfully parasol RGCs signal visual motion, a procedure was developed to estimate the speed of the moving bar directly from the relative timing of responses in different cells. The procedure, described in this section, was then applied to quantify the precision of speed estimates across trials.

The concept behind the speed estimation procedure is that if all RGCs respond identically, then a translating stimulus should on average produce the same response waveform in each cell, shifted in time. Thus the evidence for movement at a particular speed is given by the degree of alignment of spike trains from different cells, after compensating for the response time shift expected at that speed (see Fig. 1). This concept can be implemented using the peak response in a collection of detectors tuned for different speeds. The output of each detector is based on cross-correlation (Reichardt 1961), a central element of standard models of motion sensing, including motion energy algorithms that have been used to describe the responses of direction-selective neurons in visual cortex (Adelson and Bergen 1985; Emerson et al. 1992; Simoncelli and Heeger 1998; Watson and Ahumada 1985). Note, however, that this procedure is not intended to represent an explicit model of motion sensing in the brain (see Discussion).

The procedure proceeds as follows (Fig. 2). Consider the case of two cells, *A* and *B*. A *motion signal* tuned for a particular speed is computed from their responses by delaying the spike train of one cell, smoothing both spike trains over time with a filter, multiplying the resulting signals pointwise to detect coincidences, and integrating the result over the duration of the trial. Specifically, let $r_A(t)$ and $r_B(t)$ represent the firing rate of each cell as a function of time during the trial. These are obtained by representing the spike trains at floating point resolution, convolving with a Gaussian filter $f(t) = \exp\left(-t^2/(2\tau^2)\right)$, and sampling the result at intervals of $\tau$. Denote by $\Delta x$ the known separation of the receptive fields along the axis of motion, computed using the parametric fit to the receptive field profile of each cell (see Methods). Then a motion signal indicating the evidence for movement at speed $s$ is obtained by delaying the response of cell *A* by an amount $\Delta t = \Delta x / s$, multiplying pointwise by the response of cell *B*, and summing the result over all time points in the trial:

$$R = \sum_t r_A(t - \Delta t)\, r_B(t).$$

Note that $r_A$ is circularly shifted in time, rather than cropped from a longer response, to match the length of $r_B$ (circular shifting provided a convenient and accurate approximation of an extended period of background activity before and after the response, avoiding the need to record long periods of background activity between trials). Finally, to minimize potential bias due to spontaneous activity, a signal indicating the evidence for motion at the same speed in the opposite direction is created symmetrically,

$$L = \sum_t r_B(t - \Delta t)\, r_A(t),$$

and the *net motion signal* $N$ is given by the difference, $N = R - L$.

If the speed of the stimulus matches the separation of the receptive fields divided by the delay, the delay aligns the stimulus-elicited activity in cell *A* with the stimulus-elicited activity in cell *B*, causing the product of the signals and thus the net motion signal to be large. Thus the preceding computation was repeated for a number of different detectors, each tuned for a different speed *s*. The speed estimate was the value of *s* that maximized *N* (or, for leftward stimuli, −*N*). The maximum was obtained using an iterative search (Powell's method) (Press et al. 1988) over the range 0.5–500°/s (for comparison, the range of speed tunings of neurons in area MT is roughly 2–256°/s) (Maunsell and Van Essen 1983).
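The pairwise motion signals and the search over speed tunings described above can be sketched as follows. This is a minimal sketch under several assumptions: spike times are in seconds and receptive field positions in degrees along the axis of motion, smoothing wraps around the trial to mimic circular shifting, and a dense logarithmic grid search over 0.5–500°/s stands in for Powell's method. All function names are hypothetical.

```python
import numpy as np


def smoothed_rate(spike_times, delay, tau, duration):
    """Gaussian-smoothed spike train, exp(-t^2 / (2 tau^2)), sampled every tau.

    Spikes are delayed by `delay` seconds with circular wraparound over the
    trial of length `duration`, mimicking the circular shift in the text.
    """
    t = np.arange(0.0, duration, tau)
    shifted = (np.asarray(spike_times, dtype=float) + delay) % duration
    d = t[:, None] - shifted[None, :]
    d = (d + duration / 2) % duration - duration / 2   # circular time difference
    return np.exp(-d ** 2 / (2 * tau ** 2)).sum(axis=1)


def net_motion_signal(trains, positions, speed, tau, duration):
    """Sum over all distinct pairs (A, B) of N = R - L, with delay dt = dx / s."""
    rates = [smoothed_rate(tr, 0.0, tau, duration) for tr in trains]
    total = 0.0
    for a in range(len(trains)):
        for b in range(a + 1, len(trains)):
            delay = (positions[b] - positions[a]) / speed
            right = np.dot(smoothed_rate(trains[a], delay, tau, duration), rates[b])  # R
            left = np.dot(smoothed_rate(trains[b], delay, tau, duration), rates[a])   # L
            total += right - left
    return total


def estimate_speed(trains, positions, tau=0.01, duration=1.0, direction=+1):
    """Speed tuning maximizing direction * N on a log grid over 0.5-500 deg/s
    (direction = -1 maximizes -N, as for leftward stimuli)."""
    speeds = np.geomspace(0.5, 500.0, 400)
    scores = [direction * net_motion_signal(trains, positions, s, tau, duration)
              for s in speeds]
    return float(speeds[int(np.argmax(scores))])


# Synthetic example: five cells, each firing a brief burst as a bar moving
# at 10 deg/s reaches its receptive field.
positions = [0.0, 0.5, 1.0, 1.5, 2.0]
trains = [[0.3 + x / 10.0 + o for o in (0.0, 0.005, 0.01)] for x in positions]
print(estimate_speed(trains, positions, tau=0.01, duration=2.0))  # close to 10
```

The grid search trades some resolution (about 1.7% between neighboring grid speeds here) for simplicity; a local optimizer started near the grid maximum could refine the estimate.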

The net motion signal for a collection of cells was obtained by adding the net motion signals obtained from all distinct pairs. This pairwise computation is mathematically equivalent to an approach that measures the alignment of shifted responses from all cells, by summing, squaring, and integrating shifted responses over time (Chichilnisky and Kalmar 2003). Specifically, the response $r_i(t)$ for the $i$th cell is delayed by an amount $\Delta t_i = x_i / s$, where $x_i$ is the position of the receptive field along the axis of motion, yielding a right-shifted response $r_i(t - \Delta t_i)$ and a left-shifted response $r_i(t + \Delta t_i)$. The net motion signal is

$$N = \sum_t \Big[\sum_i r_i(t - \Delta t_i)\Big]^2 - \sum_t \Big[\sum_i r_i(t + \Delta t_i)\Big]^2.$$
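The equivalence between the population expression and a sum of pairwise correlations can be checked numerically. In the sketch below (delays rounded to whole samples and applied as circular shifts, an assumption), the population signal equals twice the sum over pairs $i < j$ of pairwise differences computed with relative delay $d = \Delta t_i - \Delta t_j$: expanding each square yields one cross term per ordered pair, and the squared self-terms cancel because a circular shift does not change a response's sum of squares. The function names are hypothetical.

```python
import numpy as np


def population_net_signal(rates, shifts):
    """N = sum_t [sum_i r_i(t - dt_i)]^2 - sum_t [sum_i r_i(t + dt_i)]^2,
    with delays dt_i given as integer numbers of samples (circular shifts)."""
    right = sum(np.roll(r, k) for r, k in zip(rates, shifts))   # sum_i r_i(t - dt_i)
    left = sum(np.roll(r, -k) for r, k in zip(rates, shifts))   # sum_i r_i(t + dt_i)
    return float(np.sum(right ** 2) - np.sum(left ** 2))


def summed_pairwise_signal(rates, shifts):
    """Sum over pairs i < j of R - L with relative delay d = dt_i - dt_j:
    R = sum_t r_i(t - d) r_j(t),  L = sum_t r_j(t - d) r_i(t)."""
    total = 0.0
    for i in range(len(rates)):
        for j in range(i + 1, len(rates)):
            d = shifts[i] - shifts[j]
            total += np.dot(np.roll(rates[i], d), rates[j]) - np.dot(np.roll(rates[j], d), rates[i])
    return float(total)


rng = np.random.default_rng(0)
rates = rng.poisson(2.0, size=(6, 80)).astype(float)
shifts = [0, 3, 7, 12, 18, 25]
print(np.isclose(population_net_signal(rates, shifts),
                 2 * summed_pairwise_signal(rates, shifts)))  # True
```

The population form needs one sum and one square per time point instead of a loop over all pairs, which matters for ∼100 cells.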

To illustrate how the procedure works, Fig. 1*C* shows spike trains from a single stimulus presentation delayed according to several speed tuning (putative speed) values, and the net motion signals for detectors tuned to these speeds. When the putative speed was near the correct speed (*middle*), the delayed spike trains were maximally aligned. Thus the detector tuned to the correct speed yielded the largest motion signal. Figure 3*A* shows the net motion signal as a function of speed tuning for a single stimulus presentation. The peak of this function—the extracted speed estimate—was close to the true speed.

Importantly, the preceding approach provides veridical speed estimates for any stimulus because, in general, shifting according to an incorrect speed cannot cause spike trains to align more accurately than shifting according to the correct speed. Thus the procedure yields a true speed signal and avoids the known bias in the speed tuning of a single, two-input Reichardt detector (see Dror et al. 2001). Also, the large collection of irregularly spaced inputs avoids aliasing that occurs in a single, two-input Reichardt detector with periodic stimuli.

#### • Measuring the precision of speed estimates

To quantify how faithfully the retina transmits information about speed, the variability of speed estimates across trials was examined. A histogram of speed estimates for one condition is shown in Fig. 3*B*, along with a Gaussian distribution with the same mean and SD.

In this case and most others examined, a Gaussian distribution provided a reasonable approximation. A test statistic, χ², was computed by summing, over histogram bins, the squared deviation of the observed count from the count expected for a Gaussian-distributed variable with the same mean and SD, divided by the expected count (Rice 1988, p. 226). Under the null hypothesis of Gaussian-distributed speed estimates, the distribution of χ² is approximately chi-square with *N* − 3 degrees of freedom, where *N* is the number of bins. Using a filter width τ = 0.01 s, χ² was below the 99th percentile of the chi-square distribution in 83% of cases tested. Thus the accuracy and precision with which the population of RGCs signaled stimulus speed were reasonably well summarized by the mean and SD of the distribution, respectively. If χ² exceeded the 99th percentile of the chi-square distribution, the condition was excluded from certain analyses (a condition refers to ON or OFF cells in a particular retina, tested with a specific stimulus speed and contrast).
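The goodness-of-fit statistic above can be sketched as follows. The number of bins, the use of equal-width bins spanning the data, and letting the outer bins absorb the Gaussian tails are assumptions, as is the function name. The 99th-percentile threshold could then be taken from a chi-square table or from `scipy.stats.chi2.ppf(0.99, dof)`.

```python
import math

import numpy as np


def gaussian_chi2(estimates, n_bins=10):
    """Chi-square statistic for the fit of a Gaussian (same mean and SD) to the
    histogram of speed estimates; returns (chi2, degrees of freedom = n_bins - 3)."""
    x = np.asarray(estimates, dtype=float)
    mu, sd = x.mean(), x.std(ddof=1)
    counts, edges = np.histogram(x, bins=n_bins)
    cdf = np.array([0.5 * (1.0 + math.erf((e - mu) / (sd * math.sqrt(2.0)))) for e in edges])
    cdf[0], cdf[-1] = 0.0, 1.0     # let the outer bins absorb the Gaussian tails
    expected = len(x) * np.diff(cdf)
    chi2 = float(np.sum((counts - expected) ** 2 / expected))
    return chi2, n_bins - 3
```

The three lost degrees of freedom correspond to the fixed total count and the two fitted parameters (mean and SD).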

Speed estimation from real spike trains would be expected to exhibit random deviations from the true speed due to noise in phototransduction or retinal processing, but could also exhibit systematic errors. Figure 3*C* shows a histogram of the mean speed estimate minus the true speed (i.e., bias) expressed as a fraction of the SD, for all conditions examined. The mean of the distribution shown is −0.3, indicating a weak tendency to underestimate speed. In 85% of conditions examined, the ratio of the absolute value of bias to SD was <2, indicating that bias was on the order of the variability. Because the bias is small and because bias in principle can be compensated by downstream calibration, whereas trial-to-trial variability cannot, in what follows the SD of speed estimates will be taken as a measure of the fidelity of retinal speed signals and the bias will not be considered further.

#### • Optimal temporal filtering for speed estimation

The temporal filter applied to spike trains to estimate speed (see Fig. 2) permits efficient detection of alignment in delayed spike trains while allowing for some spike timing jitter from trial to trial. Such filtering might be expected to occur in the synapses onto direction-sensitive neurons in the visual cortex and is an essential consideration for precise speed estimation. Although the optimal temporal filtering for left-right direction discrimination was determined in a previous study (Chichilnisky and Kalmar 2003), a fine-grained task such as speed discrimination could in principle utilize much finer filtering. The remainder of this section shows that a filter width of ∼10 ms produced maximum speed estimate precision over the range of conditions examined, so a filter width of 10 ms will be used in sections that follow.

Optimal filtering was determined empirically, by finding the filter width that minimized the SD of speed estimates. An example is shown for one condition in Fig. 4*A*. A filter width of 15 ms minimized speed estimate SD; much narrower or wider filters produced SD values up to threefold higher. The optimal filter width was in the range of tens of milliseconds over a wide range of conditions. The ○ in Fig. 4*B* show the optimal filter width for all conditions examined, determined by computing the SD of speed estimates across trials as a function of filter width over the range 1–100 ms, fitting the results with a polynomial, and extracting the minimum of the fit. Optimal filter width declined with stimulus speed to a minimum of ∼7 ms at the highest speeds probed (Chichilnisky and Kalmar 2003). The dependence on speed was approximated by the function $\tau_s = \tau_\infty + \alpha/s$, where $s$ is the speed, $\tau_s$ is the optimal filter width for speed $s$, $\tau_\infty$ is the optimal filter width for asymptotically high speeds, and $\alpha$ is a constant.
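Both fitting steps above, locating the minimum of a polynomial fit of SD versus filter width and fitting $\tau_s = \tau_\infty + \alpha/s$, can be sketched as follows. The polynomial degree and fitting in log filter width are assumptions not stated in the text; because $\tau_s$ is linear in $1/s$, the second fit reduces to ordinary least squares. Function names are hypothetical, and the test uses synthetic values rather than measured data.

```python
import numpy as np


def optimal_filter_width(widths_s, sds, degree=4):
    """Fit SD of speed estimates vs. filter width with a polynomial (in log
    width) and return the width minimizing the fit within the tested range."""
    log_w = np.log(np.asarray(widths_s, dtype=float))
    coeffs = np.polyfit(log_w, np.asarray(sds, dtype=float), degree)
    grid = np.linspace(log_w.min(), log_w.max(), 10_000)
    return float(np.exp(grid[np.argmin(np.polyval(coeffs, grid))]))


def fit_tau_vs_speed(speeds, taus):
    """Least-squares fit of tau_s = tau_inf + alpha / s; returns (tau_inf, alpha)."""
    speeds = np.asarray(speeds, dtype=float)
    design = np.column_stack([np.ones_like(speeds), 1.0 / speeds])
    (tau_inf, alpha), *_ = np.linalg.lstsq(design, np.asarray(taus, dtype=float), rcond=None)
    return float(tau_inf), float(alpha)
```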

For the analysis of speed estimate variability in what follows, a fixed filter width of 10 ms was used, rather than a filter width that varied with stimulus speed. This provided speed estimates with nearly minimum variability for all speeds tested (e.g., see Fig. 4*A*) and may provide a more realistic approximation of downstream processing than a stimulus-dependent filter width. Note that filter widths much larger or smaller than 10 ms gave rise to more outliers in speed estimate distributions, resulting in greater deviations from Gaussian statistics (not shown).

Optimal filter width could be systematically overestimated by two experimental limitations: misestimation of receptive field locations due to spatial discretization of the stimulus and limited recording time, or discretization of the moving bar image in space and time due to temporal refresh of the display. These possibilities were tested by computing effective receptive field locations directly from responses to moving bars. Average responses across trials were used to determine delays between cells that resulted in maximum response alignment. These delays were multiplied by the stimulus speed to determine effective receptive field locations, which were then used for trial-by-trial speed estimation. The optimal filter width obtained with this procedure, shown with • in Fig. 4*B*, was similar to that measured using locations extracted from direct receptive field measurements, for all speeds tested. Using a filter width of 10 ms, the median ratio of the SD of speed estimates obtained with the modified and standard procedure was 0.98. These findings suggest that discretization and finite data effects had little effect on speed estimates or optimal filter width.
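The effective receptive field locations described above might be computed as in the sketch below. It assumes each cell's delay is measured against a single reference cell by maximizing the circular cross-correlation of trial-averaged responses; the original analysis may have determined the delays differently, and the function name is hypothetical.

```python
import numpy as np


def effective_positions(mean_responses, speed, dt, reference=0):
    """Effective receptive-field positions along the motion axis, relative to a
    reference cell, from trial-averaged responses to a bar moving at `speed`.

    mean_responses : (n_cells, n_bins) trial-averaged responses, bin width dt (s)
    """
    mean_responses = np.asarray(mean_responses, dtype=float)
    n_bins = mean_responses.shape[1]
    ref_fft = np.conj(np.fft.fft(mean_responses[reference]))
    positions = []
    for r in mean_responses:
        # Circular cross-correlation: xc[k] = sum_t ref(t) * r(t + k)
        xc = np.fft.ifft(ref_fft * np.fft.fft(r)).real
        lag = int(np.argmax(xc))
        if lag > n_bins // 2:          # treat large positive lags as negative delays
            lag -= n_bins
        positions.append(lag * dt * speed)   # delay times speed = position offset
    return np.array(positions)
```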

The optimal filter width can be used to infer the number of spikes from each cell that typically contribute to the elementary motion signal (Chichilnisky and Kalmar 2003). If the interspike interval (ISI) is always much larger than the optimal filter width, optimal motion sensing preserves the distinction between sequential spikes and motion information is effectively conveyed by individual spike times. Conversely, if the ISI is always much smaller than the filter width, optimal motion sensing integrates over many spikes and motion information is effectively conveyed by variations in firing rate. The ratio of ISI to optimal filter width, accumulated across the period in each spike train when the bar overlapped the receptive field of the cell, is shown in Fig. 4*C*. The modal ratio was near unity: the median was 0.62, and 72% of values were <1. Although the ratio of ISI to optimal filter width spans a wide range, the concentration of values near unity indicates that optimal speed estimation typically requires integrating over one to a few spikes from each cell.
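The ISI-to-filter-width ratio above can be sketched as follows. How the window during which the bar overlaps the receptive field is defined is left to the caller and is an assumption here, as are the function names.

```python
import numpy as np


def isi_to_filter_ratios(spike_times, window, tau=0.01):
    """Interspike intervals of spikes inside window = (start, stop), divided by tau."""
    t = np.sort(np.asarray(spike_times, dtype=float))
    t = t[(t >= window[0]) & (t < window[1])]
    return np.diff(t) / tau


def summarize_ratios(ratio_arrays):
    """Pool ratios across cells and trials; return (median, fraction below 1)."""
    pooled = np.concatenate(ratio_arrays)
    return float(np.median(pooled)), float(np.mean(pooled < 1.0))
```

A median well below 1 would indicate that several spikes typically fall within one filter width, whereas values well above 1 would indicate that the filter resolves individual spikes.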

#### • Efficiency of speed estimation procedure

The variability of extracted speed estimates accurately reflects the precision of retinal signals if and only if the estimation procedure efficiently extracts information about stimulus speed. To test the efficiency of the procedure, its performance was compared with four alternative approaches. The remainder of this section demonstrates that each alternative procedure exhibited speed estimate variability similar to or higher than the correlation procedure, consistent with the idea that the correlation procedure is efficient.

For each alternative procedure, as with the standard correlation procedure, the speed estimate was selected to maximize the alignment of spike trains, after delaying each spike train by an amount equal to the receptive field position along the axis of motion divided by the speed tuning of the detector. As alternatives to cross-correlation, four measures of alignment were tested, and the rightward motion signal was computed as follows.

##### FOURTH-ORDER CORRELATION.

Pointwise products of responses considered in groups of four. The shifted response vectors $r_i(t - \Delta t_i)$ were summed pointwise, yielding $m(t) = \sum_i r_i(t - \Delta t_i)$. The motion signal was given by $\sum_t m(t)^p$, with $p = 4$. This is a generalization of the multi-cell equivalent of the cross-correlation procedure (see preceding text), in which $p = 2$.

##### SEPARABILITY.

The fraction of the variance of a collection of responses explained by the first principal component. The shifted response vectors $r_i(t - \Delta t_i)$ were placed in the rows of a matrix. The singular value decomposition was computed, yielding singular values $\{s_1, \ldots, s_K\}$. The motion signal was given by $s_1^2 / (s_1^2 + \cdots + s_K^2)$.

##### ENTROPY.

Temporal dispersion of the summed responses. The shifted response vectors were summed, and the result $m(t) = \sum_i r_i(t - \Delta t_i)$ was normalized to unit integral, $n(t) = m(t) / \sum_t m(t)$. The motion signal was given by the negative of the entropy of the result, i.e., $\sum_t n(t) \log_2 n(t)$.

##### DISTANCE.

Summed pairwise Euclidean distances between responses from different cells. The motion signal was $-\sum_{i \neq j} \lVert r_i(t - \Delta t_i) - r_j(t - \Delta t_j) \rVert$, where $\lVert \cdot \rVert$ denotes the Euclidean norm, so that each term is the Euclidean distance between two shifted response vectors.

Leftward motion signals were computed analogously based on left-shifted response vectors $r_i(t + \Delta t_i)$, and the net motion signal was used for speed estimation as in the preceding text. Figure 5*A* shows the optimal filter width for each measure as a function of that for the correlation measure, across all conditions tested. In each case, the optimal filter width was similar to that obtained with the correlation measure. For each measure, overall speed estimate variability was obtained using the optimal filter width for that measure. Figure 5*B* shows the performance of each alternative procedure compared with that of the standard procedure. In all cases, alternative procedures exhibited speed estimate variability similar to or higher than the standard procedure.
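The alignment measures can be sketched in Python as below. This is an illustration under simplifying assumptions, not the analysis code used here: responses are already filtered, nonnegative, and sampled on a common time grid; shifts are rounded to whole samples with zero padding; and, because the sign of $\Delta t_i$ depends on how positions and delays are defined, the hypothetical `align` helper simply advances each response by $x_i/s$ so that responses to a bar moving toward $+x$ line up at the true speed. The toy responses are noiseless Gaussian bumps, so every measure should select the true speed.

```python
import numpy as np

def align(r, x, speed, dt):
    """Advance row i of r (n_cells, n_time) by x_i / speed, zero-padding the end.
    For a bar moving toward +x at `speed`, this brings responses into register.
    Positions x_i must be >= 0."""
    out = np.zeros_like(r)
    n = r.shape[1]
    for i, xi in enumerate(x):
        k = min(int(round(xi / (speed * dt))), n)   # shift in samples
        out[i, : n - k] = r[i, k:]
    return out

def power_measure(a, p):
    """sum_t m(t)^p with m(t) = sum_i a_i(t); p = 2 is the multi-cell cross-correlation."""
    m = a.sum(axis=0)
    return float(np.sum(m ** p))

def separability(a):
    """Fraction of energy in the first singular value: s_1^2 / sum_k s_k^2."""
    s = np.linalg.svd(a, compute_uv=False)
    return float(s[0] ** 2 / np.sum(s ** 2))

def neg_entropy(a):
    """Negative entropy (bits) of the summed response normalized to unit sum."""
    m = a.sum(axis=0)
    n = m / m.sum()
    n = n[n > 0]                                    # 0 log 0 is taken as 0
    return float(np.sum(n * np.log2(n)))

def neg_distance(a):
    """Minus the summed Euclidean distances between rows i != j."""
    diff = a[:, None, :] - a[None, :, :]
    return -float(np.sum(np.linalg.norm(diff, axis=-1)))  # diagonal terms are 0

# Hypothetical noiseless responses: 8 cells, bar moving toward +x at 20 deg/s.
dt, true_speed = 0.001, 20.0                        # s, deg/s
x = np.linspace(0.0, 1.0, 8)                        # positions along motion, deg
t = np.arange(0.0, 0.2, dt)
r = np.exp(-0.5 * ((t[None, :] - 0.02 - x[:, None] / true_speed) / 0.005) ** 2)

candidates = np.array([10.0, 15.0, 20.0, 25.0, 30.0])
measures = {
    "cross-correlation": lambda a: power_measure(a, 2),
    "fourth-order": lambda a: power_measure(a, 4),
    "separability": separability,
    "entropy": neg_entropy,
    "distance": neg_distance,
}
best = {name: candidates[np.argmax([f(align(r, x, s, dt)) for s in candidates])]
        for name, f in measures.items()}
print(best)
```

In this noiseless toy all five measures agree. With noisy spike trains they would differ in variability, which is what Fig. 5*B* compares.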

Note that nonopponent speed estimation (using individual motion signals *L* and *R* for estimating speeds of leftward and rightward targets, respectively) produced results very similar to opponent estimation (using the net motion signal *N*). The median ratio of the speed estimate SD obtained with nonopponent procedures to that obtained with opponent procedures, across all conditions examined, was 0.97.

### Precision of retinal speed estimates

The procedures in the preceding text provide a measure of how precisely the retina transmits speed information to the brain. Because this precision may depend on stimulus speed—due to the kinetics of RGC responses, spike train statistics, and accumulation of information over time—speed estimate variability was examined for a range of bar speeds.

Figure 6 shows fractional speed estimate variability (SD of estimates divided by true speed) as a function of speed, for all conditions tested. Each point represents data obtained from 35 to 68 on or off parasol cells in one retina. Across the range of speeds examined, fractional speed estimate variability was on the order of 1% of the stimulus speed, increasing roughly in proportion to speed at the highest speeds tested.

#### • Simple model of speed estimate precision

The trend in Fig. 6, as well as the dependence on the number and spatial arrangement of cells, can be understood in terms of the timing precision of RGC responses. This section provides a theoretical prediction for speed estimate variability based on the following assumptions. *1*) Each RGC signals only the time of arrival of a stimulus at its receptive field. *2*) Speed estimates from different cell pairs are combined optimally. *3*) The variability of RGC timing signals is inversely related to speed. The derivation proceeds as follows.

Consider the simplest speed estimate obtained from two RGCs, each of which signals only the time of arrival of a stimulus at the receptive field. Assume the cells are separated by a distance $\Delta x$ and stimulated with a bar moving at speed $s$. The time required for the bar to move from one cell to the next is $\Delta t = \Delta x / s$. If each RGC provides a noisy signal indicating the time of arrival of the stimulus, denote the time difference signal from the pair of cells by $\Delta t + \varepsilon$, where the noise $\varepsilon$ has SD $\sigma_t$. A simple speed estimate from the pair is $e = \Delta x / (\Delta t + \varepsilon)$. The variability of $e$ can be approximated by the SD of the response time difference multiplied by the absolute value of the derivative of the estimate with respect to the time difference (the approximation is valid for $\sigma_t \ll \Delta t$) (see Bevington and Robinson 1992). Hence to first order, the speed estimate variability from the pair is

$$\sigma_e \approx \left| \frac{\partial e}{\partial \Delta t} \right| \sigma_t = \frac{\Delta x}{\Delta t^2}\,\sigma_t = \frac{s^2 \sigma_t}{\Delta x} \tag{1}$$

Now assume that speed estimates are obtained by optimally pooling information from all cell pairs. Assume that only disjoint pairs are used and that these provide statistically independent speed estimates. Denote the speed estimate from each pair by $e_i$, with SD given by $\sigma_{e_i} = s^2 \sigma_t / \Delta x_i$ as in the preceding text. A speed estimate from the collection may be obtained by computing the weighted sum $e_{\text{pool}} = \left( \sum_i e_i / \sigma_{e_i}^2 \right) / \left( \sum_i 1 / \sigma_{e_i}^2 \right)$. This weighting causes $e_{\text{pool}}$ to have minimum variance, $\sigma_{\text{pool}}^2 = 1 / \sum_i \left( 1 / \sigma_{e_i}^2 \right)$, in the case of independent data (see Bevington and Robinson 1992). Substituting for $\sigma_{e_i}$ yields

$$\sigma_{\text{pool}} = s^2 \sigma_t \left( \sum_i \Delta x_i^2 \right)^{-1/2} \tag{2}$$

To determine how the variability of the pooled estimate, $\sigma_{\text{pool}}$, depends on the number of cells and their spatial arrangement along the $x$ and $y$ dimensions, consider only the term that depends on the locations of the cells: $S = \sum_i \Delta x_i^2$. Consider the case of a lattice of cells with density $p$ filling a rectangular region of area $xy$, where $x$ specifies the dimension along the axis of motion and $y$ the dimension along the orthogonal axis. The specific pairings of cells used for speed estimation influence $\sigma_{\text{pool}}$ (see below). So, consider an optimal pairing rule in which the first pair consists of the two cells most widely spaced in the $x$ dimension, the second pair consists of the next two most widely spaced cells (distinct from the first 2 cells), and so on (note that $S$ is independent of the $y$ coordinates). For a small increase $\delta x$ in $x$, the number of cells added is $p y\,\delta x$, and half as many cell pairs are added. By the pairing rule, each new pair consists of cells at both extremes along the $x$ dimension, hence each pair produces an increment $\Delta x_i^2 \approx x^2$ in the sum $S$. Therefore the increase in $S$ is $\delta S = x^2 y p\,\delta x / 2$. This yields $\delta S / \delta x = x^2 y p / 2$; integrating with respect to $x$ gives $S = x^3 y p / 6$. Substituting into Eq. 2 yields

$$\sigma_{\text{pool}} = s^2 \sigma_t \left( \frac{6}{p\,x^3 y} \right)^{1/2} \tag{3}$$

Note the stronger dependence on $x$ than on $y$. Also note that a suboptimal choice of pairings yields higher speed estimate variability. For example, consider the case where all pairs are nearest neighbors in the $x$ dimension. Then the sum $S$ is the square of the neighbor spacing, $\Delta x_i^2 = 1/p$, multiplied by the total number of pairs, $p x y / 2$. Hence $\sigma_{\text{pool}} = s^2 \sigma_t (xy/2)^{-1/2}$, which is a factor of $x (p/3)^{1/2}$ higher than the value obtained with the pairing rule in the preceding derivation.
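The continuum argument can be checked numerically on a finite lattice. The sketch below uses a hypothetical 40 × 20 lattice with unit spacing (so $p = 1$), computes $S = \sum_i \Delta x_i^2$ directly under the widest-first pairing rule and under nearest-neighbor pairing, and compares the results with $x^3 y p / 6$ and $xy/2$. The function names are illustrative only. The widest-first sum falls slightly below the continuum value because the lattice is discrete.

```python
import numpy as np

def lattice_x_y(n_x, n_y, spacing=1.0):
    """Coordinates of an n_x-by-n_y square lattice (x = axis of motion)."""
    xs, ys = np.meshgrid(np.arange(n_x) * spacing, np.arange(n_y) * spacing)
    return xs.ravel(), ys.ravel()

def s_widest_first(x):
    """S when the most widely spaced remaining cells (along x) are paired first:
    sort by x and pair the k-th smallest with the k-th largest."""
    xs = np.sort(x)
    half = xs.size // 2
    return float(np.sum((xs[::-1][:half] - xs[:half]) ** 2))

def s_nearest_neighbor(x, y):
    """S when each cell is paired with its nearest neighbor along x in its row
    (assumes an even number of cells per row)."""
    total = 0.0
    for row in np.unique(y):
        xr = np.sort(x[y == row])
        total += float(np.sum((xr[1::2] - xr[0::2]) ** 2))
    return total

n_x, n_y, d = 40, 20, 1.0                    # hypothetical lattice size and spacing
x, y = lattice_x_y(n_x, n_y, d)
p, X, Y = 1.0 / d**2, n_x * d, n_y * d       # density and side lengths
S_opt, S_nn = s_widest_first(x), s_nearest_neighbor(x, y)
print(S_opt, X**3 * Y * p / 6)               # widest-first vs x^3 y p / 6
print(S_nn, X * Y / 2)                       # nearest-neighbor vs x y / 2
print(np.sqrt(S_opt / S_nn), X * np.sqrt(p / 3))  # SD penalty vs x (p/3)^(1/2)
```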

Finally, the timing variability $\sigma_t$ would be expected to depend on parameters of the stimulus, such as stimulus speed. The inverse dependence of optimal filter width on speed (Fig. 4*B*) suggests a similar dependence for timing variability: $\sigma_t = \sigma_\infty + \alpha / s$. Substituting into Eq. 3 yields

$$\sigma_{\text{pool}} = \left( \sigma_\infty s^2 + \alpha s \right) \left( \frac{6}{p\,x^3 y} \right)^{1/2} \tag{4}$$

This prediction for the dependence of speed estimate variability on speed, with parameters $\alpha$ and $\sigma_\infty$ fitted to the data, is shown in Fig. 6. The accuracy of the fit is consistent with the idea that speed estimate precision is governed by the timing precision of RGC responses. As will be shown in the following text, the same model also provides accurate predictions for the dependence of speed estimate precision on the number and spatial arrangement of RGCs.
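Substituting $\sigma_t = \sigma_\infty + \alpha/s$ into Eq. 3 and dividing by $s$ gives the fractional variability plotted in Fig. 6. The rearrangement below is an editorial step, not part of the original derivation. It shows why the model predicts a roughly constant fractional SD at low speeds and growth roughly in proportion to speed at high speeds:

$$\frac{\sigma_{\text{pool}}}{s} = \left( \alpha + \sigma_\infty s \right) \left( \frac{6}{p\,x^3 y} \right)^{1/2} \approx \begin{cases} \alpha \left( \dfrac{6}{p\,x^3 y} \right)^{1/2}, & s \ll \alpha / \sigma_\infty \\[1ex] \sigma_\infty\, s \left( \dfrac{6}{p\,x^3 y} \right)^{1/2}, & s \gg \alpha / \sigma_\infty \end{cases}$$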

In summary, the results in Fig. 6 reveal the limits to behavioral speed estimation imposed by the population code in parasol RGCs, and are consistent with a simple model. What follows is an analysis of the factors that contribute to speed estimate precision and consequences for readout of the population code in the brain.

### Retinal limits on speed estimation

Several major features of retinal processing may influence speed estimate fidelity. Correlated activity, known to be significant in adjacent RGCs of like type, may reflect common signal and noise and thus may influence performance. Timing structure of retinal spike trains may transmit motion information differently than expected from simple variations in firing rate. The number and spatial arrangement of cells would be expected to influence the fidelity of motion signals. Finally, on and off parasol cells, with receptive fields that tile the same area of the visual world, may convey motion signals with different efficiency, and may exhibit redundancy due to common photoreceptor inputs. These contributions to the precision of retinal motion signals are examined in turn.

#### • Correlated activity

Correlated firing at rates significantly higher than expected by chance has been described in pairs of nearby cells of like functional type in cat and rabbit retina (DeVries 1999; Mastronarde 1983); in salamander retina correlated firing has been proposed to be important for visual signaling (Meister et al. 1995). Similarly, adjacent pairs of on parasol cells and off parasol cells in primate retina fire synchronized spikes (±5 ms) at rates roughly twice that expected by chance in the recording conditions used here (Chichilnisky and Baylor 1999). This synchronized firing, as well as other forms of response covariation between cells, could influence how precisely ensembles of RGCs transmit information about stimulus motion.

To probe the effects of correlated activity, the observed speed estimate variability was compared with the variability obtained from artificially shuffled ensemble responses consisting of spike trains from a different trial for each cell. This manipulation removes covariation, enforcing statistical independence between spike trains from different cells while preserving the response statistics of each cell. Figure 7*A* shows the speed estimate variability obtained with shuffled data as a function of the speed estimate variability obtained with unshuffled data, across all conditions tested. The data cluster near the identity line. Shuffled data displayed a statistically significant (*P* < 0.001, Wilcoxon signed-rank test) but very weak (median ratio: 0.96) tendency toward more precise speed estimates. In summary, eliminating correlated activity in RGC spike trains had very little effect on speed estimates.
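One simple way to build such shuffled ensembles is sketched below. The scheme is an assumption: the text does not specify how trials were reassigned. Here each cell's trial order is permuted independently, which removes trial-by-trial covariation between cells while leaving each cell's own set of responses untouched. With independent permutations two cells occasionally land on the same trial. A cyclic offset per cell would avoid that when there are at least as many trials as cells.

```python
import numpy as np

def shuffle_across_trials(responses, rng):
    """responses: array (n_cells, n_trials, n_time), one response per cell and trial.

    Each cell's trial order is permuted independently (an assumed scheme), so a
    pseudo-trial combines responses recorded on different trials. Every cell keeps
    its own set of responses, hence its own response statistics."""
    shuffled = np.empty_like(responses)
    for c in range(responses.shape[0]):
        shuffled[c] = responses[c, rng.permutation(responses.shape[1])]
    return shuffled

# Hypothetical usage: 3 cells, 5 trials, 4 time bins.
rng = np.random.default_rng(0)
responses = np.arange(3 * 5 * 4).reshape(3, 5, 4)
shuffled = shuffle_across_trials(responses, rng)
```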

#### • Timing structure in spike trains

Many models of visual processing assume that information is communicated from retina to brain by the firing rates of RGCs, specifically, that RGC spikes are generated approximately independently of one another over time according to a Poisson process with a time-varying rate. A Poisson model fails to account for phenomena such as action potential refractoriness, and the non-Poisson intrinsic timing structure of RGC spike trains has been the subject of several recent studies (Berry et al. 1997; Reich et al. 1997; Uzzell and Chichilnisky 2004). However, it is unclear whether the intrinsic structure of retinal spike trains plays an important role in communicating behaviorally relevant visual signals or whether the firing rate model provides an approximation adequate for understanding downstream processing.

To distinguish these possibilities, the speed estimate variability obtained with RGC spike trains was compared with the variability obtained from artificial spike trains generated by Poisson spiking with the observed time-varying rate. The artificial spike train for a given cell and trial was created by sampling spike times, with replacement, across all trials for that cell and stimulus. The number of spikes in the resampled spike train for each trial was on average equal to the number of spikes in recorded spike trains. Figure 7*B* shows the comparison of speed estimate variability obtained with real and Poisson spike trains for all conditions tested. The data lie systematically above the identity line, particularly for the lower fractional SD values. The median ratio of the SD obtained from resampled data to that obtained with the original data was 1.50. The higher variability obtained with resampled data could not be attributed to nonstationarity of responses over the course of the experiment, because the shuffling analysis of Fig. 7*A* did not produce such an effect. Thus the intrinsic timing structure of RGC spike trains allows them to convey stimulus speed information more faithfully than would be expected from a Poisson model of RGC spiking.
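A minimal sketch of the resampling control follows. Spike times are drawn with replacement from the spikes pooled over all trials for one cell and stimulus. The per-trial spike count is drawn from a Poisson distribution whose mean equals the observed mean count. That count rule is an assumption consistent with "on average equal" in the text. Together the two steps give an inhomogeneous Poisson process whose rate follows the empirical time-varying rate. The function name and the toy trials are hypothetical.

```python
import numpy as np

def poisson_resample(trials, rng):
    """Surrogate spike trains for one cell and one stimulus.

    trials: list of 1-D arrays of spike times, one per trial.
    Spike times are sampled with replacement from the pooled spikes; each
    surrogate's count is Poisson with mean equal to the observed mean count."""
    pooled = np.concatenate(trials)
    mean_count = pooled.size / len(trials)
    surrogate = []
    for _ in trials:
        n = rng.poisson(mean_count)
        surrogate.append(np.sort(rng.choice(pooled, size=n, replace=True)))
    return surrogate

# Hypothetical usage: 50 trials of a cell firing 10 slightly jittered spikes per trial.
rng = np.random.default_rng(5)
trials = [np.linspace(0.0, 0.9, 10) + 0.001 * rng.standard_normal(10) for _ in range(50)]
surrogate = poisson_resample(trials, rng)
```

Because every surrogate spike is drawn independently, refractoriness and other within-train timing structure are destroyed, which is exactly what the comparison in Fig. 7*B* is designed to test.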

#### • Spatial arrangement of receptive fields

Because motion is represented in a wave of activity in the parasol RGC population, the spatial arrangement of the cells used for readout could influence the fidelity of speed estimates extracted by the brain. For example, Fig. 8*A* shows the receptive field outlines of a collection of on parasol cells, with receptive fields that clustered in a region of retina, and a collection of the same number of on parasol cells (simultaneously recorded, partially overlapping), with more dispersed receptive fields. The distributions of speed estimates obtained from each of these ensembles in one stimulus condition are shown in the histograms. The variability of speed estimates obtained from the clustered cells was substantially larger than that from the dispersed cells. Pooled data in Fig. 8*C* for all such conditions tested (○) show the same trend.

The difference in speed estimates obtained from clustered and dispersed cells could arise from redundancy due to correlated activity in nearby cells. To test for this possibility, the same comparison was performed with shuffled responses (•). The similarity of the shuffled and unshuffled results indicates that response covariation did not account for the effect of spatial arrangement.

An alternative possibility is that speed estimates obtained from cells more dispersed along the axis of motion are less sensitive to response timing jitter. To illustrate this possibility, consider the simple model in the preceding text in which speed estimate precision for a given cell pair is limited by how accurately individual RGCs signal the time of stimulus arrival. Distant cells are relatively less affected by response timing jitter (*Eq. 1*) because of the large temporal separation in responses to the moving stimulus.
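A quick Monte Carlo check of Eq. 1 illustrates this point. It uses assumed values (speed 10°/s, $\sigma_t$ = 0.5 ms, chosen so that $\sigma_t \ll \Delta t$) and hypothetical separations. The simulated SD of the pairwise estimate should match $s^2 \sigma_t / \Delta x$ and fall as $1/\Delta x$, so more widely separated cells give more precise pairwise estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

def pair_speed_sd(dx, speed, sigma_t, n_trials=200_000):
    """Monte Carlo SD of the pairwise estimate e = dx / (dt + eps)."""
    dt = dx / speed                              # true travel time between the two cells
    eps = rng.normal(0.0, sigma_t, n_trials)     # noise in the time-difference signal
    return float(np.std(dx / (dt + eps)))

speed, sigma_t = 10.0, 0.0005                    # hypothetical: deg/s and s
results = []
for dx in (0.2, 0.4, 0.8):                       # hypothetical separations along motion, deg
    results.append((dx, pair_speed_sd(dx, speed, sigma_t), speed**2 * sigma_t / dx))
for dx, mc, eq1 in results:
    print(f"dx={dx}: Monte Carlo SD={mc:.4f}, Eq. 1 SD={eq1:.4f}")
```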

To test for this possibility, the variation of speed estimates was examined using a stimulus that moved either in a direction that created large temporal separation of responses or in a direction that created small temporal separation of responses, using the same collection of cells, as shown in Fig. 9*A*. Results are shown for one example in *B;* pooled results are shown in *C.* Larger temporal separations resulted in more precise speed estimates (○). This was not affected by shuffling responses across trials (•).

These findings may be understood in terms of the simple model. The increased precision obtained with responses widely separated in time is reflected in Eq. 3: the predicted SD of the pooled speed estimate declines as the 3/2 power of the extent of the array along the direction of motion, and as the 1/2 power of its extent orthogonal to the axis of motion. Thus for an array of cells in 2:1 aspect ratio, the SD for motion along the long axis should be roughly half that for motion along the short axis, as was observed (Fig. 9).
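As a worked check of the 2:1 case, take Eq. 3 with $s$, $\sigma_t$, and $p$ held fixed and an array with long side $a$ and short side $a/2$. Motion along the long axis sets $x = a$, $y = a/2$; motion along the short axis sets $x = a/2$, $y = a$:

$$\frac{\sigma_{\text{pool}}(\text{long axis})}{\sigma_{\text{pool}}(\text{short axis})} = \frac{a^{-3/2} \, (a/2)^{-1/2}}{(a/2)^{-3/2} \, a^{-1/2}} = \frac{2^{1/2}}{2^{3/2}} = \frac{1}{2}$$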

These findings suggest that temporal separation of responses is the primary determinant of how spatial arrangement affects speed estimation. Note that the use of widely spaced cells in speed estimation implicitly assumes constant speed over the duration required for the stimulus to travel from one cell to the other; this assumption is valid in the present task but may not be for more natural stimuli (see discussion).

#### • Number of cells

Large receptive fields of motion-sensitive neurons in extrastriate cortex (Albright and Desimone 1987) may provide more accurate estimates of stimulus speed by integrating over many inputs. However, the benefits of such pooling depend on the spatial arrangement of input signals (see preceding text). To examine the potential benefits of pooling many RGC inputs for motion sensing, speed estimate variability was examined for subsets of recorded RGCs.

Figure 10*A* shows a collection of on parasol cells as well as two subsets of this collection obtained by discarding cells sequentially, orthogonal to the axis of motion. Figure 10*B* shows speed estimate variability as a function of the number of cells in these subsets on a double logarithmic scale. As expected, the variability of speed estimates decreased with the number of cells. The steepness of this relation was estimated by fitting a line to data such as those in Fig. 10*B* and extracting the slope. Results accumulated across all conditions tested are shown in the histogram of Fig. 10*C*, which reveals slopes near −1/2. Figure 10*D* shows the dependence of variability on the number of cells pooled across all conditions tested; data have been normalized (vertically shifted) for each condition independently. These normalized data fall roughly on a common line, suggesting a lawful relationship between speed estimate variability and the number of cells used.

The same analysis was performed on subsets obtained by discarding cells sequentially parallel to the axis of motion, as illustrated in Fig. 10*E*. In this case, the dependence of speed estimate variability on cell number was steeper (roughly −3/2) as shown in Fig. 10, *F–H*. The steeper slope is consistent with the preceding observation that cells widely dispersed along the axis of motion provide more precise speed estimates, so that removing the cells most widely dispersed along the axis of motion has the largest effect on speed estimates.

These trends can be understood quantitatively with the simple model. Because the SD of the pooled speed estimate declines as the 3/2 power of distance along the axis of motion and as the 1/2 power in the orthogonal direction (*Eq. 3*), the data in Fig. 10, *A–D*, should exhibit a slope of −1/2 and the data in Fig. 10, *E–H*, should exhibit a slope of −3/2, similar to the values observed. The regular dependence of speed estimate variability on cell number, along with the model, provides a basis for predicting the precision of speed estimates obtained with smaller or larger ensembles of RGCs.
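The predicted slopes can be reproduced with a small sketch. It uses Eq. 2 on a hypothetical unit-spaced lattice with widest-first pairing. Rows are discarded to shrink the array orthogonal to the motion, or columns to shrink it along the motion, and a line is fitted to log SD versus log cell count, as in Fig. 10. Lattice size, units, and function names are illustrative assumptions.

```python
import numpy as np

def s_widest_first(x):
    """Sum of squared x-separations when the most widely spaced cells are paired first."""
    xs = np.sort(x)
    half = xs.size // 2
    return float(np.sum((xs[::-1][:half] - xs[:half]) ** 2))

def sigma_pool(n_x, n_y, speed=1.0, sigma_t=1.0):
    """Eq. 2 on a unit-spaced n_x-by-n_y lattice: s^2 * sigma_t * S^(-1/2) (arbitrary units)."""
    x = np.repeat(np.arange(n_x, dtype=float), n_y)  # x-coordinate of every cell
    return speed**2 * sigma_t / np.sqrt(s_widest_first(x))

full = 40
kept = [40, 30, 20, 10]                   # rows or columns retained (hypothetical)
# Discard rows (orthogonal to motion): only the extent along y shrinks.
n_orth = [full * k for k in kept]
sd_orth = [sigma_pool(full, k) for k in kept]
# Discard columns (parallel to motion): only the extent along x shrinks.
n_par = [k * full for k in kept]
sd_par = [sigma_pool(k, full) for k in kept]

slope_orth = np.polyfit(np.log(n_orth), np.log(sd_orth), 1)[0]
slope_par = np.polyfit(np.log(n_par), np.log(sd_par), 1)[0]
print(f"orthogonal: {slope_orth:.3f}, parallel: {slope_par:.3f}")
```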

#### • Relative efficiency of on and off speed estimates

on and off parasol cells, which are primarily excited by increments and decrements of light, respectively, may be specialized to signal motion more accurately for stimuli of matched polarity (positive contrast for on, negative contrast for off). Furthermore, due to spatial, kinetic, and contrast-response asymmetries (Chichilnisky and Kalmar 2002), one pathway could convey higher-fidelity speed estimates to the brain for both kinds of stimuli. Such asymmetries could determine how on and off signals are used downstream for speed estimation.

The variability of speed estimates obtained from equal numbers of on and off cells was compared using moving bars of positive and negative contrast. The three panels in Fig. 11 show the comparison for positive contrast stimuli, negative contrast stimuli, and matched contrast stimuli (positive for on cells, negative for off cells). on cells signaled the speed of positive contrast stimuli more faithfully than off cells, and off cells signaled the speed of negative contrast stimuli more faithfully than on cells. This would be expected from response rectification elicited by nonmatched stimuli that strongly suppress firing. For matched polarity stimuli, on and off cells exhibited similar speed estimate variability. Thus the circuits converging on on and off parasol cells represent motion information with similar precision.

#### • Statistical independence of on and off speed estimates

Given that the on and off parasol cells provide motion signals of comparable precision, cortical neurons may pool speed information transmitted by these populations to obtain faithful speed estimates. However, because on and off cells sample the same region of space and thus receive inputs from the same photoreceptors, on and off circuits may exhibit significant common noise. Such redundancy could limit or eliminate the benefits of pooling. The degree of redundancy in on and off motion signals was examined by measuring the degree to which pooling on and off signals reduced the variability of speed estimates.

Figure 12*A* shows the receptive fields of a group of on cells and a group of off cells that were recorded simultaneously. These receptive fields covered approximately the same area of retina; therefore, the two cell groups received inputs mostly from the same photoreceptors. Figure 12*B* shows histograms of speed estimates obtained from the two cell groups in one stimulus condition. A pooled estimate of speed from both populations may be obtained by taking a weighted sum of on and off estimates with weights that minimize variance across trials in the case of independent data:

$$s_P = \frac{s_{\text{ON}}\,\sigma_{\text{OFF}}^2 + s_{\text{OFF}}\,\sigma_{\text{ON}}^2}{\sigma_{\text{ON}}^2 + \sigma_{\text{OFF}}^2}$$

where $s_{\text{ON}}$ and $s_{\text{OFF}}$ represent the speed estimates from on and off cells, and $\sigma_{\text{ON}}$ and $\sigma_{\text{OFF}}$ represent the SD of speed estimates across trials for on and off cells, respectively (Bevington and Robinson 1992).

The distribution of the pooled speed estimate $s_P$ across trials is shown in Fig. 12*C*, *left*. This distribution had a lower SD (0.047°/s) than either of the individual distributions (0.054 and 0.076°/s). To probe the statistical dependence of on and off population speed estimates, consider two extreme possibilities. If $s_{\text{ON}}$ and $s_{\text{OFF}}$ are statistically independent, the SD of the pooled estimate $s_P$ across trials is

$$\sigma_I = \left[ \frac{\sigma_{\text{ON}}^2\,\sigma_{\text{OFF}}^2}{\sigma_{\text{ON}}^2 + \sigma_{\text{OFF}}^2} \right]^{1/2}$$

If, instead, $s_{\text{ON}}$ and $s_{\text{OFF}}$ are perfectly correlated across trials, with $s_{\text{ON}} \propto s_{\text{OFF}}$, then the SD of $s_P$ across trials is

$$\sigma_C = \frac{\sigma_{\text{ON}}\,\sigma_{\text{OFF}}^2 + \sigma_{\text{OFF}}\,\sigma_{\text{ON}}^2}{\sigma_{\text{ON}}^2 + \sigma_{\text{OFF}}^2}$$

The SD of the distribution in Fig. 12*C* was lower than would be expected from perfectly correlated variability ($\sigma_C = 0.061$°/s) but was similar to what would be expected from statistical independence ($\sigma_I = 0.044$°/s). This suggests that speed estimates may be primarily limited by independent noise in on and off cells.

A further test was obtained by forcing the on and off data to be statistically independent, by combining on cell speed estimates from each trial with off cell speed estimates from a different trial. The distribution of pooled speed estimates across trials from the shuffled data, shown in the second panel of Fig. 12*C*, was similar to the distribution of pooled estimates from the original data, consistent with statistical independence in the original data.

Pooled data from all conditions tested are shown in Fig. 12*D*. For each condition, an index of covariation was computed: *I* = (σ_{P} − σ_{I})/(σ_{C} − σ_{I}), where σ_{P} is the SD of the pooled speed estimate, σ_{I} is the expectation from independent variability in on and off cells, and σ_{C} is the expectation from perfectly correlated variability in on and off cells. The index should assume a value of 0 in the case of independent data or 1 in the case of perfectly correlated data. The observed distribution of the index clusters near 0. The distribution of the same index computed on shuffled data from on and off cells (see preceding text) is similar.
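The pooling formula, the two reference SDs, and the covariation index can be sketched as follows. Applied to the SDs quoted for the example condition (0.054 and 0.076°/s, pooled 0.047°/s), the functions reproduce $\sigma_I \approx 0.044$ and $\sigma_C \approx 0.061$°/s. The expressions are symmetric in the two populations, so the assignment of the two SDs to on and off cells does not matter here. The Monte Carlo part uses synthetic estimates around a hypothetical true speed, not recorded data. It confirms that the index is near 0 for independent and near 1 for perfectly correlated variability.

```python
import numpy as np

def pooled_estimate(s_a, s_b, sd_a, sd_b):
    """Inverse-variance weighted combination of two speed estimates."""
    return (s_a * sd_b**2 + s_b * sd_a**2) / (sd_a**2 + sd_b**2)

def sd_if_independent(sd_a, sd_b):
    return float(np.sqrt(sd_a**2 * sd_b**2 / (sd_a**2 + sd_b**2)))

def sd_if_correlated(sd_a, sd_b):
    return float((sd_a * sd_b**2 + sd_b * sd_a**2) / (sd_a**2 + sd_b**2))

def covariation_index(sd_pooled, sd_a, sd_b):
    """I = (sigma_P - sigma_I) / (sigma_C - sigma_I): 0 if independent, 1 if perfectly correlated."""
    s_i, s_c = sd_if_independent(sd_a, sd_b), sd_if_correlated(sd_a, sd_b)
    return (sd_pooled - s_i) / (s_c - s_i)

# SDs quoted for the example condition of Fig. 12 (deg/s).
sd_a, sd_b, sd_p = 0.054, 0.076, 0.047
example = (sd_if_independent(sd_a, sd_b), sd_if_correlated(sd_a, sd_b),
           covariation_index(sd_p, sd_a, sd_b))
print("sigma_I=%.3f sigma_C=%.3f I=%.2f" % example)

# Monte Carlo sanity check with synthetic estimates.
rng = np.random.default_rng(0)
true_speed, n = 5.0, 100_000                     # hypothetical
z1, z2 = rng.standard_normal(n), rng.standard_normal(n)
mc_index = {}
for label, z_b in (("independent", z2), ("perfectly correlated", z1)):
    s_p = pooled_estimate(true_speed + sd_a * z1, true_speed + sd_b * z_b, sd_a, sd_b)
    mc_index[label] = covariation_index(float(np.std(s_p)), sd_a, sd_b)
print(mc_index)
```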

Taken together, these data indicate that the dominant source of speed estimate variability is independent in on and off parasol cells receiving inputs from roughly the same population of photoreceptors. Independence may result from neural processing and/or noise downstream of the photoreceptors (see discussion).

### Central limits on speed estimation

While noise and processing in retinal circuits limit how faithfully the brain can sense motion (see preceding text), central processing could impose additional limits. For example, if central circuits involved in motion sensing were to significantly corrupt their inputs with noise, behavioral motion sensitivity could be degraded. However, even in the absence of additional noise, the organization of motion sensing circuits in the brain could influence behavioral motion sensitivity. This possibility was explored by comparing two simple models of how signals from speed-tuned units may be combined to produce speed estimates:

##### PEAK.

Use the peak of the distribution of speed-tuned units as an estimate of stimulus speed (see Figs. 2 and 3*A*).

##### CENTROID.

Use the centroid of the distribution of speed-tuned units as an estimate of stimulus speed (i.e., the mean of the distribution of Figs. 2 and 3*A*). This approach, unlike the peak approach, is guaranteed to provide an unambiguous result in all stimulus conditions.

For both models, it was assumed that for each putative speed *s* over the range 0.5–500°/s, the response *N*(*s*) of a unit tuned to speed *s* is determined by an opponent cross-correlation of delayed spike trains (Fig. 2), summed across all pairs of RGCs. The peak speed estimate was the value of *s* which yielded the maximal value of *N*(*s*), as in preceding analyses. The centroid speed estimate was the weighted sum of speeds, with the weight equal to the response of the unit tuned to that speed: ∑_{s}*s* ⌊*N*(*s*)⌋/∑_{s} ⌊*N*(*s*)⌋ (64 logarithmically spaced samples of speed were used). Here, ⌊·⌋ represents clipping negative values to zero.
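The two readouts can be sketched as follows, using the 64 logarithmically spaced speeds from 0.5 to 500°/s given in the text. The population response $N(s)$ here is a hypothetical Gaussian bump in log speed plus independent Gaussian noise on every unit. It is not a model of the recorded responses and serves only to illustrate the mechanism discussed below: noise on units tuned far from the true speed enters the centroid but rarely moves the peak.

```python
import numpy as np

# 64 logarithmically spaced putative speeds over 0.5-500 deg/s, as in the text.
speeds = np.logspace(np.log10(0.5), np.log10(500.0), 64)

def peak_readout(N):
    """Speed of the unit with the largest response N(s)."""
    return float(speeds[np.argmax(N)])

def centroid_readout(N):
    """Response-weighted mean speed, with negative responses clipped to zero."""
    w = np.clip(N, 0.0, None)
    return float(np.sum(speeds * w) / np.sum(w))

# Hypothetical population response: a bump in log speed centred on the true speed.
true_speed = 8.0
bump = np.exp(-0.5 * ((np.log(speeds) - np.log(true_speed)) / 0.15) ** 2)

rng = np.random.default_rng(1)
peak_est, cent_est = [], []
for _ in range(2000):
    N = bump + rng.normal(0.0, 0.05, speeds.size)   # hypothetical response noise
    peak_est.append(peak_readout(N))
    cent_est.append(centroid_readout(N))
print(np.std(peak_est), np.std(cent_est))
```

In this toy, clipped noise on the many units with high preferred speeds shifts the centroid far more than it perturbs the location of the maximum. Note that the peak readout is quantized to the sampled speeds unless it is interpolated.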

Figure 13 (⧫) shows speed estimate variability obtained with centroid readout, as a function of speed. For comparison, variability obtained with peak readout is replotted from Fig. 6 (○). Two major trends are evident. First, centroid readout was up to an order of magnitude less precise than peak readout. Second, the difference in performance was largest at lower speeds, for which peak readout was most precise. The median ratio of speed estimate SD obtained with centroid readout to that obtained with peak readout was 6.7.

Several aspects of the difference in performance were examined in more detail. First, the difference cannot be attributed to coarse sampling of speed, because variability was asymptotic in the number of samples (not shown). Second, if centroid readout were more robust to noise than peak readout, the difference in performance could be diminished or reversed in the case of weaker responses (e.g., low contrast stimuli). This possibility was tested by artificially subsampling 1/2 or 1/4 of recorded spikes from each cell, and repeating the comparison using the subsampled data. The difference in performance was only slightly reduced (median ratio: 5.4, 5.9). Third, the difference could arise because in centroid readout, all units—including those carrying noise but little useful signal—contribute to the speed estimate, whereas in the peak readout only units with speed tuning near the correct speed contribute. This problem could be counteracted by broadening the speed tuning curve of each unit. Increasing the filter width for centroid readout to 100 ms, thereby increasing the speed tuning bandwidth, reduced speed estimate SD, but still yielded much greater SD than peak readout (median ratio: 5.3).

In summary, centroid readout provided less precise speed estimates than peak readout. If such an architecture were used in motion sensing circuits in the brain, these circuits could place the dominant limit on behavioral motion sensitivity in the conditions tested.

#### Controls

In the present experiments, parasol cells exhibited evoked firing rates comparable to what would be expected from in vivo studies, but low maintained firing rates (see methods). To investigate the effects of low maintained firing rates, speed estimation was performed with artificial spikes added to the data at random times, at a rate of 20 Hz. In the presence of artificially elevated background firing, the SD of speed estimates was approximately doubled in all conditions tested (median ratio: 2.1). Resampling of spike trains (Fig. 7*B*) continued to produce a systematic increase in speed estimate SD (median ratio: 1.2). Centroid readout (Fig. 13) continued to produce higher speed estimate SD than peak readout (median ratio: 12.3).
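The background-firing control can be sketched as below. The text states only that spikes were added "at random times, at a rate of 20 Hz". Modeling them as a homogeneous Poisson process is an assumption, and the function name and trial duration are hypothetical.

```python
import numpy as np

def add_background_spikes(spike_times, rate_hz, duration_s, rng):
    """Merge one spike train (seconds) with spikes from a homogeneous Poisson
    process of rate rate_hz over [0, duration_s)."""
    n_extra = rng.poisson(rate_hz * duration_s)          # number of added spikes
    extra = rng.uniform(0.0, duration_s, size=n_extra)   # uniform times given the count
    return np.sort(np.concatenate([np.asarray(spike_times, dtype=float), extra]))

# Hypothetical usage: a 2 s trial with three recorded spikes and 20 Hz background.
rng = np.random.default_rng(3)
recorded = np.array([0.1, 0.5, 1.2])
with_background = add_background_spikes(recorded, 20.0, 2.0, rng)
```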

## DISCUSSION

We have established the limits to sensory performance imposed by a neural population code in a behaviorally relevant visual task. To systematically examine the entire visual signal relevant for speed estimation, we performed large-scale simultaneous recordings from mosaics of on and off parasol cells, which provide the dominant inputs to motion-sensing circuits in the primate brain. This approach illuminated some of the key issues involved in reading a population code. Simple neural computations (Figs. 2 and 3) at time scales optimized for readout precision (Fig. 4) efficiently extracted information about stimulus speed from RGC spike trains (Fig. 5). The ensemble activity of ∼100 RGCs signaled speed with a precision of ∼1% (Fig. 6), much finer than previous estimates of speed discrimination in human observers. Precision was not influenced by correlated activity in RGCs but did depend on the intrinsic timing structure of RGC spike trains (Fig. 7). The effects of stimulus speed (Fig. 6) and the number and spatial arrangement of RGCs (Figs. 8–10) were explained simply in terms of the timing variability of RGC responses and optimal pooling. This framework provided a basis for predicting the speed estimate precision that would be obtained with stimuli of different speeds covering different areas of retina. on and off cells with overlapping receptive fields provided signals with similar precision that were nonredundant (Figs. 11 and 12), probably due to neural processing downstream of the photoreceptors. Finally, simulations indicated that the architecture of readout from populations of speed-tuned neurons in the brain could profoundly affect how efficiently retinal motion signals are exploited for visual perception and behavior (Fig. 13).

### Extracting motion signals from the ensemble code

To interpret the precision of speed estimates as revealing intrinsic limits of retinal signals, it is important that the readout approach efficiently exploit the information about speed available in RGC spike trains. Previous work has indicated that some approaches to motion estimation are more efficient than others, depending on the signal-to-noise ratio at the encoding stage (Potters and Bialek 1994). Speed estimates were obtained from a population of speed-tuned sensors created by low-pass filtering of RGC spike trains followed by delay and cross-correlation. This estimation approach relies on the assumption that the essential information about motion is given by alignment, after appropriate translation in time, of spike trains from different cells; cross-correlation is one measure of alignment. Various alternative measures of alignment exhibited similar speed estimation performance, suggesting that the cross-correlation measure extracted most of the information available and that the conclusions are likely to generalize to any motion sensing algorithm that compares the timing of responses in different cells. In addition, the dependence of speed estimates on the number and spatial arrangement of cells (Figs. 8–10) was consistent with optimal pooling of signals from all pairs of cells. However, the possibility that entirely different speed estimation procedures would yield more precise estimates cannot be excluded.

Even if readout is efficient, the “ideal observer” approach adopted here (e.g., Banks et al. 1987; Geisler 1989) carries important caveats for interpretation of visual system function. The readout procedure effectively assumed prior knowledge of several stimulus properties: a one-dimensional pattern of light intensity, the axis of motion, the trial duration, and the relevant retinal area. An organism rarely if ever has access to such prior knowledge, and the visual system may be unable to exploit it. A full assessment of the retinal limits on motion sensing may require more realistic assumptions about how the signals are used downstream. For example, motion sensing may be more spatially localized than the readout approach adopted here. Interestingly, a simple model of speed estimate variability (*Eq. 3*) suggests a steep penalty for restricting the spatial extent over which speed is computed (Fig. 10).

Note that the readout approach used here is not intended as an explicit model of motion sensing in the brain. However, cross-correlation and filtering are essential elements in computational models of motion sensing (Adelson and Bergen 1985; see Clifford and Ibbotson 2003; Emerson et al. 1992; Reichardt 1961; Simoncelli and Heeger 1998; Watson and Ahumada 1985). For example, the input-output properties of Reichardt detectors and motion energy sensors are in some cases identical because they both rely on pairwise multiplication, or summing and squaring, of input signals filtered differently in time and space (Adelson and Bergen 1985). The approach used here can also be described both ways (see RESULTS).
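
To make the Reichardt-style description concrete, the sketch below implements a minimal opponent Hassenstein-Reichardt correlator for two inputs. It is an illustrative assumption, not the readout used in this study: a first-order low-pass filter stands in for the delay arm, and the time constant and stimulus bumps are hypothetical. Each half-detector multiplies a delayed copy of one input with the undelayed other input; subtracting the mirror-image half gives a signed, direction-selective output.

```python
import numpy as np

def lowpass(signal, tau, dt):
    """First-order low-pass filter; plays the role of the delay arm of the detector."""
    out = np.zeros(len(signal))
    alpha = dt / (tau + dt)
    for k in range(1, len(signal)):
        out[k] = out[k - 1] + alpha * (signal[k] - out[k - 1])
    return out

def reichardt(a, b, tau=0.02, dt=0.001):
    """Opponent correlator: positive for motion from location a to location b."""
    return np.sum(lowpass(a, tau, dt) * b - a * lowpass(b, tau, dt)) * dt

t = np.arange(0.0, 0.2, 0.001)                 # s, hypothetical time base

def bump(t0):
    """Brief response at time t0 (hypothetical stimulus-evoked input)."""
    return np.exp(-0.5 * ((t - t0) / 0.003) ** 2)

a_first = reichardt(bump(0.05), bump(0.06))    # stimulus reaches a, then b
b_first = reichardt(bump(0.06), bump(0.05))    # stimulus reaches b, then a
```

The output is antisymmetric by construction: swapping the two inputs flips its sign, which is what makes the opponent stage direction selective.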

Future experiments using stimulus manipulations beyond those examined here (speed, direction, contrast polarity) could be valuable. For instance, weaker stimuli (e.g., lower contrast) are likely to increase the optimal filter width by creating sparser spike trains, forcing longer temporal integration for faithful motion sensing (see Chichilnisky and Kalmar 2003). Spatially extended stimuli (e.g., moving textures) could improve the fidelity of speed estimates by simultaneously stimulating all cells in the region recorded. Different spatial patterns may also have different effects on response synchrony, the stimulus dependence of which is poorly understood. Finally, stimuli with time-varying rather than fixed speed, and corresponding time-varying speed estimates (Bialek et al. 1991), may more closely approximate natural behavior.

### Temporal structure of RGC motion signals

The elementary time scale of RGC speed signals is reflected in the optimal filter width for readout (∼10 ms), which roughly matched that in an earlier study of left-right direction estimation (Chichilnisky and Kalmar 2003). However, the earlier study did not test the possibility that finer timing precision sometimes observed in RGC spike trains (Berry et al. 1997; Reich et al. 1997; Uzzell and Chichilnisky 2004) could be exploited for fine-grained tasks such as speed estimation. A prediction from the present results is that synapses on motion-sensitive cortical neurons may temporally filter input spikes on the time scale of ∼10 ms. Of course, the brain may perform motion estimation with nonoptimal filtering.

Many models of visual processing implicitly assume a simplified model of the time structure of RGC spike trains, namely, that spikes are generated according to a Poisson process with a time-varying firing rate determined by the stimulus. The Poisson description is undoubtedly wrong in detail (e.g., it does not allow for temporal patterns in spike trains caused by refractoriness and bursting). However, the importance of temporal structure for downstream computations is controversial (see e.g., Shadlen and Newsome 1998; Victor 1999), and a Poisson approximation may suffice for understanding the limits on visual performance in some tasks (e.g., Dhingra and Smith 2004; but see J. W. Pillow, L. Paninski, V. J. Uzzell, E. P. Simoncelli, and E. J. Chichilnisky, unpublished data). In the present study, Poisson simulations, matched for the time-varying firing rate of RGC data, yielded systematically less precise speed estimates. Thus the intrinsic temporal structure of RGC spike trains is important for signaling speed information to the brain. Interestingly, exploiting the intrinsic structure did not require an elaborate decoding procedure—instead, merely filtering and correlating responses, effectively comparing the timing of responses in different cells, sufficed. Thus the conclusions are likely to generalize to any procedure that relies on comparison of responses in different cells.
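
A rate-matched Poisson surrogate of the kind described here can be generated by estimating a peristimulus time histogram and resampling spikes from it. The sketch below is one possible construction under stated assumptions (the bin width, uniform placement of spikes within a bin, and the toy "data" of regularly spaced spikes are all hypothetical); it is not necessarily how the simulations in this study were produced.

```python
import numpy as np

def psth(trials, t_start, t_stop, bin_width):
    """Trial-averaged firing rate (spikes/s) in bins of width `bin_width` (s)."""
    n_bins = int(round((t_stop - t_start) / bin_width))
    edges = t_start + bin_width * np.arange(n_bins + 1)
    counts = sum(np.histogram(tr, bins=edges)[0] for tr in trials)
    return edges, counts / (len(trials) * bin_width)

def poisson_surrogate(edges, rate, n_trials, rng):
    """Inhomogeneous Poisson spike trains that follow the given time-varying rate."""
    widths = np.diff(edges)
    surrogate = []
    for _ in range(n_trials):
        counts = rng.poisson(rate * widths)            # independent spike count per bin
        spikes = [rng.uniform(lo, hi, n)               # spikes placed uniformly within a bin
                  for lo, hi, n in zip(edges[:-1], edges[1:], counts)]
        surrogate.append(np.sort(np.concatenate(spikes)))
    return surrogate

# Hypothetical "data": 200 trials of 10 regularly spaced spikes with 1 ms jitter.
rng = np.random.default_rng(0)
data = [0.015 + 0.02 * np.arange(10) + rng.uniform(-0.001, 0.001, 10) for _ in range(200)]
edges, rate = psth(data, 0.0, 0.2, 0.005)
surrogate = poisson_surrogate(edges, rate, 2000, rng)

data_counts = np.array([len(tr) for tr in data])
surr_counts = np.array([len(tr) for tr in surrogate])
```

The surrogate preserves the mean time-varying rate but discards intrinsic structure such as refractoriness: in this toy example the regularly spaced data have the same spike count on every trial, whereas the surrogate counts vary from trial to trial with a variance close to the mean.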

### Correlated firing and the population code

A major open question in population coding is whether a sensory signal conveyed by the collective activity of many neurons can be understood based on sequential measurements from individual neurons or whether simultaneous recordings from multiple neurons are required. Significant departures from statistical independence observed in RGC spike trains (DeVries 1999; Mastronarde 1983; Meister et al. 1995) have been proposed as evidence for the importance of simultaneous recordings (Meister et al. 1994, 1995). Speed estimation with shuffled responses to enforce statistical independence indicated that covariation in responses of nearby RGCs does not fundamentally change retinal motion signals. As in the analysis of temporal structure, this conclusion is likely to apply to any downstream decoding procedure which compares responses of different cells. Thus recordings from single neurons may in some sense suffice for understanding the population code for motion.
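
Shuffling of this kind can be sketched by independently permuting the trial order of each cell's responses to repeated presentations of the same stimulus. This is a minimal illustration with hypothetical inputs (placeholder spike arrays whose counts share a common trial-to-trial fluctuation), not the authors' code.

```python
import numpy as np

def shuffle_trials(responses, rng):
    """responses[c][k]: spike times of cell c on trial k (repeats of one stimulus).
    Independently permuting each cell's trial order keeps every cell's own
    responses but destroys trial-by-trial covariation between cells."""
    n_trials = len(responses[0])
    return [[cell[k] for k in rng.permutation(n_trials)] for cell in responses]

def counts(cell):
    """Spike count on each trial."""
    return np.array([len(tr) for tr in cell])

# Hypothetical pair of cells whose spike counts covary perfectly across trials.
rng = np.random.default_rng(1)
shared = rng.poisson(5, size=500)                  # common fluctuation on each trial
cell_a = [np.zeros(n) for n in shared]             # placeholder spike-time arrays
cell_b = [np.zeros(n) for n in shared]
shuffled = shuffle_trials([cell_a, cell_b], rng)

r_before = np.corrcoef(counts(cell_a), counts(cell_b))[0, 1]
r_after = np.corrcoef(counts(shuffled[0]), counts(shuffled[1]))[0, 1]
```

Each cell's set of responses is unchanged, so single-cell statistics are preserved; only the pairing of trials across cells, and hence any covariation between simultaneously recorded cells, is removed.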

However, there are three major caveats to this conclusion. First, the present results may not generalize to other tasks, and for any given task, the importance of correlated activity can only be examined rigorously using simultaneous recordings. Second, synchronized spikes were not treated differently from other spikes by the cross-correlation computation. It is conceivable that a decoding algorithm could explicitly exploit synchronized spikes to yield more precise motion sensing (Dan et al. 1998; Warland et al. 1997). The present results can be interpreted as showing that common input and noise to RGCs, as reflected in synchronized spikes, are not important for speed estimation. Third, simultaneous recordings provide major technical advantages, such as revealing the spatial arrangement of receptive fields, response kinetics, and overall sensitivity, which can be compared reliably between cells in a population. Thus simultaneous recordings are crucial for examining the fidelity of visual signals—such as motion—that rely on comparison of activity in different cells.

### Retinal limits on motion signaling fidelity

The precision of speed estimates obtained by combining estimates from on and off parasol cells indicates that these cell types carry motion information that is nearly statistically independent, even when their receptive fields cover roughly the same retinal area. Because the on and off pathways diverge at the cone-bipolar synapse, a possible explanation is that the physiological noise that limits motion signal fidelity originates downstream of the photoreceptors. However, a more parsimonious explanation is that because of rectification downstream of the photoreceptors, responses in on and off cells are driven primarily by different temporal components of the elementary photoreceptor response, such as the hyperpolarizing and depolarizing phases, respectively. Rectification could explain the observed independence because different components of the photoreceptor response exhibit independent noise (Schneeweis and Schnapf 1999; but see Schnapf et al. 1990).
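
The independence argument can be made quantitative with a standard inverse-variance combination of two estimators. The sketch below assumes, for illustration only, Gaussian per-trial errors of about 1% in arbitrary units and an inverse-variance weighting rule; neither the weighting nor the simulated numbers are taken from this study. If ON and OFF errors are independent, the SD of the combined estimate should match 1/sqrt(1/σ_on² + 1/σ_off²); a positive correlation ρ between them makes it larger (for equal SDs, by a factor of sqrt(1 + ρ)).

```python
import numpy as np

def combine(est_on, est_off):
    """Inverse-variance weighted average of per-trial ON and OFF speed estimates."""
    w_on = 1.0 / np.var(est_on, ddof=1)
    w_off = 1.0 / np.var(est_off, ddof=1)
    return (w_on * est_on + w_off * est_off) / (w_on + w_off)

def sd_if_independent(est_on, est_off):
    """SD expected for the combined estimate if ON and OFF errors are independent."""
    return (1.0 / np.var(est_on, ddof=1) + 1.0 / np.var(est_off, ddof=1)) ** -0.5

# Hypothetical estimates with ~1% SD (arbitrary units).
rng = np.random.default_rng(2)
z_on, z_off, z_extra = rng.standard_normal((3, 20000))
on = 1.0 + 0.01 * z_on
off_indep = 1.0 + 0.01 * z_off                                     # independent of ON
off_corr = 1.0 + 0.01 * (0.9 * z_on + np.sqrt(1 - 0.81) * z_extra)  # correlation 0.9 with ON

ratio_indep = np.std(combine(on, off_indep), ddof=1) / sd_if_independent(on, off_indep)
ratio_corr = np.std(combine(on, off_corr), ddof=1) / sd_if_independent(on, off_corr)
```

With independent errors the ratio is close to 1; with ρ = 0.9 it is close to sqrt(1.9) ≈ 1.38, so the benefit of pooling largely disappears.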

Irrespective of its origin, the independence of speed estimates in the on and off pathways suggests that it would be advantageous for motion-sensitive neurons in the cortex to pool motion signals derived from on and off inputs. Such pooling occurs in single neurons of cat visual cortex (Sherk and Horton 1984).

### Relation to central motion sensing

To determine whether retinal signals impose the main limit on behavioral motion sensing, three experimental predictions from the present work could be tested. First, behavioral speed estimates using bars of matched intensity, contrast, eccentricity, size, and duration should exhibit a fractional variability of ∼1% (see Fig. 6). Second, fractional speed estimate variability should increase with speed according to the relation predicted from the temporal precision of RGC signals (see Fig. 6 and *Eq. 4*). Previously reported speed estimates in the periphery exhibited variability nearly an order of magnitude higher and a different trend with speed, but the stimuli used in those experiments differed considerably (McKee and Nakayama 1984); in particular, stimulus duration, rather than extent of stimulus travel, was fixed across speeds. Third, behavioral speed estimate SD should be lower for stimuli elongated in the direction of motion than for stimuli elongated orthogonal to the direction of motion (see Figs. 8 and 9) and should decline as the 3/2 and 1/2 powers of stimulus extent in these two dimensions, respectively. This is qualitatively consistent with previous observations on human speed discrimination, though those experiments were performed with random-dot stimuli (Vreven and Verghese 2002). Note that these predictions rely on the assumption that observers can integrate information efficiently over space and time. A rigorous test of this assumption, as well as of all predictions about behavioral performance, will require psychophysical experiments using stimuli matched for intensity, contrast, eccentricity, size, and duration. Such experiments must also account for eye movements, which alter image velocity on the retina.
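
One simple model consistent with the exponents quoted above, offered as an illustrative assumption rather than as the paper's *Eq. 3* or *Eq. 4*, treats the time at which each cell responds as t_i = x_i/v + ε_i, where x_i is the cell's position along the axis of motion and ε_i is independent timing jitter with SD σ_t. Optimal (least-squares) pooling estimates 1/v as the slope of t against x, with SD σ_t/sqrt(Σ(x_i - x̄)²). For cells tiling a region of length L along the motion and width W orthogonal to it, Σ(x_i - x̄)² grows roughly as L³W, so the speed SD falls as L^(-3/2) and W^(-1/2), and the fractional SD grows linearly with v. The lattice geometry and parameter values below are hypothetical.

```python
import numpy as np

def fractional_sd(length, width, spacing, speed, jitter):
    """SD(v_hat)/v for a bar crossing cells on a square lattice, assuming each cell's
    response time is x/v plus independent Gaussian jitter (SD `jitter`, s) and that
    response times are pooled optimally by a least-squares fit of t against x."""
    n_cols = int(round(length / spacing))          # cells along the motion axis
    n_rows = max(1, int(round(width / spacing)))   # cells stacked orthogonal to motion
    x = np.tile(spacing * np.arange(n_cols), n_rows)
    sxx = np.sum((x - x.mean()) ** 2)
    # The fitted slope 1/v has SD jitter / sqrt(sxx); propagating to v multiplies by v**2,
    # so the fractional SD is v * jitter / sqrt(sxx).
    return speed * jitter / np.sqrt(sxx)

# Hypothetical units: mm, mm/s, s.
base = dict(length=1.0, width=0.5, spacing=0.01, speed=4.0, jitter=0.005)
along = fractional_sd(**base) / fractional_sd(**{**base, "length": 2.0})
across = fractional_sd(**base) / fractional_sd(**{**base, "width": 1.0})
faster = fractional_sd(**{**base, "speed": 8.0}) / fractional_sd(**base)
```

In this model, doubling the extent along the motion axis cuts the SD by about 2^(3/2) ≈ 2.8, whereas doubling the orthogonal extent cuts it only by √2 ≈ 1.4, and doubling the speed doubles the fractional SD.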

If cortical networks involved in sensing motion add noise or rely on few neurons, they could impose a limit to performance more severe than that imposed by the retina. In addition, the architecture of central motion readout could impose major limits. In the present work, stimulus speed was estimated using the peak of the activity in a collection of detectors tuned for speed (see Fig. 3*A*). However, the brain may use a different architecture. Several studies have indicated that behavioral readout from speed-tuned neurons in visual area MT may rely on the centroid of activity (vector average) rather than the peak (winner take all) (Groh et al. 1997; Lisberger and Ferrera 1997; Priebe and Lisberger 2004; but see Ferrera and Lisberger 1995; Nichols and Newsome 2002). The centroid computation yields unique speed estimates over a wide range of stimulus conditions, which may be desirable for controlling essentially unitary behavioral output such as an eye movement. The present results show that centroid readout can yield much less precise speed estimates. This may occur because all units contribute to the speed estimate, whereas in the peak computation only neurons with speed tuning near the true speed contribute (Seung and Sompolinsky 1993). Centroid readout exhibited performance more similar to peak readout at high speeds, a regime in which variability was high for both procedures. This suggests that different architectures may be appropriate in high and low signal-to-noise regimes (Potters and Bialek 1994). However, subsampling spike trains had little effect on the increased precision provided by peak readout.
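
The difference between the two readout rules can be illustrated with a toy population of speed-tuned units. This sketch is not the simulation of Fig. 13: Gaussian tuning in log speed, a uniform baseline rate, and all parameter values are hypothetical, and responses are noise-free so that only the readout rule differs. Peak readout reports the preferred speed of the most active unit; centroid readout averages preferred speeds weighted by activity, so units tuned far from the true speed still contribute.

```python
import numpy as np

def peak_readout(prefs, rates):
    """Winner take all: preferred value of the most active unit."""
    return prefs[np.argmax(rates)]

def centroid_readout(prefs, rates):
    """Vector average: activity-weighted mean of preferred values."""
    return np.sum(rates * prefs) / np.sum(rates)

# Hypothetical population: preferred log2 speeds spanning 0 to 6.
prefs = np.linspace(0.0, 6.0, 61)
true_log_speed = 1.5
baseline, gain, tuning_width = 5.0, 50.0, 0.5
rates = baseline + gain * np.exp(-0.5 * ((prefs - true_log_speed) / tuning_width) ** 2)

peak = peak_readout(prefs, rates)
centroid = centroid_readout(prefs, rates)
```

Here the peak recovers the true log speed, while baseline activity in units tuned to other speeds pulls the centroid toward the middle of the preferred range (log2 speed 3). With noisy responses, the response noise of those off-peak units also enters the centroid estimate.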

The fidelity of visual motion signals has been examined most extensively in single neurons of area MT, which are tuned for stimulus direction and speed (Albright et al. 1984; Dubner and Zeki 1971; Maunsell and Van Essen 1983; Perrone and Thiele 2001) and are important for motion perception and behavior guided by motion (Newsome et al. 1985; Salzman et al. 1990). Direction discrimination based on responses of individual MT cells is on average comparable to behavioral direction discrimination (Britten et al. 1992), suggesting efficient readout downstream. However, it is unclear how much information is carried by the entire population of MT cells covering a particular region of visual space (Britten et al. 1992; Zohary et al. 1994). In the present work, the mosaic organization of parasol cells provided strong evidence that nearly the entire visual motion signal available to the brain from the parasol population was recorded, over an area at least the size of the receptive field of a V1 neuron at the same eccentricity. Parasol cells project to the magnocellular layers of the LGN (Perry et al. 1984; see Rodieck 1998), and lesion experiments indicate an important role of magnocellular neurons in motion perception (Merigan and Maunsell 1990; Schiller et al. 1990; see Merigan and Maunsell 1993; Van Essen 1985), suggesting an important role for parasol cells. However, at least 13 distinct RGC types project to the primate LGN and additional types project to subcortical targets (Dacey et al. 2003; see Rodieck 1998; Rodieck and Watanabe 1993). The coarse resolution of lesion experiments, the possibility that cell types other than parasols project to the magnocellular layers of the LGN, and the unknown role of RGCs that project to targets other than the LGN leave open the possibility that other RGC types contribute to motion sensing. In spite of this caveat, it is clear that parasol cells provide a major component of the visual motion signals used by the cortex. Thus a direct comparison of the present results to behavioral speed estimation in matched experimental conditions may provide insights into whether retinal processing or downstream processing places the ultimate limit on motion sensitivity.

### Broader implications

Previous experimental studies of neural coding have focused largely on individual neurons, an approach that has significant limitations for systems in which information is represented by the spatiotemporal activity of a population of neurons. Examples include the encoding of position, color, and texture in the early visual system, object shape in the whisker barrel fields of somatosensory cortex, frequency modulation in auditory nerve activity, and prey location in the electrosensory system of electric fish. Several conclusions from the present study may extend to other systems. *1*) The statistics of spike trains define a natural time scale over which information is conveyed, which in turn predicts the time scale for synaptic integration in neurons that read out the population. *2*) Intrinsic structure in spike trains (such as refractoriness) may be used to convey more information than would be expected from time-varying firing rates alone. *3*) Even if correlated activity is clear and powerful, it may have little influence on downstream processing. *4*) The spatial arrangement of neural activity may influence the fidelity of the population code much more strongly than the √*N* improvements expected from averaging *N* inputs, and this dependence may be explained by timing variability of neural responses and optimal pooling. *5*) Different populations of cells may convey statistically independent information, thus providing benefits to pooling, even if their inputs are common. *6*) In addition to downstream noise, the architecture of downstream circuits can profoundly influence how efficiently the population code is read out. These principles may help to further elucidate the factors that limit the fidelity of other population codes and the readout strategies employed by the nervous system.

## GRANTS

This work was supported by La Jolla Interfaces in Science, National Institutes of Health MSTP Fellowship GM-07198, and a Merck Fellowship (E. S. Frechette); National Science Foundation Grant PHY-9988753 (A. M. Litke), and National Institutes of Health Grant EY-13150, Sloan Foundation Fellowship, and McKnight Foundation Scholars Award (E. J. Chichilnisky).

## Acknowledgments

We thank E. Callaway for providing access to tissue, W. Dabrowski, A. Grillo, P. Grybos, P. Hottowy, and S. Kachiguine for technical development, R. Krauzlis, F. Rieke, G. Field, S. du Lac, and D. Brainard for valuable comments on the manuscript, J. French and R. Kalmar for experimental assistance, and S. Barry and S. Bagnall for technical assistance.

## Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Copyright © 2005 by the American Physiological Society