1The Salk Institute and University of California, San Diego; and 2University of California, Santa Cruz, California
Submitted 15 November 2004; accepted in final form 27 December 2004
| ABSTRACT |
|---|
~100 parasol RGCs simultaneously in isolated retinas stimulated with moving bars. To examine how faithfully the retina signals motion, stimulus speed was estimated directly from recorded RGC responses using an optimized algorithm that resembles models of motion sensing in the brain. RGC population activity encoded speed with a precision of ~1%. The elementary motion signal was conveyed in ~10 ms, comparable to the interspike interval. Temporal structure in spike trains provided more precise speed estimates than time-varying firing rates. Correlated activity between RGCs had little effect on speed estimates. The spatial dispersion of RGC receptive fields along the axis of motion influenced speed estimates more strongly than along the orthogonal direction, as predicted by a simple model based on RGC response time variability and optimal pooling. ON and OFF cells encoded speed with similar and statistically independent variability. Simulation of downstream speed estimation using populations of speed-tuned units showed that peak (winner take all) readout provided more precise speed estimates than centroid (vector average) readout. These findings reveal how faithfully the retinal population code conveys information about stimulus speed and the consequences for motion sensing in the brain.
| INTRODUCTION |
|---|
Approaching these problems poses a major challenge: recording from the entire population of cells relevant for a behaviorally important sensory task. Although modern techniques allow recording from several dozen neurons simultaneously, in most experimental systems it is unclear how to target the cells responsible for a specific neural computation and record from the entire population. A system with unusual promise is the encoding of visual motion in the primate retina (Chichilnisky and Kalmar 2003). Waves of activity in the population of parasol (magnocellular-projecting) RGCs carry information about visual motion to circuits in the brain responsible for motion sensing, and it has recently become possible to record from ~100 parasol cells with receptive fields that tile almost completely a significant region of visual space (Chichilnisky and Kalmar 2002; Litke et al. 2004). Because parasol cells are not individually direction selective, visual motion information is carried by population activity. The fidelity of this population code places the ultimate limits on cortical motion processing and behavioral motion sensing, which have been examined extensively in monkeys and humans. Finally, the wave of activity traversing the retina is an elementary representation likely to be recapitulated in other sensory structures. For these reasons, the encoding of motion in the primate retina provides an opportunity to understand fully a behaviorally important population code and the problems faced by the brain in reading it out.
Here we focus on estimating the speed of a moving object from retinal responses, which is required for visually guided behaviors such as tracking eye movements and target interception. To study the fidelity of the population code, we estimated stimulus speed directly from the responses of ~100 ON and OFF parasol RGCs simultaneously recorded, using an efficient procedure. We then examined how the precision of speed estimates depended on several aspects of the retinal representation: the number and spatial arrangement of cells, detailed temporal patterns of spiking, correlated activity between cells, noise in retinal circuits, and the relative efficiency and independence of signals in different cell types. We provide a theoretical framework to explain the observed speed estimate precision in terms of optimal pooling of RGC responses with a given temporal precision. Finally, we examine the limits to motion sensing that would be imposed by different readout architectures in the brain. Together, these results reveal how retinal processing and signaling limit the fidelity of visual motion sensing and how downstream structures can most efficiently exploit the retinal population code for perception and behavior.
| METHODS |
|---|
Eyes were obtained from two deeply and terminally anesthetized macaque monkeys (Macaca mulatta, M. radiata) used by other experimenters, in accordance with institutional guidelines for the care and use of animals. Immediately after enucleation the anterior portion of the eye and vitreous were removed in room light and the eye cup was placed in bicarbonate buffered Ames' solution (Sigma, St. Louis, MO) and stored in darkness at 35–36°C, pH 7.4, for ~20 min prior to dissection. Under infrared illumination pieces of peripheral retina 3–5 mm in diameter, isolated from the retinal pigment epithelium, were placed flat against a planar array of 512 extracellular microelectrodes, covering an area of 1,890 × 900 µm, that were used to record action potentials from retinal ganglion cells (Litke et al. 2004). The preparation was perfused with Ames' solution bubbled with 95% O2-5% CO2 and maintained at 35–36°C, pH 7.4.
Retinal eccentricity was measured with a precision of ±2 mm. Eccentricity was converted to a temporal equivalent value because the contours of constant RGC density (and thus presumably dendritic and receptive field size) in the macaque monkey retina are approximately semicircular in the temporal half of the retina, but elliptical with an aspect ratio of 0.61 in the nasal half (Perry and Cowey 1985; Watanabe and Rodieck 1989). Thus a location X mm nasal and Y mm superior (or inferior) to the fovea was assigned an equivalent eccentricity of $[(0.61X)^2 + Y^2]^{1/2}$. A location X mm temporal and Y mm superior (or inferior) to the fovea was assigned an equivalent eccentricity of $(X^2 + Y^2)^{1/2}$. Visual angle, A, in degrees, was computed from temporal equivalent eccentricity, E, in mm, using the relation $A = 0.1 + 4.21E + 0.038E^2$ (Dacey and Petersen 1992; Perry and Cowey 1985). The temporal equivalent eccentricity (visual angle) of each of the three pieces of retina examined was: 9.7 mm (45°); 9.0 mm (41°); 8.4 mm (38°).
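The conversion described above can be written out as a short calculation. The following is a minimal sketch in Python; the function names are illustrative and are not taken from the original analysis code. Only the constants 0.61, 0.1, 4.21, and 0.038 come from the text.

```python
import math


def temporal_equivalent_eccentricity(x_mm, y_mm, nasal):
    """Equivalent eccentricity (mm) of a point x_mm horizontal, y_mm vertical from the fovea.

    Nasal locations are compressed horizontally by the aspect ratio 0.61 quoted in the text.
    """
    scale = 0.61 if nasal else 1.0
    return math.hypot(scale * x_mm, y_mm)


def visual_angle_deg(ecc_mm):
    """Visual angle A (degrees) from temporal equivalent eccentricity E (mm)."""
    return 0.1 + 4.21 * ecc_mm + 0.038 * ecc_mm ** 2


# Eccentricities of the three retinal pieces reported in the text.
for ecc in (9.7, 9.0, 8.4):
    print(f"E = {ecc} mm -> A = {visual_angle_deg(ecc):.1f} deg")
```

Rounded to the nearest degree, the three results agree with the visual angles quoted in the text (45°, 41°, and 38°).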
Voltage waveforms recorded from each electrode were digitized at 20 kHz and stored for off-line analysis (Litke et al. 2004). Spikes were identified using a threshold equal to three times the typical noise level on each electrode, and spikes from different cells were segregated as follows (Litke et al. 2004). For each recorded spike on the reference electrode, the waveform of the spike and the simultaneous waveforms on six surrounding electrodes in the array were used as a signature of the spike. These signatures were reduced to five dimensions using principal components analysis, and clusters in this space were identified by fitting a collection of N-dimensional Gaussian distributions using expectation maximization. Duplicate cells were identified by temporal coincidence. The accuracy of spike sorting was checked by verifying the presence of refractoriness 0.5–1.0 ms after the spike.
Data from ON and OFF parasol cells recorded from three preparations are presented in RESULTS. ON and OFF populations were analyzed separately, because of their different response kinetics (Chichilnisky and Kalmar 2002). The following numbers of cells were analyzed: retina 1: 40 ON, 49 OFF; retina 2: 56 ON, 35 OFF; retina 3: 63 ON, 68 OFF. To exclude the possibility that a small number of unstable or subsampled cells would influence the results, a small number of additional cells with response properties differing substantially from other cells of the same functional type were identified and excluded as follows. Spike trains of all cells of the same type were aligned in time by circularly shifting by an amount equal to the location of the center of the receptive field divided by the stimulus speed. The inner product of the response of each cell with the mean response across all cells was computed. Cells for which the inner product was >2 SDs from the mean were excluded from further analysis (7, 12, and 10 cells in the 3 retinas examined).
Stimuli
The retina was stimulated with the optically reduced (2.9 mm diam) image of a cathode ray tube display refreshing at 120 Hz, focused on the photoreceptor layer by a microscope objective, and centered on the electrode array. Stimuli were attenuated to low photopic light levels using neutral density filters. Stimuli were presented as modulations around a mean gray background. The background photon absorption rate for the long (middle, short) wavelength-sensitive cones was approximately equal to the rate that would have been caused by a spatially uniform monochromatic light of wavelength 561 (530; 430) nanometers and intensity 9,200 (8,700; 7,100) photons·µm⁻²·s⁻¹, incident on the photoreceptors.
RGCs were characterized and classified on the basis of their responses to a spatiotemporal white noise stimulus presented for 30 min (see Chichilnisky 2001; Sakai et al. 1988). The stimulus was a square lattice of randomly flickering pixels. Random flicker was created by selecting the intensities of the red, green, and blue display phosphors at each pixel location independently from a Gaussian or binary (2-valued) distribution on each stimulus frame. The light response properties of each cell were summarized by the average stimulus on the display over 250 ms preceding a spike (spike-triggered average, STA). The STA is a measure of how effectively stimuli at different locations and with different colors are integrated by the cell over time to control firing. The structure of each receptive field was measured by fitting the STA with a difference of elliptical Gaussians (center-surround) spatial profile, a difference of low-pass filters temporal profile, and a relative sensitivity to modulation of each phosphor. The product of these terms provided accurate fits to the space-time-color STA (Chichilnisky and Kalmar 2002). The receptive field diameter was defined as the geometric mean of the lengths of the major and minor axes of the 1 SD ellipse of the center component of the fit to the STA. The mean receptive field diameters for the parasol cells in the three retinas recorded were 150, 128, and 109 µm, respectively.
Moving bars were presented in blocks of trials with constant speed; direction of motion (0, 90, 180, and 270°) and contrast (±96%) were randomly interleaved within each block. The spatial profile of the bar in the direction of motion was a Gaussian function with a SD of 97 µm. The spatial profile of the bar orthogonal to the direction of motion was uniform and covered the entire area recorded. The speeds (number of trials) probed in each retina were: 7.3°/s (110–167 trials); 14.5°/s (144–214 trials); 29.0°/s (232–347 trials); 58.1°/s (338–505 trials). Stimulus dimensions and speeds were converted to degrees using the approximation 200 µm/° for the peripheral macaque retina (Perry and Cowey 1985).
The rasterization of the CRT display introduced a space-time sampled approximation of a moving bar. For example, a bar nominally moving at 58.1°/s (the highest speed tested) was in fact redrawn on the CRT every 8.33 ms displaced by 97 µm. The effect of this discretization was probably small. First, the refresh interval of the display was significantly shorter than the ~60 ms excitatory portion of the parasol RGC impulse response (Chichilnisky and Kalmar 2002). Second, the spatial displacement of the bar at the highest speed tested was 1 SD of the bar profile and smaller than receptive field diameter and separation of ON and OFF parasol cells (e.g., see Fig. 1A).
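The size of the discretization step follows directly from the numbers given above. As a rough check, the sketch below computes the per-frame displacement of the bar for each speed tested, using the 200 µm/° conversion and the 120-Hz refresh rate from the text; the function and constant names are illustrative.

```python
UM_PER_DEG = 200.0   # conversion for peripheral macaque retina quoted in the text
REFRESH_HZ = 120.0   # CRT refresh rate quoted in the text
BAR_SD_UM = 97.0     # SD of the Gaussian bar profile quoted in the text


def displacement_per_frame_um(speed_deg_per_s):
    """Distance (µm on the retina) the bar jumps between successive display frames."""
    return speed_deg_per_s * UM_PER_DEG / REFRESH_HZ


for speed in (7.3, 14.5, 29.0, 58.1):
    step = displacement_per_frame_um(speed)
    print(f"{speed:5.1f} deg/s: {step:5.1f} um per frame = {step / BAR_SD_UM:.2f} bar SD")
```

At the highest speed the step is close to 97 µm, that is, about one SD of the bar profile; at the lower speeds it is proportionally smaller.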
The maintained firing rate (mean ± SD across ON and OFF cells) during exposure to spatially uniform background light was: 5.7 ± 0.3 and 1.8 ± 1.4, 4.6 ± 2 and 0.2 ± 0.2, and 2.1 ± 1.6 and 1.8 ± 1.7 Hz in each of the three retinas, respectively. These values were low compared with 21 ± 9 Hz reported for magnocellular-projecting RGCs in anesthetized, paralyzed animals (Troy and Lee 1994). The reason for the discrepancy is unclear. However, peak-evoked modulations were comparable to those observed in magnocellular-projecting cells recorded in vivo. The peak firing rate (mean ± SD across ON and OFF cells) elicited by bars moving at 14.5°/s, measured in 25-ms bins, was: 78 ± 17 and 95 ± 32, 80 ± 21 and 77 ± 23, and 95 ± 26 and 84 ± 30 Hz in each of the three retinas recorded, respectively. In a previous study (Kremers et al. 1993), as the contrast of a 1.22-Hz squarewave modulation approached 100%, the peak firing rate (computed in 25-ms bins and expressed as an increment above an assumed maintained rate of 20 Hz) approached a maximum of ~100 Hz. Because the Gaussian bar used in the present experiments enters the receptive field gradually and continues moving, it would be expected to elicit a somewhat smaller peak response, as was observed.
| RESULTS |
|---|
Extracting speed estimates from RGC population activity
The following four sections describe the foundation for measuring the retinal population code for motion and efficiently reading out the stimulus speed from measured spike trains.
Measuring the entire population code
A challenging step in understanding a sensory population code is obtaining simultaneous recordings from the entire collection of relevant cells. The principal signals used by the visual cortex to sense motion are thought to be conveyed by the morphologically defined ON and OFF parasol RGCs (Polyak 1941), the axons of which project to the magnocellular layers of the lateral geniculate nucleus (see Merigan and Maunsell 1993; Van Essen 1985). The cell bodies of the ON and OFF parasol populations each form a regular mosaic with dendritic fields that tile the surface of the retina and thus uniformly sample visual space (Dacey and Brace 1992). To examine parasol cell population activity over a region of visual space, multi-electrode recordings were performed in pieces of peripheral primate retina. Visual responses of several hundred isolated RGCs, with receptive fields collectively covering ~5° × 10° of visual angle, were recorded simultaneously using a 512-electrode system (Litke et al. 2004). Analysis was restricted to two functionally defined cell types having receptive field tiling and density, spectral sensitivity, response kinetics and contrast gain that closely correspond to those of the ON and OFF parasol cells (Chichilnisky and Kalmar 2002). These two cell types will be referred to as parasol cells in what follows.
An example of the ensemble activity elicited by a moving bar superimposed on a photopic background is shown in Fig. 1. Figure 1A shows the receptive field outlines of a mosaic of 56 ON parasol cells obtained with white-noise stimulation and reverse correlation (see METHODS), along with an image of a moving bar with a Gaussian intensity profile. The nearly complete mosaic of receptive fields provides strong evidence that in this region of retina, nearly every ON parasol cell was recorded, revealing the complete population code. Figure 1B shows, in raster format, the spike trains obtained from these cells in a single trial in which the bar drifted from left to right. As the bar crossed the receptive field of each cell, it elicited spikes in excess of background activity. The relative timing of responses in different cells reflects a wave of activity in the parasol cell population. This wave is the principal signal used by the cortex to sense visual motion.
Speed estimation
To probe how faithfully parasol RGCs signal visual motion, a procedure was developed to estimate the speed of the moving bar directly from the relative timing of responses in different cells. The procedure, described in this section, was then applied to quantify the precision of speed estimates across trials.
The concept behind the speed estimation procedure is that if all RGCs respond identically, then a translating stimulus should on average produce the same response waveform in each cell, shifted in time. Thus the evidence for movement at a particular speed is given by the degree of alignment of spike trains from different cells, after compensating for the response time shift expected at that speed (see Fig. 1). This concept can be implemented using the peak response in a collection of detectors tuned for different speeds. The output of each detector is based on cross-correlation (Reichardt 1961), a central element of standard models of motion sensing, including motion energy algorithms that have been used to describe the responses of direction-selective neurons in visual cortex (Adelson and Bergen 1985; Emerson et al. 1992; Simoncelli and Heeger 1998; Watson and Ahumada 1985). Note, however, that this procedure is not intended to represent an explicit model of motion sensing in the brain (see DISCUSSION).
The procedure is as follows (Fig. 2). Consider the case of two cells, A and B. A motion signal tuned for a particular speed is computed from their responses by delaying the spike train of one cell, smoothing both spike trains over time with a filter, multiplying the resulting signals pointwise to detect coincidences, and integrating the result over the duration of the trial. Specifically, let $r_A(t)$ and $r_B(t)$ represent the firing rate of each cell as a function of time during the trial. These are obtained by representing the spike trains at floating point resolution, convolving with a Gaussian filter $f(t) = \exp(-t^2/2\tau^2)$, and sampling the result at intervals of $\tau$. Denote by $\Delta x$ the known separation of the receptive fields along the axis of motion, computed using the parametric fit to the receptive field profile of each cell (see METHODS). Then a motion signal indicating the evidence for movement at speed $s$ is obtained by delaying the response of cell A by an amount $\Delta t = \Delta x/s$, multiplying pointwise by the response of cell B, and summing the result over all time points in the trial: $R = \sum_t r_A(t - \Delta t)\, r_B(t)$. Note that $r_A$ is circularly shifted in time, rather than cropped from a longer response, to match the length of $r_B$ (circular shifting provided a convenient and accurate approximation of an extended period of background activity before and after the response, to avoid having to record long periods of background activity between trials). Finally, to minimize potential bias due to spontaneous activity, a signal indicating the evidence for motion at the same speed in the opposite direction is created symmetrically, $L = \sum_t r_B(t - \Delta t)\, r_A(t)$, and the net motion signal $N$ is given by the difference, $N = R - L$.
The net motion signal for a collection of cells was obtained by adding the net motion signals obtained from all distinct pairs. This pairwise computation is mathematically equivalent to an approach that measures the alignment of shifted responses from all cells, by summing, squaring, and integrating shifted responses over time (Chichilnisky and Kalmar 2003). Specifically, the response $r_i(t)$ for the $i$th cell is delayed by an amount $\Delta t_i = x_i/s$, where $x_i$ is the position of the receptive field along the axis of motion, yielding a right-shifted response $r_i(t - \Delta t_i)$ and a left-shifted response $r_i(t + \Delta t_i)$. The net motion signal is $N = \sum_t [\sum_i r_i(t - \Delta t_i)]^2 - \sum_t [\sum_i r_i(t + \Delta t_i)]^2$.
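The equivalence of the pairwise and population forms can be checked numerically. The sketch below uses arbitrary random "responses" and integer sample delays (both assumptions made for illustration) and compares the summed-and-squared form with the sum over distinct pairs. With circular shifts, the squared single-cell terms cancel in the difference, so the two forms agree up to a constant factor of two, which does not affect the location of the peak.

```python
import random


def shifted(r, lag):
    """Circularly delay r by lag samples (negative lag advances): result[k] = r[k - lag]."""
    n = len(r)
    return [r[(k - lag) % n] for k in range(n)]


def pair_net(r_a, r_b, lag):
    """Two-cell net signal R - L, with the response of cell A delayed by `lag` samples."""
    right = sum(a * b for a, b in zip(shifted(r_a, lag), r_b))
    left = sum(b * a for b, a in zip(shifted(r_b, lag), r_a))
    return right - left


def pairwise_net(responses, lags):
    """Sum of two-cell net signals over all distinct pairs (i < j)."""
    total = 0.0
    for i in range(len(responses)):
        for j in range(i + 1, len(responses)):
            total += pair_net(responses[i], responses[j], lags[i] - lags[j])
    return total


def population_net(responses, lags):
    """Energy of the summed right-shifted responses minus that of the left-shifted ones."""
    n = len(responses[0])

    def energy(sign):
        summed = [0.0] * n
        for r, lag in zip(responses, lags):
            for k, v in enumerate(shifted(r, sign * lag)):
                summed[k] += v
        return sum(v * v for v in summed)

    return energy(+1) - energy(-1)


random.seed(0)
responses = [[random.random() for _ in range(50)] for _ in range(4)]
lags = [0, 3, 7, 12]   # per-cell delays x_i / s, expressed in samples
print(population_net(responses, lags), 2.0 * pairwise_net(responses, lags))
```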
To illustrate how the procedure works, Fig. 1C shows spike trains from a single stimulus presentation delayed according to several speed tuning (putative speed) values, and the net motion signals for detectors tuned to these speeds. When the putative speed was near the correct speed (middle), the delayed spike trains were maximally aligned. Thus the detector tuned to the correct speed yielded the largest motion signal. Figure 3A shows the net motion signal as a function of speed tuning for a single stimulus presentation. The peak of this function, the extracted speed estimate, was close to the true speed.
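The peak readout across detectors can be illustrated with a small synthetic population. In the sketch below, each hypothetical cell fires one spike when a bar moving toward increasing x reaches its receptive field; delays are applied to the spike times before smoothing, and earlier-responding cells are delayed to line up with the last cell (equivalent to per-cell shifts up to a common time offset). The positions, candidate speeds, and helper names are all assumptions made for the example.

```python
import math


def smoothed(spike_times, delay, tau, duration):
    """Delay spikes circularly, smooth with a Gaussian of width tau, sample every tau."""
    n = int(round(duration / tau))
    out = []
    for k in range(n):
        t = k * tau
        total = 0.0
        for sp in spike_times:
            d = abs(t - (sp + delay) % duration)
            d = min(d, duration - d)
            total += math.exp(-d * d / (2.0 * tau * tau))
        out.append(total)
    return out


def alignment(spike_trains, delays, tau, duration):
    """Sum the delayed responses across cells, square, and integrate over time."""
    n = int(round(duration / tau))
    summed = [0.0] * n
    for spikes, delay in zip(spike_trains, delays):
        for k, v in enumerate(smoothed(spikes, delay, tau, duration)):
            summed[k] += v
    return sum(v * v for v in summed)


def net_motion_signal(spike_trains, positions, speed, tau, duration):
    """Alignment for motion toward increasing x minus alignment for the opposite direction."""
    x_min, x_max = min(positions), max(positions)
    forward = [(x_max - x) / speed for x in positions]
    backward = [(x - x_min) / speed for x in positions]
    return (alignment(spike_trains, forward, tau, duration)
            - alignment(spike_trains, backward, tau, duration))


def estimate_speed(spike_trains, positions, candidates, tau, duration):
    """Speed tuning of the detector with the largest net motion signal."""
    return max(candidates,
               key=lambda s: net_motion_signal(spike_trains, positions, s, tau, duration))


positions = [0.0, 100.0, 250.0, 400.0]            # um along the axis of motion
true_speed = 2900.0                               # um/s (14.5 deg/s at 200 um/deg)
spike_trains = [[0.3 + x / true_speed] for x in positions]
candidates = [1450.0, 2000.0, 2900.0, 4000.0, 5800.0]
print(estimate_speed(spike_trains, positions, candidates, tau=0.01, duration=1.0))
```

With noiseless synthetic spikes the estimate coincides with the true speed; with real spike trains the peak would vary from trial to trial, which is the variability analyzed in the sections that follow.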
Measuring the precision of speed estimates
To quantify how faithfully the retina transmits information about speed, the variability of speed estimates across trials was examined. A histogram of speed estimates for one condition is shown in Fig. 3B, along with a Gaussian distribution with the same mean and SD.
In this case and most others examined, a Gaussian distribution provided a reasonable approximation. A test statistic ($\chi^2$) was computed by summing the squared deviations of observed counts from those expected of a Gaussian distributed variable with the same mean and SD (Rice 1988; p. 226) divided by the expected counts. In the null hypothesis of Gaussian-distributed speed estimates, the distribution of $\chi^2$ is approximately chi-square with $N - 3$ degrees of freedom, where $N$ is the number of bins. Using a filter width $\tau = 0.01$ s, $\chi^2$ was below the 99th percentile of the chi-square distribution in 83% of cases tested. Thus the accuracy and precision with which the population of RGCs signaled stimulus speed were reasonably well summarized by the mean and SD of the distribution, respectively. If $\chi^2$ exceeded the 99th percentile of the chi-square distribution, the condition was excluded from certain analyses (a condition refers to ON or OFF cells in a particular retina, tested with a specific stimulus speed and contrast).
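The goodness-of-fit statistic described above can be sketched as follows. This is an illustrative implementation, not the original analysis code: the bin edges and the synthetic sample (drawn from a Gaussian with an arbitrary mean and SD) are assumptions, and the comparison against the 99th percentile of the chi-square distribution is left to a statistics table or library.

```python
import math
import random
import statistics


def gaussian_cdf(x, mean, sd):
    """Cumulative distribution function of a Gaussian with the given mean and SD."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def chi_square_statistic(samples, edges):
    """Chi-square statistic against a Gaussian with the sample mean and SD.

    `edges` are increasing bin edges; using -inf and +inf as the outer edges
    ensures that every sample falls in some bin. Returns (statistic, dof),
    with dof = number of bins - 3 as stated in the text.
    """
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    n_bins = len(edges) - 1
    observed = [0] * n_bins
    for x in samples:
        for b in range(n_bins):
            if edges[b] <= x < edges[b + 1]:
                observed[b] += 1
                break
    total = sum(observed)
    stat = 0.0
    for b in range(n_bins):
        p = gaussian_cdf(edges[b + 1], mean, sd) - gaussian_cdf(edges[b], mean, sd)
        expected = total * p
        stat += (observed[b] - expected) ** 2 / expected
    return stat, n_bins - 3


# Synthetic speed estimates (deg/s); the mean and SD are arbitrary illustrative values.
rng = random.Random(1)
estimates = [rng.gauss(14.5, 0.15) for _ in range(200)]
inf = float("inf")
edges = [-inf, 14.3, 14.4, 14.5, 14.6, 14.7, inf]
stat, dof = chi_square_statistic(estimates, edges)
print(f"chi-square = {stat:.2f} with {dof} degrees of freedom")
```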
Speed estimation from real spike trains would be expected to exhibit random deviations from the true speed due to noise in phototransduction or retinal processing, but could also exhibit systematic errors. Figure 3C shows a histogram of the mean speed estimate minus the true speed (i.e., bias) expressed as a fraction of the SD, for all conditions examined. The mean of the distribution shown is 0.3, indicating a weak tendency to underestimate speed. In 85% of conditions examined, the ratio of the absolute value of bias to SD was <2, indicating that bias was on the order of the variability. Because the bias is small and because bias in principle can be compensated by downstream calibration, whereas trial-to-trial variability cannot, in what follows the SD of speed estimates will be taken as a measure of the fidelity of retinal speed signals and the bias will not be considered further.
Optimal temporal filtering for speed estimation
The temporal filter applied to spike trains to estimate speed (see Fig. 2) permits efficient detection of alignment in delayed spike trains while allowing for some spike timing jitter from trial to trial. Such filtering might be expected to occur in the synapses on to direction-sensitive neurons in the visual cortex and is an essential consideration for precise speed estimation. Although the optimal temporal filtering for left-right direction discrimination was determined in a previous study (Chichilnisky and Kalmar 2003), a fine-grained task such as speed discrimination could in principle utilize much finer filtering. The remainder of this section shows that a filter width of ~10 ms produced maximum speed estimate precision over the range of conditions examined, so a filter width of 10 ms will be used in sections that follow.
Optimal filtering was determined empirically, by finding the filter width that minimized the SD of speed estimates. An example is shown for one condition in Fig. 4A. A filter width of 15 ms minimized speed estimate SD; much narrower or wider filters produced SD values up to threefold higher. The optimal filter width was in the range of tens of milliseconds over a wide range of conditions. The symbols in Fig. 4B show the optimal filter width for all conditions examined, determined by computing the SD of speed estimates across trials as a function of filter width over the range 1–100 ms, fitting the results with a polynomial, and extracting the minimum of the fit. Optimal filter width declined with stimulus speed to a minimum of ~7 ms at the highest speeds probed (Chichilnisky and Kalmar 2003). The dependence on speed was approximated by the function $\tau_s = \tau_\infty + \alpha/s$, where $s$ is the speed, $\tau_s$ is the optimal filter width for speed $s$, $\tau_\infty$ is the optimal filter width for asymptotically high speeds, and $\alpha$ is a constant.
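Because the proposed dependence of optimal filter width on speed is linear in 1/s, its two parameters can be obtained by ordinary least squares. The sketch below illustrates this under stated assumptions: the symbol names follow the reconstruction used here (an asymptotic width plus a constant divided by speed), and the "data" are noise-free values generated from the model itself with arbitrary parameters, purely to confirm that the fit recovers them.

```python
def fit_inverse_speed(speeds, widths):
    """Least-squares fit of width = width_inf + const / speed.

    Returns (width_inf, const). The model is linear in u = 1/speed, so this is
    a straight-line fit of width against u.
    """
    u = [1.0 / s for s in speeds]
    n = len(u)
    mean_u = sum(u) / n
    mean_w = sum(widths) / n
    slope = (sum((a - mean_u) * (b - mean_w) for a, b in zip(u, widths))
             / sum((a - mean_u) ** 2 for a in u))
    intercept = mean_w - slope * mean_u
    return intercept, slope


# Speeds tested in the experiments (deg/s); widths are synthetic, generated from the
# model with arbitrary parameters (5 ms asymptote, constant of 100 ms*deg/s).
speeds = [7.3, 14.5, 29.0, 58.1]
widths_ms = [5.0 + 100.0 / s for s in speeds]
width_inf, const = fit_inverse_speed(speeds, widths_ms)
print(f"asymptotic width = {width_inf:.3f} ms, constant = {const:.3f} ms*deg/s")
```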
Optimal filter width could be systematically overestimated by two experimental limitations: misestimation of receptive field locations due to spatial discretization of the stimulus and limited recording time, or discretization of the moving bar image in space and time due to temporal refresh of the display. These possibilities were tested by computing effective receptive field locations directly from responses to moving bars. Average responses across trials were used to determine delays between cells that resulted in maximum response alignment. These delays were multiplied by the stimulus speed to determine effective receptive field locations, which were then used for trial-by-trial speed estimation. The optimal filter width obtained with this procedure, shown with separate symbols in Fig. 4B, was similar to that measured using locations extracted from direct receptive field measurements, for all speeds tested. Using a filter width of 10 ms, the median ratio of the SD of speed estimates obtained with the modified and standard procedure was 0.98. These findings suggest that discretization and finite data effects had little effect on speed estimates or optimal filter width.
The optimal filter width can be used to infer the number of spikes from each cell that typically contribute to the elementary motion signal (Chichilnisky and Kalmar 2003). If the interspike interval (ISI) is always much larger than the optimal filter width, optimal motion sensing preserves the distinction between sequential spikes and motion information is effectively conveyed by individual spike times. Conversely, if the ISI is always much smaller than the filter width, optimal motion sensing integrates over many spikes and motion information is effectively conveyed by variations in firing rate. The ratio of ISI to optimal filter width, accumulated across the period in each spike train when the bar overlapped the receptive field of the cell, is shown in Fig. 4C. The modal ratio was near unity: the median was 0.62, and 72% of values were <1. Although the ratio of ISI to optimal filter width spans a wide range, the concentration of values near unity indicates that optimal speed estimation typically requires integrating over one to a few spikes from each cell.
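The ratio analyzed in Fig. 4C can be illustrated with a toy spike train. In the sketch below, the spike times, the window during which the bar overlaps the receptive field, and the 10-ms filter width are all made-up values for illustration; only the definition of the ratio (ISI divided by filter width, restricted to the overlap period) comes from the text.

```python
import statistics


def isi_to_filter_ratios(spike_times, window, tau):
    """ISIs (divided by tau) between successive spikes that fall inside `window`."""
    start, stop = window
    inside = sorted(t for t in spike_times if start <= t <= stop)
    return [(b - a) / tau for a, b in zip(inside, inside[1:])]


spikes = [0.300, 0.306, 0.318, 0.325, 0.341, 0.600]   # seconds (hypothetical)
ratios = isi_to_filter_ratios(spikes, window=(0.29, 0.35), tau=0.010)
median_ratio = statistics.median(ratios)
fraction_below_one = sum(r < 1.0 for r in ratios) / len(ratios)
print(ratios, median_ratio, fraction_below_one)
```

Ratios well above one correspond to spikes that the filter keeps distinct, and ratios well below one to spikes that are effectively pooled into a rate.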
Efficiency of speed estimation procedure
The variability of extracted speed estimates accurately reflects the precision of retinal signals if and only if the estimation procedure efficiently extracts information about stimulus speed. To test the efficiency of the procedure, its performance was compared with four alternative approaches. The remainder of this section demonstrates that each alternative procedure exhibited speed estimate variability similar to or higher than the correlation procedure, consistent with the idea that the correlation procedure is efficient.
For each alternative procedure, as with the standard correlation procedure, the speed estimate was selected to maximize the alignment of spike trains, after delaying each spike train by an amount equal to the receptive field position along the axis of motion divided by the speed tuning of the detector. As alternatives to cross-correlation, four measures of alignment were tested, and the rightward motion signal was computed as follows.
FOURTH-ORDER CORRELATION.
Pointwise products of responses considered in groups of four. The shifted response vectors $r_i(t - \Delta t_i)$ were summed pointwise, yielding $m(t) = \sum_i r_i(t - \Delta t_i)$. The motion signal was given by $\sum_t m(t)^p$, with $p = 4$. This is a generalization of the multi-cell equivalent of the cross-correlation procedure (see preceding text), in which $p = 2$.
SEPARABILITY.
The fraction of the variance of a collection of responses explained by the first principal component. The shifted response vectors $r_i(t - \Delta t_i)$ were placed in the rows of a matrix. The singular value decomposition was computed, yielding singular values $\{s_1 \ldots s_K\}$. The motion signal was given by $s_1^2/(s_1^2 + \ldots + s_K^2)$.
ENTROPY.
Temporal dispersion of the summed responses. The shifted response vectors were summed, and the result $m(t) = \sum_i r_i(t - \Delta t_i)$ was normalized to unit integral, $n(t) = m(t)/\sum_t m(t)$. The motion signal was given by the negative of the entropy of the result, i.e., $\sum_t n(t) \log_2 n(t)$.
DISTANCE.
Summed pairwise difference in Euclidean distances between responses from different cells. The motion signal was $-\sum_i \sum_j \| r_i(t - \Delta t_i), r_j(t - \Delta t_j) \|$, where $\| \cdot \|$ indicates Euclidean distance between vectors.
Leftward motion signals were computed analogously based on left-shifted response vectors $r_i(t + \Delta t_i)$, and the net motion signal was used for speed estimation as in the preceding text. Figure 5A shows the optimal filter width for each measure as a function of that for the correlation measure, across all conditions tested. In each case, the optimal filter width was similar to that obtained with the correlation measure. For each measure, overall speed estimate variability was obtained using the optimal filter width for that measure. Figure 5B shows the performance of each alternate procedure compared with that of the standard procedure. In all cases, alternate procedures exhibited speed estimate variability similar to or higher than the standard procedure.
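Three of the alternative alignment measures are simple enough to sketch directly (the separability measure requires a singular value decomposition and is omitted here). The code below is an illustration, not the original implementation: it takes response vectors that have already been shifted, and the sign of the distance measure is chosen here so that larger values indicate better alignment, which is an assumption of this sketch.

```python
import math


def summed(responses):
    """Pointwise sum m(t) of the (already shifted) response vectors."""
    return [sum(column) for column in zip(*responses)]


def power_signal(responses, p):
    """Sum over time of m(t)**p; p = 2 is the correlation measure, p = 4 the fourth-order one."""
    return sum(v ** p for v in summed(responses))


def negative_entropy(responses):
    """Negative entropy (bits) of the summed response normalized to unit sum."""
    m = summed(responses)
    total = sum(m)
    return sum((v / total) * math.log2(v / total) for v in m if v > 0)


def negative_distance(responses):
    """Negative of the summed Euclidean distances over all pairs of cells (i, j)."""
    total = 0.0
    for r_i in responses:
        for r_j in responses:
            total += math.sqrt(sum((a - b) ** 2 for a, b in zip(r_i, r_j)))
    return -total


# Toy responses: three identical bumps (aligned) versus the same bumps staggered in time.
aligned = [[0, 1, 4, 1, 0, 0, 0, 0]] * 3
staggered = [[0, 1, 4, 1, 0, 0, 0, 0],
             [0, 0, 0, 1, 4, 1, 0, 0],
             [0, 0, 0, 0, 0, 1, 4, 1]]
for name, measure in (("fourth-order", lambda r: power_signal(r, 4)),
                      ("entropy", negative_entropy),
                      ("distance", negative_distance)):
    print(name, measure(aligned), measure(staggered))
```

Each measure is larger for the aligned responses than for the staggered ones, which is the property needed for it to serve as a motion signal in the estimation procedure.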
Precision of retinal speed estimates
The procedures in the preceding text provide a measure of how precisely the retina transmits speed information to the brain. Because this precision may depend on stimulus speed (due to the kinetics of RGC responses, spike train statistics, and accumulation of information over time), speed estimate variability was examined for a range of bar speeds.
Figure 6 shows fractional speed estimate variability (SD of estimates divided by true speed) as a function of speed, for all conditions tested. Each point represents data obtained from 35 to 68 ON or OFF parasol cells in one retina. Across the range of speeds examined, fractional speed estimate variability was on the order of 1% of the stimulus speed, increasing roughly in proportion to speed at the highest speeds tested.
Simple model of speed estimate precision
The trend in Fig. 6, as well as the dependence on the number and spatial arrangement of cells, can be understood in terms of the timing precision of RGC responses. This section provides a theoretical prediction for speed estimate variability based on the following assumptions. 1) Each RGC signals only the time of arrival of a stimulus at its receptive field. 2) Speed estimates from different cell pairs are combined optimally. 3) The variability of RGC timing signals is inversely related to speed. The derivation proceeds as follows.
Consider the simplest speed estimate obtained from two RGCs, each of which signals only the time of arrival of a stimulus at the receptive field. Assume the cells are separated by a distance $\Delta x$, and stimulated with a bar moving at speed $s$. The time required for the bar to move from one cell to the next is $\Delta t = \Delta x/s$. If each RGC provides a noisy signal indicating the time of arrival of the stimulus, denote the time difference signal from the pair of cells by $\Delta t + \epsilon$, where the noise $\epsilon$ has SD denoted by $\sigma_t$. A simple speed estimate from the pair is: $e = \Delta x/(\Delta t + \epsilon)$. The variability of $e$ can be approximated by the SD of the response time difference multiplied by the absolute value of the derivative of the estimate with respect to the time difference (the approximation is valid for $\sigma_t \ll \Delta t$) (see Bevington and Robinson 1992). Hence to first order, the speed estimate variability from the pair is

$$\sigma_e = \sigma_t \left| \frac{\partial e}{\partial \Delta t} \right| = \frac{\sigma_t \, \Delta x}{\Delta t^2} = \frac{s^2 \sigma_t}{\Delta x} \qquad (1)$$
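The first-order approximation in Eq. 1 can be checked by simulation. The following sketch draws Gaussian timing noise, forms the two-cell speed estimate on each simulated trial, and compares the SD of the estimates with the value predicted by error propagation. The separation, speed, and timing SD used here are hypothetical values chosen so that the timing noise is small relative to the travel time.

```python
import random
import statistics


def predicted_sd(speed, dx, sigma_t):
    """First-order SD of the two-cell speed estimate: s**2 * sigma_t / dx."""
    return sigma_t * speed ** 2 / dx


def simulated_sd(speed, dx, sigma_t, n_trials, seed=0):
    """SD across simulated trials of the estimate dx / (dt + noise)."""
    rng = random.Random(seed)
    dt = dx / speed
    estimates = [dx / (dt + rng.gauss(0.0, sigma_t)) for _ in range(n_trials)]
    return statistics.stdev(estimates)


speed, dx, sigma_t = 2900.0, 400.0, 0.005   # um/s, um, s (hypothetical values)
print(predicted_sd(speed, dx, sigma_t), simulated_sd(speed, dx, sigma_t, 20000))
```

Under these assumptions the simulated and predicted values agree to within a few percent; the agreement would degrade as the timing SD approaches the travel time between the two cells.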
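Now consider a collection of cell pairs with separations $\Delta x_i$, each providing a speed estimate $e_i$ with variability $\sigma_{e_i} = s^2 \sigma_t/\Delta x_i$ as in the preceding text. A speed estimate from the collection may be obtained by computing the weighted sum: $e_{pool} = (\sum_i e_i/\sigma_{e_i}^2)/(\sum_i 1/\sigma_{e_i}^2)$. This weighting causes $e_{pool}$ to have minimum variance, $\sigma_{pool}^2 = 1/\sum_i (1/\sigma_{e_i}^2)$, in the case of independent data (see Bevington and Robinson 1992). Substituting for $\sigma_{e_i}$ yields

$$\sigma_{pool} = \frac{s^2 \sigma_t}{\left( \sum_i \Delta x_i^2 \right)^{1/2}} \qquad (2)$$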
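The pooling step can be made concrete with a short sketch of inverse-variance weighting. The code below computes the pooled SD both from the general formula and from the closed form obtained by substituting the per-pair variability, and confirms that they agree. The separations, speed, and timing SD are hypothetical values, and the function names are illustrative.

```python
import math


def pair_sd(speed, dx, sigma_t):
    """SD of the speed estimate from one pair with separation dx."""
    return sigma_t * speed ** 2 / dx


def pooled_estimate(estimates, sds):
    """Inverse-variance weighted mean of the per-pair estimates."""
    weights = [1.0 / sd ** 2 for sd in sds]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


def pooled_sd(sds):
    """SD of the inverse-variance weighted mean for independent estimates."""
    return 1.0 / math.sqrt(sum(1.0 / sd ** 2 for sd in sds))


def pooled_sd_closed_form(speed, separations, sigma_t):
    """Closed form: s**2 * sigma_t / sqrt(sum of squared separations)."""
    return sigma_t * speed ** 2 / math.sqrt(sum(dx ** 2 for dx in separations))


speed, sigma_t = 2900.0, 0.005            # um/s, s (hypothetical values)
separations = [100.0, 250.0, 400.0]       # um (hypothetical values)
sds = [pair_sd(speed, dx, sigma_t) for dx in separations]
print(pooled_sd(sds), pooled_sd_closed_form(speed, separations, sigma_t))
```

Because the weights scale with the squared separation, widely separated pairs dominate the pooled estimate, which is why the spatial extent of the population along the axis of motion matters in what follows.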
To see how the pooled speed estimate variability, $\sigma_{pool}$, depends on the number of cells and their spatial arrangement along the x and y dimensions, consider only the term that depends on the locations of the cells: $S = \sum_i \Delta x_i^2$. Consider the case of a lattice of cells with density $p$ filling a rectangular region of area $xy$, where $x$ specifies the dimension along the axis of motion, and $y$ the dimension along the orthogonal axis. The specific pairings of cells used for speed estimation influence $\sigma_{pool}$ (see below). So, consider an optimal pairing rule in which the first pair consists of the two cells most widely spaced in the x dimension, the second pair consists of the next two most widely spaced cells (distinct from the first 2 cells), and so on (note that $S$ is independent of the y coordinates). For a small increase $\delta x$ in $x$, the number of cells added is $py\,\delta x$, and half as many cell pairs are added. By the pairing rule, each new pair consists of cells at both extremes along the x dimension, hence each pair produces an increment $\Delta x_i^2 \approx x^2$ in the sum $S$. Therefore the increase in $S$ is $\delta S = x^2 y p\,\delta x/2$. This yields $\delta S/\delta x = x^2 y p/2$; integrating with respect to $x$ gives $S = x^3 y p/6$. Substituting the preceding yields

$$\sigma_{pool} = s^2 \sigma_t \left( \frac{6}{x^3 y p} \right)^{1/2} \qquad (3)$$
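For comparison, if each cell is instead paired with an adjacent cell, $S$ is given by the squared spacing between adjacent cells, $\Delta x_i^2 = 1/p$, multiplied by the total number of pairs, $pxy/2$. Hence $\sigma_{pool} = s^2 \sigma_t (xy/2)^{-1/2}$, which is a factor of $x(p/3)^{1/2}$ higher than the value obtained with the pairing rule in the preceding derivation.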
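The lattice result $S = x^3 y p/6$ and the factor $x(p/3)^{1/2}$ can be checked numerically. The sketch below builds a hypothetical square lattice, applies the widest-first pairing rule, and compares it with a comparison case that is assumed here to pair each cell with a neighbor at squared spacing $1/p$. The lattice dimensions, density, and function names are illustrative assumptions.

```python
import math


def lattice_x_positions(x, y, p):
    """x coordinates of cells on a square lattice of density p filling an x-by-y rectangle."""
    d = 1.0 / math.sqrt(p)                  # lattice spacing
    nx, ny = round(x / d), round(y / d)     # columns along motion axis, rows orthogonal
    return [(i + 0.5) * d for i in range(nx) for _ in range(ny)]


def s_extreme_pairing(xs):
    """S = sum of squared separations when the widest-spaced cells are paired first."""
    xs = sorted(xs)
    return sum((xs[-1 - k] - xs[k]) ** 2 for k in range(len(xs) // 2))


def s_adjacent_pairing(n_cells, p):
    """S when every pair has squared separation 1/p (assumed neighbor pairing)."""
    return (n_cells // 2) * (1.0 / p)


# Hypothetical lattice: 2 x 1 region (arbitrary units), 10,000 cells per unit area.
x, y, p = 2.0, 1.0, 10000.0
xs = lattice_x_positions(x, y, p)
s_ext = s_extreme_pairing(xs)
s_adj = s_adjacent_pairing(len(xs), p)

print("S, widest-first pairing:", s_ext, " continuum prediction:", x ** 3 * y * p / 6)
print("S, adjacent pairing:   ", s_adj, " prediction xy/2:", x * y / 2)
# sigma_pool scales as S**-0.5, so the SD ratio between the two rules is sqrt(s_ext / s_adj).
print("SD ratio:", math.sqrt(s_ext / s_adj), " prediction x*sqrt(p/3):", x * math.sqrt(p / 3))
```

The agreement with the continuum formulas improves as the lattice becomes denser, because the derivation treats the cell positions as continuous.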
Finally, the timing variability $\sigma_t$ would be expected to depend on parameters of the stimulus, such as stimulus speed. The inverse dependence of optimal filter width on speed (Fig. 4B) suggests a similar dependence for timing variability: $\sigma_t = \alpha + \beta/s$. Substituting above yields
$$\sigma_{pool} = (\alpha s^2 + \beta s) \left( \frac{6}{x^3 y p} \right)^{1/2} \qquad (4)$$
The prediction of Eq. 4, with the parameters $\alpha$ and $\beta$ fitted to the data, is shown in Fig. 6. The accuracy of the fit is consistent with the idea that speed estimate precision is governed by the timing precision of RGC responses. As will be shown in the following text, the same model also provides accurate predictions for the dependence of speed estimate precision on the number and spatial arrangement of RGCs. In summary, the results in Fig. 6 reveal the limits to behavioral speed estimation imposed by the population code in parasol RGCs, and are consistent with a simple model. What follows is an analysis of the factors that contribute to speed estimate precision and consequences for readout of the population code in the brain.
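Because Fig. 6 plots fractional variability, it may help to note that dividing Eq. 4 by the speed gives a quantity that is linear in speed, so the two constants of the timing model can in principle be recovered with an ordinary least-squares line. The sketch below assumes the timing model is a constant plus a term inversely proportional to speed, and writes the two constants as `alpha` and `beta` (names assumed here). The data are synthetic and noise-free; this is not the fitting procedure used in the study.

```python
import math


def fractional_sd(s, alpha, beta, x, y, p):
    """sigma_pool / s from Eq. 4, with sigma_t = alpha + beta / s."""
    return (alpha * s + beta) * math.sqrt(6.0 / (x ** 3 * y * p))


def fit_alpha_beta(speeds, frac_sds, x, y, p):
    """Least-squares line through (speed, fractional SD), rescaled to alpha and beta."""
    geometry = math.sqrt(6.0 / (x ** 3 * y * p))
    n = len(speeds)
    mean_s = sum(speeds) / n
    mean_f = sum(frac_sds) / n
    slope = (sum((s - mean_s) * (f - mean_f) for s, f in zip(speeds, frac_sds))
             / sum((s - mean_s) ** 2 for s in speeds))
    intercept = mean_f - slope * mean_s
    return slope / geometry, intercept / geometry


# Synthetic example with hypothetical parameter values and geometry.
x, y, p = 2.0, 1.0, 100.0
speeds = [2.0, 4.0, 8.0, 16.0]
data = [fractional_sd(s, 0.0005, 0.004, x, y, p) for s in speeds]
alpha_hat, beta_hat = fit_alpha_beta(speeds, data, x, y, p)
print(alpha_hat, beta_hat)
```

In this form the fractional variability grows in proportion to speed once the `alpha` term dominates, which matches the trend described for the highest speeds in Fig. 6.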
Retinal limits on speed estimation
Several major features of retinal processing may influence speed estimate fidelity. Correlated activity, known to be significant in adjacent RGCs of like type, may reflect common signal and noise and thus may influence performance. Timing structure of retinal spike trains may transmit motion information differently than expected from simple variations in firing rate. The number and spatial arrangement of cells would be expected to influence the fidelity of motion signals. Finally, ON and OFF parasol cells, with receptive fields that tile the same area of the visual world, may convey motion signals with different efficiency, and may exhibit redundancy due to common photoreceptor inputs. These contributions to the precision of retinal motion signals are examined in turn.
Correlated activity
Correlated firing at rates significantly higher than expected by chance has been described in pairs of nearby cells of like functional type in cat and rabbit retina (DeVries 1999; Mastronarde 1983); in salamander retina correlated firing has been proposed to be important for visual signaling (Meister et al. 1995). Similarly, adjacent pairs of ON parasol cells and OFF parasol cells in primate retina fire synchronized spikes (±5 ms) at rates roughly twice that expected by chance in the recording conditions used here (Chichilnisky and Baylor 1999). This synchronized firing, as well as other forms of response covariation between cells, could influence how precisely ensembles of RGCs transmit information about stimulus motion.
To probe the effects of correlated activity, the observed speed estimate variability was compared with the variability obtained from artificially shuffled ensemble responses consisting of spike trains from a different trial for each cell. This manipulation removes covariation, enforcing statistical independence between spike trains from different cells while preserving the response statistics of each cell. Figure 7A shows the speed estimate variability obtained with shuffled data as a function of the speed estimate variability obtained with unshuffled data, across all conditions tested. The data cluster near the identity line. Shuffled data displayed a statistically significant (P < 0.001, Wilcoxon signed-rank test) but very weak (median ratio: 0.96) tendency toward more precise speed estimates. In summary, eliminating correlated activity in RGC spike trains had very little effect on speed estimates.
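The shuffling manipulation can be sketched as follows. This is a minimal illustration, assuming responses are stored as `responses[cell][trial]`; the data layout and the function name are hypothetical. It permutes trials independently for each cell, which breaks the trial-by-trial alignment between cells but does not strictly guarantee that every cell in a shuffled ensemble comes from a distinct trial.

```python
import random


def shuffle_across_trials(responses, seed=0):
    """Independently permute the trial order of each cell.

    responses[c][k] is the spike train (any object) of cell c on trial k.
    Each cell keeps exactly the same set of spike trains, so single-cell
    response statistics are preserved, while covariation between cells on
    a given trial is destroyed.
    """
    rng = random.Random(seed)
    shuffled = []
    for trains in responses:
        permuted = list(trains)   # copy so the input is not modified
        rng.shuffle(permuted)
        shuffled.append(permuted)
    return shuffled


# Toy example: 2 cells, 4 trials, spike times in seconds (made-up values).
responses = [
    [(0.10,), (0.11, 0.15), (0.09,), (0.12,)],
    [(0.20,), (0.21,), (0.19, 0.25), (0.22,)],
]
shuffled = shuffle_across_trials(responses, seed=1)
```

Speed estimates computed from `shuffled` can then be compared with those computed from `responses`, trial by trial, to gauge the contribution of response covariation.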
Timing structure in spike trains
Many models of visual processing assume that information is communicated from retina to brain by the firing rates of RGCs, specifically, that RGC spikes are generated approximately independently of one another over time according to a Poisson process with a time-varying rate. A Poisson model fails to account for phenomena such as action potential refractoriness, and the non-Poisson intrinsic timing structure of RGC spike trains has been the subject of several recent studies (Berry et al. 1997).
To test whether this timing structure affects the precision of speed estimates, the speed estimate variability obtained with RGC spike trains was compared with the variability obtained from artificial spike trains generated by Poisson spiking with the observed time-varying rate. The artificial spike train for a given cell and trial was created by sampling spike times, with replacement, across all trials for that cell and stimulus. The number of spikes in the resampled spike train for each trial was on average equal to the number of spikes in recorded spike trains. Figure 7B shows the comparison of speed estimate variability obtained with real and Poisson spike trains for all conditions tested. The data lie systematically above the identity line, particularly for the lower fractional SD values. The median ratio of the SD obtained from resampled data to that obtained with the original data was 1.50. The higher variability obtained with resampled data could not be attributed to nonstationarity of responses over the course of the experiment, because the shuffling analysis of Fig. 7A did not produce such an effect. Thus the intrinsic timing structure of RGC spike trains allows them to convey stimulus speed information more faithfully than would be expected from a Poisson model of RGC spiking.
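The resampling procedure can be sketched as below. The text specifies only that spike times were drawn with replacement from all trials and that the spike count matched the recorded count on average; drawing the count of each artificial trial from a Poisson distribution is an assumption made here, as are the function names and the toy data.

```python
import math
import random


def poisson_count(mean, rng):
    """Draw a Poisson-distributed integer (Knuth's method; suitable for modest means)."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def resample_poisson_trials(trials, n_out, seed=0):
    """Build artificial spike trains for one cell and one stimulus.

    trials: list of recorded spike-time lists. Spike times are pooled across
    trials and sampled with replacement, so the time-varying rate is kept but
    within-trial timing structure (e.g. refractoriness) is discarded.
    """
    rng = random.Random(seed)
    pooled = [t for trial in trials for t in trial]
    mean_count = len(pooled) / len(trials)
    resampled = []
    for _ in range(n_out):
        n_spikes = poisson_count(mean_count, rng)   # assumed Poisson spike count
        resampled.append(sorted(rng.choices(pooled, k=n_spikes)))
    return resampled


# Toy recorded trials for one cell (spike times in seconds, made-up values).
recorded = [[0.10, 0.12, 0.20], [0.11, 0.19], [0.09, 0.13, 0.21, 0.30]]
artificial = resample_poisson_trials(recorded, n_out=5000, seed=0)
mean_artificial_count = sum(len(t) for t in artificial) / len(artificial)
```

Because the samples are drawn with replacement, an artificial train can contain repeated or arbitrarily close spike times, which a real spike train with a refractory period cannot.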
Spatial arrangement of receptive fields
Because motion is represented in a wave of activity in the parasol RGC population, the spatial arrangement of the cells used for readout could influence the fidelity of speed estimates extracted by the brain. For example, Fig. 8A shows the receptive field outlines of a collection of ON parasol cells, with receptive fields that clustered in a region of retina, and a collection of the same number of ON parasol cells (simultaneously recorded, partially overlapping), with more dispersed receptive fields. The distributions of speed estimates obtained from each of these ensembles in one stimulus condition are shown in the histograms. The variability of speed estimates obtained from the clustered cells was substantially larger than that from the dispersed cells. Pooled data in Fig. 8C for all such conditions tested show the same trend.
The similarity of the shuffled and unshuffled results indicates that response covariation did not account for the effect of spatial arrangement. An alternative possibility is that speed estimates obtained from cells more dispersed along the axis of motion are less sensitive to response timing jitter. To illustrate this possibility, consider the simple model in the preceding text in which speed estimate precision for a given cell pair is limited by how accurately individual RGCs signal the time of stimulus arrival. Distant cells are relatively less affected by response timing jitter (Eq. 1) because of the large temporal separation in responses to the moving stimulus.
To test for this possibility, the variation of speed estimates was examined using a stimulus that moved either in a direction that created large temporal separation of responses or in a direction that created small temporal separation of responses, using the same collection of cells, as shown in Fig. 9A. Results are shown for one example in Fig. 9B; pooled results are shown in Fig. 9C. Larger temporal separations resulted in more precise speed estimates. This was not affected by shuffling responses across trials.
These findings suggest that temporal separation of responses is the primary determinant of how spatial arrangement affects speed estimation. Note that the use of widely spaced cells in speed estimation implicitly assumes constant speed over the duration required for the stimulus to travel from one cell to the other; this assumption is valid in the present task but may not be for more natural stimuli (see DISCUSSION).
Number of cells
Large receptive fields of motion-sensitive neurons in extrastriate cortex (Albright and Desimone 1987) may provide more accurate estimates of stimulus speed by integrating over many inputs. However, the benefits of such pooling depend on the spatial arrangement of input signals (see preceding text). To examine the potential benefits of pooling many RGC inputs for motion sensing, speed estimate variability was examined for subsets of recorded RGCs.
Figure 10A shows a collection of ON parasol cells as well as two subsets of this collection obtained by discarding cells sequentially, orthogonal to the axis of motion. Figure 10B shows speed estimate variability as a function of the number of cells in these subsets on a double logarithmic scale. As expected, the variability of speed estimates decreased with the number of cells. The steepness of this relation was estimated by fitting a line to data such as those in Fig. 10B and extracting the slope. Results accumulated across all conditions tested are shown in the histogram of Fig. 10C, which reveals slopes near -1/2. Figure 10D shows the dependence of variability on the number of cells pooled across all conditions tested; data have been normalized (vertically shifted) for each condition independently. These normalized data fall roughly on a common line, suggesting a lawful relationship between speed estimate variability and the number of cells used.
These trends can be understood quantitatively with the simple model. Because the SD of the pooled speed estimate declines as the 3/2 power of distance along the axis of motion and as the 1/2 power in the orthogonal direction (Eq. 3), the data in Fig. 10, A-D, should exhibit a slope of -1/2 and the data in Fig. 10, E-H, should exhibit a slope of -3/2, similar to the values observed. The regular dependence of speed estimate variability on cell number, along with the model, provides a basis for predicting the precision of speed estimates obtained with smaller or larger ensembles of RGCs.
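The predicted slopes follow from Eq. 3 once the number of cells is written as $N = pxy$: shrinking the region along the orthogonal axis changes only $y$, so the SD scales as $N^{-1/2}$, whereas shrinking it along the axis of motion changes only $x$, so the SD scales as $N^{-3/2}$. The sketch below checks this with hypothetical parameter values; the function names are introduced here, and the slopes are negative because variability falls as cells are added.

```python
import math


def pooled_sd_eq3(s, sigma_t, x, y, p):
    """SD of the pooled speed estimate for a lattice of density p in an x-by-y region (Eq. 3)."""
    return s ** 2 * sigma_t * math.sqrt(6.0 / (x ** 3 * y * p))


def loglog_slope(n_cells, sds):
    """Least-squares slope of log(SD) against log(number of cells)."""
    lx = [math.log(n) for n in n_cells]
    ly = [math.log(v) for v in sds]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    return (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
            / sum((a - mx) ** 2 for a in lx))


# Hypothetical values: speed, timing SD, cell density, and region sizes.
s, sigma_t, p = 10.0, 0.002, 100.0
sizes = [0.5, 1.0, 2.0, 4.0]

# Vary the orthogonal dimension y with x fixed at 2.0.
slope_orthogonal = loglog_slope(
    [p * 2.0 * y for y in sizes],
    [pooled_sd_eq3(s, sigma_t, 2.0, y, p) for y in sizes],
)
# Vary the dimension along the axis of motion x with y fixed at 2.0.
slope_along_motion = loglog_slope(
    [p * x * 2.0 for x in sizes],
    [pooled_sd_eq3(s, sigma_t, x, 2.0, p) for x in sizes],
)
print(slope_orthogonal, slope_along_motion)
```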
Relative efficiency of ON and OFF speed estimates
ON and OFF parasol cells, which are primarily excited by increments and decrements of light respectively, may be specialized to signal motion more accurately for stimuli of matched polarity (positive contrast for ON, negative contrast for OFF). Furthermore, due to spatial, kinetic and contrast-response asymmetries (Chichilnisky and Kalmar 2002), one pathway could convey to the brain higher fidelity speed estimates for both kinds of stimuli. Such asymmetries could determine how ON and OFF signals are used downstream for speed estimation.
The variability of speed estimates obtained from equal numbers of ON and OFF cells was compared using moving bars of positive and negative contrast. The three panels in Fig. 11 show the comparison for positive contrast stimuli, negative contrast stimuli, and matched contrast stimuli (positive for ON cells, negative for OFF cells). ON cells signaled the speed of positive contrast stimuli more faithfully than OFF cells, and OFF cells signaled the speed of negative contrast stimuli more faithfully than ON cells. This would be expected from response rectification elicited by nonmatched stimuli that strongly suppress firing. For matched polarity stimuli, ON and OFF cells exhibited similar speed estimate variability. Thus the circuits converging on ON and OFF parasol cells represent motion information with similar precision.
Statistical independence of ON and OFF speed estimates
Given that the ON and OFF parasol cells provide motion signals of comparable precision, cortical neurons may pool speed information transmitted by these populations to obtain faithful speed estimates. However, because ON and OFF cells sample the same region of space and thus receive inputs from the same photoreceptors, ON and OFF circuits may exhibit significant common noise. Such redundancy could limit or eliminate the benefits of pooling. The degree of redundancy in ON and OFF motion signals was examined by measuring the degree to which pooling ON and OFF signals reduced the variability of speed estimates.
Figure 12A shows the receptive fields of a group of ON cells and a group of OFF cells simultaneously recorded. These receptive fields covered approximately the same area of retina; therefore the two cell groups received inputs mostly from the same photoreceptors. Figure 12B shows histograms of speed estimates obtained from the two cell groups in one stimulus condition. A pooled estimate of speed from both populations may be obtained by taking a weighted sum of ON and OFF estimates with weights that minimize variance across trials in the case of independent data: $s_P = (s_{ON} \sigma_{OFF}^2 + s_{OFF} \sigma_{ON}^2)/(\sigma_{ON}^2 + \sigma_{OFF}^2)$, where $s_{ON}$ and $s_{OFF}$ represent the speed estimates from ON and OFF cells, and $\sigma_{ON}$ and $\sigma_{OFF}$ represent the SD of speed estimates across trials for ON and OFF cells, respectively (Bevington and Robinson 1992).
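The pooling rule is the two-estimate case of inverse-variance weighting. If the ON and OFF estimates were statistically independent, the pooled SD would satisfy $1/\sigma_P^2 = 1/\sigma_{ON}^2 + 1/\sigma_{OFF}^2$, which gives a benchmark against which measured pooled variability can be compared. The sketch below illustrates that benchmark with simulated, independent Gaussian estimates; the speed and SD values are hypothetical and the function names are introduced here.

```python
import math
import random
import statistics


def pool_on_off(s_on, s_off, sd_on, sd_off):
    """Inverse-variance-weighted combination of ON and OFF speed estimates."""
    return (s_on * sd_off ** 2 + s_off * sd_on ** 2) / (sd_on ** 2 + sd_off ** 2)


def independent_pooled_sd(sd_on, sd_off):
    """Pooled SD expected if ON and OFF estimates are statistically independent."""
    return math.sqrt(1.0 / (1.0 / sd_on ** 2 + 1.0 / sd_off ** 2))


def simulate_pooled_sd(true_speed, sd_on, sd_off, n_trials=20000, seed=0):
    """SD of the pooled estimate across simulated trials with independent noise."""
    rng = random.Random(seed)
    pooled = [
        pool_on_off(rng.gauss(true_speed, sd_on), rng.gauss(true_speed, sd_off),
                    sd_on, sd_off)
        for _ in range(n_trials)
    ]
    return statistics.stdev(pooled)


# Hypothetical per-population SDs (same units as speed).
sd_on, sd_off = 0.10, 0.15
benchmark = independent_pooled_sd(sd_on, sd_off)
simulated = simulate_pooled_sd(10.0, sd_on, sd_off)
print(f"independent benchmark {benchmark:.4f}, simulated {simulated:.4f}")
```

A measured pooled SD close to this benchmark would indicate little shared noise between the two populations, whereas a value closer to the single-population SDs would indicate redundancy.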