## ABSTRACT

Neurons in the fly lobula plate integrate motion signals over large regions of visual space in a directionally selective manner. This study is concerned with the details of this integration process. We used a stimulus consisting of a 4 × 4 lattice of locally moving Gabor patches, in which each patch could take any direction independently. We also presented only one patch at a time or two patches at a time. Across all possible directions of motion, the firing rate response r_{1+2} to two simultaneously presented patches was well described by r_{1+2}(d_{1}, d_{2}) = G × [r_{1}(d_{1}) + r_{2}(d_{2})] + S, where r_{1} and r_{2} are responses to individual patches moving in directions d_{1} and d_{2}, and G ∼ 0.81, S ∼ −23. However, this quasi-linear scaling expression failed to account for three main empirical observations: *1*) the directional-tuning curve for one patch is broader in the presence of another patch moving in the neuron’s preferred direction (PD); *2*) the vertical compression of this curve is greater when the second patch moves in the antipreferred direction (AD) as opposed to PD; *3*) the ability of the neuronal response to discriminate the direction of a patch is greater when the other patch is moving in the PD as opposed to AD, where this ability is assessed using both information theory and a standard discriminability index. To account for these departures from the simple scaling model, we used a normalization model very similar to one used for macaque area MT/V5. This model can qualitatively explain all three departures from the scaling equation described above, suggesting that a gain-control normalization network may be at work within the fly lobula plate.

## INTRODUCTION

The fly brain is an ideal system for studying biological visual motion processing. In this system, motion detection algorithms can be mapped onto neuronal circuitry in a manner that has not been possible in any other system (Borst and Egelhaaf 1994; Egelhaaf et al. 2002). Detection of front-end motion is implemented within a few synapses (Borst and Egelhaaf 1989, 1994), thus providing a flow-field map of local directional signals for subsequent processing at the level of the lobula plate (Egelhaaf et al. 2002). In this structure, local motion signals are selectively integrated by individual neurons over enormous regions of visual space (Hausen 1984; Krapp and Hengstenberg 1996, 1997) according to metrics that are relevant for naturalistic optic flow processing (Franz and Krapp 2000; Krapp 2000) and navigational control (Egelhaaf et al. 2002), similarly to what happens in the medial superior temporal (MST) area of macaque visual cortex (Andersen et al. 2000; Grossberg et al. 1999).

These optic flow–selective neurons, collectively known as lobula plate tangential cells, sit at the end of a processing pathway that starts at the level of the fly retina, projects to the lamina, from there to the medulla, and from the medulla to the lobula and lobula plate. Motion processing is already well developed at the level of the medulla (Douglass and Strausfeld 1995), although neurons in this structure do not pool signals over regions that extend as widely as in the lobula plate. It is within the latter structure that spatial integration becomes more immediately relevant to optic flow processing, and information about external motion is beautifully organized into well-defined anatomical and physiological modules (for review see Hausen 1984).

It is known that this integration process is not linear. For example, neuronal responses do not increase indefinitely as the size of an optimally moving stimulus is increased, but rather saturate at some plateau value (Hausen 1984). Moreover, the specific value of this plateau depends on stimulus velocity (Borst 1996; Haag et al. 1992). What is *not* known is whether some form of quasi-linearity holds once the system has been recalibrated by the conspicuous nonlinear phenomena mentioned above. For example, although it is known that the average response to two patches does not in general equal the sum of responses to each patch alone, is there a simple static transformation that maps such a sum onto the final response? This type of question is addressed in great detail in this report. We used stimuli that allowed a high degree of control over directional signals at different spatial locations within the neuron’s receptive field, further using them to obtain a detailed quantitative picture of how motion signals are integrated by spiking tangential cells in the lobula plate.

Our main finding is that integration is indeed very close to linear. The type of nonlinearity that underlies response saturation with increasing stimulus size may be described as a simple scaling operation that is static, in the sense that it acts only on the final summed output from the neuron and does not depend on the detailed pattern of flow signals within the receptive field. However, we also found that this is true to only a limited extent. We were able to identify some consistent features of integrated responses that did not conform to this quasi-linear description and required a more elaborate scheme inspired by gain-control models of middle temporal (MT) cortex (Heeger et al. 1996; Simoncelli and Heeger 1998). Based on these results and related simulations, we conclude that the fly lobula plate may share this type of recalibration circuitry with primate motion-selective structures.

## METHODS

### Electrophysiological recordings

We recorded extracellularly from various neurons (six V1, 11 H1, one V2/V3, and one H3) in 19 female blow flies (*Calliphora vicina*). There is one neuron of each type in each hemisphere of each animal, and they are easily identified by their firing patterns and directional preferences (Hausen 1984). V1 prefers downward motion in the frontoparallel region of the visual field (Krapp and Hengstenberg 1997), H1 prefers back-to-front motion, V2/V3 prefers upward motion, and H3 prefers front-to-back motion (see Fig. 2*A*; refer to Hausen 1984 for review). Moreover, electrode insertions that target V1 are typically located more ventrally within the lobula plate than those targeting H1. The fly was immobilized with wax and positioned about 13 cm from the monitor (ViewSonic PT795), which was driven by a VSG graphics card (Cambridge Research Systems) at 180 Hz and covered roughly 78° × 74° of visual angle (width × height; pixel resolution was 512 × 496) for the left eye. The monitor was typically centered at an azimuth/elevation of roughly 40°/5°. Recordings were made in the contralateral (right) lobula plate using micromanipulated tungsten electrodes. As soon as a unit was isolated, its rough directional preference was determined using a wide-field, high-contrast sine-wave grating that optimally drove the neuron. Examples of the resulting directional-tuning curves are shown in Figs. 2*A*, 3*A*, and 4*A*. Recording sessions lasted 6 h and 24 min on average. The longest session lasted 12 h and 7 min, the shortest one 1 h and 52 min. Total recording time was 121 h and 41 min. The total recording time allocated to H1 was 59 h; that allocated to V1 was 51 h and 33 min.

### Preliminary mapping of receptive field

We used a vector white-noise reverse-correlation technique (Srinivasan et al. 1993), in which a large portion of the neuron’s receptive field is stimulated with a lattice of Gabor patches locally moving in random directions at a constant speed of 23°/s (Fig. 1). We mostly used 4 × 4 patches (as shown in Fig. 4*B*), only rarely 5 × 5 (as shown in Fig. 3*B*). Each patch subtended roughly 19° × 18° (100% contrast on a 38 cd/m^{2} mean luminance background, carrier spatial frequency 0.1 cycles/degree, SD of Gaussian envelope 4°) and moved for 220 ms in one of eight possible directions at cardinal and diagonal axes. Each presentation lasted for 6.6 s (30 different lattice samples). We used around 200 presentations per neuron. By correlating the random-motion sequence with the firing pattern (Srinivasan et al. 1993) we derived vector maps (e.g., Fig. 3*B*). More specifically, we treated each 220-ms motion frame as a vector map where the direction of each vector was determined by the direction of the corresponding patch. We then took the weighted vector average of all vector maps across the entire presentation, where weighting was determined by the spike-response sequence (i.e., weighting was proportional to the number of elicited spikes within each motion frame).
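The spike-weighted vector average described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used in the study: the function name `vector_map`, the direction convention (0° = rightward, counterclockwise positive), and the toy data are assumptions made for the example.

```python
import math

# Eight directions at cardinal and diagonal axes, in degrees
# (assumed convention: 0 = rightward, 90 = upward).
DIRECTIONS_DEG = [0, 45, 90, 135, 180, 225, 270, 315]

def vector_map(frames, spikes):
    """Spike-weighted vector average at each lattice location.

    frames: list of motion frames; each frame is a list of indices into
            DIRECTIONS_DEG, one per patch.
    spikes: number of spikes elicited by each motion frame.
    Returns one (x, y) preference vector per patch.
    """
    n_patches = len(frames[0])
    total = sum(spikes)
    result = []
    for p in range(n_patches):
        x = y = 0.0
        for frame, n in zip(frames, spikes):
            angle = math.radians(DIRECTIONS_DEG[frame[p]])
            # Each frame contributes a unit vector weighted by its spike count.
            x += n * math.cos(angle)
            y += n * math.sin(angle)
        result.append((x / total, y / total) if total else (0.0, 0.0))
    return result

# Toy data (hypothetical): two patches, four frames. The cell fires only
# when patch 0 moves downward (index 6 = 270 deg), whatever patch 1 does.
frames = [[6, 0], [2, 4], [6, 4], [0, 2]]
spikes = [5, 0, 5, 0]
vmap = vector_map(frames, spikes)
```

In this toy case the vector for patch 0 points straight down with unit length, whereas the vector for patch 1 cancels out because its direction was uncorrelated with spiking.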

### Experiments with one or two patches

For 14 neurons, we chose highly responsive locations in the vector map and stimulated them both individually and in pairs with moving patches that were identical to those used for preliminary mapping. Examples of directional-tuning curves obtained using one patch are shown in Fig. 3, *D* and *F* and Fig. 4, *D* and *F*. Examples of tuning surfaces obtained using two patches are shown in Figs. 3*G* and 4*G*. We tested an equal number (11) of paired locations in our H1 and V1 samples.

### Double-pass protocol and theoretical maximum predictability

For 13 neurons (only partially overlapping with the 14 that were tested with one or two patches; see previous section), we presented the same white-noise sequence twice and measured the correlation between the two spike responses to the two sequences, which we term *R*_{double-pass}. Examples of responses obtained in this way are shown in Figs. 3*C*, 4*C*, and in the *inset* above Fig. 5*B*. *R*_{double-pass} can then be used to compute the maximum attainable predictability of the neuron’s response using the Spearman–Brown formula, where *R*_{ideal} is the correlation coefficient between the neuron’s response (averaged between the two passes) and the prediction of the ideal model (Ahumada and Lovell 1971). The result of this expression is plotted as a solid line in Fig. 5*B*. Notice that the double-pass method just described does not require explicit knowledge of the prediction from the ideal model. The preceding formula provides an estimate for the internal consistency of the experimental process under study (i.e., its ability to predict its own outcome given internal noise). This quantity also measures the predictive power of the ideal representation of the process because this ideal representation is in fact the process itself. The only assumption made here is that the double-pass correlation *R*_{double-pass} reflects the correlation of two quantities each having the same predictable part (which is both the process under investigation *and* the ideal model), but each having independent random parts (Ahumada and Lovell 1971).
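As a hedged illustration of this upper bound, the sketch below uses the standard two-pass Spearman–Brown step-up, 2*R*/(1 + *R*), and takes its square root to obtain the correlation between the two-pass average and the shared predictable part. The exact expression is not reproduced in this text, so this form is an assumption; it follows from the stated model of two passes sharing a predictable part and carrying independent noise. The function name is hypothetical.

```python
import math

def ideal_correlation(r_double_pass):
    """Upper bound on the correlation between any model prediction and the
    two-pass average response, given the double-pass correlation.

    Assumed form: sqrt(2R / (1 + R)), i.e. the square root of the
    Spearman-Brown reliability of a two-pass average.
    """
    if not 0.0 < r_double_pass <= 1.0:
        raise ValueError("double-pass correlation must lie in (0, 1]")
    return math.sqrt(2.0 * r_double_pass / (1.0 + r_double_pass))

# A deterministic neuron (R = 1) is perfectly predictable; a noisier one is not.
bound_noiseless = ideal_correlation(1.0)
bound_noisy = ideal_correlation(0.5)
```

The bound always exceeds the double-pass correlation itself for 0 < *R* < 1, because averaging two passes removes part of the noise.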

### Discriminability measures

The first measure we used is the direction discrimination index (DDI), equivalent to the disparity discrimination index defined in Prince et al. (2002), where *r*_{max} and *r*_{min} are the greatest and smallest responses on the measured tuning curve, and RMS_{error} is the square root of the residual variance around the means across the whole tuning curve. All calculations were performed using the square root of firing rate (i.e., *r* denotes the square-rooted firing rate). When using this index, it is important that comparisons are made between estimates that are derived from the same number of empirical measurements (Prince et al. 2002). This was indeed the case for all comparisons made herein.
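A minimal sketch of how such an index can be computed is given below. It assumes the form of the disparity discrimination index in Prince et al. (2002), (*r*_{max} − *r*_{min}) / (*r*_{max} − *r*_{min} + 2 RMS_{error}), and it assumes that the residual variance is pooled across directions with *N* − *M* degrees of freedom (*N* trials, *M* directions). Both choices, and the function name, are assumptions made for illustration.

```python
import math

def ddi(trials_by_direction):
    """Direction discrimination index from per-direction lists of
    single-trial firing rates (Hz)."""
    # Work on square-rooted firing rates, as stated in the text.
    sqrt_trials = [[math.sqrt(r) for r in trials] for trials in trials_by_direction]
    means = [sum(t) / len(t) for t in sqrt_trials]
    # Residual variance around the per-direction means, pooled over the
    # whole tuning curve (assumed degrees of freedom: N - M).
    ss = sum((v - m) ** 2 for t, m in zip(sqrt_trials, means) for v in t)
    dof = sum(len(t) for t in sqrt_trials) - len(sqrt_trials)
    rms_error = math.sqrt(ss / dof)
    spread = max(means) - min(means)
    return spread / (spread + 2.0 * rms_error)

# Noise-free tuning gives an index of 1; an untuned but noisy cell gives 0.
ddi_noise_free = ddi([[4, 4], [16, 16]])
ddi_untuned = ddi([[9, 16], [9, 16]])
```

Because the estimate of RMS_{error} depends on the number of trials, the index is only comparable across conditions with matched trial counts, which is the caveat raised above.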

The second measure, termed pairwise mutual information (MI) between directions *d*_{1} and *d*_{2}, is defined as the difference between the total entropy across all directions and the mean noise entropy at the two selected directions (Rust et al. 2002), where *r* is the number of spikes in response to each stimulus presentation and *P* is probability. MI is related to the d′ measure used in signal detection theory (Green and Swets 1966).
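The entropy difference can be sketched as follows. This is one reading of the definition, not the study's code: it assumes the two directions are equally likely, builds the total entropy from the response distribution pooled over those two directions, and applies no correction for limited sampling. The function names are hypothetical.

```python
import math
from collections import Counter

def entropy(weights):
    """Shannon entropy (bits) of a histogram given as {value: weight}."""
    total = sum(weights.values())
    return -sum((w / total) * math.log2(w / total) for w in weights.values() if w)

def pairwise_mi(spikes_d1, spikes_d2):
    """Mutual information (bits) between spike count and direction for two
    directions, assuming equal prior probability for each direction."""
    h1, h2 = Counter(spikes_d1), Counter(spikes_d2)
    n1, n2 = len(spikes_d1), len(spikes_d2)
    # Pooled response distribution P(r) = 0.5 P(r|d1) + 0.5 P(r|d2).
    pooled = Counter()
    for r, c in h1.items():
        pooled[r] += 0.5 * c / n1
    for r, c in h2.items():
        pooled[r] += 0.5 * c / n2
    total_entropy = entropy(pooled)
    # Mean noise entropy: average entropy of the two conditional distributions.
    noise_entropy = 0.5 * (entropy(h1) + entropy(h2))
    return total_entropy - noise_entropy

# Perfectly separable responses carry 1 bit; identical distributions carry 0.
mi_separable = pairwise_mi([3, 3, 3], [7, 7, 7])
mi_identical = pairwise_mi([1, 2], [1, 2])
```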

### Modeling

We used four model neurons preferring upward, downward, leftward, and rightward motion. The directional-tuning response *f*_{x} for neuron *x* is where Gaussian internal noise *N* has mean μ = 0.3 and SD σ = 0.4, and *d*_{x} is the preferred direction of neuron *x* ([*r*]_{+} = *r* for *r* > 0; [*r*]_{+} = 0, otherwise). These four neurons are the modeling analogue of the four neuronal types described previously (see Fig. 2*A*). The final response from neuron 1 is and similarly for the other three neurons. We set *k* = 1. This model has three free parameters (μ, σ, and *k*).

## RESULTS

### Overall characterization of receptive field properties

Immediately after isolating a tangential cell, we measured directional tuning to a wide-field, high-contrast grating to determine the identity of the neuron. For example, H1 responds preferentially to back-to-front (rightward) motion as in Fig. 2*A* (solid black symbols and line) and H3 responds preferentially to the opposite direction of motion (solid gray symbols and line). Although tangential cells have an overall preferred direction of motion, it has been established that different regions within their receptive field may differ substantially in their directional preference (Krapp et al. 1998). For this reason, we then presented a vector white-noise stimulus consisting of a lattice of Gabor patches moving in random directions and changing direction every 220 ms (Fig. 1). By correlating the random sequence of presented directions with the spike response from the neuron, we could determine the preferred direction at each location within the receptive field. For example, the mock response pattern in Fig. 1 could be generated by a V1 neuron. V1 prefers downward motion over a large portion of its receptive field (Krapp and Hengstenberg 1997; see also open black symbols and dashed black line in Fig. 2*A*), thus responding most vigorously to the second motion frame in Fig. 1 where three nearby patches move downward.

A map of directional preference that was obtained using this procedure is shown in Fig. 3*B*. Arrow direction indicates preferred direction of motion and arrow length corresponds to the relative degree of selectivity for that direction. The vector map in Fig. 3*B* is typical of the neuron H1 (Krapp et al. 1997).

Several such maps are plotted on top of each other in Fig. 2*B* for all H1 neurons in this study and in Fig. 2*C* for all V1 neurons, allowing a direct comparison between these two neuronal types. Vector length has been normalized within each map with respect to the largest vector length in that map. Gray arrows show overall average vectors for the two sets of maps. The gray arrow in Fig. 2*B* shows that H1 prefers back-to-front motion. The direction of this arrow is at 10° off of rightward, perfectly consistent with previous maps of the region we sampled with our stimuli (Krapp and Hengstenberg 1997). The gray arrow in Fig. 2*C* shows that V1 prefers downward motion, again consistent with previous studies of receptive field structure in this neuronal type (Krapp and Hengstenberg 1997). Figure 2*D* plots vector length (*y*-axis) versus vector direction (*x*-axis) for all locations within all maps of both H1 (solid) and V1 (open), after subtracting the average vector directions shown by the gray arrows. Distributions for both quantities are shown *above* and to the *right* (solid bars for H1, open bars for V1), and clearly show that there was a high degree of similarity between H1 and V1 maps. For this reason, the results of our analyses on different neuronal types are plotted together in Figs. 5–7. However, we also performed separate statistical tests for H1 and V1 (reported in the relevant results sections). All our main conclusions from the overall neuronal sample hold true when tested separately on the H1 and V1 samples.

To estimate the repeatability of our vector white-noise experiment, we twice presented the same random sequence of patches. In Fig. 3*C*, the black trace shows the number of elicited spikes for each motion frame during the first presentation and the gray trace during the second presentation. It can be seen that the two traces follow similar patterns. The correlation coefficient between the spike responses to the two passes of the same random sequence (which is 0.5 for the short sequences shown in Fig. 3*C*) can be used to establish an upper limit on the amount of variance that can be accounted for by any model, including an ideal one that implements the physiological process with full fidelity (see methods and following text; see also Fig. 5*B*).

### Tests of linearity

After obtaining a vector map for the whole region covered by our stimulus, we focused only on the most responsive locations. By presenting a random sequence for only one patch, we could derive a directional-tuning curve for that location (Fig. 3*D*). Similarly, we derived a curve for another, nearby location (Fig. 3*F*) by presenting only one patch at that location. We expect these tuning curves to be consistent with the corresponding vectors in Fig. 3*B*, which is indeed the case. We then asked the following question: if we present both patches at the same time, will the response equal the sum of the responses to the individual patches? In other words, is the system linear? A similar question has been asked by previous investigators (Borst 1996; Haag et al. 1992; Krapp and Gabbiani 2005), but the range of locations and directions that they could test was limited. Here we can ask this question for all possible combinations of directions for the two patches, which map to a surface (Fig. 3*G*). In Fig. 3*E*, every value on the surface in *G* is plotted on the *y*-axis, against the sum of the responses to the two corresponding directions for the individual patches on the *x*-axis (obtained from Fig. 3, *D* and *F*). If the neuron responded in a linear fashion, data points would fall on the solid unity line. Clearly this is not the case: the response to two patches is smaller than the sum of the individual responses, in line with previous literature on the effect of stimulus size (Hausen 1984; Krapp and Gabbiani 2005).

Although the data points in Fig. 3*E* do not fall on the unity line, they do appear to fall on a line of slope 0.75 and intercept −18, indicated by the gray linear fit. The correlation coefficient for this fit is 0.97. It would then appear that the response to two patches can be quite accurately predicted by r_{1+2} = G × (r_{1} + r_{2}) + S with G = 0.75 and S = −18, for all possible directions of the patches. Similar data are shown for a V1 neuron in Fig. 4. In this case r_{1+2} = 0.99 × (r_{1} + r_{2}) − 45 (with a correlation coefficient of 0.95), indicating that the departure from full linearity is almost solely subtractive (Fig. 4*E*). Figure 5*A* plots G (gain or slope) versus S (shift or intercept) for all tests of this sort carried out in this study (25 in 14 neurons). There is a clear negative correlation between G and S (*r* = −0.8; G = −0.00623 × S + 0.657), implying that (with some approximation) the different tests lie along a one-dimensional continuum. With two exceptions (points in the *bottom right* region of the plot; see paragraph below), shifts were negative (averaging −23 spikes) and gains were between 0.6 and 1 (averaging 0.81). The average correlation coefficient for all linear fits was 0.93 (0.97 for H1 sample alone and 0.91 for V1).
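The fit of gain G and shift S can be illustrated with an ordinary least-squares regression of the two-patch response on the sum of the single-patch responses. The tuning curves below are made-up numbers, and the paired responses are generated from the population-average values quoted above (G = 0.81, S = −23), so the example only checks that the procedure recovers known parameters. The function name is hypothetical.

```python
def fit_scaling(sum_single, paired):
    """Least-squares gain G, shift S and Pearson r for
    paired = G * sum_single + S."""
    n = len(sum_single)
    mx = sum(sum_single) / n
    my = sum(paired) / n
    sxx = sum((x - mx) ** 2 for x in sum_single)
    syy = sum((y - my) ** 2 for y in paired)
    sxy = sum((x - mx) * (y - my) for x, y in zip(sum_single, paired))
    gain = sxy / sxx
    shift = my - gain * mx
    r = sxy / (sxx * syy) ** 0.5
    return gain, shift, r

# Hypothetical single-patch tuning curves (spikes/s) over eight directions.
r1 = [20, 45, 90, 130, 150, 125, 85, 40]
r2 = [25, 50, 80, 110, 120, 105, 75, 45]
# One point per direction pair (d1, d2): x is the linear prediction r1 + r2.
x = [a + b for a in r1 for b in r2]
# Synthetic two-patch responses that obey the scaling expression exactly.
y = [0.81 * v - 23 for v in x]
gain, shift, r = fit_scaling(x, y)
```

With real data the points scatter around the fitted line, and the correlation coefficient `r` quantifies how well the scaling expression describes the full response surface.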

The two points at the *bottom right* of Fig. 5*A* are interesting because they depart from all other measurements. These two points, together with the double-starred point at *top left*, come from the V1 neuron in Fig. 4. The double-starred point refers to the example detailed in Fig. 4 and therefore reports slope and intercept for the linear fit in Fig. 4*E*, which refers to the interaction between the two circled positions in Fig. 4*B*. The two departing points encircled in Fig. 5*A* refer to the interaction between each circled position and the starred position in Fig. 4*B*. In other words, integration of local signals at the two circled positions in Fig. 4*B* falls within the population pattern and shows a large subtraction (negative intercept), whereas the interaction of either circled location with the starred location in Fig. 4*B* shows a positive shift in spike response, accompanied by a pronounced multiplicative gain compression (slopes ∼ 0.5). Figure 4*I* plots the directional-tuning curve obtained by stimulating only the starred position in Fig. 4*B*. It can be seen that this tuning curve is indeed rather unusual: this region of the receptive field is very responsive (in fact, more responsive than the other two regions; compare overall firing rates), but directional preference is poor in that more than one direction can drive the neuron (thus the shorter vector length in Fig. 4*B*). This may be the reason for the pronounced departure of the corresponding points in Fig. 5*A*. The odd shape of the directional-tuning curve in Fig. 4*I* may result from convergence of two regions with directional preference on opposite sides of the overall preferred direction for the neuron, as suggested by its bimodal profile.

The experiments described so far were performed using patches of 100% contrast. Stimulus contrast is known to be relevant to spatial summation: whereas responses to high-contrast stimuli show a compressive nonlinearity as stimulus size is increased (as mentioned earlier), the nonlinearity becomes expansive when contrast is low [Borst 1996; related results have been obtained in macaque MT by Heuer and Britten (2002)]. We attempted to perform our measurements at contrasts as low as 5%, but found that the S/N ratio for responses to our stimuli was prohibitively poor below 20% (which is already quite high for neurons such as H1 and V1). We then carried out one extensive test with single and double patches at 20% contrast on a V1 neuron: G and S for the linear fit were 0.14 ± 0.13 and 41 ± 15 (we did not plot this point in Fig. 5*A* because it falls outside the main range) with a correlation coefficient of 0.14. The shallow slope G (and large error on it), the low correlation coefficient, and the fact that the baseline firing rate for this neuron was 42 Hz (very similar to the intercept value S for the fit) all indicate that the S/N was already extremely poor even for 20% contrast. The same neuron tested with high-contrast stimuli returned a correlation coefficient of 0.97 for the linear fit. We plan to investigate the low-contrast range more thoroughly in future experiments.

### Predictive power of scaling (quasi-linear) model

To obtain an independent assessment of the viability of a simple scaling model as described by r_{1+2} = G × (r_{1} + r_{2}) + S, we can use the vector map to predict responses to a novel white-noise sequence under the assumption that this expression is valid, and compare the prediction against the actual response obtained from the neuron. For the short traces in Figs. 3*C* and 4*C*, the correlation coefficients between observed and predicted responses are 0.5 and 0.58. Are these figures satisfactory? To answer this question we need to establish what their values would be for the ideal predictor, keeping in mind that perfect prediction (i.e., a correlation coefficient between observed and predicted response of 1) cannot be achieved even for such an ideal device. The reason for this is that neurons are inherently noisy, and the ability to predict their response cannot do away with such internal noise. The noisier the neuron, the lower the predictive potential for any model.

The noisiness of a neuron is related to the correlation coefficient between its responses to the same stimulus presented twice, which we term double-pass correlation. If the neuron were deterministic, its response would be the same on both presentations of the same stimulus, yielding a double-pass correlation coefficient of 1. Because of internal noise, this coefficient is 0.63 on average for our sample (0.61 for H1 alone, 0.68 for V1). For a given double-pass correlation, it is possible to compute the correlation coefficient between the neuron’s response and the prediction of an ideal model (see methods). This is an upper limit for the predictive power of any model and is indicated by the solid line in Fig. 5*B*, which plots the correlation coefficient between model prediction and neuronal response (average of two passes) on the *y*-axis against double-pass correlation on the *x*-axis. It can be seen that the scaling model does reasonably well, in that data points fall only slightly below the solid line. On average, the predicted correlation accounts for 80% of the range defined by the upper limit on predictability.

### Departures from the scaling model 1: gain

If two patches are presented within the neuron’s receptive field and the direction d_{2} of one patch is kept fixed, from the scaling model we have: r_{1+2}(d_{1}) = G × r_{1}(d_{1}) + k, where k is constant for all values of d_{1} (but depends on d_{2}). In other words, the directional-tuning response to one patch should be the same regardless of the direction of the other patch except for an upward or downward shift, and should be scaled by a gain factor G compared with the tuning curve obtained when only that patch was presented. This can be easily checked by taking slices through the response surface to two patches along lines corresponding to different directions of one patch, as shown in Fig. 3, *H* and *J*. Close inspection of the curves in Fig. 3*H* shows that the prediction of the scaling model is violated: the black directional-tuning curve, corresponding to different directions of the patch plotted on the *x*-axis in Fig. 3*G* for a preferred direction of the patch plotted on the *y*-axis, spans a wider response range than the gray curve, which is for the patch on the *y*-axis moving in the antipreferred direction (preferred and antipreferred directions are defined in relation to the full-field tuning curve in Fig. 3*A*, but they match those for the individual patches as evident in Fig. 3, *D* and *F*). For example, the black curve in Fig. 3*H* is scaled by 0.76 with respect to the single-patch curve in Fig. 3*D*, whereas for the gray curve this factor is 0.65. For the other patch, it is 0.75 for the black curve in Fig. 3*J* compared with Fig. 3*F*, but 0.49 for the gray curve. Similar effects are shown for the V1 neuron in Fig. 4. The black curve in Fig. 4*H* is actually expanded by a scaling factor of 1.12 with respect to the single-patch tuning curve in Fig. 4*D*, but the gray curve is scaled down by 0.52. In Fig. 4*J*, the black curve is expanded by 1.2 with respect to the single-patch curve in Fig. 4*F*, whereas the gray curve is compressed by 0.82.
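The step from the two-patch scaling expression to the fixed-d_{2} form can be written out explicitly. The notation k(d_{2}) is introduced here only to make the dependence on d_{2} visible; everything else follows directly from the scaling expression.

$$
\begin{aligned}
r_{1+2}(d_1, d_2) &= G \times \left[ r_1(d_1) + r_2(d_2) \right] + S \\
                  &= G \times r_1(d_1) + \underbrace{G \times r_2(d_2) + S}_{k(d_2)}
\end{aligned}
$$

Changing d_{2} therefore changes only the additive term k(d_{2}), so under the scaling model every slice through the response surface has the same gain G and the same shape as the single-patch curve.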

Figure 6*A* plots gain values for tuning curves obtained from one patch while the other patch was moving in either preferred (*y*-axis) or antipreferred direction (*x*-axis). The scaling model predicts that all points should fall on the solid unity line. However, it can be seen that they tend to lie above it, implying that the preferred gain is larger than the antipreferred gain. This effect is highly significant (paired *t*-test, *P* < 10^{−7}), even when restricted to the H1 (*P* < 10^{−4}) or V1 (*P* < 10^{−3}) subpopulations. The mean values for preferred and antipreferred gain are, respectively, 0.85 ± 0.03 and 0.66 ± 0.03 (±SE).

### Departures from the scaling model 2: width

A second prediction of the scaling model is that not only the gain of the tuning curve for one patch, but also its width, should be independent of the direction of the other patch [where width is defined as half-width at half-height after rescaling the curve to its maximum and minimum values; see McAdams and Maunsell (1999) for why this is the correct way to calculate width]. Moreover, this width should be the same as the width obtained when the tuning curve is measured for that patch alone. This is apparent from r_{1+2}(d_{1}) = G × r_{1}(d_{1}) + k. Figure 3*H* shows that this prediction is sometimes violated, in that the black curve appears to be wider than the gray curve, and also wider than the single-patch tuning curve in Fig. 3*D* (for this particular example, half-width at half-height is 103° for the black curve and 49° for the gray curve in Fig. 3*H*, and 61° for the curve in Fig. 3*D*).

Figure 6*B* plots width values for tuning curves obtained from one patch while the other patch was moving in either preferred (*y*-axis) or antipreferred direction (*x*-axis), both after subtracting the width value of the corresponding single-patch tuning curves. To provide cross-validation of the result, width was assessed using two different measures: the SD of a Gaussian fit (open) and a direct measurement of half-width at half-height (solid). The Gaussian estimate has been multiplied by √(2 × ln 2) ≈ 1.18 to be comparable to the half-width at half-height estimate (Graham 1989). The scaling model predicts that all points should fall around the origin. Instead they tend to fall above the solid unity line, implying that the preferred width is larger than the antipreferred width (paired *t*-test: *P* < 10^{−3} for solid points, *P* < 0.02 for open points). The preferred width is also larger than the single-patch tuning width by 11° (solid) and 15° (open) on average, and this effect is significant [paired *t*-test: *P* = 0.01 (solid), *P* = 0.001 (open)]. The antipreferred width is instead smaller than the single-patch tuning width by 3.4° on average for the solid points and larger by 1.6° for the open points, although neither effect is significant [*P* = 0.37 (solid), *P* = 0.72 (open)]. There is also a weak positive correlation between preferred and antipreferred width differences: correlation coefficient is 0.48 for solid symbols and 0.26 for open. Mean absolute width values were 70° ± 2.5° for single-patch tuning width, 81° ± 3.1° for preferred width, and 66.5° ± 3.6° for antipreferred width. Corresponding values for the Gaussian fit were 75.6° ± 2.9°, 90.8° ± 3.4°, and 77.2° ± 4.9°.
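The conversion between the two width measures rests on a standard property of the Gaussian: a Gaussian with SD σ falls to half its peak at a distance σ√(2 ln 2) from its center. The short check below verifies this numerically; the function names and the 30° example value are illustrative only.

```python
import math

def gaussian_sd_to_hwhm(sd):
    """Half-width at half-height of a Gaussian with the given SD."""
    return sd * math.sqrt(2.0 * math.log(2.0))

def gaussian(x, sd):
    """Unit-peak Gaussian centered on zero."""
    return math.exp(-x * x / (2.0 * sd * sd))

# For a hypothetical fitted SD of 30 deg, the curve evaluated at the
# converted half-width should sit at exactly half of its peak height.
hwhm = gaussian_sd_to_hwhm(30.0)
height_at_hwhm = gaussian(hwhm, 30.0)
```

The conversion factor is about 1.18, so a Gaussian SD is always somewhat smaller than the corresponding half-width at half-height.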

When analysis is restricted to the H1 sample, all the effects described above remain statistically significant at *P* < 0.05. For V1, preferred width is larger than antipreferred width, but this effect does not reach statistical significance for either width metric (Gaussian fit or half-width at half-height). However, a one-tailed *t*-test for preferred width greater than single-patch tuning width returns *P* < 0.05 when using the Gaussian fit metric, providing at least partial evidence for our main result in the V1 population alone.

To summarize the consistently significant effect, the width of the tuning curve at a given location is broadened by the presence of another patch moving in the PD at a different location, but is unaffected when this other patch is moving in the AD.

### Departures from the scaling model 3: mutual information and discriminability

Greater gain does not necessarily correspond to better discriminability. The potential of a neuronal spike response for discriminating between two directions depends on both the difference in response to those two directions and the variability associated with those responses. Suppose a neuron fires at r_{1}±σ Hz in response to stimulus 1 and at r_{2}±σ Hz in response to stimulus 2. If the neuron responds at r_{x} Hz in response to an unknown stimulus x, how reliably can the neuron determine whether x = 1 or x = 2? The reliability for such discrimination is related to (r_{1} − r_{2})/σ, and it is this type of metric that we are interested in. For cross-validation, we used two measures of discriminability, one borrowed from signal detection theory [direction discrimination index (DDI)] and one from information theory [mutual information (MI)]. These two measures are clearly related (see methods).

We computed these values for the scaling model applied to our data set. The average DDI for discriminating the direction of a patch presented individually is 0.41 ± 0.02. When there is another patch moving in the neuron’s preferred direction, the scaling model predicts a drop in discriminability to 0.35 ± 0.01 (highly significant difference, *P* < 10^{−12}); when it is moving in the antipreferred direction, it drops to 0.37 ± 0.02 (highly significant, *P* < 10^{−7}). The difference between the latter two values is also highly significant (*P* < 10^{−4}), meaning that the scaling model predicts better DDI for one patch when the other one is moving in the antipreferred direction for the neuron, rather than in the preferred direction. This analysis is confirmed by mutual information. Average MI for a patch presented individually is 0.44 ± 0.04. This value drops to 0.35 ± 0.03 and 0.37 ± 0.03 when a second patch is moving in preferred and antipreferred directions, respectively. Both decreases are highly significant at *P* < 10^{−7}, and their difference also reaches significance at *P* = 0.03 on a paired *t*-test. In conclusion, the scaling model predicts that discriminability should drop when a second patch is added, and that it should drop even more when the second patch is moving in the preferred as opposed to antipreferred direction (in Fig. 7, it predicts that points should lie below the unity line). As detailed below, the latter prediction is opposite to what is observed experimentally.

Figure 7 plots discriminability values for tuning curves obtained from one patch while the other patch was moving in either the preferred (*y*-axis) or antipreferred direction (*x*-axis), both after subtracting the discriminability value of the corresponding single-patch tuning curves. Solid black symbols refer to MI between preferred and antipreferred directions for the patch used to derive the tuning curve (not the other patch, the direction of which defines whether the MI value is on the *x*- or *y*-axis). Open symbols refer to average MI between preferred direction and the two immediately nearby directions (45° apart), which we will term MI_{side}. Solid gray symbols refer to DDI. The scaling model predicts that points should fall to the left of the origin and below the unity line. Instead they tend to fall above the solid unity line, implying that preferred discriminability is larger than antipreferred discriminability [paired *t*-test: *P* = 0.001 for solid black symbols (MI), *P* < 10^{−3} for solid gray symbols (DDI); not significant for open symbols (MI_{side}), *P* = 0.3]. Preferred discriminability is smaller than single-patch discriminability by 0.09 (MI) and 0.026 (DDI) on average (0.023 for MI_{side}), but this effect is significant only for MI (paired *t*-test, *P* < 10^{−2}) and MI_{side} (*P* = 0.04), not for DDI (*P* = 0.125). Antipreferred discriminability is also smaller than single-patch discriminability by 0.14 (MI) and 0.064 (DDI) on average (0.016 for MI_{side}), and this effect is significant for both MI (*P* < 10^{−3}) and DDI (*P* < 10^{−3}), but not for MI_{side} (*P* = 0.23). There is also a strong positive correlation between preferred and antipreferred discriminability differences: correlation coefficient is 0.92 for MI, 0.91 for DDI, and 0.88 for MI_{side}. Mean absolute discriminability values were 0.44 ± 0.04 for single-patch MI, 0.35 ± 0.03 for preferred, and 0.30 ± 0.03 for antipreferred. Corresponding values for DDI were 0.41 ± 0.02, 0.39 ± 0.02, and 0.35 ± 0.02.

The same effects for MI and DDI remain statistically significant (*P* < 0.05) when analysis is restricted to either V1 or H1, including the fact that preferred discriminability is smaller than single-patch discriminability only for MI (*P* = 0.03 for H1, *P* = 0.02 for V1) but not for DDI (*P* = 0.2 for H1, *P* = 0.1 for V1).

To summarize the consistently significant effects, preferred discriminability is greater than antipreferred discriminability, and the latter is also smaller than single-patch discriminability. These results are not predicted by the scaling model.

### Gain-control normalization model

Neurons in the middle temporal (MT) area of macaque visual cortex present a high degree of selectivity for visual motion (Born and Bradley 2005), as well as retinal disparity (DeAngelis et al. 1998). In an area immediately adjacent to MT, called MST, neurons have larger receptive fields and are selective for optic flow patterns such as expansion and rotation (Andersen et al. 2000). MST neurons may be regarded as the best-known macaque approximation to tangential cells in the fly lobula plate. Human homologues of MT and MST have been identified using functional MRI (Huk et al. 2002).

Various aspects of MT physiology are explained by a normalization model where the output of each motion-selective neuron is divided by the sum of its own output and the outputs of all other motion-selective neurons (Heeger et al. 1996). Figure 8 sketches this model as it would translate to the fly lobula plate. In this figure, the output of the H1 neuron (whose receptive field is cartooned on the *left*) is divided by the output of all other motion-responsive neurons in the lobula plate selective for all directions. The implementation presented herein uses four neurons, each with a rectified sinusoidal directional-tuning curve. The preferred directions of the four neurons are 90° apart: upward, leftward, downward, and rightward (as in Fig. 2*A*). The output of each neuron is normalized by its own output plus the output of the other three neurons (as described in methods). This is a standard implementation of well-established MT computational models (Heeger et al. 1996; Simoncelli and Heeger 1998).
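A minimal sketch of this kind of four-neuron divisive normalization follows. It is not the implementation described in methods: the semi-saturation constant `sigma`, the linear summation of drive across patches, the angle convention (0° rightward, 90° upward), and all function names are assumptions introduced here for illustration.

```python
import numpy as np

# Preferred directions of the four model neurons, in degrees
# (assumed convention: 0 = rightward, 90 = upward).
PREFERRED_DEG = np.array([90.0, 180.0, 270.0, 0.0])  # up, left, down, right


def rectified_sinusoid(direction_deg, preferred_deg):
    """Half-wave rectified cosine tuning around the preferred direction."""
    delta = np.deg2rad(np.asarray(direction_deg, dtype=float) - preferred_deg)
    return np.maximum(np.cos(delta), 0.0)


def normalized_responses(patch_directions_deg, sigma=0.1):
    """Responses of the four neurons to one or more moving patches.

    Each neuron sums its rectified drive over patches (assumption), and is
    then divided by sigma plus the drive pooled over all four neurons,
    its own included. sigma is a hypothetical semi-saturation constant.
    """
    drive = np.zeros(PREFERRED_DEG.size)
    for direction in patch_directions_deg:
        drive += rectified_sinusoid(direction, PREFERRED_DEG)
    return drive / (sigma + drive.sum())


RIGHT = 3  # index of the rightward-preferring neuron
one_patch = normalized_responses([0.0])
two_patches_pd = normalized_responses([0.0, 0.0])    # second patch in PD
two_patches_ad = normalized_responses([0.0, 180.0])  # second patch in AD
```

Under these assumptions, a second patch moving in the preferred direction raises the rightward neuron's response by less than a factor of two, whereas a second patch moving in the antipreferred direction lowers it, because that patch contributes only to the normalization pool.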

Figure 9*A* shows tuning curves averaged over all our measurements. The tuning curve in gray is for one patch presented in isolation (“Neutral”), the black solid curve is for the same patch while another patch is moving in the neuron’s preferred direction, and the black dotted curve is for the same patch while the other patch is moving in the antipreferred direction (see stimulus depictions on the *left*). As already discussed in relation to Fig. 6*A*, the gain for the black solid curve is larger than that for the dotted curve. This is shown again here by the solid symbols in the *top inset* labeled “Gain,” which show average gain values for preferred (P on *x*-axis) and antipreferred (A on *x*-axis) directions. As already discussed in relation to Fig. 6*B*, the width of the solid black curve in Fig. 9*A* is also larger than the width of both gray and dotted curves. This is shown again here by the solid symbols in the *middle inset* labeled “Width,” which show average width values for neutral (N) as well as preferred (P) and antipreferred (A) directions. Circles refer to width estimates from Gaussian fits, squares to direct estimates of half-width at half-height. Finally, as already discussed in relation to Fig. 7, the discriminability for the black solid curve is larger than that for the dotted curve, but both are smaller than discriminability for the gray curve. This is shown again here by the solid symbols in the *bottom inset* labeled “Discriminability.” Circles refer to DDI estimates, squares to MI estimates. Gray symbols show predictions for the scaling model.

Figure 9*B* shows simulated tuning curves from the normalization model. Although there are slight quantitative differences, there is a remarkable degree of similarity between the two panels. The normalization model manages to capture all three main features of the data that were not consistent with the scaling model. This can be verified more directly by inspecting the three *insets* in the middle of the figure, where open symbols report values for the model (notice the difference in scale for model discriminability values). Although it is true that the normalization model has more free parameters than the simple scaling equation, it is nevertheless remarkable that this model manages to simultaneously capture all three departures from scaling. We attempted to find a set of parameters that would further improve the fit, but found that, although we could optimize our fit to almost perfect predictability for each factor (gain, width, or discriminability) individually, we failed to simultaneously fit all three factors optimally. For this reason, the example shown in Fig. 9*B* is meant only to demonstrate that the general trends observed in the data can result from simple circuitry such as that shown in Fig. 8. Further modeling efforts (and possibly the addition of other free parameters) will be necessary to provide a closer approximation.

## DISCUSSION

### Viability of a simple scaling model (linear fit)

The simplest possible description of the combined response to two patches is a quasi-linear combination of the responses to the two individual patches of the form r_{1+2}(d_{1}, d_{2}) = G × [r_{1}(d_{1}) + r_{2}(d_{2})] + S. This equation fits our entire data set with an average correlation coefficient of 0.93. The dimensionality of this relationship (which is 2, G and S) may be reduced (with some approximation) to 1 by the fact that G = −0.00623 × S + 0.657 with a correlation coefficient of 0.8 across our entire sample (Fig. 5*A*). Given these figures, as well as the simplicity and robustness of this simple scaling equation, it would not be unreasonable to claim that integration of motion signals across the receptive fields of the H1 and V1 neurons is very close to linear.
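As an illustration of how G and S could be estimated for one pair of patches, the sketch below fits the scaling equation by ordinary least squares. The choice of least squares, the function name, and the synthetic tuning curves are assumptions made here; the fitting procedure actually used is the one described in methods.

```python
import numpy as np


def fit_scaling_model(r1, r2, r12):
    """Least-squares G and S for r12[i, j] ~ G * (r1[i] + r2[j]) + S.

    r1, r2: single-patch responses for each direction (1-D arrays).
    r12: two-patch response surface, rows indexed by d1, columns by d2.
    Returns (G, S, correlation between summed and measured responses).
    """
    r1 = np.asarray(r1, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    r12 = np.asarray(r12, dtype=float)
    summed = (r1[:, None] + r2[None, :]).ravel()
    design = np.column_stack([summed, np.ones_like(summed)])
    solution, *_ = np.linalg.lstsq(design, r12.ravel(), rcond=None)
    gain, shift = solution
    corr = np.corrcoef(summed, r12.ravel())[0, 1]
    return float(gain), float(shift), float(corr)


# Synthetic, noise-free example with hypothetical tuning curves (Hz),
# eight directions spaced 45 deg apart.
directions = np.deg2rad(np.arange(0, 360, 45))
r1_demo = 60.0 + 30.0 * np.cos(directions)
r2_demo = 55.0 + 20.0 * np.cos(directions)
r12_demo = 0.81 * (r1_demo[:, None] + r2_demo[None, :]) - 23.0

gain_hat, shift_hat, corr = fit_scaling_model(r1_demo, r2_demo, r12_demo)
```

On this noise-free surface the fit recovers the generating values exactly. As a consistency check on the reported regression, substituting S = −23 into G = −0.00623 × S + 0.657 gives G ≈ 0.80, close to the average G ∼ 0.81.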

The question of whether the nonlinearity involved in spatial integration is a simple static one (like that described by the equation in the previous paragraph) or a more complex one (like that generated by the normalization model) is not purely notional. This question bears directly on whether spike-triggered averaging techniques are applicable in the context of spiking tangential cells. The characteristics of the front-end filtering stage can be recovered using these techniques only if the system acts like a matched filter followed by a static nonlinearity. If the nonlinearity is of a more complex nature, the spike-triggered average does not return a faithful representation of front-end filtering (Marmarelis and Marmarelis 1978). In this study, a double-pass technique was used to validate the matched filter model. Although this model does not capture the entire response pattern generated by tangential cells, it is not far from providing a full account (see Fig. 5*B*). This is to say that, although we did identify response features that required more complex modeling schemes, the bulk of the processing performed by tangential neurons can be largely accounted for by a simple matched filtering stage followed by a static nonlinearity (regardless of the mechanism that generates this nonlinearity).

It should be emphasized that the quasi-linearity we observed in our measurements applies only to the firing response range elicited by our stimulus patches, and may not extend to other response regimes. Our measurements with one or two patches spanned a response range from 15% below resting rate to about twice resting rate (see Fig. 9*A*), for an average resting rate of 42 Hz (SD 13) for our entire sample. In contrast, the maximum response that we could elicit using 100% full-field gratings averaged 298 Hz across our sample, with some neurons reaching peaks of 500 Hz (the least-responsive neurons still reached 137 Hz in response to full-field stimulation). This means that, at best, we targeted approximately 1/3 of the overall neuronal response range. Our conclusions apply only within this range.
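One way to read the 1/3 figure, assuming it compares the upper end of the sampled range with the average full-field maximum, is the following back-of-the-envelope calculation (the symbols r_{\min} and r_{\max} are introduced here only for this purpose):

```latex
\begin{align*}
r_{\min} &\approx 0.85 \times 42\ \text{Hz} \approx 36\ \text{Hz}, \\
r_{\max} &\approx 2 \times 42\ \text{Hz} = 84\ \text{Hz}, \\
\frac{r_{\max}}{298\ \text{Hz}} &\approx 0.28 .
\end{align*}
```

Under this reading the sampled responses reach a little under one third of the average maximum rate.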

A second caveat in interpreting our findings results from the way in which we selected subregions within the receptive field for testing. As explained in methods, we chose regions associated with large net motion vectors. In one instance (Fig. 4*B*, starred region) we tested a region with a small motion vector. Although the corresponding points in Fig. 5*A* follow the same relationship between G and S observed for other regions (G = −0.00623 × S + 0.657), they nevertheless fall outside the main cluster (two circled points in Fig. 5*A*), suggesting that different patterns may be observed for regions that are selected using criteria other than the one we adopted in this study. It is conceivable that responses to a full sample of all combinations of motion patches at all regions would not be well characterized by a quasi-linear scaling equation. Our results with the double-pass method partly argue against this possibility because they show that a scaling model achieves satisfactory predictability of neuronal responses to full-motion lattices (see Fig. 5*B*). However, two-patch stimulation may yield different results for regions with small net motion vectors (undersampled by our selection criteria).

Finally, it should be pointed out that all analyses in this study were carried out under the assumption that neuronal adaptation did not play a major role in determining spike responses to our stimuli. It is well known that tangential cells adapt to moving stimuli (Harris et al. 2000; Maddess and Laughlin 1985), even at short temporal scales [Fairhall et al. 2001; see Borst et al. (2005) for an explanation of fast adaptation based entirely on the Reichardt model]. It is also known that responses to stimuli presented at a given subregion within the receptive fields of both H1 and V1 can be affected by previous adaptation to stimuli presented at a different subregion (Neri and Laughlin 2005). These adaptive phenomena are clearly relevant to the present study, but we assume here that they operate on longer timescales than those spanned by our protocols. In the future, we hope to understand more clearly the potential relationship between adaptation and spatial integration.

### Failures of the simple scaling model

Close inspection of the data reveals three main features that are not consistent with this simple model. In the scaling equation above, the surface response obtained by simply summing responses to the two patches, [r_{1}(d_{1}) + r_{2}(d_{2})], is multiplied by a single scaling factor G. This means that the tuning curves obtained by slicing the surface at given values of either d_{1} or d_{2} should be identical, except for a vertical shift along the response axis. In other words, the directional-tuning curve for one patch should be independent of the direction of the other patch, and should differ from the tuning curve obtained for that patch presented in isolation only by the scaling gain factor G, with no other change in shape including width. These predictions were violated by the data, in that directional tuning for one patch does depend on the direction of the other patch. When the other patch moves in the neuron’s preferred direction, the tuning curve for the probing patch has higher gain and larger width than when the other patch is moving in the antipreferred direction. Finally, the scaling model predicts that the ability to discriminate the direction of one patch should deteriorate in the presence of another patch, and deteriorate more if that other patch moves in the neuron’s preferred direction (gray symbols in *bottom inset* of Fig. 9). Analysis of the data yields a different result: although discriminability is indeed decreased by the presence of a second patch, it is decreased more when the second patch moves in the neuron’s antipreferred direction than when it moves in the preferred direction.
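The shape-invariance prediction of the scaling equation can be checked numerically with a small sketch. The tuning curves below are hypothetical; only G and S are taken from the average values reported for the fit.

```python
import numpy as np

# Hypothetical single-patch tuning curves (Hz) over eight directions;
# index 0 is the preferred direction, index 4 the antipreferred one.
directions_deg = np.arange(0, 360, 45)
r1 = 50.0 + 25.0 * np.cos(np.deg2rad(directions_deg))
r2 = 50.0 + 25.0 * np.cos(np.deg2rad(directions_deg))

G, S = 0.81, -23.0  # average values reported for the scaling fit

# Two-patch surface predicted by the scaling equation:
# rows index the probe direction d1, columns the other patch direction d2.
surface = G * (r1[:, None] + r2[None, :]) + S

# Probe tuning curves with the other patch fixed at PD or AD.
pd_slice = surface[:, 0]
ad_slice = surface[:, 4]

# The two slices differ by a constant vertical offset, G * (r2[PD] - r2[AD]),
# and share the same peak-to-trough amplitude, G times that of r1.
offset = pd_slice - ad_slice
amplitude_pd = pd_slice.max() - pd_slice.min()
amplitude_ad = ad_slice.max() - ad_slice.min()
```

Any dependence of amplitude or width on the direction of the other patch, as observed in the data, therefore cannot be produced by this equation.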

### Normalization model

An obvious alternative to simple scaling is gain control by normalization (Simoncelli and Heeger 1998), the most widely used nonlinear scheme in cortical modeling (Simoncelli and Olshausen 2001). The implementation used here is directly borrowed from models used to explain MT physiology [compare Fig. 8 here with Fig. 1 in Heeger et al. (1996)]. The normalization model captures all three departures from the scaling model described in the previous section. Of course this model has more free parameters than the scaling equation, so it is expected that it would display a larger degree of flexibility, although it is nevertheless surprising that it can account simultaneously for all three effects of adding an extra patch, at least qualitatively. A direct comparison between these two models would not be appropriate because the scaling model is not a model in the same sense that the normalization model is. For example, the scaling model does not “predict” exact values for gain or width for the different conditions; it predicts only “relations” such as “gain should not change” or “width should not change.”

The ability of the normalization model to simulate discriminability differences is particularly interesting. The prediction from the scaling model makes intuitive sense: in the presence of an extra patch moving in the preferred direction, the added activity is larger than when the extra patch is moving in the antipreferred direction. If response variance scales with response mean, the additional preferred-direction signal should result in larger variance, lower reliability, and therefore lower discriminability. This is why, as shown by the small gray circles in the *inset* labeled “discriminability” in Fig. 9, the scaling model predicts larger MI and DDI for the antipreferred direction of the added patch.
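This intuition can be sketched numerically under explicit assumptions: responses follow the scaling equation, and response variance is a power function of the mean, `fano * mean ** exponent`. The function name, the variance rule, and all rates are hypothetical and are not fitted to the data.

```python
import numpy as np


def scaling_model_dprime(r_probe_a, r_probe_b, r_other,
                         gain=0.81, shift=-23.0, fano=1.0, exponent=1.0):
    """d'-like index for two probe directions under the scaling equation.

    Means follow gain * (r_probe + r_other) + shift. Variance is assumed
    to equal fano * mean ** exponent (exponent = 1 is Poisson-like,
    exponent < 1 is sublinear growth). Units are treated loosely.
    """
    mean_a = gain * (r_probe_a + r_other) + shift
    mean_b = gain * (r_probe_b + r_other) + shift
    if mean_a <= 0 or mean_b <= 0:
        raise ValueError("model means must be positive for this variance rule")
    pooled_sd = np.sqrt(0.5 * fano * (mean_a ** exponent + mean_b ** exponent))
    return abs(mean_a - mean_b) / pooled_sd


# Hypothetical single-patch rates (Hz) for preferred and antipreferred motion.
R_PD, R_AD = 70.0, 35.0

dprime_other_pd = scaling_model_dprime(R_PD, R_AD, r_other=R_PD)
dprime_other_ad = scaling_model_dprime(R_PD, R_AD, r_other=R_AD)

# The same ordering holds when variance grows sublinearly with the mean.
sub_pd = scaling_model_dprime(R_PD, R_AD, r_other=R_PD, exponent=0.5)
sub_ad = scaling_model_dprime(R_PD, R_AD, r_other=R_AD, exponent=0.5)
```

Because the difference between the two means does not depend on the other patch while the variance grows with the overall rate, the index is lower when the other patch moves in the preferred direction, for any positive exponent.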

Previous research (Borst and Haag 2001) has shown that information rate increases with increasing firing rate in H1 [but the opposite relationship has been demonstrated in developing visual cortical neurons of the macaque monkey by Rust et al. (2002)]. This result in itself does not explain the mutual information results presented herein. For information rate to scale positively with firing rate, all that is needed is that the firing variance increases less than linearly with firing rate. However, as long as the relationship between variance and firing rate is positive and monotonic, the prediction from the scaling model is still that the PD extra patch should reduce MI more than the AD extra patch, which is the opposite of what we observed experimentally.

### Physiological plausibility of a gain-control network in the lobula plate

There are more than 60 characterized neuronal types in the fly lobula plate, spanning a wide range of directional selectivities and integration patterns (Hausen 1984). This means that there are certainly enough building blocks for a model like that proposed in Fig. 8. In fact, for the implementation of Fig. 8 we used only four neuronal types with overall directional preference for the four cardinal axes. As shown in Fig. 2*A*, we encountered all four types in the course of this study.

The next question is whether what we know about connectivity in the lobula plate is consistent with a scheme like that proposed in Fig. 8. Given our present knowledge, this question can be answered only tentatively. It is certainly the case that interactions between neurons with different selectivity patterns exist (Egelhaaf et al. 1993; Warzecha et al. 1993). Haag and Borst (2004), for example, showed that VS7/8 receive inputs not only from downward preferring units, but also from a neuron that responds best to horizontal motion and determines the partial selectivity of VS7/8 for horizontal motion in the dorsal part of the receptive field. Inhibitory interactions are also known to exist (Borst and Haag 2002; Warzecha et al. 1993), but our present knowledge of interneuronal circuitry within the lobula plate is still far from providing a complete picture of the functional connectivity within this structure.

### Relationships to previous models

Previous gain-control models of tangential cells have favored implementations that ascribe the gain-control stage to properties of individual neurons (Borst et al. 1995; Haag et al. 1992), rather than to connections between neurons (although they still typically involve interactions between at least two neurons). In these models, tangential cells integrate motion-selective signals from upstream units at the level of their dendrites, and gain control results as a consequence of dendritic mechanisms (Single et al. 1997). The evidence in favor of these models is very convincing, and they have been successfully used to simulate neuronal responses to stimuli that are viewed by freely navigating flies (Lindemann et al. 2005). It is possible that the implementation of gain control proposed by these dendritic schemes may explain the results presented herein. However, these models often possess many more free parameters than the three used for the model presented here. Our goal was to provide a simulation of our results based on a model that satisfied the following criteria: *1*) plausible within the context of current computational literature on motion processing in neural systems; *2*) smallest number of free parameters; *3*) as self-contained as possible, meaning that the model would rely mainly on information acquired during this study, so as to avoid the inevitable assumptions that seem necessary when borrowing information from studies that used different methods and stimuli. The model we adapted from the MT literature (Simoncelli and Heeger 1998) seemed to satisfy all these criteria.

The fact that our model may have fewer free parameters than other models does not make it more likely to be correct. Previous gain-control models (like the dendritic ones described above) typically rely on aspects of lobula plate physiology that have been firmly established (Borst and Egelhaaf 1992, 1994; Single and Borst 1998, 2002). In contrast, our present knowledge of lobula plate connectivity is not sufficient to pinpoint a known network that could mediate the model used in this study (as discussed in previous paragraphs). Moreover, it may well be the case that both models are at least partly correct and that there are multiple mechanisms for controlling gain in tangential cells: one at the level of their dendrites and one at the level of their interneuronal connectivity. Clearly, further research will be necessary to establish the exact stage at which nonlinear aspects of motion integration are actually implemented in biological hardware. This study adds experimental constraints on the response features that future models will need to incorporate.

## GRANTS

This work was supported by Wellcome Trust Grant GR076941AIA.

## Acknowledgments

The author thanks H. Krapp, S. Laughlin, and three anonymous reviewers for useful comments.

## Footnotes

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Copyright © 2006 by the American Physiological Society