Howard Hughes Medical Institute, Systems Neurobiology Laboratories, Salk Institute for Biological Studies, La Jolla, California
Submitted 6 May 2004; accepted in final form 26 November 2004
| INTRODUCTION |
|---|
Reverse-phi motion is a potentially useful tool for investigating the neuronal basis of motion processing. Most important, the behavior of a directionally selective neuron in response to the reverse-phi manipulation may reveal the mechanism used to detect motion, and can be used to test the motion-energy hypothesis. In addition, regardless of the underlying mechanism, the sharp transition between perceptual states associated with a unique pattern attribute (contrast polarity) offers a fine opportunity to explore neural correlates of perceived motion.
A number of physiological studies have used luminance contrast-reversing stimuli of various types, which might be expected to elicit reverse-phi motion, and found that the responses of the recorded neurons "reversed" in the predicted manner (Emerson et al. 1987; Ibbotson and Clifford 2001; Livingstone et al. 2001). In primates, this was first noted by Dobkins and Albright (1994), who observed that the directionally selective responses of macaque middle temporal area (MT) neurons qualitatively mirrored the reverse-phi perceptual reports of human observers under similar stimulus conditions (Dobkins and Albright 1994). Livingstone et al. (2001) observed a related effect in MT neurons using one-dimensional sparse white-noise stimuli.
Inspired by the potential utility of reverse-phi stimuli for understanding motion mechanisms and encouraged by the earlier reports of neuronal sensitivity, we devised the present experiments. Our first goal was to determine the extent to which MT neuronal responses and perceptual reports of reverse-phi motion are correlated. Thus we recorded neuronal responses in MT while the monkey performed a direction-discrimination task with phi and reverse-phi stimuli. The outcome of this experiment was consistent with the idea that the spatiotemporal Fourier components in the stimulus are the relevant signals for motion detection by monkeys and that the activity of MT cells is a neural correlate of this process.
The specification of a motion mechanism, however, is not complete with an understanding of what the relevant signals are. It needs to be supplemented with a description of how a multitude of such elementary signals is combined. According to the original motion-energy model, this operation is restricted to a linear subtraction of motion energy in opposite directions (motion opponency). In a second set of experiments, we tested this hypothesis and examined in some detail how MT neurons combine multiple spatiotemporal Fourier components. The data clearly show that a simple nonlinear summation of the component responses cannot explain the responses to multicomponent stimuli. Instead, we propose that a competition among all Fourier components present in the stimulus provides an excellent description of the motion mechanisms in MT cells. Unlike the simple subtraction in the motion-energy model, this competition allows interactions among all Fourier components in a stimulus, not just the preferred and antipreferred directions. Moreover, the influence of any given component on the response varies greatly depending on the other components present in the stimulus. As a result, the response is not simply a vote as to whether the current motion is in the preferred or antipreferred direction, but rather a subtler signal that provides information on the total Fourier energy content of the stimulus.
| METHODS |
|---|
We conducted two experiments. Experiment I was designed to investigate the sensitivity of monkey observers and individual MT neurons to reverse-phi motion. Both behavioral and neuronal responses were recorded. As we show herein, the results of Experiment I are consistent with a model in which motion detection by MT neurons is accomplished through sensitivity to Fourier motion-energy components in the stimulus. Experiment II was designed to investigate this hypothesis more thoroughly and to determine how the multitude of Fourier components present in natural stimuli is combined. We begin with methods and procedures common to both experiments. Visual stimuli, behavioral paradigms, and electrophysiological recording procedures unique to each of the two experiments are identified below.
Subjects
Three adult male rhesus monkeys (Macaca mulatta; monkeys M, S, and T) were used in the neurophysiological part of Experiment I. Monkeys M and T served as subjects in the behavioral experiments. Simultaneous behavioral and neurophysiological measurements were obtained in monkey T. Two of the monkeys (monkeys S and M) were also used in Experiment II. The subjects had no significant refractive error. Experimental protocols were approved by the Salk Institute Animal Care and Use Committee, and conform to U.S. Department of Agriculture regulations and to the National Institutes of Health guidelines for humane care and use of laboratory animals.
Surgical preparation
Procedures for surgery and wound maintenance have been described in detail elsewhere (Dobkins and Albright 1994). In short, a head post and a recording cylinder were affixed to the skull using stainless steel rails, screws, and dental acrylic (monkeys M and T) or magnetic resonance (MR)-safe Cilux screws and dental acrylic (monkey S). Recording chambers were placed vertically above the anatomical location of area MT (typically 4 mm posterior to the interaural plane and 17 mm lateral to the midsagittal plane) to allow for a dorsoventral electrode trajectory. Chamber placement was guided by structural MR scans. In one animal (monkey T), a search coil for measuring eye position was surgically implanted in one eye using a variation of the method of Judge et al. (1980). After surgical recovery and attainment of criterion performance on the visual fixation task (see following text), a craniotomy was performed to allow for electrode passage into area MT. All surgical procedures were conducted under sterile conditions using isoflurane anesthesia.
Apparatus for visual stimulation
All visual stimuli were generated with in-house OpenGL software using a high-resolution graphics display controller (Quadro Pro Graphics card, 1,024 × 768 pixels, 8 bits/pixel) operating in a Pentium class computer. Stimuli were displayed on a 21-in. analog RGB video monitor (Sony GDM-2000TC; 75 Hz, noninterlaced). The output of the video monitor was measured with a PR650 photometer (Photo-Research), and the voltage/luminance relationship was linearized independently for each of the 3 guns in the CRT. Stimuli were viewed from a distance of 57 cm in a dark room (<0.5 cd/m²).
Behavioral paradigm
GENERAL. Monkeys were seated in a standard primate chair with the head post rigidly supported by the chair frame. Eye position was monitored using one of 2 standard methods. In one animal (monkey T), eye position was sampled at 500 Hz using the magnetic scleral search coil technique. In the 2 remaining animals (monkeys M and S), eye position was sampled at 60 Hz using an infrared video-based system (IScan). Eye position data were monitored and recorded with the CORTEX program (Laboratory of Neuropsychology, NIMH: www.cortex.salk.edu), which was also used to implement the behavioral paradigm and control stimulus presentation. When in a given trial the monkey's behavior did not precisely follow the rules of the task (including accurate fixation), that trial was terminated and the behavioral and neural data from that trial were removed from the data set.
FIXATION TASK. This basic behavioral task, which was used in both experiments, required subjects to fixate a small (0.15°) centrally located red spot for the duration of the trial. Each trial began with the appearance of this fixation spot on the video display. After ocular fixation was achieved and held for 250 ms, the stimulus (see following text) appeared. After stimulus offset, the fixation spot remained visible for another 250 ms. Trials in which eye position was maintained inside a square window (1.7° wide) surrounding the fixation spot were concluded with a small (0.15 ml) juice reward. A trial was aborted immediately if eye position strayed outside the fixation window at any time. Each trial was followed by a 1-s intertrial interval.
Electrophysiological recording
We recorded the activity of single units in area MT using tungsten microelectrodes (FHC, 3–5 MΩ base impedance), which were driven into cortex using a hydraulic micropositioner (David Kopf, model 650). Neurophysiological signals were filtered, sorted, and stored using the Plexon system (Plexon, Dallas, TX). Off-line spike sorting based on principal components analysis of the waveforms was used to separate up to 3 cells from a single electrode.
We identified area MT physiologically by its characteristically high proportion of cells with directionally selective responses, receptive fields that were small relative to those of neighboring area MST, and its location on the posterior bank of the superior temporal sulcus. The typical recording depth agreed well with the expected anatomical location of MT that was determined from the structural MR scans.
INITIAL ASSESSMENT OF DIRECTIONAL SELECTIVITY AND RECEPTIVE FIELD MAPPING.
Directional tuning was assessed rapidly using whole-field circular motion in the frontoparallel plane (Schoppmann and Hoffmann 1976). In each trial, a random-dot pattern moved either clockwise or counterclockwise on a circular pathway for 1.25 s. Response rates were quantified as a function of phase of circular motion and the preferred direction was defined as the vector average of these firing rates. Statistical significance was assessed with a Rayleigh test.
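The vector-average definition of preferred direction can be illustrated with a minimal sketch. The function name `preferred_direction`, the 8-bin sampling of motion phase, and the firing rates are hypothetical choices made for illustration; they are not taken from the recordings, and the Rayleigh test itself is not implemented here.

```python
import cmath
import math

def preferred_direction(directions_deg, rates):
    """Return (vector-average direction in degrees, normalized vector length)."""
    # Each direction contributes a unit vector weighted by its firing rate.
    resultant = sum(r * cmath.exp(1j * math.radians(d))
                    for d, r in zip(directions_deg, rates))
    total = sum(rates)
    angle = math.degrees(cmath.phase(resultant)) % 360.0
    # Length near 1 means sharp tuning; near 0 means no directional preference.
    length = abs(resultant) / total if total > 0 else 0.0
    return angle, length

# Hypothetical firing rates (spikes/s) in 8 bins of motion direction.
directions = [0, 45, 90, 135, 180, 225, 270, 315]
rates = [5, 12, 30, 12, 5, 2, 1, 2]
angle, length = preferred_direction(directions, rates)
print(round(angle, 1), round(length, 2))
```

The normalized vector length is the quantity on which a Rayleigh-type test of nonuniformity is usually based.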
The receptive field (RF) location was then determined using an automated sequence of briefly moving dot patterns. These patterns each contained 50 randomly positioned dots within a 5 x 5° window. Each pattern appeared for 260 ms at random nonoverlapping locations in a 6 x 6-square grid (i.e., subtending a 30 x 30° region of visual space), which was centered at the center of gaze. Dots moved coherently in the preferred direction at 10°/s. Each of these pattern presentations was followed by a 130-ms pause before the appearance of the next dot pattern at a different location. Neuronal responses were assessed in trials of 6 consecutive pattern presentations, during which subjects fixated a central target. The general structure of these RF mapping trials and the fixation requirements were identical to those for the fixation task described above. Total trial duration was, however, about 2,700 ms (the sum of 6 pattern presentations and associated pauses, etc.). A 2D spatial map of neuronal responsivity was derived by this procedure, from which the RF center was determined. We placed the stimuli of the main experiment, which always had the same 10° width and height, in the center of the RF, but made no attempts to optimize spatial and temporal frequency properties to the preference of the neuron.
We analyzed neuronal responses with our own software, written in Matlab 6.5 (The MathWorks, Natick, MA). A number of conventional response metrics were computed. The measure of neuronal response used for comparison of the effects of different experimental conditions was the mean spike rate computed within a window of 1,000 ms after response onset. Response onset for a given stimulus condition was defined as the start of the first 50-ms bin in which the firing rate was more than 3 SDs from the baseline firing rate (computed during the initial 250-ms interval during which fixation was maintained but no stimulus was presented in the RF). We used the minimum difference across all conditions between onset of stimulus motion and response onset as an estimate of the response latency of a cell. This minimum response latency was used to analyze all conditions. Comparisons between conditions were based on rank-sum tests.
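As a rough illustration of the onset criterion described above, the following sketch scans 50-ms bins for the first one whose rate deviates by more than 3 SDs from baseline. The function name and the example numbers are hypothetical, and the sketch assumes that the baseline SD is estimated from a set of baseline rate samples (for example, one per trial), which the text does not specify.

```python
import statistics

def response_onset_ms(bin_rates, baseline_rates, bin_ms=50, n_sd=3.0):
    """Start time (ms) of the first bin deviating > n_sd SDs from baseline, else None."""
    base_mean = statistics.mean(baseline_rates)
    base_sd = statistics.stdev(baseline_rates)  # assumption: sample SD of baseline rates
    for i, rate in enumerate(bin_rates):
        if abs(rate - base_mean) > n_sd * base_sd:
            return i * bin_ms
    return None

# Hypothetical baseline samples and post-stimulus bin rates (spikes/s).
baseline = [4, 6, 5, 5, 5]
condition_a = [5, 6, 5, 20, 40]
condition_b = [5, 5, 18, 35, 40]

onsets = [response_onset_ms(c, baseline) for c in (condition_a, condition_b)]
# The cell's latency estimate is the minimum onset across conditions.
latency = min(o for o in onsets if o is not None)
print(onsets, latency)
```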
EXPERIMENT I: A NEURAL CORRELATE OF THE REVERSE-PHI ILLUSION.
Visual stimuli.
Moving stimuli used for Experiment I were of 2 general types: phi and reverse-phi (see Fig. 1). The stimulus configurations used were based on the Gamma (Γ) stimulus defined by Chubb and Sperling (1989). Both phi and reverse-phi stimuli were composed of quarter duty-cycle, rectangular-wave gratings [spatial frequency = 0.25 cycles per degree (cpd)], which moved within a 10°-square aperture for 1,000 ms on each trial. The space–time-averaged luminance of the gratings (15 cd/m²) was identical to that of the uniform gray field that surrounded them. Grating contrast (Michelson) was 15%. On each trial, the grating was displaced one quarter of its wavelength on every 4th video frame, i.e., every 53.3 ms (temporal frequency = 4.7 Hz). Reverse-phi stimuli differed from their phi counterparts solely by the fact that the luminance contrast polarity of the moving grating inverted with every displacement. Phi and reverse-phi stimuli were presented in the preferred and antipreferred directions (rounded to the nearest 45°) of the cell under study.
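To see why inverting contrast polarity at every quarter-wavelength displacement reverses the direction of the dominant Fourier component, consider the following sketch. It is a simplified one-dimensional illustration, not the stimulus code used in the experiments: the 8-pixel sampling of one grating cycle and all names are assumptions. The sketch tracks the phase of the fundamental spatial component from one step to the next.

```python
import cmath
import math

PERIOD = 8            # pixels per grating cycle (hypothetical sampling)
STEP = PERIOD // 4    # quarter-wavelength displacement per step
N_STEPS = 8

def grating_frame(step, reverse_phi):
    """Mean-subtracted contrast profile of one grating cycle at a given step."""
    # Quarter duty cycle: a quarter of the cycle is bright, the rest dark,
    # with levels chosen so that the spatial mean is zero.
    base = [0.75 if x < PERIOD // 4 else -0.25 for x in range(PERIOD)]
    shifted = [base[(x - STEP * step) % PERIOD] for x in range(PERIOD)]
    # Reverse-phi: contrast polarity inverts with every displacement.
    sign = -1.0 if (reverse_phi and step % 2) else 1.0
    return [sign * v for v in shifted]

def fundamental(frame):
    """Complex amplitude of the fundamental spatial frequency."""
    return sum(v * cmath.exp(-2j * math.pi * x / PERIOD)
               for x, v in enumerate(frame))

def phase_steps(reverse_phi):
    """Phase advance of the fundamental between consecutive steps (radians)."""
    coeffs = [fundamental(grating_frame(t, reverse_phi)) for t in range(N_STEPS)]
    return [cmath.phase(coeffs[t + 1] / coeffs[t]) for t in range(N_STEPS - 1)]

phi_steps = phase_steps(reverse_phi=False)
rphi_steps = phase_steps(reverse_phi=True)
print(phi_steps[0], rphi_steps[0])
```

In this sketch the fundamental advances by a quarter cycle per step in one direction for the phi stimulus and by a quarter cycle in the opposite direction for the reverse-phi stimulus, because a polarity inversion adds half a cycle to the quarter-cycle physical displacement.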
Stimulus/response nomenclature and definitions.
The terms phi, reverse-phi, and direction of motion have heretofore been used to refer to both stimuli and the percepts that such stimuli may elicit. To avoid such ambiguity, we adhere to the following definitions throughout: 1) Phi and reverse-phi stimuli refer to the moving grating and its contrast-reversing counterpart, respectively. Direction of stimulus motion refers to the direction of the smallest physical displacement of a grating on the screen, independent of grating contrast polarity. For the reverse-phi stimulus, this stepping direction does not correspond to the generally perceived direction of motion. 2) Phi and reverse-phi motion refer to the motion percepts that are commonly, but not inevitably, elicited by phi and reverse-phi stimuli, respectively. For example, a reverse-phi stimulus will commonly elicit a percept of reverse-phi motion, but occasionally and under some conditions it will elicit phi motion (Chubb and Sperling 1989; Maruya et al. 2003).
Direction-discrimination task.
This psychophysical task placed the same fixation requirements on the subject that were imposed in the fixation task. In addition, subjects (monkeys M and T) were required to report perceived direction of motion after stimulus presentation on each trial. Thus the sequence and timing of events on discrimination trials were identical to those on fixation trials until the end of the 250-ms fixation interval that followed stimulus presentation. At that time, 2 small (0.15°) red targets appeared on the display at positions equidistant (5°) from the central fixation spot. One target was displaced from the fixation spot in the direction of stimulus motion and the other was displaced in the opposite direction. Subjects were trained to report perceived direction of motion by making a saccadic eye movement to the target that was displaced in the direction of stimulus motion. Subjects were required to execute this response within 2,000 ms after target onset and to maintain fixation on the chosen target for 500 ms.
Subjects were initially trained to perform the direction discrimination task using phi-motion stimuli. After subjects reached a performance level of 80% correct in all 8 directions, we introduced reverse-phi stimuli. Trials with phi and reverse-phi stimuli were then randomly interleaved. On trials in which phi motion was presented, selection of the correct target was rewarded with a small drop (0.15 ml) of juice on 70–100% of the completed trials (the fraction of trials that was rewarded was titrated for each animal/session to simultaneously minimize fluid consumption rate and maximize performance level). On trials in which reverse-phi stimuli were presented, juice reward was given on a random schedule (60% of trials). This random reinforcement for reverse-phi stimuli was used to avoid the possibility that the subject would become trained to report a particular direction for reverse-phi stimuli regardless of what was actually perceived.
Subjects discriminated 2 opposite directions of motion during any given behavioral session, but the stimulus parameters that were used depended on whether neurophysiological data were being obtained concurrently. In the case of monkey M, the stimuli were always positioned 5° to the left of fixation. We recorded >600 trials and averaged over all 8 directions of motion. In the case of monkey T, for which behavioral and neuronal data were obtained concurrently, the stimuli were centered on the RF of the recorded neuron and the axis of stimulus motion was chosen to most closely approximate the neuronal preferred axis of motion. In this case, we found that the behavioral data were largely independent of stimulus position in the visual field (within the limited range of eccentricities tested: <15°), and the physical direction of grating displacement. Data from monkey T were therefore pooled over all trials performed during physiological recordings (>2,000 trials).
Neuronal response measures.
Direction selectivity indices were computed for phi (DSIp) and reverse-phi (DSIrp) stimuli, and were defined as the contrast between the response to the stimulus moving in the preferred direction and the stimulus moving in the antipreferred direction: DSI = 100 × [(preferred − antipreferred)/(preferred + antipreferred)]. Significance of the directional selectivity was assessed using a rank-sum test on the firing rates for preferred versus antipreferred stimulation. Note that performing this rank-sum test is not the same as testing whether the DSI is significantly different from 0, which explains why some cells with low DSI values are nevertheless significantly tuned for direction (see Fig. 4).
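The index is simple to compute; the sketch below is included only to make the sign convention concrete. The function name `dsi` and the firing rates are hypothetical. Because "preferred" is defined by the direction of physical grating displacement, a cell whose response reverses for the reverse-phi stimulus yields a negative index.

```python
def dsi(preferred_rate, antipreferred_rate):
    """Direction selectivity index in percent: 100 * (p - a) / (p + a)."""
    total = preferred_rate + antipreferred_rate
    if total == 0:
        raise ValueError("DSI is undefined when both mean rates are zero")
    return 100.0 * (preferred_rate - antipreferred_rate) / total

# Hypothetical mean rates (spikes/s) for one cell.
dsi_phi = dsi(40.0, 10.0)          # phi stimulus: strongest for preferred stepping
dsi_reverse_phi = dsi(10.0, 40.0)  # reverse-phi: strongest for antipreferred stepping
print(dsi_phi, dsi_reverse_phi)
```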
To determine the choice probability for a particular cell and a particular stimulus, responses were divided into 2 groups depending on the behavioral report on each trial. A receiver-operating characteristic (ROC) analysis of these 2 firing rate distributions gave the choice probability (CP). A CP >0.5 indicates that the cell fired more when the monkey made a decision that corresponded to the preferred direction of the cell. The reliability with which CP can be determined depends on stimulus-independent behavioral response variation and is thus tied to the fraction of incorrect responses. Because our monkey performed the task at high rates, we typically obtained few trials for one decision and many trials for the other decision. This situation prevents a meaningful analysis of the significance of the choice probability of individual cells (Britten et al. 1996). At the population level, however, the null hypothesis that the trial-to-trial response variability and the monkey's decision process are independent nevertheless predicts a distribution of CP values with a median of 0.5. We used a Wilcoxon signed-rank test to test this null hypothesis for the entire population of cells.
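The area under the ROC curve for two samples equals the probability that a randomly drawn value from one sample exceeds a randomly drawn value from the other, with ties counted as one half. The sketch below computes choice probability in that way. It is a minimal illustration rather than the analysis code used here; the function name and the firing rates are hypothetical.

```python
def choice_probability(rates_pref_choice, rates_anti_choice):
    """Area under the ROC curve comparing firing rates grouped by behavioral choice."""
    if not rates_pref_choice or not rates_anti_choice:
        raise ValueError("both choice groups need at least one trial")
    wins = 0.0
    for x in rates_pref_choice:
        for y in rates_anti_choice:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5  # ties count as half
    return wins / (len(rates_pref_choice) * len(rates_anti_choice))

# Hypothetical rates (spikes/s) for one stimulus, split by the monkey's report.
pref_choice_rates = [12, 15, 9, 14]   # trials reported as preferred direction
anti_choice_rates = [10, 8]           # trials reported as antipreferred direction
cp = choice_probability(pref_choice_rates, anti_choice_rates)
print(cp)
```

With only two trials in one group, as in this example, the estimate is coarse, which is the reason single-cell significance is not assessed when error trials are rare.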
EXPERIMENT II: PROBING MOTION MECHANISMS IN THE FOURIER DOMAIN. Two of the 3 animals that participated in Experiment I (monkeys S and M) also served as subjects in Experiment II. The only behavioral requirement in this experiment was fixation of gaze.
Visual stimuli.
The stimulus set used in Experiment II was based on the lower-order Fourier components (i.e., the components accounting for the greatest proportion of Fourier energy) of the phi stimulus used in Experiment I. In practice, this stimulus set was constructed for each neuron using the first 4 Fourier components for each direction along the neuronal preferred axis of motion. To illustrate, first consider the phi stimulus moving in the preferred direction. The first 4 Fourier components of this 0.25-cpd asymmetric rectangular-wave stimulus are sine-wave gratings of the following varieties (Lu and Sperling 1999): If we also consider the phi stimulus moving in the antipreferred direction, the set of components expands to include: Because the nonmoving components (2 and 4) are identical for each direction of motion, the basic stimulus set actually consists of 6 different Fourier components, which we designate by the following abbreviations: P, a, p, A, f, s.
We stimulated each cell using the full stimulus set. As in Experiment I, all stimuli were presented within a square aperture centered on the RF. All component and composite stimuli were oriented perpendicularly to the preferred axis of motion (rounded to the nearest 45°).
Model fitting.
We used a nonlinear least-squares curve-fitting algorithm in Matlab (lsqcurvefit.m) to determine the best-fitting parameters for each model. The initial conditions of the fitting procedure were randomly chosen and the best fit of 50 repetitions was used as the final answer. As an intuitive measure of the explanatory power of a model, we determined the percentage explained variance, given by: 100 × {1 − [variance(predicted − actual)/variance(actual)]}. This number is bounded above by 100%, but can become negative for extremely poor models.
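The explained-variance measure can be written out as a short sketch. The function name and the example data are hypothetical. The sketch also shows how a poor model produces a negative value, as noted above.

```python
import statistics

def explained_variance(predicted, actual):
    """Percentage explained variance: 100 * (1 - var(predicted - actual) / var(actual))."""
    residuals = [p - a for p, a in zip(predicted, actual)]
    # The same variance estimator is used in numerator and denominator,
    # so the choice between population and sample variance cancels out.
    return 100.0 * (1.0 - statistics.pvariance(residuals) / statistics.pvariance(actual))

# Hypothetical responses (spikes/s) to four stimuli.
actual = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance([1.0, 2.0, 3.0, 4.0], actual)
poor = explained_variance([4.0, 3.0, 2.0, 1.0], actual)
print(perfect, poor)
```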
The models we sought to compare differed both in their mathematical form and the number of free parameters. This invalidates comparisons based only on explained variance or nested log-likelihood. The Akaike Information Criterion (AIC) is a measure of model performance that allows one to compare the performance of arbitrary models while correcting for their different number of free parameters (Burnham and Anderson 1998). An intuitive (although not entirely correct) way to interpret the AIC is that it is the sum of a measure of the goodness-of-fit and a penalty term for the number of free parameters in the model

$$\mathrm{AIC} = -2\ln(\mathcal{L}) + 2K$$

where $\mathcal{L}$ is the maximized likelihood of the model and $K$ is its number of free parameters. We considered a model to outperform an alternative when its AIC was at least 3 less than that of the alternative model. When the difference between the models was <3, the analysis was deemed inconclusive.
| RESULTS |
|---|
BEHAVIORAL DATA. Two of our 3 subjects (monkeys M and T) performed a direction-discrimination task using both phi and reverse-phi stimuli. The task and stimulus are detailed in METHODS and Fig. 1; the results are shown in Fig. 2. Behavioral reports elicited from both monkeys by the phi stimulus were predominantly of phi motion (monkey T, 91%; monkey M, 80% of trials). By contrast, when viewing the reverse-phi stimulus both monkeys reported reverse-phi motion on the majority of trials. Reports of phi motion were much less frequent (monkey T, 10%; monkey M, 32% of trials). This behavioral dichotomy mirrors that seen in human subjects, which suggests that monkeys experience phi and reverse-phi motion similarly to humans under these conditions. Interestingly, we found that, over the recording period of 4 mo, monkey T never appeared to recognize that rewards on reverse-phi trials were independent of his decision (i.e., he consistently reported a reverse-phi percept). This absence of extinction is testimony to the strength of the reverse-phi illusion. The demonstration that monkeys perceive the reverse-phi illusion shows that they are good animal models to investigate the neural correlates of this illusion and its implications for motion perception in general.
In Fig. 3D we have plotted the average firing rate of our example MT neuron as a function of stimulus type and direction of motion. These data illustrate, once again, that responses were strongest when either phi stimuli moved in the 315° direction or reverse-phi stimuli moved in the 135° direction. Behavioral reports of perceived direction of motion are summarized in Fig. 3E. Here the vertical axis represents the fraction of trials on which the monkey reported motion in the 315° direction. Mirroring the neuronal data precisely, such reports were most common when either phi stimuli moved in the 315° direction or reverse-phi stimuli moved in the 135° direction.
Population data. The reverse-phi behavior of the cell highlighted in Fig. 3 was typical of the population. To quantify the effect across the population, we computed 2 directional selectivity indices for each neuron: one for phi stimuli (DSIp) and another for reverse-phi stimuli (DSIrp) (see METHODS). The sign of DSIp was constrained to be positive. However, because we defined "direction of stimulus motion" to be the direction of smallest grating displacement, regardless of contrast polarity, a cell that detects reverse-phi motion (a cell with a selectivity pattern like the one shown in Fig. 3) will exhibit a negative DSIrp. For the cell in Fig. 3, DSIp = 76% and DSIrp = −71%. In Fig. 4 we have cross-plotted values of DSIrp and DSIp for the cells that were directionally selective (n = 73) for either phi or reverse-phi stimuli. As can be seen from the marginal distribution, DSIrp was negative for the majority (58 out of 73 cells; 80%) of cells and the median (−26%) was significantly different from zero (P < 0.01). Moreover, values of DSIp and DSIrp were highly correlated (Spearman R = 0.63; P < 0.001) for those neurons that exhibited significant selectivity for both stimulus types (red dots). Even for the subpopulation of neurons that were not significantly directionally selective (and thus not shown in Fig. 4), the median DSIp = 7.6%, whereas DSIrp = −4% (both significantly different from zero; P < 0.05). This confirms that, even though the strong direction selectivity for phi motion of the example neuron in Fig. 3 was atypical for the population, the reversal of the directional response for the reverse-phi stimulus was typical.
Choice probability.
As Fig. 3E reveals, monkey T did not always give the same answer to the same stimulus. To quantify the neuron's involvement in the monkey's decision process, we determined the choice probability (CP; see METHODS). This analysis was performed for every recording in which the monkey made at least one mistake; this resulted in a database of 94 neurons. These neurons all had statistically significant direction-selective responses to smoothly moving random-dot patterns. A CP value >0.5 indicates that, on trials in which the cell fired more than usual, the monkey was more likely to report motion in the preferred direction of the cell. The distribution of CP values for the phi stimulus is plotted in Fig. 5A. Given the small fraction of error trials (10%), an analysis of significance on individual cells is not meaningful (Britten et al. 1996). The overall distribution of CPs, however, possessed a median (0.60) that was significantly greater than 0.5 (P < 0.05). The corresponding distribution of CP values for the reverse-phi stimulus is plotted in Fig. 5B. For the reverse-phi stimuli, the median of the distribution of CP values (0.55) was also significantly larger than 0.5 (P < 0.05). These MT choice probabilities were comparable with those found in other direction-of-motion tasks (0.55; Britten et al. 1996) but smaller than those found for a stereoscopic depth task (0.67; Dodd et al. 2001). Note also that for phi and reverse-phi stimuli, the effect was somewhat larger for stimuli moving in the preferred direction than for stimuli moving in the antipreferred direction. However, because the trends in the separate data sets were the same, they were pooled to construct the histogram in Fig. 5. These data show that the monkey's behavioral report of direction was on average predicted by MT responses both for phi and for reverse-phi stimuli, which suggests, in turn, that the two stimulus types engaged a common decision mechanism.
Experiment II: probing motion mechanisms in the Fourier domain
In this experiment we investigated how neuronal responses to single Fourier components are related to the neural responses to a stimulus that contains multiple Fourier components. Specifically, we recorded responses of 54 MT neurons to 6 single Fourier components (sine-wave gratings) as well as 21 composite stimuli that were created by combining these components (see METHODS). Comparing responses to the individual Fourier components with those elicited by the composite stimuli allowed us to determine how MT neurons combine information in Fourier space.
ILLUSTRATIVE NEURONAL RESPONSES.
Data from one MT neuron are shown in Fig. 6. The inner circle of the figure shows peristimulus histograms of the neuronal responses to 6 individual Fourier components of the phi stimulus (P, p: preferred direction; f, flicker; s, stationary; A, a, antipreferred; see METHODS). As expected, the largest response was elicited by the components moving in the preferred direction (stimuli P, p). The cell also responded strongly to the flickering sinusoid (stimulus f), which corroborates previous evidence that many MT neurons respond to nonmoving contrast-modulated patterns (Churan and Ilg 2002; Thiele et al. 2000). The components moving in the antipreferred direction (stimuli A, a) induced a small above-baseline response. Finally, the stationary component (s) reduced firing somewhat below baseline. We found such a suppressive effect of the stationary component in a minority (22) of the 54 cells. Because baseline activity is low in MT, suppression below baseline is also necessarily small. Thus the response to stationary stimuli, averaged over the MT population, was still above baseline. Consistent with previous reports (Albright 1984), the population average response to a stationary stimulus was about 24% of the response to the preferred stimulus.
MODELING COMPONENT INTERACTIONS.
1) Power-law summation.
In the power-law summation model the response to a composite stimulus is a power-law function of the responses to the components. Power-law summation accurately describes how responses to multiple objects in an MT receptive field depend on responses to single objects in the RF (Britten and Heuer 1999; Heuer and Britten 2002). Because this model works well for spatial summation, it seems worthwhile investigating whether it is an accurate description of Fourier space summation. The mathematical form we used is as follows

$$R = \mathrm{offset} + \mathrm{gain} \times \Big\lfloor \sum_i \mathrm{sign}(r_i)\,|r_i|^{\mathrm{exp}} \Big\rfloor^{1/\mathrm{exp}}$$

where $R$ is the response to the composite stimulus, $r_i$ is the response to component $i$ relative to baseline, and the sum runs over all components. Finally, the $\lfloor\;\rfloor$ operator represents a linear threshold that sets all negative values to zero and leaves positive values unaltered.
This model includes both excitation and (subtractive) inhibition through the sign(ri) term: if a component reduces firing below baseline, its response will be subtracted to determine the response to the composite stimulus. The appealing property of this model is that it allows one to study qualitatively different interactions within a single mathematical framework. For instance, with gain = 1 and exp = 1, the model performs simple weighted linear summation. With large values of exp, on the other hand, the model implements a highly nonlinear winner-take-all algorithm, in which the response to the composite is essentially determined by the response to the strongest component. With exp = 0.5, this model implements a half-squaring nonlinearity, similar to the nonlinearity used by Simoncelli and Heeger (1989) in their model of pattern MT cells. To systematically investigate these qualitatively different models, we followed Britten and Heuer (1999) and determined best-fitting parameters per cell, while constraining one or 2 of the 3 model parameters. The offset parameter was always free. Table 1 summarizes the results of this analysis in terms of the average percentage of the variance in the data that the model could explain.
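The qualitative regimes described above can be explored with a short sketch. The exact placement of the threshold and the function signature below are assumptions made for illustration: the sketch thresholds the signed power sum before applying the 1/exp root, and treats component responses as rates relative to baseline. It should be read as one plausible reading of the model, not as the fitted implementation.

```python
import math

def power_law_response(component_rates, gain=1.0, exp=1.0, offset=0.0):
    """Composite response predicted from baseline-subtracted component responses."""
    # Signed power sum: components that suppress firing (negative rates) subtract.
    signed_sum = sum(math.copysign(abs(r) ** exp, r) for r in component_rates)
    # Linear threshold: negative values are set to zero (assumed placement).
    rectified = max(signed_sum, 0.0)
    return offset + gain * rectified ** (1.0 / exp)

# Hypothetical component responses (spikes/s above baseline).
linear = power_law_response([20.0, 5.0], gain=1.0, exp=1.0)
winner_take_all = power_law_response([20.0, 5.0], gain=1.0, exp=50.0)
with_suppression = power_law_response([20.0, -5.0], gain=1.0, exp=1.0)
print(linear, winner_take_all, with_suppression)
```

With exp = 1 the prediction is the plain sum of the component responses, and with a large exponent it approaches the response to the strongest component alone.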
Because the median-explained variance is quite low in these simple summation models, it is perhaps more interesting to determine why none of the models worked than determining which of these models worked best. The example cell in Fig. 6 provides some clues. For this cell, the average responses to the p and P components were nearly identical and so were the responses to the a and A components. A comparison of the pA and Pa composites, however, shows that even though the components of these stimuli evoked very similar firing rates in this cell, the composites constructed by combining those evoked very different responses.
This is an important observation because it shows that the rates that the components evoked are not by themselves sufficient to explain how the cell will respond to a composite stimulus consisting of only those components. In other words, any model, regardless of its precise mathematical form, in which the response to a composite stimulus is determined only by the responses to the components, cannot explain the full Fourier space summation properties of this MT cell. To investigate how prevalent these response properties were in our database, we searched, for each cell, for pairs of composite stimuli that contained 2 components that evoked similar rates (e.g., the Pa and pA stimuli in Fig. 6). The difference between the component responses was allowed to be at most equal to the average SE in the component responses. Over the whole database, 210 pairs of composite stimuli with 2 components satisfied this constraint. Any model relying on only the component responses to explain the composite response would predict composite responses that were very similar. To quantify how often such a model would fail, we counted how often the difference in the composite response was >2 SE; this was the case in 127 of the 210 pairs (60%). Thus even using a conservative criterion, and looking only at one way in which any purely response-based model would fail, we have strong evidence against such models for more than half of the cells.
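The counting procedure in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions: each pair of composites is summarized by two absolute rate differences (between the matched component responses and between the composite responses), and a single SE value is used for each criterion. The function name, the data layout, and the toy numbers are hypothetical; only the two thresholds (at most 1 SE for components, more than 2 SE for composites) come from the text.

```python
def count_response_based_failures(pairs, se_component, se_composite):
    """Count pairs whose components match but whose composites differ.

    pairs: iterable of (component_diff, composite_diff), both absolute
           differences in firing rate (hypothetical data layout).
    Returns (number of failures, number of eligible pairs).
    """
    # Eligible pairs: component responses differ by at most one SE.
    eligible = [(c, d) for c, d in pairs if abs(c) <= se_component]
    # Failures: composite responses nevertheless differ by more than 2 SE.
    failures = [d for _, d in eligible if abs(d) > 2.0 * se_composite]
    return len(failures), len(eligible)

# Toy data, not from the study.
toy_pairs = [(0.5, 10.0), (0.8, 1.0), (3.0, 12.0), (0.2, 7.0)]
n_fail, n_eligible = count_response_based_failures(toy_pairs, 1.0, 2.0)

# The proportion reported in the text: 127 of 210 pairs.
reported_fraction = 127 / 210
```

For the reported counts, 127/210 is approximately 0.605, which rounds to the 60% given in the text.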
We conclude that one reason why the simple summation models of Table 1 fail is that the influence of a component on a composite stimulus depends not only on its firing rate but also on other factors. One factor could be that a component has an influence on a composite that is stronger or weaker than the response to the component alone would suggest. We investigated such first-order factors in the weighted power-law model.
2) Weighted power-law summation.
The weighted power-law model for Fourier domain summation extends the standard power-law model by introducing a separate gain parameter (Wi) for each component. The mathematical form is
[Equation not recovered: weighted power-law model]
where the sum still runs over all components, each of which is now weighted by a different weight (Wi) that is constrained to be positive. Through these weights, the components can have an influence on the response R that is not directly proportional to the response that the component evoked when presented alone. Note, however, that these weights were also constrained by the fact that when all Ci were zero except one, the response of the model should predict the response to the single component. Thus this model is qualitatively different from the power-law model in that it aims to describe not only the composite responses but also the responses to the individual components. We again determined the best-fitting parameters separately for each cell. The median best-fitting gain parameter for all cell models was 2.2 (quartile range: [1.0; 4.0]) and the median exponent was 16 (quartile range: [4; 261]; note that values >10 essentially represent winner-take-all behavior). On average the power-law model explained 75% of the variance (quartile range: [62%; 87%]). The component weights (Wi) in these weighted power-law models show, not surprisingly, that the preferred direction components (P, p) typically have the greatest influence on the composite response, followed by the flicker and antipreferred components (f, A, a). The stationary (s) component weight was typically small. For most cells, the best-fitting power-law model was a winner-take-all mechanism. In other words, the response to a composite was primarily determined by the response to the component that best drove the cell. Although this model does explain a significant part of the variance in the data set, it also leaves some structure in the responses to composite stimuli unexplained.
To illustrate a weakness of the weighted power-law model, we have presented a model fit for an illustrative MT neuron in Fig. 7A. The model response (vertical axis) was limited to a small set of responses, whereas the actual firing rate (horizontal axis) varied more or less continuously over a large response range. The cluster of data points with model responses at about 27 Hz corresponds to stimuli that contained a preferred component (stimulus P). Every composite stimulus that contained this component yielded an identical output from the power-law model in winner-take-all mode. Indeed, even when the preferred component was combined with 3 nonoptimal components (yielding composite stimulus Pfas, indicated by asterisk), the model response did not change and clearly overestimated the actual response. Similarly, the cluster of data points with model responses at about 20 Hz corresponds to stimuli that contained the flicker component (stimulus f). When this flicker component was combined with the antipreferred component (yielding composite stimulus Af, indicated by a diamond), the response was again overestimated by the model. In this case, the suppressive influence of the antipreferred component (when presented alone this component inhibited the cell below baseline) was ineffective when combined with the excitation elicited by the flicker component.
3) Competition.
To account for graded and complex interactions among Fourier components, we considered a feed-forward network model that uses tuned excitatory input and tuned divisive inhibition to implement a competition among the component inputs (Grossberg 1973; Reynolds et al. 1999). Unlike the winner-take-all kind of competition implicit in the power-law model, this competition does not necessarily lead to a single "winner." In the competition model, MT neurons receive both excitatory and inhibitory inputs from the same components, but with different weights. Mathematically
[Equation not recovered: competition model]
parameter (arbitrarily) to 1. This leaves the 6 excitatory (Wj+) and 6 inhibitory (Wj−) weights to be found by parameter fitting. All weights are constrained to be positive. Just as in the model proposed by Simoncelli and Heeger (1998)

The competition model worked extremely well. On average, it explained 94% of the variance (quartile range: [86%; 97%]). An example fit for direct comparison with the weighted power-law model of Fig. 7A is shown in Fig. 7B. For this cell, the highest firing rates were somewhat underestimated by the competition model, but this was not typical across the population. Importantly, the competition model captured the influence of components rendered ineffective by the power-law model. This occurred because all components in a stimulus were in competition with one another to determine the firing rate of the cell. Thus, for example, when the 3 nonoptimal components (fas) were added to the preferred component (P), the model response was reduced, mimicking the behavior of the MT cell (asterisk). Similarly, the modeled response (diamond) to the stimulus composed of flicker (f) and antipreferred (A) components was correctly intermediate between the responses to the components alone.
The competition model explained more of the variance in the data, but compared with the weighted power-law model, it had 4 extra free parameters. This may have accounted for some of the increase in explained variance. We therefore used the Akaike Information Criterion (AIC) to determine the better model; this criterion corrects for the different number of free parameters. We used an AIC threshold that ensures that a model with more free parameters is considered better only if the increase in information the model provides is warranted by the increase in free parameters (see METHODS; Burnham and Anderson 1998).
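The logic of such a comparison can be sketched with the standard least-squares form of the AIC, AIC = n·ln(RSS/n) + 2k, where n is the number of data points, RSS the residual sum of squares, and k the number of free parameters. The threshold value, the function names, and the example numbers below are assumptions made for illustration; the threshold actually used is specified in METHODS and is not reproduced here.

```python
import math

def aic_least_squares(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors (up to a constant)."""
    return n * math.log(rss / n) + 2 * k

def compare_models(rss_a, k_a, rss_b, k_b, n, threshold=2.0):
    """Return 'a', 'b', or 'inconclusive'.

    threshold is a hypothetical minimum AIC difference required to
    prefer one model over the other.
    """
    delta = aic_least_squares(rss_a, n, k_a) - aic_least_squares(rss_b, n, k_b)
    if delta > threshold:
        return "b"   # model b has the clearly lower AIC
    if delta < -threshold:
        return "a"   # model a has the clearly lower AIC
    return "inconclusive"

# Toy example: model a has 8 parameters, model b has 4 more (12).
better_fit = compare_models(rss_a=100.0, k_a=8, rss_b=30.0, k_b=12, n=40)
small_gain = compare_models(rss_a=100.0, k_a=8, rss_b=95.0, k_b=12, n=40)
tie = compare_models(rss_a=100.0, k_a=8, rss_b=82.0, k_b=12, n=40)
```

In the toy example, the model with 4 extra parameters is preferred only when its residual error is substantially lower; a marginal reduction in error does not outweigh the parameter penalty.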
Figure 8 compares the performance of the two models on a cell-by-cell basis. For 25 cells (46%), the competitive interaction model was a better description than the power-law model. The power-law model was a more parsimonious description for only 15 cells. The AIC analysis was inconclusive for the remaining 14 cells. This confirms that for many cells, the competition model is truly a better way to describe the data.
Figure 9, C and D illustrate the competition model for the example cell in Fig. 6. The most interesting comparison here is between composites Pa and Ap. This cell's responses to the components P and p were very similar and so were the responses to a and A. In Fig. 9C this is shown by the location of the blue asterisks for these components (actual component responses) as well as the dashed lines (model component responses). As pointed out earlier, any summation model that uses only the component responses to predict the composite responses would predict almost equal responses to Pa and Ap. The cell's actual responses, however, were very different and the model captured this. The ratio of the excitatory to inhibitory weights was approximately the same for the A and a components, but because the absolute values of the A component weights were much larger, they had a much stronger suppressive influence in a composite stimulus.
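The effect of weight magnitude described above can be illustrated with a small numerical sketch. It assumes a divisive form in the spirit of Reynolds et al. (1999), R = Σ Wj+·Cj / (1 + Σ Wj−·Cj), with Cj = 1 for components present in the stimulus and 0 otherwise. This form, the function name, and all weight values are assumptions chosen for illustration; they are not the fitted model or the fitted weights of the example cell.

```python
def competition_response(stimulus, w_exc, w_inh, sigma=1.0):
    """Assumed divisive competition: summed excitation / (sigma + summed inhibition).

    stimulus: string of component labels that are present, e.g. "Pa".
    """
    excitation = sum(w_exc[c] for c in stimulus)
    inhibition = sum(w_inh[c] for c in stimulus)
    return excitation / (sigma + inhibition)

# Hypothetical weights: A and a have similar excitatory/inhibitory ratios,
# but the A weights are roughly ten times larger in absolute value.
w_exc = {"P": 30.0, "p": 30.0, "A": 200.0, "a": 20.0}
w_inh = {"P": 0.5, "p": 0.5, "A": 49.0, "a": 4.0}

single = {c: competition_response(c, w_exc, w_inh) for c in "PpAa"}
resp_Pa = competition_response("Pa", w_exc, w_inh)
resp_Ap = competition_response("Ap", w_exc, w_inh)
```

With these hypothetical weights, P and p evoke equal responses when presented alone, as do A and a, yet the composite containing A is suppressed far more than the composite containing a, because the large A weights dominate both the numerator and the denominator.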
| DISCUSSION |
|---|
|
|
|---|
Here we will first discuss the neural correlates of reverse-phi, and then what the neural circuitry outside MT may contribute to the pattern of responses we found. Last, we will discuss what the phenomenon of reverse-phi tells us about the mechanisms underlying motion perception.
Neural correlates of reverse-phi
Previous reports have demonstrated that directional neural responses reverse with contrast reversals in cat V1 (Emerson et al. 1987), the nucleus of the optic tract of the wallaby (Ibbotson and Clifford 2001), macaque V1 (Livingstone and Conway 2003), and macaque MT (Livingstone et al. 2001). Given the nature of the animal subjects or the stimuli used in these studies, however, a direct link between the neural responses and the perceptual phenomenon of reverse-phi could not be established. The studies of Emerson et al. and Livingstone et al. used one-dimensional random white-noise stimuli in combination with a reverse-correlation technique to determine the underlying spatiotemporal structure ("kernels") of the receptive fields. They showed that these kernels are oriented in space-time, which is consistent with the motion-energy model, and that these kernels are inverted when the contrast sign is inverted, consistent with the reverse-phi phenomenon. The white-noise reverse-correlation technique provides a detailed look at the structure of the spatiotemporal receptive field but, because white-noise stimuli do not lead to unequivocal phi or reverse-phi motion percepts, the link with the perceptual phenomenon remains tentative. The stimuli in our study do lead to a clear percept of motion, which allowed us to record both behavioral and neural responses and establish a direct link between them.
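Why a contrast reversal inverts the output of a correlation-type motion detector can be illustrated with a toy computation. The sketch below uses a simple opponent correlator on a one-dimensional random pattern as a simplified stand-in; it is not the motion-energy model itself and not the reverse-correlation analysis of the cited studies. The function name, the pattern size, and the number of frames are arbitrary assumptions.

```python
import numpy as np

def opponent_correlator(frames):
    """Mean rightward-minus-leftward correlation between successive frames.

    frames: array of shape (time, space) holding signed contrast values.
    """
    now, nxt = frames[:-1], frames[1:]
    # Pixel i now times pixel i+1 in the next frame (rightward match).
    rightward = now * np.roll(nxt, -1, axis=1)
    # Pixel i+1 now times pixel i in the next frame (leftward match).
    leftward = np.roll(now, -1, axis=1) * nxt
    return float(np.mean(rightward - leftward))

rng = np.random.default_rng(0)
pattern = rng.choice([-1.0, 1.0], size=256)

# Phi: the pattern steps one pixel to the right on every frame.
phi = np.stack([np.roll(pattern, t) for t in range(8)])
# Reverse-phi: same displacement, but contrast polarity flips on every frame.
signs = (-1.0) ** np.arange(8)
reverse_phi = phi * signs[:, None]

out_phi = opponent_correlator(phi)
out_reverse_phi = opponent_correlator(reverse_phi)
```

Because every product of successive frames acquires a factor of -1 when the polarity alternates, the detector output for the reverse-phi sequence is the exact negative of its output for the phi sequence, signaling motion in the opposite direction.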
How is it implemented?
By themselves our data tell us something about only the end result of all processing that leads to the response of an MT cell. This surely includes considerable processing in the retina and the LGN, motion processing in V1 and within MT, and could in principle include signals from many other areas. Because we recorded only the stimulus and the MT response, the weights in the models describe the weights between the stimulus space and the MT cell, not a synaptic weight connecting a V1 cell to an MT cell. In other words, the models describe the computation that takes place in the whole network, not its implementation. The extent to which the competitive interactions arise from earlier areas or de novo in MT can be investigated in future work by measuring competitive interactions in V1 as well as by recording simultaneously in MT and V1.
One well-known model of the circuitry needed to achieve directional selectivity makes use of shunting inhibition to veto (Barlow and Levick 1965) the dendritic propagation of depolarization in the antipreferred direction (Koch and Poggio 1985). Recently Mo and Koch (2003) extended this model to include reverse-phi motion selectivity. This model combines same-contrast shunting inhibition with an opposite-contrast shunting inhibition of a stimulus moving in the preferred direction. This antisymmetric coupling of ON and OFF pathways results in directional responses that reverse with contrast reversals. Mo and Koch did not report how their model cells respond when more complex stimuli impinge on the dendritic tree. Intuitively, however, it seems possible that a complex spatiotemporal distribution of luminance increments and decrements could result in an equally complex pattern of facilitation and shunting inhibition. Part of the complex interactions among Fourier components that we observed in MT may therefore already arise at an earlier level that implements directional selectivity with the veto mechanism.
A competitive motion energy model
The first stage of the classic motion-energy model consists of space-time-inseparable filters that estimate motion energy in a particular direction (Adelson and Bergen 1985). Such filters have been demonstrated psychophysically in humans (Burr et al. 1986) and the