Binocular rivalry is the alternating perception that occurs when incompatible stimuli are presented to the two eyes: one monocular stimulus dominates vision and then the other stimulus dominates, with a perceptual switch occurring every few seconds. There is a need for a binocular rivalry model that accounts for both well-established results on the timing of dominance intervals and for more recent evidence on the distributed neural processing of rivalry. The model for binocular rivalry developed here consists of four parallel visual channels, two driven by the left eye and two by the right. Each channel consists of several consecutive processing stages representing successively higher cortical levels, with mutual inhibition between the channels at each stage. All stages are architecturally identical. With n the number of stages, the model is implemented as 4n nonlinear differential equations using a total of eight parameters. Despite the simplicity of its architecture, the model accounts for a variety of experimental observations: 1) the increasing depth of rivalry at higher cortical areas, as shown in electrophysiological, imaging, and psychophysical experiments; 2) the unimodal probability density of dominance durations, where the mode is less than the mean; 3) the lack of correlation between successive dominance durations; 4) the effect of interocular stimulus differences on dominance duration; and 5) eye suppression, as opposed to feature suppression. The model is potentially applicable to issues of visual processing more general than binocular rivalry.
When incompatible stimuli (such as orthogonal gratings) are presented to the two eyes, they are not fused into a single image. Instead, the monocular stimuli take it in turns to dominate perception. This phenomenon, binocular rivalry, provides a valuable means of studying the perceptual process because it involves a changing percept without any change in the visual stimulus. There has been a surge of interest in binocular rivalry over the last ten years, largely because physiological and imaging experiments have shown that rivalry is a process distributed across a hierarchy of visual cortical areas (Alais and Blake 2004). Binocular rivalry modulates neural activity in the primary visual cortex (Polonsky et al. 2000; Tong and Engel 2001) and in areas in higher cortex including V2 and V4 (Leopold and Logothetis 1996), MT (Logothetis and Schall 1989), inferior temporal cortex and superior temporal sulcus (Sheinberg and Logothetis 1997), and other high-order areas (Tong et al. 1998).
Modeling and predictive studies, however, have not kept pace with the empirical work. Early models consisted of two channels, one for each eye, and a single processing stage (Lehky 1988; Sugie 1982). These models and more recent ones with similar architecture (Laing and Chow 2002) were able to reproduce the stochastic alternation between one percept and the other (Fox and Herrmann 1967). Lumer (1998) also used two channels in his model, but expanded the number of stages to four. Rivalry in this model produced larger neural modulation at the later stages than in the earlier stages, replicating some of the results from experiments on behaving monkeys (Sheinberg and Logothetis 1997). More recently, Wilson (2003) described a four-channel, two-stage model. Two of the channels were driven by the left eye, two by the right, and for each eye, one channel was selective for one stimulus feature and the second channel was selective for an incompatible feature. This model was able to account for the observation that the perceptual alternation typical of binocular rivalry can occur when incompatible stimuli are swapped between the eyes several times a second (Logothetis et al. 1996).
The model described here goes a step further. It consists of four channels and multiple stages. Because all stages are architecturally identical, the number of stages can be set to match the empirical data. The model has three aims:
1) to account for well-established findings on the duration and independence of dominance intervals in rivalry, and on the effect of differing stimulus strength in the two eyes;
2) to contribute to the ongoing debate about the nature of binocular rivalry suppression: low-level and monocular (eye suppression) or high-level (feature, or stimulus, suppression);
3) most importantly, to account for the growing electrophysiological, imaging, and psychophysical evidence that rivalry is a process distributed across a number of visual areas.
Two principles have been used in designing the model. First, it is designed to be as simple as possible, and to produce explanatory and predictive power rather than exact replication of published data. The result is a model with eight parameters, each of which has an easily understood role in the rivalry process. Second, the model is designed to be modular, in that it is built by putting together a sequence of modules, each of which can be separately analyzed. The model is a quantitative version of previous qualitative models (Freeman and Morley 1997; Nguyen et al. 2001).
The full model, consisting of four channels, is described below. To simplify explanation, we start with the two-channel model shown in Fig. 1. One channel, denoted A, is selective for a particular stimulus property (e.g., horizontal contours) and is driven by stimuli presented to the left eye. The other channel, B, is selective for an incompatible property (e.g., vertical contours) and is driven by stimuli to the right eye. Stage 1 represents monocular cells in layer 4C of primary visual cortex, but the locations of the other stages are left undefined. Stage 2, for example, could represent simple cells in primary visual cortex. The general stage, considered below, has subscript k, which is an integer less than or equal to the total number of stages n.
Early investigations of binocular rivalry led to a search for a model that could explain the empirical findings. A constant in this search was mutual inhibition. It was assumed that incompatible stimuli to the two eyes activated two neural populations that inhibited each other (Lehky 1988; Sugie 1982). Despite the assumption of mutual inhibition in most of the existing models for binocular rivalry, there is little or no direct evidence for such inhibition. The circumstantial evidence, however, is compelling: when one eye's stimulus is visible, the other's is not. For this reason alone, mutual inhibition will be assumed in the model developed here.
Ak, representing a single cell on channel A at stage k, is illustrated in Fig. 1B. The cell has three synaptic inputs that it weights (positive for excitatory input, negative for inhibition), sums, and integrates over time to form a postsynaptic potential pk. The postsynaptic potential must exceed a threshold to produce the action potential rate ak. Both postsynaptic potential and action potential rate are functions of time, but this dependency on time will not be shown explicitly in what follows. Now assume a second cell, Bk, on channel B at the same stage, with action potential rate bk. Cells Ak and Bk are assumed to be mutually inhibitory, as illustrated in Fig. 1A for k = 1, 2. The rate of change of postsynaptic potential pk with time, dpk/dt, is defined by
$$\tau \frac{dp_k}{dt} = w_{es}\,a_{k-1} - w_{is}\,a_k - w_{ib}\,b_k \qquad (1)$$
where τ is a (positive) time constant. There are four parameters (τ, wes, wis, and wib) in this equation, but one of them can be set arbitrarily without affecting the generality of the equation. The setting chosen is wis = 1. Equation 1 can be more easily interpreted from its integral form
$$p_k = \frac{1}{\tau} \int \left( w_{es}\,a_{k-1} - w_{is}\,a_k - w_{ib}\,b_k \right) dt \qquad (2)$$
In this equation, ak−1 is an excitatory synaptic input to cell Ak from the previous stage on the same channel, with weight wes; bk is an inhibitory input from the competing cell at the same stage, with weight wib; and ak is a self-inhibitory input that sets limits on changes in the postsynaptic potential. The weight subscripts e, i, s, and b stand for excitatory, inhibitory, same or self, and both, respectively. (The use of both will be explained for the four-channel model.) Equation 2 tells us, therefore, that cell Ak sums three weighted inputs and integrates the sum over time to produce its postsynaptic potential.
The postsynaptic potential pk must reach a threshold level pt, to produce action potentials. The conversion of postsynaptic potential to action potential rate is defined by the following equation (3) Thus, the action potential rate is zero for low postsynaptic potentials and approaches a linear function of postsynaptic potential (pk − pt) at high postsynaptic potentials. Dividing both sides of Eq. 3 by pt shows that this parameter acts only as a normalizing factor for the action potential rate. Thus, pt is set equal to 1, and action potential rates are given as multiples of the threshold.
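The equations for the postsynaptic potential qk and action potential rate bk in cell Bk are of the same form as those for cell Ak. Equation 4 is Eq. 1 with the roles of the two channels exchanged
$$\tau \frac{dq_k}{dt} = w_{es}\,b_{k-1} - w_{is}\,b_k - w_{ib}\,a_k \qquad (4)$$
and Eq. 5 is Eq. 3 with qk and bk in place of pk and ak. Note that the same parameters have been used in the equations for Ak and Bk: the two cells are assumed to be equivalent apart from the stimulus properties for which they are selective.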
The model, as defined above, has three weights: wes, wis, and wib. One weight can be expressed in terms of the other two as follows. Assume that the average output of cells Ak and Bk is greater than their average input. All stages in Fig. 1 are architecturally identical, so that a growth of output across a single stage will lead to an average output at the final stage that is much larger than the average input to the first stage. This is physiologically untenable: we do not expect action potential rates to differ drastically from visual area to area. By the same argument, we do not expect the average output of a stage to be much less than its average input.
The conclusion, therefore, is that the average output of a stage is similar to its average input. This conclusion applies to both time-varying activity and steady state. At steady state, the average inputs and outputs are obtained by setting the derivatives in Eqs. 1 and 4 to zero, and adding the resulting equations
$$w_{es}\,(a_{k-1} + b_{k-1}) = (w_{is} + w_{ib})\,(a_k + b_k) \qquad (6)$$
The simplest assumption is that average output equals average input. The following setting is therefore made
$$w_{es} = w_{is} + w_{ib} \qquad (7)$$
thereby removing wes as an independent parameter. This equation can be interpreted to mean that the loss of activity attributed to mutual inhibition within a stage is compensated for by the summation of activity arising from receptive field convergence from one stage to the next.
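The balance between excitation and inhibition within a stage can be checked numerically. The Python sketch below assumes that the stage equations have the form described in the text: excitation from the previous stage, minus self-inhibition and mutual inhibition, with wes equal to wis + wib. The function name is hypothetical, and the default values (wib = 0.25, τ = 80 ms) are the ones quoted elsewhere in the article.

```python
def stage_derivatives(a_prev, b_prev, a_k, b_k, w_ib=0.25, w_is=1.0, tau=0.08):
    """Return (dp_k/dt, dq_k/dt) for one stage of the two-channel model.

    Assumed form: tau * dp/dt = w_es*a_prev - w_is*a_k - w_ib*b_k,
    and symmetrically for channel B.
    """
    w_es = w_is + w_ib  # excitatory weight fixed so that mean output = mean input
    dp = (w_es * a_prev - w_is * a_k - w_ib * b_k) / tau
    dq = (w_es * b_prev - w_is * b_k - w_ib * a_k) / tau
    return dp, dq


# Equal inputs and equal outputs of the same size: both derivatives vanish,
# so activity neither grows nor shrinks across the stage.
print(stage_derivatives(1.0, 1.0, 1.0, 1.0))  # (0.0, 0.0)
```

With unequal inputs, for example 1.2 and 0.8, the derivatives vanish at outputs 4/3 and 2/3: the mean is preserved while the difference grows.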
The model described thus far consists of two channels. Such a model cannot address some of the important results in the binocular rivalry literature, such as stimulus rivalry (Logothetis et al. 1996). These results derive from experiments using two incompatible stimuli presented to the same eye at different times. If the model is to account for experiments such as these, it must have two channels for each eye, with one channel selective for each of the incompatible stimuli.
The four-channel model, illustrated in Fig. 2, uses the two-channel model as foundation. The two channels driven by the left eye are 1 and 3, where channel 1 is selective for one stimulus feature (e.g., horizontal contours) and channel 3 is selective for an incompatible feature (e.g., vertical contours). The other two channels are driven by the right eye. The connections for the cell on channel 1 at stage 2, A12, are also shown in the figure. The cell receives excitatory input from the previous stage with weight wes, where e stands for excitatory and s for self. It also receives excitatory input from the channel with the same stimulus selectivity but driven by the other eye; this connection represents binocular summation. The synaptic weight in this case is weo, where o stands for interocular. The inhibitory inputs to a cell are all within the same stage, consistent with the evidence that inhibitory connections do not extend between visual areas (Bullier 2004). The weights of these inputs indicate their sources: wif (f for interfeature inhibition), wib (b for both interfeature and interocular inhibition), wio, and wis.
The equations for a stage in this model are best given in matrix terms. With pjk and ajk representing the postsynaptic potential and action potential rate, respectively, for channel j at stage k (8) where and Action potential production from the postsynaptic potential is given by (9) As with the two-channel model, three of the parameters can be assigned fixed values. First (10) Second, the steady state condition is (11) Adding the four equations represented by this expression, and requiring that average action potential output from a stage equal its average input, yields (12)
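As an illustration of the matrix form, one arrangement consistent with the connectivity described for Fig. 2 is sketched below. The symbols W_e and W_i, the matrix layout, and the assumption that both excitatory inputs arrive from stage k − 1 are inferences made for this sketch, not the published equations. Channels are ordered 1 to 4, with channels 1 and 3 driven by the left eye, channels 1 and 2 sharing one feature, and channels 3 and 4 sharing the other.

```latex
\tau \frac{d\mathbf{p}_k}{dt} = W_e\,\mathbf{a}_{k-1} - W_i\,\mathbf{a}_k ,
\qquad
\mathbf{p}_k = (p_{1k}, p_{2k}, p_{3k}, p_{4k})^{\mathsf{T}}, \quad
\mathbf{a}_k = (a_{1k}, a_{2k}, a_{3k}, a_{4k})^{\mathsf{T}}

W_e =
\begin{pmatrix}
w_{es} & w_{eo} & 0 & 0 \\
w_{eo} & w_{es} & 0 & 0 \\
0 & 0 & w_{es} & w_{eo} \\
0 & 0 & w_{eo} & w_{es}
\end{pmatrix},
\qquad
W_i =
\begin{pmatrix}
w_{is} & w_{io} & w_{if} & w_{ib} \\
w_{io} & w_{is} & w_{ib} & w_{if} \\
w_{if} & w_{ib} & w_{is} & w_{io} \\
w_{ib} & w_{if} & w_{io} & w_{is}
\end{pmatrix}

% Steady state: W_e a_{k-1} = W_i a_k. Every column of W_e sums to
% w_{es} + w_{eo} and every column of W_i sums to
% w_{is} + w_{io} + w_{if} + w_{ib}, so adding the four rows and equating
% mean output with mean input gives
w_{es} + w_{eo} = w_{is} + w_{io} + w_{if} + w_{ib}
```

Under this layout, setting weo = wio = wif = 0 leaves channels 1 and 4 coupled only to each other through wib, and likewise channels 2 and 3, which matches the two-channel reduction described in the text.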
The input to each channel is a sum of stimulus and noise components, s and n, respectively
$$a_{j,\mathrm{in}} = s_j + n_j \qquad (13)$$
The stimulus is a fixed value between 0 and 1, indicating the strength (e.g., contrast) of the stimulus for which that channel is selective. The noise is a Gaussian white noise process with zero mean. The signal aj,in is low-pass filtered with time constant τ0 to simulate the effect of precortical processing (14) The resulting inputs to the first stage aj0 (SD of σ) are independent of each other.
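A possible way to generate such an input is sketched below in Python. It assumes a first-order low-pass filter, discretized exactly as an AR(1) process so that the filtered noise has a stationary SD of σ. The filter order, the function names, and the value of σ used in the example are assumptions made for illustration; τ0 = 40 ms and the 5-ms step are the values quoted in the article.

```python
import math
import random


def filtered_noise(n_steps, sigma, tau0=0.040, dt=0.005, seed=None):
    """Low-pass filtered Gaussian noise with stationary SD `sigma`.

    Assumption: first-order filter, sampled exactly as an AR(1) process.
    """
    rng = random.Random(seed)
    rho = math.exp(-dt / tau0)                  # correlation between successive samples
    drive = sigma * math.sqrt(1.0 - rho * rho)  # keeps the stationary SD at sigma
    x = rng.gauss(0.0, sigma)                   # start in the stationary state
    samples = []
    for _ in range(n_steps):
        samples.append(x)
        x = rho * x + drive * rng.gauss(0.0, 1.0)
    return samples


def channel_input(stimulus, n_steps, sigma, **kwargs):
    """First-stage input: constant stimulus strength plus filtered noise.

    A constant passes unchanged through a unit-gain low-pass filter,
    so filtering (stimulus + noise) equals stimulus + filtered noise.
    """
    return [stimulus + x for x in filtered_noise(n_steps, sigma, **kwargs)]


# Example: 10 s of input for a channel with stimulus strength 1
# (sigma = 0.1 is a placeholder, not the published value).
a_10 = channel_input(1.0, n_steps=2000, sigma=0.1, seed=0)
```

Independent inputs for the different channels are obtained by giving each call its own seed.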
The model is compared with psychophysical data involving two types of decision process. The model's output differs between these two cases.
In studies of the dynamics of rivalry (Fig. 5), and of the effect of unequal stimulus strength in the two eyes (Fig. 6), experimental subjects indicate the intervals during which a given stimulus is dominant. In the model, the dominant stimulus at any given time is defined to be that driving the channel whose final stage has the highest output action potential rate.
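This dominance rule can be sketched as a short Python routine. The function name and the data layout (one tuple of final-stage action potential rates per time step) are hypothetical choices for illustration.

```python
def dominance_intervals(final_stage_rates, dt):
    """Split a simulation into dominance intervals.

    final_stage_rates: sequence of tuples, one per time step, holding the
        final-stage action potential rate of every channel.
    dt: time step in seconds.
    Returns a list of (dominant_channel_index, duration_in_seconds).
    """
    intervals = []
    current, length = None, 0
    for rates in final_stage_rates:
        # The dominant channel is the one with the highest final-stage rate
        # (ties go to the lowest channel index).
        winner = max(range(len(rates)), key=rates.__getitem__)
        if winner == current:
            length += 1
        else:
            if current is not None:
                intervals.append((current, length * dt))
            current, length = winner, 1
    if current is not None:
        intervals.append((current, length * dt))
    return intervals


rates = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
print(dominance_intervals(rates, 0.5))  # [(0, 1.0), (1, 1.5), (0, 0.5)]
```

The first and last intervals are cut short by the start and end of the run, so an analysis of durations may prefer to drop them.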
The remaining experimental studies (Figs. 4 and 7) measure discrimination sensitivities for monocular test stimuli delivered during either a dominance or suppression phase of the tested eye. For the model, it is assumed that the discrimination process is based on activity at a specific stage, the discrimination stage, which is not necessarily the final one. Sensitivity was determined as follows. First, the dominance intervals for a channel were found and the maximum action potential rate in that channel's discrimination stage was determined for each interval. Next, if these maxima occurred more frequently than 1 per 7 s (the average rate at which subjects triggered test stimuli in the experiments), the smallest maxima in excess of this rate were discarded. The remaining maxima were averaged. Finally, this action potential rate was converted to a sensitivity by delivering small increments to the channel's input and finding the resulting change in action potential rate at the discrimination stage; sensitivity is output increment divided by input increment. The procedure for suppression intervals was the same, except that minimum action potential rate was found in each interval and enough of the largest minima were discarded to ensure that the remaining ones occurred at 1 per 7 s.
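The selection and averaging step of this procedure can be sketched as follows. The function name and argument layout are hypothetical. The per-interval maxima (or minima) are assumed to have been extracted already, and the final conversion to sensitivity by input increments is not shown.

```python
def mean_retained_extreme(extremes, run_time, min_interval=7.0, keep_largest=True):
    """Average the per-interval extremes after limiting their rate.

    extremes: per-interval maxima (dominance) or minima (suppression) of the
        action potential rate at the discrimination stage.
    run_time: simulated time in seconds.
    min_interval: at most one extreme per this many seconds is retained.
    keep_largest: True discards the smallest maxima (dominance);
        False discards the largest minima (suppression).
    """
    allowed = int(run_time // min_interval)  # number permitted at 1 per 7 s
    ranked = sorted(extremes, reverse=keep_largest)
    kept = ranked[:allowed] if len(ranked) > allowed else ranked
    if not kept:
        raise ValueError("no extremes retained; run the model for longer")
    return sum(kept) / len(kept)


# Eight dominance maxima in a 28-s run: only the four largest are retained.
print(mean_retained_extreme([3, 1, 4, 1, 5, 9, 2, 6], run_time=28.0))  # 6.0
```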
Both the two- and four-channel models are defined by Eqs. 8, 9, 10, 12, 13, and 14. The parameter values for the four-channel model (15) are justified in the discussion. The two-channel model has the same parameters, except that weo = wio = wif = 0. Channels 1 and 4 are then independent of channels 2 and 3; either pair can be used to generate the results. The stimuli used to generate Figs. 3, 4, and 5 were s1 = s4 = 1.
The equations were numerically integrated with a fourth-order Runge–Kutta algorithm, using the Matlab (The MathWorks) programming language. The accuracy of the implementation was determined by independently coding the same equations as a Simulink flow diagram (Simulink is part of the Matlab suite) and verifying that the two implementations produced identical time courses. Simulations used a time step of 5 ms.
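For readers unfamiliar with the method, a generic fourth-order Runge–Kutta step is sketched below in Python rather than Matlab. This is the textbook scheme, not the author's code. The function name and the list-based state are illustrative choices, and how the noise input was sampled within a step is not specified in the text.

```python
import math


def rk4_step(f, t, y, h):
    """Advance the state list y by one step h for the system dy/dt = f(t, y)."""
    def shifted(scale, k):
        # State displaced along slope k by scale * h.
        return [yi + scale * h * ki for yi, ki in zip(y, k)]

    k1 = f(t, y)
    k2 = f(t + h / 2, shifted(0.5, k1))
    k3 = f(t + h / 2, shifted(0.5, k2))
    k4 = f(t + h, shifted(1.0, k3))
    return [yi + h / 6 * (a + 2 * b + 2 * c + d)
            for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]


# Usage: exponential decay dy/dt = -y over 1 s with the 5-ms step used in the simulations.
t, y, h = 0.0, [1.0], 0.005
for _ in range(200):
    y = rk4_step(lambda t, y: [-y[0]], t, y, h)
    t += h
print(abs(y[0] - math.exp(-1.0)) < 1e-9)  # True
```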
Some of the results described here require only two channels for their explanation. We therefore start with the two-channel model.
Figure 3 shows the model's time course computed over 10 s. Action potential rate in the A channel is shown in black and that in the B channel in gray. Two important characteristics of the model's behavior can be seen in the figure. First, there are time intervals where the A channel's output is higher than that of B, indicating that the A channel's stimulus dominates perception. These intervals apparently vary randomly in duration, reminiscent of the rivalry process. Second, the difference between the action potential rate in channel A and that in channel B is small for the early stages and increases as activity progresses to later stages. These two aspects of the model will now be examined quantitatively.
AMPLIFICATION OF NEURAL MODULATION.
Several lines of evidence indicate that binocular rivalry modulates neural signals at later stages of the visual pathway more than it modulates earlier signals. The evidence comes from studies of the correlation between single-neuron recordings and perceptual reports in monkeys (Sheinberg and Logothetis 1997), functional magnetic resonance imaging in humans (Polonsky et al. 2000; Tong et al. 1998), and psychophysical experiments (Nguyen et al. 2003).
The model also produces deeper modulation of activity in its later stages, as illustrated in Fig. 3. This can be shown analytically as follows. Mutual inhibition in the model decreases activity in one channel relative to the other. We are therefore interested in the difference in activity between the two channels at one stage relative to the activity difference at a previous stage. Assume steady state, by setting to zero the derivatives in Eqs. 1 and 4. Subtracting the second equation from the first, and using Eq. 7, yields
$$a_k - b_k = \frac{w_{is} + w_{ib}}{w_{is} - w_{ib}}\,(a_{k-1} - b_{k-1}) \qquad (16)$$
The ratio (wis + wib)/(wis − wib) takes a value >1 (because wis > wib) and therefore represents a gain between stage k − 1 and stage k. Successive application of Eq. 16 to each stage of the model gives
$$\frac{a_n - b_n}{a_0 - b_0} = \left( \frac{w_{is} + w_{ib}}{w_{is} - w_{ib}} \right)^{n} \qquad (17)$$
This equation provides the activity difference between the outputs of the two channels relative to the activity difference of their inputs. Given that wis = 1, wib = 0.25, and n = 6, activity modulation increases by a factor of about 21 from input to output. The reason for this amplification is clear. At any given time, one channel is dominant and the other suppressed. A cell on the dominant channel inhibits its suppressed counterpart, increasing the suppression. The suppressed cell, in turn, inhibits the dominant cell less, increasing the dominance.
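The factor of about 21 can be verified both in closed form and by integrating a noise-free version of the two-channel chain to steady state. The Python sketch below makes two assumptions that are not taken from the article: the action potential nonlinearity is replaced by a simple rectifier with threshold 1, and Euler integration is used for brevity. Function names are hypothetical. The steady-state gain does not depend on the shape of the nonlinearity as long as both channels remain above threshold.

```python
def modulation_gain(w_is=1.0, w_ib=0.25, n_stages=6):
    """Closed-form gain of the activity difference across n_stages stages."""
    return ((w_is + w_ib) / (w_is - w_ib)) ** n_stages


def steady_state_difference(a0, b0, w_is=1.0, w_ib=0.25, n_stages=6,
                            tau=0.08, dt=0.001, t_end=5.0):
    """Integrate the noise-free chain and return a_n - b_n at the end."""
    w_es = w_is + w_ib                      # mean output equals mean input
    p = [0.0] * (n_stages + 1)              # channel A potentials (index 0 unused)
    q = [0.0] * (n_stages + 1)              # channel B potentials (index 0 unused)

    def rate(v):
        return max(0.0, v - 1.0)            # ASSUMED rectifier, threshold = 1

    for _ in range(int(round(t_end / dt))):
        a = [a0] + [rate(v) for v in p[1:]]
        b = [b0] + [rate(v) for v in q[1:]]
        for k in range(1, n_stages + 1):
            p[k] += dt / tau * (w_es * a[k - 1] - w_is * a[k] - w_ib * b[k])
            q[k] += dt / tau * (w_es * b[k - 1] - w_is * b[k] - w_ib * a[k])
    return rate(p[n_stages]) - rate(q[n_stages])


print(round(modulation_gain(), 1))  # 21.4
# Small input difference, so the suppressed channel stays above threshold.
ratio = steady_state_difference(1.01, 0.99) / (1.01 - 0.99)
print(abs(ratio - modulation_gain()) < 1e-4)  # True
```

A larger input difference would drive the suppressed channel to zero at the later stages, where the linear steady-state analysis no longer applies.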
AMPLIFICATION OF SUPPRESSION.
The amplification of binocular rivalry suppression shown in Fig. 3 can be compared with experimental data. The gray lines in Fig. 4 show psychophysical data from Nguyen et al. (2003), who induced binocular rivalry and presented a brief test stimulus to measure visual sensitivity during dominance and suppression periods. The vertical axis in Fig. 4 shows the sensitivity during suppression divided by that during dominance, and the gap between the data and the dashed line therefore shows suppression depth. The test stimulus consisted of two lobed semicircles, and the subject's task was to discriminate between them. These two stimulus components were made progressively more alike to require more complex form discriminations and to thereby tap into decision processes at neural locations further along the visual pathway: task complexity increases from left to right along the horizontal axis. Suppression deepens with task complexity.
Predictions from the model were obtained by calculating its time course over 100 s. Channel A's sensitivity was calculated for both its dominant and suppressed states, as described in methods. The black line in Fig. 4 shows the sensitivity of the suppressed channel divided by that in the dominant channel, as a function of stage number. As with the experimental observations, suppression deepens from left to right. We do not know to what visual area each model stage corresponds, nor can we nominate the neural sites underlying the visual tasks of Nguyen et al. (2003). The best that can be done, therefore, is to shift the model data laterally so that it matches the psychophysical data. The model data were not adjusted vertically; despite this, the model fits the psychophysical data quite closely. Why is there a decline in sensitivity during suppression? In the model, sensitivity losses originate in the nonlinear relationship between postsynaptic potential and action potential rate: low action potential rates lie on the low-gradient portion of the nonlinearity.
DURATION OF DOMINANCE INTERVALS.
Many studies of binocular rivalry include measurement of its time intervals. Levelt (1967), for example, measured the durations of successive dominance intervals. He was the first to show that the probability density of durations was skewed: it had a single mode that was less than its mean and a substantial number of intervals with durations much longer than the mean. The gray line in Fig. 5A shows a published estimate of this probability density.
Can the model reproduce these findings? To answer this question, the model was run for 500 s and dominance interval durations for channel A were compiled into a probability density, shown by the black line in Fig. 5A. The model density has a shape similar to that of the empirical curves, but differs in that it has an excess of very short and very long intervals. The mismatch for very short intervals is at least partly a result of the lack of the response latency found in human subjects. The very long intervals in the model data are more difficult to reconcile with the empirical data. They could be present because the model lacks an adaptation mechanism; adaptation would tend to produce a sensitivity loss in the dominant channel at long intervals, and a truncation of such intervals.
CORRELATION OF DOMINANCE INTERVALS.
There is a second well-established principle concerning the timing of binocular rivalry: successive dominance intervals have uncorrelated durations (Fox and Herrmann 1967). Simulation shows that this is also true of the model, as seen in Fig. 5B. The black line in this figure was obtained from the model by calculating the time course over 500 s and finding the durations of dominance intervals for both channels. Correlations were calculated between each duration and itself (separation = 0), the following interval (separation = 1), and intervals at separations up to 10. The only substantial correlation was the (trivial) one at zero separation. The model therefore accords with the experimental data (gray line) in this respect.
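The correlation analysis can be sketched in Python as a Pearson correlation between the sequence of durations and a copy of itself shifted by the chosen separation. The function name is hypothetical, and the article does not state which correlation estimator was used.

```python
import math


def lag_correlation(durations, separation):
    """Pearson correlation between durations and those `separation` intervals later."""
    n = len(durations) - separation
    if n < 2:
        raise ValueError("need at least two pairs of durations")
    x = durations[:n]
    y = durations[separation:separation + n]
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either series is constant


# A strictly alternating sequence is perfectly anticorrelated at separation 1,
# unlike the uncorrelated durations reported for rivalry.
durations = [1.0, 2.0] * 4
print(round(lag_correlation(durations, 0), 6))  # 1.0
print(round(lag_correlation(durations, 1), 6))  # -1.0
```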
Levelt (1966) proposed a further set of principles for the timing of binocular rivalry alternations, that is, that an increase in the strength of the inducing stimulus to one eye:
1) does not affect the mean dominance time for that stimulus;
2) reduces the mean dominance time for the other stimulus.
By “strength” Levelt meant stimulus parameters such as contrast and the sharpness of contours. Subsequent research has shown that although the second statement is correct, the first needs to be softened. Figure 6A shows empirical data (Leopold and Logothetis 1996): as stimulus contrast increases for one eye, dominance duration increases slightly for that stimulus but decreases relatively rapidly for the other stimulus.
Figure 6B shows that the model reproduces this behavior, with one caveat: the stimulus range producing the required changes in dominance times is considerably smaller than the range of contrasts for the experimental data. This difference can be at least partly explained by the steep contrast-response functions measured in lateral geniculate nucleus and cortical cells (Sclar et al. 1990). The bottom horizontal axis in Fig. 6A shows lateral geniculate responses corresponding to the contrasts on the upper axis. These values were calculated from the contrast-response functions of an equal mixture of parvocellular and magnocellular cells in Sclar et al.'s sample. Although the new horizontal axis does not completely match the horizontal axis in Fig. 6B, it shows that contrast normalization can account for at least some of the difference between the two graphs. Another possibility is the lack of an adaptation process in the model: adaptation would tend to reduce the effect of a strong stimulus relative to a weak one.
The last property of rivalry to be considered is eye suppression. The experiments in this case use binocularly incompatible stimuli presented to the same eye at different times. To compare model to experiment we need to switch to the four-channel model.
The earliest models of binocular rivalry assumed that rivalry arose from mutual inhibition between primary visual cortical cells driven by the left eye and those driven by the right eye. This led to the idea of eye suppression, that is, that when one eye's stimulus is suppressed any stimulus to that eye will be suppressed, regardless of stimulus features. The alternative hypothesis is feature (or stimulus) suppression, which states that it is a stimulus feature (e.g., horizontal contours) that is suppressed, regardless of the eye to which the feature is presented (Logothetis et al. 1996). There is psychophysical evidence for eye suppression (Blake et al. 1980; Nguyen et al. 2001). Nguyen et al. induced rivalry with orthogonal gratings and then superimposed a test stimulus on one of the gratings to measure sensitivity. The orientation of the test varied: it matched that of the conditioning stimulus on which it was superimposed, that of the other stimulus, or took one of several values in between. Suppression depth was calculated by dividing test sensitivity when the tested eye's conditioning stimulus was suppressed, by that during dominance. Sensitivity during suppression averaged 62% of that during dominance, a value comparable with previous estimates (Blake and Camisa 1978; Makous and Sanders 1978). More important, suppression depth varied little with orientation, as shown by the gray lines in Fig. 7B, and as expected of eye suppression. The results do not conform to the expectation of feature suppression. At the right end of the axis, when the tested eye's stimulus is suppressed and the test stimulus has the same features as the dominant eye's stimulus, feature suppression predicts that the data should lie above 1.
Like the experimental data, the model produces eye suppression. This is illustrated in the time course shown in Fig. 7A. To generate this time course, stimuli were applied to the left-eye channel selective for one orientation and the right eye channel selective for the orthogonal orientation: s1 = s4 = 1. Lesser inputs were applied to the other two channels: s2 = s3 = 0.8. It can be seen that the activities in the two left-eye channels (shown in black) tend to vary together. When activity in one left-eye channel is high, so is that in the other left-eye channel. Similarly, the two right-eye channels (shown in gray) also tend to vary together.
To compare the model with the experimental observations, the model was analyzed in the following steps. First, the model was run for 100 s with the stimuli set as above. The tested eye was assumed to be that driving channel 1. Second, the sensitivity of channel 1's stage 5 was found when that channel was dominant and also when it was suppressed; the procedure for calculating sensitivity is described in methods. Stage 5 was chosen because the experiment required the subject to decide at which of two locations the test stimulus was located: the neural site underlying this task presumably lies relatively early in the visual pathway. Third, the sensitivity during suppression was divided by that during dominance to provide the filled circle at the left side of Fig. 7B. Fourth, at the right side of the figure, the tested eye is the same but the test orientation is orthogonal. The analysis was therefore repeated, except that the sensitivities found were for channel 3. The resulting model data match well with the empirical data, confirming that the model produces eye suppression. Conversely, feature suppression requires that the data trend upward from left to right. Given that the model data slope in the opposite direction, if anything, the model offers no support for the feature suppression hypothesis.
The model developed here has the virtue of simplicity: it has eight independent parameters and yet can account for a variety of experimental observations. As stated in the introduction, the aim of this model-building exercise was explanatory power rather than exact reproduction of experimental data. Accordingly, the model parameters were set so that the model matched major features of the experimental data; an error minimization procedure was not used. The parameters were set as follows.
STAGE TIME CONSTANT, τ.
The mean duration of dominance intervals is proportional to τ, which therefore sets the overall timescale of the model. The time constant was set so that the mean dominance duration, 1.69 s, was close to an early empirical measurement, 1.63 s, of the same quantity (Fox and Herrmann 1967).
NUMBER OF STAGES, n; INHIBITORY WEIGHT, wib.
Both these parameters contribute to suppression depth, as shown by Eq. 17. With respect to suppression depth, therefore, an increase in one parameter can be compensated for by a decrease in the other. To match the gradual decline of suppression depth with task complexity in psychophysical experiments (Fig. 4), the number of stages had to be at least six. The number of stages was therefore set at six, and wib adjusted to obtain a close fit with the psychophysical data.
INPUT TIME CONSTANT, τ0.
Two recent models (Laing and Chow 2002; Wilson 2003) show that the stochastic alternations in rivalry can be produced by chaotic relationships between action potential rates in mutually inhibitory channels. The input noise in the present model may therefore represent chaotic relationships between spike trains rather than variability in individual trains. The time constant for the noise was therefore set at a value, 40 ms, over which differences in action potential rates are likely to be significant.
NOISE AMPLITUDE, σ.
This parameter plays a key role when stimulus-dependent inputs differ between channels. A channel with lower-strength input can dominate only when the sum of its stimulus-dependent and noise components exceeds that of the dominant channel. The noise amplitude was set at a level for which the model approximates the shape of the empirical curves in Fig. 6. It should also be noted that there is a trade-off between the two parameters, τ0 and σ. When the input time constant is small, the rapid fluctuations in the input are smoothed by the low-pass filtering of the stages, and the noise amplitude has to be increased to compensate.
WEIGHTS weo, wio, AND wif.
These weights are set equal to zero for the two-channel model and are therefore significant only in the four-channel model. Weights weo and wio have conflicting roles: weo mediates binocular excitation of a neuron and wio mediates interocular inhibition. Weight weo was set so that each neuron at the output stage of the model could be monocularly excited from both eyes (Burkhalter and Van Essen 1986). Eye suppression (Fig. 7) occurs only if the inhibition between channels driven by different eyes and selective for the same feature is greater than that between channels driven by the same eye and selective for different features. This required that wio take a value close to weo, and that wif be substantially smaller than both of the other weights.
It is of interest to compare the model parameter settings with corresponding values measured experimentally. This is possible for two of the parameters.
STAGE TIME CONSTANT, τ.
The time constant setting, τ = 80 ms, is considerably longer than the time constant for excitation in single cortical cells (Gutnick and Crill 1995). This may be attributable to the lack of a mechanism for spatial spread in the model. It will take time for inhibition to spread across a population of cells at any one stage of the model (Wilson et al. 2001). The time for spatial spread is presumably incorporated into the time constant.
NUMBER OF STAGES, n.
Sheinberg and Logothetis (1997) showed that the responses of single cells in inferior temporal (IT) cortex correlate closely with perceptual reports during binocular rivalry. Visual signals pass through V1, V2, and V4 on their way to IT (Felleman and Van Essen 1991). It could be, therefore, that when the model is applied to the ventral visual pathway, its stages include the cortical areas V1, V2, V4, and IT.
The model developed here builds on ideas used in a number of previous models. Like its predecessors, it has mutual inhibition between cells within a processing stage (Lehky 1988; Sugie 1982), noisy inputs to generate dominance of one population over another (Lumer 1998), and more than two channels (Wilson 2003). The major innovation in the current model is the use of multiple stages with identical architecture. Lumer (1998) used four stages with differing architecture to show that the activity difference between dominant and suppressed channels grows from stage to stage. The model developed here goes further by demonstrating that there is a sensitivity difference between dominant and suppressed channels, and that the increased sensitivity difference at higher stages is in quantitative agreement with psychophysical data and in qualitative agreement with electrophysiological data.
A notable difference between the present model and previous ones lies in the use of adaptation. Adaptation has been used previously to produce the switches from dominance to suppression, in two ways (Laing and Chow 2002; Wilson 2003). First, synaptic depression weakens the effect of inhibition on suppressed neurons, allowing them to become dominant. The second mechanism is action potential adaptation: dominant cells lose sensitivity because of their higher action potential rates, and lose dominance as the suppressed cells become more sensitive because of their low action potential rates. The present model differs in that it has no adaptation. It has a single nonlinearity (the action potential threshold), but that does not qualify as adaptive because it is instantaneous and produces no change in sensitivity over time. Switches between dominance and suppression are initiated at the model's input when the stochastic driving function for the dominant channel falls below that of the suppressed one. There is some evidence in the literature against a role for adaptation in binocular rivalry. Although adaptation should produce a lessening depth of suppression toward the end of a dominance interval, two studies have found depth to be constant across an interval (Fox and Check 1972; Norman et al. 2000). Nevertheless, given the ubiquity of adaptation in sensory systems, it would be surprising if it had no role at all in rivalry.
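The idea that switches can arise from input noise alone, with no adaptation, can be illustrated with a toy simulation. The sketch below is not the model's actual set of equations: it uses a single stage with two channels, Ornstein-Uhlenbeck input noise, half-wave rectification standing in for the action potential threshold, and parameter values (`tau0`, `sigma`, `drive`, `w`) chosen purely for illustration. The function name `simulate_switches` is hypothetical.

```python
import numpy as np

def simulate_switches(t_end=120.0, dt=0.002, tau=0.08, tau0=0.5,
                      sigma=0.3, drive=1.0, w=0.8, seed=0):
    """Toy two-channel, single-stage rivalry with no adaptation.

    Assumptions (not from the paper): Ornstein-Uhlenbeck input noise,
    rectification as the only nonlinearity, illustrative parameter values.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(t_end / dt)
    noise = np.zeros(2)   # slowly varying noise added to each channel's input
    u = np.zeros(2)       # potential-like state of each channel
    dominant = np.empty(n_steps, dtype=int)
    for i in range(n_steps):
        # Ornstein-Uhlenbeck step with stationary standard deviation sigma
        noise += (-noise * dt / tau0
                  + sigma * np.sqrt(2.0 * dt / tau0) * rng.standard_normal(2))
        r = np.maximum(u, 0.0)  # instantaneous threshold, so no adaptation
        # Leaky integration of input minus inhibition from the other channel
        u += dt / tau * (-u + drive + noise - w * r[::-1])
        dominant[i] = int(u[1] > u[0])
    # A switch is any change in which channel has the larger state
    switches = int(np.count_nonzero(np.diff(dominant)))
    return dominant, switches

if __name__ == "__main__":
    dominant, switches = simulate_switches()
    print("switches:", switches)
    print("fraction of time channel 1 leads:", dominant.mean())
```

Because nothing in this sketch changes sensitivity over time, every change of the leading channel is traceable to the noisy inputs. It makes no attempt to reproduce the empirical distribution of dominance durations.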
The site at which binocular rivalry is initiated constitutes another important difference between models. As in Wilson's (2003) model, the model developed here assumes that perceptual switches in binocular rivalry are instigated at the lowest stage of the model, corresponding to primary visual cortex. This assumption distinguishes these models from others that assume perceptual switches arise from top-down influences (Dayan 1998; Lumer et al. 1998) or from a brain stem oscillator (Miller et al. 2000). The existence of eye suppression and feature suppression provides important evidence in this debate about the initiation site of binocular rivalry. The present model concurs with the existence of eye rivalry (Fig. 7) because rivalry switches in the model are initiated through the mutual inhibition of monocularly driven cells. How does this match with the demonstration of feature suppression obtained by rapidly swapping stimuli between the two eyes (Logothetis et al. 1996)? Wilson (2003) used his model to show that the crucial factor here is the stimulus flicker that Logothetis et al. used in conjunction with eye-swapping. Flicker moves the site of rivalrous cellular activity from stage 1 of Wilson's model to stage 2, thereby reducing eye suppression and making the rivalry more like feature suppression.
Comparisons with physiology
The depth of binocular rivalry suppression is small in the early stages of the present model (Fig. 4). This corresponds well with the paucity of primary visual cortical cells whose activity correlates with behavioral reports of perception during rivalry (Leopold and Logothetis 1996). It does not match well, however, with two studies (Polonsky et al. 2000; Tong and Engel 2001) in which modulation of the magnetic resonance signal from primary visual cortex during rivalry was more than half as large as the modulation produced by physical alternation of stimuli. How is this discrepancy to be explained? Polonsky et al. discussed several possible reasons for the difference. There are two further possibilities that they did not raise. First, the imaging studies used a grating presented to one eye and a grating with incompatible orientation and color presented to the other eye, whereas Leopold and Logothetis used orthogonal gratings that did not differ in color. It could be, therefore, that the imaging studies yielded larger modulations because their stimuli evoked not just contour rivalry, but color rivalry as well. Second, Tong and Engel, who recorded the largest rivalry-driven modulation of all of the studies, sampled activity from an area of the cortex (the representation of the blind spot) in which all cells were driven by the same eye. It remains to be seen whether the large activity modulations they recorded can also be found in cortical tissue containing intermingled left-eye- and right-eye-driven cells.
These considerations aside, it still remains to reconcile the small suppression depths in the model's first and second stages with the imaging results. A common assumption about primary visual cortex is that it contains at least three levels of processing (monocular, simple, and complex cells) and that these levels are sequential (Hubel and Wiesel 1962). It could be, therefore, that three stages of the model reside in primary visual cortex. The last of these stages produces activity modulation comparable with that in the imaging studies.
If the model is to be tested in future electrophysiology experiments, what does it predict? The essential predictions are illustrated in Fig. 3: that at any given time during binocular rivalry, one population of cells has a high firing rate and another has a low rate, that this relationship reverses cyclically, and that the firing rate difference between populations is greater in higher than in lower visual cortex.
Generalizing the model
The last issue discussed is that of generalizing the model. As it stands, the model's behavior is determined by its subcortical input. A number of studies show, however, that binocular rivalry can also be influenced by top-down inputs such as attention. It has been shown that subjects can willfully change the switching rate in rivalry (Lack 1969) and that attention to one of the monocular stimuli producing rivalry slightly shortens the intervals for which the unattended stimulus is dominant (Meng and Tong 2004). Further, there is rivalry between stimuli, such as two figures with differing biological motion, that require high-level processing for their interpretation (Watson et al. 2004). These findings do not invalidate the feedforward design used in the present model, for several reasons: 1) rivalry can occur in the absence of willful intervention; 2) the ability to keep a specific rivalrous stimulus in view, at the expense of the other, is weak; and 3) rivalrous stimuli requiring high-level processing for their interpretation also produce low-level incompatibilities between the monocular stimuli. Nevertheless, a more general model requires the addition of a feedback pathway to account for the observed top-down effects.
A broader question concerns the extent to which the model can be applied to visual phenomena other than binocular rivalry. The model produces a single percept from multiple sensory inputs through a winner-take-all mechanism. When one sensory input has a brief, small advantage over other inputs, the activity resulting from that input builds from stage to stage while the activity arising from other inputs declines. Activity differences between channels translate into sensitivity differences through the nonlinear transformation from postsynaptic potential to action potential rate. Although the model has been applied here to binocular rivalry data, there is nothing in it that restricts it to this field. The building blocks of the model—multiple channels, multiple stages, cross-channel excitation and inhibition—could in principle be applied to other areas of study, such as form vision. It would be of considerable interest to see whether such a generalization is possible.
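The stage-to-stage build-up described above can be sketched with a toy cascade. This is a minimal illustration under assumptions that are not taken from the model's equations: two channels only, each stage relaxed to its steady state before feeding the next, rectification as the only nonlinearity, and an inhibitory weight `w = 0.5` chosen for convenience. The function names `stage_steady_state` and `cascade` are hypothetical.

```python
import numpy as np

def stage_steady_state(p, w=0.5, tau=0.08, dt=0.001, t_end=2.0):
    """Relax one two-channel stage to steady state by Euler integration.

    p holds the feedforward input to each channel; each channel is also
    inhibited by the rectified activity of the other (weight w, assumed < 1).
    """
    u = np.zeros(2)
    for _ in range(int(t_end / dt)):
        r = np.maximum(u, 0.0)                # threshold nonlinearity
        u += dt / tau * (-u + p - w * r[::-1])
    return np.maximum(u, 0.0)

def cascade(inputs, n_stages=4, w=0.5):
    """Return an array of rates: row 0 is the input, row k is stage k."""
    rates = [np.asarray(inputs, dtype=float)]
    for _ in range(n_stages):
        rates.append(stage_steady_state(rates[-1], w=w))
    return np.array(rates)

if __name__ == "__main__":
    rates = cascade([1.0, 0.9])               # small advantage for channel 0
    for k, (a, b) in enumerate(rates):
        print(f"stage {k}: a={a:.3f}  b={b:.3f}  difference={a - b:.3f}")
```

In this sketch, while both channels remain above threshold the steady-state difference between them grows by a factor of 1/(1 − w) per stage and their sum shrinks by 1/(1 + w), so a small input advantage is amplified until the weaker channel is silenced. With w = 0.5 the difference doubles at each of the first two stages.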
I thank I. Cathers, C. Clifford, and V. Nguyen for comments on an earlier version of this paper.
Copyright © 2005 by the American Physiological Society