The present study suggests that the neural computations used to integrate information from different senses are distinct from those used to integrate information from within the same sense. Using superior colliculus neurons as a model, it was found that multisensory integration of cross-modal stimulus combinations yielded responses that were significantly greater than those evoked by the best component stimulus. In contrast, unisensory integration of within-modal stimulus pairs yielded responses that were similar to or less than those evoked by the best component stimulus. This difference is exemplified by the disproportionate representation of superadditive responses during multisensory integration and the predominance of subadditive responses during unisensory integration. These observations suggest that different rules have evolved for integrating sensory information: one (unisensory) reflecting the inherent characteristics of the individual sense and the other (multisensory) reflecting unique supramodal characteristics designed to enhance the salience of the initiating event.
Understanding how the brain synthesizes information from different senses has been the impetus for many studies of multisensory neurons in the superior colliculus (SC) (for review see Stein et al. 2004). The SC receives visual, auditory, and somatosensory inputs from several sources in a variety of convergence patterns to produce individual neurons with multiple, spatially congruent, modality-specific receptive fields. Each possible multisensory type is represented, with the visual–auditory neuron being most common. Of current relevance is the fact that SC neurons integrate the information derived from stimuli falling simultaneously on their different modality-specific receptive fields to produce responses that are significantly different from those elicited by these stimuli individually (Stein and Meredith 1993). This physiological effect is presumed to alter stimulus salience and have substantial behavioral and perceptual consequences.
The most common outcome of integrating excitatory modality-specific influences is enhancement, wherein the multisensory response is greater than the constituent unisensory responses and sometimes greater than their arithmetic sum. This augmentation is greatest when the modality-specific cues are weak (Meredith and Stein 1986a; Perrault et al. 2005; Stanford et al. 2005), a relationship that has an analog at the level of overt behavior (Stein et al. 1988, 1989).
Despite the extensive period during which stimulus integration and its potential significance for understanding behavior have been examined, one of its most fundamental tenets remains unexplored: whether the computations that underlie the integration of stimuli from different senses are substantially different from those that underlie integration of information within the same sense. In short, do the computations differ when processing, for example, a visual–auditory stimulus as opposed to two visual (or two auditory) stimuli?
There is reason to expect either of two diametrically opposed possibilities. In the first, unisensory and multisensory integration operate according to different rules, the former constrained by the inherent anatomical and physiological properties of a given sense and the latter optimized to facilitate pooling information from different senses. Consistent with this perspective, the spatial summation of visual stimuli within the excitatory receptive fields of cortical visual neurons is most often subadditive (Britten and Heuer 1999; Carandini et al. 1997; Gawne and Martin 2002; Henry et al. 1978; Lampl et al. 2004; Li and Basso 2005; Movshon et al. 1978; Reynolds et al. 1999), whereas the integration of excitatory cross-modal cues in the SC is most often additive or superadditive (see Stanford et al. 2005). However, the second and opposite possibility is equally plausible: unisensory and multisensory integration operate according to the same rules. This seems reasonable because SC neurons, which are involved primarily in detecting stimuli to effect orientation behavior (rather than feature extraction for perception as in primary sensory areas), may use similar computations to integrate their inputs regardless of stimulus origin.
The experiments detailed here were designed to address this issue. A preliminary report of these observations has appeared in abstract form (Alvarado et al. 2004).
All animal protocols were in accordance with the Guide for the Care and Use of Laboratory Animals (National Institutes of Health Publication 86–23) and were approved by the Animal Care and Use Committee of Wake Forest University School of Medicine, an AAALAC-accredited institution.
Surgical procedures were similar to those described previously (Jiang and Stein 2003; Jiang et al. 2001). Experiments were performed in three adult cats. Briefly, each animal was prepared for surgery with a mixture of ketamine hydrochloride [30 mg/kg, administered intramuscularly (im)] and acepromazine maleate (3–5 mg/kg, im), intubated through the mouth, and then maintained for surgery with isoflurane (0.5–3%). It was placed into a stereotaxic head holder, and a craniotomy exposed the cortex overlying the SC. A hollow stainless steel cylinder that provided access to the SC and that served to hold the animal's head during recording experiments (McHaffie and Stein 1983) was attached stereotaxically to the skull over the SC craniotomy with surgical screws and orthopedic cement. A heating pad maintained body temperature (37–38°C) during the surgery and recovery from anesthesia. After surgery the cat received analgesics (butorphanol tartrate, 0.1–0.4 mg·kg−1·6 h−1) as needed and antibiotic for 7–10 days (ceftriaxone 20 mg/kg, bid). The initial recording session was scheduled 1–5 days after completing the antibiotic regimen.
No wounds or pressure points were present during recording. To prepare the cat for recording, anesthesia was induced with ketamine hydrochloride (30 mg/kg, im) and acepromazine maleate (3–5 mg/kg, im). The animal was maintained during the recording session with continuous intravenous infusion of ketamine hydrochloride (4–6 mg/kg), the paralytic pancuronium bromide (0.1–0.2 mg·kg−1·h−1; initial dose was 0.3 mg/kg), and 5% dextrose Ringer solution (3–6 ml/h). It was artificially respired and respiratory rate and volume were adjusted to maintain end-tidal CO2 at roughly 4.0%. Body temperature was kept at 37–38°C using the heating pad. The head was held by the implanted head holder that was attached to a metal frame. At the beginning of the experiment the pupil of the eye contralateral to the SC to be studied was dilated with an ophthalmic solution of atropine sulfate (1%). A contact lens corrected the contralateral eye's refractive error and an opaque lens occluded the other eye. A hydraulic microdrive advanced a glass-insulated tungsten electrode (tip diameter: 1–3 μm; impedance: 1–3 MΩ at 1 kHz) into the SC; sensory neurons were identified by their responses to visual, auditory, and somatosensory “search” stimuli. Neuronal responses were amplified, displayed on an oscilloscope, and played through an audio monitor. The X–Y coordinates of the electrode penetration and the recording depth of each neuron encountered were systematically recorded. At the end of the recording session the anesthetic and paralytic were discontinued. When stable respiration and locomotion were reinstated, the cat was returned to its home cage.
Neuronal search paradigm, receptive field mapping, and sensory tests
Visual search stimuli included moving and stationary flashed light bars projected onto a tangent screen 45 cm in front of the animal. Auditory search stimuli consisted of broadband (20–20,000 Hz) noise bursts, clicks, and taps. Visual receptive fields were mapped with moving light bars. Auditory receptive fields were mapped with broadband noise bursts 10 dB above threshold and originating from any of 16 hoop-mounted speakers placed 15° apart and 15 cm from the animal's head on a rotating hoop so that elevation could be examined without obstructing the animal's vision or the visual stimulus (Meredith and Stein 1986a,b). After mapping, receptive fields were plotted on standardized representations of visual and auditory space (Stein and Meredith 1993).
Visual test stimuli consisted of either one or two moving or stationary light bars placed within the borders of the neuron's receptive field, close to its center. The bars (0.11–13.0 cd/m2 against a 0.10-cd/m2 background) were generated by a Silicon Graphics workstation and projected by a Barcodata projector onto a tangent screen that subtended nearly 60° of visual angle in each hemifield. The stimuli could be moved in all directions across the receptive field at amplitudes of 1–110° and speeds of 1–400°/s.
Auditory test stimuli were computer-generated bursts of white noise, played from any one of the hoop-mounted speakers and placed within the neuron's auditory receptive field. Like the visual stimulus, these were placed in the center of the receptive field. The duration of the auditory stimuli varied from 10 to 50 ms at intensities of 52–70 dB SPL against an ambient background SPL of 51.4–52.0 dB.
General testing paradigm
Evaluations of multisensory neurons consisted of examining the neuron's responses to visual targets presented alone, auditory targets presented alone, paired visual–visual targets (within-modal tests to examine unisensory integration), or paired visual–auditory targets (cross-modal tests to examine multisensory integration). For each stimulus condition evaluated, trials were interleaved in a pseudorandom fashion for eight presentations, with 8–20 s between trials. Generally the paired within-modal and cross-modal stimuli were presented simultaneously. However, in several cases a neuron's auditory and visual latencies were so different that a 50-ms delay (V before A) was required to ensure a robust multisensory interaction. To evaluate the integrative profiles of these neurons, five different points within their dynamic response ranges were chosen by systematically manipulating the relative effectiveness (i.e., intensity) of either or both stimuli in a series of pretests with each neuron. A rough determination of threshold and saturation was first made and three additional, equally spaced stimulus intensities were chosen to span these extremes. Testing in unisensory visual neurons followed a similar paradigm, with the two visual stimuli presented individually or together, in a randomly interleaved pattern, at five different points along a neuron's dynamic range.
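The selection of five test intensities spanning a neuron's dynamic range can be illustrated with a minimal sketch. It assumes linear spacing between threshold and saturation; the text does not say whether the spacing was linear or logarithmic. The function name and the example values (taken from the overall auditory intensity range, not from any particular neuron) are hypothetical.

```python
def intensity_levels(threshold, saturation, n_levels=5):
    """Return n_levels equally spaced intensities from threshold to saturation.

    Assumption: spacing is linear in the units supplied (e.g., dB SPL).
    """
    if n_levels < 2:
        raise ValueError("need at least two levels to span a range")
    step = (saturation - threshold) / (n_levels - 1)
    return [threshold + i * step for i in range(n_levels)]


# Hypothetical neuron whose auditory threshold and saturation sit at 52 and 70 dB SPL.
levels = intensity_levels(52.0, 70.0)
print(levels)
```

The first and last entries are the threshold and saturation estimates; the three interior values are the additional, equally spaced intensities.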
Data acquisition and analysis
Statistical analysis was performed using Statistica for Windows, release 5.0 (StatSoft). Neuronal responses to each possible stimulus condition (visual alone, auditory alone, visual–visual, and visual–auditory) were assessed based on the mean number of impulses evoked during a common time window for that neuron. This time window bracketed the longest response train (beginning at stimulus onset) evoked by one of the experimental conditions and was used in the evaluations of all conditions for that neuron to maintain internal consistency. The stimulus-evoked responses were corrected for background activity by subtracting spontaneous discharge rates (i.e., spontaneous discharge rates were measured for 1 s before the onset of the first stimulus during each set of trials and then normalized for the time window in which responses were counted) (Jiang et al. 2001). Responses were analyzed statistically to determine whether a significant (two-tailed t-test; P < 0.05) change in the number of impulses occurred with combined stimuli compared with the most effective single-modality stimulus. The magnitude of this combined response enhancement was evaluated with the multisensory enhancement index described by the following formula (Meredith and Stein 1983): enhancement index (%) = [(CM − SMmax)/SMmax] × 100, where CM is the mean number of impulses per trial evoked by the combined stimuli (i.e., either visual–auditory or visual–visual) and SMmax is the mean number of impulses per trial evoked by the most effective modality-specific stimulus.
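The background correction and the enhancement index can be sketched as follows. The index is written in its standard percentage form, [(CM − SMmax)/SMmax] × 100. The function names, the 0.5-s window, the 2 impulses/s spontaneous rate, and all impulse counts are hypothetical values chosen for illustration, not data from the study.

```python
def corrected_response(counts, window_s, spont_rate_hz):
    """Mean impulses per trial within the counting window, minus the number of
    spontaneous impulses expected in a window of that duration."""
    mean_count = sum(counts) / len(counts)
    return mean_count - spont_rate_hz * window_s


def enhancement_index(cm, sm_max):
    """Percentage change of the combined response (cm) relative to the
    response to the most effective single stimulus (sm_max)."""
    if sm_max <= 0:
        raise ValueError("sm_max must be positive for the index to be defined")
    return 100.0 * (cm - sm_max) / sm_max


# Hypothetical impulse counts from eight trials of one stimulus condition.
visual_alone = corrected_response([4, 5, 6, 5, 4, 6, 5, 5], window_s=0.5, spont_rate_hz=2.0)

# Hypothetical corrected means for the combined and best single-stimulus responses.
index = enhancement_index(cm=9.5, sm_max=2.5)
print(visual_alone, index)
```

A positive index indicates enhancement and a negative index indicates depression relative to the best component response; the index is undefined when the best component response is zero after correction, which the sketch handles by raising an error.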
A second index of multisensory integration evaluated the mean multisensory response (averaged across trials) against a benchmark of simple summation of the modality-specific stimulus components. To do so, the actual mean multisensory response was compared with a distribution of expected means based on summation of the same neuron's responses to the individual modality-specific stimuli. To generate the predicted distribution of mean multisensory responses, all possible sums from visual and auditory alone trials were computed. Thus for eight trials each of a visual and auditory stimulus, a sample distribution of 64 (8 × 8) possible sums was created. From this distribution of 64 sums, eight were randomly selected (i.e., sampled without replacement) and averaged to create a predicted mean multisensory response. This sampling and averaging operation was repeated 10,000 times to build a reference distribution of mean multisensory responses. The actual mean multisensory response could then be compared directly to the distribution of means predicted from summation and this relationship expressed as a Z-score. If the actual response was approximately 2 SDs above (Z > 1.96) or below (Z < −1.96) the predicted mean, the null hypothesis of additivity was rejected and the multisensory response was considered to be superadditive or subadditive, respectively (see Stanford et al. 2005). It is important to note that these comparisons were made only for combinations in which the intensities for the paired stimulus combination (i.e., either visual–visual or visual–auditory) were matched with those intensities used to generate the unisensory predictions.
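The resampling procedure described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the analysis code used in the study: the function names, the fixed random seed, the use of the sample standard deviation, and all impulse counts are hypothetical.

```python
import random
import statistics


def additive_z_score(visual, auditory, combined, n_resamples=10_000, seed=0):
    """Z-score of the observed mean combined response against the distribution
    of means predicted by summing the single-stimulus responses."""
    rng = random.Random(seed)
    # All pairwise sums of single-stimulus trials (8 x 8 = 64 for eight trials each).
    sums = [v + a for v in visual for a in auditory]
    n_trials = len(combined)
    # Draw n_trials sums without replacement, average them, and repeat.
    predicted_means = [
        statistics.fmean(rng.sample(sums, n_trials)) for _ in range(n_resamples)
    ]
    mu = statistics.fmean(predicted_means)
    sd = statistics.stdev(predicted_means)
    if sd == 0:
        raise ValueError("predicted distribution has zero variance")
    return (statistics.fmean(combined) - mu) / sd


def classify(z, criterion=1.96):
    """Label the interaction relative to the additive prediction."""
    if z > criterion:
        return "superadditive"
    if z < -criterion:
        return "subadditive"
    return "additive"


# Hypothetical impulse counts (eight trials per condition).
visual = [1, 2, 1, 0, 2, 1, 1, 2]
auditory = [0, 1, 1, 0, 1, 0, 1, 1]
cross_modal = [6, 7, 5, 8, 6, 7, 6, 7]

z = additive_z_score(visual, auditory, cross_modal)
print(round(z, 1), classify(z))
```

For within-modal tests the same function applies with the two visual-alone trial sets passed in place of the visual and auditory sets.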
In addition, the differences among neuronal profiles were assessed using various statistical treatments, depending on whether the sample distributions met the assumptions of normality. The Kolmogorov–Smirnov test of normality was used to determine whether the variables studied (mean impulses of the unisensory and combined responses, enhancement indices of the combined responses, and Z-scores of the combined responses) reflected an underlying normal distribution. For the cases in which assumptions of normality were met, t-tests and ANOVAs were used. Alternatively, nonparametric tests, such as Kruskal–Wallis ANOVA, Spearman rank ρ, and χ2, were used if assumptions of normality were rejected.
A total of 106 sensory-responsive neurons were sampled from the multisensory (i.e., deep) layers of the SC. Ninety-five of them were isolated for sufficient periods of time to examine their integrative properties in detail. The analyses conducted here were restricted to visual–auditory multisensory neurons (51% of the sample) and visual unisensory neurons (49%).
Multisensory neurons: responses to cross-modal stimuli
Consistent with previous findings (e.g., Jiang et al. 2001; Kadunce et al. 1997; Meredith and Stein 1986a; Perrault et al. 2003; Stein 1998; Wallace et al. 1998), multisensory enhancement was found to be a common feature among multisensory neurons in the cat SC and was obtained in 87% (42/48) of those neurons examined. An example is presented in Fig. 1. Note that the presentation of the combined visual–auditory stimulus produced a response that was significantly (P < 0.01) greater than either of the individual modality-specific responses and even surpassed their arithmetic sum.
The magnitude of multisensory enhancement among SC neurons generally proved to be most pronounced at lower levels of stimulus effectiveness and decreased as the constituent unisensory responses became more robust. This principle of “inverse effectiveness” (see Meredith and Stein 1986a; Stein and Meredith 1993) was noted in 73% of the multisensory neurons studied, despite considerable variability in the vigor of the responses and the absolute magnitude of the multisensory enhancement noted. The example presented in Fig. 1 is generally representative and shows that the degree of response enhancement steadily decreased as stimulus effectiveness increased.
To determine the nature of the underlying computation used by this neuron when integrating information from the cross-modal stimulus combination, responses were evaluated with respect to a linear summation model. The summation model provides a simple null hypothesis in which the predicted response to any combined stimulus is equal to the sum of its constituent unisensory responses (see methods). A Z-score of ±1.96 was taken as deviating significantly from this prediction. In Fig. 1, the Z-scores obtained in the first two levels of stimulus effectiveness (Z = 4.5 and Z = 4.3) revealed that the multisensory responses significantly exceeded the predicted sum, thereby demonstrating that the integrated responses were superadditive. Most significant in the current context was that the multisensory integrative operation performed by this neuron changed from superadditive at low levels of stimulus effectiveness to additive at higher levels. Figure 1 illustrates two general tendencies that were noted in a number of multisensory SC neurons and will be explored in more detail in the population analysis provided later: first, that superadditivity was obtained predominantly at low levels of modality-specific stimulus effectiveness; second, that there was a tendency to transition from one computational mode to another as stimulus effectiveness changed. Using a stationary visual stimulus yielded similar results (not shown), with multisensory integration changing from superadditive at low levels of stimulus effectiveness to additive at higher levels. This is consistent with a previous study showing that this aspect of multisensory integration is not strongly dependent on the physical features of the stimulus (Perrault et al. 2005).
Multisensory neurons: responses to within-modal stimuli
Pairs of within-modal cues were presented to the same multisensory neurons to examine whether the properties of multisensory and unisensory integration would be similar or different. A total of 48 multisensory neurons were studied in this fashion and their resultant multisensory and unisensory products were compared as described in methods.
Figure 2 shows the responses of the neuron described in Fig. 1 to pairs of visual stimuli. In contrast to the significant response enhancement that resulted when this neuron integrated visual–auditory stimuli (Fig. 1), its responses to pairs of visual stimuli did not significantly exceed the most effective of these stimuli alone at any level of stimulus effectiveness.
The failure of within-modal stimulus combinations to yield enhanced responses could not be attributed to postsynaptic response saturation (i.e., a “ceiling effect”) given the wide response range of the neuron and the fact that integration tended toward subadditivity even for combinations of the least effective stimuli. Rather, it appeared that there were fundamental differences in the computations used by this neuron during multisensory and unisensory integration. Z-scores illustrate that unisensory integration was subadditive at every level of stimulus effectiveness and this trend was representative of many of the multisensory neurons examined despite substantial differences in their receptive field sizes, their locations in the SC, and the overall robustness of their responses. These observations raised the question of whether the expression of different integrative modes for cross-modal and within-modal stimuli was specific to multisensory neurons or represented a more general difference in the way in which cross-modal and within-modal stimuli are coded in the SC.
Unisensory neurons: responses to within-modal stimuli
To examine this issue, the same within-modal tests were conducted in a population (n = 47) of neighboring unisensory visual SC neurons. As was true for multisensory neurons, response enhancement was found to be rare in response to within-modal stimulus pairs. A representative example of the responses to individual and combined visual stimuli is shown in Fig. 3. Addition of the second stimulus produced no significant response enhancement in this neuron regardless of the relative level of visual stimulus effectiveness. As was the case when similar tests were conducted with the multisensory neuron shown in Fig. 2, this failure did not arise from the neuron's inability to respond with sufficient numbers of impulses. Indeed, it seemed to differentiate between stimulus conditions with significant response depression because its peak response to a single visual stimulus (V2) significantly exceeded its response to the pair of visual stimuli at the highest levels of stimulus effectiveness. Z-scores once again indicated that unisensory integration was subadditive at all levels of stimulus effectiveness. The similarities between these results obtained with unisensory neurons and the results obtained with multisensory neurons indicate that this computational strategy is not specific to a given class of SC neurons. Rather, the data suggest that the SC uses different computations to integrate within-modal and cross-modal stimuli.
Population patterns reflect differences in multisensory and unisensory integration
To compare multisensory and unisensory integration at the population level, for all neurons and for each stimulus combination, the relationship between the integrated response and the response to the most effective component stimulus was plotted for both cross-modal (Fig. 4A) and within-modal (Fig. 4B) stimulus pairs. Note that in the cross-modal stimulus condition, the majority (78.1%) of points fell above the line of unity (Fig. 4A), indicating that the multisensory responses were greater than their most effective unisensory counterparts. Most (55.3%) of these differences were statistically significant, thereby achieving the criterion necessary to be classified as examples of multisensory response enhancement. Although some (9.2%) of the points fell below the line of unity, very few of these (1.3%) constituted statistically significant cases of multisensory depression. Statistical evaluation of the population data confirmed that, on the whole, the cross-modal responses were significantly enhanced [Kruskal–Wallis: H(2,686) = 129.71, P < 0.0001].
In contrast, responses to the combined stimulus in the within-modal stimulus condition clustered around the line of unity (Fig. 4B). The few responses that were significantly above the line of unity (5.0%) were counterbalanced by an even greater number of responses that were significantly below it (18.7%). Thus at the population level, there were more instances in which the combined stimulus in the within-modal condition significantly depressed rather than enhanced the response. Most frequently, however, the combined response was no different from the response to the most effective individual visual stimulus. This within-modal pattern was independent of whether the neuron examined was multisensory or unisensory because their probabilities of response enhancement (4.4% for multisensory and 5.6% for unisensory neurons) and response depression (20.1% for multisensory and 17.3% for unisensory neurons) were nearly equal (see insets in Fig. 4B).
The within-modal results indicate that simultaneous presentation of a second and more weakly effective stimulus has either very little influence or a suppressive influence on the number of impulses evoked. An example of each of these is shown in Fig. 5, A and B. For the neuron in Fig. 5A, the response to the combined stimulus (V1V2) closely approximated that to the more effective of the two visual stimuli (V2) for each of the three levels of stimulus effectiveness. At least from the perspective of impulse number, this strongly subadditive response suggests a neural implementation of the maximum operator (an operation that returns the largest of its inputs). In contrast, for the neuron shown in Fig. 5B, the response to the combined stimulus (V1V2) was intermediate to that of either visual stimulus alone, an observation that was readily apparent when the two visual stimuli themselves differed substantially in their efficacies (as is the case for stimulus effectiveness values 1 and 3). Here the response to the combined stimulus might best be described as an average of the responses to the individual component stimuli.
The responses of an additional 51 neurons were examined to determine whether a maximum or averaging operation best characterized the observed subadditive interactions. For these neurons, visual stimulus intensities were manipulated to ensure that the responses to the two visual stimuli would differ because the more different the responses to the individual visual stimuli, the more divergent the predictions of the maximum and averaging operations. The same analytical methods (see methods) used for evaluating the combined response with respect to summation were used for evaluating responses with regard to the maximum and averaging predictions. For the majority of neurons (30/51; 58.8%) there was not a consistent pattern to suggest uniform implementation of the maximum or averaging operations across stimulus level or stimulus efficacy. Of the relatively few neurons that did show a uniform mode of integration, five (9.8%) were consistent with the maximum (Fig. 5A) and 11 (21.6%) were consistent with the averaging (Fig. 5B) operation. Unisensory visual neurons and multisensory neurons did not appear to differ in this regard.
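The comparison of the maximum and averaging predictions can be sketched by reusing the resampling logic of the summation analysis with a different rule for combining the two single-stimulus responses. This sketch assumes that the rule is applied to pairs of single-trial impulse counts, which the text does not specify; the function names, the seed, and all impulse counts are hypothetical.

```python
import random
import statistics


def model_z_score(v1, v2, combined, combine, n_resamples=10_000, seed=0):
    """Z-score of the observed mean combined response against the distribution
    of means predicted by applying `combine` to pairs of single-stimulus trials."""
    rng = random.Random(seed)
    predictions = [combine(a, b) for a in v1 for b in v2]
    n_trials = len(combined)
    means = [
        statistics.fmean(rng.sample(predictions, n_trials))
        for _ in range(n_resamples)
    ]
    return (statistics.fmean(combined) - statistics.fmean(means)) / statistics.stdev(means)


# Candidate operations for integrating two within-modal responses.
models = {
    "sum": lambda a, b: a + b,
    "maximum": max,
    "average": lambda a, b: (a + b) / 2,
}

# Hypothetical impulse counts: V1 is weakly effective, V2 strongly effective.
v1 = [2, 3, 2, 1, 3, 2, 2, 3]
v2 = [8, 9, 7, 10, 8, 9, 8, 9]
v1v2 = [8, 9, 8, 9, 7, 10, 9, 8]

z_scores = {name: model_z_score(v1, v2, v1v2, fn) for name, fn in models.items()}
best = min(z_scores, key=lambda name: abs(z_scores[name]))
print(z_scores, best)
```

The predictions of the maximum and averaging rules diverge most when the two single-stimulus responses differ substantially, which is why the example uses a weak and a strong stimulus.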
If one considers each of the different levels of stimulus effectiveness as an individual stimulus condition, the incidence of multisensory neurons showing response enhancement to at least one cross-modal stimulus condition was 87%. By contrast, the incidence of neurons showing response enhancement to a within-modal stimulus condition was 18% (17% in the 48 multisensory neurons and 19% in the 47 unisensory visual neurons). These differences were most pronounced at the lowest levels of stimulus effectiveness, as might be expected based on the principle of inverse effectiveness. Thus there was a negative relationship between stimulus effectiveness and the multisensory enhancement index [Kruskal–Wallis: H(4,674) = 49.17, P < 0.0001]. The mean magnitude of the multisensory enhancement index was 280% at the lowest level of stimulus effectiveness and became <40% at the highest level. This contrasts with the integrated unisensory response, which was <40% even at the lowest level of stimulus effectiveness and was generally close to 0% at all subsequent levels, a product that is once again consistent with an averaging or a maximum operator applied to the individual modality-specific responses.
An analogous population analysis was conducted to compare the computations used in integrating within- and cross-modal cues. Distributions of Z-scores for responses to cross-modal (multisensory neurons) and within-modal (multisensory and unisensory neurons) stimuli are shown in Fig. 6, A and B, respectively. Superadditive and additive interactions characterized nearly all cases of cross-modal integration (Fig. 6A), whereas subadditive and additive interactions predominated for integration of within-modal information (Fig. 6B and also inset in Fig. 6B), a difference that is readily apparent in the comparison of cumulative density functions shown in Fig. 6C.
Accordingly, plotting the integrated response against the predicted sum (Fig. 7) shows that for cross-modal integration most points clustered around or above the line of unity, whereas for within-modal integration most fell substantially below the line of unity. Chi-square analysis of the frequencies of each computational operation during cross-modal and within-modal tests showed that the group differences were unlikely to have been the result of chance (χ2 = 215.6, df = 4, P < 0.0001). In short, the products of multisensory and unisensory integration reflect different computations. Cross-modal integration is characterized by superadditivity at very low levels of stimulus efficacy, which gives way to additivity as stimulus efficacy increases. Within-modal integration is characterized by additivity at the lowest levels of efficacy and gives way to subadditivity as stimulus efficacy increases (see Fig. 8).
EFFECT OF ADDING A VISUAL STIMULUS TO A CROSS-MODAL STIMULUS PAIR.
Addition of a visual stimulus to the cross-modal stimulus pair to form a stimulus triad (multisensory plus a second visual) did not enhance the responses beyond those already achieved with the cross-modal stimulus. A plot of the responses to the stimulus triad against those to the cross-modal stimulus pair (Fig. 9A) or against the predicted responses (Fig. 9B) shows points clustered around the line of unity and strong correlation between the measures (Spearman ρ = 0.96). No statistically significant differences were found between the mean response to the cross-modal pair of stimuli (16.5 ± 0.7 impulses) and the mean response to that cross-modal stimulus pair plus the second visual stimulus (17.1 ± 1.1 impulses).
The present study demonstrates that the products of unisensory and multisensory integration within the SC are appreciably different. As shown in previous studies (e.g., Jiang et al. 2001; Meredith and Stein 1983; Perrault et al. 2005; Stanford et al. 2005; Wallace et al. 1998), response enhancement characterizes the integration of cross-modal information (multisensory integration). In contrast, the present results showed that combinations of within-modal stimuli often produce response depression, which in many cases manifests as an approximate averaging of the responses to the component stimuli (unisensory integration).
Following the same systematic stimulus presentation and analytical approach recently described (Stanford et al. 2005), the current study reproduced the principal findings for multisensory integration: multisensory interactions ranged from superadditive to subadditive, with the efficacy of the modality-specific stimulus components being the primary determinant of the integrative mode expressed. As previously reported, superadditivity predominated when weakly effective stimuli were combined but yielded to additivity as the stimuli became more effective. Subadditivity was again quite rare (see Stanford et al. 2005), observed only for combinations of the most effective modality-specific stimuli.
In contrast, unisensory integration, except in rare cases, consisted of subadditive or additive interactions. From a mechanistic perspective, it is important to emphasize that the pattern with which unisensory and multisensory SC neurons dealt with within-modal stimuli was virtually identical. Thus there was no evidence that unisensory and multisensory SC neurons are intrinsically different in this regard. Rather, the distinctions between integrating within-modal (subadditive) and cross-modal (additive or superadditive) information appear to arise from differences in extrinsic input sources and/or local circuitry.
The forms of unisensory integration observed here are reminiscent of those in studies of spatial integration for multiple stimuli within the receptive fields of neurons in striate and extrastriate cortical regions. Evidence for subadditive interactions that approximate a weighted average of multiple stimulus influences is quite strong, having been observed at multiple levels of the visual hierarchy, including V1 (Carandini et al. 1997; Heeger 1992); V2 and V4 (Reynolds et al. 1999); MT (Britten and Heuer 1999; Heuer and Britten 2002; Recanzone et al. 1997; Treue et al. 2000), and IT (Rolls and Tovee 1995; Zoccolan et al. 2005). Recent studies also report the existence of neurons that do not average the influences of multiple stimuli, but appear to implement a maximum operation wherein the response approximates that of the single most effective of two receptive field stimuli. Examples of this form of stimulus integration were reported for complex cells in V1 (Lampl et al. 2004) and V4 (Gawne and Martin 2002).
These observations of normalization (averaging) and maximum operations in cortex lead to relatively straightforward implications for visual processing. Averaging (rather than linear summation), for example, could serve as a mechanism of gain control by preventing response saturation, thereby allowing neurons to continue to use response rate to signal information along some feature dimension (e.g., motion direction). Response normalization thus permits relatively invariant feature coding in the presence of different numbers of stimuli in the receptive field, the spatial extent of the existing stimuli, or their spatial contrast (e.g., Heuer and Britten 2002; Zoccolan et al. 2005). On the basis of logical arguments and modeling results (Riesenhuber and Poggio 1999), neurons implementing the maximum operation could be important for the parallel analysis of multiple objects within a visual scene by responding similarly to a “preferred” stimulus, irrespective of whether other stimuli are nearby (i.e., within the same receptive field). In principle, such neurons would be ideal for the analysis of cluttered visual environments (see Rousselet et al. 2003, 2004).
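The gain-control argument can be illustrated with a toy neuron whose output is capped at a ceiling rate. This is a minimal sketch under stated assumptions: the ceiling value, the drive values for the "preferred" and "nonpreferred" feature, and the hard clipping in `saturate` are all hypothetical simplifications chosen for clarity, not measured quantities.

```python
CEILING = 100.0  # hypothetical maximum firing rate (arbitrary units)


def saturate(drive, ceiling=CEILING):
    """Clip the drive at the ceiling; a crude stand-in for response saturation."""
    return min(drive, ceiling)


def summed_response(drives):
    """Output if stimulus drives add linearly before saturation."""
    return saturate(sum(drives))


def averaged_response(drives):
    """Output if stimulus drives are averaged (normalized by stimulus count)."""
    return saturate(sum(drives) / len(drives))


# Hypothetical drives for one stimulus with a preferred or nonpreferred feature.
PREFERRED, NONPREFERRED = 60.0, 30.0

for n_stimuli in (1, 2, 4):
    pref = [PREFERRED] * n_stimuli
    nonpref = [NONPREFERRED] * n_stimuli
    print(
        n_stimuli,
        summed_response(pref), summed_response(nonpref),
        averaged_response(pref), averaged_response(nonpref),
    )
```

With four stimuli the summed outputs for the two feature values both reach the ceiling and become indistinguishable, whereas the averaged outputs remain at 60 and 30 regardless of stimulus number, so response rate continues to signal the feature.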
Although the qualitative similarities between our findings for unisensory integration in the SC and those reported for visual cortex point to possible commonalities in underlying mechanisms (e.g., lateral inhibition), the functional implications may be quite different for different structures. Neurons in the SC, like those in cortex, can show some sensitivity to stimulus features (see Horwitz et al. 2004), but they are better characterized as stimulus detectors than as feature analyzers: their sensory-related activity is more closely linked to stimulus salience and represents potential targets for visual orienting. Given that gaze shifts are made one at a time and with high spatial precision, the failure of multiple receptive field stimuli to have a reinforcing effect on response magnitude seems adaptive. It is reasonable to assume that SC circuitry instantiates mechanisms that, by default, treat spatially resolvable stimuli as competitors (subadditive interactions) rather than synergists (additive or superadditive interactions), even if their proximity places them within the same receptive field.
Evidence for within-modal competition at the behavioral level comes from studies examining neural activity in monkeys required to choose the target of a gaze shift from among several possibilities (Basso and Wurtz 1997, 1998; McPeek and Keller 2002). These studies indicate that an initially ambiguous representation consisting of a focus of activity for each potential target (target and distracters) gives way to a single locus of activity within the SC sensorimotor topography. Thus between the time that the stimuli appear and the time that a gaze shift command is issued, competing alternatives are eliminated to leave activity only at a locus consistent with the gaze shift required to look toward the target. Most relevant here is that even the earliest stimulus-linked component of an SC neuron's response provides evidence for competition between simultaneously present stimuli. For example, Basso and Wurtz (1997, 1998) demonstrated an inverse relationship between the visual responsiveness of primate SC neurons and the number of simultaneously present stimuli. Similarly, McPeek and Keller (2002) noted that the early visual stimulus-driven responses were suppressed when more than one stimulus was present in an array of potential targets. It is important to note that, unlike the relatively slow-to-evolve target/distracter discrimination, the suppressive effect on early visual activity is likely to be a bottom-up phenomenon and a reflection of a more hard-wired aspect of circuitry (e.g., lateral inhibition) within the SC or one of its input sources. In the current data set, subadditive interactions were evident in the early stimulus-linked response. This, combined with the fact that these interactions were observed in anesthetized animals, seems to favor a bottom-up explanation.
A potentially important difference between the present study and the above-described primate studies concerns the proximity of the visual stimuli. By design, the stimulus arrays used in the primate studies were composed of widely dispersed elements, so that no more than a single stimulus was present within any given neuron's receptive field at one time. However, in a very recent study, Li and Basso (2005) demonstrated competitive interactions for pairs of visual stimuli within the receptive fields of single primate SC neurons. Consistent with the presently reported findings from the cat SC, Li and Basso (2005) report primarily subadditive interactions that are best approximated by an average of the individual stimulus responses.
When considered from a behavioral perspective, the differences in responses to cross-modal and within-modal stimuli that differentiate multisensory and unisensory integration are intuitively understood because cues from different modalities, when occurring in close temporal and spatial proximity, are likely to be derived from the same external event. As such, cross-modal cues reinforce one another, increasing the likelihood of detecting and/or initiating an orienting response to the initiating event, as was repeatedly obtained in experiments with behaving cats (see Jiang et al. 2002; Stein 1998; Stein et al. 1988, 1989; Wilkinson et al. 1996). A host of studies with humans have demonstrated analogous perceptual and behavioral benefits from such multisensory cues (Bolognini et al. 2004; Frassinetti et al. 2002; Frens et al. 1995; Goldring et al. 1996; Harrington and Peck 1998; Laurienti et al. 2004; Lovelace et al. 2003).
The multisensory–unisensory distinctions observed here suggest significant differences in underlying circuitry. Lateral inhibitory effects, which might produce the divisive scaling necessary for normalization, were clearly not evident during multisensory integration, a process that is also dependent on cortical influences. Most important in this context are the inputs from the anterior ectosylvian sulcus (AES) and a neighboring region, the rostral aspect of the lateral suprasylvian cortex (rLS) (Meredith and Clemo 1989; Stein et al. 1983; Wallace et al. 1993) that are essential for multisensory integration (Jiang et al. 2001; Wallace et al. 1994). Reversible deactivation of these cortices disrupts the ability of SC neurons to integrate their cross-modal inputs, but does not interfere with their ability to be multisensory (Clemo and Stein 1986; Jiang and Stein 2003; Jiang et al. 2001). Although these neurons are still responsive to non-AES–derived modality-specific influences, their response enhancement is eliminated. Preliminary observations suggest that unisensory integration is not affected (Alvarado et al. 2005). In short, deactivating cortex minimizes the differences between cross-modal and within-modal integration.
Because many of the principles of multisensory integration are neither stimulus specific nor structure specific, it is tempting to conclude that the multisensory–unisensory computational distinctions noted here reflect general principles in the nervous system and will be evident for other stimulus combinations, other sensory modalities, and in other multisensory structures. However, whether these distinctions—which appear adaptive in a brain area involved in target localization—extend to other stimulus combinations, other senses, and/or areas involved in very different multisensory tasks remains to be determined.
This work was supported by National Institutes of Health Grants NS-36916, NS-22543, and EY-016716.
We thank N. London for editorial assistance.
- Copyright © 2007 by the American Physiological Society