In the brain, mutual spatial alignment across different sensory representations can be shaped and maintained through plasticity. Here, we use a Hebbian model to account for the synaptic plasticity that results from a displacement of the space representation for one input channel relative to that of another, when the synapses from both channels are equally plastic. Surprisingly, although the synaptic weights for the two channels obeyed the same Hebbian learning rule, the amount of plasticity exhibited by the respective channels was highly asymmetric and depended on the relative strength and width of the receptive fields (RFs): the channel with the weaker or broader RFs always exhibited most or all of the plasticity. A fundamental difference between our Hebbian model and most previous models is that in our model synaptic weights were normalized separately for each input channel, ensuring that the circuit would respond to both sensory inputs. The model produced three distinct regimes of plasticity dynamics (winner-take-all, mixed-shift, and no-shift), with the transition between the regimes depending on the size of the spatial displacement and the degree of correlation between the sensory channels. In agreement with experimental observations, plasticity was enhanced by the accumulation of incremental adaptive adjustments to a sequence of small displacements. These same principles would apply not only to the maintenance of spatial registry across input channels, but also to the experience-dependent emergence of aligned representations in developing circuits.
When analyzing stimuli, the CNS integrates information from many different sources. The information can arise from functionally distinct channels within the same sensory modality, such as the representations of movement and color or the representations of the visual scene from the two eyes, or it can arise from different sensory modalities, such as the auditory and visual representations of an object's location. To integrate information appropriately, the brain must associate information from different channels that correspond to the same object.
The plasticity of convergent spatial information across channels has been studied with various experimental manipulations. For instance, when a tadpole's eye is rotated in its orbit, the visual map of space in the optic tectum (OT) that originates from the rotated ipsilateral eye reorganizes and realigns with the visual map of space that originates from the contralateral eye (Udin and Grant 1999). Analogously, when barn owls are fitted with optical displacing prisms, the auditory space map shifts to align with the displaced visual space map (DeBello and Knudsen 2004; Knudsen 2002). When barn owls are fitted instead with acoustic devices that alter auditory spatial cues, again the auditory space map shifts to align with the visual space map (Gold and Knudsen 1999).
In principle, realignment of spatial representations across input channels could be achieved either by partial adjustments in both of the representations or by complete adjustment of one of the representations. However, experiments show that when a tadpole's eye is experimentally rotated, binocular realignment is mediated predominantly by plasticity of the inputs from the ipsilateral eye and, when a barn owl is fitted with optical displacing prisms or acoustic devices, auditory–visual realignment is accomplished predominantly by plasticity of the auditory inputs. Thus in both of these systems, plasticity appears to be highly asymmetric between the two sensory channels.
A simple explanation of these results would be that (contralateral) visual synapses are incapable of plasticity. However, there is extensive evidence of visual plasticity throughout the brain, including in the projection from the contralateral retina to the optic tectum. The retinotectal synapses in the tadpole can be potentiated or depressed, depending on the temporal relationship between presynaptic and postsynaptic activity (Zhang et al. 1998). Tectal neurons can rapidly develop directional sensitivity after stimulation with moving visual stimuli (Engert et al. 2002). Additionally, in a variety of model systems, when part of the tectum is ablated, the retinal projections compress into the remainder of the tectum. Conversely, when part of the retina is ablated, the remaining projections expand to fill the entire tectum (Udin and Fawcett 1988). When an additional eye is surgically implanted in a tadpole, the projections from the implanted eye to the optic tectum form ocular dominance columns that interdigitate with the tectal projections from the contralateral eye (Constantine-Paton and Law 1978; Reh and Constantine-Paton 1985).
Given that both input channels are capable of plasticity, the naïve expectation is that Hebbian learning would cause both input channels to shift equally in response to a sensory misalignment. Contrary to this expectation, we find that Hebbian learning typically causes substantial asymmetry in the amount of plasticity across input channels, even when both channels are equally capable of plasticity. We show that small differences in the relative strength of drive and in the relative width of the receptive fields across input channels lead to dramatic differences in the plasticity expressed by each input channel. Thus relative differences in input strength and RF width across channels can result in a dominance of one input channel over another in guiding plasticity and could account for the development and maintenance of aligned sensory representations in the brain.
The architecture of our model is shown in Fig. 1. For concreteness, we refer to the two sensory input layers that converge onto the output cell as visual and auditory, similar to the organization in the superior colliculus or optic tectum (but they could just as well be spatiotopic representations of any two sensory parameters in any part of the brain). Each of the presynaptic sensory layers consists of a one-dimensional map of spatially selective cells. uai and uvi are the presynaptic firing rates of the ith cell in the auditory and visual layers, respectively. wai describes the synaptic weight between the ith neuron in the auditory layer and the postsynaptic cell. Similarly, wvi describes the weight between the ith neuron in the visual layer and the postsynaptic cell. The postsynaptic neuron has a firing rate r.
In the left column of Fig. 1, the visual field is not displaced, and auditory and visual activity arising from the same stimulus object are aligned across the two input layers. In the right column of Fig. 1, visual and auditory activities arising from the same stimulus object are misaligned across the two input layers, as would result, for example, from a change in eye position or optical prisms, or from a systematic misrouting of connections during development. The projection of the visual field onto the visual layer was displaced by 45° to the left.
Auditory and visual stimuli created Gaussian distributions of activity in the presynaptic layers that were centered at θa(t) and θv(t), the position of the auditory and the visual stimuli in the outside world. The ith cell in either layer fired maximally in response to a stimulus positioned at θi in the external world. The distributions of activity in the presynaptic layers were modeled as follows (1)
σa and σv represent the widths of the auditory and visual activity distributions. Na and Nv represent the normalization terms for the Gaussians such that ∑i uvi(t) = z and ∑i uai(t) = kz, where N is the number of neurons in each of the presynaptic layers and z is a scaling factor. Thus k denotes the ratio of the total activity in the auditory and the visual presynaptic layers (2)
As derived in the supplemental materials, given the above constraints, the normalization terms are (3)
The relative width of auditory and visual activity distributions is determined by the variable b (4)
The binary variables Sa and Sv, which can be either 0 or 1, represent whether the auditory or visual stimulus was present at a given point in time (0 corresponds to not present; 1 corresponds to present). Assuming a probability p of the stimulus being present at a given point in time, the average values of these variables are as follows (5) The variable f denotes the strength of temporal correlation between the auditory and visual presynaptic activity. If there were perfect correlation between auditory and visual presynaptic activity, f would equal 1. In reality, f is expected to be significantly <1 because most stimuli are not bimodal. If auditory and visual activity are uncorrelated, f = p. If the two modalities are anticorrelated, f vanishes. Since p is likely to be ≪1, we will consider here the entire range between zero and 1 as the relevant range of f.
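The presence statistics described by Eq. 5 can be realized numerically. The sketch below is our own illustrative construction, not code from this work: it assumes that f can be read as the conditional probability that the visual stimulus is present given that the auditory stimulus is present, with equal marginal probabilities p for both modalities. Under that assumption, f = 1 gives perfect correlation, f = p independence, and f = 0 anticorrelation, matching the text.

```python
import numpy as np

def sample_presence(p, f, n, seed=0):
    """Draw n time points of the binary presence variables (S_a, S_v),
    assuming f = P(S_v = 1 | S_a = 1) and equal marginals P(S_a = 1) =
    P(S_v = 1) = p (an illustrative parameterization)."""
    rng = np.random.default_rng(seed)
    s_a = rng.random(n) < p
    # Choose P(S_v = 1 | S_a = 0) so that the marginal of S_v is also p:
    # p = f*p + q*(1 - p)  =>  q = p*(1 - f)/(1 - p).
    q = p * (1.0 - f) / (1.0 - p)
    s_v = np.where(s_a, rng.random(n) < f, rng.random(n) < q)
    return s_a.astype(int), s_v.astype(int)
```

With this construction the marginal rate of each modality stays at p while f alone sets how often the two stimuli co-occur.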
Under standard conditions, when auditory and visual stimuli originate from the same object, they activate neurons encoding the same position of space in the auditory and visual presynaptic layers (6)
The left column of Fig. 2 shows the distributions of presynaptic activity for an example stimulus positioned straight ahead (at 0°) in the external world, when the visual field is not displaced. The activity representing these stimuli is centered at 0° in both the auditory and the visual presynaptic layers.
An experimentally induced spatial displacement of the visual field of magnitude φ creates an offset between the distributions of activity in the auditory and visual presynaptic layers (7)
The right column of Fig. 2 shows the distributions of presynaptic activity for an example stimulus positioned at 0° in the external world, in the presence of a leftward displacement of the visual field (φ = 45°). The auditory activity is again centered at 0°, but the visual activity is now centered at −45°, displaced to the left in the visual presynaptic layer.
To avoid the confound of boundary conditions, we consider the neurons in each of the presynaptic layers as positioned around a ring, with positions ranging from −180 to 180° in steps of 0.5°. However, the model gave similar results when we positioned the neurons on a line.
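For concreteness, the presynaptic activity profiles of Eqs. 1–4 can be sketched numerically on the ring. Here the Gaussians are rescaled by their numerical sums rather than by the closed-form terms Na and Nv of Eq. 3 (derived in the supplemental materials); the function names are ours.

```python
import numpy as np

# Preferred positions on a ring, -180 to 180 degrees in 0.5-degree steps.
theta = np.arange(-180.0, 180.0, 0.5)

def circ_dist(a, b):
    """Smallest signed angular difference on the ring, in degrees."""
    return (a - b + 180.0) % 360.0 - 180.0

def presynaptic_activity(stim_pos, sigma, total):
    """Gaussian bump of activity centered on the stimulus position,
    rescaled so the summed activity equals `total` (z for the visual
    layer, k*z for the auditory layer, as in Eq. 2)."""
    u = np.exp(-circ_dist(theta, stim_pos) ** 2 / (2.0 * sigma ** 2))
    return total * u / u.sum()

# Example: stimulus straight ahead (0 deg); auditory tuning broader by
# b = 1.5, equal total activity across layers (k = 1).
z, k, b, sigma_v = 1.0, 1.0, 1.5, 5.0
u_v = presynaptic_activity(0.0, sigma_v, z)
u_a = presynaptic_activity(0.0, b * sigma_v, k * z)
```

Because the two profiles have equal summed activity, the broader auditory bump necessarily has the lower peak, which is the configuration used in many of the simulations below.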
Postsynaptic firing rate
The postsynaptic firing rate r is described by a simple firing rate model (8)
We assume that the timescale τr of the adjustment of the firing rate of the postsynaptic neuron is short relative to the timescale of the sensory stimulation as well as the modulation in a neuron's synaptic weights; thus Eq. 8 reduces to (9)
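Assuming that Eq. 8 takes the standard linear form of a rate model, τr dr/dt = −r + Σi(wai uai + wvi uvi), the steady state of Eq. 9 is simply the instantaneous weighted sum of the presynaptic rates. The sketch below makes that assumption explicit; the exact form of Eq. 8 is not reproduced here.

```python
import numpy as np

def steady_state_rate(w_a, u_a, w_v, u_v):
    """Steady-state postsynaptic rate, assuming a linear rate model
    tau_r * dr/dt = -r + sum_i(w_ai*u_ai + w_vi*u_vi): when tau_r is
    short relative to the stimulus timescale, r relaxes to the
    instantaneous weighted input."""
    return np.dot(w_a, u_a) + np.dot(w_v, u_v)

# Example: uniform weights and uniform presynaptic activity.
r = steady_state_rate(np.ones(4), np.full(4, 0.5), np.ones(4), np.full(4, 0.5))
```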
The synaptic weights were adjusted according to a variant of a Hebbian learning rule (10) where (11)
The terms Hai and Hvi comprise several components that are summed and thresholded above zero. The threshold prevents runaway growth of the synaptic strengths; its specific form, however, is not crucial for our results, because replacing the threshold nonlinearity with a sigmoidal nonlinearity does not affect our central findings. The first component of Hai and Hvi is the Hebbian correlation term 〈ruai〉 or 〈ruvi〉, the time-averaged product of the firing rates of the postsynaptic neuron (r) and the ith presynaptic neuron (uai or uvi). The second component is a subtractive suppressive term, −I ∑j waj or −I ∑j wvj, which suppresses the growth of the total synaptic weight of each sensory modality. Finally, a constant potentiation term, a, strengthens each synapse at each iteration and prevents the decay of all the weights to zero. The terms −wai and −wvi in Eq. 10 are local weight decay terms that guarantee the overall stabilization of the pattern of synaptic weights.
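The components described above can be assembled for one channel as follows. The grouping of terms inside the threshold is our reading of Eqs. 10–11, so the sketch should be taken as illustrative rather than definitive.

```python
import numpy as np

def hebbian_force(corr, w, I, a):
    """Right-hand side of the weight dynamics for one channel (a sketch
    of Eqs. 10-11): a thresholded Hebbian drive minus a local decay term.
    corr : time-averaged Hebbian term <r * u_i> for each synapse
    w    : current synaptic weights of the same channel
    I    : gain of the subtractive suppressive term -I * sum_j(w_j)
    a    : constant potentiation added to every synapse
    """
    H = np.maximum(corr - I * w.sum() + a, 0.0)  # threshold at zero
    return -w + H  # local decay stabilizes the weight pattern
```

With zero correlation and zero weights, the constant potentiation a is the only positive drive, which is what prevents all weights from decaying to zero; once the total weight is large, the suppressive term pushes H below threshold and only the decay remains.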
We combine Eqs. 9–11, taking out the weights wa and wv from the time averaging, under the assumption that they change much more slowly in time than the presynaptic and postsynaptic firing rates. This yields the following equations (12)
The Cij terms correspond to the time-averaged correlation in presynaptic activity (13)
As derived in the supplemental materials, under the assumption that stimulus positions are uniformly distributed in space, the correlations correspond to Gaussian distributions, with amplitudes Jaa and Jvv (14)
The intramodal correlations are centered at θi = θj because a neuron's firing rate is always most correlated with its own firing rate (Fig. 3A). This is true regardless of the presence of a spatial displacement of the visual field. For the auditory and visual intramodal correlations depicted in Fig. 3A, the auditory correlation is broader than the visual, reflecting the fact that, for this example, the distribution of auditory presynaptic activity is broader than that of the visual (b = 1.5, k = 1).
Under standard conditions, the crossmodal correlation terms (Cijav and Cijva) are also centered at the position θi = θj because correlated auditory and visual stimuli are aligned in space and the visual field is not displaced (Fig. 3B, solid line) (15)
A displacement of the visual field causes the crossmodal terms to be offset from θi = θj by the displacement φ because correlated auditory and visual activity distributions in the presynaptic layers are now misaligned (Fig. 3B, dotted line and dashed line) (16) Cijav and Cijva are offset from θi = θj in opposite directions. Cijav is centered at θi = θj − φ since an auditory neuron at position i is most correlated with a visual neuron a distance −φ away (Fig. 3B, dotted line). Conversely, Cijva is centered at θi = θj + φ since a visual neuron at position i is most correlated with an auditory neuron a distance +φ away (Fig. 3B, dashed line).
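The offset of the crossmodal correlation can be checked numerically by averaging the product of auditory and visual activity over uniformly distributed stimulus positions, the assumption behind Eq. 14. The parameter values and function names below are illustrative.

```python
import numpy as np

theta = np.arange(-180.0, 180.0, 0.5)   # preferred positions on the ring

def circ_dist(a, b):
    return (a - b + 180.0) % 360.0 - 180.0

def bump(center, sigma):
    """Normalized Gaussian bump of presynaptic activity on the ring."""
    u = np.exp(-circ_dist(theta, center) ** 2 / (2.0 * sigma ** 2))
    return u / u.sum()

# Estimate C^{av} for one auditory neuron preferring 0 degrees by
# averaging over a uniform grid of stimulus positions.
phi, sigma_a, sigma_v = 45.0, 7.5, 5.0
i_aud = np.argmin(np.abs(theta))          # auditory neuron at 0 degrees
C_av = np.zeros_like(theta)
stim_positions = np.arange(-180.0, 180.0, 2.0)
for s in stim_positions:
    u_a = bump(s, sigma_a)                # auditory activity at the stimulus
    u_v = bump(s - phi, sigma_v)          # visual activity displaced leftward
    C_av += u_a[i_aud] * u_v
C_av /= len(stim_positions)

# The crossmodal correlation peaks at the visual neuron a distance -phi away.
peak = theta[np.argmax(C_av)]
```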
The magnitudes of the correlations Jaa, Jav, and Jvv are related by k, the relative strength of auditory and visual presynaptic activity, and f, the correlation between auditory and visual presynaptic activity (17)
The width of the crossmodal correlation σav is a function of the width of the distributions of presynaptic activity (18)
Alternate forms of the model
To test whether the results were dependent on the specific form of the synaptic suppressive term, we introduced two alternate versions of the model. The first (“Alternate Form A”) included a multiplicative normalization in addition to the subtractive suppressive term. We added a multiplicative term to the model, rather than replacing our subtractive term with a multiplicative one, because a multiplicative normalization on its own fails to generate restricted receptive fields (Miller and MacKay 1994). In “Alternate Form A,” at every iteration, the synaptic weights are normalized by their sum and multiplied by a constant c, as shown (19)
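A single normalization step of Eq. 19 might be sketched as follows, assuming the rescaling is applied separately to each channel's weights:

```python
import numpy as np

def multiplicative_normalize(w, c):
    """One "Alternate Form A" normalization step: rescale one channel's
    synaptic weights so that they sum to the constant c. Applied at
    every iteration, in addition to the subtractive suppressive term."""
    return c * w / w.sum()

# Example: weights rescaled to a total of 10 while keeping their shape.
w_new = multiplicative_normalize(np.array([1.0, 2.0, 3.0]), 10.0)
```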
The second alternate form of the model (“Alternate Form B”) was a linear variant of a Bienenstock–Cooper–Munro (BCM) learning rule (Bienenstock et al. 1982), in which a sliding threshold replaces the original subtractive suppressive term. We replace Eq. 11 with the following terms (20)
This sliding threshold results in the “Alternate Form B” as follows (22)
Additionally, to test whether the results of the model were sensitive to the form of the nonlinearity, we developed an alternate form of the model with a different nonlinearity. In this third alternate form of the model (“Alternate Form C”), the threshold nonlinearity in Eq. 11 is replaced with a sigmoidal nonlinearity (23)
Therefore “Alternate Form C” takes the form (24)
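A standard logistic sigmoid could serve as the nonlinearity of Eq. 23; the slope beta and midpoint x0 below are illustrative parameters, not values taken from this work.

```python
import numpy as np

def sigmoid_nl(x, beta=1.0, x0=0.0):
    """Logistic sigmoid used in place of the threshold [x]_+ in
    "Alternate Form C": smooth, bounded, and monotonically increasing."""
    return 1.0 / (1.0 + np.exp(-beta * (x - x0)))

y = sigmoid_nl(np.array([-10.0, 0.0, 10.0]))
```

Unlike the hard threshold, the sigmoid saturates for large drive, but it preserves the qualitative property that weak drive is suppressed and strong drive is passed through, which is why the central findings are unaffected.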
Assessment of postsynaptic RFs
The auditory and visual RFs of the postsynaptic neuron [Fa(θi) and Fv(θi)] were calculated by convolving the presynaptic response to a stimulus at a given position, θi, with the corresponding synaptic weights (25)
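Eq. 25 amounts to taking, for each probe position, the dot product of the synaptic weights with the presynaptic activity pattern evoked at that position; a sketch:

```python
import numpy as np

theta = np.arange(-180.0, 180.0, 0.5)   # probe positions on the ring

def circ_dist(a, b):
    return (a - b + 180.0) % 360.0 - 180.0

def receptive_field(w, sigma):
    """RF of the postsynaptic neuron for one modality (Eq. 25 sketch):
    the response to a stimulus at each probe position is the normalized
    presynaptic activity pattern it evokes, weighted by the synapses."""
    F = np.empty_like(theta)
    for i, pos in enumerate(theta):
        u = np.exp(-circ_dist(theta, pos) ** 2 / (2.0 * sigma ** 2))
        F[i] = np.dot(w, u / u.sum())
    return F

# Example: weights concentrated at 0 deg yield an RF peaked at 0 deg.
w = np.exp(-circ_dist(theta, 0.0) ** 2 / (2.0 * 5.0 ** 2))
F = receptive_field(w, 5.0)
```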
To update the synaptic weights, the set of differential equations (Eq. 12) was solved numerically in Matlab using the Euler method. Gaussian distributed noise, scaled by the square root of the time step, was added at each iteration to simulate biological noise.
In our simulations, Jvv was fixed at 2.5, σv was fixed at 5°, and I was fixed at 100, unless otherwise stated. The synaptic weights were initialized as Gaussian distributions centered at 0° (amplitude of 1; width at half-maximum of 10°). Before the spatial displacement of the visual field was applied, the simulation was run for 30 time points to allow the weights to stabilize. The constant potentiation term a was set to 1. The magnitude of the noise that was added to the weights at each time point was 0.001.
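The integration scheme described above, with noise scaled by the square root of the time step, is the standard Euler–Maruyama discretization; a single update step might look as follows (the function and parameter names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)

def euler_step(w, force, dt, noise_sd=0.001):
    """One Euler-Maruyama update of dw/dt = force: the deterministic
    part advances by dt, and the Gaussian noise is scaled by sqrt(dt)
    so that its variance accumulates linearly in simulated time."""
    return w + dt * force + noise_sd * np.sqrt(dt) * rng.standard_normal(w.shape)
```

The sqrt(dt) scaling ensures that halving the time step does not change the statistics of the accumulated noise, so the simulated biological noise is independent of the numerical resolution.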
The important free parameters that were studied in this work are listed in Table 1.
The goal of this work was to understand the dynamics of the compensatory plasticity that occurs in response to a spatial displacement of one input channel relative to another when both channels are equally plastic. In particular, we were interested in how plasticity divides between different input channels that converge onto a postsynaptic cell when the input channels are spatially misaligned. For concreteness, we refer to the two sensory input layers as visual and auditory, similar to the organization in the optic tectum.
Our paradigm for inducing synaptic plasticity in the auditory and visual channels involved two stages. First, the synaptic weights were initialized by running the simulation with a nondisplaced visual field. Then, after the synaptic weights had stabilized, we introduced a displacement in the visual field. The auditory and visual RFs of the postsynaptic neuron served as the read-out of plasticity. We studied the influences of three factors in determining the extent and dynamics of RF plasticity for each modality (Table 1): 1) differences in the profile of presynaptic activity between the two modalities (b and k), 2) the strength of the crossmodal correlation (f), and 3) the magnitude of the visual field displacement (φ).
Dependence of plasticity on auditory and visual activity profiles
Identical auditory and visual presynaptic activity profiles (b = 1, k = 1) result in two equally likely outcomes: either the auditory or the visual RF shifts in response to a spatial displacement of the visual field. Before the visual field is displaced, the auditory and visual RFs of the postsynaptic neuron are identical (φ = 0°, Fig. 4A). The 45° leftward displacement of the visual field (φ = 45°, Fig. 4, B and C, white dotted line) causes the visual and auditory RFs to be temporarily misaligned. Shortly after exposure to the displaced visual field, RF plasticity occurs: it reinstates mutual alignment of the RFs across the two modalities, and the response profiles of the two RFs return to those exhibited before the manipulation. The position of the RFs stabilizes in one of two possible configurations, with equal probability. In one configuration, the visual RF shifts to the left (Fig. 4B, top) and the auditory RF remains at its original position (Fig. 4B, bottom); in the other, the auditory RF shifts to the right (Fig. 4C, bottom) and the visual RF remains at the displaced position (Fig. 4C, top). We call these dynamics “winner-take-all” because only one of the two modalities ultimately shifts. In the winner-take-all regime, the RF of the plastic modality jumps from the original position to the new, aligned position without transitioning through intermediate positions. The reason for this nonlinear behavior is explained later in this section.
Although ultimately the RF for only one of the modalities shifts, there are transient changes in the RF for the other modality as well. In response to the displacement, both RFs weaken at the original positions and they begin to grow at a new position (45° for auditory, 0° for visual; Fig. 4, B and C). However, the RF stabilizes at the new position for only one of the two modalities, whereas for the other modality, the RF returns to its original shape and position.
Small changes in the relative width or the relative strength of presynaptic activity cause the symmetry between auditory and visual plasticity to break. When the total strength of activity in the auditory presynaptic layer is reduced relative to that in the visual layer (b = 1; k = 0.9), the auditory RF always shifts completely and the visual RF always remains unchanged (Fig. 5, A and B). Similarly, when the width of the auditory presynaptic activity profile is increased relative to that of the visual profile, but the total strength of activity is the same across the modalities, the auditory RF always shifts completely, whereas the visual RF remains unchanged (Fig. 5, C and D).
The relationship between which modality shifts and the relative width and the relative total strength of presynaptic activity is shown in Fig. 6 (black line). A good predictor of which modality shifts is the relative strength of the mean presynaptic weights: the modality with the weaker mean synaptic weights tends to be the modality that shifts (Fig. 6, comparison of red dotted line and black line). The mean strength of the synaptic weights is determined by the presynaptic activity. Both decreasing the strength of the presynaptic activity and increasing the width of the presynaptic activity lead to a weakening of the mean synaptic weights, in turn leading to the asymmetry in the resulting plasticity, for reasons that are discussed in the following text.
To gain an intuition for how auditory and visual RFs shift to compensate for the displacement of the visual field, we must first understand the terms that drive plasticity. The auditory and visual synaptic weights are strengthened by two correlation terms (Eq. 12): the correlation of presynaptic activity within each modality (intramodal correlation; Cijaa and Cijvv) and the correlation of presynaptic activity across the two modalities (crossmodal correlation; Cijav and Cijva). The intramodal correlation strengthens the synaptic weights without causing them to shift position in the presynaptic layer, whereas the crossmodal correlation causes the weights to shift their position to compensate for the visual field displacement. In the absence of a spatial displacement of the visual field, the crossmodal correlation is centered at zero (Fig. 3B, solid line), driving the auditory and visual RFs to be aligned in space. In the presence of a spatial displacement of the visual field, the crossmodal correlation terms are offset by the magnitude of the displacement (Fig. 3B, dashed and dotted lines), driving the RFs for the two modalities to separate by the magnitude of the displacement.
The crossmodal correlation term is convolved with the visual synaptic weights (∑j Cijavwjv in Eq. 12) to drive auditory plasticity and is convolved with the auditory synaptic weights to drive visual plasticity (∑j Cijvawja in Eq. 12). Figure 7 displays the distributions of synaptic weights and the convolution of the weights with the correlation terms at several important time points in the simulation. Before the visual field displacement, the auditory and visual weights are centered at zero (Fig. 7A). Similarly, both the intramodal correlation convolved with the appropriate weights (Fig. 7D) and the crossmodal correlation convolved with the appropriate weights (Fig. 7G) are centered at zero. This is because both the intramodal and the crossmodal correlations are centered at zero for the nondisplaced visual field (Fig. 3, A and B, solid lines). Since the weights and convolution terms are aligned, there is no force driving the synaptic weights to shift. Immediately after the visual field displacement, there is not yet a change in the synaptic weights and the weights are still centered at zero (Fig. 7B). The intramodal term convolved with the appropriate weights also remains unchanged (Fig. 7E). However, there is an important change in the crossmodal term convolved with the synaptic weights. Since the crossmodal correlations are now offset from zero (Fig. 3B, dashed and dotted lines), the convolutions of the crossmodal correlations with the weights are now offset from zero (Fig. 7H). These terms now drive the auditory weights to shift to the right and the visual weights to shift to the left. After adaptation to the visual field displacement, the auditory weights are shifted to the right and the visual weights remain at the original position (Fig. 7C). Both the intramodal correlations convolved with the appropriate weights (Fig. 7F) and the crossmodal correlations convolved with the appropriate weights (Fig. 7I) are aligned with the weights, so there is no term driving further changes in the distributions of the weights.
Why does the auditory, and not the visual, RF shift when the auditory presynaptic activity is weaker or broader than the visual? When the presynaptic visual activity is either stronger or more narrowly distributed than the auditory presynaptic activity, the mean visual synaptic weights become stronger than the mean auditory weights (Fig. 6, red dotted line). As a result, the crossmodal term driving auditory plasticity (∑j Cijavwjv in Eq. 12; Fig. 7H, blue line) becomes larger than the crossmodal term driving visual plasticity (∑j Cijvawja in Eq. 12; Fig. 7H, red line), causing the auditory weights to shift preferentially (Fig. 7C). Notably, a broadening of the presynaptic activity leads to a weakening of the mean synaptic weights, even when the mean presynaptic activity is unchanged, an effect that is likely mediated by the nonlinearity (Eq. 12).
Regimes with different dynamics of plasticity
So far, we have discussed winner-take-all dynamics, in which the input channel with the weaker or broader RFs shifts by the entire visual field displacement, whereas the RFs of the other input channel remain at their original position (Figs. 4 and 5). Interestingly, the model produces two additional regimes with different plasticity dynamics: a mixed-shift regime and a no-shift regime. In the mixed-shift regime (Fig. 8A), the RFs of both modalities shift, moving continuously from one position to another (in contrast to the jump in RF position characteristic of the winner-take-all regime). Although both modalities shift to some extent in the mixed-shift regime, the modality with the weaker or broader input activity exhibits the greater shift (Fig. 8B). The boundary determining which modality exhibits the majority of the shift is similar, but not identical, to that in the winner-take-all regime (Fig. 6). In the no-shift regime (Fig. 8C), neither RF shifts, despite the displacement of the visual field, and the auditory and visual RFs remain misaligned.
The transition between these three dynamic regimes depends primarily on two parameters: the magnitude of the spatial displacement and the strength of the crossmodal correlation. The magnitude of the visual field displacement (φ) affects plasticity by determining the extent of the overlap between the crossmodal and the intramodal correlation terms (Fig. 3). As the visual field displacement increases, the overlap decreases. The strength of the crossmodal correlation affects the dynamics because it is the crossmodal correlation that drives plasticity. As the crossmodal correlation (f) approaches zero, the force driving plasticity approaches zero as well.
The boundaries of the three regimes (winner-take-all, mixed-shift, no-shift) are shown in Fig. 9A as a function of the visual field displacement (φ) and the crossmodal correlation (f). Understanding these boundaries requires considering the force acting on the synaptic weights after the visual field is displaced. Immediately after a visual field displacement, both auditory and visual synaptic weights decrease in magnitude at their original position because they are no longer strengthened by the crossmodal correlation term, which is shifted to a new position by the displacement. This causes a decrease in the postsynaptic firing rate during the time points following the visual field displacement (for example, see Fig. 4). The decayed synaptic weights can be estimated from the condition in which there is no crossmodal correlation (f = 0). We calculated the force on the decayed weights (right-hand side of Eq. 12) and used it to predict the regime boundaries. Note that here we consider the net force on the weights, incorporating the synaptic suppressive term and the nonlinearity, whereas Fig. 7 displays only the contributions from the correlation term.
The winner-take-all regime occurs when there is little overlap between the crossmodal and the intramodal correlations (large φ) and the crossmodal correlation is strong (large f). In this condition, the force on the synaptic weights is greater than zero for at least one of the modalities at the aligned position of the RF (because of the large value of f; Fig. 9B, bottom) and equal to zero at positions between the aligned position and the initial position for both modalities (because of the large value of φ). This distribution of the force causes the weights (for the auditory channel) to jump to the new position and grow rapidly through positive feedback (Fig. 9B).
The no-shift regime occurs when the crossmodal correlation (f) is small and the spatial displacement (φ) is large relative to the crossmodal correlation. In this condition, there is little overlap between the intramodal and the crossmodal correlations and the crossmodal correlation is weak. The force acting on the synaptic weights is near zero for all positions for both modalities, so neither RF shifts (“no-shift”; Fig. 9C).
The mixed-shift regime occurs when the spatial displacement (φ) is small. In this condition, there is substantial overlap between the crossmodal and the intramodal correlations after the visual field displacement. This overlap creates a positive driving force on the synaptic weights at all positions ranging from the initial to the aligned position, so the RFs shift continuously, and both modalities shift (“mixed-shift”; Fig. 9D). As shown in Fig. 8A for the same parameter values, the auditory RFs shift substantially more than the visual RFs.
After RFs shift, the RF shape recovers to that observed before plasticity (for example, see Fig. 8A). This is because both before the visual field was displaced and after the auditory RF shifts in response to the displaced visual field, the crossmodal correlation is aligned with the RF, so the force driving the synaptic weights is identical (Eq. 12). However, when the RF does not shift in response to a displaced visual field (“no-shift”; Fig. 8C), the RF remains in a weakened state. This is because the crossmodal correlation term remains misaligned with the RF, so the crossmodal correlation does not strengthen the synaptic weights at the RF.
An important implication of these distinct dynamical regimes is that the plasticity depends heavily on the magnitude of the visual field displacement. For smaller values of the crossmodal correlation (f), small visual field displacements correspond to the mixed-shift regime, whereas large visual field displacements correspond to the no-shift regime (Fig. 10A, inset; blue dashed line). The rate of plasticity is largest for values of the displacement near the middle of the mixed-shift regime (Fig. 10A). Additionally, because of the dependence of the plasticity regime on the magnitude of the visual field displacement, plasticity under conditions of low crossmodal correlation can be enabled and substantially enhanced through adaptation to multiple, small displacements of the visual field rather than a single, large displacement (Fig. 10B).
The dependence of the rate of plastic change in the mixed-shift regime on the magnitude of the displacement can be understood intuitively (Fig. 10A). For small displacements, plasticity is relatively slow because of the proximity of the initial and aligned positions, causing the force on the synaptic weights to be similar at the two positions. Therefore there is little differential shift of the synaptic weights toward the aligned position from the initial position. For large displacements, the rate of plasticity is also slow because there is little overlap between the intramodal and crossmodal correlations, causing the force on the weights to be small. The trade-off between these two factors (similar forces at the initial and aligned positions for small displacements and lack of overlap between the intramodal and crossmodal correlations for large displacements) causes the rate of plasticity to be largest for intermediate values of the displacement (Fig. 10A).
Robustness of the model to alternate forms of the Hebbian rule
To test whether our findings are a general consequence of Hebbian learning rather than of a specific modeling choice, we compared our model with several variants that implement different forms of synaptic suppression. Adding a multiplicative normalization of the synaptic weights to the model does not change the winner-take-all behavior (“Alternate Form A”, Fig. 11, A and B). Likewise, implementing the competition across synapses by a sliding threshold, similar to the BCM model, rather than by the subtractive suppressive term, does not alter the finding of winner-take-all dynamics (“Alternate Form B”, Fig. 11, C and D).
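For concreteness, a generic BCM-style update with a sliding threshold can be sketched as follows (a minimal illustration of the mechanism, not the exact equations of Alternate Form B; the learning rate eta and averaging time constant tau are placeholder values):

```python
import numpy as np

def bcm_update(w, pre, post, theta, eta=0.01, tau=100.0):
    # BCM-style rule: the postsynaptic term post * (post - theta)
    # changes sign at the threshold theta, so weak responses depress
    # and strong responses potentiate the active synapses.
    dw = eta * pre * post * (post - theta)
    # The threshold slides toward a running average of the squared
    # postsynaptic activity, providing competition across synapses
    # without an explicit subtractive suppressive term.
    theta += (post**2 - theta) / tau
    return w + dw, theta
```

A single update with pre = [1, 0, 0], post = 2, and theta = 1 potentiates only the active synapse and nudges the threshold upward.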
To test whether our results depend critically on the choice of nonlinearity, we compared the results of the model with a variant model that used an alternate form of the nonlinearity. Specifically, we replaced the threshold nonlinearity with a sigmoidal nonlinearity (“Alternate Form C”, Fig. 11, E and F). We found that winner-take-all behavior persists even with this very different form of the nonlinearity.
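The two classes of nonlinearity can be compared directly (a schematic sketch; the threshold, gain, and exact functional forms used in the simulations are not reproduced here):

```python
import numpy as np

def threshold_nl(u, theta=0.2):
    # Rectifying threshold: no output (and hence no plasticity drive)
    # for sub-threshold input.
    return np.maximum(u - theta, 0.0)

def sigmoid_nl(u, theta=0.2, gain=10.0):
    # Smooth sigmoidal alternative with a soft threshold at theta.
    return 1.0 / (1.0 + np.exp(-gain * (u - theta)))

u = np.linspace(0.0, 1.0, 101)
hard = threshold_nl(u)
soft = sigmoid_nl(u)
# Both variants suppress sub-threshold drive and pass supra-threshold
# drive, which is why the qualitative dynamics agree across forms.
```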
Plasticity dynamics for parameters chosen to approximate RF properties in barn owl OT
So far, we have explored the dynamics of the model under the conditions that the auditory presynaptic activity is slightly broader than the visual (b = 1.5) and the total strengths of the presynaptic activity are identical (k = 1). These parameter values allow for a clear differentiation between the three dynamic regimes. However, in experiments, much greater differences between RF properties have been reported. For example, in the optic tectum of the barn owl, auditory RFs are threefold broader than visual RFs (half-maximum auditory RF: 30°; half-maximum visual RF: 10°) (Knudsen 1982). We chose model parameters for the presynaptic activity to reflect these postsynaptic RF widths. However, the strengths of the responses across modalities have not been reported. Therefore we chose an auditory RF strength 50% larger than the visual RF strength, a difference intended to minimize the asymmetry in the resulting plasticity. For the model parameters that reflect these conditions (b = 3.8; k = 1.5; φ = 23°), although the dynamics are in the mixed-shift regime, there is still an extreme asymmetry in the plasticity expressed by each modality: the visual RF shifts by only 2°, whereas the auditory RF shifts by 21° (Fig. 12). Thus for these values of the presynaptic activity, the model behaves qualitatively similarly to the winner-take-all regime, with the auditory modality exhibiting the vast majority of the plasticity.
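As a rough sketch of this parameterization (assuming Gaussian presynaptic activity profiles, which is an illustrative simplification; b, k, and phi follow the values quoted above, with the 10° visual half-width as the reference scale):

```python
import numpy as np

x = np.linspace(-90.0, 90.0, 721)  # azimuth (deg), 0.25-deg grid

def activity_profile(x, center, hwhm, strength):
    # Gaussian activity profile parameterized by its half-width at
    # half-maximum (HWHM) and its peak strength.
    sigma = hwhm / np.sqrt(2.0 * np.log(2.0))
    return strength * np.exp(-(x - center) ** 2 / (2.0 * sigma ** 2))

# Visual channel: 10-deg half-width, unit strength, centered at 0 deg.
visual = activity_profile(x, center=0.0, hwhm=10.0, strength=1.0)
# Auditory channel: b = 3.8 times broader and k = 1.5 times stronger,
# displaced by phi = 23 deg (the imposed misalignment).
b, k, phi = 3.8, 1.5, 23.0
auditory = activity_profile(x, center=phi, hwhm=b * 10.0, strength=k)
```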
Discussion

In this work, we used a Hebbian model to explore the synaptic plasticity that results from a spatial displacement of one input representation relative to another. Our main discovery is that equal capacities for Hebbian plasticity in two input channels do not result in equal plasticity in the two channels under the vast majority of conditions. Small differences in the relative strength or width of RFs between the two channels have a dramatic impact on the plasticity that occurs, with the channel with the weaker or broader RFs exhibiting most or all of the plasticity. These principles, which are robust to the specific form of the model, apply not only to the maintenance of spatial registry across inputs, but also to the experience-dependent emergence of aligned representations in developing circuits.
Our model predicts different regimes of plasticity depending on the size of the sensory displacements and the strength of the correlation across the input channels. When the displacement is large and the crossmodal correlation is strong, our model results in winner-take-all dynamics, with all plasticity being expressed by the weaker or more broadly tuned channel. In the winner-take-all regime, the dominant process is a decline of the receptive field at the original position and a concomitant growth of the receptive field at the new position (for example, Fig. 4B). Even in this regime, the dynamics may be complex, with, for example, initial increases in the synaptic weights at the aligned positions for both channels (for example, Fig. 4B) before plasticity in one of the channels takes over. When the displacement is small, the mixed-shift regime occurs. In the mixed-shift regime, the process is dominated by a gradual shift of both receptive fields from their initial positions to the aligned positions. When the displacements are large and the correlation is small, the no-shift regime occurs.
Reports in the tectum of juvenile barn owls indicate that both a jump in the RF to a new position (winner-take-all) and a gradual shift of the RF position (mixed-shift) occur in response to the same visual field displacement (Brainard and Knudsen 1995). During the course of plasticity, at some sites, auditory RFs are bimodal, with a peak at both the original and the new positions, suggestive of winner-take-all dynamics (a jump in the auditory RF rather than a gradual shift). Meanwhile, at other sites, auditory RFs shift to intermediate degrees, which is suggestive of mixed-shift dynamics. These differences may reflect local differences in RF width and strength in a circuit operating near the boundary between the mixed-shift and winner-take-all regimes. Future experiments should vary the magnitude of the displacement and look for a transition between the dynamical regimes, as well as for a relationship between the auditory and visual RF properties at a particular site and the dynamics expressed at that site.
The observed asymmetry in the amount of plasticity expressed by each input channel is a purely dynamical phenomenon: when the input from one channel is displaced, the transient force on the synaptic weights of the weaker or more broadly tuned channel is stronger than the force on the synaptic weights of the stronger or more sharply tuned channel, giving rise to a strong imbalance in the trajectories of the synaptic plasticity across the two channels. In the winner-take-all regime, the force that drives plasticity is present only at the new location, but vanishes at intermediate locations. The force vanishes at intermediate locations because of the nonlinearity in the learning rule, coupled with a lack of overlap between the intramodal and crossmodal terms for large displacements. Because of this, there is no strengthening of the synaptic weights at intermediate locations, but instead there is a strengthening of the weights at the new location for the modality experiencing the stronger force.
The finding of winner-take-all plasticity dynamics holds not only for our main Hebbian model, but also for the three alternate Hebbian models we presented. In our main model, the relative mean strength of the synaptic weights, which is determined by both the relative strength and width of the presynaptic activity, is a good predictor of which modality shifts (Fig. 6, red vs. black line). It remains to be explored whether the relative mean weights or other criteria determine the “winning” modality in the three alternate models. In particular, for the case of the multiplicative normalization (Alternate Form A), the relative mean strength of the synaptic weights is immaterial, since the weights are normalized at every iteration such that their mean strengths are identical for the two modalities.
Comparison to previous theoretical studies
An advantage of our Hebbian learning rule compared with previous rules is that the synaptic normalization is not a hard constraint. In other words, the mean strength of the weights is not fixed to a particular value, a constraint that is biologically implausible. Instead, the overall strength of the weights is determined by the nonlinearity and the subtractive suppressive terms, which together produce an approximate normalization. In contrast to many previous approaches, this yields synaptic weight profiles that are spatially constrained, smooth, and not saturated at their maximum values.
Another novel feature of our model is that the synaptic suppressive terms are local to each modality (see Eq. 11), rather than the fully global competition across modalities used in most previous competitive Hebbian models, such as those describing the development of ocular dominance columns (Miller 1996). Because of this difference, in our model, both input channels are represented by the postsynaptic cell, whereas in the previous models, ultimately only one of the two input channels drives the postsynaptic cell. This explains why our model does not reduce to previous models of ocular dominance plasticity (e.g., Miller 1996) when the crossmodal correlation is zero (f = 0).
For the postsynaptic neuron to be driven by both input modalities, the exact form of this intramodal competition is not important. What is important is that the competition within a modality is stronger than the one across modalities. This condition makes sense for all circuits that must combine information across input channels. An important question is whether such a specific competition is biologically realized. We hypothesize that in the case of the optic tectum, synaptic terminals from different input modalities segregate on different parts of the dendritic tree of the tectal neuron. If true, a local homeostatic suppressive term may provide the required mechanism. There is both experimental and theoretical evidence in support of such a mechanism. Synaptic normalization has recently been shown to occur independently at different synapses (Hou et al. 2008), although there is evidence of more global mechanisms as well (Ibata et al. 2008). Additionally, a theoretical study found that if synaptic normalization is based on local activity, then the resulting learning rule normalizes weights separately in different dendritic compartments (Rabinowitch and Segev 2006).
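The distinction between local (per-modality) and global normalization can be illustrated schematically (the model implements suppression through subtractive terms, Eq. 11; the multiplicative scaling below is only a simple stand-in to show why per-channel normalization preserves both inputs):

```python
import numpy as np

rng = np.random.default_rng(0)
w_aud = rng.random(40)  # auditory weights onto one tectal neuron
w_vis = rng.random(40)  # visual weights onto the same neuron

def normalize_per_channel(w, target_mean=0.5):
    # Local normalization: each channel's mean weight is scaled to its
    # own target, so neither modality can silence the other.
    return w * (target_mean / w.mean())

def normalize_global(w_a, w_b, target_mean=0.5):
    # Global competition: a single scale factor is shared across both
    # channels, so any imbalance between modalities is preserved and
    # one modality can come to dominate at the other's expense.
    scale = target_mean / np.concatenate([w_a, w_b]).mean()
    return w_a * scale, w_b * scale
```

Under per-channel normalization both modalities end with identical mean strengths; under global normalization the ratio of mean strengths between the modalities is unchanged.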
Our approach differs from previous theoretical studies of crossmodal plasticity in the barn owl. A previous Hebbian model reproduced experimental plasticity results based on the assumption that visual connections were not capable of plasticity (Gelfand et al. 1988). Another model involved a spike-timing–based mechanism to preferentially bias the circuit toward auditory (and not visual) plasticity (Mysore and Quartz 2005). Based on the results of our model, such assumptions may be unnecessary: differences in receptive field properties may be sufficient to lead to a great asymmetry in the division of plasticity across input channels. Another model assumed that auditory synapses were strengthened by a reward signal that was activated whenever the animal successfully foveated a target (Rucci et al. 1997), a mechanism that has not been supported by subsequent experiments (Hyde and Knudsen 2001).
In contrast to mechanistic approaches, other studies have implemented functional approaches, maximizing the mutual information between the stimulus and the neural output. They achieved plasticity of the auditory and not the visual RFs based on the assumption that either the signal-to-noise ratio (Kardar and Zee 2002) or the strength (Atwal 2004) of the auditory inputs is much less than that of the visual inputs.
Relationship to experimental findings
When inputs have been displaced in experimental systems, the resulting plasticity is often asymmetrical. For example, when barn owls are fitted with optical displacing prisms, auditory plasticity reinstates crossmodal alignment, whereas visual plasticity has not been observed, even though visual plasticity in the tectum has been reported in other species following other manipulations. Similarly, when juvenile frogs are raised with a rotated eye, plasticity of the input from the ipsilateral eye reinstates binocular alignment. In both cases, there is a dramatic asymmetry in the plasticity exhibited by the two input channels.
Our findings suggest that the imbalance of plasticity across input channels in these experiments could be explained mechanistically by differences in RF properties of the two channels. In the barn owl optic tectum, a site of crossmodal realignment (DeBello and Knudsen 2004), auditory RFs are threefold broader than visual RFs (Knudsen 1982). According to our model, this substantial difference in the RF width would cause auditory RFs to shift by the entire displacement in the winner-take-all regime and to shift by most of the displacement in the mixed-shift regime. Another site of auditory plasticity in the system is the external nucleus of the inferior colliculus (ICX), which is one step earlier in the auditory processing stream from the optic tectum. Given that the optic tectum exhibits plasticity in owls that show no plasticity in the ICX (DeBello and Knudsen 2004), and that the optic tectum is required for plasticity in the ICX (Hyde and Knudsen 2001), it is possible that the plasticity initially occurs in the optic tectum by the mechanism described here and then propagates back to the ICX.
The imbalance of plasticity observed in the frog optic tectum, a site of binocular realignment, could also be explained by differences in RF properties. Most sites are driven more strongly by inputs from the contralateral rather than the ipsilateral eye (Gaillard 1985). As we have shown, these differences in RF properties are sufficient to cause the observed asymmetry in RF plasticity, where the contralateral, and not the ipsilateral, RFs are plastic.
In experimental models, an important factor that affects the capacity for plasticity is the magnitude of the spatial displacement imposed on one of the input channels. For both adult barn owls adapting to auditory–visual misalignments, as well as frogs adapting to binocular misalignments, plasticity is enhanced through training with multiple, smaller displacements (Keating and Grant 1992; Keating et al. 1975; Linkenhoker and Knudsen 2002). Our computational model reproduces the enhanced plasticity resulting from incremental training (Fig. 10B). The inability to adapt to large displacements follows from the fact that as displacements increase, the model circuit transitions from the mixed-shift to the no-shift regime (Fig. 9A).
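The benefit of incremental over single-step training can be caricatured in a few lines (a toy dynamical sketch with an illustrative drive function and constants, not the full weight-based model):

```python
import numpy as np

def drive(gap, sigma=1.0):
    # Toy shift drive: grows with the remaining misalignment but
    # collapses once the gap far exceeds the RF width sigma,
    # mimicking the transition into the no-shift regime.
    return gap * np.exp(-gap ** 2 / (4 * sigma ** 2))

def adapt(targets, steps=2000, eta=0.05):
    # Relax the RF position toward each target displacement in turn.
    p = 0.0
    for t in targets:
        for _ in range(steps):
            p += eta * drive(t - p)
    return p

single = adapt([8.0])                      # one large displacement
incremental = adapt([2.0, 4.0, 6.0, 8.0])  # staged small displacements
```

In this caricature the single large displacement produces essentially no shift, whereas the staged sequence, each step of which stays within range of the drive, carries the RF the full distance.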
In our computational model, the strength of the crossmodal correlation plays an important role in determining the nature of the plasticity. The behavioral state of the animal during the period of plasticity could affect the strength of the crossmodal correlation. For example, during hunting, owls exhibit heightened attention to certain bimodal stimuli (e.g., a scurrying mouse). Attention is known to substantially increase firing rates (Desimone and Duncan 1995; Knudsen 2007; Reynolds and Chelazzi 2004). Higher firing rates in response to bimodal stimuli would result in a stronger crossmodal correlation. As the crossmodal correlation increases, the dynamics transition from the no-shift to the winner-take-all regime (Fig. 9A). This could explain the observation that adult owls have a greater capacity for plasticity when they are allowed to hunt live mice (Bergan et al. 2005).
Testable predictions for future experiments
One prediction of our model is that visual plasticity in the barn owl system should be maximized in the incremental training experiments because, in that case, the system should be in the mixed-shift regime. Gradual shifts in auditory RF position, as predicted by the mixed-shift regime, have indeed been observed in the optic tectum of the barn owl (Brainard and Knudsen 1995). However, shifts in visual RFs were not reported. This can be explained by the fact that auditory RFs are threefold wider than visual RFs (Knudsen 1982). Therefore according to the model, there should be very little change in the visual RF position, on the order of 1°. This magnitude of shift is well below the resolution of previous measurements. Future experiments could test for effects of incremental training on visual RFs in the tectum.
A second prediction of our model is that manipulating the relative quality of auditory and visual stimuli during learning should change the division of plasticity across the two modalities. For example, if owls were reared with both diffusing and displacing lenses, such that visual responses were significantly weaker than auditory responses, plasticity of visual RFs should result.
A third prediction is that responses should weaken in animals that do not adjust to the displacement of the visual field, as in Fig. 8C. Adult barn owls, which are normally not plastic in response to experience with a displaced visual field, should have weaker responses after experience with misaligned bimodal inputs compared with control animals. Indeed, weakened auditory responses in optic tecta that do not have adaptively shifted auditory RFs have been reported anecdotally (Brainard and Knudsen 1998).
Another example of plasticity occurs within the auditory system without reference to a visual input (Miller and Knudsen 2003). Plasticity occurs in the auditory thalamus after barn owls are fitted with an acoustic filtering device that alters auditory spatial cues in a frequency-dependent manner. In response to these devices, the tuning of each frequency channel shifts appropriately to regain spatial alignment across frequency channels. The division of the adaptive plasticity across frequency channels is unknown. Our model predicts that, under these conditions, the frequency channels with the broader or weaker RFs will be the ones to shift their patterns of inputs.
This work was supported by National Institutes of Health grants to E. I. Knudsen, a National Science Foundation Graduate Research Fellowship to I. B. Witten, and an Israeli Science Foundation grant to H. Sompolinsky.
We thank the faculty and students at the Methods in Computational Neuroscience course at the Marine Biological Laboratory, as well as M. Goldman, J. Bergan, D. Winkowski, S. Mysore, and A. Asadollahi for helpful comments on this manuscript.
The online version of this article contains supplemental data.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
- Copyright © 2008 by the American Physiological Society