We model the putative neuronal and synaptic mechanisms involved in learning a visual categorization task, taking inspiration from single-cell recordings in inferior temporal cortex (ITC). Our working hypothesis is that learning the categorization task involves both bottom-up [ITC to prefrontal cortex (PFC)] and top-down (PFC to ITC) synaptic plasticity and that the latter enhances the selectivity of the ITC neurons encoding the task-relevant features of the stimuli, thereby improving the signal-to-noise ratio. We test this hypothesis by modeling both areas and their connections with spiking neurons and plastic synapses, ITC acting as a feature-selective layer and PFC as a category coding layer. This minimal model gives interesting clues as to the properties and function of the selective feedback signal from PFC to ITC that help solve a categorization task. In particular, we show that, when the stimuli are very noisy because of a large number of nonrelevant features, the feedback structure helps achieve better categorization performance and shorter reaction times. It also affects the speed and stability of the learning process and sharpens the tuning curves of ITC neurons. Furthermore, the model predicts a modulation of neural activities during error trials, by which the differential selectivity of ITC neurons to task-relevant and task-irrelevant features diminishes or is even reversed, and modulations in the time course of neural activities that appear when, after learning, corrupted versions of the stimuli are input to the network.
- categorization learning
- Hebbian learning
- spiking neuronal network
- visual categorization
A large body of knowledge has accumulated about the brain areas involved in categorization across multiple sensory modalities: auditory (Vallabha et al. 2007), somatosensory (Romo and Salinas 2001), and olfactory (Howard et al. 2009) systems, with categorization of visual stimuli being the most studied (Knoblich et al. 2002).
Several open questions remain as to the specific roles played by the areas participating in the association between sensory stimuli and categories (Freedman and Assad 2011; Swaminathan and Freedman 2012) and the learning mechanisms involved. The neural correlates of the acquired association carry multiple traces of category-related modulations (Freedman et al. 2001, 2003; Sigala and Logothetis 2002; De Baene et al. 2008; Meyers et al. 2008); however, it is not always clear (especially for feature-encoding areas) whether such modulations are epiphenomenal reflections of category-specific neural signals generated elsewhere or they have an important computational role in the categorization process.
Among the brain areas involved in categorization, the prefrontal cortex (PFC) plays a fundamental role, notably for learning novel associations and encoding abstract rules (Seger and Miller 2010). Neurons in PFC show sharp response properties across boundaries between categories largely independent of stimulus similarity (Freedman et al. 2003). However, category-related actions/decisions involve multiple areas: premotor cortex for action planning (Boettiger and D'Esposito 2005; Muhammad et al. 2006); parietal cortex to implement visuospatial processing linking perceptual information with potential responses; basal ganglia to gate selectively cortical areas for choice of action (Humphries et al. 2006; Seger 2008); hippocampus and medial temporal lobe to encode and learn items to be categorized (Myers et al. 2003; Shohamy and Wagner 2008); the dopaminergic system and the associated plasticity of striatal and corticostriatal synapses to support reward-modulated learning; and the inferotemporal cortex (ITC), where a modular, feature-based representation has been observed (Tsunoda et al. 2001; Yamane et al. 2006) and where neurons with category-related tuning properties have been reported (Vogels 1999; Freedman et al. 2003).
A general plausible computational principle underlying the categorization process is that it must rely on the selection of relevant features in a specific behavioral context, and neurophysiological studies in ITC have indeed shown that the activity of neurons encoding sensory features relevant for the task is maximally modulated (see Sigala and Logothetis 2002; De Baene et al. 2008).
Starting from this evidence, the basic assumption of the present modeling work is that a mutual interaction between a feature-encoding (e.g., ITC) and a category-encoding (e.g., PFC) brain area is the fundamental neural substrate of categorization. We implement a learning scenario for the acquisition of an association between stimuli and categories and explore the consequences of top-down selective modulation of sensory representations.
We take as a relevant testing ground of our model the results of Sigala and Logothetis (2002) and De Baene et al. (2008), extending the scope of our previous work (Szabo et al. 2006) to the domain of dynamic, online learning, and from this we address more general questions about the computational role of learned selective feedback in a categorization task. As a significant advance over Szabo et al. (2006), in the present work we face the full complexity of the ongoing dynamic coupling between spiking activity induced by stimuli and the spike-driven, local synaptic dynamics. Beyond the appeal in terms of biological plausibility, this entails, inter alia, coping with the finite-size effects that are important in determining learning histories (see Del Giudice et al. 2003), due to the distribution of firing rates and the consequent distribution of rates of synaptic changes.
Based on simulations of a multimodular architecture composed of spiking (integrate-and-fire) neurons and plastic, spike-driven synapses, we will indeed show that successful learning histories emerge naturally through a combination of Hebbian plasticity for correct trials and partially anti-Hebbian plasticity for error trials. The learnt top-down synaptic structure produces better performances and faster responses. Besides reproducing, in correct trials, modulation of neural activity in ITC modules, qualitatively similar to the one observed in Sigala and Logothetis (2002), the model generates specific neurophysiological and behavioral predictions, including a different and specific tuning during error trials: the selectivity of the task-relevant feature neurons is diminished or even reversed.
Neuronal and synaptic dynamics.
Our neural model is the single-compartment linear integrate-and-fire neuron (Fusi and Mattia 1999). The subthreshold dynamics of the membrane potential of neuron i is: V̇i(t) = −β + Ii(t) (assuming units such that the membrane capacitance C = 1), with a reflecting barrier condition such that if Vi(t) is driven below 0, it stays at 0. β is a constant leakage term. When the membrane potential reaches the threshold θ = 1, the neuron emits a spike and the membrane potential Vi is set and kept to a reset potential Vr for a refractory period τarp. Ii(t) is the total synaptic current afferent to neuron i, and it is the sum of the external excitatory current Iext and the recurrent excitatory and inhibitory currents: Ii(t) = ∑j Jij ∑k δ(t − tjk − δj) + Iext. Jij represents the amplitude of the instantaneous change of the postsynaptic potential (positive for excitatory synapses and negative for inhibitory synapses). The sums are over all presynaptic neurons j and, for each j, over all the emitted spikes at times tjk, reaching the target neuron i with delay δj. Delays are randomly sampled from a truncated exponential distribution with a minimum and a maximum value, δjm and δjM, respectively.
Parameters are listed in Table 1.
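As an illustration of the neuronal dynamics described above, the following minimal sketch integrates the linear integrate-and-fire equation in discrete time steps. It is not the event-driven implementation used for the simulations; the function name `simulate_lif`, the step-based refractory period, and all default parameter values are assumptions chosen for illustration only (the actual values are those of Table 1).

```python
def simulate_lif(current, beta=0.125, theta=1.0, v_reset=0.0, t_arp=2, dt=1.0):
    """Euler integration of dV/dt = -beta + I(t) with a reflecting barrier at 0.

    current: list with one input value I(t) per time step (hypothetical units).
    t_arp:   refractory period, expressed in time steps (an assumption).
    Returns the list of step indices at which a spike is emitted.
    """
    v = 0.0
    refractory_left = 0
    spikes = []
    for step, i_t in enumerate(current):
        if refractory_left > 0:
            # During the refractory period V is kept at the reset potential.
            v = v_reset
            refractory_left -= 1
            continue
        v += dt * (-beta + i_t)      # constant leak plus afferent current
        if v < 0.0:
            v = 0.0                  # reflecting barrier: V cannot go below 0
        if v >= theta:
            spikes.append(step)      # threshold crossing: emit a spike
            v = v_reset
            refractory_left = t_arp
    return spikes
```

With a constant net drive of 0.25 per step, for example, the threshold θ = 1 is reached every four integration steps, separated by the refractory period.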
Plastic synapses in the model are bistable and stochastic as motivated and described in Fusi et al. (2000). The synaptic efficacy J takes one of two values, J− (depressed) and J+ (potentiated). Learning evolves as a sequence of random transitions between J− and J+, triggered by the arrival of presynaptic spikes; the direction of the transition (potentiation or depression) is determined by the instantaneous value of the postsynaptic potential (Fusi et al. 2000), as detailed in the following sections (see Table 3).
Each synapse has an internal dimensionless variable (“synaptic potential,” XJ), ranging in the interval [0, 1]. This range is split in two by a threshold θJ; when XJ is above this threshold, the synapse is in the potentiated state and XJ constantly moves toward 1, with a constant drift αX; below θJ, the synapse is depressed and XJ moves toward 0, with the same drift αX. Thanks to the drift, in the absence of presynaptic spikes, the synaptic state (potentiated or depressed) never changes [thus we have long-term potentiation (LTP) or long-term depression (LTD)]. Transitions can happen only upon the arrival (at time tk) of a presynaptic spike (with index k), which causes a sudden jump in XJ. If the postsynaptic potential is found above a threshold θV, the jump will be positive and of size dX+; otherwise it will be negative and of size dX−. A positive jump can take XJ above θJ, thus making the synapse switch to the potentiated state; conversely, a negative jump can make the synapse switch to the depressed state. In summary, LTP results from high presynaptic activity (high rate of triggering presynaptic spikes) and high postsynaptic activity (which on average implies high values of postsynaptic membrane potential). LTD occurs for a highly active presynaptic neuron and a poorly active postsynaptic one.
Parameter values for the synaptic dynamics are listed in Table 2.
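The bistable synaptic dynamics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulation code: the class name `BistableSynapse`, its method names, and all default parameter values are hypothetical (the values actually used are those of Table 2).

```python
class BistableSynapse:
    """Internal variable x in [0, 1], split by theta_j into two stable states."""

    def __init__(self, x=0.0, theta_j=0.5, alpha=0.0625,
                 dx_plus=0.375, dx_minus=0.375, theta_v=0.8):
        self.x = x
        self.theta_j = theta_j    # threshold separating depressed/potentiated
        self.alpha = alpha        # constant drift toward 0 or 1
        self.dx_plus = dx_plus    # upward jump size
        self.dx_minus = dx_minus  # downward jump size
        self.theta_v = theta_v    # threshold on the postsynaptic potential

    @property
    def potentiated(self):
        return self.x > self.theta_j

    def drift(self, dt):
        """Between presynaptic spikes x relaxes to the nearest stable value."""
        if self.potentiated:
            self.x = min(1.0, self.x + self.alpha * dt)
        else:
            self.x = max(0.0, self.x - self.alpha * dt)

    def on_presynaptic_spike(self, v_post):
        """Jump up if the postsynaptic potential is above theta_v, else down."""
        if v_post > self.theta_v:
            self.x = min(1.0, self.x + self.dx_plus)
        else:
            self.x = max(0.0, self.x - self.dx_minus)
```

In this sketch a single upward jump from x = 0 is not enough to cross θJ, so potentiation requires presynaptic spikes to arrive in close succession while the postsynaptic potential is high, which is the rate dependence stated in the text.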
Network architecture and stimuli.
We set up a network with two layers, analogous to the one described in Szabo et al. (2006) (see Fig. 1), meant to describe, respectively, a cortical area coding for the visual features defining each stimulus and a higher cortical area coding for the category assignment to stimuli according to a rule to be learnt. Each layer includes 6,000 neurons, divided into several selective populations of excitatory neurons (see below), one nonselective excitatory “background” population and one inhibitory population.
The first layer (“ITC”) comprises NF + 1 feature-selective populations of 240 neurons each (denoted with the letters D and O in Fig. 1, see Task and learning for the role played by the different populations). In the absence of stimuli, all the neurons in each population receive a background (excitatory) Poissonian synaptic input of base-rate λ0. Each stimulus to ITC is defined by the activation of one of the two values of each feature (e.g., small vs. large distance between eyes in a face, see Fig. 3); upon stimulation two disjoint subsets of 120 neurons for each ITC population (D1–D2 and O1–O2: NF = 1 in Fig. 1) will receive a differential current (except for Selective top-down synapses sharpen tuning curve in which population D will be divided into four subsets of 120 neurons each, to code for four values); the active (inactive) value corresponds to an input Poisson spike train with rate λ + Δλ (λ − Δλ), with λ a reference value; λ ± Δλ > λ0. The value of Δλ can be varied (see Influence of selective top-down synapses on network performances). Corrupted stimuli are implemented (see Corrupting learned stimuli: a footprint of categorization in ITC) by reducing the size of selective populations to a fraction x < 1 of the original size, the remaining (1 − x) 240 neurons being given a stimulus-independent current of rate λ0. NF too will be varied in the following sections to study the effects of a more or less complex feature space; NF = 16 unless otherwise specified.
Stimulation parameters are listed in Table 1.
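To make the stimulus definition concrete, the sketch below draws a random stimulus as a choice of the active value for each feature and assigns the corresponding Poisson input rates to the two subsets of each ITC population. The function name `make_stimulus` and the tuple layout are assumptions introduced for illustration; λ = 4.2 Hz and Δλ = 0.3 Hz are the values quoted in the text.

```python
import random


def make_stimulus(n_features, lam=4.2, dlam=0.3, rng=random):
    """Return one (active_value, rate_subset1, rate_subset2) tuple per feature.

    The subset coding the active value receives rate lam + dlam,
    the other subset of the same population receives lam - dlam.
    """
    stimulus = []
    for _ in range(n_features):
        active = rng.choice((1, 2))              # which value of the feature is on
        rates = {active: lam + dlam, 3 - active: lam - dlam}
        stimulus.append((active, rates[1], rates[2]))
    return stimulus
```

For NF = 16 nondiagnostic features plus one diagnostic feature, `make_stimulus(17)` returns 17 such tuples.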
The second layer (“PFC”) has a winner-take-all structure, similar to the one described in Wang (2002) in which two cooperating-competing populations (480 neurons each) encode the two categories, C1 and C2; in the regime of interest, upon stimulation of the ITC populations, the network dynamics always leads to a stable state where C1 is firing high and C2 is almost silent or vice versa, signaling a decision of the network as to which category the presented stimulus is assigned to.
All neurons have a probability c = 25% of being synaptically connected with any other neuron in the network, with the exception that excitatory-to-inhibitory and inhibitory-to-excitatory connections are restricted to each of the two layers (“local” inhibition). C1 and C2, as well as the subsets in ITC, are self-excited and mutually excited; besides, they are reciprocally connected with the excitatory background and inhibitory populations in the corresponding layers (see Table 1 and Fig. 1).
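The connectivity rule just stated can be summarized by the following sketch, in which a connection between two neurons is drawn with probability c = 25% unless it is an excitatory-to-inhibitory or inhibitory-to-excitatory connection across layers. The function name `connect` and the dictionary representation of a neuron are hypothetical, introduced only for illustration.

```python
import random


def connect(pre, post, c=0.25, rng=random):
    """Decide whether a synapse from `pre` to `post` exists.

    pre, post: dicts with keys 'layer' ('ITC' or 'PFC') and 'type' ('E' or 'I').
    """
    cross_layer = pre["layer"] != post["layer"]
    mixed_type = pre["type"] != post["type"]     # E-to-I or I-to-E
    if cross_layer and mixed_type:
        return False                             # inhibition is local to each layer
    return rng.random() < c
```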
The bidirectional synaptic connections between the ITC and PFC layers are the only plastic synapses (see Neuronal and synaptic dynamics) and are thus shaped by learning (see Task and learning).
Values chosen for the fixed synaptic efficacies are in Table 3.
Task and learning.
We define the task after Sigala and Logothetis (2002) and De Baene et al. (2008). In the experiments, monkeys were shown schematic visual stimuli, defined by a fixed number of features and grouped in two categories, to which the monkeys were trained to assign stimuli by trial and error. Only a subset of features was relevant for the categorization, and neurons in ITC selective for those “diagnostic” features turned out to be maximally modulated depending on the feature values.
We chose only one of the NF + 1 features to be relevant (“diagnostic”) for the categorization: except for Selective top-down synapses sharpen tuning curve, the task the network has to learn is to associate stimuli with one of the two values (e.g., distant eyes D1) of this one feature to category C1, and stimuli taking the other value (e.g., close eyes D2) to C2, regardless of the values taken by the remaining NF nondiagnostic features.
At the beginning of training, we generate a random pattern of activation λ ± Δλ for each of the NF + 1 ITC populations. If the answer of the network, as read from the pattern of activity of C1 and C2 in the PFC layer, is correct (the correct classification being determined by the value of the diagnostic feature), we go on generating a new stimulus encoded by a new random choice of all NF + 1 features for the subsequent trial. If not, in the subsequent trial the network is presented with a new random stimulus belonging, however, to the same class as the preceding (wrongly classified) one (NF random values for the nondiagnostic features). This is consistent with the training strategy adopted in Sigala and Logothetis (2002) (N. Sigala, private communication).
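The trial-to-trial stimulus selection described above can be sketched as follows. The function names `next_stimulus` and `correct_category` and the list encoding (index 0 holding the diagnostic feature, values 1 or 2) are assumptions made for this illustration.

```python
import random


def next_stimulus(n_nondiag, prev_diag=None, prev_correct=True, rng=random):
    """Choose the feature values (1 or 2) for the next trial.

    After a correct trial all NF + 1 features are redrawn at random.
    After an error the diagnostic value (hence the class) is kept and
    only the NF nondiagnostic features are redrawn.
    """
    if prev_correct or prev_diag is None:
        diag = rng.choice((1, 2))
    else:
        diag = prev_diag
    nondiag = [rng.choice((1, 2)) for _ in range(n_nondiag)]
    return [diag] + nondiag


def correct_category(stimulus):
    """D1 maps to category C1 (returned as 1), D2 to C2 (returned as 2)."""
    return stimulus[0]
```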
Each stimulus lasts for 2 s. At 1.5 s from stimulus onset, the network response is “read” and a signal reward/no reward is determined, which activates the appropriate synaptic plasticity mode (see below), until the end of the stimulus.
Learning is semisupervised and partially anti-Hebbian: the Hebbian synaptic dynamics (see Neuronal and synaptic dynamics) is activated only upon the correct completion of a trial, that is just after the network has correctly classified a stimulus (“reward” condition). When the network generates a wrong classification (“no reward” condition), synapses that would undergo a positive jump dX+ (upregulated) are subject to a negative jump dX− (downregulated), while the ones that would be downregulated are left unchanged. This way synapses that would be potentiated if the classification provided was correct will be depressed, while the other synapses are left unchanged.
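The two plasticity modes can be written compactly as a rule for the signed jump of the synaptic variable upon a presynaptic spike. The sketch below is illustrative only: the function name `jump` and the default values of θV, dX+ and dX− are hypothetical.

```python
def jump(v_post, rewarded, theta_v=0.8, dx_plus=0.375, dx_minus=0.375):
    """Signed jump of the synaptic variable triggered by a presynaptic spike."""
    would_potentiate = v_post > theta_v
    if rewarded:
        # Hebbian mode: up for high postsynaptic potential, down otherwise.
        return dx_plus if would_potentiate else -dx_minus
    # No-reward mode: a would-be upward jump becomes a downward jump,
    # while a would-be downward jump is suppressed.
    return -dx_minus if would_potentiate else 0.0
```

The returned value would be added to XJ and clipped to the interval between 0 and 1.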
Figure 2 shows a cartoon of the average expected effect of learning on the synaptic structure of the network, both upon correct and wrong classification. This suggests that the synaptic connections between PFC and ITC, after learning, can be effectively described with six parameters (see Fig. 1): JP, the average synaptic efficacy between D1 and C1, and D2 and C2 (correct association); JD, the average synaptic efficacy between D2 and C1, and D1 and C2 (wrong association); Jn, the average synaptic efficacy between category populations and nondiagnostic features (C1/C2 ↔ O1/O2) in both directions.
As already noticed in Roelfsema et al. (2010), synaptic dynamics is required to change after an erroneous answer to prevent the impairment of what was learned contingently to correct answers. Such strong assumptions are consistent with the experimentally observed reward-related modulation of synaptic plasticity by the dopaminergic neurons (Schultz 1998; Schultz and Dickinson 2000).
Rationale for a features-based representation in the ITC layer.
The adopted model is certainly a gross oversimplification, both as to the type of neural representation in ITC and as to the areas involved. However, regarding the former, we emphasize that evidence reported in the literature provides at least a rationale for adopting a neural representation in ITC that is based on feature-selective populations and for assuming that the collection of such segregated representations is what is forwarded in the first place to PFC for further processing. To substantiate this statement, in Fig. 3 we propose a close analogy between our ITC model and the results reported in Tsunoda et al. (2001). These authors extensively explored the representation of visual objects in ITC and how such representations are altered when simplified versions of the same objects are presented (a simplified object being an object deprived of some features). Figure 3A shows the observed patchiness of the neural representations of an object, and what those representations reduce to when some features are removed. It is seen that some patches are uniquely associated with some features, whereas others overlap to various degrees. In the words of Tsunoda et al. (2001), “an object is represented by a combination of cortical columns, each of which represents a visual feature (feature column)” (other aspects of the results from Tsunoda et al. will be commented on in the discussion, in the light of the present model). Figure 3B illustrates the assumed patchy representation in ITC that, by way of analogy, we associate with the representation of the Brunswik faces used in the Sigala and Logothetis (2002) work. The scheme is then mapped onto the model architecture described in Fig. 3C, top, where we suggest that the simplified representation we adopted could be imagined to be obtained from a selection of recorded neurons based on their feature selectivity (as exemplified in Fig. 3D).
All the results we present are from simulations in which both network and learning dynamics run concurrently. We performed a preliminary exploration in the large parameter space by resorting to dynamic mean-field equations (Amit and Brunel 1997; Del Giudice et al. 2003; Fusi and Mattia 1999). The simulations have been carried out with a high-performance custom C program, implementing the event-based approach described in Mattia and Del Giudice (2000). To estimate instantaneous firing rates, the spikes from each neuronal population are sampled in a 10-ms sliding window.
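The firing rate estimate mentioned above can be illustrated with the following sketch, which counts the spikes emitted by a population in a 10-ms window and normalizes by the number of neurons and the window length. The function name `population_rate` and the choice of a causal window ending at time t are assumptions; the text specifies only the window length.

```python
def population_rate(spike_times, n_neurons, t, window=0.010):
    """Mean firing rate (Hz per neuron) from spikes in [t - window, t).

    spike_times: spike times (in seconds) pooled over the whole population.
    """
    count = sum(1 for s in spike_times if t - window <= s < t)
    return count / (n_neurons * window)
```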
In the following sections, we first illustrate the typical time course of a learning history and describe the way it is affected by the buildup of a selective top-down synaptic structure. We then move to illustrate how the network performance is affected by such top-down synaptic structure. Finally, we study the network behaviour for corrupted versions of the training stimuli, and the time course of the neural activities in error trials, and formulate testable predictions in both cases.
Neural correlate of categorization.
Figure 4, A–C, shows the evolution of the network performance and synaptic configuration as learning proceeds. Before stimulation, all populations in the network are in a stable asynchronous state of low firing rate (few Hz). Upon stimulation, one of the two category populations in the PFC layer is in the end brought to an “up state” (≈40 Hz), resulting from winning the competition with the other category, which settles into a “down state” ≈0 Hz.
Depending on whether the winning category is the correct one according to the defined categorization rule, the plastic synapses are allowed to change following a Hebbian rule (for correct outcome) or a partially anti-Hebbian rule (for incorrect outcome), as described in Fig. 2 and in the methods. We recall here that, after a wrong answer, the next stimulus is drawn from the same class. Figure 4A shows the time course of the performance on the categorization task, i.e., the fraction of correct outcomes averaged over nonoverlapping windows of 30 trials, for the case in which only the bottom-up (ITC to PFC) synapses are plastic (“TD-off,” black) and when both bottom-up and top-down synapses are plastic (“TD-on,” grey). Performance is moderately affected by the presence of plastic top-down synapses: learning is seen to be slightly faster and more stable. As seen from Fig. 4, worse performance for the TD-off case is accompanied by larger fluctuations, as expected.
Moreover, the differences between the bottom-up structures suggest that there is a better signal-to-noise ratio for the TD-on case even if the performances are similar for TD-on and TD-off, mainly due to the high stimulus contrast used in these simulations, that is, the difference of stimulation to the two diagnostic population subsets (Δλ = 0.3 Hz). We will see in the following that, as the stimulus contrast gets lower, the top-down selective synaptic structure also entails differences in the performance and in the time needed for the categorization.
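The performance measure used for the learning curves can be sketched as the fraction of correct outcomes in consecutive blocks of 30 trials. The function name `windowed_performance` is hypothetical, and discarding an incomplete final block is an assumption of this sketch.

```python
def windowed_performance(outcomes, window=30):
    """Fraction of correct trials in consecutive, nonoverlapping blocks.

    outcomes: sequence of booleans, True for a correct classification.
    Trailing trials that do not fill a whole block are ignored.
    """
    return [sum(outcomes[i:i + window]) / window
            for i in range(0, len(outcomes) - window + 1, window)]
```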
Figure 4, B and C, shows the corresponding evolution of the fraction of potentiated synapses for the different bottom-up (Fig. 4B) and top-down (Fig. 4C) synapses, grouped according to the feature/category populations they connect. Only a representative subset of synapse groups is shown in Fig. 4.
In Fig. 4D, the three checkerboards illustrate the final synaptic configurations for both plastic and nonplastic top-down synapses. Each square represents the fraction of potentiated synapses at the end of the learning period. The synaptic weights connecting a diagnostic feature population with its (anti-) correlated category were (depressed) potentiated. For the synapse groups involving nondiagnostic features, the case shown is the computationally advantageous one in which the final bottom-up synapses are slightly depressed (which helps in obtaining a stable learning trajectory), and the top-down ones are markedly depressed (which results in a sort of “selective amplification” of the relevant information, improving the categorization performance, see Fig. 6 and discussion below). The final synaptic configuration is consistent with the expectations explained in Fig. 1.
We show the neural correlate of the categorization process in Fig. 5. The first three panels at left show, for three successive stages during learning, the time course of the firing activity of five populations, averaged over 20 correct trials (outcome C1). In each panel, we plot the activity for the two category populations (C1 and C2, respectively, requested and nonrequested class), the population encoding the active value of the diagnostic feature (Δλ = 0.3 Hz), one nonstimulated population and one of the stimulated populations coding for the nondiagnostic features.
Here it is seen that, as learning proceeds, the activities of diagnostic populations stimulated and not stimulated (cyan and orange) split, consistent with the experimental observations of Sigala and Logothetis (2002) and with theoretical results obtained by Szabo et al. (2006) in a simpler context. The last panel in Fig. 5 shows the time course of the average activities for the same populations at the end of learning, but with nonplastic top-down synapses, and confirms that the splitting observed for the TD-on case is an exquisite effect of a selective top-down synaptic structure.
Influence of selective top-down synapses on network performances.
We showed in Fig. 4 that the implemented plasticity in the top-down synapses: 1) sharpens the selectivity of the bottom-up synaptic structure, and 2) moderately affects the time course of the learning dynamics. We also found (Fig. 5) that the selective top-down synaptic structure entails breaking the symmetry between the neural activity encoding stimulated diagnostic and nondiagnostic features, consistent with the results of Sigala and Logothetis (2002). One can still ask to what extent the above results also imply important effects on the computational performance of the network, thereby helping to formulate informed guesses about the mechanisms underlying the corresponding experimental findings.
To understand this, it is important first to remark that our system is a noisy one and in fact it is subject to two very different main sources of noise, one “exogenous” and the other “endogenous.” The first depends on the dimensionality of the feature space in which the stimuli to be categorized are defined. The very definition of diagnostic vs. nondiagnostic features implies that during learning the former are consistently associated with the defined categories, trial after trial, while the latter implement a random category labeling for each trial. Consequently, the higher the number of nondiagnostic features, the larger the associated component of the total input to PFC. In the absence of plasticity in top-down synapses, this component would overwhelm the one from diagnostic features; top-down plasticity realizes an effective amplification of the signal-to-noise ratio (which is then to be understood as the ratio between the inputs coming from diagnostic and nondiagnostic features). The other, endogenous noise source is due to the stochastic nature of the neuronal network dynamics. The network has sparse connectivity, a different realization of which is generated for each simulation. This quenched noise, together with the finite-size effects due to the finite number of neurons in each population and the consequent distribution of firing rates, affects both the dynamics of decision (through the firing rate fluctuations) and the dynamics of learning (since, as explained in the methods, the dynamics of synaptic plasticity, although rate dependent on average, is stochastic, to the extent that the neural activities are). This endogenous noise, besides being a realistic ingredient of any finite and sparse system, contributes to expanding the dynamic repertoire of the network (e.g., the range of decision times). Of course, an additional “fast” noise component is due to the Poisson spike trains implementing the stimuli.
In the following sections, we quantify the effect of a selective top-down synaptic structure in diminishing the effect of the exogenous noise, thus improving learning performance. We also investigate how the top-down synaptic structure affects the dynamics and characteristic times of the classification process.
At the end of learning (obtained with Δλ = 0.3 Hz, NF = 16), we freeze the synaptic structure and then apply new stimuli with different levels of contrast Δλ = 0.05, 0.1, and 0.2 Hz (λ = 4.2 Hz) and different numbers of stimulated nondiagnostic features (NF = 4, 8, and 16), and test the performance. For each Δλ and NF we also test the performance of the network in which we substitute the learned, selective top-down synaptic structure with a uniform set of synapses (JPFB = JDFB = JnFB), the efficacies of which are drawn from the same probability distribution, with an average such as to match the top-down synaptic efficacy averaged over each postsynaptic population of the structured TD-on case.
Figure 6A shows the network performance (percentage of correct classification) for different numbers NF of nondiagnostic features and for three values of stimulus contrast Δλ, with selective and uniform top-down synaptic structure.
For the same number of nondiagnostic features, performance increases with Δλ, as expected; the most relevant result is that, for given Δλ, the network with structured top-down synapses retains high performance even for a large number of nondiagnostic features. The effective signal-to-noise ratio that drives the competitive mechanism in the PFC layer is made almost independent from the number of “distracting” nondiagnostic features, as a result of the depressed top-down synapses pointing to nondiagnostic features. In other words, if we call S and S + ΔS the total input to the two category populations, the structured top-down synapses determine a higher ΔS/S ratio: as the number of stimulated nondiagnostic features increases, the common input S would be dominated by the activity of nondiagnostic populations, were it not for the combined effects of top-down depression of JnFB synapses and the sharpened differentiation of (JDFF vs. JPFF) and (JDFB vs. JPFB) (see Fig. 6D).
On the other hand, we remark that the selective top-down synaptic structure does not increase the signal ΔS since, until the competition in the category layer sets in and a decision is taken, the diagnostic features populations receive equal feedback. This is consistent with the observation that, for low numbers of nondiagnostic features, the performance is essentially the same for structured and unstructured top-down synapses.
Figure 6, B–D, illustrates the relationship between performance and decision times for the same values of NF and Δλ as in Fig. 6A for both selective and uniform top-down synaptic structure. We defined the decision time (DT) as the instant when the absolute difference between the firing rates of the two category populations, divided by their sum, exceeds a given threshold D (we set D = 0.7 as in Marti et al. 2008). The typical speed-vs-performance curve derived from psychophysics experiments is monotonically decreasing (as it derives from increasing the stimulus control parameter, Δλ in our case, which entails increasing performance and decreasing decision time). The selective network (gray lines) is always faster than the uniform one (black lines) at equal performance. Thus the structured feedback not only favors better performance for the same input (Fig. 6A) but also makes the network more prompt to respond if the input current is adjusted to match the performance. In the case with uniform top-down, for given NF, C1 and C2 receive greater symmetric input S, and are therefore less sensitive to the relative variations of their activities, such that they spend more time in a symmetric, “undecided” state. The higher NF, the greater the comparative advantage of the selective network (Fig. 6, B–D). The DT gap between selective and uniform case for high performance is seen to be mostly due to a marked flattening of the DT vs. performance curves for the uniform network: for the selective top-down network, DT preserves higher sensitivity to the selective input strength Δλ and the expected DT vs performance fall-off is observed even for the highest NF. The above observation can be understood again in terms of the larger increase of the symmetric input component S to C1 and C2 for the uniform network. 
As S increases with NF, the dynamics spends longer and longer times around a state with almost equally high firing rates for both C1 and C2; Δλ has little influence on this time, yet it still determines the performance of the network; taken together, these two effects explain the observed flattening of the DT vs. performance curves. Indeed, an almost symmetric fixed point of the dynamics with high firing rates for C1 and C2 is expected to develop as S increases, around which the dynamics of decision is strongly distorted and where the network stays for longer times.
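The decision time criterion defined above (normalized rate difference between the two category populations exceeding D = 0.7) can be sketched as follows. The function name `decision_time` and the representation of the firing rates as sampled lists are assumptions made for illustration.

```python
def decision_time(times, rate_c1, rate_c2, threshold=0.7):
    """First time at which |r1 - r2| / (r1 + r2) exceeds the threshold.

    times, rate_c1, rate_c2: equally long sequences of sample times and of
    the firing rates of the two category populations at those times.
    Returns (time, winner) or None if no decision is reached.
    """
    for t, r1, r2 in zip(times, rate_c1, rate_c2):
        total = r1 + r2
        if total > 0 and abs(r1 - r2) / total > threshold:
            return t, ("C1" if r1 > r2 else "C2")
    return None
```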
Figure 7 illustrates the different neural dynamics during the decision process for the selective and uniform networks, for NF = 16 and Δλ = 0.5 Hz, which underlie the behaviour described above. Figure 7, A and B, shows the time course of firing rates of C1 and C2 (black and grey curves, respectively) during the fastest (thicker curves) and the slowest (thinner curves) trials. It is seen that the spread in decision times is much larger for the uniform case (see also the distributions of decision times in Fig. 7, C and D). One possible interpretation, consistent with known phenomenology of cooperative-competitive models of decision making, can be obtained if we picture the network dynamics as the motion on an “energy landscape,” with each point being identified by the energy level and the average firing rates of C1 and C2. The stable decision states of the network, induced by a stimulus, are identified by two asymmetric minima of the landscape (high firing of C1, low firing of C2, and vice versa). The diagnostic component of the stimulus determines a force driving the dynamics from the initial saddle the system starts from (almost equal firing for C1 and C2) towards one of the two decision states, and it is expected to be maximally effective for the selective case, in which the component related to diagnostic features is amplified; for the uniform case the strong common component of the input to C1 and C2 makes the system roll on a flatter saddle, the departure from which, towards one of the asymmetric minima, takes more time on average.
Of course the chosen value of uniform top-down synapses affects the characteristic times of the decision dynamics; by varying this value between 0 (no feedback) and 2 JFB (beyond which the state of spontaneous activity of the network with no external stimuli is disrupted), we checked that 1) performance is only slightly affected (< 6%); and 2) the qualitative features of the lines in Fig. 6, B–D, are preserved, though the exact values of DT obviously change.
Corrupting learned stimuli: a footprint of categorization in ITC.
The multipopulation model system is a complex recurrent network, with local intramodule feedback and intermodule PFC-ITC feedback. One might then ask whether certain characteristic properties of recurrent networks could play a role in the situation under study, such as the pattern completion ability of attractor networks.
Starting from a network configuration obtained at the end of successful learning, we went on to stimulate the network with corrupted versions of the stimuli, i.e., stimuli for which only a fraction x of the neurons in the currently activated diagnostic population receive the usual increased external current, as explained in methods.
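A minimal sketch of how such a corrupted stimulus could be constructed is given below. The function name, the population size, and the rate values are hypothetical and are not taken from the methods; only the idea that a fraction x of the diagnostic population receives the increased external input comes from the text.

```python
import numpy as np


def corrupted_stimulus(n_neurons, x, rate_bg, rate_stim, rng):
    """External input rates (Hz) for one diagnostic population.

    Only a fraction x of the neurons receives the increased rate
    `rate_stim`; the others stay at the background rate `rate_bg`.
    """
    n_stim = int(round(x * n_neurons))
    stimulated = np.zeros(n_neurons, dtype=bool)
    stimulated[rng.choice(n_neurons, size=n_stim, replace=False)] = True
    rates = np.where(stimulated, rate_stim, rate_bg)
    return rates, stimulated


rng = np.random.default_rng(0)
# Hypothetical numbers: 100 neurons, x = 0.75, rates in Hz.
rates, stimulated = corrupted_stimulus(
    n_neurons=100, x=0.75, rate_bg=3.0, rate_stim=3.5, rng=rng
)
```

The boolean mask `stimulated` separates the two subpopulations whose activities are then followed separately during the trial.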
In this situation, we studied the time course of the neural activities of the stimulated and nonstimulated neurons in the diagnostic population defining the current stimulus and how it is affected by the learned, selective synaptic top-down structure. The expectation is that, as the decision process in the PFC matures, the unstimulated neurons belonging to the activated diagnostic feature would be recruited as a result of the combined effect of the selective top-down input and the recurrent interaction with the stimulated neurons in the same population.
Figure 8 illustrates example results, obtained for a low value of Δλ = 0.05 Hz (where we expect the recurrent synaptic structure to play a comparatively greater role) and x = 0.75. Neural activities are reported for three trials that happen to have (for the same network configuration and stimulation parameters) different decision times DT. Firstly, we note that, as expected, as the decision process matures the unstimulated population gets recruited (green trace). Secondly, and interestingly, the recruitment occurs with a latency determined by the time it takes for the decision process to complete (compare the three panels in Fig. 8). This latter observation is interesting because it constitutes a specific reflection of the time course of the decision process developing in PFC, in a time-dependent modulation occurring in the unstimulated neurons in ITC, at the single trial level.
The same reasoning would also apply to a situation in which each stimulus is defined by multiple diagnostic features, in which case corrupting a stimulus might mean excluding one or more diagnostic features from the ones the stimulus is supposed to activate.
In other words, assuming the feature selectivity of ITC neurons is experimentally well characterized, so that one can identify the “unstimulated” diagnostic neurons that correspond to a specific corruption of the stimuli, the point in time when those neurons would start to sharply increase their firing activity during the trial would signal the completion of the decision process in the absence of simultaneous recording from PFC.
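As an illustration of how the recruitment time could be read out from a single-trial firing-rate trace, the following sketch returns the first time at which the rate stays above a threshold for a minimum duration. The detection rule, the threshold, and the synthetic trace are assumptions made for this example and are not the analysis used to produce Fig. 8.

```python
import numpy as np


def recruitment_onset(times, rate, threshold, min_duration):
    """First time at which `rate` rises above `threshold` and stays there
    for at least `min_duration` (same units as `times`).

    Returns None if no such sustained crossing exists.
    """
    times = np.asarray(times, dtype=float)
    above = np.asarray(rate) > threshold
    start = None
    for t, is_above in zip(times, above):
        if not is_above:
            start = None  # a dip below threshold resets the candidate onset
            continue
        if start is None:
            start = t
        if t - start >= min_duration:
            return float(start)
    return None


# Synthetic trace (hypothetical): 5 Hz baseline, a brief blip at 100 ms,
# and a sustained rise to 30 Hz from 400 ms onward.
times = np.arange(0.0, 1000.0, 10.0)  # ms
rate = np.where(times >= 400.0, 30.0, 5.0)
rate[times == 100.0] = 30.0
onset = recruitment_onset(times, rate, threshold=15.0, min_duration=50.0)
```

Requiring a sustained crossing keeps brief fluctuations, such as the blip in the synthetic trace, from being mistaken for recruitment.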
We remark that the observed recruitment of nonstimulated neurons coding for a diagnostic feature would not occur in the absence of a category-related top-down information flow. This is consistent with the data reproduced in Fig. 3: in anesthetized animals, which do not perform decisions and in which the top-down projection would therefore be unavailable and pattern completion should not occur, an impoverished stimulus makes the corresponding feature representations disappear.
Selective top-down synapses sharpen tuning curves.
For all the numerical experiments described so far, we adopted a minimal feature representation (one diagnostic feature, with two values). It is known that feature-selective neurons in the temporal lobe exhibit a variety of tuning curves (see, e.g., Freiwald et al. 2009). As already noted, we do not aim at reproducing specific aspects of the activity in the inferior temporal cortex; however, studying the implications of tuning curves in our ITC layer is relevant from the computational point of view that concerns us here. Therefore, we adopted a simple, generically plausible shape of tuning curves and explored how they are affected by learning. We report results obtained for a network in which the diagnostic feature possesses four values (therefore, four populations D1, …, D4 code for it in the ITC layer); furthermore, each stimulus is encoded by a profile of activation of the four diagnostic populations, with the maximal activation for the active value of the feature for that stimulus (as before we have NF nondiagnostic features with two values each). In this way we implement a crude representation of tuning curves in this feature layer. Stimuli with maximal activation of D1 and D2 are to be mapped to class C1 and those with maximal activation of D3 and D4 to C2. Stimuli with maximal activation of D1 and D4, for which the activation profiles of the four diagnostic populations have the smallest overlap, are “easier” to classify than those with maximal activation of D2 and D3; we will therefore label them as “easy” and “difficult” stimuli, respectively. One can expect that the build-up of the selective top-down synaptic structure could affect the activation profile of the diagnostic populations. In fact, we observed (see Fig. 9) a marked sharpening of the tuning curves for both easy and difficult stimuli (compare Fig. 9, A and B vs. C and D) for which, however, we obtained slightly different final performances (see percentages of correct responses reported in Fig. 9, right).
The results shown in Fig. 9 constitute further evidence of the computational consequence of a learned selective top-down synaptic structure and establish a specific experimental prediction. We remark that in our model the sharpening develops after the decision is taken, which suggests that if one could monitor the tuning curves in successive time intervals during trials, depending on when their sharpening occurs with respect to the decision time, one could get an indication about their origin being in a task-specific top-down signal.
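The distinction between easy and difficult stimuli in the four-value setting can be illustrated with a small sketch. The Gaussian shape of the activation profile, its width, and the use of cosine similarity as the overlap measure are assumptions made for this example; the text only specifies that activation is maximal for the active value of the feature.

```python
import numpy as np


def activation_profile(active_value, n_values=4, width=1.0, peak=1.0):
    """Activation of D1..D4 for a stimulus whose diagnostic feature takes
    `active_value` (0-based index). The Gaussian shape is an assumption."""
    values = np.arange(n_values)
    return peak * np.exp(-((values - active_value) ** 2) / (2.0 * width**2))


def overlap(p, q):
    """Normalized overlap (cosine similarity) between two profiles."""
    return float(np.dot(p, q) / (np.linalg.norm(p) * np.linalg.norm(q)))


profiles = [activation_profile(k) for k in range(4)]
easy = overlap(profiles[0], profiles[3])       # D1-peaked vs. D4-peaked
difficult = overlap(profiles[1], profiles[2])  # D2-peaked vs. D3-peaked
```

With these assumptions, the two stimuli adjacent to the category boundary (peaked on D2 and D3) overlap much more than the two extreme ones (peaked on D1 and D4), which is why the former are harder to classify.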
Countermodulation in error trials.
In Fig. 10, we show how network activities are differently modulated according to the correct (top) or wrong (bottom) outcome of the trial. Solid lines are the average activities of the relevant populations over five consecutive trials approximately at an intermediate point of the learning process. Figure 10, top, shows the expected (see Fig. 5) ranking of the stimulated diagnostic vs. nondiagnostic features, which is destroyed for the wrong trials. Notice in particular the nonstimulated diagnostic feature value (orange trace), which as expected shows an increase of activity, being tightly correlated to the wrong decision state. The difference arises because in correct trials the stimulated diagnostic population receives, in addition to the stimulation, a coherent reinforcement from the correctly winning PFC population, with which learning has already formed strengthened synapses, while in wrong trials the wrongly winning PFC population feeds back on it through synapses that have been weakened by learning. This observation suggests that the operation of a task-related selective feedback could be tested in in vivo experiments by comparing the modulation of the activities of neurons with different selectivities in correct and wrong trials. A hint that such a strategy could be viable and informative is provided in Mirabella et al. (2007), where evidence is provided that the activity of neurons in V4 during a visual selection task involving attention is differently modulated depending on the trial being correct or wrong.
The idea that perceptual learning, visual categorization, and/or selective attention involve an effective suppression of irrelevant sensory input is not new (e.g., see Riesenhuber and Poggio 1999; Bar 2003; Spratling and Johnson 2004; Roelfsema and van Ooyen 2005). For the specific case of visual categorization, even if experimental evidence is still inconclusive as to whether top-down control by PFC is needed to accomplish it (see Minamimoto et al. 2010; Buckley and Sigala 2010), several reported results suggest a role of ITC-PFC mutual interaction in the arbitrary stimulus-category association. In the present work, we put this idea in a specific context and address a general computational issue, i.e., the implication of learned, selective top-down projections between a category-aware area and a sensory-coding one, taking visual classification as a relevant case in point. With a simplified ITC-PFC model designed to account for the key results shown in Sigala and Logothetis (2002) and in De Baene et al. (2008), we showed that a semi-anti-Hebbian, spike-driven learning mechanism generates a selective amplification of task-relevant neural representations in ITC and, because of this, enhanced classification performance and faster response. In the present work, we adopted the simplest choice for the feature space defining the stimuli, i.e., just one diagnostic feature with two possible values. We remark, and checked in simulation in a few cases, that as long as the classification problem remains linearly separable, enlarging the dimension of our one-dimensional “diagnostic subspace” (as in the case of the two-dimensional subspace of Sigala and Logothetis 2002) does not spoil the classification ability of the network (it acts in fact as a Perceptron; Rosenblatt 1958), nor the mechanism of selective amplification of diagnostic information.
However, for the higher dimensional case the synaptic configuration generated by online learning does depend on the choice of the training stimuli, on the presentation sequence, and on the initial condition (as is the case for the Perceptron). This work is one of the few so far addressing dynamic learning effected by the ongoing spike-driven synaptic dynamics of LTP and LTD, coupled in closed loop with the stimulus-driven spiking activity in a multimodular network. We showed how robust learning histories lead the global network to perform well in the face of the many sources of instability that affect dynamic learning (see Del Giudice and Mattia 2001; Del Giudice et al. 2003; Amit and Mongillo 2003). In particular, finite-size effects are important and bring about deviations from the predictions of the mean field theory that guides simpler approaches like the one we adopted in Szabo et al. (2006). For a finite number of synaptic connections per neuron, each population has a distribution of emission rates. We recall that our synapses are stochastic insofar as neural activities are (see methods). Considering for example the synapses connecting populations of neurons stimulated by the same stimulus, and therefore supposed to get potentiated, the high- and low-rate tails of the actual frequency distribution corrupt the homogeneity of the pattern of synaptic transition probabilities, such that in the same synaptic group some synapses will have too high an LTP probability, while others will be unexpectedly unchanged. Similarly, finite-size effects can provoke unwanted synaptic transitions where they are not expected and are harmful (such as a potentiation of synapses involving postsynaptic background neurons, which can become the seed of instability for the spontaneous state).
One ingredient that makes finite-size effects more or less harmful is the character of the “synaptic transfer function,” meaning the function giving the LTP/LTD transition probabilities as functions of the pre- and postsynaptic emission rates. The sensitivity of this function in the critical region where the rate distributions involved overlap is an important factor in determining how serious finite-size effects are going to be. These and other effects make online learning with realistic, spike-driven synaptic dynamics a major challenge that we faced in this work, which, besides the value of the new results and predictions we provide, hopefully contributes to advancing a domain of modeling studies that needs progress. The assumed plasticity for the top-down (PFC-to-ITC) synapses turns out to ensure slightly faster, and significantly more robust, learning histories (Fig. 4).
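The role of the synaptic transfer function in finite-size effects can be illustrated with a toy calculation. The sigmoidal form of the LTP probability, its parameters, and the Gaussian rate distributions below are hypothetical and are not the spike-driven synaptic model of the methods; the sketch only shows how a broader distribution of emission rates within one population makes the LTP probabilities of nominally equivalent synapses more heterogeneous.

```python
import numpy as np


def _gate(rate, theta, slope):
    """Sigmoidal dependence on a single emission rate (assumed form)."""
    return 1.0 / (1.0 + np.exp(-(rate - theta) / slope))


def ltp_probability(pre_rate, post_rate, p_max=0.1, theta=20.0, slope=2.0):
    """Hypothetical transfer function: LTP requires both the pre- and the
    postsynaptic rate (Hz) to be high."""
    return p_max * _gate(pre_rate, theta, slope) * _gate(post_rate, theta, slope)


def ltp_spread(mean_rate, rate_sd, n_synapses=10_000, seed=0):
    """Standard deviation of the LTP probability across the synapses of one
    group whose neurons nominally share the same mean rate."""
    rng = np.random.default_rng(seed)
    pre = np.clip(rng.normal(mean_rate, rate_sd, n_synapses), 0.0, None)
    post = np.clip(rng.normal(mean_rate, rate_sd, n_synapses), 0.0, None)
    return float(np.std(ltp_probability(pre, post)))


narrow = ltp_spread(mean_rate=25.0, rate_sd=1.0)  # narrow rate distribution
broad = ltp_spread(mean_rate=25.0, rate_sd=6.0)   # broad rate distribution
```

In this sketch a steeper `slope` parameter (a more sensitive transfer function near `theta`) or a larger `rate_sd` both increase the spread, in line with the qualitative argument above.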
The value of a model lies of course in its ability to generate testable predictions. We list in the following some speculative implications of our model. The learned selective top-down synaptic structure essentially lowers the effect of the “noise” associated with the nondiagnostic features, by amplifying the “signal” component associated with the diagnostic features; it is therefore expected to matter more as the number NF of such features increases. This is indeed what we showed in Fig. 6, where it is seen how the improvement in the classification performance increases markedly with NF. Note that, because of the nonlinearity of network responses, the additive top-down feedback can in fact produce essentially multiplicative effects, as it has been shown in the context of attention modeling in Deco and Rolls (2005).
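How an additive input can act multiplicatively through a nonlinear response can be seen in a minimal sketch. The exponential transfer function and its parameters are an illustrative assumption chosen because they make the effect exact; they are not the response function of the spiking neurons in the model, for which the effect is only approximately multiplicative.

```python
import math


def rate(current, r0=1.0, k=0.5):
    """Illustrative expansive transfer function (assumption):
    output rate = r0 * exp(current / k)."""
    return r0 * math.exp(current / k)


feedback = 0.2  # additive top-down input, arbitrary units (hypothetical)
stimulus_inputs = (0.5, 1.0, 1.5)

# Ratio of the response with feedback to the response without it,
# for several strengths of the bottom-up stimulus input.
gains = [rate(s + feedback) / rate(s) for s in stimulus_inputs]
```

For this transfer function the gain is the same for every stimulus strength, namely exp(feedback / k), so the additive feedback rescales the whole response curve.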
We expect the predicted task-specific modulation not to depend much on the details of the chosen model and to be shared by any model based on selective amplification of task-relevant sensory information. We can then speculate that, if the global neural activity induced by a stimulus in ITC is measured, for instance by the BOLD signal, it would be more evenly distributed in the naive subject, before learning the task, and more concentrated in the well-trained subject: before learning, the patchy ITC representation would generate signal spots of comparable intensity, while after learning the signals associated with the amplified, task-relevant features would pop up. Also, one can predict that turning a nondiagnostic feature into a diagnostic one in a previously learned set of stimuli would result in an expansion of the dominant signal spots.
Previous works (Op De Beeck et al. 2006; Gillebert et al. 2009) studied the functional MRI correlate of learning categorization, showing that the BOLD signal associated with categorized images is enhanced after learning. Our results are compatible with those findings, in that the global activity in our feature layer after learning is indeed higher. However, our results together with the electrophysiological results of Sigala and Logothetis (2002) and De Baene et al. (2008) suggest the above prediction which goes beyond these findings, i.e., the spatial variability of the BOLD signal should reflect the differential activation corresponding to diagnostic and nondiagnostic features.
Our results shown in Fig. 9 suggest that if the proposed mechanism of selective amplification of diagnostic feature representations operates, learning the categorization task would result in sharpened tuning curves for diagnostic features. This effect is qualitatively compatible with the differential category-tuning reported by De Baene et al. (2008).
We showed in Fig. 10 that error trials entail a qualitatively different modulation of neural activities. This generates a testable prediction, in line with previous suggestions like Mirabella et al. (2007) in which evidence is provided that the activity of neurons in V4 during a visual selection task (involving attention) is modulated depending on the trial being correct or wrong. It is clear that such an option would be viable only for a version of the task allowing for a non-negligible error rate even in well-trained subjects.
Results shown in Fig. 6, B–D, suggest that some of the direct implications of the selective feedback synaptic structure envisaged in the model might be amenable to investigation in psychophysics. Indeed, both the relevant parameters Δλ and NF playing a role in shaping the plots of Fig. 6, B–D, can be experimentally varied (for NF various choices are available in principle, including adding new features or making preexisting nondiagnostic features variable among stimuli). The nontrivial, and interesting, experimental question relates to the interplay between Δλ and NF: Δλ has the dual role of biasing the decision dynamics and (when feedback is present and not yet selective) of increasing the activity induced by nondiagnostic features. Therefore, by looking at the changes in the DT vs. performance plots while the subject is “learning” to ignore one more nondiagnostic feature, one could qualitatively check the prediction implied by a selective feedback buildup.
More generally, one can easily imagine a situation (frequently explored both in psychophysics and in electrophysiology) where a pair of features is consistently associated with the same category or where stimuli are presented in different sensory modalities. In such situations, after training, a generic and testable prediction is that the presentation of one member of the pair, or the presentation of one sensory modality, would entail the partial activation of the other member of the pair or the representation pertaining to the other sensory modality, respectively.
In Fig. 3, we suggested that the chosen architecture of our simplified model is consistent with reported evidence concerning visual object representations in ITC. In Tsunoda et al. (2001), besides reporting the patchiness of object representation in IT cortex that we referred to in the methods, the authors also notice that, as the visual appearance of objects is deprived of features that, although “small,” are key for its interpretation, not only do the corresponding activity patches disappear, but new ones also appear. The authors suggest that the distributed object representation would result from the activation, and active suppression, of a constellation of (possibly overlapping) feature-specific neural populations. Based on our modeling results, we can speculate that the task-dependent modulation of selectivity induced by learning might reshape such regions of overlapping representations (see Fig. 3). Specifically, if an IT neuron has, before learning, a mixed selectivity for two (values of) features, and the latter are mapped by the rule to the same category, the changes induced by learning in the synapses linking that neuron to the PFC category populations are consistent, and this will end up boosting the activities in the mixed selectivity neurons. On the other hand, if the mixed selectivity relates to features belonging to different categories, we can expect that the original differences in the neural responses to those features get amplified. If neurons in the overlapping patches have a nontrivial distribution of firing rates for the two features, learning would result in shrinking the overlapping regions. While the details will depend on several factors (such as the initial tuning curves for the involved features, the frequency of presentation of different stimuli, the nonlinearities in the dependence of synaptic changes on neural activity), we suggest that a task-dependent modulation of the overlaps in the patchy ITC representation would be expected.
With reference to the categorization problem in Sigala and Logothetis (2002), in which the category boundary is linear in the two-dimensional diagnostic space, the proposed effect can be seen as a modulation of the classifier margin. Also, the differential modulation of diagnostic and nondiagnostic feature representations could in principle account for the appearance of previously suppressed features when key elements of the visual object are removed (observed in Tsunoda et al. 2001), via the mutual inhibitory interactions in the ITC layer.
Finally, the experiments performed with corrupted stimuli, besides confirming expectations based on the recurrent network architecture, suggest that (thanks to the learned selective top-down synaptic structure) a specific neural correlate of the decision process being completed would be available in a feature-encoding area.
The research leading to these results received funding from the CONSOLIDER-INGENIO 2010 Programme CSD2007-00012. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
No conflicts of interest, financial or otherwise, are declared by the author(s).
We thank Jochen Braun for a critical reading of the manuscript.
- Copyright © 2012 the American Physiological Society