Beiser, David G. and James C. Houk. Model of cortical-basal ganglionic processing: encoding the serial order of sensory events. J. Neurophysiol. 79: 3168–3188, 1998. Several lines of evidence suggest that the prefrontal (PF) cortex and basal ganglia are important in cognitive aspects of serial order in behavior. We present a modular neural network model of these areas that encodes the serial order of events into spatial patterns of PF activity. The model is based on the topographically specific circuits linking the PF with the basal ganglia. Each module traces a pathway from the PF, through the basal ganglia and thalamus, and back to the PF. The complete model consists of an array of modules interacting through recurrent corticostriatal projections and collateral inhibition between striatal spiny units. The model's architecture positions spiny units for the classification of cortical contexts and events and provides bistable cortical-thalamic loops for sustaining a representation of these contextual events in working memory activations. The model was tested with a simulated version of a delayed-sequencing task. In single-unit studies, the task begins with the presentation of a sequence of target lights. After a short delay, the monkey must touch the targets in the order in which they were presented. When instantiated with randomly distributed corticostriatal weights, the model produces different patterns of PF activation in response to different target sequences. These patterns represent an unambiguous and spatially distributed encoding of the sequence. Parameter studies of these random networks were used to compare the computational consequences of collateral and feed-forward inhibition within the striatum. In addition, we studied the receptive fields of 20,640 model units and uncovered an interesting set of cue-, rank- and sequence-related responses that qualitatively resemble responses reported in single-unit studies of the PF. The majority of units respond to more than one sequence of stimuli. A method for analyzing serial receptive fields is presented and utilized for comparing the model units to single-unit data.
The serial order of events and actions is critical in cognition and behavior. In addressing this issue more than four decades ago, Lashley (1951) postulated that the brain analyzes and controls serial order by creating and using a spatial pattern of neural activity, which he referred to as a “determining tendency” or idea. To control sequential actions, this spatial pattern would require translation into expressive action in the time domain through a process he likened to the application of “syntax” in the formation of language from ideas. The inverse transformation also must exist to transform temporally spaced sensory experiences into a sustained spatial pattern of brain activity, for example, to construct a concept from sequential sensations during haptic manipulation or visual survey.
Lesion results suggest that the prefrontal cortex is critical in analyzing serial events and in using the results to control behavior. Subjects with frontal lobe lesions show impaired performance on tasks requiring organization of sequential pointing responses (Petrides and Milner 1982; Wiegersma et al. 1990), serial-order recognition (Kesner et al. 1994), or recency judgments (Milner et al. 1991). Monkeys subjected to bilateral lesions of areas 46 and 9 have difficulty monitoring sequences of novel stimuli (Petrides 1991). The basal ganglia also are implicated in serial processing through the impairments of cognitive and motor skills in Parkinson's (Brown and Marsden 1990; Harrington and Haaland 1991) and Huntington's disease (Gabrieli 1995; Willingham and Koroshetz 1993). Some of these deficits are strikingly similar to the ordering deficits of frontal patients (Sagar et al. 1988; Sullivan and Sagar 1989; Willingham and Koroshetz 1993).
Single-unit recordings in primates executing delayed-sequence tasks support the importance of prefrontal cortex and basal ganglia in serial processing. Instructional cues are presented in a particular sequence, and, after a delay period, the subject must produce a corresponding sequence of responses. Neurons in prefrontal areas, and closely linked areas of the frontal eye fields and caudate nucleus, are sensitive to the serial order of the instructional sequence (Barone and Joseph 1989; Funahashi et al. 1993; Kermadi and Joseph 1995; Kermadi et al. 1993). Responses that are initiated by the instructions and sustained through the delay period could represent conversions of temporal sequences of sensory input into spatial patterns of neural activation. Similarly, some motor-preparation units in the frontal eye fields, caudate nucleus, and globus pallidus are related to the serial order of the subsequent sequential actions (Barone and Joseph 1989; Kermadi and Joseph 1995; Kermadi et al. 1993; Mushiake and Strick 1995; Tanji and Shima 1994). Such activity could represent commands for the conversion of a spatial pattern of activation into the temporal domain of movement. Together, these studies provide persuasive evidence for the existence of conversion mechanisms bridging the temporal domain of sensory input, the spatial domain of Lashley's “determining tendency,” and the temporal domain of behavioral expression.
Sustained responses in the prefrontal cortex of primates appear to function as a spatial working memory during delayed-response tasks (Funahashi et al. 1989, 1990; Fuster and Alexander 1971; Goldman-Rakic 1995; Goldman-Rakic et al. 1990; Petrides 1991). Evidence for working memory activity within analogous areas of the human prefrontal cortex comes from functional imaging studies (Fiez et al. 1996; Jonides et al. 1993; McCarthy et al. 1994). Discharge that is sustained through the delay period also has been identified in the caudate (Hikosaka et al. 1989b; Schultz and Romo 1992) and SNr (Hikosaka and Wurtz 1983) and in the thalamus (Fuster and Alexander 1973). Evidently neural correlates of spatial working memory and serial processing are found in many of the same areas of the CNS. Indeed, it has been suggested that the mechanisms providing temporal integration in sequencing tasks be viewed as extensions of those providing working memory representations in delayed-response tasks (Fuster 1985; Goldman-Rakic 1987).
In this paper, we present a neural network model of cortical-basal ganglionic processing that focuses on the transformation of sequential sensory input into spatial patterns of neural activity, an operation that we refer to as encoding. Although we do not model it here, we will refer to the inverse transformation, from a spatial pattern to a sequence of movements, as a decoding operation. Some means of encoding the serial order of events or perceptions and for decoding the result into appropriate actions clearly is required for the performance of most of the tasks discussed in the previous paragraphs. The model presented here demonstrates how the encoding process might be a natural outcome of the basic anatomy and physiology of the basal ganglia and cerebral cortex. As a test of the model, we compare its responses with the single-unit responses of neurons recorded from the prefrontal cortex and basal ganglia during the instruction and delay phases of delayed-sequence tasks.
The encoding model presented here is an implementation of the conceptual model of cortical-basal ganglionic processing proposed by Houk and Wise (1995). These authors based their conceptual model on the modular anatomic organization of “parallel loops” linking the frontal cortex, basal ganglia, and thalamus, originally conceived by Alexander, DeLong, and Strick (1986) and supported by recent transsynaptic labeling studies (Middleton and Strick 1997a). The present encoding model deals specifically with the loop through area 46 in the prefrontal cortex, through caudate nucleus (CD), internal segment of the globus pallidus (GPi), thalamus (T), and back to the PF. We follow Wise and Houk (1994) in assuming that this macroscopic module is itself composed of an array of similarly organized microscopic modules. Thus the (microscopic) module illustrated in Fig. 1 follows the basic anatomic plan of the prefrontal cortical-basal ganglionic loop.
The first stage consists of convergent excitatory projections from a large number of cells in the cerebral cortex (C) onto a medium spiny neuron within the caudate nucleus (CD) of the neostriatum. Portions of the prefrontal cortex, in particular areas 9, 10, and 46, project preferentially to the dorsolateral head of the caudate (Selemon and Goldman-Rakic 1985, 1988). Each medium spiny neuron receives input from ∼10,000 different corticostriatal afferents (Wilson 1995). This highly convergent neuronal architecture, together with the physiological properties of the cells, led Houk and Wise (1995) to suggest that spiny neurons are positioned ideally for detecting contextual events of behavioral significance. With respect to the instructional phase of a delayed-response task, contextual event detection might involve the recognition of stimulus-related signals conveying an instructional cue's spatial position, identity, or other physical characteristics. In a serial task, context also would include intrinsic signals such as working memory representations of previous stimuli.
There is some disagreement regarding the cortical origins of projections to a given volume of the striatum (Wise et al. 1996). One hypothesis favors convergent input from cells in functionally related, yet distinct, cortical areas (Flaherty and Graybiel 1993; Parthasarathy et al. 1992; Yeterian and Van Hoesen 1978), whereas another favors convergence from neighboring cells in a single cortical area (Selemon and Goldman-Rakic 1985; Strick et al. 1995). Either anatomic arrangement would provide the convergence of sensory and recurrent projections onto the CD layer as required by the model. Corticostriatal projections from the prefrontal cortex and several of its reciprocally linked areas (e.g., posterior parietal, orbitofrontal, anterior cingulate, and superior temporal cortex) converge in a general way onto the same volume of caudate, although the predominant pattern is one of segregation or interdigitation of terminal fields as opposed to frank intermixing (Selemon and Goldman-Rakic 1985). Alternatively, cue-related sensory signals in posterior parietal cortex might be relayed to CD units via the sensory-related cells in the PF through cortical-cortical projections (Bates and Goldman-Rakic 1993; Selemon and Goldman-Rakic 1988). What is important to note here is that either mechanism of convergence could be used to provide the model's caudate layer with sensory-related input information.
Continuing on to the next layer of the loop, spiny neurons in the head of the caudate make inhibitory synapses (depicted as “a” in Fig. 1) with neurons in the dorsomedial one-third of the GPi (Hedreen and DeLong 1991), which in turn project to nuclei of the thalamus including ventralis anterior (VA) and ventralis lateral (VL) (DeVito and Anderson 1982). Neurons in the GPi are characterized by a high rate of tonic activity interspersed with momentary pauses due to spiny neuron firing episodes (Wilson 1990). The tonic activity inhibits projection targets in the thalamus, and the pauses produce a disinhibition of thalamic neurons (Deniau and Chevalier 1985). This disinhibition initiates a postinhibitory rebound discharge response within thalamic relay neurons that is mediated, in part, by low-threshold T-type calcium channels (Wang et al. 1991). Thus the dual inhibitory action of this pathway serves to activate thalamic discharge through disinhibition (Deniau and Chevalier 1985).
VA and VL, along with other thalamic nuclei including the medialis dorsi (MD), contain neurons that project ipsilaterally back to the PF to close the cortical-basal ganglionic loop (DeVito and Anderson 1982). An additional loop is formed by neurons in area 46 of the PF that project in a reciprocal manner back to several thalamic nuclei including MD and VA (Jacobson et al. 1978; Siwek and Pandya 1991). It has been suggested that such a cortical-thalamic loop has the potential, given sufficient gain, for sustaining activations, like those thought to be correlates of working memory, through positive feedback (Dominey and Arbib 1992; Hikosaka 1989; Houk and Wise 1995).
There is also an indirect pathway through the basal ganglia that is not depicted in Fig. 1 because it is not simulated in the present rendition of the encoding model.
Sequence encoding with an array of modules
The delayed sequence task begins with an instructional period during which three cues are illuminated in a particular serial order (Barone and Joseph 1989; Funahashi et al. 1993; Kermadi and Joseph 1995; Kermadi et al. 1993). After a short delay period, the subject is required to touch the cues in the same order in which they were illuminated. Because the present model focuses on the encoding problem, we will only consider the instruction and delay phases of the task.
The encoding model (Fig. 2) combines several modules of the type shown in Fig. 1 into an interacting array. The PF layer is composed of event-related (E) and recurrent (R) neurons. The three event-related units (labeled A, B, and C in Fig. 3) provide the model with a labeled-line representation of the instruction sequence. To simulate the onset and offset of individual cue lights, E units are toggled sequentially on and off. This type of signal resembles that of visual fixation neurons of the posterior parietal cortex; these neurons respond to the onset of the stimulus and give brisk discharges that continue as long as the stimulus remains within the receptive field (Goldberg and Colby 1989). Neurons in area 7a respond to the retinal location of a visual stimulus with receptive fields that are typically unimodal and broadly tuned (Robinson et al. 1978). Such cue-related signals could be conveyed to cells of the prefrontal cortex via corticocortical projections. Clearly, the model's labeled-line inputs do not exploit much of the rich information contained in parietal responses; however, this simplification allows us to focus on the ordinal, rather than spatial, aspects of the encoding task.
Corticostriatal afferents make en passant synapses with spiny neurons (Wilson 1990); this serves to distribute information to CD units across the entire modular array. There is a mixture of input from the E units described in the previous paragraph and input from the R units in the PF cortex, so named because they receive recurrent input from the model's processing modules. The R inputs provide each module access to sustained cortical-thalamic activity, representing results obtained from the processing of prior events. Thus each CD unit is presented a spatial pattern of input representing both present events and context signals based on the processing of prior events.
The modules also compete through the inhibitory collaterals of caudate spiny neurons (shaded region in Fig. 2). Striatal competition is strongly suggested by the preponderance of medium spiny neurons, by the extent of the axonal arborizations of their collaterals, and by some physiological evidence (Groves 1983; Katayama et al. 1981; Rebec and Curtis 1988; Wilson 1995). Wickens (1993) has modeled spherical zones of mutual inhibition that he calls inhibitory domains. We instead model competitive interactions with a fully connected network of inhibitory CD units. The use of a single domain is a simplification that neglects the potential for more complex interactions.
Neurons were modeled as single membrane-bound compartments with passive leakage conductances. A first-order differential equation relates the membrane leakage current and synaptic currents to the membrane potential for a neuron, j (Eq. 1).
[Equation 1]
The passive electrical properties of the model's neurons are representative of those reported for the cortex, striatum, and thalamus (Connors et al. 1982; McCormick and Huguenard 1992; McCormick et al. 1985; Wilson 1990). A membrane capacitance (C) value of 0.5 nF and leakage conductance (g_L) of 0.0333 μS gives each neuron a time constant of 15 ms. Resting potential (E_L) was set to −60 mV. The membrane leakage currents are defined by Eq. 2.
[Equation 2]
The model represents synapses as scalar weights (w_j,k) between neurons k and j. Making the simplifying assumption that inputs sum in a linear fashion, we lump the action of many synapses into a single current. The weighted sum of presynaptic firing rates gives the synaptic current (Eq. 3).
[Equation 3]
A sigmoidal activation function (Eq. 4) with a threshold (V_th) of −55 mV is used to convert membrane potential into an output firing rate within a normalized range between 0 and 1. In the CD layer, a large slope parameter, b in Table 2, was used to model the sharp transitions between “up” and “down” states displayed by striatal spiny neurons (Wilson 1995).
[Equation 4]
The caudate layer of the module receives convergent excitatory inputs from neurons of the PF cortex, modeled by Eq. 3. In addition, CD units compete through the inhibitory action of GABAergic collaterals. The total inhibitory current for each CD unit is determined by scaling the sum of the activations of all other CD units in the layer; CD units do not receive self-inhibitory input.
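The single-compartment description above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the authors' C++ simulator: the logistic form of the sigmoid, the default slope value, the sign convention in `dv_dt` (positive drive depolarizes), and all function names are assumptions. Only the capacitance, leakage conductance, resting potential, and threshold are taken from the text.

```python
import math

# Parameters quoted in the text.
C_NF = 0.5        # membrane capacitance, nF
G_L_US = 0.0333   # leakage conductance, uS
E_L_MV = -60.0    # resting potential, mV
V_TH_MV = -55.0   # sigmoid threshold, mV


def time_constant_ms():
    """Passive membrane time constant C / g_L (nF / uS gives ms)."""
    return C_NF / G_L_US


def firing_rate(v_mv, b=1.0):
    """Normalized rate in [0, 1]; the logistic form and b (1/mV) are assumed."""
    x = b * (v_mv - V_TH_MV)
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # written this way to avoid overflow for steep slopes
    return e / (1.0 + e)


def dv_dt(v_mv, drive_na):
    """Leaky membrane, mV/ms. ASSUMED convention: positive drive depolarizes."""
    leak_na = G_L_US * (v_mv - E_L_MV)
    return (-leak_na + drive_na) / C_NF


def cd_inhibition(rates, scale):
    """Collateral inhibition per CD unit: scaled sum of all OTHER units' rates."""
    total = sum(rates)
    return [scale * (total - r) for r in rates]
```

With these values `time_constant_ms()` evaluates to about 15 ms, matching the figure quoted in the text, and subtracting each unit's own rate from the layer total implements the stated absence of self-inhibition.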
Pallidal neurons were modeled with a spontaneous firing rate of 0.5 using a bias current (−0.1665 nA) that depolarizes the membrane potential to V_th. At V_th, the output of the GPi unit is maximally responsive to inhibitory input from the CD layer. The synaptic weights between CD and GPi layers were adjusted such that each CD input strongly inactivated its GPi target.
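The magnitude of the quoted bias current can be checked against the passive parameters: holding a leaky membrane 5 mV above rest requires a steady current equal to the leakage conductance times that offset. The short check below uses only values stated in the text; the function name is illustrative, and reading the negative sign of the quoted current as an inward-current-negative convention is an assumption.

```python
G_L_US = 0.0333   # leakage conductance, uS
E_L_MV = -60.0    # resting potential, mV
V_TH_MV = -55.0   # threshold potential, mV


def holding_current_na(v_target_mv):
    """Magnitude (nA) of steady current that balances the leak at v_target."""
    return G_L_US * (v_target_mv - E_L_MV)


bias_magnitude = holding_current_na(V_TH_MV)
print(round(bias_magnitude, 4))  # 0.1665
```

Because the sigmoid is centered on the threshold, a unit held at the threshold potential fires at half its maximal rate, which is the spontaneous rate of 0.5 given for the GPi units.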
Thalamic relay neurons display postinhibitory rebound behavior mediated by T-type calcium currents (McCormick and Pape 1990). This rebound current permitted firing in response to pauses in the inhibitory input from GPi. It was modeled as specified by Wang et al. (1991).
[Equation 5]
The voltage dependence, a, of the steady-state activation and inactivation gates m and h was modeled with the Boltzmann equation (Eq. 6).
[Equation 6]
The constants for these curves were set at physiologically plausible values noted in Table 1. Both of the channel's gating variables follow first-order kinetics with voltage-dependent time constants (Wang et al. 1991).
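A rough sketch of the steady-state gating curves and the resulting current is given below. The general Boltzmann form is standard, but everything specific here is an assumption: the exponent on m (taken as 3, as is common in models of this current), the half-maximal voltages, slopes, conductance, and calcium reversal potential are illustrative placeholders, not the values from Table 1.

```python
import math


def boltzmann(v_mv, v_half_mv, k_mv):
    """Steady-state gate value; k < 0 rises with V (activation), k > 0 falls."""
    return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / k_mv))


def t_current_na(v_mv, m, h, g_t_us, e_ca_mv, m_power=3):
    """I_T = g_T * m**p * h * (V - E_Ca); p = 3 is an ASSUMED exponent."""
    return g_t_us * (m ** m_power) * h * (v_mv - e_ca_mv)


# HYPOTHETICAL constants for illustration only (the paper's are in Table 1).
M_HALF, M_K = -63.0, -7.8   # activation gate m
H_HALF, H_K = -83.5, 6.3    # inactivation gate h

h_hyperpolarized = boltzmann(-76.0, H_HALF, H_K)  # under tonic GPi inhibition
h_rest = boltzmann(-60.0, H_HALF, H_K)            # near resting potential
```

Under these placeholder constants the inactivation gate is far more available at the hyperpolarized potential than near rest, which is the property that lets a pause in pallidal inhibition trigger a rebound.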
The inhibitory weights between GPi and T units were adjusted such that T units remained hyperpolarized at −76 mV under inhibition from tonically active pallidal units. This hyperpolarized membrane potential results in a strong rebound response from the calcium channel. The recurrent excitatory weights from T to PF and back were selected such that they would produce sustained cortical-thalamic firing rates once the PF unit was activated. All synaptic weights are listed in Table 2.
Alternate model assumptions
Most of the simulations reported in this paper used the model of the synaptic current detailed above to calculate excitatory and inhibitory synaptic currents from synaptic weights and presynaptic firing rates. This approach ignores the nonlinear effects of membrane potential on synaptic current values and thus treats the synapse as a “current source.” To explore the limitations of the “current-source” synapse assumption, simulations were run with a more physiological synaptic model that treats the weighted sum of the presynaptic firing rates as a synaptic conductance. These excitatory (Eq. 7) and inhibitory (Eq. 8) conductance values are converted into currents by multiplying them by the difference between the membrane potential and the applicable synaptic reversal potential (Eq. 9).
[Equation 7]
[Equation 8]
[Equation 9]
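The difference between the two synaptic assumptions can be illustrated with a small sketch. The function names and the reversal potentials used in the example calls (0 mV excitatory, −80 mV inhibitory) are hypothetical; the text states only that conductances are weighted sums of presynaptic rates and that currents scale with the distance from the reversal potential.

```python
def current_source(weights, rates):
    """Current-source synapse: weighted sum of rates, independent of V."""
    return sum(w * r for w, r in zip(weights, rates))


def conductance_current(v_mv, w_exc, r_exc, w_inh, r_inh, e_exc_mv, e_inh_mv):
    """Conductance synapse: g_e (V - E_e) + g_i (V - E_i)."""
    g_e = sum(w * r for w, r in zip(w_exc, r_exc))
    g_i = sum(w * r for w, r in zip(w_inh, r_inh))
    return g_e * (v_mv - e_exc_mv) + g_i * (v_mv - e_inh_mv)


# Same excitatory input at two membrane potentials (HYPOTHETICAL E_e = 0 mV).
i_far = conductance_current(-60.0, [1.0], [1.0], [], [], 0.0, -80.0)
i_near = conductance_current(-30.0, [1.0], [1.0], [], [], 0.0, -80.0)
```

In the conductance form the excitatory current shrinks as the membrane approaches the reversal potential, a saturation that the current-source form cannot express.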
An object-oriented simulator was written using the C++ programming language. The simulations were performed using batch processes running across a group of 30 Hewlett-Packard workstations (HP 712/80i, HP 715/50, and HP 715/33). The nonoverlapping cue presentation paradigm was modeled after the approach used in the caudate studies by Kermadi et al. (Kermadi and Joseph 1995; Kermadi et al. 1993). The task is simulated by sequentially toggling the activation of the model's event-related (E) neurons on and then off (Fig. 2). In the Kermadi paradigm, consecutive cues are illuminated for 800 ms at 1,500-ms intervals. However, the time necessary for the network to reach equilibrium was much less than the 800 ms between changes in the state of the cues and varied considerably according to the magnitude of the corticostriatal weights. To minimize the amount of wasted simulation time, the original paradigm was modified so that each of the three onsets and offsets of the cue sequence was triggered as soon as the network settled into a stable equilibrium.
The model equations were solved numerically using a fourth-order Runge-Kutta method with an adjustable time step ranging between 0.1 and 1.0 ms as a function of the magnitude of the first-order Runge-Kutta term. During a time step, each of the model's layers was synchronously updated in the order CD, GPi, T, and PF. Time steps were small in comparison with the time constants of network equilibration.
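The integration scheme can be sketched as follows. The classical fourth-order Runge-Kutta step is standard, but the rule mapping the first-order term to a step size is not specified in the text, so `choose_dt` (step inversely proportional to the magnitude of the first-order term, clamped to 0.1–1.0 ms) is an assumption, as are the function names and the test problem.

```python
def rk4_step(f, t, y, dt, k1):
    """One classical RK4 step; k1 = f(t, y) is computed by the caller."""
    k2 = f(t + dt / 2, y + dt * k1 / 2)
    k3 = f(t + dt / 2, y + dt * k2 / 2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def choose_dt(k1, dt_min=0.1, dt_max=1.0, scale=1.0):
    """ASSUMED rule: dt = scale / |k1|, clamped to [dt_min, dt_max] ms."""
    if k1 == 0:
        return dt_max
    return max(dt_min, min(dt_max, scale / abs(k1)))


def integrate(f, y0, t_end):
    """Integrate dy/dt = f(t, y) from t = 0 to t_end with an adjustable step."""
    t, y = 0.0, y0
    while t_end - t > 1e-9:
        k1 = f(t, y)
        dt = min(choose_dt(k1), t_end - t)
        y = rk4_step(f, t, y, dt, k1)
        t += dt
    return y


def membrane(t, v):
    """Passive membrane (tau = 15 ms) driven toward -55 mV, in mV/ms."""
    return (-(v + 60.0) + 5.0) / 15.0


v_final = integrate(membrane, -60.0, 300.0)
```

After 300 ms, twenty time constants, the test membrane has settled at the driven potential of −55 mV. A full simulator would apply the same step to every unit, updating the layers in the stated order of CD, GPi, T, and PF.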
- PF: prefrontal cortex
- CD: caudate nucleus
- GPi: internal segment of globus pallidus
- max: maximum random synaptic weight
- range: range of random synaptic weight distribution
- membrane potential
- C: membrane capacitance
- leakage current
- g_L: leakage conductance
- E_L: membrane resting potential
- synaptic current
- excitatory synaptic weight
- inhibitory synaptic weight
- V_th: threshold potential
- presynaptic firing rate
- b: slope of sigmoidal activation function
- low-threshold calcium T-type current
- calcium reversal potential
- m: activation gating variable
- h: inactivation gating variable
- maximum T-type conductance
- steady-state activation/inactivation of m and h gates
- half-maximal voltage
- Boltzmann equation slope parameter
- excitatory synaptic conductance
- inhibitory synaptic conductance
- excitatory synaptic reversal potential
- inhibitory synaptic reversal potential
Responses of an isolated module
The response of an isolated module (Fig. 1) to a single cue input serves to illustrate the model's basic processing operations. In its initial resting state, the module's units are quiescent except for the GPi unit, which exists in a tonic state of moderate activation (Fig. 3, GPi). A single event-related input is pulsed on and then off to simulate the onset and offset of an instructional cue (Fig. 3, Cue). This input induces the spiny unit in the CD layer to fire phasically. The short burst of CD activity (Fig. 3, CD) produces a momentary pause in GPi activity (Fig. 3, GPi), thus releasing the T unit from a state of tonic inhibition. Transient removal of pallidal inhibition produces a slow depolarization of the T unit with dynamics initially dominated by the passive properties of the unit (Fig. 3, T). This slow depolarization allows the activation variable, m, to increase, creating an inward calcium spike, which then quickly depolarizes the T unit, driving it into an activated state (Fig. 3, T). The C unit, which receives an excitatory input from the T layer, subsequently begins to fire ∼32 ms after the CD unit first crosses its firing threshold (Fig. 3, C). Most of the signal pathway's 32-ms delay is due to the kinetics of the thalamic T-type calcium channel. Reciprocal excitatory inputs from the C unit stabilize the membrane potential of the T unit at a level above threshold as it begins to repolarize. The reciprocal system quickly latches into a state of sustained activation that is maintained even after the return of tonic GPi inhibition. The cortical-thalamic loop is a bistable system because it has two stable equilibrium states (activated and inactivated) at moderate levels of pallidal input. Corticothalamic bistability is one of the key computational features of the Houk and Wise module. The transition from the activated state back to the inactive state requires a burst of inhibitory input to this bistable loop. Such a burst response could be effected by a burst of excitatory input to the GPi from the subthalamic nucleus (STN) of the indirect pathway. Presumably, this burst of STN activity would reflect activation of other spiny units within the striatum. The present simulation does not include this mechanism.
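The latching behavior can be illustrated with a deliberately reduced toy: a single rate variable with positive feedback standing in for the whole cortical-thalamic loop. This is not the model's membrane-level implementation. The feedback weight, threshold, gain, and input amplitudes are hypothetical values chosen only so that the loop has two stable states; the 15-ms time constant is borrowed from the text.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_loop(inputs, r0=0.0, w=1.0, theta=0.5, gain=10.0,
                  tau_ms=15.0, dt_ms=1.0):
    """Euler steps of tau dr/dt = -r + sigmoid(gain * (w * r + I - theta))."""
    r, trace = r0, []
    for i_ext in inputs:
        target = sigmoid(gain * (w * r + i_ext - theta))
        r += dt_ms / tau_ms * (target - r)
        trace.append(r)
    return trace


rest = [0.0] * 200    # no external drive (ms steps)
pulse = [1.0] * 50    # brief excitatory drive, e.g. a thalamic rebound
burst = [-1.0] * 50   # brief inhibitory drive, e.g. a burst of GPi firing

trace = simulate_loop(rest + pulse + rest + burst + rest)
before_pulse, after_pulse, after_burst = trace[199], trace[449], trace[-1]
```

In this toy the rate stays low until the pulse, remains high after the pulse ends, and returns to the low state only after the inhibitory burst, mirroring the activated and inactivated equilibria described for the loop.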
Interacting array of modules: emergence of spatial patterns
Sensory inputs produce sustained activations within cortical-thalamic loops and thus alter the internal states of individual modules, as illustrated in the previous section (Fig. 3). In an array of modules (Fig. 2), internal state information serves as an additional input modality to the CD layer. These recurrent (R) inputs provide information about past events that can influence future states of the model, thus providing a way of linking temporally spaced sensory inputs.
An array of 30 modules was initialized with randomly distributed corticostriatal weights and examined during the sequential cue paradigm. Figure 4, top, displays the cue inputs for the instruction sequence ABC. As mentioned in the previous section, the event-related units provide the network with labeled-line representations of cue stimuli. Accordingly, their time traces simply reflect the state of the three instruction cues during the sequence presentation.
Figure 4, middle, displays the response of the CD layer. Recall from Fig. 3 that the response of the GPi layer, though inverted, is very similar to the CD layer response. After the onset of the first cue in the sequence, the competitive CD layer settles into an equilibrium state defined by the activation of CD units 11 and 26. Cue offset often results in a resettling of the network into a second equilibrium consisting of a different group of winning CD units. In fact, over the course of the three-cue sequence, the network will often settle into six distinct equilibriums. This competitive settling produces a mixture of short and long bursts, as well as phasic response dynamics where, in fact, sustained responses in the CD and GPi layer are the exception rather than the rule. This result agrees with experiments in the caudate (Hikosaka et al. 1989a) and SNr (Hikosaka and Wurtz 1983) reporting that 10% (80/867) and 16% (15/95) of task-related neurons in these areas, respectively, display sustained responses.
During the competitive settling phase, a CD unit that becomes active for a significant time period will induce a rebound response in its module's thalamic relay neuron, leading to sustained activation of its cortical-thalamic loop. The PF layer response is displayed in the bottom set of traces in Fig. 4. In comparing activity in the CD and PF layers, note that while activated CD units may become deactivated during this time, the activity of the PF units, once elevated, is maintained by positive feedback within the bistable cortical-thalamic loops. Thus the PF activations provide a spatial record of significant CD activity. In this example of a simulated trial, the spatial code for the sequence ABC involves a pattern of 12 activated PF units (units 1, 3, 5, 8, 9, 10, 11, 12, 16, 23, 25, and 26).
In addition to providing a spatial record of CD activity, the PF patterns also can provide an unambiguous encoding of the input sequence. To demonstrate this computational property, a group of six sequences of three cues A, B, and C (i.e., ABC, ACB, BAC, BCA, CAB, and CBA) was presented to a network of 30 modules. The 15 rows of the prefrontal activation diagram (Fig. 5, gray squares indicate active units) represent the equilibrium state of the PF layer at each stage of the cue presentation for this group of six sequences. Together, the six sequences present the network with 15 different sequential contexts (A, B, C, AB, AC, BA, BC, CA, CB, ABC, ACB, BAC, BCA, CAB, and CBA). The patterns of active PF units comprising each row of the diagram can be thought of as spatial representations, or encodings, of the 15 sequential contexts. Note that the PF units in Fig. 5 display 15 distinct patterns of activation in response to the 15 different sequential contexts. These distinct responses represent an unambiguous encoding of the serial information presented by the set of instruction sequences.
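The count of 15 sequential contexts follows from taking every prefix of every ordering of the three cues. The sketch below enumerates them and states the unambiguity criterion as a check that no two contexts share a PF pattern. The function names and the representation of a pattern as a tuple of 0/1 values are illustrative choices, not taken from the paper.

```python
from itertools import permutations


def sequential_contexts(cues="ABC"):
    """All distinct prefixes of all orderings of the cues, shortest first."""
    contexts = set()
    for seq in permutations(cues):
        for n in range(1, len(seq) + 1):
            contexts.add("".join(seq[:n]))
    return sorted(contexts, key=lambda c: (len(c), c))


def is_unambiguous(patterns):
    """patterns maps context -> tuple of 0/1 PF activations."""
    return len(set(patterns.values())) == len(patterns)


contexts = sequential_contexts()
print(len(contexts))  # 15
```

The same distinct-pattern count, taken over the 15 contexts, is the quantity used to score a network in the parameter studies.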
Note that the spatial code of PF activation is relatively dense; indeed, each of the six sequences (bottom 6 rows of Fig. 5) engages between 11 and 18 units. We will explore this issue further in the next section. Next, note that increasing numbers of PF units are engaged as each sequence progresses, thus providing increasing amounts of recursive input to the CD layer. The number of activated CD units at any given instant is determined by a balance between the level of striatal inhibition and the total number of active corticostriatal inputs. Thus as the sequence progresses, and increasing amounts of input are supplied by the recurrent frontal projections, a greater number of CD units becomes activated. This increase in PF input stabilizes the network, making it less sensitive to future sensory inputs from the posterior parietal layer.
Tolerance to random corticostriatal weights
The corticostriatal weights of the network discussed in the previous section were selected randomly from a uniform distribution of values spanning the closed-positive interval defined by a maximum (max) and range (range) of weight values. The network was instantiated and tested at several combinations of max and range until it produced an unambiguous set of PF patterns in response to the 15 sequential contexts. An instantiation of the network that produced such “perfect” performance of the task was found within the first few instantiation attempts. Given the ease with which appropriate distribution parameters were established in this initial study, it appeared quite likely that other combinations of max and range also might produce perfect networks.
To explore this hypothesis, the network was instantiated with weights drawn from uniform distributions defined by 5,564 different combinations of max and range. For each instantiation, the network was tested with the 15 serial contexts and the number of distinct PF patterns produced was recorded. The color of each pixel in Fig. 6 A indicates the number of distinct PF patterns produced by an instantiation of the network at a particular combination of maximum and range. Note that random weight distributions defined by combinations of max and range such that range > max were not tested because they allow for negative (i.e., inhibitory) values of corticostriatal weight.
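One way to read the weight distribution is as a uniform draw on the interval from max − range to max, which is consistent with the remark that range > max would admit negative weights. The sketch below follows that reading. The function name, the seed, and the choice of 33 inputs per CD unit (3 event-related plus 30 recurrent units) are assumptions made for illustration.

```python
import random


def sample_weights(n_units, n_inputs, w_max, w_range, rng):
    """Uniform weights on [w_max - w_range, w_max] (ASSUMED interval)."""
    if not 0.0 <= w_range <= w_max:
        raise ValueError("range must lie between 0 and max")
    low = w_max - w_range
    return [[rng.uniform(low, w_max) for _ in range(n_inputs)]
            for _ in range(n_units)]


rng = random.Random(0)  # fixed seed so that an instantiation is reproducible
weights = sample_weights(30, 33, 2.5, 1.0, rng)
```

Setting the range to zero makes every weight equal to max, the degenerate symmetric case in which no CD unit can win the competition.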
There are several distinctive features of this color map. First, much of the valid parameter space appears to be effective, yielding networks that produce distinct codes for 13–15 of the sequential contexts. The left edge of this effective area is fairly distinct, indicating a sharp transition between ineffective and effective parameter values. Note that there is a lower limit to the max weight value (∼2.0), below which the network fails to produce distinct codes. Maximum weights below this value result in CD units with synapses too weak to produce suprathreshold responses, and thus produce no PF patterns. At the other end of the scale, large values of the max parameter result in networks with overactivated CD layers, thus completely activating, that is, saturating, the PF layer.
Note that when the range equals zero, the network produces no distinct patterns. This is a result of the symmetrical inhibition within the striatum. With all weights identical, and no noise within the system, there can be no winning neuron within the striatum. As a result, either all or none of the CD units become activated depending on the value of the max.
Individual instantiations of the network with the same max and range parameters will perform differently on the task because the weights are assigned randomly. This accounts for much of the variation in pixel color across the effective parameter region. To get a better picture of the parameter combinations leading to perfect networks, each of the 5,564 combinations was tested with 10 network instantiations—for a total of 55,640 network instantiations. Figure 6 B indicates all parameter combinations which produced “perfect” performance in ≥1 of 10 instantiations. Note that the parameter combinations producing perfect performance do not fall along the line of maximum range (i.e., the main diagonal of the figure). This result may be due to the fact that maximal range values produce a greater proportion of near-zero weight values. Rather than contributing to distinct network responses, these near-zero weights might be largely useless.
Two summary measures were used to describe the PF patterns produced by the perfect networks. First, the mean number of PF units activated by the six sequences of three cues was 14.64 out of 30 units, with a standard deviation of 0.23 units. This is a fairly dense coding scheme. Second, the average vector cosine between the six PF patterns gives a measure of the similarity of the patterns produced by the group of six sequences. The relatively high mean value of this cosine (0.643 ± 0.009, mean ± SD) indicates that the PF patterns produced by the network are oriented, on average, 50° apart within their 30-dimensional vector space. This would represent a considerable amount of overlap between sparse codes, but given the density of this code, the patterns are reasonably uncorrelated.
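As a minimal sketch of the similarity measure (not the original analysis code), the mean pairwise cosine and the angle implied by the reported value can be computed as follows. The function names and the two toy patterns are illustrative assumptions.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine of the angle between two activation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(patterns):
    """Average cosine over all unordered pairs of patterns."""
    pairs = list(combinations(patterns, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy binary patterns (hypothetical): two active units each, one shared.
toy_a = [1, 1, 0, 0]
toy_b = [1, 0, 1, 0]
toy_cosine = cosine(toy_a, toy_b)  # about 0.5

# Angle implied by the reported mean cosine of 0.643.
angle_deg = math.degrees(math.acos(0.643))
print(round(angle_deg, 1))  # 50.0
```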
Analysis of receptive fields
Let us return to the PF activation diagram (Fig. 5) introduced earlier to illustrate the model's ability to encode sequential inputs within spatial patterns of sustained activation. It is also interesting to examine Fig. 5 in a column-wise fashion—the perspective of a physiologist recording the receptive fields of single units. However, first it is advantageous to make a slight modification to the diagram. Recall that PF units, once activated, remain active through the remainder of the sequence due to sustained cortical-thalamic feedback. Accordingly, with the exception of rows A, B, and C, the rows of Fig. 5 represent responses to the current cue as well as sustained activations produced by previous cues within the sequence. For example, unit 2 in Fig. 5 begins firing when cue C arrives first in a sequence and sustains this activation through the presentation of subsequent cues A and B. These “working memory” squares of the plot, in this case those corresponding to CA, CB, CAB, and CBA, are redundant and thus obscure the responses defining a unit's receptive field. In Fig. 7, the working memory activations are eliminated, and thus each column of the plot can be thought of as a binary vector defining the “receptive field” of a PF unit.
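The removal of working memory activations can be sketched as a simple prefix filter. This is an illustrative reconstruction, assuming that each serial context is written as a string of cue letters and that a unit's sustained activity is given as a set of such strings; the names `CONTEXTS` and `receptive_field` are hypothetical.

```python
from itertools import permutations

CUES = "ABC"
# The 15 serial contexts: every ordered sequence of 1, 2, or 3 distinct cues.
CONTEXTS = ["".join(p) for n in (1, 2, 3) for p in permutations(CUES, n)]

def receptive_field(sustained):
    """Keep only the contexts in which the unit first turns on.

    `sustained` is the set of contexts in which the unit is active,
    including activity carried over from earlier cues in the sequence.
    """
    return {
        ctx for ctx in sustained
        if not any(ctx[:k] in sustained for k in range(1, len(ctx)))
    }

# Unit 2 of the text: fires when C comes first and stays on afterwards.
unit2 = {"C", "CA", "CB", "CAB", "CBA"}
print(sorted(receptive_field(unit2)))  # ['C']
```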
Examining the columns of Fig. 7, notice that a few of the units (2 and 9) respond to only 1 of the 15 serial contexts. Such responses, which we refer to as “simple,” can be grouped into one of three response types: Rank1(X), Seq2(XY), and Seq3(XYZ) [Note that in our notation, (X) refers generically to any 1 of the 3 cues, (XY) to the 6 sequences of length 2, and (XYZ) to the 6 sequences of length 3]. All three types of simple responses have been reported in the single-unit literature. For example, unit 2, which only responds to cue C as the first cue in a sequence, is sensitive to the serial rank of the cue. This type of response, which we refer to as Rank1(X), has been observed in the PF (Funahashi et al. 1993) and frontal eye field (FEF) (Barone and Joseph 1989), caudate (Kermadi and Joseph 1995; Kermadi et al. 1993), and GP (Mushiake and Strick 1995) during the presentation phase of delayed sequencing experiments. Unit 9, which responds to cue C when it is preceded by the sequence AB, displays a sequence dependence we term Seq3(XYZ). Other instantiations produced units with Seq2(XY) responses. This type of unit has been identified experimentally in the FEF (Barone and Joseph 1989) and caudate (Kermadi and Joseph 1995; Kermadi et al. 1993).
Many of the units in Fig. 7 respond to more than one serial context; we refer to these as “compound” receptive fields. For example, unit 18 responds to cue A, independent of serial rank or context. This compound receptive field, which we term Cue(X), is composed of a mixture of Rank1(X), Seq2(YX), Seq2(ZX), Seq3(YZX), and Seq3(ZYX) responses. Such a response is similar to the spatial working memory responses of the dorsolateral PF (Funahashi et al. 1989, 1993). Cue-related responses also have been recorded in the caudate during spatial-delayed sequencing; however, in contrast to their PF counterparts, they have phasic activations (Kermadi and Joseph 1995; Kermadi et al. 1993).
How many different types of receptive fields are displayed by the model? First, consider the theoretical limit. A binary vector of 15 elements can have 2^15 (i.e., 32,768) distinct configurations; however, if we enforce the structure-based constraint that PF units, once activated, remain active throughout the remainder of the sequence, only 1,000 different “receptive field” vectors exist. Further, many of these 1,000 receptive fields are operationally equivalent. For example, a PF unit that responds to cue A as the first cue in a sequence is operationally equivalent to one responding to B in the same serial position. Eliminating these operational equivalents, only 190 different responses (including the null, or task-insensitive response) are theoretically possible.
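These counts can be checked by brute-force enumeration. The sketch below is illustrative and rests on two assumptions about the counting argument: a receptive field is valid when no context in it extends another context in it (a unit cannot first turn on twice along one sequence), and two fields are operationally equivalent when one is obtained from the other by renaming the cues. The helper names are hypothetical.

```python
from itertools import permutations, product

CUES = "ABC"
CONTEXTS = ["".join(p) for n in (1, 2, 3) for p in permutations(CUES, n)]

def is_valid(field):
    """True if no context in the field has a proper prefix also in the field."""
    return not any(
        ctx[:k] in field for ctx in field for k in range(1, len(ctx))
    )

def canonical(field):
    """Smallest relabeling of a field over all permutations of the cue names."""
    variants = []
    for perm in permutations(CUES):
        table = str.maketrans(CUES, "".join(perm))
        variants.append(tuple(sorted(ctx.translate(table) for ctx in field)))
    return min(variants)

# Every subset of the 15 contexts, written as a frozenset of context strings.
all_fields = [
    frozenset(c for c, bit in zip(CONTEXTS, bits) if bit)
    for bits in product((0, 1), repeat=len(CONTEXTS))
]
valid = [f for f in all_fields if is_valid(f)]
classes = {canonical(f) for f in valid}
print(len(all_fields), len(valid), len(classes))  # 32768 1000 190
```

The figure of 1,000 also follows by hand: for each first cue, a unit either responds to that cue alone or chooses, independently for each of the two continuations, among no response, a length-2 response, and a length-3 response, giving 1 + 3 × 3 = 10 options per first cue and 10^3 in total.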
To test this limit, we developed an algorithm to classify the receptive fields of 20,640 units from a sample of 688 perfect networks. Although all 190 receptive fields were expressed by this sample, some fields were much more common than others. Figure 8 displays the 35 most common receptive fields and sorts them by decreasing frequency beginning with the task-insensitive units in bin 1. Note that Fig. 8 groups operationally equivalent responses, such as Rank1(A) and Rank1(B), into the same bin and arbitrarily represents them by one of these equivalents.
First, a large number of units (bin = 1, n = 3,054) were task insensitive. Among task-related units, the simple responses: Rank1(X) (bin 3, n = 1,049), Seq2(XY) (bin 6, n = 708), and Seq3(XYZ) (bin 4, n = 915) are three of the five most common. However, 85% of the task-related units display compound responses. The logical function performed by some of these compound receptive fields, such as Cue(X) (bin = 35, n = 27) can be understood easily and succinctly stated by higher-level monikers. Other examples include a population of units (bin = 8, n = 512) that responded nondifferentially to the first cue of all sequences. Such a response is recognized easily as a rank-dependent response or pure Rank1 receptive field in our terminology. Units in the PF fitting our Rank1 definition have been attributed to nonspecific preparation, arousal, or attention (Funahashi et al. 1993). Similarly, units (bin = 19, n = 243) responding to all six Seq2(XY) contexts (i.e., AB, AC, BA, BC, CA, and CB) can be classified as pure Rank2 and units (not shown, n = 27) responding to all six Seq3(XYZ) contexts as pure Rank3. However, Rank2 and Rank3 responses have not yet been described in the literature.
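The 85% figure follows from the bin counts quoted above. The short check below is only an arithmetic illustration; the variable names are hypothetical.

```python
total_units = 20_640
task_insensitive = 3_054
simple = {"Rank1(X)": 1_049, "Seq2(XY)": 708, "Seq3(XYZ)": 915}

task_related = total_units - task_insensitive   # 17,586 units
simple_total = sum(simple.values())             # 2,672 units
compound_fraction = 1 - simple_total / task_related
print(f"{compound_fraction:.1%}")  # 84.8%
```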
Although the idea of simple responses combining to yield easily understood compound receptive fields is intuitively appealing, in the case of the model, most receptive fields defy straightforward labels such as Cue or Rank1. Instead, the logic performed by the majority of units often is described most easily by a list of its component receptive fields. For example, the most common task-related unit (bin = 2, n = 1,159) displays an odd mixture of Rank1(A), Rank1(B), Seq2(CA), and Seq2(CB) responses. This type of response might best be classified as a combination of Cue(A) and Cue(B). Similar units with cue-related responses to two different cues have been reported in the caudate (Kermadi and Joseph 1995). Several other compound receptive fields resemble those observed in single-unit studies. For example, two types of units, one (not shown, n = 27) displaying both Rank1(X) and Rank2(X) and the other (not shown, n = 32) displaying a mixture of Rank2(X) and Rank3(X) activity, resemble units reported in the FEF (Barone and Joseph 1989). Units with a combination of Rank2(X) and Rank3(X) activity also have been reported in the caudate (Kermadi and Joseph 1995; Kermadi et al. 1993).
Alternative modeling assumptions
To better understand the dependence of the simulation results on our modeling assumptions, we repeated the group of simulations from Fig. 6 A with an alternative synaptic model, decreased activation function slope, and increased membrane time constants. In general, we found that our results were quite insensitive to changes in these parameters within the GPi, T, and PF layers. However, they were quite sensitive to changes in the CD layer model, and these findings are reported here. Finally, we repeated simulations using a CD layer with different levels of collateral inhibition and also a feed-forward model of caudate inhibition.
ACTIVATION FUNCTION SLOPE.
We explored the sensitivity of our results to the slope parameter, b, of the sigmoidal activation function of the units in the CD layer. The preceding studies all used an extremely steep slope parameter of 50 in the CD layer. Repeating the study of Fig. 6 A with slopes of 5 demonstrated no significant qualitative or quantitative differences in results. However, further reducing the slope to 1 qualitatively changed the shape of the effective coding area and decreased the number of perfect networks to 10 (Fig. 6 C). Note that the network fails to encode any sequences at low to moderate values of range.
The results in Fig. 6 A were also sensitive to the time constant of the CD layer units. Increasing the membrane time constant from 15 to 50 ms (by increasing the membrane capacitance from 0.5 to 1.67 nF) produced a considerably better performance (Fig. 6 D). In addition to displaying a qualitatively larger effective coding region, this set of studies produced 460, as compared with 270, perfect networks.
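The two capacitance values are consistent with a fixed membrane resistance. The check below assumes a passive membrane with time constant τ = RC; the resistance of 30 MΩ is inferred from the quoted numbers rather than taken from the text.

```python
# Assumption: passive membrane, tau = R * C.
tau_old_ms, c_old_nf = 15.0, 0.5
r_mohm = tau_old_ms / c_old_nf        # ms / nF = megaohm (inferred value)
tau_new_ms = r_mohm * 1.67            # megaohm * nF = ms
print(r_mohm, round(tau_new_ms, 1))   # 30.0 50.1
```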
ALTERNATE SYNAPTIC MODELS.
As outlined in methods, the model assumes a “current-source” representation of synaptic action. We repeated the random corticostriatal weight studies of Fig. 6 A with a more realistic synaptic model in the CD layer. In this alternative synaptic model, excitatory (Eq. 7) and inhibitory (Eq. 8) conductance values of the CD layer are converted into currents by multiplying them by the difference between the membrane potential and the applicable synaptic reversal potential (Eq. 9). Figure 6 E displays the results of a study using an excitatory reversal potential, E ex, of 0 mV and an inhibitory reversal potential, E inh, of −90 mV. Although the results of this group of simulations (Fig. 6 E) look qualitatively similar to the current-source results of Fig. 6 A, only 64 perfect networks, compared with 270 in the current-source case, were produced. A further decrease in coding ability was observed in simulations using an inhibitory reversal potential of −70 mV, which only produced 34 perfect networks.
LEVELS OF COLLATERAL INHIBITION.
In both the current-source and more realistic synaptic models, adjustments in the level of inhibition have a scaling effect on the number, location, and spread of perfect networks within the max-range parameter space. Increases in the level of collateral inhibition produce increases in the number of perfect networks and area of the effective region. However, because increases in inhibition also shift the region to larger values of range, and thus a larger area of the max-range parameter space, the overall proportion of perfect networks to imperfect networks stays approximately the same.
Although collateral inhibition is strongly suggested by the existence of GABAergic spiny neuron collaterals as well as by a limited amount of physiological evidence (Groves 1983; Katayama et al. 1981; Rebec and Curtis 1988; Wilson 1995), there is also clear evidence for feed-forward inhibition via GABAergic interneurons (Kitai and Surmeier 1993). To address this possibility, we constructed an alternative model incorporating feed-forward, rather than collateral, inhibition. As before, the feed-forward model was composed of 30 modules coursing through the PF, CD, GPi, and T layers; however, the inhibitory collaterals of the CD layer were omitted. In their place, an additional, and separate, layer of 30 interneuron units was added to provide feed-forward inhibition to the original CD layer. Like the original CD layer, each unit in the interneuron layer received corticostriatal projections from the entire PF layer; however, instead of making inhibitory projections to the GPi layer, each unit sent inhibitory projections to all of the units of the original CD layer.
The feed-forward version of the model presented dynamics that were radically different from those of the competitive model. For example, the activation levels of the CD units tended to move in unison rather than in competitive opposition. Often when only a portion of the CD layer was active, the remainder of the units huddled only a few millivolts below threshold. A slight increase in corticostriatal input via the recurrent PF units often would drive all of these units above threshold, thereby saturating the PF layer. In other words, activation of the CD layer was often an all-or-none proposition. To minimize this effect, we found it necessary to reduce the synaptic weights of the recurrent PF input to ∼1/100 of that of the sensory units.
Figure 6 F indicates all parameter combinations that produced “perfect” performance in ≥1 of 10 instantiations. Comparing Fig. 6, B and F, note that although the feed-forward version of the model is capable of encoding sequences, it does so within a region of the parameter space that is smaller and distinctly different in shape. Like the case with low activation function slopes (Fig. 6 C), the feed-forward model fails to resolve small differences in corticostriatal weight and thus fails to produce perfect networks in regions of low range. In addition, fewer instantiations of the feed-forward model produce perfect performance on the encoding task. Both of these results suggest that the range of appropriate corticostriatal weights is much smaller for the feed-forward version.
These simulation results suggest that the circuits linking the basal ganglia, thalamus, and cortex have an inherent capacity for encoding the serial order of events. Whereas we specifically modeled the encoding of a sequence of simple visual inputs, the same mechanisms are equally applicable to the encoding of the serial order of other sensory or internal events, as may occur, for other cortical-basal ganglionic loops, in the haptic recognition of an object or in registering the sequence of words in a sentence. This special computational property does not require adaptive training mechanisms of any kind, although adaptive tuning might improve its encoding efficiency. Another important feature of the model is that the receptive fields of its units bear a close qualitative resemblance to the receptive fields observed in single unit studies.
We begin this section by discussing the computational elements of the model, after which we consider the potential role of modulation and learning. Next we analyze the relationship between the receptive fields of the model and single unit data. Finally, we explore possible extensions of the model to other cortical areas.
Computational elements of the model
The ability of the model to encode the serial order of sequential events stems from three computational elements that combine in a cooperative manner due to the structure of the cortical-basal ganglionic network. The computational elements are: working memory, competitive pattern classification, and recursion. In this section, we review the origin and analyze the role of each of these elements.
The model's cortical-thalamic loops support self-sustained activations that provide working memories of the results of prior processing. Two features contribute to this operation: focused positive feedback and low-threshold calcium channels. Positive feedback, focused within individual loops, endows the bistability that permits activations to persist after the return of tonic pallidal inhibition. Bistability substantially decouples the dynamics of working memory from operations in the CD layer. Calcium channels inactivate too rapidly to play a role in maintaining either of the stable states. Instead, they are important for initiating loop activity through a postinhibitory rebound of thalamic membrane potential. Rebound is necessary for initiating loop activity, given the double inhibitory pathway through the basal ganglia. Without a rebound mechanism, loop activation would be solely dependent on an excitatory input such as a cortico-cortical projection to PF units.
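The bistability provided by focused positive feedback can be illustrated with a single self-exciting rate unit. This is a deliberately reduced sketch, not the model's cortical-thalamic loop: all parameter values are hypothetical, and a brief excitatory trigger stands in for the postinhibitory rebound that initiates loop activity in the model.

```python
import math

W_LOOP = 1.0    # hypothetical gain of the positive feedback loop
THETA = 0.5     # hypothetical activation threshold
SLOPE = 20.0    # hypothetical sigmoid slope
DT = 0.1        # integration step (arbitrary time units)

def sigmoid(u):
    """Steep sigmoidal activation function."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (u - THETA)))

def run_loop(pulse_steps, total_steps=400):
    """Euler-integrate dx/dt = -x + f(W_LOOP * x + trigger); return final x."""
    x = 0.0
    for t in range(total_steps):
        trigger = 1.0 if t < pulse_steps else 0.0  # stand-in for rebound
        x += DT * (-x + sigmoid(W_LOOP * x + trigger))
    return x

# Without a trigger the unit stays off; after a brief trigger it stays on.
print(run_loop(0) < 0.1, run_loop(50) > 0.9)  # True True
```

Both end states are reached with the trigger at zero, which is the sense in which the loop holds a memory of the earlier event.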
Cortical-thalamic loops represent only one possible method for producing sustained working memory activity. There is potential for sustained activity within at least four other types of positive feedback loops involving PF (Houk 1997): cortical-cortical loops with PP cortex, cortical-cortical loops within PF, cortical-cerebellar loops between PF and dentate nucleus, and trans-striatal loops through basal ganglia. It is likely that each of these loops contributes to the net gain of positive feedback, thus contributing to sustained activity in PF neurons. For simplicity, we focused here on just one loop, the cortical-thalamic. Although not the emphasis of our study, sustained trans-striatal activity did occasionally arise within the network.
Once engaged, the working memory loops remain at a constant level of activation for the remainder of the cue sequence presentation, an assumption that is consistent with single-unit data (Funahashi et al. 1989; Fuster and Alexander 1971). Other models of sequence encoding have used decaying working memory profiles (e.g., Wang and Arbib 1990). Although decaying profiles have certain computational advantages, within a complex recurrent network they have the potential to produce limit cycles or chaotic states. Sustained working memory traces, by contrast, tend to stabilize the overall network and ensure that it settles into a stable equilibrium. Sustained traces also create PF codes that are relatively insensitive to the rate of cue presentation. Accordingly, they encode the serial order rather than the timing of the cue presentation. Although this lack of timing information might pose a problem for a network attempting to encode a musical phrase (Cummins et al. 1993), it can be an asset in skill learning. For example, motor responses in a delayed-sequencing task may be performed slowly initially. As the speed of the cue presentation and motor performance is increased, the representations in the PF would remain unchanged. This should simplify learning because what is learned during a slow-motion rehearsal remains relevant to performance at faster speeds.
COMPETITIVE PATTERN CLASSIFICATION.
The model's CD units perform competitive pattern classification on a vector of corticostriatal inputs. Their random synaptic weights provide each unit with a unique perspective on the state of event and recurrent PF activity. Some units react strongly to certain combinations of input whereas others do not react at all. Successful pattern classification does not require training but does require weight matrices with sufficient diversity and a gain balanced with the degree of striatal inhibition.
Three interpretations of the circuitry underlying striatal inhibition have been proposed: 1) collaterals of the GABAergic spiny neurons produce mutual (competitive) inhibition (Groves 1983; Katayama et al. 1981; Park et al. 1980; Rebec and Curtis 1988); 2) inputs from the cerebral cortex to GABAergic interneurons produce feed-forward inhibition (Jaeger et al. 1994; Kitai and Surmeier 1993); or 3) both mechanisms coexist (Kita 1993). Our simulations, which contrast the performance of collateral and feed-forward inhibition, indicate that both help to regulate the extent of CD layer activation, which ultimately results in sparser patterns of PF layer activation. Sparse PF patterns allow room for a greater number of sequences to be stored within the PF layer, whereas excessive CD layer activation leads to a completely activated PF layer—a useless state from an encoding standpoint. This effect is particularly important as increasing numbers of PF units become activated by successive cues in the sequence.
Although both mechanisms regulate against saturation, collateral inhibition is more effective than feed-forward inhibition because it enhances the pattern classification abilities of the striatal layer by magnifying small differences (Ratliff et al. 1967). Spiny units display a characteristic mixture of burst durations in the simulations with collateral inhibition, in good agreement with single-unit data (Wilson 1990). Feed-forward inhibition, on the other hand, produces spiny unit activations that move in unison rather than in competitive opposition. Instead of differentiating between similarly activated units, feed-forward inhibition simply reduces the activation of all units through global downregulation. The simulations indicate that feed-forward inhibition (Fig. 6 F) only produces perfect networks at large values of range, whereas collateral inhibition is effective across a larger portion of the parameter space (Fig. 6, A and B). The greater magnification of small differences in collateral networks is accentuated in current-source type synapses as opposed to synapses with reversal potentials because the former do not suffer from shunting or saturation. Activation functions with steep slopes, justified on the basis of the abrupt transitions between “up” and “down” states observed in membrane potentials (Wilson 1995), also enhance the CD layer's ability to magnify small differences in the input patterns (Fig. 6, A vs. C).
Collateral inhibition approximates the computational function of a winner-take-all (WTA) mechanism. An ideal WTA mechanism selects the unit with the strongest input vector—independent of the initial state of the network. However, Fukai and Tanaka (1997) proved that collateral inhibitory networks with small or zero self-inhibition are sensitive to initial conditions and thus do not always select the unit with the strongest inputs. Consider the simple case of a two-neuron inhibitory network (A and B) without self-inhibition where unit A starts in an active state (and thus inhibits unit B). If equal inputs then are applied to the network, unit A will remain the winner because of inhibition to unit B. To win, unit B must receive inputs greater than unit A's by an amount large enough to overcome this inhibition. Thus a collateral network may not resolve small differences in input patterns.
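The two-neuron case can be sketched with binary threshold units. This is a simplified illustration rather than the continuous dynamics analyzed by Fukai and Tanaka; the inhibitory weight, the threshold, and the function name are hypothetical.

```python
W_INH = 0.5   # hypothetical strength of mutual inhibition
THETA = 0.8   # hypothetical firing threshold

def settle(inputs, state, steps=20):
    """Synchronously update two mutually inhibiting binary units.

    `inputs` is (input to A, input to B); `state` is the initial (A, B)
    activity. There is no self-inhibition.
    """
    a, b = state
    in_a, in_b = inputs
    for _ in range(steps):
        a, b = (
            int(in_a - W_INH * b > THETA),
            int(in_b - W_INH * a > THETA),
        )
    return a, b

print(settle((1.0, 1.0), (1, 0)))  # (1, 0): equal inputs, A keeps winning
print(settle((1.0, 1.2), (1, 0)))  # (1, 0): B is stronger but still loses
print(settle((1.0, 1.4), (1, 0)))  # (0, 1): B wins once the margin is large
```

With these values, unit B takes over only when its input exceeds the threshold by more than the inhibition it receives from A, so differences smaller than that margin are not resolved.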
As the cue sequence progresses and increasing amounts of input are supplied by the recurrent frontal projections, CD units receive an increasing number of excitatory inputs. With current-source synapses, additional inputs always provide additional excitatory current. This is not always the case for reversal-potential synapses because they are limited by shunting and saturation. Thus winning and losing CD units with current-source synapses can potentially receive vastly different magnitudes of excitatory current. The same is true for inhibitory synapses; however, the effect is not as important. For example, if n CD units are active, each inactive CD unit receives n identical inhibitory inputs while each active unit receives (n − 1) inhibitory inputs. Thus active and inactive units differ by only a single inhibitory input. In the current-source network, the magnitude of this difference is always a constant. However, with reversal-potential synapses, this difference becomes increasingly small as additional inhibitory inputs are added. Combining the effects of excitatory and inhibitory inputs, current-source synapses tend to produce large excursions in membrane potential for winning (and losing) units. This results in equilibrium states where units are quite depolarized (or hyperpolarized) with respect to firing thresholds. This highly polarized state requires larger differences in synaptic input to change state than a state where units are all close to threshold. By contrast, the difference in membrane potential between winning and losing CD units with reversal-potential synapses becomes smaller with increasing numbers of inputs. Accordingly, the membrane potential of these units tends to stay quite close to the threshold potential, providing for effective WTA computations.
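The effect of a single extra inhibitory input under the two synaptic models can be illustrated with steady-state membrane potentials. In the sketch below only the reversal potentials (0 and −90 mV) come from the text; the leak, conductance, and current values are hypothetical, and the steady-state expressions are the standard ones for a single passive compartment rather than the model's own equations.

```python
E_EX, E_INH = 0.0, -90.0   # mV, reversal potentials quoted in the text
E_LEAK = -80.0             # mV, hypothetical resting potential
G_LEAK = 1.0               # hypothetical leak conductance (arbitrary units)
G_EXC = 0.5                # hypothetical total excitatory conductance
G_INH = 0.2                # hypothetical conductance per inhibitory input
I_EXC = 30.0               # hypothetical total excitatory current
I_INH = 2.0                # hypothetical current per inhibitory input

def v_reversal(n_inh):
    """Steady-state potential with reversal-potential synapses."""
    g_total = G_LEAK + G_EXC + n_inh * G_INH
    return (G_LEAK * E_LEAK + G_EXC * E_EX + n_inh * G_INH * E_INH) / g_total

def v_current(n_inh):
    """Steady-state potential with current-source synapses."""
    return E_LEAK + (I_EXC - n_inh * I_INH) / G_LEAK

def gap(v, n_active):
    """Active unit (n - 1 inhibitory inputs) minus inactive unit (n inputs)."""
    return v(n_active - 1) - v(n_active)

for n in (1, 5, 10, 20):
    print(n, round(gap(v_current, n), 2), round(gap(v_reversal, n), 2))
```

Under these assumptions the current-source gap stays fixed at I_INH / G_LEAK, whereas the reversal-potential gap shrinks as n grows because each added conductance pulls the potential a smaller distance toward E_INH.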
Unlike an ideal WTA network, a collateral network also has a limited time resolution—it works as a low-pass filter. In a slow network, fast changes in input do not result in changes in output firing. This effect is potentiated in the highly polarized states produced by current-source synapses. This effect is diminished in networks with reversal potential synapses because the units stay closer to threshold. Somewhat paradoxically, the poor WTA performance can increase coding performance by limiting the number of CD layer state changes, which ultimately reduces the degree of PF-layer activation.
The units in the CD layer perform competitive pattern recognition on a spatial array of PF inputs and, in this sense, are similar to units in traditional feed-forward networks. These inputs are composed of a mixture of sensory event-related and recurrent state-related inputs. The latter, being working memories of the results of prior processing, provide a special context for the interpretation of new event inputs. They provide a continuity with the past that is not found in feed-forward networks and is critical for sequence encoding.
To understand recursive processing, consider the model's response in the sequential paradigm. The event inputs could represent any sensory modality, proprioceptive information or even internally generated signals such as recalled long-term memories. With respect to the simulated task, they consist of simple labeled-line representations of the cues. As successive cues are applied to the model, recursion allows the network to be responsive to its own prior states and thus recursively transformed—or, metaphorically speaking, reinterpreted—by the CD layer. In this way, the internal state of the network evolves into a higher-order representation that registers not only the occurrences of events but also the order in which they occurred. Creating higher-order representations through recurrence differs substantially from the idea of hierarchical processing commonly ascribed to the visual system. In hierarchical visual processing, low-level information is transformed into progressively more and more complex representations through successive processing of the current events in different neural regions (Ungerleider 1995). Recursion has the advantage of allowing a single neural region to successively process, and thus reinterpret, the results of its own processing.
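The recursive scheme can be caricatured in a few lines. The sketch below is a heavily reduced stand-in for the model, not a reimplementation: the pallidal and thalamic stages are collapsed into a latch, collateral inhibition is replaced by a hypothetical k-winner rule, and the weights, gain, and function names are all assumptions.

```python
import random
from itertools import permutations

N_UNITS = 30     # number of modules, as in the model
K_WINNERS = 3    # hypothetical number of CD winners per cue
REC_GAIN = 0.3   # hypothetical gain on recurrent PF input
CUES = "ABC"

def make_weights(seed=0):
    """Random corticostriatal weights for cue inputs and recurrent PF inputs."""
    rng = random.Random(seed)
    w_cue = {c: [rng.random() for _ in range(N_UNITS)] for c in CUES}
    w_rec = [[rng.random() for _ in range(N_UNITS)] for _ in range(N_UNITS)]
    return w_cue, w_rec

def encode(sequence, w_cue, w_rec):
    """Return the binary PF pattern left behind by a cue sequence."""
    pf = [0] * N_UNITS
    for cue in sequence:
        # Each CD unit sees the current cue plus the current PF state.
        drive = [
            w_cue[cue][i] + REC_GAIN * sum(w * s for w, s in zip(w_rec[i], pf))
            for i in range(N_UNITS)
        ]
        # k-winner selection stands in for competitive inhibition.
        ranked = sorted(range(N_UNITS), key=drive.__getitem__, reverse=True)
        for i in ranked[:K_WINNERS]:
            pf[i] = 1  # a winning module latches on and stays on
    return tuple(pf)

w_cue, w_rec = make_weights()
contexts = ["".join(p) for n in (1, 2, 3) for p in permutations(CUES, n)]
codes = {ctx: encode(ctx, w_cue, w_rec) for ctx in contexts}
print(len(set(codes.values())), "distinct codes for", len(contexts), "contexts")
```

Because the PF state feeds back into the drive, the winners chosen for a cue depend on which cues came before it, which is how order information can enter the final pattern. How many of the 15 contexts receive distinct codes depends on the seed and on the hypothetical parameters.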
COMPUTATIONAL ELEMENTS OF OTHER MODELS OF SEQUENTIAL PROCESSING.
Models of sequential or temporal processing have been proposed for the hippocampus (Granger et al. 1994; Reiss and Taylor 1991), central pattern generators in Tritonia (Kleinfeld and Sompolinsky 1989), and basal ganglia (Dominey 1995; Dominey et al. 1995; Mitchell et al. 1991; Wickens and Arbuthnott 1993). Each of these networks derives a portion of its processing capabilities from computational elements similar to those found in our model, such as lateral inhibition (Dominey 1995; Granger et al. 1994; Wickens and Arbuthnott 1993), working memory loops (Dominey 1995), and recursion (Dominey 1995; Mitchell et al. 1991; Reiss and Taylor 1991). However, these models also rely on additional elements such as synaptic eligibility traces (Granger et al. 1994; Wickens and Arbuthnott 1993), refractory periods (Wickens and Arbuthnott 1993), widely distributed neural time constants (Dominey 1995; Reiss and Taylor 1991), and transmission delays (Kleinfeld and Sompolinsky 1989; Wickens and Arbuthnott 1993). Of these models, the model of saccade generation by Dominey (1995) bears closest structural resemblance to ours; however, the Dominey model focuses primarily on the decoding, rather than the encoding and registration, phases of the delayed-spatial sequencing task.
Role of learning and modulation
Many neural network models rely on adaptive mechanisms such as Hebbian learning, reinforcement learning, or backpropagation for training their synapses to participate in useful computations. For example, competitive Hebbian learning has been used to explain the formation of orientation-specific neurons in the visual cortex (Bienenstock et al. 1982; Linsker 1986; von der Malsburg 1973). In another example, a recurrent network by Jordan (1986), which bears passing resemblance to our model, uses a variant of backpropagation to transform spatial inputs from “plan” units into complex sequences of phonemes. Our model, which uses random rather than trained synapses, is a departure from this general approach. Rather than focusing on learning, we have focused on the inherent sequence encoding abilities of the cortical-basal ganglionic architecture. The implication is that the brain has an innate ability to encode serial events into integrated concepts represented in spatial patterns of neural activity. This is not to say that adaptive mechanisms do not play a role; for example, they would be quite useful for tuning the competitive pattern classification stage so as to improve the encoding performance of the network. This might provide a more efficient code or it might emphasize serial events that are of particular relevance to the organism.
MECHANISMS FOR NEUROMODULATION.
Only a portion of max-range parameter space produced perfect networks (Fig. 6). If corticostriatal synaptic weights do not naturally exist in this region, it might be functionally advantageous to have a mechanism for guiding them into it. A number of modulatory mechanisms that have been shown to affect corticostriatal interactions could provide such a mechanism (Kitai and Surmeier 1993). Neuromodulators generally are thought to effect changes extrasynaptically. Thus rather than adjusting synaptic efficiency, many mechanisms appear to modulate the excitability and bias of the pre- and postsynaptic neurons by altering ionic conductances. The computational question is whether changes in excitability can be equated with changes in max-range parameters. To a first approximation, the answer to this question is probably yes. For example, consider the neuromodulatory action of either a pre- or postsynaptic muscarine-sensitive potassium channel (Calabresi et al. 1993; Caulfield et al. 1993; Lovinger and Tyler 1996). Modulation of potassium channels also can be effected by dopaminergic inputs via the D1 or D2 families of receptor subtype (Surmeier and Kitai 1993). Serotonin, a global modulator, also could play an important role in such a modulatory system because it appears both to exert an excitatory action on the striatum and to stimulate dopamine release from terminals (Jacobs and Azmitia 1992).
On modulation, an increase (decrease) in potassium conductance would make a spiny neuron less (more) responsive to glutamatergic inputs. Although the concomitant alterations in membrane resistance also would affect the neuron's bias, the end result would be quite similar to that of adjusting the neuron's average synaptic weights. If dopamine is an attentional signal within the striatum (Miller 1993; Ward and Brown 1996), this could guide corticostriatal weights into an effective region of the parameter space during an encoding task.
Inhibition levels also require fine tuning. Are there mechanisms that could regulate the extent of competition between striatal spiny neurons? Network models suggest that muscarine-sensitive modulation of potassium channels represents a biologically plausible mechanism for adjusting the level of competition (Wickens et al. 1991). These channels are affected by changes in the dopamine-cholinergic balance via a mechanism involving the D2 receptor subtype found in cholinergic interneurons. The latter exist in a state of tonic inhibition (Lehmann and Langer 1983), and a decrease in striatal dopamine levels releases them to produce an increase in acetylcholine (ACh). Simulations suggest that an increase in potassium conductance, produced by high levels of ACh, serves to deemphasize the effects of GABAergic synapses and thus lowers the level of striatal competition (Wickens et al. 1991). Conversely, decreased potassium conductances, produced by high levels of dopamine, enhance competition. Although important details such as the sign and magnitude of this effect remain the subject of debate, the end result appears somewhat equivalent to that of modulating the synaptic weights of the CD layer's GABAergic synapses.
Understanding the model's receptive fields
The receptive fields of the model units are remarkably similar to the receptive fields observed in single-unit studies. Still, several differences between the model and experimental data must be addressed. Before addressing these differences, it is helpful to clarify a few of our assumptions regarding receptive fields.
We have assumed that a unit's receptive field is defined by the complete set of responses elicited by a given task paradigm. In the case of the simulation paradigm, with its set of 15 different cue sequences, there exists the potential for a very large number of ways in which a neuron might respond. Through several assumptions, we could shrink this number into a more manageable set. First we assumed that the neural code is strictly spatial, defined by spatial patterns of PF activation. This assumption ignores firing rate codes and temporal firing codes that also could be used by an assembly of neurons to encode information. However, in attempting to capture the essential character of system interactions, we chose to simplify many of the physiological and anatomic features that would support such alternative codes. For example, the model's PF units respond in a manner that is essentially binary. Accordingly, no information can be coded in their firing frequencies. In addition, our analysis only considers the steady-state portion of the PF response, thus ignoring any information that might be contained in the nuances of the transient states of the units. Even with these two major simplifications, there are ≥2^15 potential ways in which a neuron could respond to this set of 15 contexts. The bistable design of the cortical-thalamic loops, which sustain PF activations once elevated, reduces the realm of possibilities further to 1,000 different sets of responses, of which only 190 are operationally distinct.
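These counts can be checked by brute-force enumeration. The sketch below rests on two assumptions that are interpretations rather than statements from the text: a context is any ordered prefix of a cue sequence, and two receptive fields are "operationally distinct" when they differ by more than a relabeling of the cues. The function name `count_fields` and its arguments are hypothetical.

```python
from itertools import permutations, product


def count_fields(cues="ABC", max_len=3):
    """Count binary receptive fields over all sequence contexts.

    Returns (all fields, fields allowed by latching, classes that remain
    after merging fields differing only by a relabeling of the cues).
    """
    # Contexts: every ordered prefix without repeated cues, e.g. A, AB, ABC.
    ctx = [c for n in range(1, max_len + 1) for c in permutations(cues, n)]
    index = {c: i for i, c in enumerate(ctx)}

    # Latching: once a unit is active in a context, it stays active in
    # every extension of that context (bistable cortical-thalamic loop).
    latched = [
        bits
        for bits in product((0, 1), repeat=len(ctx))
        if all(bits[index[c[:-1]]] <= bits[index[c]] for c in ctx if len(c) > 1)
    ]

    # Assumed meaning of "operationally distinct": identify fields that
    # map onto each other under a permutation of the cue labels.
    relabelings = [dict(zip(cues, p)) for p in permutations(cues)]
    classes = set()
    for bits in latched:
        variants = []
        for m in relabelings:
            image = [0] * len(ctx)
            for c, b in zip(ctx, bits):
                image[index[tuple(m[x] for x in c)]] = b
            variants.append(tuple(image))
        classes.add(min(variants))  # canonical representative of the class

    return 2 ** len(ctx), len(latched), len(classes)


print(count_fields("ABC", 3))  # (32768, 1000, 190)
```

Under these assumptions the enumeration reproduces the 2^15, 1,000, and 190 figures quoted above. Passing a smaller `max_len` or fewer cues restricts the count to a reduced paradigm.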
Given the large number of possible receptive fields, why have such a limited few been reported in the delayed-sequencing literature? Further, why are some of the experimentally described receptive fields displayed by such a small fraction of model units? Several factors, both paradigm- and model-dependent, contribute to this disparity.
Receptive fields are largely artificial constructs, reflections of an experimental paradigm and necessarily constrained by the design of the experiment. Different behavioral paradigms reveal different cross-sections of a unit's response characteristics. This point is especially true when the concept of receptive fields is applied to sequential tasks and provides the key for resolving many of the apparent discrepancies between our model data and experiment.
Area 46 of the PF traditionally has been considered a locus for a type of sustained response thought to underlie spatial working memory. After a slight phasic response to the cue, PF neurons show sustained delay period activity that terminates just after the initiation of the saccade (Funahashi et al. 1989, 1990; Fuster and Alexander 1971; Goldman-Rakic 1995; Goldman-Rakic et al. 1990; Petrides 1991). The activity does not decay and, in fact, often shows a slight increase over the delay period (Funahashi et al. 1989). Area 46 neurons displaying delay period activity are activated preferentially by targets at specific retinotopic positions termed memory fields (Goldman-Rakic 1995). Similar sustained delay-period activity that is highly selective for particular shapes, colors, and scenes also has been reported in the temporal cortex during similar short-term memory tasks (Miyashita and Chang 1988).
Despite the close similarity between spatial working memory units and the Cue(X) response of the model, few of the model units displayed Cue(X) or Cue(X) + Cue(Y) responses (0.82 and 6.6%, respectively). However, these fractions are altered dramatically if we analyze only the portion of model data available in a spatial working memory task. To explain, in a spatial working memory task involving three cues, a single cue, rather than a sequence, is presented. Accordingly, the paradigm only presents three distinct contexts, allowing only eight (2^3) possible receptive fields including the null response. Of these, four can be considered operationally distinct as depicted in Fig. 9: Insensitive, Cue(X), Cue(X) + Cue(Y) and the pure Rank1 [i.e., Cue(X) + Cue(Y) + Cue(Z)] response. First, note that in this reduced data set only 9,076 (compared with 17,586 for the 3-target paradigm) of the 20,640 units are task related. Of these units, 70% appear as spatial working memory, or Cue(X), with the other major fraction displaying two-cue Cue(X) + Cue(Y) behavior. In addition, a small fraction (5.6%) display pure Rank1 responses, which generally are referred to as attentional responses. This analysis leads to the prediction that units displaying either sustained Cue(X) or attentional behavior in a spatial working memory task are likely also to display sequence-related responses in sequential paradigms. This prediction also would hold true for units in the thalamus, for example the VA nucleus, displaying working memory activity.
The single-unit data for PF units during sequence encoding paradigms are extremely limited. In fact, most of the evidence comes from a study using a two-cue paradigm (Funahashi et al. 1993). This paradigm used two cues, A and B, presented in the order AB or BA. The authors found that 58% of their task-related units displayed what could be considered Rank1(X) behavior, 21% Cue(X), and 21% pure Rank1. By contrast, the model units displayed these fields 6, 0.82, and 2.9% of the time, respectively. However, reanalyzing the portion of the model data spanned by this two-cue paradigm, the fractions adjust considerably. First, note that in the two-cue paradigm, only nine different receptive fields are possible. Of these, only six are operationally distinct as listed and defined in Fig. 10. Using the receptive field definitions of Fig. 10 to reclassify the model units, we see a substantial increase in the number of units fitting the Cue(X), Rank1(X), and pure Rank1 descriptions. Note that while rank-dependent responses to the second cue are common in the model, none were observed in the experimental study by Funahashi et al. (1993). One possible explanation for this is that in the two-cue paradigm, the second cue does not provide any additional information to the monkey. In other words, once the first cue is presented, the identity of the second cue is strictly determined and need not be encoded by the monkey. A decrease of attentional modulation during the second cue could result in a decrease or elimination of Seq2(XY) and pure Rank2 responses.
The same argument regarding the salience of the last cue can be applied to the delayed-sequencing data recorded in the PF-FEF region and the caudate nucleus (Barone and Joseph 1989; Kermadi and Joseph 1995; Kermadi et al. 1993). Similar to the paradigm applied to the model, each of these studies used six sequences of three cues. However, because the paradigm only involves three different cues, once the first two cues of the sequence arrive, the third is determined. Although the model pays equal attention to all three cues in the sequence, the monkey need not. Thus, experimentally, only 9 of the 15 contexts are salient. When responses are concentrated on the first two cues, then there are only 2^9 (i.e., 512) possible receptive fields. Adding the restriction imposed by the latching cortical-thalamic loops, this leaves 125 classes, of which only 30 are operationally distinct. Unlike the working memory and two-cue tasks, many of these 30 classes are not readily describable by simple names such as Cue(X) or Rank2; rather, the response contingencies represented by these classes are often quite complex. When the model units are reclassified according to these 30 classes, 46% of the responses fall into experimentally observed classes. Figure 11 lists these experimentally observed receptive fields along with the number and percent of model units displaying them.
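The three counts for the nine salient contexts can be worked out by hand. The derivation below is a sketch resting on an assumption not stated in the text: "operationally distinct" is taken to mean distinct up to a relabeling of the three cues, so the last step is an orbit count (Burnside's lemma) over the six permutations in S_3. For each first cue X, the contexts are X, XY, and XZ; latching forces a unit that responds to X to respond to XY and XZ as well, leaving 1 + 2^2 = 5 patterns per first cue. A transposition of two cue labels fixes 5 × 3 = 15 fields, and a three-cycle fixes 5.

```latex
\begin{align*}
N_{\text{all}} &= 2^{9} = 512,\\
N_{\text{latched}} &= \bigl(1 + 2^{2}\bigr)^{3} = 5^{3} = 125,\\
N_{\text{distinct}} &= \frac{1}{|S_3|}\sum_{g \in S_3} \lvert \mathrm{Fix}(g) \rvert
  = \frac{125 + 3\cdot 15 + 2\cdot 5}{6} = 30.
\end{align*}
```

That this relabeling interpretation reproduces the quoted figure of 30 supports it, but it remains an inference.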
In several cases, differences between the model behavior and single-unit data can be traced to the simplified structure and physiology of the model. For example, in the model's PF layer, we have modeled E units as pure sensory neurons without any ascending input from the thalamus and R units as pure context units without any cortical input from sensory association areas. This approach was taken for the sake of simplicity even though such a dichotomous input arrangement is not found in the PF. Single-unit recordings demonstrate that, although some neurons have mainly event-related response components and others have mainly the sustained discharges associated with working memory, many have mixtures of these two components (Funahashi and Kubota 1994; Funahashi et al. 1990). Such mixed responses of the PF, which could be produced in units receiving both recurrent and event-related inputs, have been left out of the model for simplicity.
Also, the model's PF units do not display the brief phasic burst characteristically found at the onset of working memory traces. This lack of agreement with physiology can be traced partially to the threshold activation functions assumed for the model's units, which resulted in binary output behavior. Returning to the response of a single module to a phasic input, we can see that although the thalamic membrane potential (Fig. 3, T) has a brief phasic component, its activation does not. Accordingly, the PF layer membrane potential (Fig. 3, PF), when driven by a sole input from the thalamus, has a monotonic, rather than phasic-tonic, profile. Using a linear activation function instead would allow for this phasic-tonic profile to be reflected in the PF firing rate. Anatomic simplifications also contribute to this lack of phasic-tonic profiles. For example, additional, cue-related inputs to each PF unit, such as cortico-cortical inputs from the posterior parietal cortex (Selemon and Goldman-Rakic 1988), also would promote phasic-tonic membrane potential profiles.
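The effect of the activation function can be seen in a small sketch. The membrane potential profile, threshold, and gain below are hypothetical illustration values and are not taken from the model's equations; the point is only that a threshold output discards an onset transient that a rectified linear output would pass on.

```python
import math


def membrane_potential(t, tonic=1.0, phasic=0.8, tau=0.05):
    """Hypothetical phasic-tonic profile: plateau plus decaying onset transient."""
    return tonic + phasic * math.exp(-t / tau) if t >= 0 else 0.0


def threshold_activation(v, theta=0.5):
    """Binary output: 1 above threshold, 0 otherwise."""
    return 1.0 if v > theta else 0.0


def linear_activation(v, theta=0.5, gain=1.0):
    """Rectified linear output: graded above threshold."""
    return gain * max(0.0, v - theta)


# Sample from 50 ms before stimulus onset to 490 ms after it.
times = [i * 0.01 for i in range(-5, 50)]
potential = [membrane_potential(t) for t in times]
binary_rate = [threshold_activation(v) for v in potential]
graded_rate = [linear_activation(v) for v in potential]

onset = times.index(0.0)
print("binary:", binary_rate[onset], binary_rate[-1])
print("graded:", graded_rate[onset], graded_rate[-1])
```

With the threshold function, the output is flat from onset onward, so the transient is invisible downstream. With the linear function, the output peaks at onset and relaxes to a plateau, giving the phasic-tonic shape.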
The simplified model units also do not display the spatial-tuning curves characteristic of the memory fields of area 46 (Goldman-Rakic 1995). This departure reflects the model's simplified input structure and threshold activation functions. We chose to use labeled-line inputs instead of spatially tuned responses because we wanted to emphasize the serial, as opposed to spatial, aspects of the task. Certainly, spatial tuning would arise if a coarse-coded input layer were used instead, modeled, perhaps, after the retinotopic responses of area 7a of the posterior parietal cortex. Once again, as mentioned above, linear activation functions would allow such graded responses.
The distribution of receptive fields also might be sensitive to the model's structure. For example, although each CD unit receives three sensory inputs, the overall amount of contextual input it receives is dependent on the number of modules in the network. A CD unit in a network of 30 modules receives 30 contextual inputs. An increase or decrease in network size serves to increase or decrease the overall importance of contextual relative to sensory input. Thus altering the network size could have an effect on the frequency of certain receptive fields. Similarly, the addition of sensory inputs, as with the coarse-coded input structure suggested above, could alter the receptive field proportions.
Extensions to other cortical areas
Because the distributed neuronal architecture of Fig. 2 is shared by several other areas of the cerebrum, the mechanisms proposed in the present paper might generalize to additional cognitive-motor processing stages. The abbreviated discussion provided in this section will be limited to comparing PF with two cortical areas known to participate in the execution of sequential limb movements, the supplementary motor area (SMA) and the primary motor cortex (M1). Figure 12, left, schematically illustrates that both SMA and M1 have loops through the basal ganglia that form networks analogous to the PF-basal ganglionic modular array analyzed in the present paper, based on the transsynaptic transport of viral markers (Middleton and Strick 1997a; Strick et al. 1995). In Fig. 12, right, we illustrate the additional observations (Middleton and Strick 1997b) that PF and M1 also have loops through the cerebellum. Note that the three loops through basal ganglia and the two through cerebellum each involve segregated groups of cells in GPi and in the dentate nucleus (DN), respectively.
An essential feature of the model presented here is its ability to generate a spatial pattern of sustained bursting activity that encodes a working memory of the serial order of events. What might be the analogous features of the loops subserving SMA and M1? Single-unit recordings in a sequential movement task have demonstrated somewhat shorter periods of sustained bursting in SMA neurons; these intermediate-length bursts are both sequence- and movement-specific (Tanji and Shima 1994). M1 neurons recorded in the same task exhibited yet shorter bursts that were sequence independent and movement specific. Evidence that basal ganglionic networks might participate in the generation of the SMA bursts comes from the recording in GPi neurons of pausing patterns of discharge with similar sequence-dependent properties (Mushiake and Strick 1995). The short bursts of sustained discharge in M1 correspond to the relatively brief durations of the movements, although some units also discharge during the waiting periods that were imposed between the individual movements of a sequence. We propose that the brief, movement-related bursts in M1 are analogous, though on a shorter time scale, to the long-duration sustained discharges that encode serial order in PF. We further propose that the intermediate-duration, sequence-specific discharge is the appropriate analogy in SMA. In addition to being of shorter duration than in PF, the bursts in M1 and SMA are movement related and are not involved in encoding serial order. Instead, they seem to be involved in the generative decoding process mentioned in the introduction.
For simplicity, the present model includes only one mechanism for generating sustained working-memory discharge. However, as pointed out earlier, the loops through the basal ganglia and cerebellum and the cortico-cortical loops illustrated in Fig. 12 should each contribute positive feedback gain in support of working memory discharge. The loop through the cerebellum appears to be quite important for sustaining burst discharge in M1 (Houk et al. 1993). In addition, the substantial working memory discharge found in the dorsomedial thalamic nucleus (Fuster and Alexander 1973), which relays cerebellar input to PF (Kuroda et al. 1993), suggests that the cerebellar loop also may be important in producing sustained PF activity. In contrast, a cerebellar channel subserving the SMA currently lacks demonstration (Middleton and Strick 1997a), so the sequence-specific sustained discharge in SMA probably relies more on the alternative mechanisms.
The role of the striatum in the present model is to classify spatial patterns that exist in its PF input, using the result to recursively refine this pattern so that it uniquely reflects the serial order of sensory events. In contrast, Tanji and Shima's (1994) results indicate that the spatial pattern in SMA uniquely specifies the evolving sequential movement. Single cells start to fire, for example, when movement A is completed but only if B is to be the next movement in the sequence. Presumably the striatal target of SMA input detects this condition and, through the intermediate GPi channel, recursively updates the spatial pattern of SMA discharge. In M1, the spatial pattern of activity reflects individual movements in the sequence with relatively quiescent waiting periods between moves (Tanji and Shima 1994). In addition, 25% of the population discharges during the waiting period, and this discharge specifies the next movement in a sequence-independent manner. We suggest that the striatal target of M1 uses convergent input from SMA (Inase et al. 1996) to classify, and thus select, the appropriate cells to discharge in the waiting period. These discharges, combined with another convergent input reflecting the auditory “go” signal, then might be used to initiate the more intense movement-related burst discharge. Recursion in the cortical-basal ganglionic loop could facilitate the recruitment of a sufficient and appropriate population of neurons to command the specified movement.
The mechanisms outlined in the above paragraphs are quite speculative and are likely to require future revision. Some of the proposed processing steps overlap with functions suggested earlier for cortical-cerebellar loops (Houk 1997; Houk et al. 1993). This may reflect a degree of redundancy in the overall system or simply errors in our interpretation. We wish to stress that the concepts formulated here should be treated as testable working hypotheses, advanced to illustrate the substantial potential of cortical-basal ganglionic architectures for analyzing and controlling serial motor behaviors.
In this paper, we have shown how a mechanism for encoding the serial order of events could emerge from known interactions between the prefrontal cortex, basal ganglia, and thalamus. This sequence encoding ability is a result of the macro-organization of these circuits rather than the organization of individual synapses. Accordingly, the model's synapses do not need to be individually adapted through training. Rather, the synapses of the model's striatal layer simply require global adjustment into a favorable region of a random weight space. At the behavioral level, this result implies that the brain is capable of creating working memory representations of sequential stimuli without previous training or exposure.
The model's units bear qualitative resemblance to cue-, rank- and sequence-related responses that have been recorded from the prefrontal cortex, FEF, and caudate. In making these comparisons, it became apparent that the observed receptive fields are highly dependent on the serial paradigm of the task; that is, different length sequences reveal different cross-sections of a unit's response. Based on this rationale, the model predicts that units of the prefrontal cortex displaying spatial-working memory responses would likely display serial dependencies in a sequential task. The model also predicts that sequence-related responses will be found in the thalamus. Our results provide an explanation for why cortical and basal ganglionic neurons often display complex mixtures of responses. The context-dependencies of these responses, although sometimes simple and easily interpretable [e.g., Rank1(X), Seq2(XY)], more often consist of complex mixtures of several simple responses. This observation, combined with the large number of possible receptive fields, suggests that it will be difficult to find units with identical receptive fields.
The mechanisms of sequential processing in the brain are just beginning to be explored by neuroscientists and thus present an exciting array of challenges and opportunities for future work. Although much of the connectivity between the relevant neural areas has been established experimentally, very little is known about the behavior of these circuits. Given the complexity of these interactions and the number of potential experimental questions, a coordinated effort between modelers and experimentalists will likely be essential for future progress in this area.
The authors are grateful to Drs. Andrew Barto and Sara A. Solla for comments on the manuscript.
This work was supported by National Institute of Mental Health Grant P50-MH-48185.
Address for reprint requests: J. C. Houk, Dept. of Physiology, M211, Ward Building 5-315, 303 E. Chicago Ave., Chicago, IL 60611-3008.