INTRODUCTION
The serial order of events and actions is critical in cognition and behavior. In addressing this issue more than four decades ago, Lashley (1951) postulated that the brain analyzes and controls serial order by creating and using a spatial pattern of neural activity, which he referred to as a "determining tendency" or idea. To control sequential actions, this spatial pattern would require translation into expressive action in the time domain through a process he likened to the application of "syntax" in the formation of language from ideas. The inverse transformation also must exist to transform temporally spaced sensory experiences into a sustained spatial pattern of brain activity, for example, to construct a concept from sequential sensations during haptic manipulation or visual survey.
Lesion results suggest that the prefrontal cortex is critical in analyzing serial events and in using the results to control behavior. Subjects with frontal lobe lesions show impaired performance on tasks requiring organization of sequential pointing responses (Petrides and Milner 1982; Wiegersma et al. 1990), serial-order recognition (Kesner et al. 1994), or recency judgments (Milner et al. 1991). Monkeys subjected to bilateral lesions of areas 46 and 9 have difficulty monitoring sequences of novel stimuli (Petrides 1991). The basal ganglia also are implicated in serial processing through the impairments of cognitive and motor skills in Parkinson's (Brown and Marsden 1990; Harrington and Haaland 1991) and Huntington's disease (Gabrieli 1995; Willingham and Koroshetz 1993). Some of these deficits are strikingly similar to the ordering deficits of frontal patients (Sagar et al. 1988; Sullivan and Sagar 1989; Willingham and Koroshetz 1993).
Single-unit recordings in primates executing delayed-sequence tasks support the importance of prefrontal cortex and basal ganglia in serial processing. Instructional cues are presented in a particular sequence, and, after a delay period, the subject must produce a corresponding sequence of responses. Neurons in prefrontal areas, and closely linked areas of the frontal eye fields and caudate nucleus, are sensitive to the serial order of the instructional sequence (Barone and Joseph 1989; Funahashi et al. 1993; Kermadi and Joseph 1995; Kermadi et al. 1993). Responses that are initiated by the instructions and sustained through the delay period could represent conversions of temporal sequences of sensory input into spatial patterns of neural activation. Similarly, some motor-preparation units in the frontal eye fields, caudate nucleus, and globus pallidus are related to the serial order of the subsequent sequential actions (Barone and Joseph 1989; Kermadi and Joseph 1995; Kermadi et al. 1993; Mushiake and Strick 1995; Tanji and Shima 1994). Such activity could represent commands for the conversion of a spatial pattern of activation into the temporal domain of movement. Together, these studies provide persuasive evidence for the existence of conversion mechanisms bridging the temporal domain of sensory input, the spatial domain of Lashley's "determining tendency," and the temporal domain of behavioral expression.
Sustained responses in the prefrontal cortex of primates appear to function as a spatial working memory during delayed-response tasks (Funahashi et al. 1989, 1990; Fuster and Alexander 1971; Goldman-Rakic 1995; Goldman-Rakic et al. 1990; Petrides 1991). Evidence for working memory activity within analogous areas of the human prefrontal cortex comes from functional imaging studies (Fiez et al. 1996; Jonides et al. 1993; McCarthy et al. 1994). Discharge that is sustained through the delay period also has been identified in the caudate (Hikosaka et al. 1989b; Schultz and Romo 1992), in the substantia nigra pars reticulata (SNr) (Hikosaka and Wurtz 1983), and in the thalamus (Fuster and Alexander 1973). Evidently neural correlates of spatial working memory and serial processing are found in many of the same areas of the CNS. Indeed, it has been suggested that the mechanisms providing temporal integration in sequencing tasks be viewed as extensions of those providing working memory representations in delayed-response tasks (Fuster 1985; Goldman-Rakic 1987).
In this paper, we present a neural network model of cortical-basal ganglionic processing that focuses on the transformation of sequential sensory input into spatial patterns of neural activity, an operation that we refer to as encoding. Although we do not model it here, we will refer to the inverse transformation, from a spatial pattern to a sequence of movements, as a decoding operation. Some means of encoding the serial order of events or perceptions and for decoding the result into appropriate actions clearly is required for the performance of most of the tasks discussed in the previous paragraphs. The model presented here demonstrates how the encoding process might be a natural outcome of the basic anatomy and physiology of the basal ganglia and cerebral cortex. As a test of the model, we compare its responses with the single-unit responses of neurons recorded from the prefrontal cortex and basal ganglia during the instruction and delay phases of delayed-sequence tasks.
MODEL
The encoding model presented here is an implementation of the conceptual model of cortical-basal ganglionic processing proposed by Houk and Wise (1995). These authors based their conceptual model on the modular anatomic organization of "parallel loops" linking the frontal cortex, basal ganglia, and thalamus, originally conceived by Alexander, DeLong, and Strick (1986) and supported by recent transsynaptic labeling studies (Middleton and Strick 1997a). The present encoding model deals specifically with the loop through area 46 in the prefrontal cortex (PF), through caudate nucleus (CD), internal segment of the globus pallidus (GPi), thalamus (T), and back to the PF. We follow Wise and Houk (1994) in assuming that this macroscopic module is itself composed of an array of similarly organized microscopic modules. Thus the (microscopic) module illustrated in Fig. 1 follows the basic anatomic plan of the prefrontal cortical-basal ganglionic loop.

FIG. 1. Individual cortical-basal ganglionic module (adapted from Houk and Wise 1995). Convergent projections from many cortical cells (C) make excitatory synapses (depicted as dots, where the distribution of dot sizes represents a distribution of synaptic weights) with a spiny neuron in the caudate nucleus (CD). This CD unit sends an inhibitory projection (depicted as "a") to a unit in the internal segment of the globus pallidus (GPi), which in turn inhibits a thalamic relay unit (T). Thalamic unit makes a reciprocal excitatory connection with a cortical unit to complete the module's recurrent loop.
The first stage consists of convergent excitatory projections from a large number of cells in the cerebral cortex (C) onto a medium spiny neuron within the caudate nucleus (CD) of the neostriatum. Portions of the prefrontal cortex, in particular areas 9, 10, and 46, project preferentially to the dorsolateral head of the caudate (Selemon and Goldman-Rakic 1985, 1988). Each medium spiny neuron receives input from ~10,000 different corticostriatal afferents (Wilson 1995). This highly convergent neuronal architecture, together with the physiological properties of the cells, led Houk and Wise (1995) to suggest that spiny neurons are positioned ideally for detecting contextual events of behavioral significance. With respect to the instructional phase of a delayed-response task, contextual event detection might involve the recognition of stimulus-related signals conveying an instructional cue's spatial position, identity, or other physical characteristics. In a serial task, context also would include intrinsic signals such as working memory representations of previous stimuli.
There is some disagreement regarding the cortical origins of projections to a given volume of the striatum (Wise et al. 1996). One hypothesis favors convergent input from cells in functionally related, yet distinct, cortical areas (Flaherty and Graybiel 1993; Parthasarathy et al. 1992; Yeterian and Van Hoesen 1978), whereas another favors convergence from neighboring cells in a single cortical area (Selemon and Goldman-Rakic 1985; Strick et al. 1995). Either anatomic arrangement would provide the convergence of sensory and recurrent projections onto the CD layer as required by the model. Corticostriatal projections from the prefrontal cortex and several of its reciprocally linked areas (e.g., posterior parietal, orbitofrontal, anterior cingulate, and superior temporal cortex) converge in a general way onto the same volume of caudate, although the predominant pattern is one of segregation or interdigitation of terminal fields as opposed to frank intermixing (Selemon and Goldman-Rakic 1985). Alternatively, cue-related sensory signals in posterior parietal cortex might be relayed to CD units via the sensory-related cells in the PF through cortical-cortical projections (Bates and Goldman-Rakic 1993; Selemon and Goldman-Rakic 1988). What is important to note here is that either mechanism of convergence could be used to provide the model's caudate layer with sensory-related input information.
Continuing on to the next layer of the loop, spiny neurons in the head of the caudate make inhibitory synapses (depicted as "a" in Fig. 1) with neurons in the dorsomedial one-third of the GPi (Hedreen and DeLong 1991), which in turn project to nuclei of the thalamus including ventralis anterior (VA) and ventralis lateralis (VL) (DeVito and Anderson 1982). Neurons in the GPi are characterized by a high rate of tonic activity interspersed with momentary pauses due to spiny neuron firing episodes (Wilson 1990). The tonic activity inhibits projection targets in the thalamus, and the pauses produce a disinhibition of thalamic neurons (Deniau and Chevalier 1985). This disinhibition initiates a postinhibitory rebound discharge response within thalamic relay neurons that is mediated, in part, by low-threshold T-type calcium channels (Wang et al. 1991). Thus the dual inhibitory action of this pathway serves to activate thalamic discharge through disinhibition (Deniau and Chevalier 1985).
VA and VL, along with other thalamic nuclei including the medialis dorsi (MD), contain neurons that project ipsilaterally back to the PF to close the cortical-basal ganglionic loop (DeVito and Anderson 1982). An additional loop is formed by neurons in area 46 of the PF that project in a reciprocal manner back to several thalamic nuclei including MD and VA (Jacobson et al. 1978; Siwek and Pandya 1991). It has been suggested that such a cortical-thalamic loop has the potential, given sufficient gain, for sustaining activations, like those thought to be correlates of working memory, through positive feedback (Dominey and Arbib 1992; Hikosaka 1989; Houk and Wise 1995).
There is also an indirect pathway through the basal ganglia that is not depicted in Fig. 1 because it is not simulated in the present rendition of the encoding model.
Sequence encoding with an array of modules
The delayed sequence task begins with an instructional period during which three cues are illuminated in a particular serial order (Barone and Joseph 1989; Funahashi et al. 1993; Kermadi and Joseph 1995; Kermadi et al. 1993). After a short delay period, the subject is required to touch the cues in the same order in which they were illuminated. Because the present model focuses on the encoding problem, we will only consider the instruction and delay phases of the task.
The encoding model (Fig. 2) combines several modules of the type shown in Fig. 1 into an interacting array. The PF layer is composed of event-related (E) and recurrent (R) neurons. The three event-related units (labeled A, B, and C in Fig. 3) provide the model with a labeled-line representation of the instruction sequence. To simulate the onset and offset of individual cue lights, E units are toggled sequentially on and off. This type of signal resembles that of visual fixation neurons of the posterior parietal cortex; these neurons respond to the onset of the stimulus and give brisk discharges that continue as long as the stimulus remains within the receptive field (Goldberg and Colby 1989). Neurons in area 7a respond to the retinal location of a visual stimulus with receptive fields that are typically unimodal and broadly tuned (Robinson et al. 1978). Such cue-related signals could be conveyed to cells of the prefrontal cortex via corticocortical projections. Clearly, the model's labeled-line inputs do not exploit much of the rich information contained in parietal responses; however, this simplification allows us to focus on the ordinal, rather than spatial, aspects of the encoding task.

FIG. 2. Array of cortical-basal ganglionic modules. Three modules of the type shown in Fig. 1 are combined to illustrate the organization of a modular array regulating prefrontal (PF) cortex activity. C units of Fig. 1 are divided into 2 categories. Those that receive recurrent input via the basal ganglia and thalamus are designated R (recurrent) units, whereas those receiving cue-related input from posterior parietal cortex are designated event (E) units. CD units receive convergent input from many R and E units and themselves are interconnected by inhibitory collaterals to form a competitive network (shown symbolically by the shaded gray area).

FIG. 3. Response of an isolated module to a cue input. A single instructional cue is pulsed on and off. Input from this cortical event unit depolarizes the CD unit, leading to its activation. Inhibitory input from the CD unit causes the tonically active GPi unit to hyperpolarize and pause. This pause in GPi inhibition produces a rebound response in the T unit. Activation of the T unit is relayed to the cortical unit that participates in this module, and its activity is sustained by positive feedback between the T and C units. This illustrates the bistable nature of the model's cortical-thalamic loop. Solid and dashed traces depict membrane potential and activation, respectively.
Corticostriatal afferents make en passant synapses with spiny neurons (Wilson 1990); this serves to distribute information to CD units across the entire modular array. There is a mixture of input from the E units described in the previous paragraph and input from the R units in the PF cortex, so named because they receive recurrent input from the model's processing modules. The R inputs provide each module access to sustained cortical-thalamic activity, representing results obtained from the processing of prior events. Thus each CD unit is presented a spatial pattern of input representing both present events and context signals based on the processing of prior events.
The modules also compete through the inhibitory collaterals of caudate spiny neurons (shaded region in Fig. 2). Striatal competition is strongly suggested by the preponderance of medium spiny neurons, by the extent of the axonal arborizations of their collaterals, and by some physiological evidence (Groves 1983; Katayama et al. 1981; Rebec and Curtis 1988; Wilson 1995). Wickens (1993) has modeled spherical zones of mutual inhibition that he calls inhibitory domains. We instead model competitive interactions with a fully connected network of inhibitory CD units. The use of a single domain is a simplification that neglects the potential for more complex interactions.
METHODS
Neurons were modeled as single membrane-bound compartments with passive leakage conductances. A first-order differential equation relates the membrane leakage current and synaptic currents to the membrane potential for a neuron, j (Eq. 1)
|
(1)
|
The passive electrical properties of the model's neurons are representative of those reported for the cortex, striatum, and thalamus (Connors et al. 1982; McCormick and Huguenard 1992; McCormick et al. 1985; Wilson 1990). A membrane capacitance (C) value of 0.5 nF and leakage conductance (gL) of 0.0333 µS gives each neuron a time constant (C/gL) of 15 ms. Resting potential (EL) was set to -60 mV. The membrane leakage currents are defined by Eq. 2
|
(2)
|
The model represents synapses as scalar weights (wj,k) between neurons k and j. Making the simplifying assumption that inputs sum in a linear fashion, we lump the action of many synapses into a single current. The weighted sum of presynaptic firing rates gives the synaptic current (Eq. 3)
|
(3)
|
A sigmoidal activation function (Eq. 4) with a threshold (Vth) of -55 mV is used to convert membrane potential into an output firing rate within a normalized range between 0 and 1. In the CD layer, a large slope parameter, b in Table 2, was used to model the sharp transitions between "up" and "down" states displayed by striatal spiny neurons (Wilson 1995)
|
(4)
|
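One form of Eqs. 1-4 that is consistent with the definitions in the text and the Glossary is sketched below. The sign convention (a positive synaptic current depolarizes the cell) and the exact form of the sigmoid are assumptions made for illustration, not a transcription of the original equations.

```latex
\begin{align}
C \frac{dV_j}{dt} &= -I_{L,j} + I_{syn,j} \tag{1} \\
I_{L,j} &= g_L \,(V_j - E_L) \tag{2} \\
I_{syn,j} &= \sum_k w_{j,k}\, Z_k \tag{3} \\
Z_j &= \frac{1}{1 + \exp\!\left[-b\,(V_j - V_{th})\right]} \tag{4}
\end{align}
```

Under this form the membrane time constant is C/gL = 0.5 nF / 0.0333 µS, or about 15 ms, and a unit held at Vth fires at a normalized rate of 0.5.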
The caudate layer of the module receives convergent excitatory inputs from neurons of the PF cortex, modeled by Eq. 3. In addition, CD units compete through the inhibitory action of GABAergic collaterals. The total inhibitory current for each CD unit is determined by scaling the sum of the activations of all other CD units in the layer; CD units do not receive self-inhibitory input.
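The lateral inhibition described above can be sketched as follows. This is a minimal illustration, assuming a single scaling factor shared by all CD units; the function name and the `scale` parameter are hypothetical and do not come from the original simulator.

```python
def cd_inhibitory_currents(activations, scale):
    """Inhibitory current magnitude for each CD unit.

    Each unit is inhibited by the scaled sum of the activations of all
    OTHER CD units; there is no self-inhibition.
    `scale` is a hypothetical stand-in for the inhibitory collateral weight.
    """
    total = sum(activations)
    # Subtracting a unit's own activation removes the self-inhibitory term.
    return [scale * (total - a) for a in activations]


# Three CD units with different activation levels.
currents = cd_inhibitory_currents([1.0, 0.0, 0.5], scale=2.0)
```

The most active unit receives the least inhibition, which is what lets it suppress its neighbors in a competitive layer.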
Pallidal neurons were modeled with a spontaneous firing rate of 0.5 using a bias current of magnitude 0.1665 nA, equal to gL(Vth - EL), that depolarizes the membrane potential to Vth. At Vth, the output of the GPi unit is maximally responsive to inhibitory input from the CD layer. The synaptic weights between CD and GPi layers were adjusted such that each CD input strongly inactivated its GPi target.
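The arithmetic behind the pallidal bias current can be checked with a short sketch. It assumes a leaky-integrator form in which a positive current depolarizes the cell and a logistic activation function; the slope value used here is hypothetical (Table 2 is not reproduced in this section), and forward Euler integration is used only for brevity.

```python
import math

C_NF = 0.5        # membrane capacitance (nF), from the text
G_L_US = 0.0333   # leakage conductance (uS), from the text
E_L_MV = -60.0    # resting potential (mV)
V_TH_MV = -55.0   # threshold potential (mV)


def firing_rate(v_mv, slope=1.0):
    """Logistic activation; `slope` is a hypothetical value."""
    return 1.0 / (1.0 + math.exp(-slope * (v_mv - V_TH_MV)))


def settle(i_bias_na, t_ms=200.0, dt_ms=0.1):
    """Integrate C dV/dt = -gL (V - EL) + I_bias (assumed sign convention)."""
    v = E_L_MV
    for _ in range(int(t_ms / dt_ms)):
        dv_dt = (-G_L_US * (v - E_L_MV) + i_bias_na) / C_NF  # mV/ms
        v += dt_ms * dv_dt
    return v


tau_ms = C_NF / G_L_US                    # nF / uS = ms
i_bias_na = G_L_US * (V_TH_MV - E_L_MV)   # uS * mV = nA
v_final = settle(i_bias_na)
```

With these values the bias current holds the unit at Vth, where the logistic function returns 0.5 and has its steepest slope.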
Thalamic relay neurons display postinhibitory rebound behavior mediated by T-type calcium currents (McCormick and Pape 1990). This rebound current permitted firing in response to pauses in the inhibitory input from GPi. It was modeled as specified by Wang et al. (1991)
|
(5)
|
The voltage dependence of the steady-state values, a, of the activation and inactivation gates m and h was modeled with the Boltzmann equation (Eq. 6)
|
(6)
|
The constants for these curves were set at physiologically plausible values noted in Table 1. The kinetics of the channel's gating variables both follow first-order differential equations with voltage-dependent time constants (Wang et al. 1991).
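A Boltzmann steady-state gate and a T-type current of the general kind described above can be sketched as below. The half-maximal voltages, slope parameters, conductance, reversal potential, and the exponent on m are all illustrative assumptions; they are not the Table 1 values, which are not reproduced in this section.

```python
import math


def boltzmann(v_mv, v_half_mv, k_mv):
    """Steady-state gate value; the sign of k sets the direction.

    k < 0 gives a curve that rises with depolarization (activation),
    k > 0 gives a curve that falls with depolarization (inactivation).
    """
    return 1.0 / (1.0 + math.exp((v_mv - v_half_mv) / k_mv))


def t_current(v_mv, m, h, g_t_us, e_ca_mv, m_power=3):
    """T-type current g_T * m^p * h * (V - E_Ca); p = 3 is an assumption."""
    return g_t_us * (m ** m_power) * h * (v_mv - e_ca_mv)


# Illustrative (hypothetical) gate parameters.
M_HALF, M_K = -65.0, -7.8   # activation
H_HALF, H_K = -81.0, 11.0   # inactivation

h_hyperpolarized = boltzmann(-76.0, H_HALF, H_K)
h_rest = boltzmann(-60.0, H_HALF, H_K)
```

The comparison of `h_hyperpolarized` with `h_rest` illustrates why sustained hyperpolarization primes the rebound: inactivation is removed at hyperpolarized potentials, so more channels are available when the cell is released from inhibition.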
The inhibitory weights between GPi and T units were adjusted such that T units remained hyperpolarized at -76 mV under inhibition from tonically active pallidal units. This hyperpolarized membrane potential results in a strong rebound response from the calcium channel. The recurrent excitatory weights from T to PF and back were selected such that they would produce sustained cortical-thalamic firing rates once the PF unit was activated. All synaptic weights are listed in Table 2.
Alternate model assumptions
Most of the simulations reported in this paper used the model of the synaptic current detailed above to calculate excitatory and inhibitory synaptic currents from synaptic weights and presynaptic firing rates. This approach ignores the nonlinear effects of membrane potential on synaptic current values and thus treats the synapse as a "current source." To explore the limitations of the "current-source" synapse assumption, simulations were run with a more physiological synaptic model that treats the weighted sum of the presynaptic firing rates as a synaptic conductance. These excitatory (Eq. 7) and inhibitory (Eq. 8) conductance values are converted into currents by multiplying by the difference between the membrane potential and the applicable synaptic reversal potential (Eq. 9)
|
(7)
|
|
(8)
|
|
(9)
|
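The difference between the two synapse models can be illustrated with a short sketch. The function names and the reversal potentials are hypothetical, and the sign of the conductance-based current simply follows the "membrane potential minus reversal potential" wording above, so an excitatory current is negative (inward) below its reversal potential under this convention.

```python
def current_source(weights, rates):
    """Current-source synapse: weighted sum of presynaptic firing rates."""
    return sum(w * z for w, z in zip(weights, rates))


def conductance_current(v_mv, w_ex, z_ex, w_inh, z_inh, e_ex_mv, e_inh_mv):
    """Conductance synapse: weighted sums act as conductances, each
    multiplied by its driving force (V - E_rev)."""
    g_ex = sum(w * z for w, z in zip(w_ex, z_ex))
    g_inh = sum(w * z for w, z in zip(w_inh, z_inh))
    return g_ex * (v_mv - e_ex_mv) + g_inh * (v_mv - e_inh_mv)


# Hypothetical reversal potentials, for illustration only.
E_EX, E_INH = 0.0, -70.0

# The same excitatory drive evaluated at two membrane potentials.
i_far = conductance_current(-60.0, [1.0], [1.0], [], [], E_EX, E_INH)
i_near = conductance_current(-30.0, [1.0], [1.0], [], [], E_EX, E_INH)
```

Unlike the current-source model, the conductance model delivers less current as the membrane potential approaches the reversal potential, which is the nonlinearity the alternate simulations were meant to probe.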
Simulation methods
An object-oriented simulator was written using the C++ programming language. The simulations were performed using batch processes running across a group of 30 Hewlett-Packard workstations (HP 712/80 i, HP 715/50, and HP 715/33). The nonoverlapping cue presentation paradigm was modeled after the approach used in the caudate studies by Kermadi et al. (Kermadi and Joseph 1995; Kermadi et al. 1993). The task is simulated by sequentially toggling the activation of the model's event-related (E) neurons on and then off (Fig. 2). In the Kermadi paradigm, consecutive cues are illuminated for 800 ms at 1,500-ms intervals. However, the time necessary for the network to reach equilibrium was much less than the 800 ms between changes in the state of the cues and varied considerably according to the magnitude of the corticostriatal weights. To minimize the amount of wasted simulation time, the original paradigm was modified so that the three onsets and offsets of the cue sequence were varied to trigger as soon as the network settled into a stable equilibrium.
The model equations were solved numerically using a fourth-order Runge-Kutta method with an adjustable time step ranging between 0.1 and 1.0 ms as a function of the magnitude of the first-order Runge-Kutta term. During a time step, each of the model's layers was synchronously updated in the order CD, GPi, T, and PF. Time steps were small in comparison with the time constants of network equilibration.
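A fourth-order Runge-Kutta step with a bounded, adjustable time step might look like the sketch below, shown in Python rather than the C++ of the original simulator. The rule used to pick the step (`choose_dt`, with a hypothetical `target_dv`) is an assumption; the text states only that the step ranged between 0.1 and 1.0 ms as a function of the first Runge-Kutta term.

```python
import math

TAU_MS = 15.0    # membrane time constant (ms)
E_L_MV = -60.0   # resting potential (mV)


def dv_dt(v_mv):
    """Passive membrane relaxing toward rest (no synaptic input)."""
    return -(v_mv - E_L_MV) / TAU_MS


def rk4_step(f, v, dt):
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(v)
    k2 = f(v + 0.5 * dt * k1)
    k3 = f(v + 0.5 * dt * k2)
    k4 = f(v + dt * k3)
    return v + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0


def choose_dt(k1, dt_min=0.1, dt_max=1.0, target_dv=0.1):
    """Hypothetical rule: small steps when the first slope term is large."""
    if k1 == 0.0:
        return dt_max
    return min(dt_max, max(dt_min, target_dv / abs(k1)))


def integrate(v0, t_end):
    t, v = 0.0, v0
    while t < t_end:
        dt = min(choose_dt(dv_dt(v)), t_end - t)  # do not overshoot t_end
        v = rk4_step(dv_dt, v, dt)
        t += dt
    return v


v_30ms = integrate(-50.0, 30.0)
```

For this passive membrane the result can be compared with the analytic solution EL + (V0 - EL) exp(-t/tau), which is a convenient sanity check before layers are coupled.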
Glossary
PF: prefrontal cortex
CD: caudate nucleus
GPi: internal segment of globus pallidus
T: thalamus
MAX: maximum random synaptic weight
RANGE: range of random synaptic weight distribution
V: membrane potential
C: membrane capacitance
IL: leakage current
gL: leakage conductance
EL: membrane resting potential
Isyn: synaptic current
wex: excitatory synaptic weight
winh: inhibitory synaptic weight
Vth: threshold potential
Z: presynaptic firing rate
b: slope of sigmoidal activation function
IT: low-threshold calcium T-type current
ECa2+: calcium reversal potential
m: activation gating variable
h: inactivation gating variable
gT: maximum T-type conductance
a: steady-state activation/inactivation of m and h gates
Vh: half-maximal voltage
k: Boltzmann equation slope parameter
gex: excitatory synaptic conductance
ginh: inhibitory synaptic conductance
Eex: excitatory synaptic reversal potential
Einh: inhibitory synaptic reversal potential
RESULTS
Responses of an isolated module
The response of an isolated module (Fig. 1) to a single cue input serves to illustrate the model's basic processing operations. In its initial resting state, the module's units are quiescent except for the GPi unit, which exists in a tonic state of moderate activation (Fig. 3, GPi). A single event-related input is pulsed on and then off to simulate the onset and offset of an instructional cue (Fig. 3, Cue). This input induces the spiny unit in the CD layer to fire phasically. The short burst of CD activity (Fig. 3, CD) produces a momentary pause in GPi activity (Fig. 3, GPi), thus releasing the T unit from a state of tonic inhibition. Transient removal of pallidal inhibition produces a slow depolarization of the T unit with dynamics initially dominated by the passive properties of the unit (Fig. 3, T). This slow depolarization allows the activation variable, m, to increase, creating an inward calcium spike, which then quickly depolarizes the T unit, driving it into an activated state (Fig. 3, T). The C unit, which receives an excitatory input from the T layer, subsequently begins to fire ~32 ms after the CD unit first crossed its firing threshold (Fig. 3, C). Most of the signal pathway's 32-ms delay is due to the kinetics of the thalamic T-type calcium channel. Reciprocal excitatory inputs from the C unit stabilize the membrane potential of the T unit at a level above threshold as it begins to repolarize. The reciprocal system quickly latches into a state of sustained activation that is maintained even after the return of tonic GPi inhibition. The cortical-thalamic loop is a bistable system because it has two stable equilibrium states (activated and inactivated) at moderate levels of pallidal input. Corticothalamic bistability is one of the key computational features of the Houk and Wise module. The transition from the activated state back to the inactive state requires a burst of inhibitory input to this bistable loop. Such a burst response could be effected by a burst of excitatory input to the GPi from the subthalamic nucleus (STN) of the indirect pathway. Presumably, this burst of STN activity would reflect activation of other spiny units within the striatum. The present simulation does not include this mechanism.
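The latching behavior of a bistable positive-feedback loop can be illustrated with a deliberately reduced toy. The sketch collapses the reciprocal cortical-thalamic pair into a single self-exciting rate variable; the gain, threshold, time step, and pulse amplitudes are hypothetical and are not the parameters of the model described here.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def run_loop(inputs, x0=0.0, gain=20.0, theta=0.5, dt=0.1):
    """Toy loop: dx/dt = -x + sigmoid(gain * (x + u - theta)).

    x stands in for the sustained activity of the loop, u for the net
    external drive (positive for a rebound, negative for inhibition).
    """
    x, trace = x0, []
    for u in inputs:
        x += dt * (-x + sigmoid(gain * (x + u - theta)))
        trace.append(x)
    return trace


rest = [0.0] * 50
pulse = [1.0] * 50    # transient excitation (e.g., a thalamic rebound)
reset = [-1.0] * 50   # transient burst of inhibition
trace = run_loop(rest + pulse + rest + reset + rest)
```

With no input the toy stays in whichever state it was left in: it remains active after the excitatory pulse ends and returns to the inactive state only after the inhibitory burst, mirroring the two stable equilibria described above.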
Interacting array of modules: emergence of spatial patterns
Sensory inputs produce sustained activations within cortical-thalamic loops and thus alter the internal states of individual modules, as illustrated in the previous section (Fig. 3). In an array of modules (Fig. 2), internal state information serves as an additional input modality to the CD layer. These recurrent (R) inputs provide information about past events that can influence future states of the model, thus providing a way of linking temporally spaced sensory inputs.
An array of 30 modules was initialized with randomly distributed corticostriatal weights and examined during the sequential cue paradigm. Figure 4, top, displays the cue inputs for the instruction sequence ABC. As mentioned in the previous section, the event-related units provide the network with labeled-line representations of cue stimuli. Accordingly, their time traces simply reflect the state of the three instruction cues during the sequence presentation.

FIG. 4. Response of CD and PF layers to a sequence (ABC) of cues. Onset of cue A produces brief phasic and longer-duration burst responses among the competing CD units. Bursting CD units trigger bistable cortical-thalamic loops, leading to sustained activations of the corresponding R units in PF cortex (Fig. 3). Recurrent input from R units provides a context that influences the responses of the CD layer during subsequent cue presentation. By the end of the period of cue presentation, the particular temporal sequence of events is effectively encoded by the spatial pattern of sustained activity in the R units of the PF layer.
Figure 4, middle, displays the response of the CD layer. Recall from Fig. 3 that the response of the GPi layer, though inverted, is very similar to the CD layer response. After the onset of the first cue in the sequence, the competitive CD layer settles into an equilibrium state defined by the activation of CD units 11 and 26. Cue offset often results in a resettling of the network into a second equilibrium consisting of a different group of winning CD units. In fact, over the course of the three-cue sequence, the network will often settle into six distinct equilibriums. This competitive settling produces a mixture of short and long bursts, as well as phasic response dynamics where, in fact, sustained responses in the CD and GPi layer are the exception rather than the rule. This result agrees with experiments in the caudate (Hikosaka et al. 1989a) and SNr (Hikosaka and Wurtz 1983) reporting that 10% (80/867) and 16% (15/95) of task-related neurons in these areas, respectively, display sustained responses.
During the competitive settling phase, a CD unit that becomes active for a significant time period will induce a rebound response in its module's thalamic relay neuron, leading to sustained activation of its cortical-thalamic loop. The PF layer response is displayed in the bottom set of traces in Fig. 4. In comparing activity in the CD and PF layers, note that while activated CD units may become deactivated during this time, the activity of the PF units, once elevated, is maintained by positive feedback within the bistable cortical-thalamic loops. Thus the PF activations provide a spatial record of significant CD activity. In this example of a simulated trial, the spatial code for the sequence ABC involves a pattern of 12 activated PF units (units 1, 3, 5, 8, 9, 10, 11, 12, 16, 23, 25, and 26).
In addition to providing a spatial record of CD activity, the PF patterns also can provide an unambiguous encoding of the input sequence. To demonstrate this computational property, a group of six sequences of three cues A, B, and C (i.e., ABC, ACB, BAC, BCA, CAB, and CBA) was presented to a network of 30 modules. The 15 rows of the prefrontal activation diagram (Fig. 5, gray squares indicate active units) represent the equilibrium state of the PF layer at each stage of the cue presentation for this group of six sequences. Together, the six sequences present the network with 15 different sequential contexts (A, B, C, AB, AC, BA, BC, CA, CB, ABC, ACB, BAC, BCA, CAB, and CBA). The patterns of active PF units comprising each row of the diagram can be thought of as spatial representations, or encodings, of the 15 sequential contexts. Note that the PF units in Fig. 5 display 15 distinct patterns of activation in response to the 15 different sequential contexts. These distinct responses represent an unambiguous encoding of the serial information presented by the set of instruction sequences.
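The count of 15 sequential contexts can be reproduced by enumerating every prefix of every ordering of the three cues. The sketch below also shows one way to count distinct PF patterns, assuming each pattern is recorded as the set of active unit indices; the helper names are hypothetical.

```python
from itertools import permutations


def sequential_contexts(cues):
    """All distinct prefixes of all orderings of the cues."""
    contexts = []
    for seq in permutations(cues):
        for n in range(1, len(seq) + 1):
            prefix = "".join(seq[:n])
            if prefix not in contexts:
                contexts.append(prefix)
    # Order by length, then alphabetically, to match the list in the text.
    return sorted(contexts, key=lambda c: (len(c), c))


def count_distinct_patterns(patterns):
    """Number of distinct activation patterns (sets of active unit indices)."""
    return len({frozenset(p) for p in patterns})


contexts = sequential_contexts("ABC")
```

An encoding is unambiguous in the sense used here when `count_distinct_patterns` applied to the PF patterns for all contexts equals the number of contexts.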

FIG. 5. Distinctive spatial patterns of sustained PF activity generated in a perfect network by the 15 sequential contexts that result from the presentation of 6 test sequences of cues (ABC, ACB, BAC, BCA, CAB, and CBA). In each row, the darkened (gray) squares indicate which PF units were active after each period of cue presentation. Note that each row is characterized by a different spatial pattern, indicating that each sequential context is encoded by a unique spatial pattern of PF activity. Columns indicate how each module participates in this encoding task.
Note that the spatial code of PF activation is relatively dense; indeed, each of the six sequences (bottom 6 rows of Fig. 5) engages between 11 and 18 units. We will explore this issue further in the next section. Next, note that increasing numbers of PF units are engaged as each sequence progresses, thus providing increasing amounts of recursive input to the CD layer. The number of activated CD units at any given instant is determined by a balance between the level of striatal inhibition and the total number of active corticostriatal inputs. Thus as the sequence progresses, and increasing amounts of input are supplied by the recurrent frontal projections, a greater number of CD units becomes activated. This increase in PF input stabilizes the network, making it less sensitive to future sensory inputs from the posterior parietal layer.
Tolerance to random corticostriatal weights
The corticostriatal weights of the network discussed in the previous section were selected randomly from a uniform distribution of values spanning the closed-positive interval defined by a maximum (MAX) and range (RANGE) of weight values. The network was instantiated and tested at several combinations of MAX and RANGE until it produced an unambiguous set of PF patterns in response to the 15 sequential contexts. An instantiation of the network that produced such "perfect" performance of the task was found within the first few instantiation attempts. Given the ease with which appropriate distribution parameters were established in this initial study, it appeared quite likely that other combinations of MAX and RANGE also might produce perfect networks.
To explore this hypothesis, the network was instantiated with weights drawn from uniform distributions defined by 5,564 different combinations of MAX and RANGE. For each instantiation, the network was tested with the 15 serial contexts and the number of distinct PF patterns produced was recorded. The color of each pixel in Fig. 6A indicates the number of distinct PF patterns produced by an instantiation of the network at a particular combination of maximum and range. Note that random weight distributions defined by combinations of MAX and RANGE such that RANGE > MAX were not tested because they allow for negative (i.e., inhibitory) values of corticostriatal weight.
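Drawing one random instantiation of the corticostriatal weights can be sketched as follows. The interval is taken to be [MAX - RANGE, MAX], which is an inference from the remark that RANGE > MAX would allow negative weights; the function name, the number of inputs per CD unit, and the example MAX and RANGE values are hypothetical.

```python
import random


def sample_weights(n_cd, n_inputs, w_max, w_range, rng):
    """Uniform random weights on [w_max - w_range, w_max].

    Combinations with w_range > w_max are rejected because they would
    permit negative (i.e., inhibitory) corticostriatal weights.
    """
    if not 0.0 <= w_range <= w_max:
        raise ValueError("require 0 <= RANGE <= MAX")
    low = w_max - w_range
    return [[rng.uniform(low, w_max) for _ in range(n_inputs)]
            for _ in range(n_cd)]


rng = random.Random(0)  # seeded so an instantiation can be reproduced
# 30 CD units; 33 inputs per unit is an assumed count (3 E + 30 R units).
weights = sample_weights(30, 33, w_max=4.0, w_range=1.5, rng=rng)
```

Setting `w_range` to zero makes every weight equal to `w_max`, which is the symmetric case in which no CD unit can win the competition.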

FIG. 6. Sequence encoding performance of networks instantiated with random corticostriatal weights. Each pixel represents a network instantiated with a uniform distribution of random corticostriatal weights defined by combinations of MAX and RANGE value. Pixel color indicates the number (0-15) of distinct PF patterns produced by that network in response to 15 sequential contexts. A: network with current-source synapses in CD layer. B: current-source networks that produced 15 PF codes in 1 of 10 instantiations. C: current-source networks with decreased activation function slope; D: with increased CD layer time constant. E: network with reversal potential synapses in CD layer. F: network with feed-forward CD layer inhibition.
There are several distinctive features of this color map. First, much of the valid parameter space appears to be effective, yielding networks that produce distinct codes for 13-15 of the sequential contexts. The left edge of this effective area is fairly distinct, indicating a sharp transition between ineffective and effective parameter values. Note that there is a lower limit to the MAX weight value (~2.0), below which the network fails to produce distinct codes. Maximum weights below this value result in CD units with synapses too weak to produce suprathreshold responses, and thus produce no PF patterns. At the other end of the scale, large values of the MAX parameter result in networks with overactivated CD layers, thus completely activating, that is, saturating, the PF layer.
Note that when the RANGE equals zero, the network produces no distinct patterns. This is a result of the symmetrical inhibition within the striatum. With all weights identical, and no noise within the system, there can be no winning neuron within the striatum. As a result, either all or none of the CD units become activated depending on the value of the MAX.
Individual instantiations of the network with the same MAX and RANGE parameters will perform differently on the task because the weights are assigned randomly. This accounts for much of the variation in pixel color across the effective parameter region. To get a better picture of the parameter combinations leading to perfect networks, each of the 5,564 combinations was tested with 10 network instantiations, for a total of 55,640 network instantiations. Figure 6B indicates all parameter combinations that produced "perfect" performance in at least 1 of 10 instantiations. Note that the parameter combinations producing perfect performance do not fall along the line of maximum RANGE (i.e., the main diagonal of the figure). This result may be due to the fact that maximal RANGE values produce a greater proportion of near-zero weight values. Rather than contributing to distinct network responses, these near-zero weights might be largely useless.
Two summary measures were used to describe the PF patterns produced by the perfect networks. First, the mean number of PF units activated by the six sequences of three cues was 14.64 out of 30 units, with a standard deviation of 0.23 units. This is a fairly dense coding scheme. Second, the average vector cosine between the six PF patterns gives a measure of the similarity of the patterns produced by the group of six sequences. The relatively high mean value of this cosine (0.643 ± 0.009, mean ± SD) indicates that the PF patterns produced by the network are oriented, on average, ~50° apart within their 30-dimensional vector space. This would represent a considerable amount of overlap between sparse codes, but given the density of this code, the patterns are reasonably uncorrelated.
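The relation between the mean cosine and the stated angle can be checked with a short sketch. The helper names are hypothetical. The chance-level figure rests on an assumption not made in the text: for two binary patterns that each have exactly k of n units active, with the active units placed independently, the expected overlap is k^2/n, so the expected cosine is k/n.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Vector cosine between two activation patterns."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean_pairwise_cosine(patterns):
    """Average cosine over all unordered pairs of patterns."""
    pairs = list(combinations(patterns, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Angle corresponding to the reported mean cosine of 0.643.
angle_deg = math.degrees(math.acos(0.643))

# Illustrative baseline (an assumption, see above): expected cosine for
# independently placed patterns with 14.64 of 30 units active.
chance_cosine = 14.64 / 30
```

Under that assumption the baseline cosine is about 0.49, which may help explain why a mean of 0.643 is described as reasonably uncorrelated for a code this dense.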
Analysis of receptive fields
Let us return to the PF activation diagram (Fig. 5) introduced earlier to illustrate the model's ability to encode sequential inputs within spatial patterns of sustained activation. It is also interesting to examine Fig. 5 in a column-wise fashion, that is, from the perspective of a physiologist recording the receptive fields of single units. First, however, it is advantageous to make a slight modification to the diagram. Recall that PF units, once activated, remain active through the remainder of the sequence due to sustained cortical-thalamic feedback. Accordingly, with the exception of rows A, B, and C, the rows of Fig. 5 represent responses to the current cue as well as sustained activations produced by previous cues within the sequence. For example, unit 2 in Fig. 5 begins firing when cue C arrives first in a sequence and sustains this activation through the presentation of subsequent cues A and B. These "working memory" squares of the plot, in this case those corresponding to CA, CB, CAB, and CBA, are redundant and thus obscure the responses defining a unit's receptive field. In Fig. 7, the working memory activations are eliminated, and thus each column of the plot can be thought of as a binary vector defining the "receptive field" of a PF unit.
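The removal of working memory activations can be sketched as follows. The representation is an assumption made for illustration: each unit is described by the set of serial contexts in which it is active, and a context is kept in the receptive field only if the unit was not already active in the preceding context.

```python
from itertools import permutations

CUES = "ABC"
# The 15 serial contexts: A, B, C, AB, ..., CBA.
CONTEXTS = ["".join(p) for n in (1, 2, 3) for p in permutations(CUES, n)]

def receptive_field(sustained):
    """Keep only contexts where activity begins, dropping held (working memory) activity.

    sustained: set of contexts in which the unit is active, including
    activity carried over from earlier cues in the sequence.
    """
    return {c for c in sustained if len(c) == 1 or c[:-1] not in sustained}

# Unit 2 of the text: fires when C arrives first and stays active afterward.
unit2 = {"C", "CA", "CB", "CAB", "CBA"}
unit2_field = receptive_field(unit2)   # only "C" remains
```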

FIG. 7. Receptive fields in a network of 30 units. Each column defines the receptive field of an individual unit. Units 2 and 9 display context-dependent Rank1(C) and Seq3(ABC) behavior. Unit 18 displays Cue(A) behavior that is similar to working memory activity. Most units respond to several different serial contexts.
Examining the columns of Fig. 7, notice that a few of the units (2 and 9) respond to only 1 of the 15 serial contexts. Such responses, which we refer to as "simple," can be grouped into one of three response types: Rank1(X), Seq2(XY), and Seq3(XYZ) [Note that in our notation, (X) refers generically to any 1 of the 3 cues, (XY) to the 6 sequences of length 2, and (XYZ) to the 6 sequences of length 3]. All three types of simple responses have been reported in the single-unit literature. For example, unit 2, which only responds to cue C as the first cue in a sequence, is sensitive to the serial rank of the cue. This type of response, which we refer to as Rank1(X), has been observed in the PF (Funahashi et al. 1993), frontal eye field (FEF) (Barone and Joseph 1989), caudate (Kermadi and Joseph 1995; Kermadi et al. 1993), and GP (Mushiake and Strick 1995) during the presentation phase of delayed sequencing experiments. Unit 9, which responds to cue C when it is preceded by the sequence AB, displays a sequence dependence we term Seq3(XYZ). Other instantiations produced units with Seq2(XY) responses. This type of unit has been identified experimentally in the FEF (Barone and Joseph 1989) and caudate (Kermadi and Joseph 1995; Kermadi et al. 1993).
Many of the units in Fig. 7 respond to more than one serial context; we refer to these as "compound" receptive fields. For example, unit 18 responds to cue A, independent of serial rank or context. This compound receptive field, which we term Cue(X), is composed of a mixture of Rank1(X), Seq2(YX), Seq2(ZX), Seq3(YZX), and Seq3(ZYX) responses. Such a response is similar to the spatial working memory responses of the dorsolateral PF (Funahashi et al. 1989, 1993). Cue-related responses also have been recorded in the caudate during spatial-delayed sequencing; however, in contrast to their PF counterparts, they have phasic activations (Kermadi and Joseph 1995; Kermadi et al. 1993).
How many different types of receptive fields are displayed by the model? First, consider the theoretical limit. A binary vector of 15 elements can have 2^15 (i.e., 32,768) distinct configurations; however, if we enforce the structure-based constraint that PF units, once activated, remain active throughout the remainder of the sequence, only 1,000 different "receptive field" vectors exist. Further, many of these 1,000 receptive fields are operationally equivalent. For example, a PF unit that responds to cue A as the first cue in a sequence is operationally equivalent to one responding to B in the same serial position. Eliminating these operational equivalents, only 190 different responses (including the null, or task-insensitive response) are theoretically possible.
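These counts can be reproduced by brute-force enumeration. The sketch below is illustrative and is not the classification algorithm used in the study. It assumes that "operationally equivalent" means equivalent under any relabeling of the three cues, an interpretation suggested by the Rank1(A)/Rank1(B) example; all function names are hypothetical.

```python
from itertools import permutations, product

CUES = "ABC"
# The 15 serial contexts: A, B, C, AB, ..., CBA.
CONTEXTS = ["".join(p) for n in (1, 2, 3) for p in permutations(CUES, n)]

def is_sustained(active):
    """True if activity, once started, persists for the rest of the sequence."""
    return all(c in active for c in CONTEXTS if len(c) > 1 and c[:-1] in active)

def relabel(active, mapping):
    """Rename the cues in every context of an activation pattern."""
    return frozenset("".join(mapping[ch] for ch in c) for c in active)

def canonical(active):
    """Pick one representative of a pattern's class under all cue relabelings."""
    forms = []
    for perm in permutations(CUES):
        mapping = dict(zip(CUES, perm))
        forms.append(tuple(sorted(relabel(active, mapping))))
    return min(forms)

# Every binary vector over the 15 contexts, stored as the set of active contexts.
all_patterns = [frozenset(c for c, bit in zip(CONTEXTS, bits) if bit)
                for bits in product((0, 1), repeat=len(CONTEXTS))]
sustained = [p for p in all_patterns if is_sustained(p)]
classes = {canonical(p) for p in sustained}

print(len(all_patterns), len(sustained), len(classes))  # prints: 32768 1000 190
```

The figure of 1,000 can also be seen directly: for each first cue, a unit either is active from that cue onward (1 way) or is silent at the first cue and has 3 possible behaviors on each of the 2 two-cue continuations (9 ways), giving 10 x 10 x 10 combinations.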
To test this limit, we developed an algorithm to classify the receptive fields of 20,640 units from a sample of 688 perfect networks. Although all 190 receptive fields were expressed by this sample, some fields were much more common than others. Figure 8 displays the 35 most common receptive fields and sorts them by decreasing frequency beginning with the task-insensitive units in bin 1. Note that Fig. 8 groups operationally equivalent responses, such as Rank1(A) and Rank1(B), into the same bin and arbitrarily represents them by one of these equivalents.

FIG. 8. Thirty-five most-common receptive fields displayed by 20,640 units. Receptive fields are sorted by decreasing frequency beginning at bin 1. Simple receptive fields represented include Rank1(B) in bin 3, Seq2(CB) in bin 6, pure Rank1 in bin 8, pure Rank2 in bin 19, and Cue(B) in bin 35. Note that the vast majority of bins represent complex fields.
First, a large number of units (bin = 1, n = 3,054) were task insensitive. Among task-related units, the simple responses Rank1(X) (bin 3, n = 1,049), Seq2(XY) (bin 6, n = 708), and Seq3(XYZ) (bin 4, n = 915) are three of the five most common. However, 85% of the task-related units display compound responses. The logical function performed by some of these compound receptive fields, such as Cue(X) (bin = 35, n = 27), can be understood easily and stated succinctly by higher-level monikers. Other examples include a population of units (bin = 8, n = 512) that responded nondifferentially to the first cue of all sequences. Such a response is recognized easily as a rank-dependent response or pure Rank1 receptive field in our terminology. Units in the PF fitting our Rank1 definition have been attributed to nonspecific preparation, arousal, or attention (Funahashi et al. 1993). Similarly, units (bin = 19, n = 243) responding to all six Seq2(XY) contexts (i.e., AB, AC, BA, BC, CA, and CB) can be classified as pure Rank2 and units (not shown, n = 27) responding to all six Seq3(XYZ) contexts as pure Rank3. However, Rank2 and Rank3 responses have not yet been described in the literature.
Although the idea of simple responses combining to yield easily understood compound receptive fields is intuitively appealing, in the case of the model, most receptive fields defy straightforward labels such as Cue or Rank1. Instead, the logic performed by the majority of units often is described most easily by a list of its component receptive fields. For example, the most common task-related unit (bin = 2, n = 1,159) displays an odd mixture of Rank1(A), Rank1(B), Seq2(CA), and Seq2(CB) responses. This type of response might best be classified as a combination of Cue(A) and Cue(B). Similar units with cue-related responses to two different cues have been reported in the caudate (Kermadi and Joseph 1995). Several other compound receptive fields resemble those observed in single-unit studies. For example, two types of units, one (not shown, n = 27) displaying both Rank1(X) and Rank2(X) and the other (not shown, n = 32) displaying a mixture of Rank2(X) and Rank3(X) activity, resemble units reported in the FEF (Barone and Joseph 1989). Units with a combination of Rank2(X) and Rank3(X) activity also have been reported in the caudate (Kermadi and Joseph 1995; Kermadi et al. 1993).
Alternative modeling assumptions
To better understand the dependence of the simulation results on our modeling assumptions, we repeated the group of simulations from Fig. 6A with an alternative synaptic model, decreased activation function slope, and increased membrane time constants. In general, we found that our results were quite insensitive to changes in these parameters within the GPi, T, and PF layers. However, they were quite sensitive to changes in the CD layer model, and these findings are reported here. Finally, we repeated simulations using a CD layer with different levels of collateral inhibition and also a feed-forward model of caudate inhibition.
ACTIVATION FUNCTION SLOPE.
We explored the sensitivity of our results to the slope parameter, b, of the sigmoidal activation function of the units in the CD layer. The preceding studies all used an extremely steep slope parameter of 50 in the CD layer. Repeating the study of Fig. 6A with slopes of 5 demonstrated no significant qualitative or quantitative differences in results. However, further reducing the slope to 1 qualitatively changed the shape of the effective coding area and decreased the number of perfect networks to 10 (Fig. 6C). Note that the network fails to encode any sequences at low to moderate values of RANGE.
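The effect of the slope parameter can be illustrated with a minimal sketch. The functional form is an assumption: the activation function is defined in METHODS, which is not reproduced here, so a standard logistic function with slope b is used as a stand-in, and the input offset of 0.05 (in arbitrary units) is hypothetical.

```python
import math

def logistic(x, b, threshold=0.0):
    """Assumed sigmoidal activation: 1 / (1 + exp(-b * (x - threshold)))."""
    return 1.0 / (1.0 + math.exp(-b * (x - threshold)))

# Response to an input slightly above threshold (hypothetical offset of 0.05).
steep = logistic(0.05, b=50)    # close to full activation
shallow = logistic(0.05, b=1)   # barely above the half-activation value of 0.5
```

With a steep slope, a small suprathreshold difference in input is converted into a nearly all-or-none difference in output; with a shallow slope, the same difference is barely expressed. This is one way to read the loss of coding ability at low values of RANGE.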
TIME CONSTANT.
The results in Fig. 6A were also sensitive to the time constant of the CD layer units. Increasing the membrane time constant from 15 to 50 ms (by increasing the membrane capacitance from 0.5 to 1.67 nF) produced considerably better performance (Fig. 6D). In addition to displaying a qualitatively larger effective coding region, this set of studies produced 460, as compared with 270, perfect networks.
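The stated capacitance and time constant values are mutually consistent if the time constant is taken as the product of a fixed membrane resistance and the capacitance. That relation is an assumption here, since the membrane equations appear in METHODS and are not reproduced; the implied resistance is derived from the numbers above rather than quoted from the paper.

```python
# Assumed relation: tau = R * C, with the membrane resistance R held fixed.
tau_old = 15e-3      # s
c_old = 0.5e-9       # F
c_new = 1.67e-9      # F

r_membrane = tau_old / c_old     # implied resistance, about 30 megohms
tau_new = r_membrane * c_new     # about 50 ms
```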
ALTERNATE SYNAPTIC MODELS.
As outlined in METHODS, the model assumes a "current-source" representation of synaptic action. We repeated the random corticostriatal weight studies of Fig. 6A with a more realistic synaptic model in the CD layer. In this alternative synaptic model, excitatory (Eq. 7) and inhibitory (Eq. 8) conductance values of the CD layer are converted into currents by multiplying by the difference between the membrane potential and the applicable synaptic reversal potential (Eq. 9). Figure 6E displays the results of a study using an excitatory reversal potential, Eex, of 0 mV and an inhibitory reversal potential, Einh, of -90 mV. Although the results of this group of simulations (Fig. 6E) look qualitatively similar to the current-source results of Fig. 6A, only 64 perfect networks, compared with 270 in the current-source case, were produced. A further decrease in coding ability was observed in simulations using an inhibitory reversal potential of -70 mV, which produced only 34 perfect networks.
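The qualitative difference between the two synaptic models can be sketched with a generic single-compartment steady state. This is not a transcription of Eqs. 7-9, which are not reproduced here. The leak conductance and leak potential are hypothetical values, and the inhibitory reversal potential is taken as negative (-90 mV).

```python
def reversal_potential_v(g_ex, g_inh, g_leak=1.0, e_leak=-80.0, e_ex=0.0, e_inh=-90.0):
    """Steady-state potential (mV) when each synaptic current is g * (E_rev - V).

    g_leak and e_leak are hypothetical; conductances are in arbitrary units.
    """
    total = g_leak + g_ex + g_inh
    return (g_leak * e_leak + g_ex * e_ex + g_inh * e_inh) / total

def current_source_v(i_ex, i_inh, g_leak=1.0, e_leak=-80.0):
    """Steady-state potential (mV) when synapses inject fixed currents."""
    return e_leak + (i_ex - i_inh) / g_leak

# Each added excitatory input depolarizes a conductance-based unit by less
# than the previous one, because V is bounded by the reversal potential.
step1 = reversal_potential_v(1.0, 0.0) - reversal_potential_v(0.0, 0.0)
step2 = reversal_potential_v(2.0, 0.0) - reversal_potential_v(1.0, 0.0)

# With current sources, every added input contributes the same increment.
lin1 = current_source_v(1.0, 0.0) - current_source_v(0.0, 0.0)
lin2 = current_source_v(2.0, 0.0) - current_source_v(1.0, 0.0)
```

The shrinking increments in the conductance-based case are one plausible reading of why fewer perfect networks arise with reversal-potential synapses: differences between competing units are compressed.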
LEVELS OF COLLATERAL INHIBITION.
In both the current-source and more realistic synaptic models, adjustments in the level of inhibition have a scaling effect on the number, location, and spread of perfect networks within the MAX-RANGE parameter space. Increases in the level of collateral inhibition produce increases in the number of perfect networks and area of the effective region. However, because increases in inhibition also shift the region to larger values of RANGE, and thus a larger area of the MAX-RANGE parameter space, the overall proportion of perfect networks to imperfect networks stays approximately the same.
FEED-FORWARD INHIBITION.
Although collateral inhibition is strongly suggested by the existence of GABAergic spiny neuron collaterals as well as by a limited amount of physiological evidence (Groves 1983; Katayama et al. 1981; Rebec and Curtis 1988; Wilson 1995), there is also clear evidence for feed-forward inhibition via GABAergic interneurons (Kitai and Surmeier 1993). To address this possibility, we constructed an alternative model incorporating feed-forward, rather than collateral, inhibition. As before, the feed-forward model was composed of 30 modules coursing through the PF, CD, GPi, and T layers; however, the inhibitory collaterals of the CD layer were omitted. In their place, an additional, and separate, layer of 30 interneuron units was added to provide feed-forward inhibition to the original CD layer. Like the original CD layer, each unit in the interneuron layer received corticostriatal projections from the entire PF layer; however, instead of making inhibitory projections to the GPi layer, each unit sent inhibitory projections to all of the units of the original CD layer.
The feed-forward version of the model presented dynamics that were radically different from those of the competitive model. For example, the activation levels of the CD units tended to move in unison rather than in competitive opposition. Often when only a portion of the CD layer was active, the remainder of the units huddled only a few millivolts below threshold. A slight increase in corticostriatal input via the recurrent PF units often would drive all of these units above threshold, thereby saturating the PF layer. In other words, activation of the CD layer was often an all-or-none proposition. To minimize this effect, we found it necessary to reduce the synaptic weights of the recurrent PF input to ~1/100 of that of the sensory units.
Figure 6F indicates all parameter combinations that produced "perfect" performance in at least 1 of 10 instantiations. Comparing Fig. 6, B and F, note that although the feed-forward version of the model is capable of encoding sequences, it does so within a region of the parameter space that is smaller and distinctly different in shape. As in the case of low activation function slopes (Fig. 6C), the feed-forward model fails to resolve small differences in corticostriatal weight and thus fails to produce perfect networks in regions of low RANGE. In addition, fewer instantiations of the feed-forward model produce perfect performance on the encoding task. Both of these results suggest that the range of appropriate corticostriatal weights is much smaller for the feed-forward version.
 |
DISCUSSION |
These simulation results suggest that the circuits linking the basal ganglia, thalamus, and cortex have an inherent capacity for encoding the serial order of events. Whereas we specifically modeled the encoding of a sequence of simple visual inputs, the same mechanisms are equally applicable to the encoding of the serial order of other sensory or internal events, as may occur, for other cortical-basal ganglionic loops, in the haptic recognition of an object or in registering the sequence of words in a sentence. This special computational property does not require adaptive training mechanisms of any kind, although adaptive tuning might improve its encoding efficiency. Another important feature of the model is that the receptive fields of its units bear a close qualitative resemblance to the receptive fields observed in single unit studies.
We begin this section by discussing the computational elements of the model, after which we consider the potential role of modulation and learning. Next we analyze the relationship between the receptive fields of the model and single unit data. Finally, we explore possible extensions of the model to other cortical areas.
Computational elements of the model
The ability of the model to encode the serial order of sequential events stems from three computational elements that combine in a cooperative manner due to the structure of the cortical-basal ganglionic network. The computational elements are: working memory, competitive pattern classification, and recursion. In this section, we review the origin and analyze the role of each of these elements.
WORKING MEMORY.
The model's cortical-thalamic loops support self-sustained activations that provide working memories of the results of prior processing. Two features contribute to this operation: focused positive feedback and low-threshold calcium channels. Positive feedback, focused within individual loops, endows the bistability that permits activations to persist after the return of tonic pallidal inhibition. Bistability substantially decouples the dynamics of working memory from operations in the CD layer. Calcium channels inactivate too rapidly to play a role in maintaining either of the stable states. Instead, they are important for initiating loop activity through a postinhibitory rebound of thalamic membrane potential. Rebound is necessary for initiating loop activity, given the double inhibitory pathway through the basal ganglia. Without a rebound mechanism, loop activation would be solely dependent on an excitatory input such as a cortico-cortical projection to PF units.
Cortical-thalamic loops represent only one possible method for producing sustained working memory activity. There is potential for sustained activity within at least four other types of positive feedback loops involving PF (Houk 1997): cortical-cortical loops with PP cortex, cortical-cortical loops within PF, cortical-cerebellar loops between PF and dentate nucleus, and trans-striatal loops through basal ganglia. It is likely that each of these loops contributes to the net gain of positive feedback, thus contributing to sustained activity in PF neurons. For simplicity, we focused here on just one loop, the cortical-thalamic. Although not the emphasis of our study, sustained trans-striatal activity did occasionally arise within the network.
Once engaged, the working memory loops remain at a constant level of activation for the remainder of the cue sequence presentation, an assumption that is consistent with single-unit data (Funahashi et al. 1989; Fuster and Alexander 1971). Other models of sequence encoding have used decaying working memory profiles (e.g., Wang and Arbib 1990). Although decaying profiles have certain computational advantages, within a complex recurrent network they have the potential to produce limit cycles or chaotic states. Sustained working memory traces, by contrast, tend to stabilize the overall network and ensure that it settles into a stable equilibrium. Sustained traces also create PF codes that are relatively insensitive to the rate of cue presentation. Accordingly, they encode the serial order rather than the timing of the cue presentation. Although this lack of timing information might pose a problem for a network attempting to encode a musical phrase (Cummins et al. 1993), it can be an asset in skill learning. For example, motor responses in a delayed-sequencing task may be performed slowly initially. As the speed of the cue presentation and motor performance is increased, the representations in the PF would remain unchanged. This should simplify learning because what is learned during a slow-motion rehearsal remains relevant to performance at faster speeds.
COMPETITIVE PATTERN CLASSIFICATION.
The model's CD units perform competitive pattern classification on a vector of corticostriatal inputs. Their random synaptic weights provide each unit with a unique perspective on the state of event and recurrent PF activity. Some units react strongly to certain combinations of input whereas others do not react at all. Successful pattern classification does not require training but does require weight matrices with sufficient diversity and a gain balanced with the degree of striatal inhibition.
Three interpretations of the circuitry underlying striatal inhibition have been proposed: 1) collaterals of the GABAergic spiny neurons produce mutual (competitive) inhibition (Groves 1983; Katayama et al. 1981; Park et al. 1980; Rebec and Curtis 1988); 2) inputs from the cerebral cortex to GABAergic interneurons produce feed-forward inhibition (Jaeger et al. 1994; Kitai and Surmeier 1993); or 3) both mechanisms coexist (Kita 1993). Our simulations, which contrast the performance of collateral and feed-forward inhibition, indicate that both help to regulate the extent of CD layer activation, which ultimately results in sparser patterns of PF layer activation. Sparse PF patterns allow room for a greater number of sequences to be stored within the PF layer, whereas excessive CD layer activation leads to a completely activated PF layer, a useless state from an encoding standpoint. This effect is particularly important as increasing numbers of PF units become activated by successive cues in the sequence.
Although both mechanisms regulate against saturation, collateral inhibition is more effective than feed-forward inhibition because it enhances the pattern classification abilities of the striatal layer by magnifying small differences (Ratliff et al. 1967). Spiny units display a characteristic mixture of burst durations in the simulations with collateral inhibition, in good agreement with single unit data (Wilson 1990). Feed-forward inhibition, on the other hand, produces spiny unit activations that move in unison rather than in competitive opposition. Instead of differentiating between similarly activated units, feed-forward inhibition simply reduces the activation of all units through global downregulation. The simulations indicate that feed-forward inhibition (Fig. 6F) only produces perfect networks at large values of RANGE, whereas collateral inhibition is effective across a larger portion of the parameter space (Fig. 6, A and B). The greater magnification of small differences in collateral networks is accentuated in current-source type synapses as opposed to synapses with reversal potentials because the former do not suffer from shunting or saturation. Activation functions with steep slopes, justified on the basis of the abrupt transitions between "up" and "down" states observed in membrane potentials (Wilson 1995), also enhance the CD layer's ability to magnify small differences in the input patterns (Fig. 6, A vs. C).
Collateral inhibition approximates the computational function of a winner-take-all (WTA) mechanism. An ideal WTA mechanism selects the unit with the strongest input vector, independent of the initial state of the network. However, Fukai and Tanaka (1997) proved that collateral inhibitory networks with small or zero self-inhibition are sensitive to initial conditions and thus do not always select the unit with the strongest inputs. Consider the simple case of a two-neuron inhibitory network (A and B) without self-inhibition where unit A starts in an active state (and thus inhibits unit B). If equal inputs then are applied to the network, unit A will remain the winner because of inhibition to unit B. To win, unit B must receive inputs greater than unit A's by an amount large enough to overcome this inhibition. Thus a collateral network may not resolve small differences in input patterns.
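The two-neuron example can be made concrete with a toy sketch. It is a deliberately simplified binary threshold model, not the network analyzed by Fukai and Tanaka (1997) and not the CD layer of the present model; the inhibitory weight, threshold, and input values are hypothetical.

```python
def settle(inputs, active, w_inh=0.5, theta=0.75, max_steps=20):
    """Update two mutually inhibiting threshold units in turn until stable.

    inputs: excitatory drive to units A and B.
    active: initial states (1 = active, 0 = inactive).
    There is no self-inhibition: each unit is inhibited only by the other.
    """
    active = list(active)
    for _ in range(max_steps):
        changed = False
        for i in (0, 1):
            drive = inputs[i] - w_inh * active[1 - i]
            new_state = 1 if drive > theta else 0
            if new_state != active[i]:
                active[i] = new_state
                changed = True
        if not changed:
            break
    return tuple(active)

equal_a_first = settle((1.0, 1.0), (1, 0))   # A started active and stays the winner
equal_b_first = settle((1.0, 1.0), (0, 1))   # same inputs, B started active and wins
b_slightly_up = settle((1.0, 1.2), (1, 0))   # B's advantage is too small to unseat A
b_well_above = settle((1.0, 1.3), (1, 0))    # B's advantage now overcomes the inhibition
```

In this toy model, unit B takes over only when its input exceeds the threshold plus the inhibition it receives from A, which illustrates why small input differences can go unresolved.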
As the cue sequence progresses and increasing amounts of input are supplied by the recurrent frontal projections, CD units receive an increasing number of excitatory inputs. With current-source synapses, additional inputs always provide additional excitatory current. This is not always the case for reversal-potential synapses because they are limited by shunting and saturation. Thus winning and losing CD units with current-source synapses can potentially receive vastly different magnitudes of excitatory current. The same is true for inhibitory synapses; however, the effect is not as important. For example, if n CD units are active, each inactive CD unit receives n identical inhibitory inputs while each active unit receives (n - 1) inhibitory inputs. Thus active and inactive units differ by only a single inhibitory input. In the current-source network, the magnitude of this difference is always a constant. However, with reversal-potential synapses, this difference becomes increasingly small as additional inhibitory inputs are added. Combining the effects of excitatory and inhibitory inputs, current-source synapses tend to produce large excursio