J Neurophysiol 90: 798-810, 2003. First published April 17, 2003; doi:10.1152/jn.00777.2002
0022-3077/03 $5.00

Recognition by Top-Down and Bottom-Up Processing in Cortex: The Control of Selective Attention

Dan Graboi and John Lisman

Volen Center for Complex Systems, Brandeis University, Waltham, Massachusetts 02454

Submitted 10 September 2002; accepted in final form 19 March 2003


 ABSTRACT
 
 
Visual recognition is achieved by a hierarchy of bidirectionally connected cortical areas. The entry of signals into higher areas involves the serial sampling of information within a movable window of attention. Here we explore how the cortex can move this window and integrate the sampled information. To make this concrete, we modeled the process of visual word recognition by hierarchical cortical areas representing features, letters, and words. At the start of the recognition process, nodes representing all contextually possible words are active. Simple connectivity rules allow a parallel top-down (T-D) computation of the relative probability of each feature at each location given the set of active words. This information is then used to guide the window of attention to information-rich features (e.g., a feature that is present in the visual image but has the lowest probability). Bottom-up processing of this feature excludes words that do not contain it and leads to T-D recomputation of feature probabilities. Recognition occurs after several such cycles, when all but one word has been excluded. We show that when 950 words are stored in long-term memory, recognition occurs after an average of 4.9 cycles. Because covert attention can be moved every 20–30 ms, word recognition could be as fast as has been measured experimentally (<200 ms of cortical processing). This model accounts for the findings that recognition time depends logarithmically on set size, that recognition time is reduced when context reduces the number of possible targets, that the time to classify a nonword decreases when its approximation to English decreases, and that in high-level cortex the firing of neurons tuned to an object increases progressively as its recognition occurs. More generally, the model provides a physiologically plausible view of how bidirectional signal flow in cortex guides attention to produce efficient recognition.


 INTRODUCTION
 
 
The visual recognition process is performed by a hierarchy of cortical areas in the ventral stream (Barone et al. 2000; Felleman and Van Essen 1991; Lerner et al. 2001). Processing in the first cortical region, V1, detects features (oriented contrast gradients), whereas processing at higher levels detects more complex patterns by combining inputs from lower levels (Reid 2001; Tanaka 1996). This bottom-up (B-U) information flow has been extensively studied. However, there is also a top-down (T-D) flow of information whose function is less clear. Anatomical studies have demonstrated massive connections from higher level areas back to lower level areas (Rockland and Pandya 1981; Salin and Bullier 1995). These T-D connections can strongly affect neuronal function (Cauller and Kulics 1991; Lee et al. 1998; Tomita et al. 1999) and provide a way for high-level information to affect perception (Ress et al. 2000). For example, in "semantic priming," recognition of words in a category is enhanced if subjects know the category (Lorch et al. 1986; Neely 1991; Neely et al. 1989). Thus contextual expectancies may affect the visual system even before the stimulus arrives. Recordings from different levels of the cortical hierarchy during recognition show only minor latency differences (~10 ms), suggesting that during the subsequent recognition process there can be interaction of B-U, T-D, and lateral information flow (Hupe et al. 2001; Lamme and Roelfsema 2000).

A second important property of information flow in cortex is that it is controlled by attentional processes. In a simplified view, attention can be considered a window (Broadbent and Broadbent 1990; Campbell 1985; Nakayama 1991; Van Essen et al. 1991) that is moved either by eye movements or by "covert" processes that occur without eye movements (Posner et al. 1980). Recent work using imaging methods in human V1 (Smith et al. 2001) and in monkey lateral geniculate nucleus (LGN) and V1 (Tootell et al. 1998; Vanduffel et al. 2000), as well as psychophysical methods (Bahcall and Kowler 1999; Caputo and Guerra 1998), indicates that attention imposes a ring of inhibition around an attended item, which shows that the window analogy has limitations. Attention can also be directed toward nonspatial properties, such as color or event, but even in such cases, spatial localization remains important (Bichot et al. 1999; Chun 2001; Mozer and Sitton 1998; Nissen 1985; Snyder 1972; Tsal and Lavie 1988).

The properties of attention have been explored using visual search tasks. In some cases, an object can be so distinct from nearby distractors that the time required to find the object is independent of the number of distractors (e.g., the "pop-out" of color). However, if the target and distractors cannot be simply distinguished (e.g., most letters), the time required to identify the target increases linearly with the number of distractors, consistent with a serial search process (Treisman and Gelade 1980; Treisman and Souther 1985). The slope of this linear relationship suggests that covert attention can shift serially ≤50 times/s (Horowitz and Wolfe 1998). These internally generated shifts skip over an intervening obstacle without any time cost, suggesting that attention jumps rather than moves along a path (Egeth and Yantis 1997). Recent recordings from cortex have provided direct evidence for rapid, internally generated, covert shifts of attention (Woodman and Luck 1999). Attention shifts can also be generated by the onset of external stimuli, but these take longer to generate (Motter 1994; Ward et al. 1996) than internally generated shifts measured under the same conditions (Wolfe et al. 2000). The attention dependence of neuronal firing is most prominent in higher level areas (V4 and beyond) (Moran and Desimone 1985) but can also be detected in V1 and the lateral geniculate (reviewed in Kastner and Ungerleider 2000). The effects in V1 and geniculate may reflect feedback from attentionally modulated processes in higher areas (Martinez et al. 1999).

In this paper, we have attempted to understand how an attention-based recognition process could be organized by bidirectional processing in the cortical hierarchy. There has been considerable previous theoretical work on the role of bidirectional information flow in cortex (Cave 1999; Dayan et al. 1995; Grossberg 2001; McClelland and Rumelhart 1981; Rao and Ballard 1999; Tononi et al. 1992; Ullman 1995). There has also been considerable interest in how attention is moved (Koch and Ullman 1985; Phaf et al. 1990; Olshausen et al. 1993; Tsotsos et al. 1995) based on the saliency of information in the visual image (reviewed in Itti and Koch 2001), and in some cases the role of T-D information flow in controlling attention has been considered (Cave 1999; Phaf et al. 1990; Tsotsos 1990; Tsotsos et al. 1995). However, this work has dealt with visual search tasks or attentional tasks rather than single-stimulus recognition tasks. In search tasks, the identity of the item is already known, and the subject attempts to find it among multiple items in the visual scene. In attentional tasks (Phaf et al. 1990), attention is directed to one stimulus attribute on the basis of another attribute; e.g., the subject must "name the form on the left." Here, we consider pure recognition tasks in which a single item is presented and the subject must compare it with the many items stored in memory until a match is found. The problem of moving attention during recognition has recently been addressed (Schill et al. 2001), although not in neural terms.

The involvement of attention in recognition raises two fundamental questions: how can the information obtained in multiple samples be integrated, and how is the window of attention moved? A successful model of recognition should provide physiologically plausible answers to these questions and satisfy basic constraints provided by psychophysical measurements. Perhaps the most fundamental of these measurements is that recognition time for targets within contextually constrained sets in long-term memory depends logarithmically on set size (Burrows and Okada 1975; Ross 1970). Because a serial search of memory would make recognition time grow linearly with set size, this result rules out serial search and implies a parallel process. The basis of this logarithmic dependence is not known. A second important property of recognition is that it can be speeded by high-level contextual information, as seen in semantic priming experiments (Neely 1991). A third property of recognition is demonstrated in visual search for target words: when subjects search lists for target words, they search faster through nonword distractors than through word distractors (Graboi 1974). We have attempted to develop a model that accounts for these findings.


 RESULTS
 
 
To provide a concrete framework for exploring hierarchical processing, we used a simplified model of word recognition based on the work of Rumelhart and collaborators (McClelland and Rumelhart 1981; Rumelhart 1971; Rumelhart and McClelland 1982; Rumelhart and Siple 1974). To this model, we added an attention mechanism that feeds information from the feature level to higher levels only within a selected window of attention that is moved serially during the recognition process. Using this model, it is possible to compare the efficiency of different methods for moving the window and to test whether the model can account for basic properties of word recognition. Our model does not deal with many of the complexities of real-world vision, including scaling, rotation, letter variation, and noise. This simplification is appropriate because the experiments we seek to account for did not involve these complexities.

Description of model

The general idea is as follows. The network has three hierarchical levels corresponding to the feature, letter, and word levels. Nodes at the "word" level are active at the beginning of the recognition process, provided they are consistent with current contextual constraints (an inclusion process). B-U flow of information through a narrow window of attention then leads to the inactivation (exclusion) of nodes that are inconsistent with the sampled information, thereby reducing the number of possible words. Recognition occurs when the serially sampled information leads to the inactivation of all but one word node. We will show that there are algorithms for moving attention that make the exclusion process efficient. These algorithms make use of T-D connections to compute the relative probability of each feature, given the set of still-possible words. Algorithms to move attention using both this T-D information and B-U information about which features are actually present can exclude a large fraction of words on each cycle. A diagram of the information flow is given in Fig. 1. What follows is a more detailed description of these processes.



 
FIG. 1. Bidirectional flow of information in a hierarchical model having feature, letter, and word levels. At the start of recognition, all nodes representing words that are possible within the current context are active. A serial attention-based sampling process leads to progressive inactivation (exclusion) of word nodes until recognition occurs when only one word remains active. The flow of information is as follows: low-level B-U processing provides sensory information in parallel to feature detectors. The selective attention algorithm (SAA) moves the window of attention to a given position; "gating cells" allow only the feature within this window to be processed by higher levels. This high-level B-U processing leads to the inactivation (exclusion) of all word nodes that do not contain this feature at the particular position. T-D processing then calculates feature probabilities based on the set of still-possible words. The SAA can use these feature probabilities and information about which features are actually present to move the window of attention. This cyclical process continues until recognition occurs.

 

PROPERTIES OF THE HIERARCHICAL LEVELS. At the feature level, there is a frame for the detection of four-letter words with a subframe for each letter (Fig. 2). Within each subframe there are 14 feature detectors used to distinguish letters in the font we have used (Fig. 3; note the simplified font in Figs. 1 and 2). These detectors are sensitive to oriented line segments in a manner similar to the simple cells of V1. For simplicity, we assume that the sensory input drives the feature detectors between two states, "there" and "not there." This binary simplification is warranted, given the high-contrast stimuli used to obtain the experimental results we seek to account for. At the letter level there are four subframes, one for each letter position. Each subframe has 26 nodes, one for each possible letter. At the word level, each node represents one of the stored common (nonpejorative) English four-letter words (typically 950). In the computer implementation, feature nodes receive input from pixel nodes having differing positions along a line segment, as in the Rumelhart (1971) model. However, because this pixel processing does not affect the function of the model, it will not be discussed further.



 
FIG. 2. Illustration of the connectivity between levels. At the feature level, there are 4 subframes, 1 for each letter. The font shown is a simplified version of the 14-segment font actually used (see Fig. 3). The letter level has 4 subframes, each of which contains 26 nodes, one for each letter. The word level contains a node for each known 4-letter word. The principle of T-D connectivity is illustrated to the left. Word nodes (e.g., ALSO) are connected to all letter nodes that make up that word. Thus ALSO connects to A, L, S, and O (in different subframes), but only A is illustrated. Letter nodes connect to all feature nodes that make up that letter. The B-U connections are illustrated starting from the top left vertical feature in the 2nd feature-level subframe. This connects to all letter nodes in the 2nd subframe that contain the feature (only A and B are illustrated). Letter nodes connect to all word nodes that contain that letter in the appropriate subframe. Implicit in these rules is that nodes will always be reciprocally connected.

 


 
FIG. 3. Illustration of each step in the recognition process and the gradual accumulation of information (reduction in number of possibilities) at all levels. The panels show the status of each feature in each subframe. The number given within each feature is its probability multiplied by 100. 1: the a priori feature probabilities before any word is presented. 2–5: steps during the recognition of "LADY". 6: the features in the final step that occurs after a slightly different unknown word, "OADY," is presented. Colors: red denotes the feature currently chosen by the SAA. Dark blue denotes features previously sampled and found to be "there." Dark green denotes features inferred to be "there." Light green denotes features inferred to be "not there."

 

SPECIFICATION OF T-D AND B-U CONNECTIONS. Collectively, the highly specific connections in the model represent the long-term memory of the structure of letters and words. These connections obey a simple "compositional rule": word nodes make T-D excitatory connections to all the letter nodes that compose the word; similarly letter nodes are connected to the feature nodes that compose the letter (Fig. 2). B-U connections connect features to all the letter nodes that contain that feature; similarly letter nodes are connected to the word nodes that contain that letter (Fig. 2).
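The compositional rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `TOY_FONT` is a hypothetical letter-to-feature table whose integer IDs merely stand in for the 14 oriented-segment detectors of Fig. 3, `WORDS` is a hypothetical four-word lexicon, and each node is identified by a (position, symbol) pair. The B-U tables are obtained by inverting the T-D tables, which makes the reciprocity noted in Fig. 2 explicit.

```python
from collections import defaultdict

# Hypothetical toy font: letter -> feature IDs. The IDs 0-13 stand in for the
# paper's 14 oriented-segment detectors; these particular sets are invented.
TOY_FONT = {
    "A": {0, 1, 2, 3, 6, 7}, "B": {0, 2, 3, 4, 5, 6, 8},
    "D": {0, 2, 3, 4, 5, 8}, "E": {0, 1, 4, 5, 6},
    "L": {1, 5}, "O": {0, 1, 2, 3, 4, 5},
    "R": {0, 1, 2, 3, 6, 7, 10}, "Y": {9, 11, 12},
}
WORDS = ["BEAR", "BOAR", "LADY", "BOLD"]  # hypothetical toy lexicon


def build_top_down(words, font):
    """T-D rule: a word connects to its letters; a letter to its features.

    Nodes are (position, symbol) pairs. Only letters that occur in the
    lexicon are instantiated here, whereas the model has all 26 per subframe.
    """
    word_to_letters = {w: {(pos, ch) for pos, ch in enumerate(w)} for w in words}
    letter_to_features = {
        (pos, ch): {(pos, f) for f in font[ch]}
        for letters in word_to_letters.values()
        for pos, ch in letters
    }
    return word_to_letters, letter_to_features


def invert(mapping):
    """B-U rule: connect each target node back to every source containing it."""
    inverse = defaultdict(set)
    for source, targets in mapping.items():
        for target in targets:
            inverse[target].add(source)
    return dict(inverse)


w2l, l2f = build_top_down(WORDS, TOY_FONT)  # T-D: word->letter, letter->feature
l2w, f2l = invert(w2l), invert(l2f)         # B-U: letter->word, feature->letter
```

For example, `l2w[(1, "O")]` is `{"BOAR", "BOLD"}`: the letter node O in the second subframe projects B-U to every word with O in that position.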

RECOGNITION BY EXCLUSION OF ALL BUT ONE WORD. We assume that at the start of the recognition process the word nodes for contextually possible words are active. This leads to activity at letter and feature levels as computed by T-D linear summation processes and provides information used by the selective attention process (see following text). The B-U flow from each feature selected by attention will strongly excite all letter nodes that contain the feature, and these will excite the word nodes that contain these letters. Those nodes that do not receive excitation are assumed to be strongly inhibited by those that do and become inactive; this inactivation persists for the rest of the recognition process. We term this the process of "exclusion." The major phase of the recognition process is completed when all but one of the initially possible words has been excluded. This is sufficient for recognition if the subject can be certain that the items being presented are known words. If the task is such that the subject cannot be certain, an additional cycle, termed the "confirmation phase," is required. This will be described later.
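Because exclusion is all-or-none and persists for the rest of the recognition process, one cycle of high-level B-U processing amounts to a set filter. The sketch below is a minimal illustration assuming a hypothetical letter-to-feature table, `TOY_FONT`, in place of the real font. The `present=False` branch, which excludes words containing a feature found to be "not there," is our assumed symmetric extension of the rule stated above.

```python
# Hypothetical toy font (invented feature IDs standing in for the real font).
TOY_FONT = {
    "A": {0, 1, 2, 3, 6, 7}, "B": {0, 2, 3, 4, 5, 6, 8},
    "D": {0, 2, 3, 4, 5, 8}, "E": {0, 1, 4, 5, 6},
    "L": {1, 5}, "O": {0, 1, 2, 3, 4, 5},
    "R": {0, 1, 2, 3, 6, 7, 10}, "Y": {9, 11, 12},
}


def exclude(active, pos, feature, present, font=TOY_FONT):
    """Return the words that survive sampling one feature at one position.

    present=True: words whose letter at `pos` lacks the feature receive no
    B-U excitation and are excluded. present=False (assumed symmetric case):
    words whose letter at `pos` contains the feature are excluded.
    """
    return {w for w in active if (feature in font[w[pos]]) == present}


active = {"BEAR", "BOAR", "LADY", "BOLD"}
# Feature 4 is "there" in the 2nd letter (index 1): E and O contain it, A does not.
active = exclude(active, 1, 4, present=True)
print(sorted(active))  # ['BEAR', 'BOAR', 'BOLD']
```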

We further assume that the activation of word nodes is normalized; as word nodes are excluded, the activity of the remaining active word nodes increases accordingly. As a result, the activity level is inversely proportional to the number of still possible words and represents word probability. Thus for the word node corresponding to the presented word, the probability will increase from a small value at the start of the recognition process to a value of 1 when recognition occurs. An important consequence of normalization is that the compositional rule for T-D processing leads straightforwardly to the computation of feature probabilities, which can then be used to efficiently move attention (see following text).
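Under this normalization, the activity of each of the N still-possible word nodes is simply 1/N. The hypothetical helper below uses exact fractions only to make the equal-share arithmetic visible; it shows that excluding two of four words doubles the probability of each survivor.

```python
from fractions import Fraction


def word_probabilities(active):
    """Normalized word-node activity: each of the N active words gets 1/N."""
    n = len(active)
    return {w: Fraction(1, n) for w in active}


before = word_probabilities({"BEAR", "BOAR", "BIRD", "MULE"})
after = word_probabilities({"BEAR", "BOAR"})  # MULE and BIRD excluded
print(before["BOAR"], after["BOAR"])  # 1/4 1/2
```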

Selective attention algorithm (SAA) moves the window of attention during each cycle of the iterative recognition process

Although research shows that attention can be more complex than a simple "window," location is nevertheless always important (Bichot et al. 1999; Chun 2001; Mozer and Sitton 1998; Nissen 1985; Snyder 1972; Tsal and Lavie 1988), and it is the movement of attention to different locations that we address in our model. The aperture of the window of attention has not been established with certainty (Chun 2001); we therefore make the worst-case assumption that the window is very small and transmits only a single feature. If recognition is feasible under these conditions, it will only be more so if the window of attention is widened. The window of attention is implemented by "attentional gating nodes" (Fig. 1), a concept that has been incorporated into several previous models of attention (e.g., Cave 1999; Tsotsos et al. 1995). These nodes allow further upward signal flow only when attention is moved to them. In this way, the output from a single feature node (perhaps in V1) is transmitted B-U to higher-level cortical regions, where it leads to the exclusion of the still-possible letters and words that do not contain it. This is followed by T-D computation of a new feature probability landscape, which can then contribute to processes that determine the next location of attention. This model posits continual T-D/B-U processing cycles, each adding the information from a single feature to the accumulating knowledge base associated with the object being recognized. The specific set of computations that determines where attention will next be moved is termed the selective attention algorithm (SAA). Various SAAs for moving the window of attention will be considered later. These make different use of the available T-D and B-U information described in the next two sections.

T-D processing computes feature probabilities from word probabilities

Consider first the case when only one word node is active. It will excite the letter nodes contained in the word; the letter nodes (for each of the 4 positions) will then excite the features contained in those letters. Thus in this case, the feature probability landscape will resemble the word itself. If two words are active, linear summation processes will produce a feature probability landscape that looks like the superposition of two words, with features contained in both words twice as active as features contained in only one. The same logic applies for any number of still-possible words. Thus the feature probability will be directly proportional to the number of still-possible words that contain that feature. Figure 3 (1) shows the a priori feature probabilities for the set of 950 words that are stored in the long-term memory of the system. It is of interest that the probabilities of features are uneven. For instance, the diagonal features are relatively rare. Thus the landscape reflects constraints due to high-level context (which can reduce the number of possible words), the feature composition of letters and the letter composition of words. This probability landscape is a source of information available to the SAA even before a word is displayed. During recognition (Fig. 3, 1–4), the number of still-possible words is gradually reduced, and this, in turn, leads to changes in word probabilities, letter probabilities, and the feature probability landscape.
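The T-D linear summation can be written as a short function: each still-possible word contributes 1/N to every feature of its letters, so the probability of a feature at a position equals the fraction of still-possible words that contain it there. This is a minimal sketch assuming a hypothetical letter-to-feature table, `TOY_FONT`; features absent from the returned dictionary have probability 0.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy font (invented feature IDs standing in for the real font).
TOY_FONT = {
    "A": {0, 1, 2, 3, 6, 7}, "B": {0, 2, 3, 4, 5, 6, 8},
    "D": {0, 2, 3, 4, 5, 8}, "E": {0, 1, 4, 5, 6},
    "L": {1, 5}, "O": {0, 1, 2, 3, 4, 5},
    "R": {0, 1, 2, 3, 6, 7, 10}, "Y": {9, 11, 12},
}


def feature_probabilities(active, font=TOY_FONT):
    """T-D linear summation over the N still-possible words.

    Each active word sends 1/N to every (position, feature) of its letters, so
    P(pos, f) = (number of active words whose letter at pos contains f) / N.
    Features contained in no active word are omitted (probability 0).
    """
    n = len(active)
    counts = Counter(
        (pos, f) for word in active for pos, ch in enumerate(word) for f in font[ch]
    )
    return {key: Fraction(c, n) for key, c in counts.items()}


probs = feature_probabilities({"BEAR", "BOAR", "LADY", "BOLD"})
print(probs[(0, 0)], probs[(3, 10)])  # 3/4 1/2
```

With a single active word, every feature of that word has probability 1, so the landscape reproduces the word itself, as described above.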

LOW LEVEL B-U PROCESSING DETERMINES WHICH FEATURES ARE "THERE" AND "NOT THERE." Another source of information available to the SAA is the result of continuous parallel low-level B-U processing of the stimulus from the retina to the primary projection area (V1). This specifies which of the 56 features are "there" (i.e., have contrast) and which are not.

Example of the recognition process

A detailed example of the recognition of a known word, LADY, is shown in Fig. 3. In this example the SAA uses both T-D and B-U information and selects the feature that is "there" and has the lowest probability. In the period before the item is presented, all of the 950 words are active and have equal low probability. From these probabilities, T-D processing computes the a priori feature probabilities shown in Fig. 3, 1. When the word LADY is presented, recognition requires 3 cycles. In the 1st cycle all but 75 words are eliminated; in the 2nd all but 7 are eliminated; in the 3rd cycle, the only still-possible word is the actual word, LADY. This is a sufficient criterion for recognition if the subject knows that only known words are being presented. This example illustrates the ability of the algorithm to eliminate a large percentage (in this case, >90%) of the remaining possible words on each successive cycle. The interested reader can follow each step of this process in Fig. 3. It is noteworthy that although attention acts at a particular place (i.e., gating nodes), the activity of each node at all levels will change as features, letters, and words are excluded. Thus information (a reduction in the number of alternatives) accumulates at all levels during recognition. In the example of Fig. 3, recognition of LADY occurred in a small number of steps. Figure 4A shows the recognition process for four other words, BEAR, CHEW, SURF, and ROSE, and illustrates the variability in the number of cycles required for word recognition. Across 50 randomly selected cases of word recognition from the set of 950 words, the average was 4.9 cycles.
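A complete toy version of the recognition cycle, using the rule from this example (sample the "there" feature with the lowest T-D probability), might look like the following. Everything here is an assumption for illustration, not the authors' code: the 8-word `LEXICON` and the `TOY_FONT` table are hypothetical, ties are broken by (position, feature) order for reproducibility, and the loop stops early if every "there" feature is shared by all still-possible words, because a further "there" sample could then exclude nothing.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy font and lexicon (illustrative only).
TOY_FONT = {
    "A": {0, 1, 2, 3, 6, 7}, "B": {0, 2, 3, 4, 5, 6, 8},
    "D": {0, 2, 3, 4, 5, 8}, "E": {0, 1, 4, 5, 6},
    "L": {1, 5}, "O": {0, 1, 2, 3, 4, 5},
    "R": {0, 1, 2, 3, 6, 7, 10}, "Y": {9, 11, 12},
}
LEXICON = ["BEAR", "BOAR", "LADY", "BOLD", "BALD", "READ", "DEAR", "ROAD"]


def feature_probabilities(active, font):
    """T-D step: P(pos, f) = fraction of active words containing f at pos."""
    n = len(active)
    counts = Counter(
        (pos, f) for w in active for pos, ch in enumerate(w) for f in font[ch]
    )
    return {key: Fraction(c, n) for key, c in counts.items()}


def unidirectional_mismatch(active, present, font):
    """SAA: the "there" feature with the lowest T-D probability, and that P."""
    probs = feature_probabilities(active, font)
    key = min(present, key=lambda k: (probs.get(k, 0), k))  # ties -> lowest (pos, f)
    return key, probs.get(key, 0)


def recognize(stimulus, lexicon=LEXICON, font=TOY_FONT):
    """Run attention/exclusion cycles; return (cycles, still-possible words)."""
    # Low-level B-U processing: which (position, feature) pairs are "there".
    present = {(pos, f) for pos, ch in enumerate(stimulus) for f in font[ch]}
    active, cycles = set(lexicon), 0
    while len(active) > 1:
        (pos, f), p = unidirectional_mismatch(active, present, font)
        if p == 1:
            break  # every "there" feature is in all active words: no progress
        # High-level B-U step: exclude words whose letter at pos lacks f.
        active = {w for w in active if f in font[w[pos]]}
        cycles += 1
    return cycles, active
```

With this toy font, `recognize("BOAR")` finishes in 2 cycles and `recognize("LADY")` in 1. `recognize("DEAR")` stops with BEAR and DEAR both still possible, because the hypothetical feature set for B contains every feature of D; sampling only "there" features can never exclude BEAR in that case.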



 
FIG. 4. Time dependence of node activity during recognition, with and without contextual information [selective attention algorithm (SAA) = bidirectional mismatch]. A: the number of still-possible words is shown as a function of the number of cycles during recognition. 4 examples (including BEAR) are shown starting with a set of 950 possible words. Recognition occurs when there is only 1 still-possible word. If contextual information is presented that restricts the possible words to animals (priming the animal context), only 35 words remain possible. When BOAR is presented under these conditions, recognition is much faster. B: the probability (firing rate) of different word nodes changes after presentation of contextual information (animal) and after the presentation of the word, BOAR. On each successive cycle, the probability of some word nodes decreases, while that of others increases. The probability of the BOAR node increases gradually to 1. The greatest interference comes from known words both semantically similar (i.e., other animal words), and words similar in physical shape to the presented word (i.e., letter overlap).

 

This form of information processing makes inferences. For example, during recognition of LADY, the system inferred that the first letter was L even though the SAA never moved attention to the first letter position. This inference was based on constraints at the word level: given that the last three letters were ADY, the only known word possible was LADY. The panels in Fig. 3 show (green color) the gradual development of inferred features (features inferred "there," dark green, P = 1; and "not there," light green, P = 0). Note that when there is only one still-possible word, the inferred plus known features exactly resemble the presented word (Fig. 3, 5). In other words, the T-D-computed feature probability map exactly resembles the features of the presented word.
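Inference of this kind falls directly out of the probability landscape: any unsampled feature whose T-D probability is 1 is inferred "there," and any whose probability is 0 is inferred "not there." The sketch below assumes a hypothetical letter-to-feature table, `TOY_FONT`, and a 4 × 14 feature frame; `sampled` is the set of (position, feature) pairs already visited by attention.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy font (invented feature IDs standing in for the real font).
TOY_FONT = {
    "A": {0, 1, 2, 3, 6, 7}, "B": {0, 2, 3, 4, 5, 6, 8},
    "D": {0, 2, 3, 4, 5, 8}, "E": {0, 1, 4, 5, 6},
    "L": {1, 5}, "O": {0, 1, 2, 3, 4, 5},
    "R": {0, 1, 2, 3, 6, 7, 10}, "Y": {9, 11, 12},
}
N_POSITIONS, N_FEATURES = 4, 14  # 4 letter subframes x 14 feature detectors


def feature_probabilities(active, font=TOY_FONT):
    """T-D step: P(pos, f) = fraction of active words containing f at pos."""
    n = len(active)
    counts = Counter(
        (pos, f) for w in active for pos, ch in enumerate(w) for f in font[ch]
    )
    return {key: Fraction(c, n) for key, c in counts.items()}


def inferred_features(active, sampled, font=TOY_FONT):
    """Split unsampled features into inferred "there" (P = 1) and "not there" (P = 0)."""
    probs = feature_probabilities(active, font)
    frame = {(pos, f) for pos in range(N_POSITIONS) for f in range(N_FEATURES)}
    unsampled = frame - set(sampled)
    there = {k for k in unsampled if probs.get(k, 0) == 1}
    not_there = {k for k in unsampled if probs.get(k, 0) == 0}
    return there, not_there


# After two samples on the stimulus BOAR, only BOAR remains possible.
there, not_there = inferred_features({"BOAR"}, {(3, 1), (1, 2)})
print(len(there), len(not_there))  # 24 30
```

The first letter is never attended in this example, yet all of its features end up inferred, mirroring the LADY case described above.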

Comparison of different SAAs

As illustrated in the example of Fig. 3, it is possible to determine the number of iterative cycles required for recognition of a given known word. By repeating such measurements for different words, one can determine the average number of cycles required for recognition using different SAAs. This number provides a quantitative measure for determining how the recognition process depends on the number of known words and for comparing the efficiency of different SAAs. Within the context of this model, two sources of information are available for selecting each feature. One source is the feature information provided by parallel low-level B-U processing of the stimulus (which features are "there" and "not there"). As a result of such processing, the visual stimulus activates a subset of the feature nodes in cortex. A second source of information is the feature probability landscape computed T-D. As argued in the preceding text, T-D connections convert word probabilities into feature probabilities. Although the a priori word probabilities are equal, the feature probabilities are not equal (Fig. 3). Furthermore, as word probabilities change during the recognition process, the T-D-computed feature probability landscape changes accordingly.

We have explored several different SAAs, which illustrate different ways of using the available B-U and T-D information. For each SAA, the average number of cycles required for recognition was determined for word sets of varying size ranging from 15 to 950. This number is plotted as a function of log2 of the number of words in long-term memory in Fig. 5. The data were well fit by straight lines (see Fig. 5 caption for details). We first consider an SAA that has predictable properties. This SAA picks a feature that is "there," as determined by low-level B-U processing and that is contained in 50% of the still-possible words (T-D[50%] and B-U[There]). The processing of this feature excludes half the remaining words on each cycle. This implies a slope of 1 when plotted on a log2 axis. The measured slope is 0.98 in good agreement with prediction. In this case, 1 bit of word-level information is acquired per cycle because the number of alternative words is reduced by one-half per feature acquisition.
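The predicted slope of 1 follows from a short calculation. As an idealizing assumption added here, suppose each cycle keeps a fixed fraction r of the still-possible words; starting from N_0 words, recognition then needs k* cycles, where:

```latex
N_k = N_0\, r^{k}, \qquad
N_{k^*} = 1 \;\Longrightarrow\; k^* = \frac{\log_2 N_0}{\log_2(1/r)}
```

The plotted line therefore has slope m = 1/log2(1/r), and each cycle yields log2(1/r) = 1/m bits of word-level information. Halving the set (r = 1/2) gives m = 1; keeping one-fourth of it (r = 1/4) gives m = 1/2.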



 
FIG. 5. Comparison of the efficiency of different selective attention algorithms in producing recognition. See text for description of algorithms. The average number of cycles required for recognition of a known word is plotted as a function of the log2 of the number of possible words. Recognition is taken to occur when all but 1 word has been eliminated (the confirmation phase was not required here). Each set of data has been fit with a straight line having y-intercept 0 (when only one word is known, zero cycles are required). The slope, m, is given in the inset. The fit for T-D[Lowest P] was poor (R² = 0.83); all other fits had R² > 0.95.
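A zero-intercept straight-line fit of the kind used in Fig. 5 has the least-squares slope m = Σxy / Σx² for y = m·x. The sketch below checks that formula on synthetic, noise-free data (not the paper's measurements): an idealized SAA that keeps one-fourth of the words per cycle needs log2(N)/2 cycles, so the fitted slope should be 0.5. Only the 15 to 950 range comes from the text; the intermediate set sizes are hypothetical.

```python
import math


def slope_through_origin(xs, ys):
    """Least-squares slope for y = m * x with the intercept fixed at 0."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxy / sxx


# Synthetic, noise-free check (not the paper's data). An SAA that keeps 1/4 of
# the words per cycle needs log2(N) / 2 cycles, so the slope should be 0.5.
set_sizes = [15, 60, 240, 950]  # hypothetical intermediate sizes
xs = [math.log2(n) for n in set_sizes]
ys = [x / 2 for x in xs]
print(round(slope_through_origin(xs, ys), 3))  # 0.5
```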

 

Several of the SAAs tested were either less effective than the T-D[50%] and B-U[There] algorithm or only slightly more effective. These included picking a feature at random regardless of whether it was "there" or not; picking a feature that was "there" and expected with the highest probability (T-D[Highest P] and B-U[There]); sampling the feature location with the lowest probability regardless of whether the feature was "there" or not (T-D[Lowest P]); and picking at random only among features that were "there" (B-U[There]).

Two other SAAs we examined were much more efficient than all the others. The simpler of these is the "unidirectional mismatch" computation (B-U[There] and T-D[Lowest P]). This selects a feature that is "there," as determined by B-U computation, and that has the lowest probability, as determined by T-D processing. The other, the "bidirectional mismatch" computation, additionally considers features that are expected with the highest probability but are "not there"; whichever form of mismatch is greatest is selected. In the four-letter word-recognition task, the bidirectional mismatch algorithm is only slightly more efficient than the unidirectional mismatch algorithm. With these two most efficient algorithms, ~2 bits of word-level information are acquired per cycle, and each selection reduces the average number of remaining words to about one-fourth. The observed slopes for these two algorithms are 0.52 and 0.47, respectively.
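One way to code the two mismatch rules is shown below. The scoring of the bidirectional rule is our assumption, since the text gives no formula: a feature that is "there" scores 1 − P and a feature that is "not there" scores P, and in either case the score equals the fraction of still-possible words that sampling the feature would exclude. `TOY_FONT` is a hypothetical letter-to-feature table, not the paper's font.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy font (invented feature IDs standing in for the real font).
TOY_FONT = {
    "A": {0, 1, 2, 3, 6, 7}, "B": {0, 2, 3, 4, 5, 6, 8},
    "D": {0, 2, 3, 4, 5, 8}, "E": {0, 1, 4, 5, 6},
    "L": {1, 5}, "O": {0, 1, 2, 3, 4, 5},
    "R": {0, 1, 2, 3, 6, 7, 10}, "Y": {9, 11, 12},
}
N_POSITIONS, N_FEATURES = 4, 14


def feature_probabilities(active, font=TOY_FONT):
    """T-D step: P(pos, f) = fraction of active words containing f at pos."""
    n = len(active)
    counts = Counter(
        (pos, f) for w in active for pos, ch in enumerate(w) for f in font[ch]
    )
    return {key: Fraction(c, n) for key, c in counts.items()}


def unidirectional_mismatch(active, present, font=TOY_FONT):
    """B-U[There] and T-D[Lowest P]: the least expected feature that is there."""
    probs = feature_probabilities(active, font)
    return min(present, key=lambda k: (probs.get(k, 0), k))


def bidirectional_mismatch(active, present, font=TOY_FONT):
    """Also weigh expected-but-absent features; return ((pos, f), is_there).

    Assumed score: 1 - P for a feature that is "there", P for one that is not.
    """
    probs = feature_probabilities(active, font)
    frame = [(pos, f) for pos in range(N_POSITIONS) for f in range(N_FEATURES)]

    def score(key):
        p = probs.get(key, 0)
        return 1 - p if key in present else p

    # Highest score wins; ties go to the lowest (position, feature).
    key = min(frame, key=lambda k: (-score(k), k))
    return key, key in present


dear = {(pos, f) for pos, ch in enumerate("DEAR") for f in TOY_FONT[ch]}
print(bidirectional_mismatch({"BEAR", "DEAR"}, dear))  # ((0, 6), False)
```

With this toy font, once only BEAR and DEAR remain for the stimulus DEAR, every "there" feature has P = 1 and the unidirectional rule has nothing left to exploit, whereas the bidirectional rule selects feature 6 in the first letter position, which is "not there" with P = 1/2, and sampling it excludes BEAR.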

Three main conclusions can be drawn from the data shown in Fig. 5. First, the most efficient SAAs tested use both T-D and B-U information and exclude about twice as many words per cycle as algorithms that use only one source of information. Second, the most important principle for an efficient SAA is to choose a feature with a large mismatch, e.g., a feature that is there but is contained in the smallest fraction of the still-possible words. Third, with the efficient SAAs, the time required for recognition increases logarithmically with the number of words in the initial set, with a slope of approximately one-half on log2 coordinates.
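As a worked check on the information measure, take the LADY example described earlier (950 → 75 → 7 → 1 still-possible words) and count the information gained in each cycle as log2 of the ratio of word counts before and after that cycle:

```latex
\begin{aligned}
I_1 &= \log_2\tfrac{950}{75} \approx 3.66\ \text{bits}, \qquad
I_2 = \log_2\tfrac{75}{7} \approx 3.42\ \text{bits}, \qquad
I_3 = \log_2 7 \approx 2.81\ \text{bits},\\
I_1 + I_2 + I_3 &= \log_2 950 \approx 9.89\ \text{bits}.
\end{aligned}
```

LADY therefore yielded about 3.3 bits per cycle, more than is typical, in line with its quick recognition; dividing 9.89 bits by the reported average of 4.9 cycles gives about 2.0 bits per cycle, matching the ~2 bits per cycle of the efficient SAAs.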

Effects of contextual cueing

We next considered how the recognition process can be affected by contextual information that narrows the initial set of possible words. The hierarchical organization of networks shown in Figs. 1 and 2 could be influenced by a yet higher network whose nodes represent categories of words, such as "animals," "plants," etc. In this case, the activity of a particular word node would depend on whether the higher-level category node to which the word belongs is active. If, for example, contextual information made only the "animal" category node active, only the word nodes in the animal category would be active at the start of the recognition process. The simulation in Fig. 4A shows that this contextual information reduces the initial set to the 35 animal words in the list of 950 known words and leads to a dramatic reduction in recognition time.

It is instructive to plot how the T-D-computed word probabilities change during recognition, because the firing rate of word-level neurons might be related to item (word) probability. The probability plots in Fig. 4B may therefore be relatable to electrophysiological data obtained from cortex during recognition (see DISCUSSION). When contextual information is introduced (the animal category), the probabilities of word nodes within this context (e.g., BEAR) increase, whereas the probabilities of nodes outside it (ROSE) drop to zero. These changes reflect the fact that when the probabilities of some words fall, the probabilities of the remaining words necessarily rise. Such reciprocal changes also occur throughout recognition. After the stimulus BOAR is presented, the node for one word (MULE) stops firing after the first execution of the SAA, while BIRD, BEAR, and BOAR, which resemble each other, rise in probability. When the next feature is sampled, BIRD is eliminated, and after one more sample BEAR is eliminated. BOAR is then the only remaining word node and fires maximally. This reciprocity is indicative of a competitive process: within a given level, whenever the probabilities of some nodes rise, those of other nodes fall. Nodes representing words similar in shape to the target (e.g., BEAR is similar to BOAR) initially rise as well but then fall off relative to the target, and the more similar a word is to the target, the later this fall occurs. Feature nodes for both geometrically similar and semantically similar words (e.g., words in the same category) are preferentially selected. This may be viewed as a "filter" for feature selection based on both physical shape and semantic constraints.
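
Under the equal-prior assumption used in the model, the reciprocal rise and fall described above amounts to renormalization over the still-active words: each of n active words has probability 1/n, and each excluded word has probability zero. The sketch below assumes a hypothetical five-word lexicon and a hand-specified sequence of active sets that loosely follows the BOAR example; it is not the 950-word simulation plotted in Fig. 4B.

```python
from typing import Dict, List


def word_probabilities(lexicon: List[str], active: List[str]) -> Dict[str, float]:
    """Equal-prior probabilities: 1/n for each of n active words, 0 otherwise."""
    return {w: (1.0 / len(active) if w in active else 0.0) for w in lexicon}


LEXICON = ["BEAR", "BOAR", "BIRD", "MULE", "ROSE"]  # hypothetical toy lexicon
STAGES = [
    LEXICON,                           # no context: every known word possible
    ["BEAR", "BOAR", "BIRD", "MULE"],  # animal context: ROSE drops to zero
    ["BEAR", "BOAR", "BIRD"],          # first sample excludes MULE
    ["BEAR", "BOAR"],                  # next sample excludes BIRD
    ["BOAR"],                          # one more sample excludes BEAR
]

for active in STAGES:
    probs = word_probabilities(LEXICON, active)
    print({w: round(p, 2) for w, p in probs.items()})
```

The probability of BOAR climbs through 0.2, 0.25, 0.33, 0.5, and 1.0 purely because competitors are removed, which is the competitive, normalizing behavior described in the text.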

Recognition when nonwords are possible: properties of the confirmation phase

So far we have considered recognition when only known words are presented. If both words and nonwords may be presented, the exclusion of all but one word does not necessarily mean that the remaining word matches the presented item. For instance, if the nonword OADY is presented, the initial steps are identical to those that occur when LADY is presented (Fig. 3, 1–3): after three features have been sampled, the only remaining known word is LADY. One additional cycle, which we term the "confirmation phase," is required to establish whether all the inferred features match those of the presented item. Because only one word is active at the word level, the computed feature probabilities will be one for all 19 features that are "there" in LADY and zero for the 37 features that are "not there." If the item presented is in fact LADY, the SAA finds no mismatch in the final cycle, and the word node for LADY remains active (Fig. 3, 4). The system activity is then stable at all levels, confirming the word LADY. If the item presented is OADY, the feature shown in Fig. 3, 6, will be selected in the final cycle. Processing this feature excludes LADY, so the presented item must be classified as an unknown word, i.e., a "nonword." Note that in this example it takes the same number of cycles to classify OADY as a nonword as it takes to confirm LADY. However, as shown in the next section, nonwords are on average classified faster than words.
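
When a single word remains, the T-D landscape is binary (probability one for that word's features, zero for all others), so the confirmation cycle reduces to asking whether any feature mismatches. The sketch below again uses a hypothetical (letter position, letter) encoding as a stand-in for stroke features; with real stroke features, the OADY/LADY mismatch would lie in the strokes that distinguish O from L.

```python
from typing import FrozenSet, Tuple

Feature = Tuple[int, str]  # hypothetical stand-in: (letter position, letter)


def features_of(item: str) -> FrozenSet[Feature]:
    return frozenset(enumerate(item))


def confirmation_mismatches(last_word: str, stimulus: str) -> FrozenSet[Feature]:
    """Features whose B-U status contradicts the binary T-D prediction."""
    predicted = features_of(last_word)  # T-D probability 1 for these, 0 otherwise
    there = features_of(stimulus)
    there_but_unexpected = there - predicted   # "there", probability 0
    expected_but_missing = predicted - there   # "not there", probability 1
    return there_but_unexpected | expected_but_missing


def confirm(last_word: str, stimulus: str) -> bool:
    """True if no mismatch is found, i.e. the last remaining word is accepted."""
    return not confirmation_mismatches(last_word, stimulus)


print(confirm("LADY", "LADY"))                          # True: word confirmed
print(confirm("LADY", "OADY"))                          # False: a nonword
print(sorted(confirmation_mismatches("LADY", "OADY")))  # [(0, 'L'), (0, 'O')]
```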

Processing of words and nonwords

In visual search experiments in which subjects search lists for target words, distractors that are nonwords are classified and rejected more quickly than distractors that are words (Graboi 1974). Moreover, nonwords that are very different from words can be rejected more rapidly than nonwords that are similar to words (Graboi, unpublished). To examine whether the model captures these effects, we generated two types of four-letter nonwords by scrambling the letters of words in the list of 950: nonwords that closely approximate English ("high-bigram" letter strings) and letter strings that are not word-like ("low-bigram" letter strings). For example, the letters in "THAW" can form "WATH" (high-bigram) or "AWHT" (low-bigram). The methods for generating these two types of nonwords are given in the caption of Table 1. The time to classify a letter string as a nonword was taken to be the number of cycles required to eliminate all known words; a word was considered recognized when a single word remained and was confirmed. Table 1 shows that, on average, low-bigram letter strings are classified as nonwords fastest, high-bigram letter strings take longer, and letter strings that are words take longer still. This effect occurs because words and nonwords differ statistically in how far their features deviate from the average feature probabilities of words, with nonwords tending to deviate more; the greater the deviation, the more words are eliminated on each cycle and the sooner all known words are eliminated.


 
TABLE 1. Effect of approximation to English on the average number of cycles required to classify letter strings as words or nonwords

 

Studies using rapid serial presentation show that category judgments (e.g., animal/nonanimal) can be made in a very short time (Potter 1976; Thorpe et al. 1996). To explore this condition, we extended the simulations shown in Fig. 4 by comparing the processing time required for in-set (animal) and out-of-set (nonanimal) words. The average time to recognize an animal word (including confirmation) was 3.2 cycles. In contrast, a nonanimal word could be rejected as an animal word more quickly (2.3 cycles on average). This effect was significant at the P < .005 level (t = 3.08, df = 18). In 8 of 10 cases in which nonanimal words were presented, the number of still-possible words jumped from more than one to zero in a single step.


 DISCUSSION
 
We present here a model showing how the bidirectional flow of information in reciprocally connected hierarchical cortical areas can be organized to produce recognition, a fundamental function of cortex. Our model shows how recognition can occur through the detection of combinations of features. As such, the basic strategy is not specific to reading and could be employed for any task in which feature combinations specify an item. The model explicitly addresses how the serial process of attention can be integrated with the parallel processes that are the hallmark of neural networks. Specifically, we show how parallel T-D and B-U processing can move attention to information-rich regions, thereby making serial sampling efficient, and how the information obtained by serial sampling can be integrated. Below we discuss the principles of network architecture that make these computations possible and the ability of the model to account for psychophysical and physiological data.

Integration of information

One might postulate that just before the start of the recognition process, brain networks are inactive, and that feedforward, sensory-driven activity then activates only those high-level nodes that represent the presented item. We postulate a quite different scenario. Even before the item is presented, nodes consistent with the current context are activated, a process that can be considered inclusive. After the item is presented, each serially sampled feature leads to the exclusion of a fraction of the remaining possibilities; recognition occurs when only one possibility remains. Such an exclusion process is plausible only if there are algorithms that enable each sample to exclude a large fraction of the remaining possibilities. For example, if 3/4 of the words are excluded on each cycle, only 1/16 will remain after two cycles, 1/64 after three, and so on. As summarized in the next section, such efficient exclusion can be achieved.
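
The arithmetic of constant-fraction exclusion can be checked directly. The sketch below is an idealization rather than the simulation: it assumes that exactly the same fraction of the remaining words (3/4, the illustrative value above) is excluded on every cycle, and it counts cycles until no more than one word is expected to remain.

```python
def remaining_fraction(cycles: int, excluded_per_cycle: float = 0.75) -> float:
    """Fraction of the initial set still possible after a number of cycles."""
    return (1.0 - excluded_per_cycle) ** cycles


def cycles_until_one_left(set_size: int, excluded_per_cycle: float = 0.75) -> int:
    """Cycles until the expected number of remaining words drops to one or fewer."""
    remaining = float(set_size)
    cycles = 0
    while remaining > 1.0:
        remaining *= 1.0 - excluded_per_cycle
        cycles += 1
    return cycles


print(remaining_fraction(2), remaining_fraction(3))  # 0.0625 0.015625
print(cycles_until_one_left(950))                    # 5
print(cycles_until_one_left(35))                     # 3
```

Under this idealization, a 950-word set shrinks to a single word in about five cycles, roughly in line with the 4.9-cycle average obtained in the simulations, while a 35-word set (the size of the animal subset) needs only three.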

The biophysical mechanism that underlies integration over multiple samples is the process that sustains the activity of nonexcluded words and the inactivity of excluded words. Such processes can be implemented by known physiological mechanisms, as discussed below. Although we have considered integration only for covert movements of attention, integration may also occur during overt movements of attention that involve eye movements. In summary, high-level nodes (here, word nodes) that are activated by context and deactivated by serially sampled sensory information provide an attractive and plausible scheme for integrating information.

Efficient exclusion

The efficiency of the exclusion process results from the T-D computation of the feature probability landscape and the use of these probabilities to guide selective attention. We compared several algorithms (Fig. 5) and found that the most efficient selects the feature with the greatest mismatch between expectation and actuality: a feature that is "there" but has low probability, or one that is "not there" but has high probability, whichever mismatch is greater. In information theory, this highly informative feature is termed the "Shannon Surprise" (Dayan and Abbott 2001). Almost as efficient as this "bidirectional mismatch" algorithm is the "unidirectional mismatch" algorithm, which selects, among the features that are there, the one with the lowest probability. The most efficient algorithms we have found are "greedy": each successive attentional movement selects the feature with the greatest ability to eliminate words in that cycle, without "looking ahead" to consider the process as a whole. Although the greedy strategy is not necessarily optimal for the overall process, it can be shown that finding a truly optimal strategy is NP-complete (i.e., computationally intractable) and that the greedy algorithm is as close to optimal as any tractable strategy can achieve (Cohn 2002).
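
The link between mismatch and surprise can be made explicit. With equal word priors, observing that a feature with T-D probability p is "there" leaves a fraction p of the active words and carries −log2 p bits; observing that it is "not there" leaves 1 − p and carries −log2(1 − p) bits. Because a mismatch m (1 − p for a present feature, p for an absent one) always corresponds to a surprise of −log2(1 − m), maximizing mismatch and maximizing surprise select the same feature. The feature names and probabilities in this sketch are hypothetical.

```python
import math
from typing import Dict, Set, Tuple


def surprise_bits(p_there: float, is_there: bool) -> float:
    """Information (bits) carried by the observed B-U status of one feature."""
    p_outcome = p_there if is_there else 1.0 - p_there
    if p_outcome <= 0.0:
        return math.inf  # outcome predicted by no active word: all are excluded
    return -math.log2(p_outcome)


def most_surprising(td_prob: Dict[str, float], there: Set[str]) -> Tuple[str, float]:
    """Bidirectional-mismatch choice expressed as maximal surprise."""
    scores = {f: surprise_bits(p, f in there) for f, p in td_prob.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical T-D probabilities and B-U observations for three feature sites.
td_prob = {"diagonal@1": 0.25, "vertical@1": 0.9, "horizontal@2": 0.6}
there = {"diagonal@1", "vertical@1"}
print(most_surprising(td_prob, there))  # ('diagonal@1', 2.0)
```

A 2-bit surprise corresponds to cutting the active set to one-fourth, the per-cycle rate reported above for the most efficient SAAs.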

An important finding is that T-D connections obeying a simple connectivity rule provide a simple mechanism for computing the feature probability landscape required for efficient exclusion. A "compositional rule" defines how long-term memory is encoded in the hierarchical structure: a word node connects to all the letter nodes contained in that word, and a letter node connects to all the features contained in that letter (Fig. 2). This connectivity rule allows linear synaptic summation to compute the feature probability landscape. For instance, if only one word is active, the features contained in that word will be active and all others will be inactive. If two words are active, the probability landscape will be the superposition of the two words, with features contained in both words being twice as active as features contained in only one. It follows that when many words are active, the probability of any given feature will be proportional to the number of active words that contain it. Importantly, the probability landscape can incorporate a priori semantic and high-level contextual information (category), which is used to guide attention from the moment the stimulus arrives. As shown in Fig. 3, 1, the structure of English four-letter words is such that some features, for instance diagonals, are less probable than others. It follows that if a diagonal is present in the stimulus, bringing attention to this feature will be an effective way of excluding words.
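
The compositional rule can be sketched as two rounds of linear summation: each active word node drives its letter nodes, and each letter node drives its feature nodes. The letter-to-feature table below is a hypothetical, much-reduced stand-in for the feature set of Fig. 2, and dividing the summed drive by the number of active words is an assumption of this sketch for turning drive into a probability.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical, much-reduced letter-to-feature table (stand-in for Fig. 2).
LETTER_FEATURES = {
    "A": {"diagonal_left", "diagonal_right", "horizontal_mid"},
    "E": {"vertical_left", "horizontal_top", "horizontal_mid", "horizontal_bottom"},
    "L": {"vertical_left", "horizontal_bottom"},
    "T": {"vertical_mid", "horizontal_top"},
}


def feature_landscape(active_words: Iterable[str]) -> Dict[Tuple[int, str], float]:
    """T-D probability of each (position, feature) given the active word nodes."""
    words = list(active_words)
    drive: Counter = Counter()
    for word in words:                       # word node -> its letter nodes
        for pos, letter in enumerate(word):  # letter node -> its feature nodes
            for feature in LETTER_FEATURES[letter]:
                drive[(pos, feature)] += 1   # linear summation of T-D input
    return {site: count / len(words) for site, count in drive.items()}


landscape = feature_landscape(["TALE", "TEAL"])
print(landscape[(0, "vertical_mid")])    # 1.0: T at position 0 in both words
print(landscape[(1, "diagonal_left")])   # 0.5: A at position 1 only in TALE
```

With one active word every predicted feature has probability 1, and with two words shared features are twice as strongly driven as unshared ones, as stated in the text.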

The core prediction of the model is that during recognition of complex visual stimuli, exemplified by words, the window of attention undergoes rapid covert movements to stimulus regions that are rich in information relevant to recognition. There are as yet no methods for tracking rapid shifts in covert attention. However, these predictions could possibly be tested if the stimuli were arranged so that recognition required observable shifts in eye position. Indeed, experiments on eye movements during the viewing of natural scenes indicate that eye movements are preferentially made to information-rich regions (Mackworth and Morandi 1967; Reinagel and Zador 1999).

Dependence of recognition time on the number of stored words and on context

A fundamental property of cortical recognition that our model reproduces (Fig. 5) is that the time to recognize a target depends logarithmically on the size of the set of possible target items in long-term memory (Burrows and Okada 1975; Ross 1970). This logarithmic dependence follows directly from the idea that the underlying process is one of exclusion and that, on average, a constant fraction of items is excluded on each cycle.
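
The logarithmic dependence can be written out explicitly. Assuming, as an idealization, that a fixed fraction f of the remaining words is excluded on each cycle, the number of cycles k needed to reduce an initial set of N words to one satisfies:

```latex
N\,(1-f)^{k} = 1
\quad\Longrightarrow\quad
k = \frac{\log_2 N}{\log_2\!\left(\frac{1}{1-f}\right)}
```

With f = 3/4, the denominator is log2 4 = 2 bits per cycle, so k = ½ log2 N: plotted against log2 N, recognition time is a straight line with slope ½, close to the observed slopes of 0.52 and 0.47 for the two mismatch SAAs. For N = 950 this gives k ≈ 4.95 cycles.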

The dependence of recognition time on set size provides a simple explanation for why contextual information can speed recognition, as in "semantic priming" (Lorch et al. 1986; Neely et al. 1989; see Neely 1991 for a review). An example of such priming is shown in Fig. 4A: when a contextual cue narrows the range of possible words to animal words, the recognition of animal words is speeded. The reason is simply that whenever the initial set of possible words is reduced, fewer cycles are required to exclude all but the one presented. Our model also accounts for why letter strings that are statistically less similar to word patterns are classified as nonwords more quickly than letter strings that are more similar (Table 1). Taken together, these results show that the model captures several fundamental properties of word recognition. Because the flow of information is bidirectional, the architecture straightforwardly integrates high- and low-level information and uses it to control the movement of attention in an orderly way.

Neural plausibility of the model

One objection to any serially organized process is that it might take too long to be a realistic component of fast recognition. The available data indicate that covert attention (without eye movements) can be moved about once every 20–30 ms, i.e., 33–50 times/s (Horowitz and Wolfe 1998). Thus if recognition of a four-letter word takes an average of 4.9 cycles, the time required is 100–150 ms. This speed is in reasonable agreement with what is known about the speed of recognition during reading: the average reader makes about one saccade per word, bringing it into the fovea for a fixation that lasts ~250 ms (Starr and Rayner 2001). There is thus ~250 ms of processing time in cortex between the arrival of information about successive words. These results are compatible with the idea that word recognition could involve five covert attentional shifts, each requiring 20–30 ms. Note that specific item recognition takes somewhat longer than the simpler task of two-choice categorization (Gleitman and Jonides 1976; Jonides and Gleitman 1976; Thorpe et al. 1996).

A second, related objection is that information flow between cortical regions may be too slow to allow T-D/B-U processing within a 25-ms iterative cycle. Relevant to this issue are recent measurements of individual steps in cortical transmission. The time it takes for information to travel between cortical areas has been shown to be <2 ms (Domenici et al. 1995; Movshon and Newsome 1996; Girard et al. 2001). When feedforward information arrives at a cortical area, it is generally received by layer 4 cells, which then excite layer 2/3 cells; it is these cells that, in turn, can transmit information back to lower areas or up to higher areas. Using paired recording, the time for transmission from layer 4 to layer 2/3 cells was measured as <2 ms (Silver et al. 2001). The brevity of these delays indicates that substantial T-D/B-U cortical processing can occur in 25 ms. If it takes ~100 ms for visual signals to reach V1 after a word is displayed, and five cortical cycles at 25 ms/cycle take a further 125 ms, the total time to recognition would be 225 ms. (Nor can it be excluded that important bidirectional processing occurs within a single cortical region; indeed, both simple and complex cells are found in V1.)
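
The two timing estimates above are simple products and sums; the sketch below only makes their assumptions explicit as parameters. The 20–30 ms per covert shift, ~100 ms to reach V1, and 25 ms per T-D/B-U cycle are the values quoted in the text, not new measurements, and the function names are illustrative.

```python
from typing import Tuple


def sampling_time_ms(cycles: float,
                     ms_per_shift: Tuple[float, float] = (20.0, 30.0)) -> Tuple[float, float]:
    """Range of serial sampling time for a given number of attentional cycles."""
    low, high = ms_per_shift
    return cycles * low, cycles * high


def time_to_recognition_ms(cycles: int, afferent_ms: float = 100.0,
                           ms_per_cycle: float = 25.0) -> float:
    """Afferent delay to V1 plus the cortical T-D/B-U cycles."""
    return afferent_ms + cycles * ms_per_cycle


low, high = sampling_time_ms(4.9)
print(f"{low:.0f}-{high:.0f} ms")        # 98-147 ms, i.e. roughly 100-150 ms
print(time_to_recognition_ms(cycles=5))  # 225.0
```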

A third objection is that additional time or cycles may be required for computations not discussed here, for example, to transform the image to correct for angle or skew. Although the experimental data we have sought to explain do not involve transformed images, it should be noted that in experiments on the recognition of rotated objects, the time required for recognition does in fact increase (Shepard and Metzler 1971).

A fourth objection is that although T-D processing may be important in setting context, recognition could then proceed by purely B-U processing, with no updating by T-D information. Several counterarguments can be given: 1) the physiological data point to the fact that during recognition there is activity in the cells that give rise to T-D connections, so it seems conservative to argue that this T-D flow has functional consequences; 2) recognition can be shaped by rapid changes in context (e.g., instructions) that occur too quickly to shape B-U processing by changing connections; 3) in a purely feedforward system, attention would be held hostage to salient environmental cues and could not be directed by high-level constraints toward important but low-saliency information, yet such tasks can be performed; and 4) an SAA that moves attention without considering T-D information (the "random" SAA) is inefficient compared with SAAs that do.

The T-D/B-U signal flow in our model is organized into cycles of ~25 ms that might be detectable as oscillations. Brain oscillations in the gamma range (30–80 Hz) might reflect the cyclical organization of T-D/B-U processing. Such oscillations have been observed during recognition (Garrett et al. 2000; Lutzenberger et al. 1994) and have been linked to attention (Fries et al. 2001). Gamma oscillations have generally been thought to have a different role, specifically in organizing a binding process (Engel and Singer 2001), but some evidence against this view has emerged (Lamme and Spekreijse 1998). Experiments that distinguish between these functional roles are needed. If gamma oscillations do indicate cyclical T-D/B-U processing, their amplitude in lower-level areas should be reduced by inactivation of higher-level areas.

An important question is whether the current model could be implemented by plausible cellular and network mechanisms. Most aspects of the model use standard neural network principles and do not require extensive comment. For instance, the bottom-up formation of feature cells could be as in the Hubel and Wiesel model (Reid 2001). Similarly, the B-U process by which feature nodes excite letter nodes and letter nodes excite word nodes depends on standard linear synaptic interactions, as do the inverse T-D processes. The required B-U and T-D processing is straightforward to implement in parallel with suitably connected networks and is not computationally expensive. Finally, reverberatory mechanisms can make neurons bistable (Wang 2001); such processes could keep word nodes active until their activity is quenched by inhibition from word nodes that receive B-U support. With fewer active word nodes, the remaining ones would receive less lateral inhibition and thus achieve a higher firing rate. This could be the basis of a normalization process at the word level. The recently discovered large network of electrically coupled cortical interneurons (Beierlein et al. 2000) is a potential mechanism for providing the inhibition required for this computation.

The gating cells (Fig. 1) that select the site to which attention is moved (e.g., a feature that is there but has low T-D probability) may function according to simple neural mechanisms. Mismatch can be computed by gating cells that are excited if the feature is present and inhibited in proportion to the probability of that feature as determined by T-D processing. The maximum mismatch can be found by an oscillatory winner-take-all mechanism (Lisman 1998) based on negative feedback inhibition from a global network of interneurons (Beierlein et al. 2000). Specifically, as global inhibition wanes on each cycle, the cell with the greatest mismatch fires first and then inhibits all the others. Winner-take-all processes have previously been implicated in visual search, as first proposed by Koch and Ullman (1985).
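
A discrete-time caricature of the gating cells and the oscillatory winner-take-all is sketched below. It is an assumption-laden illustration, not a biophysical model: gating drive is taken as 1 (feature present) or 0 (absent) minus the T-D probability, which corresponds to the unidirectional mismatch; global inhibition is assumed to fall linearly from 1 to 0 over one cycle; and the first cell whose drive exceeds the inhibition is treated as the winner that silences the rest. Feature names and probabilities are hypothetical.

```python
from typing import Dict, Optional


def gating_drive(present: Dict[str, bool], td_prob: Dict[str, float]) -> Dict[str, float]:
    """Excitation if the feature is there, minus inhibition equal to its T-D probability."""
    return {f: (1.0 if present[f] else 0.0) - td_prob[f] for f in td_prob}


def oscillatory_wta(drive: Dict[str, float], n_steps: int = 100) -> Optional[str]:
    """Return the first gating cell to fire as global inhibition wanes, or None."""
    for step in range(n_steps + 1):
        inhibition = 1.0 - step / n_steps  # inhibition wanes from 1 to 0
        firing = [f for f, d in drive.items() if d > inhibition]
        if firing:
            # Within one time step, the most strongly driven cell crosses first.
            return max(firing, key=lambda f: drive[f])
    return None  # no positive mismatch: nothing attracts attention


present = {"diagonal@1": True, "vertical@1": True, "horizontal@2": False}
td_prob = {"diagonal@1": 0.25, "vertical@1": 0.9, "horizontal@2": 0.6}
print(oscillatory_wta(gating_drive(present, td_prob)))  # diagonal@1

# Confirmation-phase case: every present feature is fully expected, so no winner.
print(oscillatory_wta(gating_drive({"a": True, "b": False}, {"a": 1.0, "b": 0.0})))  # None
```

The absence of a winner in the second case mirrors the stable state reached when the last remaining word is confirmed.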

Relationship to physiological data

The model makes several predictions about neurons in higher cortical regions that are critical for recognition. As shown in Fig. 4B, the probability of the word node that represents the displayed item gradually increases as the sampling process excludes other words. If probability is represented by a neuron's firing rate, that firing rate should rise gradually during recognition. Consistent with this, neurons in the inferior temporal cortex, a high-level region critically involved in item recognition, gradually increase their firing rate over >150 ms during the recognition process (Chelazzi et al. 1998; Desimone 1998). Furthermore, both the model and the data show that cells not well tuned to the stimulus either immediately decrease their firing rate or initially increase it along with the well-tuned cells and then show a delayed drop. According to the model, the delay before this divergence should increase with the similarity of the cell's tuning to the stimulus. This has not yet been examined in temporal cortex but has been seen in frontal cortex neurons (Bichot et al. 1999). A final prediction is that the low basal firing rates of neurons (before a stimulus) should be considered not noise but a representation of the low but finite probability of a contextually possible item. According to this view, if contextual information lowers the probability of the object represented by the cell, the cell's firing rate should go down (Fig. 4A). Such effects of context on baseline firing have been observed in temporal cortex (reviewed in Desimone 1998).

Perception as a construction of a model of the stimulus

The "figural synthesis" model of perception (Neisser 1967) was inspired by Hebb's (1949) comparison of the perceiver with a paleontologist who extracts a few bones from a mass of irrelevant rubble and "reconstructs" the dinosaur. In the figural synthesis model, Neisser proposed that focal attention involves a similarly sparse sampling; the perceptual object is then reconstructed from this limited sample and from information stored in long-term memory. Similarly, in our model, recognition occurs gradually as the number of sampled and inferred features increases, until eventually there is a perfect correspondence between the features constructed by T-D computations and the word itself (Fig. 3, 5). An extreme case occurs when contextual information provided by previous words indicates the single word that is likely to come next. The T-D excitation then creates a "model" of the word, and if no mismatch is detected, the model can be determined to be correct in only a single cycle.

The relationship of the T-D-computed model to perception may also be relevant to how nontarget words are perceived during rapid visual search through lists. Nontargets with low approximation to English are classified more rapidly than nontargets that are words (Graboi 1974), an effect reproduced in the model (Table 1). In such experiments, subjects report that whereas targets are clearly perceived, nontargets appear as a blur but are nevertheless correctly rejected (Briggs and Blaha 1969; Cavanaugh and Chase 1971; Gould and Carn 1973; Graboi 1971; Neisser 1963; Neisser et al. 1963). We have found that during the progressive exclusion of nonwords there is often a direct transition from a state with several remaining words to a state with none. The last feature probability landscape is thus a superposition of several still-possible words, which may give rise to the perception of a "blur."

Extending the model

A simplification of the current model is that all words are assumed to have equal probability. Real text could be recognized more efficiently if the model were modified to take word frequency into account. It is important to emphasize that we have modeled the worst case, in which the window of attention is wide enough for only a single feature. Recognition might be speeded significantly if this window were widened. One way to widen the window effectively would be to pick the most informative feature in each letter subframe and then process the four selected features in parallel (cf. Phaf et al. 1990) ("heterarchical processing"). If our model were modified in this way, information acquisition would increase from 2 to 8 bits/cycle: each cycle would reduce the number of remaining words by a factor of 2^8 = 256 rather than 2^2 = 4, a 64-fold greater reduction per cycle, so the number of cycles required would fall about fourfold. However, there is clearly a limit to how wide the window can be made. In general, the small aperture of a single-feature window has the virtue of high selectivity, which improves noise immunity. We emphasize that many of the difficulties associated with real-world vision are not dealt with in our model. For instance, visual noise would be rapidly detected by our algorithm, since it detects features that are there but not expected in the context of the current set; this would lead to the incorrect rejection of all possible words. One modification of the SAA that would minimize such errors would be to require that the T-D probability exceed a low threshold value. Features that are not contextually relevant and have zero or very low probability could then be present without attracting attention (in this sense, attention to specific features is "filtered" by high-level contextual constraints). A second difficulty with real-world vision is missing or occluded features. In this regard, the unidirectional SAA (T-D[Lowest P] and B-U[There]) seems preferable because it does not select missing features and thus could produce recognition even when some features are occluded. Other models of recognition have successfully shown how hierarchically organized feedforward neural networks can deal with some of the difficult problems of position and scale invariance (Fukushima 1986; Mel et al. 1998; Olshausen et al. 1993; Riesenhuber and Poggio 1999). It would seem useful to seek hybrid models that combine some of the power of feedforward processing with the T-D control of attention described here.
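
The thresholded variant of the unidirectional SAA suggested above might look like the sketch below. The threshold value and feature names are hypothetical; a feature whose T-D probability does not exceed the threshold (such as a noise blob that no active word predicts) is simply ineligible for selection, and, as in the unidirectional rule, missing features are never candidates.

```python
from typing import Dict, Optional, Set


def filtered_unidirectional_pick(td_prob: Dict[str, float], there: Set[str],
                                 threshold: float = 0.05) -> Optional[str]:
    """Lowest-probability feature that is there AND contextually possible."""
    eligible = [f for f in sorted(there) if td_prob.get(f, 0.0) > threshold]
    if not eligible:
        return None
    return min(eligible, key=lambda f: td_prob[f])


td_prob = {"noise_blob@2": 0.0, "diagonal@1": 0.2, "vertical@1": 0.8}
there = {"noise_blob@2", "diagonal@1", "vertical@1"}
print(filtered_unidirectional_pick(td_prob, there))                   # diagonal@1
print(filtered_unidirectional_pick(td_prob, there, threshold=-1.0))   # noise_blob@2
print(filtered_unidirectional_pick(td_prob, {"noise_blob@2"}))        # None
```

With the threshold effectively disabled (second call), the rule attends to the unpredicted noise feature, which would exclude every active word; with a small positive threshold, attention goes to the most informative contextually possible feature instead.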


 DISCLOSURES
 
We gratefully acknowledge the support of the Sloan and Swartz Center for Theoretical Neurobiology at Brandeis University, National Institute of Mental Health Grant P50 MH-60450-01A1, and the W. M. Keck Foundation.


 ACKNOWLEDGMENTS
 
We thank A. Kepecs, S. Raghavachari, X.-J. Wang, D. Abadi, M. Idart, D. Pollen, and B. Millar for comments on the manuscript. We thank the following colleagues for useful discussions on the topics dealt with in this paper: J. Wolfe, P. Cavanagh, E. Miller, C. Koch, E. Niebur, J. Schall, J. Reynolds, P. Dayan, M. Cohn, and D. Ballard.


 FOOTNOTES
 
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

Address for reprint requests: D. Graboi, 1314 Desert Rose Way, Encinitas, CA 92024 (E-mail: dgraboi{at}cts.com).


 REFERENCES
 
Bahcall DO and Kowler E. Attentional interference at small spatial separations. Vision Res 39: 71–86, 1999.[Web of Science][Medline]

Barone P, Batardiere A, Knoblauch K, and Kennedy H. Laminar distribution of neurons in extrastriate areas projecting to visual areas V1 and V4 correlates with the hierarchical rank and indicates the operation of a distance rule. J Neurosci 20: 3263–3281, 2000.[Abstract/Free Full Text]

Beierlein M, Gibson JR, and Connors BW. A network of electrically coupled interneurons drives synchronized inhibition in neocortex. Nat Neurosci 3: 904–910, 2000.[Web of Science][Medline]

Bichot NP, Cave KR, and Pashler H. Visual selection mediated by location: feature-based selection of noncontiguous locations. Percept Psychophys 61: 403–423, 1999.[Web of Science][Medline]

Briggs GE and Blaha J. Memory retrieval and central comparison times in information processing. J Exp Psychol 79: 395–402, 1969.[Web of Science]

Broadbent D and Broadbent MH. Human attention: the exclusion of distracting information as a function of real and apparent separation of relevant and irrelevant events. Proc R Soc Lond B Biol Sci 242: 11–16, 1990.[Medline]

Burrows D and Okada R. Memory retrieval from long and short lists. Science 188: 1031–1032, 1975.[Abstract/Free Full Text]

Campbell FW. How much of the information falling on the retina reaches the visual cortex and how much is stored in the visual memory? In: Pattern Recognition Mechanisms, edited by Chagas C and Gross C. Berlin, Germany: Springer, 1985, p. 83–95.

Caputo G and Guerra S. Attentional selection by distractor suppression. Vision Res 38: 669–689, 1998.[Web of Science][Medline]

Cauller LJ and Kulics AT. The neural basis of the behaviorally relevant N1 component of the somatosensory-evoked potential in SI cortex of awake monkeys: evidence that backward cortical projections signal conscious touch sensation. Exp Brain Res 84: 607–619, 1991.[Web of Science][Medline]

Cavanaugh JP and Chase WG. The equivalence of target and nontarget processing in visual search. Percept Psychophys 9: 493–495, 1971.

Cave KR. The FeatureGate model of visual selection. Psychol Res 62: 182–194, 1999.[Web of Science][Medline]

Chelazzi L, Duncan J, Miller EK, and Desimone R. Responses of neurons in inferior temporal cortex during memory-guided visual search. J Neurophysiol 80: 2918–2940, 1998.[Abstract/Free Full Text]

Chun MM. Visual attention. In: Blackwell's Handbook of Perception, edited by Goldstein EB. Oxford, UK: Blackwell, 2001, p. 272–310.

Cohn M. On the computational complexity of a vision task [Online]. Brandeis Computer Science Technical Report CS-02-229. (http://www.cs.brandeis.edu/~marty) [2002, July].

Dayan P and Abbott LF. Entropy. In: Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems, edited by Sejnowski TJ and Poggio T. Cambridge, MA: The MIT Press, 2001, p. 124.

Dayan P, Hinton GE, Neal RM, and Zemel RS. The Helmholtz machine. Neural Comput 7: 889–904, 1995.

Desimone R. Visual attention mediated by biased competition in extrastriate visual cortex. Philos Trans R Soc Lond B Biol Sci 353: 1245–1255, 1998.

Domenici L, Harding GW, and Burkhalter A. Patterns of synaptic activity in forward and feedback pathways within rat visual cortex. J Neurophysiol 74: 2649–2664, 1995.

Egeth HE and Yantis S. Visual attention: control, representation, and time course. Annu Rev Psychol 48: 269–297, 1997.

Engel AK and Singer W. Temporal binding and the neural correlates of sensory awareness. Trends Cogn Sci 5: 16–25, 2001.

Felleman DJ and Van Essen DC. Distributed hierarchical processing in the primate cerebral cortex. Cereb Cortex 1: 1–47, 1991.

Fries P, Reynolds JH, Rorie AE, and Desimone R. Modulation of oscillatory neuronal synchronization by selective visual attention. Science 291: 1560–1563, 2001.

Fukushima K. A neural network model for selective attention in visual pattern recognition. Biol Cybern 55: 5–15, 1986.

Garrett AS, Flowers DL, Absher JR, Fahey FH, Gage HD, Keyes JW, Porrino LJ, and Wood FB. Cortical activity related to accuracy of letter recognition. Neuroimage 11: 111–123, 2000.

Girard P, Hupe JM, and Bullier J. Feedforward and feedback connections between areas V1 and V2 of the monkey have similar rapid conduction velocities. J Neurophysiol 85: 1328–1331, 2001.

Gleitman H and Jonides J. The cost of categorization in visual search: incomplete processing of targets and field items. Percept Psychophys 20: 281–288, 1976.

Gould JD and Carn R. Visual search, complex backgrounds, mental counters, and eye movements. Percept Psychophys 14: 125–132, 1973.

Graboi DG. Searching for targets: the effects of specific practice. Percept Psychophys 10: 300–304, 1971.

Graboi DG. The effects of physical shape and meaning on the rate of visual search. Pattern Recognition Mechanisms. (PhD thesis). Department of Psychology. San Diego, CA: University of California, San Diego, 1974.

Grossberg S. Linking the laminar circuits of visual cortex to visual perception: development, grouping, and attention. Neurosci Biobehav Rev 25: 513–526, 2001.

Hebb DO. The Organization of Behavior: A Neuropsychological Theory. New York: Wiley, 1949.

Horowitz TS and Wolfe JM. Visual search has no memory. Nature 394: 575–577, 1998.

Hupe JM, James AC, Girard P, Lomber SG, Payne BR, and Bullier J. Feedback connections act on the early part of the responses in monkey visual cortex. J Neurophysiol 85: 134–145, 2001.

Itti L and Koch C. Computational modeling of visual attention. Nat Rev Neurosci 2: 194–203, 2001.

Jonides J and Gleitman H. The benefit of categorization in visual search: target location without identification. Percept Psychophys 20: 289–298, 1976.

Kastner S and Ungerleider LG. Mechanisms of visual attention in the human cortex. Annu Rev Neurosci 23: 315–341, 2000.

Koch C and Ullman S. Shifts in selective visual attention: towards the underlying neural circuitry. Hum Neurobiol 4: 219–227, 1985.

Lamme VA and Roelfsema PR. The distinct modes of vision offered by feedforward and recurrent processing. Trends Neurosci 23: 571–579, 2000.

Lamme VA and Spekreijse H. Neuronal synchrony does not represent texture segregation. Nature 396: 362–366, 1998.

Lee TS, Mumford D, Romero R, and Lamme VA. The role of the primary visual cortex in higher level vision. Vision Res 38: 2429–2454, 1998.

Lerner Y, Hendler T, Ben-Bashat D, Harel M, and Malach R. A hierarchical axis of object processing stages in the human visual cortex. Cereb Cortex 11: 287–297, 2001.

Lisman J. What makes the brain's tickers tock. Nature 394: 132–133, 1998.

Lorch RF Jr, Balota DA, and Stamm EG. Locus of inhibition effects in the priming of lexical decisions: pre- or postlexical access? Mem Cognit 14: 95–103, 1986.

Lutzenberger W, Pulvermuller F, and Birbaumer N. Words and pseudowords elicit distinct patterns of 30-Hz EEG responses in humans. Neurosci Lett 176: 115–118, 1994.

Mackworth NH and Morandi AJ. The gaze selects informative detail within pictures. Percept Psychophys 547–552, 1967.

Martinez A, Anllo-Vento L, Sereno MI, Frank LR, Buxton RB, Dubowitz DJ, Wong EC, Hinrichs H, Heinze HJ, and Hillyard SA. Involvement of striate and extrastriate visual cortical areas in spatial attention. Nat Neurosci 2: 364–369, 1999.

McClelland JL and Rumelhart DE. An interactive activation model of context effects in letter perception. I. An account of basic findings. Psychol Rev 88: 375–407, 1981.

Mel BW, Ruderman DL, and Archie KA. Translation-invariant orientation tuning in visual "complex" cells could derive from intradendritic computations. J Neurosci 18: 4325–4334, 1998.

Moran J and Desimone R. Selective attention gates visual processing in the extrastriate cortex. Science 229: 782–784, 1985.

Motter BC. Neural correlates of feature selective memory and pop-out in extrastriate area V4. J Neurosci 14: 2190–2199, 1994.

Movshon JA and Newsome WT. Visual response properties of striate cortical neurons projecting to area MT in macaque monkeys. J Neurosci 16: 7733–7741, 1996.

Mozer MC and Sitton M. Computational modeling of spatial attention. In: Attention, edited by Pashler H. East Sussex, UK: Psychology Press, 1998, p. 341–393.

Nakayama K. The Iconic Bottleneck and the tenuous link between early visual processing and perception. In: Vision: Coding and Efficiency, edited by Blakemore C. Cambridge, UK: Cambridge Univ. Press, 1991, p. 411–422.

Neely JH. Semantic priming effects in visual word recognition: a selective review of current findings and theories. In: Basic Processes In Reading, edited by Besner D and Humphreys GW. Hillsdale, NJ: L. Erlbaum, 1991, p. 264–336.

Neely JH, Keefe DE, and Ross KL. Semantic priming in the lexical decision task: roles of prospective prime-generated expectancies and retrospective semantic matching. J Exp Psychol Learn Mem Cogn 15: 1003–1019, 1989.

Neisser U. Factors in the processing of visual stimulation. Am J Psychol 76: 376–385, 1963.

Neisser U. Cognitive Psychology. New York: Appleton Century Crofts, 1967.

Neisser U, Novick R, and Lazar R. Searching for ten targets simultaneously. Percept Mot Skills 17: 955–961, 1963.

Nissen MJ. Accessing features and objects: is location special? In: Attention and Performance, edited by Posner MI and Marin OSM. Hillsdale, NJ: Erlbaum, 1985, vol. XI, p. 205–219.

O'Connor DH, Fukui MM, Pinsk MA, and Kastner S. Attention modulates responses in the human lateral geniculate nucleus. Nat Neurosci 5: 1203–1209, 2002.

Olshausen BA, Anderson CH, and Van Essen DC. A neurobiological model of visual attention and invariant pattern recognition based on dynamic routing of information. J Neurosci 13: 4700–4719, 1993.

Phaf RH, Van der Heijden AH, and Hudson PT. SLAM: a connectionist model for attention in visual selection tasks. Cogn Psychol 22: 273–341, 1990.

Posner MI, Snyder CR, and Davidson BJ. Attention and the detection of signals. J Exp Psychol 109: 160–174, 1980.

Potter MC. Short-term conceptual memory for pictures. J Exp Psychol Hum Learn 2: 509–522, 1976.

Rao RP and Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci 2: 79–87, 1999.

Reid RC. Divergence and reconvergence: multielectrode analysis of feedforward connections in the visual system. Prog Brain Res 130: 141–154, 2001.

Reinagel P and Zador AM. Natural scene statistics at the center of gaze. Network 10: 341–350, 1999.

Ress D, Backus BT, and Heeger DJ. Activity in primary visual cortex predicts performance in a visual detection task. Nat Neurosci 3: 940–945, 2000.

Riesenhuber M and Poggio T. Hierarchical models of object recognition in cortex. Nat Neurosci 2: 1019–1025, 1999.

Rockland KS and Pandya DN. Cortical connections of the occipital lobe in the rhesus monkey: interconnections between areas 17, 18, 19, and the superior temporal sulcus. Brain Res 212: 249–270, 1981.

Ross J. Extended practice with a single-character classification task. Percept Psychophys 8: 276–278, 1970.

Rumelhart DE. A multicomponent theory of confusion among briefly exposed alphabetic characters. Center for Human Information Processing Technical Report #22. San Diego, CA: University of California, San Diego, 1971, p. 1–29.

Rumelhart DE and Siple P. Process of recognizing tachistoscopically presented words. Psychol Rev 81: 99–118, 1974.

Rumelhart DE and McClelland JL. An interactive activation model of context effects in letter perception. II. The contextual enhancement effect and some tests and extensions of the model. Psychol Rev 89: 60–94, 1982.

Salin PA and Bullier J. Corticocortical connections in the visual system: structure and function. Physiol Rev 75: 107–154, 1995.

Schill PA, Umkehrer E, Beinlich S, Krieger G, and Zetzsche C. Scene analysis with saccadic eye movements: top-down and bottom-up modeling. J Electron Imag 10: 152–160, 2001.

Shepard RN and Metzler J. Mental rotation of three-dimensional objects. Science 171: 701–703, 1971.

Silver RA, Lubke J, Sakmann B, and Feldmeyer D. Quantal and anatomical properties of layer IV-layer II/III synapses in rat barrel cortex. Society for Neuroscience Annual Meeting. San Diego, CA, 2001.

Smith AT, Singh KD, Williams AL, and Greenlee MW. Estimating receptive field size from fMRI data in human striate and extrastriate visual cortex. Cereb Cortex 11: 1182–1190, 2001.

Snyder CR. Selection, inspection, and naming in visual search. J Exp Psychol 92: 428–431, 1972.

Starr MS and Rayner K. Eye movements during reading: some current controversies. Trends Cogn Sci 5: 156–163, 2001.

Tanaka K. Inferotemporal cortex and object vision. Annu Rev Neurosci 19: 109–139, 1996.

Thorpe S, Fize D, and Marlot C. Speed of processing in the human visual system. Nature 381: 520–522, 1996.

Tomita H, Ohbayashi M, Nakahara K, Hasegawa I, and Miyashita Y. Top-down signal from prefrontal cortex in executive control of memory retrieval. Nature 401: 699–703, 1999.

Tononi G, Sporns O, and Edelman GM. Reentry and the problem of integrating multiple cortical areas: simulation of dynamic integration in the visual system. Cereb Cortex 2: 310–335, 1992.

Tootell RB, Hadjikhani N, Hall EK, Marrett S, Vanduffel W, Vaughan JT, and Dale AM. The retinotopy of visual spatial attention. Neuron 21: 1409–1422, 1998.

Treisman AM and Gelade G. A feature-integration theory of attention. Cogn Psychol 12: 97–136, 1980.

Treisman A and Souther J. Search asymmetry: a diagnostic for preattentive processing of separable features. J Exp Psychol Gen 114: 285–310, 1985.

Tsal Y and Lavie N. Attending to color and shape: the special role of location in selective visual processing. Percept Psychophys 44: 15–21, 1988.

Tsotsos JK. Analyzing vision at the complexity level. Behav Brain Sci 13: 423–469, 1990.

Tsotsos JK, Culhane SM, Wai WYK, Lai Y, Davis N, and Nuflo F. Modeling visual attention via selective tuning. Artif Intelligence 78: 507–545, 1995.

Ullman S. Sequence seeking and counter streams: a computational model for bidirectional information flow in the visual cortex. Cereb Cortex 5: 1–11, 1995.

Van Essen DC, Olshausen BA, and Gallant JL. Pattern recognition, attention and information bottlenecks in the primate visual system. In: Proceedings of the SPIE Conference on Visual Information Processing: From Neurons to Chips, edited by Mathur BP and Koch C. Burlingame, WA: SPIE, 1991, p. 17–28.

Vanduffel W, Tootell RB, and Orban GA. Attention-dependent suppression of metabolic activity in the early stages of the macaque visual system. Cereb Cortex 10: 109–126, 2000.

Wang XJ. Synaptic reverberation underlying mnemonic persistent activity. Trends Neurosci 24: 455–463, 2001.

Ward R, Duncan J, and Shapiro K. The slow time course of visual attention. Cogn Psychol 30: 79–109, 1996.

Wolfe JM, Alvarez GA, and Horowitz TS. Attention is fast but volition is slow. Nature 406: 691, 2000.

Woodman GF and Luck SJ. Electrophysiological measurement of rapid shifts of attention during visual search. Nature 400: 867–869, 1999.




Copyright © 2003 by The American Physiological Society.