Ventral and dorsal visual pathways perform fundamentally different functions. The former is involved in object recognition, whereas the latter carries out spatial localization of stimuli and visual guidance of motor actions. Despite the association of the dorsal pathway with spatial vision, recent studies have reported shape selectivity in the dorsal stream. We compared shape encoding in anterior inferotemporal cortex (AIT), a high-level ventral area, with that in lateral intraparietal cortex (LIP), a high-level dorsal area, during a fixation task. We found shape selectivities of individual neurons to be greater in anterior inferotemporal cortex than in lateral intraparietal cortex. At the neural population level, responses to different shapes were more dissimilar in AIT than LIP. Both observations suggest a greater capacity in AIT for making finer shape distinctions. Multivariate analyses of AIT data grouped together similar shapes based on neural population responses, whereas such grouping was indistinct in LIP. Thus in a first comparison of shape response properties in late stages of the two visual pathways, we report that AIT exhibits greater capability than LIP for both object discrimination and generalization. These differences in the two visual pathways provide the first neurophysiological evidence that shape encoding in the dorsal pathway is distinct from and not a mere duplication of that formed in the ventral pathway. In addition to shape selectivity, we observed stimulus-driven cognitive effects in both areas. Stimulus repetition suppression in LIP was similar to the well-known repetition suppression in AIT and may be associated with the “inhibition of return” memory effect observed during reflexive attention.
The central visual system includes several dozen visually responsive cortical areas, which may be placed in a hierarchy based on the pattern of their anatomical connections (Felleman and Van Essen 1991). These visual areas can be divided into two basic streams, a dorsal occipitoparietal pathway and a ventral occipitotemporal pathway (Fig. 1), based originally on the patterns of behavioral deficits observed after brain lesions in both humans and monkeys (Farah 2004; Ungerleider and Mishkin 1982). The functional distinction between dorsal and ventral pathways can be defined in terms of a “what/where” dichotomy. The ventral, or “what,” pathway is critical for object recognition and stimulus identification with respect to parameters such as shape and color, whereas the dorsal, or “where,” pathway provides information about the spatial location of the stimulus (Ungerleider and Haxby 1994; Ungerleider and Mishkin 1982). Goodale and colleagues, although retaining basically the same distinction between object recognition and visuospatial processing, have put a different slant on the difference between dorsal and ventral streams, defining it in terms of a “perception/action” dichotomy (Goodale and Haffenden 2003; Goodale and Milner 1992; Goodale and Westwood 2004). Under this interpretation, the ventral pathway is the “perception” pathway, building up a representation of the world that can be used for various cognitive operations such as object recognition and memory, whereas the dorsal pathway is the “action” pathway, serving to provide real-time visual guidance for motor actions such as eye movements or grasping objects by hand.
A possible consequence of the distinction between dorsal and ventral visual processing is that they may extract different information about stimulus shape, because that information will be used for different purposes. Some evidence suggesting that such may be the case comes from a functional MRI (fMRI) brain imaging study of human subjects that showed greater generalization for stimulus shape in the ventral pathway than in the dorsal pathway (James et al. 2002). Also, indirect evidence from a human psychophysical study indicated differences in the shape processing strategies of dorsal and ventral pathways (Ganel and Goodale 2003).
In this study, we examine this issue at the level of single-unit neurophysiology. The approach here is to compare shape selectivity in neurons in a high-level ventral area, the anterior inferotemporal cortex (AIT, generally corresponding to area TE), with shape selectivity in a high-level dorsal area, the lateral intraparietal cortex (LIP). Sereno and Maunsell (1998) have previously shown that shape selectivity is indeed a property of macaque monkey LIP neurons. Here we extend that work with additional data comparing responses in LIP and AIT using the same set of simple two-dimensional shapes.
Anterior inferotemporal cortex is the highest predominantly visual area within the ventral pathway. It is believed to be involved in visual object recognition (Tanaka 1996) and visual memory of objects and patterns (Miyashita 1993). As a late stage within the ventral pathway, it shows high selectivity to complex object shapes (Desimone 1984; Kobatake and Tanaka 1994; Rolls and Tovee 1995; Tamura and Tanaka 2001). Paradoxically, neurons in AIT are also involved in visual categorization (Freedman et al. 2003; Hung et al. 2005; Sigala and Logothetis 2002; Vogels 1999), which requires neurons to generalize across stimuli. The high selectivity required for object identification accompanied by an ability to generalize required for object categorization together define two key aspects of the problem of object recognition.
The lateral intraparietal cortex is a high-level area within the dorsal pathway concerned with issues related to eye movements (Colby and Goldberg 1999). Depending on the behavioral context, LIP neurons show mixed response components to the motor, sensory, mnemonic, attentional, and intentional aspects of the task (Colby and Goldberg 1999; Sereno and Amador 2006). However, they can respond briskly to flashed visual stimuli under a simple fixation task (Colby et al. 1996; Robinson et al. 1978; Sereno and Maunsell 1998), indicating that visual input in the absence of any actual or anticipated motor action is sufficient to drive them. Sereno and Maunsell (1998) have shown shape selectivity in LIP to two-dimensional patterns with single-unit recordings in behaving monkeys. Cue invariant responses to three-dimensional shapes were observed by Sereno et al. (2002) in anesthetized monkey using fMRI, and three-dimensional shape selectivity was shown by Shikata et al. (1996) and Nakamura et al. (2001) with single-unit recordings in behaving monkeys.
If one accepts the view that an important function of visual processing in the dorsal pathway is the guidance of skilled action, shape selectivity within parietal structures is not surprising, but occurs as a necessary step in directing appropriate motor responses to a given visual target. The question of interest here is whether shape encoding within the dorsal pathway has similar characteristics to shape encoding in the ventral pathway. Possibly, given the different functionality of the two pathways, different shape information is extracted. Alternatively, the same shape information may serve both pathways, being communicated along the strong neuroanatomical projections known to connect them (Webster et al. 1994).
Animals and surgical procedures
Two male macaque monkeys (Macaca mulatta, 10 kg, identified as monkey 1, and M. nemestrina, 8 kg, identified as monkey 2) were implanted with a head post and scleral search coil in an aseptic surgery. The animals were trained to perform a passive fixation task as well as three other behavioral tasks with these shape stimuli. Each animal had extensive daily training with these shape stimuli across a 6- to 12-mo period. After training, recording chambers were implanted. The chambers for LIP were implanted first and centered 3–5 mm posterior and 10–12 mm lateral, and the chambers for AIT were implanted after recording from LIP and were centered 18 mm anterior and 18–21 mm lateral. For all surgeries, animals were first restrained with ketamine and maintained on 1–3% isoflurane anesthetic. Throughout the surgical procedure, the animal was administered 5% dextrose in lactated Ringer solution, and the level of anesthesia was monitored and recorded. At the end of surgery and daily after surgery as needed, the animal was administered analgesics and antibiotics. All experimental protocols were approved by the University of Texas, Rutgers University, and Baylor College of Medicine Animal Welfare Committees and were in accordance with the National Institutes of Health Guidelines.
Tasks and procedures
Two macaque monkeys performed a passive fixation task with the stimulus shape positioned so that it fell within the receptive field. Recording procedures were identical in the two animals and the two areas. While the electrode was slowly advanced with a hydraulic micropositioner in search of cells, monkeys typically performed two matching tasks (see Sereno and Amador 2006 for details). In brief, we recorded from any neuron that we could isolate well and appeared stable. Before the start of data collection, preliminary testing was conducted to determine the stimulus position producing the strongest response, using a grid of locations over a range of stimulus eccentricities and angles within a polar coordinate system.
The stimulus for each trial in the fixation task was selected from a set of eight different shapes (Sereno and Maunsell 1998) (Table 1). Each shape was a simple two-dimensional black and white geometric form that fit within the same-sized square region and had an equal number of white pixels. All eight shapes were centered on the same position within the unit's receptive field. Stimulus size across the population of recorded units ranged from 0.65 to 2.00° of visual angle, with stimulus size increasing as a function of eccentricity to remain easily discriminable as acuity declined toward the periphery.
Within each trial during the data collection phase, one randomly selected shape was presented for between one and four repetitions (median = 4 for LIP and 3 for AIT) before a central fixation spot was extinguished. Stimulus duration was constant for each unit, ranging from 250 to 350 ms across the population of recorded units. Likewise, each blank interval following each stimulus repetition was of constant duration, ranging from 500 to 1,000 ms for different units. The animals were required to maintain fixation within 0.5° of the central 0.1° spot in the center of the video display throughout the trial. Eye position was monitored using the scleral search-coil method. The animals were rewarded for maintaining fixation on the central spot until it disappeared. There was a minimum of three trials presented for each shape (median = 6). Behavioral monitoring, eye position and spike sampling, and on-line data analysis were performed under computer control. Stimuli were presented on a 20-in computer monitor with a resolution of 1,152 × 864 pixels and a frame rate of 75 Hz, positioned 65 cm from the animal. While animals performed the fixation task, action potentials were recorded extracellularly with either transdural platinum/iridium (1–2 MΩ) or tungsten microelectrodes (1–2 MΩ; Microprobe).
In one animal, histological reconstruction using cresyl violet Nissl–stained samples (see Fig. 2) showed that the units recorded in posterior parietal cortex lay within area LIP in the lateral bank of the intraparietal sulcus. Units recorded in AIT were in the lower bank of the superior temporal sulcus (STS) and convexity of the middle temporal gyrus. A few perirhinal cells were included at our most anterior recording positions.
To determine which neurons were selective for shape, an F-test (ANOVA) was performed on the average rate of firing for the eight different shapes. Average firing rate was calculated starting 50 ms after stimulus until 50 ms after offset (collapsed across repetitions within and across trials). For these ANOVA tests of shape selectivity, a significance criterion level of P < 0.05 was used. Cells identified as shape selective in this manner were picked out for further analysis as detailed below, with response to stimuli always defined as average firing rate. This analysis and subsequent analyses pooled responses from multiple stimulus repetitions within each trial, with the exception of the calculations for the Fano factor described below.
SHAPE SELECTIVITY MEASURES.
Shape selectivity of each neuron was quantified by two measures. The first was the contrast measure of selectivity
$$S_C = \frac{r_{max} - r_{min}}{r_{max} + r_{min}} \tag{1}$$
with rmax and rmin representing the maximum and minimum responses of each cell to the eight stimulus shapes. SC took on values between 0.0 and 1.0, with larger values indicating higher shape selectivity. (We call it the “contrast” measure of selectivity because of the mathematical form of Eq. 1, and not because of any relation to stimulus contrast.)
The second selectivity measure was the kurtosis of the probability density function of responses of each cell to the eight shapes
$$S_K = \frac{\langle (r_i - \bar{r})^4 \rangle}{\sigma^4} - 3 \tag{2}$$
where ri was the cell's response to the ith shape, r̄ was the mean response over all shapes, σ was the SD of the responses, and 〈·〉 was the mean value operator (Lehky et al. 2005). Subtracting 3 normalized the measure so that a Gaussian distribution had a kurtosis of zero. Larger values of SK corresponded to greater shape selectivity.
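As an illustration of the two selectivity indices, the following minimal sketch computes SC and SK for a single cell. It assumes that Eq. 1 has the usual contrast form (rmax − rmin)/(rmax + rmin) and that σ in Eq. 2 is the population SD (dividing by the number of shapes); the function names and the firing rates are hypothetical and are not taken from the recordings.

```python
from statistics import fmean


def contrast_selectivity(responses):
    """S_C = (r_max - r_min) / (r_max + r_min); assumes non-negative rates."""
    r_max, r_min = max(responses), min(responses)
    if r_max + r_min == 0:
        raise ValueError("all responses are zero; S_C is undefined")
    return (r_max - r_min) / (r_max + r_min)


def kurtosis_selectivity(responses):
    """S_K = <(r_i - mean)^4> / sigma^4 - 3 (population SD is an assumption)."""
    mean = fmean(responses)
    dev = [r - mean for r in responses]
    var = fmean([d * d for d in dev])
    if var == 0:
        raise ValueError("responses are constant; S_K is undefined")
    return fmean([d ** 4 for d in dev]) / var ** 2 - 3.0


# Hypothetical mean firing rates (spikes/s) of two cells to the eight shapes.
selective_cell = [30.0, 4.0, 3.0, 5.0, 4.0, 3.0, 5.0, 4.0]
broad_cell = [12.0, 10.0, 14.0, 11.0, 13.0, 9.0, 15.0, 12.0]

for name, cell in (("selective", selective_cell), ("broad", broad_cell)):
    print(name, contrast_selectivity(cell), kurtosis_selectivity(cell))
```

In this toy example, the cell that responds strongly to one shape and weakly to the rest scores higher on both indices than the broadly tuned cell, which is the behavior both measures are meant to capture.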
We quantified the relative magnitude of signal and noise in the responses of AIT and LIP neurons. This was measured as the Fano factor F
$$F = \frac{\mathrm{Var}(r_i)}{\mathrm{Mean}(r_i)} \tag{3}$$
where again ri was the cell's response to the ith shape. Mean and variance of the response were calculated from multiple presentations of the same shape during recording from a cell, across trials widely separated in time. Within each trial, only the first stimulus repetition was included, because otherwise, repetition suppression effects would have artificially elevated the Fano factor. We calculated a grand average value of F by taking the geometric mean across all shapes for all units selected by the ANOVA analysis. Results for shape stimuli that failed to elicit any response whatsoever in a particular neuron [Mean(ri) = 0.0] were excluded from the Fano factor grand average. This could happen if the spontaneous rate was very low or if the unit was suppressed for that particular shape.
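The Fano factor calculation can be sketched as follows. This is a minimal illustration, not the analysis code used in the study: it assumes the sample variance (n − 1 denominator), takes first-repetition firing rates as input, and additionally skips conditions with zero variance, whose logarithm would be undefined in a geometric mean. The names and numbers are hypothetical.

```python
import math
from statistics import fmean, variance


def fano_factor(rates):
    """Variance-to-mean ratio; returns None when the mean response is zero."""
    mean = fmean(rates)
    if mean == 0:
        return None  # excluded, as for shapes that elicited no response
    return variance(rates) / mean  # sample variance is an assumption


def geometric_mean_fano(rates_by_condition):
    """Geometric mean of F over all (unit, shape) conditions.

    Conditions with zero mean are excluded as described in the text;
    conditions with zero variance are also skipped here (an assumption).
    """
    factors = [fano_factor(rates) for rates in rates_by_condition]
    logs = [math.log(f) for f in factors if f is not None and f > 0]
    return math.exp(fmean(logs))


# Hypothetical first-repetition rates (spikes/s), one list per (unit, shape).
conditions = [
    [10.0, 14.0, 12.0],
    [0.0, 0.0, 0.0],  # no response at all: dropped from the grand average
    [4.0, 8.0, 6.0],
]
print(geometric_mean_fano(conditions))
```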
In addition to looking at the responses of individual neurons to the shapes, we examined the collective population response of all neurons within our data set (either AIT or LIP) to each shape. Although each neuron was recorded one at a time, for the purpose of doing these population-coding analyses, we treated them as responding in parallel to the presentation of a stimulus.
We first determined the sparseness of the population representation for each shape (Lehky et al. 2005). This was calculated as the kurtosis of the probability density function of response magnitudes of all neurons within the population. Calculating population kurtosis (sparseness) used the same formula as the cell kurtosis (selectivity) above
$$K = \frac{\langle (r_i - \bar{r})^4 \rangle}{\sigma^4} - 3 \tag{4}$$
but in this case, r had a different significance. Here, ri was the response of the ith neuron in the population to a given shape, r̄ was the mean response of all neurons in a population to that shape, and σ was the SD of those responses. Under some theoretical interpretations (Field 1994; Simoncelli and Olshausen 2001), high sparseness corresponds to statistically efficient coding of visual stimuli, based on criteria derived from information theory.
For graphical purposes, the probability density functions (PDFs) of population responses were determined using kernel density estimation methods (Silverman 1986). Smoothing was carried out using a Gaussian kernel with a bandwidth of 3 spikes/s (half-width at half height). A separate PDF was calculated for each of the eight stimulus shapes and averaged together to produce the overall AIT or LIP population response PDF.
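A minimal sketch of the smoothing step is given below. The only parameter taken from the text is the 3 spikes/s kernel bandwidth, stated as a half-width at half height (HWHM); for a Gaussian this converts to an SD of HWHM/√(2 ln 2), about 2.55 spikes/s. The function names, the evaluation grid, and the response values are hypothetical.

```python
import math

HWHM = 3.0  # kernel half-width at half height, spikes/s (from the text)
SIGMA = HWHM / math.sqrt(2.0 * math.log(2.0))  # equivalent Gaussian SD


def gaussian_kde(samples, x, sigma=SIGMA):
    """Kernel density estimate at x: mean of Gaussians centered on the samples."""
    norm = 1.0 / (len(samples) * sigma * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / sigma) ** 2) for s in samples)


def average_pdf(responses_by_shape, grid):
    """Estimate one PDF per shape, then average them point by point."""
    pdfs = [[gaussian_kde(resp, x) for x in grid] for resp in responses_by_shape]
    return [sum(column) / len(pdfs) for column in zip(*pdfs)]


# Hypothetical population responses (spikes/s) to two shapes.
responses_by_shape = [
    [5.0, 8.0, 12.0, 20.0, 35.0],
    [3.0, 9.0, 10.0, 15.0, 40.0],
]
STEP = 0.5
grid = [k * STEP for k in range(-40, 161)]  # -20 to 80 spikes/s
pdf = average_pdf(responses_by_shape, grid)
area = sum(pdf) * STEP  # Riemann-sum check that the PDF integrates to about 1
print(area)
```

Because a Gaussian kernel has unbounded support, the estimated density extends slightly below zero firing rate; that is a property of this sketch and may differ from how the published plots were truncated.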
CLUSTER ANALYSIS AND MULTIDIMENSIONAL SCALING.
We further characterized population responses to different shapes by performing cluster analysis and multidimensional scaling. As a first step for both those procedures, it was necessary to define a distance or dissimilarity matrix, providing the value of some suitable scalar metric that indicated the difference between population responses to every pair of stimulus shapes. As that distance metric, d, we chose d = 1 − r, where r was the correlation coefficient between the components of two vectors (x1, x2, …, xn) and (y1, y2, …, yn) defining the population responses to two shapes. Here each vector component (xi or yi) represents the response of the ith neuron to a particular shape.
A correlation-based distance metric was chosen instead of Euclidean distance to emphasize the pattern of relative firing rates within a neural population rather than absolute differences. We regard different shapes as being coded by different directions of the population vector, and not the vector length, which may be affected by such factors as the contrast and luminance of shape stimuli. The Euclidean distance metric, and related metrics such as d′, suffer from the disadvantage that they do not distinguish between changes in vector length and vector direction. On the other hand, the correlation metric picks out just changes in vector direction and ignores length, which we regard as a desirable characteristic. For example, if population response vectors for two stimuli had identical directions and differed only in length, the Euclidean metric would report that as having nonzero distance (different shapes), whereas the correlation metric would report that as zero distance (the same shape).
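The contrast between the two metrics can be checked numerically. The sketch below uses hypothetical population vectors in which one is exactly twice the other, so they share a direction and differ only in length; the function names are illustrative.

```python
import math
from statistics import fmean


def correlation_distance(x, y):
    """d = 1 - r, with r the Pearson correlation between two response vectors."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two response vectors."""
    return math.dist(x, y)


# Hypothetical responses (spikes/s) of five neurons to one shape, and the
# same pattern with every firing rate doubled.
x = [2.0, 10.0, 4.0, 8.0, 6.0]
y = [2.0 * v for v in x]

print(correlation_distance(x, y))  # zero up to rounding error
print(euclidean_distance(x, y))    # clearly nonzero
```

The correlation-based distance treats the two vectors as the same shape, whereas the Euclidean distance grows with the difference in overall firing rate.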
We performed a hierarchical cluster analysis based on the distance matrix defined above. The cluster analysis was carried out using the single linkage (nearest neighbor) method. This procedure grouped together stimuli that produced population response patterns that showed high correlations. In other words, shapes that clustered together indicated that the population of neurons fired similarly (relative firing rates) to these shapes.
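For readers unfamiliar with the single linkage rule, a minimal sketch follows. It is a naive agglomerative implementation written for clarity, not the software used for the analysis, and the four-shape distance matrix is hypothetical.

```python
def single_linkage(dist, labels):
    """Agglomerative clustering with the single linkage (nearest neighbor) rule.

    dist is a symmetric matrix of pairwise distances (e.g., d = 1 - r).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    index = {lab: i for i, lab in enumerate(labels)}
    clusters = [(lab,) for lab in labels]

    def linkage(a, b):
        # Distance between two clusters is that of their closest members.
        return min(dist[index[p]][index[q]] for p in a for q in b)

    merges = []
    while len(clusters) > 1:
        candidates = [
            (linkage(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        ]
        d, i, j = min(candidates)
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical correlation-based distances among four shapes.
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.1, 0.6, 0.7],
    [0.1, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.2],
    [0.7, 0.8, 0.2, 0.0],
]
merges = single_linkage(dist, labels)
for a, b, d in merges:
    print(a, b, d)
```

Shapes A and B merge first, then C and D, and the two pairs join last at the smallest distance between any member of one pair and any member of the other.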
We also performed a classical multidimensional scaling (MDS) analysis based on the same distance matrices. This served to reduce the dimensionality of the space representing each stimulus shape from potentially up to n dimensions, where n is the neural population size (although less if given a limited data sample size), to a smaller number of dimensions that capture most of the variance in the data. Such a low-dimensional representation allows easier visualization of patterns within the data. Furthermore, to the extent that MDS shows it is possible to form low-dimensional neural representations of shape compatible with the data, this indicates a possible advantage from a computational perspective, because a low-dimensional representation may reduce the computational load required for the representation and recognition of visual stimuli.
To better compare the MDS results for AIT and LIP, we carried out a Procrustes mapping (Borg and Groenen 1997) of the LIP configuration onto the AIT configuration. This involved finding a linear transform (scaling, rotation, translation, reflection) of the LIP configuration that minimized the mean square distance between the LIP points and AIT points. This procedure was carried out incorporating all dimensions generated by the MDS analysis that had positive eigenvalues (6 in AIT, 4 in LIP, plus 2 additional dimensions in LIP padded with 0s to match the dimensionality of AIT), with the results projected down to two dimensions for plotting purposes. Because we were interested in the relative configuration of points within the multidimensional shape space extracted by the MDS analysis and not their absolute locations, we used the Procrustes procedure to give an estimate of the degree of similarity between the shape encoding spaces used by AIT and LIP.
To determine the statistical significance of the goodness-of-fit value for the Procrustes mapping (mean square error), we performed a permutation test in which the goodness-of-fit value was repeatedly calculated for random permutations of the MDS configuration matrix. Random matrices were generated by permuting both the row and column indices of the original MDS matrix. The fraction of random matrices producing a goodness-of-fit measure better than the original observed value was calculated. If this fraction was greater than 0.05, it was concluded that the two inputs to the Procrustes mapping were not significantly different.
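We also calculated the two-dimensional correlation coefficient between the two-dimensional MDS configuration for AIT and the corresponding Procrustes mapping for LIP
$$r_{2D} = \frac{\sum_{i=1}^{m}\sum_{j=1}^{n}(A_{ij}-\bar{A})(L_{ij}-\bar{L})}{\sqrt{\left(\sum_{i=1}^{m}\sum_{j=1}^{n}(A_{ij}-\bar{A})^2\right)\left(\sum_{i=1}^{m}\sum_{j=1}^{n}(L_{ij}-\bar{L})^2\right)}} \tag{5}$$
where Aij and Lij were the coordinates of the ith shape along the jth MDS dimension for AIT and LIP, respectively, Ā was the mean value of AIT coordinates of the stimulus shapes in the MDS space, L̄ was the mean value for LIP, m = 8 was the number of shapes, and n = 2 was the number of MDS dimensions included in the calculation.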
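A two-dimensional correlation coefficient of this kind can be sketched as below, assuming the standard form in which a single grand mean is subtracted from every coordinate of each configuration before correlating. The function name `corr2` and the coordinate values are hypothetical, and three shapes are used instead of eight to keep the example short.

```python
import math


def corr2(a, b):
    """Correlation between two m x n coordinate matrices (lists of rows).

    Each matrix is centered on the mean of all its entries (a single grand
    mean), which is an assumption about the form of the coefficient.
    """
    flat_a = [v for row in a for v in row]
    flat_b = [v for row in b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("configurations must have the same shape")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    da = [v - mean_a for v in flat_a]
    db = [v - mean_b for v in flat_b]
    num = sum(p * q for p, q in zip(da, db))
    den = math.sqrt(sum(p * p for p in da) * sum(q * q for q in db))
    return num / den


# Hypothetical 2-D MDS coordinates for three shapes in two configurations.
config_a = [[0.0, 1.0], [2.0, -1.0], [-2.0, 0.0]]
config_b = [[0.5 * x + 1.0 for x in row] for row in config_a]  # scaled, shifted
config_c = [[-x for x in row] for row in config_a]             # sign-flipped

print(corr2(config_a, config_b))
print(corr2(config_a, config_c))
```

A configuration that is a scaled and shifted copy of another gives a coefficient of 1, and a sign-flipped copy gives −1, up to rounding error.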
CONTRIBUTION OF NOISE TO POPULATION SHAPE REPRESENTATIONS.
Differences in noise between the two cortical areas could artificially lead to differences in the dissimilarity matrices (i.e., greater noise, greater distances). Hence in addition to performing a Fano factor analysis of the responses, we split the data in half (even and odd trials) to directly estimate the contribution of noise to the dissimilarity matrix described above. We calculated the vector distance between even trials and odd trials for the same shape, which ideally should be zero. Because this procedure cut the volume of data in half for each vector from six to three trials, the contribution of noise was increased, leading to an overestimate of vector distances. We corrected for that taking into account the following two considerations. First, noise amplitude in the full-set data should be smaller than in the half-set by a factor of √2. Second, Monte Carlo simulations on a population of model neurons indicated that the (correlation-based) vector distance between noise-free and noise-degraded responses was approximately proportional to the square of the noise amplitude for small noise amplitudes. Combining those two factors, we estimated that vector distances in the full-set data would be one half those calculated from the half-set data.
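The split-half estimate and its correction can be sketched as follows. This is an illustrative reconstruction under the stated reasoning (noise amplitude smaller by √2 in the full set, distance proportional to the square of noise amplitude, hence a factor of one half); the function names, the trial ordering, and the firing rates are hypothetical.

```python
import math
from statistics import fmean


def correlation_distance(x, y):
    """d = 1 - r, with r the Pearson correlation between two response vectors."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    return 1.0 - sxy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def noise_self_distance(trials_by_neuron):
    """Split-half self-distance for one shape and its full-set estimate.

    trials_by_neuron[i] lists neuron i's responses to the shape in trial order.
    """
    half_a = [fmean(t[0::2]) for t in trials_by_neuron]  # every other trial
    half_b = [fmean(t[1::2]) for t in trials_by_neuron]  # remaining trials
    half_set = correlation_distance(half_a, half_b)
    # sigma_full = sigma_half / sqrt(2) and d ~ sigma^2  =>  d_full = d_half / 2
    return half_set, half_set / 2.0


# Hypothetical responses (spikes/s) of four neurons over six trials.
trials = [
    [10.0, 12.0, 11.0, 9.0, 10.0, 11.0],
    [20.0, 18.0, 22.0, 21.0, 19.0, 20.0],
    [5.0, 6.0, 4.0, 5.0, 7.0, 5.0],
    [15.0, 14.0, 16.0, 15.0, 13.0, 17.0],
]
half_set, full_set = noise_self_distance(trials)
print(half_set, full_set)
```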
We collected responses from a total of 103 AIT neurons (combined from 2 animals) and 76 LIP neurons (combined from 2 animals) to the eight shapes shown in Table 1. Histological localizations of the recording sites are shown in Fig. 2. Of these recorded neurons, 62/103 (60%) AIT neurons and 43/76 (57%) LIP neurons showed significant shape selectivity (P < 0.05) as indicated by an F-test (ANOVA). For AIT units showing significant shape selectivity, mean stimulus eccentricity was 4.3° (range: 2.2–10.0°), and for LIP neurons mean eccentricity was 10.5° (range: 7.4–15.3°). All further analyses discussed included only those cells that showed shape selectivity under this preliminary screening.
Neurons in LIP had substantially higher average firing rates than AIT neurons in response to the various shapes. This difference is apparent in plots of peristimulus responses as a function of time shown in Fig. 3, averaging over all shape-selective cells and all shape stimuli. Mean firing rate of LIP units to shape stimuli was 22 spikes/s compared with 12 spikes/s in AIT. This difference was significant at the P < 0.001 level. The same difference between LIP and AIT was apparent in the individual data from each of the two monkeys, with mean firing rates for monkey 1 being 11 spikes/s in AIT and 20 spikes/s in LIP, whereas for monkey 2, the mean firing rates were 13 spikes/s in AIT and 26 spikes/s in LIP. These data did not show significant changes in firing rate as a function of stimulus eccentricity (P > 0.1). Even when stimulus eccentricities were taken into account using an analysis of covariance (ANCOVA) procedure, the difference in activity between the two areas remained significant at the P < 0.001 level.
Although the two areas showed a large difference in firing rate in response to shape stimuli, spontaneous activity was not significantly different, being 8 spikes/s in AIT and 10 spikes/s in LIP (P = 0.14). Spontaneous activity was measured in the period preceding the first stimulus presentation for each trial. Average latency in LIP (62 ms) was much shorter than in AIT (101 ms), measured as time to half-height between baseline and peak firing in the pooled response plot (Fig. 3).
There were multiple presentations of the same stimulus within each trial (usually 2–4). Figure 3 shows both the response for the first presentation (solid line) and the average response over all repetitions (dashed line). Clearly stimulus repetition causes a response decrement in both areas. The data for all repetitions were normalized relative to the first repetition and plotted in Fig. 4. The decrement in activity caused by stimulus repetition is clearly visible in that plot. A two-way ANOVA was performed on the normalized data, with the two factors being number of repetitions and brain area (AIT or LIP), using an unbalanced ANOVA design because the sizes of the different groups were unequal. This analysis showed that the repetition decrement was significant (P < 0.02) but that there was no significant difference in repetition effects between the two brain areas (P > 0.1). Because repetition effects were not significantly different for the two areas, subsequent analysis pooled data from intratrial stimulus repetitions to increase the data sample size.
Another notable aspect of both AIT and LIP responses apparent in Fig. 3 is the appearance of maintained activity after the end of stimulus presentation. This was more prominent in LIP, where activity remained far above baseline even until the end of the trial, ∼400 ms after the latency-shifted stimulus offset. The long-duration maintained activity, typically associated with some sort of cognitive processing such as memory, occurred even though the monkey was only performing a fixation task.
LIP neurons were “noisier” than AIT neurons in the sense that individual neurons showed greater variability from trial to trial in response to a particular stimulus. The Fano factor (ratio of firing rate variance to firing rate mean; Eq. 3) for LIP neurons had a geometric mean of 3.29 and multiplicative SE factor of 1.05. For AIT neurons, the mean Fano factor was 2.81 with a SE of 1.04. The difference was significant (t-test, P < 0.02). In the context of measuring visual responses, “noise” could include any firing rate changes caused by other sources, including drifts in the cognitive state of the animal or adaptation to the stimuli.
The degree of shape selectivity for each cell was quantified by two different measures. The first was the contrast shape-selectivity index SC (Eq. 1), and the second was the kurtosis shape-selectivity index SK (Eq. 2). This latter measure incorporated information across the entire distribution of responses to all shape stimuli, not just maximum and minimum responses used with SC. In each case, larger values indicate greater shape selectivity.
Under both measures, AIT neurons showed higher average shape selectivity than neurons in LIP. For the contrast shape-selectivity index, SC(AIT) = 0.63 ± 0.09 (SE), whereas SC(LIP) = 0.45 ± 0.08, on a scale that ran from 0.0 to 1.0. For the kurtosis shape-selectivity index, SK(AIT) = 0.45 ± 0.50 and SK(LIP) = –0.30 ± 0.37, on an unbounded scale. The higher shape selectivity of AIT versus LIP neurons was significant at the P < 0.001 level under a two-sample t-test for both SC and SK measures.
An ANCOVA test on the stimulus selectivity indices indicated that eccentricity was highly nonsignificant as a factor affecting them (P > 0.8 for both indices). For the contrast selectivity measure SC, the difference between AIT and LIP remained significant after taking eccentricity into account as a nuisance variable (P < 0.02). The kurtosis selectivity index SK slightly missed significance under the same procedure (P = 0.07). However, because eccentricity was not a significant factor affecting SK, that result was likely caused by a loss in statistical power when the data were subdivided as a function of eccentricity, combined with the relatively high sensitivity of SK to noise. (Calculating kurtosis involves raising the data to the fourth power, which means noise is also raised to the fourth power.)
Calculating shape selectivity after subtracting average baseline activity did not change the observation of higher selectivity in AIT. It led to an increase in the mean of the contrast shape-selectivity measure for both AIT and LIP [SC(AIT) = 0.74 and SC(LIP) = 0.55] and produced no changes in the kurtosis shape-selectivity measure.
The higher shape selectivity in AIT held true for each monkey individually. For monkey 1, under the contrast selectivity measure SC(AIT) = 0.68 and SC(LIP) = 0.45, whereas for monkey 2, SC(AIT) = 0.52 and SC(LIP) = 0.45. Under the kurtosis selectivity measure, for monkey 1, SK(AIT) = 3.34 and SK(LIP) = 3.00, whereas for monkey 2, SK(AIT) = 3.74 and SK(LIP) = 2.51.
Histograms of shape-selectivity values for AIT and LIP neurons are shown in Fig. 5. The overall shapes of the histograms are broadly similar for both the SC and SK indices, suggesting that they are indeed both picking out similar aspects of neural responsiveness. The correlation coefficient between SC and SK was 0.53 for AIT and 0.44 for LIP, calculated from selectivity values taken on a cell-by-cell basis.
In addition to calculating the SC and SK selectivity indices, the greater shape selectivity of AIT neurons relative to LIP is shown a third way in Fig. 6, plotting normalized response as a function of the response rank order of the stimulus shapes. The greater selectivity of AIT neurons in shape space causes their responses to drop off faster than those of LIP neurons, when stimuli are sorted according to response rank.
To examine the encoding of shape by populations of neurons, we treated the response from different cells as occurring in parallel in response to the presentation of a particular stimulus shape. Our AIT population consisted of 62 cells from that area that showed shape selectivity, and the LIP population consisted of 43 shape-selective cells. Example population vectors from AIT and LIP to two shapes are shown in Fig. 7. The analyses below are based on such vectors.
We determined the PDF of responses of all neurons within a population to each of the shapes in our stimulus set. The PDFs were calculated from data using kernel smoothing density estimation (Silverman 1986). AIT and LIP population PDFs were significantly different for each of the eight stimulus shapes at the P < 0.001 level of significance under a Kolmogorov-Smirnov test. The average population PDFs for AIT and LIP (over all 8 shapes) are plotted in Fig. 8, showing a clear-cut difference in shape. The LIP PDF has a narrower peak and a heavier rightward tail.
The significance of the response PDFs (Fig. 8) is that their shapes provide an indication of the sparseness of visual representations in AIT and LIP neural populations, a topic of interest in theories of efficient neural encoding (Field 1994; Simoncelli and Olshausen 2001). One metric for quantifying population response sparseness based on PDF shape is the kurtosis of the PDF (Eq. 4). A PDF that is “peakier” (narrower peak) with heavier (thicker) tails than a Gaussian distribution has high kurtosis and therefore high sparseness. High sparseness, in turn, is an indication of efficient coding based on information theoretic criteria under some theoretical treatments (Field 1994; Simoncelli and Olshausen 2001), although this interpretation can be disputed (Lehky et al. 2005).
To quantify sparseness in the population coding of shape in AIT and LIP, we calculated the kurtosis, K, of the population response PDF for each stimulus shape. The results were K(AIT) = 1.3 ± 0.2 and K(LIP) = 4.8 ± 0.6. LIP exhibited much higher sparseness than AIT, and this difference was significant at the P < 0.001 level under a two-sample t-test.
Next, we quantified how population responses differed when presented with different stimulus shapes, using the distance metric defined in methods. Calculating the response distance between all pairs of shape stimuli produced a distance matrix (Table 1). Clearly, the distances in LIP are much smaller. Thus the pattern of responses within the LIP population to different shapes tends to be more highly correlated than it is in AIT. Subtracting average baseline activity had no effect on these correlation-based distance measures. Data from each individual monkey also showed the same pattern of smaller distances in LIP. For monkey 1, the average distance d̄(AIT) = 0.37 and d̄(LIP) = 0.18, whereas for monkey 2, d̄(AIT) = 0.33 and d̄(LIP) = 0.03.
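A minimal sketch of how such a distance matrix could be assembled is given below, assuming the correlation-based distance takes the form d = 1 − r, with r the Pearson correlation between two population response vectors. The helper names `pearson_r` and `distance_matrix` and the three toy response vectors are hypothetical and do not come from the recorded data.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def distance_matrix(population_vectors):
    """Pairwise d = 1 - r between population vectors (one vector per shape)."""
    k = len(population_vectors)
    return [[0.0 if i == j
             else 1.0 - pearson_r(population_vectors[i], population_vectors[j])
             for j in range(k)] for i in range(k)]

# Hypothetical responses of 5 cells to 3 shapes (illustrative only).
vectors = [
    [10.0, 2.0, 7.0, 1.0, 4.0],
    [9.0, 3.0, 8.0, 2.0, 5.0],   # pattern similar to the first shape
    [1.0, 8.0, 2.0, 9.0, 6.0],   # pattern roughly opposite to the first shape
]
D = distance_matrix(vectors)
for row in D:
    print([round(d, 2) for d in row])
```

Under this definition, d is near 0 for shapes that evoke nearly the same relative pattern across cells and approaches 2 for anticorrelated patterns.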
Although we used a correlation-based definition of distance as the basis of our presentation here, smaller distances in LIP also occur if we use the related Mahalanobis distance measure. For the Mahalanobis measure, mean distance in AIT was 20.8, and in LIP was 11.5, with the difference being significant at P < 0.001 under a t-test.
As differences in noise could affect the distance matrices, the magnitude of noise within population representations of shape was estimated by splitting the data in half (even and odd trials) and calculating population vector distances between the two halves for the same shape (analogous to the distances in Table 1). If the system were noise-free, the distances would all be zero. In fact, we calculated the median self-distance of the shape stimuli to be 0.08 in AIT and 0.02 in LIP. However, splitting the data in this manner reduces the sample size from six to three trials. If reduced sample size is compensated for, the distance measures drop in half (see justification for this in methods), so that distance in AIT is 0.04 and distance in LIP is 0.01. These last numbers can be compared with those in Table 1 to give a rough measure of the contribution of noise. The small value of the AIT same-shape distance indicates that noise within the AIT population encoding of shape is insufficient to account for the fact that AIT distances in Table 1 are much greater than those of LIP.
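The split-half estimate described above can be sketched as follows, again assuming a distance of the form d = 1 − r. Alternate trials for each cell are averaged separately to give two population vectors for the same shape, and the distance between them is returned. The sample-size correction mentioned in the text is deliberately not applied, because its justification lies in methods. The function names and toy trial data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_half_self_distance(trials_by_cell):
    """trials_by_cell[c] lists one cell's responses to the same shape on successive trials."""
    # Average alternate trials (positions 0, 2, 4, ... versus 1, 3, 5, ...) for each cell.
    half_a = [sum(t[0::2]) / len(t[0::2]) for t in trials_by_cell]
    half_b = [sum(t[1::2]) / len(t[1::2]) for t in trials_by_cell]
    return 1.0 - pearson_r(half_a, half_b)

# Hypothetical data: 4 cells x 6 trials for a single shape (illustrative only).
noise_free = [[5.0] * 6, [12.0] * 6, [3.0] * 6, [8.0] * 6]
noisy = [
    [5.0, 7.0, 4.0, 6.0, 5.0, 8.0],
    [12.0, 10.0, 13.0, 11.0, 12.0, 9.0],
    [3.0, 4.0, 2.0, 5.0, 3.0, 6.0],
    [8.0, 6.0, 9.0, 7.0, 8.0, 5.0],
]
print(split_half_self_distance(noise_free), split_half_self_distance(noisy))
```

With identical responses on every trial the two halves coincide and the self-distance is zero; trial-to-trial variability pushes it above zero.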
In both AIT and LIP, average response distance between pairs of shapes declined as stimulus eccentricity increased (Fig. 9). These distances are analogous to those shown in Table 1, but are calculated separately for neurons stimulated at different eccentricities. Even when eccentricity was taken into account as a nuisance variable in an ANCOVA, response vector distances in AIT remained significantly greater than in LIP (P < 0.05). Because stimulus size was increased as a function of eccentricity, this analysis unavoidably confounded those two variables.
The distances in Table 1 can further be used to order the stimulus shapes, such that those shapes having more similar population responses (smaller values of d) are placed closer to each other and shapes that have less similar population responses are placed farther apart. Although this can be accomplished using a variety of multivariate statistical techniques, we focus on two: cluster analysis and MDS.
Cluster analysis dendrograms based on the distance matrices in Table 1 are shown in Fig. 10. As expected from the larger response distances in AIT compared with LIP, the dendrogram for AIT is spread over a larger vertical scale than that of LIP. Interestingly, for AIT, the cluster analysis seems to have divided the eight shape stimuli into three groups, based on similarities in the populations' (relative) response patterns. The first group (yellow) consists of those shapes with strong vertical and horizontal features in their configuration. The second group (green) contains hollow, doughnut-like shapes. Finally, the third group (purple) has shapes that are triangular-like. In LIP, on the other hand, there is a much weaker differentiation of stimulus shapes into clusters (other than the outlier H-shape).
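To make the clustering step concrete, the following sketch performs agglomerative clustering on a precomputed distance matrix. The linkage rule used for Fig. 10 is not stated in this section, so average linkage is assumed here purely for illustration; the function name `average_linkage` and the toy 5 x 5 matrix are hypothetical and unrelated to Table 1.

```python
def average_linkage(D, n_clusters):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance."""
    clusters = [[i] for i in range(len(D))]
    merges = []  # (members_a, members_b, merge_height), in merge order
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean distance over all cross-cluster pairs of items.
                h = sum(D[i][j] for i in clusters[a] for j in clusters[b]) / (
                    len(clusters[a]) * len(clusters[b]))
                if best is None or h < best[0]:
                    best = (h, a, b)
        h, a, b = best
        merges.append((clusters[a], clusters[b], h))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return clusters, merges

# Hypothetical distance matrix with two tight pairs and one isolated item.
D = [
    [0.0, 0.10, 0.80, 0.90, 0.70],
    [0.10, 0.0, 0.85, 0.80, 0.75],
    [0.80, 0.85, 0.0, 0.15, 0.60],
    [0.90, 0.80, 0.15, 0.0, 0.65],
    [0.70, 0.75, 0.60, 0.65, 0.0],
]
clusters, merges = average_linkage(D, n_clusters=3)
print(sorted(sorted(c) for c in clusters))
```

The recorded merge heights correspond to the vertical positions of the joins in a dendrogram, so a matrix with uniformly small distances yields a vertically compressed tree.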
We conducted a classical (metric) MDS analysis based on distance matrices in Table 1 (cf. Young and Yamane 1992). As shown in Fig. 7, shape responses were embedded in the 62-dimensional space of the AIT neural population. For AIT, a three-dimensional projection of the MDS configuration accounted for 80% of the variance and a two-dimensional projection accounted for 67% of the variance. This is consistent with the hypothesis that AIT may be implementing a low-dimensional neural representation of shape. Similarly, for LIP, a three-dimensional projection accounted for 89% of the variance and a two-dimensional projection accounted for 82% of the variance. Even with the outlier H-shape removed, two dimensions still accounted for most of the variance in LIP.
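For reference, the standard textbook construction of classical MDS from a distance matrix can be written as below. This is the generic formulation, not necessarily the exact procedure used here; in particular, the definition of variance accounted for (VAF) and the handling of any negative eigenvalues, which can arise because d = 1 − r need not be a Euclidean distance, are assumptions.

```latex
% Classical (metric) MDS from an N x N distance matrix D (here N = 8 shapes)
\[
J = I_N - \tfrac{1}{N}\mathbf{1}\mathbf{1}^{\top}, \qquad
B = -\tfrac{1}{2}\, J\, D^{(2)} J, \qquad D^{(2)}_{ij} = d_{ij}^{2}
\]
% Eigendecomposition, eigenvalues sorted \lambda_1 \ge \lambda_2 \ge \dots
\[
B = V \Lambda V^{\top}, \qquad X_k = V_k \Lambda_k^{1/2}
\]
% Assumed definition of the variance accounted for by the first k dimensions
\[
\mathrm{VAF}(k) = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i:\,\lambda_i > 0} \lambda_i}
\]
```

Here the rows of X_k give the coordinates of the shapes in the k-dimensional projection, and VAF(2) and VAF(3) would correspond to the two- and three-dimensional percentages quoted above.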
A three-dimensional plot of the neural shape space produced by MDS is presented in Fig. 11A. The representations of the eight stimulus shapes within the AIT shape space are shown by the yellow, green, and purple dots, whose colors indicate the three groups of shapes previously picked out by cluster analysis (Fig. 10A). The three groups remain clearly separated for the MDS analysis, as they were for the cluster analysis. For LIP, the representations within its shape space are shown by blue dots. These are clumped near the origin because of the compressed scale of the LIP shape space relative to AIT (as would be expected from the relative magnitudes of the AIT and LIP distance matrices in Table 1). Data from each individual monkey also showed the same pattern of three clusters in AIT and a clustering of all shapes near the origin in LIP. Although both AIT and LIP are plotted on common axes for convenience, there is no necessity that the coordinate dimensions in the plot represent the same thing in each case, because the MDS analyses were carried out independently for the two brain areas.
A two-dimensional plot of neural shape space is presented in Fig. 11B. The representations of the eight shapes within shape space for AIT are plotted as star-shaped points. As before, the yellow, green, and purple colors code the three groups of shapes previously found by cluster analysis. Also plotted is a Procrustes mapping (Borg and Groenen 1997) of the LIP configuration (circles) onto the AIT configuration. The Procrustes procedure linearly scaled, rotated, and translated the LIP data to minimize mean square distance between the eight points within the AIT shape space and the eight points within the LIP space.
The two-dimensional correlation coefficient (Eq. 5) between the AIT and LIP configurations in Fig. 11B is r2D = 0.80. However, a permutation test on the goodness-of-fit value for the Procrustes mapping rejected the hypothesis that the AIT and LIP shape spaces were identical (P > 0.25). This permutation test showed that the Procrustes fit between the AIT and LIP coordinates in shape space was not significantly better than could be obtained by performing a Procrustes fit between the AIT coordinates and a random permutation of the LIP coordinates. Moreover, the average distance of the eight shapes from the origin remained much greater for AIT than LIP even after the Procrustes transform of the LIP data, with d̄AIT = 0.26 and d̄LIP = 0.14 (P = 0.02, paired t-test). Because the Procrustes transform included a scaling of the LIP shape space to maximize its congruency with the AIT space, the fact that a large difference in scale remained indicates that the Procrustes fit was constrained by major nonscale factors that differed between the two areas. Therefore despite some similarity indicated by the correlation coefficient, it seems that shape encoding in LIP is not simply an attenuated copy of shape encoding in AIT but includes notable differences that may reflect different goals within dorsal and ventral visual processing.
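The logic of the permutation test can be sketched as follows. Two-dimensional points are written as complex numbers, which gives a closed-form least-squares fit under translation, rotation, and uniform scaling. Reflections are not allowed in this simplified version, and the normalized residual used as the goodness-of-fit value is an assumption rather than the statistic used in the paper. The function names and the toy configuration are hypothetical.

```python
import random

def procrustes_residual(target, source):
    """Normalized residual (0 = perfect fit) after the best translation, rotation,
    and uniform scaling of `source` onto `target`; points are complex numbers."""
    n = len(target)
    t_mean = sum(target) / n
    s_mean = sum(source) / n
    t = [z - t_mean for z in target]   # centering removes translation
    s = [z - s_mean for z in source]
    cross = sum(sz.conjugate() * tz for sz, tz in zip(s, t))
    energy = sum(abs(z) ** 2 for z in s) * sum(abs(z) ** 2 for z in t)
    return 1.0 - abs(cross) ** 2 / energy

def permutation_p_value(target, source, n_perm=2000, seed=0):
    """Fraction of label shufflings of `source` that fit `target` at least as well."""
    rng = random.Random(seed)
    observed = procrustes_residual(target, source)
    shuffled = list(source)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if procrustes_residual(target, shuffled) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical 8-point configuration and an exactly rotated, scaled, shifted copy.
target = [complex(0, 0), complex(1, 0.2), complex(2, 1.5), complex(0.5, 2.2),
          complex(-1, 1), complex(-1.5, -0.7), complex(0.3, -1.8), complex(1.9, -1.1)]
source = [0.5 * z * complex(0, 1) + complex(3, -2) for z in target]

print(procrustes_residual(target, source) < 1e-9, permutation_p_value(target, source))
```

A small p value in this sketch means that the shape-by-shape correspondence fits far better than shuffled correspondences do; a large value, as reported above, means the observed fit is no better than what shuffled labels typically achieve.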
Shape selectivity in individual neurons
The majority of neurons in both AIT and LIP showed significant shape selectivity. Quantifying this effect further, we found that AIT neurons on average had higher shape selectivity than those of LIP (Fig. 5). In other words, AIT neurons were more narrowly tuned within neural shape space. Because neural responses to shape were measured under a simple fixation task, motor responses and cognitive factors such as attention were constant for the two cortical areas.
While many units in AIT showed low to moderate selectivity, there was a substantial subpopulation with quite high selectivity, a pattern of results similar to that seen in the AIT data of Op de Beeck et al. (2001). LIP, on the other hand, had few highly selective cells. The higher shape selectivity we observed in AIT neurons indicates that different shapes are represented within AIT more distinctively than in LIP.
In addition to shape selectivity, we observed two effects that may be related to cognitive processes. These were both apparent in the peristimulus response plots (Fig. 3). The first was a repetition suppression effect. When the stimulus was presented multiple times within a trial, the response declined (Fig. 4). The second was maintained activity during the time periods between stimulus repetitions, which was much more prominent in LIP. Because the monkeys were performing only a fixation task, these effects were likely to be associated with reflexive, stimulus-driven, “bottom-up” cognitive processing, rather than goal-directed or “top-down” processing.
That the repetition suppression was not purely a low-level biophysical adaptation effect, but rather a more complex, possibly cognitive phenomenon, is indicated by the fact that response decrements were reset between trials. The suppression was not cumulative over extended periods of repeated exposure. Repetition effects have been widely reported in AIT (Brown and Bashir 2002; Fahy et al. 1993; Miller et al. 1991, 1993; Xiang and Brown 1998). The reset property has also been noted previously in AIT (Holscher and Rolls 2002; Miller et al. 1991, 1993). Here we report that similar repetition effects occur in LIP.
Repetition effects in AIT have been associated with various types of memory, including priming, long-term recognition memory, and visual working memory (Miller et al. 1991; Xiang and Brown 1998). It is possible that such effects in LIP may similarly be associated with some form of memory, perhaps in a manner that reflects its role in spatial processing and attention. One possibility is that repetition decrements in LIP are involved in an attentional phenomenon called “inhibition of return” (IOR) (Itti and Koch 2002; Klein 2000; Posner and Cohen 1984; Sereno et al. 2006; Tipper et al. 1991), which biases attention away from returning to recently examined locations or objects in favor of novel stimuli. Inhibition of return correlates with attenuated activity in the superior colliculus (SC) (Bell et al. 2004; Dorris et al. 2002). However, this may be secondary to attenuated input to SC from parietal cortex (Dorris et al. 2002). If that is the case, the attenuation we observed in LIP activity during stimulus repetition may be an upstream source of the inhibition of return effects observed in SC.
Sparseness of shape representations
The probability density functions of neural responses within AIT and LIP populations are shown in Fig. 8. Both PDFs have high kurtosis (Eq. 4), with mean values K(AIT) = 1.3 and K(LIP) = 4.8 over all stimulus shapes. In particular, the LIP kurtosis is vastly larger than anything reported in the visual literature. Population response distributions with high kurtosis are said to have high sparseness. Under some theories of neural encoding, high sparseness has been interpreted as an indicator of statistically efficient coding of visual stimuli, based on information theoretic arguments (Field 1994; Simoncelli and Olshausen 2001). However, Lehky et al. (2005) have disputed any necessary connection between high sparseness and efficient coding, arguing that high sparseness may reflect deterministic nonlinearities in the system involved in implementing visual algorithms. Rather than efficient coding, the unusually high sparseness seen in LIP populations may be related to the implementation of visuomotor algorithms.
Population encoding of shape
Each shape can be treated as a point within a high-dimensional representation space. This may be a high-dimensional shape parameter space in psychophysics experiments (Cutzu and Edelman 1996; Edelman and Duvdevani-Bar 1997; Sugihara et al. 1998) or a high-dimensional neural space in neurophysiology experiments (for example, Kayaert et al. 2005; Op de Beeck et al. 2001), in which the size of the neural encoding population defines the dimensionality of the representation. The encoding of two shapes within AIT (n = 62) and LIP (n = 43) populations is shown in Fig. 7, where the heights of the histogram bars give the coordinate values along each dimension of the n-dimensional neural representation space.
Once the shapes are embedded in this n-dimensional space, the distances separating them can be calculated, in our case using a correlation-based measure of distance, d = 1 − r. The separations between all pairs of shapes are given in Table 1, showing LIP distances are substantially smaller than those of AIT (i.e., LIP population responses to different shapes are more correlated than in AIT).
The relatively small vector distances separating LIP responses to different shapes at the population level can in part be the result of the lower shape selectivity of LIP individual neurons. However, the differences between LIP and AIT in Table 1 appear too large to be fully accounted for by the relatively modest differences in shape selectivity seen in Fig. 5, and we suspect there may be another aspect to the matter.
The high correlations observed in the LIP population may reflect the fact that LIP is involved in sensorimotor integration and thus has both a sensory and a motor aspect to its function. Neural responses to different shapes would be expected to contain a large component related to motor response as well as the sensory component we are primarily concerned with here. Both the motor and sensory components would in general be stimulus dependent (shape-dependent in this case), and we shall therefore call them the “shape motor response” and the “shape sensory response.” If the monkey is in a fixed motor response state (e.g., simply fixating, as in this experiment), the shape motor response component acts as a constant background that is modulated by the shape sensory response component. The presence of this background dilutes the shape sensory responses and leads to high correlation between the overall responses to different shapes.
In mathematical terms, the LIP sensory response to shape X is given by an n-dimensional vector Rx = (x1, x2, …, xn) and sensory response to shape Y is Ry = (y1, y2, …, yn). In addition, there is shape motor response background activity denoted by the vector RB = (b1, b2, …, bn). RB remains the same for different shapes in the present task, but in other tasks may depend on the motor response required for a particular shape and the cognitive behavior required of the monkey. Total neural activities for the two shapes are given by

$$R_{Tx} = R_x + R_B, \qquad R_{Ty} = R_y + R_B \tag{6}$$

Although Rx and Ry may have only a low or moderate correlation when considered in isolation, when mixed with the invariant shape motor response component, the total responses RTx and RTy can become highly correlated. (Note that the background response cannot be a constant vector b1 = b2 = … = bn, because adding a constant to two random variables does not change their correlation.)
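The dilution argument can be checked numerically with a small sketch, assuming that the total response is the sum of the sensory vector and the background vector. All vectors below are hypothetical toy values chosen for illustration, and `pearson_r` is a helper defined only for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length response vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical sensory responses of 6 cells to shapes X and Y (illustrative only).
Rx = [4.0, 1.0, 3.0, 0.0, 2.0, 5.0]
Ry = [1.0, 4.0, 0.0, 3.0, 5.0, 2.0]
# Hypothetical background that differs across cells but not across shapes.
RB = [30.0, 5.0, 18.0, 42.0, 11.0, 25.0]

RTx = [x + b for x, b in zip(Rx, RB)]  # total response to shape X
RTy = [y + b for y, b in zip(Ry, RB)]  # total response to shape Y

r_sensory = pearson_r(Rx, Ry)
r_total = pearson_r(RTx, RTy)
# Same offset added to every cell: the correlation is unchanged.
r_uniform = pearson_r([x + 20.0 for x in Rx], [y + 20.0 for y in Ry])

print(round(r_sensory, 2), round(r_total, 2), round(r_uniform, 2))
```

In this toy case the sensory components alone are negatively correlated, yet the totals are strongly positively correlated once the large cell-specific background is added, whereas a uniform offset leaves the correlation where it started.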
Postulating a large shape motor response component to LIP responses seems plausible and attractive given the diverse array of factors affecting LIP activity (Colby and Goldberg 1999; Colby et al. 1996). Furthermore, this account would leave open the possibility that the shape-selective responses that are seen in a majority of LIP units may become structured differently in another behavioral context. Such a behavioral effect on the grouping of visual stimuli was found in an adjacent parietal region, AIP, where the similarity structure was based on the specific and different behavioral responses associated with those visual stimuli (Murata et al. 2000) and not visual similarity per se. In contrast to Murata et al. (2000), during this study, no response was allowed, and hence the different shapes were not associated with explicit and different behaviors. Comparing AIT with LIP, the shape motor response background within AIT would almost certainly be much smaller and perhaps negligible, so that AIT responses in a passive fixation task would be expected to show greater response distances (lower correlations) for different shapes.
Another factor to consider in seeking to explain the low discriminability (high correlations) among different shape responses in LIP relative to AIT is the possibility that discriminability in LIP neurons is better for three-dimensional shapes over the flat ones used here. Such a suggestion is plausible in light of the particular importance of three-dimensional space in parietal structures for purposes of sensorimotor coordination. However, this issue is not straightforward because AIT neurons (Janssen et al. 2000; Sereno et al. 2002; Tanaka et al. 2001) as well as LIP neurons (Gnadt and May 1995; Nakamura et al. 2001; Sereno et al. 2002) are sensitive to three-dimensional representations. Finally, it is possible that in a different context (e.g., shape discrimination task), AIT responses may change and show even greater response distances than reported here for a passive fixation task.
For the AIT neural population, the cluster analysis divided the eight stimulus shapes into three groups (Fig. 10A). These groups had members who clearly resembled each other (“yellow shapes”: dominated by horizontal and vertical edges; “green shapes”: variants of a hollow, doughnut-like ring; “purple shapes”: triangular-like). In LIP (Fig. 10B), on the other hand, given the small distances separating population responses to different shapes (Table 1), the cluster analysis produced a compressed, poorly differentiated hierarchy, with less than 0.05 response distance separating seven of the eight shapes.
The cluster analysis results show that, in AIT, roughly similar shapes produce similar patterns of activity within a neural population encoding shape. Thus we see in AIT both the potential ability to identify shapes based on a particular pattern of activity in a population and the capability to generalize and group shapes based on correlations in population activities. Indeed, the close connection between identification and generalization of patterns is emphasized by formal models within the experimental psychology literature (Ashby and Lee 1991; Nosofsky 1986).
The form of generalization we observe reinforces a population coding approach to categorization. Other studies in AIT that have grouped stimuli based on patterns of population responses include Hung et al. (2005), Rolls and Tovee (1995), and Tsao et al. (2006). A number of AIT studies have focused on categorization effects at the level of single neurons rather than the population level (Freedman et al. 2003; Sigala and Logothetis 2002; Vogels 1999). However, if one views a shape category as a particular region within a multidimensional shape space, examination of the properties of single cells in isolation rather than as populations becomes an inadequate approach to characterizing the system. We are not aware of any previous studies related to shape encoding in LIP, either at the population level or single unit level.
If categorization is indeed built on top of the sort of grouping we observed in AIT, LIP would be expected to be poor at visual categorization, given the relatively undifferentiated results of the cluster analysis for that area (Fig. 10B). That is, differences in responses to different shapes are so small that, in a noisy system, the shape space would need to be carved up into much coarser chunks to be reliably differentiated. Therefore given the small distances separating LIP population responses to different shapes (Table 1), LIP is not only expected to do worse in object identification than AIT, but also worse in object categorization.
The MDS analysis reinforced the results of the cluster analysis. Projection of the MDS configuration to three dimensions (Fig. 11A) picked out the same three groups of shapes in AIT as did the cluster analysis. In the same figure, the LIP configuration appears bunched near the origin, again because of the small distances separating LIP responses to different shapes.
Most of the variance in the data for both LIP and AIT was accounted for by the three dimensions plotted in Fig. 11A (in fact, most variance can be accounted for by just 2 dimensions). While this is consistent with previous reports that the visual system is encoding shapes within a low-dimensional space (Cutzu and Edelman 1996; Edelman and Duvdevani-Bar 1997; Op de Beeck et al. 2001; Sugihara et al. 1998), we cannot place too much significance on this low-dimensionality aspect of the analysis, given the limited number of shapes (n = 8) in our sample. Regardless of the nominally large dimensionality of a shape parameter space or neural population encoding space, N points (shapes) cannot possibly occupy more than an N − 1 dimensional space. With a small shape sample, even that limited space may not be homogeneously filled. Therefore the dimensionality reductions are somewhat less impressive than they might seem. While the reports cited above were aware of this “small sample” issue and constrained their experimental designs to deal with it, the issue of low-dimensional shape representation could still benefit from being re-examined with a larger sample of shapes.
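The N − 1 bound follows from a short rank argument, sketched here in generic notation (the symbols X, X_c, and J are introduced only for this illustration):

```latex
% X: N x n matrix holding N shape points in an n-dimensional space
% (n = 62 for AIT or n = 43 for LIP in the present data)
\[
X_c = J X, \qquad J = I_N - \tfrac{1}{N}\mathbf{1}\mathbf{1}^{\top}
\]
% J annihilates the constant vector, so rank(J) = N - 1, and therefore
\[
\operatorname{rank}(X_c) \le \min(N - 1,\; n)
\]
```

With N = 8 shapes, the centered configuration spans at most seven dimensions whatever the size of the neural population, so the variance is already confined to a seven-dimensional subspace before any further reduction.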
Overall, it seems that AIT and LIP, in the context of a passive fixation task, do not represent these simple, two-dimensional shapes within the same shape space. Specifically, under identical conditions, we found that the patterns of activity within neural populations in AIT show an enhanced capacity not only to discriminate between shapes but also to generalize and group shapes based on correlations in population activities compared with LIP. These differences between the two pathways suggest that the shape spaces may be tailored to the different purposes for which they are being used and are not simply mirror copies of each other.
As was discussed earlier, we suggest that the high correlation among LIP responses to different shapes could be caused by the shape sensory response component being diluted by a large, invariant shape motor response component. The invariance of the motor response component to different shapes in turn reflects the task conditions of this particular experiment. Given the important role of the parietal pathway in sensorimotor integration, it is possible that under different task conditions, the LIP shape space would look quite different (see Murata et al. 2000 for behavioral effects on visual responsiveness in the AIP area, adjacent to LIP).
In summary, in a first comparison of shape selectivities between the two visual pathways, we observed lower selectivity for two-dimensional shapes in LIP than in AIT neurons, suggesting that LIP may be capable of less precise and subtle identification of objects. When population activities were examined as a whole, responses in the LIP population to different shapes were less distinct than those in AIT and showed a more poorly differentiated grouping of similar shapes. The striking attenuation in the shape modulation of population responses in LIP points to a reduced capability for precise object identification and categorization, at least within the present behavioral context. The attenuation of the shape signal in LIP may reflect its mixture with other signals, perhaps related to LIP's role in sensorimotor integration. These findings clearly support the idea that shape selectivities in the dorsal pathway are in some measure independent and are not merely the duplication of those formed in the ventral pathway.
This work was supported by National Institute of Mental Health Grants R01 MH-63340 and R01 MH-65492, the National Alliance for Research on Schizophrenia and Depression, and the J. S. McDonnell Foundation.
We thank J. Maunsell for support of this project; C. McAdams for help in data collection; L. Cathey and V. Juneja for help with data analysis; and M. Mishkin, M. Sereno, and X. Peng for manuscript review.
The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
Copyright © 2007 by the American Physiological Society