Journal of Neurophysiology

Second-order receptive fields reveal multidigit interactions in area 3b of the macaque monkey

Pramodsingh H. Thakur, Paul J. Fitzgerald, Steven S. Hsiao

Abstract

Linear receptive field (RF) models of area 3b neurons reveal a three-component structure: a central excitatory region flanked by two inhibitory regions that are spatially and temporally nonoverlapping with the excitation. Previous studies also report that there is an “infield” inhibitory region throughout the neuronal RF, which is a nonlinear interactive (second order) effect whereby stimuli lagging an input to the excitatory region are suppressed. Thus linear models may be inaccurate approximations of the neurons' true RFs. In this study, we characterize the RFs of area 3b neurons, using a second-order quadratic model. Data were collected from 80 neurons of two awake, behaving macaque monkeys while a random dot pattern was scanned simultaneously across the distal pads of digits D2, 3, and 4. We used an iterative method derived from matching pursuit to identify a set of linear and nonlinear terms with significant effects on the neuronal response. For most neurons (65/80), the linear component of the quadratic RF was characterized by a single excitatory region on the dominant digit. Interactions within the dominant digit were characterized by two quadratic filters that capture the spatial aspects of the interactive infield inhibition. Interactions between the dominant (most responsive) digit and its adjacent digit(s) formed the largest class of cross-digit interactions. The results demonstrate that a significant part of area 3b responses is due to nonlinear mechanisms, and furthermore, the data support the notion that area 3b neurons have “nonclassical RF”-like input from adjacent fingers, indicating that area 3b plays a role in integrating shape inputs across digits.

  • somatosensory
  • tactile information
  • infield inhibition
  • generalized linear model
  • point process model

there is strong evidence that nonlinear mechanisms play a significant role in the initial processing of information in the sensory systems. In this report we investigate the nonlinear receptive field (RF) properties of neurons in area 3b of primary somatosensory cortex (SI) and its relevance to two aspects of somatosensory function. The first is the role of nonlinear mechanisms in two-dimensional (2D) form processing from a single finger. Previous studies have shown that somatosensory inputs related to the perception of 2D form are transformed from an initial isomorphic representation in the periphery into partially transformed responses in SI that are selective to spatial features of the stimuli (Hsiao et al. 2006; Phillips et al. 1988). One way of understanding these transforms is to understand the RF structures of the neurons.

One of the earliest attempts at characterizing the RF structures of neurons in area 3b of SI involved indenting the RF with temporally and spatially separated paired probes (Gardner 1984; Gardner and Costanzo 1980a, 1980b). These studies revealed that the RF has two components, a central excitatory core with a latency of ∼10 ms and a spatially overlapping inhibitory field called infield inhibition that is delayed with respect to the excitatory core. Pharmacological blockade of GABAergic transmission increases the size of the RF excitatory area, as well as the maximum response evoked at the RF center (Alloway and Burton 1991; Dykes et al. 1984). This suggests that infield inhibition is mediated by cortical interneurons, whereas the central excitatory core represents the thalamic feed-forward input. Thus understanding the respective contributions of excitatory core and infield inhibition to the neural responses may elucidate the roles that feed-forward inputs and local intracortical circuits play in imparting feature selectivity.

Infield inhibition has been also shown in intracellular recordings as delayed inhibitory postsynaptic potentials (IPSPs) (Andersson 1965; Innocenti and Manzoni 1972; Whitehorn and Towe 1968) and accounts for the decreased response to a test stimulus preceded by a conditioning stimulus within the RF (Gardner and Costanzo 1980a, 1980b; Laskin and Spencer 1979). Since it is only evoked by paired stimuli, and not revealed when single punctate probes are used as stimuli, which show area 3b RFs as simple homogeneous excitatory regions (Iwamura et al. 1983; Mountcastle and Powell 1959; Sur 1980, Sur et al. 1984), infield inhibition is an interactive second-order effect. Linear RF models estimated from responses to multiple simultaneous indentations, via either scanned embossed random dot patterns (DiCarlo et al. 1998) or random probe indentations (Sripati et al. 2006), capture inhibition via negative linear weights, but only in spatiotemporal regions beyond the excitatory core. These studies show two components of inhibition. The portion of infield inhibition that is temporally coincident with the excitatory core but spatially extends beyond the excitation is revealed as the temporally coincident component and gives the neurons a center-surround-like structure. The portion of infield inhibition that persists after the excitation has subsided is expressed as the lagging inhibitory component. However, since they are unable to discriminate between the linear excitatory effects and nonlinear inhibitory second-order effects in spatiotemporal regions where these two effects overlap, these linear models approximate the excitatory and inhibitory effects via an average of the two. Since excitation is stronger than inhibition, the resulting linear weights are positive in this region of overlap, and the representation of infield inhibition is lost. To fully disentangle the effects of excitation and inhibition, especially in regions of spatiotemporal overlap, it is essential to extend the linear models to include second-order terms to accurately capture interactive infield inhibitory effects.

The second aspect of somatosensory processing investigated here is multidigit interaction. Although neurons in the digit region of area 3b have traditionally been thought to be driven exclusively by inputs from single finger pads, optical imaging studies (Chen et al. 2001, 2005) indicate that subthreshold inputs to area 3b neurons may extend over an area substantially larger than suprathreshold inputs. Also, cortical plasticity studies (Wang et al. 1995) have shown that coincident presentation of stimuli on adjacent digits expands neuronal RFs in area 3b to span multiple digits, indicating that an underlying substrate for multidigit interactions may exist in area 3b. Recently, Reed et al. (2006) showed, using multiple probe stimuli, that responses of neurons with palmar RFs can be modulated by stimuli outside the classical RF, like nonclassical RF effects observed in the visual system (see Allman et al. 1985). The RF mechanisms that underlie these nonclassical effects in area 3b neurons are not known.

In this study, we estimate multidigit second-order RFs of area 3b neurons. Neural responses were obtained with methods used previously (DiCarlo et al. 1998) except that random dot patterns were scanned simultaneously across three distal pads (i.e., digits D2–4) of the awake macaque. RFs were then estimated by using a point process model that includes both linear and second-order (quadratic) terms. To circumvent the dimensional explosion due to inclusion of quadratic terms in the model, we used an iterative method based on matching pursuit (Mallat and Zhang 1993) to determine a subset of linear and quadratic terms that have the most salient effects on the neuronal responses. We then investigated how these nonlinear properties affect the neural responses.

METHODS

Animals and Surgery

Single-unit recordings were made with standard methods from three cortical hemispheres in two male rhesus monkeys (M. mulatta) weighing 5–7 kg. Briefly, before the recordings, animals were trained to sit quietly in a primate chair (see below). Surgery was then performed to secure a head-holding device and recording chambers (19-mm diameter) to the skull. Surgical anesthesia was induced with ketamine HCl (20 mg/kg im) and maintained with pentobarbital (10–25 mg·kg−1·h−1 iv). All surgical procedures were done under sterile conditions and were reviewed and approved by the Johns Hopkins Animal Care and Use Committee and in accordance with the rules and regulations of the Society for Neuroscience.

General Recording Techniques

All recordings were performed in awake, behaving monkeys. The monkeys were trained to sit in a primate chair with their hands restrained and were rewarded with sips of water at random intervals (about once every 7 s) while tactile stimuli (see below) were presented to the distal pads of digits D2, D3, and D4. On each recording day, a multielectrode microdrive (Mountcastle et al. 1991) was loaded with seven quartz-coated platinum-tungsten (90:10) microelectrodes (shaft diameter 80 μm, tip diameter 4 μm, impedance 1–5 MΩ at 1,000 Hz). On approximately half of the recording days, microelectrodes were coated with fluorescent dye (DiI; Molecular Probes, Eugene, OR) to facilitate later histological localization of the recording sites (DiCarlo et al. 1996). For all recordings, the tip of the microdrive was inserted into the recording chamber and oriented with the microelectrodes normal to the dura mater.

The seven microelectrodes were oriented at an angle of 45° to the mediolateral axis (parallel to the central sulcus) with center-to-center spacings of 400 μm. On each recording day, the microelectrodes were positioned within the recording chamber with micrometer-controlled translation stages (model 430; Newport, Irvine, CA) (accuracy within 5 μm).

To record in SI, we used a stereotaxic instrument to first center the recording chamber over the central sulcus [Horsley-Clarke (HC) coordinates: anterior 6. lateral 21]. The central sulcus was then located within the recording chamber based on the responses encountered on the microelectrodes. Responses anterior to the sulcus are motor, whereas responses posterior to the sulcus are somatosensory. Once the location of the sulcus was determined, the microelectrode array was adjusted to span the digit representation, just posterior to the central sulcus.

Because SI lies in the postcentral gyrus, we recorded first superficially before driving the electrodes deeper into cortex. Thus, for each position of the array within the chamber, recordings were first made in area 1, then on the next day in area 3b, and then on the third day in area 3a. After sampling the responses from the three areas, the microelectrode array was moved systematically along the postcentral gyrus until neurons ceased to be driven by the digits. Determination of which area we were recording from was based on the progression of RFs from the distal pads to the proximal pads and then to the distal pads as the electrodes passed from area 1 to area 3b and confirmed postmortem. Results in this report are based on the data collected from area 3b.

Data Collection

The signals from the seven channels were band-pass filtered and fed into a spike sorter (Alpha Omega), which detected and classified action potentials based on their similarity to spike templates. Neurons that met the following criteria were selected for analysis: 1) the action potentials were well isolated from noise and from other spike templates; 2) the neuronal firing was stable throughout the stimulus protocol; and 3) the neuronal RF, as determined by handheld probes, was located on the distal pads.

Stimulus

The stimuli consisted of raised dot patterns fabricated from sheets of photosensitive plastic that are water-soluble until exposed to UV light (Toyoba Printing Plastics, EF series). The patterns are produced by laying a photographic negative of the pattern over the plastic sheet and exposing it to UV light, which polymerizes and hardens the exposed plastic. The portion of the surface layer not exposed to UV light is scrubbed off lightly in water, leaving a raised pattern whose thickness equals that of the original water-soluble layer (380 μm). The stimulus pattern was specified in Postscript and printed at 3,386 dots/in. After construction, the sheet containing the stimulus patterns was wrapped around and glued to a cylindrical drum, 640 mm in circumference, which was mounted on a rotating drum stimulator (Johnson and Phillips 1988).

The random dot patterns were designed to be similar to the patterns used in a previous study (DiCarlo et al. 1998). Each dot was 380 μm high (in relief) and 400 μm in diameter at its top, with sides that sloped away at 60° relative to the surface of the drum. The location of the center of each dot was determined by selecting two random numbers from a uniform random number generator and scaling them to the height (40 mm) and width (560 mm) of the pattern. The dot centers were specified to a precision of 1 μm, and dots were allowed to overlap when the random number generator placed them closer than the dot diameter. Dots were added until the average dot density equaled 10 dots/cm2.

After one or more neurons with RFs (as determined by qualitative mapping) on the distal pads of digit D2, D3, or D4 were isolated, the drum with the random dot pattern was lowered and positioned so that it simultaneously scanned across the distal pads of digits D2, D3, and D4. The pattern was scanned in the proximal to distal direction at a speed of 25 mm/s, while a torque motor maintained a contact force of 90g. After each complete rotation, the drum was shifted laterally by 200 μm, thus presenting a slightly different pattern during successive scans. Data were collected continuously for a minimum of 30 scans, or ∼13 min. The contact patch of the finger on the drum was monitored with a metallic “hot” pin (1-mm diameter) placed on the surface of the drum. Contact of the finger with this pin triggered a series of square wave pulses that were stored as events in the data stream. This pulse signal was then used to align the section of stimulus pattern that was in contact with the pads at any given time step with the neural response.

Linear and Quadratic Receptive Field Estimation

The recorded spike train was binned (32-ms bin width) to yield a spike count vector (Rob = |r1, r2, …rT|) (see Fig. 1). Each stimulated pad was divided into subregions of size 800 μm × 800 μm, yielding 432 subregions (3 pads × 144 subregions each) that could potentially affect the neural response. For a linear model, each subregion was associated with a weight wi that defines its linear excitatory or inhibitory influence on the neural response. The neural firing rate at the nth step is related to the linear combination (εn) of the corresponding weights at each subregion multiplied by the number of dots xi(n) in that subregion at the nth time step (see Fig. 1A). Specifically, εn=w0+i=1432xi(n)wi(1) In this model, excitation and inhibition are captured by positive and negative weights, respectively.

Fig. 1.

Design of linear kernels. A demonstrates how the linear kernels and rate vector (Rob) were constructed for the random dot data. Discrete time steps were obtained by binning the entire duration of the experiment with suitable time bins. The nth entry in the rate vector array equals the number of spikes during the nth time step. Xab and Xac are two 1 pixel × 1 pixel kernels that span pixels at coordinates (a,b) and (a,c), respectively. The nth entry of each vector equals the number of dots within the areas spanned by the vectors during the nth time step. During the 1st illustrated frame, there is a dot at (a,b) but no dot at (a,c). Hence the corresponding entries in vectors Xab and Xac during this time step are 1 and 0, respectively. During the 2nd illustrated frame, there are no dots at (a,b) but a dot at (a,c). Hence the entries in the 2 vectors at this time step are 0 and 1, respectively. B shows construction of the quadratic kernels. The nth entry of each vector equaled the product of the number of dots in the 2 subregions (i,j) and (i,k) spanned by that vector during the nth time step.

The model was estimated in two steps. First, a subset of terms or pixels (usually <100) that had a salient contribution to the neural response was identified with a dimensionality reduction method based on matching pursuit (see appendix). Weights were then estimated by fitting a generalized linear model (GLM) (function glmfit, MATLAB; MathWorks) with a log-link function between the selected subset of terms and observed spike count vector. This assumes a Poisson distribution of spike counts [i.e., rnPoisson(eεn)], yielding a log-linear model in input covariates or subregions (x). Poisson assumption has been shown to be reasonable with a sufficient number of repeated trials or at intervals beyond refractory periods (Kass et al. 2005; Ventura et al. 2002).

For the quadratic model, in addition to the linear weights as described above, quadratic weights (wij) were defined to capture every pair of possible interactions between subregions (i,j) across the three pads (see Fig. 1B). Such weights denote the interaction between the specific pairs of subregions, and their values indicate the effect of stimulating the two subregions simultaneously on the neural response. Specifically, the model was defined as εn=w0+i=1432wixi(n)+i=1432j=1432wijxi(n)xj(n)(2) where xi and xj are the number of dots stimulating the ith and jth subregions during the nth time step. The second-order weight wij captures the effect of simultaneously indenting subregions i and j on the neural response. A negative (positive) weight indicates suppressive (facilitative) interaction due to a simultaneous stimulation of subregions i and j. On the other hand, the contribution of this term is zero if either of the two subregions i and j is not stimulated. Thus, under this model, any linear effect will be captured by the linear weights wi, which we refer to as the linear component of the quadratic model, whereas the interactive effects between pairs of subregions are captured by the second-order weights wij. If the infield inhibition is a linear effect it will be captured by negative linear weights, but if it is an interactive effect it will be captured by the second-order weights wij and will not be found in the linear component of the quadratic model.

Like the linear model, the quadratic model was estimated in two steps. First, a set of salient terms was identified with matching pursuit, followed by an estimate of the weights corresponding to those terms via fitting a GLM, yielding a log-quadratic model in input covariates or subregions (x) [i.e., rnPoisson(eεn)].

Linear and Quadratic Kernels Spanning Multiple Subregions

With an average dot density of 10 dots/cm2 (see above), the probability of stimulating a subregion at a given instant of time is P = 10/(12.5 × 12.5) = 0.064. With 598 bins each from 30 scans (N = 598 × 30 = 17,984 unique equations for fitting), the expected number of nonzero occurrences of stimulus dots at any subregion over the period analyzed is N × P = 1,148. However, the number of simultaneous occurrences of nonzero dot counts over two subregions (occurrences of simultaneous stimulation to uncover quadratic response) is N × P × P = 74. Because of such low prevalence of paired stimulation, the errors in estimating quadratic weights are likely to be higher. To improve the reliability of quadratic weights, we expanded our dictionary to include “kernels” that span multiple subregions (up to 5 subregions × 5 subregions or 4 mm × 4 mm) and used matching pursuit to iteratively pull out the best matching linear or quadratic kernel at any stage. For example, a linear kernel spanning 2 × 2 subregions was defined as the total number of dots within all 4 subregions spanned. A quadratic kernel spanning 2 × 2 subregions was defined as the product of the number of dots within the two 2 × 2 regions represented by that kernel. Mathematically, using such multiple subregion spanning kernels translates into assuming that weights are similar for all subregions spanned (and hence combining them and estimating a single weight instead). For example, a linear kernel BL spanning subregions x1, x2, x3, and x4 involves the following approximation: w1x1+w2x2+w3x3+w4x4w(x1+x2+x3+x4)wBL Similarly, a 2 × 2 quadratic kernel Bq representing a “combined” interaction between subregions x1, x2, x3, and x4 and subregions x5, x6, x7, and x8 involves the following approximation: w15x1x5+w16x1x6+w17x1x7+w18x1x8+...w45x4x5+w46x4x6+w47x4x7+w48x4x8+w(x1+x2+x3+x4)(x5+x6+x7+x8)wBq

Although the prevalence of paired stimulation of subregions is low, the prevalence increases dramatically as we consider kernels spanning multiple subregions. For example, the probability of finding at least one dot within 4 subregions is P = 10/(6.25 × 6.25) = 0.256. As a result, the expected number of occurrences of nonzero terms for a 2 × 2 quadratic kernel is N × P × P = 1,174, which is similar to a 1 × 1 linear kernel spanning just one subregion. For our analysis linear kernels spanning 1 × 1, 2 × 2, 3 × 3, 4 × 4, and 5 × 5 subregions and quadratic kernels spanning 2 × 2, 3 × 3, 4 × 4, and 5 × 5 subregions were included in the dictionary for matching pursuit; 1 × 1 quadratic kernels were excluded because of their low prevalence.

Multiple Instances of Linear and Quadratic Models via Randomized Matching Pursuit

Matching pursuit is an iterative algorithm that finds the best-fitting terms at any iteration from a dictionary of kernels. At the end of each iteration, it projects the remaining terms in the dictionary into a space orthogonal to the response vector, so that at subsequent iterations the algorithm picks up terms that best capture the portion of the response vector not accounted for up to the current iteration (see appendix). Given the iterative nature of the algorithm, a noisy term picked up at any step has a bearing on all subsequent terms picked up by the algorithm. Further, since the algorithm is fitting to a spike count vector, which is a single observation of a random process, it is likely to pick up noise at any step. To minimize the effects of noisy terms being picked up at any given step, we obtained multiple instances of each type of model by using randomization. Rather than picking the kernel with the maximum projection at every iteration, we randomly selected a kernel with projection >75% of the maximum projection. Because of the randomization, the algorithm selects a different sequence of vectors at each instance. We hypothesize that the signal terms or kernels are selected more often across the different instances since they should depend on the systematic component of the response common across all data sets, whereas a given noisy kernel will be selected only on a few runs since the noise component of the response will be dependent on the current rate vector at that iteration. The final weights obtained for each type of model were averaged across 25 such instances.

Expressing Second-Order Terms as Quadratic Filters

The excitatory and inhibitory effects of a linear RF can be easily visualized by plotting the weights at the corresponding spatial locations. However, second-order interactions between different spatial locations cannot be visualized directly from the quadratic terms. To obtain a visual display of the pattern of interactions between various spatial locations, we obtained an equivalent representation of Eq. 2 in terms of 2D quadratic filters. Specifically, Eq. 2 can be rewritten in matrix form: εn=w0+AnTWL+AnTWQAn(3) where An=[x1(n)x2(n):x432(n)],WL=[w1w2:w432]andWQ=[w11w12/2...w1,432/2w12/2w22...w2,432/2::...:w1,432/2w2,432/2...w432,432/2] WQ is known as the Hessian matrix and represents the interactive effect between pairs of subregions in a compact matrix form.

To obtain a visual display of the pattern of interactions between various spatial locations, an equivalent representation of the Hessian matrix can be obtained in terms of its eigenvalues (λi) and eigenvectors (qi): WQ=i=1432λiqiqiT Substituting in Eq. 3, εn=w0AnTWL+i=1432λi(AnTqi)2=w0+AnTWL+i=1432λi|λi|[AnT(|λi|qi)]2(4) The term λi|λi| is 1 if λi is positive and −1 if negative and determines whether the contribution of a particular eigenvector is suppressive (negative) or facilitative (positive).

Thus the eigenvectors, scaled by the root of the absolute value of the corresponding eigenvalue (i.e., λiqi), can be considered as quadratic filters that contribute to the response by the square of the convolution of these filters with the input stimulus pattern. Because of the eventual squaring operation, the overall sign of any quadratic filter is immaterial. (If a quadratic filter is multiplied by −1, so that the signs of the weights of all subregions are reversed, the contribution of the filter to the response remains unchanged.) Thus the effect of a given quadratic filter depends on the sign and magnitude of the corresponding eigenvalue, as well as the relative signs of the weights of different subregions in the quadratic filter. If a pair of subregions in the quadratic filter has the same sign, then a negative eigenvalue implies suppression, whereas a positive eigenvalue implies facilitation. On the other hand, if subregions have opposite signs, then a negative eigenvalue implies facilitative interaction, whereas a positive eigenvalue implies a suppressive interaction (see appendix for details on interpretation). These quadratic filters are analogous to second-order RF structures obtained via second-order methods such as spike-triggered covariance.

Separating Within-Digit and Cross-Digit Interactions

Instead of obtaining quadratic filters from the Hessian matrix directly, we transformed the matrix in such a way that it enables a direct and straightforward quantification, comparison, and interpretation of the different types of within-digit and cross-digit interactions. For every neuron we classified each of the three pads as 1) maximally responsive digit (Dmax: defined as the pad containing the subregion giving the highest linear response), 2) digit(s) adjacent to the maximally responsive digit (Dadj), and 3) nonadjacent digit (Dnonadj). The quadratic weights representing the interaction between two subregions were sorted into six classes of within-digit or cross-digit interactions depending upon the locations of the two subregions on the digits, namely, 1) Dmax × Dmax: both the subregions on Dmax; 2) Dadj × Dadj: both the subregions on Dadj; 3) Dnonadj × Dnonadj: both the subregions on Dnonadj; 4) Dmax × Dadj: one subregion each on Dmax and Dadj; 5) Dmax × Dnonadj: one subregion each on Dmax and Dnonadj; and 6) Dadj × Dnonadj: one subregion each on Dadj and Dnonadj.

Next, the Hessian matrix was expressed as a combination of six component matrices that represent specific types of within-digit or cross-digit interaction. Specifically, WQ=WDmax×Dmax+WDadj×Dadj+WDnonadj×Dnonadj+WDmax×Dadj+WDadj×Dnonadj+WDmax×Dnonadj

The (i,j)th element of the component matrix WDyxDz (which represents the interaction between digits y and z) is equal to the corresponding element of WQ if subregion i belongs to digit y and subregion j belongs to digit z or subregion i belongs to digit z and subregion j belongs to digit y and is zero otherwise. Thus this is a simple redistribution or sorting of the various terms of WQ into six groups (6 matrices) based on the location of the subregions that each term represents. Each component matrix, when subjected to an eigenvector/eigenvalue decomposition, yields quadratic filters that capture a specific type of interaction represented by the parent component matrix WDyxDz.

The quadratic filters that correspond to within-pad interactions on the maximally responsive digit (Dmax × Dmax) were obtained by setting all the quadratic weights, except the weights representing the Dmax × Dmax type of interactions, to zero before decomposing the Hessian matrix into eigenvalues and eigenvectors. Similarly, the quadratic filters representing the Dmax × Dadj type of interactions were obtained by setting all the quadratic weights, except the weights representing the Dmax × Dadj type of interactions, to zero before decomposing the Hessian matrix.

Comparing Strengths of Different Types of Interactions

The relative strength of the six classes of interactions was quantified by the ratio of the sum of squares of the quadratic weights belonging to a particular type of interaction to the sum of squares of all the quadratic weights (called fraction of quadratic energy captured by a particular type of interaction). The significance of each type of interaction was assessed by comparing the distribution with a null distribution obtained by repeating the entire model estimation procedure with the stimulus pattern on the adjacent digits shuffled along the time axis in the regression equations.

Strength of a Quadratic Filter

We again used measures based upon the sum of squares of quadratic weights to characterize the fraction of total quadratic strength accounted for by a given quadratic filter. If WDyxDz represents the Hessian matrix constructed from the weights corresponding to a particular type of interaction (DyxDz), then the sum of squares of all the weights corresponding to this particular interaction can be written as WDyxDz2=WDyxDzTWDyxDz=(i=1nλiqiqiT)T(i=1nλiqiqiT)=i=1nλi2 Also, the norm of the projection of a quadratic filter (defined as QFiQFiT) is λi2. Hence, the fraction of the quadratic strength of a component matrix accounted for by a given quadratic filter can be measured by the ratio λi2i=1nλi2.

Homogeneity of a Quadratic Filter

Since the relative signs of different regions of a quadratic filter determine the nature of interactive contribution of the filter to the neural response (see appendix), we estimated a homogeneity metric for each filter to characterize whether a given filter had similarly signed pixels throughout the RF or a balance of positive and negative regions. The RF homogeneity index measured (on a scale of 0–1) the relative strengths of regions with positive and negative signs within the quadratic filter. Specifically, if n1 and n2 are the sums of squares of positive and negative weights, respectively, then we define the homogeneity index as HI=|n1n2|n1+n2 HI = 0 indicates that the positive and negative regions within the quadratic filter have equal strengths, whereas HI = 1 indicates that all of the nonzero pixels of that quadratic filter have similar signs.

The strength of the contribution of a quadratic filter was measured as the ratio of the square of the corresponding eigenvalue to the sum of squares of all the quadratic weights representing that particular type of interaction.

Cross Validation

The data set was divided into two separate sets for the purposes of estimating free parameters (80%) and cross-validation (20%). We used the fraction of the variance in the 20% of data set aside that can be accounted for by a given model as a measure of predictive r2. Specifically, if yi is the observed response at the ith time step and ŷi is the response predicted by a given model, we obtained the total sum of squares (TSS) and the residual sum of squares (RSS) as TSS=i=1N(yi1Nj=1Nyj)2 RSS=i=1N(yiyi)2 In addition, we obtained an estimate of the variance attributed to noise (NSS) by assuming successive scans as repeats (see DiCarlo et al. 1998 for details). NSS=i=1NVar(Yi) where Yi = [yi, yiT, yi+T] Here T is the number of steps in each scan.

Predictive r2 was defined as r2=TSSRSSTSSNSS

The procedure was repeated 150 times, using a different set of 20% of the data each time. For each combination of 80–20 (data estimation-cross validation), model response ŷi was averaged across 25 instances of the model obtained by randomization (see above). The final predictive r2, linear and quadratic weights, and the predicted response reported in this article were averaged across all 150 repeats.

Linear Model with Spiking History

To test whether the improvements observed with a quadratic model over a linear model could be attributable to known neural nonlinearities such as spiking history (Paninski et al. 2004; Pillow and Simoncelli 2003; Truccolo et al. 2005, 2010), we also evaluated the goodness of fit of the linear model (Eq. 2) with terms added to capture spiking history. Specifically, the linear models derived by matching pursuit as described above were expanded by including a linear combination of terms denoting the effects of spiking history at a delay τ as follows: εn=w0+xi(n)wi+τ=1τmaxwhτ·rnτ Here whτ captured the weight/contribution of the neural firing at a history of τ ms.

The model was separately evaluated for multiple levels of spiking history such as τmax = 32 ms, τmax = 160 ms, and τmax = 320 ms by fitting a GLM, again yielding a log-linear model in linear terms and spiking history [i.e., rnPoisson(eεn)]. Note that unlike previous models, which let the algorithm iteratively pick salient terms from a dictionary of potential candidates, we simply appended terms for spiking history to the linear model obtained earlier (i.e., Eq. 1). As a result, if spiking history does not contribute to neuronal firing, it may lead to overfitting and hence poorer performance on cross validation.

RESULTS

Comparing Responses Predicted by Linear and Quadratic Receptive Fields

Figure 2 shows the responses of the linear model and the quadratic model (consisting of linear and quadratic terms as described in Eq. 2) of a typical 3b neuron. The linear RF (Fig. 2A) indicates that the major drive for this neuron is from D3. Consistent with previous studies (DiCarlo et al. 1998), the linear RF on this digit comprises an excitatory region (red) surrounded by an inhibitory region (blue). The inhibitory region is shifted distally on the finger pad with respect to the excitatory region, which could be due to spatial effects or due to inhibition that is delayed with respect to the excitation (DiCarlo et al. 1998; Gardner and Costanzo 1980b). Since the model fitting procedure correlates the responses with the instantaneous pattern of dots on the finger pad, this temporally trailing inhibition can only manifest in the linear RF by being shifted distally along the scan direction. However, trailing inhibition is an interactive effect between two dots separated along the temporal axis (Gardner and Costanzo 1980b). Since a linear RF, by definition, is restricted to just the linear terms, the fitting procedure accounts for this inhibitory effect by means of negative weights (DiCarlo et al. 1998) in regions on the pad surrounding the excitatory spot. When quadratic terms are included in the model, the linear component is composed solely of an excitatory area (Fig. 2B), with the trailing and infield inhibition being captured by the quadratic terms.

Fig. 2.

Comparison of responses of linear and quadratic receptive fields (RFs). A: structure of a linear RF of a sample neuron. Red (blue) indicates excitation (inhibition). The 3 squares indicate 1 cm × 1 cm regions around the centers of distal pads of digits D2, D3, and D4. B: the linear component of the quadratic RF for the same neuron. C: a section of the random dot stimuli (top) and the raw response as a spatial event plot (SEP) (1st raster). An estimate of the underlying firing rate is obtained by convolving the SEP with a 2-dimensional (2D) Gaussian filter of width (σ = 1 pixel) (2nd raster). The next 2 rasters show the responses of the linear RF (3rd raster) and the quadratic RF (4th raster). Each raster comprises 30 rows corresponding to 30 scans of 9.6 s each. Each successive scan was obtained by shifting the stimulus laterally by 200 μm. The last row (5th raster) shows the difference in the response predicted by the quadratic RF over that of the linear RF. Regions where the quadratic model outputs a higher (lower) response than the linear model are indicated in red (blue). At time steps A and B, the excitatory region of the linear RF was stimulated by 1 and 5 dots, respectively. At time step C, the number of dots in the inhibitory region of the linear RF exceeded the number of dots in the excitatory region of the linear RF. D: 1st column shows the stimulus pattern overlaid with the linear RF of the neuron at time steps A, B, and C. Columns 2–5 show 11 × 11 arrays of bins around A, B, and C for observed responses (2nd column), responses of the quadratic RF (3rd column) and linear RF (4th column), and the difference between the 2 model responses (5th column). Black dots in the panels correspond to time bins A, B, and C.

The first raster of Fig. 2C displays the observed response as a raw spatial event plot (SEP, consisting of 30 successive scans of the drum plotted as 30 rows). Note that the raw SEP reflects neuronal all-or-none firings, which are probabilistic events. To visualize the underlying firing rate, the second raster plot displays the same SEP convolved with a 2D Gaussian (σ = 1 pixel). The next two raster plots correspond to the responses of the linear (4th raster) and quadratic (3rd raster) RF over the data set. The bottom band (5th raster) displays the difference between the responses of the quadratic and the linear RFs. Regions where the quadratic model outputs a higher (lower) response than the linear model are indicated in red (blue). Overall, both the linear and quadratic responses qualitatively match well with the observed responses, suggesting that the linear models capture much of the primary responses of the neuron. However, at a finer scale, there are significant differences between the predictions of the two models. Examples of such differences are illustrated via enlarged images of three regions (namely, A, B, and C) in Fig. 2D.

The number of dots stimulating the excitatory region of the RF is higher at region B than region A. Hence the linear RF outputs a proportionately higher response at bins around B (compare the overall intensity of the top 2 panels in the 4th column of Fig. 2D). However, as indicated by the convolved SEP of Fig. 2D, the observed responses are unaffected by the dot density. This is because, unlike peripheral neural responses that increase their firing rate logarithmically as the average dot density within the RF increases, area 3b neurons are largely unaffected by the average dot density (Sripati et al. 2006). The quadratic terms correct for this difference by increasing the model response around A (red region in the top panel in the last column of Fig. 2D) and lowering the model response around B (blue region in the center of last column of Fig. 2D) such that the responses are similar around A and B, which matches the similarity of observed responses around A and B. As a result, the quadratic RF model provides a closer match to the observed responses and better mimics the invariance in the neural responses to changes in average dot density.

Region C in Fig. 2D corresponds to a time step when the number of dots in the inhibitory region of the linear RF exceeds the number of dots in the excitatory region. As indicated by the bottom panel of the fourth column in Fig. 2D, the linear RF predicts no response around this region in the raster, although the observed response shows neuronal firing (1st and 2nd columns, last row). This suggests that the inhibition in the linear RF is a result of a linear approximation to trailing inhibitory effects, which is in fact an interactive or quadratic effect. A quadratic filter is better at capturing the inhibition by second-order weights, and hence the linear component of the quadratic RF (Fig. 2B) differs from the purely linear model and lacks the distal inhibitory region. The result is a better match to the observed data (bottom panel of the 3rd column in Fig. 2D), as the quadratic model predicts a nonzero response around C similar to the observed response.

Characterization of Linear Component of Quadratic Field

The linear components were classified based on the strengths of the positive and negative weights within the three stimulated pads, with respect to the maximum weight. For most neurons (65/80), the negative weights within the dominant digit or the weights on the nondominant digits did not exceed 10% of the maximum weight, indicating that the linear response is dominated by a single excitatory region exclusively on a single pad (Fig. 3A, left). A few neurons had an inhibitory region surrounding the excitatory region (5/80) (Fig. 3A, 2nd from left), another excitatory region on an adjacent pad (multidigit) (2/80) (Fig. 3A, center), distinct multiple regions of excitation on a single pad (1/80) (Fig. 3A, 2nd from right), an inhibitory region on some other pad (2/80) (Fig. 3A, right), or no linear excitatory component at all (5/80) (not shown). Figure 3B displays the distribution of the excitatory area measured by counting the subregions with a positive weight. The mean area was 10.87 mm2 (min 3 mm2, max 21 mm2), which is similar to previous reports (DiCarlo et al. 1998: mean 14 mm2, min 3 mm2, max 43 mm2).

Fig. 3.

Characterization of the linear component of the quadratic RF. A: the different types of linear components along with their frequency encountered in the population. A majority of the neurons were characterized by a single excitatory component on the dominant digit (1st panel), and just a few had excitatory spots on multiple digits (3rd panel), multiple spots on a single digit (4th panel), or a single inhibitory spot (5th panel). The ellipses that were fitted to characterize the excitatory spot are also shown. B: distribution of the areas of the excitatory components measured by counting the pixels with positive linear weights. C: distribution of the eccentricities (measured as ratio of difference of lengths of the major and minor axes to their sum) of the fitted ellipse. D: distribution of the orientation of the fitted ellipses obtained as the orientation of the major axis.

The linear excitatory components were further characterized by fitting ellipses to the excitatory region (Fig. 3A). We measured the eccentricity of the fitted ellipse as the ratio of the difference between the lengths of the major and minor axes to the sum of the lengths of the two axes (0 = circular, 1 = sharply elliptical). The distribution of eccentricities across the population (Fig. 3C) has a mean of 0.09 (min 0.004; max 0.53), which is significantly less than 0.5 (P < 0.01; 1-sample t-test), indicating that the linear components of the excitatory part of the quadratic RF of many area 3b neurons are not sharply elliptical. The orientation of the major axis denotes the direction that the ellipse is oriented along. The distribution of orientations across the population has a peak around 90° (Fig. 3D), which corresponds to the distal-proximal axis, although the distribution is not significantly different from a uniform circular distribution (Raleigh test, P > 0.5).

Thus the salient characteristic of the linear component of the quadratic RF across the population was a mildly elliptical single-digit excitatory hot spot.

Quadratic Interactions Within and Between Digits

The quadratic terms were separated into six classes that were defined relative to the maximally responsive digit (see methods). The relative strength of each type of interaction is shown in Fig. 4A. As expected, the fraction of quadratic strength within a single digit is largest for the maximally responsive digit (34%). The adjacent and nonadjacent digits by themselves account for a small fraction of the total quadratic strength (9% and 5%, respectively). This is reasonable since neurons in area 3b are classically thought to have RFs confined to a single digit. Surprisingly, a large fraction of the total quadratic strength is accounted for by interactions between digits (45%). The largest interdigit interactions are between the maximally responsive digit and its adjacent digit (24%), followed by the interactions between the maximally responsive digit and the nonadjacent digit (12%) and interactions between the adjacent digit and the nonadjacent digit (9%). These distributions imply that there is a systematic reduction in the interaction within and between digits as we progressively move away from the center of the neuron's RF. The interactions within the maximally responsive digit and its adjacent digits account for nearly 60% of the total quadratic input to the response. These two components are characterized in greater detail in the next section.

Fig. 4.

Comparison of the different types of quadratic interactions. A: distributions of the fraction of quadratic energies captured by the 3 classes of within-digit interactions, namely, Dmax × Dmax (maximally responsive digit), Dadj × Dadj (pads adjacent to maximally responsive digit), and Dnonadj × Dnonadj (pad not adjacent to maximally responsive digit), and the 3 types of cross-digit interactions, namely, Dmax × Dadj, Dmax × Dnonadj, and Dadj × Dnonadj, with 2 different methods. Fraction of quadratic strength was measured as the ratio of the sum of the squares of quadratic weights corresponding to each type of interaction to the sum of the squares of all the quadratic weights (see appendix). B: cumulative plots comparing the distributions of fractions of quadratic strengths (blue) with those expected by chance (red). Chance-level distributions were obtained by shuffling the stimulus patterns on Dadj and Dnonadj across time in the model estimation procedure (see methods).

Testing Validity of Model Fits

In any system identification framework, the response obtained to any set of inputs can be characterized as having a systematic or driven component and a noise component. The aim of a system identification procedure is to capture as much of the systematic component without fitting the noise. Models can be tested for noise fitting with cross validation. However, given that we use matching pursuit with a very large dictionary of potential terms there is a possibility that the algorithm will find some combination of terms that fits the systematic component that does not have any functional meaning.

To test whether the quadratic terms that we observed are functionally relevant or are an artifact of blind curve-fitting, we obtained a null distribution of fractions of quadratic energies by repeating the model estimation procedure with the stimulus pattern on the Dadj and Dnonadj shuffled along the time axis in the regression framework. Shuffling the stimulus pattern on digits other than the maximally responsive digit removes systematic relationships between the neural responses and the actual stimulus pattern present on the digits. As such, quadratic terms related to Dadj and Dnonadj that are picked up after the shuffling are most likely a result of blind curve-fitting.

Figure 4B compares the null distribution (red curve) and the actual distributions (blue curves) of quadratic energies for the five types of interactions related to Dadj and Dnonadj. The null and actual distributions for Dmax × Dadj, Dmax × Dnonadj, Dnonadj × Dnonadj, and Dadj × Dadj types of interactions are significantly different (P < 0.0001; paired t-test), suggesting that these classes of quadratic interactions we observed are not fully attributable to noise. The null and actual distributions for the Dnonadj × Dnonadj type of interaction were not significantly different (P > 0.4; paired t-test) (Fig. 4B, top right), indicating that the within-digit interaction on the nonadjacent digits is small and is no better than the fit to the noise itself.

Within-Digit Quadratic Component on the Maximally Responsive Digit

We characterized the quadratic component on Dmax by estimating the quadratic filters corresponding to the most negative (termed filter Q1), and most positive (termed filter Q2) eigenvalues of the Dmax × Dmax quadratic component matrix.

Filter Q1.

Figure 5A shows a distribution of the RF homogeneity index (HI), which measures the relative strengths of regions with positive and negative signs within the quadratic filter (see methods) for filter Q1 across all the neurons. The distribution is significantly skewed toward 1 (H0: median = 0.5, P < 0.0001, Wilcoxon signed-rank test), indicating that for a majority of the neurons the nonzero subregions of filter Q1 tend to have similar signs. This is further illustrated by plotting filter Q1 for three sample neurons from three different locations on the distribution. Neuron C, which is drawn from a region with high HI (HI = 1), is characterized by a single large region with a negative sign. Neuron B, drawn from the center (HI = 0.55), again has a dominant large region with a positive sign, flanked by a small region of negative sign on the proximal side. Neuron A, which represents a small population of neurons with low HI (HI = 0.17), has two regions with opposite signs aligned along the proximal to distal or scan direction.

Fig. 5.

Characterization of filter (Q1) corresponding to most negative eigenvalue of Dmax × Dmax interaction matrix. A: distribution of homogeneity index that measured the relative strengths of positive and negative regions within the filter on a scale of 0 (balance between positive and negative regions) to 1 (dominated by a single sign). Sample filters from 3 different ranges of the distribution are shown at bottom. B: difference between the centers of the single largest lobe of this quadratic filter and the center of the excitatory component of the linear component. C: distribution of area of the single largest lobe. D: distribution of the ratios of the area of linear component to the area of the single largest lobe of filter Q1. E: distribution of the eccentricities of ellipses fitted to the single largest lobe of filter Q1 across the population. F: distribution of orientation of the fitted ellipse (measured as orientation of major axis of fitted ellipse). G: distribution of the fraction of Dmax × Dmax quadratic energy captured by filter Q1, measured as the ratio of the square of the eigenvalue associated with this filter (equals the Frobenius norm of the projection matrix associated with this filter) to the sum of squares of all Dmax × Dmax quadratic weights (see appendix).

Filter Q1 was further characterized by segmenting the largest contiguous lobe. Figure 5B, which plots the offset between the center of this lobe and the center of the corresponding linear component, indicates that for most neurons the center of this lobe spatially overlaps with the linear component (mean x offset 0.33 ± 0.94 mm: mean y offset 0.14 ± 1.13 mm). The mean area across the population (Fig. 5C) is 51.4 mm2, which is significantly greater than the area spanned by the linear component (mean area 10.87 mm2) (P < 0.0001; paired t-test). This is further illustrated in Fig. 5D, which plots the ratio of the two areas. The linear component spans only ∼30% of the area spanned by the largest similarly signed contiguous region of Q1 across the population.

We characterized the shape of this region by fitting an ellipse and estimating the eccentricity (Fig. 5E) and orientation (Fig. 5F) of the fitted ellipses. The mean eccentricity across the population is 0.17, which is significantly higher than the mean eccentricity of the linear component (P < 0.0001; paired t-test). The distribution of orientation of the major axis had a peak at 90°, which corresponds to the distal-proximal axis, and the distribution is significantly different from a uniform circular distribution (Raleigh test, P < 0.01). Figure 5G shows a distribution of the fraction of the Dmax × Dmax type of quadratic strength accounted for by this filter (see methods). On average, this filter accounts for ∼59% of the total within-pad quadratic strength on the maximally responsive digit.

Thus the salient characteristic of filter Q1 across the population was a single contiguous region, either positive or negative, that was larger in size, more elliptical, and spatially overlapped with the linear component of the quadratic RF.

Filter Q2.

In contrast to Q1, the distribution of the homogeneity index for filter Q2 (Fig. 6A) is significantly skewed toward 0 (H0: median = 0.5, P < 0.0001, Wilcoxon signed-rank test), indicating that for a majority of the neurons the strengths of regions with positive and negative signs are balanced. This is illustrated by plotting filter Q2 for three sample neurons from three different locations on the distribution. Neurons A and B, which are drawn from regions with low to moderate HIs (0.14 and 0.53, respectively) and are representative of the majority of the neurons, are characterized by a positive and a negative lobe each aligned nearly along the distal-proximal axis. Neuron B, drawn from the center (HI = 0.53), has, in addition to the two lobes, a few small regions in the surrounding space. Neuron C, which represents the small population of neurons with high HI (HI = 0.78), has a large positive region flanked by a few weak negative regions.

Fig. 6.

Characterization of filter (Q2) corresponding to most positive eigenvalue of Dmax × Dmax interaction matrix. A: distribution of homogeneity index that measured the relative strengths of positive and negative regions within the filter on a scale of 0 (balance between positive and negative regions) to 1 (dominated by a single sign). Sample filters from 3 different ranges of the distribution are shown at bottom. B: fraction of the total structure on this filter captured by the single largest positive and single largest negative lobe. C: difference between the mean of the centers of largest positive and largest negative lobes of filter Q2 and the center of the excitatory component of the linear component. D: orientations of the line joining the centers of the 2 lobes. E: distribution of the fraction of Dmax × Dmax quadratic energy captured by filter Q2. F: distribution of the fraction of Dmax × Dmax quadratic energy captured by both filters Q1 and Q2.

Filter Q2 was characterized by segmenting the largest contiguous positive and negative lobes. We measured the fraction of the total Q2 filter structure captured by these two lobes by taking the ratio of the sum of squares of weights of the pixels within the two lobes to the sum of squares of all nonzero weights within the filter (Fig. 6B). Across the population, the two lobes capture ∼86% of the total structure of filter Q2. Figure 6C, which plots the offset between the mean of the centers of the two lobes and the center of the corresponding linear component, indicates that, like filter Q1, for most neurons the two lobes spatially overlap with the linear component (mean x offset 0.58 ± 1.4 mm; mean y offset 0.5 ± 1.7 mm). Figure 6D shows the distribution of orientation of the line joining the centers of the two lobes. The distribution has a peak at 90°, which corresponds to the distal-proximal axis, and the distribution is significantly different from a uniform circular distribution (Raleigh test, P < 0.05). On average, this filter accounts for ∼12% of the total within-pad quadratic strength on the maximally responsive digit (Fig. 6E). Filters Q1 and Q2 together capture ∼71% of the total quadratic energy on the maximally responsive digit (Fig. 6F).

Thus filter Q2 was characterized across the population by a pair of lobes, opposite in sign and aligned along the proximal-distal axis or the scan direction.

Functional Roles of Quadratic Filters

To illustrate the roles played by the filters characterized above, we compared output responses of the linear component and the two quadratic filters Q1 and Q2 with the observed responses for a sample neuron (Fig. 7). The outputs for the quadratic filters are obtained by squaring the convolution of the filters with the instantaneous stimulus patterns scaled by the corresponding eigenvalue (see methods). Since the squaring operation always results in a positive number, the sign of the output response (i.e., whether the effect is facilitative or suppressive) is determined by the sign of the corresponding eigenvalue. Q1 provides a suppressive interaction, which is evident by the negative contributions (blue) shown in Fig. 7 (4th raster). In contrast, the contribution of Q2 lacks any negative predictions (blue color) because of the positive sign of the eigenvalue (Fig. 7, 5th raster).

Fig. 7.

Contributions of the linear component and quadratic filters Q1 and Q2 to the overall quadratic RF response. Top left: estimate of the underlying firing rate obtained by convolving the raw SEPs with a 2D Gaussian (σ = 1 pixel) (1st raster), the complete output of the quadratic model (2nd raster), outputs of just the linear component (3rd raster), quadratic filter corresponding to the most negative eigenvalue, i.e., Q1 (4th raster), and quadratic filter corresponding to the most positive eigenvalue, i.e., Q2 (5th raster). At time step A, the linear component of the quadratic RF for this neuron was stimulated by a single dot. At time steps B and C, the multiple dots stimulating the neuron's RF were either in a single region of Q2 (C) or spread evenly between the 2 regions of Q2 (B). Black circles highlight time steps A, B, and C in each raster.

The roles of the three filters in shaping the neural response are further illustrated via three time bins A, B, and C. The panels on the top right and bottom of Fig. 7 show the stimulus patterns during these time bins overlaid with the linear component and the two quadratic filters for the sample neuron. The corresponding locations of these time points are highlighted by black circles on each raster. The neuron is stimulated by a single dot at A but multiple dots at B. Since a linear filter linearly scales its output with the number of dots, the response of the linear component is significantly higher at B than A (compare the 2 left circles on the 3rd raster in Fig. 7). However, the observed responses around these bins, as highlighted by the corresponding circles on the first raster, indicate that the neuron is slightly inhibited around B compared with A. Filter Q1 corrects for this discrepancy by contributing a strong suppressive response at B (compare the corresponding circles on the 3rd raster).

The example denoted by region C also corresponds to a time bin when the stimulus consisted of multiple dots in the responsive region. However, unlike B, which corresponds to an instance when the multiple dots are spread evenly between the two lobes of filter Q2, C corresponds to an instance when the multiple dots were located within one of the two lobes of filter Q2. Q2 provides no response at B but provides a strong facilitative contribution at C (last 2 panels in last column, Fig. 7). Thus Q2 masks or reduces the suppression provided by Q1 if a pair of dots is located within one of the two lobes of Q2 but leaves the suppression unaffected if the two dots happen to be in the two lobes. For the majority of the neurons, the two lobes of Q2 are nearly aligned along the proximal-distal direction (Fig. 7), which is also the scan direction. This shows that filter Q2 does not mask the suppression provided by Q1 if the pair of dots is aligned along the scan direction.

Thus filter Q1 provides a nonspecific suppression in a region overlapping with the neuronal RF that scales quadratically with dot density within the RF and offsets the linear scaling with dot density provided by the linear component of the RF. Filter Q2 modulates the suppression provided by Q1 depending upon where the dots are located with respect to the scan axis. It reduces the suppression provided by Q1 if the multiple dots are located orthogonal to the scan axis but leaves the suppression unaffected if the dots are located along the scan axis. In other words, filter Q1 captures the spatial extent of infield inhibition, whereas filter Q2 captures the temporal or trailing effects of inhibition along the scan direction.

Characterizing Quadratic Interaction Between Digits

As mentioned above, the interaction between the maximally responsive digit and its adjacent digits was the second largest quadratic component across the population. In this section, we characterize this component by looking at the quadratic filters corresponding to the most positive and most negative eigenvalues of the Hessian matrix that represents this type of interaction. These filters are complementary with respect to each other in the sense that they have identical spatial structure, except that the overall sign on one of the digits is reversed (see sample examples in Fig. 8A). Thus we will characterize the spatial properties of just one of the two filters.

Fig. 8.

Characterization of the interaction between the maximally responsive digit and its adjacent digit. A: distribution of homogeneity index linear components of the quadratic RF for 2 sample neurons. For the 2 neurons, quadratic filters corresponding to the most negative eigenvalue (1st column) and the most positive eigenvalue (2nd column) of the Dmax × Dadj component matrix are shown together with the corresponding linear component (3rd column). B: ratio of the area of the quadratic filters on the maximally responsive digit to the excitatory area of the linear component on the maximally responsive digit. C: difference between the centers of the part of the quadratic filter on the maximally responsive digit and the centers of the excitatory regions of the linear component. D: distribution of the quadratic strength of these filters, measured as the ratio of the sum of the squares of the eigenvalues associated with these filters for a neuron (equals the Frobenius norm of the projection matrix associated with this filter) to the square of the norm of the Dmax × Dadj component matrix. E: raw SEP for a sample response showing observed response (1st raster), an estimate of the underlying firing rate obtained by convolving the SEP with a 2D Gaussian (σ = 1 pixel) (2nd raster), single-digit RF output (3rd raster), and multidigit RF output (4th raster). The last raster plots the difference between the outputs of the multidigit and single-digit RFs.

Figure 8A shows the distribution of the homogeneity index for the components of this filter on the maximally responsive digit (top) and the adjacent digit (bottom). The two distributions are significantly skewed toward 1 (P < 0.0001, Wilcoxon rank sum test), indicating that for the majority of the neurons the components of these filters on each pad tend to have similar signs. If the components on the two digits are opposite for the filter corresponding to the most negative eigenvalue (or similar for the filter corresponding to the most positive eigenvalue, since the two filters have complementary structures), it indicates facilitative interaction (see appendix for interpretation). The majority of neurons with a linear excitatory component on Dmax (52/75) had facilitative interaction between Dmax and Dadj, as indicated by the relative signs of the largest lobes of the filters on the two digits. Figure 8B, which plots the ratio of area spanned by the linear component to the area spanned by the largest lobe of this filter on the maximally responsive digit, indicates that the linear component spans only ∼33% of the area spanned by the largest lobe. Figure 8C plots the difference between the center of the largest lobe on the maximally responsive digit and the center of the corresponding linear component. Compared with filters Q1 and Q2 described above, the largest lobe of this filter has a greater offset with respect to the linear component (mean x offset 0.66 ± 1.79 mm, mean y offset 0.82 ± 2.2 mm). Across the population, these filters capture ∼30% of the total Dmax × Dadj type of interaction (Fig. 8D).

Figure 8E illustrates the contribution of the multidigit quadratic terms in the response SEP of a sample neuron. The top two rasters plot the raw SEP and an estimate of the underlying firing rate [obtained by convolving with a 2D Gaussian (σ = 1 pixel)], while the third and fourth rasters plot the outputs of single-digit quadratic and multidigit quadratic models, respectively. The last raster we plot the difference between the outputs of the two models. Red spots (highlighted by blue circles) on the last raster indicate regions on the SEP where the multidigit RF response was significantly greater than the single-digit RF. The corresponding locations on the neuronal response (highlighted by red circles) with the difference plot shows that a majority of the differences correlate with regions with bursts of high firing. This indicates that the presence of coincident stimuli on the maximally responsive digit and the adjacent digit (activating the Dmax × Dadj interaction terms) tends to increase the local firing rate to specific features.

Predictive Power of Models

We used cross validation to compare the predictive power of the quadratic model with that of the linear model by predicting responses to 20% of the data based on weights estimated from 80% of the data. Because of the approximation built into the estimate of the portion of the total sum of squares attributable to noise (i.e., NSS; see methods), individual estimates of predictive r2 may not be bounded between −1 and 1 like traditional regression. For example, individual samples in Fig. 9A, which shows the distribution of 150 samples of predictive r2 for the linear (blue) and quadratic (red) models for an example neuron, may exceed 1 depending upon the errors in the estimate of NSS for that data set. However, on a population level across multiple runs, we expect them to converge to the underlying true predictive r2. Thus absolute values of predictive r2 for any model may be prone to errors (see DiCarlo et al. 1998 for a description of the error), but a comparison across models on the same data set is independent of such errors, since they would be common across all models. Thus, even though we see considerable variability in estimates of predictive r2 for a particular model, the distribution for the quadratic model for the neuron in Fig. 9A is significantly greater than the distribution for the linear model, indicating that the quadratic model has significantly more predictive content for this neuron. Figure 9B compares the median predictive r2 across 150 samples for the linear model with the quadratic model for all neurons. For essentially all of the neurons (79/80), the quadratic model performs substantially better than the linear model, indicating that a quadratic model is systematically better at capturing the interactive infield inhibition and nonclassical RF effects.

Fig. 9.

Cross validation performance of the linear and quadratic RFs. The weights were estimated using 80% of the data. Cross validation was measured as the fraction of the variance of the remaining 20% of data that is accounted for by a given model. A shows the distribution of predictive r2 for the linear (blue) and quadratic (red) models for a sample neuron across the 150 repeats using a unique set of 20% data for prediction over each repeat. B compares the average (median) predictive r2 for the linear (x-axis) and quadratic (y-axis) models across all neurons. C compares the average number of terms (free parameters) in the linear (x-axis) and quadratic (y-axis) models. D compares the predictive r2 between the original linear model with an extended version of the linear model with number of free parameters matched to the corresponding quadratic model for that neuron. E compares the predictive r2 for the original linear model with versions of the linear model that incorporated spiking history over 32 ms (red), 160 ms (green), and 320 ms (blue).

The iterative algorithm used to select a subset of terms for the linear and quadratic models was terminated based upon a stop condition derived from an expected χ2-distribution of chance level projections (see appendix). Using an identical stop condition (α = 0.01; see appendix) for the linear and quadratic models (Fig. 9B) maintains parity between the two models but results in a larger number of terms in the quadratic model (Fig. 9C). Smaller order or number of free parameters of the linear model could potentially yield poorer performance due to underfit. We obtained a variant of the linear model by letting the iterative algorithm run until the number of terms equaled those of the corresponding quadratic model (order-matched linear model). Although matching the order to the quadratic model provided a modest improvement in the predictive r2 (Fig. 9D), it is still substantially lower than that of the quadratic model, indicating that the improved fit provided by the quadratic model cannot be attributed to differences in order of the models.

To test whether the quadratic effect is attributable to known neural nonlinearities such as spiking history, we compared the original linear model with variants of the linear model with terms explicitly accounting for spiking history (Fig. 9E) up to 32 ms (red), 160 ms (green), and 320 ms (blue). Adding terms to account for spiking history degrades the model, resulting in poorer fits and indicating that the interactive effects in area 3b neurons captured by the quadratic terms are not attributable to spiking history.

DISCUSSION

In this study, we investigated the nonlinear interactions in the responses of area 3b neurons by estimating quadratic RFs with linear and quadratic terms spanning the distal pads of digits D2–4. Unlike previous studies (DiCarlo et al. 1998; Sripati et al. 2006) that used a multiple regression framework between linear terms and a neuronal spike count to estimate linear RF, we used a GLM with a log-link function that yields a log-linear relationship between the input subregions and firing rate. A GLM fit may be better given the Poisson nature of spike counts; however, we obtained similar results when we used a multiple regression framework like the previous studies (results not shown). Since the inclusion of second-order terms in the quadratic model potentially causes a dimensional explosion in the number of terms in the model, we first estimated, using a method based on matching pursuit, a subset of the quadratic terms that contribute significantly to the observed response. We find that, for a majority of the neurons, the linear component of the RF consisted of a single excitatory region on the maximally responsive digit. Furthermore, we show that there are quadratic interactions on the maximally responsive digit, as well as quadratic interactions between the maximally responsive digit and its adjacent digit(s).

About 71% of the quadratic interactions on the maximally responsive digit were captured by two filters that correspond physiologically to previously reported nonlinear inhibitory fields, namely, infield suppression and trailing inhibition (DiCarlo and Johnson 2000; Gardner 1984). Furthermore, the interaction between the maximally responsive digit and its adjacent digit(s) was characterized by a pair of filters that together represented facilitative or suppressive modulations of the response on the dominant digit.

Since the method that we have derived from matching pursuit is statistical, it cannot completely eliminate picking up terms that fit noise. Thus at least a part of the observed multidigit interactions may still be attributable to noise. However, we believe that this contribution is small because 1) the strengths of different types of quadratic energies (Fig. 4A) show a systematic reduction in interactions as we move away from the center of the RF and terms fitting to noise would tend to distribute randomly across the RF and 2) the distributions of observed fractions of quadratic energies are significantly greater than a chance distribution obtained by fitting the response to arbitrary dot patterns obtained by shuffling the dot patterns on the adjacent and nonadjacent fingers (Fig. 4B).

Previous Studies

Previous studies in area 3b have revealed overlapping excitatory and inhibitory inputs throughout the RF of the neuron (Alloway and Burton 1991; Dykes et al. 1984). However, infield inhibition is rarely revealed by single punctate probes (Iwamura et al. 1983; Mountcastle and Powell 1959; Sur 1980; Sur et al. 1984), which show area 3b RFs as homogeneous excitatory regions (Sur 1980). This is in agreement with the predictions of our quadratic RFs. Since a single stimulus cannot evoke interactive components, the observed RF will be dominated by the linear component of the quadratic RF, which is a single excitatory region for most neurons.

Linear RFs estimated with a scanned random dot pattern (Dicarlo et al. 1998) or multiple probes randomized in space and time (Sripati et al. 2006) do reveal a significant amount of inhibition, but a linear RF fails to capture that part of inhibition that spatially and temporally overlaps with the excitatory core. Since the excitatory core is stronger than the overlapping inhibition, the weights are positive in the overlapping region. By including quadratic terms, we have disentangled the feed-forward excitatory drive (excitatory core), captured as the linear excitatory component, and the infield inhibition, which manifests as spatially overlapping suppressive quadratic filters. These filters span an area larger than the linear component across the population, which is consistent with previous reports suggesting that the inhibition extends over an area larger than the revealed excitatory region (Alloway et al. 1989; Alloway and Burton 1991; DiCarlo et al. 1998; Gardner and Costanzo 1980b). A linear RF reveals only parts of this infield inhibition as a temporally coincident but spatially offset inhibitory component or a spatially coincident but temporally lagging component (DiCarlo and Johnson 2000; Sripati et al. 2006). Since the interactive inhibition in the overlapping region is a systematic effect that the linear models fail to capture but is accounted for by the second-order model, quadratic models are better descriptions of the response than simple linear models (Fig. 9B).

Another factor that may contribute to the infield inhibition, especially in regions of small RF sizes such as the distal pads, is skin mechanics. The role of skin mechanics is evident in the periphery as surround inhibition (Sripati et al. 2006; Vega-Bermudez and Johnson 1999) in the RFs of SA1 afferents, although there are no known synaptic mechanisms in the periphery that could lead to lateral interactions. Since these very same afferents constitute the major input of area 3b neurons, they are likely to pass on the skin mechanical effects in their response.

In addition to spatial interactions, studies have revealed a temporal effect of infield inhibition in the form of a reduced response to the second of a pair of temporally separated stimuli (Gardner and Costanzo 1980b). Since we stimulated the skin with a constant scan velocity and direction (proximal-distal), effects along the temporal axis manifest as spatial effects along the distal-proximal axis. Thus the filter corresponding to the most positive eigenvalue, which overlaps with the linear component and consists of two lobes of opposite sign separated along the distal-proximal axis, captures the temporal aspect of infield inhibition in the form of suppressive interaction between the two regions of opposite sign separated along the distal-proximal direction. We hypothesize that if the scan direction is varied the two lobes of filter Q2 will move along the scan direction, whereas if the scan velocity is increased (decreased) the distance between the two lobes will increase (decrease). Furthermore, if a randomized spatiotemporal stimulus is used instead of a scanned stimulus to obtain a spatiotemporal quadratic RF, the two lobes will lie along the time axis. This is analogous to the fact that the trailing component of infield inhibition manifests (though partly) as a “replacing inhibition” along the temporal axis if a spatiotemporal stimulus is used (Sripati et al. 2006) but manifests as a spatially offset component of inhibition along the scan direction if a scanned stimulus is used (DiCarlo and Johnson 2000).

Pharmacological blockade of GABAergic transmission reveals an underlying excitatory drive from adjacent digits. This drive is attributable to intracortical lateral connections due to greater response latencies (Alloway and Burton 1991). However, given that <10% of the neurons in area 3b show multidigit responses to punctate probes (Sur 1980), the excitatory drive from adjacent digits is balanced by the inhibitory drive potentially through GABAergic interneurons. Our results suggest that this balance between lateral excitation and inhibition may be slightly offset. Such an offset may be the result of a short delay due to the extra synapse involved with GABAergic interneurons. This offset may not be sufficient to drive the neuron to a stimulus on the adjacent digit but may modulate the membrane potential so that the neuron fires more vigorously to a coincident volley of excitatory postsynaptic potentials (EPSPs) arising from the stimulation of the maximally responsive digit. Such a modulatory effect may be manifested as the quadratic interaction between the maximally responsive digit and its adjacent digits.

Functional Significance

Our results suggest that the transformation performed by area 3b neurons on the input spatial pattern has a significant nonlinear component, which may play an important role in shaping the feature selectivity. The linear excitatory core in area 3b is weakly elliptical and may have a role in shaping feature selectivity such as orientation selectivity. Thus orientation selectivity observed in area 3b (Bensmaia et al. 2008; Hyvarinen and Poranen 1978; Pubols and Leroy 1977; Warren et al. 1986) may arise because of the eccentricities observed in the spatially overlapping quadratic filters. A bar oriented along the major axis of this filter will result in a greater interactive suppression (and weaker response) than a bar oriented along the minor axis.

The presence of a significant interaction between the maximally responsive digit and its adjacent digits implies that integration of tactile information across pads begins as early as area 3b. For a stimulus spanning multiple digits, the “extension” of the stimulus on the adjacent digits provides a contextual modulation of the response to the part of the stimulus within the dominant digit. As studies in the visual system have shown, such a contextual modulation may be critical for the representation of extended contours (Fitzpatrick 2000; Gilbert and Wiesel 1990), corners (Silito et al. 1995), or local curvature (Krieger and Zetzsche 1996; Wilson and Richards 1992) and thus provides a first step of transforming the isomorphic input from the peripheral afferents to a nonisomorphic form that underlies perception.

This cross-digit interaction may also be the basis for the RF changes observed in cortical plasticity studies (Wang et al. 1995). Conditioning through a coincident presentation of stimuli on adjacent digits could strengthen the lateral excitatory inputs from adjacent digits. This long-term potentiation will lead to an increase in the drive from lateral inputs and cause the neuron to fire to any stimulus on the adjacent digit, causing the linear component of the RF to spread to adjacent digits.

We conclude that nonlinear mechanisms play a major role in shaping somatosensory neural responses as early as area 3b. The nonlinear effects function to shape the RFs to be selective to local spatial variations in form and give neurons in area 3b selectivity to local simple features such as orientation and motion. This underlies the perception of local spatial form and RFs that form the basis for roughness perception (Hsiao et al. 2003).

GRANTS

This work was supported by National Institute of Neurological Disorders and Stroke Grant NS-34086.

DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

AUTHOR CONTRIBUTIONS

Author contributions: P.H.T., P.J.F., and S.S.H. conception and design of research; P.H.T., P.J.F., and S.S.H. performed experiments; P.H.T. analyzed data; P.H.T., P.J.F., and S.S.H. interpreted results of experiments; P.H.T. and P.J.F. prepared figures; P.H.T. drafted manuscript; P.H.T., P.J.F., and S.S.H. edited and revised manuscript; P.H.T. and S.S.H. approved final version of manuscript.

ACKNOWLEDGMENTS

We thank Bill Nash and Bill Quinlan for designing the drum stimulus. We thank Justin Killebrew, Eric T. Carlson, Frank Dammann, Zhicheng Lai, and Sung Soo Kim for their help in setting up the experiments and analysis.

APPENDIX

Matching Pursuit to Select Subset of Terms with Salient Contribution to Neural Response

The analytical tool used in this paper to circumvent the problem of “dimensional explosion” due to the inclusion of quadratic terms is an iterative algorithm derived from matching pursuit (Mallat and Zhang 1993). We assume that underlying firing rate (RTrue) of the neuron governing the spiking response is related to the input (X) through a linear combination of some kernels B(X) = [B1(X), B2(X), …, Bq(X)]. Specifically, RTrue=i=1qBi(X)Wi(A1)

Given n observations in an experiment, RTrue as well as the kernels Bi(X) can be thought of as n-dimensional vectors [with jth entry of the kernel vector Bi(X) computed from the input stimulus pattern during the jth time step]. These kernels can be thought of as nonlinear feature maps that transform the input into a feature space, in which neuronal response is linear (Wu and Gallant 2004). The nature of kernels depends upon the choice of basis set over which the user wishes to express the firing rate (RTrue). For example, if we want to estimate the linear RF, then B(X) is simply the linear terms X spanning the distal pads. If we wish to express the function in terms of a Volterra series, then B(X) consists of various powers and products of powers of X. The model is known exactly if we identify the set of kernels B(X) and estimate the corresponding weights w = [w1, w2, …, wq]. In general, this is a hard problem since we usually have an infinite set of possible kernels and a finite amount of data. The proposed algorithm circumvents this problem by building the model in steps from a constant term. At every iteration, the algorithm picks up and adds to B(X) a kernel vector from an overcomplete dictionary D that maximizes the projection of the observed spike train. At the end of each iteration, the remaining kernel vectors in D are orthogonalized with respect to the selected kernel. This ensures that the dictionary always spans a space orthogonal to the space spanned by the selected kernels in B, and hence always picks up that part of the spike response not already accounted for by the current running model. The iterations are terminated when the maximum projection at a given step is not statistically significant.

Note 1.

The idea of orthogonalizing the entire dictionary D with respect to the selected kernel is a daunting proposition. However, it can be achieved by projecting the observation vector R into the orthogonal space instead (see algorithm below for details).

Algorithm.

The spike trains collected in the experiments are converted into an array of spike count vectors (observation vector) by binning at a suitable bin interval. Let R (n × 1) be the “current” observation vector (given n observations) and D ≡ {D1, D2, …, Dm} be the dictionary of all vectors that we wish to test the observed response against. Let N = [‖D1‖, ‖D2‖, …,‖Dm‖] be a vector holding norms of the dictionary elements. Let B be the set of selected vectors and ws be the corresponding projections. (Note that this is not the same as weights w in Eq. 1. They converge only when dictionary elements are samples of Gaussian white noise, in which case the method resembles spike-triggered averaging). The steps involved in the algorithm are as follows:

1) Project R onto the dictionary elements and divide by the norm vector to obtain the current projections. pi=RTDi/Ni

2) Choose the vector (Ds) with maximum projection energy (ps2) as the current best estimate. Obtain the component of Ds orthogonal to the vectors in B (call it Ds) and include it in B (i.e., B = [B, Ds]). This ensures that all vectors in B are mutually orthogonal.

3) Update the projection vector, observation vector, and norm vector as follows: wcurr=RTDs/Ds ws=[ws,wcurr] R=RwcurrDs/Ds Ni=Ni2(DiTDs)2/Ds2 During the update, the observation vector (R) is replaced by the component of R that is orthogonal to the selected vector Ds. The update in norm vector ensures that at the next step we compute the projection of R on the components of the dictionary elements orthogonal to Ds, without explicitly computing the orthogonal component of each of the dictionary elements.

4) If the current projection energy (ps2) is less than a predetermined threshold (see below) then stop; otherwise, repeat the procedure by going back to step 1.

At the first iteration, we choose D0, a vector of ones, as the selected vector. This ensures that the observation vector R and all the elements of the dictionary have a mean of zero from the second iteration onwards and that the vectors that are selected by the algorithm capture the variability in the response not already accounted for by the mean. This also enables us to test directly our new model against the null model that the response can be explained simply by the mean firing rate.

Stop condition.

At every step, the algorithm tries to find a basis function that best captures the response vector. If unchecked, the algorithm will continue until it finds n linearly independent vectors from the dictionary that span a space containing the response vector R, thus providing an exact solution to the system of equations. In theory, if R is a vector of true firing rates it will have very small projections (zero in the absence of static nonlinearity) on those kernels in the dictionary that do not modulate the response. Hence, the algorithm will never pick up such vectors. However, the observation vector R consists of samples of a random variable and is noisy. It will have some nonzero projection on vectors that do not contribute to the response. The following is a theoretical framework to optimally terminate the algorithm and determines when the projection on a particular vector is not significantly better than chance.

The projection of the observation vector R on Di is p = DiTR/‖Di‖. If R is sufficiently big (large number of observations), successive stimulus presentations are random, and we assume a Poisson spike count, then by the central limit theorem p tends to Gaussian with mean μ = DiTRtrue/‖Di‖ and variance σ2 = ∑j=1ndij2rjtrue/‖Di2, where Rtrue = [r1, r2, …, rn]T is the underlying true rate vector. If Di is orthogonal to RTrue (Di has no role in the neuronal response), then μ = 0 and p22 follows a χ2-distribution with 1 degree of freedom.

Thus at the end of every iteration we can formulate a null hypothesis that the current estimate of rate vector + (obtained as the best-fitting linear combination of the kernels selected up to the current iteration passed through a static nonlinearity, i.e., + = [BwsT]+) is the best estimate of the firing rate underlying the observed neural response. Under this null hypothesis, projection energy on any other kernel (Di) in the dictionary is simply a sample of a zero-mean χ2-distribution with variance σ2 = ∑j=1ndij2j+/‖Di2. Hence, to terminate the algorithm at a particular level α, the maximum projection energy at any iteration can be compared to the cumulative value of this χ2-distribution at a level α. A cutoff of α = 0.01 was used in this study.

Estimating weights.

Once a suitable set of basis vectors consisting of linear and nonlinear kernels B(X) is identified, weights w are obtained by fitting a GLM with a log-link function. Usually, the number of kernels picked up by the algorithm is relatively small. Hence, fitting a GLM is tractable. Note that a GLM fit yields a model that is log-linear in the selected terms, which differs from the original linear framework between firing rate and kernels used to iteratively select the subset of salient terms (Eq. A1). Alternatively, a linear least-squares method could be used to fit the observed spike count vector and selected kernels with explicit static nonlinearity, which may maintain the linear relationship between the selected terms and neuronal firing rate but may be suboptimal given the Poisson nature of spike counts.

Interpreting Quadratic Filters

The interpretation of second-order interaction implied by a given quadratic filter can be understood by considering the following simple example. Assume that we have a 3-pixel or 3-term system given by X = [X1, X2, …, X3]T. Let us assume that the interaction between these 3 pixels is given by the following Hessian matrix: H=[001000100] This Hessian implies that there is a positive or facilitative interaction between pixels X1 and X3. An eigenvector/eigenvalue decomposition of this matrix gives two eigenvectors with nonzero eigenvalues, namely, 1.E1=[12,0,12]T(witheigenvalue1) 2.E2=[12,0,12]T(witheigenvalue1) Thus a facilitative quadratic interaction between two regions can manifest in two ways in the quadratic filter. If the eigenvalue is negative, then the two regions in the filter have opposite signs. If the eigenvalue is positive, then the two regions have identical signs.

Similarly, the Hessian H=[001000100] which implies a suppressive interaction between pixels X1 and X3, has the following two eigenvalues: 1.E1[12,,0,12]T(witheigenvalue1) 2.E2[12,,0,12]T(witheigenvalue1) Thus if two regions have suppressive quadratic interaction, they have identical signs if the corresponding eigenvalue is negative and opposite signs if the eigenvalue is positive.

REFERENCES

View Abstract