The Journal of Neurophysiology Vol. 87 No. 4 April 2002, pp. 1749-1762
Copyright ©2002 by the American Physiological Society
Kresge Hearing Research Institute, University of Michigan, Ann Arbor, Michigan 48109-0506
ABSTRACT
Furukawa, Shigeto and
John C. Middlebrooks.
Cortical Representation of Auditory Space: Information-Bearing
Features of Spike Patterns.
J. Neurophysiol. 87: 1749-1762, 2002.
Previous studies have demonstrated
that the spike patterns of cortical neurons vary systematically as a
function of sound-source location such that the response of a single
neuron can signal the location of a sound source throughout 360° of
azimuth. The present study examined specific features of spike patterns
that might transmit information related to sound-source location.
Analysis was based on responses of well-isolated single units recorded from cortical area A2 in
α-chloralose-anesthetized cats. Stimuli were
80-ms noise bursts presented from loudspeakers in the horizontal plane;
source azimuths ranged through 360° in 20° steps. Spike patterns
were averaged across samples of eight trials. A competitive artificial
neural network (ANN) identified sound-source locations by recognizing
spike patterns; the ANN was trained using the learning vector
quantization learning rule. The information about stimulus location
that was transmitted by spike patterns was computed from joint
stimulus-response probability matrices. Spike patterns were manipulated
in various ways to isolate particular features. Full-spike patterns,
which contained all spike-count information and spike timing with
100-µs precision, transmitted the most stimulus-related information.
Transmitted information was sensitive to disruption of spike timing on
a scale of more than ~4 ms and was reduced by an average of ~35%
when spike-timing information was obliterated entirely. In a condition
in which all but the first spike in each pattern were eliminated,
transmitted information decreased by an average of only ~11%. In
many cases, that condition showed essentially no loss of transmitted
information. Three unidimensional features were extracted from spike
patterns. Of those features, spike latency transmitted ~60% more
information than that transmitted either by spike count or by a measure
of latency dispersion. Information transmission by spike patterns
recorded on single trials was substantially reduced compared with the
information transmitted by averages of eight trials. In a comparison of
averaged and nonaveraged responses, however, the information
transmitted by latencies was reduced by only ~29%, whereas
information transmitted by spike counts was reduced by 79%. Spike
counts clearly are sensitive to sound-source location and could
transmit information about sound-source locations. Nevertheless, the
present results demonstrate that the timing of the first poststimulus
spike carries a substantial amount, probably the majority, of the
location-related information present in spike patterns. The results
indicate that any complete model of the cortical representation of
auditory space must incorporate the temporal characteristics of
neuronal response patterns.
INTRODUCTION
The spike patterns
of auditory cortical neurons vary systematically as a function of
sound-source location. At low sound levels, some neurons show somewhat
restricted spatial receptive fields in the sense that the neurons
respond with high spike probability (i.e., high spike count) to sounds
presented from some locations and respond with low probability or not
at all to sounds presented from other locations (e.g., Imig et al. 1990; Middlebrooks and Pettigrew 1981; Middlebrooks et al. 1998; Rajan et al. 1990). At moderate sound level, however, most neurons show
above-background responses to sound sources from any location. In
addition to location-dependent variation in spike counts, neurons also
show location-dependent variation in the distribution of spikes in time
relative to stimulus onset. In previous studies (Furukawa et al. 2000; Middlebrooks et al. 1994, 1998; Xu et al. 1998), we have used a pattern-recognition algorithm (an
artificial neural network) to recognize location-dependent spike
patterns and, thereby, to estimate the locations of sound sources. The
accuracy of location estimates provided an empirical measure of the
location-related information carried by spike patterns. In many cases,
accuracy was degraded substantially when spike patterns were replaced
by mean spike counts, that is, when we eliminated any stimulus-specific
characteristics of the timing of spikes. That result indicated that
spike timing was important for stimulus coding but did not reveal the
particular features of spike patterns that might be important. For
instance, the previous analysis did not permit us to quantify the
relative importance of first-spike latencies, interspike intervals, or
other higher-order temporal features.
The goal of the present study was to quantify the relative amounts of
information about sound-source location that are carried by spike
counts and by spike timing and to identify particular temporal features
of spike patterns that might transmit stimulus-related information. We
recorded the responses of single units in area A2 of the auditory
cortex of anesthetized cats. We focused on area A2 because neurons
there tend to show broad frequency tuning (Schreiner and Cynader 1984), suggesting that they might integrate location cues
across broad frequency ranges. Also, we have considerable previous data
from area A2 indicating that neurons can signal the locations of sound
sources in azimuth (Furukawa and Middlebrooks 2001; Furukawa et al. 2000; Middlebrooks et al. 1998) and elevation (Xu et al. 1998) and that
neurons respond to spectral-shape cues for sound-source elevation
(Xu et al. 1999) and paired clicks (Mickey and Middlebrooks 2001a) in a way that parallels human localization
judgments. We have shown previously that ensembles of ~100 A2 neurons
can signal sound-source azimuth with accuracy comparable to the
behavioral localization accuracy of cats (Furukawa et al. 2000). Nevertheless, we are aware of no conclusive
demonstration of a role of area A2, or of any other cortical area, in
localization behavior. For that reason, the present results should be
regarded as pertaining to information-bearing features of spike
patterns in one particular cortical field; we cannot claim to have
evaluated every possible cortical spatial representation.
In the present study, we recorded the spike patterns of single neurons elicited by noise bursts that varied in location throughout 360° in the horizontal plane. We quantified the location-specific information that was transmitted by full-spike patterns and by spike patterns that were processed to degrade stimulus-related spike counts or spike timing. We also measured the information transmitted by three unidimensional parameters: mean spike count, mean spike latency, and the dispersion of spike latency. The results demonstrated that, for many neurons, spike timing transmitted more stimulus-related information than did spike count. The feature that transmitted the most stimulus-related information was the latency of the first spike in each spike pattern, often transmitting as much information as did the full-spike pattern.
METHODS
Stimulus generation
The experimental apparatus for stimulus generation was identical
to that detailed previously (Furukawa et al. 2000; Middlebrooks et al. 1998). Briefly, experiments were
controlled with an Intel-based personal computer. Acoustic stimuli were
synthesized digitally at a sampling rate of 100 kHz using equipment
from Tucker-Davis Technologies (TDT; Gainesville, FL). Experiments were
conducted in a sound-attenuating chamber that was lined with acoustical foam (Illbruck, Minneapolis, MN) to suppress reflections of sounds at
frequencies >500 Hz. Sounds were presented from 18 calibrated loudspeakers, 1 loudspeaker at a time. Loudspeakers were positioned in
the horizontal plane with angular separation of 20°, 1.2 m from the animal's head. The speaker location directly in front of the animal was labeled 0°, and positive azimuths indicated speakers on
the right side of the animal, which was ipsilateral to the recorded
cortical hemisphere.
Noise bursts were 80 ms in duration with abrupt onsets and offsets. Tone bursts were 80 ms in duration, ramped on and off with 5-ms rise/fall times. Noise and tone bursts were presented once every ~800 ms.
Animal preparation
This report presents data from 14 purpose-bred adult cats of
both sexes. The animal preparation was identical to that detailed previously (Middlebrooks et al. 1998). In brief,
isoflurane anesthesia was used during surgery, and α-chloralose was
used for unit recording. All recordings were made from the right
cortical hemisphere. The animal was positioned in the center of the
sound-attenuating chamber, with its body supported in a sling that also
held a heating pad and its head supported from behind by a bar attached
to a skull fixture. Thin wire supports were used to push the external
ears into a forward position (Middlebrooks and Knudsen 1987). The position of the ears was constant throughout each experiment.
At the end of each experiment, the animal was killed. The cortex was immersed in buffered formalin and later inspected visually to confirm the region of cortex recorded.
Data acquisition and spike sorting
Procedures for unit recording and for spike sorting were
identical to those detailed by Furukawa et al. (2000).
Briefly, unit activity was recorded extracellularly with
silicon-substrate multichannel probes (Anderson et al. 1989). Each probe had one shank along which 16 recording sites
were located in 100- or 150-µm intervals. Neural waveforms were
amplified, digitized (16 bits, sampling rate of 25 kHz), and stored on
computer disk. Spikes were detected on-line for monitoring purposes.
Off-line custom software (Furukawa et al. 2000) was used
to discriminate spikes for detailed analysis. The spike-sorting
procedure used a template-matching algorithm. Spike waveforms were
expressed as weighted sums of principal components, spikes were
selected from plots of the weights of first and second principal
components, and then waveform templates were computed from those
spikes. The analysis in the present study, except when stated
otherwise, was restricted to well-isolated single units that were
identified according to the following two criteria. First, the first
and second principal components of the spike waveforms formed a
discrete cluster. Second, the distribution of interspike intervals
formed across all trials peaked at >2 ms. An example of a single-unit
recording is shown in Furukawa et al. (2000). In a
separate analysis, we used recordings of two or more unresolved units.
Those multiunit recordings were sampled from the same placements of the
multichannel recording probe as those that yielded the single units,
but the single units and multiple units were recorded from different
recording sites. The final data set consisted of 40 single units and
111 multiunit clusters from 28 electrode placements in 14 cats.
Experimental procedure
Recordings were made from cortical area A2. Electrode
penetrations passed dorsoventrally, oblique to the cortical surface near the crest of the middle ectosylvian gyrus, ventral to area A1.
Area A2 was distinguished from area A1 by the absence of tonotopic organization and by frequency response bands that were one or more
octaves wide at signal levels 40 dB above threshold (Reale and Imig 1980; Schreiner and Cynader 1984).
Search stimuli consisted of broadband noise bursts, presented in the
region of 0° to contralateral 40° azimuth. The recording probe was
adjusted in cortical depth so that spike activity could be recorded
simultaneously from as many recording sites as possible; typically,
single- or multiunit responses were observed at ~10 of 16 recording
sites in each probe penetration. We assume that recordings were
predominantly from layers III and IV based on the recording depths and
the presence of active units in this anesthetized preparation.
Study at each probe placement began by identifying a stimulus location
from which noise bursts elicited a strong response, usually 0° or
contralateral 40° azimuth. Frequency sensitivity was tested using
tones varied in 1/3-octave steps of frequency. Thresholds for noise
bursts were estimated to the nearest 5 dB by inspection of on-line
poststimulus time histograms and of plots of spike counts versus
noise-burst sound level. When thresholds differed among recording sites
at one probe position, we adopted the modal threshold as representative
for that probe position. Usually, the range of thresholds at any probe
position was within 10 dB across all recording sites. Finally, we measured
the spatial sensitivity using stimuli presented from 18 azimuths in the
horizontal plane (−180 to +160° in steps of 20°) at five sound levels
ranging from 20 to 40 dB above the units' threshold. Stimuli were
presented in pseudorandom order such that all locations were tested at
all sound levels once before repeating all stimuli again in a different random order. Each combination of location and sound level was tested
40 times.
Study at each probe placement typically lasted ~2 h. Data for other
experiments (e.g., unit recording using other auditory stimulus sets)
(Furukawa and Middlebrooks 2001; Mickey and Middlebrooks 2001a; Xu et al. 1999) normally
were collected from the same animals. For that reason, experiments
typically lasted 2-5 days.
Data analysis
Analysis of spike data from each unit consisted of the following steps: representation of spike patterns as lists of spike times; (optional) manipulation of spike patterns to degrade putative information-bearing features; formation of multiple bootstrap samples of eight response patterns; isolation of selected unidimensional features or low-pass filtering of average response patterns followed by re-sampling with 1-ms bins; pattern recognition with artificial neural networks to estimate sound-source locations; and computation of transmitted information from joint stimulus-response matrices.
REPRESENTATION OF SPIKE PATTERNS. In off-line spike sorting, spike times were stored with 20-µs precision as latencies relative to the onset of sound at a loudspeaker. The response to each stimulus therefore was represented by a spike pattern consisting of a list of spike times. The arrival of sound at the cat's head was delayed by ~3.5 ms because of the acoustical travel time. The range of spike times used for the analysis was between 10 and 80 ms after the stimulus onset. For the purpose of testing the artificial-neural-network recognition of response patterns (described later), we assigned the spike patterns for odd- and even-numbered trials to training and test sets, respectively. Thus 40 trials yielded 20 training trials and 20 test trials for each stimulus. The separation of training and test sets provided a cross-validation of the pattern recognition scheme.
DEGRADATION OF PUTATIVE INFORMATION-BEARING FEATURES. The goal of this study was to quantify the relative amounts of stimulus-related information transmitted by specific features of spike patterns. For that reason, we processed spike patterns either to isolate particular unidimensional features (described in the following text) or to degrade particular features.
Spike patterns were tested in control and three degraded conditions. Full-spike patterns tested the control condition: no manipulation was applied. Shuffled spike patterns tested the impact of obliterating all stimulus-related temporal structure. First, the distribution of all the spike times was compiled across all stimulus conditions in a particular stimulus set. Then each spike pattern was reconstructed by replacing each spike time with a time drawn randomly without replacement from the distribution of all spike times. That had the effect of preserving spike counts and the first-order distribution of spike times while eliminating any specific stimulus-related timing. Within-interval-shuffled spike patterns evaluated the effective temporal precision of stimulus-related temporal information. First, the recording window from 10 to 80 ms after stimulus onset was divided into equal time intervals of 1, 2, 4, 8, 16, 32, or 70 ms. Then spike times within each interval were shuffled among all trials and stimulus conditions. The shuffling procedure was identical to that followed for the shuffled spike patterns except that spike times were shuffled within limited intervals instead of within the entire recording window; the 70-ms-interval condition was identical to the shuffled-pattern condition. An alternative way to vary the temporal precision of spike patterns would have been to vary the binwidths of spike density vectors by grouping multiple 100-µs time bins into wider bins and expressing spike probabilities within those wider bins. That approach was rejected here because it would have changed the statistics of the spike density vectors. First-spike patterns isolated the information-transmission capacity of first-spike latencies. Each such pattern was formed by eliminating the second and later spikes after stimulus onset. 
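The shuffling and first-spike manipulations described above can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the analysis code used in the study: the function names (`shuffle_patterns`, `first_spike_patterns`), the representation of each spike pattern as a list of spike times in ms, and the use of a random permutation of pooled spike times to implement drawing without replacement are all assumptions made here.

```python
import random

START_MS, STOP_MS = 10.0, 80.0  # analysis window given in the text


def shuffle_patterns(patterns, rng, interval_ms=None):
    """Shuffle spike times across all patterns (trials and stimuli pooled).

    interval_ms=None shuffles within the whole 70-ms window (shuffled
    condition); a smaller value shuffles only within intervals of that
    width (within-interval-shuffled condition). Spike counts per pattern
    and per interval are preserved. Hypothetical helper, for illustration.
    """
    width = (STOP_MS - START_MS) if interval_ms is None else interval_ms

    def interval_of(t):
        return int((t - START_MS) // width)

    # Pool the spike times that fall in each interval, then permute each pool.
    pools = {}
    for pattern in patterns:
        for t in pattern:
            pools.setdefault(interval_of(t), []).append(t)
    for pool in pools.values():
        rng.shuffle(pool)

    # Replace every spike by a time popped (without replacement) from the
    # pool of its own interval.
    return [sorted(pools[interval_of(t)].pop() for t in pattern)
            for pattern in patterns]


def first_spike_patterns(patterns):
    """Keep only the earliest spike of each pattern (empty patterns stay empty)."""
    return [[min(pattern)] if pattern else [] for pattern in patterns]
```

Calling `shuffle_patterns(patterns, random.Random(0))` corresponds to the fully shuffled condition, and passing `interval_ms=4.0` to the 4-ms within-interval condition.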
First-spike patterns preserved any stimulus-related trends in the mean and dispersion of first-spike latencies but conveyed no stimulus-related differences in spike counts or interspike intervals.
FORMATION OF BOOTSTRAP SAMPLES.
Under the conditions of animal preparation and anesthesia that were
used, cortical neurons typically responded to a noise burst with only
one or a few spikes at the onset of the sound. The sparseness of spike
patterns made it difficult to estimate sound-source locations on the
basis of responses of single neurons to single sound presentations. For
that reason, a bootstrap sampling procedure was used to form average
response patterns within the test (or training) set (Efron and Tibshirani 1991; Middlebrooks et al. 1998). Each
average response pattern was formed from a sample of spike patterns on
eight trials, drawn randomly with replacement from a training set (or
test set) of 20 responses to each combination of stimulus location and
sound level. The samples of spike patterns were averaged together
according to procedures described in the following two paragraphs. We
chose eight as the number of trials to average because, in our previous study (Middlebrooks et al. 1998), the precision of
stimulus identification tended to increase with averages across
increasing numbers of trials, but for most units the rate of increase
tended to slow beyond averages of around eight trials. We repeated the
bootstrap sampling procedure to form 20 test and 20 training samples
for each stimulus condition for each unit.
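The odd/even split and the bootstrap sampling can be illustrated as follows. This is a minimal sketch under assumptions, not the original analysis code: the function names and the representation of the trials for one stimulus condition as a Python list (in presentation order, numbered from 1) are hypothetical.

```python
import random


def split_odd_even(trials):
    """Assign odd-numbered trials (1st, 3rd, ...) to training, even to test."""
    return trials[0::2], trials[1::2]


def bootstrap_samples(trials, rng, n_samples=20, sample_size=8):
    """Draw n_samples samples of sample_size trials, each with replacement."""
    return [[rng.choice(trials) for _ in range(sample_size)]
            for _ in range(n_samples)]
```

With 40 trials per stimulus condition, `split_odd_even` yields 20 training and 20 test trials, and `bootstrap_samples` applied to either set yields 20 samples of eight spike patterns each, matching the counts given in the text.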
ISOLATION OF UNIDIMENSIONAL FEATURES. Three unidimensional features were isolated from bootstrap samples of response patterns. The mean spike count was the arithmetic mean of the number of spikes per trial averaged over each bootstrap sample of eight trials. The mean first-spike latency was computed by, first, selecting all the spike patterns in each bootstrap sample that contained one or more spikes and, then, computing the geometric mean of the latency (with 20-µs precision) of the first spike in each such pattern. The geometric mean was used for latencies, rather than the arithmetic mean, because the distribution of first-spike latencies tended to be highly skewed, having a long tail toward longer latencies with variance increasing with increasing latency. The spike dispersion was the SD of all the spike times in each bootstrap sample. The spike-dispersion measure was influenced by the durations of spike patterns as well as by the trial-by-trial variability in spike latencies. No first-spike latency was computed in cases in which the eight trials of a bootstrap sample contained no spikes, and no spike dispersion was computed in cases in which the eight trials contained a total of no more than one spike; in those cases, the numbers of training or test patterns that were available were reduced. In rare instances in which no latencies or dispersions could be computed from any of the responses to a particular combination of stimulus location and sound level, that stimulus condition was eliminated from further analysis. The impact of that situation on measurements of transmitted information was tested and found to be negligible.
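The three unidimensional features can be written out explicitly. The sketch below is illustrative only and rests on assumptions: the function names are hypothetical, a bootstrap sample is taken to be a list of eight spike-time lists (in ms), and the sample (n - 1) form of the SD is used because the text does not say which form was applied.

```python
import math
import statistics


def mean_spike_count(sample):
    """Arithmetic mean of the number of spikes per trial."""
    return sum(len(pattern) for pattern in sample) / len(sample)


def mean_first_spike_latency(sample):
    """Geometric mean of first-spike latencies over trials with >= 1 spike.

    Returns None when no trial in the sample contains a spike.
    """
    firsts = [min(pattern) for pattern in sample if pattern]
    if not firsts:
        return None
    return math.exp(sum(math.log(t) for t in firsts) / len(firsts))


def spike_dispersion(sample):
    """SD of all spike times in the sample (assumed n - 1 form).

    Returns None when the sample holds no more than one spike in total.
    """
    times = [t for pattern in sample for t in pattern]
    if len(times) < 2:
        return None
    return statistics.stdev(times)
```

Returning `None` mirrors the rule in the text that no latency or dispersion is computed for samples with too few spikes; a caller would drop those samples from the training or test set.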
LOW-PASS FILTERING AND RE-SAMPLING.
The four types of spike patterns (control and 3 degraded conditions)
consisted of lists of spike times. For further analysis, those lists of
spike times were converted to vectors of 1's and 0's, representing
the presence or absence of spikes in 100-µs time bins, and those
vectors were averaged across the eight trials in each bootstrap sample.
In the case of first-spike patterns, patterns that contained no spike
were omitted from the average so all average first-spike patterns had
unity magnitude. Next, the vectors of 1's and 0's were low-pass
filtered by convolution with a unit Gaussian impulse (σ = 1 ms)
and re-sampled with 1-ms precision. The low-pass filter operation is a
conventional signal-processing procedure that is necessary to attenuate
aliased high frequencies. Low-pass filtering also served to smooth the
otherwise sparse spike-density vectors. An identical procedure has been
followed in our previous studies (Middlebrooks et al. 1998; Xu et al. 1998). The resulting spike
patterns, regardless of the type of manipulation, consisted of
70-element spike density vectors that represented the probability of a
spike in each of 70 1-ms time bins from 10 to 80 ms relative to
stimulus onset. The 100-µs precision of the underlying spike
latencies influenced the distribution of each Gaussian impulse across
1-ms time bins. For that reason, the effective precision of the
resulting spike density vectors was 100 µs. Figure 1 illustrates an example of a sample of
eight spike patterns (represented by rasters) converted to a
spike density vector (represented by a bar plot).
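The conversion from spike-time lists to a 70-element spike density vector can be sketched as below. This is an assumption-laden illustration rather than the original procedure: the function names are hypothetical, the Gaussian is truncated at ±4σ and normalized to unit area, and re-sampling is implemented by summing the ten 100-µs bins inside each 1-ms bin (the text does not state how the re-sampling was done).

```python
import math

START_MS = 10.0   # start of the analysis window (from the text)
FINE_MS = 0.1     # 100-µs bins
N_FINE = 700      # 70 ms of 100-µs bins
COARSE = 10       # fine bins per 1-ms output bin
SIGMA_MS = 1.0    # Gaussian SD


def gaussian_kernel(sigma_ms=SIGMA_MS, step_ms=FINE_MS, n_sigmas=4):
    """Unit-area Gaussian sampled every step_ms, truncated at +/- n_sigmas."""
    half = int(round(n_sigmas * sigma_ms / step_ms))
    weights = [math.exp(-0.5 * (k * step_ms / sigma_ms) ** 2)
               for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def spike_density_vector(sample, skip_empty=False):
    """Average, smooth, and re-sample a bootstrap sample of spike patterns.

    skip_empty=True omits spikeless patterns from the average, as the text
    describes for first-spike patterns.
    """
    trials = [p for p in sample if p] if skip_empty else list(sample)
    fine = [0.0] * N_FINE
    for pattern in trials:
        # Presence/absence of a spike in each 100-µs bin of this trial.
        bins = {math.floor((t - START_MS) / FINE_MS + 1e-9) for t in pattern}
        for idx in bins:
            if 0 <= idx < N_FINE:
                fine[idx] += 1.0 / len(trials)
    kernel = gaussian_kernel()
    half = len(kernel) // 2
    smooth = [0.0] * N_FINE
    for i, value in enumerate(fine):
        if value:
            for k, w in enumerate(kernel):
                j = i + k - half
                if 0 <= j < N_FINE:
                    smooth[j] += value * w
    # Re-sample to 1-ms bins by summing groups of ten 100-µs bins.
    return [sum(smooth[b * COARSE:(b + 1) * COARSE])
            for b in range(N_FINE // COARSE)]
```

Because the kernel is placed at the 100-µs position of each spike, a shift of a spike by a fraction of a millisecond changes how its probability mass is divided between neighboring 1-ms bins, which is the sense in which sub-millisecond timing survives in the 70-element vector.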
ARTIFICIAL-NEURAL-NETWORK RECOGNITION OF SPIKE PATTERNS.
We used an artificial neural network (ANN) to identify sound-source
locations by recognizing spike patterns. The ANNs were implemented with
the MATLAB Neural Network Toolbox (The Mathworks, Natick, MA). The ANN
architecture consisted of inputs, a competitive layer, and a linear
layer. The inputs were spike density vectors or single numbers
representing unidimensional features. The competitive layer had one
hidden unit and one output unit for each stimulus location, i.e., there
were 18 hidden units and 18 output units. Each hidden unit was
specified in 70 dimensions or 1 dimension, depending on the form of the
input. The ANN was trained, using the learning vector quantization
(LVQ) training algorithm (Demuth and Beale 1998; Kohonen 1987) to classify the unit responses and to
assign each class to 1 of the 18 sound-source locations (i.e., locations from −180 to +160° in 20° steps). The learning rule, in
essence, positioned each hidden unit in 70- or 1-dimensional space to
minimize the mean squared Cartesian distance to the input vectors that
corresponded to a particular stimulus. Nicolelis and colleagues (1998) used a similar network design for study of encoding of
tactile information, and we have used such a design for study of
cortical coding of cochlear-implant stimuli (Middlebrooks and Bierer 2002).
0.83, n = 40 units). The amount of transmitted information captured by the two
network architectures was highly correlated, but the amount of
information captured by the perceptron was systematically lower. The
inferior performance by the perceptron ANN was probably due to loss of information in the process of quantizing the continuously varying perceptron outputs.
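For readers unfamiliar with learning vector quantization, the idea of one hidden unit (prototype) per sound-source location can be sketched in plain Python. The study used the MATLAB Neural Network Toolbox; the code below is not that implementation. It is a basic LVQ1 variant written for illustration, and the initialization at class means, the learning rate, the number of epochs, and all function names are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(prototypes, vector):
    """Index of the prototype closest to the input vector."""
    return min(range(len(prototypes)),
               key=lambda i: squared_distance(prototypes[i], vector))


def train_lvq1(vectors, labels, n_classes, rng, epochs=50, rate=0.05):
    """Train one prototype per class; prototype index doubles as class label.

    Every class must have at least one training vector.
    """
    # Assumed initialization: each prototype starts at its class mean.
    prototypes = []
    for c in range(n_classes):
        members = [v for v, lab in zip(vectors, labels) if lab == c]
        prototypes.append([sum(col) / len(members) for col in zip(*members)])
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            winner = nearest(prototypes, vectors[i])
            # Pull the winner toward a correctly classified input,
            # push it away from a misclassified one.
            sign = 1.0 if winner == labels[i] else -1.0
            prototypes[winner] = [p + sign * rate * (x - p)
                                  for p, x in zip(prototypes[winner], vectors[i])]
    return prototypes


def classify(prototypes, vector):
    """Estimated class (location index) for a test vector."""
    return nearest(prototypes, vector)
```

In the setting of the text, `vectors` would be the 70-element spike density vectors (or one-element lists for a unidimensional feature), `n_classes` would be 18, and the class returned for each test vector would be tallied against the true location to build the joint stimulus-response matrix.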
COMPUTATION OF TRANSMITTED INFORMATION.
The estimates of stimulus locations by the ANN were summarized as joint
stimulus-response probability matrices, which were used to compute the
transmitted information. In the present study, transmitted information
(also known as mutual information) was a measure of the reduction in
the uncertainty in stimulus location due to knowledge of unit responses
and classification by an ANN (Cover and Thomas 1991
).
The information (I) transmitted about a stimulus set,
S, given a response set, R, is defined as
$$I(S;R) = \sum_{s \in S} \sum_{r \in R} p(s,r)\,\log_2 \frac{p(s,r)}{p(s)\,p(r)} \qquad (1)$$
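The computation of transmitted information from a joint stimulus-response matrix can be illustrated with the standard mutual-information formula in bits. The sketch below is an assumption-based example, not the original code: the function name is hypothetical, the matrix is taken to hold counts with rows indexed by stimulus location and columns by ANN response, and empty cells are taken to contribute nothing to the sum.

```python
import math


def transmitted_information(joint_counts):
    """Mutual information (bits) of a stimulus-by-response count matrix."""
    total = sum(sum(row) for row in joint_counts)
    p_s = [sum(row) / total for row in joint_counts]          # stimulus marginals
    p_r = [sum(col) / total for col in zip(*joint_counts)]    # response marginals
    bits = 0.0
    for i, row in enumerate(joint_counts):
        for j, count in enumerate(row):
            if count > 0:
                p = count / total
                bits += p * math.log2(p / (p_s[i] * p_r[j]))
    return bits
```

As a sanity check on the scale of the measure, a perfectly diagonal 18 x 18 matrix gives log2(18) bits, a matrix with identical entries everywhere gives 0 bits, and a perfect two-way (left versus right) classification gives 1 bit.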
RESULTS
The analysis presented here was derived primarily from 40 single units recorded from 14 cats. A supplementary analysis used data from 111 multiunit clusters. We begin by describing the spatial sensitivity of single units and by quantifying the information about sound-source azimuth that could be derived from full-spike patterns, which preserved the magnitude and timing of unit responses. Then we quantify the degree to which information transmission depended on spike timing, and we evaluate the relative importance of the first and later spikes in the spike pattern. Next, we quantify the information carried by three unidimensional features of spike patterns: spike counts, first-spike latencies, and temporal dispersion of spike patterns. Finally, we evaluate the significance of two details of the experimental design: analysis of single trials versus averages across multiple trials and single- versus multiunit recording.
Azimuth information transmitted by full-spike patterns
Figure 2 represents the responses of three single units to noise bursts that were presented at various azimuths. Each unit is represented by a horizontal row of panels, and the left-most three panels in each row are raster plots that represent responses to sound levels that were 20, 30, and 40 dB above each unit's threshold. Each horizontal row of dots in the raster plots represents the spike pattern elicited by an 80-ms noise burst. Responses to eight stimulus presentations are shown for each azimuth. The three illustrated units are representative of the range of variation in spike patterns across the unit sample. Unit 9806/18/13a (top) responded to 80-ms noise bursts with bursts of spikes lasting ~10 ms, whereas unit 0003/130/13a (middle) typically produced only one or two spikes on each trial. Spike patterns of unit 9804/24/2a (bottom) consisted of a single spike at stimulus onset, a pause, then a burst of a few additional spikes. Note that even the longest-lasting spike patterns ended well before the end of the 80-ms noise burst.
In the examples in Fig. 2, one can see azimuth-dependent changes in mean spike counts per trial, in the first-spike latency, and in the dispersion of spikes in time. Those three dependent variables are plotted as a function of source azimuth in the three right columns of panels in Fig. 2; the computation of those unidimensional features is described in METHODS. The three curves in each panel represent the responses to sounds at the three sound levels. All three of the variables were modulated to various degrees by the sound-source azimuth as well as by the sound level.
Several characteristics of spatial sensitivity were common to most of the sampled population. Generally, units were broadly tuned for sound-source azimuth, responding to near-threshold sounds presented throughout the contralateral half or frontal-contralateral quadrant of space. At higher sound levels, spatial receptive fields tended to expand to 360°. Even within a 360° receptive field, however, sounds tended to elicit spike patterns that varied in the number of spikes and in the distribution of spikes in time.
We used an ANN to identify sound-source locations by recognizing unit responses, and then we quantified the azimuth-related information that was transmitted by each unit. Details of the ANN analysis and of the computation of transmitted information are provided in METHODS.
Figure 3 shows examples of joint
histograms obtained for the full-spike patterns of units shown in Fig.
2. In each panel, the abscissa and ordinate indicate the stimulus
locations and the ANN assignment of responses to locations,
respectively. The areas of the filled squares are proportional to the
joint probability of stimulus and ANN response. The loci corresponding
to perfect ANN identification of source locations lay on the positive
major diagonal of the plot. For the examples in Fig. 3, the ANN
estimates generally clustered around the diagonal. Clusters of
responses in the top-left and bottom-right
corners of each panel correspond to stimuli that were mislocalized
to the wrong side of the rear midline but were correctly localized to
the rear; note that azimuths −180 and +160° were adjacent locations
even though they appear far apart in the illustration. The middle
panel shows a situation in which responses to stimuli around −160° azimuth were systematically mislocalized to around +60°
azimuth. Those responses can be understood by referring to the raster
plots in Fig. 2, middle. That particular unit responded
similarly to stimuli around −160° and to those around +60° by
producing a spike only infrequently. Despite those few classes of
errors, it is noteworthy that the full-spike patterns of these units
signaled sound-source locations with varying degrees of accuracy
throughout nearly 360° of azimuth. The percentage of correct
localizations, averaged across all stimulus locations, ranged from 17.8 to 23.5% in the examples shown in Fig. 3. That is substantially better
than the value of 5.6% correct predicted from random-chance selection
from among 18 locations.
The transmitted information in the cases shown in Fig. 3 ranged from 1.03 to 1.24 bits. Figure 4 shows the distribution across the entire sample of 40 units of the transmitted information for the full-spike patterns. The transmitted information ranged from 0.24 to 1.33 bits and averaged 0.81 ± 0.25 (SD) bits. To provide some feeling for those numbers, perfect identification of the 18 sound-source locations would have required 4.17 bits of transmitted information, and 1 bit would have permitted perfect discrimination of left from right. Empirically, we found that units discriminated left from right somewhat imperfectly but discriminated many of the locations within each hemifield. On average, the units transmitted 19.4 ± 6.0% of the total entropy in the stimulus set. The distribution of transmitted information was essentially unimodal, so there was no basis for distinguishing distinct classes of units that were particularly good or bad localizers. Across the sample of 40 units, the percentage of correct localizations ranged from 7.7 to 23.5% and averaged 15.0 ± 3.5% correct. The percent correct was correlated with the transmitted information (correlation coefficient r = 0.84). The transmitted information for chance-level performance was estimated by randomizing the correspondence between spike patterns and stimulus locations. That procedure yielded a chance level of 0.12 ± 0.01 bits, which was substantially less than the information transmitted by any unit.
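The reference values quoted above follow from simple arithmetic, assuming the 18 locations were presented equally often so that the stimulus entropy is log2 of the number of locations:

```latex
\begin{align*}
H(S) &= \log_2 18 \approx 4.17 \text{ bits} \\
\frac{\bar{I}}{H(S)} &= \frac{0.81}{4.17} \approx 0.194 \quad (19.4\%) \\
P_{\text{chance}} &= \frac{1}{18} \approx 0.056 \quad (5.6\% \text{ correct}) \\
H(\text{left vs. right}) &= \log_2 2 = 1 \text{ bit}
\end{align*}
```

Here $\bar{I}$ denotes the mean transmitted information across units; the symbol is introduced only for this illustration.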
Importance of spike-timing information
The examples illustrated in Fig. 2 demonstrated that spike timing, as well as spike counts, could vary with stimulus location. In this section, we explore the contribution of spike timing to transmitted information. First, we test a condition in which the stimulus dependence of spike timing was disrupted entirely, leaving only the stimulus-related information carried by spike counts. Then we estimate the relevant time scale of stimulus-dependent spike timing by systematically degrading the precision of spike times.
We disrupted spike timing by forming shuffled spike patterns, as described in METHODS. The shuffled spike patterns included no stimulus-related information carried by spike timing but maintained any information that was carried by stimulus-specific spike counts and maintained the first-order statistics (i.e., the mean and SD) of spike times across all spike patterns. The stimulus-related information contained in shuffled spike patterns was evaluated with an ANN-classification procedure identical to that used for the full-spike patterns.
Figure 5 plots the information
transmitted by shuffled patterns and by full-spike patterns. The
shuffled patterns consistently transmitted less information about
sound-source azimuth than did the full patterns. The transmitted
information in the shuffled condition averaged 0.51 ± 0.19 bits
(compared with 0.81 ± 0.25 bits in the full-spike-pattern
condition; paired t-test; P < 0.001). The
information transmitted in the shuffled condition averaged only 65 ± 18% of that transmitted in the condition in which stimulus-related spike timing was intact. We interpret that result to mean that, on average, only 65% of the location-related information in the full-spike pattern was available from spike counts alone and that spike timing carried the remaining 35%. We show later that spike timing carried additional information, beyond the 35%, that was redundant with that carried by spike counts.
We estimated the stimulus-related temporal precision of spike times by forming within-interval-shuffled spike patterns, as described in METHODS. The recording window corresponding to 10-80 ms after stimulus onset was divided into equal intervals with durations of 1, 2, 4, 8, 16, 32, or 70 ms. The within-interval-shuffling procedure preserved stimulus-related spike counts and first-order temporal statistics within each interval but disrupted any stimulus-related temporal structure within intervals. The width of the shuffling interval determined the temporal precision of the surviving spike-timing representation.
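One way to picture the within-interval shuffling is the following Python sketch. The actual procedure is the one described in METHODS. This version rests on an assumption: within each interval, spike times are pooled across all patterns (regardless of stimulus), permuted, and handed back so that every pattern keeps its own spike count in that interval. The function name, arguments, and example spike times are hypothetical.

```python
import math
import random

def within_interval_shuffle(patterns, width_ms, start_ms=10.0, stop_ms=80.0, seed=0):
    """Exchange spike times among patterns, separately within each interval.

    patterns: list of spike-time lists (ms after stimulus onset), pooled
    across all stimulus locations. Spikes outside the window are untouched.
    """
    rng = random.Random(seed)
    n_intervals = math.ceil((stop_ms - start_ms) / width_ms)
    # members[b] lists (pattern index, spike index) for spikes in interval b
    members = [[] for _ in range(n_intervals)]
    for p, spikes in enumerate(patterns):
        for s, t in enumerate(spikes):
            if start_ms <= t < stop_ms:
                members[int((t - start_ms) // width_ms)].append((p, s))
    shuffled = [list(spikes) for spikes in patterns]
    for slots in members:
        pool = [patterns[p][s] for p, s in slots]
        rng.shuffle(pool)  # destroys the pattern-specific timing in this interval
        for (p, s), t in zip(slots, pool):
            shuffled[p][s] = t  # each pattern keeps its count in the interval
    return [sorted(spikes) for spikes in shuffled]

patterns = [[12.3, 40.1], [15.7], [], [44.0, 70.2]]  # hypothetical spike times
coarse = within_interval_shuffle(patterns, 70.0)  # one interval: timing fully mixed
fine = within_interval_shuffle(patterns, 1.0)     # 1-ms intervals: little can move
```

Under this assumption, a 70-ms interval spans the whole recording window, so only spike counts remain specific to each pattern, whereas narrow intervals leave spike times nearly intact.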
Figure 6, top, shows the information transmitted by spike patterns of four units, shuffled with various temporal precision; the transmitted information for the unshuffled condition also is shown. Three of the units are those that were presented in Fig. 2, and the fourth is the unit that showed the median amount of transmitted information across the sample of 40 units (0.84 bits). In each case, transmitted information decreased with increasing interval width (i.e., with decreasing precision). Figure 6, bottom, shows the distributions across 40 units of the transmitted information at various levels of precision. In that plot, transmitted information at each interval width is expressed as a fraction of the information in the unshuffled condition, and the distributions are represented with box plots. Each of the illustrated increases in interval width resulted in a significant decrease in transmitted information (P < 0.005 for 1 vs. 2 ms, P < 0.001 for all other pair-wise comparisons, paired t-test). At the 4-ms interval width, the median fraction was 0.91, indicating that half of the units lost >9% of transmitted information when the temporal precision was degraded by that amount. At the 16-ms interval width, the median loss was 22% and the spike patterns of 25% of units showed a loss of >39% of their transmitted information.
Dominance of the first spike
The analyses in the previous sections demonstrated that spike timing carried appreciable amounts of stimulus-related information. The majority of spike patterns consisted of no more than a single spike. Specifically, across all 40 single units and all stimulus presentations, 44% of spike patterns had no spikes, 35% had one spike, and only 21% of responses had two or more spikes. In response to the stimulus that produced the highest spike count for each unit, 26 of the 40 units showed median spike counts of <2 per trial. We tested the hypothesis that most of the stimulus-related information in spike patterns is carried by the latency of the first spike. In this section, we test spike patterns in which all but the first spike were deleted. In the next section, we evaluate the information carried by a unidimensional representation of first-spike latency.
We compared the information transmitted by full-spike patterns and by
patterns that contained only the first spikes (first-spike patterns; described in METHODS). The first-spike patterns
preserved first-spike latency and the trial-by-trial dispersion of
first-spike latency, but any information from spike probabilities was
eliminated. Figure 7 compares the
information transmitted by first-spike patterns with that transmitted
by full-spike patterns; asterisks (*) indicate units that showed median spike counts of ≥2 spikes in response to optimal stimuli, and the other plotted symbol indicates units that showed median spike counts of <2 spikes. Many of the points lie near the diagonal line, indicating that
for many units, the first spike in the response pattern carried nearly
all the stimulus-related information. Across all 40 units, the
information carried by the first-spike patterns averaged 89 ± 21% of the information carried by the full patterns. That is, ~89%
of the transmitted information was available from a measure that
transmitted no information in the form of spike counts or interspike
intervals. The higher-count units (*) showed a somewhat greater loss of
transmitted information in the first-spike condition. If all units are
considered, the ratio of transmitted information between first-spike
and full conditions was not significantly lower for the higher-count
units than for the lower-count units (P = 0.06), but
the difference was significant (P < 0.01) after
excluding the one outlying point that showed ~0.2 bits of transmitted
information in the full-pattern condition.
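The construction of a first-spike pattern can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure from METHODS: spikes are binned in 1-ms bins over the 10-80 ms window (an assumed reading of the 70-element input vectors), and the function names and spike times are hypothetical.

```python
def first_spike_only(trial):
    """Keep only the earliest spike of one trial (empty trials stay empty)."""
    return [min(trial)] if trial else []

def averaged_pattern(trials, start_ms=10.0, n_bins=70, bin_ms=1.0):
    """Spike density per bin, averaged across trials (assumed 1-ms bins)."""
    density = [0.0] * n_bins
    for trial in trials:
        for t in trial:
            b = int((t - start_ms) // bin_ms)
            if 0 <= b < n_bins:  # ignore spikes outside the window
                density[b] += 1.0 / len(trials)
    return density

# Hypothetical spike times (ms after onset) for four trials at one location.
trials = [[14.2, 20.5, 33.0], [], [16.8], [15.1, 15.9]]
first_only = [first_spike_only(t) for t in trials]
pattern = averaged_pattern(first_only)
```

After deletion of later spikes, each trial contributes at most one spike, so the averaged pattern reflects only the distribution of first-spike latencies across trials.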
Unidimensional features of spike patterns
We evaluated the stimulus-related information carried by features
of spike patterns that could be represented by unidimensional measures,
specifically mean spike count, mean first-spike latency, and spike
dispersion (see METHODS for computation of those measures). The information carried by each unidimensional term was evaluated using
ANNs and measures of transmitted information as presented in the
preceding text. In the present section, however, the ANNs were
configured with a single input (for 1 of the unidimensional terms) or
with two or three inputs (for combinations of 2 or 3 of the terms). We
first wished to validate the efficiency of the ANN analysis for
low-dimensional inputs. For that reason, we duplicated the analysis of
the unidimensional features using maximum-likelihood classification
(Green and Swets 1966). Results from the ANN and maximum-likelihood analyses were highly correlated, with correlation coefficients (r) of 0.98 for spike counts and 0.96 for
first-spike latency; the correlation between ANN and maximum-likelihood
analysis was lower for spike dispersion (r = 0.57). The
transmitted information identified with the ANN procedure, however, was
significantly greater (P < 0.001, paired
t-test), ranging from 0.02 to 0.14 bits greater than that
identified with maximum likelihood classification. For that reason, the
results presented in this section were those obtained with the ANN procedure.
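For orientation, the three unidimensional measures might be computed along the following lines. The exact definitions are those given in METHODS. In this Python sketch, the dispersion measure in particular is an assumption (the SD of all spike times pooled across the trials in a sample), and the function name and spike times are hypothetical.

```python
import statistics

def unidimensional_features(trials):
    """Reduce a sample of trials (lists of spike times in ms) to three numbers."""
    counts = [len(t) for t in trials]
    first_latencies = [min(t) for t in trials if t]  # trials with no spike are skipped
    all_times = [s for t in trials for s in t]
    return {
        "mean_count": statistics.mean(counts),
        "mean_first_latency": statistics.mean(first_latencies) if first_latencies else None,
        # Assumed definition: SD of pooled spike times (undefined for < 2 spikes).
        "dispersion": statistics.pstdev(all_times) if len(all_times) > 1 else None,
    }

sample = [[14.0, 20.0], [], [16.0], [15.0, 15.0]]  # hypothetical spike times
features = unidimensional_features(sample)
```

Each measure yields a single number per sample of trials, which is what allows an ANN with one, two, or three inputs to be tested.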
Figure 8, top, shows the means and SD of the information transmitted by the three individual features and by combinations of the features. Of the three single features tested (the 3 leftmost bars), the first-spike latency showed the greatest transmitted information on average: 30 of 40 units showed the greatest information carried by first-spike latency. One might have expected the information that was carried by unidimensional spike counts and first-spike latencies to be roughly equivalent to that carried by the shuffled spike patterns and by the first-spike patterns, respectively. Nevertheless, the transmitted information that was computed was significantly less for both of the unidimensional features (P < 0.001). We attribute the differences to sensitivity of the ANN to the structure of its inputs; the shuffled-spike and first-spike patterns were 70-element vectors, whereas spike count and first-spike latency were single numbers. Also, the first-spike patterns would have shown stimulus-dependent variation in the dispersion of first-spike latencies that would not have been evident in the mean-latency measure.
Transmitted information could be increased by combining unidimensional features, as indicated by the four rightmost bars in Fig. 8, top. For instance, a neural network that had both latency and spike-count inputs could identify stimulus locations more accurately than a network that had either of those inputs alone. The information transmitted by combinations of two or three features (latency and dispersion, latency and count, dispersion and count, all 3) always was greater than the greatest of the information transmitted by any of the individual features (P < 0.001; paired t-test). This indicates that those three features carried at least some information that was independent among the features. The combined information, however, was never as great as the sum of the information carried by individual features. That is demonstrated in Fig. 8, middle, which shows the information transmitted by combined features as a fraction of the sum of the individual values of transmitted information. The fraction always was less than unity, indicating that the features carried mutually redundant stimulus-related information. The mutual redundancy of stimulus-related information was expected from the correlation between individual features. Figure 8, bottom, shows the means and SD of the squared correlation coefficients (R²). On average, the greatest correlation was between latency and spike count. That correlation can be seen in the raster plots of Fig. 2, which show that stimuli that produced the highest spike counts tended to elicit spikes with the shortest latencies. Nevertheless, latency and spike count transmitted enough mutually independent information that addition of a spike-count input to an ANN improved localization based on latencies (see Fig. 8, top). First-spike latency and dispersion showed the lowest mutual correlation, and the information carried by those features combined showed the highest fraction of the information summed across the individual features.
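The two comparisons made in this paragraph (combined information against the best single feature, and against the sum of the single features) amount to simple arithmetic, sketched here in Python. The function name is hypothetical, and the numerical values are invented for illustration only; they are not data from this study.

```python
def redundancy_summary(individual_bits, combined_bits):
    """Compare combined transmitted information with the individual values."""
    return {
        # > 0 means the features carry some mutually independent information
        "gain_over_best_single": combined_bits - max(individual_bits),
        # < 1 means the features carry some mutually redundant information
        "fraction_of_sum": combined_bits / sum(individual_bits),
    }

# Invented example: two features alone (bits) and their combination (bits).
example = redundancy_summary([0.40, 0.30], 0.55)
```

A positive gain together with a fraction below unity corresponds to the pattern reported for all of the feature combinations.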
Significance of procedural details
In this section, we evaluate the impact on the results of two elements of the experimental design: the use of averages across stimulus presentations and the use of single-unit compared with multiunit responses. The analyses presented to this point were based on spike patterns averaged across eight trials. That was done because cortical units generally responded to each stimulus with no more than a few spikes locked to the stimulus onset. Without averaging, the sparseness of the responses would have made it difficult to estimate the stimulus dependence of spike probabilities and to form accurate representations of spike timing. Also, the response of one unit averaged across multiple trials can be regarded as a surrogate for the responses of multiple isolated units on a single trial. Here we repeated some of the analyses presented in earlier sections, in this case applying them to spike patterns recorded on single trials, that is, to nonaveraged spike patterns.
Figure 9 shows the information
transmitted by full-spike patterns, by first-spike latencies, and by
spike counts in the single-trial condition compared with the
bootstrap-averaged condition. As expected, the transmitted information
was markedly less in the nonaveraged condition for all three of these
representations (P < 0.001; paired t-test).
The consequences of not averaging differed between the spike-count and
latency representations of responses. The information transmitted by
spike counts was particularly degraded by the lack of averaging: transmitted information in the single-trial condition was only 21 ± 35% of that computed in the averaged condition. The
principal reason for that result is that, in the single-trial condition, spike counts could take on only one of a few numbers, most
often 0, 1, or 2. The quantal nature of spike counts was ameliorated to
a large extent by averaging across trials. In contrast, the
distribution of first-spike latencies measured on single trials formed
a continuum. Information transmitted by latencies in the single-trial
condition was 71 ± 13% of that transmitted in the averaged
condition. Presumably the main benefit of averaging in the case of
spike latencies is that averages across trials permitted a more
accurate estimate of the central tendency of latency.
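The quantal nature of single-trial spike counts, and its amelioration by averaging, can be illustrated with a short Python sketch. It assumes that an averaged sample is formed by drawing eight trials with replacement (a bootstrap). The function name and the spike counts are hypothetical.

```python
import random

def averaged_count(single_trial_counts, n_trials=8, rng=None):
    """Mean spike count over n_trials trials drawn with replacement."""
    if rng is None:
        rng = random.Random(0)
    sample = [rng.choice(single_trial_counts) for _ in range(n_trials)]
    return sum(sample) / n_trials

# Hypothetical single-trial spike counts for one unit and one location.
counts = [0, 1, 0, 2, 1, 0, 0, 1, 1, 0]

single_values = sorted(set(counts))  # only a few discrete values
rng = random.Random(1)
averaged_values = sorted({averaged_count(counts, rng=rng) for _ in range(200)})

print(single_values)         # [0, 1, 2]
print(len(averaged_values))  # many more distinct values, in steps of 1/8
```

With eight-trial averages, the count can take values in steps of 1/8 spike, which gives a classifier a much finer-grained input than the three values available on single trials.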
In previous studies of cortical coding of sound-source location (e.g., Furukawa et al. 2000; Middlebrooks et al. 1998), we based much of the analysis on recordings from
unresolved clusters of multiple units. Here, we evaluated the impact of
single- versus multiunit recording on conclusions concerning
transmission of stimulus-related information. One hundred and eleven
multiunit recordings were obtained from the same electrode placements
that yielded the 40 single-unit data recordings; the single- and
multiunit recordings were made simultaneously from different recording
sites on the multichannel recording probes. Comparisons between the single-unit and the multiunit recordings are shown in Table 1 for several representations of unit responses. Generally, the multiunit recordings transmitted somewhat less information than did the single-unit recordings. We assume that the summation of spike patterns from two or more units that showed somewhat different spatial sensitivity would have resulted in an apparent decrease in spatial sensitivity and hence in the decrease in transmitted information. Nevertheless, the relative amounts of
transmitted information among the various response representations were
similar between single- and multiunit conditions. For instance, the
shuffled-spike conditions showed similar fractions of the full-pattern
transmitted information: ~65% and ~71% for the single-unit and
multiunit conditions, respectively. Similarly, of the three
unidimensional features of neural responses (spike count, latency, and
dispersion), latency transmitted the greatest amount of information in
both single- and multiunit conditions.
|
| |
DISCUSSION |
|---|
|
|
|---|
The present results demonstrate that the spike patterns of single neurons transmit substantial amounts of stimulus-related information, in many cases enough to identify sound-source locations throughout 360° of space. Tests of various degraded spike patterns demonstrate the relative amounts of information transmitted by spike counts and spike timing. A rather unexpected result was that, for many neurons, the timing of the first spike carried as much stimulus-related information as did the full-spike pattern. In DISCUSSION, we begin by relating the present results to those of previous studies of auditory spatial sensitivity. Next, we assess the relative importance of spike counts and spike times. We consider the significance of across-trial averages of spike counts and of spike times clocked relative to stimulus onset for an animal that must make location judgements on the basis of single stimulus presentations and that has no independent reference to stimulus onset. Finally, we evaluate possible characteristics of a neural code for sound-source location.
Relation to previous studies of auditory spatial sensitivity
In the present results, single units in area A2 showed broad
spatial tuning, most often showing spatial receptive fields that occupied much of the contralateral hemifield at low sound levels and
expanded to 360° at levels 30-40 dB above threshold. Similarly broad
spatial tuning has been encountered previously in studies of the cat's
area A2 (Furukawa and Middlebrooks 2001; Middlebrooks et al. 1998) and other auditory areas (area A1: Brugge et al. 1994; Imig et al. 1990; Middlebrooks and Pettigrew 1981; Rajan et al. 1990; area AES: Korte and Rauschecker 1993; Middlebrooks et al. 1994, 1998). We have demonstrated
previously that the spike patterns of single units or clusters of units
can signal sound-source locations throughout 360° of azimuth; that is, single units localize sound sources panoramically. In our
previous work, we have classified unit responses using an ANN that
consisted of a feed-forward perceptron with either linear
(Middlebrooks et al. 1994) or nonlinear (Furukawa and Middlebrooks 2001; Middlebrooks et al. 1998; Xu et al. 1998) transfer functions in the middle layers.
The advantages of that particular ANN design were that it produced a
continuously varying estimate of sound-source location and that it
could interpolate between untrained stimulus locations. Those
properties facilitated comparison of ANN estimates of location with
localization judgements in psychophysical experiments (Furukawa et al. 2000; Mickey and Middlebrooks 2001a; Xu et al. 1999). A different ANN architecture was used in the present study, one that produced discrete outputs that were
restricted to the set of discrete values in the stimulus set. The
discrete outputs were more appropriate to the goals of the present
study in that we were able to evaluate the contribution of particular
features of spike patterns to stimulus-related transmitted information.
In our previous studies (e.g., Furukawa and Middlebrooks 2001; Furukawa et al. 2000; Middlebrooks et al. 1994, 1998), many of the recordings were from unresolved
clusters of multiple units. For that reason, we were concerned that
those studies might have overestimated the breadth of spatial tuning
and underestimated the accuracy of panoramic sound localization.
Indeed, in the present study, the single units showed somewhat more
accurate sound-source localization and somewhat greater transmitted
information than did multiunit clusters. Nevertheless, all the
conclusions regarding the relative amounts of information transmitted
by various features of spike patterns agreed between single- and
multiunit conditions.
Relative amounts of information transmitted by spike counts and by spike timing
The present study demonstrated that features of spike patterns related to spike timing tended to transmit more stimulus-related information than did features related to the count of spikes per trial. The first-spike patterns, which preserved first-spike timing but lacked any spike-count information, transmitted more information than did the shuffled patterns, which preserved spike counts but lacked any stimulus-dependent temporal structure. Among the unidimensional features, first-spike latency transmitted more information than did mean spike counts. Spike dispersion, which was another feature determined only by spike timing, typically transmitted about the same amount of information as did spike counts.
Spike counts were particularly ineffective in transmitting information
in the condition in which single trials were analyzed, i.e., in which
there was no across-trial averaging. That result highlights the point
that, given spike patterns that contain no more than one or two spikes,
a single response does not provide a useful estimate of spike count
(i.e., of spike probability); that point also has been raised by
Brugge and colleagues (1994). Experimentally, the spike
count of a single neuron is informative when averaged across multiple
stimulus presentations. In the context of an animal's perceptual
judgement, which normally must be based on a single stimulus
presentation, the mean spike count of a single neuron is significant
only as a surrogate for the response of a population of neurons. That
is, one might regard an average of the responses of one neuron across
many presentations as representative of the average of responses across
many identical neurons on a single presentation. In a previous study of
sound-source localization based on full-spike patterns
(Middlebrooks et al. 1998), we showed that the accuracy
of localization estimates increased with increases in the number of
stimulus presentations across which spike patterns were averaged. We
assume that much of that improvement was a result of more accurate
representation of mean spike counts. Recanzone and colleagues
(2000) have presented a model of the primate auditory cortex in
which a localization judgement was based on the sum of spike counts
across multiple sequentially recorded neurons. In that study, the sums
across multiple neurons discriminated between pairs of sound-source
locations more accurately than did spike counts of single neurons on
single stimulus presentations. We have shown a similar result in cats
(Furukawa et al. 2000). We found, moreover, that
sound-source localization is more accurate when the patterns of
relative spike counts across neurons are classified than when all spike
counts are simply added together. Classification of relative spike
counts exploits any differences among neurons in their stimulus
specificity, whereas between-neuron differences degrade the accuracy of
a grand sum.
Several previous studies of the auditory cortex and of other cortical
areas have demonstrated the importance of spike timing for stimulus
coding. One way to demonstrate the importance of spike timing has been
to degrade the temporal information in spike patterns and to quantify
the resulting degradation in stimulus identification. We have
demonstrated previously that the accuracy of sound-source localization
by cortical neurons is reduced substantially when spike patterns are
reduced to unidimensional spike counts (Middlebrooks et al. 1994, 1998; Xu et al. 1998) or when spike patterns of ensembles of neurons are represented by vectors of spike counts (Furukawa et al. 2000). In the somatosensory
cortex, degradation of the temporal structure in spike patterns reduces the accuracy of discrimination of tactile stimuli by ensembles of
cortical units (Ghazanfar et al. 2000; Nicolelis et al. 1998). In the visual cortex, degradation or elimination
of temporal features of spike patterns particularly reduces the
information transmitted about stimulus contrast, whereas degradation of
spike-count information has a greater impact on transmission of
information about stimulus orientation (Gawne 2000; Gawne et al. 1996; Reich et al. 2001). In
the present study, we confirmed that disruption of temporal structure
(the shuffled spike pattern and within-interval-shuffled conditions) or
elimination of temporal information (the spike-count condition) results
in a substantial reduction in transmitted information.
The importance of spike timing for stimulus coding also has been
demonstrated by tests of specially constructed representations of spike
patterns that contain only timing information. In the visual cortex,
Richmond and Optican (1987, 1990) represented spike patterns by principal components. Generally, the first principal component (i.e., the component that accounted for the most variance across all spike patterns) correlated highly with spike counts. Nevertheless, the second- and higher-order components alone were demonstrated to carry stimulus-related information. Those components, by definition, were largely independent of spike count and thus presumably reflected only the time structure of spike patterns. Also in
the visual cortex, Reich and colleagues (2001)
demonstrated that first-spike latencies carried considerable
information about stimulus contrast, often as much information as that
carried by full-spike patterns. In the present study, we found that all
of the representations of spike patterns that removed the direct influence of spike counts (i.e., the first-spike patterns, first-spike latency, and
spike dispersion) transmitted substantial amounts of stimulus-related information.
In the present study, first-spike latencies appeared to transmit more
stimulus-related information than did any other feature of spike
patterns. The analysis of "first-spike patterns" revealed that, for
most units, spikes that follow the first spike transmit little or no
stimulus-related information that is not available from the first
spike. To a large extent, that result must reflect the low mean spike
count that was observed in the anesthetized preparation that was used.
That is, across all units and all trials, only 21% of spike patterns
contained two or more spikes, so it is not surprising that the second
and later spikes transmitted little information. In preliminary results
of recordings from awake cats (Mickey and Middlebrooks 2001b), we find that units often fire in a more sustained
fashion than do neurons in the anesthetized condition and that
sustained portions of spike patterns tend to be modulated by stimulus
azimuth. It remains to be seen whether or not the sustained portions of
spike patterns transmit stimulus-related information that is not
available from the onsets of spike patterns.
Two observations from the visual cortex literature, one in anesthetized
animals and one in awake animals, suggest that the dominant role of the
first spike in information transmission is not entirely a result of low
spike counts. First, Reich and colleagues (2001)
studied
visual cortex responses in an anesthetized-monkey preparation that
showed considerably more tonic activity than we observed in the cat
auditory cortex. They reported that coding of stimulus contrast was
dominated by information in the first-spike latency with little
contribution of "transient, tonic, and off" responses. Second,
Gawne and colleagues (Gawne 2000; Gawne et al. 1996) found in unanesthetized monkeys that elimination of mean
first-spike-latency information severely impaired signaling of stimulus
contrast by visual cortex responses.
Experimentally, spike latencies can be recorded with great precision relative to the onset of the stimulus. In contrast, the nervous system has no independent measure of stimulus onset. For that reason, information carried by latencies presumably is available to an animal only in the form of interspike intervals in the spike patterns of single neurons and relative spike times among multiple neurons. The contribution of interspike intervals to information transmission in the present study must have been small, because only 21% of spike patterns contained two or more spikes. The relative lack of importance of interspike intervals was demonstrated by the relatively small loss of transmitted information that resulted from elimination of all interspike intervals (i.e., the first-spike-pattern condition).
We presume that spike-latency information is available to an animal
predominantly in the form of relative spike timing between neurons. The
present study did not examine directly the stimulus dependence of
between-neuron spike timing. Nevertheless, if we regard across-trial
averages of spike patterns as surrogates for recordings from multiple
neurons, we can treat the "spike dispersion" term as one indicator
of the stimulus-related synchrony of firing among neurons. The results
showed that spike dispersion transmitted roughly the same amount of
information as did spike counts. In the cat's cortical area A1, Brugge
and colleagues (Brugge et al. 1996; Jenison 1998) have demonstrated that first-spike latencies show
systematic gradients as a function of sound-source location within
single neurons' spatial receptive fields. In their model, spike-time
differences between neurons with differing spatial gradients could
carry information about sound-source location. We have shown previously
(Furukawa et al. 2000) that in many cases small
ensembles of cortical neurons can signal sound-source location with
approximately equal accuracy regardless of whether spike times are
recorded relative to stimulus onset or relative to the first spike in
the ensemble response. That is an empirical demonstration of effective
stimulus coding by between-neuron spike times. A similar
relative-timing representation has been modeled formally by
Jenison (2001).
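The idea of a timing reference that is internal to the ensemble can be sketched in Python as follows. This is a minimal illustration, not the analysis of Furukawa et al. (2000): spike times of every neuron in a hypothetical ensemble are re-expressed relative to the earliest spike in the ensemble response, so that no knowledge of stimulus onset is needed. The function name and spike times are hypothetical.

```python
def relative_to_ensemble_first_spike(ensemble):
    """Re-reference spike times (ms) to the earliest spike across all neurons.

    ensemble: one list of spike times per neuron, for a single presentation.
    """
    all_times = [t for spikes in ensemble for t in spikes]
    if not all_times:  # no neuron fired: nothing to re-reference
        return [list(spikes) for spikes in ensemble]
    t0 = min(all_times)
    return [[t - t0 for t in spikes] for spikes in ensemble]

# Hypothetical onset-referenced spike times for three neurons.
onset_referenced = [[15.0, 22.0], [18.0], []]
relative = relative_to_ensemble_first_spike(onset_referenced)

# The same response with an unknown 5-ms offset in the onset reference.
offset = [[t + 5.0 for t in spikes] for spikes in onset_referenced]
relative_offset = relative_to_ensemble_first_spike(offset)
```

The relative representation is unchanged by any common shift in the time reference, which is why it remains available to a nervous system that has no independent measure of stimulus onset.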
Combined spike-count and -timing signaling of stimulus location
One of the goals of this study was to evaluate the importance of spike timing for stimulus coding. For that reason, we went to