REPORT
Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Japan
Submitted 16 December 2005; accepted in final form 13 April 2006
|
|
ABSTRACT |
|---|
|
|
|
INTRODUCTION |
|---|
|
The information-theoretic measure has been applied to a range of problems. For example, human category learning has been explained using information entropy (Corter and Gluck 1992), and the measure has been used to formulate the hypothesis-testing process (Nelson 2005). It is also useful for learning and representing human knowledge in machine intelligence (Nakamura et al. 1983; Quinlan 1983). A neural system has been shown to adaptively maximize information transmission by spike trains (Fairhall et al. 2001). These findings suggest that the brain may quantify information using the information-theoretic measure.
Recent studies have shown that the dorsal premotor cortex (PMd) is involved in motor control based on higher cognitive functions, such as mental rehearsal (Cisek and Kalaska 2004) and memory and generation of motor sequences (Ohbayashi et al. 2003). This suggests that the PMd may be involved in another higher function of measuring information.
The information measure is calculated from probability values. The dopamine neurons of the ventral midbrain show activity that reflects probabilistic uncertainty (Fiorillo et al. 2003). According to functional magnetic resonance imaging studies, the activity in the human midbrain region is modulated by probabilistic uncertainty (Volz et al. 2003) and information entropy (Aron et al. 2004). The dopamine neurons project to the PMd (Berger et al. 1991; Ilinsky et al. 1985; Williams and Goldman-Rakic 1993). The neurons of the lateral intraparietal area are sensitive to the expected reward probability (Platt and Glimcher 1999). This area is connected with the PMd through a neural pathway via the prefrontal cortex (Andersen et al. 1985; Cavada and Goldman-Rakic 1989; Lu et al. 1994). This also suggests that the PMd may be involved in computing the information measure by using signals from the areas that project onto it. In this study, I explored the possibility of measuring neural information by recording from single neurons in the PMd of monkeys.
|
|
METHODS |
|---|
|
Behavioral task
The monkeys were trained to perform three tasks with varying expected amounts of information. They were seated in a primate chair facing a computer display. Eye position was monitored with an infrared eye-tracking system. To begin a run of trials for task A, the monkey fixated on a central cross (size, 1°) presented on a black background for 200 ms (Fig. 1). During fixation, the monkey had to maintain its gaze within 2° of the fixation point. Subsequently, six white dots (size, 1°) appeared around the cross at an eccentricity of 7°. One dot was randomly designated as the reward target, whereas the others were distractors. The monkey maintained fixation on the central cross for 1,000 ms. The disappearance of the fixation cross cued the selection of a dot by a saccadic eye movement. The monkey was free to choose any of the six dots. When it fixated on a dot for 300 ms, the dot was regarded as its choice. When the chosen dot was a distractor, it turned green and the central cross reappeared. The monkey fixated on the cross and maintained the fixation for 1,000 ms. The disappearance of the fixation cross cued the selection of another dot. This sequence was repeated until the monkey chose the reward target. When the chosen dot was the reward target, the dot turned red, all distractors turned green, and the animal received a 0.3-ml drop of water. The location of the reward target was then randomized, and the next trial began.
In task C, the reward target was located at the same site as the informative target in task B. In this task, the first saccade did not provide the monkey with any information because the reward target was known before the first saccade.
First, the monkey performed three tasks: task A; task B, where the lower right dot was the informative target; and task C, where the lower right dot was the reward target. Each task was provided as a trial block, and it ended when the animal chose the lower right dot at the first saccade 20 times. The first fixation of these 20 trials shared an identical visual stimulus configuration and was followed by the same motor response in all the tasks. The order of the tasks was randomized for recording different units. Second, the monkey performed task B in which either the top dot or the lower left dot was informative. This task ended when it chose the informative target at the first saccade 20 times. In these 20 trials, the first cross fixation shared an identical stimulus configuration but was followed by different motor responses. Of the 20 trials performed in each task, the last 15 trials were used for unit recording to ensure that the monkey recognized the task that it was performing. For recording each unit, it performed 60 trials of task A, 40 trials of task B, and 20 trials of task C. (In the early trials of each task A block, it selected different dots as the first choice but then came to select the same dot first. It appeared to stop looking for the informative target and the reward target at the first saccade.)
Neurophysiological recording
The activity of single neurons was recorded from the right premotor cortex of monkey A and from the left premotor cortex of monkey B by using tungsten microelectrodes (FHC). The neurons were randomly selected; no attempt was made to search for a task-related activity. Waveform separation was performed off-line using a template-matching spike sorter (Spike2, CED). After the recording was completed, the animals were killed and then perfused with a fixative. During perfusion, 2 pins were inserted at a distance of 7 mm from each other at known coordinates to aid in the localization of the recording site. The recording sites were plotted on the basis of the positions of these pins.
Information measure
The expected information during the first fixation was calculated as follows. Let $p_i$ denote the probability that the $i$th dot is the reward target ($i = 1, \ldots, 6$). In task A, $p_i$ is 1/6 for any dot before the first choice is made. Therefore the information entropy is $-\log_2(1/6)$ (bits). If the chosen dot is a distractor, $p_i$ becomes 0 for the chosen dot and 1/5 for the other dots, thereby decreasing the information entropy to $-\log_2(1/5)$. If the chosen dot is the reward target, $p_i$ becomes 1 for the chosen dot and 0 for the other dots, and the information entropy becomes 0. Because the probability that the chosen dot is a distractor is 5/6, the expected information (that is, the expected decrement in the information entropy) is $-\log_2(1/6) - (5/6)[-\log_2(1/5)]$, which is nearly 0.65 bits. The expected information is the same for all the dots. In task B, for all the dots other than the informative target, the expected information is the same as described in the preceding text. If the informative target is chosen, $p_i$ becomes 1 for the reward target and 0 for the other dots because the reward target is revealed to the monkey. The information entropy then decreases to 0. Therefore the expected information is $-\log_2(1/6)$, which is nearly 2.58 bits. In task C, $p_i$ is always 1 for the reward target and 0 for the other dots; therefore the expected information is 0.
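The arithmetic above can be checked with a short script. This is a minimal sketch, not the author's code: the function names `entropy` and `expected_information` are hypothetical, and the chosen dot is arbitrarily placed at index 0 of each probability list.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(-p * log2(p) for p in probs if p > 0)


def expected_information(prior, outcomes):
    """Expected decrement in entropy.

    outcomes is a list of (probability of outcome, posterior distribution).
    """
    expected_posterior_entropy = sum(w * entropy(post) for w, post in outcomes)
    return entropy(prior) - expected_posterior_entropy


uniform = [1 / 6] * 6
known = [1.0] + [0.0] * 5

# Task A: the chosen dot is a distractor with probability 5/6.
task_a = expected_information(
    uniform,
    [
        (5 / 6, [0.0] + [1 / 5] * 5),  # distractor: five candidates remain
        (1 / 6, known),                # reward target found
    ],
)

# Task B, informative target chosen: the reward location is always revealed.
task_b = expected_information(uniform, [(1.0, known)])

# Task C: the reward location is known before the first saccade.
task_c = expected_information(known, [(1.0, known)])
```

Under these assumptions, `task_a`, `task_b`, and `task_c` should agree with the values of about 0.65, about 2.58, and 0 bits stated in the text.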
Alternative measures of information
I also tested the following two information measures (Nelson 2005
): the expected improvement in the probability of correctly identifying the true target (probability gain) and the expected absolute change in beliefs about the location of the true target (impact).
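Probability gain is defined by the following equation
$$P_g(Q) = \sum_q P(q)\,\max_h P(h \mid q) - \max_h P(h) \qquad (1)$$
Impact is defined by the following equation
$$\mathrm{Impact}(Q) = \sum_q P(q)\,\frac{1}{n}\sum_h \lvert P(h \mid q) - P(h) \rvert \qquad (2)$$
Here $Q$ is the question, $q$ ranges over its possible answers, $h$ ranges over the hypotheses, and $n$ is the number of hypotheses, as in the worked example in the APPENDIX.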
State value in reinforcement learning
In addition to the information measures, I tested the state value, that is, the net present value of the discounted future reward. The state value indicates the proximity of the present state to the reward in reinforcement learning theory (Sutton and Barto 1998). It is given as $E[\sum_j d^j R_j]$, where $R_j$ ($j = 0, 1, 2, \ldots$) denotes the reward obtained by the $(j+1)$th choice in each trial, and $d$ is the discount rate ($0 < d < 1$) that indicates the subject's concern regarding the future reward. In the present study, $R_j$ was 0 or a drop of water depending on the monkey's selections. In task A, the probability that the monkey selects the reward target as the first choice is 1/6. Therefore $E[R_0] = R/6$, where $R$ denotes a 0.3-ml drop of water. The probability that it selects the reward target as the second choice is a product of the probability that the first choice is a distractor, 5/6, and the probability that it selects the reward target from the remaining five white dots as the second choice, 1/5. Therefore $E[dR_1] = d(5/6)(1/5)R = d(R/6)$. Similarly, $E[d^2R_2] = d^2(R/6), \ldots, E[d^5R_5] = d^5(R/6)$. From these, we obtain $E[\sum_j d^j R_j] = (1 + d + d^2 + d^3 + d^4 + d^5)(R/6)$. In task B, the monkeys almost always chose the informative target as the first choice. Assume that this probability is 1. The probability that the informative target is the reward target is 1/6. Therefore $E[R_0] = R/6$. The probability that the monkey selects the reward target as the second choice is a product of the probability that the informative target is not the reward target, 5/6, and the probability that it selects the remaining white dot, 1. Therefore $E[dR_1] = d(5/6)R$. We obtain $E[\sum_j d^j R_j] = (1 + 5d)(R/6)$. In task C, the monkeys almost always chose the reward target as the first choice. Assuming that this probability is 1, $E[R_0] = R$. We obtain $E[\sum_j d^j R_j] = R$. By normalizing the state values to the value $R$, we obtain $(1 + d + d^2 + d^3 + d^4 + d^5)/6$ for task A, $(1 + 5d)/6$ for task B, and 1 for task C. Because $0 < d < 1$, the normalized state values are the greatest in task C and the least in task A.
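The ordering of the normalized state values can be illustrated numerically. The following is a minimal sketch under the assumptions stated in the text (the monkey always chooses the informative target first in task B and the reward target first in task C); the function name `state_values` and the sample discount rates are hypothetical.

```python
def state_values(d):
    """Normalized state values (in units of R) for tasks A, B, and C."""
    if not 0 < d < 1:
        raise ValueError("discount rate d must satisfy 0 < d < 1")
    task_a = sum(d ** j for j in range(6)) / 6  # (1 + d + ... + d^5) / 6
    task_b = (1 + 5 * d) / 6
    task_c = 1.0
    return task_a, task_b, task_c


# For any discount rate strictly between 0 and 1, task A < task B < task C.
for d in (0.1, 0.25, 0.5, 0.75, 0.9):
    a, b, c = state_values(d)
    assert a < b < c
```

The inequality between tasks A and B holds because each power $d^k$ with $k \geq 2$ is smaller than $d$ when $0 < d < 1$.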
Data analysis
The discharge rates of the neural activity recorded during the first cross fixation that was followed by saccades to the lower right dot were compared across the three tasks using the Mann-Whitney U test. In addition, in task B, the neural activity during the first cross fixation was compared between different locations of the informative target using the Mann-Whitney U test. The neural activity during the sixth cross fixation that was followed by saccades to any dot in task A was compared with the neural activity during the first cross fixation that was followed by saccades to the lower right dot in task C. Further, the former activity was compared with the neural activity during the first cross fixation that was followed by saccades to the lower right dot in task A. These comparisons were performed using the Mann-Whitney U test. Unless otherwise stated, P < 0.01 was considered to indicate statistical significance.
To evaluate information-related activity across the population, the mean response of neurons that correlated with the information measures was calculated. This was done after normalizing each neuron's response to its response in task B for increasing responses and in task C for decreasing responses, because the information measures were the greatest in task B and the least in task C. It follows that the normalized increasing and decreasing responses were 1 in tasks B and C, respectively. I examined whether the population responses of information-measuring neurons were fitted by straight lines $Y = a(X - X_o) + 1$, where $X$ and $Y$ were the values of the information measures and the normalized responses, respectively, $X_o$ was the value in task B for increasing responses and in task C for decreasing responses, and $a$ was the coefficient. The values of $a$ were obtained by minimizing the mean square error of the data from the straight lines. Using the F test, the null hypothesis that $Y$ was independent of $X$ was tested against the alternative that the straight line $Y = a(X - X_o) + 1$ fitted the data for each of the three information measures. The normalized responses were also compared between the three tasks using the Mann-Whitney U test.
I also examined the mean response of the neurons that correlated with the state values. Because the normalized state values are the greatest in task C and the least in task A, the response of each neuron was normalized to the response in task C for increasing responses and to that in task A for decreasing responses. I fitted a single straight line $Y = a(X - 1) + 1$ to both the increasing and decreasing responses of each monkey as described below, where $X$, $Y$, and $a$ were the normalized state values, the normalized responses, and the coefficient, respectively. The mean square error of the data from the straight line was minimized over the pairs of values $d$ and $a$. From this, we obtain a single value of $d$ for each monkey; this indicates the monkey's concern regarding the future reward. To fit a single line to both the increasing and decreasing responses, I regarded the decreasing responses as the increasing responses in a reverse order of the state values. In other words, I hypothetically regarded the decreasing responses at state values $(1 + d + d^2 + d^3 + d^4 + d^5)/6$, $(1 + 5d)/6$, and 1 as responses at state values 1, $(1 + d + d^2 + d^3 + d^4 + d^5)/6 + (1 - (1 + 5d)/6)$, and $(1 + d + d^2 + d^3 + d^4 + d^5)/6$, respectively. It follows that the reversed decreasing responses were 1 at the normalized state value of 1. I then fitted a straight line to the normalized responses from all neurons that correlated with the state value of each monkey. Using the F test, the null hypothesis that $Y$ was independent of $X$ was tested against the alternative that the straight line $Y = a(X - 1) + 1$ fitted the data.
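Because the fitted line is constrained to pass through the point where the normalized response equals 1, the least-squares slope has a closed form. The sketch below illustrates this under stated assumptions: the slope is written `a`, the function names are hypothetical, and the response values are made up for illustration and are not data from the study.

```python
def fit_slope_through_anchor(xs, ys, x0):
    """Least-squares slope a for the constrained line Y = a * (X - x0) + 1."""
    numerator = sum((x - x0) * (y - 1.0) for x, y in zip(xs, ys))
    denominator = sum((x - x0) ** 2 for x in xs)
    if denominator == 0:
        raise ValueError("all X values equal x0, so the slope is undefined")
    return numerator / denominator


def mean_square_error(xs, ys, x0, a):
    """Mean square error of the data from the line Y = a * (X - x0) + 1."""
    residuals = [y - (a * (x - x0) + 1.0) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals) / len(residuals)


# Expected information (bits) for tasks A, B, and C, as given in the text.
info_bits = [0.65, 2.58, 0.0]
anchor = 2.58  # task B value, the anchor for increasing responses

# Hypothetical normalized responses for illustration only.
responses = [0.55, 1.0, 0.35]

slope = fit_slope_through_anchor(info_bits, responses, anchor)
error = mean_square_error(info_bits, responses, anchor, slope)
```

For the state-value fit, the same slope formula applies with the anchor set to 1, and the search over the discount rate would wrap this step in an outer loop over candidate values of `d`.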
|
|
RESULTS |
|---|
|
Figure 2A shows the success rate of trials in task A. Both the monkeys completed most of the trials to receive a reward whenever they made their first selection (92 and 76%). Monkey A completed the trials at nearly the same success rate, irrespective of the number of choices necessary to receive the reward. The success rate of monkey B decreased slightly with the number of choices.
In task C, the monkeys almost always chose the reward target at the first saccade (99% of the trials in the case of both the monkeys).
Neural correlates of information measure
I recorded the activity of 1,832 randomly selected neurons in the PMd of the two monkeys (Fig. 3) and analyzed the neural activity that was recorded during the first cross fixation. The neural activity was stored in the database if the fixation was followed by saccades to the informative target in task B or by saccades to the dot at the same position as the informative target in tasks A and C. Thus the activity shared an identical visual stimulus configuration and was followed by the same motor response. The amount of information expected from the subsequent eye movement was the highest in task B and the least in task C. Forty-six percent of the neurons showed significant differences in activity between some of the tasks (844/1,832), and 13% of these neurons showed significant differences between all the tasks (110/844). Of these 110 neurons, 50% (55 neurons) showed activity that reflected the expected amount of information. Figure 4A shows representative data obtained from an information-measuring neuron, which exhibited the greatest activity during task B and the least activity during task C. This activity cannot be explained by sensory input or the preparation of motor response because these were identical for all the tasks. Thus the activity may reflect the amount of information that the monkeys expected to obtain from subsequent eye movements. Of the 55 neurons, 26 showed a significantly greater activity in task B than in task A and a significantly greater activity in task A than in task C. The remaining 29 neurons exhibited the opposite pattern: the least activity in task B and the greatest activity in task C.
To evaluate the information-related activity across the population, I normalized the activity of each information-measuring neuron and plotted its activity as a function of the information-theoretic measure (Fig. 4C). The F test suggests that the change in the population responses could be proportional to the information-theoretic measure [variance ratios were 16.6 (P < 0.01) and 53.0 (P < 0.01) for the increasing and decreasing responses of monkey A and 6.72 (P < 0.025) and 72.8 (P < 0.01) for the increasing and decreasing responses of monkey B, respectively] and such changes between tasks were significant. Such proportionality may reflect the intuition that the amount of information is additive between stochastically independent events.
The F test also suggests proportionality to the alternative measures of information: in the case of probability gain, the variance ratios were 8.67 (P < 0.01) and 52.9 (P < 0.01) for the increasing and decreasing responses for monkey A and 3.11 (not significant) and 29.8 (P < 0.01) for the increasing and decreasing responses for monkey B, respectively. In the case of impact, the variance ratios were 36.6 (P < 0.01) and 49.5 (P < 0.01) for the increasing and decreasing responses for monkey A and 16.2 (P < 0.01) and 79.2 (P < 0.01) for the increasing and decreasing responses for monkey B, respectively. These variance ratios suggest that all population responses were proportional to any of the three information measures except for the increasing response of monkey B that could have been uncorrelated with the probability gain. The variance ratios were smaller in the probability gain than in the other measures except for the decreasing response of monkey A. This suggests that the three information measures showed relatively similar proportionality to the neural responses although the probability gain was slightly inferior to the others.
The information-theoretic measure satisfies the following two conditions for the information measure: decreasing with probability and additive between independent events. The question that then arises is whether or not the alternative measures satisfy these conditions. It is shown that the probability gain decreases with probability but is not additive and the impact neither decreases with probability nor is additive (APPENDIX).
These considerations suggest that, in comparison with the other measures, the information-theoretic measure is the more plausible candidate for the measure of information used in the brain, although the three information measures showed similar proportionality to the neural response.
Motor control by information-measuring neurons and the time course of population response
To examine whether the information-measuring neurons participated in the motor selection process, I analyzed the neural activity with changes in the location of the informative target in task B. Approximately half of the information-measuring neurons (30/55, or 55%) exhibited significant changes in activity during the first fixation. The neuron shown in Fig. 4A was more active when the lower right dot was informative than when the top dot was informative [tasks B and B(t)]. These activities shared the same visual stimulus configuration and the same information value expected from subsequent eye movements but differed in terms of the subsequent eye movements. Therefore the activities may reflect the preparation for those eye movements. This suggests that the information-measuring neurons might also be involved in the selection of motor response based on expected information.
The time course of population responses showed initial differences between the tasks, followed by a gradual increase in the greater responses (Fig. 4, D and E). This suggests that the monkeys were aware of the task type at the beginning of the trials and defined their expected information value based on this awareness.
Neurons sensitive to reward proximity
The 110 neurons that exhibited significant changes in activity between all the tasks included 43 neurons (39%) of another type, whose activity reflected proximity to the reward. Figure 4B illustrates an example of these reward-proximity-coding neurons, which were the least active in task A and the most active in task C. The expected number of eye movements required to receive the reward was the greatest in task A, whereas only one eye movement was required to receive the reward in task C. This suggests that these neurons may encode proximity to the reward. Of these 43 neurons, 33 were the most active in task C and the least active in task A. The remaining 10 neurons showed the opposite pattern, that is, they were the most active in task A and the least active in task C.
I fitted the linear function of the state value to the normalized activity of the reward-proximity-coding neurons. The values of d minimizing the mean square error were 0.742 for monkey A and 0.515 for monkey B. Because the value of d indicates the subject's concern regarding the future reward, the obtained values suggest that monkey A was more concerned regarding the future reward than monkey B. This might have caused monkey A to continue task A longer than monkey B (Fig. 2A). The variance ratios were 45.4 (P < 0.01) for monkey A and 177 (P < 0.01) for monkey B, suggesting that the state value explains the responses of the reward-proximity-coding neurons well.
Distribution of task-related neurons
To address whether the present data could be a result of random activity, consider the six possible orderings of activity across the three tasks. The information-measuring neurons increased their activity in the order of tasks C, A, and B or B, A, and C. The reward-proximity-coding neurons increased their activity in the order of tasks A, B, and C, or C, B, and A. Each of the three neuron types thus corresponds to two of the six orderings. If the data were a result of random activity, then the information-measuring neurons, the reward-proximity-coding neurons, and the other neurons would have been present in equal proportions among the 110 neurons. The distribution of the three neuron types deviated widely from equal proportions (50, 39, and 11%, respectively), suggesting that the present data were not the result of random activity.
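The size of this deviation can be gauged with a chi-square goodness-of-fit statistic. The source does not report a formal test here, so the following is only an illustrative sketch: the count of 12 "other" neurons is inferred as 110 - 55 - 43, and the p-value uses the closed form of the chi-square survival function for 2 degrees of freedom.

```python
from math import exp

# Counts among the 110 neurons; "other" is inferred as 110 - 55 - 43.
observed = {
    "information-measuring": 55,
    "reward-proximity-coding": 43,
    "other": 12,
}

total = sum(observed.values())
expected = total / len(observed)  # equal proportions under random activity

# Pearson chi-square statistic against equal expected counts.
chi2 = sum((count - expected) ** 2 / expected for count in observed.values())

# With 3 categories there are 2 degrees of freedom, for which the
# chi-square survival function is exactly exp(-x / 2).
p_value = exp(-chi2 / 2)
```

Under these assumptions the statistic is large relative to its 2 degrees of freedom, which is in line with the conclusion that the three types were not present in equal proportions.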
|
|
DISCUSSION |
|---|
|
The neural activity related to reward proximity has previously been observed in the anterior cingulate cortex (Shidara and Richmond 2002) and the caudate nucleus (Kawagoe et al. 1998). The present neural activity related to reward proximity indicates that the PMd also plays a role in the calculation of reward proximity.
The present study has shown that the brain may measure information using information entropy. The ability to measure information is essential to seek more information. The information-theoretic measure has been shown to serve human category learning (Corter and Gluck 1992) and the representation of human knowledge in computers (Nakamura et al. 1983; Quinlan 1983). A mechanism calculating the information entropy could contribute to the general information-seeking activity.
|
|
APPENDIX |
|---|
|
I examined whether the alternative measures satisfied the two conditions for the information measure: decreasing with probability and additive between independent events. For the previously mentioned example of identifying a playing card, the probability gain of the number on the card is obtained as follows. $Q$ is the question "What is the number on the card?"; $q$ and $h$ take the values $1, \ldots, 13$. $P(q)$ is 1 for the $q$ that is the number on the card and 0 otherwise. For that $q$, $P(h \mid q)$ is 1 for the $h$ that is the number on the card and 0 otherwise. $P(h)$ is 1/13 for any $h$. Consequently, $P_g(Q) = (1)\max[1, 0, \ldots, 0] - \max[1/13, \ldots, 1/13] = 12/13$. Similarly, the probability gains of the suit and the card name (for example, "heart seven") are 3/4 and 51/52, respectively; $12/13 > 3/4$, and $12/13 + 3/4$ is not equal to $51/52$. It follows that the probability gain decreases with probability but is not additive.
The impact of the number on the card is $(1)(\lvert 1 - 1/13 \rvert + (12)\lvert 0 - 1/13 \rvert)/13 = (2)(12)/13^2 = 24/169$. Similarly, the impacts of the suit and the card name are 6/16 and 102/2,704, respectively; $24/169 < 6/16$, and $24/169 + 6/16$ is not equal to $102/2{,}704$. It follows that impact neither decreases with probability nor is additive.
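The card arithmetic above can be reproduced with exact fractions. This is a minimal sketch rather than the author's procedure: the function names are hypothetical, the true hypothesis is arbitrarily placed at index 0, and the division of impact by the number of hypotheses is an assumption inferred from the worked example.

```python
from fractions import Fraction


def probability_gain(p_q, p_h_given_q, p_h):
    """Expected best posterior probability minus best prior probability."""
    expected_best = sum(pq * max(post) for pq, post in zip(p_q, p_h_given_q))
    return expected_best - max(p_h)


def impact(p_q, p_h_given_q, p_h):
    """Expected absolute change in beliefs, averaged over the hypotheses."""
    n = len(p_h)
    return sum(
        pq * sum(abs(a - b) for a, b in zip(post, p_h)) / n
        for pq, post in zip(p_q, p_h_given_q)
    )


def resolved_question(n):
    """n equally likely hypotheses; the answer reveals the true one (index 0)."""
    p_h = [Fraction(1, n)] * n
    one_hot = [Fraction(1)] + [Fraction(0)] * (n - 1)
    p_q = list(one_hot)                    # P(q) is 1 for the true answer
    p_h_given_q = [one_hot for _ in range(n)]
    return p_q, p_h_given_q, p_h


number, suit, card = resolved_question(13), resolved_question(4), resolved_question(52)

gain_number, gain_suit, gain_card = (probability_gain(*x) for x in (number, suit, card))
impact_number, impact_suit, impact_card = (impact(*x) for x in (number, suit, card))

# Neither measure is additive across the independent number and suit questions.
gain_is_additive = gain_number + gain_suit == gain_card
impact_is_additive = impact_number + impact_suit == impact_card
```

Using `Fraction` keeps the comparisons exact, so the non-additivity checks do not depend on floating-point rounding.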
Decrement in variance satisfies the two conditions for information measure. However, variance is not available unless a statistical variable is provided. As seen in the example of card identification, humans estimate the amount of information even in the absence of a statistical variable. In the present tasks, no statistical variable was provided, suggesting that the brain did not use variance for information measure.
|
|
GRANTS |
|---|
|
|
|
ACKNOWLEDGMENTS |
|---|
|
|
|
FOOTNOTES |
|---|
Address reprint requests and other correspondence: Dept. of Computational Intelligence and Systems Science, Tokyo Institute of Technology, 4259-G3-46 Nagatsuta, Midoriku, Yokohama 226-8502, Japan (E-mail: nakamura{at}dis.titech.ac.jp)
|
|
REFERENCES |
|---|
|
Aron AR, Shohamy D, Clark J, Myers C, Gluck MA, and Poldrack RA. Human midbrain sensitivity to cognitive feedback and uncertainty during classification learning. J Neurophysiol 92: 1144–1152, 2004.
Berger B, Gaspar P, and Verney C. Dopaminergic innervation of the cerebral cortex: unexpected differences between rodents and primates. Trends Neurosci 14: 21–27, 1991.
Cavada C and Goldman-Rakic PS. Posterior parietal cortex in rhesus monkey. II. Evidence for segregated corticocortical networks linking sensory and limbic areas with the frontal lobe. J Comp Neurol 287: 422–445, 1989.
Cisek P and Kalaska JF. Neural correlates of mental rehearsal in dorsal premotor cortex. Nature 431: 993–996, 2004.
Corter JE and Gluck MA. Explaining basic categories: feature predictability and information. Psychol Bull 111: 291–303, 1992.
di Pellegrino G and Wise SP. Visuospatial versus visuomotor activity in the premotor and prefrontal cortex of a primate. J Neurosci 13: 1227–1243, 1993.
Fairhall AL, Lewen GD, Bialek W, and de Ruyter van Steveninck RR. Efficiency and ambiguity in an adaptive neural code. Nature 412: 787–792, 2001.
Fiorillo CD, Tobler PN, and Schultz W. Discrete coding of reward probability and uncertainty by dopamine neurons. Science 299: 1898–1902, 2003.
Fujii N, Mushiake H, and Tanji J. Rostrocaudal distribution of the dorsal premotor area based on oculomotor involvement. J Neurophysiol 83: 1764–1769, 2000.
Hoshi E and Tanji J. Integration of target and body-part information in the premotor cortex when planning action. Nature 408: 466–470, 2000.
Ilinsky IA, Jouandet ML, and Goldman-Rakic PS. Organization of the nigrothalamocortical system in the rhesus monkey. J Comp Neurol 236: 315–330, 1985.
Kawagoe R, Takikawa Y, and Hikosaka O. Expectation of reward modulates cognitive signals in the basal ganglia. Nat Neurosci 1: 411–416, 1998.
Lu MT, Preston JB, and Strick PL. Interconnections between the prefrontal cortex and the premotor areas in the frontal lobe. J Comp Neurol 341: 375–392, 1994.
Nakamura K, Sage AP, and Iwai S. An intelligent data-base interface using psychological similarity between data. IEEE Trans Syst Man Cybern 13: 558–568, 1983.
Nelson JD. Finding useful questions: on Bayesian diagnosticity, probability, impact, and information gain. Psychol Rev 112: 979–999, 2005.
Ohbayashi M, Ohki K, and Miyashita Y. Conversion of working memory to motor sequence in the monkey premotor cortex. Science 301: 233–236, 2003.
Platt ML and Glimcher PW. Neural correlates of decision variables in parietal cortex. Nature 400: 233–238, 1999.
Quinlan JR. Learning efficient classification procedures and their application to chess and games. In: Machine Learning: An Artificial Intelligence Approach, edited by Michalski RS, Carbonell JG, and Mitchell TM. San Mateo, CA: Morgan Kaufmann, 1983, p. 463–482.
Rizzolatti G, Luppino G, and Matelli M. The organization of the cortical motor system: new concepts. Electroencephalogr Clin Neurophysiol 106: 283–296, 1998.
Shannon CE and Weaver W. The Mathematical Theory of Communication. Urbana, IL: University of Illinois Press, 1949.
Shidara M and Richmond BJ. Anterior cingulate: single neuronal signals related to degree of reward expectancy. Science 296: 1709–1711, 2002.
Sutton RS and Barto AG. Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
Volz KG, Schubotz RI, and Cramon DY. Predicting events of varying probability: uncertainty investigated by fMRI. NeuroImage 19: 271–280, 2003.
Williams SM and Goldman-Rakic PS. Characterization of the dopaminergic innervation of the primate frontal cortex using a dopamine-specific antibody. Cereb Cortex 3: 199–222, 1993.