## Abstract

A neural network model was developed to explain the gravity-dependent properties of gain adaptation of the angular vestibuloocular reflex (aVOR). Gain changes are maximal at the head orientation where the gain is adapted and decrease as the head is tilted away from that position; they can be described by the sum of gravity-independent and gravity-dependent components. The adaptation process was modeled by modifying the weights and bias values of a three-dimensional physiologically based neural network of canal–otolith-convergent neurons that drive the aVOR. Model parameters were trained using experimental vertical aVOR gain values. The learning rule aimed to reduce the error between eye velocities obtained from experimental gain values and model output in the position of adaptation. Although the model was trained only at specific head positions, it predicted the experimental data at all head positions in three dimensions. Altering the relative learning rates of the weights and bias improved the model-data fits. Model predictions in three dimensions compared favorably with those of a double-sinusoid function, a fit that minimized the mean square error at every head position and served as the standard against which the model predictions were compared. The model supports the hypothesis that gravity-dependent adaptation of the aVOR is realized in three dimensions by a direct otolith input to canal–otolith neurons, whose canal sensitivities are adapted by the visual-vestibular mismatch. The adaptation is tuned by how the weights from otolith input to the canal–otolith-convergent neurons are adapted for a given head orientation.

## INTRODUCTION

The angular vestibuloocular reflex (aVOR) maintains stability of images on the retina during head movements from activity arising in the semicircular canals. The gain, defined as the ratio of (peak eye velocity)/(peak rotational head velocity), is a direct measure of the performance of the aVOR. A key property of the gain of the aVOR is that it is highly adaptable when there is persistent visual-vestibular mismatch, increasing or decreasing the movement of the eyes to stabilize the visual surround relative to the head (for review, see Cohen and Gizzi 2003; Ito 2002). Adaptation of the gain of the aVOR is also affected by head orientation with respect to gravity (Baker et al. 1987a,b; Tan et al. 1992; Tiliket et al. 1993; Yakushin et al. 2000a, 2003a,b,c). If the aVOR gain is adapted in a particular head position relative to gravity, then the changes in gain are maximal in that head position and minimal or absent in the opposite head position (Yakushin et al. 2005b,c). Moreover, the gain changes roughly follow a sinusoidal profile as a function of head orientation with respect to gravity as the head is moved away from the position in which adaptation took place. Therefore it has been postulated that the aVOR adaptation is modulated by otolith organ input because the otolith organs are the primary sensors of head position with respect to gravity. It was further proposed that aVOR adaptation constitutes gravity-independent and gravity-dependent processes (Xiang et al. 2004; Yakushin et al. 2000a, 2003a,b,c, 2005b,c). Exactly how the otolith input spatially tunes the adaptation of the aVOR, giving rise to the gravity-dependent and gravity-independent components, is not known.

Modeling of aVOR function has proved useful for revealing the organization of the aVOR (for review, see Raphan and Cohen 2002) and in understanding the physiological basis of aVOR gain adaptation (Albus 1971; Highstein et al. 2005; Hirata and Highstein 2001, 2002; Ito 1984, 2002; Lisberger et al. 1994a,b,c; Marr 1969).

Using a matrix of gain values, another model was previously implemented in three dimensions to explain the contribution of the individual canals to the gain of the aVOR (Yakushin et al. 1998). This model is physiologically based, using a canal projection matrix (*T _{can}*) to reflect the projection of the head velocity in head coordinates into a canal coordinate frame. A gain matrix (*G*) then projects the canal-based vector into the head coordinate frame, driving the oculomotor system. This latter transformation can be represented by a matrix *T _{head}* (Yakushin et al. 1998). In other studies, it was shown that the gravity-dependent adaptation data in three dimensions can be represented by a double sinusoid (Yakushin et al. 2005c). Such a fit assumes that gain adaptation falls off as a sinusoid from some peak value, regardless of the direction of head orientation away from the point of maximum gain, suggesting that the otolith organs tune the gain adaptation relative to gravity (Yakushin et al. 2005c). A probable basis for gravity-dependent adaptation is the extensive convergence of otolith inputs onto semicircular canal recipient neurons in the vestibular nuclei (Baker et al. 1984; Brettler and Baker 2001; Curthoys and Markham 1971; Dickman and Angelaki 2002; Duensing and Schaefer 1958; Fukushima et al. 1990; Graf et al. 1993; Perlmutter et al. 1998; Yakushin et al. 2005a, 2006). The purpose of this study was to implement a mapping of the elements of the gain matrix (Yakushin et al. 1998) to the structure of the canal–otolith convergence in the central vestibular system using an artificial neural network. Such a physiologically based neural network would demonstrate the feasibility of this hypothesis and give insight into the realization of the gravity-dependent adaptation in three dimensions.

## METHODS

### Experimental protocols

Experimental data used for comparison with model predictions came from five cynomolgus monkeys. The surgical and experimental protocols were described in previous publications (Yakushin et al. 2000b, 2003c) and were approved by the Institutional Animal Care and Use Committee (IACUC) of the Mount Sinai School of Medicine. Briefly, one scleral search coil measured the horizontal and vertical components of eye position (Judge et al. 1980; Robinson 1963) and a second coil was used to measure the torsional component of eye position (Cohen et al. 1992). During testing, the animals were in darkness and sat in a primate chair in a four-axis vestibular stimulator surrounded by an optokinetic drum. The diameter of the drum surrounding the animal is 91 cm. Thus the distance between the visual surround and the monkey was 45 cm. Gains were decreased by rotating the animal and visual surround in the same direction and increased by rotating the animal and the visual surround in opposite directions. Adaptation was carried out over a 4-h period in each instance. See Xiang et al. (2004) and Yakushin et al. (2003c, 2005b) for a complete description of the protocol.

Data used for comparison with the model predictions were obtained following vertical aVOR gain adaptation for single-state (Monkeys 1 and 2), dual-state (Monkeys 3 and 4; Yakushin et al. 2003c), and triple-state conditions (Monkeys 1 and 5). Only data on the dual-state adaptation were obtained from a previous study (Yakushin et al. 2003c). For the single-state condition, Monkeys 1 and 2 were adapted in left-side-down (LSD) and right-side-down (RSD) head positions. For the dual-state–adapted condition (data obtained from a previous study for Monkeys 3 and 4), the vertical aVOR was adaptively decreased in one side-down position while being increased in the opposite side-down position. The triple-state adaptation condition was implemented for the vertical aVOR by decreasing the gain in the LSD position for 20 min, decreasing the gain in the upright position for 20 min, and then increasing the gain in the RSD position for another 20 min. This was repeated four times over a 4-h cycle and data were collected from Monkeys 1 and 5.

Changes in the vertical aVOR gain were measured for each adapted state, with the head tilted from LSD to RSD in 10° increments. For measuring three-dimensional (3D) gain changes, the animal's head was tilted from −90 to 90° in four sequences: prone-supine and LSD–RSD, and in two intermediate planes that were 45° from the prone–supine and LSD–RSD planes. Three-dimensional gain surfaces were then created using a spline-interpolation of the data. The neural network was trained using only the gain values before and after the gain modification at the head orientation where adaptation took place. After training, the neural network model predictions were compared with the spatial gain distribution obtained from the experiments. We also compared the predictions of the neural network model, which was trained at a single head orientation, to single-sinusoid fits to data in one plane and to double-sinusoid fits to data in multiple planes that minimized errors between the experimental data and the fits at every test position after adaptation (Yakushin et al. 2005c).

Paired Student's *t*-tests with a 5% significance level were used to statistically analyze differences between the neural network model predictions and sinusoidal fits to the experimental data.
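As a minimal sketch of this comparison, the paired *t* statistic can be computed from the per-position differences between model predictions and sinusoidal fits. The function name and the sample values below are hypothetical and are not taken from the study; the resulting statistic would then be compared with the critical value of the *t* distribution at the 5% level for the given degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(model, fit):
    """Return (t, degrees_of_freedom) for two paired samples of equal length."""
    if len(model) != len(fit) or len(model) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [m - f for m, f in zip(model, fit)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    std_err = sd_diff / math.sqrt(len(diffs))
    return mean_diff / std_err, len(diffs) - 1


# Hypothetical gain changes (%) at four head positions, for illustration only.
model_pred = [1.0, 2.0, 3.0, 4.0]
sinus_fit = [0.0, 1.0, 1.0, 2.0]
t_value, dof = paired_t_statistic(model_pred, sinus_fit)
```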

### Conceptual basis of model development

Artificial neural networks are composed of weighted excitatory and inhibitory connections that sum at processing units or nodes (Anderson 1995; Bishop 1995; Rumelhart and McClelland 1986). These networks have been extensively used to explain the behavior of neural systems that require motor learning (Anastasio 1992; Anastasio and Robinson 1989, 1990; Quinn et al. 1998; Zipser and Andersen 1988). A number of problem-specific learning rules were used in training networks, depending on whether the learning is supervised or unsupervised (Hebb 1949; Kohonen 2000; Rumelhart et al. 1986; Widrow and Hoff 1960).

Because VOR adaptation is driven by an error signal between eye velocity and surround velocity, we assumed that the learning is supervised and is based on a delta rule (Rumelhart et al. 1986; Widrow and Hoff 1960), in which the weights are updated in proportion to the gradient of error function. Because the network is distributed in angular space coordinates, we also implemented a local learning rule for each of the cells within the distributed network that represent canal–otolith-convergent neurons within the vestibular nuclei (Baker et al. 1984; Brettler and Baker 2001; Curthoys and Markham 1971; Dickman and Angelaki 2002; Duensing and Schaefer 1958; Fukushima et al. 1990; Graf et al. 1993; Perlmutter et al. 1998; Yakushin et al. 2005a, 2006). Therefore the neural network implementation tested the hypothesis that the network of canal–otolith-convergent neurons learns by a supervised delta rule using the error between eye velocity and surround velocity. We further hypothesized that the learning is implemented locally on each of the gain elements, i.e., weights that realize the 3D gain matrix of the aVOR, based on projections of otolith polarization vectors^{1} on canal recipient central vestibular neurons.

The aVOR gain matrix was characterized as a collection of distributed components, each composed of a bias value and a sum of weighted contributions from central neurons receiving canal input and input from the otolith organs with otolith polarization vectors assumed to be lying in canal planes (Raphan and Schnabolk 1988; Schnabolk and Raphan 1992; Sheliga et al. 1999). The model predictions were statistically compared with data from monkeys with respect to horizontal, vertical, and torsional gain adaptation, obtained over a wide range of head orientations to show the feasibility of the proposed neural organization. We further tested the model by comparing its predictions of gain changes with experimental data as a function of head orientation relative to gravity after adaptation at two and three different head positions, i.e., multistate adaptation.

### Model overview

A schematic of the neural network model for adapting the gains during gravity-dependent adaptation is shown in Fig. 1*A*. The network of canal-related neurons has specific input corresponding to the plane of the canals. These neurons process the canal input and project this information to the oculomotor system with a specific gain implementing each of the *g _{ij}* in the gain matrix (Yakushin et al. 1998). The gain matrix therefore operates on the canal signal and drives the oculomotor system in three dimensions. In addition, these canal-related neurons receive input from otolith polarization vectors that are weighted and can modify the canal transduction gains *g _{ij}*. Modification of these gains is accomplished by a weighted sum of 108 otolith-related neurons as well as a bias input in a particular canal plane that modify the canal transduction. Each plane in the model modifies three gain elements in a column, representing how that class of canal neurons activates roll, pitch, and yaw eye velocity. This model structure is supported by the fact that there are canal-related cells in the vestibular nuclei that receive otolith input over a wide range of polarization angles (Dickman and Angelaki 2002).

A simplified diagram of how canal–otolith and canal-only neurons might produce the aVOR response is shown in Fig. 1*B*. The gravity-dependent response component after adaptation is implemented by the weights (*w _{ijl}*) that are adapted for a given head orientation. After adaptation, these weights, which are set by the otolith input depending on head orientation, determine the canal-related response. The gravity-independent response is determined by the canal-only neurons, whose transduction gain is represented by the bias value (*b _{ij}*). These units would respond similarly, regardless of head orientation. Thus the model is physiologically based, although the actual connectivity of neurons in the vestibular nuclei among these cells is not known. We have represented the connectivity as a weighted summation of these cells whose weights adapt with visual-vestibular mismatch. The question we sought to answer was whether the weights of the 108 neural inputs to the canal transduction gains could be adapted to match the data at one head position and then fit the data at all other head positions in 3D space. Whether the canal sensitivity of neurons in the vestibular nuclei is modified in a gravity-dependent way by this weighted otolith input as predicted by the model will be answered by neural recordings in the vestibular nuclei during and after gravity-dependent adaptation.

The weights connecting the input and the output of the artificial neurons represent the modifiable elements governing the gravity-dependent aVOR gain adaptation, whereas the bias parameter implements the gravity-independent gain adaptation of the aVOR. The computed eye velocity using the neural network (Fig. 1, Eye Vel) is subtracted from the product of the head velocity and experimentally obtained gain values determined at the adaptation site (Fig. 1, Target Eye Vel). The difference is the velocity error, which is used in training the weight and bias of the neurons, by two separate learning rules, which modify the *g _{ij}* values (Fig. 1).

### Head–canal coordinate frames

Head and eye position and velocity were referenced to a head coordinate frame, defined by the roll (X_{H}), pitch (Y_{H}), and yaw (Z_{H}) axes of the head (Fig. 2*A*). Normal vectors to the anterior, posterior, and lateral semicircular canal planes defined the canal coordinate frame (X_{C}, Y_{C}, and Z_{C}) and were used to describe the activation of the canals. The relationship between head and canal coordinates has been derived (Yakushin et al. 1998) as (1), where θ_{a} and ψ_{a} are the second and third Euler angle rotations (Goldstein 1980) of the head roll axis (X_{H}), θ_{p} and ψ_{p} are the second and third Euler angle rotations of the head pitch axis (Y_{H}), and φ_{l} and θ_{l} are the first and second Euler angle rotations of the head yaw axis (Z_{H}). By choosing these angles appropriately, the axes of the rotated coordinate vectors are aligned with the normals to the anterior, posterior, and lateral canals, respectively (Fig. 2*B*). Based on experimental data (Yakushin et al. 1995, 1998), these angles were set as: θ_{a} = θ_{p} = −40°, ψ_{a} = ψ_{p} = 135°, φ_{l} = 0°, and θ_{l} = −30°. Thus given a velocity vector in the head frame in roll, pitch, and yaw, *Eq. 1* will convert the vector into the canal frame so that the adaptation related to the anterior, posterior, and lateral canals can be separated.

If we denote the inverse of *T _{can}* as (2), then a projection mapping from canal coordinates back to head coordinates, *T _{head}*, can be given by (3), where *g _{ij}* represents the contribution to the *i*th component of the aVOR from the *j*th canal of the left labyrinth. That is, *g _{0j}* represents the contribution to the roll aVOR (0) from the anterior (*j* = 0), posterior (*j* = 1), and lateral (*j* = 2) canals. Similarly, *g _{1j}* and *g _{2j}* represent the contribution to the pitch (1) and yaw (2) aVOR from the *j*th canal. The elements *g _{ij}* can then be viewed as a matrix *G*, given by

$$
G = \begin{bmatrix} g_{00} & g_{01} & g_{02} \\ g_{10} & g_{11} & g_{12} \\ g_{20} & g_{21} & g_{22} \end{bmatrix} \tag{4}
$$

The parameters of this matrix are functions of otolith input and determine the gravity dependency of the gain of the aVOR.
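One reading of the text, offered as a sketch rather than as the published *Eq. 3*, is that each element of *T _{head}* is the corresponding element of the inverse of *T _{can}* scaled by *g _{ij}*. Under that assumption, a gain matrix of all ones returns the head velocity unchanged (unity gain), and scaling one row of *G* scales the matching eye velocity component. The canal matrix below is a hypothetical stand-in and is not the matrix of *Eq. 1*; all function names are illustrative.

```python
def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("matrix is singular")
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    return [[x / det for x in row] for row in adj]


def t_head(t_can, gains):
    """Assumed form: element-wise product of the gains with inverse(T_can)."""
    inv = inverse3(t_can)
    return [[gains[i][j] * inv[i][j] for j in range(3)] for i in range(3)]


# Hypothetical canal projection matrix (NOT the values from Eq. 1).
T_CAN_DEMO = [[0.6, 0.6, 0.5], [-0.6, 0.6, 0.5], [0.0, -0.5, 0.9]]

head_vel = [0.0, 60.0, 0.0]              # pitch rotation, deg/s (illustrative)
canal_vel = mat_vec(T_CAN_DEMO, head_vel)

unity = [[1.0] * 3 for _ in range(3)]
eye_unity = mat_vec(t_head(T_CAN_DEMO, unity), canal_vel)

# Doubling the pitch row of G doubles the pitch eye velocity only.
doubled = [[1.0] * 3, [2.0] * 3, [1.0] * 3]
eye_doubled = mat_vec(t_head(T_CAN_DEMO, doubled), canal_vel)
```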

### Development of the neural network for determining g_{ij}

Because each *g _{ij}* represents the contribution to the *i*th direction of the head from the *j*th canal, each otolith polarization vector is assumed to contribute activity in a particular canal plane. Thus in the neural network model (Fig. 1), the gain parameters *g _{ij}* were implemented as a parallel distributed network in which each *g _{ij}* is a weighted sum of the projections of a unit vector along the acceleration of gravity on the individual polarization vectors **p̂ _{l}**, plus a bias *b _{ij}*, which represents the part of the gain that is independent of gravity, given as

$$
g_{ij} = b_{ij} + \sum_{l=0}^{n-1} w_{ijl}\,\langle \hat{\mathbf{a}}_{g}, \hat{\mathbf{p}}_{l} \rangle \tag{5}
$$

where *i* = 0 (torsional), 1 (vertical), and 2 (horizontal); *j* = 0 (anterior canal), 1 (posterior canal), and 2 (lateral canal); and *n* = number of units.

The unit vector **â _{g}** is in the direction of the equivalent acceleration of gravity, pointing upward from the earth, whereas **p̂ _{l}** is a unit vector in the direction of the polarization for a particular neural unit. For a particular head orientation, 〈**â _{g}**, **p̂ _{l}**〉 is the inner product (dot product) between the acceleration of gravity and individual polarization vectors, which implements a cosine tuning of activity. The more a unit's vector coincides with the direction of the equivalent acceleration of gravity, the larger the positive stimulus. When the unit's polarization vector is oriented opposite the direction of the acceleration of gravity, the projection is negative. When the head is tilted to various orientations, projections will be sinusoidally modulated. The index *l* runs from 0 to *n* − 1 and enumerates all the units. Because of the central convergence of otolith units onto canal-related neurons, it was operationally equivalent to consider the *n* polarization units divided into three groups, each associated with a specific canal plane.
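To make *Eq. 5* concrete, the sketch below evaluates a single gain element from a bias and a set of weighted projections. It assumes, for illustration only, 36 polarization vectors spaced at 10° within one plane that contains the gravity direction, so that each projection reduces to the cosine of an angular difference. The function names and the weight values are hypothetical.

```python
import math


def polarization_angles(n_units=36):
    """In-plane directions of the units, evenly spaced over 360 degrees."""
    return [math.radians(360.0 * l / n_units) for l in range(n_units)]


def projections(tilt_deg, angles):
    """<a_g, p_l> for unit vectors in one plane: cosine of the separation."""
    tilt = math.radians(tilt_deg)
    return [math.cos(phi - tilt) for phi in angles]


def gain_element(bias, weights, projs):
    """g = b + sum_l w_l * <a_g, p_l>, a scalar instance of Eq. 5."""
    return bias + sum(w * p for w, p in zip(weights, projs))


angles = polarization_angles()
# Hypothetical weights, largest for units aligned with gravity at tilt 0.
weights = [0.01 * math.cos(phi) for phi in angles]

g_aligned = gain_element(1.0, weights, projections(0.0, angles))
g_opposite = gain_element(1.0, weights, projections(180.0, angles))
```

With these illustrative weights the gravity-dependent term adds to the bias when gravity is aligned with the most heavily weighted units and subtracts from it in the opposite orientation.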

We developed an updating (learning) rule that is very similar to the generalized delta rule, given by (6) although there was no hidden layer or nonlinear activation function (Rumelhart et al. 1986).

The parameter *k* is an adjustable parameter determining the speed of the learning process. The error *E _{i}* is the *i*th component of the velocity error in head coordinates. These errors are represented as angular velocities around the roll, pitch, and yaw axes of the head, with *E _{0}*, *E _{1}*, and *E _{2}* representing the torsional, vertical, and lateral velocity errors, respectively, which are computed as follows.

Let ν_{h} = [ν_{h}^{0}, ν_{h}^{1}, ν_{h}^{2}]^{T} denote the head velocity in the head coordinate frame, with the superscripts 0, 1, and 2 representing the roll, pitch, and yaw components, respectively. Then the corresponding velocity transformed into the canal frame can be determined by

$$
\nu_{c} = \begin{bmatrix} \nu_{c}^{0} \\ \nu_{c}^{1} \\ \nu_{c}^{2} \end{bmatrix} = T_{can}\,\nu_{h} \tag{7}
$$

with 0, 1, and 2 representing anterior, posterior, and lateral canal, respectively. A subsequent multiplication by *T _{head}* converts the head velocity in the canal frame to the head frame and also modifies the velocity signal by appropriate gain components to generate the eye velocity command (ν_{e}) in the head frame, given by

$$
\nu_{e} = T_{head}\,\nu_{c} \tag{8}
$$

The target gain value *g*^{(target)} was obtained from the data and the target eye velocity ν_{e}^{(target)} was simulated by multiplying the target gain by the head velocity ν_{h}. The difference between ν_{e}^{(target)} and the eye velocity calculated from *Eq. 8* forms the error vector

$$
E = \nu_{e}^{(target)} - \nu_{e} \tag{9}
$$

that is used to train the gain matrix *G*.

The parameter *s _{ij}*, in *Eq. 6*, is the amount of command velocity produced along the *i*th head coordinate arising from the velocity induced along the *j*th canal. This parameter thus represents the canal input during the adaptation. It is independent of the gain matrix and a larger value of *s _{ij}* induces larger weight modification. We define the parameter matrix *S* = [*s _{ij}*] as (10). Thus *s _{ij}* represents the stimulus strength contributed by ν_{c}^{j}. Multiplication of *s _{ij}* by *g _{ij}* would give the actual eye velocity contributed by ν_{c}^{j}. Thus *s _{ij}* forms a matrix of computed values that does not require adaptation. It is equivalent to the derivative of the activation function in a generalized delta-rule learning scheme. For example, *g _{00}* *s _{00}* = *g _{00}*Aν_{c}^{0} is the component of roll eye velocity induced by velocity along the anterior canal normal, which is initiated by the head rotation. In the case of gain increase where the error is always positive, a positive *s _{00}* increases *g _{00}* to reduce the error. If *s _{00}* is negative, *g _{00}* must decrease its value to reduce the error. In addition to the sign of *s _{ij}*, the magnitude of *s _{ij}* also contributes to the rate of gain change.

From *Eq. 6* it is thus suggested that a weight change over time is proportional to the velocity error (Δ*E _{i}*/Δ*t*), the projection of the acceleration of gravity onto the polarization vector, and the magnitude of the projected head velocity in canal coordinates. This corresponds to the components implemented in learning using the generalized delta rule, i.e., the error, the input, and the derivative of the activation function (Rumelhart et al. 1986). The bias values *b _{ij}* can be trained in a similar way based on canal input and velocity error, except that they are not affected by the gravity-encoded projection (11). A different learning rate factor *h* has been introduced, enabling us to determine the effects of differential learning rates of gravity-dependent and gravity-independent components on the adapted gains. The factor *f* is to ensure that the amount of update for gravity-independent gain (contributed by Δ*b _{ij}*) is equal to that for gravity-dependent gain (contributed by Δ*w _{ijl}*) when *h* and *k* are the same. The value of *f* has been set at 4 by trial and error.
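The two update rules can be sketched for a single gain element as follows. This is an illustration under stated assumptions, not the published *Eqs. 6* and *11*: the weight update is taken to be proportional to the rate *k*, the error, the stimulus, and the gravity projection, and the bias update to *h*, *f*, the error, and the stimulus. The step-size constant `eta`, the single-plane geometry, and the choice of `f` as the sum of squared projections (which equalizes the two contributions in this toy geometry and therefore differs from the value of 4 used in the full model) are all assumptions of this sketch.

```python
import math


def train_gain(target_gain, adapt_tilt_deg, k=1.0, h=1.0, eta=0.01,
               stimulus=1.0, tol=0.01, max_iter=1000, n_units=36):
    """Delta-rule training of one gain element at one head tilt (toy model)."""
    angles = [math.radians(360.0 * l / n_units) for l in range(n_units)]
    tilt = math.radians(adapt_tilt_deg)
    projs = [math.cos(phi - tilt) for phi in angles]   # <a_g, p_l>
    f = sum(p * p for p in projs)   # equalizing factor for this geometry only
    weights = [0.0] * n_units       # gravity-dependent part starts at zero
    bias = 1.0                      # gravity-independent part starts at unity
    steps = 0
    while steps < max_iter:
        gain = bias + sum(w * p for w, p in zip(weights, projs))
        error = (target_gain - gain) * stimulus        # velocity error
        if abs(error) < tol:
            break
        # Weight update: rate * error * stimulus * gravity projection.
        weights = [w + eta * k * error * stimulus * p
                   for w, p in zip(weights, projs)]
        # Bias update: same form without the gravity projection.
        bias += eta * h * f * error * stimulus
        steps += 1
    return weights, bias, projs, steps


weights, bias, projs, steps = train_gain(target_gain=1.4, adapt_tilt_deg=90.0)
dependent = sum(w * p for w, p in zip(weights, projs))   # from the weights
independent = bias - 1.0                                 # from the bias
```

With equal rates the gravity-dependent and gravity-independent changes come out equal at the adaptation position, and their sum approaches the target change to within the stopping tolerance.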

Thus units having the largest projection of the acceleration of gravity will be trained the most in a gravity-dependent manner, whereas other units will be trained, dependent on the individual degree of the projection of gravity on the polarization vector. Once trained, the gain change in *g _{ij}* will be fully represented by the array of weight values for the specific head position in which adaptation took place. When the head turns to any new position, the weights will remain the same, whereas the array of projections 〈**â _{g}**, **p̂ _{l}**〉 changes. Thus units that originally had the largest projection will have smaller projections, reducing the contribution of the units that were maximally trained previously. Other units that are now in line with the gravity vector will now have a larger projection, but will have reduced associated weights. Therefore the overall gain values will be less than the maximal value, predicting the dependence of the gain adaptation on head orientation.

### Choice of neural network units

The number of neural units used to simulate the combined contribution of the semicircular canals and otolith is not a critical aspect of the model. We have chosen 36 polarization vectors evenly distributed at 10° intervals in each of the three approximately orthogonal canal planes, giving a total of 108 units that were used for simulation. The angular resolution is consistent with the experimental resolution, which was in 10° steps. The 108 units spatially reside in planes coinciding with the canal planes and, within each of the three planes, units are distributed omnidirectionally the same way as the otolith units. This provides the flexibility to represent both the otolith organs and how they interact with the canal system in exerting their contribution to the gravity-dependent adaptation. The positions and orientations of the 108 polarization vectors were represented in canal planes so that they could easily interact with the canal-recipient units (Fig. 3*A*). Their gravitational projection profiles when the head is upright are shown in Fig. 3*B*. The polarization vectors distributed within the anterior canal plane were symmetrical with those distributed within the posterior canal plane and were more vertically tilted, whereas the polarization vectors distributed within the lateral canal plane were tilted by 30° around the interaural axis. Therefore the projection profiles of polarization vectors in the anterior/posterior canal planes are sinusoids with higher amplitude, and thus exert more weight in gravity-dependent gain adaptation than those in the lateral canal plane (Fig. 3*B*). When the head is placed in a different orientation, the projection profiles of the units change and influence the adaptation.
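The effect of plane tilt on the projection profile can be sketched as follows. The code places 36 unit vectors at 10° steps in a plane and projects them onto an upward gravity-equivalent unit vector. The parameterization (a plane pitched about the interaural axis by a given angle from horizontal, in a head frame with roll, pitch, and yaw axes as X, Y, and Z) is an assumption made for illustration and is not the canal geometry of *Eq. 1*.

```python
import math


def plane_units(pitch_deg, n_units=36):
    """Unit vectors at even steps in a plane pitched about the Y (interaural) axis."""
    alpha = math.radians(pitch_deg)
    u = (math.cos(alpha), 0.0, math.sin(alpha))   # in-plane axis, tilted from X
    v = (0.0, 1.0, 0.0)                           # in-plane axis along Y
    units = []
    for l in range(n_units):
        phi = math.radians(360.0 * l / n_units)
        units.append(tuple(math.cos(phi) * a + math.sin(phi) * b
                           for a, b in zip(u, v)))
    return units


def projection_profile(units, gravity=(0.0, 0.0, 1.0)):
    """Dot product of each polarization vector with the gravity-equivalent vector."""
    return [sum(pc * gc for pc, gc in zip(p, gravity)) for p in units]


# A plane tilted 30 degrees from horizontal versus a vertical plane.
tilted_30 = projection_profile(plane_units(30.0))
vertical = projection_profile(plane_units(90.0))
amp_tilted_30 = max(tilted_30)
amp_vertical = max(vertical)
```

In this parameterization the profile amplitude equals the sine of the plane's tilt from horizontal, so a more vertical plane yields a larger sinusoid, in line with the qualitative description above.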

### Procedure of the training

The neural network first underwent an initialization process where both the bias values and the weights were trained to produce the initial state of the system. This initialization procedure was motivated by the fact that in an unadapted animal the gain is close to one at all head orientations with small variation with respect to gravity. We also sought to establish the initial state in an unbiased manner, so as not to influence the succeeding gravity-dependent adaptation process. We thus initialized the neural network weights to zero and bias values to 1. We then trained the network by adapting the bias values for the units to a mean value of preadapted gain. This represented the “normal” gain of the aVOR. Consistent with the data, the preadapted gains were not uniformly distributed when the head was oriented into various positions, possibly from residues from previous adaptations (Yakushin et al. 2003b,c). We then adapted the weights only, holding the bias values constant to simulate the slight variation in the gain of preadapted normal monkeys. This was the initial preadapted state from which we performed the concurrent gravity-dependent and gravity-independent gain adaptation simulations of the bias and weights at a particular head orientation, using the actual measured gain value at the head position where the adaptation occurred as the target value.

For vertical gain adaptation, we set the head velocity ν_{h} for training as (12). The *k* factor governed the training speed of the weights. For simplicity, we set the default value as 1 and found by trial and error that a value of 3 was about the threshold of the system's convergence. The *h* factor, which determines the training speed of the bias values, was initially set to 1 as well, which made the training speed for the bias value the same as that for the weights. After the norm of the error vector reached below a preset threshold (0.01), the training stopped and the weights and bias values were retained as the basis for predicting gain values at other head tilt angles. Gain changes in percentage were then calculated from the predicted gains and the preadapted gains.
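The percentage calculation can be written as a one-line helper. The source does not spell out the formula, so the sketch assumes the change is expressed relative to the preadapted gain; the function name and the sample gains are hypothetical.

```python
def percent_gain_change(adapted_gain, preadapted_gain):
    """Gain change as a percentage of the preadapted gain (assumed definition)."""
    if preadapted_gain == 0:
        raise ValueError("preadapted gain must be nonzero")
    return 100.0 * (adapted_gain - preadapted_gain) / preadapted_gain


# Illustrative values only: a gain raised from 1.0 to 1.2, and lowered to 0.6.
increase = percent_gain_change(1.2, 1.0)
decrease = percent_gain_change(0.6, 1.0)
```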

For simulation, the weights and bias values remained fixed throughout, whereas the distributed gain matrix *G* was changed as the head tilt angle changed. For every new tilt angle, the polarization vector **p̂ _{l}** of the individual units was reoriented and the new projection 〈**â _{g}**, **p̂ _{l}**〉 was calculated. The matrix *G* was then updated following *Eq. 5* and the eye velocity was calculated according to *Eq. 8*. The computed aVOR gain was then obtained by taking the ratio of eye velocity over head velocity.
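The simulation step can be sketched for one gain element: the weights and bias are held fixed and only the projections are recomputed at each tilt. The weights below are hypothetical values of the kind a delta rule would produce when trained at the right-side-down position (taken here as +90°), and the single-plane geometry is an assumption of the sketch rather than the full three-plane model.

```python
import math

N_UNITS = 36
ANGLES = [math.radians(10.0 * l) for l in range(N_UNITS)]  # unit directions


def predicted_gain(tilt_deg, weights, bias):
    """Recompute the projections for a new tilt; weights and bias stay fixed."""
    tilt = math.radians(tilt_deg)
    return bias + sum(w * math.cos(phi - tilt) for w, phi in zip(weights, ANGLES))


# Hypothetical trained state: weights proportional to each unit's projection
# at the adaptation position (+90 degrees), plus a gravity-independent bias.
adapt_tilt = math.radians(90.0)
weights = [(0.2 / 18.0) * math.cos(phi - adapt_tilt) for phi in ANGLES]
bias = 1.2

tilts = list(range(-90, 91, 10))   # 10-degree steps between the side-down positions
gains = {t: predicted_gain(t, weights, bias) for t in tilts}
```

In this toy case the predicted gain is largest at the adaptation position, returns to the unadapted value of 1.0 at the opposite side-down position, and follows a cosine in between.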

The neural network program was developed using Matlab 7.0 (The MathWorks, Natick, MA). The neural network simulation was tested for the single-state, dual-state, and triple-state adaptation. The training usually required <100 iterations. In all cases, the predicted changes in gain compared favorably to the data.

## RESULTS

### Single-state adaptation

Experimental changes in vertical aVOR gains produced by out-of-phase stimulation (gain increases) for Monkey 1 are shown in Fig. 4, *A* and *B* (filled symbols) at head tilt angles from the position of adaptation to the opposite-side–down position. The model predictions for the same data (Fig. 4, *A* and *B*, open symbols) were fit by a sinusoid (13) where *A* represents the amplitude of the gravity-dependent component of the gain change, *x* is the head tilt angle, *B* is the phase in angle relative to head position of the maximal gain change, and *C* is the bias representing the gravity-independent part of the gain change. Before adaptation, the vertical aVOR gains had an almost uniform distribution across all tilt angles, with values ranging from 0.7 to 1.0 (not shown). After adaptation that produced a gain increase, the gains had maximal change at the head position where the gain was adapted and gradually decreased at other angles, following a distinctive sinusoidal profile. Thus the neural network model accurately predicted the data after adaptation with the exception of RSD adaptation from −40 to −90° (Fig. 4*B*).
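A sinusoidal fit of this kind can be sketched with ordinary least squares. The exact parameterization of *Eq. 13* is not reproduced here; the code assumes the form *A* cos(*x* − *B*) + *C*, which is linear in cos *x*, sin *x*, and a constant, and solves the 3 × 3 normal equations by Cramer's rule. The function names and the synthetic data are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_sinusoid(tilts_deg, values):
    """Least-squares fit of values ~ A*cos(x - B) + C; returns (A, B in deg, C)."""
    rows = [(math.cos(math.radians(x)), math.sin(math.radians(x)), 1.0)
            for x in tilts_deg]
    # Normal equations for beta = (p, q, C), using A*cos(x-B) = p*cos x + q*sin x.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    denom = det3(ata)
    beta = []
    for col in range(3):
        m = [row[:] for row in ata]
        for i in range(3):
            m[i][col] = aty[i]          # Cramer's rule: swap in the RHS column
        beta.append(det3(m) / denom)
    p, q, c = beta
    return math.hypot(p, q), math.degrees(math.atan2(q, p)), c


# Synthetic gain changes (%) peaking at +90 degrees, for illustration only.
tilts = list(range(-90, 91, 10))
changes = [20.0 * math.cos(math.radians(x - 90.0)) + 20.0 for x in tilts]
amp, phase_deg, offset = fit_sinusoid(tilts, changes)
```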

For gain decreases produced by in-phase oscillation of rotator and surround (Fig. 4, *C* and *D*), again the gain changes had maximal negative values at the head position where the adaptation had taken place and the changes in gain were less for all other positions, following a sinusoidal profile. However, the model underestimated the gravity-dependent changes for gain decrease in both the RSD and LSD positions [see Fig. 4*C*, from 10 to 90°, and Fig. 4*D*, from −10 to −90°; compare filled (experimental data) and open symbols (model predictions), respectively]. The model-data fit can be improved by adjusting the relative learning speed of the weights and the bias. For example, by increasing the updating rate for the weights for gain increase (*k* = 2, *h* = 1), the model accurately predicted the observed gravity-dependent changes (Fig. 4*E*, compare with Fig. 4*B*). Similarly, by decreasing the updating rate for the weights (*k*) and/or increasing that for the bias (*h*) the model accurately predicted the data for the gain decrease (Fig. 4*F*, *k* = 1, *h* = 2, compare with Fig. 4*D*). Although the variable-rate model predictions improved on the fixed-rate results, it is worth showing how the model outputs compare with the overall test data from both Monkey 1 and Monkey 2. The shaded regions in Fig. 4, *E* and *F* represent the SDs of the combined four experiments for gain increase and decrease, respectively. The data from LSD gain adaptations were flipped horizontally so that they could be considered together with RSD gain adaptations. The mean values of model predictions are shown as the thick dotted lines. Although there were some local deviations, such as for head orientations close to 0° (upright) for gain decrease (Fig. 4*F*), the model predictions generally fell within the SDs of the experimental data.

A general property of the learning was that when the learning speed of weights (*k*) and that of bias values (*h*) were the same, the side where the adaptation took place would have the same amount of gain change in both the gravity-independent and gravity-dependent components, with the summation equal to the adapted gain. However, with the head positioned to the opposite side, the gravity-dependent and -independent components subtracted and produced a gain close to the preadapted value. The model therefore supports the idea that gain changes constitute a sinusoidal gravity-dependent component as a function of gravity, which modulates around a gravity-independent bias gain change (Yakushin et al. 2003c).
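This property can be written out as a short worked example. The cosine form about the adaptation position $x_0$ is an assumption of this sketch, with $A$ the gravity-dependent amplitude and $C$ the gravity-independent offset of the gain change $\Delta g$:

```latex
\Delta g(x) = C + A\cos(x - x_0), \qquad
A = C \;\Longrightarrow\;
\Delta g(x_0) = 2C, \quad
\Delta g(x_0 \pm 180^\circ) = C - A = 0 .
```

Under equal learning rates, half of the adapted change would therefore sit in each component at the adaptation position, and the two would cancel at the opposite head position.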

### Dual-state adaptation

The model was also tested to determine whether it could predict dual-state adaptation (Yakushin et al. 2003a,c). It was previously found that when adaptation was alternately executed in two states, the spatial gain distribution encoded both adaptive states. For example, if the gain was decreased in the LSD position and increased with the animal right-side down, the gain change distribution reached the maximum negative values left-side down and maximum values right-side down, whereas the gravity-independent components of the adaptations cancelled. Data from four experiments for Monkeys 3 and 4 were collected in previous studies, where each animal was adapted in two dual-state conditions: the first condition was LSD gain increase and RSD gain decrease. The second condition was LSD gain decrease and RSD gain increase. The data from the first condition were flipped horizontally so that they could be combined with the second case. The SDs of the four experiments are shown as a shaded region (Fig. 5*A*). The thick dotted line is the averaged model predictions (*k* = 1, *h* = 1 for all dual-state adaptations). The neural network predicted the simultaneous gain decreases for LSD and gain increases when RSD with no bias component, consistent with the experimental results (Yakushin et al. 2003c).
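The cancellation of the gravity-independent components can be illustrated with a short sketch. It assumes, for illustration only, that each single-state adaptation contributes a bias plus a cosine of equal size (the *k* = *h* case) and that the two adaptations superimpose linearly. The function names and the 30% magnitude are hypothetical and are not taken from the experiments.

```python
import math

def single_state(theta_deg, adapt_deg, total_change):
    """Gain change (%) at head tilt theta_deg after adapting at adapt_deg.

    Assumes equal gravity-independent and gravity-dependent parts (k = h).
    """
    bias = total_change / 2.0
    amplitude = total_change / 2.0
    return bias + amplitude * math.cos(math.radians(theta_deg - adapt_deg))

def dual_state(theta_deg, change=30.0):
    """Gain decrease at LSD (-90 deg) plus gain increase at RSD (+90 deg).

    Assumption: the two adaptive states add linearly.
    """
    decrease_lsd = single_state(theta_deg, -90.0, -change)
    increase_rsd = single_state(theta_deg, 90.0, +change)
    return decrease_lsd + increase_rsd

for theta in (-90, 0, 90):
    print(theta, round(dual_state(theta), 6))
```

Under these assumptions the two biases cancel and the two cosines add, leaving a pure sinusoid with its minimum at LSD, its maximum at RSD, and no change at upright.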

### Triple-state adaptation

To further test the validity of the model, we predicted the distribution of gain changes after triple-state adaptation for Monkeys 1 and 5. The neural network was trained simultaneously for a gain decrease in the LSD position (−90°), a gain decrease at the upright (0°), and a gain increase in the RSD position (90°), with both learning rates set to be 1 (*k* = 1, *h* = 1). The model predicted the distribution of gain changes over all head positions for both animals, although it was trained at only three head positions (Fig. 5*B*, diamonds for Monkey 1 and circles for Monkey 5). The amount of gain change after adaptation varied between the two animals. Monkey 1 had significant gain changes, where the largest gain decrease occurred in the LSD position (−40%), a maximal gain increase in the RSD position (10%), and a gain decrease (−10%) in the upright position (Fig. 5*B*). Monkey 5 had smaller gain changes, with −17, 5, and −10% for the LSD, RSD, and upright positions, respectively. Regardless of the magnitude of the gain changes, the model predictions matched experimental data in both animals. Weight distributions (*w*_{ijl}) before (Fig. 5*C*) and after (Fig. 5*D*) training for Monkey 1 were approximately sinusoidal, reflecting the projection profile of the direction of acceleration of gravity onto the unit polarization vectors. Thus the neural network was flexible in morphing its weights to accommodate this complex set of adaptation states and closely predicted the actual data.

### Temporal evolution of gain adaptation

The temporal evolution of the gain adaptation was investigated by comparing gain changes computed at every iteration during the training process with those measured experimentally by testing the animals after every 0.5 h of adaptation. Both the gravity-dependent and gravity-independent components of the gain changes had a rising exponential profile as a function of time (Fig. 6, *B* and *C*, respectively, open circles for the experimental data, solid lines for the model prediction). As a result, the composite gain changes followed the same exponential relationship (Fig. 6*A*). The similarity in the gain change versus time profiles between model predictions and data suggests that the learning rules incorporated in the model may be close to the physiological mechanisms that implement the adaptive process in aVOR adaptation.
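An error-driven update of this kind produces a saturating time course almost by construction. The toy sketch below is not the model's full rule: it tracks a single scalar gain change, and the target, rate, and step count are hypothetical. It shows that repeatedly correcting a fixed fraction of the remaining error yields the discrete analog of a rising exponential.

```python
def train(target=-40.0, rate=0.1, steps=50):
    """Delta-rule updates of one scalar gain change toward a target (%)."""
    change = 0.0
    history = []
    for _ in range(steps):
        error = target - change   # remaining error
        change += rate * error    # correct a fixed fraction of it
        history.append(change)
    return history

history = train()

# Closed form of the same recursion: target * (1 - (1 - rate)**n).
closed_form = [-40.0 * (1.0 - (1.0 - 0.1) ** n) for n in range(1, 51)]

print(round(history[0], 3), round(history[-1], 3))
```

In this simplified setting any component updated in proportion to the same error approaches its final value with this time course.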

### Three-dimensional distribution of aVOR gain changes after adaptation

One of the benefits of using a 3D neural structure to model the combined canal–otolith influence on aVOR was that we could explore the gain change distribution of the aVOR after adaptation in 3D space. For example, in one experiment with Monkey 1, the vertical aVOR gain was decreased in the LSD position and the gain changes were tested in four sequences: from supine to prone, LSD to RSD, right-posterior to left-anterior, and left-posterior to right-anterior. The gain changes at other head-tilt positions were interpolated with a spline interpolation (Sandwell 1987) and the resultant surface formed the 3D gain change distribution shown in Fig. 7*A*. The model was trained using a single target gain value for the LSD position. The gain change predictions for all other head-tilt positions in 3D space were then computed from the model simulation (Fig. 7*B*). The model prediction reflects the ideal aVOR gain change in a 3D space. There was a good match between the predicted and experimental data.

The performance of the neural network model in predicting gain change was compared with both a single-sinusoid model fit in one plane (*Eq. 13*) and a first-order double-sinusoid model fit in three dimensions to the experimental data to determine how well the first-order and neural network models predicted the data. The choice of the original double-sinusoid fit was based on the observation that the gain changes in three dimensions had a single peak at the head position in which the gain was adapted and decreased along any plane of head tilt with a variation in amplitude that was approximately sinusoidal (Yakushin et al. 2005c). The double-sinusoid model is given by *Eq. 14*, where *y* is the predicted gain change, *A* is the amplitude, *x*_{1} and *x*_{2} are the tilt angles, *B*_{1} and *B*_{2} are the phases, and *C* is the bias. The parameters of the double-sinusoid model were fit to minimize the mean square error between the model output and the gain changes calculated from the experimental gain values.

For comparison in a single plane, two versions of the single-sinusoid model were used, one with the phase (*B*) included, together with the amplitude (*A*) and bias (*C*) in the parameter fit (single-sinusoid 1), the other with the phase fixed (single-sinusoid 2). We considered the latter because the neural network model is essentially a phase-fixed model with the assumption that the gain change at the head position of adaptation will always be the maximum. Therefore the comparison between the fit of the phase-fixed sinusoidal and the neural network models has more relevance. Gain changes predicted by the neural network were calculated as the differences between the predicted gain values and the mean value of the preadapted gain values, whereas the gain changes from the sinusoidal models were from *Eq. 13* with the equation parameters optimized to minimize the mean square error. Similarly, two versions of the double-sinusoid models were used in 3D comparisons. In the first of the double-sinusoid models, all four parameters, the amplitude (*A*), bias (*C*), and phases (*B*_{1} and *B*_{2}) were fit to minimize the mean square error, whereas in the second of these models, the phases (*B*_{1} and *B*_{2}) were fixed to be the theoretical values and were not included in parameter optimization to minimize the mean square error. The 3D surface of the gain changes in the neural network was obtained by subtracting the averaged value of the preadapted gain values from the model-predicted, 3D surface of gain values. For the double-sinusoid models, the 3D gain change surfaces were created from *Eq. 14*, which was fit to a spline-interpolated 3D surface of the original gain change data in four planes (for details see Yakushin et al. 2005c).
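The phase-fixed variant is simple to fit because, once the phases are fixed, the model is linear in the amplitude and the bias. The sketch below assumes a product-of-cosines form, *y* = *A* cos(*x*_{1} − *B*_{1}) cos(*x*_{2} − *B*_{2}) + *C*. That form is an assumption made here for illustration and may differ from *Eq. 14* in its sign or phase conventions. The function names and the synthetic data are hypothetical.

```python
import math

def basis(x1_deg, x2_deg, b1_deg, b2_deg):
    """Assumed double-sinusoid shape with fixed phases (degrees)."""
    return (math.cos(math.radians(x1_deg - b1_deg))
            * math.cos(math.radians(x2_deg - b2_deg)))

def fit_fixed_phase(samples, b1_deg, b2_deg):
    """Least-squares estimate of (A, C) for y = A * basis + C.

    samples: list of (x1_deg, x2_deg, y) tuples.
    """
    s = [basis(x1, x2, b1_deg, b2_deg) for x1, x2, _ in samples]
    y = [value for _, _, value in samples]
    n = len(samples)
    s_mean = sum(s) / n
    y_mean = sum(y) / n
    sxx = sum((si - s_mean) ** 2 for si in s)
    sxy = sum((si - s_mean) * (yi - y_mean) for si, yi in zip(s, y))
    amplitude = sxy / sxx              # ordinary linear-regression slope
    bias = y_mean - amplitude * s_mean  # ordinary linear-regression intercept
    return amplitude, bias

# Synthetic, noise-free gain changes (%) on a grid of two tilt angles.
true_a, true_c, b1, b2 = -20.0, -20.0, -90.0, 0.0
grid = range(-90, 91, 30)
samples = [(x1, x2, true_a * basis(x1, x2, b1, b2) + true_c)
           for x1 in grid for x2 in grid]

fit_a, fit_c = fit_fixed_phase(samples, b1, b2)
print(round(fit_a, 6), round(fit_c, 6))
```

Fitting the phases as well makes the problem nonlinear and calls for an iterative optimizer, which gives the variable-phase model extra freedom to reduce the error.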

The root mean square error and correlation coefficients were compared between the model fits and the experimental data of gain changes. The results from tests in the plane where the adaptation took place are shown in Table 1 and the 3D results in Table 2. Because the goal of the sinusoidal fits was to minimize the mean square errors, both the single- and double-sinusoid models performed better than the neural network model for root mean square error (Tables 1 and 2). However, when correlation coefficients, which represent the similarities in the overall shapes of the gain changes, were compared, the neural network model performed on a par with or better than the sinusoidal models (Tables 1 and 2). The reason that the variable-phase sinusoidal fits performed better than the fixed-phase sinusoidal models was that the variable-phase models had more latitude to adjust their parameters.
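The difference between the two measures can be seen in a small sketch. The numbers are hypothetical, not values from Tables 1 and 2. A prediction with the correct shape but a constant offset has a perfect correlation coefficient and a nonzero root mean square error.

```python
import math

def rmse(pred, obs):
    """Root mean square error between predictions and observations."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def pearson_r(pred, obs):
    """Pearson correlation coefficient (similarity of shape)."""
    n = len(obs)
    mean_p = sum(pred) / n
    mean_o = sum(obs) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(pred, obs))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    return cov / math.sqrt(var_p * var_o)

angles = range(-90, 91, 30)
# Hypothetical gain changes (%) after a gain decrease at +90 deg.
observed = [-20.0 - 20.0 * math.cos(math.radians(a - 90)) for a in angles]
# Same shape as the data, shifted upward by 5 percentage points.
predicted = [o + 5.0 for o in observed]

print(round(rmse(predicted, observed), 6), round(pearson_r(predicted, observed), 6))
```

A model constrained to a fixed shape can therefore score well on correlation even when a free least-squares fit achieves a lower root mean square error.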

## DISCUSSION

In this study, we demonstrate for the first time that a neural network model of the aVOR, trained at head positions with respect to gravity in which the aVOR gain was adapted, was able to predict the changes in aVOR gain at all other head positions. Although originally formulated from data on the vertical aVOR, the model was capable of predicting the spatial gain changes of the horizontal and torsional aVORs (Tables 1 and 2), as well as the effects of dual-state and triple-state adaptation (Fig. 5). It should be noted that because all variables are velocities and Listing's law does not apply to aVOR dynamics, this model predicts a general property of the aVOR.

The spatial distribution of gain changes after the neural network parameter adjustments (Figs. 4 and 5) was the result of minimizing the eye velocity error at a particular head orientation. When compared quantitatively with the least-square-error fits of the data in one dimension (Table 1) and in three dimensions using a double-sinusoid fit (Table 2), the neural network model predictions of gain changes over all space were remarkably close. Because the neural network, which constitutes the sum of a large number of sinusoids that had been adapted at one head orientation, closely matched the optimal fits to the data over all head orientations, we concluded that the learning rule that we have modeled may be fundamental to the actual learning rule that is centrally implemented.

A significant characteristic of the neural network model is that it is not a conventional feedforward neural network whose weights are determined by a training algorithm under the influence of a wide range of inputs. Rather, the weights are physiologically constrained by the relative orientations of the polarization vectors with gravity when the head is in a given orientation. This constraint requires that regardless of error, every weight value represents a single point on a sinusoid, and the entire spatial gain distribution is a summation of these sinusoids having the same spatial frequency, although with different amplitudes and phases. Therefore the training results in a prediction of a sinusoid based on the state of the polarization distribution at the head position during adaptation.

The learning rule we implemented was driven by the eye velocity error and thus was related to a generalized delta learning rule (Rumelhart et al. 1986). In addition, the amount of activation of the polarization vectors, i.e., the relative difference between the polarization vector and gravity, also played an important role in the learning. The stronger the activation of the particular polarization vector, the greater the rate at which the adaptive changes of the particular unit will take place. Thus the units in the neural network that receive the largest projection of the acceleration of gravity will be adapted the most over a particular time, producing the gravity-dependent adaptation. Once adapted the gain change will be localized to the specific head position in which adaptation took place. Moving the head to a new position will reduce the projection of the acceleration of gravity on that polarization vector, with a consequent reduction of the contribution of the maximally adapted network element to the gain of the aVOR, thus producing a sinusoidal spatial gain distribution.
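The mechanism described above can be sketched in a single plane of tilt. This is a simplified illustration under stated assumptions, not the three-dimensional network of the study. It uses 36 hypothetical units with polarization directions spaced 10° apart, takes each unit's activation to be the cosine of the angle between its polarization direction and gravity, and works directly with gain change as the output. The 2/N normalization, the step size `eta`, and all function names are choices made here for illustration.

```python
import math

N_UNITS = 36
# Hypothetical polarization directions, 10 deg apart in one plane.
PHIS = [math.radians(10 * i) for i in range(N_UNITS)]

def activations(theta_deg):
    """Projection of gravity onto each polarization direction."""
    t = math.radians(theta_deg)
    return [math.cos(t - phi) for phi in PHIS]

def gain_change(theta_deg, weights, bias):
    """Gravity-independent bias plus activation-weighted sum."""
    return bias + sum(w * a for w, a in zip(weights, activations(theta_deg)))

def train(adapt_deg, target_change, k=1.0, h=1.0, eta=0.05, steps=2000):
    """Error-driven training at a single head position."""
    weights = [0.0] * N_UNITS
    bias = 0.0
    act = activations(adapt_deg)
    for _ in range(steps):
        error = target_change - gain_change(adapt_deg, weights, bias)
        # Units with larger activation at this head position adapt faster.
        # The 2/N factor normalizes the summed squared activations to 1.
        weights = [w + k * eta * error * a * 2.0 / N_UNITS
                   for w, a in zip(weights, act)]
        bias += h * eta * error
    return weights, bias

# Train a 40% gain decrease at RSD (+90 deg) only, then probe other tilts.
weights, bias = train(adapt_deg=90.0, target_change=-40.0)
for theta in (-90, 0, 90):
    print(theta, round(gain_change(theta, weights, bias), 3))
```

In this sketch the trained weights follow a cosine centered on the adapted position, so the predicted gain change falls off sinusoidally with tilt even though training used one head position.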

The closeness of both the double-sinusoid and the neural network models to the data supports our theory that gravity-dependent gain adaptation in three dimensions consists of two components: a gravity-independent component and a gravity-dependent one (Xiang et al. 2004; Yakushin et al. 2000a, 2003c). The gravity-independent component is likely produced by alteration of the gain of cells that receive canal but not otolith input, whereas the gravity-dependent component is probably produced by those neurons that receive convergent canal/otolith input. From this, it would be predicted that adjusting the relative learning rates for the weights and the bias would have an important influence on the final shape of the gain profile. For example, an asymmetry in all single-state adaptation tests was that the data for gain reduction did not fit as well as the data for gain increases. Specifically, the gains from head positions on the side opposite to the position of adaptation were smaller than the model predictions. Variations in learning rates are well known, having commonly been encountered when comparing gain increases and decreases in many studies. The gain decreases occur at a much faster rate than the increases (Cohen et al. 1992; Melvill Jones 1996; Miles and Eighmy 1980). Our hypothesis to explain this is that in the cases of gain reduction, the bias (gravity-independent components) adapted faster than the gravity-dependent counterpart. If a greater rate of bias training was implemented in the model by increasing the value of *h*, or decreasing the value of *k*, a better fit to the data was achieved (Fig. 4*F*).

This prediction is simulated in Fig. 8. The learning rate for the bias component was set to be faster than that for the weights that modulate the gravity-dependent component. Consequently, the amplitude of the gravity-dependent component was diminished when it reached the end of the training period, and the combined gain in the opposite side position was adjusted further toward the direction of the gain decrease, as in Fig. 4. In the simulation of Fig. 8, the model was trained to decrease the gain at the RSD position by 40%. When the learning rates for the bias and weights were set to be the same as the default value, the amount of gravity-independent component (bias) and the amplitude of the gravity-dependent component were the same. Thus the gain change at the opposite side (LSD) would be 0%. However, if the learning rate of the bias was increased to threefold the learning rate for the weights, the gravity-independent component constituted 75% of the composite gain change at the adaptation side. Because the amplitude of the sinusoidal gravity-dependent component was reduced, the composite gain change at the opposite side (LSD) was adjusted in the direction of gain change (gain decrease) to reach a value of −20%. Our results have shown that when the gravity-dependent and the gravity-independent components were adapted at different rates, the model better predicted the experimental data. This lends support to the idea that the gravity-dependent and -independent components were separate processes that evolved along their own characteristic timescale.
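The arithmetic behind these two cases can be written out. The relation below is a sketch that assumes the final gain change at the adapted position, $\Delta G_a$, is divided between the bias $C$ and the sinusoid amplitude $A$ in proportion to the learning rates *h* and *k*. The text does not state this proportionality as a formula, although it reproduces both quoted cases. The symbols $\Delta G$, $\theta$, and $\theta_a$ are introduced here for illustration.

$$
\Delta G(\theta) = C + A\cos(\theta - \theta_a), \qquad C = \frac{h}{h+k}\,\Delta G_a, \qquad A = \frac{k}{h+k}\,\Delta G_a
$$

With $\Delta G_a = -40\%$ at the RSD position and $h = k$, this gives $C = A = -20\%$, so the change at the opposite (LSD) side is $C - A = 0\%$. With $h = 3k$, it gives $C = -30\%$ (75% of the composite change) and $A = -10\%$, so the change at the LSD side is $C - A = -20\%$.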

Thirty-six units were chosen for each of the three orthogonal semicircular canals. However, it was not necessary to have that many units for the artificial neural network to simulate the data. The output of the neural network would have similar gain distributions if only a single unit were assigned to each plane. Because three units would have 27 connection weights associated with the nine elements in the gain matrix *G*, there would be enough plasticity for the neural network to be trained to accommodate the adaptation data. This works in the same manner with dual- and triple-state gain adaptation as that for single-state adaptation. The larger number of neurons used, however, provides better granularity, more closely mimicking how the otolith organs regulate the aVOR gain adaptation. In this regard, the model is based on the neurophysiological structure of the aVOR rather than being a generic neural network where minimizing the number of nodes is of concern (Reed 1993; Tan and Mavrovouniotis 1995).
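One way to see why a few units can suffice is the following sketch. It assumes three hypothetical units with mutually orthogonal polarization vectors and weights proportional to each unit's activation at the adapted orientation. These are simplifications introduced here, not the connectivity of the model. Under these assumptions the summed response equals the cosine of the angle between the current and adapted gravity directions.

```python
import math

# Hypothetical orthogonal polarization vectors, one unit per axis.
AXES = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gravity_for_roll(theta_deg):
    """Unit gravity direction for a roll tilt (illustrative head frame)."""
    t = math.radians(theta_deg)
    return (0.0, math.sin(t), math.cos(t))

def tuned_response(g, g_adapt):
    """Sum of unit activations weighted by their activation at adaptation."""
    weights = [dot(axis, g_adapt) for axis in AXES]
    return sum(w * dot(axis, g) for w, axis in zip(weights, AXES))

g_adapt = gravity_for_roll(90.0)  # adapted with the right side down
for theta in (-90, 0, 90):
    print(theta, round(tuned_response(gravity_for_roll(theta), g_adapt), 6))
```

A larger population adds granularity but, in this idealized case, leaves the cosine shape of the tuning unchanged.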

There was generally a good match between the model predictions and the experimental data, although in some cases the model did not accurately predict the data. As shown in Fig. 4*B* (Monkey 1), the preadapted gains had large variances (not shown) and the postadaptation point of maximal gain change in the RSD position was not at the position of adaptation. The discrepancy between the model and data could be explained by the intrinsic fixed spatial frequency of the model. Although a better fit could be obtained by doubling the learning rate of the weights while leaving the learning rate of the bias values unchanged, as shown in Fig. 4*E*, it should be noted that the physiological constraints imposed by our model will not produce a gain change distribution that would fit an arbitrary profile.

Effects of translational eye movements in space (Medendorp et al. 2000) were not considered in this development because the neural network model was conceived for adaptation using targets at or close to the horopter or beyond, which would not involve changes in the gain of the aVOR. However, it should be noted that, although the gain of the aVOR was adapted in light, pre- and postadaptive testing was done only in darkness, and that these were the experimental results that were modeled.

To what extent does the model predict how individual neurons in the vestibular nuclei implement the gravity-dependent adaptation? Cells with convergent inputs from both the semicircular canals and the otolith organs as well as canal-only–dependent cells are likely to be the site of the processes modeled in this study. Such cells have been demonstrated in the vestibular nuclei (Baker et al. 1984; Brettler and Baker 2001; Curthoys and Markham 1971; Dickman and Angelaki 2002; Duensing and Schaefer 1958; Fukushima et al. 1990; Graf et al. 1993; Perlmutter et al. 1998; Yakushin et al. 2005a, 2006). Figure 1*B* shows how otolith–canal and canal-only cells might implement the gravity-dependent and gravity-independent components of the aVOR gain at the neuronal level. The amplitude of the canal response of the otolith–canal units would increase/decrease maximally after adaptation when the head is rotated in a position such that the polarization vector is aligned with gravity during adaptation. The canal-only units would have canal-related responses independent of head position. The model also predicts that there should be classes of neurons that summate the gravity-dependent and -independent components. The vestibular nuclei contain a wide range of otolith–canal convergent neurons, however, and whether and how these different classes of vestibular neurons participate in the adaptation process are currently not known.

Other cellular structures might also be responsible for the adaptation and maintenance of the weights and bias values. Cells in the fastigial nuclei (Shaikh et al. 2005; Siebold et al. 1997, 2001; Zhou et al. 2001) and the nodulus (Sheliga et al. 1999) also have considerable otolith–canal convergence. Removal of the nodulus and uvula did not alter gravitational dependency of the aVOR adaptation (Yakushin et al. 2003a), however, so the other structures are more likely sites of the gravity-dependent process. The flocculus, which plays a powerful role in modulating the gain of the aVOR (Hirata and Highstein 2001; Ito 1984; Lisberger et al. 1994a,b,c; Zee et al. 1981) could contribute to the gravity-independent component of adaptation, which could then be modulated by the direct otolith–canal convergence as predicted by this model. Regardless of the site(s) involved in processing, the changes in the unit activity identified in this study should be present in secondary neurons in the vestibular nuclei.

In summary, data presented in this study show that a neural network model based on rather simple otolith–canal convergence together with a localized learning rule is sufficient to simulate the concurrent modulation of both the gravity-dependent and the gravity-independent gain changes of the aVOR. The close match between simulations and data in one- and three-dimensional space, as well as for single-, dual-, and triple-state adaptation, supports the idea that the neural structure and learning process presented accurately model how the vestibular system implements gravity-dependent adaptation.

## GRANTS

This work was supported by National Institute on Deafness and Other Communication Disorders Grants DC-04996 and DC-05284 and National Eye Institute (NEI) Grants EY-04148, EY-11812, and NEI Core Center Grant EY-01867.

## Footnotes

1 An otolith polarization vector is a direction in head coordinates that maximally activates an otolith cell (Fernández and Goldberg 1976).

The costs of publication of this article were defrayed in part by the payment of page charges. The article must therefore be hereby marked “*advertisement*” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

- Copyright © 2006 by the American Physiological Society