## Abstract

Our brain is inexorably confronted with a dynamic environment in which it has to fine-tune spatiotemporal representations of incoming sensory stimuli and commit to a decision accordingly. Among those representations needing constant calibration is interval timing, which plays a pivotal role in various cognitive and motor tasks. To investigate how perceived time interval is adjusted by experience, we conducted a human psychophysical experiment using an implicit interval-timing task in which observers responded to an invisible bar drifting at a constant speed. We tracked daily changes in the distributions of response times for a range of physical time intervals over multiple days of training, focusing on two major types of timing performance: mean accuracy and precision. We found that mean accuracy and precision followed decoupled dynamics in terms of both their time course and the specificity of perceptual learning. Mean accuracy showed feedback-driven instantaneous calibration, evidenced by a partial transfer around the time interval trained with feedback, whereas timing precision exhibited a slow, long-term improvement with no evident specificity. A Bayesian observer model, in which a subjective time interval is determined jointly by a prior and a likelihood function for timing, captured the dissociated temporal dynamics of the two types of timing measures simultaneously. Finally, the model suggested that the width of the prior, not that of the likelihoods, gradually shrinks over sessions, substantiating the important role of prior knowledge in perceptual learning of interval timing.

- time perception
- perceptual learning
- mean accuracy and precision
- Bayesian observer model
- prior and likelihood

To act on objects in an ever-changing and noisy environment, our brain has to dynamically calibrate its spatiotemporal representations of the objects by interacting with the environment. Among those representations in need of constant calibration, interval timing (IT) is of particular interest because its perception is often subject to illusion by many endogenous and exogenous factors (Eagleman 2008) such as attention and arousal (Stetson et al. 2007; Wittmann 2009), emotional states (Wittmann and Paulus 2008), eye movements (Burr et al. 2007), stimulus visibility (Terao et al. 2008), visual adaptation (Johnston et al. 2006), and temporal context (Jazayeri and Shadlen 2010). Therefore, it is not surprising to find that IT undergoes not only incessant calibration for veridical sensation and action but also perceptual learning (PL) for improved temporal precision (Bartolo and Merchant 2009; Buonomano et al. 2009; Karmarkar and Buonomano 2003; Meegan et al. 2000; Stetson et al. 2006; White et al. 2008). Despite this strong evidence for PL of IT, how the perceived time interval is adjusted by experience remains largely unknown.

Time encoding is distinguished from other types of sensory encoding in that time is a ubiquitous feature dimension composing all sensory events, yet with no single sensory organ directly dedicated to its encoding (Wittmann and van Wassenhove 2009). Because of this nature, previous attempts at investigating mechanisms of time perception were prone to confounding by other sensory, motor, or executive functions. Hence it is crucial to avoid those confounds by precluding any potential associations between IT and other nontiming variables. We measured IT based on natural reactions to an invisible object in motion that was once visible and tracked by observers while varying the time interval by manipulating the speed and distance of motion (Fig. 1, *A* and *B*). The essential element of our IT task is that the perceived time interval (Δ*t*) is revealed implicitly: it must be extrapolated online from the speed (*V*) and the available spatial information (Δ*X*) via the relation Δ*T* = Δ*X*/*V* (Fig. 1*C*) (Coull and Nobre 2008; Merchant and Georgopoulos 2006; Rohenkohl et al. 2011). This feature allows us, unlike previous timing procedures (e.g., reproduction or discrimination of time intervals), to vary not only target durations but also other indirect features such as speed, location, and motion direction on a trial-by-trial basis (Buonomano et al. 2009; Livesey et al. 2007; Ryan and Fritz 2007; Staddon 2005), preventing IT from being associatively conditioned to specific sensory events (Janssen and Shadlen 2005; Leon and Shadlen 2003; Mita et al. 2009; Rakitin et al. 1998) or being affected by stimulus-evoked sensory processes (Johnston et al. 2006).

Behavioral consequences of PL in IT can be probed mainly by changes in two descriptive statistics (Fig. 1*C*): “mean accuracy (constant error)”—the degree of match between physical (Δ*T*) and subjective [μ(Δ*t*)] time intervals on average—and “precision (temporal variance)”—the reciprocal of variance [σ(Δ*t*)] of perceived time intervals across repeated trials (McAuley and Miller 2007; Merchant et al. 2008b; Zarco et al. 2009). Despite the intimate relationship between the two measurements, which has been demonstrated in many visual tasks (Ahissar and Hochstein 1997; Gold et al. 2010; Herzog et al. 2006; Wenger et al. 2008), simultaneous measurements or conjunctive analyses of mean accuracy and precision data have been rare in studies on IT. Only a few studies have reported concurrent improvement (Meegan et al. 2000) or distinct effects of interstimulus interval or adaptation on discrimination threshold and apparent duration (Buonomano et al. 2009; Johnston et al. 2006; Stetson et al. 2006). Our paradigm allowed us to track PL in IT over a long time period by assessing ongoing changes in its mean accuracy and precision simultaneously.

Given the hierarchical nature of brain anatomy and functions supporting sensory perception, neural substrates underlying PL have been widely addressed by assessing the specificity of learning for particular stimuli or tasks during training (Fahle 2005; Fine and Jacobs 2002). A high specificity has been interpreted as adaptive changes at a relatively early, low-tier processing stage (Adini et al. 2002; Gilbert et al. 2001; Tsodyks and Gilbert 2004), whereas a nonspecific transfer of learning was taken as evidence for changes in top-down signals such as selective attention during PL (Ahissar and Hochstein 1997; Ahissar et al. 2009; Schäfer et al. 2009; Yu et al. 2004). We examined learning specificity on a fine scale for a wide range of Δ*T*s. In particular, we obtained “feedback transfer curves” that quantify the degree of generalization along the broad timescale around the interval trained with feedback throughout 10 daily sessions (Fig. 1*D*). These extended follow-ups of learning transfer curves on such a fine and wide timescale allowed us to observe dynamic time courses of PL in IT, distinct from previous studies, in which transfer curves were compared only between before and after training (Bartolo and Merchant 2009; Karmarkar and Buonomano 2003; Meegan et al. 2000; Nagarajan et al. 1998; Wright et al. 1997).

Our results revealed clear dissociations between mean accuracy and precision of IT in terms of both the time course and the specificity of learning. Mean accuracy reacted immediately to feedback, showing a partial transfer around the interval trained with feedback, whereas precision exhibited no specificity and improved slowly throughout the sessions without dependence on feedback. A Bayesian observer model described the observed dissociation between mean accuracy and precision well and provided a parsimonious explanation: PL of IT is promoted by a gradual reduction of the width of the prior distribution, substantiating the important role of prior knowledge in PL of IT.

## METHODS

### Subjects and Apparatus

Four naive subjects (2 women, 2 men; age 21–25 yr) participated in the experiment. All subjects had normal or corrected-to-normal vision and gave written informed consent. This study was approved by the Institutional Review Board of Seoul National University.

Stimuli were presented at 1,024 × 768-pixel resolution (40 × 30 cm) with a refresh rate of 60 Hz on a cathode ray tube monitor (LG HiSync 291U) at a viewing distance of 81 cm. A chin rest was used to minimize head movements. Button presses were recorded with a numeric keypad connected to an Apple Power Mac G5 computer through a USB port.

### Experimental Procedure and Stimuli for IT Task

At the beginning of each IT task trial, subjects viewed a black (0.81 cd/m^{2}) annulus [radius = 5 degrees of visual angle (°); 1° thickness] around the fixation cross at the center of a screen with gray (19.53 cd/m^{2}) background (Fig. 1*A*). During the “cue” period (Fig. 1*B*, *left*), a stationary thin (0.5° width; 1° height) white (74.58 cd/m^{2}) bar was shown in a randomly selected position on the annulus with a pair of white nonius lines (0.3° width; 0.5° height) at the edges of the annulus. These nonius lines demarcated an occlusion arc in which the bar underwent invisible motion. In addition, a black-white annulus was also shown near the fixation to help subjects recognize the size and location of the occlusion arc (Fig. 1*A*). After the cue period of 250 ms, the bar started to move in either a clockwise or a counterclockwise direction along the annulus toward one of the nonius lines at a constant speed, which was varied across trials. After this “visible motion,” the duration and trajectory of which were varied pseudorandomly across trials (750 ± 630 ms for duration; uniform distribution of 3–4° arc length; 0.6–0.8 rad), the moving bar became invisible while passing through the “occlusion” arc (Fig. 1*B*, *center*). Subjects judged when the bar would reappear at the end of the occlusion arc by pressing a button key with the right index finger. The length of the occlusion arc (Δ*X*) was varied across trials within a uniform distribution of arc length 3–15° (0.6–3 rad). At the time of a key press, a thin line was flashed briefly (a single monitor frame of 16.67 ms) inside the nonius line as an indication that a response was made. A trial was terminated if subjects had not pressed the key by the time twice the physical interval had elapsed (Δ*t* > 2 × Δ*T*). The proportion of these “miss” trials was <2% of the total number of trials.
To examine the specificity of learning, feedback was given only in trials with the median (1.32 s) of the nine test intervals sampled on a logarithmic scale (from 0.5 s to 3.5 s). The feedback was provided at the time of a key press by showing subjects a snapshot of the actual location of the moving bar for 750 ms (Fig. 1*B*, *right*). Hence the timing error (Δ*t* − Δ*T*) can be derived by dividing the spatial gap between the feedback bar and the nonius line by the speed of the moving bar.
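This spatial-to-temporal conversion is a one-line computation; a minimal Python sketch (the function name and the sign convention for the gap are ours, not the paper's):

```python
def timing_error_from_feedback(gap_deg, speed_deg_per_s):
    """Convert the spatial gap (degrees of arc) between the feedback
    bar and the nonius line into a timing error dt - dT (seconds)
    by dividing by the bar's constant speed (degrees/s)."""
    return gap_deg / speed_deg_per_s
```

For example, a 2° gap traversed at 8°/s corresponds to a 0.25-s timing error.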

The speed of the bar on a given trial was determined by dividing a randomly selected length of the occlusion arc by a test time interval (*V* = Δ*X*/Δ*T*). The resulting distribution of speeds was positively skewed (across-session mean 8.22°/s; standard deviation 6.15°/s; skewness 1.28) and ranged from 0.95°/s to 29.29°/s. We underscore that stimulus features other than the Δ*T* were either counterbalanced (direction of bar motion) or varied in a pseudorandom manner (*V*, Δ*X*, arc length and starting position of the visible motion). Thus, even when a given set of trials had an identical test Δ*T*, those individual trials had highly dissimilar nontemporal variables. Because trial-by-trial changes in those nontemporal variables may affect timing performance, we inspected whether the time course or learning specificity of timing accuracy and precision in each session varied in a manner dependent on the across-trial differences in *V*, Δ*X*, or duration of visible motion. To do this, we split trials with the same Δ*T* into a lower half and a higher half according to each of the nontemporal variables and compared the two halves of trials in timing accuracy or precision as a function of Δ*T*. We found no qualitative difference between the two split halves in any of the nontiming variable dimensions, indicating that stimulus features other than Δ*T* had hardly any impact on timing performance. We also emphasize that we varied the test Δ*T* on a trial-by-trial basis instead of presenting a single interval repeatedly within a block of trials. This “roving” was introduced to prevent subjects from employing a feedback-driven adaptive strategy to adjust a response threshold (Whiteley and Sahani 2008), which may generate unwanted sources of variance in timing measurements.
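The split-half control can be sketched in Python as follows (the data layout is hypothetical; each trial is a dict carrying a nontemporal variable such as *V* and a timing measure):

```python
def split_half_means(trials, split_key, measure_key):
    """Split trials sharing the same test dT into lower and upper
    halves of a nontemporal variable (e.g., speed V or arc length dX)
    and return the mean timing measure in each half.  A qualitative
    match between the two halves suggests that the variable has
    little impact on timing performance."""
    ordered = sorted(trials, key=lambda tr: tr[split_key])
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    avg = lambda ts: sum(t[measure_key] for t in ts) / len(ts)
    return avg(lower), avg(upper)
```

With an odd number of trials, the middle trial is dropped from both halves; the paper does not specify how ties were handled.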

It is worth noting some of the descriptions offered by subjects during postexperimental debriefing. First, none of the subjects reported being aware that feedback was given only to trials with a specific test interval (1.32 s). Second, subjects described adopting either a “dynamic imagery” strategy (Pearson et al. 2008), where they responded to an imaginary bar in motion, or a “spatiotemporal equation” strategy, where they estimated the interval by combining spatial (e.g., ratio between the visible and invisible arc lengths) and temporal (e.g., duration of the visible motion) information. Finally, the debriefing confirmed that none of the subjects mentioned using a verbal or nonverbal chronometric counting strategy, as explicitly instructed before the experiment.

The whole experiment consisted of four daily sessions of the “no feedback (NF)” condition and six ensuing sessions of the “specific feedback (SF)” condition (Fig. 1*D*). For a given subject, all the sessions were carried out at a fixed time of day to prevent potential interactions between circadian rhythm and IT (Pashler and Medin 2004). The SF sessions were identical to the NF sessions except that 35 additional training trials with feedback were inserted and randomly intermingled with the other 270 test trials. As practice for the IT task, each subject completed 15 IT trials with no feedback on the first day of the experiment. In each daily session, subjects first performed 270 trials (30 trials for each of the 9 test intervals) of the IT task and then 180 trials (30 trials for each of the 6 test speeds) of the speed discrimination (SD) task, as described below.

### Control for Sensorimotor Function and Speed Perception

In the SD task trials, subjects performed a two-interval forced-choice task, in which they viewed two bars drifting on the visible arc in separate intervals and judged which one moved faster. With the pedestal speed set to 8°/s, which roughly matched the mean speed (8.22°/s) in the IT task, we used an evenly distributed set of six test speeds ([4, 5.6, 7.2, 8.8, 10.4, 12]°/s; method of constant stimuli), each shown in 30 trials. As in the IT task, we prevented subjects from judging the relative speed of the two bars based on only temporal or spatial information by varying the arc length to be traveled by the two bars within the uniform distribution ranging from 3° to 4°. From data in each session, we plotted a psychometric curve of the proportion of “faster” responses as a function of test speed and fit a cumulative Gaussian function to the curve with the maximum likelihood estimation method (Wichmann and Hill 2001). We then estimated the point of subjective equality (PSE) and sensitivity from the mean and standard deviation of the fitted function, respectively.
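A Python sketch of the psychometric fit; a coarse grid search stands in for the optimizer actually used, and the grid ranges are our assumption:

```python
import math

def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian psychometric function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def fit_psychometric(speeds, n_faster, n_trials):
    """Maximum-likelihood fit of a cumulative Gaussian to the
    proportion of 'faster' responses; returns (PSE, sigma), i.e.,
    the mean and standard deviation of the fitted function."""
    best_mu, best_sigma, best_ll = None, None, -math.inf
    for i in range(161):                 # mu grid: 4.0 .. 12.0 deg/s
        mu = 4.0 + 0.05 * i
        for j in range(97):              # sigma grid: 0.2 .. 5.0 deg/s
            sigma = 0.2 + 0.05 * j
            ll = 0.0
            for x, k, n in zip(speeds, n_faster, n_trials):
                # clip p away from 0/1 so the log likelihood is finite
                p = min(max(cum_gauss(x, mu, sigma), 1e-6), 1.0 - 1e-6)
                ll += k * math.log(p) + (n - k) * math.log(1.0 - p)
            if ll > best_ll:
                best_mu, best_sigma, best_ll = mu, sigma, ll
    return best_mu, best_sigma
```

A PSE above the 8°/s pedestal would indicate underestimation of the test speed; the sensitivity (sigma) is the just-noticeable-difference scale.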

The motor execution (ME) task trials were intermittently inserted between the IT task trials (20 ME trials per 270 IT trials of a daily session). This was possible because the ME trials were exactly the same as the IT trials in task structure except that the moving bar was always visible. Without knowing ahead whether a given trial would be a ME or IT trial until the moving bar entered the occlusion arc, subjects were instructed to simply press the key at the point of time when the bar arrived at the end of the occlusion arc. To minimize unwanted potential influences of ME trials on timing performances, a stimulus duration on each ME trial was randomly sampled from a continuous uniform distribution, the range of which matched that of the intervals in the main timing task. Because the distribution of errors in timing data from the ME trials did not vary systematically depending on the Δ*T*s, we merged the data across all the ME trials with different Δ*T*s within each session and simply computed the mean and standard deviation of motor errors (Δ*t* − Δ*T*), which here represent the accuracy and precision, respectively, of the sensorimotor function. No feedback was provided either in the ME trials or in the SD task, while there were 3 and 12 practice trials for the ME trials and the SD task, respectively, before the whole experiment began.

### Data Analysis

For each session, we removed from analysis outlier trials in which a Δ*t* fell away from the mean response time μ(Δ*t*) for a given Δ*T* by more than 3 standard deviations [>3 × σ(Δ*t*)] because both major statistics in our study are sensitive to extreme outliers. The fraction of outliers did not exceed 2% of the total trials. No miss or outlier trials were found in the control (ME and SD tasks) trials. In addition, we excluded from analysis the trials that were trained with feedback in the SF condition.
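The 3-standard-deviation rule can be sketched as a small Python helper (function name is ours):

```python
from statistics import mean, stdev

def remove_outliers(dts, n_sd=3.0):
    """Drop response times that fall farther than n_sd standard
    deviations from the mean for a given dT (the paper's 3-sigma
    criterion, applied per interval and session)."""
    m, s = mean(dts), stdev(dts)
    return [t for t in dts if abs(t - m) <= n_sd * s]
```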

#### Within-session analysis of accuracy and precision in IT.

A response time (Δ*t*_{jk}^{i}, where *i*, *j*, and *k* indicate a trial number, an index for Δ*T*, and a session number, respectively) was sorted into one of the nine sets defined by the *j*th physical time interval in a given session *k* (Δ*T*_{jk}), as in Fig. 2*A*. For each of these sets, the mean and standard deviation of perceived time intervals were computed [μ(Δ*t*)_{jk} and σ(Δ*t*)_{jk}].

To characterize the overall mean accuracy for a given session *k*, we computed *D*_{k}, an across-Δ*T* mean of normalized deviations of Δ*t*_{jk}^{i} from the corresponding Δ*T*_{jk}:

*D*_{k} = (1/*N*_{j}) Σ_{j} |*d*_{jk}|, where *d*_{jk} = [μ(Δ*t*)_{jk} − Δ*T*_{jk}]/Δ*T*_{jk} (*Eq. 1*)

The division by Δ*T*_{jk} was done as a correction for the scalar property of IT. *N*_{j}, the number of Δ*T*s, was always 9.

To index the overall precision of timing behaviors in a session, we averaged the interval-specific coefficient of variation cv_{jk} across intervals for a session *k*:

CV_{k} = (1/*N*_{j}) Σ_{j} cv_{jk}, where cv_{jk} = σ(Δ*t*)_{jk}/μ(Δ*t*)_{jk} (*Eq. 2*)
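The two session-level indexes can be computed as in this Python sketch (the mapping from each Δ*T* to its response times is a hypothetical data layout):

```python
from statistics import mean, stdev

def session_accuracy_precision(data):
    """Session-level accuracy index D (Eq. 1) and precision index CV
    (Eq. 2) from a mapping {dT: [response times dt, ...]}.

    d_j  = [mean(dt) - dT] / dT    (normalized deviation)
    cv_j = stdev(dt) / mean(dt)    (coefficient of variation)
    D and CV are the across-dT means of |d_j| and cv_j."""
    d_abs, cvs = [], []
    for dT, dts in data.items():
        mu = mean(dts)
        d_abs.append(abs((mu - dT) / dT))
        cvs.append(stdev(dts) / mu)
    return mean(d_abs), mean(cvs)
```

Both indexes are dimensionless, so sessions and intervals of different durations can be compared directly.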

#### Across-session analysis of accuracy and precision in IT.

To test the monotonicity of learning curves across sessions, we fit an exponential function of the *k*th session (Dosher and Lu 2007), *ae*^{bk} + *c*, to the observed time courses of the two measures *D*_{k} and CV_{k}, using nonlinear least squares with the Levenberg-Marquardt algorithm in MATLAB (MathWorks). If *a* and *b* have different signs, the learning curve is interpreted to decrease monotonically. In addition, Spearman's rank correlation test was used to test whether the long-term time course of timing performance increases or decreases with an increasing order of daily sessions.
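A Python sketch of the monotonicity test; we substitute a grid search over *b*, with *a* and *c* solved by ordinary least squares at each *b*, for the Levenberg-Marquardt routine, and the grid bounds are our assumption:

```python
import math

def fit_exponential(ks, ys):
    """Fit y = a*exp(b*k) + c to a learning curve.  For each trial
    value of b, the model is linear in a and c, so those are solved
    in closed form; the best (a, b, c) minimizes squared error."""
    best = None  # (a, b, c, sse)
    for i in range(-300, 301):
        b = 0.01 * i
        xs = [math.exp(b * k) for k in ks]
        n = len(ks)
        sx, sy = sum(xs), sum(ys)
        sxx = sum(x * x for x in xs)
        sxy = sum(x * y for x, y in zip(xs, ys))
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-12:
            continue  # xs nearly constant; a and c not identifiable
        a = (n * sxy - sx * sy) / denom
        c = (sy - a * sx) / n
        sse = sum((a * x + c - y) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[3]:
            best = (a, b, c, sse)
    return best[0], best[1], best[2]
```

Opposite signs of the fitted *a* and *b* (i.e., *a* × *b* < 0) indicate a monotonically decreasing curve.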

#### Analysis of interval-specific learning in IT.

We evaluated feedback effects on timing accuracy and precision by assessing the degree to which PL in a specific time interval with feedback (1.32 s) is transferred to other neighboring intervals without feedback in the following steps. First, for each of the “NF” and “SF” sessions, we obtained *1*) “deviation (*D*_{j}^{NF}, *D*_{j}^{SF})” curves of absolute normalized deviations (|*d*_{jk}|) and *2*) “variability (CV_{j}^{NF}, CV_{j}^{SF})” curves of coefficient of variation (cv_{jk}) as a function of Δ*T*_{jk} by averaging them across sessions separately for NF and SF conditions (*Eqs. 1* and *2* with averaging across session index *k*, not across interval index *j*, and replacing *N*_{j} with *N*_{k}^{NF} and *N*_{k}^{SF}, 4 and 6). We then estimated learning indexes for accuracy (LIA) by dividing the difference between the pre- and postfeedback deviations by their sum:

LIA_{j} = (*D*_{j}^{NF} − *D*_{j}^{SF})/(*D*_{j}^{NF} + *D*_{j}^{SF}) (*Eq. 3*)

Plotting LIA_{j} yields a “feedback transfer curve” as a function of Δ*T*. Note that the feedback transfer curve is different from the specificity measures used previously in that the test Δ*T*s have not been monitored during training for the pre- and posttraining comparison in previous studies. Similarly, we also acquired a feedback transfer curve with learning indexes for precision (LIP) based on the CV data:

LIP_{j} = (CV_{j}^{NF} − CV_{j}^{SF})/(CV_{j}^{NF} + CV_{j}^{SF}) (*Eq. 4*)

In addition, we averaged the deviation curves (|*d*_{jk}|) and precision curves (cv_{jk}) separately across subjects within each session to assess dynamics of interval-specific feedback effects. The 95% confidence intervals (C.I.) for those average accuracy and precision measurements were obtained by resampling IT data for a Δ*T*_{jk} in a daily session with replacement and applying the same procedure to the 1,000 bootstrapped data sets to estimate our measures.
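Both learning indexes (LIA for accuracy, LIP for precision) share the same normalized-difference form, sketched here in Python (function names and the dictionary layout are ours):

```python
def learning_index(pre, post):
    """Normalized pre/post difference used for both accuracy (LIA)
    and precision (LIP): (pre - post) / (pre + post).  Positive
    values indicate improvement in the SF relative to the NF
    condition; the index is bounded between -1 and 1."""
    return (pre - post) / (pre + post)

def transfer_curve(pre_by_dT, post_by_dT):
    """Feedback transfer curve: the learning index at each tested dT.
    Tuning around the feedback interval indicates learning specificity."""
    return {dT: learning_index(pre_by_dT[dT], post_by_dT[dT])
            for dT in pre_by_dT}
```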

#### Dissociation of Vierordt's law-associated and feedback-driven improvement in timing accuracy.

To test the possibility that the Vierordt's law-associated effect (see results)—overestimated and underestimated μ(Δ*t*) for relatively short and long Δ*T*s—confounds the specificity of the feedback effect, we linearly decomposed the improved accuracy at the first SF session into contributions from those two effects and compared the bias-corrected feedback effect (BCFE) across the feedback and nonfeedback Δ*T*s. First, for each Δ*T* in each individual, a “change in accuracy performance from the fourth to the fifth session that can be attributed to Vierordt's law,” Δ*d̂*_{j,k=5}, was estimated by averaging the changes in accuracy performance between all possible pairs of neighboring sessions except those involving the fifth session (averaging across the pairs only within the NF condition also leads to qualitatively similar and statistically significant results):

Δ*d̂*_{j,k=5} = (1/*N*_{pairs}) Σ_{k} (|*d*_{j,k+1}| − |*d*_{j,k}|), over neighboring pairs (*k*, *k* + 1) not involving the fifth session (*Eq. 5*)

Second, we then corrected the original feedback effects at the fifth session for the central bias-induced changes in accuracy by parsing out Δ*d̂*_{j,k=5} from the original feedback effects:

BCFE_{j} = (|*d*_{j,k=4}| − |*d*_{j,k=5}|) + Δ*d̂*_{j,k=5} (*Eq. 6*)

where BCFE_{j} is a feedback effect (|*d*_{j,k=4}| − |*d*_{j,k=5}|) that is corrected for bias (Δ*d̂*_{j,k=5}) at the *j*th Δ*T*.

To statistically test whether these BCFE_{j}s are tuned around the feedback Δ*T*, we performed the Wilcoxon signed-rank test on the 16 pairs of the feedback Δ*T* and Δ*T*_{far}s (4 subjects × 4 Δ*T*_{far}s), where Δ*T*_{far}s were defined as the two shortest and two longest Δ*T*s. In addition, to evaluate feedback-induced timing enhancements relative to the variability of changes in timing accuracy in the remaining pairs of consecutive sessions, we transformed BCFE values into *z* values by dividing them by the standard deviation of changes in timing accuracy between all possible pairs of consecutive sessions except those involving the fourth and fifth sessions. We found that the normalized BCFE value (merged across subjects) was significantly different from zero only around the feedback interval [*z* = 2.011 (*P* < 0.05) for 1.32 s and *z* = 2.05 (*P* < 0.05) for 1.69 s] and decreased gradually as a function of temporal distance from the feedback interval, indicating that the introduction of feedback at the 1.32-s interval in the fifth session generated substantial feedback interval-specific accuracy enhancements, well beyond the range of feedback-unrelated changes in timing accuracy.
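A Python sketch of the bias correction (the data layout is hypothetical, and excluding both neighboring pairs that touch the feedback session is our reading of "pairs involving the fifth session"):

```python
from statistics import mean

def bias_corrected_feedback_effect(abs_dev, fb_session=5):
    """Bias-corrected feedback effect for one dT in one subject.

    abs_dev: {session k: |d_jk|} over sessions 1..10.
    The Vierordt-driven drift is estimated as the mean change in |d|
    between neighboring sessions, excluding pairs touching the
    feedback session, and is added back to the raw feedback effect
    |d_{k=4}| - |d_{k=5}|."""
    ks = sorted(abs_dev)
    changes = [abs_dev[k + 1] - abs_dev[k] for k in ks[:-1]
               if fb_session not in (k, k + 1)]
    expected_drift = mean(changes)
    raw_effect = abs_dev[fb_session - 1] - abs_dev[fb_session]
    return raw_effect + expected_drift
```

Normalizing this value by the standard deviation of the feedback-unrelated changes gives the *z* values reported above.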

#### Bayesian observer model for PL of IT.

To predict Δ*t* for a given Δ*T*, the model combines a likelihood function tuned for a Δ*T* with a prior probability distribution, both of which we assumed to be Gaussian. We opted for the Gaussian prior and Gaussian likelihood for the sake of computational and interpretive convenience by exploiting the fact that the Gaussian distribution is a conjugate prior of the Gaussian likelihood (see discussion for the assumption). The Gaussian distribution is completely characterized by two statistics, the mean and standard deviation, for which each of the prior and the likelihood has its own value (μ_{prior}, μ_{likelihood}, σ_{prior}, σ_{likelihood}) in the model. By applying Bayes' rule, the mean and standard deviation of the posterior distribution can be obtained as follows:

μ_{posterior} = (σ_{likelihood}^{2}μ_{prior} + σ_{prior}^{2}μ_{likelihood})/(σ_{prior}^{2} + σ_{likelihood}^{2}), σ_{posterior}^{2} = σ_{prior}^{2}σ_{likelihood}^{2}/(σ_{prior}^{2} + σ_{likelihood}^{2})

The posterior mean is a weighted sum of μ_{prior} and μ_{likelihood}, with their relative proportions of σ_{likelihood}^{2} and σ_{prior}^{2} as weights. The posterior variance is the product of σ_{prior}^{2} and σ_{likelihood}^{2} divided by their sum. We assumed that *1*) μ_{likelihood} is unbiased and equal to Δ*T*, although the likelihood can be contaminated by sensory noise, and *2*) the relationship between μ_{likelihood} and σ_{likelihood} is defined by the scalar property (σ_{likelihood} = CV × Δ*T*). The assumption of unbiased μ_{likelihood} has been adopted in previous studies and is known to have negligible impacts on modeling outcomes (Alais and Burr 2004; Ernst and Banks 2002; Jazayeri and Shadlen 2010; Körding and Wolpert 2004; Stocker and Simoncelli 2006). The latter assumption enabled us to greatly reduce the number of free parameters by using a single CV value instead of nine (1 for each of the 9 Δ*T*s) σ_{likelihood}s (Jazayeri and Shadlen 2010; Miyazaki et al. 2006). Our assumptions were validated by extensive model comparisons (see below for comparisons of nested model variants). Accordingly, three free parameters, μ_{prior}, σ_{prior}, and CV, were fitted to the data set from each daily session. The resulting posterior can be written as a full conditional probability function:

*p*(Δ*t*|Δ*T*) ∝ *N*(Δ*t*; μ_{prior}, σ_{prior}^{2}) × *N*(Δ*t*; Δ*T*, (CV × Δ*T*)^{2})

For each individual session from each subject, we estimated the best-fitting parameters that maximize the likelihood function of the σ_{prior}, μ_{prior}, and CV, assuming statistical independence among trials.
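The posterior combination under the model's two assumptions (unbiased likelihood mean equal to Δ*T*, and likelihood width set by the scalar property) can be sketched in Python:

```python
def posterior_gaussian(mu_prior, sd_prior, dT, cv):
    """Posterior mean and sd for one trial, combining a Gaussian
    prior with a Gaussian likelihood whose mean is the physical
    interval dT (unbiased) and whose sd is cv * dT (scalar property).
    Standard conjugate-Gaussian precision weighting."""
    var_prior = sd_prior ** 2
    var_like = (cv * dT) ** 2
    mu_post = (var_like * mu_prior + var_prior * dT) / (var_prior + var_like)
    var_post = var_prior * var_like / (var_prior + var_like)
    return mu_post, var_post ** 0.5
```

With a prior centered between the tested intervals, a short Δ*T* (here 1.0 s against a 1.5-s prior) yields an overestimated posterior mean of 1.1 s, reproducing the Vierordt-like central bias; shrinking σ_{prior} strengthens this bias while reducing the posterior variance.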

We also fitted the model to data from a pilot experiment, which was identical to the main experiment except for the number of trials (∼100 trials per session), sessions (6 daily sessions without feedback and 9 sessions with specific feedback to 2 s), session structure (the trials with feedback were separately blocked and placed before the main IT task in the SF condition), and sampled intervals (random sampling from a continuous uniform distribution from 0.5 s to 6.5 s). Although the nondiscrete sampling hinders application of the above analysis for accuracy and precision, we were able to fit the three-parameter Bayesian model to the pilot data from six additional subjects. As in the main data (see results), we found that the width of the prior (rank correlation for across-subject average σ_{prior}, ρ = −0.95, *P* < 0.001), not that of the likelihood (rank correlation for across-subject average CV, ρ = −0.02, *P* = 0.94), gradually decreased across sessions.

#### Comparison of nested model variants.

We evaluated eight nested model variants in terms of goodness of fit, using Bayesian information criteria (BIC), which take into account sample size and number of parameters for the model complexity to place competing models on equal footing (Pitt and Myung 2002). The tested model variants, which differ in assumptions for prior and likelihood distributions, can be sorted out in decreasing order of number of free parameters: *1*) an “exaggerated” model with interval-specific priors and interval-specific likelihoods as free parameters; *2*) a “full” model, same as *1* but now with a single prior common to all Δ*T*s set to be free; *3*) a “fixed mean of likelihood” variant of the reduced model, same as *2* but with μ_{likelihood} fixed to corresponding Δ*T*s; *4*) and *5*) “fixed mean of prior” variants of the reduced model, same as *3* but now with the μ_{prior} fixed to be equal either to the mean (*4*) or the median (*5*) of the entire set of Δ*T*s, respectively; *6*) a “single CV” variant of the reduced model, same as *3* but now with a single CV parameter that generates the σ_{likelihood} based on the scalar property; and *7*) and *8*) minimal models, same as *6* but now with the μ_{prior} fixed either to the mean (*7*) or the median (*8*) of the entire set of Δ*T*s, respectively. For details of these models, see Table 1, which compares the models in terms of assumptions for prior and likelihood distributions, number of free parameters, and BIC values from best-fitting parameters.

We fitted models separately to data from individual sessions in each subject, using the maximum likelihood estimation method with the “*fmincon*” function in MATLAB (MathWorks). We found that the “single CV” variant of the reduced model (the 6th model, with free μ_{prior}, σ_{prior}, CV and the veridical μ_{likelihood}) produced the minimum BIC values in the majority of sessions (mean rank ± SE = 1.78 ± 0.17 among 8 tested models). This model also turned out to be the best for all the subjects when BIC values were pooled over the entire sessions, indicating that the “single CV” model variant was the best among all the model variants when both predictive power and model complexity were considered. The exaggerated model and the full model did not differ significantly in predictive power, indicating that the multiple excess priors assumed in the exaggerated model are unnecessary. Even when the models were compared only in log likelihood, not in BIC, the “single CV” variant with 3 free parameters performed only marginally worse than the “fixed mean of likelihood” variant (the 3rd model), which has 11 free parameters (mean rank ± SE = 2.95 ± 0.25, the second minimum among the 8 models), despite the considerable difference in degrees of freedom (3 vs. 11).
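The BIC comparison rests on a simple penalty for free parameters; a Python sketch with illustrative log-likelihood values (not the paper's):

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k*ln(n) - 2*ln(L).
    Lower is better; each extra free parameter costs ln(n)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood
```

With 270 trials per session, the 8 extra parameters of the 11-parameter "fixed mean of likelihood" variant cost 8 × ln(270) ≈ 44.8 BIC points, so it must gain more than ≈ 22.4 in log likelihood over the 3-parameter "single CV" variant to win.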

We confirmed the stability of results by systematically varying initial guesses for free parameters. We also examined effects of lower bounds for parameters on fitting results. Obviously, σ_{likelihood} and σ_{prior} have a constraint of positivity for their sign. However, it is less obvious whether we should impose the positivity constraint on μ_{prior} and μ_{likelihood} as well. We found that the lower bounds of zero for μ_{prior} and μ_{likelihood} show little effect on the BIC values.

## RESULTS

### Scalar Property and Vierordt's Law in Response Time Distributions

In our implicit IT task, unlike typical IT tasks, subjects were not asked to judge or reproduce directly perceived time intervals (Δ*t*). Instead, after a drifting bar became invisible, subjects simply reacted to the invisible bar by estimating when it would arrive at a designated location on the annulus (Fig. 1*B*). By measuring the time that elapsed between the disappearance of the visible bar and subjects' response, we obtained an estimate for Δ*t* as a component of the physical equation, Δ*T* = Δ*X*/*V*, when the information about *V* and Δ*X* is available to subjects (Fig. 1*C*; see methods for the specific task strategies adopted by subjects). Across 10 daily sessions (Fig. 1*D*), we collected sets of Δ*t* for a range of physical time intervals (Δ*T*) and computed their means [μ(Δ*t*)] and standard deviations [σ(Δ*t*)] (Fig. 2).

We confirmed the validity of our data as a necessary condition of IT by examining whether the Δ*t* distribution conforms to the scalar property, a hallmark signature of IT (Buhusi and Meck 2005; Wearden 2008). The linear relation between μ(Δ*t*) and σ(Δ*t*) was quite evident from the Δ*t* distributions that were sorted by physical Δ*T*: spread of the distributions increased monotonically with an increasing Δ*T* (Fig. 2*A* for data from a representative subject). To quantitatively evaluate the scalar property in our data, we tested whether the correlation coefficient between μ(Δ*t*) and σ(Δ*t*) is significant within each session (Fig. 2*B* for representative data). The correlation between μ(Δ*t*) and σ(Δ*t*) was significant in all of the sessions for all subjects (*r* = 0.89–0.99; *P* = 10^{−3}-10^{−7}). When σ(Δ*t*) was normalized by the corresponding μ(Δ*t*), the resulting coefficient of variation (CV) values (mean across sessions and subjects, 0.24; standard deviation, 0.048) were also comparable to those in previous studies (Lewis and Miall 2009; Wearden 2008).

In addition, the μ(Δ*t*)s departed from their corresponding veridical Δ*T*s in an orderly way: μ(Δ*t*) was overestimated for short Δ*T*s and underestimated for long Δ*T*s, which has been referred to as Vierordt's law (Fig. 2*A*) (Fortin et al. 2009; Jones and McAuley 2005; Kanai et al. 2006; Lewis and Miall 2009; McAuley and Miller 2007; Penney et al. 2008; Wearden 2008; Zarco et al. 2009). The statistical significance of Vierordt's law in our data can be tested by regressing μ(Δ*t*) as a linear function of Δ*T* because the bias toward the center of the sampled Δ*T*s is translated into a line with a slope smaller than 1 and an intercept larger than 0. The slopes (mean 0.77, standard deviation 0.20) were significantly smaller than 1 [*t*(39) = −7.55, *P* < 10^{−8}, 1-sample left-tailed *t*-test], and the intercepts (mean 0.32, standard deviation 0.18) were significantly larger than 0 [*t*(39) = 11.55, *P* < 10^{−13}, 1-sample right-tailed *t*-test]. This pattern of bias conforming to Vierordt's law tended to become gradually stronger in magnitude and more consistent across subjects over the course of the daily sessions (Fig. 3): when averaged across subjects, the intercept increased (Spearman's rank correlation, ρ = 0.91, *P* < 0.001) and the slope tended to decrease (rank correlation, ρ = −0.56, *P* = 0.09). This gradual augmentation of bias in mean accuracy will be addressed again below, where our Bayesian model predicts the intimate relationship between biases in mean accuracy and improvements in timing precision.
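The regression test for Vierordt's law can be sketched in Python; the illustrative data below are fabricated from the across-subject mean slope (0.77) and intercept (0.32) reported above:

```python
def vierordt_regression(dTs, mean_dts):
    """Ordinary least-squares fit of mu(dt) = slope * dT + intercept.
    Slope < 1 with intercept > 0 indicates the central bias of
    Vierordt's law (overestimated short, underestimated long dTs)."""
    n = len(dTs)
    mx = sum(dTs) / n
    my = sum(mean_dts) / n
    sxx = sum((x - mx) ** 2 for x in dTs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dTs, mean_dts))
    slope = sxy / sxx
    return slope, my - slope * mx
```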

### Dissociation Between Timing Accuracy and Precision Across Sessions

To monitor changes in the mean accuracy of IT across sessions, we first normalized the offsets of mean response times at the corresponding physical intervals by dividing them by the physical intervals {*d* = [μ(Δ*t*) − Δ*T*]/Δ*T*; *Eq. 1*}. Then, by averaging the absolute values of these normalized deviations, or constant errors (|*d*|), across Δ*T*s in a given session, we indexed the overall degree of accuracy (*D*) in IT for that session (Fig. 4*A*; *Eq. 1*). To examine whether the overall accuracy changed monotonically as a function of session order, we fitted an exponential function to the time course of mean accuracy and inspected the signs of the multiplicative term and the exponent (see methods). If the multiplicative term *a* and the constant *b* in the exponent have different signs, the learning curve can be interpreted as decreasing monotonically. The product of these two terms was not consistent in sign across subjects, being negative for two subjects [*S1*, *a* × *b* = −0.050 (C.I. = [−0.060, −0.036]); *S2*, *a* × *b* = −8.3 × 10^{4} (C.I. = [−2.6 × 10^{4}, −1.3 × 10^{4}])] and positive for the other two subjects [*S3*, *a* × *b* = 0.009 (C.I. = [0.006, 0.014]); *S4*, *a* × *b* = 0.36 (C.I. = [0.23, 0.45])]. For goodness of fit, the percentage of variance explained ranged from 19.2% to 62.4%. Together, these results indicate that the overall mean accuracy *D* neither decreased nor increased monotonically across sessions (rank correlation for *D* averaged across subjects, ρ = −0.16, *P* = 0.66).
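The sign check on the fitted exponential can be sketched as follows. The session values and decay parameters are hypothetical, and the functional form *a*·exp(*b*·*x*) + *c* is an assumption mirroring the description above, not the authors' exact fitting procedure; for self-containment the fit scans *b* and solves *a*, *c* by linear least squares:

```python
import numpy as np

# Hypothetical learning curve: an error index that decays over sessions.
sessions = np.arange(1, 11)
errors = 0.3 * np.exp(-0.4 * sessions) + 0.1

# Fit y = a*exp(b*x) + c: for each candidate b, the model is linear in (a, c),
# so solve those by least squares and keep the b with the smallest error.
best = None
for b in np.linspace(-2.0, 2.0, 801):
    z = np.exp(b * sessions)
    A = np.column_stack([z, np.ones_like(z)])
    (a, c), *_ = np.linalg.lstsq(A, errors, rcond=None)
    sse = np.sum((A @ [a, c] - errors) ** 2)
    if best is None or sse < best[0]:
        best = (sse, a, b, c)
_, a, b, c = best
print(a * b < 0)   # a*b < 0 marks a monotonically decreasing curve
```

For this decaying curve the recovered *a* and *b* have opposite signs, the criterion the text uses for a monotonically decreasing learning curve.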

Note that feedback was not provided until the fifth daily session and, from the fifth session on, was given only for one particular interval (Δ*T* = 1.32 s) (Fig. 1*D*). In the “specific feedback (SF)” sessions, feedback was provided by showing subjects a snapshot of the true location of the moving bar at the time of a key press (Fig. 1*B*, *right*). In other words, the feedback was the spatial offset of the bar from the nonius line at the moment of a key press, which in principle can be translated into a temporal offset (Δ*t* − Δ*T*) when divided by the speed (*V*) (Fig. 1*C*). We examined whether abrupt changes in timing accuracy occurred after feedback was introduced. A significant change was observed between the last “no feedback (NF)” session and the first SF session [*t*(3) = 3.61, *P* = 0.04; planned paired *t*-test], indicating that feedback improved overall timing accuracy immediately even though it was given for only one particular interval. To examine this abrupt enhancement further, we tracked the time course of accuracy over individual trials within the first SF session. The deviations of Δ*t*s from their corresponding Δ*T*s became smaller after only several feedback trials and then quickly reached an asymptote, remaining relatively stable thereafter. These asymptotic, stable deviations stand in marked contrast to those observed in the preceding NF sessions, which fluctuated unpredictably (data not shown). Across the sessions both before and after the instantaneous feedback effect, the mean deviation tended to increase gradually (Fig. 4*A*), in line with the slowly growing biases toward the center Δ*T*. Interval-specific impacts of feedback on timing accuracy are addressed again below, where we inspect the specificity of perceptual learning and its relationship with the gradual buildup of biases in μ(Δ*t*) toward the center Δ*T*.

Session-by-session changes in timing performance were also examined in terms of overall precision by averaging CV values across intervals within each session (*Eq. 2*). In contrast with the time course of timing accuracy, the mean CV gradually decreased over sessions (Fig. 4*B*; rank correlation for CV averaged across subjects, ρ = −0.90, *P* < 0.001). This monotonic improvement in timing precision over sessions was evident in all subjects, as confirmed by significantly negative products of the multiplicative and exponent terms of the fitted exponential function [*S1*, *a* × *b* = −0.25 (C.I. = [−1.10, −0.06]); *S2*, *a* × *b* = −0.008 (C.I. = [−0.01, −0.006]); *S3*, *a* × *b* = −0.07 (C.I. = [−541, −0.007]); *S4*, *a* × *b* = −0.03 (C.I. = [−0.05, −0.02])]. The goodness of fit was better than for the accuracy data: the percentage of variance accounted for ranged from 55.8% to 88.5%. Again in contrast with the accuracy data, feedback had hardly any impact on the time course of timing precision. The difference in mean CV between the last NF and the first SF sessions was not significant [*t*(3) = −0.20, *P* = 0.86; planned paired *t*-test]. Instead, the monotonic improvement in precision appeared to be present even in the NF sessions alone (rank correlation for CV averaged across subjects, ρ = −1.00, *P* = 0.08).

On the basis of this inspection of the overall, pooled-across-Δ*T* error metrics over sessions, we conclude that accuracy improved only immediately after the introduction of feedback, whereas precision improved steadily throughout all sessions regardless of feedback.

### Across-Session Dynamics of Interval-Specific Perceptual Learning

We first assessed the impact of feedback on timing performance by comparing mean accuracy and precision before (NF sessions) and after (SF sessions) the feedback. For timing accuracy, we averaged the absolute normalized deviations (|*d*|) across the NF and the SF sessions separately, yielding “prefeedback” and “postfeedback” deviation curves as a function of Δ*T* (*D*_{j}^{NF}, *D*_{j}^{SF}; Fig. 5*A*). Then, for each Δ*T*, we computed a learning index for accuracy (LIA) by dividing the difference between the pre- and postfeedback deviations by their sum (*Eq. 3*). We obtained a “feedback transfer curve” for timing accuracy by plotting these LIAs as a function of Δ*T* relative to the feedback Δ*T* (Fig. 5*B*). Similarly, we acquired a feedback transfer curve for timing precision based on the CV data, using an analogous learning index for precision (LIP; CVs in Fig. 5*D*; LIPs in Fig. 5*E* and *Eq. 4*).
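The difference-over-sum learning index can be sketched in a few lines. The deviation values below are hypothetical, and the peaked shape is built in only to illustrate how interval-specific improvement shows up in the index:

```python
import numpy as np

# Hypothetical pre-/postfeedback deviation curves across five test intervals.
# A learning index of (pre - post) / (pre + post), as in the text's Eq. 3,
# is positive wherever the postfeedback error is smaller.
d_pre  = np.array([0.20, 0.18, 0.25, 0.19, 0.22])   # |d| averaged over NF sessions
d_post = np.array([0.18, 0.12, 0.08, 0.13, 0.20])   # |d| averaged over SF sessions

lia = (d_pre - d_post) / (d_pre + d_post)
print(lia)   # largest at the middle interval, where feedback helped most
```

Plotting such an index against the test interval's distance from the feedback interval gives the feedback transfer curve described above.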

There were interval-specific improvements in timing accuracy after the feedback: the improvement was maximal around the feedback Δ*T* and gradually diminished as the difference between the test and feedback intervals increased (Fig. 5*A*), yielding smooth transfer curves for all subjects (Fig. 5*B*). Statistically significant improvements in accuracy were indicated by positive LIA values whose 95% confidence intervals, estimated by nonparametric bootstrapping, excluded zero (Fig. 5*B*). By contrast, postfeedback improvements in timing precision were not confined to the feedback Δ*T* (Fig. 5*D*). Although the overall degree of improvement in precision (estimated by LIP) varied among subjects, the feedback transfer curve showed no systematic pattern like that for timing accuracy (Fig. 5*E*), indicating that feedback-driven improvements in timing precision did not depend on the test interval's distance from the feedback Δ*T*.

Given that feedback was delivered at the median Δ*T* and that biases toward the central Δ*T* grew slowly over sessions (Fig. 2*A*), the observed “feedback interval-tuned” improvements in timing accuracy could also be attributed to the growing central biases, which would tend to degrade timing accuracy at the short and long Δ*T*s, as predicted by Vierordt's law (Fig. 3, Fig. 4*A*). To check this possibility, we estimated the contribution of the central biases to the feedback interval-tuned calibration of mean accuracy at the fifth session with Δ*d̂*_{j,k=5}, the averaged change in the deviations between all possible pairs of neighboring sessions except the fourth and fifth (*j*th Δ*T*, *k* for session; *Eq. 5*), and examined whether the feedback specificity in accuracy enhancement persists in feedback effects corrected for the contribution of the central biases. We found that the bias-corrected feedback effects (*Eq. 6*) were maximal around the feedback Δ*T* and decreased as the test interval moved away from the feedback Δ*T* toward either the shortest or longest Δ*T*s (*P* = 0.0052, *z* = −2.792, Wilcoxon signed-rank test). The same results were obtained when we reinspected the bias-corrected feedback effects after normalizing them for the interval-specific variability of changes in timing accuracy across all pairs of neighboring sessions except the fourth and fifth (see methods for details). On the basis of these results, we conclude that the feedback-specific improvements in timing accuracy remain even after the potential contribution of the growing central biases to feedback effects is parsed out.

To further reveal the dissociative nature of the timing error measures, we also tracked changes over daily sessions in the deviation (*d*) and variability (CV) curves. We averaged them across subjects and estimated confidence intervals for these sample means using the same bootstrap procedure adopted above (Fig. 5, *C* and *F*). Before feedback, the deviation curves did not exhibit a constant shape and fluctuated in overall level, with a tendency to increase over sessions. As soon as feedback was provided on trials with the median Δ*T* (1.32 s), the deviation curve changed dramatically, conforming to a “V” shape. The reduction in deviation peaked around the feedback Δ*T* and decreased with increasing difference between the test and feedback intervals. This abrupt, interval-specific learning in accuracy induced by feedback can be readily appreciated by comparing the deviation curve from the last NF session (lightest blue line in Fig. 5*C*) with that from the first SF session (orange line in Fig. 5*C*). The across-session dynamics of the variability curves differed completely from that of the deviation curves in two respects. First, the variability curves did not exhibit any interval-specific effects of feedback in any daily session (Fig. 5*F*). Second, overall, interval-nonspecific improvements in timing precision developed slowly and steadily throughout the entire set of daily sessions, including the NF sessions (open symbols in Fig. 5*F*). This is in line with the monotonic decrease of the across-Δ*T* mean CV (Fig. 4*B*).

Putting together the across-session dynamics of timing errors and the interval-specific effects of feedback, we conclude that, for timing accuracy, feedback interval-dependent learning occurs immediately after feedback, whereas, for timing precision, interval-nonspecific learning accumulates gradually over sessions regardless of feedback.

### Control for Sensorimotor Function and Speed Perception

We have been attributing the Δ*t* distributions to variations in IT. Because of the nature of our task, however, there are two other sources that could potentially contribute to the observed Δ*t* distributions. First, variation in speed perception could have affected the Δ*t* distributions, because errors in the perceived speed of the bar during the “visible motion” would cause errors in the timing response (Fig. 1*C*). To check this possibility, we asked subjects to perform a speed discrimination (SD) task shortly after blocks of IT task trials in each daily session (Fig. 1*D*). On each trial, subjects viewed two moving bars sequentially and judged which of the two moved faster. While setting the reference speed to 8°/s, which roughly matched the mean speed (8.22°/s) in the IT task, we used an evenly distributed set of six test speeds around the reference speed. By computing the mean and the standard deviation of a cumulative Gaussian function fitted to each session's psychometric curve (Fig. 6*A*), we assessed how the accuracy and precision of speed perception changed over sessions. If the observed changes in timing accuracy and precision across sessions (Fig. 4) were caused by changes in speed perception, the across-session dynamics of speed perception should resemble that of IT. However, neither measure exhibited any noticeable changes across sessions [Fig. 6, *B* and *C*; paired *t*-test between PSEs of the SD task in the 4th and 5th sessions, *t*(3) = −0.44, *P* = 0.69; rank correlation for 1/sensitivity of the SD task averaged across subjects, ρ = 0.15, *P* = 0.68], ruling out speed perception as a source of the systematic changes in IT that we observed across sessions.
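Fitting a cumulative Gaussian to the SD choice data can be sketched as follows. The test speeds echo the design above, but the choice proportions are idealized rather than observed, and the coarse grid search is a self-contained stand-in for a proper maximum-likelihood fit:

```python
import math

# Hypothetical speed-discrimination data around an 8 deg/s reference.
test_speeds = [6.5, 7.0, 7.5, 8.5, 9.0, 9.5]

def cum_gauss(x, mu, sigma):
    """Cumulative Gaussian psychometric function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

# Idealized "test faster" proportions generated from a known PSE and spread.
p_faster = [cum_gauss(x, 8.0, 0.6) for x in test_speeds]

# Recover mu (the PSE, indexing accuracy) and sigma (inverse sensitivity,
# indexing precision) by least squares over a parameter grid.
best = None
for mu in [7.0 + 0.05 * i for i in range(41)]:         # 7.0 .. 9.0
    for sigma in [0.2 + 0.05 * j for j in range(37)]:  # 0.2 .. 2.0
        sse = sum((cum_gauss(x, mu, sigma) - p) ** 2
                  for x, p in zip(test_speeds, p_faster))
        if best is None or sse < best[0]:
            best = (sse, mu, sigma)
_, pse, sigma_hat = best
print(pse, sigma_hat)
```

Tracking the fitted PSE and sigma session by session is what underlies the stability check in Fig. 6, *B* and *C*.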

Second, the variation in IT may also have arisen at the stage of motor execution (ME). To test this possibility, we inserted ME task trials randomly between the IT task trials (Fig. 1*D*). The ME task was exactly the same as the IT task, except that the moving bar did not disappear and continued to drift even when passing through the occlusion arc. Because subjects simply responded to the arrival of the bar at the nonius line by pressing a button, there was no need to estimate Δ*T*s to perform the ME task. The ME data did not exhibit the scalar property, as indicated by a constant level of standard deviation across different Δ*T*s (Fig. 6*D*). To determine whether the ME trial data exhibit the Vierordt's law patterns observed in the IT data, we conducted linear regression analysis separately on the ME trials of each session in all subjects. The slope (0.98 ± 0.02) was significantly smaller than 1 [*t*(39) = −5.93, *P* < 10^{−6}, 1-sample left-tailed *t*-test] and the intercept (0.05 ± 0.04) significantly larger than 0 [*t*(39) = 7.36, *P* < 10^{−10}, 1-sample right-tailed *t*-test]. Despite the statistical significance of Vierordt's law in the ME data, these results are unlikely to mean that the error patterns in ME trials determined the observed across-session changes in timing accuracy in the main IT task (the promotion of central biases), for two reasons. First, unlike the IT data, the slope and intercept for the ME trials did not change gradually across sessions (rank correlation for slope, ρ = 0.21, *P* = 0.56; rank correlation for intercept, ρ = 0.39, *P* = 0.26). Second, the effect sizes of the central biases in the ME trials were marginal (average slope, 0.98; average intercept, 0.05) and substantially smaller than those in the main IT trials [slope: 0.98 ± 0.02 (ME) vs. 0.77 ± 0.20 (IT), *P* < 10^{−5}; intercept: 0.05 ± 0.04 (ME) vs. 0.32 ± 0.18 (IT), *P* < 10^{−7}; paired Wilcoxon signed-rank tests between ME and IT data]. Without the correction for the scalar property, the time course of mean accuracy in the IT task, which may be primarily determined by data with long Δ*T*s, differed from that of the motor control task (Fig. 6*E*). In addition, the significant feedback effect was absent in μ(Δ*t* − Δ*T*) for motor control [Fig. 6*E*; paired *t*-test between 4th and 5th sessions, *t*(3) = −0.50, *P* = 0.65]. Regarding motor variability, however, the standard deviations of motor latency decreased slowly across sessions [rank correlation for across-subject averaged σ(Δ*t* − Δ*T*), ρ = −0.77, *P* = 0.01], indicating some improvement in ME precision even though no feedback was explicitly intended for the motor control task. This improvement probably occurred because subjects could calibrate their motor latencies by comparing the positions of the visible moving bar and another bar presented briefly at the nonius line at the moment of button press (see methods). However, the reduction in motor variability is unlikely to account for the observed changes in timing precision. The size of the motor variability reduction was too small to explain the variability reduction in the IT data, particularly at long Δ*T*s (Fig. 6*F*), probably owing to the absence of the scalar property in motor variability. Moreover, the time course of the motor variability did not correlate significantly with that of the timing precision in any subject except one (*S1*; *r* = 0.71, *P* = 0.02). Thus we conclude that across-session changes in the accuracy and precision of the IT data were largely independent of speed perception and motor execution.

### Bayesian Modeling of Perceptual Learning in Timing Accuracy and Precision

We found that the dynamics of timing accuracy was clearly dissociated from that of timing precision in overall changes across sessions (Fig. 4) and interval-specific effects of feedback (Fig. 5). Intriguingly, the framework of Bayesian probabilistic inference (Chater et al. 2006; Doya 2007; Friston 2010; Griffiths et al. 2008; Kersten et al. 2004; Körding and Wolpert 2006; Maloney and Mamassian 2009) appears to offer a parsimonious way of explaining these seemingly complicated dynamics of the timing accuracy and precision. Bayesian models have been broadly applied to visual perception (Barthelmé and Mamassian 2009; Knill and Saunders 2003; Welchman et al. 2008; Zhaoping and Jingling 2008), multisensory integration (Alais and Burr 2004; Burge et al. 2010; Ernst and Banks 2002), and sensory-motor control (Dokka et al. 2010; Hudson et al. 2008; Körding and Wolpert 2004; Stevenson et al. 2009; Trommershauser et al. 2008), while providing normative predictions about how to integrate a variety of available information in a statistically optimal manner. When the framework of Bayesian inference is applied to our IT task situation, there are two types of information contributing to the distribution of Δ*t*. The first type is a probabilistic knowledge or belief about physical time intervals [*p*(Δ*T*)], called a “prior distribution,” which can be formed by previous experience or naive expectation. Without incoming sensory events, an observer may rely on the prior to make a guess about how long a Δ*T* is. When confronting a sensory input such as Δ*T* in our case, the initial guess based on the prior can be updated with additional information obtained from that sensory input [*p*(Δ*t*|Δ*T*)], called a “likelihood function.” Final decisions about time intervals across trials can be predicted from a posterior function, which is jointly determined by the prior and the likelihood [*p*(Δ*T*|Δ*t*) = *p*(Δ*t*|Δ*T*)*p*(Δ*T*)/*p*(Δ*t*)]. 
This leads to interesting predictions by the Bayesian model for the two timing error metrics of interest in our study. If we model the prior and the likelihood as Gaussian distributions, it can be shown analytically that the mean, or maximum, of the posterior (MAP; μ_{posterior}) is a weighted average of the means of the prior and the likelihood (μ_{prior}, μ_{likelihood}), with each mean weighted in proportion to the variance of its counterpart (σ_{likelihood}^{2}, σ_{prior}^{2}; *Eq. 7*). As a result, the mean of the posterior is predicted to fall between μ_{prior} and μ_{likelihood}. The variance of the posterior (σ_{posterior}^{2}), in turn, is the product of the variances of the prior and the likelihood divided by their sum (*Eq. 8*), and is consequently always smaller than the variance of the likelihood.
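The two analytical facts just stated reduce to a few lines of code. This sketch assumes Gaussian prior and likelihood as above; the numerical values are illustrative only:

```python
import numpy as np

def posterior(mu_prior, sd_prior, mu_lik, sd_lik):
    """Gaussian prior x Gaussian likelihood (the text's Eqs. 7 and 8):
    the posterior mean is a variance-weighted average of the two means,
    and the posterior variance is their product divided by their sum."""
    w = sd_lik**2 / (sd_prior**2 + sd_lik**2)        # weight on the prior mean
    mu_post = w * mu_prior + (1.0 - w) * mu_lik
    var_post = (sd_prior**2 * sd_lik**2) / (sd_prior**2 + sd_lik**2)
    return mu_post, np.sqrt(var_post)

# Example: prior centered at 1.32 s, likelihood centered at a longer 2.0 s
# interval. The posterior mean falls between the two, and the posterior
# standard deviation is smaller than that of the likelihood.
mu_post, sd_post = posterior(1.32, 0.4, 2.0, 0.3)
print(mu_post, sd_post)
```

Here mu_post lands between 1.32 and 2.0 and sd_post is below 0.3, the two properties the argument above relies on.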

Of particular relevance, the Bayesian model provides a parsimonious, analytical explanation for two systematic across-session trends in our data: *1*) the interval-nonspecific improvement in timing precision across sessions (Fig. 4*B*, Fig. 5*F*, Fig. 7*C*) and *2*) the increasing bias of μ(Δ*t*) toward the central Δ*T* across sessions (Fig. 2*A*, Fig. 3, Fig. 4*A*, Fig. 7*B*). Theoretically, *Eq. 8* indicates that the interval-nonspecific improvement in timing precision (σ_{posterior}) could be caused by a reduction of either σ_{prior} or the CV, which corresponds to σ_{likelihood} divided by μ_{likelihood}. However, a decrease in the CV alone is inconsistent with the progressively increasing bias of μ(Δ*t*) toward the center Δ*T* over sessions, because it predicts a shift of μ_{posterior} in the direction of μ_{likelihood} (black vs. gray arrows in Fig. 7*A*, *right*), opposite to what we observed (Fig. 3, Fig. 7*B*). By contrast, a progressive reduction of σ_{prior} predicts both the nonspecific reduction of σ(Δ*t*) and the biased pattern of μ(Δ*t*) during the time course of PL (black vs. gray arrows in Fig. 7*A*, *middle*).
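This analytical argument can be checked with a small simulation. All parameter values are hypothetical, and the scalar-property link σ_{likelihood} = CV · Δ*T* follows the model's assumption of a constant CV:

```python
import numpy as np

def posterior(mu_p, sd_p, mu_l, sd_l):
    # Gaussian fusion (the text's Eqs. 7 and 8).
    w = sd_l**2 / (sd_p**2 + sd_l**2)
    mu = w * mu_p + (1.0 - w) * mu_l
    sd = np.sqrt(sd_p**2 * sd_l**2 / (sd_p**2 + sd_l**2))
    return mu, sd

mu_prior, dT, cv = 1.32, 2.0, 0.24
sd_lik = cv * dT                      # scalar property: sd_likelihood = CV * dT

# Early vs. late sessions: only the prior narrows (0.6 -> 0.3).
mu_early, sd_early = posterior(mu_prior, 0.6, dT, sd_lik)
mu_late,  sd_late  = posterior(mu_prior, 0.3, dT, sd_lik)
# A narrower prior both increases the bias toward the prior mean and
# sharpens the posterior, matching the two observed trends together.
print(abs(mu_late - mu_prior) < abs(mu_early - mu_prior))  # more central bias
print(sd_late < sd_early)                                  # better precision

# Shrinking the CV instead (0.24 -> 0.12) pulls the posterior mean toward
# the veridical dT, i.e., the bias shrinks, opposite to the observed trend.
mu_cv, _ = posterior(mu_prior, 0.6, dT, 0.12 * dT)
print(abs(mu_cv - dT) < abs(mu_early - dT))
```

All three printed checks come out true, reproducing the contrast between the two arrows in Fig. 7*A*.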

To confirm empirically the analytical explanation laid out above, we developed several variants of nested Bayesian models, which differed in the number of free parameters and model assumptions, and fitted them to a data set of Δ*t* and Δ*T* from each daily session of each subject (Table 1; see methods for details). Here, we assumed that the μ_{likelihood} is the same as Δ*T* in a given trial, as in previous studies on the Bayesian modeling of human psychophysical data (Miyazaki et al. 2006; Stocker and Simoncelli 2006; Whiteley and Sahani 2008). The scalar property was also assumed, leading to the linear association between μ_{likelihood} and σ_{likelihood}. Figure 7, *B* and *C*, respectively, show the observed μ(Δ*t*) and σ(Δ*t*) over sessions, along with the model fits, μ_{posterior} and σ_{posterior}, for a representative subject (*S4*). We found that, in all of the subjects, the Bayesian model was successful in capturing the dynamics of the observed μ(Δ*t*) and σ(Δ*t*), including the two major trends of interest, the increasing bias of μ(Δ*t*) toward the central Δ*T* (Fig. 7*B*) and the interval-nonspecific improvement in timing precision (Fig. 7*C*). As analytically demonstrated in Fig. 7*A*, the fitted model parameters (σ_{prior} and CV) revealed that it is the width of the prior (rank correlation for across-subject averaged σ_{prior}, ρ = −0.94, *P* < 0.001), not that of the likelihood (rank correlation for across-subject averaged CV, ρ = 0.15, *P* = 0.68), that gradually decreased along the sessions (Fig. 7*D*).

## DISCUSSION

Using an implicit IT task, in which human subjects responded to an invisible bar drifting at a constant speed, we tracked daily changes in the distributions of timing responses for a broad range of physical time intervals both before and after feedback was given. We then estimated two major types of timing error from the mean and standard deviation of a given distribution, the former associated with timing accuracy and the latter with timing precision. The accuracy and precision estimates were dissociated from each other both in overall time course (Fig. 4) and in interval-specific effects of feedback (Fig. 5). Timing accuracy did not improve without feedback but improved around the feedback interval as soon as feedback was introduced. By contrast, effects of feedback on timing precision were hardly present: precision improved for most of the tested intervals slowly but steadily throughout the entire set of sessions regardless of feedback. The analysis of subjects' performance in the SD and ME tasks indicated that neither could explain the dynamics of timing behavior in our study. We found that a Bayesian observer model, in which a subjective time interval is determined jointly by a prior and a likelihood function, captures the dynamics of the two types of timing measures simultaneously. The model suggests that the width of the prior, not that of the likelihoods, gradually narrows over sessions, demonstrating the critical role of prior knowledge in PL of IT.

### Difference Between Previous Tasks and Our Task for Estimating Interval Timing

Owing to the intangible but ubiquitous nature of the time dimension in diverse perceptual or cognitive events (Ahrens and Sahani 2008), judgments about time can be readily complicated by the other sensory, motor, and executive functions (Buonomano et al. 2009). For example, classic timing tasks, in which subjects are asked to reproduce a sample Δ*T* or distinguish between two consecutively presented Δ*T*s, require active retention of the reference Δ*T* in working memory or motor planning and execution (Bartolo and Merchant 2009; Buhusi and Meck 2005; Buonomano et al. 2009; Gibbon 1977; Gibbon et al. 1997; Karmarkar and Buonomano 2003; Meegan et al. 2000; Miyazaki et al. 2005). In a temporal bisection (or generalization) task, subjects have to refer to long-term memory of the reference Δ*T* (Rakitin et al. 1998; Wearden 2008). Subjects often rely on associative memory among sensory or response events in tasks where Δ*T*s are simply conditioned to cues in other sensory domains (Janssen and Shadlen 2005; Leon and Shadlen 2003; Mita et al. 2009). Verbal estimation of Δ*T* is highly subject to cognitive bias with respect to our nomenclature for time intervals. Finally, if sensory processes carrying temporal information undergo dynamic changes, subjects might exploit those changes for estimation of Δ*T*s (Ahrens and Sahani 2011; Johnston et al. 2006; Kanai et al. 2006). We devised the IT task to reduce these potential confounds because improvements may arise in any of these nontiming domains while subjects are repeatedly performing the same task over multiple days. An implicit IT task like ours, which mimics a natural situation where a predator must keep track of its moving prey temporarily hidden in a cluttered visual scene to make a timed response afterward (Hulme and Zeki 2007; Shuwairi et al. 2007; Yi et al. 2008), obviates the need to explicitly encode the Δ*T* and to retrieve it later as in the previous studies. 
Our task allowed us to vary nontiming variables such as the direction, speed, and location of the moving bar and the length of the occlusion distance unpredictably on a trial-by-trial basis while preserving independent control of the main variable, the Δ*T*. In addition, the nine test Δ*T*s were shuffled randomly, or roved, within a block of trials throughout the entire set of sessions, so that subjects could not know explicitly which Δ*T* was tested on a given trial. All these manipulations made it unlikely that simple forms of associative conditioning contributed to the observed variations in the timing data. Moreover, because there was no change in physical stimuli during our IT task, timing performance in the task is hardly contaminated by information associated with transients in sensory input. In summary, our task was designed to minimize the potential confounding factors for PL in previous studies while enabling us to examine timing on a fine and broad timescale. On the other hand, a potential downside of our task is that any enhancement in speed estimation, motor planning, or compensation for delays in motor execution can affect timing performance in the IT task (Cardoso-Leite et al. 2009; Faisal and Wolpert 2009; Harris and Wolpert 1998; Law and Gold 2008). For this reason, it has been necessary to directly measure performance in motor execution during IT tasks, particularly when the motor system is likely to contribute to IT performance, as in the interval reproduction task (Cicchini et al. 2012; Jazayeri and Shadlen 2010). However, our analysis of the data from the control task trials (Fig. 6) showed that the contributions of speed perception and motor execution were too small to account for our timing data, although their roles in the IT task cannot be completely ruled out. It is left for future studies to examine whether our findings generalize to other types of IT tasks with explicit timing and different timescales (Coull and Nobre 2008; Merchant et al. 2008a; Rohenkohl et al. 2011; Zelaznik et al. 2002). We also note that tasks and stimuli similar to those in the present study have been used in several previous studies, albeit framed in different contexts such as motion extrapolation, selective attention, coincidence timing, and manual interception (Eskandar and Assad 1999; Linares et al. 2009; Merchant and Georgopoulos 2006; Miyazaki et al. 2005; Rohenkohl et al. 2011; White et al. 2008).

### Dissociations Between Timing Accuracy and Precision

Behavioral consequences of PL have been characterized by two descriptive statistics, a mean and a variance, the former indexing how close the overall mean of a response distribution (point of subjective equality, PSE) is to a reference stimulus (accuracy, also referred to as “constant error”) and the latter indexing how reliable responses are across trials (precision, sensitivity, discrimination threshold, or “temporal variance”) (McAuley and Miller 2007; Merchant et al. 2008b; Zarco et al. 2009). A few studies on visual perception reported feedback-induced rapid adjustments of decision criteria and relative inertia of sensitivity changes (Ahissar and Hochstein 1997; Berniker et al. 2010; Chalk et al. 2010; Herzog et al. 2006). To the best of our knowledge, however, this is the first study to report systematic dissociations between mean accuracy and precision of IT in the two major aspects of PL, time course and specificity. A few earlier studies on IT have reported simultaneous improvements in both accuracy and precision (Meegan et al. 2000) as well as differences between the two measures by changes in interstimulus interval (Buonomano et al. 2009), by adaptation effects of visual flickers (Johnston et al. 2006), or by pharmacological effects in relation to the internal clock and reference memory (Buhusi and Meck 2005; Coull et al. 2010). However, none of those studies identified systematic and clear dissociations in overall temporal dynamics and specificity of PL in the two timing errors.

The decoupled dynamics of the two timing metrics demonstrate that it can be misleading to characterize PL effects with only one of them. For example, interpreted alone, the steady reduction in timing variability in our data (Fig. 4*B*) suggests that prolonged practice fundamentally enhanced the perceptual sensitivity of IT over sessions. When considered together with the fact that μ(Δ*t*) at the short and long Δ*T*s deviated substantially from the physical Δ*T*s (Fig. 2*A*, Fig. 5*B*), however, the reduced variability at those Δ*T*s is actually quite detrimental in a real-world situation, because veridical targets are missed in a highly reliable manner at the short and long Δ*T*s. Our findings underscore the importance of assessing effects of PL by measuring the two error metrics simultaneously and interpreting them jointly in a coherent manner (Buonomano et al. 2009; Gold et al. 2010; Johnston et al. 2006; Meegan et al. 2000; Stetson et al. 2006).

In contrast with the slow and steady changes in timing precision, we found that feedback calibrated mean accuracy quickly, within a few dozen trials. One potential source of this immediate impact of feedback on accuracy is the parametric nature of the feedback in our task: unlike other studies, in which feedback was typically categorical, our feedback was delivered quantitatively, on a fine scale, on a trial-by-trial basis. It is interesting to consider the different temporal dynamics of accuracy and precision in the framework of the reverse hierarchy theory of PL (Ahissar and Hochstein 1997). According to this view, in hierarchical perceptual systems in which the initial feedforward sweep is followed by feedback processing with fine scrutiny in the reverse order (Ahissar et al. 2009; Hegde 2008), PL occurs in a top-down manner, first at a higher level relatively easily and then at a lower level for a more difficult task. Our data are consistent with this view in that the calibration of timing accuracy requires no more than a single >1-h session, whereas the calibration of precision occurs slowly over multiple daily sessions (Berniker et al. 2010). One intriguing interpretation is that the neural mechanisms involved in timing accuracy and precision differ, the former located at a relatively high level of the hierarchy with a high degree of malleability to learning and the latter at a relatively low level with rigidity to learning (Summerfield and Koechlin 2008). It will be interesting to use electrophysiological or brain imaging techniques to learn whether neural signatures for calibrations of timing accuracy and precision reside at different levels of the cortical processing hierarchy.

### Bayesian Inferences in Interval Timing

In contrast to an interpretation based on two heterogeneous mechanisms, our Bayesian model provides a parsimonious and coherent way of interpreting the seemingly dissociative dynamics of timing accuracy and precision in our data, at both conceptual and empirical levels. The collectively fluctuating patterns of μ(Δ*t*) over daily sessions are well predicted by the Bayesian model fitted to the individual subjects' data (Fig. 7). The impact of repeated performance of our timing task is unlikely to manifest itself as a change in the precision of sensory measurements of Δ*T* (i.e., the reliability of the likelihood functions), because such a change predicts an accuracy bias in the direction opposite to the one observed (Fig. 7*A*). Instead, the internal representation of the prior distribution was calibrated during multiple days of extensive training: the gradual reduction of σ_{prior} succinctly captures the across-session dynamics of timing accuracy and precision simultaneously (Zhaoping and Jingling 2008).

The pattern of perceived time intervals being attracted toward the center of the interval range has also been reported in previous studies of IT (Fortin et al. 2009; Jones and McAuley 2005; Kanai et al. 2006; Lewis and Miall 2009; McAuley and Miller 2007; Penney et al. 2008; Wearden 2008; Zarco et al. 2009). Despite its frequency, this “regression to the mean” has either been overlooked (Bartolo and Merchant 2009; Zarco et al. 2009) or received diverse explanations based on, for example, “central tendency” (Lewis and Miall 2009), Vierordt's law (Fortin et al. 2009; Tse et al. 2004), context-induced biases (Jones and McAuley 2005; Kanai et al. 2006; McAuley and Miller 2007), or abnormal timing in dopamine-deficient patients (Malapani 2002; Malapani et al. 1998). The Bayesian model argues that this “migration” effect is a consequence of the strong contribution of prior distributions to IT, as suggested by recent modeling work on time perception (Cicchini et al. 2012; Jazayeri and Shadlen 2010). Specifically, the model predicts that the degree of bias is determined by the width of the prior distribution relative to that of the likelihood (*Eq. 7*). When the scalar property is taken into account, the model also offers quantitative predictions about how both the size of the migration and the variability vary systematically as a function of Δ*T*. The scalar property dictates that the longer the time interval, the larger the variability of IT (σ_{likelihood}). Accordingly, the longer the time interval, the greater the relative contribution of the prior, which leads to larger migration toward μ_{prior} because the weight given to μ_{likelihood} is inversely proportional to σ_{likelihood} (*Eq. 7*; Fig. 7*A*). By the same token, the reduction in variability will be greater for longer Δ*T*s because the weight given to σ_{likelihood} is inversely proportional to σ_{likelihood} itself (*Eq. 8*; Fig. 7*C*). 
All these detailed model predictions about migration and variability reduction match the observed data (Fig. 7).
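These qualitative predictions follow directly from the precision-weighted form of the posterior and can be reproduced with a minimal numerical sketch (the function below, the Weber fraction of 0.2, and the prior parameters are illustrative assumptions, not the values fitted to our data):

```python
import numpy as np

def posterior(t, mu_prior, sd_prior, weber=0.2):
    """Gaussian-conjugate sketch of the observer: the posterior mean is a
    precision-weighted average of the measurement and the prior mean (Eq. 7
    form), and the posterior SD is smaller than either component (Eq. 8
    form). The scalar property enters as a likelihood SD proportional to t."""
    sd_lik = weber * t                                # scalar property
    w_lik = sd_prior**2 / (sd_prior**2 + sd_lik**2)   # weight on the measurement
    mu_post = w_lik * t + (1 - w_lik) * mu_prior      # migration toward mu_prior
    sd_post = np.sqrt(sd_prior**2 * sd_lik**2 / (sd_prior**2 + sd_lik**2))
    return mu_post, sd_post

# Longer intervals -> noisier likelihood -> stronger pull toward the prior
# mean and a larger reduction of variability relative to the likelihood alone.
mu_p, sd_p = 1.5, 0.6  # hypothetical prior, in seconds
for t in (0.5, 1.5, 4.0):
    mu, sd = posterior(t, mu_p, sd_p)
    print(f"dT={t:.1f}s  bias={mu - t:+.3f}s  sd_post={sd:.3f}s  sd_lik={0.2 * t:.3f}s")
```

With these settings the bias is small for the short interval, zero at the prior mean, and large (toward μ_{prior}) for the long interval, while the posterior SD falls increasingly below the likelihood SD as the interval lengthens.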

Previous studies interpreted the reduction in timing variability as evidence for an enhanced sensory representation of Δ*T* via noise reduction (Bartolo and Merchant 2009; Buonomano et al. 2009; Karmarkar and Buonomano 2003). However, our Bayesian modeling results provide an alternative explanation by suggesting that the decreased timing variability in those studies could arise from a prior distribution sharpened around the test Δ*T*s to which subjects were overexposed through massive training or procedural learning. In the same vein, an alternative view can also be offered for the origin of the transfer in PL of IT across different sensory channels (Buonomano 2003; Westheimer 1999; Wright et al. 1997) or modalities (Meegan et al. 2000; Nagarajan et al. 1998). The major source of the generalization of PL effects could be the update of prior distributions, rather than an enhancement of a sensory mechanism for IT common to different modalities, because the impact of updated priors can be manifested in time intervals carried by different sensory channels or modalities. The previous studies on IT might not have been able to observe the contribution of priors, probably because the limited number of test intervals and the wide separation between those intervals made it difficult to distinguish the prior distribution from the likelihood.

What could have caused the narrowing of the prior during PL? In general, reducing the overall variability of perception is advantageous for achieving and maintaining accurate perceptual representations of sensory events in an uncertain environment, because it helps the adaptive perceptual system to be sensitive to subtle changes in stimulus statistics and accordingly to recalibrate sensory representations by minimizing the variance of errors (Burge et al. 2010). Thus it is conceivable that observers sequentially update their belief about Δ*T* by using the current posterior distribution, which has already been sharpened, as a prior for the future. We tested this “sequential updating of prior” hypothesis by developing a simple generative model that uses the Δ*t* distribution in the current session as an empirical prior for the next session. However, this generative model failed to predict the complex and noisy time courses of the observed timing accuracy and precision. One likely reason for this failure is overfitting: the single preceding daily session could be insufficient to estimate the prior reliably. More sophisticated variants of the generative model, such as those that update priors by merging many previous daily sessions with different weights or those that update priors based on internal dynamics and feedback through reinforcement learning mechanisms (Gold et al. 2008; Nassar et al. 2010; Shibata et al. 2009), may be needed to capture the intriguing dynamics found in our study.
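The “sequential updating of prior” hypothesis can be sketched as a toy simulation in which each session's response distribution, summarized as a Gaussian, serves as the empirical prior for the next session (all parameters are illustrative; this is not the exact generative model we fitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def run_session(intervals, mu_prior, sd_prior, weber=0.2, motor_noise=0.05):
    """One simulated session: Bayesian estimates of each interval, with the
    scalar property in the likelihood, plus a small amount of response
    noise (all parameter values are illustrative)."""
    sd_lik = weber * intervals
    w_lik = sd_prior**2 / (sd_prior**2 + sd_lik**2)
    estimates = w_lik * intervals + (1 - w_lik) * mu_prior
    return estimates + rng.normal(0.0, motor_noise, size=intervals.size)

# Sequential updating: the response distribution of session n is summarized
# as a Gaussian and reused as the empirical prior of session n + 1.
intervals = rng.uniform(0.4, 4.0, size=200)  # hypothetical range of dTs (s)
mu_p, sd_p = 2.0, 1.0                        # hypothetical initial prior
for session in range(5):
    responses = run_session(intervals, mu_p, sd_p)
    mu_p, sd_p = responses.mean(), responses.std()
    print(f"session {session}: prior mu={mu_p:.3f}s, sd={sd_p:.3f}s")
```

Under these idealized settings the empirical prior narrows monotonically across sessions; the observed time courses were considerably noisier, which is one reason the simple one-session-back generative model failed.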

The results of our Bayesian modeling indicate that, despite individual differences in our data that were reflected in highly flexible and idiosyncratic priors across individual subjects, the contribution of prior distributions provides a simple yet powerful explanation for the multiple aspects of our results simultaneously. It is possible that the high degree of uncertainty in our IT task augmented the influence of prior knowledge. Note that subjects had to judge a wide set of Δ*T*s ranging from hundreds of milliseconds to several seconds, the order of which was completely randomized on a trial-to-trial basis, without any predictive cue for certain Δ*T*s. In this circumstance with high uncertainty, the brain may need to rely heavily on prior knowledge (Hudson et al. 2008; Körding and Wolpert 2004; Lewis and Miall 2009; Zhaoping and Jingling 2008). From a functional viewpoint, insurmountable demands for timing are also quite evident in virtually all sensory, motor, and cognitive tasks, not to mention various direct timing tasks such as IT, temporal order judgment, rhythmic responses of temporal sequences, and causality inference (Ahrens and Sahani 2008; Coull and Nobre 2008; Karmarkar and Buonomano 2007). Thus the adaptive construction of the prior would be advantageous as well as efficient given the necessity of the mental clock to be calibrated incessantly to the dynamic environment and our limited capacity to deal with the virtually infinite range of Δ*T*s.

### Assumption of Gaussian Prior Distributions

Despite its capability to describe our data parsimoniously and to offer interesting explanations, our Bayesian observer model of IT has a couple of limitations that need to be addressed carefully. The most crucial of these is the assumption about prior distributions. In previous studies, the prior was either reconstructed empirically (Brenner et al. 2008; Körding and Wolpert 2004; Stocker and Simoncelli 2006) or often presumed identical to the physical distribution of stimuli (Jazayeri and Shadlen 2010; Körding and Wolpert 2004; Miyazaki et al. 2006; Whiteley and Sahani 2008). Although the test intervals in our study were actually drawn from a uniform distribution on a log scale, we opted to model the prior distribution with a Gaussian function for several reasons. Adopting the physical stimulus distribution as a Bayesian prior may be inappropriate before subjects have been exposed to the entire range of stimuli over an extended period of training. In the present study, the purpose of which was to track how effects of learning on IT evolve over multiple days, we began to collect data after providing subjects with minimal training (only 15 trials at the beginning of the first-day session). Another reason for preferring the Gaussian prior to the physical prior was that, in the process of being internalized in the brain, the physical prior is likely to be contaminated by internal noise (Cicchini et al. 2012). This would result in a prior whose distribution is smoother than the physical prior, particularly around its boundaries. The smoothing of the internal prior would be even more pronounced in our situation because IT responses are known to have relatively large variability and to be prone to perceptual illusions and recalibration (Burr et al. 2007; Eagleman 2008; Johnston et al. 2006; Terao et al. 2008). 
Another problem with the physical prior is that it does not allow posterior estimates to fall outside the prior distribution, because the prior probability outside the range of Δ*T*s is zero. This conflicts with the data points beyond the range of the stimulus distribution, which have also been readily observed in previous studies (Ahrens and Sahani 2008; Jazayeri and Shadlen 2010; Merchant et al. 2008b; Zarco et al. 2009). A recent study (Jazayeri and Shadlen 2010), in which the physical distribution of test stimuli was used as a Bayesian prior, had to introduce an additional noise distribution with a scalar property at the motor execution stage to get around this problem. Finally, it should be stressed that our major conclusion about the relative contributions of the prior and likelihood over the time course of PL does not rely critically on the detailed shape of the prior. The steady and monotonic reduction of the width of the prior, not of the likelihood, is the crucial feature required for the Bayesian model to capture the dynamics of both timing accuracy and precision simultaneously.
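The boundary problem with a bounded physical prior can be made concrete with a grid-based sketch (the grid, Weber fraction, and prior parameters below are arbitrary choices for illustration):

```python
import numpy as np

def posterior_mean(t_grid, prior, measurement, weber=0.2):
    """Posterior mean on a grid, for a Gaussian likelihood whose SD scales
    with the measured interval (scalar property); parameters illustrative."""
    sd_lik = weber * measurement
    likelihood = np.exp(-0.5 * ((t_grid - measurement) / sd_lik) ** 2)
    post = prior * likelihood
    post /= post.sum()                       # normalize on the grid
    return float((t_grid * post).sum())

t = np.linspace(0.01, 10.0, 5000)
physical = ((t >= 1.0) & (t <= 3.0)).astype(float)  # bounded "physical" prior
gaussian = np.exp(-0.5 * ((t - 2.0) / 1.0) ** 2)    # smooth Gaussian prior

# For a measurement beyond the trained range (here 5 s), the bounded prior
# clips the estimate into [1, 3] s, whereas the Gaussian prior allows
# estimates outside the range, as observed in the data.
print(posterior_mean(t, physical, 5.0))
print(posterior_mean(t, gaussian, 5.0))
```

The bounded prior thus necessarily confines posterior estimates to the stimulus range, whereas the smooth Gaussian prior does not, which is the qualitative behavior our data require.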

### An Interpretation of Feedback Effects

In our data, a significant impact of feedback was observed only in mean timing accuracy, especially right after feedback was first introduced. Meanwhile, the reduction in timing variability was present even before the introduction of feedback and was hardly influenced by it thereafter (Fig. 4*B*, Fig. 5*F*). In our Bayesian modeling framework, this dissociation between the feedback effect and the timing variability data implies that the width of the Bayesian prior is to some extent independent of the effect of feedback (Fig. 7*D*). On the other hand, although an immediate effect of feedback was not observed in μ_{prior}, the introduction of feedback made the μ_{prior} estimated from different subjects converge to the central Δ*T*, particularly from the first SF session (data not shown). We speculate that the feedback might have played a role in “anchoring” the prior near the veridical central Δ*T*, as hinted by the convergence of μ_{prior} to the central Δ*T* over the SF sessions.

As argued above, our modeling results suggest that the contraction of the Bayesian prior distribution is responsible for both the nonspecific reduction of timing variability and the migration of μ(Δ*t*) toward the center. How is this process distinguished from the “anchoring” effect of the feedback? Note that it is difficult to disentangle the feedback effect from the growing biases toward the central Δ*T*, because we provided feedback at the center of the sampled Δ*T* range. Nevertheless, one possible interpretation is that the contraction of the prior width builds up steadily as an unsupervised learning process via intensive training, whereas the location of the prior center can be quickly corrected by feedback as a top-down process. This interpretation is consistent with a line of studies that reported immediate impacts of a variety of feedback manipulations (Lustig and Meck 2005), including even “fake or incorrect feedback,” in visual PL (Shibata et al. 2009) and suggested an important role of feedback as top-down processing in PL (Sasaki et al. 2010; Zhang et al. 2008).

The Bayesian model does not explicitly specify how error feedback updates the prior distribution or how the selective and immediate effect of feedback on mean accuracy occurs at a mechanistic level. Without doubt, elucidating the exact role of feedback in PL of IT and its relation to the evolving prior requires further studies, especially in the context of Bayesian modeling (Kersten et al. 2004; Shibata et al. 2009; Stevenson et al. 2009). For instance, by giving feedback at the shortest or longest Δ*T*, or by using fake feedback as in previous studies (Herzog 1997; Shibata et al. 2009), we may learn more about the specific contributions of feedback to PL of IT.

### Implications for Neural Mechanisms for Interval Timing

The neural basis of IT has recently become of great interest (Ahrens and Sahani 2008; Coull et al. 2004; Wittmann and van Wassenhove 2009). The present study can provide information that constrains neural models of IT by indicating which models or theories are more compatible with our data. First, the dissociation of timing accuracy and precision is not expected under the view that the neural machinery for IT is a single centralized clock (Gibbon et al. 1997; Rakitin et al. 1998). If there were one centralized clock, or a tight coupling between individual timers across different timescales, the trend of μ(Δ*t*) shifting toward the central Δ*T* (the coexistence of under- and overestimation) would not have been observed. Second, although it has been presumed that the different functional requirements of subsecond and suprasecond timing predict distinct underlying mechanisms for the two temporal ranges (Buhusi and Meck 2005; Eagleman et al. 2005; Karmarkar and Buonomano 2007), the absence of heterogeneous regimes in the smooth transfer curve in our data argues against the view of a strict boundary between millisecond and second timing (Grondin 2010b; Lee et al. 2007). We conjecture that the timescale-based classification might arise when subjects perform “explicit” timing tasks or use different strategies in the two regimes (Coull and Nobre 2008; Merchant et al. 2008a; Rohenkohl et al. 2011; Zelaznik et al. 2002), for example, chronometric counting in IT with long durations (Lewis and Miall 2003). Finally, one of the most important implications of our findings for neural mechanisms of IT is that any successful neural model must capture the decoupled dynamics of the two error metrics of IT. Our modeling exercise suggests that a neural mechanism of IT is likely to implement Bayesian inference as a core feature of its functional architecture in order to incorporate the adaptive changes in timing performance.

## GRANTS

This research was supported by the WCU program through the National Research Foundation of Korea funded by the Ministry of Education, Science, and Technology (R31-10089) and by the Seoul Science Fellowship to H. Sohn.

## DISCLOSURES

No conflicts of interest, financial or otherwise, are declared by the author(s).

## AUTHOR CONTRIBUTIONS

Author contributions: H.S. and S.-H.L. conception and design of research; H.S. performed experiments; H.S. analyzed data; H.S. and S.-H.L. interpreted results of experiments; H.S. and S.-H.L. prepared figures; H.S. and S.-H.L. drafted manuscript; H.S. and S.-H.L. edited and revised manuscript; S.-H.L. approved final version of manuscript.

## ACKNOWLEDGMENTS

The authors thank Randolph Blake, Daeyeol Lee, Marcus Kaiser, and three anonymous reviewers for constructive comments on the earlier versions of this manuscript and useful discussions.

- Copyright © 2013 the American Physiological Society