reply: We thank Dr. Ninio for his interest in our 25-yr-old publications and for his attempt to analyze our data and interpretations using a strategy different from the one we used for this pioneering work. However, his approach is exclusively concerned with curve fitting. It avoids the more demanding problem of the recovery of parameters of explicit statistical models of biological processes. That is, he decided to discriminate between fits of our published histograms visually, or “by eye.” The field long ago progressed to the point where the standard for confronting data with models is statistical analysis, not visual fitting.

Reevaluation of old methods can be a good thing (Lowen et al. 1997), provided there is a sound scientific question at stake. Yet we are surprised that Dr. Ninio raises concerns about the validity of aspects of these seminal papers but does not review the subsequent literature that appeared about our results and conclusions. This is difficult to understand because the aim of a meaningful scientific approach is to progress in the understanding of natural phenomena and processes. In this respect, it is worth noting that subsequent research carried out independently by a number of eminent scientists supported our proposal of the “One Vesicle Hypothesis” of transmitter release at several synapses in the CNS.

In this response we address the flaws in the logic of Ninio's analysis and his criticisms and misunderstandings of our published work. Because of the progress in this field, and given that Dr. Ninio is not an expert in this area, we believe our response here will be of interest to other investigators who are concerned with using rigorous methods to extract information from experimental distributions.

## BACKGROUND

### Experimental context

We performed simultaneous intracellular recordings and dye injections, in vivo, to collect data sets of fluctuating inhibitory postsynaptic potentials (IPSPs) evoked in the Mauthner cell of teleost fish by single presynaptic action potentials and used an optimization procedure to determine the best-fitting binomial and Poisson descriptions of the data. We then compared those results with histological reconstructions of the presynaptic cells, including the axonal branching pattern and the identification of contacts with the postsynaptic neuron. This was one of the earliest attempts to obtain such structure–function correlations. We were particularly interested in two questions that had been raised previously: *1*) Were the fluctuations in PSP amplitude better described by a binomial or a Poisson distribution? and *2*) Could the statistical parameters be correlated with physical structures?

McLachlan (1978) noted that the binomial case offers the possibility of identifying physical correlates for two release parameters: *1*) n, the number of available release units (first believed to be large since there are numerous vesicles in nerve terminals; del Castillo and Katz 1954), and *2*) p, the release probability for each unit because these two parameters are explicitly defined in this formulation. Some investigators had suggested that the binomial model might be better in preparations where the release probability was not artificially low, in contrast to the experiments of Katz and colleagues, who lowered extracellular Ca^{2+} and used a Poisson model.

We found that for most cells, a binomial was statistically preferable to a Poisson distribution and that n was close to the number of presynaptic boutons, or release sites [Korn et al. 1981, 1982 (hereafter referred to as KMTF); Triller and Korn 1982]. Thus we proposed the so-called One Vesicle Hypothesis. It has received support from studies in several central structures (Faber et al. 1996). Some recently published articles demonstrate a similar correspondence between values of histological and binomial n, based on analyses of electron microscopy of synaptic contacts and response fluctuations in the hippocampus and cortex (Biro et al. 2005; Silver et al. 2003). At other connections, there is a discrepancy between the two measures (Biro et al. 2006), suggesting heterogeneity of evoked release.

##### STATISTICAL PROCEDURE.

The analytical method we used derives from the “theory of modeling” (Norton 1986; Richalet et al. 1971) and includes several steps. The first step is to specify the mathematical structure of the model and define its parameters. The second step is to identify parameter values representative of the experimental data, based on a criterion that quantifies the adequacy, or lack of adequacy, of the model relative to the data. This criterion is then optimized, to discover the most probable values of the parameters. Without judging the statistical characteristics of the estimates, this procedure provides an optimal set of values of the model parameters, according to the defined criterion.

We used the Likelihood Criterion, which is recognized (Edwards 1992) as a major advance in theoretically unifying all data, including the most uncertain ones. This is the case because it takes into account the experimental errors associated with all data collection. The Likelihood Criterion measures the probability that the available data would be observed, given the formal structure of the model being tested and the value of its parameters.
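As a purely illustrative sketch of this criterion (the Gaussian densities, sample values, and function names below are ours, not taken from our original analysis), the log-likelihood of a data set under a candidate density can be computed point by point, without binning:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_likelihood(data, pdf):
    """Sum of log densities over every individual data point (no binning)."""
    return sum(math.log(pdf(x)) for x in data)

# A hypothetical sample and two candidate models differing only in their mean
sample = [0.1, -0.4, 0.3, 1.2, -0.2]
ll_model_a = log_likelihood(sample, lambda x: gaussian_pdf(x, 0.0, 1.0))
ll_model_b = log_likelihood(sample, lambda x: gaussian_pdf(x, 2.0, 1.0))

# Under the Likelihood Criterion, the model with the higher log-likelihood
# is the better account of the data
assert ll_model_a > ll_model_b
```

Because every data point contributes its own term to the sum, the criterion is sensitive to sample size, a property that visual comparison of curves lacks.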

The reanalysis presented by Ninio does not follow these aspects of our technique. His surprise that our method leads to precise values for optimal parameters might have been assuaged if he had used a similar approach. Even so, his reliance on visual comparisons resulted in only small, insignificant differences in values of p (0 to 8%) and q (2.5 to 15%).

### An alternative analytical approach?

Ninio asks whether *1*) “the published curves were what they were supposed to be” and *2*) “other binomial curves for different values of n could fit equally well the experimental histograms.” In other words, he questions whether the published curves are drawn properly and are the same as stated in our results. Then he asks whether they adequately fit the histograms that represent the data. This approach is inappropriate, for at least three reasons.

First, it ignores that, although the histograms are indeed representations of the data, all of our results consisted of an optimization of a full data set according to the Likelihood Criterion, to find a best-fitting binomial or Poisson description. That is, the optimization was done on a point-by-point basis, not by binning the data.

Second, working with enlarged photocopies can introduce some errors in visual estimates of parametric values, such as quantal size. This is especially possible because, although the histograms were computer drawn, the graphing program did not label the axes. That was done by hand after the analysis was complete.

Third, the alternative analyses depend completely on subjective visual criteria. This is insufficient. It leads to conclusions that alternative curves identified are “at least as good as” or “better than” the parameter sets we obtained. Thus qualitative assessments of second-order representations of the data are used to criticize results obtained on the basis of an independent algorithm that took into consideration every individual data point. Given this inadequate methodology, we see no justification for the doubts raised.

### Comparison of statistical analysis and visual curve fitting

Ninio argues that the alternative fits for our histograms prove conclusively that our published solutions are not unique representations of the data. Yet, in the context of the analytical method we used, our articles did not claim that the fits were unique. They stated that they were optimal parameter sets defined by the Likelihood Criterion, corresponding to a local minimum. Even after 25 years we would contrast our model-based statistical approach with the terms “just as well approached” and “reasonably good simulation” used by Ninio.

##### THE LIKELIHOOD CRITERION.

The letter asks how different models, which he considers as equivalent (or “as good as”), could be distinguished and ordered with respect to their ability to match the data. Our approach was to compare the corresponding likelihoods of the experimental results, using all of the synaptic potentials instead of binning the distributions into a smaller number of categories. This increases the sensitivity of comparisons of sets of parameter values describing an experiment and in turn allows a statistically based choice between them. This fundamental feature—which played a major role in our work—is ignored in the alternative analyses. Neither the shape of the histogram nor that of the model is sensitive to the size of the sample. We find that the alternative framework in which additional data provide no extra information is an unusual paradigm.

The following examples illustrate that the optimal parameter set derived using the likelihood criterion can be far from the one obtained by visual inspection of curves.

###### First example.

We assume that the statistical model generating the data is a mixture of two Gaussians, one with zero mean and unit variance, with weight 0.99, the other with mean equal to 10 and unit variance and weight 0.01, expressed as

f(x) = 0.99 (2π)^{−1/2} e^{−x^{2}/2} + 0.01 (2π)^{−1/2} e^{−(x − 10)^{2}/2} (1)

Several samples of size 100 were drawn from this distribution. The log-likelihoods of these samples were computed under two slightly different models: either the nominal model given by *Eq. 1*, or an erroneous model in which the second component was omitted, thus given by

g(x) = (2π)^{−1/2} e^{−x^{2}/2} (2)

The log-likelihoods under models (1) and (2) are expressed as ∑ log f(x_{i}) and ∑ log g(x_{i}), respectively, where x_{1}, …, x_{i}, … represent the values constituting a data sample and the summations are over i.

The corresponding averages of the log-likelihoods across the samples were, respectively, −147.8 and −170.8 (natural logarithms).

It is known from likelihood ratio test theory that twice the difference between the log-likelihoods computed under two nested models is, under the null hypothesis that the simpler model is adequate, distributed as a chi-square with a number of degrees of freedom equal to the difference in dimensionality between the two models. In other words, a difference between log-likelihoods of a few units (typically 3.84/2 = 1.92 for one extra parameter at the 5% level) indicates that the model with the higher log-likelihood is indeed the better one to account for the available data, even if it involves a higher complexity (i.e., has more parameters). With this test in mind, we see that model (1) outperforms model (2), with a mean difference in log-likelihoods close to 23, a figure rarely encountered in practice! In 70% of the simulations performed, model (1) was preferred to model (2).

However, if we look at the corresponding underlying distributions (see Fig. 1), we discover two very close curves: this illustrates that the likelihood metric is a very different distance from any functional distance or visual separation of the two curves. The power of this metric is that it provides a precise basis for testing the ability of a statistical-mathematical model to fit sampled data. Other distances are just inappropriate, and statements by Ninio that two models represent the results equally well because they are close by eye are simply and deeply wrong.
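This simulation is straightforward to reproduce in outline. The sketch below is our reconstruction, not the original code; the random seed and the number of replications are arbitrary choices. It draws samples of size 100 from the mixture of model (1) and compares the log-likelihoods under models (1) and (2):

```python
import math
import random

random.seed(0)

def phi(x, mu=0.0):
    """Unit-variance Gaussian density centered on mu."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2.0 * math.pi)

def f(x):
    """Nominal model (1): 0.99 N(0, 1) + 0.01 N(10, 1)."""
    return 0.99 * phi(x) + 0.01 * phi(x, 10.0)

def g(x):
    """Erroneous model (2): the second component omitted."""
    return phi(x)

def draw_sample(n=100):
    """Sample n points from the mixture: each point is an outlier
    (mean 10) with probability 0.01, otherwise standard Gaussian."""
    return [random.gauss(10.0 if random.random() < 0.01 else 0.0, 1.0)
            for _ in range(n)]

wins, diffs = 0, []
for _ in range(200):
    xs = draw_sample()
    ll_f = sum(math.log(f(x)) for x in xs)
    ll_g = sum(math.log(g(x)) for x in xs)
    diffs.append(ll_f - ll_g)
    if ll_f > ll_g:
        wins += 1
```

In our runs of this sketch, model (1) is preferred in the majority of replications, and the average log-likelihood gap far exceeds the 1.92 threshold of the likelihood ratio test, despite the near-identity of the two density curves.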

###### Second example.

We consider an example much closer to the type of experiment we dealt with in our work. We assume again that we have two candidate models: in the first, termed M1, we assume n = 8 release units, a probability of release p = 0.4, q = 1 (arbitrary units), and σ = 0.4 (same units). In the second, termed M2, n = 6, p = 0.533, and q and σ are as in M1. The corresponding probability density functions are displayed in Fig. 2. Although not perfectly superimposed, these two curves are quite close, as similar as two histograms based on the same data set but drawn using different bin sizes might well be (see following text). Using an approach similar to that of the first example, for each of these models we computed the likelihood of a set of 300 data points generated under model M1; this data set thus mimics the results of a so-called experiment.

We found that for 24 of 25 simulated data sets, the right model had a higher likelihood. Furthermore, the mean difference between the log-likelihoods was about 9.6, an extremely high value by the standards described earlier. Thus we conclude that the use of a likelihood criterion permits a clear discrimination between two similar models.
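A sketch of this computation follows. It is again our reconstruction rather than the original code, under the standard assumption of Gaussian recording noise of SD σ about each quantal level; the seed and replication counts are arbitrary:

```python
import math
import random
from math import comb, exp, log, pi, sqrt

random.seed(1)

def quantal_pdf(x, n, p, q=1.0, sigma=0.4):
    """Density of a binomial-quantal model: k ~ Binomial(n, p) quanta
    of size q, blurred by Gaussian noise of SD sigma (an assumption)."""
    total = 0.0
    for k in range(n + 1):
        weight = comb(n, k) * p**k * (1.0 - p)**(n - k)
        total += weight * exp(-0.5 * ((x - k * q) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))
    return total

def simulate(n, p, q=1.0, sigma=0.4, size=300):
    """Draw 'size' response amplitudes under the binomial-quantal model."""
    out = []
    for _ in range(size):
        k = sum(random.random() < p for _ in range(n))  # released quanta
        out.append(random.gauss(k * q, sigma))
    return out

correct, gaps = 0, []
for _ in range(25):
    data = simulate(n=8, p=0.4)  # a mock "experiment" generated under M1
    ll_m1 = sum(log(quantal_pdf(x, 8, 0.4)) for x in data)
    ll_m2 = sum(log(quantal_pdf(x, 6, 0.533)) for x in data)
    gaps.append(ll_m1 - ll_m2)
    correct += ll_m1 > ll_m2
```

Although the two density curves are nearly superimposed, the likelihoods computed over all 300 individual points separate the models reliably in this sketch, as in our original simulations.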

###### Misuse of histograms.

Models M1 and M2 are drawn in Fig. 3, along with histograms of the same data set (300 IPSP measurements) computed with two bin sizes: either the size of one quantum (rough histogram) or a quarter of one quantum (refined histogram). Visually, the model judged closest to the data might differ depending on which graphic representation is used, and we probably would not arrive at a unanimous consensus on the preferred model. Yet the data are unique and were generated under model M1. The problem can be solved only statistically, not by visual guesswork.

##### MISUSE OF THE KOLMOGOROV STATISTIC.

We contend that Ninio's use of the Kolmogorov statistic is incorrect. This statistic is an interesting metric that can be used to quantify the “distance” between a sample and a theoretical distribution and thereby to reject the hypothesis that the sample originates from that distribution (the null hypothesis). Using this statistic as a test with a nominal significance level of 5% requires that the theoretical distribution has not been optimized with respect to any parameter in relation to the data. Otherwise, the test will be conservative, which further argues for using it only to reject the null hypothesis.

We used the Kolmogorov metric with these considerations in mind; specifically we employed it to reject some of the models previously selected on the basis of the likelihood (see above). Most of the rejected models were those based on the Poisson hypothesis, which allowed us to conclude that the binomial hypothesis was more consistent with our data.

The Kolmogorov statistic cannot be used to qualify a model for these reasons. Furthermore, it relies on functional discrepancies between cumulative distributions and has a limited ability to detect departures from a given mathematical model. In other words, in the present framework, employing the Kolmogorov statistic provides a powerless test, one that is inappropriate for selecting between candidate models or sets of parameter values: many potential solutions that are very unlikely according to the likelihood criterion will not be discarded by the Kolmogorov “test.” Returning to example 1, cited earlier, we found that this test allowed rejection of the wrong model, i.e., model (2), in only 30% of the simulations. The power is even lower, as expected, when the optimization step is skipped: it then decreases to 6%, the reference distribution being a standard Gaussian. Again, this test can be used only to reject already selected models. In our papers we first selected the best binomial or Poisson model according to the likelihood of the data, then checked whether it was rejected by this low-power test. Indeed, because this was a secondary test, we included in our summary results from experiments that it rejected (KMTF 1982). A more comprehensive theory would permit testing for the optimal model within the framework of likelihood theory; we are not aware that such a theory is available.
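The low power in the unoptimized case can be illustrated with a short simulation. This is our sketch, not Ninio's or our original code; the critical value 1.358/√n is the usual asymptotic 5% approximation. Samples are drawn from the mixture of example 1 and tested against a plain standard Gaussian:

```python
import math
import random

random.seed(2)

def norm_cdf(x):
    """Cumulative distribution of a standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ks_stat(sample, cdf):
    """One-sample Kolmogorov statistic: the largest gap between the
    empirical and theoretical cumulative distributions."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        c = cdf(x)
        d = max(d, (i + 1) / n - c, c - i / n)
    return d

rejections, trials = 0, 500
for _ in range(trials):
    # Sample of 100 from the mixture of example 1 (1% outliers at mean 10)
    xs = [random.gauss(10.0 if random.random() < 0.01 else 0.0, 1.0)
          for _ in range(100)]
    # Asymptotic 5% critical value for n = 100
    if ks_stat(xs, norm_cdf) > 1.358 / math.sqrt(100):
        rejections += 1
rate = rejections / trials
```

In this sketch the rejection rate stays close to the nominal level even though the Gaussian model is wrong: a single rare outlier barely perturbs the cumulative distribution, whereas it dominates the likelihood comparison shown earlier.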

Ninio claims that upper bounds of the Kolmogorov criterion may be obtained whatever the detailed distribution of events within the bins of the histograms. This claim rests on a triangle inequality that is used only once, and in other cases hypotheses unfavorable to his computations are dismissed as “unlikely.” Yet the histograms provide no more exact information about cumulative experimental distributions than they do about density functions; obtaining exact values of the Kolmogorov criterion would require knowledge of the raw data.

Nevertheless, we agree that some solutions proposed by Ninio “pass” the Kolmogorov test and may provide a Kolmogorov statistic lower than ours. We stress, however, that our primary criterion was the likelihood test, and we still support its use as the primary approach. We suggest that the existence and validity of equivalent solutions should be addressed with a sound statistical theory, necessarily based on a Likelihood Criterion, rather than on a fitting criterion. We note in this respect that the Kolmogorov test is inefficient in detecting underestimates of n due to rare large-amplitude IPSPs. Such events have only a slight effect on the distributions but heavily influence the Likelihood Criterion.

In summary, the Kolmogorov statistic is based on geometry. In our opinion, and in our papers, it may be used to provide an additional viewpoint to likelihood-based approaches but it cannot substitute for them.

### Ninio's results and reproaches

Although the preceding arguments distinguish between a statistical analysis and one based on visual criteria, we briefly address a few points raised by Ninio.

##### RESULTS BASED ON VISUAL INSPECTION.

Of the ten published figures, six were classified as either “satisfactory” or as “allowing alternative binomial interpretations.” In these cases our published n gave a satisfactory fit, with only minor differences in the values of p and q.

Of the remaining four cases, two are representations of the same data, and the published binomial parameters that best fit the data are the same. We made a labeling error on the ordinate in the first paper (Korn et al. 1981), which was subsequently corrected. We were not aware of this difference when we published Fig. 13*C* in KMTF, for which the histogram and fits were redrawn.

Another figure (Fig. 10*B*, KMTF) is criticized for a similar labeling error and because the binomial and Poisson fits seem similar. They are close, as clearly indicated in Table 2 of KMTF. Nevertheless, we can confirm that the statistical test based on a comparison of Likelihood Criteria preferred the binomial.

The last result at issue [Faber and Korn, 1988 (FK), Fig. 8*C*] passes the “visual” test, but Ninio argues we should have rejected it because in his reanalysis the fit failed the Kolmogorov. We must stress again that we did not use the Kolmogorov for that purpose. For example, results from the five data sets in KMTF (Table 3) that did not pass this test were included.

### Use of the correction factor

The criticism that using a correction factor for nonlinear addition of quantal responses introduced further complexity indicates a lack of appreciation for the underlying synaptic physiology. The correction factor was first described by Martin (1955), and it takes into account how the driving force for a synaptic event is reduced by changes in potential that shift the neuronal membrane closer to the reversal level for the synaptic event. This was not an arbitrary correction and we feel we evaluated its impact appropriately.
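For readers unfamiliar with it, the correction has a simple closed form. The sketch below shows the commonly cited first-order version of Martin's correction, not necessarily the exact expression used in our analysis; the variable names and the illustrative values are ours:

```python
def martin_correction(v, driving_force):
    """Martin's (1955) correction for nonlinear quantal summation:
    recovers the amplitude the response would have had if the driving
    force were not reduced as the membrane potential approaches the
    reversal level. v and driving_force are in the same units (e.g., mV).
    """
    if not 0.0 <= v < driving_force:
        raise ValueError("response must be smaller than the driving force")
    return v / (1.0 - v / driving_force)

# An illustrative 5 mV IPSP with a hypothetical 50 mV driving force
# is corrected upward: 5 / (1 - 5/50) = 5.555... mV
corrected = martin_correction(5.0, 50.0)
```

The correction grows with response amplitude, which is precisely why ignoring it would distort the amplitude fluctuations on which the statistical analysis rests.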

### Conclusion

We appreciate Ninio's statement that our work was seminal, in that we described results that, along with the studies of Redman and colleagues (Jack et al. 1981), changed the nature of the scientific discussion about structure–function correlations at synapses, particularly those in the CNS. These advances were accomplished with the aid of multidisciplinary tools that were state of the art at the time. Although we may have made some mistakes—inherent in any scientific inquiry—none of the putative errors invalidates the major findings in our papers. Finally, with respect to Ninio's criticisms, we have shown here that, 25 yr after the fact, he has used inadequate methodology and still fails to understand the power of the statistical analysis we used.

## Acknowledgments

We thank Professors Julian Jack, Michel Kerszberg, Richard Miles, Martin Pinter, and Steven Redman for helpful comments on this Letter.

- Copyright © 2007 by the American Physiological Society