Erratum to: Clustering of $\overline{B}\to D^{(\ast)}\tau^{-}\overline{\nu}_{\tau}$ kinematic distributions with ClusterKinG


In our paper we considered two histograms $H_k$, $k = 1, 2$, with bin contents $n_{ki}$, $i = 1, \dots, N$, and associated uncertainties $\sigma_{ki}$. Our null hypothesis was that the bin contents of the histograms are drawn from two distributions with identical means. Let us first consider the case of uncorrelated random variables $n_{ki}$ with standard deviations $\sigma_{ki}$ and normalisations $N_k := \sum_{i=1}^{N} n_{ki}$. For a corresponding two-sample test, we considered a distance measure $\tilde{\chi}^2$ with
$$ \tilde{\chi}^2(H_1, H_2) := \sum_{i=1}^{N} \frac{\left(n_{1i}/N_1 - n_{2i}/N_2\right)^2}{\sigma_{1i}^2/N_1^2 + \sigma_{2i}^2/N_2^2} . \quad (1) $$
We have added a tilde to avoid confusion with the "true" $\chi^2_r$ distribution of $r$ degrees of freedom. In the following we assume that the $n_{ki}$ are normally distributed. Under the null hypothesis, $\tilde{\chi}^2(H_1, H_2)$ will be approximately distributed according to a $\chi^2_{N-1}$, not a $\chi^2_N$ distribution as assumed throughout our paper. However, this changes the interpretation of our results only slightly, because the exact probability distribution of the metric is only important for the choice and interpretation of the stopping criterion of the clustering. The choice of our algorithm and stopping criterion guaranteed that any two points in the same cluster are "indistinguishable" and that any two distinct clusters contain at least two points that are "distinguishable". We called two points distinguishable if the corresponding histograms have a distance of $\tilde{\chi}^2/N > 1$ and indistinguishable if not. For the 9 bins in our examples, $\tilde{\chi}^2/N = 1$ then corresponded to a $p$ value of 34% (not 32% as claimed). The paper also failed to point out that this value depends on the number of bins; the dependency is shown in figure 1. It should also be highlighted that the approximation of the distribution of $\tilde{\chi}^2$ values by the $\chi^2_{N-1}$ distribution can break down if the uncertainties $\sigma_{ki}$ are very imbalanced, though this does not usually happen if the uncertainties are dominated by Poisson uncertainties.

[Figure 1: Cutoff values $c$ for fixed $p$ values $P(\chi^2_r/r > c \,|\, H_0)$ as a function of the number of degrees of freedom.]
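As a concrete illustration of the uncorrelated two-sample statistic of eq. (1) and its $p$ value, the following sketch (not the ClusterKinG implementation; the bin contents are invented example numbers) computes the distance between two normalised histograms and evaluates the $\chi^2_{N-1}$ survival function with `scipy`:

```python
import numpy as np
from scipy.stats import chi2

def chi2_tilde(n1, sigma1, n2, sigma2):
    """Distance of eq. (1) between two histograms, each normalised to unity."""
    n1, sigma1, n2, sigma2 = map(np.asarray, (n1, sigma1, n2, sigma2))
    N1, N2 = n1.sum(), n2.sum()
    delta = n1 / N1 - n2 / N2
    var = (sigma1 / N1) ** 2 + (sigma2 / N2) ** 2
    return float(np.sum(delta ** 2 / var))

# Invented bin contents with Poisson-like uncertainties (9 bins as in the paper).
n1 = np.array([120., 140., 90., 80., 60., 50., 40., 30., 20.])
n2 = np.array([115., 150., 85., 78., 65., 48., 42., 28., 19.])
t = chi2_tilde(n1, np.sqrt(n1), n2, np.sqrt(n2))
N = len(n1)
p = chi2.sf(t, df=N - 1)  # P(chi2_{N-1} > t) under the null hypothesis
print(f"chi2_tilde = {t:.2f}, p value = {p:.2f}")

# Consistency check with the text: chi2_tilde/N = 1 for N = 9 bins means
# t = 9 with 8 dof, and chi2.sf(9, df=8) ~ 0.34, i.e. the quoted 34% p value.
```

The same survival-function call also reproduces the dependence on the number of bins: evaluating `chi2.sf(N, df=N - 1)` for varying `N` traces out the curve discussed above.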
More importantly, we correct our treatment of the more general case of histograms with correlated bin contents. The approach of our paper to transform this problem into a scenario where (1) again leads to a $\chi^2_r$ distribution is incorrect. Instead, we define $\Delta_i := n_{1i}/N_1 - n_{2i}/N_2$. With $\Sigma_1$ and $\Sigma_2$ the covariance matrices of the $n_{1i}$ and $n_{2i}$, we define $\Sigma := \Sigma_1/N_1^2 + \Sigma_2/N_2^2$ and
$$ \bar{\chi}^2(H_1, H_2) := \Delta^T \, \Sigma^{-1} \, \Delta . \quad (2) $$
Under the null hypothesis, $\bar{\chi}^2(H_1, H_2)$ approximately follows a $\chi^2_{N-1}$ distribution. In the case of uncorrelated bin contents, $\bar{\chi}^2 = \tilde{\chi}^2$. As the examples presented in our paper did not consider correlations, this does not affect any of the results.
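A minimal sketch of the correlated-case statistic of eq. (2) (illustrative only, not the ClusterKinG code; the bin contents are invented) shows that it reduces to the uncorrelated statistic of eq. (1) when the covariance matrices are diagonal:

```python
import numpy as np

def chi2_bar(n1, Sigma1, n2, Sigma2):
    """Eq. (2): Delta^T Sigma^{-1} Delta for normalised bin-content differences."""
    n1, n2 = np.asarray(n1, float), np.asarray(n2, float)
    N1, N2 = n1.sum(), n2.sum()
    delta = n1 / N1 - n2 / N2
    Sigma = Sigma1 / N1**2 + Sigma2 / N2**2
    return float(delta @ np.linalg.solve(Sigma, delta))

n1 = np.array([120., 140., 90., 80., 60.])
n2 = np.array([115., 150., 85., 78., 65.])
# Uncorrelated case: diagonal covariances from Poisson-like variances sigma_ki^2 = n_ki.
print("chi2_bar (diagonal covariances) =", chi2_bar(n1, np.diag(n1), n2, np.diag(n2)))
```

With off-diagonal entries in `Sigma1` or `Sigma2`, the same function handles correlated bin contents, which is the case the original paper treated incorrectly.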
Both mistakes have been corrected in ClusterKinG release 1.1.0. The $\chi^2$ metric is now implemented using equation (2) and we define $d(c_1, c_2) = \bar{\chi}^2(H_{c_1}, H_{c_2})/(N - 1)$ (such that the expectation value is fixed to unity, as originally intended). Additional unit tests automatically run several small toy studies to confirm that our current implementation produces correct approximations of $\chi^2_r$ distributions. We have performed further validation studies for our statistical treatment of the examples shown in the paper: for each point in parameter space we consider the corresponding histogram and its covariance matrix. Toy histograms are generated by drawing random values from the multivariate normal distribution with matching means and covariance matrix.
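The toy procedure can be sketched as follows (a simplified, self-contained variant, not the repository code: we draw pairs of independent toy histograms rather than comparing each toy to one fixed reference, and all means are invented; with a diagonal covariance matrix eq. (2) coincides with eq. (1)):

```python
import numpy as np

rng = np.random.default_rng(42)
mean = np.array([200., 180., 150., 120., 90., 70., 50., 30., 20.])
cov = np.diag(mean)  # Poisson-like, uncorrelated uncertainties for this illustration
N = len(mean)

# Draw pairs of toy histograms from the multivariate normal distribution.
A = rng.multivariate_normal(mean, cov, size=4000)
B = rng.multivariate_normal(mean, cov, size=4000)

# Test statistic of eq. (2) for each pair; diagonal covariance, so the
# quadratic form reduces to a per-bin sum.
Na = A.sum(axis=1, keepdims=True)
Nb = B.sum(axis=1, keepdims=True)
delta = A / Na - B / Nb
var = np.diag(cov) / Na**2 + np.diag(cov) / Nb**2
stats = (delta**2 / var).sum(axis=1)

# Under the null hypothesis the statistic should approximately follow a
# chi2_{N-1} distribution, whose mean is N - 1 = 8.
print("empirical mean of toy statistics:", stats.mean())
```

The normalisation of the histograms removes one degree of freedom, which is why the empirical mean sits near $N - 1$ rather than $N$.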

JHEP05(2021)147
We then calculate the test statistic $\bar{\chi}^2$ between each toy histogram and the original histogram. The distribution of all $\bar{\chi}^2/(N - 1)$ values is binned and compared to the expected $\chi^2_{N-1}/(N - 1)$ distribution using the Jensen-Shannon divergence (JSD). An example for one particular point in parameter space is shown in figure 2. Both histograms agree nicely, resulting in a low JSD value. The result of repeating the same procedure across all points is shown in figure 3, showing satisfactorily low divergence values.
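The comparison step can be sketched like this (illustrative choices of sample size and binning, not the repository code): bin samples of $\chi^2_{N-1}/(N-1)$ and compare the empirical bin probabilities with the expected ones via the Jensen-Shannon divergence.

```python
import numpy as np
from scipy.stats import chi2
from scipy.spatial.distance import jensenshannon

rng = np.random.default_rng(1)
N = 9  # number of histogram bins, so N - 1 = 8 degrees of freedom

# Stand-in for the toy chi2 values: exact chi2_{N-1}/(N-1) samples.
samples = chi2.rvs(df=N - 1, size=20000, random_state=rng) / (N - 1)

# Bin the samples and normalise to a probability distribution.
edges = np.linspace(0.0, 4.0, 41)
empirical, _ = np.histogram(samples, bins=edges)
empirical = empirical / empirical.sum()

# Expected probability mass per bin from the chi2_{N-1} CDF.
expected = np.diff(chi2.cdf(edges * (N - 1), df=N - 1))
expected = expected / expected.sum()

# scipy returns the JS distance (square root of the divergence).
jsd = jensenshannon(empirical, expected, base=2) ** 2
print(f"JSD = {jsd:.4f}")  # near zero when the distributions match
```

In the actual validation the `samples` array is replaced by the $\bar{\chi}^2/(N-1)$ values from the toy histograms, so a low JSD indicates that the statistic indeed follows the expected $\chi^2_{N-1}$ distribution.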
Additional code to reproduce the figures shown here and to validate the statistical treatment has been added to the ClusterKinG repository.
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.