Clustering of B̄ → D^(*)τ⁻ν̄_τ kinematic distributions with ClusterKinG

New Physics can manifest itself in kinematic distributions of particle decays. The parameter space defining the shape of such distributions can be large, which is challenging for both theoretical and experimental studies. Using clustering algorithms, the parameter space can however be dissected into subsets (clusters) which correspond to similar kinematic distributions. Clusters can then be represented by benchmark points, which allow for less involved studies and a concise presentation of the results. We demonstrate this concept using the Python package ClusterKinG, an easy-to-use framework for the clustering of distributions that particularly aims to make these techniques more accessible in a High Energy Physics context. As an example we consider B̄ → D^(*)τ⁻ν̄_τ distributions and discuss various clustering methods and possible implications for future experimental analyses.


Introduction
New Physics (NP) contributions can influence the kinematic distributions of particle decays. While this opens up possibilities to find and determine the nature of NP, it can also be a nuisance for experimental studies, because most measurements require assumptions on certain kinematic distributions, e.g. to distinguish signal from background or to determine efficiencies.
For example, as was shown in [1], assuming a two-Higgs-doublet model of type II changes the experimental measurement of R(D ( * ) ), because discriminating between signal and background requires assumptions on the kinematic shapes of the signal, background and normalisation modes. Such kinematic shapes are in general determined from Monte Carlo simulations.
Thus, many experimental measurements are model-dependent and are often only conducted under the assumption of the Standard Model (SM). Discrepancies between the SM prediction and the measured values are a good indication for NP. However, comparing different NP models based on their predicted results has to be taken with a grain of salt, because the measurements themselves are model-dependent.
A further complication for both theoretical and experimental studies is the high dimensionality of the parameter space of typical NP models. If experimentalists wish to publish model-independent results, the studies have to be repeated for a large sample of parameter points. This can be computationally very expensive. Furthermore numerical results and many visualisations can only be shown for specific (often arbitrary) parameter points, leaving their representative character unknown.
A possibility to reduce the complexity of this problem is to identify regions in the parameter space which lead to similar kinematic distributions. These regions can be found using clustering algorithms. From each cluster, a most representative point (benchmark point, BP) can then be chosen.
Experimental studies can focus on these BPs, thereby reducing the multi-dimensional problem to a small number of BPs to be considered. The results can be presented for each BP, allowing for a clear-cut numerical result and simpler visualisations.
Such a strategy has been employed for the first time in the context of Higgs boson pair production in [2][3][4]. An effective field theory (EFT) approach has been adopted to parametrise the five-dimensional parameter space of anomalous Higgs couplings. It has been shown in [2], that for current and future LHC searches a total of 12 clusters gives a reasonable approximation of the considered parameter space. In [3] several clusters were subjected to experimental limits from the CMS collaboration [5]. Finally, in [4] a method to extend the experimental sensitivity from the BPs to the other cluster members is discussed.
In recent years substantial progress has been made in the EFT description of the SM in form of the SM Effective Theory (SMEFT) [6] and the Weak Effective Theory (WET). The calculation of the complete one-loop SMEFT renormalization group equations (RGEs) [7][8][9][10], the complete tree-level and one-loop matching from SMEFT onto WET [11][12][13] and the complete one-loop QCD and QED RGEs within WET [14,15] allow for a general NP analysis of low-energy observables.
However, public tools to cluster the phenomenology of NP operators in a systematic way have so far been missing.
To fill this gap we have written the Python package ClusterKinG (Clustering of Kinematic Graphs), which aims to make clustering techniques more easily accessible in the context of EFTs and High Energy Physics in general. Despite this motivation, ClusterKinG is a general tool that can be applied to a diverse set of problems, even outside of physics.

JHEP04(2020)007
The rest of the paper is organized as follows: in section 2 we discuss the clustering method in general terms. In section 3 we present for the first time the ClusterKinG package and describe its features. In section 4 we apply the clustering method to kinematic distributions of the decays B̄ → D^(*)τ⁻(→ ℓ⁻ν̄_ℓν_τ)ν̄_τ and perform various consistency tests. Finally we conclude in section 5.

Clustering
In this section we discuss the different steps involved in our clustering approach. As a first step a suitable observable and the corresponding model of interest have to be chosen. After establishing the underlying parameter space the following steps will be performed:

• Sampling the parameter space of the process.
• Computing the kinematic distributions in question.
• Choosing a metric to measure differences between the kinematic distributions.
• Applying a suited clustering algorithm on the parameter space.
• Computing the BPs representing each of the clusters.
The above steps are explained in the following subsections.

Sampling of the parameter space
As discussed in the introduction, typical NP models depend on several parameters. Theoretical considerations (such as symmetry arguments) can often be used to limit the study to a subset of these parameters, thereby reducing the dimensionality of the problem. The considered range of these parameters can be motivated by existing exclusion limits or from theory.
From this subset of the original parameter space, a set of points (sample points) is taken and used for the rest of the analysis. While large numbers of sample points make the following steps more computationally expensive, it is important that the sampling is fine enough to accurately represent the whole parameter space.
ClusterKinG allows for an arbitrary selection of sampling points. The examples presented in this paper use a uniform grid in the parameter space for simplicity. In order to limit the number of required points (and thereby computing time), it is also planned to implement adaptive sampling techniques in the future: after an initial run with a coarse grid, regions with large variations in the kinematic distributions are identified and sampled in a finer grid. If needed, this procedure can then be applied several times.
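The examples in this paper use such a uniform grid; a minimal sketch of this sampling step (an illustration only, not the ClusterKinG API) could look like:

```python
import itertools
import numpy as np

def uniform_grid(ranges, points_per_axis):
    """Build a uniform grid of sample points over a box in parameter space.

    ranges: list of (low, high) tuples, one per parameter.
    points_per_axis: number of grid points along each axis.
    """
    axes = [np.linspace(lo, hi, points_per_axis) for lo, hi in ranges]
    # Cartesian product of the per-axis grids
    return np.array(list(itertools.product(*axes)))

# Example: a 10 x 10 x 10 grid over three Wilson coefficients in [-1, 1]
grid = uniform_grid([(-1, 1)] * 3, 10)
print(grid.shape)  # (1000, 3)
```

Adaptive sampling would refine this grid only where the distributions vary strongly, reducing the number of expensive evaluations.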

Kinematic distributions
For every sample point, the corresponding kinematic distribution needs to be computed. If analytic expressions of the observable in terms of the parameters are available, this task can be achieved by explicit evaluation of the formulae. Otherwise Monte Carlo (MC) simulations have to be used to generate the distributions. Since the generation of MC samples is in general resource-intensive, reweighting techniques can be used to generate samples corresponding to different parameter points from existing samples. For semileptonic B decays such methods are already implemented in the HAMMER tool [40,41].
In this article we only consider binned observables and our kinematic distributions are thus histograms.
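For the analytic case, the binning step amounts to integrating the differential rate over each bin. A sketch with a toy rate function (the helper names are hypothetical):

```python
import numpy as np
from scipy.integrate import quad

def binned_distribution(diff_rate, bin_edges, params):
    """Integrate an analytic differential rate over each bin.

    diff_rate: callable (x, params) -> differential rate at x
    bin_edges: array of N+1 bin edges defining N bins
    """
    return np.array([
        quad(diff_rate, lo, hi, args=(params,))[0]
        for lo, hi in zip(bin_edges[:-1], bin_edges[1:])
    ])

# Toy example: a rate ~ 1 + p * x on x in [-1, 1], nine bins as in section 4
toy_rate = lambda x, p: 1.0 + p * x
edges = np.linspace(-1, 1, 10)
hist = binned_distribution(toy_rate, edges, 0.5)
print(hist.sum())  # total rate: integral of 1 + 0.5 x over [-1, 1] = 2
```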

Metric
The objective of the clustering procedure is to partition the parameter space in such a way that parameter points generating similar kinematic distributions are grouped into the same cluster. For this, the "similarity" between kinematic distributions has to be quantified by a metric in the space of kinematic distributions.
The choice of this metric follows from the interpretation of the distributions as potential measurements. As such, the metric of choice should give more weight to differences in bins that are associated with low expected experimental uncertainties, while being less sensitive to differences in bins of less experimental significance. Having an estimate of experimental uncertainties is also useful when deciding the number of clusters and benchmark points to present: sufficiently many to cover the whole variety of distributions that lead to distinguishable experimental results, but not arbitrarily many. In this way, the number of clusters then also serves as an estimate for the sensitivity of a distribution to NP parameters.
A common choice for a metric measuring the similarity between binned distributions is a two-sample test statistic such as the Kolmogorov-Smirnov test [42], the Anderson-Darling test [43] or the χ 2 test.
In [2] a binned log likelihood test statistic is used to distinguish between two distributions. This likelihood ratio is obtained by taking the ratio of the binomial distribution of the two individual samples and the binomial distribution where both samples are assumed to be equal. The log of this ratio can be shown to be χ 2 -distributed up to a minus sign [44]. By basing the test statistic on binomial distributions, the metric incorporates the statistical significance of the different bins.
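A simple Poisson-based variant of such a binned likelihood-ratio statistic can be sketched as follows (the construction in [2] is binomial; this illustration assumes equal sample normalisations):

```python
import numpy as np

def poisson_llr(n1, n2):
    """-2 ln(likelihood ratio) comparing two binned samples.

    H0: each bin of both histograms shares a common Poisson mean,
    estimated as the average of the two observed counts.
    H1: each histogram has its own per-bin Poisson mean (the counts themselves).
    Zero bins are handled via the convention 0 * ln(0) = 0.
    """
    n1, n2 = np.asarray(n1, float), np.asarray(n2, float)
    mu = (n1 + n2) / 2.0  # shared per-bin mean under H0

    def term(n, m):
        # n * ln(n / m), with 0 * ln(0) = 0
        out = np.zeros_like(n)
        mask = n > 0
        out[mask] = n[mask] * np.log(n[mask] / m[mask])
        return out

    return 2.0 * np.sum(term(n1, mu) + term(n2, mu))

print(poisson_llr([10, 20], [10, 20]))  # identical histograms -> 0.0
```

Because the per-bin terms grow with the statistical significance of each bin, this statistic automatically weights bins by their expected uncertainty, as described above.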
In this paper, we use a general χ 2 metric operating on distributions with uncertainties. As the distributions are not measured but generated as described in section 2.2, an uncertainty estimate is applied to them, consisting of configurable statistical uncertainties and relative and absolute uncertainties that can be correlated between bins.
Let $n_{1i}$ and $n_{2i}$ be the bin contents of two histograms $H_1$ and $H_2$ with corresponding normalisations $N_1 = \sum_{i=1}^{N} n_{1i}$ and $N_2 = \sum_{i=1}^{N} n_{2i}$ and uncertainties per bin $\sigma_{1i}$ and $\sigma_{2i}$ ($i = 1, \ldots, N$).
If all uncertainties are uncorrelated, the χ² distance between the two histograms is defined by
$$ \chi^2(H_1, H_2) = \sum_{i=1}^{N} \frac{\left( n_{1i}/N_1 - n_{2i}/N_2 \right)^2}{\left(\sigma_{1i}/N_1\right)^2 + \left(\sigma_{2i}/N_2\right)^2} \,. \tag{2.1} $$

In case of correlated errors, the correlation matrix C is the same for both histograms by construction. Thus, one can switch to a simultaneous basis of uncorrelated errors by considering $\tilde{n}_{1i} = \sum_j C^{-1}_{ij} n_{1j}$ and $\tilde{n}_{2i} = \sum_j C^{-1}_{ij} n_{2j}$. If both histograms are independently generated by the same random variable, then χ²/N has an expectation value of 1. In the following we call two distributions distinguishable if their χ² distance χ²(H₁, H₂)/N is larger than one, corresponding to a p-value of 32%. This loose requirement is conservative in the sense that it will lead to more clusters than a stricter criterion.
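The uncorrelated χ² distance described in this section can be sketched in a few lines (a hypothetical helper, not the ClusterKinG API):

```python
import numpy as np

def chi2_distance(n1, sigma1, n2, sigma2):
    """Chi-squared distance between two normalised histograms.

    Bins with small expected uncertainties contribute more to the
    distance, as required for an experimentally motivated metric.
    """
    n1, n2 = np.asarray(n1, float), np.asarray(n2, float)
    s1, s2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
    N1, N2 = n1.sum(), n2.sum()
    num = (n1 / N1 - n2 / N2) ** 2
    den = (s1 / N1) ** 2 + (s2 / N2) ** 2
    return np.sum(num / den)

# Two histograms with Poisson-like uncertainties
h1 = np.array([100., 200., 300.])
h2 = np.array([110., 190., 300.])
d = chi2_distance(h1, np.sqrt(h1), h2, np.sqrt(h2))
print(d / len(h1) < 1)  # indistinguishable by the chi2/N > 1 criterion
```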
The metric between the kinematic distributions gives rise to a metric d acting directly on the parameter space. In our case
$$ d(c_1, c_2) = \chi^2(H_1, H_2)/N \tag{2.2} $$
for two sample points $c_{1,2}$ and their respective histograms $H_{1,2}$.

Clustering algorithm
In general a data set can be either clustered hierarchically or partitionally. Partitional clustering methods [45] such as for example K-means algorithms [46] only perform one single partition of the data. Furthermore the number of resulting clusters has to be chosen beforehand as an input parameter. In the following we focus on hierarchical clustering methods [47]. Hierarchical clustering algorithms group a given set of data in a nested sequence of clusters. Two approaches are common: bottom-up (or agglomerative) algorithms successively merge clusters to form larger clusters, whereas top-down algorithms successively split clusters into smaller ones. In both cases, a stopping criterion is needed to avoid a trivial result. In our analysis we will employ the agglomerative method with the following steps:

1. Associate each sample point to one cluster containing only this element.

2. Merge the nearest two clusters according to a metric D.

3. Repeat step 2 until the stopping criterion is fulfilled.
Note that the metric D in step 2 is not between points in the parameter space, but between subsets (existing clusters) of this space. It makes sense to base D on the metric d introduced in eq. (2.2). Two canonical choices for the inter-cluster metric D are
$$ D_1(C_1, C_2) = \frac{1}{|C_1|\,|C_2|} \sum_{c_1 \in C_1} \sum_{c_2 \in C_2} d(c_1, c_2) \,, \qquad D_\infty(C_1, C_2) = \max_{c_1 \in C_1,\, c_2 \in C_2} d(c_1, c_2) \,, $$
where $C_1$ and $C_2$ are clusters and $|C_{1,2}|$ denote their number of elements respectively.
While [2] uses the 'average' metric D 1 , we will employ the D ∞ metric in our analysis. This choice is more conservative in the sense that it usually leads to a larger number of clusters than in the case of D 1 , simply because D ∞ ≥ D 1 .
The stopping criterion is chosen to be D ∞ (C 1 , C 2 ) > 1 for all pairs of clusters. This means that if we merge two clusters the resulting larger cluster does not contain any two distinguishable points. As a consequence all clusters of our final result contain only indistinguishable sample points. This is not the case for the 'average' metric in general.
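The agglomerative procedure with complete (D∞) linkage and a distance threshold maps directly onto scipy's hierarchical clustering; a sketch (not ClusterKinG's internal implementation):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def cluster_with_threshold(dist_matrix, threshold=1.0):
    """Agglomerative clustering with the D_infinity (complete) linkage.

    Merging stops once every remaining pair of clusters has
    D_infinity > threshold, so no final cluster contains two points
    whose pairwise distance exceeds the threshold.
    """
    # linkage expects the condensed (upper-triangular) distance vector
    condensed = squareform(dist_matrix, checks=False)
    tree = linkage(condensed, method="complete")
    return fcluster(tree, t=threshold, criterion="distance")

# Toy metric: three mutually close points and one far outlier
d = np.array([
    [0.0, 0.2, 0.3, 5.0],
    [0.2, 0.0, 0.1, 5.1],
    [0.3, 0.1, 0.0, 4.9],
    [5.0, 5.1, 4.9, 0.0],
])
labels = cluster_with_threshold(d)
print(len(set(labels)))  # 2 clusters
```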

Benchmark points
After the application of the clustering algorithm BPs have to be determined for all of the resulting clusters. A BP is a cluster element c of a cluster C, which is chosen as a representative of that particular cluster. Usually it is taken to be the parameter point that minimizes a certain figure of merit, which is commonly based on the metric d of eq. (2.2).
Examples are
$$ f_1(c) = \sum_{c' \in C} d(c, c') \,, \qquad f_2(c) = \sum_{c' \in C} d(c, c')^2 \,, \qquad f_3(c) = \max_{c' \in C} d(c, c') \,, \tag{2.4} $$
which differ in their responsiveness to outliers in the data. The BPs are the key elements of the cluster analysis. They are determined to simplify experimental analyses, which can then lay their focus only on a finite set of BPs instead of the entire parameter space.
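Benchmark point selection can be sketched as follows; the two figures of merit shown (summed and maximum distance to the other cluster members) are illustrative choices:

```python
import numpy as np

def benchmark_index(dist_matrix, members, fom="max"):
    """Pick the benchmark point of one cluster.

    dist_matrix: full pairwise distance matrix d between sample points.
    members: indices of the sample points belonging to the cluster.
    fom: 'max' minimises the largest distance to any other member,
         'sum' minimises the summed distance to all members.
    """
    sub = dist_matrix[np.ix_(members, members)]
    scores = sub.max(axis=1) if fom == "max" else sub.sum(axis=1)
    return members[int(np.argmin(scores))]

d = np.array([
    [0.0, 0.2, 0.3],
    [0.2, 0.0, 0.1],
    [0.3, 0.1, 0.0],
])
print(benchmark_index(d, [0, 1, 2]))  # point 1 is the most central
```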

The ClusterKinG package
ClusterKinG is publicly developed as open source under the MIT license at github.com/clusterking. The package aims for ease of use, while making it easy for users to modify and extend its functionality.
The basic building blocks of ClusterKinG are worker classes: after initialisation, a set of methods can be called for further configuration, before calling a run method, that performs the desired action. By subclassing these worker classes, the functionality can be extended. Furthermore, these worker classes can be passed on to other worker classes to perform e.g. stability checks that require repetitive calling with slight parameter variations.
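The configure-then-run pattern described above can be illustrated schematically (the class and method names below are hypothetical, not the actual ClusterKinG API):

```python
class Worker:
    """Illustration of the configure-then-run worker pattern."""

    def __init__(self):
        self._config = {}

    def set_option(self, key, value):
        """Configuration method; returns self to allow call chaining."""
        self._config[key] = value
        return self

    def run(self, data):
        """Perform the desired action; implemented by subclasses."""
        raise NotImplementedError


class ScanWorker(Worker):
    """A hypothetical worker that applies a function to every sample point."""

    def run(self, data):
        fn = self._config.get("function", lambda x: x)
        return [fn(point) for point in data]


worker = ScanWorker()
worker.set_option("function", lambda x: x ** 2)
print(worker.run([1, 2, 3]))  # [1, 4, 9]
```

Because a configured worker is just an object, it can itself be handed to another worker, e.g. one that reruns it with slightly varied parameters for stability checks.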
To demonstrate the ease of use, a fully working example of code to generate clusters, benchmark points and plots similar to the ones presented in section 4 is shown in appendix A.
A typical ClusterKinG workflow consists of the following steps:

1. Initialise a Data object: this is what the following worker classes read from and write to.
Internally this class holds a pandas DataFrame [48] that contains the bin contents of the kinematic distributions for each sample point. Additional information (metadata) is saved as a nested dictionary. Each of the following steps will add information to the metadata, such that any output file contains a description of how it was generated.
Data objects can be saved to and loaded from an output file in SQL format, allowing to save all data in a single file of comparably small size. Exports of the data to other formats (CSV, XLS, JSON, . . . ) are easily available.
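A sketch of this storage scheme, with bin contents in a DataFrame and metadata serialised alongside it in a single SQLite database (the table and column names are hypothetical, not ClusterKinG's actual schema):

```python
import json
import sqlite3

import pandas as pd

# Bin contents of the kinematic distributions, one row per sample point
df = pd.DataFrame(
    {"c_T": [0.0, 0.5], "bin_0": [10.0, 12.0], "bin_1": [20.0, 18.0]}
)
# Nested metadata describing how the data was generated
metadata = {"scan": {"observable": "toy", "n_bins": 2}}

with sqlite3.connect(":memory:") as conn:
    df.to_sql("distributions", conn, index=False)
    conn.execute("CREATE TABLE metadata (json TEXT)")
    conn.execute("INSERT INTO metadata VALUES (?)", (json.dumps(metadata),))

    # Everything can be read back from the single database
    restored = pd.read_sql("SELECT * FROM distributions", conn)
    meta = json.loads(conn.execute("SELECT json FROM metadata").fetchone()[0])

print(restored.shape, meta["scan"]["n_bins"])
```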
2. Adding uncertainties (optional). After the distributions have been generated, an experimental uncertainty estimate can be added to them. Typically this consists of

• Statistical uncertainties, modeled as Poisson errors (and thereby dependent on the actual bin content of the distribution for each sample point).

• Systematic uncertainties which can be given in absolute form or relative to the bin content for all sample points at once.
To save memory and improve performance, the uncertainties on the distributions are only calculated when needed; they are not saved to the Data object, only the procedure by which they are calculated is. This also makes it very easy and fast to check the dependency of the clustering results on the uncertainties.
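Such a lazy uncertainty recipe can be sketched as follows (hypothetical helpers; ClusterKinG's actual interface may differ):

```python
import numpy as np

def relative_uncertainty(rel):
    """Return a function computing sigma = rel * bin content."""
    return lambda bins: rel * bins

def poisson_uncertainty(yield_total):
    """Statistical (Poisson) errors for a histogram scaled to a given yield."""
    def calc(bins):
        scale = yield_total / bins.sum()
        return np.sqrt(scale * bins) / scale  # sqrt(N) in yield units, rescaled
    return calc

# Only the recipe (a list of callables) is stored; the uncertainties are
# evaluated on demand for each sample point's distribution.
recipe = [poisson_uncertainty(700), relative_uncertainty(0.01)]
bins = np.array([10.0, 20.0, 40.0])
sigma = np.sqrt(sum(f(bins) ** 2 for f in recipe))  # combine in quadrature
print(sigma.round(2))
```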
The starting point is the effective Lagrangian
$$ \mathcal{L}_{\rm eff} = -\frac{4 G_F}{\sqrt{2}} V_{cb} \left[ (1 + C_{V_L})\, O_{V_L} + C_{V_R}\, O_{V_R} + C_{S_L}\, O_{S_L} + C_{S_R}\, O_{S_R} + C_T\, O_T \right] + \text{h.c.} \tag{4.1} $$
with the CKM element $V_{cb}$, the Fermi coupling constant $G_F$ and the effective operators
$$ O_{V_{L,R}} = (\bar{c}\gamma^{\mu} P_{L,R}\, b)(\bar{\tau}\gamma_{\mu} P_L\, \nu_\tau) \,, \qquad O_{S_{L,R}} = (\bar{c}\, P_{L,R}\, b)(\bar{\tau}\, P_L\, \nu_\tau) \,, \qquad O_T = (\bar{c}\sigma^{\mu\nu} P_L\, b)(\bar{\tau}\sigma_{\mu\nu} P_L\, \nu_\tau) \,. $$
We use the notation $P_{R,L} = \frac{1}{2}(1 \pm \gamma_5)$, $\sigma^{\mu\nu} = \frac{i}{2}[\gamma^\mu, \gamma^\nu]$ and b, c, τ, ν_τ for the quark and lepton fields. The Wilson coefficients in eq. (4.1) are in general complex quantities. For our analysis we will assume the CP-conserving limit, i.e. real Wilson coefficients. This is a common assumption (see e.g. [32,33]) and is mainly chosen for simplicity. Furthermore, for presentational reasons we study three out of the five Wilson coefficients and choose one for each Dirac structure, namely C_{V_L}, C_{S_L} and C_T.

The clustering is performed using the ClusterKinG package. As mentioned in section 2 we use the hierarchical clustering algorithm together with the χ² metric defined in section 2. The stopping criterion is chosen such that the χ² distance between all distributions within the same cluster is ≤ 1, meaning that they are experimentally indistinguishable. Finally, the BPs are obtained by adopting the figure of merit f₁ from eq. (2.4).
The complete code that has been used for the generation of the results and plots below is provided in the example directory of the ClusterKinG repository together with usage instructions.

cos(θ_τ) distribution
As a first example we consider nine bins of the cos(θ_τ) distribution of the branching ratio BR(B → D^{0*}τν), where θ_τ denotes the angle between the τ and the B meson in the dilepton rest frame. The kinematic distributions are generated using the flavio [23] package. With an assumed systematic uncertainty of 1% and statistical uncertainties corresponding to a yield of 700 events, our clustering procedure leads to a total of six clusters and their corresponding BPs. The clustered parameter space is shown as two-dimensional cuts in the C_T direction in figure 1 and numeric values for the BPs are reported in table 1. As can be seen in figure 1, the parameter space exhibits a strong cluster variation in the direction of C_T. This fact is not surprising considering the explicit dependence of the kinematic distribution and agrees with the findings in [56], where this "flat term" observable has been proposed in the context of charged b → c transitions involving light leptons.
The distributions corresponding to the sample points are visualised as a box plot in figure 2. As expected, different clusters correspond to significantly different distributions. Furthermore, the distributions of the benchmark points are similar to the distributions given by the means of the bin contents of all distributions of the corresponding cluster.

cos(θ_V) distribution
As a second example we consider the cos(θ_V) distribution of the process B → D^{0*}τν.
Here θ_V denotes the angle between the D^{0*} and the B meson. The kinematic distributions for this process are again generated using flavio [23]. Assuming a signal yield of 700 events and a relative systematic uncertainty of 1%, the clustering procedure leads to a total of three clusters. The clustered three-dimensional sample space is shown in figure 3 and C_T can again be identified as the most influential Wilson coefficient. A large subset of the parameter space belongs to cluster 2 (blue), whereas only a few sample points are contained in the first cluster (red), which is found at the edges of the sample space. The three BPs are reported in table 2.

Figure 3. The clustered three-dimensional parameter space resulting from dBR(B → D^{0*}τν)/d cos(θ_V) distributions. The parameter space is spanned by the three Wilson coefficients C_{V_L}, C_{S_L} and C_T, varied within their respective ranges. Three different clusters are found in our approach, which are indicated with different markers and colours. BPs are given in boldface.

Table 2. List of benchmark points for the distribution dBR(B → D^{0*}τν)/d cos(θ_V) obtained from the clustering procedure, given in terms of the left-handed vector, scalar and tensor Wilson coefficients C_{V_L}, C_{S_L} and C_T.
Compared to θ_τ, fewer clusters are found, but the shapes of the clusters are different from the previous ones. The two observables can therefore be considered complementary in their respective sensitivity to NP models.

q² distribution
The q² distribution of BR(B → D⁰τν) has already been studied extensively in the literature. For our purpose we consider this observable to study the influence of the systematic and statistical uncertainties on the resulting number of clusters. This is relevant for future experiments such as Belle II, which will reach a higher luminosity than previously attained. The q² distributions were computed using flavio [23]. In figure 5 we show the number of clusters as a function of the signal yield for various systematic uncertainties. As expected, the number of resulting clusters is dominated by the signal yield for small yields, while systematic uncertainties become the limiting factor for larger yields.
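The qualitative behaviour — yield-dominated at small yields, systematics-limited at large ones — can be reproduced with a toy scan (illustrative code only, not the analysis behind figure 5):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def n_clusters(shapes, yield_total, rel_sys, threshold=1.0):
    """Count clusters of toy normalised shapes for given uncertainties."""
    n = len(shapes)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = (shapes[i] - shapes[j]) ** 2
            # statistical variance (Poisson on the yield) plus relative systematic
            den = (shapes[i] + shapes[j]) / yield_total \
                + (rel_sys * shapes[i]) ** 2 + (rel_sys * shapes[j]) ** 2
            dist[i, j] = dist[j, i] = np.sum(num / den) / len(shapes[i])
    labels = fcluster(linkage(squareform(dist, checks=False),
                              method="complete"),
                      t=threshold, criterion="distance")
    return len(set(labels))

# Toy family of two-bin shapes: larger yields resolve smaller differences
shapes = [np.array([0.5 - s, 0.5 + s]) for s in np.linspace(0, 0.05, 11)]
counts = [n_clusters(shapes, y, 0.01) for y in (100, 1000, 10000)]
print(counts)
```

The systematic term in the denominator does not shrink with the yield, which is what eventually caps the number of distinguishable clusters.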

E_ℓ distribution
Finally, we consider E_ℓ, the energy of the light lepton from the 5-body decay B̄ → Dτ⁻(→ ℓ⁻ν̄_ℓν_τ)ν̄_τ. To generate the kinematic distributions in terms of the chosen set of Wilson coefficients, we use the explicit expressions given in [57] as further outlined in appendix B.

Figure 6. Clustering of the one-dimensional parameter space p of the observable dΓ(B̄ → Dτ⁻(→ ℓ⁻ν̄_ℓν_τ)ν̄_τ)/dE_ℓ, with E_ℓ denoting the lepton energy. The NP parameter p stems from a two-Higgs-doublet model. The clustering is performed assuming signal yields of 1000, 1800 and 2000 events as well as a relative uncertainty of 1% and leads to a total of two, three and four clusters respectively.
The study of this observable is motivated by the BaBar analysis in [1], where the experimental value of R(D) was extracted under the assumption of a two-Higgs-doublet model. Signal and background yields were extracted with a fit to the two-dimensional (m²_miss, |p*|) distribution. Here, m_miss and p* denote the mass of the undetected neutrinos and the three-momentum of the light lepton in the rest frame of the B meson. The shapes of these distributions for signal and background were taken from MC simulations. Such MC simulations are usually assumed to be SM-like. In [1], however, two different values of R(D) were obtained depending on the assumed value of the two-Higgs-doublet model parameter. This result motivates a clustering analysis to investigate the model dependency of the input kinematics.
In this paper, we consider the E_ℓ distribution, which can be taken as an approximation of the |p*| distribution considered in [1]. The results of our clustering analysis with respect to the parameter p are shown in figure 6. The one-dimensional parameter space is clustered three times, assuming signal yields of 1000, 1800 and 2000 events as well as a relative systematic uncertainty of 1%. The first sub-figure shows the two resulting clusters for a yield of 1000 events, which coincide with the findings of [1], where two different values of R(D) are obtained. However, in [1] the first value for R(D) is obtained for p ≤ 0.3 and the second one for p ≥ 0.45, whereas the first sub-figure suggests the same R(D) for 0.3 ≤ p ≤ 0.7 and another value for the rest of the parameter space.
Increasing the yield (and thereby reducing the uncertainties on the distributions) results in more clusters in the middle region (see figure 6), again indicating that the shape of the kinematic distribution significantly changes between 0.3 < p < 0.7. However, as can be seen from figure 7, the kinematics for low and high p are still very similar, incompatible with the result of [1].
However, since [1] used the m²_miss distribution together with the |p*| distribution, it is not too surprising to arrive at a rather different result, as the shape of the m²_miss distribution could behave very differently from that of |p*|. Applying clustering techniques to the two-dimensional (m²_miss, |p*|) distribution will allow for a more thorough comparison and is left for future work.

Conclusions
In this article we discussed cluster analyses of kinematic distributions. These analyses divide the parameter space governing the distribution into subsets (clusters), in which all points correspond to similar kinematic distributions. Each cluster is then reduced to its most representative point (benchmark point, BP). Analyses relying on these kinematic distributions can then be carried out for the BPs only, rather than using the full parameter space. This can drastically reduce the required computing power and makes it easier to present numerical results as well as visualisations.
The results of the cluster analyses depend on the sampling of the parameter space, the clustering algorithm and the metric measuring differences between kinematic distributions. This paper introduced the Python package ClusterKinG, which implements the above steps and allows clustering analyses to be performed without technical overhead. While it particularly aims to make clustering techniques more accessible for the High Energy Physics community, the software can also be applied to more general problems outside of particle physics. ClusterKinG is available at github.com/clusterking together with usage examples and technical documentation.

We used the ClusterKinG package to study several kinematic distributions of the decays B̄ → D^{0(*)}τ⁻ν̄_τ. The θ_τ and θ_V distributions of B̄ → D^{0*}τ⁻ν̄_τ were studied, showing the clustered parameter space, the BPs as well as the corresponding distributions. A strong dependence of the θ_τ distribution on the tensor Wilson coefficient C_T has been shown, which agrees with previous findings in the literature [56].
The influence of statistical and systematic uncertainties on the clustering result was shown on the example of the q² distribution of B̄ → D⁰τ⁻ν̄_τ. Finally we analysed the E_ℓ distribution of the 5-body decay B̄ → Dτ⁻(→ ℓ⁻ν̄_ℓν_τ)ν̄_τ. The shape of this variable is an important input for some experimental measurements of R(D). The model dependency that was observed in [1] on the example of type II two-Higgs-doublet models should also be visible when clustering the input kinematics. While not entirely consistent with the results of [1], our simplified approach correctly hints at the significant change of R(D) at tan β/m²_{H±} ≈ 0.3 GeV⁻¹. However, a full analysis including also the shape of the m²_miss distributions remains to be done in order to fully compare the results with those of [1].

Here, q² is the invariant mass of the lepton pair and the I_i^j denote the angular coefficients, which depend on q² and the Wilson coefficients. The angle θ is defined as the angle between the direction of the tau in the dilepton rest frame and the direction of the dilepton in the B rest frame, whereas θ_V is defined as the angle between the direction of the D^{0*} meson in the dilepton rest frame and the direction of the D^{0*} in the B rest frame. A general discussion of the kinematics of semileptonic meson decays (in the context of B → K*ℓℓ̄) together with the angular coefficients can be found in [58]. For B̄ → D^{0(*)}τ⁻ν̄_τ decays the corresponding distributions are discussed for example in [57,59]. Our implementation of the above decay rates can be found at github.com/clusterking/clusterking_physics.
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.