Abstract
Discrepancies between measurements of decay modes with an underlying quark-level transition \(b\rightarrow s \ell ^+\ell ^-\) and standard model (SM) predictions have persisted for several years, particularly for the muon channels. The inadequacy of the SM becomes more compelling in a global fit. For example, Capdevila et al. (JHEP 01, 093. arXiv:1704.05340, 2018) described 175 observables by six parameters encoding new physics and quantified the disagreement with the SM at about the \(5\sigma \) level. While certain one- and two-parameter fits have previously been considered in detail, we establish a framework for the detailed discussion of the full 6d fit. We visualize and quantify the 6d \(1\sigma \) region around the best fit point and define fit uncertainties for both current and future observables. We then define metrics quantifying the deviations between measurements and both SM and best fit predictions. These metrics relate observables to directions in parameter space, revealing their precise role in the fit, thus providing guidance for future theoretical and experimental work. Some metrics further quantify the role of correlated uncertainties, which turns out to be significant. For example, the relevance of angular observables such as \(P_5^\prime \) is reduced in this context. Finally, studying the space of observables allows us to discuss the internal tensions in the fit.
1 Introduction
Many measurements have been performed in recent years on decay modes with an underlying quark transition \(b\rightarrow s \ell ^+\ell ^-\), where \(\ell \) includes muons and electrons. Not surprisingly, some of these measurements show deviations from the standard model (SM) predictions by a few standard deviations. More interesting is the fact that several of these deviations from the SM appear to be ‘in the same direction’, in such a way that when quantified by a global fit the discrepancy with the SM is at just over the 5 \(\sigma \) level.
Several global fits that allow for the possibility of new physics (NP) in a model independent fashion described in the framework of effective field theory have been performed in the literature [1,2,3,4,5,6,7,8,9]. The basis for our study will be the fit of Ref. [1] which allows for NP by floating six Wilson coefficients of the effective low energy Hamiltonian responsible for the quark-level transition:
The six operators in question are
The factorization scale is taken at 4.8 GeV, so that \(m_b(\mu _b)\) is the \(\overline{\text {MS}}\) running b-quark mass at that scale, and the SM values of the Wilson coefficients are \(C^\text {SM}_{7,9,10}=-0.29,4.07,-4.31\) and \(C^\text {SM}_{7^\prime ,9^\prime ,10^\prime }=0\). New physics would be parametrized in a model-independent way by deviations in these coefficients from their SM values, \(C_{i\ell } \equiv C_i^\text {SM}+C^\text {NP}_{i\ell }\) (\(i=7^{(')},9^{(')},10^{(')}\), \(\ell =\mu \)), that is, we only treat the muon coefficients as free parameters. We assume these deviations to be real and drop any CP violating observables. We note that this fit does not include (pseudo) scalar or tensor operators.
The full fit includes all available results for the following decay channels (see appendix for a full list of the 175 observables with references):
- \(B^{(0,+)}\rightarrow K^{*(0,+)}\mu ^+\mu ^-\), \(B^{(0,+)}\rightarrow K^{*(0,+)}e^+e^-\), \(B^{(0,+)}\rightarrow K^{*(0,+)}\gamma \),
- \(B^{(0,+)}\rightarrow K^{(0,+)}\mu ^+\mu ^-\), \(B^+\rightarrow K^+e^+e^-\) (through the \(R_K\) observable),
- \(B_s\rightarrow \phi \mu ^+\mu ^-\), \(B_s\rightarrow \phi \gamma \),
- \(B\rightarrow X_s\mu ^+\mu ^-\), \(B\rightarrow X_s\gamma \) and \(B_s\rightarrow \mu ^+\mu ^-\).
Although experimental data on the baryonic decay \(\Lambda _b \rightarrow \Lambda \mu ^+\mu ^-\) is available, it is not included in the fit because for the low-\(q^2\) region the QCD factorization is poorly understood [10], while at high-\(q^2\), where a recent determination of the \(\Lambda _b\rightarrow \Lambda \) form factors from lattice QCD [11] reduces theory uncertainties, experimental errors are large [12] (see discussion in [2, 13]).
Reference [1] discussed several scenarios with one or two non-zero NP contributions at a time in detail and analysed the general scenario where NP contributions are allowed to all six muonic operators. Updating their numbers (see footnote 1), we find the best fit point (BF) to be (see footnote 2),
These coefficients lead to a \(\chi ^2\) lower than that of the SM by 39.9, indicating that the best fit differs from the SM at the level of 5 \(\sigma \). In this particular fit, the coefficients \(C^\text {NP}_{9\mu }\), \(C^\text {NP}_{9^\prime \mu }\), \(C^\text {NP}_{10\mu } \) and \(C^\text {NP}_{10^\prime \mu }\) are so labelled because they are only allowed to differ from the SM for the muons, breaking lepton flavor universality. Bearing in mind that the six coefficients identified explicitly in Eq. 3 are the only ones allowed to differ from their SM value in this study, we will proceed to drop the superscript NP and the muon label for notational simplicity.
From this starting point, our paper provides a comprehensive statistical analysis in the full six-dimensional space, by visualizing aspects of it with the aid of the grand and guided tour [14], a systematic comparison between predictions of the SM and BF point against measurements, and a quantitative analysis relating parameter directions and individual observables, based on the Hessian approximation to the \(\chi ^2\) function. Throughout this analysis we show for the first time how knowledge of correlated uncertainties is important in judging the relative importance of individual observables to the fit.
Our paper is organized as follows. In Sect. 2 we visualize the neighborhood of the best fit point, present the Hessian matrix of second derivatives at the minimum of the \(\chi ^2\) function and use it to construct twelve points characterizing the boundary of the one sigma region. These points will serve as benchmarks for our quantitative studies. We also use the Hessian to quantify the difference between the six-dimensional BF point and simple one- and two-parameter fits studied previously in the literature. In Sect. 3 we define metrics to compare the predictions of the fit at different parameter sets and apply them to all the observables in Sect. 4. In Sect. 5 we discuss future observables designed to further test lepton flavor universality and assess their potential impact on the global fit. Finally, in Sect. 6 we summarize our results and conclude.
2 The best fit point and its neighborhood
The parameter space in our problem corresponds to the set of six Wilson coefficients \(C^{}_i\) encoding physics beyond the SM as per Eq. 1. Predictions for the experimental observables, as well as their theoretical uncertainties, are functions of these six parameters. The minimization of the \(\chi ^2\) function has been performed numerically, and in fact the analytic form of the function is not known. It is thus desirable to have an approximate analytic form for this function which we construct in terms of the Hessian matrix of second derivatives of \(\chi ^2\) at the minimum. This information will form the basis of our analysis in this section.
We obtain this matrix numerically in two different ways. We first use the minimization procedure to provide us with a set of 4959 six-dimensional points near the minimum, \(S_{\chi ^2}(C_i)\), along with their \(\chi ^2\). With this set of points we construct an approximation to the \(\chi ^2\) function near its minimum and derive the Hessian from it. Alternatively, once the BF is determined, the Hessian matrix can also be computed by explicit evaluation of the corresponding second partial derivatives of the \(\chi ^2\) function with numerical routines. We verify agreement between these two numerical Hessian matrices within the uncertainty of the approximations.
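The second route, explicit numerical evaluation of the second partial derivatives, can be sketched with central finite differences. The snippet below is only a minimal illustration on a toy two-parameter quadratic \(\chi ^2\): the function names, the step size `h`, and the toy Hessian and best fit values are assumptions made for this example and are not taken from the fit of Ref. [1].

```python
def numerical_hessian(f, x0, h=1e-3):
    """Central-difference estimate of the second derivatives of f at x0."""
    n = len(x0)
    hess = [[0.0] * n for _ in range(n)]

    def shifted(i, j, si, sj):
        # Evaluate f with coordinates i and j displaced by si*h and sj*h.
        x = list(x0)
        x[i] += si * h
        x[j] += sj * h
        return f(x)

    for i in range(n):
        for j in range(i, n):
            # For i == j the two shifts add up, giving a second difference
            # with an effective step of 2h.
            value = (
                shifted(i, j, 1, 1) - shifted(i, j, 1, -1)
                - shifted(i, j, -1, 1) + shifted(i, j, -1, -1)
            ) / (4.0 * h * h)
            hess[i][j] = hess[j][i] = value
    return hess


# Toy stand-in for the chi^2 function (hypothetical numbers).
TOY_HESSIAN = [[10.0, 1.5], [1.5, 2.0]]
TOY_BEST_FIT = [0.2, -0.1]


def toy_chi2(c):
    d = [ci - bi for ci, bi in zip(c, TOY_BEST_FIT)]
    quad = sum(d[i] * TOY_HESSIAN[i][j] * d[j] for i in range(2) for j in range(2))
    return 3.0 + 0.5 * quad


hessian_estimate = numerical_hessian(toy_chi2, TOY_BEST_FIT)
```

For an exactly quadratic function the estimate reproduces the input Hessian up to rounding; for the real \(\chi ^2\) the step size would have to be tuned against numerical noise in the theory predictions.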
2.1 Visualization of the BF region
We begin by visualising the six dimensional parameter region using the sample of points \(S_{1\sigma }\) (points of \(S_{\chi ^2}(C_i)\) which are at most \(1\sigma \) away from the BF) and using the tour algorithm to obtain a sequence of 2-d projections that result in an animation of scatter plots (as shown here; see footnote 3). To produce the animation we first center all parameter directions; the centered points are then scaled with the standard deviation in each direction within \(S_{1\sigma }\), such that all directions have comparable scales (see footnote 4). We then search for appropriate projections that illustrate the separation between the BF point and the SM point. Finally, to show these projections we center the view by subtracting the mean position of the points in \(S_{1\sigma }\).
In this animation, the corresponding projection matrix for each view is shown as “axes”. Notice that the axes are centered with respect to the point cloud and therefore the origin is somewhat shifted with respect to the BF point. Watching the animation builds intuition about the features of \(S_{1\sigma }\), for example it illustrates how features observed in lower dimensional studies embed in the full space.
Most notably in this example we observe how the SM point moves away from \(S_{1\sigma }\). The animation provides a striking picture of the separation between the BF region and the SM that is mostly along the \(C^{}_9\) direction, a result that is well known in the literature. We further illustrate this with a static picture in Fig. 1, where the projection has been selected for showing the large distance between the SM and BF point. For comparison we illustrate the positions of the SM, BF and selected one and two dimensional best fits from Ref. [1] (listed in the first column of Table 1) as described in the caption. Another interesting feature that can be seen in the animation is that all these lower dimensional best fits are found roughly in the same half of \(S_{1\sigma }\).
2.2 Quadratic approximation
The Hessian matrix is the standard tool to construct a quadratic approximation to the \(\chi ^2\) function in the vicinity of the global minimum. The eigenvectors of this matrix correspond to the directions of the principal axes of the six-dimensional confidence level ellipsoids around the BF that occur in this approximation. There are twelve sets of points \(C_i\) defined by the intersections between the \(1\sigma \) confidence level hyper-ellipsoid and its principal axes that serve to gauge the behavior of the fit as one moves away from the BF within \(S_{1\sigma }\). As these points are constructed through a singular value decomposition (SVD) of the Hessian matrix, we will refer to them as the SVD points in what follows.
The Hessian at the minimum, in the basis \((C_7, C_9, C_{10},\) \(C_{7^\prime }, C_{9^\prime }, C_{10^\prime })\), is given by
In diagonal form, \(H_D=\mathrm{diag}(6621, 5647, 115.6, 72.6, 44.7, 6.1)\), the Hessian exhibits a hierarchy in its eigenvalues, corresponding to some directions being much more constrained than others. In terms of H one has
where the deviations of the six Wilson coefficients \(C_i\) from their BF values are written as the vector \((C-C^{BF})\). The SVD points are reproduced in Table 9 in the appendix along with their corresponding \(\Delta \chi ^2\) (with respect to the BF). The first column “EV” labels the SVD directions from 1 to 6 by decreasing eigenvalue; the ± refer to the two possible ways of moving along a particular direction away from the BF point. Within the quadratic approximation to the \(\chi ^2\) function, all these twelve points have \(\Delta \chi ^2=7.1\) and lie precisely \(1\sigma \) away from the BF. The exact values of the \(\chi ^2\) for these points are also shown in the table and indicate how well the approximation works in this case: they range approximately between \(0.85~\sigma \) and \(1.4~\sigma \).
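Within the quadratic approximation, the construction of the twelve SVD points is simple in the eigenbasis of H. The sketch below assumes the standard second-order Taylor form \(\Delta \chi ^2 = \frac{1}{2}(C-C^{BF})^T H (C-C^{BF})\) and uses the eigenvalues of \(H_D\) and the value \(\Delta \chi ^2=7.1\) quoted in the text. The points are expressed as displacements from the BF along the eigenvectors, because rotating them into the Wilson coefficient basis requires the eigenvector matrix, which is not reproduced here. All function and variable names are illustrative.

```python
import math

# Eigenvalues of the Hessian in diagonal form, as quoted in the text.
EIGENVALUES = [6621.0, 5647.0, 115.6, 72.6, 44.7, 6.1]
# Delta chi^2 of the 1-sigma contour for six parameters, as quoted in the text.
DELTA_CHI2_1SIGMA = 7.1


def delta_chi2(displacement, eigenvalues):
    """Quadratic approximation in the eigenbasis (assumes the 1/2 convention)."""
    return 0.5 * sum(lam * d * d for lam, d in zip(eigenvalues, displacement))


def svd_points(eigenvalues, target):
    """Return the 2N points where the principal axes cross the target contour."""
    points = {}
    for k, lam in enumerate(eigenvalues, start=1):
        half_axis = math.sqrt(2.0 * target / lam)
        for sign, label in ((1.0, "+"), (-1.0, "-")):
            point = [0.0] * len(eigenvalues)
            point[k - 1] = sign * half_axis
            points[f"{k}{label}"] = point
    return points


svd_pts = svd_points(EIGENVALUES, DELTA_CHI2_1SIGMA)
for label, point in svd_pts.items():
    print(label, [round(x, 4) for x in point])
```

The half-axis lengths scale as the inverse square root of the eigenvalues, so the hierarchy in \(H_D\) translates directly into an elongated ellipsoid.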
A comparison between \(S_{1\sigma }\) (yellow) and its quadratic approximation (blue) is shown as an animation (here). To produce this visualization we have rescaled each parameter direction to range between 0 and 1 to facilitate comparisons. Three static projections are shown in Fig. 2, where the black diamonds show the 12 SVD points, and the yellow (blue) regions are the corresponding projections of \(S_{1\sigma }\) (quadratic approximation to \(S_{1\sigma }\)). While the approximation is seen to work reasonably well in most directions, the first two views illustrate some inadequacies.
This visualization also reveals new features of the fit, for example, the third view shows the projection onto \(C^{}_{10^\prime }\) vs \(C^{}_{9^\prime }\) illustrating a strong correlation found between these two parameters.
The Hessian approximation also allows us to quantify the distance between the global best fit point, Eq. 3, and certain one or two-parameter scenarios singled out previously in the literature which can be associated with simple NP models. The comparison of these scenarios against the SM was already presented in Ref. [1] in terms of the measure Pull\(_\mathrm{SM}\), with a large value of this quantity indicating a large deviation from the SM.
This so-called Pull is a statistical measure, presented in units of Gaussian standard deviations (\(\sigma \)), quantifying the level of agreement between two different parametric hypotheses \(H_0\) and \(H_1\) (\(H_0\subset H_1\)) in describing a given data set. Let \(\chi ^2_{\text {min},H_0}\) and \(\chi ^2_{\text {min},H_1}\) be the minimum values of the \(\chi ^2\) statistic under \(H_0\) and \(H_1\), respectively (since \(H_0\) is contained in the family of parametrizations described by \(H_1\), \(\chi ^2_{\text {min},H_0}\ge \chi ^2_{\text {min},H_1}\)). For large fits, Wilks' theorem [19] guarantees that \(\Delta \chi ^2_{H_0H_1}=\chi ^2_{\text {min},H_0}-\chi ^2_{\text {min},H_1}\) will follow a \(\chi ^2\) distribution with \(n_\text {dof}=n_{\text {pars},{H_1}}-n_{\text {pars},{H_0}}\) degrees of freedom, where \(n_{\text {pars},{H_1}}\) and \(n_{\text {pars},{H_0}}\) are the numbers of free parameters characterizing the two hypotheses. Then, the Pull comparing \(H_0\) and \(H_1\) reads,
where F is the \(\chi ^2\) cumulative distribution function (CDF).
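As an illustration of the conversion from \(\Delta \chi ^2\) to a Pull, the sketch below assumes the commonly used form \(\text {Pull}=\sqrt{2}\,\mathrm{erf}^{-1}[F(\Delta \chi ^2; n_\text {dof})]\), which is equivalent to a two-sided Gaussian significance. It uses only the Python standard library. The names `chi2_cdf` and `pull_from_delta_chi2` are introduced here for the example, and the series used for the CDF is a simple choice that should be adequate for the moderate \(\Delta \chi ^2\) values discussed in the text.

```python
import math
from statistics import NormalDist


def chi2_cdf(x, ndof, tol=1e-16, max_terms=10000):
    """chi^2 CDF from the series of the regularized lower incomplete gamma function."""
    if x <= 0.0:
        return 0.0
    a, z = 0.5 * ndof, 0.5 * x
    # First term z^a e^{-z} / Gamma(a + 1), computed in log space.
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    return min(total, 1.0)


def pull_from_delta_chi2(delta_chi2, ndof):
    """Two-sided Gaussian significance equivalent to the chi^2 CDF value."""
    cdf = chi2_cdf(delta_chi2, ndof)
    p = 0.5 * (1.0 + cdf)
    if p >= 1.0:
        # Beyond double precision; a survival-function form would be needed.
        return math.inf
    return NormalDist().inv_cdf(p)


print(pull_from_delta_chi2(39.9, 6))  # six-parameter fit versus the SM
print(pull_from_delta_chi2(7.1, 6))   # 1-sigma contour in six dimensions
```

Under these assumptions, \(\Delta \chi ^2=39.9\) with six degrees of freedom evaluates to slightly above 5, in line with the significance quoted in the Introduction.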
In Table 1 we present one and two dimensional fits using the quadratic approximation to the \(\chi ^2\) function. For the sake of comparison, we analyse all the scenarios (without electronic NP contributions) already studied in Ref. [1]. For each scenario, we assess its statistical significance with respect to both the SM and the BF, introducing \(\text {Pull}_\mathrm{6D}\) in analogy with Pull\(_\mathrm{SM}\) to quantify the preference for the six-dimensional BF over the different 1D and 2D hypotheses considered.
2.3 Hessian eigenvectors and the Wilson coefficients
The SVD directions are mostly aligned with one or two of the Wilson coefficients as shown in Table 2. Thus, to a good approximation there is a simple correspondence between a given SVD direction and the \(C_i\) as given in the table. The direction corresponding to the largest eigenvalue of the Hessian is that in which the \(\chi ^2\) function changes most rapidly near its minimum and is thus most strongly constrained by the data. In this case the first two directions have similarly large eigenvalues (roughly 50 times larger than the rest) and, as can be seen from Table 2, they correspond to the parameters \(C_7\) and \(C_{7'}\) to a very good approximation. The interpretation that these two parameters are essentially fixed and there is very little room to play with them is compatible with statements made in the literature [2]. Interestingly, a combination of \(C_{10'}\) and other coefficients is the third most constrained direction. In particular, we observe a large correlation with \(C_{9'}\), see Fig. 2 (right panel), with a correlation coefficient of 0.82. Curiously, this correlation approximates the pattern that would be expected from right-handed currents.
The constraints on \(C_{9',10'}\) arise primarily from \(P_1\) and \(P_4^\prime \). As can already be seen in Table 1 of [2], both \(P_1\), in all its \(q^2\) bins, and \(P_4^{\prime }\) in the high end of the low-\(q^2\) region, are very sensitive to these two coefficients, with the sensitivity to \(C_{10'}\) generally more pronounced. As already observed in that reference, these observables are within \(1\sigma \) of the SM and thus restrict the range \(C_{9'}\) and \(C_{10'}\) can take. The coefficient \(C_{10'}\) is further constrained by measurement of \(Br(B_s\rightarrow \mu ^+\mu ^-)\), leaving little room for departures from 0.
3 Metrics for quantitative comparisons
Here we introduce several quantitative metrics to compare different sets of predictions and observations in the next section.
3.1 Comparing predictions and observations
In order to quantify the comparison between the different predictions and the experimental results we will use the following metrics:
1. The Pull, which measures the difference between the prediction T(p) for a given set of parameters p and the observation \(\mathcal{O}\) in terms of the uncertainty constructed by adding experimental and theoretical errors in quadrature and ignoring correlations (see footnote 5):
$$\begin{aligned} \mathrm{Pull}(p)_i = \frac{T(p)_i - \mathcal{O}_i}{\sqrt{\Delta _{exp,i}^2 + \Delta _{T(p)_i}^2}}. \end{aligned}$$ (7)
We will use the parameter sets, p, corresponding to the SM and the BF point in what follows.
2. The pull difference will be used to compare the BF against the SM, quantified as
$$\begin{aligned} \Delta (\mathrm{Pull})_i &= |\mathrm{Pull}(\mathrm{SM})_i| - |\mathrm{Pull}(\mathrm{BF})_i|, \\ \Delta _\sigma (\mathrm{Pull})_i &= \bigg |\sum _{j}\sigma ^{-1/2}_{ij} (T(SM) - \mathcal{O})_j\bigg | - \bigg |\sum _{j}\sigma ^{-1/2}_{ij} (T(BF) - \mathcal{O})_j\bigg |, \end{aligned}$$ (8)
where \(\sigma ^{-1/2}\) is the square root of the inverse of the full covariance matrix (including both experimental and theoretical errors evaluated at the SM point). Notice that these measures allow an explicit and systematic comparison between Pulls in the SM and BF scenarios. However, \(\Delta (\mathrm{Pull})\) will not follow a \(\chi ^2\) distribution (it measures differences in units of the total uncertainty; see footnote 6). The absolute value ensures that a positive number indicates that the BF prediction is in better agreement with the observation, and a negative value signals better agreement of the SM prediction with the observation. \(\Delta (\mathrm{Pull})\) captures the improvement in matching the observed value for the BF as compared to the SM. Notice that \(\Delta (\mathrm{Pull})\) and \(\Delta _\sigma (\mathrm{Pull})_i\) have different connotations. While \(\Delta (\mathrm{Pull})\) measures the absolute preference for the BF over the SM for a given observable, \(\Delta _\sigma (\mathrm{Pull})_i\) corresponds to the difference in conditional Pull, taking into account the correlation with other observables.
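The two definitions in Eq. 8 can be illustrated for a pair of observables. In the sketch below all numerical inputs are hypothetical and the function names are introduced only for this example. The symmetric inverse square root is computed with a closed form valid for \(2\times 2\) positive-definite matrices; a realistic application would use the full covariance matrix and a general matrix square root.

```python
import math


def pull(theory, observed, err_exp, err_th):
    """Eq. 7: deviation in units of the uncorrelated total uncertainty."""
    return (theory - observed) / math.sqrt(err_exp ** 2 + err_th ** 2)


def delta_pull(pull_sm, pull_bf):
    """First line of Eq. 8: positive values favour the BF over the SM."""
    return abs(pull_sm) - abs(pull_bf)


def inv_sqrt_2x2(cov):
    """Symmetric inverse square root of a 2x2 positive-definite matrix."""
    (a, b), (_, d) = cov
    s = math.sqrt(a * d - b * b)       # square root of the determinant
    t = math.sqrt(a + d + 2.0 * s)     # normalisation of the matrix square root
    ra, rb, rd = (a + s) / t, b / t, (d + s) / t  # entries of cov^{1/2}
    det = ra * rd - rb * rb
    return [[rd / det, -rb / det], [-rb / det, ra / det]]


def delta_pull_sigma(res_sm, res_bf, cov):
    """Second line of Eq. 8 for the residuals T - O of two observables."""
    w = inv_sqrt_2x2(cov)

    def whitened(res):
        return [abs(w[i][0] * res[0] + w[i][1] * res[1]) for i in range(2)]

    return [s - b for s, b in zip(whitened(res_sm), whitened(res_bf))]


# Hypothetical residuals (theory minus measurement) and covariance matrices.
res_sm, res_bf = [2.0, 1.5], [0.5, 0.4]
cov_uncorrelated = [[1.0, 0.0], [0.0, 1.0]]
cov_correlated = [[1.0, 0.8], [0.8, 1.0]]

print(delta_pull_sigma(res_sm, res_bf, cov_uncorrelated))
print(delta_pull_sigma(res_sm, res_bf, cov_correlated))
```

For a diagonal covariance matrix the two definitions coincide. With off-diagonal entries each whitened residual mixes both observables, which is how correlations can reduce or enhance the apparent preference for the BF.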
3.2 Examining variations in the fit within \(S_{1\sigma }\)
We turn our attention to variations in the predictions, ignoring agreement with experiment. The goal is to associate specific observables with particular SVD directions (and therefore specific NP parameters). These metrics are thus constructed to single out large contributions to \(\Delta \chi ^2\) as the parameters move away from their best fit value. Several definitions are possible and we will use the following ones:
where the index i labels the observables and the different \(\delta \)s are all calculated for the SVD directions. The different definitions have the simple interpretations:
1. \(\delta \) is a straightforward comparison between the points on the \(1\sigma \) ellipsoid and the best fit. The difference is quantified in terms of one standard deviation that ignores correlations.
2. \(\delta ^\prime \) is a variation which takes into account the different theoretical errors at different points in parameter space. Note that \(\delta ^\prime \rightarrow \delta \) if the theoretical errors are the same. The comparison between \(\delta \) and \(\delta ^\prime \) thus contains information about the variation in the theoretical errors.
3. \(\delta _{\sigma }\) is a generalization of \(\delta \) to include correlations between observables. Neglecting the dependence of the theoretical errors on the parameters, we define \(\delta _{\sigma ,i}\) such that
$$\begin{aligned} \sum _i\delta _{\sigma ,i}^2 &= \sum _i \left( \sigma _{il}^{-1/2}(T_{pt}-T_{BF})_l\right) \\ &\quad \times \left( \sigma _{ik}^{-1/2}(T_{pt}-T_{BF})_k\right) \\ &= \sum _{ij}(T_{pt}-T_{BF})_i\sigma ^{-1}_{ij}(T_{pt}-T_{BF})_j. \end{aligned}$$ (10)
We interpret this as the conditional definition of \(\delta \).
4. Finally, \(\tilde{\delta }_{\sigma }\) is a definition based on the conditional variance (see footnote 7), which turns out to be approximately equal to \(\delta _{\sigma }\) for the points we discuss below.
A table that illustrates the largest \(\delta \)’s for the different definitions is provided in Appendix B.
To provide some intuition for the interpretation of measures defined in terms of the covariance matrix, we present a short discussion of the two dimensional scenario in Appendix D.
It is important at this stage to reiterate that the fit procedure of Ref. [1], which forms the basis of our analysis, uses a single covariance matrix with theory errors evaluated at the standard model point.
3.3 Correlation between observables
The covariance matrix is an important ingredient to the global fit which, at present, mainly includes correlated theory uncertainties. These theory correlations arise, for example, from definitions in terms of common coefficients, common form factor dependencies among observables or from common hadronic parameters. There are also important experimental correlations which we include when available, but note that not much information on error correlations has been released by the experimental collaborations. The resulting correlations between observables are illustrated in Fig. 3. This figure shows, for instance, the correlation between various \(B\rightarrow K^{\star }\mu ^+\mu ^-\) branching ratios (IDs 63-73), \(F_L\) and \(A_{FB}\) observables (a table listing all observables and their corresponding ID is presented in Appendix F). This is expected since these three observables are related by \(\frac{d\Gamma }{dq^2}\), the \(q^2\) distribution of the process [20].
We see also that large \(q^2\) bins are correlated between themselves, but not so much with lower \(q^2\) bins. This appears in the correlation map, for example, observables with IDs between 63 and 73 are highly correlated except for IDs 68 and 73 which are only correlated with each other. For LHCb measurements of angular observables (IDs 15-62) experimental correlations for each bin are strongest for \(q^2 \in [2.5-4]\) (IDs 31–38, for example, stand out in the experimental correlation map). See Appendix E for separate theoretical and experimental correlation maps.
4 Quantitative comparisons
4.1 Pull
Here we compare all measurements directly to the SM and the BF respectively in the top two panels of Fig. 4. We have highlighted in red those observables with Pull greater than 2, i.e. those where the theory prediction differs from the measured value by more than \(2\sigma \). The Pull(SM) metric simply updates known results from Ref. [2]: four of the largest over-predictions of the SM occur for BR and \(R_{K^{(\star )}}\) LHCb measurements, whereas the largest under-predictions occur for \(P_{5}^\prime \) measurements by three different experiments (IDs 44, 52, 108, 128).
The top-right panel of Fig. 4 presents the Pull(BF) metric for all observables, showing overall better agreement between predictions and measured values. We note however that several tensions remain. Two of the \(P_{5}^\prime \) measurements (IDs 108 and 128) are also above the BF prediction whereas an ATLAS \(P_{4}^\prime \) (ID 127) falls below both the SM and BF predictions by more than \(2\sigma \).
In the bottom two panels of Fig. 4 we show \(\Delta (\mathrm{Pull})\) for all observables. Recall that this metric captures the improved agreement between BF prediction and measurement, as compared to the SM prediction, measured in units of the total uncertainty, and negative values signal preference for the SM. The results are shown both ignoring correlations (lower left panel) and including them (lower right panel).
The majority of the points are clustered at small values of \(\Delta (\mathrm{Pull})\) (or \(\Delta _\sigma (\mathrm{Pull})\)), indicating insignificant resolving power between the SM and the BF. The distribution shows, however, that even among these points with small \(\Delta (\mathrm{Pull})\) there is an average preference for positive values. This is of course just the statement that the global fit prefers the BF to the SM, but the distribution shows how much of this overall preference is built from many small differences that go in the same direction.
Points with values of \(|\Delta (\mathrm{Pull})|\ge 1\) (left) and \(|\Delta _\sigma (\mathrm{Pull})|\ge 0.84\) (right) are highlighted in red in Fig. 4. The particular cutoffs are rather arbitrary. For individual pulls, we have a statistical interpretation since \(\Delta (\mathrm{Pull})\) is normalised to the total uncorrelated errors. When including correlations, we follow the argument sketched in Appendix B.
\(\Delta (\mathrm{Pull})\) simply combines the two upper panels of Fig. 4 to highlight the observables where the BF is a significant improvement over the SM. This exercise shows, for example, that branching ratio measurements and \(R_K\) which have a large \(\mathrm{Pull(SM)}\) also have the largest values of \(\Delta (\mathrm{Pull})\). On the other hand, the low-\(q^2\) bin of the \(R_{K^*}\) measurement, ID 99 has a large \(\mathrm{Pull(SM)}\) but not \(\Delta (\mathrm{Pull})\), indicating that the BF does not offer a significant improvement over the SM in this case. Amongst the angular observables, \(P_5^\prime \) measurements (ID 44, 52) stand out in both \(\mathrm{Pull(SM)}\) and \(\Delta (\mathrm{Pull})\), but the latter also highlights a comparable improvement over the SM in \(P_2\), ID 49.
To include the effect of correlations, we next compare \(\Delta (\mathrm{Pull})\) to \(\Delta _\sigma (\mathrm{Pull})\). The points that are singled out as large by both definitions are:
- 68: \(10^7 \times Br(B^0 \rightarrow K^{0*}\mu \mu )\) [15-19] LHCb
- 73: \(10^7 \times Br(B^+ \rightarrow K^{+*}\mu \mu )\) [15-19] LHCb
- 92: \(10^7\times Br(B_s \rightarrow \phi \mu \mu ) \) [5-8] LHCb
- 93: \(10^7 \times Br(B_s \rightarrow \phi \mu \mu )\) [15-18.8] LHCb
- 98: \(R_K(B^+ \rightarrow K^+)\) [1-6] LHCb
- 100: \(R_{K^*}(B^0 \rightarrow K^{0*})\) [1.1-6] LHCb
- 167: \(10^7\times Br(B \rightarrow K^* \mu \mu )\) [16-19] CMS-7
Of the observables in this list, (167) is the only one that has a large preference for the SM over the BF.
There are several observables highlighted in the bottom-left panel as showing \(1\le |\Delta (\mathrm{Pull})|\le 2\) that no longer stand out when correlations are included (bottom-right panel). This is the case for measurements of \(P_5^\prime \) in the last two bins of the low-\(q^2\) region (IDs 44 and 52). The largest value, \(\Delta (\mathrm{Pull})=1.3\), for a \(P_5^\prime \) observable is found for the LHCb measurement in the bin [4,6] (ID 44). When correlations are included this drops to \(\Delta _\sigma (\mathrm{Pull})=0.75\) and indeed no \(P_5^\prime \) observables are singled out in the bottom right panel (see footnote 8). The reason is that the correlations implied by the covariance matrix for this observable are more consistent with the SM than with the BF. This information is captured and penalised in the conditional definition \(\Delta _\sigma (\mathrm{Pull})\). Similar effects are insignificant for e.g. \(R_K\), which is found to have negligible correlation with the other observables. This of course does not imply that these observables are no longer relevant to the fit, as values of \(\Delta _\sigma (\mathrm{Pull})\) are still significantly larger than zero. In other words, when calculating the differences in \(\chi ^2\) between the SM and BF point after dropping a subset of observables, we find that the smaller set of observables singled out by \(\Delta _\sigma (\mathrm{Pull})\) already accounts for the largest differences, while a smaller additional reduction in \(\Delta \chi ^2\) is observed when removing the full set singled out by \(\Delta (\mathrm{Pull})\).
Another observation that results from this comparison is that there is a systematic offset in \(\Delta (\mathrm{Pull})\) for the BR measurements in observables 1–14, which is no longer present when considering \(\Delta _\sigma (\mathrm{Pull})\), indicating that the effect is well described by the covariance matrix.
4.2 The BF-SM direction at \(1\sigma \) from the BF
The quadratic approximation of Sect. 2 permits a quick estimate of the point within \(S_{1\sigma }\) that lies in the BF-SM direction; it has parameters
After evaluating the theory predictions at this point, we find it is exactly at \(\Delta \chi ^2=8.0\) from the BF. A comparison of the predictions at this point with both the SM and BF points displays the pattern observed as one moves from the SM towards the BF as can be seen in Fig. 5.
Comparing \(|\Delta _{\mathrm{SM}-1\sigma }(\mathrm{Pull})|\) to \(|\Delta (\mathrm{Pull})|\) in Fig. 4, we generally note reduced improvement over the SM as expected, except for observables 155 and 167 (where the reduced tension removes it as an outlier from Fig. 5). Recall these are measurements of \(Br(B\rightarrow K^* \mu ^+\mu ^-)\) in the high \(q^2\) region from CMS and ATLAS respectively. This behaviour illustrates some internal tension in the fit: the measured value of 155 is best accommodated by smaller NP contributions than found in the BF point, while still showing significant deviation from the SM prediction. Considering now the Pull difference with the BF point (right panel), we see that this direction is most constrained by two LHCb measurements of \(Br(B_s\rightarrow \phi \mu ^+\mu ^-)\) (at intermediate \(q^2\), observables 91 and 92), the LHCb measurement of \(Br(B^+\rightarrow K^{+*}\mu \mu )\) at high \(q^2\) (observable 73) and of \(R_K\) (observable 98).
4.3 Fit uncertainty
Here we present a direct comparison between theory and experiment for all the observables. In addition to the usual theory and experimental errors, we include an estimate for the uncertainty in the fit. For this purpose we will follow the framework used to discuss uncertainties in global fits of parton distribution functions [21, 22]. The results are shown in Fig. 6 which shows the following for all the 175 observables listed in Appendix F:
- The experimentally measured value of the observable with its error (black).
- The SM prediction with the estimated theory error (green).
- The BF prediction with the estimated theory error (brown).
- Our estimate for the error in the fit calculated as the difference between the maximum deviations in theory predictions along one of the eigenvectors of H, within the \(1\sigma \) region (purple).
Figure 6 allows for a quick assessment of the overall picture. It is particularly interesting to study the prediction range within the \(1\sigma \) BF region (the error in the fit). For example considering observable 99 we see that the discrepancy between the measured value and theory predictions cannot be resolved anywhere within this region, while other measurements may be better explained by points within \(1 \sigma \) of the BF, for example the previously mentioned observable 155.
The observables with values \(|\delta _i|>1\) (or \(|\delta _{\sigma i}|>0.84\)) are listed separately in Table 3, showing which parameter directions are associated with large fit uncertainties. Several branching ratios, as well as \(R_{K^{(*)}}\), show large deviations in directions \(\pm 4,6\), which we saw in Table 2 refer mainly to \(C_{10}^{}\) and \(C_{9'}^{}\) respectively. Note that this is the case for observable 155, which thus points to smaller values of \(C_{10}^{}\) than found in the BF point, and negative values of \(C_{9'}^{}\). There are also a few large values along direction \(\pm 3\), which is dominated by \(C_{10^\prime }^{}\). Three \(P_1\) measurements at low \(q^2\) show large deviations along directions \(\pm 2\), which correspond mostly to \(C_{7^\prime }^{}\). When correlations are included, two \(P_2\) measurements single out direction 5, which aligns mostly with \(C_{9}^{}\). Finally, the case of \(\delta ^\pm _1\) shows that the global fit permits a much larger variation of \(C_7^{}\) than the experimental constraint from \(B\rightarrow X_s\gamma \). It is important to recall at this stage that a large \(\delta \) means that the one-sigma region around the BF contains variations in the predictions for that particular observable along the specified direction that are much larger than its corresponding experimental or theoretical errors. The corresponding observables are thus strongly constraining the fit in the given direction.
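To make the meaning of a large \(\delta \) concrete, the following minimal sketch computes the shift of a prediction between a \(1\sigma \) point and the BF in units of the combined error, and flags the directions above the cutoff \(|\delta |>1\). The functional form (experimental and theoretical errors added in quadrature, no correlations) is our reading of the description above and is an assumption here; the actual definition is the one in Eq. 9. The function names and all numbers are hypothetical and purely illustrative.

```python
import math

def delta(pred_shifted, pred_bf, sigma_exp, sigma_th):
    """Shift of a prediction between a 1-sigma point and the BF, in units of
    the combined uncorrelated error (assumed form, see Eq. 9 of the paper)."""
    return (pred_shifted - pred_bf) / math.hypot(sigma_exp, sigma_th)

def large_deltas(pred_bf, preds_shifted, sigma_exp, sigma_th, cutoff=1.0):
    """Return {direction label: delta} for directions with |delta| > cutoff."""
    flagged = {}
    for label, pred in preds_shifted.items():
        d = delta(pred, pred_bf, sigma_exp, sigma_th)
        if abs(d) > cutoff:
            flagged[label] = d
    return flagged

# Toy predictions at four of the shifted points (made-up values, not fit results).
toy_shifted = {"4+": 1.30, "4-": 0.72, "5+": 1.02, "5-": 0.97}
flagged = large_deltas(pred_bf=1.00, preds_shifted=toy_shifted,
                       sigma_exp=0.15, sigma_th=0.20)
print(flagged)
```

In this toy case only the two points along direction 4 move the prediction by more than the combined error, so a measurement of this hypothetical observable would constrain direction 4 but say little about direction 5.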
A quick scan of Table 3 reveals that there are no large \(\delta \)s in the \(P_5^\prime \) observables. Given the interest in this observable, we examine it separately in Fig. 7 where we compare the different \(q^2\) bins as they show different behaviour. There is a somewhat large variation in direction 6 in the first bin, a larger variation with direction 5 for bins [4, 6] and [6, 8] and finally direction 3 is behind most of the variation for the last bin.
4.4 Ranking observables
Absolute values of \(\delta \), or rather \(\delta ^2\), tell us how much each observable contributes to constraining a given direction in parameter space. Therefore this ranking gives an indication of how the eigendirections get constrained in the global fit, and the hierarchy in values of \(\delta ^2\) allows us to judge the relative importance of each observable. Without correlations we list the largest five values of \(\delta ^2\) for each direction in Tables 4, 5. Direction one (mostly \(C_7^{}\)), for example, is mostly constrained by \(Br(B\rightarrow X_s\gamma )\) (ID 171), as can be seen from both Table 3 and Table 4. A somewhat different picture is obtained when taking into account correlations, see Tables 6, 7, and direct comparison of the two gives indications of which observables are most sensitive to such effects.
Direction two is constrained predominantly by low \(q^2\) bin observations of \(P_1\), and direction four is dominated by the single observable 98 (LHCb measurement of \(R_K\)), especially when taking into account correlation effects. A very different picture is observed in direction three, which does not exhibit a large hierarchy in \(\delta ^2\). This indicates that it is really the combination of multiple observables that constrains this direction. Observable 98 is found to be relevant in constraining direction six, for which we find that the list of most sensitive observables is similar to that found for direction three. Direction five (mostly \(C_9^{}\)) is not especially constrained by a single observable, as indicated by the absence of a particularly large \(\delta ^{5\pm }\). The largest \(\delta ^2\) in this case occurs for observable 57, a measurement of \(P_2\) (rather than \(P_5^\prime \) as one might have expected). Moreover, observable 57 is more constraining in direction \(5+\) than in \(5-\).
The most striking difference when including correlations occurs for observable 68 (a large \(q^2\) bin measurement of \(Br(B^0\rightarrow K^{\star 0}\mu ^+\mu ^-)\) by LHCb). While it appears in the first few positions in the rankings without covariance in several directions (3, 4, 5, 6), it drops below our cutoff when including covariance. As the correlation maps show, 68 is part of a group of highly correlated observables. It is in particular strongly correlated with observables 73 and 155, which also drop out of the rankings when including correlations.
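The ranking procedure and the comparison with and without covariance can be sketched as follows. This is an illustration only: the observable labels `A`, `B`, `C` and their \(\delta \) values are invented, the function names are hypothetical, and the cutoff of five entries per direction is the only ingredient taken from the text (it is reduced to two here so that the toy example is readable).

```python
def rank_by_delta_sq(deltas, top=5):
    """deltas: {observable: {direction: delta}}.
    Returns {direction: [(observable, delta**2), ...]} sorted by decreasing delta**2."""
    directions = sorted({d for per_obs in deltas.values() for d in per_obs})
    ranking = {}
    for direction in directions:
        scored = [(obs, per_obs[direction] ** 2)
                  for obs, per_obs in deltas.items() if direction in per_obs]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ranking[direction] = scored[:top]
    return ranking

def dropped(ranking_plain, ranking_corr, direction):
    """Observables ranked without correlations but absent once they are included."""
    kept = {obs for obs, _ in ranking_corr[direction]}
    return [obs for obs, _ in ranking_plain[direction] if obs not in kept]

# Invented values: 'plain' plays the role of delta, 'corr' that of delta_sigma.
plain = {"A": {"3+": 1.5}, "B": {"3+": -1.1}, "C": {"3+": 0.4}}
corr = {"A": {"3+": 0.2}, "B": {"3+": -1.0}, "C": {"3+": 0.5}}

rank_plain = rank_by_delta_sq(plain, top=2)
rank_corr = rank_by_delta_sq(corr, top=2)
lost = dropped(rank_plain, rank_corr, "3+")
print(lost)
```

Here the hypothetical observable `A` leads the ranking when treated in isolation but falls below the cutoff once correlations are accounted for, which is the pattern described above for observable 68.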
4.5 Variance in the fit
The rankings of the previous section provide information about observables in the parameter space defined by the eigendirections. We have already seen that several observables are important in constraining multiple directions. An alternative way of looking at this information is to study which parameter combinations result in the largest variance in theory predictions. One approach is therefore to perform a principal component analysis (PCA) on the set of delta vectors. PCA is an orthogonal linear transformation onto a coordinate system such that the first basis direction is aligned with the maximum variance in the data, the second basis direction is that of maximum variance orthogonal to the first, and the remaining directions are computed sequentially in the same way. It can be used for dimension reduction as the first few principal components (PCs) capture most of the information.
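As a minimal illustration of the first step of a PCA, the sketch below centres a small data set, builds its covariance matrix and extracts the leading principal component by power iteration, together with the fraction of variance it explains. Power iteration is used here only to keep the example free of external libraries; it is not claimed to be the method used for the results of this paper. The data are a two-dimensional toy set, not \(\delta \) values from the fit, and all function names are hypothetical.

```python
import math

def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / (n - 1) for b in range(d)]
            for a in range(d)]

def leading_pc(cov, iterations=200):
    """Leading eigenvector and eigenvalue of a symmetric matrix by power iteration."""
    d = len(cov)
    v = [1.0 / (j + 1) for j in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d))
                     for a in range(d))
    return v, eigenvalue

# Toy data: large spread along (2, 1), small spread along the orthogonal (-1, 2).
toy = [[2.0, 1.0], [-2.0, -1.0], [-0.1, 0.2], [0.1, -0.2]]
cov = covariance(center(toy))
pc1, var1 = leading_pc(cov)
explained = var1 / sum(cov[j][j] for j in range(len(cov)))
print(pc1, explained)
```

The first PC comes out along \((2,1)/\sqrt{5}\) and carries about 99% of the total variance of the toy set; subsequent PCs would be obtained by repeating the procedure in the subspace orthogonal to the ones already found.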
For this we consider each observable as one data point with 12 parameters, the values of \(\delta \) in the 12 shifted points. The first two principal components, for example, provide the directions with largest variations, and plotting the data points in these projections shows which observables dominate. Different information is captured by looking at each observable in isolation (using \(\delta \)) or in the context of correlations within the global fit (using \(\delta _{\sigma }\)), and we therefore reproduce this analysis for both cases.
For the PCA analysis the data should first be centered, i.e. the mean in each direction has to be subtracted. In our case, the mean values are close to zero so the effect of centering is not very large. We find very symmetric behavior: the main difference between plus/minus directions is just the sign of \(\delta \). This means that we can fully describe the 12 dimensional distribution in the space of the first six PCs. These six remaining PCs are found to contain considerable variance in the distribution: whereas the first PC explains 31% of the variance, the sixth one explains 8% when correlations are ignored. When correlations are kept the first PC explains 20% of the variance and the sixth one explains 13%. This suggests that all six dimensions (i.e. WCs) still allow for considerable variance in the predictions of the observables (recall that this is measured relative to the errors). The full rotation matrix transforming from delta space onto the first six PCs is given explicitly in Table 8. Notice the differences between the two rotation matrices, for example \(\delta _3\) (mostly \(C^{}_{10^\prime }\)) is an important contribution to PC2 based on \(\delta \), but not relevant in the first two PCs when considering \(\delta _\sigma \) (we can already observe the large reduction of variance in that direction by comparing the rankings in \(\delta _3^2\) of Tables 4 and 6).
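The reduction from twelve to six dimensions can be made explicit in an idealised limit. As a sketch, assume exact antisymmetry, \(\delta ^{i-} = -\delta ^{i+}\) for every observable and every direction; in the fit this holds only approximately. The symbols \(A\), \(S\), \(u\), \(\lambda \) and \(N\) below are notation introduced for this illustration only. Let \(A\) be the \(N\times 6\) matrix of centred \(\delta ^{i+}\) values for the \(N\) observables, so that the centred \(N\times 12\) data matrix is \(X=(A,\,-A)\). With \(S=A^TA/(N-1)\), the covariance matrix is

\[ \Sigma = \frac{X^T X}{N-1} = \begin{pmatrix} S & -S \\ -S & S \end{pmatrix} . \]

For any eigenvector \(u\) of \(S\) with eigenvalue \(\lambda \) one finds

\[ \Sigma \begin{pmatrix} u \\ -u \end{pmatrix} = 2\lambda \begin{pmatrix} u \\ -u \end{pmatrix}, \qquad \Sigma \begin{pmatrix} u \\ u \end{pmatrix} = 0 . \]

In this limit at most six PCs carry non-zero variance, each combining a plus direction with the corresponding minus direction with opposite sign, and the remaining six have vanishing variance. Departures from exact antisymmetry populate the latter with a small but non-zero variance.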
We find that \(\delta _6\) is the only direction which exhibits strongly asymmetric behavior: for certain observables there are differences between the change in prediction in plus/minus directions, see Fig. 9 (left). This figure compares the values in the two directions of \(\delta ^6_{\sigma }\) (a similar but more crowded picture is found plotting \(\delta ^6\)), and shows as an extreme example observable 100, \(R_{K^\star }\), for which the theory prediction varies significantly along one direction but not the opposite. \(\delta ^{6-}\) is the only one of the twelve points with a large negative \(C_{9^\prime }\), and to a lesser extent \(C_{10^\prime }\).
We now focus on the first two PCs to study which directions and observables are responsible for the largest variation. To get an overview of the distribution of \(\delta \)s we show the projection of observables onto the first two principal components in Fig. 8 in the form of so-called biplots. These show the projected data points, as well as a visualisation of the projection in the form of labeled arrows pointing outwards from the center. This format makes it easy to relate directions on the projection to the original parameters.
When considering each observable in isolation (left view), clear trends can be observed. For example, observables aligned with direction \(6-, 5-\) and anti-aligned with direction \(5+\) are mainly branching ratio observations in bins of large \(q^2\) (e.g. IDs 68, 93, ...). There are differences in branching ratio observables depending on the final state: notably most observables with negative PC1 but positive PC2 correspond to decays into \(K^*\), while decays into \(K^+\) and \(K^0\) appear to take negative values in PC2 (e.g. IDs 98, 13, 14). Angular observables on the other hand show a very different behavior. For example observables 28, 41 and 44 are found to have the largest values of PC1. Large \(q^2\) bins are different, e.g. IDs 56, 57, 60, take small positive values in PC1 but large absolute values in PC2.
The picture changes drastically when considering correlations (right view), where the relevance of large \(q^2\) bins of branching ratio observables is no longer dominant. Note also the different effect that including the correlations has on the angular observables: for some of them the differences appear to be enhanced (e.g. IDs 16 and 49); yet others no longer stand out in this picture such as \(P_5'\) (IDs 44 and 60).
It is further instructive to assess the impact of covariance in a more direct fashion; to this end we introduce a difference in terms of the absolute values,
With the aid of this metric, most of the differences can be explained in two dimensions as the first two principal components capture about 80% of overall variation. We present the projection onto these first two PCs in Fig. 9 (right). This difference shows the same behaviour of BR observables already observed by comparing the results in Fig. 8. In addition we note large effects for a different group of observables, taking the largest positive values in PC1 and a range of values in PC2. At larger values of PC2 these are dominantly measurements of \(F_L\) (IDs 31, 39, 47, 148) and \(A_{FB}\) (ID 149), while a number of other observables stand out at low values of PC2.
4.6 Limitations of the Hessian approach
As discussed above and in Appendix A, the quadratic approximation is found to be a good description of the full fit, but does not capture all details (footnote 9). Most notably, asymmetries present in the exact \(\chi ^2\) function are not captured in the approximation, and this leads to deviations of up to about 30% in \(\Delta (\chi ^2)\) for some directions. As a consequence several SVD points are not exactly \(1\sigma \) away from the BF point when measured with the exact \(\chi ^2\) function.
An alternative approach would be to use the Hessian for the identification of the SVD directions, but to define the SVD points by the intersection of said directions with the exact \(1\sigma \) surface. We have explicitly verified that we reach the same conclusions with both approaches, even though detailed quantitative results will of course be slightly different. A summary of these comparisons is given below.
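The alternative construction amounts to a one-dimensional root search along each direction. The sketch below uses bisection to find the distance from the BF at which a given \(\chi ^2\) function has risen by a chosen amount. It is an illustration under stated assumptions: `toy_chi2` is an invented two-parameter function with a cubic term, not the \(\chi ^2\) of Ref. [1]; `target` stands for whatever \(\Delta \chi ^2\) defines the \(1\sigma \) surface in the fit; and the \(\chi ^2\) is assumed to increase monotonically along the ray up to `t_max`.

```python
def step_to_contour(chi2, best_fit, direction, target, t_max=10.0, tol=1e-10):
    """Distance t >= 0 along `direction` at which chi2 has risen by `target`
    above its value at `best_fit`, found by bisection."""
    chi2_min = chi2(best_fit)

    def excess(t):
        point = [p + t * v for p, v in zip(best_fit, direction)]
        return chi2(point) - chi2_min - target

    lo, hi = 0.0, t_max
    if excess(hi) < 0:
        raise ValueError("contour not reached within t_max")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Invented chi^2 with a local minimum at the origin and a cubic asymmetry in x.
def toy_chi2(c):
    x, y = c
    return x * x + 4.0 * y * y + 0.1 * x ** 3

bf = [0.0, 0.0]
t_plus = step_to_contour(toy_chi2, bf, [1.0, 0.0], target=1.0, t_max=2.0)
t_minus = step_to_contour(toy_chi2, bf, [-1.0, 0.0], target=1.0, t_max=2.0)
t_quadratic = 1.0  # quadratic estimate along x for this toy function
print(t_plus, t_quadratic, t_minus)
```

For this toy function the quadratic approximation places both points at unit distance, whereas the exact contour lies closer to the minimum in the plus direction and further away in the minus direction. This is the kind of asymmetry that the Hessian points cannot reproduce.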
The observables with \(\delta \) values above the cutoff are listed in Tables 3 and 11, where underlined (bold) indicates they are only above the cutoff when evaluating the SVD points in the approximation (exact calculation). Several differences are found, most notably along direction \(3^+\), as may be expected from the results in Table 9. In those cases values of \(\delta \) are close to the cutoff in both calculations, with differences typically smaller than 10%.
The change also affects the rankings shown in Tables 4, 5, 6 and 7, where differences in ordering when using the exact calculation to find the SVD points are given in bold. While we find the exact values of \(\delta \) in directions with the largest asymmetries to change notably (as expected), the effect in terms of hierarchies and ordering is very limited, and reordering only happens in a few instances where the absolute values of \(\delta \) are very similar in both approaches.
Finally we have also re-evaluated the PCA. We find that the projections onto the first two principal components are similar in terms of how each of the \(\delta \) directions contributes to the projection. Differences become more relevant beyond the first two principal components, since these directions carry less of the overall variance and thus are more sensitive to details. For the same reason these higher PCs are not important in the discussion.
These results confirm that the quadratic approximation is appropriate for the description of the fit to the level performed in this paper.
5 Additional observables proposed to test lepton flavor universality
Several additional observables have been proposed in the literature to test lepton flavor universality by directly comparing the distributions in modes with muons to those in modes with electrons [23]. In this section we assess their likely future impact on the global fit, pinpointing which of these best constrain each eigendirection. A list of the 48 new observables with their corresponding ID is provided in Appendix F.
We begin with a direct comparison of the theoretical predictions for the SM along with their uncertainty (green), a theoretical prediction for the six dimensional BF (brown) and the uncertainty in the fit calculated as before (purple) in Fig. 10. Estimates of the experimental sensitivities expected for measurements of Q observables have been presented by the Belle II collaboration, see [24], Table 67.
Theoretical errors are larger for predictions for the BF point compared to SM predictions. This is because long distance non-perturbative effects, one of the main sources of hadronic uncertainties, cancel in the SM. In the presence of NP that distinguishes between muons and electrons, these uncertainties get reintroduced proportionally to the size of NP. A small set of observables have a pole in their \(q^2\)-spectrum, which causes predictions within bins containing the pole to become unstable and their errors to diverge. These particular observables, which show large uncertainties, are shown separately in Fig. 10.
These two figures already indicate that most of these proposed observables will increase the constraining power of the global fit, as their uncertainties are much smaller than the current uncertainty in the fit. Of course, this will require measurements with experimental errors comparable in size to (or smaller than) the corresponding theory errors.
We can be a bit more specific by associating with each of these observables the corresponding directions which they can constrain. This is shown in Fig. 11, quantified with the metric \(\delta _i\). The most sensitive observables to each of the 12 directions are marked in red and labelled in the figure.
There are a handful of observables that stand out regarding their potential to further constrain the parameter space: these are \(Q_1\), \(B_5\) and \(B_{6s}\). As a function of Wilson coefficients, \(Q_1\) only depends on \(C_{7',9',10'}\), and is one of the most stringent tests in the search for right-handed currents. Since the current fit is poorly constrained in the \(C_{9'}\) direction, this results in large values of \(\delta ^6\) for \(Q_1\) (IDs 1, 7, 13, 19, 25) in Fig. 11. The case of \(B_{5,6s}\) is rather different. These two observables, particularly in the very low-\(q^2\) region, provide direct access to \(C_{10}\) and this shows up as large \(\delta ^4\) (IDs 5, 6, 17, 18, 47, 48) in Fig. 11. Surprisingly, some large-\(q^2\) bins of these observables also show large \(\delta ^4\), at the same level as some of their low-\(q^2\) counterparts, suggesting high sensitivity to \(C_{10}\) also in this region.
This illustrates the promising opportunities for setting more precise constraints on the Wilson coefficients (as also stated in [25]). As a caveat we note that our observations could change when correlations are included.
6 Summary and conclusions
In this paper we establish a framework to quantify the importance of individual observables in the results of a global multidimensional fit. We apply our framework to the global fit to \(b\rightarrow s \ell ^+\ell ^-\) mediated observables with the six free parameters \(C^\text {NP}_7\), \(C^\text {NP}_{7^\prime }\), \(C^\text {NP}_{9\mu }\), \(C^\text {NP}_{9^\prime \mu }\), \(C^\text {NP}_{10\mu } \) and \(C^\text {NP}_{10^\prime \mu }\) of Ref. [1] (and for notational simplicity have dropped the indices NP and \(\mu \)).
We began with a direct visualisation of the one sigma region around the BF point and its position relative to the SM in six dimensions. We then provided a quadratic approximation to the global fit in parameter space based on the Hessian matrix of second derivatives at the BF point. This construction was used to find twelve points characterizing the \(1\sigma \) contour. These 12 points were used to assess the fit uncertainty for each observable in the fit. In addition they represent \(1\sigma \) shifts along well defined directions in parameter space and are thus representative of parameter directions. In Sect. 3 we defined quantitative metrics to evaluate the relative importance of each of the 175 observables to the global fit. Section 4 presented a systematic study of these measures, comparing both the SM to the BF point and the BF point to the set of 12 representative points, thereby illustrating the interplay between observable and parameter space. Throughout this discussion we have emphasised the role of correlated errors, which we found to be important in the evaluation of single observables. Finally Sect. 5 applied the same framework to assess the likely impact of future measurements on the global fit.
The coefficient \(C^\text {NP}_{9\mu }\) is of particular interest, as it has been singled out by lower dimensional fits, always finding a large deviation from the SM in this coefficient. Indeed among the six parameters considered in the global fit, only a negative \(C^\text {NP}_{9\mu }\) presents the correct patterns for explaining some of the most striking anomalies, increasing the predictions for the \(P_{5}^\prime \) observable while reducing the predicted values of \(R_K\) and \(R_{K^*}\), and it is therefore expected to play a major role in the global fit. Our visualisation of the six dimensional BF region confirms that the cloud of points within \(1\sigma \) of the BF is clearly separated from the SM mostly along the \(C_{9\mu }^\text {NP}\) direction. At one sigma from the BF, in the direction of the SM, \(C_{9\mu }^\text {NP}\) is still 60% as large as at the BF and remains the largest of the NP parameters. The importance of \(R_K\) and \(R_{K^*}\) can be appreciated in Fig. 5, which shows that after moving \(1\sigma \) from the BF towards the SM, these observables (98, 100) still stand out: they have a large \(\Delta \)(Pull) preference for the NP point, and observable 98 in particular already shows a large Pull against this move from the BF.
Correlated uncertainties (which at present mostly comprise theory uncertainties) play an important role in the discussion. They reduce the preference for the BF over the SM for angular observables when considering conditional measures such as \(\Delta _\sigma (\mathrm{Pull})\) in place of the isolated metric Pull(SM), used exclusively in previous works. While Pull(SM) shows the largest discrepancies with the SM for BR, \(R_{K^{(\star )}}\) and \(P_{5}^\prime \) observables, we find that the preference of the BF over the SM for some of the discrepancies is less apparent with the inclusion of correlations. This is because patterns in deviations between predictions and measurements may be more/less consistent with expectations from correlated uncertainties depending on the parameter point, and this information is needed to understand the impact of each observable on the total \(\chi ^2\). While correlation effects are negligible for the considered \(R_{K^{(\star )}}\) observables, important correlations are found for the uncertainties of angular observables. In particular for the \(P_{5}^\prime \) they reduce the Pull difference between BF and SM point when measured by \(\Delta _\sigma (\mathrm{Pull})\). We further note that some of the \(P_{5}^\prime \) observables also disagree with the BF at more than the \(2\sigma \) level. As pointed out in [1], the anomaly in \(P_5^\prime \) is best described by a larger negative \(C_{9\mu }^\text {NP}\) (\(\simeq -1.8\)) than the one obtained in a global fit.
More generally our work illustrates some of the internal tensions in the fit, for example through a global discussion of the Pull for all observables. We also show for the first time a comprehensive picture of all observables entering the fit, comparing SM and BF predictions to the measured value, while also showing experimental, theoretical as well as fit uncertainties, see Fig. 6.
A main focus of this study was then to consider each of the eigendirections of the Hessian matrix to associate the most sensitive observables to corresponding directions in parameter space. For this discussion directions are labeled from most to least constrained, with each direction being dominated by one of the six considered Wilson coefficients, see Table 2. Our key findings for each direction can be summarized as follows (for ease of comparison we list ID numbers used in the figures with each observable):
- Along the most constrained direction (direction one, corresponding to \(C_{7}^\text {NP}\)), the most sensitive observable is by far \(B\rightarrow X_s\gamma \) (171), which has a much smaller error than the variation allowed by the global fit as can be seen in Fig. 6. This is confirmed with large values of the \(\delta \) measures defined in Eq. 9.
- Direction two (corresponding to \(C_{7^\prime }^\text {NP}\)) is also strongly constrained by a single observable, the low \(q^2\) measurements of \(P_1\) (16). Again, this is a feature in Fig. 6 where the fit allows a much larger uncertainty for 16 than its experimental (or theoretical) error, and confirmed by large values of \(\delta \).
- A different picture is found for direction three (mostly \(C^\text {NP}_{10^\prime \mu }\)), for which several observables show comparable sensitivities, demonstrating that the constraint accumulates from many small contributions in the same direction. Note that in this case correlations become more important, as can be seen by comparing the rankings in \(\delta \) and \(\delta _{\sigma }\), and neglecting correlated contributions may result in overestimating the sensitivity of particular observables, most notably in this case for measurements of large \(q^2\) bins of \(Br(B \rightarrow K^\star \mu \mu )\) (IDs 68 and 155).
- Direction four (mostly \(C^\text {NP}_{10\mu }\)) is particularly sensitive to the LFUV ratio \(R_{K}\) (98) and can be further probed by the proposed observables \(B_{5,6s}\). Note that within the current six dimensional fit the tension with the SM and the fit projection along this direction is about \(1\sigma \), making such future probes especially interesting.
- Direction five (which mostly corresponds to \(C_{9\mu }^\text {NP}\)) exhibits a similar behavior of not being especially constrained by a single observable. Interestingly \(P_2\) (41, 49, 57) observables are found to be most sensitive to shifts in this direction, while neither \(R_K\) (98) nor \(R_{K^\star }\) (100) appear in our rankings. Measurements of \(P_{5}^\prime \) (44, 52) are found to be somewhat less sensitive than those of \(P_2\) in our comparison, indicating that the normalisation to the uncertainties is important here.
- The least constrained direction (direction six) is mostly aligned with \(C^\text {NP}_{9^\prime \mu }\). In this case the parameter points of the \(+(-)\) directions are quite different, with \(C^\text {NP}_{9^\prime \mu } = 1.72 (-0.72)\). As a consequence we observe very different sensitivities between the ± directions, which is illustrated e.g. in the left panel of Fig. 9. Considering the fit uncertainties for future observables we find large potential of constraining this direction by measurements of \(Q_1\).
We have thus cataloged sensitivities of (current and future) observables and related them to specific directions in the six dimensional parameter space considered here, which can be used to inform future theoretical and experimental studies. Note also that the Hessian approximation given in this paper allows for quick estimates of \(\Delta \chi ^2\) close to the BF point, and the set of representative points listed in Table 9 can be used to estimate current fit uncertainties on additional observables not considered here.
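For orientation, the quick estimate referred to here is the second-order Taylor expansion of \(\chi ^2\) around the BF point, where the gradient vanishes. Written with the second derivatives explicit, and with \(\Delta C_i = C_i - C_i^{\mathrm{BF}}\) as shorthand introduced for this illustration,

\[ \Delta \chi ^2 \approx \frac{1}{2} \sum _{i,j} \Delta C_i \left. \frac{\partial ^2 \chi ^2}{\partial C_i\, \partial C_j} \right| _{\mathrm{BF}} \Delta C_j . \]

Whether the factor \(1/2\) appears explicitly or is absorbed into the matrix H depends on the normalisation adopted in its definition, so this should be checked against that definition before the matrix is used numerically. For a displacement of length \(t\) along a normalised eigenvector of the second-derivative matrix with eigenvalue \(\lambda _k\), the expression reduces to \(\Delta \chi ^2 \approx \lambda _k t^2/2\), which makes explicit that directions with small eigenvalues are the weakly constrained ones.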
Data Availability Statement
This manuscript has no associated data or the data will not be deposited. [Authors’ comment: The set of points in the 6D space was computed by the authors of Ref. [1] using a MCMC sampling. Since this sampling algorithm is private so are the data points. However, Ref. [1, 2] contain the information needed for reproducing the sampling method for obtaining an equivalent set of points.]
Notes
The experimental results used here were up to date as of February 2019.
The minimisation of the \(\chi ^2\) function leading to the BF above is performed by means of the Markov-Chain Monte Carlo Metropolis-Hastings algorithm. The small differences between the numbers quoted here and the results of Ref. [1] are a manifestation of the intrinsic error that a numerical minimisation routine always carries.
Differences in scale that describe how well each direction is constrained in the fit are thus eliminated, and they will be discussed separately below.
This Pull is defined for each observable, while the Pull quoted in Table 1 is defined in the parameter space of the six Wilson coefficients.
Alternatively, one may wish to define a measure comparing how much each measurement contributes to the total \(\chi ^2\) in each scenario, i.e. comparing quadratic Pulls rather than linear ones. We find that such a definition gives qualitatively similar results here (an explicit comparison is given in Appendix C). Since our aim is a direct comparison between SM and BF predictions and measured values (rather than statistical interpretation of the results), we will not consider such a definition here.
In the context of a multivariate Gaussian each individual variable will also be normal distributed, and the conditional variance for variable i (i.e. the variance given the values observed for the other variables) is \(1/\sigma ^{-1}_{ii}\).
This conclusion is robust with respect to details of how the covariance matrix is included in the computation of the measure, concretely we have verified that the overall picture remains the same when using a definition as given for \(\tilde{\delta }_{\sigma }\) in Eq. 9.
Note that this is expected by construction, since the \(\chi ^2\) function of [1] is defined in the linearized or Gaussian regime, see [2]. Asymmetries in the \(\chi ^2\) function are therefore induced by higher order terms in the expression of the observables as functions of the Wilson coefficients, which are expected to be small for \(|\mathcal {C}_i^\text {NP}|\lesssim 1\).
Note also that the selected basis in terms of Wilson coefficients is arbitrary, and different models will generate different patterns. While we do not consider it meaningful to map the directions to any high-scale model (they capture how parameter combinations are constrained by the data, while for modelling purposes the main interest is in the BF point) the directions are physically meaningful in how they enter predictions for the considered observables.
References
B. Capdevila, A. Crivellin, S. Descotes-Genon, J. Matias, J. Virto, Patterns of new physics in \(b\rightarrow s\ell ^+\ell ^-\) transitions in the light of recent data. JHEP 01, 093 (2018). arXiv:1704.05340
S. Descotes-Genon, L. Hofer, J. Matias, J. Virto, Global analysis of \(b\rightarrow s\ell \ell \) anomalies. JHEP 06, 092 (2016). arXiv:1510.04239
R. Alonso, B. Grinstein, J. Martin Camalich, Lepton universality violation and lepton flavor conservation in \(B\)-meson decays. JHEP 10, 184 (2015). arXiv:1505.05164
G. D’Amico, M. Nardecchia, P. Panci, F. Sannino, A. Strumia, R. Torre et al., Flavour anomalies after the \(R_{K^*}\) measurement. JHEP 09, 010 (2017). arXiv:1704.05438
W. Altmannshofer, C. Niehoff, P. Stangl, D.M. Straub, Status of the \(B\rightarrow K^*\mu ^+\mu ^-\) anomaly after Moriond 2017. Eur. Phys. J. C 77, 377 (2017). arXiv:1703.09189
W. Altmannshofer, P. Stangl, D.M. Straub, Interpreting Hints for Lepton Flavor Universality Violation. Phys. Rev. D 96, 055008 (2017). arXiv:1704.05435
L.-S. Geng, B. Grinstein, S. Jäger, J. Martin Camalich, X.-L. Ren, R.-X. Shi, Towards the discovery of new physics with lepton-universality ratios of \(b\rightarrow s\ell \ell \) decays. Phys. Rev. D 96, 093006 (2017). arXiv:1704.05446
A. Arbey, T. Hurth, F. Mahmoudi and S. Neshatpour, Hadronic and New Physics Contributions to \(B \rightarrow K^* \ell ^+ \ell ^-\). arXiv:1806.02791
M. Ciuchini, A.M. Coutinho, M. Fedele, E. Franco, A. Paul, L. Silvestrini et al., On flavourful Easter eggs for new physics hunger and lepton flavour universality violation. Eur. Phys. J. C 77, 688 (2017). arXiv:1704.05447
Y.-M. Wang, Y.-L. Shen, Perturbative corrections to \(\Lambda _b \rightarrow \Lambda \) form factors from QCD light-cone sum rules. JHEP 02, 179 (2016). arXiv:1511.09036
W. Detmold, S. Meinel, \(\Lambda _b \rightarrow \Lambda \ell ^+ \ell ^-\) form factors, differential branching fraction, and angular observables from lattice QCD with relativistic \(b\) quarks. Phys. Rev. D 93, 074501 (2016). arXiv:1602.01399
LHCb collaboration, R. Aaij et al., Differential branching fraction and angular analysis of \(\Lambda ^{0}_{b} \rightarrow \Lambda \mu ^+\mu ^-\) decays. JHEP 06, 115 (2015). arXiv:1503.07138
S. Meinel, D. van Dyk, Using \(\Lambda _b\rightarrow \Lambda \mu ^+\mu ^-\) data within a Bayesian analysis of \(|\Delta B| = |\Delta S| = 1\) decays. Phys. Rev. D 94, 013007 (2016). arXiv:1603.02974
D. Cook, A. Buja, J. Cabrera, C. Hurley, Grand tour and projection pursuit. J. Comput. Graph. Stat. 4, 155–172 (1995). https://doi.org/10.1080/10618600.1995.10474674
D. Asimov, The grand tour: a tool for viewing multidimensional data. SIAM J. Sci. Stat. Comput. 6, 128–143 (1985)
A. Buja, D. Cook, D. Asimov and C. Hurley, 14 - Computational Methods for High-Dimensional Rotations in Data Visualization, vol. 24 of Handbook of Statistics, pp. 391 – 413. Elsevier, Amsterdam (2005). https://doi.org/10.1016/S0169-7161(04)24014-7
D. Cook, E.-K. Lee, A. Buja, H. Wickham, Grand Tours, Projection Pursuit Guided Tours and Manual Controls, ch. III.2, p. 295–314. Springer Handbooks of Computational Statistics. Springer, New York (2008)
D. Cook, U. Laa, G. Valencia, Dynamical projections for the visualization of PDFSense data. Eur. Phys. J. C 78, 742 (2018). arXiv:1806.09742
S.S. Wilks, The large-sample distribution of the likelihood ratio for testing composite hypotheses. Ann. Math. Stat. 9, 60–62 (1938)
S. Descotes-Genon, T. Hurth, J. Matias, J. Virto, Optimizing the basis of \(B\rightarrow K^*ll\) observables in the full kinematic range. JHEP 05, 137 (2013). arXiv:1303.5794
J. Pumplin, D.R. Stump, W.K. Tung, Multivariate fitting and the error matrix in global analysis of data. Phys. Rev. D 65, 014011 (2001). arXiv:hep-ph/0008191
J. Pumplin, D. Stump, R. Brock, D. Casey, J. Huston, J. Kalk et al., Uncertainties of predictions from parton distribution functions. 2. The Hessian method. Phys. Rev. D 65, 014013 (2001). arXiv:hep-ph/0101032
B. Capdevila, S. Descotes-Genon, J. Matias, J. Virto, Assessing lepton-flavour non-universality from \(B\rightarrow K^*\ell \ell \) angular analyses. JHEP 10, 075 (2016). arXiv:1605.03156
Belle II collaboration, E. Kou et al., The Belle II Physics Book. arXiv:1808.10567
M. Algueró, B. Capdevila, S. Descotes-Genon, P. Masjuan, J. Matias, Are we overlooking Lepton Flavour Universal New Physics in \(b\rightarrow s\ell \ell \)?. arXiv:1809.08447
LHCb collaboration, R. Aaij et al., Differential branching fractions and isospin asymmetries of \(B \rightarrow K^{(*)} \mu ^+ \mu ^-\) decays. JHEP 06, 133 (2014). arXiv:1403.8044
LHCb collaboration, R. Aaij et al., Angular analysis of the \(B^{0} \rightarrow K^{*0} \mu ^{+} \mu ^{-}\) decay using 3 fb\(^{-1}\) of integrated luminosity. JHEP 02, 104 (2016). arXiv:1512.04442
LHCb collaboration, R. Aaij et al., Measurements of the S-wave fraction in \(B^{0}\rightarrow K^{+}\pi ^{-}\mu ^{+}\mu ^{-}\) decays and the \(B^{0}\rightarrow K^{\ast }(892)^{0}\mu ^{+}\mu ^{-}\) differential branching fraction. JHEP 11, 047 (2016). arXiv:1606.04731
LHCb collaboration, R. Aaij et al., Angular analysis and differential branching fraction of the decay \(B^0_s\rightarrow \phi \mu ^+\mu ^-\). JHEP 09, 179 (2015). arXiv:1506.08777
LHCb collaboration, R. Aaij et al., Angular analysis of the \(B^{0} \rightarrow K^{*0} e^{+} e^{-}\) decay in the low-q\(^{2}\) region. JHEP 04, 064 (2015). arXiv:1501.03038
LHCb collaboration, R. Aaij et al., Test of lepton universality using \(B^{+}\rightarrow K^{+}\ell ^{+}\ell ^{-}\) decays. Phys. Rev. Lett. 113, 151601 (2014). arXiv:1406.6482
LHCb collaboration, R. Aaij et al., Test of lepton universality with \(B^{0} \rightarrow K^{*0}\ell ^{+}\ell ^{-}\) decays. JHEP 08, 055 (2017). arXiv:1705.05802
Belle collaboration, S. Wehle et al., Lepton-Flavor-Dependent Angular Analysis of \(B\rightarrow K^\ast \ell ^+\ell ^-\). Phys. Rev. Lett. 118, 111801 (2017). arXiv:1612.05014
ATLAS collaboration, Angular analysis of \(B^0_d \rightarrow K^{*}\mu ^+\mu ^-\) decays in \(pp\) collisions at \(\sqrt{s}= 8\) TeV with the ATLAS detector. Tech. Rep. ATLAS-CONF-2017-023, CERN, Geneva (2017)
CMS collaboration, Measurement of the \(P_1\) and \(P_5^{\prime }\) angular parameters of the decay \(B^0 \rightarrow K^{*0} \mu ^+ \mu ^-\) in proton–proton collisions at \(\sqrt{s}=8\) TeV. Tech. Rep. CMS-PAS-BPH-15-008, CERN, Geneva (2017)
CMS collaboration, V. Khachatryan et al., Angular analysis of the decay \(B^0 \rightarrow K^{*0} \mu ^+ \mu ^-\) from pp collisions at \(\sqrt{s} = 8\) TeV. Phys. Lett. B 753, 424–448 (2016). arXiv:1507.08126
CMS collaboration, S. Chatrchyan et al., Angular analysis and branching fraction measurement of the decay \(B^0 \rightarrow K^{*0} \mu ^+\mu ^-\). Phys. Lett. B 727, 77–100 (2013). arXiv:1308.3409
HFLAV collaboration, Y. Amhis et al., Averages of \(b\)-hadron, \(c\)-hadron, and \(\tau \)-lepton properties as of summer 2016. Eur. Phys. J. C 77, 895 (2017). arXiv:1612.07233
Heavy Flavor Averaging Group (HFAG) collaboration, Y. Amhis et al., Averages of \(b\)-hadron, \(c\)-hadron, and \(\tau \)-lepton properties as of summer 2014. arXiv:1412.7515
LHCb collaboration, R. Aaij et al., Measurement of the \(B^0_s\rightarrow \mu ^+\mu ^-\) branching fraction and effective lifetime and search for \(B^0\rightarrow \mu ^+\mu ^-\) decays. Phys. Rev. Lett. 118, 191801 (2017). arXiv:1703.05747
Heavy Flavor Averaging Group collaboration, D. Asner et al., Averages of \(b\)-hadron, \(c\)-hadron, and \(\tau \)-lepton properties. arXiv:1010.1589
BaBar collaboration, J. P. Lees et al., Measurement of the \(B \rightarrow X_s l^+l^-\) branching fraction and search for direct CP violation from a sum of exclusive final states. Phys. Rev. Lett. 112, 211802 (2014). arXiv:1312.5364
Acknowledgements
This work was supported in part by the Australian Government through the Australian Research Council. BC acknowledges financial support from the Grant FPA 2017-86989-P and Centro de Excelencia Severo Ochoa SEV-2012-0234. We thank Dianne Cook for help with R and with visualization and Joaquim Matias, Sébastien Descotes-Genon and Ulrik Egede for useful conversations. G.V. thanks the CERN theory group for their hospitality and partial support while this work was completed.
Appendices
Intersection of the one-sigma ellipsoid with its principal axes
SVD produces the twelve points shown in Table 9. The eigenvalues are numbered in decreasing order. For comparison we list in the table the \(\chi ^2\) difference between these points and the best fit calculated numerically and using the quadratic approximation where \(\Delta \chi ^2=7.1\) exactly (this is the condition used to find the twelve points). The column \(\Delta \chi ^2_\mathrm{exact}\) lists the number extracted from the code of Ref. [1]. The last column is used below to quantify a cut-off for ‘large’ \(\delta _\sigma \).
The small differences between \(\Delta \chi ^2_\mathrm{exact}\) and 7.1 give an indication of the uncertainty in our discussion, and we note that they vary slightly between the different SVD directions. For comparison, Table 10 shows the same information for points chosen at the intersections of the SVD directions with the exact \(1\sigma \) surface.
Of course, the SVD directions are only defined in terms of the Hessian matrix, and the shape of the full \(1\sigma \) surface may differ significantly from the assumed hyper-ellipsoid. While in this case the true shape is adequately described by the approximation (see Fig. 2), some deviations occur, as illustrated in the left plot of Fig. 12 which shows direction 5 in the \(C_9 - C_{10}\) plane. In this case the asymmetry of the true \(1\sigma \) contour results in large differences (as quantified in Table 9) in \(\Delta \chi ^2\) between the quadratic approximation and the full fit. As summarised in Sect. 4.6, this does not significantly alter our conclusions.
Figure 12 (right) shows the projection onto \(C^{}_{9^\prime } - C^{}_{10^\prime }\) and illustrates the importance of the SVD directions. The correlations between parameter directions in the fit encode how combinations of parameters are constrained by the measurements. Important correlations are found between \(C^{}_{9^\prime }\) and \(C^{}_{10^\prime }\), and the shape is indeed captured by the SVD points shown in black. On the other hand, selecting the intersections of the WC directions with the \(1\sigma \) surface would result in a loss of information: the selected points would not be representative of the surface envelope (footnote 10).
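The construction of the axis points can be sketched numerically. The following is a minimal illustration and not the code of Ref. [1]: the \(2\times 2\) matrix `hessian` and the vector `best_fit` are made-up stand-ins for the six-parameter fit, the convention \(\Delta \chi ^2 = \frac{1}{2}\, d^T H d\) is an assumption, and the function name `ellipsoid_axis_points` is hypothetical. With a \(6\times 6\) Hessian the same function would return twelve points.

```python
import numpy as np

def ellipsoid_axis_points(hessian, best_fit, delta_chi2=7.1):
    """Points where the principal axes cross the surface
    0.5 * d^T H d = delta_chi2 (quadratic approximation around best_fit)."""
    hessian = np.asarray(hessian, dtype=float)
    best_fit = np.asarray(best_fit, dtype=float)
    # For a symmetric positive-definite matrix the SVD coincides with the
    # eigendecomposition; singular values are returned in decreasing order.
    u, s, _ = np.linalg.svd(hessian)
    points = []
    for k in range(len(s)):
        # Step length t along axis k such that 0.5 * s_k * t^2 = delta_chi2.
        t = np.sqrt(2.0 * delta_chi2 / s[k])
        points.append(best_fit + t * u[:, k])  # point "k+"
        points.append(best_fit - t * u[:, k])  # point "k-"
    return np.array(points)

# Toy two-parameter example (illustrative numbers only).
hessian = np.array([[4.0, 1.0], [1.0, 3.0]])
best_fit = np.array([-1.0, 0.3])
points = ellipsoid_axis_points(hessian, best_fit)

for p in points:
    d = p - best_fit
    # In the quadratic approximation every point sits at delta_chi2.
    print(p, 0.5 * d @ hessian @ d)
```

Evaluating the full \(\chi ^2\) function at these points instead of the quadratic form would give the analogue of the \(\Delta \chi ^2_\mathrm{exact}\) column.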
Samples of largest deltas
The specific value that makes a given \(\delta \) large, and thus interesting, is arbitrary. Here we compare the different definitions introduced in Eq. 9. The definitions of \(\delta \) and \(\delta ^\prime \) make it reasonable to set the interest cutoff at one, as this number corresponds to one standard deviation as measured by the total uncorrelated error. On the other hand, \(\delta _\sigma \) can be related to the calculation of \(\Delta \chi ^2\) through the quantity \(\sum _i\delta _\sigma ^2\), shown in the last column of Table 9. Since we are comparing points that lie within \(\Delta \chi ^2=7.1\) (in the quadratic or Hessian approximation), we define the cutoff for a ‘large’ \(\delta _\sigma \) in this case as \(\sqrt{0.71}\approx 0.84\). This choice singles out those observables that by themselves contribute 10% or more of the shift in \(\chi ^2\). Note that this approximation is not as good for directions \(5^+,6^\pm \), where a better cut-off might be 1. The subjectivity of this cut-off is mitigated by the rankings presented in the main text. A list of the \(\delta \)s above these cutoffs is presented in Table 11.
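The arithmetic behind the \(\delta _\sigma \) cutoff can be checked in a few lines. This is only a sketch: the list `toy` contains hypothetical values, not entries of Table 11, and the helper name `large_observables` is an assumption made for illustration.

```python
import math

DELTA_CHI2 = 7.1   # shift in chi^2 defining the points, as quoted in the text
FRACTION = 0.10    # an observable counts as 'large' if it carries >= 10% of it

# A single term delta_sigma^2 reaches 10% of 7.1 when |delta_sigma| >= sqrt(0.71).
cutoff = math.sqrt(FRACTION * DELTA_CHI2)
print(round(cutoff, 2))

def large_observables(delta_sigma, cutoff=cutoff):
    """Return (index, value, share of the sum of squares) above the cutoff."""
    total = sum(d * d for d in delta_sigma)
    return [(i, d, d * d / total)
            for i, d in enumerate(delta_sigma) if abs(d) >= cutoff]

toy = [0.2, -1.5, 0.9, 0.1, -0.5]   # hypothetical delta_sigma values
selected = large_observables(toy)
```

The share returned in the third slot is relative to the actual sum of squares, which is the more reliable measure in cases where that sum differs noticeably from 7.1.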
The columns for \(\delta \) and \(\delta ^\prime \) are nearly identical because the estimate of theoretical errors varies little between the BF and the twelve SVD points. The sole exception shown in the table, \(\delta ^{6+\prime }>1\), occurs for observable 20, for which \(\delta ^{6+}=0.57\). The difference arises from the estimate of the theoretical error, which at the BF is just over 60% larger than the corresponding estimate at the point \(6+\). Our numerical calculations imply that, with the current level of precision in the parametrization of hadronic uncertainties (mainly form factors), the theoretical error estimates have a non-trivial dependence on the NP contributions to the Wilson coefficients. These errors are computed via a multivariate Gaussian scan over all the nuisance parameters, i.e. form factors, decay constants, CKM matrix elements, etc. As explained in [2], a rescaling of the errors on the nuisance parameters is needed to ensure Gaussian behavior. The rescaling factor is most relevant for some observables such as the \(Br(B\rightarrow K^*\mu \mu )\) computed for \(q^2\) bins exceeding 8 \(\text {GeV}^2\) (ID 164), while it has almost no impact on observables that are less sensitive to form factors. We have checked explicitly that once the factor ensuring Gaussian behavior is found, the results are independent of the exact numerical choice for this rescaling. Finally, the reasonable agreement between the columns \(\delta _\sigma \) and \(\tilde{\delta }_\sigma \) confirms that the latter is a fair approximation to the former.
Quadratic differences between SM and BF Pull
In Sect. 4 we use the linear \(\Delta (Pull)\) to compare the SM and BF point Pull for each observable. As noted, we may consider a quadratic definition
which more accurately captures how each observable contributes differently to the \(\chi ^2\) function (in the absence of correlated uncertainties). Figure 13 shows \(\Delta _2(\mathrm{Pull})\) for each observable. Comparing to the corresponding results in Fig. 4, we note that the qualitative picture remains the same, but larger differences are emphasised when considering \(\Delta _2(\mathrm{Pull})\). Notably, small systematic effects seen in Fig. 4 are no longer visible in Fig. 13. In addition, observable 14 no longer stands out when considering \(\Delta _2(\mathrm{Pull})\), because the Pull is close to one for the SM prediction and close to zero for the BF prediction. In this case the details of the definition become important.
Interpretation of measures defined in terms of the covariance matrix
The definition of e.g. \(\Delta _{\sigma }\)(Pull) and similar measures is not immediately intuitive. Here we present the explicit expression in a two dimensional scenario to aid with the interpretation.
In general the variance-covariance matrix can be decomposed as
\[ \sigma _{ij} = \Sigma _{ik}\, \rho _{kl}\, \Sigma _{lj}, \]
where \(\Sigma _{ik} = diag(\sigma _1, \sigma _2, \ldots , \sigma _N)\) and \(\rho \) is the correlation matrix, i.e. with entries \(\rho _{ij}\) equal to 1 if \(i=j\) and \(\rho _{ij} = \rho _{ji} \in [-1, 1]\) otherwise. The inverse of the variance-covariance matrix is therefore given as
\[ \sigma ^{-1} = \Sigma ^{-1}\, \rho ^{-1}\, \Sigma ^{-1}, \]
where \(\Sigma ^{-1} = diag(1/\sigma _1, 1/\sigma _2, \ldots , 1/\sigma _N)\) and in general the inverse correlation matrix is given via the adjugate matrix as
\[ \rho ^{-1} = \frac{1}{\det (\rho )}\, \mathrm{adj}(\rho ). \]
In two dimensions the correlation matrix and its inverse can be written as
\[ \rho = \begin{pmatrix} 1 & \varrho \\ \varrho & 1 \end{pmatrix}, \qquad \rho ^{-1} = \frac{1}{1-\varrho ^2} \begin{pmatrix} 1 & -\varrho \\ -\varrho & 1 \end{pmatrix}. \]
Inverting the correlation matrix introduces a relative minus sign for the off-diagonal terms entering the \(\chi ^2\) function, which now takes the form
\[ \chi ^2 = \frac{1}{1-\varrho ^2} \left[ \frac{D_1^2}{\sigma _1^2} + \frac{D_2^2}{\sigma _2^2} - 2 \varrho \, \frac{D_1 D_2}{\sigma _1 \sigma _2} \right] , \tag{18} \]
where we have introduced the shorthand notation \(D_i = T_i - O_i\) for the difference between theory prediction and observed value for observable i. We can recover the uncorrelated definition by setting \(\varrho =0\). Alternatively, in the presence of correlations, the overall factor \(\frac{1}{1-\varrho ^2}\) captures the additional information content available in that case. We see that the third term in Eq. 18 will depend on the signs of the \(D_i\) relative to \(\varrho \). If both theory predictions for the considered parameter point differ from the observed value in the same direction, i.e. the \(D_i\) are either both positive or both negative, the sign of the third term will be determined by that of the correlation coefficient \(\varrho \), leading to two possible scenarios:
- \(\varrho > 0\): This indicates that the patterns in \(D_i\) are consistent with the correlation obtained by varying the nuisance parameters; as a result, the overall value of the \(\chi ^2\) function is reduced.
- \(\varrho < 0\): Negative correlation is not consistent with our assumptions about the \(D_i\); this situation will therefore result in an overall increase in the value of the \(\chi ^2\) function.
Analogous considerations hold if the \(D_i\) have opposite sign, where \(\varrho < 0\) will result in reduced values of \(\chi ^2\) and \(\varrho > 0\) will increase the value.
We now want to capture this information on observable-by-observable level, e.g. in our definition of the Pull. To avoid working with the square root of the matrix let us consider the following definition for the Pull of observable i (corresponding to using \(\tilde{\delta }_\sigma \)):
For our 2-d example, for the first observable we obtain
Indeed this results in the same behaviour observed for the \(\chi ^2\) function, i.e. for \(\varrho =0\) we recover the uncorrelated definition; otherwise the total absolute value of the Pull will be increased/decreased if the pattern in the \(D_i\) is inconsistent/consistent with that of the correlation matrix \(\rho \), with the additional contribution being weighted according to the correlation obtained via the nuisance parameters.
Notice that while the factor used in the definition of the Pull is somewhat arbitrary, here we recover the \(\chi ^2\) function as
i.e. in analogy to our definition without correlation, where \(\tilde{\sigma }_i = 1/\sqrt{\sigma _{ii}^{-1}}\) is the square root of the conditional variance. We recall that numerically the results for \(\delta _{\sigma }\), defined in terms of the square root of the covariance matrix, and those found for \(\tilde{\delta }_{\sigma }\) are similar, though the exact expression is more complicated already for the two dimensional scenario.
Finally we note that when going beyond two dimensions additional effects need to be taken into account, such as simultaneous correlation between groups of observables. While this yields somewhat more complicated expressions (obtained via the inversion of higher dimensional matrices), the overall picture is similar to that shown by the two dimensional example.
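The two dimensional statements above can be verified numerically. The sketch below uses hypothetical values for \(\sigma _i\), \(\varrho \) and \(D_i\). The per-observable quantity `pull` is an assumed form, \(\tilde{\sigma }_i \sum _j \sigma ^{-1}_{ij} D_j\), chosen because it is consistent with the properties described in the text (it reduces to \(D_i/\sigma _i\) for \(\varrho =0\) and reproduces \(\chi ^2\) when summed with weights \(D_i/\tilde{\sigma }_i\)); it should not be read as a confirmed transcription of the definition used in the paper.

```python
import numpy as np

sigma = np.array([0.5, 2.0])   # hypothetical total uncertainties
rho = 0.6                      # hypothetical correlation coefficient
D = np.array([0.8, 1.5])       # hypothetical T_i - O_i, both positive

# Covariance built as Sigma * corr * Sigma, and its inverse.
Sigma = np.diag(sigma)
corr = np.array([[1.0, rho], [rho, 1.0]])
cov = Sigma @ corr @ Sigma
cov_inv = np.linalg.inv(cov)

# Same inverse from the factorised form with the explicit 2x2 corr^-1.
corr_inv = np.array([[1.0, -rho], [-rho, 1.0]]) / (1.0 - rho**2)
cov_inv_factorised = np.diag(1.0 / sigma) @ corr_inv @ np.diag(1.0 / sigma)

# chi^2 from the matrix expression and from the three-term closed form.
z = D / sigma
chi2_matrix = D @ cov_inv @ D
chi2_closed = (z[0]**2 + z[1]**2 - 2.0 * rho * z[0] * z[1]) / (1.0 - rho**2)
chi2_uncorrelated = np.sum(z**2)

# Assumed per-observable pull built from the inverse covariance matrix.
sigma_tilde = 1.0 / np.sqrt(np.diag(cov_inv))
pull = sigma_tilde * (cov_inv @ D)
pull_1_closed = (z[0] - rho * z[1]) / np.sqrt(1.0 - rho**2)
chi2_from_pulls = np.sum(pull * D / sigma_tilde)

print(chi2_matrix, chi2_closed, chi2_uncorrelated)
print(pull[0], pull_1_closed, chi2_from_pulls)
```

With both \(D_i\) positive and \(\varrho > 0\), the correlated \(\chi ^2\) comes out smaller than the uncorrelated sum of squares, in line with the first scenario listed above; flipping the sign of `rho` or of one entry of `D` shows the opposite behaviour.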
Experimental and theoretical correlation matrices
We reproduce here the information in Sect. 3.3 but showing separately the theoretical and experimental correlations between observables, see Fig. 14.
Observables used in the fit
In Tables 12 and 13 we list the 175 observables included in the fit along with the corresponding IDs used throughout our paper. The additional observables that have been proposed in the literature are listed in Table 14.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Funded by SCOAP3.
About this article
Cite this article
Capdevila, B., Laa, U. & Valencia, G. Anatomy of a six-parameter fit to the \(b\rightarrow s \ell ^+\ell ^-\) anomalies. Eur. Phys. J. C 79, 462 (2019). https://doi.org/10.1140/epjc/s10052-019-6944-8