Constraints on tensor and scalar couplings from $B\to K\bar\mu\mu$ and $B_s\to \bar\mu\mu$

The angular distribution of $B\to K\bar\ell\ell$ ($\ell = e,\,\mu,\,\tau$) depends on two parameters, the lepton forward-backward asymmetry, $A_{\rm FB}^\ell$, and the flat term, $F_H^\ell$. Both are strongly suppressed in the standard model and constitute sensitive probes of tensor and scalar contributions. We use the latest experimental results for $\ell = \mu$ in combination with the branching ratio of $B_s\to \bar\mu\mu$ to derive the strongest model-independent bounds on tensor and scalar effective couplings to date. The measurement of $F_H^\mu$ provides a complementary constraint to that of the branching ratio of $B_s\to \bar\mu\mu$ and allows us---for the first time---to constrain all complex-valued (pseudo-)scalar couplings and their chirality-flipped counterparts in one fit. Based on Bayesian fits of various scenarios, we find that our bounds even become tighter when vector couplings are allowed to deviate from the standard model and that specific combinations of angular observables in $B \to K^*$ are still allowed to be up to two orders of magnitude larger than in the standard model, which would place them in the region of LHCb's sensitivity.


Introduction
With the analysis of the data collected by the LHCb Collaboration during run I at the Large Hadron Collider (LHC), we now have access to rather large samples of rare $B$-meson decays with branching ratios below $10^{-5}$. As a consequence, angular analyses of three- and four-body final states can be used to measure a larger number of observables than previously possible at the $B$ factories BaBar and Belle. In this work we focus on rare $B$ decays driven at the parton level by the flavor-changing neutral-current (FCNC) transition $b\to s\bar\ell\ell$ that constitutes a valuable probe of the standard model (SM) and provides constraints on its extensions.
Preprint: EOS-2015-03, FLAVOUR(267104)-ERC-107. E-mail: frederik.beaujean@lmu.de, christoph.bobeth@ph.tum.de, sjahn@mpp.mpg.de
The angular distribution of $B\to K\bar\ell\ell$, normalized to the width $\Gamma_\ell$, in the angle $\theta_\ell$ between $B$ and $\ell^-$ as measured in the dilepton rest frame is

$$\frac{1}{\Gamma_\ell}\frac{d\Gamma_\ell}{d\cos\theta_\ell} = \frac{3}{4}\left(1 - F_H^\ell\right)\sin^2\theta_\ell + \frac{1}{2}F_H^\ell + A_{\rm FB}^\ell\cos\theta_\ell . \qquad (1.1)$$

LHCb analyzed their full run I data set of 3 fb$^{-1}$ and measured the angular distribution of the mode $B^+\to K^+\bar\mu\mu$, i.e. $\ell = \mu$ [1], with unprecedented precision. They provide the lepton forward-backward asymmetry $A_{\rm FB}^\mu$ and the flat term $F_H^\mu$ in CP-averaged form and integrated over several bins in the dilepton invariant mass $q^2$. Similarly, the CP-averaged branching ratios, $\mathcal{B}^\mu = \tau_B\Gamma^\mu$ [2], and the rate CP asymmetry $A_{\rm CP}^\mu$ [3] are also available from 3 fb$^{-1}$. Both angular observables, $F_H^\ell$ and $A_{\rm FB}^\ell$, exhibit strong suppression factors for vector and dipole couplings present in the SM, thereby enhancing their sensitivity to tensor and scalar couplings [4,5]. A similar enhancement of scalar couplings compared to helicity-suppressed vector couplings of the SM is well known from $B_s\to\bar\mu\mu$. Unfortunately, the limited data set of $B\to K^*(\to K\pi)\bar\ell\ell$ from LHCb [6] did not yet allow a full angular analysis without the assumption of vanishing scalar and tensor couplings in this decay mode. In the future, with more data or special-purpose analysis techniques like the method of moments [7], certain angular observables in $B\to K^*\bar\ell\ell$ will provide additional constraints on such couplings, for example $J_{6c}$ [8] and the linear combinations $(J_{1s} - 3J_{2s})$ and $(J_{1c} + J_{2c})$ [5,9], as well as the experimental test of the relations involving $H_T$ and $J_7 = 0$ [5] at low hadronic recoil.
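As a quick numerical illustration of how the two parameters enter, the following minimal sketch assumes the standard one-dimensional form of the normalized distribution, $\frac{3}{4}(1-F_H)(1-\cos^2\theta_\ell) + \frac{1}{2}F_H + A_{\rm FB}\cos\theta_\ell$. It checks that this form integrates to one and that the forward-minus-backward difference returns $A_{\rm FB}$. The function names and the input values are hypothetical and chosen only for illustration; they are not measurements.

```python
def angular_pdf(cos_theta, f_h, a_fb):
    """Normalized B -> K l l distribution in cos(theta_l) (assumed standard form)."""
    return 0.75 * (1.0 - f_h) * (1.0 - cos_theta**2) + 0.5 * f_h + a_fb * cos_theta


def integrate(func, lo, hi, steps=20000):
    """Composite midpoint rule; sufficient for a quadratic polynomial."""
    width = (hi - lo) / steps
    return sum(func(lo + (i + 0.5) * width) for i in range(steps)) * width


# Illustrative (made-up) parameter values, not experimental results.
F_H, A_FB = 0.2, 0.05


def pdf(cos_theta):
    return angular_pdf(cos_theta, F_H, A_FB)


# Total probability over cos(theta_l) in [-1, 1].
norm = integrate(pdf, -1.0, 1.0)
# Forward (cos > 0) minus backward (cos < 0) fraction.
forward_backward = integrate(pdf, 0.0, 1.0) - integrate(pdf, -1.0, 0.0)
print(norm, forward_backward)
```

The even terms cancel in the forward-backward difference, so only the term linear in $\cos\theta_\ell$ survives, while $F_H$ controls the constant offset of the distribution.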
Here we exploit current data from $B^+\to K^+\bar\mu\mu$ and $B_s\to\bar\mu\mu$ to derive stronger constraints than before on tensor and scalar couplings in various model-independent scenarios and study their impact on the not-yet-measured sensitive observables in $B\to K^*\bar\ell\ell$. In Section 2, we specify the effective theory of $|\Delta B| = 1$ decays on which our model-independent fits are based. Within this theory, we discuss the dependence of observables in $B\to K\bar\ell\ell$ and $B\to K^*\bar\ell\ell$ on the tensor and scalar couplings in Section 3 and also specify the experimental input used in the fits. The constraints on tensor and scalar couplings from the data are presented for several model-independent scenarios in Section 4. Technical details of the angular observables in $B\to K^*\bar\ell\ell$, the branching fraction of $B_s\to\bar\mu\mu$, the treatment of theory uncertainties, and the Monte Carlo methods used are relegated to appendices.
arXiv:1508.01526v1 [hep-ph] 6 Aug 2015

Effective Theory
In the framework of the $|\Delta B| = |\Delta S| = 1$ effective theory (2.1), the operator basis comprises dipole and vector operators as well as scalar (2.3) and tensor (2.4; $i = T, T5$) operators, where the notation $O_{T5}^\ell = \frac{i}{2}\,\varepsilon^{\mu\nu\alpha\beta}\,[\bar s\sigma_{\mu\nu}b][\bar\ell\sigma_{\alpha\beta}\ell]$ is also used frequently in the literature. The respective short-distance couplings, the Wilson coefficients $C_i^{(\ell)}(\mu_b)$, are evaluated at a scale of the order of the $b$-quark mass, $\mu_b\sim m_b$, and can be modified from SM predictions in the presence of new physics.
The SM values $C_{7,9,10}^{(\ell)}$ are obtained at next-to-next-to-leading order (NNLO) [10,11] and depend on the fundamental parameters of the top-quark and $W$-boson masses, as well as on the sine of the weak mixing angle. Moreover, they are universal for the three lepton flavors $\ell = e, \mu, \tau$. All other Wilson coefficients are numerically suppressed or zero: $C_{7'}^{\rm SM} = (m_s/m_b)\,C_7^{\rm SM}$, $C_{S,S',P,P'}^{\ell,\rm SM}\sim m_b m_\ell/m_W^2$, and $C_{9',10',T,T5}^{\ell,\rm SM} = 0$. The Wilson coefficients of the four-quark current-current and QCD-penguin operators as well as of the chromomagnetic dipole operators are set to their NNLO SM values at $\mu_b = 4.2$ GeV [10,11].
For the rest of this article, we will suppress the lepton-flavor index on the Wilson coefficients, $C_i^\ell\to C_i$, and operators, $O_i^\ell\to O_i$. In Section 4 we exploit data with $\ell = \mu$ only; hence all derived constraints apply in principle only to the muonic case but can be carried over to the other lepton flavors $\ell = e, \tau$ for NP models that do not violate lepton flavor. In general, the Wilson coefficients are decomposed into SM and NP contributions, $C_i = C_i^{\rm SM} + C_i^{\rm NP}$, but often we will use $C_i$ for Wilson coefficients with zero (or suppressed) SM contributions synonymously with $C_i^{\rm NP}$.

Observables and experimental input
The full dependence of $F_H^\ell$ and $A_{\rm FB}^\ell$ on tensor and scalar couplings has been presented in [4,5], adopting the effective theory (2.1), i.e. neglecting higher-dimensional operators with dim $\geq 8$. These results imply that for SM values of the effective couplings both observables are strongly suppressed. Hence, for $\ell = e, \mu$ both observables are quasi-null tests. The flat term $F_H^\ell(q^2)|_{\rm SM}$ is strongly suppressed by small lepton masses for the considered kinematic region $1\leq q^2\leq 22$ GeV$^2$ [4,12,13]. Nonzero values of $A_{\rm FB}^\ell|_{\rm SM}$ can be induced by higher-order QED corrections, which will modify the simple $\cos\theta_\ell$ dependence of the angular distribution (1.1).

Fig. 1 The sensitivity to the tensor coupling Re$(C_T)$ of $F_H^\mu$ and $\mathcal{B}^\mu$ in $B^+\to K^+\bar\mu\mu$ as well as $\mathcal{B}^\mu$, $(J_{1c} + J_{2c})$, and $(J_{1s} - 3J_{2s})$ in $B^0\to K^{*0}\bar\mu\mu$. Angular observables are rescaled by the lifetime of the $B$ meson, $\tau_B$. The bands represent the theory uncertainties at 68% and 95% probability of the prior predictive. Two sets of bands are shown for $C_9^{\rm NP} = 0$ (blue) and $C_9^{\rm NP} = -1.1$ (red). If available, the gray band indicates the latest 68% confidence interval reported by LHCb. All observables are integrated over $q^2\in[q_1^2, q_2^2]$ bins, denoted as $\langle\ldots\rangle_{[q_1^2, q_2^2]}$, to match LHCb.
There are some angular observables $J_i$ in $B\to K^*(\to K\pi)\bar\ell\ell$ with the same properties; i.e. tensor and scalar contributions are kinematically enhanced by a factor $\sqrt{q^2}/m_\ell$ over vector ones present in the SM or their respective interference terms. These are $J_{6c}$ and the two linear combinations $(J_{1s} - 3J_{2s})$ and $(J_{1c} + J_{2c})$ with explicit formulas given in Appendix A. In our fits and predictions we include all kinematically suppressed terms, but for the purpose of illustration we now consider the analytical dependence for vanishing lepton mass. In this limit, $J_{6c}$ is sensitive to the interference of tensor and scalar operators, i.e., it resembles $A_{\rm FB}$ (3.3) with an interchange of tensor contributions $T\leftrightarrow T5$. We note also that $J_{6c}$ contributes to the lepton forward-backward asymmetry of $B\to K^*\bar\ell\ell$, which is $\propto (J_{6s} + J_{6c}/2)$. Since it has to compete with $J_{6s}$ in this observable, a separate measurement of $J_{6s}$ and $J_{6c}$ is necessary. Only tensor contributions enter $(J_{1s} - 3J_{2s})$, with several distinct kinematic and form-factor dependencies, but both tensor and scalar contributions enter $(J_{1c} + J_{2c})$, which is similar to the dependence of $F_H$ in $B\to K\bar\ell\ell$ (3.6).
Concerning $F_H$, the involved kinematic factors (see [4,5]) are such that tensor and scalar couplings contribute only constructively/cumulatively, apart from cancellations among $C_{S(P)}$ and $C_{S'(P')}$. Interference terms in the numerator of $F_H$ of the form $(C_T\times C_{7,7',9,9'})$ and $(C_{P,P'}\times C_{10,10'})$ are suppressed by $m_\ell/\sqrt{q^2}$. They become numerically relevant in case $C_T\ll C_{7,7',9,9'}$ or $C_{P,P'}\ll C_{10,10'}$, where the smallness of $C_{T,P,P'}$ is of the same level as the suppression factor $m_\ell/\sqrt{q^2}$ accompanying the large vectorial SM Wilson coefficients $C_{9,10}^{\rm SM}\sim\pm 4$. This implies, however, no large enhancement of $F_H$ over the SM prediction.
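To get a feeling for the size of the kinematic suppression factor $m_\ell/\sqrt{q^2}$ discussed here, the following minimal sketch evaluates it for the three charged leptons at a few values of $q^2$. The lepton masses are rounded literature values and the chosen $q^2$ points are illustrative assumptions; the function name is hypothetical.

```python
import math

# Charged-lepton masses in GeV (rounded literature values, assumed here).
LEPTON_MASS_GEV = {"e": 0.000511, "mu": 0.10566, "tau": 1.77686}


def mass_suppression(lepton, q2):
    """Return m_l / sqrt(q^2) for a dilepton invariant mass squared q2 in GeV^2."""
    return LEPTON_MASS_GEV[lepton] / math.sqrt(q2)


# Illustrative q^2 values (GeV^2) spanning the low- and high-q^2 regions.
for q2 in (1.0, 6.0, 16.0, 22.0):
    factors = {name: round(mass_suppression(name, q2), 4) for name in LEPTON_MASS_GEV}
    print(q2, factors)
```

For muons the factor is a few percent over most of the range, which is why interference terms with the large vector couplings only matter once the tensor or pseudoscalar couplings are themselves small.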
On the one hand, the observables $F_H$ (3.6) and $A_{\rm FB}$ (3.3) are measured in the angular distribution (1.1) of $B\to K\bar\ell\ell$ normalized to the decay width $\Gamma_\ell$ such that uncertainties due to form factors can cancel in part [4,5]. On the other hand, $J_{6c}$, $(J_{1s} - 3J_{2s})$, and $(J_{1c} + J_{2c})$ appear in the unnormalized angular distribution of $B\to K^*\bar\ell\ell$. "Optimized" versions $S_1$, $M_1$, and $M_2$ for the low-$q^2$ region, for which form factors cancel in the limit of $m_b\to\infty$, have been identified in [9]. For the high-$q^2$ region, potential normalizations are discussed in Appendix A, which could serve to form optimized observables for special scenarios of either vanishing chirality-flipped vector or tensor or scalar couplings. In the most general case, however, there are no optimized observables at high $q^2$. Although form factors do not cancel in this case, it might still be preferable to use normalizations, for example when the overall normalization of $B\to K^*$ form factors constitutes a major theoretical uncertainty.
To illustrate the sensitivity of $F_H$ to tensor couplings, we compare it in Figure 1 to the branching ratios of $B\to K^{(*)}\bar\ell\ell$ for $\ell = \mu$, integrated over one low-$q^2$ and one high-$q^2$ bin. The details of the numerical input and the uncertainty propagation can be found in Appendix B and Appendix C. In light of the hint of new physics in $C_9$ from recent global analyses of $b\to s(\gamma, \bar\ell\ell)$ data [26][27][28][29][30], we show predictions for $C_9^{\rm NP} = -1.1$ in addition to $C_9^{\rm NP} = 0$. Figure 1 shows that the highest sensitivity to tensor couplings of any $B\to K$ observable is attained by $F_H^\mu$ at high $q^2$ due to a partial cancellation of form factors [5]. If the experimental uncertainty could be reduced further, $F_H^\mu$ would give a very strong constraint on a simultaneous negative shift in $C_9^{\rm NP}$ and $C_T$. The prediction of $\mathcal{B}(B\to K\bar\mu\mu)$ is essentially insensitive to $C_T$ but sensitive to $C_9^{\rm NP}$. A stronger impact on global fits, however, would require a reduced theory uncertainty.
The observable $\mathcal{B}(B\to K^*\bar\mu\mu)$ shows moderate dependence on $C_T$, at least at low $q^2$, and has some impact on the constraints on tensor couplings as will be discussed in Section 4. At the moment, theory and experimental uncertainties are of similar size.
$(J_{1c} + J_{2c})$ is sensitive to $C_T$ in both $q^2$ regimes. At low $q^2$, it is mildly affected by $C_9^{\rm NP}$, whereas at high $q^2$ it is unaffected. Regarding $(J_{1s} - 3J_{2s})$, the situation is reversed: here the strong dependence on $C_T$ appears at low $q^2$. Overall, $F_H^\mu$, $(J_{1c} + J_{2c})$ at high $q^2$, and $(J_{1s} - 3J_{2s})$ are sensitive to $C_T$ and theoretically very clean around $C_T = 0$.
From the available measurements, $F_H^\mu$ at high $q^2$ currently provides the most stringent constraints on the size of tensor couplings. Moreover, the dependence on vector couplings is such that $C_9^{\rm NP} < 0$ leads to stronger constraints on $C_T$ than $C_9^{\rm NP}\approx 0$. Important additional constraints on scalar couplings come from the branching ratio of $B_s\to\bar\mu\mu$ as given in (A.8). It provides the most stringent constraints on the moduli $|C_S - C_{S'}|$ and $|C_P - C_{P'}|$ and further depends only on $(C_{10} - C_{10'})$. Thus it is complementary to $F_H$ in $B\to K\bar\ell\ell$; see Eq. (3.6).

Table 1 List of all observables of the various $b\to s\bar\mu\mu$ decays entering the fits with the respective kinematics and experiments that provide the measurements. LCSR and lattice results of $B\to K^{(*)}$ form factors are used to constrain a $q^2$-dependent form-factor parametrization. For more details see Section 3 and Appendix B.
Columns: Channel; Constraints; Kinematics; Source. Surviving entry: kinematics $[14.18, 16]$, $[> 16]$ GeV$^2$; source [20].
Finally, we also explore the effect of interference with NP contributions in the vector couplings $C_{9,9',10,10'}$ on the bounds on tensor and scalar couplings. For this purpose we also include the branching ratio, the lepton forward-backward asymmetry, and the rate CP asymmetry of $B\to K^*\bar\mu\mu$, as they provide additional constraints on the real and imaginary parts of $C_{9,9',10,10'}$. The experimental input of all observables entering our fits is listed in Table 1 together with input for the $B\to K^{(*)}$ form factors. More details on the latter can be found in Appendix B.

Fits and constraints
There are no discrepancies between the latest measurements for $\ell = \mu$ (throughout this section) of $F_H^\mu$ and $A_{\rm FB}^\mu$ in $B^+\to K^+\bar\mu\mu$ and their tiny SM predictions; cf. Figure 1. Thus our main objective is to derive constraints on tensor and scalar couplings through the enhanced sensitivity of both observables to these couplings compared to vector couplings. For this purpose, we will consider several model-independent scenarios, progressing from rather restricted to more general ones in order to assess the effect of cancellations due to interference of various contributions.
For each coupling that we vary in a fit, we remain as general as possible, treat it as a complex number, and use the Cartesian parametrization, assuming uniform priors for ease of comparison with previous studies. Specifically, we set

$$\mathrm{Re}(C_{S,S',P,P',T,T5})\in[-1, 1],\qquad \mathrm{Re}(C_{9,9',10,10'})\in[-7, 7], \qquad (4.1)$$

and the same for the imaginary parts. The priors of the nuisance parameters are given in Appendix B.
We start with the scenario of only tensor couplings and see that they are well constrained by $F_H^\mu$ alone. In a second scenario we consider only scalar couplings in order to investigate the complementarity of $F_H^\mu$ and $B_s\to\bar\mu\mu$. Here we find that, for the first time, all complex-valued scalar couplings can be bounded simultaneously by the combination of both measurements. Third, we consider as a special scenario the SM augmented by dimension-six operators as an effective theory of new physics below some high scale $\Lambda_{\rm NP}$ assumed much larger than the typical scale of electroweak symmetry breaking. In addition, the model contains one scalar doublet under $SU(2)_L$ as in the SM. For each scenario, we also investigate interference effects with new physics in vector couplings $C_{9,9',10,10'}$. Finally, we conclude this section with posterior predictions, conditional on all experimental constraints, of the probable ranges of the not-yet-measured angular observables $J_{6c}$, $(J_{1c} + J_{2c})$, and $(J_{1s} - 3J_{2s})$.

Tensor couplings
In a scenario with only complex-valued tensor couplings $C_{T,T5}$, the experimental measurement of $F_H^\mu$ constrains the combination $|C_T|^2 + |C_{T5}|^2$, up to some small interference of $C_T$ with vector couplings $C_{7,7',9,9'}$; cf. (3.6). The corresponding 68% (95%) 1D-marginalized probability intervals are listed in the second column of Table 2. From the third column, it is seen that the constraints become tighter when utilizing all observables in Table 1, owing to the sensitivity of the branching ratio of $B\to K^*\bar\mu\mu$ to tensor couplings (see also Figure 1). The latter stronger bounds are driven by the new lattice results of $B\to K^*$ form factors that predict values above the measured ones [31]. Since tensor couplings contribute constructively to $\mathcal{B}(B\to K^*\bar\mu\mu)$, large values are better constrained. In this scenario with vanishing scalar couplings, current measurements of $A_{\rm FB}^\mu(B^+\to K^+\bar\mu\mu)$ barely provide any constraint; cf. (3.3).
We also perform a fit with nonzero $C_{9,10}^{\rm NP}$ in order to assess the robustness of the bounds with respect to interference. Note that $C_{7,7'}$ appears in linear combinations with $C_{9,9'}$ such that its interference with tensor and scalar couplings is captured implicitly by allowing new physics in $C_{9,9'}$. Thus we fix $C_{7,7'}$ to the SM value without loss of generality. In this case, $F_H^\mu$ by itself still provides bounds on $|C_{T,T5}|$ that are weakened by a factor of two since $F_H^\mu$ does not pose constraints on $C_{9,10}$ (in the chosen prior range). Once additional experimental measurements of Table 1 are taken into account, the potential destructive effects of new physics in $C_{9,10}$ become reduced and almost the same constraints on $C_{T,T5}$ are recovered, as shown in the last column in Table 2. If in addition we allow $C_{S,S',P,P'}\neq 0$ (not shown in Table 2), the credible regions further shrink by about 10%, which we attribute to the cumulative effect of $C_{S,S',P,P'}\neq 0$ in $F_H$; cf. (3.6). In summary, the $F_H^\mu$ measurement [1] of LHCb with 3 fb$^{-1}$ shrinks the previous bounds [5] on $C_{T,T5}$ by roughly 50%.
A keen observer may notice that in Table 2 the SM point $C_{T,T5}\equiv 0$ is contained in every 68% region in Cartesian coordinates but never in even the 95% region in polar coordinates (although the bin with lower edge $C_{T,T5} = 0$ is always in the 99% region). This is a consequence of the general concentration of measure. Another way to look at it is to transform the uniform prior density from Cartesian to polar coordinates. For the example of a single Wilson coefficient, say $C_T$, the transformed density is proportional to the determinant of the Jacobian, which is $|C_T|$. Since the Cartesian prior boundaries are much larger than the regions of high likelihood, one could think the value of the boundary is irrelevant, but in fact it determines the peak of the prior in polar coordinates. In other words, the uniform prior on Re$(C_T)$ and Im$(C_T)$ favors larger values of $|C_T|$ even though we consider it consensus in the community that smaller rather than larger values are reasonable because $C_T = 0$ in the SM. We suggest therefore that the default treatment be revised in the future to include available prior knowledge.
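The effect of the Jacobian can be made concrete with a small Monte Carlo sketch. It draws Re$(C_T)$ and Im$(C_T)$ uniformly from a square prior, here assumed to be $[-1, 1]^2$ as in (4.1), and compares the prior probability of $|C_T| < r$ with the analytic ratio of disc to square area, $\pi r^2/4$ for $r\leq 1$. The function names, sample size, and seed are hypothetical choices for illustration only.

```python
import math
import random


def fraction_within_radius(radius, half_width=1.0, samples=200_000, seed=1):
    """Monte Carlo share of a uniform square prior that has |C| < radius."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        re = rng.uniform(-half_width, half_width)
        im = rng.uniform(-half_width, half_width)
        if math.hypot(re, im) < radius:
            hits += 1
    return hits / samples


def analytic_fraction(radius, half_width=1.0):
    """Disc area divided by square area; valid for radius <= half_width."""
    return math.pi * radius**2 / (2.0 * half_width) ** 2


for r in (0.1, 0.3, 0.5):
    print(r, fraction_within_radius(r), analytic_fraction(r))
```

Because the enclosed prior mass grows like $r^2$, the implied density in $|C_T|$ rises linearly with $|C_T|$, so very little prior probability sits near the SM point $|C_T| = 0$.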

Scalar couplings
Scalar couplings $C_{S,S',P,P'}$ enter $F_H$ without kinematic suppression, see (3.6), as the sum $(C_i + C_{i'})$, whereas in the time-integrated branching ratio $\mathcal{B}(B_s\to\bar\ell\ell)$ they appear as the difference $(C_i - C_{i'})$, $i = S, P$. Since the existing measurement of $F_H^\mu$ constrains the sum, the combination of $F_H^\mu$ and $\mathcal{B}(B_s\to\bar\mu\mu)$ allows us, for the first time, to bound the real and imaginary parts of all four couplings. The corresponding 2D-marginalized regions in the Re$(C_i\pm C_{i'})$ ($i = S, P$) planes are shown in Figure 2. The corresponding plots for Im$(C_i\pm C_{i'})$ are very similar to those shown and thus omitted. These bounds do not change when including all other data in Table 1, since $A_{\rm FB}^\mu(B\to K\bar\mu\mu)$ requires interference of scalar with tensor couplings and other observables are not very sensitive to scalar couplings. Quantitatively, the constraint from $\mathcal{B}(B_s\to\bar\mu\mu)$ on $(C_i - C_{i'})$ is about a factor four to five stronger than the one of $F_H^\mu$ on $(C_i + C_{i'})$.
Interference terms of $C_{P,P'}$ with vector couplings might weaken these bounds. For $\mathcal{B}(B_s\to\bar\mu\mu)$, the relevant term is $(C_{10} - C_{10'})$ (see (A.7)) and for $F_H$ it is $(C_{10} + C_{10'})$ [4]; they are suppressed by the factors $m_\mu/M_{B_s}$ and $m_\mu/\sqrt{q^2}$, respectively. Nevertheless, these terms become important for small $C_{P,P'}$ due to the large SM value of $C_{10}^{\rm SM}\simeq -4.2$.

Fig. 2 [...] Table 1 as well as nonzero $C_{9,9',10,10'}$ (red solid) at 68% (darker) and 95% (lighter) probability. The constraints on Re$(C_P\pm C_{P'})$ are identical to Re$(C_S\pm C_{S'})$, apart from a small translation of the contours by $(+0.2, +0.15)$. The SM prediction is indicated by the black diamond.

Table 3 The 1D-marginalized constraints on complex-valued $C_{S,S',P,P'}$ at 68% (95%) probability from measurements of only $F_H^\mu$, only $\mathcal{B}(B_s\to\bar\mu\mu)$, and all the data in Table 1 and additional new physics contributions to $C_{9,9',10,10'}$.

We compile bounds on complex-valued scalar couplings in Table 3. Neither $F_H^\mu$ nor $\mathcal{B}(B_s\to\bar\mu\mu)$ alone bounds all four couplings; however, their combination is able to do so and, moreover, the bounds are stable against destructive interference with vector couplings. In the case of $C_{9,9',10,10'}^{\rm NP}\neq 0$, we find the bounds (4.2) with 68% (95%) probability. Allowing in addition $C_{T,T5}\neq 0$, the intervals of (4.2) are quite similar but in general (10-20)% narrower and shifted by that amount towards zero. Again, this can be explained by the cumulative effect of $C_{S,S',P,P'}$ and $C_{T,T5}$ in $F_H$ shown in (3.6).
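The complementarity of sum and difference can be illustrated with the triangle inequality: since $C_i = \frac{1}{2}[(C_i + C_{i'}) + (C_i - C_{i'})]$, limits $|C_i + C_{i'}| < a$ and $|C_i - C_{i'}| < b$ imply $|C_i| < (a + b)/2$, and the same for $|C_{i'}|$. The sketch below applies this to the rounded limits quoted in the Conclusions. It is only a rough illustration under that assumption: combining two probabilistic limits in this way does not yield a credible interval of the same probability, and the function name is hypothetical.

```python
def individual_bound(sum_bound, diff_bound):
    """Triangle-inequality limit on |C_i| and |C_i'| from limits on sum and difference."""
    return 0.5 * (sum_bound + diff_bound)


# Rounded limits on |C_i + C_i'| and |C_i - C_i'| as summarized in the Conclusions.
ROUNDED_LIMITS = {"68%": (0.3, 0.1), "95%": (0.6, 0.2)}

for label, (sum_bound, diff_bound) in ROUNDED_LIMITS.items():
    print(label, individual_bound(sum_bound, diff_bound))
```

Without the $F_H^\mu$ measurement the limit on the sum would be absent and no finite limit on the individual couplings would follow, which is the sense in which the two measurements are complementary.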
In the special case of real-valued couplings, $\mathcal{B}(B_s\to\bar\ell\ell)$ would lead to rings [32] instead of circles in Figure 2. Our results improve and extend previous bounds in the literature to the most general case of complex-valued couplings. For example, they are a factor two to five more stringent than [33] and comparable to [34] once restricted to the simpler scenarios considered there.

SM-EFT-constrained scalar couplings
In the following we consider a scenario in which it is assumed that there is a sizable hierarchy between the electroweak scale and the new-physics scale, $\Lambda_{\rm NP}$, and that the SM gauge symmetries $SU(2)_L\times U(1)_Y$ are only broken at the electroweak scale. This results in the augmentation of the SM by dimension-six operators that respect the SM gauge group and are composed of SM fields only. Such a scenario becomes more and more viable for two reasons. The first is the discovery of a scalar resonance at the LHC in agreement with all requirements of the Higgs particle in the SM. The second is the steadily rising lower bound on the mass of new particles reported by ATLAS and CMS in various more or less specific models.
A nonredundant set of dimension-six operators of this effective theory (SM-EFT) that requires a linear realization of the electroweak symmetry was given in [35]. The matching of the SM-EFT to the effective theory of $\Delta B = 1$ decays (2.1) at the scale $\mu\sim m_W$ of the order of the $W$-boson mass was performed for vector couplings $C_{7,9,10}$ in [36]. The matching of tensor (2.4) and scalar (2.3) operators [32] shows that the SM gauge groups in conjunction with the linear representation impose the relations

$$C_S = -C_P,\qquad C_{S'} = C_{P'} \qquad (4.3)$$

on scalar couplings and require tensor couplings to be suppressed to the level of dimension-eight operators. In consequence only two scalar couplings $C_{S,S'}$ arise, whose scaling is set by the ratio of $v$ and $\Lambda_{\rm NP}$, where $v\propto m_W$ denotes the scale of electroweak symmetry breaking.
It must be noted that the relations (4.3) are a consequence of embedding the Higgs in a weak doublet along with the Goldstone bosons. For example, choosing a nonlinear representation of the scalar sector allows additional dimension-six operators in the corresponding effective theory, such that the couplings $C_{S,S',P,P'}$ are all independent and tensor operators have nonvanishing couplings already at dimension six [37].
Omitting for the sake of simplicity terms of order $m_\ell^2/M_{B_s}^2$ and $m_s/m_b$, the couplings $C_{S,S'}$ can be bounded from $\mathcal{B}(B_s\to\bar\mu\mu)$ following [32]. In similar spirit, dropping terms of order $m_\ell^2/q^2$ and $m_s/m_b$ gives the analogous relation (4.5) for $B\to K\bar\ell\ell$. In the SM-EFT no relations between $C_{10}$ and $C_{10'}$ arise, so they are in general additional independent parameters. Here we find that destructive interference with contributions involving $C_{10,10'}$ does not significantly alter the bounds on $C_{S,S'}$. The results of two fits are shown in Figure 3. In the first fit, we set $C_{10,10'}^{\rm NP} = 0$ and include all constraints on $B\to K\bar\mu\mu$ and $B_s\to\bar\mu\mu$. In the second fit, we allow $C_{10,10'}^{\rm NP}\neq 0$ and further include all $B\to K^*\bar\mu\mu$ constraints from Table 1. For both fits, all six 2D marginals of real and imaginary parts of $C_S$ vs. $C_{S'}$ have nearly circular contours of equal size that contain the SM point at the 68% level, except for Re$(C_S)$ vs. Re$(C_{S'})$ where it is within the 95% credible region. The regions hardly vary between the two fits.
Since we consider here complex-valued couplings, the allowed regions are circles rather than rings as for the case of real-valued couplings [32]. Compared to those rings, the circles are smaller because the probability moves from the ring towards the center of the circle.

Tensor, scalar, and vector couplings
The most general fit of complex-valued tensor and scalar couplings $C_{S,S',P,P',T,T5}$ in combination with vector couplings $C_{9,9',10,10'}$, i.e. the combination of Section 4.1 and Section 4.2, yields bounds very similar to those in Tables 2 and 3. The changes are only small and in fact the bounds tend to be even more stringent because tensor and scalar couplings can contribute to $F_H^\mu$ only constructively; see (3.6). As before, branching-ratio measurements of $B\to K^*\bar\mu\mu$ help to improve the constraints on $C_{T,T5}$. This demonstrates that even in the case of complex-valued couplings there is enough information in the data to bound all 20 real and imaginary parts.

Angular observables in $B\to K^*\bar\ell\ell$
Now we discuss what the fits tell us about likely values of observables that have not been measured yet but have sensitivity to tensor and scalar couplings. In the $B\to K^*(\to K\pi)\bar\ell\ell$ decay, we again consider $(J_{1s} - 3J_{2s})$ and $(J_{1c} + J_{2c})$ as in Section 3 and additionally $J_{6c}$. We compute the posterior predictive distribution (see Appendix C) for each observable integrated over the low-$q^2$ bin $[1.1, 6]$ GeV$^2$ and the high-$q^2$ bin $[15, 19]$ GeV$^2$, matching LHCb's range. The distributions resemble Gaussians; thus we summarize them by their modes and smallest 68% intervals in Table 4, comparing the SM (prior predictive, $C_i^{\rm NP} = 0$) to three NP scenarios. In each, we allow for interference with the vector couplings and additionally vary only $C_{T,T5}$ (Section 4.1), only $C_{S,S',P,P'}$ (Section 4.2), and finally both tensor and scalar couplings (Section 4.4).
Table 4 The posterior predictive 68% probability intervals of not-yet-measured angular observables for several new-physics scenarios given all the considered experimental constraints. The corresponding values for the SM (prior predictive) are given, too, where "$\simeq 0$" indicates zero in the considered approximation; see text for details. Surviving entries for the bin $[15, 19]$: $(1.12^{+0.10}_{-0.10})\cdot 10^{-10}$, $(7.1^{+5.6}\ldots$

We rescale $J_i$ and combinations by the $B^0$-meson lifetime $\tau_{B^0} = 1.519$ ps [38] to judge the experimental sensitivity in the near future by comparing to current measurements of the branching ratio. In the SM, the typical magnitude of the branching ratio of $B\to K^*\bar\ell\ell$ is approximately $2\cdot 10^{-7}$ for both the $q^2\in[1.1, 6]$ and $[15, 19]$ GeV$^2$ bins. For comparison, the predicted ranges for $\tau_{B^0}(J_{1s} - 3J_{2s})$ and $\tau_{B^0}(J_{1c} + J_{2c})$ in the SM are suppressed by 2 to 3 orders of magnitude down to $O(10^{-10})$; cf. Table 4. The angular observable $J_{6c}$ is strictly zero in the absence of tensor and scalar couplings. Nonzero contributions can be generated in the SM by QED corrections or potentially from higher-dimensional ($d\geq 8$) $|\Delta B| = |\Delta S| = 1$ operators, leading to parametric suppression by $\alpha_e/(4\pi)$ or $m_b m_\ell/m_W^2$. These factors should be compared to the potential suppression present for tensor and scalar contributions in particular NP models in order to gauge their relevance. Our model-independent fits are still in a regime where such considerations are insignificant since current experimental measurements, in combination with theory uncertainties, do not yet impose sufficiently stringent constraints on tensor and scalar couplings.
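The rescaling by the lifetime is a unit conversion: a $q^2$-integrated angular coefficient has the dimension of a decay width, so multiplying by $\tau_{B^0}/\hbar$ gives a dimensionless number comparable to a branching ratio. The following minimal sketch assumes natural units with $J_i$ expressed in GeV. The value of $\hbar$ is the standard one in GeV s, the lifetime is the one quoted in the text, and the sample value of $J$ is a made-up placeholder.

```python
HBAR_GEV_S = 6.582119569e-25  # reduced Planck constant in GeV * s
TAU_B0_PS = 1.519             # B0 lifetime in ps, as quoted in the text


def lifetime_in_inverse_gev(tau_ps):
    """Convert a lifetime in picoseconds to natural units (GeV^-1)."""
    return tau_ps * 1e-12 / HBAR_GEV_S


def rescale_by_lifetime(j_gev, tau_ps=TAU_B0_PS):
    """Turn a q^2-integrated angular coefficient (assumed in GeV) into a dimensionless number."""
    return lifetime_in_inverse_gev(tau_ps) * j_gev


print(lifetime_in_inverse_gev(TAU_B0_PS))

# Hypothetical integrated coefficient in GeV, chosen only to show the scale.
print(rescale_by_lifetime(1.0e-22))
```

With this conversion factor of roughly $2.3\cdot 10^{12}$ GeV$^{-1}$, a width-like quantity of order $10^{-22}$ GeV corresponds to a rescaled observable of order $10^{-10}$.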
Beyond the SM, $J_{6c}$ can become of order $O(10^{-9})$ in scenarios involving tensor couplings only and about $O(10^{-10})$ in the presence of scalar couplings only. Both effects are due to interference with vector couplings. Schematically, $J_{6c}$ is a function of $C_T\times C_P + C_S\times C_{T5}$, $m_\ell/\sqrt{q^2}\times$ scalar $\times$ vector, and $m_\ell/\sqrt{q^2}\times$ tensor $\times$ vector. The largest interval is obtained for the scenario without scalar couplings because then the uncertainty on the (tensor) couplings is largest. But even then, it seems that the experimental sensitivity will not be high enough to have an impact in global fits.
Concerning $(J_{1c} + J_{2c})$ and $(J_{1s} - 3J_{2s})$, substantial deviations from the SM prediction are again only possible in the presence of nonzero tensor couplings. In this case, an enhancement by two orders of magnitude is possible up to $O(10^{-8})$ at high $q^2$ and also at low $q^2$ in the case of $(J_{1s} - 3J_{2s})$. We want to stress again that we make these statements conditional on all included experimental constraints, the scenario, and our prior. In view of the current experimental precision of 20% on the branching ratio at LHCb [21] with only 1 fb$^{-1}$, corresponding to $O(10^{-8})$, one can indeed hope for some sensitivity to such large effects in $(J_{1c} + J_{2c})$ and $(J_{1s} - 3J_{2s})$ for the not-yet-published 3 fb$^{-1}$ data set. At least, we can hope for some measurement if the method of moments [7] is applied.
For the CP asymmetry $A_{\Delta\Gamma}(B_s\to\bar\mu\mu)$ induced by the nonvanishing width difference of the $B_s$ meson (cf. Appendix A), we find a rather uniform distribution in scenarios with nonzero scalar couplings. So any value in the range $[-1, 1]$ is plausible, whereas the SM and the scenario with only tensor couplings predict a value of precisely one [39]; cf. the last row in Table 4. Hence any deviation from one would unambiguously hint at the presence of scalar operators.

Conclusions
We have derived the most stringent constraints to date on tensor and scalar couplings that mediate $b\to s\bar\mu\mu$ transitions. They are based on the latest measurements of the angular observables $F_H^\mu$ and the lepton forward-backward asymmetry $A_{\rm FB}^\mu$ in $B^+\to K^+\bar\mu\mu$ from LHCb [1], supplemented by measurements of the branching ratios of $B_s\to\bar\mu\mu$ and $B\to K^{(*)}\bar\mu\mu$.
Both $F_H^\mu$ and $A_{\rm FB}^\mu$ belong to a class of observables in which vector and dipole couplings, present in the standard model (SM), are suppressed (mostly kinematically by $m_\ell/\sqrt{q^2}$) with respect to tensor and scalar couplings. We provide predictions for the equivalent but not-yet-measured angular observables $J_{6c}$, $(J_{1c} + J_{2c})$, and $(J_{1s} - 3J_{2s})$ in $B\to K^*(\to K\pi)\bar\mu\mu$.
In a Bayesian analysis of the complex-valued couplings of the effective theory, we find that the measurement of $F_H^\mu$, especially at high $q^2$,
1. imposes by itself constraints on tensor couplings $|C_{T,T5}|$ such that the upper bound of the smallest 68% (95%) credibility interval is 0.43 (0.57), superseding previous bounds. In combination with current data from $B\to K^*\bar\mu\mu$ and lattice predictions of $B\to K^*$ form factors, the bounds are lowered to 0.33 (0.43), even in the presence of nonstandard contributions in vector couplings.
2. for the first time allows us to bound simultaneously all four scalar couplings $C_{S,S',P,P'}$ due to its complementarity to $\mathcal{B}(B_s\to\bar\mu\mu)$. Even when taking into account destructive interference with vector couplings, $|C_i + C_{i'}| < 0.3$ (0.6) and $|C_i - C_{i'}| < 0.1$ (0.2) for $i = S, P$ with at least 68% (95%) probability. Currently, the bounds from $F_H^\mu$ are weaker than those from $\mathcal{B}(B_s\to\bar\mu\mu)$ by about a factor of four.
Future measurements of $F_H^\mu$ at LHCb and Belle II will further tighten the bounds. Moreover, measurements of $F_H^e$ ($\ell = e$) will provide constraints on scalar couplings in the electron channel in the absence of a direct determination of the branching ratio of $B_s\to\bar e e$.
Our updated bounds on complex-valued tensor and scalar couplings are summarized in Table 2 and Table 3, accounting also for interference effects with vector couplings. These bounds hold even in the most general scenario of complexvalued tensor, scalar, and vector couplings, showing that the data are good enough to bound the real and imaginary parts of all Wilson coefficients simultaneously.
As a special case, we consider the scenario arising from the SM augmented by dimension-6 operators generalizing existing studies to the case of complex-valued couplings. In this scenario, tensor couplings are absent and additional relations between scalar couplings are enforced by the linear realization of the SU(2) L ⊗ U(1) Y electroweak symmetry group.
Our study of the yet unmeasured angular observables $J_{6c}$, $(J_{1c} + J_{2c})$, and $(J_{1s} - 3J_{2s})$ in $B \to K^*(\to K\pi)\bar\mu\mu$ (see Table 4) shows that, despite the current bounds on tensor couplings, enhancements of up to two orders of magnitude over the SM predictions are allowed for $(J_{1c} + J_{2c})$ and $(J_{1s} - 3J_{2s})$, placing them within reach of the LHCb analysis of the full run I data set. Our bounds on scalar couplings from $B_s \to \bar\mu\mu$ and $F_H^\mu$, however, are already quite restrictive, permitting only small deviations from SM predictions in $(J_{1c} + J_{2c})$ and $(J_{1s} - 3J_{2s})$. Notably, given nonzero scalar couplings, the CP asymmetry $A_{\Delta\Gamma}(B_s \to \bar\mu\mu)$ can take on any value in the range $[-1, 1]$.
Both terms depend on the scalar $B \to K^*$ form factor $A_0(q^2)$, a normalization factor $N$, and the Källén function $\lambda$ (see [5]), but have a different dependence on the Wilson coefficients $C_i$ with $i = 10, 10', P, P'$. The lepton-flavor index of Wilson coefficients is omitted for brevity throughout.
In full generality, the angular observables in $B \to K^*\bar\ell\ell$ depend on seven transversity amplitudes with vector and dipole contributions, $A^{L,R}_{0,\perp,\parallel}$ and $\tilde{A}_t$, one scalar and one pseudoscalar amplitude, $A_{S,P} \propto (C_{S,P} - C_{S',P'})$, and six tensor amplitudes, $A_{\parallel\perp,\,t\perp,\,0\parallel} \propto C_T$ and $A_{t0,\,t\parallel,\,0\perp} \propto C_{T5}$. The interesting combinations are […] (A.5)

The function $\beta_\ell^2(q^2) \equiv 1 - 4m_\ell^2/q^2$ tends to 1 for $m_\ell^2 \ll q^2$. This condition is well fulfilled for $\ell = e$ and $q^2 \gtrsim 1$ GeV$^2$, provided that tensor and scalar Wilson coefficients do not receive additional suppression factors. For $\ell = \mu$, the value of $q^2$ should not be too low, whereas in the case $\ell = \tau$, these observables are no longer dominated by tensor and scalar contributions alone, and the full lepton-mass dependence has to be taken into account.

Finally, we note that the second part of $(J_{1c} + J_{2c})$ in (A.4), […], closely resembles the branching ratio of the rare decay $B_s \to \bar\ell\ell$ in the limit […]. Hence there is some similarity between $(J_{1c} + J_{2c})$ in $B \to K^*\bar\ell\ell$ and $\mathcal{B}(B_s \to \bar\ell\ell)$ in their dependence on the couplings, but the former has additional dependence on tensor and vector couplings through the other transversity amplitudes. In $B \to K^*\bar\ell\ell$, the helicity suppression factor $4m_\ell^2/q^2$ of vector couplings is weaker than the corresponding factor $4m_\ell^2/M_{B_s}^2$ in $B_s \to \bar\ell\ell$.

Concerning $B_s \to \bar\ell\ell$, the expression (A.7) corresponds to the plain branching ratio at time $t = 0$. Due to the nonvanishing decay-width difference $\Delta\Gamma_s$, experiments measure the average time-integrated branching ratio, denoted by $\overline{\mathcal{B}}$, and the two are related as [39]

$$\overline{\mathcal{B}}(B_s \to \bar\ell\ell) = \frac{1 + A_{\Delta\Gamma}\, y_s}{1 - y_s^2}\, \mathcal{B}(B_s \to \bar\ell\ell).$$

Here $y_s \equiv \Delta\Gamma_s/(2\Gamma_s)$ with the numerical value given in [28].
$A_{\Delta\Gamma}$ is the CP asymmetry due to the nonvanishing width difference, which is $A_{\Delta\Gamma} = 1$ in the SM, but in general can take values $A_{\Delta\Gamma} \in [-1, 1]$. Since $A_{\Delta\Gamma}$ can depart from its SM value in the new-physics scenarios considered in this work, we take this effect into account in our numerical analysis, although it is suppressed by the small $y_s$. The latest SM prediction of the time-integrated branching ratio, $\overline{\mathcal{B}}(B_s \to \bar\mu\mu) = (3.65 \pm 0.23) \cdot 10^{-9}$ [41], includes NLO electroweak [42] and NNLO QCD corrections [43].

Finally we discuss the possibility of suitable normalizations of $J_{6c}$, $(J_{1s} - 3J_{2s})$, and $(J_{1c} + J_{2c})$ at high $q^2$ that would provide optimized observables. For this purpose we use form-factor relations at leading order in $1/m_b$ and neglect terms suppressed by $m_\ell/\sqrt{q^2}$. With the notation and expressions derived in [5], […] Here $f_{\perp,\parallel,0}$ and $A_0$ denote $B \to K^*$ form factors, whereas the $\rho_1^\pm$ depend on vector couplings and $\rho_1^T \propto (|C_T|^2 + |C_{T5}|^2)$.

Concerning $(J_{1s} - 3J_{2s})$, there are no appropriate normalizations unless the chirality-flipped $C^{\rm NP}_{7',9',10'} = 0$, because then $\rho_1^+ = \rho_1^-$ holds. There are three potential normalizations […] where the last one depends only on vector couplings; i.e., it is free of tensor and scalar ones. In $J_{1s}$ and $J_{2s}$, tensor couplings contribute either cumulatively or destructively to vector couplings in $\rho_1^\pm$.

A similar situation arises for $(J_{1c} + J_{2c})$, where no appropriate normalization exists unless scalar couplings vanish. In this special case, both $J_{1c}$ and $J_{2c}$ depend only on $f_0$ and can be used as normalization. In the general case, $J_{2c}$ still depends only on tensor couplings and was used at low $q^2$ [9]. Finally we note that […] is free of tensor couplings at high $q^2$ and would provide access to $\rho_1^-$ provided scalar couplings vanish.

Appendix B: Theoretical Inputs
Here we describe the theoretical treatment of observables and collect the numerical input for the relevant parameters.
The software package EOS [12,40,44] is used for the calculation of observables in $B_s \to \bar\mu\mu$ and $B \to K^{(*)}\bar\ell\ell$ and associated constraints. Both the likelihood and the prior are defined entirely within EOS.
Concerning numerical input, we refer the reader to [28] for the compilation of nuisance parameters relevant to this work. We adopt the same values for fixed parameters and the same priors unless noted otherwise below. Specifically, we use identical priors for the common nuisance parameters of the CKM quark-mixing matrix, the charm and bottom quark masses in the $\overline{\rm MS}$ scheme, and the parametrization of subleading corrections in $1/m_b$ as given in [28].
Contrary to [28], we no longer choose a log-gamma distribution for the asymmetric uncertainties in priors but rather a continuous yet asymmetric Gaussian distribution. This avoids a poor fit because the log-gamma distribution falls off too rapidly in the "short" tail. In a unifying spirit, we now also use the continuous rather than the discontinuous asymmetric Gaussian to approximate asymmetric experimental intervals in our likelihood.
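The difference between the two variants of the asymmetric Gaussian can be illustrated with a minimal sketch. It assumes the common two-piece definitions: in the continuous variant both halves share one normalization so that the density is continuous at the mode, whereas in the discontinuous variant each half is normalized to probability 1/2 separately. The function name and arguments are hypothetical and do not reproduce the EOS implementation.

```python
import math

def asymmetric_gaussian_pdf(x, mode, sigma_lo, sigma_hi, continuous=True):
    """Two-piece Gaussian density with width sigma_lo below and sigma_hi above the mode."""
    sigma = sigma_lo if x < mode else sigma_hi
    shape = math.exp(-0.5 * ((x - mode) / sigma) ** 2)
    if continuous:
        # Shared normalization: the density is continuous at the mode,
        # and the two halves carry unequal probability.
        norm = math.sqrt(2.0 / math.pi) / (sigma_lo + sigma_hi)
    else:
        # Each half integrates to 1/2, so the density jumps at the mode
        # whenever sigma_lo != sigma_hi.
        norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return norm * shape
```

Both variants integrate to one; they differ in whether the density or the probability content of the two halves is matched at the mode.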
The updated prior of the $B_s$ decay constant $f_{B_s}$ enters the branching ratio of $B_s \to \bar\mu\mu$. We adopt recent updates of the $N_f = 2 + 1$ FLAG compilation [45] (see Table 5), which averages the results of [46-48]. More recent calculations with $N_f = 2 + 1 + 1$ [49] and $N_f = 2$ [50] flavors are consistent with these averages.
The tensor and scalar amplitudes in $B \to K^{(*)}\bar\ell\ell$ factorize naively; i.e., they depend only on scalar and tensor $B \to K^{(*)}$ form factors. These amplitudes are implemented in EOS for $B \to K\bar\ell\ell$ and $B \to K^*\bar\ell\ell$ as given in [4] and [5], respectively, and we refrain from using form-factor relations at low and high $q^2$ for the tensor and scalar $B \to K$ and $B \to K^*$ form factors.
Therefore, we need additional nuisance parameters for the complete set of $B \to K$ form factors $f_{+,T,0}$. As a consequence of using the $z$ parametrization [51] and the kinematic relation $f_0(0) = f_+(0)$, five nuisance parameters (listed in Table 5) are needed. As prior information, we average the LCSR results [51] and [52] (see Table 5 and cf. [28]) and supplement them with lattice determinations [13]. The lattice results are given in a slightly different parametrization, necessitating a conversion to the parametrization used here. For this purpose, we generate the form factors from the parametrization of [13], including full correlations, at three values of $q^2 = 17, 20, 23$ GeV$^2$. Subsequently, these "data" are included in the prior by means of a multivariate Gaussian whose mean and covariance are given in Table 6. The $q^2$ values and number of points are chosen such that the correlation of neighboring points is small enough to keep the covariance nonsingular.
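The role of the number of $q^2$ points can be illustrated with a minimal sketch. It assumes a single form factor described by a pole factor times a three-term polynomial in $z$ and propagates the parameter covariance linearly, which is exact here because the form factor is linear in the coefficients. All masses, coefficients, uncertainties, and correlations are hypothetical placeholders and are not the values of [13].

```python
import numpy as np

# Illustrative B -> K inputs; all numbers are assumptions for this sketch.
M_B, M_K, M_POLE = 5.279, 0.494, 5.415  # GeV
T_PLUS = (M_B + M_K) ** 2
T_ZERO = (M_B + M_K) * (np.sqrt(M_B) - np.sqrt(M_K)) ** 2

def z(q2):
    a, b = np.sqrt(T_PLUS - q2), np.sqrt(T_PLUS - T_ZERO)
    return (a - b) / (a + b)

def jacobian(q2_points, n_coeffs):
    """d f(q2_i) / d a_k for f(q2) = sum_k a_k z(q2)^k / (1 - q2 / M_POLE^2)."""
    q2 = np.asarray(q2_points, dtype=float)
    pole = 1.0 / (1.0 - q2 / M_POLE ** 2)
    return pole[:, None] * z(q2)[:, None] ** np.arange(n_coeffs)

def pseudo_data(q2_points, coeff_mean, coeff_cov):
    """Mean vector and covariance of the form factor at the chosen q2 points."""
    J = jacobian(q2_points, len(coeff_mean))
    return J @ coeff_mean, J @ coeff_cov @ J.T

def relative_smallest_eigenvalue(cov):
    eig = np.linalg.eigvalsh(cov)  # ascending order
    return eig[0] / eig[-1]

# Hypothetical correlated coefficients of a three-term expansion.
coeff_mean = np.array([0.43, -0.67, -1.12])
sigma = np.array([0.02, 0.10, 0.50])
corr = np.array([[1.0, -0.5, 0.3], [-0.5, 1.0, -0.6], [0.3, -0.6, 1.0]])
coeff_cov = corr * np.outer(sigma, sigma)

mean3, cov3 = pseudo_data([17.0, 20.0, 23.0], coeff_mean, coeff_cov)
mean4, cov4 = pseudo_data([17.0, 19.0, 21.0, 23.0], coeff_mean, coeff_cov)
print(relative_smallest_eigenvalue(cov3), relative_smallest_eigenvalue(cov4))
```

With three coefficients, three points yield an invertible covariance, whereas a fourth point adds no independent information and drives the smallest eigenvalue to zero up to rounding.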
We change the parametrization of $B \to K^*$ form factors w.r.t. our previous work [28] slightly and now use the simplified series expansion (SSE) [53] with three parameters $\alpha^i_k$ ($k = 0, 1, 2$) per form factor $i = V, A_0, A_1, T_1, T_{23}$ and two ($k = 1, 2$) for $i = A_{12}, T_2$. The parameters $\alpha^i_0$ correspond to $F_i(q^2 = 0)$ and the kinematic constraints are used to eliminate $\alpha^{A_{12}}_0$ and $\alpha^{T_2}_0$, where the kinematic factor depends on the $B$- and $K^*$-meson masses. The pole masses $M_{R,i}$ are set to the values given in [23].
This parametrization allows us to consistently combine available results of form factors from different nonperturbative approaches, namely LCSRs at large recoil and lattice at low recoil, and to simultaneously implement the kinematic constraints. We determine the parameters $\alpha^i_k$ in a combined fit to form-factor predictions of $V$, $A_{0,1,2}$, $T_{1,2,3}$ from the LCSR [23] and of $V$, $A_0$, $A_1$, $A_{12}$, $T_1$, $T_2$, $T_{23}$ from the lattice [24,25]. We have verified that the constraint at the kinematic endpoint $q^2_{\rm max} \equiv (M_B - M_{K^*})^2$ is satisfied to high accuracy by the above constraints and thus need not be imposed explicitly. Uninformative flat priors are chosen for all $\alpha^i_k$ with ranges […]

The results of the LCSR form-factor predictions have been provided to us directly by the authors of [23] at $q^2 = (0.1, 4.1, 8.1, 12.1)$ GeV$^2$ for $V$, $A_0$, $A_1$, $A_{12}$, $T_1$, $T_2$, $T_3$, including the $28 \times 28$ covariance matrix. Similar results could have been obtained by "drawing" form factors from the correlated parameters given in [23] and ancillary files, just as for the $B \to K$ lattice form factors. For the $B \to K^*$ lattice form factors this approach is not good enough, as we were not able to select more than two $q^2$ values without obtaining a singular covariance. In order to fully exploit the available information, we then contacted the authors of [25] and obtained the original values of the form factors (including correlation) at various values in the interval $q^2 \in [11.9, 17.8]$ GeV$^2$ to which Horgan et al. fit the SSE. The covariance has a block-diagonal structure with a $48 \times 48$ block for $V$, $A_0$, $A_1$, $A_{12}$ and a $36 \times 36$ block for $T_1$, $T_2$, $T_{23}$.

Table 6. Mean values, standard deviations (top), and correlation coefficients (bottom) of lattice points [13] for the $B \to K$ form factors $f_{0,+,T}(q^2)$.
Having the "raw" information on form-factor values is much more reliable and future-proof: there are no issues with artificial correlation, and should we one day decide to use yet another form-factor parametrization, we could easily fit it to these data points.
We have compared the SSE fit (B.1) with two versus three parameters and found that in the former case lattice form factors influence the fit such that form factors tend to be higher than LCSR predictions at low $q^2$, leading to a poor fit. Hence we prefer the three-parameter setup as it provides the flexibility needed to accommodate LCSR and lattice results. Means and standard deviations of that three-parameter fit are given in Table 5; we omit the correlations for the sake of brevity but are happy to provide them.
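The comparison of two versus three parameters can be mimicked with a minimal sketch. It replaces the Bayesian fit by generalized least squares and assumes the form $F(q^2) = \sum_k \alpha_k [z(q^2) - z(0)]^k / (1 - q^2/M_R^2)$, for which $\alpha_0 = F(0)$. The masses, the "true" coefficients, the $q^2$ points above 12.1 GeV$^2$, and the uncorrelated 10% uncertainties are hypothetical and serve only to generate synthetic pseudo-data.

```python
import numpy as np

# Illustrative inputs (assumptions, not the values used in the analysis).
M_B, M_KSTAR, M_POLE = 5.279, 0.892, 5.415  # GeV
T_PLUS = (M_B + M_KSTAR) ** 2
T_MINUS = (M_B - M_KSTAR) ** 2
T_ZERO = T_PLUS * (1.0 - np.sqrt(1.0 - T_MINUS / T_PLUS))

def z(q2):
    a, b = np.sqrt(T_PLUS - q2), np.sqrt(T_PLUS - T_ZERO)
    return (a - b) / (a + b)

def design_matrix(q2, n_params):
    """Rows: q2 points; columns: basis functions multiplying alpha_k."""
    q2 = np.asarray(q2, dtype=float)
    pole = 1.0 / (1.0 - q2 / M_POLE ** 2)
    dz = z(q2) - z(0.0)
    return pole[:, None] * dz[:, None] ** np.arange(n_params)

def gls_fit(q2, values, cov, n_params):
    """Generalized least squares: minimize (y - A a)^T C^{-1} (y - A a)."""
    A = design_matrix(q2, n_params)
    cinv = np.linalg.inv(cov)
    fisher = A.T @ cinv @ A
    alpha = np.linalg.solve(fisher, A.T @ cinv @ values)
    resid = values - A @ alpha
    return alpha, np.linalg.inv(fisher), float(resid @ cinv @ resid)

# Synthetic pseudo-data from a hypothetical three-parameter "truth".
q2_points = np.array([0.1, 4.1, 8.1, 12.1, 13.0, 15.0, 17.0])
alpha_true = np.array([0.34, -1.05, 2.40])
values = design_matrix(q2_points, 3) @ alpha_true
cov = np.diag((0.1 * values) ** 2)  # uncorrelated 10% errors, illustration only

alpha2, cov2, chi2_two = gls_fit(q2_points, values, cov, 2)
alpha3, cov3, chi2_three = gls_fit(q2_points, values, cov, 3)
print(chi2_two, chi2_three)
```

Because the two models are nested, the three-parameter fit can never have the larger minimum $\chi^2$; whether the additional parameter is needed in practice depends on the actual LCSR and lattice inputs and their correlations.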

Appendix C: Monte Carlo sampling
The marginalization of the posterior is performed with the package pypmc [54], which incorporates the algorithm presented in [55,56] and, in addition, an implementation of the variational Bayes algorithm. In every analysis we first run multiple adaptive Markov chains (MCMC) in parallel through pypmc. If necessary, chains are seeded at the SM point to exclude solutions in which multiple nuisance parameters, mostly for hadronic corrections, simultaneously deviate strongly from prior expectations.
In total, there are 19 parameters $\alpha^i_j$ to describe $B \to K^*$ form factors and most of them are strongly correlated. But it is well known that strong correlation leads to poor sampling as it can cause the random-walk Markov chains to spend an excessive amount of time in regions of low probability and thus produce spurious peaks. To mitigate this issue, we perform a fit to form-factor constraints without any experimental data and use the resulting covariance matrix to transform parameters such that the new parameters are uncorrelated.
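A transformation to uncorrelated parameters can be sketched as follows. This is a minimal illustration that assumes a Cholesky factorization of a sample covariance; the source does not specify which factorization is used, and the three-parameter Gaussian toy with its means, widths, and correlations is hypothetical.

```python
import numpy as np

def decorrelate(samples):
    """Map samples theta to u = L^{-1} (theta - mean) with cov = L L^T."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    chol = np.linalg.cholesky(cov)
    u = np.linalg.solve(chol, (samples - mean).T).T
    return u, mean, chol

def to_original(u, mean, chol):
    """Inverse map: theta = mean + L u."""
    return mean + u @ chol.T

# Hypothetical strongly correlated toy parameters.
rng = np.random.default_rng(0)
sigma = np.array([0.05, 0.4, 1.5])
corr = np.array([[1.0, -0.9, 0.8], [-0.9, 1.0, -0.95], [0.8, -0.95, 1.0]])
samples = rng.multivariate_normal([0.3, -1.0, 2.0], corr * np.outer(sigma, sigma), size=5000)

u, mean, chol = decorrelate(samples)
print(np.round(np.cov(u, rowvar=False), 6))
```

A random-walk chain run in the coordinates `u` sees a target with unit covariance, and each proposed point is mapped back with `to_original` before the posterior is evaluated.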
In all but three cases, the Markov chains then give reliable results. But when we analyze scenarios with C S,P = 0 and all experimental constraints, strong correlations appear again. As a final solution, we then use importance sampling with the initial proposal function determined by a fit of a Gaussian mixture to the MCMC samples within the variational Bayes approximation [57]. As the posterior is unimodal and closely resembles a Gaussian, only a few Gaussian components are needed; three components proved optimal according to the variational approximation to the model evidence.
In the most challenging run with 62 parameters, we obtain a relative effective sample size of only 0.038%. We want to have enough independent samples $N$ such that the 68% region is determined with a relative precision of about 1%. As a rule of thumb, we consider the "relative error of the error" given by $1/\sqrt{2N}$ [38, ch. 37]. In the 62D case, we compute a total of $1.1 \cdot 10^6$ importance samples, update the proposal after every $10^5$ samples, and combine all samples [57] such that $N \approx 3500$ and the estimated relative error of the error is 1.2%, which is good enough for our purposes.
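The rule of thumb can be checked with a minimal sketch. The effective sample size is computed here with the common estimator $(\sum_i w_i)^2 / \sum_i w_i^2$, which is an assumption and may differ from the definition used for combined samples in [57]; the function names are hypothetical.

```python
import math

def effective_sample_size(weights):
    """Common estimator of the effective sample size from importance weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def relative_error_of_error(n_independent):
    """Rule of thumb 1 / sqrt(2 N) for the relative uncertainty on a width estimate."""
    return 1.0 / math.sqrt(2.0 * n_independent)

print(round(100.0 * relative_error_of_error(3500), 1))  # percent
```

For $N \approx 3500$ this reproduces the quoted 1.2%, and reaching 1% would require $N = 5000$ independent samples.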
To create the smooth marginal plots in Figures 1-3, we apply kernel density estimation for both MCMC and importance samples using the fast figtree library [58]. In the latter case, we additionally crop 500 outliers.
The prior or posterior predictive distribution of an observable $X$ within a model $M$, in which $X = f(\theta)$ is a definite function of the parameters $\theta$, is given as

$$P(X|M) = \int d\theta\, P(X|\theta, M)\, P(\theta|M) = \int d\theta\, \delta(X - f(\theta))\, P(\theta|M). \tag{C.1}$$

We estimate $P(X|M)$ by computing $f(\theta_i)$ for every sample $\theta_i \sim P(\theta|M)$, then smooth as above.
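The sampling estimate of (C.1) can be sketched as follows. The two-parameter Gaussian toy distribution and the observable $f(\theta) = \theta_0 \theta_1$ are hypothetical, and a weighted histogram stands in for the kernel density estimate; the optional weights would apply to importance samples.

```python
import numpy as np

def predictive_samples(theta_samples, f, weights=None):
    """Push parameter samples through f; return values and normalized weights."""
    x = np.array([f(theta) for theta in theta_samples])
    if weights is None:
        weights = np.ones(len(x))
    weights = np.asarray(weights, dtype=float)
    return x, weights / weights.sum()

# Toy stand-in for samples theta_i ~ P(theta | M).
rng = np.random.default_rng(0)
theta = rng.normal(loc=[1.0, 2.0], scale=[0.1, 0.2], size=(20000, 2))

x, w = predictive_samples(theta, lambda t: t[0] * t[1])
density, edges = np.histogram(x, bins=50, weights=w, density=True)
print(np.sum(w * x))
```

Since the delta function in (C.1) simply relabels each parameter sample by its value of $X$, no additional integration is needed beyond evaluating $f$ once per sample.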