Full top quark mass dependence in Higgs boson pair production at NLO

We study the effects of the exact top-quark mass dependent two-loop corrections to Higgs boson pair production by gluon fusion at the LHC and at a 100 TeV hadron collider. We perform a detailed comparison of the full next-to-leading order result to various approximations at the level of differential distributions and also analyse non-standard Higgs self-coupling scenarios. We find that the different next-to-leading order approximations differ from the full result by up to 50 percent in relevant differential distributions. This clearly stresses the importance of the full NLO result.


Introduction
After the discovery of a boson [1,2] whose characteristics have so far been consistent with the Standard Model Higgs boson, it is a primary goal of the LHC and future colliders to further scrutinize its properties. In particular, the form of the Higgs potential needs to be reconstructed by experimental measurements, in order to confirm the mechanism of electroweak symmetry breaking postulated by the Standard Model. One of the parameters entering the Higgs potential, the mass of the Higgs boson, already has been measured to an impressive precision [3]. The other parameter, the Higgs boson self-coupling, is more difficult to constrain, as it requires the production of at least two Higgs bosons. The cross sections for Higgs boson pair production at the LHC are about three orders of magnitude smaller than the ones for single Higgs production.
The dominant production channel is the gluon fusion channel, as for single Higgs boson production at the LHC. In the gluon fusion channel, there are two categories of contributions to di-Higgs production: either a virtual Higgs boson, produced by the same mechanism as in single Higgs production, is decaying into a Higgs boson pair, involving the self-coupling λ hhh , or the two Higgs bosons are both directly radiated from a heavy quark. At leading order (LO), these two mechanisms can be attributed to "triangle" and "box" contributions, respectively. However, at NLO, i.e. at the level of two-loop diagrams, the diagram topologies are more complicated, such that the association of "triangle diagrams" to diagrams containing the self-coupling λ hhh becomes invalid. The Higgs boson pair production cross section is additionally suppressed by the fact that there is destructive interference between contributions containing the Higgs boson self-coupling and the ones containing only Yukawa couplings to heavy quarks, and that for larger values of √ŝ , the contributions with an s-channel virtual Higgs boson propagator are strongly suppressed. Therefore, narrowing the window of possible values for the triple-Higgs coupling experimentally will have to wait until the high-luminosity run of the LHC [4][5][6], if Standard Model rates are assumed. However, the Higgs boson pair production rate could be modified by physics beyond the Standard Model (BSM), and hence it is important to be able to distinguish BSM effects from Standard Model higher order corrections. In this paper we will study the effects of a modified Higgs boson self-coupling and show that the Higgs boson invariant mass distribution is quite sensitive to changes in λ hhh , as such changes modify the interference pattern. Both ATLAS and CMS have published measurements of Higgs boson pair production in the decay channels γγbb [7][8][9][10], bbbb [9,[11][12][13][14], γγW W * , bbW W * , τ + τ − bb [9,[15][16][17][18][19][20][21]. Phenomenological studies about Higgs boson pair production and the feasibility of Higgs boson self-coupling measurements can be found e.g. in Refs. . The leading order (one-loop) calculation of Higgs boson pair production in gluon fusion has been performed in Refs. [48][49][50]. NLO corrections were calculated in the m t → ∞ limit, where the top quark degrees of freedom are integrated out, leading to point-like effective couplings of gluons to Higgs bosons ("Higgs Effective Field Theory", HEFT). Top quark mass effects have been included in various approximations. Calculating the NLO corrections within the heavy top limit and then rescaling the result differentially by a factor B F T /B HEF T , where B F T denotes the leading order matrix element squared in the full theory, is denoted "Born-improved HEFT" approximation. This calculation [51], implemented in the program Hpair, led to a K-factor of about two. In Ref. [52], another approximation, called "FT approx ", was introduced, which contains the full top quark mass dependence in the real radiation, while the virtual part is calcu-lated in the HEFT approximation and rescaled by the re-weighting factor B F T /B HEF T . The "FT approx " result [52] in addition uses partial NLO results for the virtual part, i.e. it employs the exact results where they are known from single Higgs production. The "FT approx " calculation leads to a cross section which is about 10% smaller than the Born-improved NLO HEFT cross section. Using the "FT approx " procedure, the reduction is about 9% with respect to the Born-improved NLO HEFT result. It was also found that top width effects can reach up to −4% above the tt threshold [52]. At LO, a finite top width reduces the total cross section at √ s = 14 TeV by about 2%. In our calculation we do not include a finite top width. In addition, the HEFT results at NLO and NNLO have been improved by an expansion in 1/m 2ρ t in Refs. [53][54][55][56], with ρ max = 6 at NLO, and ρ max = 2 for the soft-virtual part at NNLO [55]. In the latter reference it is also demonstrated that the sign of the finite top mass corrections, amounting to about ±10%, depends on whether the re-weighting factor is applied at differential level, i.e. before the integration over the partonic centre of mass energy, or at total cross section level. The NNLO QCD corrections in the heavy top limit have been performed in Refs. [54,57,58], and they have been supplemented by an expansion in 1/m 2 t in Ref. [55] and by resummation at NLO+NNLL in Ref. [59]. The most precise results within the infinite top mass approximation are NNLO+NNLL resummed results, calculated in Ref. [60], leading to K-factors of about 1.2 relative to the Born-improved HEFT result. Very recently, fully differential NNLO results in the HEFT approximation have become available [61]. As the different approximations partly led to corrections with opposite sign, there was a rather large uncertainty associated with the unknown effect of the exact top quark mass dependence at NLO, which was estimated to be of the order of 10% at √ s = 14 TeV.
The full NLO calculation which became available recently [62], revealed a 14% reduction of the total cross section compared to the Born improved HEFT at √ s = 14 TeV and a 24% reduction at √ s = 100 TeV.
At differential distribution level, we found that the deviation from the Born-improved HEFT approximation can be as large as 50% in the tails of distributions like the Higgs boson pair invariant mass or Higgs boson transverse momentum distributions. This paper is structured as follows. In Section 2 we give details of the calculation, in particular about the calculation of the two-loop amplitude and about the 1/m t expansion which we also performed. In Section 3 we discuss our phenomenological results. We study various distributions at √ s = 14 TeV and √ s = 100 TeV, comparing the full NLO result to different approximations. We also analyze the effects of non-Standard Model values of the triple Higgs coupling.
H Figure 1: Diagrams contributing to the process gg → hh at leading order.
2 Details of the calculation 2.1 Amplitude structure The leading order diagrams contributing to the process gg → hh are shown in Fig. 1.
As the cross section does not have a tree level contribution, the virtual contribution at next-to-leading order involves two-loop diagrams, and the NLO real radiation part involves one-loop diagrams up to pentagons. The amplitude for the process g(p 1 , µ) + g(p 2 , ν) → h(p 3 ) + h(p 4 ) can be decomposed into form factors as where n 1 , n 2 are arbitrary reference momenta for the two gluon polarization vectors µ , ν . Colour indices are denoted by a, b and The decomposition into tensors carrying the Lorentz structure is not unique. It is however convenient to define the form factors such that [49] where At leading order, we can further split F 1 into a "triangle" and a "box" contribution As the LO form factor F only contains the triangle diagrams, which have no angular momentum dependence, it can be attributed entirely to an s-wave contribution. The form factors F and F 2 can be attributed to the spin-0 and spin-2 states of the scattering amplitude, respectively. We can get an idea about the angular dependence of F 1 and F 2 by considering the partial wave decomposition of the scattering amplitude, which is independent of the loop order. It should be noted however that this analysis is valid for 2 → 2 scattering. At NLO, the cross section for the process gg → HH also contains a 2 → 3 scattering contribution from the real radiation. Therefore the analysis of the angular dependence below does not apply to the full NLO cross section. In general, for a scattering process a + b → c + d with the corresponding helicities λ a , ..., λ d , the partial wave decomposition reads [63][64][65] where θφλ c λ d |T (E)|00λ a λ b denotes the transition matrix element. Unitarity must hold for each partial wave independently, i.e. |T J | ≤ 1 . Thus the amplitude is decomposed into (orthogonal) Wigner d-functions d J s i ,s f (θ), where J denotes the total angular momentum and s i , s f the total spin of the initial and final state, respectively. The structure of the amplitude is such that F 1 only contributes to s i = 0, while F 2 only contributes to s i = 2. F 1 can have both a component proportional to d 0 0,0 (θ) as well as one proportional to d 2 0,0 (θ), while F 2 is proportional to d 2 2,0 (θ). The d-functions d J 0,0 (θ) are proportional to the Legendre-Polynomials P J (cos θ). As P 0 (x) = 1, P 2 (x) = 1 2 (3x 2 −1) and d 2 2,0 (θ) ∼ sin 2 θ, we can conclude that the angular dependence of F 2 should be ∼ sin 2 θ. From the analytic expression for F 2 at leading order [49], we can verify that indeed Further, it is known that the leading contributions to the amplitude come from the lower partial waves in Eq. (2.6). Therefore we also conclude that the contribution from F 2 should be subleading with respect to F 1 in most of the kinematic regions. Indeed we observe that the contribution of the form factor F 2 to the virtual two-loop amplitude is suppressed as compared to F 1 .

Leading Order cross section
The functions F i at leading order with full mass dependence can be found e.g. in Refs. [49,50]. At LO, the "triangle" form factor has the simple form where λ = 1 in the Standard Model, τ q = 4 m 2 q /ŝ and The partonic leading order cross section for gg → hh can be written aŝ (2.9) The integration limitst ± are derived from a momentum parametrisation in the centreof-mass frame, leading tot ± = m 2 s . To obtain the hadronic cross section, we also have to integrate over the PDFs. Defining the luminosity function as the total cross section reads where s is the square of the hadronic centre of mass energy, τ 0 = 4m 2 h /s, and µ F is the factorization scale.

Heavy top limit
In the m t → ∞ approximation the LO form factors are given bȳ which implies for the the effective ggH and ggHH couplings c h and c hh (2.13) From the expressions above we can derive the following expression for the squared amplitude in the heavy top limit : (2.14) For λ = 1, this expression vanishes at the Higgs boson pair production threshold s ∼ 4m 2 h . This explains why near the threshold the contributions containing the triple Higgs boson coupling and the ones which do not contain an s-channel Higgs boson exchange almost cancel. On the other hand, if the triple Higgs boson coupling was different from the Standard Model value, for example equal to zero, this should be clearly seen from the behaviour of the m hh distribution. We investigate the effects of non-standard values for the triple Higgs boson coupling in Section 3.3.

NLO cross section
The NLO cross section is composed of various parts, which we discuss separately in the following.
σ NLO (pp → hh) = σ LO + σ virt + σ r gg + σ r gq + σ r gq + σ r qq . (2.15) The contributions from the real radiation, σ r , can be divided into four channels, according to the partons in the initial state. The qq channel is infrared finite. Details are given in Section 2.3.2.

Amplitude generation
For the virtual two-loop amplitude, we use projectors P µν j to achieve a separation into objects carrying the Lorentz structure T µν i and the form factors F 1 and F 2 , In D dimensions we can use the tensors T µν i , defined in Eqs. (2.4), to build the projectors The virtual amplitude has been generated with an extension of the program GoSam [67,68], where the diagrams are generated using Qgraf [69] and then further processed using Form [70,71]. The two-loop extension of GoSam contains an automated python interface to Reduze [72], which implies that the user has to provide the integral families when running GoSam-2loop. The other input files needed by Reduze are generated automatically by GoSam-2loop, based on the kinematics of the given process. The reduction of the integrals occurring in the amplitude to master integrals should be performed separately, where in principle either of the codes Reduze [72], Fire5 [73] or LiteRed [74] can be used. Examples of two-loop diagrams contributing to Higgs boson pair production are shown in Fig. 2.
We would like to point out again that the distinction between "triangle diagrams" and "box diagrams" becomes ambiguous beyond the leading order. At two-loop and beyond there are diagrams which contain triangle sub-diagrams but which do not contain the Higgs boson self coupling, see Fig. 2k.

Integral families and reduction
For the reduction of planar diagrams we have defined five integral families. Each family contains nine propagators which allows irreducible scalar products in the numerator to be written in terms of inverse propagators prior to reduction. We chose a non-minimal set of integral families in favour of preserving symmetries as much as possible. We find that integrals with up to four inverse propagators appear in the amplitude and must be reduced. The families are listed in Table 1. The amplitude generation leads to about 10000 integrals before any symmetries are taken into account. After accounting for symmetries and after reduction (complete reduction of the planar sectors and partial reduction of the non-planar ones), we end up with 145 planar master integrals plus 70 non-planar integrals, and a further 112 integrals that differ by a crossing. As these integrals contain four independent mass scales,ŝ,t, m 2 t , m 2 h , only a small subset is known analytically. Besides the diagrams which are factorizing into two one-loop diagrams [56], the known integrals are the twoloop diagrams with two light-like legs and one massive leg, which enter single Higgs boson production, calculated e.g. in Refs. [75][76][77][78][79], and the triangles with one light- Table 1: Integral families for the reduction of the planar diagrams. The non-planar integrals were computed as tensor integrals, see text.
As the integral basis is not unique, we choose to have two set-ups, relying on different sets of basis integrals. This serves as a strong check of the calculation of the virtual amplitude. It has previously been noted that using a finite basis [85] along with sector decomposition can increase the precision obtained by numerical integration for a given number of sampling points [86]. We also observed that switching to a finite basis in some of the planar sectors turned out to be beneficial for the numerical evaluation of the master integrals. A complete reduction could not be obtained for the non-planar 4-point integrals. The inverse propagators appearing in unreduced integrals were rewritten in terms of scalar products such that the resulting integrals had the lowest possible tensor rank. The tensor integrals (up to rank 4) were then directly computed with SecDec.
We would like to mention that non-planar diagrams also contribute to the leading colour coefficient. Therefore we could not identify a contribution which is both dominant and gauge invariant where only planar integrals contribute.

Renormalization
We expand the amplitude in a 0 = α 0 /(4π), where α 0 is the bare QCD coupling. The bare amplitude can be written as where the one-and two-loop coefficients are given by Here µ 2 0 is a parameter introduced in dimensional regularisation to maintain a dimensionless bare coupling and S = e −γ E (4π) , with γ E the Euler constant. The one-loop amplitude is expanded to O( 2 ) as it appears multiplied by the Catani-Seymour insertion operator stemming from the integrated dipoles, I, which has poles of O( −2 ). To renormalize the gluon wave function we must multiply the amplitude by (Z A ) 1 2 for each external gluon leg, where Z A is the gluon field renormalization constant. We renormalize the QCD coupling using the relation where α s is the renormalized coupling and Z a is the associated renormalization constant.
Here µ R is the renormalization scale and the dependence of α s on µ R is implicit. The top mass is renormalized by relating the bare top mass m 2 t 0 to the renormalized top mass In practice, we compute top mass counter-term diagrams, treating a δm 2 t as a counterterm insertion in top quark lines and renormalize the top Yukawa coupling using No Higgs wave function or mass renormalization is required as we compute only QCD corrections.
In our calculation we use conventional dimensional regularization (CDR) with D = 4 − 2 . We renormalize the top mass in the on-shell scheme and the QCD coupling in the MS five-flavour scheme (N f = 5) with the top quark loops in the gluon self-energy subtracted at zero momentum. The one-loop renormalization constants are given to first order in a by 2 26) and the mass counter-term in the on-shell scheme is given by The coefficientsb i in (2.19), (2.20) contain integrals I r,s (ŝ,t, m 2 h , m 2 t ), where r denotes the number of propagators in the denominator and s denotes the number of propagators in the numerator and therefore defines the tensor rank of the integral. The integrals have mass dimension [I r,s ] = D L − 2r + 2s, with L the number of loops. We may therefore factor a dimensionful parameter M out of each integral such that they depend only on dimensionless ratios The renormalized amplitude may then be written as Since δm 2 t contains poles of O( −1 ) the coefficient c of the top mass counter-term must be expanded to O( ). It is obtained by the insertion of a mass counter-term into the heavy quark propagators, where a, b, c are colour indices in the fundamental representation. Alternatively, the mass counter-term can be obtained by taking the derivative of the one-loop amplitude with respect to m.
The coefficients b and c in (2.29) are calculated numerically. We have extracted the dependence of the coefficients on the renormalization scale and introduced a dependence on a new scale, M , which we keep fixed in our numerics. For the infrared singularities stemming from the unresolved real radiation, we use the Catani-Seymour subtraction scheme [88]. The infrared poles of the virtual amplitude are cancelled after combination with the I-operator, which is given by where K g is also defined by the Catani-Seymour subtraction scheme [88]. Inserting the I-operator into the Born amplitude leads to 3 where we again have extracted a factor (M 2 ) − from the integrals contained in the one-loop amplitude. Using (2.29) and (2.36) we therefore have By construction the double pole in must vanish, thus (2.37) implies Substituting the above relation back into (2.37) we see that the dependence on the renormalization scale µ R cancels in the single pole term. The dependence of the cross section on the factorization scale is encoded in the P and K terms of the Catani-Seymour framework [88].

Integration of the two-loop amplitude
To evaluate the two-loop integrals appearing in the amplitude we first apply sector decomposition as implemented in SecDec. In the Euclidean region sector decomposition resolves singularities in the regulator , leaving only finite integrals over the Feynman parameters which can be evaluated numerically. In the physical region we treat the integrable singularities by contour deformation [83,[89][90][91]. To obtain the differential cross section we have to evaluate integrals at phase space points very close to threshold, where no special treatment was necessary but numerical convergence was considerably harder to achieve. and store a j as a vector containing the coefficients of I j in the expressions for b (2) k , leading to the amplitude structure given in Eq. (2.32). Structuring the code this way allows us to dynamically set the number of sampling points used for each integral according to its contribution to the amplitude. After calculating each integral with a fixed number of sampling points, we assume that the integration error ∆ j of the integrals scales as ∆ j ∝ t −α j with the integration time t j . To efficiently calculate the results b (i) k with a given relative accuracy ε k , we estimate the required number of sampling points for each integral such that the total time j,k is the error estimate of integral I j including its coefficients in b andλ is a Lagrange multiplier. Since the loop integrals can contribute to several results b (i) k , we apply the above optimization formula for each required order in and for both form factors. For each integral, we then use the maximum of the estimated number of required sampling points. Instead of directly evaluating each integral with the calculated number of sampling points, we limit the number of new sampling points and iterate this procedure to reach the desired accuracy, updating the estimated number of sampling points after each iteration. The desired accuracy for the finite part of the two-loop amplitude (ε (2) 0 ) is set to 3% for form factor F 1 and (depending on the ratio F 2 /F 1 ) to a value of 5-20% for form factor F 2 . For the integration we use a quasi-Monte Carlo method based on a rank-one lattice rule [92][93][94]. For suitable integrands, this rule provides a convergence rate of O(1/n) as opposed to Monte Carlo or adaptive Monte Carlo techniques, such as Vegas [95], where n is the number of sampling points. While we observe a convergence rate of O(1/n) for most of the integrals, the convergence of some integrals is worse and we therefore assume a scaling of ∆ j (t j ) with exponent α = 0.7 when estimating the number of required sampling points. The integration rule is implemented in OpenCL 1.1 and a further (OpenMP threaded) C++ implementation is used as a partial cross-check. The 913 phase-space points at 14 TeV (1029 phase-space points at 100 TeV) used for the current publication were computed with ∼16 dual Nvidia Tesla K20X GPGPU nodes. More details on the numeric evaluation of the amplitudes can be found in Refs. [96,97].

Real radiation
As we calculate a process which is loop-induced, the NLO corrections involve two-loop integrals. But, for the real part only single-unresolved radiation can occur. This means that a standard NLO infrared subtraction scheme can be used. We use the Catani-Seymour dipole formalism [88], combined with a phase space restriction parameter α to restrict the dipole subtraction to a limited region, as suggested in Ref. [98]. There are four partonic channels for the real radiation contribution to the cross section: σ r (gg → hh + g), σ r (gq → hh + q), σ r (gq → hh +q), σ r (qq → hh + g) . (2.42) Including all crossings, there are 78 real radiation diagrams. Infrared singularities only originate from initial state radiation, diagrams with extra gluons radiated from a heavy quark line are infrared finite, which implies that the qq channel is finite. Example diagrams are depicted in Fig. 3.
t t t t t Figure 3: Examples of diagrams contributing to the real radiation part at NLO. The diagrams in the second row do not lead to infrared singularities.

Expansion in 1/m 2 t
We have calculated top mass corrections as an expansion in 1/m 2 t in the following way: we write the partonic differential cross section as where Λ ∈ √ŝ , √t , √û , m h , and determine the first few terms (up to N = 3) of this asymptotic series. The case N = 0 reproduces to the usual effective theory approach, without the need to calculate Wilson coefficients separately, however. To generate the diagrams we again use qgraf [69]. The generation and expansion of the amplitude in small external momenta is then performed using q2e/exp [99,100] and leads to two-loop vacuum integrals inserted into tree-level diagrams as well as oneloop vacuum integrals inserted into massless one-loop triangles. Whereas the vacuum integrals are evaluated with Matad [101], the massless integrals can be expressed in terms of a single one-loop bubble, which we achieve with the help of Reduze [72]. Again, the algebraic processing of the amplitude is done with Form [70,71].
The exact and expanded matrix elements were combined in the following way: a series expansion for the virtual corrections was performed then rescaled with the exact born, The first identity is valid because the colour structure of the exact and the expanded LO cross section are identical, and the second because the sum in the bracket is finite. Thus one needs to know only the dependence of the expanded LO cross section in this approximation. There is some ambiguity when to do the rescaling, i.e. before or after the phase-space integration, and convolution with the PDFs. We opt to do it on a fully differential level, i.e. the rescaling is done for each phase-space point individually.

Checks of the calculation
We have verified for all calculated phase space points that the coefficients of the poles in are zero within the numerical uncertainties. For a randomly chosen sample of phase-space points we have calculated the pole coefficients with higher accuracy and obtained a median cancellation of five digits. Our implementation of the virtual two-loop amplitude has been checked to be invariant under the interchange oft andû at various randomly selected phase-space points. Single Higgs boson production has been re-calculated with the same setup for the virtual corrections and compared to the results obtained with the program Sushi [102]. Further, the one-loop amplitude has been computed using an identical framework to the two-loop amplitude and has been checked against the result of Ref. [49]. As a further cross-check we have also calculated top mass corrections as an expansion in 1/m 2 t as explained above. We have also compared to results provided to us by Jens Hoff for the orders N = 4, 5, 6 in the expansion above, worked out in [55]. The result of the comparison is shown in Fig. 4. One can see that below the 2m t threshold, where agreement is to be expected, the expansion converges towards the full result. The computation of the mass counter-term diagrams has been cross-checked by ex- where A ct,(1) is the one-loop top quark mass counter-term. On the real radiation side, we have verified the independence of the amplitude from the phase space restriction parameter α. We have also varied the technical cut p min T in the range 10 −2 ≤ p min T / √ŝ ≤ 10 −6 to verify that the contribution to the total cross section is stable and independent of the cut within the numerical accuracy. Further, we have compared to the results of Ref. [52] for the Born-improved HEFT and FT approx approximations and found agreement within the numerical uncertainties [103].     Table 2. The full NLO result has a statistical uncertainty of 0.3% at 14 TeV (0.16% at 100 TeV) stemming from the phase space integration and an additional uncertainty stemming from the numerical integration of the virtual amplitude of 0.04% at 14 TeV and 0.2% at 100 TeV. These uncertainties are not included in Table 2, where only scale variation uncertainties are shown.

NLO distributions
In this section we show differential distributions at √ s = 14 TeV and √ s = 100 TeV for various observables and compare to the approximate results in order to assess the effect of the full top quark mass dependence at NLO. Results which are obtained within the effective field theory approach without reweighting by the leading order results in the full theory are always denoted by "basic HEFT", while "B-i. NLO HEFT" stands for the Born-improved NLO HEFT result, where the NLO corrections have been calculated in the m t → ∞ limit and then a reweighting factor B F T /B HEF T is applied (on differential level, B F T stands for the Born amplitude squared in the full theory). We decided to take the same bin sizes as in Ref. [61], such that the differences to the effective theory results can be exhibited most clearly. In Fig. 5    Comparing the results at 14 TeV and 100 TeV, we observe that the differences of the full NLO result to the Born-improved HEFT and also to the FT approx result are amplified at 100 TeV, as expected, as the HEFT approximation does not have the correct high energy behaviour. This scaling behaviour will be discussed more in detail below. We also see that the K-factor is far from being uniform for the m hh distribution, while the "basic HEFT" results suggest a uniform K-factor.  The p T,h distribution shown in Fig. 6 denotes the distribution of the "single inclusive" Higgs boson transverse momentum, which denotes the transverse momentum distribution of any (randomly picked) Higgs boson. In contrast, Fig. 7 shows the transverse momentum distributions of the leading-p T ("harder") and subleading-p T ("softer") Higgs boson. It again becomes very clear that reweighting the basic HEFT result is indispensable in order to get at least somewhat close to the shape of the full NLO result. The p T,h distribution in Fig. 6a shows that, while the Born-improved NLO HEFT result starts moving out of the scale variation band of the full NLO result at 14 TeV beyond p T,h ∼ m t , the FT approx result stays within the scale uncertainty band of the full NLO result, (even though it is clear that it systematically overestimates the full result by about 20-30%). This is not surprising, as the tail of the p T,h distribution is to a large extent dominated by the real radiation contribution. At √ s = 100 TeV, the FT approx result leaves the scale variation band of the full NLO result beyond p T,h ∼ 280 GeV, but still is much closer to the full result than the Born-improved NLO HEFT result. The differences of the latter to the full result are amplified at 100 TeV. In any case, it is clear that the scale variation bands can only be indicative of missing higher order corrections in perturbation theory, while the top quark mass effects (or the omission of the exact top quark mass dependence) are in a different category. Therefore one cannot expect that, for example, the NLO HEFT scale variation band would comprise the full NLO result. It is also worth mentioning that the "FT approx " approximation [52], where the partial two-loop results (known from single Higgs production) were included, turned out to be a worse approximation than "FT approx ", where the virtual part is given by the Born-improved NLO HEFT result, as it lead to a larger cross section than the "FT approx " one, and the latter is still larger than the full result. Note that for 2 → 2 scattering the transverse momentum of the Higgs boson is given by p 2 T =ŝ 4 β 2 h sin 2 θ. Therefore, at leading order, the p T,h transverse momentum distribution directly reflects the angular dependence of the virtual amplitude. However, at NLO, the angular dependence of the form factors is influenced to a large extent by the real radiation. This can be seen from the distributions of the leading-p T ("harder") and subleading-p T ("softer") Higgs bosons shown in Fig. 7. The Higgs boson will pick up a large transverse momentum if it recoils against a hard jet, therefore the K-factor of the p hard T,h grows in the tail of the distribution, which is dominated by 2 → 3 kinematics. Fig. 8 shows the rapidity distributions of both the Higgs boson pair and the leading-p T Higgs boson. As the mass effects are uniformly distributed over the whole rapidity range, the K-factors are close to uniform for these distributions, and the FT approx result is within 10% of the full result. In Fig. 9 we display the tails of the m hh and p T,h distributions on a logarithmic scale, in order to exhibit the scaling behaviour in the high energy limit. Using leading-log high energy resummation techniques, it can be shown [108] that at high transverse momentum, the differential partonic cross section for single Higgs (+jets) production dσ/dp T,h ∼ 1/p a T,h scales with a = 2 in the full theory, however with a = 1 in the effective theory. This behaviour also has been recently confirmed by a (leading order) calculation of Higgs + 1,2,3 jet production with full mass dependence [109]. In order to investigate the high energy scaling behaviour we fitted a line to the tail of the leading order m hh distribution (with the luminosity factor set to one, plotted logarithmically), and found the following scaling behaviour: with full mass dependence, the scaling is as m −3 hh for dσ/dm hh i.e. the partonic cross section scales asŝ −1 , while in the basic HEFT approximation the scaling is as m hh for dσ/dm hh i.e. the partonic cross section grows asŝ. From Fig. 9 one can see that this relative difference in the high-energy scaling behaviour between the full calculation and the basic HEFT approximation is similar at NLO. In Fig. 10 we show distributions for an improved FT approx , which is supplemented with higher order terms in the expansion of the virtual amplitude in 1/m 2 t as given by Eq. (2.44), dubbed "exp. virt." for "expanded virtuals". We see a trend similar to the one for the virtual (plus I-operator) part shown in Fig. 4. In order to better account for missing higher order corrections it is desirable to combine the full NLO with NNLO results obtained in the HEFT, ideally on a differential level. As a first attempt to achieve this, we take the NNLO to NLO ratio from Ref. [   from Ref. [61] described in the main text.

Sensitivity to the triple Higgs coupling
As already mentioned in Section 2.1, the Higgs boson self-coupling in the Standard Model is quite special. Not only that it is completely determined in terms of the Higgs boson mass and VEV, but it also leads to the fact that at the double Higgs production threshold √ŝ = 2m 2 h , the LO cross section is almost vanishing, due to destructive interference between box and triangle contributions. Therefore a measurement of the Higgs boson self-coupling is a very sensitive probe of New Physics effects. A more complete analysis of such effects would require an approach where further operators are taken into account, for example operators which mediate direct ttHH couplings (and Higgs-gluon couplings which can differ from the SM HEFT ones), see e.g. [37][38][39]. However, the conclusions drawn from the calculation of NLO corrections in the m t → ∞ limit to the extended set of EFT Wilson coefficients have to be taken with a grain of salt, as the full top quark mass dependence may affect them considerably. In this section we would like to focus on just a single line in the parameter space of possible non-SM Higgs couplings and investigate the behaviour of the m hh distribution under variations of λ, where we have defined λ hhh = 3m 2 h λ, see Eq. (2.7). In Fig. 12 we show the total cross section as a function of λ. As already observed for the LO cross section [22], it has a minimum around λ = 2. Negative λ values, which are not excluded neither theoretically nor experimentally (within certain broad limits given e.g. by vacuum stability), do not lead to destructive interference and therefore result in a much larger cross section. For large positive values, λ ∼ 5, the total cross section is of comparable size to the one for λ 0, but the shape of the m hh distribution is completely different. This can be seen in Fig. 13, where we show the Higgs boson pair invariant mass distribution for various values of the Higgs boson self-coupling, at √ s = 14 TeV and √ s = 100 TeV. For λ = 5, the differential cross section is mainly dominated by contributions containing the Higgs boson self coupling and peaks at low m hh values. In contrast, the λ = 0 case, which does not contain any triple Higgs coupling contribution, peaks shortly beyond the 2m t threshold at m hh ∼ 400 GeV, as does the case λ = −1. In the latter case, however, the total cross section is much larger.
The case λ = 2 shows a dip at m hh ∼ 300 GeV, which is due to destructive interference effects as mentioned above. At 100 TeV, the shape of the distributions is very similar.  However, the fact that the cross sections are much larger can be exploited to place cuts which enlarge the sensitivity to the Higgs boson self coupling. For example, one can try to enhance the self-coupling contribution by cuts favouring highly boosted virtual Higgs bosons, decaying into a Higgs boson pair which could be detected in the bb bb channel. A highly boosted virtual Higgs boson must recoil against a high-p T jet. Therefore, an enhancement of the boosted component could be achieved by imposing a p min T,jet cut on the recoiling jet in Higgs boson pair plus jet production [110,111]. An additional advantage of boosted Higgs bosons is the fact that they lend themselves to the use of the bbbb rather than the bbγγ decay channel, as the decay channel into b-quarks is accessible through boosted techniques. This leads to a gain in the rate which easily makes up for the loss in statistics due to a high p min T,jet cut. Fig. 14 shows a comparison to the different approximations for various values of λ, as well as the K-factors. For all values of λ, the K-factors are far from being uniform, while the HEFT approximation suggests almost uniform K-factors for λ ≤ 1. For λ = 2, we see a pronounced "interference dip" at m hh ∼ 330 GeV, which is present at LO already. We can get an idea about the destructive interference effect by observing the following: In the basic HEFT approximation, the squared Born amplitude is given by Eq. (2.14) This expression has a double zero atŝ = m 2 h (1 + 3λ). Therefore, the re-weighting factor B F T /B HEF T can get large when B HEF T approaches zero, i.e. at √ s 330.72 GeV for λ = 2, √ s 395.29 GeV for λ = 3, √ s 450.7 GeV for λ = 4 and 500 GeV for λ = 5. In the full theory, the amplitude does not vanish completely at these points, but nonetheless also gets small, which should be the reason for the dips in the m hh distributions for λ = 2 and 3.

Conclusions
We have presented results of a fully differential calculation of Higgs boson pair production in gluon fusion at NLO retaining the exact top quark mass dependence. For the total cross section at √ s = 14 TeV, we found a reduction of 14% compared to the Born improved HEFT, and a 24% reduction at √ s = 100 TeV. For differential distributions, the mass effects can be even larger. In the tails of the Higgs boson transverse momentum distributions, the differences to the Born improved NLO HEFT approximation amount to more than 50%, while the FT approx result, where the full top mass dependence is included only in the real radiation part, stays within 20% of the full result. The basic NLO HEFT approximation, where no reweighting by the Born result in the full theory is performed, fails to properly describe the shape of the m hh and p T h distributions, in particular in the tails of the distributions, where we performed an analysis of the high-energy scaling behaviour. We also studied the influence of non-standard values for the Higgs boson self-coupling on the total cross sections and m hh distributions. As is known from leading order, there is destructive interference between various contributions to the cross section, and this feature persists at NLO. Varying λ hhh /λ SM leads to a minimum in the value for the total cross section around λ hhh /λ SM ∼ 2.3. The shape of the m hh distribution is rather sensitive to variations of λ hhh , which alter the interference pattern. For example, at λ hhh = 0, the total cross section is almost as large as for λ hhh /λ SM = 5, but the shape of the distributions is very different. Further, we made a first attempt to combine the full NLO results with the NNLO results calculated in the basic HEFT approximation [61] at differential distribution level, which should lead to a "NLO-improved NNLO HEFT" result, which may still be improved in the near future in various directions, for example towards Higgs boson decays.
PZ00P2 154829. GH would like to acknowledge the Kavli Institute for Theoretical Physics (KITP) for their hospitality. We gratefully acknowledge support and resources provided by the Max Planck Computing and Data Facility (MPCDF).