Exploring anomalous couplings in Higgs boson pair production through shape analysis

We classify shapes of Higgs boson pair invariant mass distributions mhh, calculated at NLO with full top quark mass dependence, and visualise how distinct classes of shapes relate to the underlying coupling parameter space. Our study is based on a five-dimensional parameter space relevant for Higgs boson pair production in a non-linear Effective Field Theory framework. We use two approaches: an analysis based on predefined shape types and a classification into shape clusters based on unsupervised learning. We find that our method based on unsupervised learning is able to capture shape features very well and therefore allows a more detailed study of the impact of anomalous couplings on the mhh shape compared to more conventional approaches to a shape analysis.


Introduction
The Higgs sector as we see it today is probably just a glimpse of an underlying more general structure still awaiting to be explored. Manifestations of new physics at higher scales would lead to operators which on one hand introduce new, effective couplings and on the other hand also modify interactions known in the Standard Model (SM). Therefore it is a primary goal for collider physics in the next decades to constrain the couplings, in particular in the Higgs sector, to an unprecedented precision. This is particularly true for the Higgs boson self couplings, in order to find out whether the Higgs potential is indeed of the form assumed by the SM. Deviations from this form could provide strong hints about how to extend the SM. The trilinear Higgs boson coupling λ can be constrained by measurements of Higgs boson pair production [1,2], where the gluon fusion channel yields the largest cross section, and the most stringent 95% CL limits on the total gg → HH cross section at √ s = 13 TeV are currently σ HH max = 6.9 × σ SM , constraining trilinear coupling modifications to the range −5.0 ≤ λ/λ SM ≤ 12.0 [1]. The trilinear Higgs couplings can also be constrained in an indirect way, through measurements of processes which are sensitive to these couplings via electroweak corrections [3][4][5][6][7][8][9][10][11][12][13][14]. Such processes offer important complementary information, however they are susceptible to other BSM couplings entering the loop corrections at the same level, and therefore the limits on c hhh = λ/λ SM extracted this way may be more model dependent than the ones extracted from the direct production of Higgs boson pairs. A corresponding experimental analysis based on single Higgs boson production processes already has been performed [15]. Under the assumption that all deviations from the SM expectation are stemming from a modification of the trilinear coupling, the derived bounds on c hhh at 95% CL are −3.2 ≤ c hhh ≤ 11.9 [15]. However once the couplings to vector bosons and/or fermions are allowed to vary as well, these bounds deteriorate very quickly. The idea of indirect constraints through loop corrections also has been employed trying to constrain the quartic Higgs boson self-coupling from (partial) EW corrections to Higgs boson pair production [16,17]. Theoretical constraints on c hhh are rather loose if derived in a largely model independent way. Recent work based on general concepts like vacuum stability and perturbative unitarity suggests that |c hhh | 4 for a new physics scale in the few TeV range [18][19][20][21]. More specific models can lead to more stringent bounds, see e.g. Refs. [22][23][24][25][26]. Recent phenomenological studies about the precision that could be reached for the trilinear coupling at the (HL-)LHC and future hadron colliders are summarised in Refs. [27][28][29].
Higgs boson pair production in gluon fusion in the SM has been calculated at leading order in Refs. [30][31][32], the NLO QCD corrections with full top quark mass depen-dence became available more recently [33][34][35]. Implementations of the full NLO QCD corrections in parton shower Monte Carlo programs are also available [36][37][38]. NNLO QCD corrections have been computed in the m t → ∞ limit in Refs. [39][40][41][42][43]. The calculation of Ref. [43] has been combined with results including the top quark mass dependence as far as available in Ref. [44], and the latter has been supplemented by soft gluon resummation in Ref. [45]. The scale uncertainties at NLO are still at the 10% level, while they are decreased to about 5% when including the NNLO corrections. The uncertainties due to the chosen top mass scheme have been assessed in Ref. [35]. Analytic approximations for the top quark mass dependence of the two-loop amplitudes in the NLO calculation have been studied in Refs. [46][47][48][49][50]. Complete analytic results in the high energy limit have been presented in Ref. [51]; the latter have been combined with the full NLO results in the regions where they are more appropriate in Ref. [52]. The effects of operators within an Effective Field Theory (EFT) description of Higgs boson pair production have been studied at LO QCD in Refs. [21,[53][54][55][56][57][58] and at NLO in the m t → ∞ limit in Refs. [59][60][61], including also CP-violating operators [62]. EFT studies at NNLO in the m t → ∞ limit are also available [63]. In Ref. [64] for the first time the full NLO QCD corrections have been combined with an EFT approach to study BSM effects. It is well known that Higgs boson pair production in gluon fusion is a process where delicate cancellations occur between triangle-type contributions -containing the trilinear Higgs coupling -and box-type contributions not containing the trilinear coupling. While the destructive interference between these contributions is usually seen as a curse leading to small cross sections, it can be turned into a virtue when analyzing the shapes of distributions, as for example the di-Higgs invariant mass distribution m hh , because even small anomalous couplings can lead to characteristic shape changes. Therefore it is important to investigate in which way the shapes are influenced by a certain configuration in the coupling parameter space. The idea of a shape analysis has been pursued in detail in Refs. [56,65,66], where a cluster analysis is proposed to define 12 benchmark points in a 5-dimensional non-linear EFT parameter space which result from clusters of "similar" shapes. The similarity measure in this case is based on a binned likelihood ratio using LO predictions for the observables m hh , cos θ * and p T,h . In Ref. [64] it was analyzed how the m hh and p T,h distributions change when going from LO to NLO for the benchmark points defined in Ref. [56]. As a function of the 5-dimensional coupling parameter space, the m hh distribution can have a few characterising features, such as an enhanced low-m hh region, a double peak, a single peak or an enhanced tail. Some of these features can be attributed rather easily to a certain anomalous coupling. For example, an enhanced low-m hh region is naturally produced by values of |c hhh | > 3, as this leads to a dominance of the triangle-type contributions, which are suppressed by 1/ŝ and therefore die out quickly for larger m hh values. Other features of the m hh -shape, like a double peak or a SM-like distribution, are harder to attribute to a certain coupling configuration, as there are a multitude of configurations leading to such shapes. This is also reflected in the cluster analysis proposed in Ref. [65], where (a) very different coupling configurations can end up in the same cluster, and (b) the same cluster can contain shapes which "by eye" look quite different (for example "double peak" and "single peak"). Therefore it is desirable to seek for alternative methods to extract information about the underlying parameter space from the shape of distributions in Higgs boson pair production. In this work we first classify the shapes of Higgs boson pair invariant mass distributions, calculated at full NLO, into four characteristic types. We visualise the underlying 5-dimensional EFT parameter space producing these shape types, projecting onto 2-dimensional subspaces. We also comment on the shape of the p T,h distribution. Then we refine the shape analysis, applying an unsupervised learning algorithm based on an autoencoder to identify patterns in the shapes of the m hh distribution. We use the KMeans clustering algorithm from scikit-learn [67] and ask for a classification of the shapes into four to eight clusters. The unsupervised classification into four clusters is compared to the analysis based on predefined clusters for validation. Then the number of clusters is increased with the aim to find an "optimal" number of clusters, in the sense that it captures distinct shape features, but does not focus on minor details. One aim of this study is to offer an alternative to the cluster analysis proposed in Refs. [56,65,66]. While the benchmark points representing a cluster are isolated points in the parameter space, the procedure we propose here allows us to associate certain shapes more straightforwardly with distinct regions in the parameter space. The application of machine learning techniques in high energy physics, in particular to constrain the EFT/new physics parameter space, has been brought forward already some time ago [68][69][70][71], with successful applications in jet and top quark identification [72][73][74][75][76][77][78][79][80][81], new physics searches [70,71,[82][83][84][85][86][87][88][89][90] and PDFs [91]. Shape analysis with machine learning has been applied already to constrain anomalous Higgs-vector boson couplings in HZ production [92]. The remainder of this work is structured as follows: In Section 2 we explain the framework our data samples are based on. We define four different shape types for the m hh distribution and visualise the parameter space underlying the predefined shape types. In Section 3 we describe our cluster analysis based on unsupervised learning and show how the clusters found by this procedure relate to the underlying parameter space, before we conclude.
2 Classification through predefined shape types

Parametrisation of anomalous couplings in the Higgs sector
As a starting point we use the effective Lagrangian in a non-linear Effective Field Theory ("Higgs Effective Field Theory, HEFT") relevant for Higgs boson pair production, assuming CP conservation, up to order 4 in the chiral expansion [64,93]: In the SM c t = c hhh = 1 and c tt = c ggh = c gghh = 0. The chromomagnetic operator is absent in (2.1) because it contributes to gg → hh only at higher order in the chiral counting. The coefficients c ggh and c gghh are related in SMEFT ("SM Effective Field Theory") [29,94,95], however in HEFT there is not necessarily a relation between the two parameters. To clarify the relation to the widely used SMEFT operators, we briefly comment on the SMEFT Lagrangian here. The dimension-6 terms relevant for gg → hh can be written as [55,59] Relating the coefficientsc i in Eq. (2.2) to the couplings of the physical Higgs field h and comparing with the corresponding parameters of the chiral Lagrangian Eq. (2.1), one finds, after a field redefinition of h to eliminatec H from the kinetic term, Note that, assuming an underlying weakly coupled gauge theory, dimension-6 operators involving field-strength tensors can only be generated through loop diagrams [96]. Their coefficients then come with an extra factor of 1/16π 2 . In this case, the coefficientsc ug andc g in Eq. (2.2) are counted as order For more details about the difference between HEFT and SMEFT we refer to Refs. [29,64]. We produce our data using the differential distributions calculated in Ref. [64], parametrised in terms of coefficients A i for each coupling combination occurring in the (differential) NLO cross section, which allows for a fast evaluation: The coefficients A i are evaluated in bins of width 20 GeV from 250 GeV to 1050 GeV, i.e. for 40 bins. They are available with Ref. [64] as .csv tables, in units of fb/GeV, for √ s = 13, 14 and 27 TeV. The median of the statistical uncertainties of the differential coefficients A i does not exceed 3%, however in the bins beyond m hh 650 GeV some A i coefficients have uncertainties in the 20-30% range.

Definition of shape types
We distinguish four types of characteristic shapes for the Higgs boson invariant mass distribution m hh : 1. Enhanced low m hh region, constantly falling distribution as m hh increases.
2. Double peak with peaks separated by more than 100 GeV.
3. Single peak near the tt production threshold at m hh ∼ 346 GeV.
4. Double peak with peaks separated by less than 100 GeV.
Examples of the four shape types are shown in Fig. 1. According to our classification the Standard Model shape is contained in distributions of kind 3. Certainly there is some arbitrariness in the definition of these shapes. For example, shapes of kind 4 would move to kind 1 or 3 for bin widths ≥ 100 GeV. However, the other three shape types are quite robust and would be clearly distinguishable experimentally. Based on the parametrisation in Eq. (2.5), the normalised differential cross section is computed for a 5-dimensional grid in the coupling parameter space and according to its behaviour is classified into one of the four shape types. For this purpose we wrote a function, called "analyzer" in the following, that checks the slopes of the distribution and puts it into the corresponding class. At this stage the shape classes are mutually exclusive. For each point in the coupling parameter space, we also consider the variations of the result in each bin due to inclusion of the statistical uncertainties on the coefficients A i . If the shape obtained after these variations belongs to a different kind, we exclude that point from the data set. We find that for shape type 4 about 20% of points fall into this category and are therefore excluded, while for shape type 2 it is about 8%, and for types 1 and 3 it is less than 5%. Scale variations have not been included, as they tend to be rather uniform over the whole m hh range [38,64] and therefore would not significantly modify our shape analysis.

Classification of m hh distributions
Our results for the gg → hh cross sections at NLO are produced for a centre-ofmass energy of √ s = 13 TeV, using PDF4LHC15 nlo 100 pdfas [97] parton distribution functions interfaced via LHAPDF, along with the corresponding value for α s . The masses have been set to m h = 125 GeV, m t = 173 GeV and the top quark width has been set to zero.
We study the differential cross section as a function of five anomalous couplings, varying them in the ranges specified below, (2.6) The ranges are motivated by current experimental constraints. For c hhh we use a smaller range than suggested by experiment in order to focus more on the range where interesting shape features are present. In order to visualise the results, we project out 2-dimensional slices of the 5-dimensional parameter space, fixing the other three couplings to their SM values. This leads to a total of ten configurations. For each of these ten projections we generated a set of 10 6 parameter pairs. Feeding them through our analyzer we obtain the shape type produced by the given  Considering variations of c ggh versus c gghh , shown in the right-hand panel of Fig. 2, we find only shapes of kind 2 (green) and SM-like shapes (red). The existence of kind 2 shapes means that a double peak structure could be produced solely by effective Higgs-gluon couplings, while keeping c hhh , c t and c tt at their SM values. However, for the more likely case that c ggh deviates only slightly from zero [98], and so does c gghh , these couplings do not distort the SM shape significantly. Variations of c t versus c tt and c hhh versus c tt are shown in Fig. 3. Varying only c t and c tt , the shapes remain mainly SM-like. A small area in the c t − c tt plane however contains doubly peaked m hh distributions, which thus can originate from anomalous top-Higgs couplings only, while the trilinear Higgs coupling remains fixed at its SM value. Turning to c hhh versus c tt , displayed in the right-hand panel of Fig. 3, we find that for kind 1 and kind 4 shapes the parameter regions are split into two disconnected parts. While shapes of kind 1 are favoured by large values of c hhh , it becomes clear that large values of c tt , also related to triangle-type diagrams, can counterbalance this effect, because the top right corner is not a parameter region producing shapes of kind 1. If both c hhh and c tt are large, it is more likely to produce a double peak structure with close-by peaks (kind 4, blue). Further we see that shapes of kind 2 (well separated double peak structure, green) can be produced by values of c tt and c hhh which are rather close to the SM values.    as a transition from kind 1 to kind 3. Close-by double peaks (kind 4, blue) however are mostly associated to negative c hhh values. Note that a similar pattern can be found in Fig. 2 (left). Variations of c hhh versus c gghh , shown in Fig. 5 (right), are similar in the overall behaviour, and again show that c gghh and c ggh have a similar impact on the shape. Fig. 6 shows variations of c tt versus c ggh (left) and c tt versus c gghh (right). We observe that SM-like shapes (red) are preferred. However, doubly peaked structures are also possible for c tt values not too far from the SM value (c tt = 0). We also notice the similarity to Fig. 3 (left). The behaviour with respect to c gghh is again similar. Note that in SMEFT c ggh and c gghh are related, so this behaviour would necessarily be the case. However we will see later that a shape classification algorithm based on unsupervised learning is able to detect shape differences which distinguish effects of c ggh and c gghh . An interesting feature is also that kind 1 (black) and kind 4 (blue) shapes appear only when we modify the value of c hhh : for c hhh = 1 shapes of kind 1 never occur, and shapes of kind 4 are very unlikely. Further, the kind 4 shapes tend to point to (moderately) negative values of c hhh as long as c tt is close to zero, as can be seen from Figs. 2, 3 and 5.

Classification of p T,h distributions
So far we have studied m hh distributions, assuming that they are very well suited to study the sensitivity to shape changes induced by anomalous couplings. In order to verify that we do not miss out interesting features in the transverse momentum distributions, we also present a study of the p T,h distributions, but only at LO, to assess the salient features. The main difference with respect to the m hh case is that in the p T,h analysis we could identify only two kinds of clearly distinct shapes: single peak (SM-like, which we denote as 'p T,h kind 3') and double peak, with peaks separated by at least 30 GeV (denoted as 'p T,h kind 2'). Examples of such p T,h shapes are shown in Fig. 7.  The parameter spaces leading to singly or doubly peaked shapes are shown in Fig. 8 for the c t − c hhh and c hhh − c tt configurations. The parameter region related to shapes with a well separated double peak (green) is similar to the m hh case, as one can see comparing with Figs. 2 and 3. This indicates that the underlying parameter space leads to similar characteristics for the distributions differential in p T,h and m hh , however the p T,h distribution is less sensitive than the m hh distribution.

Classification and clustering by unsupervised learning 3.1 Unsupervised learning procedure
To assess the bias introduced by the definition of the four shape types we can approach the classification problem using unsupervised learning techniques. We construct a classification of the shapes of the m hh distribution into distinct types, where we do not predefine what the types should look like. For this purpose we use an autoencoder to find common patterns in the data and thus achieve a compressed representation. The setup is implemented using Keras [99] and TensorFlow [100]. As input data we use 30 bins of width 20 GeV for the normalised m hh distributions. We train the network based on a set of 10000 distributions, retaining 3000 for the validation. The encoder architecture, i.e. the part compressing the array information, is composed of two dense layers with 20 nodes and a middle layer with 4 nodes, the latter defining the length of the array containing the compressed information. The decoder architecture, which reconstructs the original array from the compressed one, is composed of two dense layers of 20 nodes and an output layer of the length of the input array.
We trained the autoencoder model over 10000 epochs using Adam [101] as optimizer and the root mean square error to define the loss function. Based on the trained autoencoder we applied the encoder model to the training and validation data to obtain two sets of compressed arrays. Then we used these arrays as inputs for a classification algorithm, where we employed the KMeans clustering algorithm from scikit-learn [67], asking for a classification into a given number of clusters. We tested classifications into four to eight clusters. Asking the KMeans algorithm to find four clusters yielded the shape types shown in Fig. 9. The coloured curves denote the cluster centres determined by the KMeans algorithm. These clusters do only partly coincide with the ones defined in Section 2.2. Shapes of kind 1, showing an enhanced m hh region, as well as shapes of kind 3 (SMlike), were clearly identified. Shapes having a double peak were clustered together with shapes showing a shoulder. The separation of the peaks plays a minor role, so shapes of kind 2 and kind 4 are put into the same cluster. However, another cluster was formed, containing shapes with an enhanced tail. Asking for 5, 6 and 7 clusters refined the distinction of certain shape features, as will be shown in the following, picking seven clusters as an example. Defining eight clusters did not lead to useful additional features but rather to the tendency to focus on local minima in the clustering space, while neglecting more gobal shape features.

Parameter space underlying the clusters
In this section we show how the parameter space relates to the clusters if we ask for four or seven clusters. For each parameter configuration of our 5-dimensional grid, we plot the corresponding cluster type in Fig. 11 and Figs. 13 to 16. The colour codes are shown in Figs. 9 and 10, and are also listed in Table 1. For clusters which are similar to the shape types defined in Section 2.2, we should also find patterns similar to the ones shown in Figs. 2 to 6. Comparing Fig. 11 (top row) with Fig. 2 (left), both showing variations of c t versus c hhh , we see that kind 1 shapes (black) are clearly identified. However, for both four and seven clusters the area for SM-like shapes got smaller, as the clustering algorithm also identifies features which were not considered in the predefined shapes. For example, the clustering into four clusters identifies shapes which are almost SM-like but have an enhanced tail (magenta), and the clustering into seven clusters in addition identifies shapes which are almost SM-like but have a shoulder (yellow). Certainly we could have defined such features in our analyzer as well, but it is not that easy to define where the tail starts and what exactly should be considered as "enhanced". Further, the figure clearly shows that small variations of c hhh can easily distort the SM-like shape, while the shape is more robust against variations of c t . Fig. 11 (bottom row) shows c t versus c tt . We again see that variations c t and c tt mostly produce SM-like shapes. Why this is so can be understood from the behaviour of the coefficients A i in eq. (2.5) which are relevant in these cases. For Fig. 11 (top row), only the coefficients A 1 , A 3 and A 7 are relevant. As A 1 and A 7 have opposite signs and a different peak location, this can generate a rich shape structure. For Fig. 11 (bottom row), the coefficients A 2 , A 6 and A 8 are relevant in addition to A 1 , A 3 and A 7 . A 2 being the coefficient of c 2 tt , it is dominant except for very small values of c tt and leads to a SM-like shape. We also observe that c tt has the tendency to enhance the total cross section, such that only a relatively small slice in c tt is left after considering the bounds on the total cross section. A behaviour similar to the one in Fig. 11 can be seen in Fig. 14: as c hhh varies the disribution goes through various shape types, while variations of c ggh and c gghh affect the shapes to less extent. Fig. 14 also shows that a positive c gghh value has the tendency to enhance the tail of the distribution. Fig. 15 shows c tt versus c ggh (top) and c tt versus c gghh (bottom). Compared to Fig. 6, the clustering into both four and seven clusters shows a better discrimination power between SM-like shapes and small deviations, for example due to an enhanced tail. We again see that c tt has a larger impact on the total cross section than c ggh or c gghh . Fig. 16, showing the c t − c ggh and c t − c gghh parameter planes, can be compared to Fig. 4. Again, both the case of four and of seven clusters indicates that the unsupervised  Table 1. learning algorithm is able to distinguish better subtle influences on the shape than our method based on humanly classified shapes.

Conclusions
The aim of this work was to provide more insight how certain configurations of anomalous couplings in the Higgs sector lead to a corresponding characteristic shape of the Higgs boson pair invariant mass distribution. For this purpose we employed the Lagrangian relevant to Higgs boson pair production as given in a non-linear Effective Field Theory framework, which contains five (potentially) anomalous couplings [64]. We produced data for the Higgs boson pair invariant mass distribution, based on a calculation which includes the NLO QCD corrections with full top quark mass dependence, varying all five coupling parameters by finite steps, thus producing a dense grid of data. Then we defined four characteristic shape types for the m hh distribution and visualised the parameter space leading to these shape types. To this aim we projected onto all possible two-dimensional slices of the parameter space, keeping the remaining parameters at their Standard Model values. We also considered p T,h distributions for a shape analysis, however we found that the m hh distribution is more sensitive to shape changes induced by anomalous couplings. Further, we tested an unsupervised learning approach to classify shapes. We produced 10000 distributions, trained a neural network based on an autoencoder to extract common shape features and tried to find the number of shape clusters which optimally catches different shape characteristics. Our study demonstrated that some shape features, like an enhanced tail or a shoulder in the m hh distribution, were caught very well by this procedure, and provided more insight about the underlying parameter space leading to such features than the analysis based on predefined shape classes. While machine learning is not essential to define shape clusters, it has the advantage of being straightforward and of minimising the human bias compared to other shape analysis methods. The shape analysis revealed that the Standard-Model-like shape is quite stable against variations of c t , c ggh and c gghh , as long as c hhh = 1, while deviations of c hhh from the SM value show a rich shape changing pattern. We also found that small deviations of c tt from zero are very likely to produce a doubly peaked structure in the m hh distribution, while SM-like shapes dominate again as c tt moves further away from zero. However, as c tt leads to a rather fast increase of the total cross section, the shape analysis in combination with the limits on the total cross section allows to put constraints on c tt . This is an interesting feature because, in contrast to c t and c ggh , c tt cannot be constrained directly from single Higgs boson processes. Further, an enhanced tail or a shoulder of the m hh distribution are likely to be produced by nonzero values of c gghh , however the influence of the effective Higgs-gluon couplings on the shape is milder than the one of c hhh and c tt .
Our approach associates 2-dimensional slices of a 5-dimensional parameter space to a given shape cluster and thus is going beyond a benchmark point analysis, which only provides pointwise snapshots of the 5-dimensional parameter space. The method can also be applied to other processes where anomalous couplings introduce characteristic shape changes to differential cross sections, and it can be extended to consider more than one distribution simultaneously.