Precision SMEFT bounds from the VBF Higgs at high transverse momentum

We study the production of Higgs bosons at high transverse momenta via vector-boson fusion (VBF) in the Standard Model Effective Field Theory (SMEFT). We find that contributions from four independent operator combinations dominate in this limit. These are the same `high energy primaries' that control high energy diboson processes, including Higgs-strahlung. We perform detailed collider simulations for the diphoton decay mode of the Higgs boson as well as the three final states arising from the ditau channel. Using the quadratic growth of the SMEFT contributions relative to the Standard Model (SM) contribution, we project very precise bounds on these operators that far surpass the corresponding bounds from the LEP experiment.


Introduction
In the absence of any evidence for new physics at the Large Hadron Collider (LHC), the Standard Model Effective Field Theory (SMEFT) is an efficient parametrisation of heavy new physics beyond the reach of the LHC. The effective field theory (EFT) formalism has, in fact, become the standard framework for precision physics at the LHC. As we approach higher integrated luminosities, very precise EFT limits will become achievable. This is true in particular because, with higher luminosity, we will gain the ability to probe the high energy tails of various distributions accurately. This can lead to very precise bounds on SMEFT operators whose contributions grow with energy relative to the SM.
As far as Higgs and electroweak physics is concerned, refs. [36,37] identified a four-dimensional subspace of the full 59-dimensional space of dimension-6 operators that can be measured very accurately in the diboson processes, pp → V h/V V (V = W ± , Z), at high energies. That the same set of four operators controls both double gauge boson production and Higgs-strahlung is a consequence of the Goldstone Boson Equivalence theorem. These four directions in the EFT space were dubbed the 'high energy primaries' in ref. [36]. It was shown that by utilising the quadratic energy growth of the contributions of these operators with respect to the SM, the LHC sensitivity to these operators can far surpass LEP bounds.

Figure 1. The crossing symmetry between the Higgs-strahlung and VBF Higgs production processes. The amplitudes for the two processes are the same up to an exchange of the Mandelstam variables, s ↔ t. As a result, the same four directions in SMEFT space control VBF Higgs production at high t and Higgs-strahlung at high s. The figure has been produced with the help of the JaxoDraw package [45].
In this work, we show that these same high energy primaries are also sufficient to completely determine the SMEFT amplitude for Higgs production in the Vector Boson Fusion (VBF) channel when the transverse momentum of the Higgs boson is large. The reason for this is a crossing symmetry between the VBF and Higgs-strahlung diagrams, shown in Fig. 1, which implies that the two processes have the same amplitude up to an interchange of the Mandelstam variables, s ↔ t. Thus, VBF Higgs production probes the same four operators at large t as Higgs-strahlung does at large s. Furthermore, one can extend the equivalence theorem argument used in the diboson case in ref. [36] to connect the VBF production of Higgs and gauge bosons.
Thus, the processes, pp → V V, V h and VBF production of Higgs or gauge bosons, which are entirely different from each other from a collider physics point of view, actually probe, in a very precise manner, the same set of four operators at high energies. Combining these processes can thus give us the best bounds on the high-energy primaries. Apart from the apparent statistical advantage, it is crucial to combine all these processes because each of them probes a unique linear combination of the four operators; all these processes should, thus, be included to eliminate all flat directions.
As one of the important results of this work, we will present the linear combination of the four operators that are probed by VBF Higgs production. In this work, we carry out a thorough collider analysis of the h → γγ channel and the three final states from the h → τ + τ − channel, namely, the hadronic, semi-leptonic and fully leptonic final states. We find that including all these channels is important as their sensitivity to the EFT effects is comparable. In the end, we obtain projections, much stronger than LEP bounds, on the high-energy primaries.

VBF Higgs production at high transverse momentum in the D6 SMEFT
The vertices in the dimension-6 (D6) lagrangian that contribute to VBF Higgs production are the following, where we have expanded the D6 lagrangian taking α em , m Z and m W as the input parameters, and any corrections to the SM vector propagators, such as V µ V µ , V µν V µν and V µν F µν , have been eliminated in favour of vertex corrections following refs. [14,46]. The δg Z f and g h Zf couplings include only a single generation of fermions, with f = u L , d L , u R and d R .
However, we will assume that these couplings are extended to all generations in a flavour-universal way, which is well justified if we assume Minimal Flavour Violation (MFV) [47]. We show how these vertices correct VBF Higgs production in Fig. 2. It is the subprocess qV → qh, common to all these diagrams, that receives corrections from the D6 lagrangian. As this hard process is a 2 → 2 process, its amplitude can be completely specified by two variables, for example the Mandelstam variable t and an angle. Up to the leading terms in t/m 2 Z in the EFT correction, we obtain for the M(qV T,L → qh) amplitude,

SILH Basis

Warsaw Basis
Here J µ f = f̄ γ µ f is the fermion current, the subscript L (T ) denotes the longitudinal (transverse) polarisation of the gauge boson, q denotes its four-momentum and ε the associated polarisation vector. The reason we chose to write the EFT corrections to the amplitude as functions of the Mandelstam variable t, and not s, can be understood from eq. (2.2): the EFT corrections are functions of t only. The additional angular variable required to specify the scattering kinematics does not appear. This is physically important, as it means that the EFT corrections grow with the transverse momentum of the Higgs boson, a kinematic variable highly correlated with t.
As we discussed already, the qV → qh subprocess is related to the Higgs-strahlung process, qq → V h, by crossing symmetry, as shown in Fig. 1, so the expressions in eq. (2.2) are identical to the corresponding ones for the qq → V h process if we interchange t ↔ s. This is very significant, as it implies that VBF Higgs production at high transverse momentum probes the same set of EFT operators as qq → V h at high energies.
For t ≫ m 2 Z , the correction proportional to the contact-term coupling g h V f dominates over all other terms. The EFT correction due to g h V f grows with t because, unlike the SM diagram, the corresponding diagram in Fig. 2 does not have an intermediate V -propagator. The κ V V contribution to the transverse amplitude also grows with t. This contribution, however, cannot interfere with the dominant longitudinal piece of the SM amplitude and is thus sub-leading with respect to the g h V f contribution. The EFT corrections due to the couplings δg V f , δg h bb and δĝ h V V do not grow with t at all. We have checked explicitly that at high t only the g h V f contributions are important and the effects of the other couplings are negligible.
Thus, VBF Higgs production at high transverse momentum is controlled by the five contact-interaction couplings: g h Zf , with f = u L , u R , d L and d R , and g h W ud . The operators contributing to these five couplings in the Warsaw basis are shown in Table 2; they enter the five anomalous couplings as given in eq. (2.3). The coupling g h W ud is actually not independent of the four g h Zf contact interactions at the dimension-6 level. Thus, only the four g h Zf couplings are independent, and these completely determine the EFT deviations for VBF Higgs production at high transverse momentum.
In Table 2, we also show the mapping of these four g h Zf couplings to other EFT parametrisations. In the first row of Table 2, we present the contributions of the universal (bosonic) operators of the SILH lagrangian. We then show how these four couplings can be predicted or constrained by other independent measurements. The second row provides the mapping to the so-called BSM Primary basis of ref. [14]. In this basis, the correlations between different pseudo-observables are made explicit. For instance, in our case we can see how these four Higgs anomalous couplings can be predicted in terms of other measurements, namely the couplings δg Z f defined in eq. (2.2), which are strongly constrained by Z-pole measurements at LEP, and the anomalous TGCs δκ γ and δg Z 1 (in the notation of ref. [48]), which were constrained by W W production at LEP2. In the fourth row of Table 2, we write the four couplings in terms of only the 'oblique'/universal pseudo-observables, i.e. the TGCs δκ γ and δg Z 1 and the Peskin-Takeuchi Ŝ-parameter [49] in the normalisation of ref. [50]. For a definition of these observables we refer to the Lagrangian presented in ref. [10] (see also ref. [51]). Finally, in the last row of Table 2, we connect these contact terms to the original definition of the high energy primaries in ref. [36].

Collider Analyses
In this section, we provide the details of our collider studies of the three h → τ + τ − channels and the h → γγ channel. Utilising the fact that the EFT and SM contributions have the same form apart from a growth with the Mandelstam variable t, we use a two-step procedure to isolate our EFT signal. First, in this section, we use sophisticated Neural Network (NN) techniques to optimally discriminate the SM contribution from the other backgrounds. In the next section, we then use the p h T distribution to isolate the EFT effects from the SM contribution.

The h → τ + τ − channel
The SM Higgs decays into a pair of τ -leptons 6.27% of the time. Even though this branching ratio is significantly larger than that of the diphoton channel, the τ -leptons are not stable, and hence we obtain three distinct final states depending on the decay modes of the τ s. The cleanest of these comprises two light leptons (e, µ). We thus categorise our final states as τ τ , τ τ h and τ h τ h , where τ h is the hadronic remnant of the τ and is identified as a τ -jet. All of these final states are associated with missing transverse energy and at least two hard jets. We consider all three possibilities here. We closely follow the ATLAS analysis [52] and then use multivariate methods to optimally isolate SM VBF Higgs production from the rest of the backgrounds. Our analysis is done at a centre-of-mass energy of 14 TeV.
The electron (muon) candidates are required to have a minimum transverse momentum, p T , of 15 GeV (10 GeV). The electrons (muons) are further required to lie within |η| < 2.47 (2.50). Furthermore, electrons are disallowed in the transition region between the barrel and the endcap (1.37 < |η| < 1.52). Jets are reconstructed using the anti-k t algorithm [53] with a radius parameter of R = 0.4 and a minimum p T of 20 GeV. Jets are required to have |η| < 4.5. In order to reconstruct b-jets, jets are matched to B-hadrons within ∆R(B, j) < 0.2, and b-jets are required to have |η| < 2.5. We take a flat b-tagging efficiency of 70%. In our setup, a light jet (including c-jets) can fake a b-jet with a mistag rate of 1%. We tag hadronic τ s with an efficiency of 65%. Light jets can fake τ -jets with a probability of 2.5%.
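For illustration, the flat (mis)tagging rates above can be emulated with a simple probability lookup. This is only a sketch of the bookkeeping; the names and structure are ours, not the analysis code:

```python
import random

# Flat (mis)tagging probabilities quoted in the text.
B_TAG_PROB = {"b": 0.70, "c": 0.01, "light": 0.01}
TAU_TAG_PROB = {"tau_h": 0.65, "light": 0.025}

def is_tagged(flavour, table, u=None):
    """Return True if a jet of the given truth flavour is tagged.

    `u` is a uniform(0,1) random number; pass it explicitly for
    reproducibility, otherwise one is drawn internally.
    """
    if u is None:
        u = random.random()
    return u < table.get(flavour, 0.0)
```

For example, `is_tagged("b", B_TAG_PROB, u=0.5)` tags the jet (0.5 < 0.70), while a light jet with the same random draw is not b-tagged.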
There are multiple backgrounds to consider for the τ τ category. The dominant background comes from τ + τ − + jets, excluding the Higgs diagrams. We generate this background keeping in mind that the τ s can also emanate from off-shell photons. We also separately generate ℓ + ℓ − + jets, where ℓ = e, µ. The other backgrounds include tt, which we generate separately for the fully leptonic, semi-leptonic and fully hadronic cases, single top (tq, tW and tb), ℓν + jets, and h + jets with h → W W ∗ , where the W s decay either leptonically or hadronically. The h + jets samples are generated for the SM scenario as well as with the EFT couplings turned on. The Feynman rules are generated using the FeynRules package [54], through which we obtain the UFO [55] model. All the samples are then generated within the MadGraph version 2.6.5 [56] framework. The fragmentation, showering and hadronisation are done using Pythia version 8.2 [57]. For the full setup, we use the LO set of the NNPDF2.3 parton distribution function [58] within the LHAPDF package [59]. For almost all the samples, we apply generation-level cuts on the p T of j 1 (j 2 ), the hardest (second-hardest) quark in p T , which can be a b-quark as well, on m j 1 j 2 , and ∆η j 1 j 2 > 2.5. The m j 1 j 2 and ∆η j 1 j 2 cuts are not applied to the single top samples at the generation level. All our event generations are at leading order (LO) in perturbation theory, and we apply flat K-factors to roughly emulate next-to-leading order (NLO) QCD effects. For the weak-boson fusion samples, the K-factor is almost constant at 1.1 as a function of p T,j 1 [60]. For the ℓ + ℓ − + jets and ℓν + jets (ℓ = e, µ, τ ) samples, the NLO QCD K-factor is roughly 1 as a function of p T,V [61], with V being the vector boson W/Z. For the tt samples, we estimate the NNLO K-factor to be around 1.63 [62]. For the single top channel, there are three sub-processes, i.e., t-channel, s-channel and associated W t production.
The most dominant of these three sub-processes is the qb → q ′ t channel, followed by the bg → tW channel. The smallest contribution comes from qq → tb. Following ref. [63], we take a conservative K-factor of 1.1.
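Schematically, applying these flat K-factors amounts to a per-sample event weight when normalising the LO samples; a minimal sketch, with sample labels and the helper name of our own choosing:

```python
# Flat K-factors used to rescale the LO yields (values from the text).
K_FACTOR = {
    "vbf_h": 1.10,       # weak-boson fusion, nearly flat in pT of the leading jet
    "v_jets": 1.00,      # l+l-/lnu + jets, roughly flat in pT of the vector boson
    "ttbar": 1.63,       # NNLO estimate
    "single_top": 1.10,  # conservative choice for all three sub-processes
}

def expected_yield(sample, sigma_lo_fb, lumi_fb):
    """Expected events: K-factor x LO cross section [fb] x luminosity [fb^-1]."""
    return K_FACTOR[sample] * sigma_lo_fb * lumi_fb
```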
To validate our analysis, we reproduce the rectangular cut-based analysis in the ATLAS paper [52] and find very similar results. The details of our rectangular cut-based analyses are mentioned in Appendix A.1.
To obtain our final results we use a Neural Network (NN) analysis. First, in addition to the p T and |η| requirements, we impose the following cuts: m j 1 j 2 > 500 GeV, ∆η j 1 j 2 > 2.5 and m col τ τ < 300 GeV, where m col τ τ is the di-tau collinear mass [64]. The variables used in the NN training are shown in Table 6 in Appendix B. In order to prevent the NN from concentrating on the m τ τ peak, the sensitivity to this observable has been limited to 5 GeV bins. Table 7 shows the neural network results for the SM h → τ τ events as well as the other backgrounds, divided into the three sub-regions, namely hadronic, semi-leptonic and leptonic. The first row shows the number of events at 0.3 ab −1 luminosity after the preprocessing mentioned above, and the following row shows the number of events remaining after classification. The procedure and detailed results for the neural network are discussed in Appendix B.
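For reference, the di-tau collinear mass [64] follows from assuming each neutrino is collinear with its visible tau system, so the momentum fractions x 1 , x 2 can be solved from the missing transverse momentum. A minimal numpy sketch (helper names are ours; transverse quantities only):

```python
import numpy as np

def momentum_fractions(p1, p2, met):
    """Visible momentum fractions x1, x2 in the collinear approximation.

    p1, p2: visible transverse-momentum vectors (px, py) of the two
    tau candidates; met: missing transverse momentum (px, py).
    Solves met = (1/x1 - 1) p1 + (1/x2 - 1) p2 as a 2x2 linear system.
    """
    A = np.column_stack([p1, p2])
    r = np.linalg.solve(A, np.asarray(met) + np.asarray(p1) + np.asarray(p2))
    return 1.0 / r[0], 1.0 / r[1]

def collinear_mass(m_vis, x1, x2):
    # In the collinear limit, m_tautau = m_vis / sqrt(x1 * x2).
    return m_vis / np.sqrt(x1 * x2)
```

Since m_vis ≤ m_tautau for 0 < x 1,2 ≤ 1, the reconstructed collinear mass always sits above the visible mass, which is why the m col τ τ < 300 GeV cut is an effective upper bound on the di-tau system.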

The h → γγ channel
Although the diphoton channel suffers from a low branching fraction, its clean topology makes it relatively easy to separate from the background. With this in mind, we consider the diphoton-plus-two-jets topology to single out the VBF channel and further improve the sensitivity to the aforementioned EFT operators. Loosely following ref. [65], we construct two workspaces: first we design a cut-and-count based analysis, and then we study a Neural Network (NN) architecture, which is observed to increase our sensitivity.
Although Higgs-less diphoton production with multiple jets is the primary background in this channel, it has been shown that in low-energy regimes fake photons can have a significant impact on certain signal regions. The overall background fractions are 78.7% from Higgs-less diphoton channels, 18.6% from single-photon channels and 2.6% from multijet channels [65]. It is important to note that these fractions change drastically depending on the phase-space and on the efficiency of the jet vertex tagging algorithm [66]; it has been shown that such techniques can reduce fake-photon rates below 0.3%, especially at higher energies [66][67][68]. To test this hypothesis, we generate the SM and other background samples using the aforementioned framework. All samples are generated with a specific set of cuts at the matrix-element level: the minimum jet p T is taken to be 30 GeV, the invariant mass of the two leading jets is required to be greater than 500 GeV, and the pseudorapidity separation between the two leading jets is required to be greater than 1.5. As presented in Appendix A.2, this set of cuts has been chosen with respect to our cut-flow to populate the phase-space that is crucial for this analysis. The generated events are further showered and hadronised via Pythia version 8.2 [57].
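The matrix-element-level cuts quoted above (jet p T > 30 GeV, m jj > 500 GeV, |∆η jj | > 1.5) can be mirrored by a simple event filter; a sketch in the massless-jet approximation, with function names of our own:

```python
import math

def dijet_mass(j1, j2):
    """Invariant mass of two (pT, eta, phi) jets in the massless
    approximation: m_jj^2 = 2 pT1 pT2 (cosh deta - cos dphi)."""
    (pt1, eta1, phi1), (pt2, eta2, phi2) = j1, j2
    return math.sqrt(2.0 * pt1 * pt2
                     * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))

def passes_generation_cuts(jets, pt_min=30.0, mjj_min=500.0, deta_min=1.5):
    """jets: list of (pT, eta, phi) tuples, hardest first."""
    if len(jets) < 2 or jets[1][0] < pt_min:
        return False
    j1, j2 = jets[0], jets[1]
    return dijet_mass(j1, j2) > mjj_min and abs(j1[1] - j2[1]) > deta_min
```

Note how cosh(∆η) makes m jj grow rapidly with the rapidity gap, which is why the large-m jj and large-|∆η jj | requirements select the same forward-jet VBF topology.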
The analysis of the event samples is performed within MadAnalysis 5 version 1.8 [69]. The hadronised events are reconstructed using FastJet version 3.3.2 [70] with the anti-k t algorithm [53], with a radius parameter of 0.4 and a minimum reconstructed-jet transverse momentum of 30 GeV. In order to simulate a simple detector environment, we apply particular tagging efficiencies to the b-jets, c-jets, hadronic taus and light jets. The tagging criteria are the same as those discussed in the di-tau subsection 3.1.
In order to obtain well-defined objects, detailed preselection requirements are applied. A photon candidate is required to have a minimum transverse momentum of 25 GeV and to lie within |η| < 2.37, and all photon candidates are required to be separated from each other by ∆R > 0.4. A jet candidate, on the other hand, is required to be within |η| < 4.5. A clear distinction between photon and jet objects is essential in this analysis in order to suppress the background that might arise from misidentified objects. For this reason, we require the two photons to have at most 15% hadronic activity within a cone of radius 0.4. After this point, we branch our framework into two: the cut-based analysis is discussed in Appendix A.2 and the NN analysis in Appendix B. Table 3 shows the NN results at 3 ab −1 integrated luminosity: the event yields for the SM Higgs contribution as well as the other backgrounds, at the preprocessing stage and after the classifier cut for this channel.

Projected sensitivity for EFT couplings
In this section, we present the final sensitivity projections for the EFT couplings. The NN techniques used in the previous section to optimally isolate the SM VBF Higgs contribution from the other background processes also isolate our signal, the EFT interference contribution. This is because, as shown in Sec. 2, the dominant EFT contributions have a matrix element that is the same as the SM one apart from a growth with the Mandelstam variable t. We will now use this growth with t to distinguish the EFT interference contribution from the SM; we will utilise the distribution of events with respect to p h T , a variable highly correlated with t, as the discriminant. We show the p h T distribution for the diphoton channel in Fig. 3. The EFT interference contribution can be seen to grow as a fraction of the SM contribution with p h T . To derive the projected sensitivity for the EFT couplings, we define a χ 2 function as follows,
where we take the SM as our null hypothesis. N exp i denotes the expected number of events in the SM for the i-th bin of the p h T distribution. We then assume that the number of events observed in the i-th bin, N obs i , differs from the SM due to the presence of EFT couplings. Finally, σ i includes both the statistical and systematic uncertainties, with ∆ sys the fractional systematic uncertainty. As can be seen from Fig. 3, although the EFT interference contribution steadily grows with p h T as a fraction of the SM, the absolute value of the excess keeps decreasing. As a result, the χ 2 function initially increases with p h T , peaks at an intermediate value around p h T ∼ 300 GeV and then decreases again. As discussed in Sec. 2, the four contact couplings g h Zf , with f = u L , d L , u R , d R , give the dominant contributions in the high-p h T region. In a hadron collider, it is impossible to disentangle the initial states for the process qV T,L → qh that receives corrections from these contact terms (see Fig. 2). Thus only a linear combination of these four contact couplings appears in the EFT interference term at a given p h T . As the bins around p h T ∼ 300 GeV yield maximum sensitivity, the direction probed by VBF Higgs production turns out to be the above linear combination evaluated at this p h T value, given in eq. (4.1), whose second line expresses the EFT direction in terms of the Warsaw basis operators defined in Table 2. If p h T is varied, the coefficient of g h Zd L varies only by a few per cent, whereas the coefficient of g h Zu R decreases by 20% and that of g h Zd R by 30%. The left-handed couplings dominate the above direction, as the W -boson luminosity is much larger than the Z-boson luminosity in VBF processes and the right-handed couplings cannot contribute to the qW L → qh process.
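Schematically, the χ 2 above is χ 2 = Σ i (N obs i − N exp i ) 2 /σ 2 i summed over the p h T bins. A minimal sketch of its evaluation; the quadrature combination of the statistical and systematic pieces in σ i is our assumption, since the text does not spell it out:

```python
import numpy as np

def chi2(n_sm, n_eft, delta_sys):
    """Binned chi^2 of the SM+EFT spectrum against the SM-only hypothesis.

    n_sm:  SM expectation per p_T^h bin (the null hypothesis),
    n_eft: EFT interference contribution per bin (the assumed excess),
    delta_sys: fractional systematic uncertainty.
    """
    n_sm, n_eft = np.asarray(n_sm, float), np.asarray(n_eft, float)
    n_obs = n_sm + n_eft
    # Statistical (Poisson) and systematic uncertainties added in quadrature.
    sigma2 = n_sm + (delta_sys * n_sm) ** 2
    return float(np.sum((n_obs - n_sm) ** 2 / sigma2))
```

With this form, a 10-event excess on a 100-event bin with ∆ sys = 10% gives χ 2 = 10 2 /(100 + 100) = 0.5, illustrating how the systematic floor halves the purely statistical sensitivity in that bin.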
For our final sensitivity estimate we combine all four final states by adding their individual χ 2 functions. We find each final state has a comparable contribution to the final χ 2 value, which emphasises the importance of including all four channels. Including only the bins with p h T < 400 GeV, we obtain our final bound, eq. (4.2), for an integrated luminosity of 3 ab −1 (0.3 ab −1 ) and 10% systematic uncertainty. The p h T < 400 GeV cut ensures that most events safely respect the EFT validity requirement √ t < Λ. For strongly coupled UV completions, the values of the Wilson coefficients can be much larger than unity, giving much larger values of Λ and thus higher allowed p h T values. This, however, would not lead to much better bounds, as the most sensitive bins are around p h T ∼ 300 GeV. Our results, thus, do not depend strongly on whether the UV completion is weakly or strongly coupled.
Combination with diboson channels: As we discussed in Sec. 1, the diboson channels pp → V V /V h at high energies and the VBF Higgs production process at high p h T considered here probe the same set of four operators. Of these, W Z production was studied in ref. [36], Zh production in refs. [37,43] and W h production in ref. [43]. Combining the 68% CL HL-LHC bounds obtained in these papers with our result in eq. (4.2), we obtain the bounds of eq. (4.3) in terms of Warsaw basis operators, for 3 ab −1 integrated luminosity, where ξ = v 2 /Λ 2 . It is clear that each of these different processes constrains a different direction in the four-dimensional space of high energy primaries. As the W Z and W h processes constrain the same direction, the above bounds still leave one flat direction unconstrained. An additional bound from the W W production process would thus close all flat directions and allow us to bound all four operators simultaneously. The W W process was studied in ref. [38], and it is clear from the results that it puts strong bounds on yet another complementary direction. It is, however, difficult to infer the direction probed by the pp → W W process from the results of ref. [38], as in that paper the W W and W Z channels were presented in a combined way, including both the interference and EFT-squared contributions. As mentioned in Sec. 1, the VBF production of gauge bosons will also probe the same four-dimensional space. These channels should thus be added to over-constrain the system and maximally constrain the high energy primaries.
Comparison with LEP bounds: Using Table 2, we can also write this direction in terms of other pseudo-observables already constrained by LEP, where the first line applies to the general case and the second line to the universal case. The LEP bounds on the above pseudo-observables are given in the second column of Table 4. The LEP bound on the full direction is thus set by the largest term on the right-hand side of the above equations, which is g h(V BF ) Zf ≈ 1.08 δg Z 1 ≲ 0.02, almost an order of magnitude weaker than the bound in eq. (4.2).
One can also assume that there is no cancellation between the different terms in eq. (4.4). This allows us to require that each term on the right-hand side respects the bound in eq. (4.2). We then obtain the results in the first column of Table 4. We see that with this 'no tuning' assumption, relative to LEP bounds, the results of this work can lead to much stronger bounds on the TGCs, comparable bounds on the deviations of the Z couplings to quarks, and weaker bounds on the oblique parameters. Table 4. Comparison of the 68% CL bounds extracted from the VBF analyses with the existing LEP bounds. The bounds outside the parentheses are projections for 3 ab −1 of data and those inside for 0.3 ab −1 . To get our projection, we demand that each term in eq. (4.4) respects the bound in eq. (4.2). The LEP bounds on the Z-boson couplings to quarks, δg Z f , are taken from ref. [12], the bounds on the charged TGCs from ref. [71], the bound on Ŝ from ref. [72], and the bounds on the W, Y observables from ref. [50].

Conclusions
It is increasingly being recognised that the LHC is a precision machine, as examples keep appearing where certain operators can be probed very precisely, for instance by studying the high energy tails of different processes. One of the best examples is that of the high energy primaries, the four operators that dominate the high energy tails in diboson production, including the Higgs-strahlung process. In this work, we have highlighted how VBF Higgs production probes a linear combination of the same operators, given by eq. (4.1). Our results are complementary to those obtained in the Higgs-strahlung and diboson processes, as all these processes probe different directions in this four-dimensional space (see eq. (4.3)). Our final projection for the HL-LHC bound on the direction corresponding to VBF Higgs production is at the per mille level (see eq. (4.2)), which translates into a multi-TeV bound on the new physics scale, given in eq. (4.3). These bounds far surpass the existing LEP bounds (see the discussion below eq. (4.4) and Table 4).
As far as Higgs and electroweak physics is concerned, the highest energy scales the LHC will probe indirectly may well be via a precise measurement of these high energy primaries 3 . These may therefore become part of the legacy measurements of the LHC. The VBF Higgs production process studied in this work would be an important and integral part of this program.

A Rectangular Cut-Based Analyses
Overall, we follow the relevant cuts listed in tables 3 and 4 of ref. [52]. For the τ τ case, we demand m j 1 j 2 > 500 GeV instead of the 800 GeV used in that paper. We use the tight VBF category for the τ h τ h case. For lepton isolation, we require that the hadronic activity around an isolated lepton (e, µ), within a cone of ∆R = 0.2, does not exceed 10% of its p T . With the rectangular cut-based analyses, we obtain the following results.

A.2 The h → γγ channel
As mentioned in section 3.2, we construct a cut-based analysis by loosely following ref. [65]. After the preselections mentioned above, in order to identify the VBF channel, we use standard VBF cuts: b-jets are vetoed, and we require at least two jets, with the leading two separated into two hemispheres with |∆η| > 3. To identify the boosted VBF topology, the general recipe requires an invariant mass cut on the two leading jets of the order of 300-400 GeV, as applied in ref. [65]. However, we observe that higher sensitivity to the EFT operators is achieved when a higher M jj requirement is applied. In addition to the isolation requirement presented in section 3.2, we further demand additional angular requirements between the photons and jets to restrict the phase-space for additional emissions. The minimum angular separation between jets and photons, ∆R min γj , is observed to be a powerful discriminant against the background. Requiring ∆R min γj > 1.5 separates the background from the signal events without any loss of the desired phase-space. We also require an azimuthal angle separation between the two-jet and two-photon systems. Although this requirement does not provide particularly strong discrimination on its own, it has been shown to be a powerful tool to suppress theoretical uncertainties and veto additional jets in the event sample [65]. These requirements select largely boosted samples: although our signal does not show any particular azimuthal separation preference, we observe that the background is dominated by jets highly separated in azimuthal angle. For this reason, we require the angular separation between the two leading jets to be less than 2. Finally, the most effective cut is, expectedly, on the invariant mass of the two-photon system, which is required to be within 125 ± 3 GeV. Also, the reconstructed Higgs rapidity is required to lie between the two jets. Table 5 summarises all the cuts and their relative efficiencies for both the signal and the background samples. At the bottom of the table, we show various discriminating variables to assess the quality of the yielded events. All results are presented at 0.3 ab −1 . It is important to note that we also generate single-photon samples to quantify the effect of fake-photon contamination in the sample. For this we use the SFS module of MadAnalysis 5 [74] to simulate a light-jet mis-tag rate of 0.3%. However, we observe that out of a million events, none pass the Higgs mass requirement. Thus, in order to save valuable computation time, we assume that such effects are insignificant in such boosted phase-space regimes.

3 Another example where the LHC can indirectly probe very high scales by studying high energy tails is the Drell-Yan process, as shown in ref. [73].
4 We must note that, here and in what follows, we loosely refer to the SM VBF as the 'signal' (S) and the rest of the samples as 'background' (B). In our final analyses, the SM VBF is of course part of the background and the EFT contribution is the true signal.

B Neural Network Analysis
In recent years, the particle physics community has increasingly adopted deep neural networks (DNN) for challenging signal-characterisation problems [75-85]. The Keras library [86] offers a python-based, flexible framework using feed-forward networks [87-89] to create densely connected layers of neurons (nodes). In order to increase our sensitivity to the operators presented above, we design a simple workspace to determine the achievable sensitivities with different neural network architectures, with certain common properties applied to each architecture. Each architecture is optimised using the Adam algorithm [90] and, to accommodate multi-class classification, the sparse categorical cross-entropy loss function is used. Here p truth refers to the vector of truth values and p pred is the vector of prediction probabilities. We use softmax activation in the output, which is essentially a combination of sigmoid functions for each output class. Furthermore, instead of the traditionally used sigmoid activation for each hidden layer, we use the rectified linear unit (ReLU); see ref. [91] for the advantages of the ReLU activation over the sigmoid function. Each model is initialised with class weights normalised with respect to their occurrences in the training sample, in order to compensate for the different populations of each class. We investigate four different signal regions, namely diphoton, ditau decaying hadronically (hereafter hadronic), ditau decaying semi-leptonically (hereafter semi-leptonic) and ditau decaying leptonically (hereafter leptonic). Each sample has a separate set of backgrounds, and in order to save computation time, we only use the dominant background samples that have the greatest impact in a given signal region. In order to prevent over-training in each signal region, we use the dropout and kernel-regularisation methods.
Each layer is required to have a 25% probability of dropping each node in order to prevent dependence on any given parameter. Additionally, each hidden layer is supported by $L_2$ kernel regularisation [92] with a penalty strength of $10^{-2}$. This penalty term is directly reflected in the loss as
$$\mathcal{L}(\omega_i, b_i) \;\to\; \mathcal{L}(\omega_i, b_i) + \lambda \sum_i \omega_i^2\,,$$
where $\omega_i$ is the weight of the node, $b_i$ is the bias and $\lambda$ is the penalty strength. Lower and higher values of the penalty strength have been tested as well: the former leads the signal regions to over-train, while for the latter the cumulative accuracy has been observed to drop below 70%.
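The effect of the penalty term above can be illustrated as follows. This is a sketch of the $L_2$ contribution to the loss alone, with toy weight matrices, not the actual training code:

```python
import numpy as np

def l2_regularised_loss(base_loss, kernels, lam=1e-2):
    """Add the L2 kernel penalty lambda * sum_i w_i^2 to the loss.

    Only the kernel weights enter the penalty; the biases are left
    unregularised, as with a Keras kernel_regularizer.
    """
    penalty = lam * sum(float(np.sum(w ** 2)) for w in kernels)
    return base_loss + penalty

# toy kernels with 12 + 6 = 18 unit entries: penalty = 0.01 * 18 = 0.18
kernels = [np.ones((4, 3)), np.ones((3, 2))]
total = l2_regularised_loss(1.0, kernels, lam=1e-2)
```

In Keras, the corresponding settings are `Dropout(0.25)` after each hidden layer and `kernel_regularizer=regularizers.l2(1e-2)` on each hidden `Dense` layer.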
Each signal region undergoes a specific preprocessing before training. In addition to the aforementioned preselection requirements, the diphoton sample is required to have at least two jets and two isolated photons, and the invariant mass of the two leading jets is required to be at least 550 GeV. In order to prevent the NN from concentrating only on the diphoton invariant mass, we require it to be within a 125 ± 3 GeV window and remove M_γγ from the training parameters. Otherwise, it has been observed that the NN ignores all other variables and concentrates solely on the M_γγ peak; this behaviour persists up to a 5 GeV resolution for the distribution. Although this reduces the accuracy on the test sample significantly, it is necessary to keep the NN from concentrating only on the sharp invariant-mass peak. In order to remain in the realm of VBF, we also veto all b-jets and require |∆η_jj| > 1.5, without requiring the jets to be in different hemispheres. Following the same recipe, the ditau samples are preprocessed by requiring M_jj > 500 GeV, |∆η_jj| > 2.5 and M_ττ < 300 GeV. For the leptonic final state we demand two isolated leptons; for the semi-leptonic final state, one isolated lepton and one hadronic tau. Finally, for the hadronic final state, we require two hadronic taus and veto events with isolated leptons. As before, in order to prevent the NN from concentrating on the invariant-mass peak of the two taus, we require it to have a resolution of 5 GeV. Table 6 summarises all the parameters that have been used for each region. Here, p_T^jj refers to the combined vectorial p_T of the two hardest jets, τ_1h/2h refer to the visible parts of the hardest and second-hardest τ-leptons, which can be e, µ or τ_h, m_T2 is the stransverse-mass variable [93,94], the ∆φ's are azimuthal-angle separations, x_1/2 are the visible momentum fractions of the two τ-leptons and ∆R = √(∆η² + ∆φ²). All the other variables are self-explanatory.
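For concreteness, the diphoton preprocessing cuts listed above can be collected into a single selection function. The event-dictionary keys below are assumptions for illustration, not the format of our actual event records:

```python
def passes_diphoton_preselection(evt):
    """Diphoton signal-region cuts quoted in the text (sketch only).

    evt is an assumed dict with object counts, masses in GeV and
    pseudorapidity separations.
    """
    return (evt["n_jets"] >= 2                    # at least two jets
            and evt["n_isolated_photons"] >= 2    # two isolated photons
            and evt["m_jj"] >= 550.0              # leading-dijet invariant mass
            and abs(evt["m_aa"] - 125.0) < 3.0    # diphoton mass window
            and evt["n_bjets"] == 0               # b-jet veto (VBF topology)
            and abs(evt["delta_eta_jj"]) > 1.5)   # forward-jet separation

evt = {"n_jets": 2, "n_isolated_photons": 2, "m_jj": 600.0,
       "m_aa": 124.0, "n_bjets": 0, "delta_eta_jj": 3.0}
```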
For the semi-leptonic case, we also have an additional variable: the transverse mass, m_T.
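A common definition of this variable, which we assume here, is m_T = √(2 p_T^ℓ E_T^miss (1 − cos ∆φ)), built from the lepton transverse momentum, the missing transverse energy and their azimuthal separation. As a sketch:

```python
import math

def transverse_mass(pt_lep, met, dphi):
    """Lepton-MET transverse mass (assumed standard definition; GeV inputs):
    m_T = sqrt(2 * pT_lep * MET * (1 - cos(dphi)))."""
    return math.sqrt(2.0 * pt_lep * met * (1.0 - math.cos(dphi)))

# a back-to-back lepton and MET (dphi = pi) give the maximal m_T: 80 GeV here
mt = transverse_mass(40.0, 40.0, math.pi)
```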
Hyper-parameter optimisation is a challenging problem in machine learning. To further understand the phase space and the effect of the layers, we devise a simple scanning procedure which starts from a linear model and increases the number of hidden layers and nodes depending on the performance of the NN. To simplify the process, the number of nodes is chosen to be a fixed multiple of the number of input parameters. Table 3 shows the general results of the NN, where 'preselection' gives the number of events remaining after the preprocessing and 'classifier output' is the number of events left after the classification process. All the results are presented at 3 ab⁻¹ integrated luminosity. In Table 7, we present the corresponding classification results for the signal in the training sample. As expected, we observe a larger S/B ratio for the diphoton channel with respect to the ditau channels. The test accuracy is measured via 10-fold validation in order to gauge the fluctuations in the results. Although the diphoton channel gives the smallest uncertainties, none of the uncertainties exceeds 4% of the mean test accuracy. We also present the signal precision, the true positive rate (TPR) and the F1-score [95] for the test sample. The last row of Table 7 shows the number of hidden layers and their corresponding number of nodes in each layer. Fig. 4 shows the corresponding receiver operating characteristic (ROC) curves, where the dark blue, red, green and light blue curves represent the hadronic, semi-leptonic, leptonic and diphoton channels, respectively. The area under the ROC curve (AUC), computed from the TPR and the false positive rate (FPR), is attached to each label.

Table 6. Parameters used in the corresponding NN training. Observables shown with a comma, O_i,j, represent a system of the corresponding i-th leading and j-th leading particles, and those shown with a slash, O_i/j, represent the use of the same observable for the i-th leading and the j-th leading reconstructed object separately.
The black dashed line provides the reference for random guessing.
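The layer-and-node scanning procedure described above can be sketched as follows. The `evaluate` callback, which stands in for training the model plus 10-fold validation, is hypothetical:

```python
def scan_architectures(n_inputs, evaluate, max_layers=4, multiples=(1, 2, 4)):
    """Grow hidden layers and nodes from a linear baseline, keeping the
    architecture with the best validation accuracy. Node counts are fixed
    multiples of the number of input parameters, as in the text."""
    best_arch = ()                       # empty tuple = linear model (no hidden layers)
    best_acc = evaluate(best_arch)
    for n_layers in range(1, max_layers + 1):
        for mult in multiples:
            arch = (mult * n_inputs,) * n_layers
            acc = evaluate(arch)
            if acc > best_acc:
                best_arch, best_acc = arch, acc
    return best_arch, best_acc

# hypothetical stand-in for train-and-validate; peaks at two layers of 2x inputs
toy_score = lambda arch: 0.5 + 0.1 * (arch == (20, 20))
arch, acc = scan_architectures(10, toy_score)
```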
It is important to understand how the neural network learns and interprets the data; understanding such features can help in optimising cut-based analyses as well. For this reason, we adopt the SHapley Additive exPlanations (SHAP) [96] method. The SHAP value gives the average of the marginal contributions of the input parameters to the neural network output. In order to measure this value, we use the same training and test samples, where the SHAP explainer is trained with 2000 events from the training sample and the SHAP values are extracted using 1000 random events from the test sample. For the diphoton channel, the ten most important observables, ranked by their SHAP values, are presented in the right panel of Fig. 5. The SHAP values are represented with blue bars, where the average SHAP value has been divided into the contributions coming from the signal and the background. Although the SHAP values are relatively low, one can immediately see the importance of the angular observables and of the transverse momentum of the second-leading photon in the NN. The left panel of Fig. 5 shows the classifier output for the NN architecture, where the red line shows the signal and the blue bars show the background sample. In figures 6, 7 and 8, we show the classifier output (on the left) and the SHAP values for the ten parameters that have the biggest impact on the classification for the hadronic, semi-leptonic and leptonic final states, respectively. As seen from the SHAP values, the ditau signal regions mostly rely on angular observables between the final-state particles. One can see in the classifier outputs that ττ + jets is the most dominant background. The loss of sensitivity can also be observed in the classifier outputs, where the leptonic signal region cannot reach an accuracy beyond 70%. This outcome renders the semi-leptonic and leptonic signal regions suboptimal for sensitivity studies of the particular operators at hand.
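To make the notion of an 'average marginal contribution' concrete, the following toy computes exact Shapley values for a simple additive value function. This illustrates the quantity that the SHAP library approximates for the NN; it is not our analysis code, and the feature names and value function are hypothetical:

```python
import math
from itertools import permutations

def shapley_values(features, value):
    """Exact Shapley values: for each feature f, average the marginal
    contribution value(S + {f}) - value(S) over all feature orderings."""
    phi = {f: 0.0 for f in features}
    for order in permutations(features):
        coalition = frozenset()
        for f in order:
            phi[f] += value(coalition | {f}) - value(coalition)
            coalition = coalition | {f}
    n_orderings = math.factorial(len(features))
    return {f: v / n_orderings for f, v in phi.items()}

# toy additive 'model output': the angular variable matters twice as much
toy_value = lambda s: 2.0 * ("dphi_jj" in s) + 1.0 * ("pt_gamma2" in s)
phi = shapley_values(["dphi_jj", "pt_gamma2"], toy_value)
```

For an additive value function like this one, each Shapley value reduces to the feature's own coefficient, which makes the toy easy to check by hand.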
In all, we observe substantially better results in the diphoton signal region, in terms of both the statistical significance and the accuracy of the NN. As presented in Appendix A, compared to the cut-based analysis, the S/B ratio is significantly lower in the NN analysis, which is by design. We observed that by increasing the number of signal events retained, one can populate the high-energy regions that are crucial for sensitivity to the EFT operators. Depending on the choice of confidence level, luminosity and systematic uncertainty, we observe up to a 38% improvement in the operator sensitivities due to the large number of signal events left after the classification. This is because the large statistical significance in the rectangular cut-based approach is achieved with a smaller event yield, which dilutes the impact of the events where the EFT effects are most prominent. In the ditau signal regions, owing to the vast number of background sources, the classification accuracy is lower than in the diphoton channel. As expected, the ττ + jets background is the most dominant background source for all ditau subregions. All these results were also compared with a boosted decision tree (BDT) algorithm. Although the BDT results were slightly less significant than those of the NN, we observed that both methods give priority to similar observables (as represented by the average SHAP values in the NN case) to increase the signal significance.