Extract the energy scale of anomalous $\gamma\gamma \to W^+W^-$ scattering in the vector boson scattering process using artificial neural networks

As a model-independent approach to searching for signals of new physics~(NP) beyond the Standard Model~(SM), the SM effective field theory~(SMEFT) has drawn a lot of attention recently. The energy scale of a process is an important parameter in the study of an EFT such as the SMEFT. However, for processes at a hadron collider with neutrinos in the final state, the energy scale is difficult to reconstruct. In this paper, we study the energy scale of anomalous $\gamma\gamma \to W^+W^-$ scattering in the vector boson scattering~(VBS) process $pp\to j j \ell^+\ell^-\nu\bar{\nu}$ at the Large Hadron Collider~(LHC) using artificial neural networks~(ANNs). We find that the ANN is a powerful tool to reconstruct the energy scale of $\gamma\gamma \to W^+W^-$ scattering. The factors affecting the performance of the ANNs are also studied. In addition, we make an attempt to interpret the ANN and arrive at an approximate formula which has only five fitting parameters and works much better than the approximation derived from kinematic analysis. With the help of the ANN approach, the unitarity bound is applied as a cut on the energy scale of $\gamma\gamma \to W^+W^-$ scattering, which is found to have a significant suppressive effect on signal events. The sensitivity of the process $pp\to j j \ell^+\ell^-\nu\bar{\nu}$ to anomalous $\gamma\gamma WW$ couplings and the expected constraints on the coefficients at the current and possible future LHC are also studied.


Introduction
The Standard Model (SM) has proven to be very successful and accurate. Searching for new physics (NP) beyond the SM is one of the main goals of current and future colliders. Due to the lack of clear guidelines, a model-independent approach to look for NP signals, known as the SM effective field theory (SMEFT) [1][2][3][4], has gradually become popular. It is assumed that the energy scales of processes at current colliders are not large enough to produce NP particles directly. At low energies, the NP sector is decoupled and one can integrate out the NP particles; the NP effects then become new interactions of known particles, which take the form of higher dimensional operators. The SM can then be extended as a low energy EFT of some unknown UV completion by adding those higher dimensional operators with small Wilson coefficients, resulting in the Lagrangian
$$\mathcal{L}_{\rm SMEFT} = \mathcal{L}_{\rm SM} + \sum_i \frac{C_{6i}}{\Lambda^2}\mathcal{O}_{6i} + \sum_j \frac{C_{8j}}{\Lambda^4}\mathcal{O}_{8j} + \ldots, \tag{1.1}$$
where $\mathcal{O}_{6i}$ and $\mathcal{O}_{8j}$ are dimension-6 and dimension-8 operators, $C_{6i}/\Lambda^2$ and $C_{8j}/\Lambda^4$ are the corresponding Wilson coefficients, and $\Lambda$ is the energy scale of NP. In Eq. (1.1), we have neglected the odd-dimensional operators, which violate lepton number conservation.
To investigate an EFT, the energy scale is an important parameter, because the Wilson coefficients are functions of energy scales. It has been suggested that, in experiments, the constraints on the Wilson coefficients of higher dimensional operators should be given as functions of energy scales [5]. Meanwhile, there are theoretical constraints such as the unitarity bounds [6][7][8][9][10] which are also functions of energy scales. Therefore, the reconstruction of the center-of-mass (c.m.) energy is an important task in phenomenological studies of the SMEFT.
At a proton-proton ($pp$) collider such as the Large Hadron Collider (LHC), the momenta of the initial-state partons are distributed according to the parton distribution function (PDF), so the c.m. energy can only be reconstructed from the information in the final state. This poses difficulties for processes whose final states contain neutrinos. For example, for the vector boson scattering (VBS) process $pp \to jj\gamma W$, in order to study the unitarity bounds of anomalous quartic gauge couplings (aQGCs), one needs to reconstruct the c.m. energy of the subprocess $\gamma(Z)W \to \gamma W$, which requires a delicate kinematic analysis and approximation [11]. Another example is the process $pp \to WW$, where the study of the validity of the SMEFT has also encountered great difficulties due to the neutrinos in the final state [12,13].
There is a similar problem in the studies of processes containing vector bosons at the LHC. Longitudinally polarized vector bosons are related to symmetry breaking and the Higgs mechanism, and have therefore drawn a lot of attention [14][15][16][17][18][19]. The polarization of a vector boson can be inferred from the momentum of the daughter charged lepton in the rest frame of the vector boson, the so-called helicity frame [20]. However, the momentum of the $W^\pm$ boson is difficult to reconstruct due to the neutrino. As a result, it is difficult to boost the charged lepton to the rest frame of the $W$ boson, which is one of the reasons that the polarization of a $W$ boson is difficult to determine. In response to the problems in determining the polarizations, a novel approach has been introduced into high energy physics (HEP). It has been shown that the artificial neural network (ANN) can be very powerful in determining the polarizations of $W$, $Z$ bosons [21][22][23] and the $\tau$ lepton [24]. The ANN approach is one of the machine learning methods, which have been widely used in HEP and have been developing rapidly in recent years [25][26][27][28][29][30][31][32][33][34].
In this paper, we study the aQGCs induced by dimension-8 operators [35,36] in the process $pp \to jjW^+W^-$ with leptonic decays of the $W^\pm$ bosons. The aQGCs can receive contributions from many NP models [37][38][39][40][41][42][43][44][45][46][47], and have been studied intensively [48][49][50][51][52][53]. Dimension-6 operators cannot contribute to aQGCs while leaving anomalous triple gauge couplings (aTGCs) alone [50], therefore we concentrate on the dimension-8 operators. A recent study shows that, from the convex geometry point of view on the SMEFT space, the existence of dimension-8 operators is necessary as long as the dimension-6 operators exist [54]. Besides, there are cases in which the contributions from dimension-6 operators are absent [37,38,[55][56][57][58][59]. Moreover, aQGCs can lead to richer helicity combinations than dimension-6 aTGCs [60]. Apart from that, aQGCs can be generated by tree diagrams while aTGCs are generated by loop diagrams [61], therefore the possibility exists that the signals of dimension-8 aQGCs are more significant than those of the dimension-6 aTGCs. Consequently, while the SMEFT has mainly been applied with dimension-6 operators, the study of dimension-8 operators has recently received increasing attention [35,36,55,62]. The most sensitive processes for aQGCs are the VBS processes [63]. The VBS processes have been extensively studied by both the ATLAS and the CMS groups [60,[64][65][66][67][68][69][70][71][72][73][74][75][76][77][78], and will continue to draw attention with future runs of the LHC. Evidence of the exclusive or quasi-exclusive $\gamma\gamma \to W^+W^-$ process has been found [79]. The next-to-leading order QCD corrections to the process $pp \to W^+W^-jj$ have been computed [80], and the $K$ factor is found to be close to one ($K \approx 0.98$). As introduced, in the study of the dimension-8 operators the two neutrinos in the final state cause difficulties. However, these difficulties provide a good test for the ANN approach. We use the ANN approach to study the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ with the focus on the reconstruction of the energy scale of the $\gamma\gamma \to W^+W^-$ subprocess. We discuss the justification of using the ANN to study the energy of the subprocess, and show that the ANN can achieve better results than the kinematic analysis. An interpretation of the ANN is discussed, which indicates that the ANN can be approximated by a function of three variables that contains five fitting parameters. The unitarity bounds and the signal significances of the aQGCs are also studied in this paper.
The remainder of the paper is organized as follows. In Sec. 2 we briefly introduce the aQGCs; in Sec. 3 the kinematic analysis is presented; the numerical results of the ANN approach are shown in Sec. 4; an interpretation of the ANN is presented in Sec. 5; in Sec. 6, we use the results of the ANN to study the unitarity bounds and signal significances of aQGCs; Sec. 7 is a summary.

A brief introduction of aQGCs
In this section, we briefly introduce the dimension-8 operators contributing to the aQGCs that are frequently used in experiments. The Lagrangian relevant to the process γγ [35,36] where $\Phi$ is the SM Higgs doublet, $W \equiv \vec{\sigma}\cdot\vec{W}/2$ with $\sigma$ being the Pauli matrices and $\vec{W} \equiv \{W^1, W^2, W^3\}$. The $O_{M_{0,1,2,3,4,5,7}}$ and $O_{T_{0,1,2,5,6,7}}$ operators can contribute to five anomalous $\gamma\gamma WW$ couplings, which can be written as where The coefficients of the couplings can be related to the coefficients of the operators as (2.3) Because each dimension-8 operator contributes to only one vertex, and the constraints on the dimension-8 operators are obtained in experiments by assuming one operator at a time, the constraints on $\alpha_i$ can be derived from the constraints on the dimension-8 operators [52], which are listed in Table 1. For simplicity, we concentrate on these five couplings in this paper.

Table 1: The constraints on anomalous $\gamma\gamma WW$ couplings and the corresponding limits on the dimension-8 operators at 95% CL [75].

… found to be about three orders of magnitude smaller compared with the VBS contribution in Fig. 1 (a) [52], therefore in the following discussions we concentrate on the effect of the VBS contribution. Moreover, we only consider the leptonic decays of the $W^\pm$ bosons, and focus on the process $pp \to \ell^+\ell^-\nu\bar{\nu}jj$ at $\sqrt{s} = 13$ TeV, with $\ell = e, \mu$. To distinguish it from the $s$ of the $pp$ collision, the $s$ of the subprocess $\gamma\gamma \to W^+W^-$ is denoted as $\hat{s}$.

Approximation of the energy scale
A prerequisite for using an ANN to mine information is that the information to be mined actually exists. This is extremely important because the ANN is considered to be a 'black box'. To demonstrate that $\hat{s}$ can be approximately reconstructed, and also as a comparison, we briefly introduce the method for estimating $\hat{s}$ in Ref. [52]. Assuming the $W^\pm$ bosons are energetic and neglecting the $O(M_W/\sqrt{\hat{s}})$ contributions, the leptons can be viewed as approximately collinear to the neutrinos, i.e., with $u$ and $v$ the coefficients to be determined, the momenta of the neutrinos can be related to the momenta of the charged leptons as $p_{\nu} \approx u p_{\ell^+}$ and $p_{\bar{\nu}} \approx v p_{\ell^-}$, which lead to the equations by which $u$ and $v$ can be solved and then $\hat{s}$ can be reconstructed. The result is where This approximation is based on the assumption that the $W^\pm$ bosons are energetic, which is supported by the fact that $\hat{s}$ is large for the signal events induced by aQGCs. However, when $\hat{s}$ is large, the charged leptons are approximately back-to-back, and the two equations in Eq. (3.1) become degenerate when the charged leptons are exactly back-to-back. In other words, for most events induced by aQGCs, $\kappa$ is very small. When $\kappa$ is close to zero, it amplifies the errors in the numerator, resulting in an inaccurate approximation.
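As an illustration of the collinear idea, the following minimal Python sketch solves for $u$ and $v$ from the transverse missing momentum and then builds an estimate of $\hat{s}$. It is only one plausible realization under stated assumptions: the 4-momentum ordering $(E, p_x, p_y, p_z)$, the use of the two transverse components of the missing momentum as the two equations, and the function name `collinear_s_hat` are all assumptions of this sketch, and the exact form of Eqs. (3.1) and (3.2) in Ref. [52] is not reproduced here.

```python
import numpy as np

def collinear_s_hat(p_lp, p_lm, met_xy):
    """Collinear estimate of s_hat (sketch, not the formula of Ref. [52]).

    p_lp, p_lm : arrays (E, px, py, pz) of the l+ and l-
    met_xy     : array (px, py) of the transverse missing momentum
    Assumes p_nu ~ u * p_lp and p_nubar ~ v * p_lm.
    """
    # Two transverse equations: u * pT(l+) + v * pT(l-) = pT(miss)
    matrix = np.array([[p_lp[1], p_lm[1]],
                       [p_lp[2], p_lm[2]]])
    # The determinant vanishes when the leptons are back-to-back in the
    # transverse plane, which is where the estimate becomes unstable.
    det = np.linalg.det(matrix)
    u, v = np.linalg.solve(matrix, met_xy)
    total = (1.0 + u) * p_lp + (1.0 + v) * p_lm
    s_hat = total[0] ** 2 - np.sum(total[1:] ** 2)
    return s_hat, det

# Toy event with exactly collinear neutrinos (u = 0.5, v = 2)
p_lp = np.array([5.0, 3.0, 0.0, 4.0])
p_lm = np.array([5.0, 0.0, 3.0, -4.0])
met = 0.5 * p_lp[1:3] + 2.0 * p_lm[1:3]
s_hat, det = collinear_s_hat(p_lp, p_lm, met)
```

The determinant returned alongside the estimate makes the degeneracy visible: as it approaches zero, small errors in the measured missing momentum translate into large errors in $u$ and $v$.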
Since approximations exist, using an ANN to reconstruct ŝ is nothing but to look for a better approximation.The ANN is good at looking for approximations and finding patterns in complex relationships, and therefore has the potential to yield better results.

Numerical results of the ANN
In this section we use the ANN approach to reconstruct $\hat{s}$. To train the ANN, we use Monte-Carlo (MC) simulation to generate the data-sets. We take the contributions from both diagrams of Fig. 1 as the signal because they both signal the existence of the aQGCs. As explained, the effect of … The signal events are generated by using MadGraph5_aMC@NLO [81,82], with a parton shower using Pythia82 [83]. The PDF is NNPDF2.3 [84]. A CMS-like detector simulation is applied using Delphes [85]. The events are generated assuming one operator at a time, and using the largest coefficients listed in Table 1.
The signal of the VBS process is characterized by two quark jets; events are thus required to have at least two jets and two opposite-sign charged leptons. The dominant background is the process $pp \to t\bar{t} + Nj$ with $t \to W^+ b$ ($\bar{t} \to W^- \bar{b}$) and with the $b$-jets mistagged. To reduce this background, we also require $N_j \le 5$. In the following, the results are obtained after the lepton number cut $N_\ell = 2$ and the jet number cut $2 \le N_j \le 5$. To train the ANN, we generate $10^6$ events to build the training data-set, and another $10^6$ events to build the validation data-set, for each anomalous $\gamma\gamma WW$ coupling. After the requirements on the numbers of leptons and jets, there are about $6 \times 10^5$ events in each data-set.
The true $\hat{s}$ can be obtained before the detector simulation, and is denoted as $\hat{s}_{tr}$. Each event corresponds to an element in the data-set consisting of 19 variables. For each event, an 18-dimensional vector is provided as the input to the ANN. Its 18 variables are the components of the 4-momenta of the two hardest jets, the components of the 4-momenta of the two hardest opposite-sign charged leptons, and the two components of the transverse missing momentum. The output of the ANN corresponds to $\hat{s}$. The true labels are the 19th variables of the elements in the data-sets, which are the $\hat{s}_{tr}$ of the events.
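The layout of the 18-dimensional input can be sketched in Python as follows. The ordering of the variables inside the vector, the $(E, p_x, p_y, p_z)$ convention, the massless treatment of the objects and the helper names `four_momentum` and `build_input` are assumptions made for illustration; the text only fixes which quantities enter.

```python
import numpy as np

def four_momentum(pt, eta, phi, mass=0.0):
    """Return (E, px, py, pz) from collider coordinates."""
    px = pt * np.cos(phi)
    py = pt * np.sin(phi)
    pz = pt * np.sinh(eta)
    energy = np.sqrt(px ** 2 + py ** 2 + pz ** 2 + mass ** 2)
    return np.array([energy, px, py, pz])

def build_input(jets, lep_plus, lep_minus, met_xy):
    """Assemble the 18 inputs: 2 jets (8) + 2 leptons (8) + missing pT (2)."""
    # Keep the two hardest jets, ordered by transverse momentum.
    hardest = sorted(jets, key=lambda p: np.hypot(p[1], p[2]), reverse=True)[:2]
    return np.concatenate([hardest[0], hardest[1], lep_plus, lep_minus, met_xy])

# Toy event (all numbers are made up)
jets = [four_momentum(50.0, 2.0, 0.1),
        four_momentum(120.0, -1.5, 2.0),
        four_momentum(30.0, 0.3, -1.0)]
lep_plus = four_momentum(80.0, 0.5, 1.0)
lep_minus = four_momentum(60.0, -0.4, -2.0)
met_xy = np.array([-25.0, 40.0])
x = build_input(jets, lep_plus, lep_minus, met_xy)
```

In a full data-set element the 19th entry, $\hat{s}_{tr}$, would be appended as the label and kept separate from these inputs during training.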
In this section, we mainly focus on the contribution of the $V_0$ vertex. It has been found that the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ is insensitive to the $V_4$ vertex [52], so we do not study $V_4$ in this paper.

ANN approach
An ANN is a mathematical model that simulates the complex neural system of a human brain, and it is also an information processing system for large-scale distributed parallel information processing [86]. The ANN is good at finding complex mathematical mappings between input and output, so it can be used to unveil hidden information in the final states. The mapping is determined by the number of interconnected nodes and their connection modes. In this paper, we use a densely connected ANN.
An ANN is composed of an input layer, one or more hidden layers and an output layer. Denoting $x^i_j$ as the neurons in the $i$-th layer, where $x^1_{1\le j\le n_1}$ are input neurons, $x^{2\le i\le l-1}_{1\le j\le n_i}$ are in hidden layers and $x^l_1$ is the output neuron, the ANN can be depicted as in Fig. 2. Where no ambiguity arises, the value at a neuron takes the same notation $x^i_j$. $x^{i+1}_{j'}$ can be related to $x^i_j$ as
$$x^{i+1}_{j'} = f^{i+1}\Big(\sum_j \omega^{i+1}_{jj'} x^i_j + b^{i+1}_{j'}\Big),$$
where $\omega^{i+1}_{jj'}$ are elements of a weight matrix, $b^{i+1}_{j'}$ are components of a bias vector, and $f^{i+1}$ is an activation function. The activation functions for the hidden layers are chosen as the parametric rectified linear unit (PReLU) function [87] defined as
$$f(x) = \begin{cases} x, & x > 0, \\ \alpha x, & x \le 0, \end{cases}$$
where the $\alpha$'s are trainable parameters. For the output layer, no activation function (i.e., a linear activation function) is used. In this paper, unless otherwise specified, we use $l = 10$ and $n_{1<i<10} = 32$; $n_1$ is the same as the dimension of the input data and $n_{10} = 1$ for the output layer.

Figure 2: The architecture of an ANN used in this paper. 'i', 'h' and 'o' stand for input layer, hidden layer and output layer, respectively. $l$ is the number of layers and $n_i$ is the number of neurons in the $i$-th layer.
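The layer structure described above can be illustrated with a minimal NumPy forward pass. This is a sketch and not the Keras model used in the paper: the weight initialization, the initial PReLU slope of 0.25, the reading of $l = 10$ as one input layer, eight hidden layers and one output layer, and the names `init_network` and `forward` are all assumptions.

```python
import numpy as np

def prelu(x, alpha):
    """PReLU: identity for x > 0, slope alpha (trainable) otherwise."""
    return np.where(x > 0, x, alpha * x)

def init_network(n_in, n_hidden=32, n_hidden_layers=8, seed=0):
    """Random (untrained) parameters for a densely connected network."""
    rng = np.random.default_rng(seed)
    sizes = [n_in] + [n_hidden] * n_hidden_layers + [1]
    layers = []
    for index, (n_prev, n_next) in enumerate(zip(sizes[:-1], sizes[1:])):
        layer = {
            "w": rng.normal(0.0, np.sqrt(2.0 / n_prev), size=(n_prev, n_next)),
            "b": np.zeros(n_next),
        }
        # Hidden layers carry one PReLU slope per neuron; the output is linear.
        if index < len(sizes) - 2:
            layer["alpha"] = np.full(n_next, 0.25)
        layers.append(layer)
    return layers

def forward(layers, x):
    """Propagate a batch x of shape (n_events, n_in) through the network."""
    for layer in layers[:-1]:
        x = prelu(x @ layer["w"] + layer["b"], layer["alpha"])
    out = layers[-1]
    return x @ out["w"] + out["b"]  # linear output layer

layers = init_network(n_in=18)
y = forward(layers, np.zeros((5, 18)))
```

Training (fitting the weights, biases and slopes by minimizing the mean squared error) is deliberately left out; the sketch only shows how a prediction is computed once the parameters are known.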
The training data-sets are normalized using the z-score standardization, i.e., denoting $v_i$ as the $i$-th variable of one of the elements in the data-sets, $v'_i$ is used instead of $v_i$, which is defined as $v'_i = (v_i - \bar{v}_i)/\sigma_{v_i}$, where $\bar{v}_i$ and $\sigma_{v_i}$ are the mean value and the standard deviation of all $i$-th variables of the elements in the data-sets. The architecture is built using Keras with a TensorFlow [88] backend. The data preparation is performed by MLAnalysis [89]. The learning curves of the ANNs for $V_0$ and $V_2$ are shown in Fig. 3. Note that the label is also standardized, therefore the mean squared error (mse) is dimensionless. From Fig. 3, we find that the mse stops decreasing at about 150 epochs for the $V_{0,1}$ vertices, and at about 300 epochs for the $V_{2,3}$ vertices. To avoid overfitting, we stop the training at 150 epochs for the $V_{0,1}$ vertices, and at 300 epochs for the $V_{2,3}$ vertices. Note that the $V_{0,1}$ vertices are from the $O_{M_i}$ operators and the $V_{2,3}$ vertices are from the $O_{T_i}$ operators; it is interesting that the ANNs are more difficult to train with the signal events induced by the $O_{T_i}$ operators. … 4, where $\overline{\sqrt{\hat{s}_{tr}}}$ is the mean value of $\sqrt{\hat{s}_{tr}}$. One can see that the deviation of $\hat{s}_{ap}$ from $\sqrt{\hat{s}_{tr}}$ is smaller than that obtained by using $\overline{\sqrt{\hat{s}_{tr}}}$ as an approximation. On the other hand, the result of the ANN is much better than the approximation derived from the kinematic analysis.
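The z-score standardization can be written in a few lines of NumPy. The sketch below assumes that the 19 columns (18 inputs and the label) are standardized column by column and that the stored means and standard deviations are reused to map a standardized prediction back to physical units; the function names and the toy data are illustrative only.

```python
import numpy as np

def fit_standardizer(data):
    """Column-wise mean and standard deviation of a (n_events, n_vars) array."""
    return data.mean(axis=0), data.std(axis=0)

def standardize(data, mean, std):
    """v' = (v - mean) / std for every column."""
    return (data - mean) / std

def destandardize(z, mean, std):
    """Inverse map, e.g. to turn a standardized prediction back into s_hat."""
    return z * std + mean

# Toy data-set: 1000 events with 18 inputs and 1 label (made-up numbers)
rng = np.random.default_rng(1)
data = rng.normal(3.0, 2.0, size=(1000, 19))
mean, std = fit_standardizer(data)
z = standardize(data, mean, std)
restored = destandardize(z, mean, std)
```

Because the label column is standardized as well, a mean squared error computed on `z[:, -1]` is dimensionless, as noted in the text.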

The information in the data-set
It has been shown that the ANN can reconstruct $\hat{s}$ much better than the kinematic analysis. In this subsection, we investigate how the performances of the ANNs are affected by different factors. Specifically, we are interested in where the information needed to reconstruct $\hat{s}$ is contained. To answer this question, we pay particular attention to the information that is not used by the approximation in Eq. (3.2). In this subsection, the numbers of epochs are determined in the same way as in the previous subsection.

Compare different sectors
The approximation in Eq. (3.2) does not use the momenta of the jets, which are difficult to make use of. To investigate how the results are affected by the information contained in the jets, the charged leptons and the missing momentum, we divide the input data into three sectors and train ANNs with different combinations of them. We denote the $\hat{s}$ predicted by each ANN according to its inputs: $\hat{s}_{2lm}$ for the 4-momenta of the two hardest opposite-sign charged leptons and the transverse missing momentum; $\hat{s}_{2jm}$ for the 4-momenta of the two hardest jets and the transverse missing momentum; $\hat{s}_{2j2l}$ for the 4-momenta of the two hardest jets and of the two hardest opposite-sign charged leptons; $\hat{s}_{2j}$ for the 4-momenta of the two hardest jets only; $\hat{s}_{2l}$ for the 4-momenta of the two hardest opposite-sign charged leptons only; and $\hat{s}_{m}$ for the transverse missing momentum only. The normalized distributions of $|\Delta\sqrt{\hat{s}}|$ are shown in Fig. 5. From the distributions we find that the importance of the different sectors can be ordered as $p_{\ell^\pm} > p^{\rm miss}_T > p_{\rm jet}$. Indeed, the ANN trained with the data-set including the momenta of the jets can produce slightly more precise results. Nevertheless, the effect brought about by the jets is small and is not the main reason why the ANNs are more accurate.
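The six input combinations can be expressed as column selections on the full 18-variable input. The sketch below assumes a column ordering of jets first, then charged leptons, then missing momentum; this ordering and the names `SECTORS`, `INPUT_SETS` and `select_inputs` are assumptions for illustration.

```python
import numpy as np

# Assumed column layout of the 18 inputs
SECTORS = {
    "2j": slice(0, 8),    # 4-momenta of the two hardest jets
    "2l": slice(8, 16),   # 4-momenta of the two charged leptons
    "m": slice(16, 18),   # transverse missing momentum
}

# Sector content of each training configuration named in the text
INPUT_SETS = {
    "2lm": ("2l", "m"),
    "2jm": ("2j", "m"),
    "2j2l": ("2j", "2l"),
    "2j": ("2j",),
    "2l": ("2l",),
    "m": ("m",),
}

def select_inputs(x, name):
    """Return the columns of x (n_events, 18) used by configuration `name`."""
    return np.concatenate([x[:, SECTORS[s]] for s in INPUT_SETS[name]], axis=1)

x_full = np.arange(3 * 18, dtype=float).reshape(3, 18)
widths = {name: select_inputs(x_full, name).shape[1] for name in INPUT_SETS}
```

Each configuration then sets the input dimension $n_1$ of its own network, for example $n_1 = 8$ when only the charged leptons are used.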

Compare different operators
Except for assuming that the events are induced by aQGCs, the approximation in Eq. (3.2) does not use any other information about the anomalous couplings. In the approximation, the formula to estimate $\sqrt{\hat{s}}$ is the same for all couplings. With the ANN approach, however, we can train the ANNs by using data-sets consisting of signal events from different couplings. In this way we can investigate whether the distinction between couplings is important for the reconstruction of $\sqrt{\hat{s}}$. The normalized distributions of $|\Delta\sqrt{\hat{s}}|$ for $\hat{s}_{V_0,V_i}$ and $\hat{s}_{V_i,V_0}$. Note that $\hat{s}_{V_0,V_0}$ is the $\hat{s}_{ann}$ shown in Figs. 4 and 5.
We denote by $\hat{s}_{V_i,V_j}$ the $\hat{s}$ of the events in a $V_j$ validation data-set predicted by the ANN trained with the $V_i$ training data-set. The results of $\hat{s}_{V_0,V_i}$ and $\hat{s}_{V_i,V_0}$ are shown in Fig. 6. Again, the difference between the $O_{M_i}$ operators and the $O_{T_i}$ operators can be seen in the results of $\hat{s}_{V_0,V_i}$. From the results of $\hat{s}_{V_i,V_0}$, it can be seen that the predictive powers of the ANNs trained using different data-sets are about the same for signal events of the $V_0$ vertex. In fact, the ANN trained with the $V_3$ training data-set is slightly more accurate than the ANN trained with the $V_0$ training data-set in predicting the $\hat{s}$ of the $V_0$ validation data-set. Therefore we conclude that the information about different couplings is not made use of. The difference in the distributions of $|\sqrt{\hat{s}_{tr}} - \hat{s}_{V_0,V_i}|$ is further evidence that, from the perspective of the ANNs, the signal events induced by the $O_{T_i}$ operators are more difficult to learn.

Compare different collision energies
The approximation in Eq. (3.2) assumes that $\hat{s}$ is large; however, it does not use the information of the collision energy $\sqrt{s} = 13$ TeV. To investigate whether this information is made use of, we prepare three training data-sets, which are the signal events of $V_0$ generated at $\sqrt{s} = 12$, 13 and 14 TeV. The $\hat{s}$ of the events in the $\sqrt{s} = 13$ TeV $V_0$ validation data-set are predicted by the ANNs trained with the $\sqrt{s} = 12$, 13 and 14 TeV training data-sets, and are denoted as $\hat{s}_{12,13,14}$, respectively. The normalized distributions of $|\Delta\sqrt{\hat{s}_{12,13,14}}|$ are shown in Fig. 7.

Interpretation of the ANN
To understand how $\sqrt{\hat{s}}$ is predicted by the ANNs, in this section we investigate the implicit relation between $\sqrt{\hat{s}}$ and the inputs that is concealed in an ANN. Since the ANN trained with only the 4-momenta of the charged leptons already achieves almost the best accuracy, for simplicity we focus on this ANN.
Once the ANN is trained, we can input arbitrary 4-momenta of the charged leptons to study the relationship between $\hat{s}$ and the 4-momenta. In this procedure, one has control over the variables, i.e., one can keep some variables constant and change the others. In contrast, using MC simulation for such a study is difficult because the 4-momenta of the charged leptons are generated according to a probability density and therefore are not arbitrary.
One of the reasons the ANN is called a 'black box' is that, although the ANN has an analytic expression, this expression is very complex and it is difficult to read the physics behind this expression.Another motivation of this section is to find a more understandable expression.In addition, this procedure can also be seen as a method to use the ANN to find an approximate formula.
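Firstly, we use a pair of massless leptons with zero azimuth angles (denoted as $\phi_{\ell^\pm}$) and with $E_{\ell^\pm} = \bar{E}$. The 4-momenta are set as $p_{\ell^\pm} = \bar{E}\,(1, \sin\theta_{\ell^\pm}, 0, \cos\theta_{\ell^\pm})$, where $\theta_{\ell^+}$ and $\theta_{\ell^-}$ are free parameters. The $\sqrt{\hat{s}}$ as a function of $\theta_{\ell^+}$ and $\theta_{\ell^-}$ predicted by the ANN is shown in the left panel of Fig. 8. We find that $\sqrt{\hat{s}}$ can be well fitted as $a_1 + a_2\cos(\theta_{\ell^+} - \theta_{\ell^-})$, which is shown as the curved surface.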
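This kind of controlled scan and fit can be sketched in Python as follows. The trained ANN is not available here, so `stand_in_predictor` is a made-up placeholder that returns a known function of the opening angle; the value of `E_BAR`, the grid, and the $(E, p_x, p_y, p_z)$ convention are likewise assumptions. Only the procedure (scan $\theta_{\ell^+}$ and $\theta_{\ell^-}$ at fixed energy, then fit $a_1 + a_2\cos(\theta_{\ell^+} - \theta_{\ell^-})$ by linear least squares) follows the text.

```python
import numpy as np

E_BAR = 0.5  # hypothetical fixed lepton energy (TeV)

def lepton_momentum(energy, theta):
    """Massless lepton with zero azimuth: E * (1, sin(theta), 0, cos(theta))."""
    return energy * np.array([1.0, np.sin(theta), 0.0, np.cos(theta)])

def stand_in_predictor(p_plus, p_minus):
    """Placeholder for the trained ANN; returns a made-up sqrt(s_hat)."""
    cos_open = np.dot(p_plus[1:], p_minus[1:]) / (p_plus[0] * p_minus[0])
    return 1.5 - 0.8 * cos_open

# Scan both polar angles on a grid while keeping everything else fixed
thetas = np.linspace(0.0, np.pi, 25)
grid = [(tp, tm) for tp in thetas for tm in thetas]
pred = np.array([
    stand_in_predictor(lepton_momentum(E_BAR, tp), lepton_momentum(E_BAR, tm))
    for tp, tm in grid
])

# Linear least-squares fit of a1 + a2 * cos(theta_plus - theta_minus)
delta = np.array([tp - tm for tp, tm in grid])
design = np.column_stack([np.ones_like(delta), np.cos(delta)])
(a1, a2), *_ = np.linalg.lstsq(design, pred, rcond=None)
```

With a real network, `stand_in_predictor` would be replaced by a call to the trained model on the standardized inputs, and the quality of the fit would be judged from the residuals `pred - design @ [a1, a2]`.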
Moreover, we also investigate how $\sqrt{\hat{s}}$ depends on $E_{\ell^+}$ and $E_{\ell^-}$. For this purpose, we introduce a pair of back-to-back massless leptons with the directions of $p_{\ell^+}$ and $p_{\ell^-}$ fixed, i.e., we use the following 4-momenta where $E_{\ell^+}$ and $E_{\ell^-}$ are free parameters. The $\sqrt{\hat{s}}$ as a function of $E_{\ell^+}$ and $E_{\ell^-}$ predicted by the ANN is shown in the right panel of Fig. 8. We find that $\sqrt{\hat{s}}$ can be well fitted as which is shown as the curved surface.
Denoting the angle between the momenta of $\ell^\pm$ as $\theta_{\ell\ell}$, the relation between $\sqrt{\hat{s}}$ and $\theta_{\ell\ell}$ is investigated. In this case, we use $p_{\ell^+} = \bar{E}\,(1, \sin(\theta), 0, \cos(\theta))$, and let $p_{\ell^-}$ be on the surface of a cone whose central axis is $p_{\ell^+}$, with $E_{\ell^-} = \bar{E}$. The $p_{\ell^+}$ and $p_{\ell^-}$ are depicted in the left panel of Fig. 9, where the definition of $\phi_{\ell\ell}$ is shown. The $\sqrt{\hat{s}}$ as a function of $\theta_{\ell\ell}$ and $\phi_{\ell\ell}$ given by the ANN is shown in the right panel of Fig. 9. We find that $\sqrt{\hat{s}}$ is insensitive to $\phi_{\ell\ell}$. For the case of aQGCs, the $W^\pm$ bosons are typically energetic. As a result, the charged leptons are dominantly back-to-back. We find that, in the vicinity of $\theta_{\ell\ell} \approx \pi$, $\sqrt{\hat{s}}$ is almost independent of $\phi_{\ell\ell}$, and is approximately a cosine function of $\theta_{\ell\ell}$. Based on these observations, we assume that the relation between $\sqrt{\hat{s}}$ and the momenta of the charged leptons can be fitted by the ansatz in Eq. (5.3). Note that, for fixed $E_{\ell^\pm}$ and $\phi_{\ell^\pm} = 0$, the ansatz has the form $a_1 + a_2\cos(\theta_{\ell^+} - \theta_{\ell^-})$. For fixed $\theta_{\ell^\pm}$ and $\phi_{\ell^\pm} = 0$, the ansatz has the form b … Besides, the ansatz is a cosine function of $\theta_{\ell\ell}$ and is independent of $\phi_{\ell\ell}$. … are also shown in Fig. 10. It can be seen that Eq. (5.3) is able to achieve results comparable to those of the ANN. Meanwhile, as an approximation found by the ANN, Eq. (5.3) is much better than Eq. (3.2).
The ANN with $n_1 = 8$ contains 9025 trainable parameters. In contrast, the ansatz in Eq. (5.3) has only five fitting parameters. Besides, Eq. (5.3) is no longer an overly complicated expression that is hard to read. One can already see some patterns from Eq. (5.3). For example, $\hat{s}$ is insensitive to $\phi_{\ell\ell}$, and is approximately a linear function of $E_{\ell^+} + E_{\ell^-}$ when $\cos(\theta_{\ell\ell}) = 0$.

Expected constraints on the aQGCs
Where to find the signals of NP is one of the most important questions in the study of NP. Since the signals of aQGCs have not been observed yet, the objective of this section is to investigate the sensitivity of the VBS processes to the aQGCs. In this section we study the signals and backgrounds with the help of the ANN. Applying the unitarity bounds, which are important in the study of an EFT, requires $\hat{s}$; for this purpose we use the $\hat{s}$ reconstructed by the ANN approach.

One ANN for all couplings
We have confirmed in Sec. 4 that the ANN does not use the information about which coupling the events come from. For simplicity, and also to have more training samples, we combine the training data-sets of $V_{0,1,2,3}$ into one data-set, and use this data-set to train one ANN for all couplings. Denoting $\hat{s}_{comb}$ as the $\hat{s}$ of the events in the validation data-sets predicted by the ANN trained with the combined data-set, the normalized distributions of $\Delta\sqrt{\hat{s}}$ for $\hat{s}_{comb}$ and $\hat{s}_{V_i,V_i}$ are compared in Fig. 11. We find that for the $V_{0,1}$ vertices, $\hat{s}_{comb}$ is slightly better than $\hat{s}_{V_0,V_0}$ and $\hat{s}_{V_1,V_1}$, while for the $V_{2,3}$ vertices, $\hat{s}_{comb}$ is about the same as $\hat{s}_{V_2,V_2}$ and $\hat{s}_{V_3,V_3}$. In the remainder of this section, we use $\hat{s}_{comb}$.

Signals and backgrounds
At the LHC, the $t\bar{t} + Nj$ production contributes to the backgrounds due to the $b$-jet mistag. The Feynman diagrams in the case of $N = 0$ are shown in Fig. 12 (b). The cross section of inclusive $t\bar{t}$ production is about 888 pb [90]; with the 77% [91] $b$-tagging efficiency, the inclusive $t\bar{t}$ production would lead to a $jj\ell^+\ell^-\nu\bar{\nu}$ background whose cross section is about 2.32 pb. Apart from the $t\bar{t}$ backgrounds, significant irreducible backgrounds can arise from the SM processes which lead to the same final state $\ell^+\ell^-\nu\bar{\nu}jj$. The typical Feynman diagrams at tree level are shown in Fig. 12 (a), which are often categorized as the electroweak VBS (EW-VBS), electroweak non-VBS (EW-non-VBS) and QCD processes. To highlight the contributions from aQGCs, the contributions from EW-VBS diagrams, including those that contain the SM $\gamma\gamma W^+W^-$ coupling, are also considered as parts of the backgrounds. In the following, the backgrounds shown in Fig. 12 (a) are denoted as 'SM', and the backgrounds in Fig. 12 (b) are denoted as '$t\bar{t}$'. We use the event selection strategy in Ref. [52], where $M_{jj}$ and $\Delta y_{jj}$ are the invariant mass of and the difference between the rapidities of the hardest two jets, $\phi_{LM}$ is the angle between the transverse missing momentum and the sum of the transverse momenta of the charged leptons, i.e. the angle between $p^{\ell^+}_T + p^{\ell^-}_T$ and $p^{\rm miss}_T$, $\theta_{\ell\ell}$ is the angle between the charged leptons, and [92] which was found to be very efficient in highlighting the signals of aQGCs in the study of same-sign $WW$ scattering. In this paper, we use $\hat{s}_{comb}$ instead of $\hat{s}_{ap}$ and adjust the cut to $\hat{s}_{comb} > 2.25\ {\rm TeV}^2$. $\hat{s}$ is not well defined in cases such as the $t\bar{t}$ background events; nevertheless, $\hat{s}_{comb}$ can still serve as an observable to discriminate the signal events from the backgrounds. The normalized distributions of $\sqrt{\hat{s}_{comb}}$ are shown in Fig. 13. It can be seen that, for a background event, $\hat{s}_{comb}$ is generally smaller than $2.25\ {\rm TeV}^2$, which is not the case for a signal event.

Figure 12: The typical Feynman diagrams of the backgrounds.
Using the event selection strategy in Eq. (6.1), the cut flow is shown in Table 2. From Table 2, it can be seen that the cuts can reduce the backgrounds significantly while preserving much of the signal events. For the $pp \to t\bar{t} + Nj$ backgrounds, the cut efficiency for $N = 0$ is the lowest mainly because the $N_j \le 5$ cut can increase the cut efficiencies for the cases of $N > 0$. An upper bound of the $pp \to t\bar{t} + Nj$ backgrounds is estimated by using the efficiency of $N = 0$ for all values of $N$. Then, the $pp \to t\bar{t} + Nj$ background is reduced from 2.32 pb to about 0.15 fb.

Table 2: The effect of the cuts on the process $pp \to \ell^+\ell^-\nu\bar{\nu}jj$. The cross sections of signal and background events are given in fb. The result of $t\bar{t} + Nj$ with $N = 0$ is shown as an example; the effect of $b$-tagging is not included in this table, which will reduce the cross-section from 24938.24 to 1319.23 fb with 77% $b$-tagging efficiency.

Table 3: The helicity amplitudes at the order of $O(s^2)$.
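The background numbers quoted in this subsection can be cross-checked with simple arithmetic. In the short Python sketch below, the 888 pb cross section, the 77% $b$-tagging efficiency and the 24938.24 fb entry come from the text, whereas the leading-order branching fraction of 2/9 for $W \to \ell\nu$ with $\ell = e, \mu$ and the assumption that both $b$-jets must independently fail the tag are assumptions introduced here to reproduce the quoted values.

```python
# Inputs quoted in the text
sigma_ttbar_pb = 888.0          # inclusive t tbar cross section [90]
b_tag_eff = 0.77                # b-tagging efficiency [91]
sigma_table_fb = 24938.24       # t tbar (N = 0) entry before b-tagging

# Assumptions of this sketch
br_w_to_lnu = 2.0 / 9.0         # W -> e nu or mu nu at leading order
mistag_both = (1.0 - b_tag_eff) ** 2   # both b-jets escape the tag

# Dileptonic t tbar with both b-jets mistagged as light jets
sigma_bkg_pb = sigma_ttbar_pb * br_w_to_lnu ** 2 * mistag_both

# Effect of b-tagging on the N = 0 table entry
sigma_after_tag_fb = sigma_table_fb * mistag_both

print(f"t tbar background: {sigma_bkg_pb:.2f} pb")
print(f"table entry after b-tagging: {sigma_after_tag_fb:.2f} fb")
```

Under these assumptions the first number comes out close to the 2.32 pb quoted for the $jj\ell^+\ell^-\nu\bar{\nu}$ background, and the second reproduces the reduction from 24938.24 to 1319.23 fb.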

Unitarity bounds
As an EFT, the SMEFT is only valid below a certain energy scale. The cross-section of the VBS process with contributions from aQGCs included grows significantly at high energies. On one hand, at higher energies the VBS process is ideal for searching for aQGCs. On the other hand, the cross-section will violate unitarity at a certain energy, which provides a signature indicating that the SMEFT is no longer valid. The violation of unitarity can be avoided by unitarization methods such as K-matrix unitarization [80], T-matrix unitarization [93], the form factor method [79,80], as well as the relation method [94,95]. It has been pointed out that the constraints on the coefficients depend on the method used [96], and it has been emphasized that unitarization defeats the model-independent purpose of using an EFT [63]. Therefore, we present our results using a procedure independent of unitarization methods.
Considering the subprocess $\gamma_{\lambda_1}\gamma_{\lambda_2} \to W^-_{\lambda_3}W^+_{\lambda_4}$, where $\lambda_{1,2} = \pm 1$ and $\lambda_{3,4} = \pm 1, 0$ correspond to the helicities of the vector bosons, in the c.m. frame of the two photons with the $z$-axis along the flight direction of $\gamma_{\lambda_1}$, the amplitudes can be expanded as [97] M(γ where $\theta$ and $\phi$ are the zenith and azimuth angles of the $W^-$ boson, $\lambda = \lambda_1 - \lambda_2$, $\lambda' = \lambda_3 - \lambda_4$ and $d^J_{\lambda\lambda'}(\theta)$ are the Wigner D-functions [97]. The partial wave unitarity bound is $|T^J| \le 2$ [9]. For $\gamma\gamma \to W^+W^-$, 36 different helicity amplitudes can be obtained. The number of amplitudes can be reduced by using It is only necessary to keep the terms at the leading order ($O(\hat{s}^2)$). The helicity amplitudes at the leading order are listed in Table 3. The tightest bounds are The partial wave unitarity bound has been widely used in previous studies [59,60,[98][99][100][101][102][103]. To avoid the violation of unitarity, the partial wave unitarity bound was often used as a constraint on the coefficients of the high dimensional operators. Note that, in Eq. (6.4), the unitarity bounds are presented as constraints on $\hat{s}$ instead of the coefficients. Due to the PDF, the $\hat{s}$ of the subprocess is not a fixed value, which brings difficulties in setting constraints on the coefficients directly. Therefore, in this paper we use a matching procedure [104,105] instead. The matching procedure is based on the idea that, to take validity into account, the constraints obtained by experiments should be reported as functions of energy scales [5], and has been introduced in the studies of the aQGCs [11,51]. Such a matching procedure is independent of unitarization methods and can be applied in experiments. The matching procedure in this paper is also very similar to the 'clipping' method, which also cuts off the signal events violating unitarity according to $\hat{s}$ [76,106], except that we also cut off the backgrounds so that one can compare the signals with the backgrounds under the same $\hat{s}$ cut.
We use Eq. (6.4) as a cut on $\hat{s}$, and compare the cross-sections with and without aQGCs under the same energy cut. We emphasize that, although this approach is called a 'unitarity bound', in using it we are not actually applying any constraints or unitarizations. In any case, it is practicable to compare NP and the SM under a certain energy scale. In particular, this is necessary in the detailed study of the Wilson coefficients in an EFT, because the Wilson coefficients are functions of the energy scale. This matching procedure is independent of whether or not the unitarity bounds are imposed. We merely choose a matching energy scale such that unitarity is guaranteed. Specifically, we choose the energy scale as the maximal energy scale allowed by unitarity for the given coefficients of the aQGCs.

… 1, and the cross-sections before and after the energy cuts in Eq. (6.4).
For the largest coefficients listed in Table 1, the maximally allowed energy scales (denoted as $\sqrt{\hat{s}_{\max}}$) according to Eq. (6.4) are listed in Table 4. The effects of the unitarity bounds are also shown in Table 4. It can be seen that the unitarity bounds strongly suppress the cross-sections. For $V_0$ in particular, the cross-section is reduced by about an order of magnitude. Such a significant suppression indicates the necessity of the unitarity bounds.
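The energy cut described above can be illustrated with a minimal sketch. It assumes that each event carries a reconstructed $\sqrt{\hat{s}}$ (for instance from the ANN) and a per-event weight in fb; the function name `cross_section_after_cut`, the toy arrays and the value of `sqrt_s_max` are hypothetical and are not taken from the analysis. The same cut is applied to the signal and to the background, as in the matching procedure.

```python
import numpy as np

def cross_section_after_cut(sqrt_s_hat, weights, sqrt_s_max):
    """Sum the weights of the events with sqrt(s_hat) <= sqrt_s_max.

    sqrt_s_hat : reconstructed energy scale of each event (TeV)
    weights    : per-event contribution to the cross-section (fb)
    sqrt_s_max : maximally allowed energy scale from the unitarity bound (TeV)
    """
    sqrt_s_hat = np.asarray(sqrt_s_hat, dtype=float)
    weights = np.asarray(weights, dtype=float)
    keep = sqrt_s_hat <= sqrt_s_max  # events that respect the bound
    return float(weights[keep].sum())

# Toy inputs (hypothetical numbers, for illustration only).
sqrt_s_hat_signal = np.array([0.8, 1.5, 2.7, 3.9])
weights_signal = np.array([0.1, 0.2, 0.3, 0.4])
sqrt_s_hat_background = np.array([0.5, 0.9, 1.1, 2.4])
weights_background = np.array([1.0, 1.0, 1.0, 1.0])
sqrt_s_max = 2.0  # hypothetical value, not from Table 4

# The same cut is applied to signal and background.
sigma_signal = cross_section_after_cut(sqrt_s_hat_signal, weights_signal, sqrt_s_max)
sigma_background = cross_section_after_cut(sqrt_s_hat_background, weights_background, sqrt_s_max)
print(sigma_signal, sigma_background)
```

Because the signal events populate the high-energy region, such a cut removes a larger fraction of the signal than of the background, which is the suppression seen in Table 4.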

Signal significance
The sensitivity of the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ to the aQGCs can be estimated with the help of the statistical significance defined as $\mathcal{S}_{stat} \equiv N_S/\sqrt{N_S + N_B}$, where $N_S$ is the number of signal events and $N_B$ is the number of background events. It has been shown that, within the current ranges of the coefficients of the aQGCs, the interference terms can also be neglected [52]. For simplicity, the cross-sections are calculated with the above contributions neglected.
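As a minimal sketch of this definition, the event numbers can be obtained from cross-sections and an integrated luminosity, $N = \sigma L$, and inserted into $\mathcal{S}_{stat}$. The helper names `event_count` and `stat_significance` and the numerical inputs are hypothetical.

```python
import math

def event_count(cross_section_fb, luminosity_fb_inv):
    """Expected number of events, N = sigma * L."""
    return cross_section_fb * luminosity_fb_inv

def stat_significance(n_signal, n_background):
    """S_stat = N_S / sqrt(N_S + N_B); returns 0 if there are no events."""
    total = n_signal + n_background
    if total <= 0:
        return 0.0
    return n_signal / math.sqrt(total)

# Hypothetical cross-sections after all cuts (fb) and luminosity (fb^-1).
n_s = event_count(0.05, 139.0)
n_b = event_count(0.50, 139.0)
print(stat_significance(n_s, n_b))
```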
We scan parameter spaces larger than the constraints listed in Table 1 because the unitarity bounds have significant suppressive effects. The unitarity bounds are applied to each coefficient individually, according to Eq. (6.4). The cross-sections for the aQGCs, the SM and the $t\bar{t}$ backgrounds are denoted as $\sigma_{V_i}$, $\sigma_{SM}$ and $\sigma_{t\bar{t}+Nj}$, respectively. After the cuts listed in Table 2 and after the unitarity bounds, the cross-sections as functions of the coefficients are shown in Fig. 14. Note that the cross-sections of the backgrounds are not themselves functions of the coefficients of the aQGCs. However, we compare the cross-sections of the backgrounds under different energy scales, which are related to the coefficients of the aQGCs, so the cross-sections of the backgrounds appear to become functions of the coefficients. Without the unitarity bounds, the cross-sections of the signals should be quadratic functions of the coefficients. As we can see from Fig. 14, this is greatly changed by the unitarity bounds.

Table 5: The expected constraints on the coefficients of the anomalous $\gamma\gamma WW$ couplings at 13 TeV with $L = 139\ {\rm fb}^{-1}$ and $L = 300\ {\rm fb}^{-1}$ when $\mathcal{S}_{stat} < 2$.

It can be found in Fig. 15 that the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ is sensitive to the $V_{0,2}$ vertices. The expected constraints are calculated assuming that the signals of aQGCs are not observed with $\mathcal{S}_{stat} \ge 2$, and are shown in Table 5. The results for a possible future LHC luminosity of $L = 300\ {\rm fb}^{-1}$ [109] are also shown in Table 5. A comparison of Tables 1 and 5 shows that, except for $V_2$, the constraints at $L = 139\ {\rm fb}^{-1}$ in Table 5 are a bit less stringent, even though the constraints in Table 1 are given at $L = 35.9\ {\rm fb}^{-1}$ by studying the production of $W\gamma$. The main reason is that the results in Table 1 do not take the unitarity bounds into account. The fact that constraints with unitarity bounds considered are significantly less stringent was also observed in the studies using the 'clipping' method [76]. From the results in Figs. 14, 15 and Table 5, one can see that when the unitarity bounds are applied, it is very important to increase the luminosity in order to narrow down the coefficient spaces. This is the problem of the narrow 'EFT triangles' which has been pointed out in previous studies [13,92,110], and it has been suggested that multi-operator analysis and the combination of different processes are also important. We shall emphasize that the above arguments are based on the pessimistic assumption that NP signals will not be discovered. This in turn shows the importance of the high-energy region, where plenty of room has been left for the discovery of new resonances.
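The way an expected constraint follows from the requirement $\mathcal{S}_{stat} < 2$ can be sketched as a scan over one coefficient. The sketch below assumes that the signal and background cross-sections after all cuts, including the coefficient-dependent $\hat{s}$ cut, have already been computed at each scan point. The function name `expected_upper_limit`, the restriction to positive scan values and the linear interpolation between scan points are illustrative assumptions and not necessarily the procedure used to obtain Table 5.

```python
import numpy as np

def expected_upper_limit(coeffs, sigma_sig_fb, sigma_bkg_fb, lumi_fb_inv, threshold=2.0):
    """Smallest positive coefficient at which S_stat reaches the threshold.

    coeffs       : scanned (positive) coefficient values
    sigma_sig_fb : signal cross-section at each scan point (fb)
    sigma_bkg_fb : background cross-section at each scan point (fb); it varies
                   with the coefficient because the s_hat cut does
    Returns None if the scan never reaches the threshold.
    Assumes a non-zero event count at every scan point.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    n_s = np.asarray(sigma_sig_fb, dtype=float) * lumi_fb_inv
    n_b = np.asarray(sigma_bkg_fb, dtype=float) * lumi_fb_inv
    s_stat = n_s / np.sqrt(n_s + n_b)

    order = np.argsort(coeffs)
    coeffs, s_stat = coeffs[order], s_stat[order]

    above = np.nonzero(s_stat >= threshold)[0]
    if above.size == 0:
        return None
    first = above[0]
    if first == 0:
        return float(coeffs[0])
    # Linear interpolation between the last point below the threshold
    # and the first point at or above it.
    c0, c1 = coeffs[first - 1], coeffs[first]
    s0, s1 = s_stat[first - 1], s_stat[first]
    return float(c0 + (threshold - s0) * (c1 - c0) / (s1 - s0))

# Hypothetical scan, for illustration only.
print(expected_upper_limit([0.0, 1.0], [0.0, 0.25], [0.75, 0.75], 100.0))
```

Coefficients below the returned value correspond to $\mathcal{S}_{stat} < 2$ in this toy scan; a negative-side limit could be obtained in the same way from a scan over negative values with the sign flipped.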
In the study of the SMEFT, the energy scale of a process is an important parameter. However, reconstructing the energy scales of processes at the LHC is difficult when there are two neutrinos in the final states. The energy scale of the subprocess $\gamma\gamma \to W^+W^-$ in the VBS process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ is such a case. In this paper, we study the contribution of aQGCs to the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ with a focus on the energy scale of the subprocess $\gamma\gamma \to W^+W^-$. The method we use is the ANN. We show that the ANN, a technique that has proven itself in several areas of HEP, is powerful when studying $\hat{s}$ of the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$. The results of the ANNs are much better than the approximation derived from kinematic analysis. With the help of ANNs, we investigate the information about $\hat{s}$ hidden in the final state. The importance of the different sectors can be ordered as $p_{\ell^\pm} > p_T^{\rm miss} > p_{\rm jet}$. Apart from that, the ANNs make little use of two pieces of information: which coupling is being studied, and the collision energy.
With the help of the ANN approach, we find another approximate formula for $\hat{s}$, which is a function of the three variables $\theta_{\ell\ell}$, $E_{\ell^+}$ and $E_{\ell^-}$ and contains only five fitting parameters, as presented in Eq. (5.3). Eq. (5.3) has an accuracy comparable to that of the ANN trained with the 4-momenta of the charged leptons, which has 9075 fitting parameters, and is more understandable than the ANN. In addition, Eq. (5.3) is much better than the approximation derived from kinematic analysis.
The unitarity bounds and the signal significances of aQGCs are also studied in this paper. The $\hat{s}$ reconstructed by the ANN approach serves as a powerful observable for discriminating the signal events from the backgrounds. With $\hat{s}$, the unitarity bounds can be applied. The unitarity bounds have significant suppressive effects, and are therefore necessary. With the unitarity bounds applied, the cross-sections and the signal significances of aQGCs are studied. The expected constraints at $L = 139$ and $300\ {\rm fb}^{-1}$ are obtained. The constraints from the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$ can contribute to the combined limits.

Figure 1: The Feynman diagrams of the contributions from anomalous $\gamma\gamma WW$ couplings to the process $pp \to jj\ell^+\ell^-\nu\bar{\nu}$.

The process $pp \to W^+W^-jj$ can be affected by the anomalous $\gamma\gamma WW$ couplings as shown in Fig. 1. The contribution from the tri-boson channel shown in Fig. 1(b) is negligible compared with Fig. 1(a). In the following discussions, we concentrate on Fig. 1(a) and neglect the effect of Fig. 1(b), although the data-sets are generated with both diagrams included.

Figure 3: The learning curves of the ANNs trained with $V_{0,2}$ data-sets.

Figure 9: The directions of the momenta of charged leptons used to study the relationship between $\hat{s}$ and the angles between charged leptons are shown in the left panel. The $\hat{s}$ as a function of $\theta_{\ell\ell}$ and $\phi_{\ell\ell}$ given by the ANN is shown in the right panel.

Figure 11: The normalized distributions of $\Delta\sqrt{s}$ for $\hat{s}_{\rm comb}$ compared with those for $\hat{s}_{V_i,V_i}$.

Figure 13: The normalized distributions of $\sqrt{\hat{s}_{\rm comb}}$ for the signal events and background events.

Figure 14: The cross-sections as functions of the coefficients of aQGCs after the cuts listed in Table 2 and after the unitarity bounds.

Table 4: The $\sqrt{\hat{s}_{\max}}$ corresponding to the largest coefficients in the ranges listed in Table 1, and the cross-sections before and after the energy cuts in Eq. (6.4).