Measuring the anomalous quartic gauge couplings in the $W^+W^-\to W^+W^-$ process at muon collider using artificial neural networks

The muon collider provides a unique opportunity to study the vector boson scattering processes and dimension-8 operators contributing to anomalous quartic gauge couplings~(aQGCs). Because of the cleaner final state, it is easier to decode subprocess and certain operator couplings at a muon collider. We attempt to identify the anomalous $WWWW$ coupling in the exclusive $WW\to WW$ scattering in this paper. Since one aQGC can be induced by multiple dimension-8 operators, the study of one coupling can help to confine different operators. Meanwhile, singling out the $WW\to WW$ process can help to study the unitarity bounds. The vector boson scattering process corresponding to the anomalous $WWWW$ coupling is $\mu^+\mu^-\to \nu\nu\bar{\nu}\bar{\nu}\ell^+\ell^-$, with four (anti-)neutrinos in the final state, which brings troubles in phenomenological studies. In this paper, the machine learning method is used to tackle this problem. We find that, using the artificial neural network can extract the $W^+W^-\to W^+W^-$ contribution, and is helpful to reconstruct the center of mass energy of the subprocess which is important in the study of the Standard Model effective field theory. The sensitivities and the expected constraints on the dimension-8 operators at the muon collider with $\sqrt{s}=30$ TeV are presented. We demonstrate that the artificial neural networks exhibit great potential in the phenomenological study of processes with multiple neutrinos in the final state.


Introduction
The self-couplings of electroweak (EW) gauge bosons are most closely related to the nature of the electroweak symmetry breaking (EWSB) [1][2][3][4][5].Any hints for the anomalous gauge couplings would indicate the existence of new physics (NP) beyond the Standard Model (SM).In the framework of the SM effective field theory (SMEFT) [6][7][8][9], the dimension-8 operators contribute to the anomalous quartic gauge couplings (aQGCs) [10,11].On the other hand, the vector boson scattering (VBS) is one of the most common channels for performing precision measurement of the SM or searching NP beyond the SM at high-energy colliders.The probe of aQGCs through VBS is thus one of the most important topics at the Large Hadron Collider (LHC) and has received great attention [12][13][14][15][16][17][18][19][20][21][22][23][24][25][26].Nevertheless, the VBS measurements suffer from the large QCD background at the LHC and it is difficult to decode the initial states of subprocess as the final jets in the forward region are not distinguishable.
Recently, the development of the muon collider has gradually entered the limelight [27][28][29][30][31][32][33][34].On the high-energy muon collider, the dominant production mode for the SM and NP particles is VBS or vector boson fusion process [33].Therefore, the muon collider is also known as a gauge boson collider [34].Compared to the LHC, there are no composite particles in the initial states at muon collider, and thus the QCD background is not severe.Taking W W initiated scattering as an example, the high-energy muon beams radiate W bosons and turn into neutrinos.Meanwhile, the neutral gauge bosons Z, γ are also radiated under an approximately unbroken SM gauge symmetry and muons are produced in final states.The outgoing muons are extremely forward with a small polar angle of the order θ µ ∼ M Z /E µ ≈ 1.2 • for a Z-initiated process at 10 TeV [35] and most likely escape the detector.If we require the outgoing muons to be observable in the detector coverage 10 • < θ µ < 170 • , the cross sections of neutral gauge bosons initiated scattering would be substantially suppressed by two orders of magnitude [36,37].Thus, it is feasible to study the aQGCs induced exclusive W + W − → W + W − scattering at a muon collider.In addition, the muon collider can reach both high energy and high luminosity, which will be of great help to precisely measure aQGCs, since the cross-section induced by dimension-8 operators increases significantly with energy.Meanwhile, high luminosity is considered as one of the keys to solve the "EFT triangle" problem [38][39][40][41].
In this work, we investigate the sensitivity of W + W − → W + W − scattering to the dimension-8 scalar/longitudinal operators contributing to aQGCs at muon colliders. 1The W bosons in final states are then followed by purely leptonic decay W ± → ℓ ± ν.One key problem of the process µ + µ − → νν ν νℓ + ℓ − is the presence of the (anti-)neutrinos which lead to difficulties in the phenomenological studies.For example, it is difficult to reconstruct the center of mass (c.m.) energy of the subprocess W + W − → W + W − (denoted as √ ŝ).In the content of EFT, the Wilson coefficients of effective operators should rely on energy scales [43].At a high-energy muon collider, the c.m. energy is especially important because at high energies, the validity of SMEFT should be taken into account since the SMEFT is valid only under a certain energy scale.The unitarity bound [44][45][46][47][48] is often needed to investigate the validity of SMEFT, and the c.m. energy is necessary information to apply unitarity bounds.To solve the problem of reconstructing ŝ in the processes with multiple (anti-)neutrinos, a machine learning approach has been introduced into high energy physics (HEP) community [41].The machine learning methods have been widely used, and are being rapidly developed in HEP [49][50][51][52][53][54][55][56][57][58][59][60][61][62][63].In this paper, we adopt the artificial neural network (ANN) to extract the W + W − → W + W − contribution and to reconstruct ŝ.The complexity caused by the neutrinos just provides a venue to explore the boundaries of ANN capabilities.Based on the ANNs, the sensitivities of the process µ + µ − → νν ν νℓ + ℓ − to the dimension-8 operators contributing to aQGCs are investigated, with the focus on the The rest of this paper is organized as follows.In Sec. 2, the dimension-8 operators contributing to aQGCs are briefly reviewed.The ANN approach to extract the W + W − → 1 A recent Snowmass paper investigated the searches of aQGCs through the production of W W boson pairs at a muon collider with √ s = 6 TeV and an integrated luminosity of 4 ab −1 [42].They studied the W W νν and W W µµ final states with the W bosons decaying hadronically.
W + W − contribution is discussed in Sec. 3. In Sec. 4, we discuss the ANN approach to reconstruct ŝ.The expected constraints on the coefficients of the aQGC operators at the muon collider are estimated in Sec. 5. Sec.6 summarizes our main conclusions.

A brief introduction of the anomalous quartic gauge couplings
The Lagrangian of the dimension-8 operators contributing to aQGCs can be written as [10, 11] with where Φ denotes the SM Higgs doublet, D µ is covariant derivative, W ≡ ⃗ σ • ⃗ W /2 with σ being the Pauli matrix and ⃗ W = {W 1 , W 2 , W 3 }, B µ and W i µ are U (1) Y and SU (2) I gauge fields, and B µν and W µν correspond to the gauge invariant field strength tensor.Many NP models can generate effective dimension-8 operators contributing to aQGCs [64][65][66][67][68][69][70][71][72][73].Although the dimension-6 operators have received most studies, the importance of dimension-8 operators was recently emphasized by many groups [1,[74][75][76][77][77][78][79][80][81][82][83][84][85][86][87][88].As VBS processes receive great attention at the LHC, the above operators in the SMEFT have been investigated intensively.The LHC constraints on the coefficients of the operators assuming one operator at a time are listed in Table 1.Note that a UV completion model usually does not contribute to only one operator.The assumption that only one operator exists at a time can be used to study the sensitivity of a process and place stringent constraints when NP beyond the SM has not yet been found.The scattering process 3 Identification of the A naive dimensional analysis of the cross-section ignoring inferred divergences and logarithms yields σ SM ∼ 1/s, σ int ∼ sf /Λ 4 and σ NP ∼ s 3 f /Λ 4 2 , where σ SM , σ int and σ NP denote the contributions from the SM, interference term and NP squared term, respectively.For √ s = 30 TeV, one has σ int ∼ σ NP when s 2 f /Λ 4 ∼ 1, that is f /Λ 4 ∼ 10 −6 TeV −4 .For f /Λ 4 above this value, the interference contribution is smaller than σ NP .Besides, as the helicity amplitude grow fast with energy, there is not necessarily a corresponding large helicity amplitude in the SM that interferes with NP.In summary, whether the interference can be neglected needs to be verified for the range of operator coefficients of interest.A numerical justification of this is postponed to Sec. 5.2.
The tree-level Feynman diagrams of the aQGCs contribution to the process µ + µ − → ℓ + ℓ − νν ν ν are shown in Fig. 1.They can be categorized into three different types, including tri-boson processes (Fig. 1 (a)), VBS (Fig. 1 (b)) and the Yukawa suppressed diagrams involving Higgs boson (Fig. 1 (c)).Although the VBS processes dominate the contribution of the aQGCs at high energies, there is still significant contribution from the tri-boson process for the above dimension-8 operators [89].Among the VBS processes, other than W + W − → W + W − , there are contributions from neutral gauge boson induced processes together with forward muons.To identify the subprocess W + W − → W + W − contribution, it is necessary to select the charged leptons in central region.In the following, the contribution from diagrams including a W W W W SMEFT interaction is denoted as σ 4W , and σ no−4W denotes other SMEFT contributions.
To study the features of the aQGCs contribution, a Monte Carlo (M.C.) simulation is applied with the help of the MadGraph5_aMC@NLO toolkit [90,91].The events are generated Figure 1.The tree-level Feynman diagrams of the aQGCs contribution to the process µ with one operator at a time.Though out the paper, the standard cuts are set as the default ones in MadGraph5_aMC@NLO, as where p T ℓ are the transverse momenta of charged leptons, η ℓ are the rapidities of charged leptons, and ∆R ℓℓ = ∆ϕ 2 + ∆η 2 with ∆ϕ and ∆η being the differences of azimuth angles and rapidities of two charged leptons.
Since the neutrinos are invisible, in principle, one cannot use the information of the neutrinos.However, before we use the machine learning method to identify the contribution of σ 4W , it is useful to illustrate the size of σ 4W first.This can provide a criterion for the later algorithms which only utilize detectable observables.For this purpose, here we temporarily use the information of neutrinos obtained in the M.C. simulations.
Note that there are always two (anti-)neutrinos from a Z boson decay in no-4W processes.The neutrino flavors from Z boson decay must be the same.For W + W − → W + W − , besides the two (anti-)neutrinos ν µ and νµ along the beam direction, the flavors of the other two neutrinos must correspond to the charged leptons from two W bosons' decay.We denote m ν ν as the invariant mass of a pair of (anti-)neutrinos with the same flavor whose invariant mass is closest to m Z among all possible combinations of neutrinos.are then separated into two groups, according to the neutrino flavors and the size of mass window ∆ m = |m ν ν − m Z |. σ 4W is calculated as the cross-section with ∆ m > 15 GeV and the (anti-)neutrino flavors from W + W − → W + W − .Meanwhile, σ no−4W corresponds to the cross-section with ∆ m ≤ 15 GeV or with wrong flavors.Their results at √ s = 3, 5, 10, 14 and 30 TeV [34] are listed in Table 2.
In Table 2, using the above selection strategy, we show the M.C. results of σ 4W and σ no−4W given by O S 0 , O M 0 and O T 0 for illustration.The Wilson coefficients are taken to be the maximal values allowed by LHC in Table 1.The charged leptons in final states are ee, µµ or eµ.We also evaluate σ 4W with ee charged leptons for comparison, using effective vector boson approximation (EVA) [92][93][94] (denoted as σ EVA 4W ).The detailed EVA calculation is given in Appendix A. Although there are kinematic cuts and unavoidable interference in M.C. simulations, the small discrepancy between σ EVA 4W and σ 4W indicates that our selection strategy of σ 4W is reliable.Table 2 shows that if the aQGC signal is induced by O S i operators, when √ s ≥ 10 TeV, one can concentrate on σ 4W and neglect σ no−4W .For the cases of O M i and O T i operators, although the cross-section of the W + W − → W + W − contribution grows with √ s, σ no−4W is not negligible even at √ s = 30 TeV.We will use an ANN to select the events from the process involving subprocess Based on the results of Table 2, we find that, to study the VBS subprocess √ s is needed.Therefore, we only consider √ s = 30 TeV below.In the following, we use d 4W to represent the event data-set from W + W − → W + W − contribution, and d no−4W to represent the event data-set from σ no−4W contribution.

The traditional approach
The mechanism behind the ANN is a "black box".Thus, before using the ANN, we must verify that the data set contains information that allows us to extract the W + W − → W + W − contribution.
For the process µ + µ − → ν µ νµ W + W − , the (anti-)neutrinos tend to be along the muon beam direction and are back-to-back.Meanwhile, when ŝ is large, W ± are energetic and also back-to-back.Consequently, the (anti-)neutrinos from W → ℓν tend to be along the directions of W ± and therefore also back-to-back.One can conclude that, the transverse missing momentum in the W + W − → W + W − contribution should be relatively small.At a lepton collider, all the components of the missing momentum can be obtained by using momentum conservation with satisfactory accuracy, and the zenith angle of the missing momentum (denoted as θ miss ) can be regarded as an observable.We find that θ miss provides a better discrimination than the transverse missing momentum.At √ s = 30 TeV the normalized distributions of | cos(θ miss )| for O M 0 are shown in left panel of Fig. 2. One can see that | cos(θ miss )| is indeed closer to 1 for events from d 4W .
The cross-section can also be contributed by other VBS processes in addition to W + W − → W + W − .Taking the process µ + µ − → νν ν νe ± µ ∓ as an example, there is also ZW ± → ZW ± process as shown in the second Feynman diagram in Fig. 1 (b).For the ZW ± → ZW ± process, the direction of the muon in the final state tends to be along the direction of z-axis.Denoting the zenith angle of the muon as θ µ , the normalized distributions of | cos(θ µ )| for O T 0 are shown in the right panel of Fig. 2. One can see that | cos(θ µ )| is closer to 1 for events from d no−4W .
Although we can distinguish events from d 4W and d no−4W by some observables in this way, the distinction is not very efficient.One has to analyze different processes accordingly.To achieve a desirable efficiency, a complicated analysis is required.Finding patterns from complicated relationship is what an ANN is good at.It can be seen that even without the ANN, there are still clues to distinguish events from d 4W and d no−4W .The ANN simply automates and improves the search for the clues.

Hidden layer
Output layer

The neural network approach
The ANN is a mathematical model to simulate a human brain [95], which is good at finding the complicated mathematical mapping relationship between input and output.It could be utilized for clustering the events.However, a classification is not tunable.For example, to archive a cleaner event set of d 4W , some events from the d 4W could be allowed to be misidentified as from d no−4W .Therefore, instead of clustering, we treat the identification of the events from d 4W as a regression problem.
For an ANN, the relationship between input and output is determined by interconnected nodes and their connection modes.We use a dense connected ANN.An ANN is composed with an input layer, hidden layers and an output layer.Denoting x i j as neurons in the i-th layer, where x 1 1≤j≤n 1 are input neurons, x 2≤i≤L−1 1≤j≤n i are in hidden layers and x L 1 is the output neuron, where L is the number of layers and n i are the number of neurons in the i-th layer, the ANN can be depicted in Fig. 3.A value is assigned for each neuron which is also denoted as x i j .Then x i+1 j ′ can be related with x i j as where ω i+1 jj ′ and b i+1 j ′ are trainable parameters.ω i+1 jj ′ are called the elements of the weight matrix W i+1 , b i+1 j ′ are components of the bias vector, and f i+1 j ′ are activation functions.Except for the output layer, the activation functions are chosen as the parametric rectified linear unit (PReLU) function [96] defined as  where α's are also trainable parameters.For the output layer, no activation function (i.e., linear activation function) is used.We use L = 15, n 10>i>1 = 50, n 1 = 14 the same as the dimension of input data and n L = 1 for the output layer.The architecture is built using Keras with a TensorFlow [97] backend.The data preparation is performed by MLAnalysis [98].
The training and validation data-sets are prepared by using M.C. simulation.The data-sets consist of elements with 15 variables.The first 14 variables make up a 14dimension vector fed to the input layer (denoted as v i ), and the last variable corresponds to the output layer.12 components of the 14-dimension vector are the components of 4momenta of charged leptons, and those of the missing momentum.The other 2 components correspond to the flavors of the charged leptons.The variable for output is set to 1 if the event is from σ 4W otherwise is set to 0. The ground truth of the output is determined with the help of non-observables, and the goal of the ANN is to reproduce the ground truth with only observables.In the following, the output predicted by the ANN is denoted as W score .Therefore, if the ANN is well-trained, we expect that W score is close to 1 for the events from σ 4W otherwise is close to 0. Two ANNs are trained, one for O M i operators and the other for O T i operators.For each operator of O M i and O T i , one million events are generated.One half of them form the training data-sets, and the other half form the validation data-sets.The data-sets are normalized using the z-score standardization, i.e., v ′ i instead of v i which is defined as , where vi and σ v i are the mean value and the standard deviation of all i-th variables of the elements in the training data-sets.
The learning curves are shown in Fig. 4. One can see that the mean squared errors (mses) stop to decrease for the validation data-sets at about epoches = 100 ∼ 200.To avoid overfitting, we stop at epoches = 100 where the mses of validation data-sets stop to decrease.After training, the W score results for the validation data-sets are shown in Fig. 5. Compared with Fig. 2, the W score has stronger discrimination power.In the following, for each operator, we choose one minimal cut of W score as long as the mistag rate of the events from d 4W reaches about 5%.The cuts and the effects of the cuts are shown in Table 3.
4 The reconstruction of center of mass energy of the The ŝ of the process W + W − → W + W − is important in the study of the SMEFT, because as an EFT the Wilson coefficients should be dependent in energies.On the other hand, the SMEFT is only valid below certain energy scale.The violation of unitarity is often used as a signal that SMEFT is no long valid, and unitarity bounds depend on the energy scale.
In any case, energy scale is important information in the study of the SMEFT.However, the process µ + µ − → νν ν νℓ + ℓ − have four (anti-)neutrinos in the final state, which causes problems to reconstruct ŝ.

The traditional approach
In traditional approach, one has to analyze the kinematics of the process.With four (anti-)neutrinos, the kinematic feature is difficult to analyze.For the process µ + µ − → ν νW + W − , the (anti-)neutrinos tend to be along the beam direction of µ + µ − and thus the transverse momenta of the (anti-)neutrinos are small.If one neglects the transverse momenta of (anti-)neutrinos along the beam, the transverse missing momentum is the sum of the transverse momenta of the (anti-)neutrinos in the processes W → ℓν.Then, further assuming the mass of the W boson is negligible compared with ŝ, the situation is approximately the same as that investigated in Refs.[41,99] which deal with the reconstruction of ŝ with two (anti-)neutrinos, and ŝ can be approximately given by where E ℓ ± are the energies of charged leptons, θ ℓℓ is the angle between the charged leptons, and c i are the parameters to be fitted.Eq. (4.1) is in fact not obtained by kinematic analysis but by using machine learning approach.With the help of some kinematic analysis and approximations, we translate our problem with four (anti-)neutrinos into the one with two (anti-)neutrinos solved by Eq. (4.1).Using kinematic analysis, there is another approximation which is ŝap in Refs.[41,99].ŝap is less accurate than ŝlep , therefore is not discussed in this paper.However, the procedure of deriving the ŝap shows that, although it is not possible to give the exact ŝ by the information in the final state, it is possible to give the most likely one.Using the ANN is merely an improvement and an automation of this complicated procedure.

The neural network approach
We trained three ANNs to reconstruct ŝ, which correspond to O S i , O M i and O T i operators.The architecture of the ANNs as well as the 14-dimensional vectors to feed the input are the same as those used in Sec.3.2.For the output layer, the ground truth of ŝ is estimated as ŝtr = (p ℓ + + p ℓ − + p ν + p ν ) 2 , where p ν (p ν ) is the 4-momentum of the neutrino (anti-neutrino) with the same flavor as ℓ + (ℓ − ) and with the direction closest to ℓ + (ℓ − ).Since the Les-House event files created by MadGraph5_aMC@NLO contains the information of intermediate W ± for events from NP squared terms, it is possible to verify the correctness of ŝtr .For the events from O S 0 and for all events containing intermediate W ± , we find the rate of mismatch is smaller than 0.0001%.
To construct the data-sets, one million events are generated for each operator.Only events in σ 4W are included, and the events are divided.4. The normalized distributions of relative differences for ŝlep are also shown in Fig. 7.For both ŝann and ŝlep , the predictions for most events can be smaller than 40%.One can find that the ANNs are able to predict ŝ more accurately than ŝlep .If one requires the relative difference to be smaller than 20%, using ŝann , 76.8%, 81.9% and 81.8% of the O S i , O M i and O T i events satisfy the requirement, compared with 72.6%, 78.6% and 66.7% for using ŝlep .
5 Signal significance

The partial wave unitarity bound
As an EFT, the SMEFT is only valid under the NP energy scale Λ.The large ŝ at the muon collider provides a great chance to detect the NP.Meanwhile, the verification of the validity of the SMEFT becomes inevitable.The partial wave unitarity has been widely used in previous studies as an indicator of the SMEFT validation [73,88,[100][101][102][103][104][105][106].For a VBS process W + λ 1 W − λ 2 → W − λ 3 W + λ 4 with λ 1,2,3,4 = ±1, 0 corresponding to the helicities of the vector bosons, in the c.m. frame with z-axis along the flight direction of W − in the initial state, the amplitudes can be expanded as [107] M where θ and ϕ are zenith and azimuth angles of the W − boson in the final state, λ = λ 1 −λ 2 , λ ′ = λ 3 − λ 4 and d J λλ ′ (θ) are the Wigner D-functions.The partial wave unitarity bound is |T J | ≤ 2 [47].With the helicity amplitudes calculated in Appendix B, and assuming one operator at a time, the tightest bounds are (5.2) In this paper, the unitarity bounds are applied using a matching procedure [108, 109] which has been used in previous studies of aQGCs at the LHC [41,76,77].We compare the cross-sections with and without aQGCs under a certain energy scale.It should be emphasized that, such unitrization procedure introduces no extra assumptions.This is important because it has been pointed out that different unitrization methods lead to different results [110], and therefore the unitrization methods introducing extra assumptions actually break the model-independence principle of the SMEFT [1].
In fact, in our approach, no bounds or constraints are applied, despite a misleading "bounds" in the name of this procedure.Studying the Wilson coefficient within an energy range is standard for an EFT because the Wilson coefficients typically depend on energy scales.Namely, even without unitarity bounds, it is a matter of interest to compare the SMEFT and the SM within a certain energy scale.We simply choose the energy scale according to the coefficients such that unitarity is guaranteed.
The suppression of cross-section when unitarity bounds are considered has been noticed in Refs.[41,76,77], and demonstrates the necessity of the unitarity bounds.The effect of unitarity bounds can be estimated in terms of EVA (see Appendix.A).To illustrate the necessity of unitarity bounds, we compared the cases with and without unitarity bounds at √ s = 30 TeV in Fig. 8.As a verification, the results from M.C. simulation are also shown with √ ŝtr in Sec.4.2 used for unitarity bound.It can be seen in Fig. 8 that, the cross-section is suppressed significantly and is no longer a bilinear function of f S 0 after the unitarity bound is applied, which is also seen in Refs.[41,76,77].
After event selection strategy, the suppressed efficiencies of unitarity bounds will change.Thus, we cannot use Eq.(A.6) to estimate the effect of unitarity bounds after the event selection strategy.Also, we cannot use √ ŝtr to apply the unitarity bounds in M.C. simulation because √ ŝtr is not observable.Instead, with ANNs trained to reconstruct ŝ at hand, we use the ŝANN for unitarity bounds.Fig. 8 is only for an illustration that the unitarity bounds are necessary..6).This is only for an illustration that the unitarity bounds are necessary.

Signal and backgrounds
The major SM background is µ + µ − → ℓ + ℓ − + / E. We consider the processes with up to four (anti-)neutrinos and the Feynman diagrams are shown in is µ + µ − → ℓ + ℓ − ν ν whose cross-section is about 155.2 fb.
The signal significance is estimated using the definition S stat = N s / N s + N bg where N s (N bg ) is the number of signal (backgrounds) events.We define the cross-section with unitarity bounds as σ 4W,U as shown in Eq. (A.5).Before performing the M.C. simulation, σ 4W,U is used to initially predict the constraints on coefficients, as well as to determine the range of parameter space.The expected luminosity of the µ + µ − collider at √ s = 30 TeV is about L = 10 ab −1 ∼ 90 ab −1 [34].By requiring S stat ≈ 2 ∼ 5 before event selection strategies and after the unitarity bounds are applied, we choose the coefficient spaces satisfying σ 4W,U ≈ 0.1% × σ SM .The largest coefficients are calculated according to Eqs. (A.6-A.13)and listed in Table 5.The coefficients for O S i are much larger than O M i and O T i , because the suppression of the unitarity bounds is more important for O S i operators.This is because the dominant helicity amplitude for each O S i operator is longitudinal scattering W 0 W 0 → W 0 W 0 as shown in Eq. (B.1).However, the luminosity of transverse polarized W bosons from the beam is logarithmically enhanced, as shown in Eq. (A.1).As a result, to produce cross-sections of the same order of magnitude, f S i should be much larger than f M i and f T i .Meanwhile, the unitarity bounds are set by the amplitudes of W W → W W , so that the suppression is at the same order of magnitude for all operators.In return, a larger f S i suffers from a more significant suppression, and an even larger f S i is required as a consequence.
To study the kinematic features of the signals and the background, in the following, a fast detector simulation is performed by Delphes [111] with the muon collider card.We cut off the events that do not contain two opposite-sign leptons or with at least two charged leptons but the hardest two have same-sign (denoted as N ℓ cut).The kinematic features are shown after the N ℓ cut.
Compared with the VBS contribution from the aQGCs, the ŝ of the VBS in the SM is smaller.As a result, a smaller E ℓ = E ℓ + + E ℓ − is expected for the SM.For the same reason, the invariant masses of charged leptons m ℓℓ = (p ℓ + + p ℓ − ) 2 of the signal events are larger.For the signal events, with an energetic W boson, the charged lepton should be approximately collinear to the W boson.The values of ŝ of the signal events are large, so that the produced W ± bosons tend to be approximately back-to-back for a signal event.As SM a result, the angle between charged leptons (denoted as θ ℓℓ ) should be close to π. Taking O S 0 , O M 0 and O T 0 as examples, the normalized distributions of E ℓ , m ℓℓ and cos(θ ℓℓ ) are shown in Fig. 10.It can be seen that E ℓ , m ℓℓ and cos(θ ℓℓ ) for the signal events are very different from those of the background events.
To constrain the aQGC operators, different event selection strategies can be designed for different operators.The event selection strategies and effects of the cuts are summarized in Table 5.We find that the event selection strategies can suppress the backgrounds significantly, while keeping most of the signal events.
The interference has been neglected until now.For the coefficients listed in Table 5, the contributions of interference terms compared with those of NP squared terms are shown in Table 6.One can see that the interference of O T i operators is about 10% of the squared term and should be taken into account.However, the interference does not only come from the interference of W W → W W but also from other diagrams such as tri-boson diagrams, which are meant to be cut off.Apart from that, the event selection strategy designed to cut off background should also suppress the interference contribution.Therefore, taking O T 2 as an example (whose interference is the greatest among all operators in study), we investigate the effect of event selection strategy on the interference.After the event selection strategy, σ int becomes 28.6 ab, which is less than 10% of σ NP after event selection strategy is applied.

Signal significance and the expected constraints
As introduced, we compare the cross-sections under a certain energy scale corresponding to the coefficients of dimension-8 operators.As a result, due to the energy cut caused by unitarity bound depending on the coefficients, the cross-section of the SM appears to be functions of coefficients.From another point of view, the cross-section of the SM does not actually depend on the coefficients of the operators, but on a certain energy scale that we have selected.
The cross-sections after applying unitarity bounds are shown in Fig. 11.The upper bounds of ŝ are calculated using Eq.(5.2) and denoted as ŝO X U for each operator O X .It can be found that, in the ranges of coefficients considered in this paper, the cross-sections of the SM are typically ∼ O(1) fb, and the cross-sections of NP are typically less than 0.1 fb.
For luminosities L = 10 ab −1 and L = 90 ab −1 [34], the expected constraints on the absolute values of coefficients (|f X |) are obtained with the help of signal significance and assuming one operator at a time.Since there are still errors between ŝann and ŝtr , moreover, the EFT probably stops being valid before the unitarity limit.Therefore, the robustness of the results are studied by varying the cut-off on ŝ2 ann by factors 1/2 and 2, analogous to what has been done with the QCD scales for the study of aQGCs at the LHC [77].The results are listed in Tables 7 and 8.The expected constraints with 2ŝ 2 U and ŝ2 U /2 used as unitarity bounds are presented as systematic errors in Tables 7 and 8.
As introduced, the O S i operators are significantly affected by the unitarity bounds, which can also be seen from Tables 7 and 8. Take the expected constraints at S stat = 2, for O S i operators in general and O M i operators at L = 10 ab −1 , our results only show the orders of magnitude.For O T i operators in general and O M i operators at L = 90 ab −1 , however, our results do not rely on the unitarity bounds and are thus more meaningful for experiments.This is a representation of the "EFT triangle" problem [38][39][40][41], which can be solved by high luminosity.As a result, one can see that the results for O M i and O T i operators at L = 90 ab −1 are more reliable.Generally, the O T i operators are seldom affected by the unitarity bounds.This is because the luminosity of the transverse W bosons from the beam is logarithmically enhanced, and all the dominant helicity amplitudes of O T i contain transverse W bosons in the initial state of W W → W W . Compared with Table 1 which does not consider unitarity bounds, the coefficients can be narrowed down significantly even with unitarity bounds considered.The expected constraints can be 3 to 4 orders of magnitude stronger than those at the 13 TeV LHC for O M i and O T i operators.

Summary
As a gauge boson collider, the muon collider is suitable to study the VBS processes for the aQGCs.Unlike the LHC, with cleaner final states, it is easier to extract aQGCs out of all relevant dimension-8 operators.For example, exclusive W + W − → W + W − can be separated out from V V → W + W − processes.In this paper, we study the process µ + µ − → ℓ + ℓ − νν ν ν containing a W + W − → W + W − subprocess induced by dimension-8 operators contributing to aQGCs.The presence of four (anti-)neutrinos in the final states poses difficulties to the phenomenological study.In this paper, this problem is tackled with a machine learning approach.The ANNs can be used to pick out the contribution including a W + W − → W + W − subprocess.Moreover, the ANNs can also be used to reconstruct ŝ of the  With the help of the ANNs, the sensitivities of the process µ + µ − → ℓ + ℓ − νν ν ν to the dimension-8 operators contributing to aQGCs are studied at muon collider with √ s = 30 TeV.The kinematic features are investigated, and the event selection strategies are proposed.The unitarity bounds are also considered, which turn out to be necessary in the study of the W + W − → W + W − subprocess at a muon collider.The expected constraints are studied with the help of signal significance, and are found to be as small as 4 orders of magnitude stronger than those at the 13 TeV LHC even with unitarity bounds applied.This shows the great advantage of the muon collider in studying the aQGCs.

A Effective vector boson approximation
As a comparison, the contribution of NP involving a W + W − → W + W − subprocess is calculated by using effective vector boson approximation.The W + W − → W + W − contribution to the process µ + µ − → νν ν νℓ + ℓ − can be given by EVA as [92][93][94] where √ ŝ = √ ξ 1 ξ 2 s and µ f is the factorization scale which can be set to be √ ŝ/4 [94].At the leading order of s, one has With Br(W → eν e ) ≈ Br(W → µν µ ) ≈ 10.8% [112], the predictions of σ 4W by Eq. (A.2) (denoted as σ EVA 4W ) can be calculated.The differential cross-section is also calculated to estimate the effect of unitarity bounds, which is where (A.4) The cross-section with unitarity bounds is then estimated as where 2) The helicity amplitudes producing duplicated T J are not shown.

2 Figure 3 .
Figure 3.The graphical representation of the ANN.

Figure 4 .
Figure 4.The learning curves of the ANNs trained for W score , corresponding to O Mi (left panel) and O Ti (right panel).

Figure 6 .
Figure 6.The learning curves of the ANNs trained for √ ŝ.The left (middle) [right] panel corresponds to O Si (O Mi ) [O Ti ].

Fig. 6 .
To avoid overfitting, we stop at the epoch when the accuracy of validation data-set starts to fall.Based on the learning curves in Fig.6, we choose to stop at 100, 200 and 500 epoches for O S i , O M i and O T i operators.Denoting ŝann as the prediction of ŝ by the ANNs, for the validation data-sets, the normalized distributions of relative differences defined as ∆ √ ŝ/ √ ŝtr are shown in Fig.7.For comparison with ŝlep , c i in Eq. (4.1) are fitted with the training data-sets and the results are listed in Table

Figure 11 .
Figure 11.The cross-sections after unitarity bounds are applied.

Table 8 .
The expected constraints on the absolute values of coefficients (10 −3 TeV −4 ) at √ s = 30 TeV and L = 90 ab −1 .The results by varying the unitarity bounds by factors 1/2 and 2 are presented as systematic errors.

ACKNOWLEDGMENT
This work was supported in part by the National Natural Science Foundation of China under Grants Nos.11905093 and 12147214, the Natural Science Foundation of the Liaoning Scientific Committee No. LJKZ0978 and the Outstanding Research Cultivation Program of Liaoning Normal University (No.21GDL004).T.L. is supported by the National Natural Science Foundation of China (Grants No. 12035008, 11975129) and "the Fundamental Research Funds for the Central Universities", Nankai University (Grant No. 63196013).

Table 1 .
The LHC constraints on the coefficients (in unit of TeV −4 ) of dimension-8 aQGC operators obtained at 95% CL.
can be contributed by O S 0,1,2 , O M 0,1,7 and O T 0,1,2 operators, therefore, in the following we concentrate on these operators.

Table 2 .
σ 4W : σ no−4W (fb) for different operators and different charged lepton flavors.The predictions of effective vector boson approximation (EVA) are denoted as σ EVA 4W for ee final states. )

Table 4 .
One half of them are in the training Results of c i in Eq. (4.1) fitted with the training data-sets.
data-sets, and the other half form the validation data-sets.For O S i , O M i and O T i operators, about 1.50, 1.03 and 1.21 million events are included in each data-set, respectively.The learning curves are shown in

Table 6 .
Cross-sections (ab) of interference terms and squared terms.

Table 7 .
The expected constraints on the absolute values of coefficients (10 −3 TeV −4 ) at √ s = 30 TeV and L = 10 ab −1 .The results by varying the unitarity bounds by factors 1/2 and 2 are presented as systematic errors.