Using k-means assistant event selection strategy to study anomalous quartic gauge couplings at muon colliders

The search for new physics beyond the Standard Model is one of the central problems of current high energy physics interest. As the luminosities of current and near-future colliders continue to increase, the search for new physics has increased the requirements for processing large amounts of data. Meanwhile, quantum computing which is rapidly evolving, has great potential to become a powerful tool to help search for new physics signals. Since the k-means algorithm is known to be able to be accelerated with the help of quantum computing, we investigate and propose an event selection strategy based on k-means algorithm to search for new physics signals. Taking the case of tri-photon processes at the muon colliders as an example, the event selection strategy is shown to be effective in helping to search for the signals of dimension-8 operators contributing to anomalous quartic gauge couplings. Compared with traditional event selection strategy, the expected constraints are generally tighter.


Introduction
While the Standard Model (SM) has been very successful so far, there are still unanswered questions and the search for new physics (NP) signals beyond the SM is at the forefront of high energy physics (HEP) [1].On the one hand, except for a few cases known or suspected to deviate from the SM [2][3][4][5][6], there is no clear signal giving guidance for possible NP models.Moreover, the experimental advances in the near decade ahead will be focused on the high luminosity frontier, which makes it important to search for NP efficiently.
The use of machine learning (ML) algorithms in HEP is recently developing rapidly [46][47][48][49][50][51][52][53][54][55][56][57].To further improve efficiency, event selection strategies based on anomaly detection (AD) ML algorithms are introduced to HEP community [58][59][60][61][62][63][64].One advantage of using anomaly detection algorithms is that, usually no a priori knowledge of the NP models under study is required to find anomalous signals, i.e. the strategy to use AD is independent of the operators or NP models to be searched for.Except for that, an AD algorithm may also serve as an aid to the traditional event selection strategies to improve the results.Moreover, unlike searching for NP in a process that may turn out to be fruitless, the anomalies found by searching for anomalous signals are always worth attention.
Another promising improvement in efficiency comes from quantum computing [65][66][67].It has been shown that, the calculation of distances between two vectors can be accelerated by using the controlled swap gate which is known as swap test [68,69].Meanwhile, the calculation of distances is at the core of the famous k-means algorithm [70], and the quantum accelerated k-means algorithms have been developed [71,72].The k-means algorithm can also been used for AD, however, whether the k-means AD algorithm (KMAD) is useful in searching for NP is still not known.In this paper, we focus on the feasibility of using KMAD in the search of NP.
As a test bed, we use KMAD to study the dimension-8 anomalous quartic gauge couplings (aQGCs) [73,74] at the muon colliders.The aQGCs also draw a lot of attention at the LHC because they are closely related to the NP w.r.t.electroweak symmetry breaking [12,17,18,45,75].Meanwhile, the muon colliders have received a lot of attention recently due to their abilities to reach both high energies and high luminosities, while keeping cleaner environment less affected by the QCD backgrounds [76][77][78][79][80][81][82][83][84][85].The tri-photon processes at the muon colliders are shown to be sensitive to the transverse operators contributing to the aQGCs [86].Due to the narrow 'EFT triangles' problem in the SMEFT [87][88][89], i.e., if no new particles are found at very high energies and an EFT still applies, implying that the Wilson coefficients of the EFT have to be small or else the validity of the EFT would be violated, high luminance is necessary.On the other hand, signatures of NP should be relatively easy to find, or already have been found, if they are not particularly exceptionally small.In this paper, we assume that the search for NP on muon colliders remains difficult, and therefore the search for NP remains a search for tiny signals in a large number of background events.In this paper, we present an event selection strategy to search for the signals of aQGCs and set constraints on the operator coefficients with the help of KMAD.
The rest of this paper is organized as follows.In Sec. 2, the dimension-8 operators contributing to aQGCs and the tri-photon process are briefly reviewed.The event selection strategy using KMAD is discussed in Sec. 3. The expected constraints on the operator coefficients are presented in Sec. 4. Sec. 5 summarizes our main conclusions.
2 The contribution of aQGCs to the tri-photon process Table 1: The constraints on the O T i coefficients (TeV −4 ) obtained at 95% C.L at the LHC.
The frequently used dimension-8 operators contributing to aQGCs can be classified as scalar/longitudinal operators O S i , mixed transverse and longitudinal operators O M i and transverse operators O T i [45].At tree level, the tri-photon process at muon colliders can be affected by O T i operators [73,74], where  The reason this set of operator basis is used is because this set has been used extensively at the LHC [30-44] as well as in previous studies [14 -16, 52, 56, 57, 62, 86, 90], and therefore we can compare our results with previous studies.The O T i operators (and linear combinations of O T i ) are also in a complete basis of dimension-8 operators.For example, B 4 are operators in the operator basis in Ref. [91], and they are also operators in the operator basis in Ref. [92].
At tree level, there are only two Feynman diagrams induced by O T i operators as shown on the left panel of Fig. 1.
The SM background is shown on the right panel of Fig. 1.For the tri-photon process, the contribution of O T 1,6 operators are exactly as same as O T 0,5 operators, respectively.In the following, we concentrate on O T 0,2,5,7,8,9 operators.

K-means assistant event selection strategy
Fig. 2: A demonstration of the search for anomalous signals when the distribution of points is not Gaussian.The black dot in the center is the anomalous point.The blue, red and green points are normal points which can be divided into three clusters.The anomalous point is far away from the centroids of all clusters.
Typically, the searching of NP at a collider with a high luminosity is to look for a small number of anomalies in the vast amount of data.Meanwhile, AD algorithms are designed to search for those events which are 'few and different'.Therefore, it can be expected that the AD algorithms are suitable to search for NP signals.
In the simplest case, one can assume that the elements in the dataset are Gaussian distributed.The degree of anomaly of each event can be quantified by the distance of the point representing the event from the centroid of all points.However, the picture of Gaussian distribution is often oversimplified for the case of searching for NP.As an example, a more complicated case is shown in Fig. 2. The anomalous point is depicted as the black dot, which is approximately at the centroid of all points, therefore, the degree of anomaly can no longer be quantified by the distance of the point from the centroid of all points.On the other hand, as shown in the Fig. 2, the normal points can be divided into three clusters.The anomalous point is far from the centroids of all clusters.The exploitation of this feature is the key idea of KMAD.By using KMAD, we use k-means to find the centroids of the clusters automatically.
A sketch of how k-means cluster the data points is plotted in Fig. 4. At first, k (in Fig. 4, k = 3) centroids are chose randomly, which is depicted as the red, green and blue dots in Fig. 4. (a).Then, the points are assigned in the same cluster as the closest centroid, which is shown in Fig. 4. (b).And then the centroids are recalculated as the centers of the points with the same cluster assignments, which is depicted as the red, green and blue dots in Fig. 4 (c Calculating the distance from each point to each centroid in this procedure is the most computationally resource intensive, and calculating the distance can be accelerated with the swap test in quantum computing [93].Besides, swap test can directly calculate the distance from one vector to the centers of many vectors, and the cluster assignments can also be accelerated with a quantum computer [71].
After clustering, we calculate the distances of the points to the centroids of all clusters.Since the anomalous point is far from all centroids, the distance from a point to the nearest centroid can be used to quantify the degree of anomaly of the point.
This mechanism is suitable for even more general cases.Assume that the background events distribute arbitrarily as the one shown in the left panel of Fig. 3.If the classes obtained using the clustering algorithm can put together the shape of the distribution of background events, one can still use KMAD to improve the signal significance.As shown in the right panel of Fig. 3, one can still choose the distance from a point to its nearest centroid as the anomaly score.Keeping only the points with large anomaly scores can significantly suppress the background events.It can be expected that, the more clusters can sample an arbitrarily shaped distribution more accurately.As will be shown, KMAD with more clusters works better in the case of searching for NP.
The essence behind the anomaly score mechanism is hypothesis testing.Still taking the simplest example of a Gaussian distribution, suppose that the SM predicts that the probability distribution of the occurrence of an event follows a Gaussian distribution, and suppose that these events can be described as points in a 2-dimensional space.The percentage of points that can be expected to be in a circle with a radius equal to twice of the standard deviation 2σ is 95%, and the percentage of points in a circle with a radius equal to 3σ is 99.7%.Therefore, in experiments, when the number of points outside the 3σ circle exceeds 0.3% of the total number of points, the distribution obtained by measurements must not be Gaussian, and the SM must need to be improved.Not only that, but since the points outside the 3σ circle account for 0.3% of the total under the SM, the points outside the 3σ circle are more likely to be the signal of some unknown NP model than the points inside the Fig. 3: An example that the background events distribute arbitrarily (the left panel).One can still use a clustering algorithm to divide the background events into several classes (the right panel).The points farther from all centroids are more likely the anomalous events (the blue dots in the right panel).circle.Thus the distance of an event from the center can be used as an anomaly score and become a quantitative criterion for portraying whether or not an event is more likely to be a NP event.KMAD, on the other hand, generalizes the above case of the Gaussian distribution to the case of more general distributions.

Data preparation
To prepare the dataset, we generate the events using Monte Carlo (MC) simulation with the help of MadGraph5@NLO toolkit [94][95][96], including a muon collider-like detector simulation with Delphes [97].To avoid infrared divergences, in this section we use the standard cut as default, the cuts rele-vant to infrared divergences are where p T,γ and η γ are the transverse momentum and pseudorapidity for each photon, respectively, ∆ R γγ = ∆ φ 2 + ∆ η 2 where ∆ φ and ∆ η are differences between the azimuth angles and pseudo-rapidities of two photons.The events for signals are generated with one operator at a time.In this section, the coefficients are chosen as the upper bound of Table 1.As will be introduced later, the coefficients as well as the operators to be searched for are actually irrelevant.
To build datasets for the KMAD, we require each event to have at least three photons.The datasets are composed of the 4-momenta of the three hardest photons, i.e., we select three photons with the highest energy and rank them in descending order of energy, taking the components of the 4-momentum of each photon in the order, so that an event can be correspond to a 12-dimensional vector.In this section, we select 600000 events for the SM, and 200000 events are generated for the NP.Since we only consider the difference between background features and signal features in this section, the interference is ignored.The interference will be included in the next section.
In terms of the principle of the k-means algorithm, as long as the NP distribution is different from the background in the space of observables, it is possible to use the KMAD in this space of observables.The use of the KMAD and the space of observables can be combined at will.To show that KMAD is an automatic algorithm which has the advantage that one hardly needs to analyze the characteristics of the NP signal theoretically, we use momentum space straightforwardly.
It is important to emphasize that we only use the SM dataset in this step in training, and the NP datasets are used to shown the distribution of anomaly scores.Moreover, to remove the priori knowledge about this process as much as possible, after the datasets are built, the 12 components are treated as just 12 real numbers.In fact the 12 components are not independent of each other.One can make a step of analysis that does not depend on the physical content to remove redundancy.For example, one can use principal component analysis, or autoencode before proceeding to the next step.In this paper, we focus on the effect of KMAD and this step is not in our consideration for simplicity.Again to avoid the use of prior knowledge, in this paper we simply use the Euclidean distance as the distance between two events.

Event selection strategy
The number of clusters in the k-means algorithm is often denoted as the k, and the cluster assignment is often denoted as the k-value.The following is a brief summary of the implementation of KMAD, 1. Specify a k-value for each point randomly.2. Calculate the cluster centroid as the center of the points with k-values as same as the cluster.3. Calculate the distance from each point to the k centroids (all centroids) separately, and by comparing the distances, specify the k-value of the point as the k-value of the nearest centroid.4. Repeat steps 2 and 3, until the k-value of each point is no longer changing.5. Calculate the anomaly score of one point as the distance (denoted as d) from the point to the centroid with the same k-value as this point.
Denote the number of points whose k-value changed at the previous step as n, practically, to speed up the procedure, we stop when n is less than 0.1% of the total event number, i.e., when vast majority in cluster assignments no longer change.There are cases where the classes corresponding to some k-values are empty.In this case, we divide the largest cluster randomly and maintain the total number of clusters as k.When the k-means process (step 1 to 4) is finished, we can calculate how anomalous a point is, which is called anomaly score (d in step 4).Since the k-values given in the first step are random, and we stop before n = 0, the results of the distances of the points to the centroids are not fixed.To avoid the effect of these randomness, we repeated the above process m times and make use of the average of the distances (denoted as d), i.e. the average anomaly score to distinguish the signal events from the background events.
If it is to find anomalous signals in a dataset, only the dataset needs to be provided without specifying the source of the dataset, and the physical content behind it.KMAD will automatically discover those anomalous events in the dataset that are different from the majority.The dataset can be the one obtained directly from experiment, without knowing the predictions from both the SM and NP.Thus KMAD is an unsupervised learning algorithm.At the same time, KMAD can also be used as a supervised learning algorithm.In steps 1-4, the SM dataset obtained from MC can be used, and the process of obtaining centroids can be regarded as the training process.In step 5, the dataset from the experiment is used to find signals in the experimental dataset that are different from the SM.Moreover, when KMAD is used to constrain the operator coefficients, KMAD is also a supervised learning algorithm because one need to provide the SM dataset.
The difference between the two approaches can be neglected if it is assumed that the number of NP signals is small enough to affect the positions of the centroids.Both approaches do not need to know what the NP models are being searched for, and therefore NP specific analysis is not needed.As will be shown in the next section, the KMAD can also been introduced as an aid the the event selection strategy.
We use the second approach in this paper.When the second approach is used, the operator coefficients are completely irrelevant if only the distribution characteristics of the anomalous scores are studied.As introduced, the number of repetitions of the KMAD process m is a tunable parameter.m can be used to control the accuracy of the event selection strategy.We studied the rate of convergence of d as m grows.Picking an event from the SM background and O T 0 signals at 3 TeV, 10 TeV, 14 TeV and 30 TeV, d as functions of m at k = 50 are shown in Fig. 5.We find that d converges rapidly as m grows, when m = 200, the value of d starts to become stable.Theoretically, the value of m can continue to increase to make the relative statistical error of d smaller, but due to limited computational power, we use m = 200 in this paper.The relative statistical error of d for each point does not exceed 4%.
Another tunable parameter of KMAD is k.If it is assumed that different processes are Gaussian distributed in momentum space, then from the point of view of computational efficiency, the best choice of k should be the number of processes, i.e., each classification corresponds to a process of the background, and the NP is relatively farther away from the center of each Gaussian distribution.It is true that there is a possibility that each process in the SM corresponds to a Gaussian distribution in the space of observables by choosing the appropriate observables, however, given the interference in the different processes, the more general case is not encountered as a superposition of Gaussian distributions.Regardless of the case, a larger k provides a better sample of the background distribution as shown in Fig. 2 while requiring more computational resources, so we choose a sufficiently large k in a balance between accuracy and computational efficiency.Fig. 6 shows the distribution of the SM and NP anomaly scores at different k.It can be seen that the distributions of the anomaly scores for the SM background and the NP signals are different.With limited computational power, we take the value of k as 50 in this paper.

Constraints on the coefficients
When no NP signals are found, the study of NP is to set constraints on the parameters of NP.The KMAD can also be used to set constraints on the coefficients of operators contributing to aQGCs.For this purpose, we generate events by using MC with coefficients in Table .2. The contribution of interference between the SM and aQGCs is also included.For each value of the coefficients, 300000 events are generated.It has been shown that p T,γ provides an efficient cut to suppress the SM background.In Ref. [86], p T,γ > 0.12E beam is used as a part of the event selection strategy where E beam is the energy of the beam.To leave some room for KMAD, when generating events, in the standard cuts p T,γ > 0.1E beam is used, other standard cuts relevant to infrared divergences are as same as those in Eq. ( 2).The beam induced background is also very important, for example, the radiated photons tangent to the muon trajectories.At this stage, it is difficult to consider the impact of this part of the contribution since Delphes does not yet include it.However, due to the inclusion of the p T cut in the standard cuts, the impact of this part will be significantly removed.The detailed quantitative analysis is deferred to future works.
Similar as the previous section, at least 3 photons are required to present after fast detector simulation.Then the KMAD event selection strategy is applied to select the events with d larger than 700 GeV, 2000 GeV, 3000 GeV and 7000 GeV for √ s = 3 TeV, 10 TeV, 14 TeV and 30 TeV, respectively.In this paper, the kinematic cuts are used, so this work can only be regarded as using the AD algorithm to improve the efficiency of the event selection strategy, and the event selection strategy in this paper is a combination of traditional and machine learning strategies.The standard cuts used to generate events are used to avoid infrared divergence, and the required number of end-state photons depends on the process to be studied, none of which requires the knowledge of the NP model being studied.Of course those cuts inevitably affect the efficiency of the event selection strategy, especially that we use a large p T,γ cut to suppress the SM background.
With aQGCs presented, the cross-section of tri-photon process can be written as where σ SM is the SM contribution, σ int f T i /Λ 4 denotes interference between the SM and aQGCs, σ NP f T i /Λ  aQGCs induced contribution.The analytical results of σ int and σ NP before cuts can be found in Ref. [86].After event selection strategy, the cross-section is fitted as a bilinear function of f T i /Λ 4 in Eq. ( 3).The cross-sections after event selection strategy and the fitted cross-sections are shown in Fig. 7.It can be found from Fig. 7 that after the event selection strategy, the cross-sections still fit bilinear functions well when the coefficients are within the range listed in Table 2.The interference between the SM and aQGCs plays an important role in the tri-photon process.From Fig. 7, it can be found that the KMAD works well when the interference presents.
The constraints on the coefficients are estimated by using statistical sensitivity defined as [98,99] where N s = (σ − σ SM )L and N bg = σ SM L, and L is the luminosity.In this paper, the luminosities in both the "conservative" and "optimistic" cases [80] are considered.The Table 3: The projected sensitivities on the coefficients of the O T i operators at the muon colliders with different c.m. energies and integrated luminosities for the "conservative" case.
constraints on the coefficients at S stat = 2, 3, 5 are listed in Tables 3 and 4.
In Refs.[86,90], the same process and same operators are studied.The expected constraints at 95% C.L. in the "conservative" case are compared in Fig. 8, where "KMAD", "PCAAD" and "traditional" correspond to the strategy in this paper, in Ref. [90] and in Ref. [86], respectively.It can be seen from Fig. 8 and Tables 3 and 4 that, the expected constraints in this paper are generally strengthened, some are even about one order of magnitude tighter than those in Ref. [86].Compared with Ref. [90] which also concentrate on a machine learning algorithm which has the potential to be accelerated by quantum computers, the KMAD is also better.It can be concluded that, the KMAD is useful and efficient in the search of NP signals.There may be other observables for this process that can be used to distinguish the NP signal from the background.However, unlike the KMAD, the effectiveness of different observables often depends on what the NP to be sought is.The comparisons with traditional event selection strategy in this paper already show this advantage, and for simplicity we neglect to discuss other traditional event selection strategies.
Note that different datasets are used in this section which also verifies that overfitting is not causing a significant impact.Overfitting means that the algorithm is finding patterns only exist in the training dataset, and don't generalize to new, unseen data.In training, the KMAD trains the centroids, which are obtained using the SM dataset.When constraining the operator coefficients, the events are generated with the SM, NP and interference contributions involved.In experiments, the dataset used to constrain the operator coefficients should come from measurements.Therefore, the results in this section (or in experiments) use datasets differ- ent from the one in training, which can be regarded as new and unseen data during the training phase.

Summary
Searching for NP signals requires processing a large amount of data, while it has been shown that the k-means algorithm is able to be accelerated by using quantum computers, which are capable of handling large amounts of data, so it is a ques-tion worth investigating whether the k-means algorithm can also be used to search for NP signals.In this paper, taking the case of aQGCs in the tri-photon processes at the muon colliders as an example, we propose an event selection strategy using KMAD which is based on the k-means algorithm to search for NP signals.
By using MC, the expected constraints on the coefficients of the operators contributing to aQGCs are calculated.The upper bounds in this paper are generally tighter than those obtained by using a traditional event selection strategy.Fig. 8: The expected constraints at 95% C.L. in the "conservative" case using different event selection strategies.The panels from left to right correspond to √ s = 3, 10, 14 and 30 TeV, respectively.The "KMAD", "PCAAD" and "traditional" correspond to the strategy in this paper, in Ref. [90] and in Ref. [86], respectively.It can be concluded that, the KMAD is useful and efficient in the search of NP signals.
The ability of quantum computing to handle large amounts of data will provide a whole new opportunity in the study of NP.Since the k-means algorithm can be accelerated by quantum computing and, at the same time, the k-means algorithm can be used to search for NP signals, we expect that the quantum computers can be used to help to search for NP in the near future.

Fig. 1 :
Fig. 1: Feynman diagrams for the tri-photon processes µ + µ − → γγγ at the muon colliders.The diagrams induced by O T i operators are shown in the left panel, and one of the diagrams in the SM is shown in the right panel.In the SM, there are other five diagrams which can be obtained by permuting the photons in the final state.
).The steps in Figs. 4. (b) and (c) are repeated until the assignment of the points do no longer change, as shown in Figs. 4. (d), (e) and (f).

Fig. 4 :
Fig. 4: A sketch of how k-means cluster the data points with k = 3.

Fig. 5 :
Fig. 5: d as functions of m at k = 50.We find that d converges rapidly as m grows.
with σ being the Pauli matrices and ⃗ W = {W 1 ,W 2 ,W 3 }, B µ and W i µ are U(1) Y and SU(2) I gauge fields, and B µν and W µν correspond to the field strength tensors.The constraints on the coefficients obtained by the LHC are listed in Table1.

Table 4 :
Same as Table3but for the "optimistic" case.