Probing stop pair production at the LHC with graph neural networks

Top-squarks (stops) play a crucial role in the naturalness of supersymmetry (SUSY). However, searching for stops at the LHC is a difficult task. To dig the stops out of the huge LHC data sets, various expert-constructed kinematic variables and cutting-edge analysis techniques have been invented. In this paper, we propose to represent collision events as event graphs and to use a message passing neural network (MPNN) to analyze the events. As a proof of concept, we apply our method to the search for stop pair production at the LHC and find that our MPNN can efficiently discriminate signal from background events. In comparison with other machine learning methods (e.g. DNN), the MPNN can enhance the stop mass reach by several tens of GeV to over a hundred GeV.

Introduction. After the discovery of the Higgs boson, the pursuit of new physics beyond the Standard Model (SM) is a primary goal of the LHC experiments. A major guideline in this endeavor is the naturalness principle, which implies that the new physics stabilizing the Higgs mass should appear at the TeV scale. Among all the proposed scenarios, weak-scale SUSY remains one of the most popular models, in which the quadratically divergent contribution to the Higgs mass from the top quark is canceled by the top-squarks (stops). Thus, the search for the stops is crucial for testing the naturalness of SUSY. However, searching for the stops at the LHC is a challenging task due to the complicated nature of superparticles (sparticles), and the appropriate strategy depends on the stop decay kinematics. (i) For $m_{\tilde t_1} \gg m_t + m_{\tilde\chi_1^0}$, the stop can decay to $t\tilde\chi_1^0$ and produce an energetic top quark. Using endpoint observables, like $M_T$ or $M_{T2}$, the $t\bar t$ background can be efficiently reduced [1][2][3][4][5]. (ii) In the compressed region $m_{\tilde t_1} - m_{\tilde\chi_1^0} \approx m_t$, the kinematics of stop pair events closely resemble those of the $t\bar t$ background events, rendering the searches rather difficult. Thanks to the ISR jet, the stop events in such a compressed region will have a peak-like feature in the distribution of the ratio of the missing transverse momentum vector to the transverse momentum vector of the $t\bar t$ system [6][7][8], while the $t\bar t$ background does not show such a peak. If the LSP in the above compressed region becomes almost massless, precision measurements of the $t\bar t$ cross section [9] or spin correlation [10] can also be used to probe the light stop. (iii) When the two-body decays $\tilde t_1 \to t\tilde\chi_1^0$ and $\tilde t_1 \to b\tilde\chi_1^+$ are kinematically forbidden, the three-body decay $\tilde t_1 \to W^+ b\tilde\chi_1^0$ [11], the two-body flavor-changing decay $\tilde t_1 \to c\tilde\chi_1^0$ [12,13] or even the four-body decay $\tilde t_1 \to b f \bar f' \tilde\chi_1^0$ [14] would happen. But due to the small mass splitting, the decay products of the stop are usually too soft to be observed. Thus the ISR/FSI jet (plus heavy quark tagging) is needed to trigger these stop events [15][16][17]. In addition, other miscellaneous studies of stop searches in different regions of parameter space have also been performed at the LHC [18][19][20][21][22][23][24][25][26][27][28][29][30][31][32][33][34][35][36].
Note that besides traditional cut-flow based analyses, some machine learning (ML) algorithms are also beginning to be applied to particle physics, as they can efficiently find the patterns hidden in complex and large data sets. Among them, the boosted decision tree (BDT) is the most commonly used ML model, and it has been adopted in the searches for stops in the compressed region [37]. However, the discriminating power of a BDT depends on the human-constructed kinematic variables that are used as input to train it, and these can hardly capture all the features of the events. The deep neural network (DNN) approach can learn the discriminative features of events directly from the four-momenta of individual reconstructed objects in the event [38,39]. But a DNN still cannot capture the complete event features, because the number of input features is limited in this method.
Very recently, a general framework for supervised learning on graphs called message passing neural networks (MPNN) [40] has been developed for accurate prediction of molecular properties. It provides an efficient end-to-end solution to learning on graphs with a varied number of nodes and edges, in which the nodes are atoms and the edges are chemical bonds. Inspired by this, we propose to describe high energy physics (HEP) events as graphs, which we dub event graphs. In each event graph, the nodes capture the intrinsic properties of individual objects and the edges are weighted by the distances between the objects. Then a variant of MPNN is designed to perform classification on the event graphs. As a proof of concept, we apply this idea to the search for the stops through the process $pp \to \tilde t_1 \tilde t_1^* \to t\bar t\,\tilde\chi_1^0\tilde\chi_1^0$ at the LHC, in which the ML graph classification techniques are used to further enhance the discovery sensitivity. As shown by our results, the sensitivity can be significantly enhanced compared with the traditional techniques.
Methods. In a HEP event, the final-state particles produced in collisions are identified by the detectors as photons, leptons, jets and missing transverse energy (MET). We can build an undirected complete graph $G = \{V, E\}$ to describe an event. Firstly, the graph nodes $V$ represent the reconstructed objects. The node features $x_i$ ($i \in V$) encode the intrinsic (coordinate-independent) properties of the particles: their energy, transverse momentum, invariant mass and a one-hot-like encoding of their identity. Every two nodes are connected via an edge, which is weighted by the pair distance between the two particles, $d_{ij} = \sqrt{(y_i - y_j)^2 + (\phi_i - \phi_j)^2}$, where $y$ and $\phi$ are the corresponding rapidity and azimuthal angle, respectively. All the information of an event is encoded into the components of a graph, which finally forms an event graph. As an illustration, we show in FIG. 1 an event graph with detailed node features and distance matrix, built from a Monte Carlo simulated event of the process $pp \to \tilde t_1 \tilde t_1^* \to t\bar t\,\tilde\chi_1^0\tilde\chi_1^0 \to 2b + 2j + \ell + \not\!E_T$. In contrast with other event representations, such as expert-constructed kinematic variables or the four-momenta of a fixed number of leading objects, the event graph by design can encode the complete information of the final states and is boost/rotation invariant. From the perspective of the event graph, discriminating signal from background events translates into classifying the event graphs.
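The construction of an event graph described above can be sketched in a few lines of plain Python. This is only a minimal illustration under stated assumptions: the object record layout, the identity labels in `KINDS`, the function names, and the wrapping of the azimuthal difference into [-π, π] are not taken from the text.

```python
import math

# Hypothetical object record: (kind, energy, pT, mass, rapidity, phi).
# The identity labels below are an assumed choice for the one-hot encoding.
KINDS = ["photon", "lepton", "jet", "bjet", "met"]

def node_features(obj):
    """Intrinsic features: energy, pT, mass and a one-hot identity code."""
    kind, energy, pt, mass, _y, _phi = obj
    one_hot = [1.0 if kind == k else 0.0 for k in KINDS]
    return [energy, pt, mass] + one_hot

def pair_distance(a, b):
    """d_ij = sqrt(dy^2 + dphi^2); dphi is wrapped into [-pi, pi] (assumption)."""
    dy = a[4] - b[4]
    dphi = (a[5] - b[5] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(dy, dphi)

def build_event_graph(objects):
    """Return the node feature list and the full pair-distance matrix."""
    nodes = [node_features(o) for o in objects]
    dist = [[pair_distance(a, b) for b in objects] for a in objects]
    return nodes, dist
```

Because the graph is complete and undirected, the distance matrix is symmetric with a zero diagonal, and its size follows the number of reconstructed objects in each event.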
In this paper, we design a variant of MPNN to implement the graph classification, whose architecture is presented in FIG. 2. First, we embed the intrinsic object properties $x_i$ into a higher-dimensional state vector, $s_i^{(0)} = f_e(x_i)$, where $f_e$ is called the node embedding function. The state vector $s_i^{(0)}$ only knows the properties of the $i$-th reconstructed object rather than the whole event. Then, message passing is used to perform the event graph embedding, which encodes the whole event graph into each node state vector. At the $t$-th iteration, each node $i$ collects the messages sent from the other nodes $j$, $m_i^{(t)} = \sum_{j \neq i} f_m^{(t)}(s_j^{(t-1)}, d_{ij})$, and then updates its state vector, $s_i^{(t)} = f_u^{(t)}(s_i^{(t-1)}, m_i^{(t)})$, where $f_m^{(t)}$ are the message functions and $f_u^{(t)}$ are the update functions. As with information dissemination in social networks, repeating this procedure spreads the information on object properties, together with the pair distances between objects, through the sent messages, and each node updates its knowledge of the other nodes and of the relationships between all nodes. Therefore, after $T$ iterations, each resulting node state contains the whole information of the event. This procedure acts as an automatic event feature extraction and embeds the whole graph into the high-dimensional node state vectors. Next, each node votes a number expressing how signal-like the event is, based on its own knowledge of the event, $y_i = f_v(s_i^{(T)})$, where $f_v$ is the vote function. Finally, to make the prediction stable, we average the votes from all nodes to obtain the final discrimination score of the event, $y = \frac{1}{|V|}\sum_{i \in V} y_i$, where $|V|$ is the number of nodes. The above operations form an end-to-end ML model, which maps event graphs directly to discrimination scores without human event feature engineering. The event selection can then be carried out by applying a specific cut $\theta_y$ on the score $y$; only events with $y > \theta_y$ are selected.
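The embed, message, update, vote and average steps can be written as a short forward pass. The sketch below is an illustration, not the authors' implementation: the function name `mpnn_score` and its argument layout are hypothetical, the embedding, message, update and vote functions are passed in as plain callables, and an event is assumed to contain at least two reconstructed objects.

```python
def mpnn_score(nodes, dist, f_e, f_m, f_u, f_v, n_iter):
    """Forward pass: embed, exchange messages n_iter times, vote, average.

    f_m and f_u are lists holding one callable per iteration, since
    independent message and update functions are used at each step.
    Assumes at least two nodes, so every node receives a message.
    """
    states = [f_e(x) for x in nodes]                 # s_i^(0)
    n = len(states)
    for t in range(n_iter):
        messages = []
        for i in range(n):
            # Messages sent to node i by every other node j.
            incoming = [f_m[t](states[j], dist[i][j])
                        for j in range(n) if j != i]
            # Component-wise sum of the incoming message vectors.
            messages.append([sum(col) for col in zip(*incoming)])
        # All nodes are updated together from the previous states.
        states = [f_u[t](s, m) for s, m in zip(states, messages)]
    votes = [f_v(s) for s in states]                 # y_i
    return sum(votes) / n                            # y
```

With toy callables such as `f_e = lambda x: list(x)`, `f_m = [lambda s, d: [v * d for v in s]]`, `f_u = [lambda s, m: [a + b for a, b in zip(s, m)]]` and `f_v = lambda s: s[0]`, the function can be checked by hand on a two-node graph before trainable layers are plugged in.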
In our following calculations, we use 30-dimensional state and message vectors, and choose single-layer perceptrons as the node embedding, message passing, update and vote functions, $f_e(x) = \mathrm{relu}(W_e x + b_e)$, $f_m^{(t)}(s, d) = \mathrm{relu}(W_m^{(t)}(s \oplus d) + b_m^{(t)})$, $f_u^{(t)}(s, m) = \mathrm{relu}(W_u^{(t)}(s \oplus m) + b_u^{(t)})$ and $f_v(s) = \sigma(W_v s + b_v)$, where $\oplus$ denotes vector concatenation, relu is the rectified linear unit, $\sigma$ is the sigmoid function, and the $W$s and $b$s are trainable model parameters. Independent message and update functions are used for each iteration $t$. Note that, to ease the learning of the message functions, the pair distance $d$ is expanded on a Gaussian basis $N(\mu_i, \delta^2)$ (linearly distributed in [0,5] with width of 0.25) as a 21-dimensional vector. Based on our practice, the above choices are a good trade-off between model complexity and prediction accuracy.
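To make the Gaussian expansion of the pair distance concrete, the sketch below expands a scalar distance on 21 centres spaced by 0.25 over [0, 5] and feeds it, concatenated with a state vector, into a single relu layer. Several details are assumptions rather than statements from the text: the Gaussians are left unnormalised, the quoted 0.25 is used both as the centre spacing and as the Gaussian width, and all function names are hypothetical.

```python
import math

def gaussian_expand(d, lo=0.0, hi=5.0, step=0.25, width=0.25):
    """Expand a scalar distance on Gaussians centred at lo, lo+step, ..., hi."""
    n_centres = int(round((hi - lo) / step)) + 1     # 21 for the defaults
    centres = [lo + k * step for k in range(n_centres)]
    # Unnormalised Gaussian response of each basis function (assumption).
    return [math.exp(-0.5 * ((d - mu) / width) ** 2) for mu in centres]

def relu_layer(weights, bias, x):
    """Single-layer perceptron relu(W x + b), with W given as a list of rows."""
    return [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]

def message(weights, bias, state, d):
    """relu(W (s concatenated with expand(d)) + b); list '+' does the concatenation."""
    return relu_layer(weights, bias, state + gaussian_expand(d))
```

For a 30-dimensional state the message layer therefore takes a 51-dimensional input, and a distance that coincides with a centre gives a response of exactly 1 in that component.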
To train this MPNN model, we utilize supervised learning techniques with binary cross-entropy as the loss function. The Adam [41] optimizer with a learning rate of 0.001 is used to optimize the model parameters, based on the gradient calculated on mini-batches of 100 training examples. A separate set of validation examples is used to measure the generalization accuracy during training, and over-fitting is prevented with the early-stopping technique. All of this is implemented with the open-source deep learning framework PyTorch [42] with strong GPU acceleration.
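The two ingredients of this training setup that do not depend on a specific framework, the binary cross-entropy loss and the early-stopping rule, can be sketched in plain Python. This is a hedged illustration only: the `patience` parameter and both function names are hypothetical, and the text does not state which stopping criterion was actually used.

```python
import math

def binary_cross_entropy(scores, labels, eps=1e-12):
    """Mean binary cross-entropy between scores in (0, 1) and 0/1 labels."""
    total = 0.0
    for y, target in zip(scores, labels):
        y = min(max(y, eps), 1.0 - eps)              # clamp to avoid log(0)
        total -= target * math.log(y) + (1 - target) * math.log(1.0 - y)
    return total / len(scores)

def early_stopping_epoch(val_losses, patience=5):
    """Index of the best epoch seen before `patience` epochs without improvement."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break                                    # validation loss has stalled
    return best_epoch
```

In practice the model parameters saved at the returned epoch would be the ones kept for evaluation.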
Results and discussions. As a proof of concept, we apply the MPNN to investigate the observability of the stop through the process $pp \to \tilde t_1 \tilde t_1^* \to t\bar t\,\tilde\chi_1^0\tilde\chi_1^0 \to 2b + 2j + \ell + \not\!E_T$ at the 13 TeV LHC with an integrated luminosity of $\mathcal{L} = 36.1\ \mathrm{fb}^{-1}$. We assume the LSP $\tilde\chi_1^0$ is a pure bino and focus on the kinematic region $m_{\tilde t_1} \geq m_t + m_{\tilde\chi_1^0}$. The dominant backgrounds in this analysis arise from $t\bar t$, $W$+jets and $tW$. The $t\bar t Z(\to \nu\bar\nu)$ background is non-negligible for a heavy stop and is included in our calculations as well. The multi-jet background, which can be estimated from data using a fake-factor method, is found to be negligible in all regions [37].
We use the event generator MadGraph5_aMC@NLO [43] to simulate the signal and background events at the parton level. Then we carry out the parton shower and hadronization with Pythia 8.2 [44]. Delphes-3.4.1 [45] is used for fast detector simulation. The anti-$k_t$ algorithm [46] with the distance parameter $R = 0.4$ is chosen to cluster jets, and the b-tagging efficiency is assumed to be 80%. Finally, the event pre-selection is performed by CheckMATE-2.0.14 [47] using the following cuts. We require exactly one lepton with $p_T(\ell) > 10$ GeV and $|\eta(\ell)| < 2.5$, and at least four jets with $p_T(j) > 25$ GeV and $|\eta| < 2.5$. We also require exactly two b-jets in the event. The missing transverse energy should satisfy $\not\!E_T > 150$ GeV. The NLO QCD corrected cross section of stop pair production is calculated with Prospino [48]. The $t\bar t$ and $W$+jets events are further normalized to their NNLO cross sections, respectively [49,50].
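The pre-selection cuts listed above amount to a simple boolean filter. The sketch below is an illustration and not the CheckMATE implementation: the dictionary keys, the function name, and the choice to count b-jets only among the jets passing the kinematic cuts are assumptions.

```python
def passes_preselection(leptons, jets, met):
    """Apply the pre-selection cuts to one event.

    leptons, jets: lists of dicts with 'pt' (GeV) and 'eta'; jets also carry
    a boolean 'btag'. met: missing transverse energy in GeV.
    """
    good_leptons = [l for l in leptons
                    if l["pt"] > 10.0 and abs(l["eta"]) < 2.5]
    good_jets = [j for j in jets
                 if j["pt"] > 25.0 and abs(j["eta"]) < 2.5]
    # b-jets are counted among the selected jets (assumption).
    n_bjets = sum(1 for j in good_jets if j["btag"])
    return (len(good_leptons) == 1      # exactly one lepton
            and len(good_jets) >= 4     # at least four jets
            and n_bjets == 2            # exactly two b-jets
            and met > 150.0)            # missing transverse energy cut
```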
In TABLE I, we show the performance of the MPNN for two benchmark points with distinctive kinematic features. The benchmark point A lies in the compressed region with $m_{\tilde t_1} \approx m_t + m_{\tilde\chi_1^0}$, while the benchmark point B lies in the uncompressed region with $m_{\tilde t_1} \gg m_t + m_{\tilde\chi_1^0}$. For benchmark point A/B, we generated 14/4 million signal events and 100 million background events. After the pre-selection, 300,000 signal events and 300,000 background events are collected as training examples. We also collect a separate set of 100,000 signal events and 100,000 background events as validation examples to evaluate the performance of the ML models. An event graph is built for each event as the input of the MPNN. In the evaluation of the significance $S/\sqrt{B + (\beta B)^2}$, we assume the systematic uncertainty of the backgrounds to be $\beta = 10\%$. To guarantee sufficient statistics, we require at least 10 events for signal and backgrounds after the event selection. From the table, we can see that our MPNN method can greatly improve the significance for both benchmark points in comparison with the ATLAS results [37]. The significance of benchmark points A and B increases from 2σ to 3.3σ and 3.7σ, respectively.
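The significance used here, S/√(B + (βB)²) with β the fractional systematic uncertainty of the background, is straightforward to evaluate. The helper below is a minimal sketch with a hypothetical function name; the event yields in the usage note are made-up numbers for illustration and are not results from TABLE I.

```python
import math

def significance(s, b, beta=0.10):
    """Z = S / sqrt(B + (beta * B)^2) for signal yield S and background yield B."""
    return s / math.sqrt(b + (beta * b) ** 2)
```

For example, with illustrative yields S = 10 and B = 100 and β = 10%, the statistical and systematic terms under the square root are equal (100 each), so the significance is 10/√200 ≈ 0.71, compared with 1.0 when the systematic term is switched off.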
To look more closely inside the MPNN, we perform a principal component analysis on the node state vectors $s_i^{(T)}$. These vectors are the high-dimensional representation of events that the MPNN automatically learns from the training graphs. In FIG. 3, the first two principal components (PC1 and PC2) of all node state vectors $s_i^{(T)}$ for all events are given as an illustration. It shows that for both benchmark points the MPNN successfully learns a discriminative representation of events that distinguishes well between signal and background events and helps the subsequent classification.
In FIG. 4, we further present the discriminating power of the MPNN on signal and background for benchmark points A and B. In each diagram, the training curves are consistent with the validation curves, which demonstrates that our MPNNs are not over-fitted. From the top panel, we can see that the signal and background events are well separated in the distribution of the discrimination score from the MPNN: the signal events tend to have larger scores, while the background events have smaller scores. From the middle panel, we find that the selection efficiency of signal $\varepsilon_S$ decreases much more slowly than that of background $\varepsilon_B$ as the selection threshold $\theta_y$ increases. For example, when $\theta_y = 0.1\,(0.8)$, the corresponding efficiencies of signal and background for benchmark point A are 0.97 (0.58) and 0.69 (0.02), respectively. Furthermore, we give the receiver operating characteristic (ROC) curves in the bottom panel.
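The selection efficiencies and ROC points discussed above follow directly from the discrimination scores. The sketch below assumes the scores are available as plain lists of floats; the function names are hypothetical, and the scores in the usage note are toy numbers rather than values from FIG. 4.

```python
def efficiency(scores, threshold):
    """Fraction of events whose score exceeds the selection threshold."""
    return sum(1 for y in scores if y > threshold) / len(scores)

def roc_points(signal_scores, background_scores, thresholds):
    """One (eps_B, eps_S) pair per threshold, tracing out the ROC curve."""
    return [(efficiency(background_scores, t), efficiency(signal_scores, t))
            for t in thresholds]
```

With toy scores `[0.9, 0.8, 0.3, 0.95]` for signal and `[0.1, 0.2, 0.85, 0.05]` for background, a threshold of 0.5 keeps three quarters of the signal and one quarter of the background.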
In FIG. 5, we present the 95% C.L. exclusion limits from the MPNN on the plane of $m_{\tilde t_1}$ versus $m_{\tilde\chi_1^0}$ and compare them with the results of ATLAS. We assume the systematic uncertainty of the backgrounds to be 5%, 10% and 15% in the evaluation of the significance. We can see that our MPNN method can produce a stronger limit on the stop mass than the cut-flow based method used in the ATLAS analyses. For example, if a 10% systematic error is assumed, stop masses up to 1020 GeV can be excluded at $m_{\tilde\chi_1^0} = 100$ GeV, which is about 85 GeV above the ATLAS limit. The exclusion limit on the stop mass can also be pushed up by about 75 GeV in the compressed region with $m_{\tilde t_1} \approx m_t + m_{\tilde\chi_1^0}$.
Conclusions. A general framework for supervised learning on graphs, namely MPNN, provides a new way to discriminate signal from background events in HEP analyses. Each event can be described as a graph with a varied number of nodes and edges. The MPNN can utilize all the event information to efficiently extract the discriminative event features. We applied such an approach to the search for the stops at the LHC and found that the current ATLAS sensitivity can be greatly enhanced. Our method is rather general and can also be applied to other physical processes.

FIG. 3. The first two principal components of the node state vectors $s_i^{(T)}$ of signal (red) and background (blue) events. The left (right) panel is for benchmark point A (B) in TABLE I.
FIG. 2. The architecture of our message passing neural network (MPNN) designed for event graph classification. The network is a stack of functional layers shown as shadowed blocks, which has T pairs of message passing and state update layers for automatic event feature extraction. State vectors s, message vectors m, votes y_i and discrimination score y are shown as black boxes. Colored arrows denote applying the node embedding function f_e, message passing function f_m, state update function f_u and vote function f_v, respectively. Some operators are given in gray circles, where ⊕ performs vector concatenation, and Σ and Σ/N are sum and average, respectively.

TABLE I. Comparison of MPNN with the available ATLAS results [37] for two benchmark points at the 13 TeV LHC with a luminosity of L = 36.1 fb^-1.