Does SUSY have friends? A new approach for LHC event analysis

We present a novel technique for the analysis of proton-proton collision events from the ATLAS and CMS experiments at the Large Hadron Collider. For a given final state and choice of kinematic variables, we build a graph network in which the individual events appear as weighted nodes, with edges between events defined by their distance in kinematic space. We then show that it is possible to calculate local metrics of the network that serve as event-by-event variables for separating signal and background processes, and we evaluate these for a number of different networks that are derived from different distance metrics. Using supersymmetric electroweakino and stop production as examples, we construct prototype analyses that take account of the fact that the number of simulated Monte Carlo events used in an LHC analysis may differ from the number of events expected in the LHC dataset, allowing one to derive an accurate background estimate for a particle search at the LHC. We show that the network variables offer significantly greater discrimination between signal and background processes than the original kinematic variables used in the definition of the network.


Introduction
The search for physics beyond the Standard Model (BSM) at the Large Hadron Collider (LHC) has thus far not produced any significant evidence for new phenomena, despite many analyses targeting the new particles that arise in a variety of Standard Model extensions. What this means precisely for the landscape of BSM physics models is unclear, since null results are difficult to interpret even within one BSM theory due to the large parameter space and the complexity of the particle spectra predictions.
In fact, there are specific cases where the LHC results are known to provide no constraint in general. For example, a recent global fit of the electroweakino sector of the Minimal Supersymmetric Standard Model (MSSM) [1] demonstrated that there is no significant general exclusion on any range of electroweakino masses. The fact that the weaklyproduced supersymmetric signal is very low rate compared to the dominant Standard Model -1 -background processes means that searches need to be very heavily optimised for specific scenarios in order to provide any sensitivity for discovery. This reduces sensitivity to a large range of models that do not resemble the simplified models used for optimisation [2]. Similar arguments should apply to other supersymmetric sectors, and previous global fits have indeed found ample parameter space for light coloured sparticles once their decays are allowed to be more complex than those encountered in simplified models [3][4][5][6][7].
There is clearly a strong motivation to revisit particle search techniques, and find new ways of extracting small signals from the LHC data. All LHC analyses start from a knowledge of the reconstructed objects in each event, and their reconstructed energies and momenta. These are the attributes of the event, and one typically searches for functions of the four-momenta of the final state particles that, within a given final state, provide effective discrimination between the signal being searched for, and the dominant SM background processes. One can then place a series a selections on these kinematic variables (or perform a machine-learning based classification) in order to find regions of the data that should contain a significant excess of signal events.
In this paper, we propose a novel approach for analysing LHC events that examines the connections between events to define new event-by-event attributes which can be analysed in the usual way. Given a set of kinematic variables, the LHC collision events form a topological structure in kinematic space, and one should expect there to be significant differences between the topological structures predicted for SM events, and those predicted for signal events. Motivated by studies of galaxy topology [8], we build a series of network graphs from simulated LHC events, each of which uses a particular distance metric to define "friendship" between events based on their proximity in a chosen space of kinematic variables. For example, considering only the missing energy values reconstructed for each event, the SM events will cluster in a group at lower missing energy, each having many "friends" close by, while SUSY events can be expected to be few and far from the main part of the distribution, giving a small number of "friends" when viewed as part of a network. In defining friendship in a larger space of kinematic variables, we perform an appropriate scaling of the variables and use them to calculate a distance metric. This is then used to define connections between events that are closer than some distance l, which is a free parameter. A variety of local network metrics can then be calculated for each network, which serve to define new event-by-event attributes that can be used to discriminate rare signals from their dominant SM backgrounds. The analysis is complicated by the need to use local metrics that are invariant under the reweighting of the network nodes (to cope with arbitrary integrated luminosities of simulated MC events with non-trivial individual weights), and we provide a detailed solution using two supersymmetric examples based on stop quark or electroweakino production.
We note that graph networks have recently been used for a variety of applications in particle physics (usually in the context of deep learning studies), including jet tagging [9][10][11][12], modelling kinematics within an event for classification [13,14], reconstructing tracks in silicon detectors [15], pile-up subtraction at the LHC [16], investigating multiparticle correlators [17], and particle reconstruction in calorimeters [18]. A recent work that investigates the relationship between events is the exploration of the earth-moving distance metric presented in Refs. [19,20], although this did not involve building a network based on the metric. Our work is distinguished by the fact that it builds a graph network across proton-proton collision events, and we investigate a number of different distance metrics, and local network metrics, for separating events.
This paper is structured as follows. In Section 2, we provide a brief review of the relevant network analysis concepts, and present our method for defining connections between LHC events. We develop our electroweakino case study in Section 3, and our stop quark example in 4. A discussion of various aspects of the future applicability of our technique is contained in Section 5. Finally, we present our conclusions in Section 6.

Overview
A network is a mathematical data structure that is comprised of nodes connected by either directed or undirected edges. In the following, we will consider finite, undirected graph networks denoted by G = (N , E), where N is the set of nodes, and E ⊆ {{i, j} : i = j ∈ N } is the edge list. Graphs can be described in a number of ways, and for large graphs a particularly convenient form is the adjacency matrix which has the list of nodes as the rows and columns. Connected nodes have a 1 in the relevant adjacency matrix entry, whilst disconnected nodes have a 0. We may write this as A = (a ij ) i,j∈N , where a ij ∈ {0, 1} and a ij = 1 iff {i, j} ∈ E, and we note that the adjacency matrix will be symmetric in our case. The neighbours of a node ν ∈ N are those that are linked to the node by an edge, defined via: These are also referred to as the members of ν's punctured neighbourhood. We can define an extended adjacency matrix A + = (a + ij ) i,j∈N = A + I with: a + ij = a ij + δ ij (2.2) where I is the identity matrix, and δ ij is the Kronecker delta symbol. This then allows us to define the unpunctured neighbourhood of ν as the set of nodes which includes both the neighbours of ν, and ν itself: In our analysis, we will define the adjacency matrix for LHC events by defining a distance between events in the space of kinematic variables that are measured for each event. Assuming some distance d ij between the nodes in this space of variables, one can then define the adjacency matrix via: 0, otherwise, where l is a free parameter known as the linking length. This prescription clearly leaves many choices open for how to proceed, including the choice of kinematic variables, the choice of distance metric, and the choice of linking length for a given analysis. Both the choice of distance metric and the choice of kinematic variables will change the topological structure mapped by the network, and sensible choices might lead to greater differences between the behaviour of Standard Model processes and new physics processes within the network. We show below that it is advantageous to construct networks with a variety of different distance metrics from the LHC data, and to combine variables derived from more than one network. We will also suggest some guiding principles for the selection of optimal linking lengths.
For two vectors u and v in the space of n kinematic variables for an analysis, the distance metrics that we consider in this work are: • The Chebyshev distance: d cheb = max |u i −v i |, i.e. the maximum of the difference between similar kinematic variables for the two chosen points.
• The Bray-Curtis distance: • The cityblock distance: Also known as the Manhattan distance, the cityblock distance is given by whereū is the mean of the elements of the vector u.
The Canberra distance metric was found to be ineffective, and we do not refer to it in the analyses below. We also note that many other possibilities exist in the literature [19][20][21], but we find that the list above offers sufficient performance whilst remaining relatively quick to evaluate.
It is possible to have weights associated with the edges in the adjacency matrix that depart from one, leading to what is conventionally referred to as a weighted network. In our example, we will instead have weights on the nodes of the network. To see why these weights are necessary, consider the formation of a network that is obtained by taking all LHC events that pass some pre-selection (e.g. selection of a given final state with some basic kinematic requirements). Each event then becomes a node in the network, with connections to other events defined via the distance metric. In any real LHC analysis, one would want to compare the behaviour of Standard Model Monte Carlo simulations with the observed data. In the absence of a method for weighting the events, one would -4 -have to generate exactly the same number of events as one would expect to obtain in a given integrated luminosity of LHC data, for every relevant Standard Model process. This is clearly neither feasible nor desirable, and it does not permit the use of Monte Carlo generators with non-trivial weight assignments (such as those that arise from jet matching procedures). Instead, in a node-weighted network, the network can be populated with events whose weights are defined in the normal manner. This is straightforward from the perspective of network analysis, since network nodes are allowed to have any number of attributes assigned to them, but it complicates the calculation of network metrics as we shall see below.

Network metrics
Once a network has been defined, one can calculate a series of network metrics that characterise the network topology. These include global metrics (defined for the network as a whole), and local metrics (which we assume to be defined for each node of the network). For a given selection of events, there is only one network that can be formed from all of the selected events. It is possible in principle to infer the presence of new physics by demonstrating that the global network metrics for the network of selected events depart from a well-modelled Standard Model expectation. However, we will instead focus on local metrics that will allow us to define attributes on an event-by-event basis. These can be substituted for the kinematic variables that are used in a regular LHC event analysis, in which searches for new physics are performed by placing selections on variables to define regions of the data where the background is expected to be small. The simplest example of a local network metric is the degree centrality of a node, which is equal, for an unweighted network, to the number of other nodes that are connected to it. In a social media network, for example, the degree centrality of a given person would be equal to the number of their friends. In a weighted network, the definitions of both local and global metrics must be updated to take account of the fact that each node now represents a different number of effective nodes. Node-weighted network measures can be defined based on the concept of node-splitting invariance as detailed in Ref. [22], and in the following we perform calculations of node-splitting-invariant (n.s.i) local network metrics using a custom version of the pyunicorn package [23]. The full list of network measures that we utilise is as follows.
• The n.s.i degree: For a given node ν, this is the weighted version of the degree centrality, given by: where W = i∈N w i is the sum of the weights of all nodes in the network. In our analysis, W is equivalent to the total number of events expected at the LHC for our assumed integrated luminosity.
• The n.s.i average and maximum neighbours degree: The average neighbours degree of a node ν represents the average size of the network region that an event -5 -linked to ν is linked to. The n.s.i measure of this quantity is given by: (2.6) One can also define an n.s.i maximum neighbors degree, as • The n.s.i betweenness centrality: The shortest path betweenness centrality of a node ν gives the proportion of shortest paths between pairs of randomly chosen nodes that pass through ν. If we label the random nodes by a and b, we have that: where n ab is the total number of shortest paths from a to b, n ab (ν) is the number of those paths that pass through ν, and we have defined the average of a function of node pairs by h(i, j) ij = 1 N 2 i∈N j∈N h(i, j). One can write a formal expression for this quantity by first noting that n ab can be written as a sum over the tuples (t 0 , ..., t d ab ), with t 0 = a and t d ab = b (d ab is the number of links between the nodes a and b on the shortest path), where each tuple in the sum gives a contribution of 1 if every node t l in the tuple is linked to its successor t l+1 , or 0 if at least one node is not linked to its successor. Both of these conditions are met if one simply takes the product of elements of the adjacency matrix for each pair of nodes in the tuple, allowing us to write (2.9) n ab (ν) is given by a similar formula, except that, for some m in 1...d ab − 1, t m must equal ν: The n.s.i version of this quantity can be written as: where we have defined the weighted sum of a function of pairs of nodes h(i, j) wsum ij = i∈N j∈N w i h(i, j)w j , and n * ab (ν) and n * ab are defined below. We note that one can also define a weighted average of a function of pairs of nodes to be h(i, j) w ij = 1 ( i∈N w i) 2 i∈N j∈N w i h(i, j)w j . The n.s.i betweenness centrality values obtained for our examples below do not come close to saturating the maximum value of W 2 /w ν .
-6 -Formulae for n * ab (ν) and n * ab can be derived as follows. If a node s is hypothetically split into two nodes s + s , any shortest path through s becomes a pair of shortest paths (one of which passes through s , and the other of which passes through s ). In addition, a shortest path from s to some b = s will never meet s . Thus, the betweenness centrality can be made invariant under node splitting by making each path's contribution proportional to the product of the weights of the inner nodes, but with the condition that we skip the weight w ν in this product when calculating n ab (ν). Formally, we can write a modified n * ab as: and a modified n * ab (ν) as: A geometric interpretation of the n.s.i betweenness can be obtained by considering the set of nodes of our node-weighted network {N } as a sample from a population of points {N 0 }. Each node ν in the network then represents some small cell R ν of points in the geometric vicinity of ν in {N 0 }. The n.s.i betweenness can be interpreted as an estimate of the probability density that a randomly-chosen shortest path between two randomly-chosen points in the population network passes through a specific randomly-chosen point in R ν . This is ultimately a measure of the importance of the node ν in the network.
• The n.s.i closeness centrality: Another measure of node importance is the closeness centrality, which is defined for a node ν by CC ν = 1/ d νi i , where d νi is the number of links on a shortest path from ν to i, or, if there is no path, ∞, and we have defined the same average over nodes that was used previously. A larger value of CC ν for a node indicates a smaller average number of links to all other nodes in the network. The n.s.i version of this metric is given by: where d * νi is an n.s.i distance function given by: • The n.s.i exponential closeness centrality: A limitation of the closeness centrality is that it receives very low values for nodes which are very close to most of the other nodes, but very far away from at least one of them. To prevent outlying -7 -nodes from skewing the closeness calculation for "typical" nodes, one can use the exponential closeness centrality, defined as CC EC,ν = 2 −d νi i . The n.s.i measure is given by: • The n.s.i harmonic closeness centrality: The harmonic closeness centrality reverses the sum and reciprocal operations in the definition of closeness centrality, such that 1/d νi contributes zero to the sum if there is no path from ν to ı. The n.s.i harmonic close centrality is given by: • The n.s.i local clustering coefficient: The local clustering co-efficient of a node ν is the probability that two nodes drawn at random from those linked to ν are linked with each other. It is given by: The n.s.i version is given by • The n.s.i local Soffer clustering coefficient: An alternative form of the clustering coefficient proposed by Soffer and Vázquez [24] includes a correction that reduces the impact of degree correlations: The n.s.i version of this is given by: .

(2.21)
For the case studies presented in this paper, networks of events are built for different distance metrics after specified pre-selection criteria, which are detailed further in Sections 3 and 4. For a given choice of distance metric, events are linked in the corresponding network if their distance in the original space of kinematic variables is less than a chosen linking length l. The pre-selection criteria applied before the networks are trained involve the typical combinations of particle multiplicity selections and loose kinematic selections that are encountered in SUSY searches at the LHC. Local network metrics are then used along -8 -with further requirements on kinematic variables to define search regions analogous to those used in existing LHC searches. Note that, unlike traditional analysis variables which only depend on the kinematic properties of that event, the presence of a signal in the LHC data would change the local network metrics for the background events in addition to adding signal events to the various distributions. By comparing event yields obtained from a background-only network (which mimics the data that would be expected in the absence of a signal) to those from a signal-plus-background network (which mimics the data that would be expected in the presence of a signal), estimates of the exclusion sensitivity are calculated for the network driven analysis. These are compared to sensitivity estimates for search regions defined only using conventional kinematic variables. The exclusion sensitivities are estimated using the RooStats framework [25], which provides a calculation for the binomial significance (Z bi ) associated with a number counting experiment that tests between signal-plus-background and background-only hypotheses, in the case that the background estimate has an uncertainty derived from an auxillary measurement [26][27][28]. Our comparative results using this significance measure are encouraging and indicate that the network-based methods can offer significant improvements in discovery potential. Since we do not have access to a full background analysis, complete simulation of the detector systems, or the systematic uncertainties available to the experimental collaborations, we do not show a full statistical analysis of the expected exclusion and discovery reach, which is left for future studies.
3 Case study 1: The search for electroweakinos

Model definition and simulation
As our first example of the application of network analysis techniques to an LHC search, we will consider the hunt for supersymmetric electroweakinos at the LHC. The electroweak neutralinos (χ 0 i=1,2,3,4 ) and charginos (χ ± i=1,2 ) are the mass eigenstates corresponding to linear combinations of the superpartners of the unbroken electroweak gauge bosons and the supersymmetric Higgs particles. The mixing of the bino, wino, and Higgsino states into the electroweakino states can greatly impact the resulting phenomenology meaning that LHC searches are forced to make assumptions about the relative masses and mixings of these states. Searches are typically optimised on simplified model scenarios such as that shown in Figure 1. In this case, it is assumed that electroweakinos are produced via wino-likeχ ± 1 −χ 0 2 production, with subsequent decay to SM gauge bosons and the lightest neutralinos, which are assumed to be pure bino states. Different search channels emerge from the possible decay modes of the gauge bosons, usually enriched in leptons due to the relative paucity of multi-lepton SM backgrounds. One can then optimise kinematic selections on functions of the final state particle four-vectors to isolate the signal for a range of assumed masses. Our analysis will differ from the standard approach only via the choice of variables that are used to select signal regions in the data.
We will develop an analysis in the 3 lepton final state, in which the Z boson decays to a pair of electrons or muons, whilst the W boson decays to an electron or a muon plus the corresponding neutrino. We simulateχ ± 1 −χ 0 2 production with Pythia 8 for a model with -9 -mχ± 1 = mχ0 2 = 400 GeV, and mχ0 1 = 0 GeV, and also generate the overwhelmingly dominant Standard Model background arising from diboson (W Z) production. In normalising the electroweakino signal to the relevant integrated luminosity, we use the next-to-leadingorder plus next-to-leading-log cross-section provided in Refs. [29,30]. For the background, we use the next-to-next-to-leading order cross-section presented in Ref. [31]. An LHC detector simulation is performed using Delphes [32][33][34], using the default ATLAS detector card. Two independent simulated W Z samples each containing 10 million events are used; one for the background-only network and one for the signal-plus-background network. This model was not excluded in the three-lepton search channel in the most recent 36.1 fb −1 analysis conducted by the ATLAS collaboration [35], and we will demonstrate the increased sensitivity that arises from a network analysis of 150 fb −1 of integrated luminosity equivalent to the full Run-2 ATLAS dataset.χ

Electroweakino analysis design
The first step in our analysis is to apply a pre-selection consistent with a 3 lepton electroweakino search. We require there to be exactly three light leptons (electrons or muons), with transverse momentum p T > 25 GeV. We further require there to be no b-tagged jets and at most 1 non-b-tagged jet in the event with p T > 25 GeV and |η| < 2.4, and for the dilepton invariant mass of an opposite-sign same-flavour pair in the event to satisfy |m ll − m Z | < 10 GeV. When investigating the potential improvements in sensitivity from including network metrics in the definition of search regions, we consider a series of variables that are typically used to discriminate electroweakino events from diboson events, choosing to focus on those whose distributions over all events show the greatest difference between our benchmark signal point and the diboson background. These are: • The missing transverse energy E miss T , which represents the momentum imbalance transverse to the beam direction and can be used to infer the presence of weakly interacting neutral particles escaping the detector.
• p T (Z): the reconstructed transverse momentum of the Z boson.
• ∆Φ(l + Z , l − Z ): the azimuthal angle between the two leptons associated with the Z boson, which are taken to be the same-flavour opposite sign pair whose di-lepton invariant mass is closest to the Z-boson mass in the case of any ambiguity (with the remaining lepton assigned to the W -boson).
• ∆Φ(Z, l W ): the azimuthal angle between the reconstructed Z boson and the lepton coming from the W boson.
When constructing the network for a given event sample these variables are scaled to equalise their weight in the analysis and account for their differing units. This "median" scaling is performed by subtracting the median of the variable's distribution in the background sample and normalising by the median deviation. A median scaling is chosen rather than a mean scaling to avoid being overly sensitive to tail effects. The distance metrics defined in Section 2 are then calculated for a dataset containing 4000 of our simulated signal events and 4000 of our simulated SM events. Distributions of these distances, normalised to unit area, are shown for signal and background events in Figure 2, where we note that distances between signal and background events enter these figures twice as they contribute to both the "signal" and "background" distributions. These metrics exhibit reasonably large differences between the signal and background processes, indicating the presence of different typical scales at which events are expected to be clustered in the two cases. It is interesting to note that, in some cases, the signal and background roles are reversed; the Bray-Curtis, correlation and cosine metrics all have the signal being more concentrated at small distances than the background. This raises the option of flipping the friendship condition to occur if the distance in kinematic space is greater than the linking length, although we continue to adopt the standard definition in the following analysis for consistency between distance metrics. For each distance metric d ij between pairs of events i and j, we define the adjacency matrix for the graph network by using the definition in Equation 2.4. The linking length l is chosen as the point in each of the histograms where the signal first rises above the background distribution or vice-versa, since this is indicative of a characteristic scale of separation of signal events which differs from the typical separation of background events. Our final choices for each distance metric are summarised in Table 1. After applying the event pre-selection, and with our definitions of "friendship" in place, we build networks for each distance metric and calculate, for each event, the local metrics defined in Section 2. Our analysis will proceed in two stages, designed to reflect how it could be performed in practise at the LHC: • In this section, we use a signal-plus-background network to design signal regions that would be sensitive to our chosen EW benchmark point, using a knowledge of which events in our simulated network are background events, and which are signal events. Within the ATLAS and CMS collaborations, this would correspond to using MC samples to design and optimise signal regions, using only one set of generated background events. Hypothetical significance results are obtained by placing selections on the original variables, and local network metrics, and counting the number of signal and background events in the resulting search region. For this procedure to be valid, it is essential that the background contribution to the signal-plus-background network metric distributions does not change significantly from the background distribution that would result from only having the background present in the network.
-13 - We have checked this explicitly for all variables that are used to define our optimum analyses below 1 .
• In Section 4.3, we develop a realistic example of an LHC exclusion test by simulating an independent set of background events that corresponds to the LHC data that would be obtained in the absence of a signal. These MC events are to be interpreted as "mock LHC data", and we build background-only networks from those events to represent the networks that would be obtained in such an LHC dataset. The yields in our search regions for the mock LHC data can then be compared to the yields expected from our signal-plus-background network analysis to determine the exclusion significance of our benchmark model (which now incorporates the effect of statistical fluctuations in the real LHC data). In practise, this test could be repeated on a variety of signal models, to generate exclusion limits in, for example, simplified model parameter planes.
The metric calculations are computationally expensive to evaluate. For the electroweakino example, we used all events that pass the pre-selection (which amounts to just over 10,000 signal and 10,000 background events before weighting). We are confident that one could evaluate network metrics for O(100, 000) network events with a suitable parallellisation of the network metric calculations, making the use of network variables after a looser pre-selection a realistic possibility within the ATLAS and CMS collaborations. With our chosen distance metrics, we have 7 × 6 =42 new variables in total, and we are also free to retain the original variables for extra kinematic selections.
After building the networks, we explored a variety of potential event selections that used either the network metrics alone, the original kinematic variables alone, or a combination of the original kinematic variables and network metrics. Including tighter kinematic requirements in the pre-selection before building the graph network weakened the sensitivity of the network-driven searches, due to the increased kinematic similarity of the signal and background events that pass tighter kinematic selections. The five kinematic variables are shown in Figure 3. The lower panel of each figure shows the binomial significance Z bi for either an upper cut or lower cut on the variable at the value given on the horizontal axis, assuming a total systematic uncertainty of 15%. This significance calculation also includes the statistical uncertainty of the background which is added to the systematic uncertainty in quadrature. Differences in shape between the signal and SM distributions are clearly apparent, but no selection on a single kinematic variable is able to reach an expected 3σ exclusion sensitivity (corresponding to Z bi = 1.64) for our chosen benchmark point.
Events / Bin 1 −   The supersymmetric events consistently show lower values of k * ν as well as the clustering and closeness variables. This suggests that the supersymmetric events form fewer connections with other events than the Standard Model events, and that the events they do connect with are also sparsely connected. This arises from a combination of factors including the different typical lepton four-momenta expected in the SUSY case (given that the final state has a higher multiplicity), different signal distributions for variables which have well-defined kinematic endpoints for the background, and the fact that the smaller number of SUSY events limits the number of possible connections relative to background events. The rare and complex nature of the SUSY events for our benchmark model can be expected to be common among many BSM model processes, suggesting that these network metrics could also be useful for other cases. This includes the more complex supersymmetry models favoured by recent global fits that significantly depart from simplified model assumptions.
Taken as a set, the local network metric distributions offer a large set of variables that offer greater discrimination between signal and background than can be obtained with the original kinematic variables. To test the robustness of this conclusion, we have checked that the distributions are indeed invariant under changing the number of MC events used to build the networks, and also that the network variable distributions remain the same if the poorly-sampled kinematic tails of the original variables are removed. These checks are presented in Appendix A.
To find optimum kinematic selections for potential analyses, we first consider each variable in turn (including both the original kinematic variables and the local network metrics). We determine which variables will give the highest significance based on a single upper or lower cut, considering many possible cut values for each variable. This is then iteratively repeated with the additional variables until the number of signal and background events passing the combined selections drops below 3 in each case. Some examples of the Z bi values and event yields obtained for promising search regions which provide Z bi > 1.64 are shown in Table 2. This calculation includes both the statistical uncertainty on the event yields, and an assumed systematic uncertainty of 15%.

Results of realistic electroweakino exclusion test
Having defined signal regions using hypothetical signal-plus-background networks, we now compare the yields derived from the signal-plus-background networks with the backgroundonly networks derived from our mock LHC dataset. This determines the exclusion sensitivity for our benchmark point that one would obtain with actual LHC data. The binomial significance is once again calculated with an error on the background yield that includes the statistical uncertainty plus an additional 15% systematic uncertainty. Table 3 compares the expected sensitivity of these regions with that obtained in some representative search regions constructed using only conventional kinematic variables. These selections are loosely inspired by the regions in the ATLAS 36.1fb −1 search [35], but generally have tighter requirements on E miss T and M l,min T . We also impose slightly different pre-selection requirements (we require n jets < 2 whereas the ATLAS search region was inclusive in light jet multiplicity, but had "binned" regions considering the 0 and >0 light jet categories separately). The Z bi values differ from those of the previous section as expected, primarily due to statistical fluctuations in the assumed LHC dataset. As is demonstrated in Figure 6, which superimposes example background distributions taken from the background component of the signal-plus-background networks and the background-only networks, the general shape of the distributions are the same. This validates our previous procedure of designing search regions using signal-plus-background networks alone, and ultimately tells us that, for the chosen variables, the addition of rare signal events to the network does not subtantially alter the connection properties of background events. It remains the case that network methods provide exclusion sensitivity for the benchmark SUSY model, and they also outperform the analysis that uses cuts on the original kinematic variables. for W Z background events calculated from either the signal-plus-backround or background-only network. These two networks use different sets of simulated W Z events.  Table 3: Yields in our electroweakino search regions for our mock LHC data set (N b−only ) and our mock MC set (N s+b ). Also shown is the sensitivity of search regions using only the original kinematic variables. The errors quoted are statistical only, whilst the Z bi calculation uses a relative background uncertainty that combines this in quadrature with a 15% systematic uncertainty.

Model definition and simulation
As a second demonstration of the application of network analysis to an LHC search, we consider the search for supersymmetric top quarks at the LHC. Searches are typically optimised on the simplified model scenario shown in Figure 7, where it is assumed thatt 1t1 production dominates at the LHC, and the decay of the stop is presumed to occur with 100% branching ratio to either a top quark and a lightest neutralino,χ 0 1 , or a b quark and a lightest chargino. We will assume the decay shown in the left-hand diagram of Figure 7. We will develop a prototype analysis in the 1 lepton final state, in which one of the top quarks is assumed to decay hadronically, whilst the other decays leptonically, which is typically the most constraining final state. We simulate stop quark pair production for a model with mt 1 = 500 GeV, and mχ0 1 = 280 GeV, and also generate the dominant Standard Model background arising from top quark production (including both top pair and single top production), both with Pythia 8. The signal cross-section used for normalisation is the next-to-next-to-leading-order plus approximate next-to-next-to-leading log cross-section derived from Refs. [36][37][38][39]. For the background, we use the next-to-next-toleading-order plus next-to-next-to-leading log tt cross-section derived from Refs. [40][41][42][43][44][45][46] In a real analysis, there would be a small contribution from events containing a W boson produced in association with jets, but we neglect this for our proof of principle. Again, our benchmark SUSY model choice is motivated by the fact that the model was not excluded by the most recent 36.1 fb −1 analysis conducted by the ATLAS collaboration [35], and it can be expected to be challenging to observe due to the fact that the kinematic properties of the top quarks produced are rather similar to those expected in the Standard Model. We generated 30 million stop events and 30 million top events which after pre-selection leads to approximately 38,000 background events and 150,000 signal events (before weighting). To speed up the network calculations for our proof of principle, we take a subsample of 10,000 events for both the signal and background, and adjust their weights accordingly. We demonstrate that our results remain the same under a doubling of the number of background MC events in Appendix B.

Stop analysis design
We require there to be exactly one electron or muon, with transverse momentum p T > 25 GeV, at least 2 b-jets (with p T > 30 GeV), and a missing transverse energy, E miss T , greater than 100 GeV. We have then examined a number of variables that are normally used in stop searches, choosing the following set that show good discrimination between the background and signal processes for our benchmark signal point: • The leading jet transverse momentum p j 1 T ; • the value of the missing transverse momentum E miss T ; • the minimum value of the transverse mass, m b,min • the minimum value of the invariant mass formed by the lepton and each of the two b-jets in the event, m min bl . For the top pair production process, this has a maximum value, whilst signal events can have higher values; • the scalar sum of the moduli of the transverse momenta for the lepton and the 2 b-jets, which we refer to as H T ; • ∆R(b 1 , b 2 ), defined as the difference in the R = (∆η) 2 + (∆φ) 2 value for the two b-jets; • the asymmetric m T 2 value, am T2 defined in Refs. [47][48][49][50][51].
We show histograms of event rates as functions of these variables after the pre-selection in Figure 8, with the samples scaled using the "background median" procedure defined previously. In all cases the signal lies substantially below the background across the entire range of the distribution. The variables are used in the calculation of our various distance metrics, exactly as we did in the previous section. Similar histograms of the distance -22 -metrics are provided for our stop example in Figure 9. Based on these plots, we select the linking lengths given in Table 4, where we note that we have suppressed metrics that did not turn out to be useful in the final analysis. As in the EW example, the correlation and cosine metrics all have the signal being more concentrated at small distances than the background, but we again retain the standard friendship condition when building our networks.
-23 -  In Figures 10 and 11, we show the network metrics that offer the best discrimination between signal and background for the stop example. The betweenness measures typically fall off much faster for top events than for stop events. This is particularly true for the networks built using the correlation and cosine distance metrics, for which the betweenness centrality distributions for the background do not show evidence of a flatter tail which is apparent in the case of the Mahalanobis betweenness. To check that this is not an artifact introduced by poor sampling of the kinematic tails of the original variables in Figure 8, we performed a variety of tests which are summarised in Appendix B, revealing that the endpoint behaviour is robust. We further motivate the shape of the n.s.i betweenness centrality distribution in Appendix C. More modest discrimination comes from the Mahalanobis harmonic closeness, the Euclidean local clustering, the cosine average neighbors degree and the Mahalanobis harmonic closeness. Whilst single cuts on the correlation or cosine betweenness centrality variables would seem to be the best motivated approach for exclusion and discovery, we also found optimum analyses for exclusion in the case that several variables are used at the same time. We followed a similar procedure to that of the previous section, with the exception that we modified the criterion on the number of events that must remain after applying all selections to take into account the change in the event weights. Table 5 shows the analyses that delivered the highest binomial significance values.
-26 -      Table 5: Examples of binomial significances for promising stop analyses. We assume a systematic uncertainty on the background of 15%, added in quadrature with the statistical uncertainty on the MC yields.

Results of realistic stop exclusion test
As we did for the electroweakino example, we now compare the stop and top yields derived from the signal-plus-background networks with the background-only networks derived from our mock LHC top dataset. A comparison to a pre-existing LHC analysis is less trivial in the stop case, since no analysis in the literature uses the precise combination of kinematic variables that we defined above. Table 6 shows the significance values for our benchmark point that one should observe in real LHC data, indicating that an upward fluctuation in the background has made it hard to reach exclusion sensitivity in this case. This does not tell the whole story, however. If we take one of the search regions from the previous section with the highest expected significance (BC * ,corr ν > 3410000, k * ,cos nnmax,ν > 677000), and examine the BC * ,corr ν distribution after the selection on k * ,cos nnmax,ν , we obtain the distribution in the right panel of Figure 12, which clearly shows the expected signal tail that would allow a much higher significance in principle. This motivates further study with more MC events, which in turn requires novel approaches for rapid computation of the betweenness centrality which we leave for further work.

Discussion
For both of our analysis examples, local network metrics offer powerful discrimination between signal and background. In Figure 12, we show one distribution from each of our examples that further illustrates this point. The left panel takes the electroweak search region with C * ,euc ν < 0.76 and p T (j 1 ) < 220, and shows the C * ,euc ν distribution after the selection on the jet transverse momentum is applied. The right panel shows the stop BC * ,corr ν distribution described previously. In both cases, we can see the extra separation power afforded by the local network metrics, which indicates that discovery potential may  Table 6: Yields in our stop search regions for our mock LHC data set (N b−only ) and our mock MC set (N s+b ). The errors quoted are statistical only, whilst the Z bi calculation uses a relative background uncertainty that combines this in quadrature with a 15% systematic uncertainty.
be reachable if one can verify that the expected background in the tail of the distributions is genuinely negligible. There is much remaining to be explored in future work, starting with the fact that our study is currently limited by the computational complexity of local network metric calculations for large networks. As more signal and background events are considered, the number of edges in the network grows to be very large, hampering our current parallelised calculations in the pyunicorn package. The betweenness centrality is the most expensive -30 -calculation, and we note that there is room for both a smarter parallelisation strategy, and the use of fast approximations such as the graph neural network approach presented in Ref. [52]. It would also be interesting to study how our prototype analyses change when choosing different linking lengths for each distance metric when defining the graph networks. A significant reduction in computational complexity could be achieved with shorter linking lengths, since this would make all of the networks more sparse. The cost, however, is that it will be harder to discriminate background from signal events, since both signal and background events will often have a low degree in the network.
Our graph network technique also offers the possibility of unsupervised event detection in the LHC data, since one can look for local network metric shapes that differ from those expected in a purely Standard Model network. A similar analysis would also be possible with global network metrics. In both cases, this is only feasible if the number of events in a given final state in the full LHC dataset is small enough that network computations can be performed within a reasonable timescale. In the case of the "supervised" analyses presented above, one of the functions of the pre-selection is to reduce the number of events that must be processed to a manageable level, with the consequence that some model-dependence is introduced through the choice of pre-selection. Fast approaches to calculating n.s.i local network metrics would, however, provide a very powerful way to search for new physics in an agnostic fashion, complementing previous model-independent approaches such as those presented in Refs. [53][54][55][56][57][58][59][60][61][62][63][64][65][66][67][68][69][70][71].

Conclusions
We have shown, using two supersymmetric examples, that graph network analysis can be used to define powerful variables for discriminating signal and background processes at the LHC. By building graph networks using a variety of distance metrics, we have developed prototypical LHC analyses that use local metrics of the graphs as new event attributes, alongside the original kinematic variables that the networks were built from. This permits the mapping of the topological structure, in kinematic space, of different contributions to the LHC dataset.
The results indicate that the graph network technique offers exclusion potential for benchmark models of stop and electroweakino production that are hard to exclude using conventional kinematic variables. At the very least, our results indicate that the technique provides a promising alternative to current search methods, motivating a deeper analysis that could explore different distance metrics, linking lengths and pre-selections. Better scalability of the technique will require faster computational approaches for node-splitting invariant network metric calculations.
for helpful advice regarding the pyunicorn package, and on the implementation and behaviour of the n.s.i betweenness centrality. Our perspective on unsupervised LHC searches has benefitted from interaction with the Dark Machines research collective (https:// darkmachines.org).
A Checks of the network variable shapes for the electroweakino example Given the strong separation afforded by some of our network variables, we have performed various checks to test the robustness of their behaviour. For example, since the number of SM events is underestimated in the poorly-sampled tails of the original kinematic variables, it is essential to check if these events are also the cause of the long signal tails in the network variables. If not, these tails in the network variables must instead arise from bulk features of the original kinematic distributions. An obvious check involves running more background MC events to populate the kinematic tails, which is difficult to accomplish due to the prohibitive computational requirements of training very large, non-sparse networks. There are, however, a number of other tests that prove to be informative.
First, the behaviour of the network variables was studied when different numbers of MC background events are used to generate the signal-plus-background network. This is designed to demonstrate two key things: • The number of MC events we have generated is sufficient to generate accurate variable distributions.
• The crucial concept of node splitting invariance is indeed correct, and allows our local network metrics to be compared between an LHC dataset, and a MC set that corresponds to a different integrated luminosity.
The results in the main text were obtained using 11,197 signal events and 10,486 background events -the full number of background events with a three lepton final state obtained from a generated set of 1,000,000 events. In Figure 13 distributions for some example network variables calculated from these samples are shown, in comparison to those calculated from networks built from samples using 1,000 or 5,000 background events. From these plots it is clear that larger event samples produce less sparse tails and smoother distributions, however the bulk shapes of the distributions remain the same.
One might worry that the tails in our local network metrics arise from events that are in the poorly-sampled tails of the original kinematic variables. To check this, we have performed the entire network analysis again, but with the pre-selection amended by the additional selections given in Table 7. By placing upper cuts on each variable, we have restricted all nodes in the network to regions of the original kinematic variables where the tail is well-sampled. Some of the resulting network metric distributions are shown in Figure 14. These are compared with the distributions calculated using the preselection cuts used in the main body of the text, with both sets of distributions normalised to unit area. The distributions generally show good agreement, particularly for the variables that are used in our most promising search regions.
-32 - Table 7: Extra kinematic selections applied to our graph network analysis in order to investigate the effect of poorly sampled kinematic tails.

B Checks of the network variable shapes for the stop example
For the stop analysis example, we have also performed a number of checks to test the robustness of the variables. First, we investigated the behaviour of BC * ,corr ν under a doubling of the number of MC background events used to generate the signal-plus-background network. When the number of background events is increased from 10,000 to 20,000, the distributions of network metrics consistently remain unchanged. This behaviour is shown in Figure 15 for a range of metrics.  amended by the additional selections given in Table 8. The resulting local network metrics are shown in Figure 17, with shapes that closely match the original distributions shown in Figure 10. It therefore appears that the difference in the BC * ,corr ν distributions for stop and top processes arises from a genuine difference in the bulk properties of the original kinematic distributions. We provide a possible explanation for this difference in Appendix C.    Table 8.

C The shape of the n.s.i betweenness centrality distribution
In this appendix, we motivate the shape of the n.s.i betweenness centrality distribution, to understand why it has a tail for the signal rather than the background in our stop simplified model example.
In the pyunicorn implementation of the betweenness centrality, the unweighted version is expressed as a sum rather than an average, with a maximum value of N 2 , for N nodes in the network. In Figure 18, we show the unweighted betweenness centrality for the stop simplified model example, using a network built with the correlation distance metric from 1,000 signal events and 1,000 background events. All event weights have been set to one. These distributions do not represent the actual distributions expected in LHC data because the betweenness calculation must be replaced by the n.s.i version that is safe under nodeweighting, and the histogram itself should be filled with appropriate weights. Nevertheless, they are clearly very similar, which tells us that the actual pattern of the nodes in the network is not driving the tail behaviour, which is instead entirely driven by the details of the n.s.i corrections to the betweenness formula. This is important, because it tells us that attempts to visualise the network based on the actual pattern of nodes that we have simulated will not lead to any important insights (one would instead have to explicitly replace nodes by the number of nodes corresponding to their weight, which is not possible for non-integer weights). Events / Bin Figure 18: Regular betweenness centrality distributions obtained for the stop simplified model example using the correlation distance metric, using 1,000 signal and 1,000 background events with all event weights set to one. The left plot shows the background distribution, whilst the right plot shows the signal distribution.
The way to proceed instead is to understand the difference between the unweighted betweenness formula, and the n.s.i version. Nodes that have the maximum possible value of the betweenness centrality would have a value of the n.s.i betweenness which is equal to the regular value scaled by W 2 /(N 2 w ν ), where w ν is the weight of the node, and W is the total sum of weights in the network. Nodes that have a betweenness centrality somewhere between zero and the maximum value get a scaling which is similar in scale, but not identical. We can now investigate this scaling for a small network. If we build a network using the correlation distance with 10 signal events and 10 background events, with all node weights set to one, we obtain the betweenness distributions shown in Figure 19. It is worth noting that none of the values get anywhere close to the maximum value of N 2 = 400.
The n.s.i betweenness is shown in Figure 20, where the weights are now assigned correctly. Immediately, we see that the typical scale for the signal distribution is orders of magnitude greater than that of the background distribution. The signal weight for events in the network is 462.9, whilst the background weight is 163900. Thus W = 1640000 and W 2 /N 2 = 6750000000. This gives a scaling factor for the betweenness of 41200 for the background, and 14,600,000 for the signal. This explains the typical difference in scale of the signal and background values in the network.
We have shown that the tail behaviour entirely comes from the difference between the signal and background weights in the network. The natural question is thus whether this is a real effect that would be exhibited by LHC data (where all weights are equal to one), or an artifact of our MC generation procedure.
We argue that this is a real effect that would be exhibited by the LHC data, since the whole point of the n.s.i formulation of the betweenness centrality is that it should be -39 - Events / Bin Figure 19: Regular betweenness centrality distributions obtained for the stop simplified model example using 10 signal and 10 background events with all weights set to 1, and using the correlation distance metric. The left plot shows the background distribution, whilst the right plot shows the signal distribution.  safe under a splitting or merging of the network nodes. This is explicitly designed to cope with the scenario in which a finite sample has been taken from an underlying population of nodes, and the nodes have been reweighted to approximate the original distribution. In our example with 10 signal and 10 background events, both of these are reduced samples of the original population, and it would be possible to start from the original population of nodes and merge them until we have only 10 signal and 10 background nodes, each with a higher weight. In this case, we would expect to see the same betweenness centrality distribution as the one seen in the original population. This conclusion is safe as long as the number of events that we have simulated is an adequately dense sample of the original population, which can be tested by building a network with more events and testing if we get similar distributions after reweighting. This is observed to be the case for both the stop and electroweakino examples once one simulates a few thousand events in each case.
The origin of the difference in behaviour must ultimately result from the large disparity of the number of background and signal events in the network built from the original population (the background weight is much higher in our example). When a node s is split into s + s , the two nodes s and s are linked with each other, and each is linked to the other nodes in the same way as the original node. Any shortest path through s becomes a pair of shortest paths, one through s and another through s . Also, a shortest path from s to some node b = s will never meet s . A background event gets split into a relatively large number of nodes under reweighting. For any one of these new nodes ν, the other nodes from the split create a large number of new shortest paths that do not pass through ν. This increases the denominator of the betweenness centrality much more than the numerator. If instead the node was a signal event, it gets split into less nodes, and the betweenness centrality is not changed by the same amount. So the nature of the splitting (which is driven by the relative proportion of signal and background events in the assumed LHC integrated luminosity) affects signal and background events differently as we go from the regular betweenness centrality to the weighted form. This explains the journey from the unweighted to the weighted betweenness centrality values, but once we complete that journey, we arrive at the distribution that should be exhibited by LHC data.