Equivariant, Safe and Sensitive — Graph Networks for New Physics

Abstract: This study introduces a novel Graph Neural Network (GNN) architecture that leverages infrared and collinear (IRC) safety and equivariance to enhance the analysis of collider data for Beyond the Standard Model (BSM) discoveries. By integrating equivariance in the rapidity-azimuth plane with IRC-safe principles, our model significantly reduces computational overhead while ensuring theoretical consistency in identifying BSM scenarios amidst Quantum Chromodynamics (QCD) backgrounds. The proposed GNN architecture demonstrates superior performance in tagging semi-visible jets, highlighting its potential as a robust tool for advancing BSM search strategies at high-energy colliders.


Introduction
The application of machine learning algorithms to jet classification provides an ideal environment to gauge the interplay between performance and interpretability. On the one hand, we need highly performant algorithms utilising the wealth of experimental data recorded at the Large Hadron Collider (LHC), while on the other, we want to ascertain the reason behind the algorithm's efficiency. Architectures which are Lorentz equivariant [30][31][32][33][34], or are infrared and collinear (IRC) safe [35][36][37][38][39][40][41][42], have been shown to enhance the physical biases of the algorithms with generally minimal loss or, at times, comparatively increased performance [31,34] with respect to physics-opaque constructions. In this work, we combine IRC safety with Euclidean equivariance in the rapidity-azimuth plane. With well-known examples of jet-shape variables [43][44][45][46][47][48][49][50], constructed as functions of distance in the rapidity-azimuth plane, this pushes the IRC-safe features closer to human-engineered QCD features without losing out on performance, even with two orders of magnitude fewer model parameters.
Symmetry in a different sense, namely the degeneracy over nearly degenerate final states, is fundamental to the theoretical robustness of information that can be gained from phenomenological analyses. The Kinoshita-Lee-Nauenberg theorem [51,52] guarantees cancellations of soft and (final-state) collinear divergences when algorithms are infrared (IR) and collinear safe; any departure from IRC-safe methods ultimately implies uncorrectable interpretation shortfalls. This technically elevates IRC safety to a level comparable with the space-time symmetries for perturbatively modelling and interpreting scattering processes.
The recent surge in the application of machine learning for enhancing the sensitivity to new physics in available and projected particle physics data sets has led to a departure from observables that transparently reflect symmetry properties and IRC safety. Nonetheless, any such serious attempt should critically feature IRC safety whilst exploiting symmetries of the data sets in the corresponding architecture. This work aims to demonstrate that this is not only feasible but directly applicable to the determination of BSM parameter regions in case a discovery can be claimed via these techniques. We construct an IRC-safe E(2)-equivariant network and demonstrate the enhanced optimisation capability of such an algorithm compared to approaches that do not exploit symmetry. As most popular substructure variables are generally built out of $\Delta R = \sqrt{(\Delta y)^2 + (\Delta\phi)^2}$, our choice of the E(2) group preserves such a structure in the feature extraction process, thereby closely matching the QCD-intuitive picture of high-level variables. We then apply this approach to a classification task in jet substructure analyses (we consider Hidden Valley Models [53][54][55][56][57][58][59][60] as a representative example), where IRC safety plays a critical role in obtaining a theoretically robust output score when the substructure becomes sparse or soft.
This work is organised as follows: In Sec. 2, we discuss the inductive biases relevant to phenomenology at the LHC, namely IRC safety and equivariance in the rapidity-azimuth plane. Then, in Sec. 3, we apply this network to a relevant physics case where it discriminates between QCD jets and dark shower jets (semi-visible jets). We discuss the network architecture and its performance. Finally, we provide a summary and conclusion in Sec. 4.

Inductive biases and QCD
The success of deep learning algorithms relies on reducing the search space of possible functions to a predetermined subset by imposing additional structures on the architecture. These inductive biases favour the extraction of relevant features which are not too specific (like high-level variables) but follow the intuition of the application domain. Here, we discuss the two inductive biases we impose on the Message Passing Neural Network utilised in our study: IRC safety and equivariance in the rapidity-azimuth plane.

Infrared and Collinear Safety
From a QCD perspective, infrared and collinear safe feature extraction promises better interpretability without sacrificing classification performance. Any message-passing operation takes the node features $H^{(l)}_i$ of each node $i$ as input and updates them to $H^{(l+1)}_i$. As IRC safety is defined at the level of the whole jet and not for each constituent, these node features inevitably undergo a graph-readout operation for jet-level classification. A way to define an IRC-safe graph representation is the "conservation of collinearity" of the node features in each representation, indexed by $l$, of the message-passing operation, which preserves the structure of the angular components in the limit of collinear emissions. In the following, each particle $i$ has a four-vector $p^\mu_i = (z_i, \hat{p}_i)$, with $z_i = p^i_T / \sum_k p^k_T$ and $\hat{p}_i$ denoting the angular vector. Mathematically, a node representation $H^{(l)}_i$ conserves collinearity if, for any particles $q$, $r$, and $s$, we have

$$ \hat{p}_q = \hat{p}_r = \hat{p}_s \implies H^{(l)}_q = H^{(l)}_r = H^{(l)}_s . \tag{2.1} $$

Once this condition is satisfied, it is straightforward to define an IRC-safe graph representation as

$$ G^{(l)} = \sum_{i} z_i\, H^{(l)}_i . \tag{2.2} $$

For implementing IRC safety in an Energy-weighted Message Passing Network (EMPN) [37], one first defines an IRC-safe prescription for constructing the graph structure. Once this is satisfied for any node $i$ in the graph, the message-passing operation

$$ H^{(l+1)}_i = \sum_{j \in N[i]} \omega^{(N[i])}_j\, \Phi^{(l)}\big(H^{(l)}_i, H^{(l)}_j\big) , \tag{2.3} $$

where the scope-dependent factors are an analogue of $z_i$, defined as

$$ \omega^{(N[i])}_j = \frac{p^j_T}{\sum_{k \in N[i]} p^k_T} , $$

recursively conserves collinearity of the node features if the initial input $H^{(0)}_i$ does so. For the input, one can use the relative coordinates in the rapidity-azimuth plane from the jet axis, $H^{(0)}_i = (\Delta y_{iJ}, \Delta\phi_{iJ})$, which satisfy the condition trivially. The function $\Phi^{(l)}$ can, in general, be any well-behaved neural network, which we take to be an edge convolution network [61]. The neighbourhood $N[i]$ is constructed by imposing the step function $\Theta(\Delta R_{ij} < R_0)$ for each particle $i$, $R_0$ being a parameter that determines the density of the constructed graph. The graph features (obtained by taking a graph readout as defined in Eq. (2.2)) for $l > 0$ can be concatenated up to the total number of such operations $L$, which is a hyperparameter. For classification, a downstream network takes the graph representation as an input and gives a classification score.
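The energy-weighted message passing and readout described above can be illustrated with a minimal pure-Python sketch. It assumes the update $H^{(l+1)}_i = \sum_{j \in N[i]} \omega_j \Phi(H^{(l)}_i, H^{(l)}_j)$ with $\omega_j = p^j_T / \sum_{k \in N[i]} p^k_T$ and the readout $\sum_i z_i H^{(l)}_i$. The function `phi_toy` is a hypothetical fixed stand-in for the learnable network, all function names are illustrative, and the azimuth is not wrapped because the toy particles sit close to the jet axis.

```python
import math

def delta_r(a, b):
    """Distance between two (dy, dphi) points in the rapidity-azimuth plane."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def phi_toy(h_i, h_j):
    """Hypothetical stand-in for the learnable message function Phi."""
    d = [b - a for a, b in zip(h_i, h_j)]
    return [math.tanh(0.7 * h_i[0] - 0.3 * d[1]),
            math.tanh(0.2 * h_i[1] + 0.5 * d[0])]

def empn_layer(pt, coords, feats, r0):
    """One energy-weighted message-passing step on a radius graph."""
    out = []
    for i in range(len(pt)):
        # Neighbourhood N[i] from the input coordinates (self-loop included).
        nbrs = [j for j in range(len(pt)) if delta_r(coords[i], coords[j]) < r0]
        norm = sum(pt[j] for j in nbrs)
        acc = [0.0, 0.0]
        for j in nbrs:
            w = pt[j] / norm  # scope-dependent energy weight omega_j
            msg = phi_toy(feats[i], feats[j])
            acc = [a + w * m for a, m in zip(acc, msg)]
        out.append(acc)
    return out

def readout(pt, feats):
    """Energy-weighted graph readout: sum_i z_i H_i."""
    total = sum(pt)
    return [sum(p / total * f[k] for p, f in zip(pt, feats))
            for k in range(len(feats[0]))]

def jet_representation(pt, coords, r0=0.5, n_layers=2):
    """Concatenate the readouts after each message-passing step."""
    feats = [list(c) for c in coords]  # H^(0)_i = (dy_iJ, dphi_iJ)
    rep = []
    for _ in range(n_layers):
        feats = empn_layer(pt, coords, feats, r0)
        rep += readout(pt, feats)
    return rep

pt = [50.0, 30.0, 20.0]
coords = [(0.05, -0.02), (-0.10, 0.15), (0.30, -0.25)]
base = jet_representation(pt, coords)

# Collinear splitting of particle 0 into two particles at the same position.
split = jet_representation([35.0, 15.0, 30.0, 20.0],
                           [coords[0], coords[0], coords[1], coords[2]])

# Additional very soft emission.
soft = jet_representation(pt + [1e-9], coords + [(0.2, 0.2)])
```

In this toy setting the representation is unchanged by the collinear splitting up to floating-point rounding, and the soft emission shifts it only by an amount proportional to its transverse momentum fraction.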

Equivariance in rapidity-azimuth plane
Equivariance is at the core of the powerful learning ability shown by modern deep-learning algorithms such as Convolutional Neural Networks (CNNs) and Graph Neural Networks (GNNs). They reduce the possible set of functions that a neural network can approximate by assuming symmetries of the input data. For instance, CNNs assume translational symmetry in $\mathbb{R}^2$ while GNNs assume permutation symmetry of the nodes. Equivariance enforces group algebraic structures in the hidden representations in a neural network. This restricts the functions that a neural network can approximate to those which follow the group equivariant property.
Mathematically, for a group $G$ with an element denoted as $g$, let $X$ and $Y$ be two sets with $x \in X$ and $y \in Y$ that admit group actions $T_X(g, x)$ and $T_Y(g, y)$, respectively. A function $f : X \to Y$ is said to be $G$-equivariant if

$$ f\big(T_X(g, x)\big) = T_Y\big(g, f(x)\big) \quad \forall\, g \in G,\ x \in X . \tag{2.4} $$

An equivariant neural network [62] approximates such a function $f$ for a given group, and $f$ is generally non-linear. From an approximation perspective, the neural network being able to learn only those functions that follow Eq. (2.4) rather than any general function necessarily reduces the expressive power. However, the reduction generally results in higher efficiency of the optimisation process in terms of network complexity, on top of increasing the interpretability, as the particular group $G$ is determined from domain knowledge. If the action $T_Y$ is trivial, i.e., $T_Y(g, y) = y\ \forall g \in G$, then the function $f$ is said to be $G$-invariant.
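As a worked instance of this definition, the short derivation below considers the Euclidean group acting on points of a plane. The notation is an assumption made for illustration: $g = (R, t)$ with $R$ a $2 \times 2$ orthogonal matrix and $t$ a shift, acting as $T_X(g, x) = R x + t$; $d$ is the pairwise separation and $c$ the centroid of $N$ points. The separation comes out invariant (trivial $T_Y$), while the centroid is equivariant.

```latex
% Assumed notation (illustration only): g = (R, t), R^T R = 1, t \in \mathbb{R}^2,
% d(x_i, x_j) = \lVert x_i - x_j \rVert,  c(x_1, \dots, x_N) = \frac{1}{N} \sum_i x_i.
\begin{align*}
  d(R x_i + t,\; R x_j + t)
    &= \lVert R\,(x_i - x_j) \rVert
     = \lVert x_i - x_j \rVert
     = d(x_i, x_j), \\
  c(R x_1 + t, \dots, R x_N + t)
    &= \frac{1}{N} \sum_{i=1}^{N} (R x_i + t)
     = R\, c(x_1, \dots, x_N) + t .
\end{align*}
```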
GNNs are permutation equivariant in the node update stage, while for graph-level purposes, the global readout renders the graph representation permutation invariant.
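For our purpose, the group which preserves the separation $\Delta R$ in the $(y, \phi)$ plane is the Euclidean group E(2). To incorporate E(2)-equivariance in an EMPN, we modify the general E(n)-equivariant message-passing operation of reference [63]. To define E(n) equivariance in this form, one segregates the node features $H^{(l)}_i$ into a scalar and a vector representation. Denoting the former as $h^{(l)}_i$ and the latter as $x^{(l)}_i$, the message-passing operation reads

$$ m^{(l+1)}_{ij} = \Phi^{(l+1)}_e\big(h^{(l)}_i, h^{(l)}_j, \lVert x^{(l)}_i - x^{(l)}_j \rVert\big) , \tag{2.5a} $$
$$ x^{(l+1)}_i = x^{(l)}_i + \sum_{j \neq i} \big(x^{(l)}_i - x^{(l)}_j\big)\, \Phi^{(l+1)}_x\big(m^{(l+1)}_{ij}\big) , \tag{2.5b} $$
$$ m^{(l+1)}_i = \sum_{j \neq i} m^{(l+1)}_{ij} , \tag{2.5c} $$
$$ h^{(l+1)}_i = \Phi^{(l+1)}_h\big(h^{(l)}_i, m^{(l+1)}_i\big) , \tag{2.5d} $$

where $\Phi^{(l+1)}_e$, $\Phi^{(l+1)}_x$, and $\Phi^{(l+1)}_h$ are multilayer perceptrons (MLPs), with $\Phi^{(l+1)}_x$ giving a one-dimensional output with a sigmoid activation, thereby being interpreted as a scalar weight in the aggregation. The restriction $j \neq i$ in the aggregation steps can be relaxed, as equivariance is satisfied in its absence. Moreover, as the neighbourhood construction modifies the message functions ($\Phi^{(l+1)}_e$ and $\Phi^{(l+1)}_x$) by a product with a step function $\Theta(\Delta R_{ij} < R_0)$, which is itself E(2) invariant, we can incorporate a local aggregation scheme without impeding global equivariance. We note that modifying Eq. (2.3) to

$$ H^{(l+1)}_i = F^{(l+1)}\Big(H^{(l)}_i,\ \sum_{j \in N[i]} \omega^{(N[i])}_j\, \Phi^{(l)}\big(H^{(l)}_i, H^{(l)}_j\big)\Big) , \tag{2.6} $$

with a well-behaved (to be learned) function $F^{(l+1)}$, still conserves collinearity of the updated node features. This is because the conservation of collinearity in the second argument already requires its conservation in the first argument. In general, since a function is by definition not one-to-many, any function $f(H^{(l)}_i)$ of a single node feature $H^{(l)}_i$ conserves collinearity if each $H^{(l)}_i$ conserves collinearity. Equipped with this observation, it is straightforward to modify Eq. (2.5) to satisfy IRC safety by incorporating the energy-weighted sum structure in the aggregation as

$$ m^{(l+1)}_{ij} = \Phi^{(l+1)}_e\big(h^{(l)}_i, h^{(l)}_j, \lVert x^{(l)}_i - x^{(l)}_j \rVert^2\big) , \tag{2.7a} $$
$$ x^{(l+1)}_i = x^{(l)}_i + \sum_{j \in N[i]} \omega^{(N[i])}_j \big(x^{(l)}_i - x^{(l)}_j\big)\, \Phi^{(l+1)}_x\big(m^{(l+1)}_{ij}\big) , \tag{2.7b} $$
$$ m^{(l+1)}_i = \sum_{j \in N[i]} \omega^{(N[i])}_j\, m^{(l+1)}_{ij} , \tag{2.7c} $$
$$ h^{(l+1)}_i = \Phi^{(l+1)}_h\big(h^{(l)}_i, m^{(l+1)}_i\big) . \tag{2.7d} $$

Conservation of collinearity of the updated scalar features $h^{(l+1)}_i$ and vector features $x^{(l+1)}_i$ follows from comparing their corresponding aggregation equations to Eq. (2.6). We have changed the input from the Euclidean norm to its squared value because we have self-loops, and the square root is not differentiable at zero.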
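To make the combination of equivariance and energy weighting concrete, the following pure-Python sketch implements one layer of an assumed form: $m_{ij} = \phi_e(h_i, h_j, \lVert x_i - x_j \rVert^2)$, $x_i' = x_i + \sum_{j \in N[i]} \omega_j (x_i - x_j)\, \phi_x(m_{ij})$, $m_i = \sum_{j \in N[i]} \omega_j m_{ij}$ and $h_i' = \phi_h(h_i, m_i)$. The functions `phi_e`, `phi_x` and `phi_h` are hypothetical fixed stand-ins for the MLPs, and all names are illustrative. The script then checks numerically that rotating and translating the inputs leaves the scalars unchanged and moves the vectors in the same way.

```python
import math

def phi_e(h_i, h_j, d2):
    """Hypothetical edge function returning a two-dimensional message."""
    return (math.tanh(h_i + 0.5 * h_j - d2), math.tanh(h_i * h_j + 2.0 * d2))

def phi_x(m):
    """Hypothetical scalar gate with a sigmoid output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(0.8 * m[0] - 0.4 * m[1])))

def phi_h(h_i, m_i):
    """Hypothetical scalar node update."""
    return math.tanh(0.6 * h_i + m_i[0] - 0.3 * m_i[1])

def radius_neighbours(x, r0=0.5):
    """N[i] = {j : |x_i - x_j| < r0}; self-loops are included."""
    return [[j for j in range(len(x))
             if math.hypot(x[i][0] - x[j][0], x[i][1] - x[j][1]) < r0]
            for i in range(len(x))]

def e2_empn_layer(pt, h, x, nbrs):
    """One energy-weighted E(2)-equivariant update of scalars h and vectors x."""
    h_new, x_new = [], []
    for i, n_i in enumerate(nbrs):
        norm = sum(pt[j] for j in n_i)
        m_i, shift = [0.0, 0.0], [0.0, 0.0]
        for j in n_i:
            w = pt[j] / norm  # energy weight omega_j
            d0, d1 = x[i][0] - x[j][0], x[i][1] - x[j][1]
            m_ij = phi_e(h[i], h[j], d0 * d0 + d1 * d1)  # squared distance
            gate = phi_x(m_ij)
            shift[0] += w * d0 * gate
            shift[1] += w * d1 * gate
            m_i[0] += w * m_ij[0]
            m_i[1] += w * m_ij[1]
        x_new.append((x[i][0] + shift[0], x[i][1] + shift[1]))
        h_new.append(phi_h(h[i], m_i))
    return h_new, x_new

def e2_transform(x, theta, shift):
    """Rotate every point by theta, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * a - s * b + shift[0], s * a + c * b + shift[1]) for a, b in x]

pt = [50.0, 30.0, 20.0, 10.0]
h = [0.05, 0.18, 0.39, 0.60]
x = [(0.05, -0.02), (-0.10, 0.15), (0.30, -0.25), (0.45, 0.40)]
theta, shift = 0.9, (1.5, -0.7)

h_out, x_out = e2_empn_layer(pt, h, x, radius_neighbours(x))
x_moved = e2_transform(x, theta, shift)
h_out_moved, x_out_moved = e2_empn_layer(pt, h, x_moved, radius_neighbours(x_moved))
x_expected = e2_transform(x_out, theta, shift)
```

The neighbourhoods are recomputed after the transformation on purpose: since the radius criterion depends only on pairwise distances, the graph itself is unchanged.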
The general E(n) group consists of translations and rotations. While there are some instances where the translational symmetry is utilised by defining vector observables like jet pull [64][65][66], most substructure observables are confined to a rotationally symmetric definition by using only the scalar separation $\Delta R$. Therefore, we also study an O(2)-equivariant architecture by forgoing translational equivariance. This can be done by modifying the edge input in Eq. (2.7a) and the vector aggregation in Eq. (2.7b), while keeping Eqs. (2.7c) and (2.7d) unchanged. In the modified edge input, we have added the inner product of the vector features, as it is invariant under rotations but not under translations. Note that for the final IRC-safe feature extraction, we need to perform an energy-weighted summed global readout as given in Eq. (2.2) for both $x^{(l)}_i$ and $h^{(l)}_i$.
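The statement about the inner product can be checked numerically. The short sketch below (helper names are illustrative) compares the inner product and the squared distance of two points before and after a rotation and a translation.

```python
import math

def rotate(v, theta):
    """Rotate a 2D vector by the angle theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def translate(v, t):
    """Shift a 2D vector by t."""
    return (v[0] + t[0], v[1] + t[1])

def inner(a, b):
    return a[0] * b[0] + a[1] * b[1]

def dist2(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

x_i, x_j = (0.05, -0.02), (-0.10, 0.15)
theta, t = 1.1, (0.3, -0.2)

inner_rot = inner(rotate(x_i, theta), rotate(x_j, theta))
inner_shift = inner(translate(x_i, t), translate(x_j, t))
dist2_rot = dist2(rotate(x_i, theta), rotate(x_j, theta))
dist2_shift = dist2(translate(x_i, t), translate(x_j, t))

print("inner product:", inner(x_i, x_j), inner_rot, inner_shift)
print("squared distance:", dist2(x_i, x_j), dist2_rot, dist2_shift)
```

The squared distance survives both operations, whereas the inner product survives only the rotation, which is why it can serve as an edge input only once translational equivariance is given up.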

A Physics application: Tagging semi-visible jets
The radiation pattern arising from perturbative QCD emissions is known to be approximately uniform in the rapidity-azimuth plane. This guides the design of most jet-shape observables [43] to use the relative distance $\Delta R$ between pairs of particles [46,48] or from well-defined axes [44,45]. The implicit assumption underlying such definitions is the invariance of the observables under the two-dimensional Euclidean group E(2) in the rapidity-azimuth plane. It is, therefore, a natural extension of IRC-safe feature extraction to incorporate E(2)-equivariance to better connect to known jet substructure observables in the literature.
Although equivariance, in principle, would help segregate QCD radiation patterns with different n-prong structures, a particularly challenging case arises when the parton shower itself is altered by non-trivial physics. For example, a QCD-like dark shower in Hidden Valley models can produce a mixture of BSM and normal SM particles, generally modelled via interleaved parton showers. In such a case, the differences with respect to QCD radiation lie not in the presence of hard prongs but rather at an intricate level in the relative distribution of the particles' positions in the $(y, \phi)$ plane, with possibly some missing patches left by undetectable stable or quasi-stable particles within the jet. Generally, the energy values would be similar to those of a one-prong QCD jet, while the relative position of the particles in the $(y, \phi)$ plane would be the distinguishing feature in such semi-visible jets. Therefore, E(2) equivariance should be particularly useful in identifying such semi-visible jets.
Hidden Valley Models offer an intriguing perspective in particle physics, suggesting the existence of secluded, almost decoupled sectors that interact weakly with the SM. Such a scenario can be brought about by extending the SM gauge group with a new group $G_v$. The choices of the gauge group and new fermions and their relation to the SM particles offer a rich phenomenology with possibly distinct signatures, like emerging jets or semi-visible jets, not covered by traditional searches at the LHC, like monojet searches [67,68]. In general, the SM particles remain neutral under the new group $G_v$, while there can be two classes of new particles: those charged under both the SM gauge group and $G_v$, and those which solely have a $G_v$ charge. Interactions between the new sector and SM particles can occur through intermediary mechanisms, such as a TeV-scale $Z'$, or through higher-dimensional operators and loops involving heavy particles possessing non-trivial $G_{SM}$ and $G_v$ charges. The case of a confined dark sector decoupled from the SM at lower energies is of particular interest.

Simulation details
Strongly coupled hidden-sector particles produced at the LHC could produce invisible dark mesons along with stable mesons. In such a case, novel LHC signatures such as semi-visible jets could be produced, almost indistinguishable from QCD backgrounds, therefore posing an exciting challenge for jet classification. This study considers resonant and non-resonant production of semi-visible jets. For the latter, we consider both electroweak (EW) and QCD production separately. We simulate these dark shower jet events for proton-proton collisions at a centre-of-mass energy of 14 TeV using the Hidden Valley (HV) module of Pythia 8.310 [69]. They are generated as:
• Resonant production: A new leptophobic vector boson $Z'$ is produced as a BSM mediator, decaying to a pair of dark quarks. Events are generated with the process HiddenValley:ffbar2Zv = on. We set the mass of the $Z'$ to 1 TeV with a width of 10 MeV. The $Z'$ decays to a pair of dark quarks ($q_d \bar{q}_d$), which carry no SM charge but can decay to another dark-sector particle and an SM quark. The subsequent showering and hadronisation produce dark mesons and SM particles.
• Non-resonant EW production: We produce a pair of $U_v$ quarks using the process HiddenValley:ffbar2UvUvbar = on. These $U_v$ quarks are charged under the fundamental representation of the dark SU(N) and mirror the SM charges of the u-quark. They, therefore, radiate into both SM and dark-sector particles.
• Non-resonant QCD production: The same pair of $U_v$ quarks is produced using the process HiddenValley:gg2UvUvbar = on. Unlike the previous channel, which occurs via an s-channel colour-singlet propagator, this channel has an SM colour connection between the initial and the final state lines, resembling QCD dijet production more closely.
The fraction of stable to total dark mesons produced in the process is controlled by the probVector parameter in the HV module. We have chosen probVector = 0.5 in our analysis. This choice is motivated by higher values producing events with low missing transverse energy, which are covered by dijet bump hunts. In comparison, lower values correspond to signatures where the $\phi$ direction of the missing transverse energy is away from the jet axes and, hence, are covered by more traditional dark matter searches. All final-state particles detectable at the LHC are clustered into microjets of radius R = 0.1 using the anti-$k_t$ algorithm [70], mimicking the calorimeter resolution. Inclusive microjets with transverse momentum $p_T > 1$ GeV are then clustered into jets of radius R = 0.8 with the anti-$k_t$ algorithm. The jet clusterings are performed using the FastJet (v3.4.1) [71] package. Events are required to have at least two jets within $|\eta| < 3$; the leading jet is required to have $p_T > 150$ GeV, and other sub-leading jets are required to have $p_T > 120$ GeV.
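The generator settings quoted above for the three signal channels can be collected as plain settings strings, as in the sketch below. The process switches, probVector value, beam energy, mass and width are taken from the text; the use of particle code 4900023 for the $Z'$ and of the `m0` and `mWidth` keys (in GeV) to set its mass and width are assumptions about how these values would be passed to Pythia 8, and the helper `settings_for` is purely illustrative.

```python
# Settings shared by all three signal channels.
COMMON = [
    "Beams:eCM = 14000.",             # 14 TeV proton-proton collisions
    "HiddenValley:probVector = 0.5",  # value chosen in the analysis
]

SIGNAL_PROCESSES = {
    "resonant": [
        "HiddenValley:ffbar2Zv = on",
        # Assumption: 4900023 is the Hidden Valley Z' and m0/mWidth are in GeV.
        "4900023:m0 = 1000.",     # 1 TeV mass
        "4900023:mWidth = 0.01",  # 10 MeV width
    ],
    "nonresonant_ew": ["HiddenValley:ffbar2UvUvbar = on"],
    "nonresonant_qcd": ["HiddenValley:gg2UvUvbar = on"],
}

def settings_for(channel):
    """Return the list of settings strings for one signal channel."""
    return COMMON + SIGNAL_PROCESSES[channel]

for channel in SIGNAL_PROCESSES:
    print(channel)
    for line in settings_for(channel):
        print("   ", line)
```

Each string could then be written to a command file or passed one at a time to the generator; any further shower or hadronisation settings used in the study are not stated in the text and are therefore left out.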
For the background, QCD dijet events are generated by setting HardQCD:all = on. To have better efficiency in the event selection, we set PhaseSpace:pTHatMin = 100, which puts a lower bound of 100 GeV on the transverse momentum of each of the two final-state legs in the 2 → 2 processes for all classes. We generate 300k events passing the selection criteria for each class, which are used to create three binary classification datasets, one for each of the three signals vs the background. The leading jet's constituents in these events are used to construct jet graphs. We use $R_0 = 0.5$ to construct the radius graph and extract $h^{(0)}_i = \Delta R_{iJ}$ and $x^{(0)}_i = (\Delta y_{iJ}, \Delta\phi_{iJ})$ along with $z_i$ and $\omega^{(N[i])}_j$. For the ordinary EMPN architecture, we use $H^{(0)}_i = (\Delta y_{iJ}, \Delta\phi_{iJ})$. Since all three networks process the node features to extract relevant edge features as inputs to the learnable functions, no edge features are added. The combined datasets of the signal and background (consisting of 600k samples) are segregated into 60% training, 20% testing, and 20% validation datasets.

Figure 2: Representative Feynman diagrams for the dark shower event generation processes.
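The graph construction described above can be sketched in a few lines of pure Python. The function name `build_jet_graph`, the tuple layout `(pT, y, phi)` and the toy constituents are assumptions for illustration; the jet axis is supplied by the caller because its exact definition is not restated here.

```python
import math

def wrap_phi(dphi):
    """Map an azimuthal difference into [-pi, pi)."""
    return (dphi + math.pi) % (2.0 * math.pi) - math.pi

def build_jet_graph(constituents, jet_axis, r0=0.5):
    """Build input features and a radius graph for one jet.

    constituents: list of (pT, y, phi); jet_axis: (y_J, phi_J).
    """
    pt = [c[0] for c in constituents]
    total_pt = sum(pt)
    # Vector input x^(0)_i = (dy_iJ, dphi_iJ), scalar input h^(0)_i = dR_iJ.
    x0 = [(y - jet_axis[0], wrap_phi(phi - jet_axis[1])) for _, y, phi in constituents]
    h0 = [math.hypot(dy, dphi) for dy, dphi in x0]
    z = [p / total_pt for p in pt]
    n = len(constituents)
    # Neighbourhood N[i]: all j with dR_ij < r0 (self-loops included).
    neighbours = [
        [j for j in range(n)
         if math.hypot(x0[i][0] - x0[j][0], wrap_phi(x0[i][1] - x0[j][1])) < r0]
        for i in range(n)
    ]
    # Scope-dependent energy weights omega_j for every edge (i, j).
    omega = {}
    for i, n_i in enumerate(neighbours):
        norm = sum(pt[j] for j in n_i)
        for j in n_i:
            omega[(i, j)] = pt[j] / norm
    return {"h0": h0, "x0": x0, "z": z, "neighbours": neighbours, "omega": omega}

# Toy jet whose constituents straddle the phi = +/- pi boundary.
constituents = [(80.0, 0.10, 3.10), (40.0, -0.05, -3.12), (30.0, 0.60, 3.00)]
graph = build_jet_graph(constituents, jet_axis=(0.10, 3.13))
print(graph["neighbours"])
```

Here an edge `(i, j)` means that particle `j` sends a message to particle `i`; the weights on the incoming edges of each node sum to one by construction.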

Network Architecture and training
The network analysis is carried out using the PyTorch-Geometric [72] package. We consider an IRC-safe EdgeConv operation, modified with the radius graph and the energy-weighted structure, as an example of a non-equivariant architecture. The message-passing operation for this architecture follows from Eq. (2.3), with $\Phi$ being an MLP which takes an input of the form $(H^{(l)}_i, H^{(l)}_j - H^{(l)}_i)$. For ease of reading, we name the three architectures E(2)-EMPN, O(2)-EMPN, and EdgeConv-EMPN. As one of the hallmarks of equivariant architectures is the efficiency of feature extraction with a low number of tunable parameters, we consider a small variant containing about 4k parameters and a large variant containing about 150k parameters. The E(2)-EMPN and O(2)-EMPN have the same structure of MLPs, and the difference arises from the edge input to $\Phi^{(l)}_e$ and the aggregation of the vectors. Since the EdgeConv-EMPN has only one MLP, while the equivariant ones have three, we reduce the dimensionality of the hidden (scalar) node representations in the latter to keep the number of parameters comparable. We apply the message-passing operation twice for all architectures and extract the IRC-safe graph representation after each operation to feed to a classifier MLP. A schematic diagram for the equivariant ones is shown on the right side of figure 1.
For the small parameter size, the equivariant architectures have a 12-dimensional updated scalar node feature after each message operation, while the EdgeConv-EMPN has 16-dimensional updated node features. For $l \in \{1, 2\}$, the MLPs $\Phi^{(l)}_x$ and $\Phi^{(l)}_h$ have two hidden layers with the same dimension as the updated scalar feature dimension. Similarly, the edge MLPs have two hidden layers with the same dimensions as the updated node features. An analogous architecture is repeated for the large parameter case, with the equivariant ones updating 80-dimensional scalar node features and the EdgeConv-EMPN updating 128-dimensional ones. All hidden layers have ReLU activation. Except for the $\Phi_x$'s, which have a one-dimensional output with sigmoid activation, we do not apply any output activation to the MLPs. The graph representation for the E(2)-EMPN and O(2)-EMPN cases is obtained from the two scalar node representations and two vector node representations.
In total, we have three binary classification datasets and six networks, thereby giving us eighteen training cases. We train each network from random initialisation five times for each of these cases, with a batch size of 300 for 200 epochs. The networks are optimised with the Adam [73] optimiser with an initial learning rate of $10^{-3}$. A decay-on-plateau criterion is utilised for the learning rate, triggering a decay by a factor of 0.5 if the validation loss remains stagnant for three consecutive epochs.
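The decay-on-plateau rule can be written out explicitly. The class below is a minimal pure-Python sketch, not the scheduler used in the study: it assumes that "stagnant" means no strict improvement over the best validation loss seen so far and that the counter resets after each decay. PyTorch's `ReduceLROnPlateau` implements the same idea, with its own conventions for patience and thresholds.

```python
class PlateauDecay:
    """Halve the learning rate after `patience` epochs without improvement."""

    def __init__(self, lr=1e-3, factor=0.5, patience=3):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.stale = 0  # consecutive epochs without a new best loss

    def step(self, val_loss):
        """Update the state with one validation loss and return the current rate."""
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr

# Made-up validation losses, for illustration only.
losses = [0.70, 0.65, 0.66, 0.65, 0.67, 0.64, 0.64, 0.64, 0.64]
sched = PlateauDecay()
history = [sched.step(v) for v in losses]
print(history)
```

With these made-up losses the rate is halved once after the third stagnant epoch and again after three further epochs without improvement.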

Results
We extract the predictions on the test dataset from the epoch with the lowest validation loss for each of the five trainings of the eighteen instances. To obtain a summary representation of the receiver operating characteristic (ROC) curve, we extract the background acceptance $\epsilon_B$ at fixed signal efficiency $\epsilon_S$ over the five training instances using a custom implementation. Extracting the mean value of the background acceptance, $\bar{\epsilon}_B$, over these five instances, we plot the ROC curve between $1/\bar{\epsilon}_B$ and $\epsilon_S$ in figure 3. The average of the area under the curve (AUC) and the standard deviation are listed in table 1.
Before comparing the relative differences for the considered architectures, we see that the classification efficiency follows the usual physics intuition. The resonant production of a heavy colour-singlet $Z'$ sets a hard scale, which induces significant changes in the jets, while the non-resonant productions do not have such a hard scale. In the colour-singlet s-channel non-resonant production via the $\gamma$ and $Z$, there is no colour flow from the initial quarks to the final state (even though the $U_v$ quark carries an SM QCD charge). These jets are less QCD-like than those from the non-resonant production via gluons, where colour flows between the initial and final states and which hence closely resembles QCD jet production.
For the large network on the right side of figure 3, the three architectures have virtually overlapping ROCs, with a very minute decrease for the non-equivariant EdgeConv-EMPN. The values of the AUCs in table 1 follow the same trend. This suggests that with sufficiently large parametrisation, the EdgeConv-EMPN can catch the distinguishing features as well as the E(2)- and O(2)-equivariant ones, even though the extracted features are not equivariant. In stark contrast, there is a very noticeable dip in performance for the EdgeConv-EMPN with 4k parameters for all three signal cases compared to the two equivariant architectures, with the E(2)-EMPN having the best classification power for the resonant and electroweak non-resonant processes, while the O(2)-EMPN betters the E(2)-EMPN for the non-resonant production via gluons. Moreover, the best equivariant AUCs in the low-parameter case are not far from their corresponding best ones in the large-parameter case, with the largest difference for the hardest case of distinguishing gluon-induced production of semi-visible jets. This clearly shows that the inductive bias of equivariance in the rapidity-azimuth plane helps optimise feature extraction and reduces the need for many parameters.
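One way to build such a summary curve is sketched below. This is not the custom implementation referred to above: it assumes that higher scores are more signal-like, picks the threshold from the ranked signal scores, and averages the background acceptance over runs before inverting. The function names and the toy scores are illustrative.

```python
def background_acceptance(sig_scores, bkg_scores, eff_s):
    """Background acceptance at the threshold that keeps a fraction eff_s of signal."""
    ranked = sorted(sig_scores, reverse=True)
    n_pass = max(1, round(eff_s * len(ranked)))
    threshold = ranked[n_pass - 1]
    return sum(s >= threshold for s in bkg_scores) / len(bkg_scores)

def mean_inverse_roc(runs, eff_grid):
    """Average eps_B over runs at each eps_S and return (eps_S, 1 / mean eps_B)."""
    curve = []
    for eff in eff_grid:
        mean_eb = sum(background_acceptance(s, b, eff) for s, b in runs) / len(runs)
        curve.append((eff, 1.0 / mean_eb if mean_eb > 0 else float("inf")))
    return curve

# Two toy "training instances" with made-up classifier scores.
runs = [
    ([0.90, 0.80, 0.70, 0.60], [0.75, 0.50, 0.30, 0.10]),
    ([0.95, 0.85, 0.55, 0.45], [0.90, 0.60, 0.50, 0.20]),
]
curve = mean_inverse_roc(runs, eff_grid=[0.5, 1.0])
for eff, inv_eb in curve:
    print(eff, inv_eb)
```

Averaging the acceptance first and inverting afterwards follows the order described in the text; averaging the inverse acceptance instead would give a different curve whenever the runs disagree.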

Summary and Conclusions
The future roadmap of particle physics crucially depends on maximising the discovery potential of the LHC and its high-luminosity phase. Recent developments in optimising searches for new physics using machine learning have demonstrated the enormous opportunity created by less traditional approaches to data analysis. Interfacing such highly adapted strategies with theoretical predictions to make robust statements about parameter measurements, whether related to SM or BSM effects, highlights infrared and collinear safety as a central theme of neural network architecture design, e.g. through energy-weighted message passing. Adding such features to the neural network design can practically increase computational demands, which requires optimisation improvements. In this work, we have considered E(2) equivariance as an avenue to remove computational overhead. In particular, we have considered jet-substructure-based analysis to show that an E(2)-equivariant Graph Neural Network optimises the network's learning capability by directly exploiting the physical symmetry that the data set possesses. As most jet substructure observables deal with rotationally invariant quantities, we also considered equivariance under O(2), which is a subgroup of E(2).
Such advances directly impact physics applications. To highlight this, we have considered a representative (but not exclusive) scenario of semi-visible jets. In theories underpinning this BSM signature, extracting, e.g., dark shower parameters critically depends on an IRC-safe implementation of the tagging algorithm, as the model navigates between visibly hard and invisibly soft final states. This calls for the correct modelling of the QCD null hypothesis to draw theoretically consistent conclusions. The architecture presented in this work provides excellent signal vs. background discrimination with these desired properties. This opens up the possibility of employing the architectures proposed in this work as a new tool for BSM discovery in challenging QCD-dominated final states.
Note that the scalar node features $h_i$ cannot contain $z_i$ owing to the requirement of IRC safety, even though $z_i$ is invariant under E(2). A schematic representation of the equivariant message passing is shown on the left of figure 1.

Figure 3: The network performance is shown for two scenarios: one with a low number of neural network parameters (≈ 4k) and another with a high number of parameters (≈ 150k), for E(2)-EMPN (solid), O(2)-EMPN (dashed) and EdgeConv-EMPN (dotted), for the three cases of the semi-visible jet tagger.

Table 1: The network performance is illustrated for two scenarios: one with a low number of neural network parameters (≈ 4k) and another with a high number of parameters (≈ 150k), for the E(2)-EMPN, O(2)-EMPN, and EdgeConv-EMPN.