Hypergraphs in LHC Phenomenology – The Next Frontier of IRC-Safe Feature Extraction

In this study, we critically evaluate the approximation capabilities of existing infra-red and collinear (IRC) safe feature extraction algorithms, namely Energy Flow Networks (EFNs) and Energy-weighted Message Passing Networks (EMPNs). Our analysis reveals that these algorithms fall short in extracting features from any N-point correlation that is not a power of two, based on the complete basis of IRC safe observables, specifically C-correlators. To address this limitation, we introduce Hypergraph Energy-weighted Message Passing Networks (H-EMPNs), designed to capture any N-point correlation among particles efficiently. Using the case study of top vs. QCD jets, which holds significant information in its 3-point correlations, we demonstrate that H-EMPNs targeting up to N=3 correlations exhibit superior performance compared to EMPNs focusing on up to N=4 correlations within jet constituents.

With the recorded events naturally represented as sets (of variable sizes) of different reconstructed particles or raw detector hits, point clouds are the natural representation of the recorded data, and architectures to process such data efficiently, particularly Graph Neural Networks [41][42][43][44][45][46][47][48], have been used successfully for LHC phenomenology. However, graphs by design do not expose higher-order correlations within the data, concentrating instead on two-particle correlations; the natural generalisation is hypergraphs. This generalisation is shown diagrammatically in figure 1 for a three-prong top jet, where the graph's edges are defined in terms of two particles, while the order-three hyperedges can look into the relevant three-prong structure of the top jet.

Figure 1: Visualisation of the inter-relations of jet constituents as captured by a graph structure (left) and a hypergraph structure with order-three hyperedges (right). In a graph structure, the edges correlate two constituents at a time and are shown as line segments connecting two nodes. Instead, the order-three hyperedges simultaneously link the properties of three jet constituents at a time and are shown as triangles with vertices coinciding with three nodes. Thus, hypergraphs are more expressive structures and can access higher-order correlations amongst jet constituents.

This paper addresses these challenges by introducing Hypergraph Energy-weighted Message Passing Networks (H-EMPNs), designed to extract three-particle correlations better than existing IRC-safe feature extractors. We first examine the universal approximation capabilities of existing infra-red and collinear safe neural network models, namely Energy Flow Networks (EFNs) [31] and Energy-weighted Message Passing Networks (EMPNs) [36], in approximating any IRC safe observable expressible in terms of C-correlators [49, 50] for any general N-body phase space. Finding that EFNs are restricted to N = 1, and that EMPNs have an arguably weak capability for approximating C-correlators with any N ≠ 2^n, we present the H-EMPN as a more robust and versatile model capable of efficiently approximating any general IRC safe observable for any general N. Our method leverages the power of message passing in graphs and hypergraphs to capture higher-order relationships among the data points, thereby providing a more comprehensive feature extraction mechanism.
Restricting ourselves to N = 3 for the top vs QCD jet tagging scenario, where the dominant information lies in the 3-body decay phase space of the top quark, we find that H-EMPNs outperform EMPNs, which look up to N = 4 interparticle correlations, confirming our initial observation. We demonstrate the efficacy of H-EMPNs through empirical tests that showcase the learned graph representations. Furthermore, we discuss the architectural nuances of the H-EMPN, providing insights into its design and training procedures. By doing so, we aim to establish the H-EMPN as a powerful tool for LHC phenomenology, opening new avenues for applications in collider phenomenology. Specifically, in section 2, we discuss the universal approximation of any IRC safe observable by EFNs and EMPNs by taking its correspondence to any generic C-correlator. In section 3, we devise H-EMPNs that can approximate any general C-correlator. The architecture and training details are presented in section 4, while the results are presented in section 5. We conclude in section 6.

Notation
In the following discussions, we are given the set of four-vectors of the jet constituents S = {p_1, p_2, ..., p_{n_part}}, with n_part being the number of constituents. These particles will be indexed via small Roman subscripts, while the number of message-passing operations will be indexed via Greek superscripts. Unless otherwise stated, all summations will be over the set S. The four-vectors are given in terms of the relative hardness z_i = p_T^i / Σ_j p_T^j and the rapidity-azimuth variables p̂_i = (y_i, ϕ_i). Bold-faced symbols like h_i and G denote vector quantities, with their italicised counterparts h_i and G acting as placeholders for a component. As we will consider inference on networks after training rather than the training itself, we will not explicitly write the dependence of the function approximators on the tunable parameters. For instance, g^{(α)}(h_i^{(α−1)}, h_j^{(α−1)}) denotes a MultiLayer Perceptron (MLP) at the α-th message-passing step, where h_i^{(α−1)} and h_j^{(α−1)} correspond to the node features of particles i and j, respectively, updated in the previous operation over S.

Universal Approximation of IRC safe observables
In the present scientific literature, it is well known that MLPs are universal function approximators [51][52][53]. Without going into mathematical rigour, a parametrised function f(x, Θ) of a vector x and tunable parameters Θ is a universal approximator if it can approximate any continuous function up to any arbitrary precision in a compact domain and range. On the other hand, physical observables like momenta or positions live in an underlying metric space, and notions of completeness have long been the bread-and-butter of physicists for studying physical systems. The complete set of IRC safe observables is essential at the LHC and is the subject of our present investigation. Any IRC safe observable O can be expanded in a basis of C-correlators [49] as

O = Σ_N C_N^{f_N} ,    C_N^{f_N} = Σ_{i_1} Σ_{i_2} ... Σ_{i_N} z_{i_1} z_{i_2} ... z_{i_N} f_N(p̂_{i_1}, p̂_{i_2}, ..., p̂_{i_N}) ,    (2.1)

where f_N is symmetric under any permutation of its arguments. Energy Flow Polynomials (EFPs) [50] expand O in a basis of polynomials of the energy using the Stone-Weierstrass approximation theorem. In this section, we examine the approximation capabilities of existing IRC safe neural networks, namely Energy Flow Networks [31] and Energy-weighted Message Passing Networks (EMPNs) [36], comparing their functional form to any arbitrary N in the basis of C-correlators. As the C-correlators are complete, the network-extracted observables are expressible as a linear sum of different C-correlators, and we investigate which terms in the sum (as given in eq. 2.1) are optimally extracted via these observables.
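To make the structure of eq. 2.1 concrete, the following is a minimal sketch (ours, not taken from the references) of a brute-force C-correlator evaluation for a jet given as arrays of z_i and rapidity-azimuth coordinates; the angular function f_N, and the pairwise-distance choice for f_2 in the example, are illustrative stand-ins rather than choices made in this work.

```python
import itertools
import numpy as np

def c_correlator(z, p_hat, f_N, N):
    """Compute C_N^{f_N} = sum over all N-tuples of z_{i1}...z_{iN} f_N(p_{i1}, ..., p_{iN}).

    z     : (n,) array of momentum fractions z_i
    p_hat : (n, 2) array of rapidity-azimuth coordinates (y_i, phi_i)
    f_N   : symmetric function of N angular arguments
    N     : correlation order
    """
    n = len(z)
    total = 0.0
    # Brute-force sum over all ordered N-tuples of constituents (O(n^N) cost).
    for idx in itertools.product(range(n), repeat=N):
        total += np.prod(z[list(idx)]) * f_N(*[p_hat[i] for i in idx])
    return total

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_part = 5
    z = rng.random(n_part); z /= z.sum()
    p_hat = rng.normal(size=(n_part, 2))
    # Illustrative symmetric choice for f_2: the pairwise rapidity-azimuth distance.
    f_2 = lambda a, b: np.hypot(*(a - b))
    print(c_correlator(z, p_hat, f_2, N=2))
```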
Although we rely on the statement of universal approximation theorems, it is important to remember that we strictly talk about the existence of such approximators and do not concentrate on the method of finding such a function. However, presently available gradient-descent algorithms are powerful enough to efficiently find an approximation, given that we have the desired output value on a large enough number of samples. This numerical nature of finding a practical working point in the weight space is one of the significant concerns regarding the interpretability of neural networks in general. Our aim is not to tackle this more difficult problem but to systematically establish the capability of IRC-safe feature extractors based on their ability to approximate different C-correlators. Moreover, we concentrate on the extracted features rather than the final observable approximated by the complete network, i.e. we do not consider the function approximation done by the downstream MLP, which takes the extracted IRC safe features, as this would be akin to a usual multivariate approach with physics-motivated features.
As we will study the general behaviour of the approximated function whose weights are frozen after some training procedure, we will not discuss the explicit dependence of the neural networks on their tuneable parameters in the following discussions.

Energy Flow Networks
Energy Flow Networks are infra-red and collinear safe deep-sets models that learn a per-particle map of each particle's directional coordinates p̂_i and perform an energy-weighted sum to form a fixed-length representation of a constituent set of any cardinality. Without loss of generality for a multi-dimensional representation, a single IRC safe observable can be written as

C_1 = Σ_i z_i g_1(p̂_i) ,

where g_1(p̂_i) represents a parametrised multilayer perceptron. We have specifically denoted the observable as C_1 to make it self-evident that the per-particle map essentially approximates any general C_1^{f_1}. This is because the MLP g_1 is a universal approximator and can approximate any function f_1 suiting a particular objective up to a required precision. In a practical implementation, several related IRC safe observables are approximated, which are fed to a downstream network for classification. The direct implementation of EFNs can, therefore, only extract features expressible in terms of C_1.
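As an illustration of the above, a minimal PyTorch sketch of EFN-style feature extraction is given below, with the per-particle map g_1 realised as a small MLP; the layer sizes and latent dimension are placeholders and not the configuration used later in this paper.

```python
import torch
import torch.nn as nn

class EFNFeatures(nn.Module):
    """IRC-safe EFN-style features: O = sum_i z_i * g1(p_hat_i)."""
    def __init__(self, latent_dim=8):
        super().__init__()
        # Illustrative per-particle map acting on the rapidity-azimuth coordinates.
        self.g1 = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, latent_dim))

    def forward(self, z, p_hat):
        # z: (n_part,) momentum fractions, p_hat: (n_part, 2) rapidity-azimuth coordinates
        per_particle = self.g1(p_hat)                        # (n_part, latent_dim)
        return (z.unsqueeze(-1) * per_particle).sum(dim=0)   # energy-weighted sum -> (latent_dim,)
```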

Energy-weighted Message Passing Networks
An energy-weighted message-passing operation for any general parametrised function ḡ^{(α)} can be written as

h_i^{(α+1)} = Σ_{j∈N[i]} ω_j^{(N[i])} ḡ^{(α+1)}(h_i^{(α)}, h_j^{(α)}) ,

with h_i^{(α)} the node features for the α-th message-passing operation and ω_j^{(N[i])} energy weights dependent on the IRC safe neighbourhood set N[i], with ω_j^{(S)} = z_j for the whole set S. For notational convenience in the following discussions, we will take the sum over the full set of particles in the jet and use z_j in place of ω_j^{(N[i])} without loss of generality. Therefore, we have

h_i^{(α+1)} = Σ_j z_j g^{(α+1)}(h_i^{(α)}, h_j^{(α)}) ,    (2.2)

with the function g^{(α+1)} expressed as a product of a Heaviside step function Θ(∆R_ij < R_0) and the original message function ḡ^{(α+1)} as

g^{(α+1)}(h_i^{(α)}, h_j^{(α)}) = Θ(∆R_ij < R_0) ḡ^{(α+1)}(h_i^{(α)}, h_j^{(α)}) .

Here, ∆R_ij is the Euclidean distance in the rapidity-azimuth plane between particles i and j, while R_0 is the graph's radius. The requirement of symmetry in the arguments of f_2(p̂_i, p̂_j) for C_2^{f_2} and its absence in eq. 2.2 is not a contradiction, as the node features themselves are defined for each particle and hence are not IRC safe observables. In contrast, the IRC safe graph representation will generally be expressible as some linear combination of C_N^{f_N}. We have h_i^{(α)} = h_j^{(α)} for any α ≥ 0 and any two collinear particles i and j. The IRC safe graph representation is obtained as

G^{(L)} = Σ_i z_i h_i^{(L)} ,

after L iterations. As we shall see in the following, the complexity of the extracted features via EMPNs will depend on the value of L.
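One possible dense realisation of eq. 2.2 over all particle pairs, with the radius filter applied as a mask, is sketched below; the tensor layout and the illustrative message function g1 (acting on concatenated coordinates for the first step) are our assumptions, not a prescription of the EMPN implementation.

```python
import torch
import torch.nn as nn

def e_message_pass(h, z, p_hat, g, R0=float("inf")):
    """One energy-weighted message-passing step:
       h_i^{(a+1)} = sum_j Theta(dR_ij < R0) z_j g(h_i^{(a)}, h_j^{(a)})."""
    n = h.shape[0]
    dR = torch.cdist(p_hat, p_hat)                    # (n, n) rapidity-azimuth distances
    mask = (dR < R0).float()                          # Heaviside radius filter
    hi = h.unsqueeze(1).expand(n, n, -1)              # h_i broadcast over j
    hj = h.unsqueeze(0).expand(n, n, -1)              # h_j broadcast over i
    messages = g(torch.cat([hi, hj], dim=-1))         # g(h_i, h_j) for every pair
    weights = (mask * z.unsqueeze(0)).unsqueeze(-1)   # Theta(...) * z_j
    return (weights * messages).sum(dim=1)            # energy-weighted sum over j

# Illustrative message function for the first step, where h^{(0)} = p_hat (two components each).
g1 = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 128))
```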
Explicitly, for L = 1, we have h_i^{(1)} = Σ_j z_j g^{(1)}(p̂_i, p̂_j), which gives

G^{(1)} = Σ_i z_i h_i^{(1)} = Σ_i Σ_j z_i z_j g^{(1)}(p̂_i, p̂_j) .

If symmetry is enforced in g^{(1)}, the approximated observable will contain a C_2^{f_2} term alone. At the same time, a non-symmetric g^{(1)} would also have a C_1^{f_1} component. For L = 2, we have

G^{(2)} = Σ_i Σ_j z_i z_j g^{(2)}( Σ_k z_k g^{(1)}(p̂_i, p̂_k), Σ_l z_l g^{(1)}(p̂_j, p̂_l) ) .    (2.3)

The complicated nature of the arguments makes it difficult to ascertain the exact behaviour of the functional approximation. One expects the universal approximator g^{(2)} to be expressible as a linear combination of C_N^{f_N}'s up to N = 4. However, the presence of four angular arguments and four energy weights hints against the efficient approximation of any C_N^{f_N} for N < 4.
The situation is even more futile for L = 3, with eight angular arguments and eight energy-weighted sums. For a particular L, we have 2^L angular arguments and the same number of energy-weighted sums. Even if one extracts the graph features at each stage α and obtains a concatenated graph representation for each α > 0 up to α = L, we only have the efficient extraction of the 2, 2^2, 2^3, ..., 2^L terms in the sum in eq. 2.1 for any general IRC safe observable O. Although one does not need to go to very high N for jet substructure applications, we already run into a problem for top-tagging, which has valuable information in the 3-prong structure of the energy deposits.

Hypergraph Energy-weighted Message Passing Networks
As discussed above, although powerful, Graph Neural Networks cannot efficiently look into higher-order relational information amongst the nodes. Therefore, in this section, we develop IRC-safe point cloud architectures capable of efficiently extracting higher-point correlations.
A possible way to extend the capabilities of IRC safe feature extraction to higher-point correlations is to directly implement the form of the C-correlators as

C_N = Σ_{i_1} ... Σ_{i_N} z_{i_1} ... z_{i_N} Θ_N(p̂_{i_1}, ..., p̂_{i_N}) Φ_N(p̂_{i_1}, ..., p̂_{i_N}) ,

where the Θ_N are step functions reducing the sums to localised information, and the Φ_N are the neural networks approximating a correlated set of f_N's (as the output of Φ_N, in general, is a vector) for the particular training objective. For IRC safety, both Θ_N and Φ_N should be symmetric under permutations of their arguments. The step function Θ_N for each N essentially endows an N-uniform hypergraph structure onto the constituent set, similar to the radius filter Θ(∆R_ij < R_0) endowing a graph structure for the case of N = 2. Therefore, the concatenated hypergraph representations up to N_max would extract IRC safe features to be fed to a downstream MLP for some task.
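A minimal sketch of this direct construction for N = 3 is given below, assuming (purely for illustration) a Θ_3 that keeps triplets whose maximum pairwise ∆R is below R_0; the symmetric function phi3 stands in for Φ_3 and is left arbitrary.

```python
import itertools
import numpy as np

def hyperedges_order3(p_hat, R0):
    """3-uniform hyperedges: all triplets whose pairwise distances are within R0
       (one possible symmetric choice of the step function Theta_3)."""
    n = len(p_hat)
    edges = []
    for i, j, k in itertools.combinations(range(n), 3):
        d = [np.linalg.norm(p_hat[a] - p_hat[b]) for a, b in ((i, j), (j, k), (i, k))]
        if max(d) < R0:
            edges.append((i, j, k))
    return edges

def c3_direct(z, p_hat, phi3, R0):
    """Direct order-3 extractor: sum over hyperedges of z_i z_j z_k * Phi_3(p_i, p_j, p_k)."""
    return sum(z[i] * z[j] * z[k] * phi3(p_hat[i], p_hat[j], p_hat[k])
               for i, j, k in hyperedges_order3(p_hat, R0))
```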
We do not follow this approach for the following reasons. It is well known [54-57] that automatic feature extraction works best with deeper networks. Depth can only be brought into Φ_N in the above expression, which does nothing to the IRC-safe feature extraction process. The complexity can be increased by increasing N, which increases the width of the network, thereby increasing the model complexity sharply. Although the factorisation of the extracted features into energy and angular components could lead to better all-order behaviour in QCD and is indeed interesting, one needs proper control over the behaviour of the parameter optimisation before we can hope to answer such questions, as demonstrated in reference [33].
Our approach is based on one-particle and two-particle messages to construct a hybrid message-passing neural network that can extract higher-point correlations in a recursive manner. Although it is easily generalisable to higher-point information, we restrict ourselves to up to 3-point interactions due to the increasing complexity.

IRC safety with heterogeneous source and destination embeddings
The basic observation that makes it possible to build a higher-point IRC safe feature extractor is that the requirement of IRC safety for an EMPN remains valid even when the node embeddings for the source, ψ_S(p̂_i), and the destination, ψ_D(p̂_i), are different, as long as they are separately equal in the collinear limit of two particles. If a particle q has two collinear daughters r and s, then we have

ψ_S(p̂_r) = ψ_S(p̂_s) = ψ_S(p̂_q) ,    ψ_D(p̂_r) = ψ_D(p̂_s) = ψ_D(p̂_q) .    (3.1)

More importantly, the embeddings ψ_S and ψ_D need not be functions of just a single particle. They can also be the updated node features of the α-hop IRC safe neighbourhood after α energy-weighted message-passing operations (as given in eq. 2.2). For an IRC safe neighbourhood of i, where a particle q splits into two daughters r and s, we have h_r^{(α)} = h_s^{(α)} = h_q^{(α)}. Let us look closer into the statement that we need not have the same embedding in the arguments of the message function of an energy-weighted message-passing operation, even though the statement logically follows from the non-requirement of symmetry of the message function. Since we have heterogeneous source and destination embeddings, we need to fix a uniform direction for the messages. We will take all messages as originating from a neighbourhood node j ∈ N[i] towards the destination node i. Therefore, we have

H_i^{(α+1,β+1)} = Σ_{j∈N[i]} z_j g^{(α+1,β+1)}(h_{D,i}^{(α)}, h_{S,j}^{(β)}) ,

where h_{D,i}^{(α)} and h_{S,j}^{(β)} are the destination and source node embeddings, respectively, and g^{(α+1,β+1)} is the corresponding message function. As the destination and source node embeddings differ, the message-passing operations are indexed separately with α and β, respectively. The source embedding satisfying h_{S,q}^{(β)} = h_{S,r}^{(β)} = h_{S,s}^{(β)} in the collinear limit makes the updated node representation H_i^{(α+1,β+1)} equal for i ∉ {q, r, s} in the split and unsplit cases, since z_q = z_r + z_s. Explicitly, we have

z_r g^{(α+1,β+1)}(h_{D,i}^{(α)}, h_{S,r}^{(β)}) + z_s g^{(α+1,β+1)}(h_{D,i}^{(α)}, h_{S,s}^{(β)}) = z_q g^{(α+1,β+1)}(h_{D,i}^{(α)}, h_{S,q}^{(β)}) .

Additionally, we require the equality of the destination embeddings h_{D,i}^{(α)} when i ∈ {q, r, s}. However, we can have h_{D,q}^{(α)} ≠ h_{S,q}^{(β)}, as this is not needed to satisfy eq. 3.1. Therefore, H_q^{(α+1,β+1)} = H_r^{(α+1,β+1)} = H_s^{(α+1,β+1)} in the collinear limit of the two daughters r and s of q.
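The collinear-safety argument above can be checked numerically with toy embeddings: splitting a particle's energy weight between two exactly collinear daughters leaves the heterogeneous energy-weighted update of a spectator node unchanged. The functions psi_D, psi_S and g below are arbitrary toy choices used only for this check.

```python
import numpy as np

psi_D = lambda p: np.tanh(p)             # toy destination embedding
psi_S = lambda p: np.sin(p)              # toy source embedding, different from psi_D
g = lambda d, s: np.cos(d) * s + d**2    # toy message function, not symmetric in its arguments

def node_update(z, p, i):
    """H_i = sum_j z_j g(psi_D(p_i), psi_S(p_j)) over the whole set."""
    return sum(zj * g(psi_D(p[i]), psi_S(pj)) for zj, pj in zip(z, p))

z = np.array([0.5, 0.3, 0.2]); p = np.array([0.1, -0.4, 0.7])
before = node_update(z, p, i=0)

# Split the particle with z = 0.2 into two exactly collinear daughters carrying 0.15 and 0.05.
z_split = np.array([0.5, 0.3, 0.15, 0.05]); p_split = np.array([0.1, -0.4, 0.7, 0.7])
after = node_update(z_split, p_split, i=0)

print(np.isclose(before, after))  # True: the update of a spectator node is collinear safe
```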

Building higher point IRC safe feature extractor
It is now straightforward to build an IRC-safe message-passing operation which looks into three-particle correlations. The structure of the two-particle energy-weighted operation is kept the same as eq. 2.2 and is then combined with the destination embedding ψ_D(p̂_i) and the source embedding ψ_S(p̂_i) of the angular coordinates to give an effective three-particle message passing of the form

H_i^{(1,2)} = Σ_j z_j g^{(1,2)}(ψ_D(p̂_i), h_{S,j}^{(1)}) ,    H_i^{(2,1)} = Σ_j z_j g^{(2,1)}(h_{D,i}^{(1)}, ψ_S(p̂_j)) .

As the destination and source embeddings are different, h_{D,i}^{(1)} and h_{S,i}^{(1)} denote node features updated after two separate message-passing operations as given in eq. 2.2 with different message functions g_D^{(1)} and g_S^{(1)}, respectively. The IRC safe features are the graph-level representations obtained after an energy-weighted summed graph readout,

G_3^{(1,2)} = Σ_i z_i H_i^{(1,2)} ,    G_3^{(2,1)} = Σ_i z_i H_i^{(2,1)} .

We shall see in the following discussions that these two representations look at distinct topological structures in the graph; the IRC safe representation for the order-three feature extraction is constructed as a concatenation of these two components, G_3 = G_3^{(1,2)} ⊕ G_3^{(2,1)}.
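A dense sketch of the first of the two order-three updates is given below, combining a per-particle destination embedding with order-two source node features; the 128-dimensional sizes mirror the architecture of section 4, but the tensor layout and names are illustrative. The swapped extractor H^{(2,1)} follows by exchanging the roles of the two arguments.

```python
import torch
import torch.nn as nn

def order3_messages(psi_dst, h_src, z, g12):
    """H_i^{(1,2)} = sum_j z_j g^{(1,2)}(psi_D(p_i), h_{S,j}^{(1)}).

    psi_dst : (n, d1) per-particle destination embeddings psi_D(p_hat_i)
    h_src   : (n, d2) order-two source node features h_{S,j}^{(1)}
    z       : (n,) momentum fractions
    """
    n = psi_dst.shape[0]
    di = psi_dst.unsqueeze(1).expand(n, n, -1)   # psi_D(p_i) broadcast over j
    sj = h_src.unsqueeze(0).expand(n, n, -1)     # h_{S,j}^{(1)} broadcast over i
    msg = g12(torch.cat([di, sj], dim=-1))       # g^{(1,2)} applied to every (i, j) pair
    return (z.view(1, n, 1) * msg).sum(dim=1)    # energy-weighted sum over j

# Illustrative shapes: 128-d per-particle map and 128-d order-two features -> 256-d input.
g12 = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 128))
```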
We can ascertain the behaviour of G_3 by writing down its dependence on the particles' four-vectors:

G_3^{(1,2)} = Σ_i Σ_j z_i z_j g^{(1,2)}( ψ_D(p̂_i), Σ_l z_l g_S^{(1)}(p̂_j, p̂_l) ) ,
G_3^{(2,1)} = Σ_i Σ_j z_i z_j g^{(2,1)}( Σ_l z_l g_D^{(1)}(p̂_i, p̂_l), ψ_S(p̂_j) ) .
Three energy weights and three angular arguments hint that the learning procedure would directly start looking at the three-particle interrelations. It is important to note that any IRC safe observable looking into the n-body phase space, by definition, approaches its (n−1)-body phase-space limit when one particle approaches the soft or collinear limit. In other words, eq. 2.3 will also look into the three-body limit of any four-particle combination when one particle is soft or collinear to any other particle. However, we expect the above form to better extract the three-particle correlations required for tagging three-prong jets like top quarks.
A schematic representation of the feature extraction procedure using different source and destination embeddings of the order-one and order-two operations is shown in figure 2. We focus on the red node whose neighbours are the coloured nodes. On the top left, the per-particle embeddings for the source and destination can only look into the individual particle information. On the right, however, the energy-weighted message-passing operation gathers information from each node's neighbourhood, shown with identically coloured arrows for the coloured nodes. The order-three feature extractors are built by combining the per-particle destination embedding with the order-two source embedding (on the left) and the order-two destination embedding with the per-particle source embedding (on the right).
From a feature extraction perspective, there are two essential differences in comparison to the L = 2 case given in eq. 2.3:
• One argument in both g^{(1,2)} and g^{(2,1)} is an embedding of the angular coordinates of a single particle and hence contains single-particle information. In contrast, both arguments already contain the aggregated neighbourhood information in g^{(2)}.
• The embeddings of the two arguments in g^{(1,2)} and g^{(2,1)} have independently trainable weights, while they are shared for g^{(2)}.
The first difference makes it possible for the function g^{(1,2)} to effectively extract the relation of node i with the updated neighbourhood information of its neighbours (the 2-hop neighbourhood of i), while the function g^{(2,1)} looks at the aggregated node feature of i's immediate neighbourhood together with the individual nodes in the same neighbourhood. The difference is also seen in figure 2, where on the left H_i^{(1,2)} looks into the features of the nodes within each coloured circle together with the red node, while on the right, H_i^{(2,1)} looks into the feature of the aggregated neighbourhood information of the red node together with the individual nodes within its neighbourhood. This essential difference in the feature extraction procedure makes it imperative to devise the two separate message-passing operations, as they need to extract topologically different features within the graph.
It is straightforward to generalise this procedure to any arbitrary N, with substantial flexibility in choosing the extractor, guided by the requirement to divide N into two parts in any possible way. Any feature extractor looking into fewer than N correlations can be used to extract features from topologically distinct paths of length N within the graph. Due to the different combinatorial factors involved, the complexity rises relatively fast with increasing N, and we restrict our discussion to N = 3.
To look into the learnt features of the order-one and order-two feature extractors, we define graph representations as energy-weighted readouts of the destination and source node embeddings,

G_{D,1} = Σ_i z_i ψ_D(p̂_i) ,    G_{S,1} = Σ_i z_i ψ_S(p̂_i) ,    G_{D,2} = Σ_i z_i h_{D,i}^{(1)} ,    G_{S,2} = Σ_i z_i h_{S,i}^{(1)} .    (3.4)

This gives the concatenated graph readout to be fed to the classifier network as

G = G_{D,1} ⊕ G_{S,1} ⊕ G_{D,2} ⊕ G_{S,2} ⊕ G_3^{(1,2)} ⊕ G_3^{(2,1)} .    (3.5)

Network architecture and training
To gauge the properties of the proposed network, we utilise the public top-tagging dataset [58] for a supervised classifier. These events were generated with Pythia 8.2.15 [59] and were showered and hadronised without MPI effects. The showered events additionally underwent a parametrised detector response via Delphes3 [60] with the default ATLAS detector card. The particle-flow objects of the Delphes output were used as inputs to construct anti-k_T [61] jets with R = 0.8 via FastJet [62], with additional requirements of p_T within the range [550, 650] GeV and pseudorapidity |η| < 2. Further, for the signal events, the parton-level information of the top quark and its decay products was used to reject falsely reconstructed jets with the partons falling outside the jet's area. The training data comprises 1.2 million samples, while the test and validation datasets contain 400k samples. The network analysis uses PyTorch-Geometric [63].
We compare the order-three Hypergraph Energy-weighted Message Passing Networks (H-EMPNs) with L = 2 EMPNs. For a reasonable comparison with the H-EMPN, we extract the graph features for the α = 1 and α = 2 stages separately for the EMPN and feed the concatenated graph representation into the classifier network. As shown in figure 3, the IRC-safe feature extractor module of the H-EMPN contains, in total, two per-particle maps for ψ_D and ψ_S, and four energy-weighted edge convolution (E-EdgeConv) operations giving the updated node embeddings h_{D,i}^{(1)}, h_{S,i}^{(1)}, H_i^{(1,2)}, and H_i^{(2,1)}. Including the classifier MLP, which takes in the concatenated graph readout, we have seven MLPs: one for each per-particle map and a message function for each E-EdgeConv operation of the feature extractor module. All these seven MLPs contain two hidden layers with 128 nodes and rectified linear unit activation functions. Except for the classifier network, which has a one-dimensional output with sigmoid activation, all other MLPs have a 128-dimensional output layer with a linear activation function. The per-particle maps take the rapidity-azimuth coordinates p̂_i = (∆y_{iJ}, ∆ϕ_{iJ}) of each constituent i as inputs, with the differences taken from the jet axis defined by the four-vector p_J^µ = Σ_{k=1}^{n_part} p_k^µ. For a destination node embedding h_{D,i} and a source node embedding h_{S,j}, the message function takes in the concatenated vector h_{D,i} ⊕ (h_{D,i} − h_{S,j}) as the input. The EMPN network sequentially applies the E-EdgeConv operation twice to the input graph's node features. The first and the second E-EdgeConv operations have the same MLP architecture, corresponding to the ones that give h_i^{(1)} and h_i^{(2)}, respectively. The classifier MLPs for the EMPN and H-EMPN take in 256- and 768-dimensional concatenated graph representations, respectively. The whole network is trained using the binary cross-entropy loss function.
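A schematic of the MLP blocks and the message-input construction described above is given below, assuming the stated layer sizes; this is a sketch of the description, not the authors' code.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim=128, hidden=128, final_sigmoid=False):
    """Two hidden layers of 128 nodes with ReLU, as described for all seven MLPs;
       the classifier ends in a one-dimensional sigmoid output."""
    layers = [nn.Linear(in_dim, hidden), nn.ReLU(),
              nn.Linear(hidden, hidden), nn.ReLU(),
              nn.Linear(hidden, out_dim)]
    if final_sigmoid:
        layers.append(nn.Sigmoid())
    return nn.Sequential(*layers)

# Message input following the EdgeConv-style convention stated in the text:
# for destination feature h_i and source feature h_j, feed h_i (+) (h_i - h_j).
def message_input(h_i, h_j):
    return torch.cat([h_i, h_i - h_j], dim=-1)

classifier_hempn = mlp(in_dim=768, out_dim=1, final_sigmoid=True)  # 6 x 128 concatenated readouts
classifier_empn = mlp(in_dim=256, out_dim=1, final_sigmoid=True)   # 2 x 128 concatenated readouts
```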
We construct graphs with R_0 ∈ {0.4, 0.5, 0.6} and R_0 → ∞ corresponding to complete graphs. 1 For all these four instances of input graphs, we train each network five times from random initialization for 100 epochs with the Adam optimizer [64] and a learning rate of 0.001. A decay-on-plateau condition is applied to the learning rate, with a decay factor of 0.5 if the validation loss does not decrease for three epochs. The epoch with the minimum validation loss is used for inference for each training instance.
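The training configuration corresponds to standard PyTorch components; a sketch is given below, where model, train_loader and val_loss are placeholders for the networks and dataset described above.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

def train(model, train_loader, val_loss, epochs=100):
    """Training sketch: Adam with lr=1e-3, decay-on-plateau (factor 0.5, patience 3),
       binary cross entropy, keeping the epoch with minimum validation loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=3)
    loss_fn = torch.nn.BCELoss()
    best_val = float("inf")
    for epoch in range(epochs):
        model.train()
        for graphs, labels in train_loader:
            optimizer.zero_grad()
            loss = loss_fn(model(graphs).squeeze(-1), labels.float())
            loss.backward()
            optimizer.step()
        val = val_loss(model)          # validation loss at the end of each epoch
        scheduler.step(val)            # halve the learning rate after three stalled epochs
        if val < best_val:             # keep the epoch with minimum validation loss
            best_val = val
            torch.save(model.state_dict(), "best.pt")
```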

Performance
The receiver operator characteristics (ROC) curve between the signal acceptance ϵ_S and the inverse of the background acceptance, 1/ϵ_B, for the network with the highest area under the ROC curve (AUC) from all training instances is shown in figure 4 for the two models for R_0 = 0.4 and R_0 → ∞. We see that the EMPN has almost overlapping ROC curves for these two radii, while for the H-EMPN there is a noticeable improvement. The areas under the receiver operator curve for the EMPN and H-EMPN for different graph construction radii are tabulated in table 1. The values correspond to the mean over the five training instances, while the errors correspond to the standard deviation. For R_0 = 0.4, the EMPN and H-EMPN have almost identical discrimination power, with AUCs of 0.9823 and 0.9821, respectively. As the radius increases, there is a steady increase for the H-EMPN, while for the EMPN it increases for R_0 = 0.5, stays at a similar value for R_0 = 0.6, and shows a noticeable dip in performance when going to complete graphs with R_0 → ∞. This trend can be understood from the structural difference between the EMPN and H-EMPN and the three-prong nature of the top jet. The EMPN's feature extraction is sequential, with the second E-EdgeConv being fed by the first E-EdgeConv's updated node features. With increasing radius, the feature extraction, which looks at aggregated two-particle correlations, suffers from a redundancy of information, as the first E-EdgeConv already looks at a much larger neighbourhood in the rapidity-azimuth plane. On the other hand, the H-EMPN has a much larger width, with four modules taking the input jet constituents in parallel, which are then combined non-trivially to feed the order-three feature extractors. Even though the order-three extractors take in the updated order-two node features from the full jet in the R_0 → ∞ limit, the combination with the per-particle maps drives the extraction process to look at any relevant three-prong structure in the whole jet. From a purely QCD perspective, the radius R_0 introduces an additional scale beyond the jet radius, and going to the R_0 → ∞ limit removes this dependence from the feature extraction procedure. Although it is possible to define R_0 as a function of IRC safe kinematic information of the jet, which could possibly improve the feature extraction, we do not consider this, as our aim is to move towards theoretically transparent ways of improving feature extraction. Therefore, the H-EMPN can extract features from the full jet more efficiently without being restrained by an arbitrary angular scale R_0.
The AUC paints a global picture of the discrimination power of a binary classifier; however, a classifier is almost always used at a specific working point, depending on the analysis. This practical aspect demands a local figure of merit, which we show via the background rejection 1/ϵ_B, the inverse of the background acceptance ϵ_B, at fixed values of the signal acceptance ϵ_S. The background rejections for the EMPN and H-EMPN for the different graph construction radii are shown for ϵ_S = 0.5 and ϵ_S = 0.3 in tables 2 and 3, respectively. The values are averaged over the five training instances, with the standard deviations shown as errors. Although the trend for the separate models is similar to that of the AUCs, the H-EMPN already starts having a noticeably better background rejection for R_0 = 0.5, even though the EMPN has a nominally higher AUC. As a matter of fact, except for R_0 = 0.4 at ϵ_S = 0.3, the H-EMPN has a numerically higher mean background rejection in all other instances.

Visualizing the latent graph representation
In this section, we investigate whether all the graph representations that the H-EMPN learns can contribute to separating the signal and the background for the final classifier output. We choose the best-performing complete graph, which has the possibility of the highest information redundancy besides being the strongest classifier. Although a relatively high linear correlation with the network output does point to classification using that particular information, it is defined for each component of the graph representation, which dilutes the importance of the underlying vector representations. Moreover, the absence of linear correlation does not imply the lack of discriminatory information, as neural networks can be highly non-linear functions of their inputs.
We look into the separating power of the different graph representations by visualising them in a two-dimensional latent space using the t-distributed Stochastic Neighbourhood Embedding (t-SNE) [65], an unsupervised data representation technique in which high-dimensional data is embedded non-linearly into a lower-dimensional space by maximally conserving the neighbourhood information endowed by a Euclidean metric in both spaces. In other words, nearby points in the high-dimensional representation get mapped to a local neighbourhood in the low-dimensional space. As it is an unsupervised technique, no explicit class information (QCD and top for our case) is fed when learning the map, and the clusters that arise in the low-dimensional space are a consequence of their proximity in the high-dimensional space. Therefore, a well-separated cluster in the lower-dimensional space implies that the higher-dimensional space also has well-separated regions.
We use the implementation of t-SNE in the Scikit-learn [66] package to embed the various 128-dimensional graph representations of the test dataset, evaluated on the best-performing EMPN and H-EMPN for the complete graph, into a two-dimensional space separately for each representation. The class-wise two-dimensional histograms in the embedding space (t_1, t_2) for G^{(1)} and G^{(2)} for the EMPN are shown in figure 5. We can see that both graph representations have relatively distinct regions in (t_1, t_2) for the QCD samples (shown above) and the top samples (shown below). Similarly, the two-dimensional histograms for the graph representations constructed out of the destination and source node embeddings of the H-EMPN are shown in figures 6 and 7, respectively. All these embedded graph representations exhibit clear clustering of the QCD and top samples in different regions, confirming that the H-EMPN has extracted discriminating features from all of its component modules.
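A minimal sketch of this embedding step with Scikit-learn is given below, where graph_repr stands for one of the 128-dimensional graph representations evaluated on the test set; the random seed and the plotting commands are illustrative.

```python
import numpy as np
from sklearn.manifold import TSNE

# graph_repr: (n_jets, 128) array of one learned graph representation on the test set
# labels:     (n_jets,) array with 0 for QCD and 1 for top (used only for plotting, not by t-SNE)
def embed_2d(graph_repr, random_state=0):
    tsne = TSNE(n_components=2, random_state=random_state)
    return tsne.fit_transform(graph_repr)   # (n_jets, 2) coordinates (t1, t2)

# Example of plotting class-wise densities in the embedding plane:
# t = embed_2d(graph_repr)
# plt.hist2d(t[labels == 0, 0], t[labels == 0, 1], bins=100)   # QCD
# plt.hist2d(t[labels == 1, 0], t[labels == 1, 1], bins=100)   # top
```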
Although both the EMPN and the H-EMPN can utilise their constituent graph representations to separate QCD jets from top jets, as seen from these two-dimensional histograms, we reiterate the qualitative differences between the two networks from the QCD perspective. The L = 2 EMPN looks up to order-four relations. In contrast, the H-EMPN in its present guise only looks up to order three: the sequential application of E-EdgeConv (to give H_i^{(1,2)} and H_i^{(2,1)}) takes in the per-particle map with single-particle information rather than an updated node feature with local neighbourhood information in one of its arguments. However, we can see the better ability of the H-EMPN network from its performance studies and its potentially better behaviour in QCD, with its greater efficacy in the absence of an arbitrary angular scale R_0. Since we took the top vs QCD jet classification example, we already knew that there is beneficial information in the three-prong structure within the jet, which prompted our design of the specific H-EMPN. 2 The first observation from the finite R_0 cases is that the H-EMPN architecture is more effective in extracting the order-three relational information from the jets than the L = 2 EMPN. On the other hand, our a priori knowledge of QCD, prompting the design of the H-EMPN, validates that physical inductive biases, or more specifically QCD, have an important role in the design of performant feature extractors. Therefore, rather than throwing a currently "fashionable network" at the problem, designing architectures based on the underlying physical intuition can help push the performance boundaries of deep learning algorithms and gain (at least) a qualitative understanding of their inner workings.

Conclusions
This study delved deep into the intricacies of generalised automatic infrared and collinear safe feature extraction for LHC phenomenology, focusing on the potential of graphs and hypergraphs. Hypergraphs are a generalisation of traditional graphs. While a standard graph consists of vertices connected by edges, each connecting exactly two vertices, a hypergraph allows edges to connect any number of vertices, offering a more flexible way to represent relationships between entities.
First, we explored the behaviour of energy-weighted message passing and its capability to approximate general infrared and collinear safe observables. We highlighted the significance of IRC-safe observables, especially in the context of data interpretation at LHC experiments. The study further explored the capabilities of Energy Flow Networks and Energy-weighted Message Passing Networks, shedding light on their potential and constraints by relating the multilayer perceptrons used within the architectures as universal function approximators to the IRC-safe observables expressible in terms of C-correlators.
To enhance the capabilities of IRC safe feature extraction, especially for higher-point correlations, a novel method was introduced by leveraging the form of the C-correlators and heterogeneous source and destination node embeddings. This approach presents a renewed outlook on feature extraction.
Qualitatively assessing the two models, while the EMPN model provides a robust foundation for feature extraction, the H-EMPN model, designed to look at order-three interparticle relations, demonstrates an edge in performance metrics, even though the EMPN model, via the application of two message-passing operations, could theoretically look up to order four. This suggests that incorporating hypergraph structures in the H-EMPN model offers enhanced capabilities in extracting higher-point correlations, making it a promising tool for more intricate analyses in LHC phenomenology.
Our findings underscore the potential of hypergraph-based methods in enhancing the extraction of IRC-safe features.The research paves the way for further exploration into LHC phenomenology, focusing on optimising feature extraction techniques.


Figure 2 :
Figure 2: The figure shows a schematic representation of the message passing operation to build hybrid order three node representations for Hypergraph Energy-weighted Message Passing Networks by combining order one and two node representations.

Figure 3 :
Figure 3: The architecture of the H-EMPN network utilized in this study is shown as a flowchart.

Figure 4 :
Figure 4: The receiver operator characteristics curve for the best performing network (in terms of AUC) over the five training instances for R_0 = 0.4 and R_0 → ∞ for the EMPN and H-EMPN for different ranges of signal acceptance ϵ_S. On the left, we show 1/ϵ_B on a log scale over the full range of ϵ_S, while on the center and right, it is shown on a linear scale over different regions of ϵ_S to highlight the differences.

Figure 5 :
Figure 5: The two-dimensional histogram of the QCD (above) and top (below) test datasets in the two-dimensional latent space obtained after a t-SNE embedding of the 128-dimensional graph representations G^{(1)} (left) and G^{(2)} (right) of the best performing EMPN trained with complete graphs.

Figure 6 :
Figure 6: The two-dimensional histogram of the QCD (above) and top (below) test datasets in the two-dimensional latent space obtained after a t-SNE embedding of the 128-dimensional graph representations G_{D,1} (left), G_{D,2} (center) and G_3^{(1,2)} (right) of the best performing H-EMPN trained with complete graphs.

Figure 7 :
Figure 7: The two-dimensional histogram of the QCD (above) and top (below) test datasets in the two-dimensional latent space obtained after a t-SNE embedding of the 128-dimensional graph representations G_{S,1} (left), G_{S,2} (center) and G_3^{(2,1)} (right) of the best performing H-EMPN trained with complete graphs.

Table 1 :
EMPN: 0.9823 ± 0.00015 (R_0 = 0.4), 0.9827 ± 0.00009 (R_0 = 0.5), 0.9826 ± 0.00024 (R_0 = 0.6), 0.9825 ± 0.00015 (R_0 → ∞). The table shows the mean AUC for five training instances evaluated on the test dataset of the public top-tagging dataset for different architectures. The errors shown are the standard deviation of the five training instances.

Table 2 :
The table shows the background rejection (1/ϵ_B) at a signal acceptance of 50% for different models. The values correspond to the mean from the evaluation of the test dataset for five different training instances from random initialization, while the standard deviations are shown as errors.

Table 3 :
The table shows the background rejection (1/ϵ_B) at a signal acceptance (ϵ_S) of 30% for different models. The values correspond to the mean from the evaluation of the test dataset for five different training instances from random initialization, while the standard deviations are shown as errors.