Lorentz group equivariant autoencoders

There has been significant work recently in developing machine learning (ML) models in high energy physics (HEP) for tasks such as classification, simulation, and anomaly detection. Often these models are adapted from those designed for datasets in computer vision or natural language processing, which lack inductive biases suited to HEP data, such as equivariance to its inherent symmetries. Such biases have been shown to make models more performant and interpretable, and to reduce the amount of training data needed. To that end, we develop the Lorentz group autoencoder (LGAE), an autoencoder model equivariant with respect to the proper, orthochronous Lorentz group $\mathrm{SO}^+(3,1)$, with a latent space living in the representations of the group. We present our architecture and several experimental results on jets at the LHC and find it outperforms graph and convolutional neural network baseline models on several compression, reconstruction, and anomaly detection metrics. We also demonstrate the advantage of such an equivariant model in analyzing the latent space of the autoencoder, which can improve the explainability of potential anomalies discovered by such ML models.


Introduction
The increasingly large volume of data produced at the CERN Large Hadron Collider (LHC), and the new era of its high-luminosity upgrade, poses a significant computational challenge in high energy physics (HEP). To face this, machine learning (ML) and deep neural networks (DNNs) are becoming powerful and ubiquitous tools for the analysis of particle collisions and their products, such as jets: collimated sprays of particles [1] produced in high energy collisions.
Embedding such inductive biases and symmetries into DNNs can not only improve performance, as demonstrated in the references above, but also improve interpretability and reduce the amount of required training data. Hence, in this paper, we explore another fundamental symmetry of our data: equivariance to Lorentz transformations. Lorentz symmetry has been successfully exploited recently in HEP for jet classification [33][34][35][36], with competitive and even state-of-the-art (SOTA) results. We expand this work to the tasks of data compression and anomaly detection by incorporating the Lorentz symmetry into an autoencoder.
Autoencoders learn to encode and decode input data into a learned latent space, and thus have interesting applications in both data compression [37,38] and anomaly detection [11, 13-17, 39, 40]. Both tasks are particularly relevant for HEP: the former to cope with the storage and processing of the ever-increasing data collected at the LHC, and the latter for model-independent searches for new physics. Incorporating Lorentz equivariance into an autoencoder has the potential not only to increase performance in both regards, but also to provide a more interpretable latent space and reduce training data requirements. To this end, in this paper, we develop a Lorentz-group-equivariant autoencoder (LGAE) and explore its performance and interpretability. We also train alternative architectures, including GNNs and convolutional neural networks (CNNs), with different inherent symmetries, and find that the LGAE outperforms them on reconstruction and anomaly detection tasks.
The principal results of this work demonstrate (i) that the advantage of incorporating Lorentz equivariance extends beyond whole-jet classification to applications with particle-level outputs and (ii) the interpretability of Lorentz-equivariant models. The key challenges overcome in this work include: (i) training an equivariant autoencoder via particle-to-particle and permutation-invariant set-to-set losses (Section 4), (ii) defining a jet-level compression scheme for the latent space (Section 3), and (iii) optimizing the architecture for different tasks, such as reconstruction (Section 4.3) and anomaly detection (Section 4.4).
This paper is structured as follows. In Section 2, we discuss existing work motivating the LGAE. We present the LGAE architecture in Section 3, and discuss experimental results on the reconstruction and anomaly detection of high energy jets in Section 4. We also demonstrate the interpretability of the model, by analyzing its latent space, and its data efficiency relative to baseline models. Finally, we conclude in Section 5.

Related Work
In this section, we briefly review the large body of work on frameworks for equivariant neural networks in Section 2.1, recent progress in Lorentz-equivariant networks in Section 2.2, and finally, applications of autoencoders in HEP in Section 2.3.

Equivariant Neural Networks
A neural network $\mathrm{NN}: V \to W$ is said to be equivariant with respect to a group $G$ if
$$\mathrm{NN}(\rho_V(g)\,x) = \rho_W(g)\,\mathrm{NN}(x) \quad \text{for all } g \in G,\ x \in V,$$
where $\rho_V: G \to \mathrm{GL}(V)$ and $\rho_W: G \to \mathrm{GL}(W)$ are representations of $G$ in the spaces $V$ and $W$ respectively, and $\mathrm{GL}(V)$ is the general linear group of the vector space $V$. The neural network is said to be invariant if $\rho_W$ is a trivial representation, i.e. $\rho_W(g) = \mathbb{1}_W$ for all $g \in G$.
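This condition can be checked numerically. The following sketch is a toy illustration, not the LGAE itself: 2D rotations stand in for the group action, and the two maps are hypothetical examples of an equivariant and an invariant function.

```python
import numpy as np

def rotation(theta):
    """2D rotation matrix: the representation rho(g) of SO(2) on R^2."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

def equivariant_map(x):
    """Scaling commutes with rotations, so this map is equivariant."""
    return 3.0 * x

def invariant_map(x):
    """The norm is unchanged by rotations: rho_W is the trivial representation."""
    return np.linalg.norm(x)

x = np.array([1.0, 2.0])
R = rotation(0.7)

# Equivariance: transform-then-map equals map-then-transform.
assert np.allclose(equivariant_map(R @ x), R @ equivariant_map(x))
# Invariance: the output does not change at all.
assert np.isclose(invariant_map(R @ x), invariant_map(x))
```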
Broadly, equivariance to a group $G$ has been achieved either by extending the translation-equivariant convolutions in CNNs to more general symmetries with appropriately defined learnable filters [48][49][50][51], or by operating in the Fourier space of $G$, or a combination thereof. We employ the Fourier space approach, which uses the set of irreducible representations (irreps) of $G$ as the basis for constructing equivariant maps [43,52,53].

Lorentz Group Equivariant Neural Networks
The Lorentz group $\mathrm{O}(3,1)$ comprises the set of linear transformations between inertial frames with coincident origins. In this paper, we restrict ourselves to the proper, orthochronous Lorentz group $\mathrm{SO}^+(3,1)$, which consists of all Lorentz transformations that preserve the orientation of space and the direction of time. Lorentz symmetry, i.e. invariance under transformations in the Lorentz group, is a fundamental symmetry of the data produced in high-energy particle collisions.
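The property these networks exploit, that Minkowski norms are preserved under $\mathrm{SO}^+(3,1)$ transformations, can be verified numerically. A small sketch with an illustrative 4-momentum and a z-boost (signature $(+,-,-,-)$, the usual convention for particle kinematics):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric

def boost_z(beta):
    """Pure Lorentz boost along z with velocity beta (|beta| < 1)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[0, 0] = L[3, 3] = gamma
    L[0, 3] = L[3, 0] = -gamma * beta
    return L

def minkowski_norm2(p):
    """Squared Minkowski norm p^2 = E^2 - |p_vec|^2."""
    return p @ eta @ p

p = np.array([5.0, 1.0, 2.0, 3.0])  # (E, px, py, pz)
Lp = boost_z(0.6) @ p
# The norm is a Lorentz invariant: unchanged by the boost.
assert np.isclose(minkowski_norm2(p), minkowski_norm2(Lp))
```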
There have been some recent advances in incorporating this symmetry into NNs. The Lorentz group network (LGN) [33] was the first DNN architecture developed to be equivariant to the $\mathrm{SO}^+(3,1)$ group, with an architecture similar to that of a GNN, but operating entirely in the Fourier space on objects in irreps of the Lorentz group, and using tensor products between irreps and Clebsch-Gordan decompositions to introduce non-linearities in the network. More recently, LorentzNet [34,35] uses a similar GNN framework for equivariance, with additional edge features (Minkowski inner products between node features), but restricts itself to only scalar and vector representations of the group. Both networks have been successful in jet classification, with LorentzNet achieving SOTA results in top quark and quark versus gluon classification, further demonstrating the benefit of incorporating physical inductive biases into network architectures. In this work, we build on the LGN framework to not only output scalars (e.g. jet class probabilities), but also encode and reconstruct an input set of particles, under the constraint of Lorentz group equivariance, in an autoencoder-style architecture.

Autoencoders in HEP
An autoencoder is an NN architecture composed of an encoder, which maps the input into a, typically lower dimensional, latent space, and a decoder, which attempts to reconstruct the original input from the latent features. By using a lower dimensional latent space, an autoencoder can learn a smaller representation of data that captures salient properties [54], which can be valuable in HEP for compressing the significant volumes of data collected at the LHC [55]. This learned representation can also be exploited for downstream tasks, such as anomaly detection, where an autoencoder is trained to reconstruct data considered "background" to our signal, with the expectation that it will reconstruct the signal poorly relative to the background. Thus, examining the reconstruction loss of a trained autoencoder may allow the identification of anomalous data. This can be an advantage in searches for new physics, since instead of having to specify a particular signal hypothesis, a broader search can be performed for data incompatible with the background. This approach has been successfully demonstrated in Refs. [12,39,40,56-61].
Furthermore, there are many possible variations to the general autoencoder framework for alternative tasks [62,63], such as variational autoencoders (VAEs) [64], which are popular generative models. To our knowledge, while there have been some recent efforts at GNN-based autoencoder models [16,65], Lorentz equivariance has not yet been explored. In this work, we focus on data compression and anomaly detection, but note that our model can be extended to further applications. (Another approach to anomaly detection directly examines the latent space [14,15].)

LGAE architecture
The LGAE is built out of Lorentz group-equivariant message passing (LMP) layers, which are identical to individual layers in the LGN [33]. We reinterpret them in the framework of message-passing neural networks [66] to highlight the connection to GNNs, and define them in Sec. 3.1. We then describe the encoder and decoder networks in Secs. 3.2 and 3.3, respectively. The LMP layers and LGAE architecture are depicted in Fig. 1. We provide the LGAE code, written in Python using the PyTorch ML framework [67], in Ref. [68].

LMP
LMP layers take as inputs fully-connected graphs, with nodes representing particles and the Minkowski distance between respective node 4-vectors as edge features. Each node $F_i$ is defined by its features, all transforming under a corresponding irrep of the Lorentz group in the canonical basis [69], including at least one 4-vector (transforming under the $(1/2, 1/2)$ representation) representing its 4-momentum. As in Ref. [33], we denote the number of features in each node transforming under the $(m, n)$ irrep as $\tau_{(m,n)}$, referred to as the multiplicity of the $(m, n)$ representation.
The $(t+1)$-th LMP layer operation consists of message-passing between each pair of nodes, with a message $m_{ij}^{(t)}$ to node $i$ from node $j$ (where $j \neq i$) and a self-interaction term $\kappa_i^{(t)}$ defined as
$$m_{ij}^{(t)} = f\big(x_{ij}^2\big)\, h_i^{(t)} \otimes h_j^{(t)}, \qquad (1)$$
$$\kappa_i^{(t)} = h_i^{(t)} \otimes h_i^{(t)}, \qquad (2)$$
where $h_i^{(t)}$ are the node features of node $i$ before the $(t+1)$-th layer, $x_{ij} = x_i - x_j$ is the difference between node four-vectors, $x_{ij}^2$ is the squared Minkowski norm of $x_{ij}$, and $f$ is a learnable, differentiable function acting on Lorentz scalars. A Clebsch-Gordan (CG) decomposition, which reduces the features to direct sums of irreps of $\mathrm{SO}^+(3,1)$, is performed on both terms before concatenating them to produce the message $m_i^{(t)}$ for node $i$:
$$m_i^{(t)} = \mathrm{CG}\big[\kappa_i^{(t)}\big] \oplus \mathrm{CG}\Big[\sum_{j \neq i} m_{ij}^{(t)}\Big], \qquad (3)$$
where the summation over the sending nodes $j$ ensures permutation symmetry, because it treats all other nodes equally.

Finally, this aggregated message is used to update each node's features, such that for all $i \in \{1, \ldots, N_\mathrm{particle}\}$,
$$h_i^{(t+1)} = W^{(t+1)}\big(h_i^{(t)} \oplus m_i^{(t)}\big), \qquad (4)$$
where $W^{(t+1)}$ is a learnable node-wise operator which acts as separate fully-connected linear layers $W^{(t+1)}_{(m,n)}$ on the set of components living within each separate $(m, n)$ representation space, outputting a chosen $\tau^{(t+1)}_{(m,n)}$ number of components per representation. In practice, we then truncate the irreps to a maximum dimension to make computations more tractable.
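The message-passing pattern above can be sketched in a deliberately simplified form: one 4-vector feature per node, the Clebsch-Gordan machinery omitted, the tensor products reduced to scalar weighting, and `f` a toy stand-in for the learnable scalar function. This is an illustration of the structure, not the LGN/LGAE implementation.

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # Minkowski metric

def f(s):
    """Toy stand-in for the learnable function acting on Lorentz scalars."""
    return 1.0 / (1.0 + np.abs(s))

def lmp_messages(x):
    """x: (N, 4) array of node 4-vectors. Returns the aggregated messages."""
    n = len(x)
    m = np.zeros_like(x)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            x_ij = x[i] - x[j]        # difference 4-vector between nodes
            s_ij = x_ij @ eta @ x_ij  # squared Minkowski norm (a Lorentz scalar)
            m[i] += f(s_ij) * x[j]    # scalar-weighted message from node j
    # Summing over all j != i treats every other node equally,
    # so the operation is permutation symmetric.
    return m

x = np.random.default_rng(0).normal(size=(5, 4))
msgs = lmp_messages(x)
```

Permuting the input particles permutes the output messages identically, which is the permutation equivariance the sum over $j$ provides.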

Encoder
The encoder takes as input an $n$-particle cloud, where each particle is associated with a 4-momentum vector and an arbitrary number of scalars representing physical features such as mass, charge, and spin. Each isotypic component is initially transformed to a chosen multiplicity $\tau^{(0)}_{(m,n)}$ via a node-wise operator $W^{(0)}$, conceptually identical to $W^{(t+1)}$ in Eq. (4). The resultant graph is then processed through $N^\mathrm{MP}_\mathrm{E}$ LMP layers, specified by a sequence of multiplicities $\tau^{(t)}_{(m,n)}$, where $\tau^{(t)}_{(m,n)}$ is the multiplicity of the $(m, n)$ representation at the $t$-th layer. Weights are shared across the nodes in a layer to ensure permutation equivariance.
After the final LMP layer, node features are aggregated to the latent space by a component-wise minimum (min), maximum (max), or mean. The min and max operations are performed on the respective Lorentz invariants. We also find empirically that simply concatenating isotypic components across each particle and linearly "mixing" them via a learned matrix, as in Eq. (4), yields interesting performance. Crucially, unlike in Eq. (4), where this operation only happens per particle, the concatenation across particles imposes an ordering and hence breaks the permutation symmetry.
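The contrast between the aggregation schemes can be sketched with a random feature array as a stand-in for the encoder outputs (in the LGAE the min/max act on Lorentz invariants, and the "mix" scheme uses a learned matrix; both are simplified away here): component-wise min, max, and mean across particles are permutation invariant, while concatenation depends on the particle ordering.

```python
import numpy as np

rng = np.random.default_rng(1)
feats = rng.normal(size=(30, 8))   # 30 particles x 8 latent components

agg_min = feats.min(axis=0)        # permutation invariant
agg_max = feats.max(axis=0)        # permutation invariant
agg_mean = feats.mean(axis=0)      # permutation invariant
concat = feats.reshape(-1)         # "mix"-style flattening: order dependent

perm = np.roll(np.arange(30), 7)   # a nontrivial particle reordering
assert np.allclose(feats[perm].min(axis=0), agg_min)
assert np.allclose(feats[perm].mean(axis=0), agg_mean)
# Concatenation changes under reordering: permutation symmetry is broken.
assert not np.array_equal(feats[perm].reshape(-1), concat)
```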

Decoder
The decoder recovers the $n$-particle cloud by acting on the latent space with $n$ independent, learned linear operators, which again mix components living in the same representations. This cloud passes through $N^\mathrm{MP}_\mathrm{D}$ LMP layers, specified by a sequence of multiplicities $\tau^{(t)}_{(m,n)}$, where $\tau^{(t)}_{(m,n)}$ is the multiplicity of the $(m, n)$ representation at the $t$-th LMP layer. After the LMP layers, node features are mixed back to the input representation space, consisting of the $(0, 0)$ scalar components and the $(1/2, 1/2)$ 4-momentum, by applying a linear mixing layer and then truncating all other isotypic components.

Experiments
We experiment with and evaluate the performance of the LGAE and baseline models on reconstruction and anomaly detection for simulated high-momentum jets. We describe the dataset in Sec. 4.1, the different models we consider in Sec. 4.2, the reconstruction and anomaly detection results in Secs. 4.3 and 4.4 respectively, an interpretation of the LGAE latent space in Sec. 4.5, and finally experiments on the data efficiency of the different models in Sec. 4.6.

Dataset
The model is trained to reconstruct 30-particle, high transverse momentum jets produced from gluons and light quarks in the JetNet [70] dataset, obtained using the associated library [71], with jets containing fewer than 30 particles zero-padded. These are collectively referred to as quantum chromodynamics (QCD) jets.
Jets in JetNet are first produced at leading order using MadGraph5_aMC@NLO [72] and decayed and showered with Pythia 8.2 [73]. They are then discretized and smeared to take detector spatial and energy resolution, respectively, into account, with simulated tracking inefficiencies emulating the effects of the CMS and ATLAS trackers and calorimeters, and finally clustered using the anti-$k_\mathrm{T}$ [74] algorithm with distance parameter $R = 0.8$. Further details on the generation and reconstruction process are available in Ref. [20]. The exact smearing parameters and calorimeter granularities used are reported in Table 2 of Ref. [75] and correspond to the "CMS-like" scenario.
We represent the jets as a point cloud of particles, termed a "particle cloud", with the respective 3-momenta, in absolute coordinates, as particle features. In the processing step, each 3-momentum is converted to a 4-momentum $p = (|\vec{p}|, \vec{p})$, where we consider the mass of each particle to be negligible. We use a 60%/20%/20% training/testing/validation split for the total 177,000 jets. For evaluating performance in anomaly detection, we consider jets from JetNet produced by top quarks, W bosons, and Z bosons as our anomalous signals.
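The massless 4-momentum conversion in this preprocessing step can be sketched as follows, assuming an `(N, 3)` array of Cartesian momentum components:

```python
import numpy as np

def to_four_momentum(p3):
    """p3: (N, 3) array of 3-momenta -> (N, 4) massless 4-momenta p = (|p|, p)."""
    energy = np.linalg.norm(p3, axis=1, keepdims=True)  # E = |p| for m = 0
    return np.concatenate([energy, p3], axis=1)

p3 = np.array([[3.0, 4.0, 0.0]])
p4 = to_four_momentum(p3)
# Massless particle: E^2 - |p|^2 = 0
assert np.isclose(p4[0, 0]**2 - np.sum(p4[0, 1:]**2), 0.0)
```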
Finally, we note here that the detector and reconstruction effects in JetNet, and indeed in real data collected at the LHC, break the Lorentz symmetry; hence, Lorentz equivariance is generally an approximate rather than an exact symmetry of HEP data. We assume henceforth that the magnitude of the symmetry breaking is small enough that imposing exact Lorentz equivariance in the LGAE is still advantageous, an assumption supported by the high performance of the LGAE and of classification models such as LorentzNet. Nevertheless, important studies for future work include quantifying this symmetry breaking and considering approximate, as well as exact, symmetries in neural networks.

Models
LGAE model results are presented using both the min-max (LGAE-Min-Max) and "mix" (LGAE-Mix) aggregation schemes for the latent space, which consists of varying numbers of complex Lorentz vectors, corresponding to different compression rates. We compare the LGAE to baseline GNN and CNN autoencoder models, referred to as "GNNAE" and "CNNAE" respectively.
The GNNAE model is composed of fully-connected MPNNs adapted from Ref. [20]. We experiment with two types of encodings: (1) particle-level (GNNAE-PL), as in the PGAE [16] model, which compresses the features per node in the graph but retains the graph structure in the latent space, and (2) jet-level (GNNAE-JL), which averages the features across the nodes to form the latent space, as in the LGAE. Particle-level encodings produce better performance overall for the GNNAE, but the jet-level encoding provides a fairer comparison with the LGAE, which uses a jet-level encoding to achieve a high level of feature compression.
For the CNNAE, which is adapted from Ref. [76], the relative coordinates of each input jet's particle constituents are first discretized into a 40 × 40 grid. The particles are then represented as pixels in an image, with intensities corresponding to $p^\mathrm{rel}_\mathrm{T}$. Multiple particles per jet may correspond to the same pixel, in which case their $p^\mathrm{rel}_\mathrm{T}$'s are summed. The CNNAE has neither Lorentz nor permutation symmetry; however, it does have built-in translation equivariance in $\eta$-$\phi$ space.
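This image preprocessing can be sketched with a 2D histogram; the coordinate extent below is an assumption for illustration (the appropriate range is set by the jet radius), not the exact choice of Ref. [76].

```python
import numpy as np

def jet_image(eta_rel, phi_rel, pt_rel, bins=40, extent=0.8):
    """Discretize particles into a bins x bins grid of summed pT_rel intensities."""
    img, _, _ = np.histogram2d(
        eta_rel, phi_rel,
        bins=bins,
        range=[[-extent, extent], [-extent, extent]],
        weights=pt_rel)  # particles landing in the same pixel are summed
    return img

eta = np.array([0.01, 0.01, -0.5])
phi = np.array([0.01, 0.01, 0.3])
pt = np.array([0.2, 0.3, 0.5])
img = jet_image(eta, phi, pt)
# The two coincident particles share a pixel, with summed intensity 0.5.
assert np.isclose(img.max(), 0.5)
```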
Hyperparameter and training details for all models can be found in Appendix A and Appendix B respectively, and a summary of the relevant symmetries respected by each model is provided in Table 1.The LGAE models are verified to be equivariant to Lorentz boosts and rotations up to numerical error, with details provided in Appendix C.

Reconstruction
We evaluate the performance of the LGAE, GNNAE, and CNNAE models, with the different aggregation schemes discussed, on the reconstruction of the particle and jet features of QCD jets. We consider the relative transverse momentum $p^\mathrm{rel}_\mathrm{T} = p^\mathrm{particle}_\mathrm{T} / p^\mathrm{jet}_\mathrm{T}$ and relative angular coordinates $\eta^\mathrm{rel} = \eta^\mathrm{particle} - \eta^\mathrm{jet}$ and $\phi^\mathrm{rel} = \phi^\mathrm{particle} - \phi^\mathrm{jet} \pmod{2\pi}$ as each particle's features, and the total jet mass, $p_\mathrm{T}$, and $\eta$ as jet features. We define the compression rate as the ratio between the total dimension of the latent space and the number of features in the input space: 30 particles × 3 features per particle = 90.
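Computing these relative features can be sketched as follows, assuming arrays of particle kinematics and scalar jet kinematics; wrapping $\phi^\mathrm{rel}$ into $[-\pi, \pi)$ is one common convention for the mod-$2\pi$ difference.

```python
import numpy as np

def relative_features(pt, eta, phi, jet_pt, jet_eta, jet_phi):
    """Per-particle (pT_rel, eta_rel, phi_rel) relative to the jet axis."""
    pt_rel = pt / jet_pt
    eta_rel = eta - jet_eta
    # wrap the azimuthal difference into [-pi, pi)
    phi_rel = (phi - jet_phi + np.pi) % (2 * np.pi) - np.pi
    return pt_rel, eta_rel, phi_rel

pt_rel, eta_rel, phi_rel = relative_features(
    np.array([50.0]), np.array([0.3]), np.array([3.0]),
    100.0, 0.1, -3.0)
assert np.isclose(pt_rel[0], 0.5)
```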
Figure 2 shows random samples of jets, represented as discrete images in the angular-coordinate plane, reconstructed by the models with similar levels of compression, in comparison to the true jets. Figure 3 shows histograms of the reconstructed features compared to the true distributions. The differences between the two distributions are quantified in Table 2 by calculating the median and interquartile ranges (IQR) of the relative errors between the reconstructed and true features. To calculate the relative errors of particle features for the permutation-invariant LGAE and GNNAE models, particles are matched between the input and output clouds using the Jonker-Volgenant algorithm [77,78], based on the L2 distance between particle features. Due to the discretization of the inputs to the CNNAE, reconstructing individual particle features is not possible; instead, only jet features are shown (calculated by summing each pixel's momentum "4-vector", using the center of the pixel as the angular coordinates and the intensity as the $p^\mathrm{rel}_\mathrm{T}$). We can observe visually in Figure 2 that, of the two permutation-invariant models, while neither is able to reconstruct the jet substructure perfectly, the LGAE-Min-Max outperforms the GNNAE-JL. Perhaps surprisingly, the permutation-symmetry-breaking mix aggregation scheme improves the LGAE in this regard. Both visually in Figure 3 and quantitatively from Tables 2 and 3, we conclude that the LGAE-Mix has the best performance overall, significantly outperforming the GNNAE and CNNAE models at similar compression rates. The LGAE-Min-Max model outperforms the GNNAE-JL in reconstructing all features, and the GNNAE-PL in all but the IQR of the particle angular coordinates.
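The matching step can be sketched with SciPy, whose `linear_sum_assignment` implements a modified Jonker-Volgenant algorithm; the feature arrays here are illustrative stand-ins for true and reconstructed particle features.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment  # modified Jonker-Volgenant

def match_particles(true_feats, reco_feats):
    """Both arrays (N, F). Returns reco_feats reordered to match true_feats."""
    # cost[i, j] = L2 distance between true particle i and reco particle j
    cost = np.linalg.norm(
        true_feats[:, None, :] - reco_feats[None, :, :], axis=-1)
    row, col = linear_sum_assignment(cost)  # minimizes the total distance
    return reco_feats[col]

true = np.array([[0.0, 0.0], [1.0, 1.0]])
reco = np.array([[1.1, 0.9], [0.1, -0.1]])  # same particles, swapped order
matched = match_particles(true, reco)
assert np.allclose(matched, [[0.1, -0.1], [1.1, 0.9]])
```

With the clouds matched, per-particle relative errors can be computed pairwise.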

Anomaly detection
We test the performance of all models as unsupervised anomaly detection algorithms by pre-training them solely on QCD and then using the reconstruction error for the QCD and new signal jets as the discriminating variable. We consider top quark, W boson, and Z boson jets as potential signals and QCD as the "background". We test the Chamfer distance, the energy mover's distance [79] (the earth mover's distance applied to particle clouds), and the MSE between input and output jets as reconstruction errors, and find the Chamfer distance most performant for all graph-based models. For the CNNAE, we use the MSE between the input and reconstructed image as the anomaly score. ROC curves of signal efficiencies ($\varepsilon_s$) versus background efficiencies ($\varepsilon_b$) for individual and combined signals are shown in Fig. 4, and $\varepsilon_s$ values at particular background efficiencies are given in Table 4. We see that in general the permutation equivariant LGAE and GNNAE models outperform the CNNAE, strengthening the case for considering equivariance in neural networks. Furthermore, LGAE models have significantly higher signal efficiencies than GNNAEs and CNNAEs for all signals when rejecting > 90% of the background (the minimum level we typically require in HEP), and LGAE-Mix consistently performs better than LGAE-Min-Max.

Fig. 3 Top: particle momenta ($p^\mathrm{rel}_\mathrm{T}$, $\eta^\mathrm{rel}$, $\phi^\mathrm{rel}$) reconstruction by LGAE-Min-Max ($\tau_{(1/2,1/2)} = 4$, resulting in 56.67% compression), LGAE-Mix ($\tau_{(1/2,1/2)} = 9$, resulting in 61.67% compression), GNNAE-JL (latent dimension 55, resulting in 61.11% compression), and GNNAE-PL (latent dimension 2 × 30, resulting in 66.67% compression). The reconstructions by the CNNAE are not included due to the discrete values of $\eta^\mathrm{rel}$ and $\phi^\mathrm{rel}$, as discussed in the text. Bottom: jet feature (mass, $p_\mathrm{T}$, $\eta$) reconstruction by the four models. For the jet feature reconstruction by the GNNAEs, the particle features in relative coordinates were transformed back to absolute coordinates before plotting. The jet $\phi$ is not shown because it follows a uniform distribution in $(-\pi, \pi]$ and is reconstructed well.

Table 2 Median and IQR of relative errors in particle feature reconstruction of selected LGAE and GNNAE models. In each column, the best-performing latent space per model is italicized, and the best model overall is highlighted in bold.
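A sketch of a Chamfer-distance anomaly score between an input particle cloud and its reconstruction (the exact variant used for a given model may differ, e.g. in normalization): for each particle, take the squared distance to its nearest neighbour in the other cloud, and sum over both directions.

```python
import numpy as np

def chamfer_distance(cloud_a, cloud_b):
    """cloud_a: (N, F), cloud_b: (M, F) particle feature arrays."""
    # d2[i, j] = squared distance between particle i of a and j of b
    d2 = np.sum((cloud_a[:, None, :] - cloud_b[None, :, :])**2, axis=-1)
    # nearest-neighbour terms in both directions
    return d2.min(axis=1).sum() + d2.min(axis=0).sum()

jet_in = np.array([[1.0, 0.0], [0.0, 1.0]])
jet_out = np.array([[1.0, 0.1], [0.0, 0.9]])
score = chamfer_distance(jet_in, jet_out)  # large score -> anomalous jet
assert np.isclose(score, 0.04)  # 2 * (0.01 + 0.01)
```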

Latent space interpretation
The outputs of the LGAE encoder are irreducible representations of the Lorentz group; they consist of a pre-specified number of Lorentz scalars, vectors, and potentially higher-order representations. This implies a significantly more interpretable latent representation of the jets than in traditional autoencoders, as the information distributed across the latent space is now disentangled between the different irreps of the Lorentz group. For example, scalar quantities like the jet mass will necessarily be encoded in the scalars of the latent space, and jet and particle 4-momenta in the vectors.
We demonstrate the latter empirically on the LGAE-Mix model ($\tau_{(1/2,1/2)} = 2$) by looking at correlations between jet 4-momenta and different combinations of latent vector components. Figure 5 shows that, in fact, the jet momentum is encoded in the imaginary component of the sum of the latent vectors.
We can also attempt to understand the anomaly detection performance by looking at the encodings of the training data compared to the anomalous signal. Figure 6 shows the individual and total invariant mass of the latent vectors of sample LGAE models for QCD and top quark, W boson, and Z boson inputs. We observe that, despite the overall similar kinematic properties of the different jet classes, the distributions for the QCD background are significantly different from the signals, indicating that the LGAE learns and encodes the difference in jet substructure (despite substructure observables such as the jet mass not being direct inputs to the network), explaining the high performance in anomaly detection. (We note also that the discontinuities in the top quark and combined-signal LGAE-Min-Max ROCs indicate that, at background efficiencies of $\lesssim 5 \times 10^{-3}$, there are no signal events remaining in the validation dataset.)
Finally, while in this section we showcased simple "brute-force" techniques for interpretability, looking directly at the distributions and correlations of latent features, we hypothesize that such an equivariant latent space would also lend itself effectively to the vast array of existing explainable AI algorithms [80,81], which generically evaluate the contribution of different input and intermediate neuron features to network outputs. We leave a detailed study of this to future work.

Data efficiency
In principle, equivariant neural networks should require less training data to reach high performance, since critical biases of the data, which would otherwise have to be learnt by non-equivariant networks, are already built in. We test this claim by measuring the performance of the best-performing LGAE and CNNAE architectures from Sec. 4.3 trained on varying fractions of the training data.
The median magnitude of the relative errors between the reconstructed and true jet masses for the different models and training fractions is shown in Fig. 7. Each model is trained five times per training fraction, with different random seeds, and evaluated on the same-sized validation dataset; the median of the five models is plotted. We observe that, in agreement with our hypothesis, the LGAE models both maintain their high performance all the way down to training on 1% of the data, while the CNNAE's performance steadily degrades down to a 2% training fraction and then drops sharply.

Conclusion
We develop the Lorentz group autoencoder (LGAE), an autoencoder model equivariant to Lorentz transformations. We argue that incorporating this key inductive bias of high energy physics (HEP) data can have a significant impact on the performance, efficiency, and interpretability of machine learning models in HEP. We apply the LGAE to the tasks of compressing and reconstructing input quantum chromodynamics (QCD) jets, and of identifying anomalous top quark, W boson, and Z boson jets. We report excellent performance in comparison to baseline graph and convolutional neural network autoencoder models, with the LGAE outperforming them on several key metrics. We also demonstrate the LGAE's interpretability, by analyzing the latent spaces of LGAE models for both tasks, and its data efficiency relative to baseline models. The LGAE opens many promising avenues in terms of both performance and model interpretability, with the exploration of new datasets, the magnitude of Lorentz and permutation symmetry breaking due to detector effects, higher-order Lorentz group representations, and the challenges of real-life compression and anomaly detection applications all exciting possibilities for future work.

Fig. 1
Fig. 1 Individual Lorentz group equivariant message passing (LMP) layers are shown on the left, and the LGAE architecture built out of LMPs on the right. Here, MixRep denotes the node-level operator that upsamples the features in each $(m, n)$ representation space to $\tau_{(m,n)}$ channels; it appears as $W$ in Eq. (4).

Fig. 5
Fig. 5 The correlations between the total momentum of the imaginary components in the $\tau_{(1/2,1/2)} = 2$ LGAE-Mix model and the target jet momenta. The Pearson correlation coefficient $r$ is listed above.

Fig. 7
Fig. 7 Median magnitude of relative errors of jet mass reconstruction by LGAE and CNNAE models trained on different fractions of the training data.

Table 1
Summary of the relevant symmetries respected by each model discussed in Sec. 4.

Table 3
Median and IQR of relative errors in jet feature reconstruction by selected LGAE and GNNAE models, along with the CNNAE model. In each column, the best-performing latent space per model is italicized, and the best model overall is highlighted in bold.

Fig. 4
Fig. 4 ROC curves of signal efficiencies ($\varepsilon_s$) versus background efficiencies ($\varepsilon_b$) for individual and combined signals.

Table 4
Anomaly detection metrics for selected LGAE and GNNAE models, along with the CNNAE model. In each column, the best-performing latent space per model is italicized, and the best model overall is highlighted in bold.