Introduction

The Large Hadron Collider beauty Experiment (LHCb) is one of the four large experiments at the proton–proton collider LHC at CERN [1]. It is dedicated to the study of beauty (b) and charm (c) hadron decays, performing high-precision measurements to test the validity of the Standard Model (SM) of particle physics and to identify possible signatures of physics beyond the SM. To push the precision frontier, LHCb needs to record as many heavy-hadron decays as possible. One way to increase that quantity for a given period of data collection is to increase the average number of proton–proton collisions that happen in each event (bunch crossing). During the LHC Run 1 and Run 2 periods, between 2010 and 2018, each LHCb event contained on average around one visible proton–proton collision, producing tens of particles to be reconstructed. The experiment has now undergone its Upgrade I, with the installation of new sub-detectors and new data-collection software that allow the processing of events with around five visible proton–proton collisions each. These will be the conditions for the ongoing Run 3 and for the future Run 4. A decade from now, the Upgrade II [2] of LHCb will prepare the experiment to face another tenfold increase in proton-collision multiplicity [3], to fully exploit the High-Luminosity LHC (HL-LHC) phase during Runs 5 and 6. The approximate expected object multiplicities per event in the different conditions are shown in Table 1. Beyond the upgraded sub-detectors, the much larger event complexity brings unprecedented challenges to LHCb, both for data collection and for the eventual measurements. New strategies need to be devised and implemented to tackle those challenges and hence maximise the future physics reach of the experiment.

Table 1 Approximate average quantities per event for the different LHCb run conditions, as estimated from the simulation used in this work

So far, the entire data flow of the LHCb experiment has been based on an exclusive approach, i.e. it is sufficient for a set of particles to be compatible with a certain type of decay for them to be identified as a signal candidate. While this approach has its merits, its selection process ignores all the remaining particles produced in the collision, which contain important information on the underlying physics process. Exceptions to this exclusive approach are found in flavour-tagging algorithms [4] and isolation studies [5, 6]. However, both cases examine the rest of the event in relation to a specific candidate, e.g. flavour tagging aims to infer the flavour of the heavy hadron associated with a given signal candidate. While technically very challenging, significantly more information could be gained by an inclusive study of all the particles in the event. This would not only add discriminating power to disentangle true signal decays from multiple sources of background, but also allow for the identification and separation of groups of particles corresponding to multiple heavy-hadron decays in the event, all of which can be used for subsequent physics analyses. The gains of such an inclusive approach, compared to the individual study of signal candidates, grow with increasing event complexity, as the larger combinatorics makes it more complicated to identify and isolate signals [5, 6].

The individual study of heavy-hadron decays is also at the core of the LHCb strategy for data collection. The trigger of the experiment aims to discern between events that contain a signal decay and those that do not, by means of a combination of exclusive and partially inclusive [7, 8] particle selections. In previous LHC runs, the disk space available to store the information of the selected events was large enough to allow, in many cases, persisting all the objects in the event. This provided the flexibility to study offline also particles other than the ones composing the signal candidate that triggered the event, a crucial feature for signal-background separation in many analyses and for the study of modes not considered when the trigger selections were designed. The situation is completely different in the HL-LHC era. First of all, the fraction of events containing decays of interest will saturate at around 100%, with each event typically containing several heavy-hadron decays. Second, the event sizes will be much larger than in the past due to the increased particle multiplicity. This implies that the potential datasets to be collected are huge, while the available disk space is limited and imposes tight constraints. A trigger strategy based on selecting events in those conditions necessarily leads to a signal inefficiency, impacting the potential physics reach of the experiment. Consequently, the trigger paradigm needs to shift from deciding “which events are interesting?” to “which parts of the event are interesting?”. Minimising the average event size will directly translate into maximising the number of events LHCb can record. When doing so, the trigger needs to ensure that the relevant particles (those produced in heavy-hadron decays) are amongst those kept for offline analysis, otherwise also impacting the potential physics reach. These problems are already partially present in the current LHCb Upgrade I, as anticipated in Ref. [9]. In preparation, LHCb has developed a framework, named turbo stream [10], that allows the persistency of only part of the event (e.g. the information associated with the identified signal candidates and the set of reconstructed particles associated with the same proton–proton collision). During the ongoing Run 3, for instance, about two-thirds of the recorded events follow this turbo data-processing model. However, at present, there is no nominal strategy in LHCb to systematically identify which parts of the event may be interesting for physics analysis. This is a very complicated task, affected by large particle combinatorics and a huge variability in the types of signal decays.

To tackle the previous challenges, we propose a new algorithm to perform a Deep-learning based Full Event Interpretation (DFEI) at LHCb. This innovative approach, which targets an inclusive analysis of the entire event, represents a shift of paradigm with important applications both at the trigger level and at the offline analysis level. The algorithm takes as input all the reconstructed particles in an event and aims to identify which of them originate from heavy-hadron decays and to reconstruct the hierarchical decay chains through which they were produced. Accomplishing this difficult task leverages some of the most recent developments in the field of machine learning. At the trigger level, DFEI can identify the part of each event which is interesting for physics analyses, allowing the rest of the event to be safely discarded and hence minimising the required storage. As an additional benefit, an automated identification and classification of the decay chains could eventually replace the need for cut-based exclusive selections, which have to be designed and carefully tuned independently for each signal decay type. At the offline analysis level, DFEI can offer a common tool for physicists to identify and classify the different types of backgrounds contributing to a broad spectrum of possible decays of interest. Leveraging the information from all the correlations in the event can enhance the background rejection power in many cases, increasing the precision of future LHCb measurements.

This document describes the conceptualisation, construction, training and performance of the first prototype of the DFEI algorithm. The prototype is specialised for reconstructed charged particles produced in beauty-hadron decays. Extensions to include reconstructed neutral particles and charm-hadron decays can be considered in the future. All the studies are done using simulated datasets that emulate proton–proton collisions in the LHCb Run 3 environment. These datasets have been produced with a custom simulation framework and made publicly available for future benchmarking. The algorithm is based on a composition of Graph Neural Network (GNN) models, designed to handle the complexity of high-multiplicity events in a computationally efficient way. Regarding the paper organisation, the state of the art is first presented in Sect. "Related Work". The development of the DFEI prototype is described in Sect. "Methods", starting with an introduction to GNN models in Sect. "Usage of Graph Neural Networks", followed by the description of the employed dataset in Sect. "Dataset", for which additional details are provided in App. A, the structure of the algorithm in Sect. "Structure of the Algorithm", and finally the training in Sect. "Training". The performance of the algorithm is described in detail in Sect. "Results". In particular, the quality of the reconstruction is first evaluated at the event level, in Sect. "Event-Level Performance", and then at the exclusive-signal level, in Sect. "Decay-Level Performance". A timing study is presented in Sect. "Timing Studies" (with additional details provided in App. B). The results are discussed in Sect. "Discussion" and future prospects are presented in Sect. "Future Work". Finally, the conclusions are summarised in Sect. "Conclusion".

Related Work

Even though the problem addressed in this paper is unique, it shares similarities with a variety of past efforts at the technical and/or scientific level. In this section, we conduct a review of those efforts and place our approach in the context of the field.

The first and so far only use of a machine-learning approach on the full set of reconstructed tracks within LHCb is Ref. [11], where the authors employed a probabilistic model based on decision trees for the inclusive flavour tagging of signal beauty hadrons. The combined processing of all the event information demonstrated better results than a combination of multiple classical flavour-tagging algorithms, each using only a subset of the reconstructed particles in the event. The task of flavour tagging is, however, much simpler than the explicit decay-chain reconstruction attempted by DFEI. Regarding isolation tools, past LHCb efforts [5, 6] are restricted to multivariate classifiers that aim to predict whether individual particles from the rest of the event originate or not from the same heavy-hadron decay as a signal candidate. The decision is based on a combination of features from the signal candidate and the extra particle, fully disregarding any correlation with the other particles in the event. Concerning trigger-oriented applications, the authors of Ref. [12] presented a study of the full information in the event in terms of the activity in the different LHCb sub-detectors, hence at a level prior to the reconstruction of the stable particles that are considered as input by DFEI. Using machine-learning techniques, they successfully managed to predict the number of reconstructible proton–proton collisions per event. They also studied the possible classification of events into those containing (at least) one b-hadron decay and those that do not contain any, but this turned out to be a very complicated task when looking only at the sub-detector activity information.

Regarding other LHC experiments, a type of full event reconstruction is performed in CMS [13] and ATLAS [14] through the particle-flow algorithm. The implementation uses all the final-state particles for a global event description, significantly improving the performance of jet reconstruction with respect to the previous baseline, which used basic geometric cones to cluster particles. To further improve the performance, a GNN-based approach [15] was proposed in CMS, which takes all the particles of an event as input and predicts variables such as the particle identification and the transverse momentum of each particle. While similar to DFEI at a technical level, the particle-flow algorithm does not attempt to explicitly reconstruct the decay chains of all the relevant decays of interest.

The task of decay-chain reconstruction is conceptually close to the hierarchical reconstruction of jets, for which a variety of GNN-based algorithms have been developed [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30]. The ultimate goal of those algorithms, however, is typically to infer quantities of the jet as a whole, for example tagging the flavour of the jet to determine the initiating particle, or reconstructing the jet to infer its kinematics. The jet substructure is only studied to the extent to which it is useful for those purposes. The limitations of those algorithms for the task of reconstructing all the ancestors in particle decay chains are reviewed in detail in Ref. [31].

The effort closest to the one presented in this paper is being conducted at the Belle II experiment, where the FEI algorithm [32] was developed for the exclusive tagging of B decays. This constitutes a similar approach to the one presented in this paper, but with a different goal and in a simpler environment. As Belle II is a hermetic detector situated at an electron–positron collider, the event is a fully reconstructible system with a known initial state and significantly fewer tracks, making the inference task less challenging. In addition, only two species of b hadrons are studied, \(B^0\) and \(B^+\) mesons, while LHCb is interested in all b-hadron species (for example \(B_s\) and \(B_c\) mesons, and \(\Lambda _b\) baryons) as well as in c-hadron decays. From a probabilistic point of view, the FEI algorithm at Belle II is based on a fixed set of boosted-decision-tree classifiers, one for each considered decay type. This approach would be unfeasible at LHCb, given the much larger variability in terms of signal decay topologies, further compounded by the fact that a fraction of the particles produced in the decays may fall outside the LHCb geometrical acceptance, and hence not be reconstructed in the detector. Recently, an extension of the FEI algorithm based on GNNs was proposed [31, 33], showing a better performance than the previous implementation. This resembles the approach presented in this paper, but in a very different environment, as discussed above.

As exemplified by the previous efforts, GNNs have become popular replacements for other machine-learning algorithms within particle physics experiments [34, 35], as they can naturally capture the structure and spatial sparsity of the problem. A challenge, however, is the performance of GNNs in deployment, for example in real-time computing for trigger purposes. Achieving fast inference with GNNs requires sparse operations and standard protocols for representing such operations, so that the networks can be automatically optimised. This is a matter of broad interest and front-line research. Very recently, there have been multiple successful efforts in this direction within other CERN experiments [36,37,38,39,40,41,42], for example by reducing the complexity of the networks and by using FPGAs or GPUs as hardware accelerators.

Methods

Usage of Graph Neural Networks

The usage of machine learning, and especially of neural networks, in particle physics has been growing exponentially over the last decade [43]. The main motivation to explore new and increasingly complex machine-learning techniques is to optimally incorporate the structure of the underlying problem into the model itself. This includes handling variable input sizes, representing different types of connections between inputs and embedding invariances into the architecture. Graph Neural Networks are a class of neural networks built around the concept of a graph, which is an unordered and variable-sized collection of nodes (\(v\in V\)), edges connecting those nodes (\(e\in E\)), and possibly a vector of graph-level features (\(\textbf{u}\)). The relations between the nodes are encoded in a high-dimensional latent space, allowing for a more complete description of the data. This architecture is especially well suited to problems with sparse connections and invariance under input permutation, as is the case for the set of reconstructed particles in a collision event.

In general, GNNs implement “graph-to-graph” transformations, by the application of multiple layers that operate on the graph constituents. At each layer, input vectors of features at the node, edge and/or graph level are used and returned to the next layer, with the output of the last layer fulfilling the goal of a certain task. This work is based on the usage of message-passing GNNs [44], in which the information is propagated through the graph at each layer by exchanging information between adjacent nodes. Specifically, we use the so-called full GN block in Ref. [45], depicted in Fig. 1. This block is composed of three feature-update functions, \(\phi ^v\), \(\phi ^e\) and \(\phi ^u\), and three information-aggregation functions, \(\rho ^{e\rightarrow v}\), \(\rho ^{e\rightarrow u}\) and \(\rho ^{v\rightarrow u}\). Each of the three update functions is implemented by a multilayer perceptron (MLP), and the aggregation functions are element-wise summations.

Fig. 1 Graph processing block at each message-passing step, as presented in Ref. [45]

These blocks are applied multiple times, each returning a new representation of the graph. In the final layer, each output element (nodes, edges or the entire graph) uses a sigmoid activation for binary classification or a softmax activation for multi-class classification. The interpretation of the output is further described in Sect. "Structure of the Algorithm".
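As an illustration, the snippet below sketches such a full GN block using the graph_nets and sonnet libraries (the libraries also used for the DFEI modules in Sect. "Structure of the Algorithm"); the MLP sizes and the number of message-passing steps are purely illustrative and do not correspond to the DFEI configuration of Table 3.

```python
# Minimal sketch of a "full GN block" (Fig. 1) built with the graph_nets and
# sonnet libraries. The MLP sizes and the number of message-passing steps are
# illustrative only and do not correspond to the DFEI configuration (Table 3).
import sonnet as snt
import graph_nets as gn

def make_mlp():
    # One independent MLP per update function (phi^e, phi^v, phi^u).
    return snt.nets.MLP([64, 64], activate_final=True)

# Full GN block: edge, node and global updates; the default aggregation
# functions (rho) are element-wise summations over the incoming elements.
gn_block = gn.modules.GraphNetwork(
    edge_model_fn=make_mlp,
    node_model_fn=make_mlp,
    global_model_fn=make_mlp)

def message_passing(graph, num_steps=3):
    """graph: a gn.graphs.GraphsTuple holding node, edge and global features."""
    for _ in range(num_steps):
        graph = gn_block(graph)  # each application returns a new graph representation
    return graph
```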

Dataset

The DFEI prototype is trained on simulated data. Since the LHCb simulation samples are restricted to internal access by collaboration members, and there is no publicly available dataset that fully captures the essence of the problem at hand, we have created a new simulation environment and produced datasets which we have made publicly available [46]. The datasets are generated with PYTHIA8 [47] and EvtGen [48], replicating the particle-collision conditions expected for LHCb Run 3. In addition, an approximate emulation of the LHCb detection and reconstruction effects is applied, as described in App. A. In the generated dataset, each event is required to contain at least one b-hadron, which is subsequently allowed to decay freely through any of the standard decay modes present in PYTHIA8. In these conditions, around 40% of the events contain more than one b-hadron decay within the LHCb acceptance, and the maximum number of b-hadrons observed in an event is five. All the studies presented in this paper refer only to reconstructed particles that have been produced inside the LHCb geometrical acceptance and in the Vertex Locator region (as defined in App. A). Other particles are not considered, which also implies that they are not included as part of the ground-truth heavy-hadron decay chains.
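As a purely illustrative example, the heavily simplified generation loop below uses the PYTHIA8 Python bindings to produce inelastic proton–proton collisions at the Run 3 energy and keep events containing at least one b-hadron; the published dataset [46] additionally relies on EvtGen for the heavy-hadron decays and on the LHCb detection and reconstruction emulation of App. A, none of which is reproduced here, and the b-hadron identification via PDG codes is deliberately simplified.

```python
# Simplified sketch: generate inelastic pp collisions at the Run 3 energy with
# PYTHIA8 and keep only events that contain at least one b-hadron. The EvtGen
# decays and the LHCb detection/reconstruction emulation (App. A) used for the
# published dataset [46] are not reproduced here.
import pythia8

pythia = pythia8.Pythia()
pythia.readString("Beams:eCM = 13600.")  # Run 3 centre-of-mass energy, in GeV
pythia.readString("SoftQCD:all = on")    # inclusive soft-QCD pp collisions
pythia.init()

def contains_b_hadron(event):
    # Simplified check: open-beauty hadrons have PDG codes 5xx (mesons)
    # or 5xxx (baryons); excited states are ignored in this sketch.
    for i in range(event.size()):
        pid = abs(event[i].id())
        if 500 <= pid < 600 or 5000 <= pid < 6000:
            return True
    return False

n_kept, n_target = 0, 100
while n_kept < n_target:
    if pythia.next() and contains_b_hadron(pythia.event):
        n_kept += 1  # here the event record would be processed and stored
```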

A total of 100,000 simulated events have been used to develop this first prototype of the DFEI algorithm. They are divided into: training dataset (40,000 events), test dataset (10,000 events) and evaluation dataset (50,000 events). In addition to this inclusive dataset, several other smaller samples (of a few thousand events each) have also been generated simulating specific signal decay types. These decay types have been chosen to be representative of the most common signal topologies studied in physics analyses at LHCb, and are used to evaluate the performance of DFEI focused on typical use cases. These samples contain only events in which all the particles originating from each of the considered exclusive decays have been produced inside the LHCb geometrical acceptance and in the Vertex Locator region.

The input features used in the DFEI GNN modules are described in the following; a sketch of how some of them can be computed is given after the list. Regarding geometrical variables, a cartesian right-handed coordinate system is adopted, with the z axis along the beam, the y axis pointing upwards and the x axis horizontal.

  • Node variables:

    • Transverse momentum (\(p_T\)): component of the three-momentum transverse to the beamline.

    • Impact parameter (IP) with respect to the associated primary vertex (PV): distance of closest approach between the particle trajectory and its associated PV (proton–proton collision point), the associated PV being defined as the one with the smallest IP for the given particle amongst all the primary vertices in the event.

    • Pseudorapidity (\(\eta\)): spatial coordinate describing the angle of a particle relative to the beam axis, computed as  \(\eta =\textrm{arctanh}(p_z/\Vert \textbf{p}\Vert )\).

    • Charge (q): since only charged reconstructed particles are considered, the charge can only take the values 1 or -1.

    • \(O_x\),  \(O_y\),  \(O_z\): cartesian coordinates of the origin point of the particle.

    • \(p_x\),  \(p_y\),  \(p_z\): cartesian coordinates of the three-momentum of the particle.

    • \(PV_x\),  \(PV_y\),  \(PV_z\): cartesian coordinates of the position of the associated primary vertex.

  • Edge variables:

    • Opening angle (\(\theta\)): angle between the three-momentum directions of the two particles.

    • Momentum-transverse distance (\(d_{\perp \textbf{P}}\)): distance between the origin point of the two particles projected onto a plane which is transverse to the combined three momentum of the two particles.

    • Distance along the beam axis (\(\Delta _z\)): difference between the z-coordinate of the origin points of the two particles.

    • FromSamePV: Boolean variable indicating whether the two particles share the same associated primary vertex.

    • IsSelfLoop: Boolean variable indicating whether the edge connects a particle with itself (self-loop) or two different particles.
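The sketch below illustrates how some of these node and edge features can be computed from the origin point, three-momentum and associated primary-vertex position of the particles, under a straight-line approximation of the trajectories; all variable names are placeholders for illustration.

```python
# Illustrative computation of some node and edge features listed above, for
# particles described by their origin point, three-momentum and associated
# primary vertex (all 3-vectors, numpy arrays). Straight-line trajectories
# are assumed; names are placeholders for illustration.
import numpy as np

def node_features(p, origin, pv):
    pt = np.hypot(p[0], p[1])                      # transverse momentum
    eta = np.arctanh(p[2] / np.linalg.norm(p))     # pseudorapidity
    u = p / np.linalg.norm(p)
    ip = np.linalg.norm(np.cross(pv - origin, u))  # impact parameter w.r.t. the PV
    return pt, eta, ip

def edge_features(p1, o1, pv1, p2, o2, pv2):
    cos_theta = np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))  # opening angle
    n = (p1 + p2) / np.linalg.norm(p1 + p2)           # combined momentum direction
    d = o2 - o1
    d_perp = np.linalg.norm(d - np.dot(d, n) * n)     # momentum-transverse distance
    delta_z = o2[2] - o1[2]                           # distance along the beam axis
    from_same_pv = bool(np.allclose(pv1, pv2))        # FromSamePV
    return theta, d_perp, delta_z, from_same_pv
```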

Structure of the Algorithm

A full-sized event graph with all the necessary features can grow quite large, possibly exceeding the available computing resources in online applications. To improve the scalability of the algorithm, a sequential approach is adopted: several event pre-filtering steps are applied before the decay-chain reconstruction is performed, with tunable thresholds depending on the available resources. It should be noted that the GNN-based FEI algorithm used at Belle II relies on a dense network, which would not scale well to LHC conditions. Each collision event is transformed into a graph, where the charged reconstructed particles are represented as nodes and the relations between them are represented as edges. To reduce the graph size and the time needed to build it, edges are only established between particles that either share the same associated primary vertex or have an opening angle smaller than a given threshold. Requiring that the edge selection keeps 99% of the connections between particles that originate from the same b-hadron decay corresponds to choosing a threshold value for \(\theta\) of 0.26 rad. This requirement removes around 11% of all the other connections. Further tuning of this parameter goes beyond the scope of this paper, which privileges a loose preselection in order not to compromise the subsequent performance of the algorithm.
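A minimal sketch of this edge-building rule is given below; the particle container and attribute names are placeholders for illustration.

```python
# Sketch of the edge-building rule described above: an (undirected) relation is
# kept between two particles if they share the associated primary vertex or if
# their opening angle is below the chosen threshold of 0.26 rad. The particle
# attributes (.momentum, .pv_id) are placeholders for illustration.
import itertools
import numpy as np

THETA_MAX = 0.26  # rad, tuned to keep ~99% of same-b-hadron connections

def opening_angle(p1, p2):
    c = np.dot(p1, p2) / (np.linalg.norm(p1) * np.linalg.norm(p2))
    return np.arccos(np.clip(c, -1.0, 1.0))

def build_edges(particles):
    senders, receivers = [], []
    for i, j in itertools.combinations(range(len(particles)), 2):
        same_pv = particles[i].pv_id == particles[j].pv_id
        small_angle = opening_angle(particles[i].momentum,
                                    particles[j].momentum) < THETA_MAX
        if same_pv or small_angle:
            senders += [i, j]    # undirected relation encoded as two
            receivers += [j, i]  # directed edges for the GNN
    return senders, receivers
```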

The input graph is passed subsequently through three GNN modules, built using the graph_nets library [45]. The modules are schematically represented in Fig. 2 and described in the following. The input features used by each module are specified in Table 2.

Fig. 2 Schematic representation of the processing of an event by the algorithm. Green (red) graph nodes represent particles originating in the decay chain of a b-hadron (from the rest of the event). The reconstructed ancestors are represented in blue

Table 2 Input variables used by each of the DFEI modules

  • Node pruning (NP). The first GNN module has the goal of removing most of the particles (nodes) that have not been produced in the decay of any b-hadron. It mostly exploits the fact that particles produced in the decay of a b-hadron typically have large IP and \(p_T\) values. Since the prediction for each node is in this case driven mainly by that node's own features, self-loop connections are included in the graphs. The model is trained using a binary cross-entropy loss function to predict whether a node originates from a beauty hadron or not. Nodes with an output score below a certain threshold are removed from the graph.

  • Edge pruning (EP). The output graph of the previous step still has a large number of edges, which are further reduced by a second GNN module. This module aims to remove edges between particles that do not share the same beauty-hadron ancestor. Amongst other relations, it exploits the fact that particles coming from the same b-hadron decay tend to be closer in space and their three-momenta tend to form a small opening angle. The model is trained using a binary cross-entropy loss function to predict whether an edge connects two particles from the same beauty-hadron decay. Edges with an output score below a certain threshold are removed from the graph.

  • Lowest common ancestor inference (LCAI). Finally, a third GNN module takes the output of the previous module and aims at inferring the so-called “lowest common ancestor” (LCA) of each pair of particles (a technique similar to the recently proposed LCA-matrix reconstruction for the Belle II experiment [33]). The limited coverage of the LHCb geometrical acceptance, and the fact that only charged reconstructed particles are considered in this prototype, imply that a large fraction of the decay chains are only partially reconstructible. To circumvent this limitation, the target decay chains for this prototype are not the ones output by the PYTHIA8 simulation but a “topological” version of them, constructed from the separable decay vertices in the decay chain. In practice, this amounts to a transformation of the ground-truth decay chain, removing the ancestors that either correspond to very-short-lived resonances or do not have enough charged-particle descendants to allow the formation of a vertex. From a technical perspective, the GNN module performs a multi-class classification on the edges. The model is trained using a multi-class cross-entropy loss function, outputting a score associated with the “topological” LCA relation between the two connected particles, e.g. particles that share an ancestor at the lowest level have a 1st-order LCA (class-1), particles that share an ancestor at the next-to-lowest level have a 2nd-order LCA (class-2), etc. The fraction of edges with a ground-truth order larger than 3 in the simulation is very small, so the target classes considered are class-1, class-2 and class-3. In addition, a class with an LCA value of 0 is included (class-0), to identify the case in which the two particles do not originate from the same decay chain. As a side product of the addition of this last class, the LCAI provides a final step of node filtering, by allowing the removal of fully disconnected particles (those whose edges are all predicted to have an LCA value of 0).

Each of the previous modules uses independent MLPs for the node-, edge- and global-update functions introduced in Sect. "Usage of Graph Neural Networks". Each MLP is composed of a certain number of layers, all of which have the same latent size. The number of GN-block iterations is also configured separately for each module. The hyperparameters chosen for this prototype are listed in Table 3.

The output of the DFEI processing chain can be directly translated into a set of selected charged reconstructed particles and their inferred ancestors, with the predicted hierarchical relations amongst them.
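The sketch below illustrates, in a simplified form, how the sequential pipeline output can be turned into candidate decay chains: for brevity the module outputs are assumed to be precomputed score dictionaries, whereas in DFEI each module runs on the graph produced by the previous one; all names and thresholds are placeholders.

```python
# Simplified sketch of turning the DFEI module outputs into candidate decay
# chains. For brevity the NP/EP scores and LCA classes are assumed to be
# precomputed, whereas in DFEI each module runs on the output graph of the
# previous one. Names and thresholds are placeholders for illustration.
import networkx as nx

def candidate_decay_chains(nodes, edges, np_scores, ep_scores, lca_classes,
                           np_threshold=0.5, ep_threshold=0.5):
    # 1) Node pruning: keep nodes with a high "from-a-b-hadron" score.
    kept_nodes = {n for n in nodes if np_scores[n] > np_threshold}
    # 2) Edge pruning: keep edges between surviving nodes with a high score.
    kept_edges = [(i, j) for (i, j) in edges
                  if i in kept_nodes and j in kept_nodes
                  and ep_scores[(i, j)] > ep_threshold]
    # 3) LCA inference and grouping: connected components of the graph made of
    #    edges with a non-zero predicted LCA class form the candidate decay
    #    chains; fully disconnected particles are implicitly removed.
    g = nx.Graph()
    g.add_nodes_from(kept_nodes)
    g.add_edges_from([(i, j) for (i, j) in kept_edges if lca_classes[(i, j)] > 0])
    return [chain for chain in nx.connected_components(g) if len(chain) > 1]
```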

Training

The training is done in stages, following the algorithm sequence. Each model is trained in a supervised way, using a weighted cross-entropy as the loss function, where the weights (corresponding to the inverse of the number of elements in each true class) compensate for the class imbalance present in the dataset. The minimisation is done using the Adam optimiser with the hyperparameter configuration reported in Table 3.
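A minimal TensorFlow-based sketch of such a class-balanced loss is shown below (TensorFlow is the natural choice since graph_nets builds on it); the learning rate is a placeholder, and the actual hyperparameters are those of Table 3.

```python
# Minimal sketch of the class-balancing weights described above: each class is
# weighted by the inverse of its population in the training set, so that rare
# classes contribute to the loss as much as abundant ones. The learning rate
# is a placeholder; the actual hyperparameters are listed in Table 3.
import numpy as np
import tensorflow as tf

def class_weights(labels, num_classes):
    counts = np.bincount(labels, minlength=num_classes).astype(np.float32)
    return 1.0 / np.maximum(counts, 1.0)   # inverse of the per-class counts

def weighted_cross_entropy(logits, labels, weights):
    per_element = tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits)
    return tf.reduce_mean(tf.gather(weights, labels) * per_element)

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # illustrative value
```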

Table 3 Hyperparameters used in the construction and training of the different GNN modules

Thresholds on the output scores of the NP and EP models are defined as those resulting in a \(\sim 99\%\) efficiency of selecting the desired nodes and edges, respectively. This loose requirement is chosen to minimise the potential negative impact on the performance of the subsequent steps. The working point corresponds to a \(\sim 70\%\) background rejection power for nodes from the NP algorithm and a \(\sim 68\%\) background rejection power for edges from the EP algorithm. In this setup, the ROC AUC is 0.977 for the NP module and 0.974 for the EP module. A consistent performance is observed between the training and test samples, showing no overtraining of these modules. The average reduction of the total event size after each processing step is shown in Table 4.
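For illustration, such working points could be determined with a sketch like the following, which picks the loosest score threshold that keeps the target signal efficiency and reads off the corresponding background rejection; the function and variable names are placeholders.

```python
# Sketch of choosing the NP/EP working points: pick the score threshold that
# keeps ~99% of the true ("signal") nodes or edges and read off the resulting
# background rejection. Names are placeholders for illustration.
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

def working_point(y_true, y_score, target_efficiency=0.99):
    fpr, tpr, thresholds = roc_curve(y_true, y_score)
    idx = np.searchsorted(tpr, target_efficiency)  # first point with tpr >= target
    return thresholds[idx], tpr[idx], 1.0 - fpr[idx]

# auc = roc_auc_score(y_true, y_score)   # e.g. 0.977 (NP) and 0.974 (EP) here
```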

Table 4 Cumulative average efficiencies on the total number of nodes and edges in the graph after each pre-filtering step, illustrating the graph reduction power achieved in each case

The training of the LCAI module requires significantly more training iterations than the previous steps, given the much higher complexity of the task. A certain level of overtraining is found for the least populated classes, and the training is stopped once the average classification accuracy on the test sample reaches a plateau. Since the goal of this paper is to demonstrate the feasibility of the approach by presenting a first working prototype, rather than to obtain the maximum possible performance, improvements in the training are left as future work.

Results

In this section, the performance of the current DFEI prototype is described, both at an event level (relevant for trigger) and at an individual-decay-chain level (relevant for trigger and offline analysis).

Event-Level Performance

Different metrics are defined and evaluated in the following to characterise the performance of DFEI at the event level, from multiple perspectives.

Event-size-reduction capabilities. Three different quantities are studied as a function of the particle multiplicity per event: the efficiency of selecting particles from a b-hadron (\(H_b\)) decay, the efficiency of rejecting particles from the rest of the event (background), and the total number of selected particles in the event. The obtained values are shown in Figs. 3 and 4. The average efficiency for selecting particles truly produced in b-hadron decays is 94%, and the average background rejection power is 96%. The selection efficiency for particles from b-hadron decays is found to be independent of the total number of particles in the event. The average number of selected particles per event is \(\sim\)10, from an initial number of \(\sim\)140. A good event reduction is obtained irrespective of the number of particles originating from b-hadron decays per event, as demonstrated by the linear behaviour of the confusion matrix presented in Fig. 4. For the set of selected particles per event, an average purity of 60% is found, defined as the ratio of the number of selected particles that truly originate from b-hadron decays to the total number of selected particles.
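The per-event definitions of these quantities are summarised in the short sketch below, where the particle sets are placeholders for illustration.

```python
# Per-event definitions of the metrics quoted above, for the set of particles
# selected by DFEI, the set truly coming from b-hadron decays and the full set
# of reconstructed charged particles. The inputs are Python sets of particle
# identifiers and are placeholders for illustration.
def event_metrics(selected, from_b, all_particles):
    efficiency = len(selected & from_b) / len(from_b)    # signal efficiency
    rest = all_particles - from_b
    rejection = 1.0 - len(selected & rest) / len(rest)   # background rejection
    purity = len(selected & from_b) / len(selected) if selected else 0.0
    return efficiency, rejection, purity
```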

Fig. 3 Average particle-selection efficiency as a function of the total number of particles per event, shown separately for (blue) particles originating from a b-hadron decay and (red) particles from the rest of the event

Fig. 4 Confusion matrix for the true vs. predicted number of particles from b-hadron decays per event, computed in terms of percentages normalised for each row (true value) and shown for the square subregion corresponding to a number of particles between 2 and 16

Quality of the decay-chain reconstruction. Apart from helping to suppress background in offline analysis, the ability to accurately reconstruct and classify the decay chains in an event can allow DFEI to bring a further level of automation to the LHCb trigger, as introduced in Sect. "Introduction".

A first metric that can serve to characterise the overall understanding of the event in this regard, and be used for benchmarking purposes, is the fraction of events in which DFEI achieves a perfect event reconstruction (PER). For an event to fulfil this condition, all the b-hadron decays in the event need to have been found, all the charged reconstructed particles produced in them selected, the associated “topological” decay chains exactly reconstructed, and all the particles from the rest of the event removed. An example of a PER case found by DFEI in the evaluation dataset is shown in Figs. 5 and 6, from the points of view of the ancestor-chain reconstruction and of the reconstructed-particle filtering, respectively. The average fraction of PER found in the evaluation dataset is \((2.14\pm 0.07)\%\).

It should be noted that the PER is an extremely challenging case, and that even a partially good reconstruction can be used for trigger purposes. For example, the selection of extra particles from the rest of the event will break the conditions for a PER, but will not impact the efficiency for selecting all the particles produced in b-hadron decays.

Fig. 5 Example of a PER from the evaluation dataset. The (top) reconstructed and (bottom) ground-truth b-hadron decay chains in the event are shown. Apart from the reconstructed particles produced in those decays, the event contains 106 particles from the rest of the event (not shown for simplicity), all of which are correctly removed by DFEI. The dark-green (light-green) circles represent the reconstructed particles (topological ancestors). The key (k) numbers correspond to unique identifiers for each reconstructed particle produced in the simulation. The cluster (c) numbers correspond to unique identifiers assigned to each ancestor during the construction of the decay chains. The true identity of the particles is shown in the ground-truth case

Fig. 6 Example of a PER from the evaluation dataset, same as in Fig. 5. Two-dimensional view of the charged reconstructed-particle trajectories in the proton–proton interaction region. Red lines represent particles produced in b-hadron decays, which DFEI has correctly selected, and gray lines represent particles from the rest of the event, which DFEI has correctly removed

Decay-Level Performance

The performance shown in the previous section refers inclusively to all the heavy-hadron decays per event and corresponds to an average over all the known b-hadron species and their known decay types. In this section, the DFEI output is processed to obtain predictions for individual decays. First, all the true decay chains of a certain type are identified in the simulation dataset, taking note of the events in which they were produced. Then, DFEI is run for each of those events, outputting a set of candidate decay chains (connected sub-graphs) per event. Each true decay chain is finally compared with the corresponding set of candidate decay chains, to classify the DFEI reconstruction into one of the following mutually exclusive categories (a sketch of this classification is given after the list):

  • Perfectly reconstructed decay: all the reconstructed particles originating from the b-hadron decay have been predicted to be part of the same connected sub-graph, which is disconnected from all the other particles in the event, and the “topological” ancestor decay chain has been perfectly reconstructed.

  • Wrong hierarchy: same as before, but there is at least one mistake in the reconstruction of the “topological” ancestor decay chain.

  • Not isolated: all the reconstructed particles originating from the b-hadron decay have been predicted to be part of the same connected sub-graph, but there is at least one extra particle from the rest of the event which is also contained in that sub-graph. This category does not consider the specific “topological” decay chain reconstruction of the sub-graph, and is solely based on the association with extra particles.

  • Partially reconstructed: not all of the reconstructed particles originating from the b-hadron decay have been predicted to be part of the same connected sub-graph. As before, this category does not consider the specific “topological” decay-chain reconstruction, and is solely based on the impossibility to group all the desired particles in a single sub-graph. It should be noted that this type of reconstruction does not necessarily imply an overall inefficiency in selecting the particles from the b-hadron decay, since they may have been selected across multiple sub-graphs.
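A sketch of this per-decay classification is given below; the candidate representation (a set of particles plus a predicted “topological” hierarchy) and all names are placeholders for illustration.

```python
# Sketch of the decay-level classification defined above, for one true decay
# chain. `true_particles` is the set of reconstructed particles from the true
# b-hadron decay; `candidates` is the list of DFEI candidate decay chains in
# the event, each with a set of particles and a predicted "topological"
# ancestor hierarchy. All names are placeholders for illustration.
def classify_reconstruction(true_particles, true_hierarchy, candidates):
    containing = [c for c in candidates if true_particles <= c["particles"]]
    if not containing:
        return "partially reconstructed"   # particles split over several sub-graphs
    candidate = containing[0]
    if candidate["particles"] != true_particles:
        return "not isolated"              # extra particles attached to the sub-graph
    if candidate["hierarchy"] != true_hierarchy:
        return "wrong hierarchy"           # right particles, wrong ancestor chain
    return "perfectly reconstructed"
```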

Examples of the different types of reconstruction for a given true decay chain are shown in Fig. 7.

Table 5 Decay-level performance of DFEI for the inclusive (\(H_b\)) case and for several exclusive decay types
Fig. 7 Examples of the types of decay-chain reconstruction defined in the text. Green (red) nodes represent particles created in the true decay chain (from the rest of the event). Blue (orange) nodes represent true (candidate) ancestors

The decay-level performance is first computed in an inclusive way using the evaluation dataset, by measuring individually the response for all the b-hadron decays contained in the simulation and then looking at the fraction of decays reconstructed in each of the four possible categories. The numbers are reported in Table 5. Complementary to the inclusive case, the DFEI response is evaluated in a second stage restricted to specific decay types, by using the additional datasets introduced in Sect. "Dataset". The resulting numbers are also reported in Table 5. Those modes are representative of the most typical case studies of LHCb, with the inclusive sample also containing decays to many particles and more complicated decay topologies, for which the reconstruction is more challenging.

The performance evaluated on the exclusive modes is significantly better than in the inclusive case, with fractions of perfectly reconstructed decays in the range 20–40%. The comparative study of the performance on the different exclusive modes helps to understand which cases are easier or harder for DFEI to reconstruct, and more generally to analyse the dependencies of the DFEI response. The most complicated cases are found to be \(B^{0} \rightarrow D^{-}[K^+\pi ^-\pi ^-] D^{+}[K^-\pi ^+\pi ^+]\) (with two three-particle vertices well separated in space, given the long lifetime of the \(D^+\) meson) and \(\Lambda _{b}^{0} \rightarrow \Lambda _{c}^{+} [p K^{-} \pi ^{+}] \; \pi ^{-}\) (with a single \(\pi ^-\) that needs to be associated with a spatially separated three-particle vertex). The difference in performance between the latter decay and \(B_{s}^{0} \rightarrow D_{s}^{-} [K^{-} K^{+} \pi ^{-}] \; \pi ^{+}\), which has a similar topology, is due to the \(\Lambda _{c}^{+}\) flying further on average than the \(D_{s}^{-}\), because of a significantly larger Lorentz boost. The fraction of partial reconstruction is below \(10\%\) in all the exclusive cases except for the \(\Lambda _{b}^{0} \rightarrow \Lambda _{c}^{+} [p K^{-} \pi ^{+}] \; \pi ^{-}\) decay, which translates into an efficiency above \(90\%\) for selecting all the reconstructed particles produced in those decays.

Timing Studies

Detailed timing studies and an optimisation of the inference speed of the DFEI algorithm are beyond the scope of this paper and are left for future research. However, a first, simplified timing study of the current prototype is presented in this section. The first motivation for the study is to understand the scalability of the response with the object multiplicity per event. The second goal is to estimate how the current event-processing rate achievable by the algorithm compares with the requirements to run DFEI in the LHCb Run 3 trigger. Since the algorithm runs over reconstructed tracks, at the moment the target would be the Run 3 HLT2 trigger, which runs on CPU. As explained in App. B, this would imply a processing rate in the ballpark of 500 Hz per computing node (the precise target number would depend on internal LHCb considerations).

Fig. 8 Average evaluation time per event of the different DFEI modules as a function of the total number of reconstructed particles in the event. The error bars correspond to the standard deviation

The timing study is done on a CentOS Linux 7 (Core) x86 architecture, using a 2.2 GHz Intel Core Processor (Broadwell, IBRS). No parallelisation scheme is employed. The average computing time required for the evaluation of the NP, EP and LCAI modules as a function of the total number of particles in the event is computed and reported in Fig. 8. In this configuration, the NP is both the slowest module and the one that presents the strongest scaling as a function of the event size, hence the one that can profit the most from a future optimisation in terms of timing. The average of the combined NP + EP + LCAI times is approximately 1 s per event.
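For reference, a per-module timing of this kind can be obtained with a simple single-threaded harness such as the one sketched below; the module callable and the event container are placeholders.

```python
# Simple single-threaded timing harness, illustrating how the per-event
# averages shown in Fig. 8 could be obtained. The module callable and the
# event container are placeholders for illustration.
import time
import numpy as np

def time_module(module, event_graphs):
    per_event = []
    for graph in event_graphs:
        start = time.perf_counter()
        module(graph)                       # run the NP, EP or LCAI inference
        per_event.append(time.perf_counter() - start)
    return np.mean(per_event), np.std(per_event)
```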

The time needed to create the input graph for each of the three modules and to post-process their output (i.e. filtering nodes and edges and interpreting the predicted LCA values in terms of reconstructed decay chains) is not included in the previous study. Of all these auxiliary tasks, the only one whose processing time is not significantly below 1 s is the graph construction for the NP module, which requires an average of 2 s per event.

Taking into account these first studies, a strategy to speed up the full algorithm in order to meet the trigger constraints is outlined in Sect. "Future Work".

Discussion

The proposed approach for the reconstruction of multiple heavy-hadron decays in a hadronic environment is the first of its kind. To allow the benchmarking of future efforts in this new scenario, all the datasets used for the training and performance evaluation of DFEI have been made publicly available [46]. In this section, the performance obtained with this first prototype is discussed in reference to the global context.

As a first step, the reconstructed-particle selection capabilities can be compared with previous studies in LHCb. The closest case study, reported in Ref. [10], considers the subset of reconstructed particles that have been selected by a standard LHCb inclusive trigger algorithm, and attempts to discern whether each of the other particles in the event has been produced in the same b-hadron decay or not. By combining vertex-quality requirements and the output of a multivariate algorithm trained on individual-particle features, the authors estimate an approximate selection efficiency for particles from the same b-hadron decay of 90% for an approximate background rejection power of 90%. That study is based on official LHCb simulation, which contains material-interaction and fake-track backgrounds, not included in the simulated dataset used in this paper. Both simulations, however, aim at representing inclusive b-hadron decays in LHCb Run 3-like conditions. The performance of DFEI (94% selection efficiency for particles from b-hadron decays and 96% background rejection power) is similar and numerically higher, within the caveats of the comparison. Most importantly, DFEI shows a powerful discrimination consistently for all the b-hadron decays present in the event at the same time, instead of focusing on an individual decay. It should be noted that the strategy presented in Ref. [10] is not used in production by LHCb. The difference between the two approaches will only increase in the much harsher object-multiplicity conditions expected for LHCb Upgrade II. The almost flat response of the DFEI particle-selection efficiencies as a function of the number of particles in the event also suggests good prospects for the Upgrade II conditions.

At a second level, regarding decay-chain reconstruction, DFEI has demonstrated for the first time that this kind of reconstruction can be done successfully both in a hadronic environment and in a multi-decay-chain scenario. Given the novelty of the approach, the performance at this level can only be partially compared with the one achieved by the FEI algorithm at the Belle II experiment, and with significant caveats. On one side, as explained in Sect. "Introduction", the reconstruction in LHCb is a much more difficult task than in Belle II. On the other side, the DFEI prototype for LHCb makes use of several simplifications introduced previously: omitting particles produced outside the geometrical acceptance, not including neutral reconstructed particles and reconstructing only the “topological” decay chains, not the full ones. Keeping these caveats in mind, the fraction of perfect decay reconstruction obtained in this paper can be approximately compared to the so-called tag-side reconstruction efficiency determined in Ref. [32] using a Belle simulated dataset, which is of the order of a few per cent for semileptonic decays and of a few per mille for hadronic decays. The conclusion that can be drawn from this comparison is that DFEI achieves a level of decay-chain reconstruction in a hadronic environment which is in the ballpark of that achieved at Belle (II), hence demonstrating not only the feasibility but also the competitiveness of a Full Event Interpretation approach at the LHC.

Concerning offline analysis applications, a study of the different possible types of DFEI reconstruction for specific ground-truth decay chains is reported in Sect. "Decay-Level Performance". A technically similar but conceptually different study could be done on a collision dataset, focusing this time on the DFEI prediction for the reconstructed particles output by any standard LHCb analysis preselection. Those preselections aim at identifying sets of particles that are compatible with having been produced in a specific type of decay chain, which are denoted as signal candidates. The DFEI output can be used to classify each signal candidate into one of the following categories: “signal” (if the reconstruction matches the expected decay chain), background with a different resonance structure (if the selected reconstructed particles are deemed to be correct but the predicted hierarchy is not), background from decays with extra particles (some of which are not part of the signal candidate) and combinatorial background (where the candidate particles are predicted to originate from multiple sources). This implies that DFEI could be used in virtually every LHCb analysis to suppress or study the different possible types of contributing backgrounds, with a potentially higher background separation power obtained by leveraging all the information in the event.

Future Work

The work in this paper opens the door to multiple future research lines. Natural follow-up steps are detailed performance studies on official LHCb simulation and on Run 3 collision data. These will allow the impact of the DFEI reconstruction on a broad spectrum of decay distributions to be assessed, in order to understand the potential need for further optimisation or calibration of the algorithm. Another natural continuation is the extension of the developments and studies to Upgrade II conditions. Additionally, the DFEI functionality is expected to be expanded to include neutral reconstructed particles, charm-hadron decays and particle-identity information. This can bring potential new complementary applications of DFEI, such as providing enhanced flavour-tagging capabilities to LHCb.

Regarding speeding up the inference, a design optimisation of the NP module, for example substituting the GNN with a combination of independent multivariate classifiers per particle or with a nearest-neighbour selection in a learnt embedding, together with an overall hyperparameter optimisation, can bring large reductions in the evaluation time. Significant additional speed-ups can be gained by converting the full DFEI pipeline into C++ [50, 51] (which is by itself a technical requirement to run DFEI in the current LHCb trigger). The combination of the suggested improvements gives good hope of achieving the target event-processing rate discussed in Sect. "Timing Studies". Finally, regarding the utilisation of DFEI in the LHCb Upgrade II, the inference of the GNN modules could become much faster through the usage of GPUs [41, 42, 50] or FPGAs [37,38,39,40] as hardware accelerators in the trigger system. For example, in Ref. [50], graphs of order 100,000 nodes are segmented with GNNs (in a technically similar way to this work) in less than 1 s.

Conclusion

This paper presents the first proof of concept for an inclusive event processing at the LHC in a high-multiplicity environment, focused on the identification and explicit reconstruction of all the heavy-hadron decay chains in the event. The approach is heavily based on deep learning and uses GNNs to optimally capture the event structure. To keep it computationally scalable, the algorithm is divided into three stages: node pruning removes the nodes that are not associated with a heavy-hadron decay, edge pruning removes the edges between particles that do not share the same ancestor, and lowest-common-ancestor inference predicts the hierarchical decay relations among the remaining particles, allowing the decay chains to be fully reconstructed. The algorithm has been trained using a simulated dataset that emulates LHCb Run 3 conditions, and is specialised for beauty-hadron decays and charged reconstructed particles.

The algorithm is able to separate particles originating from b-hadron decays from those of the rest of the event better than previous approaches in similar conditions at LHCb. The resulting fraction of perfectly reconstructed b-hadron decay chains is in the ballpark of the one obtained by the FEI algorithm in an electron–positron environment, showing not only the feasibility but also the competitiveness of this approach at the LHC.

The performance of DFEI is studied in detail both at the global event level and at the individual b-hadron decay level, using both inclusive and exclusive samples containing typical decays of interest for LHCb. A particularly good performance is found for the exclusive modes, in terms of both the efficiency of a perfect decay-chain reconstruction (in the range 20–40%) and the efficiency to identify all the reconstructed particles originating from the decay (above \(90\%\) in most cases).

The application of the algorithm for data analysis at the offline level is discussed, explaining how DFEI can be used as a common tool to identify and classify different types of background. These capabilities can already be explored with the Run 3 dataset, which is currently being collected. In terms of charged reconstructed particles, the current DFEI algorithm achieves a \(14\times\) event-reduction factor in Run 3 conditions, for a \(94\%\) efficiency in the selection of particles from b-hadron decays in the event. For illustration, if this kind of performance were achieved in Upgrade II conditions and all the event information were solely related to charged particles, the saving factor would translate into a \(14\times\) larger integrated luminosity that could be recorded, compared to storing the full event information. This shows the strong potential of the DFEI approach, while accurate estimates of the gain factor in Upgrade II conditions will be the focus of future research.

To be used in the trigger, the DFEI algorithm needs to be able to process events at high rate. A first timing study of the DFEI algorithm is performed and several steps towards achieving the target event-processing rate are identified.

Finally, the successful development of the DFEI prototype opens the door to future research towards expanding its functionality and use cases in LHCb and can inspire similar developments in other LHC experiments for the HL-LHC Phase.