Search for neutral long-lived particles in pp collisions at √ s = 13 TeV that decay into displaced hadronic jets in the ATLAS calorimeter

: A search for decays of pair-produced neutral long-lived particles (LLPs) is presented using 139 fb − 1 of proton-proton collision data collected by the ATLAS detector at the LHC in 2015–2018 at a centre-of-mass energy of 13 TeV. Dedicated techniques were developed for the reconstruction of displaced jets produced by LLPs decaying hadronically in the ATLAS hadronic calorimeter. Two search regions are deﬁned for diﬀerent LLP kinematic regimes. The observed numbers of events are consistent with the expected background, and limits for several benchmark signals are determined. For a SM Higgs boson with a mass of 125 GeV, branching ratios above 10% are excluded at 95% conﬁdence level for values of c times LLP mean proper lifetime in the range between 20 mm and 10 m depending on the model. Upper limits are also set on the cross-section times branching ratio for scalars with a mass of 60 GeV and for masses between 200 GeV and 1 TeV.


Introduction
Many of the scenarios proposed to address some of the open questions of the Standard Model (SM) predict the existence of new non-SM particles whose lifetimes can be long enough for their decays to be significantly displaced from the interaction point but still within the ATLAS detector. Examples of theories involving such suitably long-lived particles (LLPs) are various supersymmetric (SUSY) models [1][2][3][4][5][6][7]; neutral naturalness models [8][9][10][11] which feature a hidden sector (HS) [12][13][14] that addresses the hierarchy problem; models that seek to incorporate dark matter [15][16][17][18], or explain the matter-antimatter asymmetry of the universe [19]; and models that lead to massive neutrinos [20,21] which provide an explanation for the origin of light neutrino masses and mixings.
The decay of an LLP created in proton-proton collisions in the ATLAS detector could produce one of a variety of highly unconventional signatures, depending on the detector subsystem in which the LLP decays. Searches for promptly decaying particles often have very low sensitivity to such signatures and hence the study of LLPs requires dedicated analyses. This paper presents a search sensitive to neutral LLPs decaying chiefly in the calorimeters of the ATLAS detector. This allows the analysis to probe values of cτ (where c is the speed of light and τ is the mean proper lifetime of the LLP) ranging between a few centimetres and a few tens of metres.

JHEP06(2022)005
insertable B-Layer [39][40][41]. It was added in 2014 and provides high-resolution hits at small radius to improve the tracking performance. The pixel detector is surrounded by the silicon microstrip tracker, which usually provides four three-dimensional measurement points per track. These silicon detectors are complemented by the transition radiation tracker, with coverage up to |η| = 2.0.
The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) electromagnetic calorimeters (together referred to as the ECal), with an additional thin LAr presampler covering |η| < 1.8 to correct for energy loss in material upstream of the calorimeters. The ECal extends from 1.5 m to 2.0 m in radial distance r in the barrel and from 3.6 m to 4.25 m in |z| in the endcaps. Hadronic calorimetry is provided by a steel/scintillator-tile calorimeter (HCal), segmented into three barrel structures within |η| < 1.7, and two copper/LAr hadronic endcap calorimeters covering |η| > 1. 5. The HCal covers the region from 2.25 m to 4.25 m in r in the barrel (although the HCal active material extends only up to 3.9 m) and from 4.3 m to 6.05 m in |z| in the endcaps. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic measurements, respectively.
The calorimeters have a highly granular lateral and longitudinal segmentation. Including the presamplers, there are seven sampling layers in the combined central calorimeters (the LAr presampler, three in the ECal barrel and three in the HCal barrel), and the endcap regions provide up to eight sampling layers (the presampler, three in ECal endcaps and four in HCal endcaps). The forward calorimeter modules provide three sampling layers in the forward region. The total amount of material in the ECal corresponds to 24-35 radiation lengths in the barrel and 35-40 radiation lengths in the endcaps. The combined depth of the calorimeters for hadronic energy measurements is more than 9 hadronic interaction lengths nearly everywhere across the full detector acceptance.
The muon spectrometer comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in the magnetic field generated by the superconducting air-core toroids. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector.
The ATLAS detector selects events using a tiered trigger system [42]. The level-1 (L1) trigger is implemented in custom electronics and reduces the event rate from the LHC crossing frequency of 40 MHz to a design value of 100 kHz. The second level, known as the high-level trigger (HLT), is implemented in software running on a commodity PC farm that processes the events and reduces the rate of recorded events to 1 kHz. An extensive software suite [43] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data and simulation samples
Data collected by the ATLAS detector during the period 2015-2018 from proton-proton (pp) collisions at √ s = 13 TeV are used in this search. Only data collected during stable beam conditions in which all detector subsystems were operational are considered [44]. These High-E T CalRatio trigger with E T > 60 GeV  3  33  41  40  117 High-E T CalRatio trigger with E T > 100 GeV -- 44 59 103 Low-E T CalRatio trigger (2016 version) -11 43 -54 Low-E T CalRatio trigger (2018 version) ---59 59 Table 1. CalRatio triggers which were available during the LHC Run 2 data-taking, and corresponding integrated luminosity collected in each period. The high-E T CalRatio trigger with E T > 60 GeV was disabled in 2017 for instantaneous luminosities higher than 1.4 × 10 34 cm −2 s −1 . Two versions of the low-E T CalRatio trigger were used, with slight differences in their algorithms. The details are reported in section 4.
data were collected with a set of dedicated LLP signature-driven triggers and separated into four datasets, defined according to the triggers used to collect them. The search is performed on the main dataset, composed of all data events passing at least one of the two types of CalRatio triggers [45] running on bunch crossings where protons were present in both beams. The name "CalRatio" refers to the ratio of energies deposited in the hadronic and electromagnetic calorimeters. As described in detail in section 4, these include the low-E T CalRatio triggers and high-E T CalRatio triggers. Different versions of these triggers ran during the full data-taking period. The amount of data collected in each case is summarised in table 1, with at least one of these triggers running at any given time during the entire data-taking. Two additional datasets were collected for the study of non-collision backgrounds. The BIB dataset, used for the study of BIB events faking signal events, was collected from events failing the CalRatio trigger BIB-removal algorithm. The cosmics dataset, used for the estimation of cosmic-ray events passing the analysis selection, was collected from events recorded during empty bunch crossings, as described in section 4. Finally, a dijet dataset is selected using a single-jet-based trigger and vetoing on the CalRatio triggers to make it orthogonal to the main dataset. This dataset is used in the neural network training described in section 4.2 and in the calculation of some of the systematic uncertainties involved in the analysis.
The HS Φ → ss signal samples were generated using the MadGraph5_aMC@NLO v2.6.2 [46] generator at leading order (LO) with the NNPDF2.3lo parton distribution function (PDF) set [47]. In these samples, Φ is produced via gluon-gluon fusion. The Φ transverse momentum distribution for the samples is reweighted to match that obtained for corresponding next-to-leading-order (NLO) predictions using the same event generator. A production cross-section of 48.6 pb, taken from a next-to-next-to leading-order calculation [48], is assumed when normalising results for the case where the mediator is the SM Higgs boson (the mass of which was set to 125 GeV). Parton showering and hadronisation was modelled using Pythia 8.230 [49] with the A14 set of tuned parameters (tune) [50]. Several sets of samples were generated, with different assumptions for the masses of the -5 -JHEP06(2022)005 mediator (m Φ ∈ [60, 1000] GeV) and LLPs (m s ∈ [5, 475] GeV). These mass ranges ensure an extensive LLP boost spectrum, allowing a wide variety of topologies to be studied. The LLP mean proper lifetime at which each of the signal samples was generated (τ gen ) was determined so as to maximise the fraction of decays in the ATLAS hadronic calorimeter and muon system, and is quoted in relevant tables and figures throughout this document. Some of the samples were generated for two assumptions about the LLP generated lifetime: one sample is used to study the signal throughout the analysis, while the other sample (with the alternative lifetime assumption but exactly the same mass choices) is used to validate the procedure for extrapolating limits to different mean proper lifetimes of the long-lived scalar.
The dominant SM background in this analysis is SM multijet production. Although a data-driven method is used to perform the background estimation, Monte Carlo (MC) simulated multijet events are needed to train the per-jet neural network to discriminate between signal jets and the multijet background, and to evaluate some of the systematic uncertainties. The samples were generated at LO with Pythia 8.186 [51] using the A14 tune for parton showering and hadronisation. The NNPDF2.3lo PDF set [47] was used.
The effect of multiple interactions in the same and neighbouring bunch crossings (pile-up) was modelled by overlaying the simulated hard-scattering event with inelastic pp collision events generated with Pythia 8.186 [51] using the NNPDF2.3lo PDF set and the A3 tune [52].
The detector geometry and response were simulated with Geant4 [53,54]. The standard ATLAS reconstruction software is used for both simulation and collision data.

Trigger and event selection
This section describes how the data were collected, processed and selected for the analysis. The main dataset was collected with dedicated CalRatio triggers, and auxiliary datasets are also defined so as to be enriched in background events such as BIB, cosmic rays, or SM dijets. Using a combination of data and simulated samples, per-jet neural networks are trained to distinguish three classes of objects: signal-like jets, BIB-like jets, and non-displaced jets. The neural network training architecture involves an adversarial network, which is used to ensure that differences between data and simulation in the training samples are not exploited. Details are provided in section 4.2. Per-event boosted decision trees (one each for low-E T and high-E T signals) make use of the per-jet neural network scores as well as several other event-level variables to construct discriminants which separate signal from all sources of background. Some additional requirements are applied to maximise the sensitivity of the final selection. Finally, the data-driven ABCD method, explained in detail in section 5, is used to estimate the remaining background in the final selection. In this method the boosted decision tree score is used as one of two axes which define the ABCD plane.

Triggers and event preselection
Events are selected by the CalRatio triggers [45], which are designed to identify jets that result from neutral LLPs decaying near the outer radius of the ECal or within the HCal. The combined L1 and HLT selections make use of the main characteristics of jets resulting   from the decay of a neutral LLP in the calorimeters: due to the late development of these jets, they appear narrower than standard jets, they have a high fraction of their energy deposited in the HCal, and they are isolated from tracks. Two types of such triggers are used, differing only in the L1 trigger selection.
At L1, two different triggers are used. The high-E T L1 triggers select narrow jets with energy deposits above certain thresholds (either E T > 60 GeV or E T > 100 GeV) in a 0.2 × 0.2 (∆η × ∆φ) region of the ECal and HCal combined [55]. The low-E T L1 trigger makes use of the L1 topological trigger system [56] by accepting events where the largest energy deposit, with total-E T threshold at 30 GeV, does not geometrically overlap in the η-φ plane with any energy deposits with E T > 3 GeV in the ECal. The veto on ECal deposits ensures a high value of the ratio at L1 of energy deposited in the HCal to energy deposited in the ECal, E H /E EM > 9, rejecting a large portion of the background with typical values of E H /E EM <1, and allowing the E T threshold to be kept low. This looser E T requirement increases the efficiency for lower-p T LLPs. In its 2016 version, the trigger requires that if there is more than one HCal deposit, the second-most energetic one is also isolated from energy deposits in the ECal. This condition was removed in 2018, which helped to increase the trigger efficiency as seen in figures 2 and 3.
At the HLT, a two-step selection algorithm is applied to CalRatio triggers, regardless of the L1 selection. In the first step, calorimeter deposits are clustered into jets using the anti-k t algorithm [57,58] with radius parameter R = 0.4. The standard jet-cleaning requirements [59] applied offline in most ATLAS analyses reject jets with high values of E H /E EM , one of the key characteristics of the displaced hadronic jets, and are therefore not included in these triggers. A dedicated cleaning algorithm (referred to as CalRatio jet  cleaning and applied starting in 2016) is applied instead. This algorithm is based on the standard jet cleaning described in ref. [59] but with the requirements on the jet E H /E EM substituted by a condition to reject any jet with more than 85% of its energy associated with a single calorimeter sampling layer and with an absolute value of its negative energy higher than 4 GeV, calculated as the sum of the energy in all cells with negative energy, caused by electronic and pile-up noise. At least one of the HLT jets passing the CalRatio jet cleaning is required to satisfy E T > 30 GeV, |η| < 2.5 and log 10 (E H /E EM ) > 1.2. Jets satisfying these requirements are used to determine 0.8 × 0.8 regions in ∆η × ∆φ centred on the jet axis in which to perform charged-particle reconstruction (tracking). Triggering jets are required to have no tracks with p T > 2 GeV within ∆R = 0.2 of the jet axis.
In the second step, events passing the CalRatio triggers are required to pass a BIBremoval algorithm. Events which pass the first step but fail this algorithm are collected to form the BIB dataset. Muons from BIB enter the HCal horizontally and may radiate a photon via bremsstrahlung, generating an energy deposit that may be reconstructed as a signal-like jet. BIB deposits are more likely to be aligned in φ since they travel parallel to the LHC beam pipe. Furthermore, muons from BIB leave these energy deposits in the calorimeter earlier than particles resulting from a collision in the same bunch crossing. The BIB-removal algorithm applied at the HLT therefore identifies events as containing BIB if the triggering jet has at least four HCal-barrel cells close together in φ and in the same calorimeter layer but not belonging to the same jet and requiring that every cell's timing be consistent with that of a BIB deposit [60]. This algorithm has been estimated to have an efficiency of approximately 70% in BIB identification.

JHEP06(2022)005
All events in the main dataset are required to pass at least one of the two CalRatio triggers described in this section. The same HLT selection is also active in empty bunch crossings. These are crossings where protons are absent in both beams and isolated from filled bunches by at least five unfilled bunches on either side. Events in empty bunch crossings that have at least one 0.2 × 0.2 (∆η × ∆φ) calorimeter energy deposit with E T > 30 GeV at L1, and which pass the two steps of the HLT selection algorithm, are stored in the cosmics dataset.
The dijet dataset is collected using the single-jet L1 trigger with the lowest energy threshold in any given data-taking period, and at the HLT, events are required to have at least one jet with p T > 400 GeV or p T > 420 GeV depending on the data-taking period. Events passing any of the CalRatio triggers described above are vetoed from this selection.
The CalRatio trigger efficiency for simulated signal LLPs is defined as the fraction of jets spatially matched (∆R < 0.2) to one of the generated LLPs that fire the trigger. The trigger efficiency as a function of triggering LLP particle-level p T is shown in figure 2 for two signal samples. Only generated LLPs decaying in the calorimeters are considered in this plot. The 60-GeV-high-E T CalRatio trigger, which is seeded by the high-E T L1 trigger with an E T threshold of 60 GeV, provides the highest efficiency (about 80%) to select LLPs with p T > 100 GeV. The 2018 version of the low-E T CalRatio trigger (seeded by the low-E T L1 trigger) has a similar efficiency for high-p T LLPs, although it is slightly lower in some p T ranges. For LLPs with p T < 100 GeV, both versions of the low-E T CalRatio trigger are able to recover part of the inefficiency that is present in the high-E T CalRatio trigger. This is especially relevant in the low-E T samples, which are mostly populated by low-p T LLPs.
The trigger efficiency also depends strongly on the LLP decay position, as shown in figure 3 for two samples of simulated signal events. The efficiency as a function of LLP decay length in the x-y plane is shown for LLPs decaying in the calorimeter's barrel (|η| < 1.4). The selection is most efficient in the HCal for both triggers. The fact that the low-E T samples contain a large fraction of LLPs with low p T makes these samples have a lower trigger efficiency. The total per-event trigger efficiency in signal is obtained by applying a logical OR of the CalRatio triggers, according to the conditions and luminosities listed in table 1 on a per-event basis, to the complete signal sample integrated over all LLP transverse momenta and for any decay position. Given that the CalRatio triggers accept events where at least one of the jets fulfils the two-step HLT conditions, this efficiency accounts for events where one or both LLPs decay in the calorimeters. In the HS signal, the total trigger efficiency rises from 1.6% in the sample with m Φ = 125 GeV and m s = 55 GeV to 28.8% for the m Φ = 1000 GeV and m s = 50 GeV sample.
For every event, primary vertices (PV) with at least two tracks with p T > 500 MeV are reconstructed. If more than one PV candidate is found, the one with the largest sum of the squares of the transverse momenta of all its associated tracks is chosen as the PV. Clean jets are used at several steps of the event selection. They are required to have p T > 40 GeV, |η| < 2.5 and to pass the CalRatio jet cleaning. To select events with trackless jets, an additional event-level variable, ∆R min (jet, tracks), is used. The quantity ∆R min (jet, tracks) is defined as the angular distance between the jet axis and the closest track with p T > 2 GeV that is associated with the PV, and ∆R min (jet, tracks) is calculated by summing this distance over all the clean jets with p T > 50 GeV. Events with no displaced decays have a very small value of this variable. Every displaced jet contributing to the sum causes a considerable increase in its value, making this variable a good discriminator between signal and multijet background.
For an event to pass the analysis preselection, it is required to have passed at least one of the CalRatio triggers, to contain a PV and at least two clean jets, and to have ∆R min (jet, tracks) > 0.5. After preselection, ∆R min (jet, tracks) still has good discrimination power and it is used in the data-driven background estimation described in section 5.

Displaced-jet-tagging neural network
A neural network (NN) is trained on a per-jet basis to discriminate displaced jets in signal from non-displaced jets and from jets generated by BIB. The architecture chosen for the network is a set of convolutional layers fed into a long short-term memory (LSTM) layer using Keras [61] (version 2.2.4) with TensorFlow [62] (version 1.12.0) as the backend. The displaced-jet-tagging NN is trained on SM multijet MC samples, the BIB dataset and a combination of the simulated HS signal samples. Each signal sample is split into two orthogonal subsets, one used for training and the other for application in the rest of the analysis chain. In the signal samples, only jets matched to MC generated LLPs decaying after passing through the ATLAS inner detector are used in the training. In the SM multijet samples, the two highest-p T clean jets in the event are used in the training. In the BIB dataset, only the triggering jet is used. About 1 million jets of each class are obtained, with an 80:10:10 training, testing, validation split. Signal jets with small boosts are in general harder to discriminate from BIB and SM multijets than those with large boosts. In order to obtain the optimal training for all boost regimes, two separate NNs are trained. The low-E T NN is trained using the HS signal samples with m Φ = 60, 125, 200 GeV, which typically contain low-p T signal jets with small boosts. The high-E T NN is trained on the HS samples with m Φ = 400, 600, 1000 GeV, dominated by high-p T jets.
The input variables are low-level features associated with each jet, using information from the entirety of the ATLAS detector: positions, momenta, impact parameter and quality variables of inner-detector tracks which are within ∆R = 0.2 of jets; momenta, timing information and positions of calorimeter topoclusters (collections of calorimeter cells used in jet reconstruction [63]) associated with each jet, as well as the fraction of the jet energy deposited in each layer of the ECal and HCal; and spatial and timing information for muon track segments within ∆φ = 0.2 of a jet. In addition, the momenta and positions of the jets themselves are also used. Overall, the input to the neural network consists of 10 track variables for up to 20 tracks per jet, 12 topocluster variables for up to 30 topoclusters per jet, 6 muon-segment variables for up to 30 muon segments per jet, and 3 jet variables.
To check the modelling of variables prior to training, simulated distributions of the per-jet NN inputs are compared with data and fair agreement is generally observed, although some variables display some mismodelling; this is addressed during training by using the adversarial technique described below.

JHEP06(2022)005
The displaced-jet-tagger uses a combination of two specialised network architectures. The first one is a one-dimensional (1D) convolutional neural network (CNN) which acts as a feature extractor. A convolutional layer performs a series of convolution operations on a 1D array of inputs, corresponding to the variables per subdetector listed above, separately for the tracker, calorimeter and muon-segment variables. A kernel size of 1 is used for the 1D convolutions, with a series of filters being applied which up-sample the features per object in the first filter, then down-sample as the filter goes down in size, working as a feature extractor. In this way the network can exploit the correlations between the input variables. The second step is the LSTM [64] network, where the memory uses relations between subsequent inputs (for example, between subsequent topoclusters) which allows the network to exploit the correlations between the different topoclusters, tracks and muon segments separately. The topocluster and track inputs to the LSTM are ordered in descending p T , which was shown to enhance the performance of the LSTM. No ordering of the muon segments is applied, but up to 30 muon segments per jet are included which is enough to contain all muon segments for the vast majority of input jets.
The outputs of the individual networks learning from the different subdetectors are concatenated with the jet variables as input to one network which has a number of fully connected layers, named 'dense layers'. The output of this final, concatenated network is three predictions: signal NN score, SM multijet NN score and BIB NN score. Each prediction is a score between 0 and 1 and it is closer to 1 the more confident the NN is that a jet belongs to this particular class; the three scores add up to 1. Parameters such as learning rate, regularisation and learning nodes per layer are optimised by scanning a multidimensional grid.
Finally, given that the NN is trained on simulation samples as well as on data samples, an adversarial network is added to the output of the jet-tagging NN with the objective of mitigating the dependence of the displaced-jet-tagger outputs on potential mismodelling of any of the input variables in simulation. The adversarial network concept is based on ref.
[65] but built specifically for the input variable mismodelling circumstances of this search. An illustration of the architecture is shown in figure 4. The adversarial network is trained on a selection of SM multijet MC and collision data events in a dijet control region separate from the signal selection used to train the principal NN. This dijet control region is constructed from events in the dijet dataset which pass the following requirements: the leading and subleading jets must pass standard quality selections and have p T > 400 GeV and p T > 60 GeV respectively; they must have high angular separation (|∆φ| > 1.6); and the magnitude of the vectorial sum of transverse momenta of these jets (H miss T ) is required to be below 120 GeV.
The NN learns the differences between simulation and real data and feeds this information back to the jet-tagging neural network such that the latter network avoids taking advantage of the MC mismodelling. The whole network tries to minimise the loss function. This is achieved by simultaneously minimising the principal network's loss function (learning to discriminate Signal from BIB and SM multijets) and maximising the adversary's loss function (learning not to discriminate between simulation and data). During training the cross-entropy function H( x) is used, combining the cross-entropy of the main network  where x is the signal, multijet and BIB jets and the cross-entropy of the adversarial network where x is the input variables of the control region MC simulation and data jets. The adversary part is modulated by a scale factor λ, with a value of λ = 10 chosen to minimize the impact of mismodelling in the final NN output while maintaining good separation. The adversarial cross-entropy term is subtracted from the main cross-entropy term to give the total loss function. The adversary greatly improves the level of agreement between data and SM multijet simulation in the jet-tagging NN outputs, reducing the impact of any potential mismodeling of the NN input variables along the analysis chain. Figure 5 shows the BIB NN score in the dijet control region in data and in simulation, before and after including the adversarial network in the training. The remaining small discrepancies in the modelling of the NN output variables are covered by a dedicated uncertainty, which is described in section 6 and applied to the signal efficiency only, as the background estimate in this analysis is data-driven.
Over-fitting during training is monitored by checking the loss in the training and testing sets periodically. No over-training is seen, which would be characterised as a large deviation between training and testing loss.
The performance of the final trained algorithm can be seen in figure 6. The training took between 50 and 100 run-throughs of the input data to converge, depending on the training. In the event selection, both NNs (low-and high-E T ) are evaluated in every clean jet of every signal sample. This provides a better selection than using only the low-E T (high-E T ) NN in the low-mass (high-mass) signal samples because of the residual presence of high-p T jets in the low-mass samples and low-p T jets in the high-mass samples.

Event selection
A per-event boosted decision tree (BDT) is designed with the foremost objective of discriminating BIB events from signal events where at least two jets have a considerable displacement. As explained at the beginning of this section, the CalRatio triggers contain an algorithm to reject BIB. However, this algorithm is not 100% effective, and hence some BIB contamination is expected in the main dataset. The BIB data sample contains SM multijet events on top of the BIB event that caused them to be selected by the trigger. Consequently, even if no multijet sample is explicitly used in the training, the per-event BDT is able to discriminate signal from BIB as well as from multijet background. A combination of signal samples in the HS model is used to train the per-event BDT against BIB data. As in the NN training, only part of each signal sample is used for training, leaving the other part for interpretation. Four CalRatio jet candidates are defined as the two clean jets with the highest low-E T signal-NN-score and the two clean jets with the highest high-E T signal-NN-score in the event (jet sig 1l , jet sig 2l , jet sig 1h , jet sig 2h ). Analogously, four BIB jet candidates are defined as the two clean jets with the highest low-E T BIB-NN-score and the two clean jets with the highest high-E T BIB-NN-score in the event (jet BIB 1l , jet BIB 2l , jet BIB 1h , jet BIB 2h ). The NN outputs of these jets are used as input variables for the per-event BDTs. Other event-level variables are used in the training: examples include the distance ∆R between the two high-E T CalRatio jet candidates (jet sig 1h , jet sig 2h ) and H miss T /H T , where H T is the scalar sum of the transverse momenta of the jets.
Following the same strategy as used for the per-jet NN and to obtain optimal signal-tobackground discrimination at any jet p T , two versions of the per-event BDT are trained: the high-E T BDT is trained for the analysis of the high-mass (m Φ = 400 to 1000 GeV) signal samples, with a high jet-p T distribution, and the low-E T BDT is trained for low-mass (m Φ = 125 to 200 GeV) signal samples, with a softer jet-p T distribution. The two versions use the same set of input variables and the same BIB background sample. They differ only in the signal samples used during training. Only signal events where at least one of the LLPs decays after passing through the inner detector are considered in the training. Figure 7 shows the distribution of the per-event BDT outputs from five signal samples, as well as from the main data and BIB data. The BDT output in the BIB dataset peaks at slightly lower values than in the main data. Using the jet's time, measured as the energy- Taking this separation between BIB and multijets into account, the per-event BDT serves two purposes in the analysis: first, it is used as part of the event cleaning described below in this section to reject BIB; second, it is used in the data-driven background estimation in the search region (see section 5).
The simulated distributions of the per-jet NN and other inputs to the per-event BDTs show mostly good agreement with data. A dedicated procedure to evaluate the impact of residual mismodelling of NN and BDT input variables on the signal efficiency is described in section 6.
Two selections are defined, referred to as the high-E T selection and the low-E T selection, which are optimised to give maximum sensitivity for high-E T models and low-E T models, respectively.
Event-cleaning selections are applied to remove as much BIB and detector-related background as possible: the per-event BDT output must satisfy low-E T BDT > 0.05 and high-E T BDT > 0.0 in the low-E T and high-E T selections, respectively; trigger matching, where at least one of the CalRatio jet candidates must be matched to the jet that fired the trigger and fulfil the requirements applied at trigger level; a timing window of −3 < t < 15 ns, where t is the jet time for any of the CalRatio jet candidates or BIB jet candidates (this selection helps to remove remaining BIB jets and jet candidates produced by detector noise while retaining signal jets originating from relatively slowly moving LLPs with timing up to 15 ns); a veto on events where one of the CalRatio jet candidates falls in the transition region between the ECal barrel and endcaps (1.45 < |η| < 1.55), where poor coverage by the electromagnetic calorimeter produces jets with artificially low fractions of energy in the ECal; and a veto on events where one of the CalRatio jet candidates or BIB jet candidates has log 10 (E H /E EM ) < −1.5. These requirements ensure that in the final selection, the only remaining source of background is multijet events, as described in section 5.
The final selections are optimised to maximise the signal-to-background ratio in each search region. Variables with good signal-to-background discrimination at event level are used, such as H miss T /H T and the product of the NN signal scores of the two relevant CalRatio jet candidates. The quantity H miss T /H T has high values for BIB events but it has a softer distribution for signal. Hence it is a good discriminator, especially in the high-p T regime. Making a selection based on the product of the NN signal scores of the two relevant CalRatio jet candidates helps to further reduce the background while keeping most signal events where one of the LLPs decays in the ATLAS calorimeters and the other one decays between the IP and the outer edge of the calorimeters. The low-E T NN product is defined as the product of the two highest low-E T NN signal-scores for clean jets in a given event. The high-E T NN product is defined analogously. The requirements shown in table 2 are applied for the low-E T and high-E T selections.
The signal efficiency for signal events to pass the low-and high-E T selections depends on the momenta, decay positions and decay products of the two LLPs in the event. For

Background estimation
A data-driven ABCD method is used to estimate the contribution from the dominant background (SM multijet events) to the final selection. The ABCD method relies on the assumption that the distribution of background events can be factorised in the plane of two relatively statistically independent variables. In this plane, the method uses three control regions (B, C and D) to estimate the contribution of background events in the search region (A). In the case of no signal contamination in regions B, C and D, the number of background events in region A can be predicted using N A = (N B · N C )/N D , where N X is the number of background events in region X. In reality, there is non-zero signal contamination in the control regions. This is accounted for by using a modified ABCD method, which involves fitting to background and signal models simultaneously. The background component of the yield in each of the regions A, B, C and D is constrained to obey the standard ABCD relation, within the bounds of the ABCD method uncertainty (described below), while a signal strength parameter uniformly scales the signal yield in each region. Events passing the high-E T and low-E T selections defined in the previous section are divided into four subregions according to two variables: ∆R min (jet, tracks) and high-E T BDT or low-E T BDT, depending on the selection. The variables are uncorrelated (Pearson correlation coefficient |r| < 0.03 in the main dataset after the event cleaning, with additional tests for correlations described below) and have good separation between signal and multijet background, as shown in figure 8.
Region A is defined by high-E T BDT ≥ 0.36 and ∆R min ≥ 1.5 for the high-E T analysis and by low-E T BDT ≥ 0.27 and ∆R min ≥ 1.0 for the low-E T analysis. Regions B, C, and D are obtained by reversing one or both of these selections.
In cases where more than one background population is included in the final selection, the condition that the two variables defining the plane are statistically independent is only guaranteed if their contributions have the same shape in the ABCD plane. In this search there are three major sources of background (BIB, cosmics and SM multijets) with different distributions in the plane, which results in the two variables defining it having some correlation. It is therefore necessary to make sure that the selection above has a high rejection power for two of the three sources. Specifically, the contribution from BIB and cosmics in the ABCD plane must be negligible after the object and event selection requirements, leaving only the contribution of SM multijet events to be estimated by the ABCD method.
Two checks were performed to confirm that the contribution of background events from non-collision background is negligible after the selection.

JHEP06(2022)005
First, the number of events satisfying each stage of the selection for the main dataset and the BIB dataset is shown in table 3 for the high-E T and low-E T selections, along with the fraction of signal events passing each cut for several benchmark samples. For both the high-E T and low-E T selections, the number of BIB events satisfying all selection criteria is well within the uncertainty in the number of events passing all selections in the main dataset. Considering that the efficiency of BIB identification in this dataset is approximately 70%, the BIB contamination in the main dataset can be considered negligible. Furthermore, the events from the BIB dataset that pass the selection were checked and found to display properties of multijet events. In particular, their φ and z vs time distributions do not show the typical shape of BIB. The events from the main dataset that pass the event cleaning were also checked and were found not to display the properties of BIB.
The second check is to confirm that the contamination from cosmic rays in the ABCD plane is negligible. This is done using the cosmics dataset. The number of events in this dataset passing the full selections is checked after weighting by the following two factors. The first factor takes into account the difference between the number of bunch crossings where protons are present in both beams while the CalRatio triggers were enabled, and the number of empty bunch crossings, during which the cosmics dataset was collected. The protons-toempty live-time ratio depends on the beam conditions, and its value lies in the range 2 to 3.5. The second factor takes into account the fact that events in the cosmics dataset will have no related collision activity, and therefore will be largely trackless: events in the cosmics dataset will be far more likely than events from the main dataset to pass the requirements on jet ∆R min at the HLT and in the analysis preselection. The second factor is calculated as the ratio of the number of events entering the ABCD plane in the main dataset to the number passing the selection if all tracks in the event are ignored. This factor is <0.1 in all data-taking periods. The final estimated number of events in the ABCD plane is 2.5 ± 0.8 (none of which are in region A) in the high-E T selection and 1.8 ± 0.5 (0.3 ± 0.2 in region A) in the low-E T selection. These yields are negligible in comparison with the expected number of data events in the ABCD plane in the main dataset, which is over 300 in both selections.
The validity of the ABCD method is tested by applying it to a number of validation regions (VRs), which are orthogonal to region A. A first set of validation regions is defined using the nominal event selections but looking only into part of regions B, C and D. Restricting the VR ABCD plane to intermediate values of the BDT output (using part of nominal regions C and D) allows a test of the background estimation method in the whole ∆R min range in the absence of signal contamination. Likewise, restricting the VR ABCD plane to low values of ∆R min and allowing any BDT output value (using part of nominal regions B and D) permits a test of the method at large values of the BDT output. As an example, the nominal high-E T event selection is validated in region VRCD high-ET , defined using the nominal high-E T selection but restricted to the range 0.0 < high-E T BDT < 0.3 and 0.5 < ∆R min < 5.0. First, the ABCD method is tested by dividing this region into four subregions defined by the boundaries of high-E T BDT = 0.15 and ∆R min = x, where the value of x is varied from 1 to 4. Then the ABCD method is tested again in this same region by setting the ∆R min (jet, tracks) boundary at 1.5 and allowing the high-E T BDT boundary to take values from 0.05 to 0.25.   Table 3. Sequential impact of each requirement on the number of events passing the selection for the high-E T (top table) and low-E T (bottom table) selections. The signal columns represent the cumulative fraction of events passing the selection than the number of events.
The ABCD plane defined in this VR can be seen in figure 9 for VRCD high-ET and the low-E T selection, VRCD low-ET , along with the level of agreement between the expected and observed numbers of events in region A in all the tested boundary selections. It should be noted that the tests in each VR are statistically correlated. Therefore, statistical fluctuations can affect several tests.
A second set of VRs is defined by inverting the selection on the product of the signal NN scores of the two CalRatio jets in the analysis selections. Given that signal contamination is low in all the validation regions, the plain ABCD method assuming no signal is applied. The statistical precision of this closure test is better than the final statistical uncertainty of the number of events observed in region A. Good closure, within one standard deviation, is observed in all the validation regions defined in this section.

Systematic uncertainties
The uncertainty in the data-driven ABCD method for the background estimate is studied in the dijet control region. In this control region, an alternative ABCD plane is defined using the same variables as in the analysis, but adjusting the boundaries in regions A, B, C and D to reduce the effect of statistical fluctuations in the estimation of the number of dijet events in region A by this method. The observed number of events in region A is compared with the estimate given by the ABCD method for both the high-E T and the low-E T ABCD planes. In all the tests performed, the observed and expected numbers of events agree within the statistical uncertainties. This confirms that the two variables -21 -

JHEP06(2022)005
forming the plane are statistically independent and therefore no systematic uncertainty is added for the background estimation method.
The uncertainty in the combined 2015-2018 integrated luminosity is 1.7% [67], obtained using the LUCID-2 detector [68] for the primary luminosity measurements.
Events in MC simulation are reweighted to obtain the correct pile-up distribution. The uncertainty in the pile-up reweighting of the reconstructed events in the MC simulation is estimated by comparing the distribution of the number of primary vertices in the MC simulation with the distribution in data as a function of the instantaneous luminosity. Differences between these distributions are reduced by scaling the mean number of pp interactions per bunch crossing in the MC simulation and the ±1σ uncertainties are assigned to these scaling factors [67,68]. The effect on the signal event yields varies between 2% and 8% depending on the signal model.
The jet energy scale and jet energy resolution introduce uncertainties in the signal yield of <1% to 5% and 1% to 6%, respectively, depending on the signal model. These uncertainties are calculated using the procedure detailed in ref. [69]. Since the jets used in this analysis are required to have a low fraction of calorimeter energy in the ECal, the jet energy uncertainties are rederived as a function of ECal energy fraction as well as η. The additional jet energy uncertainties are found to have an effect of 1% to 6% on the signal yield, and are conservatively taken in quadrature with the regular jet-energy uncertainties. The low-E T models are more sensitive to all jet energy uncertainties than the high-E T models.
The uncertainty in the signal trigger efficiency is estimated by studying how well modelled the three most important trigger variables (jet E T , log 10 (E H /E EM ), and p T of tracks within the jet) are between HLT-reconstructed quantities and offline-reconstructed quantities in data and MC simulation. A tag-and-probe technique is applied to a pure sample of multijet events obtained using standard jet triggers in both data and MC simulation. Scale factors that represent the degree of mismodelling in each variable are derived and then applied in an emulation of the CalRatio triggers. The change in yield relative to the nominal (unscaled) trigger emulation after the full analysis selection is taken as the size of the systematic uncertainty, which is between 1% and 7% depending on the signal model.
A systematic uncertainty is included to account for potential mismodelling of input variables used in the machine-learning techniques applied in the analysis. Using the same control sample of dijet events defined for the evaluation of the systematic uncertainty in the data-driven background estimate, the distributions of the inputs and outputs of the per-jet NNs and the per-event BDTs are studied. They are found to agree fairly well between data and MC simulation. The residual differences are translated into a systematic uncertainty in the signal efficiency, using the following procedure. For each mis-modeled variable, the residual differences are quantified through a transfer factor between simulation and data. In an ensemble of pseudo-experiments, the NN and BDT input variables for each signal event are varied by this transfer factor; in each pseudo-experiment the transfer factor is modulated by a random Gaussian, with a mean of zero and a width determined by measurements in the control region. The final per-event BDTs are then re-evaluated, and the overall signal -22 -JHEP06(2022)005 efficiency of the sample is evaluated for each pseudo-experiment. This sequence of steps allows for the statistical determination of the systematic uncertainty in the final signal efficiency. The value of the resulting uncertainty can thus be obtained from the distributions of efficiencies for the ensemble, and can be as large as 6% depending on the signal model.
Finally, an uncertainty due to the NLO-reweighting of the signal samples is obtained by comparing the NLO MadGraph predictions for a 125 GeV Higgs boson mediator with predictions at next-to-next-to-leading-order accuracy in QCD from Powheg Box v2 [70][71][72][73][74]. This results in an additional signal efficiency uncertainty of 1%-7% for most samples.

Results and statistical interpretation
A data-driven background estimation and signal hypothesis test is performed simultaneously in all regions. The procedure for the simultaneous fit is explained in detail in the previous iteration of this analysis [37] and is summarised in the following. A profile likelihood function is constructed from the product of the probabilities of observing a given number of events in each region of the ABCD plane, given the expected number of events in that region. This expected number of events is given by the sum of the predicted signal yield in that region (scaled by a parameter of interest called the signal strength) and the expected number of background events. The background component of the expected yield in each region is constrained to satisfy the ABCD relation introduced in section 5. The introduction of a signal component will therefore dynamically modify the remaining allowed background prediction in this set-up. Additionally, a nuisance parameter which represents the total uncertainty in the signal efficiency, and consequently the signal yield, is introduced.
The background estimates before (a priori) and after (a posteriori) unblinding region A are shown in table 4. After unblinding, the modified ABCD method takes account of the observed number of events in region A when making the final fit. As a consequence, the background estimate changes relative to the a priori case, since the fit now uses all available information to determine the expected background in each region. As shown in the table, there is a slight excess in the observed number of events in region A over the a priori expected background in both the high-and low-E T selections. In the a priori fit, these excesses correspond to background-only hypothesis p-values of 0.083 and 0.076 in the high-E T and low-E T selections, respectively. The significance is reduced to slightly under one standard deviation in the a posteriori (background-only fit) estimate.
The CL s method [75], using the "alternative test statistic"q [76] is used to set upper limits on the production cross-section times BR of the LLP signals considered. The pyhf [77,78] framework is used to implement the likelihood function and extract the upper limits. An asymptotic approximation [76] is used for these results. This approximation was tested against the full frequentist pseudo-experiment-based method for a variety of signal samples and was found to give consistent limits. Since each signal sample was generated for a given lifetime assumption, it is necessary to extrapolate the limits across lifetimes. This extrapolation is performed using a reweighting method, which is described in detail in the previous iteration of this analysis [37] and summarised briefly here. LLPs follow an exponential decay distribution, where the decay constant is given by the mean lifetime τ gen -23 -

JHEP06(2022)005
High-E T selection A B C D · N bkg C )/N bkg D relation. The a posteriori estimate refers to the 'post-unblinding' case, including the observed data in region A in the background-only global fit, obtained by fixing the signal strength to 0 (background-only fit) or allowing it to float (signalplus-background fit). The table also shows one set of representative signal yields in each selection for the signal-plus-background fit. Only statistical uncertainties are included in the quoted error of the background, while the uncertainties in the signal include those from both statistical and experimental sources. in a given sample. To extrapolate to a different mean proper lifetime τ new , a weight w is calculated as a function of the given particle's proper decay time t: The per-event weight is the product of the weights obtained for each LLP in a given event. Figure 10 shows a summary of the observed limits for the samples where the mediator is assumed to be the SM Higgs boson, with mass 125 GeV. SM Higgs boson BRs to long-lived neutral scalars above 10% are excluded for cτ values between a few centimetres and about 20 metres, depending on the mass of the LLP. Figure 11 shows a summary of observed limits  The analysis improves on the limits from the previous iteration of the analysis for many models. The improvements are typically between a factor of two and a factor of five. The low-E T models are the ones for which the limits are most improved. The previous analysis used only a fraction of the 13 TeV dataset: 11 fb −1 and 33 fb −1 for the analysis of low-E T and high-E T models, respectively. Part of the gain is therefore explained by the increase in the integrated luminosity of the data sample. The rest of the gain, particularly for the low-E T selection, comes from improvements in the low-E T trigger and in the displaced-jet identification.   In this paper, an analysis targeting pair-produced neutral long-lived particles decaying in the ATLAS calorimeters is presented. The analysis used the full LHC Run 2 dataset of 13 TeV proton-proton collisions, the size of which is 139 fb −1 . The analysis is structured in a very similar way to the one described in a previous paper using only the 2016 dataset but has several improvements, including a new method to select the most signal-like displaced jets with the help of a deep neural network trained using an adversary network. No significant excess of events in the signal region is observed relative to the data-driven background prediction. The CL s method is used to set 95% CL limits on the cross-section times branching ratio as a function of c times the long-lived particle mean proper lifetime. Limits are set on hidden-sector benchmark models with mediator masses ranging from 60 GeV to 1 TeV and long-lived scalar masses ranging from 5 GeV to 475 GeV. The improvements to the analysis and additional data lead to an improvement on the limits for mediator masses above and below 200 GeV by a factor of around 1.5-2 and 3-5, respectively, compared with the previous version of the analysis. For models with a SM Higgs boson mediator, branching ratios to neutral scalars above 10% are excluded for long-lived particle mean proper lifetimes times c between approximately 20 mm and 10 m, depending on the model.
Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.   -31 -