Search for heavy resonances decaying into a pair of $Z$ bosons in the $\ell^+\ell^-\ell'^+\ell'^-$ and $\ell^+\ell^-\nu\bar\nu$ final states using 139 fb$^{-1}$ of proton-proton collisions at $\sqrt{s} = 13$ TeV with the ATLAS detector

A search for heavy resonances decaying into a pair of $Z$ bosons leading to $\ell^+\ell^-\ell'^+\ell'^-$ and $\ell^+\ell^-\nu\bar\nu$ final states, where $\ell$ stands for either an electron or a muon, is presented. The search uses proton-proton collision data at a centre-of-mass energy of 13 TeV collected from 2015 to 2018 that corresponds to the full integrated luminosity of 139 fb$^{-1}$ recorded by the ATLAS detector during Run 2 of the Large Hadron Collider. Different mass ranges spanning 200 GeV to 2000 GeV for the hypothetical resonances are considered, depending on the final state and model. In the absence of a significant observed excess, the results are interpreted as upper limits on the production cross section of a spin-0 or spin-2 resonance. The upper limits for the spin-0 resonance are translated to exclusion contours in the context of Type-I and Type-II two-Higgs-doublet models, and the limits for the spin-2 resonance are used to constrain the Randall--Sundrum model with an extra dimension giving rise to spin-2 graviton excitations.


Introduction
The discovery of a scalar particle by the ATLAS and CMS collaborations [1,2] in 2012, with measured properties [3-7] consistent with those of the Standard Model (SM) [8-10] Higgs boson, was a major milestone in the understanding of electroweak symmetry breaking [11][12][13]. One important question is whether the discovered particle is part of an extended scalar sector as postulated by various extensions to the Standard Model such as the two-Higgs-doublet model (2HDM) [14]. These extensions predict additional Higgs bosons, motivating searches in an extended mass range. This paper reports on two searches for heavy resonances decaying into two SM bosons, encompassing the final states produced from the subsequent → ℓ + ℓ − ℓ + ℓ − and → ℓ + ℓ −¯d ecays, where ℓ stands for either an electron or a muon and stands for all three neutrino flavours. The data employed were recorded by the ATLAS detector between 2015 and 2018 in proton-proton collisions at √ = 13 TeV and correspond to an integrated luminosity of 139 fb −1 . The additional Higgs boson (spin-0 resonance), denoted by throughout this paper, is assumed to be produced mainly via gluon-gluon fusion (ggF) and vector-boson fusion (VBF) processes with the ratio of the two production mechanisms unknown in the absence of a specific model. The results are interpreted separately for the ggF and VBF production modes, with events being classified into ggF-and VBF-enriched categories in both final states, as discussed in Sections 5 and 6. The searches cover a wide mass range from 200 GeV up to 2000 GeV and look for an excess in the distribution of the the four-lepton invariant mass, 4ℓ , for the ℓ + ℓ − ℓ + ℓ − final state, and the transverse mass, T , for the ℓ + ℓ −¯fi nal state, as the escaping neutrinos do not allow the full reconstruction of the final state. This mass range is chosen based on the sensitivity of the analysis as determined by the selection criteria and the size of the data sample.
The transverse mass is defined as: where is the mass of the boson [15], ì T ℓℓ and ì miss T are the transverse momentum of the lepton pair and the missing transverse momentum with magnitudes of ℓℓ T and miss T , respectively. In the absence of such an excess, limits on the production rate of different signal hypotheses are obtained from a simultaneous likelihood fit in the two final states. The hypothesis of a heavy Higgs boson in the narrow-width approximation (NWA) is studied. The upper limits on the production rate of a heavy Higgs boson are also translated into exclusion contours in the context of the two-Higgs-doublet model. As several theoretical models favour non-negligible natural widths, large-width assumption (LWA) models [14], assuming widths of 1%, 5%, 10% and 15% of the resonance mass, are examined only for ggF production, which dominates in many scenarios over the next-largest contribution (VBF) in the search range. Results are also interpreted assuming the bulk Randall-Sundrum (RS) model [16,17] with a warped extra dimension giving rise to a spin-2 Kaluza-Klein (KK) excitation of the graviton KK .
The main improvements relative to the previous search [18] are the following: i) full LHC Run 2 integrated luminosity is used; ii) both analyses profit from improved lepton reconstruction and isolation selection to mitigate the impact of additional pp interactions in the same or neighbouring bunch crossing (pile-up); iii) the reconstruction of jets uses a particle-flow algorithm which combines measurements from the tracker and the calorimeter; iv) the normalisation of the SM background is derived from data rather than being estimated from SM predictions; v) event classification targeting different production processes is optimised using machine learning (ML) algorithms in the case of → ℓ + ℓ − ℓ + ℓ − final state; vi) the T distribution is used to search for signals in the VBF-enriched category in the case of the → ℓ + ℓ −¯fi nal state, in addition to the use of T in the ggF-enriched category; and vii) the search range is extended to 2000 GeV in signal mass. The improved analyses reduce the expected upper limit on the production cross section of an additional heavy resonance by up to 40% in comparison with the previous published result scaled to the full Run 2 luminosity. Results of a similar search from a subset of data collected at the LHC with √ = 13 TeV have been reported by the CMS Collaboration in Ref. [19].
The paper is organised as follows. A brief description of the ATLAS detector is given in Section 2.
In Section 3 the data and simulated samples are described. The object reconstruction is described in Section 4. The analysis strategies for the ℓ + ℓ − ℓ + ℓ − and ℓ + ℓ −¯fi nal states are described in Sections 5 and 6, respectively. Section 7 describes the systematic uncertainties, Section 8 the final results, and Section 9 the interpretation of these results in the various models.

ATLAS detector
The ATLAS experiment is described in detail in Ref. [20]. ATLAS is a multipurpose detector with a forward-backward symmetric cylindrical geometry and a solid-angle 1 coverage of nearly 4 . The inner tracking detector (ID), covering the region | | < 2.5, consists of a silicon pixel detector, a silicon microstrip detector, and a transition-radiation tracker. The innermost layer of the pixel detector, the insertable B-layer [21], was installed between Run 1 and Run 2 of the LHC. The inner detector is surrounded by a thin superconducting solenoid providing a 2 T magnetic field, and by a finely segmented lead/liquid-argon (LAr) electromagnetic calorimeter covering the region | | < 3.2. A steel/scintillator-tile hadron calorimeter provides coverage in the central region | | < 1.7. The endcap and forward regions, covering the pseudorapidity range 1.5 < | | < 4.9, are instrumented with LAr electromagnetic and hadron calorimeters, with steel, copper, or tungsten as the absorber material. A muon spectrometer (MS) system incorporating large superconducting toroidal air-core magnets surrounds the calorimeters. Three layers of precision wire chambers provide muon tracking in the range | | < 2.7, while dedicated fast chambers are used for triggering in the region | | < 2.4. The trigger system, composed of two stages, was upgraded [22] before Run 2. The first stage, implemented with custom hardware, uses information from the calorimeters and muon chambers to select events from the 40 MHz bunch crossings at a maximum rate of 100 kHz. The second stage, called the high-level trigger (HLT), reduces the data acquisition rate to about 1 kHz on average. The HLT is software-based and runs reconstruction algorithms similar to those used in the offline reconstruction.

Data and simulation
The proton-proton ( ) collision data used in these searches were collected by the ATLAS detector at a centre-of-mass energy of 13 TeV with a 25 ns bunch-spacing configuration from 2015 to 2018. The data are subjected to quality requirements: if any relevant detector component was not operating correctly during the period in which an event was recorded, the event is rejected. The efficiency for recording good-quality data during Run 2 is 95.6% [23].
Simulated events are used to determine the signal acceptance and some of the background contributions.
The events produced by each Monte Carlo (MC) event generator were processed through the ATLAS detector simulation [24] within the G 44 framework [25]. Additional inelastic interactions were overlaid on the simulated signal and background events. The MC event generator used for pile-up is P 8.186 [26] with the A2 set of tuned parameters [27] and the MSTW2008LO [28] parton distribution 1 The ATLAS experiment uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the -axis along the beam pipe. The -axis points from the IP to the centre of the LHC ring, and the -axis points upward. Cylindrical coordinates ( , ) are used in the transverse plane, being the azimuthal angle around the -axis. The pseudorapidity is defined in terms of the polar angle as = − ln tan( /2). function (PDF) set. The simulated events are weighted to reproduce the observed distribution of the mean number of interactions per bunch crossing in data (pile-up reweighting).
Heavy spin-0 resonance production was simulated using the P B v2 [29] MC event generator. The gluon-gluon fusion and vector-boson fusion production modes were simulated separately, with matrix elements calculated to next-to-leading-order (NLO) accuracy in quantum chromodynamics (QCD). P B was interfaced to P 8.212 [30] for parton showering and hadronisation with the AZNLO set of tuned parameters [31], and for decaying the Higgs boson into the → → ℓ + ℓ − ℓ + ℓ − or → → ℓ + ℓ −¯fi nal states. The event generator was interfaced to the E G v1.2.0 program [32] for the simulation of bottom and charm hadron decays. The leading-order (LO) CT10 PDF set [33] was used for the hard-scattering process. Events from ggF and VBF production were generated in the resonance mass range of 300 GeV to 2000 GeV in the NWA, using a step size of 100 GeV up to 1000 GeV and 200 GeV above. For the ℓ + ℓ − ℓ + ℓ − final state, due to the sensitivity of the analysis at lower masses, events were also generated for = 200 GeV. In addition, events from ggF heavy Higgs production with a width of 15% of the Higgs boson mass were generated at NLO accuracy in QCD with M G 5_ MC@NLO v2.3.2 [34], which was interfaced to P 8.210 for parton showering and hadronisation with the A14 set of tuned parameters (A14 tune) [35], and for decaying the Higgs boson into the two leptonic final states. The properties of bottom and charm hadron decays were simulated by E G v1.2.0. Events were generated in the resonance mass range of 400 GeV to 2000 GeV using a step size of 100 (200) GeV up to (above) 1000 GeV. Similarly, events with a width of 5% or 10% of = 900 GeV were generated for validating the analytic parametrisation of the 4ℓ distribution used in the ℓ + ℓ − ℓ + ℓ − final state as described in Section 5.3. For the ℓ + ℓ −¯fi nal state, a reweighting procedure as described in Section 6.3 is used on fully simulated events to obtain the reconstructed T distribution at any value of mass and width tested.
Spin-2 Kaluza-Klein gravitons from the bulk Randall-Sundrum model [17,36] were generated with M G 5_ MC@NLO at LO accuracy in QCD with the NNPDF2.3 LO PDF set with s = 0.130 [37], which is then interfaced to P 8.210 for parton showering and hadronisation with the A14 tune and for decaying the heavy resonance into the two leptonic final states. The properties of bottom and charm hadron decays were simulated by E G v1.2.0. The dimensionless coupling / Pl , where 8 is the reduced Planck scale and is the curvature scale of the extra dimension, is set to 1. The width of the resonance is correlated with the coupling / Pl and in this configuration it is around ∼ 6% of its mass. Mass points between 600 GeV and 2 TeV with 200 GeV spacing were generated for both final states.

The¯→
background was simulated by the S v2.2.2 [38] generator, in which the NNPDF3.0 NNLO PDF set [37] was used for the hard-scattering process, achieving NLO accuracy in the matrix-element calculation for 0-and 1-jet final states and LO accuracy for 2-and 3-jet final states with the C [39] and O L [40][41][42] matrix-element generators. The merging with the S parton shower [43] was performed using the MEPS@NLO prescription [44]. NLO electroweak (EW) corrections were applied as a function of 4ℓ for the ℓ + ℓ − ℓ + ℓ − final state [45,46], and as a function of the transverse momentum of the boson that decays into two neutrinos for the ℓ + ℓ −¯fi nal state [40,[47][48][49][50]. The EW production of a pair and two additional jets via vector-boson scattering up to ( 6 EW ) was generated using S v2.2.2 for both the ℓ + ℓ − ℓ + ℓ − and ℓ + ℓ −¯fi nal states, where the process → 4ℓ is also taken into account. In addition, the diboson events from both QCD and EW production, with the subsequent leptonic decays of both the and bosons, were simulated by S with a similar set-up. The events with boson decaying leptonically and boson decaying hadronically were modelled with S v2.2.1.
The → process was modelled by S v2.2.2 at LO accuracy in QCD for both final states, including the off-shell SM ℎ boson contribution and the interference between the ℎ and processes. The higher-order correction factor accounting for up to NLO accuracy in QCD for the → continuum production was calculated for massless quark loops [51][52][53] in the heavy-top-quark approximation [54], including the → ℎ * → process [55]. Based on these studies, a constant factor of 1.7 is used, and a relative uncertainty of 60% is assigned to the normalisation in both searches.
For the ℓ + ℓ −¯fi nal state, the contribution from production was removed in the S simulation of the¯→ and → processes by requiring the charged leptons and the neutrinos to have different lepton flavours. The¯→ and → processes were then modelled with P B v2 and S v2.2.2, respectively. The interference between and production is expected to be negligible [48] and is therefore not considered.
Events containing a single boson with associated jets were simulated using the S v2.2.1 event generator. Matrix elements were calculated for up to two partons at NLO and four partons at LO using the C and O L matrix-element generators and merged with the S parton shower using the MEPS@NLO prescription. The NNPDF3.0 NNLO PDF set was used in conjunction with dedicated parton-shower tuning developed by the S authors. The + jets events are normalised using the NNLO cross sections [56].
The triboson backgrounds , , and with fully leptonic decays and at least four prompt charged leptons were modelled using S v2.2.2 with LO accuracy of the QCD calculations and the CT10 PDF set. The simulation of¯+ production ( = or ) with both top quarks decaying semileptonically and the vector boson decaying inclusively was performed with M G 5_ MC@NLO interfaced to P 8.210 for parton showering and hadronisation with the A14 tune and to E G v1.2.0 for the simulation of bottom and charm hadron decays. The total cross section is normalised to the prediction of Ref. [57], which includes the two dominant terms at both LO and NLO in a mixed perturbative expansion in the QCD and EW couplings. The¯background, as well as single-top and production, were modelled using P B v2 interfaced to P 8.230 with the A14 tune and to E G v1.6.0 for the simulation of bottom and charm hadron decays.
In order to study the interference treatment for the LWA case, samples containing the → continuum background (B) as well as its interference (I) with a hypothetical heavy Higgs signal (S) were used and are referred to as SBI samples hereafter. In the ℓ + ℓ − ℓ + ℓ − final state the MCFM NLO event generator [58], interfaced to P 8.212, was used to produce SBI samples where the width of the heavy scalar is set to 15% of its mass, for masses of 200, 300, 400, 500, 600, 800, 1000, 1200, and 1400 GeV. Background-only samples were also generated with the MCFM event generator, and are used to extract the signal-plusinterference term (SI) by subtracting them from the aforementioned SBI samples. For the ℓ + ℓ −¯fi nal state, the SBI samples were generated with the 2VV event generator [59,60]. The samples include signal events with a scalar mass of 400, 700, 900, 1200, and 1500 GeV.

Event reconstruction
Electron reconstruction uses a dynamic, topological calorimeter-cell clustering-based approach which allows improved measurement of the electron energy, particularly in situations where an electron radiates a bremsstrahlung photon; details can be found in Ref. [61]. Electron candidates are clusters of energy deposits in the calorimeter associated with ID tracks, where the final track-cluster matching is performed after the tracks have been fitted with a Gaussian-sum filter (GSF) [62] to account for bremsstrahlung energy losses. The electron's transverse momentum is computed from the cluster energy and the track direction at the interaction point. Background rejection relies on the longitudinal and transverse shapes of the electromagnetic showers in the calorimeters, track-cluster matching, and properties of tracks in the ID. All of this information, except for that related to track hits, is combined into a likelihood discriminant. The selection combines the likelihood with the number of track hits and defines several working points (WP). Selected electrons have T > 4.5 GeV and | | < 2.47. The ℓ + ℓ − ℓ + ℓ − analysis uses a 'loose' WP, with an efficiency of at least 90% for electrons with T > 30 GeV [63]. The 'medium' WP (with an efficiency about 85% for electrons with T > 30 GeV) is adopted to select candidate electrons in the ℓ + ℓ −¯a nalysis.
Muons are formed from tracks reconstructed in the ID and MS, and their identification is primarily based on the presence of the track or track segment in the MS [64]. If a complete track is present in both the ID and the MS, a combined muon track is formed by a global fit using the hit information from both the ID and MS detectors (combined muon); otherwise the momentum is measured using the ID, and the MS track segment serves as identification (segment-tagged muon). The segment-tagged muon is limited to the centre of the barrel region (| | < 0.1) which has reduced MS geometrical coverage. Furthermore, in this central region an ID track with T > 15 GeV is identified as a muon if its calorimetric energy deposition is consistent with a minimum-ionising particle (calorimeter-tagged muon). In the forward region (2.5 < | | < 2.7) with limited or no ID coverage, the MS track formed out of three MS layers is either used alone (stand-alone muon) or combined with silicon-detector hits, if found in the forward ID (combined muon). The ID tracks associated with the muons are required to have at least a minimum number of associated hits in each of the ID subdetectors to ensure good track reconstruction. The minimum T for muon candidates is 5 GeV, while the maximum | | is 2.7. A 'loose' muon identification WP, which uses all muon types, is adopted by the ℓ + ℓ − ℓ + ℓ − analysis. This criterion has an efficiency of at least 98% [64] for isolated muons with T = 5 GeV and rises to 99.5% at higher T . For the ℓ + ℓ −¯a nalysis a 'medium' WP is used, which only includes combined muons and has an efficiency of 98%.
The reconstruction of jets uses a particle-flow algorithm [65] which combines measurements from both the tracker and the calorimeter. The energy deposited in the calorimeter by all charged particles is removed, and the jet reconstruction is performed on an ensemble of 'particle-flow objects' consisting of the remaining calorimeter energy and tracks which are matched to the hard interaction. This improves the accuracy of the charged-hadron measurement, while retaining the calorimeter measurements of neutral-particle energies. Compared to only using topological clusters [66], jets reconstructed with the particle-flow algorithm with T of about 30 GeV have approximately 10% better transverse momentum resolution. The two different algorithms have similar resolutions for T above 100 GeV. Particle-flow jets are reconstructed using the anti-algorithm [67] with a radius parameter = 0.4. The jet four-momentum is corrected for the calorimeter's non-compensating response, signal losses due to noise threshold effects, energy lost in non-instrumented regions, and contributions from pile-up [68]. The jets used are required to satisfy T > 30 GeV and | | < 4.5. Jets from pile-up with | | < 2.5 are suppressed using a jet-vertex-tagger multivariate discriminant [69,70].
Jets containing -hadrons, referred to as -jets, are identified by the long lifetime, high mass, and decay multiplicity of -hadrons, as well as the hard -quark fragmentation function. The ℓ + ℓ −¯a nalysis identifies -jets of T > 20 GeV and | | < 2.5 using an algorithm that achieves an identification efficiency of about 85% in simulated¯events, with a rejection factor for light-flavour jets of about 30 [71].
Selected events are required to have at least one vertex having at least two associated tracks with T > 500 MeV, and the primary vertex is chosen to be the vertex reconstructed with the largest 2 T of its associated tracks. As lepton and jet candidates can be reconstructed from the same detector information, a procedure to resolve overlap ambiguities is applied. In the ℓ + ℓ − ℓ + ℓ − case, the overlap ambiguities are resolved as follows. If two electrons have overlapping energy deposits, the electron with the higher T is retained. If a reconstructed electron and muon share the same ID track, the muon is rejected if it is calorimeter-tagged; otherwise the electron is rejected. Reconstructed jets geometrically overlapping in a cone of size Δ = 0.2 with electrons or muons are also removed. The overlap removal in the ℓ + ℓ −¯c ase is similar to that in the ℓ + ℓ − ℓ + ℓ − case, except for an additional criterion that removes any leptons close to the remaining jets with 0.2 < Δ < 0.4. This additional criterion is not imposed in the ℓ + ℓ − ℓ + ℓ − case due to the cleaner environment of this final state and in order to maximise the signal efficiency.
The missing transverse momentum ì miss T , which accounts for the imbalance of visible momenta in the plane transverse to the beam axis, is computed as the negative vector sum of the transverse momenta of all identified electrons, muons and jets, as well as a 'soft term', accounting for unclassified soft tracks and energy clusters in the calorimeters [72]. This analysis uses a track-based soft term, which is built by combining the information provided by the ID and the calorimeter, in order to minimise the effect of pile-up, which degrades the miss T resolution.

Event selection and categorisation
In Section 5.1.1 the four-lepton event selection is described. After this selection, events are further split into several categories, in order to probe different signal production modes, such as VBF production and ggF production. To enhance the search sensitivity to the NWA signals, multivariate classifiers are optimised for the event categorisation as described in Section 5.1.2. In order to also obtain results that are more model-independent (since the training of the multivariate classifiers is usually based on a specific signal model), a cut-based event categorisation that enhances the sensitivity in the VBF production mode is also considered and is described in Section 5.1.3.
In the search for LWA signals, due to the complexity of modelling the categorisation of the interference between heavy Higgs boson and SM Higgs boson processes, only the ggF-enriched categories of the cut-based analysis (CBA) are used. The same strategy is adopted in the search for a Kaluza-Klein graviton excitation.

Common event selection
Four-lepton events are selected and initially classified according to the lepton flavours: 4 , 2 2 , 4 , called 'channels' hereafter. They are selected using a combination of single-lepton, dilepton and trilepton triggers with different transverse momentum thresholds. The single-lepton triggers with the lowest T thresholds had tighter requirements than the high T threshold single-lepton triggers and the multilepton triggers. Due to an increasing peak luminosity, these T thresholds increased during the data-taking periods [73,74]. For single-muon triggers, the T threshold increased from 20 GeV to 26 GeV, while for single-electron triggers, the T threshold increased from 24 GeV to 26 GeV. The overall trigger efficiency for signal events passing the final selection requirements is about 98%.
In each channel, four-lepton candidates are formed by selecting a lepton-quadruplet made out of two same-flavour, opposite-sign lepton pairs, selected as described in Section 4. Each electron (muon) must satisfy T > 7 (5) GeV and be measured in the pseudorapidity range of | | < 2.47 (2.7). The highest-T lepton in the quadruplet must satisfy T > 20 GeV, and the second (third) lepton in T order must satisfy T > 15 GeV (10 GeV). In the case of muons, at most one calorimeter-tagged, segment-tagged or stand-alone (2.5 < | | < 2.7) muon is allowed per quadruplet.
If there is ambiguity in assigning leptons to a pair, only one quadruplet per channel is selected by keeping the quadruplet with the invariant mass of the lepton pairs closest (leading pair) and second closest (subleading pair) to the boson mass [15], with invariant masses referred to as 12 and 34 respectively. In the selected quadruplet, 12 must satisfy 50 GeV < 12 < 106 GeV and 34 must satisfy 50 GeV < 34 < 115 GeV.
Selected quadruplets are required to have their leptons separated from each other by Δ > 0.1. For 4 and 4 quadruplets, if an opposite-charge same-flavour lepton pair is found with ℓℓ below 5 GeV, the quadruplet is removed to suppress the contamination from / mesons. If multiple quadruplets from different channels are selected at this point, only the quadruplet from the channel with the highest signal acceptance is retained, in the order: 4 , 2 2 , 4 .
The + jets and¯background contributions are reduced by imposing impact-parameter requirements as well as track-and calorimeter-based isolation requirements on the leptons. The transverse impact-parameter significance, defined as the impact parameter calculated relative to the measured beam-line position in the transverse plane divided by its uncertainty, | 0 |/ 0 , for all muons (electrons), is required to be lower than 3 (5). The track-isolation discriminant is calculated from the tracks with T > 500 MeV that lie within a cone of Δ = 0.3 around the muon or electron and that either originate from the primary vertex or have a longitudinal impact parameter 0 satisfying | 0 sin( )| < 3 mm if not associated with any vertex. Above a lepton T of 33 GeV, the cone size falls linearly with T to a minimum cone size of 0.2 at 50 GeV. Similarly, the calorimeter isolation is calculated from the positive-energy topological clusters that are not associated with a lepton track in a cone of Δ = 0.2 around the muon or electron. The sum of the track isolation and 40% of the calorimeter isolation is required to be less than 16% of the lepton T . The calorimeter isolation is corrected for electron shower leakage, pile-up, and underlying-event contributions. Both isolations are corrected for track and topological cluster contributions from the remaining three leptons. The pile-up dependence of this isolation selection is reduced compared with that of the previous search by optimising the criteria used for exclusion of tracks associated with a vertex other than the primary vertex and by the removal of topological clusters associated with tracks.
An additional requirement based on a vertex-reconstruction algorithm, which fits the four-lepton candidates with the constraint that they originate from a common vertex, is applied in order to further reduce the + jets and¯background contributions. A cut of 2 /ndof < 6 for 4 and < 9 for the other channels is applied, with an efficiency larger than 99% for signal in all channels.
The QED process of radiative photon production in boson decays is well modelled by simulation. Some of the final-state-radiation (FSR) photons can be identified in the calorimeter and incorporated into the ℓ + ℓ − ℓ + ℓ − analysis. The strategy to include FSR photons into the reconstruction of bosons is the same as in Run 1 [75]. It consists of a search for collinear (for muons) and non-collinear FSR photons (for muons and electrons) with only one FSR photon allowed per event. After the FSR correction, the four-momenta of both dilepton pairs are recomputed by means of a -mass-constrained kinematic fit [76]. The fit uses a Breit-Wigner boson lineshape and a single Gaussian function per lepton to model the momentum response function with the Gaussian width set to the expected resolution for each lepton. The -mass constraint is applied to both candidates.
Events that pass the common event selection (as described above) which are not yet split according to lepton flavours, form a category which is called 'inclusive' hereafter.

Event categorisation: multivariate analysis
In order to improve the sensitivity in the search for an NWA Higgs boson signal produced either in the VBF or the ggF production mode, two multivariate classifiers, namely a 'VBF classifier' and a 'ggF classifier', are used. These classifiers are built with deep neural networks (DNN) and use a architecture similar to that in Ref.
[77], combining a multilayer perceptron (MLP) and one or two recurrent neural networks (rNN) [78]. For both classifiers, the outputs of the MLP and rNN(s) are concatenated and fed into an additional MLP that produces an event score.
The 'VBF classifier' uses two rNNs and an MLP. The two rNNs have as inputs the T -ordered transverse momenta and the pseudorapidities of the two leading jets and the transverse momenta and the pseudorapidities of the four leptons in the event. The MLP uses as inputs the invariant mass of the four-lepton system, the invariant mass and the transverse momentum of the two-leading-jets system, the difference in pseudorapidity between the ℓ + ℓ − ℓ + ℓ − system and the leading jet, and the minimum angular separation between the ℓ + ℓ − or ℓ + ℓ − pair and a jet.
The 'ggF classifier' uses one rNN and an MLP. The rNN has as inputs the T -ordered transverse momenta and the pseudorapidities of the four leptons in the event. The MLP uses as inputs the following variables: 1) the four-lepton invariant mass; 2) the transverse momentum and the pseudorapidity of the four-lepton system; 3) the production angle of the leading defined in the four-lepton rest frame, cos * ; 4) the angle between the negative final-state lepton and the direction of flight of leading (subleading) in the rest frame, cos 1 (cos 2 ); 5) the angle between the decay planes of the four final-state leptons expressed in the four-lepton rest frame, Φ; and 6) the transverse momentum and the pseudorapidity of the leading jet.
The two classifiers are trained separately using the above-listed discriminating variables on all simulated NWA signal events from their corresponding production mode, and the SM background events. The 'VBF classifier' is trained on events with at least two jets while the 'ggF classifier' is trained on events with fewer than two jets. In order to represent the relative importance of the signal and background events, weights that scale the events to the same luminosity according to their production cross sections are used in the training. Furthermore, in order to achieve good discriminating power of the classifiers over a large range of signal mass hypotheses, the signal events are reweighted such that their overall four-lepton invariant mass spectrum matches that of the SM background events. As a result of this reweighting method the classifiers do not produce a bias towards a specific mass point. Extensive checks are performed to ensure such treatment does not create a local excess of background events that would fake a signal. Figure 1 shows the 'ggF classifier' and 'VBF classifier' output for the data, the SM background and an example signal with = 600 GeV.
After the common event selection, as described in Section 5.1.1, events with at least two jets ( jets ≥ 2) and a 'VBF classifier' score value greater than 0.8 form the VBF-MVA-enriched category. Events failing to enter the VBF-MVA-enriched category are classified into the ggF-MVA-high category if the 'ggF classifier' score value is greater than 0.5; these events are further split into three distinct categories according to the lepton flavour of the ℓ + ℓ − ℓ + ℓ − system. Finally, events failing both classifiers form the ggF-low category. Overall, five mutually exclusive categories are formed: VBF-MVA-enriched, ggF-MVA-high-4 , ggF-MVA-high-2 2 , ggF-MVA-high-4 , ggF-MVA-low. This categorisation is used in the search for a heavy scalar with the NWA and in the search in the context of a CP-conserving 2HDM.
The signal acceptance, defined as the ratio of the number of reconstructed events after all selection requirements to the total number of simulated events, is found to be between 30% (15%) and 46% (22%) in the ggF (VBF)-enriched category for the ggF (VBF) production mode depending on the signal mass hypothesis.

Event categorisation: cut-based analysis
As in the previous publication [18], a cut-based analysis is also performed to probe the sensitivity in the VBF production mode. If an event has two or more jets with T greater than 30 GeV, with the two leading jets being well separated in , Δ jj > 3.3, and having an invariant mass jj > 400 GeV, this event is classified into the VBF-enriched category; otherwise the event is classified into one of the ggF-enriched categories further split according to the lepton flavour of the ℓ + ℓ − ℓ + ℓ − system. Four distinct categories are formed, namely VBF-CBA-enriched, ggF-CBA-4 , ggF-CBA-2 2 , and ggF-CBA-4 . The ggF-enriched categories are used in the search for a heavy large-width scalar and the search for a Kaluza-Klein graviton excitation. In addition, as for the multivariate-based analysis, such categorisation is used in the search for a heavy scalar with the NWA and the corresponding results are described in the Appendix.  Figure 1: The output of (a) the 'ggF classifier' (NN ggF ) and (b) the 'VBF classifier' (NN VBF ) for the events passing the common event selections for the data, the SM background and NWA signal events with a mass of 600 GeV. For (b) the 'VBF classifier' output, an additional requirement of at least two jets in the event, is applied. The signal cross section is set to 100 times the observed limit for the 'ggF classifier' and 30 times the observed limit for the 'VBF classifier'. The background is scaled by the normalisation factors shown in Table 2. The lower panels show the ratio of data to prediction. Only statistical and experimental systematic uncertainties are included.

Background estimation
The main background source in the → → ℓ + ℓ − ℓ + ℓ − final state is non-resonant SM production, accounting for 97% of the total background events in the inclusive category. It arises from quark-antiquark annihilation¯→ (86%), gluon-initiated production → (10%), and a small contribution from EW vector-boson scattering (1%). The last of these is more important in the VBF-enriched category using the DNN-based categorisation, where it accounts for 20% of the total background events. While in the previous publication [18] the SM background was exclusively estimated from simulation for both the shape and the normalisation, in this analysis its normalisation is derived from the data in the likelihood fit used in the statistical treatment of the data as explained in Section 8. The shapes of the¯→ and → invariant mass distributions are parameterised with analytic functions as described in Section 5.3. Additional background comes from the + jets and¯processes. These contribute to the total background yields at the percent level and decrease more rapidly than the non-resonant contribution as a function of 4ℓ . These backgrounds are estimated using data where possible, following slightly different approaches for final states with a dimuon (ℓℓ + ) or a dielectron (ℓℓ + ) subleading pair [79,80].
The ℓℓ + non-background comprises mostly¯and + jets events, where in the latter case the muons arise mostly from heavy-flavour semileptonic decays and to a lesser extent from / in-flight decays. The normalisations of the + jets and¯backgrounds are determined by fitting the invariant mass of the leading lepton pair in dedicated data control regions. The control regions are formed by relaxing the 2 requirement on the four-lepton vertex fit, and by inverting and relaxing isolation and/or impact-parameter requirements on the subleading muon pair. An additional control region ( ) is used to improve theb ackground estimate. The contribution of transfer factors, defined as the number of events in the signal region divided by the number of events in the control region, are obtained separately for¯and + jets using simulated events to extrapolate the yields from the control regions to the signal regions.
The main non-prompt background for the ℓℓ + process arises from three sources: light-flavour jets misidentified as electrons; photon conversions; and semileptonic decays of heavy-flavour hadrons. The ℓℓ + control-region selection requires the electrons in the subleading lepton pair to have the same charge, and relaxes the identification and isolation requirements on the electron candidate, denoted , with the lower transverse momentum. The heavy-flavour background is found to be negligible, whereas the light-flavour and photon-conversion background is obtained with the sPlot [81] method, based on a fit to the number of hits in the innermost ID layer in the data control region. Transfer factors for the light-flavour jets and converted photons, obtained from simulated samples, are corrected using a + control region and then used to extrapolate the extracted yields to the signal region. Both the yield extraction and the extrapolation are performed in bins of the transverse momentum of the electron candidate and the jet multiplicity.
The production process is included in the data-driven estimates for the ℓℓ + final states, while it is added from simulation for the ℓℓ + final states even though its contribution to the total background is at the per-mille level. The contributions from¯(where stands for either a or a boson) and triboson processes are minor and taken from simulated samples.

Signal and background modelling
The reconstructed four-lepton invariant mass 4ℓ distribution is used as the discriminating variable for the ℓ + ℓ − ℓ + ℓ − final state. It is extracted from simulation for signal events and for most background components (¯, , ℓℓ + and heavy-flavour hadron component of ℓℓ + ), except for the light-flavour jets and photon conversions in the case of ℓℓ + background, which are taken from the control region as described in Section 5.2.
To obtain statistical interpretations for each mass hypothesis, the 4ℓ distribution for signal is parameterised as a function of the mass hypothesis . In the case of a narrow resonance, the width in 4ℓ is determined by the detector resolution, which is modelled by the sum of a Crystal Ball (C) function [82,83] and a Gaussian (G) function: The Crystal Ball and Gaussian functions share the same peak value of 4ℓ ( ), but have different resolution parameters, C and G . The C and C parameters control the shape and position of the non-Gaussian tail, and the parameter C ensures the relative normalisation of the two probability density functions. To improve the stability of the parameterisation in the full mass range considered, the parameter C is set to a fixed value. The bias in the extraction of signal yields introduced by using the analytic function is below 2% and treated as a systematic uncertainty of the signal parameterisation. The function parameters are determined separately for each final state using the simulated events for each generated mass , and then fitted with a polynomial in to interpolate between the generated mass points. The order of the polynomial is determined by first fitting with a third-order polynomial and then decreasing its order until the 2 is three times larger than the number of degrees of freedom. The use of this parameterisation for the function parameters introduces a bias in the signal yield and extraction of about 1%. The extra bias is included in the systematic uncertainties of the signal acceptance.
In the case of the LWA and the graviton model, a parton-level lineshape of 4ℓ is derived from a theoretical calculation and multiplied by the signal acceptance obtained from the simulated events; it is then convolved with the detector resolution, using the same functions as those for modelling the narrow resonance. The parton-level lineshape of 4ℓ is taken from Ref. [84] for the LWA, and from Ref. [85] for the graviton model.
For the continuum background, the 4ℓ distribution is parameterised by an empirical function for both the quark-and gluon-initiated processes in order to reduce the statistical uncertainties stemming from the limited number of simulated events. The empirical function is described by the following: where, .
The function's first part, 1 , covers the low-mass part of the spectrum until the threshold around 2 · , and the second part, 2 , describes the high-mass tail. The transition between low-and high-mass parts is modelled with the Heaviside step function ( ) around 0 = 260 GeV for¯→ and around 350 GeV for → . The continuity of the function around 0 is ensured by the normalisation factor 0 that is applied to the low-mass part. Finally, and are shape parameters which are obtained by fitting the 4ℓ distribution in simulation for each category. A large number of 4ℓ distributions are calculated from the analytic function with variations of the and values sampled from a multivariate Gaussian distribution that is constructed from their covariance matrix. The uncertainty in the 4ℓ distribution is determined by calculating a central interval that captures 68% of the variations, and is treated as a nuisance parameter in the likelihood fit, namely a parameterisation uncertainty. The parameterisation uncertainty is one of the leading systematic uncertainties for a low-mass signal, as shown in Table 1.

Interference modelling
The gluon-initiated production of a heavy scalar , the SM Higgs ℎ and the → continuum background all share the same initial and final state, and thus lead to interference terms in the total amplitude. Theoretical calculations described in Ref. [86] have shown that the effect of interference could modify the integrated cross section by up to O(10%), and this effect is enhanced as the width of the heavy scalar increases. Therefore, a search for a heavy scalar Higgs boson in the LWA case must properly account for two interference effects: the interference between the heavy scalar and the SM Higgs boson (denoted by -ℎ) and between the heavy scalar and the → continuum (denoted by -). However, because the width of the KK excitation resonance is relatively small, the interference effect is assumed to be negligible in the graviton interpretation for both final states.
If the and ℎ bosons have similar properties, they have the same production and decay amplitudes and therefore the only difference between the signal and interference terms in the production cross section comes from the propagator. Hence, the acceptance and resolution of the signal and interference terms are expected to be the same. The -ℎ interference is obtained by reweighting the particle-level lineshape of generated signal events using the following formula: where 1/ − (ℎ) is the propagator for a scalar ( or ℎ). The particle-level lineshape is then convolved with the detector resolution function, and the signal and interference acceptances are assumed to be the same.
In order to extract the -interference contribution, signal-only and background-only samples are subtracted from the generated SBI samples. The extracted particle-level 4ℓ distribution for theinterference term is then convolved with the detector resolution.

Event selection and categorisation
The ℓ + ℓ −¯fi nal state consists of a pair of high-T isolated leptons (electrons or muons) and large miss T , and is subject to larger background contamination than the ℓ + ℓ − ℓ + ℓ − channel. Candidate events are recorded with a combination of multiple single-lepton triggers, which gives a high efficiency of about 98% for typical signal processes in the signal region defined in the following.
Candidate events are preselected by requiring exactly two electrons or muons with opposite charges and T > 20 GeV, where the electrons (muons) must have | | < 2.47 (2.5). The leading lepton is further required to have T > 30 GeV, well above the threshold of the single-lepton triggers. The selected electrons or muons must have a longitudinal impact parameter satisfying | 0 sin( )| < 0.5 mm. The lepton candidates are required to satisfy the same isolation criteria and the same requirement on the transverse impact-parameter significance as used in the ℓ + ℓ − ℓ + ℓ − channel (see Section 5.1.1), which leads to an efficiency above 98% for typical prompt leptons with T > 30 GeV. To suppress the background, events containing any additional lepton satisfying the 'loose' identification requirement with T > 7 GeV, in addition to the other requirements, are rejected. Requiring the dilepton invariant mass ( ℓℓ ) to be in the range between 76 and 106 GeV largely reduces the contamination from the non-resonant-ℓℓ background, originating from¯, , , and → production. The data sample after the preselection is dominated by the + jets and non-resonant-ℓℓ processes. To suppress these backgrounds, a further selection based on miss T and event topology is applied.
Candidate events are required to have miss T > 120 GeV, which suppresses the + jets contamination by several orders of magnitude. The number of residual + jets events, which have large fake miss T , is further reduced by requiring S( miss T ) > 10, where S( miss T ) is the statistical significance of the miss T value against the null hypothesis of zero-miss T [87]. Additional selection criteria based on angular variables are imposed to further reject the + jets and non-resonant-ℓℓ background events. The selection on angular variables is motivated by the desired detector signature, where the ì miss T is back-to-back with the transverse momentum of the dilepton system. The azimuthal angle difference between the dilepton system and ì miss T , Δ ( ì ℓℓ T , ì miss T ), must be larger than 2.5 radians, and the selected leptons must be close to each other, with the distance Δ ℓℓ = √︁ (Δ ℓℓ ) 2 + (Δ ℓℓ ) 2 < 1.8. Furthermore, the azimuthal angle difference between any of the selected jets with T > 100 GeV and ì miss T must be larger than 0.4 radians. As a consequence of all the requirements, the + jets process only constitutes a small fraction of the total background (about 4%) after the full selection. Finally, events containing one or more -jets are vetoed to further suppress thē and backgrounds.
The signal region for the VBF production mode (VBF-enriched signal region) is defined for candidate events containing at least two selected jets with T > 30 GeV, where the two leading jets must have jj > 550 GeV and Δ jj > 4.4. The remaining events, failing the requirements for the VBF-enriched signal region, are categorised for the ggF-enriched signal region. The signal acceptance in the ggF-enriched signal region for signal events containing a heavy spin-0 resonance from ggF production is about 30% at = 400 GeV and up to 50% at = 1.4 TeV. For VBF signal events the signal acceptance in the VBF-enriched signal region is generally lower, ranging from 3% at = 400 GeV to 20% at = 1.6 TeV.

Background estimation
In the ggF-enriched signal region, the major backgrounds originate from the and processes, which account for 60% and 30% of the total background contribution, respectively. The non-resonantℓℓ background yields a relative contribution of about 5% to the total background, while the largely suppressed + jets background only constitutes a small fraction (4%). Finally, the remaining contributions from other processes ( and¯), amount in total to less than 1% of the total background. A similar composition of background processes is found in the VBF-enriched signal region, where the total background yield is expected to be smaller than 1% of that in the ggF-enriched signal region, due to the event selection for the VBF phase space. The various background estimates and their uncertainties are described below.
The main background contribution from production is estimated using a semi-data-driven method. Similarly to the ℓ + ℓ − ℓ + ℓ − analysis, the predicted yield is scaled by a floating normalisation factor, which is determined in the statistical fit to the signal-region data (see Section 8.1). The introduction of the data-driven normalisation factor helps constrain the total uncertainty in the yield, while the theoretical and experimental uncertainties in the transverse mass distribution are evaluated from simulation.
To estimate the background from production in the ggF-enriched signal region, a control region enriched in events, with a purity of over 90%, is defined using the preselection criteria, except that a third lepton with T > 20 GeV is required. Several further selections such as S( miss T ) > 3, a -jets veto, and T > 60 GeV, where T is constructed from the third lepton's transverse momentum and the ì miss T vector, 2 are applied to suppress non-contributions. A normalisation factor is calculated in the control region as the number of observed events in data, after subtracting the non-contributions estimated from simulation, divided by the predicted yield. The factor is found to be 1.05 with a total uncertainty of 5%, which is consistent with a recent measurement [88] performed within a broader fiducial phase space. The statistical uncertainty of the data in the control region leads to a 0.8% uncertainty in the estimate in the signal region. The main systematic uncertainty is evaluated for the ratio of the predictions in the signal and control regions, and covers the experimental uncertainties and the theoretical ones related to the PDFs and the QCD scales. The uncertainty related to the subtraction of the non-contribution in the control region is estimated by applying cross-section uncertainties for all the relevant processes and is found to be negligible. An additional uncertainty is assigned to the prediction in the signal region, to account for the efficiency mismodelling of vetoing a third lepton in → ℓℓℓ events. The total uncertainty in the estimate for the ggF-enriched signal region is about 5%. A similar method is adopted to estimate the contribution in the VBF-enriched signal region, except that the control region additionally selects two jets with T > 30 GeV. The normalisation factor is found to be 0.83 with an uncertainty of 0.27, which is compatible with the results presented in Ref.
[89]. The total uncertainty in the estimate for the VBF-enriched signal region is about 30%. The kinematic distributions are estimated from simulation.
To estimate the non-resonant-ℓℓ background, a control region dominated by the non-resonant-ℓℓ processes (with a purity of about 95%) is defined with all the event selection criteria except that the final state is required to contain an opposite-sign pair. The non-resonant-ℓℓ contribution in the ( ) channel is calculated as one half of the observed data yield after subtracting the contribution from the other background processes in the control region, and then corrected for the difference in the lepton reconstruction and identification efficiencies between selecting an pair and an ( ) pair. The lepton efficiency correction is derived as the square root of the ratio of the numbers of and events in data after the preselection. The choice of deriving the correction after preselection minimises the resulting statistical uncertainty. The total uncertainty in the non-resonant-ℓℓ estimate in the ggF-enriched signal region is about 9%, including the statistical uncertainty of the data in the control region and the method bias estimated from simulation. The estimation of the non-resonant-ℓℓ background in the VBF-enriched signal region relies on a similar methodology, except that the control region is defined with a jet selection that is looser than in the signal region. The non-resonant-ℓℓ estimate obtained with the looser selection is then scaled by a simulation-based transfer factor to derive the final estimate in the VBF-enriched signal region. The transfer factor is subject to experimental and theoretical uncertainties, and the relative uncertainty in the final estimate in the VBF-enriched signal region is 70%. The kinematic distributions for the non-resonant-ℓℓ background in the signal region are predicted with simulation, and the assigned systematic uncertainty covers the experimental uncertainty in the simulated shape as well as the difference between data and simulation in the control region.
The + jets background contribution is estimated from simulation and scaled by a normalisation factor derived in a control region enriched in + jets events. The control region is defined with all event selection criteria except that S( miss T ) must be less than 9 and no requirements on the azimuthal angle difference between jets with T > 100 GeV and ì miss T are made. The normalisation factor is found to be close to one. Apart from the statistical uncertainty in the control sample, the experimental and theoretical uncertainties are evaluated for the ratio of the number of simulated events in the signal region to that in the control region. The total uncertainty in the + jets estimate is about 40%. The kinematic distributions for the + jets background are modelled with simulation. Finally, backgrounds from the and¯processes, which contribute less than 1% of the total background, are estimated from simulation.

Signal and background modelling
The modelling of the transverse mass T distribution for signal and background is based on templates derived from fully simulated events and afterwards used to fit the data. In the case of a narrow resonance, simulated events generated for fixed mass hypotheses as described in Section 3 are used as the inputs in the moment-morphing technique [90] to obtain the T distribution for any other mass hypothesis.
The extraction of the interference terms for the LWA case is performed in the same way as in the ℓ + ℓ − ℓ + ℓ − final state, as described in Section 5.3. In the case of the ℓ + ℓ −¯fi nal state a correction factor, extracted as a function of , is used to reweight the interference distributions obtained at particle level to account for reconstruction effects. The final expected LWA T distribution is obtained from the combination of the interference distributions with simulated T distributions, which are interpolated between the simulated mass points with a weighting technique using the Higgs propagator, a method similar to that used for the interference.

Systematic uncertainties
The systematic uncertainties can be categorised into experimental and theoretical uncertainties. The first category includes the uncertainties resulting from the integrated luminosity, the trigger efficiencies, the momentum scale and resolution of tracks, the reconstruction and identification of leptons and jets, and their energy scale and resolution calibrations. Systematic uncertainties associated with data-driven methods are also in this category, but described in their corresponding sections: Section 5.2 for ℓ + ℓ − ℓ + ℓ − final state and Section 6.2 for ℓ + ℓ −¯fi nal state. The second category includes the uncertainties in the theoretical descriptions of the signal and background simulations.
These systematic uncertainties evaluated separately for signal and background in each category affect signal acceptances and background yields as well as the probability density distributions of the discriminating variables. They are provided as the inputs for the statistical interpretations described in Section 9, in which the impact of these uncertainties on the expected signal yields are also presented.

Experimental uncertainties
The uncertainty in the combined 2015-2018 integrated luminosity is 1.7% [91], obtained using the LUCID-2 detector [92] for the primary luminosity measurements.
The lepton identification and reconstruction efficiency and energy/momentum scale and resolution are derived from data using / → ℓℓ and → ℓℓ decay events. The uncertainties in the reconstruction performance are computed following the method described in Ref.
[64] for muons and Ref. [63] for electrons. In general, their impact on the signal and background yields is less than 1% in the ℓ + ℓ −¯fi nal state, and up to 1.5% in the ℓ + ℓ − ℓ + ℓ − final state. In addition, the lepton isolation uncertainty is estimated to be less than 1% in both final states.
The uncertainties in the jet energy scale and resolution have several sources, including uncertainties in the absolute and relative in situ calibration, the correction for pile-up, the flavour composition and response [93]. Each source is treated as an independent component. They vary from 4.5% for jets with transverse momentum T = 20 GeV, decreasing to 1% for jets with T = 100-1500 GeV and increasing again to 3% for jets with higher T . They are the dominant uncertainties in the VBF-enriched categories for ggF signal production and SM production in both final states.
Uncertainties in the lepton and jet energy scales are propagated to the uncertainty in the miss T [94]. Additionally, the uncertainties from the momentum scale and resolution of the tracks that are not associated with any identified lepton or jet contribute 8% and 3%, respectively, to the uncertainty in the miss T value.
The efficiency of the lepton triggers in events with reconstructed leptons is nearly 100%, and hence the related uncertainties are negligible. The uncertainties associated with the pile-up reweighting are also taken into account; their impact on the signal and background yields is about 1% for both final states.
These experimental uncertainties are common to the two final states; therefore, they are fully correlated between the two final states.

Theoretical uncertainties
For the simulation-based estimates, the theoretical uncertainties stemming from parton distribution functions (PDFs), missing higher-order QCD corrections, and parton showering are considered.
The PDF uncertainty is evaluated by taking the envelope of variations among alternative PDF choices and the estimate from its internal PDF error sets, following the PDF4LHC recommendation [95]. The missing higher-order QCD corrections are estimated by halving or doubling the factorisation and renormalisation scales independently, among which the largest effect is taken as the systematic uncertainty. The partonshowering uncertainty is assessed by varying the P configurations, such as the parameter values of the AZNLO tune, the multi-parton models and the final-state radiation models.
For different signal hypotheses, the impact of these theoretical uncertainties on the signal acceptance and the spectrum of the discriminating variables is evaluated. In total, the theoretical uncertainty in the signal acceptance varies from less than 1% in the low mass region to 12% in the high mass region of the ℓ + ℓ −f nal state, and from less than 1% in the low mass region to up to 20% in the high mass region of the ℓ + ℓ − ℓ + ℓ − final state.
For the continuum background, a common floating normalisation factor is introduced to scale the number of events for the¯→ and → processes, while the relative yields of the two processes are estimated from the simulations. Therefore, in addition to the spectrum of the discriminating variables in the background, the theoretical uncertainties are also propagated to the simulation-based estimation of the relative yields. Moreover, the uncertainty associated with the NLO EW corrections, calculated in Refs. [45,46,48], are also taken into account, affecting the discriminating variables by less than 1% in the low mass region and up to 10% in the high mass region for both final states.
Because the ℓ + ℓ − ℓ + ℓ − and ℓ + ℓ −¯s earches are sensitive to different energy scales, these theoretical uncertainties are assumed to be completely uncorrelated between the two analyses. A fully correlated scenario is also examined and the differences between the two scenarios in terms of the expected limits on various signal hypotheses are negligible.

Results
The statistical procedure used to extract the results is described in Section 8.1 and the results are presented in Section 8.2.

Statistical procedure and impact of systematic uncertainties
The statistical treatment of the data interpretation follows the procedure for the Higgs-boson search combination in 7 TeV data [96,97]. The test statistic used for limit setting is the profile likelihood ratio Λ( , ), which depends on one or more parameters of interest , additional normalisation factors and extra nuisance parameters . The parameter of interest is the cross section times branching ratio of the heavy resonance decaying into the two final states. The normalisation factors, which were not used in the previous publication [18], are introduced separately for each final state to scale the expected number of the SM background events in each category and are determined by a likelihood fit to the data. This allows the systematic uncertainty to be reduced by removing both the theoretical and luminosity uncertainties contributing to the normalisation uncertainty. In the ℓ + ℓ − ℓ + ℓ − final state, three floating normalisation factors are introduced for the VBF-enriched, ggF-MVA-high and ggF-MVA-low categories. They are referred to as VBF-MVA , ggF-MVA-high and ggF-MVA-low , respectively. The use of three normalisation factors for the ℓ + ℓ − ℓ + ℓ − final state is motivated by the different phase spaces defined for the respective signal regions. Only one floating normalisation factor is introduced in the ℓ + ℓ −¯fi nal state, due to the limited size of the data sample and the worse signal-to-background ratio in the respective VBF-enriched signal region.
The nuisance parameters represent the estimates of the systematic uncertainties and each of them is constrained by a Gaussian distribution. For each category of each final state, a discriminating variable is used to further separate signal from background. The number of signal events is extracted from a simultaneous fit to the discriminating variable, 4ℓ in the ℓ + ℓ − ℓ + ℓ − analysis and T in the ℓ + ℓ −¯a nalysis, in the event categories described in Sections 5 and 6.
The impact of a systematic uncertainty on the result depends on the production mode and the mass hypothesis. For the ggF production mode, at lower masses the parameterisation for the ℓ + ℓ − ℓ + ℓ − final state and the systematic uncertainty of the + jets background for the ℓ + ℓ −¯fi nal state dominate, and at higher masses the uncertainties in the NLO EW correction and parton showering become important, as also seen in VBF production. For the VBF production mode, the dominant uncertainties come from the theoretical modelling of the discriminating variables of the events in the VBF category. At lower masses, jet-energy-scale uncertainties are also important. Table 1 shows the impact of the leading systematic uncertainties on the predicted signal event yield when the cross section times branching ratio is set to the expected upper limit (shown in Figure 4), for ggF and VBF production modes. The statistical uncertainty of the data sample dominates in both of the present searches, and the systematic uncertainties impact the searches to a much lesser extent. Table 1: Impact of the leading systematic uncertainties, the data statistical uncertainties and the total uncertainties on the predicted signal event yield with the cross section times branching ratio being set to the expected upper limit, expressed as a percentage of the signal yield for the ggF (left) and VBF (right) production modes at = 300, 600, 1000, and 1500 GeV.

General results
The total number of observed events is 3275 in the ℓ + ℓ − ℓ + ℓ − final state ( 4ℓ > 200 GeV) and 2794 in the ℓ + ℓ −¯fi nal state. The expected background yields are obtained from a simultaneous likelihood fit of the two final states under the background-only hypothesis. The fitted normalisation factors for the SM background are summarised in Table 2.
The number of observed candidate events with mass above 200 GeV together with the expected background yields for each of the five categories of the ℓ + ℓ − ℓ + ℓ − analysis as described in Section 5.1.2 are presented in Table 3. The 4ℓ spectrum in each category is shown in Figure 2 for illustration, since the backgrounds are determined from a combined unbinned likelihood fit to the data under the background-only hypothesis. Table 4 contains the number of observed events along with the obtained background yields for the ℓ + ℓ −ā nalysis and Figure 3 shows the T distribution for the electron and muon channels in the ggF-enriched and VBF-enriched categories.
The two previous excesses around 240 GeV and 700 GeV that were observed in the publication [18] using 2015 and 2016 data are not confirmed using the full Run 2 dataset as explained below. The maximum deviation of the data from the background-only hypothesis is evaluated in the context of a NWA signal from the ggF production or from the VBF production separately. For the ggF production, the maximum deviation is for a signal mass hypothesis around 240 GeV, with a local significance of 2.0 standard deviations and a global significance of 0.5 standard deviation. For the VBF production, the maximum deviation is for a signal mass hypothesis around 620 GeV, with a local significance of 2.4 standard deviations and a global significance of 0.9 standard deviation.      = 600 GeV signal is normalised to a cross section corresponding to 50 (5) times the observed limit given in Section 9.1.1 for the ggF (VBF) production mode. The error bars on the data points indicate the statistical uncertainty, while the systematic uncertainty in the prediction is shown by the hatched band. The lower panels show the ratio of data to prediction. The red arrows indicate data points that are outside the displayed range.  Events beyond the upper limit of the histogram are included in the last bin of the distribution. The backgrounds are determined from a combined likelihood fit to data under the background-only hypothesis. The simulated = 600 GeV (1.5 TeV) signals are normalised to a cross section corresponding to 50 (5) times the observed limit given in Section 9.1.1 for the ggF production mode and to 5 (1) times the observed limit for the VBF production mode. The error bars on the data points indicate the statistical uncertainty and markers are drawn at the bin centre. The systematic uncertainty in the prediction is shown by the hatched band. The lower panels show the ratio of data to prediction. The red arrows indicate data points that are outside the displayed range.

Interpretations
Since no significant excess with respect to the background predictions is found, results obtained from the combination of the ℓ + ℓ − ℓ + ℓ − and ℓ + ℓ −¯fi nal states are interpreted in terms of exclusion limits for different signal hypotheses as presented below.

Spin-0 resonances with NWA
Upper limits on the cross section times branching ratio ( × ( → )) for a heavy resonance are obtained from the combination of the two final states, as a function of with the CL s procedure [98] in the asymptotic approximation. The results were verified to be correct within about 4% using pseudoexperiments. It is assumed that an additional heavy scalar would be produced mainly via the ggF and VBF processes but that the ratio of the two production mechanisms might depend on the model considered. For this reason, fits for the ggF and VBF processes are done separately, and in each case the other process is allowed to float in the fit as an additional free parameter. Figure 4 presents the observed and expected limits at 95% CL on the × ( → ) of a narrow scalar resonance for the ggF (left) and VBF (right) production modes, as well as the expected limits from the ℓ + ℓ − ℓ + ℓ − and ℓ + ℓ −¯s earches. This result is valid for models in which the width is less than 0.5% of . When combining the two final states, the 95% CL upper limits range from 215 fb at = 240 GeV to 2.0 fb at = 1900 GeV for the ggF production mode and from 87 fb at = 255 GeV to 1.5 fb at = 1800 GeV for the VBF production mode. Compared with the expected limits projected to the luminosity of 139 fb −1 from the previous publication [18], the current expected limits are decreased by a factor ranging from 20% to 28% for the ggF production mode and from 27% to 43% for the VBF production mode, depending on the mass hypothesis.

Spin-0 resonances with LWA
In the case of the LWA, upper limits on the cross section for the ggF process times branching ratio ( ggF × ( → )) are set for different widths of the heavy scalar. Figures 5 shows the limits for a width of 1%, 5%, 10% and 15% of respectively. The limits are set for masses of higher than 400 GeV. The choice of 400 GeV as the lower boundary is to avoid any major instability in the parametrization of the mass spectra for the LWA signals and the interference effects, especially when the signal pole mass gets smaller. The interpretation has only been carried out for the ggF process, as it is the leading channel to study the impact of non-trivial nature width on the search.

Two-Higgs-doublet model
A search in the context of a CP-conserving 2HDM is also presented. This model has five physical Higgs bosons after electroweak symmetry breaking: two CP-even, one CP-odd, and two charged. The model considered here has seven free parameters: the Higgs boson masses, the ratio of the vacuum expectation values of the two Higgs doublets (tan ), the mixing angle between the CP-even Higgs bosons ( ), and the potential parameter 2 12 that mixes the two Higgs doublets. The two Higgs doublets Φ 1 and Φ 2 can couple to leptons and up-and down-type quarks in several ways. In the Type-I model, Φ 2 couples to all quarks and leptons, whereas for Type-II, Φ 1 couples to down-type quarks and leptons and Φ 2 couples to up-type quarks. The 'lepton-specific' model is similar to Type-I except for the fact that the leptons couple to Φ 1 , instead of Φ 2 ; the 'flipped' model is similar to Type-II except that the leptons couple to Φ 2 , instead of Φ 1 . In all these models, the coupling of the heavier CP-even Higgs boson to vector bosons is proportional to cos( − ). In the limit cos( − ) → 0, the light CP-even Higgs boson is indistinguishable from a SM Higgs boson with the same mass. In the context of → decays there is no direct coupling of the Higgs boson to leptons, so only the Type-I and II interpretations are presented. In addition, our interpretations assume other Higgs bosons are heavy enough so that the heavy CP-even Higgs boson will not decay to them. Figure 6 shows exclusion limits in the tan versus cos( − ) plane for Type-I and Type-II 2HDMs, for a heavy Higgs boson with mass = 220 GeV. This value is chosen so that the assumption of a narrow Higgs boson is valid over most of the parameter space, and the experimental sensitivity is maximal. At this low mass, only the ℓ + ℓ − ℓ + ℓ − final state contributes to this result. The range of cos( − ) and tan explored is limited to the region where the assumption of a heavy narrow Higgs boson with negligible interference is valid. When calculating the limits at a given choice of cos( − ) and tan , the relative rates of ggF and VBF production in the fit are set to the prediction of the 2HDM for that parameter choice. Figure 7 shows exclusion limits as a function of the heavy Higgs boson mass and the parameter tan for cos( − ) = −0.1, which is chosen so that the light Higgs boson properties are still compatible with the recent measurements of the SM Higgs boson properties [99]. The white regions in the exclusion plots indicate regions of parameter space which are not excluded by the present analysis. In these regions the cross section predicted by the 2HDM is below the observed cross-section limit. In comparison with the previous publication, the excluded regions are significantly expanded. For example, in the tan versus plane for the Type-II 2HDM the excluded region in tan is more than 60% larger for 200 < < 400 GeV.

Spin-2 resonances
The results are also interpreted as a search for a Kaluza-Klein graviton excitation, KK   ) for a KK graviton produced with / Pl = 1. The black line indicates the observed limit. The green and yellow bands give the ±1 and ±2 uncertainties in the expected limits. The predicted production cross section times branching ratio as a function of the KK mass ( KK ) is shown by the red solid line.

Summary
A search is conducted for heavy resonances decaying into a pair of bosons which subsequently decay into ℓ + ℓ − ℓ + ℓ − or ℓ + ℓ −¯fi nal states. The search uses proton-proton collision data collected with the ATLAS detector from 2015 to 2018 at the Large Hadron Collider at a centre-of-mass energy of 13 TeV corresponding to the full Run 2 integrated luminosity of 139 fb −1 . No significant excess is observed with respect to the predicted SM background; therefore, the results are interpreted as upper limits on the production cross section of spin-0 resonances or a spin-2 resonance. The mass range of the hypothetical resonances considered is between 200 GeV and 2000 GeV depending on the final state and the model considered. The spin-0 resonance is assumed to be a heavy scalar, whose dominant production modes are gluon-gluon fusion and vector-boson fusion, and it is studied in the narrow-width approximation and with the large-width assumption. In the case of the narrow-width approximation, upper limits on the production rate of a heavy scalar decaying into two bosons (the production cross-section times the corresponding decay branching fraction) are set separately for ggF and VBF production modes. Combining the two final states, 95% CL upper limits range from 215 fb at = 240 GeV to 2.0 fb at = 1900 GeV for the gluon-gluon fusion production mode and from 87 fb at = 255 GeV to 1.5 fb at = 1800 GeV for the vector-boson fusion production mode. The results are also interpreted in the context of Type-I and Type-II two-Higgs-doublet models, with exclusion contours given in the tan versus cos( − ) (for = 220 GeV) and tan versus planes. This value is chosen so that the assumption of a narrow Higgs boson is valid over most of the parameter space and the experimental sensitivity is maximal. The limits on the production rate of a large-width scalar are obtained for widths of 1%, 5%, 10% and 15% of the mass of the resonance, with the interference between the heavy scalar and the SM Higgs boson as well as between the heavy scalar and the → continuum taken into account. In the framework of the Randall-Sundrum model with one warped extra dimension a graviton excitation spin-2 resonance with ( KK ) < 1830 GeV is excluded at 95% CL.

Appendix
The results based on the cut-based categorisation as described in Section 5.1.3 are presented here. The number of observed candidate events with mass above 200 GeV together with the expected background yields for each of the four categories of the ℓ + ℓ − ℓ + ℓ − analysis as described in Section 5.1.3 is presented in Table 5. The obtained normalisation factors are summarised in Table 6, and the 4ℓ spectrum in each category is shown in Figure 9. The upper limits at 95% CL on the cross section times branching ratio as a function of the heavy resonance mass in the case of the NWA is presented in Figure 10. Table 5: ℓ + ℓ − ℓ + ℓ − search: expected and observed numbers of events for 4ℓ > 200 GeV, together with their uncertainties, for the VBF-CBA-enriched and ggF-CBA-enriched categories using the cut-based categorisation. The expected numbers of events, as well as their uncertainties, are obtained from a combined likelihood fit to the data under the background-only hypothesis. The uncertainties of the normalisation factors, presented in Table 6, are also taken into account.      = 600 GeV signal is normalised to a cross section corresponding to 50 (5) times the observed limit given in Section 9.1.1 for the ggF (VBF) production mode. The error bars on the data points indicate the statistical uncertainty, while the systematic uncertainty in the prediction is shown by the hatched band. The lower panels show the ratio of data to prediction.  Figure 10: The upper limits at 95% CL on the cross section times branching ratio as a function of the heavy resonance mass for (a) the ggF production mode ( ggF × ( → )) and (b) for the VBF production mode ( VBF × ( → )) in the case of the NWA. For the ℓ + ℓ − ℓ + ℓ − state the cut-based categorisation is used. The green and yellow bands represent the ±1 and ±2 uncertainties in the expected limits. The dashed coloured lines indicate the expected limits obtained from the individual searches.   [29] P. Nason and G. Zanderighi, + − , and production in the POWHEG-BOX-       [100] ATLAS Collaboration, ATLAS Computing Acknowledgements, ATL-SOFT-PUB-2021-003, : https://cds.cern.ch/record/2776662.