Search for lepton-flavour violation in high-mass dilepton final states using 139 $\mathrm{fb}^{-1}$ of $pp$ collisions at $\sqrt{s}=13$ TeV with the ATLAS detector

A search is performed for a heavy particle decaying into different-flavour dilepton final states, using 139 $\mathrm{fb}^{-1}$ of proton-proton collision data at $\sqrt{s}=13$ TeV collected in 2015-2018 by the ATLAS detector at the Large Hadron Collider. Final states with electrons, muons and hadronically decaying tau leptons are considered ($e\mu$, $e\tau$ or $\mu\tau$). No significant excess over the Standard Model predictions is observed. Upper limits on the production cross-section are set as a function of the mass of a Z' boson, a supersymmetric $\tau$-sneutrino, and a quantum black hole. The observed 95% CL lower mass limits on a Z' boson in a typical benchmark model are 5.0 TeV ($e\mu$), 4.0 TeV ($e\tau$), and 3.9 TeV ($\mu\tau$).


Introduction
Lepton-flavour violation (LFV) is forbidden in the Standard Model (SM) of particle physics. However, the observation of flavour oscillations among neutrinos has shown that lepton flavour is not a conserved symmetry of nature [1]. The search for charged LFV (CLFV) remains an area of active interest, motivating many experimental searches for CLFV decays, such as the MEG experiment for $\mu \to e\gamma$ [2], $\mu \to 3e$ with the SINDRUM spectrometer [3], and many searches for LFV $\tau$-lepton decays [4]. So far, there is no experimental evidence that LFV also occurs in interactions between charged leptons, and any observation of CLFV would be a clear signature of new physics.
Many extensions of the SM predict LFV couplings, such as models with $Z'$ bosons [5], scalar neutrinos in $R$-parity-violating (RPV) supersymmetry (SUSY) [6,7], and quantum black holes (QBHs) in low-scale gravity models [8]. These LFV dilepton processes would typically occur at the TeV energy scale, with a clear detector signature of a prompt, different-flavour lepton pair, which suppresses the contribution from SM background processes.
A common extension of the SM is the addition of an extra $U(1)$ gauge symmetry, resulting in a massive neutral vector boson known as a $Z'$ boson. The Sequential Standard Model (SSM) [5] is used as a benchmark in this paper, in which the $Z'$ boson is assumed to have the same quark couplings and chiral structure as the SM $Z$ boson, but is allowed lepton-flavour-violating couplings. Additional LFV coupling parameters are assigned for $Z' \to \ell_i \ell_j$ processes, where $i, j = 1\ldots3$ represent the three lepton generations. These parameters are assumed to equal the SM $Z$ boson couplings for $i = j$. The latest ATLAS Collaboration study placed lower limits of 4.5, 3.7, and 3.5 TeV on the mass of a $Z'$ boson decaying into $e\mu$, $e\tau$, and $\mu\tau$ pairs, respectively, using 36.1 $\mathrm{fb}^{-1}$ of the 13 TeV data sample [9]. The CMS Collaboration has placed limits up to 5.0 TeV on a $Z'$ boson decaying into an $e\mu$ final state using 138 $\mathrm{fb}^{-1}$ [10]. Following the same methodology as in Ref. [11], the polarisation of $\tau$-leptons is not included in the model, but its impact on the sensitivity to a possible signature is found to be negligible.
In the RPV SUSY model, the Lagrangian terms allowing LFV can be expressed as $\frac{1}{2}\lambda_{ijk} L_i L_j \bar{e}_k + \lambda'_{ijk} L_i Q_j \bar{d}_k$, where $L$ and $Q$ are the $SU(2)$ doublet superfields of leptons and quarks, $\bar{e}$ and $\bar{d}$ are the $SU(2)$ singlet superfields of charged leptons and down-type quarks, $\lambda$ and $\lambda'$ are Yukawa couplings, and the indices $i$, $j$, and $k$ denote generations. A $\tau$-sneutrino ($\tilde{\nu}_\tau$) may be produced in proton-proton ($pp$) collisions by $d\bar{d}$ annihilation and later decay into $e\mu$, $e\tau$, or $\mu\tau$. Although only the $\tilde{\nu}_\tau$ is considered in this paper, the results apply to any sneutrino flavour. For the theoretical prediction of the cross-section times branching ratio, the $\tilde{\nu}_\tau$ coupling to first-generation quarks ($\lambda'_{311}$) is assumed to be 0.11 for all channels. As in the $Z'$ model, each lepton-flavour-violating final state is considered separately. It is assumed that $\lambda_{312} = \lambda_{321} = 0.07$ for the $e\mu$ final state, $\lambda_{313} = 0.07$ for the $e\tau$ final state, and $\lambda_{323} = 0.07$ for the $\mu\tau$ final state, while $\lambda_{331}$ and $\lambda_{332}$ are set to zero, due to the gauge invariance for these channels; as a result, the $\tilde{\nu}_\tau$ cross-section times branching ratio in the $e\mu$ channel is up to approximately twice as large as in the $e\tau$ or $\mu\tau$ channel. These values are chosen for easy comparison with previous ATLAS and CMS searches [11-13]. However, in the previous ATLAS search [9] and the latest CMS publication [10], $\lambda_{331}$ and $\lambda_{332}$ were assumed to be the same as $\lambda_{313}$ and $\lambda_{323}$ respectively, so the RPV-model results in this paper are not directly comparable to those in the previous publications for the $e\tau$ and $\mu\tau$ channels. The latest ATLAS results with 36.1 $\mathrm{fb}^{-1}$ of 13 TeV data [9] excluded RPV SUSY models with $\tilde{\nu}_\tau$ masses below 3.4 TeV, 2.6 TeV, and 2.3 TeV for the $e\mu$, $e\tau$, and $\mu\tau$ channels respectively, with exactly the parameters mentioned above. The CMS Collaboration has recently excluded RPV SUSY models below 2.2 TeV for $\lambda_{312} = \lambda_{321} = \lambda'_{311} = 0.01$ [10].

Various models introduce extra spatial dimensions to reduce the value of the Planck mass and resolve the hierarchy problem. The search described in this paper
presents interpretations based on two models: the Arkani-Hamed-Dimopoulos-Dvali (ADD) model [14], assuming $n = 6$, where $n$ is the number of extra dimensions, and the Randall-Sundrum (RS) model [15] with one extra dimension. Due to the increased strength of gravity at short distances, in these models $pp$ collisions at the LHC could produce states exceeding the threshold mass ($m_\mathrm{th}$) for black-hole formation. In this paper, $m_\mathrm{th}$ is assumed to be equivalent to the extra-dimensional Planck scale. The quantum-gravity regime [16-18] is considered only in the mass region below 3-5 $m_\mathrm{th}$, since for masses beyond this region it is expected that thermal black holes would be produced. Non-thermal (or quantum) black holes could decay into two-particle final states, producing the topology investigated in this paper. QBHs would have a continuum mass distribution from $m_\mathrm{th}$ up to the beginning of the thermal regime, which for the models considered in this paper is assumed to start at $3\,m_\mathrm{th}$. This approach is consistent with previous ATLAS analyses, such as Ref. [9].

The decay of the QBH would be governed by a yet unknown theory of quantum gravity. The two main assumptions of the extra-dimension models considered in this paper [8] are that (a) gravity couples with equal strength to all SM particle degrees of freedom and (b) gravity conserves local symmetries (colour and electric charge) but can violate global symmetries such as lepton-flavour and baryon-number conservation. Following these assumptions, the branching ratio to each final state can be calculated. QBHs decaying into different-flavour, opposite-charge lepton pairs are created via $q\bar{q}$ ($gg$) initial states with a branching ratio to $\ell\ell'$ of 0.87% (0.34%) [8]. As in the $Z'$ model, each lepton-flavour-violating final state is considered separately. These models were used in previous ATLAS and CMS searches for QBHs in dijet [19-21], lepton+jet [22], photon+jet [23], $e\mu$ [10], and same-flavour dilepton [24] final states. The latest ATLAS Collaboration study placed lower limits of 5.5 (3.4), 4.9 (2.9), and 4.5 (2.6) TeV on $m_\mathrm{th}$ of QBHs in the ADD (RS) model decaying into $e\mu$, $e\tau$, and $\mu\tau$ pairs, respectively, using 36.1 $\mathrm{fb}^{-1}$ of the 13 TeV data sample [9].

This paper describes a search for new phenomena in final states with two leptons of different flavour, using 139 $\mathrm{fb}^{-1}$ of data from $pp$ collisions at $\sqrt{s} = 13$ TeV at the Large Hadron Collider (LHC). Dilepton signal final states consisting of $e\mu$, $e\tau$, or $\mu\tau$ pairs are considered, where the $\tau$-lepton decays hadronically. The analysis searches for a localised excess in the dilepton invariant-mass distribution in the TeV range. Three signal regions are defined, one for each decay mode. Corresponding control regions are designed to extract the normalisation of the most prominent SM backgrounds: top-quark and diboson production. The contributions from fake leptons are determined from data. The final simultaneous fit over the signal and control regions is performed separately for each decay mode. Four benchmark models ($Z'$, RPV SUSY $\tilde{\nu}_\tau$, and QBH with ADD $n = 6$ or RS $n = 1$) are used to interpret the results. Compared with the previous ATLAS search with 36.1 $\mathrm{fb}^{-1}$ of $pp$ collision data at $\sqrt{s} = 13$ TeV [9], this analysis benefits from a factor-of-four increase in integrated luminosity, improvements in object reconstruction (such as electron and $\tau$-lepton identification), a more robust background estimation method (e.g. the 4 × 4 matrix method in the $e\mu$ channel), and a simultaneous fit over the signal and control regions to constrain systematic uncertainties. In this analysis, a $b$-jet veto is also included in the baseline selection of every channel, which strongly suppresses top-quark-related backgrounds.

The ATLAS detector
The ATLAS detector [25] is a general-purpose particle detector with an approximately forward-backward symmetric cylindrical geometry. It covers nearly the entire solid angle around the collision point.¹ It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range $|\eta| < 2.5$. The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [26,27]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to $|\eta| = 2.0$. The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range $|\eta| < 4.9$. Within the region $|\eta| < 3.2$, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering $|\eta| < 1.8$ to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within $|\eta| < 1.7$, and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements, respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Three layers of precision chambers, each consisting of layers of monitored drift tubes, cover the region $|\eta| < 2.7$, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range $|\eta| < 2.4$ with resistive-plate chambers in the barrel and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [28]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which the high-level trigger further reduces in order to record events to disk at about 1 kHz.
An extensive software suite [29] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the $z$-axis along the beam pipe. The $x$-axis points from the IP to the centre of the LHC ring, and the $y$-axis points upwards. Cylindrical coordinates $(r, \phi)$ are used in the transverse plane, $\phi$ being the azimuthal angle around the $z$-axis.

Data and simulated event samples
The data sample used for this analysis was collected during 2015 to 2018 from $pp$ collisions at a centre-of-mass energy of 13 TeV. After selecting periods with stable beams and applying data-quality requirements, the total integrated luminosity is 139 $\mathrm{fb}^{-1}$, with an uncertainty of 1.7% [30,31].
The $q\bar{q} \to Z' \to \ell\ell'$ Monte Carlo (MC) simulated samples were generated at leading order (LO) using the generator Pythia 8.186 [32] with the NNPDF2.3LO [33] parton distribution function (PDF) set and the A14 [34] set of tuned parameters. The signal samples were generated for the SSM at fifteen mass points ranging from 1 TeV to 8 TeV in steps of 500 GeV. The cross-section of the $Z'$ signal was corrected from LO to next-to-next-to-leading order (NNLO) in the strong coupling constant with a rescaling computed with VRAP 0.9 [35] and the CT14NNLO PDF set [36]. The NNLO QCD correction was applied as a multiplicative factor on the total cross-section, ranging from 1.44 to 0.29 depending on the $Z'$ mass. No mixing of the $Z'$ boson with the $Z$ and $\gamma^*$ bosons is included.
The $d\bar{d} \to \tilde{\nu}_\tau \to \ell\ell'$ samples were generated at LO with MadGraph5_aMC@NLO v2.3.3 [37] interfaced to the Pythia 8.186 parton-shower model with the NNPDF2.3LO PDF set and the A14 tune. The signal samples were generated with the same mass values as for the $Z'$ model described above. The cross-section was calculated at LO with the same generator used for simulation and corrected to next-to-leading order (NLO) using LoopTools v2.2 [38]. The NLO correction factor ranges from 1.1 to 1.4, depending on the $\tilde{\nu}_\tau$ mass.
The $pp \to \mathrm{QBH} \to \ell\ell'$ samples were generated with the program QBH 3.00 [39] using the CTEQ6L1 [40] PDF set and the A14 tune, with Pythia 8.205 [41] providing showering and hadronisation. For each extra-dimensional model, eleven $m_\mathrm{th}$ points in 500 GeV steps were produced: from 4 TeV to 9 TeV for the ADD $n = 6$ model and from 2 TeV to 7 TeV for the RS $n = 1$ model. These two models differ in the number and nature of the additional extra dimensions (large extra dimensions for ADD and one highly warped extra dimension for RS). In particular, the ADD model predicts black holes with a larger gravitational radius, and hence the parton-parton cross-section for this model is larger than for the RS model. Therefore, the $m_\mathrm{th}$ range of the generated samples differs between the two models.
The SM background in the LFV dilepton search is due to several processes that produce a final state with two different-flavour leptons. For the $e\mu$ mode, the dominant background contributions originate from $t\bar{t}$ and single-top $Wt$ production, with the subsequent decay into leptons of both $W$ bosons in the event (including those from top-quark decays), and from diboson ($WW$, $WZ$, and $ZZ$) production. Other backgrounds originate from $\tau$-lepton pair production ($q\bar{q} \to Z/\gamma^* \to \tau\tau$). Both diboson and $\tau$-lepton pair production yield different-flavour final states through the leptonic decays of the $W$ and $Z$ bosons or of the $\tau$-leptons. For the $e\tau$ and $\mu\tau$ modes, multijet and $W$+jets processes are the dominant backgrounds. They contribute through the misidentification of jets as leptons.
Backgrounds from top-quark production include $t\bar{t}$ and $Wt$ production. The production of $t\bar{t}$ events was modelled using the Powheg-Box v2 [42-45] generator at NLO with the NNPDF3.0NLO PDF set [46] and the $h_\mathrm{damp}$ parameter² set to $1.5\,m_\mathrm{top}$ [47]. The events were interfaced to Pythia 8.230 [41] to model the parton shower, hadronisation, and underlying event, with parameters set according to the A14 tune and using the NNPDF2.3LO set of PDFs. The decays of bottom and charm hadrons were performed by EvtGen 1.6.0 [48]. The $Wt$ background was modelled by the Powheg-Box v2 generator at NLO in QCD using the five-flavour scheme and the NNPDF3.0NLO set of PDFs. The diagram removal scheme [49] was used to remove interference and overlap with $t\bar{t}$ production. The events were interfaced to Pythia 8.230 using the A14 tune and the NNPDF2.3LO set of PDFs.
For the diboson samples, final states were simulated with Sherpa 2.2.12 [50,51], including off-shell effects and Higgs boson contributions where appropriate. Fully leptonic final states and semileptonic final states, where one boson decays leptonically and the other hadronically, were simulated using matrix elements (MEs) at NLO accuracy in QCD for up to one additional parton and at LO accuracy for up to three additional parton emissions.
The SM Drell-Yan process was generated with the Powheg-Box v1 MC generator, which simulates at NLO accuracy the hard-scattering processes of $Z$-boson production and decay in the electron, muon, and $\tau$-lepton final states. It was interfaced to Pythia 8.186 with parameters set according to the AZNLO tune [52]. The CT10 PDF set [53] was used for the hard-scattering processes, whereas the CTEQ6L1 PDF set was used for the parton shower.
Processes such as $W$+jets and multijet production with jets that are misidentified as leptons were estimated through a combination of data-driven methods and simulation, as detailed in Section 5. The $W$+jets contribution was estimated with the aid of Sherpa 2.2.2 simulated samples using NLO MEs for up to two partons and LO matrix elements for up to four partons, calculated with the Comix [54] and OpenLoops libraries. They are matched with the Sherpa parton shower using the MEPS@NLO prescription [55] with the set of tuned parameters developed by the Sherpa authors [50].
For all samples used in this analysis, the effects of multiple proton-proton interactions per bunch crossing (pile-up) were included by overlaying minimum-bias events simulated with Pythia 8.186 and reweighting the simulated events to reproduce the distribution of the number of interactions per bunch crossing observed in the data. The generated events were processed with the ATLAS simulation infrastructure [56], based on Geant4 [57], and passed through the trigger simulation and the same reconstruction software as used for the data.

Event reconstruction and selection
The search for new phenomena presented here targets the high-mass range, and the event selection is optimised accordingly. Events are kept only if they satisfy a single-muon or single-electron trigger. The trigger $p_\mathrm{T}$ threshold differs between data-taking periods. For the 2015 data runs, a $p_\mathrm{T}$ threshold of 20 or 50 GeV is applied for muons, and 24, 60, or 120 GeV for electrons. For the 2016-2018 data runs, the $p_\mathrm{T}$ threshold is 26 or 50 GeV for muons, and 26, 60, or 140 GeV for electrons. This analysis relies on the triggers with $p_\mathrm{T}$ thresholds above 50 (60) GeV for muons (electrons). The single-electron triggers with higher $p_\mathrm{T}$ thresholds have looser identification requirements, resulting in an increased trigger efficiency in the high-$p_\mathrm{T}$ region, which is especially important for a high-mass resonance search. In addition to the trigger selection, events are required to have at least one offline reconstructed signal lepton matched to the object that fired the trigger.
Electron candidates are formed by associating the energy in clusters of cells in the electromagnetic calorimeter with a track in the ID [58]. Candidate electrons are identified using a likelihood-based method. The likelihood discriminant utilises lateral and longitudinal calorimeter shower shapes together with tracking and cluster-track matching quantities. The discriminant criterion is a function of the $E_\mathrm{T}$ and $|\eta|$ of the electron candidate. Two operating points are used in this analysis, as defined in Refs. [59,60]: Medium and Tight, which correspond to efficiencies of 88% and 80%, respectively, for a prompt electron with $E_\mathrm{T} = 40$ GeV. The Tight working point is required for electrons in signal events, while the Medium working point defines the electrons used in the reducible background estimates. Electron candidates must have $E_\mathrm{T} > 65$ GeV and $|\eta| < 2.47$, excluding the region $1.37 < |\eta| < 1.52$, where the electron performance is degraded due to the presence of extra inactive material. Further requirements are made on their tracks: the transverse and longitudinal impact parameters relative to the primary vertex of the event ($d_0$ and $\Delta z_0$) must satisfy $|d_0/\sigma_{d_0}| < 5$ and $|\Delta z_0 \sin\theta| < 0.5$ mm. Candidates are required to satisfy relative track-based and calorimeter-based isolation requirements with an efficiency of 99% to suppress background from non-prompt electrons [61].
Candidate muon tracks are reconstructed independently in the ID and the MS, and are then used in a combined fit. To ensure optimal muon momentum resolution up to very high $p_\mathrm{T}$ (about 10% at 1 TeV), the High-$p_\mathrm{T}$ operating point [62] is used in this analysis. Only tracks with hits in each of the three stations of the muon spectrometer are considered. Moreover, muon candidates are required to be within a pseudorapidity of $|\eta| < 2.5$, satisfy $|d_0/\sigma_{d_0}| < 3$ and $|\Delta z_0 \sin\theta| < 0.5$ mm, have $p_\mathrm{T} > 65$ GeV, and satisfy a track-based isolation criterion with an efficiency of 99% to further reduce contamination from non-prompt muons. The track isolation is similar to the one defined for electrons. Muon candidates fulfilling all selections except the isolation criterion are called "Loose muons" and are used in the reducible background estimates. An additional upper cut on the muon $p_\mathrm{T}$ at 6 TeV is applied to remove poorly calibrated muons. This only affects the very high-mass signals, and no events fail this requirement in either the data or the background MC.
Jets are reconstructed using the anti-$k_t$ algorithm [63] with a radius parameter of 0.4. The inputs to the jet clustering are built by combining information from both the calorimeters and the ID using a particle-flow algorithm [64]. The cluster energies are calibrated according to in situ measurements of the jet energy scale [65]. Jets with $p_\mathrm{T} < 60$ GeV and $|\eta| < 2.4$ are further required to satisfy the jet vertex tagger [66], a likelihood discriminant that uses a combination of track and vertex information to suppress jets originating from pile-up activity.
Hadronic $\tau$-lepton decays are composed of a neutrino and a set of visible decay products ($\tau_\mathrm{had}$), typically one or three charged pions and up to two neutral pions. The reconstruction of $\tau_\mathrm{had}$ candidates starts from jets reconstructed from topological clusters [67,68]. The $\tau_\mathrm{had}$ candidates must have $|\eta| < 2.5$, with the transition region ($1.37 < |\eta| < 1.52$) excluded, a transverse momentum $p_\mathrm{T} > 65$ GeV, and one or three associated tracks with a total electric charge of $\pm 1$. To discriminate against jets, $\tau_\mathrm{had}$ candidates are identified with a multivariate algorithm that employs a recurrent neural network (RNN) using shower-shape and tracking information [68]. All $\tau_\mathrm{had}$ candidates are required to satisfy the Medium operating point, which corresponds to an efficiency of 75% (60%) for 1-prong (3-prong) $\tau$-leptons and a jet background rejection of 35 (240) for 1-prong (3-prong) $\tau$-lepton candidates. Furthermore, a dedicated boosted-decision-tree-based veto is applied to reduce the number of electrons misidentified as $\tau_\mathrm{had}$ candidates. Jets with $|\eta| < 2.5$ containing $b$-hadrons are identified with a $b$-tagging algorithm based on a deep-learning neural network [69]. The information used includes distinctive features of $b$-hadron decays in terms of the impact parameters of the tracks and the displaced vertices reconstructed in the ID. The chosen operating point has an efficiency of 85%. In this study, a veto on $b$-tagged jets is applied to reject contributions from events containing top quarks.
The missing transverse momentum vector $\vec{p}_\mathrm{T}^{\,\mathrm{miss}}$, with magnitude $E_\mathrm{T}^\mathrm{miss}$, is defined as the negative vector sum of the transverse momenta of all identified physics objects and an additional soft term. The soft term is constructed from all tracks that are associated with the primary vertex but not with any selected physics object [70].
Candidate signal events must have a reconstructed primary vertex with at least two associated tracks, defined as the vertex whose constituent tracks have the highest $\sum p_\mathrm{T}^2$. There must be exactly two different-flavour, opposite-sign leptons satisfying the previously mentioned criteria: $e\mu$, $e\tau_\mathrm{had}$, or $\mu\tau_\mathrm{had}$. Events with an additional electron, muon, or $\tau_\mathrm{had}$ fulfilling all the selections are vetoed. For the $e\mu$ channel only, events with an extra electron or muon fulfilling the "Loose" criteria are also vetoed. For all three channels, the lepton candidates must be back-to-back in the transverse plane, with $\Delta\phi(\ell, \ell') > 2.7$. The dilepton invariant mass, required to be larger than 600 GeV, is used as the discriminating variable. To account for differences between data and simulation, corrections are applied to the lepton trigger, reconstruction, identification, and isolation efficiencies, as well as to the lepton energy/momentum resolution and scale [58,62,67].
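As an illustration, the baseline pair selection above can be sketched in a few lines of Python (the event model and field names are hypothetical, not the ATLAS analysis software):

```python
import math

def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into [0, pi]."""
    d = abs(phi1 - phi2) % (2.0 * math.pi)
    return d if d <= math.pi else 2.0 * math.pi - d

def passes_baseline(lep1, lep2, m_ll):
    """Sketch of the baseline dilepton selection: different-flavour,
    opposite-sign leptons, back-to-back in the transverse plane
    (delta-phi > 2.7), with dilepton mass above 600 GeV."""
    return (lep1["flavour"] != lep2["flavour"]
            and lep1["charge"] * lep2["charge"] < 0
            and delta_phi(lep1["phi"], lep2["phi"]) > 2.7
            and m_ll > 600.0)
```

The additional-lepton veto and $b$-jet veto would be applied on top of this pair-level check.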
To avoid double counting among objects, lepton-lepton and lepton-jet overlap removal is applied based on the $\Delta R$ distance metric. The $p_\mathrm{T}$ threshold of muons considered in the overlap removal with $\tau_\mathrm{had}$ candidates is lowered to 2 GeV. Further details can be found in Ref. [71].
After the event selection described above, for a $Z'$ boson with a mass of 1.5 TeV the fractions of signal events satisfying all of the selection requirements are approximately 45%, 20%, and 15% for the $e\mu$, $e\tau$, and $\mu\tau$ final states, respectively.
Because of the neutrino produced in the hadronic $\tau$-lepton decay, the dilepton invariant mass in the $e\tau$ and $\mu\tau$ channels cannot be fully reconstructed, so an approximation is used. The mass of the BSM particles considered is much larger than the masses of the SM leptons, so their decays produce very high-energy $\tau$-leptons. In the hadronic cascade decay of such a high-energy $\tau$-lepton, the neutrino and the resulting jet are nearly collinear. The neutrino four-momentum is therefore reconstructed from the magnitude of the missing transverse momentum and the direction of the $\tau_\mathrm{had}$ candidate. This technique improves the dilepton mass resolution and the search sensitivity [12]. For a simulated $Z'$ boson with a mass of 2 TeV, the mass resolution improves from 8% (17%) to 4% (12%) in the $e\tau$ ($\mu\tau$) channel.
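The collinear approximation above can be illustrated with a short sketch: the neutrino is assigned the missing transverse momentum as its magnitude and the visible $\tau_\mathrm{had}$ direction, and the dilepton mass is computed from the summed four-vectors. This is a simplified, massless-lepton treatment with illustrative names, not the ATLAS implementation:

```python
import math

def collinear_mass(pt_lep, eta_lep, phi_lep, pt_tau, eta_tau, phi_tau, met):
    """Approximate dilepton mass assuming the neutrino from the hadronic
    tau decay is collinear with the visible tau decay products.
    Momenta in GeV; leptons treated as massless."""
    def four_vec(pt, eta, phi):
        # (E, px, py, pz) for a massless particle
        return (pt * math.cosh(eta),
                pt * math.cos(phi),
                pt * math.sin(phi),
                pt * math.sinh(eta))
    lep = four_vec(pt_lep, eta_lep, phi_lep)
    tau = four_vec(pt_tau, eta_tau, phi_tau)
    # Neutrino: magnitude of ETmiss, direction of the visible tau
    nu = four_vec(met, eta_tau, phi_tau)
    E, px, py, pz = (a + b + c for a, b, c in zip(lep, tau, nu))
    return math.sqrt(max(E * E - px * px - py * py - pz * pz, 0.0))
```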

Background estimation
The background processes for this search can be divided into two categories: reducible and irreducible. The latter is composed of processes that produce two different-flavour, opposite-sign prompt leptons, including diboson, $t\bar{t}$, single-top $Wt$, and $Z/\gamma^* \to \tau\tau$ production. These processes are modelled with MC simulated event samples and normalised to their theoretical cross-sections before further adjustment. Reducible backgrounds originate from jets mis-reconstructed as leptons, as in $W$+jets and multijet production, and are estimated from data. The contribution from reducible backgrounds is small in the $e\mu$ channel (approximately 10% in the signal region, SR), whereas in the $e\tau$ and $\mu\tau$ channels they are among the leading components, accounting for 20-30% of the total background in the SR. Table 1 shows the regions used in the final fit, described in Section 7, for the different channels.

Irreducible backgrounds
Among the irreducible backgrounds, $t\bar{t}$ and diboson production are the dominant processes in all channels. Their contributions are evaluated with MC simulation and corrected by factors derived in dedicated control regions (CRs). For the $t\bar{t}$ process, which is the dominant contribution to the "Top Quarks" background, a CR is built with the same selection as for signal events but with the $b$-jet veto reversed, in each of the $e\mu$, $e\tau$, and $\mu\tau$ channels. For the $e\tau$ and $\mu\tau$ channels, diboson CRs are not defined, due to non-negligible contributions from fake backgrounds. However, assuming lepton-flavour universality, the ratio of the diboson and $t\bar{t}$ correction factors is taken to be the same in the $e\mu$ and $\tau$-lepton channels:
$$\frac{\mu_{VV}^{e\tau(\mu\tau)}}{\mu_{t\bar{t}}^{e\tau(\mu\tau)}} = \frac{\mu_{VV}^{e\mu}}{\mu_{t\bar{t}}^{e\mu}},$$
where $\mu_{VV}^{e\tau(\mu\tau)}$ is the correction factor for the diboson process in the $e\tau$ or $\mu\tau$ channel, $\mu_{VV}^{e\mu}$ is the correction factor for the diboson process in the $e\mu$ channel, $\mu_{t\bar{t}}^{e\tau(\mu\tau)}$ is the correction factor for the $t\bar{t}$ process in the $e\tau$ or $\mu\tau$ channel, and $\mu_{t\bar{t}}^{e\mu}$ is the correction factor for the $t\bar{t}$ process in the $e\mu$ channel. The correction factor for the diboson process in the $e\mu$ channel can thus be extrapolated to the $e\tau$ or $\mu\tau$ channel with a factor equal to the ratio of the $t\bar{t}$ correction factors between the $\tau$-lepton channels and the $e\mu$ channel:
$$\mu_{VV}^{e\tau(\mu\tau)} = \mu_{VV}^{e\mu} \times \frac{\mu_{t\bar{t}}^{e\tau(\mu\tau)}}{\mu_{t\bar{t}}^{e\mu}}.$$
From the simultaneous fit in all CRs of the $e\mu$ channel, the fitted uncertainty of $\mu_{VV}^{e\mu}$ is 12%, and it is highly correlated with $\mu_{t\bar{t}}^{e\mu}$. The same uncertainty is assigned to $\mu_{VV}^{e\tau(\mu\tau)}$. Conservatively, it is treated as uncorrelated with $\mu_{t\bar{t}}^{e\tau(\mu\tau)}$, leading to an overestimate of the final fit uncertainties. The 12% uncertainty in the diboson correction is also much larger than the effect of the correction itself in the $e\tau$ and $\mu\tau$ channels, which is less than 5% of the diboson background.
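The extrapolation of the diboson correction factor described above reduces to a single ratio; a minimal sketch (function and variable names are illustrative, not the analysis code):

```python
def diboson_correction_tau_channel(mu_vv_emu, mu_tt_tau, mu_tt_emu):
    """Extrapolate the e-mu diboson correction factor to the e-tau or
    mu-tau channel via the ratio of ttbar correction factors, assuming
    the diboson/ttbar correction-factor ratio is channel-independent."""
    return mu_vv_emu * (mu_tt_tau / mu_tt_emu)
```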
The $t\bar{t}$ and diboson correction factors are used to correct the total yields of the "Top Quarks" and "Diboson" backgrounds, respectively. Other irreducible background processes, such as $Z/\gamma^* \to \tau\tau$, are evaluated directly with MC simulation.

Reducible backgrounds
The dominant reducible backgrounds, $W$+jets and multijet production, are estimated using data-driven techniques. In the $e\mu$ channel the matrix method [24] is used, while in the $e\tau$ and $\mu\tau$ channels the fake-lepton background is extrapolated from dedicated CRs to the SR with the method described in Ref. [12].

Fake backgrounds in the 𝒆𝝁 channel
The goal of the matrix method is to estimate the fractions of events in the data sample with a "real" electron and a "real" muon ($N_{RR}$), with a jet faking an electron ("fake" electron) and a "fake" muon ($N_{FF}$), and with one "real" and one "fake" lepton ($N_{RF}$ and $N_{FR}$).
Two selection criteria, "Tight" and "Loose", are defined for electrons and muons based on their lepton quality (identification and isolation). As the selection efficiency differs between "real" and "fake" leptons, the contribution of "fake" leptons can be estimated from the numbers of data events passing each selection criterion.
Four data samples are constructed based on the lepton quality:
• a sample where both the electron and the muon pass the "Tight" quality selection, with number of events $N_{TT}$;
• a sample where the electron passes the "Loose" but fails the "Tight" selection, and the muon passes the "Tight" selection, with number of events $N_{LT}$;
• a sample where the muon passes the "Loose" but fails the "Tight" selection, and the electron passes the "Tight" selection, with number of events $N_{TL}$;
• a sample where both the electron and the muon pass the "Loose" but fail the "Tight" selection, with number of events $N_{LL}$.
By solving a system of linear equations relating the numbers of events in these four data samples, the reducible background contributions can be evaluated.
$$\begin{pmatrix} N_{TT} \\ N_{TL} \\ N_{LT} \\ N_{LL} \end{pmatrix}
=
\begin{pmatrix}
r_e r_\mu & r_e f_\mu & f_e r_\mu & f_e f_\mu \\
r_e(1-r_\mu) & r_e(1-f_\mu) & f_e(1-r_\mu) & f_e(1-f_\mu) \\
(1-r_e)r_\mu & (1-r_e)f_\mu & (1-f_e)r_\mu & (1-f_e)f_\mu \\
(1-r_e)(1-r_\mu) & (1-r_e)(1-f_\mu) & (1-f_e)(1-r_\mu) & (1-f_e)(1-f_\mu)
\end{pmatrix}
\begin{pmatrix} N_{RR} \\ N_{RF} \\ N_{FR} \\ N_{FF} \end{pmatrix} \quad (3)$$
Among these equations, $r$ is the probability for a "Loose" lepton matched to a true lepton to satisfy the "Tight" quality selection, defined as the "real efficiency". It is evaluated from $Z \to \ell\ell$ simulated events. The "fake rate" $f$ is the probability that a jet is misidentified as a "Tight" quality lepton; it is determined in a multijet-enriched data sample. The creation of this sample starts with the selection of a "Loose" lepton back-to-back with a jet. To suppress the $W$+jets contamination of the multijet-enriched sample, the events are required to have $E_\mathrm{T}^\mathrm{miss} < 25$ GeV and transverse mass $m_\mathrm{T} < 50$ GeV. The transverse mass is defined as
$$m_\mathrm{T} = \sqrt{2\,E_\mathrm{T}^{\ell^\pm} E_\mathrm{T}^\mathrm{miss}\,(1-\cos\Delta\phi)}\,,$$
where $E_\mathrm{T}^{\ell^\pm}$ and $E_\mathrm{T}^\mathrm{miss}$ denote the transverse energies of the final-state charged lepton and the neutrino, and $\Delta\phi$ is their azimuthal angle difference. Since most "fake" muons come from heavy-flavour jet decays, in the muon fake-rate measurement the requirements on both impact parameters, $d_0$ and $z_0$, are reversed in order to further suppress the $W$+jets contamination. The remaining contamination is subtracted relying on simulation of $W$+jets and other SM background processes (top quark, diboson, and $Z \to \ell\ell$). Both the real efficiency and the fake rate are determined as functions of $p_\mathrm{T}$.
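The matrix-method unfolding of the four observed event counts into real and fake components can be sketched in a few lines of Python (an illustrative sketch, not the ATLAS implementation; the first index refers to the electron, the second to the muon, matching the sample definitions above):

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            k = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= k * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def matrix_method(r_e, f_e, r_m, f_m, n_tt, n_tl, n_lt, n_ll):
    """Unfold (N_TT, N_TL, N_LT, N_LL) into (N_RR, N_RF, N_FR, N_FF).
    r = real efficiency, f = fake rate. A 'T' leg passes Tight with
    probability r (real) or f (fake); an 'L' leg passes Loose but fails
    Tight with probability 1 - r or 1 - f."""
    A = []
    for ce in "TL":                        # electron: Tight, then Loose-not-Tight
        for cm in "TL":                    # muon: Tight, then Loose-not-Tight
            row = []
            for pe in (r_e, f_e):          # electron real, then fake
                for pm in (r_m, f_m):      # muon real, then fake
                    te = pe if ce == "T" else 1.0 - pe
                    tm = pm if cm == "T" else 1.0 - pm
                    row.append(te * tm)
            A.append(row)
    return solve_linear(A, [n_tt, n_tl, n_lt, n_ll])
```

The fake contribution to the Tight-Tight signal selection is then $r_e f_\mu N_{RF} + f_e r_\mu N_{FR} + f_e f_\mu N_{FF}$.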
The uncertainties associated with the matrix method are evaluated by considering systematic effects in the fake-rate measurement and uncertainties in the real-lepton efficiency. The latter accounts for the difference in the lepton $p_{\mathrm{T}}$ distribution between the SR and the $Z \to \ell\ell$ MC sample [71]. It is assigned by comparing the real-lepton efficiency measured in $Z \to \ell\ell$ plus zero-jet events with that in inclusive $Z \to \ell\ell$ events. The systematic uncertainties in the fake rate include:
• the choice of the multijet-enriched region;
• the impact of reversing the $d_0$ and $z_0$ requirements for muons;
• the difference in the fake rates between the SR and the multijet-enriched region due to the difference in jet-flavour composition.
The overall uncertainty in the $e\mu$ reducible background is about 30% in the SR. Given that this contribution amounts to about 10% of the total background in the $e\mu$ channel SR over the invariant-mass range considered, the uncertainties in the fake-lepton background estimate have little impact on the results.

Fake backgrounds in the $e\tau$ and $\mu\tau$ channels
One of the major backgrounds in the $e\tau$ and $\mu\tau$ channels is the $W$+jets process, where a jet is misidentified as a $\tau$ lepton candidate. As the misidentification of jets as $\tau$ leptons is difficult to model, particularly in the high-$p_{\mathrm{T}}$ region, the $W$+jets background is determined from dedicated CRs. These are built from events selected with the same criteria as the signal selection, except that $E_{\mathrm{T}}^{\mathrm{miss}} > 30$ GeV is required to enhance the $W$+jets contribution and suppress the $Z \to \ell\ell'$ contamination. The dilepton-mass requirement is also lowered to 130 GeV to reduce the statistical uncertainty in these control regions. Furthermore, the $\Delta\phi_{\ell\ell'}$ requirement is reversed ($\Delta\phi_{\ell\ell'} < 2.7$) to ensure orthogonality with the SR. Simulation studies indicate that the multijet background is negligible in this CR. Contributions from other SM background processes are estimated using MC simulation. The $W$+jets MC samples are used to calculate the ratio of events in the CR to events in the SR, and the total $W$+jets background yield is then extrapolated from the CR to the SR with this ratio. Kinematic distributions of the $W$+jets background are derived from MC simulation. The uncertainties considered in this method include:
• the construction of the $W$+jets CR;
• uncertainties in the estimates of other SM background processes in the $W$+jets CR;
• the $W$+jets MC shape uncertainty, which is assigned based on the level of agreement between data and prediction in the $W$+jets CR.
The overall uncertainty in the $W$+jets estimate in the $e\tau$ and $\mu\tau$ channels is approximately 20%.
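The CR-to-SR extrapolation described above reduces to simple arithmetic per bin; a minimal sketch with entirely hypothetical yields:

```python
def wjets_sr_estimate(n_data_cr, n_other_mc_cr, n_wjets_mc_sr, n_wjets_mc_cr):
    """Extrapolate the data-driven W+jets yield from the CR to the SR.

    The MC-based transfer factor carries the CR-to-SR extrapolation, while
    the normalisation is taken from data after subtracting the other SM
    processes in the CR.  All input numbers here are hypothetical.
    """
    transfer_factor = n_wjets_mc_sr / n_wjets_mc_cr
    n_wjets_data_cr = n_data_cr - n_other_mc_cr
    return n_wjets_data_cr * transfer_factor

# e.g. 5200 data events in the CR, 2100 expected from other SM processes,
# and a W+jets MC transfer factor of 240/3300
estimate = wjets_sr_estimate(5200.0, 2100.0, 240.0, 3300.0)
```

The quoted uncertainties (CR construction, other-background subtraction, MC shape) propagate straightforwardly through this product.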
The contribution from multijet production is also estimated from a CR with the same selection as the SR, except that the leptons are required to have the same electric charge, assuming that the probability of misidentifying a jet as a lepton is independent of the charge. The dilepton-mass requirement is likewise lowered to 130 GeV to estimate the low-mass multijet contribution used in the high-mass extrapolation. The contamination from processes with prompt leptons is subtracted from data in the same-sign CR using MC simulation, while the contamination from $W$+jets is subtracted using the procedure described above.
The assumption that the jet-misidentification rate is charge-independent is tested in a multijet-enriched region with two non-isolated leptons. After subtraction of the other SM backgrounds, the ratio of opposite-sign to same-sign events is found to agree with unity within 10% to 20% up to the TeV dilepton invariant-mass range. The remaining non-closure is taken as an uncertainty. Besides this charge-independence uncertainty, the uncertainties in the multijet estimate arise mainly from the $W$+jets subtraction and the subtraction of the other irreducible MC backgrounds. As the uncertainty in $W$+jets is sizeable, it is also propagated to the multijet estimate through the $W$+jets subtraction. As a result, the yield of the multijet background is highly anti-correlated with the $W$+jets background, and the two are combined into a single "fake" background. The overall uncertainty in the "fake" background is approximately 25% in the $e\tau$ and $\mu\tau$ channels.
Given the small number of "fake" background events at high dilepton invariant mass in the signal region, a function of the form $f(m_{\ell\ell'}) = a \, m_{\ell\ell'}^{\,b + c \log(m_{\ell\ell'})}$ with free parameters $a$, $b$ and $c$ is fitted to the dilepton invariant-mass distribution in the range $340 < m_{\ell\ell'} < 3000$ GeV. This function is used to extrapolate the fake-background estimate over the whole signal region ($m_{\ell\ell'} > 600$ GeV). The statistical uncertainty of the fitted function is determined by propagating the statistical uncertainties of the fitted parameters. The extrapolation uncertainty is dominated by the choice of fit function, and it is evaluated by comparing the nominal estimate with the one obtained with an alternative function ($f(m_{\ell\ell'}) = a \, m_{\ell\ell'}^{\,b}$, with free parameters $a$ and $b$). The difference between the two functions exceeds 100% above 2 TeV. The same functions have previously been used to model jet backgrounds in other analyses [72]. Both the data-driven uncertainties and the fit-related uncertainties are taken into account as independent sources of the fake-estimate uncertainty.
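A function of this form becomes linear in its parameters after taking logarithms, so the fit can be sketched as a quadratic polynomial fit in $\log m$. This illustrative version ignores the per-bin uncertainties and the full likelihood treatment used in the actual analysis:

```python
import numpy as np

def fake_shape(m, a, b, c):
    """Nominal extrapolation function f(m) = a * m**(b + c*log(m))."""
    return a * m ** (b + c * np.log(m))

def fit_fake_shape(m, y):
    """Least-squares fit of the nominal shape to (m, y) points.

    Taking logs gives log f = log(a) + b*log(m) + c*log(m)**2, i.e. a
    quadratic in u = log(m), fitted here with np.polyfit (a sketch; the
    analysis fit also propagates per-bin statistical uncertainties).
    """
    u = np.log(m)
    c, b, log_a = np.polyfit(u, np.log(y), 2)  # coefficients, highest power first
    return np.exp(log_a), b, c
```

The alternative power-law function $f(m) = a\,m^{b}$ corresponds to the same fit with the quadratic term fixed to zero, and the spread between the two extrapolations above the fit range gives the shape uncertainty.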

Systematic uncertainties
Systematic uncertainties affect the event yields and the shape of the invariant-mass distribution in the signal and control regions. They are grouped into two types: theoretical and experimental uncertainties. The systematic uncertainties related to the estimates of the fake backgrounds are discussed in Section 5.
The experimental uncertainties considered include those from the trigger, reconstruction, identification and isolation requirements of the final-state objects: electrons [58], muons [62], $\tau$-leptons [67], jets [65], $E_{\mathrm{T}}^{\mathrm{miss}}$ [73] and $b$-tagging [74]. Their energy-scale and resolution uncertainties are also taken into account. An additional uncertainty of 1.7% from the measurement of the integrated luminosity is included. These sources of uncertainty are considered for both the simulated background and signal processes. Experimental uncertainties affect the background event yields and the signal cross-section through their effects on the acceptance and on event migration between the analysis regions.
Theoretical uncertainties are considered for the renormalisation ($\mu_R$) and factorisation ($\mu_F$) scales, the choice of PDF, the choice of the value of the strong coupling constant $\alpha_s$, and the modelling of the $t\bar{t}$ and diboson backgrounds. The strategy used in the same analysis with a partial data sample at $\sqrt{s} = 13$ TeV [9] is followed in this search.
Variations of the renormalisation and factorisation scales are used to estimate the uncertainty due to missing higher-order corrections. Pairwise variations of $\mu_R$ and $\mu_F$ are considered to find the maximum and minimum variations. The PDF uncertainty consists of the contribution from the PDF set used in the matrix-element calculation. It is estimated using different PDF sets and eigenvector variations within a given PDF set for the top-quark, diboson and $W$+jets backgrounds. The $\alpha_s$ uncertainty is evaluated using the same PDF set with two different $\alpha_s$ values. The uncertainty in the modelling of the $t\bar{t}$ background is assessed by comparing two MC samples generated with Powheg-Box+Pythia8 and aMC@NLO+Pythia8. The uncertainty in the modelling of the diboson background is assessed by comparing two MC samples generated with Sherpa 2.2.12 and Powheg-Box+Pythia8.
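The pairwise scale-variation envelope can be illustrated as follows, assuming the conventional seven-point scheme in which the two opposite $\mu_R$/$\mu_F$ variations are dropped; all yields below are hypothetical:

```python
def scale_envelope(nominal, variations):
    """Envelope of pairwise renormalisation/factorisation scale variations.

    `variations` maps (muR factor, muF factor) -> predicted yield.  The
    seven-point scheme drops the two opposite variations (0.5, 2.0) and
    (2.0, 0.5); the up/down uncertainty is the maximum and minimum
    deviation of the remaining variations from the nominal yield.
    """
    kept = {k: v for k, v in variations.items()
            if k not in {(0.5, 2.0), (2.0, 0.5)}}
    up = max(v - nominal for v in kept.values())
    down = min(v - nominal for v in kept.values())
    return up, down

# Hypothetical per-bin yields for each (muR, muF) variation.
yields = {
    (0.5, 0.5): 103.0, (0.5, 1.0): 101.5, (1.0, 0.5): 102.0,
    (1.0, 2.0): 98.5,  (2.0, 1.0): 97.0,  (2.0, 2.0): 96.0,
    (0.5, 2.0): 110.0, (2.0, 0.5): 90.0,  # opposite variations, excluded
}
up, down = scale_envelope(100.0, yields)  # -> (3.0, -4.0)
```

In the analysis this envelope is evaluated bin by bin in $m_{\ell\ell'}$ and assigned as the missing-higher-order uncertainty.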
Experimental systematic uncertainties common to signal and background processes are assumed to be correlated. For signal processes, only experimental systematic uncertainties and uncertainties due to the limited statistical precision of the simulated samples are considered. All uncertainties are evaluated as a function of $m_{\ell\ell'}$.
For illustration, Table 2 lists the post-fit impact of the uncertainties in the RPV SUSY $\tilde{\nu}_\tau$ model for each channel, grouped by source. The $Z'$ and QBH models show similar results. The largest systematic uncertainties in the $e\mu$ channel are the statistical uncertainties associated with the background estimate and the background-modelling uncertainties. In the $\tau$-lepton channels, the uncertainties are dominated by the statistical precision of the background estimate and by the experimental uncertainties related to $\tau$-leptons and the estimate of the fake backgrounds. The total uncertainty is dominated by the statistical component in all three channels.

Statistical analysis
The dilepton invariant-mass distributions in the SR and CRs are fitted simultaneously to test for the presence of a signal. For hypothesis testing, binned profile-likelihood fits are performed, following a modified frequentist method [75] implemented in RooStats [76]. The fits are performed for each decay channel separately. For the $e\mu$ channel, the diboson and low-$\Delta\phi_{\ell\ell'}$ $t\bar{t}$ CRs are included in the fit to extract the overall normalisation of the diboson background, while the normalisation of the top-quark background is kept uncorrelated between the high- and low-$\Delta\phi_{\ell\ell'}$ regions. Separate normalisation factors for the top-quark background are used in the high- and low-$\Delta\phi_{\ell\ell'}$ regions in the simultaneous fit, through which their correlation is determined.
For the $e\tau$ and $\mu\tau$ channels, only the $t\bar{t}$ CRs are included in the fit; the diboson correction factors are calculated from the CR-only fit results and used in the final fit as fixed normalisation factors.
The binned likelihood function $\mathcal{L}(\mu, \theta)$ is constructed as a product of Poisson probability terms over all bins considered in the search. The likelihood function depends on the parameter of interest (POI), the signal-strength parameter $\mu$, a factor multiplying the theoretical signal production cross-section, and on a set of nuisance parameters $\theta$ that encode the effect of systematic uncertainties on the signal and background expectations. All nuisance parameters are implemented in the likelihood function with Gaussian constraints.
Unconstrained normalisation factors are applied to the top-quark and diboson background components in the $e\mu$ channel, and to the top-quark background in the $\tau$-lepton channels. They are constrained by the previously mentioned CRs in the respective channels. The expected event yield in each bin depends on the normalisation factors and on the nuisance parameters, which adjust the expected signal and background yields according to the best fit to the data.
The likelihood function is fitted to the data to test for the presence of a signal. The test statistic $q_\mu$ is defined as the profile-likelihood ratio, $q_\mu = -2 \ln \left[ \mathcal{L}(\mu, \hat{\hat{\theta}}_\mu) / \mathcal{L}(\hat{\mu}, \hat{\theta}) \right]$, where $\hat{\mu}$ and $\hat{\theta}$ are the values of the parameters that simultaneously maximise the likelihood function, and $\hat{\hat{\theta}}_\mu$ are the values of the nuisance parameters that maximise the likelihood function for a fixed value of $\mu$. The compatibility of the observed data with the background-only hypothesis is tested by setting $\mu = 0$ in the test statistic, $q_0$. Upper limits on the signal production cross-section for each considered signal scenario are computed using $q_\mu$ in the CL$_s$ method [75] with the asymptotic approximation [77]. A given signal scenario is considered to be excluded at the 95% confidence level (CL) if the value of the signal production cross-section (parameterised by $\mu$) yields a CL$_s$ value below 0.05.
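As a simplified illustration of the CL$_s$ criterion, the sketch below computes an observed 95% CL upper limit on $\mu$ for a single-bin counting experiment with no nuisance parameters, using the event count itself as the (monotone) test statistic; the analysis itself uses the profile-likelihood ratio $q_\mu$ with the asymptotic approximation, and all yields here are hypothetical:

```python
import math

def poisson_cdf(n_obs, lam):
    """P(n <= n_obs) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * sum(lam ** k / math.factorial(k)
                                for k in range(n_obs + 1))

def cls(mu, n_obs, s, b):
    """CLs = CL_{s+b} / CL_b for a single-bin counting experiment."""
    p_sb = poisson_cdf(n_obs, mu * s + b)  # CL_{s+b}
    p_b = poisson_cdf(n_obs, b)            # CL_b
    return p_sb / p_b

def upper_limit(n_obs, s, b, cl=0.95, tol=1e-6):
    """Find the signal strength mu at which CLs crosses 1 - cl (bisection)."""
    lo, hi = 0.0, 100.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cls(mid, n_obs, s, b) > 1.0 - cl:
            lo = mid  # still allowed, push the limit up
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For instance, with zero observed events the CL$_s$ value reduces to $e^{-\mu s}$ independently of $b$, so the 95% CL upper limit on $\mu s$ is $\ln 20 \approx 3.0$ events, the familiar counting-experiment result.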

Results
Tables 3, 4 and 5 show the observed data event yields and the expected background event yields in the CRs and SRs for the $e\mu$, $e\tau$ and $\mu\tau$ channels. The $e\mu$ background is dominated by $t\bar{t}$ and diboson events, while $W$+jets events dominate in the $e\tau$ and $\mu\tau$ final states. The $W$+jets background in the $e\tau$ and $\mu\tau$ channels is strongly suppressed by the RNN-based $\tau$ identification compared with the previous analysis.
Binned profile-likelihood fits are performed on the dilepton invariant-mass spectra for each signal scenario in the $e\mu$, $e\tau$ and $\mu\tau$ channels. Table 6 shows the fitted correction factors for the "Top Quarks" and "Diboson" backgrounds applied in the SR. Figures 1 and 2 show the post-fit dilepton invariant-mass distributions of the background-only fit in the CRs for the $e\mu$, $e\tau$ and $\mu\tau$ channels, and Figure 3 shows the corresponding distributions in the SR. The SM expectations agree well with the data in the SR of the $e\mu$ channel. A mild tension with the SM background estimates is observed between 2.0 TeV and 2.3 TeV in the SRs of the $\tau$-lepton channels; nevertheless, the SM expectations remain statistically consistent with the data within 2$\sigma$.
Figures 4-6 show the observed and expected 95% CL upper limits on the production cross-section times branching ratio of the $Z'$, RPV SUSY $\tilde{\nu}_\tau$ and QBH models for each of the final states considered. The limits for signal masses above 3.0 TeV are dominated by the last bin of the dilepton invariant-mass spectrum, so the limit curves flatten in the high-mass region.
The extracted limits weaken for signal masses above about 2.5 TeV owing to a decrease in acceptance at very high $p_{\mathrm{T}}$ and, specifically for the LFV $Z'$ model, to an enhanced low-mass tail of signal production caused by the PDF suppression at high masses. The observed limits are slightly more stringent than the expected limits above 2 TeV in the $e\mu$ channel because of the small deficit observed in data. In the $\tau$-lepton channels, the observed limits are less stringent than the expected limits above 2 TeV because of the mild excess above the background estimates between 2.0 TeV and 2.3 TeV. The acceptance times efficiency of the ADD and RS QBH models agree within 1%, and the same prediction is used for the limit extraction. The results are summarised in Table 7.
Table 3: Observed data event yields and the expected background event yields with their total uncertainties in the CRs in the $e\mu$ channel after applying all selection criteria and the background-only fit. The "Fake" background refers to "$W$+jets" and "Multijet" events with fake leptons. The "Top Quarks" category contains single-top and $t\bar{t}$ events.

Conclusions
A search for a heavy particle decaying into $e\mu$, $e\tau$ or $\mu\tau$ final states is conducted using 139 fb$^{-1}$ of proton-proton collision data at $\sqrt{s} = 13$ TeV recorded by the ATLAS detector at the Large Hadron Collider. The data are found to be consistent with the Standard Model expectation in the SR ($m_{\ell\ell'} > 600$ GeV). With no evidence for new physics, profile-likelihood fits are used to set 95% CL lower limits on the mass of a $Z'$ vector boson with lepton-flavour-violating couplings at 5.0, 4.0 and 3.9 TeV for the $e\mu$, $e\tau$ and $\mu\tau$ final states, respectively; on the mass of a supersymmetric $\tau$-sneutrino with $R$-parity-violating couplings at 3.9, 2.8 and 2.7 TeV; and on the threshold mass for quantum black-hole production in the context of the ADD $n=6$ (RS $n=1$) model at 5.9 (3.8), 5.2 (3.0) and 5.1 (3.0) TeV, respectively.
For the mass limits of the $Z'$ signal model, as an illustration, the observed results based on the full data sample recorded by the ATLAS experiment at $\sqrt{s} = 13$ TeV are more stringent by 0.6, 0.3 and 0.4 TeV than the corresponding limits based on the 36.1 fb$^{-1}$ data sample at $\sqrt{s} = 13$ TeV [9]. The expected sensitivity is improved by 0.5, 0.6 and 0.7 TeV for the $e\mu$, $e\tau$ and $\mu\tau$ channels, respectively. The improvements in the observed limits in the $e\tau$ and $\mu\tau$ channels are smaller than the expected ones because of the slight excess observed in data. The main improvement comes from the four-times-larger data sample. The increase in sensitivity is also due to the application of the $b$-jet veto, more accurate data-driven background estimates, and better particle reconstruction and identification.

Figure 1 :
Figure 1: The background-only post-fit invariant-mass distributions of the (a) low-$\Delta\phi_{\ell\ell'}$ $t\bar{t}$ CR, (b) $t\bar{t}$ CR and (c) diboson CR for data and the SM background predictions in the $e\mu$ channel. The error bars show the statistical uncertainty of the observed yields, while the hatched band includes the post-fit total uncertainties, taking into account all correlations. The ratio of data to the best-fit prediction is shown in the bottom panels of the plots. The last bin contains the overflow events. The binning of the control regions is limited by the background statistics.

Figure 2 :
Figure 2: The background-only post-fit invariant-mass distributions of (a) $e\tau$ and (b) $\mu\tau$ pairs for data and the SM background predictions in the $t\bar{t}$ CR. The error bars show the statistical uncertainty of the observed yields, while the hatched band includes the post-fit total uncertainties, taking into account all correlations. The ratio of data to the best-fit prediction is shown in the bottom panels of the plots. The last bin contains the overflow events.

Figure 3 :
Figure 3: The background-only post-fit invariant-mass distributions of (a) $e\mu$, (b) $e\tau$ and (c) $\mu\tau$ pairs for data and the SM background predictions in the SR. The error bars show the statistical uncertainty of the observed yields, while the hatched band includes the post-fit total uncertainties, taking into account all correlations. The dashed line shows a typical pre-fit signal mass distribution (RPV $\tilde{\nu}_\tau$ at 3 TeV). The ratio of data to the best-fit prediction is shown in the bottom panels of the plots. The arrows indicate data-to-prediction ratios that exceed the ratio range. The last bin contains the overflow events.

Figure 4 :
Figure 4: The observed and expected 95% CL upper limits on the (a) $Z'$ boson, (b) RPV $\tau$-sneutrino ($\tilde{\nu}_\tau$) and (c) QBH ADD and RS production cross-section times branching ratio for decays into an $e\mu$ final state. The signal theoretical cross-section times branching ratio lines for the $Z'$ model, the QBH ADD model assuming six extra dimensions, and the RS model with one extra dimension are obtained from the simulation of each process; the $Z'$ prediction is corrected to NNLO and the RPV $\tilde{\nu}_\tau$ prediction to NLO. The theoretical uncertainties are not considered in the mass-limit calculation. The acceptance times efficiency of the ADD and RS QBH models agree to within 1% and the same curve is used for the limit extraction. The expected limits are shown with the ±1 and ±2 standard-deviation uncertainty bands.

Figure 5 :
Figure 5: The observed and expected 95% CL upper limits on the (a) $Z'$ boson, (b) RPV $\tau$-sneutrino ($\tilde{\nu}_\tau$) and (c) QBH ADD and RS production cross-section times branching ratio for decays into an $e\tau$ final state. The signal theoretical cross-section times branching ratio lines for the $Z'$ model, the QBH ADD model assuming six extra dimensions, and the RS model with one extra dimension are obtained from the simulation of each process; the $Z'$ prediction is corrected to NNLO and the RPV $\tilde{\nu}_\tau$ prediction to NLO. The theoretical uncertainties are not considered in the mass-limit calculation. The acceptance times efficiency of the ADD and RS QBH models agree to within 1% and the same curve is used for the limit extraction. The expected limits are shown with the ±1 and ±2 standard-deviation uncertainty bands.

Figure 6 :
Figure 6: The observed and expected 95% CL upper limits on the (a) $Z'$ boson, (b) RPV $\tau$-sneutrino ($\tilde{\nu}_\tau$) and (c) QBH ADD and RS production cross-section times branching ratio for decays into a $\mu\tau$ final state. The signal theoretical cross-section times branching ratio lines for the $Z'$ model, the QBH ADD model assuming six extra dimensions, and the RS model with one extra dimension are obtained from the simulation of each process; the $Z'$ prediction is corrected to NNLO and the RPV $\tilde{\nu}_\tau$ prediction to NLO. The theoretical uncertainties are not considered in the mass-limit calculation. The acceptance times efficiency of the ADD and RS QBH models agree to within 1% and the same curve is used for the limit extraction. The expected limits are shown with the ±1 and ±2 standard-deviation uncertainty bands.

Table 2 :
Summary of the different sources of relative uncertainty (post-fit), in percent, on the observed signal strength of the RPV SUSY $\tilde{\nu}_\tau$ model with a mass of 3 TeV in the $e\mu$, $e\tau$ and $\mu\tau$ channels. "Other" refers to the luminosity, jet-vertex-tagger (JVT) and pile-up reweighting uncertainties. "NA" stands for not applicable.

Table 4 :
Observed data event yields and the expected background event yields with their total uncertainties in the CRs in the $e\tau$ and $\mu\tau$ channels after applying all selection criteria and the background-only fit. The "Fake" background is estimated directly from the $W$+jets MC, and the multijet contribution is not considered in the $t\bar{t}$ CRs. The "Top Quarks" category contains single-top and $t\bar{t}$ events.

Table 5 :
Observed data event yields and the expected background event yields with their total uncertainties in the SR ($m_{\ell\ell'} > 600$ GeV) in the $e\mu$, $e\tau$ and $\mu\tau$ channels after applying all selection criteria and the background-only fit. The "Fake" background refers to "$W$+jets" and "Multijet" events with fake leptons. The "Top Quarks" category contains single-top and $t\bar{t}$ events.

Table 6 :
The fitted correction factors of the "Top Quarks" and "Diboson" backgrounds applied in the SR for the background-only fit in the $e\mu$, $e\tau$ and $\mu\tau$ channels. For the $e\tau$ and $\mu\tau$ channels, the diboson correction factors are calculated from the CR-only fit results and used in the final fit as fixed normalisation factors.

Table 7 :
Expected and observed 95% CL lower limits on the mass of a $Z'$ boson with lepton-flavour-violating couplings, a supersymmetric $\tau$-sneutrino ($\tilde{\nu}_\tau$) with $R$-parity-violating couplings, and the threshold mass for quantum black-hole production for the ADD $n=6$ and RS $n=1$ models.