1 Introduction

In the Standard Model (SM), no doubly charged bosons are present. However, various theories beyond the Standard Model (BSM theories), namely type-II seesaw models [1,2,3,4,5,6], left-right symmetric models (LRSMs) [7,8,9,10,11,12], the Zee–Babu neutrino mass model [13,14,15], 3-3-1 models [16], and the Georgi–Machacek model [17], predict the existence of doubly charged bosons decaying into same-charge lepton pairs. In left-right symmetric models based on the \(SU(2)_L \times SU(2)_R \times U(1)_{B-L}\) symmetry, Higgs multiplets \(\Delta _L\) and \(\Delta _R\) transforming, respectively, as triplets under \(SU(2)_L\) and \(SU(2)_R\) gauge symmetries, contain doubly charged Higgs bosons, termed \(H^{\pm \pm } _L\) and \(H^{\pm \pm } _R\). Doubly charged Higgs particles can therefore couple to either left-handed or right-handed leptons. The cross-section for the Drell–Yan production of \(H^{++}_L H^{--}_L\) is a factor of about two larger than for the \(H^{++}_R H^{--}_R\) production, due to the different couplings to the \(Z\) boson [18]. Since the \(H^{\pm \pm } _L\) particle is common to LRSMs and the canonical type-II seesaw model, the results can be directly interpreted in both. In the Zee–Babu case, two complex scalar \(SU(2)_L\) singlets are proposed within the SM gauge group, where one of them is doubly charged and is usually denoted by \(k^{\pm \pm }\). As it has the same quantum numbers as the \(H^{\pm \pm } _R\) from LRSMs, their electroweak production is the same.Footnote 1 It was recently shown that for the Drell–Yan production mechanism studied in this analysis, cross-sections and differential scalar distributions in the Zee–Babu and type-II seesaw models differ at most by a normalisation factor when all theoretical inputs are the same [19]. If not stated explicitly, \(H^{\pm \pm }\) represents any of the \(H^{\pm \pm } _L\), \(H^{\pm \pm } _R\), or \(k^{\pm \pm }\) particles throughout the paper.

The Feynman diagram of the Drell–Yan pair production mechanism considered in this analysis is shown in Fig. 1.

Fig. 1
figure 1

Feynman diagram of the pair production process \(\textit{pp }\rightarrow H^{++} H^{--}\). While the analysis allows for \(H^{\pm \pm }\) decays into all lepton flavour combinations, it only studies electrons and muons in the final states

At the LHC, other mechanisms such as vector-boson fusion, gluon–gluon fusion, and photon-initiated [20] processes are less important than Drell–Yan production [18] in the mass range of interest for this paper, which is 400 GeV to 1300 GeV.

The ATLAS Collaboration previously analysed data corresponding to an integrated luminosity of \({36.1}\,{\hbox {fb}}^{-1}\) from \(\sqrt{s}= {13}{\hbox { TeV}}\) pp collisions at the LHC during the 2015 and 2016 data-taking periods. Masses of doubly charged Higgs bosons were excluded up to 870 GeV for \(H^{\pm \pm } _L\) and up to 760 GeV for \(H^{\pm \pm } _R\) at 95% confidence level (\(\text {CL}\)) [21]. The CMS Collaboration performed a similar search using \(\sqrt{s}={7}{\hbox { TeV}}\) pp collisions collected during Run 1 [22], establishing lower bounds on the \(H^{\pm \pm }\) mass between 204 GeV and 459 GeV in the 100% branching fraction scenarios, depending on the flavour content of the two-lepton final states. A complementary analysis from the ATLAS Collaboration searched for doubly charged Higgs bosons in the \(H^{\pm \pm } \rightarrow W^\pm W^\pm \) decay channel using the whole \({139}\,{\hbox {fb}}^{-1}\) dataset from Run 2 of the LHC and excluded \(H^{\pm \pm }\) masses up to 350 GeV [23].

The doubly charged Higgs boson can decay to \(W\,W\,\) or \(\ell \ell ^\prime \) (\(\ell , \ell ^\prime =e,~\mu ,~\tau \)), depending on the vacuum expectation value of the \(\Delta _L\) Higgs triplet, \(v_{\Delta }\) [5]. In this paper, it is assumed that \(v_{\Delta }\) is smaller than \({10^{-8}}{\hbox { GeV}}\). Therefore, only leptonic decays of doubly charged Higgs bosons, \(H^{\pm \pm } \rightarrow \ell ^{\pm } \ell ^{\prime \pm }\), are relevant.

A measurement of same-charge lepton pairs is sensitive to the same-charge partial decay width of \(H^{\pm \pm }\) given by:

$$\begin{aligned} \Gamma (H^{\pm \pm } \rightarrow \ell ^{\pm } \ell ^{\prime \pm }) = k\frac{h^{2}_{\ell \ell ^\prime }}{16 \pi } m(H^{\pm \pm }), \end{aligned}$$
(1)

with \(k=2\) if the two leptons have the same flavour and \(k=1\) if their flavours are different. The factor \(h_{\ell \ell ^\prime }\) has an upper bound that depends on the flavour combination [24, 25] but it is set to an allowed value \(h_{\ell \ell ^\prime } = 0.02\) for all flavour combinations in this analysis. This choice corresponds to prompt decays of the \(H^{\pm \pm }\) bosons and, according to Eq. (1), results in a mean proper decay length \(c\tau \sim {10}{\hbox { fm}}\) across the whole \(H^{\pm \pm }\) mass range considered in this search. In general, there is no preference for decays into heavier leptons, as the coupling is not proportional to the lepton mass as it is for the SM Higgs boson. Therefore, the couplings of \(H^{\pm \pm }\) to all three SM charged leptons (\(e,~\mu ,~\tau \)) were assumed to be the same.

It is worth stressing that lepton-flavour-violating decays such as \(H^{\pm \pm } \rightarrow e^\pm \mu ^\pm \), \(H^{\pm \pm } \rightarrow \mu ^\pm \tau ^\pm \) or \(H^{\pm \pm } \rightarrow e^\pm \tau ^\pm \) are also allowed by the model. This fact connects high-energy searches at LHC with low-energy neutrinoless double beta decay searches [26].

The analysis presented in this paper searches for leptonic decays \(H^{\pm \pm } \rightarrow \ell ^\pm \ell ^{\prime \pm }\) of doubly charged Higgs boson pairs, resulting in same-charge lepton pairs in final states with two, three or four leptons. Since these events are produced very rarely in proton–proton (pp ) collisions by SM processes, same-charge lepton pairs represent a striking signature for BSM physics. The analysis only considers light leptons (e and \(\mu \)) in the two-, three- and four-lepton final states, thereby including leptonic \(\tau \) decays, using the whole \({139}\,{\hbox {fb}}^{-1}\) dataset from Run 2 of the LHC. The assumption that the branching ratios to each of the possible leptonic final states are equal, \(\mathcal {B} (H^{\pm \pm } \rightarrow e^\pm e^\pm ) = \mathcal {B} (H^{\pm \pm } \rightarrow e^\pm \mu ^\pm ) = \mathcal {B} (H^{\pm \pm } \rightarrow \mu ^\pm \mu ^\pm ) = \mathcal {B} (H^{\pm \pm } \rightarrow e^\pm \tau ^\pm ) = \mathcal {B} (H^{\pm \pm } \rightarrow \mu ^\pm \tau ^\pm ) = \mathcal {B} (H^{\pm \pm } \rightarrow \tau ^\pm \tau ^\pm ) = 1/6\), is made. To estimate the compatibility of the data with the SM expectations, as well as to extract upper limits on the \(H^{\pm \pm }\) mass at 95% \(\text {CL}\), a binned maximum-likelihood fit to data is performed in all control and signal regions, described in Sect. 4. A simultaneous fit to the distribution of the invariant mass of the two same-charge leptons with the highest \(p_{\text {T}}\) in the event is implemented in the two- and three-lepton regions, while a single-bin event yield is used in the four-lepton regions, merging two-lepton channels to obtain the best sensitivity.

This paper is organised as follows. The ATLAS detector is described in Sect. 2, the data and simulated events used in the analysis are outlined in Sect. 3, and the event reconstruction procedure and the analysis strategy are detailed in Sect. 4. The background estimation is presented in Sect. 5, and the systematic uncertainties are described in Sect. 6. Finally, results and their statistical interpretation are presented in Sect. 7, followed by the conclusions in Sect. 8.

2 ATLAS detector

The ATLAS detector [27] is a multipurpose particle detector covering nearly \(4\pi \) in solid angle in one of the pp interaction regions of the LHC. It consists of several subdetectors assembled with a cylindrical symmetry coaxial with the beam axis.Footnote 2 The inner detector (ID) surrounding the beam pipe is immersed in a 2 T solenoidal magnetic field and provides charged-particle tracking in the range \(|\eta | < 2.5\). Starting from the beam pipe and going outwards, the ID is composed of a high-granularity silicon pixel detector that typically provides four measurements per track, the first hit normally being in the insertable B-layer installed before Run 2 [28, 29], a silicon microstrip tracker, and a transition radiation tracker (TRT) that covers the region up to \(|\eta | = 2.0\). The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.

Outside the thin superconducting solenoid, the calorimeter system covers the pseudorapidity range \(|\eta | < 4.9\). Within the region \(|\eta |< 3.2\), electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering \(|\eta | < 1.8\) to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within \(|\eta | < 1.7\) and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements respectively.

In the outer part of the ATLAS detector, the muon spectrometer (MS) covers the region \(|\eta | < 2.7\) with three layers of precision tracking chambers. These consist of monitored drift tubes except in the innermost layer at \(2.0< |\eta | < 2.7\), where cathode-strip chambers are used to cope with the much higher rate of background particles. The muon trigger system covers the range \(|\eta | < 2.4\) with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions. The MS is immersed in a magnetic field produced by three large superconducting air-core toroidal magnets with eight coils each. The field integral of the toroids ranges between 2.0 and 6.0 Tm across most of the detector.

A two-level trigger system selects events that are of interest for the ATLAS physics programme [30]. The first-level trigger is implemented in hardware and reduces the event rate to below 100 kHz. A software-based trigger further reduces this to a recorded event rate of approximately 1 kHz.

An extensive software suite [31] is used in data simulation, in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

3 Dataset and simulated event samples

This analysis uses data from pp collisions occurring every 25 ns at \(\sqrt{s}={13}{\hbox { TeV}}\), collected during Run 2 of the LHC from 2015 to 2018. Events selected for analysis are required to pass standard data-quality requirements [32] and they amount to a total integrated luminosity of \({139}\,{\hbox {fb}}^{-1}\). The uncertainty in the combined 2015–2018 integrated luminosity is 1.7% [33], obtained using the LUCID-2 detector [34] for the primary luminosity measurements. Events were collected using two-lepton triggers that select pairs of electrons [35] or muons [36] or electron–muon combinations [35]. The transverse momentum (\(p_{\text {T}}\)) thresholds of the unprescaled two-lepton triggers were raised during Run 2 because of the increasing luminosity of the colliding beams but were never higher than 24 GeV (24 GeV) for the (sub-)leading electron or 22 GeV (14 GeV) for the (sub-)leading muon.

Monte Carlo (MC) methods were used to model signal and background events. The Geant4 toolkit [37] was used to simulate the response of the ATLAS detector [38]. Simulated events were reconstructed with the same algorithms as those applied to data [31] and were corrected with calibration factors to better match the performance measured in data. The various tools and set-ups used to simulate the different processes are summarised in Table 1. To minimise the impact of the theoretical uncertainties on the final result, the normalisation of the MC samples that model the dominant backgrounds is considered a free parameter in the final likelihood fit, described in Sect. 7.

Table 1 Simulated signal and background event samples: the corresponding event generator, PDF set used for the matrix element, cross-section normalisation, set of tuned parameters and parton shower are shown for each sample. The generator cross-section is used where not specifically stated otherwise

Signal events were generated at leading order (\(\text {LO}\)) with the LRSM package in Pythia 8.212 [39], which implements the \(H^{\pm \pm }\) scenario described in Ref. [40]. All samples contain a mixture of \(H^{\pm \pm } _L\) and \(H^{\pm \pm } _R\) particles. A set of tuned parameters called the A14 tune [41] and the NNPDF2.3lo parton distribution function (PDF) set [42] were used. The value of \(h_{\ell \ell ^\prime }\) in Eq. (1) was set to 0.02 for each leptonic decay mode to obtain a \(H^{\pm \pm }\) decay width that is negligible compared to the detector resolution. Consequently, the branching ratios are assumed to be equal for all possible leptonic final states. Only Drell–Yan production of the \(H^{\pm \pm }\) particles was simulated. The \(H^{\pm \pm } _L\) pair production cross-sections at \(\sqrt{s} ={13}\, {\hbox {TeV}}\) were calculated to leading-order and next-to-leading-order (\(\text {NLO}\)) accuracies [18]. The resulting \(\text {NLO}\)-to-\(\text {LO}\) K-factor was then applied to the \(H^{\pm \pm } _R\) \(\text {LO}\) production cross-section, \(\sigma _R^{\text {LO}}\). The cross-sections for the Zee–Babu model were calculated using the same simulation set-up at \(\text {NLO}\) accuracy [19] and agree well with the \(K \times \sigma _R^{\text {LO}}\) value, as expected. Eleven samples were simulated with different masses of the \(H^{\pm \pm }\) particle decaying into light leptons, and an additional 11 samples with at least one \(\tau \)-lepton in the final state were produced. Signal mass points considered in the analysis range from 400 GeV to 1300 GeV in steps of 100 GeV.

Drell–Yan \(\ell ^+\ell ^-\) production was modelled with the Sherpa 2.2.1 generator [43] and the NNPDF 3.0nnlo set of PDFs [44], using \(\text {NLO}\) matrix elements for up to two partons, and \(\text {LO}\) matrix elements for up to four partons, calculated with the Comix [45] and OpenLoops [46,47,48] libraries. They were matched with the Sherpa parton shower [49] using the MEPS@NLO prescription [50,51,52,53]. The parton shower used the set of tuned parameters developed by the Sherpa authors, and the samples were normalised to a next-to-next-to-leading-order (\(\text {N}\) \(\text {NLO}\)) prediction [54]. Similar methods were used to simulate \(W\,\text {+ jets}\) processes that contribute to the fake-lepton background, which is estimated in Sect. 5.

Samples of diboson (VV) and multiboson (VVV) final states were simulated with the Sherpa [43] generator, using version 2.2.1 or 2.2.2, depending on the process. Where appropriate, off-shell effects and Higgs boson contributions were included. Fully leptonic final states and semileptonic final states, where one boson decays leptonically and the other hadronically, were generated using matrix elements at \(\text {NLO}\) accuracy in QCD for up to one additional parton and at \(\text {LO}\) accuracy for up to three additional parton emissions. Samples for the loop-induced processes, \(gg \rightarrow V\!V\), were generated using \(\text {LO}\)-accurate matrix elements for up to one additional parton emission for both the fully leptonic and semileptonic final states. The matrix element calculations were matched and merged with the Sherpa parton showers, using Catani–Seymour dipole factorisation [45, 49] with the MEPS@NLO prescription [50,51,52,53]. The virtual QCD corrections were provided by the OpenLoops library [46,47,48]. The NNPDF 3.0nnlo set of PDFs was used [44], along with the dedicated set of tuned parton-shower parameters developed by the Sherpa authors.

The \(\text {NLO}\) Powheg Box v2 [55,56,57,58] generator was used with the NNPDF 3.0nlo [44] PDF set to model the production of \(t{}\bar{t}\) and single-top-quark Wt-channel events; the top-quark mass was set to 172.5 GeV and the \(h_\textrm{damp}\) parameterFootnote 3 was set to \(1.5\,m_{\text {top}} \) [60]. For the Wt-channel single-top events, the diagram removal scheme [61] was used to remove interference and overlap with \(t{}\bar{t}\)  production. The events were interfaced to Pythia 8.230 [39] to model the parton shower, hadronisation, and underlying event, with parameters set according to the A14 tune [41] and using the NNPDF 2.3lo set of PDFs [42]. The decays of bottom and charm hadrons were performed by EvtGen 1.6.0 [62]. Additionally, top-quark spin correlations are preserved through the use of MadSpin [63]. The predicted \(t{}\bar{t}\)  production cross-section, calculated with Top++ 2.0 [64] to \(\text {N}\) \(\text {NLO}\) in perturbative QCD and including soft-gluon resummation to next-to-next-to-leading-log order, is \(830^{+20}_{-29}\,\mathrm {(scale)} \pm 35\,(\textrm{PDF} + \alpha _{\text {s}})\,{\hbox {pb}}\).

The MadGraph5_aMC@NLO 2.3.3 [59] generator was used with the NNPDF 3.0nlo [44] PDF set to model the production of \(t{}\bar{t}+ W/Z/H\) events. The events were interfaced to Pythia 8.210 [39] using the A14 tune [41] and the NNPDF 2.3lo [44] PDF set. The production of 3t and 4t events was modelled using the MadGraph5_aMC@NLO 2.2.2 [59] generator at \(\text {LO}\) with the NNPDF 2.3lo [44] PDF set. The events were interfaced with Pythia 8.186 [65] using the A14 tune [41] and the NNPDF 2.3lo [44] PDF set. The decays of bottom and charm hadrons were simulated using the EvtGen 1.2.0 program [62].

Top-quark processes (\(t{}\bar{t}\), single-top-quark, 3t, 4t and \(t{}\bar{t}+ W/Z/H\)) and multiboson processes (\(W\,\!\!W\,\!\!W\,\)\(W\,\!\!W\,\!\!Z\,\), \(W\,\!\!Z\,\!\!Z\,\) and \(Z\,\!\!Z\,\!\!Z\,\) ) are grouped together and form the ‘Other’ background category. Pile-up events from additional interactions in the same or neighbouring bunch crossings were simulated with Pythia 8.186 [65] using the NNPDF 2.3lo PDFs and the A3 tune [66] and overlaid on the simulated hard-scatter events, which were then reweighted to match the pile-up distribution observed in data.

4 Event reconstruction and selection

To reduce non-collision backgrounds originating from beam-halo events and cosmic rays, events are required to have at least one reconstructed primary vertex with at least two associated tracks, each having \(p_{\text {T}}\) greater than 500 MeV. Among all the vertices in the event, the one with the highest sum of squared \(p_{\text {T}}\) of the associated tracks is identified as the primary vertex. Events that contain jets must also satisfy the quality criteria described in Ref. [67].

4.1 Event reconstruction

This analysis classifies leptons into two exclusive categories called tight and loose, defined specifically for each lepton flavour as described in Sect. 5. Leptons selected in the tight category are mostly prompt leptons that are estimated using Monte Carlo samples and originate from decays of \(Z\) , \(W\) , H bosons, or from prompt leptonic \(\tau \) decays, while loose leptons are mostly non-prompt and misidentified leptons used for the estimation of reducible background. Lepton definitions described in the following three paragraphs correspond to the tight category and the corresponding selection is referred to as baseline event selection.

Electron candidates are reconstructed by using electromagnetic calorimeter and ID information to match an isolated calorimeter energy deposit to a charged-particle track. They are required to stay within the fiducial volume of the inner detector, \(|\eta | < 2.47\), and to have transverse momenta \(p_{\text {T}} > {40}{\hbox { GeV}}\). Moreover, electrons must pass at least the LHTight identification level, based on a multivariate likelihood discriminant [68, 69]. The discriminant relies on track and calorimeter energy cluster information. Electron candidates within the transition region between the barrel and endcap electromagnetic calorimeters (\(1.37< |\eta | < 1.52\)) are vetoed due to limitations in their reconstruction quality. The track associated with the electron candidate must have a transverse impact parameter \(d_{0}\) that, evaluated at the point of closest approach of the track to the beam axis in the transverse plane, satisfies \(|d_{0} |/\sigma (d_{0}) < 5\), where \(\sigma (d_{0})\) is the uncertainty on \(d_{0}\). In addition, the longitudinal distance \(z_{0}\) from the primary vertex to the point where \(d_{0}\) is measured must satisfy \(|z_{0} \sin (\theta )| < {0.5}{\hbox { mm}}\), where \(\theta \) is the polar angle of the track. The combined identification and reconstruction efficiency for LHTight electrons varies from 58% to 88% over the transverse energy range from 4.5 GeV to 100 GeV. Electron candidates must also satisfy the FCLoose isolation requirement, which uses calorimeter-based and track-based isolation criteria that reach an efficiency of about 99%. Electron candidates are discarded if their angular distance from a jet satisfies \(0.2< \Delta R < 0.4\). If \(\Delta R (e,j) < 0.2\), the jet is discarded. Both the reconstruction and isolation performance are evaluated in \(Z\,\rightarrow e{}e\) decay measurements described in Ref. [68].

A dedicated machine-learning algorithm (BDT) is applied to reject electrons with incorrectly identified charge [68]. A selection requirement on the BDT output is chosen to achieve a rejection factor that is between 7 and 10 for electrons having a wrong charge assignment while selecting properly measured electrons with an efficiency of 97%.

Muon candidates are reconstructed by matching a track from the muon spectrometer with an inner-detector track. The candidates must have \(p_{\text {T}} > {40} {\hbox { GeV}}\) and \(|\eta | < 2.5\), and satisfy the impact parameter requirements \(|d_{0} |/\sigma (d_{0}) < 3\) and \(|z_{0} \sin (\theta )| < {0.5}{\hbox { mm}}\). If a muon candidate has a \(p_{\text {T}}\) higher than 300 GeV, it must satisfy a special HighPt muon quality requirement. If its \(p_{\text {T}}\) is less than 300 GeV, the Medium quality requirement is used. The criteria associated with these selections are described in Ref. [70]. The muon candidates must also meet the track-based FixedCutTightTrackOnly isolation requirement. The full set of selections results in a reconstruction and identification efficiency of over 95% in the entire phase space, as measured in \(Z\,\rightarrow \mu {}\mu \) events [71]. If a muon and a jet featuring more than three tracks lie \(\Delta R < 0.4\) apart and the muon’s transverse momentum is less than half of the jet’s \(p_{\text {T}}\), the muon is discarded. Furthermore, a muon candidate that deposits a sufficiently large fraction of its energy in the calorimeter is rejected if it also shares a reconstructed ID track with an electron.

Jets are reconstructed using particle-flow objects that combine tracking and calorimetric information [72]. The anti-\(k_{t}\) algorithm [73], implemented in FastJet [74], is applied with a radius parameter \(R = 0.4\). The measurements of jet \(p_{\text {T}}\) are calibrated to the particle level [75]. To help reject additional jets produced by pile-up, a ‘jet vertex tagger’ (JVT) algorithm [76] is used for jets with \(p_{\text {T}} < {60}{\hbox { GeV}}\) and \(|\eta | < 2.4\). It employs a multivariate technique that relies on jet energy and tracking variables to determine the likelihood that the jet originates from the primary vertex. The medium JVT working point is used, which has an average efficiency of 92%.

Jets considered in this analysis are required to have \(p_{\text {T}} > {20}{\hbox { GeV}}\) and \(|\eta | < 2.5\). Jets within \(\Delta R = 0.2\) of an electron are discarded. Jets within \(\Delta R = 0.2\) of a muon and featuring fewer than three tracks or having \(p_{\text {T}} (\mu ) / p_{\text {T}} (\textrm{jet}) > 0.5\) are also discarded. Events containing jets identified as originating from b-quarks are vetoed. They are identified with a multivariate algorithm [75] which uses information about the impact parameters of inner-detector tracks matched to the jet, the presence of displaced secondary vertices, and the reconstructed flight paths of b- and c-hadrons inside the jet [77,78,79]. The algorithm is used at a working point that provides a b-tagging efficiency of 77%, as determined in a sample of simulated \(t{}\bar{t}\) events. The working point provides rejection factors of approximately 134, 6, and 22 for light-quark and gluon jets, c-jets, and hadronically decaying \(\tau \)-leptons, respectively.

The missing transverse momentum is defined as the negative vector sum of the transverse momenta of fully reconstructed particles and its magnitude is denoted by \(E_{\text {T}}^{\text {miss}}\). If an event has tracks that are not assigned to any of the physics objects but are matched to the primary vertex, an additional ‘soft term’ is added. In a high pile-up environment, this term improves the reconstruction of \(E_{\text {T}}^{\text {miss}}\) [80].

Correction factors are applied to the selected electrons, muons, and jets. For electrons and muons, these factors account for identification and selection efficiency differences between data and simulation [68, 70], while for jets, they correct the jet energy scale and jet energy resolution [75].

4.2 Event selection

The baseline event selection requires at least two light leptons to be identified as tight. For this search, three distinct types of regions are defined: control regions (CR), validation regions (VR), and signal regions (SR). The normalisation factors of the dominant backgrounds are treated as free parameters in the likelihood fit and are constrained by the control regions. Validation regions are used to cross-check the fit results but are not included in the fit itself. Both the control regions and validation regions are intended to be close to the kinematic region of the expected signal but must be constructed in a way to allow only negligible signal contamination. The CRs and VRs are defined using selections that are orthogonal to the SRs. The analysis is sensitive to final states with three different lepton multiplicities. The regions are optimised to search for one same-charge lepton pair in the case of regions requiring two or three leptons, while four-lepton events are required to feature two same-charge pairs with zero total charge. The main variable used to distinguish between control, validation and signal regions is the invariant mass of the two same-charge leptons with the highest \(p_{\text {T}}\) in the event, \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\), where \(\ell , \ell ^\prime =e,~\mu \). This variable is also used in the final fit to data in the two- and three-lepton regions, while a single-bin event yield is used in the four-lepton regions. The selection criteria for each region are summarised in Table 2.

Table 2 Summary of regions defined in the analysis. The table is split into four horizontal blocks: the upper block indicates the final states for each region, the second block lists the lepton multiplicity of the region, the third block indicates the mass range of the corresponding final state, and the lower block indicates the event selection criteria for the region. The application of a selection requirement is indicated by a check-mark (✓), or by inverted when it is inverted. The three- and four-lepton regions include all light-lepton flavour combinations. No selection is applied when a dash is present in the corresponding cell. A selection on the average invariant mass of the two same-charge lepton pairs, \({\overline{m}} \equiv (m_{\ell ^+\ell ^{\prime +}}+m_{\ell ^-\ell ^{\prime -}})/2\), is used to increase the signal significance in the four-lepton signal region

Events with at least one b-tagged jet are vetoed in all regions to suppress background events arising from top-quark decays. In regions with more than two leptons, the so-called \(Z\) -veto condition is required, i.e. events containing same-flavour lepton pairs having invariant masses within 20 GeV of the \(Z\) -boson mass (\({71.2}{\hbox { GeV}}< m(\ell , \ell ) < {111.2}{\hbox { GeV}}\)) are rejected in order to suppress events featuring a \(Z\) boson in the final state. The \(Z\) -veto is not applied in four-lepton control and validation regions – this increases the available number of simulated diboson events.

4.2.1 Signal regions

The signal regions, independently of the lepton multiplicity and flavour combination, require invariant masses of the same-charge leading lepton pair to be above 300 GeV. To maximise the sensitivity, additional requirements are imposed on same-charge lepton pairs, regardless of the flavour. These exploit both the boosted decay topology of the \(H^{\pm \pm }\) resonance and the high energy of the decay products. The same-charge lepton angular separation in the two-lepton signal region (SR2L) is required to be \(\Delta R(\ell ^{\pm },\ell ^{\prime \pm }) < 3.5\). Furthermore, the vector sum of the two leading leptons’ transverse momenta, \(p_{\textrm{T}}(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\), must exceed 300 GeV in both the two-lepton and three-lepton (SR3L) signal regions. In the four-lepton signal region (SR4L), the signal significance is increased by requiring the average invariant mass of the two same-charge lepton pairs to satisfy \({\overline{m}} = (m_{\ell ^+\ell ^{\prime +}}+m_{\ell ^-\ell ^{\prime -}})/2 > {300} {\hbox { GeV}}\).

The efficiency for signal events to pass the signal region selections depends on the signal mass and on the flavour combination of the leptons in the final states. For the benchmark \(H^{\pm \pm }\) masses, the efficiencies range between 10% and 20% in the two-lepton signal regions, while the typical efficiencies in the three- and four-lepton signal regions vary from 30% to 40%. Generally, the efficiencies increase with the \(H^{\pm \pm }\) mass. Focusing on two-lepton regions, the efficiencies are highest for the \(e \mu \) channel, lower for the ee channel and lowest for the \(\mu \mu \) channel, which is an effect of the two-lepton triggers used in the search.

The following control and validation regions either span a lower \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) interval or require other orthogonal selections if the same \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) window as in the signal regions is used.

4.2.2 Control regions

Firstly, the two-lepton diboson control region (DBCR2L) is used to constrain the dominant diboson contribution in the two-lepton final states and spans a \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}} \in [200,~300)~{\hbox { GeV}}\) range. A significant Drell–Yan event yield was observed in the ee channel, so additional selections on \(E_{\text {T}}^{\text {miss}} \) and the pseudorapidity of the same-charge lepton pair, \(|\eta (\ell ^\pm , \ell ^{\prime \pm })|\), are required to specifically suppress the Drell–Yan background. The Drell–Yan background is further constrained by the opposite-charge Drell–Yan control region (DYCR), where exactly two electrons with opposite charges are required. Since it is designed to target only opposite-charge electron pairs, the invariant mass of the opposite-charge electron pair is required to be \(m(e^{\pm },e^{\mp })_{\textrm{lead}} > {300}{\hbox { GeV}}\) and is also used as the fit variable. Then, the three-lepton diboson control region (DBCR3L) is used to constrain the diboson background yield, independently of the flavour combination. Since the \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) requirement is the same as it is in the corresponding three-lepton signal region, at least one \(Z\) -boson candidate is required in order to achieve orthogonality. Finally, the four-lepton control region (CR4L) is used to constrain the yield of the dominant diboson background in four-lepton regions, where \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) is restricted to be between 100 GeV and 200 GeV. The signal contamination in all control regions is negligible.

4.2.3 Validation regions

The two-lepton validation region (VR2L) is used to validate the data-driven fake-lepton background estimation and to assess the diboson modelling in the two-lepton channel. The \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) selection is the same as in the SR2L, thus requiring an inverted \(p_{\textrm{T}}(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) cut with respect to the corresponding signal region. Similarly to the DBCR2L, additional requirements on \(E_{\text {T}}^{\text {miss}} \) and \(|\eta (\ell ^\pm , \ell ^{\prime \pm })|\) are imposed only in the ee channel. The three-lepton validation region (VR3L) is used to validate the diboson background and fake-lepton background with three reconstructed leptons in the final states. The \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) value is required to be within the interval of \([100,~300)~{\hbox { GeV}}\). Additionally, a \(Z\) -veto condition is applied. The four-lepton validation region (VR4L) is used to validate the diboson modelling in the four-lepton region and is defined by the \({200}{\hbox { GeV}}< m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}} < {300}{\hbox { GeV}}\) requirement. The signal yield does not exceed a few per cent of the background yield in any validation region.

5 Background composition and estimation

In this section, background estimation techniques are described. Different lepton multiplicity and flavour channels in the analysis have different background compositions and thus require different treatments. Backgrounds can be categorised into irreducible and reducible types, which can be identified by their source. The former category is derived from MC simulation and the latter with data-driven methods.

Irreducible background sources are SM processes producing the same prompt final-state lepton pairs as the signal, with the dominant contributions in this analysis coming from diboson production. Irreducible prompt SM backgrounds in all regions are estimated using the simulated samples listed in Sect. 3.

Reducible background sources are processes where reconstructed leptons originate from misreconstructed objects such as jets, or from light- or heavy-quark decays or, in the electron case, from photon conversions. These types of events thus contain at least one non-prompt lepton. Events containing leptons whose charges were incorrectly assigned also enter the reducible background category. An example of this type of background is the Drell–Yan background, where the contribution is normalised to the DYCR, then reweighted for the charge misidentification probability and finally used to predict yields in the same-charge selections.

To avoid overlap between the irreducible backgrounds estimated using MC simulation and the data-driven, reducible backgrounds, events from the MC background samples are considered only if reconstructed leptons can be matched to their prompt generator-level counterparts. Electrons are the only significant source of charge misidentification.

5.1 Electron charge misidentification

Electron charge misidentification is caused mainly by bremsstrahlung emission from the electrons as they propagate through the detector material. The emitted photon can either convert to an electron–positron pair or traverse the inner detector without creating any track. In the first case, the calorimeter energy cluster corresponding to the initial electron can be matched to the wrong-charge track. In the case of photon emission without subsequent pair production, the electron track usually has only very few hits in the silicon pixel layers, while other hits are from unknown sources. This can lead to the wrong charge assignment from the track curvature, but a correct estimate of the electron energy, since its estimation relies mostly on the calorimeter.

The reducible background due to charge misidentification is estimated in same-charge analysis channels that contain electrons. The modelling of charge misidentification in the Geant4 simulations deviates from data because of the complex processes involved, which depend sensitively on the arrangement of the material in the detector. Consequently, charge reconstruction correction factors are derived by comparing the charge misidentification probability measured in \(Z\,\rightarrow e{}e\) data with the one in simulation. The charge misidentification probability is extracted by performing a likelihood fit to a dedicated \(Z\,\rightarrow e{}e\) data sample, as described in Ref. [68]. These scale factors are then applied to the simulated background events to compensate for the differences.

5.2 Fake/non-prompt lepton background

The non-prompt lepton background refers to a reducible background that originates from secondary decays of light- or heavy-flavour mesons into light leptons that are usually surrounded by jets. Another source of reducible background (fakes) arises from other physics objects misidentified as electrons or muons. Collectively, such events are referred to as the fake/non-prompt (FNP) background. The b-jet veto significantly reduces the number of FNP leptons from heavy-flavour decays. Most of the FNP leptons still passing the analysis selection originate from in-flight decays of mesons inside jets, jets misreconstructed as electrons, and conversions of initial- and final-state radiation photons. MC samples are not used to estimate these background sources because the simulation of jet production and hadronisation has large intrinsic uncertainties. It is estimated with a data-driven approach, the so-called fake factor method described in Ref. [81]. This method relies on determining the probability, or fake factor (\(F_\ell \)), for a FNP lepton to be identified as a prompt lepton, corresponding to the default lepton selection. Electrons that pass the LHLoose identification requirements [68, 69] but fail either the LHTight identification or the isolation requirements are referred to as loose. Muons, on the other hand, must pass the LHMedium identification requirements [70] but fail isolation to enter the loose category. Electron and muon fake factors, \(F_\ell \), are then defined as the ratio of the number of tight leptons to the number of loose leptons and are parameterised as functions of \(p_{\text {T}}\), \(E_{\text {T}}^{\text {miss}} \), and \(\eta \). The FNP lepton background (containing at least one fake lepton) is estimated in the SR by applying \(F_\ell \) as a normalisation correction to relevant distributions in a region which has the same selection criteria as the SR, except that at least one of the two leptons must pass the loose selection but fail the tight one. The fake factor is measured in dedicated fake-enriched regions, reported in Table 3, where events must pass prescaled support single-lepton triggers without isolation criteria, contain no b-jets and have exactly one reconstructed lepton that satisfies either the tight selection criteria or a relaxed loose selection. No additional requirements are used in the electron channel. Exactly one jet with a \(p_{\text {T}} > {35}{\hbox { GeV}}\) and one reconstructed muon have to be back-to-back in the muon fake-enriched region. This is achieved by requiring \(\Delta \phi (\mu ,\textrm{jet}) > 2.7\). In addition, \(W\,\rightarrow \mu \nu \) events are rejected by requiring \(E_{\text {T}}^{\text {miss}} < {40}{\hbox { GeV}}\). Most of the selected events are dijet events, while the majority of prompt leptons originate from the \(W\,\text {+ jets}\) process.

Table 3 Selection criteria defining the FNP lepton regions used to measure the ratio of the numbers of tight and loose leptons, the so-called fake factor, for the electron and muon channels. The fake factors are parameterised with a variable binning in \(p_{\text {T}}\), which ensures that the statistical uncertainty stays below a fixed threshold

The fake-factor method relies on the assumption that no prompt leptons appear in the fake-enriched samples, which is not fully correct given the imposed selections. Therefore, the number of residual prompt leptons in the fake-enriched regions is estimated using the Monte Carlo samples described in Sect. 3 and subtracted from the numbers of tight and loose leptons used to measure the fake factors.

Dedicated two-, three-, and four-lepton validation regions, defined in Table 2, are used to validate both the data-driven FNP lepton estimation and the modelling of the dominant diboson background in regions as similar to the signal regions as possible. Figures 2 and 3 present the \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) distributions (\(\ell , \ell ^\prime =e,~\mu \)) in the validation regions, which are sensitive to different background sources. A simultaneous background-only fit to data in all CRs was performed with diboson and Drell–Yan background normalisations considered as free parameters and with nuisance parameters corresponding to systematics listed in Sect. 6 included. Good background modelling is observed in all these regions.

Fig. 2
figure 2

Distributions of two-lepton mass \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) for data and SM background predictions in validation regions: a the electron–electron, b the electron–muon, and c the muon–muon two-lepton validation regions. Backgrounds from top-quark and multiboson processes are merged, forming the ‘Other’ category. The hatched bands include all systematic uncertainties after a background-only fit to data (post-fit), with the correlations between various sources taken into account. The last bin also includes any overflow entries. The error bars show statistical uncertainties. The lower panels show the ratio of the observed data to the estimated SM background. The red arrows indicate points that are outside the vertical range of the figure

Fig. 3
figure 3

a Distribution of two-lepton mass \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\), where \(\ell , \ell ^\prime =e,~\mu \) for data and SM background predictions in three-lepton validation regions and b event yield in the four-lepton validation region. Backgrounds from top-quark and multiboson processes are merged, forming the ‘Other’ category. The hatched bands include all systematic uncertainties after a background-only fit to data (post-fit), with the correlations between various sources taken into account. The last bin also includes any overflow entries. The error bars show statistical uncertainties. The lower panels show the ratio of the observed data to the estimated SM background

6 Systematic uncertainties

Several sources of systematic uncertainty are accounted for in the analysis. These correspond to experimental and theoretical sources affecting both the background and signal predictions. All considered sources of systematic uncertainty affect the total event yield, and all, except the luminosity and cross-section uncertainties, also affect the distributions of the variables used in the fit (Sect. 7).

Fig. 4
figure 4

Relative contributions of different sources of statistical and systematic uncertainty in the total background yield estimation after the background-only fit. Systematic uncertainties are calculated in an uncorrelated way by shifting in turn only one nuisance parameter from the post-fit value by one standard deviation, keeping all the other parameters at their post-fit values, and comparing the resulting event yield with the nominal yield. Validation regions do not constrain the normalisation factors or nuisance parameters. Individual uncertainties can be treated as correlated across the regions. Some backgrounds are constrained by the CR data and have strong anti-correlations as a result. The total background uncertainty is indicated by ‘Total uncertainty’

Fig. 5
figure 5

The numbers of observed and expected events in the control, validation, and signal regions for all channels, split by lepton flavour and electric charge combination. The symbol \(\ell \) only stands for light leptons (\(\ell , \ell ^\prime =e,~\mu \)). The background expectation is the result of the background-only fit described in the text. The hatched bands include all post-fit systematic uncertainties with the correlations between various sources taken into account. The error bars show statistical uncertainties. FNP refers to the fake/non-prompt lepton background. Backgrounds from top-quark and multiboson processes are merged, forming the ‘Other’ category. The lower panel shows the ratio of the observed data to the estimated SM background

Table 4 The number of predicted background events in control regions after the background-only fit, compared with the number of events observed in data. Uncertainties correspond to the uncertainties in the predicted event yields and their total is smaller than the sum of the components in quadrature due to correlations between these components. Due to rounding, the totals can differ from the sums of components. FNP refers to the fake/non-prompt lepton background. Backgrounds from top-quark and multiboson processes are merged, forming the ‘Other’ category. Background processes with a negligible yield are marked with a dash (–)
Table 5 The number of predicted background events in validation regions after the background-only fit, compared with the number of events observed in data. Uncertainties correspond to the uncertainties in the predicted event yields and their total is smaller than the sum of the components in quadrature due to correlations between these components. Due to rounding, the totals can differ from the sums of components. FNP refers to the fake/non-prompt lepton background. Backgrounds from top-quark and multiboson processes are merged, forming the ’Other’ category. Background processes with a negligible yield are marked with a dash (–)
Table 6 The number of predicted background events in signal regions after the background-only fit, compared with the number of events observed in data. Uncertainties correspond to the uncertainties in the predicted event yields and their total is smaller than the sum of the components in quadrature due to correlations between these components. Due to rounding, the totals can differ from the sums of components. FNP refers to the fake/non-prompt lepton background. Backgrounds from top-quark and multiboson processes are merged, forming the ’Other’ category. Background processes with a negligible yield are marked with a dash (-)
Fig. 6
figure 6

Distributions of \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) in signal regions, namely a the electron–electron two-lepton signal region (SR2L), b the electron–muon two-lepton signal region (SR2L) and c the muon–muon two-lepton signal region (SR2L). The background expectation is the result of the background-only fit described in the text. The hatched bands include all systematic uncertainties post-fit with the correlations between various sources taken into account. The solid coloured lines correspond to signal samples, normalised using the theory cross-section, with the \(H^{\pm \pm }\) mass marked in the legend. The \(\times 50\) in the legend indicates the scaling of the signal yield to make it clearly visible in the plots. The error bars show statistical uncertainties. Backgrounds from top-quark and multiboson processes are merged, forming the ‘Other’ category. The last bin also includes any overflow entries. The lower panels show the ratio of the observed data to the estimated SM background. The binning presented in the figures is used in the fit

Fig. 7
figure 7

a Distribution of \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\), where \(\ell , \ell ^\prime =e,~\mu \) in three-lepton signal region (SR3L) and b event yield in the four-lepton signal region (SR4L), where no events are observed. The background expectation is the result of the background-only fit described in the text. The hatched bands include all systematic uncertainties post-fit with the correlations between various sources taken into account. The solid coloured lines correspond to signal samples, normalised using the theory cross-section, with the \(H^{\pm \pm }\) mass marked in the legend. The error bars show statistical uncertainties. Backgrounds from top-quark and multiboson processes are merged, forming the ‘Other’ category. The lower panels show the ratio of the observed data to the estimated SM background. The binning presented in the figures is used in the fit. a The red arrow indicates a point that is outside the vertical range of the figure. The last bin also includes any overflow entries

The cross-sections used to normalise the simulated samples are varied to account for the scale and PDF uncertainties in the cross-section calculation. The variation is 6% for diboson production [82], 13% for \(t{}\bar{t}W\,\) production, 12% for \(t{}\bar{t}Z\,\) production, and 8% for \(t{}\bar{t}H\) production [83]. Since the yield of the diboson background is derived from the likelihood fit to the data, these systematic variations contribute by changing the shapes and the relative normalization of the background predictions used in the likelihood fit in the CRs and SRs. The theoretical uncertainty in the Drell–Yan background is estimated by PDF eigenvector variations of the nominal PDF set, and variations of the PDF scale, \(\alpha _{\text {s}}\), electroweak corrections, and photon-induced corrections. For Sherpa -simulated processes (Drell–Yan, diboson and multiboson processes as listed in Table 1), uncertainties from missing higher orders were evaluated [84] using seven variations of the QCD factorisation and renormalisation scales in the matrix elements by factors of 0.5 and 2, avoiding variations in opposite directions. The effect of the PDF choice is considered using the PDF4LHC prescription [85]. Uncertainties in the nominal PDF set were evaluated using 100 replica variations. Additionally, the results were cross-checked using the central values of the CT 14nnlo [86] and MMHT2014 nnlo [87] PDF sets. The effect of the uncertainty in the strong coupling constant, \(\alpha _{\text {s}}\), was assessed by variations of \(\pm 0.001\). For \(t{}\bar{t}\)   and single-top-quark samples, the uncertainties in the cross-section due to the PDF and \(\alpha _{\text {s}}\) were calculated using the PDF4LHC 15 prescription [85] with the MSTW2008 nnlo [88, 89], CT 10nnlo [90, 91] and NNPDF 2.3lo [42] PDF sets in the five-flavour scheme, and were added in quadrature to the effect of the factorisation scale uncertainty. For \(t\bar{t}V\) production and processes producing three or more top quarks, the uncertainty due to initial-state radiation (ISR) was estimated by comparing the nominal event sample with two samples where the up/down variations of the A14 tune were employed. The theoretical uncertainty of the \(\text {NLO}\) cross-section for \(\textit{pp }\rightarrow H^{++}H^{--}\) is reported to range from a few per cent at low \(H^{\pm \pm }\) masses to approximately 25% [18, 19] for the highest signal mass points studied in the analysis. It is not included in a fit as a nuisance parameter but is drawn as an uncertainty band around the theoretical curves in the exclusion limit plots in Figs. 8 and 9. The uncertainty on signal cross-section includes the renormalisation and factorisation scale dependence and the uncertainty in the parton densities. The theoretical uncertainty in the \(pp \rightarrow H^{++}H^{--}\) simulation is assessed by varying the A14 parameter set in Pythia 8.186 and choosing CTEQ6L1 and CT09MC1 [92] as alternative PDFs. The impact on signal acceptance is found to be negligible. These theoretical uncertainties are considered fully correlated across the various analysis regions.

Fig. 8
figure 8

Observed (solid line) and expected (dashed line) 95% \(\text {CL}\) upper limits on the \(H^{\pm \pm }\) pair production cross-section as a function of \(m(H^{\pm \pm })\) resulting from the combination of all analysis channels, assuming \(\sum _{\ell \ell ^\prime } \mathcal {B} (H^{\pm \pm } \rightarrow \ell ^{\pm } \ell ^{\prime \pm })={100}{\%}\), where \(\ell , \ell ^\prime = e, \mu , \tau \). The surrounding green and yellow bands correspond to the \(\pm 1\) and \(\pm 2\) standard deviation (\(\pm 1 \sigma \) and \(\pm 2 \sigma \)) uncertainty around the combined expected limit, respectively, as estimated using the frequentist approach, where toy experiments based on both the background-only and signal+background hypotheses are generated for this purpose. The theoretical signal cross-section predictions, given by the \(\text {NLO}\) calculation [18, 19], are shown as blue, orange and red lines for the left-handed \(H^{\pm \pm } _L\), right-handed \(H^{\pm \pm } _R\) (which is the same as the Zee–Babu \(k^{\pm \pm }\)), and a sum of the two LRSM chiralities, respectively, with the corresponding uncertainty bands

Fig. 9
figure 9

Observed 95% \(\text {CL}\) upper limits on the \(H^{\pm \pm }\) pair production cross-section as a function of \(m(H^{\pm \pm })\) assuming \(\sum _{\ell \ell ^\prime } \mathcal {B} (H^{\pm \pm } \rightarrow \ell ^{\pm } \ell ^{\prime \pm })={100}{\%}\), where \(\ell , \ell ^\prime =e, \mu , \tau \). The dashed blue, green, and purple lines indicate the observed limit using the two-, three-, and four-lepton exclusive final states, respectively. The limit obtained from the four-lepton final state is the strongest and drives the combined result. The black lines show the combined observed limit obtained using the frequentist approach for a fit with only statistical uncertainties (dotted) and a fit with statistical and systematic uncertainties (solid). The grey line shows the limit using the asymptotic approximation [95], and the cyan dashed line shows the combined observed limit obtained analysing the first \({36.1}\,{\hbox {fb}}^{-1}\) of Run 2 [21]. The theoretical signal cross-section predictions, given by the \(\text {NLO}\) calculation [18, 19], are shown as blue, orange and red lines for the left-handed \(H^{\pm \pm } _L\), right-handed \(H^{\pm \pm } _R\) (which is the same as the Zee–Babu \(k^{\pm \pm }\)), and a sum of the two LRSM chiralities, respectively, with the corresponding uncertainty bands

A significant contribution to the total uncertainty arises from the statistical uncertainty of the MC samples in all analysis regions. Analysis regions have very restrictive selections, and only a small fraction of the initially generated MC events remains after applying all requirements. The statistical uncertainty varies from 5 to 40% depending on the signal region.

Experimental systematic uncertainties due to differences between data and Monte Carlo for lepton reconstruction, identification, isolation, and trigger efficiencies are estimated by varying the corresponding scale factors. The impact of scale-factor systematic variations on the background yield estimation is at most 3% and is less significant than the other systematic uncertainties and MC statistical uncertainties. The same is true for the lepton energy or momentum calibration.

Experimental systematic uncertainties due to different reconstruction and b-tagging efficiencies for reconstructed jets in data and simulation are taken into account. Uncertainties in the absolute jet energy scale and resolution, measured in multi-jet, \(Z\,\text { + jets}\), and \(\gamma \text {+jet}\) events, are considered and are estimated to be less than 2% across the range of \(p_{\text {T}} (\textrm{jet})\) of interest [75, 80]. b-tagging efficiencies are measured in two-lepton \(t{}\bar{t}\) events and are estimated to range from 8% at low momentum to 1% at high momentum [77]. Furthermore, the \(E_{\text {T}}^{\text {miss}}\) measurement uncertainties are estimated by comparing data to simulation in \(Z\,\rightarrow \mu {}\mu \) events without jets, as described in Ref. [80]. The uncertainty in the pile-up simulation, derived from a comparison of data with simulation, is also taken into account [76].

The experimental uncertainty related to electron charge misidentification arises from the statistical uncertainty of both the data and the simulated sample of \(Z\,/\gamma ^* \rightarrow ee\) events used to measure this probability. The charge misidentification probability increases from \(\sim 10^{-4}\) to \(\sim 0.1\) with increasing electron \(p_{\text {T}}\) and \(|\eta |\). The systematic effects obtained by altering the selection requirements imposed on the invariant mass used to select \(Z\,/\gamma ^* \rightarrow ee\) events are found to be negligible compared to the statistical uncertainty.

The experimental systematic uncertainty in the data-driven estimate of the FNP lepton background is evaluated by varying the nominal fake factor to account for different effects. The \(E^{\textrm{miss}}_{\textrm{T}}\) requirement is altered to consider variations in the \(W\,\text {+ jets}\) composition. The flavour composition of the fakes is investigated by requiring an additional recoiling jet in the electron channel and changing the definition of the recoiling jet in the muon channel. Furthermore, the transverse impact parameter criterion for tight muons (defined in Sect. 4.1) is varied by one standard deviation. Finally, in the fake-enriched regions, the normalisation of the subtracted prompt-lepton contribution is altered within its uncertainties. This variation accounts for uncertainties related to the luminosity, the cross-section, and the corrections applied to simulation-based predictions. The FNP systematic uncertainties are correlated across all the different regions and bins, while the statistical uncertainties in each bin are uncorrelated. The statistical uncertainty of the fake factors is added in quadrature to the total systematic uncertainty. The total uncertainty of the FNP background yield varies between 10% and 20%.

The total relative background systematic uncertainty after the background-only fit (Sect. 7), and its breakdown into components, is presented in Fig. 4.

7 Statistical analysis and results

The HistFitter [93] statistical analysis package is used to implement a maximum-likelihood fit. The fit considers the leading lepton pair’s invariant mass distribution \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\), where \(\ell , \ell ^\prime =e,~\mu \), in all two- and three-lepton control and signal regions. In the four-lepton CRs and SRs, the single-bin event yield is used to obtain the numbers of signal and background events. The binning is chosen to optimise the expected sensitivity to the signal model, while also keeping low statistical uncertainties in each bin. The likelihood is the product of a Poisson probability density function describing the observed number of events and Gaussian distributions to constrain the nuisance parameters associated with the systematic uncertainties. The widths of the Gaussian distributions correspond to the magnitudes of these uncertainties, whereas Poisson distributions are used for MC simulation statistical uncertainties. Furthermore, additional free parameters are introduced for the Drell–Yan and diboson background contributions to fit the yields in the corresponding control regions. These free parameters are then used to normalise relevant backgrounds in the signal regions. Fitting the yields of the largest backgrounds reduces the systematic uncertainty in the predicted yield from SM sources. The fitted normalisations are compatible with their SM predictions, within the uncertainties. The diboson yield is described by three free parameters, one each for the two-, three- and four-lepton multiplicity channels. For the scenario in which the \(H^{\pm \pm }\) particle decays into the different possible lepton-flavour combinations with equal probability, a 95% \(\text {CL}\) upper limit was set on the \(\textit{pp }\rightarrow H^{++}H^{--}\) cross-section using the \(\text {CL}_{\text {s}}\) method [94].

7.1 Fit results

The observed and expected yields in all control, validation, and signal regions used in the analysis are presented in Fig. 5 and summarised in Tables 4, 5 and 6. Here ‘pre-fit’ denotes the nominal simulated MC yields and ‘post-fit’ denotes the simulated yields scaled with the normalisation parameters obtained from the likelihood fit to the two-, three-, and four-lepton control and signal regions. In general, good agreement, within statistical and systematic uncertainties, between data and SM predictions is observed in the various regions, demonstrating the validity of the background estimation procedure. No significant excess is observed in any of the signal regions. Correlations between various sources of uncertainty are evaluated and used to estimate the total uncertainty in the SM background prediction. Some of the uncertainties, particularly those connected with the normalisation of the background contributions and the FNP background, are anti-correlated. The \(m(\ell ^{\pm },\ell ^{\prime \pm })_{\textrm{lead}}\) distributions (\(\ell , \ell ^\prime =e,~\mu \)) of the two-lepton signal regions are presented in Fig. 6, and those of the three- and four-lepton signal regions are presented in Fig. 7. In the four-lepton signal region, no data event is observed, which is within the expected yield.

The signal regions were designed to fully exploit the pair production of the \(H^{\pm \pm }\) boson and its boosted topology by applying selections that target same-charge high-\(p_{\text {T}}\) leptons. After a background-only likelihood fit, the Drell–Yan normalisation factor is found to be \(1.13 \pm 0.04\) and the two-, three- and four-lepton channel diboson normalisation factors are \(1.10 \pm 0.06\), \(0.92 \pm 0.05\), and \(1.08 \pm 0.11\), respectively. Figure 8 shows the upper limit on the cross-section as a function of the \(H^{\pm \pm }\) boson mass, where decays into each leptonic final state are equally probable. Since the yields in some regions are very small, the asymptotic approximation [95] cannot be used reliably, so \(10^5\) pseudo-experiments were run instead to obtain the final limits.

The observed lower limits on the \(H^{\pm \pm }\) mass within LRSMs (the Zee–Babu model) vary from 520 to 1050 GeV (410 to 880 GeV), depending on the lepton multiplicity channel, with \(\sum _{\ell \ell ^\prime } \mathcal {B} (H^{\pm \pm }\rightarrow \ell ^{\pm }\ell ^{\prime \pm }) = {100}{\%}\). The observed lower limit on the mass reaches 1080 GeV and 900 GeV when combining all three channels for LRSMs and the Zee–Babu model, respectively. The expected exclusion limit is \({1065^{+30}_{-50}}{\hbox { GeV}}\) for LRSMs and \({880^{+30}_{-40}}{\hbox { GeV}}\) for the Zee–Babu model, where the uncertainties of the limit are extracted from the \(\pm 1\sigma \) band. The limit obtained from the four-lepton final state is the strongest and drives the combined result. A comparison between the various limits obtained from this measurement is presented in Fig. 9.

8 Conclusion

The ATLAS detector at the Large Hadron Collider was used to search for doubly charged Higgs bosons in the same-charge two-lepton invariant mass spectrum, using \(e^{\pm }e^{\pm }\), \(e^{\pm }\mu ^{\pm }\) and \(\mu ^{\pm }\mu ^{\pm }\) final states as well as final states with three or four leptons (electrons and/or muons). The search was performed with \({139}\,{\hbox {fb}}^{-1}\) of data from proton–proton collisions at \(\sqrt{s}={13}{\hbox { TeV}}\), recorded during the Run 2 data-taking period lasting from 2015 to 2018. No significant excess above the Standard Model prediction was found. Lower limits are set on the mass of doubly charged Higgs bosons in the context of the left-right symmetric type-II seesaw and Zee–Babu models. These vary between 520 GeV and 1050 GeV for LRSMs and between 410 GeV and 880 GeV for the Zee–Babu model, depending on the lepton multiplicity channel, assuming that \(\sum _{\ell \ell ^\prime } \mathcal {B} (H^{\pm \pm } \rightarrow \ell ^{\pm }\ell ^{\prime \pm })={100}{\%}\) and that decays to each of the \(ee,~e\mu ,~\mu \mu ,~e\tau ,~\mu \tau ,~\tau \tau \) final states are equally probable. The observed combined lower limit on the \(H^{\pm \pm }\) mass is 1080 GeV within LRSMs and 900 GeV within the Zee–Babu model. These limits are consistent with the expected limits of \({1065^{+30}_{-50}}{\hbox { GeV}}\) and \({880^{+30}_{-40}}{\hbox { GeV}}\) for LRSM and Zee–Babu doubly charged Higgs bosons, respectively. The lower limits on the LRSM \(H^{\pm \pm }\) masses are 300 GeV higher than those from the previous ATLAS result. Moreover, this search provides the first direct test of the Zee–Babu model at the LHC.