Search for a new heavy scalar particle decaying into a Higgs boson and a new scalar singlet in final states with one or two light leptons and a pair of τ-leptons with the ATLAS detector

A search for a new heavy scalar particle X decaying into a Standard Model (SM) Higgs boson and a new singlet scalar particle S is presented. The search uses a proton-proton (pp) collision data sample with an integrated luminosity of 140 fb−1 recorded at a centre-of-mass energy of √s = 13 TeV with the ATLAS detector at the Large Hadron Collider. The search explores X masses ranging from 500 to 1500 GeV, with the corresponding S mass in the range 200–500 GeV, where the sensitivity is greatest. The search selects events with two hadronically decaying τ-lepton candidates from H → τ+τ− decays and one or two light leptons (ℓ = e, μ) from S → VV (V = W, Z) decays, while the remaining V boson decays hadronically or into neutrinos. A multivariate discriminant based on event kinematics is used to separate the signal from the background. No excess is observed beyond the expected SM background, and 95% confidence level upper limits between 72 fb and 542 fb are derived on the cross-section σ(pp → X → SH), assuming SM-Higgs-boson-like branching ratios for the S → VV decay. Upper limits on the visible cross-sections σ(pp → X → SH → WWττ) and σ(pp → X → SH → ZZττ) are also set, in the ranges 3–26 fb and 6–33 fb, respectively.


Introduction
The discovery of the Higgs boson (H) in 2012 [1,2] by the ATLAS [3] and CMS [4] collaborations at the Large Hadron Collider (LHC) has led to a comprehensive programme of measurements and searches using proton-proton (pp) collision data. All measurements to date are consistent with the predictions of the Standard Model (SM) [5–8].
Many beyond-the-SM (BSM) theories predict the existence of a heavy scalar boson decaying into two Higgs bosons or into additional scalars in an extended Higgs sector. Searches [9–12] for such scalars have been published by the ATLAS and CMS collaborations. The BSM scenarios tested include the two-Higgs-doublet model (2HDM) [13], the Minimal Supersymmetric Standard Model (MSSM) [14], and extensions of the 2HDM with a new singlet scalar (2HDM+S) [15,16]. In the simplest extension of the MSSM, the Next-to-Minimal Supersymmetric Standard Model (NMSSM), an additional gauge singlet is introduced to generate the μ-term coupling of the superpotential dynamically, leading to an enriched Higgs sector with two additional neutral Higgs bosons [17,18]. These BSM models hypothesize the existence of a new scalar singlet, S, produced in the process X → SH, where X is a heavy CP-even scalar boson produced predominantly through gluon-gluon fusion (ggF) [19], as shown in Figure 1. The singlet scalar S is assumed to have the same relative couplings as a SM Higgs boson of the specified mass. This study focuses on the decay X → SH, with the X mass ranging from 500 to 1500 GeV and the S mass from 200 to 500 GeV; the S is assumed to decay predominantly into W±W∓ and ZZ with SM-Higgs-boson-like branching ratios [20]. The search is performed by selecting events containing two hadronically decaying τ-lepton candidates (τhad) from H → τ+τ− decays and one or two light leptons (ℓ = e, μ) from S → WW, ZZ decays. This signature-based search is the first of its kind and is competitive with other searches using different final states [7,8] in the high-mass regions. A direct comparison between them is not possible because of the unknown branching fractions of the S decays, but each provides independent constraints on the BSM models. To improve background rejection, a multivariate technique based on boosted decision trees (BDTs) is used to discriminate between signal and background based on their different kinematic distributions. Upper limits are set at the 95% confidence level (CL) on the model-dependent cross section σ(pp → X → SH) and on the model-independent visible cross sections σ(pp → X → SH → WWττ) and σ(pp → X → SH → ZZττ).

ATLAS detector
The ATLAS detector [3] at the LHC covers almost the entire solid angle around the collision point.¹ It consists of an inner tracking detector surrounded by a thin superconducting solenoid that produces a 2 T axial magnetic field, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large toroid magnet assemblies with eight coils each. The inner detector contains a high-granularity silicon pixel detector, including the insertable B-layer [21–23] added as a new innermost layer in 2014, and a silicon microstrip tracker. Together they enable the precise reconstruction of charged-particle tracks in the pseudorapidity range |η| < 2.5. The inner detector also includes a transition radiation tracker that provides tracking and electron identification for |η| < 2.0. The calorimeter system covers the pseudorapidity range |η| < 4.9. Within the region |η| < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) sampling calorimeters, with an additional thin LAr presampler covering |η| < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by a steel/scintillator-tile calorimeter, segmented into three barrel structures in the region |η| < 1.7, and two copper/LAr hadronic endcap calorimeters. The solid-angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic measurements, respectively. The calorimeters are surrounded by a muon spectrometer in a magnetic field provided by air-core toroid magnets with a bending integral of about 2.5 T m in the barrel and up to 6.0 T m in the endcaps. The muon spectrometer measures the trajectories of muons in the region |η| < 2.7 using multiple layers of high-precision tracking chambers, and it is instrumented with separate trigger chambers that cover |η| < 2.4.
A two-level trigger system [24], consisting of a hardware-based level-1 trigger followed by a software-based high-level trigger, is used to reduce the event rate to a maximum of around 1 kHz for offline storage. An extensive software suite [25] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data and Monte Carlo simulation samples
The search is based on a data sample of pp collisions at √s = 13 TeV with 25 ns bunch spacing collected from 2015 to 2018, corresponding to an integrated luminosity of 140 fb−1. Only events recorded with a single-electron trigger, a single-muon trigger, or a dilepton trigger [26–29] under stable beam conditions, and for which all detector subsystems were operational, are considered for analysis. The average number of pp interactions per bunch crossing in this dataset ranges from 8 to 45, with a mean of 24.
Events produced in the generic 2HDM+S model are considered as signals; the kinematic parameters are model-independent, and the results can be interpreted in other BSM models. Events are generated at leading-order (LO) accuracy with Pythia 8.2 [30] for the matrix element calculation, using the NNPDF2.3LO set of parton distribution functions (PDF) [31]. Parton showering (PS) and hadronisation are also simulated with the Pythia 8.2 generator, using the A14 tune [32] and the same NNPDF2.3LO PDF set. The detector response is simulated using AtlFast-II [33], which uses a fast, parameterised simulation of the calorimeter response. The two scalars, X and S, are assumed to have widths that are narrow relative to the experimental resolution. The heavier boson X is constrained to decay only into S and H, and S decays into a pair of W or Z bosons. The S decays into SM particles are assumed to be the same as those of a SM Higgs boson of the specified mass. To suppress QCD multijet backgrounds, only semileptonic and fully leptonic decays of W-boson pairs are considered in the analysis. For ZZ decays, the two-lepton final state is considered, with one Z boson decaying into two light leptons and the other into a pair of neutrinos or jets. The SM Higgs boson from the X decay is required to decay into a pair of τ-leptons. Monte Carlo (MC) simulation samples were produced for the different background processes. The dominant background is from falsely identified τ-leptons originating from jets. The irreducible background with real τ-leptons is dominated by diboson processes that produce ττ pairs. The effects of additional pp collisions in the same or a nearby bunch crossing (pile-up) are modelled using events from minimum-bias interactions generated with Pythia 8.1 [36] with the NNPDF2.3LO set of PDFs and the A3 set of tuned parameters [37], overlaid onto the simulated hard-scatter events according to the luminosity profile of the recorded data. The generated events were processed through a detailed simulation [38] of the ATLAS detector geometry and response using Geant4 [39] and with the same reconstruction software as the data. Corrections were applied to the simulated events so that the selection efficiencies, energy scales, and energy resolutions of particle candidates match those determined from data control samples. The simulated samples are normalised to their SM cross sections, computed to the highest available order in perturbation theory.

¹ ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector. The z-axis coincides with the axis of the beam pipe, the x-axis points from the IP to the centre of the LHC ring, and the y-axis points upward. Cylindrical coordinates (r, φ) are used in the transverse plane, φ being the azimuthal angle around the beam pipe. The pseudorapidity is defined in terms of the polar angle θ as η = −ln tan(θ/2). Angular separation is measured in units of ΔR ≡ √((Δη)² + (Δφ)²).
The tt̄W and tt̄Z/γ* samples are generated with the MadGraph5_aMC@NLO v2.6.2 [40] generator, which provides matrix elements at NLO in the strong coupling constant αs with the NNPDF3.0NLO PDF set [31]. The functional form of the renormalisation and factorisation scales is set to a fixed fraction of HT, where HT is defined as the scalar sum of the transverse masses √(pT² + m²) of the particles generated in the matrix element calculation. The events are interfaced with Pythia 8.2 [30] for the parton shower (PS) and hadronisation, using the A14 set of tuned parameters and the NNPDF2.3LO PDF set.
The tt̄H process is obtained from the Powheg-Box v2.0 generator at NLO and interfaced with Pythia 8.2 for parton showering and fragmentation with the A14 tune. This sample uses the NNPDF3.0NLO PDF set.
Higgs boson production in association with a vector boson (WH or ZH, collectively referred to as VH) was simulated using Powheg-Box v2 with the NNPDF3.0NLO PDF set and interfaced with Pythia 8.2 for PS and non-perturbative effects. Separate W+H, W−H, qq̄ → ZH and gg → ZH samples are produced. The cross section is normalised at next-to-next-to-leading order (NNLO) in QCD with NLO electroweak corrections for qq̄ → VH, and at NLO with next-to-leading-logarithm accuracy in QCD for gg → ZH [41,42].
The tt̄ events are generated with Powheg-Box v2.0 and interfaced with Pythia 8.2 for parton showering and fragmentation with the A14 tune. The NNPDF3.0NLO PDF set is used. The single top-quark events are simulated with Powheg-Box v2.0 with the NNPDF3.0NLO PDF set and interfaced with Pythia 8.2; the interference between tW and tt̄ production is handled with the diagram removal (DR) procedure [43].
A dedicated tt̄ sample including rare t → Wbγ*(→ ℓ+ℓ−) radiative decays, tt̄ → W+W−bb̄ℓ+ℓ−, is generated using a MadGraph5_aMC@NLO matrix element requiring m(ℓ+ℓ−) > 1 GeV and interfaced with Pythia 8.2. In this sample, the photon can be radiated from the top quark, the W boson, or the b-quark. The tt̄Z/γ* and tt̄ → W+W−bb̄ℓ+ℓ− samples are combined and together form the "tt̄Z/γ* (high mass)" sample. The contribution from internal photon conversions (γ* → ℓ+ℓ−) with m(ℓ+ℓ−) < 1 GeV is modelled with Sherpa 2.2.11 [44] through QED multiphoton radiation via the PS in an inclusive tt̄ sample, and is referred to as the "tt̄γ* (low mass)" sample.
Diboson backgrounds are generated and normalised using the cross sections computed by Sherpa 2.2.2 [44] with the NNPDF3.0NNLO PDF set and showered using Sherpa's default settings. The cross section is computed at NNLO accuracy in QCD and includes electroweak (EW) corrections at NLO accuracy. The production of W and Z bosons in association with jets (V+jets) is simulated with Sherpa 2.2.1 [44] using the NNPDF3.0NNLO PDF set and showered by the Sherpa built-in implementation, using matrix elements for up to two additional jets at NLO and up to four additional jets at LO. The cross section used to normalise the simulation is calculated at NNLO accuracy in QCD and includes EW corrections at NLO accuracy. Low-mass Drell–Yan Z/γ* → ℓℓ production is also included.
The background contributions are divided into five categories: the data-driven fake τhad background, tt̄W, tt̄Z, diboson, and "others", which includes tt̄H, V+jets, low-mass Drell–Yan, VH, and other small backgrounds.

Object reconstruction
Events are required to have at least one primary vertex with a minimum of two associated tracks, each with a transverse momentum (pT) larger than 500 MeV. The primary vertex of an event is defined as the vertex with the highest scalar sum of squared transverse momenta of the associated tracks [45].
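The vertex choice described above can be illustrated with a minimal sketch. The function name and the simple list-of-track-pT event model are illustrative, not part of the ATLAS software:

```python
# Toy sketch of the primary-vertex choice described above: among all
# reconstructed vertices, pick the one with the highest sum of squared
# track transverse momenta, requiring at least two tracks with
# pT > 500 MeV.  All names and the event model are illustrative.

def select_primary_vertex(vertices, min_track_pt=0.5):
    """vertices: list of lists of track pT values in GeV.
    Returns the index of the chosen primary vertex, or None if no
    vertex has at least two tracks above threshold."""
    best_index, best_sum_pt2 = None, -1.0
    for i, tracks in enumerate(vertices):
        selected = [pt for pt in tracks if pt > min_track_pt]
        if len(selected) < 2:  # require >= 2 associated tracks
            continue
        sum_pt2 = sum(pt * pt for pt in selected)
        if sum_pt2 > best_sum_pt2:
            best_index, best_sum_pt2 = i, sum_pt2
    return best_index
```

In a hard-scatter event the vertex of interest typically carries high-pT tracks, so the Σ pT² criterion strongly favours it over pile-up vertices.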
Electrons are reconstructed by matching tracks reconstructed in the inner detector (ID) to topological clusters of energy deposits in the electromagnetic calorimeter. They are required to have pT > 10 GeV and |η| < 2.47, and to be outside the transition region between the barrel and endcap calorimeters, 1.37 < |η| < 1.52. Electron candidates must satisfy a Loose likelihood-based identification requirement [46]. Additional requirements on the associated track pT and on the ratio of the electron's calorimeter energy to the track momentum are applied to suppress photon conversions. Electrons with incorrect charge assignments (arising from asymmetric photon conversions) are rejected using a BDT discriminant based on calorimeter and tracking quantities. An efficiency of 95% for electrons with correct charge assignment is obtained, with a rejection factor of ∼17 for electrons with incorrect charge assignment [46,47].
Muon candidates are reconstructed from tracks in the muon spectrometer (MS) matched to tracks found in the ID. They are required to have pT > 10 GeV and |η| < 2.5 and to satisfy Loose identification requirements [48] based on the numbers of hits in the different ID and MS subsystems and on the significance of the charge-to-momentum ratio q/p. Electrons (muons) are required to have associated tracks satisfying |d0|/σ(d0) < 5 (3) and |z0 sin θ| < 0.5 mm, where d0 is the transverse impact parameter relative to the beam line and z0 is the longitudinal impact parameter relative to the primary vertex.
Jets are reconstructed using the anti-kt jet clustering algorithm [49,50] with a radius parameter of R = 0.4, applied to topological energy clusters [51,52] and charged-particle tracks processed using a particle-flow algorithm [53]. The reconstructed jets are corrected by applying jet energy scale (JES) and resolution (JER) calibrations derived from simulation and in situ corrections based on √s = 13 TeV data [54,55]. Only jet candidates with pT > 25 GeV and |η| < 2.5 are considered. Jets with pT < 60 GeV and |η| < 2.4 originating from pile-up are suppressed with the jet vertex tagger (JVT) [56]. Quality criteria are imposed to reject events containing jets from non-collision backgrounds and calorimeter noise [57].
Jets containing b-hadrons (b-jets) are identified with the DL1r tagger [58,59], a multivariate discriminant based on inputs such as the impact parameters of displaced tracks and the reconstructed secondary vertices in the jet. Working points are defined with different target efficiencies for tagging b-jets in an inclusive tt̄ sample. This analysis uses the 77% b-tagging working point to veto b-jets.
The reconstruction of hadronically decaying τ-lepton candidates (τhad) is seeded by jets reconstructed with the anti-kt algorithm using calibrated topological clusters with a radius parameter of R = 0.4 [51,60]. They are required to have pT > 20 GeV and |η| < 2.5, excluding the transition region between the barrel and endcap calorimeters (1.37 < |η| < 1.52). The τhad candidates are required to have one or three associated tracks and a total charge of ±1. An identification algorithm based on a recurrent neural network (RNN) [61] discriminates the visible decay products of τhad candidates from jets initiated by quarks or gluons. The τhad candidates are required to satisfy the RNN Medium identification working point in the signal regions, corresponding to an efficiency of 75% (60%) for candidates with one (three) associated track(s). The efficiency is optimised to be flat as a function of the τhad pT and pile-up. A dedicated BDT discriminant is used to reject electrons misidentified as hadronic τ decays. To suppress fake τhad candidates from muons, candidates that overlap with low-pT reconstructed muons are vetoed. Efficiency scale factors for the reconstruction, identification, and electron-BDT rejection are applied to τhad candidates in simulation [62].
Anti-τhad candidates are defined to estimate the background from jets misidentified as τhad, as described in Section 7. They are required to satisfy the RNN Very Loose identification working point but fail the nominal RNN requirement applied to the τhad candidates.
An overlap removal procedure is applied when two reconstructed objects are close in ΔR, to avoid double-counting of objects.
Isolation criteria are applied to electrons and muons after the overlap removal procedure to suppress contributions from semileptonic decays of heavy-flavour hadrons and from jets misidentified as leptons, collectively referred to as non-prompt leptons. A BDT discriminant, referred to as the non-prompt-lepton BDT [63], is trained on isolation and b-tagging variables. The light leptons must satisfy the Loose BDT working point to reject non-prompt leptons.
The missing transverse momentum p⃗Tmiss (with magnitude ETmiss) is defined as the negative vector sum of the p⃗T of all selected and calibrated objects in the event, including a term to account for soft particles that are not associated with any of the selected objects. This soft term is calculated from ID tracks matched to the selected primary vertex, making it more resilient to contamination from pile-up interactions [64].
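The negative-vector-sum definition above can be sketched numerically. This is a toy calculation in the transverse plane only; the object list and soft term are invented inputs:

```python
import math

# Minimal sketch of the missing-transverse-momentum definition above:
# the negative vector sum of the transverse momenta of all selected
# objects, plus a track-based soft term.  Inputs are toy values.

def missing_et(objects, soft_term=(0.0, 0.0)):
    """objects: list of (pT [GeV], phi [rad]) pairs.
    soft_term: (px, py) of the soft-particle term in GeV.
    Returns (ETmiss magnitude, ETmiss azimuthal angle)."""
    px = -sum(pt * math.cos(phi) for pt, phi in objects) - soft_term[0]
    py = -sum(pt * math.sin(phi) for pt, phi in objects) - soft_term[1]
    return math.hypot(px, py), math.atan2(py, px)
```

A single unbalanced 40 GeV object yields 40 GeV of ETmiss pointing opposite to it, while a back-to-back pair of equal-pT objects yields essentially zero.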

Event selection
In this search, three signal regions (SRs) are considered, referred to as the 1ℓ2τhad, 2ℓ2τhad, and Z2ℓ2τhad channels. Events in the 2ℓ2τhad and Z2ℓ2τhad channels contain exactly two light leptons with opposite-sign electric charges, while events in the 1ℓ2τhad channel contain exactly one light lepton.
Selected events must satisfy either the un-prescaled single-lepton (SL) or dilepton (DL) triggers. The SL triggers are required in the 1ℓ2τhad channel, while a logical OR of the SL and DL triggers is required for the 2ℓ2τhad and Z2ℓ2τhad channels. The offline reconstructed light leptons in the event must be matched to the trigger-level objects that caused the event to pass the trigger(s). The reconstructed light-lepton pT must exceed the online trigger-object threshold by at least 1 GeV to avoid threshold biases [63].
A Z candidate must be present in the Z2ℓ2τhad channel. It is defined by two leptons of the same flavour and opposite charge with an invariant mass consistent with the Z boson mass, |mℓℓ − mZ| < 10 GeV.
In the 2ℓ2τhad channel, the dilepton invariant mass must satisfy mℓℓ > 12 GeV, and Z → ℓℓ candidates are vetoed.
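The two dilepton requirements above can be sketched as simple predicates. The lepton record format and function names are illustrative; the invariant mass is taken as a precomputed input:

```python
# Toy implementation of the dilepton requirements above: a Z candidate
# needs two same-flavour, opposite-charge leptons with |m_ll - m_Z| < 10 GeV;
# the 2l2tau_had channel instead requires m_ll > 12 GeV and vetoes Z -> ll.
# The (flavour, charge) lepton format is illustrative.

M_Z = 91.19  # Z boson mass in GeV

def is_z_candidate(lep1, lep2, m_ll, window=10.0):
    """lep = (flavour, charge); m_ll = dilepton invariant mass in GeV."""
    same_flavour = lep1[0] == lep2[0]
    opposite_charge = lep1[1] * lep2[1] < 0
    return same_flavour and opposite_charge and abs(m_ll - M_Z) < window

def passes_2l2tau_selection(lep1, lep2, m_ll):
    """Low-mass cut plus Z veto used in the 2l2tau_had channel."""
    return m_ll > 12.0 and not is_z_candidate(lep1, lep2, m_ll)
```

The mℓℓ > 12 GeV cut removes low-mass resonances and Drell–Yan pairs, while the Z veto keeps the 2ℓ2τhad and Z2ℓ2τhad channels mutually exclusive.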
For each channel, exactly two RNN Medium τhad candidates of opposite charge with pT > 20 GeV are required. The angular distance between the two τhad candidates is required to satisfy ΔR(τhad,0, τhad,1) ≤ 2 to suppress Z+jets background events.
The event selection for each channel is summarised in Table 1.A multivariate (MVA) technique is employed in all channels to enhance the sensitivity by discriminating signals from the SM background processes.The description of the MVA strategy is given in Section 6.
Same-sign (SS) validation regions (VRs) are also selected using the same event selection as in the signal regions, except that the two τhad candidates must have the same charge. In these VRs, the signal contribution is negligible, and the background modelling can be checked using data.

Multivariate discriminant
Boosted decision trees implemented in the Toolkit for Multivariate Analysis (TMVA) [65] are used in each SR to improve the separation between signal and background. In the training process, all signal and background events are divided into independent training and test samples to avoid possible biases.
To simplify the training procedure across the 18 mass hypotheses of the X and S particles, a parameterised BDTG method [66] is used. This method trains a BDT using several signal samples with the same mS, including their generated mX as an input parameter, while the mX values of background events are assigned randomly so as to match the signal population. The choice of parameterisation is motivated by the fact that the signal kinematics are driven by the two-body decay X → SH for a given mS. The training algorithm incorporates the mX parameter, so the BDT is parameterised in mX for a given mS. This procedure is performed separately in the three channels (1ℓ2τhad, 2ℓ2τhad, and Z2ℓ2τhad) for four mS masses (200, 300, 400, 500 GeV), so 12 different BDTs are trained.
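The distinctive step of parameterised training, assigning random signal masses to background events so the classifier cannot use mX alone to separate the classes, can be sketched as follows. The data layout and function name are illustrative, not the TMVA interface:

```python
import random

# Sketch of the parameterised-training trick described above: the
# generated m_X is appended as a training feature, and background
# events (which have no true m_X) are assigned masses drawn randomly
# from the signal m_X population.  Purely illustrative.

def build_training_set(signal, background, seed=0):
    """signal: list of (features, m_X) pairs; background: list of features.
    Returns (features + [m_X], label) rows with label 1 for signal."""
    rng = random.Random(seed)
    signal_masses = [m for _, m in signal]
    rows = [(feats + [m], 1) for feats, m in signal]
    for feats in background:
        m = rng.choice(signal_masses)  # match the signal m_X population
        rows.append((feats + [m], 0))
    return rows
```

Because the background mX spectrum mirrors the signal one by construction, the trained discriminant learns mass-dependent kinematics rather than the mass label itself, and a single BDT can then be evaluated at any mX hypothesis.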
Many potential variables were investigated in each SR separately. These variables are chosen because they are expected to provide good separation between the background and signal topologies. Figure 2 shows the distributions of some highly discriminating variables, described in Table 2, in each SR. The discriminating power of a given variable is quantified by the 'separation' (which measures the degree of overlap between the background and signal distributions) and the 'importance' (which ranks the power of the variable in the classification of events), both provided by the TMVA package. The BDT discriminant is trained for each SR starting from an extensive list of variables; the least important variables are then removed sequentially, and the BDT is retrained, until the area under the receiver operating characteristic (ROC) curve drops by more than 1%. The final selected BDT input variables in each SR are summarised in Table 2. In the Z2ℓ2τhad channel, no explicit selection is applied for the second Z decay. Instead, a variable combining the large ETmiss from Z → νν decays and the jet pT from Z → qq̄ decays is included in the BDT training, improving the background rejection without further dividing the data.
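The 'separation' figure of merit quoted above has a simple closed form for binned distributions, ⟨S²⟩ = ½ Σᵢ (sᵢ − bᵢ)²/(sᵢ + bᵢ) with unit-normalised signal and background histograms. A minimal sketch, with invented toy histograms:

```python
# Minimal sketch of the 'separation' figure of merit mentioned above:
# <S^2> = 1/2 * sum_bins (s_i - b_i)^2 / (s_i + b_i), computed on
# unit-normalised binned signal (s) and background (b) distributions.
# It is 0 for identical shapes and 1 for fully disjoint ones.

def separation(sig_hist, bkg_hist):
    """sig_hist, bkg_hist: lists of bin contents (same binning)."""
    s_tot, b_tot = sum(sig_hist), sum(bkg_hist)
    sep = 0.0
    for s, b in zip(sig_hist, bkg_hist):
        s, b = s / s_tot, b / b_tot  # normalise each to unit area
        if s + b > 0:
            sep += 0.5 * (s - b) ** 2 / (s + b)
    return sep
```

Variables with separation near zero add little and are natural candidates for the sequential removal described above.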
The BDT modelling in the signal regions is also checked using the data in the SS VR and in the low-BDT regions with BDT < 0.1, where the signal contributions are negligible. The data are consistent with the background expectations in both validation regions. The final observable used to extract the signal contribution is the BDT distribution in each SR.

Background modelling
The reducible backgrounds arise from events where at least one of the τhad candidates originates from a source other than a vector-boson decay; in such backgrounds, a quark- or gluon-initiated jet is usually misidentified as a τhad candidate. These backgrounds with fake τhad candidates are estimated using the data-driven fake-factor (FF) method [60]. The irreducible backgrounds with real τhad candidates are modelled using simulation.
In all channels, the largest contribution to the total fake τhad background, including leading and subleading τhad candidates, arises from the Z+jets process. The dominant source of fake τhad candidates is light-quark jets. Other sources include gluon-initiated jets, b- and c-jets, and electrons or muons misidentified as fake τhad candidates. The background contribution from non-prompt light leptons (ℓ = e, μ) is small in the signal regions and is estimated from simulation.
The FF method is used to estimate the contribution of the fake τhad background in the 1ℓ2τhad, 2ℓ2τhad, and Z2ℓ2τhad signal regions. The method extrapolates from a dedicated control region (CR) enriched in fake τhad candidates to the signal region by rescaling the templates of fake τhad candidates in the CR with the fake factors discussed below. The CR event selection is the same as that used in the signal region, except that one or both τhad candidates fail the RNN Medium criteria but pass the RNN Very Loose (RNN > 0.05) criteria; such candidates are referred to as anti-taus or fake τhad. The templates are produced by subtracting the simulated real τhad contributions from the data.
The fake factor is a transfer factor defined as the ratio of the number of events with an RNN Medium τhad candidate to the number of events with an anti-τhad candidate. FFs are measured from data in two dedicated control regions, designed to be enriched in Z+jets (Z(→ℓℓ)) and tt̄ (b-tagged dilepton) backgrounds by requiring zero b-tagged jets in the former case, and one b-tagged jet and a Z-boson veto in the latter case. The contribution of events with a real τhad candidate, estimated from simulation to be at the 2% level, is subtracted in these control regions. The FFs measured in the Z+jets and tt̄ control regions are consistent within the statistical uncertainties; nevertheless, the differences between the FFs are treated as a systematic uncertainty arising from the different jet compositions. The fake τhad background estimate is validated in a dedicated validation region for each channel. An overall 10% non-closure uncertainty is estimated from the ratio of data to the total predicted background in the 1ℓ2τhad same-sign validation region.
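The arithmetic of the fake-factor extrapolation can be sketched with invented numbers. Both functions and all inputs are illustrative; in the analysis the procedure is applied per template bin:

```python
# Numerical sketch of the fake-factor extrapolation described above.
# FF is measured in a control region as the real-tau-subtracted ratio
# of pass (RNN Medium) to fail (anti-tau) yields; the signal-region
# fake yield is FF times the real-tau-subtracted anti-tau yield there.
# All yields below are invented for illustration.

def fake_factor(n_pass_data, n_pass_real_mc, n_fail_data, n_fail_real_mc):
    """FF from a control region after subtracting simulated real-tau events."""
    return (n_pass_data - n_pass_real_mc) / (n_fail_data - n_fail_real_mc)

def fake_estimate(ff, n_anti_data, n_anti_real_mc):
    """Fake tau_had yield in the signal region from the anti-tau template."""
    return ff * (n_anti_data - n_anti_real_mc)
```

For example, 120 passing and 1000 failing control-region events with small real-τ contamination give FF ≈ 0.12, which then rescales the anti-τ yield selected with the signal-region cuts.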
The final irreducible and data-driven fake τhad background contributions in the signal regions are summarised in Table 3. The data are consistent with the background expectation. The leading sources of systematic uncertainty in the fake τhad background before the fit to data come from the subtraction of the real τhad contribution (+7.5/−12%), the non-closure uncertainty (±10%), and the uncertainty due to the differences in fake τhad background composition between the Z+jets and tt̄ control regions (±2%).
Event kinematic distributions, such as the visible invariant mass of the di-τhad system (mττvis) and the leading τhad pT, are compared between the data and the predicted background after a background-only fit to the data. Figure 2 shows the distributions for the 1ℓ2τhad, 2ℓ2τhad and Z2ℓ2τhad channels. Good agreement is found between the data and the predicted background.
Table 3: Event yields after a background-only fit, and data, in the three signal regions. All backgrounds except the fake τhad background are estimated using the real τhad contributions from MC simulation; the fake τhad background contribution is data-driven. The top-quark, V+jets, and small backgrounds are included in the "Others" category. The total uncertainties include systematic and statistical contributions. In the 1ℓ2τhad channel, the total-background uncertainty is smaller than that of the fake τhad background because of anti-correlations between the nuisance parameters of the fake τhad and diboson backgrounds obtained in the fit.

Systematic uncertainties
Systematic uncertainties can affect both the normalisation of the signal and background and the shapes of their corresponding discriminant distributions. Each source of uncertainty is considered to be uncorrelated with the other sources. For a given source, the uncertainties across all processes and channels are treated as correlated.

Luminosity
The uncertainty in the integrated luminosity is 0.83%, which affects the overall normalisation of all processes estimated from simulation. It is derived following a methodology similar to that detailed in Ref. [67], using the LUCID-2 detector for the baseline luminosity measurement [68] and a calibration of the luminosity scale from x–y beam-separation scans.

Reconstructed objects
Uncertainties associated with electrons, muons, and τhad candidates arise from the trigger, reconstruction, momentum scale and resolution, particle identification and, in the case of electrons and muons, isolation efficiencies. These are measured using Z → ℓ+ℓ− and J/ψ → ℓ+ℓ− events [47,48] in the case of electrons and muons, and Z → τ+τ− events in the case of τhad candidates [60].
Uncertainties associated with jets arise from the jet energy scale (JES) and resolution (JER), and from the efficiency to satisfy the JVT requirements. The largest contribution comes from the jet energy scale, whose uncertainty dependence on jet pT and η, jet flavour, and pile-up treatment is split into 43 uncorrelated components that are treated independently [69]. The total JES uncertainty is below 5% for most jets and below 1% for central jets with pT between 300 GeV and 2 TeV. The difference between the JER values in data and in simulated events is treated as a systematic uncertainty with one nuisance parameter, applied to the simulated events by smearing the jet pT by the prescribed uncertainty.
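The smearing step mentioned above can be sketched as follows. This is a toy version under the common assumption that the extra resolution is the quadrature difference of the data and simulation resolutions; the numbers and names are illustrative:

```python
import random

# Toy sketch of the jet-pT smearing mentioned above: when the jet
# energy resolution in data exceeds that in simulation, simulated jet
# pT values are smeared by a Gaussian whose width is the quadrature
# difference of the two relative resolutions.  Illustrative only.

def smear_jet_pt(pt, sigma_data, sigma_mc, rng):
    """Smear one jet pT (relative resolutions, sigma_data >= sigma_mc)."""
    extra = (sigma_data**2 - sigma_mc**2) ** 0.5  # quadrature difference
    return pt * rng.gauss(1.0, extra)
```

Applied to many jets, the smearing broadens the pT distribution while leaving its mean essentially unchanged; when the two resolutions are equal, no smearing is applied.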
Uncertainties associated with the energy scales and resolutions of leptons and jets are propagated to ETmiss. Additional uncertainties originating from the modelling of the underlying event, in particular its impact on the pT scale and resolution of the unclustered energy, are negligible.
Efficiencies to veto tagged b-jets and c-jets in the simulation are corrected by pT-dependent factors to match the efficiencies in data, while the light-jet efficiency is scaled by pT- and η-dependent factors. The b-jet efficiency is measured in a data sample enriched in tt̄ events [58], while the c-jet efficiency is measured using tt̄ events [70] or W+c-jet events [71]. The light-jet efficiency is measured in a multijet data sample enriched in light-flavour jets [72]. These systematic uncertainties are taken to be uncorrelated between b-jets, c-jets, and light jets.
Pile-up reweighting scale factors are applied to simulated events to reproduce the pile-up distribution corresponding to the data.The uncertainty in these scale factors is estimated by changing the profile in data within uncertainties.

Background modelling
Several sources of systematic uncertainty affecting the modelling of the background are considered: the choice of renormalisation and factorisation scales in the matrix-element calculation, the choice of matching scale when matching the matrix elements to the parton shower generator, and the uncertainty in the value of αs used when modelling initial-state radiation (ISR). The uncertainty due to the choice of parton shower (PS) and hadronisation model is derived by comparing the predictions from Powheg-Box interfaced either to Pythia 8 or to Herwig 7; the latter uses the MMHT2014 LO PDF set [73] in combination with the H7UE tune [74]. The uncertainty in the modelling of additional radiation from the PS is assessed by varying the corresponding parameter of the A14 set [75] and by varying the radiation renormalisation and factorisation scales independently by factors of 2.0 and 0.5. The PDF uncertainty is estimated by taking the PDF variations as an envelope around the nominal PDF.
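One common reading of the envelope prescription mentioned above is to take the largest absolute deviation of any PDF variation from the nominal prediction. A minimal sketch with invented cross-section values:

```python
# Minimal sketch of a PDF 'envelope' prescription: the uncertainty is
# taken as the largest absolute deviation of any PDF variation from
# the nominal prediction.  This is one common convention, shown here
# with invented numbers; the analysis may use a different combination.

def pdf_envelope(nominal, variations):
    """Return the symmetric envelope uncertainty around the nominal."""
    return max(abs(v - nominal) for v in variations)
```

For a nominal prediction of 10.0 with variations [9.7, 10.2, 10.4, 9.9], the envelope is 0.4, quoted as a symmetric ±0.4 uncertainty.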
Uncertainties affecting the normalisation of the Z+jets background are estimated to be approximately 30%, by taking the maximum difference between the data and the total background prediction in events with three or more jets [76]. This is mainly driven by the different accuracies used in the Z+jets simulation, where Sherpa 2.2.1 provides NLO matrix elements for up to two partons and LO matrix elements for three or more partons.
Uncertainties in the tt̄+jets background normalisation include those estimated from variations of the renormalisation and factorisation scales, the choice of parton-shower and hadronisation model, the NNPDF3.0NNLO PDF set uncertainties, and α_s variations. The contribution of tt̄+jets is small after the b-tagging veto is applied.
Uncertainties affecting the modelling of the single-top-quark background include a +5%/−4% uncertainty in the total cross-section, estimated as a weighted average of the theoretical uncertainties in s-, t- and Wt-channel production [77–79]. Additional uncertainties associated with the parton shower, hadronisation, and initial- and final-state radiation are also considered.
The diboson normalisation uncertainty is estimated to be ±15%, which includes contributions from variations of the renormalisation and factorisation scales, the NNPDF3.0NNLO PDF set, and α_s. Uncertainties in the tt̄V, tt̄H, and VH cross-sections are estimated to be ±10%, +5.8%/−9.2%, and ±1.4%, respectively, arising from the uncertainties in their respective NLO theoretical cross-sections [80]. Most of the rare background contributions are normalised using their NLO theoretical cross-sections and assigned a 50% normalisation uncertainty, except for a single process for which a 5% normalisation uncertainty is used.
The statistical uncertainties of the fake-τ_had background calibration are applied as uncorrelated uncertainties for the different sources of fake τ-leptons. The uncertainties in the modelling of the processes used to evaluate the fake factors are taken into account in the fit.

Signal modelling
Several normalisation and shape uncertainties are considered, and they are treated as fully correlated for the X → SH signals. Uncertainties in the H and S decay branching ratios are considered by following the recommendation in Ref. [20]. The BDT variations due to ISR, FSR, scale, and PDF uncertainties are estimated from a non-resonant HH sample and are applied to the X → SH signal as fully correlated across all SRs. The PS uncertainty is also estimated by comparing the BDT shape of an HH sample with that of an alternative sample interfaced with Herwig 7, and is applied to the X → SH signal. This choice is motivated by the fact that the non-resonant HH signal is modelled at higher order than the X → SH simulation, while its kinematics are similar to those of the low-m_X mass points. These uncertainties are cross-checked and found to be consistent with those calculated in the resonant pp → X → HH analysis [81].

Statistical analysis
The final discriminant distributions are the BDT outputs from all considered analysis regions, which are analysed jointly for the presence of a signal. The BDT distributions are binned to maximise the sensitivity to the signal while keeping statistical fluctuations small.
The statistical analysis uses a binned likelihood function L(μ, θ), constructed as a product of Poisson probability terms over all bins considered in the search [82]. This function depends on the signal-strength parameter μ, defined as a factor multiplying the expected yield of signal events normalised to an arbitrary reference cross-section σ(pp → X → SH) = 1 pb, and on θ, a set of nuisance parameters (NPs) that encode the effect of systematic uncertainties on the signal and background expectations. The expected total number of events in a given bin depends on μ and θ. All nuisance parameters are subject to Gaussian constraints in the likelihood function. For a given value of μ, the NPs θ allow variations of the signal and background expectations consistent with the corresponding systematic uncertainties, so their fitted values correspond to the deviations from the nominal expectations that globally provide the best fit to the data. This procedure reduces the impact of systematic uncertainties on the sensitivity of the search by exploiting the highly populated background-dominated bins included in the likelihood fit. Statistical uncertainties in each bin of the predicted final-discriminant distributions are taken into account through dedicated parameters (gammas) in the fit. The best-fit μ is obtained by performing a binned likelihood fit to the data under the signal-plus-background hypothesis, maximising L(μ, θ) over μ and θ.
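Schematically, such a binned likelihood takes the standard form below. The notation is illustrative and not taken from the text: n_i, s_i, and b_i denote the observed, expected-signal, and expected-background yields in bin i, and G(θ_j) the Gaussian constraint term for nuisance parameter θ_j:

```latex
\mathcal{L}(\mu, \boldsymbol{\theta})
  = \prod_{i \in \mathrm{bins}}
    \mathrm{Pois}\!\left( n_i \,\middle|\, \mu\, s_i(\boldsymbol{\theta}) + b_i(\boldsymbol{\theta}) \right)
    \times \prod_{j} G\!\left(\theta_j\right) .
```

The per-bin statistical (gamma) parameters mentioned above enter as additional multiplicative factors on the predicted yields with their own Poisson constraints.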
The fitting procedure was initially validated through extensive studies using pseudo-data (defined as the sum of all predicted backgrounds plus an injected signal of variable strength) and by performing fits to the data using only those bins of the final discriminating variable that lie in the signal-depleted region defined by BDT < 0.1. In both cases, the robustness of the systematic-uncertainty model was established by verifying the stability of the fitted background when varying the leading sources of uncertainty. After this, the whole BDT distribution in the data is fitted under the signal-plus-background hypothesis. Further checks involve the comparison of the fitted nuisance parameters before and after removal of the blinding requirements. In addition, it is verified that the fit is able to determine the strength of a simulated signal injected into the real data.
The test statistic q_μ is defined as the profile likelihood ratio, q_μ = −2 ln(L(μ, θ̂_μ)/L(μ̂, θ̂)), where μ̂ and θ̂ are the values of the parameters that maximise the likelihood function (subject to the constraint μ̂ ≥ 0), and θ̂_μ are the values of the NPs that maximise the likelihood function for a given value of μ. The test statistic q_μ is evaluated with the RooFit package [83].
Exclusion limits are set on μ, derived by using q_μ in the CLs method [84,85]. For a given signal scenario, values of μ yielding CLs < 0.05, where CLs is computed using the asymptotic approximation [82], are excluded at 95% confidence level or higher. The exclusion limits are also verified using pseudo-experiments, which give results consistent to within 25%.
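The CLs construction can be illustrated with a minimal, self-contained sketch for a single-bin counting experiment with a known background and no nuisance parameters. This is not the analysis code (which uses the full binned likelihood in RooFit); the function names and all numerical inputs are illustrative assumptions, and toy pseudo-experiments stand in for the asymptotic formulae:

```python
import numpy as np

def nll(n, mu, s, b):
    """Negative log of the Poisson likelihood (n-dependent constants dropped)."""
    lam = mu * s + b          # expected yield for signal strength mu
    return lam - n * np.log(lam)

def q_mu(n, mu, s, b):
    """One-sided profile-likelihood-ratio test statistic used for upper limits."""
    mu_hat = min(max(0.0, (n - b) / s), mu)  # MLE of mu, constrained to [0, mu]
    return 2.0 * (nll(n, mu, s, b) - nll(n, mu_hat, s, b))

def cls(n_obs, mu, s, b, n_toys=20000, seed=1):
    """CLs = P(q >= q_obs | s+b) / P(q >= q_obs | b-only), from toy experiments."""
    rng = np.random.default_rng(seed)
    q_obs = q_mu(n_obs, mu, s, b)
    q_sb = np.array([q_mu(n, mu, s, b) for n in rng.poisson(mu * s + b, n_toys)])
    q_b = np.array([q_mu(n, mu, s, b) for n in rng.poisson(b, n_toys)])
    return np.mean(q_sb >= q_obs) / np.mean(q_b >= q_obs)
```

Scanning μ upwards until cls() drops below 0.05 yields the 95% CL upper limit on the signal strength; the asymptotic approximation of Ref. [82] replaces the toys with closed-form expressions.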

Results
A binned likelihood fit under the signal-plus-background hypothesis is performed on each BDT discriminant distribution for each mass point (m_X, m_S): in each of the three signal regions, in the two WW regions combined, and in all three regions combined. This gives a total of 5 × 18 = 90 fits. The only unconstrained parameter of the fit is the signal strength μ.
Figure 3 shows the BDT output distributions after the background-only fit to the data for the X → SH search, for the most sensitive high-mass point (m_X = 1250 GeV, m_S = 300 GeV) and the least sensitive low-mass point (m_X = 500 GeV, m_S = 300 GeV). No significant pulls or constraints are observed for the fitted nuisance parameters, so the post-fit background prediction in each signal region is very close to the pre-fit prediction, with uncertainties reduced by the fit.
No excess of events is observed beyond the expected SM background; hence 95% CL upper limits are set on the production cross-section σ(pp → X → SH), as shown in Table 4 for the combined WW regions, the ZZ region, and all regions combined, after correcting for the S → VV and H → ττ branching fractions assuming that the S decays like a SM Higgs boson [86]. The combined limit is dominated by the 1ℓ2τ_had channel and improves by 26%–53% when the 2ℓ2τ_had and Z2ℓ2τ_had channels are added. The results are dominated by the statistical uncertainty. The main contributions to the total systematic uncertainty arise from the τ-lepton fakes, the diboson modelling, the τ_had identification efficiency, and the lepton identification and trigger efficiencies. Their impact on the cross-section σ(pp → X → SH) is summarised in Table 5 for the least- and most-sensitive mass points. The impact on the combined limits is between 2% and 14% across all mass points.
The best expected combined limit on σ(pp → X → SH) is 85 fb, obtained for m_X = 1250 GeV and m_S = 300 GeV. The observed and expected 95% CL upper limits are summarised in Figure 4 for σ(pp → X → SH), σ(pp → X → SH → WWτ+τ−), and σ(pp → X → SH → ZZτ+τ−) as a function of the combined m_X and m_S masses (m_X + m_S/25) in GeV. The maximum values allowed for σ(pp → X → SH → WWτ+τ−) and σ(pp → X → SH → ZZτ+τ−) as a function of m_X and m_S in the NMSSM parameter space are also shown for comparison. These values are obtained via a parameter-space scan using NMSSMTOOLS 6.0.0 [87–89] and NMSSMCALC [90], following the recommendations of Ref. [20]. The scan uses measurements of Higgs boson properties, searches for supersymmetry, B-meson physics, and searches for dark matter to exclude cross-section values above the indicated line. The observed limits approach the allowed cross-sections in the low-m_X and low-m_S part of the NMSSM parameter space. This NMSSM scan is also compatible with other NMSSM scans using X → SH decays in other final states.
The corresponding two-dimensional limits are also shown in Figure 5 as a function of the m_X and m_S masses in GeV. The limit at each mass point is compared with the limits obtained using the BDTs trained for the neighbouring (lower or upper) mass values in the fit, and found to be consistent within 15%, indicating that interpolating the limits between mass points is unnecessary. The fit results are also used to set limits on the cross-sections for X → SH decays in events with a pair of τ-leptons and multiple light leptons.
Conclusion
A search for a new heavy scalar particle X decaying into a Standard Model (SM) Higgs boson (with H → τ+τ−) and a new singlet scalar particle S (with S → VV, V = W, Z) is presented. The search is based on a data sample of pp collisions at √s = 13 TeV, corresponding to an integrated luminosity of 140 fb−1, recorded with the ATLAS detector at the Large Hadron Collider. The explored X masses range from 500 to 1500 GeV, with the corresponding S mass in the range 200–500 GeV. The search selects events with two hadronically decaying τ-lepton candidates from H → τ+τ− decays and one or two light leptons (ℓ = e, μ) from S → VV decays. A multivariate discriminant based on event kinematics is used to separate the signal from the background. No excess of events is observed beyond the expected SM background, and 95% CL upper limits on the cross-section σ(pp → X → SH), assuming SM-Higgs-boson-like decay branching ratios for the S → VV decay, are derived between 72 fb and 542 fb. Upper limits on the visible cross-sections σ(pp → X → SH → WWττ) and σ(pp → X → SH → ZZττ) are set in the ranges 3–26 fb and 6–33 fb, respectively.

Figure 1 :
Figure 1: Representative diagram contributing to pp → X → SH production via the gluon-fusion process.

Figure 2 :
Figure 2: Kinematic distributions obtained after a background-only fit to data (post-fit) in the 1ℓ2τ_had signal region (a–c, including the leading τ_had p_T and Δ(ℓ, j)), the 2ℓ2τ_had signal region (d–f, including the leading τ_had p_T and Δ(ℓ, ℓ)), and the Z2ℓ2τ_had signal region (g–i, including the leading τ_had p_T and the sum of E_T^miss and the p_T of jets). The purple (red) dashed line represents the signal for m_X = 500 GeV and m_S = 300 GeV (m_X = 1250 GeV and m_S = 300 GeV), normalised to the total background. The error band includes systematic and statistical uncertainties. The lower panels show the ratio of data to post-fit background.

Figure 3:
Figure 3: BDT output distributions obtained from a background-only fit to the data for the X → SH search with m_X = 500 GeV and m_S = 300 GeV in the (a) 1ℓ2τ_had, (b) 2ℓ2τ_had, and (c) Z2ℓ2τ_had channels; and with m_X = 1250 GeV and m_S = 300 GeV in the (d) 1ℓ2τ_had, (e) 2ℓ2τ_had, and (f) Z2ℓ2τ_had channels. The red dashed line represents the signal normalised to the total background. The error band includes systematic and statistical uncertainties.

Figure 4:
Thus three dedicated samples are generated independently by forcing S → WW or S → ZZ decays in the following final states: ττ + WW(ℓν qq̄′), ττ + WW(ℓν ℓν), and ττ + ZZ(ℓℓ qq̄ or ℓℓ νν). The m_S values considered start at 200 GeV, so that the S decays into on-shell W or Z boson pairs, and extend up to 500 GeV, where the narrow-width approximation remains valid. The background contribution limits the lower bound of m_X (500 GeV), while the upper bound of m_X (1500 GeV) is required to lie in the resolvable X → SH decay region. Eighteen mass points, corresponding to various combinations of m_X and m_S values for each of the three final states, are generated to cover the most interesting phase space, enumerated in grids (in GeV) of m_X = [500, 750, 1000, 1250, 1500] for m_S = [200, 300] and m_X = [750, 1000, 1250, 1500] for m_S = [400, 500]. Simulated SM non-resonant HH production from the ggF process is also generated, for the systematic studies of the X → SH signals, using the Powheg-Box v2 generator at next-to-leading order (NLO) [34, 35] with the NNPDF2.3LO PDF set, interfaced with Pythia 8.2 for PS and hadronisation.

Table 2 :
Input variables used in the BDT training for the 1ℓ2τ_had, 2ℓ2τ_had, and Z2ℓ2τ_had channels, where "×" indicates that a variable is used and "-" that it is not. The minimum angular distance is evaluated using all jets in an event.

Table 4 :
Summary of observed and expected 95% CL upper limits, in fb, on the production cross-section σ(pp → X → SH) for the combination of all channels and for the WW and ZZ individual channels. The best expected limit is 85 fb, for m_X = 1250 GeV and m_S = 300 GeV. The combined limits gain about 26%–53% over the best individual limit, from the 1ℓ2τ_had channel. Columns: combined σ(pp → X → SH) [fb], σ(pp → X → SH → WWττ) [fb], and σ(pp → X → SH → ZZττ) [fb].

Table 5 :
Relative uncertainties in the observed 95% CL upper limits on σ(pp → X → SH), obtained from the combined fit to the data for the least- and most-sensitive mass points, (m_X = 500 GeV, m_S = 300 GeV) and (m_X = 1250 GeV, m_S = 300 GeV). The uncertainties are symmetrised and grouped into the categories described in the systematic-uncertainties section. Columns: source of uncertainty, Δσ/σ(pp → X → SH) [%] for each of the two mass points.