Search for heavy Higgs bosons with flavour-violating couplings in multi-lepton plus $b$-jets final states in $pp$ collisions at 13 TeV with the ATLAS detector

A search for new heavy scalars with flavour-violating decays in final states with multiple leptons and $b$-tagged jets is presented. The results are interpreted in terms of a general two-Higgs-doublet model involving an additional scalar with couplings to the top-quark and the three up-type quarks ($\rho_{tt}$, $\rho_{tc}$, and $\rho_{tu}$). The targeted signals lead to final states with either a same-sign top-quark pair, three top-quarks, or four top-quarks. The search is based on a data sample of proton-proton collisions at $\sqrt{s}=13$ TeV recorded with the ATLAS detector during Run 2 of the Large Hadron Collider, corresponding to an integrated luminosity of 139f b$^{-1}$. Events are categorised depending on the multiplicity of light charged leptons (electrons or muons), total lepton charge, and a deep-neural-network-based categorisation to enhance the purity of each of the signals. Masses of an additional scalar boson $m_{H}$ between $200-630$ GeV with couplings $\rho_{tt}=0.4$, $\rho_{tc}=0.2$, and $\rho_{tu}=0.2$ are excluded at 95% confidence level. Additional interpretations are provided in models of $R$-parity violating supersymmetry, motivated by the recent flavour and $(g-2)_\mu$ anomalies.


Introduction
Several extensions of the Standard Model (SM) propose the augmentation of the Higgs sector by the addition of a second complex Higgs doublet [1,2] (2HDM), giving rise to five Higgs bosons: two CP-even scalar fields ℎ and , one CP-odd pseudo-scalar , and two charged fields ± . The two CP-even scalars are expected to mix; however, the measurement of Higgs boson properties has revealed no deviations from the expectations of the Standard Model [3,4]. This implies that extra scalars from 2HDMs have to be either very heavy (decoupling limit) or have a vanishingly small mixing with the SM Higgs (alignment limit). To avoid flavour changing neutral Higgs (FCNH) couplings mediated by the SM Higgs, a discrete 2 symmetry is usually imposed [1,2]. A large set of searches for heavy scalars or pseudo-scalars with flavour-conserving decays has been performed in ATLAS [5][6][7][8][9][10][11][12] and CMS [13][14][15][16][17][18][19][20][21][22][23][24][25]. However, if the 2 symmetry is dropped, alignment automatically emerges when all heavy Higgs quartic couplings are O (1) [26]. Therefore, models without 2 symmetry can lead naturally to the alignment limit and predict FCNH couplings in the heavy Higgs sector, while respecting the SM-like nature of the ℎ(125) discovered at the LHC.
The search presented here targets a general two Higgs doublet model (g2HDM) without 2 symmetry, where the heavy Higgs bosons feature FCNH couplings. Only couplings involving top-quarks are considered: , , and . The parameters indicate the coupling of the heavy Higgs boson to particles and . The notation is used to refer to both the and couplings. These kinds of g2HDMs with extra top Yukawa couplings are phenomenologically interesting since they can explain the generation of the baryon asymmetry through the couplings or [27]. No distinction is performed between the different chiralities in the coupling, and an effective coupling = √︃ˆ2 +ˆ2 / √ 2 is used, where the hat symbol is used to denote the original couplings in the g2HDM Lagrangian.
The production and decay modes at tree level considered in the analysis are shown in Figure 1. The presence of the coupling opens the possibility of same-sign top production, as shown in Figures 1(a) (sstt) and 1(b) (ttq), and also three-top production, as shown in Figures 1(c) (ttt) and 1(d) (tttq). The three-top signature is a sensitive probe of beyond-the-SM (BSM) physics [28][29][30][31]. Additionally, four-top quarks can be produced, as shown in Figure 1(e) (tttt) 1 . The targeted final state is characterised by multiple leptons (electrons and muons) and multiple jets containing -flavoured hadrons ( -jets). Many of the production modes are expected to be charge-asymmetric (with preference to positively charged), and this feature is exploited in the search. The relevance of each production mode depends on the chosen coupling. A benchmark of = 0.4 and = 0.2 is chosen to guide the analysis design and optimisation. The values are chosen so that the signal could account for the higher¯and¯¯yields observed in ATLAS analyses [32][33][34][35][36]. Significant kinematic differences and a much stronger charge asymmetry are expected from the targeted signals, which allow them to be differentiated from simple rescalings of both processes. For the chosen couplings, the production (Figures 1(b) and 1(c)) cross section is two orders of magnitude larger than production (Figures 1(d) and 1(e)), and three orders of magnitude larger than same-sign tops production via -channel (Figures 1(a)).
This analysis is the first to target BSM production leading to three-top final states and the first to probe the g2HDM. The production of four-tops in the SM or via heavy scalars was explored previously by ATLAS [35][36][37][38][39] and CMS [40][41][42][43]. Limits on the g2HDM model couplings can be derived from LHC Higgs measurements, physics, and assuming the couplings stay perturbative [44], leading to The subsequent decay can lead to a final state with high multiplicity of leptons and -jets that is targeted by the search. Single production through gluon-gluon fusion, → →¯/ , is not considered since the decay does not lead to the relevant multi-lepton final state. < 1.5, andˆ< 0.1, implying < 1.06. In addition, −¯mixing provides the constraint < 0.14 [45], and −¯mixing provides the constraint |ˆˆ * | < 0.02 [45], which translates to | * | < 0.01 assuming a negligibleˆ. These constraints are derived assuming ≈ + = 500 GeV, and become weaker for higher masses.
The first model features production of electroweakinos (wino or Higgsino) that decay via a lepton-numberviolating RPV coupling of the¯type to a lepton and third-generation quarks. The corresponding term in the superpotential has the form 33 3¯3 , where ∈ 2, 3 is a generation index, and , ,¯are the lepton doublet, quark doublet, and down-type quark singlet superfields, respectively. Relevant diagrams for the production and decay are shown in Figures 2(a) and 2(b). The second model features direct smuon production and decay to a bino-like neutralino, which in turn decays via the same RPV coupling ( 33 ), as shown in Figure 2(c). Figure 2: Signal diagrams for the RPV SUSY signals used as additional interpretation in the analysis. The subsequent decay can lead to a final state with high multiplicity of leptons and -jets that is targeted by the search.

ATLAS detector
The ATLAS detector [62] at the LHC covers nearly the entire solid angle around the collision point. 2 It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadron calorimeters, and a muon spectrometer incorporating three large superconducting air-core toroidal magnets.
The inner-detector system (ID) is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the region | | < 2.5. The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer (IBL) installed before Run 2 [63,64]. It is followed by the silicon microstrip tracker (SCT), which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to | | = 2.0. The TRT also provides electron identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to transition radiation.
The calorimeter system covers the pseudorapidity range | | < 4.9. Within the region | | < 3.2, electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) calorimeters, with an additional thin LAr presampler covering | | < 1.8 to correct for energy loss in material upstream of the calorimeters. Hadron calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within | | < 1.7, and two copper/LAr hadron endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic energy measurements, respectively.
The muon spectrometer (MS) comprises separate trigger and high-precision tracking chambers measuring the deflection of muons in a magnetic field generated by the superconducting air-core toroidal magnets. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Three layers of precision chambers, each consisting of layers of monitored drift tubes, cover the region | | < 2.7, 2 ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the centre of the detector and the -axis along the beam pipe. The -axis points from the IP to the centre of the LHC ring, and the -axis points upwards. Cylindrical coordinates ( , ) are used in the transverse plane, being the azimuthal angle around the -axis. The pseudorapidity is defined in terms of the polar angle as = − ln tan( /2). Angular distance is measured in units of Δ ≡ √︁ (Δ ) 2 + (Δ ) 2 .
complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range | | < 2.4 with resistive-plate chambers in the barrel, and thin-gap chambers in the endcap regions.
Interesting events are selected by the first-level trigger system implemented in custom hardware, followed by selections made by algorithms implemented in software in the high-level trigger [65]. The first-level trigger accepts events from the 40 MHz bunch crossings at a rate below 100 kHz, which the high-level trigger reduces further to record events to disk at about 1 kHz.
An extensive software suite [66] is used in the reconstruction and analysis of real and simulated data, in detector operations, and in the trigger and data acquisition systems of the experiment.

Data and simulated event samples
This analysis uses data from collisions at √ = 13 TeV collected by the ATLAS experiment during 2015-2018. After the application of data-quality requirements [67], the data sample corresponds to an integrated luminosity of 139 fb −1 [68]. The number of additional interactions per bunch crossing (pile-up) in this sample ranges from about 8 to 70, with an average of 34. Only events recorded under stable beam conditions and for which all detector subsystems were known to be in a good operating condition are used. The trigger requirements are discussed in Section 5.
Monte Carlo (MC) simulation samples were produced for the different signal and background processes. Table 1 shows the configurations used in this analysis, with the samples in parentheses and in grey indicating those used to estimate the systematic uncertainties. All simulated samples, except those produced with the S [69] event generator, utilised E G 1.2.0 [70] to model the decays of heavy-flavour hadrons. All samples showered with P use the A14 set of tuned parameters [71] (referred to as 'tune'), whereas those showered with H use the H7-UE tune [72]. Pile-up was modelled using events from minimum-bias interactions generated with P 8.186 [73] with the A3 tune [74], and overlaid onto the simulated hard-scatter events according to the luminosity profile of the recorded data. The generated events were processed through either a full simulation of the ATLAS detector geometry and response using G 4 [75], or a faster simulation where the full G 4 simulation of the calorimeter response is replaced by a detailed parameterisation of the shower shapes [76]. Both types of simulated events were processed through the same reconstruction software used for the collision data. Corrections were applied to the simulated events so that the particle candidates' selection efficiencies, energy scales and energy resolutions match those determined from data control samples. The simulated samples are normalised to their cross sections, and generated to the highest order available in perturbation theory. Samples used to model the g2HDM signal were generated at leading-order (LO) in QCD with M -v2.9.3 [77] with the NNPDF3.1 [78] parton distribution function (PDF) set. Samples were generated for masses in the range of 200 GeV to 1.5 TeV with a 100 GeV step, and processed with the ATLAS Fast Simulation [76]. All signals were produced with the set of couplings = = = 0.1. Each signal process described in Section 1 was generated as a separate MC sample. The LO cross section obtained from Madgraph is used for the normalisation of the signals. Simulated events for different coupling values are obtained by rescaling the samples to match the target cross section and branching ratio of each subprocess. For a given choice of couplings all the processes are taken into account and rescaled. The RPV SUSY signal samples were generated with M v2.9.3, with up to two extra jets at LO in QCD. The matching scale is set at 1/4 of the mass of the SUSY particle being produced. Supersymmetric particle decays via the RPV coupling are simulated with 25% branching ratio to / / / each, a -quark, and a -or -quark depending on the lepton charge. The identical branching ratio to second-and third-generation leptons follows from the choice of 233 = 333 , while the balance in charged and neutral leptons is an assumption. This assumption originates naturally from the presence of a left-handed lepton superfield in the¯coupling, but is distorted by the large mass difference between top and bottom quarks and also affected by the choice of tan( ). Signal cross sections are calculated to next-to-leading order in the strong coupling constant, adding the resummation of soft gluon emission at next-to-leading-logarithmic accuracy (NLO+NLL) [79][80][81][82][83]. The nominal cross section and the uncertainty are taken from an envelope of cross-section predictions using different PDF sets and factorisation and renormalisation scales, as described in Ref. [84]. All signal events were showered with P 8.245 [73] using the NNPDF2.3 [85] PDF set.
The samples used to model the¯and the¯( / * → ℓ + ℓ − ) backgrounds were generated using S -2.2.10 [86] and S -2.2.11, where the matrix element (ME) were calculated for up to one and zero additional partons at next-to-leading-order (NLO) in QCD, respectively, and up to two partons at LO in QCD using C [87] and O L [88]. The ME were merged with the S parton shower (PS) [89] using the M P @N prescription [90], with a CKKW merging scale of 30 GeV for thes ample. These samples are generated using the NNPDF3.0 [91] PDF set, along with the dedicated set of tuned parton-shower parameters developed by the S authors. The invariant mass of the lepton pair ( ℓ + ℓ − ) in the¯( / * → ℓ + ℓ − ) sample is set to be greater than 1 GeV. Both the factorisation and renormalisation scales are set to = = /2 in the¯sample, where is defined as the scalar sum of the transverse masses √︃ 2 + 2 of the particles generated from the ME calculation. In addition to this prediction at NLO in QCD, higher-order corrections relating to electroweak (EW) contributions are also included. First, event-by-event correction factors are applied that provide virtual NLO EW corrections of the order 2 2 s derived using the formalism described in Ref. [92] along with LO corrections of order 3 . Second, real emission contributions from the sub-leading EW corrections at order 3 s [93] are simulated with an independent S -2.2.10 sample produced at LO in QCD. The complete¯simulation is normalised to the total cross section of (¯) = 614.7 fb that comes from the S configuration outlined above considering NLO QCD and NLO EWK effects, based on a similar strategy as used in Ref. [94]. The¯/ * sample is normalised to the cross section (¯/ * ) = 839 fb, calculated at NLO QCD and NLO EW accuracy using M G 5_ MC@NLO [95] and scaled by an off-shell correction estimated at one-loop level in s .
The production of SM¯¯events was modelled using the M 5_ MC@NLO v2.6.2 generator that provides matrix elements at NLO in QCD with the NNPDF3.1 PDF set. The functional form of the renormalisation and factorisation scales are set to = = /4. Top quarks are decayed at LO using M S to preserve all spin correlations. The events are interfaced with P 8.230 for the parton shower and hadronisation, using the NNPDF2.3 PDF set. The production of¯¯events is normalised to a cross section of 12 fb computed at NLO in QCD including EW corrections [93].
Diboson ( ) background processes were simulated with S 2.2.2 [86]. The matrix element was calculated using C [87] and O L [88] with NLO accuracy in QCD for up to one additional parton and at LO accuracy for up to three additional partons, and merged with the S PS using M P @N prescription [90]. The NNPDF3.0 set of PDFs was used, along with the dedicated parton-shower tune for S . The cross section of ( ) = 104 pb used to normalise the sample was computed by S 2.2.2.
Samples for¯ℎ,¯, and single top production were generated using the NLO generator P -B -2 [96][97][98][99][100][101] and interfaced with P 8 for the parton showering and fragmentation. These samples used the NNPDF3.0 PDF set. The ℎ damp parameter, which controls the transverse momentum of the first additional emission beyond the LO Feynman diagram in the PS and therefore regulates the high-T radiation, is set to 3/4 × ( +¯+ ℎ ) in the¯ℎ sample and to 1.5 × in the¯and single top samples, where ( ℎ ) denotes the mass of the top quark (SM Higgs boson).
A dedicated¯sample including rare → * (→ ℓ + ℓ − ) radiative decays,¯→ + −¯ℓ+ ℓ − , was generated using a ME calculated at LO in QCD and requiring ℓ + ℓ − > 1 GeV. In this sample the photon can be radiated from the top quark, the boson, or the -quark. Both the¯( / * → ℓ + ℓ − ) and → + −¯ℓ+ ℓ − samples are combined and together form the "¯/ * " sample. The contribution from internal photon conversions ( * → ℓ + ℓ − ) with ℓ + ℓ − < 1 GeV were modelled by QED multi-photon radiation via the PS in an inclusive¯sample and is referred to as "¯ * (LM)". Dedicated +jets samples containing electrons from material photon conversion ( → + − ) or internal photon conversion were generated with P -B and interfaced with P 8 for the parton showering and fragmentation. These samples are used to model the data in control regions enriched in material and internal conversion electrons, as explained in Section 5.
The remaining rare background contributions listed in Table 1 are normalised using their NLO theoretical cross sections, except for the¯,¯+ − ,¯,¯ℎℎ, and¯ℎ processes, for which a LO cross section is used. Table 1: The configurations used for event generation of signal and background processes. The samples used to estimate the systematic uncertainties are indicated in parentheses and grey. refers to production of an electroweak boson ( or / * ). The matrix element order refers to the order in the strong coupling constant of the perturbative calculation. The "¯(EW)" sample also includes next-to-leading-order electroweak corrections. Tune refers to the underlying-event tune of the parton shower generator. MG5_aMC refers to M G 5_ MC@NLO 2.2, 2.3, or 2.6; P 8 refers to version 8.2 [102]; M P @N refers to the method used in S to match the matrix element to the parton shower. All samples include leading-logarithm photon emission, either modelled by the parton shower generator or by P [103]. The mass of the top quark ( ) and SM Higgs boson were set to 172.5 GeV and 125 GeV, respectively.

Process
Generator ME order Parton shower PDF Tune g2HDM signal MG5_aMC

Event reconstruction and object identification
Interaction vertices from the collisions are reconstructed from at least two tracks with transverse momentum ( T ) larger than 500 MeV that are consistent with originating from the beam collision region in the -plane. If more than one primary vertex candidate is found in the event, the candidate for which the associated tracks form the largest sum of squared T is selected as the hard-scatter primary vertex [104].
Electron candidates are reconstructed from energy clusters in the electromagnetic calorimeter matched to a track in the ID [105]. They are required to satisfy T > 10GeV and | cluster | < 2.47, with the transition region between the endcap and barrel calorimeters (1.37 < | cluster | < 1.52) excluded. Loose and tight electron identification working points are used [105], based on a likelihood discriminant that employs calorimeter, tracking and combined variables to distinguish between electrons and jets. The associated track of an electron candidate is required to have at least two hits in the pixel detector and seven hits total in the pixel and silicon-strip detectors combined. For the tight identification working point, one of these pixel hits must be in the innermost layer, or the next-to-innermost layer if the module traversed in the innermost layer is non-operational, and there must be no association with a vertex from a reconstructed photon conversion [106] in the detector material (denoted as 'material conversion').
Muon candidates are reconstructed by combining tracks in the ID with tracks in the MS [107]. The resulting muon candidates are re-fit using the complete track information from both detector systems [108]. They are required to satisfy T > 10 GeV and | | < 2.5. Loose and medium muon identification working points are used [108].
Electron (muon) candidates are matched to the primary vertex by requiring that their transverse impact parameter, 0 , satisfies | 0 / ( 0 )| < 5 (3), where ( 0 ) is the measured uncertainty in 0 , and requiring that the longitudinal impact parameter, 0 , satisfies | 0 sin | < 0.5 mm, where is the polar angle of the track.
To further suppress leptons from heavy-flavour hadron decays, misidentified jets, or photon conversions (collectively referred to as 'non-prompt leptons'), lepton candidates are also required to be isolated in the tracker and in the calorimeter [109]. A track-based lepton isolation criterion is defined by calculating the quantity = trk T , where the scalar sum includes all tracks (excluding the lepton candidate itself) within the cone defined by Δ < cut around the direction of the lepton. The value of cut is the smaller of min and 10 GeV/ ℓ T , where min is set to 0.2 (0.3) for electron (muon) candidates and where ℓ T is the lepton T . All lepton candidates must satisfy / ℓ T < 0.15. Additionally, electrons (muons) are required to satisfy a calorimeter-based isolation criterion: the sum of the transverse energy within a cone of size Δ = 0.2 around the lepton, after subtracting the contributions from pile-up and the energy deposit of the lepton itself, is required to be less than 20% (30%) of ℓ T . Muons are required to be separated by Δ > 0.2 from any selected jets (defined below). If two electrons are closer than Δ = 0.1, only the one with the higher T is considered. Electrons within Δ = 0.1 of a selected muon are removed.
The selection criteria described above greatly suppress the contribution from non-prompt leptons. However, several channels considered in this search have additional suppression requirements targeting the main non-prompt lepton types. Non-prompt leptons from hadron decays that contain bottom-and charm-quarks, denoted by 'heavy-flavour (HF) non-prompt leptons', are further rejected using a boosted decision tree (BDT) discriminant, referred to as the non-prompt lepton BDT [107,110], which uses isolation and displacement information associated with a track jet that matches the selected light lepton. Three working points (WPs) are used: Tight, VeryTight, and Tight-not-VeryTight. The first two provide a selection of prompt leptons with an efficiency for electrons (muons) that satisfy the calorimeter-and track-based Table 2: Description of the loose inclusive (" "), medium inclusive (" "), medium exclusive (" ex "), and tight (" ") lepton definitions. The electron * is required to fulfil, in addition to the corresponding lepton definition requirements, those corresponding to an internal or material conversion candidate. isolation criteria of about 70% (50%) for T ∼ 20 GeV and reaches a plateau of 90% at T ∼ 50 (55) GeV. The corresponding rejection factor 3 against leptons from the decay of -hadrons ranges from 6 to 18 (29 to 50) depending on the T and , after resolving ambiguities between overlapping reconstructed objects. The Tight-not-VeryTight WP allows to select non-prompt leptons and is part of the event selection of the control regions enriched in HF non-prompt lepton background, as described in Section 6.
To further suppress electrons with incorrect charge assignment, a BDT discriminant based on calorimeter and tracking quantities [111] is used. An efficiency of approximately 96% in the barrel region and 81% in the endcaps is obtained, with rejection factors of 19 in the barrel region and 40 in the endcaps. Material and internal conversion candidates are identified based on a combination of requirements on the invariant mass of tracks and the radius from the reconstructed displaced vertex to the primary vertex. Material conversion candidates have a reconstructed displaced vertex with radius > 20 mm that includes the track associated with the electron. 4 The invariant mass of the associated track and the closest (in Δ ) opposite-charge track, calculated at the conversion vertex, is required to be less than 100 MeV. Internal conversion candidates, which correspond to the internal photon conversions (see Section 3), must fail the requirements for material conversions, and the invariant mass of the track pair, calculated at the primary vertex, is also required to be less than 100 MeV.
The lepton working points used in this analysis are summarised in Table 2. After the initial categorisation based on loose leptons (corresponding to " "), the most optimal lepton working point to further optimise the event selection is chosen depending on the main background processes and the expected number of events in each category. The defined working points are medium inclusive (" "), medium exclusive (" ex "), and tight (" "). The various choices can be seen for the signal and control regions in Section 5. Electron candidates from internal or material conversion are rejected from the , ex , and electron selections. They are used to define control regions enriched in internal or material conversions, and are collectively denoted * (see Section 5).
The constituents for jet reconstruction are identified by combining measurements from both the ID and the calorimeter using a particle flow (PFlow) algorithm [112,113]. Jet candidates are reconstructed from 3 The rejection factor is defined as the reciprocal of the efficiency. 4 The beampipe and insertable B-layer inner radii are 23.5 mm and 33 mm, respectively. these PFlow objects using the anti-algorithm [114, 115] with a radius parameter of = 0.4. They are calibrated using simulation with corrections obtained from in situ techniques in data [113]. Only jet candidates with T > 25 GeV that satisfy | | < 2.5 are selected. To reduce the effects of pile-up, each jet with T < 60 GeV and | | < 2.4 is required to satisfy the "Tight" working point of the Jet Vertex Tagger (JVT) [116] criteria used to identify the jets as originating from the selected primary vertex. A set of quality criteria is also applied to reject events containing at least one jet arising from non-collision sources or detector noise [117].
Jets containing -hadrons are identified ( -tagged) via the DL1r algorithm [118,119] that uses a deeplearning neural network based on the distinctive features of the -hadrons in terms of the impact parameters of tracks and the displaced vertices reconstructed in the ID. Additional input to this network is provided by discriminant variables constructed by a recurrent neural network [120], which exploits the spatial and kinematic correlations between tracks originating from the same -hadron. For each jet, a value for the multivariate -tagging discriminant is calculated. A jet is -tagged if the -tagging score is above a certain threshold, referred to as a working point (WP). Four WPs are defined with average expected efficiencies for -jets of 60%, 70%, 77% and 85%, as determined in simulated¯events. The -tagging distribution obtained by ordering the resulting five exclusive bins from the four WPs from higher to lower -jet efficiency is referred to as "pseudo-continuous" -tagging score, and it is used as input to the multivariate analysis discriminant described in Section 5. In this search, a jet is considered -tagged if it passes the WP corresponding to 77% or 60% efficiency to tag a -jet, with a light-jet 5 rejection factor of about 200 or 2500, and a charm-jet ( -jet) rejection factor of about 6 or 40, as determined for jets with T > 20 GeV and | | < 2.5 in simulated¯events [118]. Correction factors derived from dedicated calibration samples enriched in -jets, -jets, or light jets, are applied to the simulated samples [121-123]. The notation 77% and 60% is used to denote the number of -tagged jets with the corresponding WP.
Ambiguities between independently reconstructed electrons, muons and jets can arise. A sequential procedure, referred to as 'overlap removal', is performed to resolve these ambiguities and, thus, avoids double counting of particle candidates. This procedure is applied to leptons satisfying the criteria. If two electrons are closer than Δ = 0.1, only the one with the higher T is considered. If an electron and a muon overlap within Δ = 0.1, the muon is removed if it is reconstructed from a track and calorimeter deposits consistent with a minimum ionising particle, else the electron is removed. If an electron and a selected jet are within Δ < 0.2 of each other, the jet is removed if it is not a -tagged jet 6 or if it has T > 200 GeV. Muons are required to be separated by Δ > 0.4 from any jet that is ghost-associated [124] to it. If the jet satisfying the Δ < 0.4 requirement is not a -tagged jet and contains less than three tracks with T > 500 MeV, the overlapping jet is rejected from the event, otherwise, the muon is rejected. A lepton lying within a variable-size cone depending on the lepton T and with a maximum radius of = 0.4 around a selected jet that survived all previous overlap criteria is rejected.
The missing transverse momentum ì miss T (with magnitude miss T ) is defined as the negative vector sum of the T of all selected and calibrated objects in the event, including a term to account for the momentum from soft particles that are not associated with any of the selected objects [125]. This soft term is calculated from inner-detector tracks matched to the selected primary vertex, which makes it more resilient to contamination from pile-up interactions. The miss T distribution is used as an input variable to the machine learning training discussed in Section 5.

Search strategy
Events are required to satisfy a minimal preselection and are categorised into orthogonal signal regions (SRs) based on different criteria such as number of leptons, total lepton charge (indicated by ), and a multi-output deep neural network classifier (DNN cat ). This categorisation provides a set of regions that are sensitive to all the possible signal production and decay modes considered in this search. A deep neural network is trained in each of the signal regions to discriminate the signal from the backgrounds (DNN SB ). Additional orthogonal control regions (CRs) are defined in order to fit the normalisation of the main backgrounds. Dedicated kinematic selections are applied to the control regions to improve the purity of the targeted backgrounds. A maximum-likelihood fit is performed across categories to test for a possible signal and constrain in-situ the leading backgrounds simultaneously.
At trigger level, events were selected for read-out using a combination of single-lepton and dilepton triggers, requiring the electrons or muons to satisfy identification criteria similar to those used in the offline reconstruction and isolation requirements [126,127]. Single-electron triggers require a minimum T threshold of 24 (26)  For the analysis selection, at least two jets and at least two leptons are required in the event, and leptons are required to match, with Δ < 0.15, the corresponding leptons reconstructed by the trigger and to have a T exceeding the trigger T threshold by 1 GeV. Events are required to contain at least one -tagged jet with the 60% efficiency working point, or at least two -tagged jets with the 77% efficiency working point. If events contain pairs of opposite-sign charge and same-flavour leptons (OS-SF), all pairs are required to satisfy a mass requirement on the dilepton system mass of OS−SF ℓ + ℓ − > 12 GeV and | OS−SF ℓ + ℓ − − | > 10 GeV. Three disjoint event categories are defined according to the number of loose leptons in the event: same-charge dilepton (2ℓSS), three-lepton (3ℓ), and four-lepton (4ℓ) categories. The four-lepton category is inclusive and contains events with higher lepton multiplicity, while the other two are exclusive. Leptons are ordered by T in the 2ℓSS and 4ℓ regions. In the 3ℓ regions the lepton with opposite-sign charge is taken first, followed by the two same-sign leptons in T order. The T and identification requirements of each lepton in each category are optimised based on a compromise between non-prompt lepton background suppression and signal acceptance enhancement, and are summarized in Table 3.
Multiple control regions (CRs) are defined in order to fit the normalisation of the leading backgrounds. These regions are orthogonal to the signal regions and with one another based on different requirements on the lepton working points, dilepton invariant mass, and jet and -jet multiplicities. Two regions enriched in diboson and¯are defined by requiring one OS-SF pair compatible with a boson, GeV, differing in the jet multiplicity requirement. Two control regions enriched in photon conversions from → * (→ ) are defined, according to the identification of the electron as a material conversion or internal conversion candidate. Finally, six control regions are defined enriched in HF non-prompt leptons, making use of the exclusive lepton identification ex to be orthogonal to the signal regions. Events with two same-sign leptons are split according to the criteria ( , ex ), ( ex , ), ( ex , ex ) for the leading and subleading leptons in T , and further split according to the fake-lepton-candidate flavour. The fake-lepton candidate is assumed to be the subleading lepton in the ( , ex ), ( ex , ex ) regions, and the leading lepton in the ( ex , ) region. This splitting creates six control regions sensitive Table 3: Event selection summary in the signal regions. Leptons are ordered by T in the 2ℓSS and 4ℓ regions. In the 3ℓ regions the lepton with opposite-sign charge is taken first, followed by the two same-sign leptons in T order. In the lepton selection, T, M, L stand for Tight, Medium and Loose lepton definitions. In the region naming, the "CAT ttX" denotes the category based on the DNN cat output enriched in the signal process "ttX". Each of these regions is split according to the lepton charge of the same-sign lepton pair ("++" or "--").
Lepton category 2ℓSS 3ℓ 4ℓ Lepton definition (20, 20) (10, 20, 20) (10, 10, 10, 10) to different relative composition of electron and muon non-prompt lepton backgrounds. Additionally, the transverse mass of the leading lepton and the missing transverse energy, , is required to be lower than 250 GeV in the ( , ex ) and ( ex , ) regions to reduce the¯contribution in these CRs. The full definition of the kinematic selection applied to each control region is given in Table 4. Figure 3 illustrates the categorisation and definition of the signal and control regions that are fit simultaneously. The signal contamination is found to be at most 3% of the total prediction in the control regions, assuming = 400 GeV and = 0.4, = 0.2, and = 0.2.
In order to better target each of the possible signals, a DNN cat is trained to identify each of the five possible production and decay modes of the g2HDM signal. Two DNN cat are trained individually for the 2ℓSS and 3ℓ channels using the K library [128] with T as a backend [129] and Adam optimiser [130]. Hyperparameters are optimised with the Talos library [131]. The networks consist of nine input features, two dense fully connected layers of 33 nodes with rectified linear units as activation functions, interleaved Table 4: Event selection summary in the control regions. The notation * is used to denote material conversion or internal conversion candidates, as described in Section 4. In the HF non-prompt lepton region naming, "2ℓSStt(e)" ("2ℓSStt( )") refers to the control region enriched in non-prompt electrons (muons) from semileptonic -decays originating mostly from¯and with the lepton flavours for the leading and subleading leptons corresponding to " , " (" , "). The additional ( , ex ), ( ex , ), and ( ex , ex ) subscripts refer to the lepton definitions required for the leading and subleading leptons in each region.

Control regions
WZ¯Conversions HF non-prompt (10,20,20) (20, 20) Sum of pseudo-continuous -tagging scores of jets Pseudo-continuous -tagging score of 1st, 2nd, 3rd leading jet in T Sum of T of the jets and leptons ( T,jets , T,lep ) Angular distance of leptons (sum in the case of 3ℓ and 4ℓ) Missing transverse energy Leading transverse momentum of jet -Invariant mass of leading lepton and missing transverse energy -Di/tri/quad-lepton type variable (associated with the number of electrons/muons in event) with a drop-out layer with 20% rate, and five (three) output nodes with a soft-max activation function for the categorisation of 2ℓSS (3ℓ) events. The output categories correspond to the five production modes considered, ignoring in the 3ℓ category signals that cannot produce three leptons. Each event is categorised according to the highest class probability. The nine input features are the number of jets, -tagging score of the three leading jets, sum of -tagging score of all jets, sum of all pair-wise angular distances between leptons, scalar sum of jet T , scalar sum of lepton T , and the event miss T . The network is trained with batch size of 2000 and up to 100 epochs, using all the available signal mass points. To avoid discarding signal events in the evaluation, cross-training is used with the events divided by even/odd event number.
Since several of the probed signal processes are expected to be charge-asymmetric, all the 2ℓSS and 3ℓ   Table 3. At high signal mass, a strong migration is observed from the ttt to the tttq category, due to the high probability of additional radiation. Figures 4(c) and 4(d) show the expected fractional signal contribution in each category for the benchmark coupling. The signals originating from top-Higgs associated production (ttq and ttt) are expected to dominate across all regions, including the categories designed to target other processes, due to the much larger production cross section. This contribution is however strongly dependent on the coupling choice. For the benchmark coupling of = 0.4, = 0.2, = 0.2, decays to top-quark pairs dominate when not suppressed by the available phase-space.
A total of 27 analysis regions are defined, with 17 signal regions (10 with 2ℓSS, 6 with 3ℓ, and one 4ℓ) and 10 control regions. In each region, a given kinematic variable is fit to improve the sensitivity to the targeted signal process (signal regions) or to improve the modelling of a particular background process (control regions).
A DNN SB classifier is trained in each signal region to separate the targeted signal from the sum of backgrounds. The networks consist of 12 input features, two dense fully connected layers of 36 and 48 nodes respectively with sigmoid activation functions, interleaved with a drop-out layer with 40% rate, and one output node with a sigmoid activation function. The 12 input features are the leading jet T , number of muons, transverse mass of leading lepton and miss T system, and the nine variables that are used in the      Table 5 summarises the input variables used for each multivariate discriminant. To achieve good sensitivity over the large range of masses that are tested, the output of the classifier is decorrelated from the signal mass introducing an additional term to the loss function via distance correlation [132,133]. A hyperparameter controls the weight of the additional penalty term, with a value of = 0.5. The value was optimised to achieve a minimal signal mass dependence without compromising the discrimination power. A separate training is performed in each lepton category and signal category. The same DNN SB is used in both positive-and negative-charge regions. Figure 5 shows the DNN SB distribution of the targeted signal in each signal-enriched category, the total signal, and the background in the 2ℓSS ++ CAT sstt, 2ℓSS ++ CAT ttq, 3ℓ ++ CAT ttt, 3ℓ ++ CAT tttq, 3ℓ ++ CAT tttt, and 4ℓ categories.
In the diboson and¯control regions the fitted variable is the -jet multiplicity, −jets , where the distribution is binned with an upper limit of ≥ 2 -jets and ≥ 3 -jets respectively. The subleading lepton

Background estimation
The background processes passing the signal region selections are categorised into irreducible and reducible backgrounds. Irreducible backgrounds (Section 6.1) produce prompt leptons in their decay, i.e. leptons originating from / boson decays, leptonic -lepton decays, or internal conversions. Reducible backgrounds (Section 6.2) have prompt leptons with misassigned charge or at least one non-prompt lepton.
Except for the background from electrons with misassigned charge (denoted as QMisID), all other backgrounds are estimated using the simulated samples described in Section 3. In some cases, the simulation is improved using additional corrections derived from data control samples before the simultaneous fit to data. In particular, the event kinematics of the simulated¯and backgrounds require dedicated corrections to better describe the data. In addition, the yields of some simulated backgrounds, in particular ,¯, and non-prompt-lepton backgrounds, are adjusted via normalisation factors that are determined by performing a likelihood fit to data across all event categories (signal and control regions as defined in Tables 3 and 4) as discussed in Section 8.

Irreducible backgrounds
Background contributions with prompt leptons originate from a wide range of physics processes with the relative importance of individual processes varying by channel. The main irreducible backgrounds originate from¯,¯¯, and¯/ * production, followed by (in particular ) and¯ℎ production, and have final states and kinematic properties similar to the g2HDM signal. Smaller contributions originate from the following rare processes: , , ,¯, , and¯production.

6.1.1¯background
The¯background represents the leading background in several event categories. Despite the use of state-of-the-art simulations, accurate modelling of additional QCD and QED radiation in¯production remains challenging. Given the excellent discriminating power of the DNN SB in the signal regions, the events at lower values of the DNN SB score are enriched in and sensitive to the¯background. Additionally, the signal regions in the 2ℓ and 3ℓ categories are split by the sign of the total lepton charge ( ) to better discriminate some g2HDM signal processes and the¯process, which have a large charge asymmetry, from other SM backgrounds that are charge symmetric. This discrimination improves the modelling of this background in the simultaneous fit. Finally, the DNN cat categories with negative total lepton charge, which are depleted in signals with large charged lepton asymmetry, provide additional constraints on theb ackground, in particular at high values of the DNN SB distribution tail.
Disagreement between the data and the prefit prediction from the simulation is observed, which is accommodated by an overall normalisation factor that is assigned to the¯background, and that is determined during the likelihood fit. The measured normalisation factor for the background-only hypothesis isˆ¯= 1.50 ± 0.14, which is compatible with that determined in the SM¯¯analysis [134], and with a previous measurement of the¯production cross section [135].   Figure 6: Comparison between data and the background prediction for the distribution of (a) the number of jets in the 3ℓVV0b region before the jet multiplicity correction and (b) the number of jets in a 3ℓ region with at least one jet and exactly one -jet defined with the 60% WP after the jet multiplicity correction. The ratio of the data to the background prediction ("Pred.") is shown in the lower panel. The size of the combined statistical and systematic uncertainty in the background prediction is indicated by the blue hatched band. The last bin in each figure contains the overflow.

and¯/ * backgrounds
A comparison of the simulated sample with data showed the need for additional improvements in the modelling of the jet multiplicity spectrum. Therefore, a data-driven correction to the jet multiplicity spectrum is derived from an inclusive trilepton diboson-enriched region with zero -jets defined with the 85% WP for -jet efficiency and at least one jet (denoted as 3ℓVV0b region). The events are required to have three leptons that satisfy the same selection as in the 3ℓ CR.
Figure 6(a) shows the jet multiplicity distribution in the 3ℓVV0b region before the correction. After the correction is applied to , the modelling of the jets distribution improves significantly in a 3ℓ region with at least one jet and exactly one -jet defined with the 60% WP, as shown in Figure 6(b).
The 3ℓVV and 3ℓttZ CRs are used in the likelihood fit to improve the prediction of the background contribution from the and¯/ * processes; these processes have purities of 15% and 75% in the corresponding CRs. The numbers of jets and -jets provide good discrimination between these two processes and are used to build the control regions (number of jets) and as variables in the fit (number of -jets). The measured normalisation factors for the background-only hypothesis are:ˆ= 0.85 ± 0.30 andˆ¯= 0.97 ± 0.19, where the uncertainty includes both statistical and systematic contributions.

Other irreducible backgrounds
The rate of the background from internal conversions with ( + − ) < 1 GeV is estimated using the two dedicated CRs, 3ℓIntC and 3ℓMatC, with a purity of 98% and 30%, respectively. The total yield in each category is used in the likelihood fit to determine the normalisation factor, which is measured for the background-only hypothesis to beˆI ntC = 1.06 ± 0.23, where the uncertainty is dominated by the statistical uncertainty.

Non-prompt leptons
Non-prompt leptons originate from material conversions, heavy-flavour hadron decays, or the improper reconstruction of other particles, with an admixture that depends strongly on the lepton quality requirements and varies across event categories. These backgrounds are small in all 2ℓ and 3ℓ SRs and thus are estimated from simulation, with the normalisation determined by the likelihood fit. The non-prompt lepton background contribution in the 4ℓ SR is negligible and is therefore taken from simulation without dedicated data-driven corrections. The main contribution to the non-prompt-lepton background is from¯production, with much smaller contributions from +jets and single-top-quark processes. The non-prompt leptons in the simulated samples are labelled according to whether they originate from heavy-flavour (HF) or light-flavour (LF) hadron decays, or from a material conversion candidate (Mat. Conv.). The HF category includes leptons from both bottom and charm decays.
Two corrections are applied to the¯and the overall non-prompt lepton background simulation before the fit. First, the¯+≥ 1 -jet contribution from simulation is known to be mismodelled and is therefore corrected by a factor of 1.3 as measured by a previous ATLAS analysis sensitive to the in-situ measurement of this contribution in the single-and opposite-sign dilepton final states [136]. This correction is well motivated since the mismodelling of additional -jets in¯is not expected to depend on the presence of additional non-prompt leptons in the event. Second, the shape of the -jet multiplicity in the non-prompt lepton background simulation is corrected to match data in an orthogonal 2ℓSS validation region enriched with non-prompt leptons, where one of the leptons must satisfy a looser requirement on the non-prompt lepton BDT score but fail the lepton WP criteria.
Several of the event categories introduced in Section 5 were designed to be enriched in specific processes and are used to derive normalisation factors to improve their modelling by the simulation. The 3ℓMatC CR is enriched in material conversions with a purity of 70% and only the total event yield is used. There are six 2ℓ CRs enriched in contributions from HF non-prompt leptons in¯events, i.e. 2ℓtt(e) ( , ex ) , 2ℓtt(e) ( ex , ) , 2ℓtt(e) ( ex , ex ) , 2ℓtt( ) ( , ex ) , 2ℓtt( ) ( ex , ) , and 2ℓtt( ) ( ex , ex ) . In these CRs, the transverse momentum of the fake-lepton-candidate distribution is used to be able to correct for a possible mismodelling in the T of the non-prompt lepton. The fake-lepton candidate is assumed to be the subleading lepton in the ( , ex ), ( ex , ex ) regions, and the leading lepton in the ( ex , ) region. The event requirement to have at least one ex lepton provides separation from the irreducible backgrounds, in particular¯, and thus increases the sensitivity to the HF non-prompt electron and muon contributions. Normalisation factors for three non-prompt-lepton background contributions are estimated from the likelihood fit. The normalisation factor for HF non-prompt leptons is estimated separately for electrons and muons, had and had , respectively. An additional normalisation factor is determined for the material conversions background, Mat Conv . The measured normalisation factors for the background-only hypothesis are:ˆh ad = 1.05 ± 0.31,ˆh ad = 0.92 ± 0.18, andˆM at Conv = 1.16 ± 0.29, where the uncertainties include systematic effects but are dominated by the statistical uncertainty.
Figures 8(a) and 8(b) display the fake-lepton-candidate T distribution in the 2ℓtt(e) ( , ex ) and 2ℓtt( ) ( , ex ) CRs after the likelihood fit to data. As shown in the figures, the purity of HF nonprompt lepton background is 45% and 55%, respectively, which was possible to achieve with the usage of the exclusive ex lepton working point.

Charge misassignment
Backgrounds with leptons with the charge incorrectly assigned affect primarily the 2ℓ channel and predominantly arise from¯production, where one electron udergoes a hard bremsstrahlung and an asymmetric conversion ( ± → ± * → ± + − ) or a mismeasured track curvature. The muon charge misassignment rate is negligible in the T range relevant to this analysis. The electron charge misassignment rate is measured in data using samples of → + − events reconstructed as same-charge pairs and as opposite-charge pairs, with the background subtracted via a sideband method [105].
The charge misassignment rate is parameterised as a function of electron T and | |. It varies from about 10 −5 for low-T electrons (17 ≤ T ≤ 50 GeV) that satisfy | | ≤ 1.37, to about 4 × 10 −3 for high-T electrons ( T ≥ 100 GeV) in the region 2 ≤ | | ≤ 2.47. To estimate the QMisID background in each of the corresponding event categories, the measured charge misassignment rate is then applied to data events   satisfying the requirements of the 2ℓ channels, except that the two leptons are required to be of opposite charge.

Background modelling
The modelling of some representative variables at preselection level is showed in Figure 9. The background prediction includes all the corrections previously described as well as the normalisation factors determined through a likelihood fit to data as discussed in Section 8. Good modelling is observed across all variables and lepton categories.

Systematic uncertainties
The signal and background yields in each signal and control region may be affected by several sources of systematic uncertainty, described in the following subsections. Given the low background yields and good signal-to-background separation provided by the final discriminating variable used in the signal-rich event categories, the search sensitivity is determined by the limited number of data events rather than by the systematic uncertainties on the background estimate. The final uncertainty in the background estimate in the SRs is dominated by the uncertainty in the fitted background normalisations, in particular¯. A summary of all systematic uncertainties included in the analysis is given in Table 6.
Uncertainties associated with the lepton selection arise from the trigger, reconstruction, identification and isolation efficiencies, and lepton momentum scale and resolution [105, 107, 111]. Uncertainties associated with the jet selection arise from the jet energy scale (JES), the JVT requirement and the jet energy resolution (JER) [113,138].
The efficiency of the flavour-tagging algorithm is measured for each jet flavour using control samples in data and in simulation. From these measurements, correction factors are derived to correct the tagging rates in the simulation [121, 122, 139]. These systematic uncertainties are taken as uncorrelated between -jets, -jets, and light-flavour jets. An additional uncertainty is assigned to account for the extrapolation of the -tagging efficiency measurement from the T region used to determine the correction factors to regions with higher transverse momentum [140]. This uncertainty is the leading experimental uncertainty in the analysis.
The treatment of the uncertainties associated with reconstructed objects is common to all analysis channels, and thus these are considered as fully correlated among different analysis regions.

Theoretical uncertainties
The modelling uncertainties on the main irreducible backgrounds are assessed through comparisons with alternative MC samples, as listed in Table 1. Additional uncertainties are evaluated from renormalisation and factorisation scale variations by a factor of 0.5 and 2, relative to the nominal scales, for the¯,¯, and diboson samples. An additional 20% uncertainty is assigned to the¯electroweak contribution [141]. An additional 50% uncertainty is assigned to¯,¯, and¯events with additional heavy-flavour jets, following Ref. [34]. This normalisation uncertainty is not applied to diboson events with heavy-flavour since its normalisation is fit to data. The statistical uncertainty on the fitted parameters for the jet-multiplicity correction is propagated as an uncertainty on the diboson background. The leading theoretical uncertainties arise from¯modelling and additional heavy-flavour uncertainties.
Finally, additional normalisation uncertainties are included for all processes whose normalisation is not obtained from the fit. The¯¯,¯ℎ, and processes are assigned an uncertainties of 20% [93], 11% [95], and 5% [142], respectively. As a conservative estimate, a 50% cross section uncertainty is assigned to thē , ,¯, and triboson backgrounds, which are small backgrounds with low impact on the search.
Uncertainties on the modelling of the signal samples are evaluated through independent variations of the factorisation and renormalisation scales by a factor of two. Additional uncertainties due to PDF effects are estimated through an ensemble of eigenvariations of the NNPDF set, and by taking the differences with respect to alternative PDF sets [143].

Non-prompt lepton uncertainties
The normalisation of HF non-prompt leptons is obtained from regions including at least one ex lepton and extrapolated to the signal regions where the same-sign leptons fulfil the or identification requirements. An uncertainty of 20% on the extrapolation from ex to leptons is applied from the comparison of the relative efficiency between nominal and alternative¯MC samples. An additional 50% uncertainty is assigned to events originating from¯+≥ 1 and¯+≥ 1 , decorrelated between flavours. Validation regions with looser lepton requirements and further enriched in non-prompt leptons are defined. A good agreement between data and background prediction is observed in all kinematic variables except for the number of -jets. Based on this disagreement, an −jets -dependent uncertainty is added to the HF non-prompt background ranging from 6%-40% for 1-3 additional -jets in the non-prompt muon regions, and 10%-80% in the non-prompt electron regions.
The modelling of internal and material conversions is tested in dedicated validation regions with two tight same-sign leptons, requiring one of them to be a conversion candidate. Additional uncertainties of 10% and 50% are assigned to the material and internal conversion backgrounds, respectively, evaluated from the data to background agreement in the validation regions.
A systematic uncertainty of 10%-60% is assigned to the background from electrons with misidentified charge. The uncertainty increases with electron T and decreases with | |. The uncertainty is assessed combining the uncertainties from the measurement of the charge misassignment rate, the difference in rates from varying the window selection, and the different rates measured in data and → + − MC.

Results
A maximum-likelihood fit is performed on all bins in the 27 signal and control regions considered in this search to simultaneously determine the background and the g2HDM signal yields that are most consistent with the data. The DNN SB is used as the discriminating variable in the signal regions, whereas the −jets , fake-lepton-candidate T and event yields are fit in the control regions. The sum of all the g2HDM signal processes studied here (sstt, ttq, ttt, tttq, tttt) is considered as a single signal template and its acceptance in each category is predicted by the simulation.
The likelihood function L ( , ì , ì ) is constructed as a product of Poisson probability terms over all bins considered in the search, and depends on the signal-strength parameter, , a multiplicative factor applied to the predicted yield for the g2HDM signal (depending on the coupling configuration , , and on the assumed mass ), ì , the normalisation factors for several backgrounds (see Section 6), and ì , a set of nuisance parameters (NP) encoding systematic uncertainties in the signal and background expectations [144]. Systematic uncertainties can impact the estimated signal and background rates, the migration of events between categories, and the shape of the fitted distributions; they are summarised in Table 6. Both and ì are treated as free parameters in the likelihood fit. The NPs ì allow variations of the expectations for signal and background according to the systematic uncertainties, subject to Gaussian or Poisson constraints in the likelihood fit. Their fitted values represent the deviations from the nominal expectations that globally provide the best fit to the data. Statistical uncertainties in each bin due to the limited size of the simulated samples are taken into account by dedicated parameters using the Beeston-Barlow "lite" technique [145].
The test statistic is defined as the profile likelihood ratio: = −2 ln(L ( ,ì ,ì )/L (ˆ,ìˆ,ìˆ)), whereˆ,ìˆ, andìˆare the values of the parameters that maximise the likelihood function, andì and ì are the values of the parameters that maximise the likelihood function for a given value of . The test statistic is evaluated with the RooFit package [146]. A related statistic is used to determine the probability that the observed data are incompatible with the background-only hypothesis (i.e. the discovery test) by setting = 0 in the profile likelihood ratio ( 0 ). The -value (referred to as 0 ) representing the probability of the data being compatible with the background-only hypothesis is estimated by integrating the distribution of 0 from background-only pseudo-experiments, approximated using the asymptotic formulae given in Ref. [147], above the observed value of 0 . Some model dependence exists in the estimation of the 0 , as a given signal scenario must be assumed in the calculation of the denominator of 0 , even if the overall signal normalisation is allowed to float and is fit to data. The observed 0 is checked for each explored signal scenario. Upper limits on the signal production cross section for each of the signal scenarios considered are derived by using in the CL s method [148,149]. For a given signal scenario, values of the production cross section (parameterised by ) yielding CL s < 0.05, where CL s is computed using the asymptotic approximation [147], are excluded at ≥ 95% confidence level (CL).
The smallest 0 value is observed when assuming a signal with = 900 GeV and couplings =0.6, =0.0, and =1.1, corresponding to a local significance of 2.8 standard deviations. The fitted signal strength is = 0.07 ± 0.03, pointing to an incompatibility of the model prediction with the size of the excess or else the need for additional undetected decay modes taking up 93% of the branching ratio. The signal cross section resulting from the fit to data for this g2HDM signal hypothesis is 154 fb, with fractional contributions of 55% ttq, 31% sstt, and 14% ttt. The signal with the fitted signal strength closest to unity ( = 0.9 ± 0.4) corresponds to = 900 GeV and couplings =0.2, =0.4, and =0.4 and a local significance of 2.4 . Figure 10 shows the local significance as a function of the three couplings normalised to the sum of the couplings. This normalisation eliminates one degree of freedom related to the total normalisation of the signal, which is not relevant for the computation of the significance. However, a residual dependency on the actual value of the coupling remains as the normalization of the sstt process scales as the fourth power of the couplings, while the rest of the processes scale as a function of the couplings squared.
A comparison of the distributions of observed and expected yields is shown Figure 11(a) for the 17 SRs, and Figure 11(b) for the 10 CRs, after the combined likelihood fit for the signal-plus-background hypothesis. The corresponding post-fit yields for the SRs can be found in Tables 7, 8, and 9 for the 2ℓSS positively charged, 2ℓSS negatively charged, and 3ℓ and 4ℓ SRs, respectively. The signal shown in the figures and tables is the g2HDM signal with couplings =0.6, =0.0, and =1.1, and mass of 900 GeV, which corresponds to the largest observed significance above the background only hypothesis. Good agreement between the data and fitted signal-plus-background yields is found across all event categories.
The systematic uncertainties with the largest impact on the signal strength originate from the modelling of¯with and without additional heavy flavour jets,¯,¯ℎ, and¯¯processes. The signal strength is partially anti-correlated with¯, with a linear correlation value of −35%. The search is dominated by statistical uncertainties.
Comparisons between data and the background prediction for the DNN SB distributions used in the different SRs is shown in Figures 12 and 13. The binning used for the DNN SB distributions in the different SRs represents a compromise between preserving enough discrimination in the fit between the background and the signal for the different values of the heavy mass considered and keeping the statistical uncertainty   Four QMisID tZ Other Uncertainty Pre-Fit B.
(b) Figure 11: Comparison between data and the background prediction for the event yields in (a) the 17 signal region categories and (b) the 10 control region categories. The expected signal for = 900 GeV and couplings =0.6, =0.0, and =1.1, along with the background contributions, is shown after the likelihood fit to data ("Post-Fit") for the signal-plus-background hypothesis. The total background prediction before the likelihood fit to data ("Pre-Fit") is shown as a dashed blue histogram in the upper panel. The ratio of the data to the total prediction is shown in the lower panel, separately for post-fit signal-plus-background (black points) and pre-fit background (dashed blue line). The size of the combined statistical and systematic uncertainty in the background prediction is indicated by the blue hatched band. of the background prediction per bin well below 30%. The signal regions with the largest pre-fit tension between data and the background yields (shown in the blue dashed line) at high values of the DNN SB are the 2ℓSS ++ CAT tttq, the 2ℓSS ++ CAT tttt, the 2ℓSS ++ CAT sstt, and the 2ℓSS ++ CAT ttq regions. Within this model, theˆ¯remains higher than 1, as observed by other analyses [33,42,[150][151][152]. Since the largest discrepancies between data and the background expectation before the fit are observed in signal categories with positive total lepton charge, this tension cannot be explained by the lepton-charge-symmetric SM¯¯production. The goodness-of-fit based on the saturated model [153] for the best fit g2HDM signal plus background is 62%, which shows a better fit to data than the background-only hypothesis with a goodness-of-fit of 45%.
Exclusion limits on the heavy Higgs boson mass are set for different choices of the couplings, as shown in Figure 14 . The excluded mass is also presented as a function of two couplings for different assumptions, as shown in Figure 15.
The search is also used to set limits on RPV SUSY models using the existing DNNs that were trained for the g2HDM model. Figures 16(a) and 16(b) show the exclusion limits obtained on the higgsino and wino models, respectively. Higgsinos (winos) with masses up to 585 (670) GeV are excluded. Figure 17 shows limits on the smuon-bino model. Smuon masses up to 460 GeV are excluded, with weaker exclusion limits for small mass splittings between the smuon and the lightest SUSY particle (LSP), or for LSP masses close to the top-quark threshold.                                 1, the latter corresponding to the couplings yielding the most significant excess. The yellow and green contours of the band around the expected limit are the ±1 and ±2 variations including all uncertainties, respectively. The theoretical prediction for the signal production cross section is also shown as a red line. The production cross section is the sum of the five production modes considered in the search.  Figure 17: Expected and observed exclusion limits on the smuon plus bino RPV SUSY model. The yellow and green contours of the band around the expected limit are the ±1 and ±2 variations including all uncertainties, respectively. The diagonal grey lines indicates the allowed kinematic limit for the decays. The neutralino is assumed to decay to or with equal probability.

Conclusion
A search for a general two Higgs doublet model is presented, where the heavy Higgs bosons feature flavour changing couplings. Such couplings allow for same-sign top and three-top production among others, with a sizeable charge asymmetry. The targeted final state is characterised by multiple leptons and multiple -jets.
To improve the sensitivity of the search, events are categorised according to the lepton multiplicity, total lepton charge, and a multi-output deep neural network classifier. The dominant backgrounds originate from ,¯, and¯, and are estimated from Monte-Carlo simulation and normalised to data. The analysis is performed with proton-proton collision data at                     [125] ATLAS Collaboration, Performance of missing transverse momentum reconstruction with the ATLAS detector using proton-proton collisions at