Search for top squarks decaying via four-body or chargino-mediated modes in single-lepton final states in proton-proton collisions at $\sqrt{s} =$ 13 TeV

A search for the pair production of the lightest supersymmetric partner of the top quark ($\widetilde{\mathrm{t}}_1$) is presented. The search focuses on a compressed scenario where the mass difference between the top squark and the lightest supersymmetric particle, often considered to be the lightest neutralino ($\widetilde{\chi}^0_1$), is smaller than the mass of the W boson. The proton-proton collision data were recorded by the CMS experiment at a centre-of-mass energy of 13 TeV, and correspond to an integrated luminosity of 35.9 fb$^{-1}$. In this search, two decay modes of the top squark are considered: a four-body decay into a bottom quark, two additional fermions, and a $\widetilde{\chi}^0_1$; and a decay via an intermediate chargino. Events are selected using the presence of a high-momentum jet, significant missing transverse momentum, and a low transverse momentum electron or muon. Two analysis techniques are used, targeting different decay modes of the $\widetilde{\mathrm{t}}_1$: a sequential selection and a multivariate technique. No evidence for the production of top squarks is found, and mass limits at 95% confidence level are set that reach up to 560 GeV, depending on the $m(\widetilde{\mathrm{t}}_1) - m(\widetilde{\chi}^0_1)$ mass difference and the decay mode.


Introduction
Searches for new phenomena, in particular supersymmetry (SUSY) [1][2][3][4][5][6], are among the main objectives of the physics programme at the CERN LHC. Supersymmetry, which is one of the most promising extensions of the standard model (SM), predicts superpartners of SM particles, where the spin of each new particle differs by one-half unit with respect to its SM counterpart. If R-parity [7], a new quantum number, is conserved, supersymmetric particles would be pair-produced and their decay chains would end with the lightest supersymmetric particle (LSP). Supersymmetric models can offer solutions to several shortcomings of the SM, in particular those related to the explanation of the mass hierarchy of elementary particles [8,9] and to the presence of dark matter in the universe. The search for SUSY is of special interest in view of the recent discovery of the Higgs boson [10][11][12], as SUSY naturally solves the problem of quadratically divergent loop corrections to the mass of the Higgs boson by associating with each SM particle a supersymmetric partner having the same gauge quantum numbers. In many models of SUSY, the lightest neutralino $\widetilde{\chi}^0_1$ is the LSP and, being neutral and weakly interacting, would match the characteristics required for a dark matter particle.
Supersymmetry predicts a scalar partner for each SM left- and right-handed fermion. When SUSY is broken, the scalar partners acquire masses different from those of their SM counterparts, and the mass splitting between the two squark mass eigenstates is proportional to the mass of their SM partner. Given the large mass of the top quark, this splitting can be the largest among all squarks. Therefore the lightest supersymmetric partner of the top quark, the $\widetilde{\mathrm{t}}_1$, is often the lightest squark. Furthermore, if SUSY is a symmetry of nature, cosmological observations may suggest the lightest top squark to be almost degenerate with the LSP [13]. This motivates the search for a four-body $\widetilde{\mathrm{t}}_1$ decay, $\widetilde{\mathrm{t}}_1 \to \mathrm{b} f \bar{f}' \widetilde{\chi}^0_1$, where the fermions $f$ and $\bar{f}'$ can be either quarks or leptons. Here, due to the small mass difference between the $\widetilde{\mathrm{t}}_1$ and the $\widetilde{\chi}^0_1$, the two-body ($\widetilde{\mathrm{t}}_1 \to \mathrm{t} \widetilde{\chi}^0_1$, $\widetilde{\mathrm{t}}_1 \to \mathrm{b} \widetilde{\chi}^+_1$) and three-body ($\widetilde{\mathrm{t}}_1 \to \mathrm{b} \mathrm{W}^+ \widetilde{\chi}^0_1$) decays of the lightest top squark are kinematically forbidden, and the two-body ($\widetilde{\mathrm{t}}_1 \to \mathrm{c} \widetilde{\chi}^0_1$) decay can be suppressed depending on the details of the model. Alternatively, the decay $\widetilde{\mathrm{t}}_1 \to \mathrm{b} \widetilde{\chi}^+_1 \to \mathrm{b} f \bar{f}' \widetilde{\chi}^0_1$ is possible if the mass of the lightest chargino is lower than the top squark mass. Figure 1 represents the production of a pair of $\widetilde{\mathrm{t}}_1$ followed by a four-body or chargino-mediated decay in simplified models [14]. In this paper, we describe a search for pair production of the $\widetilde{\mathrm{t}}_1$ in proton-proton (pp) collisions at the LHC at $\sqrt{s} =$ 13 TeV, where each top squark can decay either directly, or via a chargino, into the $\mathrm{b} f \bar{f}' \widetilde{\chi}^0_1$ final state. A 100% branching fraction for each decay is assumed when interpreting the results [14]. The final states considered contain jets, missing transverse momentum ($p_\mathrm{T}^\text{miss}$), and exactly one lepton, which can be either an electron or a muon, originating from the decay of the top squark or the chargino, depending on the considered decay scenario.
The lepton can be efficiently reconstructed and identified with transverse momentum ($p_\mathrm{T}$) as low as 5.0 and 3.5 GeV for electrons and muons, respectively. In this search, we expand the result of a previous CMS search in pp collisions at $\sqrt{s} =$ 8 TeV [15] by including the single-electron final state and lowering the $p_\mathrm{T}$ thresholds for leptons. Moreover, two different approaches are used in this analysis. A signal selection based on sequentially applied requirements on several discriminating variables (CC) has been designed to provide good sensitivity over a wide range of kinematic signatures corresponding to different $(m(\widetilde{\mathrm{t}}_1), m(\widetilde{\chi}^0_1))$ mass hypotheses and different $\widetilde{\mathrm{t}}_1$ decay modes. The CC approach is applied to the four-body and chargino-mediated $\widetilde{\mathrm{t}}_1$ decay scenarios. In addition, a multivariate analysis (MVA) followed by a counting experiment approach is used for the signal selection. Applied to the four-body scenario, this approach exploits the correlations between discriminating variables and is adapted for different $\Delta m = m(\widetilde{\mathrm{t}}_1) - m(\widetilde{\chi}^0_1)$ kinematic regions, thus optimizing the search across the $(m(\widetilde{\mathrm{t}}_1), m(\widetilde{\chi}^0_1))$ space and improving upon the sensitivity of the CC approach for this scenario. Both approaches are based on a nearly identical preselection.
Other searches in the single-lepton final state, covering both the four-body and chargino-mediated $\widetilde{\mathrm{t}}_1$ decay modes, have been reported by the ATLAS and CMS Collaborations.

Detector and object definition
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity coverage provided by the barrel and endcap detectors. The silicon tracker measures charged particles within the pseudorapidity range $|\eta| < 2.5$. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. The detector is nearly hermetic, allowing for momentum balance measurements in the plane transverse to the beam axis. Events are selected for further analysis by a two-tier trigger system that uses custom hardware processors to make a fast initial selection, followed by a more detailed selection executed on a dedicated processor farm. A more detailed description of the CMS detector can be found in Ref. [22]. This analysis utilizes the CMS particle-flow (PF) algorithm [23] to reconstruct and identify PF candidates such as leptons (electrons and muons), photons, and charged and neutral hadrons. The reconstructed vertex with the largest value of summed physics-object $p_\mathrm{T}^2$ is taken to be the primary pp interaction vertex (PV). The physics objects are the jets, clustered using a jet finding algorithm [24,25] with the tracks assigned to the vertex as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the $p_\mathrm{T}$ of those jets.
The electron candidates are reconstructed from energy depositions in the ECAL and from tracks in the inner tracker obtained using the Gaussian-sum filter [26]. The misidentification of electrons is reduced by requiring additional constraints on the shape of the electromagnetic shower in the ECAL, the quality of the match between the trajectory of the track and the ECAL energy deposit, and the relative HCAL deposition in the electron direction. For reconstructing muons, the tracks in both the silicon tracker and the muon system are used [27]. The number of measurements in the tracker and muon system and the quality of the track fit are used to reduce the misidentification rate of muons. In order to select leptons ($\ell$ = e or $\mu$) from the primary interaction, the point of closest approach to the PV of tracks associated with the lepton is required to have transverse component $|d_{xy}| < 0.02$ cm and longitudinal component $|d_z| < 0.1$ cm with respect to the PV. In order to suppress the selection of nonprompt leptons, which may arise from jets produced in association with the invisible decay of a Z boson, multijet production, or W+jets and tt events with a lost lepton, selected leptons are required to be isolated from jet activity by using a combination of absolute and relative isolation variables. The absolute isolation ($I_\text{abs}$) of the lepton is defined as the scalar sum of the $p_\mathrm{T}$ of PF candidates within a cone of size $\Delta R \equiv \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2} = 0.3$. The lepton itself and charged PF candidates not associated with the PV are not included in the sum. The contribution of neutral particles from simultaneous pp collisions (pileup) is estimated according to the method described in Ref. [26], and subtracted from $I_\text{abs}$. The relative isolation ($I_\text{rel}$) of a lepton is defined as the ratio of the lepton $I_\text{abs}$ to the lepton $p_\mathrm{T}$. The electrons and muons are then required to satisfy $I_\text{abs} < 5$ GeV for $p_\mathrm{T}(\ell) < 25$ GeV and $I_\text{rel} < 0.2$ for $p_\mathrm{T}(\ell) > 25$ GeV.
This combined isolation criterion allows for a more uniform selection efficiency of leptons as a function of the lepton $p_\mathrm{T}$. Finally, the selected electrons and muons are also required to have $p_\mathrm{T}$ above 5.0 and 3.5 GeV, and $|\eta| < 2.5$ and 2.4, respectively. Hadronically decaying tau leptons are reconstructed from the PF candidates using the "hadrons-plus-strips" algorithm [28], which achieves an efficiency of 50-60%. The tau candidates are required to satisfy $|\eta| < 2.4$.
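For illustration, the combined isolation requirement described above can be expressed as a short function. This is a sketch with illustrative names, not part of the CMS software:

```python
# Combined isolation: absolute isolation for soft leptons (pT < 25 GeV),
# relative isolation for harder ones, as described in the text.

def passes_isolation(lepton_pt, i_abs):
    """Return True if the lepton passes the combined isolation criterion.

    lepton_pt : lepton transverse momentum in GeV
    i_abs     : pileup-corrected absolute isolation in GeV
    """
    if lepton_pt < 25.0:
        return i_abs < 5.0           # I_abs < 5 GeV
    return i_abs / lepton_pt < 0.2   # I_rel < 0.2
```

The switch at 25 GeV keeps the efficiency roughly flat versus lepton $p_\mathrm{T}$: a fixed absolute cut would be too tight for very soft leptons if expressed relatively.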
The jets used in this analysis are reconstructed by clustering PF candidates using the anti-$k_\mathrm{T}$ algorithm [24] with a distance parameter of 0.4. The missing transverse momentum vector, $\vec{p}_\mathrm{T}^{\,\text{miss}}$, is defined as the negative of the vectorial sum of the transverse momenta of all the PF candidates in the event, with its magnitude denoted as $p_\mathrm{T}^\text{miss}$. The pileup contribution to the jet momenta is partially taken into account by subtracting the energy of charged hadrons originating from a vertex other than the PV. The jet momenta are further calibrated to account for contributions from neutral pileup and any inhomogeneities of detector response [29]. Jets are required to have $p_\mathrm{T} > 30$ GeV and $|\eta| < 2.4$.
Jets originating from bottom (b) quarks are identified ("tagged") as "b jets" using the combined secondary vertex algorithm [30,31], which takes advantage of MVA techniques. The medium working point of this algorithm is used in the CC search, which has a probability of about 1% to misidentify a light quark jet as a b jet while correctly identifying a b jet with an efficiency of about 65%. The same figures for the loose working point, which is used in the MVA search, are 10% and 80%, respectively.

Data and simulated samples
The searches described in this paper are performed using data from pp collisions recorded in 2016 by the CMS experiment at the LHC at a centre-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb$^{-1}$. Events in the search are collected based on $p_\mathrm{T}^\text{miss}$ triggers with thresholds ranging between 90 and 120 GeV. Additional control samples used for estimating backgrounds are selected by single-lepton triggers with $p_\mathrm{T}$ thresholds of 24 and 27 GeV for muons and electrons, respectively.
In this analysis, Monte Carlo (MC) simulation samples of SM processes are used to relate background yields in control and signal regions, to validate the background estimation methods based on data, and to predict contributions from rare processes. Simulated samples are produced using multiple generators. The main background samples, namely W+jets, tt, and Z/γ * are generated at leading order (LO) by MADGRAPH5 aMC@NLO 2.3.3 [32]. Next-to-leading order (NLO) simulations with the POWHEG v2.0 [33] and POWHEG v1.0 [34] generators are used for single top quark production and the associated tW production, respectively. Diboson events are simulated at NLO with MADGRAPH5 aMC@NLO 2.3.3 and POWHEG v2.0. The LO and NLO NNPDF3.0 [35] parton distribution functions (PDFs) are used consistently with the order of the matrix element calculation in the generated events. Hadronization and showering of events in all generated samples have been simulated using PYTHIA 8.212 [36,37] with the CUETP8M1 [38] tune for the underlying event. The response of the CMS detector is modelled using the GEANT4 [39] program. Simulation and data events are reconstructed with the same algorithms. The effect of pileup is simulated in the MC samples in order to reproduce the observed pileup conditions in data.
The signal samples for the pair production of top squarks ($\widetilde{\mathrm{t}}_1 \widetilde{\mathrm{t}}_1^*$) are simulated for $250 \le m(\widetilde{\mathrm{t}}_1) \le 800$ GeV in steps of 25 GeV, and $10 \le \Delta m \le 80$ GeV in 10 GeV steps. The cross section for $\widetilde{\mathrm{t}}_1 \widetilde{\mathrm{t}}_1^*$ production at NLO and including next-to-leading logarithmic (NLL) corrections, as calculated by PROSPINO v.2 [40][41][42][43][44][45][46], varies approximately between 20 and 0.1 pb for the mass range considered. The pair production of squarks with up to two additional jets from initial-state radiation (ISR) is generated with MADGRAPH5 aMC@NLO 2.3.3 and is then interfaced with PYTHIA 8.212 for the decay, hadronization, and showering. For the chargino-mediated decay of the top squark, $m(\widetilde{\chi}^\pm_1)$ is taken to be the average of $m(\widetilde{\mathrm{t}}_1)$ and $m(\widetilde{\chi}^0_1)$. The decay is generated to proceed via an off-shell W boson, and the $\widetilde{\mathrm{t}}_1$ decay length is set to zero. The modelling of the detector response is performed with the CMS fast simulation program [47].
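The scan described above can be enumerated explicitly. The following sketch (our own helper, not analysis code) builds the grid of mass points, including the chargino mass convention for the chargino-mediated scenario:

```python
# Enumerate the simulated signal grid:
#   m(stop) from 250 to 800 GeV in 25 GeV steps,
#   dm = m(stop) - m(LSP) from 10 to 80 GeV in 10 GeV steps.
# For the chargino-mediated decay, m(chargino) is the average of
# the stop and neutralino masses (as stated in the text).

def signal_grid():
    points = []
    for m_stop in range(250, 801, 25):
        for dm in range(10, 81, 10):
            m_lsp = m_stop - dm
            m_chargino = 0.5 * (m_stop + m_lsp)
            points.append((m_stop, m_lsp, m_chargino))
    return points
```

With 23 top squark masses and 8 mass splittings, the grid contains 184 points.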
Simulated background and signal samples are corrected for differences between data and simulation in the selection efficiencies for leptons and b jets, and in the misidentification probability for light-quark and gluon jets as b jets, as measured in data control samples. These corrections are applied as functions of the $p_\mathrm{T}$ and $\eta$ of the objects. For the signal samples, additional corrections are applied to take into account any potential differences between the GEANT4 and fast simulations with regard to the tagging efficiencies for b jets and leptons, and the modelling of $p_\mathrm{T}^\text{miss}$.

Preselection
The preselection requirements used in this paper are designed by considering the general characteristics of the signal, and are based on the methods presented in Ref. [15]. The CC and MVA approaches share similar preselection requirements, with a few minor differences, noted below, where studies showed that a slightly different selection improves the performance of the MVA. In order to match the trigger requirement, events with $p_\mathrm{T}^\text{miss} > 200$ (280) GeV are selected for the CC (MVA) approach. This requirement favours the signal, which tends to have larger missing transverse momentum than SM processes because the two $\widetilde{\chi}^0_1$ particles escape detection. The efficiency of the signal triggers is measured to be higher than 90 (98)% for $p_\mathrm{T}^\text{miss} > 200$ (280) GeV, and the simulated samples are reweighted as a function of $p_\mathrm{T}^\text{miss}$ to account for the inefficiency.
Further suppression of SM processes such as W+jets is achieved by imposing the additional requirement of $H_\mathrm{T} > 300$ GeV, where $H_\mathrm{T}$ is defined as the scalar $p_\mathrm{T}$ sum of all jets. For the MVA search, this requirement is relaxed to $H_\mathrm{T} > 200$ GeV. In order to improve the separation of signal and SM background, we take advantage of events in which the $\widetilde{\mathrm{t}}_1$ pair system recoils against an ISR jet. In this case the LSP becomes Lorentz boosted, which increases the $p_\mathrm{T}^\text{miss}$ in the event, while jets and leptons remain relatively soft. The ISR jet candidate in the event is selected as the leading jet with $|\eta| < 2.4$, which is required to satisfy $p_\mathrm{T} > 100$ (110) GeV for the CC (MVA) search. To reduce the contribution from tt production, events are required to have at most two jets with $p_\mathrm{T} > 60$ GeV in the CC search. In events with two jets, the azimuthal angle between the leading and subleading (in $p_\mathrm{T}$) jets is required to be less than 2.5 radians. Finally, the soft single-lepton topology is selected by requiring at least one muon or electron in the event, while vetoing events with a $\tau$ lepton, or a second electron or muon with $p_\mathrm{T} > 20$ GeV. At this stage of the selection, the W+jets and tt processes represent approximately 70% and 20% of the total expected background, respectively. The Z($\to \nu\bar{\nu}$)+jets process contributes with jets, genuine $p_\mathrm{T}^\text{miss}$, and a jet misidentified as a lepton. Diboson, single top quark, and Drell-Yan (DY) processes also contribute, with a lower expected yield due to a low cross section or a low acceptance (or both).
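The CC preselection flow above can be sketched schematically. The event record and field names below are illustrative only, and the dijet azimuthal-angle requirement and the second-lepton and $\tau$ vetoes are assumed to have been applied upstream:

```python
# Schematic CC preselection (illustrative, not CMS analysis code).
# event: dict with "met" (GeV), "ht" (GeV), "jet_pts" (list of jet pT),
# and "n_soft_leptons" (soft e/mu count after vetoes).

def passes_cc_preselection(event):
    if event["met"] <= 200.0:          # p_T^miss > 200 GeV (trigger plateau)
        return False
    if event["ht"] <= 300.0:           # H_T > 300 GeV
        return False
    jets = sorted(event["jet_pts"], reverse=True)
    if not jets or jets[0] <= 100.0:   # leading (ISR) jet p_T > 100 GeV
        return False
    hard_jets = [pt for pt in jets if pt > 60.0]
    if len(hard_jets) > 2:             # at most two hard jets (anti-ttbar)
        return False
    if event["n_soft_leptons"] != 1:   # soft single-lepton topology
        return False
    return True
```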

The CC approach

Signal selection
After the preselection detailed in the previous section, W+jets is the dominant background process, followed by a smaller contribution coming from tt production. A kinematic variable with good discrimination against these background processes is the transverse mass,
$$M_\mathrm{T} = \sqrt{2\, p_\mathrm{T}(\ell)\, p_\mathrm{T}^\text{miss} \left(1 - \cos\Delta\phi\right)},$$
where $p_\mathrm{T}(\ell)$ is the transverse momentum of the selected lepton and $\Delta\phi$ is the azimuthal angle between the lepton $\vec{p}_\mathrm{T}(\ell)$ and $\vec{p}_\mathrm{T}^{\,\text{miss}}$. The distributions of lepton $p_\mathrm{T}$ and $M_\mathrm{T}$ are shown in Fig. 2 for the observed data and the simulated background and signal, where we observe good agreement in the shapes of the distributions between data and background simulation. The normalization of the simulation is corrected by the results of a background estimation technique based partially on data, as described in Section 4.2.
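The standard transverse-mass definition can be written directly in code (a minimal sketch with our own function name):

```python
import math

# Transverse mass of the lepton + missing-momentum system:
# M_T = sqrt(2 * pT(lepton) * pTmiss * (1 - cos(dphi))),
# where dphi is the azimuthal angle between the two vectors.

def transverse_mass(lep_pt, met, dphi):
    return math.sqrt(2.0 * lep_pt * met * (1.0 - math.cos(dphi)))
```

For a lepton back-to-back with $\vec{p}_\mathrm{T}^{\,\text{miss}}$ ($\Delta\phi = \pi$), $M_\mathrm{T}$ is maximal; for W decays it has a kinematic endpoint near the W boson mass, which is what makes it discriminating here.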
The signal regions (SRs) in the CC analysis are defined to maximize the sensitivity of the search by exploiting the differences between the kinematic properties of the final-state particles in the signal and background processes. The leptons originating from the decay of the $\widetilde{\mathrm{t}}_1$ squark are expected to be much softer than those from SM processes. Therefore, all SRs are required to satisfy $p_\mathrm{T}(\ell) < 30$ GeV. In order to retain sensitivity to different $\Delta m$ mass gaps, two signal regions, SR1 and SR2, are designed targeting small and large mass differences, respectively. Moreover, due to the strong dependence on $\Delta m$ of the $p_\mathrm{T}$ and $M_\mathrm{T}$ distributions for the signal, these SRs are further subdivided into a total of 44 mutually exclusive regions, which are detailed below and summarized in Table 1. In this search, the correlations between $p_\mathrm{T}^\text{miss}$, $H_\mathrm{T}$, and the transverse momentum of the ISR jet candidate ($p_\mathrm{T}(\mathrm{ISR})$) are taken into account by defining SRs in terms of the variables
$$C_\mathrm{T1} = \min\left(p_\mathrm{T}^\text{miss},\, H_\mathrm{T} - 100\ \mathrm{GeV}\right), \qquad C_\mathrm{T2} = \min\left(p_\mathrm{T}^\text{miss},\, p_\mathrm{T}(\mathrm{ISR}) - 25\ \mathrm{GeV}\right),$$
where the numerical offsets are determined by maximizing the ratio of signal to the square root of background in the signal regions.
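The compound variables can be sketched as follows. The offset values (100 GeV against $H_\mathrm{T}$ and 25 GeV against $p_\mathrm{T}(\mathrm{ISR})$) are assumptions of this sketch; the analysis-specific values in the text take precedence:

```python
# C_T variables correlating pTmiss with H_T and with the ISR jet pT.
# Offsets of 100 GeV and 25 GeV are illustrative assumptions.

def c_t1(met, ht):
    """C_T1 = min(pTmiss, H_T - 100 GeV)."""
    return min(met, ht - 100.0)

def c_t2(met, pt_isr):
    """C_T2 = min(pTmiss, pT(ISR) - 25 GeV)."""
    return min(met, pt_isr - 25.0)
```

Taking the minimum means an SR requirement such as $C_\mathrm{T} > 300$ GeV simultaneously enforces large $p_\mathrm{T}^\text{miss}$ and a large value of the second variable, which is how the correlation between the two is exploited in a single cut.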
In SR1, events with a b jet satisfying $p_\mathrm{T} > 30$ GeV are rejected, since the b jets in signal events with a small mass gap are expected to have typical $p_\mathrm{T}$ values smaller than this threshold. This b tag veto significantly reduces the contribution of tt events. In this region, the $p_\mathrm{T}^\text{miss}$ and $H_\mathrm{T}$ requirements of the preselection are simultaneously tightened by requiring $C_\mathrm{T1} > 300$ GeV. Since the W+jets process is the dominant background for lower $M_\mathrm{T}$ values, we take advantage of the charge asymmetry in the production of W bosons at the LHC and require the lepton to have a negative charge in SR1 regions with $M_\mathrm{T} < 95$ GeV. Moreover, the acceptance of the lepton is tightened by requiring $|\eta(\ell)| < 1.5$, because leptons from decays of the W boson at the LHC tend to be produced in the forward direction.
In SR2, we require at least one b jet with $p_\mathrm{T} < 60$ GeV, but reject events with any b jet having $p_\mathrm{T} > 60$ GeV. These requirements increase the efficiency for signal points with larger $\Delta m$ while keeping the tt background under control. In this region we also require $C_\mathrm{T2} > 300$ GeV, which is more effective in reducing the tt background than the $C_\mathrm{T1}$ requirement.
The SR1 (SR2) region is further divided in bins of $M_\mathrm{T}$, lepton $p_\mathrm{T}$, and $C_\mathrm{T1}$ ($C_\mathrm{T2}$). The $M_\mathrm{T}$ binning is done below and above the peak around the W boson mass in the $M_\mathrm{T}$ distribution, with the regions $M_\mathrm{T} < 60$ GeV, $60 < M_\mathrm{T} < 95$ GeV, and $M_\mathrm{T} > 95$ GeV labelled as a, b, and c, respectively. It can be seen from Fig. 2 that the lower (higher) $M_\mathrm{T}$ region is more sensitive to signals with smaller (larger) mass gaps. In order to take advantage of the shape differences in the lepton $p_\mathrm{T}$ distributions between various signal points and SM processes, each SR is further divided into the lepton $p_\mathrm{T}$ regions 5-12, 12-20, and 20-30 GeV, referred to as L, M, and H, respectively. An additional region of 3.5-5.0 GeV is added only for muons and only for $M_\mathrm{T} < 95$ GeV, and is labelled VL. In addition, SR1 (SR2) is further separated into two regions in $C_\mathrm{T1}$ ($C_\mathrm{T2}$), defined by $300 < C_\mathrm{T1}\,(C_\mathrm{T2}) < 400$ GeV and $C_\mathrm{T1}\,(C_\mathrm{T2}) > 400$ GeV, which are labelled X and Y, respectively.
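The subregion tags defined above map onto simple interval lookups. The following sketch (our own helper functions; the composed label is merely our shorthand) illustrates the binning; the restriction of VL to $M_\mathrm{T} < 95$ GeV is left to the caller:

```python
# Subregion tags for the CC search, as described in the text:
# M_T bins a/b/c, lepton pT bins VL/L/M/H, and C_T bins X/Y.

def mt_tag(mt):
    if mt < 60.0:
        return "a"
    if mt < 95.0:
        return "b"
    return "c"

def lep_pt_tag(pt, is_muon):
    if is_muon and 3.5 <= pt < 5.0:
        return "VL"   # muons only, and only used for M_T < 95 GeV
    if 5.0 <= pt < 12.0:
        return "L"
    if 12.0 <= pt < 20.0:
        return "M"
    if 20.0 <= pt < 30.0:
        return "H"
    return None       # outside the SR lepton pT acceptance

def ct_tag(ct):
    if 300.0 < ct < 400.0:
        return "X"
    if ct >= 400.0:
        return "Y"
    return None       # fails the C_T > 300 GeV SR requirement
```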

Background prediction
The dominant backgrounds in most of the CC signal regions are W+jets and tt production with a prompt lepton in the final state. Nonprompt sources of leptons become more important in regions with large $M_\mathrm{T}$ or very low lepton $p_\mathrm{T}$. In this section, the methods used to estimate the prompt and nonprompt backgrounds from data are described. Simulation is used to estimate other rare backgrounds with a prompt lepton, namely Z/$\gamma^*$, diboson, and single top quark production, and tt production with an additional W, Z, or $\gamma$.
The nonprompt background due to misidentified leptons associated with a jet becomes comparable to the prompt contribution in regions where W+jets and tt production are suppressed, namely in regions of high $M_\mathrm{T}$ and very low lepton $p_\mathrm{T}$. This background is estimated fully from data using the "tight-to-loose" method, where a "loose" set of identification and isolation criteria are defined to select lepton candidates that are more likely to be nonprompt. The loose selection is defined by relaxing the requirement on the lepton isolation to $I_\text{abs} < 20$ GeV for $p_\mathrm{T}(\ell) < 25$ GeV and $I_\text{rel} < 0.8$ for $p_\mathrm{T}(\ell) > 25$ GeV, as well as relaxing the impact parameter conditions to $|d_{xy}| < 0.1$ cm and $|d_z| < 0.5$ cm. The "tight" criteria correspond to the final lepton selection of the analysis, described in Section 2. The probability that a loose lepton also passes the tight criteria, the tight-to-loose fraction $f_\mathrm{TL}$, is measured as a function of lepton $p_\mathrm{T}$ and $|\eta|$ in an orthogonal "measurement region" largely dominated by multijet events, which is enriched in nonprompt leptons. The fraction $f_\mathrm{TL}$ is measured from data, after the subtraction of the simulated prompt lepton contribution. The final estimate of nonprompt leptons in a SR or control region (CR) is based on the observed yield in an "application region". The latter is defined in the same way as the corresponding SR or CR, with the exception that the lepton has to pass the loose lepton criteria but not the tight ones. The final estimate is obtained by scaling the data yield in the application region by $f_\mathrm{TL}/(1 - f_\mathrm{TL})$, after subtracting the simulated prompt lepton contribution.

Table 1: The CC search: definition of SRs. The subregions of SRs are denoted by tags in parentheses, as described in the text: VL, L, M, and H refer to the four bins in lepton $p_\mathrm{T}$, and X and Y to the $C_\mathrm{T}$ ranges specified in the table. The corresponding control regions (CR) use the same selection with the exception of the lepton $p_\mathrm{T}$, as shown in the table. For jets, the attributes "soft" and "hard" refer to the $p_\mathrm{T}$ ranges 30-60 GeV and >60 GeV, respectively.

Requirements common to all SRs: number of hard jets $\le 2$; $\Delta\phi$(hard jets) $< 2.5$ rad.
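The tight-to-loose extrapolation reduces to a simple transfer factor. This sketch (illustrative names; in practice the measured fraction depends on lepton $p_\mathrm{T}$ and $|\eta|$) shows the final scaling step:

```python
# Tight-to-loose extrapolation: the nonprompt yield in a SR (or CR) is the
# loose-but-not-tight data yield, after subtracting the simulated prompt
# contribution, scaled by f_TL / (1 - f_TL).

def nonprompt_estimate(n_app_data, n_app_prompt_mc, f_tl):
    """n_app_*: yields in the application region; f_tl: tight-to-loose fraction."""
    transfer = f_tl / (1.0 - f_tl)
    return max(0.0, n_app_data - n_app_prompt_mc) * transfer
```

For example, with $f_\mathrm{TL} = 0.2$ the transfer factor is 0.25, so 80 prompt-subtracted application-region events predict 20 nonprompt events in the tight selection.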
The absolute normalization of the prompt background simulation in each SR is obtained from a CR with identical requirements as in the SR except for the lepton $p_\mathrm{T}$ selection. The CR is defined by replacing the lepton $p_\mathrm{T}$ requirement of the SR with $p_\mathrm{T}(\ell) > 30$ GeV; therefore, SRs that are only distinguished by different selections in $p_\mathrm{T}(\ell)$ share the same CR. The impact of potential signal contamination is taken into account when deriving the results, as described in Section 6. In each CR, a scale factor for the prompt simulation is obtained by normalizing the simulation to data, after subtracting nonprompt and rare background sources from the observed number of events in the CR. The nonprompt contribution used in the subtraction is estimated separately from data. The composition of the CRs in terms of background processes, as well as the total simulated and observed yields, are shown in Table 2. The resulting scale factors are applied to the simulated prompt background in the corresponding SRs, and the method is checked in dedicated validation regions. Each validation region is obtained by one of the following changes: (a) lowering the $C_\mathrm{T1}$ (in SR1 and CR1) and $C_\mathrm{T2}$ (in SR2 and CR2) requirements to $200 < C_\mathrm{T} < 300$ GeV, or (b) replacing the conditions on b jets by requiring at least one b jet with $p_\mathrm{T} > 60$ GeV. The predictions in the validation regions are compatible with the observations within the uncertainties.
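The scale-factor extraction described above is a one-line computation; the sketch below uses illustrative names:

```python
# Scale factor for the prompt-background simulation in a CR:
# (data - data-driven nonprompt estimate - simulated rare backgrounds)
# divided by the simulated prompt yield.

def prompt_scale_factor(n_data_cr, n_nonprompt_cr, n_rare_cr, n_prompt_mc_cr):
    return (n_data_cr - n_nonprompt_cr - n_rare_cr) / n_prompt_mc_cr
```

A value compatible with unity indicates that the simulation already models the CR well; deviations are absorbed into the SR prediction via this factor.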

Systematic uncertainties
Processes for which the absolute yield is predicted by simulation are subject to systematic uncertainties in the determination of the integrated luminosity (2.5%) [48]. All simulated samples are subject to experimental uncertainties on the jet energy scale (JES) and jet energy resolution (JER). The uncertainties due to miscalibration of the JES are estimated by varying the jet energy corrections up and down by one standard deviation and propagating the effect to the calculation of p miss T . Moreover, differences of the JER between data and simulation are accounted for by smearing the momenta of jets in simulation. The uncertainties corresponding to b-tagging efficiencies and misidentification rates for tagging light-flavoured or gluon jets as b jets have been evaluated for all simulated samples. The uncertainties corresponding to the correction of simulated samples for trigger and lepton efficiencies are taken as systematic uncertainties. The uncertainty due to the simulation of pileup for simulated background processes is taken into account by varying the expected cross section of inelastic collisions by 5% [49]. An uncertainty of 50% is assigned to the cross sections of all nonleading backgrounds. An overview of all systematic uncertainties related to the background prediction is presented in Table 3.
The nonprompt background estimation method of this search, as described in the previous section, depends on the tight-to-loose fraction $f_\mathrm{TL}$, which is sensitive to the flavour content of jets. The systematic uncertainty due to possible differences in the flavour content of jets between the measurement and application regions is assessed by varying the b tagging requirements of the measurement region. The resulting uncertainty ranges from 20 to 50%, from low to high lepton $p_\mathrm{T}$, respectively. The consistency of the method is tested by applying the same procedure to simulated events. To account for any residual deviation found in the test, an additional uncertainty of 20 to 200% is assigned in some regions.

Table 3: The CC search: typical ranges for relative systematic uncertainties (in %) on the total background prediction and signal prediction in the main SRs. The "-" means that a certain source of uncertainty is not applicable.

The prompt background prediction procedure of this search, as described in the previous section, relies on the simulation of W+jets and tt production and is sensitive to theoretical uncertainties on ISR. The modelling of ISR for these processes is checked in control samples in data that are highly enriched in tt or W+jets events. The simulation of tt events is tested by comparing the jet multiplicity observed in a control sample with the simulation. Simulated tt events are reweighted based on this comparison, and half of the correction is assigned as the systematic uncertainty [50]. This systematic uncertainty affecting tt also affects the signal samples. Similarly, the simulation of W+jets events is corrected based on the distribution of $p_\mathrm{T}(\mathrm{W})$ in a control sample, and the difference between the uncorrected and the corrected simulation is assigned as a systematic uncertainty [51]. These two sources of uncertainty lead to relative changes of the total background estimate in the SRs that range from 2 to 10% for the W+jets process, and are below 1% for the tt process. The estimate of the prompt background depends only weakly on the background composition, since the distributions of $p_\mathrm{T}(\ell)$ in the W+jets and tt processes are similar. The corresponding systematic uncertainty is derived from a 20% variation in the relative yields of the W+jets and tt backgrounds.
The dominant source of systematic uncertainty for the signal is the modelling of ISR. It is minimized by reweighting the jet multiplicity in the signal samples according to the corrections obtained in the tt sample. Uncertainties due to unknown higher-order effects are estimated by varying the renormalization and factorization scales by factors of 0.5 and 2. Moreover, possible differences between the fast and the full GEANT4-based modelling of $p_\mathrm{T}^\text{miss}$ are taken into account, and the corresponding uncertainties are assigned to the signal yields as shown in Table 3. The statistical uncertainty of the signal simulation ranges from 8 to 15%.

The MVA approach

Signal selection
For the selection of the signal events corresponding to four-body decays of the $\widetilde{\mathrm{t}}_1$, we use a boosted decision tree (BDT) [52,53] to take advantage of the correlations among variables that discriminate between signal and background.
Compared to the approach of Ref. [15], we use new variables and search for the smallest set of best-performing variables to be used as input to the BDT. To find the most discriminating variables, we test different sets, maximizing the figure of merit (FOM) [54]:
$$\mathrm{FOM} = \frac{S}{\sqrt{S + B + \sigma_B^2}},$$
where $S$ and $B$ stand for the expected signal and background yields, and $\sigma_B = f B$ represents the expected systematic uncertainty on the background, with $f$ an estimate of the relative uncertainty of the background yield, taken to be $f$ = 20% (see Section 5.3). A new variable is incorporated into the set of input variables only if it significantly increases the FOM. The full list of the final input variables is:
• $p_\mathrm{T}^\text{miss}$, $p_\mathrm{T}(\ell)$, and $M_\mathrm{T}$: The correlation between $p_\mathrm{T}^\text{miss}$ and $p_\mathrm{T}(\ell)$ differs between signal, where the $p_\mathrm{T}^\text{miss}$ is due to three missing objects (two $\widetilde{\chi}^0_1$ and a $\nu$), and the tt and W+jets backgrounds, where the $p_\mathrm{T}^\text{miss}$ is due to a single missing object ($\nu$). The $M_\mathrm{T}$ distribution peaks at ${\approx}80$ GeV for SM processes where a W boson is produced, while being a rather broad distribution for signal.
• $\eta(\ell)$ and $Q(\ell)$: The pseudorapidity of the lepton, $\eta(\ell)$, is considered because the decay products of the signal are more centrally produced than those of the W+jets background. The charge of the lepton, $Q(\ell)$, is also considered, as $\mathrm{W}^+$ and $\mathrm{W}^-$ are produced unequally at the LHC, while the signal events contain equal numbers of positive and negative leptons.
• $p_\mathrm{T}(\mathrm{ISR})$, $p_\mathrm{T}(\mathrm{b})$, $N_\text{jets}$, and $H_\mathrm{T}$: The $p_\mathrm{T}$ of the leading jet, $p_\mathrm{T}(\mathrm{ISR})$, captures the hard ISR jet in signal events, and $p_\mathrm{T}(\mathrm{b})$ is the $p_\mathrm{T}$ of the jet with the highest b tagging discriminant value. Both are sensitive to the different phase space available for signal and background events: $m(\mathrm{t}) - m(\mathrm{W})$ for tt, and $m(\widetilde{\mathrm{t}}_1) - m(\widetilde{\chi}^0_1)$ for signal. The multiplicity of selected jets, $N_\text{jets}$, is included, reflecting the mass of the mother particle $\widetilde{\mathrm{t}}_1$.

Because the discrimination power of each input variable varies as a function of $\Delta m$, as shown in Fig. 3, the $(m(\widetilde{\mathrm{t}}_1), m(\widetilde{\chi}^0_1))$ plane is partitioned into eight $\Delta m$ regions (from 10 to 80 GeV, in 10 GeV steps) and a separate BDT is trained for each partition. The W+jets and tt processes, which represent a large fraction of the total background after the preselection, are included in the training of the BDT. The Z($\to \nu\bar{\nu}$)+jets process, which represents a nonnegligible fraction of the remaining total background at the final selection level, is also included. The training is done with simulated events for both the signal and background processes. The background samples are normalized to their respective cross sections to realistically represent the SM background in the training. We take advantage of the similar distributions of the input variables for different $(m(\widetilde{\mathrm{t}}_1), m(\widetilde{\chi}^0_1))$ signal points with the same $\Delta m$, and group all signal points for a given $\Delta m$ together when feeding signal to the BDT training. This increases the number of signal events for each training. Due to the large variation of the spectrum of the $p_\mathrm{T}(\ell)$ variable across the $(m(\widetilde{\mathrm{t}}_1), m(\widetilde{\chi}^0_1))$ plane, we require $p_\mathrm{T}(\ell) < 30$ GeV for signal points with $\Delta m \le 60$ GeV before training the different BDTs, while there is no restriction on $p_\mathrm{T}(\ell)$ for signal points with higher $\Delta m$. Figures 4 and 5 show the output distributions of the BDT in data and for the total SM background as taken from simulation.
In each case a representative $(m(\widetilde{\mathrm{t}}_1), m(\widetilde{\chi}^0_1))$ signal point is also shown, chosen at the limit of the expected sensitivity of the CC search (see Section 6) and belonging to the $\Delta m$ region for which the training was performed. The responses of the BDT, henceforth called BDT outputs, differ between trainings: the changing mix of signal and background, as well as the varying correlations across the $(m(\widetilde{\mathrm{t}}_1), m(\widetilde{\chi}^0_1))$ plane, lead to different BDT outputs for different $\Delta m$ values. We observe good agreement between data and simulation over the entire range dominated by the background (e.g. BDT output smaller than 0.3) for the eight different trainings. The BDT output is also checked in data to be well reproduced by the simulation, across its entire range, in two validation regions that are kinematically orthogonal to the preselection while using the same online selection. These regions are also used to evaluate the precision of the method for predicting the background, as described in Section 5.2. An SR is defined by applying a threshold to each BDT output; the thresholds are reported in Table 5. On average the BDT selection suppresses the SM background by a factor of ${\approx}3 \times 10^3$ while reducing the signal by a factor of ${\approx}25$. The total efficiency for signal points at the limit of the sensitivity of the CC search, across all selections, is of the order of $1.3 \times 10^{-4}$.
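The per-$\Delta m$ training strategy described above can be sketched in a toy setup. The snippet below is illustrative only: a small AdaBoost ensemble of decision stumps on a single toy feature stands in for the boosted decision trees used in the analysis, signal points within a $\Delta m$ slice are pooled for one training, and an SR is defined by a threshold on the classifier output. All numbers (event counts, distributions, thresholds) are invented for illustration.

```python
import math
import random

random.seed(1)

def stump_train(x, y, w):
    """Best decision stump on a 1D feature: (threshold, polarity, weighted error)."""
    best = (0.0, 1, float("inf"))
    for thr in sorted(set(x)):
        for pol in (+1, -1):
            err = sum(wi for xi, yi, wi in zip(x, y, w)
                      if (pol if xi > thr else -pol) != yi)
            if err < best[2]:
                best = (thr, pol, err)
    return best

def adaboost(x, y, n_rounds=12):
    """AdaBoost ensemble of stumps; labels y are +1 (signal) / -1 (background)."""
    n = len(x)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        thr, pol, err = stump_train(x, y, w)
        if err >= 0.5:
            break
        alpha = 0.5 * math.log((1.0 - err) / max(err, 1e-12))
        ensemble.append((thr, pol, alpha))
        # boost the weight of misclassified events, then renormalize
        w = [wi * math.exp(-alpha * yi * (pol if xi > thr else -pol))
             for xi, yi, wi in zip(x, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def bdt_output(ensemble, xi):
    """Weighted stump vote, normalized to roughly [-1, 1] like a BDT output."""
    tot = sum(a for _, _, a in ensemble)
    return sum(a * (p if xi > t else -p) for t, p, a in ensemble) / tot

# One classifier per Delta-m slice; signal points in a slice are pooled.
classifiers = {}
for dm in (10, 20, 30):            # illustrative subset of the eight slices
    sig = [random.gauss(dm / 20.0, 1.0) for _ in range(120)]
    bkg = [random.gauss(0.0, 1.0) for _ in range(120)]
    classifiers[dm] = adaboost(sig + bkg, [+1] * 120 + [-1] * 120)

# Define an SR by a threshold on the output and compare efficiencies.
bdt = classifiers[30]
sr_eff_sig = sum(bdt_output(bdt, random.gauss(1.5, 1.0)) > 0.2
                 for _ in range(1000)) / 1000
sr_eff_bkg = sum(bdt_output(bdt, random.gauss(0.0, 1.0)) > 0.2
                 for _ in range(1000)) / 1000
```

The key design point mirrored here is that a harder threshold on the classifier output trades signal efficiency for background rejection, which is how the SRs of Table 5 are defined.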

Background predictions
The predicted numbers of W+jets and $\mathrm{t}\overline{\mathrm{t}}$ events are obtained from data control regions (CRs) based on the BDT output. The number of estimated prompt background events in the SR, $Y^{\mathrm{SR}}_{\text{prompt}}$, is derived as follows:

$Y^{\mathrm{SR}}_{\text{prompt}}(X) = \frac{N^{\mathrm{SR}}_{\text{prompt}}(X)}{N^{\mathrm{CR}}_{\text{prompt}}(X)} \left[ N^{\mathrm{CR}}(\text{data}) - N^{\mathrm{CR}}_{\text{prompt}}(\text{Rare}) - N^{\mathrm{CR}}_{\text{nonprompt}} \right]. \quad (2)$

Here, X refers to the background process to be estimated, W+jets or $\mathrm{t}\overline{\mathrm{t}}$, and the superscripts SR and CR refer to the signal and control regions, respectively. The term "prompt" refers to processes in which a prompt lepton is produced, while "nonprompt" refers to processes in which a jet is misreconstructed as a lepton. The numbers $N^{\mathrm{SR,CR}}_{\text{prompt}}(X)$ are predicted from simulated background. The number $N^{\mathrm{CR}}_{\text{prompt}}(\text{Rare})$ refers to simulated background processes other than those being estimated, and includes single top quark, DY, and diboson production. The number $N^{\mathrm{CR}}_{\text{nonprompt}}$ refers to the estimate from data of the backgrounds with a nonprompt lepton (as explained in Section 4.2). Within the region defined by the preselection, the CRs are obtained by requiring BDT output < 0.2, yielding a data sample enriched in background. The sample is further enriched in $\mathrm{t}\overline{\mathrm{t}}$ events by requiring at least one tight b jet, and in W+jets events by requiring zero loose b jets. The potential signal contamination in the CRs is well below 5% and is not expected to impact the final result.
The systematic uncertainties associated with these predictions are based on the differences between the predicted number of events (obtained from Eq. (2)) and the observed number of data events in both validation regions defined in the previous section.
The number $Y^{\mathrm{SR}}_{\text{nonprompt}}$ of background events with nonprompt leptons is estimated from data in all signal regions with the method described in Section 4.2. The yields of other SM processes, such as diboson, single top quark, and DY production, are estimated from simulation.
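As a minimal numerical sketch of the prompt-background estimate, assuming Eq. (2) takes the standard transfer-factor form consistent with the terms defined above (all input counts below are invented for illustration, not taken from the analysis):

```python
def predict_prompt_sr(n_sr_mc_x, n_cr_mc_x, n_cr_data, n_cr_mc_rare, n_cr_nonprompt):
    """Transfer-factor estimate of the prompt background X in the SR:
    the simulated SR/CR ratio for X scales the data CR count after
    subtracting rare (simulated) and nonprompt (data-driven) contributions."""
    purified_cr = n_cr_data - n_cr_mc_rare - n_cr_nonprompt
    return n_sr_mc_x / n_cr_mc_x * purified_cr

# Illustrative numbers only: a ttbar-enriched CR of 520 data events with
# 25 rare and 15 nonprompt events expected, and a 12/480 simulated SR/CR ratio.
y_tt = predict_prompt_sr(n_sr_mc_x=12.0, n_cr_mc_x=480.0,
                         n_cr_data=520.0, n_cr_mc_rare=25.0, n_cr_nonprompt=15.0)
```

The design choice mirrored here is that simulation is trusted only for the SR-to-CR extrapolation, while the overall normalization is anchored to data.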

Systematic uncertainties
All processes that are modelled by simulation are subject to the same systematic uncertainties as described in Section 4.3. The statistical uncertainty of the signal simulation ranges between 3 and 11%. The systematic uncertainty affecting the prediction of the W+jets and $\mathrm{t}\overline{\mathrm{t}}$ backgrounds is described in Section 5.2, and includes the statistical uncertainty from the number of events in the CRs. These uncertainties are evaluated in both validation regions, and the larger value is conservatively chosen. Furthermore, uncertainties in the shape of the BDT output, which can affect the background prediction, have been assessed; they are smaller than the aforementioned systematic uncertainties. The systematic uncertainties affecting the prediction of the nonprompt lepton background are the same as in Section 4.2. As we perform a separate analysis for each $\Delta m$ region, the uncertainties are evaluated separately and can therefore vary across different values of $\Delta m$. The systematic uncertainties in the predictions of the W+jets, $\mathrm{t}\overline{\mathrm{t}}$, and nonprompt lepton backgrounds, expressed relative to the total background, are provided in Table 4.

Results
After performing the two searches, we find no evidence for direct top squark production, as can be seen in Table 5 and in Fig. 6 for the MVA and CC searches, respectively. Both sets of results include the prediction of the W+jets and $\mathrm{t}\overline{\mathrm{t}}$ processes, the prediction of the background with a nonprompt lepton, the prediction of other background processes from simulation, and the observed numbers of events in data.

Table 4: The MVA search: relative systematic uncertainties (in %) in the total background and signal predictions. A "-" means that a given source of uncertainty is not applicable. In the case of the background, the uncertainties are relative to the total background. Systematic uncertainties in the data-driven predictions of the W+jets, $\mathrm{t}\overline{\mathrm{t}}$, and nonprompt lepton backgrounds are reported.

For the SRs of the MVA search (Table 5), the overlap between the SRs defined for different $\Delta m$ is generally below 50% for adjacent regions, and ranges from 0 to 30% for nonadjacent regions. Taking into account these results, the expected signal yield for each $(m(\widetilde{\mathrm{t}}_1), m(\widetilde{\chi}^0_1))$ mass point, and the corresponding systematic uncertainties, we interpret the absence of a clear excess as a 95% confidence level (CL) exclusion of top squark pair production in the $(m(\widetilde{\mathrm{t}}_1), m(\widetilde{\chi}^0_1))$ plane. The limits are calculated according to the modified frequentist CL$_\text{s}$ criterion [55][56][57]. A test statistic, defined as the likelihood ratio between the background-only and signal-plus-background hypotheses, is used to set exclusion limits on top squark pair production. For the CC search, which features a larger number of signal regions, an asymptotic approximation [54] is used, while in the MVA search the distributions of the test statistic are constructed using simulated experiments. Statistical uncertainties are modelled as Poisson distributions, and all systematic uncertainties are modelled with lognormal distributions.
In the CC search, the effect of signal contamination in the CRs is taken into account by including the control regions, with the estimates of the corresponding signal yields, in the likelihood fit. When interpreting the results, we assume branching fractions of 100% for the two considered decay scenarios. Figure 7 shows the exclusion contour as a function of $m(\widetilde{\mathrm{t}}_1)$ and $\Delta m$ for both searches in the case of the four-body decay scenario, and Fig. 8 shows the interpretation of the CC search for the chargino-mediated scenario. In order to constrain top squark pair production in both decay modes using the information from several final states, a statistical combination of the CC search with the all-hadronic search [19] is performed for both decay scenarios of the top squark. The common systematic uncertainties of the two searches are treated as fully correlated, and the possible correlations arising from events passing the selection criteria of both searches are found to have a negligible impact on the final results. The combined limits, shown in Fig. 9, include all SRs and CRs of the all-hadronic and the single-lepton CC searches.

Table 5: The MVA search: predictions of the W+jets, $\mathrm{t}\overline{\mathrm{t}}$, nonprompt lepton, and other backgrounds in the eight SRs defined by the thresholds on the BDT output reported in the second column. The predictions of the first three processes are based on data, while that of $N^{\mathrm{SR}}(\text{Rare})$, i.e. rare backgrounds, is based on simulation. The uncertainties are the quadrature sum of the statistical uncertainties, the systematic uncertainties of Table 4, and, for the backgrounds predicted from simulation, the cross section uncertainties. The numbers of total expected background ($N^{\mathrm{SR}}(\mathrm{B})$) and observed data ($N^{\mathrm{SR}}(\mathrm{D})$) events in each SR are also reported.

[Figure caption fragment: selections as in Table 1. The vertical bars and the shaded areas represent the statistical uncertainty of the data and the total uncertainty in the prediction, respectively. The lower panel shows the ratio of data to prediction.]
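The modified frequentist CL$_\text{s}$ procedure with simulated experiments, as used in the MVA search, can be illustrated for a single counting experiment. The sketch below is a toy version only: for one counting experiment the likelihood-ratio test statistic is monotone in the observed count, so the count itself serves as the test statistic; a lognormal nuisance parameter models a background normalization uncertainty, as in the text. All numbers are invented for illustration.

```python
import math
import random

random.seed(2)

def poisson(mean):
    """Poisson-distributed sample via Knuth's algorithm (stdlib only)."""
    limit = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

def cls(n_obs, b, s, sigma_b=0.2, n_toys=20000):
    """Toy-MC CLs = CL_{s+b} / CL_b for a single counting experiment."""
    def frac_at_or_below(signal_mean):
        count = 0
        for _ in range(n_toys):
            # lognormal nuisance on the background normalization
            b_toy = b * math.exp(sigma_b * random.gauss(0.0, 1.0))
            count += poisson(b_toy + signal_mean) <= n_obs
        return count / n_toys
    p_sb = frac_at_or_below(s)      # CL_{s+b}: p-value under signal+background
    p_b = frac_at_or_below(0.0)     # CL_b: p-value under background only
    return p_sb / p_b

# Illustrative numbers: a 10-event signal on top of a 5-event background,
# with only 3 events observed. A point is excluded at 95% CL if CLs < 0.05.
val = cls(n_obs=3, b=5.0, s=10.0)
```

Dividing by CL$_\text{b}$ is the defining feature of the CL$_\text{s}$ criterion: it prevents the exclusion of signals to which the experiment has no real sensitivity when the data fluctuate below the background expectation.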

Summary
A search for direct top squark pair production is performed in a compressed scenario where the mass difference $\Delta m$ between the lightest top squark and the lightest supersymmetric particle (LSP), taken to be the lightest neutralino $\widetilde{\chi}^0_1$, does not exceed the W boson mass. Two decay modes of the top squark are targeted: the four-body prompt decay to $\mathrm{b} f \bar{f}' \widetilde{\chi}^0_1$, and the chargino-mediated decay to $\mathrm{b} \widetilde{\chi}^+_1$ with a subsequent decay $\widetilde{\chi}^+_1 \to f \bar{f}' \widetilde{\chi}^0_1$. Results are based on proton-proton collision data at $\sqrt{s} =$ 13 TeV, recorded with the CMS detector in 2016 and corresponding to an integrated luminosity of 35.9 fb$^{-1}$. Selected events are required to have a single lepton (electron or muon) and significant missing transverse momentum ($p_{\mathrm{T}}^{\text{miss}}$). Because of the small mass difference between the top squark and the LSP, the decay products of the top squark are expected to have low $p_{\mathrm{T}}$. Events in which a jet from initial-state radiation boosts the top squark pair, leading to sizeable $p_{\mathrm{T}}^{\text{miss}}$, are selected.
Two search strategies are pursued. In the sequential selection approach (CC), signal regions are defined using discriminating variables, in particular the transverse mass of the lepton-$p_{\mathrm{T}}^{\text{miss}}$ system and the lepton momentum. In the other approach, a multivariate analysis (MVA) is employed that uses both kinematic and topological variables and is trained separately for different $\Delta m$ regions of the four-body decay mode. In both approaches, the dominant contributions to the signal regions from standard model processes (W+jets, $\mathrm{t}\overline{\mathrm{t}}$, and events with misidentified leptons) are estimated from control regions in data.
Data are found to be compatible with the predicted standard model backgrounds. The results are used to set limits at 95% confidence level on the production cross section as a function of the $\widetilde{\mathrm{t}}_1$ and $\widetilde{\chi}^0_1$ masses, within the context of simplified models. Assuming a 100% branching fraction in the decay channel under consideration, and using the top squark pair production cross section computed at NLO+NLL precision [40][41][42][43][44][45][46], these limits are converted into mass limits.
Both search strategies are applied to the four-body decay mode. For this decay mode, the MVA search excludes top squark masses up to 420 and 560 GeV at $\Delta m =$ 10 and 80 GeV, respectively. There is less sensitivity at lower $\Delta m$ because of the smaller available phase space, where the very soft kinematics of the decay products lead to a lower acceptance. The limits obtained in the CC approach are comparable with those of the MVA approach for $\Delta m =$ 30 GeV. The CC approach also covers the chargino-mediated decays, where the chargino mass is taken as the average of the top squark and neutralino masses, probing $\widetilde{\mathrm{t}}_1$ masses up to 540 GeV for $\Delta m \approx$ 40 GeV. The results of the CC search have been combined with a search for top squark pair production in the fully hadronic channel [19]. The combined mass limits reach up to 590 and 670 GeV for four-body and chargino-mediated decays, respectively. The reach of the $\Delta m$-dependent MVA search in the four-body decay mode is noteworthy, as its exclusion limit goes beyond that of the combined result at high $\Delta m$.
The results summarized in this paper represent the most stringent limits to date on the top squark pair production cross section for mass differences between the top squark and the lightest neutralino below the W boson mass, and for decays proceeding through the four-body or the chargino-mediated modes.
We gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMWFW and FWF (Austria).

[17] ATLAS Collaboration, "Search for top-squark pair production in final states with one lepton, jets, and missing transverse momentum using 36 fb$^{-1}$ of $\sqrt{s} =$ 13 TeV pp collision data with the ATLAS detector", (2017). arXiv:1711.11520. Submitted to JHEP.