Observation of tW production in the single-lepton channel in pp collisions at $\sqrt{s}$ = 13 TeV

A measurement of the cross section of the associated production of a single top quark and a W boson in final states with a muon or electron and jets in proton-proton collisions at $\sqrt{s}$ = 13 TeV is presented. The data correspond to an integrated luminosity of 36 fb$^{-1}$ collected with the CMS detector at the CERN LHC in 2016. A boosted decision tree is used to separate the tW signal from the dominant $\mathrm{t\bar{t}}$ background, whilst the subleading W+jets and multijet backgrounds are constrained using data-based estimates. This result is the first observation of the tW process in final states containing a muon or electron and jets, with a significance exceeding 5 standard deviations. The cross section is determined to be 89 $\pm$ 4 (stat) $\pm$ 12 (syst) pb, consistent with the standard model.


Introduction
The observation of singly produced top quarks by the D0 [1] and CDF [2] Collaborations opened a new era in the study of electroweak interactions of top quarks. At hadron colliders, top quarks are produced predominantly via the strong interaction with an antiquark partner ($\mathrm{t\bar{t}}$). Much less frequently, top quarks and antiquarks are produced singly by the electroweak interaction via the Wtb vertex between the W boson and the top and bottom quarks. Three main processes contribute to electroweak single top quark production: the t channel [3-5], produced by quark scattering via the exchange of a virtual W boson; the s channel [6, 7], produced by quark-antiquark annihilation to an off-shell W boson; and the associated production of a single top quark with a W boson (tW), produced either via the exchange of a top quark or by an intermediate off-shell b quark.
All three single top quark processes are sensitive to the Cabibbo-Kobayashi-Maskawa matrix element $V_{\mathrm{tb}}$, and their study provides a direct probe of its value. Any significant deviation from the established value may be indicative of physics beyond the standard model (SM). The tW process is sensitive in particular to the Wtb vertex, whilst the t- and s-channel processes contain contributions from additional four-fermion operators. By studying all three single top quark channels it should, therefore, be possible to disentangle the new physics effects, if any such deviations are observed [8, 9].
Whilst the Fermilab Tevatron experiments successfully observed the t- and s-channel processes [1, 2], the tW production cross section was too small to be accessible. At the CERN LHC, the tW process has the second-largest cross section among the single top quark channels after the t channel, making detailed studies of the tW process possible. Evidence of the tW process was first reported by the ATLAS and CMS experiments at the LHC using data at $\sqrt{s}$ = 7 TeV [10, 11], followed by the observation at $\sqrt{s}$ = 8 TeV [12, 13]. Precise cross section and differential measurements have since been carried out using data at $\sqrt{s}$ = 13 TeV by both collaborations [14-16].
The leading-order (LO) Feynman diagrams for the tW process are shown in Fig. 1. The production cross section in proton-proton (pp) collisions at $\sqrt{s}$ = 13 TeV, assuming a top quark mass $m_{\mathrm{t}}$ of 172.5 GeV, has been computed to be 71.7 ± 1.8 (scale) ± 3.4 (PDF) pb at approximate next-to-next-to-LO (NNLO) [17], and $79.5^{+1.9}_{-1.8}$ (scale) $^{+2.0}_{-1.4}$ (PDF) pb at approximate next-to-NNLO (aN$^3$LO) [18]. The first uncertainties are due to scale variations in the calculation, and the second correspond to the choice of parton distribution functions (PDFs). The tW process is of special interest because of its interference at next-to-LO (NLO) with $\mathrm{t\bar{t}}$ production [19-21]. Whilst the two processes are distinct at LO, they share a subset of Feynman diagrams at NLO, examples of which can be seen in Fig. 2. This leads to conceptual and practical problems with signal definition, the understanding and measurement of which can provide insight into how such types of interference predicted in various new physics models might manifest. Two schemes have been proposed to describe the tW signal: "diagram removal" (DR) [21], where all NLO diagrams that are doubly resonant, such as those in Fig. 2, are excluded from the signal definition; and "diagram subtraction" (DS) [21, 22], in which the differential cross section is modified with a gauge-invariant subtraction term that locally cancels the contribution of the $\mathrm{t\bar{t}}$ diagrams. The DR scheme is used to define the tW signal in this analysis.
Figure 2: Feynman diagrams for tW single top quark production at next-to-leading order that are removed from the signal definition in the DR scheme. Charge conjugate states are implied.
In the SM, top quarks decay almost exclusively to a W boson and a b quark. Consequently, the tW process results in a signature containing two W bosons and one b quark.
To date, all tW studies carried out on data collected by the CMS detector have been performed using the final states in which both W bosons decay leptonically. In comparison to this well-established final state, the single-lepton final state, in which one W boson decays leptonically and the other hadronically, has seen little study; to date, only one measurement has been presented by the ATLAS Collaboration using data at $\sqrt{s}$ = 8 TeV [23]. Whilst the single-lepton channel offers the advantages of larger branching fractions and the possibility of a fully reconstructable top quark system, it suffers from larger and more numerous backgrounds. This paper reports the first measurement from the CMS Collaboration of the tW process in the single-lepton final state. Single-lepton events are selected from pp collisions at $\sqrt{s}$ = 13 TeV corresponding to an integrated luminosity of 36 fb$^{-1}$. A boosted decision tree (BDT) is used to separate the tW signal from the dominant $\mathrm{t\bar{t}}$ background. The subdominant W+jets events and events composed of jets produced through the strong interaction, referred to as quantum chromodynamic (QCD) multijet events, are constrained using data-based estimates. The tW production cross section is extracted using a binned likelihood fit carried out on the BDT discriminant distributions for both channels and three jet multiplicity regions simultaneously. Tabulated results are provided in HEPData [24].

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.
The candidate vertex with the largest value of summed physics-object $p_{\mathrm{T}}^2$ (where $p_{\mathrm{T}}$ is the transverse momentum) is taken to be the primary pp interaction vertex. The physics objects are the jets, clustered using the jet finding algorithm [25, 26] with the tracks assigned to candidate vertices as inputs, and the associated missing transverse momentum, taken as the negative vector $p_{\mathrm{T}}$ sum of those jets.
The particle-flow algorithm [27] aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. The energy of photons is obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The energy of muons is obtained from the curvature of the corresponding track [28]. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.
The missing transverse momentum vector $\vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}$ is computed as the negative vector $p_{\mathrm{T}}$ sum of all the particle-flow candidates in an event, and its magnitude is denoted as $p_{\mathrm{T}}^{\mathrm{miss}}$ [29]. The $\vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}$ is modified to account for corrections to the energy scale of the reconstructed jets in the event.
A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [30].

Data and simulated samples
The measurement uses data collected with the CMS detector during pp collisions in 2016 at $\sqrt{s}$ = 13 TeV, corresponding to an integrated luminosity of 36 fb$^{-1}$ [31].
Events simulated using the Monte Carlo (MC) method are used throughout the analysis. Signal tW events are simulated using the POWHEG v1 [32] generator interfaced with PYTHIA 8.205 [33] for showering using the CUETP8M1 tune [34]. Fully hadronic decays are excluded from the simulation, and separate samples are created for top quark and antiquark events. The tW process signal is defined using the DR scheme. Events for the $\mathrm{t\bar{t}}$ background are simulated using POWHEG v2 [35] interfaced with PYTHIA 8.205 using the CUETP8M2T4 tune [36]. The second-leading background, W+jets, is simulated using MADGRAPH5_aMC@NLO 2.2.2 [37]. The matrix element (ME) calculations are matched to the parton shower (PS) using the FxFx [38] algorithm. Single top quark backgrounds from the t and s channels (together referred to as the single t background throughout this paper) are generated using POWHEG v2 interfaced with PYTHIA 8.205 with the CUETP8M1 tune, including spin correlations [39]. QCD multijet events are simulated using MADGRAPH5_aMC@NLO interfaced with PYTHIA 8.205 using the MLM matching [40]. The WW, WZ and ZZ diboson backgrounds (collectively referred to as the VV background) are simulated using PYTHIA 8.205 with the CUETP8M1 tune. All samples are generated at NLO in QCD with the exception of the VV and QCD multijet processes, which are produced at LO. Contributions from other processes are found to be negligible.
For all samples, the proton structure is described using the NNPDF3.0 [41] PDF set, and $m_{\mathrm{t}}$ is chosen to be 172.5 GeV. Minimum bias pp interactions generated using PYTHIA 8.205 are overlaid on all simulated events to account for additional interactions occurring per bunch crossing that do not originate from the primary vertex of interest (pileup). The detector response is simulated using the GEANT4 package [42, 43].
All simulated events are processed using the same software chain as for collision data, reweighted to account for the observed distribution in pileup, and normalized to the predicted cross section of the process.
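The normalization step can be illustrated with a minimal sketch, assuming each simulated event receives a weight equal to the cross section times the integrated luminosity divided by the number of generated events, multiplied by a per-event pileup weight. The function name, the generated-event count, and the default pileup weight are hypothetical and are not taken from the analysis; branching fractions and selection efficiencies are ignored.

```python
def event_weight(xsec_pb, lumi_fb_inv, n_generated, pileup_weight=1.0):
    """Weight that scales one simulated event to the yield expected in data."""
    lumi_pb_inv = lumi_fb_inv * 1000.0  # 1 fb^-1 = 1000 pb^-1
    return xsec_pb * lumi_pb_inv / n_generated * pileup_weight


# Illustrative numbers: the 71.7 pb approximate NNLO tW prediction at 36 fb^-1,
# with a hypothetical sample of one million generated events.
n_generated = 1_000_000
weight = event_weight(71.7, 36.0, n_generated)
expected_events = weight * n_generated  # total yield before any selection
```

The sum of weights over a sample then gives the number of events predicted for that process at the quoted luminosity.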

Event selection
Events of interest are selected using a two-tiered trigger system [44, 45]. To be considered for the analysis, events must pass high-level triggers that select a single lepton with $p_{\mathrm{T}}$ of at least 24 (27) GeV for muons (electrons). Additional offline selections are made such that each event contains exactly one muon with $p_{\mathrm{T}}$ > 26 GeV and |η| < 2.1 or one electron with $p_{\mathrm{T}}$ > 30 GeV and |η| < 1.48. The forward η range is excluded from the electron selection because background processes dominate in this region. These leptons must pass identification and isolation requirements [28, 46], and have originated from the well-reconstructed primary interaction vertex. The isolation requirements are based on the ratio between the lepton $p_{\mathrm{T}}$ and the scalar sum of the $p_{\mathrm{T}}$ of charged hadrons and neutral particles within a cone of $\Delta R = \sqrt{(\Delta\phi)^2 + (\Delta\eta)^2} = 0.3$ of the lepton (corrected for pileup), where φ is the azimuthal angle in radians. Events that contain additional leptons with lower $p_{\mathrm{T}}$ requirements ($p_{\mathrm{T}}$ > 10 GeV for muons and $p_{\mathrm{T}}$ > 20 GeV for electrons) and |η| < 2.4 are rejected. Corrections are applied to the trigger and lepton efficiencies in simulation to match those observed in data.
Further selections are made based on the jet topology of the event. Particle-flow jets, reconstructed using the anti-$k_{\mathrm{T}}$ algorithm [25] with a distance parameter R = 0.4, are selected if they have $p_{\mathrm{T}}$ > 30 GeV and |η| < 2.4. Only jets separated from the selected lepton by $\Delta R$ > 0.4 are considered. At least two and no more than four jets must be present in the event to be considered in the analysis. The energy of the jets is corrected to take into account inefficiencies and anisotropies in the detectors and reconstruction stages [47].
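The jet requirements above can be sketched as follows, assuming jets and the lepton are represented as plain dictionaries with `pt`, `eta` and `phi` keys. This representation and the function names are hypothetical; the thresholds are the ones quoted in the text.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def select_jets(jets, lepton, min_pt=30.0, max_abs_eta=2.4, min_dr=0.4):
    """Keep jets that pass the kinematic cuts and do not overlap the lepton."""
    return [
        jet
        for jet in jets
        if jet["pt"] > min_pt
        and abs(jet["eta"]) < max_abs_eta
        and delta_r(jet["eta"], jet["phi"], lepton["eta"], lepton["phi"]) > min_dr
    ]
```

Wrapping the azimuthal difference matters for objects on either side of φ = ±π, which would otherwise appear to be far apart.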
Jets originating from the hadronization of a b quark are identified (b-tagged) using the combined secondary vertex v2 (CSVv2) algorithm [48]. The candidate b jets must pass the nominal jet selections, as well as a working point of the CSVv2 algorithm chosen to give a b tagging efficiency of ≈75% for b quark jets and a misidentification probability of 1% for u, d, s quark and gluon jets. Exactly one jet that passes these criteria must be present in an event to be used in the analysis. The b tagging efficiencies and misidentification probability are corrected in simulation to match those observed in data.
No selection requirements are made on the $p_{\mathrm{T}}^{\mathrm{miss}}$ of the event.

Analysis strategy
Events used in the final fit are classified into three distinct analysis regions, one signal region and two control regions. Along with the requirements on leptons and b tagging, an event must contain exactly three jets to be selected in the signal region (3j).
Two control regions are defined such that they are enhanced in the leading backgrounds of the analysis. To keep the regions as kinematically similar to the signal as possible, the selection requirements applied to these regions are identical to those of the signal region, with the exception of the number of selected jets. The first such region contains events with exactly two jets (2j), and is enhanced in the W+jets and QCD multijet backgrounds. The second contains events with exactly four jets (4j), and is enhanced in the $\mathrm{t\bar{t}}$ background.
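The region assignment can be summarized in a short sketch. It assumes the lepton requirements have already been applied and that the numbers of selected jets and of b-tagged jets are available per event; the function name is hypothetical.

```python
def analysis_region(n_jets, n_btags):
    """Return '2j', '3j' (signal region) or '4j', or None if the event is rejected."""
    if n_btags != 1:  # exactly one b-tagged jet is required in every region
        return None
    return {2: "2j", 3: "3j", 4: "4j"}.get(n_jets)
```

Because the regions differ only in jet multiplicity, every selected event falls into exactly one of them.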
Normalized distributions (templates) and normalization estimates for all processes are taken directly from simulation, with the exception of the W+jets and QCD multijet backgrounds. In the case of the W+jets background, templates are taken from simulation but with the normalization corrected using data to account for the observed mismodelling of jet composition in simulation with respect to data. For the QCD multijet background, mismodelling of both genuine leptons produced in hadron decays and of photon conversions and other objects incorrectly identified as leptons (collectively referred to as nonprompt leptons) makes it necessary to extract both templates and normalization estimates directly from data.
By far the largest contribution to the QCD multijet background arises when a jet contains a nonprompt lepton and therefore passes the signal selection requirements. In order to model this background, a sample enriched in these nonprompt leptons is defined. By inverting the isolation requirement on the selected lepton, a sample that is dominated by the QCD multijet background can be created that is as kinematically similar to the desired analysis regions as possible whilst remaining statistically independent. Templates to be used in the final fit of the analysis regions are extracted from these events. A small contribution of $\mathrm{t\bar{t}}$ events is found in this sample; it is estimated from simulation and subtracted before use.
The normalizations of both the QCD multijet and W+jets backgrounds are then estimated together using a binned likelihood fit on a distribution that has good separating power between the two processes. The chosen distribution is the transverse mass $m_{\mathrm{T}}^{\mathrm{W}}$ of the reconstructed leptonically decaying W boson candidate, defined as
$$ m_{\mathrm{T}}^{\mathrm{W}} = \sqrt{2\, p_{\mathrm{T}}^{\ell}\, p_{\mathrm{T}}^{\mathrm{miss}} \left[ 1 - \cos\left( \phi_{p_{\mathrm{T}}^{\mathrm{miss}}} - \phi_{\ell} \right) \right]}, $$
where $p_{\mathrm{T}}^{\ell}$ is the lepton $p_{\mathrm{T}}$, and $\phi_{p_{\mathrm{T}}^{\mathrm{miss}}}$ and $\phi_{\ell}$ are the azimuthal angles of the $\vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}$ and lepton, respectively. In events with a real W boson, such as the W+jets background, this distribution peaks at the W boson mass, whereas backgrounds with no real W boson, such as QCD multijet, exhibit a falling distribution that peaks at zero. To avoid potential bias, the fit is carried out in a sample that is enhanced in W+jets and QCD multijet events but statistically independent from the analysis regions, namely a sample with exactly two jets, neither of which passes the b tagging requirements. All other backgrounds are fixed to the values obtained from simulation. Correction factors for both the W+jets and QCD multijet processes are calculated by comparing the results of the fit with initial yield estimates taken directly from simulation. These correction factors are then applied to the expected yields from simulation in each analysis region to estimate the normalization of the two backgrounds.
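As an illustration, the transverse mass can be computed as in the sketch below, assuming the standard definition $m_{\mathrm{T}} = \sqrt{2 p_{\mathrm{T}} p_{\mathrm{T}}^{\mathrm{miss}} (1 - \cos\Delta\phi)}$ with momenta in GeV and angles in radians. The function and argument names are hypothetical.

```python
import math


def transverse_mass(lep_pt, lep_phi, met_pt, met_phi):
    """Transverse mass of the lepton plus missing-momentum system."""
    dphi = met_phi - lep_phi  # cos() is periodic, so no wrapping is needed
    return math.sqrt(2.0 * lep_pt * met_pt * (1.0 - math.cos(dphi)))
```

A lepton and missing momentum of 40 GeV each give 80 GeV when back to back and zero when collinear, which is why events without a real W boson tend to populate low values.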
The uncertainty in extrapolating the correction factors to the analysis regions is assessed by performing the $m_{\mathrm{T}}^{\mathrm{W}}$ fit to the analysis regions (rather than the no-b-tag sample), and treating the difference as the uncertainty. Both this and the uncertainty from the fit are included in the normalization uncertainty of the W+jets and QCD multijet processes in the final fit. Table 1 shows the event yields per process for each analysis region for the muon and electron channels. Figure 3 shows the $p_{\mathrm{T}}$ of the selected lepton in the signal region for the muon and electron channels, scaled to the result of the final fit.
After all selection requirements have been applied, tW signal events are selected with an efficiency of about 5%, and constitute 6% of the expected events in the signal region. In order to increase the sensitivity of the measurement, a multivariate analysis is used to distinguish this signal from the backgrounds. For this analysis, a BDT is trained to identify signal tW events from the dominant $\mathrm{t\bar{t}}$ background. The implementation of the BDT is provided by the "Toolkit for Multivariate Data Analysis" [49], and uses the gradient boosting algorithm [50]. Although a considerable fraction of the selected events in the signal region comes from QCD multijet and W+jets backgrounds, it was found that, given the relatively small number of available training events for these samples, including contributions from these backgrounds in the samples used to train the BDT did not improve the sensitivity of the result.
Table 1: The total number of events passing the event selection in each analysis region and their associated statistical uncertainties. The event yields are given for the tW signal and all major backgrounds for both the muon (upper) and electron (lower) channels. These values are provided for reference using simulation and scaled to the SM cross sections, with the exception of the QCD multijet background, which is taken from a data-based method, and the W+jets background, which uses the SM cross section corrected using a data-based method. A more precise estimation is obtained from the final fit, as described in the text. The single t background comprises the t- and s-channel single top quark processes.
The input variables to the BDT are chosen based on their ability to separate the signal from the $\mathrm{t\bar{t}}$ background and the quality of their modelling in simulation. The chosen variables exploit the only difference between a tW and $\mathrm{t\bar{t}}$ event at LO, i.e. the number of jets originating from the fragmentation of a b quark. For a $\mathrm{t\bar{t}}$ event to pass the selection criteria in the signal region, one jet must be misidentified or otherwise fail reconstruction. The loss of this jet causes various kinematic distributions to differ significantly between the two processes, and is particularly noticeable when looking at combinations of reconstructed objects from the selected events. For example, the two selected non-b-tagged jets in the event should, for tW signal events, originate from the hadronic decay of a W boson. In a $\mathrm{t\bar{t}}$ event, however, it is possible that the two jets originate from separate decays. This combinatoric uncertainty means that distributions containing combinations of these objects (angular separation ($\Delta R$), total invariant mass, etc.) differ from those of the signal. In order to extract these distributions, candidates for the two intermediate W bosons in the tW signal are reconstructed from the selected objects in each event; a leptonically decaying W boson candidate is reconstructed from the selected lepton and $\vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}$ in the event, and a hadronically decaying W boson candidate is reconstructed from the two non-b-tagged jets.
The BDT input variables, chosen to exploit a variety of these properties, are:
• mass of the hadronically decaying W boson candidate,
• invariant mass of the b-tagged jet and the sub-leading (in $p_{\mathrm{T}}$) non-b-tagged jet,
• angular separation between the two non-b-tagged jets,
• angular separation between the reconstructed leptonic W boson candidate and leading (in $p_{\mathrm{T}}$) non-b-tagged jet,
• $p_{\mathrm{T}}$ of the selected lepton,
• energy of the two non-b-tagged jets,
• angular separation between the b-tagged jet and the selected lepton,
• transverse momentum of the system made of the three jets, lepton and $\vec{p}_{\mathrm{T}}^{\,\mathrm{miss}}$.
One BDT is trained for each lepton flavour (electron and muon) in its respective signal region using a subset of the selected tW and $\mathrm{t\bar{t}}$ events as the signal and background samples, respectively. Although they are trained separately, the two BDTs share the same input variables. The trained BDT is then applied to data and simulated samples in each analysis region for its respective lepton flavour, and the produced distributions are used as templates in a likelihood fit to measure the production cross section of the tW process. In the analysis regions where these variables may not be well defined, e.g. the angular separation of the two non-b-tagged jets in the 2j control region, a default value is assigned to the input variable before the discriminant is calculated.
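The first input variable and the default-value treatment can be sketched as follows, assuming each jet is given as a `(pt, eta, phi)` tuple and treated as massless. The helper names and the default of -1.0 are hypothetical; the text does not state which default value is used.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return (energy, px, py, pz)


def invariant_mass(*vectors):
    """Invariant mass of the sum of the given four-vectors."""
    e, px, py, pz = (sum(v[i] for v in vectors) for i in range(4))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding errors


def hadronic_w_mass(non_b_jets, default=-1.0):
    """Mass of the hadronic W candidate, or a default if it cannot be built."""
    if len(non_b_jets) < 2:  # e.g. the 2j region has only one non-b-tagged jet
        return default
    jet1, jet2 = non_b_jets[:2]
    return invariant_mass(four_vector(*jet1), four_vector(*jet2))
```

For tW events the resulting mass is expected to cluster near the W boson mass, whereas in $\mathrm{t\bar{t}}$ events the two jets may come from different decays and give a broader distribution.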

Systematic uncertainties
The sources of systematic uncertainty considered in the analysis are classified as either experimental or modelling uncertainties. These systematic uncertainties are included in the signal extraction as nuisance parameters of the likelihood fit, as an effect on the normalization and/or shape of the input templates. The experimental and modelling uncertainties affect both shape and normalization, whilst the uncertainty of the luminosity measurement and background normalization uncertainties affect the normalization only. In addition, an uncertainty in the production cross section for each of the background processes is included. For the t-channel single top quark and $\mathrm{t\bar{t}}$ processes this uncertainty is taken from their respective recent CMS measurements [53, 54]. For the W+jets and QCD multijet backgrounds this uncertainty is taken from the data-based background estimation. All other backgrounds are assigned an uncertainty of 50%. The normalization uncertainties are treated as correlated across all analysis regions, with the exception of the data-based backgrounds, which are assigned uncorrelated uncertainties in each analysis region.
The modelling uncertainties originate from the choices in the generator parameters made during event simulation. These uncertainties are assessed by comparing the templates produced from the nominal samples with templates derived from alternate samples generated with variations in these parameters. These parameters include ME scale variations in the tW signal POWHEG simulation [55]. The strong coupling parameter $\alpha_{\mathrm{S}}$, which controls the factorization and renormalization scales at parton shower level, is varied to produce samples that reflect the uncertainty in both the initial- and final-state radiation produced by the tW signal and leading $\mathrm{t\bar{t}}$ background.
The $h_{\mathrm{damp}}$ parameter in POWHEG, which controls the scale of parton shower [36] matching with the ME [56], and therefore regulates the damping of real emission in NLO calculations, is varied in dedicated samples for the $\mathrm{t\bar{t}}$ background. The effect of the underlying event on the $\mathrm{t\bar{t}}$ background is estimated by varying several parameters that together control the recoil part of the event. The impact of the choice of colour reconnection model on the $\mathrm{t\bar{t}}$ background [57, 58] is also assessed in the result.
The uncertainty in the proton PDFs is taken into account by reweighting simulated events using variations of the NNPDF3.0 set [59]. The envelope of these varied weights is taken as the uncertainty in the likelihood fit.
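One possible reading of the envelope is sketched below, assuming it is built bin by bin as the largest and smallest content across the reweighted templates. The text does not spell out the construction, so this is an assumption, and the function name is hypothetical.

```python
def envelope(variations):
    """Bin-by-bin upper and lower envelope of a list of varied templates.

    variations: list of templates, each a list of bin contents of equal length.
    """
    up = [max(bins) for bins in zip(*variations)]
    down = [min(bins) for bins in zip(*variations)]
    return up, down
```

The two returned templates would then play the role of the upward and downward variations of a single PDF nuisance parameter.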
In order to assess the impact of the choice of using the DR or DS scheme when simulating the tW signal events, an alternate signal sample is generated using the DS scheme. The templates that are produced using this alternate sample are treated as the morphed templates under the DR/DS nuisance parameter.
The systematic uncertainties are applied to all relevant processes, signal and backgrounds alike, in exactly the same manner. Their associated nuisance parameters are treated as correlated between all analysis regions in which they are applicable. Where the sources differ due to the lepton flavour (i.e. trigger efficiencies, lepton scale uncertainties), the three regions of each lepton flavour are correlated with each other, but uncorrelated from the regions of opposite flavour. The data-based background uncertainties are uncorrelated across all regions.
For the case of nuisance parameters that change the shape of the input templates, the morphed templates are smoothed with a polynomial fit in order to avoid unrealistic constraints originating from statistical fluctuations. The contribution of each systematic source to the total uncertainty of the result is displayed in Table 2.

Results
A binned likelihood fit is performed on the BDT discriminants in order to extract the tW production cross section. All regions in the muon and electron channels are fit simultaneously to produce the result, with systematic uncertainties included as nuisance parameters in the fit.
The likelihood used in the statistical analysis, L(σ, θ), is a function of the measured signal cross section σ and a set of nuisance parameters θ that parameterise the systematic uncertainties and are associated with log-normal priors. The number of events in each bin of the input templates is assumed to be described by a Poisson distribution, and is a function of the number of predicted background events, µ, and θ. The best value for µ is then found by maximising the likelihood with respect to all of its parameters. The impact of each source of systematic uncertainty is assessed by performing the fit with the remaining nuisance parameters held constant.
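The structure of such a fit can be illustrated with a toy example. It is not the likelihood used in the analysis: it assumes a single nuisance parameter with a unit Gaussian constraint that scales the background log-normally, a free multiplicative signal scale, and a brute-force grid scan in place of a proper minimiser. All names, the value of `kappa`, and the template contents are hypothetical.

```python
import math


def nll(signal_scale, theta, data, signal, background, kappa=1.10):
    """Negative log-likelihood up to constants: Poisson terms plus a constraint."""
    total = 0.5 * theta * theta  # unit Gaussian constraint on theta
    for n, s, b in zip(data, signal, background):
        # kappa ** theta gives the background a log-normal normalization uncertainty
        expected = signal_scale * s + b * kappa ** theta
        total += expected - n * math.log(expected)
    return total


def grid_fit(data, signal, background, kappa=1.10):
    """Scan a fixed grid and return the (signal_scale, theta) with the lowest NLL."""
    scales = [i * 0.01 for i in range(301)]      # 0.00 to 3.00
    thetas = [i * 0.05 for i in range(-60, 61)]  # -3.0 to 3.0
    best = min(
        (nll(r, t, data, signal, background, kappa), r, t)
        for r in scales
        for t in thetas
    )
    return best[1], best[2]


# Toy templates; the "data" are set equal to the prediction so that the
# fit should return a signal scale of one and an unshifted nuisance parameter.
signal = [5.0, 10.0, 20.0]
background = [100.0, 80.0, 40.0]
data = [s + b for s, b in zip(signal, background)]
best_scale, best_theta = grid_fit(data, signal, background)
```

Because the signal and background templates have different shapes, the scan can separate a change in the signal scale from a shift of the background normalization, which is the reason a discriminating variable such as the BDT output is fitted.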
The measured tW production cross section is 89 ± 4 (stat) ± 12 (syst) pb. The total observed uncertainty on the measurement is 15%, compared to an expected uncertainty of 17%. This result is compatible with both the SM predictions for the process of 71.7 ± 1.8 (scale) ± 3.4 (PDF) pb at NNLO in QCD [17], and $79.5^{+1.9}_{-1.8}$ (scale) $^{+2.0}_{-1.4}$ (PDF) pb at aN$^3$LO [18]. This corresponds to an excess of signal over the background-only hypothesis that exceeds 5 standard deviations, and is therefore the first observation of the tW channel in the single-lepton final state. Figure 4 shows the BDT discriminant for the signal and control regions scaled to the output of the fit.
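The compatibility with the predictions can be gauged with a rough calculation, sketched here under the assumption that all uncertainties are Gaussian, uncorrelated, and can be added in quadrature. For the asymmetric aN$^3$LO uncertainties the upward components are used because the measurement lies above the prediction. This is an illustrative estimate and not a procedure taken from the analysis.

```python
import math


def pull(measured, measured_uncs, predicted, predicted_uncs):
    """Difference between measurement and prediction in units of the total uncertainty."""
    total = math.sqrt(sum(u * u for u in measured_uncs + predicted_uncs))
    return (measured - predicted) / total


pull_nnlo = pull(89.0, [4.0, 12.0], 71.7, [1.8, 3.4])
pull_an3lo = pull(89.0, [4.0, 12.0], 79.5, [1.9, 2.0])
```

Under these assumptions the measurement lies about 1.3 and 0.7 standard deviations above the two predictions, respectively.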

Summary
The first observation of the associated production of a single top quark and a W boson in the single-lepton channel containing a muon or electron and jets is presented. The cross section is extracted using a binned likelihood fit of the discriminant from a boosted decision tree designed to separate the signal from the dominant top quark and antiquark pair background. The analysis is performed using proton-proton collision data at a centre-of-mass energy of 13 TeV recorded by the CMS detector at the LHC corresponding to an integrated luminosity of 36 fb$^{-1}$.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid and other centres for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC, the CMS detector, and the supporting computing infrastructure provided by the follow-

References
[18] N. Kidonakis and N. Yamanaka, "Higher-order corrections for tW production at high-energy hadron colliders", JHEP 05 (
[24] "HEPData record for this analysis", 2021. doi:10.17182/hepdata.102957.
[25] M. Cacciari, G. P. Salam, and G. Soyez, "The anti-$k_{\mathrm{T}}$ jet clustering algorithm", JHEP 04 (2008) 063, doi:10.1088/1126-6708/2008/04/063, arXiv:0802.1189.