Measurement of the Production Cross Section for Single Top Quarks in Association with W Bosons in Proton-Proton Collisions at √s = 13 TeV

A measurement is presented of the associated production of a single top quark and a W boson in proton-proton collisions at √s = 13 TeV by the CMS Collaboration at the CERN LHC. The data collected correspond to an integrated luminosity of 35.9 fb−1. The measurement is performed using events with one electron and one muon in the final state along with at least one jet originating from a bottom quark. A multivariate discriminant, exploiting the kinematic properties of the events, is used to separate the signal from the dominant tt̄ background. The measured cross section of 63.1 ± 1.8 (stat) ± 6.4 (syst) ± 2.1 (lumi) pb is in agreement with the standard model expectation.


Introduction
Single top quarks, observed for the first time by the D0 [1] and CDF [2] Collaborations at the Fermilab Tevatron, are produced via the electroweak interaction. There are three main production modes in proton-proton (pp) or proton-antiproton (pp̄) collisions: the exchange of a virtual W boson (t channel), the production and decay of a virtual W boson (s channel), and the associated production of a top quark and a W boson (tW channel).
The tW process at the CERN LHC provides a unique opportunity to study the standard model (SM) and its extensions through the interference of the process at next-to-leading order (NLO) with top quark pair (tt̄) production [3][4][5]. The tW process also plays an important role because of its sensitivity to the physics beyond the SM [6][7][8].
The tW production rate in pp collisions at the Tevatron was negligible but at the LHC this process makes a significant contribution to single top quark production. The CMS and ATLAS Collaborations have presented evidence for [9, 10] and observations of [11,12] this process in pp collisions at √ s = 7 and 8 TeV, respectively. The ATLAS Collaboration has also measured the production cross section using 13 TeV data [13].
The tW production cross section is computed at an approximate next-to-next-to-leading order (NNLO). The corresponding theoretical prediction for the tW cross section in pp collisions at √s = 13 TeV, assuming a top quark mass (m t ) of 172.5 GeV, is σ ref tW = 71.7 ± 1.8 (scale) ± 3.4 (PDF) pb [14]. The first uncertainty refers to the factorization (µ F ) and renormalization (µ R ) scales in quantum chromodynamics (QCD), and the second to parton distribution functions (PDFs). The quoted cross section includes the charge-conjugate modes. The leading-order (LO) Feynman diagrams for tW production are shown in figure 1.
Figure 1. Leading-order Feynman diagrams for single top quark production in the tW channel that implicitly include the charge-conjugate contributions.
This paper reports the first measurement from the CMS Collaboration of tW production in pp collisions at √s = 13 TeV. The measurement uses data recorded by CMS during 2016, corresponding to an integrated luminosity of 35.9 ± 0.9 fb −1 . The analysis is performed using the e ± µ ∓ dilepton channel, in which both W bosons, either produced in association with the top quark or from the decay of the top quark, decay leptonically into a muon or an electron (ℓ), and a neutrino. Events with W bosons decaying into τ leptons that decay into electrons or muons also contribute to the measurement. The primary background to tW production in this final state comes from tt production, with Drell-Yan (DY) production of τ lepton pairs that decay leptonically being the next most significant background. To extract the signal, the analysis uses a multivariate technique, exploiting kinematic observables to distinguish the tW signal from the dominant tt background.
The paper is structured as follows. Section 2 provides a summary of the CMS detector and of the Monte Carlo (MC) event simulation. The object and event selection criteria are discussed in section 3. The description of the method used to separate the tW signal from the tt background is given in section 4. The sources of systematic uncertainties are discussed in section 5. The extraction of the tW production cross section is described in section 6, and a summary of the results is presented in section 7.

JHEP10(2018)117
transverse to the beams. A two-level trigger system selects the most interesting pp collisions for offline analysis. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in ref. [15].
The tW signal is simulated at NLO using powheg v1 [16] with the NNPDF 3.0 PDF set [17], and pythia v8.205 [18,19], with the underlying event tune CUETP8M1 [20,21], is used for parton showering and hadronization. At NLO in perturbative QCD, tW production interferes with tt production [3][4][5]. Two different procedures can be used to account for this interference: the "diagram removal" (DR) [3] approach, where all NLO diagrams that are doubly resonant are excluded from the signal definition; and the "diagram subtraction" (DS) [3,22] approach, in which the differential cross section is modified with a gauge-invariant subtraction term, which locally cancels the contribution of tt diagrams. The DR scheme is used here, and it has been verified that the number of predicted events after the full selection is comparable with that obtained from the DS approach.
The NLO powheg v2 [23] setup is used to simulate tt events, as well as the dependency of the tt production on m t , µ R and µ F , and the PDF set. The NNPDF 3.0 set is used as the default PDF set. Parton showering and hadronization for the tt events are handled by pythia v8.205 with the underlying event tune CUETP8M2T4 [24].
Background contributions from processes other than tt are also estimated from MC simulations. The DY and W+jets background samples are generated at NLO with MadGraph5 amc@nlo v2.2.2 [25] with NNPDF 3.0 PDFs, interfaced with pythia v8.205, with the CUETP8M1 underlying event tune for fragmentation and hadronization. These processes are simulated with up to two additional partons and the FxFx scheme [26] is used for the merging. The contributions from WW, WZ, and ZZ (referred to as VV) processes are simulated at LO with pythia v8.205 with the CUETP8M1 underlying event tune. Other contributions from W and Z boson production in association with tt events (referred to as ttV) are simulated at NLO using MadGraph5 amc@nlo v2.2.2 and interfaced with pythia v8.205 with the CUETP8M1 underlying event tune. Finally, the tt and W+jets samples described above, in the lepton+jets final state, are used to determine the contribution to the background from events with a jet incorrectly reconstructed as a lepton or with a lepton incorrectly identified as being isolated. These last contributions to the background are labeled non-W/Z as they contain a lepton candidate that does not originate from a leptonic decay of a gauge boson.
For comparison with the measured distributions, the event yields in the simulated samples are normalized using the integrated luminosity and their theoretical cross sections. These are taken from NNLO (W+jets and DY [27]), approximate NNLO (single top quark tW channel [14]), and NLO (diboson [28]) calculations. For the simulated tt sample, the full NNLO plus next-to-next-to-leading-logarithmic accuracy calculation [29], performed with the Top++ 2.0 program [30], is used. The PDF uncertainty is added in quadrature to the uncertainty associated with the strong coupling constant (α S ) to obtain a tt production cross section of 832 +20/−29 (scale) ± 35 (PDF+α S ) pb assuming m t = 172.5 GeV. The simulated samples include additional interactions per bunch crossing (pileup), with the distribution matching that observed in data, with an average of 23 collisions per bunch crossing.
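The normalization described above amounts to scaling each simulated sample to the product of its cross section and the integrated luminosity. The following is a minimal sketch of that scaling, not the analysis code: the function name and the number of generated events are hypothetical, and only the 832 pb and 35.9 fb−1 values are taken from the text.

```python
def mc_event_weight(cross_section_pb, lumi_fb_inv, n_generated):
    """Per-event weight that scales a simulated sample to sigma * L.

    cross_section_pb: theoretical cross section in pb
    lumi_fb_inv:      integrated luminosity in fb^-1
    n_generated:      number of generated events (hypothetical input)
    """
    if n_generated <= 0:
        raise ValueError("n_generated must be positive")
    lumi_pb_inv = lumi_fb_inv * 1000.0  # 1 fb^-1 = 1000 pb^-1
    return cross_section_pb * lumi_pb_inv / n_generated


# Illustration: top quark pair sample with an assumed 100 million generated events.
weight = mc_event_weight(832.0, 35.9, 100_000_000)
```

With these inputs the expected number of produced top quark pair events is 832 pb × 35 900 pb−1, and each simulated event carries the fraction of that total given by `weight`. Selection efficiencies and data-to-simulation scale factors would multiply this weight in a fuller treatment.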

Event selection
In the SM, a top quark decays almost exclusively into a W boson and a bottom quark. The analysis uses the e ± µ ∓ decay channel, in which the W boson produced in association with the top quark and the W boson from the decay of the top quark both decay leptonically, one into an electron and the corresponding neutrino, and the other into a muon and the corresponding neutrino. This leads to a final state composed of two oppositely charged leptons, a jet resulting from the fragmentation of a bottom quark, and two neutrinos. The event selection described here follows closely that used in the measurement of the top quark-antiquark pair production cross section in the dilepton channel [31].
Events are required to pass either a dilepton or single-lepton trigger. The dilepton triggers require events to contain either one electron with transverse momentum p T > 12 GeV and one muon with p T > 23 GeV, or one electron having p T > 23 GeV and one muon with p T > 8 GeV. In addition, single-lepton triggers with one electron (muon) with p T > 27 (24) GeV are used to increase the efficiency. The efficiency for the combination of the single-lepton and dilepton triggers is measured in data events passing the dilepton selection criteria given below and collected using triggers based on the p T imbalance in the event. This efficiency is found to be ≈98%. The efficiency of the simulated trigger is corrected to match that observed in data using a multiplicative scale factor (SF).
The particle-flow (PF) algorithm [32] attempts to reconstruct and identify each individual particle in an event with an optimized combination of information from the various elements of the CMS detector. Leptons (electrons [33] or muons [34]) in the event are required to be well isolated and to have p T > 20 GeV and |η| < 2.4. Isolation requirements are based on the scalar sum of the p T of all PF candidates, reconstructed inside a cone of ∆R = √((∆η)² + (∆φ)²) = 0.3 (0.4) centered on the electron (muon), excluding the contribution from the lepton candidate. Tracks not coming from the main vertex are excluded in the calculation. This isolation variable is required to be smaller than 6 (15)% of the electron (muon) p T . Events with W bosons decaying into τ leptons contribute to the measurement only if the τ leptons decay into electrons or muons that satisfy the selection requirements. In events with more than two leptons passing the selection, the two with the largest p T are selected for further study.
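The isolation requirement can be illustrated with a short sketch. It assumes that particles are represented as dictionaries with `pt`, `eta`, `phi` (and `flavor` for leptons), and that the candidate list already excludes the lepton itself and tracks from pileup vertices; these data structures and names are illustrative and not taken from the CMS software.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    dphi = (phi1 - phi2) % (2.0 * math.pi)  # Python's % gives a value in [0, 2*pi)
    return dphi - 2.0 * math.pi if dphi > math.pi else dphi


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


# (cone size, maximum relative isolation) per lepton flavor, as quoted in the text
ISOLATION_CUTS = {"e": (0.3, 0.06), "mu": (0.4, 0.15)}


def relative_isolation(lepton, candidates, cone):
    """Scalar pT sum of candidates inside the cone, divided by the lepton pT."""
    pt_sum = sum(
        c["pt"]
        for c in candidates
        if delta_r(lepton["eta"], lepton["phi"], c["eta"], c["phi"]) < cone
    )
    return pt_sum / lepton["pt"]


def is_isolated(lepton, candidates):
    """Apply the flavor-dependent cone size and isolation threshold."""
    cone, max_rel_iso = ISOLATION_CUTS[lepton["flavor"]]
    return relative_isolation(lepton, candidates, cone) < max_rel_iso
```

For example, a 50 GeV electron with 2 GeV of nearby activity has a relative isolation of 4% and passes, while 4 GeV of activity (8%) fails the 6% requirement.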
Jets are reconstructed from the PF candidates using the anti-k T clustering algorithm [35,36] with a distance parameter of 0.4. The jet momentum is determined as the vectorial sum of all particle momenta in the jet, and on average is found from simulation to be within 5 to 10% of the true momentum over the whole p T spectrum and detector acceptance. Additional pp interactions within the same or nearby bunch crossings can contribute additional tracks and calorimetric energy depositions to the jet momentum. To mitigate this effect, tracks identified as originating from pileup vertices are discarded, and an offset correction is applied to correct for the remaining contributions. Jet energy corrections, derived from simulation, are applied so that the average response to jets matches the particle level jets [37]. In situ measurements of the momentum balance in dijet, photon+jet, Z+jet, and multijet events are used to account for any residual differences in jet energy scale (JES) between data and simulation. Additional selection criteria are applied to each jet to remove jets potentially dominated by anomalous contributions from various subdetector components or reconstruction failures. Jets are required to have p T > 30 GeV and |η| < 2.4. In order to avoid double counting, jets within a cone of ∆R = 0.4 with respect to the selected leptons are not considered. Jets passing the above identification criteria but with p T between 20 and 30 GeV are referred to as "loose jets".
The missing transverse momentum vector p miss T is defined as the negative vector sum of the momenta of all reconstructed PF candidates in an event, projected onto the plane perpendicular to the direction of the beam axis. Its magnitude is referred to as p miss T and the corrections to jet momenta are propagated to the p miss T calculation [38]. In contrast to some sources of backgrounds, such as DY events, the tW final state contains a bottom quark. The identification of jets originating from b quarks results in a significant reduction in background. Jets are identified as b jets using the combined secondary vertex algorithm v2 [39], with an operating point that yields identification efficiencies of ≈70% and misidentification (mistag) probabilities of about 1% and 15% [39] for light-flavor jets (u, d, s, and gluons) and c jets, respectively, as estimated from simulated events.
Events are classified as belonging to the e ± µ ∓ final state if the two leptons with larger p T (leading leptons) passing the above selection criteria are an electron and a muon of opposite charge. We require the leading lepton to have p T > 25 GeV. As this requirement for electrons is lower than the corresponding trigger threshold, some of the phase space is triggered only by the muon or dilepton triggers. This effect is taken into account in the measurement of the trigger efficiency. To reduce the contamination from DY production of τ lepton pairs with low invariant dilepton mass, the invariant mass of the lepton pair is required to be greater than 20 GeV. Figure 2 shows a comparison of several lepton kinematic distributions in data and simulated events after this baseline selection. Figure 3 shows a comparison of the yields observed in data with those estimated from simulated events, classified according to the number of jets and identified b jets in the event. As expected, the most signal-enriched region is the one with one jet that is tagged as a bottom jet (1j1b region), but the size of the signal in comparison with the overwhelming tt background makes a cut-based analysis extremely challenging. Therefore, a multivariate analysis is pursued.
For the final analysis, the events are classified into three independent categories: a signal-enriched region with 1j1b events, and two background-dominated regions with two jets, one with one b-tagged jet (2j1b) and one with two b-tagged jets (2j2b).
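The categorization by jet and b-tagged jet multiplicity can be sketched as follows. The jet representation (dictionaries with `pt`, `eta`, `btag`) and the function names are assumptions made for illustration; the jets are assumed to have already passed the identification criteria and the lepton overlap removal described earlier.

```python
def select_jets(jets):
    """Jets with pT > 30 GeV and |eta| < 2.4."""
    return [j for j in jets if j["pt"] > 30.0 and abs(j["eta"]) < 2.4]


def count_loose_jets(jets):
    """Jets with pT between 20 and 30 GeV and |eta| < 2.4 ("loose jets")."""
    return sum(1 for j in jets if 20.0 < j["pt"] <= 30.0 and abs(j["eta"]) < 2.4)


def classify_region(jets):
    """Return '1j1b', '2j1b' or '2j2b', or None if the event is in none of them."""
    selected = select_jets(jets)
    n_jets = len(selected)
    n_btag = sum(1 for j in selected if j["btag"])
    label = f"{n_jets}j{n_btag}b"
    return label if label in ("1j1b", "2j1b", "2j2b") else None
```

An event with one b-tagged 45 GeV jet and one 25 GeV loose jet falls in the 1j1b region, since loose jets do not enter the jet count.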

Signal extraction
As noted previously, following the baseline event selection the data sample in the 1j1b region consists primarily of tt events with a significant number of tW signal events (as can be seen from figure 3). Given that there is no single observable that clearly discriminates between the signal and background, a multivariate method is used to discriminate the tW signal from the main background process, tt. Several observables are combined into a single discriminator using a boosted decision tree (BDT) technique [40,41]. In this analysis, the BDT implementation is provided by the "Toolkit for Multivariate Data Analysis" [40] package, using the gradient boost algorithm [40,41]. The training of the BDT is performed using dedicated simulated samples for tW and tt that are statistically independent from those used for the signal extraction. The input variables used for training the BDT in the 1j1b region, listed in order of importance to the BDT training, are shown below. The order of importance is determined by counting how often each variable is used to split decision trees. The counts are weighted by the separation gain squared achieved by the variable and by the number of events in the node.
• p T of leading loose jet, set to 0 for events with no loose jets present;
• magnitude of the vector sum of the p T of leptons, jet, and p miss T (p sys T );
• p T of the jet;
• ratio of the scalar sum of the p T of the leptons to the scalar sum (H T ) of the p T of leptons, jet, and p miss T ;
• number of loose jets;
• centrality (ratio between the scalar sums of the p T and of the total momentum) of the jet and the two leptons;
• magnitude of the vector sum of the p T of the jet and leptons;
• H T ;
• ratio of p sys T to H T for the event;
• invariant mass of the combination of the leptons, jet, and p miss T ;
• number of b-tagged loose jets.
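Several of the variables listed above are simple functions of transverse momenta. The sketch below shows how p sys T, H T and the two ratios built from them could be computed, assuming each object is given as a `(pt, phi)` pair; the function and key names are hypothetical, and the remaining variables (centrality, invariant mass, loose-jet quantities) are not covered.

```python
import math


def event_variables(leptons, jet, met):
    """Compute a subset of the 1j1b BDT inputs.

    leptons: list of two (pt, phi) tuples
    jet:     (pt, phi) of the selected jet
    met:     (pt, phi) of the missing transverse momentum
    """
    objects = list(leptons) + [jet, met]
    px = sum(pt * math.cos(phi) for pt, phi in objects)
    py = sum(pt * math.sin(phi) for pt, phi in objects)
    pt_sys = math.hypot(px, py)             # magnitude of the vector sum
    h_t = sum(pt for pt, _ in objects)      # scalar sum
    lepton_pt_sum = sum(pt for pt, _ in leptons)
    return {
        "pt_sys": pt_sys,
        "h_t": h_t,
        "lepton_pt_over_ht": lepton_pt_sum / h_t,
        "pt_sys_over_ht": pt_sys / h_t,
    }
```

For two back-to-back leptons of 40 and 30 GeV, a 50 GeV jet and 20 GeV of missing transverse momentum opposite to the jet, this gives H T = 140 GeV and p sys T of about 31.6 GeV.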
The distributions of the four variables with the most discriminating power, in data and simulated events, are shown in figure 4.
A separate BDT is trained with events in the 2j1b region. The input variables used for the training, listed in order of importance to the BDT training, are the following:
• separation in the φ − η space between the dilepton and dijet systems, ∆R(e ± µ ∓ , j 1 j 2 );
• separation in the φ − η space between the dilepton system and the dijet and p miss T system, ∆R(e ± µ ∓ , j 1 j 2 p miss T );
• p T of the subleading jet;
• separation in the φ − η space between the leading lepton and the leading jet, ∆R(ℓ 1 , j 1 ).
The 2j2b control region is highly enriched with tt events and is used to constrain this main source of background using the p T distribution of the subleading jet. This variable is sensitive to JES variations and, therefore, useful in constraining this source of systematic uncertainty.
The signal is extracted by performing a maximum likelihood fit to one measured distribution in each of the three measurement regions: the distributions of the BDT output in the 1j1b and 2j1b categories, and of the p T of the subleading jet in the 2j2b region. The binning of the BDT outputs is chosen such that each bin contains approximately the same number of tt background events. This selection of binning ensures that enough background events populate all the bins of the distribution, helping to constrain the systematic uncertainties. The fit is performed simultaneously in the three regions. The uncertainties on the tt overall normalization and shapes (including migrations into/out of the signal and control regions) are handled using different nuisance parameters, one for each systematic uncertainty and for all regions.
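One way to obtain bins with approximately equal background content is to place the bin edges at quantiles of the weighted background distribution. The sketch below illustrates this idea under the assumption that bins are right-inclusive; it is not necessarily the procedure used in the analysis, and the function name is hypothetical.

```python
def equal_yield_edges(scores, weights, n_bins):
    """Bin edges such that each bin holds about 1/n_bins of the weighted yield.

    scores:  discriminant values of the simulated background events
    weights: corresponding event weights (assumed non-negative)
    """
    if n_bins < 1 or len(scores) == 0 or len(scores) != len(weights):
        raise ValueError("need n_bins >= 1 and matching, non-empty inputs")
    pairs = sorted(zip(scores, weights))
    total = sum(w for _, w in pairs)
    edges = [pairs[0][0]]
    cumulative = 0.0
    for score, weight in pairs:
        cumulative += weight
        # Add an interior edge each time another 1/n_bins of the yield is reached.
        # A single very heavy event can produce repeated edges in this simple sketch.
        while len(edges) < n_bins and cumulative >= total * len(edges) / n_bins:
            edges.append(score)
    edges.append(pairs[-1][0])
    return edges
```

With ten unit-weight events at scores 0.0, 0.1, ..., 0.9 and five bins, the edges fall after every second event, so each bin contains two events.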
The likelihood used in this statistical analysis, L(µ, θ), is a function of the signal strength, defined as the ratio of measured and expected cross sections µ = σ tW /σ ref tW , and a set of nuisance parameters, θ, that parametrize the systematic uncertainties present in the analysis. The expected numbers of both signal and background events in each bin of the distributions are obtained using normalized distributions (templates) from simulation, and are a function of θ and, in the case of the signal, µ. The likelihood function is constructed as the product of Poisson probabilities, corresponding to the number of events in each bin of the distributions. Additionally, the systematic uncertainties are introduced in the likelihood by multiplying it by the prior of each nuisance parameter, which are log-normal probability density functions.
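Written out, a likelihood with the structure described above takes the following general form. The symbols n_i (observed events in bin i), s_i and b_i (expected signal and background yields from the templates, with s_i evaluated at µ = 1) and π_k (prior of the k-th nuisance parameter) are notation introduced here for illustration and do not appear in the original text.

```latex
\mathcal{L}(\mu, \theta) =
  \prod_{i \,\in\, \text{bins}}
    \frac{\lambda_i(\mu,\theta)^{\,n_i}\, e^{-\lambda_i(\mu,\theta)}}{n_i!}
  \;\times\; \prod_{k} \pi_k(\theta_k),
\qquad
\lambda_i(\mu,\theta) = \mu\, s_i(\theta) + b_i(\theta)
```

The first product runs over all bins of the three fitted distributions, and the second over the nuisance parameters.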
The best value for µ is obtained by maximizing the likelihood function with respect to all its parameters. The 68% confidence interval is obtained by considering variations of the test statistic used in ref. [42] by one unit from its minimum.
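The interval construction can be illustrated with a deliberately simplified example: a single-bin counting experiment without nuisance parameters, assuming the test statistic is −2 ln of a likelihood ratio. The real fit profiles many nuisance parameters over several distributions, so this is only a sketch of the "one unit from the minimum" rule, with hypothetical function names and inputs.

```python
import math


def nll(mu, n_obs, signal, background):
    """Negative log-likelihood of a Poisson count (constant ln(n!) dropped)."""
    expected = mu * signal + background
    return expected - n_obs * math.log(expected)


def interval_68(n_obs, signal, background, tol=1e-9):
    """Best-fit signal strength and the points where -2 * delta(ln L) equals 1."""
    if signal <= 0 or n_obs <= 9:
        raise ValueError("sketch assumes signal > 0 and n_obs > 9")
    mu_hat = (n_obs - background) / signal  # analytic maximum for one bin
    nll_min = nll(mu_hat, n_obs, signal, background)

    def q(mu):
        return 2.0 * (nll(mu, n_obs, signal, background) - nll_min)

    def crossing(inside, outside):
        # Bisection between a point with q < 1 and a point with q > 1.
        while abs(outside - inside) > tol:
            mid = 0.5 * (inside + outside)
            if q(mid) < 1.0:
                inside = mid
            else:
                outside = mid
        return 0.5 * (inside + outside)

    step = math.sqrt(n_obs) / signal  # approximate one-sigma width
    low = crossing(mu_hat, mu_hat - 3.0 * step)
    high = crossing(mu_hat, mu_hat + 3.0 * step)
    return mu_hat, low, high
```

For 10 000 observed events with 1 000 expected signal and 9 000 expected background events, the best-fit value is µ = 1 and the interval is close to ±0.1, as expected from the Gaussian approximation √n/s.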

Systematic uncertainties
The measurement of the tW production cross section is affected by systematic uncertainties that originate from detector effects and event modeling, which can change the shape and/or the normalization of the distributions used in the fit. Each source of systematic uncertainty is assessed individually by appropriate variations of the MC simulations or by variations of parameter values in the analysis within their estimated uncertainties, and propagated to the signal strength. A nuisance parameter represents each of the sources and these parameters are used, together with the tW production cross section, as parameters in the fit.

Experimental uncertainties
The following sources of experimental uncertainty are considered in the analysis:
• The uncertainties in the trigger and lepton identification efficiencies in simulation are estimated by varying data-to-simulation SFs by their uncertainties. These are about 0.7 and 1.5%, respectively, with some dependence on the lepton p T and η. For lepton efficiencies we have two nuisance parameters, one for electrons and one for muons.
• The uncertainty due to the limited knowledge of the JES and jet energy resolution is determined by varying the scale and resolution within the uncertainties in bins of p T and η, typically by a few percent [37]. JES uncertainties are propagated to p miss T .
• The uncertainties resulting from the b tagging efficiency and misidentification rate are determined by varying, within their uncertainties, the b tagging data-to-simulation SFs of the b jets and the light-flavor jets, respectively. These uncertainties depend on the p T and η of the jet and amount to approximately 2% for b jets and 10% for mistagged jets [39], as determined in simulated tt events.
• The uncertainty assigned to the number of pileup events in simulation is obtained by changing the inelastic pp cross section, which is used to estimate the pileup in data, by ±4.6% [43].
• The uncertainty in the integrated luminosity is estimated to be 2.5% [44].
Given that jets produced in tt or tW events, regardless of the jet multiplicity of the event, are expected to belong to the same kinematical regime, JES, b tagging efficiency and misidentification rate are each covered by one single nuisance parameter.

Modeling uncertainties
It is important for the measurement that the modeling of the tW signal and tt background events is well understood. The impact of the theoretical assumptions in the modeling is determined by building the templates with dedicated simulation samples of tW and tt events. These samples are produced by varying the parameters from those of the standard powheg +pythia simulations.
The uncertainty in the modeling of the hard-production process is assessed by changing independently µ R and µ F in the powheg sample by factors of 2 and 0.5 relative to their common nominal value, which is set in powheg to µ R = µ F = √(m t ² + p T,t ²), where p T,t denotes the transverse momentum of the top quark in the tt rest frame.
To account for the parton shower (PS) and fragmentation uncertainties, different effects are studied:
• Underlying event: pythia parameters that are tuned to the measurements of the underlying event [21,24], to account for non-perturbative QCD effects, are varied up and down within their uncertainties in simulated tt events.
• Matrix element/PS matching: the uncertainty in the combination of the matrix-element calculation with the parton shower in simulated tt events is estimated from the variation of the powheg parameter h damp = 1.58 +0.66/−0.59 m t [24], which regulates the damping of real emissions in the NLO calculation when matching to the PS [21].
• Initial-(final-) state radiation scale: the PS scale used for the simulation of the initial-(final-) state radiation is varied up and down by a factor of two. These variations are motivated by the uncertainties in the PS tuning [21].
• Color reconnection: the effect of multiple parton interactions and the parameterization of color reconnection have been studied in ref. [24] and are varied accordingly in simulated tt events. In addition, we use a simulation including color reconnection of early resonant decays. The uncertainties that arise from ambiguities in modeling color-reconnection effects are estimated by comparing the default model in pythia with two alternative models of color reconnection, a model with string formation beyond leading color [45] and a model in which the gluons can be moved to another string [46]. All models are tuned to measurements of the underlying event [21,24]. The largest variation in each bin with respect to the nominal yield is taken as the systematic uncertainty.
• The uncertainty from the choice of PDFs is determined by reweighting the sample of simulated tt events according to the 100 NNPDF3.0 replicas [17]. For each bin, the root-mean-square of the variation in the acceptance for all the PDF sets is taken as an uncertainty. In order not to lose robustness in the fit, a single nuisance parameter is used.
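Two of the per-bin prescriptions in the list above, the largest variation among alternative color-reconnection models and the root-mean-square over PDF replicas, can be sketched as below. Templates are represented as plain lists of bin yields, the function names are hypothetical, and the root-mean-square is taken with respect to the nominal template, which is an assumption about the exact convention.

```python
import math


def envelope_uncertainty(nominal, alternatives):
    """Per-bin largest absolute deviation of any alternative from the nominal."""
    return [
        max(abs(alt[i] - nom) for alt in alternatives)
        for i, nom in enumerate(nominal)
    ]


def replica_rms(nominal, replicas):
    """Per-bin root-mean-square deviation of the replicas from the nominal."""
    n_rep = len(replicas)
    return [
        math.sqrt(sum((rep[i] - nom) ** 2 for rep in replicas) / n_rep)
        for i, nom in enumerate(nominal)
    ]
```

For a nominal bin yield of 100 and alternative models giving 97 and 104, the envelope uncertainty in that bin is 4 events.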
Additionally, the difference between the DS and DR schemes is taken as a source of systematic uncertainty in the signal.
Finally, in order to extract the inclusive cross section from the measurement in the visible phase space, an extrapolation from the visible to the total phase space is needed. This avoids constraining shape-related systematic uncertainties outside the observable phase space (which enter the fit as normalizations). This extrapolation is made by determining the signal acceptance from simulation. The effect of the signal modeling uncertainties in the acceptance is taken into account as an additional source of systematic uncertainty uncorrelated with all the effects described above and added in quadrature to the total uncertainty obtained in the fit.
Measurements of the differential cross section for top quark pair production have shown that the momentum of the top quark is softer than predicted by the powheg simulation [52,53]. The effect of this mismodelling of the p T spectrum was estimated by reweighting the simulation, and found to have a negligible effect. The difference in the predictions of the NLO generators powheg and MadGraph5 amc@nlo for tW and tt production, where both use pythia for hadronization, fragmentation, and additional radiation description, was estimated and found to be negligible with respect to the modeling uncertainties already assigned.

Background normalization uncertainties
For tt a normalization uncertainty of 5% is used. This takes into account effects coming from µ R and µ F scales, PDFs and α S in the NNLO calculation [29]. For DY and non-W/Z backgrounds, a normalization uncertainty of ±50% is assumed. This value is motivated by the precision of estimation methods using control regions in data, which are found to be compatible with the predictions from the simulation. For ttV and VV backgrounds, an uncertainty of 50% is also used. This value reflects the uncertainties in the corresponding predicted cross sections but is increased to account for the uncertainties due to the extrapolation of the inclusive cross section into the phase space used in the analysis. The overall uncertainty is not changed significantly by varying this uncertainty.
Comparisons of the final distributions of the BDT discriminants in the 1j1b and 2j1b regions, as well as the distribution of the subleading jet p T in the 2j2b region for data and simulated events, are shown in figure 5. The numbers of expected events for signal and tt obtained before the fit (prefit) and after the fit (postfit) are shown in table 1.
Several nuisance parameters (JES, tt modeling) are significantly constrained due to their effect on the jet multiplicity and the input distributions used in the fit. The tt normalization is also constrained due to the large presence of tt in the different regions.
The impact of each source of systematic uncertainty in the fit, shown in table 2, is evaluated by performing the fit, fixing the rest of the nuisance parameters to their postfit value. We take the difference in quadrature between the uncertainty of the fit with all the nuisance parameters except the one under study fixed to the postfit value, and the uncertainty of the fit with all the nuisances fixed to the postfit value. The uncertainties in the luminosity and in the trigger and lepton efficiencies lead to uncertainties in the background, which is dominant in all bins of the fit. Therefore, these uncertainties make a sizable contribution to the uncertainty in the final measurement.
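The quadrature difference used to quote the impact of a single source can be written as a one-line function. The names are hypothetical; the inputs are the uncertainty on the signal strength from a fit with only the nuisance parameter under study left free, and from a fit with all nuisance parameters fixed.

```python
import math


def impact_in_quadrature(unc_one_free, unc_all_fixed):
    """sqrt(unc_one_free^2 - unc_all_fixed^2), clipped at zero for numerical noise."""
    diff = unc_one_free ** 2 - unc_all_fixed ** 2
    return math.sqrt(max(diff, 0.0))
```

For instance, fit uncertainties of 5% and 3% would correspond to an impact of 4% for the source under study.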

Summary
The data recorded by CMS at 13 TeV, corresponding to an integrated luminosity of 35.9 ± 0.9 fb −1 , are used to measure the tW production cross section in the e ± µ ∓ channel, classifying the events in terms of the number of jets and jets originating from bottom quarks. The signal is measured using a maximum likelihood fit to the distribution of boosted decision tree discriminants in two of the categories, and to the p T distribution of the second jet with highest p T in a third category. The measured cross section for tW production is found to be 63.1 ± 1.8 (stat) ± 6.4 (syst) ± 2.1 (lumi) pb, achieving a relative uncertainty of 11%. This is the first measurement of this process by the CMS Collaboration at √ s = 13 TeV. The measured cross section is in agreement with the standard model prediction of σ ref tW = 71.7 ± 1.8 (scale) ± 3.4 (PDF) pb and with a similar measurement by the ATLAS Collaboration [13].
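As a cross-check of the quoted relative uncertainty, and assuming the three uncertainty components are combined in quadrature, the numbers above give:

```latex
\sqrt{1.8^2 + 6.4^2 + 2.1^2}\ \text{pb} = \sqrt{48.61}\ \text{pb} \approx 7.0\ \text{pb},
\qquad
\frac{7.0\ \text{pb}}{63.1\ \text{pb}} \approx 0.11
```

The ratio of the measured to the predicted central value, which corresponds to the signal strength defined in the signal extraction, is

```latex
\mu = \frac{\sigma_{tW}}{\sigma^{\text{ref}}_{tW}} = \frac{63.1}{71.7} \approx 0.88
```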

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our

References
[37] CMS collaboration, Jet algorithms performance in 13 TeV data, CMS-PAS-JME-16-003 (2017).
[38] CMS collaboration, Performance of missing energy reconstruction in √s = 13 TeV pp collision data using the CMS detector, CMS-PAS-JME-16-004 (2016).
[53] CMS collaboration, Measurement of differential cross sections for top quark pair production using the lepton+jets final state in proton-proton collisions at 13 TeV, Phys. Rev.