Measurement of the tt production cross section, the top quark mass, and the strong coupling constant using dilepton events in pp collisions at √ s = 13 TeV

A measurement of the top quark–antiquark pair production cross section σ tt in proton–proton collisions at a centre-of-mass energy of 13 TeV is presented. The data correspond to an integrated luminosity of 35.9 fb −1 , recorded by the CMS experiment at the CERN LHC in 2016. Dilepton events (e ± μ ∓ , μ + μ − , e + e − ) are selected and the cross section is measured from a likelihood fit. For a top quark mass parameter in the simulation of m MC t = 172.5 GeV the fit yields a measured cross section σ tt = 803 ± 2 (stat) ± 25 (syst) ± 20 (lumi) pb, in agreement with the expectation from the standard model calculation at next-to-next-to-leading order. A simultaneous fit of the cross section and the top quark mass parameter in the powheg simulation is performed. The measured value of m MC t = 172.33 ± 0.14 (stat) $^{+0.66}_{-0.72}$ (syst) GeV is in good agreement with previous measurements. The resulting cross section is used, together with the theoretical prediction, to determine the top quark mass and to extract a value of the strong coupling constant with different sets of parton distribution functions.


Introduction
Measurements of the top quark-antiquark pair cross section σ tt in proton-proton (pp) collisions provide important tests of the standard model (SM). At the CERN LHC, measurements with increasing precision have been performed by the ATLAS and CMS Collaborations in several different decay channels and at four pp collision energies [1][2][3][4][5]. Precise theoretical predictions of σ tt have been performed in perturbative quantum chromodynamics (QCD) at next-to-next-to-leading order (NNLO) [6][7][8][9]. The calculations depend on several fundamental parameters: the top quark mass m t , the strong coupling constant α S , and the parton distribution functions (PDFs) of the proton. The measurements of σ tt have been used to determine the top quark pole mass [1,4,10-12], α S [4,13], and the PDFs [14][15][16][17].
G. Vesztergombi, A. C. Benvenuti: Deceased. ⋆ e-mail: cms-publication-committee-chair@cern.ch
The value of m t significantly affects the prediction for many observables, either directly or via radiative corrections. It is a key input to electroweak precision fits [18] and, together with the value of the Higgs boson mass and α S , it has direct implications on the SM predictions for the stability of the electroweak vacuum [19]. In QCD calculations beyond leading order, m t depends on the renormalization scheme. In the context of the σ tt predictions, the pole (on-shell) definition for the top quark mass m pole t has wide applications; however, it suffers from the renormalon problem that introduces a theoretical ambiguity in its definition. The minimal subtraction (MS) renormalization scheme has been shown to have a faster convergence than other schemes [20]. The relation between the pole and MS masses is known to the four-loop level in QCD [21]. Experimentally, the most precise measurements of the top quark mass are obtained in so-called direct measurements performed at the Tevatron and LHC [22][23][24][25]. Except for a few cases such as Ref. [26], the measurements rely on Monte Carlo (MC) generators to provide the relation between the top quark mass and an experimental observable. Current MC generators implement matrix elements at leading or next-to-leading order (NLO), while higher orders are simulated through parton showering. Studies suggest that the top quark mass parameter m MC t , as implemented in current MC generators, corresponds to m pole t to an uncertainty on the order of 1 GeV [27,28]. A theoretically well-defined mass can be determined by comparing the measured tt cross section to the fixed-order theoretical predictions [1,4,10-12].
With the exception of the quark masses, α S is the only free parameter in the QCD Lagrangian. While the renormalization group equation predicts the energy dependence of α S , i.e. it gives a functional form for α S (Q), where Q is the energy scale of the process, actual values of α S can only be obtained from experimental data. By convention and to facilitate comparisons, α S values measured at different energy scales are typically evolved to Q = m Z , the mass of the Z boson. The current world-average value for α S (m Z ) is 0.1181±0.0011 [29]. In spite of this relatively precise result, the uncertainty in α S still contributes significantly to many QCD predictions, including cross sections for top quark or Higgs boson production. Very few measurements allow α S to be tested at high Q, and the precision on the world-average value for α S (Q) is driven by low-Q measurements. A determination of σ tt was used by the CMS Collaboration to extract the value of α S (m Z ) at NNLO for the first time [11]. In the prediction for σ tt , α S appears not only in the expression for the parton-parton interaction but also in the QCD evolution of the PDFs. Varying the value of α S (m Z ) in the σ tt calculation therefore requires a consistent modification of the PDFs. The full correlation between the gluon PDF, α S , and m t in the prediction for σ tt has to be accounted for.
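The evolution of α S with the scale Q can be illustrated with a minimal sketch. It assumes the leading-order (one-loop) solution of the renormalization group equation with a fixed number of five active flavours, which is far simpler than the NNLO treatment used in the analysis; the function name and the Z boson mass value are assumptions introduced here for illustration.

```python
import math

# Assumed Z boson mass in GeV (not quoted in the text above).
M_Z = 91.1876


def alpha_s_one_loop(q, alpha_s_mz=0.1181, n_flavours=5):
    """Evolve alpha_S from m_Z to the scale q (GeV) at one loop.

    Uses alpha_S(Q) = alpha_S(m_Z) / (1 + b0 * alpha_S(m_Z) * ln(Q^2 / m_Z^2))
    with b0 = (33 - 2 * n_f) / (12 * pi). Flavour thresholds are ignored.
    """
    b0 = (33.0 - 2.0 * n_flavours) / (12.0 * math.pi)
    log_term = math.log(q * q / (M_Z * M_Z))
    return alpha_s_mz / (1.0 + b0 * alpha_s_mz * log_term)


if __name__ == "__main__":
    for scale in (M_Z, 172.5, 1000.0):
        print(f"Q = {scale:8.2f} GeV  alpha_S = {alpha_s_one_loop(scale):.4f}")
```

The sketch only shows the qualitative decrease of α S with increasing Q; it is not suitable for a quantitative extraction.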
The analysis uses events in the dileptonic decay channels in which the two W bosons from the electroweak decays of the two top quarks each produce an electron or a muon, leading to three event categories: e ± μ ∓ , μ + μ − , and e + e − . The data set was recorded by CMS in 2016 at a centre-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 35.9 fb −1 . The measurement is performed using a maximum-likelihood fit in which the sources of systematic uncertainty are treated as nuisance parameters. Distributions of observables are chosen as input to the fit so as to further constrain the uncertainties. The fitting procedure largely follows the approach of Ref. [4]. In this analysis, the number of events is significantly larger than in previous data sets, thus providing tighter constraints. The dominant uncertainties come from the integrated luminosity and the efficiency to identify the two leptons. The correlation between the three decay channels is used to constrain the overall lepton identification uncertainty to that of the better-constrained lepton, which is the muon.
Experimentally, the measured value of σ tt has a residual dependence on the value of m MC t used in the simulation to estimate the detector efficiency and acceptance. In contrast, the experimental dependence of σ tt on the value of α S (m Z ) used in the simulation is negligible [11]. For the extraction of a theoretically well-defined m t , the dependence of the cross section on the assumption of a m MC t value can be reduced by including m MC t as an additional free parameter in the fit [30]. In this paper, the cross section σ tt is first measured for a fixed value of m MC t = 172.5 GeV, and then determined simultaneously with m MC t . In the simultaneous fit, input distributions sensitive to the top quark mass are introduced in order to constrain m MC t . For the measured parameter m MC t , the same systematic uncertainties are taken into account as in Ref. [31]. Finally, the measured value of σ tt at the experimentally constrained value of m MC t is used to extract α S (m Z ) and m t in the MS scheme, using different PDF sets. For m t , the pole mass scheme is also considered.
The paper is structured as follows. After a brief description of the CMS experiment and the MC event generators in Sect. 2, the event selection is presented in Sect. 3. The event categories and the maximum-likelihood fit are explained in Sect. 4. The systematic uncertainties in the measurement are discussed in Sect. 5. The result of the cross section measurement at a fixed value of m MC t = 172.5 GeV is presented in Sect. 6, and the simultaneous measurement of σ tt and m MC t is presented in Sect. 7. The extraction of m t and α S in the MS scheme and the top quark pole mass are described in Sects. 8 and 9, respectively, and a summary is given in Sect. 10.

The CMS detector and Monte Carlo simulation
The central feature of the CMS apparatus [32] is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter, and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. These are used to identify electrons, photons, and jets. Forward calorimeters extend the pseudorapidity coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. The detector is nearly hermetic, providing reliable measurement of the momentum imbalance in the plane transverse to the beams. A two-level trigger system selects interesting events for offline analysis [33]. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [32].
The powheg v2 [34][35][36] NLO MC generator is used to simulate tt events [37] and its model dependencies on m MC t , the PDFs [37], and the renormalization and factorization scales, $\mu_r = \mu_f = m_T = \sqrt{m_t^2 + p_T^2}$, where m t is the pole mass and p T is the transverse momentum of the top quark. The PDF set NNPDF3.0 [38] is used to describe the proton structure. The parton showers are modelled using pythia 8.2 [39] with the CUETP8M2T4 underlying event (UE) tune [40,41]. In this analysis, tt events are split into a signal and a background component. The signal consists of dilepton events and includes contributions from leptonically decaying τ leptons. All other tt events are considered as background.
Contributions to the background include single top quark processes (tW), Drell-Yan (DY) events (Z/γ * +jets), and W+jets production, as well as diboson (VV) events (including WW, WZ, and ZZ) with multiple jets, while the contribution from QCD multijet production is found to be negligible. The DY and tW processes are simulated in powheg v2 [42][43][44] with the NNPDF3.0 PDF and interfaced to pythia 8.202 with the UE tune CUETP8M2T4 [45] for hadronization and fragmentation. The W+jets events are generated at NLO using MadGraph5_amc@nlo 2.2.2 [46,47] with the NNPDF3.0 PDF and pythia 8.2 with the UE tune CUETP8M1. Events with WW, WZ, and ZZ diboson processes are generated at leading order using pythia 8.2 with the NNPDF2.3 PDF and the CUETP8M1 tune.
To model the effect of additional pp interactions within the same or nearby bunch crossing (pileup), simulated minimum bias interactions are added to the simulated data. Events in the simulation are then weighted to reproduce the pileup distribution in the data, which is estimated from the measured bunch-to-bunch instantaneous luminosity, assuming a total inelastic pp cross section of 69.2 mb [48].
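The pileup reweighting described above can be sketched as a ratio of normalized distributions. This is a minimal illustration and not the CMS implementation; the function name and the toy histograms are hypothetical.

```python
def pileup_weights(data_hist, mc_hist):
    """Return per-bin weights that reshape the simulated pileup
    distribution to match the one estimated for data.

    Both inputs are lists of bin contents over the same binning
    (number of interactions per bunch crossing). Each histogram is
    normalized to unit area before taking the ratio, so the weights
    change only the shape and not the overall normalization.
    """
    data_total = float(sum(data_hist))
    mc_total = float(sum(mc_hist))
    weights = []
    for data_bin, mc_bin in zip(data_hist, mc_hist):
        if mc_bin > 0:
            weights.append((data_bin / data_total) / (mc_bin / mc_total))
        else:
            # No simulated events in this bin: nothing to reweight.
            weights.append(0.0)
    return weights


if __name__ == "__main__":
    # Hypothetical toy distributions, for illustration only.
    print(pileup_weights([10, 40, 30, 20], [20, 30, 30, 20]))
```

Each simulated event would then be weighted by the entry corresponding to its number of simulated pileup interactions.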
For comparison with the measured distributions, the event yields in the simulated samples are normalized to their cross section predictions. These are obtained from calculations at NNLO (for W+jets and Z/γ * +jets [49]), NLO plus next-to-next-to-leading logarithms (NNLL) (for tW production [50]), and NLO (for diboson processes [51]). For the simulated tt sample, the full NNLO+NNLL calculation, performed with the Top++ 2.0 program, is used [52]. The proton structure is described by the CT14nnlo [53] PDF set, where the PDF and α S uncertainties are estimated using the prescription by the authors. These are added in quadrature to the uncertainties originating from the scale variation m t /2 < μ r , μ f < 2m t . The cross section prediction is $\sigma^{\text{theo}}_{t\bar{t}} = 832\,^{+20}_{-29}\,(\text{scale}) \pm 35\,(\text{PDF}+\alpha_S)$ pb, assuming a top quark pole mass of 172.5 GeV.
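As a small worked example, the quadrature sum of the scale and PDF+α S uncertainties quoted above can be evaluated as follows. The helper name is an assumption made for this sketch; the input numbers are those of the prediction in the text.

```python
import math


def add_in_quadrature(*terms):
    """Combine independent uncertainty components in quadrature."""
    return math.sqrt(sum(term * term for term in terms))


# Components of the predicted cross section uncertainty, in pb.
SCALE_UP, SCALE_DOWN = 20.0, 29.0
PDF_ALPHA_S = 35.0

total_up = add_in_quadrature(SCALE_UP, PDF_ALPHA_S)
total_down = add_in_quadrature(SCALE_DOWN, PDF_ALPHA_S)

if __name__ == "__main__":
    print(f"total uncertainty: +{total_up:.1f} / -{total_down:.1f} pb")
```

With these inputs the total uncertainty on the 832 pb prediction is roughly +40 and −45 pb.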

Event selection
Events with at least two leptons (electron or muon) of opposite charge are selected. In events with more than two leptons, the two leptons of opposite charge with the highest p T are used. An event sample of three mutually exclusive event categories e ± μ ∓ , μ + μ − , and e + e − is obtained.
A combination of single and dilepton triggers is used to collect the events. Each event is required to pass at least one of the triggers described below. Events in the e ± μ ∓ channel are required to contain either one electron with p T > 12 GeV and one muon with p T > 23 GeV, or one electron with p T > 23 GeV and one muon with p T > 8 GeV. Events in the same-flavour channels are required to have p T > 23 (17) GeV for the electron (muon) with the higher p T , referred to in the following as the leading lepton, and p T > 12 (8) GeV for the other electron (muon), referred to as the subleading lepton. For all channels, single-lepton triggers with one electron (muon) with p T > 27 (24) GeV are also used.
The particle-flow (PF) algorithm aims to reconstruct and identify each individual particle in an event, and to form PF candidates by combining information from the various components of the CMS detector [54]. The reconstructed vertex with the largest value of summed physics-object p 2 T is taken to be the primary pp interaction vertex.
Electron and muon candidates are identified through their specific signatures in the detector [55,56]. Lepton candidates are required to have p T > 25 (20) GeV for the leading (subleading) lepton, in the range |η| < 2.4. Electron candidates in the transition region between the barrel and endcap calorimeters, corresponding to 1.4442 < |η| < 1.5660, are rejected because the reconstruction of electrons in this region is not optimal.
Lepton isolation requirements are based on the ratio of the scalar sum of the p T of neighbouring PF candidates to the p T of the lepton candidate, which is referred to as the lepton isolation variable. These PF candidates are the ones falling within a cone of size ΔR = 0.3 (0.4) for electrons (muons), centred on the lepton direction, excluding the contribution from the lepton candidate itself. The cone size ΔR is defined as the square root of the quadrature sum of the differences in the azimuthal angle and pseudorapidity. The value of the isolation variable is required to be smaller than 6% for electrons and 15% for muons. Events with dilepton invariant mass m ℓℓ < 20 GeV (ℓ = e,μ) are rejected to suppress backgrounds due to QCD multijet production and decays of low mass resonances. Additionally, leptons are required to be consistent with originating from the primary interaction vertex.
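The isolation variable defined above can be sketched in a few lines. This is an illustration under simplifying assumptions: particles are plain dictionaries with hypothetical keys `pt`, `eta`, and `phi`, the candidate list is assumed to already exclude the lepton itself, and no pileup correction is applied.

```python
import math

# Cone sizes and thresholds taken from the text above.
CONE_SIZE = {"electron": 0.3, "muon": 0.4}
MAX_ISOLATION = {"electron": 0.06, "muon": 0.15}


def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into the interval (-pi, pi]."""
    diff = phi1 - phi2
    while diff > math.pi:
        diff -= 2.0 * math.pi
    while diff <= -math.pi:
        diff += 2.0 * math.pi
    return diff


def delta_r(eta1, phi1, eta2, phi2):
    """Quadrature sum of the differences in eta and phi."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def relative_isolation(lepton, candidates, cone):
    """Scalar sum of candidate pT inside the cone, divided by lepton pT."""
    sum_pt = sum(
        cand["pt"]
        for cand in candidates
        if delta_r(lepton["eta"], lepton["phi"], cand["eta"], cand["phi"]) < cone
    )
    return sum_pt / lepton["pt"]


def is_isolated(lepton, candidates, flavour):
    """Apply the flavour-dependent cone size and isolation threshold."""
    iso = relative_isolation(lepton, candidates, CONE_SIZE[flavour])
    return iso < MAX_ISOLATION[flavour]
```

Wrapping the azimuthal difference matters for leptons near φ = ±π, where a naive subtraction would place nearby particles outside the cone.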
Jets are reconstructed from the PF candidates using the anti-k T clustering algorithm with a distance parameter of 0.4 [57,58]. The jet momentum is determined from the vectorial sum of all particle momenta in the jet, and is found from simulation to be within 5 to 10% of the true momentum over the relevant phase space of this analysis [59]. Pileup interactions can contribute additional tracks and calorimetric energy depositions to the jet momentum. To mitigate this effect, charged particles identified as originating from pileup vertices are discarded and an offset correction is applied to correct for remaining contributions [59]. The jet energy corrections are determined from measurements of the energy balance in dijet, multijet, photon+jet, and leptonically decaying Z+jets events, and are applied as a function of the jet p T and η to both data and simulated events [59]. For this measurement, jets are selected if they fulfill the criteria p T > 30 GeV and |η| < 2.4.
Jets originating from the hadronization of b quarks (b jets) are identified (b tagged) using the combined secondary vertex [60] algorithm, which combines lifetime information from tracks and secondary vertices. To achieve high purity, a working point is chosen such that the fraction of light-flavour jets with p T > 30 GeV that are falsely identified as b jets is 0.1%, resulting in an average efficiency of about 41% for genuine b jets and 2.2% for c jets [60].
In the same-flavour channels, μ + μ − and e + e − , DY events are suppressed by excluding the region around the Z boson mass, i.e. by rejecting events with 76 < m ℓℓ < 106 GeV. In these channels, events are also required to contain at least one b-tagged jet.
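The event-level requirements of this section can be summarized in a short sketch. It assumes that the Z boson mass window 76 < m ℓℓ < 106 GeV is the region that is rejected in the same-flavour channels; the function name, argument names, and channel labels are hypothetical, and trigger, identification, and isolation requirements are not included.

```python
SAME_FLAVOUR_CHANNELS = ("mumu", "ee")


def passes_dilepton_selection(channel, lead_pt, sublead_pt, m_ll, n_btag):
    """Apply the kinematic dilepton requirements described in the text.

    channel    : "emu", "mumu", or "ee" (hypothetical labels)
    lead_pt    : pT of the leading lepton in GeV
    sublead_pt : pT of the subleading lepton in GeV
    m_ll       : dilepton invariant mass in GeV
    n_btag     : number of b-tagged jets in the event
    """
    # Lepton pT thresholds: 25 GeV (leading), 20 GeV (subleading).
    if lead_pt <= 25.0 or sublead_pt <= 20.0:
        return False
    # Reject low-mass resonances and QCD multijet events.
    if m_ll < 20.0:
        return False
    if channel in SAME_FLAVOUR_CHANNELS:
        # Assumed reading: events inside the Z mass window are removed.
        if 76.0 < m_ll < 106.0:
            return False
        # Same-flavour events need at least one b-tagged jet.
        if n_btag < 1:
            return False
    return True
```

In the e ± μ ∓ channel no Z veto or b-tag requirement is applied at this stage, which is why events with zero b-tagged jets can still enter the fit in that channel.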
Distributions of the leading and subleading lepton p T and η, and the jet and b-tagged jet multiplicities in events fulfilling the above selection criteria are shown in Figs. 1, 2 and 3 for the e ± μ ∓ , μ + μ − , and e + e − channels, respectively. The event yields in the simulations are normalized to the corresponding cross section predictions, as explained in Sect. 2. Selected events include a very small contribution from tt processes in the lepton+jets decay channel (referred to as "tt other" in the figures) in which one of the charged leptons originates from heavy-flavour hadron decay, misidentified hadrons, muons from light-meson decays, or electrons from unidentified photon conversions. Such leptons also lead to dilepton background in this analysis via W+jets processes.
In all categories, the simulation is found to describe the data well within the systematic uncertainties, indicated by the bands in the figures.

Event categories and fit procedure
The measurement is performed using a template fit to multidifferential distributions, divided into distinct event categories using the b-tagged jet multiplicity, similar to the method utilized in a previous measurement [4]. In each of the same-flavour channels, two categories are defined, corresponding to events having 1 or 2 b-tagged jets. Events with zero b-tagged jets are not included since they are dominated by the DY background process. In the e ± μ ∓ channel, three categories are defined, corresponding to events having 1 b-tagged jet, 2 b-tagged jets, or either 0 or ≥ 3 b-tagged jets. The templates describing the distributions for the signal and background events are taken from simulation. Categorizing the events by their b-tagged jet multiplicity allows the efficiency ε b for selecting and identifying a b jet to be constrained. Previous measurements that used a template fit with dilepton events were restricted to the e ± μ ∓ channel [1,4]. In this analysis, the decay channels with two electrons and two muons are also included in the fit. In this way, additional constraints on the lepton identification efficiencies are obtained.
First, a visible tt cross section σ vis tt , defined for a phase space corresponding to the experimentally accessible fiducial volume, as described in Sect. 6, is determined. For the visible cross section, the fit is used to constrain the systematic uncertainties from the data. Using the relation
$$\sigma_{t\bar{t}} = \frac{\sigma^{\text{vis}}_{t\bar{t}}}{A_{\ell\ell}}, \qquad (1)$$
the measured visible cross section is then extrapolated to the full phase space to obtain σ tt . Here, A ℓℓ denotes the acceptance, which is defined as the fraction of tt events that fulfill the selection criteria for the visible cross section. The acceptance incorporates the combined branching fraction for the t and t quarks to decay to two charged leptons [29]. Apart from the free parameter of interest σ vis tt , the parameters of the fit are the J nuisance parameters λ = (λ 1 ,λ 2 ,...,λ J ) corresponding to the various sources of systematic uncertainty, discussed in detail in Sect. 5.
The likelihood function L is based on Poisson statistics:
$$L = \prod_{i} \frac{e^{-\nu_i}\,\nu_i^{\,n_i}}{n_i!} \cdot \prod_{j} \pi(\lambda_j), \qquad (2)$$
where i denotes the bin of the respective final-state distribution, and ν i and n i are the expected and observed number of events in bin i, respectively. The symbol π(λ j ) denotes a penalty term for the deviation of the nuisance parameter λ j from its nominal value according to its prior density distribution. A Gaussian prior density distribution is assumed for all nuisance parameters. The expectation values ν i can be written as
$$\nu_i = s_i(\sigma^{\text{vis}}_{t\bar{t}}, \vec{\lambda}) + \sum_{k} b^{\text{MC}}_{k,i}(\vec{\lambda}). \qquad (3)$$
Here, s i denotes the expected number of tt signal events in bin i and the quantity b MC k,i represents the prediction of the number of background events in bin i from source k. The Minuit program [61] is used to minimize −2ln(L) with L given in Eq. (2), and the Minos [61] algorithm is used to estimate the uncertainties.
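The quantity that is minimized, −2 ln(L) for a binned Poisson likelihood with Gaussian penalty terms, can be sketched as follows. This is a simplified illustration: the expected yields are passed in directly rather than computed from templates, each nuisance parameter is assumed to be expressed in units of its prior standard deviation with nominal value zero, and constant terms of the Gaussian prior are dropped. The function name is an assumption of this sketch.

```python
import math


def neg2_log_likelihood(observed, expected, nuisances):
    """Return -2 ln(L) for Poisson-distributed bin counts.

    observed  : list of observed event counts n_i (non-negative integers)
    expected  : list of expected yields nu_i (must be positive)
    nuisances : list of nuisance parameter values lambda_j, in units of
                their prior standard deviation (Gaussian penalty)
    """
    total = 0.0
    for n_i, nu_i in zip(observed, expected):
        # -ln Poisson(n | nu) = nu - n ln(nu) + ln(n!)
        total += 2.0 * (nu_i - n_i * math.log(nu_i) + math.lgamma(n_i + 1))
    for lam in nuisances:
        # -2 ln of a unit Gaussian prior, up to an additive constant.
        total += lam * lam
    return total


if __name__ == "__main__":
    print(neg2_log_likelihood([10, 4], [9.5, 4.2], [0.3, -0.1]))
```

In a full fit the expected yields would themselves be functions of the cross section and of the nuisance parameters, and a minimizer would vary all of them simultaneously.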
For the determination of the b tagging efficiencies, multinomial probabilities are used to describe the expected number of signal events with one b-tagged jet, s 1b , two b-tagged jets, s 2b , and zero or more than two b-tagged jets, s other :
$$\begin{aligned} s_{1b} &= \mathcal{L}\,\sigma^{\text{vis}}_{t\bar{t}}\,\epsilon_{\ell\ell}\,2\epsilon_b\,(1 - C_b\,\epsilon_b), \\ s_{2b} &= \mathcal{L}\,\sigma^{\text{vis}}_{t\bar{t}}\,\epsilon_{\ell\ell}\,C_b\,\epsilon_b^2, \\ s_{\text{other}} &= \mathcal{L}\,\sigma^{\text{vis}}_{t\bar{t}}\,\epsilon_{\ell\ell}\,\left[1 - 2\epsilon_b\,(1 - C_b\,\epsilon_b) - C_b\,\epsilon_b^2\right], \end{aligned} \qquad (4)$$
where $\mathcal{L}$ denotes the integrated luminosity and ε ℓℓ is the efficiency for events in the visible phase space to pass the full selection described in Sect. 3. The quantity C b corrects for any small correlations between the tagging of two b jets in an event, expressed as C b = 4s all s 2b /(s 1b + 2s 2b ) 2 , where s all denotes the total number of signal events. The values for ε ℓℓ , ε b , and C b are directly determined from the tt signal simulation, expressing ε b as (s 1b + 2s 2b )/2s all . The values of these parameters for the nominal signal simulation in the e ± μ ∓ channel are ε eμ = 0.49, ε b = 0.30, and C b = 1.00. The overall selection efficiency ε ℓℓ is a linear combination of the efficiencies ε eμ , ε ee , and ε μμ , in the three different dilepton channels, each given by the product of the two efficiencies for identifying a single lepton of the respective flavour. Prior to the fit, the muon identification uncertainty is smaller than that for electrons. By fitting the three dilepton decay channels simultaneously, the ratio of single-lepton efficiencies ε e and ε μ is constrained. In the fit, the electron identification uncertainty is constrained to that for muons.
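The relation between the b tagging parameters and the yields per b-tagged jet multiplicity can be checked numerically. The sketch below assumes the multinomial form s 1b ∝ 2ε b (1 − C b ε b ) and s 2b ∝ C b ε b ², which is the form consistent with the expressions for ε b and C b given above; the total signal yield of 1000 events is a hypothetical number chosen for illustration.

```python
def btag_yields(s_all, eff_b, c_b):
    """Split a total signal yield into b-tag multiplicity categories.

    s_all : total number of selected signal events
    eff_b : efficiency for selecting and tagging a b jet
    c_b   : correlation factor between the two b tags
    Returns (s_1b, s_2b, s_other).
    """
    s_1b = s_all * 2.0 * eff_b * (1.0 - c_b * eff_b)
    s_2b = s_all * c_b * eff_b * eff_b
    s_other = s_all - s_1b - s_2b
    return s_1b, s_2b, s_other


def btag_parameters(s_all, s_1b, s_2b):
    """Invert the yields back into (eff_b, c_b), as defined in the text."""
    eff_b = (s_1b + 2.0 * s_2b) / (2.0 * s_all)
    c_b = 4.0 * s_all * s_2b / (s_1b + 2.0 * s_2b) ** 2
    return eff_b, c_b


if __name__ == "__main__":
    # eff_b and c_b are the nominal e-mu values quoted in the text;
    # the total yield of 1000 events is hypothetical.
    s_1b, s_2b, s_other = btag_yields(1000.0, 0.30, 1.00)
    print(s_1b, s_2b, s_other)
    print(btag_parameters(1000.0, s_1b, s_2b))
```

With ε b = 0.30 and C b = 1.00, about 42% of the selected signal events have exactly one b-tagged jet, 9% have two, and the remaining 49% fall into the "other" category.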

The values for ε ℓℓ , ε b , C b , the number of signal events in each category, and the background rates depend on the nuisance parameters λ. The dependence on the parameter λ j is modelled by a second-order polynomial that describes the quantity at the three values λ j = 0, 1, −1, corresponding to the nominal value of the parameter and to a variation by +1 and −1 standard deviation, respectively. If a variation is only possible in one direction, a linear function is used to model the dependence on λ j . The events are further categorized by the number of additional non-b-tagged jets in the event. Each of the seven previously described event categories is further divided by grouping together events with 0, 1, 2, or ≥ 3 additional non-b-tagged jets, thus producing 28 disjoint event categories. For those categories that have events with at least one additional non-b-tagged jet, the smallest p T among those jets is used as the observable in the fit. For those categories containing events with zero additional non-b-tagged jets, the total number of events in the category is used as the observable in the fit. The further division of events into these categories and the observable distributions from each category provide the sensitivity to constrain the modelling systematic uncertainties, such as those coming from variations in the scales for the matrix element (ME) and parton shower (PS) matching. For events with no additional jets, the total event yield is used.
[Figure caption: The same distributions as in Fig. 1, but for the μ + μ − channel]
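The dependence of a fitted quantity on a single nuisance parameter, as described above, can be sketched as the unique second-order polynomial through the three points at λ j = −1, 0, +1, with a linear fallback for one-sided variations. The function name and the example yields are hypothetical.

```python
def interpolate(lam, nominal, up=None, down=None):
    """Evaluate a template quantity at nuisance parameter value lam.

    nominal : value at lam = 0
    up      : value at lam = +1, or None if no upward variation exists
    down    : value at lam = -1, or None if no downward variation exists
    """
    if up is not None and down is not None:
        # Quadratic through (-1, down), (0, nominal), (+1, up).
        linear = 0.5 * (up - down)
        quadratic = 0.5 * (up + down) - nominal
        return nominal + linear * lam + quadratic * lam * lam
    if up is not None:
        # One-sided (upward) variation: straight line through 0 and +1.
        return nominal + (up - nominal) * lam
    if down is not None:
        # One-sided (downward) variation: straight line through 0 and -1.
        return nominal - (down - nominal) * lam
    return nominal


if __name__ == "__main__":
    # Hypothetical bin yields for the nominal and varied templates.
    for lam in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(lam, interpolate(lam, 100.0, up=110.0, down=95.0))
```

By construction the polynomial reproduces the three simulated templates exactly and interpolates smoothly between them, which lets the minimizer treat λ j as a continuous parameter.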
The statistical uncertainty in the templates from simulation is taken into account by using pseudo-experiments. At each iteration, templates are varied within their statistical uncertainty. Templates created from different simulations are treated as statistically uncorrelated, while templates derived by varying weights in the simulation are treated as correlated.  The template dependencies are rederived and the fit to data is repeated. Repeating this 30,000 times yields an approximately Gaussian distribution of the fitted value of the tt cross section (and of m MC t in the combined fit) and of the vast majority of the nuisance parameters. The root-mean-square of each distribution is considered as an additional uncertainty from the event counts in the simulated samples for the corresponding nuisance parameter.
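The pseudo-experiment procedure can be illustrated with a toy sketch. Several assumptions are made here: each template bin is varied independently with a Gaussian of the given width, the full refit is replaced by a hypothetical callable `fit`, and the root-mean-square is taken about the mean of the fitted values. None of these details are specified at this level in the text.

```python
import math
import random


def rms_about_mean(values):
    """Root-mean-square spread of the values around their mean."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def template_stat_uncertainty(template, template_errors, fit, n_toys=1000, seed=1):
    """Estimate the spread of a fit result due to limited template statistics.

    template        : list of nominal bin contents from simulation
    template_errors : list of statistical uncertainties per bin
    fit             : callable mapping a varied template to a fitted value
                      (stand-in for repeating the full fit to data)
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_toys):
        varied = [rng.gauss(y, e) for y, e in zip(template, template_errors)]
        results.append(fit(varied))
    return rms_about_mean(results)


if __name__ == "__main__":
    observed_total = 150.0  # hypothetical observed yield

    def toy_fit(varied_template):
        # Toy "fit": signal strength as observed over predicted yield.
        return observed_total / sum(varied_template)

    print(template_stat_uncertainty([100.0, 50.0], [3.0, 2.0], toy_fit))
```

In the analysis the procedure is repeated 30,000 times with the complete fit; the toy above only conveys how the spread of the results is turned into an uncertainty.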
The input distributions to the fit are shown in Figs. 4, 5 and 6, where the data are compared to the signal and background distributions resulting from the fit to the data. In the top row, the number of events without additional non-b-tagged jets is displayed. For events with at least one additional non-b-tagged jet, the p T distribution of the non-b-tagged jet with the smallest p T in the respective category is considered, except for the category corresponding to events with 2 b-tagged jets and at least three additional non-b-tagged jets, where the statistical uncertainty of the simulation is high. This distribution is chosen in order to constrain the jet energy scale at lower jet p T , where the corresponding systematic uncertainty is larger [59]. Good agreement is found between the data and the simulation.

Systematic uncertainties
The contributions from each source of systematic uncertainty are represented by nuisance parameters (see Sect. 4). For each uncertainty, the simulation is used to construct template histograms that describe the expected signal and background distributions for a given nuisance parameter variation. In the fit of the templates to the data, the best values for σ vis tt (and m MC t in the case of the combined fit) and all nuisance parameters are determined, as described in Sect. 4. The prior probability density functions for the nuisance parameters have a Gaussian shape. Table 1 shows the value of the contributions of the uncertainties after the fit.
Most of the experimental uncertainties are determined from ancillary measurements in which data and simulation are compared and small corrections to the simulation, referred to as scale factors (SFs), are determined. To assess the impact of the uncertainty in these corrections, the SFs are varied within their uncertainty and the analysis is repeated.
The trigger efficiencies are determined using multiple independent methods, which show agreement within 0.3%.

An additional statistical uncertainty arises because the SFs are determined from the data in intervals of p T and η.
The uncertainty in the SFs of the lepton identification efficiency is typically 1.5% for electrons and 1.2% for muons, with a small dependency on the lepton p T and η. The uncertainties in the calibration of the muon and electron momentum scales are included as nuisance parameters for each lepton separately. Their impact on the measurement is negligible.
The impact of the jet energy scale (JES) uncertainties is estimated by varying the jet momenta within the JES uncertainties, split into 18 contributions [59]. To account for the jet energy resolution (JER), the SFs are varied within their |η|-dependent uncertainties [62].
The uncertainties associated with the b tagging efficiency are determined by varying the related corrections for the simulation of b jets and light-flavour jets, split into 16 orthogonal contributions for b jets. These uncertainties depend on the p T of each jet and amount to approximately 1.5% for b jets in tt signal events [60]. The uncertainty in the modelling of the number of pileup events is obtained by changing the inelastic pp cross section, which is used to model the pileup in simulation, by ± 4.6% [48].
The integrated luminosity uncertainty is not included in the fit as a nuisance parameter, but treated as an external uncertainty. It is estimated to be 2.5% [63].
The ME scale uncertainties for the simulation of the tt and DY processes are assessed by varying the renormalization and factorization scale choices in powheg by factors of two up and down independently [64,65], avoiding cases where μ f /μ r = 1/4 or 4.
To estimate the uncertainty due to the NLO generator, the powheg tt signal sample is replaced by a tt sample generated using the MadGraph5_amc@nlo program with FxFx matching [66]. This uncertainty is only included in the combined measurement of σ tt and m MC t , in order to compare with the latest direct top quark mass measurement from CMS in the lepton+jets channel [31].
The PDF uncertainty is estimated using the 28 orthogonal Hessian eigenvectors of the CT14 [53] PDF, which are used as independent inputs to the fit.
Differential measurements of σ tt at √ s = 13 TeV have demonstrated that the p T distribution of the top quark is softer than predicted by the powheg simulation [67][68][69]. An additional uncertainty, referred to as "Top quark p T ", is estimated by reweighting the simulation. This nuisance parameter has a one-sided prior distribution.
The uncertainty due to the matching of the ME to the PS in simulation is estimated by varying the h damp parameter in powheg, as described in Ref. [40]. The uncertainty due to the assumptions in the UE tune is estimated by varying the tuning parameters [40]. The impact of the PS scale uncertainty is estimated by varying the initial-state radiation (ISR) and the final-state radiation (FSR) scales by a factor of two up and down [41], similar to the case of renormalization and factorization scales.
The uncertainties due to the assumed b hadron branching fraction (BF) and fragmentation are taken into account following the procedures described in Ref. [31]. For the fragmentation, variations of the Bowler-Lund fragmentation function [70] and the comparison to the Peterson fragmentation function [71] are considered.
The effects of colour reconnection (CR) processes on the top quark final state are estimated by enabling early resonance decays (ERD) in pythia. In the nominal sample, ERD are turned off. Alternative colour reconnection models are considered, such as "gluon move" [72] and "QCD inspired" [73], since they were found to potentially have relevant effects for the measurement of the top quark mass [31].
For the uncertainties related to the background contributions, prior normalization uncertainties of 30% are assumed [74]. The contributions of these uncertainties are small and/or strongly constrained in the fit. For the DY background, separate nuisance parameters are used for each b-tagged jet category in order to remove the dependence of the fit result on the prediction of the b-tagged jet multiplicity distribution by the DY MC simulation. Similarly, the DY background is given an additional uncertainty of 5, 10, 30, and 50% for events with exactly 0, 1, 2, and 3 or more jets, respectively. The first three numbers are estimated by performing scale variations in W+jets predictions with NLO precision, whereas the last one is assigned conservatively.
In total, 103 uncertainty sources are used in the fit. In Fig. 7, the normalized pulls and constraints for the nuisance parameters related to the modelling uncertainties are shown. For each nuisance parameter, the normalized pull is defined as the difference between the best-fit and the input values, normalized to the pre-fit uncertainty, and the constraint is defined as the ratio of the post-fit to the pre-fit uncertainty. The vast ... The nuisance parameter for the p T distribution of the top quarks is pulled by one standard deviation. This is expected since it is known that the observed p T distribution of the top quark is softer than predicted by the simulation [68,69].
Table 1 The relative uncertainties in σ vis tt and σ tt and their sources, as obtained from the template fit. The uncertainty in the integrated luminosity and the MC statistical uncertainty are determined separately. The individual uncertainties are given without their correlations, which are however accounted for in the total uncertainties. Extrapolation uncertainties only affect σ tt . For these uncertainties, the ± notation is used if a positive variation produces an increase in σ tt , while the ∓ notation is used otherwise.
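The definitions of the normalized pull and the constraint given in this section translate directly into code. The function names and the example numbers below are hypothetical and serve only to illustrate the definitions.

```python
def normalized_pull(best_fit, input_value, prefit_uncertainty):
    """Difference between best-fit and input values, in units of the
    pre-fit uncertainty."""
    return (best_fit - input_value) / prefit_uncertainty


def constraint(postfit_uncertainty, prefit_uncertainty):
    """Ratio of the post-fit to the pre-fit uncertainty. Values below one
    indicate that the data constrain the nuisance parameter."""
    return postfit_uncertainty / prefit_uncertainty


if __name__ == "__main__":
    # Hypothetical nuisance parameter: nominal 0 +- 1 before the fit,
    # fitted to 0.8 +- 0.5.
    print(normalized_pull(0.8, 0.0, 1.0), constraint(0.5, 1.0))
```

A pull of one with a constraint well below one, as for the top quark p T parameter discussed above, indicates that the data prefer a shifted value and determine it more precisely than the prior.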

Cross section measurement
The visible cross section is defined for tt events in the fiducial region with two oppositely charged leptons (electron or muon). Contributions from leptonically decaying τ leptons are included. The leading lepton is required to have p T > 25 GeV, and the subleading lepton must have p T > 20 GeV. Both leptons have to be in the range |η| < 2.4. From the likelihood fit, described in Sect. 4, the visible cross section is measured to be σ vis tt = 25.61 ± 0.05 (stat) ± 0.75 (syst) ± 0.64 (lumi) pb.
Here, the uncertainties denote the statistical uncertainty, the systematic uncertainty, and that coming from the uncertainty in the integrated luminosity. The full list of uncertainties is presented in Table 1.
The total cross section σ tt is obtained by extrapolating the measured visible cross section to the full phase space. As explained in Sect. 4, the extrapolation is described by a multiplicative acceptance correction factor A ℓℓ (see Eq. (1)). The extrapolation uncertainty is determined for each relevant model systematic source j as described in the following: all nuisance parameters except the one under study are fixed to their post-fit values; the nuisance parameter λ j is set to values +1 and −1, and the variations of A ℓℓ are recorded. The resulting variations of σ tt with respect to the nominal value, obtained with the post-fit value of λ j , are taken as the additional extrapolation uncertainties. The individual uncertainties in σ tt from these sources are summed in quadrature to estimate the total systematic uncertainty, as summarized in Table 1. A fixed value of m MC t = 172.5 GeV is chosen in the simulation, and no uncertainty is assigned.
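The extrapolation step can be sketched as follows. The sketch assumes that the acceptance correction enters as σ tt = σ vis tt / A ℓℓ (Eq. (1) is not reproduced in this section), takes the larger of the two shifts as a symmetrized uncertainty, and uses hypothetical source names and acceptance variations. Only the two central cross section values are taken from the text.

```python
import math

SIGMA_VIS_PB = 25.61   # measured visible cross section quoted in the text
SIGMA_TOT_PB = 803.0   # total cross section quoted in the text


def total_cross_section(sigma_vis: float, acceptance: float) -> float:
    # Assumed form of the extrapolation: sigma_tt = sigma_vis / A_ll
    return sigma_vis / acceptance


def extrapolation_shifts(sigma_vis, a_nominal, a_up, a_down):
    """Shifts of sigma_tt when one nuisance parameter is set to +1 and -1."""
    nominal = total_cross_section(sigma_vis, a_nominal)
    up = total_cross_section(sigma_vis, a_up) - nominal
    down = total_cross_section(sigma_vis, a_down) - nominal
    return up, down


def quadrature_sum(shifts):
    return math.sqrt(sum(s * s for s in shifts))


# Acceptance implied by the two quoted central values (about 3.2%).
a_nominal = SIGMA_VIS_PB / SIGMA_TOT_PB

# Hypothetical relative acceptance variations for two illustrative sources.
sources = {"source_a": 0.005, "source_b": 0.003}
symmetrized = []
for name, rel in sources.items():
    up, down = extrapolation_shifts(
        SIGMA_VIS_PB, a_nominal, a_nominal * (1 + rel), a_nominal * (1 - rel))
    # Symmetrization by the larger shift is a choice made for this sketch.
    symmetrized.append(max(abs(up), abs(down)))

print(round(quadrature_sum(symmetrized), 2), "pb")
```

Under the assumed relation, a larger acceptance lowers σ tt , which is the situation the ∓ notation of Table 1 is meant to flag.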
As shown in Table 1, in comparison to the fiducial cross section, the relative systematic uncertainty in the total cross section is marginally increased. The result is in good agreement with the theoretical calculation at NNLO+NNLL, which predicts a tt cross section of 832 +20 −29 (scale) ± 35 (PDF+α S ) pb, as described in Sect. 2.
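As a rough numerical illustration of this agreement, the quoted uncertainties can be combined in quadrature. This is a naive Gaussian combination made for illustration, not the procedure used in the analysis; all input numbers are those quoted in the text.

```python
import math


def quadrature(*terms: float) -> float:
    return math.sqrt(sum(t * t for t in terms))


# Measured total cross section and its uncertainties (pb).
measured = 803.0
meas_unc = quadrature(2.0, 25.0, 20.0)      # stat, syst, lumi

# NNLO+NNLL prediction (pb). The measurement lies below the prediction,
# so the downward scale uncertainty is used here.
predicted = 832.0
theo_unc_down = quadrature(29.0, 35.0)      # scale (down), PDF + alpha_S

difference = measured - predicted
significance = difference / quadrature(meas_unc, theo_unc_down)

print(f"measurement uncertainty: {meas_unc:.1f} pb "
      f"({100 * meas_unc / measured:.1f}%)")
print(f"difference: {difference:.0f} pb, {significance:.2f} standard deviations")
```

With these inputs the measurement sits roughly half a standard deviation below the prediction.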
An independent cross section measurement is performed using a simple event-counting method and a more restrictive event selection, following closely the analysis of Ref. [75]. The analysis uses events in the e ± μ ∓ channel with at least two jets, at least one of which is b tagged. The cross section is measured to be σ tt = 804 ± 2 (stat) ± 31 (syst) ± 20 (lumi) pb, in good agreement with the main result.

[…] in the e ± μ ∓ channel. In the left column events with zero or three or more b-tagged jets are shown. The middle (right) column shows events with exactly one (two) b-tagged jets. Events with zero, one, two, or three or more additional non-b-tagged jets are shown in the first, second, third, and fourth row, respectively. The hatched bands correspond to the total uncertainty in the sum of the predicted yields. The ratios of data to the sum of the predicted yields are shown in the lower panel of each figure. Here, the solid gray band represents the contribution of the statistical uncertainty.

Simultaneous measurement of σ tt and m MC t
The analysis is designed such that the dependence of the measured tt cross section on m MC t is small. However, because of the impact of the top quark mass on the simulated detector efficiency and acceptance, the measurement is expected to have a residual dependence on the chosen value of m MC t . In previous measurements, this dependence was determined by repeating the analysis with varied mass values.
Here, the approach proposed in Refs. [5,30] is followed. The value of m MC t is introduced in the fit as an additional free parameter. In the simultaneous fit, σ tt and m MC t […] uncertainty therefore account for the dependence on m MC t and can be used, e.g. for the extraction of m t and α S using fixed-order calculations. The value of m MC t , in turn, can be compared to the results of direct measurements using, e.g. kinematic fits [31].

Fig. 9 Comparison of data (points) and post-fit distributions of the expected signal and backgrounds from simulation (shaded histograms) used in the simultaneous fit of σ tt and m MC t in the e ± μ ∓ channel. In the left column events with zero or three or more b-tagged jets are shown. The middle (right) column shows events with exactly one (two) b-tagged jets. Events with zero, one, two, or three or more additional non-b-tagged jets are shown in the first, second, third, and fourth row, respectively. The hatched bands correspond to the total uncertainty in the sum of the predicted yields and include the contribution from the top quark mass (Δm MC t ). The ratios of data to the sum of the predicted yields are shown in the lower panel of each figure. Here, the solid gray band represents the contribution of the statistical uncertainty.
In contrast to the σ tt measurement presented in Sect. 6, the sensitivity of the simultaneous fit to m MC t is maximized by introducing a new observable: the minimum invariant mass m min ℓb , which is defined as the smallest invariant mass found when combining the charged leptons with the b jets in an event. To minimize the impact from background, only the e ± μ ∓ sample is used. The simultaneous fit of σ tt and m MC t is performed in 12 mutually exclusive categories, according to the number of b-tagged jets and of additional non-b-tagged jets in the event. The same observables as in Fig. 4 are used as input to the fit, where the jet p T spectrum is replaced by the m min ℓb distribution in categories with at least one b-tagged jet, as shown in Fig. 8. To construct the templates describing the dependence of the final-state distributions on m MC t , separate MC simulation samples of tt and tW production are used in which m MC t is varied in the range m MC t = 172.5 ± 3 GeV. The data and MC samples, the event selection, the modelling of the systematic uncertainties, and the fit procedure are identical to those described in Sect. 4. In the simultaneous fit, the same systematic uncertainties are included as in a previous CMS measurement [31] of the m MC t . The results of the two measurements are thus directly comparable.
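The definition of m min ℓb given above can be sketched in a few lines. The `P4` container and helper names are hypothetical, and the sketch is not the analysis code; it only shows the minimization over all lepton and b-jet pairings in one event.

```python
import math
from itertools import product
from typing import NamedTuple, Sequence


class P4(NamedTuple):
    """Four-momentum in GeV (hypothetical minimal container)."""
    e: float
    px: float
    py: float
    pz: float


def from_pt_eta_phi_m(pt: float, eta: float, phi: float, m: float = 0.0) -> P4:
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return P4(math.sqrt(px * px + py * py + pz * pz + m * m), px, py, pz)


def invariant_mass(a: P4, b: P4) -> float:
    e, px, py, pz = (x + y for x, y in zip(a, b))
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


def m_min_lb(leptons: Sequence[P4], b_jets: Sequence[P4]) -> float:
    """Smallest invariant mass over all lepton and b-jet pairings in one event."""
    if not leptons or not b_jets:
        raise ValueError("need at least one lepton and one b jet")
    return min(invariant_mass(l, b) for l, b in product(leptons, b_jets))


# Illustrative kinematics only.
leptons = [from_pt_eta_phi_m(45.0, 0.3, 0.1), from_pt_eta_phi_m(30.0, -1.1, 2.5)]
b_jets = [from_pt_eta_phi_m(80.0, 0.8, -2.0, m=4.8)]
print(f"m_min_lb = {m_min_lb(leptons, b_jets):.1f} GeV")
```

Events without a b-tagged jet have no lepton and b-jet pairing, which is consistent with the text using m min ℓb only in categories with at least one b-tagged jet.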
Comparisons of the data and the prediction from the MC simulation before and after the fit are presented in Figs. 8 and 9, respectively. Good agreement is found in both cases.
The result of the fit is found to be stable against the choice of the fit distributions, and the introduction of the m min ℓb distribution was confirmed not to alter the final result on σ tt or the behaviour with respect to the nuisance parameters. The procedure is calibrated by performing fits where data are replaced by simulations with different m MC t hypotheses: full closure of the method is obtained and no additional correction is applied. The effect of the statistical uncertainty in the simulation on the fit results is estimated as explained in Sect. 4 and is considered as an additional uncertainty. The results for σ tt and m MC t are σ tt = 815 ± 2 (stat) ± 29 (syst) ± 20 (lumi) pb and m MC t = 172.33 ± 0.14 (stat) +0.66 −0.72 (syst) GeV.
The value for the cross section is in good agreement with the result obtained for a fixed value of m MC t = 172.5 GeV, reported in Sect. 6. The correlation between the two parameters is found to be 12%.
The results of the simultaneous fit to σ tt and m MC t are summarized in Tables 2 and 3, respectively, together with the contribution of each systematic uncertainty to the total uncertainty. Normalized pulls and constraints of the nuisance parameters related to modelling uncertainties are shown in Fig. 10. The nuisance parameters displayed in this figure show similar trends to those in Fig. 7, described above. Here, the constraints on the nuisance parameters tend to be less stringent because only data in the e ± μ ∓ channel are used to determine the two parameters of interest, using mostly the m min ℓb spectra in place of the jet p T distributions within the jet and b-tagged jet categories.
As a cross-check, a measurement of m MC t is performed by fitting a single m min ℓb distribution containing all events with at least one b-tagged jet. The resulting value is m MC t = 171.92 ± 0.13 (stat) +0.76 −0.77 (syst) GeV. Since the uncorrelated uncertainty with respect to the main result is estimated to be at least 0.54 GeV, which is larger than the difference between the two measurements, the two results are in good agreement.
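The arithmetic behind this compatibility statement is short enough to spell out. The numbers are those quoted in the text; the variable names are illustrative.

```python
main_result = 172.33       # GeV, simultaneous fit of sigma_tt and m_t^MC
cross_check = 171.92       # GeV, fit to a single m_min_lb distribution
uncorrelated_unc = 0.54    # GeV, lower bound on the uncorrelated uncertainty

difference = abs(main_result - cross_check)
compatible = difference < uncorrelated_unc

print(f"difference = {difference:.2f} GeV, "
      f"at most {difference / uncorrelated_unc:.2f} of the uncorrelated uncertainty")
```

Because 0.54 GeV is a lower bound, the ratio printed here is an upper bound on the tension between the two results.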

Extraction of m t and α S (m Z ) in the MS scheme
The cross section value obtained in the simultaneous fit to σ tt and m MC t is used to extract α S (m Z ) and m t in the MS renormalization scheme. For this purpose, the measured and the predicted cross sections are compared via a χ 2 minimization. The χ 2 fit is performed using the open-source QCD analysis framework xFitter [76] and a χ 2 definition from Ref. [77].
The method to determine m t and α S (m Z ) is very similar to the one used in earlier CMS analyses to extract α S (m Z ) using jet cross section measurements, e.g. in Ref. [78].
It is assumed that the measured σ tt is not affected by non-SM physics. The SM theoretical prediction for σ tt at NNLO [6-9] is calculated using the Hathor 2.0 [79] program, interfaced with xFitter. This is the only available calculation to date that provides the m t definition in the MS scheme. The top quark mass in the MS scheme is denoted by m t (m t ), following the convention of presenting the value of a running coupling at a fixed value. In the calculation, the renormalization and factorization scales, μ r and μ f , are set to m t (m t ). These are varied by a factor of two up and down, independently, avoiding cases where μ f /μ r = 1/4 or 4, in order to estimate the uncertainty due to the missing higher-order corrections (referred to in the following as the scale variation uncertainty).
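The scale variation prescription described above can be sketched as an enumeration of (μ r , μ f ) pairs. The function names and the numerical inputs are illustrative assumptions; the envelope helper reflects the "largest deviation" convention used later in the text for the extracted parameters.

```python
from itertools import product
from typing import List, Sequence, Tuple

FACTORS = (0.5, 1.0, 2.0)


def scale_variations(mu0: float) -> List[Tuple[float, float]]:
    """Return (mu_r, mu_f) pairs varied by factors of two around mu0,
    dropping the two combinations where mu_f / mu_r is 1/4 or 4."""
    points = []
    for k_r, k_f in product(FACTORS, repeat=2):
        if k_f / k_r in (0.25, 4.0):
            continue
        points.append((k_r * mu0, k_f * mu0))
    return points


def scale_envelope(central: float, varied: Sequence[float]) -> Tuple[float, float]:
    """Largest upward and downward deviation from the central result."""
    up = max(0.0, max(v - central for v in varied))
    down = min(0.0, min(v - central for v in varied))
    return up, down


# mu0 stands for m_t(m_t); the value here is illustrative only.
for mu_r, mu_f in scale_variations(160.0):
    print(f"mu_r = {mu_r:6.1f} GeV, mu_f = {mu_f:6.1f} GeV")
```

Of the nine combinations of the three factors, two are excluded, leaving seven points including the central choice.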
The values of α S (m Z ) and m t cannot be determined simultaneously, since both parameters alter the predicted σ tt in such a way that any variation of one parameter can be compensated by a variation of the other. In the presented analysis, the values of m t and α S (m Z ) are therefore determined at fixed values of α S (m Z ) and m t , respectively.
The four most recent PDF sets available [80] at NNLO are used: ABMP16nnlo [17], CT14nnlo [53], MMHT14nnlo [81], and NNPDF3.1nnlo [82]. While CT14nnlo does not use any tt data as input, the PDF sets ABMP16nnlo and MMHT14nnlo use measurements of inclusive tt cross sections at the Tevatron and LHC, and NNPDF3.1nnlo makes use of all available inclusive and differential tt cross section measurements. Using the currently available tt measurements has only a marginal effect on a global PDF and α S (m Z ) fit [17,53]. The details of the PDFs relevant for this analysis are summarized in Table 4. In the MMHT14nnlo, CT14nnlo, and NNPDF3.1nnlo PDFs, the value of α S (m Z ) is assumed to be 0.118. In ABMP16nnlo, α S (m Z ) is fitted simultaneously with the PDFs. The ABMP16nnlo PDF employs the MS scheme for the heavy-quark mass treatment in its determination. Similar to the value of α S (m Z ), the value of m t (m t ) in the ABMP16nnlo set is obtained in a simultaneous fit with the PDFs. For the other PDFs, the values of m pole t are assumed, as listed in Table 4. Since the analysis is performed in the MS scheme, the assumed m pole t of each PDF is converted into m t (m t ) using the RunDec [83,84] code, according to the prescription by the corresponding PDF group.
For each PDF set used, a series of α S (m Z ) values is provided. The PDF uncertainties for all sets correspond to a 68% confidence level (CL), whereby the uncertainties in the CT14nnlo PDF set are scaled down from 95% CL.
Because of the strong correlation between α S and m t in the prediction of σ tt , for the m t extraction, the value of α S (m Z ) in the theoretical prediction is set to that of the particular PDF set. Similarly, in the theoretical prediction of σ tt used for the α S (m Z ) determination, the value of m t is the one used in the PDF evaluation. The correlation of the values of m t (m t ), α S (m Z ), and the proton PDFs in the prediction of σ tt is also studied.

[…] defined as the ratio of the post-fit uncertainty to the pre-fit uncertainty of a given nuisance parameter, while the normalized pull is the difference between the post-fit and the pre-fit values of the nuisance parameter normalized to its pre-fit uncertainty. The horizontal lines at ±1 represent the pre-fit uncertainty.
To extract the value of α S (m Z ) from σ tt , the measured cross section is compared to the theoretical prediction, and for each α S (m Z ) member of each PDF set, the χ 2 is evaluated. In the case of ABMP16nnlo and NNPDF3.1nnlo, the complete set of PDF uncertainties is provided for each member of the α S (m Z ) series and is accounted for in the analysis. The uncertainties in the CT14nnlo and MMHT14nnlo PDFs are evaluated only for the central α S (m Z ) value of 0.118 and are used for each α S (m Z ) variant in the fit. The optimal value of α S (m Z ) is subsequently determined from a parabolic fit of the form

\[ \chi^2(\alpha_S) = \chi^2_{\min} + \left( \frac{\alpha_S - \alpha_S^{\min}}{\delta(\alpha_S^{\min})} \right)^2 \]

to the χ 2 (α S ) values. Here, χ 2 min is the χ 2 value at α S = α min S and δ(α min S ) is the fitted experimental uncertainty in α min S , which also accounts for the PDF uncertainty. The χ 2 (α S ) scan is illustrated in Fig. 11 for the PDF sets used, demonstrating a clear parabolic behaviour. To estimate the scale variation uncertainties, this procedure is repeated with μ r and μ f being varied, and the largest deviations of the resulting values of α min S from that of the central scale choice are considered as the corresponding uncertainties. The values of α S (m Z ) obtained using different PDFs are listed in Table 5 and shown in Fig. 11. The uncertainties in the measured σ tt and the PDF contribute about equally to the resulting α S (m Z ) uncertainty.

The values of α S (m Z ) obtained using different PDF sets are consistent with each other and are in agreement with the world-average value [29] within the uncertainties, although suggesting a smaller value of α S (m Z ). The value of α S (m Z ) is also in good agreement with the recent result of the analysis in Ref. [85] of jet production in deep-inelastic scattering using the NNLO calculation by the H1 experiment, and is of comparable precision.
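The parabolic fit to the χ 2 (α S ) scan can be sketched with a second-order polynomial fit, assuming the parabola has the standard form χ 2 = χ 2 min + ((α S − α min S )/δ) 2 with the symbols defined above. The scan values below are synthetic and chosen only to exercise the code; the function name is hypothetical and NumPy stands in for the xFitter machinery actually used.

```python
import numpy as np


def fit_parabola(alpha_s, chi2):
    """Fit chi2(alpha) = chi2_min + ((alpha - alpha_min) / delta)**2."""
    alpha_s = np.asarray(alpha_s, dtype=float)
    # Centre the abscissa to keep the polynomial fit well conditioned.
    x0 = float(alpha_s.mean())
    a, b, c = np.polyfit(alpha_s - x0, chi2, deg=2)
    if a <= 0:
        raise ValueError("chi2 scan has no minimum")
    alpha_min = x0 - b / (2.0 * a)
    delta = 1.0 / np.sqrt(a)           # chi2 rises by 1 at alpha_min +- delta
    chi2_min = c - b * b / (4.0 * a)
    return alpha_min, delta, chi2_min


# Synthetic scan for illustration only; not values from the analysis.
scan = np.linspace(0.110, 0.122, 13)
toy_chi2 = 0.3 + ((scan - 0.1150) / 0.0030) ** 2

alpha_min, delta, chi2_min = fit_parabola(scan, toy_chi2)
print(f"alpha_min = {alpha_min:.4f}, delta = {delta:.4f}, chi2_min = {chi2_min:.2f}")
```

Repeating the fit for each varied (μ r , μ f ) choice and taking the largest deviations of the fitted minimum from the central one would give the scale variation uncertainty described in the text.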
The same procedure is used to extract m t (m t ) by fixing α S (m Z ) to the nominal value at which the used PDF is evaluated. The fit is performed by varying m t (m t ) in a 5-GeV range around the central value used in each PDF. The uncertainties related to the variation of α S (m Z ) in the PDF are estimated by repeating the fit using the PDF eigenvectors with α S (m Z ) varied within its uncertainty, as provided by NNPDF3.1nnlo, MMHT2014nnlo, and CT14nnlo. In the case of ABMP16nnlo, the value of α S (m Z ) is a free parameter in the PDF fit and its uncertainty is implicitly included in the ABMP16nnlo PDF uncertainty eigenvectors. The resulting m t (m t ) values are summarized in Table 6, where the fit uncertainty corresponds to the precision of the σ tt measurement. The results obtained with different PDF sets are in agreement, although the ABMP16nnlo PDF set yields a systematically lower value. This difference is expected and has its origin in a larger value of α S (m Z ) = 0.118 assumed in the NNPDF3.1, MMHT2014, and CT14 PDFs. The values of m t (m t ) are in agreement with those originally used in the evaluation of each PDF set. The results are shown in Fig. 12 for the four different PDFs used.

Table 6 Values of m t (m t ) obtained from the comparison of the σ tt measurement with the NNLO predictions using different PDF sets. The first uncertainty shown comes from the experimental, PDF, and α S (m Z ) uncertainties, and the second from the variation in the renormalization and factorization scales

PDF set     m t (m t ) (GeV)
ABMP16      161.6 ± 1.6 (fit + PDF + α S ) +0.1 −1.0 (scale)
NNPDF3.1    164.5 ± 1.6 (fit + PDF + α S ) +0.1 −1.0 (scale)
CT14        165.0 ± 1.8 (fit + PDF + α S ) +0.1 −1.0 (scale)
MMHT14      164.9 ± 1.8 (fit + PDF + α S ) +0.1 −1.1 (scale)
The dependence of the α S (m Z ) result on the assumption on m t (m t ) is investigated for each PDF by performing the χ 2 (α S ) scan for ten values of m t (m t ) varying from 160.5 to 165.0 GeV. A linear dependence is observed, as shown in Fig. 13.

[Table 7, surviving rows]
[…]         173.7 ± 2.0 (fit + PDF + α S ) +0.9 −1.4 (scale)
MMHT14      173.6 ± 1.9 (fit + PDF + α S ) +0.9 −1.4 (scale)

Extraction of m t in the pole mass scheme
The extraction of m t is repeated in the pole mass scheme using the Top++ 2.0 program [52], which employs the calculation of σ tt at NNLO, improved by the NNLL soft-gluon resummation. The results are summarized in Table 7. The scale variation uncertainties are estimated in the same way as in the case of the m t (m t ) extraction. These uncertainties are larger than those determined in the MS scheme. This is because of the better convergence of the perturbative series when using the MS renormalization scheme in the calculation of σ tt .

Summary
A measurement of the top quark-antiquark pair production cross section σ tt by the CMS Collaboration in proton-proton collisions at a centre-of-mass energy of 13 TeV is presented, corresponding to an integrated luminosity of 35.9 fb −1 . Assuming a top quark mass in the simulation of m MC t = 172.5 GeV, a visible cross section is measured in the fiducial region using dilepton events (e ± μ ∓ , μ + μ − , e + e − ) and then extrapolated to the full phase space. The total tt production cross section is found to be σ tt = 803 ± 2 (stat) ± 25 (syst) ± 20 (lumi) pb. The measurement is in good agreement with the theoretical prediction calculated to next-to-next-to-leading order in perturbative QCD, including soft-gluon resummation to next-to-next-to-leading logarithm.
The measurement is repeated including the top quark mass in the powheg simulation as an additional free parameter in the fit. The sensitivity to m MC t is maximized by fitting the minimum invariant mass found when combining the charged leptons with the b jets in an event. This yields a cross section of σ tt = 815 ± 2 (stat) ± 29 (syst) ± 20 (lumi) pb and a value of m MC t = 172.33 ± 0.14 (stat) +0.66 −0.72 (syst) GeV, in good agreement with previous measurements. The value of σ tt obtained in the simultaneous fit is further used to extract the values of the top quark mass and the strong coupling constant at next-to-next-to-leading order in the minimal subtraction renormalization scheme, as well as the value of the top quark pole mass for different sets of parton distribution functions.
Acknowledgements We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMBWF

Data Availability Statement
This manuscript has no associated data or the data will not be deposited. [Authors' comment: Release and preservation of data used by the CMS Collaboration as the basis for publications is guided by the CMS policy as written in its document "CMS data preservation, re-use and open access policy" (https://cms-docdb.cern.ch/cgi-bin/PublicDocDB/RetrieveFile?docid=6032&filename=CMSDataPolicyV1.2.pdf&version=2).]

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Funded by SCOAP 3.