1 Introduction

The CERN LHC has provided a large sample of proton–proton (\({\text {p}\text {p}}\)) collisions containing events with a vector boson (V) accompanied by one or more jets originating from heavy-flavour quarks (\(\text {V}+\text {HF jets}\)). Precise measurements of \(\text {V}+\text {HF jets}\) observables can be used to test theoretical calculations of these processes and the modelling of \(\text {V}+\text {HF jets}\) events in the currently available Monte Carlo (MC) event generator programs.

Measurements of \(\text {V}+\text {HF jets}\) production also provide new input to the determination of the quark content of the proton. This information constrains the proton parton distribution functions (PDFs), a ubiquitous ingredient in many data analyses at the LHC, and still an important source of systematic uncertainty (see e.g. Ref. [1] for a recent review). In this context, the measurements of the associated production of a W boson and a charm (\({\text {c}}\)) quark (\(\text {W}+{\text {c}}\) production) in proton–proton collisions at the LHC at \(\sqrt{s}=8\,\text {TeV} \) presented in this paper provide valuable new information.

Measurements of \(\text {W}+{\text {c}}\) production in hadronic collisions at the\(\,\text {TeV}\) scale were performed at the Tevatron by the CDF [2, 3] and D0 [4] Collaborations. The \(\text {W}+{\text {c}}\) process has been studied in \({\text {p}\text {p}}\) collisions at the LHC at centre-of-mass energies of 7, 8 and 13\(\,\text {TeV}\) by the CMS [5, 6], ATLAS [7], and LHCb [8] experiments.

For the CMS measurement at \(\sqrt{s}=7\,\text {TeV} \), with an integrated luminosity of about 5\(\,\text {fb}^{-1}\), \(\text {W}+{\text {c}}\) candidates are identified through exclusive or semileptonic decays of charm hadrons inside a jet with transverse momentum larger than 25\(\,\text {GeV}\). The ATLAS analysis at the same centre-of-mass energy and similar integrated luminosity tags \(\text {W}+{\text {c}}\) events either by the presence of a muon from a semileptonic charm decay within a hadronic jet with transverse momentum larger than 25\(\,\text {GeV}\) or by the reconstruction of a charm hadron exclusive decay with transverse momentum of the \(\text {D}^{(*)\pm } \) candidate above 8\(\,\text {GeV}\). The CMS analysis at \(\sqrt{s}=13\,\text {TeV} \), with an integrated luminosity of 35.7\(\,\text {fb}^{-1}\), uses the exclusive decay \({\text {D}}^{*+} \rightarrow \text {D}^{0} \uppi ^{+} \) with \(\text {D}^{0} \rightarrow \text {K} ^{-} \uppi ^{+} \) (plus the charge-conjugate process), with transverse momentum of the \(\text {D}^{*\pm } \) candidate above 5\(\,\text {GeV}\). The LHCb measurement is based on integrated luminosities of 1 (2)\(\,\text {fb}^{-1}\) at \(\sqrt{s}=7~(8) \,\text {TeV} \), and uses tagging algorithms based on boosted decision trees for the identification of \({\text {c}}\) jets in conjunction with \(\text {b} \) jets.

We present in this paper the first measurement of the \(\text {W}+{\text {c}}\) production cross section at \(\sqrt{s}=8\,\text {TeV} \) in the central region. The W boson is identified by a high transverse momentum isolated lepton (\(\text {e}, \upmu \)) coming from its leptonic decay. Fiducial cross sections are measured, both inclusively and differentially as functions of the absolute value of the pseudorapidity (\(\eta ^{\ell }\)) and, for the first time, the transverse momentum (\(p_{\textrm{T}} ^{\ell }\)) of the lepton from the W boson decay. Jets containing a \({\text {c}}\) quark are identified in two ways: (i) the identification of a muon inside the jet that comes from the semileptonic decay of a \({\text {c}}\) flavoured hadron, and (ii) a secondary vertex arising from a visible charm hadron decay. The secondary-vertex \({\text {c}}\text { jet} \) identification method, also newly introduced in this analysis, provides a large sample of \(\text {W}+{\text {c}}\) candidates. Measurements obtained in these four channels (e and \({\upmu }\) decay of W boson, \({\text {c}}\text { jet} \) with muon or secondary vertex) are combined, resulting in reduced systematic uncertainties compared with previous CMS measurements.

The study of \(\text {W}+{\text {c}}\) production at the LHC provides direct access to the strange quark content of the proton at the W boson mass energy scale [9]. The sensitivity comes from the dominance of the \(\bar{\text {s}} \text {g} \rightarrow {\text {W}}^{+} +\bar{{\text {c}}} \) and \(\text {s} \text {g} \rightarrow {\text {W}}^{-} +{\text {c}}\) contributions in the hard process, as depicted in Fig. 1. The inclusion of strangeness-sensitive LHC measurements in global analyses of the proton PDFs has led to a significant reduction of the uncertainty in the strange quark PDF [10]. The contribution of additional LHC \(\text {W}+{\text {c}}\) measurements will provide valuable input to further constrain the strange quark content of the proton.

Fig. 1

Leading order diagrams for the associated production of a W boson and a charm (anti)quark

A key property of \(\text {W}+{\text {c}}\) production is the opposite sign (OS) of the electric charges of the W boson and the \({\text {c}}\) quark. Gluon splitting processes like \(\text {q} \bar{\text {q}} ^\prime \rightarrow \text {W}+\text {g} \rightarrow \) \(\text {W}+{\text {c}}\bar{{\text {c}}} \) also give rise to final states with an OS W boson and a \({\text {c}}\) quark (antiquark), but with an additional \({\text {c}}\) antiquark (quark) of the same sign (SS) electric charge as that of the W boson. In most of the background processes, it is equally probable to select events with OS electric charges as with SS, whereas \({\text {q} \text {g} \rightarrow \text {W}+{\text {c}}}\) only yields OS events. Furthermore, distributions of the physical observables of OS and SS background events are expected to be the same. The statistical subtraction of the SS distributions from the OS distributions therefore effectively removes these charge-symmetric backgrounds. This technique is referred to in the paper as \(\text {OS--SS}\) subtraction. In the present analysis, the electric charges of the lepton from the W boson decay and the muon (or that assigned to the secondary vertex) inside the \({\text {c}}\text { jet} \) are used to perform the \(\text {OS--SS}\) subtraction procedure.
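The subtraction can be illustrated with a minimal sketch. The function name and the event counts below are invented for illustration, and the per-bin uncertainty assumes independent Poisson counts in the OS and SS samples; this is not the analysis code.

```python
import math


def os_minus_ss(n_os, n_ss):
    """Bin-by-bin OS-SS subtraction of two histograms given as lists of counts.

    Returns (difference, statistical uncertainty) per bin. The uncertainty
    assumes independent Poisson counts, so the variance is N_OS + N_SS.
    """
    if len(n_os) != len(n_ss):
        raise ValueError("OS and SS histograms must share the same binning")
    diff = [a - b for a, b in zip(n_os, n_ss)]
    err = [math.sqrt(a + b) for a, b in zip(n_os, n_ss)]
    return diff, err


# Invented counts: an OS-only signal on top of a charge-symmetric background.
signal_os = [120, 80]
background = [50, 30]  # same expected yield in the OS and SS samples
n_os = [s + b for s, b in zip(signal_os, background)]
n_ss = list(background)

diff, err = os_minus_ss(n_os, n_ss)
print(diff)  # [120, 80]: the charge-symmetric background cancels
```

The background cancels in the difference but still contributes to the statistical uncertainty, which is why a large charge-symmetric background degrades the precision even after subtraction.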

The product of the cross sections and branching fraction \(\sigma (\text {p}\text {p}\rightarrow {\text {W}}^{+} +\bar{{\text {c}}} +\text {X}){\mathcal {B}}({\text {W}}^{+} \rightarrow \ell ^+\upnu )\), \(\sigma (\text {p}\text {p}\rightarrow {\text {W}}^{-} +{\text {c}}+\text {X}){\mathcal {B}}({\text {W}}^{-} \rightarrow \ell ^-{\bar{\upnu }})\), their sum \(\sigma (\text {p}\text {p}\rightarrow \text {W}+{\text {c}}+\text {X}){\mathcal {B}}(\text {W}\rightarrow \ell \upnu )\), and the cross section ratio \(\sigma (\text {p}\text {p}\rightarrow {\text {W}}^{+} +\bar{{\text {c}}} +\text {X})/\sigma (\text {p}\text {p}\rightarrow {\text {W}}^{-} +{\text {c}}+\text {X})\), are measured at \(\sqrt{s}=8\,\text {TeV} \). They are abbreviated as \(\sigma ({\text {W}}^{+} +\bar{{\text {c}}})\), \(\sigma ({\text {W}}^{-} +{\text {c}})\), \(\sigma (\text {W}+{\text {c}})\), and \(R_{{\text {c}}}^{\pm }\). The cross sections and cross section ratio are measured at the parton level in a fiducial region of phase space defined in terms of the kinematics of the lepton from the W boson (\(p_{\textrm{T}} ^{\ell }> 30\,\text {GeV} \), and \(|\eta ^{\ell } | < 2.1\)), and the \({\text {c}}\) quark (\(p_{\textrm{T}} ^{{\text {c}}} > 25\,\text {GeV} \) and \(|\eta ^{{\text {c}}} | < 2.5\)) with a separation between the \({\text {c}}\) quark and the lepton \(\varDelta R({{\text {c}}},\ell )= \sqrt{\smash [b]{(\varDelta \eta )^2 +(\varDelta \phi )^2}} > 0.5\). The cross sections and cross section ratio are also measured differentially as functions of \(|\eta ^{\ell } |\) and \(p_{\textrm{T}} ^{\ell }\).

The paper is structured as follows: the CMS detector is briefly described in Sect. 2, and the data and simulated samples used are presented in Sect. 3. Section 4 presents the selection of the signal sample. Section 5 reviews the sources of systematic uncertainties and their impact on the measurements. The measurements of the fiducial \(\text {W}+{\text {c}}\) cross section and \(R_{{\text {c}}}^{\pm }\) are detailed in Sect. 6, the differential measurements are reported in Sect. 7, and a comparison with theoretical predictions is presented in Sect. 8. The details of the QCD analysis are described in Sect. 9. Finally, the main results of the paper are summarized in Sect. 10.

Tabulated results are provided in the HEPData record for this analysis [11].

2 The CMS detector 

The central feature of the CMS apparatus is a superconducting solenoid of \(6 \hbox { m}\) internal diameter, providing a magnetic field of \(3.8 \hbox { T}\). Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. Extensive forward calorimetry complements the coverage provided by the barrel and endcap detectors. The silicon tracker measures charged particles within the pseudorapidity range \(|\eta |< 2.5\). It consists of 1440 silicon pixel and 15 148 silicon strip detector modules. For particles of \(1< p_{\textrm{T}} < 10\,\text {GeV} \) and \(|\eta | < 1.4\), the track resolutions are typically 1.5% in \(p_{\textrm{T}} \) and 25–90 (45–150)\(\mu \hbox {m}\) in the transverse (longitudinal) impact parameter [12]. The electron momentum is estimated by combining the energy measurement in the ECAL with the momentum measurement in the tracker. The momentum resolution for electrons with \(p_{\textrm{T}} \approx 45\,\text {GeV} \) from \(\text {Z} \rightarrow \text {e}^{+} {}\text {e}^{-} \) decays ranges from 1.7% for nonshowering electrons in the barrel region to 4.5% for showering electrons in the endcaps [13]. Muons are measured in the pseudorapidity range \(|\eta |< 2.4\), using three technologies: drift tubes, cathode strip chambers, and resistive plate chambers. Matching muons to tracks measured in the silicon tracker results in a relative transverse momentum resolution for muons with \(20<p_{\textrm{T}} <100\,\text {GeV} \) of 1.3–2.0% in the barrel and better than 6% in the endcaps. The \(p_{\textrm{T}}\) resolution in the barrel is better than 10% for muons with \(p_{\textrm{T}}\) up to 1\(\,\text {TeV}\) [14]. For muons with \(1<p_{\textrm{T}} <25\,\text {GeV} \), the relative transverse momentum resolution is 1.2–1.7% in the barrel and 2.5–4.0% in the endcaps [12]. 

Events of interest are selected using a two-tiered trigger system [15]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around \(100\hbox { kHz}\) within a fixed latency of about \(4\,\mu \hbox {s}\). The second level, known as the high-level trigger, consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around \(1\hbox { kHz}\) before data storage. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the basic kinematic variables, can be found in Ref. [16].

3 Data and simulated samples 

The data were collected by the CMS experiment during 2012 in \({\text {p}\text {p}}\) collisions at a centre-of-mass energy of 8\(\,\text {TeV}\) with an integrated luminosity of 19.7\(\,\text {fb}^{-1}\).

Samples of simulated events are produced with MC event generators, both for the signal process and for the main backgrounds. They are normalized to the integrated luminosity of the data sample using their respective cross sections. A sample of \(\text {W}+\text {jets}\) events is generated with MadGraph v5.1.3.30 [17], interfaced with pythia v6.4.26 [18] for parton showering and hadronization using the MLM [19, 20] jet matching scheme. The MadGraph generator produces parton-level events with a vector boson and up to four partons on the basis of a leading order (LO) matrix-element calculation. The generator uses the parton distribution function (PDF) set CTEQ6L [21], which is reweighted to the next-to-next-to-leading-order (NNLO) PDF set MSTW2008NNLO [22]. A sample of \(\text {Z} +\text {jets}\) events, which includes the exchange of a virtual photon, is generated with MadGraph interfaced with pythia6 with the same conditions as for the \(\text {W}+\text {jets}\) event sample. They are normalized to the inclusive \(\text {W}\) and \(\text {Z} \) production cross sections evaluated at NNLO with fewz 3.1 [23], using the MSTW2008NNLO PDF set.

Background samples of top (t) quark events (\({\hbox {t} {}{\bar{\hbox {t}}}} \) and single top) are generated at next-to-leading-order (NLO) with powheg v1.0 [24,25,26,27], interfaced with pythia6 and using the CT10 [28] PDF set. The \({\hbox {t} {}{\bar{\hbox {t}}}} \) cross section is taken at NNLO from Ref. [29]. The t-channel single-top cross section is calculated at NLO with Hathor v2.1 [30, 31] and the \(\hbox {t} \) \(\text {W}\) and s-channel cross sections are taken at NNLO from Ref. [32]. Diboson (VV) production (\(\text {W}\text {W}\), \(\text {W}\text {Z} \), and \(\text {Z} \text {Z} \) processes) is modelled with samples of events generated with pythia6 and the CTEQ6L1 PDF set. Their cross sections are evaluated at NLO with mcfm 6.6 [33], using the MSTW2008NLO PDF set. For all simulations, the pythia6 parameters for the underlying event modelling are set to the Z2\(^{*}\) tune [34, 35]. Final state QED radiation is modelled by pythia6.

Simulated events are weighted to correct the charm quark fragmentation fractions into the weakly decaying hadrons \(\text {D}^{\pm }\), \(\text {D}^{0}\)/\(\bar{\text {D}}^{0}\), \(\text {D}_{\textrm{s}}^{\pm } \) and \(\varLambda _{{\text {c}}}^\pm \) in pythia6, to match the combination of measurements given in Ref. [36]. An additional event weight correcting the decay branching fractions larger than 1% of \(\text {D}^{0}\) /\(\bar{\text {D}}^{0}\) and \(\text {D}^{\pm }\) mesons is introduced to make them agree with more recent values [37, 38]. These decay modes altogether represent about 70% of the total \(\text {D}^{0}\) /\(\bar{\text {D}}^{0}\) and \(\text {D}^{\pm }\) decay rate. The remaining \(\text {D}^{0}\) /\(\bar{\text {D}}^{0}\) and \(\text {D}^{\pm }\) decay modes are globally adjusted to keep the normalization of the decay branching fractions to unity. The \(\text {D}^{0}\) /\(\bar{\text {D}}^{0}\) and \(\text {D}^{\pm }\) mesons constitute about 80% of the total number of produced charm hadrons, thus approximately 56% of the charm sample is corrected by this adjustment.
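One possible way to build such decay-mode weights is sketched below. The function name, the mode labels, and all numerical values are hypothetical, and rescaling the remaining modes by a single common factor is only our reading of "globally adjusted"; the text does not specify the actual procedure.

```python
def decay_mode_weights(generator_bf, reference_bf, threshold=0.01):
    """Per-decay-mode event weights (illustrative sketch).

    generator_bf: dict mode -> branching fraction in the generator (sums to 1).
    reference_bf: dict mode -> reference branching fraction, where available.
    Modes above `threshold` that have a reference value are reweighted to it.
    All other modes share one common factor (an assumption) chosen so that
    the weighted branching fractions still sum to unity.
    """
    corrected = {
        mode for mode, bf in generator_bf.items()
        if bf > threshold and mode in reference_bf
    }
    target_sum = sum(reference_bf[mode] for mode in corrected)
    rest_sum = sum(bf for mode, bf in generator_bf.items() if mode not in corrected)
    rest_scale = (1.0 - target_sum) / rest_sum if rest_sum > 0 else 1.0
    return {
        mode: (reference_bf[mode] / bf if mode in corrected else rest_scale)
        for mode, bf in generator_bf.items()
    }


# Invented numbers: two large modes are corrected, the rest is lumped together.
generator = {"mode_a": 0.40, "mode_b": 0.30, "others": 0.30}
reference = {"mode_a": 0.38, "mode_b": 0.29}
weights = decay_mode_weights(generator, reference)
weighted_total = sum(generator[m] * weights[m] for m in generator)
```

With these inputs the corrected modes are scaled down to their reference values and the lumped remainder is scaled up, so the weighted total stays at unity within floating-point precision.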

Generated events are processed through a Geant4-based [39] CMS detector simulation and trigger emulation. Simulated events are then reconstructed using the same algorithms used to reconstruct collision data.

The simulated samples incorporate additional \({\text {p}\text {p}}\) interactions in the same bunch crossing (pileup) to reproduce the experimental conditions. Simulated events are weighted so that the pileup distribution matches the measured one, with an average of about 21 \({\text {p}\text {p}}\) interactions per bunch crossing.
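A common way to obtain such weights is to take the ratio of the normalized pileup distributions in data and simulation. The sketch below assumes both distributions are given as histograms with identical binning; the function name and the counts are invented for illustration.

```python
def pileup_weights(data_hist, mc_hist):
    """Per-bin weights: normalized data distribution over normalized MC one."""
    if len(data_hist) != len(mc_hist):
        raise ValueError("data and MC histograms must share the same binning")
    data_total = sum(data_hist)
    mc_total = sum(mc_hist)
    weights = []
    for d, m in zip(data_hist, mc_hist):
        if m > 0:
            weights.append((d / data_total) / (m / mc_total))
        else:
            weights.append(0.0)  # no simulated events available in this bin
    return weights


# Invented histograms of the number of interactions per bunch crossing.
data = [10, 30, 60]
simulation = [25, 50, 25]
weights = pileup_weights(data, simulation)
reweighted = [m * w for m, w in zip(simulation, weights)]
```

After reweighting, the simulated distribution has the same shape as the one in data; here the totals are equal, so the reweighted bin contents coincide with the data counts.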

The simulated trigger, reconstruction, and selection efficiencies are corrected to match those observed in the data. Lepton efficiencies (\(\epsilon _{\ell }\)) are evaluated with data samples of dilepton events in the Z boson mass peak with the “tag-and-probe” method [40], and correction factors \(\epsilon _{\ell }^\text {data}/\epsilon _{\ell }^{\textrm{MC}}\), binned in \(p_{\textrm{T}} \) and \(\eta \) of the leptons, are computed. These corrections are typically close to 1% for muons and 3% for electrons, with no relevant dependence on the \(p_{\textrm{T}} \) and \(\eta \) of the lepton.

The simulated signal sample is composed of W bosons accompanied by jets originating from \(\text {b} \), \({\text {c}}\), and light quarks (or antiquarks) and gluons. Simulated \(\text {W}+\text {jets}\) events are classified according to the flavour of the generated partons. A \(\text {W}+\text {jets}\) event is categorized as \(\text {W}+{\text {c}}\) if a single charm quark with \(p_{\textrm{T}} >15\,\text {GeV} \) is generated in the hard process. Otherwise, it is classified as \(\text {W}+ \text {b} \) if at least one \(\text {b} \) quark with \(p_{\textrm{T}} >15\,\text {GeV} \) is generated. Remaining events are labelled as \(\text {W}+ {\text {c}}{}\bar{{\text {c}}} \) if at least a \({\text {c}}\bar{{\text {c}}} \) quark–antiquark pair is present in the event, or as \(\text {W}\,+ \,{\text {u} \text {d} \text {s} \text {g}} \) if no \({\text {c}}\) or \(\text {b} \) quarks are produced. The contribution from the \(\text {W}+ {\text {c}}{}\bar{{\text {c}}} \) process is expected to vanish after \(\text {OS--SS}\) subtraction.
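The classification order described above can be written as a short decision chain. In this sketch the event is represented, as an assumption, by a list of `(pdg_id, pt)` pairs for the generated partons, using the standard PDG numbering (4 for c, 5 for b, negative values for antiquarks); the function name and category labels are illustrative. Events matching none of the first three categories fall through to the light-flavour label, since the text does not state how, for example, a lone soft charm quark is treated.

```python
def classify_w_jets(partons, pt_min=15.0):
    """Classify a simulated W+jets event by generated parton flavour.

    partons: list of (pdg_id, pt) pairs, pt in GeV (hypothetical input format).
    """
    hard_c = [p for p in partons if abs(p[0]) == 4 and p[1] > pt_min]
    hard_b = [p for p in partons if abs(p[0]) == 5 and p[1] > pt_min]
    if len(hard_c) == 1:
        return "W+c"
    if hard_b:
        return "W+b"
    has_c = any(pdg_id == 4 for pdg_id, _ in partons)
    has_cbar = any(pdg_id == -4 for pdg_id, _ in partons)
    if has_c and has_cbar:
        return "W+ccbar"
    return "W+udsg"


label = classify_w_jets([(4, 30.0), (21, 20.0)])  # one hard charm quark plus a gluon
```

The order of the checks matters: an event with a single hard charm quark is labelled as signal before any b-quark or charm-pair test is applied.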

4 Event reconstruction and selection 

Jets, missing transverse momentum, and related quantities are determined using the CMS particle-flow (PF) reconstruction algorithm [41], which aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector.

Jets are built from PF candidates using the anti-\(k_{\textrm{T}}\) clustering algorithm [42, 43] with a distance parameter \(R = 0.5\). The energy and momentum of the jets are corrected, as a function of the jet \(p_{\textrm{T}} \) and \(\eta \), to account for the nonlinear response of the calorimeters and for the presence of pileup interactions [44, 45]. Jet energy corrections are derived using samples of simulated events and further adjusted using dijet, photon+jet, and Z+jet events in data.

Electron and muon candidates are reconstructed following standard CMS procedures [13, 14]. The missing transverse momentum vector \({\vec p}_{\textrm{T}}^{\text {miss}} \) is the projection of the negative vector sum of the momenta, onto the plane perpendicular to the beams, of all the PF candidates. The \({\vec p}_{\textrm{T}}^{\text {miss}} \) is modified to include corrections to the energy scale of the reconstructed jets in the event. The missing transverse momentum, \(p_{\textrm{T}} ^\text {miss} \), is defined as the magnitude of the \({\vec p}_{\textrm{T}}^{\text {miss}} \) vector, and it is a measure of the transverse momentum of particles leaving the detector undetected [46].

The primary vertex of the event, representing the hard interaction, is selected among the reconstructed vertices as the one with the highest sum of the transverse momenta squared of the tracks associated with it.
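As an illustration, the vertex choice reduces to a single maximization. The sketch assumes each reconstructed vertex is represented simply by the list of transverse momenta of its associated tracks; the function name and the values are invented.

```python
def select_primary_vertex(vertices):
    """Return the index of the vertex with the largest sum of track pT squared.

    vertices: list of lists, each holding the pT (GeV) of the tracks
    associated with one reconstructed vertex (hypothetical input format).
    """
    if not vertices:
        raise ValueError("no reconstructed vertices in the event")
    return max(range(len(vertices)),
               key=lambda i: sum(pt * pt for pt in vertices[i]))


# Invented event: a soft pileup vertex with many tracks, and a hard vertex.
vertices = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],  # sum pT^2 = 6
    [10.0, 8.0],                      # sum pT^2 = 164
    [3.0, 4.0],                       # sum pT^2 = 25
]
primary_index = select_primary_vertex(vertices)  # index 1
```

Squaring the transverse momenta favours vertices with a few energetic tracks over pileup vertices with many soft ones.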

4.1 Selection of \(\text {W}\) boson events 

Events with a high-\(p_{\textrm{T}} \) lepton from the W boson decay are selected online by a trigger algorithm that requires the presence of an electron with \(p_{\textrm{T}} > 27\,\text {GeV} \) or a muon with \(p_{\textrm{T}} >24\,\text {GeV} \). The analysis follows the selection criteria used in Ref. [47] and requires the presence of a high-\(p_{\textrm{T}} \) isolated lepton in the pseudorapidity region \(|\eta | < 2.1\). The \(p_{\textrm{T}} \) of the lepton must exceed 30\(\,\text {GeV}\).

The combined isolation \(I_{\text {comb}}\) is used to quantify the additional hadronic activity around the selected leptons. It is defined as the sum of the transverse momenta of the neutral hadrons, photons, and charged hadrons in a cone with \(\varDelta R = \sqrt{\smash [b]{(\varDelta \eta )^2 +(\varDelta \phi )^2}}<0.3\) (0.4) around the electron (muon) candidate, excluding the contribution from the lepton itself. Only charged particles originating from the primary vertex are considered in the sum to minimize the contribution from pileup interactions. The contribution of neutral particles from pileup vertices is estimated and subtracted from \(I_{\text {comb}}\). For electrons, this contribution is evaluated with the jet area method described in Ref. [48]; for muons, it is taken to be half the sum of the \(p_{\textrm{T}} \) of all charged particles in the cone originating from pileup vertices. The factor of one half accounts for the expected ratio of neutral to charged particle production in hadronic interactions. The electron (muon) candidate is considered to be isolated when \(I_{\text {comb}}/p_{\textrm{T}} ^{\ell }< 0.15\) (0.12). Events with a second isolated lepton with \(p_{\textrm{T}} ^{\ell }>20\,\text {GeV} \) and \(|\eta | < 2.1\), and with charge opposite to that of the lepton from the W candidate, are discarded to reduce the contribution from \(\text {Z} +\text {jets}\) and \({\hbox {t} {}{\bar{\hbox {t}}}} \) events.
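The muon case of the combined isolation can be sketched as follows. The inputs are assumed to be lists of particle transverse momenta already restricted to the isolation cone, and the function names are hypothetical. Clamping the pileup-corrected neutral term at zero is an assumption of this sketch that the text does not state.

```python
def muon_combined_isolation(charged_pv_pt, neutral_hadron_pt, photon_pt,
                            charged_pileup_pt):
    """Combined isolation for a muon candidate (illustrative sketch).

    All arguments are lists of pT values (GeV) of particles inside the cone.
    The neutral pileup contribution is estimated as half the summed pT of
    charged particles from pileup vertices.
    """
    neutral = sum(neutral_hadron_pt) + sum(photon_pt)
    pileup_neutral = 0.5 * sum(charged_pileup_pt)
    # Assumption: the corrected neutral term is not allowed to become negative.
    return sum(charged_pv_pt) + max(0.0, neutral - pileup_neutral)


def is_isolated(i_comb, lepton_pt, flavour):
    """Relative isolation requirement: 0.15 for electrons, 0.12 for muons."""
    threshold = {"electron": 0.15, "muon": 0.12}[flavour]
    return i_comb / lepton_pt < threshold


# Invented cone contents for a 35 GeV muon.
i_comb = muon_combined_isolation([1.0, 0.5], [1.0], [1.0], [2.0])  # 2.5 GeV
muon_ok = is_isolated(i_comb, 35.0, "muon")
```

For electrons only the threshold function applies as written; the jet area pileup estimate of Ref. [48] is not reproduced here.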

The transverse mass (\(m_{\textrm{T}} \)) of the lepton and \({\vec p}_{\textrm{T}}^{\text {miss}} \) is defined as

$$\begin{aligned} m_{\textrm{T}} \equiv \sqrt{{2~p_{\textrm{T}} ^{\ell }~p_{\textrm{T}} ^\text {miss} ~[1-\cos (\phi _\ell -\phi _{p_{\textrm{T}} ^\text {miss}})]}}, \end{aligned}$$

where \(\phi _\ell \) and \(\phi _{p_{\textrm{T}} ^\text {miss}}\) are the azimuthal angles of the lepton momentum and the \({\vec p}_{\textrm{T}}^{\text {miss}} \) vector, respectively. Events with \(m_{\textrm{T}} < 55\,\text {GeV} \) are discarded from the analysis to suppress the contamination from QCD multijet events. The remaining contribution after \(\text {OS--SS}\) subtraction is negligible.
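The transverse mass definition and the corresponding requirement translate directly into code. The function names below are illustrative; the formula and the 55 GeV threshold are those quoted in the text.

```python
import math


def transverse_mass(pt_lep, pt_miss, phi_lep, phi_miss):
    """Transverse mass of the lepton and missing transverse momentum (GeV)."""
    return math.sqrt(
        2.0 * pt_lep * pt_miss * (1.0 - math.cos(phi_lep - phi_miss))
    )


def passes_mt_requirement(m_t, threshold=55.0):
    """Events with m_T below the threshold are discarded."""
    return m_t >= threshold


# Back-to-back lepton and missing momentum of 40 GeV each give m_T = 80 GeV.
m_t = transverse_mass(40.0, 40.0, 0.0, math.pi)
```

The two limiting cases are useful checks: a back-to-back configuration gives the maximum value, twice the geometric mean of the two transverse momenta, while a collinear configuration gives zero.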

4.2 Selection of \(\text {W}+{\text {c}}\) events 

A \(\text {W}+\text {jets}\) sample is selected from the sample of W boson events by additionally requiring the presence of at least one jet with transverse momentum (\(p_{\textrm{T}} ^{\text {jet}}\)) larger than 25\(\,\text {GeV}\) in the pseudorapidity region \(|\eta ^{\text {jet}} |<2.5\). Jets are not selected if they have a separation \(\varDelta R ({\text {jet}},\ell )\) \(<0.5\) in the \(\eta \)-\(\phi \) space between the jet axis and the selected isolated lepton.

Hadrons with \({\text {c}}\) quark content decay weakly with lifetimes of the order of \(10^{-12}\hbox {s}\) and mean decay lengths larger than \(100 {\upmu \hbox {m}}\) at the LHC energies. Secondary vertices well separated from the primary vertex are reconstructed from the tracks of their charged decay products. In a sizeable fraction of the decays (\({\approx }\) 10–15% [38]) there is a muon in the final state. We make use of these properties and focus on the following two signatures to identify jets originating from a \({\text {c}}\) quark:

\(\centerdot \) Semileptonic (SL) channel: a well-identified muon inside the jet coming from the semileptonic decay of a charm hadron.

\(\centerdot \) Secondary vertex (SV) channel: a reconstructed displaced secondary vertex inside the jet.

When an event fulfils the selection requirements of both topologies, it is assigned to the SL channel, which has a higher purity. Thus, the SL and the SV categories are mutually exclusive, i.e., the samples selected in each channel are statistically independent. The event selection process is summarized in Table 1 for the four analysis categories, which combine the two \(\text {W}\) boson decay channels (electron or muon) with the two charm identification channels (SL and SV).

These two signatures are also features of weakly decaying \(\text {b} \) hadrons. Events from physical processes producing \(\text {b} \) jets accompanied by a W boson will be abundantly selected in the two categories. The most important source of background events is \({\hbox {t} {}{\bar{\hbox {t}}}} \) production, where a pair of W bosons and two \(\text {b} \) jets are produced in the decay of the top quark–antiquark pair. This final state mimics the analysis topology when at least one of the W bosons decays leptonically, and there is an identified muon or a reconstructed secondary vertex inside one of the \(\text {b} \) jets. However, this background is effectively suppressed by the \(\text {OS--SS}\) subtraction. The probability of identifying a muon or a secondary vertex inside the \(\text {b} \) jet with a charge opposite to that of the W candidate is the same as with an equal charge, thus yielding equal numbers of OS and SS events.

Table 1 Summary of the selection requirements for the four analysis categories

Top quark–antiquark events where one of the W bosons decays hadronically into a \({\text {c}}\bar{\text {s}} \) (or \(\bar{{\text {c}}} \text {s} \)) quark–antiquark pair may result in additional event candidates if the SL or SV signature originates from the \({\text {c}}\) jet. This topology produces real OS events, which contribute to an additional background after \(\text {OS--SS}\) subtraction. Similarly, single top quark production also produces real OS events, but at a lower level because of the smaller production cross section.

The production of a W boson and a single \(\text {b} \) quark through the process \(\text {q} \text {g} \rightarrow \text {W}+ \text {b} \), similar to the one sketched in Fig. 1, produces actual OS events, but it is heavily Cabibbo-suppressed and its contribution to the analysis is negligible. The other source of a W boson and a \(\text {b} \) quark is \(\text {W}+ \text {b} {}{\bar{\text {b}}} \) events where the \(\text {b} {}{\bar{\text {b}}} \) pair originates from gluon splitting and only one of the two \(\text {b} \) jets is identified. These events are also charge symmetric, as it is equally likely to identify the \(\text {b} \) jet with the same charge as that of the W boson or with the opposite one, and their contribution cancels out after the \(\text {OS--SS}\) subtraction.

4.2.1 Event selection in the SL channel 

The \(\text {W}+{\text {c}}\) events with a semileptonic charm hadron decay are identified by a reconstructed muon among the constituents of any of the selected jets. The muon candidate has to satisfy the same reconstruction and identification quality criteria as those imposed on the muons from the W boson decay, has to be reconstructed in the region \(|\eta | < 2.1\) with \(p_{\textrm{T}} ^{\upmu }<25\,\text {GeV} \) and \(p_{\textrm{T}} ^{\upmu }/p_{\textrm{T}} ^{\text {jet}}<0.6\), and it must not be isolated from hadron activity, \(I_{\text {comb}}/p_{\textrm{T}} ^{\upmu }>0.2\). No minimum \(p_{\textrm{T}} \) threshold is explicitly required, but the muon reconstruction algorithm sets a natural threshold around 3\(\,\text {GeV}\) (2\(\,\text {GeV}\)) in the barrel (endcap) region, since the muon must traverse the material in front of the muon detector and travel deep enough into the muon system to be reconstructed and satisfy the identification criteria. If more than one such muon is identified, the one with the highest \(p_{\textrm{T}} \) is selected. The electric charges of the muon in the jet and the lepton from the W boson decay determine whether the event is treated as OS or SS. Semileptonic decays into electrons are not selected because of the high background in identifying electrons inside jets.
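The muon-in-jet requirements can be collected in a short sketch. Muons are represented, as an assumption, by dictionaries with `pt`, `eta`, `iso` (the combined isolation in GeV), and `charge` keys, and are taken to have already passed the identification criteria and to be jet constituents; the function names are illustrative.

```python
def select_muon_in_jet(muons, jet_pt):
    """Return the highest-pT muon satisfying the SL-channel requirements,
    or None if no muon qualifies."""
    candidates = [
        mu for mu in muons
        if abs(mu["eta"]) < 2.1
        and mu["pt"] < 25.0
        and mu["pt"] / jet_pt < 0.6
        and mu["iso"] / mu["pt"] > 0.2   # must NOT be isolated
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda mu: mu["pt"])


def charge_category(w_lepton_charge, tag_charge):
    """Label the event as OS or SS from the two electric charges."""
    return "OS" if w_lepton_charge * tag_charge < 0 else "SS"


# Invented muons inside a 50 GeV jet.
muons = [
    {"pt": 8.0, "eta": 0.3, "iso": 4.0, "charge": -1},
    {"pt": 12.0, "eta": -1.0, "iso": 6.0, "charge": 1},
    {"pt": 30.0, "eta": 0.1, "iso": 10.0, "charge": 1},  # fails pT < 25 GeV
]
tag_muon = select_muon_in_jet(muons, jet_pt=50.0)
category = charge_category(-1, tag_muon["charge"])  # W lepton charge of -1
```

Note that the isolation requirement is inverted with respect to the one applied to the lepton from the W boson decay, since the muon is expected to be surrounded by the other jet constituents.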

Additional requirements are applied for the event selection in the \(\text {W}\rightarrow {\upmu }{\upnu }\) channel, because the selected sample is affected by a sizeable contamination from dimuon \(\text {Z} +\text {jets}\) events. Events with a dimuon invariant mass close to the Z boson mass peak (\(70<m_{\upmu \upmu }<110\,\text {GeV} \)) are discarded. Furthermore, the invariant mass of the muon pair must be larger than 12\(\,\text {GeV}\) to suppress the background from low-mass resonances.

Finally, if the muon in the jet candidate comes from a semileptonic decay of a charm hadron, its associated track is expected to have a significant impact parameter, defined as the projection in the transverse plane of the vector between the primary vertex and the muon trajectory at its point of closest approach. To further reduce the \(\text {Z} +\text {jets}\) contamination in the \(\text {W}\rightarrow {\upmu }{\upnu }\) channel, we require the impact parameter significance (IPS) of the muon in the jet, defined as the muon impact parameter divided by its uncertainty, to be larger than 1.
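The additional requirements of the \(\text {W}\rightarrow {\upmu }{\upnu }\) channel can be sketched as below. The function names are hypothetical, the dimuon mass is computed neglecting the muon mass, and the impact parameter is assumed to be supplied as an unsigned magnitude in the same units as its uncertainty.

```python
import math


def dimuon_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass (GeV) of two muons, neglecting the muon mass."""
    m2 = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    return math.sqrt(max(m2, 0.0))


def passes_wmunu_sl_requirements(m_mumu, ip, ip_uncertainty):
    """Z-peak veto, low-mass veto, and impact parameter significance cut."""
    in_z_window = 70.0 < m_mumu < 110.0
    above_low_mass = m_mumu > 12.0
    ips = ip / ip_uncertainty
    return (not in_z_window) and above_low_mass and ips > 1.0


# Two back-to-back 45 GeV muons at eta = 0 reconstruct to 90 GeV: vetoed.
m_z_like = dimuon_mass(45.0, 0.0, 0.0, 45.0, 0.0, math.pi)
z_like_kept = passes_wmunu_sl_requirements(m_z_like, ip=0.010, ip_uncertainty=0.004)
```

A dimuon pair outside both mass vetoes is kept only if the muon in the jet is also displaced, which is what distinguishes a charm hadron decay from a prompt muon from a Z boson decay.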

The above procedure results in an event yield of \(52\,179 \pm 451\) (\(32\,071 \pm 315\)), after \(\text {OS--SS}\) subtraction, in the \(\text {W}\rightarrow \text {e}\upnu \) (\(\text {W}\rightarrow {\upmu }{\upnu }\)) channel where the quoted uncertainty is statistical. The smaller yield in the \(\text {W}\rightarrow {\upmu }{\upnu }\) channel is mainly due to the requirement on the IPS of the muon inside the jet, which is solely applied to this channel. Table 2 shows the flavour composition of the selected sample according to simulation. The fraction of \(\text {W}+{\text {c}}\) signal events is around 80%. The dominant background arises from \({\hbox {t} {}{\bar{\hbox {t}}}} \) production (around 8%), where one of the W bosons produced in the decay of the top quark pair decays leptonically and the other hadronically with a \({\text {c}}\) quark in the final state. The contribution from \({\hbox {t} {}{\bar{\hbox {t}}}} \) events where one of the top quarks is out of the acceptance of the detector is estimated with the simulated sample to be negligible. Figure 2 shows the distributions after \(\text {OS--SS}\) subtraction of the IPS (left) and \(p_{\textrm{T}} \) (right) of the muon inside the jet for events in the selected sample. The difference between data and simulation in the high-\(p_{\textrm{T}} \) region in Fig. 2, right (\(p_{\textrm{T}} \gtrsim 20\,\text {GeV} \)), is related to a similar behaviour observed in the \(p_{\textrm{T}} ^{\upmu }\)/\(p_{\textrm{T}} ^{\text {jet}}\) distribution. Differences are significantly reduced by reweighting the simulation with weights extracted from the \(p_{\textrm{T}} ^{\upmu }\)/\(p_{\textrm{T}} ^{\text {jet}}\) distribution to make the corresponding simulation description match the data.

Fig. 2

Distributions after \(\text {OS--SS}\) subtraction of the impact parameter significance, IPS, (left) and \(p_{\textrm{T}} \) (right), of the muon inside the \({\text {c}}\text { jet} \) for events in the SL sample, summing up the contributions of the two W boson decay channels. The IPS distribution is shown after all selection requirements except the one on this variable. The last bin of the distribution includes all events with \(\text {IPS}>7.5\). The \(p_{\textrm{T}} \) distribution includes the selection requirement \(\text {IPS}>1.0\) for the \(\text {W}\rightarrow {\upmu }{\upnu }\) channel. The contributions of the various processes are estimated with the simulated samples. Vertical bars on data points represent statistical uncertainty in the data. The hatched areas represent the sum in quadrature of statistical and systematic uncertainties in the MC simulation. The ratio of data to simulation is shown in the lower panels. The uncertainty band in the ratio includes the statistical uncertainty in the data, and the statistical and systematic uncertainties in the MC simulation

Table 2 Simulated flavour composition (in %) of the SL sample after the selection summarized in Table 1 and \(\text {OS--SS}\) subtraction, for the electron and muon decay channels of the W boson. \(\text {W}+ {\text {Q} \bar{\text {Q}}} \) is the sum of the contributions of \(\text {W}+ {\text {c}}{}\bar{{\text {c}}} \) and \(\text {W}+ \text {b} {}{\bar{\text {b}}} \); its negative value is an effect of the OS–SS subtraction. Quoted uncertainties are statistical only

4.2.2 Event selection in the SV channel 

An independent \(\text {W}+{\text {c}}\) sample is selected by looking for secondary decay vertices of charm hadrons within the reconstructed jets. Displaced secondary vertices are reconstructed with either the simple secondary vertex (SSV) [49] or the inclusive vertex finder (IVF) [50, 51] algorithms. Both algorithms follow the adaptive vertex fitter technique [52] to construct a secondary vertex, but differ in the tracks used. The SSV algorithm takes as input the tracks constituting the jet; the IVF algorithm starts from a displaced track with respect to the primary vertex (seed track) and tries to build a vertex from nearby tracks in terms of their separation distance in three dimensions and their angular separation around the seed track. IVF vertices are then associated with the closest jet in a cone of \(\varDelta R=0.3\). Tracks used for the reconstruction of both types of secondary vertex must have \(p_{\textrm{T}} >1\,\text {GeV} \) to avoid misreconstructed or poorly reconstructed tracks.

If there are several jets with a secondary vertex, only the jet with the highest transverse momentum is selected. If more than one secondary vertex within a jet is reconstructed, the one with the highest transverse momentum, computed from its associated tracks, is considered.

To ensure that the secondary vertex is well separated from the primary one, we require the secondary-vertex displacement significance, defined as the three-dimensional (3D) distance between the primary and the secondary vertices divided by its uncertainty, to be larger than 3.5.

We define the corrected secondary-vertex mass, \(m_\text {SV}^\text {corr}\), as the invariant mass of all charged particles associated with the secondary vertex, assumed to be pions, \(m_\text {SV}\), corrected for additional particles, either charged or neutral, that may have been produced but were not reconstructed [53]:

$$\begin{aligned} m_\text {SV}^\text {corr} = \sqrt{m^2_\text {SV} + p^2_\text {SV} \sin ^2 \theta } + p_\text {SV} \sin \theta , \end{aligned}$$

where \(p_\text {SV}\) is the modulus of the vectorial sum of the momenta of all charged particles associated with the secondary vertex, and \(\theta \) is the angle between the momentum vector sum and the vector from the primary to the secondary vertex. The corrected secondary-vertex mass is thus the minimum mass the long-lived hadron can have that is consistent with the direction of flight. To reduce the contamination of jets not produced by the hadronization of a heavy-flavour quark (light-flavour jet background), \(m_\text {SV}^\text {corr}\) must be larger than 0.55\(\,\text {GeV}\).
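
As a numerical illustration of the corrected-mass definition, the following minimal Python sketch evaluates \(m_\text {SV}^\text {corr}\) for a single vertex. The function name and the example inputs are hypothetical and are not taken from the analysis software.

```python
import math


def corrected_sv_mass(m_sv, p_sv, theta):
    """Return the corrected secondary-vertex mass in GeV.

    m_sv  : invariant mass of the charged tracks at the vertex (pion hypothesis), in GeV
    p_sv  : modulus of the vector sum of the track momenta, in GeV
    theta : angle (rad) between that momentum sum and the primary-to-secondary vertex vector
    """
    # Momentum component transverse to the flight direction.
    p_perp = p_sv * math.sin(theta)
    return math.sqrt(m_sv**2 + p_perp**2) + p_perp


# Invented example vertex: 1.2 GeV track mass, 20 GeV momentum, 20 mrad opening angle.
m_corr = corrected_sv_mass(m_sv=1.2, p_sv=20.0, theta=0.02)
passes_mass_cut = m_corr > 0.55  # threshold quoted in the text, in GeV
```

When the momentum sum is exactly aligned with the flight direction (\(\theta = 0\)), the correction vanishes and the function returns \(m_\text {SV}\).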

Vertices reconstructed with the IVF algorithm are considered first. If no IVF vertex is selected, SSV vertices are searched for, thus providing additional event candidates.

For charged charm hadrons, the sum of the charges of the decay products reflects the charge of the \({\text {c}}\) quark. For neutral charm hadrons, the charge of the closest hadron produced in the fragmentation process can indicate the charge of the \({\text {c}}\) quark [54, 55]. Hence, to classify the event as OS or SS, we scrutinize the charge of the secondary vertex and of the nearby tracks. We consider the SV as positively (negatively) charged if the sum of the charges of the constituent tracks is larger (smaller) than zero. If the secondary vertex charge is zero, we take the charge of the primary vertex track closest to the direction of the secondary vertex (given by the sum of the momenta of the constituent tracks). We only consider primary vertex tracks with \(p_{\textrm{T}} >0.3\,\text {GeV} \) and within an angular separation, \(\varDelta R < 0.1\), from the secondary vertex direction. If a nonzero charge cannot be assigned, the event is rejected.
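
The charge-assignment logic described above can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: the `Track` container, the function names, and the choice of passing the secondary-vertex direction as (\(\eta \), \(\phi \)) are all hypothetical; only the thresholds (\(p_{\textrm{T}} >0.3\,\text {GeV} \), \(\varDelta R<0.1\)) come from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    charge: int  # +1 or -1
    pt: float    # transverse momentum in GeV
    eta: float
    phi: float


def delta_r(eta1, phi1, eta2, phi2):
    """Angular separation, with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)


def vertex_charge(sv_tracks, pv_tracks, sv_eta, sv_phi, min_pt=0.3, max_dr=0.1):
    """Return +1, -1, or None when no nonzero charge can be assigned."""
    total = sum(t.charge for t in sv_tracks)
    if total != 0:
        return 1 if total > 0 else -1
    # Neutral vertex: fall back to the closest qualifying primary-vertex track.
    nearby = [
        (delta_r(t.eta, t.phi, sv_eta, sv_phi), t)
        for t in pv_tracks
        if t.pt > min_pt
    ]
    nearby = [(dr, t) for dr, t in nearby if dr < max_dr]
    if not nearby:
        return None
    return min(nearby, key=lambda pair: pair[0])[1].charge


def classify(lepton_charge, sv_charge):
    """Label the event OS or SS; None means the event is rejected."""
    if sv_charge is None:
        return None
    return "OS" if lepton_charge * sv_charge < 0 else "SS"
```

Returning `None` for an unassignable vertex keeps the rejection decision explicit, so that the caller cannot silently count such events as SS.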

In about 45% of the selected events, the reconstructed charge of the secondary vertex is zero, and in 60% of them, a charge can be assigned from the primary vertex track. According to the simulation, the charge assignment is correct in 70% of the cases, both for charged and neutral secondary vertices.

After \(\text {OS--SS}\) subtraction, we obtain an event yield of \(118\,625 \pm 947\) (\(132\,117 \pm 941\)) in the \(\text {W}\rightarrow \text {e}\upnu \) (\(\text {W}\rightarrow {\upmu }{\upnu }\)) channel. Table 3 shows the flavour composition of the selected sample, as predicted by the simulation. The purity of the \(\text {W}+{\text {c}}\) signal events is about 75%. The dominant background comes from \(\text {W}\,+ \,{\text {u} \text {d} \text {s} \text {g}} \) jets (around 15%), mostly from the processes \({\text {u} \text {g} \rightarrow {\text {W}}^{+} + \text {d}}\) and \({\text {d} \text {g} \rightarrow {\text {W}}^{-} + \text {u}}\), which are OS. Figure 3 shows the distributions after \(\text {OS--SS}\) subtraction of the secondary vertex displacement significance and the corrected secondary-vertex mass for data and simulation.

Table 3 Simulated flavour composition (in %) of the SV sample after the selection summarized in Table 1, including OS–SS subtraction, for the electron and muon W boson decay channels. \(\text {W}+ {\text {Q} \bar{\text {Q}}} \) is the sum of the contributions of \(\text {W}+ {\text {c}}{}\bar{{\text {c}}} \) and \(\text {W}+ \text {b} {}{\bar{\text {b}}} \). Quoted uncertainties are statistical only
Fig. 3

Distributions after \(\text {OS--SS}\) subtraction of the secondary-vertex displacement significance (left) and corrected secondary-vertex mass (right). For each distribution all selection requirements are applied except the one on the displayed variable. The last bin of each plot includes all events beyond the bin. The contributions from all processes are estimated with the simulated samples. Vertical bars on data points represent the statistical uncertainty in the data. The hatched areas represent the sum in quadrature of statistical and systematic uncertainties in the MC simulation. The ratio of data to simulation is shown in the lower panels. The uncertainty band in the ratio includes the statistical uncertainty in the data, and the statistical and systematic uncertainties in the MC simulation

The distributions from the MC simulations are corrected for known discrepancies between data and simulation in the secondary vertex reconstruction. The events of the SL sample are used to compute data-to-simulation scale factors for the efficiency of charm identification through the reconstruction of an SV [56, 57]. The fraction of events in the SL sample with a secondary vertex is computed for data and simulation, and the ratio of data to simulation is applied as a scale factor to simulated \(\text {W}+{\text {c}}\) signal events in the SV sample. The scale factor is \(0.94 \pm 0.03\), where the uncertainty includes the statistical and systematic effects. The systematic uncertainty includes contributions from the uncertainties in the pileup description, jet energy scale and resolution, lepton efficiencies, background subtraction, and modelling of charm production and decay fractions in the simulation. The dependence of the scale factor on the \(p_{\textrm{T}} \) of the jet is included when computing differential cross sections, as explained in Sect. 7.
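
The scale factor construction amounts to a ratio of two tagging fractions. The sketch below illustrates the arithmetic with invented yields; the function names are hypothetical, and the uncertainty treatment described in the text is not reproduced.

```python
def sv_tag_fraction(n_with_sv, n_total):
    """Fraction of SL-sample events that also contain a secondary vertex."""
    return n_with_sv / n_total


def sv_scale_factor(n_sv_data, n_data, n_sv_mc, n_mc):
    """Data-to-simulation ratio of the secondary-vertex tagging fractions."""
    return sv_tag_fraction(n_sv_data, n_data) / sv_tag_fraction(n_sv_mc, n_mc)


# Invented OS-SS subtracted yields, chosen only to illustrate the calculation.
scale_factor = sv_scale_factor(n_sv_data=4700.0, n_data=10000.0,
                               n_sv_mc=5000.0, n_mc=10000.0)

# The scale factor would then multiply the weight of each simulated W+c signal event.
nominal_weight = 1.0
corrected_weight = nominal_weight * scale_factor
```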

A jet \(p_{\textrm{T}} \)- and \(\eta \)-dependent correction factor between 1.0 and 1.2 is applied to the \(\text {W}\,+ \,{\text {u} \text {d} \text {s} \text {g}} \) component of the \(\text {W}+\text {jets}\) simulation to account for inaccuracies in the description of light-flavour jet contamination entering the signal. Those values correspond to data/simulation correction factors for light jets being misidentified as heavy-flavour jets, as computed in Ref. [58].

5 Systematic uncertainties 

The impact of various sources of uncertainty in the measurements is estimated by recalculating the cross sections and cross section ratio with the relevant parameters varied up and down by one standard deviation of their uncertainties. Most sources of systematic uncertainty affect the \(\sigma ({\text {W}}^{+} +\bar{{\text {c}}})\) and \(\sigma ({\text {W}}^{-} +{\text {c}})\) measurements equally; thus, their effects largely cancel in the cross section ratio. We discuss first the uncertainties in the determination of the fiducial cross section in the four channels. The uncertainties in the cross section ratio are summarized at the end of the section. The most relevant sources of systematic uncertainties in the differential cross sections are further discussed in Sect. 7.
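
The up and down variation procedure can be illustrated with a toy calculation. In this minimal sketch the cross section function, the parameter names, and all numerical inputs are invented stand-ins for the full analysis chain.

```python
def toy_cross_section(signal_yield, correction, lumi):
    """Toy stand-in for the full cross section calculation (pb if lumi is in pb^-1)."""
    return signal_yield / (correction * lumi)


def relative_shifts(compute, nominal, name, sigma):
    """Relative change of compute(**nominal) when parameter `name` moves by +/- sigma."""
    central = compute(**nominal)
    up = compute(**{**nominal, name: nominal[name] + sigma})
    down = compute(**{**nominal, name: nominal[name] - sigma})
    return (up - central) / central, (down - central) / central


# Invented nominal inputs.
nominal = {"signal_yield": 75000.0, "correction": 0.04, "lumi": 20000.0}

# Hypothetical one-standard-deviation variation: 2% of the correction factor.
shift_up, shift_down = relative_shifts(
    toy_cross_section, nominal, "correction", 0.02 * nominal["correction"]
)
```

Because the correction factor enters the denominator, the two shifts have opposite signs and slightly different magnitudes; a symmetric uncertainty is often quoted from the larger of the two or from their average, which is a convention and not something specified here.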

The combined uncertainty in the lepton trigger, reconstruction, and identification efficiencies results in a cross section uncertainty of 1.3% and 0.8% for the \(\text {W}\rightarrow \text {e}\upnu \) and \(\text {W}\rightarrow {\upmu }{\upnu }\) channels, respectively. The uncertainty in the efficiency of the identification of muons inside jets is approximately 3%, according to dedicated studies in multijet events [14], which directly translates into an equivalent uncertainty in the measured cross section in the SL channels.

The probability of lepton charge misassignment is studied with data using \(\text {Z} \rightarrow \ell \ell \) events reconstructed with same- or opposite-sign leptons. The charge misidentification probability for muons is negligible (\(<10^{-4}\)). For the electrons, it is \({\approx }\)0.4%, which propagates into a negligible uncertainty in the cross section measurements.

The effects of the uncertainty in the jet energy scale and the jet energy resolution are assessed by varying the corresponding correction factors within their uncertainties, according to the results of dedicated CMS studies [44, 45]. The resulting uncertainty is below 1.5%. The uncertainty from a \({\vec p}_{\textrm{T}}^{\text {miss}} \) mismeasurement in the event is estimated by smearing the simulated \({\vec p}_{\textrm{T}}^{\text {miss}} \) distribution to match that in data. The resulting uncertainty in the cross section is less than 0.2%. Uncertainties in the pileup modelling are calculated using a modified pileup profile obtained by changing the mean number of interactions by \(\pm 5\%\). This variation covers the uncertainty in the \({\text {p}\text {p}}\) inelastic cross section and in the modelling of the pileup simulation. It results in less than 1% uncertainty in the cross section measurements.

The measured average of the inclusive charm quark semileptonic branching fractions is \({{\mathcal {B}}}({\text {c}}\rightarrow \ell ) = 0.096\pm 0.004\) [38], while the exclusive sum of the individual contributions from all weakly decaying charm hadrons is \(0.086\pm 0.004\) [36, 38]. The average of these two values, \({{\mathcal {B}}}({\text {c}}\rightarrow \ell ) = 0.091 \pm 0.003\), is consistent with the pythia value used in our simulations (9.3%). We assign a 5% uncertainty in the SL channel to cover both central values within one standard deviation. For the SV channel, remaining inaccuracies in the charm hadron branching fractions in the pythia6 simulation are covered by a systematic uncertainty (2.6%) equal to the change in the cross section caused by the correction of \(\text {D}^{0}\)/\(\bar{\text {D}}^{0}\) and \(\text {D}^{\pm }\) decay branching fractions, as described in Sect. 3. The systematic effect of the uncertainty in the charm quark fragmentation fractions is set to be equal to the change in the cross section (1.2%) caused by the correction procedure described in Sect. 3. This uncertainty is assigned to both the SL and SV channels.
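
The averaging of the two branching fraction determinations can be checked with a few lines of arithmetic. The sketch below assumes that the two inputs are uncorrelated, which is our assumption for the illustration; the variable names are hypothetical.

```python
import math

# Values quoted in the text.
b_inclusive, err_inclusive = 0.096, 0.004
b_exclusive, err_exclusive = 0.086, 0.004
pythia_value = 0.093

# Unweighted mean; the uncertainty assumes uncorrelated inputs (an assumption).
b_mean = 0.5 * (b_inclusive + b_exclusive)
err_mean = 0.5 * math.hypot(err_inclusive, err_exclusive)

# Distance of the simulation value from the mean, in units of the mean's uncertainty.
pull_pythia = (pythia_value - b_mean) / err_mean
```

Under this assumption the mean and its rounded uncertainty reproduce the quoted \(0.091 \pm 0.003\), and the pythia value lies within one standard deviation of it.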

To account for inaccuracies in the simulation of the energy fraction of the charm quark carried by the charm hadron in the fragmentation process, we associate a systematic uncertainty computed by weighting the simulation to match the distribution of an experimental observable representative of that quantity. We use the distribution of the muon transverse momentum divided by the jet transverse momentum, \(p_{\textrm{T}} ^{\upmu }\)/\(p_{\textrm{T}} ^{\text {jet}}\), for the SL channel, and the secondary vertex transverse momentum divided by the jet transverse momentum, \(p_{\textrm{T}} ^{\text {SV}}\)/\(p_{\textrm{T}} ^{\text {jet}}\), for the SV channel. This procedure results in an uncertainty in the cross section of \({\approx }1\%\) in the SL channel and \({\lesssim }0.5\%\) in the SV channel.

The uncertainty in the scale factor correcting the SV reconstruction efficiency in simulation propagates into a systematic uncertainty of 2.2% in the cross section.

The modelling of the simulation of the secondary vertex charge assignment efficiency is studied with data using the subset of the events of the SL sample where a displaced secondary vertex has also been identified. The requirement of a reconstructed secondary vertex in the SL sample increases the \(\text {W}+{\text {c}}\) signal contribution to 95%. The charge of the secondary vertex is tested against the charge of the muon inside the jet, which is taken as a reference. The uncertainty in the SV charge determination is estimated as the difference in the rate obtained in data and simulation of correct SV charge assignment and results in a 1.2% uncertainty in the cross section.

The uncertainty in the determination of the background processes is thoroughly evaluated. The \(\text {OS--SS}\) subtraction procedure efficiently suppresses the contribution from background processes that produce equal amounts of OS and SS candidates, thus rendering the measurements largely insensitive to the modelling of these backgrounds. This is the case for \({\hbox {t} {}{\bar{\hbox {t}}}} \) production with the subsequent leptonic decay of the two W bosons, which is completely removed. We have checked with data how efficiently the \(\text {OS--SS}\) subtraction procedure eliminates these charge symmetric \({\hbox {t} {}{\bar{\hbox {t}}}} \) events. A \({\hbox {t} {}{\bar{\hbox {t}}}} \)-enriched control sample is selected by requiring a pair of high-\(p_{\textrm{T}} \) isolated leptons of different flavour, \(\text {e}\)-\(\upmu \), with opposite charge, following the same lepton selection criteria as in the \(\text {W}+{\text {c}}\) analysis. Events with at most two reconstructed jets with \(p_{\textrm{T}} >30\,\text {GeV} \) are selected. A nonisolated muon or a secondary vertex inside one of the jets is required. The charge of the highest-\(p_{\textrm{T}} \) isolated lepton and the charge of the muon in the jet or the secondary vertex are compared to classify the event as OS or SS. The test is repeated taking separately the highest-\(p_{\textrm{T}} \) lepton of the two possible lepton flavours and charges. A reduction down to less than 1% is observed in all cases after \(\text {OS--SS}\) subtraction. This behaviour is well reproduced in the simulation.

Some background contribution is expected from \({\hbox {t} {}{\bar{\hbox {t}}}} \) events where one of the W bosons decays leptonically, and the other one decays hadronically into a \({\text {c}}\bar{\text {s}} \) (\(\bar{{\text {c}}} \text {s} \)) pair. These are genuine OS events. The accuracy of the simulation to evaluate this contribution is checked with data using a semileptonic \({\hbox {t} {}{\bar{\hbox {t}}}} \)-enriched sample selected by requiring a high-\(p_{\textrm{T}} \) isolated lepton (\(\text {e}\) or \(\upmu \)) fulfilling the criteria of the \(\text {W}+{\text {c}}\) selection, and at least four jets in the event, one of them satisfying either the SL or SV selection. The relative charge of the muon in the jet or the secondary vertex with respect to the lepton from the W decay determines the event to be OS or SS. The numbers of events after \(\text {OS--SS}\) subtraction in the simulation and in data agree to better than 10%. This difference is assigned as the uncertainty in the description of the semileptonic \({\hbox {t} {}{\bar{\hbox {t}}}} \) background. The effect on the fiducial \(\text {W}+{\text {c}}\) cross section is smaller than 1%.

The uncertainty in the contribution from single top quark processes is estimated by varying the normalization of the samples according to the uncertainties in the theoretical cross sections, \(\sim \) 5–6%. It produces a negligible effect on the measurements.

The contribution from \(\text {Z} +\text {jets}\) events is only relevant in the \(\text {W}\rightarrow {\upmu }{\upnu }\) channel of the SL category, amounting to \({\sim }7\%\) of the selected events. The level of agreement between data and the \(\text {Z} +\text {jets}\) simulation is studied in the region of the Z boson mass peak, \(70<m_{\upmu \upmu }<110\,\text {GeV} \), which is excluded in the signal analysis, applying the same selection procedure as for the signal sample, except for the invariant mass requirement; a difference of about 15% is observed. This discrepancy is assigned as a systematic uncertainty, assuming the same mismodelling outside the Z mass peak region. The effect on the cross section is about 1%.

An additional systematic uncertainty is assigned to account for a possible mismodelling of the \(\text {W}\,+ \,{\text {u} \text {d} \text {s} \text {g}} \) background. The systematic uncertainty is evaluated by using simulation correction factors, as presented in Sect. 4.2.2, associated with different misidentification probabilities. The uncertainty in the \(\text {W}\,+ \,{\text {u} \text {d} \text {s} \text {g}} \) contribution is \({\approx }10\%\), which translates into a 1% uncertainty in the cross section.

The \(\text {OS--SS}\) subtraction removes almost completely the contribution from gluon splitting processes to the selected sample. We have estimated that a possible mismodelling up to three times the experimental uncertainty in the gluon splitting rate into \({\text {c}}{}\bar{{\text {c}}} \) quark pairs [59, 60] has a negligible impact on the measurements.

The signal sample is generated with MadGraph and pythia6 using the CTEQ6L1 PDF set and reweighted to the NNLO PDF set MSTW2008NNLO. The effect from the PDF uncertainty is estimated using other NNLO PDF sets (CT10 and NNPDF2.3 [61]). The resulting uncertainty in the cross section is small (\({\lesssim }1\%\)). Following the prescription of the individual PDF groups, the PDF uncertainty is of the same order.

In the signal modelling, no uncertainties are included in the simulation of higher-order terms in perturbative QCD (parton shower) or nonperturbative effects (hadronization, underlying event). The \(\text {OS--SS} \) subtraction technique removes the contributions to \(\text {W}+{\text {c}}\) production coming from charm quark–antiquark pair production, rendering the measurement insensitive to those effects.

The statistical uncertainty in the determination of the selection efficiency using the simulated samples is 2% for the SL channel and 1% for the SV channel, and is propagated as an additional systematic uncertainty. The uncertainty in the integrated luminosity is 2.6% [62].

The total systematic uncertainty in the \(\text {W}+{\text {c}}\) cross section is 7% for the measurements in the SL channels, and 5% for those in the SV channels.
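
As a rough consistency check of the size of the total, the individual contributions quoted in this section can be summed in quadrature. The sketch below does this for the SL channel with \(\text {W}\rightarrow {\upmu }{\upnu }\). Two assumptions are made for the illustration: that the sources are independent and combine in quadrature, and that values quoted only as bounds or approximations can be taken at face value. The exact list of inputs entering the quoted totals is not specified here, so this is not a reproduction of the published number.

```python
import math

# Relative uncertainties (%) quoted in this section for the SL channel, W -> mu nu.
# Entries given in the text only as bounds ("below", "less than", "about")
# are taken at face value, which is an assumption.
sl_muon_sources = {
    "lepton efficiency": 0.8,
    "muon-in-jet identification": 3.0,
    "jet energy scale and resolution": 1.5,
    "missing transverse momentum": 0.2,
    "pileup": 1.0,
    "charm semileptonic branching fraction": 5.0,
    "charm fragmentation fractions": 1.2,
    "charm fragmentation function": 1.0,
    "semileptonic ttbar background": 1.0,
    "Z+jets background": 1.0,
    "W+udsg background": 1.0,
    "PDF": 1.0,
    "simulated sample size": 2.0,
    "integrated luminosity": 2.6,
}


def quadrature_sum(values):
    """Combine independent relative uncertainties in quadrature."""
    return math.sqrt(sum(v * v for v in values))


total_sl_muon = quadrature_sum(sl_muon_sources.values())
```

With these inputs the sum is between 7% and 8%, of the same size as the total quoted for the SL channels and dominated by the 5% branching fraction term.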

Most of the systematic uncertainties cancel out in the measurement of the cross section ratio \(R_{{\text {c}}}^{\pm }\). This is the case for uncertainties related to lepton reconstruction and identification efficiencies, secondary vertex reconstruction, charm hadron fragmentation and decay fractions, and integrated luminosity determination. All other sources of uncertainty have a limited effect. The most relevant source of systematic uncertainty is the statistical uncertainty in the selection efficiencies, which are determined from simulation separately for the samples of \({\text {W}}^{+} \) and \({\text {W}}^{-} \) bosons. The total systematic uncertainty in the measurement of \(R_{{\text {c}}}^{\pm }\) in the SL channels is 3.5%, and 2.5% in the SV channels.

6 Fiducial \(\text {W}+{\text {c}}\) cross section and \(({\text {W}}^{+} +\bar{{\text {c}}})/({\text {W}}^{-} +{\text {c}})\) cross section ratio 

Cross sections are unfolded to the parton level using the \(\text {W}+{\text {c}}\) signal reference as defined in the MadGraph generator at the hard-scattering level. Processes where a charm-anticharm quark pair is produced in the hard interaction are removed from the signal definition. To minimize acceptance corrections, the measurements are restricted to a phase space that is close to the experimental fiducial volume with optimized sensitivity for the investigated processes: a lepton with \(p_{\textrm{T}} ^{\ell }>30\,\text {GeV} \) and \(|\eta ^{\ell } | < 2.1\), together with a \({\text {c}}\) quark with \(p_{\textrm{T}} ^{{\text {c}}} > 25\,\text {GeV} \) and \(|\eta ^{{\text {c}}} | < 2.5\). The \({\text {c}}\) quark parton should be separated from the lepton of the W boson candidate by a distance \(\varDelta R({{\text {c}}},\ell )>0.5\).

The measurement of the \(\text {W}+{\text {c}}\) cross section is performed independently in four different channels: the two charm identification channels (SL and SV), each with the W boson decaying to an electron or a muon. For all channels under study, the \(\text {W}+{\text {c}}\) cross section is determined using the following expression:

$$\begin{aligned} \sigma (\text {W}+{\text {c}})= \frac{Y_{\text {sel}}(1-f_{\text {bkg}})}{{\mathcal {C}} \, {\mathcal {L}}}, \end{aligned}$$
(1)

where \(Y_{\text {sel}}\) is the selected event yield in data and \(f_{\text {bkg}}\) the fraction of remaining background events, both after the selection process summarized in Table 1, and \(\text {OS--SS}\) subtraction. The fraction \(f_{\text {bkg}}\) is estimated from simulation. The signal yield, \(Y_{\text {sel}}(1-f_{\text {bkg}})\), is presented in Table 4.

The factor \({\mathcal {C}}\) corrects for losses in the selection process of \(\text {W}+{\text {c}}\) events produced in the fiducial region at parton level. It also subtracts the contributions from events outside the measurement fiducial region and from \(\text {W}+{\text {c}}\) events with \(\text {W}\rightarrow \uptau \upnu \), \(\uptau \rightarrow \text {e}+ \text {X}\) or \(\uptau \rightarrow \upmu + \text {X}\). It is calculated, using the sample of simulated signal events, as the ratio between the event yield of the selected \(\text {W}+{\text {c}}\) sample (according to the procedure described in Sects. 4.2.1 and 4.2.2 and after \(\text {OS--SS}\) subtraction) and the number of \(\text {W}+{\text {c}}\) events satisfying the phase space definition at parton level. The values of the \({\mathcal {C}}\) factors are also given in Table 4. The uncertainties quoted in the table include statistical and the associated systematic effects as discussed in Sect. 5. The different values of \({\mathcal {C}}\) reflect the different reconstruction and selection efficiencies in the four channels. In the SL channel, only about 3% of the signal charm hadrons generated in the fiducial region of the analysis produce a muon in their decay with enough momentum to reach the muon detector and get reconstructed. In the SV channel, only about 6% of the events with a charm hadron decay remain after SV reconstruction, SV charge assignment and \(\text {OS--SS}\) subtraction. The remaining inefficiency, accounted for in the \({\mathcal {C}}\) correction factors, is due to selection criteria of the samples. According to the simulation, the contribution to the cross section of events with \(m_{\textrm{T}} <55\,\text {GeV} \) is around 20%. No uncertainty is assigned to the modelling of this extrapolation. The integrated luminosity of the data is denoted by \({\mathcal {L}}\).
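
Equation (1) and the definition of \({\mathcal {C}}\) translate directly into code. The following minimal sketch uses invented inputs; the function names are hypothetical, and the integrated luminosity is expressed in \(\text {pb}^{-1}\) so that the result is in pb.

```python
def correction_factor(n_selected_signal_mc, n_fiducial_parton_mc):
    """C: selected simulated W+c yield (after OS-SS subtraction) divided by the
    number of simulated W+c events in the parton-level fiducial phase space."""
    return n_selected_signal_mc / n_fiducial_parton_mc


def fiducial_cross_section(y_sel, f_bkg, c_factor, lumi_pb):
    """Eq. (1): sigma = Y_sel * (1 - f_bkg) / (C * L), in pb for L in pb^-1."""
    return y_sel * (1.0 - f_bkg) / (c_factor * lumi_pb)


# Invented inputs, chosen only to illustrate the arithmetic.
c_factor = correction_factor(n_selected_signal_mc=4000.0, n_fiducial_parton_mc=100000.0)
sigma_pb = fiducial_cross_section(y_sel=100000.0, f_bkg=0.25,
                                  c_factor=c_factor, lumi_pb=20000.0)
```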

Finally, the fiducial \(\text {W}+{\text {c}}\) production cross section computed with Eq. (1) in the SL and SV channels for the electron and muon decay channels separately is shown in the last column of Table 4. Statistical and systematic uncertainties are quoted.

Table 4 Results in the SL (upper) and SV (lower) channels for the \(\text {W}\rightarrow \text {e}\upnu \) and \(\text {W}\rightarrow {\upmu }{\upnu }\) decays separately. Here \(Y_{\text {sel}}(1-f_{\text {bkg}})\) is the estimate for the signal event yield after background subtraction, \({\mathcal {C}}\) is the acceptance times efficiency correction factor, and \(\sigma (\text {W}+{\text {c}})\) is the measured production cross section

The \({\text {W}}^{+} +\bar{{\text {c}}} \) and \({\text {W}}^{-} +{\text {c}}\) cross sections are also measured independently using Eq. (1) after splitting the sample according to the charge of the lepton from the W boson decay, and the cross section ratio is computed. The corresponding numbers are summarized in Table 5. The overall yield of \({\text {W}}^{-} +{\text {c}}\) is expected to be slightly larger than that of \({\text {W}}^{+} +\bar{{\text {c}}} \) due to the small contribution, at a few percent level, of \(\text {W}+{\text {c}}\) production from the Cabibbo-suppressed processes \({\bar{\text {d}}} \text {g} \rightarrow {\text {W}}^{+} +\bar{{\text {c}}} \) and \(\text {d} \text {g} \rightarrow {\text {W}}^{-} +{\text {c}}\); this contribution is not symmetric because of the presence of down valence quarks in the proton.

Table 5 Measured production cross sections \(\sigma ({\text {W}}^{+} +\bar{{\text {c}}})\), \(\sigma ({\text {W}}^{-} +{\text {c}})\), and their ratio, \(R_{{\text {c}}}^{\pm }\), in the SL (upper) and SV (lower) channels for the electron and muon W boson decay modes

Results obtained for the \(\text {W}+{\text {c}}\) cross sections and cross section ratios in the different channels are consistent within uncertainties, and are combined to improve the precision of the measurement. The Convino [63] tool, which is used to perform the combination, is a maximum-likelihood approach including correlations between uncertainties within and between measurements. Systematic uncertainties arising from a common source and affecting several measurements are considered as fully correlated. In particular, all systematic uncertainties are assumed fully correlated between the electron and muon channels, except those related to the lepton reconstruction. The combined cross section and cross section ratio are:

$$\begin{aligned} \begin{aligned} \sigma (\text {W}+{\text {c}})&= 117.4 \pm 0.6\,\text {(stat)} \pm 5.6\,\text {(syst)} \,\, \hbox {pb}, \\ R_{{\text {c}}}^{\pm }&= 0.983 \pm 0.010\,\text {(stat)} \pm 0.017\,\text {(syst)}. \end{aligned} \end{aligned}$$

The contribution of the various sources of systematic uncertainty to the combined cross section is shown in Table 6. For each of the sources in the table, the quoted uncertainty is computed as the difference in quadrature between the uncertainty of the nominal combination and the one of a combination with that uncertainty fixed to the value returned by Convino.
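
The difference in quadrature used for the impact of each source can be written as a one-line function. This is a sketch with invented numbers and a hypothetical function name; it does not call Convino.

```python
import math


def impact_in_quadrature(total_nominal, total_with_source_fixed):
    """Uncertainty attributed to one source: sqrt(nominal^2 - fixed^2).

    The difference is clamped at zero because numerical noise in the two fits
    can make it slightly negative for sources with a very small impact.
    """
    diff = total_nominal**2 - total_with_source_fixed**2
    return math.sqrt(max(diff, 0.0))


# Invented totals (same units): nominal combination vs. combination with one source fixed.
impact = impact_in_quadrature(total_nominal=5.0, total_with_source_fixed=3.0)
```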

Table 6 Impact of the sources of systematic uncertainty in the combined \(\sigma (\text {W}+{\text {c}})\) measurement

A prediction of the \(\text {W}+{\text {c}}\) cross section is obtained with the MadGraph simulation sample. It is estimated by applying the phase space definition requirements to the generator-level quantities: a lepton from the W boson decay with \(p_{\textrm{T}} ^{\ell }>30\,\text {GeV} \) and \(|\eta ^{\ell } | < 2.1\); a generator-level \({\text {c}}\) quark with \(p_{\textrm{T}} ^{{\text {c}}} > 25\,\text {GeV} \) and \(|\eta ^{{\text {c}}} | < 2.5\), and separated from the lepton by a distance \(\varDelta R({\text {c}},\ell )>0.5\). A prediction for the \(R_{{\text {c}}}^{\pm }\) ratio is similarly derived. The MadGraph prediction for the cross section is \(\sigma (\text {W}+{\text {c}})= 110.9 \pm 0.2 \,\text {(stat)} \,\,\hbox {pb}\), and, for the cross section ratio, it is \(R_{{\text {c}}}^{\pm }\) = \(0.969 \pm 0.004 \,\text {(stat)} \). They are in agreement with the measured values within uncertainties.
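
The level of agreement can be quantified with a simple pull, using only the numbers quoted above. The sketch assumes that all uncertainties are uncorrelated and Gaussian, and it includes only the statistical uncertainty of the MadGraph prediction, since no other uncertainty is quoted for it; the function name is hypothetical.

```python
import math


def pull(measured, measured_errors, predicted, predicted_errors):
    """(measured - predicted) divided by the quadrature sum of all uncertainties."""
    total = math.sqrt(sum(e * e for e in (*measured_errors, *predicted_errors)))
    return (measured - predicted) / total


# Combined measurement vs. MadGraph prediction, values taken from the text.
pull_sigma = pull(117.4, (0.6, 5.6), 110.9, (0.2,))        # cross section, pb
pull_ratio = pull(0.983, (0.010, 0.017), 0.969, (0.004,))  # R_c^{+-}
```

Under these assumptions the differences correspond to roughly 1.2 and 0.7 standard deviations for the cross section and the ratio, respectively.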

7 Differential \(\text {W}+{\text {c}}\) cross section and \(({\text {W}}^{+} +\bar{{\text {c}}})/({\text {W}}^{-} +{\text {c}})\) cross section ratio 

The \(\text {W}+{\text {c}}\) production cross section and \(R_{{\text {c}}}^{\pm }\) are measured differentially, as functions of \(|\eta ^{\ell } |\) and \(p_{\textrm{T}} ^{\ell }\). The binning of the differential distributions is chosen such that each bin is sufficiently populated to perform the measurement. Event migration between neighbouring bins caused by detector resolution effects is evaluated with the simulated signal sample and is negligible. The total sample is divided into subsamples according to the value of \(|\eta ^{\ell } |\) or \(p_{\textrm{T}} ^{\ell }\), and the cross section and cross section ratio are computed using Eq. (1). There is no significant dependence of the fraction of remaining background events, \(f_{\text {bkg}}\), after \(\text {OS--SS}\) subtraction on \(|\eta ^{\ell } |\), whereas it decreases by a factor of two over the studied \(p_{\textrm{T}} \) range.
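
A bin-by-bin application of Eq. (1) can be sketched as follows. The bin edges and all inputs are invented, the function name is hypothetical, and the division by the bin width is our assumption about how the per-bin cross section is converted into a differential one. Bin migration is ignored, consistent with the statement above that it is negligible.

```python
def differential_cross_section(bin_edges, yields, f_bkg, c_factors, lumi_pb):
    """Apply Eq. (1) in each bin and divide by the bin width.

    bin_edges has one more entry than the per-bin lists; the result is in
    pb per unit of the binned variable when lumi_pb is in pb^-1.
    """
    results = []
    for i, (low, high) in enumerate(zip(bin_edges[:-1], bin_edges[1:])):
        sigma_bin = yields[i] * (1.0 - f_bkg[i]) / (c_factors[i] * lumi_pb)
        results.append(sigma_bin / (high - low))
    return results


# Invented two-bin example in |eta|.
dsigma = differential_cross_section(
    bin_edges=[0.0, 0.5, 1.0],
    yields=[1000.0, 800.0],
    f_bkg=[0.2, 0.2],
    c_factors=[0.04, 0.04],
    lumi_pb=20000.0,
)
```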

The charm identification efficiency and its description in simulation vary with the \(p_{\textrm{T}} \) of the jet containing the \({\text {c}}\) quark. In \(\text {W}+{\text {c}}\) events, there is a correlation between the transverse momentum of the \({\text {c}}\) jet and that of the lepton from the W boson decay. Thus, for the determination of the differential cross sections as a function of \(p_{\textrm{T}} ^{\ell }\), we apply charm identification efficiency scale factors, dependent on jet \(p_{\textrm{T}} \), to the simulated samples. These jet \(p_{\textrm{T}} \)-dependent scale factors are determined using the same procedure described in Sect. 4.2.2 by dividing the SL sample into subsamples depending on the jet \(p_{\textrm{T}} \) and computing data-to-simulation scale factors for the efficiency of charm identification through the reconstruction of a secondary vertex for each of them. The values of the scale factors range from 0.9 to 1.0.

Systematic uncertainties in the differential \(\text {W}+{\text {c}}\) cross sections are in the range of 7–8% for the SL channels and 4–5% for the SV channels. The main sources of the systematic uncertainty are related to the charm hadron decay rates in simulation, the charm identification efficiencies, and the limited event count of the simulated samples. The largest uncertainty for the differential cross section as a function of the lepton \(p_{\textrm{T}} \) (4–5%) arises from the uncertainty in the charm identification efficiency scale factors. The systematic uncertainty for the differential cross section ratios is in the range of 2–3% for both channels, essentially coming from the limited event count of the simulated samples.

The \(\text {W}+{\text {c}}\) differential cross sections, obtained after the combination of the measurements in the four channels, as functions of \(|\eta ^{\ell } |\) and \(p_{\textrm{T}} ^{\ell }\) are presented in Tables 7 and 8. The combination of the differential \(R_{{\text {c}}}^{\pm }\) values is given in Table 9 as a function of \(|\eta ^{\ell } |\), and in Table 10 as a function of \(p_{\textrm{T}} ^{\ell }\). The Convino tool is used for the combination; systematic uncertainties are assumed to be fully correlated among bins of the differential distributions.

Table 7 Measured differential cross section as a function of \(|\eta ^{\ell } |\), \(\textrm{d}\sigma (\text {W}+{\text {c}})/\textrm{d}|\eta ^\ell | \) from the combination of all four channels
Table 8 Measured differential cross section as a function of \(p_{\textrm{T}} ^{\ell }\), \(\textrm{d}\sigma (\text {W}+{\text {c}})/\textrm{d}{p_{\textrm{T}} ^\ell } \) from the combination of all four channels
Table 9 Measured cross section ratio \(R_{{\text {c}}}^{\pm }\) as a function of \(|\eta ^{\ell } |\), from the combination of all four channels
Table 10 Measured cross section ratio \(R_{{\text {c}}}^{\pm }\) as a function of \(p_{\textrm{T}} ^{\ell }\), from the combination of all four channels

8 Comparison with theoretical predictions 

The measured total and differential cross sections and cross section ratios are compared in this section with the analytical calculations from the \(\textsc {mcfm} \) 8.2 program [33, 64]. The \(\text {W}+{\text {c}}\) process description is available in \(\textsc {mcfm} \) up to \({\mathcal {O}}(\alpha _\textrm{S} ^2)\) with a massive charm quark (\(m_{{\text {c}}}=1.5\,\text {GeV} \)). The \(\textsc {mcfm} \) predictions for this process do not include contributions from gluon splitting into a \({\text {c}}{}\bar{{\text {c}}} \) pair, but only contributions where the strange (or the down) quark couples to the W boson. The implementation of the \(\text {W}+{\text {c}}\) process follows the calculation for the similar single top quark \(\hbox {t} \) \(\text {W}\) process [65]. The parameters of the calculation are adjusted to match the experimental measurement: \(p_{\textrm{T}} ^{\ell }>30\,\text {GeV} \), \(|\eta ^{\ell } |<2.1\), \(p_{\textrm{T}} ^{{\text {c}}}>25\,\text {GeV} \), and \(|\eta ^{{\text {c}}} |<2.5\).

We compute predictions for the following NLO PDF sets: MMHT2014 [66], CT14 [67], NNPDF3.1 [68], and ABMP16 [69]. They include dimuon data from neutrino-nucleus deep inelastic scattering to provide information on the strange quark content of the proton. Both the factorization and the renormalization scales are set to the W boson mass, \(m_{\text {W}}\). To estimate the uncertainty from missing higher perturbative orders, cross section predictions are computed by varying independently the factorization and renormalization scales to twice and half their nominal values, with the constraint that the ratio of the two scales is never larger than 2. The envelope of the cross sections with these scale variations defines the theoretical scale uncertainty.
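
The scale variation prescription can be made concrete with a short sketch. We read the constraint as excluding the two points where the factorization and renormalization scales are varied in opposite directions, which leaves seven points including the nominal one; this reading, the function names, and the toy cross section values are assumptions made for the illustration and are not \(\textsc {mcfm} \) output.

```python
from itertools import product


def scale_points(factors=(0.5, 1.0, 2.0), max_ratio=2.0):
    """(k_F, k_R) multipliers of the nominal scales whose ratio, taken as
    larger over smaller, does not exceed max_ratio."""
    return [
        (k_f, k_r)
        for k_f, k_r in product(factors, repeat=2)
        if max(k_f, k_r) / min(k_f, k_r) <= max_ratio
    ]


def scale_envelope(xsec_at):
    """Return (upward, downward) deviations of the envelope from the nominal value."""
    nominal = xsec_at[(1.0, 1.0)]
    return max(xsec_at.values()) - nominal, nominal - min(xsec_at.values())


points = scale_points()

# Invented cross sections (pb) at each allowed scale point.
toy_xsec = {
    (0.5, 0.5): 101.0, (0.5, 1.0): 98.5, (1.0, 0.5): 102.5, (1.0, 1.0): 100.0,
    (1.0, 2.0): 95.0, (2.0, 1.0): 103.0, (2.0, 2.0): 98.0,
}
unc_up, unc_down = scale_envelope(toy_xsec)
```

The envelope is in general asymmetric, which is why the upward and downward deviations are returned separately.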

In the calculation, the strong coupling at the scale of the Z boson mass is set to \(\alpha _\textrm{S} (m_{\text {Z}}) = 0.118 (0.119)\) for the predictions with MMHT2014, CT14 and NNPDF3.1 (ABMP16). Uncertainties in the predicted cross sections associated with \(\alpha _\textrm{S} (m_{\text {Z}})\) are evaluated as half the difference between the cross sections predicted with \(\alpha _\textrm{S} (m_{\text {Z}})\) varied by \(\varDelta (\alpha _\textrm{S})=\pm 0.002\). For the ABMP16 PDF set, the uncertainties associated with the value of \(\alpha _\textrm{S} (m_{\text {Z}})\) are given together with the PDF uncertainties and are not quoted separately in the tables.
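Written out, the prescription for the \(\alpha _\textrm{S} \) uncertainty reads as follows, where \(\alpha _\textrm{S} ^{0}\) denotes the nominal value of \(\alpha _\textrm{S} (m_{\text {Z}})\) for the PDF set in question and \(\delta _{\alpha _\textrm{S}}\sigma \) the resulting uncertainty. Both symbols are introduced here only for illustration, and the magnitude is taken on the assumption that the uncertainty is quoted as symmetric:

$$\begin{aligned} \delta _{\alpha _\textrm{S}}\sigma = \frac{1}{2}\left| \sigma \left( \alpha _\textrm{S} ^{0}+0.002\right) -\sigma \left( \alpha _\textrm{S} ^{0}-0.002\right) \right| . \end{aligned}$$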

The theoretical predictions for the fiducial \(\text {W}+{\text {c}}\) cross section are summarized in Table 11, where the central value of each prediction is given, together with the uncertainty arising from the PDF variations within each set, the choice of scales, and \(\alpha _\textrm{S} \). The experimental result reported in this paper is also included in Table 11. The size of the PDF uncertainties depends on the different input data and methodology used by the various groups. In particular, they depend on the parameterization of the strange quark PDF and on the definition of the one standard deviation uncertainty band. The maximum difference between the central values of the various PDF predictions is \({\sim }8\%\). This difference is smaller than the total uncertainty in each of the individual predictions. The theoretical predictions are lower than the measured cross section but agree with it within the uncertainties, as depicted in Fig. 4 (left).

Theoretical predictions for \(\sigma ({\text {W}}^{+} +\bar{{\text {c}}})\) and \(\sigma ({\text {W}}^{-} +{\text {c}})\) are computed independently in the same phase space of the measurement under the same conditions previously explained. Expectations for \(R_{{\text {c}}}^{\pm }\) are derived from them and presented in Table 12. All theoretical uncertainties are significantly reduced in the cross section ratio prediction. The theoretical predictions of the cross section ratio agree with each other, with the largest difference reaching 4%. The experimental value is larger than the theoretical predictions, but it is within two or three standard deviations depending on the prediction. The predictions and the measurement are presented graphically in Fig. 4 (right). The ratio of cross sections is sensitive to the asymmetry in the strange quark–antiquark content in the proton, but also to the down quark and antiquark asymmetry from the Cabibbo-suppressed process \({\bar{\text {d}}} \text {g} \rightarrow {\text {W}}^{+} +\bar{{\text {c}}} \) (\(\text {d} \text {g} \rightarrow {\text {W}}^{-} +{\text {c}}\)). The \(\text {d} \)-\({\bar{\text {d}}} \) asymmetry is larger in absolute value than the difference between strange quarks and antiquarks. It is worth noting that the theoretical predictions with the CT14 PDF set assume no strangeness asymmetry.

Table 11 Theoretical predictions for \(\sigma (\text {W}+{\text {c}})\) from \(\textsc {mcfm} \) at NLO. The kinematic selection follows the fiducial phase space definition: \(p_{\textrm{T}} ^{\ell }>30\,\text {GeV} \), \(|\eta ^{\ell } |<2.1\), \(p_{\textrm{T}} ^{{\text {c}}}>25\,\text {GeV} \), \(|\eta ^{{\text {c}}} |<2.5\), and \(\varDelta R({\text {c}},\ell )>0.5\). For each PDF set, the central value of the prediction is given, together with the relative uncertainty as prescribed from the PDF set, and the uncertainties associated with the scale variations and with the value of \(\alpha _\textrm{S}\). The total uncertainty is given in the last column. The last row in the table gives the experimental results presented in this paper
Table 12 Theoretical predictions for \(R_{{\text {c}}}^{\pm }\) calculated with \(\textsc {mcfm} \) at NLO. The kinematic selection follows the experimental requirements: \(p_{\textrm{T}} ^{\ell }>30\,\text {GeV} \), \(|\eta ^{\ell } |<2.1\), \(p_{\textrm{T}} ^{{\text {c}}}>25\,\text {GeV} \), \(|\eta ^{{\text {c}}} |<2.5\), and \(\varDelta R({\text {c}},\ell )>0.5\). For each PDF set, the central value of the prediction is given, together with the relative uncertainty as prescribed from the PDF set, and the uncertainties associated with the scale variations and with the value of \(\alpha _\textrm{S} \). The total uncertainty is given in the last column. The last row in the table gives the experimental results presented in this paper
Fig. 4

Comparison of the theoretical predictions for \(\sigma (\text {W}+{\text {c}})\) (left) and \(\sigma ({\text {W}}^{+} +\bar{{\text {c}}})/\sigma ({\text {W}}^{-} +{\text {c}})\) (right) computed with \(\textsc {mcfm} \) and several sets of PDFs with the current experimental measurements

Predictions for the differential cross sections are obtained from analytical calculations with \(\textsc {mcfm} \), using the same binning as in the data analysis. The uncertainties from the scale variations reach 10% in some pseudorapidity bins and for some PDF sets. Scale uncertainties in the differential cross sections as a function of \(p_{\textrm{T}} ^{\ell }\) are larger than in those as a function of \(|\eta ^{\ell } |\).

The theoretical predictions are compared with the combination of the experimental measurements presented in Section 7. Figure 5 shows the measurements given in Tables 7 and 8, and predictions for the differential cross sections as functions of \(|\eta ^{\ell } |\) and \(p_{\textrm{T}} ^{\ell }\), respectively. Theoretical predictions from MadGraph using the PDF set MSTW2008NNLO are also shown. The shape of the differential distribution as a function of \(|\eta ^{\ell } |\) is well described by all theoretical predictions. Theoretical predictions are about 10% lower than the measured cross section in the low transverse momentum region, \(p_{\textrm{T}} ^{\ell }<50\,\text {GeV} \). Recent calculations [70] point to NNLO corrections between 5 and 10% that bring theoretical predictions closer to the measurements.

Fig. 5

Differential cross sections, \(\textrm{d}\sigma (\text {W}+{\text {c}})/\textrm{d}|\eta ^\ell | \) (upper) and \(\textrm{d}\sigma (\text {W}+{\text {c}})/\textrm{d}{p_{\textrm{T}} ^\ell } \) (lower). The data points are the combination of the results with the four different samples: SL and SV samples in \(\text {W}\rightarrow \text {e}\upnu \) and \(\text {W}\rightarrow {\upmu }{\upnu }\) events. Theoretical predictions at NLO computed with \(\textsc {mcfm} \) and four different NLO PDF sets are also shown. Symbols showing the theoretical expectations are slightly displaced in the horizontal axis for better visibility. The error bars in the \(\textsc {mcfm} \) predictions include PDF, \(\alpha _\textrm{S}\), and scale uncertainties. The inset in the lower plot, \(\textrm{d}\sigma (\text {W}+{\text {c}})/\textrm{d}{p_{\textrm{T}} ^\ell } \), zooms into the measurement-prediction comparison for the last bin, \(100<p_{\textrm{T}} ^{\ell }<200\,\text {GeV} \). Predictions from MadGraph using the PDF set MSTW2008NNLO are also presented

The predictions for the differential cross section ratio as functions of \(|\eta ^{\ell } |\) and \(p_{\textrm{T}} ^{\ell }\) are presented in Fig. 6, together with the cross section ratios given in Tables 9 and 10. Theoretical predictions from MadGraph are also shown. The measured cross section ratio, as a function of \(p_{\textrm{T}} ^{\ell }\), is larger than the predictions in the 35–60\(\,\text {GeV}\) range but compatible within uncertainties. According to Ref. [70], NNLO corrections for \(p_{\textrm{T}} ^{\ell }<60\,\text {GeV} \) are of the order of 5%, and are around 1% for \(p_{\textrm{T}} ^{\ell }>60\,\text {GeV} \). These corrections would improve the description of the measurements in the low \(p_{\textrm{T}} ^{\ell }\) region.

Fig. 6

Cross section ratio, \(R_{{\text {c}}}^{\pm }\), as functions of \(|\eta ^{\ell } |\) (upper) and \(p_{\textrm{T}} ^{\ell }\) (lower). The data points are the combination of the results from the SL and SV samples in \(\text {W}\rightarrow \text {e}\upnu \) and \(\text {W}\rightarrow {\upmu }{\upnu }\) events. Theoretical predictions at NLO computed with \(\textsc {mcfm} \) and four different NLO PDF sets are also shown. Symbols showing the theoretical expectations are slightly displaced in the horizontal axis for better visibility. The error bars in the \(\textsc {mcfm} \) predictions include PDF, \(\alpha _\textrm{S}\), and scale uncertainties. Predictions from MadGraph using the PDF set MSTW2008NNLO are also presented

9 Impact on the strange quark distribution determination

The associated \(\text {W}+{\text {c}}\) production at a centre-of-mass energy of 8\(\,\text {TeV}\) directly probes the strange quark distribution of the proton at the scale of \(m^2_{\text {W}}\), in the kinematic range of \(0.001< x <0.080\), where x is the fraction of the proton momentum taken by the struck parton in the infinite-momentum frame. The present combined measurement of the \(\text {W}+{\text {c}}\) production cross section, determined as a function of \(|\eta ^{\ell } |\) and for lepton \(p_{\textrm{T}} ^{\ell }>30\,\text {GeV} \), is used in a QCD analysis at NLO.

The combination of the HERA inclusive deep inelastic scattering (DIS) cross sections [71] and the available CMS measurements of the lepton charge asymmetry in W boson production at \(\sqrt{s}=7\) and 8\(\,\text {TeV}\) [72, 73] are used. The CMS measurements probe the valence quark distributions in the kinematic range \(10^{-3} \le x \le 10^{-1}\) and have indirect sensitivity to the strange quark distribution. The CMS measurements of \(\text {W}+{\text {c}}\) production at \(\sqrt{s}=7\) [5] and 13\(\,\text {TeV}\) [6] are also used in a joint QCD analysis to fully exploit the other measurements at CMS that are sensitive to the strange quark distribution. The measurements included in this analysis are the HERA combined reduced cross sections for charged and neutral currents as a function of \(Q^2\) and x for different centre-of-mass energies, the muon charge asymmetry as a function of the pseudorapidity of the muon, and the \(\text {W}+{\text {c}}\) differential cross section as a function of \(|\eta ^{\ell } |\).

The correlations of the experimental uncertainties for each individual data set are included. The systematic uncertainties in the semileptonic branching fraction are treated as correlated between the CMS measurements of \(\text {W}+{\text {c}}\) production at 7 and 8\(\,\text {TeV}\). The rest of the systematic uncertainties are treated as uncorrelated between the two data-taking periods. The measurements of \(\text {W}+{\text {c}}\) production at a centre-of-mass energy of 13\(\,\text {TeV}\) are treated as uncorrelated with those at 7 and 8\(\,\text {TeV}\) because of the different methods of charm tagging and the differences in reconstruction and event selection in these data sets.

The theoretical predictions for the muon charge asymmetry and for the \(\text {W}+{\text {c}}\) production are calculated at NLO using the mcfm 6.8 program [33, 64], which is interfaced with applgrid 1.4.56 [74]. The open-source QCD fit framework for PDF determination xFitter [75, 76], version 2.0.0, is used with the parton distributions evolved using the Dokshitzer–Gribov–Lipatov–Altarelli–Parisi equations [77,78,79,80,81,82] at NLO, as implemented in the qcdnum 17-00/06 program [83]. The Thorne–Roberts [22, 84] general mass variable flavour number scheme at NLO is used for the treatment of heavy quark contributions with heavy quark masses \(m_{\text {b}} = 4.5\,\text {GeV} \) and \(m_{{\text {c}}} = 1.5\,\text {GeV} \), which correspond to the values used in the signal MC simulation in the cross section measurements. The renormalization and factorization (\(\upmu _f\)) scales are set to Q, the four-momentum transfer, for the DIS data and to \(m_{\text {W}}\) for the muon charge asymmetry and the \(\text {W}+{\text {c}}\) process. The strong coupling is set to \(\alpha _\textrm{S} (m_{\text {Z}}) = 0.118\). The \(Q^2\) range of the HERA data is restricted to \(Q^2 \ge Q^2_{\min } = 3.5\,\text {GeV} ^2\) to ensure the applicability of perturbative QCD over the kinematic range of the fit. The procedure for the determination of the PDFs follows that of Ref. [6].

The PDFs of the proton, \(xf(x)\), are generically parameterized at the starting scale as

$$\begin{aligned} xf(x) = A x^{B} (1-x)^{C} (1 + D x + E x^2). \end{aligned}$$
(2)
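For readers who want to evaluate the generic form of Eq. (2) numerically, a minimal Python sketch follows. The function name `xf` and the parameter values in the usage line are illustrative placeholders, not fit results from this analysis.

```python
def xf(x, A, B, C, D=0.0, E=0.0):
    """Evaluate x*f(x) = A x^B (1-x)^C (1 + D x + E x^2) for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return A * x**B * (1.0 - x)**C * (1.0 + D * x + E * x * x)


# Illustrative parameter values only; these are not fit results.
value = xf(0.5, A=2.0, B=1.0, C=2.0, D=2.0, E=4.0)
```

Setting `D` and `E` to zero, their defaults here, recovers the simpler three-parameter forms in which those polynomial terms are absent.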

The parameterized PDFs are the gluon distribution, \(x\text {g} \); the valence quark distributions, \(x\text {u} _\textrm{v}\) and \(x\text {d} _\textrm{v}\); the \(\text {u} \)-type and \(\text {d} \)-type antiquark distributions, \(x{\bar{\text {u}}} \) and \(x{\bar{\text {d}}} \); and the strange quark (antiquark) distribution, \(x\text {s} \) (\(x\bar{\text {s}} \)). By default it is assumed that \(x\text {s} =x\bar{\text {s}} \).

The central parameterization at the initial scale of the QCD evolution, chosen as \(Q^2_{0} = 1.9\,\text {GeV} ^2\), is

$$\begin{aligned} x\text {g} (x)= & {} A_{\text {g}} x^{B_{\text {g}}} (1-x)^{C_{\text {g}}} , \end{aligned}$$
(3)
$$\begin{aligned} x\text {u} _\textrm{v}(x)= & {} A_{\text {u} _\textrm{v}} x^{B_{\text {u} _\textrm{v}}} (1-x)^{C_{\text {u} _\textrm{v}}}\left( 1+E_{\text {u} _\textrm{v}}x^2 \right) , \end{aligned}$$
(4)
$$\begin{aligned} x\text {d} _\textrm{v}(x)= & {} A_{\text {d} _\textrm{v}} x^{B_{\text {d} _\textrm{v}}} (1-x)^{C_{\text {d} _\textrm{v}}} , \end{aligned}$$
(5)
$$\begin{aligned} x{\bar{\text {u}}} (x)= & {} A_{{\bar{\text {u}}}} x^{B_{{\bar{\text {u}}}}} (1-x)^{C_{{\bar{\text {u}}}}}\left( 1+D_{{\bar{\text {u}}}}x\right) , \end{aligned}$$
(6)
$$\begin{aligned} x{\bar{\text {d}}} (x)= & {} A_{{\bar{\text {d}}}} x^{B_{{\bar{\text {d}}}}} (1-x)^{C_{{\bar{\text {d}}}}} , \end{aligned}$$
(7)
$$\begin{aligned} x\bar{\text {s}} (x)= & {} A_{\bar{\text {s}}} x^{B_{\bar{\text {s}}}} (1-x)^{C_{\bar{\text {s}}}}. \end{aligned}$$
(8)

The parameters \(A_{\text {u} _\textrm{v}}\) and \(A_{\text {d} _\textrm{v}}\) are determined using the quark counting rules and \(A_{\text {g}}\) using the momentum sum rule [85]. The normalization and slope parameters, A and B, of \({\bar{\text {u}}} \) and \({\bar{\text {d}}} \) are set equal such that \(x{\bar{\text {u}}} = x{\bar{\text {d}}} \) at very small x. The strange quark PDF \(x\bar{\text {s}} \) is parameterized as in Eq. (8), with \(B_{\bar{\text {s}}} = B_{{\bar{\text {d}}}}\), leaving two free strangeness parameters, \(A_{\bar{\text {s}}}\) and \(C_{\bar{\text {s}}}\). The optimal central parameterization was determined in a so-called parameterization scan following the HERAPDF procedure [71].
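How the sum rules fix a normalization can be made concrete with a short sketch. It relies on the standard statements that the proton contains two valence up quarks and one valence down quark, and that the momentum fractions of all partons add up to one. For the functional form of Eq. (2), every integral reduces to Euler beta functions. The helper names are hypothetical and the sketch is not taken from xFitter.

```python
import math


def beta(a, b):
    """Euler beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def moment(n, A, B, C, D=0.0, E=0.0):
    """Integral over 0 < x < 1 of x^n * f(x), with x*f(x) as in Eq. (2)."""
    # f(x) = A x^(B-1) (1-x)^C (1 + D x + E x^2); each term is a beta integral.
    return A * (
        beta(B + n, C + 1)
        + D * beta(B + n + 1, C + 1)
        + E * beta(B + n + 2, C + 1)
    )


def valence_normalization(n_quarks, B, C, D=0.0, E=0.0):
    """A such that the number integral (n = 0) equals n_quarks; needs B > 0."""
    return n_quarks / moment(0, 1.0, B, C, D, E)


def gluon_normalization(other_momentum, B_g, C_g):
    """A_g such that the gluon carries the momentum not carried by quarks."""
    return (1.0 - other_momentum) / moment(1, 1.0, B_g, C_g)
```

For example, `valence_normalization(2, B, C, E=E)` would give \(A_{\text {u} _\textrm{v}}\) for the form of Eq. (4), and `other_momentum` would be the sum of `moment(1, ...)` over all quark and antiquark distributions.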

For all measured data, the predicted and measured cross sections together with their corresponding uncertainties are used to build a global \(\chi ^2\), which is minimized to determine the initial PDF parameters [75, 76]. The quality of the overall fit can be judged based on the global \(\chi ^2\) divided by the number of degrees of freedom, \(n_{\textrm{dof}}\). For each data set included in the fit, a partial \(\chi ^2\) divided by the number of measurements (data points), \(n_{\textrm{dp}}\), is provided. The correlated part of \(\chi ^2\) reports on the influence of the correlated systematic uncertainties in the fit. The logarithmic penalty part of \(\chi ^2\) comes from a \(\chi ^2\) term used to minimize bias. The full form of the \(\chi ^2\) used in this analysis follows the HERAPDF2.0 analysis [71]. The global and partial \(\chi ^2\) values for each data set are listed in Table 13, illustrating a general agreement among all the data sets. The somewhat high \(\chi ^2\) values for the combined DIS data are very similar to those observed in Ref. [71], where they are investigated in detail. The same fit, using the four different analysis channels instead of the combined measurement for \(\text {W}+{\text {c}}\) at \(\sqrt{s}=8\,\text {TeV} \), gives very consistent results and comparable values of \(\chi ^2\) for all data sets included.
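The bookkeeping of partial and global \(\chi ^2\) values can be illustrated with a deliberately simplified sketch. It treats all uncertainties as uncorrelated and therefore omits the correlated and logarithmic penalty terms of the full HERAPDF2.0 form [71]. The data set names and numbers are hypothetical.

```python
def partial_chi2(measured, predicted, uncertainties):
    """Uncorrelated chi2 of one data set: the sum of squared pulls."""
    return sum(
        ((m - p) / u) ** 2
        for m, p, u in zip(measured, predicted, uncertainties)
    )


def fit_quality(datasets, n_free_parameters):
    """Return (global chi2, n_dof, {name: (partial chi2, n_dp)})."""
    partial = {
        name: (partial_chi2(*data), len(data[0]))
        for name, data in datasets.items()
    }
    chi2 = sum(c for c, _ in partial.values())
    n_dof = sum(n for _, n in partial.values()) - n_free_parameters
    return chi2, n_dof, partial


# Hypothetical inputs: (measured, predicted, uncertainty) per data set.
toy = {
    "set A": ([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [0.5, 1.0, 1.0]),
    "set B": ([10.0, 12.0], [9.0, 12.0], [1.0, 2.0]),
}
chi2, n_dof, partial = fit_quality(toy, n_free_parameters=1)
```

In this simplified picture the partial values add up exactly to the global one. In the full treatment the correlated and penalty contributions are reported separately, as described above.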

Table 13 The partial \(\chi ^2\) per number of data points, \(n_{\textrm{dp}}\), and the global \(\chi ^2\) per number of degrees of freedom, \(n_{\textrm{dof}}\), resulting from the PDF fit

The experimental PDF uncertainties are investigated according to the general approach of HERAPDF [71, 86]. A cross check was performed using the MC method [87, 88]. The parton distributions and their uncertainties obtained from both methods are consistent.
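The idea behind a replica-based cross check can be shown with a toy example. The sketch assumes the common formulation in which pseudo-data sets are generated by fluctuating each measurement within its uncertainty, each replica is fitted, and the mean and RMS of the fitted quantity give the central value and uncertainty. The procedure of Refs. [87, 88] may differ in detail, and the one-parameter weighted mean below merely stands in for a PDF fit.

```python
import random
import statistics


def weighted_mean(values, uncertainties):
    """One-parameter 'fit': inverse-variance weighted mean of the data."""
    weights = [1.0 / u**2 for u in uncertainties]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def replica_fit(values, uncertainties, n_replicas=2000, seed=12345):
    """Fit n_replicas pseudo-data sets; return (mean, rms) of the result."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_replicas):
        # Fluctuate every data point within its (Gaussian) uncertainty.
        replica = [rng.gauss(v, u) for v, u in zip(values, uncertainties)]
        results.append(weighted_mean(replica, uncertainties))
    return statistics.mean(results), statistics.pstdev(results)


# Hypothetical data: three measurements with unit uncertainty.
mean, rms = replica_fit([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

For this toy case the RMS should approach the analytic uncertainty of the weighted mean, \(1/\sqrt{3}\), which is the kind of consistency the cross check looks for.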

We show results for the strange quark distribution \(x\text {s} (x,\upmu _f^2)\) and the strangeness suppression factor \(R_{\text {s}}(x,\upmu _f^2) = (\text {s} +\bar{\text {s}})/({\bar{\text {u}}} +{\bar{\text {d}}})\). To investigate a possible impact of the assumptions on model input on the PDFs, alternative fits are performed, in which the heavy quark masses are set to \(m_{\text {b}} = 4.25\) and \(4.75\,\text {GeV} \), \(m_{{\text {c}}} = 1.45\) and \(1.55\,\text {GeV} \), and the value of \(Q^2_{\min }\) imposed on the HERA data is set to 2.5 and \(5.0\,\text {GeV} ^2\). These variations do not alter results on \(x\text {s} (x,\upmu _f^2)\) or \(R_{\text {s}}(x,\upmu _f^2)\) significantly, compared to the experimental PDF fit uncertainty.
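As an illustration of what the constraints on the central parameterization imply, the strangeness suppression factor can be evaluated at the starting scale \(Q^2_{0}\) directly from Eqs. (6)–(8). With \(x\text {s} =x\bar{\text {s}} \), \(A_{{\bar{\text {u}}}}=A_{{\bar{\text {d}}}}\), and \(B_{{\bar{\text {u}}}}=B_{{\bar{\text {d}}}}=B_{\bar{\text {s}}}\), the common power of x cancels:

$$\begin{aligned} R_{\text {s}}(x,Q^2_{0}) = \frac{2A_{\bar{\text {s}}}\,(1-x)^{C_{\bar{\text {s}}}}}{A_{{\bar{\text {d}}}}\left[ (1-x)^{C_{{\bar{\text {u}}}}}\left( 1+D_{{\bar{\text {u}}}}x\right) +(1-x)^{C_{{\bar{\text {d}}}}}\right] } \;\xrightarrow {x\rightarrow 0}\; \frac{A_{\bar{\text {s}}}}{A_{{\bar{\text {d}}}}}. \end{aligned}$$

This relation holds only at the starting scale. The distributions at \(m^2_{\text {W}}\) follow after the QCD evolution, which this sketch does not attempt to reproduce.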

The differences between the central fit and the fits corresponding to the variations of \(Q^2_{\min }\), \(m_{{\text {c}}}\), and \(m_{\text {b}}\) are added in quadrature, separately for positive and negative deviations, and represent the model uncertainty. The parameterization variations considered consist of adding extra D and E parameters in the polynomials of Eq. (2) and varying the starting scale: \(Q^2_{0}=1.6\) and \(2.2\,\text {GeV} ^2\). In addition, further variations of the low-x sea quark parameterization are allowed: the A and B parameters for \({\bar{\text {u}}} \) and \({\bar{\text {d}}} \) are allowed to differ. The strange quark distribution and strangeness suppression factor are consistent with the nominal fit. The parameterization uncertainty corresponds to the envelope of the fits described above. The additional release of the condition \(B_{\bar{\text {s}}}=B_{{\bar{\text {d}}}}\) in the fit results in a shape of the \(\text {s} \) quark PDF that could possibly violate the nonsinglet octet combination rules of QCD [89]. Therefore this fit is only used for the parameterization variation and not as a nominal fit. The total PDF uncertainty is obtained by adding in quadrature the experimental, model, and parameterization uncertainties.
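The three combination rules described above (quadrature sum by sign for the model variations, envelope for the parameterization variations, and quadrature sum for the total) can be sketched as follows. The function names are hypothetical, and the inputs stand for the value of a PDF, or of \(R_{\text {s}}\), at a fixed x and scale in the central fit and in each alternative fit.

```python
import math


def model_uncertainty(central, variations):
    """Quadrature sum of the deviations, separately for up and down shifts."""
    up = math.sqrt(sum((v - central) ** 2 for v in variations if v > central))
    down = math.sqrt(sum((central - v) ** 2 for v in variations if v < central))
    return up, down


def parameterization_uncertainty(central, variations):
    """Envelope: the largest deviation from the central fit in each direction."""
    up = max([v - central for v in variations] + [0.0])
    down = max([central - v for v in variations] + [0.0])
    return up, down


def total_uncertainty(*components):
    """Quadrature sum of (up, down) pairs, one pair per uncertainty source."""
    up = math.sqrt(sum(c[0] ** 2 for c in components))
    down = math.sqrt(sum(c[1] ** 2 for c in components))
    return up, down
```

A call such as `total_uncertainty(experimental, model, parameterization)`, with each argument an `(up, down)` pair, then yields the asymmetric total uncertainty at that point.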

To assess the impact of the \(\text {W}+{\text {c}}\) data collected at \(\sqrt{s}=8\,\text {TeV} \) on \(x\text {s} (x,\upmu _f^2)\) and \(R_{\text {s}}(x,\upmu _f^2)\), another QCD fit is performed, using the same parameterization described in Eqs. (3)–(8) but without these data. The central values of all parton distributions in those two fits are consistent within experimental uncertainties. The results of these two QCD fits for the \(\text {s} \) quark PDF and \(R_{\text {s}}\) at the scale of \(m^2_{\text {W}}\) are shown in Fig. 7. The relative total uncertainties are also compared in Fig. 7. The reduction of the uncertainties for these distributions with respect to those obtained without the new data is clearly visible.

Fig. 7

The strange quark distribution (upper left) and the strangeness suppression factor (upper right) as a function of x at the factorization scale of \(m^2_{\text {W}}\). The corresponding relative total uncertainties are compared in the lower plots (strange quark distribution, lower left, and strangeness suppression factor, lower right). The results from the QCD analysis, shown as a filled area, use as input the combination of the inclusive deep inelastic scattering (DIS) cross sections [71], the CMS measurements of the lepton charge asymmetry in W boson production at \(\sqrt{s}=7\) and 8\(\,\text {TeV}\) [72, 73], and the CMS measurements of \(\text {W}+{\text {c}}\) production at \(\sqrt{s}=7\) [5], 8 (this analysis) and 13\(\,\text {TeV}\) [6]. The \(\text {W}+{\text {c}}\) measurement at \(\sqrt{s}=8\,\text {TeV} \) is not used for the fit shown in hatched style

Fig. 8

The strange quark distribution (left) and the strangeness suppression factor (right) as a function of x at the factorization scale of \(m^2_{\text {W}}\). The results of the current analysis are shown together with those from the global NLO PDFs, ABMP16 and NNPDF3.1 in the upper plot, and CT18 and MSHT20 in the lower one. This QCD analysis uses as input the combination of the inclusive deep inelastic scattering (DIS) cross sections [71], the CMS measurements of the lepton charge asymmetry in W boson production at \(\sqrt{s}=7\) and 8\(\,\text {TeV}\) [72, 73], and the CMS measurements of \(\text {W}+{\text {c}}\) production at \(\sqrt{s}=7\) [5], 8 (this analysis) and 13\(\,\text {TeV}\) [6]

In Fig. 8, the distributions of \(x\text {s} (x,\upmu _f^2)\) and \(R_{\text {s}}(x,\upmu _f^2)\) at the scale of \(m^2_{\text {W}}\) obtained in this analysis are presented together with the results of other global PDFs: ABMP16 [69], NNPDF3.1 [68], CT18 [90], and MSHT20 [91]. These PDF sets have in common the use of the combined HERA data set, and also include neutrino charm production data and LHC W and Z boson measurements to provide information on the strange quark content of the proton. The overall agreement between the various results is good.

10 Summary

The associated production of a \(\text {W}\) boson with a charm quark (\(\text {W}+{\text {c}}\)) in proton–proton (\(\text {p}\text {p}\)) collisions at a centre-of-mass energy of 8\(\,\text {TeV}\) is studied with a data sample collected by the CMS experiment corresponding to an integrated luminosity of 19.7\(\,\text {fb}^{-1}\). The \(\text {W}+{\text {c}}\) process is selected based on the presence of a high transverse momentum lepton (electron or muon) coming from a \(\text {W}\) boson decay and a charm hadron decay. Charm hadron decays are identified either by the presence of a muon inside a jet or by reconstructing a secondary decay vertex within a jet. Inclusive and differential fiducial cross section measurements are performed with four different data samples (electron and muon \(\text {W}\) boson decay channels and reconstruction of semileptonic and inclusive decays of charm hadrons). Cross section measurements are unfolded to the parton level. The ratio of the cross sections of \({\text {W}}^{+} +\bar{{\text {c}}} \) and \({\text {W}}^{-} +{\text {c}}\) is also measured. The results from the four different channels are consistent and are combined.

The measured fiducial \(\text {W}+{\text {c}}\) production cross section and the \(({\text {W}}^{+} +\bar{{\text {c}}})/({\text {W}}^{-} +{\text {c}})\) cross section ratio are:

$$\begin{aligned}&\sigma (\text {p}\text {p}\rightarrow \text {W}+{\text {c}}+\text {X}) \, {\mathcal {B}}(\text {W}\rightarrow \ell \upnu ) \\&\quad =117.4 \pm 0.6 \,\text {(stat)} \pm 5.6 \,\text {(syst)} \,\, \hbox {pb},\\&\frac{\sigma (\text {p}\text {p}\rightarrow {\text {W}}^{+} +\bar{{\text {c}}} +\text {X})}{\sigma (\text {p}\text {p}\rightarrow {\text {W}}^{-} +{\text {c}}+\text {X})} \\&\quad =0.983 \pm 0.010\,\text {(stat)} \pm 0.017 \,\text {(syst)}. \end{aligned}$$

The measurements are compared with the predictions of the MadGraph MC simulation normalized to the NNLO cross section prediction of inclusive \(\text {W}\) production from fewz. They are consistent within uncertainties.

The measurements are also compared with analytical NLO calculations from the mcfm program using different NLO PDF sets. Fair agreement is seen in the differential cross section as a function of the absolute value of the pseudorapidity of the lepton from the \(\text {W}\) boson. Differences of \({\sim }10\%\) occur in the differential cross section as a function of the transverse momentum of the lepton in the 30–50\(\,\text {GeV}\) range.

The combined measurement of the \(\text {W}+{\text {c}}\) production cross section as a function of the absolute value of the pseudorapidity of the lepton from the W boson decay is used in a QCD analysis at NLO, together with inclusive deep inelastic scattering measurements from HERA and earlier results from CMS on \(\text {W}+{\text {c}}\) production and the lepton charge asymmetry in W boson production. The strange quark distribution \(x\text {s} (x,\upmu _f^2)\) and the strangeness suppression factor \(R_{\text {s}}(x,\upmu _f^2) = (\text {s} +\bar{\text {s}})/({\bar{\text {u}}} +{\bar{\text {d}}})\) are determined and agree with other NLO PDF sets such as ABMP16 [69], NNPDF3.1 [68], CT18 [90], and MSHT20 [91]. The inclusion of the present results further constrains the strange quark distribution and the strangeness suppression factor.