A search for the standard model Higgs boson decaying to charm quarks

A direct search for the standard model Higgs boson, H, produced in association with a vector boson, V (W or Z), and decaying to a charm quark pair is presented. The search uses a data set of proton-proton collisions corresponding to an integrated luminosity of 35.9 fb−1, collected by the CMS experiment at the LHC in 2016, at a centre-of-mass energy of 13 TeV. The search is carried out in mutually exclusive channels targeting specific decays of the vector bosons: W → ℓν, Z → ℓℓ, and Z → νν, where ℓ is an electron or a muon. To fully exploit the topology of the H boson decay, two strategies are followed. In the first one, targeting lower vector boson transverse momentum, the H boson candidate is reconstructed via two resolved jets arising from the two charm quarks from the H boson decay. A second strategy identifies the case where the two charm quark jets from the H boson decay merge to form a single jet, which generally only occurs when the vector boson has higher transverse momentum. Both strategies make use of novel methods for charm jet identification, while jet substructure techniques are also exploited to suppress the background in the merged-jet topology. The two analyses are combined to yield a 95% confidence level observed (expected) upper limit on the cross section σVHℬH→cc¯\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \sigma \left(\mathrm{VH}\right)\mathrm{\mathcal{B}}\left(\mathrm{H}\to \mathrm{c}\overline{\mathrm{c}}\right) $$\end{document} of 4.5 2.4−0.7+1.0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \left({2.4}_{-0.7}^{+1.0}\right) $$\end{document} pb, corresponding to 70 (37) times the standard model prediction.


Introduction
The discovery of a Higgs boson, H, with the CERN LHC data collected in 2010-2012 by both the ATLAS [1] and CMS [2,3] experiments in 2012 represented a major step toward the characterization of the electroweak symmetry breaking mechanism [4][5][6]. The mass of this particle is measured to be m H ∼ 125 GeV [7-9] and its decays in the γγ, ZZ, WW, and ττ modes have been observed [10-20]. All measured properties so far [7,8,[21][22][23][24][25][26][27][28][29] indicate that, within the measurement uncertainties, this new particle is consistent with the expectations of the standard model (SM). Nevertheless, there remains much to be learned about the properties of this new particle. One of the highest priorities of the LHC physics program is the measurement of the couplings of the H boson to other SM particles. Recently both ATLAS and CMS Collaborations reported the first direct measurements of the H boson couplings to third-generation quarks (t and b) [30][31][32][33] and found them to be also compatible with the SM prediction. A measurement of the couplings of the H boson to second generation leptons [34,35] and quarks is the next target.
In this paper, we focus on the search for H bosons decaying to cc, a charm quark-antiquark pair. The H boson to charm quark Yukawa coupling y c can be significantly modified by physics beyond the SM [36][37][38][39]. Allowed range for unobserved decays constitutes the bound on the charm quark Yukawa coupling. Indirect constraints on y c obtained from a global fit to existing H boson data result in an upper bound on κ c ≡ y c /y SM c of 6.2 [40] at 95% confidence level (CL), assuming the absence of non-SM production mechanisms. A direct measurement of this process is extremely challenging at a hadron collider. The branching fraction of this process according to SM computations, B(H → cc ) = 0.0288 +0.0016 −0.0006 [41], is a factor 20 smaller than that of H → bb, and there is a very large background from SM processes comprised uniquely of jets produced through the strong interaction, referred to as quantum chromodynamics (QCD) multijet events. Results from direct searches for H → cc at the LHC in the ZH (Z → , = e or µ) channel were previously reported by the ATLAS Collaboration using a data sample of proton-proton (pp) collisions at a centre-of-mass energy of 13 TeV, corresponding to an integrated luminosity of 36.1 fb −1 [42]. The observed (expected) exclusion limit on the signal strength µ (defined as the product of the measured H boson production cross section and the H → cc branching fraction divided by the same quantity as predicted by the SM) at 95% CL was found to be 110 (150). This paper presents the first direct search for the H → cc decay carried out by the CMS Collaboration. It uses pp collision data corresponding to an integrated luminosity of 35.9 fb −1 , collected with the CMS experiment at the LHC in 2016 at a centre-of-mass energy of 13 TeV. The search targets H bosons produced in association with a W or Z boson, which we collectively refer to as vector (V) bosons. The presence of a V boson greatly suppresses backgrounds stemming from otherwise overwhelming QCD multijet processes, and its leptonic decays provide a crucial handle to collect the events efficiently. The most significant remaining backgrounds arise from V+jets (processes that account for one or more jets recoiling against a vector boson), tt, and VH(H → bb ) processes. To fully explore the H → cc decay mode, the analysis is split into two separate searches involving different topologies: the "resolved-jet" topology, in which the H boson candidate is reconstructed from two well-separated and individually resolved charm quark jets, and the "merged-jet" topology, in which the hadronization products of the two charm quarks are reconstructed as a single jet. The former focuses on H boson candidates with lower transverse momentum, p T , while the latter performs better for H boson candidates with high p T . In practice, the two topologies can have significant overlap and so, for the final result, the two are made distinct by defining them in reference to whether the V boson in the event has p T (V) below or above a single threshold chosen to maximise the sensitivity to the VH(H → cc ) process.
The central feature of this search is the identification of charm quark jets. In both topologies, novel tools based upon advanced machine learning (ML) techniques are used for charm quark jet identification [43,44]. In addition, the merged-jet topology makes use of jet substructure information to further suppress the backgrounds.

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (η) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionisation chambers embedded in the steel flux-return yoke outside the solenoid.
In the barrel section of the ECAL, an energy resolution of about 1% is achieved for unconverted or late-converting photons that have energies in the range of tens of GeV. The remaining barrel photons have a resolution of about 1.3% up to |η| = 1, rising to about 2.5% at |η| = 1.4. In the endcaps, the resolution of unconverted or late-converting photons is about 2.5%, while other endcap photons have a resolution between 3 and 4% [45].
In the region |η| < 1.74, the HCAL cells have widths of 0.087 in η and 0.087 rad in azimuth (φ). In the η-φ plane, and for |η| < 1.48, the HCAL cells map on to 5×5 arrays of ECAL crystals to form calorimeter towers projecting radially outwards from close to the nominal interaction point. For |η| > 1. 74, the coverage of the towers increases progressively to a maximum of 0.174 in ∆η and ∆φ. Within each tower, the energy deposits in ECAL and HCAL cells are summed to define the calorimeter tower energies.
Muons are measured in the range |η| < 2.4, with detection planes made using three technologies: drift tubes, cathode strip chambers, and resistive-plate chambers. The efficiency to reconstruct and identify muons is greater than 96%. Matching muons to tracks measured in the silicon tracker results in a relative p T resolution, for muons with p T up to 100 GeV, of 1% in the barrel and 3% in the endcaps. The p T resolution in the barrel is better than 7% for muons with p T up to 1 TeV [46].
Events of interest are selected using a two-tiered trigger system [47]. The first level, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a fixed time interval of less than 4 µs. The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimised for fast processing, and reduces the event rate to around 1 kHz before data storage.
A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [48].
The V+jets events are generated with MADGRAPH5 aMC@NLO v2.4.2 [62] at NLO with up to two additional partons, and at LO accuracy with up to four additional partons. The production cross sections for the V+jets samples are scaled to the NNLO cross sections obtained using FEWZ 3.1 [63]. Events in both LO and NLO samples are reweighted to account for NLO EW corrections to p T (V), which reach up to 10% for p T (V) ≈ 400 GeV. In addition, a LO-to-NLO correction is applied to LO samples as a function of the separation in η between the two leading jets in the event [64]. The p T (V) spectrum in simulation after the aforementioned corrections is observed to be harder than in data, as expected due to missing higher-order EW and QCD contributions to the V+jets processes [65]. A residual reweighting of p T (V), that is obtained via a fit to the data-to-simulation ratio in the control regions (detailed in Section 5) of the W( ν)H(cc ) and Z( )H(cc) channels in the resolved analysis, is applied.
Diboson background events are generated with MADGRAPH5 aMC@NLO v2.4.2 [62] at NLO with up to two additional partons in the matrix element calculations. The same generator is used at LO accuracy to generate a sample of QCD multijet events. The tt [66] and single top production processes in the tW-and t-channels [67,68] are generated to NLO accuracy with POWHEG v2, while the s-channel [69] single top process is generated with MAD-GRAPH5 aMC@NLO v2.4.2. The production cross sections for the tt samples are scaled to the NNLO prediction with the next-to-next-to-leading-log result obtained from TOP++ v2.0 [70]. The tt samples are reweighted as a function of top quark p T to account for the known differences between data and simulation [71].
The parton distribution functions (PDF) used to produce all samples are the NNLO NNPDF3.1 set [72]. For parton showering and hadronisation, including the H → cc decay, the matrix element generators are interfaced with PYTHIA v8.230 [73] with the CUETP8M1 [74] underlying event tune. The matching of jets from matrix element calculations and those from parton shower is done with the FxFx [75] (MLM [76]) prescription for NLO (LO) samples. For all samples, simulated additional pp interactions in the same or adjacent bunch crossings (pileup) are added to the hard-scattering process. The events are then reweighted to match the pileup profile observed in the collected data.

Event reconstruction and selection
Events are reconstructed using the CMS particle-flow (PF) algorithm [77], which seeks to reconstruct and identify the individual particles in the event via an optimal combination of all information in the CMS detector. The reconstructed particles are identified as charged or neutral hadrons, electrons, muons, or photons, and constitute a list of PF candidate physics objects. At least one reconstructed vertex is required. In the case of multiple collision vertices from pileup interactions, the candidate vertex with the largest value of summed physics-object p 2 T is taken to be the primary pp interaction vertex. The physics objects are the jets, clustered using the jet finding algorithm [78,79] with the tracks assigned to candidate vertices as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the p T of those jets. Events affected by reconstruction failures, detector malfunctions, or noncollision backgrounds, are identified and rejected by dedicated filters [80].
Electrons are reconstructed by combining information from the tracker and energy deposits in the ECAL [81]. Muons are reconstructed by combining information from the tracker and the muon system [46]. Only tracks originating from the PV can be associated with the electrons or muons, and quality criteria [46,81] are further imposed that obtain more pure identification without substantial loss of efficiency. To suppress leptons stemming from b and c decays, while retaining leptons from V decays, isolation is required from jet activity within a cone of radius ∆R = √ (∆η) 2 + (∆φ) 2 = 0.3. The isolation is defined as the scalar p T sum of the PF candidates within the cone divided by the lepton p T . The upper threshold applied on the relative isolation is 0.06 for electrons and muons in the W( ν)H(cc ) channel and 0.15 and 0.25 for electrons and muons respectively in the Z( )H(cc) channel. Charged PF candidates not originating from the PV, as well as PF candidates identified as electrons or muons, are not considered in the sum [82]. The isolation of electrons and muons is also corrected for the estimated energy that is contributed to the isolation region by neutral particles originating from pileup. In the case of electrons, the latter is estimated by an effective jet area from the measured neutral energy density [81], while for muons, the ∆β-correction method [46] is applied.
Jets are reconstructed by clustering the PF candidates with the anti-k T algorithm [78,79] using a distance parameter R. The jet momentum is determined as the vectorial sum of all PF candidate momenta in the jet, and is found in simulation to be within 5 to 10% of the true momentum over the full detector acceptance and range of p T considered in this analysis. The raw jet energies are then corrected to establish a relatively uniform response of the calorimeter in η and a calibrated absolute response in p T . Additional corrections to account for any residual differences between the jet energy scale in data and simulation are extracted and applied based on comparison of data and simulated samples in relevant control regions [83]. The jet energy resolution typically amounts to 15-20% at 30 GeV, about 10% at 100 GeV, and 5% at 1 TeV [83]. Corrections extracted from data control regions are applied to account for the difference between the jet energy resolution in data and simulation. Additional selection criteria are applied to each jet to remove those that are potentially dominated by instrumental or reconstruction failures [84].
Two collections of jets reconstructed with the anti-k T algorithm are used in the search. The first consists of jets clustered with R = 0.4, and will be referred to as "small-R jets". The charged hadron subtraction algorithm [85] is used to eliminate PF candidates from the jet constituents associated with vertices from pileup interactions. The neutral component of the energy arising from pileup interactions is estimated with the effective area method [84]. The small-R jets are required to have p T > 20 GeV and to be within the tracker acceptance, |η| < 2.4. Any small-R jets that overlap with preselected electrons and muons, as defined by ∆R(j, ) < 0.4, are discarded.
The second jet collection is based on jets reconstructed using R = 1.5. This collection will be referred to as "large-R jets" in what follows. In this case, the PUPPI algorithm [86] is used to correct the jet energy for contributions coming from pileup. Additional information on jet substructure is obtained by reclustering the constituents of these jets via the Cambridge-Aachen algorithm [87]. The "modified mass drop tagger" algorithm [88,89], also known as the "soft-drop" (SD) algorithm, with angular exponent β = 0, soft cutoff threshold z cut = 0.1, and characteristic radius R 0 = 1.5 [90], is applied to remove soft, wide-angle radiation from the jet. In the default configuration, the SD algorithm identifies two hard subjets within the large-R jet by reversing the Cambridge-Aachen clustering history. The kinematics of the two subjets is used to calculate the 4-momentum of the large-R jet. The large-R jets are required to have |η| < 2.4 and a soft drop mass of 50 < m SD < 200 GeV. Large-R jets that overlap with preselected electrons and muons, as defined by ∆R(j, ) < 1.5, are discarded.
One of the most challenging tasks of this analysis is the discrimination of jets that are the result of the hadronisation of c quarks from all other jet flavours. Tagging c jets is more difficult than tagging b jets because they are less distinct from light-flavour quark or gluon jets (udsg) in regard to mass, decay length of charmed hadrons produced in the hadronisation process, and multiplicity of tracks inside the jet. The resolved-and merged-jet topology analyses use different strategies for tagging c jets. More details on c tagging are presented below in Sections 5 and 6.
The missing transverse momentum vector p miss T is computed as the negative vector p T sum of all the PF candidates in an event, and its magnitude is denoted p miss T [80]. The magnitude and direction of p miss T are modified to account for corrections to the energy scale of the reconstructed jets in the event.

Baseline selection
The search uses the leptonic decays of the vector bosons to define three mutually exclusive channels based on the charged-lepton multiplicity in the final state, namely: "0L" channel as referring to the Z(νν)H(cc ) signal process, "1L" channel as referring to the W( ν)H(cc ) signal process, and "2L" channel as referring to the Z( )H(cc) signal process. The 1L and 2L channels are further subdivided based on lepton flavour. Only electrons and muons are considered in this search.
Events in the 0L channel are collected with a p miss T trigger, while events in the 1L channel are obtained with a trigger requiring the presence of an isolated electron or muon with p T above 27 and 24 GeV, respectively. Events in the 2L channel of the resolved-jet topology analysis are selected by triggers that require the presence of a pair of leptons with p T larger than 23 and 12 GeV for electrons, and 17 and 8 GeV for muons. The same dielectron trigger has been used in the 2L Z(ee) channel of the merged-jet topology analysis, while events in the Z(µµ) channel are selected by the above single-muon trigger, which provides high efficiency for muons produced in the decays of high-p T bosons.
The collected events are required to pass additional offline criteria. In the 0L channel corresponding to Z boson decays to neutrinos, p miss T > 170 GeV is required and events with identified isolated leptons are rejected. The p miss T is taken to correspond to p T (V) in this case. Events with a single electron (muon) with p T > 30 (25) GeV pass the 1L selection. The leptonically decaying W boson is approximately reconstructed as the vectorial sum of the lepton momentum and p miss T . The event topology is required to be compatible with the leptonic decay of a Lorentzboosted W boson by requiring ∆φ(p miss T , ) < 2.0 (1.5) in the resolved-jet (merged-jet) topology analysis. Finally, for the 2L selection, the two highest p T leptons are required to be of the same flavour, opposite electric charge, and to have a p T above 20 GeV. The Z boson candidates are then reconstructed as the sum of the four-momenta of these two leptons, and the invariant mass of the candidates is required to be compatible with the Z boson mass (75 < m < 105 GeV).
A typical VH(H → cc ) event has the signature of a vector boson recoiling against a H boson with little additional activity. The event selection is designed to retain such events while suppressing background processes as much as possible. In addition to the requirement of a high-p T vector boson, the QCD multijet background is reduced to negligible levels by demanding the p miss T to not be aligned with any jet in the event and requiring the azimuthal angular separation ∆φ( p miss T trk , p miss T ) < 0.5 for which p miss T trk is calculated solely from charged particles. This latter selection reduces the contribution of QCD multijet events that arise from the presence of "fake" p miss T coming from jet energy mismeasurement in the calorimeters. A significant fraction of the tt background is suppressed by rejecting events with N aj small-R > 1 in the 0L and 1L channels, and N aj small-R > 2 in the 2L channel of the merged-jet topology analysis, where N aj small-R represents the additional small-R jet multiplicity. This requirement is not needed in the 2L channel of the resolved-jet topology analysis where the top quark background is negligible.
The dominant background that remains after the application of the event selection described above is V+jets. Contribution from this background is suppressed by requiring the dijet invariant mass (m jj ) to satisfy 50 < m jj < 200 GeV. Contributions from tt and single top processes remain significant in the 0L and 1L channels because of the presence of at least one W boson and because b quarks are often misidentified as c quarks by the c tagging algorithms. Contributions from diboson processes are typically small as a result of their small production cross sections. The background originating from H bosons decaying into b quarks presents kinematic properties similar to those of the signal, with the exception of a higher average energy of neutrinos in b jets than in c jets. This background is reduced by exploiting dedicated jet flavour taggers, as described in Sections 5 and 6.
The details of the resolved-jet topology and merged-jet topology analyses are described in Sections 5 and 6, respectively. Section 7 is dedicated to the treatment of the systematics uncertainties and Section 8 presents the results of the two analyses and of their combination. Section 8 presents also the strategy that is used to make the two analyses mutually exclusive in order to facilitate their combination for the final results.

Resolved-jet topology analysis
Approximately 95% of the VH events produced at √ s = 13 TeV have a vector boson with p T lower than 200 GeV, corresponding to the phase space region where the H boson decay products generally give rise to two distinctly reconstructed small-R jets in the CMS detector. The resolved-jet analysis aims to exploit a large fraction of this phase space which, however, contains a sizeable background contamination. The requirement of a moderate boost of the vector boson is then found to be crucial to the reduction of V+jets and tt backgrounds. Dedicated charm taggers based on ML are used to order the jets in the event by their likelihood to be c jets that are considered for use in reconstructing the H boson candidate.
Backgrounds arise from the production of W and Z bosons in association with one or more jets, single and pair-produced top quarks, and diboson events. A small residual QCD background is present in the 0L and 1L channels. High-purity control regions for the V +udsg and tt backgrounds are identified in data and used to estimate expected yields of these backgrounds in the signal region. Samples of events in regions that are disjoint from the signal region in c tagging probability and dijet mass but which are enhanced in V+bb and V+cc production are used to provide data-driven constraints on the V+bb and V+cc backgrounds, respectively. Finally, a binned maximum likelihood fit is carried out simultaneously in the signal region and in the control regions for all channels.

Higgs boson reconstruction
The H candidate is reconstructed as two distinct small-R jets. The identification of c jets among those arising from other flavours of quarks or gluons is achieved with the Deep Combined Secondary Vertex (DeepCSV) algorithm [43]. This algorithm encodes a multiclassifier based on advanced ML techniques and provides three output weights p(b), p(c), and p(udsg)  Figure 1: Efficiency to tag a c jet as a function of the b jet and light-flavour quark or gluon jet mistag rates. The working point adopted in the resolved-jet topology analysis to select the leading CvsL jets is shown with a white cross. The white lines correspond to c jet isoefficiency curves. The plot makes use of jets with p T > 20 GeV that have been clustered with AK4 algorithm in a simulated tt+jets sample before application of data-to-simulation reshaping scale factors.
which can be interpreted as the probabilities for a given jet to have originated from a bottom quark, a charm quark, or a gluon or light-flavour quark, respectively. By combining the various DeepCSV outputs, it is possible to define two discriminators for c tagging. The inputs to the DeepCSV algorithm are variables constructed from observables associated with the reconstructed primary and secondary vertices, tracks, and jets. The discrimination between c jets and light-flavour quark or gluon jets is achieved via the probability ratio defined as CvsL = p(c)/[p(c) + p(udsg)]. In the same way, discrimination between c jets and b jets makes use of the probability ratio defined as The two discriminator ratio values for each jet define a two-dimensional distribution. The resulting c tagging efficiency as a function of the b jet and light-flavour quark or gluon jet efficiencies is shown in Fig. 1. To account for residual O(10%) differences in the distributions of CvsL and CvsB found in the comparison of data and simulation, reshaping scale factors have been extracted using an iterative fit to distributions in control regions enriched in Drell-Yan+jets, semileptonic tt+jets, and W+c events that provide data samples with large fractions of light-flavour quark or gluon jets, b jets, and c jets, respectively.
The probability ratios CvsL and CvsB are used to discriminate candidates that are consistent with the c jet hypothesis from jets originating from light-flavour quarks or gluons, and b quarks, respectively. The two jets with the highest score of CvsL in the event are chosen to build the H candidate four-vector. Events are required to have the leading jet in CvsL score passing the c tagger working point requirements (CvsL > 0.4, CvsB > 0.2). This working point has been chosen such that the efficiency to identify a c jet is approximately 28%, while the misidentification rate is 4% for light-flavour quark or gluon jets and 15% for b jets. To ac-count for jets originating from final-state radiation (FSR), additional jets with p T > 30 GeV and |η| < 3.0 are included in the calculation of the components of the H candidate four-vector if they lie in a cone of ∆R < 0.8 centred on the direction of one of the two leading jets.

Signal extraction
In addition to the selections reported in Sections 4.1 and 5.1, in the 1L and 0L channels of the resolved-jet topology analysis, where larger backgrounds are expected, the V candidates are required to have p T of at least 100 and 170 GeV respectively, while, for the same channels, the H candidates are required to have p T of at least 100 and 120 GeV. In the 2L channel, where the background from tt production is much smaller and the effective signal cross section is also lower, two regions are considered: a low-p T (V) region defined by 50 < p T (V) < 150 GeV and a high-p T (V) region with p T (V) > 150 GeV (no upper cut is applied on p T (V)). In VH(H → cc ) signal events, the vector boson is typically produced in the azimuthal direction opposite to that of the H boson. Therefore, an additional requirement on the difference in azimuthal angle between the reconstructed V and H candidate, ∆φ(V, H) >2.5 (>2.0 in the 0L channel), is applied.
In the signal regions defined by the application of the selection criteria mentioned above, a boosted decision tree (BDT) with gradient boost [91] has been trained to enhance the signal separation from background. Separate BDTs have been trained for 0L, 1L, and 2L (low-p T (V) and high-p T (V)) channels. The distributions of all variables used to construct the BDT discriminator and hence the BDT distribution itself are taken from simulation after the application of the corrections detailed in Section 3. Table 1 lists the input variables considered in each channel. As expected, the most discriminating variables are found to be the H candidate invariant mass and the CvsL max .
The remaining background contribution is estimated from a combination of simulated events and data. While the normalisations of QCD, single-top, diboson, and VH(H → bb ) processes are estimated via simulation, the normalisations of the V+jets and tt+jets backgrounds are determined from fits to data in dedicated control regions in order to avoid potential mismodelling of the flavour composition of these samples. Four control regions per channel are designed to constrain the most important background processes: a region dominated by tt+jets events (TT), a region targeting V+jets with at least one jet originating from light-flavour quarks or gluons (LF), a region enriched in V+jets events with one b jet and one b or c jet (HF), and a region enriched with V+ccevents (CC). The definitions of the different control regions are based mainly on the inversion of the criteria on the charm tagger discriminators values of the CvsL-leading jet applied to define the signal regions. To define the LF control region the selection (CvsL < 0.4, CvsB > 0.2) is used while both the HF and TT control regions are defined applying the selection (CvsL > 0.4, CvsB < 0.2). In order to differentiate the TT from HF control regions, further requirements are applied such as a veto on the reconstructed Z boson mass in the 2L channel, m / ∈ [75, 120] GeV, and the requirement N aj small-R ≥ 2. The CC control region is defined identically to the signal region, except for inverting the requirement on the H mass. The simulated V+jets backgrounds are similarly split into four classes depending on the flavour(s) of the additional jet(s) present in the processes: V+2 light-flavour quark or gluon jets, V +udsg and 1b or 1c, V+bb or bc, and V+cc jets.
Separate normalisation scale factors are used to constrain Z+jets processes in the 0L and 2L channels, while the normalisation scale factors related to W+jets processes are shared between the 0L and 1L analysis channels. To constrain the tt+jets process, on the other hand, each channel relies on its own independent normalisation scale factors. The normalisation scale Table 1: Variables employed in the training of the BDT used for each channel of the resolved-jet topology analysis. The 2L case has separate training for the low-and high-p T (V) channels, but exploits the same input variables. ∆R between leading and subleading CvsL jets -∆φ(j 1 , j 2 ) azimuthal angle between leading and subleading CvsL jets -∆η(j 1 , j 2 ) difference in pseudorapidity between leading and subleading CvsL jets ∆φ( 1 , 2 ) azimuthal angle between leading and subleading p T leptons --∆η( 1 , 2 ) difference in pseudorapidity between leading and subleading p T leptons --∆φ( 1 , j 1 ) azimuthal angle between leading p T lepton and leading CvsL jet --∆φ( 2 , j 1 ) azimuthal angle between subleading p T lepton and leading CvsL jet --∆φ( 2 , j 2 ) azimuthal angle between subleading p T lepton and subleading CvsL jet --∆φ( 1 , p miss T ) azimuthal angle between leading p T lepton and missing transverse momentum --N aj small-R number of small-R jets minus the number of FSR jets N so f t 5 multiplicity of soft track-based jets with p T > 5 GeV factors are measured together with the signal strength modifier through a simultaneous fit to data in all control and signal regions for all of the analysis channels. The simulated diboson background is split according to the presence or absence of a Z boson decaying to a pair of charm quarks, labelling them as VZ(Z → cc ) if such a Z boson decay is present and VV+other otherwise. Whereas in the signal regions the BDT discriminator is used for the final signal extraction, in the control regions the shape of the CvsB distribution is used in the TT, HF, and CC regions, while that of CvsL is used in the LF region. Figure 2 shows the most representative distributions of the CvsB discriminant for the subleading CvsL jet for the HF and CC control regions in the 2L (Z(µµ)) low-p T (V), 2L (Z(ee)) highp T (V), 1L (W(µν)), and 0L channels. The post-fit distributions (for fit details, see Section 8) in Fig. 2 show good agreement between the data and the simulation in these two most significant control regions. Moreover, the employment of the full distribution of the CvsB score provides a good separation between the V+bb and V+cc processes that makes it possible to constrain these two backgrounds. The corresponding distributions for the other channels are not shown but are similar in their behaviour.

Merged-jet topology analysis
For the case of a Lorentz-boosted H boson as flagged by a V boson with p T (V) 200 GeV, a merged-jet topology is considered. The dominant backgrounds after the baseline selection presented in Section 4.1 come from V+jets and tt processes. The V bosons in the signal process have on average larger p T than those from the V+jets background. The analysis focuses on the reconstruction of moderately to highly Lorentz-boosted H bosons where the decay products are contained in a single large-R jet. Dedicated object reconstruction tools based on large-R jets and advanced ML techniques were developed to identify and reconstruct Lorentz-boosted H bosons decaying to charm quarks.

Higgs boson reconstruction
The cornerstone of the merged-jet topology analysis is the reconstruction of the H → cc candidate in a single, large-R jet, which has the potential to provide a better signal purity because the signal has a tendency to be more boosted than the dominant V+jets and tt backgrounds, as noted above. In view of this, the high-p T regime with p T (V) 200 GeV, though representing no more than approximately 5% of the total phase space, can provide a significant contribution to the search. Moreover, the merged-jet approach has important advantages over the resolved-jet approach at high p T . The possibility for both c quarks to reside in a single large-R jet enhances the signal acceptance, improves the identification of the correct pair of jets to use in reconstructing the H boson, and similarly facilitates the task of taking into account any FSR that may have been emitted by the quarks. A more detailed discussion of the potential advantages of this approach can be found in Refs. [89,92]. Given the small fraction of signal events that survive a selection with p T (V) 200 GeV, it is critical to carefully choose the R parameter of the jet clustering algorithm. In general, the angular separation between the decay products of a Lorentz-boosted particle such as the H boson is approximately given by ∆R ∼ 2m H /p T (H). For a p T (H) ∼ p T (V) of 200 GeV, this gives ∆R ≈ 1.25. For good signal purity and acceptance, we have thus chosen to use large-R jets clustered by the anti-k T algorithm with a distance parameter of R = 1.5.
As for the resolved-jet topology analysis, one of the biggest challenges is the efficient reconstruction of the pair of c quarks from the H boson decay, while also achieving significant rejection of both light-flavour quarks and gluons, as well as b quarks that contribute backgrounds  to this search. To this end, a novel algorithm, DeepAK15 [44], has been used for the identification of jet substructure to tag W, Z, and H bosons, as well as top quarks. In addition, DeepAK15 is designed to discriminate between decay modes with different flavour content (e.g. H → bb, H → cc, H → qqqq) [44]. The algorithm deploys ML methods on the PF candidates and secondary vertices, which are used as inputs. DeepAK15 is designed to exploit information related to jet substructure, flavour, and pileup simultaneously, yielding substantial gains with respect to other approaches [44]. With the aid of a generative adversarial neural network [44,93], the algorithm is largely decorrelated from the jet mass, while preserving most of the method's discriminating power.
The performance in terms of the receiver operating characteristic (ROC) curve of the cc discriminant for identifying a pair of c quarks from H boson decay versus quarks from the V+jets process for large-R jets with p T > 200 GeV is shown in Fig. 3 (left). Three working points (WPs) are defined on the cc tagging discriminant distribution with approximately 1, 2.5, and 5% misidentification rates, and the corresponding efficiencies for identifying a cc pair are approximately 23, 35, and 46%. Another important parameter is the misidentification of b jets as signal c jets. The corresponding ROC curve is displayed in Fig. 3 (right). For the three WPs defined above, the corresponding b jet misidentification rates are approximately 9, 17, and 27%. To improve the sensitivity of the analysis, three mutually exclusive cc-enriched categories, the "low-purity" (LP), "medium-purity" (MP), and "high-purity" (HP) categories, are defined based on the three WPs. The merged jet algorithm is calibrated using data and MC simulated samples, and p T -dependent simulation-to-data scale factors, typically ranging from 0.85 to 1.30, are extracted and applied only to the VH(H → cc ) signal events for reasons that will be provided in Section 6.2.

Signal extraction
In the merged-jet topology analysis, events are required to have at least one large-R jet with p T > 200 GeV, with the highest p T large-R jet selected as the H boson candidate. In VH(H → cc ) signal events, the vector boson and the H boson are typically emitted back-to-back in φ. Therefore, the difference in azimuthal angle between the reconstructed vector boson and H candidate, ∆φ(V, H), is required to be at least 2.5 rad. To avoid double-counting, small-R jets are removed from the event if they overlap the H candidate with ∆R(small-R jet, H) < 1.5.
To further distinguish the V H signal process from the main backgrounds, a separate BDT is developed for each channel. The goal is to define a discriminant that improves the separation between VH signal and the main backgrounds, while remaining largely independent of the cc tagging discriminant and the H mass. The BDT only makes use of kinematic information from the event, without including intrinsic properties of H such as the flavour content and the mass of the large-R jet, which will be used in a fit to the data for the signal extraction. Table 2 summarises the kinematic variables used as input to the BDT for each of the three channels.  The BDT distributions of the three channels for events passing the above selection are shown in Fig. 4 (left) for the VH(H → cc ) signal and the background processes. The discrimination power of the BDT depends on the channel. An improved discrimination power is obtained in the 2L and 1L channels compared to the 0L channel. In particular, in the 1L channel, improvement is achieved thanks to the presence of the charged lepton and p miss T , which are then used for the training of the BDT to provide additional handles to suppress the background. For all channels, events with BDT values greater than 0.5 define the signal region. Figure 4 (right) shows the distributions of the cc discriminant in the three channels in the signal region for the V H signal and the background processes. Good separation is observed between signal and background. The performance of the cc discriminant degrades with the presence of b quarks, as is the case for tt events, for instance. The signal regions of the merged-jet topology analysis are finally defined requiring the large-R jet to pass one of the three working points of the cc discriminant mentioned above.
Dedicated control regions, each enriched in a specific background process, are defined to aid the background estimation in each channel. Two types of control regions are defined: the "low-BDT" control region consisting of events with BDT value <0.5, which is enriched in V+jets background, and the high-N aj small-R control region, defined by inverting the selection on the  number of small-R jets to yield a high-purity tt sample. The latter is not used for the 2-lepton channel since the tt contribution is small in this channel. In both control regions, events are required to satisfy the same cc tagging discriminant criteria as applied in the signal regions in order to probe events with a similar flavour composition. This allows the efficiency of the cc tagging discriminant to be estimated directly from the data without further corrections being required, as verified in studies with simulated events and events in data validation regions orthogonal to the control and signal regions.
The low-BDT and the high-N aj small-R control regions, together with the signal regions, are included in the maximum likelihood fit to correct for any difference between data and simulation in the production rate of the V+jets and tt processes in the phase space selected by this analysis. Parameters used to separately scale the overall normalisation of the W +jets, Z +jets, and tt background processes are allowed to float freely in the fit. These parameters scale the background rate in the same way in both the control and the signal regions. The parameters are defined separately for each channel, with the exception that the same scale factor is assumed for the W +jets process in the 0L and 1L channels. Any potential difference in the cc tagging efficiency between data and simulation is also taken into account in the measured simulationto-data scale factors.

Systematic uncertainties
Systematic effects can impact the shape of the BDT discriminant distribution for both the resolved-and the merged-jet topologies, as well as the H candidate mass in the merged-jet topology. The dominant uncertainties are associated with the normalisation of the background from the data control regions and the limited sizes of the simulated samples. Additional systematic effects are related to the jet energy scale and resolution, which are treated as correlated between the large-and small-R jets. Theoretical uncertainties related to the cross sections and the p T spectra of the signal and backgrounds are also considered. In the resolved-jet analysis the systematic uncertainties in PDFs, and in the renormalisation and factorisation scales are treated as uncorrelated among the four flavour classes considered in the V+jets processes, as described in Section 5. Lastly, the uncertainties in the c quark identification are also considered. The full list of systematic uncertainties is provided in Table 3.
In Table 4 the uncertainty sources are grouped into categories and their impact on the fitted signal strength resulting from the combination of the resolved-jet and merged-jet topology analyses is provided (see Section 8 for more details). The uncertainty breakdown shows that the search for the VH(H → cc ) process is mainly limited by the available statistics: the related uncertainty accounts for more than 85% of the total uncertainty in the fitted µ. The statistical uncertainties include contributions from the limited number of events in the available dataset and the background normalisations extracted from the control regions. The main sources of systematics uncertainties come from the charm tagging efficiencies and the modelling of the simulated physics processes, representing ∼ 28% and ∼ 25% of the total uncertainty, respectively. Also the uncertainties in the theory prediction play a considerable role, representing approximately 30% of the total uncertainty in µ.

Results
The signal extraction strategy is based on a binned likelihood fit to the data, with the signal and control regions fitted simultaneously. The upper limit (UL) on the signal strength µ for SM production of VH(H → cc ) is extracted at 95% CL based on a modified frequentist Table 3: Summary of the systematic uncertainties for each channel. Uncertainties in the lepton identification and trigger efficiencies are treated as a normalisation uncertainty in the resolvedjet topology analysis and as a shape uncertainty in the merged-jet topology analysis.

Source
Type 0-lepton 1-lepton 2-lepton Simulated sample event count shape c tagging efficiency shape Top quark p T reweighting shape p T (V) reweighting shape V H : p T (V) NLO [94,95] under the asymptotic approximation for the profile likelihood test statistic [96,97]. Both analyses are validated by measuring the products of the VZ production cross section and the branching fraction of Z to charm quark-antiquark pair, B (Z → cc ). The systematic uncertainties are incorporated in the fit as constrained parameters of the likelihood function. The cross section of the VH(H → bb ) process is set to its SM prediction for the H boson mass of 125 GeV.
The results obtained in the resolved-jet and merged-jet topologies analyses independently, i.e., exploiting larger regions of the full phase space prior to defining disjoint data samples for the combination of results, are described in Sections 8.1 and 8.2. As described in Sections 5 and 6, in the merged-jet topology analysis the phase-space considered is bounded from below by p T (V) > 200 GeV, while for the resolved-jet topology analysis the lower bound is set by the p T (V) thresholds of 50, 100, and 170 GeV in the 2L, 1L and 0L channels, respectively. Neither of the analyses has an upper limit on p T (V). The two analyses are then combined for the final result, presented in Section 8.3, after making them statistically independent via a selection on p T (V) to set an upper bound for the resolved-jet topology analysis that is also the lower bound on the merged-jet topology analysis.

Resolved-jet topology
In the resolved-jet topology analysis, the VH(H → cc ) signal is extracted via a binned likelihood fit to the BDT output distributions, that is carried out simultaneously with fits to the backgrounds in control regions. In the LF control regions the fits are for the CvsL min distributions, while in the TT, HF, and CC control regions they are for the CvsB min distributions, as detailed in Section 5.2.
The analysis is first validated by measuring the product of the VZ production cross section and B (Z → cc ) normalised to the SM prediction. A separate BDT is trained for each channel, using VZ(Z → cc ) as signal and VH(H → cc ) as contribution to background with cross section fixed to the SM prediction. The measured signal strength for the VZ(Z → cc ) process is µ VZ(Z→cc ) = 1.35 +0.94 −0.95 with an observed (expected) significance of 1.5 (1.2) standard deviations (σ), respectively. The results are consistent within uncertainties with the SM expectation.
A dedicated BDT is trained for each channel to distinguish the VH(H → cc ) signal from the backgrounds. Figure 5 displays the BDT distributions in all search channels after the fit to the data. In all plots, the value of each nuisance parameter has been fixed to its best fit value. In general, the BDT distributions in data agree well with the background predictions. The largest excess in the data occurs at large BDT values in the high-p T (V) 2L (Z(ee)) channel with an observed local signal significance of 2.1 σ.
The observed (expected) UL at 95% CL on µ for SM VH(H → cc ) production is 75 (38 +16 −11 ), and the measured signal strength is µ VH(H→cc ) = 41 +20 −20 . The uncertainties in the expected UL correspond to a variation of ±1 σ in the expected event yields under the background-only hypothesis. The results are consistent with the SM expectations within two standard deviations. This modest deviation is mostly due to the small excess mentioned above. The results for each channel and their combination are shown in Table 5. The most sensitive channel is 2L, whereas the 0L and 1L channels have similar sensitivity.

Merged-jet topology
In the merged-jet topology analysis, the VH(H → cc ) signal is extracted via a binned maximum likelihood fit to the soft-drop mass m SD of H, with the signal regions and the control Single regions from all three purity categories included in the fit simultaneously. In total, 15 bins are used in the fit for each region, with a bin width of 10 GeV corresponding roughly to the m SD resolution. The m SD distributions of the VH(H → cc ) and background processes in all three channels in the high-purity category are shown in Fig. 6. The background prediction is in good agreement with the observed data, within uncertainties.
Similar to the resolved-jet topology analysis, the full procedure of the merged-jet topology analysis is validated by measuring the product of the VZ production cross section and B (Z → cc ) normalised to the SM prediction. The event selection, including the kinematic BDT, the cc tagging discriminant criteria, and the signal extraction procedure, remain unchanged. In place of VH(H → cc ), the VZ(Z → cc ) process is considered to be the signal and VH(H → cc ) contributes to the background with cross section fixed to the SM prediction. The measured signal strength is µ VZ(Z→cc ) = 0.69 +0.89 −0.75 with an observed (expected) significance of 0.9 (1.3) σ. The results are consistent within uncertainties with the SM expectation.
The best fit value of µ for SM VH(H → cc ) production is µ VH(H→cc ) = 21 +26 −24 , and the observed (expected) UL on µ is found to be 71 (49 +24 −15 ) at 95% CL. The uncertainties in the expected UL correspond to a variation of ±1 σ in the expected event yields under the background-only hypothesis. The observed values are in agreement with the SM expectation. The results for each channel and their combination are shown in Table 5. All channels yield comparable sensitivity in the merged-jet topology analysis.

Combination
To further improve the sensitivity of the search, a single likelihood analysis has been carried out on the two sets of data selected in the merged-and resolved-jet topologies analyses. To this end, the two analyses are categorised based on p T (V). Events with values smaller than a certain value of p T (V) are used in the resolved-jet topology analysis, whereas the remaining events are used in the merged-jet topology analysis. The main theoretical and experimental systematic uncertainties are correlated between the two analyses, with the exception of those related to the charm tagger efficiency (merged-jet topology) and reshaping (resolved-jet topology), and those related to the V+jets PDFs and the renormalisation and factorisation scales because of the different treatment of the V+jets processes adopted for the two analyses. The two regions demarcated by p T (V) = 300 GeV provide the best combined sensitivity in terms of expected limits on VH(H → cc ).   −11 ) at 95% confidence level. The uncertainties in the expected UL correspond to a variation of ±1 σ in the expected event yields under the background-only hypothesis. The measured signal strength is µ VH(H→cc ) = 37 +17 −17 (stat) +11 −9 (syst). The observed values of the signal strength agrees within 2.1 σ with the SM expectation. The results in the individual channels also agree with the SM expectation. The modest disagreement between the expected and observed UL's is mainly due to the small excess observed in data in the Z(ee) high-p T (V) channel of the resolved-jet topology analysis. process, for the resolved-jet topology analysis for p T (V) < 300 GeV, the merged-jet topology analysis for p T (V) ≥ 300 GeV, and their combination.

Summary
In this paper, we present the first search by the CMS Collaboration for the standard model (SM) Higgs boson H decaying to a pair of charm quarks, produced in association with a vector boson V (W or Z). The search uses proton-proton collision data at a centre-of-mass energy of 13 TeV collected with the CMS detector in 2016 and corresponding to an integrated luminosity of 35.9 fb −1 . The search is carried out in five modes, Z(µµ)H, Z(ee)H, Z(νν)H, W(µν)H, and W(eν)H, with two complementary analyses targeting different regions of phase space. The signal is extracted by statistically combining the results of the two analyses. Each analysis is first validated by carrying out a search for Z boson decay to a cc pair and comparing the observed signal strength with the SM prediction. The Z boson signal strength for the combination of the two analyses is measured to be µ VZ(Z→cc ) = σ/σ SM = 0.55 +0. 86 −0.84 , with an observed (expected) significance of 0.7 (1.3) standard deviations.
The measured best fit value of σ (VH) B (H → cc ) for the combination of the two analyses is 2.40 +1.12 −1.11 (stat) +0.65 −0.61 (syst) pb, which corresponds to a best fit value of µ for SM VH(H → cc ) production of µ VH(H→cc ) = σ/σ SM = 37 +17 −17 (stat) +11 −9 (syst), compatible within two standard deviations with the SM prediction. The larger measured µ VH(H→cc ) value is due to a small excess observed in data in the resolved analysis, with a local significance of 2.1 standard deviations. The observed (expected) 95% CL upper limit on σ (VH) B (H → cc ) from the combination of the two analyses is 4.5 (2.4 +1.0 −0.7 ) pb. This limit can be translated into an observed (expected) upper limit on µ VH(H→cc ) of 70 (37 +16 −11 ) at 95% CL by using the theoretical values of the cross section and branching fraction for the SM H boson with the mass of 125 GeV. This result is the most stringent limit on σ (pp → H) B (H → cc ) to-date.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies:   [15] ATLAS Collaboration, "Study of (W/Z)H production and Higgs boson couplings using H → WW * decays with the ATLAS detector", JHEP 08 (2015) 137, doi:10.1007/JHEP08(2015)137, arXiv:1506.06641. [22] CMS Collaboration, "Combined measurements of Higgs boson couplings in proton-proton collisions at √ s = 13 TeV", Eur. Phys. J. C79 (2019) 421, doi:10.1140/epjc/s10052-019-6909-y, arXiv:1809.10733.
[28] ATLAS Collaboration, "Combined measurement of differential and total cross sections in the H → γγ and the H → ZZ * → 4 decay channels at √ s = 13 TeV with the ATLAS detector", Phys. Lett