Search for dark matter produced in association with a Higgs boson decaying to a pair of bottom quarks in proton–proton collisions at \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sqrt{s}=13\,\text {Te}\text {V} $$\end{document}s=13Te

A search for dark matter produced in association with a Higgs boson decaying to a pair of bottom quarks is performed in proton–proton collisions at a center-of-mass energy of 13\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\,\text {Te}\text {V}$$\end{document}Te collected with the CMS detector at the LHC. The analyzed data sample corresponds to an integrated luminosity of 35.9\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\,\text {fb}^{-1}$$\end{document}fb-1. The signal is characterized by a large missing transverse momentum recoiling against a bottom quark–antiquark system that has a large Lorentz boost. The number of events observed in the data is consistent with the standard model background prediction. Results are interpreted in terms of limits both on parameters of the type-2 two-Higgs doublet model extended by an additional light pseudoscalar boson \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm {a}$$\end{document}a (2HDM+\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm {a}$$\end{document}a) and on parameters of a baryonic \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm {Z}'$$\end{document}Z′ simplified model. The 2HDM+\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm {a}$$\end{document}a model is tested experimentally for the first time. For the baryonic \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\mathrm {Z}'$$\end{document}Z′ model, the presented results constitute the most stringent constraints to date.


Introduction
Astrophysical evidence for dark matter (DM) is one of the most compelling motivations for physics beyond the standard model (SM) [1][2][3]. Cosmological observations demonstrate that around 85% of the matter in the universe is comprised of DM [4] and they are largely consistent with the hypothesis that DM is composed primarily of weakly interacting massive particles. If nongravitational interactions exist between DM and SM particles, DM could be produced by colliding SM particles at high energy. Assuming the pair production of DM particles in hadron collisions occurs through a spin-0 or spin-1 bosonic mediator coupled to the initial-state particles, the DM particles leave the detector without measurable signatures. If DM particles are produced in association with a detectable SM particle, which could be emitted as initialstate radiation from the interacting constituents of the cole-mail: cms-publication-committee-chair@cern.ch liding protons, or through new effective couplings between DM and SM particles, their existence could be inferred via a large transverse momentum imbalance in the collision event.
The production of the SM Higgs boson [5][6][7] via initialstate radiation is highly suppressed because of the mass dependence of its coupling strength to fermions. Nonetheless, the associated production of a Higgs boson and DM particles can occur if the Higgs boson takes part in the interaction producing the DM particles [8][9][10]. Such a production mechanism would allow one to directly probe the structure of the effective DM-SM coupling.
In this paper, we present a search for DM production in association with a scalar Higgs boson, h, with a mass of 125 GeV that decays to a bottom quark-antiquark pair (bb). As the bb decay mode has the largest branching fraction of all Higgs boson decay modes allowed in the SM, it provides the largest signal yield. The search is performed using the data set collected by the CMS experiment [11] at the CERN Large Hadron Collider (LHC) at a center-of-mass energy of 13 TeV in 2016, corresponding to an integrated luminosity of approximately 35.9 fb −1 . Similar searches have been conducted at the LHC by both the ATLAS and CMS Collaborations, analyzing data collected at 8 [12] and 13 TeV [13,14]. Results are interpreted in terms of two simplified models predicting this signature. The first is a type-2 two-Higgs doublet model extended by an additional light pseudoscalar boson a (2HDM+a) [15]. The a boson mixes with the scalar and pseudoscalar partners of the observed Higgs boson, and decays to a pair of DM particles, χ χ. The second is a baryonic Z model [10], in which a "baryonic Higgs" boson mixes with the SM Higgs boson. In this model, a vector mediator Z is exchanged in the s-channel and, after the radiation of an SM Higgs boson, decays to two DM particles. Representative Feynman diagrams for the two models are presented in Fig. 1.
In the 2HDM+a model, the DM particle candidate χ is a fermion that can couple to SM particles only through a spin-0, pseudoscalar mediator. Since the couplings of the new spin-0 mediator to SM gauge bosons are strongly suppressed, the 2HDM+a model is consistent with measurements of the SM Higgs boson production and decay modes, which so far show no significant deviation from SM predictions [16]. In contrast to previously explored two-Higgs doublet models [9,12,13,17], the 2HDM+a framework ensures gauge invariance and renormalizability. In this model there are six mass eigenstates. Two are charge-parity (CP)-even scalars: the light h, assumed to be the observed 125 GeV Higgs boson, and the heavy H. These are the result of the mixing of the neutral CP-even weak eigenstates with a mixing angle α. The two CP-odd pseudoscalar mass eigenstates are the light a and the heavy A, which are linear combinations of the CP-odd weak eigenstates, with a mixing angle θ . Finally, there are two heavy charged scalars H ± with identical mass. The masses of a and A, the angle θ , and the ratio of the vacuum expectation values of h and H, tan β, are varied in this search. The mixing angle α changes with β following the relation α = β − π/2. Perturbativity and unitarity put restrictions on the magnitudes and the signs of the three quartic couplings λ 3 , λ P1 , λ P2 , and we set their values to λ 3 = λ P1 = λ P2 = 3 [15]. The masses of the charged Higgs bosons and of the heavy CP-even Higgs boson are assumed to be the same as the mass of the heavy pseudoscalar, i.e., m H = m H ± = m A . When performing a scan in the m Am a plane, tan β is assumed to be 1 and sin θ is assumed to be 0.35, following the recommendations in Ref.
[18]. The DM particle χ is assumed to have a mass of m χ = 10 GeV. For tan β 1, the coupling strengths of both a and A to b quarks are enhanced, and effects from bb-initiated production are included in the signal simulation for all values of tan β.
The baryonic Z model [10] is an extension of the SM with an additional U(1) B Z gauge boson that couples to the baryon number B. The model predicts the existence of a new Dirac fermion that is neutral under SM gauge symmetries, has non-zero B, and is stable because of the corresponding U(1) B symmetry. The state therefore serves as a good DM candidate. To generate the Z mass, a baryonic Higgs scalar field is introduced to spontaneously break the U(1) B symmetry. In analogy with the SM, there remains a physical baryonic Higgs particle, h B , with a vacuum expectation value v B , which couples to the Z boson. The Z and the SM Higgs boson, h, interact with a coupling strength of where ζ is the h-h B mixing angle. The chosen value for the Z coupling to quarks, g q , is 0.25 and the Z coupling to DM, g χ , is set to 1, following the recommendations in Ref. [19]. This is well below the bounds g q , g χ ∼ 4π , where perturbativity and the validity of the effective field theory break down [10]. Constraints on the SM Higgs boson properties make the mixing angle ζ consistent with cos ζ = 1 within uncertainties of the order of 10%, thereby requiring sin ζ to be less than 0.4 [10]. In this search, it is assumed that sin ζ = 0.3. It is also assumed that g hZ Z /m Z = 1, which implies v B = m Z sin ζ . This choice maximizes the cross section without violating the bounds imposed by SM measurements. The free parameters in the model under these assumptions are thus m Z and m χ , which are varied in this search.
Signal events are characterized by a large imbalance in the transverse momentum (or hadronic recoil), which indicates the presence of invisible DM particles, and by hadronic activity consistent with the production of an SM Higgs boson that decays to a bb pair. Thus, the search strategy followed imposes requirements on the mass of the reconstructed Higgs boson candidate, which is also required to be Lorentz-boosted. Together with the identification of the hadronization products of the two b quarks produced in the Higgs boson decay, these requirements define a data sample that is expected to be enriched in signal events. Several different SM processes can mimic this topology, the most important of which are top quark pair production and the production of a vector boson (V) in association with multiple jets. For each of these SM processes that constitute the largest sources of background, statistically independent data samples are used to predict the hadronic recoil distributions. Both the signal and background contributions to the hadronic recoil distributions observed in data are extracted with a likelihood fit, performed simultaneously in all samples.

The CMS detector
The CMS detector, described in detail in Ref. [11], is a multipurpose apparatus designed to study high transverse momentum ( p T ) processes in proton-proton (pp) and heavy ion collisions. A superconducting solenoid occupies its central region, providing a magnetic field of 3.8 T parallel to the beam direction. Charged particle trajectories are measured using silicon pixel and strip trackers that cover a pseudorapidity region of |η| < 2.5. A lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter surround the tracking volume and extend to |η| < 3. The steel and quartz-fiber forward Cherenkov hadron calorimeter extends the coverage to |η| < 5. The muon system consists of gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid and covers |η| < 2.4. Online event selection is accomplished via the two-tiered CMS trigger system [20]. The first level is designed to select events in less than 4µs, using information from the calorimeters and muon detectors. Subsequently, the high-level trigger processor farm reduces the event rate to 1 kHz.

Simulated data samples
The signal processes are simulated at leading order (LO) accuracy in quantum chromodynamics (QCD) perturbation theory using the MadGraph5_amc@nlo v2.4.2 [21] program. To model the contributions from SM Higgs boson processes as well as from the tt and single top quark backgrounds, we use the powheg v2 [22-24] generator. These processes are generated at the next-to-leading order (NLO) in QCD. The tt production cross section is further corrected using calculations at the next-to-next-to-leading order in QCD including corrections for soft-gluon radiation estimated with next-to-next-to-leading logarithmic accuracy [25]. Events with multiple jets produced via the strong interaction (referred to as QCD multijet events) are generated at LO using MadGraph5_amc@nlo v2. . Additional inelastic pp interactions in the same or a neighboring bunch crossing (pileup) are included in the simulation. The pileup distribution is adjusted to match the corresponding distribution observed in data.

Event reconstruction
The reconstructed interaction vertex with the largest value of summed physics-object p 2 T is taken to be the primary event vertex. The physics objects used for the primary event vertex determination are the clusters found by the anti-k T clustering algorithm [37,38], with a distance parameter of 0.4, from the charged particle tracks in the event, as well as the associated missing transverse momentum, taken as the negative vector sum of the p T of those clusters. The offline selection requires all events to have a primary vertex reconstructed within a 24 cm window along the z-axis around the nominal interaction point, and a transverse distance from the nominal interaction region less than 2 cm.
The particle-flow (PF) algorithm [39] aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. The energy of photons is obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The energy of muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for zero-suppression effects and for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies. The PF candidates are then used to construct the physics objects described in this section. At large Lorentz boosts, the two b quarks from the Higgs boson decay may produce jets that overlap and make their individual reconstruction difficult. In this search large-area jets clustered from PF candidates using the Cambridge-Aachen algorithm [40] with a distance parameter of 1.5 (CA15 jets) are utilized to identify the Higgs boson candidate. The large cone size is chosen in order to select signal events where the Higgs boson has a medium Lorentz-boost and hence its decay products begin to merge for p T (h) 200 GeV. To reduce the impact of particles arising from pileup interactions, the four-vector of each PF candidate is scaled with a weight calculated with the pileup per particle identification (PUPPI) algorithm [41] prior to the clustering. The absolute jet energy scale is corrected using calibrations derived from data [42]. The CA15 jets are also required to be central (|η| < 2.4). The "soft-drop" (SD) jet grooming algorithm [43] is applied to remove soft wide-angle radiation from the jets. We refer to the mass of the groomed CA15 jet as m SD .
The ability to identify two b quarks inside a single CA15 jet is crucial for this search. A likelihood for the CA15 jet to contain two b quarks is derived by combining the information from primary and secondary vertices and tracks in a multivariate discriminant optimized to distinguish CA15 jets originating from h → bb decays from the cases where the hadronization of energetic quarks or gluons [44] leads to the presence of a CA15 jet. The working point chosen for this algorithm (the "double-b tagger") corresponds to an identification efficiency of 50% for a bb system with a p T of 200 GeV, and a probability of 10% for misidentifying CA15 jets originating from combinations of quarks or gluons not coming from a resonance decay. The efficiency of the algorithm increases with the p T of the bb system, reaching an efficiency of 65% for a CA15 jet with a p T > 500 GeV. In this p T regime, the misidentification rate for QCD jets is about 13%. The probability for misidentifying CA15 jets from top quark decays is 14% across the entire p T spectrum. These estimates are derived with no additional requirements on the CA15 jet kinematics.
Energy correlation functions are used to identify the twoprong structure in the CA15 jet expected from a Higgs boson decay to two b quarks, and to distinguish it from QCD-like jets (i.e., jets that do not originate from a heavy resonance decay) and jets from the hadronic decays of top quarks. The energy correlation functions ( v e N ) are sensitive to correlations among the constituents of the CA15 jet [45] and depend on N factors of the particle energies and v factors of their pairwise angles, weighted by the angular separation of the constituents.
As motivated in Ref.
[45], the ratio N 2 = 2 e (β) 2 ) 2 is used as a two-prong tagger for the identification of the CA15 jet containing the Higgs boson decay products. The parameter β, which controls the weighting of the angles between constituent pairs in the computation of the N 2 variable, is chosen to be 1 since this value gives the best two-prong jet identification. distribution as expected for CA15 jets originating from a Higgs boson decaying to a bb pair (solid red) is compared with the expected distribution for CA15 jets originating from the decay products of top quarks decaying hadronically (dotted grey). The distribution corresponding to CA15 jets that do not originate from a heavy resonance decay is also shown (dashed blue) It is noted that requiring a jet to be two-pronged based on the value of a jet substructure variable, such as N 2 , will affect the shape of the distribution of m SD for the background processes. In this search, the value of m SD is required to be consistent with the Higgs boson mass. It is therefore desirable to preserve a smoothly falling jet mass distribution for QCD-like jets. As motivated in Ref.
[46], the dependence of N 2 on the variable ρ = ln(m 2 SD / p 2 T ) is tested, since the distribution of ρ in QCD-like jets is expected to be invariant of the jet mass and p T . The decorrelation strategy described in Ref.
[46] is applied, choosing a QCD misidentification efficiency of 20%, which corresponds to a signal efficiency of 55% and a misidentification rate for top quark jets of 36% across the entire CA15 jet p T spectrum. This results in a modified tagging variable, which we denote as N DDT 2 , where the superscript DDT stands for "designing decorrelated taggers" [46]. Figure 2 shows the expected distribution of N DDT 2 for CA15 jets matched to a Higgs boson decaying to a bb pair, together with the distributions expected for CA15 jets matched to hadronically decaying top quarks and for QCDlike CA15 jets.
This search also utilizes narrow jets clustered from the PF candidates using the anti-k T algorithm with a distance parameter of 0.4 ("AK4 jets"). Narrow jets originating from b quarks are identified using the combined secondary vertex (CSVv2) algorithm [44]. The working point used in this search has a b-jet identification efficiency of 81%, a charm jet selection efficiency of 37%, and a 9% probability of misidentifying light-flavor jets [44]. Jets that are b-tagged are required to be central (|η| < 2.4).
Electron reconstruction requires the matching of a supercluster in the ECAL with a track in the silicon tracker. Reconstructed electrons are required to be within |η| < 2.5, excluding the transition region 1.44 < |η| < 1.57 between the ECAL barrel and endcap. Identification criteria [47] based on the ECAL shower shape and the consistency of the electron track with the primary vertex are imposed. Muon candidates are selected by two different reconstruction approaches [48]: one in which tracks in the silicon tracker are matched to a track segment in the muon detector, and another in which a track fit spanning the silicon tracker and muon detector is performed starting with track segments in the muon detector. Further identification criteria are imposed on muon candidates to reduce the number of hadrons and poorly measured mesons misidentified as muons [48]. These additional criteria include requirements on the number of hits in the tracker and in the muon systems, the fit quality of the global muon track, and the track's consistency with the primary vertex. Muon candidates with |η| < 2.4 are considered in this analysis. A minimum p T of 10 GeV is required for electron and muon candidates. Both are required to satisfy isolation requirements that limit the total energy of tracks and calorimeter clusters measured in conical regions about them. Hadronically decaying τ leptons, τ had , are reconstructed using the hadron-plus-strips algorithm [49], which uses charged hadron and neutral electromagnetic objects to reconstruct intermediate resonances into which the τ lepton decays. The τ had candidates with p T > 18 GeV and |η| < 2.3 are considered [47,49,50]. Photon candidates, identified by means of requirements on the ECAL energy distribution and its distance to the closest track, must have p T > 15 GeV and |η| < 2.5.
The missing transverse momentum p miss T is defined as the negative vectorial sum of the p T of all the reconstructed PF candidates. Its magnitude is denoted as p miss T . Corrections to jet momenta are propagated to the p miss T calculation, and event filters are used to remove spurious high p miss T events caused by instrumental noise in the calorimeters or beam halo muons [51]. These filters remove about 1% of signal events.

Event selection
Signal events are characterized by a high value of p miss T , the absence of any isolated lepton (e, μ, or τ ) or photon, and the presence of a CA15 jet identified as a Higgs boson candidate. In the signal region (SR) described below, the dominant background contributions arise from Z+jets, W+jets, and tt production. To predict the p miss T spectra of these processes in the SR, data from different control regions (CRs) are used. Single-lepton CRs are designed to predict the tt and W+jets backgrounds, while dilepton CRs predict the Z+jets background contribution. The hadronic recoil, U , serves as a proxy for the p miss T distribution of the main background processes in the SR and is defined by excluding the electron(s) and muon(s) from the p miss T computation in the CRs. Predictions for other backgrounds are obtained from simulation.
Events are selected online by the high level trigger system, using a jet reconstruction algorithm and constituents that mirror those of the offline analysis. The trigger requires large values of p miss T,trig or H miss T , where p miss T,trig is the magnitude of the vectorial p T sum over all PF particles and H miss T is the magnitude of the vectorial p T sum over all AK4 jets with p T > 20 GeV and |η| < 5.2 at the trigger level. Muon candidates are excluded from the online p miss T,trig calculation. Minimum thresholds on p miss T,trig and H miss T are between 90 and 120 GeV, depending on the data-taking period. Collectively, online requirements on p miss T,trig and H miss T are referred to as p miss T triggers. These triggers are measured to be 96% efficient for p miss T (U ) > 200 GeV and 100% efficient for p miss T (U ) > 350 GeV. For CRs that require the presence of electrons, events are collected by single-electron triggers, in which at least one electron is required by the online selection criteria. These sets of requirements are referred to as single-electron triggers.
A common set of preselection criteria is used for all regions. The presence of exactly one CA15 jet with p T > 200 GeV and |η| < 2.4 is required. It is also required that 100 < m SD < 150 GeV and N DDT 2 < 0. In the SR (CRs), p miss T (U ) has to be larger than 200 GeV, and the minimum azimuthal angle φ between any AK4 jet and the direction of p miss T ( U ) must be larger than 0.4 radians to reject multijet events that mimic signal events. Events with any τ had candidate or photon candidate are vetoed. The number of AK4 jets for which ΔR = (Δη) 2 + (Δφ) 2 > 1.5, where Δη and Δφ are, respectively, the differences in pseudorapidity and in the azimuthal angle (measured in radians) of a given AK4 jet and the CA15 jet, is required to be smaller than two. This number is referred to as "additional AK4 jets" in the following. This requirement significantly reduces the contribution from tt events in the SR.
Events that meet the preselection criteria described above are split into the SR and the different CRs based on their lepton multiplicity and the presence of a b-tagged AK4 jet not overlapping with the CA15 jet, as summarized in Table 1. For the SR, events are selected if they have no isolated electrons (muons) with p T > 10 GeV and |η| < 2.5 (2.4), and the previously described double-b tag requirement on the Higgs boson candidate CA15 jet is imposed.
To predict the p miss T spectrum of the Z+jets process in the SR, dimuon and dielectron CRs are used. Dimuon events are selected online employing the same p miss T triggers that are used in the SR. These events are required to have two oppositely charged muons (having p T > 20 GeV and p T > 10 GeV for the leading and trailing muon, respec- tively) with an invariant mass between 60 and 120 GeV. The leading muon has to satisfy tight identification and isolation requirements and is selected with an average efficiency of 95%. Dielectron events are selected online using singleelectron triggers. Two oppositely charged electrons with p T greater than 10 GeV are required offline, and they must form an invariant mass between 60 and 120 GeV. To be on the plateau of the trigger efficiency, at least one of the two electrons must have p T > 40 GeV and must satisfy tight identification and isolation requirements that correspond to an efficiency of 70% [47]. Events that satisfy the SR selection because of the loss of a single lepton primarily originate from W+jets and semileptonic tt events. To predict these backgrounds, four singlelepton samples are used: single-electron and single-muon, with and without a b-tagged AK4 jet outside the CA15 jet. The single-lepton CRs with a b-tagged AK4 jet target tt events, while the other two single-lepton CRs target W+jets events. Single-muon events are selected using the p miss T triggers described above. Single-electron events are selected using the same single-electron triggers employed in the online selection of dielectron events. The electron (muon) candidate in these events is required to have p T > 40 (20) GeV and to satisfy tight identification and isolation requirements. In addition, samples with a single electron must have p miss T > 50 GeV to avoid a large contamination from multijet events.
Each CR is further split into two subsamples depending on whether or not the CA15 jet satisfies the double-b tag requirement. This division allows for an in situ calibration of the scale factor that corrects the simulated misidentification probability of the double-b tagger for the three main backgrounds to the probability observed in data.

Signal extraction
As mentioned in Sect. 1, signal and background contributions to the data are extracted with a simultaneous binned likelihood fit (using the RooStats package [52]) to the p miss T and U distributions in the SR and the CRs. The dominant SM process in each CR is used to predict the respective background in the SR via transfer factors T . These factors are determined in simulation and are given by the ratio of the prediction for a given bin in p miss T in the SR and the corresponding bin in U in the CR, for the given process. This ratio is determined independently for each bin of the corresponding distribution.
For example, if b denotes the tt process in the b-tagged single-lepton control sample that is used to estimate the tt contribution in the SR, the expected number of tt events, N i , in the i th bin of the SR is then given by i is a freely floating parameter included in the likelihood to scale the tt contribution in bin i of U in the CR.
The transfer factors used to predict the W+jets and tt backgrounds take into account the impact of lepton acceptances and efficiencies, the b tagging efficiency, and, for the singleelectron control samples, the additional requirement on p miss T . Since the CRs with no b-tagged AK4 jets and a double-btagged CA15 jet also have significant contributions from the tt process, transfer factors to predict this contamination from tt events are also imposed between the single-lepton CRs with and without b-tagged AK4 jets. A similar approach is applied to estimate the contamination from W+jets production in the tt CR with events that fail the double-b tag requirement. Likewise, the Z+jets background prediction in the signal region is connected to the dilepton CRs via transfer factors. They account for the difference in the branching fractions of the Z → νν and the Z → decays and the impacts of lepton acceptances and selection efficiencies.

Systematic uncertainties
Nuisance parameters are introduced into the likelihood fit to represent the systematic uncertainties of the search. They can affect either the normalization or the shape of the p miss T (U ) distribution for a given process in the SR (CRs) and can be constrained in the fit. The shape uncertainties are incorporated by means of Gaussian prior distributions, while the rate uncertainties are given a log-normal prior distributions. The list of the systematic uncertainties considered in this search is presented in Table 2. To better estimate their impact on the results, uncertainties from a similar source (e.g., uncertainties in the trigger efficiencies) have been grouped. The groups of uncertainties have been ordered by the improvement in sensitivity obtained by removing the corresponding Table 2 Sources of systematic uncertainty, along with the type (rate/shape) of uncertainty and the affected processes. For the rate uncertainties, the percentage value of the prior is quoted. The last column denotes the improvement in the expected limit when removing the uncertainty group from the list of nuisances included in the likelihood fit. Such improvement is estimated considering as signal processes the 2HDM+a model with m A = 1.1 TeV and m a = 150 GeV and the baryonic Z model with m Z = 0.2 TeV and m χ = 50 GeV nuisances in the likelihood fit. The sensitivity in the baryonic Z model is generally poorer than that of a 2HDM+a model because the former predicts a more background-like p miss T distribution. The description of each single uncertainty in the text follows the same order as in the table.
Scale factors are used to correct for differences in the double-b tagger misidentification efficiencies in data and in the simulated W/Z+jets and tt samples. These scale factors are measured by simultaneously fitting events that pass or fail the double-b tag requirement. The correlation between the double-b tagger and p miss T (or U ) is taken into account by allowing recoil bins to fluctuate within a constraint that depends on the recoil value. Such dependence is estimated from the profile of the two-dimensional distribution of the double-b tag discriminant vs. the p T of the CA15 jet. This is the shape uncertainty that has the largest impact on the upper limits on the signal cross sections.
Shape uncertainties due to the bin-by-bin statistical uncertainties in the transfer factors are considered for the Z+jets, W+jets, and tt processes.
For the signal and the SM h processes, an uncertainty in the double-b tagging efficiency is applied that depends on the p T of the CA15 jet. This shape uncertainty has been derived through a measurement performed using a sample enriched in multijet events with double-muon-tagged g → bb splittings. A 7% rate uncertainty in the efficiency of the requirement on the substructure variable N DDT 2 , which is used to identify two-prong CA15 jets, is assigned to all processes where the decay of a resonance inside the CA15 jet cone is expected. Such processes include signal production together with SM h and diboson production. The uncertainty has been derived from the efficiency measurement obtained by performing a fit in a control sample enriched in semi-leptonic tt events, where the CA15 jet originates from the W boson that comes from the hadronically decaying top quark.
A 4% rate uncertainty due to the imperfect knowledge of the CA15 jet energy scale [42] is assigned to all the processes obtained from simulation.
Similarly, a 5% rate uncertainty in the p miss T magnitude, as measured by CMS in Ref. [53], is assigned to each of the processes estimated from simulation.
A rate uncertainty of 2.5% in the integrated luminosity measurement [54] is included and assigned to processes determined from simulation. In these cases, uncertainties in Table 3 Post-fit event yield expectations per p miss T bin for the SM backgrounds in the signal region when including the signal region data in the likelihood fit, under the background-only assumption. Also quoted are the expected yields for two signal models. Uncertainties quoted in the predictions include both the systematic and statistical components

Fig. 3
The p miss T distribution in the signal region before and after a likelihood fit. The data are in agreement with post-fit background predictions for the SM backgrounds, and no significant excess is observed. The dashed red histogram corresponds to the pre-fit estimate for the SM backgrounds. The lower panel shows the ratio of the data to the predicted SM background, before and after the fit. The rightmost p miss T bin includes overflow events the PDFs and uncertainties due to variations in the QCD renormalization and factorization scales are included as shape uncertainties, obtained by varying those parameters in the simulation.
The p miss T trigger efficiency is parametrized as a function of U and measured using both single-muon and dimuon events. The difference between these measurements is used to derive an uncertainty, which results in a 1% rate uncertainty for processes estimated using simulation. Processes estimated using control regions (tt, W+jets, and Z+jets) are sensitive to the effect of this uncertainty as a function of U , so a shape uncertainty (as large as 2% at low U values) is considered for such processes. The efficiencies of the singleelectron triggers are parametrized as a function of the electron p T and η and an associated 1% systematic uncertainty is added into the fit.
An uncertainty in the efficiency of the CSV b tagging algorithm applied to isolated AK4 jets is assigned to the transfer factors used to predict the tt background. The scale factors that correct this efficiency are measured with standard CMS methods [44]. They depend on the p T and η of the b-tagged (or mistagged) jet and therefore their uncertainties are included in the fit as shape uncertainties.
The uncertainty in the τ lepton veto amounts to 3%, correlated across all U bins. Also correlated across all U bins are the uncertainties in the electron and muon selection efficiencies, which amount to 1%.
An uncertainty of 21% in the heavy-flavor fraction of W+jets is reported in previous CMS measurements [55,56]. The uncertainty in the heavy-flavor fraction of jets produced together with a Z boson is measured to be 22% [57,58]. To take into account the variation of the double-b tagging efficiency introduced by such uncertainties, the efficiencies for the W+jets and Z+jets processes are reevaluated after varying the heavy-flavor component in the simulation. The difference in the efficiency with respect to the nominal efficiency value is taken as a systematic uncertainty, and amounts to 4% in the rate of the W+jets process and 5% in the rate of the Z+jets process.
Uncertainties in the SM h production due to variations of the renormalization/factorization scales and PDFs are included as shape variations. An uncertainty of 100% is assigned to the QCD multijet yield. This uncertainty is esti-   > 250 GeV and requiring that the minimum azimuthal angle between p miss T and the jet directions be less than 0.1 radians. One nuisance parameter represents the uncertainty in QCD multijet yields in the signal region, while separate nuisance parameters are introduced for the muon CRs and electron CRs. A systematic uncertainty of 20% is assigned to the single top quark background yields as reported by CMS in Ref. [59] and is correlated between the SR and the CRs. An uncertainty of 20%, correlated across the SR and CRs, is also assigned to the diboson production cross section as measured by CMS in Refs. [60,61].

Results
The expected yields for each background in the SR and their uncertainties, as determined in the likelihood fit under the background-only assumption, are presented in Table 3, along with the observed data yields. Good agreement is observed between data and the predictions. Due to anticorrelations between background processes, in some bins the uncertainty in the background sum is smaller than the uncertainties in the individual contributions, such as, for example, the Z+jets yields.
Expected yields are also presented for two signal models. The selection efficiencies for the chosen points correspond to 5% for the 2HDM+a model and 1% for the baryonic Z model. Figure 3 shows the pre-fit and post-fit p miss T distributions in the SR for signal and for all SM backgrounds, as well as the observed data distribution. The likelihood fit has been performed simultaneously in all analysis regions. The data agree with the background predictions at the one standard deviation level, and the post-fit estimate of the SM background is slightly larger than the pre-fit one. The distributions for U in the muon and electron CRs, after a fit to the data, are presented in Figs. 4 and 5.
No significant excess over the SM background expectation is observed in the SR. The results of this search are interpreted in terms of upper limits on the signal strength modifier μ = σ/σ theory , where σ theory is the predicted production cross section of DM candidates in association with a Higgs boson and σ is the upper limit on the observed cross section. The upper limits are calculated at 95% confidence level (CL) using a modified frequentist method [62][63][64] computed with an asymptotic approximation [65]. Figure 6 shows the upper limits on μ for the three scans To compare results with DM direct detection experiments, limits from the baryonic Z model are presented in terms of a spin-independent (SI) cross section σ SI for DM scattering off a nucleus. Following the recommendation of Ref. [66], the value of σ SI is determined by the equation: where μ nχ is the reduced mass of the DM-nucleon system, f (g q ) is the mediator-nucleon coupling, which depends on the mediator coupling to SM quarks g q , g χ is the mediator coupling to SM particles, and m med is the mass of the mediator. The resulting σ SI limits as a function of the DM mass are shown in Fig. 8. Under the assumptions made for the baryonic Z model, these limits on the DM-nucleon SI cross section are the most stringent to date for m χ < 5 GeV.

Summary
A search for dark matter (DM) produced in association with a Higgs boson decaying to a pair of bottom quarks in a sample of proton-proton collision data corresponding to 35.9 fb −1 is presented. No significant deviation from the predictions of the standard model is observed, and 95% CL upper limits on the production cross sections predicted by a type-2 two-