1 Introduction

The existence of dark matter (DM) is well established from astrophysical observations [1], where the evidence relies entirely on gravitational interactions. According to fits based on the Lambda cold dark matter model of cosmology [2] to observational data, DM comprises 26.4% of the current matter-energy density of the universe, while baryonic matter accounts for only 4.8% [3]. In spite of the abundance of DM, its nature remains unknown. This mystery is the subject of an active experimental program to search for dark matter particles, including direct-detection experiments that search for interactions of ambient DM with ordinary matter, indirect-detection experiments that search for the products of self-annihilation of DM in outer space, and searches at accelerators and colliders that attempt to create DM in the laboratory.

The search presented here considers a “mono-Z” scenario where a Z boson, produced in proton–proton (\({\mathrm{p}} {\mathrm{p}} \)) collisions, recoils against DM or other beyond the standard model (BSM) invisible particles. The Z boson subsequently decays into two charged leptons (\(\ell ^{+}\ell ^{-}\), where \(\ell ={\mathrm{e}} \) or \({{\upmu }{}{}} \)), yielding a dilepton signature, and the accompanying undetected particles contribute to missing transverse momentum. The analysis is based on a data set of \({\mathrm{p}} {\mathrm{p}} \) collisions at a center-of-mass energy of 13\(\,\text {Te}\text {V}\) produced at the CERN LHC. The data were recorded with the CMS detector in the years 2016–2018, and correspond to an integrated luminosity of \(137{\,\text {fb}^{-1}} \). The results are interpreted in the context of several models for DM production, as well as for two other scenarios of BSM physics that also predict invisible particles.

Fig. 1 Feynman diagrams illustrative of the BSM processes that produce a final state of a Z boson that decays into a pair of leptons and missing transverse momentum: (upper left) simplified dark matter model for a spin-1 mediator, (upper right) 2HDM+ \(\textsf {a}\) model, (lower left) invisible Higgs boson decays, and (lower right) graviton (G) production in a model with large extra dimensions or unparticle (U) production. Here A represents the DM mediator, \({\upchi }{}{}\) represents a DM particle, while (H, h) and \(\textsf {a}\) represent the scalar and pseudoscalar Higgs bosons, respectively. Here h is identified with the 125\(\,\text {Ge}\text {V}\) scalar boson. The dotted line represents either an unparticle or a graviton.

These results extend and supersede a previous search by CMS in the mono-Z channel based on a data set collected at \(\sqrt{s}=13\,\text {Te}\text {V} \) corresponding to an integrated luminosity of 36\(\,\text {fb}^{-1}\)  [4]. The ATLAS experiment has published searches in this channel as well with the latest result based on a data set corresponding to an integrated luminosity of 36\(\,\text {fb}^{-1}\)  [5]. Similar searches for DM use other “mono-X” signatures with missing transverse momentum recoiling against a hadronic jet [6, 7], a photon [8], a heavy-flavor (bottom or top) quark [9,10,11], a \({\mathrm{W}} \) or \({\mathrm{Z}} \) boson decaying to hadrons [5, 7, 12], or a Higgs boson [13,14,15,16,17,18]. An additional DM interpretation is explored in searches for Higgs boson decays to invisible particles [19, 20].

The paper is organized as follows. The DM and other BSM models explored are introduced along with their relevant parameters in Sect. 2. Section 3 gives a brief description of the CMS detector. The data and simulated samples are described in Sect. 4, along with the event reconstruction. The event selection procedures and background estimation methods are described in Sects. 5 and 6, respectively. Section 7 details the fitting method implemented for the different models presented, while Sect. 8 discusses the systematic uncertainties. The results are given in Sect. 9, and the paper is summarized in Sect. 10.

2 Signal models

Several models of BSM physics can lead to a signature of a Z boson subsequently decaying into a lepton pair and missing transverse momentum. The goal of this paper is to explore a set of benchmark models for the production of DM that can contribute to this final state. In all DM models we consider, the DM particles are produced in pairs, \({\upchi }{}{} {\bar{{\upchi }{}{}}} \), where \({\upchi }{}{} \) is assumed to be a Dirac fermion.

First, we consider a set of simplified models for DM production [21, 22]. These models describe the phenomenology of DM production at the LHC with a small number of parameters and provide a standard for comparing and combining results from different search channels. Each model contains a massive mediator exchanged in the s-channel, where the mediator (either a vector, axial-vector, scalar, or pseudoscalar particle) couples directly to quarks and to the DM particle \({\upchi }{}{}\). An example tree-level diagram is shown in Fig. 1 (upper left). The free parameters of each model are the mass of the DM particle \(m_{\upchi }{}{} \), the mass of the mediator \(m_{\text {med}}\), the mediator-quark coupling \(g_{{\mathrm{q}}}\), and the mediator-DM coupling \(g_{{\upchi }{}{}}\). Following the suggestions in Ref. [22], for the vector and axial-vector studies, we fix the couplings to values of \(g_{{\mathrm{q}}}=0.25\) and \(g_{{\upchi }{}{}}=1\) and vary the values of \(m_{\upchi }{}{} \) and \(m_{\text {med}}\), and for the scalar and pseudoscalar studies, we fix the couplings \(g_{{\mathrm{q}}}=1\) and \(g_{{\upchi }{}{}}=1\), set the dark matter particle mass to \(m_{\upchi }{}{} =1\,\text {Ge}\text {V} \), and vary the values of \(m_{\text {med}}\). The comparison with data is carried out separately for each of the four spin-parity choices for the mediator.

We also explore a two-Higgs-doublet model (2HDM) with an additional pseudoscalar boson, \(\textsf {a}\), that serves as the mediator between DM and ordinary matter. This “2HDM+ \(\textsf {a}\)” model [23, 24] is a gauge-invariant and renormalizable model that contains a Higgs scalar (h), which we take to be the observed 125\(\,\text {Ge}\text {V}\) Higgs boson, a heavy neutral Higgs scalar (H), a charged Higgs scalar (\({\mathrm{H}}^{\pm }\)), and two pseudoscalars (A, \(\textsf {a}\)), where the pseudoscalar bosons couple to the DM particles. For the process studied in this paper, the H boson is produced via gluon fusion and decays into a standard model (SM) Z boson and the pseudoscalar \(\textsf {a}\). These subsequently decay into a pair of leptons and a pair of DM particles, respectively, as shown in Fig. 1 (upper right). The sizable couplings of the Z boson to the Higgs bosons make the mono-Z channel more sensitive to this model than the mono-jet or mono-photon channels. Among the parameters of this model are the Higgs boson masses, the ratio \(\tan \beta \) of the vacuum expectation values of the two Higgs doublets, and the mixing angle \(\theta \) of the pseudoscalars. We consider only configurations in which \(m_{{\mathrm{H}}}=m_{{\mathrm{H}}^{\pm }}=m_{{\mathrm{A}}}\), and fix the values \(\tan \beta =1\) and \(\sin \theta =0.35\), following the recommendations of Ref. [24].

We also examine the case where the h boson acts as a mediator for DM production, as discussed in “Higgs portal” models [25,26,27,28]. If \(m_{{\upchi }{}{}}<m_{{\mathrm{h}}}/2\), the Higgs boson could decay invisibly into a pair of DM particles. The mechanism for such decays can be found, for example, in many supersymmetric theoretical models that contain a stable neutral lightest supersymmetric particle, e.g., a neutralino [29], that is sufficiently light. An illustrative Feynman diagram for such a case is shown in Fig. 1 (lower left), while additional gluon-induced diagrams are also considered.

In addition to the DM paradigm, we consider a model where unparticles are responsible for the missing transverse momentum in the final state. The unparticle physics concept [30, 31] is based on scale invariance, which is anticipated in many BSM physics scenarios [32,33,34]. The effects of the scale-invariant sector (“unparticles”) appear as a non-integral number of invisible massless particles. In this scenario, the SM is extended by introducing a scale-invariant Banks–Zaks field, which has a nontrivial infrared fixed point [35]. This field can interact with the SM particles by exchanging heavy particles with a high mass scale \(M_\textsf {U}\) [36]. Below this mass scale, where the coupling is nonrenormalizable, the interaction is suppressed by powers of \(M_\textsf {U}\) and can be treated within an effective field theory (EFT). The parameters that characterize the unparticle model are the possible noninteger scaling dimension of the unparticle operator \(d_\textsf {U}\), the coupling of the unparticles to SM fields \(\lambda \), and the cutoff scale of the EFT \(\varLambda _\textsf {U}\). In order to remain in the EFT regime, the cutoff scale is set to \(\varLambda _\textsf {U}=15\,\text {Te}\text {V} \) and to maintain unitarity, only \(d_\textsf {U}>1\) is considered. Figure 1 (lower right) shows the tree-level diagram considered in this paper for the production of unparticles associated with a Z boson.

The final SM extension considered in this paper is the Arkani-Hamed–Dimopoulos–Dvali (ADD) model of large extra dimensions [37, 38], which is motivated by the disparity between the electroweak (EW) unification scale (\(M_\text {EW} \sim 100\,\text {Ge}\text {V} \)) and the Planck scale (\(M_\text {Pl} \sim 10^{19}\,\text {Ge}\text {V} \)). This model predicts graviton (G) production via the process \({\mathrm{q}} {\bar{{\mathrm{q}}}} \rightarrow {\mathrm{Z}} + {\mathrm{G}} \), as shown in Fig. 1 (lower right). The graviton escapes detection, leading to a mono-Z signature. In the ADD model, the apparent Planck scale in four spacetime dimensions is given by \(M_\text {Pl}^2 \approx M_{\mathrm {D}}^{n+2}R^n\), where \(M_{\mathrm {D}}\) is the fundamental Planck scale in the full (n+4)-dimensional spacetime and R is the compactification length scale of the extra dimensions. Assuming \(M_{\mathrm {D}}\) is of the same order as \(M_\text {EW}\), the observed large value of \(M_\text {Pl}\) suggests values of R much larger than the Planck length. These values are on the order of nm for \(n=3\), decreasing with larger values of n. The consequence of the large compactification scale is that the mass spectrum of the Kaluza–Klein graviton states becomes nearly continuous [37, 38], resulting in a broadened spectrum for the transverse momentum (\(p_{\mathrm {T}}\)) of the Z boson.
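The relation \(M_\text {Pl}^2 \approx M_{\mathrm {D}}^{n+2}R^n\) can be inverted numerically to see how the compactification scale falls with n. The sketch below is only an order-of-magnitude illustration: it ignores all order-one factors, and the default \(M_{\mathrm {D}} = 1\,\text {TeV}\) and the function name are assumptions made here, not values or code from the analysis.

```python
# hbar * c in GeV * m, used to convert a length in GeV^-1 to metres.
HBAR_C_GEV_M = 1.973269804e-16


def add_compactification_radius_m(n, m_d_gev=1.0e3, m_pl_gev=1.0e19):
    """Solve M_Pl^2 ~ M_D^(n+2) * R^n for R and return R in metres.

    Order-one factors are ignored. The default M_D = 1 TeV is an
    illustrative assumption, not a value taken from the paper.
    """
    r_in_inverse_gev = (m_pl_gev**2 / m_d_gev ** (n + 2)) ** (1.0 / n)
    return r_in_inverse_gev * HBAR_C_GEV_M


if __name__ == "__main__":
    for n in range(2, 7):
        print(f"n = {n}: R ~ {add_compactification_radius_m(n):.1e} m")
```

With these assumed inputs, n = 3 gives R of a few nanometres, and R shrinks as n increases, in line with the statement above.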

3 The CMS detector

The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (\(\eta \)) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid.

Events of interest are selected using a two-tiered trigger system [39]. The first level (L1), composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100 kHz within a time interval of less than 4\(\,\mu \text {s}\). The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage.

A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [40].

4 Data samples and event reconstruction

This search uses \({\mathrm{p}} {\mathrm{p}} \) collision events collected with the CMS detector during 2016, 2017, and 2018 corresponding to a total integrated luminosity of 137\(\,\text {fb}^{-1}\). The data sets from the three different years are analyzed independently with appropriate calibrations and corrections to take into account the different LHC running conditions and CMS detector performance.

Several SM processes can contribute to the mono-Z signature. The most important backgrounds come from diboson processes: \({\mathrm{W}} {\mathrm{Z}} \rightarrow \ell {{\upnu }{}{}} \ell \ell \) where one lepton escapes detection, \({\mathrm{Z}} {\mathrm{Z}} \rightarrow \ell \ell {{\upnu }{}{}} {{\upnu }{}{}} \), and \({\mathrm{W}} {\mathrm{W}} \rightarrow \ell \ell {{\upnu }{}{}} {{\upnu }{}{}} \). There can also be contributions where energetic leptons are produced by decays of top quarks in \({\mathrm{t}} {}{\bar{{\mathrm{t}}}} \) or \({\mathrm{t}} {\mathrm{W}} \) events. Smaller contributions may come from triple vector boson processes (\({\mathrm{W}} {\mathrm{W}} {\mathrm{Z}} \), \({\mathrm{W}} {\mathrm{Z}} {\mathrm{Z}} \), and \({\mathrm{Z}} {\mathrm{Z}} {\mathrm{Z}} \)), \({\mathrm{t}} {}{\bar{{\mathrm{t}}}} {\mathrm{W}} \rightarrow {\mathrm{W}} {\mathrm{W}} {\mathrm{b}} {}{\bar{{\mathrm{b}}}} {\mathrm{W}} \), \({\mathrm{t}} {}{\bar{{\mathrm{t}}}} {\mathrm{Z}} \rightarrow {\mathrm{W}} {\mathrm{W}} {\mathrm{b}} {}{\bar{{\mathrm{b}}}} {\mathrm{Z}} \), and \({\mathrm{t}} {}{\bar{{\mathrm{t}}}} {{\upgamma }{}{}} \rightarrow {\mathrm{W}} {\mathrm{W}} {\mathrm{b}} {}{\bar{{\mathrm{b}}}} {{\upgamma }{}{}} \), referred to collectively as V V V due to the similar decay products. Drell–Yan (DY) production of lepton pairs, \({\mathrm{Z}}/{{\upgamma }{}{}} ^*\rightarrow \ell \ell \), has no intrinsic source of missing transverse momentum but can still mimic a mono-Z signature when the momentum of the recoiling system is poorly measured. A minor source of background is from events with a vector boson and a misreconstructed photon, referred to as \({\mathrm{V}{}{}} {{\upgamma }{}{}} \).

Monte Carlo simulated events are used to model the expected signal and background yields. Three sets of simulated events for each process are used in order to match the different data taking conditions. The samples for DM production are generated using the dmsimp package [41, 42] interfaced with \(\textsc {MadGraph} {}5\_a\textsc {mc@nlo} \) 2.4.2 [43,44,45,46]. The pseudoscalar and scalar model samples are generated at leading order (LO) in quantum chromodynamics (QCD), while the vector and axial-vector model samples are generated at next-to-leading-order (NLO) in QCD. The \({\textsc {powheg}} \)v2 [47,48,49,50,51] generator is used to simulate the \({\mathrm{Z}} {\mathrm{h}} \) signal process of the invisible Higgs boson at NLO in QCD, as well as the \({\mathrm{t}} {}{\bar{{\mathrm{t}}}} \), \({\mathrm{t}} {\mathrm{W}} \), and diboson processes. The BSM Higgs boson production cross sections, as a function of the Higgs boson mass for the \({\mathrm{Z}} {\mathrm{h}} \) process are taken from Ref. [52]. Samples for the 2HDM+ \(\textsf {a}\) model are generated at NLO with \(\textsc {MadGraph} {}5\_a\textsc {mc@nlo} \) 2.6.0. Events for both the ADD and unparticle models are generated at LO using an EFT implementation in \({\textsc {pythia}} \) 8.205 in 2016 and 8.230 in 2017 and 2018 [53, 54]. In order to ensure the validity of the effective theory used in the ADD model, a truncation method, described in Ref. [55], is applied. Perturbative calculations are only valid in cases where the square of the center-of-mass energy (\({\hat{s}}\)) of the incoming partons is smaller than the fundamental scale of the theory (\(M_{\mathrm {D}}^2\)). As such, this truncation method suppresses the cross section for events with \({\hat{s}} > M_\text {D}^2\) by a factor of \(M_\text {D}^4/{\hat{s}}^{2}\). The effect of this truncation is largest for small values of \(M_{\mathrm {D}}\), but also increases with the number of dimensions n as more energy is lost in extra dimensions. 
The \(\textsc {MadGraph} {}5\_a\textsc {mc@nlo} \) 2.2.2 (2.4.2) generator in 2016 (2017 and 2018) is used for the simulation of the V V V, \({\mathrm{V}{}{}} {{\upgamma }{}{}} \), and DY samples, at NLO accuracy in QCD.
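The truncation applied to the ADD samples amounts to a per-event weight. A minimal sketch of that weight, following only the rule stated above (unit weight for \({\hat{s}} \le M_\text {D}^2\), suppression by \(M_\text {D}^4/{\hat{s}}^{2}\) otherwise), is shown here; the function name and argument conventions are hypothetical and do not correspond to any generator interface.

```python
def add_truncation_weight(s_hat_gev2, m_d_gev):
    """Event weight for the ADD truncation described in the text.

    s_hat_gev2: squared partonic center-of-mass energy, in GeV^2.
    m_d_gev:    fundamental Planck scale M_D, in GeV.
    """
    m_d_sq = m_d_gev**2
    if s_hat_gev2 <= m_d_sq:
        return 1.0  # inside the region where the effective theory is trusted
    return m_d_sq**2 / s_hat_gev2**2  # suppression factor M_D^4 / s_hat^2
```

For example, an event with \(\sqrt{{\hat{s}}} = 2M_{\mathrm {D}}\) receives a weight of 1/16.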

The set of parton distribution functions (PDFs) used for simulating the 2016 sample is NNPDF 3.0 NLO [56] and for the 2017 and 2018 samples it is NNPDF 3.1 NNLO. For all processes, the parton showering and hadronization are simulated using pythia 8.226 in 2016 and 8.230 in 2017 and 2018. The underlying event is modeled using the CUETP8M1 [57] (CP5 [58]) tune for simulated samples corresponding to the 2016 (2017 and 2018) data sets. The only exceptions to this are the 2016 top quark sample, which uses the CUETP8M2 [57] tune, and the simplified DM (2HDM+ \(\textsf {a}\)) samples, which use the CP3 [58] (CP5) tune for all years. All events are processed through a simulation of the CMS detector based on Geant4 [59] and are reconstructed with the same algorithms as used for data. Simultaneous \({\mathrm{p}} {\mathrm{p}} \) collisions in the same or nearby bunch crossings, referred to as pileup, are also simulated. The distribution of the number of such interactions in the simulation is chosen to match the data, with periodic adjustments to take account of changes in LHC operating conditions [60]. The average number of pileup interactions was 23 for the 2016 data and 32 for the 2017 and 2018 data.

Information from all subdetectors is combined and used by the CMS particle-flow (PF) algorithm [61] for particle reconstruction and identification. The PF algorithm aims to reconstruct and identify each individual particle in an event, with an optimized combination of information from the various elements of the CMS detector. The energies of photons are obtained from the ECAL measurement. The energies of electrons are determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy from the corresponding ECAL cluster, and the energy sum from all bremsstrahlung photons spatially compatible with originating from the electron track. The momentum of muons is obtained from the curvature of the corresponding track in the tracker detector in combination with information from the muon stations. The energies of charged hadrons are determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for the response function of the calorimeters to hadronic showers. Finally, the energies of neutral hadrons are obtained from the corresponding corrected ECAL and HCAL energies.

The candidate vertex with the largest value of summed physics-object \(p_{\mathrm {T}} ^2\) is taken to be the primary \({\mathrm{p}} {\mathrm{p}} \) interaction vertex. The physics objects are the jets, clustered using the jet finding algorithm [62, 63] with the tracks assigned to candidate vertices as inputs, and the associated missing transverse momentum, taken as the negative vector sum of the \(p_{\mathrm {T}}\) of those jets.

Both electron and muon candidates must pass certain identification criteria to be further selected in the analysis. They must satisfy requirements on the transverse momentum and pseudorapidity: \(p_{\mathrm {T}} > 10\,\text {Ge}\text {V} \) and \(|\eta | < 2.5~(2.4)\) for electrons (muons). At the final level, a medium working point [64, 65] is chosen for the identification criteria, including requirements on the impact parameter of the candidates with respect to the primary vertex and their isolation with respect to other particles in the event. The efficiencies for these selections are about 85 and 90% for each electron and muon, respectively.

In the signal models considered in this paper, the amount of hadronic activity tends to be small, so events with multiple clustered jets are vetoed. For each event, hadronic jets are clustered from reconstructed particle candidates using the infrared and collinear safe anti-\(k_{\mathrm {T}}\) algorithm [62, 63] with a distance parameter of 0.4. Jet momentum is determined as the vectorial sum of all particle momenta in the jet, and is found from simulation to be, on average, within 5 to 10% of the true momentum over the entire spectrum and detector acceptance. Pileup interactions can contribute additional tracks and calorimetric energy depositions to the jet momentum. To mitigate this effect, charged particles identified to be originating from pileup vertices are discarded and an offset is applied to correct for remaining contributions [66]. Jet energy corrections are derived from simulation to bring the measured response of jets to the average of simulated jets clustered from the generated final-state particles. In situ measurements of the momentum balance in dijet, photon+jet, \({\mathrm{Z}} \)+jet, and multijet events are used to determine corrections for residual differences between jet energy scale in data and simulation [66]. The jet energy resolution amounts typically to 15% at 10\(\,\text {Ge}\text {V}\), 8% at 100\(\,\text {Ge}\text {V}\), and 4% at 1\(\,\text {Te}\text {V}\). Additional selection criteria are applied to each jet to remove jets potentially dominated by anomalous contributions from some subdetector components or reconstruction failures [67]. Jets with \(p_{\mathrm {T}} > 30\,\text {Ge}\text {V} \) and \(|\eta |<4.7\) are considered for the analysis.

To identify jets that originated from b quarks, we use the medium working point of the DeepCSV algorithm [68]. This selection was chosen to remove events from top quark decays originating specifically from \({\mathrm{t}} {}{\bar{{\mathrm{t}}}} \) production, without causing a significant loss of signal. For this working point, the efficiency to select b quark jets is about 70% and the probability for mistagging jets originating from the hadronization of gluons or \({\mathrm{u}}/{\mathrm{d}}/{\mathrm{s}} \) quarks is about 1% in simulated \({\mathrm{t}} {}{\bar{{\mathrm{t}}}} \) events.

To identify hadronically decaying \({\uptau }{}{}\) leptons (\({\uptau }{}{} _\mathrm {h}\)), we use the hadron-plus-strips algorithm [69]. This algorithm constructs candidates seeded by PF jets that are consistent with either a single or triple charged pion decay of the \({\uptau }{}{}\) lepton. In the single charged pion decay mode, the presence of neutral pions is detected by reconstructing their photonic decays. Mistagged jets originating from non-\({\uptau }{}{}\) decays are rejected by a discriminator that takes into account the pileup contribution to the neutral component of the \({\uptau }{}{} _\mathrm {h}\) decay [69]. The efficiency to select real hadronically decaying \({\uptau }{}{}\) leptons is about 75% and the probability for mistagging jets is about 1%.

The missing transverse momentum vector \({\vec p}_{\mathrm {T}}^{\text {miss}}\) is computed as the negative vector sum of the transverse momenta of all the PF candidates in an event, and its magnitude is denoted as \(p_{\mathrm {T}} ^\text {miss}\)  [70]. The \({\vec p}_{\mathrm {T}}^{\text {miss}}\) is modified to account for corrections to the energy scale of the reconstructed jets in the event. Events with anomalously high \(p_{\mathrm {T}} ^\text {miss}\) can originate from a variety of reconstruction failures, detector malfunctions, or noncollision backgrounds. Such events are rejected by event filters that are designed to identify more than 85–90% of the spurious high-\(p_{\mathrm {T}} ^\text {miss}\) events with a misidentification rate of less than 0.1% [70].
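The definition of \({\vec p}_{\mathrm {T}}^{\text {miss}}\) as a negative vector sum can be sketched in a few lines. This is an illustration only: the input format (a list of \((p_{\mathrm {T}}, \phi )\) pairs) and the function name are assumptions, and the jet energy scale corrections and event filters described above are not included.

```python
import math


def missing_pt(candidates):
    """Return (magnitude, azimuth) of the negative vector pT sum.

    candidates: iterable of (pt, phi) pairs, pt in GeV and phi in radians.
    """
    candidates = list(candidates)
    px = -sum(pt * math.cos(phi) for pt, phi in candidates)
    py = -sum(pt * math.sin(phi) for pt, phi in candidates)
    return math.hypot(px, py), math.atan2(py, px)
```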

5 Event selection

Events with electrons (muons) are collected using dielectron (dimuon) triggers, with thresholds of \(p_{\mathrm {T}} > 23\) (17)\(\,\text {Ge}\text {V}\) and \(p_{\mathrm {T}} > 12\) (8)\(\,\text {Ge}\text {V}\) for the electron (muon) with the highest and second-highest measured \(p_{\mathrm {T}}\), respectively. Single-electron and single-muon triggers with \(p_{\mathrm {T}}\) thresholds of 25 (27) and 20 (24)\(\,\text {Ge}\text {V}\) for 2016 (2017–2018) are used to recover residual inefficiencies, ensuring a trigger efficiency above 99% for events passing the offline selection.

In the signal region (SR), events are required to have two (\(N_{\ell } = 2\)) well-identified, isolated electrons or muons with the same flavor and opposite charge (\({\mathrm{e}}^{+}{\mathrm{e}}^{-}\) or \({\upmu }^{+}{\upmu }^{-}\)). At least one electron or muon of the pair must have \(p_{\mathrm {T}} > 25\,\text {Ge}\text {V} \), while the second must have \(p_{\mathrm {T}} > 20\,\text {Ge}\text {V} \). In order to reduce nonresonant background, the dilepton invariant mass is required to be within 15\(\,\text {Ge}\text {V}\) of the world-average Z boson mass \(m_{{\mathrm{Z}}}\) [71]. Additionally, we require the \(p_{\mathrm {T}}\) of the dilepton system \(p_{\mathrm {T}} ^{\ell \ell }\) to be larger than 60\(\,\text {Ge}\text {V}\) to reject the bulk of the DY background. Since little hadronic activity is expected for the signal, we reject events having more than one jet with \(p_{\mathrm {T}} >30\,\text {Ge}\text {V} \) within \(|\eta |<4.7\). The top quark background is further suppressed by rejecting events containing any b-tagged jet with \(p_{\mathrm {T}} > 30\,\text {Ge}\text {V} \) reconstructed within the tracker acceptance of \(|\eta | < 2.4\). To reduce the \({\mathrm{W}} {\mathrm{Z}} \) background in which both bosons decay leptonically, we remove events containing additional electrons or muons with loose identification and with \(p_{\mathrm {T}} > 10\,\text {Ge}\text {V} \). Events containing a loosely identified \({\uptau }{}{} _\mathrm {h}\) candidate with \(p_{\mathrm {T}} >18\,\text {Ge}\text {V} \) and \(|\eta | < 2.3\) are also rejected. Decays that are consistent with production of muons or electrons are rejected by an overlap veto.
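The dilepton part of this selection can be illustrated with a short sketch. It treats the leptons as massless, takes \((p_{\mathrm {T}}, \eta , \phi )\) as inputs, and uses a Z boson mass of 91.1876 GeV that is supplied here rather than quoted in the text; the function names are hypothetical, and the jet, b-tag, extra-lepton, and \({\uptau }{}{} _\mathrm {h}\) vetoes are not modeled.

```python
import math

M_Z_GEV = 91.1876  # assumed world-average Z mass; not quoted in the text


def dilepton_system(pt1, eta1, phi1, pt2, eta2, phi2):
    """Return (invariant mass, pT) in GeV for two massless leptons."""
    mass_sq = 2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2))
    px = pt1 * math.cos(phi1) + pt2 * math.cos(phi2)
    py = pt1 * math.sin(phi1) + pt2 * math.sin(phi2)
    return math.sqrt(max(mass_sq, 0.0)), math.hypot(px, py)


def passes_dilepton_requirements(pt1, eta1, phi1, pt2, eta2, phi2):
    """Lepton pT thresholds, Z mass window, and dilepton pT cut of the SR."""
    leading, subleading = max(pt1, pt2), min(pt1, pt2)
    mass, pt_ll = dilepton_system(pt1, eta1, phi1, pt2, eta2, phi2)
    return (
        leading > 25.0
        and subleading > 20.0
        and abs(mass - M_Z_GEV) < 15.0
        and pt_ll > 60.0
    )
```

A Z boson produced at rest in the transverse plane gives back-to-back leptons with a dilepton \(p_{\mathrm {T}}\) near zero, so such an event fails the 60 GeV requirement even though its mass lies inside the window.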

In addition to the above criteria, there are several selections designed to further reduce the SM background. The main discriminating variables are: the missing transverse momentum, \(p_{\mathrm {T}} ^\text {miss}\); the azimuthal angle formed between the dilepton \(p_{\mathrm {T}}\) and the \({\vec p}_{\mathrm {T}}^{\text {miss}}\), \(\varDelta \phi ({\vec p}_{\mathrm {T}} ^{\ell \ell },{\vec p}_{\mathrm {T}}^{\text {miss}})\); and the balance ratio, \(|p_{\mathrm {T}} ^\text {miss}-p_{\mathrm {T}} ^{\ell \ell } |/p_{\mathrm {T}} ^{\ell \ell }\). The latter two variables are especially powerful in rejecting DY and top quark processes. Selection criteria are optimized to obtain the best signal sensitivity for the range of DM processes considered. The final selection requirements are: \(p_{\mathrm {T}} ^\text {miss} > 100\,\text {Ge}\text {V} \), \(\varDelta \phi ({\vec p}_{\mathrm {T}} ^{\ell \ell },{\vec p}_{\mathrm {T}}^{\text {miss}}) > 2.6~{\mathrm{radians}}\), and \(|p_{\mathrm {T}} ^\text {miss}-p_{\mathrm {T}} ^{\ell \ell } |/p_{\mathrm {T}} ^{\ell \ell } < 0.4\).
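The three final requirements can be written as a single predicate. The sketch below assumes the dilepton and missing transverse momentum vectors are given as (magnitude, azimuth) pairs; the function names are hypothetical, and only the thresholds are taken from the text.

```python
import math


def delta_phi(phi1, phi2):
    """Absolute azimuthal separation, wrapped into [0, pi]."""
    dphi = abs(phi1 - phi2) % (2.0 * math.pi)
    return 2.0 * math.pi - dphi if dphi > math.pi else dphi


def passes_kinematic_selection(pt_ll, phi_ll, pt_miss, phi_miss):
    """pTmiss > 100 GeV, delta phi > 2.6 rad, and balance ratio < 0.4."""
    balance = abs(pt_miss - pt_ll) / pt_ll
    return (
        pt_miss > 100.0
        and delta_phi(phi_ll, phi_miss) > 2.6
        and balance < 0.4
    )
```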

For the 2HDM+ \(\textsf {a}\) model, the selection differs slightly. We make a less stringent requirement on the missing transverse momentum, \(p_{\mathrm {T}} ^\text {miss} >80\,\text {Ge}\text {V} \), and require the transverse mass, \(m_{\mathrm {T}} =\sqrt{\smash [b]{2p_{\mathrm {T}} ^{\ell \ell }p_{\mathrm {T}} ^\text {miss} [1-\cos \varDelta \phi ({\vec p}_{\mathrm {T}} ^{\ell \ell },{\vec p}_{\mathrm {T}}^{\text {miss}})]}}\), to be greater than 200\(\,\text {Ge}\text {V}\). The kinematic properties of the 2HDM+ \(\textsf {a}\) production yield a peak in the \(m_{\mathrm {T}}\) spectrum near the neutral Higgs scalar (H) mass that is advantageous for background discrimination.
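As a quick illustration of the transverse mass definition, the following sketch evaluates \(m_{\mathrm {T}}\) from the dilepton and missing transverse momentum magnitudes and azimuths. The function name and argument layout are assumptions made for this example.

```python
import math


def transverse_mass(pt_ll, phi_ll, pt_miss, phi_miss):
    """m_T = sqrt(2 * pT(ll) * pTmiss * (1 - cos(delta phi))), in GeV."""
    return math.sqrt(2.0 * pt_ll * pt_miss * (1.0 - math.cos(phi_ll - phi_miss)))
```

For a back-to-back configuration with \(p_{\mathrm {T}} ^{\ell \ell } = p_{\mathrm {T}} ^\text {miss} = 100\,\text {GeV}\), this gives \(m_{\mathrm {T}} = 200\,\text {GeV}\), which sits exactly at the threshold; collinear vectors give \(m_{\mathrm {T}} = 0\).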

In order to avoid biases in the \(p_{\mathrm {T}} ^\text {miss}\) calculation due to jet mismeasurement, events with one jet are required to have the azimuthal angle between this jet and the missing transverse momentum, \(\varDelta \phi ({\vec p}_{\mathrm {T}} ^{\mathrm {j}},{\vec p}_{\mathrm {T}}^{\text {miss}})\), larger than 0.5 radians. To reduce the contribution from backgrounds such as \({\mathrm{W}} {\mathrm{W}} \) and \({\mathrm{t}} {}{\bar{{\mathrm{t}}}} \), we apply a requirement on the distance between the two leptons in the \((\eta ,\phi )\) plane, \(\varDelta R_{\ell \ell } < 1.8\), where \(\varDelta R_{\ell \ell } = \sqrt{\smash [b]{(\varDelta \phi _{\ell \ell })^2+(\varDelta \eta _{\ell \ell })^2}}\).
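The angular separation \(\varDelta R_{\ell \ell }\) requires the azimuthal difference to be wrapped into \([-\pi , \pi ]\) before squaring. A minimal sketch, with a hypothetical function name, is:

```python
import math


def delta_r(eta1, phi1, eta2, phi2):
    """Distance in the (eta, phi) plane with the azimuthal difference wrapped."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)  # result lies in [-pi, pi]
    return math.hypot(eta1 - eta2, dphi)
```

A lepton pair would satisfy the requirement above when `delta_r(...) < 1.8`.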

A summary of the selection criteria for the SR is given in Table 1.

Table 1 Summary of the kinematic selections for the signal region

6 Background estimation

We estimate the background contributions using combined information from simulation and control regions (CRs) in data. A simultaneous maximum likelihood fit to the \(p_{\mathrm {T}} ^\text {miss}\) or \(m_{\mathrm {T}}\) distributions in the SR and CRs constrains the background normalizations and their uncertainties. Specific CRs target different categories of background processes, as described below.

6.1 The three-lepton control region

The \({{\mathrm{W}} {\mathrm{Z}} \rightarrow \ell '{{\upnu }{}{}} \ell \ell }\) decay mode can contribute to the SR when the third lepton (\(\ell '={\mathrm{e}} \) or \({{\upmu }{}{}} \)) escapes detection, and this same process can be monitored in an orthogonal CR, where the third lepton is identified and then removed. The construction of the three-lepton (\(3\ell \)) CR is based on events with three well-reconstructed charged leptons. A Z boson candidate is selected in the same manner as for the SR, while an additional electron or muon satisfying identical quality and isolation requirements is required. In cases where there are multiple Z boson candidates, the candidate with invariant mass closest to the Z boson mass is selected. To enhance the purity of the \({\mathrm{W}} {\mathrm{Z}} \) selection, \(p_{\mathrm {T}} ^\text {miss}\) of at least 30\(\,\text {Ge}\text {V}\) is required and the invariant mass of the three leptons is required to be larger than 100\(\,\text {Ge}\text {V}\). The backgrounds in this CR are similar to those in the SR, with a sizable nonprompt background from DY events where a jet is misidentified as a lepton [72]. An additional minor source of background is from events with a vector boson and a misreconstructed photon (\({\mathrm{V}{}{}} {{\upgamma }{}{}} \)). All background estimates for this CR are taken from simulation.

To simulate the consequences of not detecting the third lepton, the “emulated \(p_{\mathrm {T}} ^\text {miss}\)” is estimated from the vectorial sum of \({\vec p}_{\mathrm {T}}^{\text {miss}}\) and the transverse momentum (\({\vec p}_{\mathrm {T}}\)) of the additional lepton. The emulated \(p_{\mathrm {T}} ^\text {miss}\) is then used in place of the reconstructed \(p_{\mathrm {T}} ^\text {miss}\) and the same selection is applied as for the SR. Since there is negligible contamination from \({\mathrm{W}} {\mathrm{Z}} \rightarrow {\uptau }{}{} {{\upnu }{}{}} \ell \ell \) and top quark backgrounds in this CR, no veto is applied on additional \({\uptau }{}{} _\mathrm {h}\) or b jet candidates. The resulting emulated \(p_{\mathrm {T}} ^\text {miss}\) spectrum is shown in Fig. 2 (upper). For the 2HDM+ \(\textsf {a}\) case, the “emulated \(m_{\mathrm {T}}\)” is used instead of the “emulated \(p_{\mathrm {T}} ^\text {miss}\)” with the same selections.
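The emulated \(p_{\mathrm {T}} ^\text {miss}\) is a two-vector sum in the transverse plane. The sketch below shows that sum for one additional lepton; the same function could be applied with the \({\vec p}_{\mathrm {T}}\) of a Z boson candidate in place of the lepton. The function name and the (magnitude, azimuth) input convention are assumptions made for this example.

```python
import math


def emulated_pt_miss(pt_miss, phi_miss, pt_extra, phi_extra):
    """Add the pT vector of an extra object to the missing pT vector.

    Returns the (magnitude, azimuth) of the emulated missing pT.
    """
    px = pt_miss * math.cos(phi_miss) + pt_extra * math.cos(phi_extra)
    py = pt_miss * math.sin(phi_miss) + pt_extra * math.sin(phi_extra)
    return math.hypot(px, py), math.atan2(py, px)
```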

Fig. 2 Emulated \(p_{\mathrm {T}} ^\text {miss}\) distribution in data and simulation for the \(3\ell \) (upper) and \(4\ell \) (lower) CRs. Uncertainty bands correspond to the postfit combined statistical and systematic components, where the fitting method is described in Sect. 7.

6.2 The four-lepton control region

The \({\mathrm{Z}} {\mathrm{Z}} \) process contributes to the SR through the \({\mathrm{Z}} {\mathrm{Z}} \rightarrow \ell \ell {{\upnu }{}{}} {{\upnu }{}{}} \) decay mode, and the same production process can be monitored via the decay mode \({\mathrm{Z}} {\mathrm{Z}} \rightarrow 4\ell \). The \(4\ell \) CR is based on events with two pairs of charged leptons. Each pair comprises two leptons of opposite charge and the same flavor and corresponds to a Z candidate. Two of the four leptons must fulfill the same requirements on the leptons as in the SR, while, in order to increase the yield, the other two leptons need only pass relaxed lepton quality requirements. The highest \(p_{\mathrm {T}}\) Z boson candidate is required to have an invariant mass within 35\(\,\text {Ge}\text {V}\) of the Z boson mass \(m_{{\mathrm{Z}}}\) [71]. Additionally, we require the transverse momentum of this Z boson candidate to be larger than 60\(\,\text {Ge}\text {V}\). Additional backgrounds to the \({\mathrm{Z}} {\mathrm{Z}} \) final state come from triboson processes, from events with a vector boson and a Higgs boson (\({\mathrm{V}{}{}} {\mathrm{h}} \)), and from nonprompt events. These backgrounds are almost negligible. All background estimates for this CR are taken from simulation.

For these four-lepton events, the emulated \(p_{\mathrm {T}} ^\text {miss}\) is calculated as the vectorial sum of the \({\vec p}_{\mathrm {T}}^{\text {miss}} \) and the \({\vec p}_{\mathrm {T}}\) of the Z boson candidate with the larger absolute mass difference to \(m_{{\mathrm{Z}}}\). The choice of which Z boson to use as a proxy for an invisibly decaying boson negligibly alters the emulated \(p_{\mathrm {T}} ^\text {miss}\) spectrum. The same selection as the SR is then applied using the emulated \(p_{\mathrm {T}} ^\text {miss}\) in place of the reconstructed \(p_{\mathrm {T}} ^\text {miss}\), with the exception of the \({\uptau }{}{} _\mathrm {h}\) and b jet candidate vetoes. The resulting emulated \(p_{\mathrm {T}} ^\text {miss}\) spectrum is shown in Fig. 2 (lower). As in the \(3\ell \) CR, the “emulated \(m_{\mathrm {T}}\) ” is used instead of “emulated \(p_{\mathrm {T}} ^\text {miss}\) ” for the 2HDM+ \(\textsf {a}\) case, and the distribution is well described by the SM background estimates.

6.3 The electron-muon control region

We estimate the contribution of the flavor-symmetric backgrounds from an \({\mathrm{e}} {{\upmu }{}{}} \) CR based on events with two leptons of different flavor and opposite charge (\({\mathrm{e}} ^{\pm }{{\upmu }{}{}} ^{\mp }\)) that pass all other analysis selections. This CR is largely populated by nonresonant backgrounds (NRB) consisting mainly of leptonic W boson decays in \({\mathrm{t}} {}{\bar{{\mathrm{t}}}} \), \({\mathrm{t}} {\mathrm{W}} \), and \({\mathrm{W}} {\mathrm{W}} \) events, where the dilepton mass happens to fall inside the \({\mathrm{Z}} \) boson mass window. Small contributions from single top quark events produced via s- and t-channel processes, and \({\mathrm{Z}} \rightarrow {\uptau }{}{} {\uptau }{}{} \) events in which \({\uptau }{}{}\) leptons decay into light leptons and neutrinos, are also considered in the NRB estimation.

6.4 The DY control region

The DY background is dominant in the region of low \(p_{\mathrm {T}} ^\text {miss}\). This process does not produce undetectable particles. Therefore, any nonzero \(p_{\mathrm {T}} ^\text {miss}\) arises from mismeasurement or limitations in the detector acceptance. The estimation of this background uses simulated DY events, for which the normalization is taken from data in a sideband CR of \(80< p_{\mathrm {T}} ^\text {miss} < 100\,\text {Ge}\text {V} \) where the signal contamination is negligible, with all other selections applied. For the 2HDM+ \(\textsf {a}\) analysis, a similar approach is taken with a relaxed \(p_{\mathrm {T}} ^\text {miss} \) selection of \(50< p_{\mathrm {T}} ^\text {miss} < 100\,\text {Ge}\text {V} \) and an additional selection of \(m_{\mathrm {T}} < 200\,\text {Ge}\text {V} \) applied. The sideband CR is included in the maximum likelihood fit and a 100% uncertainty is assigned to the extrapolation from this CR to the SR. This uncertainty has little effect on the results because the overall contribution from the DY process in the SR is small.
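As a rough illustration of the sideband definition and of what taking the normalization from data amounts to, the sketch below selects sideband events and computes a closed-form scale factor. In the analysis the sideband enters the maximum likelihood fit instead; the function names and the simple subtraction of non-DY simulated yields are assumptions made for this example.

```python
def in_dy_sideband(ptmiss, mt=None, two_hdm_a=False):
    """Return True if an event falls in the DY sideband CR (values in GeV).

    Default: 80 < ptmiss < 100. For the 2HDM+a analysis the window is
    relaxed to 50 < ptmiss < 100 and mT < 200 is also required.
    """
    if two_hdm_a:
        return mt is not None and 50.0 < ptmiss < 100.0 and mt < 200.0
    return 80.0 < ptmiss < 100.0

def dy_scale_factor(n_data, n_dy_sim, n_other_sim):
    """Closed-form estimate of the DY normalization in the sideband:
    (data - simulated non-DY yield) / simulated DY yield.
    This stands in for the fitted normalization and is only a sketch.
    """
    if n_dy_sim <= 0:
        raise ValueError("simulated DY yield must be positive")
    return (n_data - n_other_sim) / n_dy_sim

# Hypothetical sideband yields, for illustration only.
mu_dy = dy_scale_factor(n_data=1200.0, n_dy_sim=1000.0, n_other_sim=100.0)
```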

7 Fitting method

After applying the selection, we perform a binned maximum likelihood fit to discriminate between the potential signal and the remaining background processes. The data sets for each data-taking year are kept separate in the fit. This yields a better expected significance than combining them into a single set because the signal-to-background ratios are different for the three years due to the different data-taking conditions. The electron and muon channels have comparable signal-to-background ratios, and are combined in the fit, while the contributions, corrections and systematic uncertainties are calculated individually.

The \(p_{\mathrm {T}} ^\text {miss}\) distribution of events passing the selection is used as the discriminating variable in the fit for all of the signal hypotheses except for the 2HDM+ \(\textsf {a}\) model. For this model, the \(m_{\mathrm {T}}\) distribution is used since a Jacobian peak around the pseudoscalar Higgs boson mass is expected. Events in the SR are split into 0-jet and 1-jet categories to take into account the different signal-to-background ratios. For the CRs defined in Sect. 6, by contrast, the 0-jet and 1-jet events are merged into a single category in the fit. The \({\mathrm{e}} {{\upmu }{}{}} \) and DY CRs are each included as a single bin corresponding to the total yield. The \(p_{\mathrm {T}} ^\text {miss}\) or \(m_{\mathrm {T}}\) spectra in the \(3\ell \) and \(4\ell \) CRs are included in the fit with the same binning as in the SR, where these spectra are based upon the emulated \(p_{\mathrm {T}} ^\text {miss}\). To allow for further freedom in the \({\mathrm{Z}} {\mathrm{Z}} \) and \({\mathrm{W}} {\mathrm{Z}} \) background estimation, the \(p_{\mathrm {T}} ^\text {miss}\) and emulated \(p_{\mathrm {T}} ^\text {miss}\) distributions are split into three regions with independent normalization parameters: low (\(< 200\,\text {Ge}\text {V} \)), medium (200–400\(\,\text {Ge}\text {V}\)), and high (\(> 400\,\text {Ge}\text {V} \)), with uncertainties of 10, 20, and 30%, respectively. These values are based on the magnitudes of the theoretical uncertainties as described in Sect. 8. For fits to the 2HDM+ \(\textsf {a}\) model, three similar \(m_{\mathrm {T}}\) regions are chosen with the same uncertainties: low (\(< 400\,\text {Ge}\text {V} \)), medium (400–800\(\,\text {Ge}\text {V}\)), and high (\(> 800\,\text {Ge}\text {V} \)). To make the best use of the statistical power in the CRs and to take advantage of the similarities of the production processes, we take the normalization factors to be correlated for the \({\mathrm{Z}} {\mathrm{Z}} \) and the \({\mathrm{W}} {\mathrm{Z}} \) backgrounds in each \(p_{\mathrm {T}} ^\text {miss}\) region.
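The assignment of a bin to one of the three diboson normalization regions can be written as a small lookup. This is only a sketch: the function name `vv_region` is hypothetical, and the treatment of values lying exactly on a boundary (assigned to the higher region) is an assumption, since the text does not specify it.

```python
def vv_region(value, edges=(200.0, 400.0)):
    """Map a ptmiss (or mT) value in GeV to its normalization region.

    Returns (label, prior uncertainty), with priors of 10, 20, and 30%
    for the low, medium, and high regions. Values exactly on an edge are
    assigned to the higher region (an assumption of this sketch).
    """
    low_edge, high_edge = edges
    if value < low_edge:
        return ("low", 0.10)
    if value < high_edge:
        return ("medium", 0.20)
    return ("high", 0.30)

# ptmiss regions use the default edges; the 2HDM+a mT regions use 400 and 800 GeV.
ptmiss_region = vv_region(250.0)
mt_region = vv_region(900.0, edges=(400.0, 800.0))
```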

For each individual bin, a Poisson likelihood term describes the fluctuation of the data around the expected central value, which is given by the sum of the contributions from signal and background processes. Systematic uncertainties are represented by nuisance parameters \(\theta \) with log-normal probability density functions used for normalization uncertainties and Gaussian functions used for shape-based uncertainties, with the functions centered on their nominal values \({\hat{\theta }}\). The uncertainties affect the overall normalizations of the signal and background templates, as well as the shapes of the predictions across the distributions of observables. Correlations among systematic uncertainties in different categories are taken into account as discussed in Sect. 8. The total likelihood is defined as the product of the likelihoods of the individual bins and the probability density functions for the nuisance parameters:

$$\begin{aligned} {\mathcal {L}} = {\mathcal {L}}_{\text {SR}} {\mathcal {L}}_{3\ell } {\mathcal {L}}_{4\ell } {\mathcal {L}}_{{\mathrm{e}} {{\upmu }{}{}}} {\mathcal {L}}_{\text {DY}} \, f_{\text {NP}} \Big (\theta \mid {\hat{\theta }} \Big ) \end{aligned}$$
(1)

The factors of the likelihood can be written more explicitly as

$$\begin{aligned} {\mathcal {L}} _{\text {SR}}=&\, \prod _{i,j} {\mathcal {P}} \Big ( N^{\text {SR}}_{\text {obs},i,j} \mid \mu _{\text {DY}}N^{\text {SR}}_{\text {DY},i,j}(\theta ) \nonumber \\&+ \mu _{\text {NRB}}N^{\text {SR}}_{\text {NRB},i,j}(\theta ) + N^{\text {SR}}_{\text {other},i,j}(\theta ) \nonumber \\&+ \mu _{{\mathrm{V}{}{}} {\mathrm{V}{}{}},r(i)}(N^{\text {SR}}_{{\mathrm{Z}} {\mathrm{Z}} ,i,j}(\theta ) + N^{\text {SR}}_{{\mathrm{W}} {\mathrm{Z}} ,i,j}(\theta )) \nonumber \\&+ \mu N^{\text {SR}}_{\text {Sig},i,j}(\theta ) \Big ), \end{aligned}$$
(2)
$$\begin{aligned} {\mathcal {L}} _{3\ell }=&\, \prod _{i} {\mathcal {P}} \Big ( N^{3\ell }_{\text {obs},i} \mid N^{3\ell }_{\text {other},i}(\theta ) + \mu _{{\mathrm{V}{}{}} {\mathrm{V}{}{}},r(i)} N^{3\ell }_{{\mathrm{W}} {\mathrm{Z}} ,i}(\theta ) \Big ), \end{aligned}$$
(3)
$$\begin{aligned} {\mathcal {L}} _{4\ell }=&\, \prod _{i} {\mathcal {P}} \Big ( N^{4\ell }_{\text {obs},i} \mid N^{4\ell }_{\text {other},i}(\theta ) + \mu _{{\mathrm{V}{}{}} {\mathrm{V}{}{}},r(i)} N^{4\ell }_{{\mathrm{Z}} {\mathrm{Z}} ,i}(\theta ) \Big ), \end{aligned}$$
(4)
$$\begin{aligned} {\mathcal {L}} _{{\mathrm{e}} {{\upmu }{}{}}}=&\,{\mathcal {P}} \Big ( N^{{\mathrm{e}} {{\upmu }{}{}}}_{\text {obs}} \mid \mu _{\text {NRB}}N^{{\mathrm{e}} {{\upmu }{}{}}}_{\text {NRB}}(\theta ) + N^{{\mathrm{e}} {{\upmu }{}{}}}_{\text {other}}(\theta ) \Big ), \end{aligned}$$
(5)
$$\begin{aligned} {\mathcal {L}} _{\text {DY}} =&\, {\mathcal {P}} \Big ( N^{\text {DY}}_{\text {obs}} \mid \mu _{\text {DY}}N^{\text {DY}}_{\text {DY}}(\theta ) +\mu _{\text {NRB}}N^{\text {DY}}_{\text {NRB}}(\theta ) \nonumber \\&+N^{\text {DY}}_{\text {other}}(\theta ) + N^{\text {DY}}_{{\mathrm{Z}} {\mathrm{Z}} }(\theta ) + N^{\text {DY}}_{{\mathrm{W}} {\mathrm{Z}} }(\theta ) + \mu N^{\text {DY}}_{\text {Sig}}(\theta ) \Big ). \end{aligned}$$
(6)

The purpose of the fit is to determine the confidence interval for the signal strength \(\mu \). Here \({\mathcal {P}}(N\mid \lambda )\) is the Poisson probability to observe N events for an expected value of \(\lambda \), and \(f_{\text {NP}}(\theta \mid {\hat{\theta }})\) describes the nuisance parameters with log-normal probability density functions used for normalization uncertainties and Gaussian functions used for shape-based uncertainties. The index i indicates the bin of the \(p_{\mathrm {T}} ^\text {miss}\) or \(m_{\mathrm {T}}\) distribution, r(i) corresponds to the region (low, medium, high) of bin i, and the index j indicates either the 0-jet or 1-jet selection. The diboson process normalization in the region r(i) is \(\mu _{{\mathrm{V}{}{}} {\mathrm{V}{}{}},r(i)}\), while \(\mu _{\text {DY}}\) is the DY background normalization and \(\mu _{\text {NRB}}\) is the normalization for the nonresonant background. The yield prediction from simulation for process x in region y is denoted by \(N^{y}_{x}\). The smaller backgrounds in each region are merged together and are indicated collectively as “other”. The method above for constructing likelihood functions follows that of Ref. [73], where a more detailed mathematical description may be found.
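To make the structure of the likelihood concrete, the sketch below evaluates a negative log-likelihood for a heavily reduced version of it: one SR bin and the single-bin \({\mathrm{e}} {{\upmu }{}{}} \) CR, sharing the nonresonant normalization. It is not the analysis implementation. The function names, the dictionary keys, the single nuisance parameter, and its parameterization (yields scaled by `kappa ** theta` with a unit Gaussian constraint on `theta`, a common way to encode a log-normal normalization uncertainty) are all assumptions made for this example.

```python
import math

def log_poisson(n_obs, lam):
    """Logarithm of the Poisson probability P(n_obs | lam)."""
    return n_obs * math.log(lam) - lam - math.lgamma(n_obs + 1)

def nll(mu, mu_nrb, theta, data, pred, kappa=1.05):
    """Negative log-likelihood for one SR bin plus the e-mu CR bin.

    mu      signal strength
    mu_nrb  nonresonant-background normalization, shared by SR and CR
    theta   one hypothetical normalization nuisance parameter; all yields
            scale by kappa**theta and theta has a unit Gaussian constraint
    """
    scale = kappa ** theta
    lam_sr = scale * (mu * pred["sr_sig"] + mu_nrb * pred["sr_nrb"] + pred["sr_other"])
    lam_emu = scale * (mu_nrb * pred["emu_nrb"] + pred["emu_other"])
    constraint = 0.5 * theta ** 2
    return -(log_poisson(data["sr"], lam_sr) + log_poisson(data["emu"], lam_emu)) + constraint

# Hypothetical yields, for illustration only.
pred = {"sr_sig": 10.0, "sr_nrb": 20.0, "sr_other": 30.0, "emu_nrb": 50.0, "emu_other": 5.0}
data = {"sr": 60, "emu": 55}
value_at_nominal = nll(1.0, 1.0, 0.0, data, pred)
```

In a real fit this function would be minimized over the normalizations and nuisance parameters for each tested value of the signal strength; the full likelihood also multiplies over bins, jet categories, and the remaining CRs.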

8 Systematic uncertainties

In the following, we describe all of the uncertainties that are taken into account in the maximum likelihood fit. We consider the systematic effects on both the overall normalization and on the shape of the distribution of \(p_{\mathrm {T}} ^\text {miss}\) or \(m_{\mathrm {T}}\) for all applicable uncertainties. We evaluate the impacts by performing the full analysis with the value of the relevant parameters shifted up and down by one standard deviation. The final varied distributions of \(p_{\mathrm {T}} ^\text {miss}\) or \(m_{\mathrm {T}}\) are used for signal extraction and as input to the fit. For each source of uncertainty, variations in the distributions are thus treated as fully correlated, while independent sources of uncertainty are treated as uncorrelated. Except where noted otherwise, the systematic uncertainties for the three different years of data taking are treated as correlated.

The assigned uncertainties in the integrated luminosity are 2.5, 2.3, and 2.5% for the 2016, 2017, and 2018 data samples [74,75,76], respectively, and are treated as uncorrelated across the different years.

We apply scale factors to all simulated samples to correct for discrepancies in the lepton reconstruction and identification efficiencies between data and simulation. These factors are measured using DY events in the \({\mathrm{Z}} \) boson peak region [65, 77, 78] that are recorded with unbiased triggers. The factors depend on the lepton \(p_{\mathrm {T}}\) and \(\eta \) and are within a few percent of unity for electrons and muons. The uncertainty in the determination of the trigger efficiency leads to an uncertainty smaller than \(1\%\) in the expected signal yield.

For the kinematic regions used in this analysis, the lepton momentum scale uncertainty for both electrons and muons is well represented by a constant value of \(0.5\%\). The uncertainty in the calibration of the jet energy scale (JES) and resolution directly affects the \(p_{\mathrm {T}} ^\text {miss}\) computation and all the selection requirements related to jets. The estimate of the JES uncertainty is performed by varying the JES. The variation corresponds to a re-scaling of the jet four-momentum as \(p \rightarrow p (1 \pm \delta p_{\mathrm {T}} ^{\text {JES}}/p_{\mathrm {T}})\), where \(\delta p_{\mathrm {T}} ^{\text {JES}}\) is the absolute uncertainty in the JES, which is parameterized as a function of the \(p_{\mathrm {T}}\) and \(\eta \) of the jet. In order to account for the systematic uncertainty from the jet resolution smearing procedure, the resolution scale factors are varied within their uncertainties. Since the uncertainties in the JES are derived independently for the three data sets, they are treated as uncorrelated across the three data sets.
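The JES variation above amounts to multiplying every component of the jet four-momentum by a common factor. A minimal sketch, assuming a jet stored as a `(px, py, pz, E)` tuple and a hypothetical helper name `vary_jes`; the lookup of \(\delta p_{\mathrm {T}} ^{\text {JES}}\) as a function of jet \(p_{\mathrm {T}}\) and \(\eta \) is not reproduced here and is passed in as a number.

```python
import math

def vary_jes(jet, delta_pt_jes, direction):
    """Rescale a jet four-momentum as p -> p * (1 +/- delta_pT^JES / pT).

    jet          (px, py, pz, E) in GeV
    delta_pt_jes absolute JES uncertainty for this jet, in GeV
    direction    +1 for the upward variation, -1 for the downward one
    """
    if direction not in (+1, -1):
        raise ValueError("direction must be +1 or -1")
    px, py, _, _ = jet
    pt = math.hypot(px, py)
    factor = 1.0 + direction * delta_pt_jes / pt
    return tuple(component * factor for component in jet)

# Hypothetical jet with pT = 50 GeV and a 1 GeV (2%) JES uncertainty.
nominal = (30.0, 40.0, 10.0, 52.0)
jet_up = vary_jes(nominal, 1.0, +1)
jet_down = vary_jes(nominal, 1.0, -1)
```

The varied jets would then be fed through the full selection, including the recomputation of \(p_{\mathrm {T}} ^\text {miss}\), to produce the shifted templates.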

The signal processes are expected to produce very few events containing b jets, and we reject events with any jets that satisfy the b tagging algorithm working point used. In order to account for the b tagging efficiencies observed in data, an event-by-event reweighting using b tagging scale factors and efficiencies is applied to simulated events. The uncertainty is obtained by varying the event-by-event weight by ±1 standard deviation. Since the uncertainties in the b tagging are derived independently for the three data sets, they are treated as uncorrelated across the three data sets. The variation of the final yields induced by this procedure is less than \(1\%\).

Simulated samples are reweighted to reproduce the pileup conditions observed in data. We evaluate the uncertainty related to pileup by recalculating these weights for variations in the total inelastic cross section by 5% around the nominal value [79]. The resulting shift in weights is propagated through the analysis and the corresponding \(p_{\mathrm {T}} ^\text {miss}\) and \(m_{\mathrm {T}}\) spectra are used as input to the maximum likelihood fit. The variation of the final yields induced by this procedure is less than 1%.

Shape-based uncertainties for the \({\mathrm{Z}} {\mathrm{Z}} \) and \({\mathrm{W}} {\mathrm{Z}} \) backgrounds, referred to jointly as \({\mathrm{V}{}{}} {\mathrm{V}{}{}} \), and signal processes are derived from variations of the renormalization and factorization scales, the strong coupling constant \(\alpha _\mathrm {S}\), and PDFs [80,81,82]. The scales are varied up and down by a factor of two. Variations of the PDF set and \(\alpha _\mathrm {S}\) are used to estimate the corresponding uncertainties in the yields of the signal and background processes following Ref. [56]. The missing higher-order EW terms in the event generation for the \({\mathrm{V}{}{}} {\mathrm{V}{}{}} \) processes yield another source of theoretical uncertainty [83, 84]. The following additional higher-order corrections are applied: a constant (approximately \(10\%\)) correction for the \({\mathrm{W}} {\mathrm{Z}} \) cross section from NLO to next-to-next-to-leading order (NNLO) in QCD calculations [85]; a constant (approximately \(3\%\)) correction for the \({\mathrm{W}} {\mathrm{Z}} \) cross section from LO to NLO in EW calculations, according to Ref. [86]; a \(\varDelta \phi ({\mathrm{Z}}, {\mathrm{Z}})\)-dependent correction to the \({\mathrm{Z}} {\mathrm{Z}} \) production cross section from NLO to NNLO in QCD calculations [87]; and a \(p_\mathrm {T}\)-dependent correction to the \({\mathrm{Z}} {\mathrm{Z}} \) cross section from LO to NLO in EW calculations, following Refs. [83, 84, 86], which is the dominant correction in the signal region. We use the product of the above NLO EW corrections and the inclusive NLO QCD corrections [88] as an estimate of the missing NLO EW\(\times \)NLO QCD contribution, which is not used as a correction, but rather assigned as an uncertainty. The resulting variations in the \(p_{\mathrm {T}} ^\text {miss}\) and \(m_{\mathrm {T}}\) distributions are used as a shape uncertainty in the likelihood fit.

The shapes of the \(p_{\mathrm {T}} ^\text {miss}\) and \(m_{\mathrm {T}}\) distributions are needed for each of the background processes. For the DY and nonresonant processes, we take the shape directly from simulation. The distributions for the \({\mathrm{Z}} {\mathrm{Z}} \) and \({\mathrm{W}} {\mathrm{Z}} \) processes are obtained by taking the shapes from the simulation and normalizing them to the yield seen in the data in the CR. The gluon-induced and the quark-induced \({\mathrm{Z}} {\mathrm{Z}} \) processes have different acceptances and their uncertainties are treated separately, while the normalization factors are taken to be correlated. In all cases, the limited number of simulated events in any given bin gives rise to a systematic uncertainty. This uncertainty is treated as fully uncorrelated across the bins and processes.

A summary of the impact on the signal strength of the systematic uncertainties is shown in Table 2. The \({\mathrm{Z}} {\mathrm{h}} (\text {invisible})\) model is used as an example to illustrate the size of the uncertainties, both for the presence (\({\mathcal {B}}({\mathrm{h}} \rightarrow \text {invisible})=1\)) and absence (\({\mathcal {B}}({\mathrm{h}} \rightarrow \text {invisible})=0\)) of a signal. These two paradigms are used to generate Asimov data sets that are then fit to give the uncertainty estimates shown in Table 2. The systematic uncertainties are dominated by the theoretical uncertainty in the \({\mathrm{Z}} {\mathrm{Z}} \) and \({\mathrm{W}} {\mathrm{Z}} \) background contributions.

Table 2 Summary of the uncertainties in the branching fraction arising from the systematic uncertainties considered in the \({\mathrm{Z}} {\mathrm{h}} (\text {invisible})\) model assuming \({\mathcal {B}}({\mathrm{h}} \rightarrow \text {invisible})=1\) (signal) and \({\mathcal {B}}({\mathrm{h}} \rightarrow \text {invisible})=0\) (no signal). Here, lepton measurement refers to the combined trigger, lepton reconstruction and identification efficiencies, and the lepton momentum and electron energy scale systematic uncertainties. Theory uncertainties include variations of the renormalization and factorization scales, \(\alpha _\mathrm {S}\), and PDFs as well as the higher-order EW corrections

9 Results

The number of observed and expected events in the SR after the final selection is given in Table 3, where the values of the expected yields and their uncertainties are obtained from the maximum likelihood fit. The observed numbers of events are compatible with the background predictions. The expected yields and the product of acceptance and efficiency for several signal models used in the analysis are shown in Table 4. The post-fit \(p_{\mathrm {T}} ^\text {miss}\) distributions for events in the signal region in the 0-jet and 1-jet categories are shown in Fig. 3. The final \(m_{\mathrm {T}} \) distributions used for the 2HDM+ \(\textsf {a}\) model are shown in Fig. 4.

Table 3 Observed number of events and post-fit background estimates in the two jet multiplicity categories of the SR. The reported uncertainty represents the sum in quadrature of the statistical and systematic components
Table 4 Expected yields and the product of acceptance and efficiency for several models probed in the analysis. The quoted values correspond to the \({\mathrm{Z}} \rightarrow \ell \ell \) decays. The reported uncertainty represents the sum in quadrature of the statistical and systematic components
Fig. 3
figure 3

The \(p_{\mathrm {T}} ^\text {miss}\) distributions for events in the signal region in the 0-jet (upper) and 1-jet (lower) categories. The rightmost bin also includes events with \(p_{\mathrm {T}} ^\text {miss} >800\,\text {Ge}\text {V} \). The uncertainty band includes both statistical and systematic components. The \({\mathrm{Z}} {\mathrm{h}} (\text {invisible})\) signal normalization assumes SM production rates and the branching fraction \({\mathcal {B}}({\mathrm{h}} \rightarrow \text {invisible})=1\). For the ADD model, the signal normalization assumes the expected values for \(n=4\) and \(M_{\mathrm {D}}=2\,\text {Te}\text {V} \)

Fig. 4
figure 4

The \(m_{\mathrm {T}}\) distributions for events in the signal region in the 0-jet (upper) and 1-jet (lower) categories. The rightmost bin includes all events with \(m_{\mathrm {T}} >1000\,\text {Ge}\text {V} \). The uncertainty band includes both statistical and systematic components. The signal normalization assumes the expected values for \(m_{\mathrm{H}} =1200\,\text {Ge}\text {V}, m_{\,\textsf {a}}=300\,\text {Ge}\text {V} \) within the 2HDM+ \(\textsf {a}\) framework where \(m_{{\mathrm{H}}}=m_{{\mathrm{H}}^{\pm }}=m_{{\mathrm{A}}}\), \(\tan \beta =1\) and \(\sin \theta =0.35\)

For each of the models considered, simulated signal samples are generated for relevant sets of model parameters. The observed \(p_{\mathrm {T}} ^\text {miss}\) and \(m_{\mathrm {T}} \) spectra are used to set limits on theories of new physics using the modified frequentist construction \(\text {CL}_\text {s}\) [73, 89, 90] in the asymptotic approximation [91].
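The idea behind the \(\text {CL}_\text {s}\) construction can be illustrated with a deliberately simplified case: a single-bin counting experiment with a known background, no systematic uncertainties, and the observed event count as the test statistic. This is not the procedure used for the results, which rely on the binned likelihood of Sect. 7 and asymptotic formulae; the function names and the bisection search are assumptions made for this sketch.

```python
import math

def poisson_cdf(n_obs, lam):
    """P(N <= n_obs) for a Poisson distribution with mean lam."""
    term = math.exp(-lam)
    total = term
    for k in range(1, n_obs + 1):
        term *= lam / k
        total += term
    return total

def cls(n_obs, s, b):
    """CLs = CL_{s+b} / CL_b for a counting experiment."""
    return poisson_cdf(n_obs, s + b) / poisson_cdf(n_obs, b)

def upper_limit(n_obs, b, alpha=0.05, s_max=100.0):
    """Signal yield at which CLs drops to alpha, found by bisection.

    Assumes the limit lies below s_max (a choice made for this sketch).
    """
    lo, hi = 0.0, s_max
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if cls(n_obs, mid, b) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical counting experiment: 3 events observed over an expected background of 3.
s95 = upper_limit(n_obs=3, b=3.0)
```

A signal hypothesis is excluded at 95% \(\text {CL}\) when its \(\text {CL}_\text {s}\) value falls below 0.05. Dividing by the background-only probability protects against excluding signals to which the experiment has little sensitivity when the data fluctuate below the background expectation.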

9.1 Simplified dark matter model interpretation

In the framework of the simplified models of DM, the signal production is sensitive to the mass, spin, and parity of the mediator as well as the coupling strengths of the mediator to quarks and to DM. The \(p_{\mathrm {T}} ^\text {miss}\) distribution is used as an input to the fit. Limits for the vector and axial-vector mediators are shown in Fig. 5 as a function of the mediator mass \(m_{\text {med}}\) and DM particle mass \(m_{\upchi }{}{} \). Cosmological constraints on the DM abundance [92] are added to Fig. 5, where the shaded area represents the region where additional physics would be needed to describe the DM abundance. For vector mediators, we observe a limit around \(m_{\text {med}}>870\,\text {Ge}\text {V} \) for most values of \(m_{\upchi }{}{} \) less than \(m_{\text {med}}/2\). For axial-vector mediators the highest limit reached in the allowed region is about \(m_{\text {med}}>800\,\text {Ge}\text {V} \). In both cases, the previous limits from this channel are extended by about 150\(\,\text {Ge}\text {V}\), but the limits are still less restrictive than those from published mono-jet results [7] because weakly coupled Z bosons are radiated from the initial state quarks much less frequently than gluons. Figure 6 shows the 90% \(\text {CL}\) limits on the DM-nucleon cross sections calculated following the suggestions in Ref. [22]. Limits are shown as a function of the DM particle mass for both the spin-independent and spin-dependent cases and compared to selected results from direct-detection experiments.

Fig. 5
figure 5

The 95% \(\text {CL}\) exclusion limits for the vector (upper) and the axial-vector (lower) simplified models. The limits are shown as a function of the mediator and DM particle masses. The coupling to quarks is fixed to \(g_{{\mathrm{q}}}=0.25\) and the coupling to DM is set to \(g_\chi =1\)

Fig. 6
figure 6

The 90% \(\text {CL}\) upper limits on the DM-nucleon cross section for simplified DM in the spin-independent (upper) and spin-dependent (lower) cases. The coupling to quarks is set to \(g_{{\mathrm{q}}}=0.25\) and the coupling to DM is set to \(g_\chi =1\). Limits from the XENON1T [93], LUX [94], PandaX-II [95], CRESST-III [96], and DarkSide-50 [97] experiments are shown for the spin-independent case with vector couplings. Limits from the PICO-60 [98], PICO-2L [99], IceCube [100], and Super-Kamiokande [101] experiments are shown for the spin-dependent case with axial-vector couplings

In addition to vector and axial-vector mediators, scalar and pseudoscalar mediators are also tested. For these models, we fix both couplings to quarks and to DM particles: \(g_{{\mathrm{q}}}=1\) and \(g_\chi =1\) as suggested in Ref. [22]. Since the choice of DM particle mass is shown to have negligible effects on the kinematic distributions of the detected particles, we set it to the constant value of \(m_{\upchi }{}{} =1\,\text {Ge}\text {V} \). Figure 7 gives the 95% \(\text {CL}\) exclusion limits on the ratio of the production cross section to the predicted cross section as a function of the mediator mass \(m_{\text {med}}\). The expected limits are about 25% better than the previous results in this channel [4], but are not yet sensitive enough to exclude any value of \(m_{\text {med}}\). The best limits obtained on the cross section are about 1.5 times larger than the predicted values for low values of \(m_{\text {med}}\).

Fig. 7
figure 7

The 95% \(\text {CL}\) upper limits on the cross section for simplified DM models with scalar (upper) and pseudoscalar (lower) mediators. The coupling to quarks is set to \(g_{{\mathrm{q}}}=1\), the coupling to DM is set to \(g_\chi =1\) and the DM mass is \(m_\chi =1\,\text {Ge}\text {V} \)

9.2 Two-Higgs-doublet model interpretation

For the 2HDM+ \(\textsf {a}\) model, the signal production is sensitive to the heavy Higgs boson and the pseudoscalar  \(\textsf {a}\) masses. As discussed in Sect. 7, the \(m_{\mathrm {T}}\) distribution is used in the fit rather than \(p_{\mathrm {T}} ^\text {miss}\). The limits on both the heavy Higgs boson and the additional pseudoscalar mediator  \(\textsf {a}\) are shown in Fig. 8. The mixing angles are set to \(\tan \beta =1\) and \(\sin \theta =0.35\) with a DM particle mass of \(m_{{\upchi }{}{}}=10\,\text {Ge}\text {V} \). The mediator mass with the most sensitivity is \(m_{{\mathrm{H}}}=1000\,\text {Ge}\text {V} \), where the observed (expected) limit on \(m_{\,\textsf {a}}\) is 440 (340)\(\,\text {Ge}\text {V}\). For small values of \(m_{\,\textsf {a}}\), the limit on \(m_{\mathrm{H}} \) is about 1200\(\,\text {Ge}\text {V}\). These can be compared with the observed (expected) limits from ATLAS of \(m_{\,\textsf {a}}>340\) (340)\(\,\text {Ge}\text {V}\) and \(m_{{\mathrm{H}}}>1050\) (1000)\(\,\text {Ge}\text {V}\) based on a \(\sqrt{s}=13\,\text {Te}\text {V} \) data set corresponding to an integrated luminosity of 36\(\,\text {fb}^{-1}\)  [102].

Fig. 8
figure 8

The 95% \(\text {CL}\) upper limits on the 2HDM+ \(\textsf {a}\) model with the mixing angles set to \(\tan \beta =1\) and \(\sin \theta =0.35\) and with a DM particle mass of \(m_{{\upchi }{}{}}=10\,\text {Ge}\text {V} \). The limits are shown as a function of the heavy Higgs boson and the pseudoscalar masses

9.3 Invisible Higgs boson interpretation

For the search for invisible decays of the Higgs boson, we use the \(p_{\mathrm {T}} ^\text {miss}\) distribution as input to the fit. We obtain upper limits on the product of the Higgs boson production cross section and branching fraction to invisible particles \(\sigma _{{\mathrm{Z}} {\mathrm{h}}}{\mathcal {B}}({\mathrm{h}} \rightarrow {\text {invisible}})\). This can be interpreted as an upper limit on \({\mathcal {B}}({\mathrm{h}} \rightarrow {\text {invisible}})\) by assuming the production rate [52, 103, 104] for an SM Higgs boson at \(m_{{\mathrm{h}}} = 125\,\text {Ge}\text {V} \). The observed (expected) 95% \(\text {CL}\) upper limit at \(m_{{\mathrm{h}}} = 125\,\text {Ge}\text {V} \) on \({\mathcal {B}}({\mathrm{h}} \rightarrow {\text {invisible}})\) is 29% (\(25^{+9}_{-7}\)%) as shown in Fig. 9. The observed (expected) limit from the previous CMS result in this channel was \({\mathcal {B}}({\mathrm{h}} \rightarrow {\text {invisible}})< 45 (44)\)%. Combinations of all earlier results yield observed (expected) limits of 19 (15)% from CMS [19] and 26% (\(17^{+5}_{-5}\)%) from ATLAS [20].

Fig. 9
figure 9

The value of the negative log-likelihood, \(-2\varDelta \ln {\mathcal {L}}\), as a function of the branching fraction of the Higgs boson decaying to invisible particles

9.4 Unparticle interpretation

In the unparticle scenario, the same analysis of the \(p_{\mathrm {T}} ^\text {miss}\) spectrum is performed. At 95% \(\text {CL}\), upper limits are set on the cross section with \(\varLambda _\textsf {U}=15\,\text {Te}\text {V} \). The limits are shown in Fig. 10 as a function of the scaling dimension \(d_\textsf {U}\). The observed (expected) limits are 0.5 (0.7) pb, 0.24 (0.26) pb, and 0.09 (0.07) pb for \(d_\textsf {U} = 1\), \(d_\textsf {U} = 1.5\), and \(d_\textsf {U} = 2\), respectively, compared to 1.0 (1.0) pb, 0.4 (0.4) pb, and 0.15 (0.15) pb for the earlier result [4]. These limits depend on the choice of \(\lambda \) and \(\varLambda _\textsf {U}\), as the cross section scales with the Wilson coefficient \(\lambda /\varLambda _\textsf {U}\) [30]. We fix the coupling between the SM and the unparticle fields to \(\lambda =1\).

Fig. 10
figure 10

The 95% \(\text {CL}\) upper limits on unparticle+Z production cross section, as a function of the scaling dimension \(d_\textsf {U}\). These limits apply to fixed values of the effective cutoff scale \(\varLambda _\textsf {U}=15\,\text {Te}\text {V} \) and coupling \(\lambda =1\)

9.5 The ADD interpretation

In the framework of the ADD model of extra dimensions, we use the fits to the \(p_{\mathrm {T}} ^\text {miss}\) distribution to calculate limits on the number of extra dimensions n and the fundamental Planck scale \(M_{\mathrm {D}}\). The cross section limit calculated as a function of \(M_{\mathrm {D}}\) for the case where \(n=4\) is shown in Fig. 11. The limits on \(M_{\mathrm {D}}\) as a function of n are shown in Fig. 12. The observed (expected) 95% \(\text {CL}\) lower limit on \(M_{\mathrm {D}}\) is 2.9–3.0 (2.7–2.8)\(\,\text {Te}\text {V}\), compared to earlier results of 2.3–2.5 (2.3–2.5)\(\,\text {Te}\text {V}\) [4].

Fig. 11
figure 11

The 95% \(\text {CL}\) cross section limit in the ADD scenario as a function of \(M_{\mathrm {D}}\) for \(n=4\)

Fig. 12
figure 12

The 95% \(\text {CL}\) expected and observed exclusion limits on \(M_{\mathrm {D}}\) as a function of the number of extra dimensions n

9.6 Summary of limits

Table 5 gives a summary of the limits expected and observed for a selection of relevant parameters in all of the models considered.

Table 5 Observed and expected 95% \(\text {CL}\) limits on parameters for the simplified DM models, invisible decays of the Higgs boson, two-Higgs-doublet model, large extra dimensions in the ADD scenario, and unparticle model. For the scalar and pseudoscalar mediators, the limits are dependent on the mediator mass, so the lowest values for the ratio of observed to theoretical cross sections are presented. For the vector and axial-vector mediators, the limits are dependent on the DM particle mass, so the limits are shown for \(m_{\upchi }{}{} <300\,\text {Ge}\text {V} \) for the vector mediator and \(m_{\upchi }{}{} =240\,\text {Ge}\text {V} \) for the axial-vector mediator

10 Summary

Events with a Z boson recoiling against missing transverse momentum in proton–proton collisions at the LHC are used to search for physics beyond the standard model. The results are interpreted in the context of several different models of the coupling mechanism between dark matter and ordinary matter: simplified models of dark matter with vector, axial-vector, scalar, and pseudoscalar mediators; invisible decays of a 125\(\,\text {Ge}\text {V}\) scalar Higgs boson; and a two-Higgs-doublet model with an extra pseudoscalar. Outside the context of dark matter, models that invoke large extra dimensions or propose the production of unparticles could contribute to the same signature and are also considered. The observed limits on the production cross sections are used to constrain parameters of each of these models. The search utilizes a data set collected by the CMS experiment in 2016–2018, corresponding to an integrated luminosity of 137\(\,\text {fb}^{-1}\) at \(\sqrt{s}=13\,\text {Te}\text {V} \). No evidence of physics beyond the standard model is observed. Compared with the previous results in this channel based on a partial data sample collected at \(\sqrt{s}=13\,\text {Te}\text {V} \) in 2016, corresponding to an integrated luminosity of approximately 36\(\,\text {fb}^{-1}\) for CMS [4] and for ATLAS [5], the exclusion limits for simplified dark matter mediators, gravitons, and unparticles are significantly extended. For the case of a 125\(\,\text {Ge}\text {V}\) scalar boson, an upper limit of 29% is set for the branching fraction to fully invisible decays at 95% confidence level. Results for the two-Higgs-doublet model with an additional pseudoscalar are presented in this final state and probe masses of the pseudoscalar mediator up to 440\(\,\text {Ge}\text {V}\) and of the heavy Higgs boson up to 1200\(\,\text {Ge}\text {V}\) when the other model parameters are set to specific benchmark values.