Measurement of WZ and ZZ production in pp collisions at √s = 8 TeV in final states with b-tagged jets

Measurements are reported of the WZ and ZZ production cross sections in proton-proton collisions at √s = 8 TeV in final states where one Z boson decays to b-tagged jets. The other gauge boson, either W or Z, is detected through its leptonic decay (either W → eν, μν or Z → e+e−, μ+μ−, or νν̄). The results are based on data corresponding to an integrated luminosity of 18.9 fb−1 collected with the CMS detector at the Large Hadron Collider. The measured cross sections, σ(pp → WZ) = 30.7 ± 9.3 (stat.) ± 7.1 (syst.) ± 4.1 (th.) ± 1.0 (lum.) pb and σ(pp → ZZ) = 6.5 ± 1.7 (stat.) ± 1.0 (syst.) ± 0.9 (th.) ± 0.2 (lum.) pb, are consistent with next-to-leading order quantum chromodynamics calculations.

Originally published at: CMS Collaboration; Canelli, M F; Chiochia, V; Kilminster, B; Robmann, P; et al (2014). Measurement of WZ and ZZ production in pp collisions at √s = 8 TeV in final states with b-tagged jets. European Physical Journal C Particles and Fields, 74:2973. DOI: https://doi.org/10.1140/epjc/s10052-014-2973-5
Report number: EUROPEAN ORGANIZATION FOR NUCLEAR RESEARCH (CERN), CERN-PH-EP/2014-022, 2014/08/12
Journal Article, Accepted Version, posted at the Zurich Open Repository and Archive, University of Zurich. ZORA URL: https://doi.org/10.5167/uzh-108356
This work is licensed under a Creative Commons Attribution 3.0 Unported (CC BY 3.0) License.


Introduction
The study of WZ and ZZ (referred to collectively as VZ) diboson production in proton-proton collisions provides an important test of the gauge sector of the standard model (SM). In pp collisions at √s = 8 TeV, the predicted cross sections are σ(pp → WZ) = 22.3 ± 1.1 pb and σ(pp → ZZ) = 7.7 ± 0.4 pb at next-to-leading order (NLO) in quantum chromodynamics (QCD) [1]. A significant deviation from these theoretical values would indicate contributions from physics beyond the SM. Both processes constitute important backgrounds to the associated production of V and standard model Higgs (H) bosons, especially in those channels involving H → bb decays. The production rate of two vector bosons in pp collisions at the Large Hadron Collider (LHC) has been measured by the ATLAS and Compact Muon Solenoid (CMS) Collaborations in all-leptonic WZ and ZZ decay modes [2-5].
We present a measurement of the VZ production cross sections in the VZ → Vbb decay mode, where the V decays leptonically: Z → νν, W± → ℓ±ν, and Z → ℓ+ℓ−, with ℓ corresponding to either e or μ. Contributions from W → τν with leptonic τ decays are included in the W± → ℓ±ν channels. The analysis uses final states with no charged leptons (0-lepton), a single lepton (1-lepton), or two leptons (2-lepton), with the electron and muon channels analyzed separately. The Z boson decays to b quarks are selected by requiring the presence of two b-tagged jets. The results are based on data corresponding to an integrated luminosity of 18.9 fb−1 collected with the CMS detector at the LHC. Two methods are used in the analysis: one involves a fit to the output of a multivariate discriminant, and the other a fit to the two-jet mass ($m_{bb}$) distribution.
The cross sections are calculated simultaneously for WZ and ZZ production at transverse momenta of the accompanying V of $p_T^V > 100$ GeV, for Z boson masses falling within the window $60 < M_Z < 120$ GeV. The latter requirement ensures a uniform treatment of interference with background processes. Approximately 15 % of the WZ and 14 % of the ZZ total inclusive cross sections are contained within their respective regions of acceptance for $p_T^V > 100$ GeV, as calculated using several event generators discussed in the following section. The 1-lepton channel is sensitive almost exclusively to WZ production, while the 2-lepton modes are restricted to the ZZ process. The channel with no charged leptons is sensitive to both production modes, with ZZ and WZ channels contributing 70 % and 30 %, respectively, to these events. The 0-lepton WZ events contribute primarily when the lepton from W± → ℓ±ν falls outside of acceptance.

CMS detector, triggering, object reconstruction and event simulation
A description of the CMS detector can be found in Ref. [6]. Particles produced in pp collisions are detected in the pseudorapidity range |η| < 5, where η = −ln[tan(θ/2)], and θ is the polar angle relative to the direction of the counterclockwise circulating proton beam. The CMS detector comprises a superconducting solenoid, providing a uniform axial magnetic field of 3.8 T over a cylindrical region that is 12.5 m long and 6 m in diameter. The magnetic volume contains a silicon pixel and strip tracking system (|η| < 2.5), surrounded by a lead tungstate crystal electromagnetic calorimeter (ECAL) and a brass/scintillator hadronic calorimeter (HCAL) at |η| < 3.0. A steel/quartz-fiber Cherenkov calorimeter extends the coverage to |η| = 5. The steel flux-return yoke outside the solenoid is instrumented with gas-ionization detectors used to identify muons at |η| < 2.4.
The 1-lepton channels rely on several single-lepton triggers with $p_T$ thresholds between 17 and 30 GeV and restrictive lepton identification. The 2-lepton channels use the same single-muon triggers for selecting the Z → μ+μ− events, and 2-electron triggers with $p_T$ thresholds of 17 and 8 GeV for the electron of higher and lower $p_T$, respectively, and with more restrictive isolation requirements, for selecting the Z → e+e− events.
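The pseudorapidity definition quoted above, η = −ln[tan(θ/2)], can be made concrete with a short sketch. The function names below are illustrative only and are not taken from any CMS software; the code simply evaluates the formula and its inverse.

```python
import math

def pseudorapidity(theta: float) -> float:
    """Return eta = -ln(tan(theta/2)) for a polar angle theta in radians."""
    if not 0.0 < theta < math.pi:
        raise ValueError("theta must lie strictly between 0 and pi")
    return -math.log(math.tan(theta / 2.0))

def polar_angle(eta: float) -> float:
    """Invert the definition: theta = 2 * atan(exp(-eta)), in radians."""
    return 2.0 * math.atan(math.exp(-eta))

# Edge of the tracker acceptance, |eta| = 2.5, expressed as an angle to the beam axis.
tracker_edge_deg = math.degrees(polar_angle(2.5))
```

With this definition, a particle emitted perpendicular to the beam has η = 0, and the tracker limit |η| = 2.5 corresponds to roughly 9.4° from the beam axis.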
A combination of several triggers is used for the events without charged leptons: all triggers require $E_T^{\mathrm{miss}}$ to be above a given threshold, such that the trigger efficiency rises from 70 % at $E_T^{\mathrm{miss}} = 100$ GeV to 99 % at $E_T^{\mathrm{miss}} = 170$ GeV.
Electron reconstruction requires a match of a cluster in the ECAL to a track reconstructed in the silicon tracker [7-9]. Electron identification relies on a multivariate technique that combines observables sensitive to the amount of bremsstrahlung emitted along the electron trajectory, the match in position and energy of the electron trajectory with the associated cluster, as well as the energy distribution in the cluster. Additional requirements are imposed to minimize background from electrons produced through photons converting into e+e− pairs while traversing the tracker material. Electron candidates are considered if observed in the pseudorapidity range |η| < 2.5, excluding the transition regions at 1.44 < |η| < 1.57 between the ECAL barrel and endcaps.
Muons are reconstructed using two algorithms [10]: one in which tracks in the silicon tracker are matched to signals in the muon chambers, and another in which a global fit is performed to the track that is seeded by signals detected in the outer muon system. The muon candidates are required to be reconstructed by both algorithms. Additional identification criteria are imposed on muon candidates to reduce the fraction of tracks misidentified as muons. These include the number of hits reconstructed in the tracker and in the muon system, the quality of the global fit to a muon trajectory, and its consistency with originating from the primary vertex. Muon candidates are finally required to fall in the |η| < 2.4 range.
Jets are reconstructed from particle-flow [11,12] objects using the anti-$k_T$ jet clustering algorithm [13], with a distance parameter of 0.5, as implemented in the fastjet package [14,15]. Each jet is required to lie within |η| < 2.5 and have $p_T > 20$ GeV. Jet energy corrections are applied as a function of η and $p_T$ of the jet [16]. The imbalance in transverse momentum (often referred to as the "missing transverse energy vector") is calculated as the negative of the vectorial sum of the $p_T$ of all particle-flow objects identified in the event, and the magnitude of this vector is referred to as $E_T^{\mathrm{miss}}$. The procedures of Ref. [17] are applied on an event-by-event basis to mitigate the effects of multiple interactions per beam crossing (pileup).
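As an illustration of the missing transverse energy definition above, the following minimal sketch computes the negative vector sum of the transverse momenta of a list of reconstructed objects. The input format, a list of hypothetical (pT, φ) pairs, is an assumption made for the example and does not reflect the actual particle-flow data structures.

```python
import math

def missing_et(particles):
    """Return (magnitude, phi) of the negative vector sum of transverse momenta.

    particles: iterable of (pt, phi) pairs, pt in GeV and phi in radians.
    """
    particles = list(particles)
    # Negative of the summed x and y momentum components.
    px = -sum(pt * math.cos(phi) for pt, phi in particles)
    py = -sum(pt * math.sin(phi) for pt, phi in particles)
    return math.hypot(px, py), math.atan2(py, px)

# Two back-to-back objects of 50 and 30 GeV leave a 20 GeV imbalance
# pointing opposite to the harder object.
met, met_phi = missing_et([(50.0, 0.0), (30.0, math.pi)])
```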
The CMS combined secondary-vertex (CSV) b-tagging algorithm [18] is used to identify jets that are likely to originate from the hadronization of b quarks. This algorithm combines the information about track impact parameters and secondary vertices in a discriminant that distinguishes b jets from jets originating from light quarks, gluons, or c quarks. The output of the CSV algorithm is a continuous discriminator with a value in the range 0 to 1, where typical thresholds for b jet selection range from loose (≈0.2) to tight (≈0.9). Depending on the chosen CSV threshold, the efficiencies for tagging jets originating from b quarks range from 50 % (tight) to 75 % (loose), while the misidentification rates for c quarks range from 5 % (tight) to 25 % (loose) and for light quarks or gluons range from 0.2 % (tight) to 3 % (loose).
The b-jet energy resolution is improved by applying multivariate regression techniques similar to those used in the CDF experiment [19]. An additional correction, beyond the standard CMS jet energy corrections, is derived from simulated events to recalibrate each b-tagged jet with the generated b quark energy. This involves a specialized boosted decision tree (BDT) [20,21] trained on simulated signal events, with inputs that include information on jet structure, such as information about individual tracks, jet constituents, information on semileptonic b-hadron decays, and the presence of any low-$p_T$ leptons. The BDT correction, identical to that used in Ref. [17], improves the resolution of the mass of the bb system by ≈15 %, resulting in an increase in the sensitivity of the analysis of 10-20 %, depending on the specific channel. The Z → bb invariant mass resolution after this correction is ≈10 %.
Simulated samples of events are produced using several event generators, and the response of the CMS detector is modeled using the Geant4 program [22]. The MadGraph 5.1 [23] generator is used to generate the diboson signals, as well as the background from W+jets, Z+jets, and tt events. The single-top-quark samples are generated with powheg [24-27], and generic multijet samples using pythia 6.4 [28]. VH event samples with a SM H boson mass of $m_H = 125$ GeV are also produced using the powheg [29] event generator interfaced to herwig++ [30] for parton showering and hadronization. The NLO MSTW2008 set [31] of parton distribution functions (PDF) is used to produce the NLO powheg samples, while the leading-order (LO) CTEQ6L1 set [32] is used for the events that correspond to LO calculations. The Z2Star tune [33] is used to parametrize the underlying event. Corrections to account for differences in efficiencies between data and simulation are measured in data using a tag-and-probe technique [34], and applied as individual weights to each of the simulated events.

Event selection
We use the analysis techniques developed in the CMS VH studies of Ref. [17]. Event selection is based on the reconstruction of a vector boson that decays leptonically in association with the Z boson that decays into two b-tagged jets. Dominant backgrounds to VZ production include V+b jets, V+light flavor (LF = udsc quark or gluon) jets, tt, single-top-quark, generic multijet, and H boson production. In general, b-tagging reduces the contributions from LF events, and counting additional jet activity is used to reduce background from tt and single-top-quark events. Finally, the value of $m_{bb}$ provides a way to distinguish VZ from V+b and SM VH production, as discussed below.
The reconstruction of a Z → bb decay proceeds by selecting two central jets from the primary vertex with |η| < 2.5, each with a $p_T$ above some chosen threshold, and defining the bb candidate as the jet pair with the largest vectorial sum of transverse momenta ($p_T^{bb}$). This combination is very efficient for $p_T^V > 100$ GeV without biasing the differential distribution of the background, and also defines the two-jet mass $m_{bb}$, which is required to be < 250 GeV. The two selected jets are also required to be tagged as b jets, with a value of the CSV discriminator that depends on the specific nature of the event.
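The pairing rule described above can be sketched as follows. This is a simplified illustration, not the analysis code: jets are represented as hypothetical (pT, η, φ, mass) tuples, the default $p_T$ threshold of 30 GeV is only an example value (the text says the threshold depends on the channel), and the b-tagging requirement is not modeled.

```python
import math
from itertools import combinations

def to_cartesian(pt, eta, phi, mass=0.0):
    """Convert (pT, eta, phi, mass) to an (E, px, py, pz) four-vector in GeV."""
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz

def select_bb_candidate(jets, pt_min=30.0, eta_max=2.5, mbb_max=250.0):
    """Pick the central jet pair with the largest vector-sum pT.

    Returns ((jet1, jet2), pt_bb, m_bb), or None if there is no pair
    or the pair fails the m_bb < mbb_max requirement.
    pt_min is an illustrative default, not a value fixed by the text.
    """
    central = [j for j in jets if j[0] > pt_min and abs(j[1]) < eta_max]
    best = None
    for jet1, jet2 in combinations(central, 2):
        # Sum the two four-vectors component by component.
        e, px, py, pz = (a + b for a, b in zip(to_cartesian(*jet1), to_cartesian(*jet2)))
        pt_bb = math.hypot(px, py)
        if best is None or pt_bb > best[1]:
            m_bb = math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))
            best = ((jet1, jet2), pt_bb, m_bb)
    if best is None or best[2] >= mbb_max:
        return None
    return best
```

Choosing the pair by the largest vector-sum $p_T$, rather than by the mass closest to the Z boson mass, avoids sculpting the $m_{bb}$ distribution of the background toward the signal region.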
Candidate W± → ℓ±ν decays in WZ events are identified through the presence of a single isolated lepton and significant $E_T^{\mathrm{miss}}$. Electrons and muons are required to have $p_T > 30$ GeV and $p_T > 20$ GeV, respectively. To reduce contamination from generic multijet processes, the $E_T^{\mathrm{miss}}$ is required to be > 45 GeV. In addition, the azimuthal angle (Δφ) between the $E_T^{\mathrm{miss}}$ vector and the lepton is required to be < π/2. At least two jets with $p_T > 30$ GeV and a moderate CSV discriminator value are required to define the Z → bb candidate.
Candidate Z → ℓ+ℓ− decays in ZZ events are reconstructed by combining isolated, oppositely charged pairs of electrons or muons, with a dilepton invariant mass of $75 < m_{\ell\ell} < 105$ GeV. The $p_T$ of each lepton is required to be > 20 GeV. The two jets of the Z → bb candidate must pass a loose CSV discriminator value, which is optimized in simulated events for increasing the sensitivity of the analysis.
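The dilepton requirements above can be illustrated with a short sketch. It uses the standard massless-lepton approximation $m_{\ell\ell}^2 = 2\,p_{T1}\,p_{T2}\,(\cosh\Delta\eta - \cos\Delta\phi)$; the lepton tuples (pT, η, φ, charge) and function names are hypothetical, and isolation and same-flavor matching are assumed to have been applied upstream.

```python
import math

def dilepton_mass(pt1, eta1, phi1, pt2, eta2, phi2):
    """Invariant mass of two leptons, neglecting the lepton masses (GeV)."""
    return math.sqrt(2.0 * pt1 * pt2 * (math.cosh(eta1 - eta2) - math.cos(phi1 - phi2)))

def is_z_candidate(lep1, lep2, pt_min=20.0, m_lo=75.0, m_hi=105.0):
    """Apply the charge, pT, and mass-window requirements quoted in the text.

    lep1, lep2: (pt, eta, phi, charge) tuples for two same-flavor leptons.
    """
    if lep1[3] * lep2[3] >= 0:          # require opposite charges
        return False
    if lep1[0] <= pt_min or lep2[0] <= pt_min:
        return False
    mass = dilepton_mass(lep1[0], lep1[1], lep1[2], lep2[0], lep2[1], lep2[2])
    return m_lo < mass < m_hi
```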
The identification of Z → νν decays in ZZ events requires $E_T^{\mathrm{miss}} > 100$ GeV in the event, and one of the b jets with $p_T > 60$ GeV and the other with $p_T > 30$ GeV to form a Z → bb candidate. Moderate CSV requirements are applied on both jets. Two additional event requirements are imposed to reduce the multijet background in which $E_T^{\mathrm{miss}}$ originates from mismeasured jet energies. First, Δφ($E_T^{\mathrm{miss}}$, jet) > 0.5 radians is required for the azimuthal angle between the direction of $E_T^{\mathrm{miss}}$ and the $p_T$ of the jet closest in φ that satisfies |η| < 2.5 and $p_T > 25$ GeV. The second requirement is that the azimuthal angle between the direction of $E_T^{\mathrm{miss}}$(trks), as calculated from only the charged tracks that satisfy $p_T > 0.5$ GeV and |η| < 2.5, and the direction of the full $E_T^{\mathrm{miss}}$ satisfies Δφ($E_T^{\mathrm{miss}}$, $E_T^{\mathrm{miss}}$(trks)) < 0.5 radians.
Finally, to reduce background from tt events in the 1-lepton and 0-lepton channels, events that contain any additional isolated leptons with $p_T > 20$ GeV are rejected.
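The two angular requirements used against multijet background in the 0-lepton channel can be sketched as below. This is an illustrative reading of the text, not the analysis implementation: jets are hypothetical (pT, η, φ) tuples, and the only subtlety handled is the wrap-around of the azimuthal difference into [0, π].

```python
import math

def delta_phi(phi1, phi2):
    """Absolute azimuthal separation, wrapped into the range [0, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)      # now in [0, 2*pi)
    return 2.0 * math.pi - d if d > math.pi else d

def passes_angular_cuts(met_phi, trk_met_phi, jets,
                        jet_pt_min=25.0, jet_eta_max=2.5, dphi_cut=0.5):
    """Apply the two 0-lepton angular requirements quoted in the text.

    jets: iterable of (pt, eta, phi) tuples.
    """
    considered = [j for j in jets if j[0] > jet_pt_min and abs(j[1]) < jet_eta_max]
    # Requirement 1: the considered jet closest in phi must be well separated from MET.
    if considered and min(delta_phi(met_phi, j[2]) for j in considered) <= dphi_cut:
        return False
    # Requirement 2: track-based MET must point along the full MET.
    return delta_phi(met_phi, trk_met_phi) < dphi_cut
```

A mismeasured jet tends to produce $E_T^{\mathrm{miss}}$ aligned with that jet, which the first requirement rejects, while genuine neutrinos give a track-based imbalance pointing in the same direction as the full $E_T^{\mathrm{miss}}$, which the second requirement keeps.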

Multivariate analysis
The signal region is defined by events that satisfy the V and Z boson reconstruction criteria described above. To optimize the significance of the signal as well as the bb mass resolution, events are classified into different regions of the V boson transverse momentum. In particular, we define three regions for the 1-lepton channels: [...] GeV is defined for the 2-lepton channels. Three regions for the channel without charged leptons are defined by (i) 100 [...] To reduce background in the region of smallest $p_T^V$, the $E_T^{\mathrm{miss}}$ significance (defined as the ratio of $E_T^{\mathrm{miss}}$ to the square root of the total transverse energy deposited in the calorimeter) is required to be > 3 √GeV.
To better discriminate between signal and background, the final stage of the analysis introduces a BDT discriminant trained on simulated samples for signal and all background processes. The set of input variables is identical to the one used in Ref. [17], and includes the mass of the bb system, the number of additional jets beyond the two b-jet candidates ($N_{aj}$), the value of CSV for the two b-tagged jets, with $\mathrm{CSV}_{\mathrm{min}}$ specifying the smaller value and $\mathrm{CSV}_{\mathrm{max}}$ the larger one, and the distance in η-φ between the two b-jet axes.
Figure 1(a) displays the combined differential distribution for events from all channels as a function of the logarithm of the signal-to-background (S/B) ratio, where S and B are the expected signal and background contributions at the value of the BDT discriminant output of each event. Panel (b) gives the ratio of the data (black points) to the SM expectation (histogram) relative to the background-only hypothesis, while panel (c) gives the ratio to the expectation from the SM, including the VZ contribution. The excess observed in bins with largest S/B is clearly consistent with what is expected for VZ production in the SM.
As a cross-check of the multivariate analysis, we perform a simpler analysis based on the $m_{bb}$ distribution of the reconstructed bb jets of the hypothesized Z boson.
The signal region is defined by events that satisfy the V and Z boson reconstruction criteria used in the multivariate analysis. Events are again classified according to $p_T^V$, and, in addition, more restrictive selections are introduced than in the multivariate analysis, because the single variable $m_{bb}$ is not a sufficiently sensitive discriminant.
In the 0-lepton and 1-lepton channels, the b-tagging requirements are tightened to a tight $\mathrm{CSV}_{\mathrm{max}}$ and a medium $\mathrm{CSV}_{\mathrm{min}}$ requirement. A veto is also imposed on any additional jets, and Δφ(V, Z) is required to be > 2.95 radians. The regions of $100 < p_T^V < 130$ GeV, $130 < p_T^V < 180$ GeV, and $p_T^V > 180$ GeV are used to analyze the 1-muon channel, and the regions for the 1-electron channel are defined as $100 < p_T^V < 150$ GeV and $p_T^V > 150$ GeV. The selected regions for the 0-lepton channel are identical in $p_T^V$ to the requirements used in the multivariate analysis, but we define ranges of $p_T^{bb} > 110$ GeV, $p_T^{bb} > 140$ GeV, and $p_T^{bb} > 190$ GeV, and impose an additional threshold for the jet of highest $p_T$ of > 80 GeV for the region of [...] are defined by $100 < p_T^V < 150$ GeV and $p_T^V > 150$ GeV, and, in addition, we require medium $\mathrm{CSV}_{\mathrm{max}}$ and moderate $\mathrm{CSV}_{\mathrm{min}}$ thresholds, and $E_T^{\mathrm{miss}} < 60$ GeV. Figure 2(a) combines events from all channels into a single $m_{bb}$ distribution, which is compared to expectations from the SM. Figure 2(b) shows the same distribution, but after subtracting all SM contributions except for the VZ signals and VH backgrounds. The VZ signal is clearly visible, with a yield compatible with that expected in the SM.
Fig. 2 (a) The combined bb invariant mass distribution for all channels, compared to MC simulation of SM contributions. (b) Same distribution as in (a), but with all backgrounds to VZ production, except for the VH contribution, subtracted. The contributions from backgrounds and signal are summed cumulatively. The expectations for the sum of VZ signal and background from VH production are also shown superimposed. The error bars and cross-hatched regions reflect statistical uncertainties at 68 % confidence level.

Background calibration regions and systematic uncertainties
Calibration regions in data are used to validate the simulated distributions used to build the BDT discriminants, as well as to correct normalizations of the major background contributions from W and Z bosons produced in association with jets (LF or b quarks) and tt production. These calibration regions are identical to those of Ref. [17], and typically involve inversion of b-tag selection criteria and two-jet mass sidebands around the signal region. A set of simultaneous fits is then performed to distributions of discriminating variables in the calibration regions, separately for each channel, to obtain consistent scale factors that are used to adjust the yields from simulated events. These scale factors account not only for discrepancies between predicted cross sections and data, but also for any residual differences in the selection of physical objects. Separate scale factors are consequently applied for each of the background processes in the different channels. For the backgrounds from V+jets, the calibration regions are enriched in either b or LF jets. Uncertainties in the scale factors include statistical components arising from the fits to the discriminant (affected by the finite size of the data and MC samples), and systematic uncertainties originating from b tagging, jet energy scale, and jet energy resolution.
The numerical values of the scale factors are close to unity and their uncertainties (3-50 %) are identical to those of Ref. [17]. The systematic uncertainties considered in the measurement of the cross section using the multivariate analysis are summarized in Table 1. The two columns give the uncertainty in the "signal strength" μ for the WZ and ZZ processes, which corresponds to the ratio of the observed yield relative to the yield expected from the SM. Each systematic uncertainty is represented by a nuisance parameter and profiled in the combined fit. To evaluate the impact of individual uncertainties a fit to a simulated pseudo-dataset is performed removing individual nuisance parameters.
Theoretical uncertainties in the acceptances are evaluated using the mcfm [1] generator by changing the QCD factorization and renormalization scales up and down by a factor of two relative to the default scales of $\mu_R = \mu_F = m_Z$. The impact of uncertainties in PDF and $\alpha_s$ on the cross section and acceptance of the VZ signal is evaluated following the PDF4LHC prescription [35,36], using CT10 [37], MSTW08 [31], and NNPDF2.0 [38] sets of PDF, and the combined uncertainty is found to be 5 % for both WZ and ZZ production. Because of the large $p_T^V$ values required in this analysis, the results are sensitive to electroweak (EW) and next-to-next-to-leading-order (NNLO) QCD corrections, both of which can be significant. Since the exact corrections for the VZ process are not available, we use the NLO EW [39-41] and NNLO QCD [42] corrections to VH production, and apply these to the VZ channel, because they are expected to be similar for the two processes. Based on the size of the correction, an additional 10 % uncertainty is assigned to the inclusive cross section to account for the extrapolation to the $p_T^V < 100$ GeV region.
The uncertainty in CMS luminosity is estimated to be 2.6 % [43]. Muon and electron triggering, reconstruction, and identification efficiencies are determined in data from samples of Z → ℓ+ℓ− decays. The uncertainty in the lepton yields due to trigger inefficiency is 2 % per lepton, as is the uncertainty in lepton identification efficiency. The parameters describing the turn-on in the trigger efficiency in the 0-lepton channel are varied within their statistical uncertainties for different assumptions on the methods used to derive the efficiency. The estimated uncertainty is 3 %.
The jet energy scale is also varied within its uncertainty as a function of jet $p_T$ and η, and the efficiency of the selections is then recomputed to assess the dependence on these variables. The effect of the uncertainty in the jet energy resolution is evaluated by smearing the jet energies according to their measured uncertainties, a process that affects both the normalization and distribution of events. An uncertainty of 3 % is assigned to the yields of all processes in the 0-lepton and 1-lepton channels due to uncertainties related to $E_T^{\mathrm{miss}}$, such as its scale and resolution.
Scaling factors to normalize b-tagging in simulation to that in data (measured in b enhanced samples of jets that contain muons) are applied consistently to jets in simulated signal and background events. The measured uncertainties in b-tagging scale factors are 3 % per b-quark jet, 6 % per c-quark jet, and 15 % per mistagged jet (originating from a gluon or from a light quark) [18]. These translate into uncertainties in yields of 3-15 %, depending on channel and specific process. The BDT output is also affected by the distributions of the CSV output, and an uncertainty is therefore assigned according to ±1 standard deviation (SD) variation in yield and shape of the CSV distributions.
Finally, the sizes of the simulated samples, as well as uncertainties in generator-level modeling of V+jets and tt backgrounds, are taken into account to determine the total uncertainty in the signal strength μ.

Results
The total cross sections are determined from a simultaneous fit to all final states, constrained by the number of events observed in each category. The likelihood is written as a combination of individual channel likelihoods for the signal and background hypotheses. We extract the best-fit values of the signal strength assuming the SM expectation for the ratio σ(WZ)/σ(ZZ) at NLO. Using the baseline multivariate analysis, the VZ process is observed with a statistical significance of 6.3 SD (5.9 SD expected). The measurement corresponds to a signal strength relative to the SM of $\mu = 1.09^{+0.24}_{-0.21}$. The cross-check analysis based on $m_{bb}$ yields a significance of 4.1 SD (4.6 SD expected), which corresponds to $\mu = 0.97^{+0.32}_{-0.29}$. In the following, the interpretation refers to the more sensitive multivariate analysis.
The cross sections extracted from the individual channels are consistent with each other and with the SM predictions, as can be seen in Fig. 3(a). To extract the WZ and ZZ cross sections, a simultaneous fit is performed floating independently the WZ and ZZ contributions, with results displayed in Fig. 3(b). The most likely values are $\mu_{WZ} = 1.37^{+0.42}_{-0.37}$ and $\mu_{ZZ} = 0.85^{+0.34}_{-0.31}$. The values for the signal strength are extrapolated to the mass window $60 < M_Z < 120$ GeV for both the bb and lepton pair invariant masses. The resulting cross section for inclusive WZ production is σ(pp → WZ) = 30.7 ± 9.3 (stat.) ± 7.1 (syst.) ± 4.1 (th.) ± 1.0 (lum.) pb, compared to the theoretical value of σ(pp → WZ) = 22.3 ± 1.1 pb, calculated with mcfm using the MSTW2008 PDF. The ZZ cross section is σ(pp → ZZ) = 6.5 ± 1.7 (stat.) ± 1.0 (syst.) ± 0.9 (th.) ± 0.2 (lum.) pb, for the same Z-mass window, which can be compared to the theoretical value of σ(pp → ZZ) = 7.7 ± 0.4 pb, also calculated with mcfm using the MSTW2008 PDF. The uncertainties in both theoretical values include uncertainties in the PDF and $\alpha_s$, and those originating from the uncertainty in renormalization and factorization scales. The ZZ cross section is in agreement with the CMS measurements using all-leptonic V decays of Ref. [5], which are more precise than this analysis.
Fig. 3 (a) Best-fit values of the ratios of the VZ production cross sections, relative to SM predictions for individual channels, and for all channels combined (hatched band). (b) Contours of 68 and 95 % confidence level for WZ and ZZ production cross sections. The large cross indicates the best-fit value including its 68 % statistical uncertainty, and the light small cross shows the result for the MCFM NLO calculation.
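As a rough numerical illustration of how the measured cross sections compare with the mcfm predictions quoted above, the sketch below adds the quoted uncertainty components in quadrature and expresses the difference in units of the combined uncertainty. This treats all components as symmetric, Gaussian, and uncorrelated, which is a simplifying assumption of this example and not the statistical procedure of the analysis (which uses a profile-likelihood fit).

```python
import math

def total_uncertainty(*components):
    """Quadrature sum of uncertainty components (assumed uncorrelated)."""
    return math.sqrt(sum(c * c for c in components))

def pull(measured, measured_unc, predicted, predicted_unc):
    """Difference between measurement and prediction in units of the combined uncertainty."""
    return (measured - predicted) / math.hypot(measured_unc, predicted_unc)

# Values in pb, taken from the text: stat., syst., th., lum. components.
wz_unc = total_uncertainty(9.3, 7.1, 4.1, 1.0)
zz_unc = total_uncertainty(1.7, 1.0, 0.9, 0.2)

wz_pull = pull(30.7, wz_unc, 22.3, 1.1)
zz_pull = pull(6.5, zz_unc, 7.7, 0.4)
```

Under these assumptions the total uncertainties are about 12.4 pb (WZ) and 2.2 pb (ZZ), and both measurements lie within one combined standard deviation of the predictions (about +0.7 for WZ and −0.5 for ZZ).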
Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construc-
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.