Measurements of the associated production of a Z boson and b jets in pp collisions at sqrt(s) = 8 TeV

Measurements of the associated production of a Z boson with at least one jet originating from a b quark in proton-proton collisions at sqrt(s) = 8 TeV are presented. Differential cross sections are measured with data collected by the CMS experiment corresponding to an integrated luminosity of 19.8 inverse femtobarns. Z bosons are reconstructed through their decays to electrons and muons. Cross sections are measured as a function of observables characterizing the kinematics of the b jet and the Z boson. Ratios of differential cross sections for the associated production with at least one b jet to the associated production with any jet are also presented. The production of a Z boson with two b jets is investigated, and differential cross sections are measured for the dijet system. Results are compared to theoretical predictions, testing two different flavour schemes for the choice of initial-state partons.


Introduction
The associated production of vector bosons and jets (V+jets) in hadronic collisions is a large background source in measurements of several standard model (SM) processes, Higgs boson studies, and many searches for physics beyond the SM. Its description constitutes an important benchmark for perturbative quantum chromodynamics (pQCD) predictions. Differential cross sections as a function of kinematic observables characterizing V+jets topologies are sensitive to the contributions from both the hard scattering process and the associated soft QCD radiation, as well as to the parton distribution functions (PDFs). Among the V+jets processes, the case in which a Z/γ * boson is produced in association with b quarks, pp → Z + (≥1b), hereafter denoted as Z(1b), is particularly interesting. Antiquarks are also assumed in the notation, and the Z/γ * interference contribution is considered to be part of the process. Within the SM, the Z(1b) final state is the dominant background for studies of the associated production of Higgs and Z bosons, in which the Higgs boson decays into a bb pair [1]. Many physics scenarios beyond the SM predict final states with b quarks and Z bosons: new generations of heavy quarks (b , t ) decaying into Z(1b) [2], supersymmetric Higgs bosons produced in association with b quarks [3], and extended SM scenarios with additional SU(2) doublets with enhanced Zbb coupling [4]. The study of the associated production of Z bosons and b quark jets may also provide information useful in describing an analogous process where a W boson is produced, which is more difficult to measure because of higher background contamination.
This paper presents measurements of associated production of a Z boson and b quark jets using proton-proton collision data at 8 TeV collected with the CMS detector, corresponding to an integrated luminosity of 19.8 fb −1 . The Z boson is reconstructed through its leptonic decay into an electron or muon pair, while the presence of b quarks is inferred from the characteristics of jets (denoted as b jets) that originate from their hadronization products and subsequent decays. In order to characterize Z(1b) production, fiducial differential cross sections are measured as a function of five kinematic observables: the transverse momentum p T and pseudorapidity η of the highest-p T b jet, the Z boson p T , the scalar sum of the transverse momenta of all jets regardless of the flavour of the parton producing them (H T ), and the azimuthal angular difference between the direction of the Z boson and the highest-p T b jet (∆φ Zb ). Ratios of the differential cross sections for Z(1b) and Z+jets production, inclusive in jet flavour, are also measured as a function of these five observables. The cancellation of several systematic uncertainties in the cross section ratio allows an even more precise comparison with theory than the differential cross sections themselves.
Events with at least two b jets, henceforth Z(2b), contribute as background sources to other SM and beyond-SM processes. The production dynamics of this kind of event are studied through the measurement of the fiducial differential cross section as a function of observables characterizing the kinematic properties of the dijet system formed by the leading and subleading (in p T ) b jets: the p T of these two jets; the Z boson p T ; the invariant masses of the bb and Zbb systems (M bb and M Zbb respectively); the angle ∆φ bb between the two b jets in the plane transverse to the beam axis and their separation in the η-φ plane (∆R bb ); the distance in the η-φ plane between the Z boson and the closer b jet (∆R min Zb ); and the asymmetry in the distances in the η-φ plane between the Z boson and the closer versus farther b jets (A Zbb ).
Previously, the cross section for the associated production of Z bosons and b jets was measured in proton-antiproton collisions by the CDF [5] and D0 [6] Collaborations at the Fermilab Tevatron and in proton-proton collisions at a centre-of-mass energy of 7 TeV by the ATLAS [7] and CMS [8] Collaborations at the CERN LHC. The CMS Collaboration also studied Z(2b) production by explicitly reconstructing b hadron decays [9], in order to explore the region where b quarks are emitted in an almost collinear topology. Previous measurements of the ratio of the Z(1b) to the Z+jets inclusive cross section were published by the D0 Collaboration [10].
The paper is organized as follows: Section 2 is dedicated to the description of the CMS apparatus and Section 3 to the data and simulated samples used in the analysis. Section 4 discusses the lepton, jet, and b jet reconstruction and the event selection. Section 5 discusses background estimation, while Section 6 is dedicated to the description of the unfolding procedure to correct data for detector effects. Section 7 presents a discussion of the systematic uncertainties. In Section 8 the measured differential cross sections and the corresponding ratios are presented, together with a discussion of the comparison with theoretical predictions. Finally, the results are summarized in Section 9.

The CMS detector
A detailed description of the CMS detector, together with the definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [11]. The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter. The field volume houses a silicon tracker, a crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter, each composed of a barrel and two endcap sections. The magnet flux-return yoke is instrumented with muon detectors. The silicon tracker measures charged particles within the pseudorapidity range |η| < 2.5. It consists of 1440 silicon pixel and 15 148 silicon strip detector modules and is located in the 3.8 T field of the superconducting solenoid. For nonisolated particles of 1 < p T < 10 GeV and |η| < 1.4, the track resolutions are typically 1.5% in p T and 25-90  µm in the transverse (longitudinal) impact parameter [12]. The electron momentum is estimated by combining the energy measurement in the ECAL with the momentum measurement in the tracker. The momentum resolution for electrons with p T ≈ 45 GeV from Z → ee decays ranges from 1.7% for nonshowering electrons in the barrel region to 4.5% for showering electrons in the endcaps [13]. Muons are measured in the pseudorapidity range |η| < 2.4, with detection planes made using three technologies: drift tubes, cathode strip chambers, and resistive plate chambers. Matching muons to tracks measured in the silicon tracker results in a relative transverse momentum resolution for muons with 20 < p T < 100 GeV of 1.3-2.0% in the barrel and better than 6% in the endcaps. The p T resolution in the barrel is better than 10% for muons with p T up to 1 TeV [14]. Forward calorimeters extend the pseudorapidity coverage provided by the barrel and endcap detectors.
The CMS detector uses a two-level trigger system. The first level of the system, composed of custom hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events in a fixed time interval of less than 4 µs. The high-level trigger processor farm further decreases the event rate from around 100 kHz to less than 1 kHz before data storage.

Event simulation
The associated production of a Z boson and jets is experimentally reconstructed as two opposite-sign same-flavour electrons or muons accompanied by jets and can be mimicked by various background sources: tt events, dibosons (WW, WZ, ZZ) and W bosons produced in association with jets, single top quark events, as well as Z+jets events in which the Z boson decays into τ + τ − . Diboson events with a leptonic Z boson decay and jets produced in the hadronic decay of the other vector boson are not considered as part of the signal. Samples of simulated events are used to model both the signal and the background processes. The MAD-GRAPH 5.1.3.30 [15] event generator is used to simulate Z+jets (including jets from b quarks), W+jets, and tt events; this generator implements a leading-order (LO) matrix element calculation with up to four (three) additional partons in the final state for V+jets (tt) events, using the CTEQ6L1 PDF set [16], which is based on the five flavour scheme (5FS). A detailed discussion is given in Section 8.2. The parton-level events are interfaced with PYTHIA version 6.424 [17] for parton showering, hadronization, and description of the multiple-parton interactions (MPIs). The PYTHIA6 Z2* tune, which is based on the CTEQ6L1 PDF set, is used [18]. The matrix element and parton shower calculations are matched using the k t -MLM algorithm [19]. The cross section inclusive in jet multiplicity is rescaled to its next-to-next-to-leading-order (NNLO) prediction, computed with FEWZ 3.1 [20,21] for the Z+jets and W+jets processes, and with the calculation of reference [22] for the tt process. To study systematic uncertainties, signal events are also generated using MADGRAPH5 aMC@NLO [23] version 2.2.1, with next-to-leadingorder (NLO) matrix elements for zero, one, and two additional partons merged with the FXFX algorithm [24], interfaced with PYTHIA version 8.205 [25] for showering and hadronization. In this case the NNPDF 3.0 NLO PDF set [26] is used. Depending on the flavours included in the matrix element calculation of the event or produced in the parton shower through gluon splitting, the inclusive Z+jets sample can be divided into Z+b quark, c quark, and light-flavour (u, d, s quark and gluon) jet subsamples. As explained in Section 6, the jet flavour identification is based on the particle content of the final state.
Diboson events are simulated with PYTHIA6, and the inclusive cross section rescaled to the NLO prediction provided by MCFM [27]. The single top quark contribution is evaluated using POWHEG-BOX version 1.0 [28][29][30][31][32] interfaced with PYTHIA6 for parton showering, hadronization, and MPI description. The contribution of multijet events is evaluated using PYTHIA6 generated events and found to be negligible. Generated events are processed with a simulation of the CMS detector based on the GEANT4 toolkit [33]. Signals induced by additional pp interactions in the same or adjacent bunch crossings, referred to as pileup, are simulated using events generated with PYTHIA6. The pileup distribution in simulation is adjusted in order to reproduce the collision rates observed in data. During the 2012 data taking, the average pileup rate was about 21 interactions per bunch crossing.

Event selection
The analysis is based on an online trigger selection requiring events to contain a pair of electron or muon candidates with asymmetric minimum thresholds on their transverse momenta. These threshold settings depended on the instantaneous luminosity and reached maximum values of 17 GeV for the leading lepton and 8 GeV for the subleading one. Events are required to contain a Z boson, reconstructed through its decay into an electron or muon pair, produced in association with at least one or at least two hadronic jets. For the Z(1b) and Z(2b) event selections the jets are also required to be identified as originating from the hadronization of a b quark.
All the measured particles are reconstructed using the particle-flow (PF) algorithm [34,35]. The particle-flow event algorithm reconstructs and identifies each individual particle with an optimized combination of information from the various elements of the CMS detector. The energy of photons is obtained directly from the ECAL measurement, corrected for zero-suppression effects. The energy of electrons is evaluated from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The transverse momentum of the muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of the momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for zero-suppression effects and for the response functions of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.
The reconstructed leptons selected as candidate decay products of the Z boson must match those that triggered the event and must be associated with the primary vertex of the event, defined as the reconstructed vertex with the largest sum of p 2 T of its constituent tracks. Reconstructed electrons must satisfy a set of selection criteria designed to minimize misidentification at a desired efficiency level [13]; the discriminating observables include the measured shower shape in the ECAL and the spatial matching between the electromagnetic deposit in the calorimeter and the reconstructed track associated with it. Additional requirements on electron tracks are used to reject products of photon conversions. Electron isolation criteria exploit the full PF-based event reconstruction, using particles within a cone around the electron direction with radius ∆R = √ (∆φ) 2 + (∆η) 2 = 0.3. The isolation requirement is defined by I rel = (I charged + I photon + I neutral )/p e T < 0.15, where I charged is the scalar p T sum of all the charged hadrons, I photon is the scalar p T sum of all the photons, and I neutral the scalar sum of p T of all the neutral hadrons in the cone of interest. The notation p e T refers to the transverse momentum of the reconstructed electron. Pileup can add extra particles, which affect the isolation variable. Accordingly, only charged particles originating from the reconstructed primary vertex are used in the calculation of I charged . The photon and neutral hadronic contribution to the isolation variable coming from pileup is subtracted using the jet area approach [36]. Electrons must have p e T > 20 GeV and be reconstructed within the pseudorapidity range |η| < 1.44 and 1.57 < |η| < 2.4, which exclude the barrel-endcap transition region.
Muon identification criteria are based on the fit quality for tracks measured in the tracker and the muon detector [14]. Further selection criteria are added in order to reject muons from cosmic rays. Muon isolation is computed using all particles reconstructed by the PF algorithm within a cone of radius ∆R = 0.4 around the muon direction, requiring I rel = (I charged + I photon + I neutral )/p µ T < 0.2. Muons must have p µ T > 20 GeV and |η| < 2.4. As in the case of electrons, charged particles not originating from the primary vertex are excluded from the isolation calculation. The pileup contribution to I photon and I neutral is estimated as half of the corresponding charged hadronic component and is subtracted in the definition of the I rel variable.
The efficiencies for lepton trigger, reconstruction, identification, and isolation are measured with the "tag-and-probe" technique [37] as a function of the lepton η and p T . A sample of events containing a Z boson decaying into e + e − or µ + µ − is used for these studies. Efficiency corrections ("scale factors") of up to 1.2% (7.3%), dependent on lepton p T and η, are applied to account for differences in the estimated efficiencies between data and simulation in the electron (muon) channel.
The pair of selected same-flavour, opposite-sign, highest-p T isolated leptons is retained as a Z boson candidate if the invariant mass M of the pair lies within the 71-111 GeV mass interval. The overall efficiency of the trigger and event selection within the fiducial acceptance is 88% for dimuons and 58% for dielectrons.
Jets are reconstructed using the anti-k t algorithm [38,39] with a distance parameter of 0.5. In order to suppress the contribution from pileup interactions, charged particles not associated with the primary vertex are excluded from the clustering. Jets are required to be in the tracking acceptance region |η| < 2.4 and to have p T > 30 GeV, thereby reducing the contribution from the underlying event to less than 5%, where jets have a softer p T spectrum compared to jets from the hard scattering process. Jets with a distance ∆R < 0.5 from the closer lepton used for the Z boson decay reconstruction are not considered in the analysis. The jet energy scale (JES) is calibrated using a factorized approach as described in Refs. [40,41]. The jet energy resolution (JER) in data is known to be worse than in the simulation; therefore the simulated resolution is degraded to compensate for this effect as a function of the jet kinematics [40,41].
Jets from b quarks are identified using the combined secondary vertex (CSV) b tagging algorithm [42], a multivariate classifier that makes use of information about reconstructed secondary vertices as well as the impact parameters of the associated tracks with respect to the primary vertex to discriminate b jets from c and light-flavour jets. The threshold applied to the discriminating variable gives a b tagging efficiency of about 50% and a misidentification probability of 0.1% for light jets and 1% for c jets. Scale factors, measured in multijet events and dependent on jet p T , are used to correct the b, c, and light-flavour jet efficiencies in the simulation to match those observed in the data [42]. The scale factors for b jets are determined using samples of events enriched in such a flavour of jets. This enrichment is obtained including both multijet events containing a muon geometrically associated with a jet, with high probability of originating from the semileptonic decay of a b hadron, and leptonic and semileptonic tt events, where the leading p T jets are usually b jets. The scale factors are around 0.93, slowly decreasing for jets with p T above 120 GeV. The scale factors for c jets are assumed the same as for b jets, with an uncertainty twice as large. Relatively pure samples of c jets from W + c events, selected using identified muons within the jet, are used to validate this assumption. For light-flavour jets, the same CSV algorithm yields scale factors between 1.1 and 1.4, depending on the jet p T . The calculation is based on tracks with negative signed impact parameter and secondary vertices with negative signed decay lengths, where the sign is defined by the relative direction of the jet and the particle momentum. Finally, events are selected if they contain a Z boson candidate and at least one b-tagged jet.
The missing transverse momentum vector p miss T is defined as the projection on the plane perpendicular to the beams of the negative vector sum of the momenta of all reconstructed particles in an event. Its magnitude is referred to as E miss T . The E miss T significance, introduced in Ref. [43,44], offers an event-by-event assessment of the consistency of the observed missing energy with zero, given the reconstructed content of the event and known measurement resolutions. In order to suppress the background contamination from tt production, events with E miss T significance greater than 30 are vetoed. This requirement provides a 13% tt background rejection with no loss in signal efficiency.
The Z(1b) event selection described above yields 26443 (36843) events for the dielectron (dimuon) channels. The exclusive b-tagged jet multiplicity and invariant mass distributions of the same flavour dilepton are presented in Figs. 1 and 2, for the Z(1b) event selection for electron and muon respectively. Data are compared with the simulations where the Z+jets events are described by MADGRAPH +PYTHIA6, and good agreement is observed. In all figures, the simulated events are reweighted by scale factors in order to compensate for the residual data-tosimulation discrepancies in lepton selection efficiency, JES and JER calibration, and b tagging efficiency. The background contributions from Z+jets and tt events as adjusted in Section 5 are included in Figs. 1 and 2.

5 Background estimation
A Drell-Yan event in which a Z boson decays into τ + τ − may contribute to the dielectron or dimuon signal events if both τ leptons decay into electrons or muons. These events are treated as a background source and, being at the few per mil level, their contribution is evaluated from simulation.
The process pp → tt → W + bW − b → + − bb + E miss T is the dominant non-Drell-Yan background source. The tt background contribution is estimated separately for Z+jets, Z(1b), and Z(2b) events by using the signal selection criteria to produce samples of eµ pairs, which are enriched in tt events with negligible signal contamination. For each measured observable these samples provide the estimates of the tt background; residual non-tt backgrounds in them, amounting to about 29%, 8% and 2% respectively, are subtracted using the simulated prediction. The integrals of such estimates need to be rescaled by the ratio of the same-flavour lepton to eµ yields. This ratio is determined using control samples for both the same-flavour lepton and eµ selections by inverting the E miss T significance requirement, namely, E miss T significance >30. For the same-flavour lepton samples, this selection removes the contribution from the signal processes, while enhancing the fraction of tt events in the sample. The residual contamination from other non-tt processes is similar in the same-lepton and eµ selections, amounting to about 20%, 7%, 3% respectively, and is again taken into account using the simulation. The ratio of the eµ to the ee or µµ yields in the control samples is used to rescale the estimates of this background source for each lepton channel separately. The ratio is determined as the scaling factor for the normalization of the binned dilepton invariant mass (M ) distribution in the eµ sample that minimizes the difference of this distribution from the corresponding samelepton-flavour M distribution in a least-square fit procedure. The fit of the M distribution is performed in the sideband regions 50-84 GeV and 100-200 GeV, to avoid any assumption about the M shape for both opposite and same-sign lepton pairs in the Z peak region.
The remaining background sources are estimated using simulation. Diboson events may mimic the Z+b final state when one or more leptons are not reconstructed or when a W or Z boson decays hadronically into a qq pair (in particular a Z boson may decay into a genuine bb pair). Single top quarks produced in association with either a W boson or one or more b jets may also generate a signal-like signature. These events, together with W+jets, can mimic the signal if a lepton of the same flavour is produced in the hadronization or if a hadron is misidentified. The contribution of multijet events is found to be negligible, as has been previously observed in other similar Z+jets analyses [45].
After subtraction of all non-Drell-Yan background contributions, the extraction of the Z(1b) and Z(2b) event yields requires an evaluation of the purity of the b tagging selection, i.e. the fraction of selected Drell-Yan events in which the desired number of b-tagged jets, at least one or at least two, originate from the hadronization of a corresponding number of b quarks. This fraction is determined from a study of the secondary vertex mass distribution of the leading b-tagged jet, defined as the invariant mass of all the charged particles associated with its secondary vertices, assuming the pion mass for each considered particle. This evaluation is done separately for dielectron and dimuon final states to avoid correlations between channels and to simplify the combination. The secondary vertex mass distributions for b, c, and light-flavour jets produced in association with Z bosons are obtained from the simulation based on the MAD-GRAPH event generator interfaced with PYTHIA6 by using the 5FS scheme for PDFs. The sum of the distributions is fitted to the observed distribution with an extended binned likelihood, after subtraction of all non-Drell-Yan background contributions, by varying the three normalization scale factors c b , c c , c udsg for the various components. The c c , c udsg factors are used for the subtraction of the respective components. This procedure reduces the dependence on the normalization of the b hadron production and decay in the simulation because the expected shape of the secondary vertex mass distribution is used. In the case of the Z(2b) selection, as it can be seen in Fig. 1, the contamination from c and light-flavour jets is negligible and is subtracted using simulation; only the c bb scaling factor for the genuine double b jet component is determined from the fit, and it is used only to correct the relative proportion of Z(1b) and Z(2b) events in the simulation, as discussed in Section 6.
The results of the fit to the secondary vertex mass distributions are presented in Fig. 3 for the Z(1b) analysis, showing the flavour composition in each channel. Data-to-simulation scale factors, as determined by the fit, are given in Table 1 for both event selections and Z boson decay channels. The flavour composition of the selected sample after the scale factor corrections for the Z(1b) samples is also shown.
The b-flavour contribution is constrained by the high secondary vertex mass region of the distribution of the CSV discriminating variable, while the c-flavour contribution is mostly important in the region between 1 and 2 GeV, and the light-flavour contribution below 1 GeV. This results in a strong anticorrelation both between the b-and c-flavour and between c-and light-flavour contributions, with an estimated correlation coefficient of about -0.6 in both cases, whereas the correlation between the b-and light-flavour contributions is negligible. As a consequence, a fluctuation in the small c quark component may cause a difference in the scale factors between different lepton channels. Table 1: Normalization scale factors and post-fit fractions for b, c and light-flavour (u, d, s quark and gluon) components in the selected Z(1b) events, and scale factor for b in the selected Z(2b) events, obtained from a fit to the secondary vertex mass distribution for dielectron and dimuon final states. The quoted uncertainties are statistical only. 1.17 ± 0.09 The signal yield for Z(1b) events is therefore obtained, for each bin of a distribution, from the selected event yield N selected as where N tt , N MC Dibosons , and N MC Others are the tt, diboson, and other background contributions respectively, c c N MC Z+c and c udsg N Z+udsg are the numbers of Drell-Yan events in which the b-tagged jets originate from either a c or a light-flavour parton, and the scale factors multiply the event yields predicted by the simulation. For the calculation of the Z(2b) event yield a similar procedure is applied: The c c and c udsg scale factors are also re-evaluated from subsamples obtained by dividing the ranges of the studied observables into wide intervals, in order to study a possible correlation with the observables themselves. The statistical uncertainty of these scale factors depends on the chosen observable and binning, ranging from a factor of 2 up to 10 relative to the size of the uncertainty of the default values obtained with the full sample. Because no statistically The amount of background in the final event selection, estimated with the procedures discussed above, can be observed in Fig. 1. For the Z(1b) selection, in the electrons (muons) samples the Z+c contribution amounts to about 17% (20%), the Z+light flavour jets (including gluons) to 10% (7%), and the tt to 9% (8%). Other background contributions are globally below the 2% level. The Z(1b) contribution in the corresponding selected sample is about 62% (63%) for the electrons (muons) channel.

Unfolding
The differential event yields are corrected for event selection efficiencies and for detector resolution effects back to the stable-particle level. For this purpose, the singular value decomposition (SVD) [46] unfolding technique, implemented in the ROOUNFOLD toolkit [47], is used. The unfolding procedure is based on a response matrix, which describes the relationship between the particle levels and measured values of a given observable due to the detector resolution and acceptance. The response matrix is calculated using Z(1b) events that are generated with MADGRAPH in the 5FS, interfaced to PYTHIA6, and followed by the detector simulation. Response matrices are computed separately for the Z(1b) and Z(2b) selections. The proportion of events with exactly one or at least two b quarks in the simulation is reweighted to match that observed in data, as determined by the c bb scaling factor.
Fiducial cross sections are defined, based on event generator predictions at the particle level, for leptons and jets reconstructed from the collection of all stable final-state particles, using the same selection criteria as the data analysis. The two leptons (electrons or muons) with the highest transverse momentum and with p T > 20 GeV and |η| < 2.4 are selected as Z boson 7 Systematic uncertainties decay products if their invariant mass is in the range of 71-111 GeV. Electromagnetic final-state radiation effects are taken into account in the generator-level lepton definition by clustering all photons in a cone of radius ∆R = 0.1 around the final-state leptons. The leptons selected as Z boson decay products are then removed from the particle collection used for the jet clustering at the generator level. The remaining particles, excluding neutrinos, are clustered into jets using the anti-k t algorithm with a distance parameter of 0.5. Generated jets are selected if their distance from the leptons forming the Z boson candidate is larger than ∆R = 0.5. Jets originating from the hadronization of b quarks are selected if a b hadron is an ancestor of one of the particles clustered in it, and this b hadron has a distance from the jet in the η-φ plane of ∆R ≤ 0.5. Jets and b jets are selected if they have p T > 30 GeV and lie in the pseudorapidity range |η| < 2.4.
As a cross-check of the SVD technique, the unfolding is also performed with the iterative D'Agostini method [48], leading to compatible results within the statistical uncertainties.

Systematic uncertainties
Several sources of systematic uncertainty affect the cross section measurement: the JES and JER, the calculation of the unfolding response matrix, the estimation of the b quark fraction, the background subtraction, the event selection efficiencies, the pileup description, and the integrated luminosity. For every source other than the luminosity, the full analysis procedure is repeated after the variation of the corresponding input values, and the difference of the extracted cross section with respect to the central measurement is used as an estimate of the uncertainty due to that source. The uncertainties are symmetrized, if not already symmetric. The systematic uncertainties in the measured Z(1b) and Z(2b) differential cross sections are summarized in Table 2 and in Tables 3 and 4, respectively.
Reconstructed jet energies must be corrected for several effects, such as pileup contamination, instrumental noise, nonuniformities and nonlinearities in the detector response, and flavour composition. The resulting uncertainty depends on the transverse momentum and pseudorapidity of the jet. The systematic effect due to the application of JES corrections in the data is estimated by increasing and decreasing the correction parameters deviation from their nominal values by one standard. The uncertainty for the JER correction is evaluated in the same way.
For the cross section measurement in a given bin, the systematic uncertainty induced by the model used in the unfolding procedure is evaluated as the difference between the standard result and that obtained with an alternative model for unfolding, namely MADGRAPH5 aMC@NLO interfaced with PYTHIA8. This alternative model implements NLO hard scattering matrix elements, compared to the LO matrix elements of MADGRAPH interfaced to PYTHIA6, and also includes different details of the underlying event, hadronization, and particle decay descriptions compared to the default choice. In order to evaluate the genuine model-induced effects, the statistical uncertainties from the two simulated samples are subtracted in quadrature from the difference; any negative results so obtained are replaced with zero. The uncertainty associated with the size of the simulated sample used to compute the response matrix elements is determined by producing replicas of the matrix whose elements are fluctuated according to a Poisson distribution.
The uncertainty induced by the secondary vertex mass fit, used to extract the true flavour composition of the Z(1b) sample, is twofold. One part is due to the statistical uncertainty in the c c , c udsg scale factors, whose effect is estimated by varying them up and down by one standard deviation, taking into account their correlation. This source of uncertainty is considered as part of the statistical uncertainty, because it is due to the finite size of the collision data sample. The other part stems from the choice of the simulation model for the shape of the secondary vertex mass distributions. This choice affects also the correction of the relative proportion of different b multiplicities provided by the scale factor c bb . In addition, a systematic uncertainty is associated, for both Z(1b) and Z(2b) samples, with the modelling of the c quark and light-flavour contributions to each measured observable. Both of these model-induced uncertainties, collectively indicated in the tables as "c, udsg background model", are estimated by replacing the default model given by MADGRAPH 5FS interfaced with PYTHIA6 with MAD-GRAPH5 aMC@NLO 5FS interfaced with PYTHIA8. The scale factors, which are determined from the alternative model, are in statistical agreement for dielectron and dimuon channels within one standard deviation. The difference between the results obtained with the two models is therefore considered as safely accounting for possible residual discrepancies between data and simulation.
For each lepton channel the systematic uncertainties in the lepton efficiency calculations for triggering, reconstruction, identification, and isolation are estimated from the Z → "tagand-probe" measurements of data-to-simulation efficiency scale factors. The global effect of the systematic uncertainty related to the scale factors is 1.5% in the dielectron final state and 2% in the dimuon final state. The uncertainties in the b tagging efficiency scale factors include contributions from the pileup contamination, the gluon splitting rate in simulation (g → bb), varied by ±50%, and the energy fraction carried by the b hadrons in the hadronization (varied by ±5%) [42]. The global value of the b tagging systematic uncertainty amounts to 3% per btagged jet. Scale factors for c jets, assumed equal to those for b jets, are assigned an uncertainty twice as large as for the b jets.
The simulation is reweighted according to the generated primary vertex multiplicity and the instantaneous luminosity in data to reproduce the observed primary vertex multiplicity distribution, and provide a reliable representation of pileup. The minimum-bias event cross section in simulation is tuned to provide the best agreement between data and simulation in the vertex multiplicity distribution of Z → µµ events. The uncertainty associated with this procedure is estimated by varying this minimum-bias cross section value by 5%.
The uncertainty in the tt background normalization is derived from the statistical uncertainties of the same-flavour and eµ control samples and is included in the total statistical uncertainty. The systematic uncertainty related to the diboson background (ZZ, WW, WZ) is evaluated by varying the theoretical production cross sections by ±15% of their central values, corresponding to about three standard deviations of the overall theoretical normalization uncertainty and covering the typical differences between the theoretical and measured values. In addition, the statistical uncertainty induced by the limited size of simulation samples is taken into account.
The systematic uncertainty in the integrated luminosity is 2.6% [49].
In the ratios of Z(1b) and Z(2b) to the inclusive Z+jets cross sections, the uncertainties are simultaneously propagated to both the numerator and denominator, taking correlations into account. The uncertainties in the energy scale, resolution, and efficiency corrections for reconstructed leptons and jets are considered to be fully correlated, as is the uncertainty in the integrated luminosity. Tables 2-4 summarize the ranges of variation of the uncertainties for each observable measured with the Z(1b) and Z(2b) samples. Table 2: Uncertainties (in percent) in the differential cross sections as a function of the leading b jet p T and |η|, the Z boson p T , H T , and ∆φ Zb between the Z boson and the leading b jet, for the Z(1b) sample.

Observables
Differential cross sections as a function of a number of kinematic observables are measured in order to characterize the production mechanisms of Z(1b) events.
For Z(1b) events, five kinematic observables are studied. First, p T and |η| of the leading-p T b jet are measured, together with the Z boson p T . The distributions of these variables are directly sensitive to the b quark PDF and initial-state gluon splitting and may show differences between different PDF flavour schemes. Searches for physics processes beyond the SM in Lorentz-boosted topology events depend on precise knowledge of the Z boson p T distribution. The scalar sum H T of the transverse momenta of all selected jets, regardless of their flavour, is related to the structure of the hadronic system recoiling against the boson. The measurement of this observable at high values is potentially sensitive to the presence of intermediate heavy particles decaying hadronically, as predicted, for example, in some SUSY scenarios. Finally, the topology of the system composed of the Z boson and b jet is studied by measuring the cross section as a function of the azimuthal angular separation between the direction of the Z boson and the direction of the highest-p T b jet, ∆φ Zb . This observable is also sensitive to the presence of boosted particles decaying into a Z boson and b quarks.
Ratios of the differential cross sections for Z(1b) and Z+jets events, inclusive in the jet flavour, are also measured: R(x) = dσ(Z+(≥1b))/dx dσ(Z+jets)/dx , with x representing one of the five observables described above. The inclusive Z+jets event selection is defined by releasing the requirement of a b-tagged jet in the Z(1b) selection. In these ratios the kinematic observables referring to the highest-p T b-tagged jet from the Z(1b) sample are used in the numerator, while for the denominator the observables related to the highest-p T jet from the Z+jet sample are examined. Several systematic uncertainties cancel in the ratios, allowing a precise comparison with theory.
For Z(2b) events, the cross section is measured as a function of the transverse momenta of the Z boson and of the leading and subleading b jets. In addition, the cross section is studied as a function of several variables explicitly related to the topology of the final state consisting of a Z boson and the two highest-p T b jets. The invariant mass M bb of the bb system and the invariant mass M Zbb of the Zbb system are studied, because their distributions are sensitive to the presence of heavy intermediate particles.
Angular correlations between the b jets and between each b jet and the Z boson are described by four observables, also studied in Ref.
[9]. The azimuthal angular separation ∆φ bb between the directions of the two b jets in the transverse plane is useful to identify back-to-back configurations of the b quarks. The distance between the directions of the two b jets in the η-φ plane √ (∆η bb ) 2 + (∆φ bb ) 2 , where ∆η bb is the separation in pseudorapidity between the two b jets. This variable is sensitive to the different production mechanisms of the Z(2b) final-state b quarks. In particular, it is useful to discriminate between the contributions whose scattering amplitudes are dominated by terms involving gluon splitting, g → bb, and those where a Z boson is emitted from one of the final-state b quarks. The process qq → Zbb contributes to both cases, while qg → ZbbX (with X an additional parton) contributes to the former and gg → Zbb to the latter. These contributions correspond, respectively, to the regions where the two b quarks are almost collinear or mostly acollinear. Because two b jets must be reconstructed, this measurement cannot be sensitive to low-angle gluon splitting, where the distance between the jet-initiating partons is smaller than twice the jet size. This region is better explored by searching directly for pairs of b hadrons close in space, as studied in Ref.
[9], whose decay products might be part of a single reconstructed jet. Another angular observable of interest is ∆R min Zb , the angular separation between the Z boson and the closer b jet in the η-φ plane. This variable is useful for testing multileg tree-level and NLO corrections in which a Z boson is radiated from a quark, because it is sensitive to event topologies with the Z boson in the vicinity of one of the two b jets. Finally, the A Zbb asymmetry between the b jet direction and the Z boson direction is computed using a combination of ∆R min Zb and ∆R max Zb (the latter being the η-φ separation between the Z boson and the farther b jet): The A Zbb asymmetry can provide an indirect test of pQCD validity at higher orders of the perturbative series. A nonzero value of A Zbb is related to the emission of additional gluon radiation in the final state, while values of A Zbb close to zero identify configurations in which the two b jets are emitted symmetrically with respect to the Z boson direction.

Theoretical predictions
The measured differential cross sections for the associated production of Z bosons and b jets are compared to several perturbative QCD theoretical calculations.
In pQCD the amplitude for this process can be computed using two alternative approaches. In the four-flavour scheme (4FS) [50], the b quark mass is explicitly included in the predictions and acts as an infrared cutoff, partly removing possible divergences in the matrix element calculation. This approach corresponds to an effective QCD theory, with n f = 4 quark flavours involved in the computation of the running of the strong coupling constant α S . In this scheme no b quark PDF is used, and the b quark is always produced explicitly by the gluon splitting g → bb process. In the 5FS [51] (where n f = 5), the gluon splitting contribution is included within the b quark PDF, and the b quark mass is set to zero in the matrix element calculation. The two schemes can be defined in such a way as to provide identical results when all orders in pQCD are computed. However, differences appear in fixed-order predictions, where the different ordering of terms in the matrix element expansion gives different results. The comparison of different flavour schemes is interesting because, in pQCD, the evolution of the b quark PDF as a function of the Bjorken x variable shows sizeable differences between tree-level calculations and those at NLO. These differences are introduced by singularities in the Altarelli-Parisi splitting functions that are present only at NLO; they have no impact on the tree-level evolution of the b quark PDF [52].
While NLO calculations are now available for both flavour schemes, LO calculations are still interesting to study because they allow the inclusion of multiple additional light, hard partons in the matrix element. This feature is expected to provide a better description of the real hard radiation, compared to fixed-order NLO calculations matched with parton showering.
The MADGRAPH plus PYTHIA 6 event generator, introduced in Section 3, describes signal events at full detector simulation level and provides theoretical predictions at tree level for the associated production of Z bosons and jets, including b jets. This calculation is based on the 5FS using the LO MADGRAPH 5.1.3.30 matrix element generator, with up to four additional partons in the matrix element calculation. The factorization and renormalization scales are chosen on an event-by-event basis as the transverse mass of the event, clustered with the k t algorithm down to a 2→2 topology, and k t computed at each vertex splitting, respectively [19,53]. The matrix element calculation is interfaced with PYTHIA version 6.424, using tune Z2* for parton showering, hadronization, and description of MPI. The CTEQ6L1 PDF is adopted in the calculations. The Drell-Yan inclusive cross section is rescaled to the NNLO calculation provided by FEWZ 3.1 [20,21], which has a uncertainty of about 5%. This uncertainty is not propagated into the figures presented below.
Theoretical predictions at tree level based on MADGRAPH matrix elements for the Z + 2b process are also computed using the 4FS MSTW2008 LO PDF set [54]. The factorization and renormalization scales are defined as in the 5FS case. Also in this case, parton showering and hadronization are provided by PYTHIA6 with the tune Z2*. The inclusive cross section is rescaled to the Z + 2b NLO calculation with MADGRAPH5 aMC@NLO [23] for the 4FS, which has an estimated theoretical uncertainty of 15%, dominated by scale variations. This uncertainty is not shown in the figures.
The event generator MADGRAPH5 aMC@NLO [23] version 2.2.1 is used to provide results at NLO, combining matrix elements for zero, one, and two additional partons through the FXFX algorithm [24]. The NNPDF 3.0 NLO PDF set [26], based on the 5FS, is used. Parton showering and hadronization are performed by PYTHIA version 8.205 [25], using the CUETP8M1 tune [55]. The choice of QCD scales is the same as for the LO MADGRAPH prediction. This is the same event generator that is used in Section 3 to study the systematic uncertainty in the secondary vertex mass distribution.
The 5FS is also used to compute the NLO POWHEG prediction for a Z boson associated with two extra partons, including b quarks [56]. Lower parton multiplicities are described in the matrix element as well, but with no guarantee of NLO accuracy. The scale choice is based on the MINLO approach [57]. The NNPDF 3.0 PDF set [26] is used, and the matrix element calculation is matched with the PYTHIA8 parton shower evolution and hadronization, using the CUETP8M1 tune.
For both MADGRAPH5 aMC@NLO and POWHEG, no further rescaling of the native cross section is made. Theoretical systematic uncertainties in the predictions, caused by the choice of the QCD factorization and renormalization scales and by the propagation of the uncertainties in PDFs, are computed. The former are estimated by varying the QCD scales by factors of 2 and 0.5, while the latter are computed according to PDF authors' prescriptions. The uncertainty from varying the QCD scales is generally the dominant contribution. These theoretical uncertainties are displayed in the figures only in the ratio plots, with the statistical uncertainty shown separately, and add up to about 10% and 20% for the two calculations, respectively. For LO calculations, only the statistical uncertainty of theoretical predictions is shown.

Comparison with data
The measured differential cross sections, unfolded for detector effects, are compatible for the two leptonic channels, and are therefore combined into an uncertainty-weighted average for a single lepton flavour. Correlations between systematic uncertainties for the electron and muon channels are taken into account in the combination. In particular, all uncertainties are treated as fully correlated, with the exception of those related to lepton efficiencies, tt background estimates, and the statistical part of the subtraction of the c quark and udsg components from Z+jets, and the statistical part of the unfolding uncertainty, which are treated as fully uncorrelated. All the cross sections are measured in the fiducial phase space defined at the generated particle level for the unfolding procedure, as discussed in Section 6. No attempt is made to disentangle b jet production in the hard scattering processes and in MPI.
These measured values can be compared with the corresponding predictions at NLO of MAD-GRAPH5 aMC@NLO interfaced with PYTHIA8 (described in Sec.8.2), 4.23 +0.27 −0.37 pb for Z(1b) and 0.356 +0.030 −0.031 pb for Z(2b). The prediction overestimates by about 20% the measured value for Z(1b), while a reasonable agreement is found for Z(2b) within uncertainties.
The ratio of the cross sections in the fiducial phase space for the production of at least two and at least one b jet is σ fid (pp → Z + (≥2b)) σ fid (pp → Z + (≥1b)) = 0.093 ± 0.004 (stat) ± 0.007 (syst), to be compared with the theoretical prediction 0.084 +0.002 −0.001 where the systematic uncertainties are considered as fully correlated.
Results for the differential cross sections for the Z(1b) events are presented in Figs. 4-8, together with the ratios to the corresponding observables for the inclusive Z+jets event selection. Where applicable, the last bin also includes overflow values. A discrepancy of about 20% is seen in the overall normalization for the 4FS-based prediction, of the same order of magnitude as its estimated theoretical uncertainty. The POWHEG prediction tends to overestimate the cross sections by about 25%.
Apart for the normalization difference, the pQCD calculation with massive b quarks (4FS) seems to reproduce, slightly better, the shape of the observed distributions in the soft momentum regime of b jets. For the leading b jet p T spectrum (Fig. 4), the ratio with data is reasonably flat below 80 GeV, whereas it presents a clear slope in the higher p T range. A similar behaviour is clear in the Z boson p T distribution below 130 GeV (Fig. 6) and in the H T spectrum below 200 GeV (Fig. 7). The POWHEG generator considerably overestimates the soft parts of the p T and H T spectra. The leading b jet |η| spectrum shape is well reproduced by the MADGRAPH 4FS configuration (Fig. 5), while MADGRAPH 5FS calculation slightly overestimates the central part of the spectrum. The shape of the distribution of the azimuthal angular separation ∆φ Zb between the Z boson and the leading b jet is reproduced within uncertainties by all the calculations (Fig. 8). The NLO MADGRAPH5 aMC@NLO predictions have similar behaviour to those from LO MADGRAPH 5FS. As far as the NLO POWHEG-based prediction is concerned, it shows a similar behaviour to MADGRAPH5 aMC@NLO, but the discrepancies are larger, reaching about 40% at the peak of the Z boson p T spectrum. In general, the shape predicted by each calculation compares with data, within uncertainties, in a similar way in the high Z boson p T and H T regions, which are potentially sensitive to new physics contributions.
The underestimation of the normalization by MADGRAPH 4FS and the overestimation by POWHEG are also observed in the ratio of Z(1b) and inclusive Z+jets processes (described by the MAD-GRAPH generator in the 5FS). The pseudorapidity distribution (Fig. 5), with an almost flat shape, clearly shows that the ratio for the 4FS-based prediction is about 4%, compared to the 5% of the 5FS-based calculations, while POWHEG predicts about 6%. The 4FS prediction also fails to reproduce the ratio of the leading jet p T spectra (Fig. 4), which is clearly underestimated below 80 GeV. In contrast, POWHEG overestimates the spectrum in the soft region by about 30%. Similar discrepancies, although less pronounced, are observed for H T and the Z boson p T . The ratio as a function of the azimuthal separation between the Z boson and the b jet (Fig. 8) is also slightly underestimated by the MADGRAPH 4FS prediction when the Z boson is approximately back-to-back with respect to the leading b jet, with a difference in the azimuthal angles close to π.
The results for the differential cross sections measured with the Z(2b) event selection are shown in Figs. 9-17. Within uncertainties, no global normalization discrepancy is observed, contrary to the Z(1b) case. The leading and subleading jet spectra are slightly underestimated in the soft region by the LO calculations (the leading b jet p T in the first two bins of Fig. 9 and the subleading b jet p T in the first bin of Fig. 10), while the Z boson p T distribution is well reproduced, within uncertainties (Fig. 11). The 4FS predictions overestimate the data at the high end of these p T distributions. The ratios of all theoretical predictions and the data show a slight positive slope for the azimuthal separation (Fig. 14). All the other distributions are well reproduced. In general, given the experimental uncertainties, the measurements do not strongly discriminate between the theoretical predictions. The ratio of the MADGRAPH5 aMC@NLO and POWHEG predictions based on NLO matrix elements with data shows a similar shape.

Summary
The process of associated production of jets, including b jets, and a Z boson decaying into lepton pairs ( = e, µ) are measured in LHC pp collisions at √ s = 8 TeV with the CMS experiment, using a data set corresponding to an integrated luminosity of 19.8 fb −1 . The measured fiducial cross sections are compared to several theoretical predictions. The cross sections are measured as a function of various kinematic observables describing the event topology with a Z boson and at least one b jet: the p T and η of the leading b jet, the Z boson p T , the scalar sum H T of the jet transverse momenta, and the azimuthal angular difference between the directions of the leading b jet and the Z boson. A comparison is made of the unfolded data with leadingorder pQCD predictions based on matrix element calculations matched with parton showering, testing models using the MADGRAPH event generator, or with the NLO calculations, merging predictions for zero, one, and two extra jets with MADGRAPH5 aMC@NLO, or for the first two jets with POWHEG in the MINLO approach. In most cases the theoretical predictions agree with the data, although the normalization for MADGRAPH 4FS, which fails to describe simultaneously both the low-and high-p T b jet regions, is underestimated by 20%. The ratios of differential cross sections for the production of a Z boson in association with at least one b jet and the inclusive Z+jets production are measured and compared with theoretical expectations.   The 4FS-based prediction fails to describe the shape of the ratio as a function of the leading b jet p T , and discrepancies in the shape are also observed for high values of the Z boson p T .
The production of a Z boson in association with two b jets is also investigated. In this case the kinematic observables are the transverse momenta of the leading and subleading b jets, the p T of the Z boson, the separations of the b jets both in azimuthal angle and in the η-φ plane, the minimal distance in the η-φ plane between the Z boson and a b jet, the asymmetry between the minimal and the maximal distances between the Z boson and a b jet, and the invariant masses of the bb and the Zbb systems. The measured distributions are generally well reproduced by the predictions.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centres and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: the Austrian Federal Ministry of Science, Research and Economy and the Austrian Science Fund; the Belgian [7] ATLAS Collaboration, "Measurement of differential production cross-sections for a Z boson in association with b-jets in 7 TeV proton-proton collisions with the ATLAS detector", JHEP 10 (2014) 141, doi:10.1007/JHEP10(2014)141, arXiv:1407.3643.