Measurement of the Z/gamma* + b-jet cross section in pp collisions at 7 TeV

The production of b jets in association with a Z/gamma* boson is studied using proton-proton collisions delivered by the LHC at a centre-of-mass energy of 7 TeV and recorded by the CMS detector. The inclusive cross section for Z/gamma* + b-jet production is measured in a sample corresponding to an integrated luminosity of 2.2 inverse femtobarns. The Z/gamma* + b-jet cross section with Z/gamma* to ll (where ll = ee or mu mu) for events with the invariant mass 60 < M(ll) < 120 GeV, at least one b jet at the hadron level with pT > 25 GeV and abs(eta) < 2.1, and a separation between the leptons and the jets of Delta R > 0.5 is found to be 5.84 +/- 0.08 (stat.) +/- 0.72 (syst.) +(0.25)/-(0.55) (theory) pb. The kinematic properties of the events are also studied and found to be in agreement with the predictions made by the MadGraph event generator with the parton shower and the hadronisation performed by PYTHIA.


At the Large Hadron Collider (LHC), the measurement of the production of a Z/γ* boson in association with b quarks is important, both as a benchmark channel for the production of the Higgs boson in association with b quarks, and as a standard model background for searches for the Higgs boson and new physics in final states with leptons and b jets. The dominant contribution to Z/γ* + b-jet production in proton-proton (pp) collisions at the LHC centre-of-mass energy of 7 TeV comes from the gluon-gluon interaction. Calculations of the cross section, driven by perturbative QCD, are currently derived in two schemes: fixed-flavour and variable-flavour. The fixed-flavour scheme allows only u, d, s, and c quarks and gluons to participate in the hard scattering process, with the b quarks produced explicitly in pairs from gluon splitting. Complete calculations in this scheme with massive b quarks at next-to-leading order (NLO) have recently been released [1]. The variable-flavour scheme instead allows the b quark to participate directly in the hard scattering by integrating the gluon splitting process into the parton distribution functions (PDFs). In this case NLO calculations have been performed using massless b quarks [2-4]. To all orders in perturbation theory, both schemes can be made exactly identical. At any finite order, however, the results might differ significantly, depending on the ordering of the perturbative expansion.
This paper describes the measurement with the Compact Muon Solenoid (CMS) detector of the pp → Z/γ* + b-jet cross section with Z/γ* → ℓℓ (where ℓℓ = ee or µµ), including at least one b jet and any additional light jets, and compares it with theoretical predictions in the variable-flavour scheme. Distributions of the kinematic variables for jets and leptons are also compared to Monte Carlo (MC) simulations using the tree-level matrix-element calculations of MADGRAPH (version 5.1.1.0) [5] in the variable-flavour scheme with massless b quarks, and using the PYTHIA (version 6.424) [6] description of the parton shower and hadronisation processes with the Z2 tune [7,8]. Multiparton interactions (MPIs) are included in the PYTHIA simulation. For systematic studies, signal events are also simulated using the SHERPA (version 1.3.0) [9] MC generator, in which the process is computed at leading order (LO) in the variable-flavour scheme, and with the NLO fixed-flavour calculation of aMC@NLO [1] matched to the HERWIG (version 6.520) [10,11] event generator for the parton shower and hadronisation. Similar measurements of the Z/γ* + b-jet cross section were performed by the ATLAS Collaboration [12] at the LHC and by the CDF [13] and D0 [14] Collaborations at the Tevatron pp̄ collider. It should be noted that in the latter cases the dominant contribution comes from the quark-antiquark interaction.
The main backgrounds arise from the production of Z/γ* with jets of other flavours misidentified as b jets and from tt̄ + jets events. Other processes such as QCD multijets, W + jets, single top, and dibosons (WW, WZ) producing a final state with misidentified leptons or b jets are found to give a negligible contribution. Irreducible backgrounds such as ZZ and the associated production of W and top, resulting in a final state with two genuine leptons and a b jet, are also found to give a negligible contribution.
The CMS experiment uses a right-handed coordinate system, with the origin at the nominal interaction point, the x axis pointing to the centre of the LHC ring, the y axis pointing up (perpendicular to the plane of the LHC ring), and the z axis pointing along the anticlockwise beam direction. The polar angle θ is measured from the positive z axis and the azimuthal angle φ is measured in the xy plane in radians. The pseudorapidity is defined as η = −ln[tan(θ/2)]. The CMS detector features pixel and silicon-strip trackers with coverage up to |η| = 2.4 that, together with a 3.8 T solenoid magnet, allow for tracks with transverse momentum (p_T) as low as 100 MeV to be reconstructed, and give a p_T resolution of 1% at 100 GeV. Also within the magnetic field are an electromagnetic crystal calorimeter (ECAL) extending up to |η| = 3.0 with an electromagnetic transverse energy (E_T) resolution of about 3%/√(E_T/GeV), and a hermetic hadron calorimeter (HCAL) extending up to |η| = 5.2 with a transverse hadronic energy resolution of 100%/√(E_T/GeV). Embedded in the steel magnetic field return yoke is an efficient muon system capable of reconstructing and identifying muons up to |η| = 2.4. Further details of the CMS detector may be found in Ref. [15].
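The pseudorapidity definition above can be illustrated with a minimal Python sketch. The function names `pseudorapidity` and `polar_angle` are hypothetical helpers introduced here for illustration only; they are not part of any CMS software.

```python
import math


def pseudorapidity(theta: float) -> float:
    """Return eta = -ln[tan(theta/2)] for a polar angle theta in radians."""
    if not 0.0 < theta < math.pi:
        raise ValueError("theta must lie strictly between 0 and pi")
    return -math.log(math.tan(theta / 2.0))


def polar_angle(eta: float) -> float:
    """Invert the definition: theta = 2 * atan(exp(-eta))."""
    return 2.0 * math.atan(math.exp(-eta))


# Polar angle (in degrees) at the edge of the tracker coverage, |eta| = 2.4.
tracker_edge_deg = math.degrees(polar_angle(2.4))
```

With this definition a particle emitted perpendicular to the beam (θ = π/2) has η = 0, and the tracker edge at |η| = 2.4 corresponds to a polar angle of roughly 10 degrees from the beam axis.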
The data used in this analysis were collected between March and August 2011 and correspond to an integrated luminosity of L = 2.22 ± 0.05 fb⁻¹ [16]. The peak instantaneous luminosity varied during this period from 1.0 × 10³² to 2.5 × 10³³ cm⁻² s⁻¹. The average number of inelastic collisions per bunch crossing was 6.2 with an RMS of 2.9. The selection of events with a Z/γ* boson decaying to a pair of electrons or muons is based on the selection used in the measurement of the inclusive Z/γ* cross section [17]. Events are selected using dielectron and dimuon triggers. The dielectron trigger has p_T thresholds of 17 and 8 GeV on the leading and subleading electrons, respectively. The dimuon trigger has thresholds that increased with increasing instantaneous luminosity, from 7 GeV on both muons to 13 and 8 GeV on the leading and subleading muons, respectively. Events from pure beam-related backgrounds are rejected by requiring that at least one primary vertex be reconstructed within the luminous region (|z| < 24 cm), with a fit based on at least five tracks.
Inclusive Z/γ* + jets and tt̄ + jets events are simulated with MADGRAPH, using PYTHIA for the parton shower, hadronisation, and MPIs. The Z/γ* boson is simulated with a minimum mass of 50 GeV and only leptonic decays are considered. The Z/γ* + jets sample is normalised to the integrated luminosity in data using the cross section of 3048 ± 130 pb [18], which accounts for the O(α_S²) next-to-next-to-leading-order (NNLO) corrections to the inclusive Z/γ* production. For the tt̄ + jets sample the NLO cross section of 158 +23/−24 pb [19] is used. Pile-up events are added by assuming a flat distribution of additional interaction vertices up to 10 vertices, and a tail from a Poisson distribution above 10. All MC events are reweighted to reproduce the number of pile-up events expected in data, as derived from the instantaneous luminosity distribution.
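The normalisation and pile-up reweighting described above can be sketched as follows. This is a simplified illustration, not the analysis code: the number of generated events and the toy histograms are invented inputs, and the function names are hypothetical.

```python
def lumi_weight(xsec_pb: float, lumi_inv_pb: float, n_generated: int) -> float:
    """Per-event weight that scales an MC sample to the data luminosity."""
    return xsec_pb * lumi_inv_pb / n_generated


def pileup_weights(data_hist, mc_hist):
    """Per-bin weights: ratio of the normalised data and MC pile-up distributions."""
    data_total = float(sum(data_hist))
    mc_total = float(sum(mc_hist))
    return [
        (d / data_total) / (m / mc_total) if m > 0 else 0.0
        for d, m in zip(data_hist, mc_hist)
    ]


# Expected Z/gamma* + jets yield before any selection:
# 3048 pb x 2.22 fb^-1 (= 2220 pb^-1), both numbers quoted in the text.
expected_events = 3048.0 * 2220.0

# Assumption: a sample of one million generated events (invented number).
weight = lumi_weight(3048.0, 2220.0, 1_000_000)
```

Each simulated event would then carry the product of the luminosity weight and the pile-up weight of its bin, before the efficiency corrections discussed later are applied.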
Both electrons (muons) are required to be reconstructed with p_T > 25 (20) GeV [20,21]. The electrons (muons) are further required to be well within the detector acceptance, with pseudorapidity |η| < 2.5 (2.1). For electrons, the transition region 1.444 < |η_SC| < 1.566 between the barrel and endcap parts of the ECAL is excluded, η_SC being the pseudorapidity of the electron ECAL cluster. Energy deposits from final state radiation (FSR) are recombined with electrons during the reconstruction process but not with muons. Nonprompt leptons are rejected by requiring a maximum distance of closest approach between the track and the beam axis in the transverse plane of 200 µm.
The lepton isolation is defined using the sum of transverse energy (or p_T) around the lepton in a cone of size ∆R < 0.3 in the tracker (I_trk), ECAL (I_ECAL), and HCAL (I_HCAL) detectors, with ∆R = √[(∆η)² + (∆φ)²]. For electrons, separate selections are applied to I_trk/p_T^e, I_ECAL/p_T^e, and I_HCAL/p_T^e. Electron identification and isolation criteria are chosen to provide 85% efficiency on the 2010 data sample [17], and are more stringent than the requirements applied at the trigger level. Muon identification criteria are mostly based on cosmic-ray rejection and the quality of the global fit including the tracker and the muon chambers. For muon isolation, the combined variable (I_trk + I_ECAL + I_HCAL)/p_T^µ is required to be less than 0.15. With this choice, the probability of having two misidentified leptons is negligible. Opposite charges for the leptons are required when forming pairs, and the lepton invariant mass M_ℓℓ is required to lie between 60 and 120 GeV. In the case of multiple combinations, the lepton pair with the invariant mass closest to the mass of the Z boson is kept.
Individual particles are reconstructed by linking tracks, ECAL clusters, and HCAL clusters. Each particle is reconstructed with the optimal momentum or energy resolution by considering information from all subdetectors: charged hadrons are reconstructed from tracks; photons and neutral hadrons are reconstructed from energy clusters in the ECAL and HCAL. These individual particles are then clustered into jets using the anti-k_T jet clustering algorithm [24] with a distance parameter of 0.5, as implemented in the FASTJET package [25,26]. Jets are calibrated to ensure a uniform energy response in p_T and η, using photon+jet, Z+jet, and dijet events [27]. The contribution to the jet energy from pile-up is estimated on an event-by-event basis using the jet area method described in Ref. [28], and is subtracted from the overall E_T response.
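The ∆R separation, the muon isolation requirement, and the choice of the Z candidate described above can be sketched in a few lines of Python. The helper names, the tuple layout of a lepton pair, and the nominal Z mass value of 91.19 GeV are assumptions made for this illustration; the text itself does not quote the Z mass.

```python
import math

Z_MASS_GEV = 91.19  # assumed nominal Z-boson mass, not quoted in the text


def delta_phi(phi1: float, phi2: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi]."""
    return math.remainder(phi1 - phi2, 2.0 * math.pi)


def delta_r(eta1: float, phi1: float, eta2: float, phi2: float) -> float:
    """Delta R = sqrt(d_eta^2 + d_phi^2)."""
    return math.hypot(eta1 - eta2, delta_phi(phi1, phi2))


def muon_is_isolated(i_trk, i_ecal, i_hcal, pt, max_rel_iso=0.15):
    """Combined relative isolation (I_trk + I_ECAL + I_HCAL) / pT < 0.15."""
    return (i_trk + i_ecal + i_hcal) / pt < max_rel_iso


def best_z_candidate(pairs):
    """Pick the opposite-charge pair with 60 < M < 120 GeV closest to the Z mass.

    `pairs` is an iterable of (mass_gev, charge1, charge2) tuples, a layout
    chosen for this sketch. Returns None if no pair qualifies.
    """
    good = [p for p in pairs if p[1] * p[2] < 0 and 60.0 < p[0] < 120.0]
    return min(good, key=lambda p: abs(p[0] - Z_MASS_GEV), default=None)
```

The wrap-around in `delta_phi` matters for objects on either side of φ = ±π: without it, two nearby objects at φ = 3.0 and φ = −3.0 would appear to be almost back to back.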
The reconstructed jets are required to have p_T > 25 GeV and to be separated from each of the Z/γ* leptons by at least ∆R = 0.5. Furthermore, jets are required to have |η| < 2.1, to ensure optimal b-tagging performance. Loose identification criteria [29] are applied in order to further reject jets coming from beam background, and to reject calorimeter noise and isolated photons. These criteria are based on the requirements that the total energy of the jet be shared between more than one HCAL readout cell and not originate entirely from deposits associated with neutral particles; the selection efficiency for genuine jets is close to 100% in both data and MC events.
The efficiency of the dilepton selection is estimated in data and MC simulation using the tag-and-probe method introduced in Ref. [17] in events with at least two leptons and a jet passing the requirements detailed above. For data-MC comparisons, MC events are reweighted according to the data/MC scale factors per lepton, as a function of their p_T and η. The systematic uncertainty on the scale factor per lepton is less than 2% for electrons, and less than 1% for muons.
The Z/γ* + jets MC sample is split into three subsamples, according to the underlying production of b jets, c jets, or jets originating only from gluons or u, d, s quarks (hereafter called light jets), with no requirement on the p_T or η of the jets. These subsamples are labelled Z+b, Z+c, and Z+l, respectively. The fraction of events after the dilepton+jet selection that contain at least one reconstructed jet with p_T > 25 GeV and |η| < 2.1, matched to a generator-level b (c) quark in ∆R, is about 4% (7%).
Jets originating from b quarks are tagged by taking advantage of the long b-hadron lifetime. The Simple Secondary Vertex (SSV) algorithm discriminant is a monotonic function of the three-dimensional flight distance significance from the reference primary vertex (i.e. the primary vertex with the highest quadratic sum of the p_T of its constituent tracks, Σ_tracks p_T²) to the chosen secondary vertex. Values of the discriminant greater than one indicate the presence of a secondary vertex. To improve the purity of the selection, only secondary vertices built from at least three tracks are considered, referred to as high purity (HP) vertices in the following. The number of HP secondary vertices per jet is shown in Fig. 1 (left), after the dilepton+jet selection. The leading jet is found to have at least one HP secondary vertex in 2.4 ± 0.2% of the dilepton+jet events. The distribution of the SSV HP discriminant is shown for the leading jet in Fig. 1 (right). The discriminant value to define b jets is chosen to be 2.0, such that the rate of tagging a light jet (mistagging rate) is below 0.1%. Further details can be found in Refs. [30,31].
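As a rough illustration of the two selection rules stated above, the sketch below picks the reference primary vertex by the largest Σ p_T² and applies the high-purity tagging requirement. The data layout (a vertex as a plain list of track p_T values) and the function names are hypothetical, and treating the working point as a strict "greater than 2.0" comparison is an assumption.

```python
def reference_vertex(vertices):
    """Index of the primary vertex with the largest sum of track pT^2.

    `vertices` is a list of vertices, each given as a list of track pT
    values in GeV (a layout chosen for this sketch).
    """
    return max(range(len(vertices)),
               key=lambda i: sum(pt * pt for pt in vertices[i]))


def is_b_tagged(discriminant, n_sv_tracks, threshold=2.0, min_tracks=3):
    """High-purity SSV tag: secondary vertex with >= 3 tracks and
    discriminant above the chosen working point (strict '>' is assumed)."""
    return n_sv_tracks >= min_tracks and discriminant > threshold
```

Summing p_T² rather than p_T favours vertices with a few hard tracks over pile-up vertices with many soft tracks: ten 1 GeV tracks give Σ p_T² = 10 GeV², while two 5 GeV tracks give 50 GeV².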
The b-tagging efficiencies and mistagging rates are measured in the data and MC samples, as functions of the p_T and η of the jet, using inclusive jet samples, as described in Ref. [32]. The tagging efficiency in the MC simulation is found to be higher than that in data, as can be seen in Fig. 1 (right) in the discrepancy between data and MC observed in the discriminant above 2.0. In all subsequent results, a weight is applied to the MC events to reproduce the b-tagging efficiency and mistagging rate measured in the data. This weight takes into account the appropriate data/MC scale factor for each b-tagged jet, depending on the generator-level flavour. The MC b-jet efficiency is extracted from the signal Z+b MC simulation.
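One simple way to build such a per-event weight is to multiply the data/MC scale factors of all b-tagged jets, looked up by generator-level flavour. The sketch below assumes exactly that; the scale-factor values are invented placeholders, and the dependence on jet p_T and η mentioned in the text is omitted for brevity.

```python
from math import prod

# Placeholder data/MC scale factors per generator-level flavour.
# These numbers are invented for illustration and are NOT from the paper.
EXAMPLE_SCALE_FACTORS = {"b": 0.95, "c": 0.95, "light": 1.10}


def btag_event_weight(tagged_jet_flavours, scale_factors=EXAMPLE_SCALE_FACTORS):
    """Product of the data/MC scale factors of all b-tagged jets in an event.

    `tagged_jet_flavours` lists the generator-level flavour ("b", "c" or
    "light") of each tagged jet. An event without tagged jets gets weight 1.
    """
    return prod(scale_factors[flavour] for flavour in tagged_jet_flavours)
```

A scale factor below one for b jets lowers the simulated tagging efficiency towards the value measured in data, consistent with the observation that the MC efficiency is higher than that in data.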
Before (after) b tagging, the reference vertex is found to be identical to the dilepton vertex in more than 99.7% (99.9%) of the events. Therefore, no explicit requirement of a common vertex is applied to the dilepton and b jet. The invariant mass and p_T distributions of the lepton pairs are shown in Fig. 2 after the dilepton+b-jet selection. A discrepancy between data and MC simulation is observed in the dilepton p_T distribution, especially in the region between 50 and 120 GeV. This hardening of the spectrum in data could come from higher-order corrections [33,34].
The p_T distribution for the b-tagged jet with the largest transverse momentum, hereafter called the leading-p_T b jet, is shown after the dilepton+b-jet selection in Fig. 3 (left). A deficit in MC events is seen at around 70 GeV. The distribution of ∆φ(Z, b jet) between the leading-p_T b jet and the lepton pair is shown in Fig. 3 (right). A deficit in MC events is seen in the region 2 < ∆φ(Z, b jet) < 2.7.
The cross section for the production of a Z/γ* boson in association with at least one hadron-level b jet is extracted from the selected numbers of dielectron+b-jet and dimuon+b-jet events, taking into account the b-jet purity P, the fraction f_tt̄ of tt̄ events, the b-tagging efficiency ε_b, the lepton efficiency ε_ℓ, the correction factor C_hadron for detector and reconstruction effects, and the lepton acceptance A_ℓ, using the following equation:
σ_hadron(Z/γ* + b, Z/γ* → ℓℓ) = N(ℓℓ+b) × (P − f_tt̄) / (A_ℓ × C_hadron × ε_ℓ × ε_b × L),
where N(ℓℓ+b) is the number of selected dilepton+b-jet events and L is the integrated luminosity.
The event-level b-jet purity P is extracted by means of a fit to the distribution of the secondary-vertex mass of the leading-p_T b jet in the data. The mass of the secondary vertex is defined as the invariant mass of all tracks originating from the secondary vertex, assuming the pion mass for each track. Separate sets of distributions for b, c, and light jets are derived from the MC simulation: (i) using the Z/γ* + jets sample and (ii) using inclusive jet samples reweighted to match the p_T and η spectra of the leading-p_T candidate jet in the dilepton+b-jet datasets. While the distributions from the inclusive jet samples were used as a baseline, the systematic uncertainty on the purity is calculated from the differences between the two sets and from the statistical uncertainty from the fit. The fraction of events in which the b hadron originates from gluon splitting is found to have a negligible impact on the shapes, but is nevertheless included in the shape uncertainty. The secondary-vertex mass is shown in Fig. 4 for the dielectron (left) and dimuon (right) channels.
As tt̄ production also yields genuine b jets, tt̄ events must be subtracted after the purity correction. The fraction f_tt̄ of tt̄ background events remaining after the selection is evaluated from the data using the dilepton invariant mass distribution. The tt̄ contribution in the region of the Z-boson mass peak, [60 GeV, 120 GeV], is extrapolated from the upper sideband region, which is dominated by tt̄ events.
The ratio of the numbers of tt̄ events in the two regions is taken from the MC simulation and corrected for discrepancies between data and simulation using the dileptonic tt̄ decay channel in the background-free eµbb̄ final state. Consequently, the systematic uncertainty on the tt̄ contribution is dominated by the uncertainty on the data/MC scale factor obtained with the dileptonic tt̄ decay measurement.
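The extraction procedure can be sketched numerically as below. Two points are assumptions of this sketch rather than statements from the text: the functional form, in which the yield is multiplied by (purity minus tt̄ fraction) and divided by the product of acceptance, correction factor, efficiencies, and luminosity, is inferred from the list of quantities given above; and the sideband extrapolation is written as a simple product of sideband yield, peak-to-sideband ratio, and data/MC scale factor. All function names are hypothetical and the test numbers are invented.

```python
def ttbar_in_peak(n_sideband, ratio_peak_over_sideband, data_mc_scale_factor=1.0):
    """Extrapolate the ttbar yield from the upper mass sideband into the Z peak."""
    return n_sideband * ratio_peak_over_sideband * data_mc_scale_factor


def cross_section_pb(n_selected, purity, f_ttbar, acceptance,
                     c_hadron, eff_lepton, eff_b, lumi_inv_pb):
    """Hadron-level cross section in pb (assumed functional form).

    sigma = N * (P - f_ttbar) / (A * C_hadron * eff_lepton * eff_b * L)
    """
    signal_yield = n_selected * (purity - f_ttbar)
    denominator = acceptance * c_hadron * eff_lepton * eff_b * lumi_inv_pb
    return signal_yield / denominator
```

For example, with invented inputs of 1000 selected events, P = 0.8, f_tt̄ = 0.1, A_ℓ = 0.5, C_hadron = 1.0, ε_ℓ = 0.5, ε_b = 0.4 and L = 1000 pb⁻¹, the function returns 700 / 100 = 7 pb.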

Other backgrounds, e.g. diboson and single-top processes, yield negligible contributions which are mostly removed after taking into account the purity. The MPI effects are included in the measurement and are expected to contribute to the cross section at a level of about 2% [35].
The average efficiency ε_ℓ of the dilepton selection is evaluated using the MC Z+b signal sample corrected event-by-event to reproduce the lepton efficiencies measured in data, as explained previously. Systematic uncertainties on ε_ℓ arise from the tag-and-probe analysis.
Similarly, the average efficiency ε_b of the b-jet selection is evaluated using the MC Z+b signal sample, corrected event-by-event to reproduce the b-tagging efficiency and mistagging rates measured in the data, as explained previously. Systematic uncertainties arise mainly from the uncertainty on the weights applied: (i) the data/MC scale factor uncertainty of 10% for b and c jets, and 10-20% for light jets; (ii) the mistagging rate uncertainty of 20-30%. These uncertainties are documented in Refs. [31,32].
The correction factor C_hadron is introduced to account for detector resolution and other reconstruction effects. It is computed from the MADGRAPH+PYTHIA MC signal sample by comparing the event yields at detector level to the event yields at generator level. The same kinematic selections as for the reconstructed leptons and jets are applied to the generator-level objects (including the selection in M_ℓℓ and ∆R(jet, leptons)). To ensure that the correct objects are selected at the detector level, only jets matched to generator-level b jets and leptons matched to the generator-level Z/γ*-decay leptons are considered. Hadron jets are defined using the anti-k_T clustering algorithm with a distance parameter of 0.5 applied to all stable particles except neutrinos after hadronisation. A hadron jet is labelled as a b jet if there is a b hadron within ∆R = 0.5 of the jet axis. Systematic uncertainties on C_hadron are derived by using SHERPA and aMC@NLO+HERWIG, and from the uncertainty on the jet energy resolution (14% for jets in the barrel, 22% in the endcaps) [27]. In the electron case, C_hadron contains a small acceptance term coming from the ECAL transition region being removed at the reconstructed level. In the muon case, it contains a correction from FSR.
Once the cross section is corrected back to the particle level, a final acceptance factor A_ℓ is applied to correct for the efficiency of the lepton acceptance selection: Z/γ* → ee with p_T^e > 25 GeV and |η^e| < 2.5, or Z/γ* → µµ with p_T^µ > 20 GeV and |η^µ| < 2.1, the electron and muon properties being defined before FSR. The systematic uncertainty on A_ℓ is evaluated with SHERPA, aMC@NLO+HERWIG, and MCFM [2-4].
The (p_T, η)-dependent jet-energy-scale uncertainty amounts to 3-5% of the p_T of the jet. Its effect on the cross section is estimated to be 2.5% using the MC signal sample, reweighted to match the data. To estimate the uncertainty due to the pile-up, the mean of the expected distribution used to reweight the MC simulation is shifted up and down by 0.6 interactions.
The estimates of the parameters defined above and the resulting cross sections are summarised in Table 1 for the ee+b and µµ+b selections. The contributions expected from the Z+b MC signal sample are 1308 ± 15 (stat.) and 2078 ± 19 (stat.) events for the ee+b and µµ+b selections, respectively, to be compared with the background-corrected data yields of 1288 ± 29 (stat.) ± 84 (syst.) and 2121 ± 37 (stat.) ± 124 (syst.) events. The theoretical uncertainties on the cross section results presented in Table 1 come from the systematic uncertainties on C_hadron and A_ℓ that were estimated using different MC models. Fractional uncertainties on the cross section results are summarised in Table 2.
Table 1: Extraction of the cross section σ_hadron(Z/γ* + b, Z/γ* → ℓℓ) for ℓℓ = ee or µµ. The uncertainty on each parameter contains all the systematic effects considered in the analysis, summarised in Table 2 and detailed in the main text. The first uncertainty on the cross section results is statistical, while the second is systematic, and the third accounts for limitations of the theory.
After correction for the b-tagging efficiencies and the lepton acceptance requirements, results for the ee and µµ selections are found to be in good agreement, 5.61 ± 0.13 (stat.) ± 0.73 (syst.) +0.24/−0.53 (theory) pb and 5.97 ± 0.10 (stat.) ± 0.73 (syst.) +0.25/−0.57 (theory) pb, respectively. The ee and µµ results are combined, taking into account correlated uncertainties as given in Table 2, and the final result is found to be 5.84 ± 0.08 (stat.) ± 0.72 (syst.) +0.25/−0.55 (theory) pb. The results are compared to the NLO calculations obtained with the MCFM tool [2-4]. The inclusive cross section at parton level is found to be σ_parton^MCFM = 4.73 ± 0.54 pb, using the same acceptance requirements for the leptons and parton jets. The uncertainty on the MCFM estimate comes from the CTEQ6M PDF set [36] and variations of the renormalisation and factorisation scales by factors of 0.5 and 2 around the mass of the Z boson, considering both correlated and anticorrelated combinations [37]. In order to extract the corresponding prediction at the hadron level, nonperturbative (NP) effects like hadronisation are quantified. A correction factor C_NP is computed from parton to hadron level using MADGRAPH+PYTHIA and aMC@NLO+HERWIG. Parton jets are defined using the anti-k_T clustering algorithm with a distance parameter of 0.5, applied to all quarks and gluons after showering but before hadronisation. A parton jet is labelled as a b jet if there is a b quark among its constituents. The correction is found to be C_NP = (84 ± 3)%, leading to a hadron-level-corrected NLO prediction of 3.97 ± 0.47 pb. The theoretical prediction in the context of this MCFM calculation is found to be smaller than the data measurement.
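The quoted numbers can be cross-checked with a short Python sketch. It rests on simplifying assumptions that differ from the full treatment in the paper: the channel combination uses an inverse-variance average weighted by the statistical uncertainties only, ignoring the correlated systematic uncertainties, and the uncertainty on the corrected MCFM prediction is obtained by adding relative uncertainties in quadrature as if they were uncorrelated. The function names are hypothetical.

```python
import math


def weighted_average(values, sigmas):
    """Inverse-variance weighted mean and its uncertainty."""
    weights = [1.0 / (s * s) for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, 1.0 / math.sqrt(total)


def apply_correction(value, value_err, factor, factor_err):
    """Multiply by a correction factor; add relative errors in quadrature."""
    corrected = value * factor
    rel_err = math.hypot(value_err / value, factor_err / factor)
    return corrected, corrected * rel_err


# ee and mumu results with their statistical uncertainties (pb).
combined, combined_stat = weighted_average([5.61, 5.97], [0.13, 0.10])

# Parton-level MCFM prediction (pb) times C_NP = (84 +/- 3)%.
hadron_level, hadron_level_err = apply_correction(4.73, 0.54, 0.84, 0.03)
```

Under these assumptions the statistics-only average rounds to 5.84 ± 0.08 pb, matching the combined central value and statistical uncertainty, and the corrected prediction is 4.73 × 0.84 ≈ 3.97 pb. The naive quadrature sum gives an uncertainty of about 0.48 pb, close to but not identical to the quoted 0.47 pb, which suggests the published value was obtained from unrounded inputs or a slightly different treatment.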
In conclusion, the production of b jets in association with a Z/γ* boson has been studied in 2.2 fb⁻¹ of proton-proton collision data at a centre-of-mass energy of 7 TeV recorded by the CMS detector. Measurements performed in the electron and muon Z/γ*-decay channels are combined. The MADGRAPH simulation interfaced with PYTHIA is used to derive the correction from the reconstructed level to the hadron level. The Z/γ* + b-jet cross section, with Z/γ* → ℓℓ where ℓℓ = ee or µµ, for events with the Z/γ* lepton pair invariant mass 60 < M_ℓℓ < 120 GeV, and at least one b jet at the hadron level with p_T > 25 GeV and |η| < 2.1, using anti-k_T jets reconstructed with a distance parameter of 0.5 and with a separation between the leptons and the jets of ∆R > 0.5, is found to be 5.84 ± 0.08 (stat.) ± 0.72 (syst.) +0.25/−0.55 (theory) pb. The distributions of the kinematic variables for the leading-p_T b jet and the Z/γ*-decay leptons are found to be in fair agreement with the predictions made by the MADGRAPH event generator interfaced with PYTHIA, and normalised to the integrated luminosity in data using a cross section value that includes the NNLO corrections to the inclusive Z/γ* production. The residual discrepancy may be a consequence of the higher-order terms absent in the MADGRAPH tree-level simulation in the variable-flavour scheme with massless b quarks.
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC machine. We thank the technical and administrative staff at CERN and other CMS institutes, and acknowledge support from:
[20] CMS Collaboration, "Electron Reconstruction and Identification at √s = 7 TeV", CMS Physics Analysis Summary CMS-PAS-EGM-10-004 (2010).