Measurement of b jet shapes in proton-proton collisions at sqrt(s) = 5.02 TeV

We present the first study of charged-hadron production associated with jets originating from b quarks in proton-proton collisions at a center-of-mass energy of 5.02 TeV. The data sample used in this study was collected with the CMS detector at the CERN LHC and corresponds to an integrated luminosity of 27.4 pb−1. To characterize the jet substructure, the differential jet shapes, defined as the normalized transverse momentum distribution of charged hadrons as a function of angular distance from the jet axis, are measured for b jets. In addition to the jet shapes, the per-jet yields of charged particles associated with b jets are also quantified, again as a function of the angular distance with respect to the jet axis. Extracted jet shape and particle yield distributions for b jets are compared with results for inclusive jets, as well as with the predictions from the PYTHIA event generator.

Submitted to the Journal of High Energy Physics. © 2020 CERN for the benefit of the CMS Collaboration. CC-BY-4.0 license. See Appendix A for the list of collaboration members. arXiv:submit/3199632 [hep-ex] 28 May 2020


Introduction
Jets, the collimated showers of particles produced by the fragmentation and hadronization of hard-scattered quarks or gluons, are long-established experimental probes of quantum chromodynamics (QCD) [1]. The internal structure of a jet, defined by the energy, momentum, and spatial distribution of its constituents, is sensitive to the details of the evolution from the initial hard scattering through fragmentation and hadronization into the observable final-state hadrons. The angular distributions of constituent particle yields and the jet shapes studied in this work are affected by parton fragmentation and hadronization. For particles in the core of the jet with high transverse momentum (pT) relative to the beam direction, the dominant contribution to these distributions is set by the initial branching of the hard-scattered parton, which is calculable in perturbative QCD (pQCD). However, for lower-pT particles and for those at larger radial distances from the jet direction, higher-order corrections and nonperturbative processes become important. Characterizing the effect of these additional contributions on the internal structure of jets remains challenging for theoretical calculations [2–4].
In this paper, the internal structure of jets is studied at the charged-particle level using proton-proton (pp) collision data at a center-of-mass energy of √s = 5.02 TeV. These data, corresponding to an integrated luminosity of 27.4 pb−1, were collected by the CMS experiment in 2015. For this study, b jets are defined by the presence of at least one b quark, which is inferred from the properties of b hadron decays. A b jet sample selected via a combined secondary vertex (CSV) discriminator [5] is composed of jets initiated by a single bottom quark, as well as a contribution from bb̄ pairs produced in gluon splitting. Jet-correlated charged-particle transverse momentum distributions, referred to as jet shapes, are measured as a function of the radial distance ∆r = √((∆η)² + (∆φ)²) from the jet axis. Here ∆η = η_jet − η_trk and ∆φ = φ_jet − φ_trk are the pseudorapidity and azimuthal differences between the jet and the charged particle, respectively. To extend the jet shape measurements further into the region where nonperturbative effects dominate, we use a jet-track correlation technique [6,7]. This method has been shown to reliably subtract the part of the event unrelated to the hard scattering (the underlying event), as well as the contribution of additional pp interactions in the same or nearby bunch crossings (pileup). We study the pT-differential distributions of jet shapes and particle yields for b jets. By comparing these measurements with the results for inclusive jets, and with PYTHIA [8,9] simulations of the b jet and inclusive jet shapes at large angles from the jet axis, this study provides new constraints on pQCD calculations, as well as on the nonperturbative contribution to jet shapes. It will also provide a baseline for future studies of the parton flavor dependence of the interaction between jets and the quark-gluon plasma [10], which is created in high-energy heavy ion collisions.
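The radial distance ∆r is the basic variable of every distribution below. A minimal sketch of computing it is given here; wrapping ∆φ into [−π, π) is an assumption on our part (standard practice, but not stated explicitly in the text), and the function names are illustrative only.

```python
import math

def delta_phi(phi_jet: float, phi_trk: float) -> float:
    """Azimuthal difference phi_jet - phi_trk, wrapped into [-pi, pi)."""
    d = phi_jet - phi_trk
    return (d + math.pi) % (2.0 * math.pi) - math.pi

def delta_r(eta_jet: float, phi_jet: float, eta_trk: float, phi_trk: float) -> float:
    """Radial distance sqrt(deta^2 + dphi^2) of a track from the jet axis."""
    d_eta = eta_jet - eta_trk
    d_phi = delta_phi(phi_jet, phi_trk)
    return math.hypot(d_eta, d_phi)

# A track 0.3 away in eta and 0.4 away in phi, on the other side of the phi = +-pi boundary
print(round(delta_r(0.5, 3.0, 0.2, 3.4 - 2.0 * math.pi), 6))
```

Without the wrapping step, a track just across the ±π boundary would appear to be almost 2π away from the jet axis.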

The CMS detector
The central feature of the CMS apparatus is a superconducting solenoid of 6 m internal diameter, providing a magnetic field of 3.8 T. Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of barrel and endcap sections. Two forward hadron (HF) steel and quartz-fiber calorimeters complement the barrel and endcap detectors, extending the calorimeter coverage from |η| < 3.0 to |η| < 5.2. Events of interest are selected using a two-tiered trigger system [11].
In the region |η| < 1.74, the HCAL cells have widths of 0.087 in both pseudorapidity η and azimuth φ. Within the central barrel region of |η| < 1.48, the HCAL cells map onto 5 × 5 ECAL crystal arrays to form calorimeter towers projecting radially outwards from the nominal interaction point. Within each tower, the energy deposits in ECAL and HCAL cells are summed to define the calorimeter tower energies, which are subsequently used in the particle flow algorithm to reconstruct the jet energies and directions [12]. In this work, jets are reconstructed within the η range of |η| < 1.6.
The silicon tracker measures charged particles within |η| < 2.5. It consists of 1440 silicon pixel and 15 148 silicon strip detector modules. For nonisolated particles with 1 < pT < 10 GeV in the barrel region, the track resolutions are typically 1.5% in pT and 25–90 (45–150) µm in the impact parameter direction transverse (longitudinal) to the colliding beams [13]. A detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [14].

Event selection and simulated event samples
The data used in this analysis were taken in a special low-luminosity running period in which the reduced pileup (approximately 1.5 interactions per bunch crossing, assuming a total inelastic cross section of 65 mb [15]) allowed precise measurements of the jet characteristics described in this paper. The jet samples are collected with a calorimeter-based trigger that uses the anti-kT jet clustering algorithm with a distance parameter of R = 0.4 [16]. This trigger requires events to contain at least one jet with pT > 80 GeV and is fully efficient for events containing jets with reconstructed pT > 90 GeV. The data selected by this trigger are referred to as "jet-triggered" and are used to study the jet-related particle yields and for a data-driven estimation of acceptance effects via an event mixing technique, as described in Section 5. To reduce contamination from non-collision events, such as calorimeter noise and beam-gas collisions, vertex and noise reduction selections are applied as described in Refs. [17,18]. These selections require events to have at least 3 GeV of energy in at least one calorimeter tower in the HF on each side of the interaction point, and a primary vertex (PV) with at least two tracks within 15 cm of the center of the nominal interaction region along the beam axis (|vz| < 15 cm).
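The offline event criteria above can be summarized as a single predicate. This is a minimal sketch only: the `Event` record and its field names are hypothetical summaries invented for illustration (not CMS software), and the default jet threshold is the 120 GeV analysis cut quoted in the Summary, which lies above the 90 GeV trigger plateau.

```python
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical per-event summary fields, for illustration only
    leading_jet_pt: float       # GeV, leading offline anti-kT R = 0.4 jet
    vz: float                   # cm, primary-vertex position along the beam axis
    pv_ntracks: int             # number of tracks attached to the primary vertex
    hf_max_tower_plus: float    # GeV, most energetic HF tower on the +z side
    hf_max_tower_minus: float   # GeV, most energetic HF tower on the -z side

def passes_selection(ev: Event, jet_pt_min: float = 120.0) -> bool:
    """Apply the quoted criteria: jet above threshold, HF coincidence, good PV."""
    jet_ok = ev.leading_jet_pt > jet_pt_min                  # above the 90 GeV trigger plateau
    hf_ok = ev.hf_max_tower_plus >= 3.0 and ev.hf_max_tower_minus >= 3.0
    vertex_ok = ev.pv_ntracks >= 2 and abs(ev.vz) < 15.0
    return jet_ok and hf_ok and vertex_ok
```

The HF requirement is applied as a coincidence (both sides), which is what suppresses single-sided beam-gas and noise events.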
Monte Carlo (MC) simulated event samples are used to evaluate the performance of the event reconstruction, particularly the track reconstruction efficiency and the jet energy response and resolution. Samples from two PYTHIA versions and tunes (version 6.424 with the Z2 tune [19] and version 8.230 with the CP5 tune [20]) are used to simulate the hard scattering, parton showering, and hadronization. The b jets in the PYTHIA simulations are identified by requiring the presence of a generator-level b quark in the simulated QCD jet sample. The GEANT4 (10.02p02) [21] toolkit is used to simulate the CMS detector response. An additional reweighting is applied to match the simulated vz distribution to that observed in data.
A widely used jet axis definition, the anti-kT E-scheme axis, is obtained by combining the input particles and intermediate clusters during the anti-kT clustering through simple addition of their four-momenta [24]. This type of jet axis is not infrared and collinear (IRC) safe [25] and can produce nonphysical radial structures in jet constituent distributions. To minimize the IRC effect on the jet direction determination, the jet axis in this work is recalculated using the winner-takes-all (WTA) recombination scheme [26,27], applied to the constituents found by the nominal anti-kT E-scheme algorithm for each jet.
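To illustrate how a WTA axis differs from simply summing four-momenta, the sketch below reclusters a jet's constituents with WTA merging: the merged object receives the summed pT but the direction of the harder input. The reclustering algorithm used for this step is not stated here, so a naive Cambridge/Aachen-style pairwise merging (closest pair in η-φ first) is assumed; the `PseudoJet` class is a hypothetical stand-in, and a real analysis would use FastJet instead of this O(n³) loop.

```python
import math
from dataclasses import dataclass

@dataclass
class PseudoJet:
    pt: float
    eta: float
    phi: float

def _dphi(a: float, b: float) -> float:
    """Azimuthal difference wrapped into [-pi, pi)."""
    return (a - b + math.pi) % (2.0 * math.pi) - math.pi

def wta_merge(a: PseudoJet, b: PseudoJet) -> PseudoJet:
    """Winner-takes-all: summed pT, direction of the harder input."""
    harder = a if a.pt >= b.pt else b
    return PseudoJet(a.pt + b.pt, harder.eta, harder.phi)

def wta_axis(constituents):
    """(eta, phi) of the WTA axis after pairwise angular reclustering (assumed C/A-like)."""
    objs = list(constituents)
    if not objs:
        raise ValueError("jet has no constituents")
    while len(objs) > 1:
        best = None
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                d2 = (objs[i].eta - objs[j].eta) ** 2 + _dphi(objs[i].phi, objs[j].phi) ** 2
                if best is None or d2 < best[0]:
                    best = (d2, i, j)
        _, i, j = best
        merged = wta_merge(objs[i], objs[j])
        objs = [o for k, o in enumerate(objs) if k not in (i, j)] + [merged]
    return objs[0].eta, objs[0].phi
```

Because the direction is always inherited from the harder branch, a soft emission cannot pull the axis away from the hard core, which is the property motivating its use here. Note that a collimated group of softer particles can still win over a single harder particle once it has been merged.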
The b jet candidates for this work are selected by the CSV discriminator [13,28]. The CSV discriminator is a multivariate classifier that uses information about reconstructed secondary vertices (SVs), as well as the impact parameters of the associated tracks with respect to the primary vertex, to discriminate b jets from charm and light-flavor jets. The working point selected for this analysis yields a b jet selection efficiency of 65% and a purity of 69% (the fraction of true b jets among all jets passing the CSV selection) in the simulated multijet sample, in which the background consists of charm and light-flavor jets. Possible differences in the purity between data and MC are assessed using a negative-tag technique [5]. This technique selects non-b jets, both in data and in simulation, using the same variables and methods as the standard CSV algorithm, and extracts a scale factor that quantifies the data-to-MC difference. A correction for the bias introduced by the discriminator is discussed in Section 5.
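The efficiency and purity quoted above are simple ratios of truth-matched counts in simulation. The sketch below computes both from per-jet flags. The optional `mistag_scale_factor` argument is our assumption of one simple way a data-to-MC mistag scale factor, such as the one from the negative-tag method, could enter the purity; it is not taken from the paper.

```python
def tagging_performance(is_true_b, is_tagged, mistag_scale_factor=1.0):
    """Return (efficiency, purity) of a b tagger from per-jet truth and tag flags.

    efficiency = tagged true b jets / all true b jets
    purity     = tagged true b jets / all tagged jets
    Mistagged (non-b) jets are optionally rescaled by a data/MC scale factor
    (an illustrative assumption).
    """
    pairs = list(zip(is_true_b, is_tagged))
    n_true_b = sum(1 for b, _ in pairs if b)
    n_tagged_b = sum(1 for b, t in pairs if b and t)
    n_mistagged = sum(1 for b, t in pairs if t and not b)
    denominator = n_tagged_b + mistag_scale_factor * n_mistagged
    if n_true_b == 0 or denominator <= 0:
        raise ValueError("need at least one true b jet and one tagged jet")
    efficiency = n_tagged_b / n_true_b
    purity = n_tagged_b / denominator
    return efficiency, purity
```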
In both data and simulation, charged particles are reconstructed using an iterative tracking method [13] based on hit information from both the pixel and strip subdetectors, which permits the reconstruction of charged particles within |η| < 2.4. The tracking efficiency is approximately 90% at pT = 1 GeV and no less than 90% for pT > 10 GeV. Tracks with pT > 1.0 GeV and |η| < 2.4 are used in this study.

Jet-track angular correlations
To study the distributions of charged particles associated with jets, two-dimensional (2D) distributions of the ∆η and ∆φ values of the tracks relative to the jet axis are constructed for six bins of pT^trk bounded by 1, 2, 3, 4, 8, 12, and 300 GeV. Each of these 2D correlations is normalized by N_jets, the number of jets in the sample. This procedure, the same as that used in Ref. [29], yields a per-jet averaged ∆η-∆φ distribution of raw charged-particle densities for each pT^trk bin:

$$ S(\Delta\eta, \Delta\phi) = \frac{1}{N_{\text{jets}}} \frac{d^2 N^{\text{same}}}{d\Delta\eta \, d\Delta\phi}, \qquad (1) $$

where N^same represents the yield of jet-track pairs from the same event. For the jet shape measurements, the 2D correlations are weighted by pT^trk on a per-track basis, producing a per-jet averaged ∆η-∆φ distribution of pT^trk with respect to the jet axis direction. After the raw 2D correlations S(∆η, ∆φ) are constructed, an event mixing method [7] is applied to account for the shapes of the single inclusive jet and track distributions and for the effects of the detector acceptance for tracks. For this correction, a mixed-event pair distribution ME(∆η, ∆φ) is constructed by pairing the jets from one event with the tracks from a different event, matched in the vertex position along the beam axis in 1 cm bins:

$$ ME(\Delta\eta, \Delta\phi) = \frac{1}{N_{\text{jets}}} \frac{d^2 N^{\text{mix}}}{d\Delta\eta \, d\Delta\phi}, \qquad (2) $$

where N^mix represents the number of jet-track pairs from the mixed events.
The per-jet associated yield is corrected for the jet-track pair acceptance via the following relation:

$$ \frac{1}{N_{\text{jets}}} \frac{d^2 N}{d\Delta\eta \, d\Delta\phi} = \frac{ME(0,0)}{ME(\Delta\eta, \Delta\phi)} \, S(\Delta\eta, \Delta\phi). \qquad (3) $$

The ratio ME(0, 0)/ME(∆η, ∆φ) is the normalized correction factor, and ME(0, 0) is the mixed-event yield for jet-track pairs that are approximately collinear and hence have the maximum pair acceptance.
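Equations (1) to (3) amount to a bin-by-bin division of two histograms. A minimal sketch with NumPy follows, assuming the pair counts are already filled into 2D arrays with an odd number of bins centered on ∆η = ∆φ = 0 (our assumption about the binning; the function name is hypothetical).

```python
import numpy as np

def acceptance_correct(same_counts, mixed_counts, n_jets, bin_area=1.0):
    """Pair-acceptance-corrected per-jet yield, following Eqs. (1)-(3).

    same_counts, mixed_counts: 2D arrays of jet-track pair counts in (deta, dphi)
    bins. Assumes an odd number of bins along each axis, centred on deta = dphi = 0.
    """
    same = np.asarray(same_counts, dtype=float)
    mixed = np.asarray(mixed_counts, dtype=float)
    s = same / (n_jets * bin_area)      # S(deta, dphi): per-jet pair density, Eq. (1)
    me = mixed / (n_jets * bin_area)    # ME(deta, dphi), Eq. (2); normalisation cancels below
    i0, j0 = me.shape[0] // 2, me.shape[1] // 2
    me00 = me[i0, j0]                   # ME(0, 0): collinear pairs, maximal pair acceptance
    # Correction factor ME(0,0)/ME(deta, dphi); bins with no mixed pairs get zero yield
    factor = np.divide(me00, me, out=np.zeros_like(me), where=me > 0)
    return s * factor                   # Eq. (3)
```

Since only the ratio ME(0, 0)/ME(∆η, ∆φ) enters, the overall normalization of the mixed-event distribution does not matter; only its shape does.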
After the signal pairs S(∆η, ∆φ) have been corrected using Eq. (3), the underlying event contribution and the uncorrelated background from tracks unrelated to the selected jets are removed using the measured charged-particle yields far from the jet axis, in a large-∆η region. The ∆φ distribution averaged over 1.5 < |∆η| < 2.5 is used to estimate the ∆φ dependence of the background contribution to the correlations over the entire |∆η| < 4.0 region, and it is subtracted from the acceptance-corrected yields of Eq. (3). The pair-acceptance-corrected signal pair distribution S(∆r) as a function of the radius ∆r is obtained by integrating Eq. (3) over a ring of radius ∆r.
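The sideband subtraction and ring integration can be sketched as follows. The sketch assumes the acceptance-corrected yield is stored per bin (densities multiplied by bin area) on a regular grid of ∆η and ∆φ bin centers, and that a bin belongs to a ring when its center falls inside it; both are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np

def subtract_sideband(yield2d, deta_centers, lo=1.5, hi=2.5):
    """Subtract the dphi-dependent background estimated from lo < |deta| < hi.

    yield2d has shape (n_deta, n_dphi); the sideband dphi profile is subtracted
    from every deta row.
    """
    y = np.asarray(yield2d, dtype=float)
    deta = np.asarray(deta_centers, dtype=float)
    side = (np.abs(deta) > lo) & (np.abs(deta) < hi)
    if not side.any():
        raise ValueError("no deta bins fall in the sideband")
    background = y[side].mean(axis=0)          # average over sideband rows, per dphi bin
    return y - background[np.newaxis, :]

def ring_integral(yield2d, deta_centers, dphi_centers, r_lo, r_hi):
    """Sum the per-bin yield over bins whose centres satisfy r_lo <= dr < r_hi."""
    deta, dphi = np.meshgrid(deta_centers, dphi_centers, indexing="ij")
    dr = np.hypot(deta, dphi)
    mask = (dr >= r_lo) & (dr < r_hi)
    return float(np.asarray(yield2d, dtype=float)[mask].sum())
```

Because the background estimate keeps its ∆φ dependence, any azimuthal structure that does not depend on ∆η (for example from the recoiling jet) is removed along with the flat underlying event.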
The signal of b-tagged jets, S_tag(∆r), is then corrected for residual light-flavor jet contamination. We use a partially data-driven decontamination procedure, expressed via the following equation:

$$ S_{\text{tag}}(\Delta r) = c_{\text{purity}} \, S^{*}_{\text{tag}}(\Delta r) + \left(1 - c_{\text{purity}}\right) S_{\text{mistagged}}(\Delta r), \qquad (4) $$

where S*_tag(∆r) and S_mistagged(∆r) are the signals of the (decontaminated) b jets and of the mistagged light-flavor jets, respectively. S_mistagged(∆r) is approximated by the inclusive jet-track correlation signal S_inclusive(∆r) from data, with a modification for the bias discussed later in this section. The purity c_purity, defined as the ratio of the number of tagged true b jets to the number of jets tagged by the CSV discriminator, is taken from simulation.
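Because the tagged sample is a purity-weighted mixture of true b jets and mistagged jets, the decontaminated signal follows by inverting that mixture bin by bin in ∆r. A minimal sketch, assuming per-jet-normalized signals so that the mixture weights are simply the jet fractions (the function name is hypothetical):

```python
import numpy as np

def decontaminate(s_tag, s_mistagged, purity):
    """Solve S_tag = c * S*_tag + (1 - c) * S_mistagged for S*_tag, bin by bin in dr."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must lie in (0, 1]")
    s_tag = np.asarray(s_tag, dtype=float)
    s_mis = np.asarray(s_mistagged, dtype=float)
    return (s_tag - (1.0 - purity) * s_mis) / purity
```

The division by c_purity amplifies any mismodeling of the mistagged template by a factor (1 − c_purity)/c_purity, roughly 0.45 for the 69% purity quoted above, which is why the purity scale factor is treated as a systematic source.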
Finally, simulation-based corrections are applied to account for the jet axis resolution, the tracking reconstruction efficiency, and the bias in the charged-particle yields and jet shapes introduced by the b tagging discriminator. A large fraction of the tracks associated with b jets originates from SVs and has a slightly different reconstruction efficiency from that of tracks originating from the PV. Therefore, we derive the efficiency corrections as a function of track pT and radial distance ∆r from the MC b jet simulation, by taking the ratio of correlated signals built with reconstructed tracks to those built with generated tracks. This bin-by-bin correction is then applied to the raw data distributions. Because the discriminator used for b tagging takes the properties of the SVs associated with the jet as input, it biases the jet selection towards jets with better SV resolution. This bias, though slight, is present in the distributions both for the true b jets selected by the tagger and for the mistagged light-flavor jets contaminating the sample. We calculate corrections for the tagging bias as a function of ∆r from MC simulation by constructing the following per-jet normalized ratios of radial distributions:

$$ R_{\text{light}}(\Delta r) = \frac{S_{\text{mistagged}}(\Delta r)}{S_{\text{inclusive}}(\Delta r)}, \qquad R_{\text{b}}(\Delta r) = \frac{S_{\text{all-b}}(\Delta r)}{S_{\text{tagged-b}}(\Delta r)}, \qquad (5) $$

where S_mistagged(∆r), S_inclusive(∆r), S_all-b(∆r), and S_tagged-b(∆r) represent the signals of tracks correlated with the mistagged jets, the inclusive light-flavor jets, all b jets, and the tagged b jets, respectively. Together, these procedures correct the data to the particle level, which can be compared directly with theoretical calculations.
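The simulation-based corrections are all multiplicative, bin-by-bin factors. The sketch below shows one plausible way to apply them, under stated assumptions: the efficiency is the MC reco/gen ratio per (pT^trk, ∆r) bin, the b jet bias factor is S_all-b/S_tagged-b, and the mistagged template rescales the data inclusive signal by S_mistagged/S_inclusive from MC. The orientation of the ratios and all function names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def safe_ratio(num, den, fill=1.0):
    """Element-wise num/den, returning `fill` wherever den == 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    out = np.full(np.broadcast(num, den).shape, fill)
    np.divide(num, den, out=out, where=den != 0)
    return out

def efficiency_correct(s_data, s_reco_mc, s_gen_mc):
    """Divide data by the MC track efficiency (reco/gen) in each bin."""
    eff = safe_ratio(s_reco_mc, s_gen_mc)
    return safe_ratio(s_data, eff, fill=0.0)

def tag_bias_factor(s_all_b_mc, s_tagged_b_mc):
    """Per-bin factor undoing the selection bias of the tagger for true b jets."""
    return safe_ratio(s_all_b_mc, s_tagged_b_mc)

def mistag_template(s_inclusive_data, s_mistagged_mc, s_inclusive_mc):
    """Approximate S_mistagged in data: inclusive data signal times an MC shape ratio."""
    return np.asarray(s_inclusive_data, dtype=float) * safe_ratio(s_mistagged_mc, s_inclusive_mc)
```

Bins with no MC content fall back to a factor of one, so a sparsely populated bin is left uncorrected rather than turned into a division by zero.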
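The fully corrected 2D correlations are integrated over annular rings in the ∆η-∆φ plane (as illustrated in Ref. [30]) to study the distributions of charged-particle yields with respect to the jet axis as a function of ∆r for the b and inclusive jet samples:

$$ Y(\Delta r) = \frac{1}{\delta r} \frac{1}{N_{\text{jets}}} \sum_{\text{jets}} N_{\text{trk}}(\Delta r_a < \Delta r < \Delta r_b), \qquad (6) $$

where N_trk is the number of charged particles from the jets. The jet shape distributions ρ(∆r), defined as

$$ \rho(\Delta r) = \frac{1}{\delta r} \, \frac{\sum_{\text{jets}} \sum_{\text{trk} \in [\Delta r_a, \Delta r_b)} p_{\text{T}}^{\text{trk}}}{\sum_{\text{jets}} \sum_{\text{trk} \in [0, 1)} p_{\text{T}}^{\text{trk}}}, \qquad (7) $$

where ∆r_a and ∆r_b define the annular edges of ∆r, δr = ∆r_b − ∆r_a, and pT^trk is the pT of the charged particles, are also examined.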
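A minimal sketch of Y(∆r) and ρ(∆r) is shown below. For simplicity it takes a flat list of (∆r, pT^trk) pairs pooled over all jets and assumed already background-free; in the analysis the sums come from the corrected 2D correlations instead. The normalization of ρ over 0 ≤ ∆r < 1 mirrors the denominator written above, and the function name is hypothetical.

```python
import numpy as np

def yields_and_shape(track_dr, track_pt, n_jets, edges):
    """Return (Y, rho) per annulus from pooled jet-track pairs.

    track_dr, track_pt : dr from the jet axis and pT of each associated track.
    edges              : annular bin edges dr_a < dr_b < ..., assumed to span [0, 1].
    np.histogram treats the last bin as closed on the right.
    """
    track_dr = np.asarray(track_dr, dtype=float)
    track_pt = np.asarray(track_pt, dtype=float)
    edges = np.asarray(edges, dtype=float)
    width = np.diff(edges)                                # delta r = dr_b - dr_a
    counts, _ = np.histogram(track_dr, bins=edges)
    pt_sum, _ = np.histogram(track_dr, bins=edges, weights=track_pt)
    y = counts / n_jets / width                           # per-jet yield density
    in_norm = (track_dr >= 0.0) & (track_dr < 1.0)
    rho = pt_sum / track_pt[in_norm].sum() / width        # integrates to ~1 over [0, 1)
    return y, rho
```

The normalization makes ρ(∆r) insensitive to the overall jet energy scale, so b and inclusive jets can be compared directly even if their charged-particle pT sums differ.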

Systematic uncertainties
A number of sources of systematic uncertainties are considered, including the tracking efficiency, tagging bias corrections, decontamination procedure, jet reconstruction, acceptance corrections, and background subtraction. The list of systematic uncertainties is summarized in Table 1, and the evaluation of each source of uncertainty is discussed below.
The track reconstruction efficiencies for b jet and inclusive jet tracks are compared to account for the uncertainty in the reconstruction efficiency for displaced tracks, and a maximum difference of about 4% is observed. The full magnitude of this difference is conservatively assigned to cover the MC-based tracking reconstruction uncertainty. To study possible differences in track reconstruction between data and simulation, D meson decays are used [31]: the ratio of the branching fractions of 3-prong to 5-prong D meson decays is measured in data with MC-based efficiency corrections and compared with the world-average value [32]. The observed difference is used to derive a 4% systematic uncertainty for this source. The full tracking-related uncertainty is obtained by adding these two contributions in quadrature.
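As a worked example, taking both tracking contributions at the 4% quoted above, the quadrature sum gives the combined tracking-related uncertainty:

$$ \sigma_{\text{trk}} = \sqrt{(4\%)^2 + (4\%)^2} = 4\sqrt{2}\,\% \approx 5.7\% $$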
The uncertainty in the correction for the bias induced by the CSV discriminator is dominated by the uncertainties in the relative contributions from gluon splitting and primary b quarks to the b jet sample. Jets originating from different mechanisms of b quark production (i.e., flavor creation, flavor excitation, and gluon splitting) can be studied individually in PYTHIA simulations. The b and b̄ jets from gluon splitting are more likely to be close to each other in direction, which results in broader shapes for those jets. The fraction of b jets from gluon splitting is about 45% based on QCD calculations [33]; in the kinematic range of this analysis, PYTHIA simulations give about 44%. The corresponding systematic uncertainty is evaluated by varying this fraction by 20% (as estimated in Ref. [34]), and the observed 5% change in the correction from this variation is propagated as an uncertainty.
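To show how a change in the gluon-splitting fraction propagates, the sketch below treats a b jet distribution as a mixture of a gluon-splitting template and a template for the other production mechanisms, and reports the largest relative shift per bin when the fraction is varied up and down. Interpreting "by 20%" as a relative variation, and applying it to a mixed signal rather than directly to the bias correction, are both assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def mix_templates(s_gsp, s_other, f_gsp):
    """Per-jet b jet distribution for a gluon-splitting fraction f_gsp."""
    return f_gsp * np.asarray(s_gsp, dtype=float) + (1.0 - f_gsp) * np.asarray(s_other, dtype=float)

def gsp_variation(s_gsp, s_other, f_nominal=0.44, rel_variation=0.20):
    """Largest relative change per bin when f_gsp is scaled by (1 +- rel_variation)."""
    nominal = mix_templates(s_gsp, s_other, f_nominal)
    up = mix_templates(s_gsp, s_other, f_nominal * (1.0 + rel_variation))
    down = mix_templates(s_gsp, s_other, f_nominal * (1.0 - rel_variation))
    return np.maximum(np.abs(up / nominal - 1.0), np.abs(down / nominal - 1.0))
```

Bins where the two templates differ most, such as the large-∆r region where gluon-splitting jets are broader, receive the largest variation.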
The decontamination procedure is affected by the uncertainties in the purity estimation. Using the negative tagging method (described in Section 4) we have derived the data-to-simulation scale factor, which amounted to about 7% difference in estimated contamination levels. We evaluate the related systematic uncertainty by comparing results obtained with and without the derived scale factor; less than 5% variation is observed in the correlation results. This 5% maximum variation is taken as a systematic uncertainty for the decontamination.
The overall jet energy scale (JES) is sensitive to the relative fraction of quark and gluon jets in the sample. The study in Ref. [35] found a JES uncertainty of 2% for jets, so we varied the energy threshold of the selected jets by this amount in both directions. The resulting uncertainty in the correlated track yields is below 2%, since the in-jet multiplicity and the jet fragmentation function change slowly with jet pT. The jet energy resolution (JER) data-to-MC difference is about 15% based on γ+jet studies [36]. This effect is accounted for by applying an additional 15% smearing to the reconstructed jet pT, and the resulting variation in the correlation distributions is below 3.5%. In total, a systematic uncertainty of 4% is assigned for the JER- and JES-related effects.
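The two variations can be sketched as follows. The JES shift simply scales jet pT by 1 ± 0.02 before re-applying the jet threshold. For the JER, we assume that "15% smearing" means worsening an existing relative resolution σ by a factor 1.15, which requires extra Gaussian smearing of width σ√(1.15² − 1); the resolution value itself is a hypothetical input. Note also that √(2² + 3.5²) ≈ 4.0, numerically consistent with the 4% total quoted above.

```python
import numpy as np

def jes_shift(jet_pt, rel_shift=0.02):
    """Return (up, down) copies of jet pT shifted by +-rel_shift."""
    jet_pt = np.asarray(jet_pt, dtype=float)
    return jet_pt * (1.0 + rel_shift), jet_pt * (1.0 - rel_shift)

def jer_smear(jet_pt, rel_resolution, extra=0.15, rng=None):
    """Worsen a relative pT resolution by a factor (1 + extra) via extra Gaussian smearing.

    rel_resolution is an assumed (hypothetical) relative resolution of the sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    jet_pt = np.asarray(jet_pt, dtype=float)
    sigma_extra = rel_resolution * np.sqrt((1.0 + extra) ** 2 - 1.0)
    return jet_pt * (1.0 + rng.normal(0.0, sigma_extra, size=jet_pt.shape))
```

Adding the extra smearing in quadrature, rather than scaling each jet's deviation, keeps the procedure usable on data-like samples where the true jet pT is unknown.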
The uncertainties from the mixed-event acceptance correction are estimated from the asymmetry of the sideband regions, defined as the difference between the sideband values at positive and negative ∆η. In addition, the sideband regions (1.5 < |∆η| < 2.5), which are far from the jet axis, are expected to contain no short-range correlation contributions and thus to be independent of ∆η. Any deviation from this expectation, together with the measured asymmetry, is used to quantify the related systematic uncertainty, which is found to be between 1 and 2%.
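The two sideband checks can be expressed as simple array operations. In this minimal sketch, the asymmetry is the per-∆φ difference between the mean positive-∆η and negative-∆η sidebands, and the flatness check is the largest deviation of the ∆φ-averaged sideband rows from their common mean. These specific estimators and the function name are our illustrative choices; the paper does not spell out the exact formulas.

```python
import numpy as np

def sideband_checks(yield2d, deta_centers, lo=1.5, hi=2.5):
    """Return (asymmetry per dphi bin, max deviation from flatness in deta)."""
    y = np.asarray(yield2d, dtype=float)          # shape (n_deta, n_dphi)
    deta = np.asarray(deta_centers, dtype=float)
    pos = (deta > lo) & (deta < hi)
    neg = (deta < -lo) & (deta > -hi)
    # Positive- minus negative-deta sideband, per dphi bin
    asym = np.abs(y[pos].mean(axis=0) - y[neg].mean(axis=0))
    # Each sideband row averaged over dphi should be independent of deta
    proj = y[pos | neg].mean(axis=1)
    nonflat = float(np.max(np.abs(proj - proj.mean())))
    return asym, nonflat
```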
Uncertainties associated with the background subtraction are evaluated from the average point-to-point difference between two sideband regions (1.5 < |∆η| < 2.0 and 2.0 < |∆η| < 2.5) after background subtraction. This uncertainty is roughly 3% in the lowest pT^trk bin, where the signal-to-background ratio is smallest, and decreases to negligible levels with increasing pT^trk. The systematic uncertainties from the different sources are treated as uncorrelated, and the total systematic uncertainty is obtained by adding the individual sources in quadrature.
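The background-subtraction check and the final combination can be sketched together. The first function compares the ∆φ profiles of the inner and outer sidebands of a background-subtracted 2D yield, and the second adds uncorrelated relative uncertainties in quadrature. Function names and input layout are illustrative assumptions.

```python
import math
import numpy as np

def background_subtraction_uncertainty(y_sub, deta_centers):
    """Average point-to-point difference between the 1.5-2.0 and 2.0-2.5 |deta| sidebands."""
    y = np.asarray(y_sub, dtype=float)                 # shape (n_deta, n_dphi)
    a = np.abs(np.asarray(deta_centers, dtype=float))
    inner = y[(a > 1.5) & (a < 2.0)].mean(axis=0)      # dphi profile, inner sideband
    outer = y[(a > 2.0) & (a < 2.5)].mean(axis=0)      # dphi profile, outer sideband
    return float(np.mean(np.abs(inner - outer)))

def total_systematic(relative_uncertainties):
    """Combine uncorrelated relative uncertainties in quadrature."""
    return math.sqrt(sum(u * u for u in relative_uncertainties))
```

After a perfect subtraction both sidebands should be consistent with zero, so any residual difference between them directly measures how well the flat-in-∆η background assumption holds.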

Results
Figure 1 presents the charged-particle yields for inclusive and b jets in proton-proton collisions as a function of the radial distance ∆r from the jet axis. The results are shown as stacked histograms indicating the contributions from the individual pT^trk intervals, with dots denoting the total summed yields in the region 1 < pT^trk < 12 GeV. High-pT charged particles are concentrated at small ∆r, while the larger-∆r region is dominated by low-pT charged particles. Figure 2 compares the radial distributions of the total charged-particle yields associated with inclusive and b jets in data and in PYTHIA simulations. The charged-particle yield distributions for both b and inclusive jets are generally described by the PYTHIA predictions, although PYTHIA 6.424 agrees better with the data than PYTHIA 8.230. Larger charged-particle yields are associated with b jets than with inclusive jets, particularly in the low-∆r region (see Fig. 2, right). This larger contribution of soft tracks at small ∆r implies different fragmentation patterns and decay kinematics for b jets and inclusive jets.

Figure 2: Charged-particle yields of inclusive jets (left) and b jets (middle) with 1 < pT^trk < 12 GeV, presented as functions of ∆r. Jets with pT > 120 GeV and charged particles with 1 < pT^trk < 12 GeV are used to construct the distributions for data (red), PYTHIA 6.426 (blue line), and PYTHIA 8.230 (green dashed line) simulations. The right plot shows the particle yield difference between b jets and inclusive jets as a function of ∆r for pp data and for PYTHIA 6.426 (blue line) and PYTHIA 8.230 (green dashed line) simulations. The shaded boxes represent the systematic uncertainties.
Measurements of the jet shapes ρ(∆r) are presented in Figs. 3 and 4. The left and right panels of Fig. 3 show the pT-differential ρ(∆r) distributions for inclusive and b jets, respectively. The comparison between data and PYTHIA simulations is presented in Fig. 4. While the small-∆r trends are mostly well described by the MC simulations for both jet selections, the distributions at larger radial distances are underestimated, indicating a deficit of soft radiative contributions in the simulation. The right panel of Fig. 4 shows the ratio of b to inclusive jet shapes for data and simulation.
The observed variations in the ratio of jet shapes indicate a shift of transverse momentum from small to large ∆r for the constituents of b jets relative to the particles from inclusive jets. These differences may arise from the dead-cone effect, the suppression of radiation from a quark with mass m_q and energy E_q at emission angles θ < m_q/E_q [37,38]; this effect is expected to be more pronounced in b jets than in inclusive jets, which mostly originate from light partons.
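For orientation, a rough estimate of the dead-cone angle for a b quark follows from the relation above, assuming m_b ≈ 4.8 GeV (an approximate b quark pole mass, not a value from this paper) and a quark energy of order the 120 GeV jet threshold (also an assumption; a b quark carrying only part of the jet energy would have a larger angle):

$$ \theta_0 \approx \frac{m_b}{E_b} \approx \frac{4.8\ \text{GeV}}{120\ \text{GeV}} = 0.04 $$

For light quarks the corresponding angle is negligible, which is why the effect is expected to distinguish b jets from inclusive jets.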
Monte Carlo simulations have difficulty capturing the details of the jet shapes for both inclusive and b jets at large angular distances, where nonperturbative contributions are likely to dominate. Additionally, a higher fraction of the transverse momentum is found at larger radial distances from the jet axis for b jets than for the inclusive jet sample. PYTHIA simulations show a similar tendency, albeit insufficient to fully capture this trend: PYTHIA 8.230 gives a slightly better description than PYTHIA 6.426 in the larger-∆r region, while PYTHIA 6.426 performs better than PYTHIA 8.230 in the small-∆r region, as illustrated in the right panel of Fig. 4. The data-to-PYTHIA discrepancy in the b-to-inclusive jet shape ratios at large radii may arise from differences in the gluon splitting contributions between data and simulation, as mentioned earlier. Monte Carlo studies show that b and b̄ jets from gluon splitting have significantly broader jet shapes than inclusive jets.

Figure 4: The jet shape distributions ρ(∆r) of inclusive jets (left) and b jets (middle), both with pT > 120 GeV and pT^trk > 1 GeV, presented as functions of ∆r for data (red markers), PYTHIA 6.426 (blue line), and PYTHIA 8.230 (green dashed line) simulations. The right plot shows the b-to-inclusive jet shape ratio as a function of ∆r for data, PYTHIA 6.426 (blue line), and PYTHIA 8.230 (green dashed line) simulations. The shaded boxes represent the systematic uncertainties.

Summary
The first measurements of charged-particle yields and jet shapes for b jets in proton-proton collisions are presented, using data collected with the CMS detector at the LHC at a center-of-mass energy of √s = 5.02 TeV. The correlations of charged particles with jets are studied using particles with transverse momentum pT^trk > 1 GeV and pseudorapidity |η| < 2.4, and jets with pT > 120 GeV and |η| < 1.6. Charged-particle yields associated with jets are presented as functions of the relative angular distance ∆r = √((∆η)² + (∆φ)²) from the jet axis. Larger numbers of associated charged particles at low ∆r are found for b jets than for inclusive jets, which are produced predominantly by gluons and light-flavor quarks. The trends observed in pp data for the particle yield distributions associated with both types of jets are reproduced by PYTHIA calculations (versions 6.426 and 8.230).
In addition to the charged-particle yields, we examine the jet transverse momentum profile ρ(∆r), defined using the distribution of charged particles in annular rings around the jet axis, with each particle weighted by its pT^trk. The measured shapes of b jets are broader than those of inclusive jets. The shapes for both types of jets are reproduced by PYTHIA calculations in the small-∆r region, with PYTHIA 6.426 giving better agreement. However, the measured transverse momentum distributions at larger ∆r are underestimated by the PYTHIA simulations for both b and inclusive jets, with larger data-to-simulation differences observed for b jets.
This result provides new constraints on perturbative quantum chromodynamics calculations for flavor dependence in parton fragmentation and gluon radiation, as well as the relative contributions of different processes to b quark production. These measurements are also expected to offer an important reference for future studies of flavor dependence for parton interactions with the quark-gluon plasma formed in relativistic heavy ion collisions.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMBWF and FWF … 123959, 124845, 124850, 125105, 128713, 128786, and 129058 (Hungary).

References

[11] CMS Collaboration, "The CMS trigger system", JINST 12 (2017) P01020, doi:10.1088/1748-0221/12/01/P01020, arXiv:1609.02366.