Measurement of beauty and charm production in pp collisions at $\sqrt{s}=5.02$ TeV via non-prompt and prompt D mesons

The $p_\mathrm{T}$-differential production cross sections of prompt and non-prompt (produced in beauty-hadron decays) D mesons were measured by the ALICE experiment at midrapidity ($|y|<0.5$) in proton--proton collisions at $\sqrt{s}=5.02~\mathrm{TeV}$. The data sample used in the analysis corresponds to an integrated luminosity of $(19.3\pm0.4)~\mathrm{nb^{-1}}$. D mesons were reconstructed from their decays $\mathrm{D^0 \to K^-\pi^+}$, $\mathrm{D^+\to K^-\pi^+\pi^+}$, and $\mathrm{D_s^+\to \phi\pi^+\to K^-K^+\pi^+}$ and their charge conjugates. Compared to previous measurements in the same rapidity region, the cross sections of prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons have an extended $p_\mathrm{T}$ coverage and total uncertainties reduced by a factor ranging from 1.05 to 1.6, depending on $p_\mathrm{T}$, allowing for a more precise determination of their $p_\mathrm{T}$-integrated cross sections. The results are well described by perturbative QCD calculations. The fragmentation fraction of heavy quarks to strange mesons divided by the one to non-strange mesons, $f_\mathrm{s}/(f_\mathrm{u}+f_\mathrm{d})$, is compatible for charm and beauty quarks and with previous measurements at different centre-of-mass energies and collision systems. The $\mathrm{b\overline{b}}$ production cross section per rapidity unit at midrapidity, estimated from non-prompt D-meson measurements, is $\mathrm{d}\sigma_\mathrm{b\overline{b}}/\mathrm{d} y|_\mathrm{|y|<0.5} = 34.5 \pm 2.4 (\mathrm{stat.}) ^{+4.7}_{-2.9} (\mathrm{tot. syst.})~\mu\mathrm{b}$. It is compatible with previous measurements at the same centre-of-mass energy and with the cross section predicted by perturbative QCD calculations.


Introduction
Measurements of the production of hadrons containing charm or beauty quarks in proton-proton (pp) collisions provide an important test of Quantum Chromodynamics (QCD) calculations. They also set the reference for the respective measurements in heavy-ion collisions, where the study of charm- and beauty-quark interactions with the quark-gluon plasma (QGP) constituents is a rich source of information about the medium properties and its inner dynamics [1]. Several measurements of charm and beauty production were carried out in pp collisions at $\sqrt{s} = 2.76$ The D- and B-meson data are generally described within uncertainties by perturbative QCD calculations at next-to-leading order with next-to-leading-log resummation, like FONLL [56,57] and GM-VFNS [58,59,60,61,62,63]. These calculations rely on the factorisation of soft (non-perturbative) and hard (perturbative) processes and calculate the transverse-momentum ($p_\mathrm{T}$) differential cross sections of charm- or beauty-hadron production as a convolution of a hard-scattering cross section at the partonic level, parton distribution functions (PDFs) of the colliding protons, and fragmentation functions (FF) modelling the transition from heavy quarks to heavy-flavour hadrons [64]. Recently, calculations with next-to-next-to-leading-order (NNLO) QCD radiative corrections also became available for beauty-quark production [65].
In this paper we report an update of the measurement of prompt (i.e. produced in the charm-quark fragmentation, either directly or through decays of excited open-charm and charmonium states) $\mathrm{D^+}$- and $\mathrm{D_s^+}$-meson production performed with ALICE in the rapidity interval $|y| < 0.5$ in pp collisions at $\sqrt{s} = 5.02~\mathrm{TeV}$ [3], obtained using an improved analysis technique. We also present a new measurement of the production of non-prompt $\mathrm{D^0}$, $\mathrm{D^+}$, and $\mathrm{D_s^+}$ mesons from beauty-hadron decays. The analysis of prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons is extended down to $p_\mathrm{T} = 0$ and $1~\mathrm{GeV}/c$, respectively. Non-prompt D mesons are measured down to $p_\mathrm{T} = 1~\mathrm{GeV}/c$ ($\mathrm{D^0}$ meson) and $2~\mathrm{GeV}/c$ ($\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons). These new results provide an improvement in terms of low-$p_\mathrm{T}$ reach and particle species accessed with respect to the previous measurement of non-prompt $\mathrm{D^0}$ production by CMS [30]. Such an extension is important to test perturbative QCD (pQCD) calculations over a wider $p_\mathrm{T}$ interval and to better determine the heavy-quark production cross section. These measurements also provide a reference for Pb-Pb collisions in the low-$p_\mathrm{T}$ region, which is relevant for addressing nuclear effects like shadowing, heavy-quark diffusion in the QGP, and the expected enhancement of the production of hadrons with strange quarks [66].
The paper is organised as follows. In Section 2 the ALICE apparatus and the analysed data sample are described. In Section 3 the analysis procedure is explained. Machine-learning algorithms are used to classify and separate the prompt and non-prompt D-meson signals and the combinatorial background. A data-driven procedure is used to calculate the fraction of prompt and non-prompt D mesons. The systematic uncertainties are discussed in Section 4. In Section 5 the results are presented. First, in Section 5.1, the $p_\mathrm{T}$-differential cross sections of prompt and non-prompt D mesons are reported and compared to theoretical predictions. Then, in Section 5.2, the ratios of the measured cross sections of the D-meson species are computed. In theoretical calculations, these ratios are sensitive mainly to the FF or the adopted hadronisation model. In particular, the comparison of the production rate of strange mesons with that of non-strange ones allows the determination of the ratio $f_\mathrm{s}/(f_\mathrm{u}+f_\mathrm{d})$, i.e. the fragmentation fraction of charm and beauty quarks to strange mesons divided by the one to non-strange mesons. In Section 5.3, by extrapolating the measured non-prompt D-meson cross sections down to $p_\mathrm{T} = 0$, an estimate of the production cross section of beauty quarks at midrapidity is obtained, which represents the most precise result to date in pp collisions at $\sqrt{s} = 5.02~\mathrm{TeV}$. A summary concludes the paper.
Non-prompt and prompt D-meson production in pp collisions at √ s = 5.02 TeV ALICE Collaboration

Experimental apparatus and data sample
The ALICE apparatus is composed of a central barrel, consisting of a set of detectors for particle reconstruction and identification at midrapidity, a forward muon spectrometer, and various forward and backward detectors for triggering and event characterisation. A complete description and an overview of their typical performance are presented in Refs. [67,68].
The D-meson decay products were reconstructed at midrapidity exploiting the tracking and particle identification capabilities of the central barrel detectors, which cover the full azimuth in the pseudorapidity interval $|\eta| < 0.9$. These detectors are embedded in a large solenoidal magnet that provides a magnetic field $B = 0.5$ T parallel to the beam direction. Charged-particle tracks are reconstructed from their hits in the Inner Tracking System (ITS) and the Time Projection Chamber (TPC). The ITS is the innermost ALICE detector; it consists of six cylindrical layers of silicon detectors, allowing a precise determination of the track parameters in the vicinity of the interaction point. The TPC provides up to 159 three-dimensional space points to reconstruct the charged-particle trajectory, as well as particle identification via the measurement of the specific ionisation energy loss $\mathrm{d}E/\mathrm{d}x$. The particle identification capabilities of the TPC are extended by the Time-Of-Flight (TOF) detector, which is used to measure the flight time of the charged particles from the interaction point. The event collision time is obtained using either the information from the T0 detector, the TOF detector, or a combination of the two. The T0 detector consists of two arrays of Čerenkov counters, located on both sides of the nominal interaction point, covering the pseudorapidity intervals $-3.28 < \eta < -2.97$ and $4.61 < \eta < 4.92$. The V0 detector was used for triggering and event selection. It is composed of two scintillator arrays, located on both sides of the nominal interaction point and covering the pseudorapidity intervals $-3.7 < \eta < -1.7$ and $2.8 < \eta < 5.1$.
The results presented in this paper were obtained from the analysis of the data sample of pp collisions at $\sqrt{s} = 5.02~\mathrm{TeV}$ collected in 2017. The events used in the analysis were recorded with a minimum bias (MB) trigger which required coincident signals in the two scintillator arrays of the V0 detector. Events were further selected offline in order to remove background due to the interaction between one of the beams and the residual gas present in the beam vacuum tube and other machine-induced backgrounds [68]. This selection was based on the timing information of the two V0 arrays and the correlation between the number of hits and track segments in the two innermost layers of the ITS, consisting of Silicon Pixel Detectors (SPD). In order to maintain a uniform acceptance in pseudorapidity, events were required to have a reconstructed collision vertex located within $\pm 10$ cm from the centre of the detector along the beam-line direction. Events with multiple primary vertices reconstructed from TPC and ITS tracks, due to pileup of several collisions, were rejected. The rejected pileup events amount to about 1% of the triggered events and the remaining undetected pileup is negligible in the present analysis. After the aforementioned selections, the data sample used for the analysis consists of about 990 million MB events, corresponding to an integrated luminosity $L_\mathrm{int} = (19.3 \pm 0.4)~\mathrm{nb^{-1}}$ [69].
The Monte Carlo samples utilised in the analysis were obtained by simulating pp collisions with the PYTHIA 8.243 event generator [70,71] (Monash-13 tune [72]), and propagating the generated particles through the detector using the GEANT3 package [73]. A $\mathrm{c\overline{c}}$- or $\mathrm{b\overline{b}}$-quark pair was required in each simulated PYTHIA pp event and D mesons were forced to decay into the hadronic channels of interest for the analysis. The luminous region distribution and the conditions of all the ALICE detectors in terms of active channels, gain, noise level, and alignment, and their evolution with time during the data taking, were taken into account in the simulations.

3 Analysis technique
$\mathrm{D^0}$, $\mathrm{D^+}$, and $\mathrm{D_s^+}$ mesons and their charge conjugates were reconstructed through the decay channels $\mathrm{D^0 \to K^-\pi^+}$ (with branching ratio $\mathrm{BR} = (3.950 \pm 0.031)\%$), $\mathrm{D^+ \to K^-\pi^+\pi^+}$ ($\mathrm{BR} = (9.38 \pm 0.16)\%$), and $\mathrm{D_s^+ \to \phi\pi^+ \to K^-K^+\pi^+}$ ($\mathrm{BR} = (2.24 \pm 0.08)\%$) [74]. The analysis was based on the reconstruction of decay-vertex topologies displaced from the interaction vertex. The separation induced by the weak decays of prompt $\mathrm{D^0}$, $\mathrm{D^+}$, and $\mathrm{D_s^+}$ is typically a few hundred $\mu\mathrm{m}$ ($c\tau \simeq 123$, $312$, and $151~\mu\mathrm{m}$, respectively [74]). Decay vertices of non-prompt D mesons, originating from beauty-hadron decays, are on average more displaced from the interaction vertex due to the larger mean proper decay lengths of beauty hadrons ($c\tau \simeq 500~\mu\mathrm{m}$ [74]) as compared to charm hadrons. Therefore, exploiting the selection of displaced decay-vertex topologies, it is possible not only to separate D mesons from the combinatorial background, but also non-prompt from prompt D mesons. D-meson candidates were built combining pairs or triplets of tracks with the proper charge signs, each with $|\eta| < 0.8$, $p_\mathrm{T} > 0.3~\mathrm{GeV}/c$, at least 70 (out of 159) associated space points in the TPC, a fit quality $\chi^2/\mathrm{ndf} < 2$ in the TPC (where ndf is the number of degrees of freedom involved in the track fit procedure), and a minimum of two (out of six) hits in the ITS, with at least one in either of the two innermost SPD layers, which provide the best pointing resolution. These track-selection criteria reduce the D-meson acceptance in rapidity, which drops steeply to zero for $|y| > 0.5$ at low $p_\mathrm{T}$ and for $|y| > 0.8$ at $p_\mathrm{T} > 5~\mathrm{GeV}/c$. Thus, only D-meson candidates within a fiducial acceptance region, $|y| < y_\mathrm{fid}(p_\mathrm{T})$, were selected. The $y_\mathrm{fid}(p_\mathrm{T})$ value was defined as a second-order polynomial function, increasing from 0.5 to 0.8 in the transverse-momentum range $0 < p_\mathrm{T} < 5~\mathrm{GeV}/c$, and as a constant term, $y_\mathrm{fid} = 0.8$, for $p_\mathrm{T} > 5~\mathrm{GeV}/c$.
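The fiducial-acceptance requirement can be illustrated with a short Python sketch. The polynomial coefficients below are an assumption: they are not quoted in this text and were chosen only because they reproduce the stated boundary values (0.5 at $p_\mathrm{T} = 0$ and 0.8 at $p_\mathrm{T} = 5~\mathrm{GeV}/c$). The function names are hypothetical.

```python
def y_fid(pt):
    """Fiducial rapidity limit as a function of pT (in GeV/c).

    ASSUMPTION: the coefficients are illustrative; the text only states that
    the function is a second-order polynomial going from 0.5 at pT = 0 to
    0.8 at pT = 5 GeV/c, and a constant 0.8 above.
    """
    if pt < 0:
        raise ValueError("pT must be non-negative")
    if pt > 5.0:
        return 0.8
    return 0.5 + (1.9 / 15.0) * pt - (0.2 / 15.0) * pt * pt


def in_fiducial_acceptance(y, pt):
    """True if a candidate with rapidity y and transverse momentum pt is kept."""
    return abs(y) < y_fid(pt)
```

With these assumed coefficients a candidate with $|y| = 0.6$ is rejected at $p_\mathrm{T} = 0$ but accepted at $p_\mathrm{T} = 1~\mathrm{GeV}/c$.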
To reduce the large combinatorial background and to separate the contribution of prompt and non-prompt D mesons, a machine-learning approach based on Boosted Decision Trees (BDT) was adopted. Two different implementations of the BDT algorithm, provided by the TMVA [75] and XGBoost [76] libraries, were considered. Signal samples of prompt and non-prompt D mesons for the BDT training were obtained from simulations based on the PYTHIA 8 event generator as described in Section 2. The background samples were obtained from the sidebands of the candidate invariant-mass distributions in the data. Before the training, loose kinematic and topological selections were applied to the D-meson candidates together with the particle identification (PID) of decay-product tracks. Pions and kaons were selected by requiring compatibility with the respective particle hypothesis within three standard deviations ($3\sigma$) between the measured and the expected signals for both the TPC $\mathrm{d}E/\mathrm{d}x$ and the time of flight. Tracks without TOF hits were identified using only the TPC information. For $\mathrm{D_s^+}$-meson candidates, an absolute difference of the reconstructed $\mathrm{K^+K^-}$ invariant mass with respect to the PDG world average of the $\phi$ meson [74] ($\Delta M_\mathrm{KK}$) under $15~\mathrm{MeV}/c^2$ was additionally required. The D-meson candidate information provided to the BDTs, as an input for the models to distinguish among prompt and non-prompt D mesons and background candidates, was mainly based on the displacement of the tracks from the primary vertex ($d_0$), the distance between the D-meson decay vertex and the primary vertex (decay length, $L$), the D-meson impact parameter, and the cosine of the pointing angle between the D-meson candidate line of flight (the vector connecting the primary and secondary vertices) and its reconstructed momentum vector. Additional variables related to the PID of decay tracks were used for $\mathrm{D^+}$ and $\mathrm{D_s^+}$ candidates. The value of $\Delta M_\mathrm{KK}$ was also considered for $\mathrm{D_s^+}$ candidates.
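As a rough illustration of the topological variables named above, the following Python sketch computes the decay length, the cosine of the pointing angle, and the $\Delta M_\mathrm{KK}$ requirement for a single candidate. The function names and the plain tuple representation of vertices and momenta are hypothetical; the $\phi$-meson mass default is the PDG world-average value in $\mathrm{MeV}/c^2$.

```python
import math


def decay_length(primary_vtx, secondary_vtx):
    """Distance L between the primary vertex and the D-meson decay vertex."""
    return math.sqrt(sum((s - p) ** 2 for p, s in zip(primary_vtx, secondary_vtx)))


def cos_pointing_angle(primary_vtx, secondary_vtx, momentum):
    """Cosine of the angle between the line of flight and the momentum vector."""
    flight = [s - p for p, s in zip(primary_vtx, secondary_vtx)]
    norm_flight = math.sqrt(sum(c * c for c in flight))
    norm_mom = math.sqrt(sum(c * c for c in momentum))
    if norm_flight == 0.0 or norm_mom == 0.0:
        raise ValueError("line of flight and momentum must be non-zero vectors")
    dot = sum(f * m for f, m in zip(flight, momentum))
    return dot / (norm_flight * norm_mom)


def passes_delta_m_kk(m_kk, m_phi=1019.461, max_diff=15.0):
    """Ds+ requirement: |M(KK) - M(phi)| below max_diff (all in MeV/c^2)."""
    return abs(m_kk - m_phi) < max_diff
```

A candidate whose momentum is aligned with its line of flight has a cosine of the pointing angle equal to one, which is the configuration expected for a genuine prompt D meson.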
Independent BDTs were trained for the different D-meson species and in different $p_\mathrm{T}$ intervals. Subsequently, they were applied to the real data sample in which the type of candidate is unknown. The BDT outputs are related to the candidate probability to be a non-prompt D meson or combinatorial background. Selections on the BDT outputs were optimised to obtain a high non-prompt D-meson fraction while maintaining a reliable signal extraction in the case of the non-prompt analysis. For the prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ analysis, selections were tuned to provide a large statistical significance for the signal and a small contribution of non-prompt candidates.

3.1 Measurement of non-prompt $\mathrm{D^0}$, $\mathrm{D^+}$, and $\mathrm{D_s^+}$ mesons
Samples enhanced with non-prompt candidates were selected by requiring a low candidate probability to be combinatorial background and a high probability to be non-prompt. The raw yields of $\mathrm{D^0}$, $\mathrm{D^+}$, and $\mathrm{D_s^+}$ mesons, including both particles and antiparticles, were extracted from binned maximum-likelihood fits to the invariant-mass ($M$) distributions. The raw yields could be extracted in transverse-momentum intervals in the range $1 < p_\mathrm{T} < 24~\mathrm{GeV}/c$ for $\mathrm{D^0}$ mesons, $2 < p_\mathrm{T} < 16~\mathrm{GeV}/c$ for $\mathrm{D^+}$ mesons, and $2 < p_\mathrm{T} < 12~\mathrm{GeV}/c$ for $\mathrm{D_s^+}$ mesons. The fit function was composed of a Gaussian for the description of the signal and of an exponential term for the background. To improve the stability of the fits, the widths of the D-meson signal peaks were fixed to the values extracted from data samples dominated by prompt candidates, given the naturally larger abundance of prompt compared to non-prompt D mesons. For the $M(\mathrm{KK}\pi)$ distribution, an additional Gaussian was used to describe the peak due to the decay $\mathrm{D^+ \to K^-K^+\pi^+}$, with a branching ratio of $(9.68 \pm 0.18) \times 10^{-3}$ [74], present at a lower invariant-mass value than the $\mathrm{D_s^+}$-meson signal peak.

Figure 1: Invariant-mass distributions of $\mathrm{D^0}$, $\mathrm{D^+}$, and $\mathrm{D_s^+}$ candidates and charge conjugates in the $1 < p_\mathrm{T} < 2~\mathrm{GeV}/c$, $8 < p_\mathrm{T} < 10~\mathrm{GeV}/c$, and $2 < p_\mathrm{T} < 4~\mathrm{GeV}/c$ intervals, respectively. The blue solid lines show the total fit functions as described in the text and the red dashed lines are the combinatorial background. In the case of the $\mathrm{D^0}$ candidates, the grey dashed line represents the combinatorial background with the contribution of the reflections. The raw-yield ($S$) values are reported together with their statistical uncertainties resulting from the fit. The fraction of non-prompt candidates in the measured raw yield is reported with its statistical and systematic uncertainties.
For the $\mathrm{D^0}$ meson, the contribution of signal candidates to the invariant-mass distribution with the wrong mass assigned to the $\mathrm{D^0}$-decay tracks (reflections) was included in the fit. It was estimated based on the invariant-mass distributions of the reflected signal in the simulation, which were described as the sum of two Gaussian functions. The contribution of reflections to the raw yield is between about 0.5% and 4%, depending on $p_\mathrm{T}$. Examples of invariant-mass distributions together with the result of the fits and the estimated non-prompt fractions are reported in Fig. 1, for the $1 < p_\mathrm{T} < 2~\mathrm{GeV}/c$, $8 < p_\mathrm{T} < 10~\mathrm{GeV}/c$, and $2 < p_\mathrm{T} < 4~\mathrm{GeV}/c$ intervals of the $\mathrm{D^0}$, $\mathrm{D^+}$, and $\mathrm{D_s^+}$ candidates, respectively. The procedure used to calculate the fraction of non-prompt candidates present in the extracted raw yields is described in Section 3.2. The measured raw yields, although dominated by non-prompt candidates, still contain a residual contribution of prompt D mesons which satisfy the BDT-based selections. The statistical significance of the observed signals, $S/\sqrt{S+B}$, varies from 4 to 10, depending on the D-meson species and on the $p_\mathrm{T}$ interval.
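The $p_\mathrm{T}$-differential cross section of non-prompt D mesons was computed for each $p_\mathrm{T}$ interval as
$$\frac{\mathrm{d}^2\sigma^\mathrm{D}}{\mathrm{d}p_\mathrm{T}\,\mathrm{d}y} = \frac{1}{c_{\Delta y}(p_\mathrm{T})\,\Delta p_\mathrm{T}} \times \frac{1}{\mathrm{BR}} \times \frac{\frac{1}{2}\, f_\mathrm{non\text{-}prompt}(p_\mathrm{T})\, N^\mathrm{D+\overline{D},raw}(p_\mathrm{T})\big|_{|y|<y_\mathrm{fid}(p_\mathrm{T})}}{(\mathrm{Acc}\times\varepsilon)_\mathrm{non\text{-}prompt}(p_\mathrm{T})} \times \frac{1}{L_\mathrm{int}}. \qquad (1)$$
The raw-yield values (sum of particles and antiparticles, $N^\mathrm{D+\overline{D},raw}$) were divided by a factor of two and multiplied by the non-prompt fraction $f_\mathrm{non\text{-}prompt}$ to obtain the charge-averaged yields of non-prompt D mesons. Furthermore, they were divided by the acceptance times efficiency of non-prompt D mesons $(\mathrm{Acc}\times\varepsilon)_\mathrm{non\text{-}prompt}$, the BR of the decay channel, the width of the $p_\mathrm{T}$ interval ($\Delta p_\mathrm{T}$), the correction factor for the rapidity coverage $c_{\Delta y}$ (see below), and the integrated luminosity $L_\mathrm{int} = N_\mathrm{ev}/\sigma_\mathrm{MB}$, where $N_\mathrm{ev}$ is the number of analysed events and $\sigma_\mathrm{MB} = (50.9 \pm 0.9)$ mb is the cross section for the MB trigger condition [69].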
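The ingredients listed above can be combined in a few lines of Python. This is a minimal sketch following the verbal description of the correction chain; the function and argument names are hypothetical, and any numbers passed to it other than the event count and $\sigma_\mathrm{MB}$ quoted in the text would be placeholders rather than measured values.

```python
def integrated_luminosity(n_events, sigma_mb):
    """L_int = N_ev / sigma_MB; with sigma_MB in nb the result is in nb^-1."""
    return n_events / sigma_mb


def non_prompt_cross_section(raw_yield, f_non_prompt, acc_eff, branching_ratio,
                             delta_pt, c_delta_y, lumi):
    """pT-differential cross section of non-prompt D mesons in one pT interval.

    raw_yield       : particles + antiparticles from the invariant-mass fit
    f_non_prompt    : fraction of non-prompt D mesons in the raw yield
    acc_eff         : (Acc x eff) of non-prompt D mesons
    branching_ratio : BR of the decay channel
    delta_pt        : width of the pT interval (GeV/c)
    c_delta_y       : correction factor for the rapidity coverage
    lumi            : integrated luminosity (e.g. in nb^-1)
    """
    charge_averaged_yield = 0.5 * f_non_prompt * raw_yield
    return charge_averaged_yield / (
        c_delta_y * delta_pt * acc_eff * branching_ratio * lumi
    )
```

For example, `integrated_luminosity(990e6, 50.9e6)` (about 990 million events and $\sigma_\mathrm{MB} = 50.9$ mb expressed in nb) gives roughly $19.4~\mathrm{nb^{-1}}$, which agrees with the quoted $(19.3 \pm 0.4)~\mathrm{nb^{-1}}$ within its uncertainty given that the event count is only approximate.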
The $(\mathrm{Acc}\times\varepsilon)$ correction was obtained from simulations, described in Section 2, using samples not employed in the BDT training. The $(\mathrm{Acc}\times\varepsilon)$ factors, computed for the selections used in the final result, as a function of $p_\mathrm{T}$ for prompt and non-prompt $\mathrm{D^0}$, $\mathrm{D^+}$, and $\mathrm{D_s^+}$ mesons within the fiducial acceptance region are shown in Fig. 2, along with the ratios of the non-prompt over prompt factors. The selection applied to obtain the non-prompt enhanced samples strongly suppresses the prompt D-meson efficiency, while the acceptance is the same for prompt and non-prompt D mesons. The prompt D-meson acceptance times efficiency is smaller than the one of non-prompt D mesons by a factor varying from 5 to 700, depending on the D-meson species and the $p_\mathrm{T}$ interval. The difference between the $(\mathrm{Acc}\times\varepsilon)$ factors of prompt and non-prompt mesons is less pronounced for $\mathrm{D^+}$ than for $\mathrm{D^0}$, due to the more similar lifetimes of $\mathrm{D^+}$ and beauty hadrons. For $\mathrm{D_s^+}$ mesons, looser selections than those used for the other D-meson species were applied due to the lower yield of $\mathrm{D_s^+}$ mesons, leading to a smaller difference between the $(\mathrm{Acc}\times\varepsilon)$ factors of the prompt and non-prompt components.
The correction factor for the rapidity acceptance $c_{\Delta y}$ was computed with FONLL perturbative QCD calculations, which have shown a good description of the rapidity dependence of the D-meson cross section [3, 33]. The correction factor was defined as the ratio between the generated D-meson yield in $\Delta y = 2\,y_\mathrm{fid}$ and that in $|y| < 0.5$. Calculations of $c_{\Delta y}$ based on the PYTHIA 8 event generator were in agreement within 1%. The $f_\mathrm{non\text{-}prompt}$ fraction was calculated with a novel data-driven approach, which is described in Section 3.2.

Data-driven estimation of non-prompt fraction
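The fraction $f_\mathrm{non\text{-}prompt}$ of non-prompt D mesons in the raw yield was estimated by sampling the raw yield at different values of the BDT output related to the candidate probability of being a non-prompt D meson. In this way, a set of raw yields $Y_i$ with different contributions of prompt and non-prompt D mesons was obtained. These raw yields can be related to the corrected yields of prompt ($N_\mathrm{prompt}$) and non-prompt ($N_\mathrm{non\text{-}prompt}$) D mesons via the acceptance-times-efficiency factors as follows
$$(\mathrm{Acc}\times\varepsilon)^\mathrm{prompt}_i \times N_\mathrm{prompt} + (\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_i \times N_\mathrm{non\text{-}prompt} - Y_i = \delta_i. \qquad (2)$$
In the above equation, $\delta_i$ represents a residual that accounts for the equation not holding exactly due to the uncertainty on $Y_i$, $(\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_i$, and $(\mathrm{Acc}\times\varepsilon)^\mathrm{prompt}_i$. The definition of $n$ selections leads to the system of equations
$$\begin{pmatrix} (\mathrm{Acc}\times\varepsilon)^\mathrm{prompt}_1 & (\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_1 \\ \vdots & \vdots \\ (\mathrm{Acc}\times\varepsilon)^\mathrm{prompt}_n & (\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_n \end{pmatrix} \times \begin{pmatrix} N_\mathrm{prompt} \\ N_\mathrm{non\text{-}prompt} \end{pmatrix} - \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} \delta_1 \\ \vdots \\ \delta_n \end{pmatrix}, \qquad (3)$$
that can be exactly solved in the case of two equations (assuming $\delta_i = 0$). With $n$ selections, the $N_\mathrm{prompt}$ and $N_\mathrm{non\text{-}prompt}$ parameters are obtained by minimising the $\chi^2$
$$\chi^2 = \boldsymbol{\delta}^\mathrm{T}\, \mathbf{C}^{-1}\, \boldsymbol{\delta}, \qquad (4)$$
where $\boldsymbol{\delta}^\mathrm{T}$ is the row vector of residuals and $\mathbf{C}$ the covariance matrix accounting for the uncertainties inherent to each equation. The variances $\sigma_i^2$ were calculated from the statistical uncertainty on the raw yields and efficiencies as
$$\sigma_i^2 = \sigma_{Y_i}^2 + N_\mathrm{prompt}^2\, \sigma^2_{(\mathrm{Acc}\times\varepsilon)^\mathrm{prompt}_i} + N_\mathrm{non\text{-}prompt}^2\, \sigma^2_{(\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_i}.$$
Given that the corrected yields are unknown variables, an iterative procedure was used to define the total uncertainty: in the first step only the uncertainty on the raw yields was taken into account, while from the second iteration the corrected yields $N_\mathrm{prompt}$ and $N_\mathrm{non\text{-}prompt}$ obtained in the previous step were also used. In the covariance terms $\sigma_{i,j}$ the correlation coefficient was assumed to be
$$\rho_{i,j} = \sigma_i/\sigma_j \quad (\sigma_i \le \sigma_j).$$
This assumption is justified by the fact that the BDT response is sampled monotonically, so that the $n$ selections are ordered in such a way that the $i$-th selected sample is completely included in the $(i-1)$-th one. For $\mathrm{D^0}$ mesons, only the equation for the strictest set of selections was defined as in Eq. 2. All the others were expressed in terms of the difference between the $(i-1)$-th and the $i$-th raw yields,
$$\left[(\mathrm{Acc}\times\varepsilon)^\mathrm{prompt}_{i-1} - (\mathrm{Acc}\times\varepsilon)^\mathrm{prompt}_i\right] N_\mathrm{prompt} + \left[(\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_{i-1} - (\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_i\right] N_\mathrm{non\text{-}prompt} - (Y_{i-1} - Y_i) = \delta_i.$$
In this case, the covariance terms were assumed to be zero, resulting in a diagonal covariance matrix.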
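The minimisation can be sketched in Python for the simpler diagonal-covariance case used for $\mathrm{D^0}$ mesons, where the $\chi^2$ reduces to a weighted least-squares problem in two unknowns with a closed-form solution. This is an illustrative sketch and not the analysis code: the function names are hypothetical, the uncertainties are supplied by the caller, and both the iterative update of the variances and the off-diagonal correlation terms are left out.

```python
def to_differences(values):
    """[v_1, ..., v_n] -> [v_1 - v_2, ..., v_(n-1) - v_n, v_n].

    Only the strictest selection (last entry) is kept as is; every other
    entry becomes the difference between consecutive nested selections.
    """
    diffs = [values[i - 1] - values[i] for i in range(1, len(values))]
    diffs.append(values[-1])
    return diffs


def fit_prompt_nonprompt(raw_yields, eff_prompt, eff_nonprompt, sigmas):
    """Minimise chi2 = sum_i (a_i*Np + b_i*Nnp - Y_i)^2 / sigma_i^2.

    a_i, b_i are the prompt and non-prompt (Acc x eff) factors. A diagonal
    covariance matrix is ASSUMED; the 2x2 normal equations are solved exactly.
    """
    s_aa = s_ab = s_bb = s_ay = s_by = 0.0
    for y, a, b, s in zip(raw_yields, eff_prompt, eff_nonprompt, sigmas):
        w = 1.0 / (s * s)
        s_aa += w * a * a
        s_ab += w * a * b
        s_bb += w * b * b
        s_ay += w * a * y
        s_by += w * b * y
    det = s_aa * s_bb - s_ab * s_ab
    if det == 0.0:
        raise ValueError("selections do not constrain both components")
    n_prompt = (s_bb * s_ay - s_ab * s_by) / det
    n_nonprompt = (s_aa * s_by - s_ab * s_ay) / det
    return n_prompt, n_nonprompt
```

Applying `to_differences` to the raw yields and to both efficiency lists before calling `fit_prompt_nonprompt` reproduces the difference formulation; handling the correlated case would require the full covariance matrix and a general linear solver instead of the closed-form 2x2 solution.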
The fraction of non-prompt D mesons in the raw yield can be computed for any set of selections $i$ from the corrected yields obtained from the $\chi^2$ minimisation as
$$\left(f_\mathrm{non\text{-}prompt}\right)_i = \frac{(\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_i\, N_\mathrm{non\text{-}prompt}}{(\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_i\, N_\mathrm{non\text{-}prompt} + (\mathrm{Acc}\times\varepsilon)^\mathrm{prompt}_i\, N_\mathrm{prompt}}. \qquad (7)$$
Rather than from the $N_\mathrm{non\text{-}prompt}$ parameter obtained from the minimisation of the $\chi^2$ in Eq. 4, the final values of the non-prompt D-meson cross sections were determined by choosing a selection providing a high non-prompt component and a good signal extraction, as described in Section 3, and by calculating its respective $f_\mathrm{non\text{-}prompt}$ fraction according to Eq. 7. This approach facilitates the determination of the systematic uncertainty. Figure 3 shows an example of raw-yield distribution as a function of the BDT-based selection employed in the minimisation procedure for $\mathrm{D^0}$ mesons with $1 < p_\mathrm{T} < 2~\mathrm{GeV}/c$ (top left panel), $\mathrm{D^+}$ mesons with $8 < p_\mathrm{T} < 10~\mathrm{GeV}/c$ (top right panel), and $\mathrm{D_s^+}$ mesons with $2 < p_\mathrm{T} < 4~\mathrm{GeV}/c$ (bottom left panel). The leftmost data point of each distribution is the raw yield corresponding to the loosest selection on the BDT output related to the candidate's probability of being a non-prompt D meson, while the rightmost one corresponds to the strictest selection, which is expected to preferentially select non-prompt D mesons. The prompt and non-prompt components, obtained for each BDT-based selection from the minimisation procedure as $(\mathrm{Acc}\times\varepsilon)^\mathrm{prompt}_i\, N_\mathrm{prompt}$ and $(\mathrm{Acc}\times\varepsilon)^\mathrm{non\text{-}prompt}_i\, N_\mathrm{non\text{-}prompt}$, are represented by the red and blue filled histograms, respectively, while their sum is reported by the green histograms. In general, the $f_\mathrm{non\text{-}prompt}$ values decrease with $p_\mathrm{T}$, because at high $p_\mathrm{T}$ a less stringent selection on the BDT probability of being non-prompt is needed to preserve a sufficient number of candidates to perform the invariant-mass analysis.
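The non-prompt fraction for a given selection follows directly from the two corrected yields and the corresponding efficiencies. A minimal Python sketch, with a hypothetical function name and purely illustrative inputs:

```python
def non_prompt_fraction(eff_prompt, eff_nonprompt, n_prompt, n_nonprompt):
    """Fraction of non-prompt D mesons in the raw yield for one selection.

    eff_prompt, eff_nonprompt : (Acc x eff) factors of that selection
    n_prompt, n_nonprompt     : corrected yields from the chi2 minimisation
    """
    non_prompt_part = eff_nonprompt * n_nonprompt
    total = non_prompt_part + eff_prompt * n_prompt
    if total <= 0.0:
        raise ValueError("the expected raw yield must be positive")
    return non_prompt_part / total
```

With illustrative corrected yields of 10000 prompt and 500 non-prompt mesons, tightening the selection so that the prompt efficiency drops from 0.10 to 0.01 while the non-prompt one only drops from 0.30 to 0.20 raises the fraction from about 0.13 to 0.5.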

3.3 Measurement of prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons
The measurement of prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons follows the same procedure described in Section 3.1. The same machine-learning models trained for the non-prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ analysis were employed.
Samples containing a small fraction of non-prompt candidates were obtained by selecting on the BDT outputs and requiring a low candidate probability of being either combinatorial background or non-prompt. The raw yields of $\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons were extracted in the transverse-momentum intervals $0 < p_\mathrm{T} < 36~\mathrm{GeV}/c$ and $1 < p_\mathrm{T} < 24~\mathrm{GeV}/c$, respectively, extending the measurement to lower $p_\mathrm{T}$ with respect to the previously published results [3]. The employed fit configurations were the same as for the non-prompt analysis, except that the widths of the $\mathrm{D^+}$- and $\mathrm{D_s^+}$-meson signal peaks were unconstrained in the fit. Moreover, for $\mathrm{D^+}$ mesons in $0 < p_\mathrm{T} < 1~\mathrm{GeV}/c$ a third-order polynomial function was used to describe the combinatorial background, instead of an exponential function. Figure 4 shows the invariant-mass distributions, together with the result of the fits, in the $0 < p_\mathrm{T} < 1~\mathrm{GeV}/c$ and $1 < p_\mathrm{T} < 2~\mathrm{GeV}/c$ intervals for $\mathrm{D^+}$ and $\mathrm{D_s^+}$ candidates, respectively. The statistical significance of the observed signals varies from about 3 to 40 for $\mathrm{D^+}$ mesons and from 4 to 14 for $\mathrm{D_s^+}$ mesons, depending on the $p_\mathrm{T}$ interval. The $S/B$ values obtained are 0.07 to 2.5 (0.31 to 3.1) for $\mathrm{D^+}$ ($\mathrm{D_s^+}$) mesons, depending on $p_\mathrm{T}$. The performance of the adopted BDT-based selections was compared with that obtained in the previous study [3]. An improvement of the statistical significance by a factor of 1.1 to 2 (1.2 to 1.7) for $\mathrm{D^+}$ ($\mathrm{D_s^+}$) mesons in the common $p_\mathrm{T}$ regions of the two measurements is observed, implying a reduction of statistical uncertainties by the same factor. Furthermore, the efficiency for prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons is higher in the BDT-based analysis by a factor of 1.2 to 4 and 1.7 to 2.2, respectively, depending on the $p_\mathrm{T}$ interval.
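The link between significance and statistical precision can be made explicit with a small Python sketch. It assumes a simple counting picture in which the uncertainty on the signal is $\sqrt{S+B}$, so that the relative statistical uncertainty is the inverse of the significance; the fitted uncertainties of the analysis need not follow this exactly, and the function names and inputs are hypothetical.

```python
import math


def significance(signal, background):
    """Statistical significance S / sqrt(S + B)."""
    return signal / math.sqrt(signal + background)


def relative_stat_uncertainty(signal, background):
    """ASSUMED counting approximation: sigma_S / S = sqrt(S + B) / S."""
    return 1.0 / significance(signal, background)
```

In this approximation, doubling the significance at fixed signal halves the relative statistical uncertainty.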
The data-driven method described in Section 3.2, which is based on the reliable extraction of raw yields with different fractions of prompt and non-prompt candidates, cannot be used for the estimation of the $f_\mathrm{prompt}$ fraction in all the $p_\mathrm{T}$ intervals of the prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ measurements, due to the limited size of the analysed data sample. Thus, the $f_\mathrm{prompt}$ fraction was calculated similarly to previous measurements (see e.g. Refs. [4,77]) using the beauty-hadron production cross sections from FONLL calculations, the beauty-hadron $\to \mathrm{D} + X$ decay kinematics from the PYTHIA 8 decayer, and the acceptance-times-efficiency correction factors for non-prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons from Monte Carlo simulations. The values of $f_\mathrm{prompt}$ range between 0.86 and 0.96 depending on the D-meson species and $p_\mathrm{T}$ interval. The procedure to estimate the systematic uncertainty on $f_\mathrm{prompt}$ is described in Section 4. Figure 5 reports the $\mathrm{D^+}$- and $\mathrm{D_s^+}$-meson $f_\mathrm{prompt}$ fractions obtained with the FONLL-based approach compared with those resulting from the data-driven method. The latter were computed in the $p_\mathrm{T}$ ranges of the non-prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ measurements, where a good reliability of the method can be granted. The fractions of prompt D-meson yields estimated with the two different strategies agree well within the statistical and systematic uncertainties in the common $p_\mathrm{T}$ intervals.

Systematic uncertainties
The systematic uncertainties on the measurement of prompt and non-prompt D-meson cross sections were estimated with procedures similar to those described in Refs. [3,77,78], including the following sources: (i) extraction of the raw yield from the invariant-mass distributions; (ii) non-prompt and prompt fraction estimations; (iii) track reconstruction efficiency; (iv) D-meson selection efficiency; (v) PID efficiency; (vi) generated D-meson $p_\mathrm{T}$ shape in the simulation. In addition, an overall normalisation systematic uncertainty induced by the branching ratios of the considered D-meson decays [74] and the integrated luminosity [69] was considered. The estimated values of the systematic uncertainties for some representative $p_\mathrm{T}$ intervals of the different analyses are summarised in Table 1. The contributions of the different sources were summed in quadrature to obtain the total systematic uncertainty. For non-prompt D mesons, the systematic uncertainties on the non-prompt fraction estimation and the raw-yield extraction were treated as correlated and summed linearly.
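The combination rule described above (quadrature sum of independent sources, with the correlated ones first added linearly) can be written compactly in Python. The function name and the example values are hypothetical and serve only to show the arithmetic.

```python
import math


def total_systematic(uncorrelated, correlated=()):
    """Total relative systematic uncertainty.

    uncorrelated : sources added in quadrature
    correlated   : sources treated as fully correlated with each other; they
                   are summed linearly and the sum enters the quadrature sum
                   as a single term
    """
    linear_sum = sum(correlated)
    return math.sqrt(sum(u * u for u in uncorrelated) + linear_sum * linear_sum)
```

For instance, `total_systematic([0.04], correlated=[0.01, 0.02])` treats the two correlated contributions as a single 3% term before combining it with the 4% one.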
The systematic uncertainty on the raw-yield extraction was evaluated by repeating the fit of the invariant-mass distribution varying the lower and upper limits of the fit range and the functional form of the background fit function. In order to test the sensitivity to the line-shape of the signal, a bin-counting method was used, in which the signal yield was obtained by integrating the invariant-mass distribution after subtracting the background estimated from the side-band fit. In addition, for the analysis of non-prompt D mesons the width of the Gaussian function used to model the signal peaks was varied within the uncertainty of the value obtained from the fits to the prompt-enhanced sample. The effect was found to be negligible, hence no additional systematic uncertainty was assigned. For non-prompt $\mathrm{D^0}$ mesons, an additional contribution due to the description of signal reflections in the invariant-mass distribution was estimated by varying the shape and the normalisation of the templates used for the reflections in the invariant-mass fits. The systematic uncertainty was defined as the RMS of the distribution of the signal yields obtained from all these variations and ranges from 1% to 11% depending on the D-meson species and the $p_\mathrm{T}$ interval.

Table 1: Summary of the relative systematic uncertainties on non-prompt $\mathrm{D^0}$, $\mathrm{D^+}$, and $\mathrm{D_s^+}$ cross sections and prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ cross sections in different $p_\mathrm{T}$ intervals.
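A multi-trial estimate of this kind can be sketched in Python as follows. The text does not spell out the exact RMS convention, so the sketch assumes the population standard deviation of the trial yields about their mean, quoted relative to the reference yield; the function name and this convention are assumptions.

```python
import statistics


def raw_yield_systematic(trial_yields, reference_yield):
    """Relative systematic uncertainty from a set of fit variations.

    ASSUMPTION: "RMS of the distribution" is taken as the population
    standard deviation of the trial yields, divided by the reference yield.
    """
    if len(trial_yields) < 2:
        raise ValueError("at least two fit variations are needed")
    if reference_yield <= 0:
        raise ValueError("the reference yield must be positive")
    return statistics.pstdev(trial_yields) / reference_yield
```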
The systematic uncertainty on the value of $f_\mathrm{non\text{-}prompt}$ obtained with the data-driven approach was estimated by changing the sets of selection criteria used for the procedure described in Section 3.2. A systematic uncertainty ranging from 2% to 10% was assigned. This source of systematic uncertainty was found to be mostly correlated with the signal extraction procedure. The correlation was evaluated by repeating the computation of $f_\mathrm{non\text{-}prompt}$ varying the fit configurations used for the raw-yield extraction, as described above. For the analysis of prompt $\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons, the systematic uncertainty on $f_\mathrm{prompt}$ was estimated by varying the FONLL parameters (b-quark mass, factorisation, and renormalisation scales) as prescribed in Ref. [79]. It ranges between $^{+1}_{-1}\%$ and $^{+6}_{-7}\%$ depending on the D-meson species and $p_\mathrm{T}$ interval.
The systematic uncertainty on the track reconstruction efficiency was evaluated by varying the track-quality selection criteria and by comparing the prolongation probability of the TPC tracks to the ITS hits in data and simulation. The comparison of the ITS-TPC prolongation efficiency in data and simulations was performed after weighting the relative abundances of primary and secondary particles in the simulation to match those observed in data, which were estimated via fits to the inclusive track impact-parameter distributions [80]. The estimated uncertainty depends on the D-meson $p_\mathrm{T}$ and ranges from 3% to 5% for the two-body decay of $\mathrm{D^0}$ mesons and from 4% to 7% for the three-body decays of $\mathrm{D^+}$ and $\mathrm{D_s^+}$ mesons. The systematic uncertainty on the selection efficiency originates from imperfections in the description of the detector resolutions and alignments in the simulation. It was estimated by comparing the corrected yields obtained by repeating the analysis with different machine-learning selection criteria, i.e. varying the selections on the BDT outputs, resulting in a significant modification of the efficiencies, raw yield and background values. The assigned systematic uncertainty ranges from 2% to 10%.
To estimate the uncertainty on the PID selection efficiency, the pion and kaon PID selection efficiencies were compared in data and in simulations. For this study, a pure sample of pions was selected from K 0 S and Λ decays, while samples of kaons in the TPC (TOF) were obtained applying a strict PID selection using the TOF (TPC) information. Since no significant differences were observed, no systematic uncertainty was assigned. As an additional test, the analysis was repeated without PID selection. The resulting D-meson cross sections were found to be compatible with those obtained with the PID selection.
The systematic effect on the efficiency due to a possible difference between the real and simulated D-meson transverse-momentum distributions was estimated by evaluating the efficiency after reweighting the p T shape from the PYTHIA 8 generator to match the one from FONLL calculations. The weights were applied to the p T distributions of prompt D mesons and to those of the decaying beauty hadrons in the case of non-prompt D mesons. The assigned uncertainty is 7% in the p T interval 0 − 1 GeV/c of the prompt D + meson, where the selection criteria are strict, while for other p T intervals the uncertainty is less than 1%.

Production cross sections
The p T -differential production cross sections of prompt and non-prompt D 0 , D + , and D + s mesons measured in |y| < 0.5 are shown in the left panel of Fig. 6. The p T -differential cross sections of prompt D + and D + s mesons are compatible within uncertainties with the previous results [3], but have extended p T coverage and total uncertainties reduced by a factor ranging from 1.05 to 1.6 depending on p T and D-meson species due to the improved analysis technique described in Section 3.3. The measurement of prompt D 0 mesons is the one reported previously in Ref.
The right panel of Fig. 6 shows the ratios of the p T -differential cross sections of non-prompt and prompt D mesons. The statistical uncertainties assigned to each ratio were computed considering that those of the prompt and non-prompt measurements are uncorrelated. This assumption is valid since the fraction of D-meson candidates shared by the two samples is small. The systematic uncertainties related to the determination of the tracking efficiency and to the luminosity were propagated as correlated in the ratios, while all the other sources of systematic uncertainty were considered as uncorrelated between the measurements of prompt and non-prompt D mesons. The ratio increases with increasing p T for all three D-meson species up to p T = 12 GeV/c, as expected due to the harder p T distribution of beauty hadrons (H b ) compared to D mesons. The ratios for D + and D 0 mesons are compatible within uncertainties, while for the D + s meson the central points are systematically higher than for the other two D-meson species, suggesting a larger contribution of beauty-hadron decays to D + s compared to non-strange D mesons, although no firm conclusion can be drawn given the current uncertainties.
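The treatment of correlated and uncorrelated sources in a ratio can be illustrated with a minimal sketch. It assumes that a fully correlated source contributes only through the difference of its relative sizes in numerator and denominator, while uncorrelated sources add in quadrature. The function name and all numerical inputs are hypothetical placeholders and are not taken from this analysis.

```python
import math

def ratio_rel_uncertainty(num_sources, den_sources, correlated):
    """Relative systematic uncertainty on a ratio num/den.

    num_sources, den_sources: dicts mapping source name -> relative uncertainty.
    correlated: set of source names treated as fully correlated between
    numerator and denominator (assumption of this sketch).
    """
    total_sq = 0.0
    for name in set(num_sources) | set(den_sources):
        a = num_sources.get(name, 0.0)
        b = den_sources.get(name, 0.0)
        if name in correlated:
            # Fully correlated: only the residual difference survives.
            total_sq += (a - b) ** 2
        else:
            # Uncorrelated: both contributions add in quadrature.
            total_sq += a ** 2 + b ** 2
    return math.sqrt(total_sq)

# Hypothetical relative uncertainties for one pT interval.
non_prompt = {"tracking": 0.04, "luminosity": 0.021, "raw_yield": 0.05}
prompt = {"tracking": 0.04, "luminosity": 0.021, "raw_yield": 0.03}

rel = ratio_rel_uncertainty(non_prompt, prompt, {"tracking", "luminosity"})
print(f"relative systematic uncertainty on the ratio: {rel:.4f}")
```

With these placeholder inputs the tracking and luminosity terms cancel and only the raw-yield terms remain.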
The p T -differential cross sections of prompt and non-prompt D mesons are compared to predictions obtained with FONLL [56,57,79] and GM-VFNS [60,61,63] pQCD calculations in Fig. 7 and Fig. 8, respectively. The FONLL uncertainty band includes the uncertainties due to the choice of the renormalisation (µ R ) and factorisation (µ F ) scales and of the c and b quark masses, as well as the uncertainties on the CTEQ6.6 PDFs [81]. In GM-VFNS, the uncertainty related to the choice of the scales is estimated by varying only µ R and the CTEQ14 PDFs [82] are employed. Within the FONLL framework, the fragmentation fractions f (c → D) from Ref. [83] were used to normalise the prompt D 0 - and D + -meson cross sections, while a calculation of the prompt D + s -meson production cross section is not available. For non-prompt D mesons, FONLL calculations were used to compute the beauty-hadron cross section, while PYTHIA 8 [70,71] was used for the description of H b → D + X decay kinematics and branching ratios. The contributions from the different beauty-hadron species were weighted according to the fragmentation fractions of b quarks into b-hadron species f (b → H b ) measured in Z → bb decays [74] and reported in Table 2, which provide a good normalisation for B-meson measurements performed by the ATLAS, CMS, and LHCb Collaborations [19,36,84]. Two different approaches are instead considered in the GM-VFNS framework. In the first one, the transition from the beauty quark to the charm hadron is described in a single step, exploiting a set of FFs for b → D + X obtained from measurements in e + e − collisions as described in Refs. [85,86]. In the second approach [63], the b → D + X transition is treated in two separate steps, consisting of the b → H b fragmentation and the H b → D + X decay, similarly to what was done in the FONLL+PYTHIA 8 calculation. For this latter approach, only predictions for D 0 and D + mesons are available.
The measured p T -differential cross sections of prompt D 0 , D + , and D + s mesons are described within uncertainties by the FONLL and GM-VFNS predictions. In the case of FONLL, the data lie on the upper edge of the theory uncertainty band, while for the GM-VFNS calculation, the central values of the predictions tend to underestimate the data at low and intermediate p T and to overestimate them at high p T . The measured non-prompt D-meson cross sections are instead in better agreement with the central values of the FONLL+PYTHIA 8 predictions, while they are underestimated by the GM-VFNS calculations. In the case of the one-step approach, the predictions are lower than the data by a factor ranging between 2 and 10 depending on p T and the particle species. The two-step approach describes the non-prompt D 0 and D + measurements better, but it still underestimates the measured cross sections. This confirms that all the different terms of the factorisation approach play a crucial role in the description of the heavy-flavour hadron cross sections, indicating the importance of setting stronger constraints on the fragmentation and decay kinematics.
The visible cross sections of prompt and non-prompt D mesons were computed by integrating the measured p T -differential cross sections in the measured p T range. The results are reported in Table 3, where the prompt D 0 -meson cross section is the same as in Ref. [3], scaled for the updated BR of the D 0 → K − π + decay channel reported in Ref. [74]. In the integration of the p T -differential cross sections, the systematic uncertainties were propagated as fully correlated among the measured p T intervals, except for the raw-yield extraction uncertainty, which was treated as uncorrelated considering the variations of the signal-to-background ratio and the shape of the combinatorial-background distribution as a function of p T .

Table 3: p T -integrated production cross sections in the measured p T range for prompt and non-prompt D mesons in the range |y| < 0.5 in pp collisions at √ s = 5.02 TeV. Columns: Meson, Kinematic range (GeV/c), Visible cross section (µb).

The p T -integrated production cross sections in |y| < 0.5 were evaluated by multiplying the visible cross sections by an extrapolation factor calculated as follows. For prompt D mesons, the extrapolation factor for each D-meson species was computed using the FONLL central predictions to evaluate the ratio between the production cross section in |y| < 0.5 and that in the measured p T interval. The systematic uncertainties on the extrapolation factor were estimated by considering (i) the variation of the factorisation and renormalisation scales in the FONLL calculation, (ii) the uncertainty on the mass of the charm quark, and (iii) the CTEQ6.6 PDFs uncertainties, as proposed in Ref. [79]. Since FONLL predictions are not available for prompt D + s mesons, the central value of the extrapolation factor was computed as described in Ref. [3], using the prediction based on the p T -differential cross section of charm quarks from FONLL, the fragmentation fractions f (c → D + s ) and f (c → D * + s ) from ALEPH measurements [87], and the charm fragmentation functions from Ref. [88]. The measurements of D 0 and D + mesons extend from p T = 0 up to p T = 36 GeV/c, leading to an extrapolation factor close to unity and a negligible associated uncertainty. In the case of non-prompt D mesons, the extrapolation factor was evaluated using the FONLL predictions for the beauty-hadron production and PYTHIA 8 to describe the H b → D + X decay kinematics. Besides the uncertainties of FONLL, two additional sources of systematic uncertainty were considered for the non-prompt D-meson extrapolation factors, i.e. the uncertainty on (i) the beauty fragmentation fractions f (b → H b ) and (ii) the branching ratios of the H b → D + X decays. The former was estimated considering an alternative set of beauty fragmentation fractions measured in pp collisions [74] and reported in Table 2, while for the latter the branching ratios implemented in PYTHIA 8 were reweighted in order to reproduce the measured values reported in Ref. [74]. In addition, it was verified that the extrapolation factors computed with the PYTHIA 8 decayer were compatible with those obtained using the EvtGen package [89] for the description of the beauty-hadron decays. The production cross sections for prompt and non-prompt D mesons in |y| < 0.5 are reported in Table 4. The cross sections of prompt D + s and D + mesons are compatible with those reported in Ref. [3], but their total uncertainties are reduced, owing to the improved precision of the p T -differential measurements and the extended p T range, which implies a smaller fraction of extrapolated cross section.
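The integration of a p T -differential cross section into a visible cross section, with the uncertainty treatment described in this section, can be sketched as follows. Statistical and raw-yield uncertainties are added in quadrature over the p T intervals, while the remaining systematic uncertainties are summed linearly, which is one way of implementing a fully correlated propagation. The bin contents and the extrapolation factor are hypothetical placeholders, and the function name is introduced only for this illustration.

```python
import math

def integrate_visible(bins):
    """Integrate a binned dsigma/dpT.

    Each bin is a dict with the bin width and absolute values per unit pT:
    'dsigma', 'stat', 'syst_corr' (fully correlated among bins) and
    'syst_uncorr' (e.g. raw-yield extraction, uncorrelated among bins).
    Returns (sigma_vis, stat, syst).
    """
    sigma = sum(b["dsigma"] * b["width"] for b in bins)
    # Uncorrelated among bins: quadratic sum.
    stat = math.sqrt(sum((b["stat"] * b["width"]) ** 2 for b in bins))
    syst_uncorr = math.sqrt(sum((b["syst_uncorr"] * b["width"]) ** 2 for b in bins))
    # Fully correlated among bins: linear sum.
    syst_corr = sum(b["syst_corr"] * b["width"] for b in bins)
    syst = math.hypot(syst_corr, syst_uncorr)
    return sigma, stat, syst

# Hypothetical two-bin measurement (arbitrary units).
bins = [
    {"width": 1.0, "dsigma": 10.0, "stat": 1.0, "syst_corr": 0.5, "syst_uncorr": 0.3},
    {"width": 2.0, "dsigma": 4.0, "stat": 0.5, "syst_corr": 0.2, "syst_uncorr": 0.1},
]
sigma_vis, stat, syst = integrate_visible(bins)

# Hypothetical extrapolation factor from a model (not a value from the paper).
alpha_extr = 1.2
sigma_full = sigma_vis * alpha_extr
print(sigma_vis, stat, syst, sigma_full)
```

The uncertainty on the extrapolation factor itself is not propagated in this sketch.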

Cross section ratios
The p T -integrated cross sections were used to compute the ratios of production yields among the different D-meson species reported in Table 5. In the computation of these ratios, the systematic uncertainties related to the tracking efficiency, luminosity, and, for the prompt D mesons, the contribution due to the subtraction of the component from beauty-hadron decays, were considered as correlated among the different D-meson species. The extrapolation uncertainties were also treated as correlated, except for the source of uncertainty due to the branching ratios of the beauty-hadron decays used in the extrapolation of the p T -integrated cross section of non-prompt D mesons. All the other sources of systematic uncertainty were propagated as uncorrelated. The D + /D 0 ratio is compatible between prompt and non-prompt D-meson production, while for the ratios of D + s to non-strange D mesons, the measured values are higher for non-prompt D mesons than for prompt D mesons with a significance of about 2.5 σ . This finding is qualitatively expected from the $\mathrm{b \to c\overline{c}s}$ and $\mathrm{\overline{b} \to \overline{c}c\overline{s}}$ weak decays, which enhance D + s final states. Moreover, it is consistent with previous measurements at LEP [83].
A possible p T dependence was investigated by computing the p T -differential ratios. The ratios between the p T -differential production cross sections of D + and D 0 mesons and the ratios between that of D + s mesons and the sum of those of D 0 and D + mesons are reported in the left and right panels of Fig. 9, respectively. The measured ratios are independent of p T in the measured p T range within the current experimental precision. They are also compatible with the FONLL predictions in the case of prompt D 0 and D + mesons and FONLL+PYTHIA 8 in the case of non-prompt D mesons. In the right panel of Fig. 9, the contributions of D + s from B 0 s and non-strange B meson decays in the FONLL+PYTHIA 8 calculation are depicted separately to highlight the substantial contribution of non-prompt D + s mesons from the decay of non-strange B mesons.
The prompt D + s /(D 0 + D + ) ratio represents the fragmentation fraction of charm quarks to charm-strange mesons f s divided by the one to non-strange charm mesons f u + f d , given that all D * + and D * 0 mesons decay to D 0 and D + mesons, and all D * + s mesons decay to D + s mesons. Considering that the uncertainties in the production ratios reported in Table 5 are dominated by the limited precision of the measurements in the low p T region and that the p T -differential ratios are constant within uncertainties, the ratio of charm-quark fragmentation fractions f s /( f u + f d ) was computed by fitting the p T -differential ratios with a constant function.

In addition to the degree of correlation among the D-meson species considered for the computation of the p T -differential ratios, all the sources of systematic uncertainty except for the one related to the raw-yield extraction were propagated as fully correlated among the different p T intervals. A similar strategy was adopted by the LHCb Collaboration for the beauty sector in Ref. [37].
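A fit with a constant function to independent data points reduces to an inverse-variance weighted mean, which the following minimal sketch illustrates. It considers only uncertainties that are uncorrelated among the p T intervals and ignores the correlated sources discussed above, so it is a simplification of the procedure in the text. The input values are hypothetical placeholders.

```python
import math

def fit_constant(values, errors):
    """Least-squares fit of a constant to independent points.

    Returns (constant, uncertainty, chi2). Equivalent to the
    inverse-variance weighted mean of the points.
    """
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    const = sum(w * v for w, v in zip(weights, values)) / total
    err = 1.0 / math.sqrt(total)
    chi2 = sum(((v - const) / e) ** 2 for v, e in zip(values, errors))
    return const, err, chi2

# Hypothetical pT-differential ratios and their uncorrelated uncertainties.
ratios = [0.10, 0.12, 0.14]
uncert = [0.02, 0.01, 0.02]

const, err, chi2 = fit_constant(ratios, uncert)
ndf = len(ratios) - 1
print(f"constant = {const:.3f} +- {err:.4f}, chi2/ndf = {chi2:.2f}/{ndf}")
```

The chi2 per degree of freedom gives a quick indication of whether the assumption of a p T -independent ratio is tenable for the points considered.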
In Fig. 10, the charm-quark fragmentation-fraction ratio f s /( f u + f d ) is compared with previous measurements of strangeness suppression factor γ s from the ALICE [5], H1 [90], ZEUS [91], and ATLAS [18] Collaborations. They were divided by a factor two to account for the difference between γ s and the ratio of fragmentation fractions f s /( f u + f d ). The theoretical uncertainties in case of the H1 result include the branching ratio uncertainty and the model dependencies of the acceptance determination, while for the ATLAS result the extrapolation uncertainties to the full phase space are included. All the values are compatible within uncertainties and with the average of measurements at LEP [83]. The experimental points are also compared to the value obtained from PYTHIA 8 simulations with Monash-13 tune [72] and found to be compatible with it within the uncertainties, even if a tension of about 2.7 standard deviations (including both statistical and systematic uncertainties) is observed for the result presented in this paper.
A similar procedure was followed to obtain the fragmentation fraction of beauty quarks to beauty-strange mesons divided by the one to non-strange beauty mesons, starting from the measured non-prompt D + s /(D 0 + D + ) ratio. In the case of non-prompt D mesons, an additional correction factor was necessary to account for the fraction of non-prompt D + s mesons not originating from B 0 s decays and that of non-prompt D 0 and D + mesons not originating from non-strange B-meson decays. This correction factor was computed from FONLL+PYTHIA 8 and a systematic uncertainty was assigned by varying the set of beauty fragmentation fractions and the beauty-hadron branching ratios, as described in Section 5.1. In the case of D + s mesons, B 0 s and non-strange B mesons are expected to contribute almost equally to the non-prompt D + s cross section as shown in the right panel of Fig. 9, while most of the non-prompt D 0 and D + mesons come from non-strange B-meson decays. The p T -differential ratio of beauty-quark fragmentation fractions was then computed by applying this correction factor to the measured non-prompt D + s /(D 0 + D + ) ratio.

The beauty-quark fragmentation-fraction ratio f s /( f u + f d ) is compared with previous measurements from the CDF [92], LHCb [37, 44], and ATLAS [20] Collaborations in Fig. 11. The ATLAS measurement was divided by a factor two assuming isospin symmetry for the u and d quarks, which implies f u = f d . All the f s /( f u + f d ) values measured in pp and $\mathrm{p\overline{p}}$ collisions are found to be compatible with the LEP average computed by the HFLAV Collaboration [93] and with the value obtained from PYTHIA 8 simulations with the Monash-13 tune [72]. It is also interesting to note that the fragmentation-fraction ratios f s /( f u + f d ) are similar for the charm and beauty sectors and are consistent with the ratio of light strange to non-strange particle production in pp and e + e − collisions [94].
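The factor two applied under the isospin assumption can be written out explicitly. The short derivation below takes the quantity being rescaled to be the ratio f s / f d , which is an assumption of this illustration since the text does not name the measured quantity.

```latex
\frac{f_\mathrm{s}}{f_\mathrm{u}+f_\mathrm{d}}
  \;\overset{f_\mathrm{u}=f_\mathrm{d}}{=}\;
  \frac{f_\mathrm{s}}{2\,f_\mathrm{d}}
  \;=\; \frac{1}{2}\,\frac{f_\mathrm{s}}{f_\mathrm{d}}
```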

Extrapolation to the $\mathrm{b\overline{b}}$ production cross section
The $\mathrm{b\overline{b}}$ production cross section per unit of rapidity at midrapidity (|y| < 0.5) was computed following a procedure similar to the one adopted to derive the p T -integrated production cross sections of non-prompt D mesons. In this case, the extrapolation factor $\alpha^{\mathrm{b\overline{b}}}_{\mathrm{extr}}$ was computed as

$$\alpha^{\mathrm{b\overline{b}}}_{\mathrm{extr}} = \frac{\left.\mathrm{d}\sigma^{\mathrm{FONLL}}_{\mathrm{b\overline{b}}}/\mathrm{d}y\right|_{|y|<0.5}}{\sigma^{\mathrm{FONLL+PYTHIA\,8}}_{\mathrm{b\to D}}\left(p_\mathrm{T}^{\min}<p_\mathrm{T}<p_\mathrm{T}^{\max},\,|y|<0.5\right)},$$

where $\left.\mathrm{d}\sigma^{\mathrm{FONLL}}_{\mathrm{b\overline{b}}}/\mathrm{d}y\right|_{|y|<0.5}$ is the $\mathrm{b\overline{b}}$ production cross section obtained with FONLL calculations with a correction for the different shapes of the rapidity distributions of beauty hadrons and $\mathrm{b\overline{b}}$ pairs, and $\sigma^{\mathrm{FONLL+PYTHIA\,8}}_{\mathrm{b\to D}}(p_\mathrm{T}^{\min}<p_\mathrm{T}<p_\mathrm{T}^{\max},\,|y|<0.5)$ is the non-prompt D-meson cross section in the measured phase space from the FONLL+PYTHIA 8 model. The correction for the $\mathrm{b\overline{b}}$ rapidity distribution is composed of two factors. The first factor accounts for the different rapidity distributions of beauty mesons and single beauty quarks and it was evaluated to be unity in the relevant rapidity range based on FONLL calculations. A 1% uncertainty on this factor was evaluated from the difference between values from FONLL and PYTHIA 8. The second correction factor is the ratio $(\mathrm{d}\sigma_{\mathrm{b\overline{b}}}/\mathrm{d}y)/(\mathrm{d}\sigma_{\mathrm{b}}/\mathrm{d}y)$, which was estimated from NLO pQCD calculations (POWHEG [95]) as $\mathrm{d}\sigma^{|y|<0.5}_{\mathrm{b\overline{b}}}/\mathrm{d}\sigma^{|y|<0.5}_{\mathrm{b}} = 1.06$. A 1% uncertainty on this factor was estimated from the difference among the values obtained varying the factorisation and renormalisation scales in the POWHEG calculation and using different sets of PDFs (CT10NLO [96] and CT14NLO [82]). The other sources of systematic uncertainty on the extrapolation factor, i.e. FONLL, BR(H b → D + X), and f (b → H b ), are the same as those described in Section 5.1 for the extrapolation of the p T -integrated production cross sections of non-prompt D mesons.

Figure 12: Estimates of $\mathrm{d}\sigma_{\mathrm{b\overline{b}}}/\mathrm{d}y$ at midrapidity from dielectron [97] and non-prompt D 0 , D + , and D + s meson measurements in pp collisions at √ s = 5.02 TeV compared to FONLL [56,57,79] and NNLO [65] predictions. The average $\mathrm{d}\sigma_{\mathrm{b\overline{b}}}/\mathrm{d}y$ of the estimates from the single D-meson species is also reported.
The dσ bb /dy was computed separately for each D-meson species and the three values were then averaged using the inverse of the quadratic sum of the absolute statistical and uncorrelated systematic uncertainties as weights. The systematic uncertainties related to the tracking uncertainty and the extrapolation uncertainties related to FONLL and the beauty fragmentation fractions were treated as fully correlated among the three D-meson species, while all the other sources were treated as uncorrelated. The resulting bb cross section at midrapidity is

$$\left.\frac{\mathrm{d}\sigma_{\mathrm{b\overline{b}}}}{\mathrm{d}y}\right|_{|y|<0.5} = 34.5 \pm 2.4\,(\mathrm{stat}) \pm 2.5\,(\mathrm{syst}) \pm 0.7\,(\mathrm{lumi}) \pm 0.3\,(\mathrm{BR})\,^{+3.8}_{-1.1}\,(\mathrm{extr}) \pm 0.5\,(\mathrm{rap.\ shape})~\mu\mathrm{b}. \qquad (12)$$

Figure 12 shows the extrapolated dσ bb /dy from each D-meson species and their average, compared to those obtained from dielectron measurements [97] and to FONLL and NNLO calculations. The values extracted from the three D-meson species are compatible within uncertainties with each other and with those obtained from the other two ALICE measurements, as well as with the FONLL and NNLO predictions. As compared to FONLL calculations, the inclusion of NNLO corrections leads to a slightly larger central value, more in agreement with the experimental result based on non-prompt D mesons, and to reduced theoretical uncertainties. The measurements in pp collisions at √ s = 5.02 TeV are also shown in Fig. 13
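The averaging prescription and the size of the quoted uncertainty components can be illustrated with a short sketch. The weighted average uses the weights stated in the text, with hypothetical per-species inputs. The cross-checks use only numbers quoted in this paper (the components of Eq. (12), the integrated luminosity of 19.3 ± 0.4 nb⁻¹, and the two 1% rapidity-shape uncertainties) and assume that independent components are combined in quadrature, which is an assumption of this sketch and not a statement of the exact procedure used.

```python
import math

def weighted_average(values, stat, syst_uncorr):
    """Average with weights 1 / (stat^2 + syst_uncorr^2), absolute uncertainties."""
    weights = [1.0 / (s ** 2 + u ** 2) for s, u in zip(stat, syst_uncorr)]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

def quadrature(*terms):
    """Quadratic sum of independent uncertainty components."""
    return math.sqrt(sum(t ** 2 for t in terms))

# Hypothetical per-species estimates (microbarn); not values from the paper.
average = weighted_average([33.0, 36.0, 35.0], [3.0, 4.0, 6.0], [2.0, 3.0, 4.0])

# Components quoted in Eq. (12), in microbarn.
central = 34.5
syst, lumi, br, rap = 2.5, 0.7, 0.3, 0.5
extr_up, extr_down = 3.8, 1.1

tot_up = quadrature(syst, lumi, br, extr_up, rap)
tot_down = quadrature(syst, lumi, br, extr_down, rap)

# Luminosity term from the relative uncertainty 0.4 / 19.3.
lumi_check = central * 0.4 / 19.3
# Rapidity-shape term from two 1% uncertainties.
rap_check = central * quadrature(0.01, 0.01)

print(f"average = {average:.2f}")
print(f"total syst = +{tot_up:.2f} / -{tot_down:.2f}")
print(f"lumi = {lumi_check:.2f}, rap. shape = {rap_check:.2f}")
```

Under these assumptions the quadratic sum gives about +4.6 and −2.9 µb, close to the total systematic uncertainty quoted for the final result; the small difference in the upper value may come from rounding of the individual components. The luminosity and rapidity-shape checks agree with the 0.7 µb and 0.5 µb quoted in Eq. (12) after rounding.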

Summary
The p T -differential cross sections of prompt and non-prompt D 0 , D + , and D + s mesons were measured at midrapidity (|y| < 0.5) in pp collisions at √ s = 5.02 TeV using a machine-learning technique based on Boosted Decision Trees. A data-driven method was employed for the evaluation of the fraction of non-prompt D mesons, f non-prompt , and for the validation of the FONLL-based method adopted in the measurement of prompt D mesons. In comparison to previously published results based on the same data sample [3], the cross sections of prompt D + and D + s mesons have total uncertainties reduced by a factor ranging from 1.05 to 1.60 and cover an extended transverse-momentum range, down to p T = 0 and p T = 1 GeV/c for D + and D + s mesons, respectively. The measurements of non-prompt D mesons were performed in the interval 1 < p T < 24 GeV/c for D 0 mesons, 2 < p T < 16 GeV/c for D + mesons, and 2 < p T < 12 GeV/c for D + s mesons. The measured p T -differential cross sections are compatible with FONLL calculations in the full p T range of the measurements. For prompt D mesons, the measured values lie on the upper edge of the FONLL uncertainty band, while the measured non-prompt D-meson cross sections are in better agreement with the central value of the predictions obtained using the beauty-hadron cross section from FONLL calculations and the H b → D + X decay kinematics from the PYTHIA 8 decayer. The GM-VFNS calculations also describe the measured prompt D-meson cross sections, while they underestimate the non-prompt D-meson cross sections. The modelling of the b → D + X transition with a single step underestimates the measurements by a factor ranging between 2 and 10 depending on p T . Larger cross sections, in better agreement with the data, are obtained with a two-step process in which the b → H b fragmentation and the H b → D + X decay kinematics are factorised. Therefore, this does not invalidate the GM-VFNS calculation of the cross section of the partonic process, nor the validity of the collinear factorisation, but it confirms the importance of properly modelling the fragmentation process and the decay kinematics.
The ratios of production cross sections, as well as the fragmentation fraction to strange mesons divided by the one to non-strange mesons for charm and beauty quarks, are compatible with previous measurements by other experiments at different centre-of-mass energies and in different colliding systems.
The bb production cross section at midrapidity per unit of rapidity in pp collisions at √ s = 5.02 TeV was estimated from the measured production cross sections of non-prompt D 0 , D + , and D + s mesons using the predictions based on FONLL calculations for the beauty-hadron cross section and the PYTHIA 8 decayer for the description of the H b → D + X decay kinematics. The extrapolated dσ bb /dy values from each D-meson species are compatible with each other, with previous ALICE measurements based on dielectrons [97], and with FONLL and NNLO calculations. The dσ bb /dy determined from the average of the three D-meson species is

$$\left.\frac{\mathrm{d}\sigma_{\mathrm{b\overline{b}}}}{\mathrm{d}y}\right|_{|y|<0.5} = 34.5 \pm 2.4\,(\mathrm{stat})\,^{+4.7}_{-2.9}\,(\mathrm{tot.\ syst})~\mu\mathrm{b}.$$
The measurements presented in this paper provide an important test for pQCD calculations in the charm and beauty sectors and a precise reference for studies in heavy-ion collisions.