Production of light (anti)nuclei in pp collisions at $\sqrt{s}~=~5.02$ TeV

The study of the production of nuclei and antinuclei in pp collisions has proven to be a powerful tool to investigate the formation mechanism of loosely bound states in high-energy hadronic collisions. In this paper, the production of protons, deuterons and $^3$He and their charge conjugates at midrapidity is studied as a function of the charged-particle multiplicity in inelastic pp collisions at $\sqrt{s}=5.02$ TeV using the ALICE detector. Within the uncertainties, the yields of nuclei in pp collisions at $\sqrt{s}=5.02$ TeV are compatible with those in pp collisions at different energies and with those in p-Pb collisions when compared at similar multiplicities. The measurements are compared with the expectations of coalescence and Statistical Hadronisation Models. The results suggest a common formation mechanism behind the production of light nuclei in hadronic interactions and confirm that this production depends not on the collision energy but on the number of produced particles.


Introduction
Light (anti)nuclei are abundantly produced in ultrarelativistic heavy-ion collisions [1,2,3] at the Large Hadron Collider (LHC), but their measurement in pp collisions is challenging due to their lower production yields. As a consequence, until a few years ago there were only a few measurements of the production rates of (anti)nuclei in small collision systems [1,4,5,6]. This has recently changed thanks to the large pp data samples collected by ALICE at the LHC, which allow more precise and differential measurements of the production of light (anti)nuclei. In this paper, we present a detailed study of the multiplicity and transverse-momentum dependence of (anti)proton, (anti)deuteron and (anti)$^3$He production in pp collisions at $\sqrt{s} = 5.02$ TeV. The results shown in the following are the most accurate obtained so far in small systems and represent the full compilation of data available for pp collisions at different energies at the end of the LHC Run 2.
The production mechanism of light (anti)nuclei in high-energy hadronic collisions is not fully understood. The two classes of models used for comparison with the experimental results are the Statistical Hadronisation Models (SHMs) and the coalescence models. SHMs assume that particles originating from an excited region evenly occupy all the available states in phase space [7]. Pb-Pb collisions, characterised by a large extension of the particle-emitting source and hence considered as large systems, are described according to a grand canonical ensemble [8]. On the contrary, pp and p-Pb collisions, which are characterised by a small size and are considered as small systems, must be described based on a canonical ensemble, requiring the local conservation of the appropriate quantum numbers [9]. The expression Canonical Statistical Model (CSM) is used to underline the canonical description.
An important observable that provides information on the production mechanism is the ratio between the $p_{\rm T}$-integrated yields of nuclei and protons. The measured d/p and $^3$He/p ratios are approximately constant as a function of centrality in Pb-Pb collisions. In contrast, they increase with increasing multiplicity in pp and p-Pb collisions, finally reaching the values measured in Pb-Pb collisions [1,10,11]. The constant nuclei-to-proton ratios in large collision systems are predicted by the SHMs [12], while the experimentally determined difference between small and large systems can be qualitatively explained as an effect of the canonical suppression of the nuclei yields for small system sizes. The prediction of the CSM saturates towards the grand canonical value at larger system size [13].
In coalescence models, (anti)nuclei are formed by nucleons close in phase space [14]. In this approach, the coalescence parameter $B_A$ relates the production of (anti)protons to that of (anti)nuclei. $B_A$ is defined as

$$B_A = \left( \frac{1}{2\pi p_{\rm T}^{A}} \frac{{\rm d}^2 N_A}{{\rm d}y\,{\rm d}p_{\rm T}^{A}} \right) \Bigg/ \left( \frac{1}{2\pi p_{\rm T}^{\rm p}} \frac{{\rm d}^2 N_{\rm p}}{{\rm d}y\,{\rm d}p_{\rm T}^{\rm p}} \right)^{A}, \qquad (1)$$

where $p_{\rm T}$ is the transverse momentum, $y$ the rapidity and $N$ the number of particles. The labels p and $A$ are used to denote properties related to protons and nuclei with mass number $A$, respectively. The production spectra of the (anti)protons are evaluated at the transverse momentum of the nucleus divided by the mass number, so that $p_{\rm T}^{\rm p} = p_{\rm T}^{A}/A$. Neutron spectra are assumed to be equal to proton spectra, due to the isospin symmetry restoration in hadron collisions at the LHC. Since the coalescence process is expected to occur at the late stages of the collision, the $B_A$ parameter is related to the emission volume. In a simple coalescence approach, which describes the uncorrelated particle emission from a point-like source, $B_A$ is expected to be independent of $p_{\rm T}$ and multiplicity. In this context, the measurements of the nuclei-to-proton ratios and of the $B_A$ parameters in pp collisions at $\sqrt{s} = 5.02$ TeV reported in this paper are important to complete the present picture of the production of light nuclei in small systems. In addition, the increased statistics exploited in the present analysis will allow us to better constrain the models and thus provide important input to both the theoretical and experimental communities.
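As a minimal sketch of how a coalescence parameter can be obtained from spectra, the snippet below assumes the usual convention in which $B_A$ is the ratio of the nucleus invariant yield to the $A$-th power of the proton invariant yield evaluated at $p_{\rm T}/A$. The function names, the callable interface for the proton spectrum and all numerical values are illustrative assumptions, not the analysis code or ALICE data.

```python
import numpy as np


def coalescence_parameter(pt_nucleus, inv_yield_nucleus, proton_inv_yield, mass_number):
    """Return B_A at each nucleus pT.

    inv_yield_nucleus: invariant yield 1/(2 pi pT) d2N/(dy dpT) of the nucleus.
    proton_inv_yield:  callable returning the proton invariant yield at a given pT
                       (hypothetical interface, e.g. a fit to the proton spectrum).
    """
    pt_nucleus = np.asarray(pt_nucleus, dtype=float)
    inv_yield_nucleus = np.asarray(inv_yield_nucleus, dtype=float)
    # Protons are evaluated at the nucleus pT divided by the mass number A.
    pt_proton = pt_nucleus / mass_number
    return inv_yield_nucleus / proton_inv_yield(pt_proton) ** mass_number


def toy_proton_inv_yield(pt):
    """Toy exponential proton spectrum (illustrative numbers only)."""
    return 0.1 * np.exp(-np.asarray(pt, dtype=float) / 0.4)


# Toy deuteron spectrum built with a constant B_2, as in simple coalescence.
pt_deuteron = np.array([1.0, 1.5, 2.0, 3.0])  # GeV/c
toy_b2 = 0.02
toy_deuteron_inv_yield = toy_b2 * toy_proton_inv_yield(pt_deuteron / 2) ** 2

b2 = coalescence_parameter(pt_deuteron, toy_deuteron_inv_yield, toy_proton_inv_yield, 2)
print(b2)
```

By construction the toy input returns a flat $B_2$, which is the behaviour expected in the simple coalescence picture; a measured $p_{\rm T}$ or multiplicity dependence would show up as a departure from this flat line.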
The same pseudorapidity range is covered by the TPC [19], which is the main tracking detector, consisting of a hollow cylinder whose axis coincides with the nominal beam axis. The active volume, filled with a Ne/CO$_2$/N$_2$ gas mixture at atmospheric pressure, has an inner radius of about 85 cm and an outer radius of about 250 cm. The trajectory of a charged particle is estimated using up to 159 combined measurements (clusters) of drift times and radial positions of the ionisation electrons. The charged-particle tracks are then reconstructed by combining the hits in the ITS and the measured clusters in the TPC. The TPC is also used for particle identification (PID) by measuring the specific energy loss (dE/dx) in the TPC gas. In pp collisions, the dE/dx in the TPC is measured with a resolution of ≈ 5.2% [15].
The TOF [20] covers the full azimuth for the pseudorapidity interval $|\eta| < 0.9$. The detector is based on the Multigap Resistive Plate Chamber (MRPC) technology and is located, with a cylindrical symmetry, at an average distance of 380 cm from the beam axis. The particle identification is based on the difference between the measured time of flight and its expected value, computed for each mass hypothesis from the track momentum and length. A precise start signal for the time-of-flight measurement is provided by the T0 detector, consisting of two arrays of Cherenkov counters, T0A and T0C, which cover the pseudorapidity regions $4.61 \leq \eta \leq 4.92$ and $-3.28 \leq \eta \leq -2.97$, respectively [21]. The overall resolution on the particle time of flight, including the start time, is ≈ 80 ps.
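The time-of-flight identification described above can be illustrated with a short sketch. It assumes the standard relativistic relation $t = (L/c)\sqrt{1 + (m/p)^2}$ for the expected time of a particle of mass $m$ and momentum $p$ over a track length $L$, a hypothetical straight track of 380 cm, and the quoted ≈ 80 ps resolution; the function names are illustrative only.

```python
import math

C_LIGHT_CM_PER_PS = 0.0299792458  # speed of light in cm/ps
PROTON_MASS = 0.938272    # GeV/c^2
DEUTERON_MASS = 1.875613  # GeV/c^2


def expected_time_of_flight_ps(p_gev, mass_gev, length_cm):
    """Expected time of flight t = (L / c) * sqrt(1 + (m / p)^2) in ps."""
    return length_cm / C_LIGHT_CM_PER_PS * math.sqrt(1.0 + (mass_gev / p_gev) ** 2)


def n_sigma_tof(t_measured_ps, p_gev, mass_gev, length_cm, resolution_ps=80.0):
    """Difference between measured and expected time in units of the resolution."""
    expected = expected_time_of_flight_ps(p_gev, mass_gev, length_cm)
    return (t_measured_ps - expected) / resolution_ps


# Hypothetical track: p = 2 GeV/c, straight path of 380 cm.
p, length = 2.0, 380.0
t_proton = expected_time_of_flight_ps(p, PROTON_MASS, length)
t_deuteron = expected_time_of_flight_ps(p, DEUTERON_MASS, length)
separation = (t_deuteron - t_proton) / 80.0
print(f"proton: {t_proton:.0f} ps, deuteron: {t_deuteron:.0f} ps, "
      f"separation: {separation:.1f} sigma")
```

Real tracks are curved by the magnetic field and therefore longer than the radial distance, so the numbers are indicative only.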

Data sample
This analysis is based on approximately 900 million pp collisions (events) at $\sqrt{s} = 5.02$ TeV collected in 2017 by ALICE at the LHC. Events are selected by a minimum-bias (MB) trigger, requiring at least one hit in each of the two V0 detectors. An additional offline rejection is performed to remove events with more than one reconstructed primary vertex (pile-up events) and events triggered by interactions of the beam with the residual gas in the LHC beam pipe [17]. In total, 1.8% of the collected events are rejected due to these selections.
The production of (anti)nuclei is measured around midrapidity, within a rapidity range of $|y| < 0.5$, and within the pseudorapidity interval $|\eta| < 0.8$ to maximise the detector performance. The selected tracks are required to have at least 70 reconstructed points in the TPC and two points in the ITS in order to guarantee good track momentum and dE/dx resolution in the relevant $p_{\rm T}$ ranges. In addition, at least one hit in the SPD is required to ensure a resolution of the distance of closest approach to the primary vertex better than 300 µm, both along the beam axis (DCA$_z$) and in the transverse plane (DCA$_{xy}$) [15]. The quality of the accepted tracks is checked by requiring the $\chi^2$ per TPC reconstructed point and per ITS
Data are divided into multiplicity intervals labelled by a roman numeral from I to X, going from the highest to the lowest multiplicity [10]. In order to achieve a higher statistical precision, classes are merged into nine classes for (anti)protons and (anti)deuterons and into two classes for (anti)helion. The multiplicity classes are defined from the mean of the V0 signal amplitudes as percentiles of the INEL > 0 pp cross section, where INEL > 0 events are defined as collisions with at least one charged particle in the pseudorapidity region $|\eta| < 1$ [22]. The mean charged-particle multiplicities for each class, $\langle {\rm d}N_{\rm ch}/{\rm d}\eta \rangle$, are listed in Table 1.

Data analysis

4.1 Raw yield extraction
The first important step in the analysis is the particle identification. As in previous works [1,6,10,23,24], the identification of (anti)nuclei is performed with two different methods, depending on the particle species and on the transverse momentum. For (anti)protons and (anti)deuterons with $p_{\rm T} < 1$ GeV/c, the identification relies on the measurement of the dE/dx in the TPC. The number of signal candidates is extracted through a fit of a Gaussian with two exponential tails to the $n\sigma_{\rm TPC}$ distribution in each $p_{\rm T}$ interval. The $n\sigma_{\rm TPC}$ is defined as the difference between the measured and the expected dE/dx for each particle species, divided by the dE/dx resolution of the TPC. For $p_{\rm T} \geq 1$ GeV/c, it is more difficult to separate (anti)protons and (anti)deuterons from other charged particles with $|Z| = 1$. Therefore, the TOF information is additionally used for PID. The squared mass of the particle is evaluated as $m^2 = p^2 \left( t_{\rm TOF}^2/L^2 - 1/c^2 \right)$, where $t_{\rm TOF}$ is the measured time of flight, $L$ is the length of the track and $p$ is the momentum of the particle. In order to reduce the background, the tracks are in addition required to have $|n\sigma_{\rm TPC}| < 3$. The squared-mass distributions of the signal are fitted with a Gaussian function with an exponential tail. Background originating from other particle species or from the random match of a TOF hit with another track increases significantly with $p_{\rm T}$ and is modelled with the sum of a Gaussian and an exponential function. For (anti)helion, only the TPC dE/dx measurement is used, because its signal in the TPC can be easily separated from that of other particle species, owing to its electric charge ($Z = 2$). The raw yield of (anti)helion is obtained through a fit of the $n\sigma_{\rm TPC}$ distribution with a Gaussian function for the signal and a Gaussian function for the contamination coming from (anti)triton, where present.
When the background is negligible, the raw yield is extracted by directly counting the (anti)nuclei candidates. Otherwise, the TPC dE/dx and TOF squared mass distributions are fitted with the aforementioned models, using an extended-maximum-likelihood approach and the yield is obtained as a fit parameter. In the signal extraction, the fit quality is monitored and a successful Pearson test is required with the probability to reject a true hypothesis of 5%.
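The two identification variables used in the raw yield extraction can be sketched as follows. The snippet assumes that the dE/dx resolution is expressed as a fraction of the expected signal (5.2% is the value quoted for the TPC in pp collisions) and uses hypothetical track values; the function names are illustrative and do not correspond to the analysis software.

```python
import math

C_LIGHT_CM_PER_PS = 0.0299792458  # speed of light in cm/ps
DEUTERON_MASS = 1.875613          # GeV/c^2


def n_sigma_tpc(dedx_measured, dedx_expected, relative_resolution=0.052):
    """(measured - expected) dE/dx divided by the resolution.

    Assumption: the resolution is a fixed fraction of the expected signal.
    """
    return (dedx_measured - dedx_expected) / (relative_resolution * dedx_expected)


def squared_mass(p_gev, t_tof_ps, length_cm):
    """m^2 = p^2 (t^2 / L^2 - 1 / c^2), returned in (GeV/c^2)^2."""
    c_t_over_l = C_LIGHT_CM_PER_PS * t_tof_ps / length_cm  # equals 1 / beta
    return p_gev ** 2 * (c_t_over_l ** 2 - 1.0)


def passes_tpc_preselection(dedx_measured, dedx_expected, n_sigma_max=3.0):
    """Background-reducing requirement |n sigma TPC| < 3."""
    return abs(n_sigma_tpc(dedx_measured, dedx_expected)) < n_sigma_max


# Hypothetical deuteron track: p = 2 GeV/c, track length 400 cm.
p, length = 2.0, 400.0
t_tof = length / C_LIGHT_CM_PER_PS * math.sqrt(1.0 + (DEUTERON_MASS / p) ** 2)
m2 = squared_mass(p, t_tof, length)
print(f"m^2 = {m2:.3f} (GeV/c^2)^2")
```

In the analysis, the resulting $n\sigma_{\rm TPC}$ and $m^2$ distributions are then fitted per $p_{\rm T}$ interval with the signal and background shapes described above.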

Efficiency and acceptance correction
The raw yield must be corrected to take into account the tracking efficiency and the detector acceptance. This correction is evaluated from Monte Carlo (MC) simulated events, which are generated using the event generator PYTHIA8.21 (Monash2013 tune) [25]. However, since PYTHIA8 does not handle the production of nuclei properly, it is necessary to inject (anti)nuclei on top of each generated event. In each pp collision, one deuteron, one antideuteron, one helion or one antihelion is injected, randomly chosen from a flat rapidity distribution in the range $|y| < 1$ and a flat $p_{\rm T}$ distribution in the range $p_{\rm T} \in [0, 10]$ GeV/c. The GEANT4 [26] transport code is used to describe the hadronic interaction of the particles propagating through the detector material. The correction is defined as the ratio between the number of (anti)nuclei reconstructed in the rapidity range $|y| < 0.5$ and in the pseudorapidity interval $|\eta| < 0.8$ and the number of (anti)nuclei generated in $|y| < 0.5$. The correction is computed separately for each (anti)nucleus and for the TPC and TOF analyses. Moreover, the raw signal needs to be corrected for trigger inefficiencies. The selected events are required to have at least one charged particle in the pseudorapidity region $|\eta| < 1$ (INEL > 0) [22]. Some INEL > 0 events can be lost due to the finite trigger efficiency (event loss) and all the particles produced in those events are lost as well (signal loss). Hence, it is necessary to correct the spectra for the event and the signal losses. The correction must be evaluated from MC simulations because the number of rejected events and lost particles is only known there. For (anti)protons, this correction is directly computed from the MC simulation because their production is handled by the event generator. On the contrary, (anti)nuclei are injected on top of a pp collision and a direct estimation from the MC is not possible, because there would be a bias in the number of lost (anti)nuclei.
For this reason, the correction for pions, kaons and protons is evaluated in this case in a different MC data set with no injected nuclei and the average value is used for (anti)deuterons and (anti)helions. Further details on this method can be found in [10,23]. This correction is negligible at high multiplicity (< 1‰) and becomes relevant at low multiplicity (up to 14% for (anti)protons and (anti)deuterons, 2% for (anti)helions, in the low p T region p T < 1 GeV/c).
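The corrections described in this section can be summarised in a short sketch. The efficiency × acceptance is taken as the ratio of reconstructed to generated (anti)nuclei per $p_{\rm T}$ interval, as stated above; applying the signal-loss correction as a multiplicative factor larger than or equal to one is an assumption made here for illustration, as are the function names and the toy numbers.

```python
import numpy as np


def efficiency_times_acceptance(n_reconstructed, n_generated):
    """Ratio of reconstructed to generated MC (anti)nuclei per pT interval."""
    n_reconstructed = np.asarray(n_reconstructed, dtype=float)
    n_generated = np.asarray(n_generated, dtype=float)
    return n_reconstructed / n_generated


def corrected_yield(raw_yield, efficiency, signal_loss_correction=1.0):
    """Raw yield divided by the efficiency and scaled by the signal-loss factor.

    Assumption: the signal-loss correction is a multiplicative factor >= 1.
    """
    raw_yield = np.asarray(raw_yield, dtype=float)
    efficiency = np.asarray(efficiency, dtype=float)
    return raw_yield * signal_loss_correction / efficiency


# Toy MC counts in two pT intervals (illustrative numbers only).
eff = efficiency_times_acceptance([50, 25], [100, 100])
corrected = corrected_yield([10.0, 10.0], eff, signal_loss_correction=1.14)
print(eff, corrected)
```

The factor 1.14 mirrors the largest correction quoted for low multiplicity; at high multiplicity the factor would be indistinguishable from unity.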

Secondary (anti)nuclei contamination
The contribution of secondary (anti)nuclei, i.e. (anti)nuclei that are not produced directly in the collision, must be subtracted from the total measured yields. Secondary nuclei are mostly produced in the interaction of particles with the beam pipe and the detector material. Moreover, an important contribution to secondary (anti)protons is also given by the weak decay of heavier particles. All particles coming from strong and electromagnetic decays are considered as primary. (Anti)deuterons and (anti)helions receive a negligible background contribution from weak decays, since the only known contribution comes from the decays of the hypertriton ($^3_\Lambda{\rm H} \rightarrow {\rm d} + {\rm p} + \pi$ and $^3_\Lambda{\rm H} \rightarrow {}^3{\rm He} + \pi$) and their antimatter counterparts, whose production is known to be suppressed in pp collisions [6]. Finally, the production of secondary antideuterons and antihelions from material is extremely rare due to baryon number conservation. The fraction of primary (anti)nuclei is evaluated through a template fit to the DCA$_{xy}$ distribution of the data, as described in [1]. The templates for primary and secondary (anti)protons and deuterons are obtained from MC simulations. For (anti)protons, two templates are used to describe (anti)protons from weak decays and from material separately. While the template for primary (anti)helions is extracted from the MC as well, this is not possible for the template for secondaries, due to the very rare production of antihelion. For this reason, the (anti)proton template at half the (anti)helion $p_{\rm T}$ is used as a proxy for the (anti)helion one. This procedure is based on the assumption that the DCA$_{xy}$ distributions of secondary (anti)helions can be represented by the DCA$_{xy}$ distributions of (anti)protons at a transverse momentum which is scaled with the rigidity $p/z$ of (anti)helion, where $z$ is the (anti)helion electric charge.
The contribution of secondary nuclei is observed to be more relevant at low p T (20% for protons, 40% for deuterons and 90% for helions) and to decrease exponentially with increasing transverse momentum.
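The idea behind the template fit can be sketched with a toy example. The snippet below uses an ordinary linear least-squares fit from NumPy in place of the fit procedure of [1], which is not specified here; the Gaussian template shapes, their widths and the toy yields are all assumptions chosen only to make the example self-contained.

```python
import numpy as np


def fit_template_fractions(data_hist, templates):
    """Fit a histogram as a sum of templates and return their fractions.

    Swapped-in technique: linear least squares, not the fit used in the analysis.
    """
    data_hist = np.asarray(data_hist, dtype=float)
    # Normalise each template to unit area so that fitted weights are yields.
    design = np.column_stack(
        [np.asarray(t, dtype=float) / np.sum(t) for t in templates]
    )
    yields, *_ = np.linalg.lstsq(design, data_hist, rcond=None)
    return yields / yields.sum()


# Toy DCA_xy templates: narrow peak for primaries, broad one for secondaries.
dca = np.linspace(-1.0, 1.0, 81)  # cm, bin centres
primary_template = np.exp(-0.5 * (dca / 0.05) ** 2)
secondary_template = np.exp(-0.5 * (dca / 0.40) ** 2)

# Noise-free toy data with 80% primaries and 20% secondaries.
toy_data = (8000.0 * primary_template / primary_template.sum()
            + 2000.0 * secondary_template / secondary_template.sum())

fractions = fit_template_fractions(toy_data, [primary_template, secondary_template])
print(fractions)
```

With real data the histogram carries statistical fluctuations, so a likelihood-based fit with non-negative yields would normally be preferred over plain least squares.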

Systematic uncertainties
One contribution of the systematic uncertainties comes from the adopted track selection criteria. This uncertainty is evaluated by varying the selections, as done in [10]. The effect of the subtraction of secondary (anti)nuclei is studied with the variation of the DCA z and DCA xy selections as well. This is the most relevant contribution for (anti)helion at low p T , decreasing with p T . The estimation of the systematic uncertainty related to the raw signal extraction depends on the considered species. For (anti)protons, the difference between the signal extracted by direct count and the one extracted from the fit is taken into account. For (anti)deuterons, this is obtained by varying the interval in which the direct counting of (anti)deuterons is performed. Finally, for (anti)helion a toy MC has been developed in order to generate 10000 TPC dE/dx samples that are compatible with the default one. A possible bias in the signal extraction process is investigated by refitting each distribution and looking into the variation of the extracted yields. Another source of systematic uncertainty is given by the incomplete knowledge of the material budget of the detector in the MC simulations. This is evaluated by comparing different MC simulations in which the material budget of the ALICE detector was varied by ±4.5% [15] after conversions. This value corresponds to the uncertainty on the determination of the material budget obtained by measuring photon conversions. The imperfect knowledge of the hadronic interaction cross section of (anti)nuclei in the material contributes to the systematic uncertainty as well and depends on the particle species. Similarly, an uncertainty related to the ITS-TPC matching is considered and evaluated from the difference between the ITS-TPC matching efficiencies in data and MC. Finally, the trigger inefficiency is also a source of systematic uncertainties. 
The uncertainty is assumed to be half of the difference between the signal loss correction (described in section 4.2) and unity. It strongly depends on the event multiplicity: it is negligible at high multiplicity and contributes up to 7% in the lowest event class for (anti)deuterons and (anti)helions. Where present, it decreases with increasing p T . The list of all the sources of systematic uncertainty for the INEL > 0 multiplicity class is reported in Table 2. The average values between matter and antimatter are reported for (anti)protons, (anti)deuterons and (anti)helions, for the lowest and highest p T values of the measured spectra.
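The trigger-inefficiency uncertainty defined above is simple enough to write down explicitly. In the sketch below, the first function follows the definition in the text (half of the difference between the signal-loss correction and unity), while combining independent sources in quadrature is an assumption, since the combination rule is not stated here; the numerical inputs are illustrative.

```python
import math


def trigger_inefficiency_uncertainty(signal_loss_correction):
    """Half of the difference between the signal-loss correction and unity."""
    return 0.5 * abs(signal_loss_correction - 1.0)


def total_systematic(relative_uncertainties):
    """Quadrature sum of independent relative uncertainties (assumed rule)."""
    return math.sqrt(sum(u * u for u in relative_uncertainties))


# A 14% signal-loss correction gives a 7% uncertainty.
trigger_unc = trigger_inefficiency_uncertainty(1.14)
# Illustrative combination of two hypothetical sources of 3% and 4%.
combined = total_systematic([0.03, 0.04])
print(f"trigger: {trigger_unc:.3f}, combined example: {combined:.3f}")
```

The first result is consistent with the largest correction (14%) and the largest trigger-related uncertainty (7%) quoted for the lowest multiplicity class.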

Results and discussion
The transverse-momentum spectra for (anti)protons, (anti)deuterons and (anti)helions are shown in Fig. 1. In each $p_{\rm T}$ interval, the reported yield is the average between matter and antimatter. The two are compatible, as already observed in previous measurements carried out by ALICE [1,10,11,23]. The measured spectra are fitted in order to extrapolate the yields to the unmeasured $p_{\rm T}$ region. For (anti)protons and (anti)deuterons, data are fitted with a Lévy-Tsallis function [27], while for (anti)helion a simple exponential depending on $m_{\rm T}$ is used because it provides a better description of the data. The fraction of the yield obtained from the extrapolation depends on the considered particle species and on the multiplicity class, since the $p_{\rm T}$ coverage is generally different, being maximum (minimum) at high (low) multiplicity. For (anti)protons, the extrapolation contributes a fraction of 10% (20%) of the total yield for the highest (lowest) multiplicity class, while for (anti)deuterons and (anti)helions it contributes a fraction of 25% (55%) and 35% (40%) of the total yield, respectively. The $p_{\rm T}$ spectra are also fitted with a Boltzmann function and a simple exponential depending on $p_{\rm T}$, in order to quantify the effect of the chosen function on the $p_{\rm T}$-integrated yield. The difference between the yields obtained with the reference and the alternative functions is taken as a systematic uncertainty. This amounts to ≈ 2% for (anti)protons and (anti)deuterons, depending on the transverse-momentum coverage of the spectra, whereas for (anti)helions it amounts to 12% in the highest multiplicity class and ≈ 19% in the lowest multiplicity class. The $p_{\rm T}$-integrated yields dN/dy are reported in Table 1. For (anti)protons, the statistical uncertainties on the yields are negligible, being ≈ 1% of the systematic uncertainty. Figure 2 shows the mean transverse momentum $\langle p_{\rm T} \rangle$ as a function of charged-particle multiplicity.
The results are compared with those obtained in previous measurements and they confirm the increasing trend with multiplicity. Moreover, a clear mass ordering is present, as already observed for other light-flavoured particle species and for different collision systems and energies [30,32].
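The extrapolation of a fitted spectrum to the unmeasured region, and the mean transverse momentum derived from it, can be sketched numerically. The snippet assumes the commonly used form of the Lévy-Tsallis function, normalised so that its integral over $p_{\rm T}$ equals dN/dy; the parameter values and the measured $p_{\rm T}$ range are hypothetical and are not taken from the fits of this analysis.

```python
import numpy as np


def levy_tsallis(pt, dn_dy, n, c, mass):
    """Levy-Tsallis d2N/(dy dpT), normalised so that its pT integral is dN/dy."""
    pt = np.asarray(pt, dtype=float)
    mt = np.sqrt(pt ** 2 + mass ** 2)
    norm = (n - 1.0) * (n - 2.0) / (n * c * (n * c + mass * (n - 2.0)))
    return dn_dy * pt * norm * (1.0 + (mt - mass) / (n * c)) ** (-n)


def trapezoid(y, x):
    """Trapezoidal integration of samples y on grid x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def extrapolated_fraction_and_mean_pt(params, pt_min, pt_max,
                                      pt_upper=50.0, n_points=200001):
    """Fraction of the yield outside [pt_min, pt_max] and the mean pT."""
    pt = np.linspace(0.0, pt_upper, n_points)
    spectrum = levy_tsallis(pt, *params)
    total = trapezoid(spectrum, pt)
    measured = (pt >= pt_min) & (pt <= pt_max)
    inside = trapezoid(spectrum[measured], pt[measured])
    mean_pt = trapezoid(pt * spectrum, pt) / total
    return 1.0 - inside / total, mean_pt


# Hypothetical deuteron-like parameters: (dN/dy, n, C in GeV, mass in GeV/c^2).
toy_params = (1.0, 7.0, 0.25, 1.875613)
fraction, mean_pt = extrapolated_fraction_and_mean_pt(toy_params, 0.6, 3.0)
print(f"extrapolated fraction: {fraction:.2f}, mean pT: {mean_pt:.2f} GeV/c")
```

Repeating the calculation with an alternative functional form and comparing the integrated yields is the kind of check from which the extrapolation systematic uncertainty is derived.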
Combining the information from the production spectra of protons and nuclei, the coalescence parameter can be evaluated according to Eq. 1. Figure 3 [10,23] at different energies. In particular, it is now understood that the increase with transverse momentum of the coalescence parameter in INEL > 0 collisions is, in large part, due to the change in shape of the transverse-momentum spectra of protons in different multiplicity intervals [10]. It is also worth mentioning that in pp collisions at high multiplicity (HM) [24], where the system size is larger than the one resulting from INEL > 0 collisions, the rise with $p_{\rm T}$ cannot be neglected even in fine multiplicity classes. In [24], it was shown that $B_A$ as a function of transverse momentum can be described by coalescence predictions, assuming a Gaussian wave function for the nuclei.
Insights into the dependence of the production mechanisms on the system size can also be obtained by studying the evolution of $B_A$ with charged-particle multiplicity. Indeed, as shown in [33], the charged-particle multiplicity ${\rm d}N_{\rm ch}/{\rm d}\eta$ can be considered as a proxy of the system size. Figure 4 shows $B_2$ and $B_3$ as a function of charged-particle multiplicity for different collision systems and energies. The presented measurements are obtained in transverse-momentum ranges with central values of $p_{\rm T}/A = 0.75$ GeV/c for $B_2$ and $p_{\rm T}/A = 0.78$ GeV/c for $B_3$, but the trend is similar for other values.
The measurements are compared with the theoretical predictions from [33], where two different parameterisations of the source radius as a function of multiplicity are used (see [33] for details). It is evident that there is no single parameterisation of the system size that is able to fit both the measured $B_2$ and $B_3$. However, as stated also in [24], charged-particle multiplicity is not a perfect proxy for the system size, because for each multiplicity the source radius depends also on the transverse momentum of the particle of interest. Nevertheless, the data corresponding to the different collision systems and energies confirm a trend with multiplicity, which can be interpreted as an effect of the interplay between the size of the system and that of the nucleus. Indeed, at low charged-particle multiplicity, the system size is comparable with the size of the nucleus (about 2 fm, depending on the nuclear species and on the parameterisation of the model), which determines the slow decrease with multiplicity. On the contrary, with increasing multiplicity the system size becomes progressively larger than the nucleus size, making the coalescence process less and less probable [1,33].
Figure 5 shows the ratios between the $p_{\rm T}$-integrated yields of nuclei and protons as a function of charged-particle multiplicity. A common trend as a function of the charged-particle multiplicity is seen, monotonically increasing for pp and p-Pb collisions and eventually saturating for Pb-Pb collisions [24]. This is the effect of the interplay between the different evolution with the charged-particle multiplicity of the source size and of the particle yields [24]. The systematic uncertainties in this analysis are reduced with respect to the previous ALICE measurements thanks to the recent studies on the interaction cross section of antideuterons with the material [35]. The experimental data are compared with the predictions of both the Thermal-FIST [13] CSM and the coalescence model [34].
The CSM prediction is provided for different correlation volumes $V_C$, from 1 to 3 times the volume ${\rm d}V/{\rm d}y$. For both (anti)deuterons and (anti)helions, the CSM and the coalescence model can qualitatively describe the observed trend. A detailed study of the $V_C$ value is required to determine whether the CSM is able to describe simultaneously the deuteron and helion measurements reported here. The coalescence model seems to describe the data points better, and better for (anti)deuterons than for (anti)helions, where some tension at intermediate multiplicity is visible.

Figure 5 caption (fragment): [...] [13] for two sizes of the correlation volume $V_C$. For (anti)deuterons, the green band represents the expectation from a coalescence model [34]. For (anti)helion, the green and blue lines represent the expectations from two-body and three-body coalescence models [34].

Conclusions
The LHC has proven to be an unprecedented antimatter factory. The production of nuclei and antinuclei has been explored at all energies delivered by the LHC during its Run 2 [6,10,11,23,24,31] and a clear pattern has emerged: the production of nuclei is tightly driven by the underlying event multiplicity. Other variables, like the collision energy or even the colliding system (pp or p-Pb), are essentially irrelevant in the description of the nucleosynthesis processes in hadronic collisions.
The CSM can qualitatively explain the observed trend in the nucleus-to-proton ratios as a function of multiplicity. On the other hand, coalescence connects the hadron-emitting source size with the observed production of nuclei. The size of the hadron-emitting source increases with multiplicity and decreases with momentum, as demonstrated by recent particle correlation measurements [36]. Through this observation, coalescence can predict the yield of nuclei as a function of both multiplicity and momentum starting from the measured proton spectrum. In this paper, it is shown that the coalescence prediction agrees quantitatively with the measured deuteron-to-proton ratio, while the helion-to-proton ratio in pp collisions at 5.02 TeV confirms the trend of the previous measurements, deviating from the coalescence prediction at intermediate multiplicities. However, the comparison of the coalescence parameters with coalescence predictions shows great sensitivity to different source size parameterisations, suggesting that some of the observed discrepancies might be due to the source size determination. During the LHC Run 3, the ALICE experiment targets an integrated luminosity of 6 pb$^{-1}$ for pp collisions at 5.02 (or 5.5) TeV and up to 200 pb$^{-1}$ at 13 TeV [37], which corresponds to a sample larger by at least a factor of 400 with respect to Run 2. This sample will enable a simultaneous study of the production of nuclei and the

Acknowledgements
The ALICE Collaboration would like to thank all its engineers and technicians for their invaluable contributions to the construction of the experiment and the CERN accelerator teams for the outstanding performance of the LHC complex.

[32] ALICE Collaboration, S. Acharya et al., "Multiplicity dependence of (multi-)strange hadron production in proton-proton collisions at $\sqrt{s} = 13$ TeV", Eur. Phys. J. C 80 no. 2, (2020) 167, arXiv:1908.01861 [nucl-ex].