1 Introduction

Light (anti)nuclei are abundantly produced in ultrarelativistic heavy-ion collisions [1,2,3] at the Large Hadron Collider (LHC), but their measurement in pp collisions is challenging due to their lower production yields. As a consequence, until a few years ago there were only a few measurements of the production rates of (anti)nuclei in small collision systems [1, 4,5,6]. This has recently changed thanks to the large pp data samples collected by ALICE at the LHC, which allow us to perform more precise and differential measurements of the production of light (anti)nuclei. In this paper, we present a detailed study of the multiplicity and transverse momentum dependence of (anti)proton, (anti)deuteron and (anti)\(^3\hbox {He}\) production in pp collisions at \(\sqrt{s} = 5.02\) TeV. The results shown in the following are the most accurate obtained so far in small systems and represent the full compilation of data available for pp collisions at different energies at the end of the LHC Run 2.

The production mechanism of light (anti)nuclei in high-energy hadronic collisions is not fully understood. The classes of models used for comparison with the experimental results are the Statistical Hadronisation Models (SHM) and the coalescence models. SHMs assume that particles originating from an excited region evenly occupy all the available states in phase space [7]. Pb–Pb collisions, characterised by a large extension of the particle-emitting source and hence considered as large systems, are described according to a grand canonical ensemble [8]. In contrast, pp and p–Pb collisions, which are characterised by a small size and are considered as small systems, must be described based on a canonical ensemble, requiring the local conservation of the appropriate quantum numbers [9]. The expression Canonical Statistical Model (CSM) is used to underline the canonical description.

An important observable that provides information on the production mechanism is the ratio between the \(p_{\mathrm {T}}\)-integrated yields of nuclei and protons. The measured d/p and \(^3\)He/p ratios show a rather constant behaviour as a function of centrality in Pb–Pb collisions. In contrast, they increase in pp and p–Pb collisions with increasing multiplicity, finally reaching the values measured in Pb–Pb collisions [1, 10, 11]. The constant nuclei-to-proton ratios in large collision systems are predicted by the SHMs [12], while the experimentally determined difference between small and large systems can be qualitatively explained as an effect of the canonical suppression of the nuclei yields for small system sizes. The prediction of the CSM saturates towards the grand canonical value at larger system size [13].

In coalescence models, (anti)nuclei are formed by nucleons close in phase space [14]. In this approach, the coalescence parameter \(B_{\mathrm {A}}\) relates the production of (anti)protons to that of \(\text {(anti)nuclei}\). \(B_{\mathrm {A}}\) is defined as

$$\begin{aligned} B_{\mathrm {A}}\left( p_{\mathrm {T}}^{\mathrm {p}}\right) = \frac{1}{2\pi p_{\mathrm {T}}^{\mathrm {A}}}\frac{\mathrm {d}^2N_{\mathrm {A}}}{\mathrm {d}y\mathrm {d}p^{\mathrm {A}}_{\mathrm {T}}} \; \bigg / \left( \frac{1}{2\pi p_{\mathrm {T}}^{\mathrm {p}}}\frac{\mathrm {d}^2N_{\mathrm {p}}}{\mathrm {d}y\mathrm {d}p_{\mathrm {T}} ^{\mathrm {p}}}\right) ^{\mathrm {A}} , \end{aligned}$$

where \(p_{\mathrm {T}}\) is the transverse momentum, y the rapidity and N the number of particles. The labels p and A are used to denote properties related to protons and nuclei with mass number A, respectively. The production spectra of the \(\text {(anti)protons}\) are evaluated at the transverse momentum of the nucleus divided by the mass number, so that \(p_{\mathrm {T}}^{\mathrm {p}} = p_{\mathrm {T}}^{\mathrm {A}} /A\). Neutron spectra are assumed to be equal to proton spectra, due to the isospin symmetry restoration in hadron collisions at the LHC. Since the coalescence process is expected to occur at the late stages of the collision, the \(B_{\mathrm {A}}\) parameter is related to the emission volume. In a simple coalescence approach, which describes the uncorrelated particle emission from a point-like source, \(B_\mathrm {A}\) is expected to be independent of \(p_{\mathrm {T}}\) and multiplicity. In this context, the measurements of the nuclei-to-proton ratios and of the \(B_\mathrm {A}\) parameters in pp collisions at \(\sqrt{s} = 5.02\) TeV reported in this paper are important to complete the present picture of the production of light nuclei in small systems. In addition, the increased statistics exploited in the present analysis will allow us to better constrain the models and thus provide important inputs to both the theoretical and experimental communities.
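As an illustration of the definition of \(B_{\mathrm {A}}\) given above, the following minimal Python sketch evaluates the ratio for a single \(p_{\mathrm {T}}\) point. The function names and the callable interface used for the proton spectrum are hypothetical and chosen only for this example; they are not taken from the analysis software, and the input numbers are toy values rather than measured data.

```python
import math


def invariant_yield(d2n_dydpt, pt):
    """Return (1 / (2*pi*pT)) * d^2N/(dy dpT) for a single pT point."""
    return d2n_dydpt / (2.0 * math.pi * pt)


def coalescence_parameter(nucleus_d2n, pt_nucleus, proton_d2n_at, mass_number):
    """Evaluate B_A at pT^p = pT^A / A.

    nucleus_d2n   : d^2N_A/(dy dpT) of the nucleus at pt_nucleus
    proton_d2n_at : callable returning the proton d^2N_p/(dy dpT) at a given pT
                    (hypothetical interface, e.g. a fit function to the spectrum)
    mass_number   : A (2 for deuterons, 3 for helions)
    """
    pt_proton = pt_nucleus / mass_number
    inv_nucleus = invariant_yield(nucleus_d2n, pt_nucleus)
    inv_proton = invariant_yield(proton_d2n_at(pt_proton), pt_proton)
    # The proton invariant yield is raised to the power A, as in the definition.
    return inv_nucleus / inv_proton ** mass_number


if __name__ == "__main__":
    # Toy inputs only: a flat proton invariant yield of 0.1 and a deuteron
    # invariant yield of 1e-4 at pT = 1 GeV/c.
    toy_proton = lambda pt: 2.0 * math.pi * pt * 0.1
    toy_deuteron = 2.0 * math.pi * 1.0 * 1e-4
    print(coalescence_parameter(toy_deuteron, 1.0, toy_proton, 2))
```

In practice the proton spectrum is only known in \(p_{\mathrm {T}}\) intervals, so some interpolation or fit would be needed to evaluate it at \(p_{\mathrm {T}}^{\mathrm {A}}/A\); the callable above stands in for that step.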

2 The ALICE apparatus

A detailed description of the ALICE detectors can be found in [15, 16] and references therein. In the following, more information is given on the sub-detectors used to perform the analysis presented in this work, namely the V0, the Inner Tracking System (ITS), the Time Projection Chamber (TPC) and the Time-of-Flight (TOF). All of them are located inside a solenoidal magnet creating a magnetic field parallel to the beam line, with an intensity of 0.5 T for the data sample considered here.

The V0 detector [17] is formed by two arrays of scintillation counters placed around the beam pipe on either side of the interaction point. They cover the pseudorapidity ranges \(2.8 \le \eta \le 5.1\) (V0A) and \(-3.7 \le \eta \le -1.7\) (V0C). The collision multiplicity is estimated using the signal amplitude in the V0 detector, which is also used as a trigger detector. More details will be given in Sect. 3.

The ITS [18] provides high resolution track points in the proximity of the interaction region and consists of three subsystems. Going from the innermost to the outermost subsystem, we find: two layers of Silicon Pixel Detectors (SPD), two layers of Silicon Drift Detectors (SDD) and two layers equipped with double-sided Silicon Strip Detectors (SSD). The ITS extends radially from 3.9 to 43 cm, it is hermetic in azimuth and it covers the pseudorapidity range \(|\eta |<0.9\).

The same pseudorapidity range is covered by the TPC [19], which is the main tracking detector, consisting of a hollow cylinder whose axis coincides with the nominal beam axis. The active volume, filled with a Ne/\(\hbox {CO}_2\)/\(\hbox {N}_2\) gas mixture at atmospheric pressure, has an inner radius of about 85 cm and an outer radius of about 250 cm. The trajectory of a charged particle is estimated using up to 159 combined measurements (clusters) of drift times and radial positions of the ionisation electrons. The charged-particle tracks are then reconstructed by combining the hits in the ITS and the measured clusters in the TPC. The TPC is also used for particle identification (PID) by measuring the specific energy loss (\(\mathrm {d} E/\mathrm {d} x\)) in the TPC gas. In pp collisions, the \(\mathrm {d} E/\mathrm {d} x\) in the TPC is measured with a resolution of \(\approx 5.2\%\) [15].

The TOF [20] covers the full azimuth for the pseudorapidity interval \(|\eta |<0.9\). The detector is based on the Multigap Resistive Plate Chambers (MRPC) technology and is located, with a cylindrical symmetry, at an average distance of 380 cm from the beam axis. The particle identification is based on the difference between the measured time of flight and its expected value, computed for each mass hypothesis from track momentum and length. A precise starting signal for the measurement of the time of flight by the TOF is provided by the T0 detector, consisting of two arrays of Cherenkov counters, T0A and T0C, which cover the pseudorapidity regions \(4.61 \le \eta \le 4.92\) and \(-3.28 \le \eta \le -2.97\), respectively [21]. The overall resolution on the particle time of flight, including the start time, is \(\approx 80\) ps.

3 Data sample

This analysis is based on approximately 900 million pp collisions (events) at \(\sqrt{s}=5.02\) TeV collected in 2017 by ALICE at the LHC. Events are selected by a minimum-bias (MB) trigger, requiring at least one hit in each of the two V0 detectors. An additional offline rejection is performed to remove events with more than one reconstructed primary vertex (pile-up events) and events triggered by interactions of the beam with the residual gas in the LHC beam pipe [17]. In total, 1.8% of the collected events are rejected due to these selections.

The production of (anti)nuclei is measured around midrapidity, within a rapidity range of \(|y|<0.5\), and within the pseudorapidity interval \(|\eta |<0.8\) to maximise the detector performance. The selected tracks are required to have at least 70 reconstructed points in the TPC and two points in the ITS in order to guarantee good track momentum and \(\mathrm {d} E/\mathrm {d} x\) resolution in the relevant \(p_{\mathrm {T}}\) ranges. In addition, at least one hit in the SPD is required to ensure a resolution of the distance of closest approach to the primary vertex better than 300 \(\upmu \)m, both along the beam axis (\(\hbox {DCA}_\mathrm {z}\)) and in the transverse plane (\(\hbox {DCA}_\mathrm {xy}\)) [15]. The quality of the accepted tracks is checked by requiring the \(\chi ^2\) per TPC reconstructed point and per ITS reconstructed point to be less than 4 and 36, respectively. Finally, tracks originating from kink topologies of kaon and pion decays are rejected.

Data are divided into multiplicity intervals classified by a Roman numeral from I to X, going from the highest to the lowest multiplicity [10]. In order to achieve a higher statistical precision, classes are merged into nine classes for (anti)protons and (anti)deuterons and into two classes for (anti)helions. The multiplicity classes are defined from the mean of the V0 signal amplitudes as percentiles of the \(\mathrm {INEL}>0\) pp cross section, where \(\mathrm {INEL}>0\) events are defined as collisions with at least one charged particle in the pseudorapidity region \(|\eta |<1\) [22]. The mean charged-particle multiplicities for each class, \(\left<{\mathrm {d} N_\mathrm {ch}/\mathrm {d} \eta } \right>\), are listed in Table 1.

Table 1 Multiplicity classes for the different measurements, with the corresponding charged-particle multiplicity density at midrapidity \(\langle \)d\(N_\mathrm {ch}\)/d\(\eta \rangle \) and percentiles of the INEL > 0 pp cross section, and \(p_{\mathrm {T}}\)-integrated yields dN/dy for the different species. For protons, statistical uncertainties are negligible with respect to systematic uncertainties

4 Data analysis

4.1 Raw yield extraction

The first important step in the analysis is the particle identification. As in previous works [1, 6, 10, 23, 24], the identification of (anti)nuclei is performed with two different methods, depending on the particle species and on the transverse momentum. For (anti)protons and (anti)deuterons with \(p_{\mathrm {T}}\) \(< 1\) GeV/c, the identification relies on the measurement of the \(\mathrm {d} E/\mathrm {d} x\) using the TPC. The number of signal candidates is extracted through a fit with a Gaussian with two exponential tails to the \(n_{\sigma _{\mathrm {TPC}}}\) distribution for each \(p_{\mathrm {T}}\) interval. The \(n_{\sigma _{\mathrm {TPC}}}\) is defined as the difference between the measured and the expected \(\mathrm {d} E/\mathrm {d} x\) for each particle species, divided by the \(\mathrm {d} E/\mathrm {d} x\) resolution of the TPC. For \(p_{\mathrm {T}}\) \(\ge 1\) GeV/c, it is more difficult to separate (anti)protons and (anti)deuterons from other charged particles of \(|Z|=1\). Therefore, the TOF detector information is used in addition for PID. The squared mass of the particle is evaluated as \(m^{2} = p^{2}\left( t_{\mathrm {TOF}}^2/L^2 - 1/c^2\right) \), where \(t_\mathrm {TOF}\) is the measured time of flight, L is the length of the track and p is the momentum of the particle. In order to reduce the background, the tracks are in addition required to have \(|n_{\sigma _{\mathrm {TPC}}}| < 3\). The squared mass distributions of the signal are fitted with a Gaussian function with an exponential tail. Background originating from other particle species or from the random match of a TOF hit with another track significantly increases with \(p_{\mathrm {T}}\) and is modelled with the sum of Gaussian and exponential functions. 
For (anti)helion, only the TPC \(\mathrm {d} E/\mathrm {d} x\) measurement is used, because their signal in the TPC can be easily separated from the one of other particle species, due to the electric charge (\(\hbox {Z} = 2\)). The raw yield of (anti)helion is obtained through a fit of the \(n_{\sigma _{\mathrm {TPC}}}\) with a Gaussian function for the signal and a Gaussian function for the contamination coming from (anti)triton, where present. When the background is negligible, the raw yield is extracted by directly counting the (anti)nuclei candidates. Otherwise, the TPC \(\mathrm {d} E/\mathrm {d} x\) and TOF squared mass distributions are fitted with the aforementioned models, using an extended-maximum-likelihood approach and the yield is obtained as a fit parameter. In the signal extraction, the fit quality is monitored and a successful Pearson test is required with the probability to reject a true hypothesis of \(5\%\).
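The two PID variables introduced in this section, \(n_{\sigma _{\mathrm {TPC}}}\) and the TOF squared mass, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the \(\mathrm {d} E/\mathrm {d} x\) resolution is passed as an absolute value in the same units as the measurement, and the units (GeV/c for momentum, ps for time, cm for track length) are a choice made for this example, with the factor c restored accordingly.

```python
import math

# Speed of light in cm/ps (299792458 m/s converted to cm/ps).
C_CM_PER_PS = 0.0299792458


def n_sigma(measured_dedx, expected_dedx, resolution):
    """(measured - expected) dE/dx divided by the dE/dx resolution.

    resolution is taken here as an absolute value (an assumption of this sketch).
    """
    return (measured_dedx - expected_dedx) / resolution


def passes_tpc_preselection(nsigma, max_abs_nsigma=3.0):
    """Apply the |n_sigma| < 3 requirement used to reduce the TOF background."""
    return abs(nsigma) < max_abs_nsigma


def expected_time_ps(p_gev, mass_gev, length_cm):
    """Expected time of flight for a given mass hypothesis, momentum and track length."""
    beta = p_gev / math.sqrt(p_gev ** 2 + mass_gev ** 2)
    return length_cm / (beta * C_CM_PER_PS)


def tof_mass_squared(p_gev, t_tof_ps, length_cm):
    """m^2 = p^2 (t^2/L^2 - 1/c^2), returned in (GeV/c^2)^2 for p in GeV/c."""
    inv_beta_sq = (C_CM_PER_PS * t_tof_ps / length_cm) ** 2
    return p_gev ** 2 * (inv_beta_sq - 1.0)


if __name__ == "__main__":
    # Toy track: momentum 2 GeV/c, track length 380 cm, deuteron mass hypothesis.
    t = expected_time_ps(2.0, 1.875612, 380.0)
    print(tof_mass_squared(2.0, t, 380.0))
```

The sketch does not include the fits to the \(n_{\sigma _{\mathrm {TPC}}}\) and squared mass distributions, which require a dedicated fitting framework.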

4.2 Efficiency and acceptance correction

The raw yield must be corrected to take into account the tracking efficiency and the detector acceptance. This correction is evaluated from Monte Carlo (MC) simulated events, which are generated using the event generator PYTHIA8.21 (Monash2013 tune) [25]. However, since PYTHIA8 does not handle the production of nuclei properly, it is necessary to inject (anti)nuclei on top of each generated event. In each pp collision, one deuteron, one antideuteron, one helion or one antihelion is injected, randomly chosen from a flat rapidity distribution in the range \(|y|<1\) and a flat \(p_{\mathrm {T}}\) distribution in the range \(p_{\mathrm {T}}\) \(\in [0,10] \) GeV/c. The GEANT4 [26] transport code is exploited to describe the hadronic interaction of the particles propagating through the detector material. The correction is defined as the ratio between the number of reconstructed (anti)nuclei in the rapidity range \(|y|<0.5\) and in the pseudorapidity interval \(|\eta |<0.8\) and the number of generated ones in \(|y|<0.5\). The correction is computed separately for each (anti)nucleus and for the TPC and TOF analyses. Moreover, the raw signal needs to be corrected for trigger inefficiencies. The selected events are required to have at least one charged particle in the pseudorapidity region \(|\eta |<1\) (INEL \(> 0\)) [22]. Some INEL \(> 0\) events can be lost due to the finite trigger efficiency (event loss) and all the particles produced in those events are lost as well (signal loss). Hence, it is necessary to correct the spectra for the event and the signal losses. The correction must be evaluated from MC simulations because the number of rejected events and lost particles is only known there. For (anti)protons, this correction is directly computed from the MC simulation because their production is handled by the event generator. 
On the contrary, (anti)nuclei are injected on top of a pp collision and a direct estimation from the MC is not possible, because there would be a bias in the number of lost (anti)nuclei. For this reason, the correction for pions, kaons and protons is evaluated in this case in a different MC data set with no injected nuclei and the average value is used for (anti)deuterons and (anti)helions. Further details on this method can be found in [10, 23]. This correction is negligible at high multiplicity (\(< 1\)‰) and becomes relevant at low multiplicity (up to 14% for (anti)protons and (anti)deuterons, 2% for (anti)helions, in the low \(p_{\mathrm {T}}\) region \(p_\mathrm {T}<1\hbox { GeV/}c\)).
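The acceptance-times-efficiency ratio defined in this section can be illustrated with the short Python sketch below. The record layout (dictionaries with `y` and `eta` keys) and the function names are hypothetical and serve only this example; in the analysis the ratio would be computed separately in each \(p_{\mathrm {T}}\) interval, for each species and for the TPC and TOF selections.

```python
def acceptance_times_efficiency(generated, reconstructed):
    """Ratio of reconstructed to generated (anti)nuclei.

    generated, reconstructed : lists of dicts with 'y' (rapidity) and 'eta'
    (pseudorapidity) entries; this layout is an assumption of the sketch.
    Numerator  : reconstructed candidates with |y| < 0.5 and |eta| < 0.8.
    Denominator: generated particles with |y| < 0.5.
    """
    n_gen = sum(1 for part in generated if abs(part["y"]) < 0.5)
    n_rec = sum(
        1
        for part in reconstructed
        if abs(part["y"]) < 0.5 and abs(part["eta"]) < 0.8
    )
    if n_gen == 0:
        raise ValueError("no generated particles in |y| < 0.5")
    return n_rec / n_gen


def correct_raw_yield(raw_yield, correction):
    """Divide the raw yield by the acceptance-times-efficiency correction."""
    if correction <= 0.0:
        raise ValueError("correction must be positive")
    return raw_yield / correction


if __name__ == "__main__":
    # Toy MC sample, not real simulation output.
    gen = [{"y": 0.1, "eta": 0.1}, {"y": -0.3, "eta": 0.5}, {"y": 0.8, "eta": 1.2}]
    rec = [{"y": 0.1, "eta": 0.1}]
    eff = acceptance_times_efficiency(gen, rec)
    print(eff, correct_raw_yield(10.0, eff))
```

The event-loss and signal-loss corrections described above are not included here, since they depend on trigger information that is only available in the full simulation.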

4.3 Secondary (anti)nuclei contamination

The contribution of secondary (anti)nuclei, i.e. (anti)nuclei that are not produced directly in the collision, must be subtracted from the total measured yields. Secondary nuclei are mostly produced in the interaction of particles with the vacuum beam pipe and the detector material. Moreover, an important contribution to secondary (anti)protons is also given by the weak decay of heavier particles. All particles coming from strong and electromagnetic decays are considered as primary. (Anti)deuterons and (anti)helions receive a negligible background contribution from weak decays, since the only known contribution comes from the decays of hypertriton (\(^3_\Lambda \)H \(\rightarrow \) d + p + \(\pi \) and \(^3_\Lambda \)H \(\rightarrow \) \(^3\)He + \(\pi \)) and their antimatter counterparts, whose production is known to be suppressed in pp collisions [6]. Finally, the production of secondary antideuterons and antihelions from material is extremely rare due to baryon number conservation. The fraction of primary (anti)nuclei is evaluated through a template fit to the \(\hbox {DCA}_\mathrm {xy}\) distribution of the data, as described in [1]. The templates for primary and secondary (anti)protons and deuterons are obtained from MC simulations. For (anti)protons, two templates are used to describe both (anti)protons from weak decays and from material. While the template for primary (anti)helions is extracted from the MC as well, this is not possible for the template for secondaries, due to the very rare production of antihelion. For this reason, the (anti)proton template at half the (anti)helion \(p_{\mathrm {T}}\) is used as a proxy for the (anti)helion one. 
This procedure is based on the assumption that the \(\hbox {DCA}_\mathrm {xy}\) distributions of secondary (anti)helions can be represented by the \(\hbox {DCA}_\mathrm {xy}\) distributions of (anti)protons at a transverse momentum which is scaled with the rigidity p/z of (anti)helion, where z is the (anti)helion electric charge. The contribution of secondary nuclei is observed to be more relevant at low \(p_{\mathrm {T}}\) (20% for protons, 40% for deuterons and 90% for helions) and to decrease exponentially with increasing transverse momentum.

Fig. 1 Transverse-momentum spectra of (anti)protons (left), (anti)deuterons (centre) and (anti)helions (right) in the different multiplicity classes, reported in Table 1. (Anti)deuteron and (anti)proton spectra are fitted with a Lévy–Tsallis function [27], while (anti)helion spectra are fitted with an exponential function with respect to the transverse mass \(m_{\mathrm {T}}\)

4.4 Systematic uncertainties

One contribution to the systematic uncertainty comes from the adopted track selection criteria. This uncertainty is evaluated by varying the selections, as done in [10]. The effect of the subtraction of secondary (anti)nuclei is studied with the variation of the \(\hbox {DCA}_\mathrm {z}\) and \(\hbox {DCA}_\mathrm {xy}\) selections as well. This is the most relevant contribution for (anti)helion at low \(p_{\mathrm {T}}\), decreasing with \(p_{\mathrm {T}}\). The estimation of the systematic uncertainty related to the raw signal extraction depends on the considered species. For (anti)protons, the difference between the signal extracted by direct count and the one extracted from the fit is taken into account. For (anti)deuterons, this is obtained by varying the interval in which the direct counting of (anti)deuterons is performed. Finally, for (anti)helion a toy MC has been developed in order to generate 10000 TPC \(\mathrm {d} E/\mathrm {d} x\) samples that are compatible with the default one. A possible bias in the signal extraction process is investigated by refitting each distribution and looking into the variation of the extracted yields. Another source of systematic uncertainty is given by the incomplete knowledge of the material budget of the detector in the MC simulations. This is evaluated by comparing different MC simulations in which the material budget of the ALICE detector was varied by \(\pm \,4.5\%\) [15]. This value corresponds to the uncertainty on the determination of the material budget obtained by measuring photon conversions. The imperfect knowledge of the hadronic interaction cross section of (anti)nuclei in the material contributes to the systematic uncertainty as well and depends on the particle species. Similarly, an uncertainty related to the ITS-TPC matching is considered and evaluated from the difference between the ITS-TPC matching efficiencies in data and MC. 
Finally, the trigger inefficiency is also a source of systematic uncertainties. The uncertainty is assumed to be half of the difference between the signal loss correction (described in Sect. 4.2) and unity. It strongly depends on the event multiplicity: it is negligible at high multiplicity and contributes up to 7% in the lowest event class for (anti)deuterons and (anti)helions. Where present, it decreases with increasing \(p_{\mathrm {T}}\). The list of all the sources of systematic uncertainty for the INEL \(> 0\) multiplicity class is reported in Table 2. The average values between matter and antimatter are reported for (anti)protons, (anti)deuterons and (anti)helions, for the lowest and highest \(p_{\mathrm {T}}\) values of the measured spectra.
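The prescription for the trigger-related uncertainty stated above, half of the difference between the signal loss correction and unity, amounts to the following one-line computation. The function name is hypothetical, and the convention that the correction is expressed as a multiplicative factor close to 1 is an assumption of this sketch.

```python
def trigger_systematic(signal_loss_correction):
    """Half of the difference between the signal loss correction and unity.

    signal_loss_correction : multiplicative correction factor (assumed
    convention for this sketch); the absolute value makes the result
    independent of whether the factor is above or below 1.
    """
    return 0.5 * abs(signal_loss_correction - 1.0)


if __name__ == "__main__":
    # Toy value: a hypothetical 14% correction.
    print(trigger_systematic(1.14))
```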

Table 2 Summary of the contributions to the systematic uncertainties of the yield for the INEL \(> 0\) event class for the different species

Fig. 2 Mean transverse momentum of (anti)protons (left), (anti)deuterons (centre) and (anti)helions (right) in pp collisions at \(\sqrt{s} = 5.02\) TeV, in high-multiplicity pp collisions at \(\sqrt{s} = 13\) TeV [24], in INEL > 0 pp collisions at \(\sqrt{s} = 13\) TeV [23,