1 Introduction

The mass composition of the UHECRs is one of the most important ingredients needed when trying to elucidate the origin and acceleration mechanisms of these most energetic particles in the universe. The most reliable observable from extensive air showers (EAS) used to infer the mass composition is the \(X_{\mathrm{max}}\) parameter [1], the atmospheric depth where the energy deposit profile of secondary particles reaches its maximum. This parameter is related to the mass of the primary particle which initiate the shower, \(\langle X_{\mathrm{max}}\rangle \propto - \ln A\), with larger mean values and dispersion for light primary particles in comparison with the heavier nuclei. Experimentally, the mass composition of UHECRs was inferred from measurements of the first two moments of the \(X_{\mathrm{max}}\) distributions \((\langle X_{\mathrm{max}}\rangle \;\mathrm{{and}}\;\sigma _{X_{\mathrm{max}}})\) as a function of the primary energy, by the Pierre Auger [2,3,4], High-Resolution Fly’s Eye (HiRes) [5] and the Telescope Array [6] Collaborations. Despite of the large data acquisition time (the Pierre Auger Observatory is operating since 2004) and large acceptance of the experiments, the reconstruction of the mass composition is affected by large uncertainties mainly due to the unknown interaction cross sections at highest energies, experimental systematic uncertainties and poor statistics at the highest energies.

In [7] the Pierre Auger Collaboration show that using only the limited information given by the first two moments of the \(X_{\mathrm{max}}\) distributions, degeneracies may be induced when interpreting the mass composition of a given \(X_{\mathrm{max}}\) distribution, e.g. different mixes of primary particles can have identical mean and dispersion. To get information on fractions of individual nuclei, the Pierre Auger Collaboration used the entire shape of \(X_{\mathrm{max}}\) distributions fitting them with MC templates for (p, He, N, Fe). The fits were performed with a binned maximum-likelihood method and the goodness of the fits was characterised with p-value. With the use of this method the Auger data for \(E > 10^{17.8}\) eV could be described well with mixed compositions consisting of p, He and N (as representative for the intermediate mass elements), while fractions of Fe were close to zero in most of the energy bins.

In this work we show that fitting the \(X_{\mathrm{max}}\) distributions with (p, He, N, Fe) elements, the fit quality is affected if some intermediate elements, e.g. Ne/Si are in fact present. For that, we propose a method for finding the best combination from a larger set of possible elements (p, He, C, N, O, Ne, Si and Fe) to fit the data. Applying this method to the Auger data reported in [4] we observe a slight improvement of p-values when Ne/Si are considered as additional fitting elements in some energy bins, especially at energies below the ankle (\(E < 10^{18.6}\) eV) where the statistics in the data is larger.

In Sect. 2 we describe the simulation procedure to obtain the \(X_{\mathrm{max}}\) Probability Density Functions (PDFs) for each primary species in the energy range \(\lg (E/\mathrm eV) = [17.8 - 19.3]\). In Sect. 3 we show the influence of Ne/Si abundance on the goodness of fit parameter p-value. In Sect. 4 we present the fit results on \(X_{\mathrm{max}}\) distributions measured by the Pierre Auger Observatory (until 2014), considering the elements which best describe the data. Section 5 concludes the paper.

2 Simulations

We used the CONEX v4r37 simulation code [8, 9] to generate the \(X_{\mathrm{max}}\) distributions for each element (p, He, C, N, O, Ne, Si and Fe) in 15 energy intervals of 0.1 in \(\log (E/eV)\) starting from \(E = 10^{17.8}\) eV up to \(E = 10^{19.3}\) eV. Three high energy hadronic interaction models were employed, EPOS-LHC [10], QGSJETII-04 [11] and Sibyll 2.1 [12]. The zenith angle of the showers were sampled from an isotropic distribution in the interval \(\theta = [0^{\circ }{-}60^{\circ }]\). The statistics of the simulation data set consists in \(10^4\)\(10^{5}\) events per each primary species per hadronic interaction model in each energy interval. A PDF of \(X_{\mathrm{max}}\) for a nuclear species in a given energy interval consists in a binned \(X_{\mathrm{max}}\) distribution in the range (0–2000) g/\(\hbox {cm}^{2}\) with a bin width \(= 20\) g/\(\hbox {cm}^{2}\). The true \(X_{\mathrm{max}}\) values given by the CONEX simulations were modified to account for the detector acceptance and experimental resolution (Eqs. (7) and (8) from [4]). An example of PDFs of \(X_{\mathrm{max}}\) for proton and iron induced showers in the energy interval \(\lg (E/ \mathrm eV) = [19.0{-}19.1]\) for two hadronic interaction models is presented in Fig. 1.

Fig. 1
figure 1

PDFs of \(X_{\mathrm{max}}\) for proton and iron induced showers considering EPOS-LHC and QGSJETII-04. The \(X_{\mathrm{max}}\) values are obtained from CONEX simulations taking into account the experimental acceptance and resolution effects (Eqs. (7) and (8) from [4])

We will use these PDFs in the next section to generate random \(X_{\mathrm{max}}\) distributions with different mixes of primary particles to observe the behavior of the goodness of fit estimator p-value as a function of different prior abundances, when the \(X_{\mathrm{max}}\) distributions are fitted with the four fixed PDFs (p, He, N and Fe).

3 Influence of Ne/Si on the goodness of fit

The results on mass composition of primary cosmic rays at energies \(E > 10^{17.8}\) eV reported in [7] indicate a modulation of the abundances of primary protons, He and N nuclei as a function of energy. The experimental \(X_{\mathrm{max}}\) distributions in each energy interval were fitted with four primary PDFs (p, He, N and Fe) following a binned maximum-likelihood procedure. Different astrophysical models suggest a variation of the abundance of different elements as a function of energy below and above the ankle [13,14,15,16]. In such a scenario, the observed modulation of the reconstructed fractions might be biased as a consequence of a high abundance of an intermediate element not included into the fitting procedure, in the case when the \(X_{\mathrm{max}}\) distributions are fitted with the same fixed four species (p, He, N and Fe) over the entire energy range.

We performed the following test. Using individual \(X_{\mathrm{max}}\) values obtained from simulations as explained in Sect. 2, we build \(X_{\mathrm{max}}\) distributions for each energy bin considering random abundances of eight primary species (p, He, C, N, O, Ne, Si and Fe). We generated a large number of such distributions (\(3 \times 10^{4}\)) to ensure that we cover all possible mixes. The statistics in each distribution is of the same magnitude as in the Auger data. Then, using a binned maximum-likelihood procedure we fit these \(X_{\mathrm{max}}\) distributions with 4 PDFs (p, He, x, Fe), where x was varied from C to Si.

The minimizing quantity, \(-\ln L\), in this fitting procedure is defined as:

$$\begin{aligned} -\ln L = \sum _{i} y_i - n_i + n_i \ln (n_i / y_i), \end{aligned}$$
(1)

where \(n_i\) stands for the measured counts in the “i”-th bin of an \(X_{max}\) distribution and \(y_i\) represents the MC prediction for that bin [17]. The p-value parameter represents the probability of obtaining a worse fit than that observed, even if the distribution predicted by the fit results is correct:

$$\begin{aligned} \textit{p-value} = 1 - \Gamma \left( \frac{ndf}{2}, \frac{\chi ^2}{2}\right) , \end{aligned}$$
(2)

where \(\Gamma \) is the incomplete gamma function, ndf represents the number of degrees of freedom, and \(\chi ^2\) represents the sum of the square of residuals using the parameters computed by the likelihood method. Note that the p-values calculated using Eq. 2 differ from those calculated in [7]. We make the approximation that L behaves like a \(\chi ^2\) variable while in [7] the p-value parameters are calculated in a more realistic way, using mock data sets of the predicted fractions with size equal to the real data sets. Even if the absolute p-values might be affected by this approximation we consider that the relative variation of p-values with the components included in the fit is significant. In addition, we mention that the p-values obtained by us with Eq. 2 do not differ significantly from those obtained with the method used in [7], therefore we consider that the main conclusion of this paper will be not affected by this choice.

Indeed, the best results were obtained when the distributions were fitted with p, He, Fe and any of CNO nuclei. Further, we tried to check what is the capability of this fitting method to reconstruct these four abundances (p, He, N, Fe) if one of the primary species has a high prior abundance. An example of the evolution of the fit quality as a function of different abundance of nuclear species in the \(X_{\mathrm{max}}\) distributions is represented in Fig. 2 for the energy interval \(\lg (E/\mathrm eV) = [18.4{-}18.5]\) for EPOS-LHC. We considered the actual statistics measured by Auger in this energy interval, N = 1139 (upper panel) and the case in which we double and triple the number of events in distributions (middle panel and bottom panel respectively). The results from Figs. 2 and 3 can be interpreted as follows: the first blue circle stands for the case in which the true fraction of protons in \(X_{\mathrm{max}}\) distributions was in the interval [0–0.1] while the rest seven elements had random abundances. The second blue circle stands for the case when the true fraction of protons was in the range [0.1–0.2] and so on. Similarly, the green “x” cross symbol stands for the case in which the true fraction of He in \(X_{\mathrm{max}}\) distributions was in the interval [0–0.1] while the rest seven elements had random abundances and so on. It was convenient to quantify the quality of fit as fraction of events with p-value\( > 0.1\).

As we can see in Fig. 2, the probability of obtaining a good p-value decreases with the increase of abundances of Ne or Si and with increase of statistics in \(X_{\mathrm{max}}\) distributions, when the fitting procedure includes only four PDFs (p, He, N and Fe). Moreover, we found that the reconstructed fractions of protons, He and Fe differ from the true fractions by up to \(20\%\) in some cases (i.e. when the abundance of Ne or Si is \(> 40\%\)). When fitting the same distributions with five elements including Si (p, He, N, Si and Fe), the fit quality is not affected by the higher prior abundances of Ne or Si. These results are presented in Fig. 3. We observed that in the energy bins where the statistics is very small (e.g. \(\lg (E/\mathrm eV) = [19.2{-}19.3]\), \(N = 87\)), the higher abundance of Ne or Si does not affect the quality of the fit. For these energies the reliable estimations on the mass composition can not be obtained due to poor statistics available in data.

Fig. 2
figure 2

Fraction of events with a p-value greater than 0.1 as a function of prior abundance of different species. The \(X_{\mathrm{max}}\) distributions correspond to the energy interval \(\lg (E/\mathrm eV) = [18.4{-}18.5]\) considering EPOS-LHC. The fitting function includes four elements (p, He, N and Fe). The statistics in the \(X_{\mathrm{max}}\) distributions is \(N = N_{Auger} = 1139\) (top), \(N = 2 N_{Auger} = 2278\) (middle) and \(N = 3 N_{Auger} = 3417\) (bottom)

Fig. 3
figure 3

Fraction of events with a p-value greater than 0.1 as a function of prior abundance of different species. The \(X_{\mathrm{max}}\) distributions correspond to the energy interval \(\lg (E/\mathrm eV) = [18.4{-}18.5]\) considering EPOS-LHC. The fitting function includes five elements (p, He, N, Si and Fe). The statistics in the \(X_{\mathrm{max}}\) distributions is \(N = N_{Auger} = 1139\) (top), \(N = 2 N_{Auger} = 2278\) (middle) and \(N = 3 N_{Auger} = 3417\) (bottom)

4 Fitting Auger \(X_{\mathrm{max}}\) distributions

We fit the experimental \(X_{\mathrm{max}}\) distributions measured at the Pierre Auger Observatory [4], with the four fixed PDFs (p, He, N and Fe) on the entire energy range \(\lg (E/\mathrm eV) = [17.8{-}19.3]\). The results we have obtained are in a very good agreement with those reported in [7], since we build our PDFs based on the same version of CONEX code, employing the same versions of hadronic interaction models and considering the same binned maximum-likelihood fitting procedure.

We found that the most appropriate approach to fit the experimental \(X_{\mathrm{max}}\) distributions in each energy interval is to consider all possible combinations of PDFs from a larger set of nuclear elements (p, He, C, N, O, Ne, Si and Fe) and then to find the “best combination” of elements which best describe the data. Thus, the number of elements from a “best combination” may vary between 1 and 8. We will refer from now on to this fitting approach as “best combination”.

It is worth mentioning that in the minimization procedure of the log-likelihood (Eq. 1) we do not neglect the empty bins and the ndf parameter is calculated as the number of bins in the \(X_{max}\) distribution minus the number of parameters considered in the fit. Therefore, the computation of the p-value parameter takes into account the number of parameters considered in the fit.

In Fig. 4 we give an example of a \(X_{\mathrm{max}}\) distribution measured by the Pierre Auger Observatory in the energy interval \(\lg (E/\mathrm eV) = [17.9 - 18.0]\). We found that the “best combination” (Fig. 4 left) suggests that the shape of the distribution is best described only by two elements, protons and O, with p-value\(= 0.35\), for the case of EPOS-LHC model. In Fig. 4 right we present the results obtained by fitting the same \(X_{\mathrm{max}}\) distribution with 4 PDFs (p, He, N and Fe). In this case we obtain a worse p-value\(= 0.22\).

Fig. 4
figure 4

\(X_{\mathrm{max}}\) distribution measured at the Pierre Auger Observatory in the energy interval \(\lg (E/\mathrm eV) = [17.9{-}18.0]\). The reconstructed fractions predicted by the “best combination” fitting procedure are displayed on the (left) with p-value\( = 0.35\). The reconstructed fractions obtained by fitting the same distribution with 4 PDFs (p, He, N and Fe) are displayed on the (right) (p-value\(= 0.22\)). All the PDFs are obtained considering the EPOS-LHC interaction model

A direct comparison of the two fitting procedures is presented in Figs. 5, 6 and 7 for EPOS-LHC, QGSJETII-04 and Sibyll 2.1 for the entire energy range. In the case of QGSJETII-04 (Fig. 6) and Sibyll 2.1 (Fig. 7) we observe negligible differences if the \(X_{\mathrm{max}}\) distributions are fitted with the four fixed PDFs (p, He, N and Fe) or if we use the “best combination” of elements. The only modification consists in a slight improvement of the p-value parameter over the entire energy range for the “best combination” case. Important to mention that the number of elements from the “best combination” consists in two or three elements over the entire energy range for each hadronic interaction model. The error bars (statistical uncertainties) of the fitted fractions should not be compared with those from [7] since they are computed considering different methods. We have employed the MINOS technique based on \(\Delta L = 1/2\) rule, while in [7] the authors used the Feldman–Cousins procedure in which the parameter uncertainties are computed in a more rigorous way by enforcing unitarity. Most likely the uncertainties from Figs. 5, 6 and 7 from our manuscript are underestimated.

The most interesting aspect is observed in the case of EPOS-LHC model (Fig. 5) at the lower energies. We found that for some energy intervals, e.g. \(\lg (E/\mathrm eV) = [18.1{-}18.2]\), \([18.4{-}18.5]\), the “best combination” suggest the presence of Ne or Si in Auger data with a slight improvement of the p-value parameter. This aspect is in agreement with our results from Sect. 3, where we found that a high prior abundance of Ne or Si (\( > 20 \% \)) may affect the quality of fit if the \(X_{\mathrm{max}}\) distributions are fitted with the four PDFs (p, He, N and Fe). Without making speculations, one can consider that the results presented in this paper could be a hint for the presence of the heavier elements (\(20< A < 39\)) around the ankle, as predicted in [16].

Fig. 5
figure 5

Fitted fractions in each energy interval considering EPOS-LHC model. Blue full circles stand for the fitting method which uses only four fixed species (p, He, N and Fe) on the entire energy range. Red circles represent the fitted fractions found for the “best combination” method and black stars stand for Auger 2014 results [7]

Fig. 6
figure 6

Fitted fractions in each energy interval considering QGSJETII-04 model. Blue full circles stand for the fitting method which uses only four fixed species (p, He, N and Fe) on the entire energy range. Red circles represent the fitted fractions found for the “best combination” method and black stars stand for Auger 2014 results [7]

Fig. 7
figure 7

Fitted fractions in each energy interval considering Sibyll 2.1 model. Blue full circles stand for the fitting method which uses only four fixed species (p, He, N and Fe) on the entire energy range. Red circles represent the fitted fractions found for the “best combination” method and black stars stand for Auger 2014 results [7]

5 Discussions and conclusions

In this paper we investigated the capability to infer the mass composition of the primary UHECRs from measurements of \(X_{\mathrm{max}}\) distributions. Using simulated \(X_{\mathrm{max}}\) distributions for a large set of primary species (p, He, C, N, O, Ne, Si and Fe), we build \(X_{\mathrm{max}}\) distributions with random mixes of elements for each energy interval in the energy range \(\lg (E/\mathrm eV) = [17.8{-}19.3]\). We found that a high prior abundance of Ne or Si can bias the reconstructed fractions of elements if the distributions are fitted with four fixed PDFs (p, He, N and Fe). We found that the fit quality decreases with increasing the Ne/Si abundance and with increasing the statistics in the \(X_{\mathrm{max}}\) distributions.

We proposed an alternative approach to infer the mass composition from the \(X_{\mathrm{max}}\) distributions which finds the “best combination” of elements best describing the distributions from a larger set of primaries. Applying this method to the \(X_{\mathrm{max}}\) distributions measured by the Pierre Auger Observatory until 2014, it was shown that in some low energy bins, only for the EPOS-LHC model the “best combination” of elements suggests the presence of Ne or Si, with a slight improvement of the p-value parameter. Since we have shown using simulations that a high Ne/Si prior abundance will affect the fit quality if the \(X_{\mathrm{max}}\) distribution is fitted with four PDFs (p, He, N and Fe), we consider that it is important to take into account further elements in future studies.