Measurement of inclusive forward neutron production cross section in proton-proton collisions at $\mathrm{\sqrt{s} = 13~TeV}$ with the LHCf Arm2 detector

In this paper, we report the measurement relative to the production of forward neutrons in proton-proton collisions at $\mathrm{\sqrt{s} = 13~TeV}$ obtained using the LHCf Arm2 detector at the Large Hadron Collider. The results for the inclusive differential production cross section are presented as a function of energy in three different pseudorapidity regions: $\eta>10.76$, $8.99<\eta<9.22$ and $8.81<\eta<8.99$. The analysis was performed using a data set acquired in June 2015 that corresponds to an integrated luminosity of $\mathrm{0.194~nb^{-1}}$. The measurements were compared with the predictions of several hadronic interaction models used to simulate air showers generated by Ultra High Energy Cosmic Rays. None of these generators showed good agreement with the data for all pseudorapidity intervals. For $\eta>10.76$, no model is able to reproduce the observed peak structure at around $\mathrm{5~TeV}$ and all models underestimate the total production cross section: among them, QGSJET II-04 shows the smallest deficit with respect to data for the whole energy range. For $8.99<\eta<9.22$ and $8.81<\eta<8.99$, the models having the best overall agreement with data are SIBYLL 2.3 and EPOS-LHC, respectively: in particular, in both regions SIBYLL 2.3 is able to reproduce the observed peak structure at around $\mathrm{1.5-2.5~TeV}$.


Introduction
In recent years, several ground-based experiments, like the Pierre Auger Observatory [1] and Telescope Array [2], have measured the flux and composition [3,4] of cosmic rays above 10 18 eV, the Ultra High Energy Cosmic Rays (UHECRs). These results are affected by large uncertainties due to the hadronic interaction models on which measurements must rely. They are indeed necessary to reconstruct the properties of the incident cosmic ray from the Extensive Air Shower (EAS) it forms when interacting with the atmosphere. Due to the lack of calibration data relative to secondary particles produced in the forward region in high energy collisions, these models lead to very different predictions in the phase space that is relevant to describe EASs physics [5]. Some of these models, like QGSJET II-04 [6], EPOS-LHC [7] and SIBYLL 2.3 [8], were tuned making use of the results relative to p-p collisions at √ s = 0.9 and 7 TeV, obtained in Run I operations of the CERN Large Hadron Collider (LHC) [9], the highest energy accelerator currently available. Even considering these post-LHC models, the level of agreement between them is still not satisfactory [10], so that higher energy data can help in reducing this discrepancy. Among LHC experiments, the LHCf [11] experiment has been designed in order to measure the distributions of neutral particles produced in the very forward region in p-p and p-nucleus collisions. A particular interest is reserved to forward neutral hadrons detectable by the experiment, mainly neutrons, because of the key role that forward baryons have in the development of the atmospheric shower and in the abundance of the muonic component [12]. The measurements relative to the inclusive differential cross section of forward neutrons produced in p-p collisions at √ s = 7 TeV have already been published by the collaboration [13]. In this paper, we extend this analysis to the data relative to p-p collisions at √ s = 13 TeV, making use of the same pseudorapidity regions previously defined. Compared to the 7 TeV case, this corresponds to an increase of the energy in the laboratory frame by a factor four and an increase in the p T coverage by a factor two. Note that the results presented in this paper are relative to only one of the two LHCf detectors (Arm2), but in the future we plan to enlarge the pseudorapidity acceptance extending the analysis to the other detector (Arm1).

The LHCf experiment
The LHCf experiment consists of two detectors [14], Arm1 and Arm2, placed on the opposite sides of LHC Interaction Point 1 (IP1), in regions called Target Absorber Neutral (TAN), located at a distance of 141.05 m from IP1. Thanks to this position, each of the two detectors is able to measure neutral particles produced in p-p and p-nucleus collisions with a pseudorapidity coverage η > 8. 4.
Arm2 is formed by two calorimetric towers made up by 16 Gd 2 SiO 5 (GSO) scintillators layers (1 mm thick) alternated to 22 tungsten (W) layers (7 mm thick) for a total length of about 21 cm, equivalent to 44 X 0 and 1.6 λ I . The transverse sizes of the two towers, called small tower and large tower, are respectively 25 mm × 25 mm and 32 mm × 32 mm. Apart from energy, reconstructed using the deposit in the 16 scintillator layers, the two towers have the possibility to reconstruct transverse position as well, making use of 4 xy imaging layers placed at different depths in the calorimeter. These imaging layers are made up by 0.16 mm read-out pitch silicon microstrip detectors.
The detector just described is different from the one used in [13], because, before LHC Run II operations, it was upgraded in order to improve the radiation hardness. However, the performances of the current detector for the reconstruction of hadronic showers, investigated making use of beam test data and MC simulations, are not significantly different from the old ones [15]. For hadrons between 1 and 6.5 TeV, detection efficiency ranges from about 52 to 72%, energy resolution from 28 to 38% and position resolution from 0.3 to 0.1 mm.

Experimental data set
The data analyzed in this paper were acquired during proton-proton collisions at √ s = 13 TeV, from 22:32 of June 12th to 1:30 of June 13th 2015 (CEST). This time period corresponds to the LHC Fill 3855, a special low luminosity and high β * fill dedicated to LHCf data taking. In this fill, 29 bunches collided at IP1 with a half crossing angle of 145 µrad downward. Additionally, 6 and 2 non-colliding bunches at IP1 circulated for the clockwise and counter-clockwise beams, respectively. Beam β * was around 19 m and µ, the average number of collisions per bunch crossing, was in the range 0.007-0.012. Given the 15% acceptance of the calorimeter for inelastic collisions, event pile-up at the detector level was estimated to be below 1%, effect that was therefore neglected in this analysis. Using the instantaneous luminosity measured from the ATLAS experiment (3-5 ×10 28 cm −2 s −1 , [16]) and taking into account LHCf data acquisition live time, the recorded integral luminosity is 0.194 nb −1 . The number of recorded shower events in Arm2 is 2.1 ×10 6 .

Simulation data set
Monte Carlo simulations (MC) relative to the same experimental configuration present at the LHC are necessary for four different purposes: estimation of correction factors and systematic uncertainties, validation of the whole analysis procedure, energy spectra unfolding and comparison of models with the final result. All simulations steps (collision, transport and interaction with the detector) were realized making use of Cosmos 7.633 [17] and EPICS 9.15 [18] libraries, except where differently specified. In the following, MC are separated in three different categories, also summarized in table 1: a) MC that require full detector simulation, i.e. to generate collisions, to transport products from IP1 to TAN (taking into account the effect of magnetic field, particles decay and particles interaction with the beam pipe) and to simulate the interaction with the detector. Three different models were used to generate collisions: QGSJET II-04, EPOS-LHC and DPMJET 3.04 [19], with a corresponding statistics of 10 8 , 5×10 7 and 2×10 7 inelastic collisions, respectively. In all cases, the transport through the beam pipe and the interaction with the detector were simulated making use of the DPMJET 3.04 model. In addition, we generated another special set of 4×10 7 inelastic collisions dedicated to multihit events (events in which more than one particle enter a given calorimetric tower) both for QGSJET II-04 and EPOS-LHC. For this purpose, two steps are required. At first, starting from the full sample described above, we separated each multihit event, where n particles simultaneously hit the detector, into n independent singlehit events (events in which only one particle enters a given calorimetric tower). Then we simulated again the interaction with the detector, this time one particle at a time. All these simulations were used for the validation of the whole analysis procedure and for a preliminary comparison with experimental data at the folded spectra level, as well as for the estimation of a part of the correction factors (beam-pipe background, Particle IDentification (PID), multihit, fake, missed, see section 5.2) and of the systematic uncertainties (position resolution, unfolding method, see section 5.4). Note that for this last purpose (evaluation of corrections and uncertainties), only QGSJET II-04 and EPOS-LHC were employed as collision generators, whereas DPMJET 3.04 was excluded because of its limited agreement with data already at the folded spectra level (as discussed in section 6).  Table 1. Summary of the three MC sets described in the text. The simulation is divided in three main steps: generation of particles produced in p-p collisions at √ s = 13 TeV, transport of particles from IP1 to TAN (taking into account the effect of bending magnetic field, particles decay and beam pipe interaction), interaction of particles with the Arm2 detector. For each MC set we specified which software interfaces and which interaction models were used to simulate each step. The symbol "-" indicates that it was not necessary to simulate the given step for that MC set. b) MC that require only the generation of collisions and the transport of secondaries to TAN, without taking into account the interaction with the detector. We simulated 10 8 collisions for each of the following models: QGSJET II-04, EPOS-LHC, SIBYLL 2.3, DPMJET 3.06 and PYTHIA 8.212 [20]. The PYTHIA 8.212 simulation data set was generated using its own dedicated generator, whereas CRMC [21], an interface tool for event generators, was employed for the remaining models (version 1.5.6 for DPMJET 3.06, version 1.6.0 for the others). In both cases, all particles having cτ < 1 cm were considered unstable and their decay was treated by the interface. After generation, the secondaries were transported through the beam pipe making use of the DPMJET 3.04 model (taking into account magnetic field, particles decay and particles interaction with the beam pipe). All these simulations were employed for the estimation of hadron contamination correction factors and for the final comparison with the unfolded spectra.
c) MC that require only the interaction with the detector to build the response matrix used for unfolding. In order to be free from the shape of the injected distribution, we considered a single neutron flat energy spectrum from 0 to 6.5 TeV, uniformly distributed on the whole area of the detector. The interaction with the detector was simulated using DPMJET 3.04 and QGSJET II-04, so that we could estimate the dependence of the result on the model employed in unfolding. 1 This flat MC sample is composed by about 10 7 neutrons per tower for each model.  Table 2. Definition of the three pseudorapidity regions used for the analysis: η is the pseudorapidity, r is the distance from the beam center in the detector plane and ∆φ is the coverage of azimuthal angle.
As a last note, we would like to mention that in case of p-p collisions at √ s = 13 TeV, QGSJET II-04 predicts a significant number of neutrons having exactly p T = 0 GeV/c. According to the author's suggestion [22], in this analysis such kind of events was removed from all simulation sets relative to this generator.

Analysis
In this section, we present the analysis procedure, divided in five main steps. At first, we define the event selection criteria and we reconstruct all events in the sample, building the folded energy distributions for each pseudorapidity region (section 5.1). Then, we estimate correction factors to be applied to the folded spectra (section 5.2) and we unfold distributions to deconvolute our result taking into account the detector response (section 5.3). Later, all necessary systematic uncertainties are evaluated and associated both to the folded and unfolded spectra (section 5.4). Finally, these distributions are converted to the inclusive differential production cross section and compared to model predictions (section 6).

Event reconstruction
The event reconstruction algorithm used in this analysis is similar to the one employed in [13], even though we estimated again all calibration parameters and optimized the reconstruction criteria (making use of beam test and MC simulation data), because of the detector upgrade after LHC Run I operations. In particular, due to the difficult identification of a multihit event in case one of the particles is a hadron, we decided to reconstruct all events as singlehit (looking for a single peak in the imaging layers) and later apply a correction for this effect, as described in section 5.2.
An event is accepted if it satisfies a software trigger condition, defined as an energy deposit above 850 MeV in at least three consecutive scintillator layers. This value has been chosen taking into account the efficiency curves of discriminators used to generate the hardware trigger during LHC operations. Because hadron detection efficiency is very limited below 500 GeV, we report the final result only above this energy, even if in all analysis steps we considered the spectrum over 250 GeV because it leads to better unfolding performances in the low energy region. In this analysis, we used the same three pseudorapidity regions defined in [13]. Here called A, B and C, they correspond to a coverage of η > 10.76, 8.99 < η < 9.22 and 8.81 < η < 8.99, respectively. Figure 1 shows the definitions of these regions on the detector area (more details are given in table 2). Due to the limited position resolution of imaging layers, some migration of events inside and outside the pseudorapidity regions is present. This effect is taken into account both including it inside fake and missed correction factors and adding a position resolution systematic uncertainty.
Particle identification is performed exploiting the variable L 2D = L 90% − 0.25 × L 20% , L 20% and L 90% representing the longitudinal depths where the fraction of the energy deposit of a shower compared to the total release in the calorimeter is 20% and 90%, respectively. For each pseudorapidity region, we identified as neutrons all events having L 2D > L thr 2D , where L thr 2D is a threshold value, estimated from simulations in order to maximize the product of efficiency and purity. Even if this threshold is in principle energy-dependent, for simplicity we fixed a constant value for all energy bins inside the same pseudorapidity region, later correcting for residual inefficiencies and contaminations.

Correction factors
The values of correction factors estimated for the present analysis are summarized in figure 2 and table 3. Apart from beam-gas contamination, all of them were calculated independently for each energy bin. Corrections were applied to the energy spectra in the same order that they are mentioned in the following:   Figure 2. Relative correction factors before (left) and after (right) unfolding. In the two cases, black markers show the relative statistical error present on the folded distribution before background correction and on the unfolded distribution before missed events correction, respectively. The different colors represent the relative change of the histograms due to each correction.

• Beam background
Beam-related background is due to two different processes: the interaction of primary protons with residual gas in the beam line and the interaction of secondary particles from collisions with the beam pipe. The first contribution was estimated using events associated with non-crossing bunches at IP1, generated purely from beam-gas interactions, and comparing this number to the events associated with colliding bunches, that include both signal and background. We obtained a background-to-signal ratio of about 1%. The second contribution was estimated making use of MC simulations in which we simulated the interaction with the beam pipe, leading to a background- to-signal ratio that depends on the reconstructed energy, ranging from about -5% at 500 GeV to less than -1% above 1 TeV. Both corrections were applied to the measured distributions and the uncertainty was neglected, being it always smaller than 1%.
• PID PID has a limited efficiency (hadrons identification) and purity (photons contamination). Correction factors were estimated by performing a template fit of the L 2D distributions of photons and hadrons predicted from simulations to the L 2D distribution experimentally observed. An example of this procedure is shown in figure 3. As expected, we did not observe any significant dependence on the generator, being correction factors related to detector response and not to the distributions of incoming particles. Because of this reason, the uncertainty on PID factors is mainly statistical, estimated making use of a confidence interval method based on the result of the template fit. The correction ranges between -5% and +8%, whereas the uncertainty is below 10%.
• Multihit Multihit events are not correctly identified by the reconstruction algorithm defined in section 5.1. Because of this reason, we estimated correction factors using the special simulation sample described in section 3: in this MC, each multihit event involving n particles simultaneously hitting the same tower was divided in n different events, one per each particle. This sample was reconstructed in two different ways: in the first case we artificially piled-up each group of n events in a single multihit event, whereas in the second case we considered each event individually. These two cases correspond to the real/ideal situation in which our detector cannot/can distinguish multihit events. Corrections were calculated making use of QGSJET II-04 and EPOS-LHC generators, estimating the final factors from the average over the two models and the systematic error from half their difference. The correction ranges between -14% and 10%, whereas the uncertainty is below 10%.

• Fake and missed
Fake and missed events refer respectively to events that, due to detector misreconstructions, are incorrectly included in and excluded from the measured distributions. Both effects are due to improper position and/or energy reconstruction, that is significant only in the low energy region. As discussed in section 5.1, the limited detection efficiency of hadrons contributes as well to the missed events. Actually, this effect constitutes the largest contribution to correction factors: indeed, due to the small depth of the detector, a significant fraction of hadrons interacts late or does not interact at all in the calorimeter. Above 2 TeV, the detection efficiency is mostly constant around a 70% value, whereas, below 2 TeV, it strongly decreases due to the smaller energy deposit. Note that fake and missed corrections are expressed as a function of different definition of energy (reconstructed and true) because they are respectively applied before and after unfolding. Both effects were estimated making use of the ideal (multihit corrected) sample described above, again using the average of QGSJET II-04 and EPOS-LHC as the final correction factors. Generator dependence resulted to be less than 1%, therefore we neglected this contribution to the final uncertainty. Fake correction (i.e. the ratio between the entries that should not be reconstructed in that bin and the number of reconstructed events in that bin) slightly decreases with increasing energy, ranging from -8% to -1%; missed correction (i.e. the ratio between the entries that should be reconstructed in that bin and the number of reconstructed events in that bin) ranges from more than +100% at 500 GeV to about +40% at 6.5 TeV.

• Hadron contamination
A not negligible fraction of hadrons reaching the detector are not actually neutrons, but mainly Λ 0 , K 0 L and other neutral and charged hadrons. Because the detector can not distinguish between them, the unfolded spectra must be corrected in order to recover the distributions of neutrons at the IP1. Neutrons at the IP1 include both neutrons produced from p-p collisions and neutrons produced from the decay of short lived particles (cτ < 1 cm). The hadron contamination was calculated making use of all the five generators described in section 4 after simulating the transport of secondaries to TAN. The final correction factors were estimated from the average over the five models with a systematic error given by the maximum deviation observed. The correction ranges between -60% and -7%, whereas the uncertainty is below 20%.

Spectra unfolding
Given the limited energy resolution, spectra unfolding is necessary to deconvolute our measurements taking into account detector response. This task was performed using the iterative Bayesian method [23] implemented in the RooUnfold package [24]. Some changes to the default implementation were made in order to separate the choice of the prior used as initial hypothesis of the algorithm from the distribution employed to build the response matrix of the detector. This modification was necessary to distinguish between the different contributions to the uncertainty coming from the unfolding process itself, as discussed in section 5.4. Forgetting for the moment this systematic contribution, the best estimation of the final unfolded distributions was obtained making use of a flat prior as initial hypothesis and building the response matrix through a flat energy spectra of single neutrons uniformly distributed over the whole detector area, simulated using the DPMJET 3.04 model. In order to estimate the uncertainties related to deconvolution, unfolding was repeated several times changing time by time the prior, the response matrix or the test input spectrum. Because of this reason, it was necessary to define a general convergence criteria that could be valid in all these cases. Thus, we decided to stop the iterative procedure when ∆χ 2 , the χ 2 change between the output of two consecutive iterations, was below 1. This threshold was chosen as a compromise between a good level of convergence and a limited number of iterations. For example, in the case of the data points in the final unfolded distributions, this condition corresponds to 51, 12 and 10 iterations for Region A, B and C, respectively. An additional change to the default procedure is relative to the choice of applying fake and missed correction outside the unfolding algorithm, altough we confirmed that results are consistent with the standard approach. Note that this approach is different from the one used in [13].

Systematic uncertainties
The typical value of systematic uncertainties estimated for the present analysis are summarized in figure 4 and Figure 4. Relative systematic uncertainties on the final unfolded spectra. From left to right, the three columns represent: the uncertainties already associated to the folded distributions, the uncertainties due to spectra unfolding and the uncertainties related to hadron contamination corrections. In all three cases, black markers show the relative statistical error present on the unfolded distribution.
added. It is important to distinguish between uncertainties associated to the spectrum after unfolding (unfolding error, hadron contamination) and before unfolding (all the remaining contributions). While the first group can simply be added to the final distribution, the second one must be correctly propagated through the unfolding. Because iterative Bayesian unfolding is able to handle statistical but not systematic errors, we proceeded in the following way: we considered each contribution independently and artificially shifted the folded nominal spectrum by its estimated relative uncertainty; we unfolded the two distributions obtained in such a way, corresponding to the two extremes of the error bars, using the same  Table 4. Minimum-maximum value of the relative systematic uncertainties for each contribution discussed in section 5.4, separately reported for the three pseudorapidity regions selected in this analysis.
number of iterations as in the nominal case; lastly, we obtained the systematic contribution on the final distribution from the ratio between the unfolded spectra relative to the error bars and the unfolded nominal one. All systematic uncertainties are described in the following: • Energy scale The uncertainty on the energy scale is related to three different contributions. The first one is due to the calibration process (3.7%), including the absolute gain calibration of each scintillator layer, non uniformity in position dependent correction factors and residuals in the conversion from the deposited energy to the primary energy. The second one is related to different hardware effects present during data taking at the LHC (2.0%), including ADC non-linearity, cables attenuation, stability of the high voltages, relationship between high voltages and PMT gains. The third one is due to the shift of the position of π 0 mass peak in the two gamma invariant mass distribution respect to simulations (+1.6%), discussed in [25]. Despite this shift is still consistent within the two previously mentioned systematic uncertainties, we decided to conservatively take into account this contribution in our analysis, because, in case of hadronic showers, we do not have a clear energy signature that can be used to check the absolute scale. The total energy scale error, obtained from the quadratic sum of the three terms mentioned, resulted to be ±4.5%. The systematic uncertainty on the spectrum was estimated shifting the energy scale by the corresponding error and comparing it with the original distribution. As we can see from table 4, the energy scale is the main source of uncertainty, especially in the high energy region where the slope of the spectrum is steep. The uncertainty strongly depends on energy, ranging from about 1% to about 100%.

• Beam center
In order to measure pseudorapidity we need to know the beam center, defined as the projection of the beam direction at the interaction point on the detector plane. This parameter was obtained fitting the position distribution of high energy hadrons through a bidimensional function. The uncertainty on the beam center resulted to be ±0.3 mm, a value consistent with the run-by-run fluctuations of this parameter. In order to estimate the systematic uncertainty on the spectrum we shifted the beam center position by this quantity on the x and y axes. Making use of these four additional distributions, we obtained the systematic uncertainty from the maximum deviation respect to the original one. The uncertainty is generally below 5%.

• Position resolution
In order to simplify the unfolding procedure, deconvolution takes into account only the effects due to energy smearing, not to position resolution. Although migration caused by position misreconstruction was corrected applying fake and missed factors, we decided to consider this effect as a source of systematic uncertainty as well. This contribution was calculated from simulations, comparing the spectrum obtained using reconstructed position with the one obtained using true position. After having evaluated this ratio both with QGSJET II-04 and EPOS-LHC generators, the maximum deviation was used as estimation of the systematic uncertainty. The uncertainty is generally below 10%.
• Integrated luminosity The error on the integrated luminosity, derived from ATLAS measurements [16], was estimated to be 1.9%: this value was considered as an energy independent systematic uncertainty on the spectrum.

• Correction factors
The systematic uncertainties due to correction factors were discussed in section 5.2: they are relevant for PID, multihit, hadron contamination corrections and negligible for beam-related background, fake, missed corrections.

• Interaction cross section
Detection inefficiency considered in missed events correction factors depends on the cross section that describes the interaction of a hadron with the detector. In order to estimate the uncertainty on this parameter, we compared the proton-tungsten interaction cross section used for the DPMJET 3.04 model in EPICS with the one obtained from the QGSP_BERT 4.0 model in GEANT4 [26], a different simulation toolkit. The maximum deviation found below 6.5 TeV was 6%, that was considered as an energy independent systematic uncertainty on the spectrum.

• Unfolding
Four different terms contribute to the uncertainty related to the unfolding itself. The first one, here called unfolding method uncertainty (generally 3-20%), is due to the ability of the iterative Bayesian method to give a spectrum consistent with the true level one: we estimated this term using simulations relative to QGSJET II-04 and EPOS-LHC generators, unfolding the distribution making use of a flat prior and comparing the unfolded spectrum obtained in this way with the true distribution. The second one, here called unfolding prior uncertainty (generally 3-20%), is due to the dependence of the unfolded spectrum on the choice of the prior used at the initial stage of the iterative procedure: we estimated this term comparing the unfolded results obtained using a flat prior with the ones obtained using priors chosen from QGSJET II-04, DPMJET 3.04 and EPOS-LHC generators. As discussed in appendix A, the best knowledge we have on the prior shape would be to assume x F scaling [27] and extrapolate it from ISR measurements [28,29] (using the method suggested in [30]), but we decided to not consider it because it leads to a large unfolding bias. 2 The third one, here called unfolding interaction uncertainty (generally 1-70%), is due to the dependence of the unfolded spectrum on the hadronic interaction model used to build the response matrix: we estimated this term comparing the unfolded results obtained simulating the interaction with the detector making use of two different models, DP-MJET 3.04, employed for all simulations described in section 4, and QGSJET II-04. Note that this uncertainty is not shown on the final result because, being so large respect to the other contributions, it would be a too conservative choice considering that, as described in appendix B, DPMJET 3.04 resulted to have very good agreement with beam test data, respect to QGSJET II-04. The fourth one, here called unfolding algorithm uncertainty (generally 3-20%), is due to the different unfolded spectrum that we can obtain using different algorithms: we estimated this term comparing the unfolded results obtained using iterative Bayesian method with the ones obtained using SVD [31] and TUnfold [32] (also implemented inside the RooUnfold package). Note that this uncertainty should be in principle already included in the first two, but we decided to conservatively add it because the algorithm performances, tested using simulations, may be different in the case of real data, being all models very different from experimental measurements, especially in Region A.

Results
In this section, we present the measured neutron production rate and we compare it with model predictions, both before and after unfolding. Distributions are expressed in terms of the differential production cross section dσ n /dE, corrected for the limited coverage of the azimuthal angle. In the comparison with generators, for each model we used its own inelastic cross section, as reported in [25]. 2 The Feynman variable is defined as xF ≡ 2pZ/ √ s, where pZ, the longitudinal component of the momentum, can be approximated to the energy E in the very forward region. The xF scaling hypothesis asserts that the differential cross section expressed as a function of xF should be independent of the center of mass energy for xF above approximately 0.2.  Figure 5. Folded differential neutron production cross section for p-p collisions at √ s = 13 TeV measured using the LHCf Arm2 detector. Black markers are experimental data with statistical uncertainty, whereas gray bands represent the quadratic sum of statistical and systematic uncertainties. Colored histograms refer to models predictions at the detector level. The top plot shows the energy distributions expressed as dσ n /dE and the bottom one the ratios of these distributions to the experimental data.
The Arm2 folded dσ n /dE is shown in figure 5. This result refers to the distribution before the application of multihit correction and, therefore, the uncertainty is given by the quadratic sum of all statistical and systematic contributions on the folded spectrum except multihit and interaction cross section. The binning was defined in such a way that the statistical uncertainty is always below 10%. As we can see, no model reproduces well the experimental measurements, but, in particular, the agreement of DPMJET 3.04 with data is very limited, both in terms of shape and yield. For this reason, as described in section 4, we excluded this generator from the estimation of model dependent corrections and uncertainties.
The Arm2 unfolded dσ n /dE is shown in figure 6. The total uncertainty is given by the quadratic sum of all statistical and systematic contributions. The values of differential cross section are also summarized in appendix C. In the pseudorapidity region η > 10.76, neutron production exhibits a peak structure around 5 TeV that is not predicted by any model. As discussed in appendix A, the peak is present independently from the choice of a flat prior or an ISR prior, but the position of the peak changes from x F ∼ 0.75 in the former case to x F ∼ 0.80 in the latter case, also affecting the conclusion on the x F scaling hypothesis. Regarding models, QGSJET II-04 shows a constant increase of the production rate up to high energies, whereas EPOS-LHC and SIBYLL 2.3 reach an approximately constant production rate above 3.5 TeV. These three generators do not contemplate the presence of a high energy peak in forward neutron differential cross section and, therefore, they lead to a lower yield at 5 TeV and a higher yield at 6.5 TeV. On the other side, DPMJET 3.06 and PYTHIA 8.212 have a peak structure, but they strongly underestimate both the position of  Figure 6. Unfolded differential neutron production cross section for p-p collisions at √ s = 13 TeV measured using the LHCf Arm2 detector. Black markers are experimental data with statistical uncertainty, whereas gray bands represent the quadratic sum of statistical and systematic uncertainties. Colored histograms refer to models predictions at the generator level. The top plot shows the energy distributions expressed as dσ n /dE and the bottom one the ratios of these distributions to the experimental data.
the peak and the production rate at high energies. For all generators the total production cross section in η > 10.76 is lower than the one experimentally observed: QGSJET II-04 is the model leading to the smallest deficit respect to data, amounting to a value of about 20%. Even in the pseudorapidity regions 8.99 < η < 9.22 and 8.81 < η < 8.99, no model reproduces well the experimental measurements, although deviation is smaller than in the previous case. SIBYLL 2.3 and EPOS-LHC have the best overall agreement with data in 8.99 < η < 9.22 and 8.81 < η < 8.99, respectively. In particular, SIBYLL 2.3 is compatible with data in the region between 1.5 and 2.5 TeV, where neutron production is maximum, but it is softer below and harder above this point. The other models generally underestimate (QGSJET II-04, EPOS-LHC) or overestimate (DPMJET 3.06, PYTHIA 8.212) the inclusive differential cross section in all the energy range.
The general trend of experimental data is similar to what observed at √ s = 7 TeV [13]. Direct comparison of models can not be done because the version used here is different respect to the one employed in [13]: in particular, QGSJET II-04, EPOS-LHC and SIBYLL 2.3 were tuned using LHC Run I results. Comparing the pre-LHC and post-LHC version of SIBYLL, we can observe a significant increase of the neutron production in all the pseudorapidity regions and this fact improves the agreement of the model with experimental measurements. Differently, QGSJET and EPOS are not affected by relevant changes. Whereas no strong variation is found also in PYTHIA, DPMJET exhibits a very different inclusive differential cross section in the two cases. Because no significant changes in the mechanisms responsible for forward neutron production are expected between √ s = 7 and 13 TeV, this variation must be due to the fact that the p T coverage is different in the two cases, i.e. the p T intervals corresponding to a given pseudorapidity region at 13 TeV are almost twice the ones at 7 TeV.

Summary
The LHCf experiment measured the inclusive differential cross section of forward neutrons produced in proton-proton collisions at √ s = 13 TeV. The analysis interested the same three pseudorapidity regions considered in the case of √ s = 7 TeV: η > 10.76, 8.99 < η < 9.22 and 8.81 < η < 8.99. Experimental measurements were compared to the prediction of several hadronic interaction models: QGSJET II-04, EPOS-LHC, SIBYLL 2.3, DPMJET 3.06 and PYTHIA 8.212. No one of these generators resulted to be in a good agreement with data due to a still not satisfactory understanding of particles production in the forward region. In η > 10.76, the experimental measurements exhibit a peak structure around 5 TeV that is not predicted by any model and all models underestimate the total production cross section, with QGSJET II-04 being the generator leading to the smallest deficit respect to data. In the other two regions, the experimental measurements exhibit a peak structure around 1.5-2.5 TeV that is well reproduced by SIBYLL 2.3, with SIBYLL 2.3 and EPOS-LHC being the models in best agreement with data in 8.99 < η < 9.22 and 8.81 < η < 8.99, respectively.
The results presented in this paper are relative to the Arm2 detector only, but in the future we plan to extend this analysis to the Arm1 detector as well, in order to enlarge the pseudorapidity acceptance by the use of regions that are accessible to only one of them. An additional improvement of the analysis will be possible thanks to the common operations that the LHCf and ATLAS experiments had in 2015. By exploiting the ATLAS information in the central region, we can tag LHCf triggers on an event-by-event basis and therefore investigate the different mechanisms of neutron production in the forward region [33,34].

A Change in the unfolded result depending on the choice of flat or ISR prior
Experimental measurements of inclusive differential neutron production cross section in the forward region have been carried on at ISR [28,29] and by PHENIX [30]. The different measurements are in good agreement between each other and demonstrates the validity of x F scaling for p-p collisions in the range √ s = 30.6 − 200 GeV. Thus, when choosing the initial prior of iterative Bayesian unfolding, it would be in principle better to choose the one extrapolated from ISR measurements rather than the flat one. This extrapolation can be done easily using the same method described in [30], limiting the integration to the acceptance of Region A in LHCf, corresponding to p T < 0.276 GeV/c · x F in the case of p-p collisions at √ s = 13 TeV. Here we call ISR prior the one obtained extrapolating the ISR results relative to p-p collisions at √ s = 44.9 GeV according to the procedure mentioned above. Unfortunately, as shown in figure 7, this prior was found to have a large bias on the final result, observed both on experimental data and generators simulations. In particular, being very different from all models, the ISR prior leads to an unfolded spectrum very different from generator truth in case of QGSJET II-04 simulations, because of its feature of a sharp peak around x F = 0.80 that rapidly decreases to 0 at x F = 1. In addition, this problem was found only with the ISR prior, but not with an artificial prior obtained shifting the peak position from x F = 0.80 towards smaller or larger values. Thus, we concluded that the ISR prior is not a safe choice for our analysis because, in the case the distribution present in nature was different from ISR measurements, the unfolding would lead to a result that is different from the true one. As shown in figure 7, all other priors lead to similar results and, in any case, the residual difference is considered as systematic uncertainty. Note that all this discussion applies only to Region A, but not to Region B and C, whose p T coverage is outside the validity of the extrapolation suggested in [30]. The ISR prior was extrapolated using the method described in [30] on data relative to p-p collisions at √ s = 44.9 GeV. Note that the ISR prior leads to an unfolded result that is very different from the one obtained with the other priors.

B Comparison of the measured detector response with hadronic interaction models
The choice of the hadronic interaction model used to simulate the detector response plays a very important role in the analysis described in this paper. It is employed for detector calibration, for the estimation of correction factors, for the evaluation of sytematic uncertainties and for spectra unfolding. Unfortunately, the agreement between different models is not good in the energy range considered in this analysis. Because of this reason, we decided to test them using experimental data at the highest energy achievable at beam test facilities. These data were acquired at the CERN Super Proton Synchrotron (SPS) in July-August 2015, making use of muons, electrons and protons beams at different energies. The total energy deposit of 350 GeV protons in the detector was compared to the one predicted by DPMJET 3.04 and QGSJET II-04 in the same experimental configuration. The event selection applied to these distributions differs from the one described in section 5.1 for two main reasons: the reduction of the software trigger threshold (75 MeV for small tower and 150 MeV for large tower) to cope with the small detection efficiency at this energy; the reduction of the detector fiducial area to a small square around center (5 mm × 5 mm for small tower and 6 mm × 6 mm for large tower) in order to select a region of uniform response. The comparison between data and MC is shown in figure 8, whereas the parameters of the distributions are reported in table 5. As we can see, DPMJET 3.04 is in very good agreement with experimental measurements compared to QGSJET II-04: therefore we decided to use DPMJET 3.04 to simulate the interaction of particles with the detector. In particular, the deviation of the mean and the energy resolution predicted by this model respect to the one observed in data is at most 2%. Because this difference is compatible with the error on the gain of scintillator channels, we decided to neglect this additional contribution to the uncertainty on the energy scale. As a final remark, it is important to note that the choice of the model used to simulate the interaction with the detector is independent from the ability of the same model to reproduce the spectra of particles produced in √ s = 13 TeV p-p collisions. This is because, apart from the fact that in the calorimeter the first interaction is between the incoming particle and tungsten, there is a difference of more than 4 orders of magnitude between the highest possible energy of a particle reaching the detector (6.5 TeV) and the energy of the primary proton in the laboratory frame where the other proton is at rest (90 PeV).    Table 6. Inclusive differential neutron production cross section dσ n /dE for the three pseudorapidity regions considered in this analysis. Upper and lower uncertainties, expressed as the quadratic sum of statistical and systematics contributions, are also reported.