Generalisation of the identity method for determination of high-order moments of multiplicity distributions with a software implementation

Incomplete particle identification limits the experimentally available phase space region for identified particle analysis. This problem affects ongoing fluctuation and correlation studies, including the search for the critical point of strongly interacting matter performed at the SPS and RHIC accelerators. In this paper we provide a procedure to obtain nth order moments of the multiplicity distribution using the identity method, generalising previously published solutions for n = 2 and n = 3. Moreover, we present an open source software implementation of this computation, called Idhim, that allows one to obtain the true moments of identified particle multiplicity distributions from the measured ones, provided the response function of the detector is known.


Introduction
The search for the critical point of strongly interacting matter remains one of the most important goals of experimental heavy-ion physics [1,2]. Its basic property, the increase of the correlation length of the considered system, forces experimenters to shift their interest from inclusive spectra to higher-order moments and cumulants of the particle multiplicity distributions. Particular interest is paid to net-proton fluctuations, which are the most sensitive to the searched phenomenon [3].
One of the most serious experimental issues, which largely limits the available phase-space coverage and possibly affects the studied signal, is incomplete particle identification caused by the finite detector resolution. To overcome this problem an experimental technique, called the identity method, was proposed in Ref. [4] and extended in Refs. [5][6][7]. So far the identity method has been described for the second [5] and third [6] order moments. In Ref. [6] it was also used to reexamine the first moments of the identified particle distributions. The impact of particle losses due to detector inefficiencies on results from the identity method is discussed in Ref. [7]. The author shows that the method remains applicable provided the detection efficiencies can be determined with sufficient accuracy. With the ongoing development of theoretical studies concerning higher-order moments, it seems appropriate to extend the experimental techniques and tools as well.
In the present study the identity method is extended in two ways. Firstly, a strict procedure to obtain nth order moments of the multiplicity distribution is derived. Secondly, a program, called Idhim, which performs such calculations for any given number of considered particle types, is presented. It allows one to obtain moments up to any order, provided the detector response function is known. The modification of the first moments from Ref. [6], also included in Idhim, may address possible biases in other popular methods [8] (e.g. the maximum likelihood method [9,10]).
The paper is organized as follows. In Sect. 2, the basic quantities of the identity method are presented. The computation of the nth moments of the true multiplicity distribution is shown in Sect. 3. Modifications necessary to apply the general formulas in practice are addressed in Sect. 4. A description of the Idhim program, which computes the moments of the true multiplicity distribution, is given in Sect. 5. Section 6 contains tests of the program with detector responses close to those measured in real experiments. Conclusions in Sect. 7 end the paper.

Basic quantities
The identity method is developed under the assumption that particles are identified by measuring a quantity x (e.g., a mass) of observed particles. Due to the finite detector resolution one obtains a continuous distribution of x, denoted by ρ_j(x), where the index j stands for one of the k particle types. The density is expected to sum up to the mean of N_j, i.e., the multiplicity for this type:

∫ ρ_j(x) dx = ⟨N_j⟩.   (1)

For a given particle observation, its conditional probability of being of a given type is expressed by a quantity called identity, defined as

w_j(x) ≡ ρ_j(x) / ρ(x),   where   ρ(x) = Σ_{l=1}^{k} ρ_l(x).   (2)

In the case of complete particle identification w_j reduces to two extreme values: w_j = 0 for particles of types other than j and w_j = 1 for particles of type j.
In the same way, one can define an aggregated quantity for a given particle type:

W_j ≡ Σ_{i=1}^{N(ν)} w_j(x_i),   (3)

where N(ν) is the total multiplicity (including all particle types) of the νth of the considered N_ev events. From these events one obtains the distributions of W for the different types, with moments defined as

⟨W_1^{n_1} · W_2^{n_2} · … · W_k^{n_k}⟩ ≡ (1/N_ev) Σ_{ν=1}^{N_ev} [W_1^{(ν)}]^{n_1} ⋯ [W_k^{(ν)}]^{n_k},   (4)

where n_j denotes the order of the moment of the distribution of W_j.
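The quantities defined above can be illustrated with a short sketch for a hypothetical two-type case; all means, peak positions and widths below are invented for illustration only:

```python
import math

# Sketch of the basic identity-method quantities for a hypothetical two-type
# case (pion, kaon). rho_j(x) is the detector response of type j, normalised
# to the assumed mean multiplicity of that type.
def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

MEAN_N = {"pion": 10.0, "kaon": 2.0}                   # assumed <N_j>
PEAK   = {"pion": (0.14, 0.05), "kaon": (0.49, 0.05)}  # assumed (mu, sigma) of x

def rho(j, x):
    mu, sigma = PEAK[j]
    return MEAN_N[j] * gauss(x, mu, sigma)

def w(j, x):
    """Identity of one measurement x: the conditional probability that the
    particle is of type j, rho_j(x) / sum_l rho_l(x)."""
    return rho(j, x) / sum(rho(l, x) for l in PEAK)

def W(j, xs):
    """Event-wise identity variable: sum of w_j over all particles observed
    in one event."""
    return sum(w(j, x) for x in xs)

# Far from the kaon peak the identities approach the complete-ID limit:
print(round(w("pion", 0.14), 3), round(w("kaon", 0.14), 3))  # 1.0 0.0
```

With well-separated peaks the identities collapse to 0 and 1, and W_j simply counts the particles of type j; with overlapping peaks the w_j take intermediate values and the moments of W_j differ from those of N_j, which is exactly what the method corrects for.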

Computing the nth moments of multiplicity distribution
We will now show how one can compute all the nth order moments of the multiplicity distribution, ⟨N_1^{n_1} · N_2^{n_2} · … · N_k^{n_k}⟩ with n_1 + n_2 + ⋯ + n_k = n, using the moments of the measured identity variables. The procedure is a generalisation of those published for n = 2 [5] and n = 3 [6]. First, we shall demonstrate how the value of a moment of the identity variables, ⟨W_1^{n_1} · W_2^{n_2} · … · W_k^{n_k}⟩, depends on the multiplicity distribution. We have the following:

⟨W_1^{n_1} ⋯ W_k^{n_k}⟩ = Σ_{N_1,…,N_k} P(N_1,…,N_k) ∫ [ Π_{j=1}^{k} Π_{i=1}^{N_j} P_j(x_i^j) dx_i^j ] Π_{l=1}^{k} ( Σ_{j=1}^{k} Σ_{i=1}^{N_j} w_l(x_i^j) )^{n_l},   (5)

where P(N_1, N_2, …, N_k) is the multiplicity distribution, i.e., the probability of observing N_1 particles of the first type, N_2 particles of the second type and so forth, and P_j(x) = ρ_j(x)/⟨N_j⟩ is the probability distribution of x for the jth type. Let us firstly focus on the innermost part of Eq. 5, denoted hereafter by ω:

ω ≡ Π_{l=1}^{k} ( Σ_{j=1}^{k} Σ_{i=1}^{N_j} w_l(x_i^j) )^{n_l}.   (6)

We will now use the multinomial theorem to expand the n_l-th power. Let us first define the following notation for brevity:

Σ^{a}_{a_1,a_2,…,a_k} ≡ Σ_{a_1+a_2+⋯+a_k=a} (a; a_1, a_2, …, a_k),   (7)

where (a; a_1, …, a_k) = a!/(a_1! a_2! ⋯ a_k!) is the multinomial coefficient, so that the summation symbol carries both the sum over all decompositions of a and the corresponding coefficient. In this notation the multinomial theorem is represented by

(x_1 + x_2 + ⋯ + x_k)^a = Σ^{a}_{a_1,…,a_k} x_1^{a_1} x_2^{a_2} ⋯ x_k^{a_k},   (8)

which allows us to express ω as

ω = Π_{l=1}^{k} Σ^{n_l}_{η_1^{(l)},…,η_k^{(l)}} Π_{j=1}^{k} ( Σ_{i=1}^{N_j} w_l(x_i^j) )^{η_j^{(l)}},   (9)

where the first summation is over all possible combinations of k nonnegative integers η_1^{(l)}, …, η_k^{(l)} that sum up to n_l. Let us now use the multinomial theorem again, this time to expand the η_j^{(l)}-th power:

( Σ_{i=1}^{N_j} w_l(x_i^j) )^{η_j^{(l)}} = Σ^{η_j^{(l)}}_{η_{j,1}^{(l)},…,η_{j,N_j}^{(l)}} Π_{i=1}^{N_j} [w_l(x_i^j)]^{η_{j,i}^{(l)}}.   (10)

This formulation can be rearranged to collect, for each particle i of type j, the product of powers of all the identity variables:

ω = [ Π_{l=1}^{k} Σ^{n_l}_{η_1^{(l)},…,η_k^{(l)}} ] [ Π_{l=1}^{k} Π_{j=1}^{k} Σ^{η_j^{(l)}}_{η_{j,1}^{(l)},…,η_{j,N_j}^{(l)}} ] Π_{j=1}^{k} Π_{i=1}^{N_j} Π_{l=1}^{k} [w_l(x_i^j)]^{η_{j,i}^{(l)}}.   (11)
If we now put ω expressed in this way back into Eq. 5, we can notice that the integration over x_i^j can be applied to the per-particle product of identity variables, which gives

⟨W_1^{n_1} ⋯ W_k^{n_k}⟩ = Σ_{N_1,…,N_k} P(N_1,…,N_k) [ Π_{l=1}^{k} Σ^{n_l}_{η_1^{(l)},…,η_k^{(l)}} ] [ Π_{l=1}^{k} Π_{j=1}^{k} Σ^{η_j^{(l)}}_{η_{j,1}^{(l)},…,η_{j,N_j}^{(l)}} ] Π_{j=1}^{k} Π_{i=1}^{N_j} u_j(η_{j,i}^{(1)}, …, η_{j,i}^{(k)}),   (12)

where the function u_j is defined as

u_j(η^{(1)}, …, η^{(k)}) ≡ ∫ P_j(x) Π_{l=1}^{k} [w_l(x)]^{η^{(l)}} dx = (1/⟨N_j⟩) ∫ ρ_j(x) Π_{l=1}^{k} [w_l(x)]^{η^{(l)}} dx.   (13)

Let us now focus on the part of Eq. 12 depending on j, which will be denoted by λ_j:

λ_j ≡ [ Π_{l=1}^{k} Σ^{η_j^{(l)}}_{η_{j,1}^{(l)},…,η_{j,N_j}^{(l)}} ] Π_{i=1}^{N_j} u_j(η_{j,i}^{(1)}, …, η_{j,i}^{(k)}).   (14)

Since each u_j depends on a tuple of values of length k, it is convenient to introduce a notation for such tuples:

η_{j,i} ≡ (η_{j,i}^{(1)}, …, η_{j,i}^{(k)}),   η_j ≡ (η_j^{(1)}, …, η_j^{(k)}).   (15)

This allows us to express λ_j as

λ_j = Σ_{η_{j,1}+⋯+η_{j,N_j}=η_j} [ Π_{l=1}^{k} (η_j^{(l)}; η_{j,1}^{(l)}, …, η_{j,N_j}^{(l)}) ] Π_{i=1}^{N_j} u_j(η_{j,i}),   (16)

where the first summation is over all possible combinations of N_j tuples η_{j,1}, …, η_{j,N_j} (each containing k nonnegative integers) that sum up to η_j. We can notice that zero tuples, i.e., η_{j,i} = (0, 0, …, 0), do not contribute to λ_j, since u_j(0, 0, …, 0) = 1. Let us then express the sequence η_{j,1}, η_{j,2}, …, η_{j,N_j} as a combination of several non-zero tuples. Let Γ_j denote the set of all such combinations possible for η_j, so that each γ ∈ Γ_j consists of |γ| different non-zero tuples μ_{γ_1}, …, μ_{γ_{|γ|}}, appearing with multiplicities m_{γ_1}, …, m_{γ_{|γ|}}, respectively. If we use this to compute λ_j, we get

λ_j = 1_{η_j = (0,…,0)} + Σ_{γ∈Γ_j} (N_j; m_{γ_1}, …, m_{γ_{|γ|}}, N_j − Σ_p m_{γ_p}) [ Π_{l=1}^{k} (η_j^{(l)}; m_{γ_1} ∗ μ_{γ_1}^{(l)}, …, m_{γ_{|γ|}} ∗ μ_{γ_{|γ|}}^{(l)}) ] Π_{p=1}^{|γ|} [u_j(μ_{γ_p})]^{m_{γ_p}},   (17)

where a ∗ b means that the value b appears a times in the multinomial symbol. The indicator variable is necessary so that in the case of η_j = 0 we have λ_j = 1, as in Eq. 16.
If we now focus on the multinomial symbol involving N_j, it can be expanded as

(N_j; m_{γ_1}, …, m_{γ_{|γ|}}, N_j − Σ_p m_{γ_p}) = [ N_j (N_j − 1) ⋯ (N_j − Σ_{p=1}^{|γ|} m_{γ_p} + 1) ] / [ m_{γ_1}! ⋯ m_{γ_{|γ|}}! ].   (18)

We can see that it is a polynomial in N_j of degree Σ_{p=1}^{|γ|} m_{γ_p}, which is at least 1 and at most η_j^Σ ≡ Σ_{l=1}^{k} η_j^{(l)}. Since λ_j is a weighted sum of such polynomials (and of the indicator variable), it is a polynomial in N_j of at most the same degree and can therefore be expressed as

λ_j = Σ_{p=0}^{η_j^Σ} λ_{j,p} N_j^p.   (19)

Further coefficients, i.e., λ_{j,p} for p > η_j^Σ, equal zero. Let us put this formulation of λ_j back into Eq. 12. This gives us

⟨W_1^{n_1} ⋯ W_k^{n_k}⟩ = Σ_{N_1,…,N_k} P(N_1,…,N_k) [ Π_{l=1}^{k} Σ^{n_l}_{η_1^{(l)},…,η_k^{(l)}} ] Π_{j=1}^{k} Σ_{p_j=0}^{η_j^Σ} λ_{j,p_j} N_j^{p_j}.   (20)

We can rearrange it as

⟨W_1^{n_1} ⋯ W_k^{n_k}⟩ = [ Π_{l=1}^{k} Σ^{n_l}_{η_1^{(l)},…,η_k^{(l)}} ] Σ_{p_1=0}^{η_1^Σ} ⋯ Σ_{p_k=0}^{η_k^Σ} ( Π_{j=1}^{k} λ_{j,p_j} ) Σ_{N_1,…,N_k} P(N_1,…,N_k) N_1^{p_1} ⋯ N_k^{p_k},   (21)

which finally gives us

⟨W_1^{n_1} ⋯ W_k^{n_k}⟩ = [ Π_{l=1}^{k} Σ^{n_l}_{η_1^{(l)},…,η_k^{(l)}} ] Σ_{p_1=0}^{η_1^Σ} ⋯ Σ_{p_k=0}^{η_k^Σ} ( Π_{j=1}^{k} λ_{j,p_j} ) ⟨N_1^{p_1} ⋯ N_k^{p_k}⟩.   (22)

Since Σ_{j=1}^{k} η_j^Σ = n_1 + n_2 + ⋯ + n_k = n, the order of the moments on the right-hand side is at most equal to n. We have therefore just shown how to express any moment of the W distributions of order n as a sum of moments of the N distributions of order ≤ n. Since this dependency is linear, we can define the whole problem as a set of linear equations.
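As a simple consistency check of the general result (this short check is ours, not part of the original derivation), the case n = 1 follows directly from the definitions of ρ and w:

```latex
\langle W_j \rangle
  = \Big\langle \sum_{i=1}^{N} w_j(x_i) \Big\rangle
  = \int w_j(x)\,\rho(x)\,\mathrm{d}x
  = \int \frac{\rho_j(x)}{\rho(x)}\,\rho(x)\,\mathrm{d}x
  = \int \rho_j(x)\,\mathrm{d}x
  = \langle N_j \rangle ,
```

since ρ(x) = Σ_l ρ_l(x) is the inclusive single-particle density, normalised to the mean total multiplicity. Hence, when each ρ_j is normalised to ⟨N_j⟩, the first moments of W directly reproduce the mean multiplicities and no inversion is needed for n = 1.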
Let us denote by a^{q_1,…,q_k}_{n_1,…,n_k} the total coefficient with which the moment ⟨N_1^{q_1} ⋯ N_k^{q_k}⟩ enters the right-hand side of Eq. 22. We also need to define elements which will take into account the contribution of moments of N with orders lower than n:

b_{n_1,…,n_k} ≡ ⟨W_1^{n_1} ⋯ W_k^{n_k}⟩ − Σ_{q_1+⋯+q_k<n} a^{q_1,…,q_k}_{n_1,…,n_k} ⟨N_1^{q_1} ⋯ N_k^{q_k}⟩,   (23)

where the lower-order moments of N are already known from solving the analogous problem for all orders below n. To arrange the moments in a linear order, let us now choose any one-to-one function f from sequences of length k summing up to n to the numbers 1, 2, …, C(n+k−1, k−1). We can use it to construct a matrix A having elements A_{ξ,ζ} = a^{q_1,…,q_k}_{n_1,…,n_k} and a vector B with elements B_ξ = b_{n_1,…,n_k} for ξ = f(n_1, …, n_k) and ζ = f(q_1, …, q_k). We can also arrange the unknown moments in a vector N such that N_ζ = ⟨N_1^{q_1} ⋯ N_k^{q_k}⟩, or in matrix notation:

A · N = B.   (24)

If det A ≠ 0, the moments we are looking for can be computed as

N = A^{−1} · B.   (25)
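The final step, solving the linear system that relates the measured W-moments to the unknown N-moments, can be sketched as follows; a plain Gaussian elimination stands in for the EJML routines used by Idhim, and the 2×2 system is hypothetical, not one produced by the method:

```python
# Solving A·N = B for the unknown moments N. The numbers below are invented
# for illustration; a real A holds the coefficients a_{...} and B the
# measured W-moments corrected for lower-order contributions.
def solve(A, B):
    """Solve the linear system A·x = B by Gaussian elimination with
    partial pivoting."""
    n = len(B)
    M = [row[:] + [b] for row, b in zip(A, B)]  # augmented matrix [A|B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

A = [[2.0, 1.0], [1.0, 3.0]]   # hypothetical coefficient matrix
B = [5.0, 10.0]                # hypothetical right-hand side
print(solve(A, B))  # [1.0, 3.0]
```

Since the system for order n reuses the moments obtained for all lower orders, the solve is repeated order by order, starting from n = 1.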

Modifications
In the previous section we have shown the procedure to compute the nth moments of the multiplicity distribution as a generalisation of the computations for n = 2 and n = 3. To apply it in practice, however, three modifications are needed.
Firstly, to compute the first moments as proposed in Ref. [6], we need to replace Eq. 1 with the following:

∫ ρ_j(x) dx = A_j.   (26)

Now the distribution of the measured x for a given particle type j is normalised to an arbitrary value A_j, which does not have to equal ⟨N_j⟩. As a result, we also need to modify Eq. 13, which now becomes

u_j(η^{(1)}, …, η^{(k)}) ≡ (1/A_j) ∫ ρ_j(x) Π_{l=1}^{k} [w_l(x)]^{η^{(l)}} dx.   (27)

The rest of the procedure holds, and the corrected ⟨N_j⟩ can be computed by applying it for n = 1. Secondly, the measured x is traditionally associated with the particle mass, but it can be any measured quantity, not necessarily a single scalar value. In general, it can be a multi-dimensional vector x, e.g. mean energy loss and time-of-flight, as long as the integration in the function u is performed accordingly.
Thirdly, the measurement of x can be performed in several phase space bins, corresponding to different detector configurations. In such cases Eq. 1 takes the form

Σ_{θ∈Θ} ∫ ρ_j(x, θ) dx = ⟨N_j⟩,   (28)

where θ denotes a configuration from the configuration space Θ. Analogously, the definition of w (Eq. 2) has to take θ into account as well:

w_j(x, θ) ≡ ρ_j(x, θ) / Σ_{l=1}^{k} ρ_l(x, θ),   (29)

where w_j(x, θ) denotes the value of the jth identity variable for a measurement x registered in configuration θ. Finally, the computation of u (Eq. 13) has to take into account measurements in all configurations, so

u_j(η^{(1)}, …, η^{(k)}) ≡ (1/⟨N_j⟩) Σ_{θ∈Θ} ∫ ρ_j(x, θ) Π_{l=1}^{k} [w_l(x, θ)]^{η^{(l)}} dx.   (30)

All three modifications have been described here separately for simplicity, but can be combined if necessary.
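The binned, multi-configuration normalisation can be checked numerically. The sketch below (bin weights, Gaussian means and widths are all invented for illustration) verifies that the densities of one particle type, summed over two hypothetical configurations, integrate to the assumed mean multiplicity:

```python
import math

# Numeric sketch of the multi-configuration normalisation: the densities
# rho_j(x, theta) of a single type j, measured in several configurations
# theta, must together integrate to <N_j>. All numbers are invented.
def gauss(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

MEAN_NJ = 4.0
# Two hypothetical configurations (e.g. momentum intervals) sharing <N_j>:
# each entry is (normalisation of rho_j in this configuration, mu, sigma).
CONFIGS = [(0.6 * MEAN_NJ, 1.0, 0.1), (0.4 * MEAN_NJ, 1.2, 0.2)]

dx = 0.001
total = 0.0
for norm, mu, sigma in CONFIGS:
    # rectangle-rule integral of rho_j(x, theta) over +-1 around the peak
    total += sum(norm * gauss(mu - 1.0 + i * dx, mu, sigma) * dx for i in range(2001))

print(round(total, 3))  # close to MEAN_NJ
```

In Idhim the same bookkeeping is done with binned densities, so the per-bin normalisations must be consistent across all configurations.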

Implementation
The Idhim program was designed to provide an easy way to obtain the moments of the true multiplicity distribution of identified particles, provided the detector resolution is known.
The implementation in Java, using the EJML library for linear algebra operations, is available as open source. The required input to the program includes: (i) a list of particle types in a text file, with each line providing a particle type name; (ii) the ⟨W_1^{n_1} · … · W_k^{n_k}⟩ moments in a tsv (i.e., tab-separated values) file, with each line describing one moment as a list of n_1, …, n_k indices followed by the moment value; (iii) a list of phase space bins, in which the detector response is known, as a tsv file (if more than one kinematic variable defines such bins, multiple tab-separated indices may be provided); (iv) a directory containing files with the detector response functions in each bin.
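To illustrate the input format, a hypothetical two-type (pion, kaon) setup might look as follows; the section labels and all numerical values are invented for illustration only:

```
particle types file, one type name per line:
pion
kaon

moments file (tsv), indices n_1 ... n_k followed by the moment value:
1	0	10.021
0	1	2.007
2	0	110.318
1	1	20.084
0	2	6.112
```

The first moments line, for instance, would encode the measured ⟨W_pion^1 · W_kaon^0⟩ = ⟨W_pion⟩.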
An exemplary set of all needed files is provided with the program. The input format allows for applicability to a wide range of experiments. Firstly, the number of considered particle types is arbitrary. In a typical case of particle identification it depends on the collision energy and the available statistics, e.g., at low interaction energies one does not need to consider deuterons and/or helium-3, whereas at high energies or with large available statistics they must be taken into account. Secondly, only the ⟨W_1^{n_1} · … · W_k^{n_k}⟩ moments, not the full distributions, need to be provided. Finally, in a typical experiment, particle identification is performed by a set of detectors with an overlapping momentum coverage. Thus, the full momentum coverage of an experiment consists of regions with ρ being a 1D function, e.g., when particles are identified only by dE/dx or time-of-flight (ToF), or a 2D function, e.g., when particles are identified by combined measurements of dE/dx and ToF. An example of such a non-uniform detector acceptance is shown in Fig. 1. Bins with any number of dimensions, which reflect a changing detector configuration or particle yields, can be defined as long as the density function for all particle types is given at the same points of the space. The next section includes examples demonstrating the usefulness of the program features described above.

Tests

The computation of all moments of multiplicity distributions up to the fourth order was tested on two models. The first one is a Monte Carlo model (a so-called fast generator), where the number of particles of a given type produced in a single event was generated from a Poisson distribution with a different free parameter λ for each considered particle type. The test included the four most popular particle types, namely electrons, pions, kaons and protons, with respective λ values of 1, 10, 2 and 4. The number of events is set to 1,000,000.
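A minimal sketch of such a fast-generator test is given below. The response means and widths are illustrative stand-ins, not the fitted parameters of Refs. [9,11], and the closure check uses the exact identity-method property ⟨W_j⟩ = ⟨N_j⟩:

```python
import math
import random

# Fast-generator sketch: Poisson multiplicities with lambda = 1 (e), 10 (pi),
# 2 (K), 4 (p) and a Gaussian single-variable detector response. All response
# parameters below are invented for illustration.
random.seed(1)

LAM  = {"e": 1.0, "pi": 10.0, "K": 2.0, "p": 4.0}
RESP = {"e": (0.0, 1.0), "pi": (2.0, 1.0), "K": (5.0, 1.0), "p": (8.0, 1.0)}

def poisson(lam):
    """Knuth's Poisson sampler (adequate for small lambda)."""
    limit, k, prod = math.exp(-lam), 0, random.random()
    while prod > limit:
        k += 1
        prod *= random.random()
    return k

def rho(j, x):
    mu, s = RESP[j]
    return LAM[j] * math.exp(-0.5 * ((x - mu) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

def w(j, x):
    return rho(j, x) / sum(rho(l, x) for l in RESP)

n_ev = 10000
W_mean = {j: 0.0 for j in RESP}
for _ in range(n_ev):
    # one event: Poisson multiplicities, then smeared measurements
    xs = [random.gauss(*RESP[j]) for j in RESP for _ in range(poisson(LAM[j]))]
    for j in RESP:
        W_mean[j] += sum(w(j, x) for x in xs) / n_ev

for j in RESP:
    print(j, round(W_mean[j], 2))  # statistically compatible with LAM[j]
```

Even with overlapping response peaks (and hence misidentification), the mean identities recover the Poisson means; the higher moments of W, in contrast, differ from those of N and require the full inversion.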
Particles are generated according to the Poisson distribution and are uncorrelated (except through the detector response), so the true values of the generated moments are the raw moments of the Poisson distribution,

⟨N_j^n⟩ = Σ_{m=0}^{n} S(n, m) λ_j^m,

where S(n, m) are the Stirling numbers of the second kind, and the generated cross-moments are defined as products of the pure ones. The simulated detector response consists of mean energy loss measurements in a Time Projection Chamber (TPC). For each particle, its mean energy loss was generated from a Gaussian distribution with parameters based on experimental data from Refs. [9,11], in two bins simulating the momentum dependence of the detector response. Testing several different momentum dependencies showed that the particle distribution between the bins does not affect the final results. An exemplary simulated dE/dx distribution in a single bin is shown in Fig. 2. The Idhim program is used to obtain the reconstructed moments of the considered particle types up to the fourth order. The statistical uncertainty of the reconstructed moments results from the uncertainty of the fitted distributions ρ_j(x) as well as from the ⟨W_1^{n_1} · … · W_k^{n_k}⟩ moment values. Both sources are correlated, so standard error propagation is complicated and inconvenient. Instead, the statistical uncertainty is obtained using the bootstrap method [12].
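The bootstrap estimate can be sketched in a few lines; the per-event quantity and all numbers below are toy stand-ins, whereas real use would resample the recorded events and redo the whole moment extraction on each resample:

```python
import random
import statistics

# Sketch of the bootstrap uncertainty estimate (Ref. [12]) used instead of
# analytic error propagation: resample the events with replacement, repeat
# the computation on each resample, and take the spread of the results as
# the statistical uncertainty. Here the "computation" is reduced to the
# second moment of a toy per-event quantity.
random.seed(7)

def second_moment(sample):
    return sum(v * v for v in sample) / len(sample)

events = [random.gauss(0.0, 1.0) for _ in range(5000)]  # toy per-event values

estimates = []
for _ in range(200):  # 200 bootstrap resamples of the original size
    resample = [random.choice(events) for _ in events]
    estimates.append(second_moment(resample))

value = second_moment(events)        # the measured moment
error = statistics.stdev(estimates)  # its bootstrap uncertainty
print(f"{value:.2f} +/- {error:.3f}")
```

Because every derived quantity is recomputed from scratch on each resample, the correlations between the fitted ρ_j(x) and the W-moments are automatically propagated into the spread.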
Reconstructed and generated moments as well as their ratio are shown in Fig. 3. The ratio is 1 within the statistical uncertainty for all considered values.
Another test was performed using 3 million p+p interactions at √s_NN = 17.3 GeV generated with the EPOS [13,14] model, with a detector acceptance containing two types of acceptance regions: dE/dx only, and combined ToF and dE/dx. An example of such a two-dimensional distribution is shown in Fig. 4. The shape of the 2D distribution and its parameters were based on a real data analysis in Ref. [15].
Again, in order to mimic the momentum dependence of the detector response, the acceptance was divided into several bins. Reconstructed and generated moments as well as their ratio are shown in Fig. 5. Again, the ratio is 1 within the statistical uncertainty for all considered values.
Both the procedure and its implementation function as expected. The difference between the generated and reconstructed first moments of N and W is negligible, but for the higher orders the differences can reach 70%. In order to accommodate different possible shapes of the ρ functions, they are delivered in binned form; a proper binning is therefore important to describe the functions' shapes. The identity method does not address other detector biases or the detector efficiency. Such biases should be addressed by the appropriate experimental tools (for examples and details see Refs. [16][17][18]).

Conclusion
In this paper we extend the identity method in two ways. Firstly, a strict procedure to obtain nth order moments of the multiplicity distribution for an arbitrary number of particle types is derived. Secondly, a software implementation of this procedure is presented. Provided the detector response is known, it computes moments of any order, including the first ones.
It is equally precise for both low and high mean multiplicities. Two tests were performed in order to validate the program. The first test, based on a simple fast-generator check, showed that the program works well in the absence of correlations between particles. The difference between the reconstructed and generated moments is at the level of the statistical uncertainty or below.
Fig. 5 Reconstructed (black circles) and generated (red squares) moments and their ratio. The ratio uncertainty values for moments 0003 and 0004, omitted from the plot for clarity, equal 1.00 ± 0.07 and 1.00 ± 0.55, respectively
The second test was performed on p+p interactions simulated with the EPOS model. It confirmed that correlations between particles do not affect the program's performance. It also showed that Idhim can easily be used in the case of a non-uniform detector acceptance which contains different detector types. As a last comment we would like to stress that a successful analysis of moments of identified particle distributions depends on an understanding of the detector response.
Possible flaws in the description of the ρ functions will propagate through the identity method to the final results. Moreover, the identity method does not compensate for limited detector efficiency. Thus, the ρ distributions and the mean W's have to be corrected for the limited, and often momentum-dependent, detector efficiency by other known methods.