Identification of the K*± resonance decay by topological cuts and multivariate discrimination methods

A study of the identification of the short-lived K*± resonance via its decay into a K0s and a charged pion has been carried out. A sample of minimum bias PYTHIA pp collision events has been simulated and fully reconstructed in the ALICE detector. Traditional methods based on multidimensional topological cuts and artificial neural networks with different architectures have been compared in detail. Examples of additional use of multivariate discrimination methods are also reported.


Introduction
The Large Hadron Collider (LHC) at CERN will produce beams of protons and heavy ions up to 14 TeV and 5.5 A TeV c.m. energy respectively. The ALICE experiment [1,2] is dedicated to the study of these ultrarelativistic pp and nucleus-nucleus collisions. Heavy-ion collisions at such energies allow the study of a hot and dense matter, with the possibility to observe several probes giving information on the first stage of the collision.
Among the important signals proposed to study the possible phase transition from nuclear matter to a plasma of quarks and gluons, in-medium modifications of meson resonances are a relevant observable [3], since they carry information on the interaction of such particles and/or their daughters with the fireball medium.
The study of short-lived resonances, with typical lifetimes in the order of a few fm/c, i.e. comparable to the expected lifetime of the hot and dense matter produced in such collisions, may also probe the role of the rescattering and regeneration processes between chemical and kinetic freeze-out. The observation of such resonances is critical both in pp and in heavy-ion collisions, because of the large background originating from the high multiplicity environment. Simulation studies of the reconstruction capabilities in the ALICE detector have been carried out for several years to date, although some improvement in the algorithms and performance may still be expected especially for the most challenging cases.
Here we discuss the possibility of reconstructing the two-step decay of the short-lived K*± resonance (with cτ = 4 fm and a branching ratio of 33%) into a charged pion (π±) and a K0s (decaying in turn into a π+π− pair with cτ = 2.68 cm and a branching ratio of 68.6%). For this study we used a sample of 2 × 10^5 pp minimum bias events generated with PYTHIA [4] at 900 GeV c.m. energy and fully reconstructed in ALICE. Such a set of events could be available during the LHC beam commissioning, according to the original schedule. A much larger set of events is expected at the full 14 TeV c.m. energy, for which detailed simulations are in progress, with the help of distributed GRID computing facilities. Even though a fine tuning of the reconstruction procedure may be necessary at the final LHC energy, most of the considerations and the results obtained at 900 GeV are indicative of what can be expected at higher energies.
K*± resonances were measured at √s = 200 GeV by the STAR Collaboration [5] in minimum bias pp interactions and in peripheral Au-Au collisions, by reconstruction procedures based on topological cuts taking into account the detector capabilities. For pp collisions, these allowed the identification at mid-rapidity of a sample of approximately 10^4 K*± with transverse momenta higher than 0.7 GeV/c.
Traditional methods based on topological cuts and neural network approaches were both employed in the present work, with the aim of exploiting their capabilities in a situation where the signal is expected to be much smaller than the combinatorial background arising from false associations of K0s and charged pion tracks. Additional tests were also carried out with multivariate discrimination methods, which are now widely available to the high energy physics community through widely distributed packages. Section 2 describes the strategy to reconstruct short-lived resonances in the ALICE environment, while Sects. 3 and 4 report the results obtained for the K*± with the traditional and neural network approaches, respectively. A comparison between alternative methods through the Toolkit for MultiVariate Analysis (TMVA) package is discussed in Sect. 5. Some conclusions are finally drawn in Sect. 6.

The ALICE environment and the reconstruction of short-lived resonances
ALICE is a general-purpose experiment at the CERN Large Hadron Collider, especially conceived for the study of heavy-ion physics. It is designed to track and identify the large number of particles (dN_ch/dy up to 8000) predicted in ultra-relativistic Pb-Pb collisions at 5.5 A TeV. The ALICE detector includes a central part (spanning the central rapidity region −0.9 < η < 0.9), inside a magnet which provides a weak solenoidal magnetic field (0.2-0.5 T), and a forward part (covering the large rapidity region). The central part has several detectors for tracking and particle identification: the inner tracking system (ITS), the time projection chamber (TPC), the transition radiation detector (TRD) and the time-of-flight (TOF). While all these detectors have a complete coverage in azimuth, the central region also has two additional detectors with partial azimuth coverage: the high momentum particle identification detector (HMPID) and the photon spectrometer (PHOS). An additional electromagnetic calorimeter (EMCal), presently under construction, will soon be included in the ALICE installation. In the forward rapidity region (−4 < η < −2.5) the muon spectrometer will detect single muons and muon pairs. Additional detectors are the zero degree calorimeter (ZDC), the photon multiplicity detector (PMD), the forward multiplicity detector (FMD) and the V0 and T0 detectors, for more specialized tasks.
The primary vertex in ALICE is reconstructed through the information provided by the two innermost pixel layers, with a resolution along the beam axis as good as a few µm in the case of Pb-Pb collisions and up to 40 µm for pp collisions. Since the lifetimes of short-lived resonances are in the order of 1-50 fm/c, their decay products originate in most cases (K*(892), Λ(1520), ρ, φ, ...) from the primary vertex, and may be considered as primary particles as far as the tracking is concerned. However, in the case of K*±, the daughter K0s has a cτ of 2.68 cm, so that a secondary vertex needs to be identified, with a proper association of the K0s to a primary pion (Fig. 1). In addition to primary and secondary vertex reconstruction capabilities, good identification of short-lived resonances also relies on additional features of the ALICE detector:
i) Tracking efficiency: tracking strategies in ALICE make use of the Kalman filter method through the ITS and the TPC detectors. Such strategies allow a good tracking performance, down to very low momenta, even for high multiplicity events.
ii) Momentum resolution: the association of ITS, TPC and TRD detectors results in a p_T resolution as good as 3% up to 100 GeV/c, and correspondingly better (about 0.7%) for momenta in the order of 1 GeV/c, which are relevant for the bulk of the resonance yields.
iii) Track impact parameter resolution: the information on the impact parameter is mainly driven by the ITS and TPC detectors, with a resolution on the transverse impact parameter expected to be smaller than 100 µm in the GeV/c region.
iv) Particle identification: charged particle identification (PID) is achieved at low momenta, from 0.2 GeV/c, by means of combined dE/dx information from the silicon layers of the ITS and from the TPC. At higher momenta, in the order of a few GeV/c, the TOF detector is able to provide an optimal identification. A Bayesian approach is used to combine the PID information associated with the different detectors.
Figure 1 shows the topology of the K0s and K*± decays. Two different strategies for the reconstruction of K0s or Λ (both usually denoted as a V0 decay vertex) have been developed in ALICE, one based on selection and coupling of secondary tracks after full tracking, the other trying to find V0 candidates during the tracking itself. In the first approach, the V0 finding procedure starts with the selection of secondary tracks: tracks whose impact parameter (b+, b−) with respect to the primary vertex is too small are eliminated. Then, each secondary track is combined with all the other secondary tracks having an opposite charge. A pair is rejected if the distance of closest approach (dca) between the two tracks is too large. Once the secondary vertex position is defined, only the vertices inside a given fiducial region of radius R are kept. The inner boundary of this fiducial region is limited by the expected particle density and the tracking precision. Finally, the V0 finding procedure checks whether the momentum of the V0 candidate points back to the primary vertex, through a cut on cos θ_p, the cosine of the pointing angle between the V0 momentum and the vector R connecting the primary vertex and the V0 vertex position.
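The sequence of topological cuts in the first V0 finding approach can be illustrated with a minimal Python sketch. It is not the ALICE reconstruction code: the dictionary keys, the function names and all numerical cut values are hypothetical placeholders (they are not the entries of Table 1), and the fiducial radius is assumed here to be measured in the transverse plane.

```python
import math

# Illustrative cut values only (hypothetical, not the entries of Table 1).
EXAMPLE_CUTS = {
    "min_b": 0.05,    # cm, minimum impact parameter of each daughter track
    "max_dca": 0.5,   # cm, maximum distance of closest approach of the pair
    "r_min": 0.5,     # cm, inner boundary of the fiducial region
    "r_max": 100.0,   # cm, outer boundary of the fiducial region
    "min_cos": 0.99,  # minimum cosine of the pointing angle
}


def cos_pointing_angle(primary_vertex, v0_vertex, v0_momentum):
    """Cosine of the angle between the V0 momentum and the flight vector R."""
    flight = [v - p for v, p in zip(v0_vertex, primary_vertex)]
    dot = sum(f * m for f, m in zip(flight, v0_momentum))
    norm = math.sqrt(sum(f * f for f in flight)) * math.sqrt(
        sum(m * m for m in v0_momentum)
    )
    return dot / norm


def passes_v0_cuts(cand, cuts=EXAMPLE_CUTS):
    """Apply the cuts on b+, b-, dca, R and cos(theta_p) in sequence."""
    # 1) Both daughters must be secondary tracks (impact parameter not too small).
    if abs(cand["b_pos"]) < cuts["min_b"] or abs(cand["b_neg"]) < cuts["min_b"]:
        return False
    # 2) The two daughters must come close enough to each other.
    if cand["dca"] > cuts["max_dca"]:
        return False
    # 3) The secondary vertex must lie inside the fiducial region
    #    (transverse radius: an assumption of this sketch).
    flight = [v - p for v, p in zip(cand["v0_vertex"], cand["primary_vertex"])]
    radius = math.hypot(flight[0], flight[1])
    if not cuts["r_min"] <= radius <= cuts["r_max"]:
        return False
    # 4) The V0 momentum must point back to the primary vertex.
    cos_theta = cos_pointing_angle(
        cand["primary_vertex"], cand["v0_vertex"], cand["v0_momentum"]
    )
    return cos_theta >= cuts["min_cos"]
```

Ordering the cheap track-level cuts before the vertex-level ones mirrors the description above and avoids computing the pointing angle for pairs that are rejected anyway.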
In order to reconstruct the K*± decay, a candidate K0s must be associated to a pion track originating from the primary vertex (i.e. with an impact parameter b_pion not exceeding a given value). The relative angle ϑ_rel between the K0s and the pion momenta, as well as the momenta themselves, may in principle also be used to improve the separation of true decays from the background. The Armenteros-Podolanski [6] variables α and q_t have also been considered, both for the symmetrical decay of the K0s and for the asymmetrical K*± → K0s + π± decay.
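As a reminder of how these kinematical quantities are built from the daughter momenta, the following Python sketch computes the invariant mass of a two-body candidate and the Armenteros-Podolanski variables in their usual definition (α from the longitudinal momenta of the two daughters along the parent direction, q_t as the transverse momentum of the positive daughter with respect to it). The function names and the tuple representation of momenta are assumptions of this sketch.

```python
import math

PION_MASS = 0.13957  # GeV/c^2, charged pion mass


def invariant_mass(p1, m1, p2, m2):
    """Invariant mass (GeV/c^2) of two particles with 3-momenta p1, p2 (GeV/c)."""
    e1 = math.sqrt(m1 * m1 + sum(c * c for c in p1))
    e2 = math.sqrt(m2 * m2 + sum(c * c for c in p2))
    p_sum_sq = sum((a + b) ** 2 for a, b in zip(p1, p2))
    mass_sq = (e1 + e2) ** 2 - p_sum_sq
    return math.sqrt(max(mass_sq, 0.0))


def armenteros(p_pos, p_neg):
    """Return (alpha, q_t) for a decay into a positive and a negative daughter."""
    p_tot = [a + b for a, b in zip(p_pos, p_neg)]
    p_tot_norm = math.sqrt(sum(c * c for c in p_tot))
    # Longitudinal momenta of the daughters along the parent direction.
    pl_pos = sum(a * b for a, b in zip(p_pos, p_tot)) / p_tot_norm
    pl_neg = sum(a * b for a, b in zip(p_neg, p_tot)) / p_tot_norm
    alpha = (pl_pos - pl_neg) / (pl_pos + pl_neg)
    # Transverse momentum of the positive daughter relative to the parent.
    p_pos_sq = sum(c * c for c in p_pos)
    q_t = math.sqrt(max(p_pos_sq - pl_pos * pl_pos, 0.0))
    return alpha, q_t
```

For a K0s decaying into two pions of equal mass the distribution in α is symmetric around zero, whereas for the K*± → K0s + π± decay the unequal daughter masses shift it, which is why the two cases are treated separately in the text.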

Study of the K*± decay into K0s and π± via topological cuts
To reconstruct the K*± resonance, different sets of standard cuts were tested for the selection of K0s (see Table 1): a set of loose cuts which retains most of the secondary vertices, although with a large amount of background still present, and two slightly different sets of tight cuts, which select a sample of K0s with a higher purity for further correlation studies. All sets are based on the use of the five variables dca, b+, b−, R, cos θ_p. We also considered possible cuts in the Armenteros-Podolanski plane defined by the variables α and q_t, but they did not appreciably improve the results. Table 1 shows the typical performance of such sets, in terms of cut efficiency (with respect to the number of findable K0s) and sample purity. The first two rows in Table 1 refer to the cuts already discussed in the PPR [2]. With the PPR tight cuts, a selection efficiency of 52.1% is obtained, with a sample purity of 97.9%. Slightly better results may be obtained with the set listed in the last row, which gives an efficiency of 54%, with a comparable sample purity, and a significance S/√(S + B) = 56 for 2 × 10^5 events. The invariant mass spectrum of the V0 candidates after applying such cuts is shown in Fig. 2. All the results here and in the following refer to the invariant mass window between 0.48 and 0.51 GeV/c^2, which defines a golden K0s. After selection of K0s, the K*± may be reconstructed by coupling the K0s to each primary pion (impact parameter smaller than 3 cm) in the event. Both charge signs were included in the analysis, so the results refer to the sum of K*+ and K*−. Without any additional cut, one gets the invariant mass spectrum shown in Fig. 3. After subtraction of the combinatorial background by the event mixing technique, as explained in the following, a signal-to-background ratio S/B = 0.059 was obtained in the mass region (0.8-1.0) GeV/c^2.
In principle, additional kinematical variables may be used to improve the S/B ratio, with a loss of efficiency. Among such variables, we tested the pion track impact parameter b_pion, the relative angle ϑ_rel between the K0s and the charged pion, the momentum p_res of the K*± and the Armenteros-Podolanski variables of the resonance, α_res and q_t,res. Application of such cuts slightly improves the S/B ratio up to 0.073, with a cut efficiency reduction down to 67%. However, the invariant mass spectrum is strongly deformed by such cuts, which reduce the available phase space outside the resonance peak. The extraction of the signal was then accomplished by evaluating the combinatorial background around the resonance peak, without imposing additional kinematical cuts. The event mixing technique (i.e. coupling a K0s from one event and a charged pion from another event) was used. To investigate the effect of the event properties on the event mixing, a preliminary study of the background was carried out, comparing the true background (defined by all those pairs which do not originate from a K* decay) and the combinatorial background evaluated by event mixing, for different selections of the event pairs. As a result, only events with similar vertex location (Δz ≤ 3 cm) and particle multiplicities (Δm ≤ 5) were included in the mixing procedure. A sufficiently large number of events were mixed, so as to ensure a negligible statistical error. The combinatorial spectrum was then normalized outside the resonance peak. Figure 4 shows the result of the subtraction of the combinatorial background from the signal spectrum, compared with the distribution of the true K0s-π± pairs. The significance for 2 × 10^5 events is S/√(S + B) = 7.6.
Fig. 3. Invariant mass of the K0s-π± pairs, after selection of K0s through standard cuts
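The event mixing and normalization steps described above can be sketched in Python as follows. The event representation (dictionaries with hypothetical keys z_vtx, mult, k0s, pions), the function names and the use of plain lists as histograms are assumptions made for illustration; only the similarity requirements Δz ≤ 3 cm and Δm ≤ 5 are taken from the text.

```python
def mixed_event_pairs(events, max_dz=3.0, max_dmult=5):
    """Yield (K0s, pion) pairs taken from two different but similar events."""
    for i, ev_a in enumerate(events):
        for j, ev_b in enumerate(events):
            if i == j:
                continue  # never pair particles from the same event
            if abs(ev_a["z_vtx"] - ev_b["z_vtx"]) > max_dz:
                continue  # primary vertices too far apart along the beam
            if abs(ev_a["mult"] - ev_b["mult"]) > max_dmult:
                continue  # multiplicities too different
            for k0s in ev_a["k0s"]:
                for pion in ev_b["pions"]:
                    yield k0s, pion


def subtract_background(same_hist, mixed_hist, sideband_bins):
    """Scale the mixed-event spectrum to the same-event one outside the
    resonance peak (sideband bins) and subtract it bin by bin."""
    same_sb = sum(same_hist[i] for i in sideband_bins)
    mixed_sb = sum(mixed_hist[i] for i in sideband_bins)
    scale = same_sb / mixed_sb
    return [s - scale * m for s, m in zip(same_hist, mixed_hist)]
```

Each mixed pair would then be filled into the invariant mass histogram exactly as a same-event pair. Because the normalization relies on the sidebands, any residual signal or correlated background there propagates directly into the extracted yield.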
The yield and peak position of the extracted signal are in general agreement with the corresponding quantities for the true pairs. There is however some overestimation of the yield with respect to the true pairs (1090 found pairs against 935 true pairs), especially on the left side of the invariant mass peak. This could result from imperfect background estimation. Additional information on the p_T dependence of the signal, as well as any consideration of the possibility to measure with high precision the position and the width of the mass peak, which could be important to understand in-medium modifications of the resonance properties, would require a much larger set of events, such as those expected at the full LHC energy, and will be the object of further investigations.
Fig. 4. Comparison between true (crosses) and found (histogram) K0s-π± pairs, after subtraction of the combinatorial background by the event mixing technique

Testing a neural network approach to the problem
The optimization of traditional kinematical and topological cuts in a multidimensional space is always a long process, which is not guaranteed to lead to optimal results, since the number of variables may be rather high and correlations between them are often expected. In such cases, alternative methods may be tested which try to reduce the number of variables, such as principal component analysis (PCA) [7] or linear discriminant analysis (LDA) [8]. Another widely used approach in such situations is given by neural networks [9], which have been used in a variety of applications concerning particle and rare-decay identification [10]. In this paper we tested different network topologies for the recognition of the K0s → π+π− and K*± → K0s π± decays, comparing the results to those obtained with classical cut selections. A further series of tests, aimed at comparing other multivariate discrimination methods, is discussed in the next section.
A feedforward multilayered neural network consists of a set of input neurons, one or more hidden layers of neurons, a set of output neurons, and synapses connecting each layer to the subsequent layer. The synapses connect each neuron a i in the first (a) layer to each neuron b j in the hidden (b) layer and each b j neuron to the output.
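The structure just described can be made concrete with a short Python sketch of the forward pass through a network with one hidden layer. The sigmoid activation, the bias terms, the range of the random initial weights and the function names are assumptions of this sketch; the text does not specify them.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(n_inputs, n_hidden, seed=0):
    """Randomly initialise the synapse weights; the last entry of each
    weight list is a bias term (an assumption of this sketch)."""
    rng = random.Random(seed)
    hidden = [
        [rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
        for _ in range(n_hidden)
    ]
    output = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return hidden, output


def forward(network, inputs):
    """Propagate the inputs a_i to the hidden neurons b_j and then to the
    single output neuron; the result lies between 0 and 1."""
    hidden_weights, output_weights = network
    hidden = [
        sigmoid(sum(w * x for w, x in zip(ws[:-1], inputs)) + ws[-1])
        for ws in hidden_weights
    ]
    return sigmoid(
        sum(w * h for w, h in zip(output_weights[:-1], hidden))
        + output_weights[-1]
    )
```

Calling `init_network(9, 9)` gives a 9-9-1 topology; training would then adjust the weights so that the output accumulates near 1 for true decays and near 0 for background.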
Several network architectures were tested. The inputs to the network were chosen among the kinematical parameters defined above. In a first approach, a neural network was trained to recognize the K0s decay, using as input neurons a number of variables between 5 (dca, R, b+, b−, cos θ_p) and 9 (the above variables plus the invariant mass M_K0s, the Armenteros-Podolanski variables α, q_t, and the K0s momentum, p_K0s). The training phase was carried out on a sample of 4000 K0s candidates (invariant mass between 0.48 and 0.51 GeV/c^2), containing true and fake V0's in the correct ratio, and randomly distributed in the sample. Network weights were randomly initialized, and the updating process was stopped after N_epochs iterations, usually when the errors, either in the learning or in the testing samples, no longer changed appreciably. Different tests were made, by varying the number of layers and of neurons in each layer (no hidden layers, one and two hidden layers), the learning method used by the network and the number of epochs at which to stop the learning step.
As an example, Table 2 shows a comparison between different learning methods used in a 9-9-1 network (9 input neurons, 9 neurons in the hidden layer and 1 output neuron), as they are usually available in standard analysis packages. From this and additional tests, no strong dependence of the results on the network parameters was seen. Figure 5 shows a plot of the network output, for the background and the signal (K0s), obtained with the stochastic minimization learning method applied to a 9-9-1 network topology. Depending on the choice of the cut on the neural network output, one may select a sample with higher purity (defined as the ratio between the number of found particles that are true, in this case K0s, and the total number of found particles), with some efficiency loss, as shown in Fig. 6. Also shown for comparison are the results obtained from two sets of standard cuts.
Fig. 5. Output from the neural network for the recognition of K0s
Some improvement in the significance S/√(S + B) was also observed for the neural network strategy, from the value 56 obtained with the standard cuts to values ranging from 59 to 61, depending on the choice of the cut. Figure 7 shows the invariant mass spectrum of the π+π− pairs, for different values of the neural network cut. Comparable results were obtained with a simplified network architecture, with only 5 input neurons, corresponding to the main kinematical variables. Setting up the optimal conditions for a neural network implies a long and time-consuming process, since several possibilities may be exploited. This means that the results shown here are not necessarily the best which one could obtain from a neural network approach; however, since the results do not strongly depend on the different choices explored, one can be confident that the best performance is not far from the one actually obtained. Along this line, one can conclude that the use of neural networks gives in this case comparable, or slightly better, results than traditional methods based on topological cuts. This is in qualitative agreement with what was found, e.g., in [11] in a different context, namely 2 GeV/nucleon Ni + Cu interactions.
To reconstruct the K*± resonance, several strategies making use of neural networks may be employed. One possibility is to use a mixed approach, where K0s candidates are first identified by classical methods (or by a neural network itself, if this results in a better performance) and an additional neural network is implemented using only the variables describing the resonance. In a global approach, a single neural network is built with all the possible inputs which take into account both the decay of K*± into K0s and π± and of K0s into π+π−. We tested both possibilities with similar results. In the following the strategy making use of a global network topology is discussed. However, to remove a large part of the background, a preselection of K0s was made before starting the learning process. After testing several network architectures, a 16-16-1 network with 16 input neurons, 16 neurons in a hidden layer and one output neuron (Fig. 8) was employed.
The network was trained on a sample of about 460 resonance decays, mixed with a sample of false correlations between K0s and π±. With the expected ratio between true and false decays, in the order of 0.05, the network was not able to recognize the true decays in the testing sample. The ratio between the two samples was then varied, and better results were obtained with a true/fake ratio between 1:1 and 1:2 in the learning sample. Figure 9 shows the output from the neural network, while Fig. 10 shows the correlation between sample purity and efficiency in the testing sample, compared to the results obtained from the traditional method, for different cuts in the resonance parameters (white squares). A qualitative agreement between the two methods is found, again with some improvement in the neural network approach.
Fig. 8. Architecture of the neural network implemented for the K*± decay recognition. For the sake of simplicity, only a few synapses between the input layer and the hidden layer are shown
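The rebalancing of the learning sample can be sketched in Python as below. This is only an illustration of the idea of downsampling the fake pairs to a chosen true/fake ratio; the function name, the labels (1 for true decays, 0 for fake pairs) and the random downsampling itself are assumptions, since the text does not say how the ratio was varied.

```python
import random


def build_learning_sample(true_decays, fake_pairs, fakes_per_true=1.0, seed=0):
    """Return a shuffled list of (candidate, label) with a chosen true/fake
    ratio, e.g. fakes_per_true=1.0 for 1:1 or 2.0 for 1:2."""
    rng = random.Random(seed)
    n_fake = min(len(fake_pairs), int(round(fakes_per_true * len(true_decays))))
    sample = [(cand, 1) for cand in true_decays]
    # Randomly keep only as many fake pairs as the requested ratio allows.
    sample += [(cand, 0) for cand in rng.sample(fake_pairs, n_fake)]
    rng.shuffle(sample)  # true and fake candidates randomly distributed
    return sample
```

A network trained on such a balanced sample no longer sees the natural signal fraction, so purity and efficiency still have to be evaluated on a testing sample with the expected true/fake ratio.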
Using the selection operated by the neural network, with a cut ≥ 0.3, the invariant mass spectrum of the K0s-π± pairs was evaluated and compared to that obtained applying the same neural network cut to a sample of mixed events (Fig. 11). After subtracting the contribution of the combinatorial spectrum by the event mixing technique, the result is shown in Fig. 12, compared to the distribution of true pairs. A nice agreement between the
Fig. 9. Output from the neural network for the recognition of the K*± resonance
Fig. 10. Sample purity vs. efficiency for the neural network recognizing the K*± resonance. The empty squares mark the results obtained with standard cuts on the resonance parameters

Comparing multivariate discrimination methods by the TMVA package
Owing to the classification problems frequently encountered in high energy physics (HEP), specific packages have recently been made available to HEP users to apply and evaluate multivariate classification techniques. One such package is the Toolkit for MultiVariate Analysis (TMVA), which is available within the ROOT-integrated environment [12]. The package includes a variety of methods, such as linear and non-linear discriminant analysis, multi-dimensional likelihood estimation, different implementations of artificial neural networks, k-nearest neighbour classifiers, and many others. Object-oriented implementations in C++/ROOT of each individual method are available to the user, in order to provide an easy way to compare different methods. Training, testing and evaluation are provided by suitable scripts which act on the same data set with the same prescriptions.
While the optimization of each classifier requires a careful tuning of the parameters for the needs of the specific problem, a reasonably fast check of the capabilities of different methods may be achieved in a short time even with the default parameters.
For this reason, we tested this possibility by comparing some of the widely used methods, employing the same data set and input variables as discussed in the previous section. The exercise was only intended as an example of what can be done in a specific case, with the help of a user-friendly toolkit, without going into the details of the specific optimization process.
Similarly to what was discussed for the neural network approach, the output from each individual classifier is by convention a variable which accumulates at larger values for the signal and at smaller values for the background. Hence, by a proper cut on this output, the user may select a sample with higher purity although with a reduced efficiency. As an example, Fig. 13a-d shows the (normalized) output signal distributions in our test sample, for the following methods: the boosted decision trees (BDT), the multilayer perceptron (MLP), the probability density estimator range search (PDERS) and the function discriminant analysis (FDA). For additional information on the meaning and use of such methods and the proper references, the reader is referred to the TMVA Users Guide [12]. The shapes of the signal and background samples are different in each of these methods, and in some cases (see for instance the FDA result with the default parameters in Fig. 13) a poor discrimination may be expected between signal and background.
Cutting on each of these classifier outputs, a typical plot of the background rejection factor versus the signal efficiency may be obtained, as shown in Fig. 14, which again compares the result from BDT, MLP, PDERS and FDA methods. As expected, the performances of the various methods may be largely different. However, it must be stressed that a proper comparison between them may be carried out only after a detailed tuning of all the relevant parameters, which is beyond the aims of the present work.
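How such a curve arises from the classifier outputs can be shown with a few lines of Python. This is a generic sketch, not TMVA code: the function name is hypothetical, and the background rejection is assumed here to be defined as the fraction of background candidates falling below the cut (one minus the background efficiency).

```python
def rejection_vs_efficiency(signal_scores, background_scores, cuts):
    """For each cut on the classifier output, return
    (cut, signal efficiency, background rejection)."""
    points = []
    for cut in cuts:
        # Signal candidates kept by requiring output >= cut.
        efficiency = sum(s >= cut for s in signal_scores) / len(signal_scores)
        # Background candidates removed by the same requirement.
        rejection = sum(b < cut for b in background_scores) / len(
            background_scores
        )
        points.append((cut, efficiency, rejection))
    return points
```

Scanning the cut from the lowest to the highest output value traces the curve from (efficiency 1, rejection 0) to (efficiency 0, rejection 1); a better classifier stays closer to the corner where both are 1.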
Suitable macros available inside the TMVA package allow the user to plot several quantities of interest as a function of the output classifier cut (the signal efficiency, the purity, their product, the significance, ...), in order to evaluate the various methods. For our set of simulated data, an overall comparison of the signal efficiency versus purity is reported in Table 3 and Fig. 15 for several methods contained in the TMVA package. To allow a comparison, the value of the cut which maximizes the significance was chosen for each individual method. As expected, most of the points show a rough anticorrelation between efficiency and purity, so that one may choose the particular method which best fits the user's requirements. Boosted decision trees and the multilayer perceptron are probably the best suited methods in this case to increase the signal purity, although at the expense of the efficiency. Final statements about the superiority of such methods would however require a detailed tuning of all methods, which is a rather time-consuming process.
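The choice of the working point can be illustrated with a short Python sketch that scans the cut on a classifier output and keeps the value maximizing the significance, taken here as S/√(S + B). The function name and the returned dictionary are assumptions of this sketch, and the signal and background lists are assumed to be in the expected true/fake proportion, since the significance depends on the absolute yields.

```python
import math


def best_cut(signal_scores, background_scores, cuts):
    """Return the cut maximising S/sqrt(S+B), with the corresponding
    signal efficiency and sample purity."""
    best = None
    for cut in cuts:
        s = sum(x >= cut for x in signal_scores)
        b = sum(x >= cut for x in background_scores)
        if s + b == 0:
            continue  # nothing selected, significance undefined
        significance = s / math.sqrt(s + b)
        if best is None or significance > best["significance"]:
            best = {
                "cut": cut,
                "significance": significance,
                "efficiency": s / len(signal_scores),
                "purity": s / (s + b),
            }
    return best
```

Comparing methods at their own best cut, as done in the text, puts each classifier at a different point of its efficiency-purity curve, which is one reason why the resulting points show a rough anticorrelation.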

Conclusions
A study of the identification of the short-lived K*± resonance in pp collisions detected with ALICE at 900 GeV has been carried out. This study was intended as a preliminary investigation to understand to what extent the two-step decays of short-lived resonances are observable in proton-proton events, where the charged multiplicity is not as high as in Pb-Pb events at the nominal LHC energy. To test the role of different reconstruction strategies, traditional methods based on topological cuts in a multidimensional space, neural network approaches and alternative classification schemes were tested. None of these strategies guarantees that the achieved results are optimized, since no a priori criteria exist to arrive at the optimal choice of the selection cuts or network topology. However, the results may be considered as typical of the performance which can be reached by such approaches. It was demonstrated that even with a limited number of events, in the order of a few 10^5, the yield of the K*± signal may be reliably extracted from the invariant mass spectrum. A larger number of events, in the order of 10^6-10^7, will then allow a complete study of this resonance decay, with the possibility to measure the transverse momentum spectrum in a wide p_T range. As far as the selection strategy is concerned, neural network recognition of the K0s and of the K*± resonance was seen to give comparable, or even slightly better, results than that based on traditional methods. Additional studies based on alternative classification strategies, such as those briefly presented at the end of the paper, could be useful in the case of the very high-multiplicity events expected in central Pb-Pb collisions at 5.5 A TeV.
Fig. 13. Classifier normalized output distributions for signal and background events from the test sample containing K*± resonance data