Principal Component Analysis of collective flow in Relativistic Heavy-Ion Collisions

In this paper, we implement Principal Component Analysis (PCA) to study the single particle distributions generated from thousands of {\tt VISH2+1} hydrodynamic simulations with an aim to explore if a machine could directly discover flow from the huge amount of data without explicit instructions from human-beings. We found that the obtained PCA eigenvectors are similar to but not identical with the traditional Fourier bases. Correspondingly, the PCA defined flow harmonics $v_n^\prime$ are also similar to the traditional $v_n$ for $n=2$ and 3, but largely deviated from the Fourier ones for $n\geq 4$. A further study on the symmetric cumulants and the Pearson coefficients indicates that mode-coupling effects are reduced for these flow harmonics defined by PCA.


Introduction
Collective flow is one of the most important observables in relativistic heavy-ion collisions, which provides valuable information on the initial state fluctuations, final state correlations and the QGP properties. In the past decades, various flow observables have been extensively measured in experiments and studied in theory [1][2][3][4][5][6]. In general, these flow observables are defined based on the Fourier decomposition. For example, the integrated flow harmonics are defined as: where V n = v n e inΨ n is the n-th order flow-vector, v n is the nth order flow harmonics and Ψ n is the corresponding event plane angle. In general, the first coefficient, v 1 , is called the directed flow, the second coefficient, v 2 , is called the elliptic flow and the third coefficient v 3 , is called the triangular flow. For n ≥ 3, v n is also referred as the higher order flow harmonics. In spite of the success of the flow measurements and the hydrodynamic descriptions, one essential question is why the Fourier expansion is a natural way to analyze the flow data. In this paper, we will address these questions with one of the machine learning techniques, called the Principal Component Analysis (PCA). In more details, we will investigate if a machine could directly discover flow from the huge amount of data of the relativistic fluid systems without explicit instructions from human beings.
Email address: Huichaosong@pku.edu.cn (Huichao Song) PCA is one of the unsupervised algorithms of machine learning [7] based on the Singular Value Decomposition (SVD) that diagonalize a random matrix with two orthogonal matrices. Compared with other deep learning algorithms, the advantage of PCA lies in its simple and elegant mathematical formulation, which is understandable and traceable to human beings, and is able to reveal the main structure of data in a quite transparent way.
Due to its strong power in data mining, PCA has been implemented to various research area of physics [8][9][10][11][12][13]. In molecular dynamics, PCA has been utilized to distinguish break junction trajectories of single molecules [8], which is time efficient and can transfer to a wide range of multivariate data sets. In the field of quantum mechanics, the quantum version of PCA was applied to study quantum coherence among different copies of the system [9], which are exponentially faster than any existing algorithm. In condensed matter physics, PCA has been implemented to study the phase transition in Ising model [11], which found that eigenvectors of PCA can aid in the definition of the order parameter, as well as provide reasonable predictions for the critical temperature without any prior knowledge. Besides, PCA is a widely used tool in engineering for model reduction to make computations more efficient [14].
In relativistic heavy-ion collisions, PCA has been implemented to study the event-by-event flow fluctuations, using the 2-particle correlations with the Fourier expansion [13,[15][16][17][18]. Compared with the traditional method, PCA explores all the information contained in the 2-particle correlations, which reveals the substructures in flow fluctuations [13,15,16]. It was found that the leading components of PCA correspond to the traditional flow harmonics and the sub-leading components evaluate the breakdown of the flow factorization at different p t or η bins. Besides, PCA has also been used to study the non-linear mode cou-pling between different flow harmonics [17], which helps to discover some hidden mode-mixing patterns. Recently, the CMS Collaboration further implemented PCA to analyze 2-particle correlation in Pb-Pb collisions at √ s NN = 2.76 TeV and p-Pb collisions at √ s NN = 5.02 TeV [18], showing the potential of largely implementing such machine learning technique to realistic data in relativistic heavy ion collisions. These early PCA investigations on flow [13,[15][16][17][18] are all based on the preprocessed data with the Fourier expansion, which still belong to the category of traditional flow analysis. In this paper, we will directly apply PCA to study the single particle distributions from hydrodynamic simulations without any priori Fourier transformation. We aim to explore if PCA could discover flow with its own bases.
This paper is organized as follows. Sec. II introduces relativistic hydrodynamics, principal component analysis (PCA) and the corresponding flow analysis. Sec. III shows and discusses the flow results from PCA and compares them with the ones from traditional Fourier expansion. Sec. IV summarizes and concludes the paper.

VISH2+1 hydrodynamics
In this paper, we implement VISH2+1 [19][20][21][22] to generate the final particle distributions for the PCA analysis. VISH2+1 [19][20][21][22] is a 2+1-dimensional viscous hydrodynamic code to simulate the expansion of the QGP fireballs, which solves the transport equations for the energymomentum tenor T µν and the second order Israel-Stewart equations for the shear stress tensor π µν and bulk pressure Π with an equation of state s95-PCE [23,24] as an input. The initial profiles for VISH2+1 are provided by TRENTo, a parameterized initial condition model that generates eventby-event fluctuating entropy profiles with several tunable parameters [24,25]. These parameters, together with the temperature dependent specific shear viscosity and bulk viscosity, hydrodynamic starting time (τ 0 = 0.6 fm/c) and decoupling /switching temperature (T sw = 148 MeV) have been fixed through fitting all charged and identified particle yields, the mean transverse momenta and the integrated flow harmonics in 2.76 A TeV Pb+Pb collisions using the Bayesian statistics [24], which also nicely described various flow data at the LHC [26]. In practice, the transition from the hydrodynamic fluid to the emitted hadrons on the freeze-out surface is realized by a Monte-Carlo event generator iss based on the Cooper-Fryer formula [27]: where f (x, p) is the distribution function of particles, g is the degeneracy factor, and d 3 σ µ is the volume element on the freeze-out hypersurface. For the following PCA analysis, as well as for the traditional flow analysis in comparison, we run the eventby-event VISH2+1 simulations with 12000 fluctuating initial conditions generated from TRENTo for 2.76 A TeV Pb-Pb collisions at 0%-10%,10%-20%, 20%-30%, 30%-40%, 40%-50% and 50%-60% centrality bins. The default iss sampling for each VISH2+1 simulation is 1000 events, which corresponds to the main results presented in Sec. III. In the appendix of this paper, we also investigate the ability of PCA to distinguish signal and noise. We thus implement 25, 100 and 500 iss samplings for each VISH2+1 simulation for such investigation. Note that the default 1000 iss sampling used in this paper has already dramatically suppressed the statistical fluctuations from noises for the final hadron distributions.
With the final particle distributions obtained from hydrodynamic simulations, various flow observables can be calculated based on the traditional flow harmonics defined by the Fourier decomposition in Eq.(1). In Sec.III, the traditional flow results will be served as the comparison to the PCA results.

Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a statistical method to analyze complicated data, which aims to transform a set of correlated variables into various independent variables via orthogonal transformations. These obtained main eigenvectors, associated with large or unnegligible singular values, are also called the principal components, which reveal the most representative characteristics of the data. In practice, PCA implements the Singular Value Decomposition (SVD) to a real matrix, which obtains a diagonal matrix with the diagonal elements arranged in a descending order. Therefore, one needs to first construct a related matrix before the following PCA and SVD analysis. Since this paper focuses on investigating the integrated flow with PCA, such final state matrix M f is constructed from the angular distribution of all charged hadrons dN/dϕ (|y| < 1.0) (obtained from Eq.(2)) of N = 2000 independent events in each centrality bin, using VISH2+1 simulations with TRENTo initial conditions. In more details, we divide the azimuthal angle [−π, π] into m = 50 bins and count the number of particles in each bin. For the j th bin in event (i), the number of particles is denoted as dN/dφ (i) j , which is also the i th row and j th column of the matrix M f 1 . Then, we apply SVD to the final state matrix M f with the size N × m (Here, N = 2000 and m = 50), which gives where X and Z are two orthogonal matrices with the size of N × N and m × m, respectively. Σ is a diagonal matrix with diagonal elements (singular values) arranged in the descending order σ 1 > σ 2 > σ 3 · · · > 0. With such matrix multiplication, the i th row of matrix M f , denoted as dN/dϕ (i) , can be expressed by the linear combination of the eigenvectors z j (the j th row of matrix where (i) = 1, 2, ..., N, represents the index of the event, m is the number of angular bins of the inputting events.ṽ (i) j is the corresponding coefficient of z j for the i th event. In the spirit of PCA, one only focuses on the most important components, so there is a cut at the indices k in the last approximation of Eq.(4). In Sec. III, we will show that k = 12 is a proper truncation for the integrated flow analysis, and the shape of the bases or eigenvectors z j ( j = 1, ..., k) is similar to but not identical with the Fourier transformation bases cos(nϕ) and sin(nϕ) (n = 1, ..., 6) used in the traditional method. Correspondingly,ṽ (i) j ( j = 1, ..., k) is identified as the real or imaginary part of the flow harmonics for event (i), and the singular values σ j are associated with the corresponding event averaged flow harmonics at different orders. For more details, please also refer to Sec. III.

Results
In this section, we implement PCA to analyze the single particle distributions dN/dϕ from hydrodynamics simulations in Pb+Pb collisions at √ s NN = 2.76 A TeV. Firstly, we focus on the singular values, eigenvectors as well as the associated coefficients of PCA and explore if such unsupervised learning could discover flow with its own bases.
In practice, we run 2000 event-by-event VISH2+1 hydrodynamic simulations with TRENTo initial conditions to generate the dN/dϕ distributions for 10%-20% Pb+Pb collisions at √ s NN = 2.76 A TeV. With these dN/dϕ distributions, we construct the final state matrix M f and then implement SVD to M f as described in Sec. II. Fig. 1 shows these obtained first 12 eigenvectors z j ( j = 1, 2, ..., 12) and  As introduced in Sec. II, these eigenvectors contain the most representative information on correlations among final particles. Fig. 1 shows that the 1st and 2nd eigenvectors from PCA are similar to the Fourier decomposition bases sin(2ϕ) and cos(2ϕ), and the 3rd and 4th components are similar to sin(3ϕ) and cos(3ϕ), etc. Meanwhile, Fig. 1 (b) shows that singular values σ j ( j = 1, 2, ..., 12) are arranged in pairs. These results indicate that each pair of the singular values may associate with the real and imaginary parts of the event averaged flow vectors at different orders. Therefore, we define the event averaged flow harmonics of PCA with these paired singular values, as outlined in the the second column of Table 1. The values of these PCA flow at different order are compared with the traditional flow harmonics from the Fourier expansion in Table 1, which are close, but not exactly the same values for n ≤ 6. As explained in Sec II, one could also read the eventby-event flow harmonics from the results of PCA. In more details, such PCA flow harmonics for event (i) is associated with these coefficientsṽ (i) j , j = 1...k in Eq. (4). 2 Each eigenvector is automatically normalized with ||z j || 2 2 = m i=1 (z j ) 2 i = 1 (m = 50), due to the orthogonality of the eigenvector matrix Z. Therefore, we define the event-by-event flow harmonics v n with magnitudes projected onto PCA bases, similar to the event averaged ones defined in Table 1. For example, v 2 = m 2 ṽ 2 1 +ṽ 2 2 and v 3 = m 2 ṽ 2 3 +ṽ 2 4 (m = 50), etc. Fig. 2 compares v n from PCA and v n from the traditional Fourier expansion at different orders. For the event-byevent elliptic flow v 2 and v 2 and triangular flow v 3 and v 3 , the definitions from PCA and that from Fourier expansion are highly agree with each other, which mostly fall on the diagonal lines. For these higher order flow harmonics with n ≥ 4, these PCA results are largely deviated from the traditional Fourier ones. We also noticed that the first two PCA eigenvector z 1 and z 2 for v 2 are similar to but not identical with the Fourier bases sin(2ϕ) and cos(2ϕ) with n = 2, which contain the contributions from sin(4ϕ) and cos(4ϕ). Similarly, the PCA eigenvectors z 3 and z 4 also contain the contributions from other Fourier bases. Such mode mixing in the PCA eigenvectors leads to the large deviations between v 4 and v 4 , as well as between v 5 and v 5 , etc.
To evaluate the correlations between different PCA flow harmonics v m and v n , we calculate the symmetric cumulants as once defined for traditional flow harmonics [28][29][30]: Correspondingly, the traditional symmetric cumulants S C v (m, n) just replace v m and v n with v m and v n from the Fourier expansion. FIG. 3 compares the symmetric cumulants S C v (m, n) from PCA and S C v (m, n) from Fourier expansion, for the event-by-event VISH2+1 simulations in 2.76 A TeV Pb+Pb collisions at various centrality bins. One finds that, except for S C v (2, 3), almost all PCA symmetric cumulants S C v (m, n) reduce significantly compared to the traditional ones. Although v 4 from PCA largely deviated from the traditional v 4 from the Fourier expansion, the obtained S C v (2, 4) shows a significant suppression, which contradicts to the long believed idea that the nonlinear hydrodynamics evolution strongly couples v 2 2 to v 4 , leading to an obvious positive correlations between v 2 and v 4 obtained from Fourier expansion. Similarly, the non-linear mode coupling between v 2 and v 5 , v 3 and v 5 and v 3 and v 4 for these PCA defined flow harmonics also decrease, which results in the reduced symmetric cumulants S C v (2, 5), S C v (3, 5) and S C v (3, 4) correspondingly.
To evaluate the correlations between the initial and final state fluctuations, we use the Pearson coefficients r(v n , ε m ) 0.5 0.0 0.5 1.0 and r(v n , ε m ) to characterize the linearity between the PCA flow harmonics v n and the initial eccentricities ε m , as defined as the following: Here, ε m is the traditional eccentricities defined by Eq. (A.1). In Appendix A, we will demonstrate that, with a properly chosen smoothing procedure, the event-by-event eccentricities ε m from PCA is highly similar to ε m from the traditional method. We thus use ε m in the Pearson coefficient definition r(v n , ε m ) for PCA. Meanwhile, we can also calculate the Pearson coefficient r(v n , ε m ) for the traditional flow with Fourier expansion, which just replaces the flow harmonics v n in Eq. (6) by v n . According to the definition, the Pearson coefficient falls in the range [−1, 1], with r > 0 implying a positive correlation, and r < 0 implying a negative correlation. Fig. 4 plots the Pearson coefficients r(v n , ε m ) from PCA and r(v n , ε m ) from the Fourier expansion, for VISH2+1 simulated Pb+Pb collisions at various centralities. With these Pearson coefficients, we focus on evaluating if the PCA defined flow harmonics reduce or increase the correlations with the corresponding initial eccentricities. As shown in Fig. 3, the event-by-event flow harmonics v 2 or v 3 from PCA are approximately equal to the Fourier ones v 2 or v 3 . As a result, these Pearson coefficients involved with these two flow harmonics r(v 2 , ε m ) and r(v 3 , ε m ) are almost overlap with the Fourier ones r(v 2 , ε m ) and r(v 3 , ε m ) as shown by these upper panels in the first two rows. Meanwhile, these diagonal Pearson coefficients r(v 2 , ε 2 ) or r(v 2 , ε 2 ) and r(v 3 , ε 3 ) or r(v 3 , ε 3 ) are much larger than other ones, which confirms the early conclusion that the elliptic flow and triangular flow are mainly influenced by the initial eccentricity ε 2 and ε 3 with the approximate linear relationship v 2 ∼ ε 2 (v 2 ∼ ε 2 ) and v 3 ∼ ε 3 (v 3 ∼ ε 3 ) [31,32].
Although v 4 from PCA is largely deviated from the traditional v 4 in Fig. 3, such PCA definition largely enhances correlations between ε 4 , and also largely reduces the correlations between ε 2 . For example, at 20-30% centrality, the Pearson coefficients r(v 4 , ε 4 ) is only 70% of the r(v 4 , ε 4 ), while r(v 4 , ε 2 ) is 200% larger than r(v 4 , ε 2 ). Traditionally, it is generally believed that v 4 is largely influenced by ε 2 2 through the non-linear evolution of hydrodynamics. Our PCA analysis showed that such mode mixing could be deduced through a redefined PCA bases. Meanwhile, such PCA defined bases also significantly reduce the mode mixing for other higher order flow harmonics such as between v 5 and ε 2 , v 5 and ε 3 , etc.

Conclusions
In this paper, we implemented Principal Components Analysis (PCA) to study the single particle distributions of thousands of events generated from VISH2+1 hydrodynamic simulations. Compared with the early PCA investigations on flow that imposed the Fourier transformation in the input data [13,[15][16][17][18], we focused on analyzing the raw data of hydrodynamics and exploring if a machine could directly discover flow from the huge amount of data without explicit instructions from human-beings. We found that the PCA eigenvectors are similar to but not identical with the traditional Fourier basis. Correspondingly, the obtained flow harmonics v n from PCA are also similar to the traditional v n for n = 2 and 3, but largely deviate from the Fourier ones for n ≥ 4. With these PCA flow harmonics, we found that, except for S C v (2, 3), almost all other symmetric cumulants S C v (m, n) from PCA decrease significantly compared to the traditional S C v (m, n). Meanwhile, some certain Pearson coefficients r(v n , ε m ) that evaluate the linearity between the PCA flow harmonics and the initial eccentricities are obviously enhanced (especially for n ≥ 4), together with an corresponding reduction of the off-diagonal elements.
These results indicate that PCA has the ability to discover flow with its own basis, which also reduce the related mode coupling effects, when compared with traditional flow analysis based on the Fourier expansion. We emphasis that these eigenvectors from PCA are modeled to be orthogonal and uncorrelated to each other. As a result, most of the symmetric cumulants S C v (m, n) from PCA that evaluate the correlations between different flow harmonics are naturally reduced compared with the traditional ones. Besides, the PCA flow harmonics v n presents an enhanced linear relationship to the corresponding eccentricities ε n , especially for n = 4. These results seem contradictory to the long believed idea that hydrodynamics evolution are highly non-linear, which leads to strong mode-coupling between different flow harmonics. Our PCA investigation has shown that such mode coupling effects could be reduced with new-defined bases for the flow analysis. With such finding, the non-linearity of the relativistic hydrodynamic systems created in heavy ion collisions should be re-evaluated, which we would like to further explore it with such PCA method in the near future. also gratefully acknowledge the extensive computing resources provided by the Super-computing Center of Chinese Academy of Science (SCCAS), Tianhe-1A from the National Supercomputing Center in Tianjin, China and the High-performance Computing Platform of Peking University.
Appendix A. PCA for initial profiles with smoothing procedure In this appendix, we focus on analyzing the initial state fluctuations using the PCA method. Traditionally, the initial state fluctuations are evaluated by the eccentricity coefficients ε n at different order, which are defined as [31]: ε n e inΦ n = − r dr dϕ r n e inϕ s(r, ϕ) r dr dϕ r n s(r, ϕ) , where Φ n is the participant plan angle, s(r, ϕ) is the initial entropy density and ϕ is the azimuthal angle in the transverse plane [31].
For the PCA analysis, we first construct the initial state matrix M i , using the azimuthal angle distribution of the initial entropy dS /dϕ which is defined by obtained from 2000 event-by-event TRENTo initial conditions. A direct PCA analysis shows that more than 100 eigenvectors are needed to capture the rich structures of the initial state fluctuations. In contrast, 12 PCA eigenvectors are enough to describe the final state ones since the hydrodynamic evolution tends to smear out inhomogeneity of the evolving systems. In order to connect and compare these PCA singular values from the initial and final states, we implement a smoothing procedure for the initial profiles before the PCA analysis.
In more details, we apply a circular convolution with to the initial density profile dS /dϕ, which is written as: Here, K(ϕ , ϕ) is the convolution kernel, which is taken Here, we fine tune the radius a to ensure the the same decaying rate for the PCA singular values from the initial profiles and final profiles. The obtained optimized value for a is 0.251 rad.
With such smoothing procedure, we reconstruct the initial state matrix M i with 2000 event-by-event ( dS dϕ ) smooth distributions from TRENTo for each selected centralities. As the case for the flow analysis in Sec. II B and Sec.III, the implementation of SVD and PCA to the initial state matrix, M i =ŶΣẐ =ÊẐ, gives the singular valueσ j , eigenvectorsẑ j and the corresponding eccentricity coeffi- A comparison between the event-by-event eccentricites ε n from PCA and ε n from the traditional definition (A1), for TRENTo initial conditions at 10-20% centrality. In the corresponding panels, we also write the values of event averaged eccentricites ε n from PCA and ε n from the traditional definition.
We find that the PCA eigenvectorsẑ j of the initial states are highly similar to traditional Fourier bases cos(2ϕ), sin(2ϕ), cos(3ϕ), sin(3ϕ), etc. Meanwhile, we could associate the singular valueσ j to the event averaged initial eccentricities of PCA,ε n , at different orders and connect the coefficientŝ ε (i) j , ( j = 1, ...k) to the real or imaginary part of the PCA event-by-event initial eccentricities ε n (n = 1, ...k/2) as the case for flow 3 . Fig. A.5 compares the event-by-event eccentricites ε n from PCA and ε n from the traditional definition (A.1) for the TRENTo initial conditions at 10-20% centrality. It shows, with a properly chosen smoothing procedure of the initial conditions, ε n and ε n agree with each other well till n = 6 . Meanwhile, the event averaged eccentricites ε n from PCA and ε n from (A.1) also fit each other very well, which is much better than ones for flow shown in Table 1 and Fig. 2. Therefore, for the investigation of initial state and final state correlations, we only use the traditional ε m to define the Pearson coefficient r(v n , ε m ), and r(v n , ε m ) for both PCA and traditional flow in Sec.III. the statistical fluctuations during Cooper-Fryer freeze-out with a finite number of particle emission introduce statistical noise for the flow definition in each event. As a result, flow harmonics from traditional Fourier expansion are generally analyzed with an event average of millions of events. For the event-by-event flow analysis, one implements the standard Bayesian unfolding procedure to suppress effects from the finite multiplicites and non-flow [33].
In this appendix, we further explore the ability of PCA to distinguish the signal and noise. With such purpose, we implement 25, 100 and 500 iss samplings for each VISH2+1 simulation to generate the dN/dϕ distributions of final particles and the related final state matrixes M f with different weighted signal and noise. Then, we implement PCA to analyze these matrixes. As shown in Fig. B.6, the distribution of the PCA singular values is changed with the number of iss samplings. For these systems with large statistical fluctuations, for example with 25 iss samplings, the singular values σ j at large j tend to have a long and high tail. For these systems with reduced statistical fluctuations with more iss samplings, the height of the tail is largely decreased. Meanwhile, we noticed that these eigenvectors with an index j smaller than a certain "magic number"(12 in this case) is signal-like which has a basis similar to the Fourier one, while these eigenvectors with larger j behave so randomly and chaotically, that we as-sociate these eigenvectors with the noise patterns of the systems.Besides, we check the height of these PCA tails and found the ratios among these heights for different iss samplings approximately satisfy 1 √ 25 : 1 √ 100 : 1 √ 500 , such relation is known as the Law of Large Numbers for statistical noise. With more number of samplings, the height of the tail would further decrease. In the main part of this paper, we thus set the iss samplings for each VISH2+1 simulation to 1000, which largely suppresses the noise effects from the statistical fluctuations and makes PCA analysis focus on studying flow signal itself.