1 Introduction

Collective flow is one of the most important observables in relativistic heavy-ion collisions, which provides valuable information on the initial state fluctuations, final state correlations and the QGP properties. In the past decades, various flow observables have been extensively measured in experiments and studied in theory [1,2,3,4,5,6]. In general, these flow observables are defined based on the Fourier decomposition. For example, the integrated flow harmonics are defined as:

$$\begin{aligned} \frac{\mathrm{d} N}{\mathrm{d} \varphi }&=\frac{1}{2\pi } \sum _{n=-\infty }^{\infty }\mathbf {V}_n e^{-in\varphi }\\&=\frac{1}{2\pi } \Bigg (1+ 2 \sum _{n=1}^{\infty } v_{n} \cos (n(\varphi -\varPsi _n))\Bigg ) \end{aligned}$$
(1)

where \(\mathbf {V}_n =v_ne^{in\varPsi _n}\) is the nth-order flow vector, \(v_{n}\) is the nth-order flow harmonic and \(\varPsi _{n}\) is the corresponding event plane angle. The first coefficient, \(v_1\), is called the directed flow, the second coefficient, \(v_2\), the elliptic flow, and the third coefficient, \(v_3\), the triangular flow. For \(n \ge 3\), the \(v_n\) are also referred to as higher-order flow harmonics.
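As a minimal illustration of Eq. (1), the following sketch estimates \(v_n\) and \(\varPsi _n\) from a binned azimuthal distribution. The function name `flow_harmonics`, the toy distribution and the 50-bin grid are assumptions made for illustration only and are not taken from the analysis code behind this paper.

```python
import numpy as np

def flow_harmonics(phi_centers, dn_dphi, n_max=6):
    """Estimate (v_n, Psi_n) for n = 1..n_max from a binned dN/dphi.

    Uses V_n = <exp(i n phi)>, the average weighted by the bin contents,
    which equals v_n * exp(i n Psi_n) for the distribution in Eq. (1).
    """
    weights = dn_dphi / dn_dphi.sum()          # normalize to unit sum
    orders = np.arange(1, n_max + 1)
    # One complex flow vector per harmonic order.
    V = np.exp(1j * np.outer(orders, phi_centers)) @ weights
    v_n = np.abs(V)
    psi_n = np.angle(V) / orders               # defined modulo 2*pi/n
    return v_n, psi_n

# Toy distribution with v_2 = 0.08, Psi_2 = 0.3 and v_3 = 0.03, Psi_3 = -0.5.
m = 50
edges = np.linspace(-np.pi, np.pi, m + 1)
phi = 0.5 * (edges[:-1] + edges[1:])           # bin centres
dn = 1 + 2 * 0.08 * np.cos(2 * (phi - 0.3)) + 2 * 0.03 * np.cos(3 * (phi + 0.5))
v, psi = flow_harmonics(phi, dn)
```

On an evenly spaced grid the discrete sines and cosines are exactly orthogonal, so the input harmonics of this toy distribution are recovered to machine precision. For sampled particles, statistical noise adds a contribution to every \(v_n\).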

Despite the success of the flow measurements and the hydrodynamic descriptions, one essential question is why the Fourier expansion is a natural way to analyze the flow data. In this paper, we address this question with a machine learning technique called principal component analysis (PCA). In more detail, we investigate whether a machine could directly discover flow from the huge amount of data of relativistic fluid systems without explicit instructions from human beings.

PCA is an unsupervised machine learning algorithm [7] based on the singular value decomposition (SVD), which diagonalizes an arbitrary real matrix with two orthogonal matrices. Compared with deep learning algorithms, the advantage of PCA lies in its simple and elegant mathematical formulation, which is understandable and traceable to human beings and reveals the main structure of the data in a transparent way.

Owing to its power in data mining, PCA has been applied in various research areas of physics [8,9,10,11,12,13]. In molecular dynamics, PCA has been used to distinguish break junction trajectories of single molecules [8]; the approach is time efficient and can be transferred to a wide range of multivariate data sets. In quantum mechanics, a quantum version of PCA was applied to study quantum coherence among different copies of the system [9], which is exponentially faster than any existing algorithm. In condensed matter physics, PCA has been used to study the phase transition in the Ising model [11], where it was found that the eigenvectors of PCA can aid in the definition of the order parameter and provide reasonable predictions for the critical temperature without any prior knowledge. Besides, PCA is a widely used tool in engineering for model reduction to make computations more efficient [14].

In relativistic heavy-ion collisions, PCA has been applied to study event-by-event flow fluctuations, using 2-particle correlations with the Fourier expansion [13, 15,16,17,18]. Compared with the traditional method, PCA explores all the information contained in the 2-particle correlations, which reveals the substructures in flow fluctuations [13, 15, 16]. It was found that the leading components of PCA correspond to the traditional flow harmonics and the sub-leading components quantify the breakdown of flow factorization in different \(p_t\) or \(\eta \) bins. Besides, PCA has also been used to study the non-linear mode coupling between different flow harmonics [17], which helps to discover hidden mode-mixing patterns. Recently, the CMS Collaboration applied PCA to analyze 2-particle correlations in Pb-Pb collisions at \(\sqrt{s_{NN}}=\) 2.76 TeV and p-Pb collisions at \(\sqrt{s_{NN}}=\) 5.02 TeV [18], showing the potential of applying such machine learning techniques widely to realistic data in relativistic heavy-ion collisions.

These early PCA investigations of flow [13, 15,16,17,18] are all based on data preprocessed with the Fourier expansion, and thus still belong to the category of traditional flow analysis. In this paper, we directly apply PCA to the single particle distributions from hydrodynamic simulations without any a priori Fourier transformation. We aim to explore whether PCA could discover flow with its own bases.

This paper is organized as follows. Section 2 introduces relativistic hydrodynamics, principal component analysis (PCA) and the corresponding flow analysis. Section 3 shows and discusses the flow results from PCA and compares them with the ones from traditional Fourier expansion. Section 4 summarizes and concludes the paper.

2 Model and method

2.1 VISH2+1 hydrodynamics

In this paper, we use VISH2+1 [19,20,21,22] to generate the final particle distributions for the PCA analysis. VISH2+1 is a 2+1-dimensional viscous hydrodynamic code that simulates the expansion of the QGP fireball. It solves the transport equations for the energy–momentum tensor \(T^{\mu \nu }\) and the second-order Israel–Stewart equations for the shear stress tensor \(\pi ^{\mu \nu }\) and bulk pressure \(\varPi \), with the equation of state s95-PCE [23, 24] as input. The initial profiles for VISH2+1 are provided by TRENTo, a parameterized initial condition model that generates event-by-event fluctuating entropy profiles with several tunable parameters [24, 25]. These parameters, together with the temperature-dependent specific shear and bulk viscosities, the hydrodynamic starting time (\(\tau _0=0.6 \ \mathrm {fm/c}\)) and the decoupling/switching temperature (\(T_{sw} =148 \ \mathrm {MeV}\)), have been fixed by fitting all charged and identified particle yields, the mean transverse momenta and the integrated flow harmonics in 2.76 A TeV Pb+Pb collisions using Bayesian statistics [24]; the resulting parameter set also describes various flow data at the LHC well [26]. In practice, the transition from the hydrodynamic fluid to the emitted hadrons on the freeze-out surface is realized by the Monte-Carlo event generator iss, based on the Cooper–Frye formula [27]:

$$\begin{aligned} \frac{dN}{dy p_T dp_T d\varphi } = \int _\varSigma \frac{g}{(2\pi )^3} p^\mu d^3 \sigma _\mu f(x, p) \end{aligned}$$
(2)

where \(f(x, p)\) is the distribution function of particles, g is the degeneracy factor, and \(d^3\sigma _\mu \) is the volume element on the freeze-out hypersurface \(\varSigma \).

For the following PCA analysis, as well as for the traditional flow analysis used for comparison, we run event-by-event VISH2+1 simulations with 12000 fluctuating initial conditions generated from TRENTo for 2.76 A TeV Pb–Pb collisions in the 0–10%, 10–20%, 20–30%, 30–40%, 40–50% and 50–60% centrality bins. The default iss sampling for each VISH2+1 simulation is 1000 events, which corresponds to the main results presented in Sect. 3. In the appendix of this paper, we also investigate the ability of PCA to distinguish signal from noise, for which we use 25, 100 and 500 iss samplings for each VISH2+1 simulation. Note that the default 1000 iss samplings used in this paper already strongly suppress the statistical fluctuations from noise in the final hadron distributions.

With the final particle distributions obtained from hydrodynamic simulations, various flow observables can be calculated based on the traditional flow harmonics defined by the Fourier decomposition in Eq. (1). In Sect. 3, the traditional flow results serve as the comparison for the PCA results.

2.2 Principal component analysis (PCA)

Principal component analysis (PCA) is a statistical method for analyzing complicated data, which transforms a set of correlated variables into a set of uncorrelated variables via orthogonal transformations. The leading eigenvectors, associated with large or non-negligible singular values, are called the principal components and reveal the most representative characteristics of the data. In practice, PCA applies the singular value decomposition (SVD) to a real matrix, which yields a diagonal matrix with the diagonal elements arranged in descending order. Therefore, one first needs to construct the relevant matrix before the PCA and SVD analysis. Since this paper focuses on investigating the integrated flow with PCA, the final state matrix \(\mathbf {M}_\mathbf{f}\) is constructed from the angular distributions of all charged hadrons \(dN/d\varphi \,(|y|<1.0)\) (obtained from Eq. (2)) of \(N=2000\) independent events in each centrality bin, using VISH2+1 simulations with TRENTo initial conditions. In more detail, we divide the azimuthal angle range \([-\pi ,\pi ]\) into \(m=50\) bins and count the number of particles in each bin. For the jth bin in event (i), the number of particles is denoted as \(dN/d\varphi ^{(i)}_j \), which is also the element in the \(i_{th}\) row and \(j_{th}\) column of the matrix \(\mathbf {M}_\mathbf{f }\).Footnote 1
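The construction of the final state matrix can be sketched as follows. The helper `final_state_matrix` is hypothetical, the uniformly distributed toy angles merely stand in for the hadrons sampled from VISH2+1, and normalizing each row to unit sum is an assumption of this sketch rather than a documented step of the analysis.

```python
import numpy as np

def final_state_matrix(events, m=50):
    """Stack binned azimuthal distributions into an N x m matrix.

    `events` is a sequence of 1-D arrays of particle azimuthal angles in
    [-pi, pi]; row i of the result is the histogram of event i.
    Each row is normalized to unit sum (an assumption of this sketch).
    """
    edges = np.linspace(-np.pi, np.pi, m + 1)
    rows = []
    for phis in events:
        counts, _ = np.histogram(phis, bins=edges)
        rows.append(counts / counts.sum())
    return np.vstack(rows)

# Toy input: 4 events with 1000 uniformly distributed angles each.
rng = np.random.default_rng(0)
toy_events = [rng.uniform(-np.pi, np.pi, size=1000) for _ in range(4)]
M_f = final_state_matrix(toy_events)
```

With \(N\) events and \(m\) bins the result has shape \(N\times m\), matching the layout in which event (i) occupies the \(i_{th}\) row.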

Then, we apply SVD to the final state matrix \(\mathbf {M}_\mathbf{f}\) of size \(N\times m\) (here, \(N=2000\) and \(m=50\)), which gives

$$\begin{aligned} \mathbf {M_{f}}={\mathbf{X}}{{\varvec{\Sigma }}}{\mathbf{Z}}=\mathbf {{V}{Z}} \end{aligned}$$
(3)

where \(\mathbf {{X}}\) and \(\mathbf {{Z}}\) are two orthogonal matrices of size \(N\times N\) and \(m\times m\), respectively, \({{\varvec{\Sigma }}}\) is an \(N\times m\) rectangular diagonal matrix whose diagonal elements (singular values) are arranged in descending order, \(\sigma _1\ge \sigma _2\ge \sigma _3 \ \cdots \ge 0\), and \(\mathbf {V}={\mathbf{X}}{{\varvec{\Sigma }}}\).
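A minimal numerical sketch of Eq. (3) with NumPy is given below. The random matrix is only a stand-in for \(\mathbf {M}_\mathbf{f}\), and the reduced size \(N=200\) is chosen for speed. The third output of `np.linalg.svd` is the transposed matrix of right singular vectors, so its rows play the role of the eigenvectors \(z_j\).

```python
import numpy as np

rng = np.random.default_rng(1)
N, m = 200, 50                      # toy sizes; the text uses N = 2000, m = 50
M_f = rng.random((N, m))            # random stand-in for the final state matrix

# full_matrices=True returns X (N x N) and Z (m x m); s holds the singular values.
X, s, Z = np.linalg.svd(M_f, full_matrices=True)

# Embed the singular values in an N x m rectangular diagonal matrix.
Sigma = np.zeros((N, m))
Sigma[:m, :m] = np.diag(s)

V = X @ Sigma                       # coefficient matrix V = X Sigma of Eq. (3)
reconstructed = V @ Z               # rows of Z are the eigenvectors z_j
```

The product `V @ Z` reproduces the input matrix up to rounding, and `s` is returned in descending order.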

Fig. 1 (a) The first 12 eigenvectors \({z}_j \ (j=1,2,\ldots ,12)\) and (b) the first 20 singular values \(\sigma _j \ (j=1,2, \ldots ,20)\), obtained by applying PCA to the final state matrix \(\mathbf {M}_\mathbf{f}\). The matrix \(\mathbf {M}_\mathbf{f}\) is constructed from 2000 \(dN/d\varphi \) distributions, generated from event-by-event VISH2+1 simulations with TRENTo initial conditions for 10–20% Pb + Pb collisions at \(\sqrt{s_{NN}}=\) 2.76 A TeV

With such matrix multiplication, the \(i_{th}\) row of matrix \(\mathbf {M}_\mathbf{f}\), denoted as \(dN/d\varphi ^{(i)}\), can be expressed by the linear combination of the eigenvectors \(z_j\) (the \(j_{th}\) row of matrix \(\mathbf {Z}\)) with \(j=1,2, \ldots ,m \):

$$\begin{aligned} dN/d\varphi ^{(i)}&=\sum _{j=1}^m {x}_j^{(i)}{\sigma }_j {z}_j =\sum _{j=1}^m \tilde{v}_j^{(i)} {z}_j \\&\approx \sum _{j=1}^{{k}} \tilde{v}_j^{(i)} {z}_j , \qquad (i)=1,\ldots ,N \end{aligned}$$
(4)

where \((i)=1,2,\ldots , N\) is the index of the event, \({z}_j\) are a set of orthonormal vectors such that \({z}_i^T{z}_j=\delta _{ij}\), m is the number of angular bins of the input events, and \(\tilde{v}_j^{(i)}\) is the coefficient of \({z}_j\) for the \(i_{th}\) event.Footnote 2 In the spirit of PCA, one focuses only on the most important components, so the sum is truncated at the index k in the last approximation of Eq. (4). In Sect. 3, we will show that \({k}=12\) is a proper truncation for the integrated flow analysis, and that the shapes of the bases or eigenvectors \({z}_j \ (j=1, \ldots ,{k})\) are similar but not identical to the Fourier bases \(\cos (n\varphi )\) and \(\sin (n\varphi )\) (\(n=1, \ldots ,6\)) used in the traditional method. Correspondingly, \(\tilde{v}_j^{(i)} \ (j=1, \ldots , {k})\) is identified with the real or imaginary part of the flow harmonics for event (i), and the singular values \({\sigma }_j\) are associated with the corresponding event-averaged flow harmonics at different orders. For more details, see Sect. 3.
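The truncation in Eq. (4) can be sketched numerically as below. The helper `truncate` and the random test matrix are illustrative assumptions. The sketch also checks a general property of the SVD: the Frobenius-norm error of the rank-k approximation equals the root sum of squares of the discarded singular values.

```python
import numpy as np

def truncate(M, k):
    """Return the rank-k approximation of M, the coefficients and singular values."""
    X, s, Z = np.linalg.svd(M, full_matrices=False)
    coeffs = X[:, :k] * s[:k]        # v~_j^(i) = x_j^(i) sigma_j, shape (N, k)
    return coeffs @ Z[:k], coeffs, s

rng = np.random.default_rng(2)
M = rng.random((100, 50))            # random stand-in for M_f
M_k, coeffs, s = truncate(M, k=12)

# Error of the truncation versus the discarded singular values.
err = np.linalg.norm(M - M_k)
expected = np.sqrt(np.sum(s[12:] ** 2))
```

Whether a given k is adequate therefore depends on how quickly the singular values fall off, which is the information displayed by a singular value spectrum.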

3 Results

In this section, we apply PCA to analyze the single particle distributions \(dN/d\varphi \) from hydrodynamic simulations in Pb+Pb collisions at \(\sqrt{s_{NN}}=\) 2.76 A TeV. First, we focus on the singular values, the eigenvectors and the associated coefficients of PCA, and explore whether such unsupervised learning could discover flow with its own bases.

In practice, we run 2000 event-by-event VISH2+1 hydrodynamic simulations with TRENTo initial conditions to generate the \(dN/d\varphi \) distributions for 10–20% Pb + Pb collisions at \(\sqrt{s_{NN}}=\) 2.76 A TeV. With these \(dN/d\varphi \) distributions, we construct the final state matrix \(\mathbf {M}_\mathbf{f}\) and then apply SVD to \(\mathbf {M}_\mathbf{f}\) as described in Sect. 2. Figure 1 shows the first 12 eigenvectors \({z}_j \ (j=1,2, \ldots ,12)\) and the first 20 singular values \({\sigma }_j \ (j=1,2, \ldots ,20)\) of PCA, arranged in descending order of magnitude.Footnote 3 As introduced in Sect. 2, these eigenvectors contain the most representative information on correlations among the final particles. Figure 1a shows that the 1st and 2nd eigenvectors from PCA are similar to the Fourier bases \(\mathrm {sin}(2\varphi )\) and \(\mathrm {cos}(2\varphi )\), the 3rd and 4th are similar to \(\mathrm {sin}(3\varphi )\) and \(\mathrm {cos}(3\varphi )\), and so on. Meanwhile, Fig. 1b shows that the singular values \({\sigma }_j \ (j=1,2, \ldots ,12)\) are arranged in pairs. These results indicate that each pair of singular values may be associated with the real and imaginary parts of the event-averaged flow vectors at a given order. Therefore, we define the event-averaged flow harmonics of PCA with these paired singular values, as outlined in the second column of Table 1. Table 1 compares these PCA flow harmonics at different orders with the traditional flow harmonics from the Fourier expansion; the two sets of values are close, but not exactly the same, for \(n\le 6\).

Table 1 Event averaged flow harmonics \(v_n'\) from PCA and \(v_n\) from the Fourier expansion, for VISH2+1 simulated Pb+Pb collisions at 10–20% centrality

Fig. 2 A comparison between the event-by-event flow harmonics \(v_n'\) from PCA and \(v_n\) from the Fourier expansion, for VISH2+1 simulated Pb+Pb collisions at 10–20% centrality

As explained in Sect. 2, one can also read the event-by-event flow harmonics from the results of PCA. In more detail, the PCA flow harmonics for event (i) are associated with the coefficients \(\tilde{v}_j^{(i)}, j=1 \ldots k\) in Eq. (4). Therefore, we define the event-by-event flow harmonics \(v_n^\prime \) as the magnitudes of the projections onto the PCA bases, similar to the event-averaged ones defined in Table 1. For example, \(v_2^\prime =\sqrt{\frac{m}{2}}\sqrt{\tilde{v}_1^2+\tilde{v}_2^2}\) and \(v_3^\prime =\sqrt{\frac{m}{2}}\sqrt{\tilde{v}_3^2+\tilde{v}_4^2}\) (\(m=50\)), etc. Figure 2 compares \(v_n'\) from PCA and \(v_n\) from the traditional Fourier expansion at different orders. For the event-by-event elliptic flow \(v_2\) and \(v_2'\) and triangular flow \(v_3\) and \(v_3'\), the PCA and Fourier definitions agree well with each other, and the points mostly fall on the diagonal lines. For the higher-order flow harmonics with \(n\ge 4\), the PCA results deviate substantially from the traditional Fourier ones. We also notice that the first two PCA eigenvectors \({z}_1\) and \({z}_2\) for \(v_2'\) are similar but not identical to the Fourier bases \(\mathrm {sin}(2\varphi )\) and \(\mathrm {cos}(2\varphi )\) with \(n=2\): they also contain contributions from \(\mathrm {sin}(4\varphi )\) and \(\mathrm {cos}(4\varphi )\). Similarly, the PCA eigenvectors \({z}_3\) and \({z}_4\) contain contributions from other Fourier bases. Such mode mixing in the PCA eigenvectors leads to the large deviations between \(v_4\) and \(v_4'\), as well as between \(v_5\) and \(v_5'\), etc.
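The origin of the factor \(\sqrt{m/2}\) can be checked with a small numerical sketch under two assumptions that are not stated in the text above: each row of the matrix is normalized to unit sum over the m bins, and the PCA eigenvectors are replaced by the exact unit-norm discrete Fourier bases. In that idealized case the PCA-style definition reproduces the input \(v_2\) exactly.

```python
import numpy as np

m = 50
edges = np.linspace(-np.pi, np.pi, m + 1)
phi = 0.5 * (edges[:-1] + edges[1:])           # bin centres

# Unit-norm discrete Fourier bases standing in for the PCA eigenvectors z_1, z_2.
z_sin = np.sqrt(2 / m) * np.sin(2 * phi)
z_cos = np.sqrt(2 / m) * np.cos(2 * phi)

# One toy event with v_2 = 0.08 and Psi_2 = 0.4, normalized to unit sum over the bins.
v2_true, psi2 = 0.08, 0.4
row = (1 + 2 * v2_true * np.cos(2 * (phi - psi2))) / m

# The coefficients of Eq. (4) are the projections of the row onto the bases.
c1, c2 = row @ z_sin, row @ z_cos
v2_prime = np.sqrt(m / 2) * np.hypot(c1, c2)
```

The factor arises because the sum of \(\cos ^2(n\varphi _j)\) over m evenly spaced bins is m/2, so unit-norm bases carry a prefactor \(\sqrt{2/m}\). Any admixture of other harmonics in the actual eigenvectors makes \(v_n'\) differ from \(v_n\).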

Fig. 3 Symmetric cumulants \(SC^v{'(m,n)}\) from PCA and \(SC^v{(m,n)}\) from the Fourier expansion, for VISH2+1 simulated Pb+Pb collisions at various centralities

Fig. 4 The Pearson coefficients \(r(v'_n, \varepsilon _m)\) from PCA and \(r(v_n, \varepsilon _m)\) from the Fourier expansion, for VISH2+1 simulated Pb + Pb collisions at various centralities

To evaluate the correlations between different PCA flow harmonics \(v_m'\) and \(v_n'\), we calculate the symmetric cumulants as defined for the traditional flow harmonics [28,29,30]:

$$\begin{aligned} SC^v{'(m,n)}= & {} \left<v_m^{\prime 2} v_{n}^{\prime 2}\right>-\left<v_{m}^{\prime 2}\right>\left<v_{n}^{\prime 2}\right>. \end{aligned}$$
(5)

Correspondingly, the traditional symmetric cumulants \(SC^v {(m,n)}\) are obtained by replacing \(v'_m\) and \(v'_n\) with \(v_m\) and \(v_n\) from the Fourier expansion.
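The definition in Eq. (5) amounts to the covariance of the squared harmonics over events, which the following sketch evaluates. The function name `symmetric_cumulant` and the Rayleigh-distributed toy harmonics are assumptions for illustration and do not represent the hydrodynamic events analyzed here.

```python
import numpy as np

def symmetric_cumulant(v_m, v_n):
    """SC(m, n) = <v_m^2 v_n^2> - <v_m^2><v_n^2>, averaged over events."""
    a, b = np.asarray(v_m) ** 2, np.asarray(v_n) ** 2
    return np.mean(a * b) - np.mean(a) * np.mean(b)

# Toy check: a v_4 driven by v_2^2 gives a positive SC(2, 4),
# while an independent v_3 gives a value compatible with zero.
rng = np.random.default_rng(3)
v2 = rng.rayleigh(0.05, size=20000)
v3 = rng.rayleigh(0.02, size=20000)
v4 = 0.5 * v2 ** 2
sc_24 = symmetric_cumulant(v2, v4)
sc_23 = symmetric_cumulant(v2, v3)
```

The same function applies to either set of harmonics, \(v'_n\) from PCA or \(v_n\) from the Fourier expansion, since only the input arrays change.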

Figure 3 compares the symmetric cumulants \(SC^v{'(m,n)}\) from PCA and \(SC^v{(m,n)}\) from the Fourier expansion, for the event-by-event VISH2+1 simulations of 2.76 A TeV Pb+Pb collisions in various centrality bins. One finds that, except for \(SC^v{'(2,3)}\), almost all PCA symmetric cumulants \(SC^v{'(m,n)}\) are significantly reduced compared with the traditional ones. While \(v'_4\) from PCA deviates substantially from the traditional \(v_4\) from the Fourier expansion, the obtained \(SC^v{'(2,4)}\) is significantly suppressed. This contradicts the long-held idea that the nonlinear hydrodynamic evolution strongly couples \(v_2^2\) to \(v_4\), leading to the clearly positive correlation between \(v_2\) and \(v_4\) obtained from the Fourier expansion. Similarly, the non-linear mode couplings between \(v'_2\) and \(v'_5\), \(v'_3\) and \(v'_5\), and \(v'_3\) and \(v'_4\) for the PCA-defined flow harmonics also decrease, which results in the reduced symmetric cumulants \(SC^v{'(2,5)}\), \(SC^v{'(3,5)}\) and \(SC^v{'(3,4)}\), respectively.

To evaluate the correlations between the initial and final state fluctuations, we use the Pearson coefficients \(r(v'_n, \varepsilon _m)\) and \(r(v_n, \varepsilon _m)\) to characterize the linearity between the flow harmonics and the initial eccentricities \(\varepsilon _m\). For the PCA flow harmonics \(v'_n\), the coefficient is defined as follows:

$$\begin{aligned} r(v'_n, \varepsilon _m)=\frac{\langle v'_n \varepsilon _m \rangle -\langle v'_n \rangle \langle \varepsilon _m \rangle }{\sqrt{\langle (v'_n-\langle v'_n\rangle )^2\rangle \langle (\varepsilon _m-\langle \varepsilon _m\rangle )^2\rangle }} \end{aligned}$$
(6)

Here, \(\varepsilon _m\) are the traditional eccentricities defined by Eq. (A.1). In Appendix A, we will demonstrate that, with a properly chosen smoothing procedure, the event-by-event eccentricities \(\varepsilon '_m\) from PCA are highly similar to \(\varepsilon _m\) from the traditional method. We thus use \(\varepsilon _m\) in the Pearson coefficient \(r(v'_n, \varepsilon _m)\) for PCA. Meanwhile, we also calculate the Pearson coefficient \(r(v_n, \varepsilon _m)\) for the traditional flow with the Fourier expansion, which simply replaces the flow harmonics \(v'_n\) in Eq. (6) by \(v_n\). By definition, the Pearson coefficient falls in the range \([-1,1]\), with \(r>0\) implying a positive correlation and \(r<0\) implying a negative correlation.
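As a minimal sketch, the Pearson coefficient can be evaluated as the covariance of the two quantities divided by the product of their standard deviations, with all averages taken over events. The function name `pearson` and the toy linear-response data are assumptions for illustration only.

```python
import numpy as np

def pearson(v, eps):
    """Pearson coefficient with event averages <...>: covariance over the product of standard deviations."""
    v, eps = np.asarray(v, dtype=float), np.asarray(eps, dtype=float)
    cov = np.mean(v * eps) - np.mean(v) * np.mean(eps)
    norm = np.sqrt(np.mean((v - v.mean()) ** 2) * np.mean((eps - eps.mean()) ** 2))
    return cov / norm

# Toy data: a linear response v = 0.2 * eps plus event-by-event noise.
rng = np.random.default_rng(4)
eps2 = rng.uniform(0.05, 0.5, size=5000)
v2 = 0.2 * eps2 + rng.normal(0.0, 0.005, size=5000)
r = pearson(v2, eps2)
```

The result agrees with the off-diagonal element of `np.corrcoef(v2, eps2)`, which provides an independent cross-check of the normalization.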

Figure 4 plots the Pearson coefficients \(r(v'_n, \varepsilon _m)\) from PCA and \(r(v_n, \varepsilon _m)\) from the Fourier expansion, for VISH2+1 simulated Pb+Pb collisions at various centralities. With these Pearson coefficients, we evaluate whether the PCA-defined flow harmonics reduce or increase the correlations with the corresponding initial eccentricities. As shown in Fig. 2, the event-by-event flow harmonics \(v'_2\) and \(v'_3\) from PCA are approximately equal to the Fourier ones \(v_2\) and \(v_3\). As a result, the Pearson coefficients involving these two flow harmonics, \(r(v'_2, \varepsilon _m)\) and \(r(v'_3, \varepsilon _m)\), almost overlap with the Fourier ones \(r(v_2, \varepsilon _m)\) and \(r(v_3, \varepsilon _m)\), as shown by the upper panels in the first two rows. Meanwhile, the diagonal Pearson coefficients \(r(v'_2, \varepsilon _2)\) or \(r(v_2, \varepsilon _2)\) and \(r(v'_3, \varepsilon _3)\) or \(r(v_3, \varepsilon _3)\) are much larger than the other ones, which confirms the earlier conclusion that the elliptic flow and triangular flow are mainly driven by the initial eccentricities \(\varepsilon _2\) and \(\varepsilon _3\) with the approximately linear relationships \(v_2 \thicksim \varepsilon _2\) (\(v'_2 \thicksim \varepsilon _2\)) and \(v_3 \thicksim \varepsilon _3\) (\(v'_3 \thicksim \varepsilon _3\)) [31, 32].

Although \(v'_4\) from PCA deviates substantially from the traditional \(v_4\) in Fig. 2, the PCA definition strongly enhances the correlation with \(\varepsilon _4\) and strongly reduces the correlation with \(\varepsilon _2\). For example, at 20–30% centrality, the Pearson coefficient \(r(v_4, \varepsilon _4)\) is only 70% of \(r(v_4^\prime , \varepsilon _4)\), while \(r(v_4, \varepsilon _2)\) is 200% larger than \(r(v'_4, \varepsilon _2)\). Traditionally, it is generally believed that \(v_4\) is strongly influenced by \(\varepsilon _2^2\) through the non-linear evolution of hydrodynamics and the Cooper–Frye freeze-out procedure. Our PCA analysis shows that such mode mixing can be reduced through redefined PCA bases. Meanwhile, the PCA-defined bases also significantly reduce the mode mixing for other higher-order flow harmonics, such as between \(v'_5\) and \(\varepsilon _2\), \(v'_5\) and \(\varepsilon _3\), etc.

4 Conclusions

In this paper, we applied principal component analysis (PCA) to study the single particle distributions of thousands of events generated from VISH2+1 hydrodynamic simulations. In contrast to the early PCA investigations of flow, which imposed the Fourier transformation on the input data [13, 15,16,17,18], we focused on analyzing the raw data of hydrodynamics and exploring whether a machine could directly discover flow from the huge amount of data without explicit instructions from human beings. We found that the PCA eigenvectors are similar but not identical to the traditional Fourier bases. Correspondingly, the obtained flow harmonics \(v_n^\prime \) from PCA are similar to the traditional \(v_n\) for \(n=2\) and 3, but deviate substantially from the Fourier ones for \(n\ge 4\). With these PCA flow harmonics, we found that, except for \(SC^v{'(2,3)}\), almost all other symmetric cumulants \(SC^v{'(m,n)}\) from PCA decrease significantly compared with the traditional \(SC^v{(m,n)}\). Meanwhile, certain Pearson coefficients \(r(v'_n, \varepsilon _m)\) that evaluate the linearity between the PCA flow harmonics and the initial eccentricities are clearly enhanced (especially for \(n \ge 4\)), together with a corresponding reduction of the off-diagonal elements.

These results indicate that PCA is able to discover flow with its own bases, which also reduce the related mode-coupling effects compared with the traditional flow analysis based on the Fourier expansion. We emphasize that the eigenvectors from PCA are constructed to be orthogonal and uncorrelated with each other. As a result, most of the symmetric cumulants \(SC^v{'(m,n)}\) from PCA, which evaluate the correlations between different flow harmonics, are naturally reduced compared with the traditional ones. Besides, the PCA flow harmonics \(v'_n\) present an enhanced linear relationship with the corresponding eccentricities \(\varepsilon _n\), especially for \(n=4\). These results seem to contradict the long-held idea that the hydrodynamic evolution is highly non-linear, which leads to strong mode coupling between different flow harmonics. Our PCA investigation has shown that such mode-coupling effects can be reduced with newly defined bases for the flow analysis. With this finding, the non-linearity of the hot QGP systems created in heavy-ion collisions should be re-evaluated, which we would like to explore further with the PCA method in the near future.