1 Introduction

The dynamics of the strong interaction between quarks and gluons, which governs the properties of hot and dense nuclear matter, is described by quantum chromodynamics (QCD). QCD predicts that, if the temperature of strongly-interacting matter becomes large enough, a new state of matter is formed in which quarks and gluons are no longer confined in hadrons and can roam freely. This state of matter is called the quark-gluon plasma (QGP). Lattice QCD has established that the transition from a hadron gas to the QGP is a smooth crossover at a high temperature \(T\thicksim 140-180\) MeV and low net baryon density [1,2,3]. A variety of theoretical models, such as the Dyson–Schwinger equations model [4,5,6,7], the (Polyakov loop-) Nambu–Jona–Lasinio model [8,9,10,11,12] and the quark-meson coupling model [13,14,15], also predict the existence of a first-order phase transition that occurs at low temperature and moderate to large net baryon densities.

Relativistic heavy ion experiments have been carried out at the SIS18 [16], at the AGS [17] and at the SPS [18] in fixed-target mode, and at the Relativistic Heavy Ion Collider (RHIC) [19] as well as at the Large Hadron Collider (LHC) [20] in collider mode. The forthcoming Facility for Anti-proton and Ion Research (FAIR) [21, 22] and the Nuclotron-based Ion Collider fAcility (NICA) [23] will provide unprecedented intensities and luminosities for future studies. The main goal of these large experiments is to search for signals of the QCD phase transition and to study the properties of the QGP in nucleus-nucleus collisions. Due to the transient nature of the heavy ion collision dynamics, the bulk properties of the QCD medium cannot be directly observed in experiments. A strategy to identify signals of the QGP is to compare sophisticated model simulations, with varying parameter sets and different equations of state (with and without a phase transition), to experimental data such as particle spectra and correlation functions. Currently some observables, for example, anisotropic flow [24,25,26,27], directed flow [28, 29] and fluctuations of particle multiplicities [30,31,32,33], are conjectured to be most sensitive to the appearance of a phase transition. However, no mapping via these observables that disentangles this specific bulk property of the QCD medium from the others has been obtained so far. This calls for modern data-analysis methods such as Bayesian analysis or deep neural networks.

The Bayesian analysis [34,35,36] applies a global fit to a set of different observables for parameter estimation. In Ref. [36], a crossover-type EoS was employed in a hybrid hydrodynamic framework, and event-averaged experimental data (e.g. particle yields, momentum distributions and flow) were used to infer the temperature-dependent shear and bulk viscosity of nuclear matter together with other parameters. The estimated temperature-dependent viscosities are marginal distributions, obtained by integrating out the other parameters. One way to better constrain these bulk properties of nuclear matter is to fit more data, or to exploit more of the information contained in the data. On the one hand, one can fit higher-dimensional raw data instead of integrated observables. On the other hand, the event-by-event fluctuations may contain additional information as well.

In this work, we explore the feasibility of identifying the QCD EoS from event-by-event high-dimensional raw hadron spectra in high energy nucleus-nucleus collisions using the tools and techniques of Deep Learning (DL). DL was developed to capture highly-correlated features in big data [37, 38]. It has achieved tremendous success in a wide variety of applications, such as image processing, natural language processing, computer vision, medical imaging and medical information processing. These successes have inspired physicists to adopt the technique to tackle physical problems of great complexity. Much progress has been made in nuclear physics [39,40,41,42,43,44,45], lattice field theory [46,47,48,49,50], particle physics [51,52,53,54,55], astrophysics [56,57,58] and condensed matter physics [59,60,61,62,63,64,65].

The purpose of our exploration with the DL method here is to find a disentangled mapping between the observed final raw spectra and the EoS type of the medium. We vary several parameters, including the shear viscosity, the equilibration time and the freeze-out temperature, to force the neural network to test whether it can find a direct mapping from event-by-event high-dimensional raw spectra to the EoS type that is immune to the ‘interference’ of the other parameters within certain ranges. Once such a mapping is found, it is straightforward to infer the EoS type from measured experimental data, provided that detector simulation and calibration, left here for further study, are also taken into account.

The great advantage of the DL method over conventional ones is its ability to extract hidden features from highly dynamical, rapidly evolving and complex non-linear systems, such as relativistic heavy ion collisions. Conventional observables rely on human design and are usually low-dimensional projections of the high-dimensional raw data. When one uses only part of this projected information to constrain the properties of nuclear matter, the estimated values are prone to depend on the specific model setup (e.g. other untuned parameters in the fitting) and on the chosen observables. Instead, DL methods can be used to explore distinct mappings and to construct observables from the full high-dimensional raw data for the classification task at hand. Recently, a deep CNN classifier was developed as an effective “EoS-meter”, an excellent tool for revealing the nature of the QCD transition with a high predictive accuracy \(\thicksim 95\%\) on hadron spectra from a pure hydrodynamic study [39].

The present work studies the performance of a CNN, trained and tested with hadron spectra from a more realistic simulation of heavy ion collisions, in identifying the EoS. The generalizability of the method is explored by considering well-established dynamics in state-of-the-art simulation models. First, the hadronic rescattering after the hydrodynamic evolution is taken into account in the simulation via a hadronic cascade. Consequently, the event-by-event final-state pion spectra are discrete instead of smooth, as they are in hydrodynamic simulations. Second, resonance decays are included, which also contribute to the pion spectra. Due to the finite number of particles, the discrete event-by-event pion spectra exhibit significant fluctuations that might overwhelm the correlations one is looking for. We develop modified DL tools based on a CNN to identify the EoS in this more complex and more realistic dynamical scenario.

This paper is organized as follows: Sect. 2 introduces the hybrid simulation model. Section 3 discusses the neural network and the methods of data pre-processing. Section 4 presents the performance of the trained CNN in different scenarios and comparisons with that of a fully-connected deep neural network (DNN). Finally, Sect. 5 summarizes the results and gives the conclusions. Appendix A gives the details of the neural network structure. Appendix B shows the simulated data and the predictive performance of the trained neural network on the testing datasets. Appendix C visualizes the training datasets of Appendix B with traditional observables.

2 Micro–macro hybrid model of relativistic heavy-ion collisions

The modeling of relativistic heavy-ion collisions mostly follows a “standard prescription” for the spatio-temporal evolution of the collision dynamics. The initial state of the matter right after the violent collision is described by the “color glass condensate”, which consists of frozen primordial gluons and is assumed to isotropize within 1 fm/c [66,67,68,69,70]. These gluons may evolve rapidly in accordance with the classical Yang–Mills equations. A few fm/c later, they can achieve approximate local thermal equilibrium [71, 72] and may exist briefly as a Yang–Mills gluon plasma, which may quickly expand nearly isentropically due to the high initial temperature. The total entropy and energy are not yet distributed over quark–anti-quark degrees of freedom. Subsequently, quarks are produced by gluon–gluon collisions [67,68,69,70], forming a strongly coupled quark-gluon plasma (sQGP). The dynamical evolution of this QGP can be described approximately by macroscopic dissipative hydrodynamics [73,74,75,76,77,78]. Viscous corrections are included to describe some of the remaining deviations from local isotropy and thermal equilibrium. The EoS of the hot QGP medium, the constitutive element used to close the hydrodynamic equations, is one crucial input. As the medium expands and cools quasi-isentropically, the quark-gluon fluid goes through a smooth crossover or, hypothetically in this work as a control experiment, a first order phase transition. The nature of the QCD transition strongly affects the hydrodynamic evolution [79]: different forms of the transition are associated with different pressure gradients, which consequently lead to different expansion rates. As the matter becomes more dilute, it forms an expanding non-equilibrium hadronic matter with important final state effects.
For instance, final absorption of the products of the resonance decays in the hadronic matter can substantially change the yields of the hadrons observed by the experimental detectors. This evolution of the hadronic matter can be successfully described by microscopic hadron cascade models [80,81,82].

To generate the data for the training of the CNN, we use the iEBE-VISHNU hybrid model [83], which can perform event-by-event simulations of relativistic heavy-ion collisions at different energies. Major components of this hybrid model include an initial condition generator (SuperMC), a (2+1)D second-order event-by-event viscous hydrodynamic simulator (VISHNew), a particle sampler (iSS) and a hadron cascade “afterburner” simulator (UrQMD).

This hybrid model uses either the Monte-Carlo Glauber (MC-G) [84,85,86] or the Monte–Carlo Kharzeev–Levin–Nardi (MCKLN)  [87, 88] model to generate the fluctuating initial conditions in the SuperMC module. The collision centrality can be set up as needed, based on the assumption that, on average, the final charged hadron multiplicity, \({\mathrm {d}}N_{\mathrm {ch}}/{\mathrm {d}}y\), is directly proportional to the initially produced total entropy in the transverse plane \({\mathrm {d}}S/{\mathrm {d}}y|_{y=0}\). The effect of viscous heating will cause a spread in the final \({\mathrm {d}}N_{\mathrm {ch}}/{\mathrm {d}}y\), which is considered small (2-3%) for a given \({\mathrm {d}}S/{\mathrm {d}}y|_{y=0}\).
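The percentile-based centrality selection built on this multiplicity–entropy proportionality can be sketched as follows. This is an illustrative numpy implementation, not code from the SuperMC module; the function name and rank-based percentile assignment are our own.

```python
import numpy as np

def centrality_bins(dndy, width=0.01):
    """Assign each event a fine centrality bin from its charged
    multiplicity dN_ch/dy, used as a proxy for the initial entropy
    dS/dy: the top `width` fraction of events is bin 0 (0-1%),
    the next fraction is bin 1 (1-2%), and so on.
    """
    order = np.argsort(dndy)[::-1]           # events sorted by multiplicity, descending
    ranks = np.empty(len(dndy), dtype=int)
    ranks[order] = np.arange(len(dndy))      # rank 0 = most central event
    return (ranks / len(dndy) / width).astype(int)
```

With `width=0.01` this yields the 1%-wide fine bins used for the training datasets below.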

Fig. 1

Two different EoSs are implemented in the hydrodynamic simulation, as functions of the energy density. A crossover, based on a lattice QCD parametrization, is compared with a first order phase transition with a transition temperature \(T_c=165\) MeV, obtained by a Maxwell construction. It is assumed that the baryon-chemical potential is exactly \(\mu _B=0\) throughout the whole simulation

The simulation with the hydrodynamic package VISHNew uses two different EoSs: (1) the crossover-type EoS, based on a lattice-QCD parametrization [89], denoted as L-EOS; (2) the first-order-type EoS with a Maxwell construction [90] between a hadron resonance gas and an ideal gas of quarks and gluons, denoted as Q-EOS. The transition temperature is \(T_c=165\) MeV. These two EoSs are depicted in Fig. 1.

After the hydrodynamic evolution, the fluid fields are converted via the Cooper–Frye formula into particles, which are then further propagated in a hadronic cascade, the Ultrarelativistic Quantum Molecular Dynamics (UrQMD) model. UrQMD is a non-equilibrium transport model that includes resonance decays and hadronic rescatterings in the simulation. In contrast to the hydrodynamic evolution, which is governed by energy-momentum conservation together with the EoS, the shear viscosity \(\eta \) and the bulk viscosity \(\xi \), in the hadronic cascade the particles are assumed to be in asymptotic states and follow straight-line trajectories between collisions. The hadronic cascade evolution is not deterministic, since the processes involve randomness, e.g., in the scattering angles, scattering probabilities and decay probabilities. Furthermore, the effects of the finite number of particles, i.e., thermal fluctuations, are included, since the cascade propagates discrete particles instead of average densities.

This hybrid model can fit experimental data on final hadron spectra with a few adjustable parameters. These parameters include: the equilibration time \(\tau _0\), which defines the point when local thermal equilibrium is reached and the hydrodynamic evolution starts; the ratio of the shear viscosity to the entropy density \(\eta /s\); and the freeze-out (switching) temperature \(T_{sw}\), which defines the switch from the hydrodynamic evolution to the hadronic cascade.

We vary the model parameters in the generation of the training datasets to allow the neural network to capture the intrinsic features encoded in the EoS, rather than features biased by the specific setup of other physical uncertainties. In principle, this would require simulating many events for hundreds of different parameter combinations and centrality selections, to ensure that the neural network gains sufficient generalizability. In practice, however, this is not feasible. Hence we focus on systematic changes of these parameters and study the performance of the network when the parameters approach the boundaries of their ranges.

3 Neural network and data pre-processing

In Ref. [39], a DL-tool engine with a CNN has been shown to classify the EoS successfully in pure hydrodynamical simulations, on an event-by-event basis, with \(\thicksim 95\%\) accuracy. To apply this strategy to real experimental data, it is crucial to perform realistic simulations with a hadronic “afterburner” and resonance decays. In the present paper, the DL-tool engine is constructed for more realistic simulations of heavy ion collisions. The CNN architecture used here is similar to that discussed in Ref. [39]; we refer to that paper for technical details. The new CNN is presented in detail in Fig. 4 in Appendix A.
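For orientation, such an “EoS-meter” can be sketched in PyTorch as below. The exact architecture is the one given in Fig. 4 of Appendix A; the layer widths, kernel sizes and dropout rate here are illustrative assumptions only, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class EoSMeter(nn.Module):
    """Sketch of a binary CNN classifier on 24x24 (pT, Phi) pion spectra.

    All hyperparameters below are illustrative guesses; see Fig. 4 in
    Appendix A of the paper for the actual network structure.
    """
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2),   # 1x24x24 -> 16x24x24
            nn.ReLU(),
            nn.AvgPool2d(2),                              # -> 16x12x12
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x12x12
            nn.ReLU(),
            nn.AvgPool2d(2),                              # -> 32x6x6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 6 * 6, 128),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(128, 2),                            # logits: L-EOS vs Q-EOS
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

The two output logits correspond to the two EoS classes and would be trained with a standard cross-entropy loss.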

The input \(\rho (p_T,\Phi ) \equiv dN_{\pi }/dy dp_T d\Phi \) to this neural network is a histogram of the number of pions with 24 \(p_T\)-bins and 24 \(\Phi \)-bins, where \(p_T\) denotes the transverse momentum of the observed pions in the final state and \(\Phi \) denotes their azimuthal angle. Only pions with \(p_T\le 2~\mathrm {GeV}\), rapidity \(|y|\le 1\) and \(\Phi \in [0, 2\pi ]\) are accepted and counted in the histogram.
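Building this input from a list of final-state pions is a simple binning operation; a minimal numpy sketch (the function name and array layout are ours) could read:

```python
import numpy as np

def pion_histogram(pt, phi, y, n_bins=24, pt_max=2.0, y_cut=1.0):
    """Bin the final-state pions of one event into a 24x24 (pT, Phi)
    histogram rho(pT, Phi).

    pt, phi, y : arrays of transverse momentum (GeV), azimuthal angle
    and rapidity. Cuts follow the text: pT <= 2 GeV, |y| <= 1,
    Phi in [0, 2*pi].
    """
    accept = (pt <= pt_max) & (np.abs(y) <= y_cut)
    rho, _, _ = np.histogram2d(
        pt[accept], phi[accept],
        bins=n_bins,
        range=[[0.0, pt_max], [0.0, 2.0 * np.pi]],
    )
    return rho  # shape (24, 24): rows = pT bins, columns = Phi bins
```

One such matrix per event (or per averaged group of events, see Sect. 4) is one input sample to the network.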

In general, training or learning algorithms benefit a lot from pre-processing of the datasets. The input to the neural network used here, pion spectra \(\rho (p_T,\Phi )\), is a \(24\times 24\) matrix. One refers to each matrix element as one “feature” and each matrix as one “sample”. The pre-processing of the input data can be applied in a feature-wise (per feature) or sample-wise (per input sample) manner.

In the feature-wise standardization, the inputs \(\rho (p_T,\Phi )\) of all the training samples are pre-processed in a sample-interdependent manner. From each feature, the mean over all training samples is subtracted, and the result is divided by the corresponding standard deviation. In this way, all features are centered around zero and have variances of the same order, which prevents a feature with a larger variance from dominating the objective function over the other features. The transformation is saved and later applied to the testing samples. With this standardization, the testing data should be simulated in one of the same collision systems as the training data, since the multiplicities in different collision systems differ considerably.
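A minimal numpy sketch of this feature-wise standardization (the function name is ours; `sklearn.preprocessing.StandardScaler` provides the same behavior):

```python
import numpy as np

def feature_wise_standardize(X_train, X_test, eps=1e-8):
    """Per-feature (per 24x24 bin) standardization over the training set.

    The mean and standard deviation of each bin are computed across all
    training samples, then reused unchanged on the test samples, as
    described in the text.
    """
    mean = X_train.mean(axis=0)            # shape (24, 24)
    std = X_train.std(axis=0) + eps        # eps guards against empty bins
    return (X_train - mean) / std, (X_test - mean) / std
```

After the transformation each bin has zero mean and unit variance over the training set, while the test set is shifted and scaled by the *training* statistics.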

In the sample-wise standardization, or min–max normalization, the inputs \(\rho (p_T,\Phi )\) are pre-processed in a sample-independent manner. Each \(24\times 24\) matrix is rescaled either to zero mean and unit variance, or to a specific range such as \([-\frac{1}{2}, \frac{1}{2}]\). The latter choice was used successfully in Ref. [39].
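Both sample-wise variants act on one matrix at a time; a sketch (function names are ours):

```python
import numpy as np

def minmax_normalize(x):
    """Rescale one 24x24 spectrum to the range [-1/2, 1/2],
    the choice used in Ref. [39]."""
    lo, hi = x.min(), x.max()
    return (x - lo) / (hi - lo) - 0.5

def sample_standardize(x):
    """Rescale one 24x24 spectrum to zero mean and unit variance."""
    return (x - x.mean()) / x.std()
```

Because no cross-sample statistics are involved, these transformations can be applied to any new event without reference to the training set.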

Our training results show that the feature-wise standardization consistently performs better than the two sample-wise methods. Hence, in the following, we only show results obtained with the feature-wise standardization.

4 Training and testing results

A systematic analysis of the performance of the CNN described above is presented for the hybrid modeling of relativistic heavy-ion collisions. An important aspect here is the generalizability of the trained CNN model in the testing stage. Overfitting of the network to the training data is checked on validation data, which are generated with the same physical parameter sets as the training data. The testing is performed on testing datasets, which are generated with physical parameter sets different from those of the training data. The generalizability of the CNN model with respect to different physical parameter sets is studied systematically. In the previous study with pure hydrodynamics [39], the training data were generated with a viscous (3+1)D hydrodynamics model, CLVisc [77], with AMPT initial conditions [91], while the testing data were generated with a viscous (2+1)D hydrodynamics model, VISHNew, with Monte-Carlo Glauber initial conditions; the latter combination is used in the hybrid model of this work for the training data generation instead. However, we find that, even in the pure hydrodynamic study, reversing the simulation models for training and testing data generation yields a testing accuracy of only about 70%, from which we suspect some superiority of the (3+1)D hydrodynamics model with AMPT initial conditions over the other ones. Thus, in this work we are not able to discuss the generalizability of the CNN model with respect to different hybrid simulation models.

4.1 Hybrid model with late transition to cascade

The CNN in the previous study [39] was trained directly on primordial pion spectra, obtained from a numerical integration of the Cooper–Frye formula over the freeze-out hypersurface of the hydrodynamics. In such a scenario, one neglects the fluctuations due to the finite number of hadrons. In addition, a significant portion of the pions originates from resonance decays, which also need to be taken into account. In this section, we study the influence of these effects on the predictive power of the CNN. To isolate the influence of the finite number of particles and resonance decays, we first assume a late transition from hydrodynamics to the UrQMD cascade by setting the switching temperature equal to the hydrodynamic freeze-out temperature used in Ref. [39], \(T_{sw}=137\) MeV. In this scenario, the duration and influence of the hadronic cascade are significantly diminished, and we are left with the effects of the finite number of particles and resonance decays as compared to the pure hydrodynamic modeling.

4.1.1 Event-by-event input, switch at \(T_{sw}=137\) MeV

In this sub-scenario, the event-by-event pion spectra \(\rho (p_T, \phi )\) are taken as the input to the CNN. 12 training datasets are generated by the iEBE-VISHNU hybrid model with fluctuating MC-Glauber initial conditions, using 6 different fine centrality bins of 1% width within the centrality range 0–60% in each of two collision systems. We set the ratio of shear viscosity to entropy density to \(\eta /s=0.08\) and 0.00, and the equilibration time to \(\tau _0=0.5\) and \(0.4~\mathrm {fm/c}\), in the collision systems Pb+Pb at \(\sqrt{s_{NN}}=2.76~\mathrm {TeV}\) and Au+Au at \(\sqrt{s_{NN}}=200~\mathrm {GeV}\), respectively. The details of the datasets are shown in Tables 1 and 2 in Appendix B. About 44000 events with the two different EoSs are generated in total. Figure 6 in Appendix C shows the event-by-event normalized \(p_T\) spectra and the elliptic flow \(v_2\) as a function of \(p_T\) for these training datasets with the two EoSs. These two one-dimensional traditional observables are indistinguishable by the human eye with respect to the EoSs. Thus it is not trivial to identify the EoS from just the final-state pion \(p_T\) spectra. The negative elliptic flow \(v_2\) in Fig. 6 shows that there are large fluctuations in the event-by-event spectra.
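The two one-dimensional observables compared in these figures are simple projections of the network input \(\rho(p_T,\Phi)\). A numpy sketch of the projection (our own simplified estimator, taking the event plane at \(\Psi=0\); a full analysis would reconstruct the event plane or use cumulants):

```python
import numpy as np

def pt_spectrum_and_v2(rho):
    """Project a (pT, Phi) histogram onto the normalized pT spectrum
    and the pT-differential elliptic flow v2.

    rho : array (n_pt, n_phi). The v2 estimate below is the simple
    <cos(2*Phi)> per pT bin, i.e. it assumes the event plane lies at
    Psi = 0 (an illustrative simplification).
    """
    n_pt, n_phi = rho.shape
    phi = (np.arange(n_phi) + 0.5) * 2.0 * np.pi / n_phi  # bin centers
    dndpt = rho.sum(axis=1)
    spectrum = dndpt / dndpt.sum()                        # normalized pT spectrum
    with np.errstate(invalid="ignore", divide="ignore"):
        v2 = (rho * np.cos(2.0 * phi)).sum(axis=1) / dndpt
    return spectrum, v2
```

The negative \(v_2\) values seen in Fig. 6 arise when the finite-number fluctuations in a single event dominate this azimuthal average.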

The validation accuracy is found to be about 83.5% after 1000 epochs of training. This indicates that high-level correlations are extracted from the two-dimensional pion spectra \(\rho (p_T, \phi )\) to identify the EoS. However, it is significantly lower than in the pure hydrodynamic modeling [39], where a validation accuracy of up to 99% was obtained. This implies that the fluctuations due to the finite number of particles and resonance decays wash out some of the correlations carried from the early dynamics to the final-state particle spectra, producing an “overlap” between the two types of event-by-event spectra with different EoSs, which hinders the discrimination between them.

4.1.2 Cascade-coarse-grained input, switch at \(T_{sw}=137\) MeV

To mitigate the fluctuations due to the finite number of particles and resonance decays, we average the pion spectra over a certain number of events. In the model simulations, one can repeat the hadronic cascade any number of times for the same hydrodynamic evolution. The pion spectra averaged over these simulations are then taken as the input for training, which we call the “cascade-coarse-grained input”. We would like to find out whether such an event averaging improves the network performance due to the statistics enhancement, or worsens it due to the information loss.

In this sub-scenario, 2 training datasets are generated by the iEBE-VISHNU hybrid model with fluctuating MC-Glauber initial conditions in the centrality range 0–50%. The details are shown in Table 3 in Appendix B. In total, 15747 events are generated with the two different EoSs. The hadronic cascade is repeated 30 times after each hydrodynamic evolution, and the spectra averaged over these 30 events are taken as the input to the network. The validation accuracy with these cascade-coarse-grained spectra \(\rho _c(p_T, \phi )\) reaches about \(92\%\). Evidently, such averaging over the cascade stage is beneficial for identifying the EoS information of the early dynamics from the final-state particle spectra. This means that statistics matter greatly when using particle spectra to decode the EoS information.

4.1.3 Event-fine-averaged input, switch at \(T_{sw}=137\) MeV

One drawback of the above averaging procedure is that the separation of the collision dynamics into a hydrodynamic and a hadronic cascade stage is purely theoretical. From a practical point of view, an averaging procedure based on experimentally controllable event filtering is therefore preferable. In this sub-scenario, the spectra are instead averaged within the same fine centrality bin (of 1% width), which we call the “event-fine-averaged input” in the following. To be specific, we average the spectra of 30 random events within the same fine centrality bin in Tables 1 and 2 as the input to the network, in order to accumulate statistics. Figure 7 in Appendix C shows the 30-events-fine-averaged normalized \(p_T\) spectra and the elliptic flow \(v_2\) as a function of \(p_T\) for these training datasets with the two EoSs. These two one-dimensional traditional observables are still not distinguishable by eye. Comparing with the corresponding event-by-event observables shown in Fig. 6, one can see that the fluctuations are significantly reduced in the 30-events-fine-averaged spectra. This manner of averaging reduces the fluctuations from the initial conditions in addition to those from the hadronic cascade and resonance decays. Consequently, a strikingly clear improvement of the CNN performance in classifying the two types of EoS is achieved. The validation accuracy reaches about \(99\%\) with the 30-events-fine-averaged spectra \(\rho _a(p_T, \phi )\) after 1000 epochs of training, a value similar to that in the pure hydrodynamic case [39]. In principle, one could include more datasets generated in different fine centrality bins for training. However, we find it sufficient to use the datasets simulated in only 6 representative fine centrality bins, as in Tables 1 and 2, since the predictive performance on datasets simulated in the other, unselected fine centrality bins is as high as the training accuracy.
This demonstrates that non-trivial high-level correlations which are independent of the centrality bins are learned by the neural network.
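The event-fine-averaging itself is a simple grouping operation; a numpy sketch (one possible implementation, grouping each event into exactly one 30-event average without reuse, which the text does not prescribe):

```python
import numpy as np

def fine_averaged_samples(spectra, n_avg=30, rng=None):
    """Average groups of n_avg randomly chosen event-by-event spectra
    from the same fine centrality bin into training samples.

    spectra : array (n_events, 24, 24) of events from one 1%-wide bin.
    Returns as many averaged samples as fit without event reuse;
    sampling with replacement would be an equally plausible choice.
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(len(spectra))
    n_groups = len(spectra) // n_avg
    groups = idx[: n_groups * n_avg].reshape(n_groups, n_avg)
    return spectra[groups].mean(axis=1)
```

Applying this per fine centrality bin yields the “event-fine-averaged input” \(\rho_a(p_T,\phi)\).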

After the training with validation, the trained network is confronted with the testing data, which are generated with different physical parameter sets in the simulations to explore the network’s generalizability. In Tables 4 and 5 in Appendix B we show the predictive performance of the neural network trained with the 30-events-fine-averaged spectra. A testing accuracy of 95% on average is obtained on the testing data simulated in the centrality range 0–50% with MC-Glauber or MCKLN initial conditions. This demonstrates that the trained neural network is robust against different model setups, such as the initial conditions, \(\tau _0\), \(\eta /s\) and \(T_{sw}\) in the range [130, 142] MeV. We observe a slight centrality dependence of the predictive accuracy in the collision system Pb+Pb at \(\sqrt{s_{NN}}=2.76~\mathrm {TeV}\): the accuracy decreases for more peripheral events.

Fig. 2

Training and validation accuracy (upper panel) and loss (lower panel) in three different sub-scenarios with switching temperature \(T_{sw}=137\) MeV. These three sub-scenarios refer to the 30-events-fine-averaged spectra (purple and brown), the cascade-coarse-grained spectra (red and green) as well as the event-by-event spectra (blue and orange)

4.1.4 A hierarchy of the accuracy in the above sub-scenarios

Figure 2 shows the training and validation accuracy (upper panel) and loss (lower panel) of the CNN with the same setup for the first 1000 epochs in the three aforementioned sub-scenarios. In each sub-scenario, the training and validation accuracy (loss) remain close after 1000 epochs of training, which implies that over-fitting is avoided. Note that the network has not yet been fully trained in the cascade-coarse-grained sub-scenario after 1000 epochs, as the accuracy (loss) is still increasing (decreasing).

A clear hierarchy of the prediction accuracy is observed when the averaging is performed over more and more stages of the simulated dynamics. The CNN with event-by-event spectra gives the lowest accuracy, while the one with the 30-events-fine-averaged spectra gives the highest one, which is as high as in the pure hydrodynamic study [39].

4.2 Hybrid model with early transition to cascade

The scenario with early transition from hydrodynamics to hadronic cascade in hybrid modeling is in accordance with a widely used choice of the switching temperature \(T_{sw}>150\) MeV. This scenario is different from the one discussed in the previous subsection in two aspects. Firstly, the higher switching temperature decreases the contribution from the primordial pions which are directly emitted from the hydrodynamic evolution, and increases the contribution from resonance decays. Secondly, the elongated duration of the hadronic cascade stage may further blur out the imprint of the phase transition encoded in the final-state particle spectra. In the following, we will study how a higher switching temperature affects the performance of the CNN in three aforementioned sub-scenarios, respectively.

4.2.1 Event-by-event input, switch at \(T_{sw}>150\) MeV

In this sub-scenario, 9 training datasets are generated by the iEBE-VISHNU hybrid model with the fluctuating MC-Glauber initial condition in the centrality range 0–50%. The switching temperature is \(T_{sw} = 160\) MeV. Two different values for the equilibration time \(\tau _0\) and the ratio of shear viscosity to entropy \(\eta /s\) are used in the simulations. The details are shown in Tables 6 and 7 in B. In total, about 60000 events are generated with two different EoS types.

The validation accuracy is found to be about \(78\%\) for the CNN trained with these event-by-event spectra as input. This validation accuracy is lower than that in the sub-scenario with late transition (switching temperature \(T_{sw} = 137\) MeV). This decrease in the validation accuracy can be understood as a result of the increased contribution from resonance decays and the elongated duration of the hadronic rescattering.

4.2.2 Cascade-coarse-grained input, switch at \(T_{sw}>150\) MeV

In this sub-scenario, the cascade-coarse-grained pion spectra \(\rho _c(p_T, \phi )\) are taken as the input to the CNN. 2 training datasets are generated in analogy to the previous late-transition case by the iEBE-VISHNU hybrid model with fluctuating MC-Glauber initial conditions in the centrality range 0–50\(\%\), with the hadronic cascade simulated 30 times individually after each hydrodynamic evolution. The switching temperature \(T_{sw}\) is set to 155 or 160 MeV. The details are shown in Table 8 in Appendix B. About 24000 events with the two different EoSs are generated in total. The validation accuracy is found to be at most 87.5\(\%\), which is again lower than in the previous sub-scenario with a late transition to the cascade.

4 testing datasets are generated in this sub-scenario in the centrality range 0–50%, as shown in Table 9 in Appendix B. Both MC-Glauber and MCKLN initial conditions are used, and the simulation parameters are varied with respect to the training datasets to check the generalizability of the CNN. After training and validating the neural network, the testing accuracy on these datasets is \(83\%\) on average, which is slightly lower than the validation accuracy.

4.2.3 Event-fine-averaged input, switch at \(T_{sw}>150\) MeV

In this sub-scenario, the 30-events-fine-averaged spectra are explored for training with the switching temperature \(T_{sw} = 160\) MeV. This input is generated by averaging the spectra of 30 independent events within the same fine centrality bins (of 1% width) shown in Tables 6 and 7. The validation accuracy again reaches up to \(99\%\) in this sub-scenario, as in the previous late-transition one. The testing accuracy is up to \(95\%\) on average on the testing datasets, as shown in Table 10 in Appendix B. We also observe a slight centrality dependence of the predictive accuracy in the collision system Au+Au at \(\sqrt{s_{NN}}=200~\mathrm {GeV}\): the accuracy decreases for more peripheral events.

It is also interesting to further check the performance of the neural network on testing datasets that employ temperature-dependent shear viscosities. Taking this sub-scenario as an example, we evaluated the network’s prediction accuracy on the testing datasets in Table 11, where four temperature-dependent shear viscosities are employed in the hybrid simulations, as shown in Fig. 5 (labelled as 1–4, respectively). The first two are taken from Ref. [92], while the last two are taken from the Bayesian analysis estimations [35, 36], respectively. The results show that the performance is robust against the setup of these temperature-dependent shear viscosities, as compared with Table 10.

4.3 Comparison with fully-connected deep neural network

As already discussed in Sect. 4.1, the event-by-event and 30-events-fine-averaged normalized \(p_T\) spectra and elliptic flow \(v_2\) with the two different EOSs from all centrality bins in Tables 1 and 2, as shown in Figs. 6 and 7, respectively, are indistinguishable within the range of the event-by-event fluctuations. However, one can observe that the peaks of the normalized \(p_T\) spectra with Q-EOS are on the whole higher than those with L-EOS. In Figs. 8, 9 and 10 in Appendix C, we show the event-by-event, 30-events-fine-averaged and all-events-fine-averaged normalized \(p_T\) spectra (left panels) and elliptic flow \(v_2\) (right panels) solely from the centrality bin 14–15% in Pb+Pb collisions at \(\sqrt{s_{NN}}=2.76\) TeV in Table 1. Within the same centrality bin, one can see that the all-events-fine-averaged normalized \(p_T\) spectra are distinguishable with respect to the different EOSs, the 30-events-fine-averaged normalized \(p_T\) spectra are almost distinguishable in certain \(p_T\) bins, while the event-by-event normalized \(p_T\) spectra are still not. In Fig. 11 in Appendix C, we show the all-events-fine-averaged normalized \(p_T\) spectra (upper left panel) and elliptic flow \(v_2\) (upper right panel) as well as the first (lower left panel), second (lower middle panel) and third (lower right panel) derivatives of the normalized \(p_T\) spectra from all centrality bins in Tables 1 and 2. These all-events-fine-averaged normalized \(p_T\) spectra are again not distinguishable by the human eye. Their derivatives, however, help to distinguish the EoS in certain \(p_T\) bins, which might lead to the construction of novel observables from normalized \(p_T\) spectra in the future. Inspired by this observation, we use the normalized \(p_T\) spectra as the input to a fully-connected DNN to distinguish the EOSs as a first try.
In this case, the normalized \(p_T\) spectra are treated as a whole, rather than as the isolated points at each \(p_T\) bin that the human eye perceives, and high-level correlations, including but not limited to high-order derivatives, can be extracted in a supervised manner.
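To make this concrete, the following is a minimal numerical sketch of how a fully-connected network maps a one-dimensional normalized \(p_T\) spectrum to two class scores. The input size of 24 \(p_T\) bins matches Fig. 4, but the layer width, the random weights and the toy spectrum are illustrative assumptions, not the actual trained network:

```python
import numpy as np

rng = np.random.default_rng(0)
N_PT_BINS = 24                            # 24 pT bins, as in Fig. 4

# Toy event-by-event normalized pT spectrum (sums to 1 over the bins).
spectrum = rng.random(N_PT_BINS)
spectrum /= spectrum.sum()

# Hypothetical layer width and random (untrained) weights for illustration.
W1 = rng.standard_normal((N_PT_BINS, 64)) * 0.1
b1 = np.zeros(64)
W2 = rng.standard_normal((64, 2)) * 0.1   # two classes: L-EOS vs Q-EOS
b2 = np.zeros(2)

h = np.maximum(spectrum @ W1 + b1, 0.0)   # hidden layer with ReLU
logits = h @ W2 + b2
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the two EoS classes

print(probs.shape)                        # (2,), summing to 1
```

The whole spectrum enters the first layer at once, so the network can in principle combine all \(p_T\) bins, including derivative-like differences between neighbouring bins, into its learned features.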

We train a fully-connected DNN with the event-by-event normalized \(p_T\) spectra from all centrality bins in Tables 6 and 7 as the input. The validation accuracy is about 74%, which is below that of the CNN with two-dimensional spectra, about 78%; here the correlations are not very strong in either case, due to the fluctuations from the particlization and the “afterburner”. When the 30-events-fine-averaged normalized \(p_T\) spectra are taken as the input instead, the validation accuracy is about 97%, which is also slightly below that of the CNN with two-dimensional spectra, about 99%; here the correlations are very strong in both cases. As for the testing accuracy, the CNN with two-dimensional spectra outperforms the fully-connected DNN with one-dimensional spectra by about \(8\%\) with 30-events-fine-averaged spectra. Evidently, in the above cases the fully-connected DNN with one-dimensional normalized \(p_T\) spectra can capture the main correlations, while the CNN with two-dimensional spectra performs better and improves the generalizability.
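For reference, the 30-events-fine-averaged input can be sketched as a simple group average of event-by-event spectra; the interpretation as a plain mean over groups of 30 events, as well as the toy data below (900 events, 24 \(p_T\) bins), are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

# 900 toy event-by-event spectra, each normalized over 24 pT bins.
spectra = rng.random((900, 24))
spectra /= spectra.sum(axis=1, keepdims=True)

# Average in groups of 30 events: (900, 24) -> (30, 30, 24) -> (30, 24).
# The mean of normalized spectra is itself normalized (up to rounding).
averaged = spectra.reshape(-1, 30, 24).mean(axis=1)

print(averaged.shape)   # (30, 24)
```

Averaging suppresses the bin-by-bin fluctuations from particlization and the “afterburner” by roughly \(1/\sqrt{30}\), which is consistent with the large accuracy gain reported above for the averaged input.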

When the event-by-event normalized \(p_T\) spectra from all centrality bins in Tables 1 and 2 with \(T_{sw}=137\) MeV and in Tables 6 and 7 with \(T_{sw}=160\) MeV are taken together as the input to the fully-connected DNN, the validation accuracy is about \(62\%\), much lower than that of the CNN with two-dimensional spectra, about \(69\%\). This shows that when the physical parameters in the simulation model vary substantially in the generation of the training data, the normalized \(p_T\) spectra become more difficult to distinguish and the CNN with two-dimensional spectra outperforms the fully-connected DNN with one-dimensional normalized \(p_T\) spectra.

Fig. 3

Comparison of the validation accuracies in all the different sub-scenarios studied. The green star depicts the pure hydrodynamic result [39]. The orange square, the purple triangle and the red filled circle symbols depict the results for the 30-events-fine-averaged, cascade-coarse-grained and event-by-event spectra, respectively, at different switching temperatures

5 Summary and conclusion

We have extended a previous exploratory study on identifying the EoS in the modeling of heavy-ion collisions from hadron spectra using deep learning techniques [39]. In this extended study, we consider a more realistic hybrid model of heavy-ion collisions, in which a hadronic cascade “afterburner” with a finite number of particles and resonance decays is properly taken into account. In this hybrid modeling the final-state particle spectra are histograms containing large fluctuations and thus differ from those in the previous study [39], which were smooth hadron spectra from the Cooper–Frye prescription with perfect statistics. Figure 3 summarizes the predictive performance on the validation datasets in the above exploratory studies of the different sub-scenarios.

We have demonstrated that, after the hydrodynamic evolution, stochastic particlization, hadronic cascade and resonance decays, the information about the EoS in the early dynamics is preserved in the final-state pion spectra, from the perspective of a deep CNN, as shown in Fig. 3. The event-by-event input for the network can reveal the EoS-type information with about 80% classification accuracy in a binary classification setup.

The downward trend of the network's validation performance with respect to the switching temperature in Fig. 3 implies that more stochasticity from the resonance decays and the prolonged hadronic cascade diminishes the correlation between the EoS information in the early dynamics and the final-state particle spectra. This is in accordance with the common physical interpretation.

Finally, the hierarchy of the validation accuracies in the different sub-scenarios in Fig. 3 shows that properly enhancing the statistics and reducing the fluctuations in the input data, arising from the final hadronic dynamics either alone or together with the initial conditions, facilitates the extraction of the EoS information by the network from the final-state particle spectra.

Fig. 4

The architecture of our convolutional neural network (CNN) for identifying the QCD transition type using pion spectra with 24 transverse momentum \(p_T\) bins and 24 azimuthal angle \(\Phi \) bins
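To illustrate what a convolutional layer does with this \(24\times 24\) input, the following sketch applies a single hand-rolled \(5\times 5\) filter, ReLU and \(2\times 2\) average pooling to a toy two-dimensional pion spectrum; the filter size, single-filter setup and random weights are assumptions for illustration, not the architecture of Fig. 4:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy two-dimensional pion spectrum: 24 pT bins x 24 azimuthal-angle bins.
spectrum = rng.random((24, 24))

def conv2d_valid(img, kernel):
    """Plain 'valid' 2D cross-correlation, as applied inside a CNN layer."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

kernel = rng.standard_normal((5, 5)) * 0.1   # one 5x5 filter (assumed size)
feature_map = np.maximum(conv2d_valid(spectrum, kernel), 0.0)  # ReLU

# 2x2 average pooling: trim to even dimensions, then block-average.
h, w = feature_map.shape
fm = feature_map[: h - h % 2, : w - w % 2]
pooled = fm.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

print(feature_map.shape, pooled.shape)       # (20, 20) (10, 10)
```

Because the filter slides over local \((p_T,\Phi)\) patches, the learned features capture local correlations across both momentum and azimuthal angle, which the one-dimensional fully-connected input cannot represent directly.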

In conclusion, a deep CNN can decode the imprint of the EoS in the hydrodynamic evolution (encoded within the phase transition dynamics) on the final-state pion spectra from heavy-ion collisions. The good performance of the network demonstrates that this “EoS-encoder” works: the fingerprint of the early dynamics of the bulk matter is not washed out by the evolution, even when the stochasticity is increased by the hadronization and the sequential hadron dynamics. The deep CNN provides an effective decoding method to extract high-level correlations from two-dimensional final-state pion spectra, which are robust against different physical factors, such as the centrality bins. In relatively simple cases, a fully-connected deep neural network can also identify the EoS from normalized pion \(p_T\) spectra with a validation accuracy close to that of the CNN, which may lead us to discover new observables sensitive to the EoS from normalized pion \(p_T\) spectra. The generalizability of the learned features with respect to other simulation models also depends on the simulation model used for the training data generation. In the present study, the training data are generated with the well-tested iEBE-VISHNU (VISHNew + UrQMD) hybrid model. In the future we will explore how to capture features which generalize to testing data from other models as well as to experimental data. Possible applications of the framework developed here include classifying fluctuating initial conditions, extracting transport coefficients of the QCD matter, filtering and pre-processing of real experimental data, and detector calibration.