Sparse modeling approach to obtaining the shear viscosity from smeared correlation functions

We propose the sparse modeling method to estimate the spectral function from the smeared correlation functions. We give a description of how to obtain the shear viscosity from the correlation function of the renormalized energy-momentum tensor (EMT) measured by the gradient flow method ($C(t,\tau)$) for the quenched QCD at finite temperature. The measurement of the renormalized EMT in the gradient flow method reduces a statistical uncertainty thanks to its property of the smearing. However, the smearing breaks the sum rule of the spectral function and the over-smeared data in the correlation function may have to be eliminated from the analyzing process of physical observables. In this work, we demonstrate that the sparse modeling analysis in the intermediate-representation basis (IR basis), which connects between the Matsubara frequency data and real frequency data. It works well even using very limited data of $C(t,\tau)$ only in the fiducial window of the gradient flow. We utilize the ADMM algorithm which is useful to solve the LASSO problem under some constraints. We show that the obtained spectral function reproduces the input smeared correlation function at finite flow-time. Several systematic and statistical errors and the flow-time dependence are also discussed.


Introduction
One of the most important results obtained by RHIC (Relativistic Heavy Ion Collider) experiment is an elliptic flow of QCD particles [1,2]. The elliptic flow data using a Boltzmann-type equation for gluon scattering indicates a large cross section above the phase transition temperature (T c ) [3]. Furthermore, this elliptic flow is well explained by the hydrodynamics of the quark-gluon-plasma (QGP) state [4][5][6][7][8][9][10][11]. Thus, the QGP state is not described by a free gas system of perturbed gluons. It might be natural since around T c the interaction force between the quarks and gluons becomes strong. On the other hand, the numerical simulation of the relativistic liquid system gives that the upper limit of the ratio between the shear viscosity (η) to the thermal entropy (s) is very small, η/s ≤ 0.4 [11].
It turns out that the QGP has the perfect liquid property. The result gives us a non-sticky image of QGP states even though the QCD particles are strongly interacting. To make these curious properties of QGP clear, a considerable number of studies have been conducted on the determination of the shear viscosity from the first principle calculation in the pure SU(3) gauge theory [12][13][14][15][16][17][18][19][20][21][22].
The shear viscosity is given by the first-differential coefficient of the spectral function at zero frequency (η(T ) = πdρ(ω)/dω at ω = 0). The spectral function is defined by the Euclidean correlation function of the renormalized spacial energy-momentum tensor (EMT) in the static state as follows: (1.1) Here, K(τ, ω) denotes the kernel of an integral transform. In the lattice simulations, C(τ ) is measured by generated configurations using the first equality sign, and then ρ(ω) is estimated through the second one. During these calculations, there are at least three difficulties to obtain ρ(ω): (i) How to define the renormalized EMT on the lattice (ii) how to improve a bad signal-to-noise ratio of the correlation function of EMT (iii) how to estimate ρ(ω) from the limited number of the data C(τ ) on the lattice.
As for the first and second problems, a promising method, called the gradient flow method, has been provided in Ref. [23,24]. By using the UV finiteness of the flowed operators in the gradient flow [25,26], we can obtain the renormalized EMT without hard numerical calculation of Z-factor for quenched QCD. Furthermore, thank the smearing property of the gradient flow, the statistical uncertainty is suppressed in this method. On the other hand, it is notable that we have to analyze the data only in the fiducial window of the flow-time to obtain the thermodynamic quantities using the gradient flow method in Ref. [24]. The range is given by 2a √ 8t T −1 = N τ a, where both a strong lattice discretization and over-smearing corrections are suppressed.
The third problem is well known as an ill-posed inverse problem. Especially in the finite temperature system, the number of sites in the temporal direction is very limited, then it makes harder to solve the inverse problem. If we use the gradient flow method to obtain the correlation function, taking the data only in the fiducial window makes the third problem worse, since the available number of the data is further limited. To see the origin of the demerit, we show the smeared correlation function C(t, τ ) in Fig. 1. We find that the slope of C(t, τ ) in the short (and long because of the periodicity) τ -regime shows the discrepancy from the non-smeared data presented in Ref. [14]. The discrepancy becomes large if the distance of two operators (τ ) gets shorter than the smeared length ( √ 8t) by the gradient flow since the smeared regime of one operator reaches at the position of the other operator in the correlation function. Now, a question arises: whether we can straightforwardly apply ordinary estimation methods of ρ(ω) to such a smeared correlation functions. Until now, the fitting with some ansatz (e.g., the Breit-Wigner ansatz) and the Bayesian methods (e.g., the maximal entropy method [28,29]) have been utilized for the estimation of the spectral function in the case Figure 1. Flow-time dependence of the T 12 correlation function. Here, the data of β = 6.72, N 3 s × N τ = 48 3 × 12 (T = 1.65T c ) are plotted. Magenta curves denotes the best-fit function with 1-σ error of the fitting parameters which is reconstructed by the estimated spectral function by using the Breit-Wigner ansatz given in Ref. [14] . It indicates a non-smeared correlation function. r smear T = √ 8t/aN τ , where t denotes the flow-time. This plot is firstly shown in Ref. [27].
of the non-smeared correlation function, C(τ ). For the smeared EMT correlation function obtained by the gradient flow method in N f = 2 + 1 QCD, the fitting analyses with the Breit-Wigner ansatz and the hard thermal loop ansatz for ρ(ω) have been applied [30]. The authors have tried to fit the data of C(t, τ ) only in the limited τ -regime to eliminate the over-smeared data. However, the analysis for the eliminated C(t, τ ) using the same functional form of ρ(ω) is questionable since originally the non-smeared C(τ ) and ρ(ω) are related via the integration equation Eq.(1.1) and C(τ ) at each τ gets the corrections from all ω-regime. Thus, the lack of C(t, τ ) at several τ may change the functional form of the spectral function. As the other directions, several methods to reconstruct the spectral function with smearing have been also proposed in Refs. [31][32][33][34][35][36].
In this work, we propose the sparse modeling method to estimate the spectral function at finite flow-time, which is based on Bayesian statistics. Here, we need not introduce any functional form of the spectral function. The sparse modeling method can be applied to both non-smeared and smeared correlation functions. It has been utilized in board topics, e.g. MRI [37] and taking a picture of a large blackhole [38] (see also a recent review Ref. [39]). As for the estimation of spectral function, it has been succeeded for several analyses in the many-body system [40,41]. Utilizing the intermediate-representation basis (IR basis) [40,[42][43][44][45] and a few reasonable constraints, the sparse modeling makes it possible to estimate the real-frequency representation of ρ(ω) from a sparse imaginary-time correlation function.
We formulate the sparse modeling analysis for C(t, τ ) only in the fiducial window.
Here, we carry out the singular value decomposition (SVD) of the kernel K(τ, ω) in the IR basis, which is independent of the Monte Carlo data. The singular values of the kernel decay exponentially or even faster, then we drop the degree of freedom for the correlation function and the spectral function in the IR basis associated with the sufficiently small singular values. Then, we find the spectral function to minimize the square error given by the difference between the input correlation function and the output one reconstructed by ρ(ω). During this process, the L 1 regularization term is added to be consistent with the truncation of the degree of freedom in the IR basis. Technically, we utilize the ADMM algorithm to solve the optimization problem with such a regularization term, which is recently proposed by S. Boyd et al. [46,47]. In this work, we demonstrate that the sparse modeling analyses using very limited data for the quenched QCD at finite temperature. This paper is organized as follows: In §. 2, we give a review of the sparse modeling method. In §. 3.1, we explain how to calculate the correlation function of the renormalized EMT for the quenched QCD in the gradient flow method. In §. 3.2, we give a standard description to obtain the shear viscosity, where we assume that the correlation function in the whole τ -regime is available for the sparse modeling analysis. Then, we modify the standard description to analyze the smeared correlation function in which the over-smeared data are eliminated in §. 3.3. Section 4 presents the simulation setup of this work. We show the result of the sparse modeling analysis using the central value of the smeared correlation function in §. 5. We find that it works well since the reconstructed correlation function from the obtained spectral function is almost consistent with the input data. §. 6 contains the error estimations. We investigate several systematic errors; ω cut , τ -regime, and the flowtime dependence in our analysis, and also estimate the statistical uncertainty.. Although the statistical error of our result is sizable, we show that the bootstrap analysis is promising for estimating the statistical error of this analysis, since an expected relationship between the correlation function and the spectral function in the IR basis is satisfied. In this work, we have only 2, 000 configurations for the measurement of the correlation function, and it is very few in comparison with 800, 000 and 6 million configurations in Refs. [13,22]. Judging from such a poor statistic, we will give up the precise determination of the shear viscosity and focus on the feasibility of the sparse modeling method to obtain the spectral function combing with the gradient flow method. The last section is devoted to the summary.

Sparse modeling method
Here, we give a brief review of the sparse modeling method in the IR basis following Refs. [40,41] (see also a review paper [39]).
The input is the imaginary-time Green's function C(τ ). Its Fourier transform C(iω n ), where ω n denotes a Matsubara frequency, is related with the spectral function ρ(ω) as ρ(ω) = 1 π ImC(ω + i0) by replacing iω n with ω + i0. Then, the input C(τ ) is written by the integral form of the spectral function, where 0 ≤ τ ≤ 1/T . The kernel K in this work is given by , (2.2) but in this section the discussion does not depend on the details of the kernel. Here, we assume that the kernel is described by some exponential function as in many systems. For simplicity, we denote Eq.(2.1) using the dimensionless vectors as 3) The component of the vector C is C i ≡ C(τ i ), where τ i labels the temporal site on the lattice with 0 ≤ τ i ≤ N τ − 1. The kernel K becomes N τ × N ω matrix, while ρ denotes a vector whose component is ρ j ≡ ρ(ω j ) with j = 1, · · · , N ω . Our goal is to find ρ to minimize the square error Here, · 2 stands for the L 2 norm defined by ρ 2 ≡ ( j ρ 2 j ) 1/2 . Note that the vector C is sparse, so that the amount of the information in C is much smaller than that in ρ. The fact leads to the instability of ρ.
We have only the Monte Carlo data of C(τ ), which is not the same with the true value of C and the data have the statistical uncertainty. Here, we call the discrepancy between the Monte Carlo data and the true value of C a "noise".
Equation (2.3) gives simultaneous linear equations, where the number of equations is N τ while one of the unknown variables is N ω . We can solve the equation if N τ = N ω and the kernel is given by a regular matrix, while the unique solution does not exist and several possible solutions are allowed in the case of N τ < N ω .
Here, we would like to choose a stable solution against the noise among these possible solutions. For this purpose, we take an efficient basis called an IR basis. By introducing the singular value decomposition (SVD) of the matrix K defined as (2.5) We rewrite Eq.(2.3) as where S is an N τ × N ω diagonal matrix, and U and V are unitary matrices of size N τ × N τ and N ω × N ω , respectively. The new vectors in the IR basis imply the square error Eq.(2.4) as Thus, at the minimum point of the square error the l-th element almost satisfies It turns out that the contribution of ρ l to χ 2 ( ρ ) is weighted by the corresponding s l . The singular values s l with l = 1, 2, · · · of this type of kernels decay exponentially or even faster. Then, for example, if double precision numbers are used, C l can not include the information about ρ l , where s l /s 1 is less than 10 −16 on a computer. It does not depend on simulation details since a kernel K is fixed. This fact naturally explains the reason why a tiny noise of C(τ ) leads to a large difference of ρ(ω). Therefore, a tiny fluctuation in the left-hand side can affect the "best" inference of ρ(ω), and several "likely" solutions satisfy Eq. (2.1).
In actual calculations, the component s l ρ l with sufficiently small singular values gives a negligible contribution to χ 2 ( ρ). Then, by introducing some threshold s cut , we can drop such a component l > l cut , where s l < s cut . This truncation leads us to obtain a stable solution which is robust against the noise of C(τ ). Now, the vectors in the IR basis are reduced to the l cut -component vectors. Therefore, we would like to find the solution ρ l in Eq. (2.8), where many components of ρ takes zero to be consistent with the truncation. To search for such a solution ρ l , we consider the cost function with an L 1 regularization term, where λ is a positive constant and · 1 denotes the L 1 norm defined by This λ plays the role of the Lagrange multiplier. If the value of λ is smaller than the contribution of the noise for C, then the obtained ρ must be consistent with the true spectral function within the uncertainty coming from the noise.
An intuitive picture of the role of the L 1 regularization is following. The L 1 regularization ( ρ 1 =const.) gives a linear constraint for the components of ρ , and is graphically drawn by dotted and solid lines in Fig. 2. The minimize problem of the cost function turns to tune λ to minimize the sum of two constants: C − S ρ 2 2 is a constant and ρ 1 is the other constant. In Fig. 2, we see that the open-circle point becomes the most favorable. Note that at the open-circle point, ρ 2 vanishes and the number of the effective components is reduced.
Actually, according to the example analyses in Ref. [39,41], the spectral function becomes featureless for λ > λ opt , while artificial spikes appear for λ < λ opt . In other words, the former case corresponds to the under-fitting, where the L 1 regularization term is too strong and the number of components ρ l is too reduced. On the other hand, the latter case does to the over-fitting, where the L 1 term is too weak and the vector ρ has redundancy.
The optimization problem with these L 1 and/or L 2 regularization is called the LASSO (Least Absolute Shrinkage and Selection Operators) problem [48]. It allows obtaining the global minimum regardless of initial conditions [46,47]. As an additional constraint, we will add the non-negativity condition of the spectral function, ρ(ω) ≥ 0, (2.12) in this work. Furthermore, we can also use the sum rule, as an additional condition, while this constraint is not used in this work. The numerical algorithm to solve the optimization problems with these constraints has been developed [46,47], and it is called the alternating direction method of multipliers (ADMM) algorithm. See Appendix A for the detail of the algorithm.

Shear viscosity in quenched QCD
In this section, we explain how to obtain the shear viscosity for the quenched QCD in the lattice simulation. Firstly, we explain the calculation strategy of the correlation function of EMT using the gradient flow. Secondly, we give a standard formula for the sparse modeling method. Here, we assume that the correlation function in the whole τ -regime is available for the sparse modeling analysis. Then, we modify the standard formula for the smeared correlation function at finite flow-time, where we eliminate the over-smeared data.

Measurement of the correlation function of EMT in the gradient flow method
The renormalized EMT for the quenched QCD in the gradient flow method [23] is given by at a flow-time t. Here, we utilize the small flow-time expansion and drop O(t) corrections. The coefficients α U , α E are calculated perturbatively in Ref. [23]. An advantage of the usage of the gradient flow method is the absence of Z-factor to define the renormalized EMT. Once we use the renormalized coupling constant in the coefficient α U , α E , all composite operators constructed by the flowed gauge field are the UV finite at positive flow-time (t > 0) [26]. The last term in right-hand-side, · 0 , describes the vacuum expectation value (v.e.v.), which relates to the zero point energy. U µν (t, x) and E(t, x) denote gauge-invariant local products of dimension 4.
In the continuum theory, Here, G µν represents the field strength constructed by the flowed gauge field (B µ (t, x)). This flowed gauge field is a solution to the gradient flow equation, where A µ (x) denotes the original quantum gauge field variable. The Fourier components B µ (p) have a suppression factor e −p 2 t . Therefore, the flow-time (t) plays a role of the UV cutoff in momentum space. In the coordinate space, the flowed gauge field can be interpreted by the smeared field in the range of |x| < √ 8t. On the lattice, G µν operator can be calculated by the clover-leaf operator, whose size is 2a. The relationship Eq. (3.1) is useful in the range of 2a < √ 8t, where the corrections of the lattice discretization would be mild.
The numerical calculation using this formula has firstly done in Ref. [24], where the expectation value of the EMT has been measured for the quenched QCD at finite temperature. The expectation value of EMT is directly related to the bulk quantities, e.g. integration measure and thermal entropy. The results are consistent with the ones obtained by the integration method within the statistical error. Thanks to the smearing effect of the flowed fields, the statistical uncertainty is smaller than the other method. On the other hand, we need to take a fiducial window of the flow-time, 2a < √ 8t < aN τ /2, to avoid a strong lattice discretization and over-smearing corrections. Now, we calculate the two-point function of EMT. It is related to the shear and bulk viscosities. In this work, we focus on the former one, which is given by the correlation function of T R 12 component. The Euclidean correlation function is defined as On the lattice at a finite flow-time t, we measure the two-point function of U 12 (t, x), where the second argument of U 12 (t, x) denotes a four-dimensional vector x = (τ /a, x/a).
Although in an ideal double limits, a → 0 and then t → 0 limits, this smeared correlation function goes to the Euclidean correlation function (Eq. (3.3)), practically it is better to calculate physical observables from the smeared correlation function and take the double limits for the observables directly. In this work, we would like to obtain the shear viscosity from C(t, τ /a) at finite flow-time. How to treat C(t, τ /a) in the sparse modeling analysis will be discussed in §. 3.3.

Estimation of the spectral function using the sparse modeling method
Here, we give a standard formula for the sparse modeling method. We assume that C(τ ) is obtained after the double limits or C(t, τ ) in the whole τ -regime is available for the analysis. In this section, we simply denote these correlation functions as C(τ ). The correlation function can be expressed in terms of the corresponding spectral functions (ρ(ω)) as . (3.5) The spectral function has the following properties, The shear viscosity is given by from the spectral function. On the lattice, Eq. (3.5) turns to , (3.8) whereτ = τ /a,ω = aω andρ = a 4 ρ, respectively. Equation (3.7) shows that the shear viscosity is the first-differential coefficient at ω = 0, then it is convenient to defineρ Note thatρ(ω) turns to an even-function in terms ofω. Then, the relationship with the correlation function (3.8) is expressed by To obtain the spectral function in Eq.(3.10), we introduce the cutoff of ω and the rescaled τ as follows: Here, the regimes of the primed variables are −1 ≤ ω ≤ 1 and 0 ≤ τ < 1. Using Λ = N τωcut , the relationship between the correlation function and the spectral function is rewritten by Discretizing the integral into a finite sum and taking a replacement (dω ) → ∆ω with ∆ω = 2/(N ω − 1), the finite sum is given by 2 To make the N ω dependence mild in the sparse modeling analysis, we include √ ∆ω in the kernel as follows: (3.14) Note that the kernel at ω = 0 is independent of τ -coordinate. Now, the equation we have to solve is The spectral functionρ new satisfies the sum rule which is given by Eq. (3.15) at i = 0. Therefore, a normalized spectral function, implies a simple form of the sum rule, We will obtain this spectral functionρ calc as an output of the sparse modeling analysis. The original spectral functionρ(ω j ) in Eq. (3.13) is transformed from the output of the sparse modeling analysis (ρ calc ), Finally, the shear viscosity divided by T 3 to be a dimensionless quantity is given by Let us summarize the technical steps of our strategy to obtain the shear viscosity using the sparse modeling method as follows: Step 1: Measure the correlation function C(τ i ) on the lattice, and then normalize it using the value of C(τ i = 0).
Step 2: Construct the discretized kernel and carry out SVD decomposition. Here, we utilize DGESDD routines of LAPACK.
Step 3: Estimate an optimal value of λ as follows: We first fix a searching range of λ as [λ min , λ max ]. Next, we define a line segment function in log scale, f (λ) = aλ b , which connects f (λ mix ) with f (λ max ). Then, we search λ opt , where the ratio f (λ)/χ 2 ( ρ) has a peak since it corresponds to the position of kink in χ 2 ( ρ). Furthermore, a simple check of whether an obtained λ is a "correctly" optimized value or not can be done by seeing the scaling property of λ opt . The ADMM algorithm has free parameters (µ, µ ). The correct λ opt is inversely scaled by the values of (µ, µ ) in the ADMM algorithm (see Eq. (A.5) in Appendix A).
Step 4: Carry out the sparse modeling analysis using the ADMM algorithm to find the most likely spectral functionρ calc using λ opt . We can reconstruct the correlation function using the obtainedρ calc as We check the feasibility of the analysis whether C output (τ i ) reproduces the input correlation function C(τ i ).
Step 5: Calculate the shear viscosity (η(T )/T 3 ) on the lattice from the obtained spectral function using Eqs. (3.19) and (3.20). Carry out the same calculations from Step 1 to Step 4 using the other lattice spacings with keeping a temperature, and then obtain η(T )/T 3 in the continuum extrapolation.
We show an example sparse-modeling analysis using this standard formula in Appendix B, where the correlation function is the smeared correlation function at a finite flow-time 3 . Because of the over-smearing problem in the smeared correlation function, we find that the standard formula does not work for the smeared correlation function.

Shear viscosity at a finite flow-time
We actually have a smeared correlation function C(t, τ ) instead of C(τ ). The short τregime of C(t, τ ) suffers from the over-smearing correction, and then we have to eliminate these data whose slope is different from the non-smeared ones. On the other hand, as we will see later (the right panel in Fig. 3), the slope of C(t, τ ) in the fiducial τ -regime is universal and does not depend on the flow-time. Then, we consider the kernel for C(t, τ ) in the fiducial τ -regime can be the same functional form with the one for the non-smeared correlation function. Thus, the spectral function bears the flow-time dependence of C(t, τ ) while the (reduced) kernel is universal.
Therefore, the integration equation for C(t, τ ) turns to where τ runs to the site in the fiducial regime. From the spectral function in the integral equation above, we will obtain the shear viscosity which depends on the flow-time. Furthermore, it depends on the lattice spacing, so that we will take the double limits, a → 0 and then t → 0 limits, for the obtained η(t, T ) at a fixed T . After these processes, we will find η(T )/T 3 at the temperature.
Applying the sparse modeling method to the smeared correlation function needs three modifications to the standard formula: Firstly, the input data C(t, τ ) should be restricted only in the fiducial τ -regime as same as the calculation of the one-point function of EMT in Ref. [24]. Simultaneously, the kernel matrix is reduced to K red (τ i , ω j ), where τ i runs the site only in the fiducial τ -regime. Secondly, we remove the sum rule given in Eq. (3.18) in the cost function, since the sum rule is related to the correlation function at τ = 0 but C(t, τ = 0) is always contaminated by the over-smearing at finite flow-time. The third one is the modification of the normalization factor C(τ 0 ) in Eq. (3.21) to a different value. This modification is not so essential but useful. Here, we utilize C(t, τ ini ) instead of C(τ 0 ), where τ ini is the smallest number of site in the fiducial τ -regime.
The modified strategy to obtain the shear viscosity using the gradient flow and the sparse modeling methods can be summarized as follows: Step 1': Measure the correlation function C(t, τ /a) at flow-time t using the gradient flow method, and then normalize it using the value of C(t, τ /a = τ ini ).
Step 2': Construct the reduced kernel ∆ω (3.27) and carry out SVD decomposition. Here, τ i runs only in the fiducial range for each flowtime, namely √ 8t/a ≤ τ i and √ 8t ≤ N τ − τ i . Here, the second inequality comes from the periodic boundary condition. For simplicity, we will drop the second inequality in the present draft and only show the first one. Note that we assume that the functional form of the kernel does not change for the reduced data of the smeared correlation function. We will see that the numerical data of C(t, τ ) in the fiducial τ -regime support this assumption, and actually this reduced kernel works well in our analyses.
Step 3': Estimate an optimal value of λ. This is the same process as the standard formula.
Step 4': Carry out the sparse modeling without the sum rule (set ν = 0 in Eq. (A.3)). We obtain the spectral function at a finite flow-time from the output of the sparse modeling analysis (ρ calc ) asρ We reconstruct the correlation function using the obtainedρ calc , Then, we can check the feasibility of the analysis whether C output (t, τ /a) reproduces the input correlation function C(t, τ /a).
Step 5': Calculate the shear viscosity (η(t, T )/T 3 ) on the lattice from the obtained spectral function using Eqs. (3.25) and (3.28). Carry out the same calculations from Step 1' to Step 4' using the other lattice spacings with keeping a temperature, and then obtain η(t, T )/T 3 after taking the continuum extrapolation. Finally, we have η(T )/T 3 after taking t → 0 limit. The demonstration of these processes is out of the present work, and here our goal is to find a stableρ(t, ω ) from the smeared correlation function.

Simulation setup
We consider the Wilson plaquette gauge action under the periodic boundary condition at β = 6.93 on N 3 s × N τ = 64 3 × 16 lattices. The lattice parameter realizes the system at T = 1.65T c . We use the relation between a/r 0 , where r 0 denotes the Sommer scale, and β in Ref. [49]. Then, T /T c is fixed by the resultant values of T r 0 = (N τ (a/r 0 )) −1 using the result at β = 6.20 in Ref. [50]. The gradient flow method in Ref. [24] gives the thermal entropy, s/T 3 = 4.98(24), after a → 0 and then t → 0 extrapolations. Gauge configurations are generated by the pseudo-heatbath algorithm with the overrelaxation. We call one pseudo-heatbath update sweep plus several over-relaxation sweeps as a "Sweep". To eliminate the autocorrelation, we take 200 Sweeps between measurements. The number of gauge configurations for the measurements is 2, 000. Note that it is quite a small number of configurations in comparison with 800, 000 in Ref. [13] and 6 million in Ref. [22].
Judging from such a poor statistic, we give up a serious estimation of the statistical error and focus on the feasibility of the sparse modeling method to obtain the shear viscosity combing with the gradient flow method. In §. 5 we explain how to estimate the spectral function using the central value of the correlation function. Then, in §. 6, we discuss the systematic and statistical errors in the analysis. We demonstrate the bootstrap analyses to estimate statistical uncertainty.
The flowed gauge field is obtained by solving the ordinary first-order differential equation (Eq. (3.2)). Numerically, we utilize the third-order Runge-Kutta method in which the error per step (t → t + ε) is O(ε 5 ). We take ε = 0.01, and confirm that the accumulation errors are sufficiently smaller than the statistical errors. The gauge action of the flow is the Wilson plaquette gauge action.
5 Results of the sparse modeling method for the central value of C(t, τ /a) Now, we demonstrate the sparse modeling analysis using the central value of the measured correlation functions. Table 1 shows the technical parameters in the analysis in this section. The reason why we take these values of s cut and aω cut will be explained below and §. 6.1.  Fixing [λ min , λ max ], N λ , and (µ, µ ) are correlated with each other. This set is a possible choice to find an optimal value of λ.
In Step 1' listed the end of §. 3.3, we measure the correlation function using the gradient flow method in the range of the flow-times 0.50 ≤ t/a 2 ≤ 2.50 with the interval ∆t/a 2 = 0.10. Take t/a 2 = 0.50, 1.00, 1.50, 2.00, and 2.50 for example in this section. The data of C(t, τ /a) appears in Fig. 3. We see from Fig. 3 that the longer flow-time data have a smaller statistical error. Actually, in these poor statistics, namely 2, 000 configurations, the correlation function of the non-smeared U 12 operator is quite noisy and its central values at some τ /a often take negative values because of the large fluctuations. However, all flowed data shown in the left panel in Fig. 3 (except for only one data-point at τ /a = 8 at t/a 2 = 1.00) indicate correctly positive values. It is a great advantage of the usage of the gradient flow method to obtain the correlation function.
On the other hand, there is a demerit of the usage of the gradient flow for C(t, τ /a). Thus, the smearing changes the slope in the short τ -regime because of the over-smearing. To see the merit and demerit clearly, we rescale the correlation function by multiplying the flow-time, (t/a 2 )C(t, τ /a) (right panel in Fig. 3). In τ /a 3, the slope of the correlation function strongly depends on the flow-time. It indicates that the kernel term, which is proportional to hyperbolic cosine function, is not available for the whole τ -regime in the flowed correlation function (see also Appendix B). On the other hand, around τ /a = N τ /2 the curve of the correlation function does not change, while the statistical errors are reduced thanks to the smearing. A similar property of the smearing has been reported in the case of the APE smearing for the Wilson loop (see Fig. 5 in Ref. [51]). To take only the merits of the gradient flow, it is better to take the fiducial τ -regime at finite flow-time, √ 8t < τ , for the estimation of ρ(ω). Here, we fix the τ -regime as 3 ≤ τ /a ≤ 13 and investigate the flow dependence of the results. The next step, Step 2', is the SVD decomposition of the kernel matrix. We introduce the cutoffω cut  that the vertical axis takes a log scale. The figure tells us that a large hierarchy more than 10 10 exists between 6-th and 7-th singular values if we take 3 ≤ τ /a ≤ 13. Furthermore, we find that the number of s l larger than 10 −16 is the same with the number of independentτ i under the periodic boundary condition. Refer to several condensed matter studies [43][44][45], it will saturate to a specific value even though the number of the data points for C(τ ) increases. It indicates that the number of τ i , here at most 7 in taking 2 ≤ τ /a ≤ 12 under the periodic boundary condition, to analyze this type of kernels is very sparse 4 . To minimize the square error in Eq. (2.8), the contributions of s l ρ l with a sufficiently small s l are negligible. Therefore, we introduce the threshold of the singular value s cut as s cut = 10 −10 in the present analysis.
The most important point is that this property of the singular values is determined only by the kernel matrix Eq. (3.27) and is independent of the simulation details. Here, we just assume that the correlation function can be described by a hyperbolic cosine function.
The right panel in Fig. 4 depicts the N ω dependence of the singular values using 3 ≤ τ /a ≤ 13. We find that s l is almost independent of N ω if we include √ ∆ω in the kernel matrix as Eq. (3.27).
It is worth to see the correlation function in the IR basis, U t (l,τ )C(τ ).  τ /a ≤ 12, the values of C l is large. We take 3 ≤ τ /a ≤ 13 and will discuss the results using 4 ≤ τ /a ≤ 12 in §. 6.2. We see that C l with l 5 is also sufficiently small, so that it is consistent with the truncation of s l for l ≥ 7.
Step 3' is the λ optimization. The λ dependence of the square error (Eq. (2.4)) is a monotonically increasing function of λ [41]. The optimized λ is given by λ opt = 1.1 × 10 −7 -1.9 × 10 −6 in our analysis. To find this, we scan the value of λ in the range of 10 −15 ≤ λ ≤ 10 2 by changing its exponent with 1/N λ interval. Once we obtain λ opt , then we check whether it shows roughly an inverse-rescaling with (µ, µ ).
The results of the spectral function are shown in Fig. 6. First of all, we find the tails  Fig. 6 is an enlarged plot, which focuses on ω ≈ 0. The curvatures of ρ(t, aω) at ω = 0 has an opposite sign between t/a 2 = 0.50, 1.00 and t/a 2 = 1.50, 2.00, 2.50. It remind us that the gradient flow smears the data in τ /a < √ 8t/a, then the data at τ /a = 3 (and 13) are over-smeared in t/a 2 ≤ 1.12. Thus, the results in t/a 2 = 0.50, 1.00 may suffer from the over-smearing corrections. To conclude this, we have to investigate the statistical uncertainty of the input C(t, τ /a). We will discuss this point in §. 6.
Finally, in Step 4', we check whether the obtained spectral function correctly reproduces the input correlation function. Figure 7 depicts the comparison plot between the input C(t, τ /a) and the C output (t, τ /a) constructed by the obtainedρ(t, aω). Note that the input data of the analysis is the central value of C(t, τ /a), while its jackknife errors are also shown. We see that most data, especially in the short τ -regime, are consistent between the input and output, while the discrepancies exist around τ /a = N τ /2. We consider that it comes from the large fluctuation of the input data, where some of them are consistent with zero or take a negative value. On the other hand, the output data is constructed to be positive, since we utilize the non-negativity condition to find the spectral function. Actually, the discrepancy around τ /aN τ = 1/2 in the longer flow-time with less statistical Figure 7. Comparison between the input C(t, τ /a) with the jackknife error and the output C output (t, τ /a) constructed by the obtainedρ(t, aω). The input C(t, τ /a) in shadowing regime are not utilized in the analyses. uncertainty becomes smaller. Therefore, we believe that the discrepancy will be reduced if we have the input data precisely.
6 Error estimations Now, we study the systematic uncertainty coming from ω cut , which is artificially introduced. Figure 8 shows the comparison of the obtainedρ(t, aω) between taking aω cut = 3.0 and aω cut = 4.0 for several flow-times data. The tails of the spectral functions using aω cut = 4.0 for all flow-times approach to zero, while the ones using aω cut = 3.0 for t/a 2 = 0.50, 1.00 are the middle of the decreasing. Nevertheless, the shape of spectral function for t/a 2 = 0.50, 1.00 does not depend on the value of ω cut so much. It implies the stability of this sparse modeling analysis when the artificial parameter ω cut is changed. On the other hand, the tails ofρ(t, aω) for t/a 2 = 1.50, 2.00 are closed to zero even though we utilize aω cut = 3.0, since the higher frequency modes are suppressed by the long flow processes. From this point of view, although the sparse-modeling method can apply to the non-smeared data, the smearing may practically stabilize the analysis with the truncation of ω 5 .

τ -regime dependence and the fiducial window of the gradient flow
Next, we investigate the relationship between a choice of τ -regime and the gradient flowtime. The fiducial τ -regime is theoretically given as τ > √ 8t, and we expect that the data does not suffer from an over-smearing. In other words, if we take 3 ≤ τ /a ≤ 13, then C(t, τ /a) in t/a 2 < 1.12 are in the fiducial regime, while τ /a = 3 (and 13) must have a strong over-smearing correction in t/a 2 > 1.12. On the other hand, if we take 4 ≤ τ /a ≤ 12, then C(t, τ /a) in t/a 2 < 2.00 stay in the fiducial regime. Figure 9 shows the comparison of the spectral functions between the 3 ≤ τ /a ≤ 13 and 4 ≤ τ /a ≤ 12 analyses for t/a 2 = 0.50, 1.00, 1.50, 2.00. First of all, as we explained, the gradient flow with each fixed τ -regime (same color symbols) reduces N in Eq. (5.2). On the other hand, we naively expected that N must decrease if we take the narrow regime of τ since the effective degrees of freedom are truncated. However, in t/a 2 = 1.50, 2.00, N of the red data (4 ≤ τ /a ≤ 12) is larger than the one of the black data (3 ≤ τ /a ≤ 13). The discrepancy is slightly beyond the statistical error, which we will study in §. 6.3. It implies that the corrections to the spectral function from over-smearing in the black data are strong rather than the influence onρ(t, aω) from the truncation of C(t, τ /a). Figure 10 showsρ(t, 0) as a function of the flow-time. Here, we summarize that the fiducial regime of the flow-time for each τ -regime analysis. The orders of allρ(t, 0) in the 5 Applying the sparse modeling analysis to mock data whose spectral function has a constant mode has been considered [52]. fiducial window are the same. In more detail, the value itself seems to have a discrepancy between two τ -regime analyses. We will discuss the difference in the next section after including the statistical uncertainty of C(t, τ /a) since a tiny noise of the correlation function often leads to a large difference in the spectral function in this kind of the ill-posed inverse problems.

Statistical errors
Now, we try to include the statistical uncertainty of the correlation function in our analysis. The statistical uncertainty of the correlation function (C(t, τ /a)) is not directly related to the error of the spectral function (ρ(t, aω)) since these two quantities are related to each other through the integration equation. On the other hand, the correlation function in the IR basis is linearly related to the spectral function in the same basis (Eq. (2.9)). Thus, we expect that the statistical uncertainties of these quantities satisfy ∆C l ∼ s l ∆ρ l . (6.1) We propose the bootstrap method to estimate the statistical error of the spectral function as follows: We resample N boot sets of 2, 000 data of C(t, τ /a) for each configuration, where the overlapping selection of the configurations for one bootstrap sample is allowed. For each set of the bootstrap sample, we take the average of 2, 000 data of C(t, τ /a) and carry out the sparse modeling analysis using its mean value. The statistical errors of the spectral function are calculated by the variance over N boot samples. It allows finding the asymmetric errors, so that it is a good tool to calculate the errors ofρ which should be non-negative as a constraint. Here, we take N boot = 1, 000. 8t < τ ) is indicated by the dashed lines. 0.50 < t/a 2 < 1.12 (non-shadowing) is the fiducial window for the 3 ≤ τ /a ≤ 13 analysis, while 0.50 < t/a 2 < 2.00 (non-shadowing and red-shadowing) depicts the one for the 4 ≤ τ /a ≤ 12 analysis. Figure 11 depicts the spectral function with the bootstrap errors. We also plot the result of the central-value analysis shown in Fig. 6, which is inside the error bound.
We check whether the bootstrap errors satisfy an expected relationship Eq. (6.1). Here, using Eq. (2.7), ∆C l is estimated as the variation of C l transformed by the bootstrap sample of C(t, τ /a), while ∆ρ l is done as the variation of the obtainedρ(t,ω) for each bootstrap sample. Figure 12 shows the l-dependence of C l and ρ l . We see that the statistical errors of them roughly satisfies ∆ρ l ∼ ∆C l /s l . Note that ρ l with l ≥ 7 are consistent with zero since these modes are eliminated by the cut of the singular values in the analysis.
We also show the comparison of the statistical error bars between the input and output C(t, τ /a) in Fig. 13. The error bars between them in the short τ -regime are consistent with each other. The error bars of the output around τ /a = N τ /2 are smaller than the ones of the input. We consider that it comes from the non-negativity condition of the spectral function during the sparse modeling analysis. It makes C(t, τ /a) positive correctly, and then the condition reduces the error bars of C output . We can conclude that the bootstrap method is a reasonable estimation method of the statistical errors for the sparse modeling analysis.

Flow-time dependence of the shear viscosity
Finally, we discuss the flow-time dependence of the shear viscosity with the statistical uncertainty. Since the number of configurations is quite poor, then we cannot give a conclusive value of the shear viscosity. Furthermore, we first take the continuum limit of η(t, T ) obtained by the lattice simulation with several different lattice spacings at the fixed temperature. After that, we could discuss the flow-time dependence 6 . Thus, in the present analysis, we do not include the systematic uncertainties coming from the continuum extrapolation. Therefore, we show the results to see naive flow-time dependence at a fixed lattice spacing.
The left panel in Fig. 14 depicts the value ofρ(t, 0) in the 3 ≤ τ /a ≤ 13 and 4 ≤ τ /a ≤ 12 analyses. We find that both analyses are consistent with each other in non-shadowing regime (0.50 < t/a 2 < 1.12), where all data in both analyses are in the fiducial τ -regime. However, the discrepancy appears in red-shadowing regime (1.12 < t/a 2 < 2.00), where the data at τ /a = 3 is over-smeared. Furthermore, the slope of the flow-time dependence of the 4 ≤ τ /a ≤ 12 analysis is milder than the one of the 3 ≤ τ /a ≤ 13 analysis. It tells us the importance of taking the fiducial window in the sparse modeling analysis.
The ratio between the shear viscosity and the thermal entropy is shown in the right panel in Fig. 14. Here, we utilize s/T 3 = 4.98 at this temperature, which is given in Ref. [24]. We also plot the result in Ref. [14] at t/a 2 = 0 as a reference. A theoretical large-N c analysis based on AdS/CFT correspondence for N = 4 super Yang-Miils theory gives the lower bound for η/s = 1/4π [53] 7 . Although our result seems to be smaller than the previous works, we can conclude that our result also indicates the small η/s, which describes the perfect liquid property in the QGP phase.

Summary
We have proposed the sparse modeling analysis to estimate the spectral function from the smeared correlation functions. We have described how to obtain the shear viscosity from the correlation function of the renormalized EMT measured by using the gradient flow method for the quenched QCD at finite temperature. The gradient flow method reduces the statistical uncertainty of the correlation functions thanks to its smearing property, while the smearing breaks the sum rule of the spectral function. Therefore, we have eliminated the over-smeared data when we analyze the spectral function.
We have first given the standard formula of the sparse modeling analysis where we assume C(t, τ ) in the whole τ -regime is available for the analysis. Then, we have modified the formulation to investigate the smeared correlation function, where the over-smeared data of C(t, τ /a) are eliminated. We have shown that the sparse modeling analyses in the IR basis looks stable even using very limited data of the correlation function and the obtained spectral function reproduces the input correlation function. Several systematic uncertainties of the analysis are well under control. We have also demonstrated the bootstrap analysis for estimating the statistical errors. Although the statistical error of our result is sizable because of poor statistics of our data, we have shown that the bootstrap analysis seems to be promising since the expected relationship of the errors between C l and ρ l have been satisfied.
If we will collect 6 million configurations as the same with the work [22], a naive estimation following 1/ N conf. would give a few % relative error for η/s. It looks very promising toward the precise determination of the shear viscosity.

Acknowledgments
We are grateful to T. Hirano and A. Tomiya for useful comments. The numerical simulations were carried out on SX-ACE at Cybermedia Center (CMC) and Research Center for Nuclear Physics (RCNP), Osaka University. We also acknowledge the help of CMC in tuning the gradient-flow code. This work partially used computational resources of HPCI-JHPCN System Research Project (Project ID: jh200031) in Japan. This work was supported in part by Grants-in-Aid for Scientific Research through Grant Nos. 15H05855,and 19K03875, which were provided by the Japan Society for the Promotion of Science (JSPS), and in part by the Program for the Strategic Research Foundation at Private Universities "Topological Science" through Grant No. S1511006, which was supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) of Japan. The calculations were partially performed by the supercomputing system SGI ICE X at the Japan Atomic Energy Agency. This work was partially supported by JSPS-KAKENHI Grant Numbers 18K11345.

A ADMM algorithm
Here, we give a review of the ADMM algorithm, which gives a solution to the optimization problem with several constraints. Following Refs. [41,46,47], we explain how to build the numerical code in detail to make the paper self-contained.
The problem is to find a minimization of F ( ρ ) in Eq. (2.10) with respect to ρ under additional constraints. Using the conventional notation, we change the variables as C → y and ρ → x. The cost function is written by Here, we consider two constraints: These constraints correspond to the non-negativity of the spectral function and the sum rule, respectively. Here, x = V x using the matrix V obtained by the SVD decomposition, and we have used a convention that vectors with prime denote quantities represented in the IR (SVD) basis. The dimension of this optimization problem is given by L = min(N τ , N ω ), where N τ and N ω denotes the size of y and x, respectively. Actually, we may further reduce L by introducing a cut of the singular value in analysis.
In the ADMM algorithm [46,47], we introduce auxiliary vectors z and z , and consider the minimization of the functioñ Here, the sum rule imposed by the Lagrange multiplier ν, and the non-negativity is represented by γ. Thus, the auxiliary vectors z and z inflect the sum rule and the non-negativity, respectively. The description to realize the first constraint, z = x , is given by two kinds of coefficients; (normalized) Lagrange multipliers u and a coefficient µ for z − x 2 2 . The parameter µ controls the speed of the convergence, while u is iteratively updated together with its conjugate variables z . Similarly, µ and u are introduced to realize the second constraint, z = V x.
The explicit description is following: x ← 1 λ S t S + (µ + µ)1 Here, P + denotes a projection operator onto non-negative quadrant; P + z j = max(z j , 0) for each component. The explicit form of S α in Eq. (A.7) is defined by B Analysis in the trash: Usage of whole τ -regime data of C(t, τ /a) at a finite flow-time It may be worth to see the sparse modeling analysis using all τ i data at a finite flow-time.
The numerical codes written in C++, Fortran, and Julia are uploaded on the arXiv page of this paper. The data of the smeared correlation function at t/a 2 = 1.50 is also included in the package. Figure 15 depicts the result of the spectral function in the left panel and the comparison plot between the input C(t, τ /a) and C output (t, τ /a) in the right panel. Here, the label ν = 0 and ν = 0 denotes the results of the sparse modeling analyses with and without the sum rule, respectively. In both results,ρ(t, aω) take the negative values near ω ≈ 0 even though we utilize the non-negativity as a constraint. It numerically indicates that the analyses do not converge within a finite number of iterations in the ADMM routine (here, we take O(10 9 ) iterations).
Furthermore, C output (t, τ /a), which are reconstructed by the obtained spectral functions, far from the input one. In particular, we can see that the slope in the log-scale is different between the input and the outputs. It suggests that the input smeared correlation function can not be described by the kernel given by the hyperbolic cosine function in Eq. (3.27).

C N τ dependence of the singular values
The left panel in Fig. 4 show that the number of s l above 10 −16 is the same with the number of the independent site in τ direction. It tells us the number of the data-point on N τ = 16 lattice is very sparse to resolve the information of the kernel in double precision on a computer. Here, we investigate how large lattice size (or how fine lattice spacing) are needed to analyze the kernel at this temperature with minimal loss of information.
For simplicity, here we consider the standard kernel (Eq.(3.27)) instead of the reduced one, ∆ω . (C.1) Here, Λ = N τ aω cut = ω cut /T . We assume that ω cut in the physical unit is universal at a fixed temperature, so that Λ is constant. To increase the number of s l above 10 −16 , we have to take the larger N τ and the finer a at the fixed temperature. Thus, it corresponds to taking the continuum extrapolation. In the present paper, we have shown that aω cut = 4.0 on N τ = 16 lattice extent is a good choice to analyze the correlation function at T = 1.65T c , then we set Λ = 64. Nτ=40, aω cut = 1.60 Nτ=32, aω cut = 2.00 Nτ=25, aω cut = 2.56 Nτ=25, aω cut = 3.20 Nτ=16, aω cut = 4.00 log(s l ) Figure 16. N τ dependence of the singular values for the kernel matrix. Here, we fix N τ aω cut = 64.0 Figure 16 depicts the N τ dependence of the singular values for the kernel matrix. We see that the number of s l larger than 10 −16 is almost saturated if we take N τ ≥ 32. Thus, it suggests that the correlation function with the kernel above can be well described by N τ ≈ 32 with minimal loss of information within the double precision. In other words, the information will not increase even though we carry out the simulation on N τ 32 in the double precision.
The minimal size of N τ with minimal loss of the information depends on the temperature in the physical unit. We expect that the lower temperature analysis needs the larger Λ = ω cut /T and then the slope of s l becomes gentle [43][44][45]. Then, in the lower temperature, we need the larger lattice size to resolve the information of the kernel in the same precision.