1 Introduction

During reentry, vehicles may encounter different flow regimes: free molecular, transitional, near-continuum, and continuum. The determination of aerodynamic forces and heat loads has a great impact on vehicle design [1]. In the non-continuum regimes, traditional macroscopic models, such as the Euler, Navier-Stokes and Burnett equations, may become invalid. The following methods are mainly used for non-equilibrium flow simulations. The first kind of method is based on probabilistic modeling. The most popular one is the direct simulation Monte Carlo (DSMC) method, which was first proposed by Bird [2] more than half a century ago. It follows the evolution of representative particles with uncoupled transport and collision processes. DSMC has been thoroughly validated through comparison with experimental measurements [3, 4]. It has played a key role in the design and flight analysis of vehicles in rarefied environments. Some of the most cited DSMC codes in the literature are DS2V/3V [5], DAC [6], SMILE [7], MONACO [8], and DSMCFOAM [9]. The main differences among these codes lie in the collision selection methods and the mesh topology.

Another kind of approach is the deterministic method, which mainly concerns the Boltzmann equation. Due to the complexity of the Boltzmann collision term, researchers usually adopt simplified collision models, such as the BGK model [10], the Shakhov model [11] and the Rykov model [12]. Titarev [13] developed an implicit solver named Nesvetay-3D on unstructured meshes. A three-dimensional TVD method is applied for the numerical discretization, and both spatial and velocity mesh decomposition are used in the parallelization. A total of \( 6.9\times {10}^9 \) mesh points in the six-dimensional space was used for the supersonic flow simulation around a re-entry space vehicle. Wadsworth [14] developed a parallel, finite-volume 2D/axisymmetric code, SMOKE, which is based on the conservative numerical schemes developed by Mieussens [15]. In Baranger’s team, a 3D code [16] has been used in recent years for rarefied flow simulations. This code can handle polyatomic gases. It uses block-structured meshes and hybrid parallelization, i.e., space domain decomposition with MPI and inner parallelization with OpenMP. Furthermore, the code is equipped with a velocity mesh refinement technique, which saves both CPU time and memory. Li’s team has developed a 3D code based on the model equation, named the gas-kinetic unified algorithm (GKUA) [17, 18]. Three-dimensional hypersonic flows around spheres and spacecraft at different Knudsen and Mach numbers have been studied. The total six-dimensional mesh for a complex wing-body configuration reaches \( 7.3\times {10}^{11} \) points, and 23,800 CPU cores [19] were used in the computation.

However, the above deterministic methods share a common feature. They decouple the particle transport and collision. Therefore, the cell size and time step in these numerical schemes are limited by the particle mean free path and mean collision time in order to provide accurate numerical solutions. When the flow regime is close to continuum or near continuum, the time step and cell size limitations are rather severe and make these methods extremely time-consuming and inefficient.

Another distinct deterministic method, the unified gas kinetic scheme (UGKS), was proposed by Xu et al. [20,21,22]. UGKS is a multi-scale method that couples particle transport and collision in its numerical flux modeling. It is based on an integral solution of the gas-kinetic model equation and can recover the flow physics from kinetic particle transport and collision to hydrodynamic wave propagation. Moreover, the time step is determined only by the CFL condition and is not limited by the mean collision time. The scheme therefore becomes more efficient in various flow regimes, especially when the local Knudsen number is low. Applying UGKS to analyze the aerodynamics and aerothermodynamics of vehicles in near-space flight is our long-term objective.

This paper is organized as follows. Section 2 introduces UGKS and some techniques to accelerate convergence. Section 3 gives a brief description of the code framework. Section 4 presents 2D and 3D validation test cases. The last section is the conclusion.

2 Method

2.1 Unified gas kinetic scheme

The three-dimensional Shakhov model equation [11], which gives the correct Prandtl number, reads in non-dimensional form

$$ {f}_t+{uf}_x+{vf}_y+{wf}_z=\frac{f^{+}-f}{\tau } $$

where the free-stream density \( {\overline{\rho}}_{\infty } \), velocity \( {\overline{U}}_{\infty } \), viscosity coefficient \( {\overline{\mu}}_{\infty } \) and the characteristic length \( \overline{L} \) are used as reference quantities, and the resulting non-dimensional variables are given by

\( \left(x,y,z\right)=\left(\overline{x},\overline{y},\overline{z}\right)/\overline{L} \), \( t=\overline{t}/\left(\overline{L}/{\overline{U}}_{\infty}\right) \), \( \left(u,v,w\right)=\left(\overline{u},\overline{v},\overline{w}\right)/{\overline{U}}_{\infty } \), \( \rho =\overline{\rho}/{\overline{\rho}}_{\infty } \)

\( p=\overline{p}/\left({\overline{\rho}}_{\infty }{\overline{U}}_{\infty}^2\right) \), \( \tau =\overline{\tau}/\left(\overline{L}/{\overline{U}}_{\infty}\right) \), \( \mu =\overline{\mu}/{\overline{\mu}}_{\infty } \), \( \lambda =\overline{\lambda}/\left(1/{\overline{U}}_{\infty}^2\right) \), \( f=\overline{f}/\left({\overline{\rho}}_{\infty }/{\overline{U}}_{\infty}^3\right) \)

\( {f}^{+}={\overline{f}}^{+}/\left({\overline{\rho}}_{\infty }/{\overline{U}}_{\infty}^3\right) \), \( {g}_M={\overline{g}}_M/\left({\overline{\rho}}_{\infty }/{\overline{U}}_{\infty}^3\right) \).

\( {f}^{+} \) can be written in the form \( {f}^{+}={g}_M+{g}^{+} \), where \( {g}_M \) is the Maxwellian distribution function and \( {g}^{+} \) is the heat-flux correction term

$$ {g}^{+}={g}_M\left(1-\Pr \right)\overrightarrow{c}\cdot \overrightarrow{q}\left({c}^2/ RT-5\right)/\left(5 pRT\right) $$

Here \( \overrightarrow{c}=\overrightarrow{u}-\overrightarrow{U} \) is the peculiar velocity, and T, \( \overrightarrow{q} \) and Pr are the temperature, heat flux and Prandtl number, respectively.
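As a minimal sketch of how \( {f}^{+} \) might be evaluated at a single velocity point, the Python function below assumes the usual gas-kinetic conventions, which are not spelled out above: \( \lambda =1/(2 RT) \), \( p=\rho /(2\lambda ) \) in the non-dimensional variables, and \( {g}_M=\rho {\left(\lambda /\pi \right)}^{3/2}{e}^{-\lambda {c}^2} \). The function name and argument layout are illustrative only and are not taken from the solver.

```python
import math

def shakhov_equilibrium(rho, U, lam, q, Pr, u):
    """Return (g_M, f_plus) at the particle velocity u = (u, v, w).

    Assumptions (not taken from the text): lam = 1/(2RT), p = rho/(2*lam),
    and g_M = rho * (lam/pi)**1.5 * exp(-lam * c^2).
    """
    c = [ui - Ui for ui, Ui in zip(u, U)]      # peculiar velocity c = u - U
    c2 = sum(ci * ci for ci in c)              # c^2
    RT = 1.0 / (2.0 * lam)
    p = rho * RT
    g_M = rho * (lam / math.pi) ** 1.5 * math.exp(-lam * c2)
    c_dot_q = sum(ci * qi for ci, qi in zip(c, q))
    # Heat-flux correction g+ = g_M (1 - Pr) c.q (c^2/RT - 5) / (5 p RT)
    g_plus = g_M * (1.0 - Pr) * c_dot_q * (c2 / RT - 5.0) / (5.0 * p * RT)
    return g_M, g_M + g_plus
```

With zero heat flux, or with Pr = 1, the correction vanishes and \( {f}^{+} \) reduces to the Maxwellian, which is a convenient sanity check for any implementation.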

The conservative variables ρ, ρU, ρV, ρW and ρE are related to the distribution function by

$$ {\left(\rho, \rho U,\rho V,\rho W,\rho E\right)}^{\mathrm{T}}=\int {\boldsymbol{\uppsi}}^{\mathrm{T}} fd\Xi $$

where \( {\boldsymbol{\uppsi}}^{\mathrm{T}}={\left(1,u,v,w,\frac{1}{2}\left({u}^2+{v}^2+{w}^2\right)\right)}^{\mathrm{T}} \) is the vector of moments and \( d\Xi = dudvdw \) is the volume element in the velocity space.

Taking the moments of Eq. (1), i.e., multiplying it by \( {\boldsymbol{\uppsi}}_{\alpha } \) and integrating over the velocity space, we obtain the conservation laws for \( \mathbf{Q}={\left(\rho, \rho U,\rho V,\rho W,\rho E\right)}^{\mathrm{T}} \)

$$ {\displaystyle \begin{array}{l}\frac{\partial \mathbf{Q}}{\partial t}+\frac{\partial \mathbf{F}}{\partial x}+\frac{\partial \mathbf{G}}{\partial y}+\frac{\partial \mathbf{H}}{\partial z}=0\kern0.75em \\ {}\mathbf{F}=\int uf{\boldsymbol{\uppsi}}_{\alpha }d\Xi \kern0.75em \mathbf{G}=\int vf{\boldsymbol{\uppsi}}_{\alpha }d\Xi \kern0.5em \mathbf{H}=\int wf{\boldsymbol{\uppsi}}_{\alpha }d\Xi \end{array}} $$

where the conservation constraint or compatibility condition in the following form has been used

$$ \int \left(f-{f}^{+}\right){\boldsymbol{\uppsi}}_{\alpha }d\Xi =0\kern0.5em ,\kern0.5em \alpha =1,2,3,4,5 $$

For a curvilinear coordinate system, applying the finite volume method to Eq. (3) gives

$$ {\displaystyle \begin{array}{l}\Delta \mathbf{Q}=-{V}^{-1}{\int}_{t^n}^{t^{n+1}}\left[\begin{array}{l}{\left(\mathbf{J}\cdot \mathbf{S}\right)}_{i+1/2,j,k}-{\left(\mathbf{J}\cdot \mathbf{S}\right)}_{i-1/2,j,k}\\ {}+{\left(\mathbf{J}\cdot \mathbf{S}\right)}_{i,j+1/2,k}-{\left(\mathbf{J}\cdot \mathbf{S}\right)}_{i,j-1/2,k}\\ {}+{\left(\mathbf{J}\cdot \mathbf{S}\right)}_{i,j,k+1/2}-{\left(\mathbf{J}\cdot \mathbf{S}\right)}_{i,j,k-1/2}\end{array}\right] dt\\ {}\mathbf{J}=\mathbf{Fi}+\mathbf{Gj}+\mathbf{Hk}\end{array}} $$

where V is the cell volume, S and J are the cell face vectors and flux vectors, respectively.

The flux across a cell interface is based on the integral solution of the model equation. When the dissipative flow structure cannot be resolved by the cell size, the scheme acts as a shock-capturing method, and a discontinuous spatial reconstruction with a nonlinear limiter is used to introduce the necessary artificial dissipation. Details can be found in [20]. In this paper, we use the van Leer limiter in the reconstruction. Because the velocity space is discretized, numerical quadrature has to be used to evaluate the various integrals. In this paper, the composite Newton-Cotes (N − C) quadrature is adopted.
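The quadrature step can be illustrated with a small Python sketch. It assumes that the composite Newton-Cotes rule is the five-point closed rule (Boole's rule), which is consistent with the 4N + 1 point requirement mentioned in Section 2.4 but is not stated explicitly; the 1D Maxwellian and all function names are illustrative.

```python
import math

def composite_boole_weights(n_panels, h):
    """Weights of the composite five-point closed Newton-Cotes rule
    (Boole's rule) on 4*n_panels + 1 equally spaced nodes of spacing h."""
    pattern = [7.0, 32.0, 12.0, 32.0, 7.0]
    weights = [0.0] * (4 * n_panels + 1)
    for panel in range(n_panels):
        for j, coeff in enumerate(pattern):
            # Shared panel end points accumulate 7 + 7 = 14.
            weights[4 * panel + j] += 2.0 * h / 45.0 * coeff
    return weights

def velocity_moments_1d(f_values, nodes, weights):
    """Discrete density, momentum and energy moments of a 1D distribution."""
    rho = sum(w * f for w, f in zip(weights, f_values))
    mom = sum(w * f * u for w, f, u in zip(weights, f_values, nodes))
    ene = sum(w * f * 0.5 * u * u for w, f, u in zip(weights, f_values, nodes))
    return rho, mom, ene

# Example: 1D Maxwellian with rho = 1, U = 0.5, lambda = 1 on [-8, 8].
n_panels, u_min, u_max = 32, -8.0, 8.0
h = (u_max - u_min) / (4 * n_panels)
nodes = [u_min + i * h for i in range(4 * n_panels + 1)]
weights = composite_boole_weights(n_panels, h)
maxwellian = [math.sqrt(1.0 / math.pi) * math.exp(-(u - 0.5) ** 2) for u in nodes]
rho, mom, ene = velocity_moments_1d(maxwellian, nodes, weights)
print(rho, mom, ene)
```

On this fine mesh the three moments agree closely with the analytical values 1, 0.5 and 0.375; on coarser meshes the quadrature error becomes visible, which is the issue addressed in Section 2.2.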

The Rykov model [12] for diatomic gases is also implemented in our UGKS code package. The corresponding details are omitted.

2.2 Conservative discrete ordinate method [23]

The compatibility condition Eq. (4) is the basis for the governing Eq. (3). However, once the discrete ordinate method (DOM) is introduced and the velocity space is discretized, Eq. (4) no longer holds and becomes

$$ \int \left(f-{f}^{+}\right){\boldsymbol{\uppsi}}_{\alpha }d\Xi =\mathbf{Err}\left(N-C\right) $$

Here Err is the error introduced by the numerical quadrature. Err can be reduced to a certain extent by refining the velocity space mesh, but it eventually levels off at a value determined by the intrinsic accuracy of the quadrature rule.

This numerical error results in a source term in the governing Eq. (5). Over one time step the source term takes the form \( {\int}_{t^{\zeta}}^{t^{\zeta +1}}\left[\frac{1}{\tau}\mathbf{Err}\left(N-C\right)\right] dt \), and its time average is

$$ \mathbf{SS}=\frac{1}{\Delta t}{\int}_{t^{\zeta}}^{t^{\zeta +1}}\left[\frac{1}{\tau}\mathbf{Err}\left(N-C\right)\right] dt\approx {\left[\frac{1}{\tau}\mathbf{Err}\left(N-C\right)\right]}^{\zeta +1} $$

Here Δt is the marching time step. The five components of SS correspond to the governing equations of mass, momentum in the x, y and z directions and the energy, respectively. After some simple derivations we can get

$$ \tau =\frac{\mu }{p\kern-0.2em {\operatorname{Re}}_{\infty }}\propto \frac{Kn_{\infty }}{M_{\infty }}\frac{\mu }{p} $$

From Eq. (7) and Eq. (8) we can see that SS is related to free-stream condition and numerical quadrature.

In order to eliminate the numerical source term completely, we introduce the conservative discrete ordinate method (CDOM) proposed by Titarev [24] into UGKS. It starts from the moment relations

$$ \iiint \frac{f^{+}-f}{\tau }{\psi}_1^{\mathrm{T}} dudvdw=\frac{1}{\tau }{\left(0,0,0,0,0,-2/3{q}_x,-2/3{q}_y,-2/3{q}_z\right)}^{\mathrm{T}} $$


$$ {\displaystyle \begin{array}{l}{\psi}_1^{\mathrm{T}}={\left(1,u,v,w,\frac{1}{2}\left({u}^2+{v}^2+{w}^2\right),\frac{1}{2}\left(u-U\right){\overrightarrow{c}}^2,\frac{1}{2}\left(v-V\right){\overrightarrow{c}}^2,\frac{1}{2}\left(w-W\right){\overrightarrow{c}}^2\right)}^{\mathrm{T}}\\ {}{\overrightarrow{c}}^2={\left(u-U\right)}^2+{\left(v-V\right)}^2+{\left(w-W\right)}^2\end{array}} $$

The first five equations in (9) represent conservation of mass, momentum and energy during collision process. In discretised velocity space, the multiple integral is replaced by numerical quadratures. If the equilibrium distribution function remains in the form given in section 2.1, Eq. (9) no longer holds due to numerical error of quadratures. In other words, the conservation property will not be maintained.

Substituting the expression \( \iiint f{\psi}_1^{\mathrm{T}} dudvdw={\left(\rho, \rho U,\rho V,\rho W,\rho E,{q}_x,{q}_y,{q}_z\right)}^{\mathrm{T}} \) into Eq. (9), we obtain a new Eq. (10), which can be solved by the Newton iteration method. The initial guess is set to (ρ, U, V, W, λ, qx, qy, qz), and the iteration yields a new group of variables \( \left({\rho}^{\prime },{U}^{\prime },{V}^{\prime },{W}^{\prime },{\lambda}^{\prime },{q}_x^{\prime },{q}_y^{\prime },{q}_z^{\prime}\right) \).

$$ \sum {f}^{+}{\psi}_2^{\mathrm{T}}-{\left(\rho, \rho U,\rho V,\rho W,\rho E,{q}_x,{q}_y,{q}_z\right)}^{\mathrm{T}}={\left(0,0,0,0,0,-2/3{q}_x,-2/3{q}_y,-2/3{q}_z\right)}^{\mathrm{T}} $$


$$ {\displaystyle \begin{array}{l}{\psi}_2^{\mathrm{T}}={\left(1,u,v,w,\frac{1}{2}\left({u}^2+{v}^2+{w}^2\right),\frac{1}{2}\left(u-{U}^{\prime}\right){{\overrightarrow{c}}^{\prime}}^2,\frac{1}{2}\left(v-{V}^{\prime}\right){{\overrightarrow{c}}^{\prime}}^2,\frac{1}{2}\left(w-{W}^{\prime}\right){{\overrightarrow{c}}^{\prime}}^2\right)}^{\mathrm{T}}\\ {}{{\overrightarrow{c}}^{\prime}}^2={\left(u-{U}^{\prime}\right)}^2+{\left(v-{V}^{\prime}\right)}^2+{\left(w-{W}^{\prime}\right)}^2\end{array}} $$

Here the symbol ∑ indicates that numerical quadratures are used. With the discrete f+ determined by the above group of variables, the conservation property holds and the numerical source term Err goes to machine zero, which has been validated in numerical experiments.
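The idea behind Eq. (10) can be sketched in a strongly simplified setting. The Python example below is an assumption-laden illustration, not the solver's implementation: it treats a 1D Maxwellian with three unknowns (ρ′, U′, λ′) and three moments instead of the eight-moment Shakhov system, uses a rectangle rule on a deliberately coarse velocity mesh, and builds the Newton Jacobian by finite differences. All names are hypothetical.

```python
import math

def discrete_moments(params, nodes, weights):
    """Quadrature moments (rho, rho*U, rho*E) of a 1D Maxwellian."""
    rho, U, lam = params
    m = [0.0, 0.0, 0.0]
    for u, w in zip(nodes, weights):
        g = rho * math.sqrt(lam / math.pi) * math.exp(-lam * (u - U) ** 2)
        m[0] += w * g
        m[1] += w * g * u
        m[2] += w * g * 0.5 * u * u
    return m

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = s / M[r][r]
    return x

def conservative_parameters(target, nodes, weights, guess, tol=1e-13, max_iter=30):
    """Newton iteration for (rho', U', lam') whose discrete moments equal target."""
    x = list(guess)
    for _ in range(max_iter):
        base = discrete_moments(x, nodes, weights)
        residual = [b - t for b, t in zip(base, target)]
        if max(abs(r) for r in residual) < tol:
            break
        cols = []  # finite-difference Jacobian, one column per unknown
        for j in range(3):
            step = 1e-7 * max(1.0, abs(x[j]))
            xp = x[:]
            xp[j] += step
            pert = discrete_moments(xp, nodes, weights)
            cols.append([(p - b) / step for p, b in zip(pert, base)])
        J = [[cols[j][i] for j in range(3)] for i in range(3)]
        dx = solve_linear(J, [-r for r in residual])
        x = [xi + di for xi, di in zip(x, dx)]
    return x

# Coarse velocity mesh (spacing 1) so that the quadrature error is visible.
nodes = [float(k) for k in range(-8, 9)]
weights = [1.0] * len(nodes)
rho, U, lam = 1.0, 0.3, 1.0
target = [rho, rho * U, 0.5 * rho * (U * U + 0.5 / lam)]
before = discrete_moments([rho, U, lam], nodes, weights)
err_before = max(abs(m - t) for m, t in zip(before, target))
corrected = conservative_parameters(target, nodes, weights, [rho, U, lam])
after = discrete_moments(corrected, nodes, weights)
err_after = max(abs(m - t) for m, t in zip(after, target))
print(err_before, err_after)
```

In this toy setting the discrete moments of the uncorrected Maxwellian miss the target by a small but clearly non-zero amount, while the corrected parameters reproduce the target to round-off level, which mirrors the behaviour described for Err above.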

The UGKS in Section 2.1 is second-order accurate. The procedure in this section only changes the form of the heat-flux-modified equilibrium state. The spatial reconstruction and the evaluation of the numerical flux remain unchanged. Thus, CDOM affects neither the spatial accuracy nor the coupling of particle transport and collision.

2.3 Implicit UGKS [25]

The governing equation in a physical control volume (i,j,k), at the velocity mesh point \( {\mathbf{u}}_{l,m,n}=\left({u}_l,{v}_m,{w}_n\right) \), is given by

$$ \frac{\partial {f}_{i,j,k,l,m,n}}{\partial t}+{u}_l\frac{\partial {f}_{i,j,k,l,m,n}}{\partial x}+{v}_m\frac{\partial {f}_{i,j,k,l,m,n}}{\partial y}+{w}_n\frac{\partial {f}_{i,j,k,l,m,n}}{\partial z}=\frac{\left({f}_{i,j,k,l,m,n}^{+}-{f}_{i,j,k,l,m,n}\right)}{\tau } $$

Define \( \Delta f={f}^{\zeta +1}-{f}^{\zeta } \) and \( \Delta t={t}^{\zeta +1}-{t}^{\zeta } \); then the implicit method reads

$$ {\displaystyle \begin{array}{l}\left(1+\Delta t\cdot \frac{1}{\tau^{\zeta }}+\Delta t\cdot {\mathbf{u}}_{l,m,n}\cdot \nabla \right){\left(\Delta f\right)}_{i,j,k,l,m,n}=\Delta t\cdot {R}_{i,j,k,l,m,n}^{\zeta}\\ {}{R}_{i,j,k,l,m,n}^{\zeta }=-{u}_l\frac{\partial {f}_{i,j,k,l,m,n}^{\zeta }}{\partial x}-{v}_m\frac{\partial {f}_{i,j,k,l,m,n}^{\zeta }}{\partial y}-{w}_n\frac{\partial {f}_{i,j,k,l,m,n}^{\zeta }}{\partial z}+\frac{1}{\tau^{\zeta }}\left({f}^{+}-f\right)\\ {}\kern3.5em =-{R}^{\prime }+\frac{1}{\tau^{\zeta }}\left({f}^{+}-f\right)\end{array}} $$

where R' is the evolving time averaged flux which can be written as

$$ {R}^{\prime }=\frac{\int_0^{\Delta tt}\sum \limits_{ii=1}^6{u}_{\mathrm{n}}{f}_p(t) dt}{\Delta tt} $$

where \( {u}_{\mathrm{n}}={\mathbf{u}}_{l,m,n}\cdot {\mathbf{n}}_{ii} \) and \( {\mathbf{n}}_{ii} \) is the unit vector normal to the ii-th cell interface. The evolving time step \( \Delta tt \) is different from the marching time step \( \Delta t \). Based on numerical experiments, we propose the following criterion to determine \( \Delta tt \)

$$ \Delta tt<\Delta {t}_{\mathrm{min}}/ CFL $$

where \( \Delta {t}_{\mathrm{min}} \) is the minimum time step in the whole field determined by the CFL condition.

Eq. (12) can be rewritten in the following form

$$ {\displaystyle \begin{array}{l}\left(1+\Delta t\cdot \frac{1}{\tau^{\zeta }}\right){\left(\Delta f\right)}_{i,j,k,l,m,n}+\frac{\Delta t}{\left|{V}_{i,j,k}\right|}\sum \limits_{ii=1}^6\left({\mathbf{u}}_{l,m,n}\cdot {\mathbf{n}}_{ii}\right)\cdot \left|{S}_{i,j,k, ii}\right|\cdot FF\left({\left(\Delta f\right)}_{i,j,k,l,m,n},{\left(\Delta f\right)}_{i1,j1,k1,l,m,n}\right)\\ {}=\Delta t\cdot {R}_{i,j,k,l,m,n}^{\zeta}\end{array}} $$

where the subscript (i1,j1,k1) denotes the cell sharing the ii-th interface with cell (i,j,k). The quantity FF can be expressed as

$$ {\displaystyle \begin{array}{l} FF\left({\left(\Delta f\right)}_{i,j,k,l,m,n},{\left(\Delta f\right)}_{i1,j1,k1,l,m,n}\right)\\ {}=\frac{1}{2}\left[{\left(\Delta f\right)}_{i,j,k,l,m,n}+{\left(\Delta f\right)}_{i1,j1,k1,l,m,n}\right]+\frac{1}{2}\mathit{\operatorname{sign}}\left({\mathbf{u}}_{l,m,n}\cdot {\mathbf{n}}_{ii}\right)\left[{\left(\Delta f\right)}_{i,j,k,l,m,n}-{\left(\Delta f\right)}_{i1,j1,k1,l,m,n}\right]\end{array}} $$

Substituting the above expression into Eq. (15) we can get

$$ {\displaystyle \begin{array}{l}\left[1+\Delta t\cdot \frac{1}{\tau^{\zeta }}+\Delta t\cdot {b}_{i,j,k,l,m,n}\right]{\left(\Delta f\right)}_{i,j,k,l,m,n}+\sum \limits_{ii=1}^6\Delta t\cdot {c}_{i,j,k,l,m,n}\cdot {\left(\Delta f\right)}_{i1,j1,k1,l,m,n}=\Delta t\cdot {R}_{i,j,k,l,m,n}^{\zeta}\\ {}{b}_{i,j,k,l,m,n}=\sum \limits_{ii=1}^6\left({\mathbf{u}}_{l,m,n}\cdot {\mathbf{n}}_{ii}\right)\cdot \left(1+\mathit{\operatorname{sign}}\left({\mathbf{u}}_{l,m,n}\cdot {\mathbf{n}}_{ii}\right)\right)\frac{\left|{S}_{i,j,k, ii}\right|}{2\left|{V}_{i,j,k}\right|}\\ {}{c}_{i,j,k,l,m,n}=\left({\mathbf{u}}_{l,m,n}\cdot {\mathbf{n}}_{ii}\right)\cdot \left(1-\mathit{\operatorname{sign}}\left({\mathbf{u}}_{l,m,n}\cdot {\mathbf{n}}_{ii}\right)\right)\frac{\left|{S}_{i,j,k, ii}\right|}{2\left|{V}_{i,j,k}\right|}\end{array}} $$
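The upwind selection FF and the coefficients b and c of Eq. (16) can be written compactly. The Python sketch below is illustrative only; the function names are hypothetical, and the face list may have any length (six faces for a 3D hexahedral cell).

```python
def sign(x):
    """Sign function with sign(0) = 0."""
    return (x > 0) - (x < 0)

def upwind_increment(un, df_cell, df_neighbor):
    """FF: interface value of Delta f selected by the sign of un = u . n."""
    return 0.5 * (df_cell + df_neighbor) + 0.5 * sign(un) * (df_cell - df_neighbor)

def implicit_coefficients(un_faces, areas, volume):
    """Diagonal coefficient b and per-face off-diagonal coefficients c."""
    b = sum(un * (1 + sign(un)) * s / (2.0 * volume)
            for un, s in zip(un_faces, areas))
    c = [un * (1 - sign(un)) * s / (2.0 * volume)
         for un, s in zip(un_faces, areas)]
    return b, c

# Outflow face (un > 0) takes the cell's own increment, inflow face the neighbor's.
print(upwind_increment(2.0, 1.0, 5.0), upwind_increment(-2.0, 1.0, 5.0))
# One-dimensional-like cell: outflow through face 0, inflow through face 1.
print(implicit_coefficients([1.0, -1.0], [1.0, 1.0], 0.5))
```

Because \( {u}_{\mathrm{n}}\left(1+\operatorname{sign}\left({u}_{\mathrm{n}}\right)\right)\ge 0 \) and \( {u}_{\mathrm{n}}\left(1-\operatorname{sign}\left({u}_{\mathrm{n}}\right)\right)\le 0 \), b is non-negative and every c is non-positive, so only outflow faces contribute to the diagonal and only inflow faces couple the cell to its neighbors.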

Writing Eq. (16) in matrix form

$$ \left(\mathbf{I}+\Delta t\cdot {\mathbf{Z}}_{l,m,n}\right)\cdot {\left(\Delta \mathbf{f}\right)}_{l,m,n}=\Delta t\cdot {\mathbf{X}}_{l,m,n}^{-1}\cdot {\mathbf{R}}_{l,m,n}^{\zeta } $$
$$ {\displaystyle \begin{array}{c}{\left(\Delta \mathbf{f}\right)}_{l,m,n}=\left(\begin{array}{l}{\left(\Delta f\right)}_{1,1,1,l,m,n}\\ {}{\left(\Delta f\right)}_{2,1,1,l,m,n}\\ {}\cdots \\ {}{\left(\Delta f\right)}_{NI-1, NJ-1, NK-1,l,m,n}\end{array}\right)\kern1em {\mathbf{R}}_{l,m,n}^{\zeta }=\left(\begin{array}{l}{R}_{1,1,1,l,m,n}^{\zeta}\\ {}{R}_{2,1,1,l,m,n}^{\zeta}\\ {}\cdots \\ {}{R}_{NI-1, NJ-1, NK-1,l,m,n}^{\zeta}\end{array}\right)\\ {}{\mathbf{X}}_{l,m,n}=\left(\begin{array}{cccc}{\chi}_{1,1,1,l,m,n}& 0& \cdots & 0\\ {}0& {\chi}_{2,1,1,l,m,n}& \cdots & 0\\ {}0& 0& \cdots & 0\\ {}0& 0& \cdots & {\chi}_{NI-1, NJ-1, NK-1,l,m,n}\end{array}\right)\end{array}} $$

where NI, NJ and NK are the numbers of physical mesh points in the i, j and k directions, respectively.

Applying an approximate LU decomposition to \( \left(\mathbf{I}+\Delta t\cdot {\mathbf{Z}}_{l,m,n}\right) \) we can get

$$ \mathbf{I}+\Delta t\cdot {\mathbf{Z}}_{l,m,n}={\mathbf{L}}_{l,m,n}\cdot {\mathbf{U}}_{l,m,n}+O\left(\Delta {t}^2\right) $$

where \( {\mathbf{L}}_{l,m,n} \) and \( {\mathbf{U}}_{l,m,n} \) are triangular matrices with unit diagonal whose entries are given by

$$ {l}_{pq}=\left\{\begin{array}{ll}\Delta t\cdot {z}_{pq}& p<q\\ {}0& p>q\end{array}\right.\kern0.5em {u}_{pq}=\left\{\begin{array}{ll}0& p<q\\ {}\Delta t\cdot {z}_{pq}& p>q\end{array}\right.\kern0.5em {l}_{pp}={u}_{pp}=1 $$

The implicit method in the final form reads

$$ {\mathbf{L}}_{l,m,n}\cdot {\mathbf{U}}_{l,m,n}\cdot {\left(\Delta \mathbf{f}\right)}_{l,m,n}=\Delta t\cdot {\mathbf{X}}_{l,m,n}^{-1}\cdot {\mathbf{R}}_{l,m,n}^{\zeta } $$

On structured meshes, \( {\left(\Delta \mathbf{f}\right)}_{l,m,n} \) is obtained by forward and backward substitution, and \( {f}^{\zeta +1} \) follows directly.
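A small numerical sketch may help to see why the approximate factorization is adequate. The Python code below assumes that Z has a zero diagonal (as suggested by \( {l}_{pp}={u}_{pp}=1 \)) and uses the standard row/column convention in which the strictly lower part of Z goes into the first factor and the strictly upper part into the second; this convention, the 4 × 4 matrix and all names are assumptions made for illustration.

```python
def split_triangular(Z, dt):
    """Unit-diagonal factors L = I + dt*lower(Z) and U = I + dt*upper(Z)."""
    n = len(Z)
    L = [[1.0 if p == q else (dt * Z[p][q] if p > q else 0.0)
          for q in range(n)] for p in range(n)]
    U = [[1.0 if p == q else (dt * Z[p][q] if p < q else 0.0)
          for q in range(n)] for p in range(n)]
    return L, U

def solve_lu(L, U, rhs):
    """Solve L U x = rhs by forward and backward substitution."""
    n = len(rhs)
    y = [0.0] * n
    for p in range(n):                    # forward sweep: L y = rhs
        y[p] = rhs[p] - sum(L[p][q] * y[q] for q in range(p))
    x = [0.0] * n
    for p in range(n - 1, -1, -1):        # backward sweep: U x = y
        x[p] = y[p] - sum(U[p][q] * x[q] for q in range(p + 1, n))
    return x

def factorization_error(Z, dt):
    """Largest entry of |L U - (I + dt Z)| for a zero-diagonal Z."""
    n = len(Z)
    L, U = split_triangular(Z, dt)
    err = 0.0
    for p in range(n):
        for q in range(n):
            lu = sum(L[p][k] * U[k][q] for k in range(n))
            exact = (1.0 if p == q else 0.0) + dt * Z[p][q]
            err = max(err, abs(lu - exact))
    return err

# Hypothetical zero-diagonal coupling matrix for four cells.
Z_demo = [[0.0, -1.0, 0.0, 0.5],
          [-2.0, 0.0, -1.0, 0.0],
          [0.0, -0.5, 0.0, -1.0],
          [1.0, 0.0, -2.0, 0.0]]
print(factorization_error(Z_demo, 0.1), factorization_error(Z_demo, 0.05))
```

Halving Δt reduces the factorization error by a factor of four, in line with the \( O\left(\Delta {t}^2\right) \) remainder, while each solve costs only two triangular sweeps.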

In the above procedure, the gain term f+ in the collision term is treated explicitly. Since UGKS is a multi-scale hybrid method that updates both macroscopic and microscopic variables, the macroscopic variables can be updated implicitly first to provide a predicted f+, resulting in a fully implicit treatment [26] of the collision term. This is very useful for continuum or near-continuum flows.

2.4 Local refinement in the velocity mesh

Generic adaptive mesh refinement (AMR) [27, 28] in velocity can greatly decrease the CPU time and memory requirements for UGKS. However, the resulting velocity meshes are usually different for different spatial cells, making it rather difficult to apply the implicit technique.

In our UGKS solver package, we combine the merits of both methods through the following procedure. First, the bounds and interval of a global uniform velocity mesh are estimated from numerical experience or from a preliminary Navier-Stokes simulation. The lower and upper limits of the velocity mesh in each direction are determined by the highest temperature, which usually appears in the shock layer, while the mesh interval Δv is determined by the lowest temperature in the whole field. Second, a global uniform velocity mesh, which we call the background mesh, is generated with an interval of \( a\cdot \Delta v \), where a is larger than one. Third, a patch is placed on the background velocity mesh for the spatial cells whose velocity mesh interval should be smaller than \( a\cdot \Delta v \). The location of the patch can be determined from the preliminary Navier-Stokes results or even from UGKS results obtained on the background velocity mesh. The resulting velocity mesh is still structured, so the implicit method can be applied without difficulty.

The only remaining difficulty is the interpolation of the distribution function from the background mesh to the patch. We use the following conservative method. Taking the 1D case as an example, the composite Newton-Cotes quadrature requires the total number of velocity points to be 4N + 1, where N is a positive integer. An interpolation polynomial can be constructed from the five values of the distribution function that are equally spaced on a block of four successive intervals of the velocity mesh. Since the Newton-Cotes quadrature coefficients are derived from the same polynomial, the interpolation and the quadrature are consistent. It can be proved that mass, momentum and energy are conserved when the original equally spaced 5-point mesh is extended to an equally spaced 9-point mesh. For 2D or 3D cases, a block of 5 × 5 or 5 × 5 × 5 points can be extended to 9 × 9 or 9 × 9 × 9 points in the same way. The conservation property can be verified with mathematical software such as MAPLE.

We have applied this technique to a 2D case with a jet on a blunt cone. The free-stream Mach number is 8.1 at an altitude of 90 km. The jet conditions are \( {\rho}_j \) = 7.468e-3, \( {u}_j={c}_j \), \( {p}_j \) = 373 Pa and \( {T}_j \) = 240 K. The pressure ratio of the jet to the free stream is about 2000. For the jet-off case, a velocity mesh of 121 × 121 is sufficient. For the jet-on case, the local temperature drops sharply due to the rapid expansion from the jet exit. Figure 1 shows the temperature contour. The temperature downstream of the jet near pts4 is about one order of magnitude lower than the free-stream temperature. Thus, it is necessary to refine the velocity mesh in order to resolve the corresponding distribution function. Based on preliminary UGKS results, we choose 9 blocks of 5 × 5 sub-mesh and extend each of them to a 9 × 9 sub-mesh. The final distribution function and the velocity mesh are shown in Fig. 2.

Fig. 1 Temperature contour for the jet case

Fig. 2 Distribution function at pts4

In this case, a globally uniform mesh with the refined interval would require 241 × 241 velocity points. With the local refinement technique, the total number of points is 121 × 121 + 9 × (9 × 9 - 5 × 5) = 15,145, which is only about 1/3.8 of the former.
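The point count above is easy to reproduce. The short Python sketch below simply repeats the arithmetic; the helper name and its default block sizes are illustrative.

```python
def refined_mesh_points(n_background, n_blocks, coarse_block=5, fine_block=9, dim=2):
    """Velocity points of a background mesh plus locally refined blocks."""
    background = n_background ** dim
    extra_per_block = fine_block ** dim - coarse_block ** dim
    return background + n_blocks * extra_per_block

local_total = refined_mesh_points(121, 9)   # 121 x 121 background, 9 refined blocks
uniform_total = 241 ** 2                    # globally refined 241 x 241 mesh
ratio = uniform_total / local_total
print(local_total, uniform_total, round(ratio, 1))
```

Each refined block adds 9 × 9 − 5 × 5 = 56 points, so the nine patches add only 504 points to the 14,641 background points, whereas the globally refined mesh needs 58,081 points.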

2.5 Parallelization

At present, a hybrid parallelization similar to that in [16] is used. The spatial mesh is decomposed and parallelized with MPI, as is common in traditional CFD codes. Within every MPI process, several OpenMP threads are used. However, due to the architecture change of our new supercomputing cluster, a decomposition over three spatial dimensions and one velocity dimension is under development, which should allow a larger parallel scale of up to 10,000 cores in the near future.

3 Code framework

The UGKS solver package is based on the framework of our in-house NS solver, CARDC Hypersonic Aerodynamic Numerical Tunnel (CHANT) [29]. Figure 3 shows the general sketch. The whole package is composed of five parts: input, output, initialization, control and calculation. The flow field around a given configuration is obtained by sweeping over all structured blocks one by one. A multi-stage interface is provided for further development. Fortran 90 is used for all subroutines.

Fig. 3 Framework of UGKS solver package

The current features of UGKS solver package can be summarized as follows:

  • 2D and 3D body-fitted structured multi-block mesh

  • Steady and unsteady simulations

  • Explicit and implicit methods

  • Conservative discrete ordinate method

  • Local refinement in velocity mesh

  • Shakhov model for monatomic gases

  • Rykov model for diatomic gases

  • Diffuse or specular reflection wall boundaries, free-stream boundary, outflow boundary, symmetrical boundary

  • Several models for the viscosity calculation such as hard sphere model, variable hard sphere model [30] or the Sutherland model

  • Hybrid parallelization with MPI and OpenMP

4 Validation cases

Five test cases are considered. UGKS results are compared with those obtained from DS2V [5], MONACO [31], RariHV [32] or experiments. A fully diffuse wall boundary is used. In all cases, the global Knudsen number Kn is defined as

$$ Kn=\frac{\lambda }{\overline{L}} $$

where λ is the mean free path, which is determined for either hard sphere (HS) molecules [30],

$$ \lambda =\frac{16}{5}\sqrt{\frac{m}{2\pi kT}}\frac{\mu }{\rho } $$

or variable hard sphere (VHS) molecules

$$ \lambda =\frac{2\left(5-2\omega \right)\left(7-2\omega \right)}{15}\sqrt{\frac{m}{2\pi kT}}\frac{\mu }{\rho } $$

where ω is the power-law index of the viscosity, m is the atomic mass, and k is the Boltzmann constant.
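The two mean-free-path definitions and the global Knudsen number can be evaluated as in the following Python sketch. The function names are illustrative, all inputs are assumed to be in SI units, and the numerical values in the usage lines are placeholders rather than the conditions of Table 1.

```python
import math

BOLTZMANN = 1.380649e-23  # Boltzmann constant k in J/K

def mean_free_path(mu, rho, T, m, omega=None):
    """Mean free path for HS molecules (omega is None) or VHS molecules.

    mu: viscosity, rho: density, T: temperature, m: molecular mass,
    omega: power-law index of the viscosity (VHS only).
    """
    if omega is None:
        factor = 16.0 / 5.0
    else:
        factor = 2.0 * (5.0 - 2.0 * omega) * (7.0 - 2.0 * omega) / 15.0
    return factor * math.sqrt(m / (2.0 * math.pi * BOLTZMANN * T)) * mu / rho

def knudsen_number(mu, rho, T, m, length, omega=None):
    """Global Knudsen number Kn = lambda / L."""
    return mean_free_path(mu, rho, T, m, omega) / length

# Placeholder free-stream state (illustrative values only).
lam_hs = mean_free_path(2.117e-5, 1.0e-5, 273.0, 6.63e-26)
lam_vhs = mean_free_path(2.117e-5, 1.0e-5, 273.0, 6.63e-26, omega=0.81)
print(lam_hs, lam_vhs, knudsen_number(2.117e-5, 1.0e-5, 273.0, 6.63e-26, 1.0))
```

For ω = 0.5 the VHS prefactor 2(5 − 2ω)(7 − 2ω)/15 equals 16/5, so the VHS expression reduces to the HS one, which provides a simple consistency check.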

The main free-stream conditions for all cases are summarized in Table 1.

Table 1 Free-stream conditions

4.1 Hypersonic flow over a 40° wedge

The angle of attack is 10 degrees. Figure 4 shows the pressure contour predicted by UGKS. Figures 5, 6 and 7 display the pressure, heat flux and shear stress distributions on the surface, respectively. The UGKS and DS2V results are almost identical, indicating that the UGKS code package and DS2V predict these flows with similar accuracy.

Fig. 4 Pressure contour of the wedge

Fig. 5 Pressure distribution on the wedge surface

Fig. 6 Heat flux distribution on the wedge surface

Fig. 7 Shear stress distribution on the wedge surface

4.2 Super and hypersonic flows over a 2D cylinder

This is a comprehensive test case covering supersonic and hypersonic flows in all regimes. We also use this case to validate the CDOM and implicit techniques described in Section 2.

For Mach number 10, both DOM and CDOM calculations are conducted. Figure 8 shows the variable |SS(1)| in the cells adjacent to the wall for different velocity space meshes. As the velocity space mesh is refined, the numerical source term decreases but eventually levels off, so refining the velocity space mesh alone cannot eliminate the source term. With CDOM, however, the source term is on the order of \( {10}^{-14} \) to \( {10}^{-15} \). The total drag at different velocity space meshes is given in Fig. 9. The mesh dependence with CDOM is clearly much smaller than that with DOM. The solution on the 61 × 61 mesh with CDOM can be considered mesh-converged, while with DOM the same result can only be obtained on a much finer mesh of 121 × 121. Since 61 × 61 is about one quarter of 121 × 121, the time and memory cost decreases by nearly three quarters with the help of CDOM.

Fig. 8 |SS(1)| on the cylinder surface

Fig. 9 Cylinder drag coefficient vs. number of points in the u (v) direction

Figures 10 and 11 show the convergence histories of the drag coefficient and the residual, respectively, for Mach number 5 and Knudsen number 0.01. A comparison of the convergence rates of the explicit and implicit methods is given in Table 2. Nc.E and Nc.I are the total numbers of iteration steps required for a converged solution with the explicit and implicit methods, respectively. Rs is the speed-up ratio of the implicit method, where the denominator 1.02 accounts for the fact that one time step of the implicit method costs about 2% more than one time step of the explicit method. A speed-up ratio of nearly two orders of magnitude can be achieved.
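One plausible reading of this definition is Rs = Nc.E / (1.02 × Nc.I). The Python sketch below encodes that reading; the iteration counts in the usage line are hypothetical and are not the values of Table 2.

```python
def speed_up_ratio(n_explicit, n_implicit, cost_factor=1.02):
    """Speed-up of the implicit method, assuming each implicit step costs
    cost_factor times as much as an explicit step."""
    return n_explicit / (cost_factor * n_implicit)

# Hypothetical iteration counts, for illustration only.
print(speed_up_ratio(102000, 1000))
```

Under this reading, the 2% overhead per step only marginally reduces the gain from the much smaller number of iterations.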

Fig. 10 Convergence histories of the drag coefficient with Ma = 5.0 and Kn = 0.01

Fig. 11 Convergence histories of the residual with Ma = 5.0 and Kn = 0.01

Table 2 Comparison of the explicit and implicit methods in convergence rate

Figures 12, 13, 14 and 15 show the comparisons between UGKS and DS2V for diatomic nitrogen gas. The UGKS results are obtained with the Rykov model, which includes the rotational degrees of freedom. Thus, the heat flux can be divided into two parts, namely the contributions of the translational and rotational degrees of freedom. Good agreement is observed, providing a sound validation of our UGKS code for diatomic gases.

Fig. 12 Pressure distributions on the cylinder surface at M = 1.96

Fig. 13 Slip velocity distributions on the cylinder surface at M = 1.96

Fig. 14 Incident heat flux distributions on the cylinder surface at M = 1.96

Fig. 15 Reflective heat flux distributions on the cylinder surface at M = 1.96

Figures 16 and 17 are the results for Mach number 5. Figures 18 and 19 are the results for Mach number 10. Figures 20 and 21 are the results for Mach number 25. We omit some comparisons at certain Mach numbers because of space limitations.

Fig. 16 Pressure distributions on the cylinder surface at M = 5

Fig. 17 Heat flux distributions on the cylinder surface at M = 5

Fig. 18 Pressure distributions on the cylinder surface at M = 10

Fig. 19 Shear stress distributions on the cylinder surface at M = 10

Fig. 20 Pressure distributions on the cylinder surface at M = 25

Fig. 21 Heat flux distributions on the cylinder surface at M = 25

Table 3 gives the drag coefficient comparisons. The maximum relative error is only 2.03%.

Table 3 Comparisons of cylinder drag

4.3 Hypersonic flow over a 2D cone

Figure 22 gives the computational configuration. The angle of attack is 0 degrees. The pressure contour and streamlines are shown in Fig. 23. The altitude in the figure is only ‘nominal’, which means that only the temperature and number density at the corresponding altitude are used, since the air is treated as a monatomic gas. In other words, internal degrees of freedom are ignored. The two global Knudsen numbers in Table 1 for the cone case correspond to nominal altitudes of 60 km and 85 km, respectively. The flow pattern is relatively simple, i.e., a bow shock in front of the blunt body and a vortex at the bottom similar to that behind a backward-facing step. However, the bow shock in the 85 km case is much weaker than that in the 60 km case, and the recirculation zone at the bottom is smaller as well.

Fig. 22 Cone model

Fig. 23 Pressure contour and streamlines

Figures 24, 25 and 26 show the pressure, heat flux, and shear stress distributions on the cone surface, respectively. The abscissa is the distance along the surface measured from the nose of the cone. The bottom pressure at 60 km rises by about one order of magnitude from the corner to the center of the bottom, resulting in a large adverse pressure gradient and inducing a large separation. At 85 km, the pressure curve is rather flat and only a small adverse pressure gradient occurs. Moreover, the minimum pressure, heat flux and shear stress at the bottom are almost three orders of magnitude lower than the maximum values on the cone. UGKS captures these phenomena as accurately as DS2V.

Fig. 24 Pressure distributions on the cone surface (a) Body (b) Bottom

Fig. 25 Heat flux distributions on the cone surface (a) Body (b) Bottom

Fig. 26 Shear stress distributions on the cone surface (a) Body (b) Bottom

4.4 Supersonic and hypersonic flows over a sphere

The flow past a sphere is simulated with the Rykov model and compared with the experimental drag coefficients [33]. The spatial mesh contains 21,840 cells, and a velocity mesh of 41 × 41 × 41 is used.

Figure 27 shows the pressure contour for two cases. When the Knudsen number is large, the gradients of the flow variables are small throughout the field, and there is only a weak compression wave in front of the sphere.

Fig. 27 Pressure contour on the symmetry plane and the sphere surface (a) M = 4.25, Kn = 0.031 (b) M = 5.45, Kn = 1.96

Table 4 gives the drag coefficient comparisons. The maximum relative error is only 2.64%. The agreements can be considered as excellent since the root mean square (RMS) error of the experiments is about ±2%.

Table 4 Comparisons of sphere drag

4.5 Supersonic and hypersonic flows over an X38-like vehicle

The angle of attack is 20 degrees in this case. The spatial mesh contains 334,434 cells, and a velocity mesh of 33 × 33 × 33 is used. The total six-dimensional mesh reaches \( 1.2\times {10}^{10} \) points. The reference area for the aerodynamic coefficients is \( 2.41\times {10}^{-2} \) m\( {}^2 \).
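The size of the six-dimensional mesh follows directly from the two mesh sizes, as the brief Python check below shows. The memory figure assumes one double-precision value (8 bytes) per phase-space point, which is an assumption made here for illustration and not a statement about the solver's actual storage.

```python
def phase_space_points(n_cells, n_velocity_per_direction, velocity_dims=3):
    """Total number of discrete phase-space points."""
    return n_cells * n_velocity_per_direction ** velocity_dims

total = phase_space_points(334434, 33)
gigabytes = total * 8 / 1.0e9   # assumed 8 bytes per stored value of f
print(total, round(gigabytes, 1))
```

Even a single copy of the distribution function at this resolution therefore occupies on the order of 100 GB under this assumption, which is why parallelization and velocity mesh refinement matter for 3D cases.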

Figure 28 gives the spatial streamlines around the vehicle with Mach number 4. When the free-stream Knudsen number is relatively small, the adverse pressure gradient can be large enough to induce the flow to separate from the boundary, resulting in the vortex in Fig. 28(a).

Fig. 28 Spatial streamlines around the X38-like vehicle (a) M = 4, Kn = 8.41e-5 (b) M = 4, Kn = 8.41e-3

Figure 29 shows the local Knudsen number distribution near the surface. The local Knudsen number is calculated through Eq. (19) with the characteristic length \( \overline{L} \) replaced by the local gradient length Q/|dQ/dl| of a flow quantity Q, as proposed by Boyd [34]. In this paper, the density-based gradient length is used. The local Knudsen number spans four to five orders of magnitude. Thus, a multi-scale method such as UGKS is needed in order to simulate these flow fields correctly.

Fig. 29 Local Knudsen number distribution (a) M = 6, Kn = 1.26e-4 (b) M = 6, Kn = 1.26e-2
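For the density-based choice Q = ρ, the gradient-length local Knudsen number is λ|dρ/dl|/ρ. The Python sketch below evaluates it along a uniformly sampled 1D line with central differences in the interior and one-sided differences at the ends; the sampling, the function name and the test profile are assumptions made for illustration.

```python
import math

def local_knudsen_gll(mean_free_paths, density, dx):
    """Density-based gradient-length Knudsen number on a uniform 1D line:
    Kn_GLL = lambda * |d rho / dl| / rho."""
    n = len(density)
    kn = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        grad = (density[hi] - density[lo]) / ((hi - lo) * dx)
        kn.append(mean_free_paths[i] * abs(grad) / density[i])
    return kn

# Exponential density profile: |d rho / dl| / rho = 1, so Kn_GLL is close to lambda.
dx = 0.01
rho_line = [math.exp(i * dx) for i in range(11)]
lam_line = [0.2] * 11
kn_line = local_knudsen_gll(lam_line, rho_line, dx)
print(kn_line[5])
```

In regions of uniform density the gradient length becomes unbounded and the local Knudsen number tends to zero, whereas steep gradients such as shocks or expansion corners produce large local values even when the global Knudsen number is small.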

Table 5 compares the aerodynamic coefficients for Mach number 8. The DSMC results are obtained with RariHV, an in-house DSMC code based on unstructured meshes developed in our group. The maximum relative error is only 2.27%.

Table 5 Comparisons of X38-like coefficients with Mach number 8

5 Conclusions

Our UGKS solver package has been introduced, together with the main numerical techniques for improving its efficiency and accuracy, such as the implicit method and the local mesh refinement technique in the velocity space. The package is designed for simulating flow fields around complex configurations in all flow regimes.

Several validations have been conducted through comprehensive comparisons with industry-standard DSMC codes and experimental results, covering the pressure, heat flux, shear stress and aerodynamic coefficients for supersonic and hypersonic flows in almost all regimes. The agreement is satisfactory in all cases.

Future work includes further applications to 3D complex configurations and complex flows, improvement of the physical models to account for vibrational degrees of freedom, implementation of models for gas mixtures, and further gains in computational efficiency and accuracy.