Numerical aspects of a Godunov-type stabilization scheme for the Boltzmann transport equation

We discuss the numerical aspects of the Boltzmann transport equation (BE) for electrons in semiconductor devices, which is stabilized by Godunov’s scheme. The k-space is discretized with a grid based on the total energy to suppress spurious diffusion in the stationary case. Band structures of arbitrary shape can be handled. In the stationary case, the discrete BE yields always nonnegative distribution functions and the corresponding system matrix has only eigenvalues with positive real parts (diagonally dominant matrix) resulting in an excellent numerical stability. In the transient case, this property yields an upper limit for the time step ensuring the stability of the CPU-efficient forward Euler scheme and a positive distribution function. Similar to the Monte-Carlo (MC) method, the discrete BE can be solved in time together with the Poisson equation (PE), where the time steps for the PE are split into shorter time steps for the BE, which can be performed at minor additional computational cost. Thus, similar to the MC method, the transient approach is matrix-free and the solution of memory and CPU intensive large systems of linear equations is avoided. The numerical properties of the approach are demonstrated for a silicon nanowire NMOSFET, for which the stationary I–V characteristics, small-signal admittance parameters and the switching behavior are simulated with and without strong scattering. The spurious damping introduced by Godunov’s (upwind) scheme is found to be negligible in the technically relevant frequency range. The inherent asymmetry of the upwind scheme results in an error for very strong scattering that can be alleviated by a finer grid in transport direction.


Introduction
The semiconductor industry is moving toward nanowire gate-all-around MOSFETs to improve the gate control [1][2][3]. Due to the small wire diameter, short gate lengths are feasible, but this does not mean that the transport is quasi-ballistic and scattering should be included in the simulations [4,5]. Furthermore, transient simulations are required to determine the switching speed and RF performance of those devices. These types of simulations are very CPU intensive, when performed by quantum transport models (QTM) including scattering, and the drift-diffusion model (DDM) calibrated with results of a QTM is often used instead for device design (e.g., [5]). The DDM is not only by many orders of magnitude more CPU efficient than the QTMs, but also numerically very stable [6,7]. Sophisticated algorithms exist to routinely obtain convergence of the nonlinear system of partial differential equations [8]. This is the reason, why the DDM is still the workhorse of today's TCAD. On the other hand, it was pointed out quite early that the DDM suffers from stringent approximations and might fail in the case of quasi-ballistic transport [9]. The Boltzmann transport equation (BE) is a more physics-based (albeit more CPU intensive) semi-classical alternative, which can handle intricate band structures together with quasi-ballistic and diffusive transport [10,11]. In the case of nanowires, the multi-subband BE is solved together with the 3D Poisson equation (PE) and Schrödinger equations (SE) for the cross sections [12,13]. In this paper, we will show that the discrete BE can compete with the DDM w.r.t. numerical stability.
The BE is a partial differential equation with the possibility of discontinuous solutions, which can lead to numerical problems when it is mapped onto a grid in phase space. The most popular method to stabilize the BE for a 3D k-space is its expansion by spherical harmonics in the k-space and a transformation onto the total energy (H-transform) [14]. Only the lowest spherical harmonic is nonnegative, and by projection onto the spherical harmonics, it is no longer guaranteed that the distribution function is nonnegative [15]. This degrades the stability. However, by choosing a different set of functions that are nonnegative themselves for projection, this problem can be avoided [12,13,[16][17][18][19][20]. Such functions are directly defined in the phase space and are piecewise constant with a value of either zero or one. The areas in which the functions are one do not overlap for different functions, which makes them orthogonal in this sense. Projection of the BE onto such functions is a prerequisite to preserve the nonnegativeness of the distribution function.
The simplest set of functions is given by a tessellation of the phase space into non-overlapping contiguous cells, where for each cell a function is defined that is one within the cell and zero outside of it. For the tessellated phase space, a discrete equivalent of the BE has to be found that conserves the nonnegativeness of the distribution function and that is numerically stable. If we neglect scattering, the BE reduces to the Liouville equation, which describes particle flux conservation in the phase space. Exact flux conservation in the phase space is possible by discretizing the Liouville equation with the finite volume method (FVM), where the cells of the tessellated phase space are the finite volumes [21]. Thus, fundamental conservation laws (e.g., charge conservation) are satisfied by construction of the discrete BE, if the scattering integral is discretized accordingly.
The calculation of the fluxes on the surfaces of the finite volumes has a strong impact on the numerical stability. Since the motion of particles is similar to waves in fluids, we use methods developed for computational fluid dynamics, which can handle, for example, discontinuous distribution functions [21]. Such an approach was presented in Ref. [22] based on Godunov's scheme [23]. As pointed out in Ref. [24], the finite volumes should be aligned with lines of constant total energy (H-transform) to avoid spurious diffusion in the k-space. It should be mentioned that the H-transform comes at the price of additional derivatives w.r.t. time in the discrete BE, because in the transient case the phase space grid depends on time [25]. This again leads to spurious diffusion in the k-space as shown below, whereas a phase space grid fixed in time does not have this problem. Thus, in the transient case, spurious diffusion in the k-space cannot be completely suppressed by the H-transform.
In this work, we discuss the numerical properties of this approach, which is not limited to the 1D k-space of nanowires, but can also be applied in the case of a 2D or 3D k-space. The approach can handle complex band structures as they occur, for example, in the case of PMOSFETs [26,27]. The nonlinear scattering integral including the Pauli principle is discretized in such a way that it conserves the particle number and satisfies the principle of detailed balance at equilibrium. We will show that the discrete BE yields nonnegative solutions by construction and that the corresponding system matrix is diagonally dominant resulting in excellent numerical stability. Due to its stability, the discrete BE can be integrated in time by the CPU-efficient forward Euler scheme. The numerical properties of the simulation approach are demonstrated for a silicon nanowire NMOSFET under stationary and transient conditions. Furthermore, we discuss the problems of the spurious damping and asymmetry introduced by Godunov's scheme.

Simulation model
In the following, the time argument t is not shown to reduce the length of the formulas. It is implicitly understood that all solution variables, quantities derived from them and applied voltages can depend on time.

Subband structure
In the case of a nanowire MOSFET, we solve the PE, SE and BE self-consistently as explained in Refs. [12,20]. The quasi-stationary SE yields the subband structure with the total energy tot (z, k) of the th subband and the wave function Ψ (x, y;z, k) for a cross section of the wire at position z in real space and wave number k. The cross section is parallel to the x, y-plane. is the index of the eigenstate and might contain in addition a band or valley index (e.g., the six lowest valleys of the silicon conduction band). For the subband structure, we assume inversion symmetry with tot (z, k) = tot (z, −k) and that the subband energy is a continuous function of the wave number. Otherwise the dependence on the wave number is not restricted and an example is shown in Fig. 1, which does not show monotonic behavior for positive wave numbers.
The th subband is split into monotonic branches as a function of the wave number, where the th branch has the minimum energy min , (z) and maximum one max , (z) . Due to the monotonic behavior, the group velocity in z direction is either positive or negative for the whole branch. In addition, in Fig. 1, two types of reflection at an energy barrier are shown. The electron moves into the positive z-direction and after reflection into the negative one. The branches with = 1, −2 have a negative velocity and if the energy of the electron is below max 1 (see the lower dashed line in Fig. 1), two possible final states after the reflection exist which satisfy conservation of the total energy. In the framework of classical transport theory, the electron ends up in the state, which is closer to the initial one, when moving into the direction after the reflection (negative direction in this case and = 2 → = 1 ). If the energy is higher than max 1 , the usual specular reflection with k → −k and = 2 → = −2 occurs (upper dashed line in Fig. 1).

Real-space grid
The real space in z direction is mapped onto a grid with grid nodes z i and i = 1, … , N z . The corresponding finite volume is given by the interval (z L i , z R i ) with the left boundary and the right one The width of the finite volume in real space is Within a finite volume of the z grid, the subband structure is assumed to be constant with tot ,i (k) = tot (z i , k) , Ψ ,i (x, y;k) = Ψ (x, y;z i , k) etc.

Energy grid
The total energy is mapped onto a grid with a constant spacing of with nodes j = (j − 1) + 0 and j = 1, … , N . This energy grid spans at least the whole range of the total energy from the lowest min , ,i to the highest max , ,i within the device. The phase-space grid is the tensor product of the real-space and energy grids. For the sake of legibility, the indices , , i of a branch in phase space are combined to an index = ( , , i) . Each monotonic branch of a subband has to be mapped onto the phase-space grid with the energy index of the lowest energy N min and highest one N max The finite volume in the total energy is given for j = N min , … , N max by the interval ( − j, , + j, ) with the lower boundary and the upper one In Fig. 2, the minimal energy of an energy branch is shown which is not exactly aligned between two grid nodes of the energy grid.
The box width in energy is for the lowest (and highest) finite volume often smaller than and always positive due to the monotonically increasing energy (0 < j, ≤ ) . The corresponding width in wave number is given by . This quantity can be positive or negative depending on the sign of the group velocity of the branch. We therefore introduce lower and upper limits of the k-space boxes in ascending order with The average velocity of the energy box is given by The energy between the lower and upper box limits is linearly interpolated in k with this velocity which can be inverted The 1D density-of-states is positive by definition and constant within a box.

Boltzmann transport equation
The multi-subband BE describes the spatiotemporal evolution of the distribution function f (z, k) , which is the density in the 2D phase space (z, k) of the th subband [12,28,29] The scattering integral describes local and instantaneous single-particle scattering including the Pauli principle [28] L sys is the 1D system volume and W , � (z, k, k � ) the transition rate of a scattering event from state ′ , k ′ into , k . The transition rate is calculated by Fermi's Golden Rule [12,28] The upper sign refers to emission of the energy transfer h of the th scattering process and the lower one to absorption. M , , � (z, k, k � ) is the matrix element of the th scattering process. The Bose-Einstein distribution is given by where k B T 0 is the thermal energy for the ambient temperature T 0 [28]. The electron density, for example, is given by where the factor of two accounts for spin degeneracy.

Godunov-type stabilization
The driving force in (17) is the derivative of the total energy w.r.t. the z-coordinate [12]. In order to apply Godunov's stabilization scheme to the BE the drift operator is moved into the interface between two adjacent cells by assuming that the subband structure is constant within a cell and changes abruptly between cells (Fig. 2) [20,22,30,31]. Within a finite volume in real space, the force is now zero and i = 1, … , N z . Box integration of the distribution function over a finite volume in phase space with the abbreviation = (j, , ) The phase space of the subband and branch (grey area). The thick solid line is the minimal total energy of the subband and the dashed one the maximal. The hatched area is the finite volume of the phase space for position z i and energy j of this subband. The velocity is either positive for all finite volumes of a given branch or negative yields with (15) a distribution function which is constant within the phase space volume ( , i) and positive for a positive f (z, k) . By integration of (22) over the finite volume (Godunov's scheme), we obtain [22,23] The time derivatives of the k-space boundaries of the phasespace cell are due to the averaging of the distribution function in (23). The distribution functions at the four edges of the rectangular phase space box have to be evaluated by appropriate Riemann solvers [21] (see Sect. 3.4).

Discrete scattering integral
For the scattering integral, we assume that the distribution function is constant within a phase space cell with in-scattering and out-scattering In many models for single-particle scattering, the energy transfer and matrix element are assumed to be constant (e.g., [32]). If we furthermore assume that the energy transfer is a multiple of the energy grid spacing with h = m , where m is a positive integer, the discrete transition rates can be easily evaluated The energy overlap of the initial box ( − � ,i , + � ,i ) and final one ( − ,i , + ,i ) is given by A similar formula can be derived for out-scattering The energy overlap of the initial box ( − ,i , + ,i ) and final one ( − � ,i , + � ,i ) is given by and the energy overlap is symmetric. Since the matrix element is self-adjoint, we get furthermore These symmetries and the choice of the energy transfer ensure that the principle of detailed balance holds at equilibrium as shown in the next section. In addition, the box integration results in a discrete scattering integral that is particle-number conserving by construction.

Principle of detailed balance
At equilibrium, the principle of detailed balance must hold [28]. With the equilibrium distribution function, which is in this case the Fermi-Dirac distribution where F is the Fermi energy, scattering from one state into the other must exactly equal the reverse transition The same result is obtained, if we neglect the Pauli principle and use the Boltzmann distribution. If we assume that the scattering from ′ to is an emission, the reverse process must be an absorption, and inserting the above expressions for the transition rates, we obtain, after eliminating the equivalent factors on both sides of the equation, for the th scattering process With the Bose-Einstein distribution (20), this equality can be shown to hold, because the energy transfer of the scattering processes was aligned with the energy grid (h = m ) , and the principle of detailed balance also holds in the discrete case.

Riemann solver for the cell interfaces
The distribution functions at the left and right boundaries of (24), are calculated by particle flux conservation (Riemann solver [21]). For a subband branch with a positive velocity, which is constant within an energy cell, the flux in positive z-direction at the right cell boundary is given with (13) by with f ⋆R ,i =f ,i , because all particles arriving at the right boundary of the cell must stem from within the cell due to the positive velocity [22,23].
The case at the left boundary for a positive velocity is quite different and particles enter the phase space cell ( , i) from different cells under conservation of the total energy. This requires the index of the total energy j to be the same for the initial and final boxes, while the corresponding wave numbers and velocities can change during the transition (Fig. 2). Since the energy index does not change during the transition, spurious dispersion in energy, which occurs, when the energy grids of two adjacent cells are not aligned,  (24), the derivative w.r.t. z is approximated by an upwind scheme [21]. Since the sign of the velocity does not depend on the solution variables, the upwind direction is fixed and does not change during the simulation. Another source of particles entering the cell at the left boundary is reflection of particles traveling into the negative direction. In the case of specular reflection (the upper case in Fig. 1) with � = − , the flux due to reflection is given with ̄= (j , − , ) by with possible barriers at the top ( + ) and bottom (−) of the cell boundary (Fig. 3) and the sum of those two In the case that the barrier at the left boundary spans the whole energy width of the cell (e.g., cell (j − 1, , , i − 1) in Fig. 2), the total flux of that cell is reflected When particles move from one cell to another, the total energy is conserved, and the sum of all energy overlaps is exactly the energy width of the cell, into which the particles enter Particles with a total energy outside of the energy interval of this cell can not enter the cell. If there are multiple cells, which contain the same total energy overlapping with this cell, only particles from the cell, which satisfies the rules of classical transport theory (see Fig. 1), can enter. Eq. (46) ensures flux conservation (see (38)).
In the case of the second reflection type shown in Fig. 1, a similar set of equations can be derived, which contains more cases, because the cell, into which the particles are reflected, might overlap for example with the energy max 1 and in this case both types of reflection could contribute to the flux. The corresponding formulas are cumbersome and not shown, but easy to program due to the use of the total energy and (46) still holds.
In the case of a cell with a negative velocity, the left and right boundaries have to be interchanged in the formulas and the derivation of the final expressions is straightforward and also not shown.
In the non-stationary case, the subband structure depends on time via the potential and thus the k-grid. This time dependence of the k-grid leads to additional time derivatives in the discrete BE (24) w.r.t. the k-space boundaries of the phasespace boxes (Fig. 4). The distribution functions at the upper and lower boundary of the cell in the wavenumber are given by where j ′ , ′ are the indices of the adjacent cell in positive k-direction with k u j, , ,i = k l j � , � , ,i , and

Boundary conditions for the BE
At the left interface of the first cell in real space (source contact) and at the right interface of the last cell in real space (drain contact), Dirichlet boundary conditions are applied for the inflowing particles [33,34]. At the left interface for all with a positive velocity, the distribution function is given by the Fermi-Dirac distribution (34) and at the right one with negative velocity by V S , V D are the source and drain potentials, respectively. The Fermi energy F is the energy reference and the built-in potentials at the contacts are chosen such that the contacts are charge neutral at equilibrium with V S = V D = 0 . The number of particles due to the Fermi-Dirac distributions at the contacts should not change with an applied bias. This might require adjustment of the built-in potentials due to the use of a grid in total energy. In the case of equilibrium, the Fermi-Dirac distribution leads to a vanishing scattering integral because of the principle of detailed balance. Furthermore, since the Fermi-Dirac distribution depends on the total energy and is position independent, it satisfies with (46) the diffusion term of the discrete BE (24). Since the solution is stationary at equilibrium, the time derivative also vanishes and the Fermi-Dirac distribution (34) evaluated at the box energies j is the exact solution of the discrete BE at equilibrium.

Non-negative solution of the BE
The solution of the BE is nonnegative for nonnegative boundary conditions. The discrete BE should have the same property. To show this, we will investigate the stationary case and neglect the Pauli principle. In this case, the BE is linear and results in a linear system of equations, of which the system matrix should have a nonnegative inverse. For example, an M-matrix has a nonnegative inverse [35].
The elements of the distribution function are arranged into a vector with [ ] ( ,i) =f ,i for all possible indices in the grey area of Fig. 2. Similarly, the boundary conditions are assembled in a vector and the discrete BE takes the form Â = with the system matrix Â =D +Ŝ , which is the sum of the diffusion matrix D and scattering matrix Ŝ .
The main diagonal of an M-matrix should be positive and the off-diagonal elements nonpositive. The sum over all elements of a row should be nonnegative, and for an irreducible matrix, the sum over at least one row must be positive [35]. This is usually the case due to the boundary conditions, for which the corresponding rows of the system matrix have only a one on the main diagonal and zeros otherwise The diffusion term in (24) for a positive velocity v ,i > 0 is given for specular reflection with (38), (39) by The corresponding nonzero entries of the diffusion matrix are on the main diagonal and on the off-diagonals and These three entries have the required signs and their sum is exactly zero due to (46). For a negative velocity v ,i , the matrix entries have the same properties. Similar results are obtained for general reflections.
For the matrix to be irreducible and non-singular, scattering is required [36]. The contribution of the scattering integral to the system matrix is without the Pauli principle . 1 This choice of the values at the boundaries leads to diffusion in the k-space for transient simulations, because more particles move from cells with a larger distribution function to cells with a lower one than in the opposite direction. In addition, the selection of the boundary value depends on the simulation results, which can cause problems under the sinusoidal-steady-state condition. These problems could be reduced or avoided by using the arithmetic mean of the distribution functions at the boundary, which is also particle number conserving. Numerical stability of the transient simulation might suffer, but this is not important in the small-signal case as long as the problems occur at frequencies so high that they are no longer relevant.

Separation of the distribution function into the equilibrium distribution function and a positive factor g ,i yields
With the nonnegative and non-singular diagonal matrix and the vector of the factor we obtain =Ĥ . Multiplication of the system matrix from the right by Ĥ and from the left by the inverse leads to The diffusion matrix D is not changed by the transformation, because the elements in the rows of the diffusion matrix are nonzero only for j � = j and are therefore not modified by the transformation, which depends only on j . The transformed scattering matrix Ŝ ′ is an M-matrix due to the principle of detailed balance (36) [15] where the transition rates are nonnegative. Since the sum of two M-matrices is also an M-matrix, when both matrices satisfy the row criterion, the transformed system matrix has property M. The inverse of the system matrix remains nonnegative under transformation with a nonnegative diagonal matrix and the distribution function is nonnegative by construction. Since this M matrix has only eigenvalues with positive real parts and is diagonally dominant [35], the numerical stability of the discrete BE is excellent.

Stability in the time domain
If the Pauli principle is neglected, the BE itself is linear in the distribution function, and for a stationary potential, only the first time-derivative in (24) remains Since the transformation in (59) does not change the eigenvalues of the matrix, these are also the eigenvalues of the matrix Ĥ −1âĤ , which is again an M-matrix, 2 of which all eigenvalues have positive real parts [35]. Thus, all poles have negative real parts and the discrete BE is stable for a stationary potential due to Godunov's stabilization scheme, scattering and the boundary conditions. For a potential, which is translationally invariant in z-direction (zero force), the subband structure does not depend on z and the BE has a unique solution even in the ballistic case without scattering. The stationary distribution function within the device is given in this simple case by the boundary conditions. All poles still have a negative real part although scattering is neglected, because in this special case the diffusion matrix together with the boundary conditions yields a non-singular M-matrix. Thus, (65) det s lÎ +â = 0 in addition to the boundary conditions [33], Godunov's scheme introduces dissipation in the system.
To investigate this numerical dissipation, a plane wave with the real angular frequency and wave number q traveling into the positive z-direction ( > 0, ℜ{q} > 0) is considered In the ballistic case, the BE (22) reduces to a linear advection equation and the dispersion relation of the plane wave is given by The wave number is real for a real frequency and the wave is not damped (ℑ{q} = 0) due to the assumption of zero scattering. With the ballistic discrete BE for a positive velocity and a homogeneous subband structure we obtain for an equidistant grid in z-direction with z i = z the characteristic equation Expansion up to the second order in the frequency and wave number yields the dispersion relation Compared to the exact analytical result (67), an additional negative imaginary part occurs due to Godunov's scheme, which leads to damping of the wave. The imaginary part is proportional to the grid spacing and thus due to the discrete formulation of the system. A finer real-space grid leads to less numerical dissipation, of which the impact is strongest at high frequencies for states with a low group velocity. Due to the artificial damping, an explicit Euler scheme is possible even in the ballistic case.
The Laplace transformation of the discrete BE (61) yields in the case with scattering and a stationary potential, which can depend on the z-coordinate, for the forward Euler scheme the characteristic equation which has the roots .
The real part of the poles s l must be negative and the time step must be so small that l t lies within the circle The eigenvalues l have a positive real part, and with Gershgorin's theorem [37], an upper bound for the positive time step can be calculated that satisfies (74) with the total out-scattering rate The equality in (75) holds, because the transformation does not change the main diagonal elements of the matrix â.

Transient simulations
Since the explicit Euler scheme is stable for the BE, we will use it to solve the BE, PE and SE self-consistently in the transient case. After discretization, the three equations form a system of differential algebraic equations, where only the BE contains a time derivative and is solved by the forward Euler scheme (see 5.1). This yields a distribution function f end j, , ,i,l−1 at the end of the l − 1 th time step. With this distribution function, the electron density is calculated by (21) for time t l and the PE is solved self-consistently with the SE. The coupled solution of the PE and SE is necessary, because the electron density depends on the wave functions which in turn depend on the potential. Since solving the SE results in a new subband structure tot ,i,l (k) , the k-grid k ± j, , ,i,l has to be rebuilt for the time-invariant grid of the total energy as shown in Sect. 2.3. The change in the k-grid requires the mapping of the distribution function based on the old k-grid onto the new one. This is detailed in Sect. 5.2. After a selfconsistent solution of the PE and SE is obtained, the BE is solved again by the forward Euler scheme based on the new potential and the initial condition is the old distribution function mapped onto the new k-grid.
(72) exp(s l t) − 1 t + l = 0 . (73) The transient simulations are performed with two different times steps. The larger one t PE l = t l − t l−1 , which is the time between updates of the PE together with the SE, is a multiple of the time step t BE k = t k − t k−1 for the BE. The time step for the BE should be at least twice as short as the stability limit (75) to obtain smoother transient solutions (i.e., the matrix in the square brackets in (77) has only eigenvalues with positive real parts). The CPU time for the calculation of the subband structure and transition rates, which has to be done every time the PE and SE are solved, is larger than the CPU time for the iteration of the BE by the forward Euler scheme, and the CPU-time increase due to multiple time steps of the BE does not dominate the total CPU time. This makes it possible to use time steps for the PE similar to the MC method [38,39].
The explicit time integration scheme avoids the solution of large linear systems of equations and does not, therefore, require much memory. Furthermore, the explicit scheme can be efficiently parallelized.

Forward Euler scheme for the discrete BE
The discrete BE with the time derivative (63) integrated by the forward Euler scheme is given for the time step t BE k by For the distribution function int,k at time t k to be nonnegative, the right-hand side of this equation must be nonnegative. The matrix −ĉ and the vector are both nonnegative as is the distribution function int,k−1 . The matrix in the square brackets is nonnegative, because the off-diagonal elements of −â are nonnegative and the main diagonal elements of Î − t BE kâ are positive, if t BE k satisfies (75). Thus, a positive distribution function remains positive during the iteration of the forward Euler scheme.
If the Pauli principle is included in the scattering integral, the total out-scattering rate is an upper bound of the contribution of the scattering integral to the main diagonal With (75) and a distribution function, which is smaller than one, this ensures a positive distribution function in the case of the nonlinear scattering integral. Similar arguments show that a distribution function, which is smaller than one, remains smaller than one during the iteration.

Mapping of the distribution function onto a new k-grid
| k j, , ,i,l | = 1 .
(84) The approach is consistent with the choice of the distribution function at the upper and lower boundaries of the phase space cells in (47),(48). If the subband structure does not change, the distribution function remains the same. On the other hand, for a transient change in the band structure, charge diffuses in the k-space from cells with a larger distribution function to cells with a lower one. While the grid in total energy suppresses diffusion in the k-space for stationary simulations, this is no longer the case for transient ones.

Simulation results
We simulate the same silicon nanowire NMOSFET as in Ref. [22] based on a parabolic conduction band with six elliptical valleys. The lattice temperature is in all cases 300K. The silicon cuboid has a cross section of 5 nm× 4 nm and a length of 32 nm. The gate length is 16 nm and the source/drain regions are each 8 nm long. The donor concentration is in the drain/source regions 1 × 10 2 0cm − 3 , and the channel is undoped. The perfectly insulating gate oxide with the permittivity of SiO 2 has a thickness of 1 nm and the work function of the gate is 4.3eV. The channel direction is aligned with the ⟨100⟩-axis, which in turn is aligned with the z-axis. The grid spacings in the cross section are 0.5 nm in the x and y-directions and, if not mentioned otherwise, 0.5 nm in the z-direction as well. The energy grid has a spacing of 5meV. We include all subbands in an energy range of 0.6eV above the lowest subband edge in the source resulting in up to 6 subbands per valley. If not stated otherwise, all simulations are performed with a k-grid aligned with the total energy (H-transform).
In the stationary case, we solve the nonlinear BE by the Newton-Raphson method, where the resultant linear system of equations is solved by a GMRES with the ILUT preconditioner [40,41]. The discrete SE is solved by the LAPACK routine DSYEV [42] and the PE by PARDISO [43]. All three equations are self-consistently solved by a Gummel loop, where the nonlinear charge term of the PE is based on imrefs [44]. The iteration is stopped, when the change in potential is less than 1 v . In the transient case, the matrixfree forward Euler method is applied to the BE. The selfconsistent solution of the SE and PE is obtained by a damped Picard iteration, where the charge term of the linear PE is based on the distribution functions evaluated by the BE (21). In contrast to Ref. [22], the SE is not solved in the real space but for a sine base leading to more accurate results for a real space grid in the cross section with a spacing of 0.5 nm, which is sufficiently fine for the PE [27].
The nanowire NMOSFET was simulated for diffusive and quasi-ballistic transport. In the case of diffusive transport, the original scattering integral is used, of which the scattering parameters are adjusted to match a mobility of about 100cm 2 V − 1s − 1 in the channel resulting in strongly diffusive transport [45,46]. For the quasi-ballistic simulations, the complete scattering integral is scaled by a factor of 10 −4 and the impact of scattering is negligible. The simulations include 65 grid nodes in z-direction and 15 × 13 grid nodes in the cross section. For the bias point V GS = V DS = 0.5V , the number of variables of the distribution function is 680550. The CPU time on a single core of a current CPU is for this bias point about 48min.
An advantage of the k-grid aligned with the total energy is that it does not suffer from spurious k-space diffusion in the stationary case [24]. To demonstrate this, in Fig. 5, the distribution function is shown for a quasi-ballistic simulation not only with a k-grid that is aligned with the total energy (H-transform) but also with one that does not depend on the position in real space (fixed k-grid, compare Ref. [22]). In the case of the H-transform, the distribution function in the drain shows the expected abrupt increase at a large value of the positive wave number. This is due to electrons from the source, which pass the source/channel barrier. Since scattering is neglected, the threshold behavior of the energy barrier is preserved when the particles move with constant total energy through the channel and drain. In the case of the fixed k-grid, diffusion occurs and the abrupt change in the distribution function is smeared out over the wave number. In both cases, the distribution functions are positive and show no artificial oscillations.
In Fig. 6, the transfer characteristics of the nanowire transistor are shown at V DS = 0.5 V. With and without scattering, smooth results are obtained over many orders of magnitude. The subthreshold slope is 64 mV/dec with scattering and 63 mV/dec without at V GS = 0.15 V. It does not depend on the spacing of the energy grid, which was checked by simulations with different spacings (not shown). This is in contrast to the fixed k-grid, which leads to artificial diffusion in the k-space (cf. Fig. 5), which degrades the subthreshold slope [24]. On the other hand, this does not mean that simulations based on the fixed k-grid fail catastrophically as shown in Ref. [20], where good results for the subthreshold slope were obtained with a feasible number of grid points in the k-space.
In the remainder of the paper, only the case of the H-transformed grid is discussed. The output characteristics are shown for diffusive and quasi-ballistic transport in Figs. 7, 8, respectively. The currents are smooth for both types of transport, where the lower output resistance in the quasi-ballistic case leads to better saturation of the drain current. The distribution function is for all bias points between zero and one, and does not show any spurious oscillations. At zero drain/source bias, it is given with very high accuracy by the Fermi-Dirac distribution and the scattering integral satisfies the principle of detailed balance as shown Sect. 3.3. The results in Figs. 5, 6, 7, 8 for the quasi-ballistic case demonstrate the excellent stability of Godunov's scheme.
The on-switching is simulated by ramping V GS within 10 fs from 0 V to 0.5 V at V DS = 0.5 V. The time step for the PE is t PE = 0.5 fs and for the BE t BE = 0.1 fs. The latter value is about three times smaller than the upper limit given by (75). The CPU time for a simulation of one picosecond on a single core is 300 h. Within the first 10 fs, the gate and drain currents are dominated by the displacement current due to the linear increase in the gate voltage (Fig. 9), where the gate current is positive and the drain current negative. After the charge buildup under the gate, the drain current changes sign and remains positive. In the quasi-ballistic case, plasma oscillations are visible in the drain and gate currents for more than 100 fs, whereas in the diffusive case these are strongly damped. In both cases, the turn-on time is less than 200fs.    Fig. 10, the drain current is shown as a function of the scaling factor of the scattering integral for different spacings of the real-space grid. For small values of the factor, the transport is ballistic and saturates at a value, which is almost independent of the grid spacing in z-direction. At large factors, the transport is diffusive and the current should be inversely proportional to the factor, which is not the case. Instead, the current saturates at a level, which depends on the grid spacing. For smaller grid spacings, the current is longer inversely proportional to the factor and saturates at a lower level.
This erroneous behavior is a consequence of the asymmetry of the upwind scheme in conjunction with the constant reconstruction of the distribution function. In the case of very strong scattering, the distribution function is very close to the equilibrium one. Let us assume that the Pauli principle can be neglected, that the difference in subband energy between two adjacent cells in real space is zero and that the left cell has a higher electron density than the right one. In this case, all electrons with a positive velocity in the left cell move to the right and all electrons with a negative velocity in the right cell move to the left. Since the distribution function is very close to the equilibrium one, the modulus of the average velocity of particles moving to the left and right is the same and does not depend on the particle density (so-called injection velocity [47]). Since the current is proportional to the product of this velocity and the density, the difference in the particle densities in the left and right cells means that the currents in positive and negative direction do not cancel each other and lead to a finite total current which depends only on the difference in the density. A finer grid in real space leads to a smaller difference in the density between adjacent cells and therefore to a smaller saturation current (Fig. 10). 3 The problem could be alleviated by using instead of the constant reconstruction of Godunov's scheme a higher order one within a cell, but this is not trivial and might cause other problems [21]. Nevertheless, as always in the case of finite difference schemes, it has to be checked that the grid is sufficiently fine to ensure an acceptable discretization error.
The stability of the transient simulations depends on the time between successive solutions of the PE [39]. This is demonstrated in Fig. 11 for the response of the drain current to a small voltage pulse applied to the drain for quasiballistic transport and a constant t BE = 0.1 fs. A time step of t PE = 2 fs leads to instability and the current response does not vanish at large times. The simulations are stable   Fourier transformation of the impulse response yields the drain self-admittance (Fig. 12). Below 10 THz (Fig. 12), the impact of the different time steps can be ignored and a time step of t PE = 1 fs seems to be adequate.
In Fig. 13, the drain self-admittance is shown for three different spacings of the grid in z-direction and t PE = 0.5 fs, t BE = 0.1 fs. The quasi-ballistic case is simulated, because the plasma oscillations are not damped by scattering. At low frequencies, the dependence on the grid spacing is weak as expected (see Fig. 10). Since the numerical damping, which occurs mainly at high frequencies, decreases with decreasing grid spacing (cf. (70)), the impact of the grid spacing is much stronger at high frequencies, where plasma oscillations occur. On the other hand, frequencies of 10 THz and more are technically not relevant in silicon and the impact of the numerical damping of Godunov's scheme can be ignored.

Conclusions
We have shown that the discrete BE can be formulated with Godunov's scheme in such a way that it is numerically very robust. Its system matrix is diagonally dominant and the distribution function is positive in the stationary and transient cases, where in the latter case the time step of the forward Euler scheme must satisfy the limit derived in the paper. If the Pauli principle is included in the scattering integral, the transient distribution function remains in the interval between zero and one. The transient solution of the BE and PE can be performed CPU and memory efficiently by the matrix-free forward Euler method, which can be easily parallelized on a computer cluster. Since Godunov's scheme is based on the asymmetric upwind scheme, care has to be taken that the grid in real space is sufficiently fine to reduce numerical errors to an acceptable level. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.