1 Introduction

Constant progress in integrated circuit technology has been supported by the miniaturization of silicon semiconductor devices. Recent technological innovations make it possible to create semiconductor structures in which the channel length of metal-oxide-semiconductor field-effect transistors (MOSFETs) is on the order of 10 nm or even smaller. However, microelectronic scaling has been showing signs of reaching its end point owing to the production cost of chip fabrication, limitations in processing chemistry and the physical properties of the materials.

Continued progress is no longer a matter of further size reduction but may require novel approaches in materials science, device architectures and MOSFET technology for the future semiconductor industry. The increase of the integration level has compelled the development of a variety of novel devices such as double-gate [1, 2] and gate-all-around (GAA) transistors [3,4,5,6], FinFETs [7,8,9], heterojunction tunneling devices [10,11,12], carbon nanotubes and nanoribbons. Fundamental physical limitations of silicon have also stimulated the search for and development of innovative materials with higher carrier mobility and reduced sensitivity to high temperatures. New types of gate electrodes, various modifications of silicon dioxide and high-k dielectrics are being studied in connection with applications in MOSFET technology. Modern growth mechanisms [13] and processing technologies enable one to synthesize composite materials with advanced material characteristics, low defect densities and good control of uniformity.

In spite of this progress, continued downscaling meets with serious technological challenges as well as increased costs of fabricating and testing possible candidates for future electronic devices. In order to make an educated choice among the many available options, it is desirable to build a proper device simulation environment that does not require any empirical data or parameter optimization. Detailed theoretical studies can help in better understanding the correlation between material properties and transport characteristics and in addressing other practical issues that must be resolved to accelerate the development of MOSFETs with ultra-short channels.

At the subnanometer scale, quantum effects play a crucial role in the device transport characteristics. Classical and semiclassical approaches fail and must be replaced by computationally much more intensive quantum-mechanical methods. Moreover, accurate transport modeling must capture details of the semiconductor band structure and dynamic properties of mobile carriers in the presence of impurities, crystal imperfections or lattice mismatch at the channel interface. This stimulates the development of an accurate atomistic quantum device simulator based on first-principle density functional theory (DFT) [14, 15] for electronic-structure calculations. As of now, although various first-principle device simulators have been reported [16,17,18,19,20], there remain many practical issues regarding the high computational cost and the applicability to heterogeneous nanostructures, both of which are essential for developing a predictive tool and assisting next-generation device development.

Significant efforts have been focused on developing effective methods for computing quantum transport in nanoscale devices [21,22,23,24]. The quantum transmission boundary method [25,26,27,28] and the nonequilibrium Green’s function (NEGF) formalism [29,30,31] are commonly used in quantum device simulations. At device dimensions smaller than the dephasing length, ballistic transport is expected to dominate the device behavior. In this regime, the non-elastic processes can be neglected and the two methods are equivalent since the nonequilibrium state of the device can be described in terms of appropriate one-particle scattering wave functions. In quasi-one-dimensional transport, a device can be partitioned into small blocks in the transport direction and the Schrödinger or Dyson equations can be solved recursively without matrix operations of the size of the entire device. However, the recursive algorithms require complex-valued matrix inversion operations which may become numerically costly [32]. In contrast, the R-matrix method [33, 34] operates only on real-valued matrices and offers more options for numerical optimization.

In this paper we present a device simulator which efficiently performs large-scale first-principle transport simulations in realistic systems with tens of thousands of atoms in the device channel. Our computational scheme is based on the real-space representation of density functional theory, which is suitable for parallel computations. In this scheme, singular ionic potentials are replaced by smooth non-local pseudopotentials \(V_{\textrm{NL}}\) and the Kohn–Sham equations are treated as finite-difference equations on \(N_{\textrm{grid}}\) mesh points in the computational domain. Fast Fourier transformation is unnecessary in this scheme and, owing to the sparse nature of the Hamiltonian matrix, the computational time for the main matrix operation \(H_{\textrm{KS}} \Psi _{\textrm{DFT}}\) scales as \(\textrm{O}(N_{\textrm{grid}} / N_{\textrm{CPU}})\), which makes it suitable for parallel simulations in realistic nanostructures. The real-space DFT (RSDFT) simulator [35] can perform atomic configuration optimization and electronic structure calculations in a supercell of up to 10,000 atoms with periodic boundary conditions. The self-consistent Bloch Hamiltonian \(H_{\textrm{KS}}(q)\) contains the nonlocal terms \(\sim q\partial / \partial r\) and \(\textrm{e}^{iqr} V_{\textrm{NL}} \textrm{e}^{-iqr}\) which in the real grid representation can be redefined as \(\sum _{n} W_{n}\exp (iqa_n)\) where q is the wave vector and \(a_{n}\) is a translation vector in the Bravais lattice. A set of \(N_{\textrm{grid}} \times N_{\textrm{grid}}\) matrices \(W_{n}\) can be used to define a device Hamiltonian and compute non-equilibrium electronic states with arbitrary boundary conditions. However, direct application of this approach to NEGF simulations in realistic nanostructures is prohibitively difficult because of the heavy computational burden. 
For example, in a 10 nm-diameter Si nanowire channel with cutoff energy \(21 \, E_h\) (\(E_h\) is the Hartree energy), the size of the supercell is of the order of \(\sim 10^7\). In order to improve the numerical accuracy of the finite difference approximation one has to deal with a long range kinetic energy matrix [36] which complicates the application of the commonly used recursive Green’s function algorithm. The R-matrix approach is much more suitable in such systems, since all the numerical operations in this method are real-valued and the recursion sequence can be chosen in an arbitrary way, thus giving full control over the size of the required matrix inversion operations. The propagation algorithm in the R-matrix method is constructed by adding an arbitrary small group of mesh points to a computation domain where the R-matrix is assumed to have already been calculated. The size of the R-matrix is determined by the number of mesh points with non-vanishing interaction with the rest of the device area. For a long range kinetic energy matrix, it is of the same order of magnitude as the whole supercell and the transport simulations may be prohibitively difficult in terms of both processing time and memory requirements. The heavy computational burden can be reduced by introducing a low rank basis representation for the Green’s function. In the effective mass approximation, the well known mode space approach makes use of a few lowest eigenstates (subband modes) computed at a fixed value of the coordinate along the current. The scattering problem in this representation is thus reduced to solving one-dimensional matrix equations in the transport direction. In atomistic simulations, the physically relevant transport modes (conducting electrons or holes) are located far from the edge of the total spectrum of the atomistic Hamiltonian. 
Hence, even though one can always extract enough atomistic Bloch eigenstates to reproduce all relevant modes, there is no energy minimization principle to guarantee the correct spectrum and energy band gap in the basis-transformed transport Hamiltonian. This problem has been addressed in Ref. [37] by constructing a low-dimensional equivalent model (EM) for atomistic transport Hamiltonians. The method has recently been implemented and extended by several groups [19, 20, 38,39,40] and the results have confirmed the good accuracy of the reduced representation and the applicability of the method to realistic nanodevices. The method has often been referred to as the atomistic mode space approach. Here we prefer the original method name in order to indicate the difference from the ordinary selected eigenstates representation for a bounded spectrum. The abbreviation EM should not be confused with the effective mass approximation, although the method plays a similar role, providing a simple approximation for the low-energy electrons in nanostructures.

The paper is divided into five sections. In Sect. 2, we introduce necessary modifications to the EM scheme in order to incorporate the method into the real-space first-principle density functional program and obtain a reduced transport model suitable for large-scale parallel calculations. In Sect. 3, we briefly discuss general properties of the R-matrix approach and demonstrate compatibility of the EM representation. In Sect. 4, we discuss practical application of the R-matrix method in the EM representation and present test calculations in realistic nanostructures. In Sect. 5, we discuss an alternative application of the EM approach which may offer an interesting option to facilitate numerical studies of quantum transport in more realistic inhomogeneous nanostructures.

2 Equivalent model for large-scale real-space DFT calculations

In this section, we discuss the construction of the equivalent transport model for the RSDFT Hamiltonian using a Si nanowire as an example. The RSDFT simulator code [35] performs structural optimization and self-consistently computes a one-particle DFT Hamiltonian in a supercell with periodic boundary conditions in all three dimensions. A suitable device Hamiltonian can be defined by lifting the periodicity restriction along the wire and imposing zero boundary conditions at the boundary of the channel cross section. It is expected that these changes in the supercell geometry and boundary conditions do not drastically affect the transport and dielectric properties of the nanostructure. In the RSDFT simulator, the interaction with the nucleus and inert core electrons is described by a pseudopotential [41] using the Fourier-filtering [42] method in order to obtain the nonlocal part of the pseudopotential in the form of short range projectors [43]. Thus, one can redefine the unit cell by including a narrow vacuum region of \(\sim 0.1-0.2\) nm width outside the passivation hydrogen atoms and ignore the rest of the computational domain as long as the corresponding projector operators are fully present in the modified system. The energy band structure can now be found by solving the eigen-value problem for the Bloch Hamiltonian

$$\begin{aligned} H(q) = H_0 + W \textrm{e}^{iq} + W^T \textrm{e}^{-iq}, \end{aligned}$$
(1)

where q is the normalized wave vector in the transport direction, \(H_0\) is the part of the Kohn–Sham Hamiltonian for an isolated supercell and the remaining part of the non-local interaction forms the coupling terms \(W(W^T)\). In the local-density approximation (LDA), only the non-local part of the pseudopotential for the boundary ions and the kinetic energy contribute to the coupling terms, leading to the nearest neighbor interaction in the transport Hamiltonian.
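The structure of Eq. (1) can be illustrated with a minimal sketch. The toy model below is an assumption made purely for illustration and is not the RSDFT Hamiltonian: a one-dimensional chain of identical cells with a nearest-neighbor finite-difference kinetic energy and a local potential, so that only the kinetic term contributes to the coupling matrix. The function names `cell_matrices`, `bloch_hamiltonian` and `bands` are hypothetical.

```python
import numpy as np

def cell_matrices(n, dx, v):
    """H0 and W for a toy 1D chain of identical cells with n grid points each.

    Assumption: second-order finite-difference kinetic energy -(1/2) d^2/dx^2
    (atomic units) plus a local potential v on the grid.
    """
    t = 1.0 / (2.0 * dx**2)
    H0 = np.diag(2.0 * t + v) - t * (np.eye(n, k=1) + np.eye(n, k=-1))
    W = np.zeros((n, n))
    W[n - 1, 0] = -t  # last point of a cell couples to the first point of the next
    return H0, W

def bloch_hamiltonian(H0, W, q):
    """H(q) = H0 + W e^{iq} + W^T e^{-iq}, with q the normalized wave vector."""
    return H0 + W * np.exp(1j * q) + W.T * np.exp(-1j * q)

def bands(H0, W, qs):
    """Eigenvalues of H(q) for every q in qs, one row per wavenumber."""
    return np.array([np.linalg.eigvalsh(bloch_hamiltonian(H0, W, q)) for q in qs])

# Free-particle example: the bands follow (1 - cos((q + 2 pi j)/n)) / dx^2.
n, dx = 8, 0.5
H0, W = cell_matrices(n, dx, np.zeros(n))
energies = bands(H0, W, np.linspace(0.0, np.pi, 13))
```

Because \(W\) is real, \(H(q)\) is Hermitian for every real q, so a standard Hermitian eigensolver can be used. In a realistic calculation \(H_0\) and \(W\) would be sparse and would also contain the non-local pseudopotential projectors.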

The size of the RSDFT eigen-value problem can be rather large depending on the cutoff energy and the number of ions. In the device simulations, one is normally interested in a rather narrow energy spectrum of mobile carriers in the vicinity of the band gap, which suggests using projection-type methods [44, 45] for computing the relevant Bloch states \(\Psi _{\nu q}\) where \(\nu\) enumerates the energy states at a given wave vector. The main numerical challenge in such methods comes from the complex-valued resolvent \(\sim [z - H(q)]^{-1}\) at a set of z-points in complex-valued energy space and the computational time for M eigenstates scales as \(\sim N_{\textrm{grid}} \, M \, N_{\textrm{inv}} \, N_{\textrm{iter}}\), where the number of steps \(N_{\textrm{iter}}\) does not strongly depend on the system size \(N_{\textrm{grid}}\). On the other hand, numerical tests show that the required number of iterations in the numerical inversion \(N_{\textrm{inv}}\) grows faster than the size of the system and the calculations become rather time consuming in systems with hundreds of ions. Another way of solving the Kohn–Sham eigenvalue problem in electronic-structure calculations is a subspace iteration method [36, 46]. The algorithm is based on the recursive construction of a subspace spanned by a set of \(M \ll N_{\textrm{DFT}}\) low energy eigenstates. Given an approximate set of orbitals \(\Psi _{\nu q}\), one applies the conjugate gradient (CG) method to each member of this set in order to reduce the corresponding expectation energies (Rayleigh quotient) \(\varepsilon _{\nu q} \equiv \langle \Psi _{\nu q} |H(q)| \Psi _{\nu q} \rangle / \langle \Psi _{\nu q} | \Psi _{\nu q} \rangle\). The set of states thus obtained is used to construct a new ortho-normalized basis for the “improved” low-energy subspace. 
Projecting the original Hamiltonian onto this subspace and solving the corresponding low-dimensional eigenvalue problem generates a new set of orbitals which are used to start the next iteration. The most time-consuming part of this approach is the subspace projection (\(t \sim N_{\textrm{grid}} \, M^2\)). The computational cost can be reduced by regrouping the larger portion of the required operations into a matrix-by-matrix product [36] and making use of a highly tuned linear algebra library (BLAS) [47]. The subspace diagonalization (SD) can be done by using the standard LAPACK (SCALAPACK) library [48] and the computational time for this step \(\sim M^3\) is relatively small in supercells of up to a few thousand atoms. The projection subspace is much larger in the CGSD method but the number of CG iterations should be small (typically \(\sim 2-3\)) in order to preserve linear independence of the orbitals, which makes the method suitable for large systems. At the same time, in order to use the Bloch eigenstates in building the EM representation it is essential to ensure high numerical accuracy. Hence, in our simulations we combine these two methods. The CGSD method is used to generate all the states in the valence band as well as the low energy part of the conduction band. The numerical accuracy in the conduction band is further improved by applying the FEAST algorithm [45] for a narrow energy range relevant for transport.
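The alternation between orbital improvement and subspace diagonalization can be sketched as follows. This is a simplified illustration under stated assumptions, not the CGSD implementation of the RSDFT code: each orbital is improved with a single residual (steepest-descent) direction rather than several CG steps, the Hamiltonian is a small dense real symmetric matrix, and the names `rayleigh_ritz` and `subspace_iteration` are hypothetical.

```python
import numpy as np

def rayleigh_ritz(H, V):
    """Orthonormalize the columns of V and diagonalize H within their span."""
    Q, _ = np.linalg.qr(V)
    eps, c = np.linalg.eigh(Q.T @ H @ Q)   # small subspace diagonalization (SD)
    return eps, Q @ c

def subspace_iteration(H, M, n_iter=200, seed=0):
    """Approximate the M lowest eigenpairs of a real symmetric matrix H."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    eps, psi = rayleigh_ritz(H, rng.standard_normal((n, M)))
    for _ in range(n_iter):
        # Residual of each orbital: proportional to the gradient of its
        # Rayleigh quotient, used here as the only search direction.
        resid = H @ psi - psi * eps
        # Project H onto span{orbitals, residuals} and keep the M lowest
        # Ritz pairs as the improved low-energy subspace.
        eps_all, psi_all = rayleigh_ritz(H, np.hstack([psi, resid]))
        eps, psi = eps_all[:M], psi_all[:, :M]
    return eps, psi

# Small test matrix: well separated levels plus a weak symmetric perturbation.
rng = np.random.default_rng(1)
n, M = 40, 4
A = rng.standard_normal((n, n))
H = np.diag(np.arange(n, dtype=float)) + 0.01 * (A + A.T)
eps, psi = subspace_iteration(H, M)
```

Since the enlarged subspace always contains the previous orbitals, the Ritz values cannot increase from one iteration to the next. The projections `Q.T @ H @ Q` are the matrix-by-matrix products that dominate the cost in large systems.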

Fig. 1

(Color online) The valence band (left) and conduction band (right) in a 1 nm-diameter Si wire. Solid lines represent the band structure in the original RSDFT simulator and squares correspond to the device Hamiltonian for the modified supercell geometry

Fig. 2

(Color online) Electronic band structure in an 8 nm-diameter Si nanowire (SiNW) computed in the original RSDFT simulator (left) and using the device Hamiltonian with zero boundary conditions at the supercell boundary (right)

Figures 1 and 2 present a comparison between the band structures in two nanowires with diameters 1 nm and 8 nm for the Bloch Hamiltonian in Eq. (1) with zero boundary conditions and for the original periodic supercell. Our calculations confirm that a different choice of the supercell geometry does not lead to any significant changes in the valence band spectra. A small positive shift of the conduction band spectrum can be seen in both cases but the energy difference in the band gap does not exceed 0.1 percent, which is quite acceptable for our present purpose.

The new supercell geometry naturally defines the boundary of the conducting channel. In the device simulations, the Poisson equation is solved by assuming a continuous dielectric material in the gate region outside the RSDFT computational domain. A more realistic description of the channel boundary requires incorporating the dielectric region into the RSDFT domain, as discussed briefly in the following section.

The first step in building the EM representation for the RSDFT Hamiltonian is to construct an orthonormal basis \(\Phi\) from a representative set of Bloch eigenstates within the physically relevant energy interval \([ E_1, E_2]\). The number of orbitals and the size of the preliminary basis \(N_b\) are kept as small as possible, but the basis has to be large enough to ensure that any Bloch state \(\Psi _{\nu q}\) with the energy \(\varepsilon _{\nu }(q) \in \left[ E_1, E_2\right]\) can be represented by this basis

$$\begin{aligned} \Psi _{\nu q} = \Phi \psi _{\nu q} \end{aligned}$$
(2)

for all q with enough accuracy. The coefficients in the basis representation are found by solving the corresponding eigenvalue problem

$$\begin{aligned} h(q)\psi _{\nu q} = \varepsilon _{\nu }(q) \psi _{\nu q}, \end{aligned}$$
(3)

for the \(N_b \times N_b\) model Hamiltonian

$$\begin{aligned} h(q) = \Phi ^T H(q) \Phi . \end{aligned}$$
(4)

Hereafter, we employ matrix notation and omit the indices for grid points, basis states, etc. To distinguish them from the original real-space grid representation, all quantities (Hamiltonian, self-energy, wave function, etc.) in the basis representation are denoted by lowercase letters.
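Equations (2)–(4) can be illustrated with a small sketch. The toy Bloch Hamiltonian below (a one-dimensional finite-difference chain with a cosine potential) and the helper names `bloch_h`, `build_basis` and `model_h` are assumptions introduced for illustration only. The real basis \(\Phi\) is assembled from the real and imaginary parts of the Bloch states in the target window and orthonormalized by a singular value decomposition, which is one possible choice.

```python
import numpy as np

def bloch_h(H0, W, q):
    """Bloch Hamiltonian of Eq. (1)."""
    return H0 + W * np.exp(1j * q) + W.T * np.exp(-1j * q)

def build_basis(H0, W, qs, E1, E2, tol=1e-12):
    """Real orthonormal basis Phi spanning the Bloch states with energies in [E1, E2]."""
    cols = []
    for q in qs:
        eps, psi = np.linalg.eigh(bloch_h(H0, W, q))
        sel = psi[:, (eps >= E1) & (eps <= E2)]
        cols += [sel.real, sel.imag]          # real and imaginary parts
    U, s, _ = np.linalg.svd(np.hstack(cols), full_matrices=False)
    return U[:, s > tol * s[0]]               # drop linearly dependent directions

def model_h(Phi, H0, W, q):
    """Reduced N_b x N_b model Hamiltonian of Eq. (4)."""
    return Phi.T @ bloch_h(H0, W, q) @ Phi

# Toy cell (assumption): n grid points, spacing dx, cosine potential.
n, dx = 120, 0.1
t = 0.5 / dx**2
v = np.cos(2 * np.pi * np.arange(n) / n)
H0 = np.diag(2 * t + v) - t * (np.eye(n, k=1) + np.eye(n, k=-1))
W = np.zeros((n, n))
W[-1, 0] = -t
qs = np.linspace(0.0, np.pi, 5)

# Target window chosen to contain the third and fourth bands at the sampled q.
bands = np.array([np.linalg.eigvalsh(bloch_h(H0, W, q)) for q in qs])
E1, E2 = bands[:, 2].min() - 1e-6, bands[:, 3].max() + 1e-6

Phi = build_basis(H0, W, qs, E1, E2)
print("N_b =", Phi.shape[1], "of N_grid =", n)
```

At the sampled wavenumbers the reduced Hamiltonian reproduces the exact levels inside the window, because every selected Bloch state lies in the span of \(\Phi\). Between the sampled points, and for the remaining eigenvalues of \(h(q)\), no such guarantee exists.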

The eigenstates approximation provides a good representation for the bottom part of the spectrum of bounded operators. However, for an arbitrary energy interval the energy variation principle does not apply and the method fails due to the appearance of spurious states with close energies. The false states can be identified by large (\(\ge 1\)) values of the residual factor

$$\begin{aligned} \Delta _{\nu } \equiv \frac{\left| H(q)\Psi _{\nu q} - \varepsilon _{\nu }(q) \Psi _{\nu q}\right| }{\textrm{max} (|E_1|,|E_2|) \left| \Psi _{\nu q}\right| } \end{aligned}$$
(5)

which for the physical states is typically of the order of \(\sim 10^{-2}\) – \(10^{-3}\) depending on the numerical accuracy in the CGSD scheme and the number of the primary representative RSDFT orbitals. In principle, an accurate low rank model can be constructed by simply adding extra electronic states from the valence band below \(E_1\). Such a direct construction can easily be done in the case of two-dimensional materials, but it is impractical for realistic 3D nanostructures. The EM method [37] allows one to eliminate the spurious states from the targeted energy range \([E_1, E_2]\) and obtain the correct spectrum of the carriers while maintaining reasonable size of the basis representation. The idea of the method is to find a sequence of mutually orthogonal basis vectors such that adding a new vector \({\widetilde{\Phi }} \perp \Phi\) to the previous basis gives a new model Bloch Hamiltonian

$$\begin{aligned} {\widetilde{h}}(q) = {\left( { \begin{array}{cc} h (q) &{} X(q) \\ X^\dagger (q) &{} H_{{\widetilde{\Phi }} {\widetilde{\Phi }}}(q) \end{array} } \right) } \end{aligned}$$
(6)

with fewer eigenstates within the energy interval \(\left[ E_1, E_2 \right]\). Since adding a new basis state does not affect the physical states in Eq. (2) constructing the correct band structure can be viewed as a minimization problem. In practical simulations, it is convenient to use the basis of Bloch states Eq. (3). In this case \(h_{\nu \mu }(q) = \varepsilon _{\nu }(q) \delta _{\nu \mu }\) and X(q) represents the matrix elements

$$\begin{aligned} X_{\nu }(q) = \langle \xi _\nu (q) | {\widetilde{\Phi }}\rangle . \end{aligned}$$
(7)

Here we introduced a new vector

$$\begin{aligned} \xi _{\nu }(q) = P H(q) | \Psi _{\nu q} \rangle , \end{aligned}$$
(8)

where \(P = 1 - \Phi \Phi ^T\) is the projection operator onto the orthogonal complement \(V_{\Phi }^{\perp }\) of the subspace spanned by \(\Phi\).
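A short numerical sketch connects the residual factor of Eq. (5) with the vectors of Eq. (8). For an eigenstate of the reduced problem, \(\Phi \Phi ^T H \Psi _{\nu q} = \varepsilon _{\nu }(q) \Psi _{\nu q}\), so the residual \(H\Psi _{\nu q} - \varepsilon _{\nu }(q)\Psi _{\nu q}\) coincides with \(\xi _{\nu }(q)\). The toy Hamiltonian, the energy window and the names `bloch_h` and `residual_factors` are assumptions made for illustration.

```python
import numpy as np

def bloch_h(H0, W, q):
    """Bloch Hamiltonian of Eq. (1)."""
    return H0 + W * np.exp(1j * q) + W.T * np.exp(-1j * q)

def residual_factors(H, Phi, E1, E2):
    """Residual factor of Eq. (5) for every eigenstate of h = Phi^T H Phi."""
    eps, psi = np.linalg.eigh(Phi.T @ H @ Phi)    # Eq. (3)
    Psi = Phi @ psi                               # Eq. (2)
    HPsi = H @ Psi
    xi = HPsi - Phi @ (Phi.T @ HPsi)              # P H Psi, Eq. (8)
    resid = HPsi - Psi * eps                      # H Psi - eps Psi
    scale = max(abs(E1), abs(E2)) * np.linalg.norm(Psi, axis=0)
    return eps, np.linalg.norm(resid, axis=0) / scale, xi, resid

# Toy cell (assumption): 1D finite-difference chain with a cosine potential.
n, dx = 120, 0.1
t = 0.5 / dx**2
v = np.cos(2 * np.pi * np.arange(n) / n)
H0 = np.diag(2 * t + v) - t * (np.eye(n, k=1) + np.eye(n, k=-1))
W = np.zeros((n, n))
W[-1, 0] = -t
qs = np.linspace(0.0, np.pi, 5)
bands = np.array([np.linalg.eigvalsh(bloch_h(H0, W, q)) for q in qs])
E1, E2 = bands[:, 2].min() - 1e-6, bands[:, 3].max() + 1e-6

# Primary real basis from the Bloch states in the window (real and imaginary parts).
cols = []
for q in qs:
    e, u = np.linalg.eigh(bloch_h(H0, W, q))
    sel = u[:, (e >= E1) & (e <= E2)]
    cols += [sel.real, sel.imag]
U, s, _ = np.linalg.svd(np.hstack(cols), full_matrices=False)
Phi = U[:, s > 1e-12 * s[0]]

eps, delta, xi, resid = residual_factors(bloch_h(H0, W, qs[2]), Phi, E1, E2)
in_window = (eps >= E1) & (eps <= E2)
for e, d in zip(eps[in_window], delta[in_window]):
    print(f"eps = {e:.6f}  Delta = {d:.2e}")
```

Levels inside the window with a residual factor near machine precision are exact Bloch states at a sampled wavenumber; levels with a large residual factor would be candidates for spurious states. The threshold that separates the two is a modeling choice and depends on the accuracy of the primary orbitals.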

It has been shown [37] that the change of the band structure in Eq. (6) compared to the previous representation h(q) can be “measured” by the variational functional

$$\begin{aligned} F[{\widetilde{\Phi }}] = \sum _i \Delta N(q_i,E_1,E_2) + ( \Vert {\widetilde{\Phi }} \Vert ^2 -1 )^2, \end{aligned}$$
(9)

where the first term is a sum over a set of \(n_q\) representative wavenumbers \(q_i\) and \(\Delta N(\ldots )\) is defined as

$$\begin{aligned} \Delta N(q,E_1,E_2) \equiv \left\langle \frac{ \Vert \widetilde{\Phi }\Vert ^2 + A_2(z,q) }{z \Vert {\widetilde{\Phi }} \Vert ^2 - H_{{\widetilde{\Phi }} {\widetilde{\Phi }}} (q) - A_1(z,q)}(z - \varepsilon _c) \right\rangle , \end{aligned}$$
(10)

where

$$\begin{aligned} A_n(z,q) \equiv \sum _{\nu } \frac{\left| X_{\nu }(q) \right| ^2}{\left( z - \varepsilon _{\nu }\right) ^n};\quad n = 1,2 \end{aligned}$$
(11)

and \(\langle \dots \rangle\) stands for the average value

$$\begin{aligned} \left\langle f(z) \right\rangle \equiv \frac{1}{2n_z}\sum _{k=1}^{2n_z}f(z_k) \end{aligned}$$
(12)

over a set of points in the complex z-plane \(z_k = \varepsilon _c + \rho \textrm{e}^ {\frac{i\pi }{n_z}\left( k-\frac{1}{2} \right) }\) along the contour with center \(\varepsilon _c = (E_1+E_2)/2\) and radius \(\rho =(E_2-E_1)/2\). \(\Delta N(q,E_1,E_2)\) represents the change in the number of energy levels at the wavenumber q within the energy interval \([E_1,E_2]\) [37]. Finding the minimum of the variational functional Eq. (9) is equivalent to constructing the model Hamiltonian Eq. (6) with a lower density of states. The number of spurious states at each q within an arbitrary energy range can change by at most one, and the best solution is obtained when one unphysical state is eliminated (shifted deep into the valence band) at all \(q_i\), which corresponds to \(F [{\widetilde{\Phi }} ] \approx -n_q\).
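The level-counting property of Eqs. (10)–(12) can be checked numerically on a small example. The sketch below assumes the Bloch-state basis in which \(h(q)\) is diagonal; the three model levels, the coupling values and the function names `delta_n` and `count_change` are hypothetical and chosen only for illustration.

```python
import numpy as np

def delta_n(eps, X, H_tt, E1, E2, n_z=64, norm2=1.0):
    """Contour average of Eqs. (10)-(12).

    eps   : levels of the current model h(q) (diagonal Bloch-state basis)
    X     : coupling matrix elements X_nu(q) to the new basis vector
    H_tt  : diagonal element of H(q) for the new vector
    norm2 : squared norm of the new vector
    """
    eps_c, rho = 0.5 * (E1 + E2), 0.5 * (E2 - E1)
    k = np.arange(1, 2 * n_z + 1)
    z = eps_c + rho * np.exp(1j * np.pi * (k - 0.5) / n_z)
    x2 = np.abs(X) ** 2
    A1 = np.sum(x2 / (z[:, None] - eps), axis=1)        # Eq. (11), n = 1
    A2 = np.sum(x2 / (z[:, None] - eps) ** 2, axis=1)   # Eq. (11), n = 2
    f = (norm2 + A2) / (z * norm2 - H_tt - A1) * (z - eps_c)
    return f.mean()                                      # Eq. (12)

def count_change(eps, X, H_tt, E1, E2):
    """Direct count: levels of the bordered matrix Eq. (6) minus levels of h."""
    N = len(eps)
    h = np.zeros((N + 1, N + 1), dtype=complex)
    h[:N, :N] = np.diag(eps)
    h[:N, N] = X
    h[N, :N] = np.conj(X)
    h[N, N] = H_tt
    lam = np.linalg.eigvalsh(h)
    inside = lambda a: int(np.sum((a >= E1) & (a <= E2)))
    return inside(lam) - inside(np.asarray(eps))

# One level at 0.5 inside [0, 1], strongly coupled to a new high-energy vector.
eps = np.array([-2.0, 0.5, 3.0])
X = np.array([0.1, 3.0, 0.1])
dN = delta_n(eps, X, H_tt=10.0, E1=0.0, E2=1.0)
print(dN.real, count_change(eps, X, 10.0, 0.0, 1.0))
```

In this example the strongly coupled level is pushed below \(E_1\), so both the contour average and the direct diagonalization give a change of \(-1\). The contour estimate is reliable only when no level of either model lies close to the contour itself.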

Dealing with the RSDFT Hamiltonian in realistic nanostructures requires massively parallel computations. The numerical performance strongly depends on computational details, including the definition of the orthogonal variational subspace and the algorithms for choosing the parameters and the starting guess for effective minimization. In Ref. [37], we constructed \({\widetilde{\Phi }}\) as a linear combination of the vectors \(\xi _{\nu }(q)\), in which case the variational freedom is limited to a rather small \(\le 3N_b\) dimensional orthogonal subspace spanned by all the functions with maximum coupling \(X_{\nu }(q)\) over the Brillouin zone. In this case, Eq. (9) becomes a simple analytical function of the expansion coefficients which is suitable for numerical applications. The reasoning behind this definition is that shifting the energy of a particular spurious state \(\varepsilon _{\nu _0}\) at the boundaries of the Brillouin zone requires a large value of the corresponding coupling terms \(X_{\nu _0}\) in Eq. (6), which leads to a natural choice for the starting (real-valued) guess \({\widetilde{\Phi }} \sim \xi _{\nu _0}\) and for the variational subspace which incorporates all such functions. However, the RSDFT first principle calculations in systems with hundreds or thousands of atoms reveal strong dispersion of the spurious states, especially in nanostructures with asymmetrical cross section (e.g. nanosheets). In such cases the number of spurious states grows and the previous simple choice often becomes insufficient for eliminating unphysical states at all \(q_i\). In this work we generalize the previous method by allowing for more variational freedom and formulate a reliable strategy for adjusting parameters in the variational calculations. In many cases this significantly reduces the size of the final EM basis and facilitates the transport simulations.

Let us consider a particular unphysical state \(\varepsilon _{\nu _0}(q_i)\) in Eq. (3). In large systems, with rare possible exceptions, the diagonal term \(H_{\widetilde{\Phi }{\widetilde{\Phi }}} \gg \varepsilon _{\nu _0}\), and for a poorly chosen \({\widetilde{\Phi }} \in V_{\Phi }^{\perp }\) the coupling matrix elements are of the order \(X_{\nu _0} \sim 1/ \sqrt{N_{\textrm{grid}}}\), which does not produce any noticeable energy change \(\Delta \varepsilon _{\nu _0}\). In this case \(\Delta N(q, \ldots )\) reduces to a small positive contribution from a single level located outside the targeted energy interval and the minimization of \(F [{\widetilde{\Phi }}]\) fails to produce the desired solution.

In order to improve the numerical efficiency of the variational calculations we consider two types of states. The simplest candidates for \({\widetilde{\Phi }}\) are the vectors with the maximum matrix elements \(X_{\nu _0}\) at all \(q_i\). Introducing the real and imaginary components \(\xi _{\nu _0} = \xi _1 + i\xi _2\) we find

$$\begin{aligned} {\widetilde{\Phi }} = x_1 \xi _1 + x_2 \xi _2, \end{aligned}$$
(13)

where \(\left( {\begin{matrix} x_1 \\ x_2 \end{matrix}}\right)\) is the eigenvector corresponding to the largest eigenvalue of the \(2\times 2\) matrix:

$$\begin{aligned} \begin{pmatrix} \langle \xi _1 | \xi _1 \rangle &{} \langle \xi _1 | \xi _2 \rangle \\ \langle \xi _2 | \xi _1 \rangle &{} \langle \xi _2 | \xi _2 \rangle \end{pmatrix}. \end{aligned}$$
(14)

At the boundary of the Brillouin zone Eq. (13) reduces to the real-valued vectors \(\xi _{\nu }(0)\) and \(\xi _{\nu }(\pi )\) used previously to construct an orthogonal subspace for variational calculations [37]. One more trial vector of a similar kind can be obtained by maximizing all the coupling terms at once. In a similar manner, the solution is given by a linear combination of the \(2n_q\) vectors \(\xi _1,\xi _2\) at all \(q_i\) with the coefficients computed as the eigenvector corresponding to the largest eigenvalue of the corresponding \(2n_q \times 2n_q\) matrix.
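The construction of Eqs. (13) and (14) amounts to a small eigenvalue problem for the Gram matrix of the real and imaginary parts of \(\xi _{\nu _0}\). The sketch below illustrates this for a single wavenumber with a random complex vector standing in for \(\xi _{\nu _0}\); the function names `trial_vector` and `coupling2` are hypothetical.

```python
import numpy as np

def trial_vector(xi):
    """Real unit vector in span{Re xi, Im xi} with maximal |<xi|phi>|^2."""
    B = np.column_stack([xi.real, xi.imag])   # columns xi_1, xi_2
    G = B.T @ B                               # 2 x 2 Gram matrix, Eq. (14)
    lam, vec = np.linalg.eigh(G)
    x = vec[:, -1]                            # eigenvector of the largest eigenvalue
    phi = B @ x                               # Eq. (13)
    return phi / np.linalg.norm(phi), lam[-1]

def coupling2(xi, phi):
    """Squared coupling |<xi|phi>|^2 for a real vector phi."""
    return abs(np.vdot(xi, phi)) ** 2

# Random stand-in for xi_{nu_0}(q_i) (assumption, for illustration only).
rng = np.random.default_rng(0)
xi = rng.standard_normal(50) + 1j * rng.standard_normal(50)
phi, lam_max = trial_vector(xi)
print(coupling2(xi, phi), lam_max)
```

For a normalized real vector the squared coupling equals \((\xi _1 \cdot {\widetilde{\Phi }})^2 + (\xi _2 \cdot {\widetilde{\Phi }})^2\), whose maximum is the largest eigenvalue of the Gram matrix. When the imaginary part vanishes, as at the zone boundary, the result reduces to the normalized real vector \(\xi _1\).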

The second type of trial vectors can be obtained by estimating the maximum shift of the unphysical level \(\Delta \varepsilon _{\nu _0}\). The condition \(\det [\varepsilon - {\widetilde{h}} ] = 0\) gives

$$\begin{aligned} \varepsilon - \varepsilon _{\nu _0} = \frac{\left| X_{\nu _0}\right| ^2}{\eta _{\nu _0} - H_{{\widetilde{\Phi }} {\widetilde{\Phi }}}};\,\,\, \eta _{\nu _0}(\varepsilon ) \equiv \varepsilon -\sum _{\nu \ne \nu _0} \frac{\left| X_{\nu }\right| ^2}{\varepsilon - \varepsilon _{\nu }} \end{aligned}$$
(15)

and in the simplest approximation we set \(\eta _{\nu _0} \approx \varepsilon _{\nu _0}\). We introduce the new vector \(\eta = \eta _1 + i \eta _2 \equiv [ \varepsilon _{\nu _0} - {{\text {Re}}}\, H ]^{-1} \xi _{\nu _0}\) and find instead of Eq. (13)

$$\begin{aligned} {\widetilde{\Phi }} = x_1 \eta _1 + x_2 \eta _2, \end{aligned}$$
(16)

where \(\left( {\begin{matrix} x_1 \\ x_2 \end{matrix}}\right)\) is the eigenvector corresponding to the largest eigenvalue of the \(2\times 2\) matrix:

$$\begin{aligned} \begin{pmatrix} \langle \xi _1 | \eta _1 \rangle &{} \langle \xi _1 | \eta _2 \rangle \\ \langle \xi _2 | \eta _1 \rangle &{} \langle \xi _2 | \eta _2 \rangle \end{pmatrix}. \end{aligned}$$
(17)

Again, one might consider an additional vector of similar type by maximizing the energy shifts at all \(q_i\) at once. In computing the auxiliary vector \(\eta\), we do not require high accuracy and use a simple estimate obtained by a moderate number of iterations in a standard CG scheme for real-valued symmetric matrix.
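For completeness, the step from the secular condition to Eq. (15) can be written out as follows. The sketch assumes the Bloch-state basis in which \(h_{\nu \mu } = \varepsilon _{\nu } \delta _{\nu \mu }\), a normalized vector \(\Vert {\widetilde{\Phi }} \Vert = 1\), and \(\varepsilon \ne \varepsilon _{\mu }\) for all \(\mu\); the q-dependence is suppressed.

$$\begin{aligned} \det \left[ \varepsilon - {\widetilde{h}} \right]&= \prod _{\mu } \left( \varepsilon - \varepsilon _{\mu } \right) \left[ \varepsilon - H_{{\widetilde{\Phi }} {\widetilde{\Phi }}} - \sum _{\nu } \frac{\left| X_{\nu }\right| ^2}{\varepsilon - \varepsilon _{\nu }} \right] = 0, \\ \varepsilon - \sum _{\nu \ne \nu _0} \frac{\left| X_{\nu }\right| ^2}{\varepsilon - \varepsilon _{\nu }} - H_{{\widetilde{\Phi }} {\widetilde{\Phi }}}&= \frac{\left| X_{\nu _0}\right| ^2}{\varepsilon - \varepsilon _{\nu _0}}, \\ \left( \varepsilon - \varepsilon _{\nu _0} \right) \left( \eta _{\nu _0} - H_{{\widetilde{\Phi }} {\widetilde{\Phi }}} \right)&= \left| X_{\nu _0}\right| ^2 . \end{aligned}$$

The first line is the determinant of the bordered matrix in Eq. (6) expressed through its Schur complement, the second line isolates the term with \(\nu = \nu _0\), and the third line uses the definition of \(\eta _{\nu _0}(\varepsilon )\). With the approximation \(\eta _{\nu _0} \approx \varepsilon _{\nu _0}\) the level shift is estimated as

$$\begin{aligned} \Delta \varepsilon _{\nu _0} \approx \frac{\left| \langle \xi _{\nu _0} | {\widetilde{\Phi }} \rangle \right| ^2}{\varepsilon _{\nu _0} - \langle {\widetilde{\Phi }} | H | {\widetilde{\Phi }} \rangle }, \end{aligned}$$

which is negative whenever \(H_{{\widetilde{\Phi }} {\widetilde{\Phi }}} > \varepsilon _{\nu _0}\) and grows in magnitude with the coupling to \(\xi _{\nu _0}\).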

Equations (13) and (16) give an initial set of trial states which are likely to have a strong effect on the band structure within the targeted energy interval. Similar to Ref. [37] one can now introduce an auxiliary orthonormal basis for the Krylov subspace spanned by these vectors and obtain from Eq. (10) a rational variational function of the expansion coefficients. However, the corresponding basis is rather large and the variational freedom restriction is no longer justified. Instead, in the RSDFT simulations we use the original definition Eq. (9) and construct the new basis state \({\widetilde{\Phi }} \in V_{\Phi }^{\perp }\) without additional limitations. We also introduce a small modification of the variational functional by allowing for q-dependent energy target intervals in the definition of \(\langle \dots \rangle\) in Eq. (10). The calculations are most effective when each term \(\Delta N(q_i,\ldots )\) in Eq. (9) returns a negative value \(\approx -1\), which corresponds to shifting the unphysical energy at the given \(q_i\) outside the interval \([\varepsilon _c - \rho , \varepsilon _c + \rho ]\) enclosed by the contour. In a typical case of large \(H_{{\widetilde{\Phi }}{\widetilde{\Phi }}}\) the energy shift is negative and much better results are obtained from the variational functional

$$\begin{aligned} F[{\widetilde{\Phi }}] = \sum _i \Delta N(q_i,\varepsilon _1(q_i), \varepsilon _2(q_i)) + ( \Vert {\widetilde{\Phi }} \Vert ^2 -1 )^2, \end{aligned}$$
(18)

where at each q the target interval is placed above the spurious level \(\varepsilon _{\nu _0}(q)\) which needs to be eliminated. Thus, we define \(\varepsilon _1(q_i) = \varepsilon _{\nu _0}(q_i)\), \(\varepsilon _2(q_i) = \varepsilon _{\mu _0}(q_i)\) where \(E_1< \varepsilon _{\nu _0}(q_i) < E_2\) and \(\varepsilon _{\mu _0}(q_i)\) is the next spurious level in the reduced model Hamiltonian. If there are no unphysical states \(\varepsilon _{\nu }(q_i) \in [E_1, E_2]\), the original definition \(\varepsilon _{1,2}(q_i) = E_{1,2}\) is used.

Minimization of the variational functional \(F [{\widetilde{\Phi }}]\) is performed by the standard conjugate gradient algorithm [49, 50]. At each step the new conjugate direction is found as a linear combination of the previous direction and the residual vector \(\delta F / \delta {\widetilde{\Phi }}\) calculated at the local directional minimum of the previous step. The residual vector is calculated in the form

$$\begin{aligned} \frac{\delta F}{\delta {\widetilde{\Phi }}}= & {} \alpha _0 {\widetilde{\Phi }} + \alpha _1 H_0 {\widetilde{\Phi }} + \alpha _1 W {\widetilde{\Phi }} +\alpha _1 W^T {\widetilde{\Phi }} \nonumber \\+ & {} \sum _{i,\nu } \left( \beta _{i\nu }\xi _{\nu }(q_i) + \mathrm {c.c.}\right) , \end{aligned}$$
(19)

where the coefficients \(\alpha\) and \(\beta\) are found from Eqs. (9)–(11). At each step one only needs to perform simple RSDFT operations \(\sim H {\widetilde{\Phi }}\) since all the other terms are linear combinations of the previously computed RSDFT vectors with known coefficients. Most of the large matrix operations in the expansion coefficients and/or the line minimization are performed only once, and the most time-consuming part of the numerical simulations is the optimization of the initial trial states in Eq. (16).

Let us now summarize the main steps in the variational calculation. The first step is to compute the exact Bloch orbitals of the RSDFT Hamiltonian Eq. (1) at \(n_q\) equidistant wavenumbers \(0 \le q_i \le \pi\). We use the real and imaginary parts of all the eigenstates with energies within the targeted interval \(\left[ E_1, E_2\right]\) to form a primary \(N_b\) dimensional real-valued basis \(\Phi\) [37]. We then compute the new set of RSDFT vectors which defines a direct sum \(\Phi _1 \equiv PH_0\Phi \oplus PW\Phi \oplus PW^T\Phi\) and which is used further to form all the vectors \(\xi _{\nu }(q)\) in Eq. (8). The variational calculations of the additional basis vectors in the EM model proceed as follows.

  (1) For each wavenumber \(q_i\) solve the reduced eigenvalue problem Eq. (3) to obtain \(N_{b}\) eigenstates in the basis representation Eq. (2). Calculate the residual values Eq. (5) and find the lowest unphysical level \(E_1< \varepsilon _{\nu _0}(q_i) < E_2\). Use \(\xi _{\nu _0}(q_i)\) to construct the trial vectors Eqs. (13, 16).

  (2) From the obtained set of vectors select the trial state \({\widetilde{\Phi }}\) which returns the minimum value of the original variational functional \(F[{\widetilde{\Phi }}]\), Eq. (9). Compute the new set of three RSDFT vectors \({\widetilde{\Phi }}_1 \equiv PH_0{\widetilde{\Phi }} \oplus PW\widetilde{\Phi }\oplus PW^T{\widetilde{\Phi }}\).

  (3) For each \(q_i\) solve the reduced eigenvalue problem in the basis representation \(\Phi \oplus {\widetilde{\Phi }}\) and redefine \(\varepsilon _{\nu _0}(q_i)\) as well as the next unphysical level \(\varepsilon _{\mu _0}(q_i)\). Define the variational functional in Eq. (18) by setting \(\varepsilon _1(q_i) = \varepsilon _{\nu _0}(q_i)\) and \(\varepsilon _2(q_i) = \varepsilon _{\mu _0}(q_i)\). If there is no unphysical level \(\varepsilon _{\nu _0}(q_i) \in \left[ E_1, E_2 \right]\) set \(\varepsilon _{1,2}(q_i) = E_{1,2}\).

  (4) Minimize the variational functional \(F[{\widetilde{\Phi }} ]\) by the CG algorithm as outlined above.

Repeat the steps (2) and (3) to reach the minimum value of \(F[{\widetilde{\Phi }}]\).

Redefine the model \(\Phi {\oplus } {\widetilde{\Phi }} \rightarrow \Phi\), \(\Phi _1 {\oplus } {\widetilde{\Phi }}_1 \rightarrow \Phi _1\), \(N_b \rightarrow N_b + 1\) and repeat steps (1)–(4) until no spurious states remain at any \(q_i\).

As a useful auxiliary option, one may add a band gap verification at step (2). This is done by solving the eigenvalue problem for computing the self-energy (see below) in the model \(\Phi \oplus {\widetilde{\Phi }}\) below the conduction band edge and counting the number of open channels. The selection of the trial state in (2) is then made among the candidates with the smallest number of open channels. This option helps to ensure that steep unphysical branches are not missed by the simulator even if the q-grid is not dense enough.
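The preparatory stage of the procedure above (forming the primary real-valued basis from exact Bloch states and flagging candidate spurious levels) can be sketched on a toy model. Everything numerical here is an assumption made for illustration: the random matrices `H0` and `W`, the window `E1, E2`, the Bloch form \(H(q) = H_0 + W e^{iq} + W^T e^{-iq}\), and the use of the plain residual norm \(\Vert H\psi - \varepsilon \psi \Vert\) as a stand-in for the residual of Eq. (5), which is not reproduced in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
n_grid, n_q = 150, 13                          # toy supercell size, number of q-points
A = rng.normal(size=(n_grid, n_grid))
H0 = np.diag(np.arange(n_grid, dtype=float)) + 0.1 * (A + A.T) / 2   # toy intra-cell block
W = 0.15 * rng.normal(size=(n_grid, n_grid))   # toy inter-cell coupling H_{n,n+1}

def bloch_h(q):
    """Assumed Bloch Hamiltonian H(q) = H0 + W e^{iq} + W^T e^{-iq} (Hermitian)."""
    return H0 + W * np.exp(1j * q) + W.T * np.exp(-1j * q)

E1, E2 = -0.5, 1.5                             # targeted energy window (toy values)
q_grid = np.linspace(0.0, np.pi, n_q)

# Real and imaginary parts of all exact eigenstates inside the window.
parts = []
for q in q_grid:
    eps, psi = np.linalg.eigh(bloch_h(q))
    inside = (eps > E1) & (eps < E2)
    parts += [psi[:, inside].real, psi[:, inside].imag]
raw = np.hstack(parts)

# Orthonormalize and drop linearly dependent directions (SVD with a tolerance).
U, s, _ = np.linalg.svd(raw, full_matrices=False)
Phi = U[:, s > 1e-10 * s[0]]                   # primary basis with N_b columns

def reduced_levels(q):
    """Levels of the reduced problem inside the window and their residual norms."""
    Hq = bloch_h(q)
    eps, c = np.linalg.eigh(Phi.T @ Hq @ Phi)
    psi = Phi @ c
    res = np.linalg.norm(Hq @ psi - psi * eps, axis=0)
    inside = (eps > E1) & (eps < E2)
    return eps[inside], res[inside]

# Between the sampled wavenumbers, levels with a large residual are spurious candidates.
q_mid = 0.5 * (q_grid[:-1] + q_grid[1:])
n_spurious = [int(np.sum(reduced_levels(q)[1] > 1e-2)) for q in q_mid]
print("N_b =", Phi.shape[1], "| spurious candidates per midpoint:", n_spurious)
```

At the sampled wavenumbers the reduced problem reproduces the exact levels by construction, because each exact eigenvector lies in the complex span of the real basis. The residual threshold `1e-2` is arbitrary and would need tuning in a real calculation.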

Fig. 3

(Color online) Constructing the EM basis in a 1 nm-diameter Si wire. The solid red lines represent the band structure in the reduced model before (left) and after (right) minimizing the number of states

Fig. 4

(Color online) The same as Fig. 3 for an 8 nm-diameter SiNW

Figures 3 and 4 show two examples of the EM construction in Si wires with diameters of 1 nm and 8 nm. The black points in these figures represent the exact eigenvalues of the Bloch Hamiltonians at \(n_q = 13\) representative wavenumbers. The left (right) panels show the band structure in the reduced representation before (after) the variational calculations. In these two examples, the correct transport models are obtained after 7 and 64 iteration steps, reducing the size of the scattering problem by factors of \(N_{b}/N_{\textrm{grid}} \approx 5 \times 10^{-3}\) and \(4 \times 10^{-4}\), respectively.

3 R-matrix method for ballistic quantum transport

The atomistic first-principles transport simulations require the full quantum description of the non-equilibrium electronic state of the device. In the two-probe configuration the system under consideration consists of a central area (device) and two semi-infinite electrodes (leads) which are periodic in the transport direction. The part of the device at the contacts with the leads is assumed to be a seamless extension of the atomistic structure of the corresponding electrodes. In the NEGF formalism, the electric current and density are calculated from the Green’s functions, which need to be found from the following system of equations

$$\begin{aligned} G^{R}(E)= & {} \left[ E - H_D - \Sigma ^R(E)\right] ^{-1}, \end{aligned}$$
(20)
$$\begin{aligned} G^{<}(E)= & {} G^R(E)\Sigma ^{<}(E){G^R}^+(E), \end{aligned}$$
(21)

where \(H_D\) represents the Kohn–Sham Hamiltonian for the electrons in the device area and \(\Sigma ^{R,<}(E)\) are the self-energy terms which describe possible scattering mechanisms. \(H_D\) also includes a potential term which depends on the applied bias and the mobile density distribution and which needs to be computed self-consistently. Here we only consider ballistic transport, in which case the retarded self-energy \(\Sigma ^{R}\) comes from the outgoing boundary conditions at the contacts and the lesser self-energy term \(\Sigma ^{<}\) describes the flow of mobile carriers from the corresponding reservoirs. In the ballistic regime, the Green’s functions are computed independently for each energy value and \(\Sigma ^{R,<}\) are spatially localized in the contact region, which greatly reduces the computational cost. A current-carrying device state can be computed in terms of the boundary part of the retarded Green’s function, which is equivalent to solving a one-particle scattering problem. In the real-space representation, the electronic states are calculated on a three-dimensional spatial grid and the electronic Hamiltonian is represented by a sparse (or block tridiagonal) matrix. The transport simulations are often conducted by the recursive Green’s function algorithm [51, 52] or the quantum transmitting boundary method [25]. The RSDFT Hamiltonian generally has a long-range nondiagonal part, and the recursive algorithms require large complex-valued matrix inversions which yield prohibitively large computational loads. On the other hand, the R-matrix approach [34] only uses real-valued arrays and enables one to optimize the size of the inversion operations.

The retarded Green’s function in Eq. (20) can be calculated in terms of the corresponding Green’s function \(G_0\) in the closed system without leads. In the ballistic simulations, one only needs a small boundary part of the Green’s function, and all the results can be obtained from the real-valued symmetric R-matrix

$$\begin{aligned} R(E) \equiv P G_0(E) P = P\frac{1}{E - H_{D}}P, \end{aligned}$$
(22)

where P is the boundary projector to the contact region of the device D with non-vanishing Hamiltonian matrix elements \(H_{DL}\) where L indicates the grid points outside the device area. The symbol P should not be confused with the projection operator in the previous section where we studied trial functions orthogonal to the EM space.

For example, the drain current is given by the Landauer formula [29] (in a.u.)

$$\begin{aligned} I = 2 \times \int \frac{d E}{2 \pi } T(E)[f_1(E) - f_2(E)], \end{aligned}$$
(23)

where

$$\begin{aligned} T = \textrm{Tr} [ \Gamma _1 G^R \Gamma _2 {G^R}^\dagger ] \end{aligned}$$
(24)

is the ballistic transmission coefficient, \(\Gamma _p = i( \Sigma _p^R - {\Sigma _p^R}^\dagger )\) is the scattering rate in the p-th probe and \(f_p\) is the corresponding Fermi factor.
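As a quick sanity check of the normalization in Eq. (23), the sketch below integrates the Landauer formula numerically for a single perfectly transmitting channel, \(T(E) \equiv 1\). In that case the integral of \(f_1 - f_2\) equals \(\mu _1 - \mu _2\), so the current should reduce to \(V/\pi\), i.e. the conductance quantum \(2e^2/h = 1/\pi\) in atomic units. The function names, the energy grid and the conversion constants are choices made for this illustration only.

```python
import numpy as np

def fermi(E, mu, kT):
    """Fermi factor 1/(1 + exp((E - mu)/kT)), written with tanh to avoid overflow."""
    return 0.5 * (1.0 - np.tanh((E - mu) / (2.0 * kT)))

def landauer_current(E, T, mu1, mu2, kT):
    """Eq. (23) in atomic units, evaluated with the trapezoidal rule on the grid E."""
    y = T * (fermi(E, mu1, kT) - fermi(E, mu2, kT)) / np.pi   # 2 x dE/(2 pi)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(E))

kT = 300 * 3.1668e-6            # 300 K in hartree (k_B ~ 3.1668e-6 E_h/K)
V_sd = 0.1 / 27.2114            # 0.1 V source-drain bias in hartree
E = np.linspace(-0.5, 0.5, 20001)
T_one = np.ones_like(E)         # one perfectly transmitting channel (toy input)

I = landauer_current(E, T_one, 0.0, -V_sd, kT)
print(I * np.pi / V_sd)         # ratio to the ideal single-channel value V/pi
```

In an actual simulation `T_one` would be replaced by the transmission function of Eq. (24) sampled on the energy quadrature.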

From the boundary projection of Dyson’s equation one finds

$$\begin{aligned} PG^RP = ( 1 - R \Sigma ^R )^{-1}R \end{aligned}$$
(25)

and, since the self-energy is spatially localized \(\Sigma ^R = \Sigma ^R P = P\Sigma ^R\), the transmission coefficients Eq. (24) can be computed in terms of the R-matrix. Although the real-valued Hamiltonian matrix \(H_D\) in Eq. (22) is often very large, the R-matrix can be calculated recursively, like in other popular computational schemes [34]. For completeness, we outline here the R-matrix propagation scheme in the most general form.
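Equations (22), (24) and (25) can be checked on a minimal example. The sketch below assumes a uniform one-dimensional tight-binding chain as the device, attached to two semi-infinite chains with the same hopping, for which the retarded self-energy is known in closed form. The model and all variable names are illustrative assumptions and not the RSDFT setup of the paper. For such an ideal wire the transmission should equal one inside the band.

```python
import numpy as np

N, t, E = 6, -1.0, 0.3                          # chain length, hopping, energy (toy values)
H_D = t * (np.eye(N, k=1) + np.eye(N, k=-1))    # closed device: real symmetric matrix

# Retarded self-energy of a semi-infinite chain with the same hopping (valid for |E| < 2|t|).
sigma = 0.5 * (E - 1j * np.sqrt(4 * t**2 - E**2))

# Reference: full retarded Green's function, Eq. (20).
Sigma_full = np.zeros((N, N), dtype=complex)
Sigma_full[0, 0] = Sigma_full[-1, -1] = sigma
G_full = np.linalg.inv(E * np.eye(N) - H_D - Sigma_full)

# R-matrix route: real-valued inverse of the closed system projected onto the contacts.
b = [0, N - 1]                                                  # boundary (contact) sites
R = np.linalg.inv(E * np.eye(N) - H_D)[np.ix_(b, b)]            # Eq. (22)
Sigma_b = np.diag([sigma, sigma])
G_b = np.linalg.solve(np.eye(2) - R @ Sigma_b, R)               # Eq. (25)

# Scattering rates Gamma_p = i (Sigma_p - Sigma_p^dagger) and transmission, Eq. (24).
Gamma1 = np.diag([-2 * sigma.imag, 0.0])
Gamma2 = np.diag([0.0, -2 * sigma.imag])
T = np.trace(Gamma1 @ G_b @ Gamma2 @ G_b.conj().T).real
print(T)
```

Only the \(2 \times 2\) boundary block enters the complex-valued algebra; the large inverse in Eq. (22) stays real. The energy must not coincide with an eigenvalue of the closed device, where the R-matrix has a pole.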

One may consider an arbitrary fragment \(C\subset D\) of the device area and define the corresponding local R-matrix \(R_{ cc}\) by an equation similar to Eq. (22) with P, \(H_{D}\) being substituted by the corresponding operators \(P_c\), \(H_{c}\). One next adds a small area \(a \ll C\) from the rest of the device and calculates the new R-matrix as the boundary projection \({\widetilde{R}} = (P_c + P_a){\widetilde{G}} (P_c + P_a)\) of the Green’s function for the closed system \({\tilde{C}} = C + a\)

$$\begin{aligned} {\widetilde{G}} = \left( \begin{array}{cc} {\widetilde{G}}_{cc} &{}{\widetilde{G}}_{ca} \\ {\widetilde{G}}_{ac} &{}{\widetilde{G}}_{aa} \end{array} \right) = \left( \begin{array}{cc} E - H_{c} &{} -H_{ca} \\ -H_{ac} &{} E-H_{a} \end{array} \right) ^{-1}. \end{aligned}$$
(26)

The calculations are straightforward. Two blocks \({\widetilde{G}}_{cc}\) and \({\widetilde{G}}_{ac}\) in Eq. (26) are calculated in terms of matrix products which contain the Green’s function \(G_{cc} = [E-H_{c}]^{-1}\). Taking the boundary projection turns \(G_{cc}\) into \(R_{cc}\) and gives the final result

$$\begin{aligned} {\widetilde{R}}_{aa}&= \left( E - H_{aa} - H_{ac}R_{cc}H_{ca} \right) ^{-1}, \end{aligned}$$
(27a)
$$\begin{aligned} {\widetilde{R}}_{ac}&= {\widetilde{R}}_{aa} H_{ac} R_{cc};\,\,\,\,\,\, {\widetilde{R}}_{ca} = {\widetilde{R}}_{ac}^T, \end{aligned}$$
(27b)
$$\begin{aligned} {\widetilde{R}}_{cc}&= R_{cc} + {\widetilde{R}}_{ca} H_{ac}R_{cc}. \end{aligned}$$
(27c)

Equation (27) is just the R-matrix propagation recipe Eq. (19) in Ref. [34] written in a more compact form. In the above equations, the main Hamiltonian matrix operation \(X_{ac} \equiv H_{ac}R_{cc}\) needs to be performed only once. In practical applications one can keep the size of \(a\) small and use a column-like distribution of the R-matrix among the processors. Computing \(X_{ac}\) scales as \(\sim N_a N_c/N_{\textrm{CPU}}\) with a proportionality factor depending on the RSDFT parameters. Computing the total correction \(X_{ac}H_{ca}\) in Eq. (27a) and \({\widetilde{R}}_{ca}\) in Eq. (27b) also requires inter-node communications, which scale in a similar way and do not add computational burden in practice. The most time-consuming part of the simulations is the real-valued rectangular matrix multiplication \({\widetilde{R}}_{ca} X_{ac}\) in Eq. (27c), which in parallel computations scales as \(N_aN_c^2/N_{\textrm{CPU}}\). This can be performed effectively even for a large R-matrix by making use of a highly tuned linear algebra library (BLAS) [47]. Finally, one redefines the boundary \(P_c + P_a \rightarrow P_{\tilde{c}}\) and reduces \({\widetilde{R}}\) by eliminating the rows and columns for the redundant points \(\notin P_{{\tilde{c}}}\). The propagation starts from the R-matrix \(R_{aa} = \left( E- H_{a}\right) ^{-1}\) for an arbitrary initial fragment.
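The propagation step of Eq. (27), including the boundary reduction, can be sketched serially for a small banded matrix. This is a minimal illustration under stated assumptions: the random banded `H`, the fragment layout, the `contact` index set and the helper names `add_fragment` and `propagate` are all hypothetical, and the energy is placed just above the spectrum only to keep every intermediate inverse well conditioned in this toy. No parallel distribution is attempted.

```python
import numpy as np

def add_fragment(R_cc, H_ac, H_aa, E):
    """One propagation step, Eqs. (27a)-(27c), with H_ca = H_ac^T."""
    X_ac = H_ac @ R_cc                                             # computed once, reused
    R_aa = np.linalg.inv(E * np.eye(len(H_aa)) - H_aa - X_ac @ H_ac.T)   # (27a)
    R_ac = R_aa @ X_ac                                             # (27b)
    R_cc_new = R_cc + R_ac.T @ X_ac                                # (27c)
    return np.block([[R_cc_new, R_ac.T], [R_ac, R_aa]])

def propagate(H, E, fragments, contact):
    """Add fragments one by one, keeping only boundary rows and columns of R."""
    idx = np.asarray(fragments[0])
    R = np.linalg.inv(E * np.eye(idx.size) - H[np.ix_(idx, idx)])
    for k in range(1, len(fragments)):
        a = np.asarray(fragments[k])
        R = add_fragment(R, H[np.ix_(a, idx)], H[np.ix_(a, a)], E)
        idx = np.concatenate([idx, a])
        rest = np.array([i for f in fragments[k + 1:] for i in f], dtype=int)
        # Keep points still coupled to not-yet-added points, plus the contact points.
        couples = np.abs(H[np.ix_(idx, rest)]).sum(axis=1) > 0
        keep = couples | np.isin(idx, contact)
        R, idx = R[np.ix_(keep, keep)], idx[keep]
    return idx, R

rng = np.random.default_rng(1)
N, band = 24, 3
A = rng.normal(size=(N, N))
H = (A + A.T) / 2
i, j = np.indices((N, N))
H[np.abs(i - j) > band] = 0.0                   # finite-range (banded) toy Hamiltonian
E = np.linalg.eigvalsh(H).max() + 0.5           # toy choice: above the spectrum
fragments = [np.arange(s, s + 4) for s in range(0, N, 4)]
contact = np.r_[0:3, N - 3:N]                   # points that would couple to the leads

idx, R = propagate(H, E, fragments, contact)
G0 = np.linalg.inv(E * np.eye(N) - H)
print(np.abs(R - G0[np.ix_(idx, idx)]).max())   # deviation from the direct inverse
```

Because the range of `H` is finite, the stored R-matrix never grows beyond the contact points plus one boundary layer, which is the property the text relies on.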

After the full R-matrix is computed, one can add the contact self-energies and obtain the boundary part of the retarded Green’s function Eq. (25). For the p-th probe of the nanowire the corresponding self-energy term is calculated as [53]

$$\begin{aligned} \Sigma _p^R =H_{ DL_p} \overrightarrow{\chi }_p \overrightarrow{Z}_p \, \overrightarrow{\chi }_p^{-1}, \end{aligned}$$
(28)

where \(\{\overrightarrow{\chi }_p, \overrightarrow{Z}_p\}\) is the outgoing/decaying solution of the generalized scattering eigenvalue problem [24] in the p-th probe. Even in the simplest cases, such as the one in Fig. 1, solving this problem in the original real-space grid representation is extremely difficult due to the large size of the supercell [54]. In contrast, there is no difficulty in computing the 2\(N_b\) possible scattering states in the EM representation. The choice of the physical characteristics of the probes outside the conducting channel is a matter of convenience. The only essential condition is that the difference in the materials does not cause noticeable reflection from the contact area. The probes with the low-rank EM Hamiltonian satisfy this condition by construction. As a test, we compute the transmission function in the 1 nm-diameter Si nanowire (SiNW) of Fig. 1. Adding the EM contacts can be understood as an extra step of the R-matrix propagation. One can consider the final fragment as a composite of the unconnected unit cells at the two contacts, and Eq. (27a) gives the \(2 \times 2\) block R-matrix in the EM representation

$$\begin{aligned} r = \begin{pmatrix} E - h_1 - p_1 H_{LD} R H_{DL} p_1 &{} - p_1 H_{LD} R H_{DL} p_2 \\[1ex] - p_2 H_{LD} R H_{DL} p_1 &{} E - h_2 - p_2 H_{LD} R H_{DL} p_2 \end{pmatrix}^{-1} \end{aligned}$$
(29)

where \(h_i\) and \(p_i\) stand for the unit cell EM Hamiltonian and the corresponding projector at the EM contact 1 (left) or 2 (right). The transfer part of the Hamiltonian at the contacts with the leads is determined by the one-sided EM basis transformation \(H_{DL}p_2 = W \Phi\), \(H_{DL}p_1 = W^T \Phi\) and their transposes. The transmission function can now be calculated without referring to the original real-space grid representation. Figure 5 illustrates the outlined procedure. The computed transmission function exhibits a step-function-like behavior, which clearly indicates that the contacts between the original first-principles model and the probes in the EM representation cause no unphysical backscattering.

Fig. 5

(Color online) (a) Mixing RSDFT and the contact self-energy in the EM representations. (b) Transition probability and (c) the band structure in a 1 nm-diameter ideal Si wire

It has also been shown that the R-matrix provides all the necessary information for computing the diagonal part of the lesser Green’s function \(G^{<}\). The boundary projection of the retarded Green’s function and the sequence of the local R-matrix elements \(\widetilde{R}_{ac}\), \({\widetilde{R}}_{aa}\) in Eq. (27) can be used to construct any scattering solution in the reverse order. Thus, one can calculate the mobile charge distribution and obtain self-consistently the electric potential in the device area. Application of the R-matrix method to the self-consistent device simulations is discussed briefly in the next section.

The above matching procedure is equivalent to computing the retarded Green’s function in an RSDFT + EM composite system with the boundary unit cells being treated in the EM representation. A unit cell at the contact corresponds to the diagonal block \(h + \sigma ^R\) in the composite Hamiltonian matrix coupled to the rest of the system by the off-diagonal blocks \(W \Phi\) and \(\Phi ^T W^T\). Calculating the inverse in Eq. (20), for the part of the system in the original real-space representation one obtains the Green’s function in the same form as Eq. (20) with the RSDFT self-energy correction

$$\begin{aligned} \Sigma ^R = W \Phi \left( E - h - \sigma ^R\right) ^{-1} \Phi ^T W^T. \end{aligned}$$
(30)

Thus, one can use the EM representation only as a tool to compute the self-energy terms and treat the entire simulation domain using the original DFT basis. Note that Eq. (30) differs from the direct transformation \({\widetilde{\Sigma }}^R = \Phi \sigma ^R \Phi ^T\) suggested in Ref. [19]. Using the exact identity \(\sigma ^R = w ( E - h - \sigma ^R)^{-1} w^T\), which is valid for any self-energy in the form of Eq. (28), we obtain

$$\begin{aligned} {\widetilde{\Sigma }}^R = \Phi \Phi ^T \Sigma ^R \Phi \Phi ^T. \end{aligned}$$
(31)

Thus, even though the EM basis transformations of both \(\widetilde{\Sigma }^R\) and \(\Sigma ^R\) give the same result, in the original representation \({\widetilde{\Sigma }}^R\) contains an extra projector \(\Phi \Phi ^T \ne \mathbb {1}\) which should be avoided.
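The relation between Eqs. (30) and (31) is purely algebraic and can be verified numerically on a toy model. The sketch below assumes \(h = \Phi ^T H_0 \Phi\) and \(w = \Phi ^T W \Phi\) for the EM blocks, uses random matrices in place of the RSDFT operators, and obtains a \(\sigma ^R\) satisfying the identity quoted above by a plain fixed-point iteration at a complex energy with weak coupling (a convenience that makes the iteration contractive, not the eigenvalue-based construction of Eq. (28)).

```python
import numpy as np

rng = np.random.default_rng(2)
n_grid, n_b = 12, 4
Phi, _ = np.linalg.qr(rng.normal(size=(n_grid, n_b)))   # orthonormal real EM basis (toy)
A = rng.normal(size=(n_grid, n_grid))
H0 = (A + A.T) / 2
W = rng.normal(size=(n_grid, n_grid))
W *= 0.5 / np.linalg.norm(W, 2)              # weak coupling keeps the iteration contractive
h, w = Phi.T @ H0 @ Phi, Phi.T @ W @ Phi     # assumed EM blocks

z = 0.2 + 1.0j                               # complex energy (toy value)
sigma = np.zeros((n_b, n_b), dtype=complex)
for _ in range(200):                         # fixed point of sigma = w (z - h - sigma)^{-1} w^T
    sigma = w @ np.linalg.inv(z * np.eye(n_b) - h - sigma) @ w.T

g = np.linalg.inv(z * np.eye(n_b) - h - sigma)
Sigma = W @ Phi @ g @ Phi.T @ W.T            # Eq. (30)
Sigma_tilde = Phi @ sigma @ Phi.T            # direct transformation of sigma
P = Phi @ Phi.T                              # projector onto the EM space

print(np.abs(Sigma_tilde - P @ Sigma @ P).max())   # Eq. (31): should vanish
print(np.abs(Sigma_tilde - Sigma).max())           # generally non-zero
```

Both self-energies have the same EM-space projection, `Phi.T @ Sigma @ Phi`, but only `Sigma` retains the components of `W @ Phi` that lie outside the span of the basis.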

4 Device simulations in the EM representation

4.1 R-matrix method in quasi-one-dimensional systems

In realistic nanostructures, computing the R-matrix in the original real-space grid representation may become prohibitively time-consuming. Although the size of the local R-matrix remains nearly constant in the course of the propagation, the boundary of an arbitrary domain is determined by the range of the nonlocal interaction in the RSDFT Hamiltonian. Within the local density approximation (LDA), the decisive factor is the highest order in the finite-difference expression for the kinetic energy operator [36], and for a typical choice of parameters in the RSDFT geometry optimization simulations the kinetic energy has a long range of up to \(6\,\) a.u. As a result, the size of the boundary region is of the same order of magnitude as the supercell of the wire. For example, the supercell of the 8 nm-diameter \(\langle 100 \rangle\) SiNW in Fig. 2 with the cutoff energy \(\sim 20 \, E_h\) contains more than \(N_{\textrm{grid}} \sim 1.1 \times 10^6\) grid points. For a 10 nm channel length, the total number of multiplication operations in the R-matrix propagation is huge, \(\sim 2 \times 10^{19}\), which makes it next to impossible to perform actual self-consistent device simulations.

In homogeneous nanostructures, one considers a device Hamiltonian in the form

$$\begin{aligned} H_{n}=H_0 + V_n;\,\,H_{n\,n+1}=H^T_{n+1\,n}=W, \end{aligned}$$
(32)

where n enumerates the unit structures (supercells) in a homogeneous wire and \(V_n\) is an extra diagonal potential term which needs to be computed self-consistently with respect to the mobile charge distribution in the device area. In this case the EM basis representation can be used directly throughout the whole device structure. The corresponding low-dimensional device Hamiltonian is obtained by the EM basis transformation Eq. (4), which reduces the simulation time by a factor of \(\sim ( N_{\textrm{grid}}/N_b)^3\) and enables device research. For instance, the EM model in the right panel of Fig. 4 contains \(N_b = 367\) real-valued basis functions which reproduce the band structure and the corresponding microscopic solution with 2–3 significant digits of accuracy at the bottom of the conduction band. The effective energy range in this representation, \(\sim 0.4\) eV, is more than enough for accurate ballistic transport simulations [37].
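As a rough order-of-magnitude illustration of this factor, one may insert the ratio \(N_b/N_{\textrm{grid}} \approx 4 \times 10^{-4}\) quoted earlier for the 8 nm wire. This is only an estimate of the cubic scaling and ignores prefactors, communication costs and the density calculation:

$$\begin{aligned} \left( \frac{N_{\textrm{grid}}}{N_b}\right) ^3 \approx \left( \frac{1}{4 \times 10^{-4}}\right) ^3 = \left( 2.5 \times 10^{3}\right) ^3 \approx 1.6 \times 10^{10}. \end{aligned}$$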

Fig. 6

(Color online) I-V characteristics of an 8 nm-diameter n-SiNW FET whose schematic is shown in the inset of the left panel. The right panel shows the electric potential along the wire at four values of the gate voltage as indicated by the colored dots in the left panel

We consider an n-SiNW GAA MOSFET shown in the inset of Fig. 6 with a gate oxide thickness \(t_\textrm{ox}=2 \, \textrm{nm}\). In these simulations, the oxide layer is considered to be a continuous medium with \(\epsilon _{\!\scriptscriptstyle \mathrm {SiO_2}} = 3.8\) which only affects the boundary conditions in the nonlinear Poisson equation. The validity of this approximation is briefly discussed below. The whole simulation domain is 30 nm long in the transport direction, which corresponds to \(M = 55\) supercells. Other parameters are: \(V_\textrm{SD}=0.1 \, \textrm{V}\), \(T=300\,\textrm{K}\), \(\epsilon _{\textrm{Si}}=11.9\), dopant concentration in the source/drain regions \(10^{20} \,\textrm{cm}^{-3}\). The first (source) and the last (drain) blocks are used to form two semi-infinite leads connected to equilibrium reservoirs with the Fermi levels \(\mu _\textrm{D} = \mu _\textrm{S} - eV_\textrm{SD}\). The source Fermi level \(\mu _\textrm{S}\) is fixed by the flat band condition for the ideal wire.

For a given potential distribution, the Hamiltonians in unit cells \(n=1\) and \(n=M\) are used to compute the contact self-energies \(\sigma ^R_p\) in the EM representation

$$\begin{aligned} \sigma _1^R =w^T \overrightarrow{\chi _1} \overrightarrow{z_1} \,\, \overrightarrow{\chi _1}^{-1}; \,\, \sigma _2^R =w \overrightarrow{\chi _2} \overrightarrow{z_2} \, \overrightarrow{\chi _2}^{-1}, \end{aligned}$$
(33)

where \(\overrightarrow{\chi }_{1,2}\), \(\overrightarrow{z}_{1,2}\) are the matrices of the outgoing/decaying Bloch states and the Bloch factors in the corresponding (left or right) leads [53].

The NEGF equations in the EM representation do not involve any large quantities and can be solved easily. The M supercells in the device are used as independent blocks of the R-matrix propagation. As will become clear below, the propagation sequence must be chosen such that the 1st and M-th contact blocks are added last. For example, one can start with the supercell \(n=2\) and add the rest of the blocks from the right one by one. Adding the last block \(n=1\) at the source contact completes the propagation sequence. An intermediate cluster in the propagation algorithm is a sequence of \(n-1\) unit cells \(2,3,\ldots ,n\). The boundary of the cluster consists of two unit cells and the R-matrix is defined as a \(2 \times 2\) \(N_b\)-dimensional block matrix \(\left( \begin{array}{cc} r _{11} &{} r _{12}\\ r _{21} &{} r _{22} \end{array} \right)\), analogous to the EM-transformed R-matrix in Eq. (29). After adding the next \((n+1)\)-th block from the right, the R-matrix is redefined as

$$\begin{aligned} r_{22}&= \left( E - h_{n+1} - w^T r_{ 22} w\right) ^{-1} \end{aligned}$$
(34a)
$$\begin{aligned} r_{21}&= r_{22} w^T r_{21} \end{aligned}$$
(34b)
$$\begin{aligned} r_{11}&= r_{11} + r_{12} w r_{21} \end{aligned}$$
(34c)
$$\begin{aligned} r_{12}&= r_{21}^T, \end{aligned}$$
(34d)

which is a particular case of the previous general result Eq. (27) written in the form of recurrence relations. The initial conditions at the first step are given by \(r_{ij} = \left( E - h_2 \right) ^{-1}\). The final step of the calculations is to add the contact unit cell \(n=1\). The R-matrix transformation is given by Eq. (34) with \(h_{n+1}\) being replaced by \(h_1\) and the opposite propagation direction, which corresponds to the transformation \(1 \leftrightarrow 2\) and \(w \leftrightarrow w^T\). The final R-matrix and the contact self-energies Eq. (33) determine the boundary projection of the retarded Green’s function in the form of Eq. (25). The drain current is calculated directly from the low-dimensional model without referring to the original real-space representation. In contrast, the electronic density distribution on the original three-dimensional grid in the computation domain is essential for calculating the electric potential in the device. In realistic systems, as the density of states grows, computing the electronic density becomes the most time-consuming part of the device simulations and deserves further consideration.
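The recurrence of Eq. (34) and the final left-hand step can be sketched for a short chain of small blocks. The relations are read here as sequential assignments, so that Eq. (34c) uses the already updated \(r_{21}\) together with the old \(r_{12}\). The random blocks `h`, `w`, the chain length and the function names are hypothetical, and the energy is again placed above the spectrum of the toy chain purely for numerical convenience.

```python
import numpy as np

def add_right(r, h_new, w, E):
    """Eqs. (34a)-(34d): attach the next cell to the right end of the cluster."""
    r11, r12, r21, r22 = r
    r22 = np.linalg.inv(E * np.eye(len(h_new)) - h_new - w.T @ r22 @ w)   # (34a)
    r21 = r22 @ w.T @ r21                                                   # (34b)
    r11 = r11 + r12 @ w @ r21                        # (34c): old r12, updated r21
    r12 = r21.T                                                             # (34d)
    return r11, r12, r21, r22

def add_left(r, h_new, w, E):
    """Same step with 1 <-> 2 and w <-> w^T: attach a cell to the left end."""
    r11, r12, r21, r22 = r
    r11 = np.linalg.inv(E * np.eye(len(h_new)) - h_new - w @ r11 @ w.T)
    r12 = r11 @ w @ r12
    r22 = r22 + r21 @ w.T @ r12
    r21 = r12.T
    return r11, r12, r21, r22

rng = np.random.default_rng(3)
M, n_b = 6, 3                                        # number of cells, EM dimension (toy)
A = rng.normal(size=(n_b, n_b))
h0 = (A + A.T) / 2
w = 0.5 * rng.normal(size=(n_b, n_b))
h = [h0 + v * np.eye(n_b) for v in np.linspace(0.0, 0.3, M)]   # h_n = h_0 + V_n

# Full chain Hamiltonian, used only as a reference for the check below.
H = np.zeros((M * n_b, M * n_b))
for n in range(M):
    s = slice(n * n_b, (n + 1) * n_b)
    H[s, s] = h[n]
    if n + 1 < M:
        s1 = slice((n + 1) * n_b, (n + 2) * n_b)
        H[s, s1], H[s1, s] = w, w.T
E = np.linalg.eigvalsh(H).max() + 0.5                # toy choice: above the spectrum

g = np.linalg.inv(E * np.eye(n_b) - h[1])            # start from the second cell
r = (g, g, g, g)
for n in range(2, M):                                # add cells from the right
    r = add_right(r, h[n], w, E)
r = add_left(r, h[0], w, E)                          # source contact cell added last

G0 = np.linalg.inv(E * np.eye(M * n_b) - H)
print(np.abs(r[1] - G0[:n_b, -n_b:]).max())          # r12 against the direct inverse
```

Cell indices in the code are zero-based, so `h[1]` plays the role of \(h_2\) and `h[0]` that of \(h_1\).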

4.2 Optimization of the carrier density calculations

The electronic density in the NEGF formalism is determined by the diagonal part of the lesser Green’s function and can be evaluated as [29]

$$\begin{aligned} \rho = 2 \times \sum _p \int \frac{d E }{2\pi } f_p(E) \left[ \Phi g^R \gamma _p {g^R}^\dagger \Phi ^T \right] _{\textrm{diag}}, \end{aligned}$$
(35)

where \(\gamma _p\) is the EM scattering rate at the p-th contact. In practical implementations, the required number of operations grows as \(\sim N_{\textrm{grid}} N_b\) which becomes the major challenge for large systems. The R-matrix formalism enables one to significantly reduce the computational cost. We consider partial scattering eigenstates of the EM contact scattering operator

$$\begin{aligned} {\gamma }_p |p \nu \rangle = \Lambda _{p\nu } | p \nu \rangle , \end{aligned}$$
(36)

with positive \(\Lambda _{p\nu }\). Apart from rare exceptions of energy values located close to a scattering threshold in the band structure, the spectrum of \(\Lambda\)s has a finite gap and we use a simple cutoff criterion \(\Lambda > \Lambda _0 \approx 10^{-3}\) a.u. to identify the scattering states. The number of these open channels \(N_{\textrm{ch}}\) is typically small compared to the EM dimension, \(N_{\textrm{ch}} \ll N_b\). Equation (35) can now be rewritten in the form [34]

$$\begin{aligned} \rho = \sum _{p \nu } \int \frac{d E}{\pi }f_p (E) \left| \Psi _{p \nu } \right| ^2, \end{aligned}$$
(37)

which is similar to the quantum transmitting boundary method [25]. Here the wave functions \(\Psi _{p \nu }\) are defined as

$$\begin{aligned} \Psi _{p \nu } = \Phi g^R |p\nu \rangle \sqrt{\Lambda _{p\nu }}, \end{aligned}$$
(38)

which is different from ordinary scattering modes but is particularly useful in the backward wave function calculations. Equation (38) satisfies the Schrödinger equation in the EM representation everywhere except at the contact unit cells, which explains the choice of the R-matrix propagation scheme. The sequence of the local R-matrices in Eq. (34) contains all the necessary information to construct all scattering solutions at a given energy. To see that, we calculate a boundary projection of the Lippmann–Schwinger equation in a sequence of blocks \(2,3,\ldots ,n\) to obtain

$$\begin{aligned} \psi _n = r_{21}w^T\psi _1 + r_{22}w\psi _{n+1} \end{aligned}$$
(39)

for the wave function in the last added block. Equation (39) gives a numerically stable double-end recurrence relation for the backward wave function propagation. A set of the connection matrices \(r_{21}w^T\) and \(r_{22}w\) must be stored during the forward recursion Eq. (34). In turn, the boundary projection of the EM partial solution \(\psi _{p\nu }\) in Eq. (38) gives the boundary condition

$$\begin{aligned} \left( \begin{array}{c} \psi _1\\ \psi _{ M} \end{array} \right) = ( 1-r \sigma ^R )^{-1}r |p\nu \rangle \sqrt{\Lambda _{p\nu }} \end{aligned}$$
(40)

and the wave function in all other blocks is calculated from Eq. (39) in the reverse order. Since the R-matrix refers to a closed system without probes, the backward connection matrices do not depend on the boundary conditions, which is important for practical implementation of the method. In realistic systems, one needs to compute contributions to the mobile charge at a huge number of grid points from many partial scattering solutions, which becomes computationally costly. The use of the EM representation in the channel-independent backward propagation can significantly reduce the computational load. At each propagation step \(n \rightarrow (n-1)\) the contribution to the electronic charge from all the partial states is computed by performing three matrix operations \(\psi = T \psi ; \,\,\Psi = \Phi \psi ; \,\,\, \delta \rho = \left[ \Psi \Psi ^{\dagger } \right] _{\textrm{diag}}\), where \(\Psi\) is a complex-valued \(( N_{\textrm{grid}}/N_{\textrm{CPU}}) \times N_{\textrm{ch}}\) array of RSDFT scattering solutions, \(\psi\) is the corresponding array in the EM picture and T represents the real-valued connection matrices in Eq. (39). Transforming the density calculation into matrix-by-matrix products and making use of a highly tuned linear algebra library (BLAS) [47] greatly reduces the computational time. As an illustration, we show in Fig. 7 the CPU time for computing the integrand of Eq. (37) as a function of the total energy. The reference time \(t_0\) corresponds to the forward R-matrix propagation. For comparison, we also show the simulation time for performing the EM-RSDFT basis transformation for each partial scattering solution before computing its contribution to the electronic density (black curve). The increase in the computational time is due to the larger number of partial contributions in Eq. (37), which is up to \(\sim 140 \, t_0\) at the highest energy values. Although the total number of floating point operations is the same in both cases, regrouping Eq. (37) into the matrix form suitable for the BLAS matrix operations makes the simulations almost two orders of magnitude faster.
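The backward propagation of Eqs. (39) and (40), followed by the blocked density evaluation, can be sketched on the same kind of toy chain. Several inputs are placeholders and should be read as assumptions: the contact self-energies are an arbitrary complex constant rather than the result of Eq. (33), the array `src` stands in for the channel vectors \(|p\nu \rangle \sqrt{\Lambda _{p\nu }}\), the same basis `Phi` is used for every cell, and the energy is placed above the spectrum of the closed chain for numerical convenience. The sketch only checks the algebra of the recursion.

```python
import numpy as np

def add_right(r, h_new, w, E):                       # Eqs. (34a)-(34d)
    r11, r12, r21, r22 = r
    r22 = np.linalg.inv(E * np.eye(len(h_new)) - h_new - w.T @ r22 @ w)
    r21 = r22 @ w.T @ r21
    r11 = r11 + r12 @ w @ r21
    return r11, r21.T, r21, r22

def add_left(r, h_new, w, E):                        # same with 1 <-> 2, w <-> w^T
    r11, r12, r21, r22 = r
    r11 = np.linalg.inv(E * np.eye(len(h_new)) - h_new - w @ r11 @ w.T)
    r12 = r11 @ w @ r12
    r22 = r22 + r21 @ w.T @ r12
    return r11, r12, r12.T, r22

rng = np.random.default_rng(5)
M, n_b, n_ch, n_grid = 6, 3, 2, 50                   # toy sizes
A = rng.normal(size=(n_b, n_b))
h = [(A + A.T) / 2 + v * np.eye(n_b) for v in np.linspace(0.0, 0.3, M)]
w = 0.5 * rng.normal(size=(n_b, n_b))
Phi = np.linalg.qr(rng.normal(size=(n_grid, n_b)))[0]    # toy EM basis on the grid
H = np.zeros((M * n_b, M * n_b))
for n in range(M):
    H[n * n_b:(n + 1) * n_b, n * n_b:(n + 1) * n_b] = h[n]
for n in range(M - 1):
    H[n * n_b:(n + 1) * n_b, (n + 1) * n_b:(n + 2) * n_b] = w
    H[(n + 1) * n_b:(n + 2) * n_b, n * n_b:(n + 1) * n_b] = w.T
E = np.linalg.eigvalsh(H).max() + 0.5                # toy choice: above the spectrum

# Forward recursion; store the connection matrices r21 w^T and r22 w of each cluster.
g = np.linalg.inv(E * np.eye(n_b) - h[1])
r, T1, T2 = (g, g, g, g), {}, {}
for n in range(1, M - 1):                            # cluster = cells 1..n (zero-based)
    T1[n], T2[n] = r[2] @ w.T, r[3] @ w
    r = add_right(r, h[n + 1], w, E)
r = add_left(r, h[0], w, E)

# Boundary condition, Eq. (40), for all channels at once.
r_b = np.block([[r[0], r[1]], [r[2], r[3]]])
sigma_b = (0.1 - 0.4j) * np.eye(2 * n_b)             # placeholder contact self-energies
src = rng.normal(size=(2 * n_b, n_ch))               # placeholder channel sources
psi_b = np.linalg.solve(np.eye(2 * n_b) - r_b @ sigma_b, r_b @ src)
psi = {0: psi_b[:n_b], M - 1: psi_b[n_b:]}

# Backward recursion, Eq. (39), then the density as matrix-by-matrix products.
for n in range(M - 2, 0, -1):
    psi[n] = T1[n] @ psi[0] + T2[n] @ psi[n + 1]
rho = [np.sum(np.abs(Phi @ psi[n]) ** 2, axis=1) for n in range(M)]
```

All channels are propagated as the columns of one array, so each step is a small matrix product followed by a single `Phi @ psi[n]` transformation, which is the regrouping the text attributes the speed-up to.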

Fig. 7

(Color online) The CPU time for the density integrand in Eq. (37) (red). Black points correspond to the direct channel summation. The CPU time \(t_0\) of the R-matrix propagation is used as a reference

An important feature of the original RSDFT simulator is an efficient real-space parallelization [36]. Because of the modified supercell geometry, the message passing interface (MPI) communication scheme in the device simulator differs from the original code. The supercell is split into a set of clusters, keeping the number of mesh points within a cluster small and nearly constant. The same fragmentation scheme is used both in the exact R-matrix propagation scheme of the previous section and for the global numeration of the mesh points through the supercell. In general, the maximum number of points in a cluster is a numerical parameter which should be optimized based on the CPU time for the matrix multiplication and the inversion operation in Eq. (27). In the large-scale NEGF EM device simulations we choose this parameter based on the condition that the number of clusters in the supercell is of the order of the number of nodes. The numeration of the mesh points through the clusters is chosen in such a way as to ensure geometrical proximity of the points with consecutive numbers, in order to reduce the number of relevant nodes required for the MPI communication in the non-local part of the Hamiltonian operator. The optimization of the MPI communication scheme in the modified geometry has not been fully addressed yet, and the non-local operations on the real grid are presently about twice as slow as in the original code.

The Poisson equation is solved on the same three-dimensional equidistant grid used in the RSDFT structure calculations. A dielectric layer in the gate area is introduced by extending the original grid beyond the boundary of the modified supercell. We thus obtain a layer of mesh points with a desired thickness \(t_{\textrm{ox}}\) which is used to represent a dielectric layer in the device structure. The non-linear Poisson equation is solved in the integral form by using the Newton–Raphson method.

4.3 Numerical illustration: RSDFT-EM transport simulations to silicon FETs

Figure 6 presents an example of calculated I-V characteristics in an n-SiNW MOSFET shown in the inset. The right panel also shows the one-dimensional profile of the electric potential averaged over the cross section of the silicon channel. The number of atoms in the simulation domain in this case is \(N_{\textrm{at}} = 83,875\) and, although the total number of mesh points exceeds \(6.5 \times 10^{7}\), the self-consistent device simulations can be performed in a reasonable timeframe due to the EM basis representation and the properties of the R-matrix method. For a given potential distribution, the energy-dependent R-matrix propagation is performed independently at energy quadrature points evenly distributed among the available nodes. The obtained local R-matrix data is sent back to all the nodes and independent cumulative contributions to the electronic density from all the scattering states are computed as a single matrix-by-matrix multiplication operation \(\sim (N_{\textrm{grid}}/N_{\textrm{CPU}}) \, N_b \, N_{ \textrm{ch}}\). The device simulations in this section are performed using \(N_{\textrm{CPU}} = 1,536\) CPUs of the Fugaku supercomputer. Computing the I-V characteristics in Fig. 6 takes \(\sim 20\) hours in total. This does not include the preliminary RSDFT first-principles optimization of the silicon crystal structure and the construction of the EM basis representation.

The developed simulator is applicable to any geometry of the conducting channel. The only difference may come from the boundary conditions in the Poisson solver as well as the choice of the supercell, which is defined by the original RSDFT input file containing the atomic coordinates. As an illustration, we outline the main steps in the device simulations for an n-Ge nanosheet FET with a cross-section of 4 nm by 8 nm. The original supercell of the Ge diamond structure in the [001] direction contains 999 atoms including 172 hydrogen atoms necessary to fully passivate the surface of the sample. The equidistant grid of 968,000 mesh points in the supercell has been used to perform the RSDFT structural optimization of the atomistic structure, which corresponds to a cutoff energy \(\sim 20\, E_h\). These simulations take \(\sim 60\) hours. One thus obtains the optimized atomic configuration and the self-consistent local potential which fully specifies the one-particle Kohn–Sham Hamiltonian and the corresponding energy band structure. As a test, we repeat the self-consistent calculation of the electronic density/potential at a higher cutoff energy \(\sim 30 \, E_h\) using the previously obtained atomic structure. Apart from an immaterial energy shift, the conduction band structure is found to be nearly identical within a \(\sim 0.01-0.02\) eV level of accuracy. The device simulations are performed using the smaller RSDFT data set. The geometry of the unit cell is modified in order to impose zero boundary conditions outside the channel cross section, which includes a narrow 0.2 nm region outside the hydrogen atoms. This gives a smaller supercell with \(N_{\textrm{grid}} \sim 8 \times 10^5\) mesh points. Figure 8 shows a part of the energy band structure at the bottom of the conduction band. The black scattered points represent the Bloch states at a set of 13 equidistant wave numbers which are used to form an initial scattering eigenstate basis. Compared to the previous example of the silicon wire, this figure indicates a stronger q-dependence of the Bloch states across the Brillouin zone. The eigenstate basis transformation produces a large number of spurious branches with strong dispersion. In this case the second type of the trial basis states Eq. (16) plays an important role in reducing the number of steps in the variational simulations. The red solid lines in Fig. 8 show the band structure in the final EM model with \(N_b = 389\) basis states. The variational calculations take \(\sim 14\) hours using 1,536 CPUs of the Fugaku supercomputer, which includes computing the exact Bloch states and 56 supplementary basis functions.

Fig. 8

(Color online) Conduction band in a nanosheet n-Ge FET with a cross-section of 4 nm by 8 nm. The black squares show the results of the RSDFT simulation with \(9 \times 10^5\) grid points in a supercell. The red lines represent the EM with 389 basis functions

The device Hamiltonian is obtained by performing the EM basis transformation of the chain Hamiltonian Eq. (32). The electric potential on the three-dimensional grid is found from the Poisson equation assuming an extra homogeneous dopant concentration \(N^+=10^{20} \, \textrm{cm}^{-3}\) in the source and drain regions, each consisting of 16 supercells. The most time-consuming part of the simulations is the density calculation in Eq. (35). In order to reduce the number of energy points we use Gauss-Jacobi quadrature \([0, -1/2]\) in a sequence of narrow intervals between the ordered threshold energies in the two probes. The lowest threshold corresponds to the bottom of the conduction band where \(N_{\textrm{ch}}\) changes from zero to a positive number. The next thresholds are determined as the energy values where \(N_{\textrm{ch}}\) changes its value. We do not distinguish close threshold energies \(\Delta E_{\textrm{thr}} < 10^{-3}\) eV, which are treated as (nearly) degenerate branches. The number of quadrature points in each interval can be kept small, \(\sim 20\), without losing numerical accuracy. An equidistant grid is used for the extra energy interval \(-0.3\) eV above the highest threshold energy.
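The role of the Gauss-Jacobi rule can be illustrated with a minimal sketch. It assumes that the indices \([0, -1/2]\) denote the Jacobi weight \((1-x)^0(1+x)^{-1/2}\), chosen to absorb an inverse-square-root band-edge singularity at the lower end of each interval; this reading, the function names and the test integrand are assumptions of the example and are not taken from the simulator. The nodes are generated from the positive nodes of a Gauss-Legendre rule with twice as many points through the substitution \(x = 2t^2 - 1\).

```python
import math
import numpy as np

def gauss_jacobi_0_minus_half(n):
    """n-point rule for the weight (1 + x)**(-1/2) on (-1, 1).

    Built from the positive nodes t of the 2n-point Gauss-Legendre rule via
    x = 2 t**2 - 1; the resulting rule is exact for polynomials up to
    degree 2n - 1, i.e. it is the Gauss-Jacobi rule with (alpha, beta) = (0, -1/2).
    """
    t, w = np.polynomial.legendre.leggauss(2 * n)
    pos = t > 0
    return 2.0 * t[pos] ** 2 - 1.0, 2.0 * math.sqrt(2.0) * w[pos]

def integrate_from_threshold(g, e_thr, e_next, n=20):
    """Approximate the integral of g(E) / sqrt(E - e_thr) over [e_thr, e_next]."""
    x, w = gauss_jacobi_0_minus_half(n)
    half = 0.5 * (e_next - e_thr)
    energies = e_thr + half * (1.0 + x)
    # (E - e_thr)**(-1/2) dE = sqrt(half) * (1 + x)**(-1/2) dx
    return math.sqrt(half) * np.sum(w * g(energies))

# Hypothetical smooth factor g(E) = exp(-E) on the interval [0, 1].
value = integrate_from_threshold(lambda e: np.exp(-e), 0.0, 1.0)
exact = math.sqrt(math.pi) * math.erf(1.0)
```

With about 20 nodes per interval the singular factor is integrated to machine precision for a smooth remainder, which is consistent with the small number of quadrature points quoted above.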

Optimizations of the density calculation have been discussed in the previous example. Another practically useful scheme is based on separating the constant basis in Eq. (35) from the energy integration. In this case, the backward propagation is used to compute partial contributions to the \(N_b \times N_b\) diagonal blocks of the density matrix in the EM representation \(D \sim \sum _{E \nu p} f_{p}(E) \psi _{p\nu }(E) {\oplus } \psi _{p\nu }^T(E)\), which only needs an additional inter-node MPI_ALLREDUCE communication. The real space density is then obtained by a single EM transformation \(\sim [\Phi ^T D \Phi ]_{\textrm{diag}}\) which scales as \((N_{\textrm{grid}}/N_{\textrm{CPU}}) N_b^2\).
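The final transformation to real space can be sketched as follows. The example assumes that the basis functions are stored as the columns of a real array `phi` of shape \((N_{\textrm{grid}}, N_b)\), so that the diagonal in question is that of `phi @ d @ phi.T`; the array names and the random test data are hypothetical. Only the diagonal is evaluated, which gives the \(N_{\textrm{grid}} N_b^2\) cost quoted above; in a parallel run each process would hold its own slice of grid rows.

```python
import numpy as np

def real_space_density(phi, d):
    """Diagonal of phi @ d @ phi.T without forming the n_grid x n_grid matrix.

    phi : (n_grid, n_b) EM basis functions on the locally stored grid points
    d   : (n_b, n_b) density-matrix block in the EM representation
    """
    # One (n_grid x n_b) by (n_b x n_b) product, then a row-wise dot product.
    return np.einsum("ib,ib->i", phi @ d, phi)

# Hypothetical data: 500 grid points, 8 basis functions.
rng = np.random.default_rng(0)
phi = rng.standard_normal((500, 8))
d = rng.standard_normal((8, 8))
d = d + d.T
rho = real_space_density(phi, d)
```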

Fig. 9

(Color online) I-V characteristics of nanosheet n-Ge FETs with the conduction band in Fig. 8 with gate lengths of 10 nm and 14 nm. The right panel shows the corresponding electric potentials at selected values of the gate voltage

Figure 9 shows the I-V characteristics of two n-Ge nanosheet FETs with different gate lengths. As expected, the device with the longer gate length \(L_g\) exhibits far better gate control but shows a similar value of the drain current in the ON state. This can also be clearly seen from the electric potential shown in the right panel of Fig. 9. The total simulation time for the I-V characteristics is \(\sim 18\) – 23 hours, depending on the numerical parameters and the integration scheme, which is acceptable for practical device research.

4.4 Si-SiO\(_2\) interface in the RSDFT-EM transport simulations

In the previous examples, the material properties of the dielectric layer in the gate area have not been considered and the DFT calculations are performed in the semiconductor area only. This area, however, cannot be defined uniquely, as there is a certain freedom in how one specifies the boundary of the semiconductor domain. The previous choice has been made in order to keep the computational domain as small as possible without losing any pseudopotential terms. The question remains of how a different choice of the boundary may affect the electric potential within the channel, or whether the actual mobile density distribution at the semiconductor/dielectric boundary may have a noticeable effect in the self-consistent device simulations.

As the last example we consider an oxidized n-SiNW channel by including the silicon dioxide in the first-principle RSDFT structure simulations. Even for a thin oxidized wire the DFT simulations are much more demanding compared to the previous examples. Due to a shorter O-H bond in silicon dioxide, the RSDFT structural optimization generally requires a much denser three-dimensional grid. The energy band structure also contains many dioxide bands at low energies, separated from the valence band by a large gap \(\sim 7\) eV, which makes the subspace diagonalization in the CGSD calculations more demanding. Yet another obstacle comes from the trial atomistic configuration which needs to be specified in order to start the geometry optimization. The simplest trial configuration can be obtained by inserting oxygen atoms in the mid positions in the silicon lattice outside a specified radius. However, numerical tests indicate that such trial configurations are unusable. The large lattice mismatch at the Si-SiO\(_2\) interface in the wire geometry causes strong local strain in the boundary region and produces multiple spurious energy bands at the Fermi level, leading to very poor convergence in the structural optimization. In order to reduce the strain and ensure a finite band gap in the band structure, we perform multiple steps of an oxidation-optimization process. Figure 10 illustrates a few steps in such calculations. The simulations start by selecting a 2 nm-diameter cylindrical core in a larger silicon sample with perfect diamond structure. In order to reduce the local lattice mismatch one also has to increase the size of the supercell in the wire direction. For a [001] nanowire, we include eight atomic layers along the z-direction in the definition of the initial silicon supercell.
At each following step we introduce a small number of oxygen atoms into vacant midpoint positions between silicon atoms outside the core region and perform the RSDFT optimization to obtain a stable sample with converged total energy. Even for such a thin sample, the size of the supercell is as large as \(3.9 \times 10^6\) grid points, which corresponds to a cutoff energy \(\sim 69\,E_h\). The simulation for one oxidation step with 50 extra oxygen atoms takes about one day. The simulations proceed until there are no more vacant positions. The final nanostructure consists of 226 Si atoms, 174 O atoms and 104 H atoms to fully passivate the surface. The structure corresponds to a 2 nm-diameter Si wire with a 1 nm-thick silicon dioxide (SiO\(_2\)) layer. Figure 10b shows the supercell in the final oxidized Si-SiO\(_2\) nanowire as well as the core region of strained silicon which can be used for comparison. The core area is defined based on the atomic positions of the nearest oxygen atoms and the corresponding bonds are passivated by attaching a new set of hydrogen atoms. We further reduce the size of the supercell by extracting four atomic layers in the [001] direction and thus obtain a supercell of 69 Si atoms which can be used in simplified transport simulations in a nanowire with “equivalent” central silicon atomic structure.
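The bookkeeping of one oxidation step, that is, the selection of at most 50 vacant bond midpoints outside the core, can be sketched as below. This is only an illustration under stated assumptions: an ideal diamond block with the bulk lattice constant, no periodic images along the wire axis (bonds crossing the cell boundary are ignored), a 1 nm core radius, and hypothetical function names. The RSDFT structural relaxation that follows each step in the actual procedure is represented by a comment only.

```python
import numpy as np

A_SI = 0.5431  # nm, bulk Si lattice constant (assumed for this sketch)

def diamond_positions(nx, ny, nz, a=A_SI):
    """Atoms of an nx x ny x nz block of conventional diamond cells."""
    fcc = np.array([[0.0, 0.0, 0.0], [0.0, 0.5, 0.5],
                    [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]])
    basis = np.vstack([fcc, fcc + 0.25])
    cells = np.array([(i, j, k) for i in range(nx)
                      for j in range(ny) for k in range(nz)], dtype=float)
    return a * (cells[:, None, :] + basis[None, :, :]).reshape(-1, 3)

def vacant_bond_midpoints(si, axis_xy, core_radius, occupied, a=A_SI, tol=1e-3):
    """Unoccupied midpoints of nearest-neighbour Si-Si bonds outside the core."""
    bond = a * np.sqrt(3.0) / 4.0
    dist = np.linalg.norm(si[:, None, :] - si[None, :, :], axis=-1)
    i, j = np.where(np.triu(np.abs(dist - bond) < tol, k=1))
    mid = 0.5 * (si[i] + si[j])
    mid = mid[np.linalg.norm(mid[:, :2] - axis_xy, axis=1) > core_radius]
    if len(occupied) and len(mid):
        d_occ = np.linalg.norm(mid[:, None, :] - occupied[None, :, :], axis=-1)
        mid = mid[d_occ.min(axis=1) > tol]
    return mid

def oxidation_step(si, oxygen, axis_xy, core_radius, rng, max_new=50):
    """Add at most max_new oxygen atoms at randomly chosen vacant midpoints."""
    vacant = vacant_bond_midpoints(si, axis_xy, core_radius, oxygen)
    n_new = min(max_new, len(vacant))
    if n_new == 0:
        return oxygen, 0
    picked = rng.choice(len(vacant), size=n_new, replace=False)
    return np.vstack([oxygen, vacant[picked]]), len(vacant) - n_new

rng = np.random.default_rng(1)
si = diamond_positions(6, 6, 2)        # two cells = eight atomic layers along z
axis_xy = si[:, :2].mean(axis=0)       # wire axis through the block centre
core_radius = 1.0                      # nm, i.e. a 2 nm-diameter core
oxygen = np.empty((0, 3))
n_sites = len(vacant_bond_midpoints(si, axis_xy, core_radius, oxygen))
remaining, steps = n_sites, 0
while remaining > 0:
    oxygen, remaining = oxidation_step(si, oxygen, axis_xy, core_radius, rng)
    steps += 1
    # In the actual procedure a full structural relaxation would be run here.
```

Inserting the oxygen atoms in small random batches, rather than all at once, is what allows the relaxation between steps to release the interface strain gradually.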

Fig. 10

(Color online) a Oxidation of a Si wire in the RSDFT structure optimization simulations. At each step \(\le 50\) oxygen atoms are introduced at random vacant positions outside the core region. b Extracting the core region for the reduced first principle transport simulations

Fig. 11

The band structure in a 2 nm-diameter Si wire. a Simulations in a double supercell including a 1 nm-thick silicon dioxide layer. b–d Simulations in 3 types of a strained unit cell obtained by extracting 69 Si atoms from the central region of the oxidized silicon nanostructure (see the text)

Figure 11(a) shows the energy band structure in the Si-SiO\(_2\) supercell (called “Si-SiO\(_2\)” hereafter) and in the equivalent silicon wire. In the latter case we use 16 CPUs of an Intel 2.6 GHz workstation to compute the local RSDFT potential and perform all the necessary steps in device simulations. Figure 11 also shows the band structure in three Si supercells with the same number of atoms which we use for comparison. Case (b) is obtained by reducing the size of the supercell by extracting four atomic layers in the [001] direction (“Str Si”). Case (c) is obtained by relaxing the strain in the extracted central core region (“Opt Si”). Case (d) corresponds to relocating the silicon atoms into the nearest positions in an ideal diamond lattice (“Si”). The main feature of the EM method is that the size of the reduced device Hamiltonian depends only on the complexity of the energy band structure and not on the original exact representation. Although the supercell of the oxidized sample is a few times larger compared to the previous examples, the low energy part of the conduction band structure is as simple as the one in thin Si wires, which makes the EM construction rather trivial. The primary basis of the Bloch eigenstates already generates a reduced model with the correct band gap, and there are just three spurious branches within the transport energy range which are easily eliminated by three extra steps. We thus obtain a 61-dimensional EM representation \((N_b/N_{\textrm{grid}} \sim 2\times 10^{-5})\) suitable for device simulations. We have computed the I-V characteristics of a 2 nm-diameter n-SiNW FET using the constructed RSDFT-EM model for the oxidized silicon with a 1 nm-thick SiO\(_2\) layer. Other parameters are the same as in the previous examples. The results are compared with simple models using continuous dielectric media in the gate region.
Our simulations show that the drain current in the ON state has similar values, but the subthreshold slope in the continuous dioxide models is significantly underestimated. The likely reason for this discrepancy is the difference in the charge distribution and the boundary condition in the area close to the gate contact. The potential profile for the two simulation models in Fig. 12 clearly indicates a different width of the effective potential barrier. In order to reduce the discrepancy, we have changed the gate geometry in the continuous modeling by extending the gate area by 1 nm. The right panel in Fig. 12 shows the potential profile in the new configuration and Fig. 13 presents our final results. The calculated drain current in the strained Si channel with equivalent atomistic configuration is in good agreement with the original Si-SiO\(_2\) modeling. Introducing extra strain relaxation in the atomic structure leads to an increased drain current in the ON state, but shows a similar subthreshold slope. Despite the generally fairly good agreement between these models, both the band structure and the electronic charge distribution in the channel are quite different. Figure 14 shows the electronic density at two representative values of the gate voltage in different models for the same Si channel. The strain relaxation leads to a more oriented atomic structure with reduced density fluctuations across the supercell. In idealized homogeneous nanostructures this does not seem to lead to significant changes in the simulation results. However, in more realistic devices with a spatially inhomogeneous Si/SiO\(_2\) interface the full ab initio description may be necessary in order to properly reproduce the device characteristics. Unfortunately, the present EM method is not well suited for inhomogeneous materials and the computational strategy needs to be modified. In the next section we briefly discuss a possible way to overcome the limitations of the EM method.

Fig. 12

(Color online) Electric potential at various gate voltages in a SiNW FET using models (a) and (b) in Fig. 11 shown respectively by black and red lines. The results for strained silicon in the right panel correspond to extending the dielectric layer in the gate area

Fig. 13

(Color online) The I-V characteristics in an n-SiNW MOSFET using four types of RSDFT data in Fig. 11

Fig. 14

Electric charge in OFF (upper panels) and ON (lower panels) states in a Si channel before (left) and after (right) strain relaxation

5 RSDFT-EM representation in scattering-matrix method

The EM method originally assumes that a nonequilibrium state of a homogeneous device is formed as a result of various scattering processes in a reference periodic nanostructure. Thus, the same set of basis functions can be used to represent the transport properties of all parts of the device channel. The method is suitable for inelastic transport simulations since reducing the size of the device Hamiltonian enables one to compute the full Green’s function in the basis representation [37, 38]. Scattering by lattice imperfections or impurities, and the treatment of heterogeneous devices of varying cross section and/or materials, require a different approach. In principle, physically relevant modes can be extracted locally from a set of scattering problems in a sequence of larger supercells along the channel. Therefore it may be possible to represent defects or impurities by a local basis. However, such a local basis representation may produce extra backscattering at the boundaries between different representations, thus leading to enhanced localization and underestimated transmission coefficients [55]. On the other hand, atomistic simulations in a tight binding model show that physical solutions in the entire channel can be well reproduced even if the basis representation does not completely eliminate the unphysical backscattering [56]. This suggests that the EM representation can be used as an approximate solution to facilitate more time-consuming atomistic simulations. Since the regular perturbation expansion may not be suitable for capturing the behavior of a nonequilibrium open system, we consider a more straightforward approach and use the EM approximation as a preconditioner in the recursive solution of the inhomogeneous scattering equation.

The partial contribution to the electric charge in Eq. (37) is defined in terms of the eigenstates of the scattering operator which ensures correct normalization of the corresponding wave function. In this section we prefer the standard definition of scattering wave functions which correspond to the incoming wave in particular channels. With proper normalization, one can find these solutions approximately and compute the carrier density and drain current. We consider a structure in Fig. 5 with the device Hamiltonian \(H_D\) and two probes in the EM representation. The scattering mode \(\Psi _{\nu }\) is found from the equation

$$\begin{aligned} \left( E - H_{ D} \right) \Psi _{\nu } = h_{ DL} \Bigl ( \overleftarrow{\chi _{\nu }} \overleftarrow{ z_{\nu }} + \sum _{\mu } \overrightarrow{\chi _{\mu }} a_{\mu } \Bigr ), \end{aligned}$$
(41)

where \(\overleftarrow{\chi _{\nu }}\) is the incoming wave in open channel \(\nu\) with the corresponding phase factor \(\left| \overleftarrow{ z_{\nu }} \right| = 1\). As usual, we assume semi-infinite probes of nanowire geometry and square matrices of Bloch eigenstates \(\overrightarrow{\chi }\) and \(\overleftarrow{\chi }\) [24]. To keep the notation simple, we use a single index \(\nu\) to enumerate possible modes in both probes. The second term in the brackets on the right-hand side represents outgoing/decaying waves in the probes after scattering. The unknown expansion coefficients \(a_{\mu }\) are related to the elements of the S-matrix and therefore to the transmission coefficient. The unitarity of the S-matrix requires unit-current normalization of the Bloch eigenstates

$$\begin{aligned} 2\Im \langle \overrightarrow{\chi }^\dagger _{\nu } W \overrightarrow{\chi _\nu } {\overrightarrow{Z}_\nu } \rangle = v_{\nu }\langle \overrightarrow{\chi _\nu }^{\dagger } \overrightarrow{\chi _\nu } \rangle = 1, \end{aligned}$$
(42)

where \(v_\nu\) is the group velocity and \(\overrightarrow{Z}_\nu\) is the corresponding Bloch factor [24]. Under this condition, the mobile charge density is calculated from Eq. (37) and the transmission function is given by

$$\begin{aligned} T= & {} \sum _{\nu \mu }{}^{'} |S_{\mu \nu }|^2, \end{aligned}$$
(43)

where the prime indicates that \(\nu\) and \(\mu\) run over open channels in different leads. The corresponding elements of the scattering matrix are obtained by projecting the wave function in the opposite EM contact onto the outgoing states, \(S_{\mu \nu } = \overrightarrow{\chi _\mu }^{-1} P \Psi _{\nu }\), where \(\overrightarrow{\chi _\mu }^{-1}\) is the \(\mu\)-th row of the inverse matrix \(\overrightarrow{\chi }^{-1}\).
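As a small worked illustration of Eq. (43), the restricted sum can be written as below. The sketch assumes that the column index \(\nu\) labels incoming channels in one lead and the row index \(\mu\) labels outgoing channels in the other lead; the lead labels, the open-channel flags and the single-channel test matrix are hypothetical inputs, not quantities produced by the simulator.

```python
import numpy as np

def transmission(s, lead, is_open, source=0, drain=1):
    """Sum of |S[mu, nu]|**2 over open channels nu in `source` and mu in `drain`."""
    lead = np.asarray(lead)
    is_open = np.asarray(is_open, dtype=bool)
    out_rows = (lead == drain) & is_open    # outgoing channels mu
    in_cols = (lead == source) & is_open    # incoming channels nu
    return float(np.sum(np.abs(s[np.ix_(out_rows, in_cols)]) ** 2))

# One open channel per lead: reflection amplitude 0.6, transmission amplitude 0.8i.
s = np.array([[0.6, 0.8j],
              [0.8j, 0.6]])
t_sd = transmission(s, lead=[0, 1], is_open=[True, True])
t_ds = transmission(s, lead=[0, 1], is_open=[True, True], source=1, drain=0)
```

For this unitary test matrix both directions give the same transmission, and closed (evanescent) channels are excluded simply by clearing their `is_open` flag.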

At the contacts, \(P \Psi _{\nu }\) is given by the expression in the right hand side of Eq. (41) except that all the Bloch states \(\chi\)s are multiplied by the inverse of the corresponding Bloch eigenvalue \(z^{-1}\). Hence, one finds the expansion coefficients

$$\begin{aligned} a_{\mu } = Z_{\mu }\overrightarrow{\chi _\mu }^{-1}\left( P \Psi _{\nu } - \overleftarrow{\chi _\nu } \right) \end{aligned}$$
(44)

and obtains the closed equation

$$\begin{aligned} \left( E - H_{D} - \sigma ^R \right) \Psi _\nu = \left( h_{ DL} \overleftarrow{z _\nu } - \sigma ^R \right) \overleftarrow{\chi _\nu }. \end{aligned}$$
(45)

For a sparse banded Hamiltonian matrix \(H_{D}\), Eq. (45) can be solved recursively. However, similar to other recursive numerical schemes, the exact solution may become prohibitively difficult for the RSDFT Hamiltonian in realistic devices. On the other hand, solving this equation in the EM representation presents no difficulty even for fairly large systems. Let \(P_{\textrm{EM}}\) be a projector onto the subspace of the EM basis functions. If the accuracy of the EM basis is good enough, the coupling term \(P_{\textrm{EM}} H_{D} (1-P_{\textrm{EM}})\) is supposed to be a small correction and the Green’s function \(G_{\textrm{EM}} = \Phi g \Phi ^T\) in the EM representation nearly diagonalizes the left-hand side of the equation. Thus one can apply an iterative algorithm from the family of complex conjugate gradient methods [57] and use the preconditioner in the form \(G_{\textrm{EM}} + (1 - P_{\textrm{EM}})\).
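The structure of such a preconditioned iteration can be sketched on a toy problem. The example below uses a hand-written right-preconditioned BiCGSTAB as one representative of this family of methods; it is a stand-in and not necessarily the variant used in the simulations. The toy matrix is an assumption of the sketch: an orthonormal basis `phi` carries a poorly conditioned block, the complementary block is taken close to the identity, and the two are weakly coupled. The preconditioner applies \(g\) inside the subspace and the identity on its complement.

```python
import numpy as np

def bicgstab(apply_a, b, apply_m, tol=1e-10, maxiter=200):
    """Right-preconditioned BiCGSTAB; returns the solution and iteration count."""
    x = np.zeros_like(b)
    r = b.copy()                      # residual for the zero initial guess
    r_hat = r.copy()
    rho = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for it in range(1, maxiter + 1):
        rho_new = np.vdot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        y = apply_m(p)
        v = apply_a(y)
        alpha = rho_new / np.vdot(r_hat, v)
        s = r - alpha * v
        z = apply_m(s)
        t = apply_a(z)
        omega = np.vdot(t, s) / np.vdot(t, t)
        x = x + alpha * y + omega * z
        r = s - omega * t
        rho = rho_new
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
    return x, maxiter

# Toy model (hypothetical): n "grid points", n_b basis functions.
rng = np.random.default_rng(0)
n, n_b = 120, 12
q, _ = np.linalg.qr(rng.standard_normal((n, n)))
phi, phi_c = q[:, :n_b], q[:, n_b:]               # basis and its complement
e_minus_h = np.diag(0.03 - np.linspace(-1.0, 1.0, n_b))
c = rng.standard_normal((n, n))
c = 0.5 * (c + c.T)
c *= 0.01 / np.linalg.norm(c, 2)                  # weak coupling term
a = (phi @ e_minus_h @ phi.T
     + phi_c @ np.diag(rng.uniform(0.8, 1.2, n - n_b)) @ phi_c.T - c)
g = np.linalg.inv(e_minus_h)                      # Green's function in the subspace

def apply_m(vec):
    """Preconditioner phi g phi^T + (1 - phi phi^T)."""
    coef = phi.T @ vec
    return phi @ (g @ coef) + (vec - phi @ coef)

b = rng.standard_normal(n)
x, n_iter = bicgstab(lambda vec: a @ vec, b, apply_m)
```

Each iteration needs two applications of the full operator and two of the preconditioner, so the cost per iteration grows linearly with the grid size as long as the subspace solve stays cheap.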

As a test, we consider a thin 1 nm-diameter nanowire in Fig. 1 and take \(g = g_0\) as the Green’s function of the closed system. Applying the preconditioning is equivalent to solving the linear equation

$$\begin{aligned} (E - h_{ D})\xi = x, \end{aligned}$$
(46)

where x is the \(P_{\textrm{EM}}\) projection of the residual in the original equation. The equation can again be solved by the “double-sided” R-matrix recursion technique. Letting \(x_n\) and \(\xi _n\) be the components within the n-th EM unit cell, one calculates \(\xi _n\) by the recursion

$$\begin{aligned} \left( \begin{array}{c} \xi _2 \\ \xi _{n} \end{array} \right) = \left( \begin{array}{cc} r_{11} &{} r_{12} \\ r_{21} &{} r_{22} \end{array} \right) \left( \begin{array}{c} w^T\xi _1 \\ w \xi _{n+1} \end{array} \right) + \left( \begin{array}{c} y_1 \\ y_n \end{array} \right) , \end{aligned}$$
(47)

where we introduced an auxiliary vector y which needs to be evaluated in the course of the R-matrix propagation Eq. (34). The corresponding recursive formulas are easily found from Eq. (46): At the propagation step \(n \rightarrow n+1\) after computing the new \(r_{22}\) in Eq. (34a) two additional operations need to be performed

$$\begin{aligned} y_{n+1}&= r_{22}(x_{n+1} + w^T y_n) \end{aligned}$$
(48a)
$$\begin{aligned} y_2&= y_2 + r_{12} w y_{n+1} \end{aligned}$$
(48b)

before completing the R-matrix propagation step Eq. (34b–d). The propagation starts at \(n = 2\) with the initial condition \(r_{ij} = (E-h_2)^{-1}\) and \(y_2 = r_{22}x_2\). After the last propagation step the solutions in the boundary unit cells are found as

$$\begin{aligned} \xi _1&= (E-h_1-wr_{11}w^T)^{-1}(x_1 + wy_2) \end{aligned}$$
(49a)
$$\begin{aligned} \xi _{M}&= r_{21}w^T \xi _1 + y_{M} \end{aligned}$$
(49b)

and all other \(\xi\)s are computed from Eq. (47) in the reverse order. The EM transformation completes the preconditioning step. The calculations for all the open channels should be performed as a single matrix operation in order to improve the computer performance, in much the same way as for the previously discussed electric density calculations. A similar approach can be used for the preconditioner \(g^R\). The only difference is that the complex-valued self-energies must be taken into account at the last propagation step. The most time-consuming part of solving Eq. (45) is the preconditioning, and the simulation time for one iteration is analogous to computing the contribution to the carrier density in the EM representation. Figure 15 presents an example of the I-V characteristics in an n-SiNW FET. The right panels show the potential and the one-dimensional profile of the mobile density at low gate voltage. As a demonstration, we show the convergence of the partial density contribution to this OFF state from incoming scattering modes at the source probe at low energy \(E=-1.1\) eV. Figure 16 illustrates the behavior of the residual error in the biconjugate iterative scheme for solving Eq. (45). The solution of the scattering problem Eq. (45) is well localized in the source region and the contribution to the electronic density in the drain contact from these states can be neglected. However, as may be seen from the left panel, even in this deep tunneling regime, the transmission probability Eq. (43) shows steady convergence to the exponentially small tunneling value (\(T \sim 3\times 10^{-3}\)). Figure 17 shows the partial contribution \(\sum _{\nu }\left| \Psi _{p=1\nu }\right| ^2\) to the electronic density Eq. (37) from the source electrons. The black solid line represents the exact result from the R-matrix calculation and the red line shows the iterative solution of Eq. (45) after 10 and 20 iterations.
In this particular case \(N_{\textrm{iter}} = 30\) iterations give a few percent level of accuracy. The simulations have been performed using 16 CPUs of the Intel workstation and computing the contribution from one energy point takes \(\le 20\) sec which is \(\sim 10\) times faster compared to the exact R-matrix propagation.
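Equation (46) is a block-tridiagonal linear system over the EM unit cells. The sketch below does not reproduce the double-sided R-matrix recursion of Eqs. (47)–(49), which relies on Eq. (34); it is a plain block forward elimination and back substitution that solves the same system and can serve as a reference when testing a recursive implementation. The sign convention (coupling blocks \(w\) above and \(w^T\) below the diagonal of \(h_D\)) is inferred from Eqs. (48a) and (49a), and the function name and random test data are hypothetical. Several right-hand sides, one per open channel, can be passed as columns and are processed together.

```python
import numpy as np

def solve_chain(e, h_blocks, w, x_blocks):
    """Solve (e - h_D) xi = x for a block-tridiagonal chain.

    Row n reads: -w.T @ xi[n-1] + (e - h_blocks[n]) @ xi[n] - w @ xi[n+1] = x[n].
    x_blocks[n] may be a vector or a matrix with one column per channel.
    """
    m = len(h_blocks)
    eye = np.eye(w.shape[0])
    d = [e * eye - h_blocks[0]]        # eliminated diagonal blocks
    z = [x_blocks[0]]                  # eliminated right-hand sides
    for n in range(1, m):
        d.append(e * eye - h_blocks[n] - w.T @ np.linalg.solve(d[-1], w))
        z.append(x_blocks[n] + w.T @ np.linalg.solve(d[n - 1], z[n - 1]))
    xi = [None] * m
    xi[-1] = np.linalg.solve(d[-1], z[-1])
    for n in range(m - 2, -1, -1):     # back substitution in reverse order
        xi[n] = np.linalg.solve(d[n], z[n] + w @ xi[n + 1])
    return np.array(xi)

# Hypothetical test chain: 6 unit cells, 4 basis functions each, 3 channels.
rng = np.random.default_rng(0)
m, nb, n_ch = 6, 4, 3
h_blocks = [0.3 * (b + b.T) for b in rng.standard_normal((m, nb, nb))]
w = 0.3 * rng.standard_normal((nb, nb))
x_blocks = rng.standard_normal((m, nb, n_ch))
xi = solve_chain(10.0, h_blocks, w, x_blocks)
```

The cost is linear in the number of unit cells and cubic in the block size, which is why the reduced EM block dimension matters for the preconditioning step.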

The present example may not seem informative enough, as the EM representation in this case has been constructed to fit the quantum states of the periodic nanostructure. One should expect the simulations in inhomogeneous or amorphous nanostructures to be much more time-consuming. However, since the time dependence on the total size \(N_{\textrm{grid}}\) is linear, the approximate iterative schemes may offer a promising alternative to the exact recursive schemes. The convergence of the method and the simulation time depend on the ab initio model for describing the crystal imperfections, the method for constructing the local basis representation and the choice of the iterative algorithm. These issues need to be addressed in future studies.

Fig. 15

(Color online) The I-V characteristics in a 1 nm-diameter n-SiNW FET. The right panels show the electric potential and mobile density in the OFF state used in the demonstration of the iterative method

Fig. 16

(Color online) Convergence of the iterative method. Left: the low energy tunneling transition probability in the OFF state shown in Fig. 15. Right: the residual error in the biconjugate gradient (BiCG) iterations with EM preconditioner

Fig. 17

(Color online) The partial contribution to the mobile charge in Fig. 15 from the low energy electrons coming from the left contact (red solid line) after 10 and 20 iterations. The black solid line shows the exact result \(-i [ G^R \Sigma ^{<}_S {G^R}^ \dagger ]_{\textrm{diag}}\)

6 Summary

We have presented a first principle device simulator which is suitable for massively-parallel computers. The developed computer code makes it possible to calculate transport characteristics within a reasonable simulation time in large quasi-one-dimensional nanostructures with thousands of atoms in the channel cross-section. It is based on the previously reported real-space first-principle density functional program which efficiently performs large-scale parallel calculations in realistic systems. Necessary modifications are introduced to the geometry of the simulation domain as well as to the boundary condition for the electronic structure calculations so that they can be combined with the appropriate Poisson solver. An effective algorithm for constructing the equivalent model (EM) representation has been developed in order to transform the original RSDFT data into a low rank quantum transport model suitable for practical parallel simulations. Coupled with the real-valued R-matrix propagation algorithm, the method greatly reduces the computational cost and enables one to perform self-consistent device simulations within a reasonable time frame.