1 Introduction

Electrostatic interactions play an important role in biological processes such as molecular recognition, enzyme catalysis, and biomolecular encounter rates. Modelling these interactions accurately and efficiently has been a significant challenge in computational biology, because biomolecules are surrounded by solvent molecules whose effects must be taken into account. Computational approaches for modelling electrostatic interactions fall into two main groups according to how the solvent is treated: explicit methods place individual solvent molecules around the biomolecule, whereas implicit methods treat the solvent as a continuum [1, 2].

The Poisson–Boltzmann equation (PBE) is one of the most popular implicit solvent models; it describes the solvent as a continuum through the Boltzmann distribution. The PBE yields the electrostatic potential in the entire domain, which comprises both the molecule and the solvent. From this potential, further information can be obtained in various regions of interest and for different applications. Firstly, the electrostatic potential at the biomolecular surface, commonly known as the electrostatic surface potential, can provide insights into possible docking sites for other small or large molecules. Secondly, the potential outside the biomolecule can provide information about the free energy of interaction of small molecules at different positions in the vicinity of the biomolecule. Thirdly, the free energy of a biomolecule can be determined, which provides information about the molecule’s stability. Finally, the electrostatic field can be estimated, from which the mean atomic forces can be derived. More information can be found in [2,3,4,5].

Analytical solutions of the PBE are only possible under the assumption that the biomolecules of interest have regular shapes, for example, spheres or cylinders, and even then the resulting expressions are quite complex. Such idealizations are not realistic, because biomolecules have irregular shapes (geometries) and charge distributions [6, 7]. This makes it necessary to apply numerical techniques to the PBE; the first such methods were introduced in [8], where the electrostatic potential was determined at the active site of a protein (or enzyme). The most popular numerical techniques in this regard are based on the discretization of the domain of interest into small regions and employ the finite difference method (FDM) [1, 9], the finite element method (FEM) [9, 10], or the boundary element method (BEM) [11, 12]. A thorough review of numerical methods for solving the PBE can be found in [13].

All of the aforementioned numerical methods share one major advantage: it is possible to employ ‘electrostatic focussing’, which enables users to apply relatively coarse grids for the entire calculation and very fine grids in regions of particular interest, such as the binding or active sites of biomacromolecules. This adaptivity provides highly accurate local solutions to the PBE at reduced computational cost [14]. However, the BEM has the drawback of being applicable only to the LPBE, which limits its general use. Numerous software packages have been developed to solve the PBE; some of the major ones are the adaptive Poisson–Boltzmann solver (APBS) [9] and DelPhi [15]. There are also recent developments in PBE theory, including the treatment of the biomolecular system as an interface problem and extensive studies on the nonlinear PBE, among others; see Sect. 2 for more details.

Owing to limited computational memory and speed, solving the PBE efficiently is still computationally challenging, which in turn affects the accuracy of the numerical solutions. This is due to the following reasons. Firstly, electrostatic interactions are long-ranged, and the electrostatic potential decays to zero only at infinity, see Eq. (7). In principle, this requires an unbounded domain, which is infeasible in practice. Secondly, biomolecules of interest comprise thousands to millions of atoms, which require a large domain to accommodate both the biomolecule and the solvent. To circumvent these challenges, it is customary to choose a truncated domain of at least three times the size of the biomolecule so as to approximate the boundary conditions accurately [6]. Nonetheless, this still leads to a very large algebraic system consisting of several hundreds of thousands to millions of degrees of freedom. The situation becomes even more difficult if the PBE is incorporated in a typical dynamics simulation, which involves millions of time steps, or in a multi-query task where the problem is solved many times for varying parameter values such as the ionic strength [1].

The computational complexity arising from the resulting high-dimensional system can be greatly reduced by applying model order reduction (MOR) techniques. The main goal of MOR is to construct a reduced-order model (ROM) of typically low dimension, whose solution retains all the important information of the high-fidelity system at a greatly reduced computational effort. Because the PBE is a parametrized PDE (PPDE), we apply the reduced basis method (RBM), which falls into the class of parametrized MOR (PMOR) techniques [16]. However, it is important to note that the RBM is not an independent numerical technique; its accuracy therefore depends on that of the underlying technique used to discretize the PBE [16, 17]. In this paper, we discretize the PBE using the FDM before applying the RBM.

The benefits of the RBM, or of MOR in general, become obvious when the same problem has to be solved for a large number of parameter values. In our study, the break-even point is about 10 parameter values; thus, the RBM becomes very effective if dozens or more parameter configurations need to be evaluated.

Here, we consider a protein molecule immersed in an ionic solution at physiological concentration and determine the electrostatic potential arising from the interaction between the protein and the surrounding ionic solution, see Fig. 1. The electrolyte here is of monovalent type, implying that the ionic strength is equivalent to the concentration of the ions. The ionic strength is a physical parameter of the PBE, and we consider it as the parameter \(\mu \) of the RBM in this article. Therefore, the electrostatic potential shall be determined under variation of this parameter. The electrostatic potential as a function of the ionic strength in solution may influence the rate of association between an enzyme and its substrate [18, 19].

Fig. 1: 2-D view of the Debye–Hückel model

This paper is an extension of the ECCOMAS Congress 2016 proceedings paper by Kweyu et al. [20], with the following key additions. Firstly, we employ the nonaffine Dirichlet boundary conditions given in Sect. 3.4 to replace the zero Dirichlet boundary conditions used in the former. Secondly, and as a consequence of the nonaffine parameter dependence of these boundary conditions, we apply the (discrete) empirical interpolation method ((D)EIM) to reduce the resulting complexity in the reduced-order model (ROM) during the online phase of the reduced basis method (RBM), see Sects. 4.2 and 4.3. Note that we follow the description of the DEIM implementation discussed in [21]; mathematically, this is equivalent to the original EIM version introduced in [22] and the discrete variant in [23]. Lastly, we apply a finite volume discretization to the dielectric coefficient function instead of taking averages of the dielectric values between two neighbouring grid points. This is meant to reduce the truncation error, as explained in Sect. 3.2.

The RBM has previously been applied to the nonlinear regularized PBE based on the range-separated tensor format, and preliminary results for simple molecules only were published in [24]. Chen et al. [25] used a simplified variant of the classical nonlinear PBE, but only in one and two dimensions, with smooth exponential functions as source terms, which find their application in electrochemical systems involving the modeling of a symmetric electrolyte. The novelty of this paper, however, lies in the efficient construction of a low-dimensional surrogate reduced-order model (ROM) for the LPBE by the RBM and DEIM, whose solution is as accurate as those of popular PBE software packages, for example, the APBS. Here, for the first time, the RBM is applied to the LPBE in 3D for the modeling of complex biomolecular systems, which are characterized by strong singularities generated by singular sources and are subject to parametric nonaffine Dirichlet boundary conditions in the form of a Yukawa potential.

The outline of this paper is as follows: in Sect. 2, we present an overview of PBE theory and derive the PBE model. In Sect. 3, we describe the finite difference discretization of the LPBE, the discretizations of the dielectric coefficient and kappa functions and of the charge densities, as well as their respective mappings to the computational grid. In Sect. 4, we present the basics of the RBM, including the problem formulation, the solution manifold, the greedy algorithm, the discrete empirical interpolation method (DEIM), and the a posteriori error estimation. In Sect. 5, we provide numerical results for the FOM (via the FDM) and for the ROM (via RBM and DEIM). Conclusions and some ideas on future work are given at the end.

2 An overview of Poisson–Boltzmann theory

There are numerous derivations of, and reviews on, the PBE. The simplest derivation stems from the Poisson equation [26, 27] (in SI units),

$$\begin{aligned} -\vec {\nabla }\cdot (\epsilon (x)\vec {\nabla }u(x)) = \uprho (x), \quad \text {in} \quad \Omega \subset {\mathbb {R}}^3, \end{aligned}$$
(1)

which describes the electrostatic potential u(x) at a point \(x \in \Omega \). The term \(\uprho (x)\) is the charge distribution which generates the potential in a region with a position-dependent and piecewise constant dielectric function \(\epsilon (x)\). Equation (1) is generally solved in a finite domain \(\Omega \) subject to Dirichlet boundary conditions \(u(x) = g(x)\) on \(\partial {\Omega }\). Usually, g(x) employs an analytic and asymptotically correct form of the electrostatic potential and therefore, the domain must be sufficiently large to ensure an accurate approximation of the boundary conditions [7].

To obtain the PBE from Eq. (1), we consider two contributions to the charge distribution \(\uprho (x)\): the ‘fixed’ solute charges \(\uprho _f(x)\) and the aqueous ‘mobile’ ions in the solvent \(\uprho _m(x)\). The \(N_m\) partial atomic point charges (\(z_i\)) of the biomolecule are modeled as a sum of delta distributions at each atomic center \(x_i\), for \(i = 1, \ldots , N_m\), that is,

$$\begin{aligned} \uprho _f(x) = \frac{4\pi e^2}{K_B T}\sum _{i=1}^{N_m}z_i\delta (x-x_i). \end{aligned}$$
(2)

The term \(e/{K_BT}\) is the scaling coefficient which makes the electrostatic potential dimensionless, where e is the electron charge and \(K_BT\) is the thermal energy of the system, comprising the Boltzmann constant \(K_B\) and the absolute temperature T. The total charge of each atom is \(ez_i\).

On the other hand, the solvent is modeled as a continuum through the Boltzmann distribution which leads to the mobile ion charge distribution

$$\begin{aligned} \uprho _m(x) = \frac{4\pi e^2}{K_B T}\sum _{j=1}^{m}c_jq_je^{-q_ju(x)-V_j(x)}, \end{aligned}$$
(3)

where we have m mobile ion species with charges \(q_j\) and bulk concentrations \(c_j\). The term \(V_j(x)\) is the steric potential which prevents an overlap between the biomolecule and the counterions. For monovalent electrolytes, whose ions are in a 1 : 1 ratio, for example, NaCl, Eq. (3) reduces to

$$\begin{aligned} \uprho _m(x) = -{\bar{\kappa }}^2(x)\sinh (u(x)), \end{aligned}$$
(4)

where the kappa function \({\bar{\kappa }}^2(x)\) is position-dependent and piecewise constant; it describes both the ion accessibility through \(e^{-V(x)}\) and the bulk ionic strength (or concentration) [14].

We eventually obtain the PBE by combining the two expressions for the charge distributions in (2) and (4) with the Poisson equation (1) for a monovalent electrolyte,

$$\begin{aligned}&-\vec {\nabla }\cdot (\epsilon (x)\vec {\nabla }u(x)) + {\bar{\kappa }}^2(x) \sinh (u(x)) \nonumber \\&\quad = \frac{4\pi e^2}{K_B T}\sum _{i=1}^{N_m}z_i\delta (x-x_i), \end{aligned}$$
(5)

subject to

$$\begin{aligned} u(x) = g(x) \quad \text {on} \quad \partial {\Omega }, \end{aligned}$$
(6)

where

$$\begin{aligned} u(\infty ) = 0 \, \implies u(x_{\max })\rightarrow 0 \,\,\text {as} \,\, \left| x_{\max }\right| \rightarrow \infty . \end{aligned}$$
(7)

In Eq. (5), \( u(x) = {e\psi (x)}/{K_B T}\) is the dimensionless potential scaled by \(e/{K_B T}\) and \(\psi (x)\) is the original electrostatic potential in centimeter-gram-second (cgs) units at \( x \in {\mathbb {R}}^3 \). The terms \(\epsilon (x)\) and \({\bar{\kappa }}^2(x)\) are discontinuous functions at the interface between the charged biomolecule and the solvent, and at an ion-exclusion region (Stern layer) surrounding the molecule, respectively. The term \({\bar{\kappa }}^2 = {8\pi e^2 I}/{1000\epsilon K_BT}\) is a function of the ionic strength \(I = \frac{1}{2}\sum _{j=1}^{m}c_jq_j^2\) of the m mobile ion species, which shall be used as the RBM parameter \(\mu \) in Sect. 4. The function g(x) represents the Dirichlet boundary conditions, which are discussed in detail in Sect. 3.4 and are nonaffine in the parameter I. Equation (7) states that the electrostatic potential decays to zero as the position approaches infinity. Details on mapping \(\epsilon (x)\) and \({\bar{\kappa }}^2(x)\) onto a computational grid can be found in [9]. The PBE (5) poses severe computational challenges for both analytical and numerical approaches due to the unbounded domain implied by (7), the delta distributions, the strong nonlinearity, and the discontinuous coefficients [6, 10].

The PBE (5) can be linearized under the assumption that the electrostatic potential is very small relative to the thermal energy \(K_BT\) [2]. Therefore, the nonlinear function \(\text {sinh}(u(x))\) can be expanded into a Taylor series

$$\begin{aligned} \sinh (u(x)) = u(x) + \frac{u(x)^3}{3!} + \frac{u(x)^5}{5!} + \cdots , \end{aligned}$$
(8)

and only the first term is retained. We obtain the linearized PBE (LPBE) given by

$$\begin{aligned} -\vec {\nabla }\cdot (\epsilon (x)\vec {\nabla }u(x)) + {\bar{\kappa }}^2(x)u(x) = \left( \frac{4\pi e^2}{K_B T}\right) \sum _{i=1}^{N_m}z_i\delta (x-x_i). \end{aligned}$$
(9)

Usually, proteins are not highly charged, and it then suffices to consider the linearized PBE (LPBE). One can still obtain accurate results because the higher-order terms in (8) do not provide a significant contribution. However, we must note that the LPBE can give inaccurate results for highly charged biomolecules such as DNA and RNA (nucleic acids), phospholipid membranes, and polylysine [4]. More information about the PBE, including its derivation from first principles, can be found in [6].

It is worth noting that there are recent developments of the PBE theory. Firstly, the biomolecular system has been considered as an interface problem, which requires solution decomposition techniques to remove the solution singularities caused by the Dirac delta distributions on the right-hand side of (9) or (5). This has been discussed, for example, in [28,29,30], where the LPBE has been modified into the form

$$\begin{aligned} {\left\{ \begin{array}{ll} -\epsilon _p \Delta u(x) = \alpha \sum _{i=1}^{N_m}z_i\delta (x-x_i), &{}\quad x \in D_p,\\ -\epsilon _s \Delta u(x) + \kappa ^2 u(x) = 0, &{}\quad x \in D_s,\\ u(s^+) = u(s^-), \quad \epsilon _s \frac{\partial u(s^+)}{\partial n(s)} = \epsilon _p \frac{\partial u(s^-)}{\partial n(s)}, &{}\quad s \in \varvec{\Gamma },\\ u(s) = g(s), &{}\quad s \in \partial \Omega , \quad \end{array}\right. } \end{aligned}$$
(10)

where \(\alpha \) is a constant, \(D_p\) the protein domain, \(D_s\) the solvent domain and \(\varvec{\Gamma }\) the interface between the protein and the solvent. The PBE (nonlinear) has also been extensively solved as an interface problem [28, 29].

The interface problem in (10) is more accurate than the model in (9) considered in this study, because the local (short-range) potentials generated by the Dirac delta distributions are computed independently of the long-range potentials, thus avoiding the associated errors. However, this model is still computationally expensive, because the numerical calculations by conventional methods scale as \({\mathcal {O}}({\mathcal {N}}^3)\) (commonly known as the ‘curse of dimensionality’), where \({\mathcal {N}}\) is the dimension of the system in one direction. Therefore, we use the simpler model (9) for the purpose of introducing and validating the RBM; considering the interface problem will be our next step.

Secondly, a variational problem of minimizing a mean-field electrostatic free-energy functional has been studied [31], in order to investigate the dependence of the dielectric coefficient on local ionic concentrations and its effect on the equilibrium properties of electrostatic interactions in an ionic solution, as proposed, for instance, in [32]. The results show that the dielectric coefficient does indeed depend on the local ionic concentrations and that this dependence can be expressed as a continuous, monotonically decreasing, and convex function [31].

2.1 Applications and post-processing of the PBE solution

The resultant electrostatic potential for the entire system can be used to calculate electrostatic free energies and electrostatic forces. The electrostatic free energy represents the work needed to assemble the biomolecule and is obtained by integration of the potential over a given domain of interest [7, 33]. For the LPBE, this energy is given by

$$\begin{aligned} G_{\text {elec}}[u(x)]= \frac{1}{2}\int _{\Omega }\uprho _f u(x)dx = \frac{1}{2}\sum _{i=1}^{N_m}z_iu(x_i), \end{aligned}$$
(11)

where \(u(x_i)\) is the mean electrostatic potential acting on atom i located at position \(x_i\) and carrying charge \(z_i\). The integral in (11) can be interpreted as a polarization energy, which is equivalent to the sum of the interactions between the charges and their respective potentials.
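Since the discrete sum in (11) is the quantity evaluated in practice, a minimal Python sketch of this post-processing step is given below. It assumes that the potential has already been interpolated from the grid to the atom centers; the function name, the toy data, and the omission of any unit conversion are our own illustrative choices.

```python
import numpy as np

def electrostatic_free_energy(z, u_at_atoms):
    """Discrete form of Eq. (11): G_elec = 1/2 * sum_i z_i * u(x_i).

    z          -- partial atomic charges z_i (in units of the electron charge)
    u_at_atoms -- dimensionless potential u(x_i) interpolated to the atom centers
    The conversion of the result to physical units (e.g. kJ/mol) is omitted here.
    """
    return 0.5 * np.dot(np.asarray(z), np.asarray(u_at_atoms))

# Hypothetical toy data: three partial charges and their interpolated potentials.
print(electrostatic_free_energy([0.5, -1.0, 0.25], [1.2, -0.8, 0.1]))
```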

On the other hand, it is also possible to differentiate the energy functional in (11) with respect to atomic positions to obtain the electrostatic force on each atom [7, 14, 34]. The electrostatic potential can also be evaluated on the surface of the biomolecule (electrostatic surface potential). It is used to provide information about the interaction between the biomolecule and other biomolecules or ligands or ions in its vicinity.

2.1.1 Similarity index (SI) analysis of proteins

Similarity indices (SIs) are quite significant for the following reasons. Firstly, they are used in quantum mechanical calculations to compare the electron densities and electrostatic potentials of small organic compounds. The comparison can be used to derive quantitative structure-activity relationships (QSARs) [35]. Secondly, they are used for comparison of molecular electrostatic potentials generated by the PBE. In general, similarity analysis can be used to compare the interaction properties of related proteins which provides information about binding to other particles [35].

If many protein samples are considered, this becomes a severe computational issue. On the other hand, self-similarity indices can also be calculated by rotating individual proteins about an axis, a task which can be handled more conveniently by the RBM; in this case, the angle of rotation becomes the relevant parameter. This is our next research focus, where we shall apply a solution decomposition technique (the range-separated canonical tensor format) [36] in order to modify the PBE in (5), so as to improve the accuracy and to reduce the computational costs.

2.1.2 Brownian dynamics simulation (BDS) and ionic strength dependence on reaction rates

The Brownian dynamics simulation technique has a myriad of applications in biological systems. It may be used, for example, to determine protein association rates and to simulate protein–protein encounters [2, 37]. Protein association rates depend strongly on the ionic strength of the solution in which the interaction takes place. For instance, high ionic strengths dampen or attenuate the electrostatic forces and energies of proteins, hence reducing the rates of association, and vice versa. The dependence on the ionic strength of the solution is an indicator of the significance of long-range electrostatic forces and hence of diffusion control [37].

The work of several researchers corroborates this dependence of protein association rates on ionic strength; we mention a few of these findings here. In their research on the ionic strength dependence of protein–polyelectrolyte interactions, Seyrek et al. [38] investigated the effect of univalent electrolyte concentration on protein–polyelectrolyte complex formation. They observed that the addition of salt screened repulsions as well as attractions and thus reduced the binding of the complex.

Pasche et al. [39] examined the effect of ionic strength and surface charge on protein adsorption at PEGylated surfaces. They observed that at high grafting density and high ionic strength, the net interfacial force was determined by the steric barrier properties of PEG (polyethylene glycol). At low ionic strength, on the other hand, the electrical double layer thickness exceeded that of the PEG layer, and therefore the protein interactions with PLL-g-PEG-coated surfaces were influenced by the surface charges shining through the PEG layer.

In [40], the electrostatic influence on the kinetics of ligand binding to acetylcholinesterase (AChE) was investigated and distinctions between active center ligands and fasciculin were made. It was observed that reaction rates for the cationic ligands showed a strong dependence on ionic strength. Furthermore, fasciculin 2 (FAS2) showed greater ionic strength dependence than \(\hbox {TFK}^+\) (m-trimethylammoniotrifluoroacetophenone) which is consistent with its multiple net positive charges.

The RBM can be quite useful for such multi-parametric systems: a reduced-order model (ROM) can be obtained for varying ionic strengths and positions of the molecule under investigation. This ROM can make the BDS computations much cheaper than using the full-order model (FOM).

3 Discretization of the Poisson–Boltzmann equation

3.1 Finite difference discretization

We discretize the LPBE in (9) with a centered finite difference scheme to obtain the following algebraic linear system,

$$\begin{aligned}&\displaystyle -\frac{H}{dx^2}\epsilon _{i+\frac{1}{2},j,k}^x(u_{i+1,j,k}-u_{i,j,k})\nonumber \\&\quad +\, \frac{H}{dx^2}\epsilon _{i-\frac{1}{2},j,k}^x(u_{i,j,k}-u_{i-1,j,k})\nonumber \\&\quad -\,\frac{H}{dy^2}\epsilon _{i,j+\frac{1}{2},k}^y(u_{i,j+1,k}-u_{i,j,k})\nonumber \\&\quad +\,\frac{H}{dy^2}\epsilon _{i,j-\frac{1}{2},k}^y(u_{i,j,k}-u_{i,j-1,k})\nonumber \\&\quad -\,\frac{H}{dz^2}\epsilon _{i,j,k+\frac{1}{2}}^z(u_{i,j,k+1}-u_{i,j,k})\nonumber \\&\quad +\,\frac{H}{dz^2}\epsilon _{i,j,k-\frac{1}{2}}^z(u_{i,j,k}-u_{i,j,k-1}) + H{\bar{\kappa }}^2_{i,j,k}u_{i,j,k} = HCq_{i,j,k},\nonumber \\ \end{aligned}$$
(12)

where \(H = dx\times dy\times dz\) is a scaling factor, \(q_{i,j,k}\) is the discretized molecular charge density, and \(C = {4\pi e^2}/{K_B T}\). For an accurate mean electrostatic potential, it is important to choose efficient algorithms and suitable parameters for the discretization of the charge density distribution and of the kappa and dielectric functions that appear in the LPBE. An efficient method is also needed to partition the domain into solute (biomolecule) and solvent dielectric regions; some of the key methods employed in APBS are the molecular-surface and cubic-spline surface methods [33]. A minimal sketch of the stencil assembly in (12) is given below, and the following subsections provide some insights into these discretizations.
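The following Python/SciPy sketch shows how the seven-point stencil in (12) could be assembled on a uniform grid with \(dx=dy=dz=h\). The face-centered dielectric arrays, the lexicographic ordering, and the treatment of the Dirichlet data (assumed to have been moved to the right-hand side) are illustrative assumptions, not the exact implementation used in this work.

```python
import numpy as np
import scipy.sparse as sp

def assemble_lpbe_matrix(eps_x, eps_y, eps_z, kappa2, h):
    """Assemble the 7-point stencil of Eq. (12) on a uniform grid (dx = dy = dz = h).

    eps_x, eps_y, eps_z -- dielectric values on the x-, y- and z-faces of the grid,
                           with shapes (nx+1, ny, nz), (nx, ny+1, nz), (nx, ny, nz+1)
    kappa2              -- nodal values of kappa_bar^2, shape (nx, ny, nz)
    Dirichlet boundary data is assumed to be moved to the right-hand side, so only
    the coupling between the nx*ny*nz unknowns is assembled here.
    """
    nx, ny, nz = kappa2.shape
    idx = lambda i, j, k: (i * ny + j) * nz + k            # lexicographic node index
    rows, cols, vals = [], [], []
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                p = idx(i, j, k)
                diag = h**3 * kappa2[i, j, k]              # H * kappa_bar^2 term
                faces = [(-1, 0, 0, eps_x[i, j, k]), (+1, 0, 0, eps_x[i + 1, j, k]),
                         (0, -1, 0, eps_y[i, j, k]), (0, +1, 0, eps_y[i, j + 1, k]),
                         (0, 0, -1, eps_z[i, j, k]), (0, 0, +1, eps_z[i, j, k + 1])]
                for di, dj, dk, eps_face in faces:
                    diag += h * eps_face                   # (H / h^2) * eps on each face
                    ii, jj, kk = i + di, j + dj, k + dk
                    if 0 <= ii < nx and 0 <= jj < ny and 0 <= kk < nz:
                        rows.append(p); cols.append(idx(ii, jj, kk)); vals.append(-h * eps_face)
                rows.append(p); cols.append(p); vals.append(diag)
    return sp.csr_matrix((vals, (rows, cols)), shape=(nx * ny * nz, nx * ny * nz))
```

The right-hand side would then collect the terms \(H C q_{i,j,k}\) from the spread charges of Sect. 3.3 plus the boundary contributions of Sect. 3.4.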

3.2 Calculation of dielectric constant distribution and kappa function

The dielectric constant \(\epsilon \) in Eq. (12) is discretized at half-grid points; we therefore use a staggered mesh, which results in three arrays (in the x, y, and z directions) representing the shifted dielectric values on different grids. This takes full advantage of the finite volume discretization in order to minimize the solution error by increasing the spatial resolution. The dielectric coefficient and kappa functions, which are piecewise constant, are mapped according to the following conditions,

$$\begin{aligned} \epsilon (x) = {\left\{ \begin{array}{ll} 2 &{}\quad \text {if } x \in \Omega _1\\ 78.54 &{}\quad \text {if } x \in \Omega _2 \,\, \text {or} \,\, \Omega _3 \end{array}\right. }, \qquad {\bar{\kappa }}(x) = {\left\{ \begin{array}{ll} 0 &{}\quad \text {if } x \in \Omega _1 \,\, \text {or} \,\, \Omega _2\\ \sqrt{\epsilon _3}\kappa &{}\quad \text {if } x \in \Omega _3 \end{array}\right. },\nonumber \\ \end{aligned}$$
(13)

where \(\Omega _1\) is the region occupied by the protein molecule, \(\Omega _2\) is the ion-exclusion layer, and \(\Omega _3\) is the region occupied by the ionic solution.

Techniques used to map the dielectric and kappa functions onto the grid include, among others, the molecular surface and the smoothed molecular surface, which are calculated using the Connolly approach [41], and the cubic-spline surface; for more information see [33]. The cubic-spline surface method, which is our method of choice, is more suitable than the other two because it allows the gradient of the mean electrostatic potential to be evaluated, as required, for example, in the determination of solvation (polar) forces. This method introduces an intermediate dielectric region at the interface between the solute and the solvent, because the kappa and dielectric maps are built on a cubic-spline surface. This smooths the transition of these functions and circumvents the discontinuities inherent in them [9, 33].
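For illustration, a sharp (non-smoothed) assignment of the piecewise-constant coefficients according to (13) could look as follows in Python; note that the actual maps used in this work are built on the smoothed cubic-spline surface described above, and the region-labelling step itself (e.g. by a molecular-surface routine) is not shown.

```python
import numpy as np

def sharp_coefficient_maps(region, kappa_bulk, eps_p=2.0, eps_s=78.54):
    """Piecewise-constant maps following Eq. (13).

    region     -- integer array with labels 1 (protein, Omega_1),
                  2 (ion-exclusion layer, Omega_2), 3 (ionic solution, Omega_3)
    kappa_bulk -- bulk kappa of the solvent, so that kappa_bar = sqrt(eps_s) * kappa_bulk
                  inside Omega_3 and zero elsewhere
    """
    eps = np.where(region == 1, eps_p, eps_s)
    kappa_bar = np.where(region == 3, np.sqrt(eps_s) * kappa_bulk, 0.0)
    return eps, kappa_bar
```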

3.3 Calculation of charge densities

The molecular charge density (the right-hand side of the LPBE (9)) can be obtained from any file with atomic coordinates, charges, and radii. However, these atomic coordinates generally do not coincide with grid points; it is therefore necessary to find an efficient method of spreading the point charges (the summation term in the LPBE) onto the grid points. Several methods are available to map or spread the charges onto the grid points, e.g. in the APBS software package. Trilinear interpolation (or linear spline), in which charges are mapped onto the nearest-neighbour grid points, results in potentials which are very sensitive to the grid resolution. Cubic B-spline interpolation, where charges are mapped to two layers of grid points, has an average sensitivity to the grid setup, and quintic B-spline interpolation has the lowest sensitivity to the grid spacing because charges are spread out to three layers of grid points [9].

In this study, we use the cubic B-spline interpolation (basis spline) method which maps the charges to the nearest and next-nearest grid points. Although computationally expensive, this method provides softer or smoother distributions of charges which subsequently reduces the sensitivity of the mean electrostatic potential solutions to the grid spacing [33].
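A minimal sketch of cubic B-spline charge spreading is shown below: one-dimensional spline weights are combined as a tensor product, so each charge is distributed over the nearest and next-nearest grid points in every direction. The function names and grid layout are illustrative, and normalization by the grid-cell volume (to obtain a density) is omitted.

```python
import numpy as np

def cubic_bspline(t):
    """Centered cubic B-spline with support [-2, 2]; the weights at the four
    surrounding grid points sum to one (partition of unity)."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= 1.0, 2.0 / 3.0 - t**2 + 0.5 * t**3,
                    np.where(t <= 2.0, (2.0 - t)**3 / 6.0, 0.0))

def spread_charge(q, x, origin, h, shape):
    """Spread a point charge q at position x onto a uniform grid (spacing h) using
    tensor-product cubic B-spline weights. Returns the grid of spread charges."""
    rho = np.zeros(shape)
    s = (np.asarray(x, dtype=float) - np.asarray(origin, dtype=float)) / h
    base = np.floor(s).astype(int)
    for di in range(-1, 3):
        for dj in range(-1, 3):
            for dk in range(-1, 3):
                i, j, k = base[0] + di, base[1] + dj, base[2] + dk
                if 0 <= i < shape[0] and 0 <= j < shape[1] and 0 <= k < shape[2]:
                    w = (cubic_bspline(s[0] - i) * cubic_bspline(s[1] - j)
                         * cubic_bspline(s[2] - k))
                    rho[i, j, k] += q * w
    return rho
```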

3.4 Dirichlet boundary conditions

Analytical solutions to the LPBE can only be obtained for systems with simple geometries, for example, spherical and cylindrical systems. Equation (14) shows an analytical solution for a spherical molecule with uniform charge (the Born ion) [6]. From this equation, we can obtain two different kinds of Dirichlet boundary conditions, the single Debye–Hückel (SDH) and the multiple Debye–Hückel (MDH) conditions. For the former, we assume that all the atomic charges are collected into a single charge at the center of the solute, which is approximated by a sphere; this kind of boundary condition is suitable when the boundary is sufficiently far from the biomolecule. The latter, on the other hand, assumes the superposition of the contributions of the individual atomic charges (i.e., multiple, non-interacting spheres with point charges), each with its respective radius. This kind of boundary condition is more accurate than SDH for closer boundaries but can be computationally expensive for large biomolecules.

In this study, we employ the MDH type [9, 42],

$$\begin{aligned} u(x) = \left( \frac{e^2}{K_B T}\right) \sum \limits _{i=1}^{N_m}\frac{z_ie^{-\kappa (d-a_i)}}{\epsilon _w (1+\kappa a_i)d} \quad \text {on} \quad \partial {\Omega }, \quad d = |x-x_i |. \end{aligned}$$
(14)

Here \(z_i\) are the point partial charges of the protein, \(\epsilon _w\) is the solvent dielectric, \(\kappa = {\bar{\kappa }}/ \sqrt{\epsilon _w}\) is a function of the ionic strength of the solution, \(a_i\) are the atomic radii, and \(N_m\) is the total number of point partial charges in the protein.
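A direct evaluation of (14) at the boundary grid points can be sketched as follows; the argument names and the explicit prefactor \(e^2/(K_B T)\) being passed in as a plain number are our own simplifications.

```python
import numpy as np

def mdh_boundary_values(x_bnd, x_atoms, z, a, kappa, eps_w, prefactor):
    """Multiple Debye-Hueckel Dirichlet data of Eq. (14).

    x_bnd     -- (M, 3) array of boundary grid points
    x_atoms   -- (N_m, 3) array of atom centers with charges z and radii a
    kappa     -- kappa_bar / sqrt(eps_w) for the current ionic strength (the RBM parameter)
    prefactor -- the scaling e^2 / (K_B T)
    """
    x_bnd = np.asarray(x_bnd, dtype=float)
    g = np.zeros(len(x_bnd))
    for zi, ai, xi in zip(z, a, np.asarray(x_atoms, dtype=float)):
        d = np.linalg.norm(x_bnd - xi, axis=1)                  # d = |x - x_i|
        g += zi * np.exp(-kappa * (d - ai)) / (eps_w * (1.0 + kappa * ai) * d)
    return prefactor * g
```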

4 Essentials of the reduced basis method

The reduced basis method (RBM) and proper orthogonal decomposition (POD) are examples of popular projection-based parametrized model order reduction (PMOR) techniques. The main goal of these techniques is to generate a parametric ROM which accurately approximates the original full-order model (FOM) of high dimension over varying parameter values [16, 17, 43]. The RBM exploits an offline/online procedure which ensures an accurate approximation of the high-fidelity solution in a rapid and inexpensive manner, and it is widely applicable in real-time and many-query scenarios. For a thorough review, see [16].

We consider a physical domain \(\Omega \subset {\mathbb {R}}^3\) with boundary \(\partial \Omega \) and a parameter domain \(D \subset {\mathbb {R}}\) in which the physical parameter of the RBM (i.e., the ionic strength \(\mu \)) resides. The LPBE (9) is discretized with the centered finite difference scheme (12) on \(\Omega \), and the Dirichlet boundary conditions (6) obtained from (14) are applied. The resulting discrete problem of the LPBE reads: for any \(\mu \in D\), find \(u^{{\mathcal {N}}}(\mu )\) that satisfies the linear system

$$\begin{aligned} A(\mu )u^{{\mathcal {N}}}(\mu ) = f(\mu ), \quad \mu \in D, \end{aligned}$$
(15)

where \(A(\mu ) \in {\mathbb {R}}^{{\mathcal {N}}\times {\mathcal {N}}}\) and \(f(\mu ) \in {\mathbb {R}}^{{\mathcal {N}}}\). The matrix \(A(\mu )\) can also be written as a parameter-affine matrix,

$$\begin{aligned} A(\mu ) = \sum \limits _{i=1}^{Q}\Theta _i(\mu )A_i, \end{aligned}$$
(16)

where \(Q\in {\mathbb {N}}\), \(\Theta _i\) are scalar coefficient functions, and \(A_i\) are parameter-independent matrices. The \({\mathcal {N}}\times {\mathcal {N}}\) system is too computationally expensive to solve repeatedly for an accurate approximation of \(u(\mu )\), because the dimension \({\mathcal {N}}\) is approximately \(2\times 10^6\) in our problem. Therefore, we apply the RBM to save computational costs by providing an accurate approximation of \(u^{{\mathcal {N}}}(\mu )\) at a greatly reduced dimension \(N \ll {\mathcal {N}}\). The ROM is given by (19).

However, as detailed in Sect. 4.2, we encounter some computational complexity in the online phase of the RBM, caused by the nonaffine parameter dependence of the right-hand side vector \(f(\mu )\) arising from the boundary condition (14). The parameter, the ionic strength, enters through the kappa term \(\kappa \) in the exponential function. This violates one of the key assumptions of the RBM, namely that all system matrices and vectors depend affinely on the parameter, so that the offline/online decomposition is natural [44]. To circumvent this problem, we apply an empirical interpolation method to reduce the complexity of the online phase by avoiding the high-dimensional computations related to the vector \(f(\mu )\). We provide some details in Sect. 4.3.

4.1 The solution manifold and the greedy algorithm

Another key assumption of the RBM, besides the affine parameter dependence, is the existence of a typically smooth and essentially low-dimensional solution manifold, i.e., the set of all high-fidelity solutions of (15) under variation of the parameter [17],

$$\begin{aligned} {\mathcal {M}}^{{\mathcal {N}}} = \{ u^{{\mathcal {N}}}(\mu ) : \mu \in D\}. \end{aligned}$$
(17)

The RB approximation space is then built upon this solution manifold and is given by the subspace spanned by the snapshots of the FOM. In other words, it is the subspace spanned by the high-fidelity \(u^{{\mathcal {N}}}(\mu )\) solutions corresponding to a number of samples of the parameters, that is,

$$\begin{aligned} \text {range}(V) = \text {span}\{ u^{{\mathcal {N}}}(\mu _1), \ldots , u^{{\mathcal {N}}}(\mu _l)\}, \quad \mu _1, \ldots , \mu _l \in D. \end{aligned}$$
(18)

The greedy algorithm, given in Algorithm 1, is used to generate the reduced basis space (18) through an iterative procedure in which a new basis vector is computed at each iteration [45]. The RB spaces are nested (hierarchical), in the sense that the previous basis set is a subset of the next one, and so on.

Algorithm 1: The greedy algorithm
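Since Algorithm 1 is not reproduced here, the following Python sketch outlines the weak greedy loop it describes, with the residual norm (41) standing in for the error estimator \(\Delta _N(\mu )\). The helper functions `solve_fom` and `assemble` are hypothetical placeholders for the FOM solver and the assembly of \(A(\mu )\) and \(f(\mu )\), and the efficient offline/online evaluation of the residual is not shown.

```python
import numpy as np

def greedy_rb(solve_fom, assemble, training_set, tol=1e-3, max_iter=20):
    """Weak greedy construction of the RB basis V (a sketch of Algorithm 1)."""
    mu = training_set[0]                                  # arbitrary starting parameter
    u = solve_fom(mu)                                     # high-fidelity snapshot u^N(mu)
    V = (u / np.linalg.norm(u)).reshape(-1, 1)
    for _ in range(max_iter):
        estimates = []
        for m in training_set:                            # error estimation over the training set
            A, f = assemble(m)
            u_N = np.linalg.solve(V.T @ (A @ V), V.T @ f)        # ROM solve, Eq. (19)
            estimates.append(np.linalg.norm(f - A @ (V @ u_N)))  # residual norm, Eq. (41)
        if max(estimates) < tol:
            break
        mu = training_set[int(np.argmax(estimates))]      # worst-approximated parameter
        u = solve_fom(mu)                                 # new snapshot
        u_orth = u - V @ (V.T @ u)                        # Gram-Schmidt against current basis
        V = np.column_stack([V, u_orth / np.linalg.norm(u_orth)])
    return V
```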

The RB approximation is then formulated as, for any given \(\mu \in D\), find \(u_N(\mu ) \in X_N\) which satisfies

$$\begin{aligned} A_N(\mu )u_N(\mu ) = f_N(\mu ), \end{aligned}$$
(19)

where \(A_N = V^TAV\) and \(f_N(\mu ) = V^Tf(\mu )\), and V is the orthonormal basis matrix computed by the greedy algorithm. Since \(N \ll {\mathcal {N}}\), solving the low-dimensional reduced-order model (ROM) is much cheaper than solving the high-fidelity model, the FOM (15) [17]. However, one problem remains when computing the ROM: the computational complexity of evaluating the nonaffine function \(f_N(\mu )\) still depends on the dimension of the FOM, as illustrated in Sect. 4.2. An efficient implementation of Algorithm 1 depends on an efficient error estimation \(\Delta _N(\mu )\) for the ROM, which is discussed in Sect. 4.5.

4.2 Computational complexity of the reduced order model (ROM)

To demystify the issue of computational complexity in the ROM, we can first rewrite (15) explicitly to illustrate the affine parameter decomposition on the left-hand side and the nonaffine right-hand side,

$$\begin{aligned} (A_1 +\mu A_2)u^{{\mathcal {N}}}(\mu ) = \uprho + b(\mu ), \quad \mu \in D, \end{aligned}$$
(20)

where the matrix \(A_1\) comes from the Laplacian operator term, \(A_2\) is a diagonal matrix arising from the \({\bar{\kappa }}\) term, \(\uprho \) represents the charge density term, and \(b(\mu )\) the boundary conditions obtained from the analytical solution (14). The affine parameter decomposition of the matrix A in (15) into \(A_1\) and \(\mu A_2\) is clearly visible in (20). However, the right-hand side function \(b(\mu )\) is nonaffine in the parameter and therefore cannot be decomposed in this manner. Consider the ROM obtained by the greedy approach of Algorithm 1 together with a Galerkin projection,

$$\begin{aligned} (\underbrace{\hat{A_1}}_{N \times N} + \mu \underbrace{\hat{A_2}}_{N \times N})\underbrace{u^N(\mu )}_{N \times 1} = \underbrace{\hat{\uprho }}_{N \times 1} + \underbrace{V^T}_{N \times {\mathcal {N}}}\underbrace{b(\mu )}_{{\mathcal {N}} \times 1}, \end{aligned}$$
(21)

where \(\hat{A_1} = V^TA_1V\), \(\hat{A_2} = V^TA_2V\), \(\hat{\uprho } = V^T\uprho \), and \(N \ll {\mathcal {N}}\).

It is clear from (21) that the last term on the right-hand side (RHS) still depends on the dimension \({\mathcal {N}}\) of the FOM, while all the other matrices and vectors depend only on the dimension N of the ROM, with \(N \ll {\mathcal {N}}\). Therefore, the reduced-order matrices on the left-hand side and the first vector on the right-hand side of (21) can be precomputed and stored during the offline phase, which provides substantial computational savings. However, the term \(V^Tb(\mu )\) cannot be precomputed because of the aforementioned nonaffine parameter dependence; therefore, the Galerkin projection, involving matrix–vector products that depend on the dimension \({\mathcal {N}}\), has to be carried out in the online phase when solving the ROM.
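A small sketch makes this offline/online split explicit; the reduced matrices below correspond to the hatted quantities in (21), and `b_of_mu` is a hypothetical callable returning the full boundary vector \(b(\mu )\).

```python
import numpy as np

# Offline (parameter-independent), precomputed once:
#   A1_hat = V.T @ (A1 @ V);  A2_hat = V.T @ (A2 @ V);  rho_hat = V.T @ rho

def solve_rom_without_deim(A1_hat, A2_hat, rho_hat, V, b_of_mu, mu):
    """Online ROM solve following Eq. (21). The projection V.T @ b(mu) still requires
    evaluating and multiplying the full, high-dimensional vector b(mu) for every new mu;
    this is the bottleneck removed by DEIM in Sect. 4.3."""
    rhs = rho_hat + V.T @ b_of_mu(mu)          # cost grows with the FOM dimension
    return np.linalg.solve(A1_hat + mu * A2_hat, rhs)
```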

In principle, we require \({\mathcal {O}}(2{\mathcal {N}}N)\) flops for these matrix–vector products, as well as a full evaluation of the nonaffine analytical function (14), to obtain \(V^Tb(\mu )\). This can be computationally expensive for large \({\mathcal {N}}\), especially during the a posteriori error estimation (computing \(\Delta _N (\mu )\)), where the residual is computed l times for varying parameter values \(\mu _i\), \(i = 1, \ldots , l\), within a single iteration of the greedy algorithm. The (discrete) empirical interpolation method [21,22,23] is an approach to circumvent this problem and reduce the computational complexity associated with the nonaffine function. We discuss this technique at length in the next subsection.

4.3 Discrete empirical interpolation method (DEIM)

(D)EIM is a complexity reduction technique that was proposed in [22] in a continuous setting and later developed, for implementation purposes, in discrete versions in [21, 23]. The goal of (D)EIM is to overcome a drawback of the proper orthogonal decomposition (POD) approach when approximating a nonaffine (or nonlinear) parametrized function in the ROM during the online phase: the evaluation of the nonlinear/nonaffine function remains as costly as for the original system, which significantly limits the computational savings of POD. The main idea of (D)EIM is therefore to interpolate the nonlinear/nonaffine function by computing only a few of its entries, which dramatically reduces the computational complexity. Here, we follow the easy-to-implement description from [21, 47]; mathematically, this is equivalent to the earlier versions in [22, 23].

Fig. 2: Decay of the singular values of \(\Sigma \) in (23) before truncation (left) and after truncation (right)

We provide a brief overview of how the singular value decomposition (SVD) is used to obtain the interpolation basis vectors. Firstly, we compute snapshots of the function \(b(\mu )\) at a set of parameter samples \(\mu \) in the training set \(\Xi =\{\mu _1, \ldots , \mu _l\}\subset D\) and construct the snapshot matrix,

$$\begin{aligned} F=[b(\mu _1), \ldots , b(\mu _l)] \in {\mathbb {R}}^{{\mathcal {N}}\times l}. \end{aligned}$$
(22)

Secondly, we compute its singular value decomposition (SVD),

$$\begin{aligned} F = U_F\Sigma W^T, \end{aligned}$$
(23)

where \(U_F \in {\mathbb {R}}^{{\mathcal {N}}\times l}\), \(\Sigma \in {\mathbb {R}}^{l\times l}\), and \(W \in {\mathbb {R}}^{l\times l}\). Note that the matrices \(U_F\) and W are orthogonal, that is, \((U_F)^TU_F=W^TW = I_l\), \(I_l \in {\mathbb {R}}^{l \times l}\) and \(\Sigma = \text {diag}(\sigma _1, \ldots , \sigma _l)\), with \(\sigma _1 \ge \ldots \ge \sigma _l \ge 0\).

Figure 2 shows the decay of the singular values of \(\Sigma \) for the protein fasciculin 1. Figure 2 (left) shows the behaviour of 20 singular values, with almost no decay from the 11th singular value onwards. We discard these non-decaying singular values to obtain those in Fig. 2 (right). From the latter, we can truncate further by selecting only the \(r \in \{ 1, \ldots , l\}\) largest singular values corresponding to a required degree of accuracy. In this case, \(l=11\) and \(r=9\), which corresponds to an accuracy of \(\epsilon _{\text {svd}}=O(10^{-10})\) in (25). The number r determines the basis set \(\{u_i^F\}_{i=1}^r\) of rank r selected from \(U_F\), which solves the minimization problem [48],

$$\begin{aligned} \text {arg}\min \limits _{\{\tilde{u}_i\}_{i=1}^r}\sum _{j=1}^{l}\Arrowvert F_j -\sum _{i=1}^{r}\langle F_j, {\tilde{u}}_i \rangle {\tilde{u}}_i\Arrowvert _2^2, \quad \text {s.t.} \, \langle {\tilde{u}}_i,{\tilde{u}}_j \rangle = \delta _{ij},\nonumber \\ \end{aligned}$$
(24)

where \(F_j\) is the jth column of the snapshot matrix F, and \(\delta _{ij}\) is the usual Kronecker delta.

The following criterion is used to determine the number r of retained (largest) singular values in Fig. 2, based on a desired accuracy \(\epsilon _{\text {svd}}\):

$$\begin{aligned} \frac{\sum \nolimits _{i=r+1}^{l}\sigma _i}{\sum \nolimits _{i=1}^{l}\sigma _i}<\epsilon _{\text {svd}}, \end{aligned}$$
(25)

where \(\sigma _i, i = 1, \ldots , l\) are the nonzero singular values of F. The dotted horizontal black line corresponds to \(r = 9\) singular values and the corresponding singular vectors \(\{u_i^F\}_{i=1}^r\) are used in the DEIM approximation.

DEIM overcomes the problem mentioned in Sect. 4.2 by determining an interpolation of the nonaffine function \(b(\mu )\). This is realized by approximating \(b(\mu )\) with the linear combination of the basis vectors \(U_F=[u_1^F, \ldots , u_r^F] \in {\mathbb {R}}^{{\mathcal {N}} \times r}\), i.e.

$$\begin{aligned} b(\mu ) \approx U_Fc(\mu ), \end{aligned}$$
(26)

where \(c(\mu )\in {\mathbb {R}}^r\) is the corresponding coefficient vector; it can be determined by requiring that \(U_Fc(\mu )\) interpolates \(b(\mu )\) at r selected interpolation points, i.e.,

$$\begin{aligned} P^Tb(\mu ) = P^TU_Fc(\mu ), \end{aligned}$$
(27)

where P is an index matrix given by

$$\begin{aligned} P = [e_{\wp _1}, \ldots , e_{\wp _r}] \in {\mathbb {R}}^{{\mathcal {N}} \times r}, \end{aligned}$$
(28)

which consists of unit vectors \(e_{\wp _i}\), \(i = 1, \ldots , r\), where the indices \(\wp _i\) are the DEIM interpolation points, selected iteratively by a greedy algorithm. Provided that \(P^TU_F \in {\mathbb {R}}^{r \times r}\) is nonsingular, \(c(\mu )\) can be determined from (27) as

$$\begin{aligned} c(\mu ) = (P^TU_F)^{-1}P^Tb(\mu ). \end{aligned}$$
(29)

Therefore, the function \(b(\mu )\) in (14) can be approximated as

$$\begin{aligned} b(\mu ) \approx U_Fc(\mu ) = U_F(P^TU_F)^{-1}P^Tb(\mu ), \end{aligned}$$
(30)

so that the ROM in (21) with DEIM approximation becomes,

$$\begin{aligned} (\underbrace{\hat{A_1}}_{N \times N} + \mu \underbrace{\hat{A_2}}_{N \times N})\underbrace{u^N(\mu )}_{N \times 1} = \underbrace{\hat{\uprho }}_{N \times 1} + \underbrace{V^TU_F(P^TU_F)^{-1}}_{N \times r}\underbrace{P^Tb(\mu )}_{r \times 1}.\nonumber \\ \end{aligned}$$
(31)

The interpolant \(V^TU_F(P^TU_F)^{-1}P^Tb(\mu )\) can be computed much more cheaply than \(V^Tb(\mu )\), because \(V^TU_F(P^TU_F)^{-1}\) can be precomputed independently of the parameter \(\mu \). Moreover, only those entries of \(b(\mu )\) that correspond to the interpolation indices \(\wp _i\), \(i = 1, \ldots , r\), \(r \ll {\mathcal {N}}\), i.e., \(P^T b(\mu )\), need to be evaluated instead of all \({\mathcal {N}}\) entries of \(b(\mu )\). Algorithm 2 provides a brief overview of the DEIM procedure.
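In code, the offline precomputation and the cheap online evaluation of (31) can be sketched as follows; consistent with Remark 1 below, the matrix P is never formed and only the interpolation indices are used. The callable `b_entries`, which returns selected entries of \(b(\mu )\), is a hypothetical interface.

```python
import numpy as np

def precompute_deim_operator(V, U_F, p_idx):
    """Offline: E = V^T U_F (P^T U_F)^{-1}, of size N x r, built from the rows of U_F
    selected by the interpolation indices p_idx."""
    return (V.T @ U_F) @ np.linalg.inv(U_F[p_idx, :])

def rom_rhs_with_deim(rho_hat, E, b_entries, mu, p_idx):
    """Online: only the r entries of b(mu) at the interpolation indices are evaluated,
    so the cost is independent of the full dimension (cf. Eq. (31))."""
    return rho_hat + E @ b_entries(mu, p_idx)
```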

Remark 1

For the actual numerical implementation of the interpolation (30), the matrix P need not be formed explicitly. Instead, only the interpolation indices \(\wp _i\), \(i = 1, \ldots , r\), need to be applied to the matrix \(U_F\) or to the nonaffine function \(b(\mu )\). This means that \(P^TU_F\) simply consists of the rows of \(U_F\) corresponding to the interpolation indices \(\wp _i\), \(i = 1, \ldots , r\). Similarly, \(P^Tb(\mu )\) is a condensed vector composed of the few entries of \(b(\mu )\) corresponding to the same indices.

Algorithm 2: The discrete empirical interpolation method (DEIM)

Note that in Algorithm 2, the POD basis \(\{u_i^F\}_{i=1}^r\) is of great significance as an input basis for the DEIM procedure in two ways. First, the set of interpolation indices \(\wp _i\) is constructed inductively from this basis through a greedy algorithm. Second, the error analysis in [21] indicates that ordering this basis according to the dominant singular values makes it the right choice for this algorithm. In step 1, the process selects the first interpolation index \(\wp _1\), which corresponds to the location of the entry of \(u_1^F\) with the largest magnitude. Each subsequent index \(\wp _i\), \(i = 2, \ldots , r\), selected in step 6, corresponds to the location of the entry of the interpolation residual computed in step 5 with the largest magnitude.
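The index-selection loop just described can be sketched as follows; this is a standard DEIM greedy selection and follows Algorithm 2 only in spirit.

```python
import numpy as np

def deim_indices(U_F):
    """Greedy selection of the DEIM interpolation indices from the POD basis
    U_F = [u_1^F, ..., u_r^F]."""
    Nfull, r = U_F.shape
    p = [int(np.argmax(np.abs(U_F[:, 0])))]        # largest-magnitude entry of u_1^F
    for i in range(1, r):
        # coefficients making the current interpolant match u_{i+1}^F at the chosen indices
        c = np.linalg.solve(U_F[p, :i], U_F[p, i])
        residual = U_F[:, i] - U_F[:, :i] @ c      # interpolation residual (step 5)
        p.append(int(np.argmax(np.abs(residual)))) # largest-magnitude residual entry (step 6)
    return np.array(p)
```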

4.4 DEIM approximation error

We compute the error due to the DEIM interpolation, which is to be included in the residual used for the \(\textit{a posteriori}\) error estimation. This error was first proposed in [47] for nonlinear dynamical systems and has also been used in [49] in the context of nonlinear population balance systems. We extend this idea to parametrized elliptic PDEs, where the DEIM error is given by

$$\begin{aligned} e_{\text {DEIM}} = b(\mu )-\tilde{b}(\mu )= \Pi _2(I-\Pi )b(\mu ), \end{aligned}$$
(32)

where \(\Pi \) and \(\Pi _2\) are oblique projectors defined as follows,

$$\begin{aligned} \Pi = U_F(P^TU_F)^{-1}P^T, \end{aligned}$$
(33)

and

$$\begin{aligned} \Pi _2 = (I-\Pi )\tilde{U}_F(\tilde{P}^T(I-\Pi )\tilde{U}_F)^{-1}\tilde{P}^T. \end{aligned}$$
(34)

In Eq. (33), \(U_F = (u_1^F, \ldots , u_r^F) \in {\mathbb {R}}^{{\mathcal {N}}\times r}\) and \(P \in {\mathbb {R}}^{{\mathcal {N}}\times r}\) are the current DEIM basis and interpolation index matrix obtained from Algorithm 2.

To obtain \(\Pi _2\) in (34), we assume that \(r^*(\ge r)\) DEIM basis vectors \(U_F^* = (u_1^F, \ldots , u_{r^*}^F)\) interpolate \(b(\mu )\) exactly, i.e.

$$\begin{aligned} b(\mu ) = U_F^*((P^*)^TU_F^*)^{-1}(P^*)^Tb(\mu ), \end{aligned}$$
(35)

where \(P^*\) is the corresponding index matrix with \(r^*\) columns. Finally, \(\tilde{U}_F = U_F^*(:,r+1:r^*)\) and \(\tilde{P} = P^*(:,r+1:r^*)\), such that \(U_F^* = [U_F,\tilde{U}_F]\) and \(P^* = [P,\tilde{P}]\), where \(M(:,r+1:r^*)\) denotes columns \(r+1\) to \(r^*\) of a matrix M in MATLAB notation [49]. In the next subsection, we introduce an a posteriori error estimation derived from the residual of the approximate RB solution and the DEIM approximation error.

4.5 A Posteriori error estimation

A posteriori error estimators are computable indicators which provide an estimate of the actual solution error by utilizing the residual of the approximate RB solution. An efficient error estimator should possess three major characteristics: it should be as sharp as possible (close to the unknown actual error), asymptotically correct (tending to zero with increasing RB space dimension N at a rate similar to that of the actual error), and computationally cheap. Such estimators guarantee both the reliability and the efficiency of the reduction process [50]. In Sects. 4.5.1 and 4.5.2, we briefly introduce error estimators related to the solution vector (i.e., the electrostatic potential) and to the output (the electrostatic energy), respectively.

4.5.1 Error estimator for the solution vector

We first compute the residual due to DEIM interpolation;

$$\begin{aligned} r_N^{\text {DEIM}}(u_N;\mu ) = (\uprho + \tilde{b}(\mu )) - A^{{\mathcal {N}}}(\mu )u_N(\mu ), \end{aligned}$$
(36)

where \(\tilde{b}(\mu ) = \Pi b(\mu )\) is the DEIM interpolation of \(b(\mu )\) and \(u_N(\mu ):= Vu^N(\mu )\) is the RB solution transformed back to the high-fidelity space \({\mathcal {N}}\). Then the final residual is obtained by including the DEIM approximation error derived in Sect. 4.4 as follows;

$$\begin{aligned} \begin{aligned} r_N(u_N;\mu )&= (\uprho + b(\mu )) - A^{{\mathcal {N}}}(\mu )u_N(\mu ) \\&= (\uprho + \tilde{b}(\mu )) - A^{{\mathcal {N}}}(\mu )u_N(\mu ) + b(\mu ) - \tilde{b}(\mu )\\&= r_N^{\text {DEIM}}(u_N;\mu ) + \underbrace{b(\mu ) - \tilde{b}(\mu )}_{:=e_{\text {DEIM}}}\\&= r_N^{\text {DEIM}}(u_N;\mu ) + e_{\text {DEIM}}. \end{aligned} \end{aligned}$$
(37)

The a posteriori error estimation is then derived from the residual in (37). Rewriting the first equation of (37), we obtain

$$\begin{aligned} \begin{aligned} r_N(u_N;\mu )&= A^{{\mathcal {N}}}(\mu )u^{{\mathcal {N}}}(\mu ) - A^{{\mathcal {N}}}(\mu )u_N(\mu ) \\&= A^{{\mathcal {N}}}(\mu )e(\mu ), \end{aligned} \end{aligned}$$
(38)

where the error of the solution vector \(e(\mu ):= u^{{\mathcal {N}}}(\mu ) - u_N(\mu )\) is given by

$$\begin{aligned} e(\mu ) = (A^{{\mathcal {N}}}(\mu ))^{-1}r_N(u_N;\mu ). \end{aligned}$$
(39)

We obtain an upper bound for the 2-norm of the error by taking the 2-norm on both sides of Eq. (39), i.e.

$$\begin{aligned} \begin{aligned} \Arrowvert e(\mu )\Arrowvert _2 \le \Arrowvert (A^{{\mathcal {N}}}(\mu ))^{-1}\Arrowvert _2 \Arrowvert r_N(u_N;\mu )\Arrowvert _2&= \frac{\Arrowvert r_N(u_N;\mu )\Arrowvert _2}{\sigma _{\text {min}}(A^{{\mathcal {N}}}(\mu ))}\\&=: {\tilde{\Delta }}_N(\mu ), \end{aligned} \end{aligned}$$
(40)

where \(\sigma _{\text {min}}(A^{{\mathcal {N}}}(\mu ))\) is the smallest eigenvalue of the symmetric matrix \(A^{{\mathcal {N}}}(\mu )\) [50]. The quantity \(\tilde{\Delta }_N(\mu )\) is a rigorous error bound and can be used to select snapshots within the greedy algorithm in the offline stage and, consequently, to measure the accuracy of the RB approximation [45]. For the efficient computation of the norm of the residual and of error bounds, see [44, 50]. It is computationally expensive to compute \(\sigma _{\text {min}}(A^{{\mathcal {N}}}(\mu ))\) in the online phase, as it entails the solution of large-scale eigenvalue problems [45]. Therefore, in our computations, we use the norm of the residual as our error estimator; it satisfies the inequality (40), provides an estimate of the true error that works well for our problem, and leads to rapid convergence, as depicted in the numerical results in Fig. 4. It is given by

$$\begin{aligned} \Arrowvert e(\mu )\Arrowvert _2 \approx \Arrowvert r_N(u_N;\mu )\Arrowvert _2 := \Delta _N(\mu ). \end{aligned}$$
(41)
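For clarity, a direct (fully assembled, not offline/online-decomposed) evaluation of this estimator reads as follows; with DEIM, the interpolated vector \(\tilde{b}(\mu )\) plus the error term of Sect. 4.4 would be used in place of the exact \(b(\mu )\), cf. (37).

```python
import numpy as np

def error_estimator(A_mu, rho, b_mu, V, u_N):
    """Residual-norm estimator Delta_N(mu) = ||r_N(u_N; mu)||_2 of Eq. (41)."""
    u_lifted = V @ u_N                        # RB solution mapped back to the full space
    return np.linalg.norm((rho + b_mu) - A_mu @ u_lifted)
```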

4.5.2 Output error estimator

When an output quantity is of interest, one can also use output error bounds (or estimators) to measure the output error. For the PBE model, the output of interest is \(s(\mu ) = u(\mu )^Tf(\mu )\), which represents the electrostatic free energy of the system. We briefly describe the output error estimator for a compliant problem, in which the output functional is identical to the load/source functional; see [44] for more details. Additionally, the coefficient matrix of the system should be symmetric for every parameter \(\mu \in D\). These properties are fulfilled by the PBE system. Following the derivation in [44], the output error bound is given by

$$\begin{aligned} s(\mu ) -s_N(\mu ) \le \Delta _{s}'(\mu ) := \frac{\Arrowvert r_N(u_N;\mu )\Arrowvert _2^2}{\sigma _{\text {min}}(A^{{\mathcal {N}}}(\mu ))}, \end{aligned}$$
(42)

where \(s_N(\mu ) = u_N(\mu )^Tf_N(\mu )\) is the output computed from the ROM, and \(u(\mu )\) and \(u_N(\mu )\) are the solutions of the FOM (15) and of the ROM (19), respectively. Here, too, we avoid the use of \(\sigma _{\text {min}}(A^{{\mathcal {N}}}(\mu ))\) because of the high computational cost involved. We provide numerical results in Fig. 8 for different kinds of molecules.

5 Numerical results

5.1 Finite difference results

We consider the LPBE (9), a parameter domain \(\mu \in D = [0.05,0.15]\) containing ionic strengths (or varying concentrations) of the ionic solvent, and a cubic grid with 129 points in each direction and a box length of \(60\,\text {\AA }\) centered at the protein position. The parameter domain is chosen to represent a feasible physiological process, and \(\mu \) resides in the second term of the kappa function. Information about the molecular charge density is obtained from a PQR file which contains the 1228 atoms of the protein fasciculin 1 toxin (PDB entry 1FAS). We discretize the LPBE with a centered finite difference scheme, and the resulting parametrized linear system (15) has more than \(2\times 10^{6}\) degrees of freedom. This FOM is solved by the aggregation-based algebraic multigrid (AGMG) method, with a tolerance of \(10^{-10}\) and a zero initial guess [51,52,53].

The choice of this tolerance directly affects the results of the greedy algorithm; it is therefore prudent to ensure that the high-fidelity solution (\(u^{\mathcal {N}}(\mu )\), \({\mathcal {N}}= 2,146,689\)) is highly accurate. Some of the iterative methods commonly used in PBE solvers are the minimal residual (MINRES) method, the generalized minimal residual (GMRES) method, and the biconjugate gradient stabilized (BICGSTAB) method. These methods employ an incomplete LU factorization to generate the preconditioner factors L (lower triangular) and U (upper triangular), which are used to improve their stability and convergence at low cost [33].

Figure 3 shows cross-sections at the lowest z-plane of the electrostatic potential, u(x, y, 1), at varying ionic strengths (i.e., \(\mu = \{0, 0.05, 0.15, 0.5\}\), respectively). Notice that the electrostatic potential decays exponentially under variation of the parameter \(\mu \), which is attributed to the large force constant (332 kcal/mol) of electrostatic interactions. In the absence of ions (that is, at \(\mu = I = 0\)), these interactions are long-ranged (see Fig. 3 (top left)), but in the presence of ions (that is, \(\mu > 0\)), they are damped or screened and gradually decay to zero [2]. The runtime required to obtain the high-fidelity solution \(u^{\mathcal {N}}(\mu )\) is approximately 28 s on average and varies depending on the value of the ionic strength used.

Fig. 3: High-fidelity solutions \(u^{{\mathcal {N}}}(\mu )\) at varying ionic strengths (\(\mu \) = {0, 0.05, 0.15, 0.5}), respectively

5.2 Accuracy of FDM

We demonstrate the accuracy and reliability of the FDM before applying the RBM to the solution of the PBE, because the accuracy of the RBM depends on that of the underlying discretization technique. In this study, we consider six test examples to validate the FDM: a Born ion and five proteins consisting of between 380 and 3400 atoms. We compare the FDM results with those of APBS for the electrostatic solvation free energy at different mesh refinements. Firstly, we consider the Born ion, which is a canonical example for polar solvation and whose analytical solution is well known.

This analytical solution gives the polar solvation energy which results from the transfer of a non-polarizable ion between two dielectrics [54], i.e.,

$$\begin{aligned} \Delta _pG_{\text {Born}} = \frac{q^2}{8\pi \epsilon _0 r}\left( \frac{1}{\epsilon _{\text {out}}}- \frac{1}{\epsilon _{\text {in}}}\right) , \end{aligned}$$
(43)

where q is the ion charge, r is the ion radius, \(\epsilon _{\text {out}}\) is the external dielectric coefficient (e.g., water), and \(\epsilon _{\text {in}}\) is the internal dielectric coefficient (e.g., vacuum). This model assumes zero ionic strength. We consider a Born ion of unit charge and \(3\,\text {\AA }\) radius, located at the origin (0, 0, 0), with \(\epsilon _{\text {in}} = 1\) and \(\epsilon _{\text {out}} = 78.54\). With these parameters, the analytical solution (43) gives

$$\begin{aligned} \Delta _pG_{\text {Born}} = -691.85\left( \frac{q^2}{r}\right) = -230.62 \text {kJ/mol}. \end{aligned}$$
(44)

We compare numerical computations using Eq. (11) for the charging free energies with homogeneous (\(\epsilon _{\text {in}} = \epsilon _{\text {out}} = 1\)) and heterogeneous (\(\epsilon _{\text {in}} = 1, \epsilon _{\text {out}} = 78.54\)) dielectric coefficients against the analytical solution [54]. We use the following additional parameters: two different mesh sizes \(\Delta x\), which result in different numbers of degrees of freedom \({\mathcal {N}}\), as shown in Table 1. The numerical FDM results are compared with those of the exact solution (43) and of APBS (which also uses the FDM). The results show that the FDM gives solutions which are consistent with the exact solution, as well as with those of the APBS software package.

Table 1: Comparison of Born ion solvation energies in kJ/mol
Table 2: Comparison of electrostatic solvation free energies \(\Delta E\) between FDM and APBS for different proteins
Fig. 4: Comparison of the maximal error estimator and the true error for the proteins in Table 2, respectively

Secondly, we compare the accuracy of the FDM for the LPBE on the following set of typical LPBE and APBS applications: the total electrostatic energy (including self-interaction energies) of a 22-residue \(\alpha \)-helical peptide from the N protein of phage \(\lambda \), which binds to its cognate 19-nucleotide box B RNA hairpin [55]; fasciculin 1, an anti-acetylcholinesterase toxin from green mamba snake venom [56]; the electrostatic potential of a minimized FKBP protein from binding energy calculations of small ligands [57]; a 180-residue cytokine solution NMR structure of a murine–human chimera of leukemia inhibitory factor (LIF) [58]; and the binding energy of a balanol ligand to the catalytic subunit of the cAMP-dependent protein kinase A, here in the apo form of the enzyme [59]. The proteins and/or complexes have 379, 1228, 1663, 2809, and 3423 atoms, respectively.

The electrostatic solvation free energies \(\Delta E\) are computed and shown in Table 2 for varying grid resolutions \(\Delta x\). Analytical electrostatic energies are not available for these proteins, so we rely on the accuracy of the APBS software for validation. The computations are memory intensive; they are carried out on a compute cluster with 4 Intel Xeon E7-8837 CPUs running at 2.67 GHz (8 cores per CPU) and 1 TB RAM, split into four 256 GB parts (one per CPU), which allows solving large-scale problems with \({\mathcal {N}} \ge 3\times 10^6\).

From Table 2, we can clearly see that the results of the FDM agree well with those of APBS in terms of convergence with respect to mesh refinement. Hence, we conclude that we can reliably test the RBM in conjunction with our FDM solver. We expect no differences if an external FDM solver such as APBS were used instead, although this would require intrusive modifications to that software.

5.3 Accuracy of the RBM

In this section, we evaluate the accuracy of the RBM in approximating the high-fidelity solutions generated by the FDM for the five proteins investigated in Sect. 5.2. For all computations, we consider a cubic domain with 129 grid points in each dimension and a box length of \(60\,\text {\AA }\), centered at the protein position. Figure 4 shows the decay of the error estimator and the true error during the greedy algorithm at the current RB dimension \(i = 1, \ldots , N\). The results corroborate the asymptotic correctness property stated in Sect. 4.5, and it is evident that the error estimator is an upper bound on the true error. We also observe fast convergence of the error estimator, which decreases by up to two orders of magnitude, and the RB space is already rich enough after only six greedy iterations for all five proteins. The error estimators are the maximal error and the relative maximal error, respectively, defined as

$$\begin{aligned} \Delta _N^{\max } = \max \limits _{\mu \in \Xi }\Arrowvert r_N(u_N;\mu )\Arrowvert _2, \end{aligned}$$

and

$$\begin{aligned}{\Delta _N^{\max }}/{\Arrowvert u_N(\mu ^*)\Arrowvert _2},\end{aligned}$$

where \(\mu ^*=\arg \max \limits _{\mu \in \Xi }\Vert r_N (u_N; \mu )\Vert _2.\)
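These quantities can be computed directly from the residual norms over the training set. The following minimal sketch (not the paper's implementation) assumes hypothetical callables `residual(mu)` and `rom_solution(mu)` that return the full-order residual vector \(r_N(u_N;\mu )\) and the RB solution \(u_N(\mu )\), respectively:

```python
import numpy as np

def max_error_estimators(Xi, residual, rom_solution):
    """Return the maximal and relative maximal error estimators over Xi."""
    norms = np.array([np.linalg.norm(residual(mu)) for mu in Xi])
    i_star = int(np.argmax(norms))           # index of the worst-case parameter
    mu_star = Xi[i_star]
    delta_max = norms[i_star]                # maximal error estimator
    delta_rel = delta_max / np.linalg.norm(rom_solution(mu_star))  # relative version
    return delta_max, delta_rel, mu_star
```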

Note that we here set the inf-sup constant \(\sigma _{\min }(A^{{\mathcal {N}}}(\mu ))\) in (40) to unity for all \(\mu \in \Xi \), as in (41), because its actual values are of order \({\mathcal {O}}(10^{-2})\); dividing the residual norm by such small values would overestimate the true error by roughly two orders of magnitude. This makes the plain residual norm in (41) a sharper error estimator than (40) for this specific problem.

Fig. 5 Comparison of error estimator and true error for the final ROM over \(\Xi \subset D\) and for the proteins in Table 2, respectively

Fig. 6 True error \(\Vert u^{{\mathcal {N}}}(\mu )-u_N(\mu )\Vert _2\) for random parameters \(\mu \in D\) for the proteins in Table 2, respectively

In the greedy algorithm, we apply an error tolerance of \(\epsilon = 10^{-3}\) and a training set \(\Xi \) consisting of \(l=11\) parameter samples. From Fig. 4, it is evident that both the error estimator and the true error fall below the prescribed tolerance at the final dimension of the ROM (i.e., \(N = 6\)).

Figure 5 shows the error estimator and the true error of the finally constructed ROM over the training samples \(\mu _i \in \Xi \), \(i = 1, \ldots , 11\), for each protein in Table 2, respectively. It is evident that the error estimator for the final RB approximations of dimension \(N = 6\) is indeed an upper bound on the true error, and the graphs clearly show that both quantities behave similarly. Moreover, the error estimators fall below the greedy tolerance of \(10^{-3}\).

Figure 6 validates the true error in Fig. 5 using 20 random values from the parameter domain D which are different from those in the training set \(\Xi \). A common observation from these figures is that the true errors fall below \({\mathcal {O}}(10^{-4})\), which is approximately an order of magnitude below the error estimator. The computational time required to obtain the approximate solution \(u_N(\mu )\) in the online phase is approximately \(4.97\times 10^{-3}\) s on average, for any parameter \(\mu \in D\).
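The corresponding validation loop is sketched below; `solve_fom` and `solve_rom_lifted` are hypothetical helpers standing in for the full-order FDM solve and the RB solution lifted back to the FDM grid, and the parameter bounds are placeholders for the actual domain D:

```python
import numpy as np

def validate_rom(solve_fom, solve_rom_lifted, mu_min, mu_max, n_samples=20, seed=0):
    """True errors ||u^N(mu) - u_N(mu)||_2 at random test parameters, cf. Fig. 6."""
    rng = np.random.default_rng(seed)
    mu_test = rng.uniform(mu_min, mu_max, size=n_samples)  # random mu in D
    return np.array([
        np.linalg.norm(solve_fom(mu) - solve_rom_lifted(mu)) for mu in mu_test
    ])
```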

Fig. 7 Effectivity indices to demonstrate the quality of the error estimators in Fig. 4, respectively

Fig. 8 Comparison of error estimator and true output error for the Born ion, the protein Fasciculin 1, the 180-residue cytokine solution NMR structure of a murine–human chimera of leukemia inhibitory factor, and the minimized FKBP protein, respectively

A standard measure to determine the efficiency and the quality of the error estimator is the so-called effectivity index [60] given by

$$\begin{aligned} \mathrm {eff} := \frac{\Delta _N(\mu )}{\Vert u^{{\mathcal {N}}}(\mu )-u_N(\mu )\Vert _2}, \end{aligned}$$
(45)

where \(\Delta _N(\mu )\) is the error estimator and \(\Vert u^{{\mathcal {N}}}(\mu )-u_N(\mu )\Vert _2\) is the true error. The effectivity index in (45) is required to be \(\ge 1\) for rigor and as close to unity as possible for sharpness of the error estimator.
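In code, (45) is an elementwise ratio over the test parameters; the short sketch below (illustrative only) also checks the rigor condition \(\mathrm {eff} \ge 1\):

```python
import numpy as np

def effectivity_indices(estimators, true_errors):
    """eff(mu) = Delta_N(mu) / ||u^N(mu) - u_N(mu)||_2, cf. Eq. (45)."""
    eff = np.asarray(estimators, dtype=float) / np.asarray(true_errors, dtype=float)
    # rigor requires eff >= 1; values close to 1 indicate a sharp estimator
    return eff, bool(np.all(eff >= 1.0))
```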

In Fig. 7, we present the effectivity indices to demonstrate the quality and efficiency of the error estimator in Fig. 4. Since they are of order \({\mathcal {O}}(10)\) at the final RB dimension, we can claim that the error estimator is of good quality.

Figure 8 demonstrates the output error estimators for the generalized Born ion and some of the protein molecules introduced in Sect. 5.2. We observe that the dimensions of the ROMs obtained with these output estimators are slightly smaller than those obtained in Fig. 4.

5.4 Runtimes and computational speed-ups

Before we discuss the runtimes of the various phases of the RBM, we clarify some key notions concerning the two phases of the greedy algorithm, i.e., the offline and online phases. The offline phase is subdivided into two parts, the offline-offline phase and the offline-online phase [44]. The offline-offline phase involves the computation of the snapshots and the pre-computation of the parameter-independent quantities. The offline-online phase involves the computation of the error estimator and the RB approximation. The pure online phase, on the other hand, begins once the final ROM has been constructed, i.e., after the prescribed accuracy of the reduced basis has been reached, and is independent of the greedy algorithm. In this phase, the ROM can be solved for any parameter value in the parameter domain, including values outside the training set.
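The distinction between these phases is easiest to see on a toy problem. The following self-contained sketch is not the paper's solver (in particular, it uses an affinely parametrized matrix and omits the DEIM treatment of the nonaffine boundary conditions); it only mimics the offline-offline snapshot computation, the offline-online residual-based selection, and the pure online solve:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500                                    # toy "full-order" dimension
A0 = np.diag(np.arange(1.0, n + 1.0))      # parameter-independent part
A1 = np.eye(n)                             # parameter-dependent part: A(mu) = A0 + mu*A1
f = rng.standard_normal(n)
Xi = np.linspace(0.1, 10.0, 11)            # training set with l = 11 samples

V = np.empty((n, 0))                       # reduced basis
for _ in range(6):                         # 6 greedy iterations, as for the proteins
    # offline-online part: residual-based error estimator over the training set
    res = []
    for mu in Xi:
        A = A0 + mu * A1
        u_lift = V @ np.linalg.solve(V.T @ A @ V, V.T @ f) if V.shape[1] else np.zeros(n)
        res.append(np.linalg.norm(f - A @ u_lift))
    mu_star = Xi[int(np.argmax(res))]
    # offline-offline part: snapshot computation (the dominant cost for the FOM)
    u = np.linalg.solve(A0 + mu_star * A1, f)
    V, _ = np.linalg.qr(np.column_stack([V, u]))

# pure online phase: only a small N x N system is solved for a new parameter value
# (in practice V.T @ A0 @ V and V.T @ A1 @ V are precomputed so that this step is
#  independent of n; they are formed explicitly here only for brevity)
mu_new = 3.7
u_N = np.linalg.solve(V.T @ (A0 + mu_new * A1) @ V, V.T @ f)
print(np.linalg.norm((A0 + mu_new * A1) @ (V @ u_N) - f))  # residual of the ROM solution
```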

Table 3 Runtimes and speed-ups due to DEIM
Table 4 Runtimes and speed-ups for FOM, ROM and RBM

Table 3 shows the runtimes and computational speed-ups obtained with the DEIM approximation during the offline-online phase of the RBM at a single iteration of the greedy algorithm, and with the RBM in solving the linear system. We use a modest PC with the following specifications: Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00 GHz with 8 GB RAM. In this section, the PBE is applied to the protein Fasciculin 1.

Table 4 shows the runtimes of computing the FOM and the ROM at a given parameter value, respectively, together with the runtimes of the different phases of the RBM. Speed-up factors obtained by solving the ROM are listed to highlight the large difference between the FOM and the ROM: the ROM is assembled and solved in a fraction of a second for any parameter value. In the offline phase of the RBM, which comprises the greedy algorithm, the dominating cost is that of solving the linear system of the FOM with AGMG (i.e., computing a snapshot) at every iteration of the greedy algorithm. Miscellaneous here refers to the runtime for initializing the FDM, including assembling the FOM. The total RBM runtime comprises the miscellaneous and offline runtimes.

Table 5 shows the runtimes of APBS and of the RBM for solving the FOM and the ROM at a given parameter value, respectively. The speed-up factor of the RBM w.r.t. APBS is also shown for different numbers of parameter values. It is evident that the RBM is much more efficient than APBS when the system has to be solved for many input parameter values (i.e., in a multi-query context). This is because, once the final ROM has been constructed, we only need to solve a small system of order \(N = 6\), which takes approximately \(9.91\times 10^{-3}\) s per parameter value, whereas APBS solves the FOM, in addition to the initial system setup, for every parameter value.

In a nutshell, solving the LPBE for a single parameter value with APBS takes 22.893 s, because the solver has to reconstruct the linear system each time. This implies that computing the potential for 100 parameter values takes approximately 2289.3 s (neglecting the runtime required to modify the input files), which is more expensive than the total RBM time of 96.12 s. In contrast, the RBM requires only approximately \(9.91\times 10^{-1}\) s to solve the ROM of the LPBE for the same number of parameter values (i.e., 100).
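The arithmetic behind these figures, including the number of queries after which the RBM pays off, can be summarized in a short back-of-the-envelope sketch (all timings taken from the text; we assume the 96.12 s figure covers the setup and offline phases only):

```python
import math

t_apbs = 22.893        # s per parameter value with APBS (linear system rebuilt and solved)
t_rbm_offline = 96.12  # s total RBM setup + offline (greedy) time
t_rom = 9.91e-3        # s per parameter value for the online ROM solve

n = 100
print(n * t_apbs)                  # ~2289.3 s with APBS for 100 queries
print(t_rbm_offline + n * t_rom)   # ~97.1 s with the RBM for the same queries

# smallest number of queries for which the RBM is cheaper than APBS
n_star = math.ceil(t_rbm_offline / (t_apbs - t_rom))
print(n_star)                      # 5
```

Under these assumptions, the offline investment of the RBM is already amortized after roughly five parameter queries.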

The RBM solves the FOM only N times, during the expensive offline phase, as stated in Algorithm 1. Moreover, the RBM utilizes the precomputed system matrices and vectors and only solves the ROM for each new parameter value, thus saving a significant amount of computational cost during the online phase. This efficient implementation of a new mathematical approach to solving the PBE holds great promise for reducing computational costs in multi-query scenarios and molecular dynamics simulations.

Table 5 Runtimes for APBS and RBM

6 Conclusions

In this paper, we have presented a new, computationally efficient approach to solving the LPBE for varying parameter values occurring in biomolecular simulations. The RBM reduces the high-dimensional full order model by a factor of approximately 360,000 and the computational time by a factor of approximately 7600. The error estimator provides fast convergence of the reduced basis approximation to an accuracy of \({\mathcal {O}}(10^{-3})\). The true error between the RBM and the FDM solutions is smaller than \({\mathcal {O}}(10^{-4})\) for all the parameter samples tested. DEIM provides a speed-up factor of 20 in the online phase by reducing the complexity of the nonaffine Dirichlet boundary conditions; this is achieved by selecting only a few entries of a high-dimensional vector which carry the most important information. Therefore, the RBM can be extremely beneficial in cases where simulations of the PBE for many input parameter values are required. The method can also be implemented in available PBE solvers, for example APBS, after a few adjustments regarding the parametrization of the linear system. Our future research focuses on two aspects. Firstly, we plan to develop a more efficient error estimator which is more rigorous than merely taking the norm of the residual. Secondly, we aim to develop a modified version of the LPBE which treats the PBE as an interface problem by applying a range-separated tensor format [36]. This is expected to reduce the computational complexity of current PBE approaches and to provide more accurate results due to the more realistic model.