1 Introduction

Fluid flow through porous media is a complicated phenomenon due to the structural complexity inherent to all porous materials. The difficult problem of directly modelling flow, transport and chemical activity at the pore scale is typically avoided via upscaling techniques such as volume and/or ensemble averaging, homogenisation or stochastic methods (Mauri 1991; Whitaker 1998; Barbu et al. 2016). Many of these upscaling techniques involve the derivation of macroscopic continuum equations via closures which involve effective parameters that encapsulate the complexities of the pore geometry. Derivation of these equations, however, often depends on physical assumptions only valid in a certain parameter regime. In addition, the effective parameters are difficult to determine and often oversimplify the pore-scale behaviour; the averaging procedure discards important pore-scale fluctuations. Typical approaches to empirically estimate these complicated, nonlinear effective parameters often depend only on the porosity of the medium (Carman 1937; Nakasaki et al. 1987).

The problems associated with determining and using effective parameters provide motivation for a dual-scale formulation, in which the pore-scale flow, transport and/or chemical activity is explicitly modelled over many realisations conditional upon the macroscopic medium properties. The macroscale problem is solved using the control volume finite element method (CVFEM) (Voller 2009), with each value of the flux—the quantity that is fundamental to CVFEM—provided by a dedicated pore-scale simulation. Instead of calculating the flux by applying an empirical relation that discards important pore-scale information, a spatially-periodic, steady-state pore-scale problem is solved to yield the flux, maintaining the maximum feasible resolution of the microstructure (Alyaev et al. 2014). The spatial periodicity of the pore-scale problem results from the simplifying assumption that the medium is locally periodic—the porous structure varies slowly enough across the macroscale domain that it can be represented as periodic at any given macroscale point (Abdulle and Budáč 2015). Such a medium is illustrated in Fig. 1.

Fig. 1
figure 1

A locally-periodic porous medium. The porous microstructure changes across the macroscale domain slowly enough that, when looking at any given point, the structure appears periodic

For the dual-scale model to be viable, the pore-scale solver is required to achieve a reasonable level of accuracy in an extremely short computational time, due to the vast number of pore-scale problems to be solved during a dual-scale simulation. Focusing on pressure-driven, two-dimensional incompressible flow, this research aims to investigate whether pore-scale solvers based on the lattice Boltzmann method (LBM) (Chen and Doolen 1998) or the boundary element method (BEM) (Cheng and Cheng 2005) satisfy the demands of such a dual-scale model.

There is ample literature on the modelling of pore-scale fluid flow with LBM and BEM individually (Chella et al. 1998; Bernsdorf et al. 2000; Bai et al. 2009; Gao et al. 2008). Studies by Pereira et al. (2012) and Yang et al. (2016) compared LBM with smoothed particle hydrodynamics and the finite volume method, for realistic 3D porous media, and found LBM to be the most efficient of the methods. However, to the best of our knowledge, no study has performed a comparison between LBM and BEM in any context, let alone in the context of 2D pore-scale flow or dual-scale flow.

The objective of this study is to determine which of BEM, the single-relaxation-time LBM (SRT), and the two-relaxation-time LBM (TRT) performs best on each of several pore-scale benchmark problems. These problems are formulated to gauge suitability for a dual-scale model, based on efficiency and accuracy, and the porosity, shape and complexity of the porous structure are varied between problems. The benchmark problems include both Stokes flow, in which viscous forces dominate inertial forces, and visco-inertial flow, in which viscous and inertial forces are of comparable magnitude. We seek to identify the limitations of the methods, and we especially seek to verify the competitiveness of our recently-developed 2D2P BEM formulation (Hassard et al. 2018). Finally, we have formulated a dual-scale model that couples an SRT, TRT or BEM pore-scale solver with a CVFEM macroscale solver to simulate either Stokes flow or visco-inertial flow through a locally-periodic porous medium. We aim to confirm the applicability of our pore-scale findings to this dual-scale model by exhibiting some preliminary dual-scale results for Stokes flow.

In § 2, we provide some general background regarding the problems and the methods, and discuss the methodology of the dual-scale model. In § 3, we provide background and implementation details for LBM. We do the same for BEM in § 4 and CVFEM in § 5. In § 6, we run a number of simulations on several test problems in order to compare the pore-scale methods, and discuss the implications. In § 7, we show some preliminary results that demonstrate how these pore-scale results translate to a dual-scale model of Stokes flow.

2 Background and Methodology

Pore-scale fluid flow is governed by the incompressible Navier–Stokes equations, which can be expressed in nondimensional steady-state form as follows (Doering and Gibbon 1995):

$$\begin{aligned}{} & {} {\varvec{\nabla }}\cdot {\textbf {u}} = 0, \end{aligned}$$
(1a)
$$\begin{aligned}{} & {} {{\,\textrm{Re}\,}}\, {\textbf {u}} \cdot {\varvec{\nabla }}{\textbf {u}} = -{\varvec{\nabla }}P + \nabla ^2 {\textbf {u}} + {\textbf {F}}, \end{aligned}$$
(1b)

where \({\textbf {u}}\) is the fluid velocity, P is the kinematic pressure, and \({\textbf {F}}\) is a uniform body force, representing the mean pressure gradient. The Reynolds number \({{\,\textrm{Re}\,}}= U L / \nu _\text {K}\) is the ratio of inertial forces to viscous forces, where \(\nu _\text {K}\) is the kinematic viscosity, and U and L are characteristic velocity and length scales.

As discussed by Dybbs and Edwards (1984), the range of Reynolds numbers can be partitioned into a number of distinct flow regimes, from Stokes flow (\({{\,\textrm{Re}\,}}\ll 1\)) to turbulent flow (\({{\,\textrm{Re}\,}}\gtrsim 300\)). However, most porous media applications involve flows with \({{\,\textrm{Re}\,}}\lesssim 10\), and the bulk of research has been on such flows. In this regime, viscous forces dominate, while inertial forces are quadratic in the Reynolds number and range from negligible to small (Hill et al. 2001). We further divide this regime into two cases, which we use to illustrate the strengths and weaknesses of the candidate pore-scale solvers: Stokes flow at small \({{\,\textrm{Re}\,}}\) (approximately \(10^{-3}-10^{-2}\)), and visco-inertial flow at moderate \({{\,\textrm{Re}\,}}\) (approximately 1–10). We use the term visco-inertial for moderate \({{\,\textrm{Re}\,}}\) since the viscous and inertial forces are of similar significance near \({{\,\textrm{Re}\,}}= 1\). The precise Reynolds number for each visco-inertial test case is given in Table 1.

In the Stokes limit \({{\,\textrm{Re}\,}}\ll 1\), the inertial forces are negligible in comparison to the viscous forces, and so the Navier–Stokes equations (1) simplify to the Stokes equations, given in nondimensional steady-state form as follows:

$$\begin{aligned}{} & {} {\varvec{\nabla }}\cdot {\textbf {u}} = 0, \end{aligned}$$
(2a)
$$\begin{aligned}{} & {} {\varvec{\nabla }}P = \nabla ^2 {\textbf {u}} + {\textbf {F}}. \end{aligned}$$
(2b)

The technique of volume averaging can be applied to these equations to derive Darcy’s law, a commonly-used closure (Whitaker 1986):

$$\begin{aligned} {\textbf {v}} = {\textbf {K}} \cdot {\textbf {F}}, \end{aligned}$$
(3)

where the effective parameter \({\textbf {K}}\) is known as the permeability tensor, \({\textbf {F}}\) is the pore-scale body force, and \({\textbf {v}}\) is interpreted at the pore scale as the volume-averaged fluid velocity, and at the macroscale as the flux. A pore-scale solver can be used to compute \({\textbf {K}}\) in the Stokes regime, or in general it can provide \({\textbf {v}}\) directly (as the spatial mean of \({\textbf {u}}\)), without the need to unnecessarily restrict \({{\,\textrm{Re}\,}}\). This provides our primary motivation to apply a dual-scale model in this context.

In the dual-scale model, the macroscale problem is the conservation of mass in a steady, incompressible fluid without mass generation or annihilation (i.e. without a source term). The governing equation is (Alyaev et al. 2012)

$$\begin{aligned} {\varvec{\nabla }}\cdot {\textbf {v}} = 0, \end{aligned}$$
(4)

which is to be solved using CVFEM. Throughout the CVFEM simulation, the flux \({\textbf {v}}\) at each control volume face must be calculated for many different values of the pore-scale force \({\textbf {F}}\). This necessitates a large number of pore-scale simulations during a dual-scale simulation, and so each pore-scale simulation must incur an extremely short runtime for the dual-scale simulation to be viable.

We illustrate the proposed dual-scale method in Fig. 2, exhibiting a typical macroscale mesh, and typical pore-scale visco-inertial and Stokes flow problems. For visco-inertial flow, the nonlinearity of the Navier–Stokes equations (1) means that a given flux \({\textbf {v}}\) must be recalculated (using a pore-scale simulation) whenever \({\textbf {F}}\) changes, and so \({\textbf {F}}\) and \({\textbf {v}}\) must be exchanged between the scales many times. In contrast, the linearity of the Stokes equations (2) means that for Stokes flow, \({\textbf {v}}\) can be computed directly from Darcy’s law (3) if the permeability \({\textbf {K}}\) is known. In this case, only a one-way transfer of information is necessary, with \({\textbf {K}}\) calculated once for each pore-scale domain before the macroscale simulation.

A dual-scale method as described above has been proposed by Alyaev et al. (2014), using CVFEM at the macroscale and the finite element method at the pore scale. However, the requirement for a mesh in the finite element method can make it too computationally intensive for complex pore-scale geometries, especially in the context of a dual-scale model. This motivates us to seek an alternative pore-scale solver for our dual-scale model, and to this end we investigate and compare two pore-scale solvers: the lattice Boltzmann method (LBM) and the boundary element method (BEM).

Fig. 2
figure 2

Schematic of the dual-scale method: macroscale mesh (left); neighbourhood of a single macroscale element (centre); typical pore-scale unit cells for Stokes flow (bottom) and visco-inertial flow (top), with representations of the corresponding information exchange between scales (right). Highlighted in the centre are: a typical element (red lines), the neighbouring elements (black lines), the element’s nodes (green points), their associated control volumes (blue dashed lines), the midpoints of the control volume faces (white points), and the element centre (black point)

LBM originated as a statistically-averaged variant of the lattice gas automaton (McNamara and Zanetti 1988; Higuera and Jiménez 1989), and retains much of the simplicity of its predecessor. The fluid is represented by a number of fixed-velocity particle distributions, which travel between nodes, reverse their direction at wall boundaries, and interact via relaxation towards an equilibrium. While the Navier–Stokes equations (1) are not solved directly, it can be shown that this procedure generates behaviour consistent with them (Krüger et al. 2017).

The standard BEM can only be used in the Stokes limit \({{\,\textrm{Re}\,}}\ll 1\), where the linear Stokes equations (2) can be rewritten as an integral over the grain boundaries. This reduces the domain by one dimension, so that only the boundary is discretised into elements. A dense matrix of integrals over these boundary elements is inverted to give the permeability and the pressure distribution on the boundary, which can then be used to determine the velocity at any point in the fluid domain. Nonlinear variants of BEM, which can simulate steady visco-inertial flow, have been proposed (Bush and Tanner 1983; Florez and Power 2001). However, we neglect them as they involve iteration, nonlinear data fitting and a large number of complicated boundary integrals involving a large number of Green’s functions. Furthermore, the complexity involved in applying these variants is exacerbated by the periodicity of our problem.

Fig. 3
figure 3

The periodic media used for the pore-scale test problems. The dashed lines represent the boundaries of the unit cells, which have a side length of L. Left: medium A, containing a circular cylinder of radius R; centre: medium B, containing a square prism with side length S; right: medium C, containing a collection of cylinders of various radii (Alyaev et al. 2014). Medium C is shown with the eigenvectors of the permeability tensor, scaled by their respective eigenvalues. The principal directions of flow (i.e. the directions of these eigenvectors) are approximately \({4.7}^{\circ }\) from the coordinate axes

To provide test problems for our comparison of LBM and BEM, we introduce three periodic pore-scale media with \(L \times L\) unit cells (see Fig. 3). Medium A contains a circular cylinder of radius R, which is chosen to be either L/10, L/5 or L/3. Medium B contains a square prism with side length \(S = L \sqrt{\pi } / 5\), so that the porosity is identical to that of medium A with \(R = L/5\). Medium C, based on a medium of Alyaev et al. (2014), contains a collection of 44 cylinders of various radii.

In order to quantitatively compare the performance of the pore-scale solvers, we measure two quantities for each simulation: the computational runtime T, and the error \(\varepsilon \) with respect to a benchmark solution. The succinct representation of each entire simulation by only two numbers allows us to compare many simulations at once. For the runtime, we maintain consistency by running all LBM and BEM simulations using the same software (MATLAB R2017a) and hardware (the high performance computing facilities at the Queensland University of Technology). For simulations with significant variance in runtime, we average the runtime over a large number of trials.

For our error metric \(\varepsilon \), we use the relative error of the mean velocity \({\textbf {v}}\):

$$\begin{aligned} \varepsilon = \frac{\Vert {\textbf {v}} - {\textbf {v}}^\text {bench} \Vert }{\Vert {\textbf {v}}^\text {bench} \Vert }, \end{aligned}$$
(5)

where \({\textbf {v}}^\text {bench}\) is the mean velocity of the benchmark solution. For Stokes flow through medium A, we use an analytic solution as a benchmark (Sangani and Acrivos 1982). For all visco-inertial flow problems, we use a fine-grid, finite volume simulation from ANSYS Fluent 18.0 as a benchmark. For Stokes flow through media B and C, we do not have an analytic solution, but due to the difficulties associated with generating a solution of sufficient precision with Fluent, we instead use the highest-resolution BEM solution as a benchmark. The results here should carry over to the dual-scale model, where the error can be decomposed so that the contribution from the pore-scale solver is simply a constant multiple of the error at the pore scale (Alyaev et al. 2014).

3 Lattice Boltzmann Method

The lattice Boltzmann method (LBM) was developed in the late 1980’s (McNamara and Zanetti 1988; Higuera and Jiménez 1989), motivated as a statistically-averaged version of a lattice gas automaton (LGA). An LGA is a discrete model of individual particles that move according to a grid of nodes and interact via collisions. These collisions are programmed such that the average of a large ensemble of LGA simulations will exhibit behaviour consistent with the Navier–Stokes equations (1). By replacing the individual particles with probability distributions, LBM eliminates the need for an ensemble of simulations, and will agree with the Navier–Stokes equations in the continuum limit and the limit of low Mach number (Krüger et al. 2017).

The main attraction of LBM is that velocity is discretised along with space and time, thereby trivialising advection. The algorithm is straightforward to implement, and it is split into two complementary phases: a parallelisable calculation step and an error-free advection step. As such, LBM is ideal for a quick prototype model.

The simplicity of the method is also the source of some disadvantages. To allow exact advection, a uniform lattice must be used (in our case, a square grid), preventing localised refinement at e.g. complex boundaries. Since the governing Navier–Stokes equations are not directly solved, modifications to these equations are not as straightforward as in other numerical methods.

Here, we give a brief introduction to LBM, then discuss issues of convergence, stability and optimisation. For more background information, the interested reader is referred to the review article by Chen and Doolen (1998) and the book by Krüger et al. (2017) for a detailed account of the history, development, implementation and other applications of the method.

3.1 Nine-Velocity Model

We begin with the two-dimensional, nine-velocity model D2Q9 (Qian et al. 1992). Fluid particles are restricted to the nodes of an \(N \times N\) square grid, and the only allowed nonzero velocities are those that cover the distance between neighbouring nodes in exactly one time-step. A scaling is applied to set the grid spacing and time-step to 1, so that all spatial and temporal coordinates are integers. The set of nine allowed velocities includes a rest velocity \({\textbf {e}}_0 = (0,0)\), and nonzero velocities in the four orthogonal directions and the four diagonal directions:

$$\begin{aligned} {\textbf {e}}_1 = -{\textbf {e}}_3 = (1,0),{} & {} {\textbf {e}}_2 = -{\textbf {e}}_4 = (0,1),{} & {} {\textbf {e}}_5 = -{\textbf {e}}_7 = (1,1),{} & {} {\textbf {e}}_6 = -{\textbf {e}}_8 = (-1,1), \end{aligned}$$
(6)

where the velocities \({\textbf {e}}_\alpha \) are parameterised by the integer \(\alpha \), which we refer to as the direction.

At each node there are densities \(f_\alpha \) of particles travelling in each direction \(\alpha \), known as distributions. The evolution of these distributions is described by the lattice Boltzmann equation (LBE) (Krüger et al. 2017):

$$\begin{aligned} f_\alpha ({\textbf {x}} + {\textbf {e}}_\alpha , t + 1) = f_\alpha ({\textbf {x}}, t) + \varOmega _\alpha ({\textbf {f}}({\textbf {x}}, t)) + S_\alpha , \end{aligned}$$
(7)

where the collision operator \(\varOmega _\alpha \) accounts for the aggregate effect of particle collisions, the force term \(S_\alpha \) accounts for the body force \({\textbf {F}}\), and \({\textbf {f}}\) is the vector containing each \(f_\alpha \). These distributions \(f_\alpha \) are used to calculate the fluid density \(\rho \) and the fluid velocity \({\textbf {u}}\). For incompressible flow with the reference density \(\rho _0\) set to 1—in which the velocity \({\textbf {u}}\) and momentum density \(\rho _0 {\textbf {u}}\) are identical—the density and velocity are given by \(\rho = \sum _\alpha f_\alpha \) and \({\textbf {u}} = \sum _\alpha f_\alpha {\textbf {e}}_\alpha \).

3.2 Collision Operator

Kinetic theory states that the aggregate effect of microscopic collisions is to relax the local velocity distribution towards equilibrium (Wolf-Gladrow 2000). In LBM, the local velocity distribution \({\textbf {f}}\) is relaxed by collision towards the equilibrium distribution \({\textbf {f}}^\text {eq}\), which is based on the Maxwell–Boltzmann distribution. For incompressible flow with reference density \(\rho _0 = 1\), the equilibrium distribution is written as follows (Pan et al. 2006):

$$\begin{aligned} \text {Stokes flow}&\text {:}\qquad f_\alpha ^\text {eq} = w_\alpha \, \Big ( \rho + 3 {\textbf {e}}_\alpha \cdot {\textbf {u}} \Big ), \end{aligned}$$
(8a)
$$\begin{aligned} \text {Visco-inertial flow}&\text {:}\qquad f_\alpha ^\text {eq} = w_\alpha \bigg ( \rho + 3 {\textbf {e}}_\alpha \cdot {\textbf {u}} + \frac{9}{2} \left( {\textbf {e}}_\alpha \cdot {\textbf {u}} \right) ^2 - \frac{3}{2} \Vert {\textbf {u}} \Vert ^2 \bigg ), \end{aligned}$$
(8b)

where the \(w_\alpha \) are constant weights: \(w_0 = 4/9\), \(w_{1,2,3,4} = 1/9\) and \(w_{5,6,7,8} = 1/36\). These weights are also used to compute the force term \(S_\alpha = 3 w_\alpha {\textbf {e}}_\alpha \cdot {\textbf {F}}\) (He et al. 1997).

We use two different collision operators in this work, and treat the corresponding lattice Boltzmann models as separate methods. The first operator is the single-relaxation-time (SRT) collision operator, also known as the Bhatnagar–Gross–Krook operator (Bhatnagar et al. 1954):

$$\begin{aligned} \varOmega _\alpha = -\omega {\varPhi }_\alpha , \end{aligned}$$
(9)

where \({\varPhi }_\alpha = f_\alpha - f_\alpha ^\text {eq}\) is the nonequilibrium distribution, and \(\omega \in (0,2)\) is the relaxation rate. In this study, we set \(\omega = 1\), since such a choice provides several benefits with respect to simplicity, accuracy and stability (Chen et al. 1996; Ginzburg et al. 2010). In particular, the right-hand side of (7) can simply be replaced with \(f_\alpha ^\text {eq} + S_\alpha \). A disadvantage of setting \(\omega = 1\) is that the error will eventually approach some finite value as N increases. However, we find that our values of N are generally not large enough to encounter this problem.

The second operator we use is the two-relaxation-time (TRT) collision operator, which allows more control over the simulation while retaining the simplicity of the SRT operator (Krüger et al. 2017):

$$\begin{aligned} \varOmega _\alpha = -\omega ^+ {\varPhi }^+_\alpha - \omega ^- {\varPhi }^-_\alpha , \end{aligned}$$
(10)

where \(\omega ^\pm \) are the even and odd relaxation rates, \({\varPhi }^\pm _\alpha = ({\varPhi }_\alpha \pm {\varPhi }_{-\alpha })/2\) are the even and odd components of the nonequilibrium distribution \({\varPhi }_\alpha \), and the direction \(-\alpha \) is defined as the reverse of direction \(\alpha \). While the relaxation rates may both be arbitrarily chosen from the open interval (0, 2), there are certain combinations of the relaxation rates that result in desirable behaviour. The so-called magic number

$$\begin{aligned} \varLambda = \left( \frac{1}{\omega ^+} - \frac{1}{2} \right) \left( \frac{1}{\omega ^-} - \frac{1}{2} \right) , \end{aligned}$$
(11)

can be fixed at one of several special values, each providing benefits for accuracy and/or stability (Krüger et al. 2017). In this study, we set \(\varLambda = 1/12\), which eliminates third-order spatial error. We treat \(\omega ^+\) as our free parameter, with \(\omega ^-\) calculated from (11). In the following, we often simply use \(\omega \) to refer to \(\omega \) in SRT and \(\omega ^+\) in TRT.

3.3 Time Evolution

A time-step in LBM is split into collision and streaming phases. The streaming phase is given by

$$\begin{aligned} f_\alpha ({\textbf {x}} + {\textbf {e}}_\alpha , t + 1) = f_\alpha \left( {\textbf {x}}, t + \tfrac{1}{2}\right) , \end{aligned}$$
(12)

where \(t + \frac{1}{2}\) is metaphorical, referring to a distribution after collision but before streaming. The SRT and TRT collision operators yield the following collision phases:

$$\begin{aligned} \text {SRT:}\qquad&f_\alpha \left( {\textbf {x}}, t + \tfrac{1}{2}\right) = f_\alpha ^\text {eq} + 3 w_\alpha {\textbf {e}}_\alpha \cdot {\textbf {F}}, \end{aligned}$$
(13a)
$$\begin{aligned} \text {TRT:}\qquad&f_\alpha \left( {\textbf {x}}, t + \tfrac{1}{2}\right) = f_\alpha - \omega ^+ {\varPhi }^+_\alpha - \omega ^- {\varPhi }^-_\alpha + 3 w_\alpha {\textbf {e}}_\alpha \cdot {\textbf {F}}, \end{aligned}$$
(13b)

where evaluation at \(({\textbf {x}}, t)\) is implied on the right-hand side. This separation results in two advantages: the collision phase involves only local computations (as the only coordinate to appear is \({\textbf {x}}\)), and the streaming phase requires only memory reallocation (which introduces no error).

Together, these phases can be shown to approximately solve the incompressible Navier–Stokes equations (1) in lattice units when the equilibrium distribution (8b) is used, or the Stokes equations (2) in lattice units when the equilibrium distribution (8a) is used (Chen and Doolen 1998; Pan et al. 2006). The associated viscosity, known as the lattice viscosity, is \(\nu = (2 - \omega ) / 6 \omega \) in both cases. We note that the lattice viscosity \(\nu \) is a dimensionless numerical parameter related to the time-step, and is independent of the (dimensional) kinematic viscosity \(\nu _\text {K}\). Since \(\omega \) is fixed in SRT and free in TRT, the lattice viscosity and therefore the time-step are also fixed in SRT and free in TRT.

Steady-state flows can be modelled in LBM by running a transient simulation until some time \(t_\text {f}\) at which it has satisfactorily converged (Krüger et al. 2017). If the simulation is viewed as an iterative solver, the time-steps are viewed as iterations, and we refer to them as such hereafter. To determine whether the mean velocity \({\textbf {v}}(t)\) has converged, we check the following criterion every 100 iterations:

$$\begin{aligned} \max _{0< \psi \le \phi } \frac{\left\| {\textbf {v}}(t) - {\textbf {v}}(t - \psi t) \right\| }{\left\| {\textbf {v}}(t - \psi t) \right\| } < \theta , \end{aligned}$$
(14)

where \(\phi = 0.3\) is a fixed proportion of velocity history and \(\theta \) is a relative tolerance. This criterion ensures that there are no significant fluctuations in the recent velocity history. In an idealised scenario in which the mean velocity converges exponentially towards the true steady-state value \({\textbf {v}}_0\) (approximately the behaviour observed in LBM), the mean velocity is \({{\textbf {v}}^*(t) = ( 1 - e^{-k t} ) \, {\textbf {v}}_0}\) for some positive constant k. The relative error \(\varepsilon ^*(t)\) is \(e^{-k t}\) and the criterion (14) is satisfied at time \(t_\text {f}^*\), when \(( \varepsilon ^*(t_\text {f}^*)^{1 - \phi } - \varepsilon ^*(t_\text {f}^*) ) / ( 1 - \varepsilon ^*(t_\text {f}^*)^{1 - \phi } ) = \theta \). Given \(\phi \) and \(\theta \), this last equation can be solved numerically for \(\varepsilon ^*(t_\text {f}^*)\), the relative error remaining at termination in the idealised scenario. In an LBM simulation with the same values of \(\phi \) and \(\theta \), \(\varepsilon ^*(t_\text {f}^*)\) can be interpreted as the approximate precision of \({\textbf {v}}(t_\text {f})\), the mean velocity at termination.

We conducted preliminary experiments to examine the relationship between the tolerance \(\theta \) and the precision of \({\textbf {v}}(t_\text {f})\), and to determine how close this precision is to the predicted value \(\varepsilon ^*(t_\text {f}^*)\). For SRT, the precision matched the predicted value across a range of tolerances, resolutions and media. We therefore use the tolerance \(\theta = 10^{-3}\) for SRT and take the predicted value \(5.59\times 10^{-5}\) to be the precision of \({\textbf {v}}(t_\text {f})\). For TRT, the precision fluctuated unpredictably as the tolerance changed, peaking at approximately 10 times the predicted value. To correct for this, we simply aim for a predicted precision 10 times smaller than we need. Therefore, to match the precision of SRT in §§ 6 and 7, we use the tolerance \(\theta = 2\times 10^{-4}\) for TRT. However, in the preliminary results of §§ 3.5 and 3.6, we use the tolerance \(\theta = 1.1\times 10^{-11}\) for TRT, which gives a predicted precision of \(2.21\times 10^{-16}\).

We use an initial condition at rest and at the reference density \(\rho _0 = 1\). Instability can arise from an initial condition with large nonequilibrium components \({\varPhi }_\alpha \), due to the risk of distributions \(f_\alpha \) relaxing to (unphysical) negative values. Therefore, we simply initialise the distributions at equilibrium: \(f_\alpha ({\textbf {x}}, 0) = w_\alpha \).

3.4 Boundary Conditions

Periodic boundary conditions are implemented in LBM by adjusting the streaming phase for nodes adjacent to the periodic boundary. A boundary-crossing distribution (i.e. a distribution that would cross the boundary if streamed normally) is simply streamed to the corresponding node on the opposite side of the domain.

To implement a wall boundary (i.e. a no-penetration, no-slip boundary with \({\textbf {u}} = {\textbf {0}}\)), we use the Bouzidi–Firdaouss–Lallemand (BFL) condition for SRT, and the multi-reflection (MR) condition for TRT. For a boundary-crossing distribution \(\beta \) at node \({\textbf {x}}_b\), the BFL condition can be written as (Pan et al. 2006):

$$\begin{aligned} f_{-\beta }({\textbf {x}}_b, t + 1) = {\left\{ \begin{array}{ll} 2q f_\beta \left( {\textbf {x}}_b, t + \tfrac{1}{2}\right) + (1 - 2q) f_\beta ({\textbf {x}}_b, t + 1), &{} q \le \frac{1}{2},\\ \dfrac{1}{2q} f_\beta \left( {\textbf {x}}_b, t + \tfrac{1}{2}\right) + \left( 1 - \dfrac{1}{2q} \right) f_{-\beta }\left( {\textbf {x}}_b, t + \tfrac{1}{2}\right) , &{} q > \frac{1}{2}, \end{array}\right. } \end{aligned}$$
(15)

where \(q \in [0,1)\) is the distance to the boundary, which occurs at \({\textbf {x}}_b + q {\textbf {e}}_\beta \).

The MR condition is similar to the BFL condition, but includes a correction term to ensure third-order accuracy (Pan et al. 2006). For a boundary-crossing distribution \(\beta \) at node \({\textbf {x}}_b\), the MR condition can be written succinctly as (Ginzburg and d’Humières 2003):

$$\begin{aligned} \varDelta _{-\beta }({\textbf {x}}_b, t) = \kappa _1 \varDelta _\beta ({\textbf {x}}_b, t) + \kappa _2 \varDelta _\beta ({\textbf {x}}_b - {\textbf {e}}_\beta , t) + \kappa _3 \omega ^- {\varPhi }^-_\beta ({\textbf {x}}_b, t), \end{aligned}$$
(16)

where the following definitions hold:

$$\begin{aligned} \kappa _1&= \frac{1 - 2q - 2q^2}{(1 + q)^2},&\kappa _2&= \frac{q^2}{(1 + q)^2},&\kappa _3&= \frac{4 \varLambda }{3 \nu (1 + q)^2}, \end{aligned}$$
(17a)
$$\begin{aligned} \varDelta _\beta ({\textbf {x}}, t)&= f_\beta ({\textbf {x}}, t + 1) - f_{-\beta }\left( {\textbf {x}}, t + \tfrac{1}{2}\right) , \end{aligned}$$
(17b)

with \(\varLambda \) as the magic number (11), and \(\nu \) as the lattice viscosity. Given \(\varDelta _{-\beta }({\textbf {x}}_b, t)\) from (16), the unknown distribution \(f_{-\beta }({\textbf {x}}_b, t + 1)\) is computed from (17b) as \({\varDelta _{-\beta }({\textbf {x}}_b, t) + f_\beta ({\textbf {x}}_b, t + \tfrac{1}{2})}\).

3.5 Stability

LBM simulations become unstable in both the over-relaxation limit \(\omega \rightarrow 2\) (equivalently, \(\nu \rightarrow 0\)) and the under-relaxation limit \(\omega \rightarrow 0\) (equivalently, \(\nu \rightarrow \infty \)) (Krüger et al. 2017). The simulations in this study mostly have \(\omega \le 1\), and so we only discuss instability in this regime. For an LBM simulation to be stable in the under-relaxation regime, \(V \equiv \max _t \Vert {\textbf {v}}(t) \Vert \) must not exceed a certain critical speed \(V_\text {c}\), which depends on a number of parameters (Chen and Doolen 1998). For a given problem with \({{\,\textrm{Re}\,}}\) and N fixed, the characteristic lattice velocity \(U_\text {L} = \nu {{\,\textrm{Re}\,}}/ N\) (i.e. the magnitude of the true mean velocity in lattice units) scales with lattice viscosity, while the critical speed is constant throughout the under-relaxation regime (Krüger et al. 2017). Therefore, as lattice viscosity grows, supercritical velocity \(V > V_\text {c}\) (and therefore instability) becomes more likely.

Approximating V by the characteristic lattice velocity \(U_\text {L}\), we write the stability condition as

$$\begin{aligned} \frac{\nu {{\,\textrm{Re}\,}}}{N} < V_\text {c}. \end{aligned}$$
(18)

In SRT, the lattice viscosity is fixed at \(\nu = 1/6\) (since the relaxation rate is fixed at \({\omega = 1}\)), and so (18) can be rearranged to give the minimum number of nodes as \(N_\text {c} \propto {{\,\textrm{Re}\,}}\). In TRT, the lattice viscosity is free and so (18) can be rearranged to give the maximum lattice viscosity as \(\nu _\text {c} \propto N / {{\,\textrm{Re}\,}}\). An increase in \(N_\text {c}\) for SRT or a decrease in \(\nu _\text {c}\) for TRT will increase the minimum stable runtime, thereby limiting our ability to control the runtime through parameter selection.

Fig. 4
figure 4

Error of SRT (blue) and TRT (red) against grid resolution N, for medium A with \(R = L/5\). The left and right plots show the same data with narrow and wide windows respectively

While not strictly related to stability, it is noted by Ginzburg and d’Humières (2003) that the error exhibits highly non-monotonic behaviour as N increases, but the reason for this still appears to be an open question in the literature. This is demonstrated in Fig. 4, for SRT with the BFL boundary condition and TRT with the MR boundary condition. The left plot shows the oscillations for consecutive values of N, while the right plot shows the general downward trend. While the oscillations continually decay for TRT, they appear to persist for SRT.

Fig. 5
figure 5

Number of iterations k against lattice viscosity \(\nu \), using TRT. Left: medium A with \(R = L/10\) and resolutions \(N = 27 / 45 / 71\) (blue/red/green). Right: medium C with resolutions \(N = 177 / 281 / 445\). Solid lines: Stokes flow; dashed lines: visco-inertial flow

3.6 Optimal Viscosity for TRT

While the lattice viscosity \(\nu \) is fixed in SRT, it is free to be chosen in TRT. With the MR boundary condition and a fixed value of the magic number \(\varLambda \), the mean velocity given by TRT is independent of lattice viscosity (Ginzburg and d’Humières 2003).Footnote 1 The only consideration when choosing the lattice viscosity is therefore the number of iterations k required to reach steady state. The number of iterations governs the runtime, and so we attempt to find the lattice viscosity \(\nu ^*\) that minimises it.

Figure 5 shows the dependence of k on \(\nu \) for Stokes and visco-inertial flows through two different periodic media, using several resolutions. We note that k typically decreases until its minimum and increases thereafter. We term these two regimes small-\(\nu \) and large-\(\nu \). The minima for medium C (the collection of cylinders) are straightforward to locate, as \(k(\nu )\) is monotonic in each of the regimes. However, the minima for medium A (the single cylinder) are more difficult to locate: for Stokes flow, \(k(\nu )\) is non-monotonic near its minimum; for visco-inertial flow, \(\nu _\text {c}\) is in the small-\(\nu \) regime and so \(\nu ^*= \nu _\text {c}\).

Fig. 6
figure 6

Number of iterations k required for the heuristic lattice viscosity \(\nu ^\text {H}\) (solid lines) and the optimal lattice viscosity \(\nu ^*\) (dashed lines), using TRT. Blue: medium A with \(R = L/10\); red: medium C. Left: Stokes flow; right: visco-inertial flow

The determination of \(\nu ^*\) for a given medium and resolution N requires several TRT simulations, which is not a viable strategy in general. A heuristic is thus required. It can be seen in the left plot of Fig. 5 that the optimal lattice viscosity \(\nu ^*\) increases steadily with increasing N. While the assumption \(\nu ^*\propto N\) is not necessarily true (see e.g. the right plot of Fig. 5), it nevertheless leads to an effective heuristic. For each medium, this heuristic involves the determination of the optimal lattice viscosity \(\nu ^*_0\) for a benchmark resolution \(N_0\) (which requires several TRT simulations), and the definition of the heuristic lattice viscosity as \(\nu ^\text {H} = N \nu ^*_0 / N_0\). Even with a relatively small \(N_0\) (ensuring quick simulations) and a loose tolerance when minimising k (thereby keeping the number of simulations low), this heuristic is effective over a large range of N. As can be seen in Fig. 6, the number of iterations incurred by the heuristic is typically only slightly more than the optimum, and the discrepancy generally diminishes as N increases.

We note that the dependence of k on \(\nu \) has been previously discussed in the literature (Ginzburg and d’Humières 2003; Pan et al. 2006; Talon et al. 2012). These studies almost exclusively exhibit monotonic behaviour, but disagree on the direction. As \(\nu \) increases, Ginzburg and d’Humières (2003) and Pan et al. (2006) report decreasing k, while Talon et al. (2012) show increasing k in one case and the existence of a local minimum (at a much lower \(\nu ^*\) than we have observed) in others. In contrast, we have found local minima for every test case we have studied (other than visco-inertial flows for which \(\nu _\text {c}\) is in the small-\(\nu \) regime), for both Stokes flow and visco-inertial flow, across a range of unit cells (although only two are shown in this article). The dependence of k on \(\nu \) that we have found is interesting and reminiscent of the behaviour observed in other relaxation-based iterative methods (such as successive over-relaxation), and we believe this warrants further investigation in future research to identify a possible closed form of the optimal lattice viscosity \(\nu ^*\).

4 Boundary Element Method

The boundary element method (BEM) was initially developed as a means to numerically solve boundary integral equations (Lachat and Watson 1976; Brebbia and Dominguez 1977). If the governing differential equation is linear (and so can be written in terms of integrals over the boundary only) and Green’s functions associated with the governing equation are available, the problem can be efficiently solved with BEM by discretising only the boundary.

Unlike LBM, which simulates the nonlinear Navier–Stokes equations (1), BEM solves the linear Stokes equations (2), valid only for \({{\,\textrm{Re}\,}}\ll 1\). BEM is therefore only applicable to a very limited range of \({{\,\textrm{Re}\,}}\). Due to the requirements on the governing equation and Green’s functions, modifications to the method, such as reintroducing the nonlinear inertial term \({{\textbf {u}} \cdot {\varvec{\nabla }}{\textbf {u}}}\), are usually not straightforward (Bush and Tanner 1983; Florez and Power 2001), so we restrict ourselves to the standard BEM. The method is also sensitive to problem topology, as certain Green’s functions, such as for a doubly-periodic two-dimensional (2D2P) medium, are much more difficult to handle than others (Hassard et al. 2018).

Discretisation of the domain boundaries requires many fewer and much simpler elements than discretisation of the entire fluid domain. The resultant simplicity and efficiency is the main advantage of BEM. However, this advantage is diminished as the total boundary length increases (requiring more elements in BEM) or the porosity decreases (requiring fewer fluid elements in other methods) in porous media applications.

The simplicity of the method and of Stokes flow (2) allows the permeability tensor \({\textbf {K}}\) of a 2D2P medium to be calculated from a single linear system (Hassard et al. 2018). The small number of elements required to discretise the boundary allows this linear system to be much smaller than one encountered in a typical numerical method such as finite volume. However, this linear system is dense and asymmetric, with each matrix element a complicated integral, so it requires significant computational effort to both populate and solve.

Here, we give a brief introduction to BEM, with particular emphasis on 2D2P domains. The interested reader is referred to Cheng and Cheng (2005) for a general account of the history and development of the method, and to Hassard et al. (2018) for details specific to our implementation.

4.1 Boundary Integral Equation

With the help of Green’s functions \({\textbf {G}}({\textbf {x}}, {\textbf {x}}_0)\) for velocity and \({\textbf {T}}({\textbf {x}}, {\textbf {x}}_0)\) for stress, the velocity at any point in the fluid domain can be written in terms of the velocity and traction \({\textbf {t}}\) (normal stress) on the boundary:

$$\begin{aligned} u_k({\textbf {x}}_0)&= \frac{1}{4 \pi } \int _{\varGamma } u_i T_{ij}^k n_j \, \text {d} s - \frac{1}{4 \pi } \int _{\varGamma } G_i^k t_i \, \text {d} s,&{\textbf {x}}_0&\in \varOmega , \end{aligned}$$
(19a)
$$\begin{aligned} \frac{1}{2} u_k({\textbf {x}}_0)&= \frac{1}{4 \pi } \int _{\varGamma }^{\mathscr {P}} u_i T_{ij}^k n_j \, \text {d} s - \frac{1}{4 \pi } \int _{\varGamma } G_i^k t_i \, \text {d} s,&{\textbf {x}}_0&\in {\varGamma }, \end{aligned}$$
(19b)

where \(\varOmega \) is the interior of the fluid region, \({\varGamma } = \partial \varOmega \) is the boundary of the fluid region, \({\textbf {n}}\) is the normal pointing into the fluid and \(\mathscr {P}\) denotes an integral defined in the principal value sense. Where not specified, we take \({\textbf {u}}\), \({\textbf {t}}\) and \({\textbf {n}}\) to be evaluated at \({\textbf {x}}\), and the Green’s functions \({\textbf {G}}\) and \({\textbf {T}}\) to be evaluated at \(({\textbf {x}}, {\textbf {x}}_0)\). Here we use Einstein notation, where a repeated index indicates summation.

In order to use (19a) to compute the velocity at an arbitrary fluid point, both \({\textbf {u}}\) and \({\textbf {t}}\) must be known along the entirety of \({\varGamma }\). Some of this boundary information is specified by the boundary conditions, while the rest can be determined from (19b). The boundary element method centres around the approximation of both the shape of \({\varGamma }\) and the values of \({\textbf {u}}\) and \({\textbf {t}}\) along it, and the resultant simplification and discretisation of (19b) into a linear system.

4.2 Doubly-Periodic Green’s Function

For a doubly-periodic two-dimensional (2D2P) domain with the wall boundary condition \({\textbf {u}} = {\textbf {0}}\) along the fluid-solid boundary \({\varGamma }_\text {solid}\), the boundary integral equation (19) becomes (Pozrikidis 2002)

$$\begin{aligned} u_k({\textbf {x}}_0)&= v_k - \frac{1}{4 \pi } \int _{{\varGamma }_\text {solid}} G_i^k t_i \, \text {d} s,&{\textbf {x}}_0 \in \varOmega \cup {\varGamma }, \end{aligned}$$
(20)

where \({\textbf {v}}\) is the mean fluid velocity, and the traction \({\textbf {t}}\) is to be determined. The stress Green’s function \({\textbf {T}}({\textbf {x}}, {\textbf {x}}_0)\) is absent from this equation, leaving only the velocity Green’s function \({\textbf {G}}({\textbf {x}}, {\textbf {x}}_0)\) to be defined. For a 2D2P medium, this function is

$$\begin{aligned} {\textbf {G}}({\textbf {x}}, {\textbf {x}}_0) = \sum _{{\varvec{\lambda }} \in \mathbb {Z}^2} {\varvec{\phi }}^{{\varvec{\lambda }}}(\hat{{\textbf {x}}}) + \sum _{{\varvec{\mu }} \in \mathbb {Z}^2 \setminus \{{\textbf {0}}\}} {\varvec{\rho }}^{{\varvec{\mu }}}(\hat{{\textbf {x}}}), \end{aligned}$$
(21)

where \(\hat{{\textbf {x}}} = {\textbf {x}} - {\textbf {x}}_0\), and for a square unit cell with side length L, the first sum is over the physical lattice of spatial vectors \(L {\varvec{\lambda }}\), and the second sum is over the reciprocal lattice of wave vectors \(2 \pi {\varvec{\mu }} / L\). The relative convergence of the sums is controlled by the splitting parameter \(\alpha \), in the sense that \(\big \Vert {\varvec{\phi }}^{{\varvec{\lambda }}} \big \Vert \) (and therefore the number of terms required for the first sum to converge) is an increasing function of \(\alpha \), and \(\big \Vert {\varvec{\rho }}^{{\varvec{\mu }}} \big \Vert \) (and therefore the number of terms required for the second sum to converge) is a decreasing function of \(\alpha \). The choice of \(\alpha \) is important to consider, as \({\varvec{\rho }}^{{\varvec{\mu }}}\) has a simple form that is integrable by hand, while \({\varvec{\phi }}^{{\varvec{\lambda }}}\) is difficult to both calculate and integrate. We use a runtime-minimising algorithm to choose \(\alpha \) (Hassard et al. 2018).

4.3 Discretisation

In the boundary element method, the boundary \({\varGamma }_\text {solid}\) is approximated by B elements. Each element \({\varGamma }_q\) is a line segment of length \(h_q\), with a constant value of traction \({\textbf {t}}^{{\varGamma }_q}\). The 2D2P boundary integral equation (20) then becomes

$$\begin{aligned} u_k({\textbf {x}}_0) = v_k - \frac{1}{4 \pi } \sum _{q = 1}^B t^{{\varGamma }_q}_i \int _{{\varGamma }_q} G_i^k \, \text {d} s, \end{aligned}$$
(22)

where we note that the sum is simply a linear combination of the 2B boundary tractions \(t^{{\varGamma }_q}_i\).

To solve for the boundary tractions, we form a linear system of 2B equations by setting \({\textbf {x}}_0\) to be the midpoint \({\textbf {x}}_n\) of each element \({\varGamma }_n\):

$$\begin{aligned} {\textbf {A}} {\textbf {t}}^{{\varvec{\Gamma }}} = 4 \pi {\textbf {Q}} {\textbf {v}}, \end{aligned}$$
(23)

where \({\textbf {A}}\) is the matrix containing the integrals \(\int _{{\varGamma }_q} {\textbf {G}}({\textbf {x}}, {\textbf {x}}_n) \, \text {d} s\), the vector \({\textbf {t}}^{{\varvec{\Gamma }}}\) contains the boundary tractions, and \({\textbf {Q}} = (\, {\textbf {I}}_2 \,\cdots \, {\textbf {I}}_2 \,)^\text {T} \in \mathbb {R}^{2B \times 2}\), with \({\textbf {I}}_2\) the \(2 \times 2\) identity matrix.

If only the permeability of the medium is required, then the boundary tractions \({\textbf {t}}^{{\varvec{\Gamma }}}\) need not be determined. Our previous article (Hassard et al. 2018) showed that the permeability \({\textbf {K}}\) may be determined from the matrix \({\textbf {A}}\):

$$\begin{aligned} {\textbf {K}} = \frac{\tau }{4 \pi } \left[ {\textbf {H}}^\text {T} {\textbf {A}}^\dagger {\textbf {Q}} \right] ^{-1}, \end{aligned}$$
(24)

where \(\tau \) is the area of the unit cell, and \({\textbf {H}} = (\, h_1 {\textbf {I}}_2 \,\cdots \, h_B {\textbf {I}}_2 \,)^\text {T} \in \mathbb {R}^{2B \times 2}\). The pseudoinverse \({\textbf {A}}^\dagger \) is used here since \({\textbf {A}}\) only has rank \(2B - 1\). However, since the nullspace of \({\textbf {A}}\) is a subset of the nullspace of \({\textbf {H}}^\text {T}\), the product \({\textbf {A}}^\dagger {\textbf {Q}}\) can be replaced with any solution to \({\textbf {A}} {\textbf {x}} = {\textbf {Q}}\) without changing \({\textbf {K}}\).

4.4 Error Tolerance

The runtime-minimising algorithm we use to choose the splitting parameter \(\alpha \) (Hassard et al. 2018) involves the specification of an error tolerance for the sums in (21). However, the link between this tolerance and the global precision of the permeability \({\textbf {K}}\) (or mean velocity \({\textbf {v}}\)) is not straightforward. For simple porous media such as media A and B, preliminary tests showed the global precision to be approximately twice the tolerance, and so we use a tolerance of \(2.8 \times 10^{-5}\) to match LBM’s precision of \(5.59 \times 10^{-5}\) (see § 3.3). For complex media such as medium C, preliminary tests showed the permeability to be much more imprecise and so we use a tolerance of \(10^{-7}\).

Another aspect of the runtime-minimising algorithm is the decision of how many terms of the first sum in (21) should be computed. In our previous work (Hassard et al. 2018), we found the optimal number of terms to be either 0 or 1, depending on the displacement vector \(\hat{{\textbf {x}}}\). For the small error tolerance we had used, the shape of the interface between 0 and 1 terms was erratic and radially asymmetric, and so we had used nearest-neighbour interpolation to decide the number of terms. However, for the higher values of error tolerance that we currently use, the interface between 0 and 1 terms is approximately a circle. This leads to a more efficient scheme in which \(\Vert \hat{{\textbf {x}}}\Vert \) is compared with the radius of this circle, which avoids the need for an interpolant and lowers the runtime of a BEM simulation by 15–20 %.

5 Control Volume Finite Element Method

Many physical systems can be modelled by conservation laws, the unifying feature of which is a flux term, appearing in the weak form of the law as an integral of the flux out of an arbitrary control volume. A numerical method that enforces exact conservation arises from a scheme in which the domain is partitioned into control volumes and the flux is numerically integrated along each control volume boundary. Along with the application of finite element interpolants (known as shape functions), this is a central idea behind the development of the control volume finite element method (CVFEM). Initially suggested in an analysis of pure diffusion (Winslow 1966), the method was generalised in the 1980’s to solve an advection–diffusion equation (Baliga and Patankar 1980, 1983), representing a large class of conservation laws modelling such phenomena as fluid flow, solute transport, heat transfer and electrochemical activity (Carr et al. 2013; Conway et al. 2018).

What follows is a brief overview of CVFEM as applied to our governing equation (4). This is because most literature on CVFEM is too general for our purposes in some aspects and too specific in others: too general since our equation involves only a flux term, and too specific since the form of our flux is in general unknown. For a much more comprehensive account of CVFEM, we refer to Voller (2009).

5.1 Discretisation of Governing Equation

Given a macroscale pressure field \({\varPi }({\textbf {x}})\) and body force \({\varvec{\Phi }}\), the fluid is driven by the effective pressure gradient \({\varvec{\nabla }}\varpi \equiv {\varvec{\nabla }}{\varPi } - {\varvec{\Phi }}\). The flux can then be written as the (in general, nonlinear) function \({\textbf {v}}({\varvec{\nabla }}\varpi ({\textbf {x}}); \mathscr {D}({\textbf {x}}))\), where \(\mathscr {D}({\textbf {x}})\) defines the pore-scale domain at macroscale position \({\textbf {x}}\). The governing equation (4) can then be written, in strong and weak forms respectively, as (Alyaev et al. 2012)

$$\begin{aligned}{} & {} {\varvec{\nabla }}\cdot {\textbf {v}}({\varvec{\nabla }}\varpi ; \mathscr {D}) = 0, \end{aligned}$$
(25a)
$$\begin{aligned}{} & {} \oint _{\partial \varOmega } {\textbf {v}}({\varvec{\nabla }}\varpi ; \mathscr {D}) \cdot {\textbf {n}} \, \text {d} s = 0, \end{aligned}$$
(25b)

where \(\partial \varOmega \) is the boundary of the arbitrary two-dimensional region \(\varOmega \), with outward-pointing unit normal \({\textbf {n}}\) and differential length element \(\text {d} s\). We wish to solve the above equations for \({\varPi }\), treating \({\textbf {v}}\) as an auxiliary variable.

The domain is partitioned into E quadrilateral elements whose vertices form a set of C nodes. A typical mesh of elements and nodes is shown on the left of Fig. 2. The domain is also partitioned by a dual mesh of C control volumes, each of which is associated with a node. The faces (edges) of the control volumes are constructed on an element-by-element basis: for each element, two straight lines join the midpoints of opposite edges. These two lines meet at a point we term the element centre (not to be confused with the element centroid). Control volume faces also run along the boundaries of the domain, so that every control volume is bounded by faces. The neighbourhood of a typical element is shown in the centre of Fig. 2, including control volume faces, nodes, the element centre, and midpoints of control volume faces.

The weak form of the governing equation (25b) is applied to a generic control volume \(\varOmega _i\). The control volume boundary \(\partial \varOmega _i\) comprises a number of straight-line faces \({\varGamma }_{ij}\), each with a constant outward unit normal \({\textbf {n}}_{ij}\). The conservation of mass becomes

$$\begin{aligned} \sum _j {\textbf {n}}_{ij} \cdot \int _{{\varGamma }_{ij}} {\textbf {v}}({\varvec{\nabla }}\varpi ; \mathscr {D}) \, \text {d} s = 0, \end{aligned}$$
(26)

into which we note that no approximation has yet been introduced. In order to evaluate the integral in (26), we use one-point quadrature, so that the equation for control volume \(\varOmega _i\) becomes

$$\begin{aligned} \sum _j \ell _{ij} {\textbf {n}}_{ij} \cdot {\textbf {v}}({\varvec{\nabla }}\varpi ({\textbf {z}}_{ij}); \mathscr {D}({\textbf {z}}_{ij})) = 0, \end{aligned}$$
(27)

where \({\textbf {z}}_{ij}\) and \(\ell _{ij}\) are the midpoint and length of the face \({\varGamma }_{ij}\) respectively.

We note that, subject to further details and definitions, (27) is the central problem to be solved in CVFEM. Expressed in the form \({\varvec{\Sigma }}({\varvec{\Pi }}) = {\textbf {0}}\), it represents a system of C nonlinear equations for the pressure at each of the C nodes, where the summand in (27) is known as the j-th contribution to the i-th equation. This nonlinear system is solved using the Newton–Shamanskii method, using a finite-difference approximation to the Jacobian matrix (Kelley 2003). The linear systems within the Newton–Shamanskii method are solved using the inbuilt function |mldivide| in MATLAB. Since evaluations of the function \({\textbf {v}}({\varvec{\nabla }}\varpi ; \mathscr {D})\) typically constitute the vast majority of our runtime, we exploit our knowledge of the structure of the function \({\varvec{\Sigma }}({\varvec{\Pi }})\) to compute the Jacobian matrix in such a way that eliminates unnecessary evaluations of \({\textbf {v}}({\varvec{\nabla }}\varpi ; \mathscr {D})\) (Hassard 2021).

5.2 Contributions to the Residual Vector

Fig. 7
figure 7

Local numbering for a generic quadrilateral element in CVFEM. Left: points of interest; right: vectors of interest and key values of the canonical coordinates \((\xi , \eta )\). The element centre is \({\textbf {w}}\), nodes are \({\textbf {x}}_\varphi \), midpoints of element edges are \({\textbf {y}}_{\varphi }\), midpoints of control volume faces are \({\textbf {z}}_\varphi \), unit normals are \({\textbf {n}}_\varphi \), and the vector \({\varvec{\ell }}_\varphi \) spans the \(\varphi \)-th face. Note that \({\textbf {n}}_\varphi \) is perpendicular to \({\varvec{\ell }}_\varphi \) but is not necessarily parallel to any element edge

Each control volume has two non-boundary faces in each of a number of elements. Without loss of generality, we focus on a single element and its associated contributions to the residual vector \({\varvec{\Sigma }}\). To simplify the notation, we introduce a local numbering system, illustrated in Fig. 7.

The nodes \({\textbf {x}}_\varphi \) are numbered anticlockwise around the element, from \({\textbf {x}}_1\) to \({\textbf {x}}_4\). The midpoint of the \(\varphi \)-th element edge is \({\textbf {y}}_{\varphi } = ({\textbf {x}}_\varphi + {\textbf {x}}_{\varphi +1})/2\). We note that the numbering is 4-periodic, so that the indices \(\varphi \) and \(\varphi \pm 4\) refer to the same objects. The interior control volume faces are constructed by joining opposite midpoints \({\textbf {y}}_{\varphi }\) and \({\textbf {y}}_{\varphi +2}\), resulting in two line segments that meet at the element centre \({\textbf {w}} = ({\textbf {x}}_1 + {\textbf {x}}_2 + {\textbf {x}}_3 + {\textbf {x}}_4)/4\). The \(\varphi \)-th face is spanned by \({\varvec{\ell }}_\varphi = {\textbf {w}} - {\textbf {y}}_{\varphi }\), and has length \(\ell _\varphi = \Vert {\varvec{\ell }}_\varphi \Vert \) and midpoint \({\textbf {z}}_\varphi = ({\textbf {w}} + {\textbf {y}}_\varphi )/2\). The unit vector \({\textbf {n}}_\varphi \) is normal to the \(\varphi \)-th face, and is oriented out of the control volume around node \({\textbf {x}}_\varphi \).

We also introduce the canonical coordinates \((\xi , \eta )\), illustrated in Fig. 7. In these coordinates, the element is coterminous with the square \([-1,1]^2\) and the control volume faces lie along the coordinate axes. To distinguish the coordinate systems, we use a superscript numeral to denote a standard coordinate, and a superscript variable to denote a canonical coordinate. For example, a vector \({\textbf {x}}\) has standard coordinates \((x^1, x^2)\) and canonical coordinates \((x^\xi , x^\eta )\).

The \(\varphi \)-th control volume face makes the following contribution to the residual vector \({\varvec{\Sigma }}\):

$$\begin{aligned} \sigma _\varphi = {\varvec{\ell }}_\varphi ^\perp \cdot {\textbf {v}}({\varvec{\nabla }}\varpi ({\textbf {z}}_\varphi ); \mathscr {D}({\textbf {z}}_\varphi )), \end{aligned}$$
(28)

where \({\varvec{\ell }}_\varphi ^\perp = \ell _\varphi {\textbf {n}}_\varphi \) is the quarter-circle-clockwise rotation of \({\varvec{\ell }}_\varphi \). The total contribution to \({\varSigma }_\varphi \) (i.e. the residual associated with the control volume around node \({\textbf {x}}_\varphi \)) due to the element of interest is then simply given by \(\sigma _\varphi - \sigma _{\varphi -1}\). Here, the second term is negative since the normal \({\textbf {n}}_{\varphi -1}\) (or equivalently, \({\varvec{\ell }}_{\varphi -1}^\perp \)) is oriented into the relevant control volume rather than out of it. All that remains is to give expressions for the terms on the right-hand side of (28). Since the body force \({\varvec{\Phi }}\) and the functions \({\textbf {v}}({\varvec{\nabla }}\varpi ; \mathscr {D})\) and \(\mathscr {D}({\textbf {x}})\) are unknown, we can only present expressions for \({\varvec{\ell }}_\varphi ^\perp \), \({\textbf {z}}_\varphi \) and \({\varvec{\nabla }}{\varPi }({\textbf {z}}_\varphi )\).

The \({\varvec{\ell }}_\varphi ^\perp \) and \({\textbf {z}}_\varphi \) are straightforward to calculate from the nodes \({\textbf {x}}_\varphi \):

$$\begin{aligned} {\varvec{\ell }}_\varphi ^\perp= & {} \frac{1}{4} \begin{pmatrix} x_{\varphi + 2}^2 + x_{\varphi + 3}^2 - x_\varphi ^2 - x_{\varphi + 1}^2\\ x_\varphi ^1 + x_{\varphi + 1}^1 - x_{\varphi + 2}^1 - x_{\varphi + 3}^1 \end{pmatrix}, \end{aligned}$$
(29a)
$$\begin{aligned} {\textbf {z}}_\varphi= & {} \frac{1}{8} (3 {\textbf {x}}_\varphi + 3 {\textbf {x}}_{\varphi + 1} + {\textbf {x}}_{\varphi + 2} + {\textbf {x}}_{\varphi + 3}). \end{aligned}$$
(29b)

Bilinear shape functions (Ramadhyani and Patankar 1985) can be used to interpolate the pressure gradient \({\varvec{\nabla }}{\varPi }\) across the element, so that \({\varvec{\nabla }}{\varPi }({\textbf {z}}_\varphi )\) can be determined by solving the following linear system:

$$\begin{aligned} \left[ \sum _{\psi = 1}^4 \begin{pmatrix} x_\psi ^\xi \left( 1 + x_\psi ^\eta z_\varphi ^\eta \right) \\ x_\psi ^\eta \left( 1 + x_\psi ^\xi z_\varphi ^\xi \right) \end{pmatrix} \begin{pmatrix} x_\psi ^1&x_\psi ^2 \end{pmatrix} \right] {\varvec{\nabla }}{\varPi }({\textbf {z}}_\varphi ) = \sum _{\psi = 1}^4 {\varPi }_\psi \begin{pmatrix} x_\psi ^\xi \left( 1 + x_\psi ^\eta z_\varphi ^\eta \right) \\ x_\psi ^\eta \left( 1 + x_\psi ^\xi z_\varphi ^\xi \right) \end{pmatrix}, \end{aligned}$$
(30)

where \({\varPi }_\psi = {\varPi }({\textbf {x}}_\psi )\) are the nodal pressures.

5.3 Boundary Conditions

We now discuss the treatment of boundary faces—those control volume faces that lie along the boundary of the domain—under no-flux and periodic boundary conditions. Under the no-flux condition \({\textbf {v}} \cdot {\textbf {n}} = 0\), the flux across a boundary face is simply zero, and so these boundary faces can be ignored (Voller 2009). A periodic boundary can essentially be enforced topologically, by identifying corresponding nodes at either end as the same node. All features distinguishing images of the same node may be discarded except the positions. This topological approach eliminates the periodic boundary itself and consequently eliminates the boundary faces. We therefore find that under both of these boundary conditions, the boundary faces can be ignored, as they do not contribute towards the residual vector \({\varvec{\Sigma }}\).

6 Comparison of SRT, TRT and BEM as Pore-Scale Solvers

We now compare the relative and absolute performance of SRT, TRT and BEM on several test problems, using the metrics of error \(\varepsilon \) and computational runtime T. The error metric is defined in (5) and we note that, where necessary to ensure precision, each simulation is repeated a large number of times so that the runtime can be averaged. All simulation results presented throughout this section were obtained using the high performance computing facilities at the Queensland University of Technology (QUT).

6.1 Test Problems

As test problems, we use a variety of doubly-periodic porous domains (see Fig. 3). All domains have square unit cells of length L, and flow is driven by a uniform body force \({\textbf {F}}\) pointing in the positive x-direction. We first use medium A, containing a circular cylinder. By varying the radius of the cylinder, we observe the effect of porosity on the behaviour of each method. We then use medium B, containing a square prism, to observe the effect of geometric shape when the porosity is held constant. Finally, we use medium C, containing many cylinders of various radii, to observe the effect of domain complexity. Here, we use the total length of the solid boundary as a measure of complexity.

To quantify the Reynolds number \({{\,\textrm{Re}\,}}= U L / \nu _\text {K}\) for the visco-inertial flow problems, we require the velocity scale U, the length scale L, and the kinematic viscosity \(\nu _\text {K}\). While we can choose the length scale to be the side length of the unit cell, we have no natural choice for the velocity scale in a pressure-driven flow problem. We therefore first choose physical parameters and calculate \({{\,\textrm{Re}\,}}\) after simulation, using the mean velocity magnitude \(\Vert {\textbf {v}} \Vert \) as the velocity scale.

For medium A with \(R = L/5\), Morris et al. (1997) use \(L = 0.1\,\hbox {m}\), \(\nu _\text {K} = 10^{-6}\,\hbox {m}^{2}\hbox {s}^{-1}\) and \(\Vert {\textbf {F}} \Vert = 1.5 \times 10^{-7}\,\hbox {ms}^{-2}\), and quote a nominal Reynolds number of unity. We use these values of L, \(\nu _\text {K}\) and \(\Vert {\textbf {F}} \Vert \) for all of our media, except for the body force strength in medium C, which we increase to \(\Vert {\textbf {F}} \Vert = 9.6 \times 10^{-6}\,\hbox {ms}^{-2}\) in order to maintain a similarly-sized \({{\,\textrm{Re}\,}}\). The retrospectively-calculated Reynolds numbers are given in the second column of Table 1. The variation observed is important for explaining variation in results between the media.

Table 1 Retrospectively-calculated Reynolds numbers for visco-inertial flow, and mesh statistics for each Fluent benchmark solution: the maximum length scale \(L_\text {max}\) that constrains the size of elements; and the number of elements \(N_\text {elm}\)
Fig. 8
figure 8

Fluent results for medium A with \(R = L/5\). Left: Error of Fluent simulations against runtime, for Stokes flow. Each line represents a mesh resolution (481/1092/2253/4887/10365/22400/48041 elements, increasing left to right), and the tolerance decreases along each line. The dashed line traces the final error for each resolution. Right: Relative difference \(\delta \) between the mean velocities of the analytic solution of Stokes flow and the Fluent solution of visco-inertial flow, as a function of \({{\,\textrm{Re}\,}}\). The dashed line shows quadratic error (proportional to \({{\,\textrm{Re}\,}}^2\))

In order to calculate the error metric \(\varepsilon \), we require accurate benchmark solutions. For medium A, an analytic solution of Stokes flow is available (Sangani and Acrivos 1982). In media B and C, and for visco-inertial flow, we must instead obtain numerical benchmark solutions. For this purpose, we first use the finite-volume computational fluid dynamics code ANSYS Fluent 18.0, which can provide accurate solutions for visco-inertial flows in regular domains such as media A, B and C.

The error for the Fluent solution of Stokes flow through medium A is given on the left of Fig. 8. The most refined simulation that we can run in Fluent (limited by prohibitive runtime or memory requirements) has precision and accuracy on the order of \(10^{-4}\), and we expect similar precision for other benchmark solutions generated by Fluent. We note that this precision is slightly larger than the precision of \(5.59 \times 10^{-5}\) that we use for BEM and LBM. This level of precision is enough to declare mesh independence for any reasonable engineering application, but for the purpose of using the results as a benchmark, we require an especially high level of accuracy (and therefore precision). We see in § 6.2 that some of the benchmark Fluent solutions are imprecise enough that we observe mesh effects in our results, but the limitations mentioned above mean that we cannot generate solutions to any higher precision. The mesh statistics for each benchmark Fluent solution are given in the final two columns of Table 1. All meshes comprise a mix of quadrilateral and triangular elements.

When a Stokes solver such as BEM is applied to visco-inertial flow, one significant source of error is the model error associated with the assumption \({{\,\textrm{Re}\,}}\ll 1\), which we refer to as the linearity error. On the right side of Fig. 8, we show the difference between the analytic solution of Stokes flow and Fluent’s solution of visco-inertial flow as \({{\,\textrm{Re}\,}}\) grows. For higher \({{\,\textrm{Re}\,}}\), this difference is simply the linearity error, which is quadratic in \({{\,\textrm{Re}\,}}\) (Hill et al. 2001). For lower \({{\,\textrm{Re}\,}}\), the difference is simply due to the imprecision of the Fluent solution, which is on the order of \(10^{-4}\), as discussed above.

Due to the limited precision of the benchmark Fluent solutions, the errors calculated with respect to these solutions will converge to nonzero values. To rectify this for Stokes flow, we instead use the highest-resolution BEM simulation as a benchmark in media B and C. In summary, our benchmarks are: an analytic solution for Stokes flow through medium A, BEM for Stokes flow through media B and C, and Fluent for all visco-inertial flow. Contour plots of normalised speed \(\Vert {\textbf {u}} \Vert / \Vert {\textbf {v}} \Vert \) for the benchmark solutions are shown alongside the corresponding error plots in the following (Figs. 9, 10, 11 and 12).

Table 2 Number of iterations required for each SRT simulation to terminate, as a function of the resolution N, the unit cell geometry, and whether the flow is Stokes or visco-inertial (VI). The pairs of columns correspond to Figs. 9, 10, 11 and 12 respectively. The entries are all multiples of 100 as the convergence criterion (14) is only checked every 100 iterations
Table 3 Number of iterations required for each TRT simulation to terminate, as a function of the resolution N, the unit cell geometry, and whether the flow is Stokes or visco-inertial (VI). The pairs of columns correspond to Figs. 9, 10, 11 and 12 respectively. The entries are all multiples of 100 as the convergence criterion (14) is only checked every 100 iterations
Table 4 Number of elements in each BEM simulation, as a function of the target number of elements B, the unit cell geometry, and whether the flow is Stokes or visco-inertial (VI). An asterisk is used when the number of elements matches the target number B. The pairs of columns correspond to Figs. 9, 10, 11 and 12 respectively

6.2 Relative Performance

Fig. 9
figure 9

Flow through medium A with \(R = L/5\). Left: Stokes flow with an analytic benchmark; right: \({{\,\textrm{Re}\,}}= 4.90\) with a Fluent benchmark. Top: benchmark normalised speed; bottom: error against runtime for SRT (), TRT () and BEM (). In the grey region \(\varepsilon < 5.59 \times 10^{-5}\), the error is smaller than the precision

Fig. 10
figure 10

Visco-inertial flow through medium A. Left: \(R = L/10\), \({{\,\textrm{Re}\,}}= 11.61\); right: \(R = L/3\), \({{\,\textrm{Re}\,}}= 1.02\). Top: benchmark normalised speed; bottom: error against runtime for SRT (), TRT () and BEM (). In the grey region \(\varepsilon < 5.59 \times 10^{-5}\), the error is smaller than the precision

The statistics for every SRT, TRT and BEM simulation used in this section are given in Tables 2, 3 and 4 respectively, including the resolution and number of iterations.

For medium A, the performance of each method (i.e. the error and corresponding runtime of each simulation) is shown in Fig. 9, for cylinder radius \(R = L/5\), and in Fig. 10, for \(R = L/10\) and \(R = L/3\). While results for visco-inertial flow are shown for all media, results for Stokes flow are only shown for \(R = L/5\), since the qualitative trends are clear from the visco-inertial results. Within the grey region of each performance plot, the error is smaller than the methods’ precision (whose value is \(5.59 \times 10^{-5}\)). Since the precision of a method provides a lower bound for its error, we cannot expect the error to continue to decrease once it reaches the grey region. Instead, we expect the error to fluctuate about the precision (i.e. the interface between the white and grey regions).

For both Stokes flow and visco-inertial flow, the lattice Boltzmann methods converge well, despite the fluctuations with N observed in Fig. 4. In all cases, the TRT error decreases more quickly than the SRT error as the runtime increases. For \(R = L/5\) and \(R = L/10\), the TRT error also begins lower than the SRT error, and so TRT clearly performs better in these cases. However, for \(R = L/3\), the SRT error begins lower, and so there will be a point at which the TRT and SRT lines intersect, denoted by the runtime threshold \(\varTheta ^\text {TRT}_\text {SRT}\).

The presence of a runtime threshold \(\varTheta ^\text {TRT}_\text {SRT} \approx {2\,\mathrm{\text {s}}}\) for \(R = L/3\), despite the absence of one for \(R = L/5\) and \(R = L/10\), can be explained by the large boundary length and small porosity. Firstly, our implementation of TRT does not optimise for exact boundary placement (due to our choice of the magic number \(\varLambda \) in § 3.2) while SRT does (due to the choice \(\omega = 1\) in § 3.2), and the MR boundary conditions used in TRT are more computationally expensive than the BFL conditions used in SRT. As a result, the boundary incurs higher error and higher computational cost for TRT than for SRT, and as the boundary length increases, this discrepancy grows. Similarly, as the porosity diminishes, the boundary-related costs form a growing proportion of the total cost and so this discrepancy becomes more prominent.

For Stokes flow, we observe that, relative to the SRT error and TRT error, the BEM error begins higher and decreases more quickly. There will therefore be runtime thresholds \(\varTheta ^\text {BEM}_\text {SRT}\) and \(\varTheta ^\text {BEM}_\text {TRT}\), both of which we find to be less than 1 s for \(R = L/5\). BEM is therefore the best-performing method for this problem unless an especially short runtime is required. However, for visco-inertial flow, BEM encounters linearity error, and so SRT and TRT are the only viable options. The different cylinder radii R give different Reynolds numbers and therefore different levels of linearity error—for \(R = L/10\), the BEM error levels off almost immediately, while for \(R = L/3\), no levelling behaviour is observed over the range of tested resolutions.

Fig. 11
figure 11

Flow through medium B. Left: Stokes flow with a BEM benchmark; right: \({{\,\textrm{Re}\,}}= 4.55\) with a Fluent benchmark. Top: benchmark normalised speed; bottom: error against runtime for SRT (), TRT () and BEM ()

The performance of each method for medium B, which contains a square with side length \(S = L \sqrt{\pi } / 5\), is shown in Fig. 11. We note that the porosity matches that of medium A with \(R = L/5\), and that the boundary length is only slightly longer. Accordingly, TRT outperforms SRT in medium B as it does for \(R = L/5\). For Stokes flow, BEM performs best after a longer runtime threshold \(\varTheta ^\text {BEM}_\text {TRT} \approx {1\,\mathrm{\text {s}}}\).

All three methods perform significantly worse in medium B than in medium A with \(R = L/5\), and this is likely the reason for the longer runtime threshold. To illustrate the difference in performance, we note that for Stokes flow using TRT and BEM, the error in medium A reaches the precision (grey region in plot), but the error in medium B does not approach the precision, despite using higher resolutions. Relative to medium A, all three methods exhibit error reduction indices that are approximately halved. This means that a decrease in error corresponding to a 10-fold runtime increase in medium A will require a 100-fold runtime increase in medium B. SRT and TRT also encounter particular difficulty due to LBM’s inherent fluctuations in error as the resolution and runtime increase. While these fluctuations may be no larger than in medium A, they are larger relative to the overall downward trend.

Fig. 12
figure 12

Flow through medium C. Left: Stokes flow with a BEM benchmark; right: \({{\,\textrm{Re}\,}}= 5.00\) with a Fluent benchmark. Top: benchmark normalised speed; bottom: error against runtime for SRT (), TRT () and BEM (). In the grey region \(\varepsilon < 5.59 \times 10^{-5}\), the error is smaller than the precision

The performance of each method for medium C, which contains 44 cylinders of various radii, is shown in Fig. 12. The total boundary length in this medium is approximately 8 times the boundary length of medium A with \(R = L/5\), which we expect to negatively affect BEM and, to a lesser extent, TRT. We find here that the utility of our visco-inertial error metric is limited by the precision of our Fluent benchmark—the error metric \(\varepsilon \) appears to converge to the same nonzero value for all three methods, indicating that this value represents the precision of the benchmark.

Bearing in mind the limitation of the Fluent benchmark, both SRT and TRT appear to converge steadily for Stokes flow and visco-inertial flow, with similar log-log slopes to those in medium A, and with the fluctuations in error much less apparent. Relative to medium A with \(R = L/3\), medium C has a larger total boundary length and a larger porosity. These changes largely counteract each other, considering that SRT is advantaged by large total boundary length but small porosity. We find that there is no runtime threshold \(\varTheta ^\text {TRT}_\text {SRT}\) for medium C, and so we conclude that the effect of the porosity is stronger than that of the total boundary length.

While the log-log slope of the BEM error is similar to those in medium A, the runtime thresholds are much higher, as expected due to the much larger total boundary length: \(\varTheta ^\text {BEM}_\text {SRT} \approx {13\,\mathrm{\text {min}}}\) and \(\varTheta ^\text {BEM}_\text {TRT} \approx {7\,\mathrm{\text {h}}}\). The runtime threshold \(\varTheta ^\text {BEM}_\text {TRT}\) is much larger than would be practical in a dual-scale model, and so TRT dominates for all practical runtimes. Additionally, the extreme increases in the runtime thresholds \(\varTheta ^\text {BEM}_\text {SRT}\) and \(\varTheta ^\text {BEM}_\text {TRT}\) imply that there is only a small range of values of the total boundary length for which BEM is a viable option. However, we note that for the special case of Stokes flow through a homogeneous medium, for which a dual-scale model requires only a single pore-scale simulation, BEM remains viable for medium C.

In summary, BEM performs best for small-boundary-length Stokes flow, and in all other cases, TRT performs best for small boundary length and large porosity, and SRT performs best for large boundary length and small porosity. The most important factor in deciding between TRT and SRT is the porosity, and other than the fact that BEM is only suited to Stokes flow, the most important factor determining the suitability of BEM is the total boundary length. While increases in boundary length impact TRT and SRT less than BEM, they still have a negative impact on all three methods, and so there is likely a practical limit to the geometric complexity that can be handled.

6.3 Absolute Performance

Finally, after assessing the relative performance of SRT, TRT and BEM, we now discuss the absolute performance of the best-performing methods, to determine their feasibility in the dual-scale modelling context. An ideal runtime would be under 3 s, so that a dual-scale simulation requiring \(10^{4}\) pore-scale simulations would run within 8 h, and we deem any runtime over a minute to be impractical, with \(10^{4}\) pore-scale simulations taking over a week to run. We deem any error under 1 % to be acceptable, and define further milestones at 0.1 % and 0.01 % error.

Table 5 Runtimes required to reach error milestones for Stokes and visco-inertial flows, using whichever method is fastest. The symbols denote which method is used: SRT (), TRT () or BEM ()

Approximate runtimes for these milestones are given in Table 5. Where error does not decrease steadily, results are based on a line of best fit. Since particularly short runtimes are difficult to measure precisely, and such runtimes occur in the less-predictable coarse-resolution regime, we give several runtimes for media A and B as simply less than 1 s. Similarly, for medium C, the coarsest successful TRT simulation ran in just under 10 s, and since a slightly coarser resolution was unsuccessful (see Table 3), we cannot extrapolate the results and give some runtimes as simply less than 10 s.

For Stokes flow, TRT can achieve 1 % error for all test domains in under 10 s, which is well below the maximum practical runtime of a minute. Within the ideal runtime of 3 s, BEM can achieve less than 1 % error for medium B and less than 0.1 % error for medium A. The differences between the media become apparent at the 0.01 % milestone, with runtimes between several seconds and several hours.

The visco-inertial results are similar wherever TRT has been used for the corresponding Stokes flow result. For instance, 1 % error can again be achieved in under 10 s for all test domains, and within 1 s for most test domains. Within the ideal runtime of 3 s, we again see that 0.1 % error can be achieved for some media. However, since BEM is not suitable for visco-inertial problems, the low runtimes it achieves for the smaller error milestones in Stokes flow cannot be replicated in visco-inertial flow. The 0.1 % and 0.01 % milestones must instead be reached by SRT or TRT, for which error decreases much more slowly. The runtimes for many of these milestones are therefore much larger, and some cannot be reliably estimated.

We take this opportunity to note that SRT, TRT and BEM generally outperform the finite volume simulations run in Fluent. In most of the test cases, the error incurred by Fluent is on the order of that incurred by SRT, TRT and BEM in a much shorter runtime. As an example, Fig. 8 shows the performance of Fluent for Stokes flow through medium A with \(R = L/5\). Fluent takes more than a minute to reach 1 % error, which is longer than would be practical in a dual-scale model, especially for such a simple pore-scale domain. In contrast, in the same time frame, SRT, TRT and BEM can achieve 0.1 %, 0.01 % and 0.001 % error respectively.

In summary, for all of the test problems, sufficiently small error is achievable in sufficiently small runtime to be viable in a dual-scale model. This is especially true for a context in which \({{\,\textrm{Re}\,}}\) is small or the pore-scale geometry is simple.

7 Comparison of SRT, TRT and BEM in a Dual-Scale Model

Finally, we demonstrate application of the above methods to the dual-scale solution of Stokes flow through porous media. In such a problem, Darcy’s law (3) is valid, and so a dual-scale simulation involves two stages: the determination of the permeability \({\textbf {K}}({\textbf {x}})\) across the macroscale domain given the porous structure \(\mathscr {D}({\textbf {x}})\); and the solution of the macroscale problem given the permeability \({\textbf {K}}({\textbf {x}})\). In our model, we sample the permeability at H points across the macroscale domain and use natural neighbour interpolation (Sibson 1981) to compute the permeability at an arbitrary point.

Our dual-scale test problem is due to Abdulle and Budáč (2015), some of the few authors that discuss dual-scale models of this type in locally-periodic media. The macroscale domain, shown on the left side of Fig. 13, is \(\varOmega = \big ( [0,2] \times [0,3] \big ) {\setminus } \big ( (1,2] \times (1,2) \big )\), with a body force \({\varvec{\Phi }} = -{\textbf {j}}\), periodic symmetry between the top (\(y = 3\)) and bottom (\(y = 0\)) boundaries, and no-flux boundaries elsewhere.

Fig. 13
figure 13

Domains of the dual-scale test problems. Left: macroscale domain. No-flux boundaries are indicated with thick lines, periodic boundaries with dashed lines, and the origin with a ring. Right: representative unit cells for the heterogeneous (top) and homogeneous (bottom) media. The heterogeneous unit cell is parameterised by \(\theta \), with \(L_x\) and \(L_y\) fixed

For the pore-scale structure, we use one of two locally-periodic porous media: a heterogeneous medium due to Abdulle and Budáč (2015), and a homogeneous medium based on the pore-scale medium C. At a given macroscale point \({\textbf {x}} = (x, y)\) in the heterogeneous medium, the \(L \times L\) unit cell \(\mathscr {D}({\textbf {x}})\) contains a rectangle with side lengths \(L_x = 0.6 L\) and \(L_y = 0.3 L\), rotated by the angle \(\theta = (1 - x^2/8 - y/3) \pi \). Typical unit cells for the heterogeneous and homogeneous media are shown on the right side of Fig. 13.

Fig. 14
figure 14

Stokes flow through the heterogeneous (top) and homogeneous (bottom) media. Left: macroscale heterogeneity demonstrated by magnified pore-scale geometry. Solid lines indicate no-flux conditions and dashed lines indicate periodic boundaries. Right: benchmark pressure distribution. Contours are multiples of 0.05, bold contours are multiples of 0.2

Fig. 15
figure 15

Error against runtime for dual-scale solution of Stokes flow through the heterogeneous (left) and homogeneous (right) media. The pore-scale problem is solved by either SRT (red), TRT (purple) or BEM (green). Due to the large fluctuations exhibited by LBM as the resolution increases (see Fig. 4), the results here for SRT and TRT are based on fitted error surfaces

The heterogeneous medium is intentionally constructed so that the pore-scale domain \(\mathscr {D}({\textbf {x}})\) depends on a single parameter \(\theta \). A straightforward solution of the dual-scale problem does not exploit this information, instead assuming \(\mathscr {D}({\textbf {x}})\) to be a black box and thus sampling from the two-dimensional position \({\textbf {x}}\). However, an accurate benchmark solution can be constructed by sampling directly from the parameter \(\theta \).

In our case, the benchmark solution is generated using 1025 equally-spaced samples of \(\theta \in [0,\pi /4]\), with permeabilities for \(\theta \) outside this range calculated based on symmetry. The pore-scale problems were solved using BEM with \(B = 16384\) elements, and the macroscale problem was solved with \(E \approx {}{1.3e8}{}\) elements. The benchmark solution for the homogeneous medium was generated using \(B = 32767\) and \(E \approx {}{1.3e8}{}\), with only a single pore-scale simulation required. These benchmark solutions are shown in Fig. 14, alongside simplified representations of the corresponding media, in which the pore-scale structure is magnified to give a sense of how it varies across the macroscale domain.

With respect to the benchmark solution, we calculate relative error as

$$\begin{aligned} \varepsilon = \frac{\Vert {\varvec{\Pi }}_\text {int} - {\varvec{\Pi }}_\text {int}^\text {bench} \Vert }{\Vert {\varvec{\Pi }}_\text {int}^\text {bench} \Vert }, \end{aligned}$$
(31)

where \({\varvec{\Pi }}_\text {int}\) contains interpolated values of the macroscale pressure \({\varPi }\), evaluated over the set \((\mathbb {Z}^2/100) \cap \varOmega \), and \({\varvec{\Pi }}_\text {int}^\text {bench}\) is the same for the benchmark solution.

Using BEM with B elements at the pore scale, we can vary B (using \(32 \le B \le 256\) and \(256 \le B \le 8192\) for the heterogeneous and homogeneous media respectively), the number of macroscale elements E (using \(E = 63750\) and \(27 \le E \le 1022341\) respectively), and, for the heterogeneous medium, the number of sampled permeabilities H (using \(35 \le H \le 64150\)). We optimise these parameters to achieve the least error \(\varepsilon \) in the least runtime T possible. By varying the grid resolution N, we perform similar analysis for SRT (using \(27 \le N \le 113\) and \(27 \le N \le 1783\) respectively) and for TRT (using \(27 \le N \le 281\) and \(147 \le N \le 1783\) respectively), but due to the large fluctuations in error for these methods, such optimisation is only possible when the error is approximated by a least-squares fit. The error and runtime corresponding to these optimised parameters are shown in Fig. 15.

In both media, we see that, as in the pore-scale simulations, the error reduces more quickly for TRT than for SRT, and more quickly for BEM than for TRT. For the heterogeneous medium, in which the pore-scale geometry is relatively simple, the runtime at which BEM becomes the best-performing method is relatively low—approximately 2 min. For the homogeneous medium, in which the pore-scale geometry is relatively complex, the runtime at which BEM will become the best-performing method will be relatively high. However, since only a single pore-scale simulation is required, the dual-scale runtime will be within 6 h, which is not an impractically large runtime for an entire dual-scale simulation. We note that these runtime thresholds will be lower in practice, as this analysis neglects the error fluctuations in LBM.

We therefore see that the preliminary results from our dual-scale simulations qualitatively agree with the results from our pore-scale simulations. For simple pore-scale media, BEM performs best unless an especially short runtime is required. For complex pore-scale media, BEM only performs best after a runtime on the order of hours, but this is acceptable if the medium is homogeneous.

8 Conclusions

We have compared the performance of the single-relaxation-time lattice Boltzmann method (SRT), the two-relaxation-time lattice Boltzmann method (TRT), and the boundary element method (BEM) for several periodic porous media.

For Stokes flow, TRT and BEM generally complement each other well. TRT incurs a lower setup cost, so that relatively small error can be obtained from short computations. However, a relatively large increase in runtime is required to further reduce error. On the other hand, BEM incurs a large amount of error for coarse simulations, but its error decreases quickly as the runtime increases. The advantages of both methods can be exploited by using TRT for short runtimes, when it has relatively low error, and BEM for long runtimes, when it has relatively quick error reduction. Regarding SRT, we find that TRT generally achieves less error in a shorter runtime, and so TRT is preferable to SRT in this context.

In media with large boundary length, the overhead incurred by BEM is much larger, due to the greater number of elements required to discretise the boundary. This larger overhead means that BEM only outperforms TRT when the runtime is especially long—perhaps still practical for a single pore-scale simulation, but impractical for a dual-scale simulation during which many pore-scale simulations will be run. Therefore, if restricted to runtimes practical for a dual-scale solver, TRT is the best-performing method for large-boundary-length media, and its slow error reduction cannot be overcome.

For a given visco-inertial flow problem, SRT and TRT generally incur similar levels of both error and runtime as for the analogous Stokes flow problem. This is an advantage, since other methods, such as finite volume (via Fluent), incur more runtime and more error as the Reynolds number increases. We therefore observe that, as for Stokes flow, TRT generally performs better than SRT. However, from the larger range of media used for the visco-inertial tests, it is apparent that SRT performs better than TRT for small porosity and, to a lesser extent, for large boundary length. Since BEM only solves Stokes flow, it does not converge for visco-inertial flow and so its superior error reduction cannot be exploited.

From an absolute perspective, the computational runtime required for SRT, TRT or BEM (depending on the context) to achieve an acceptable level of error (under 1 %) is, in general, small enough for the method to be practical as the pore-scale solver of a dual-scale model (always under 10 s, and often under 1 s).

Finally, we show that our conclusions based on the pore-scale results hold for a dual-scale model of Stokes flow. BEM outperforms SRT and TRT for a heterogeneous medium with small boundary length, and for a homogeneous medium with large boundary length. It is expected that TRT would perform best for a heterogeneous medium with large boundary length.

Building on the findings of the current work, further research is to be on the full dual-scale model. Such research will include a thorough analysis of the solution of Stokes flow through a locally-periodic medium, and an investigation into the viability of using this model to solve visco-inertial flow through a locally-periodic medium.