1 Introduction to part II

This paper is a companion to van Keken and Wilson “An introductory review of the thermal structure of subduction zones: I—motivation and selected examples” (van Keken and Wilson 2023a, hereafter called Part I) and van Keken and Wilson “An introductory review of the thermal structure of subduction zones: III—comparison between models and observations” (van Keken and Wilson 2023b, hereafter referred to as part III).

Combined, these articles provide an introduction to the use of thermal models and observational constraints to aid our understanding of the dynamics, structure, and evolution of subduction zones from a geophysical, geochemical and petrological perspective. In Part I, we provided the motivation for these studies, fundamental constraints on subduction zone geometry and thermal structure, and a limited overview of existing thermal models. In this article, we will provide a discussion of the use of the finite element method to discretize partial differential equations needed for subduction zone modeling, present open-source software, and discuss validation and verification approaches to understand the reliability of the thermal models.

Our approach will be similar to that in part I: we strive to make this introduction accessible to advanced undergraduates, graduate students, and professionals from outside geodynamics. We hope that this will enable the reader to establish a fundamental understanding of what is required for numerical modeling of the thermal structure of subduction zones.

While we focus on the use of finite element methods to solve the governing equations, we acknowledge that significant and important studies have been published that use finite difference (FD) or finite volume (FV) methods. An introduction to the use of FD methods in geodynamical applications is provided by Gerya (2019). A broader overview of computational methods for geodynamics including FV is in Ismail-Zadeh and Tackley (2010). A useful overview of the use of finite element methods specifically for mantle convection modeling with a comparison to FD and FV methods is in Zhong et al. (2015). As we will see, finite element methods can be used to discretize complex geometries, which provides a significant advantage for subduction zone modeling over FD and FV methods.

In Sect. 2, we first describe how finite element approaches to solve common linear partial differential equations such as the Poisson and Stokes equations are constructed. We then apply this to dynamical models that rely on solving the Stokes and heat equations, which include a standard convection benchmark and a new simplified subduction zone benchmark. The latter will be used to quantify the precision with which we can predict the subduction zone thermal structure using a kinematic–dynamic approach.

2 Finite element modeling

2.1 General formulation of the finite element solution of partial differential equations

The goal of the numerical models discussed here is to find the approximate solutions of partial differential equations (PDEs) in a spatial domain denoted by \(\Upomega\), with boundaries \(\partial \Upomega\), representing some part of the Earth, say, a cross-section through a subduction zone. These PDEs can be time dependent, nonlinear, or nonlinearly coupled to other PDEs. To sketch out how we can discretize the PDEs with finite elements, we will first assume that we have linear PDEs of the general form

$$\begin{aligned} {L}(u) = f \quad \quad \text {in }\Upomega \end{aligned}$$
(1)

where L is a linear differential operator, f some right-hand side function and \(u = u( \vec {x}, t)\) the solution we seek to approximate over space \(\vec {x}\) and time t. In addition to (1), we require boundary conditions of the form

$$\begin{aligned} {J}(u) = g \quad \quad \text {on }\partial \Upomega \end{aligned}$$
(2)

where J is a linear differential operator and g is a function describing how u and/or its derivatives behave on the boundary. Efficient computer-based solution of the linear differential problem (1) and (2) relies on discretizing the domain \(\Upomega\) into a set of degrees of freedom (DOFs) or values at “nodal” points in the domain at which the approximate solution is sought. This discretization facilitates the translation of the governing equations from differential to algebraic matrix–vector form. Discretization schemes differ in how they organize and distribute the degrees of freedom onto a mesh or grid of points across the domain.

Finite difference methods distribute DOFs at points in \(\Upomega\) and construct approximate derivatives by taking the differences between the values of neighboring points (along connecting lines in a mesh of points). This is made easier if the DOFs are organized in a regular or structured grid. Finite volume methods construct control volumes around the degrees of freedom, and rather than approximating the derivatives, they consider the fluxes through the control volume boundaries between neighboring degrees of freedom. This means that the DOFs can be distributed in an unstructured way, but achieving higher orders of accuracy with FV methods is easier on structured meshes. The finite element method (FEM), on the other hand, tessellates the domain with polygonal elements and then distributes DOFs relative to these elements. The order of accuracy is then controlled by the number and the distribution of DOFs within an element, which can themselves be arranged in an unstructured pattern.

Formally, the FEM approximates u by \({\tilde{u}}\), the solution’s representation in a function space on the mesh where

$$\begin{aligned} {\tilde{u}}( \vec {x}, t) = \sum _j \phi _j( \vec {x}) u_j(t) \end{aligned}$$
(3)

Here, \(u_j\) are coefficients that as indicated can be time dependent but do not depend on space. The shape functions \(\phi _j\) are a function of space but generally independent of time. The index j indicates the number of the shape function on the mesh and is associated with the number of the nodal point. In this manuscript, we will principally discuss so-called Lagrange shape functions which define \(\phi _j\) as a polynomial over an element with a value of 1 at a single nodal point and a value of 0 at all other points associated with the degrees of freedom such that \(\sum _j\phi _j\)=1 (see Fig. 1). The shape functions can be of arbitrary order and can have various conditions on their continuity across or between elements. We will focus principally on linear Lagrange shape functions (denoted by P1) and quadratic Lagrange shape functions (denoted by P2) that are continuous between mesh elements. Our choice of Lagrange shape functions means that \(u_j\) are the actual values of the solution in (3). With some other forms of the shape function, \(u_j\) are instead interpolation weights that are used to construct the solution values. The split of temporal and spatial dependence above is typical in geodynamic applications but not required. Given the “trial” solution function (3), finite element methods pose (1) as a residual \( {R}({\tilde{u}})\):

$$ \begin{aligned} {R}({\tilde{u}}) = {L} ({\tilde{u}}) - f \end{aligned}$$
(4)

The residual is minimized in a weighted average sense by multiplying the residual with a weighting test function, \({\tilde{u}}_t\), integrating over the domain of interest and setting this to zero:

$$\begin{aligned} \int {\tilde{u}}_t {R}({{\tilde{u}}}) \text{d} \Upomega = 0 \end{aligned}$$
(5)

The test functions \({\tilde{u}}_t\) can be independent of the functions \(\phi _j\) that span the function space of the trial function, but in the widely used Galerkin approach the test functions are restricted to be in the same function space such that

$$\begin{aligned} {\tilde{u}}_t( \vec {x}, t) = \sum _i\phi _i( \vec {x}) u_{ti}(t) \end{aligned}$$
(6)

Since the method is valid for all \({\tilde{u}}_t\), we can dispense with the test function values at the DOFs, \(u_{ti}\), and the minimization function can be written as

$$\begin{aligned} \int \phi _i {R}({{\tilde{u}}}) \text{d} \Upomega = 0~~~~~\text {for all}~i \end{aligned}$$
(7)

Given a domain with n DOFs such that \(i,j=1,\ldots ,n\), combining (7) with (3) results in a matrix–vector system of the form

$$\begin{aligned} {{\textbf{S}}} {{\textbf{u}}} = {{ \textbf{f}}} \end{aligned}$$
(8)

where \({{{\textbf{S}}}}\) is an \(n \times n\) matrix, \({{ \textbf{f}}}\) is the right-hand side vector of length n and \({{\textbf{u}}}\) is the solution vector of values or weights at the DOFs

$$\begin{aligned} {{\textbf{S}}}&= S_{ij} = \int \phi _i {L}(\phi _j) \text{d}\Upomega \end{aligned}$$
(9)
$$\begin{aligned} {{ \textbf {f}}}&= f_i = \int f \phi _i \text{d}\Upomega \end{aligned}$$
(10)
$$\begin{aligned} {{\textbf{u}}}&= u_j \end{aligned}$$
(11)

where we can move the solution values out of the integral in (7) due to the linear nature of L. For elliptic problems, \({\textbf{S}}\) is sometimes called the stiffness matrix and \({ \textbf {f}}\) the load vector because the finite element method was initially used in structural problems where \({\textbf{u}}\) typically represents a displacement. It expresses how for a given load \({ \textbf {f}}\) the stiffness of the structure, as expressed by the coefficients in the stiffness matrix \({{\textbf{S}}}\), limits the displacement \({{\textbf{u}}}\) of nodes in a structure. Note that in the above summary we have glossed over the imposition of boundary conditions (2), which must be incorporated into the residual (4), trial (3) and test (6) functions. Assuming that the boundary conditions are correctly implemented, that problems (1) and (2) are well posed, and that the discretization is adequate, then the discrete approximate solution \({\textbf{u}}\) (11) can be found through direct or iterative solution of (8).

The ease with which finite elements can be used on an unstructured mesh gives them one of their primary advantages for subduction zone modeling—being able to tessellate complex geometries. This is of particular importance when, for example, explicitly discretizing the subducting slab surface, surface topography, or crustal interfaces in the overriding plate. In addition, grid refinement can be used where strong gradients in solutions exist (such as at the top of the slab when it gets in contact with the hot mantle wedge; see Figure 1b in part I) and coarse grids can be used where the solutions are relatively constant, leading to improved overall computational efficiency compared to methods that require a structured discretization of space. Another advantage of the finite element method, that we will see below, is the natural way in which boundary conditions can be implemented.

For Lagrange bases, increasing the order of the polynomial of \(\phi _j\) increases the number of DOFs per element (see Fig. 1) and increases the order of accuracy of the solution. The shape functions may be continuous or discontinuous between elements but each \(\phi _j\) ideally has compact support, meaning that the basis function associated with a degree of freedom only has nonzero values in the elements immediately surrounding the DOF. It is this property that ensures the matrix \({{\textbf{S}}}\) (9) is sparse in the final discrete system of equations (8).

We provide practical examples that show how to construct (8) using finite elements. Our goal is to demonstrate the flexibility and power of the FEM without giving an exhaustive introduction or rigorous mathematical derivation of the method. Practical introductions to the FEM can be found in Johnson (1987) and Logan (2017). More mathematically founded descriptions of the FEM can be found in Oden and Reddy (1976), Hughes (1987), and Strang and Fix (2008). Some of these texts are available in affordable Dover reprints.

2.2 Construction of finite element models

2.2.1 Examples of partial differential equations solved by the FEM

The exact set of equations that needs to be solved to make predictions of the thermal structure of subduction zones using a kinematic–dynamic approach is provided in Sect. 2.3.1. These are derived from the fundamental equations governing the conservation of mass, momentum, and thermal energy. The conservation of mass and momentum leads, under a number of simplifying assumptions (that we will not discuss in detail but that can be found in fundamental textbooks such as Turcotte and Schubert 2002), to the nondimensional Stokes equation and the condition of incompressibility

$$\begin{aligned} - \nabla \cdot \left( 2\eta \frac{\nabla \vec {v} + \nabla \vec {v}^T}{2}\right) + \nabla {P} ~&=~ \vec {f}_B \end{aligned}$$
(12)
$$\begin{aligned} \nabla \cdot \vec {v} ~&=~ 0 \end{aligned}$$
(13)

Given a viscosity, \(\eta\), and a buoyancy force, \({\vec{f}}_{B}\), that can depend on temperature and composition, the Stokes equation balances viscous, pressure, and buoyancy forces. Further imposition of the incompressibility constraint (13) allows us to find the velocity, \(\vec {v}\), and pressure, P. The conservation of thermal energy leads to the nondimensional heat advection–diffusion equation

$$\begin{aligned} {\rho c_p\left( \frac{\partial T}{\partial t} + \vec {v} \cdot \nabla T \right) ~=~ \nabla \cdot \left( k \nabla T \right) + H} \end{aligned}$$
(14)

which, given the density, \(\rho\), heat capacity, \(c_p\), and thermal conductivity, k, balances the transport of heat by diffusion and advection with heat production, H. The heat equation can be modeled to be stationary (by assuming \(\frac{\partial {T}}{\partial t}\)=0) and the Stokes equation can be nonlinear due to the dependence of the viscosity on stress. The Stokes equation with the incompressibility constraint is generally nonlinearly coupled with the heat advection–diffusion equation.

In this section, rather than immediately solving the full nonlinear set of equations, we will provide examples of how to solve (12)–(14) one by one, under various simplifying assumptions, before embarking on a fully coupled problem. We will start with a simple worked-out example of a 1D Poisson equation which is arguably the simplest form of (14) under the assumption of zero velocity, which also eliminates (12) and (13) entirely. This will include the generation of shape functions, construction of the matrix–vector system, solution on a coarse mesh, comparisons between linear and quadratic elements, and convergence tests. This section is particularly intended for those new to finite element methodology and nomenclature. Those comfortable with basic FEM concepts but interested in the weak form formulation of PDEs and their FEM solution can skip forward to Sect. 2.2.3 where we describe the FEM implementation and software availability. This is followed by the extension of the Poisson heat diffusion problem to more than one dimension and the solution of the linear Stokes equation for a traditional cornerflow problem, neglecting temperature effects. We then combine the heat and Stokes equation in coupled problems using a standard mantle convection benchmark before focusing on simplified models of subduction zones. Unless explicitly mentioned otherwise, we will assume in all examples below that the equations are in nondimensional form.

Section 2.3 derives (12)–(14) from their dimensional form and discusses how they are used in kinematic–dynamic subduction zone models. Readers who are more interested in understanding how different modeling approaches for subduction zone thermal structure compare or how the models compare to observations are invited to skip forward to part III.

2.2.2 1D Poisson

As an introductory and simplified example, we will solve the Poisson equation on a 1D domain of unit length, \(\Upomega = [0,1]\). This can be derived from the steady-state form of (14) by assuming zero velocity and a constant thermal conductivity, and seeking the approximate solution of

$$\begin{aligned} -\frac{\text{d}^2 {T}}{\text{d}x^2}&= f \end{aligned}$$
(15)

where we choose for this example f = \(\frac{{H}}{k}\)=\(\frac{1}{4}\pi ^2 \sin \left( \frac{\pi x}{2} \right)\). At the boundaries, x = 0 and x = 1, we apply as boundary conditions (2)

$$\begin{aligned} {T}&= 0{} & {} \text {at } x=0 \end{aligned}$$
(16)
$$\begin{aligned} \frac{\text{d}{T}}{\text{d}x}&= 0{} & {} \text {at } x=1 \end{aligned}$$
(17)

The first boundary condition is an example of an essential or Dirichlet boundary condition where we specify the value of the solution. The second boundary condition is an example of a natural or Neumann boundary condition that can be interpreted to mean that the solution is symmetrical around x = 1. We will return to the various types of boundary conditions and their implementation in a later section. The analytical solution to (15) with given boundary conditions (16) and (17) is simply

$$\begin{aligned} {T} = \sin \left( \frac{\pi x}{2}\right) \end{aligned}$$
(18)
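That (18) satisfies (15)–(17) can be checked directly by differentiating twice and evaluating the two boundary conditions:

$$\begin{aligned} \frac{\text{d}T}{\text{d}x} = \frac{\pi }{2}\cos \left( \frac{\pi x}{2}\right) , \quad -\frac{\text{d}^2T}{\text{d}x^2} = \frac{\pi ^2}{4}\sin \left( \frac{\pi x}{2}\right) = f, \quad T(0) = \sin (0) = 0, \quad \left. \frac{\text{d}T}{\text{d}x}\right| _{x=1} = \frac{\pi }{2}\cos \left( \frac{\pi }{2}\right) = 0 \end{aligned}$$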

Minimization of the residual \({R}({\tilde{T}})\) following (4) and (7) leads to

$$\begin{aligned} - \int _0^1 \phi _i \frac{\text{d}^2 {\tilde{{T}}}}{\text{d} x^2}\text{d}x = \int _0^1 \phi _i f \text{d}x~~~~~~i=1,\ldots ,n \end{aligned}$$
(19)

By integrating the first term by parts, we find

$$\begin{aligned} \int _0^1 \frac{\text{d} \phi _i}{\text{d}x} \frac{\text{d} {\tilde{{T}}}}{\text{d}x} \text{d}x - \left[ \phi _i \frac{\text{d} {\tilde{{T}}}}{\text{d}x} \right] _0^1 = \int _0^1 \phi _i f \text{d}x ~~~~~~i=1,\ldots ,n \end{aligned}$$
(20)

where the second term can be dropped because at x = 1 we require \(\frac{\text{d} {\tilde{{T}}}}{\text{d}x}\) = 0 and the solution at x = 0 is known, \({\tilde{T}}\) = 0, so that this DOF can be lifted from the resulting matrix equation.

We can find the solution at the DOFs, \( {T}_j\), from the discrete \(n \times n\) matrix–vector system (8) where now

$$\begin{aligned} {{\textbf{S}}}&= {S}_{ij} = \int _0^1 \frac{\text{d}\phi _i}{\text{d}x} \frac{\text{d}\phi _j}{\text{d}x} ~\text{d}x \end{aligned}$$
(21)
$$\begin{aligned} {{\textbf{f}}}&= f_i = \int _0^1 \phi _i f ~\text{d}x \end{aligned}$$
(22)
$$\begin{aligned} {{\textbf{u}}}&= {{\textbf{T}}} = {T}_j \end{aligned}$$
(23)

where \({{\textbf{T}}}\) has components \({T}_j\) that define the continuous approximate solution

$$\begin{aligned} {\tilde{{{T}}}}(x) = \sum _{j=1}^n \phi _j(x) {{T}}_j \end{aligned}$$
(24)

and \({T}_0\) = 0.

The domain is divided into \(n_e\) elements of equal length, \(\Delta x\) = \(\frac{1}{n_e}\), with elements \(e_i\) and degrees of freedom \({T}_i\) ordered from x = 0 to x = 1. This introduces nodal points \(x_i\), 0\(\le i \le n\) (see Fig. 1a). A simple assumption for the Lagrange shape functions \(\phi _{i}\) is that the shape functions are linear within the elements. Such functions within a given element \(e_i\) (\(x_{i-1} \le x \le x_i\)), 1\(\le i \le n_e\), are

$$\begin{aligned} \lambda _{i-1} = \frac{x_{i} - x}{\Delta x}, ~~~~~~\lambda _{i} = \frac{x-x_{i-1}}{\Delta x} \end{aligned}$$
(25)

The functions \(\lambda _{j}\) are zero for all elements except \(e_{j}\) and \(e_{j+1}\) (\(\forall e_i \notin \{e_{j}, e_{j+1}\}\)). Since they fit the definition of linear Lagrange functions, we can write \(\phi _i=\lambda _i\). Within a given element \(e_i\), we can construct the interpolated approximate solution for \({\tilde{{T}}}\) from \({{\textbf{T}}}\) using

$$\begin{aligned} {\tilde{{T}}}(x) ~=~ {T}_{i-1} \phi _{i-1} (x) + {T}_{i} \phi _{i} (x) \end{aligned}$$
(26)

The expression is compact because all shape functions other than \(\phi _{i-1}\) and \(\phi _{i}\) are zero within this element. Note that the derivatives of the shape functions in this element are simply

$$\begin{aligned} \frac{\text{d} \phi _{i-1}}{\text{d}x} = -\frac{1}{\Delta x}, ~~~~~~\frac{\text{d} \phi _{i}}{\text{d}x} = \frac{1}{\Delta x} \end{aligned}$$
(27)

which allows for easy evaluation of the matrix coefficients.
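As an illustration of (25) and (26), the following minimal sketch evaluates the linear shape functions on a uniform mesh and interpolates nodal values inside an element. It uses plain Python rather than FEniCS, and the names `hat` and `interpolate` are hypothetical helpers introduced here only for illustration.

```python
import math

def hat(j, x, n_e):
    """Linear Lagrange shape function phi_j on a uniform mesh of [0, 1] with n_e elements."""
    dx = 1.0 / n_e
    # phi_j is 1 at node x_j and decays linearly to 0 at the neighboring nodes;
    # it is zero in every element that does not touch node j (compact support).
    return max(0.0, 1.0 - abs(x - j * dx) / dx)

def interpolate(T_nodes, x):
    """Evaluate (26): only the two shape functions of the element containing x contribute."""
    n_e = len(T_nodes) - 1
    i = min(int(x * n_e) + 1, n_e)  # element e_i spans [x_{i-1}, x_i]
    return T_nodes[i - 1] * hat(i - 1, x, n_e) + T_nodes[i] * hat(i, x, n_e)

n_e = 4
nodes = [i / n_e for i in range(n_e + 1)]
# Nodal values taken from the analytical solution (18), for illustration only.
T_nodes = [math.sin(0.5 * math.pi * x) for x in nodes]

partition = sum(hat(j, 0.3, n_e) for j in range(n_e + 1))  # shape functions sum to 1
T_mid = interpolate(T_nodes, 0.375)                        # midpoint of element e_2
```

At the midpoint of an element both shape functions equal 1/2, so the interpolated value is the mean of the two nodal values.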

Evaluation of the integrals in (21) and (22) allows us to construct (8) as

$$\begin{aligned} \frac{1}{\Delta x}\left( \begin{array}{ccccccc} 2 &{} -1 &{} 0 &{} \cdots &{} 0 &{} 0 &{} 0 \\ -1 &{} 2 &{} -1 &{} \cdots &{} 0 &{} 0 &{} 0 \\ \vdots &{} \vdots &{} \vdots &{} &{} \vdots &{} \vdots &{} \vdots \\ 0 &{} 0 &{} 0 &{} \cdots &{} -1 &{} 2 &{} -1 \\ 0 &{} 0 &{} 0 &{} \cdots &{} 0 &{} -1 &{} 1\end{array}\right) \left( \begin{array}{c} {T}_1\\ {T}_2\\ \vdots \\ {T}_{n-1}\\ {T}_{n}\end{array}\right) = \left( \begin{array}{c} \int f \phi _1 \text{d}x \\ \int f \phi _2 \text{d}x \\ \vdots \\ \int f \phi _{n-1} \text{d}x\\ \int f \phi _n \text{d}x\end{array}\right) \end{aligned}$$
(28)

The integral in the right-hand side vector \({{\textbf{f}}}\) can be found analytically or through numerical integration. The matrix may look familiar to those acquainted with finite difference approximations to the 1D Poisson equation where \(\text{d}^2{T}/\text{d}x^2\) is approximated by second-order central finite differences (for a derivation see, e.g., Cuvelier et al. 1986, their section 2.2.1). The matrix rows repeat triples (\(-\)1, 2, \(-\)1) to form a tridiagonal symmetric matrix for which (very) efficient solution methods exist.
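The assembly and solution of this system can be sketched in a few lines of plain Python. The sketch below is not the FEniCS implementation discussed in the text: the function name `assemble_and_solve_p1` is hypothetical, the load vector is integrated with a two-point Gauss rule per element (an assumption; analytical integration would work equally well), and the tridiagonal system is solved with the Thomas algorithm. Each element contributes \(\pm 1/\Delta x\) to the matrix, which follows from inserting (27) into (21).

```python
import math

def assemble_and_solve_p1(n_e):
    """Solve -T'' = f on [0, 1] with T(0) = 0, T'(1) = 0 using linear elements."""
    dx = 1.0 / n_e

    def f(x):
        return 0.25 * math.pi**2 * math.sin(0.5 * math.pi * x)

    # Unknowns are T_1..T_n with n = n_e; T_0 = 0 is removed from the system.
    n = n_e
    diag = [2.0 / dx] * n
    diag[-1] = 1.0 / dx          # the last node belongs to a single element
    off = -1.0 / dx              # constant sub- and super-diagonal entry

    # Load vector f_i = int f phi_i dx, accumulated element by element.
    load = [0.0] * (n_e + 1)
    g = 0.5 / math.sqrt(3.0)     # Gauss point offset on the unit interval
    for e in range(n_e):
        for s in (0.5 - g, 0.5 + g):
            w = 0.5 * dx * f((e + s) * dx)
            load[e] += w * (1.0 - s)     # lambda_{i-1} at the quadrature point
            load[e + 1] += w * s         # lambda_i at the quadrature point
    b = load[1:]                 # drop the row belonging to the Dirichlet node

    # Thomas algorithm: forward elimination followed by back substitution.
    for k in range(1, n):
        m = off / diag[k - 1]
        diag[k] -= m * off
        b[k] -= m * b[k - 1]
    T = [0.0] * n
    T[-1] = b[-1] / diag[-1]
    for k in range(n - 2, -1, -1):
        T[k] = (b[k] - off * T[k + 1]) / diag[k]
    return [0.0] + T

T = assemble_and_solve_p1(4)
exact = [math.sin(0.5 * math.pi * i / 4) for i in range(5)]
max_nodal_error = max(abs(a - b) for a, b in zip(T, exact))
```

On this coarse mesh the nodal values already lie close to the analytical solution (18); the error between the nodes is set by the linear interpolation (26).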

Implementation. While writing out the system of equations is instructive, and solutions can be constructed by manual Gaussian elimination for a small number of degrees of freedom n, solution of the equations governing subduction zone thermal structure requires significantly more involved code. Modern software design approaches have become available that allow us to develop numerical code using a relatively simple syntax in which the developer describes the problem in terms of the differential equation and boundary conditions, specifies the coefficients, the geometry and its discretization, and solution methods. We will provide a few examples of high-level syntax (written in Python) that can be used with the open-source FEniCS software (Logg et al. 2012) to produce a finite element code. We will first introduce this syntax and provide a more complete description of the approach that we use in Sect. 2.2.3.

[Listing 1: Python function solve_poisson_1d]

The one-dimensional heat diffusion problem (15)–(17) can be solved using FEniCS with the Python function solve_poisson_1d (Listing 1). Lagrange polynomials of order one (defined by the keyword argument p on line 17, which defaults to 1) are used to define a function space (V). Test (T_t) and trial (T_a) functions are defined on this function space, before being used to describe the integrals defining \({{\textbf{S}}}\) and \({{\textbf{f}}}\). The Dirichlet boundary condition at x = 0 is then declared as bc before being passed to a function solve that assembles the matrix–vector system, manipulates it to ensure satisfaction of the essential boundary condition and solves for T_i, the function containing the vector of values of \({\tilde{{T}}}\) at the DOFs \({T}_j\). Finally, the solution is returned.

Higher-order elements. We will use this simple example further to show that we can construct shape functions of higher order that allow us to find solutions that are (in general) more accurate with the same number of nodal points compared to solutions with lower order shape functions. We will construct quadratic Lagrange shape functions on the elements as shown in Fig. 1b. Note that each element now has an internal nodal point such that the number of nodal points for the fixed number of elements increases by nearly a factor of two compared to the linear P1 function space (Fig. 1a). Within an element \(e_i\) (\(x_{i-1} \le x \le x_i\)), there are three shape functions that are of quadratic form

$$\begin{aligned} \phi _{i-1} ~&=~ \frac{2}{\Delta x^2} (x - x_i)(x - x_{i-1,i}) =~ 2 \lambda _{i-1} ( \lambda _{i-1} - \tfrac{1}{2} ) \end{aligned}$$
(29)
$$\begin{aligned} \phi _{i-1,i} ~&=~ \frac{-4}{\Delta x^2} (x - x_{i-1})(x - x_i) = 4\lambda _{i-1}\lambda _i \end{aligned}$$
(30)
$$\begin{aligned} \phi _{i} ~&=~ \frac{2}{\Delta x^2} (x - x_{i-1})(x - x_{i-1,i}) = 2 \lambda _i ( \lambda _i - \tfrac{1}{2} ) \end{aligned}$$
(31)

with \(\lambda _i\) and \(\lambda _{i-1}\) defined in (25). We have used the notation \(\phi _{i-1,i}\) to identify the internal Lagrange polynomial centered in element \(e_i\) on the new internal nodal point \(x_{i-1,i}\). This also makes explicit the relation between the P1 nodal points and the edge nodal points (also called vertices) of the P2 elements and clarifies the relationship between P1 and P2 shape functions through (29)–(31). Note that the nonzero values of a quadratic Lagrange shape function may extend beyond the neighboring DOFs and they can be positive or negative depending on where its nodal point is located within an element. Note also that the shape functions now connect more nodal points to the central nodal point—which suggests matrix (28) changes form to have more entries per row than in the case of the P1 based matrix. In addition the matrix will have more rows since there are more nodal points for the same number of elements. Clearly the use of higher order elements comes at a greater computational cost since it is more expensive to solve a larger algebraic system.
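The properties of (29)–(31) can be checked numerically with a short sketch in plain Python. The helper names `p2_shape`, `p2_shape_deriv` and `p2_element_stiffness` are hypothetical, and the local coordinate \(s = (x - x_{i-1})/\Delta x\) is introduced here for convenience, so that \(\lambda _i = s\) and \(\lambda _{i-1} = 1 - s\). The element contribution to (21) is integrated with a two-point Gauss rule, which is exact for the quadratic integrand.

```python
import math

def p2_shape(s):
    """Quadratic shape functions (29)-(31) at local coordinate s in [0, 1].

    Returns (phi_{i-1}, phi_{i-1,i}, phi_i).
    """
    lam_prev, lam = 1.0 - s, s
    return (2.0 * lam_prev * (lam_prev - 0.5),
            4.0 * lam_prev * lam,
            2.0 * lam * (lam - 0.5))

def p2_shape_deriv(s, dx):
    """Derivatives with respect to x of the three shape functions."""
    return ((4.0 * s - 3.0) / dx, (4.0 - 8.0 * s) / dx, (4.0 * s - 1.0) / dx)

def p2_element_stiffness(dx):
    """3x3 element contribution to (21) from one quadratic element."""
    g = 0.5 / math.sqrt(3.0)
    K = [[0.0] * 3 for _ in range(3)]
    for s in (0.5 - g, 0.5 + g):          # Gauss points, each with weight dx / 2
        d = p2_shape_deriv(s, dx)
        for a in range(3):
            for b in range(3):
                K[a][b] += 0.5 * dx * d[a] * d[b]
    return K

values = p2_shape(0.3)      # the third entry is negative inside the element
K = p2_element_stiffness(1.0)
```

Every entry of the 3 × 3 element matrix is nonzero, so after assembly a row belonging to an element vertex couples to five DOFs, compared with three in the P1 case.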

Calling the Python function solve_poisson_1d with a second keyword argument p=2 allows us to solve the system with quadratic Lagrange shape functions. The script shows that only the definition of the FunctionSpace is changed by setting p=2. Figure 2 shows the approximate solution for linear and quadratic elements on a coarse grid compared to the analytical solution. Note that the P2 solution stays closer to the analytical solution than the P1 solution.

Convergence analysis. Repeating the numerical experiments with increasing \(n_e\) allows us to test the convergence of our approximate finite element solution to the known analytical solution (18). A key feature of any discretization technique is that with an increasing number of DOFs these solutions should converge, i.e. the error in our approximation should decrease. As an error metric, we will use the \({L}^2\) norm of the difference between the approximate, \({\tilde{{T}}}\), and analytical, T, solutions

$$\begin{aligned} {e_{L^2,P} = \sqrt{\int _\Upomega \left( {\tilde{T}} - T\right) ^2 {\text{d}}x}} \end{aligned}$$
(32)

where the subscript P stands for Poisson. The rate at which this decreases is known as the order of convergence. Numerical analysis predicts a certain order depending on the type of the polynomials used as finite element shape functions and other constraints related to the well-posedness of the problem. For piecewise-linear shape functions, we expect second-order convergence, that is, the error decreases as \(h^{2}\) where h is the nodal point spacing. With piecewise-quadratic elements, we expect to see third-order convergence. These expectations are met by the actual numerical experiments (Fig. 3). Convergence analysis is an essential way to test the accuracy of a numerical model, but it relies on having a known analytical solution and the ability to represent it and its boundary conditions in a discrete function space. We will discuss this issue in the context of other examples with increasing complexity below.
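A convergence test of this kind can be sketched for the P1 case in plain Python. This is an illustrative stand-in, not the FEniCS workflow used to produce Fig. 3: the helper names `solve_p1` and `l2_error` are hypothetical, the load vector is integrated with a two-point Gauss rule, and the error integral (32) is evaluated with a three-point Gauss rule per element. The observed order is estimated from the ratio of errors on meshes whose spacing differs by a factor of two.

```python
import math

def solve_p1(n_e):
    """P1 finite element solution of (15)-(17); returns nodal values T_0..T_n."""
    dx = 1.0 / n_e
    g = 0.5 / math.sqrt(3.0)
    b = [0.0] * (n_e + 1)
    for e in range(n_e):                      # load vector by two-point Gauss rule
        for s in (0.5 - g, 0.5 + g):
            x = (e + s) * dx
            w = 0.5 * dx * 0.25 * math.pi**2 * math.sin(0.5 * math.pi * x)
            b[e] += w * (1.0 - s)
            b[e + 1] += w * s
    b = b[1:]                                 # remove the Dirichlet node at x = 0
    diag = [2.0 / dx] * (n_e - 1) + [1.0 / dx]
    off = -1.0 / dx
    for k in range(1, n_e):                   # Thomas algorithm, forward sweep
        m = off / diag[k - 1]
        diag[k] -= m * off
        b[k] -= m * b[k - 1]
    T = [0.0] * n_e
    T[-1] = b[-1] / diag[-1]
    for k in range(n_e - 2, -1, -1):          # back substitution
        T[k] = (b[k] - off * T[k + 1]) / diag[k]
    return [0.0] + T

def l2_error(T):
    """L2 norm (32) of the difference between the P1 solution and (18)."""
    n_e = len(T) - 1
    dx = 1.0 / n_e
    r = 0.5 * math.sqrt(0.6)
    rule = ((0.5 - r, 5.0 / 18.0), (0.5, 8.0 / 18.0), (0.5 + r, 5.0 / 18.0))
    total = 0.0
    for e in range(n_e):
        for s, w in rule:
            approx = T[e] * (1.0 - s) + T[e + 1] * s
            exact = math.sin(0.5 * math.pi * (e + s) * dx)
            total += w * dx * (approx - exact) ** 2
    return math.sqrt(total)

errors = {n_e: l2_error(solve_p1(n_e)) for n_e in (8, 16, 32, 64)}
orders = [math.log2(errors[n] / errors[2 * n]) for n in (8, 16, 32)]
```

For linear elements the entries of `orders` should be close to 2, in line with the expected second-order convergence.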

2.2.3 Practical approaches, software availability and comparison

Traditionally, finite element methods have been implemented using Fortran or C/C++ based codes that, at the core, build the matrix–vector system (8) by numerical integration of (9) and (10) after which this system is solved by linear algebraic solvers. Most FEM codes provide options for time dependence and the ability to solve nonlinear and nonlinearly coupled systems of PDEs. Examples of such codes that have been used in geodynamical applications including subduction zone modeling are ConMan (King et al. 1990), Sopale (Fullsack 1995), Underworld (Moresi et al. 2007), CitcomS (Zhong et al. 2008), MILAMIN (Dabrowski et al. 2008), ASPECT (Kronbichler et al. 2013), Sepran (van den Berg et al. 2015), Fluidity (Davies et al. 2011), and Rhea (Burstedde et al. 2013). A number of these are distributed as open-source software and many among those are currently maintained through the Computational Infrastructure for Geodynamics (geodynamics.org). These implementations can be shown to be accurate using intercomparisons and benchmarks (e.g., King et al. 2010; van Keken et al. 2008; Euen et al. 2022; Davies et al. 2011) and make use of advances in parallel computing and efficient linear algebra solver techniques. Yet, modifications to the existing code require deep insight into the structure of the Fortran/C/C++ code, which is not trivial for experienced, let alone beginning, users.

In recent years, an alternative approach for FEM has become available which elevates the user interface to simply specifying the FEM problem and solution method in a high-level language, an example of which is shown in Listing 1. The Python code is used to automatically build a finite element model that can be executed in a variety of environments ranging from Jupyter notebooks (jupyter.org) and desktop computers to massively parallel high performance computers. Two prominent examples of this approach are Firedrake (www.firedrakeproject.org) and FEniCS (www.fenicsproject.org). Examples of the use of these two approaches in geodynamical applications are in Davies et al. (2022) and Vynnytska et al. (2013).

We will focus on the use of the FEniCS (“Finite Elements in Computational Sciences”; Alnæs et al. 2015) approach to solving finite element equations. FEniCS is a suite of open-source numerical libraries for the description of finite element problems. Most importantly, it provides a high-level, human-readable language for the description of equations in Python (the “Unified Form Language” (UFL); Alnæs et al. 2014, an example of which we provided in Listing 1) and a compiler (the “FEniCS Form Compiler” (FFC); Kirby and Logg 2006) to write fast code to assemble the resulting discrete matrix–vector system. We will specifically use FEniCS within TerraFERMA (the “Transparent Finite Element Rapid Model Assembler;” Wilson et al. 2017). TerraFERMA provides a graphical user interface (using the “System for Problem Description” (SPuD); Ham et al. 2009) that allows users to describe the geometry, variables and boundary conditions of their problem and construct physics-based solvers using PETSc (the “Portable Extensible Toolkit for Scientific computation;” Balay et al. 2023).

TerraFERMA aims to increase transparency in modeling by exposing all options, including the equations, in a single options file that can be validated and automatically updated, which increases reproducibility. We provide all options files used in the following sections in a repository and in a docker image (see material contained in the zenodo repository referenced in the data availability statement) for readers to try. In addition to results from TerraFERMA, we compare some solutions with the aforementioned finite element package Sepran which has been used extensively in subduction zone modeling (e.g., Syracuse et al. 2010; van Keken et al. 2011). Sepran is not an open-source code but allows for direct comparisons between independent finite element methods to establish their relative precision.

2.2.4 The Poisson equation beyond 1D

We can generalize (and formalize) the description of the Poisson equation using the steady-state heat diffusion equation in multiple dimensions, where (14) becomes

$$\begin{aligned} -\nabla \cdot \left( k \nabla {T} \right)&= {H}{} & {} \text {in }\Upomega \end{aligned}$$
(33)

after assuming zero velocity. T is the temperature solution we are seeking, k is the thermal conductivity, and H is a heat source. If k is constant in space, we can simplify (33) to

$$\begin{aligned} -\nabla ^2 {T}&= f{} & {} \text {in }\Upomega \end{aligned}$$
(34)

where \(f = \frac{{H}}{k}\).

Boundary conditions. We supplement (34) with some combination of the boundary conditions (2)

$$\begin{aligned} {T}&= g_D{} & {} \text {on } \partial \Upomega _D \subset \partial \Upomega \end{aligned}$$
(35)
$$\begin{aligned} \nabla {T}\cdot \widehat{{\vec {n} }} &= g_N{} & {} \text {on } \partial \Upomega _N \subset \partial \Upomega \end{aligned}$$
(36)
$$\begin{aligned} a{T + \nabla T}\cdot \widehat{{\vec {n} }} &= g_R{} & {} \text {on } \partial \Upomega _R \subset \partial \Upomega \end{aligned}$$
(37)

where \(\partial \Upomega _D\), \(\partial \Upomega _N\) and \(\partial \Upomega _R\) are segments of the domain boundary that do not overlap (\(\partial \Upomega _D \bigcap \partial \Upomega _N =\)Ø, \(\partial \Upomega _D \bigcap \partial \Upomega _R =\)Ø, \(\partial \Upomega _N \bigcap \partial \Upomega _R =\)Ø) and that together span the entire boundary (\(\partial \Upomega _D \bigcup \partial \Upomega _N \bigcup \partial \Upomega _R = \partial \Upomega\)). The unit outward-pointing normal to the boundary \(\partial \Upomega\) is denoted by \(\widehat{{\vec {n} }}\) , and \(g_D = g_D( \vec {x}, t)\), \(g_N = g_N( \vec {x}, t)\) and \(g_R = g_R( \vec {x}, t)\) are known functions of space and time. Equation (35) is known as a Dirichlet boundary condition and specifies the value of the solution on \(\partial \Upomega _D\). Equation (36) is a Neumann boundary condition and specifies the value of the flux through \(\partial \Upomega _N\). Finally, Equation (37) is a Robin boundary condition, which describes a linear combination of the flux and the solution on \(\partial \Upomega _R\).

Weak form The first step in the finite element discretization of (34) is to transform it into its weak form. Following (7), this requires multiplying the equation by a test function, \({T}_t\), and integrating over the domain \(\Upomega\)

$$\begin{aligned} -\int _\Upomega {T}_t \nabla ^2 {T} ~\text{d}x = \int _\Upomega {T}_t f ~\text{d}x \end{aligned}$$
(38)

After integrating the left-hand side by parts

$$\begin{aligned} \int _\Upomega \nabla {T}_t \cdot \nabla {T} ~\text{d}x - \int _{\partial \Upomega } {T}_t \nabla T\cdot \widehat{{\vec {n} }} ~\text{d}s = \int _\Upomega {T}_t f ~\text{d}x \end{aligned}$$
(39)

we can see that we have reduced the continuity requirements on T by only requiring its first derivative to be bounded across \(\Upomega\) (see Hughes 1987, for a more formal discussion of the requirements on the solution). Integrating by parts also allows Neumann and Robin boundary conditions to be imposed “naturally” through the second integral on the left-hand side since this directly incorporates the flux components across the boundary. In this formulation, Dirichlet conditions cannot be imposed weakly and are referred to as essential boundary conditions: they are required of the solution but do not arise naturally in the weak form. The weak form therefore becomes: find T such that T=\(g_D\) on \(\partial \Upomega _D\) and

$$\begin{aligned} \int _\Upomega \nabla {T}_t \cdot \nabla {T} ~\text{d}x - \int _{\partial \Upomega _N} {T}_t g_N ~\text{d}s - \int _{\partial \Upomega _R} {T}_t \left( g_R - a{T}\right) ~\text{d}s = \int _\Upomega {T}_t f ~\text{d}x \end{aligned}$$
(40)

for all \({T}_t\) such that \({T}_t=0\) on \(\partial \Upomega _D\).

Discretization The weak (40) and strong (34)–(37) forms of the problem are equivalent so long as the solution is sufficiently smooth. We make our first approximation to the solution by seeking the trial function \({\tilde{{T}}}\) such that \({\tilde{{T}}} = g_D\) on \(\partial \Upomega _D\) where

$$\begin{aligned} {T} \approx {\tilde{{T}}} = \sum _j \phi _j {T}_j \end{aligned}$$
(41)

for all test functions \({\tilde{T}}_t\) where

$$\begin{aligned} {T}_t \approx {\tilde{{T}}}_t = \sum _i \phi _i {T}_{ti} \end{aligned}$$
(42)

noting again that \({\tilde{{T}}}_t = 0\) on \(\partial \Upomega _D\). The finite element shape functions \(\phi _j\) are as discussed earlier. Assuming these are continuous across elements of the mesh, (41) and (42) can be substituted into (40) to yield

$$\begin{aligned}{} & {} \sum _i\sum _j {T}_{ti}{T}_j\sum _k \int _{e_k} \nabla \phi _i \cdot \nabla \phi _j ~\text{d}x + \sum _i\sum _j {T}_{ti}{T}_j \sum _k \int _{\partial e_k \cap {\partial \Upomega _R}} \phi _i a\phi _j ~\text{d}s\nonumber \\{} & {} - \sum _i {T}_{ti} \sum _k \int _{\partial e_k \cap {\partial \Upomega _N}} \phi _i g_N ~\text{d}s - \sum _i {T}_{ti} \sum _k \int _{\partial e_k \cap {\partial \Upomega _R}} \phi _i g_R ~\text{d}s\nonumber \\{} & {} = \sum _i {T}_{ti} \sum _k \int _{e_k} \phi _i f ~\text{d}x \end{aligned}$$
(43)

where we are integrating over the whole domain by summing the integrals over all the elements \(e_k\) (\(\int _\Upomega \text{d}x\)=\(\sum _k\int _{e_k} \text{d}x\)). Note that in practice, because the shape functions are zero over most of the domain, only element integrals with nonzero values need be included in the summation. The element boundaries, \(\partial e_k\), are only of interest (due to the assumed continuity of the shape functions between the elements) if they either intersect with \(\partial \Upomega _N\), \(\partial e_k \cap {\partial \Upomega _N}\) or \(\partial \Upomega _R\), \(\partial e_k \cap {\partial \Upomega _R}\). Since the solution of the now discretized weak form should be valid for all \({\tilde{{T}}}_t\) , we can drop \({T}_{ti}\) from (43)

$$\begin{aligned}{} & {} \sum _j{T}_j\sum _k \int _{e_k} \nabla \phi _i \cdot \nabla \phi _j ~\text{d}x + \sum _j{T}_j\sum _k \int _{\partial e_k \cap {\partial \Upomega _R}} \phi _i a \phi _j ~\text{d}s \nonumber \\{} & {} - \sum _k \int _{\partial e_k \cap {\partial \Upomega _N}} \phi _i g_N ~\text{d}s - \sum _k \int _{\partial e_k \cap {\partial \Upomega _R}} \phi _i g_R~\text{d}s = \sum _k \int _{e_k} \phi _i f ~\text{d}x \end{aligned}$$
(44)

This represents a matrix–vector system of the form of (8) with

$$\begin{aligned} {{\textbf{S}}}&= {S}_{ij} = \sum _k\int _{e_k} \nabla \phi _i \cdot \nabla \phi _j ~\text{d}x + \sum _k \int _{\partial e_k \cap {\partial \Upomega _R}} \phi _i a\phi _j ~\text{d}s \end{aligned}$$
(45)
$$\begin{aligned} {{\textbf{f}}}&= f_i = \sum _k \int _{e_k} \phi _i f ~\text{d}x + \sum _k \int _{\partial e_k \cap {\partial \Upomega _N}} \phi _i g_N ~\text{d}s + \sum _k \int _{\partial e_k \cap {\partial \Upomega _R}} \phi _i g_R~\text{d}s \end{aligned}$$
(46)
$$\begin{aligned} {{\textbf{u}}}&= {{\textbf{T}}} = {T}_j \end{aligned}$$
(47)

The compact support of the shape functions \(\phi _{(i,j)}\), which limits their nonzero values to the elements immediately neighboring DOF i or j, means that the integrals in (45) and (46) can be evaluated efficiently by only considering shape functions associated with an element \(e_k\). It also means that the resulting matrix \({{\textbf{S}}}\) is sparse, with most entries being zero. These properties can be seen by considering a one-dimensional version of (34) as discussed in Sect. 2.2.2.
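To make the element-by-element assembly concrete, the following is a minimal sketch in plain Python (not FEniCS) of the one-dimensional analogue of (45) and (46). It assumes a uniform mesh of P1 elements on [0, 1], no Robin boundary, a Dirichlet condition at x = 0 and a Neumann condition at x = 1. The manufactured solution T = exp(x), the midpoint quadrature for the source term, and all function names are illustrative choices rather than part of the text above.

```python
import math

def assemble_p1(n_el):
    """Assemble S and f for -T'' = f on [0, 1] with n_el uniform P1 elements."""
    n = n_el + 1                       # number of DOFs (mesh nodes)
    h = 1.0 / n_el
    S = [[0.0] * n for _ in range(n)]
    f = [0.0] * n
    # Element stiffness: integral of phi_a' * phi_b' over one element
    k_loc = [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]
    for k in range(n_el):              # loop over elements e_k
        dofs = (k, k + 1)              # only two shape functions are nonzero on e_k
        f_mid = -math.exp((k + 0.5) * h)   # f = -T'' for the assumed T = exp(x)
        for a in range(2):
            # Midpoint quadrature: phi_a = 1/2 at the element midpoint
            f[dofs[a]] += 0.5 * f_mid * h
            for b in range(2):
                S[dofs[a]][dofs[b]] += k_loc[a][b]
    # Natural (Neumann) condition at x = 1: add phi_i * g_N with g_N = T'(1) = e
    f[n - 1] += math.exp(1.0)
    # Essential (Dirichlet) condition at x = 0: replace row 0 by T_0 = g_D = 1
    S[0] = [1.0] + [0.0] * (n - 1)
    f[0] = 1.0
    return S, f

def solve_tridiagonal(S, f):
    """Thomas algorithm; applicable because S is tridiagonal."""
    n = len(f)
    diag = [S[i][i] for i in range(n)]
    rhs = list(f)
    for i in range(1, n):              # forward elimination
        m = S[i][i - 1] / diag[i - 1]
        diag[i] -= m * S[i - 1][i]
        rhs[i] -= m * rhs[i - 1]
    T = [0.0] * n
    T[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):     # back substitution
        T[i] = (rhs[i] - S[i][i + 1] * T[i + 1]) / diag[i]
    return T

def max_nodal_error(n_el):
    S, f = assemble_p1(n_el)
    T = solve_tridiagonal(S, f)
    return max(abs(T[i] - math.exp(i / n_el)) for i in range(n_el + 1))

S, f = assemble_p1(8)
nonzeros = sum(1 for row in S for entry in row if entry != 0.0)
err_coarse, err_fine = max_nodal_error(8), max_nodal_error(16)
```

Each element only touches the two DOFs it contains, so the assembled matrix is tridiagonal and `nonzeros` grows linearly with the number of DOFs rather than quadratically. With this quadrature, the nodal error should decrease roughly fourfold when the element size is halved.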

For an example of the implementation of the 2D Poisson problem on a unit square see Listing 2 with convergence tests and solution in Fig. 4. In this case, we use a manufactured solution (that is, one that is not necessarily an example of a solution to a PDE representing a naturally occurring physical problem) where we take a known analytical solution T(x, y) and substitute this into (34) to find f and then use this as the right-hand side in our numerical test. We choose T(x, y) = \(\exp \left( x+\tfrac{y}{2}\right)\) which is the solution to

$$\begin{aligned} - \nabla ^2 {T} = -\tfrac{5}{4} \exp \left( x+\tfrac{y}{2} \right) \end{aligned}$$
(48)

Solving (48) numerically in a unit square, \(\Upomega =[0,1]\times [0,1]\), for the approximate solution \({\tilde{{T}}} \approx {T}\), we impose the boundary conditions

$$\begin{aligned} {\tilde{{T}}}&= \exp \left( x+\tfrac{y}{2}\right){} & {} \text {on } \partial \Upomega \text { where } x=0 \text { or } y=0 \end{aligned}$$
(49)
$$\begin{aligned} \nabla {\tilde{{T}}}\cdot \widehat{{\vec {n} }} &= \exp \left( x + \tfrac{y}{2}\right){} & {} \text {on } \partial \Upomega \text { where } x=1 \end{aligned}$$
(50)
$$\begin{aligned} \nabla {\tilde{{T}}}\cdot \widehat{{\vec {n} }} &= \tfrac{1}{2}\exp \left( x + \tfrac{y}{2}\right){} & {} \text {on } \partial \Upomega \text { where } y=1 \end{aligned}$$
(51)

where (49) represents an essential Dirichlet condition on the value of \({\tilde{{T}}}\) and (50)–(51) are natural Neumann conditions on \(\nabla {\tilde{{T}}}\).
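Before running a convergence test, it is worth confirming that the manufactured right-hand side and boundary data are mutually consistent. The sketch below does this with central finite differences in plain Python; the helper names, sample points and step sizes are our own illustrative choices.

```python
import math

def T_exact(x, y):
    """Manufactured solution T = exp(x + y/2)."""
    return math.exp(x + 0.5 * y)

def f_rhs(x, y):
    """Right-hand side of (48): f = -(5/4) exp(x + y/2)."""
    return -1.25 * math.exp(x + 0.5 * y)

def minus_laplacian_fd(T, x, y, h=1e-3):
    """Second-order central-difference estimate of -laplacian(T) at (x, y)."""
    d2x = (T(x + h, y) - 2.0 * T(x, y) + T(x - h, y)) / h**2
    d2y = (T(x, y + h) - 2.0 * T(x, y) + T(x, y - h)) / h**2
    return -(d2x + d2y)

def normal_flux_fd(T, x, y, normal, h=1e-5):
    """Central-difference estimate of grad(T) . n at (x, y) for unit normal n."""
    dTdx = (T(x + h, y) - T(x - h, y)) / (2.0 * h)
    dTdy = (T(x, y + h) - T(x, y - h)) / (2.0 * h)
    return dTdx * normal[0] + dTdy * normal[1]

# Compare -laplacian(T) with f at a few interior sample points
samples = [(0.2, 0.3), (0.5, 0.5), (0.9, 0.1)]
pde_mismatch = max(abs(minus_laplacian_fd(T_exact, x, y) - f_rhs(x, y))
                   for (x, y) in samples)

# Neumann data: outward normal is (1, 0) on x = 1 and (0, 1) on y = 1
flux_x1 = normal_flux_fd(T_exact, 1.0, 0.4, (1.0, 0.0))   # compare with (50)
flux_y1 = normal_flux_fd(T_exact, 0.4, 1.0, (0.0, 1.0))   # compare with (51)
```

The mismatch between the finite-difference Laplacian and f is limited only by the truncation and round-off error of the difference formulas, and the two fluxes reproduce the right-hand sides of (50) and (51) at the sampled boundary points.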

Listing 2 shows an implementation of this problem using FEniCS, which returns the approximate solution \({\tilde{{T}}}\). Comparison of this to the analytical solution T using the metric (32) gives the expected order of convergence for the P1 and P2 elements (see Fig. 4).

figure c

2.2.5 Batchelor cornerflow problem

The solid flow in a subduction zone is primarily driven by the motion of the downgoing slab entraining material in the mantle wedge and dragging it down with it setting up a cornerflow in the mantle wedge (see, e.g., Figure 1a in part I). This effect can be simulated by imposing the motion of the slab as a kinematic boundary condition at the base of the dynamic mantle wedge, allowing us to drop the buoyancy term from (12), \(\vec{f}_B\) = 0. With the further assumption of an isoviscous rheology, \(2\eta\) = 1, the momentum and mass equations simplify to

$$\begin{aligned} -\nabla \cdot \left( \frac{\nabla \vec {v} + \nabla \vec {v}^T}{2}\right) + \nabla {P}&= 0{} & {} \text {in }\Upomega \end{aligned}$$
(52)
$$\begin{aligned} \nabla \cdot \vec {v}&= 0{} & {} \text {in }\Upomega \end{aligned}$$
(53)

Here, \(\vec {v}\) is the velocity of the mantle in the subduction zone wedge, \(\Upomega\), and P is the pressure. Imposing isothermal conditions means that (14) has been dropped altogether. With these simplifications, we can test our numerical solution to (52) and (53) against the analytical solution provided by Batchelor (1967).

Analytical solution To more easily describe the analytical solution, we consider the cornerflow geometry in Fig. 5a where we have rotated the mantle wedge by 90\(^\circ\) counterclockwise and assumed a 90\(^\circ\) angle between the wedge boundaries. In this geometry Eqs. (52) and (53) can be transformed into a biharmonic equation for the stream function, \(\psi\),

$$\begin{aligned} \nabla ^4 \psi = 0 \end{aligned}$$
(54)

where \(\psi = \psi (r,\theta )\) is a function of the radius, r, and angle from the x-axis, \(\theta\), related to the velocity, \(\vec {v} = \vec {v}(x, y)\) by

$$\begin{aligned} \vec {v} = \left( \begin{array}{cc}\cos \theta &{} -\sin \theta \\ \sin \theta &{} \cos \theta \end{array}\right) \left( \begin{array}{c}\frac{1}{r}\frac{\partial \psi }{\partial \theta } \\ -\frac{\partial \psi }{\partial r}\end{array}\right) \end{aligned}$$
(55)

With semi-infinite x and y axes, a rigid boundary condition, \(\vec {v} = \textbf{0}\), along the y-axis (the rotated “crust” at the top of the wedge), and a kinematic boundary condition on the x-axis (the “slab” surface at the base of the wedge), \(\vec {v} = (\text{U}, 0)^T\), the analytical solution is found as

$$\begin{aligned} \psi (r, \theta )~=~-\frac{r {\text{U}} }{\frac{1}{4}\pi ^2-1} \left( -\frac{1}{4}\pi ^2 \sin \theta + \frac{1}{2}\pi \theta \sin \theta + \theta \cos \theta \right) \end{aligned}$$
(56)
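The analytical velocity follows from the stream function through (55). The sketch below evaluates it in plain Python using central differences for the derivatives of \(\psi\), and checks the two boundary conditions. The overall sign of \(\psi\) is taken such that (55) returns \(\vec {v} = (\text{U}, 0)^T\) on the slab surface; U = 1, the step size and the function names are illustrative assumptions.

```python
import math

U = 1.0  # slab speed (nondimensional; value chosen for illustration)

def psi(r, theta):
    """Cornerflow stream function, signed so that (55) gives v = (U, 0) on theta = 0."""
    c = 0.25 * math.pi**2 - 1.0
    return -r * U / c * (-0.25 * math.pi**2 * math.sin(theta)
                         + 0.5 * math.pi * theta * math.sin(theta)
                         + theta * math.cos(theta))

def velocity(x, y, h=1e-6):
    """Cartesian velocity (vx, vy) at (x, y) with r > 0, following (55)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    # Polar components: v_r = (1/r) dpsi/dtheta, v_theta = -dpsi/dr
    v_r = (psi(r, theta + h) - psi(r, theta - h)) / (2.0 * h * r)
    v_theta = -(psi(r + h, theta) - psi(r - h, theta)) / (2.0 * h)
    # Rotate the polar components into Cartesian components
    vx = math.cos(theta) * v_r - math.sin(theta) * v_theta
    vy = math.sin(theta) * v_r + math.cos(theta) * v_theta
    return vx, vy

v_slab = velocity(0.5, 0.0)    # a point on the x-axis ("slab" surface)
v_crust = velocity(0.0, 0.5)   # a point on the y-axis (rigid "crust")
```

Evaluating `velocity` along x = 1 and y = 1 is one way to generate the Dirichlet data needed when the problem is truncated to a finite domain.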

Discretization Since it is not possible with our numerical approach to solve the equations in a semi-infinite domain, we discretize (52) and (53) in a unit square domain with unit length in the x and y directions, as in Fig. 5b. We choose different function spaces, with different shape functions, \(\vec {\omega } _j(x)\) and \(\chi _j(x)\) for the approximations of \(\vec {v}\) and P, respectively, such that

$$\begin{aligned} \vec{v} \approx \tilde{{\vec{v} }} &= \sum _j \omega ^k_j v^k_j \end{aligned}$$
(57)
$$\begin{aligned} {P} \approx {\tilde{{P}}}&= \sum _j \chi _j {P}_j \end{aligned}$$
(58)

where \(v^k_j\) and \({P}_j\) are the values of velocity and pressure at node j, respectively, and the superscript k represents the spatial component of \(\vec {v}\). The discrete test functions \(\tilde{ \vec {v}}_t\) and \({\tilde{{P}}}_t\) are similarly defined. We will discuss the choice of \(\vec {\omega } _{j} = \omega ^k_j\) and \(\chi _j\) later but simply assume that they are continuous across elements of the mesh in the following.

Boundary conditions To match the analytical solution (56), we apply essential Dirichlet conditions on \(\tilde{ \vec {v}}\) on all four sides of the domain

$$\begin{aligned} \tilde{ \vec {v}}&= (0,0)^T{} & {} \text {on } \partial \Upomega \text { where } x=0 \end{aligned}$$
(59)
$$\begin{aligned} \tilde{ \vec {v}}&= ({\text{U}}, 0)^T{} & {} \text {on } \partial \Upomega \text { where } y=0 \end{aligned}$$
(60)
$$\begin{aligned} \tilde{ \vec {v}}&= \vec {v}{} & {} \text {on } \partial \Upomega \text { where } x=1 \text { or } y = 1 \end{aligned}$$
(61)

Note that the first two conditions imply a discontinuity in the solution for \(\tilde{ \vec {v}}\) at (x, y) = (0, 0). The last boundary condition simply states that we apply the analytical solution (obtained from (56) via (55)) at the boundaries at x = 1 and y = 1. One consequence of applying essential boundary conditions on \(\vec {v}\) on all sides of the domain is that P is unconstrained up to a constant value as only its spatial derivatives appear in the equations. The ability to add an arbitrary constant to the pressure is referred to as the pressure containing a null space. This makes it impossible to find a unique solution to (52) and (53) with (59)–(61) since an infinite number of pressure solutions exist. There are a number of ways to select an appropriate pressure solution. Here, we arbitrarily choose one such solution by adding the condition that

$$\begin{aligned} {\tilde{{P}}}&= 0{} & {} \text {at } (x, y) = (0,0) \end{aligned}$$
(62)

which will allow a unique solution to the discrete equations to be found.

Weak form Multiplying (52) by \(\vec {v}_t\) and (53) by \({P}_t\), integrating (by parts) over \(\Upomega\), and discretizing the test and trial functions allows the discrete matrix–vector system of the form of (8) to be written as

$$\begin{aligned} {{\textbf{S}}}&= \left( \begin{array}{cc}{{\textbf{K}}} &{} {{\textbf{G}}} \\ {{\textbf{D}}} &{} {{\textbf{0}}}\end{array}\right) \end{aligned}$$
(63)
$$\begin{aligned} {{\textbf{K}}}&= K_{i_1j_1} = \sum _k\int _{e_k} \left( \frac{\nabla \vec {\omega } _{i_1} + \nabla \vec {\omega } _{i_1}^T}{2}\right) :\left( \frac{\nabla \vec {\omega } _{j_1} + \nabla \vec {\omega } _{j_1}^T}{2}\right) \text{d}x \end{aligned}$$
(64)
$$\begin{aligned} {{\textbf{G}}}&= G_{i_1j_2} = - \sum _k \int _{e_k} \nabla \cdot \vec {\omega } _{i_1} \chi _{j _2} \text{d}x \end{aligned}$$
(65)
$$\begin{aligned} {{\textbf{D}}}&= D_{i_2j_1} = - \sum _k \int _{e_k} \chi _{i_2} \nabla \cdot \vec {\omega } _{j_1} \text{d}x \end{aligned}$$
(66)
$$\begin{aligned} {{\textbf{u}}}&= \left( {{ \textbf {v}}}, {{\textbf{P}}}\right) ^T = \left( \vec {v}_{j_1}, {P}_{j_2}\right) ^T \end{aligned}$$
(67)
$$\begin{aligned} {{\textbf{f}}}&= f_i = 0 \end{aligned}$$
(68)

Note that in (64)–(66) all surface integrals around \(\partial \Upomega\) arising from integration by parts have been dropped because the velocity solution is fully specified on all boundaries. Additionally, when integrating (64) by parts we have used the fact that \(\nabla \vec {\omega } _{i_1}:\left( \frac{\nabla \vec {\omega } _{j_1} + \nabla \vec {\omega } _{j_1}^T}{2}\right) = \left( \frac{\nabla \vec {\omega } _{i_1} + \nabla \vec {\omega } _{i_1}^T}{2}\right) :\left( \frac{\nabla \vec {\omega } _{j_1} + \nabla \vec {\omega } _{j_1}^T}{2}\right)\) to demonstrate the symmetry of \({{\textbf{K}}}\). In fact, \({{\textbf{S}}}\) has been made symmetric by integrating the gradient of pressure term, \(\nabla {P}\), by parts in (65) and negating (53) in (66) such that \({{\textbf{G}}} = {{\textbf{D}}}^T\). This symmetry property can be exploited when choosing an efficient method of solving (8).

As before, the weak form of (63) may be described using UFL with the rather simple Python code shown in Listing 3. For the sake of brevity, we have assumed that the test and trial functions v_t, p_t, v_a and p_a have been declared. Additional code is also required to fully describe the boundary conditions and solve the resulting system. The full example is provided as a TerraFERMA input file in the material contained in the Zenodo repository referenced in the data availability statement.

figure d

An important aspect of \({{\textbf{S}}}\) is that it describes a so-called saddle point system. The lower right block is zero, which indicates that pressure is acting in this system as a Lagrange multiplier, enforcing the constraint that the velocity is divergence free but not appearing in (53) itself. Such systems require special consideration of the choice of shape functions for the discrete approximations of velocity and pressure to ensure the stability of the solution, \({{\textbf{u}}}\). Several choices of so-called stable element pairs, \(( \vec {\omega } _j, \chi _j)\) are available in the literature (e.g., Auricchio et al. 2017). Here we select the frequently used lowest order Taylor–Hood element pair, in which \(\vec {\omega } _j\) are piecewise-quadratic and \(\chi _j\) are piecewise-linear polynomials, referred to on triangular (and tetrahedral in 3D) meshes as P2P1. This fulfills a necessary (but not sufficient) criterion for stability that the velocity has more DOFs than the pressure. Solving (63)–(68) subject to the conditions (59)–(62) on a series of successively finer meshes and comparing the resulting solution to the analytical result given by (56) and (55) using the error metric

$$\begin{aligned} {e_{L^2,B}} = \sqrt{\int _\Upomega (\tilde{ \vec {v}}- \vec {v})\cdot (\tilde{ \vec {v}}- \vec {v}) \text{d}x } \end{aligned}$$
(69)

(where B stands for Batchelor) shows linear rather than quadratic convergence. We encourage the readers to convince themselves of this by running the example. This first-order convergence rate is lower than would be expected for piecewise-quadratic velocity functions. This drop in convergence is caused by the boundary conditions at the origin being discontinuous, which cannot be represented in the selected function space and results in a pressure singularity at that point. This is an example where convergence analysis demonstrates suboptimal results due to our inability to represent the solution in the selected finite element function space.
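The observed order of convergence can be estimated directly from the errors on two successive meshes. The following plain-Python sketch assumes the error behaves as \(e \approx C h^p\) for element size h. The error values are synthetic, generated from that model purely to illustrate the calculation, and are not results from the benchmark.

```python
import math

def observed_orders(hs, errors):
    """Estimate p in e ~ C h^p from errors on successively finer meshes."""
    return [math.log(errors[i] / errors[i + 1]) / math.log(hs[i] / hs[i + 1])
            for i in range(len(hs) - 1)]

# Element sizes for a sequence of uniformly refined meshes
hs = [1.0 / n for n in (10, 20, 40, 80)]

# Synthetic errors generated from e = C h^p (illustrative values only)
errors_first_order = [0.3 * h for h in hs]
errors_second_order = [0.3 * h**2 for h in hs]

orders_first = observed_orders(hs, errors_first_order)
orders_second = observed_orders(hs, errors_second_order)
```

Applied to values of (69) computed on a series of meshes, the same function returns the convergence rate discussed above. A rate that stays near one under refinement points to limited regularity of the solution rather than to an error in the implementation.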

2.2.6 Blankenbach thermal convection benchmark

Before discussing the solution of the full governing equations for subduction zone thermal structure, we will explore solving the equations governing a buoyancy-driven convection model in a square domain following the steady-state mantle convection benchmarks from Blankenbach et al. (1989). This example allows us to couple a steady-state advection–diffusion equation for temperature to the Stokes and mass conservation equations we have already discussed. This also provides an example of solving a nonlinearly coupled system and will show how we can test a model for which no analytical solution exists.

The flow in the box is driven by heating from below and cooling from above (Fig. 6). We solve (12) and (13)

$$\begin{aligned} -\nabla \cdot \left( 2\eta \frac{\nabla \vec {v} + \nabla \vec {v}^T}{2}\right) + \nabla {P}&= -\text{Ra}{T}\hat{\vec{g}}{} & {} \text {in } \Upomega \end{aligned}$$
(70)
$$\begin{aligned} \nabla \cdot \vec {v}&= 0{} & {} \text {in } \Upomega \end{aligned}$$
(71)

where variable rheology is permitted through the inclusion of the viscosity \(\eta\) and the buoyancy force vector has been defined as \(\vec{f}_B\)=\(-\text{Ra}{T}{\hat{\vec{g}}}\), using the temperature T, nondimensional Rayleigh number, Ra, and unit vector in the direction of gravity, \(\hat{\vec{g}}\). The Rayleigh number arises from the nondimensionalization of the governing equations and is a ratio that balances factors that enhance convective vigor (e.g., thermal expansivity, gravity) with those that retard convective vigor (e.g., viscosity). In general, convective vigor increases with increasing Ra when it exceeds a critical value for the Rayleigh number (see, e.g., Turcotte and Schubert 2002). The heat equation (14), under the assumptions of steady state (\(\frac{\partial {T}}{\partial t}\) = 0), constant material properties (k = 1) and zero internal heating (H = 0), reads

$$\begin{aligned} \vec {v} \cdot \nabla {T}&= \nabla ^2 {T}{} & {} \text {in } \Upomega \end{aligned}$$
(72)

Boundary conditions We discretize the trial function spaces for temperature (\({T}\approx {\tilde{{T}}}\)), velocity (\(\vec {v} \approx \tilde{ \vec {v}}\)) and pressure (\( {P}\approx {\tilde{{P}}}\)) as before using (41), (57) and (58), with similarly defined discrete test functions, \({\tilde{{T}}}_t\), \(\tilde{ \vec {v}}_t\) and \({\tilde{{P}}}_t\). For the Stokes problem, we assume free-slip boundaries. These are formed by the combination of a Dirichlet boundary condition of zero normal velocity (\(v_n\) = \(\tilde{ \vec {v}}\cdot { \widehat{{\vec {n} }} }\) = 0) and a Neumann zero tangential stress condition (\(\tau _t\) = \(( {\varvec{\tau }} \cdot \widehat{{\vec {n} }} ) \cdot \widehat{{\vec {t} }}\) = 0). Here, \(\widehat{{\vec {n} }}\) is the unit normal to the boundary, \(\widehat{{\vec {t} }}\) is the unit tangent on the boundary (see Fig. 6a), and \(\varvec{\tau }\) is the deviatoric stress tensor

$$\begin{aligned} {\varvec{\tau }} = 2\eta \frac{\nabla \tilde{ \vec {v}} + \nabla \tilde{ \vec {v}}^T}{2} = 2\eta \begin{bmatrix} \frac{\partial {\tilde{v}}_x}{\partial x} &{} \frac{1}{2} \left( \frac{\partial {\tilde{v}}_x}{\partial y} + \frac{\partial {\tilde{v}}_y}{\partial x} \right) \\ \frac{1}{2} \left( \frac{\partial {\tilde{v}}_x}{\partial y} + \frac{\partial {\tilde{v}}_y}{\partial x} \right) &{} \frac{\partial {\tilde{v}}_y}{\partial y} \end{bmatrix} \end{aligned}$$
(73)

This set of velocity boundary conditions once again results in a pressure null space. We arbitrarily choose to impose the extra condition that \({\tilde{{P}}}(0,0)\) = 0 to force a unique solution to exist. For the heat equation, the side boundaries are insulating (imposed by the Neumann boundary condition \(\partial {{\tilde{{T}}}}/\partial {x}\) = 0) with Dirichlet boundary conditions for the top boundary (\({\tilde{{T}}}\) = 0) and bottom boundary (\({\tilde{{T}}}\) = 1).

Nonlinearity Unlike the previous examples, which were linear in their solution variables, (70)–(72) are nonlinear. For an isoviscous rheology, the equations are individually linear but the buoyancy contribution to (70) and the advective component in (72) mean that the coupled system of equations is nonlinear, with \(\vec {v}\) depending on T and vice versa. For non-Newtonian rheologies, where \(\eta\)=\(\eta ( \vec {v})\), (70) itself becomes nonlinear too. Because of this, rather than immediately defining the weak forms of the linear operator \({{\textbf{S}}}\) we begin by considering the weak form of the nonlinear residual, \({{\textbf{r}}}\). This is derived in exactly the same manner as before by multiplying (70) by \(\vec {v}_t\), (71) by \({P}_t\) and (72) by \({T}_t\), discretizing the functions, integrating (by parts) over the domain \(\Upomega\), dropping the resulting surface integrals (either to enforce the weak boundary conditions or because they are unnecessary due to the essential boundary conditions), and defining the discrete weak forms as

$$\begin{aligned} {{\textbf{r}}}_{ \vec {v}}&= r_{ \vec {v}_{i_1}} := \sum _k \int _{e_k} \left[ \left( \frac{\nabla \vec {\omega } _{i_1} + \nabla \vec {\omega } _{i_1}^T}{2}\right) :2\eta \left( \frac{\nabla \tilde{ \vec {v}} + \nabla \tilde{ \vec {v}}^T}{2}\right) - \nabla \cdot \vec {\omega } _{i_1} {\tilde{{P}}} + \vec {\omega } _{i_1}\cdot \hat{\vec{g}} \text{Ra} {\tilde{{T}}} \right] \text{d}x = 0 \end{aligned}$$
(74)
$$\begin{aligned} {{\textbf{r}}}_P&= r_{P_{i_2}} := - \sum _k \int _{e_k} \chi _{i_2} \nabla \cdot \tilde{ \vec {v}} \text{d}x = 0 \end{aligned}$$
(75)
$$\begin{aligned} {{\textbf{r}}}_{T}&= r_{{T}_{i_3}} := \sum _k \int _{e_k} \left[ \phi _{i_3} \tilde{ \vec {v}}\cdot \nabla {\tilde{{T}}} + \nabla \phi _{i_3} \cdot \nabla {\tilde{{T}}} \right] \text{d}x = 0 \end{aligned}$$
(76)

Here, \({{\textbf{r}}} = \left( {{\textbf{r}}}_{ \vec {v}}, {{\textbf{r}}}_{P}, {\textbf{r}}_{T}\right) ^T = \left( r_{ \vec {v}_{i_1}}, r_{{P}_{i_2}}, r_{{T}_{i_3}}\right) ^T\) is a residual vector, the root of which must be found in order to find an approximate solution to (70)–(72). Finding the exact root is not generally possible. Instead, we aim to find \({{\textbf{r}}}\)=\({{\textbf{0}}}\) within some tolerance. For example, we can use an L\(^2\) norm and an absolute \(||{{\textbf{r}}}||_2 = \sqrt{{{\textbf{r}}}\cdot {\textbf{r}}} < {\epsilon}_{\text{atol}}\), or relative, \(\frac{||{\textbf{r}}||_2}{||{{\textbf{r}}}^0||_2} = \frac{\sqrt{{{\textbf{r}}}\cdot {\textbf{r}}}}{\sqrt{{{\textbf{r}}}^0\cdot {{\textbf{r}}}^0}} < {\epsilon}_{\text{rtol}}\), tolerance, where \({{\textbf{r}}}^0\) is the residual evaluated using the initial guess at the solution. We will briefly discuss two commonly used approaches to approximately finding the residual root.

Newton’s method To find the root, \({\textbf{u}}^{i+1}\)=\(\left( {{ \vec {\textbf{v}}}}^{i+1}, {{\textbf{P}}}^{i+1}, {\textbf{T}}^{i+1}\right) ^T\)=\(\left( \vec {v}_{j_1}^{i+1}, {P}_{j_2}^{i+1}, {T}_{j_3}^{i+1}\right) ^T\), we can expand the residual in a Taylor series around the current best guess at the solution \({\textbf{u}}^{i}\)=\(\left( {{ \vec {\textbf{v}}}}^{i}, {{\textbf{P}}}^{i}, {\textbf{T}}^{i}\right) ^T\)=\(\left( \vec {v}_{j_1}^{i}, {P}_{j_2}^{i}, {T}_{j_3}^{i}\right) ^T\) such that

$$\begin{aligned} {{\textbf{r}}}\left( {{\textbf{u}}}^{i+1}\right) = \,\,{{\textbf{r}}}\left( {{\textbf{u}}}^{i}\right) + {{\textbf{r}}}^\prime \left( {{\textbf{u}}}^{i}\right) \left( {{\textbf{u}}}^{i+1}-{\textbf{u}}^{i}\right) + \frac{1}{2}{{\textbf{r}}}^{\prime \prime }\left( {\textbf{u}}^{i}\right) \left( {{\textbf{u}}}^{i+1}-{{\textbf{u}}}^{i}\right) ^2 +... = {\textbf{0}} \end{aligned}$$
(77)

where \({{\textbf{r}}}^\prime \left( {{\textbf{u}}}^{i}\right)\) and \({\textbf{r}}^{\prime \prime }\left( {{\textbf{u}}}^{i}\right)\) represent the first- and second-order derivatives of the residual with respect to the solution variables, evaluated at \({{\textbf{u}}}^i\). Dropping terms with orders higher than first, defining the Jacobian \({{\textbf{J}}}\left( {\textbf{u}}^{i}\right) = {\textbf{r}}^\prime \left( {{\textbf{u}}}^{i}\right)\) and \(\delta {{\textbf{u}}} = {{\textbf{u}}}^{i+1}-{{\textbf{u}}}^i\), and rearranging results in the matrix equation

$$\begin{aligned} {{\textbf{J}}}\left( {{\textbf{u}}}^{i}\right) \delta {{\textbf{u}}} = -{{\textbf{r}}}\left( {\textbf{u}}^{i}\right) \end{aligned}$$
(78)

which can be solved for \(\delta {{\textbf{u}}}\) and used to find \({\textbf{u}}^{i+1} = {{\textbf{u}}}^i + \delta {{\textbf{u}}}\). Since we have dropped terms from the Taylor expansion, \({{\textbf{u}}}^{i+1}\) will only be a first-order approximation of the root of \({{\textbf{r}}}\). So long as the initial guess \({{\textbf{u}}}^i\) is close enough to the final solution and (78) is solvable, then \({{\textbf{u}}}^{i+1}\) should give a better estimate of \({{\textbf{r}}} = {{\textbf{0}}}\), in the sense that \(||{{\textbf{r}}}\left( {{\textbf{u}}}^{i+1}\right) ||_2 < ||{{\textbf{r}}}\left( {\textbf{u}}^{i}\right) ||_2\). Repeatedly solving (78) and at each iteration updating \({{\textbf{u}}}^{i+1} \rightarrow {{\textbf{u}}}^{i}\) will then result in a final solution where \({{\textbf{r}}}\) approaches \({{\textbf{0}}}\) in some norm and to some tolerance.
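The iteration can be illustrated on a small algebraic system. The sketch below is plain Python and uses a toy two-unknown problem of our own construction, not the discretized equations (74)–(76): a scalar "velocity" v driven by a scalar "temperature" T through v = aT, and a "temperature" equation T + vT = 1 whose product term mimics advection. For a = 2 the root is (v, T) = (1, 0.5). All names and values are hypothetical.

```python
import math

A = 2.0  # toy coupling strength standing in for Ra (hypothetical value)

def residual(u):
    v, T = u
    return [v - A * T,           # "momentum": v is driven by T
            T + v * T - 1.0]     # "heat": the product v*T mimics advection

def jacobian(u):
    v, T = u
    # J[a][b] = d r_a / d u_b, derived by hand for this toy problem
    return [[1.0, -A],
            [T, 1.0 + v]]

def solve_2x2(J, b):
    """Solve J x = b by Cramer's rule."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(b[0] * J[1][1] - J[0][1] * b[1]) / det,
            (J[0][0] * b[1] - b[0] * J[1][0]) / det]

def newton(u0, atol=1e-12, max_it=20):
    u = list(u0)
    history = []                 # L2 norm of the residual at each iterate
    for _ in range(max_it):
        r = residual(u)
        norm = math.sqrt(r[0]**2 + r[1]**2)
        history.append(norm)
        if norm < atol:          # absolute tolerance reached
            break
        du = solve_2x2(jacobian(u), [-r[0], -r[1]])   # J du = -r, as in (78)
        u = [u[0] + du[0], u[1] + du[1]]              # u^{i+1} = u^i + du
    return u, history

u_newton, history = newton([0.0, 1.0])
```

Printing `history` shows the residual norm falling slowly at first and then very rapidly once the iterate is close to the root, which is the behavior expected of Newton's method with a good initial guess.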

For highly nonlinear problems, the Jacobian matrix, \({{\textbf{J}}} = {\textbf{r}}^\prime\), can be complicated and difficult to derive, let alone to code. Fortunately, modern finite element libraries, like FEniCS, that provide the symbolic and human-readable representation of weak forms, seen above through UFL, allow the Jacobian to be automatically evaluated and assembled. For (74)–(76), this results in the code snippet in Listing 4.

figure e

For the sake of brevity, we have assumed that the most recent iterated solutions, v_i, p_i and T_i, and test functions, v_t, p_t and T_t, have been declared. The individual solutions are part of a larger system solution, u_i=(v_i, p_i, T_i), and a trial function for the system also exists, u_a=(v_a, p_a, T_a). Additionally, the unit vector in the direction of gravity, gravity, the Rayleigh number, Ra, and the viscosity, eta, have been declared with the latter either being 1 in the isoviscous case or a function of temperature, T_i, in the temperature-dependent case. In either case, the Jacobian matrix, J, is easily obtained using the derivative function. Using this and the residual r allows (78) to be repeatedly solved to update u_i until convergence is achieved and the root of the residual is found.

Picard’s method Convergence of the Newton iteration method depends on having a good initial guess, which is not always possible, especially when solving steady-state problems like (70)–(72). In this case, an alternative approach is to use a Picard iteration. This splits the equations into multiple linearized subsets and solves them sequentially and repeatedly, updating the nonlinear terms at each iteration, until convergence is achieved. Equations (70)–(72) can be split into two systems of the form of (8), the first for the Stokes system

$$\begin{aligned} {{\textbf{S}}}_s&= \left( \begin{array}{cc}{{\textbf{K}}}_s &{} {{\textbf{G}}}_s \\ {{\textbf{D}}}_s &{} {{\textbf{0}}}\end{array}\right) \end{aligned}$$
(79)
$$\begin{aligned} {{\textbf{K}}}_s&= K_{s_{i_1j_1}} = \sum _k\int _{e_k} \left( \frac{\nabla \vec {\omega } _{i_1} + \nabla \vec {\omega } _{i_1}^T}{2}\right) :2\eta \left( \frac{\nabla \vec {\omega } _{j_1} + \nabla \vec {\omega } _{j_1}^T}{2}\right) \text{d}x \end{aligned}$$
(80)
$$\begin{aligned} {{\textbf{G}}}_s&= G_{s_{i_1j_2}} = - \sum _k \int _{e_k} \nabla \cdot \vec {\omega } _{i_1} \chi _{j _2} \text{d}x \end{aligned}$$
(81)
$$\begin{aligned} {{\textbf{D}}}_s&= D_{s_{i_2j_1}} = - \sum _k \int _{e_k} \chi _{i_2} \nabla \cdot \vec {\omega } _{j_1} \text{d}x \end{aligned}$$
(82)
$$\begin{aligned} {{\textbf{u}}}_s&= \left( {{\mathbf {v}}}, {{\textbf{P}}}\right) ^T = \left( \vec {v}_{j_1}, {P}_{j_2}\right) ^T \end{aligned}$$
(83)
$$\begin{aligned} {{\textbf{f}}}_s&= f_{s_{i_1}} = -\sum _k \int _{e_k} \vec {\omega } _{i_1}\cdot \hat{\vec{g}} \text{Ra} {\tilde{{T}}} \text{d}x \end{aligned}$$
(84)

and the second for the temperature equation

$$\begin{aligned} {{\textbf{S}}}_{T}&= S_{{T}_{ij}} = \sum _k\int _{e_k} \left( \phi _i \tilde{ \vec {v}}\cdot \nabla \phi _j + \nabla \phi _i\cdot \nabla \phi _j\right) \text{d}x \end{aligned}$$
(85)
$$\begin{aligned} {{\textbf{u}}}_{T}&= {{\textbf{T}}} = {T}_{j} \end{aligned}$$
(86)
$$\begin{aligned} {{\textbf{f}}}_{T}&= f_{{T}_{i}} = 0 \end{aligned}$$
(87)

For UFL code snippets of (79) and (84), see Listing 5, and for (85), see Listing 6.

figure f
figure g

The full system solution vector remains \({{\textbf{u}}}\) = \(\left( {{\textbf{u}}}_s, {{\textbf{u}}}_T\right) ^T\) = \(\left( {{ \vec {\textbf{v}}}}, {{\textbf{P}}}, {{\textbf{T}}}\right) ^T\), and the best guess at the solution is \({{\textbf{u}}}^i\). \({{\textbf{S}}}_s({\textbf{u}}^i){{\textbf{u}}}_s^{i+1}\) = \({{\textbf{f}}}_s({{\textbf{u}}}^i)\) is solved for \({\textbf{u}}_s^{i+1}\), which is used to update \({{\textbf{u}}}\) such that \({\textbf{u}}_s^{i+1} \rightarrow {{\textbf{u}}}_s^{i}\) before solving \({\textbf{S}}_T({{\textbf{u}}}^i){{\textbf{u}}}_T^{i+1}\) = \({{\textbf{0}}}\) for an updated solution for temperature, \({{\textbf{u}}}_T^{i+1}\). Repeating this iteration will generally find the root of the residuals (74)–(76); as with Newton's method, it is continued until \({{\textbf{r}}}\) = \({{\textbf{0}}}\) in some norm and to some tolerance.

If the initial guess is sufficiently good, then a Newton iteration should converge quadratically, while a Picard iteration will converge at a lower rate. However, neither convergence nor the convergence rate of either method is guaranteed. Various remedies are available when an iteration does not converge. These include finding a better initial guess (e.g., a solution from a case with lower convective vigor), “relaxing” the solution by only applying a partial update at each iteration, or linearizing terms in the Jacobian matrix. It should also be noted that, if applied to the linear problems discussed in previous sections, any nonlinear iteration should converge in a single iteration.
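The difference in convergence behavior can be illustrated on a scalar toy problem. The following Python sketch is not taken from TerraFERMA or Sepran; the test equation \(x = \cos x\) and the function names `picard` and `newton` are assumptions chosen purely for illustration. It compares a Picard (fixed-point) iteration, a relaxed Picard iteration that applies only a partial update, and a Newton iteration.

```python
import math

def picard(g, x0, relax=1.0, tol=1e-12, max_it=500):
    """Fixed-point iteration x <- (1 - relax) * x + relax * g(x)."""
    x = x0
    for it in range(1, max_it + 1):
        x_new = (1.0 - relax) * x + relax * g(x)  # partial update if relax < 1
        if abs(x_new - x) < tol:
            return x_new, it
        x = x_new
    raise RuntimeError("Picard iteration did not converge")

def newton(f, dfdx, x0, tol=1e-12, max_it=50):
    """Newton iteration for the scalar residual f(x) = 0."""
    x = x0
    for it in range(1, max_it + 1):
        dx = -f(x) / dfdx(x)  # solve the linearized problem for the update
        x += dx
        if abs(dx) < tol:
            return x, it
    raise RuntimeError("Newton iteration did not converge")

# Toy nonlinear problem (illustrative only): find x such that x = cos(x).
x_picard, n_picard = picard(math.cos, 1.0)
x_relaxed, n_relaxed = picard(math.cos, 1.0, relax=0.6)
x_newton, n_newton = newton(lambda x: x - math.cos(x),
                            lambda x: 1.0 + math.sin(x), 1.0)
print("iterations (Picard, relaxed Picard, Newton):",
      n_picard, n_relaxed, n_newton)
```

For this particular equation the relaxed update converges in far fewer iterations than the unrelaxed one, but the relaxation factor generally has to be tuned for each problem.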

Diagnostics The geometry and expressions for the boundary conditions for the selected Blankenbach et al. (1989) cases are shown in Fig. 6a, and a converged model solution for temperature and velocity obtained for Ra = \(10^4\) (benchmark case 1a from Blankenbach et al. 1989) is shown in Fig. 6b. To quantify the precision with which the governing equations can be solved, we focus on two measures of convective vigor. The first is the Nusselt number Nu, which is the integrated nondimensional surface heat flow

$$\begin{aligned} \text{Nu} ~=~ - \int _{x=0}^{x=1} \frac{\partial {T}}{\partial y}(x,y=1) ~\text{d}x \end{aligned}$$
(88)

The second is the root-mean-square velocity \({V}_\text {rms}\) defined as

$$\begin{aligned} {V}_\text {rms} ~=~ \sqrt{ \frac{\int _\Upomega \vec {v}\cdot \vec {v} \text{d}x}{\int _\Upomega \text{d}x} } \end{aligned}$$
(89)
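As a hedged illustration of how (88) and (89) can be evaluated, the following NumPy sketch computes both diagnostics on a uniform grid rather than on a finite element mesh. The temperature field is a harmonic perturbation of the conductive state, and the velocity field is a hypothetical divergence-free field chosen only because its root-mean-square value is known analytically; neither is a converged benchmark solution. The helper `trapezoid` is our own and is not a library routine.

```python
import numpy as np

def trapezoid(f, dx, axis=-1):
    """Composite trapezoidal rule on a uniform grid along one axis."""
    f = np.moveaxis(np.asarray(f, dtype=float), axis, -1)
    return dx * (0.5 * f[..., 0] + f[..., 1:-1].sum(axis=-1) + 0.5 * f[..., -1])

n = 129
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)
h = x[1] - x[0]
X, Y = np.meshgrid(x, y, indexing="ij")  # arrays are indexed [i_x, j_y]

# Illustrative fields (assumptions, not benchmark solutions)
T = 1.0 - Y + 0.1 * np.cos(np.pi * X) * np.sin(np.pi * Y)
vx = np.sin(np.pi * X) * np.cos(np.pi * Y)
vy = -np.cos(np.pi * X) * np.sin(np.pi * Y)

# Nusselt number (88): second-order one-sided difference for dT/dy at y = 1
dTdy_top = (3.0 * T[:, -1] - 4.0 * T[:, -2] + T[:, -3]) / (2.0 * h)
nusselt = -trapezoid(dTdy_top, h)

# Root-mean-square velocity (89): the unit square has area 1
v2_integral = trapezoid(trapezoid(vx**2 + vy**2, h, axis=1), h)
v_rms = np.sqrt(v2_integral / 1.0)

print(f"Nu = {nusselt:.6f}, Vrms = {v_rms:.6f}")
```

For these fields the analytical values are Nu = 1 and \({V}_\text {rms} = \sqrt{1/2}\), which provides a quick check on the quadrature.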

Table 9 in Blankenbach et al. (1989) specifies their best estimates for various quantities of the benchmark. We will focus on Nu and \( {V}_\text {rms}\) and show results for their steady-state benchmarks 1a–1c (isoviscous, \(\eta\) = 1, with Ra increasing from \(10^4\) to \(10^5\) and \(10^6\)) and benchmark 2a, which has Ra = \(10^4\) and a temperature-dependent viscosity \(\eta ({{T}})\) = \(\exp \left( - b{T}\right)\) with b = \(\ln (10^3)\) (see Table 1).

Discretization For the Stokes equation, TerraFERMA uses the P2P1 Taylor–Hood Lagrange element pair for the shape functions \(( \vec {\omega } _j,\chi _j)\) (as in Sect. 2.2.5) and P2 elements for the heat equation (\(\phi _j\)). The choice of elements here can be tersely described as P2P1P2. In TerraFERMA, we apply a Newton iteration to cases 1a–c with a harmonic perturbation to the conductive state \( {T}(x,y)=1-y+0.1 \cos \pi x \sin \pi y\) as an initial guess for temperature and the solution to \({{\textbf{S}}}_s{{\textbf{u}}}_s\) = \({{\textbf{f}}}_s\) given the initial T as a first guess for velocity and pressure. We use a Picard iteration and an isoviscous initial velocity and pressure guess for case 2a owing to the difficulty of getting the Newton iteration to converge without a better initial guess. Both iterations are converged to a relative tolerance, \({\epsilon}_{\text{rtol}}\), of \(10^{-9}\).

We also show results obtained with Sepran using the same P2P1P2 discretization as in TerraFERMA. The same initial guess is used for case 1a, but for 1b and 2a we use the final solution from 1a as an initial guess and for 1c we use the final solution of 1b. Picard iteration is used for all cases to a relative tolerance of \(10^{-9}\).

Results We obtain results for grids with 32, 64, 128, and 256 elements on a side. The TerraFERMA results have grid refinement toward the edges of the domain to allow for better resolution of the thermal boundary layer at a lower number of grid points. The Sepran results are obtained on equidistant meshes where the computation of (88) is improved following the method of Ho-Liu et al. (1987). We follow Blankenbach et al. (1989) in using Richardson extrapolation to attempt to find the “best” estimate as shown in comparison with theirs in Table 1. We make estimates from the modeling approaches independently and average them to find the “new” results. A brief inspection suggests that the estimates made in 1989 were clearly rather precise!
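The idea behind Richardson extrapolation can be sketched as follows. This is a generic three-level form assuming a constant refinement ratio and monotonic convergence; it is not necessarily the exact procedure used for Table 1, and the input values are synthetic (generated from an assumed error model \(f(h) = f_\text {exact} + C h^p\)) rather than model results.

```python
import math

def richardson(f_coarse, f_medium, f_fine, ratio=2.0):
    """Return (observed order p, extrapolated value) from three grid levels.

    Assumes f(h) = f_exact + C * h**p and a constant refinement ratio.
    """
    p = math.log((f_coarse - f_medium) / (f_medium - f_fine)) / math.log(ratio)
    f_star = f_fine + (f_fine - f_medium) / (ratio**p - 1.0)
    return p, f_star

# Synthetic data (assumed values, not benchmark results)
f_exact, C, order = 5.0, 2.0, 3.0
values = [f_exact + C * (1.0 / n) ** order for n in (32, 64, 128)]

p_obs, f_extrap = richardson(*values)
print(f"observed order = {p_obs:.4f}, extrapolated value = {f_extrap:.8f}")
```

The logarithm requires successive differences of the same sign, so the estimate of the order breaks down when the computed values do not approach the limit monotonically.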

Figure 7 shows how our model predictions trend toward our average extrapolated values. Note that these are not convergence plots like those used previously when we compared the approximate solution to the analytical solution. Here, the best estimates do not represent metrics obtained from an analytical solution. Some of the flattening or ‘V’-ing in the curves is due to the change in sign of the difference between the modeled and extrapolated values. In general, the difference between approximate solution and extrapolated value is smaller at lower convective vigor (compare 1a and 1c) and larger with stronger nonlinearities (compare 1a and 2a).

2.3 FEM determination of SZ thermal structure

2.3.1 Recap of the governing equations

While we already encountered examples of solution of the governing equations (12)–(14), we will formulate the full set of equations for subduction zone thermal structure below for clarity and completeness’ sake. We will set up the parameters and equations in a general form that we will use in part III for a global suite of models (similar to those in Syracuse et al. (2010) and Wada and Wang (2009)) but restrict ourselves in this part to applying them to a simplified benchmark problem. The equations will be introduced in dimensional form before nondimensionalizing them in Sect. 2.3.2. All dimensional variables will be indicated by a superscript \(^*\). Dimensional reference values will be indicated by the subscript \(_0\). We assume a 2D Cartesian coordinate system with coordinates \(\vec {x}^*=(x^*_1,x^*_2)^T=(x^*,y^*)^T=(x^*,-z^*)^T\) where \(z^*\) is depth.

Conservation of mass under the assumption that the fluid is incompressible leads to

$$\begin{aligned} \nabla ^* \cdot \vec {v}^*~=~0 \end{aligned}$$
(90)

where, in two-dimensions, \(\vec {v}^*=(v^*_1,v^*_2)^T = (v^*_x,v^*_y)^T\) is the velocity vector. Assuming all flow is driven by a kinematic boundary condition, conservation of momentum leads to the dimensional Stokes equation without buoyancy forces

$$\begin{aligned} -\nabla ^* \cdot {\varvec{\tau }^*} +\nabla ^* {\textrm{\textit{P}}}^* ~=~ 0 \end{aligned}$$
(91)

where \({\textrm{\textit{P}}}^*\) is the dynamic pressure and \(\varvec{\tau }^*\) is the deviatoric stress tensor given by

$$\begin{aligned} {\varvec{\tau }^*} ~=~2 \eta ^* { \dot{\varvec{\epsilon}}^*} \end{aligned}$$
(92)

Here, \(\eta ^*\) is dynamic viscosity and \(\dot{\varvec{\epsilon}}^*\) is the deviatoric strain rate tensor with components

$$\begin{aligned} {\dot{\epsilon}}^*_{ij}~=~ \frac{1}{2} \left[ \frac{\partial v^*_i}{\partial x^*_j} + \frac{\partial v^*_j}{\partial x^*_i} \right] \end{aligned}$$
(93)

The time-dependent dimensional heat equation is given by

$$\begin{aligned} {\rho}^* {c_{\text{p}}}_0 \left( {\frac{\partial T^*}{\partial t^*} + \vec {v}^* \cdot \nabla ^* T^*} \right) ~=~ \nabla ^* \cdot \left( k^* \nabla ^* {T}^* \right) +{H}^* \end{aligned}$$
(94)

while, in cases where a steady state is assumed (\(\frac{\partial {T}^*}{\partial t^*}\) = 0) temperature is governed by

$$\begin{aligned} \rho ^* {c_\text{p}}_0 \vec {v}^* \cdot \nabla ^* T^* ~=~ \nabla ^* \cdot \left( k^* \nabla ^* T^* \right) +{H}^* \end{aligned}$$
(95)

where \(\rho ^*\) is density, \({c_\text{p}}_0\) is the heat capacity at constant pressure (assumed constant), \({T}^*\) is temperature, \(k^*\) is thermal conductivity, and \({H}^*\) is volumetric heat production. In this paper, we will assume that the viscosity \(\eta ^*\) is either constant, \(\eta ^*\) = \(\eta _0\), or is a function of temperature and strain rate following a simplified creep law for dislocation creep in dry olivine from Karato and Wu (1993)

$$\begin{aligned} \eta ^*_\text {disl}~=~ {A}_\eta ^* \exp \left( \frac{{E}^*}{n{R}^*({T}^*+{T}^*_a)} \right) {{{\dot{\epsilon}}_{II}}}^{*\frac{1-n}{n}} \end{aligned}$$
(96)

where \({A}_\eta ^*\) is a prefactor, \({E}^*\) is the activation energy, \({R}^*\) is the gas constant, n is a power-law index, \({T}^*_a\) is a linear approximation of an adiabatic temperature using a gradient of 0.3\(^\circ\)C/km with \({T}^*_a\) = 0 at the top of the model (which may not be at \(z^*\) = 0 due to assumptions of ocean bathymetry as we will see in Sect. 2.3.3), and \({\dot{\epsilon}}_{II}^*\) is the second invariant of the deviatoric strain rate tensor (also known as the effective deviatoric strain rate)

$$\begin{aligned} {\dot{\epsilon}}_{II}^*~=~ \sqrt{\frac{1}{2} \dot{\varvec{\epsilon}}^*:\dot{\varvec{\epsilon }}^*} \end{aligned}$$
(97)

Since the dynamical range of the viscosity (96) is large over the temperature contrast across subduction zones, it is common practice to cap the viscosity at some arbitrary maximum \(\eta ^*_\text {max}\) so that in the variable viscosity case

$$\begin{aligned} \eta ^*~=~ \left( \frac{1}{\eta ^*_\text {disl}} ~+~ \frac{1}{\eta ^*_\text {max}}\right) ^{-1} \end{aligned}$$
(98)
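A minimal Python sketch of (96)–(98) is given below to make the role of the viscosity cap concrete. The default parameter values (`A_eta`, `E`, `n`, `eta_max`) are illustrative placeholders and should not be read as the case-specific values of the benchmark; the function names are likewise our own. Temperature is taken in kelvin and depth is measured in km from the top of the model.

```python
import math

R_GAS = 8.3145  # gas constant in J/(mol K)

def strain_rate_invariant(grad_v):
    """Second invariant (97) from a 2x2 velocity gradient
    [[dvx/dx, dvx/dy], [dvy/dx, dvy/dy]] (incompressible flow assumed)."""
    (a, b), (c, d) = grad_v
    exx, eyy, exy = a, d, 0.5 * (b + c)
    return math.sqrt(0.5 * (exx**2 + eyy**2 + 2.0 * exy**2))

def viscosity(T_kelvin, depth_km, eps_II,
              A_eta=28968.6, E=540.0e3, n=3.5, eta_max=1.0e25,
              adiabat=0.3):
    """Capped dislocation creep viscosity in Pa s, following (96) and (98).

    All default parameter values are illustrative assumptions.
    """
    T_a = adiabat * depth_km  # linearized adiabatic temperature increase
    eta_disl = (A_eta * math.exp(E / (n * R_GAS * (T_kelvin + T_a)))
                * eps_II ** ((1.0 - n) / n))
    return 1.0 / (1.0 / eta_disl + 1.0 / eta_max)  # harmonic cap (98)

eps = strain_rate_invariant(((0.0, 2.0e-14), (0.0, 0.0)))  # simple shear, 1/s
eta_hot = viscosity(1600.0, 100.0, eps)   # warm mantle wedge
eta_cold = viscosity(600.0, 20.0, eps)    # cold lithosphere, close to the cap
print(f"eps_II = {eps:.3e}, eta_hot = {eta_hot:.3e}, eta_cold = {eta_cold:.3e}")
```

Because (98) is a harmonic average, the resulting viscosity is always below both \(\eta ^*_\text {disl}\) and \(\eta ^*_\text {max}\) and approaches \(\eta ^*_\text {max}\) smoothly in cold regions.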

2.3.2 Nondimensionalization

It is attractive to nondimensionalize the equations such that most quantities are scaled to be close to 1. This provides simple scaling arguments for understanding which terms in the equations are dominant, avoids computer arithmetic that mixes very large and very small numbers, and leads to a matrix–vector system in which the condition number of the matrix (Golub and Van Loan 1989) is closer to optimal.

Table 2 provides a list of dimensional reference values, dimensional parameters and their nondimensional equivalents. For the nondimensionalization of (90)–(98) we use the diffusional time scaling with nondimensional time defined as t = \(t^* \kappa _0 / h_0^2\) where \(h_0\) is the reference length scale and \(\kappa _0\) is the reference thermal diffusivity. With \(\vec {x}\) = \(\vec {x}^*/h_0\) it follows \(\vec {v}\) = \(\vec {v}^* h_0 / \kappa _0\), \({\dot{\epsilon }}\) = \({\dot{\epsilon }}^* h_0^2 / \kappa _0\) and \(\nabla\) = \(\nabla ^* h_0\). We further introduce T = \(({T}^*-{T}^*_s)/{T}_0\), k = \(k^* / k_0\), \(\rho\) = \(\rho ^* / \rho _0\), P = \( {P}^* h^2_0 / (\kappa _0 \eta _0)\) and H = \( {H^* h_0^2 / ( \rho _0 {c_\text{p}}_0 T_0 \kappa _0 )}\). Note that our choices of \( {T}_0\) and \(h_0\) in Table 2 cause the numerical values of dimensional position (in km) and temperature (in \(^\circ\)C) to have the same magnitude as the corresponding nondimensional quantities. Substitution of the nondimensional variables and constants leads to the following set of nondimensional equations for pressure and velocity

$$\begin{aligned}{} & {} \nabla \cdot \vec {v} = 0 \end{aligned}$$
(99)
$$\begin{aligned}{} & {} -\nabla \cdot \left( 2\eta \frac{\nabla \vec {v} + \nabla \vec {v}^T}{2} \right) + \nabla {P} = 0 \end{aligned}$$
(100)

and either a time-dependent equation for temperature

$$ \begin{aligned} {\rho \left( \frac{\partial T}{\partial t} + \vec {v} \cdot \nabla T \right) = \nabla \cdot \left( k \nabla T \right) + H} \end{aligned}$$
(101)

or its equivalent when a steady-state solution is assumed

$$\begin{aligned} {\rho \vec {v} \cdot \nabla T = \nabla \cdot \left( k \nabla T \right) + H} \end{aligned}$$
(102)

The viscosity \(\eta\) is either constant 1 or follows from the dislocation creep formulation (96) with cap (98) as

$$\begin{aligned} \eta =\frac{\eta ^*}{\eta _0} \end{aligned}$$
(103)

Note that for simplicity as well as clarity we form the viscosity function (98) in dimensional form and nondimensionalize the viscosity with the reference viscosity \(\eta _0\).

2.3.3 Geometry, boundary conditions, and initial conditions

A simplified version of the typical geometry used in 2D subduction zone modeling with a kinematically prescribed slab is shown in Fig. 8a. The model is a 2D Cartesian box of width L and depth D. We picture a model with a straight slab surface here but it can also be constructed from a natural spline through a set of control points as in Syracuse et al. (2010) or connected linear segments with different angles with respect to the horizontal as in Wada and Wang (2009). In the models following the geometries of Syracuse et al. (2010) described in part III this simplified geometry is modified by including a curved slab and a coastline. At x = 0, the top of the model is at \((0,z_\text {trench})^T\), for a given depth of the trench, \(z_\text {trench}\). Between x = 0 and x = \(x_\text {coast}\), the presumed horizontal position of the coast, the top of the model shallows linearly to \((x_\text {coast},0)^T\). For \(x>x_\text {coast}\), the top of the model is at z = 0. Actual choices for these parameters are provided in the Zenodo archive linked to in the data availability statement.

The kinematic slab approach requires at a minimum that the slab surface velocity with magnitude \({V_s}\) is prescribed. The velocity in the slab, \(\vec {v}_s\), can be determined from the solution of (99) and (100) in the slab (resulting in an extra Stokes equation owing to the discontinuity in velocity and pressure required across the slab surface above the coupling depth). Alternatively, the velocity in the slab can simply be prescribed by defining the internal slab velocity to be parallel to, and of the same magnitude as, that of the closest point on the slab surface. For a straight-dipping slab, we have found that either approach leads to very similar temperature solutions; for a curved slab, the use of temperature-dependent viscosity also yields very similar temperature solutions at the top of the slab for these two approaches.

Here, we take the approach of solving for the velocity in the slab, solving (101) for temperature T in the whole domain and two Stokes equations (99) and (100), one in the wedge for \(\vec {v}\) and P and one in the slab for \(\vec {v}\) = \(\vec {v}_s\) and P = \( {P_s}\). The velocity in the overriding plate, above the slab and down to z = \(z_2\), is always prescribed as \(\vec {v}\) = 0 and the Stokes equation is not solved there.

We use an unstructured mesh of triangular elements to discretize the domain. A typical example, with 1 km element resolution in the region with the most activity, is shown in Fig. 8b. On this mesh, we define approximate discrete solutions for velocity, pressure and temperature as

$$\begin{aligned} \vec {v} \approx \tilde{ \vec {v}}&= \sum _j \omega ^k_j v^k_j \end{aligned}$$
(104)
$$\begin{aligned} {P} \approx {\tilde{{P}}}&= \sum _j \chi _j {P}_j \end{aligned}$$
(105)
$$\begin{aligned} {T} \approx {\tilde{{T}}}&= \sum _j \phi _j {T}_j \end{aligned}$$
(106)

with similarly defined discrete test functions, \(\tilde{ \vec {v}}_t\), \({\tilde{{P}}}_t\) and \({\tilde{{{T}}}}_t\) using the same shape functions \(\vec {\omega } _j\) = \(\omega ^k_j\), \(\chi _j\) and \(\phi _j\) for velocity, pressure and temperature at each DOF j, respectively. In the results presented using TerraFERMA, we use a P2P1P2 discretization where \(\vec {\omega } _j\) are piecewise-quadratic, \(\chi _j\) are piecewise linear and \(\phi _j\) are piecewise-quadratic continuous Lagrange functions. The results from Sepran use either the same P2P1P2 discretization (indicated by TH) or a penalty function method (indicated by PF) with quadratic P2 Crouzeix–Raviart (rather than Lagrange) shape functions for the velocity (\(\vec {\omega } _j\)). In this method, the dynamic pressure is eliminated from the Stokes equation (70) by a perturbation of the incompressibility constraint, that is, \(\nabla \cdot \vec {v} = {\epsilon}_{P} {P}\) where \({\epsilon}_{P}\) is a small number. We use \({\epsilon}_{P}\) = \(10^{-6}\) here; see Cuvelier et al. (1986) or King et al. (1990) for details on the elimination process. This method leads to a smaller stiffness matrix compared to that when using Taylor–Hood elements since the pressure unknowns are eliminated. It also results in a positive definite matrix for which more efficient direct solution methods exist. For the temperature shape functions (\(\phi _j\)) Sepran also uses quadratic Lagrange polynomials (resulting in a combined P2P2 discretization). In the penalty function approach, pressure is eliminated from the equations so \(\chi _j\) are not used.

For the heat equation (101), we assume homogeneous natural (or Neumann) boundary conditions along the geometry where the velocity vector points out of the box (i.e., an outflow boundary). At the trench inflow boundary, we assume a half-space cooling model \({T}_\text {trench}(z)\) given by

$$\begin{aligned} \tilde{{T}}(x=0,z) = {T}_\text {trench}(z) = {T}_\text{s} + ({T}_\text{m} - {T}_\text{s} ) \text{erf} \left( \tfrac{z-z_\text {trench}}{z_d} \right) \end{aligned}$$
(107)

where \({T}_\text{s}\) is the nondimensional surface temperature, \({T}_\text{m}\) is the nondimensional mantle temperature, \(z_\text {trench}\) is the nondimensional depth of the trench, and the nondimensional scale depth \(z_d\) is proportional to the dimensional age of the incoming lithosphere \({{{A}}}^*\) via \(z_d = 2 \tfrac{\sqrt{ \kappa _0 {A}^*}}{h_0}\).
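The trench boundary condition (107) is simple to evaluate directly, as the following Python sketch shows. The nondimensional values \({T}_\text{s}\) = 0 and \({T}_\text{m}\) = 1350 and the choice \(z_\text {trench}\) = 0 are assumptions made for illustration, and the scale depth \(z_d\) = 97.397 is the value quoted for the benchmark in Sect. 2.3.5.

```python
import math

def trench_temperature(z, z_trench=0.0, z_d=97.397, T_s=0.0, T_m=1350.0):
    """Half-space cooling profile (107) at the trench inflow boundary.

    All arguments are nondimensional; the defaults are illustrative assumptions.
    """
    return T_s + (T_m - T_s) * math.erf((z - z_trench) / z_d)

# Tabulate the profile at a few depths below the top of the model
for depth in (0.0, 25.0, 50.0, 100.0, 200.0):
    print(f"z = {depth:6.1f}  T = {trench_temperature(depth):8.2f}")
```

At one scale depth below the trench the profile has recovered a fraction erf(1), roughly 84%, of the temperature contrast between surface and mantle.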

Details of the backarc temperature depend on whether we are modeling ocean–continent or ocean–ocean subduction. In the ocean–continent case, we assume a constant surface heat flow \(q_s\) and radiogenic heat production H. We use a two-layer crustal model with density \(\rho\) = \(\rho _c\), thermal conductivity k = \(k_c\) and heat production H = \({H}_1\) from depth 0 to \(z_1\) and heat production H = \({{{H}}}_2\) between depths \(z_1\) and \(z_2\), where \(z_1\) and \(z_2\) vary between subduction zones. The mantle portion of the model (in both the slab and the wedge) is assumed to have density \(\rho\) = \(\rho _m\), conductivity k = \(k_m\) and zero heat production H = 0. At the backarc, the wedge inflow boundary condition on temperature is chosen to be a geotherm \({{{T}}}_{\text {backarc,c}}(z)\) consistent with these parameters, that is,

$$\begin{aligned} {\tilde{{T}}}(x=L,z) = {T}_\text {backarc,c}(z) ~=~ \left\{ \begin{array}{ll} {T}_\text{s} - \frac{{H}_1 z^2}{2 k_c} + \frac{q_s}{k_c} z &{} : 0 \le z \le z_1 \\ {T}_\text {backarc,c}(z=z_1) - \frac{{H}_2 (z-z_1)^2}{2 k_c} + \frac{q_1}{k_c} (z-z_1) &{} : z_1 < z \le z_2 \\ \min \left( {T}_\text{m}, {T}_\text {backarc,c}(z=z_2) + \frac{q_2}{k_m} (z-z_2) \right) &{} : z_2 < z \le z_\text {io} \end{array} \right. \end{aligned}$$
(108)

The discrete heat flow values \(q_i\) are the heat flow at the crustal boundaries at depth z = \(z_i\) that can be found as \(q_1\) = \(q_s - {H}_1 z_1\) and \(q_2\) = \(q_1 - {H}_2 (z_2 - z_1)\). In the ocean–ocean case we use a one-layer crustal model (\(z_1\) is not defined), heat production is zero (H = 0) and the density and thermal conductivity are set to, respectively, \(\rho\) = \(\rho _m\) and k = \(k_m\) everywhere. The wedge inflow boundary condition on temperature down to \(z_\text {io}\) is then

$$\begin{aligned} {\tilde{{T}}}(x=L,z) = {T}_\text {backarc,o}(z) ~=~ {T}_\text{s} + ({T}_\text{m} - {T}_\text{s} ) \text{erf} \left( \tfrac{z}{z_c} \right) \end{aligned}$$
(109)

where \(z_c\) is related to the dimensional age of the overriding plate \( {A}_c^*\) minus the age of subduction \( {A}_s^*\) via \(z_c = 2 \tfrac{\sqrt{ \kappa _0 ({A_c^*-A^*_s})}}{h_0}\). Below \(z_\text {io}\) we assume again a homogeneous Neumann boundary condition for temperature.
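To illustrate how the piecewise continental geotherm (108) fits together, a minimal Python sketch follows. The crustal depths \(z_1\) = 15 and \(z_2\) = 40 are those used for the benchmark in Sect. 2.3.5; all other default values (surface heat flow, heat production, conductivities and temperatures) are hypothetical nondimensional numbers chosen for illustration only.

```python
def backarc_continental(z, T_s=0.0, T_m=1350.0, q_s=20.0,
                        H1=0.4, H2=0.1, k_c=0.8, k_m=1.0,
                        z1=15.0, z2=40.0):
    """Two-layer continental geotherm (108); defaults are illustrative."""
    # Heat flow at the layer boundaries, reduced by the heat produced above
    q1 = q_s - H1 * z1
    q2 = q1 - H2 * (z2 - z1)

    def upper(zz):  # upper crust, 0 <= z <= z1
        return T_s - H1 * zz**2 / (2.0 * k_c) + q_s / k_c * zz

    T1 = upper(z1)

    def lower(zz):  # lower crust, z1 < z <= z2
        return T1 - H2 * (zz - z1)**2 / (2.0 * k_c) + q1 / k_c * (zz - z1)

    T2 = lower(z2)

    if z <= z1:
        return upper(z)
    if z <= z2:
        return lower(z)
    # Mantle lithosphere: linear gradient, capped at the mantle temperature
    return min(T_m, T2 + q2 / k_m * (z - z2))

for depth in (0.0, 15.0, 40.0, 80.0, 150.0):
    print(f"z = {depth:6.1f}  T = {backarc_continental(depth):8.2f}")
```

Since each layer starts from the temperature at the base of the layer above, the profile is continuous at \(z_1\) and \(z_2\) by construction.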

For the two Stokes equations, we assume homogeneous (zero stress) Neumann boundary conditions on \(\tilde{ \vec {v}}\) and \({\tilde{P}}\) for the wedge inflow and outflow and on \(\tilde{ \vec {v}}_s\) and \({\tilde{{P}}}_s\) for the slab inflow and outflow. The top of the wedge at z = \(z_2\) (the Moho) is a rigid boundary on which we impose the Dirichlet condition \(\tilde{ \vec {v}}\) = 0, consistent with the imposition of zero flow in the overriding plate. The wedge flow, \(\tilde{ \vec {v}}\), is driven by the coupling of the slab to the wedge below a coupling depth. This is implemented by a Dirichlet boundary condition along the slab surface. Above the coupling depth we impose zero velocity. Below the coupling depth the velocity is parallel to the slab and has magnitude \({V}_s\). It has been found that a smooth transition from zero to full speed over a short depth interval enhances the accuracy of the Stokes solution (see discussion in van Keken et al. (2002) and equations (13)–(15) in van Keken et al. (2008)) so here coupling begins at z = \(d_c\) and ramps up linearly until full coupling is reached at z = \(d_c\)+2.5. For improved numerical accuracy, we specify nodal points at these depths in all models presented here and in part III. The slab flow, \(\tilde{ \vec {v}}_s\), is driven by the imposition of a Dirichlet boundary condition parallel to the slab with magnitude \({V}_s\) along the entire length of the slab surface, resulting in a discontinuity between \(\tilde{ \vec {v}}\) and \(\tilde{ \vec {v}}_s\) above z = \(d_c\)+2.5.
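The wedge-side Dirichlet condition on the slab surface can be sketched in a few lines of Python. The coupling depth `d_c = 80` is a hypothetical value used only for illustration, while the slab speed and dip are those quoted for the benchmark in Sect. 2.3.5; the function name and its signature are our own.

```python
import math

def wedge_slab_surface_velocity(z, V_s=4.2166, d_c=80.0, ramp=2.5,
                                dip=math.atan(0.5)):
    """Velocity (v_x, v_y) imposed on the wedge along the slab surface.

    Zero above d_c, a linear ramp over the depth interval `ramp`, and full
    slab speed below d_c + ramp. d_c is an assumed value for illustration.
    """
    fraction = min(1.0, max(0.0, (z - d_c) / ramp))
    speed = fraction * V_s
    # With y = -z, downdip motion has a positive x and a negative y component
    return speed * math.cos(dip), -speed * math.sin(dip)

for depth in (50.0, 80.0, 81.25, 82.5, 150.0):
    vx, vy = wedge_slab_surface_velocity(depth)
    print(f"z = {depth:6.2f}  |v| = {math.hypot(vx, vy):.4f}")
```

The slab-side condition differs only in that the full speed \({V}_s\) is imposed along the entire slab surface, which is what produces the velocity discontinuity above the fully coupled depth.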

In the case of time-dependent simulations, we require an initial condition \( {T}^0\). We use an initial condition where the temperature on the slab side is given by \( {T}_\text {trench}\) (107). Above the slab we use \( {T}_\text {backarc,c}\) (108) for ocean–continent subduction or \( {T}_\text {backarc,o}\) (109) for ocean–ocean subduction. Figure 8c shows the initial condition used in the time-dependent benchmark comparison below.

2.3.4 Solution strategy

Sections 2.3.2 and 2.3.3 describe a set of nonlinear, potentially time-dependent equations and boundary conditions for the temperature, velocity and dynamic pressure in a subduction zone. To find their solution, we wish to find the root of the residual \({{\textbf{r}}} = {{\textbf{r}}}_{ \vec {v}} + {\textbf{r}}_{P} + {{\textbf{r}}}_{ \vec {v}_s} + {{\textbf{r}}}_{{P}_s} + {{\textbf{r}}}_{T}\), where

$$\begin{aligned} {{\textbf{r}}}_{ \vec {v}} = r_{ \vec {v}_{i_1}} :=&~ \int _{\Upomega _\text {wedge}} \left[ \left( \frac{\nabla \vec {\omega } _{i_1} + \nabla \vec {\omega } _{i_1}^T}{2}\right) :2\eta \left( \frac{\nabla \tilde{ \vec {v}} + \nabla \tilde{ \vec {v}}^T}{2}\right) - \nabla \cdot \vec {\omega } _{i_1} {\tilde{{P}}} \right] \text{d}x = 0 \end{aligned}$$
(110)
$$\begin{aligned} {{\textbf{r}}}_{P} = r_{P_{i_2}} :=&~ - \int _{\Upomega _\text {wedge}} \chi _{i_2} \nabla \cdot \tilde{ \vec {v}} \text{d}x = 0 \end{aligned}$$
(111)
$$\begin{aligned} {{\textbf{r}}}_{ \vec {v}_s} = r_{ \vec {v}_{s i_3}} :=&~ \int _{\Upomega _\text {slab}} \left[ \left( \frac{\nabla \vec {\omega } _{i_3} + \nabla \vec {\omega } _{i_3}^T}{2}\right) :2\eta \left( \frac{\nabla \tilde{ \vec {v}}_s + \nabla \tilde{ \vec {v}}_s^T}{2}\right) - \nabla \cdot \vec {\omega } _{i_3} {\tilde{{P}}}_s \right] \text{d}x = 0 \end{aligned}$$
(112)
$$\begin{aligned} {{\textbf{r}}}_{{P}_s} = r_{{P}_{s i_4}} :=&~ - \int _{\Upomega _\text {slab}} \chi _{i_4} \nabla \cdot \tilde{ \vec {v}}_s \text{d}x = 0 \end{aligned}$$
(113)

and, in the time-dependent case

$$\begin{aligned} {{\textbf{r}}}_{T}= r_{{T}_{i_5}} :=&~ \int _{\Upomega _\text {wedge}} \left[ \phi _{i_5}\rho \frac{\partial {\tilde{{T}}}}{\partial t} + \phi _{i_5} \tilde{ \vec {v}}\cdot \nabla {\tilde{{T}}} + \nabla \phi _{i_5} \cdot k\nabla {\tilde{{T}}} \right] \text{d}x \nonumber \\&~ + \int _{\Upomega _\text {slab}} \left[ \phi _{i_5}\rho \frac{\partial {\tilde{{T}}}}{\partial t} + \phi _{i_5} \tilde{ \vec {v}}_s\cdot \nabla {\tilde{{T}}} + \nabla \phi _{i_5} \cdot k\nabla {\tilde{{T}}} \right] \text{d}x \nonumber \\&~ + \int _{\Upomega _\text {crust}} \left[ \phi _{i_5}\rho \frac{\partial {\tilde{{T}}}}{\partial t} + \nabla \phi _{i_5} \cdot k\nabla {\tilde{{T}}} - \phi _{i_5} {H}\right] \text{d}x = 0 \end{aligned}$$
(114)

Here, \(\Upomega _\text {wedge}\), \(\Upomega _\text {slab}\) and \(\Upomega _\text {crust}\) are subsets of the domain corresponding to the mantle wedge, slab and overriding crust, respectively. We have yet to discretize the time derivative \(\frac{\partial {\tilde{{T}}}}{\partial t}\) in (114). Here, we choose to do this using finite differences, approximating the derivative by the difference between two discrete time levels

$$\begin{aligned} \frac{\partial {\tilde{{T}}}}{\partial t} \approx \frac{{\tilde{{T}}}^{n+1} - {\tilde{{T}}}^n}{\Delta t^n} \end{aligned}$$
(115)

where \(\Delta t^n = t^{n+1} - t^n\) is the time-step, the difference between the old and new times, and \({\tilde{{T}}}^{n+1}\) and \({\tilde{{T}}}^n\) represent the solution at these time levels. It then only remains to define at what time level the other coefficients in (114) are evaluated and we do this using a “theta” scheme such that

$$\begin{aligned} {{\textbf{r}}}_{T} = r_{{T}_{i_5}} :=&~ \int _{\Upomega _\text {wedge}} \left[ \phi _{i_5}\rho \left( \frac{{\tilde{{T}}}^{n+1} - {\tilde{{T}}}^n}{\Delta t^n}\right) + \phi _{i_5} \tilde{ \vec {v}}^\theta \cdot \nabla {\tilde{{T}}}^\theta + \nabla \phi _{i_5} \cdot k\nabla {\tilde{{T}}}^\theta \right] \text{d}x \nonumber \\&~ + \int _{\Upomega _\text {slab}} \left[ \phi _{i_5}\rho \left( \frac{{\tilde{{T}}}^{n+1} - {\tilde{{T}}}^n}{\Delta t^n}\right) + \phi _{i_5} \tilde{ \vec {v}}_s^\theta \cdot \nabla {\tilde{{T}}}^\theta + \nabla \phi _{i_5} \cdot k\nabla {\tilde{{T}}}^\theta \right] \text{d}x \nonumber \\&~ + \int _{\Upomega _\text {crust}} \left[ \phi _{i_5}\rho \left( \frac{{\tilde{{T}}}^{n+1} - {\tilde{{T}}}^n}{\Delta t^n}\right) + \nabla \phi _{i_5} \cdot k\nabla {\tilde{{T}}}^\theta - \phi _{i_5} {H}\right] \text{d}x = 0 \end{aligned}$$
(116)

where \(\tilde{ \vec {v}}^\theta = \theta _v \tilde{ \vec {v}}^{n+1} + (1-\theta _v)\tilde{ \vec {v}}^n\), \(\tilde{ \vec {v}}_s^\theta = \theta _v \tilde{ \vec {v}}_s^{n+1} + (1-\theta _v)\tilde{ \vec {v}}_s^n\) and \({\tilde{{T}}}^\theta = \theta {\tilde{{T}}}^{n+1} + (1-\theta ){\tilde{{T}}}^n\), and \(\theta _v\), \(\theta \in [0,1]\) are parameters controlling the time level at which the coefficients are evaluated. The parameter \(\theta\) controls the stability and accuracy of the time integration scheme. Common choices are \(\theta\) = 0 (explicit Euler), \(\theta\) = 1 (implicit Euler) and \(\theta\) = 0.5 (Crank–Nicolson).
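The effect of \(\theta\) on accuracy and stability is easiest to see on a scalar model problem. The sketch below applies the theta scheme to the decay equation \(\text{d}T/\text{d}t = -\lambda T\), which is an assumption standing in for the discretized heat equation; the function names and parameter values are illustrative only.

```python
import math

def theta_step(T_old, dt, lam, theta):
    """One theta-scheme step for dT/dt = -lam * T, i.e.
    (T_new - T_old) / dt = -lam * (theta * T_new + (1 - theta) * T_old)."""
    return T_old * (1.0 - (1.0 - theta) * lam * dt) / (1.0 + theta * lam * dt)

def integrate(theta, lam=1.0, dt=0.1, n_steps=10, T_init=1.0):
    """Advance n_steps time steps of size dt and return the final value."""
    T = T_init
    for _ in range(n_steps):
        T = theta_step(T, dt, lam, theta)
    return T

exact = math.exp(-1.0)  # analytical solution at t = 1 for lam = 1
errors = {theta: abs(integrate(theta) - exact) for theta in (0.0, 0.5, 1.0)}
for theta, err in errors.items():
    print(f"theta = {theta:.1f}  error at t = 1: {err:.2e}")

# With a large time step the explicit scheme amplifies the solution
# while the implicit scheme still decays.
amp_explicit = abs(theta_step(1.0, 3.0, 1.0, 0.0))
amp_implicit = abs(theta_step(1.0, 3.0, 1.0, 1.0))
print(f"amplification for lam*dt = 3: explicit {amp_explicit}, implicit {amp_implicit}")
```

Crank–Nicolson is second-order accurate in time, which is why its error is markedly smaller than that of either Euler variant at the same time step.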

At each time level, (110)–(113) and (116) represent a nonlinear problem, which we solve using a Picard iteration, first solving (116), then solving (110)–(113) using the most up-to-date temperature, \({\tilde{{T}}}^{n+1}\), and repeating until the root of the residual, \({{\textbf{r}}}\), is found to some tolerance. The time level and all solution variables are then updated and a new time level and new Picard iteration are commenced. The time-step \(\Delta t^n\) is chosen such that the maximum Courant number, \(c^n_\text {max}\) = \(\max \left( \frac{\max \left( \tilde{ \vec {v}}^n\right) \Delta t^n}{h_e}, \frac{\max \left( \tilde{ \vec {v}}_s^n\right) \Delta t^n}{h_e}\right)\), where \(h_e\) is a measure of the local element size, does not exceed some critical value, \(c^n_\text {max} \le c_\text {crit}\). This procedure is repeated until the final time (the age of subduction, \({A}_s^*\)) is reached.
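The time-step restriction can be sketched as follows. We assume here that the Courant number is evaluated element by element from the local speed and local element size, which is one possible reading of the expression above; the function name and the sample values are hypothetical.

```python
import numpy as np

def courant_limited_dt(speed, h_e, c_crit=1.0):
    """Largest time step for which max(speed * dt / h_e) <= c_crit.

    `speed` and `h_e` hold one value per element (illustrative assumption).
    """
    speed = np.asarray(speed, dtype=float)
    h_e = np.asarray(h_e, dtype=float)
    rate = np.max(speed / h_e)  # largest local advection rate
    if rate == 0.0:
        return np.inf           # no flow, so no advective restriction
    return c_crit / rate

# Hypothetical element speeds and sizes (nondimensional)
speeds = [4.2166, 1.0, 0.0]
sizes = [1.0, 0.5, 2.0]
dt = courant_limited_dt(speeds, sizes)
c_max = np.max(np.asarray(speeds) * dt / np.asarray(sizes))
print(f"dt = {dt:.5f}, resulting maximum Courant number = {c_max:.3f}")
```

In a coupled calculation both the wedge and slab velocities would be passed in together, so that the faster of the two controls the time step.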

If we are seeking the steady-state solution (\(\tfrac{\partial {T}}{\partial t} = 0\)), we solve (110)–(113) but (114) becomes

$$\begin{aligned} {{\textbf{r}}}_{T} = r_{{T}_{i_5}} :=&~ \int _{\Upomega _\text {wedge}} \left[ \phi _{i_5} \tilde{ \vec {v}}\cdot \nabla {\tilde{{T}}} + \nabla \phi _{i_5} \cdot k\nabla {\tilde{{T}}} \right] \text{d}x \nonumber \\&~ + \int _{\Upomega _\text {slab}} \left[ \phi _{i_5} \tilde{ \vec {v}}_s\cdot \nabla {\tilde{{T}}} + \nabla \phi _{i_5} \cdot k\nabla {\tilde{{T}}} \right] \text{d}x \nonumber \\&~ + \int _{\Upomega _\text {crust}} \left[ \nabla \phi _{i_5} \cdot k\nabla {\tilde{{T}}} - \phi _{i_5} {H}\right] \text{d}x = 0 \end{aligned}$$
(117)

where a theta-scheme approach is no longer required because no time levels exist. A Picard iteration is used to approximately find \({{\textbf{r}}}\) = \({{\textbf{0}}}\), this time solving (110)–(113) first followed by (117). At the beginning of the simulation, we find an isoviscous (\(\eta\) = 1) solution to (110)–(113) to initialize the velocity and pressure.

2.3.5 An optimized subduction zone benchmark

The community subduction zone benchmark in van Keken et al. (2008) provides a set of simplified models well suited to test the accuracy of the solution of the governing equations that are relevant for subduction zones. Unfortunately, the model geometry and assumptions that were chosen at the time are such that they introduce a few artifacts that do not occur, as far as we know, in any subduction zone on Earth. These include a slab that dips at a constant angle of 45\(^\circ\) to 600 km depth, an overriding plate that excludes continental heat production, and the imposition of slab-wedge coupling at 50 km rather than at 75–80 km depth. The lack of crustal heating and the large width of the model, combined with the assumption of steady state, lead in the cases with temperature-dependent rheology to a very thick top boundary layer. This is caused by the cooling in the lithosphere, which results in a gradual thickening of the overriding lid in regions of the model that are far away from the arc-side boundary condition. While this is less of a problem in time-dependent problems (where time may not be sufficient for significant growth of the boundary layer), it shows up dramatically as a “viscous belly” in steady-state cases when the model domain is large (as it was in van Keken et al. 2008). In time-dependent models, it can show up if integration time is very long compared to the typical age of subduction zones (Hall 2012). The models in Syracuse et al. (2010) avoided this issue by using time integration to only \(\sim\)20–40 Myr. The models in Wada and Wang (2009) avoided it by using steady-state models in a domain that is both narrower and shallower than that of the van Keken et al. (2008) benchmark.

To mitigate the artifacts of the previous benchmark, we propose a new benchmark model. Modifications include a more shallowly dipping slab that only extends to a depth of 200 km, the incorporation of radiogenic heating in the overriding crust and a deeper slab-wedge coupling point. We will also replace some of the requested model outputs from van Keken et al. (2008) with proper integrals. We will use the simplified geometry as in Fig. 8 with constant slab dip \(\Delta\) = \(\tan ^{-1}(1/2)\) = 26.56505\(^\circ\) with respect to the horizontal. The maximum depth D = 200 defines L = 400. Crustal depths \(z_1\) and \(z_2\) are chosen as 15 and 40, respectively. The depth \(z_\text {io}\) depends on wedge geometry and rheology and is therefore variable between models. To find it, we performed a simple iteration in the modeling by setting \(z_\text {io}\) first to a constant value, finding the solution to the nonlinear system, determining the actual value of \(z_\text {io}\) from the wedge flow and then imposing this value in a subsequent solution of the nonlinear system. While this approach guarantees appropriate implementation of the switch from Dirichlet to Neumann boundary conditions for the heat equation as stated above, we have found that, as long as \(z_\text {io}\) is larger than the depth where the actual switch between inflow and outflow occurs, nearly identical solutions are obtained.

We will assume the reference values in Table 2 with case-specific parameters given in Table 3. The benchmark assumes ocean-continent subduction with heat production in a two-layer crust with crustal density and thermal conductivity (\(\rho _c\) and \(k_c\), respectively) distinct from the mantle (\(\rho _m\) and \(k_m\)) and a backarc boundary condition on temperature given by \({T}_\text {backarc,c}(z)\) (108). We will solve (110)–(113) either with constant viscosity (\(\eta\) = 1, case 1) or with temperature- and strain-rate-dependent viscosity following (103) (case 2). The heat equation will be solved under the assumption of steady state (117) for the benchmark, but we will also discuss some time-dependent results below. For the incoming lithosphere, we will assume \(z_d=97.397\) (corresponding to a dimensional age of the incoming lithosphere \({A}^*\) = 100 Myr) and convergence speed \({V}_s\) = 4.2166 (corresponding to a dimensional speed of 10 cm/yr).
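The quoted nondimensional values can be checked with a few lines of Python using the scalings of Sect. 2.3.2. The reference length \(h_0\) = 1 km follows from the statement that nondimensional positions match their values in km. The reference diffusivity below is an assumption on our part, chosen because it is consistent with the quoted \(z_d\) and \({V}_s\); the authoritative values are those of Table 2.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24.0 * 3600.0
H0 = 1.0e3                          # reference length h_0 = 1 km, in m
KAPPA0 = 3.1 / (3300.0 * 1250.0)    # ASSUMED reference diffusivity, m^2/s

def nondim_velocity(v_cm_per_yr):
    """v = v* h_0 / kappa_0 for a speed given in cm/yr."""
    v_si = v_cm_per_yr * 1.0e-2 / SECONDS_PER_YEAR
    return v_si * H0 / KAPPA0

def scale_depth(age_myr):
    """z_d = 2 sqrt(kappa_0 A*) / h_0 for an age given in Myr."""
    age_si = age_myr * 1.0e6 * SECONDS_PER_YEAR
    return 2.0 * math.sqrt(KAPPA0 * age_si) / H0

dip = math.atan(0.5)          # slab dip of the benchmark, in radians
D = 200.0                     # nondimensional model depth
L = D / math.tan(dip)         # model width implied by depth and dip

print(f"V_s = {nondim_velocity(10.0):.4f}")
print(f"z_d = {scale_depth(100.0):.3f}")
print(f"dip = {math.degrees(dip):.5f} degrees, L = {L:.1f}")
```

Small differences in the last digit of \(z_d\) can arise from the adopted length of a year, so agreement to about three decimal places is all that should be expected from such a check.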

2.3.6 Benchmark comparison TerraFERMA–Sepran

In the benchmark comparison, we focus on dimensional metrics representing the averaged thermal and velocity structures near the coupling point, where gradients in velocity and temperature are high. The first metric is the slab temperature at 100 km depth, \({T}_{(200,-100)}^*\):

$$\begin{aligned} {T}_{(200,-100)}^* ~=~ {T}_0 {\tilde{{T}}}(x=200,y=-100) \end{aligned}$$
(118)

The second metric is the average temperature \({\overline{{T}}}_s^*\) along the slab surface between depths \(z_{s,1}\) = 70 and \(z_{s,2}\) = 120, that is,

$$\begin{aligned} {\overline{{T}}}_s^* ~=~ {T}_0 \frac{\int _{s_1}^{s_2} {\tilde{{T}}} \text{d}s}{\int _{s_1}^{s_2} \text{d}s} \end{aligned}$$
(119)

where s is distance along the slab top from the trench and \(s_1\) = \(\sqrt{5z_{s,1}^2}\) = 156.52476 and \(s_2\) = \(\sqrt{5z_{s,2}^2}\) = 268.32816. The third metric is the volume-averaged temperature \({\overline{{T}}}_w^*\) in the mantle wedge corner, bounded above by the Moho, z = \(z_2\), and below by the slab surface, z = \(z_\text {slab}(x)\), over the horizontal range where the slab surface lies between \(z_{s,1}\) and \(z_{s,2}\) as defined above:

$$\begin{aligned} {\overline{{T}}}_w^* ~=~ {T}_0 \frac{\int _{x=140}^{x=240}\int _{z=z_2}^{z=z_\text {slab}(x)} {\tilde{{T}}} \text{d}z \text{d}x}{\int _{x=140}^{x=240}\int _{z=z_2}^{z=z_\text {slab}(x)} \text{d}z \text{d}x} \end{aligned}$$
(120)

where \(z_{\text{slab}}(x)\) = x/2. The final metric is the root-mean-square velocity \({V}_{{\text{rms}},w}^*\) in the same volume as the third metric, that is,

$$\begin{aligned} {V}_{\text {rms},w}^*~=~ v_0 \sqrt{ \frac{\int _{x=140}^{x=240}\int _{z=z_2}^{z=z_{\text {slab}}(x)} \left( \tilde{ \vec {v}}\cdot \tilde{ \vec {v}} \right) \text{d}z\text{d}x}{\int _{x=140}^{x=240}\int _{z=z_2}^{z=z_\text {slab}(x)} \text{d}z \text{d}x} }. \end{aligned}$$
(121)
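The integral metrics (119)–(121) can be approximated by simple quadrature once temperature and velocity can be evaluated at arbitrary points. The sketch below is a minimal illustration, not the post-processing used in TerraFERMA or Sepran. It takes depth z as positive downward, works with nondimensional fields (multiplication by \({T}_0\) or \(v_0\) is left to the caller), and uses hypothetical callables `field(x, z)` in place of an actual finite element solution.

```python
import numpy as np

Z_MOHO = 40.0                 # z_2 from the text
X_MIN, X_MAX = 140.0, 240.0   # slab surface lies at depths 70 to 120 over this range
SQRT5 = np.sqrt(5.0)


def trapezoid(y, x):
    """Composite trapezoidal rule for samples y at points x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def slab_surface_average(field, z_top=70.0, z_bot=120.0, n=2001):
    """Average of field(x, z) along the slab top, following (119)."""
    s = np.linspace(SQRT5 * z_top, SQRT5 * z_bot, n)  # distance from the trench
    z = s / SQRT5                                     # depth of the slab top at s
    x = 2.0 * z                                       # because z_slab(x) = x / 2
    return trapezoid(field(x, z), s) / (s[-1] - s[0])


def wedge_integral(field, nx=401, nt=201):
    """Integral of field(x, z) over the wedge corner used in (120) and (121)."""
    x = np.linspace(X_MIN, X_MAX, nx)
    t = np.linspace(0.0, 1.0, nt)                     # maps to z in [z_2, z_slab(x)]
    X, T = np.meshgrid(x, t, indexing="ij")
    height = 0.5 * X - Z_MOHO                         # z_slab(x) - z_2
    Z = Z_MOHO + T * height
    integrand = field(X, Z) * height                  # includes the Jacobian dz/dt
    inner = np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * np.diff(t), axis=1)
    return trapezoid(inner, x)


def wedge_average(field):
    """Volume-averaged value of field in the wedge corner, following (120)."""
    area = wedge_integral(lambda x, z: np.ones_like(x))
    return wedge_integral(field) / area


def wedge_vrms(vx, vz):
    """Root-mean-square velocity in the wedge corner, following (121)."""
    area = wedge_integral(lambda x, z: np.ones_like(x))
    return np.sqrt(wedge_integral(lambda x, z: vx(x, z) ** 2 + vz(x, z) ** 2) / area)


# Hypothetical test field: temperature equal to depth (for illustration only).
print(slab_surface_average(lambda x, z: z), wedge_average(lambda x, z: z))
```

The denominator of (120) and (121) can be evaluated analytically as \(\int _{140}^{240}(x/2-40)\,\text{d}x\) = 5500, which provides a convenient check on any implementation of the integration region.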

Figure 9 shows the temperature fields obtained with TerraFERMA and temperature differences between the TerraFERMA and Sepran models. Convergence behavior on a series of finer meshes as a function of the number of degrees of freedom in the heat equation using metrics (118)–(121) is shown in Tables 4 and 5.

Note that even on the coarser grids the metrics are generally within 1% of those on the finest grids. For case 1, the TerraFERMA and Sepran results converge toward the same limit to reasonable precision. For case 2, there seems to be a slight but systematic difference, particularly for \({\overline{{T}}}_w^*\) and \({V}_{\text {rms},w}^*\). Inspection of Fig. 9e shows the likely reason for the differences: a systematic bubble appears in \(\Delta {T}\) right above the coupling point. We attribute this to how the two methods treat pressure, and we will see more examples of this in part III.

2.3.7 Comparison of the time-dependent solution to that assuming steady state

Solving for the time-dependent solution given the same geometry, boundary conditions and parameters demonstrates how similar the steady-state and time-dependent solutions are after sufficient time in this optimized benchmark. The time-dependent slab top temperature evolution until \(t^*\,=\, {A}_s^*\,=\,25 \,\,\text{Myr}\) is shown in Fig. 10a and that at the Moho is in Fig. 10b. In both cases, we plot the temperature to the depth that the subducting slab has reached after a given time interval. The temperature curves show a gradual convergence to the steady-state solution (the dashed line). The temperature at 25 Myr is given in Fig. 10c (compare with Fig. 9d) and the temperature difference between that at 25 Myr and the steady-state case is shown in Fig. 10d—clearly the forearc thermal structure is the slowest part of the model to adjust to steady state.

The benchmark has been designed so that the time-dependent solution after 25 Myr is close to the steady-state solution. However, this similarity does not generally hold in other geometries, so time-dependent solutions remain necessary when considering a larger suite of models and therefore form the bulk of the results presented in part III. Due to the slow evolution of the subduction system, we found in the time-dependent version of the benchmark that fully converging the residual, \({{\textbf{r}}}\), was not necessary for an accurate solution: doing so made only extremely minor differences after 25 Myr of evolution. Linearizing the problem and taking only a single Picard iteration at each time level represents a considerable computational cost saving, so we adopt that approach in part III. TerraFERMA results are presented using \(\theta _v\) = \(\theta\) = 0.5. Sepran uses \(\theta _v\) = \(\theta\) = 1 and both use \(c_\text {crit} = 1\) in all time-dependent results shown there.
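The idea of taking a single Picard iteration per time level can be illustrated on a much simpler problem than the subduction benchmark. The sketch below integrates a 1D nonlinear diffusion equation with a \(\theta\)-weighted time discretization (\(\theta\) = 0.5 is Crank–Nicolson, \(\theta\) = 1 is backward Euler), evaluating the nonlinear coefficient only at the previous time level. It assumes that \(\theta\) here plays the same role as the time-weighting parameter quoted above. Everything else is a stand-in chosen for illustration: the finite difference discretization, the hypothetical conductivity 1 + T, the mesh, and the time step are not taken from TerraFERMA or Sepran.

```python
import numpy as np


def conductivity(T):
    """Hypothetical temperature-dependent conductivity (illustration only)."""
    return 1.0 + T


def theta_step(T, dt, dx, theta):
    """One theta-scheme step with the conductivity lagged at the old time level.

    This is a single Picard iteration: the linearized system is solved once
    and the nonlinear residual is not converged further.
    """
    n = T.size
    k_face = 0.5 * (conductivity(T[:-1]) + conductivity(T[1:]))  # lagged coefficient
    K = np.zeros((n, n))                   # rows 0 and n-1 stay zero: Dirichlet nodes
    for i in range(1, n - 1):
        K[i, i - 1] = k_face[i - 1] / dx**2
        K[i, i + 1] = k_face[i] / dx**2
        K[i, i] = -(k_face[i - 1] + k_face[i]) / dx**2
    lhs = np.eye(n) - theta * dt * K
    rhs = (np.eye(n) + (1.0 - theta) * dt * K) @ T
    return np.linalg.solve(lhs, rhs)


def run(theta, n=21, dt=2.0e-3, n_steps=3000):
    """Evolve from a cold interior toward steady state with T(0) = 1, T(1) = 0."""
    x = np.linspace(0.0, 1.0, n)
    T = np.zeros(n)
    T[0] = 1.0
    dx = x[1] - x[0]
    for _ in range(n_steps):
        T = theta_step(T, dt, dx, theta)
    return x, T


x, T_cn = run(theta=0.5)   # Crank-Nicolson
_, T_be = run(theta=1.0)   # backward Euler
print(np.max(np.abs(T_cn - T_be)))
```

For this toy problem the steady state satisfies \(T + T^2/2 = 1.5(1-x)\), so both choices of \(\theta\) can be compared against a known limit. The transient paths differ between the two schemes, but the lagged coefficient does not prevent either from reaching the same steady state.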

3 Conclusions

By constructing a series of demonstration problems, we have shown how finite element models can be constructed, tested, and validated. Once validated, these simpler systems of equations can be used as building blocks to develop a kinematic–dynamic model of subduction zone thermal structure. We propose a new benchmark problem for subduction zones that incorporates more of the physical complexity associated with their thermal structure while avoiding some of the pitfalls associated with nonphysical geometries and assumptions of the original van Keken et al. (2008) benchmark. This has been demonstrated with two independent finite element approaches (TerraFERMA and Sepran) that also use different discretization strategies. In part III, we will use these models and apply the discretization and solution strategies described here to a global suite of subduction zones. We will discuss where they agree and disagree, both with each other and with published observations of subduction zone thermal structure.

Fig. 1

a Illustration of the discretization of the 1D unit domain into four elements \(e_k\) with five nodal points \(x_i\). The two linear (P1) Lagrange shape functions \(\phi _i\) are shown that are nonzero in element \(e_2\). b Illustration of quadratic (P2) shape functions that are nonzero on element \(e_2\). The mesh still has four elements, but each element now has internal nodal points (indicated by open red circles)

Fig. 2

FEM solution to 1D Poisson equation. a Approximate FEM solution obtained using a mesh of four P1 elements compared to analytical solution (black). Discrete values from the solution vector \({{\textbf{T}}}\) are shown in red circles. The interpolated approximate solution is shown by the green dashed line. b Same as a but now for four P2 elements

Fig. 3

Convergence analysis for the 1D Poisson problem with P1 (blue) and P2 (orange) elements. The error following metric (32) is shown as a function of nodal point spacing h. Symbols represent individual experiments—the lines show the trend. With P1 we find second-order convergence, whereas with P2 we find a smaller error at a given h and a faster, third-order, convergence rate

Fig. 4

Poisson 2D example. Convergence of the error as a function of nodal point spacing h for P1 and P2 elements. Inset: solution field T

Fig. 5

Batchelor cornerflow geometry and example model solution. a Specification of Cartesian (x, y) and polar (\(r, \theta\)) coordinate systems as well as boundary conditions. b Solution for \(\psi\) and \(\vec {v}\) on geometry \(\Upomega =[0,1]\times [0,1]\) with U = 1. Stream function contours are at arbitrary intervals

Fig. 6

a Thermal convection benchmark description. b Select model solution for case 1a from Blankenbach et al. (1989)

Fig. 7

Convergence characteristics for Nu (frame a) and \({V}_\text {rms}\) (frame b) for the thermal convection benchmarks 1a–1c and 2a. Dashed lines with triangular symbols denote Sepran results. TerraFERMA results are shown by the solid lines with circles. The difference between the original Blankenbach “best” estimates and our new extrapolated results is shown by the diamonds at an arbitrary point on the x-axis

Fig. 8

a Geometry and coefficients for a simplified 2D subduction zone model specifically for the proposed new benchmark. All coefficients and parameters in the graph are nondimensional. The decoupling point is indicated by the star. b Example mesh (upper frame) constituted of triangles for the new benchmark geometry with zoom in (lower frame). This particular example is for TerraFERMA with 83,935 degrees of freedom in the heat equation. Size of the finite elements ranges from 1 km near the coupling point where the solution gradients are highest to up to 6 km away from thermal boundary layers. Red solid line is the top of the slab. Dashed red line is the slab Moho. c Initial condition for the time-dependent benchmark problem

Fig. 9

Steady-state thermal structure for the updated subduction zone benchmark. a Temperature predicted by TF for case 1; b temperature difference between TF and Sepran using the penalty function (PF) method for case 1 at \(f_m\) = 1 where \(f_m\) represents the smallest element sizes in the finite element grids near the coupling point; c slab top temperature comparison for case 1; d–f As a–c but now for case 2. The star indicates the position or temperature conditions at the coupling point

Fig. 10

Time-dependent example based on the new subduction zone benchmark. a Evolution of slab top temperature as a function of time—curves are plotted only to the depth that the slab tip has reached at each time; b As a but now for the slab Moho; c temperature at 25 Myr (compare to steady-state thermal structure in Fig. 9d); d difference between temperature of the time-dependent solution after 25 Myr and the steady-state solution—while the slab thermal structure is nearly identical, the cold corner is still evolving at 25 Myr toward the steady-state structure. Star indicates the location of the coupling point or its steady-state temperature

Table 1 Best values from Blankenbach et al. (1989) and our averaged extrapolated values from current models for selected benchmark values (see text)
Table 2 Nomenclature and reference values
Table 3 Benchmark parameter values
Table 4 Convergence of various metrics describing the solution to the new subduction zone benchmark as a function of degrees of freedom in the heat equation \({T}_\text {ndof}\). The employed meshes have grid refinement in the wedge above and near the coupling point. The factor \(f_m\) is representative of the element size near the coupling point. TH Taylor–Hood, PF penalty function method. P2P1P2 indicates a discretization that has quadratic shape functions (P2) for velocity and temperature and linear shape functions for pressure (P1). P2P2 is for velocity and temperature only because pressure is eliminated from the Stokes equation in the penalty function method (Cuvelier et al. 1986). In this case, \(z_\text {io}\) = 139
Table 5 As Table 4 but now for case 2 with stress- and temperature-dependent viscosity. In this case, \(z_\text {io}\) = 154