Abstract
Relaxation Runge–Kutta methods reproduce a fully discrete dissipation (or conservation) of entropy for entropy stable semi-discretizations of nonlinear conservation laws. In this paper we derive the discrete adjoint of relaxation Runge–Kutta schemes, which is applicable to discretize-then-optimize approaches for optimal control problems. Furthermore, we prove that the derived discrete relaxation Runge–Kutta adjoint preserves time-symmetry when applied to linear skew-symmetric systems of ODEs. Numerical experiments verify these theoretical results while demonstrating the importance of appropriately treating the relaxation parameter when computing the discrete adjoint.
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Alexander, R.: Diagonally implicit Runge–Kutta methods for stiff ODE’s. SIAM J. Numer. Anal. 14(6), 1006–1021 (1977)
Alexe, M., Sandu, A.: On the discrete adjoints of adaptive time stepping algorithms. J. Comput. Appl. Math. 233(4), 1005–1020 (2009)
Antil, H., Leykekhman, D.: A brief introduction to PDE-constrained optimization. In: Frontiers in PDE-Constrained Optimization, pp. 3–40. Springer (2018)
Bencomo, M.J., Symes, W.W.: Discretization of multipole sources in a finite difference setting for wave propagation problems. J. Comput. Phys. 386, 296–322 (2019)
Calvo, M., Hernández-Abreu, D., Montijano, J.I., Rández, L.: On the preservation of invariants by explicit Runge–Kutta methods. SIAM J. Sci. Comput. 28(3), 868–885 (2006)
Eberhard, P., Bischof, C.: Automatic differentiation of numerical integration algorithms. Math. Comput. 68(226), 717–731 (1999)
Griewank, A.: A mathematical view of automatic differentiation. Acta Numer. 12, 321–398 (2003)
Griewank, A., Bischof, C., Corliss, G., Carle, A., Williamson, K.: Derivative convergence for iterative equation solvers. Optim. Methods Softw. 2(3–4), 321–355 (1993)
Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, Philadelphia (2008)
Gunzburger, M.D.: Perspectives in Flow Control and Optimization. SIAM, Philadelphia (2002)
Hager, W.W.: Runge–Kutta methods in optimal control and the transformed adjoint system. Numer. Math. 87(2), 247–282 (2000)
Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, vol. 31. Springer, New York (2006)
Hernandez, D.M., Bertschinger, E.: Time-symmetric integration in astrophysics. Mon. Not. R. Astron. Soc. 475(4), 5570–5584 (2018)
Ketcheson, D.I.: Relaxation Runge–Kutta methods: conservation and stability for inner-product norms. SIAM J. Numer. Anal. 57(6), 2850–2870 (2019)
Moczo, P., Robertsson, J.O., Eisner, L.: The finite-difference time-domain method for modeling of seismic wave propagation. Adv. Geophys. 48, 421–516 (2007)
Persson, P.O.: High-order Navier–Stokes simulations using a sparse line-based discontinuous Galerkin method. In: 50th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, p. 456 (2012)
Plessix, R.E.: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J. Int. 167(2), 495–503 (2006)
Ranocha, H., Ketcheson, D.I.: ConvexRelaxationRungeKutta. Relaxation Runge–Kutta methods for convex functionals. https://github.com/ranocha/ConvexRelaxationRungeKutta (2019). https://doi.org/10.5281/zenodo.3066518
Ranocha, H., Lóczi, L., Ketcheson, D.I.: General relaxation methods for initial-value problems with application to multistep schemes. Numer. Math. 146(4), 875–906 (2020)
Ranocha, H., Sayyari, M., Dalcin, L., Parsani, M., Ketcheson, D.I.: Relaxation Runge–Kutta methods: fully discrete explicit entropy-stable schemes for the compressible Euler and Navier–Stokes equations. SIAM J. Sci. Comput. 42(2), A612–A638 (2020)
Rothauge, K.: The discrete adjoint method for high-order time-stepping methods. Ph.D. thesis, University of British Columbia (2016)
Sandu, A.: On the properties of Runge–Kutta discrete adjoints. In: International Conference on Computational Science, pp. 550–557. Springer (2006)
Sanz-Serna, J.M.: Symplectic Runge–Kutta schemes for adjoint equations, automatic differentiation, optimal control, and more. SIAM Rev. 58(1), 3–33 (2016)
Shu, C.W., Osher, S.: Efficient implementation of essentially non-oscillatory shock-capturing schemes. J. Comput. Phys. 77(2), 439–471 (1988)
Sirkes, Z., Tziperman, E.: Finite difference of adjoint or adjoint of finite difference? Mon. Weather Rev. 125(12), 3373–3378 (1997)
Virieux, J.: SH-wave propagation in heterogeneous media: velocity-stress finite-difference method. Geophysics 49(11), 1933–1942 (1984)
Walther, A.: Automatic differentiation of explicit Runge–Kutta methods for optimal control. Comput. Optim. Appl. 36(1), 83–108 (2007)
Wilcox, L.C., Stadler, G., Bui-Thanh, T., Ghattas, O.: Discretely exact derivatives for hyperbolic PDE-constrained optimization problems discretized by the discontinuous Galerkin method. J. Sci. Comput. 63(1), 138–162 (2015)
Acknowledgements
Mario J. Bencomo was supported by the Pfeiffer Postdoctoral Instructorship. Jesse Chan gratefully acknowledges support from the National Science Foundation under award DMS-CAREER-1943186.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by MJB. The first draft of the manuscript was written by MJB and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
In this appendix we present detailed derivations of the linearization and adjoint computations for RK and its relaxation variants using a matrix representation. We use an approach and notation similar to [21]. The plan is to interpret time-stepping algorithms as solutions to global matrix-vector systems and use the Jacobians of said systems to deduce time-stepping schemes for the discrete linearization and adjoint.
We first recall and introduce some notation:

- \(N\) is the dimension of the state vector \(\textbf{y}(t)\) in IVP (1);
- \(K\) is the total number of time steps taken by a given time-stepping scheme;
- \(s\) is the number of internal stages for an RK method;
- \(\textbf{A}_s, \textbf{b}_s, \textbf{c}_s\) are the coefficients of a specified \(s\)-stage RK method;
- The concatenation of vectors indexed by internal stages is denoted by simply removing the internal stage index, e.g., $$\begin{aligned} \textbf{Y}_k := \left( \begin{array}{c} \textbf{Y}_{k,1} \\ \vdots \\ \textbf{Y}_{k,s} \end{array}\right) \in {\mathbb {R}}^{sN}; \end{aligned}$$
- Vectors of size \(\overline{N} :=N + N(s+1)K\) are denoted using bold font and an overline, with components denoted as follows: $$\begin{aligned} \overline{\textbf{y}} := \left( \begin{array}{c} \textbf{y}_0 \\ \textbf{Y}_1 \\ \textbf{y}_1 \\ \vdots \\ \textbf{Y}_{K} \\ \textbf{y}_{K} \end{array}\right) , \end{aligned}$$ with \(\textbf{y}_k\in {\mathbb {R}}^N\) and \(\textbf{Y}_k\in {\mathbb {R}}^{sN}\);
- The matrix \(\varvec{\chi }_k\in {\mathbb {R}}^{\overline{N}\times N}\) is defined such that \(\varvec{\chi }_k^\top \overline{\textbf{y}} = \textbf{y}_k\), extracting the \(k\)-th step vector. Moreover, for a given \(\textbf{v}\in {\mathbb {R}}^N\), \(\overline{\textbf{w}} :=\varvec{\chi }_k \textbf{v}\) is a vector of length \(\overline{N}\) with zero entries everywhere except at \(\textbf{w}_k = \textbf{v}\);
- \(\textbf{I}_{M}\) denotes the \(M\times M\) identity matrix;
- \(\textbf{0}_{M}\) is the zero vector of dimension \(M\);
- \(\textbf{0}_{M_1\times M_2}\) is the \(M_1\times M_2\) zero matrix;
- \(\otimes \) denotes the Kronecker product.
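To make this notation concrete, the following sketch (with hypothetical sizes \(N\), \(s\), \(K\); all names are ours, not the paper's) builds the selector \(\varvec{\chi }_k\) and a Kronecker-product block matrix with NumPy:

```python
import numpy as np

# Illustrative sizes only (not from the paper): N-dim state, s stages, K steps.
N, s, K = 2, 3, 4
Nbar = N + N * (s + 1) * K        # length of an "overlined" global vector

# The global vector stacks y_0, then (Y_k, y_k) for k = 1, ..., K.
ybar = np.arange(Nbar, dtype=float)

def chi(k):
    """Selector chi_k in R^{Nbar x N}: chi(k).T @ ybar extracts y_k."""
    X = np.zeros((Nbar, N))
    offset = 0 if k == 0 else N + (k - 1) * (s + 1) * N + s * N
    X[offset:offset + N, :] = np.eye(N)
    return X

y0 = chi(0).T @ ybar              # the first N entries (initial state slot)
yK = chi(K).T @ ybar              # the last N entries (final state slot)

# Kronecker products build the repeated block structure, e.g.
# C = 1_s (x) I_N stacks s copies of the identity:
C = np.kron(np.ones((s, 1)), np.eye(N))   # shape (s*N, N)
```

Conversely, `chi(k) @ v` embeds a length-\(N\) vector into the \(\textbf{y}_k\) slot of an otherwise zero global vector.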
1.1 RK Matrix Representation
A single step of the RK method, as specified by Eq. (2), can be written in matrix form as
$$\begin{aligned} \textbf{Y}_k = \textbf{C}\textbf{y}_{k-1} + \textbf{A}\textbf{F}_k, \qquad \textbf{y}_k = \textbf{y}_{k-1} + \textbf{B}^\top \textbf{F}_k, \end{aligned}$$
where
- \(\textbf{A} :=\varDelta t\, \textbf{A}_s \otimes \textbf{I}_N \in {\mathbb {R}}^{sN\times sN}\),
- \(\textbf{B} :=\varDelta t\, \textbf{b}_s \otimes \textbf{I}_N \in {\mathbb {R}}^{sN\times N}\),
- \(\textbf{C} :=\textbf{1}_s \otimes \textbf{I}_N \in {\mathbb {R}}^{sN\times N}\), with \(\textbf{1}_s :=(1,\ldots ,1)^\top \in {\mathbb {R}}^{s}\),
- \(\textbf{F}_k\) is the concatenation of the \(\textbf{F}_{k,i}:=\textbf{f}(\textbf{Y}_{k,i},t_{k-1}+c_i \varDelta t)\), and can thus be viewed as a vector-valued function of \(\textbf{Y}_k\).
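A single step in this matrix form can be sketched as follows, assuming an explicit tableau so the stage system is solved by forward evaluation. The function name, the test problem \(f(y,t)=-y\), and the use of the classical RK4 tableau are our illustrative choices, not the paper's:

```python
import numpy as np

def rk_step_matrix(f, y_prev, t_prev, dt, A_s, b_s, c_s):
    """One RK step: Y_k = C y_{k-1} + A F_k, then y_k = y_{k-1} + B^T F_k.
    Assumes A_s is strictly lower triangular (explicit method)."""
    s, N = len(b_s), len(y_prev)
    A = dt * np.kron(A_s, np.eye(N))                 # stage-coupling block A
    B = dt * np.kron(b_s.reshape(-1, 1), np.eye(N))  # B in R^{sN x N}
    C = np.kron(np.ones((s, 1)), np.eye(N))          # replicates y_{k-1}
    Y = np.zeros(s * N)                              # concatenated stages Y_k
    F = np.zeros(s * N)                              # concatenated F_k
    for i in range(s):  # forward evaluation, valid since A_s is strictly lower triangular
        Y[i*N:(i+1)*N] = (C @ y_prev + A @ F)[i*N:(i+1)*N]
        F[i*N:(i+1)*N] = f(Y[i*N:(i+1)*N], t_prev + c_s[i] * dt)
    return y_prev + B.T @ F                          # y_k = y_{k-1} + B^T F_k

# Classical RK4 tableau applied to y' = -y (illustrative test problem):
A_s = np.array([[0,0,0,0],[0.5,0,0,0],[0,0.5,0,0],[0,0,1,0]])
b_s = np.array([1/6, 1/3, 1/3, 1/6])
c_s = np.array([0.0, 0.5, 0.5, 1.0])
y1 = rk_step_matrix(lambda y, t: -y, np.array([1.0]), 0.0, 0.1, A_s, b_s, c_s)
```

For this linear test problem, `y1` reproduces the fourth-order Taylor polynomial of \(e^{-\varDelta t}\), as expected for RK4.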
The RK method as a whole can be represented as a concatenation of the matrix systems above, resulting in a global system of time-stepping equations:
with
We make some remarks and observations:

- Boxes are meant to help visually separate blocks associated with different time steps in both \(\textbf{L}\) and \(\textbf{N}(\textbf{y})\).
- The unboxed \(\textbf{I}_N\) in the top left corner of \(\textbf{L}\) is related to the enforcement of the initial condition.
- \(\textbf{L}\in {\mathbb {R}}^{\overline{N}\times \overline{N}}\) is unit lower triangular, though not quite block diagonal due to a slight overlap in columns.
- Given the repeating block structure of the matrices presented here, we only write out the blocks associated with the initial/final conditions and two subsequent time steps. Dots indicate a repeating pattern, with the understanding that the block structure repeats \(K\) times with appropriate indexing.
- \(\textbf{N}:{\mathbb {R}}^{\overline{N}} \rightarrow {\mathbb {R}}^{\overline{N}}\) is block lower triangular in the sense that the \(k\)-th \(N(s+1)\) block of the output does not depend on the \(j\)-th \(N(s+1)\) block of the input for \(j>k\). For example, the \(N(s+1)\) block of \(\textbf{N}(\overline{\textbf{y}})\) associated with the \(k\)-th time step is $$\begin{aligned} \left( \begin{array}{c} \textbf{A}\textbf{F}_k \\ \textbf{B}^\top \textbf{F}_k \end{array}\right) , \end{aligned}$$ which depends only on \(\textbf{Y}_k\), and not on \(\overline{\textbf{y}}\) at later time steps.
The Jacobian of the global time-stepping operator \(\textbf{E}(\textbf{y})\) is given by
where \(\textbf{E}_k\) is an \(N(s+1)\times N(s+2)\) block matrix,
that overlaps with \(\textbf{E}_{k+1}\) over N columns, and
We see that the block lower triangular structure of the operator \(\textbf{N}\) yields a proper block lower triangular Jacobian \(\textbf{E}'\).
Linearized RK formulas (8) are derived by solving the linear system \(\textbf{E}'(\textbf{y}) \mathbf {\delta }= \textbf{w}\) via forward substitution, thus yielding a time-stepping algorithm. In particular, each time step is associated with solving the following system for \(\varvec{\varDelta }_k\) and \(\varvec{\delta }_k\), with \(\varvec{\delta }_{k-1}\) given by the previous time step:
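One forward-substitution step can be sketched as follows for an explicit tableau, where the homogeneous case \(\textbf{w}=\textbf{0}\) is taken for simplicity; the function name and the caller-supplied list of stage Jacobians \(J_i = \partial \textbf{f}/\partial \textbf{y}(\textbf{Y}_{k,i})\) are our assumptions:

```python
import numpy as np

def linearized_rk_step(delta_prev, stage_jacobians, dt, A_s, b_s):
    """Given delta_{k-1}, return (Delta_k, delta_k) by forward substitution.
    stage_jacobians[i] is df/dy evaluated at stage Y_{k,i}.
    Assumes A_s is strictly lower triangular (explicit method)."""
    s, N = len(b_s), delta_prev.size
    Delta = np.zeros(s * N)   # concatenated stage perturbations Delta_k
    Jd = np.zeros(s * N)      # concatenated products J_i Delta_{k,i}
    for i in range(s):
        # Delta_{k,i} = delta_{k-1} + dt * sum_{j<i} a_ij J_j Delta_{k,j}
        Delta[i*N:(i+1)*N] = delta_prev + dt * sum(
            A_s[i, j] * Jd[j*N:(j+1)*N] for j in range(i))
        Jd[i*N:(i+1)*N] = stage_jacobians[i] @ Delta[i*N:(i+1)*N]
    # delta_k = delta_{k-1} + dt * sum_i b_i J_i Delta_{k,i}
    delta_k = delta_prev + dt * sum(
        b_s[i] * Jd[i*N:(i+1)*N] for i in range(s))
    return Delta, delta_k

# For a linear problem f(y) = J y the linearized step reproduces the RK
# stability polynomial applied to delta_{k-1}; checked below with RK4.
J = np.array([[-1.0]])
A_s = np.array([[0,0,0,0],[0.5,0,0,0],[0,0.5,0,0],[0,0,1,0]])
b_s = np.array([1/6, 1/3, 1/3, 1/6])
_, d1 = linearized_rk_step(np.array([1.0]), [J]*4, 0.1, A_s, b_s)
```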
Taking the transpose of \(\textbf{E}'(\textbf{y})\), we have
where \(\textbf{G}_k\) is an \(N(s+1)\times N(s+2)\) block matrix,
that overlaps with \(\textbf{G}_{k+1}\) over N columns. Analogous to \(\textbf{E}'\), we see a repeating block structure with overlapping columns, though with an identity block at the lower right corner, associated with a final time condition.
Adjoint RK formulas (9) are derived by solving the block upper triangular system \(\textbf{E}'(\textbf{y})^\top \mathbf {\uplambda }=\textbf{w}\) via back-substitution. Each time step is associated with solving the following system for \(\varvec{\varLambda }_{k}\) and \(\varvec{\uplambda }_{k-1}\), with \(\varvec{\uplambda }_{k}\) given by the previous adjoint time step:
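One reverse step of the back-substitution can be sketched as follows for an explicit tableau (homogeneous right-hand side; function name and caller-supplied stage Jacobians are our assumptions): the transposed stage system gives \(\varvec{\varLambda }_{k,i} = \varDelta t\, J_i^\top \big (b_i \varvec{\uplambda }_k + \sum _{j>i} a_{ji} \varvec{\varLambda }_{k,j}\big )\), followed by \(\varvec{\uplambda }_{k-1} = \varvec{\uplambda }_k + \sum _i \varvec{\varLambda }_{k,i}\):

```python
import numpy as np

def adjoint_rk_step(lam_next, stage_jacobians, dt, A_s, b_s):
    """Given lambda_k, return (Lambda_k, lambda_{k-1}) by back substitution.
    Note the transposed tableau entries a_{ji}: for an explicit method the
    stages are resolved in reverse order i = s, ..., 1."""
    s, N = len(b_s), lam_next.size
    Lam = np.zeros((s, N))                 # rows are the Lambda_{k,i}
    for i in reversed(range(s)):
        acc = b_s[i] * lam_next + sum(
            A_s[j, i] * Lam[j] for j in range(i + 1, s))
        Lam[i] = dt * (stage_jacobians[i].T @ acc)
    return Lam, lam_next + Lam.sum(axis=0)  # lambda_{k-1} = lambda_k + C^T Lambda_k

# For a linear autonomous problem the adjoint step has the same
# amplification polynomial as the forward step; checked with RK4.
J = np.array([[-1.0]])
A_s = np.array([[0,0,0,0],[0.5,0,0,0],[0,0.5,0,0],[0,0,1,0]])
b_s = np.array([1/6, 1/3, 1/3, 1/6])
_, lam_prev = adjoint_rk_step(np.array([1.0]), [J]*4, 0.1, A_s, b_s)
```

The reverse stage ordering mirrors the upper triangular structure of \(\textbf{E}'(\textbf{y})^\top \) noted above.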
1.2 IDT Matrix Representation
The matrix representation of the IDT method is very similar to what we derived for RK,
where the relaxation parameters \(\varvec{\gamma }= (\gamma _1,\ldots ,\gamma _K)\) appear on the (nonlinear) term \(\textbf{N}\) only, i.e.,
Recall that \(\gamma _k\) is defined as the positive root near 1 (for \(\varDelta t\) small enough) of the root function \(r_k(\gamma ; \textbf{y})\), Eq. (4). In other words the relaxation parameters depend implicitly on \(\textbf{y}\), i.e., \(\varvec{\gamma }=\varvec{\gamma }(\textbf{y})\). Let \(\widetilde{\textbf{E}}(\textbf{y})\) denote the reduced state-equation operator, that is,
The Jacobian is given by
with a similar block structure as the Jacobian for standard RK,
where
are associated with the term \(\frac{\partial \textbf{N}}{\partial \textbf{y}}\frac{d\varvec{\gamma }}{d\textbf{y}}\). The gradient terms in \(\varvec{\varGamma }_{y,k}\) and \(\varvec{\varGamma }_{Y,k}\) correspond to gradients of \(\gamma _k\) with respect to \(\textbf{y}_{k-1}\) and \(\textbf{Y}_k\), respectively, and are computed via implicit differentiation; see Eqs. (10) and (11).
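For concreteness, a hedged sketch of the relaxation parameter itself: in the inner-product-norm setting of Ketcheson (2019), the root function is quadratic in \(\gamma \) and its nonzero root has a closed form (general entropies instead require a scalar root solve near \(\gamma = 1\), and the gradients of \(\gamma _k\) follow by implicit differentiation of the root condition). The function name and the RK4/skew-symmetric test setup are our illustrative choices:

```python
import numpy as np

def relaxation_gamma(y_prev, stage_Y, stage_F, dt, b_s):
    """Nonzero root of r(g) = ||y + g*d||^2 - ||y||^2 - 2*g*e, where
    d = dt * sum_i b_i F_i  and  e = dt * sum_i b_i <Y_i, F_i>.
    Valid for the squared-norm entropy; r is quadratic in g here."""
    d = dt * sum(b * F for b, F in zip(b_s, stage_F))
    e = dt * sum(b * (Y @ F) for b, Y, F in zip(b_s, stage_Y, stage_F))
    return 2.0 * (e - y_prev @ d) / (d @ d)

# Skew-symmetric system y' = S y conserves ||y||^2; the relaxed update
# y + gamma*d should conserve it exactly, with gamma close to 1.
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
A_s = np.array([[0,0,0,0],[0.5,0,0,0],[0,0.5,0,0],[0,0,1,0]])  # RK4
b_s = [1/6, 1/3, 1/3, 1/6]
y, dt = np.array([1.0, 0.0]), 0.1
Ys, Fs = [], []
for i in range(4):  # explicit stage sweep
    Yi = y + dt * sum(A_s[i, j] * Fs[j] for j in range(i))
    Ys.append(Yi)
    Fs.append(S @ Yi)
g = relaxation_gamma(y, Ys, Fs, dt, b_s)
d = dt * sum(b * F for b, F in zip(b_s, Fs))
y_new = y + g * d
```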
Solving \(\widetilde{\textbf{E}}'(\textbf{y})\mathbf {\delta }= \textbf{w}\) via forward substitution results in solving at each time step the following system for \(\varvec{\varDelta }_k\) and \(\varvec{\delta }_k\), with \(\varvec{\delta }_{k-1}\) given by the previous time step, deriving the linearized IDT formulas in Lemma 1:
Note that
with scalar \(\rho _k :=(\nabla _{y} \gamma _k)^\top \varvec{\delta }_{k-1}+ (\nabla _{Y} \gamma _k)^\top \varvec{\varDelta }_k\).
The transpose of the Jacobian is given by
with
Solving \(\widetilde{\textbf{E}}'(\textbf{y})^\top \mathbf {\uplambda }= \textbf{w}\) via back substitution results in solving at each time step the following system for \(\varvec{\varLambda }_k\) and \(\varvec{\uplambda }_{k-1}\), with \(\varvec{\uplambda }_{k}\) given by the previous adjoint time step, deriving the adjoint IDT formulas in Lemma 2:
Note that
with scalar
1.3 RRK Matrix Representation
The matrix representation for RRK is quite similar to what we derived for IDT:
where we have made the operator \(\textbf{N}\) dependent on the modified step size \(\varDelta t^* :=T-t_{K-1}\) as well. Only the last \(N(s+1)\) rows of \(\textbf{N}\), associated with the last time step, differ from the IDT case (Eq. (29)); these rows are specified by
with
Recall that in RRK we have \(t_k = t_{k-1} + \gamma _k \varDelta t\) for \(k=1,\ldots ,K-1\), and hence
which makes \(\varDelta t^*\) a function of \(\textbf{y}_{\ell -1}\) and \(\textbf{Y}_\ell \) for \(\ell =1,\ldots ,K-1\), i.e., \(\varDelta t^* = \varDelta t^*(\textbf{y})\). With this in mind, let \(\widetilde{\textbf{E}}(\textbf{y})\) denote the reduced state-equation operator,
Given how \(\textbf{N}\) is modified in RRK, it follows that the Jacobian \(\widetilde{\textbf{E}}'\) will coincide with what we derived for IDT except at the last \(N(s+1)\) rows. In particular, computing
will require the derivatives of \(\gamma _K\) and \(\varDelta t^*\) with respect to \(\textbf{y}\).
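Because \(t_{K-1}\) accumulates the relaxed step sizes, \(\varDelta t^*\) is an affine function of the relaxation parameters: \(\varDelta t^* = T - t_0 - \varDelta t \sum _{\ell <K} \gamma _\ell \), so \(\partial \varDelta t^*/\partial \gamma _\ell = -\varDelta t\) for every \(\ell = 1,\ldots ,K-1\); chaining with the per-step gradients \(d\gamma _\ell /d\textbf{y}\) yields \(d\varDelta t^*/d\textbf{y}\). A minimal sketch (helper names are ours):

```python
def final_step_size(T, t0, dt, gammas):
    """dt_star = T - t_{K-1}, with t_{K-1} = t0 + dt * sum of
    gammas = (gamma_1, ..., gamma_{K-1})."""
    return T - t0 - dt * sum(gammas)

def ddtstar_dgamma(dt, num_gammas):
    """Partial derivatives of dt_star w.r.t. each gamma_l: all equal -dt."""
    return [-dt] * num_gammas
```

When every \(\gamma _\ell = 1\) the relaxed grid coincides with the uniform grid and \(\varDelta t^* = \varDelta t\), which the first assertion below checks.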
The derivatives of \(\varDelta t^*\) can be expressed in terms of derivatives of the relaxation parameters as follows:
where
An added complication is that \(\varDelta t^*\) is also the step size used in \(r_K\), thus implying that \(\gamma _K\) is dependent on information from all previous time steps. As before, we can compute \(\frac{d\gamma _K}{d\textbf{y}}\) via implicit differentiation, though we will have to compute partial derivatives of \(\gamma _K\) with respect to \(\textbf{y}_{\ell -1}\) and \(\textbf{Y}_\ell \) for all \(\ell =1,\ldots ,K\);
The partial derivatives of \(r_K\) with respect to \(\gamma \), \(\textbf{y}_{K-1}\) and \(\textbf{Y}_{K}\), evaluated at \((\gamma _K, \textbf{y}_{K-1}, \textbf{Y}_K)\) are as given in Eq. (11), with \(k\mapsto K\) and \(\varDelta t\mapsto \varDelta t^*\). Just as in Eq. (10), we use \(\nabla _{y}\gamma _K\) and \(\nabla _{Y}\gamma _K\) to denote the gradient of \(\gamma _K\) with respect to \(\textbf{y}_{K-1}\) and \(\textbf{Y}_{K}\) respectively. For the remaining \(\ell =1,\ldots ,K-1\),
where we have used
Similarly,
Thus,
Putting it all together, we have
In summary,
with
We jump forward to interpreting the solution of \(\widetilde{\textbf{E}}'(\textbf{y})\mathbf {\delta }= \textbf{w}\) via forward substitution as a time stepping scheme. Again, the first \(K-1\) steps are as given by IDT. The last step, as shown in Lemma 1, is derived from the solution of the following system for \(\varvec{\varDelta }_{K}\) and \(\varvec{\delta }_K\), with \((\varvec{\delta }_{\ell -1},\varvec{\varDelta }_{\ell })\) for \(\ell =1,\ldots ,K-1\) given by the previous time steps:
with scalar
Similar to before,
with scalar \(\rho _K :=(\nabla _{y} \gamma _K)^\top \varvec{\delta }_{K-1}+ (\nabla _{Y} \gamma _K)^\top \varvec{\varDelta }_K\).
We now interpret the solution of \(\widetilde{\textbf{E}}'(\textbf{y})^\top \mathbf {\uplambda }= \textbf{w}\) via back substitution as a time stepping scheme. Again, the last step (or first step in reverse-time) is as given by the adjoint IDT formulas in Lemma 2, but with \(\varDelta t \mapsto \varDelta t^*\). The remaining \(K-1\) steps, as given in Lemma 4, are derived from the solution to the following systems for \(\varvec{\varLambda }_k\) and \(\varvec{\uplambda }_{k-1}\), with \(\varvec{\uplambda }_k\) given by the previous adjoint time step and \(\varvec{\varLambda }_K\) given by the last step,
where \(P=N(s+1)((K-1)-k)\), which gives
with scalar
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bencomo, M.J., Chan, J. Discrete Adjoint Computations for Relaxation Runge–Kutta Methods. J Sci Comput 94, 59 (2023). https://doi.org/10.1007/s10915-023-02102-y