Abstract
Relaxation Runge–Kutta methods reproduce a fully discrete dissipation (or conservation) of entropy for entropy stable semi-discretizations of nonlinear conservation laws. In this paper we derive the discrete adjoint of relaxation Runge–Kutta schemes, which is applicable to discretize-then-optimize approaches for optimal control problems. Furthermore, we prove that the derived discrete relaxation Runge–Kutta adjoint preserves time-symmetry when applied to linear skew-symmetric systems of ODEs. Numerical experiments verify these theoretical results while demonstrating the importance of appropriately treating the relaxation parameter when computing the discrete adjoint.
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Alexander, R.: Diagonally implicit Runge–Kutta methods for stiff ODE’s. SIAM J. Numer. Anal. 14(6), 1006–1021 (1977)
Alexe, M., Sandu, A.: On the discrete adjoints of adaptive time stepping algorithms. J. Comput. Appl. Math. 233(4), 1005–1020 (2009)
Antil, H., Leykekhman, D.: A brief introduction to PDE-constrained optimization. In: Frontiers in PDE-Constrained Optimization, pp. 3–40. Springer (2018)
Bencomo, M.J., Symes, W.W.: Discretization of multipole sources in a finite difference setting for wave propagation problems. J. Comput. Phys. 386, 296–322 (2019)
Calvo, M., Hernández-Abreu, D., Montijano, J.I., Rández, L.: On the preservation of invariants by explicit Runge–Kutta methods. SIAM J. Sci. Comput. 28(3), 868–885 (2006)
Eberhard, P., Bischof, C.: Automatic differentiation of numerical integration algorithms. Math. Comput. 68(226), 717–731 (1999)
Griewank, A.: A mathematical view of automatic differentiation. Acta Numer. 12, 321–398 (2003)
Griewank, A., Bischof, C., Corliss, G., Carle, A., Williamson, K.: Derivative convergence for iterative equation solvers. Optim. Methods Softw. 2(3–4), 321–355 (1993)
Griewank, A., Walther, A.: Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, Philadelphia (2008)
Gunzburger, M.D.: Perspectives in Flow Control and Optimization. SIAM, Philadelphia (2002)
Hager, W.W.: Runge–Kutta methods in optimal control and the transformed adjoint system. Numer. Math. 87(2), 247–282 (2000)
Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations, vol. 31. Springer, New York (2006)
Hernandez, D.M., Bertschinger, E.: Time-symmetric integration in astrophysics. Mon. Not. R. Astron. Soc. 475(4), 5570–5584 (2018)
Ketcheson, D.I.: Relaxation Runge–Kutta methods: conservation and stability for inner-product norms. SIAM J. Numer. Anal. 57(6), 2850–2870 (2019)
Moczo, P., Robertsson, J.O., Eisner, L.: The finite-difference time-domain method for modeling of seismic wave propagation. Adv. Geophys. 48, 421–516 (2007)
Persson, P.O.: High-order Navier–Stokes simulations using a sparse line-based discontinuous Galerkin method. In: 50th AIAA Aerospace Sciences Meeting including the New Horizons Forum and Aerospace Exposition, p. 456 (2012)
Plessix, R.E.: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J. Int. 167(2), 495–503 (2006)
Ranocha, H., Ketcheson, D.I.: ConvexRelaxationRungeKutta. Relaxation Runge–Kutta methods for convex functionals. https://github.com/ranocha/ConvexRelaxationRungeKutta (2019). https://doi.org/10.5281/zenodo.3066518
Ranocha, H., Lóczi, L., Ketcheson, D.I.: General relaxation methods for initial-value problems with application to multistep schemes. Numer. Math. 146(4), 875–906 (2020)
Ranocha, H., Sayyari, M., Dalcin, L., Parsani, M., Ketcheson, D.I.: Relaxation Runge–Kutta methods: fully discrete explicit entropy-stable schemes for the compressible Euler and Navier–Stokes equations. SIAM J. Sci. Comput. 42(2), A612–A638 (2020)
Rothauge, K.: The discrete adjoint method for high-order time-stepping methods. Ph.D. thesis, University of British Columbia (2016)
Sandu, A.: On the properties of Runge–Kutta discrete adjoints. In: International Conference on Computational Science, pp. 550–557. Springer (2006)
Sanz-Serna, J.M.: Symplectic Runge–Kutta schemes for adjoint equations, automatic differentiation, optimal control, and more. SIAM Rev. 58(1), 3–33 (2016)
Shu, C.W., Osher, S.: Efficient implementation of essentially non-oscillatory shock-capturing schemes. J. Comput. Phys. 77(2), 439–471 (1988)
Sirkes, Z., Tziperman, E.: Finite difference of adjoint or adjoint of finite difference? Mon. Weather Rev. 125(12), 3373–3378 (1997)
Virieux, J.: SH-wave propagation in heterogeneous media: velocity-stress finite-difference method. Geophysics 49(11), 1933–1942 (1984)
Walther, A.: Automatic differentiation of explicit Runge–Kutta methods for optimal control. Comput. Optim. Appl. 36(1), 83–108 (2007)
Wilcox, L.C., Stadler, G., Bui-Thanh, T., Ghattas, O.: Discretely exact derivatives for hyperbolic PDE-constrained optimization problems discretized by the discontinuous Galerkin method. J. Sci. Comput. 63(1), 138–162 (2015)
Acknowledgements
Mario J. Bencomo was supported by the Pfeiffer Postdoctoral Instructorship. Jesse Chan gratefully acknowledges support from the National Science Foundation under award DMS-CAREER-1943186.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by MJB. The first draft of the manuscript was written by MJB and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
In this appendix we present detailed derivations of the linearization and adjoint computations for RK and its relaxation variants using a matrix representation. We use an approach and notation similar to [21]. The plan is to interpret time-stepping algorithms as solutions to global matrix-vector systems and use the Jacobians of said systems to deduce time-stepping schemes for the discrete linearization and adjoint.
We first recall and introduce some notation:

- \(N\) is the dimension of the state vector \(\textbf{y}(t)\) in IVP (1);
- \(K\) is the total number of time steps taken by a given time-stepping scheme;
- \(s\) is the number of internal stages for an RK method;
- \(\textbf{A}_s, \textbf{b}_s, \textbf{c}_s\) are the coefficients of a specified \(s\)-stage RK method;
- The concatenation of vectors indexed by internal stages is denoted by simply removing the internal stage index, e.g., $$\begin{aligned} \textbf{Y}_k := \left( \begin{array}{c} \textbf{Y}_{k,1} \\ \vdots \\ \textbf{Y}_{k,s} \end{array}\right) \in {\mathbb {R}}^{sN}; \end{aligned}$$
- Vectors of size \(\overline{N} :=N + N(s+1)K\) are denoted using bold font and an overline, with components denoted as follows: $$\begin{aligned} \overline{\textbf{y}} := \left( \begin{array}{c} \textbf{y}_0 \\ \textbf{Y}_1 \\ \textbf{y}_1 \\ \vdots \\ \textbf{Y}_{K} \\ \textbf{y}_{K} \end{array}\right) , \end{aligned}$$ with \(\textbf{y}_k\in {\mathbb {R}}^N\) and \(\textbf{Y}_k\in {\mathbb {R}}^{sN}\);
- The matrix \(\varvec{\chi }_k\in {\mathbb {R}}^{\overline{N}\times N}\) is defined such that \(\varvec{\chi }_k^\top \overline{\textbf{y}} = \textbf{y}_k\), extracting the \(k\)-th step vector. Moreover, for a given \(\textbf{v}\in {\mathbb {R}}^N\), \(\overline{\textbf{w}} :=\varvec{\chi }_k \textbf{v}\) is a vector of length \(\overline{N}\) with zero entries everywhere except at \(\textbf{w}_k = \textbf{v}\);
- \(\textbf{I}_{M}\) denotes the \(M\times M\) identity matrix;
- \(\textbf{0}_{M}\) is the zero vector of dimension \(M\);
- \(\textbf{0}_{M_1\times M_2}\) is the \(M_1\times M_2\) zero matrix;
- \(\otimes \) denotes the Kronecker product.
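To make this notation concrete, the following sketch (with hypothetical sizes \(N\), \(s\), \(K\); all names are ours, not the paper's) builds the selector \(\varvec{\chi }_k\) and a Kronecker-product block matrix with NumPy:

```python
import numpy as np

# Illustrative sizes only (not from the paper): N-dim state, s stages, K steps.
N, s, K = 2, 3, 4
Nbar = N + N * (s + 1) * K        # length of an "overlined" global vector

# The global vector stacks y_0, then (Y_k, y_k) for k = 1, ..., K.
ybar = np.arange(Nbar, dtype=float)

def chi(k):
    """Selector chi_k in R^{Nbar x N}: chi(k).T @ ybar extracts y_k."""
    X = np.zeros((Nbar, N))
    offset = 0 if k == 0 else N + (k - 1) * (s + 1) * N + s * N
    X[offset:offset + N, :] = np.eye(N)
    return X

y0 = chi(0).T @ ybar              # the first N entries (initial state slot)
yK = chi(K).T @ ybar              # the last N entries (final state slot)

# Kronecker products build the repeated block structure, e.g.
# C = 1_s (x) I_N stacks s copies of the identity:
C = np.kron(np.ones((s, 1)), np.eye(N))   # shape (s*N, N)
```

Conversely, `chi(k) @ v` embeds a length-\(N\) vector into the \(\textbf{y}_k\) slot of an otherwise zero global vector.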
1.1 RK Matrix Representation
A single step of the RK method, as specified by Eq. (2), can be written in matrix form as
$$\begin{aligned} \textbf{Y}_k = \textbf{C}\textbf{y}_{k-1} + \textbf{A}\textbf{F}_k, \qquad \textbf{y}_k = \textbf{y}_{k-1} + \textbf{B}^\top \textbf{F}_k, \end{aligned}$$
where
- \(\textbf{A} :=\varDelta t\, \textbf{A}_s \otimes \textbf{I}_N \in {\mathbb {R}}^{sN\times sN}\),
- \(\textbf{B} :=\varDelta t\, \textbf{b}_s \otimes \textbf{I}_N \in {\mathbb {R}}^{sN\times N}\),
- \(\textbf{C} :=\textbf{1}_s \otimes \textbf{I}_N \in {\mathbb {R}}^{sN\times N}\), with \(\textbf{1}_s :=(1,\ldots ,1)^\top \in {\mathbb {R}}^{s}\),
- \(\textbf{F}_k\) is the concatenation of the \(\textbf{F}_{k,i}:=\textbf{f}(\textbf{Y}_{k,i},t_{k-1}+c_i \varDelta t)\), and can thus be viewed as a vector-valued function of \(\textbf{Y}_k\).
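A single step in this matrix form can be sketched as follows, assuming an explicit tableau so the stage system is solved by forward evaluation. The function name, the test problem \(f(y,t)=-y\), and the use of the classical RK4 tableau are our illustrative choices, not the paper's:

```python
import numpy as np

def rk_step_matrix(f, y_prev, t_prev, dt, A_s, b_s, c_s):
    """One RK step: Y_k = C y_{k-1} + A F_k, then y_k = y_{k-1} + B^T F_k.
    Assumes A_s is strictly lower triangular (explicit method)."""
    s, N = len(b_s), len(y_prev)
    A = dt * np.kron(A_s, np.eye(N))                 # stage-coupling block A
    B = dt * np.kron(b_s.reshape(-1, 1), np.eye(N))  # B in R^{sN x N}
    C = np.kron(np.ones((s, 1)), np.eye(N))          # replicates y_{k-1}
    Y = np.zeros(s * N)                              # concatenated stages Y_k
    F = np.zeros(s * N)                              # concatenated F_k
    for i in range(s):  # forward evaluation, valid since A_s is strictly lower triangular
        Y[i*N:(i+1)*N] = (C @ y_prev + A @ F)[i*N:(i+1)*N]
        F[i*N:(i+1)*N] = f(Y[i*N:(i+1)*N], t_prev + c_s[i] * dt)
    return y_prev + B.T @ F                          # y_k = y_{k-1} + B^T F_k

# Classical RK4 tableau applied to y' = -y (illustrative test problem):
A_s = np.array([[0,0,0,0],[0.5,0,0,0],[0,0.5,0,0],[0,0,1,0]])
b_s = np.array([1/6, 1/3, 1/3, 1/6])
c_s = np.array([0.0, 0.5, 0.5, 1.0])
y1 = rk_step_matrix(lambda y, t: -y, np.array([1.0]), 0.0, 0.1, A_s, b_s, c_s)
```

For this linear test problem, `y1` reproduces the fourth-order Taylor polynomial of \(e^{-\varDelta t}\), as expected for RK4.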
The RK method as a whole can be represented as a concatenation of the matrix systems above, resulting in a global system of time-stepping equations:
with
We make some remarks and observations:

- Boxes are meant to help visually separate blocks associated with different time steps in both \(\textbf{L}\) and \(\textbf{N}(\textbf{y})\).
- The unboxed \(\textbf{I}_N\) in the top left corner of \(\textbf{L}\) is related to the enforcement of the initial condition.
- \(\textbf{L}\in {\mathbb {R}}^{\overline{N}\times \overline{N}}\) is unit lower triangular, though not quite block diagonal due to a slight overlap in columns.
- Given the repeating block structure of the matrices presented here, we only write out the blocks associated with the initial/final conditions and two subsequent time steps. Dots indicate a repeating pattern, with the understanding that the block structure repeats \(K\) times with appropriate indexing.
- \(\textbf{N}:{\mathbb {R}}^{\overline{N}} \rightarrow {\mathbb {R}}^{\overline{N}}\) is block lower triangular in the sense that the \(k\)-th \(N(s+1)\) block of the output does not depend on the \(j\)-th \(N(s+1)\) block of the input for \(j>k\). For example, the \(N(s+1)\) block of \(\textbf{N}(\overline{\textbf{y}})\) associated with the \(k\)-th time step is $$\begin{aligned} \left( \begin{array}{c} \textbf{A}\textbf{F}_k \\ \textbf{B}^\top \textbf{F}_k \end{array}\right) , \end{aligned}$$ which depends only on \(\textbf{Y}_k\), and not on \(\overline{\textbf{y}}\) at later time steps.
The Jacobian of the global time-stepping operator \(\textbf{E}(\textbf{y})\) is given by
where \(\textbf{E}_k\) is an \(N(s+1)\times N(s+2)\) block matrix,
that overlaps with \(\textbf{E}_{k+1}\) over N columns, and
We see that the block lower triangular structure of the operator \(\textbf{N}\) yields a proper block lower triangular Jacobian \(\textbf{E}'\).
Linearized RK formulas (8) are derived by solving the linear system \(\textbf{E}'(\textbf{y}) \mathbf {\delta }= \textbf{w}\) via forward substitution, thus yielding a time-stepping algorithm. In particular, each time step is associated with solving the following system for \(\varvec{\varDelta }_k\) and \(\varvec{\delta }_k\), with \(\varvec{\delta }_{k-1}\) given by the previous time step:
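One forward-substitution step can be sketched as follows for an explicit tableau, where the homogeneous case \(\textbf{w}=\textbf{0}\) is taken for simplicity; the function name and the caller-supplied list of stage Jacobians \(J_i = \partial \textbf{f}/\partial \textbf{y}(\textbf{Y}_{k,i})\) are our assumptions:

```python
import numpy as np

def linearized_rk_step(delta_prev, stage_jacobians, dt, A_s, b_s):
    """Given delta_{k-1}, return (Delta_k, delta_k) by forward substitution.
    stage_jacobians[i] is df/dy evaluated at stage Y_{k,i}.
    Assumes A_s is strictly lower triangular (explicit method)."""
    s, N = len(b_s), delta_prev.size
    Delta = np.zeros(s * N)   # concatenated stage perturbations Delta_k
    Jd = np.zeros(s * N)      # concatenated products J_i Delta_{k,i}
    for i in range(s):
        # Delta_{k,i} = delta_{k-1} + dt * sum_{j<i} a_ij J_j Delta_{k,j}
        Delta[i*N:(i+1)*N] = delta_prev + dt * sum(
            A_s[i, j] * Jd[j*N:(j+1)*N] for j in range(i))
        Jd[i*N:(i+1)*N] = stage_jacobians[i] @ Delta[i*N:(i+1)*N]
    # delta_k = delta_{k-1} + dt * sum_i b_i J_i Delta_{k,i}
    delta_k = delta_prev + dt * sum(
        b_s[i] * Jd[i*N:(i+1)*N] for i in range(s))
    return Delta, delta_k

# For a linear problem f(y) = J y the linearized step reproduces the RK
# stability polynomial applied to delta_{k-1}; checked below with RK4.
J = np.array([[-1.0]])
A_s = np.array([[0,0,0,0],[0.5,0,0,0],[0,0.5,0,0],[0,0,1,0]])
b_s = np.array([1/6, 1/3, 1/3, 1/6])
_, d1 = linearized_rk_step(np.array([1.0]), [J]*4, 0.1, A_s, b_s)
```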
Taking the transpose of \(\textbf{E}'(\textbf{y})\), we have
where \(\textbf{G}_k\) is an \(N(s+1)\times N(s+2)\) block matrix,
that overlaps with \(\textbf{G}_{k+1}\) over N columns. Analogous to \(\textbf{E}'\), we see a repeating block structure with overlapping columns, though with an identity block at the lower right corner, associated with a final time condition.
Adjoint RK formulas (9) are derived by solving the block upper triangular system \(\textbf{E}'(\textbf{y})^\top \mathbf {\uplambda }=\textbf{w}\) via back-substitution. Each time step is associated with solving the following system for \(\varvec{\varLambda }_{k}\) and \(\varvec{\uplambda }_{k-1}\), with \(\varvec{\uplambda }_{k}\) given by the previous adjoint time step:
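One reverse step of the back-substitution can be sketched as follows for an explicit tableau (homogeneous right-hand side; function name and caller-supplied stage Jacobians are our assumptions): the transposed stage system gives \(\varvec{\varLambda }_{k,i} = \varDelta t\, J_i^\top \big (b_i \varvec{\uplambda }_k + \sum _{j>i} a_{ji} \varvec{\varLambda }_{k,j}\big )\), followed by \(\varvec{\uplambda }_{k-1} = \varvec{\uplambda }_k + \sum _i \varvec{\varLambda }_{k,i}\):

```python
import numpy as np

def adjoint_rk_step(lam_next, stage_jacobians, dt, A_s, b_s):
    """Given lambda_k, return (Lambda_k, lambda_{k-1}) by back substitution.
    Note the transposed tableau entries a_{ji}: for an explicit method the
    stages are resolved in reverse order i = s, ..., 1."""
    s, N = len(b_s), lam_next.size
    Lam = np.zeros((s, N))                 # rows are the Lambda_{k,i}
    for i in reversed(range(s)):
        acc = b_s[i] * lam_next + sum(
            A_s[j, i] * Lam[j] for j in range(i + 1, s))
        Lam[i] = dt * (stage_jacobians[i].T @ acc)
    return Lam, lam_next + Lam.sum(axis=0)  # lambda_{k-1} = lambda_k + C^T Lambda_k

# For a linear autonomous problem the adjoint step has the same
# amplification polynomial as the forward step; checked with RK4.
J = np.array([[-1.0]])
A_s = np.array([[0,0,0,0],[0.5,0,0,0],[0,0.5,0,0],[0,0,1,0]])
b_s = np.array([1/6, 1/3, 1/3, 1/6])
_, lam_prev = adjoint_rk_step(np.array([1.0]), [J]*4, 0.1, A_s, b_s)
```

The reverse stage ordering mirrors the upper triangular structure of \(\textbf{E}'(\textbf{y})^\top \) noted above.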
1.2 IDT Matrix Representation
The matrix representation of the IDT method is very similar to what we derived for RK,
where the relaxation parameters \(\varvec{\gamma }= (\gamma _1,\ldots ,\gamma _K)\) appear on the (nonlinear) term \(\textbf{N}\) only, i.e.,
Recall that \(\gamma _k\) is defined as the positive root near 1 (for \(\varDelta t\) small enough) of the root function \(r_k(\gamma ; \textbf{y})\), Eq. (4). In other words the relaxation parameters depend implicitly on \(\textbf{y}\), i.e., \(\varvec{\gamma }=\varvec{\gamma }(\textbf{y})\). Let \(\widetilde{\textbf{E}}(\textbf{y})\) denote the reduced state-equation operator, that is,
The Jacobian is given by
with a similar block structure as the Jacobian for standard RK,
where
are associated with the term \(\frac{\partial \textbf{N}}{\partial \textbf{y}}\frac{d\varvec{\gamma }}{d\textbf{y}}\). The gradient terms in \(\varvec{\varGamma }_{y,k}\) and \(\varvec{\varGamma }_{Y,k}\) correspond to gradients of \(\gamma _k\) with respect to \(\textbf{y}_{k-1}\) and \(\textbf{Y}_k\), respectively, and are computed via implicit differentiation; see Eqs. (10) and (11).
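For concreteness, a hedged sketch of the relaxation parameter itself: in the inner-product-norm setting of Ketcheson (2019), the root function is quadratic in \(\gamma \) and its nonzero root has a closed form (general entropies instead require a scalar root solve near \(\gamma = 1\), and the gradients of \(\gamma _k\) follow by implicit differentiation of the root condition). The function name and the RK4/skew-symmetric test setup are our illustrative choices:

```python
import numpy as np

def relaxation_gamma(y_prev, stage_Y, stage_F, dt, b_s):
    """Nonzero root of r(g) = ||y + g*d||^2 - ||y||^2 - 2*g*e, where
    d = dt * sum_i b_i F_i  and  e = dt * sum_i b_i <Y_i, F_i>.
    Valid for the squared-norm entropy; r is quadratic in g here."""
    d = dt * sum(b * F for b, F in zip(b_s, stage_F))
    e = dt * sum(b * (Y @ F) for b, Y, F in zip(b_s, stage_Y, stage_F))
    return 2.0 * (e - y_prev @ d) / (d @ d)

# Skew-symmetric system y' = S y conserves ||y||^2; the relaxed update
# y + gamma*d should conserve it exactly, with gamma close to 1.
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
A_s = np.array([[0,0,0,0],[0.5,0,0,0],[0,0.5,0,0],[0,0,1,0]])  # RK4
b_s = [1/6, 1/3, 1/3, 1/6]
y, dt = np.array([1.0, 0.0]), 0.1
Ys, Fs = [], []
for i in range(4):  # explicit stage sweep
    Yi = y + dt * sum(A_s[i, j] * Fs[j] for j in range(i))
    Ys.append(Yi)
    Fs.append(S @ Yi)
g = relaxation_gamma(y, Ys, Fs, dt, b_s)
d = dt * sum(b * F for b, F in zip(b_s, Fs))
y_new = y + g * d
```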
Solving \(\widetilde{\textbf{E}}'(\textbf{y})\mathbf {\delta }= \textbf{w}\) via forward substitution results in solving at each time step the following system for \(\varvec{\varDelta }_k\) and \(\varvec{\delta }_k\), with \(\varvec{\delta }_{k-1}\) given by the previous time step, deriving the linearized IDT formulas in Lemma 1:
Note that
with scalar \(\rho _k :=(\nabla _{y} \gamma _k)^\top \varvec{\delta }_{k-1}+ (\nabla _{Y} \gamma _k)^\top \varvec{\varDelta }_k\).
The transpose of the Jacobian is given by
with
Solving \(\widetilde{\textbf{E}}'(\textbf{y})^\top \mathbf {\uplambda }= \textbf{w}\) via back substitution results in solving at each time step the following system for \(\varvec{\varLambda }_k\) and \(\varvec{\uplambda }_{k-1}\), with \(\varvec{\uplambda }_{k}\) given by the previous adjoint time step, deriving the adjoint IDT formulas in Lemma 2:
Note that
with scalar
1.3 RRK Matrix Representation
The matrix representation for RRK is quite similar to what we derived for IDT:
where we have made the operator \(\textbf{N}\) dependent on the modified step size \(\varDelta t^* :=T-t_{K-1}\) as well. Only the last \(N(s+1)\) rows of \(\textbf{N}\), associated with the last time step, differ from the IDT case (Eq. (29)); these rows are specified by
with
Recall that in RRK we have \(t_k = t_{k-1} + \gamma _k \varDelta t\) for \(k=1,\ldots ,K-1\), and hence
which makes \(\varDelta t^*\) a function of \(\textbf{y}_{\ell -1}\) and \(\textbf{Y}_\ell \) for \(\ell =1,\ldots ,K-1\), i.e., \(\varDelta t^* = \varDelta t^*(\textbf{y})\). With this in mind, let \(\widetilde{\textbf{E}}(\textbf{y})\) denote the reduced state-equation operator,
Given how \(\textbf{N}\) is modified in RRK, it follows that the Jacobian \(\widetilde{\textbf{E}}'\) will coincide with what we derived for IDT except at the last \(N(s+1)\) rows. In particular, computing
will require the derivatives of \(\gamma _K\) and \(\varDelta t^*\) with respect to \(\textbf{y}\).
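Because \(t_{K-1}\) accumulates the relaxed step sizes, \(\varDelta t^*\) is an affine function of the relaxation parameters: \(\varDelta t^* = T - t_0 - \varDelta t \sum _{\ell <K} \gamma _\ell \), so \(\partial \varDelta t^*/\partial \gamma _\ell = -\varDelta t\) for every \(\ell = 1,\ldots ,K-1\); chaining with the per-step gradients \(d\gamma _\ell /d\textbf{y}\) yields \(d\varDelta t^*/d\textbf{y}\). A minimal sketch (helper names are ours):

```python
def final_step_size(T, t0, dt, gammas):
    """dt_star = T - t_{K-1}, with t_{K-1} = t0 + dt * sum of
    gammas = (gamma_1, ..., gamma_{K-1})."""
    return T - t0 - dt * sum(gammas)

def ddtstar_dgamma(dt, num_gammas):
    """Partial derivatives of dt_star w.r.t. each gamma_l: all equal -dt."""
    return [-dt] * num_gammas
```

When every \(\gamma _\ell = 1\) the relaxed grid coincides with the uniform grid and \(\varDelta t^* = \varDelta t\), which the first assertion below checks.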
The derivatives of \(\varDelta t^*\) can be expressed in terms of derivatives of the relaxation parameters as follows:
where
An added complication is that \(\varDelta t^*\) is also the step size used in \(r_K\), thus implying that \(\gamma _K\) is dependent on information from all previous time steps. As before, we can compute \(\frac{d\gamma _K}{d\textbf{y}}\) via implicit differentiation, though we will have to compute partial derivatives of \(\gamma _K\) with respect to \(\textbf{y}_{\ell -1}\) and \(\textbf{Y}_\ell \) for all \(\ell =1,\ldots ,K\);
The partial derivatives of \(r_K\) with respect to \(\gamma \), \(\textbf{y}_{K-1}\) and \(\textbf{Y}_{K}\), evaluated at \((\gamma _K, \textbf{y}_{K-1}, \textbf{Y}_K)\) are as given in Eq. (11), with \(k\mapsto K\) and \(\varDelta t\mapsto \varDelta t^*\). Just as in Eq. (10), we use \(\nabla _{y}\gamma _K\) and \(\nabla _{Y}\gamma _K\) to denote the gradient of \(\gamma _K\) with respect to \(\textbf{y}_{K-1}\) and \(\textbf{Y}_{K}\) respectively. For the remaining \(\ell =1,\ldots ,K-1\),
where we have used
Similarly,
Thus,
Putting it all together, we have
In summary,
with
We jump forward to interpreting the solution of \(\widetilde{\textbf{E}}'(\textbf{y})\mathbf {\delta }= \textbf{w}\) via forward substitution as a time stepping scheme. Again, the first \(K-1\) steps are as given by IDT. The last step, as shown in Lemma 1, is derived from the solution of the following system for \(\varvec{\varDelta }_{K}\) and \(\varvec{\delta }_K\), with \((\varvec{\delta }_{\ell -1},\varvec{\varDelta }_{\ell })\) for \(\ell =1,\ldots ,K-1\) given by the previous time steps:
with scalar
Similar to before,
with scalar \(\rho _K :=(\nabla _{y} \gamma _K)^\top \varvec{\delta }_{K-1}+ (\nabla _{Y} \gamma _K)^\top \varvec{\varDelta }_K\).
We now interpret the solution of \(\widetilde{\textbf{E}}'(\textbf{y})^\top \mathbf {\uplambda }= \textbf{w}\) via back substitution as a time stepping scheme. Again, the last step (or first step in reverse-time) is as given by the adjoint IDT formulas in Lemma 2, but with \(\varDelta t \mapsto \varDelta t^*\). The remaining \(K-1\) steps, as given in Lemma 4, are derived from the solution to the following systems for \(\varvec{\varLambda }_k\) and \(\varvec{\uplambda }_{k-1}\), with \(\varvec{\uplambda }_k\) given by the previous adjoint time step and \(\varvec{\varLambda }_K\) given by the last step,
where \(P=N(s+1)((K-1)-k)\), which gives
with scalar
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bencomo, M.J., Chan, J. Discrete Adjoint Computations for Relaxation Runge–Kutta Methods. J Sci Comput 94, 59 (2023). https://doi.org/10.1007/s10915-023-02102-y