Approximation of weak adjoints by reverse automatic differentiation of BDF methods

Numerische Mathematik

Abstract

We shed light on the relation between the discrete adjoints of multistep backward differentiation formula (BDF) methods and the solution of the adjoint differential equation. To this end, we develop a functional-analytic framework based on a constrained variational problem and introduce the notion of weak adjoint solutions of ordinary differential equations. We devise a Petrov-Galerkin finite element (FE) interpretation of the BDF method and its discrete adjoint scheme obtained by reverse internal numerical differentiation. We show how the FE approximation of the weak adjoint is computed by the discrete adjoint scheme and prove its convergence in the space of normalized functions of bounded variation. We also show convergence of the discrete adjoints to the classical adjoints on the inner time interval. Finally, we give numerical results for non-adaptive and fully adaptive BDF schemes. The presented framework opens the way to carry over techniques for global error estimation from FE methods to BDF methods.
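To make the idea of a discrete adjoint obtained by reverse differentiation concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the scalar linear test equation y' = a·y, the objective J = y(t_f), the first-order BDF method (implicit Euler) on a uniform grid, and a hand-written reverse sweep instead of an AD tool or the authors' implementation. The function names `bdf1_forward`, `bdf1_reverse` and `max_adjoint_error` are hypothetical. In this one-step case the discrete adjoints ybar_n approach the classical adjoint λ(t_n) = exp(a(t_f − t_n)) at first order on the whole grid; the higher-order multistep case, for which the abstract states convergence to classical adjoints only on the inner time interval, is not covered by this toy.

```python
import math


def bdf1_forward(a, y0, t_s, t_f, n_steps):
    """Integrate y' = a*y on [t_s, t_f] with implicit Euler (BDF1) on a uniform grid."""
    h = (t_f - t_s) / n_steps
    ys = [y0]
    for _ in range(n_steps):
        # BDF1 step y_{n+1} = y_n + h*a*y_{n+1}, solved in closed form since f is linear.
        ys.append(ys[-1] / (1.0 - h * a))
    return ys


def bdf1_reverse(a, t_s, t_f, n_steps, dJ_dyN=1.0):
    """Reverse sweep: ybar[n] = dJ/dy_n, propagated backwards through the BDF1 steps."""
    h = (t_f - t_s) / n_steps
    ybar = [0.0] * (n_steps + 1)
    ybar[n_steps] = dJ_dyN  # seed with J'(y_N)
    for n in range(n_steps - 1, -1, -1):
        # Chain rule: y_{n+1} depends on y_n with partial derivative 1 / (1 - h*a).
        ybar[n] = ybar[n + 1] / (1.0 - h * a)
    return ybar


def max_adjoint_error(a, t_s, t_f, n_steps):
    """Largest gap between the discrete adjoint and the classical adjoint on the grid."""
    h = (t_f - t_s) / n_steps
    ybar = bdf1_reverse(a, t_s, t_f, n_steps)
    # For J = y(t_f): lambda' = -a*lambda, lambda(t_f) = 1, hence lambda(t) = exp(a*(t_f - t)).
    return max(abs(ybar[n] - math.exp(a * (t_f - (t_s + n * h))))
               for n in range(n_steps + 1))


if __name__ == "__main__":
    a, t_s, t_f = -1.0, 0.0, 1.0
    ys = bdf1_forward(a, 2.0, t_s, t_f, 100)
    ybar = bdf1_reverse(a, t_s, t_f, 100)
    # J = y_N is linear in y_0, so dJ/dy_0 must equal y_N / y_0.
    print(ybar[0], ys[-1] / ys[0])
    for n_steps in (50, 100, 200):
        print(n_steps, max_adjoint_error(a, t_s, t_f, n_steps))
```

Halving the step size roughly halves the maximum error, consistent with first-order convergence of this toy discrete adjoint. Note that `ybar[0]` equals the exact derivative dy_N/dy_0 of the discrete scheme, because reverse differentiation differentiates the scheme itself rather than the differential equation.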

Figs. 1–6

References

  1. Adams, R., Fournier, J.: Sobolev Spaces, Pure and Applied Mathematics (Amsterdam), vol. 140, 2nd edn. Elsevier/Academic Press, Amsterdam (2003)

  2. Albersmeyer, J., Bock, H.G.: Efficient sensitivity generation for large scale dynamic systems. Technical report, SPP 1253 Preprints, University of Erlangen (2009)

  3. Albersmeyer, J., Bock, H.G.: Sensitivity Generation in an Adaptive BDF-Method. In: Bock, H.G., Kostina, E., Phu, X., Rannacher, R. (eds.) Modeling, Simulation and Optimization of Complex Processes: Proceedings of the International Conference on High Performance Scientific Computing, March 6–10, 2006, Hanoi, Vietnam, pp. 15–24. Springer, Berlin, Heidelberg (2008)

  4. Albersmeyer, J.: Adjoint based algorithms and numerical methods for sensitivity generation and optimization of large scale dynamic systems. Ph.D. thesis, Ruprecht-Karls-Universität Heidelberg (2010)

  5. Alt, H.W.: Lineare Funktionalanalysis, 4th edn. Springer, Berlin (2002)

  6. Berkovitz, L.: Optimal Control Theory, Applied Mathematical Sciences, vol. 12. Springer, New York (1974)

  7. Bock, H.G., Plitt, K.J.: A multiple shooting algorithm for direct solution of optimal control problems. In: Proceedings of the 9th IFAC World Congress, pp. 242–247. Pergamon Press, Budapest (1984)

  8. Bock, H.G.: Numerical treatment of inverse problems in chemical reaction kinetics. In: Ebert, K., Deuflhard, P., Jäger, W. (eds.) Modelling of Chemical Reaction Systems, Springer Series in Chemical Physics, vol. 18, pp. 102–125. Springer, Heidelberg (1981)

  9. Bock, H.G.: Randwertproblemmethoden zur Parameteridentifizierung in Systemen nichtlinearer Differentialgleichungen, Bonner Mathematische Schriften, vol. 183. Universität Bonn, Bonn (1987)

  10. Bock, H.G., Schlöder, J.P., Schulz, V.: Numerik großer differentiell-algebraischer Gleichungen – Simulation und Optimierung. In: Schuler, H. (ed.) Prozeßsimulation, pp. 35–80. VCH Verlagsgesellschaft mbH, Weinheim (1994)

  11. Böttcher, K., Rannacher, R.: Adaptive error control in solving ordinary differential equations by the discontinuous Galerkin method. Preprint 96-53, SFB 359, University of Heidelberg (1996)

  12. Cao, Y., Li, S., Petzold, L.: Adjoint sensitivity analysis for differential-algebraic equations: algorithms and software. J. Comput. Appl. Math. 149, 171–191 (2002)

  13. Cao, Y., Petzold, L.: A posteriori error estimation and global error control for ordinary differential equations by the adjoint method. SIAM J. Sci. Comput. 26, 359–374 (2004)

  14. Eriksson, K., Estep, D., Hansbo, P., Johnson, C.: Introduction to adaptive methods for differential equations. Acta Numer. 4, 105–158 (1995)

  15. Hairer, E., Nørsett, S., Wanner, G.: Solving Ordinary Differential Equations I, Springer Series in Computational Mathematics, vol. 8, 2nd edn. Springer, Berlin (1993)

  16. Hartman, P.: Ordinary differential equations, Classics in Applied Mathematics, vol. 38. SIAM, Philadelphia, PA (2002). Corrected reprint of the second (1982) edition [Birkhäuser, Boston, MA; MR0658490 (83e:34002)]

  17. Henrici, P.: Error Propagation for Difference Methods. Robert E. Krieger Publishing Co., Huntington, NY (1970). Reprint of the 1963 edition

  18. Ioffe, A., Tihomirov, V.: Theory of Extremal Problems, Studies in Mathematics and its Applications, vol. 6. North-Holland Publishing Co., Amsterdam (1979)

  19. Johnson, C.: Numerical Solution of Partial Differential Equations by the Finite Element Method. Cambridge University Press, Cambridge (1987)

  20. Johnson, C.: Error estimates and adaptive time-step control for a class of one-step methods for stiff ordinary differential equations. SIAM J. Numer. Anal. 25(4), 908–926 (1988)

  21. Kirches, C., Wirsching, L., Bock, H., Schlöder, J.: Efficient direct multiple shooting for nonlinear model predictive control on long horizons. J. Process Control 22, 540–550 (2012)

  22. Kolmogorov, A., Fomin, S.: Introductory real analysis. Revised English edition. Translated from the Russian and edited by Richard A. Silverman. Prentice-Hall Inc, Englewood Cliffs (1970)

  23. Lang, J., Verwer, J.: On global error estimation and control for initial value problems. SIAM J. Sci. Comput. 29, 1460–1475 (2007)

  24. Luenberger, D.: Optimization by vector space methods. Wiley Professional Paperback Series. Wiley, New York (1969)

  25. Moon, K.S., Szepessy, A., Tempone, R., Zouraris, G.: Convergence rates for adaptive approximation of ordinary differential equations. Numer. Math. 96, 99–129 (2003)

  26. Natanson, I.: Theorie der Funktionen einer reellen Veränderlichen, 4th edn. Translated from the second Russian edition of 1957, edited by Karl Bögel. Mathematische Lehrbücher und Monographien, I. Abteilung: Mathematische Lehrbücher, Band VI. Akademie-Verlag, Berlin (1975)

  27. Sandu, A.: Reverse automatic differentiation of linear multistep methods. In: Bischof, C., Bücker, H., Hovland, P., Naumann, U., Utke, J. (eds.) Advances in Automatic Differentiation. Lecture Notes in Computational Science and Engineering, vol. 64, pp. 1–12. Springer, Berlin (2008)

  28. Shampine, L., Gordon, M.K.: Computer Solution of Ordinary Differential Equations. Freeman, San Francisco (1975)

  29. Shampine, L.: Numerical solution of ordinary differential equations. Chapman & Hall, New York (1994)

  30. Walther, A.: Automatic differentiation of explicit Runge–Kutta methods for optimal control. Comput. Optim. Appl. 36, 83–108 (2007)

  31. Werner, D.: Funktionalanalysis. Springer, Berlin (2000)

  32. Wirsching, L., Bock, H., Diehl, M.: Fast NMPC of a chain of masses connected by springs. In: Proceedings of the 2006 IEEE International Conference on Control Applications (CCA), pp. 591–596 (2006). doi:10.1109/CACSD-CCA-ISIC.2006.4776712

  33. Wloka, J.: Funktionalanalysis und Anwendungen. Walter de Gruyter, Berlin, New York (1971). De Gruyter Lehrbuch

Acknowledgments

The authors express their gratitude to Christian Kirches and Andreas Potschka for valuable discussions on the subject. Scientific support of the DFG-Graduate-School 220 “Heidelberg Graduate School of Mathematical and Computational Methods for the Sciences” is gratefully acknowledged. Funding has been graciously provided by the German Ministry of Education and Research (Grant ID: 03MS649A) and by the Helmholtz Association through the SBCancer programme. The research leading to these results has received funding from the European Union Seventh Framework Programme FP7/2007-2013 under grant agreement no. FP7-ICT-2009-4 248940.

Author information

Corresponding author

Correspondence to Dörte Beigel.

Appendices

Appendix 1: Lagrange multipliers in \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\)

Recall that functions in \(C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d\), restricted to the open interval \(({t_\mathrm{s}},{t_\mathrm{f}})\), form a dense subset of the space \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) of all Lebesgue square-integrable functions from \(({t_\mathrm{s}},{t_\mathrm{f}})\) to \(\mathbb{R }^d\). Similarly, the subset \(C^1[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) is dense in the Sobolev space \(H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\) of all \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\)-functions with weak derivative in \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) (see [1, Ch.3]). Furthermore, both \(L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) and \(H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\) are Hilbert spaces.

For (5) posed on \(H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\), the Lagrangian \(\mathcal{L }:H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \times L^2({t_\mathrm{s}},{t_\mathrm{f}})^d \rightarrow \mathbb{R }\) reads

$$\begin{aligned} \mathcal{L }(\varvec{y},\varvec{\lambda }):= J(\varvec{y}({t_\mathrm{f}})) - \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{\lambda }^\intercal (t) \left[ \dot{\varvec{y}}(t)-\varvec{f}(t,\varvec{y}(t))\right] \mathrm{d}t - \varvec{\lambda }^\intercal ({t_\mathrm{s}}) \left[ \varvec{y}({t_\mathrm{s}})-\varvec{y}_\mathrm{s}\right] \end{aligned}$$

using the \(L^2\) scalar product and the Lagrange multiplier \(\varvec{\lambda }\). The optimality condition for (5) is based on the Fréchet derivative of \(\mathcal{L }\) at \((\varvec{y},\varvec{\lambda })\) in the direction \((\varvec{w},\varvec{\chi })\), which exists by the Fréchet differentiability of \(J\) and by [18, Ch.0§0.2.5]:

$$\begin{aligned} \mathcal{L }^\prime (\varvec{y},\varvec{\lambda })(\varvec{w},\varvec{\chi })&\!=\!\left\{ \displaystyle J^\prime (\varvec{y}({t_\mathrm{f}})) \varvec{w}({t_\mathrm{f}}) \!-\! \!\int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{\lambda }^\intercal (t) \left[ \! \dot{\varvec{w}}(t)\!-\!\varvec{f}_{\varvec{y}}(t,\varvec{y}(t))\varvec{w}(t) \!\right] \mathrm{d}t \!- \!\varvec{\lambda }^\intercal ({t_\mathrm{s}}) \varvec{w}({t_\mathrm{s}}) \!\right\} \\&\quad + \left\{ \displaystyle - \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{\chi }^\intercal (t) \left[ \dot{\varvec{y}}(t)-\varvec{f}(t,\varvec{y}(t))\right] \mathrm{d}t - \varvec{\chi }^\intercal ({t_\mathrm{s}}) \left[ \varvec{y}({t_\mathrm{s}})-\varvec{y}_\mathrm{s}\right] \right\} . \end{aligned}$$

The necessary optimality condition for (5) requires \((\varvec{y},\varvec{\lambda })\in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \times L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) to be a stationary point of \(\mathcal{L }\), that is, \(\mathcal{L }^\prime (\varvec{y},\varvec{\lambda })(\varvec{w},\varvec{\chi })=0\) for all directions \((\varvec{w},\varvec{\chi })\in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \times L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\). Choosing \(\varvec{w}=\varvec{0} \in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d \) and varying only \(\varvec{\chi } \in L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\), the necessary condition reads

$$\begin{aligned} \mathcal{L }_{\varvec{\lambda }}(\varvec{y},\varvec{\lambda })(\varvec{\chi })=-\int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{\chi }^\intercal (t) \left[ \dot{\varvec{y}}(t)-\varvec{f}(t,\varvec{y}(t))\right] \mathrm{d}t - \varvec{\chi }^\intercal ({t_\mathrm{s}}) \left[ \varvec{y}({t_\mathrm{s}})-\varvec{y}_\mathrm{s}\right] = 0,\; \forall \varvec{\chi } \end{aligned}$$
(32)

which has the same unique solution \(\varvec{y}\in C^1[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) as (1). Now taking \(\varvec{\chi }=\varvec{0} \in L^2({t_\mathrm{s}},{t_\mathrm{f}})^d\) and varying only \(\varvec{w} \in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\), integration by parts yields

$$\begin{aligned} \mathcal{L }_{\varvec{y}}(\varvec{y},\varvec{\lambda })(\varvec{w})\!=\!\left[ J^\prime (\varvec{y} ({t_\mathrm{f}}))\!-\!\varvec{\lambda }^\intercal ({t_\mathrm{f}})\right] \varvec{w}({t_\mathrm{f}})\!-\! \int \limits ^{t_\mathrm{s}}_{t_\mathrm{f}}\left[ \dot{\varvec{\lambda }}(t)\!+\! \varvec{f}^\intercal _{\varvec{y}}(t,\varvec{y}(t)) \varvec{\lambda }(t)\right] ^\intercal \varvec{w}(t) \mathrm{d}t\!=\! 0,\; \forall \varvec{w} \end{aligned}$$

which has the same solution as (2).
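The integration-by-parts step behind the previous display can be written out as follows. This is a sketch under an extra assumption that the appendix does not make: \(\varvec{\lambda }\in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\), so that \(\dot{\varvec{\lambda }}\) exists as a weak derivative. For \(\varvec{w}\in H^1({t_\mathrm{s}},{t_\mathrm{f}})^d\),

$$\begin{aligned} -\int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{\lambda }^\intercal (t)\,\dot{\varvec{w}}(t)\,\mathrm{d}t = -\varvec{\lambda }^\intercal ({t_\mathrm{f}})\,\varvec{w}({t_\mathrm{f}}) + \varvec{\lambda }^\intercal ({t_\mathrm{s}})\,\varvec{w}({t_\mathrm{s}}) + \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\dot{\varvec{\lambda }}^\intercal (t)\,\varvec{w}(t)\,\mathrm{d}t . \end{aligned}$$

Inserted into \(\mathcal{L }_{\varvec{y}}\), the boundary term at \({t_\mathrm{s}}\) cancels \(-\varvec{\lambda }^\intercal ({t_\mathrm{s}})\varvec{w}({t_\mathrm{s}})\), and the remaining integrals combine into the integral of the previous display (note that \(-\int _{t_\mathrm{f}}^{t_\mathrm{s}} = \int _{t_\mathrm{s}}^{t_\mathrm{f}}\)). Testing first with all \(\varvec{w}\) satisfying \(\varvec{w}({t_\mathrm{f}})=\varvec{0}\) and then with arbitrary \(\varvec{w}({t_\mathrm{f}})\) gives, under the same assumption, the classical adjoint terminal value problem

$$\begin{aligned} \dot{\varvec{\lambda }}(t) = -\varvec{f}^\intercal _{\varvec{y}}(t,\varvec{y}(t))\,\varvec{\lambda }(t) \;\; \text{for a.e. } t\in ({t_\mathrm{s}},{t_\mathrm{f}}), \qquad \varvec{\lambda }({t_\mathrm{f}}) = J^\prime (\varvec{y}({t_\mathrm{f}}))^\intercal . \end{aligned}$$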

Appendix 2: Duality pairing between \(C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) and \({{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]^d\)

According to the Riesz Representation Theorem [24, Ch.5§5.5], for every continuous linear functional \(\mathfrak L \) on \(C^0[{t_\mathrm{s}},{t_\mathrm{f}}]\) there exists a unique \(\varPsi \in {{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]\) such that

$$\begin{aligned} \mathfrak L [g] = \left\langle \varPsi ,g \right\rangle _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}],C^0[{t_\mathrm{s}},{t_\mathrm{f}}]} = \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}g(t) \mathrm{d}\varPsi (t), \end{aligned}$$
(33)

where the integral is understood in the Riemann–Stieltjes sense [26, Ch.VIII§6]. The Banach space \({{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]\) consists of all normalized functions of bounded variation on \([{t_\mathrm{s}},{t_\mathrm{f}}]\), i.e., functions of bounded variation that vanish at \({t_\mathrm{s}}\) and are continuous from the right on \(({t_\mathrm{s}},{t_\mathrm{f}})\). It is equipped with the total variation norm

$$\begin{aligned} \left\| \varPsi \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]} = \sup \sum _{i=1}^{m} \left| \varPsi (t_i)-\varPsi (t_{i-1}) \right| \end{aligned}$$

where the supremum is taken over all partitions \({t_\mathrm{s}}=t_0<\dots <t_m={t_\mathrm{f}}\) of \([{t_\mathrm{s}},{t_\mathrm{f}}]\). According to the Riesz Representation Theorem, for each \(\varPsi \) the value of the total variation norm coincides with the value of the dual norm given by

$$\begin{aligned} \left\| \varPsi \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]}= \sup _{\left\| g \right\| _{C^0[{t_\mathrm{s}},{t_\mathrm{f}}]}=1} \left| \left\langle \varPsi ,g \right\rangle _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}],C^0[{t_\mathrm{s}},{t_\mathrm{f}}]} \right| . \end{aligned}$$
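The pairing (33) and the identity between the total variation norm and the dual norm are easy to check numerically for the simplest elements of NBV, pure step functions. The sketch below assumes Ψ has finitely many jumps of size d_k at distinct points τ_k in [t_s, t_f]; the function names are hypothetical. For such Ψ and continuous g, the Riemann–Stieltjes integral reduces to the sum of g(τ_k)·d_k, the total variation is the sum of |d_k|, and a piecewise linear g with g(τ_k) = sign(d_k) and max |g| = 1 attains the value of the dual norm.

```python
import math


def psi_value(t, jumps, t_s):
    """Evaluate a pure-jump NBV function: Psi(t_s) = 0 and right-continuous on (t_s, t_f)."""
    if t <= t_s:
        return 0.0  # normalization at the left endpoint
    # Right-continuity: a jump located at tau is already included at t = tau.
    return sum(d for tau, d in jumps if tau <= t)


def stieltjes_integral(g, jumps):
    """Riemann-Stieltjes integral of a continuous g against a pure-jump Psi."""
    # Each jump of size d at tau contributes g(tau) * d.
    return sum(g(tau) * d for tau, d in jumps)


def variation_on_partition(jumps, t_s, t_f, m):
    """Sum of |Psi(t_i) - Psi(t_{i-1})| over a uniform partition with m subintervals.

    This never exceeds the total variation and equals it once the partition
    separates the jumps (at most one jump per subinterval).
    """
    grid = [t_s + (t_f - t_s) * i / m for i in range(m + 1)]
    vals = [psi_value(t, jumps, t_s) for t in grid]
    return sum(abs(vals[i] - vals[i - 1]) for i in range(1, m + 1))


def norming_function(jumps):
    """Continuous g with max |g| = 1 and g(tau_k) = sign(d_k); jump locations must be distinct."""
    nodes = sorted((tau, math.copysign(1.0, d)) for tau, d in jumps)

    def g(t):
        if t <= nodes[0][0]:
            return nodes[0][1]
        if t >= nodes[-1][0]:
            return nodes[-1][1]
        # Linear interpolation between neighbouring nodes keeps |g| <= 1.
        for (t0, v0), (t1, v1) in zip(nodes, nodes[1:]):
            if t0 <= t <= t1:
                return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

    return g


if __name__ == "__main__":
    # Hypothetical step function on [0, 1] with jumps at 0, 0.3, 0.7 and 1.
    jumps = [(0.0, 0.5), (0.3, -1.2), (0.7, 0.4), (1.0, -0.25)]
    total_variation = sum(abs(d) for _, d in jumps)
    g = norming_function(jumps)
    print(total_variation, variation_on_partition(jumps, 0.0, 1.0, 100),
          stieltjes_integral(g, jumps))
```

For a general Ψ in NBV the supremum need not be attained by a continuous g; the step-function case is chosen here precisely because a norming g can be written down explicitly.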

The dual of the finite Cartesian product \(C^0[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) is the finite Cartesian product \({{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]^d\) of the duals with duality pairing

$$\begin{aligned} \left\langle \varvec{\varPsi },\varvec{g} \right\rangle _{{{{\mathrm{NBV}}}}^d,\left( C^0\right) ^d} = \sum _{i=1}^d \left\langle \varPsi _i,g_i \right\rangle _{{{\mathrm{NBV}}},C^0} = \sum _{i=1}^d \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}g_i(t) \mathrm{d}\varPsi _i(t) =: \int \limits _{t_\mathrm{s}}^{t_\mathrm{f}}\varvec{g}(t) \mathrm{d}\varvec{\varPsi }(t) \end{aligned}$$

and dual norm \( \left\| \varvec{\varPsi } \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]^d} = \max _{1\le i\le d} \left\| \varPsi _i \right\| _{{{\mathrm{NBV}}}[{t_\mathrm{s}},{t_\mathrm{f}}]}\), see [33, Ch.II§12.1].

About this article

Cite this article

Beigel, D., Mommer, M.S., Wirsching, L. et al. Approximation of weak adjoints by reverse automatic differentiation of BDF methods. Numer. Math. 126, 383–412 (2014). https://doi.org/10.1007/s00211-013-0570-4

  • DOI: https://doi.org/10.1007/s00211-013-0570-4
