Appendix
Proof of Proposition 1
Since \(\hat{A}^+,\hat{A}^-,\hat{b}^+\) and \(\hat{b}^-\) are non-negative, the theory of differential inequalities (or monotonic systems) readily implies that the solution \((x^+,x^-)\) of (15) is non-negative whenever \((x^+(0), x^-(0))\) is non-negative. To see the remainder of the statement, let \((x^+,x^-)\) solve (15) for \(x(0) = x^+(0) - x^-(0)\). Then
$$\begin{aligned} \partial _t{x}^+ - \partial _t{x}^-&= (\hat{A}^+ x^+ + \hat{A}^- x^- + \hat{b}^+) \\&\qquad - (\hat{A}^+ x^- + \hat{A}^- x^+ + \hat{b}^-) \\&= (\hat{A}^+ - \hat{A}^-)(x^+ - x^-) + (\hat{b}^+ - \hat{b}^-) \\&= \hat{A}(x^+ - x^-) + \hat{b}\end{aligned}$$
Since \(\partial _t{x} = \hat{A}x + \hat{b}\) admits a unique solution for each initial condition, it must hold that \(x = x^+ - x^-\). \(\square \)
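The argument above can be checked numerically. The following minimal sketch integrates a linear ODE and its split counterpart with an explicit Euler scheme and compares \(x\) with \(x^+ - x^-\). The matrix `A`, the vector `b`, the step size and all helper names are illustrative assumptions and do not stem from the paper.

```python
# Sketch only: A, b, the step size and the helper names are assumptions.

def split(values):
    """Element-wise positive and negative parts, so that v = plus - minus."""
    plus = [max(v, 0.0) for v in values]
    minus = [max(-v, 0.0) for v in values]
    return plus, minus

def matvec(M, v):
    """Plain matrix-vector product for nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def simulate(A, b, x0, dt=1e-3, steps=2000):
    """Explicit Euler for x' = A x + b and for the split system in (x+, x-)."""
    A_plus = [split(row)[0] for row in A]
    A_minus = [split(row)[1] for row in A]
    b_plus, b_minus = split(b)
    x = list(x0)
    xp, xm = split(x0)
    for _ in range(steps):
        dx = [a + c for a, c in zip(matvec(A, x), b)]
        dxp = [p + q + r for p, q, r in
               zip(matvec(A_plus, xp), matvec(A_minus, xm), b_plus)]
        dxm = [p + q + r for p, q, r in
               zip(matvec(A_plus, xm), matvec(A_minus, xp), b_minus)]
        x = [xi + dt * d for xi, d in zip(x, dx)]
        xp = [xi + dt * d for xi, d in zip(xp, dxp)]
        xm = [xi + dt * d for xi, d in zip(xm, dxm)]
    return x, xp, xm

A = [[-1.0, 2.0], [-3.0, -0.5]]
b = [1.0, -2.0]
x, xp, xm = simulate(A, b, [0.5, -1.0])
diff = [p - m for p, m in zip(xp, xm)]  # should agree with x up to rounding
```

In this sketch both parts stay non-negative because every term of the split right-hand side is non-negative, while their difference tracks the original trajectory.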
Proof of Proposition 2
Write (16) as \(\partial _t{x}^+_i = g^+_i(x^+,x^-)\) and \(\partial _t{x}^-_i = g^-_i(x^+,x^-)\). Since \(g^+_i(z^+,z^-) \ge - \gamma z_i^+ z_i^-\) and \(g^-_i(z^+,z^-) \ge - \gamma z_i^+ z_i^-\) for any \((z^+,z^-) \in \mathbb {R}^{2n}_{\ge 0}\) and the ODE system
$$\begin{aligned} \partial _t{z}^+_i= - \gamma z_i^+ z_i^-,\quad \partial _t{z}^-_i= - \gamma z_i^+ z_i^-, \quad 1 \le i \le n \end{aligned}$$
remains non-negative if initialized with non-negative values, we conclude that \((x^+,x^-)\) remains non-negative. Moreover, since \(\gamma Q\) is non-positive on \(\mathbb {R}^{2n}_{\ge 0}\), the solution of (16) is defined on \([0;\infty )\) and does not exhibit a finite explosion time. The second claim follows from Proposition 1 because the Q terms cancel each other out in \(\partial _t{x}^+ - \partial _t{x}^-\). Let us therefore focus on the third claim and set \(\xi := \sup _{0 \le t < \infty } ||x(t) ||_\infty < \infty \). Since \(x_i = x^{+}_i - x^{-}_i\), we get
$$\begin{aligned} \partial _t{x}^+_i&= \big (\hat{A}^+ x^+ + \hat{A}^- x^- + \hat{b}^+\big )_i - \gamma x^{+}_i x^{-}_i \\&= \big (\hat{A}^+ x^+ + \hat{A}^- x^- + \hat{b}^+\big )_i - \gamma x^{+}_i (x^{+}_i - x_i) \\&\le \big (\hat{A}^+ x^+ + \hat{A}^- x^- + \hat{b}^+\big )_i + \gamma \xi x^{+}_i - \gamma (x^{+}_i)^2 \end{aligned}$$
Since a similar calculation implies that
$$\begin{aligned} \partial _t{x}^-_i \le \big (\hat{A}^+ x^- + \hat{A}^- x^+ + \hat{b}^-\big )_i + \gamma \xi x^{-}_i - \gamma (x^{-}_i)^2 , \end{aligned}$$
we infer that there exists a \(\zeta > 0\) such that, for all i and \((z^+,z^-) \in \mathbb {R}^{2n}_{\ge 0}\), it holds that
\(g^+_i(z^+,z^-) \le -1\) whenever \(z^+_i \ge \zeta \), and
\(g^-_i(z^+,z^-) \le -1\) whenever \(z^-_i \ge \zeta \).
This ensures that, for any initial condition \((x^+(0),x^-(0)) \in \mathbb {R}^{2n}_{\ge 0}\), the solution \((x^{+},x^{-})\) eventually enters \([0;\zeta ]^{2n}\) and remains there forever. \(\square \)
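The effect of the damping term \(-\gamma x^+_i x^-_i\) can be illustrated on a scalar example. The sketch below assumes \(\hat{A} = -1\), \(\hat{b} = 0\) and \(x(0) = 1\), so that \(x(t) = e^{-t}\) is bounded, and integrates the split system with an explicit Euler scheme once with \(\gamma = 1\) and once with \(\gamma = 0\). All parameter values and the function name are hypothetical choices made for illustration.

```python
# Sketch only: scalar instance with assumed parameters a = -1, b = 0, x0 = 1.

def simulate_split(gamma, a=-1.0, b=0.0, x0=1.0, dt=1e-3, steps=10000):
    """Explicit Euler for the split system with damping -gamma * x+ * x-.

    Returns the final (x+, x-) and the largest value either part attained.
    """
    a_plus, a_minus = max(a, 0.0), max(-a, 0.0)
    b_plus, b_minus = max(b, 0.0), max(-b, 0.0)
    xp, xm = max(x0, 0.0), max(-x0, 0.0)
    peak = max(xp, xm)
    for _ in range(steps):
        damp = gamma * xp * xm
        dxp = a_plus * xp + a_minus * xm + b_plus - damp
        dxm = a_plus * xm + a_minus * xp + b_minus - damp
        xp, xm = xp + dt * dxp, xm + dt * dxm
        peak = max(peak, xp, xm)
    return xp, xm, peak

xp_d, xm_d, peak_d = simulate_split(gamma=1.0)  # damped split system
xp_u, xm_u, peak_u = simulate_split(gamma=0.0)  # undamped split system
```

Without damping, \(x^+\) and \(x^-\) behave like \(\cosh t\) and \(\sinh t\) and grow without bound, whereas with \(\gamma = 1\) both parts stay in \([0;1]\) in this example. In either case the difference \(x^+ - x^-\) follows the same decaying trajectory.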
Proof of Proposition 3
Straightforward. \(\square \)
Proof of Theorem 3
Follows from a direct combination of Proposition 3 and Theorem 2. \(\square \)
Proof of Theorems 1 and 2
Before proving Theorems 1 and 2, we first have to establish some auxiliary results. To allow for a compact notation, we denote in the present section the i-th step of the numeric sequence by \(x_i\) rather than x[i].
Proposition 4
Consider the ODE systems \(\partial _t{x} = F(x)\) and \(\partial _t{x}_h = F_h(x_h)\) where F and \(F_h\) are assumed to be Lipschitz continuous on some bounded domain \(B \subseteq \mathbb {R}^n\) and \(L > 0\) denotes the Lipschitz constant of F. Let us assume further that both ODE systems have solutions on [0; T] which remain in B and that \(\sup \{ ||F(x) - F_h(x) || \mid x \in B \} \le \eta \). Then, if \(x(0) = x_h(0)\), for all \(0 \le t \le T\) it holds that
$$\begin{aligned} ||x(t)-x_h(t) || \le \frac{\eta }{L} ( e^{L t} - 1 ) \end{aligned}$$
Proof
We first show a modified version of Gronwall’s inequality. To be more specific, let \(\xi _1\) and \(\xi _2\) be positive constants and v a continuous function on \(0 \le t < \infty \) such that
$$\begin{aligned} v(t) \le \xi _2 t + \xi _1 \int _0^t v(s) ds \end{aligned} \tag{20}$$
Then, it holds that \(v(t) \le \tfrac{\xi _2}{\xi _1} (\mathrm {e}^{\xi _1 t} - 1)\). To see this, we first rewrite (20) as
$$\begin{aligned} v(t) + \frac{\xi _2}{\xi _1} \le \frac{\xi _2}{\xi _1} + \xi _1 \int _0^t \left( v(s)+\frac{\xi _2}{\xi _1} \right) ds \end{aligned}$$
Since this rewrites to \(\tilde{v}(t) \le \tilde{\alpha } + \int _0^t \tilde{v}(s) \tilde{w}(s) ds\) for \(\tilde{v}(s) := v(s) + \tfrac{\xi _2}{\xi _1}\), \(\tilde{\alpha } := \tfrac{\xi _2}{\xi _1}\) and \(\tilde{w}(s) := \xi _1\), Gronwall’s inequality ensures that \(\tilde{v}(t) \le \tilde{\alpha } \cdot \mathrm {e}^{\int _0^t \tilde{w}(s) ds}\) and we infer the auxiliary statement. This, in turn, yields
$$\begin{aligned} ||x(t) - x_h(t) ||&\le \left\| x(0) - x_h(0) \right\| \\&\quad + \left\| \int _0^t \Big ( F(x(s)) - F_h(x_h(s)) \Big ) ds \right\| \\&\le \left\| \int _0^t \big (F(x(s)) - F(x_h(s))\big ) ds \right\| \\&\quad + \left\| \int _0^t \big (F(x_h(s)) - F_h(x_h(s))\big ) ds \right\| \\&\le L \int _0^t \left\| x(s) - x_h(s) \right\| ds + \eta t \\&\le \frac{\eta }{L} ( e^{L t} - 1 ) \end{aligned}$$
\(\square \)
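As a sanity check of the bound in Proposition 4, the following sketch considers the scalar case \(F(x) = -x\) and \(F_h(x) = -x / (1 + h)\) on \(B = [-1; 1]\), for which \(L = 1\), \(\eta = h / (1 + h)\) and both solutions are known in closed form. The choice of example, the sampling grid and the function names are assumptions made for illustration.

```python
import math

def gronwall_bound(eta, L, t):
    """Right-hand side of the estimate: (eta / L) * (exp(L t) - 1)."""
    return eta / L * (math.exp(L * t) - 1.0)

def check(h, x0=1.0, T=5.0, samples=200):
    """Compare |x(t) - x_h(t)| with the bound on a grid of [0, T].

    x solves x' = -x and x_h solves x_h' = -x_h / (1 + h), both from x0.
    """
    L = 1.0              # Lipschitz constant of F(x) = -x
    eta = h / (1.0 + h)  # sup of |F(x) - F_h(x)| over |x| <= 1
    for k in range(samples + 1):
        t = T * k / samples
        err = abs(x0 * math.exp(-t) - x0 * math.exp(-t / (1.0 + h)))
        if err > gronwall_bound(eta, L, t) + 1e-12:
            return False
    return True

holds = check(0.1) and check(0.5)
```

The bound is far from tight for large \(t\) in this example, since the true error decays while the estimate grows exponentially. It is sharp only to first order near \(t = 0\).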
Proposition 5
Let \(E \partial _t{x} = A x + b\) be a regular linear DAE system and let \(\mathcal {D}\subseteq \mathbb {R}^n\) denote the corresponding set of consistent initial conditions. Then, \(\mathcal {D}\) is an affine subspace of \(\mathbb {R}^n\) and \(x + h F_h(x) \in \mathcal {D}\) whenever \(x \in \mathcal {D}\).
Proof
To see that \(\mathcal {D}\) is an affine subspace of \(\mathbb {R}^n\), we refer to (Kunkel and Mehrmann 2006, Section 2.1). Note further that \(x_i = x_{i-1} + h F_h(x_{i-1})\) defines the backward Euler scheme which is applied to the DAE system \(E \partial _t{x} = A x + b\), see (Kunkel and Mehrmann 2006, Section 5.2). Consider the BDF-1 scheme (Kunkel and Mehrmann 2006, Section 5.3) which is given by
$$\begin{aligned} \tfrac{1}{h} E (x_i - x_{i-1}) = A x_i + b \end{aligned}$$
if applied to \(E \partial _t{x} = A x + b\). With this, we first observe that
$$\begin{array}{lll} &\tfrac{1}{h} E (x_i - x_{i-1})&= A x_i + b \\ \Leftrightarrow &\qquad \tfrac{1}{h} E x_i - A x_i&= \tfrac{1}{h} E x_{i-1} + b \\ \Leftrightarrow &\qquad \left(\tfrac{1}{h} E - A\right) x_i&= \tfrac{1}{h} E x_{i-1} + b \\ \Leftrightarrow &\qquad (E - h A) x_i&= E x_{i-1} + h b \\ \Leftrightarrow &\qquad x_i&= (E - h A)^{-1} (E x_{i-1} + h b) , \end{array}$$
where the inversion in the last line can always be performed for sufficiently small h because \(E \partial _t{x} = A x + b\) is regular. This, in turn, yields
$$\begin{aligned}&\frac{x_i - x_{i-1}}{h} = (E - h A)^{-1} b + \big ( (E - h A)^{-1} E - I \big ) \tfrac{1}{h} x_{i-1} \\&\quad = (E - h A)^{-1} b + (E - h A)^{-1} (E - (E - h A) ) \tfrac{1}{h} x_{i-1} \\&\quad = (E - h A)^{-1} b + (E - h A)^{-1} A x_{i-1} \\&\quad = (E - h A)^{-1} (A x_{i-1} + b) \\&\quad = F_h(x_{i-1}) \end{aligned}$$
This shows that the backward Euler scheme and the BDF-1 scheme are identical if applied to \(E \partial _t{x} = A x + b\). With this, the statement of the proposition is closely related to (Kunkel and Mehrmann 2006, Remark 5.25). To see this, we may assume without loss of generality (see proof of Kunkel and Mehrmann 2006, Theorem 5.24) that \(E \partial _t{x} = A x + b\) is such that \(A = I\) and \(E = N\) for some nilpotent N with \(N^\nu = 0\) and \(N^{\nu - 1} \ne 0\). It can be easily seen that in such a case the solution is \(x \equiv -b\), thus implying in particular that the set of consistent initial conditions is \(\mathcal {D}= \{-b\}\). Moreover, the BDF-1 scheme rewrites to
$$\begin{array}{lll} &\left(\tfrac{1}{h} N - I\right) x_i&= \tfrac{1}{h} N x_{i-1} + b \\ \Leftrightarrow &\qquad \left( I - \tfrac{1}{h} N\right) x_i&= -\tfrac{1}{h} N x_{i-1} - b \\ \Leftrightarrow &\qquad x_i&= -\left( I - \tfrac{1}{h} N\right) ^{-1} \left( \tfrac{1}{h} N x_{i-1} + b\right) \\ \Leftrightarrow &\qquad x_i&= - \sum _{l = 0}^{\nu - 1} \left( \tfrac{1}{h} N\right) ^l \left( \tfrac{1}{h} N x_{i-1} + b\right) , \end{array}$$
where the last equivalence is due to the Neumann series and the nilpotency of N. This, in turn, implies that
$$\begin{aligned} x_i&= - \sum _{l = 0}^{\nu - 1} \left( \tfrac{1}{h} N\right) ^l \left( \tfrac{1}{h} N x_{i-1} + b\right) \\&= - \sum _{l = 1}^{\nu - 1} \left( \tfrac{1}{h} N\right) ^l x_{i-1} - \sum _{l = 0}^{\nu - 1}\left( \tfrac{1}{h} N\right) ^l b \\&= - b - \sum _{l = 1}^{\nu - 1}\left( \tfrac{1}{h} N\right) ^l (x_{i-1} + b) , \end{aligned}$$
thus showing that \(x_i = -b\) whenever \(x_{i-1} = -b\). \(\square \)
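The nilpotent case can be made concrete with \(\nu = 2\). The sketch below assumes \(N = \left( {\begin{array}{cc} 0 & 1 \\ 0 & 0 \end{array}}\right) \) and performs one BDF-1 step for \(N \partial _t{x} = x + b\) in exact rational arithmetic. The matrix, the vector `b` and the function name are hypothetical choices, not taken from the paper.

```python
from fractions import Fraction

def bdf1_step_nilpotent(x_prev, b, h):
    """One BDF-1 step for N x' = x + b with the assumed N = [[0, 1], [0, 0]].

    Solves ((1/h) N - I) x_new = (1/h) N x_prev + b by back substitution.
    """
    inv_h = 1 / Fraction(h)
    # Right-hand side (1/h) N x_prev + b, using N x = (x[1], 0).
    rhs0 = inv_h * x_prev[1] + b[0]
    rhs1 = b[1]
    # Rows of the system: -x0 + (1/h) x1 = rhs0 and -x1 = rhs1.
    x1 = -rhs1
    x0 = inv_h * x1 - rhs0
    return [x0, x1]

b = [Fraction(3), Fraction(-2)]
consistent = [-b[0], -b[1]]
step_size = Fraction(1, 10)
after_step = bdf1_step_nilpotent(consistent, b, step_size)  # stays at -b
```

Starting from the consistent value \(-b\), the step returns \(-b\) again, in line with the last display of the proof. In this particular example, an inconsistent start such as \(x_{i-1} = 0\) reaches \(-b\) after two steps because \(N^2 = 0\).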
Proposition 6
Let \(E \partial _t{x} = A x + b\) be a regular linear DAE system and let \(\mathcal {D}\subseteq \mathbb {R}^n\) denote the corresponding set of consistent initial conditions. Then
- The solution of \(E \partial _t{x} = A x + b\) is contained in \(\mathcal {D}\).
- There exist \(\hat{A} \in \mathbb {R}^{n \times n}\) and \(\hat{b} \in \mathbb {R}^n\) such that the solution of the ODE system \(\partial _t{x} = \hat{A} x + \hat{b}\) coincides with that of \(E \partial _t{x} = A x + b\) for all \(x(0) \in \mathcal {D}\).
- Together with \(F_h(x) := (E - hA)^{-1}(A x + b)\), where \(h > 0\), it holds that \(F_h\) converges uniformly, as \(h \rightarrow 0\), to \(\hat{A} x + \hat{b}\) on any bounded subset of \(\mathcal {D}\).
Proof
The first two points are well-known in the theory of linear DAE systems, see Kunkel and Mehrmann (2006, Section 2.1) [it is interesting to note that an efficient computation of \(\hat{A} \in \mathbb {R}^{n \times n}\) and \(\hat{b} \in \mathbb {R}^n\) is difficult because it relies on index reduction (Pantelides 1988)].
To see the third claim, we observe that \(x_i = x_{i-1} + h F_h(x_{i-1})\) defines the backward Euler scheme applied to the DAE system \(E \partial _t{x} = A x + b\), see Kunkel and Mehrmann (2006, Section 5.2). We next show that \(x_0 \mapsto \frac{1}{h}(x_1 - x_0)\) converges uniformly on any bounded subset of \(\mathcal {D}\) to \(x_0 \mapsto \hat{A} x_0 + \hat{b}\) when \(h \rightarrow 0\). To this end, we may assume without loss of generality (see discussion after Equation 5.25 in Kunkel and Mehrmann 2006) that the DAE system \(E \partial _t{x} = A x + b\) is such that
$$\begin{aligned} E = \left( \begin{array}{cc} I & 0 \\ 0 & N \end{array} \right) , \qquad A = \left( \begin{array}{cc} J & 0 \\ 0 & I \end{array} \right) , \end{aligned}$$
where N is such that \(N^\nu = 0\) and \(N^{\nu -1} \ne 0\) for some \(\nu \ge 1\). This implies that the solution of \(E \partial _t{x} = A x + b\) is characterized by a pair of decoupled dynamical systems, namely by the ODE system \(\partial _t{x}^{\mathrm{I}} = J x^{\mathrm{I}} + b^{\mathrm{I}}\) and the DAE system \(N \partial _t{x}^{\mathrm{II}} = x^{\mathrm{II}} + b^{\mathrm{II}}\), where \(x = (x^{\mathrm{I}}, x^{\mathrm{II}})\) and \(b =(b^{\mathrm{I}}, b^{\mathrm{II}})\). Thanks to this, it suffices to consider \(x^{\mathrm{I}}_1 - x^{\mathrm{I}}_0\) and \(x^{\mathrm{II}}_1 - x^{\mathrm{II}}_0\) separately.
Since \(x^{\mathrm{II}} \equiv - b^{\mathrm{II}}\) solves \(N \partial _t{x}^{\mathrm{II}} = x^{\mathrm{II}} + b^{\mathrm{II}}\), we infer that \(\mathcal {D}= \{ (x^{\mathrm{I}}, x^{\mathrm{II}}) \mid x^{\mathrm{II}} = - b^{\mathrm{II}} \}\). Hence, Proposition 5 shows that \(x^{\mathrm{II}}_1 - x^{\mathrm{II}}_0 = 0\) whenever \(x_0 \in \mathcal {D}\).
We next focus on \(x^{\mathrm{I}}_1 - x^{\mathrm{I}}_0\). Thanks to the fact that \(\partial _t{x}^{\mathrm{I}} = J x^{\mathrm{I}} + b^{\mathrm{I}}\), we have to investigate the local truncation error of the backward Euler scheme in the context of a linear ODE system. Despite the fact that this is discussed in many books about ODEs, we provide here a proof because most texts do not show that the local truncation error converges uniformly to zero on arbitrarily large compact sets. To this end, we first observe that the Taylor expansion of \(x^{\mathrm{I}}\) around zero yields
$$\begin{aligned} x^{\mathrm{I}}(h) = x^{\mathrm{I}}_0 + (J x^{\mathrm{I}}_0 + b^{\mathrm{I}}) h + \ddot{x}^{\mathrm{I}}(\xi ) \tfrac{h^2}{2} \end{aligned}$$
for some \(\xi \in (0;h)\). With \(\tilde{F}_h(x^{\mathrm{I}}_0) = (I - hJ)^{-1}(J x^{\mathrm{I}}_0 + b^{\mathrm{I}})\), the proof of Proposition 5 implies that \(\tilde{F}_h(x^{\mathrm{I}}_0) = \frac{1}{h}(x^{\mathrm{I}}_1 - x^{\mathrm{I}}_0)\). This, in turn, implies that
$$\begin{aligned} x^{\mathrm{I}}(h) - x^{\mathrm{I}}_1&= x^{\mathrm{I}}(h) - (x^{\mathrm{I}}_0 + h \tilde{F}_h(x^{\mathrm{I}}_0)) \\&= x^{\mathrm{I}}_0 + (J x^{\mathrm{I}}_0 + b^{\mathrm{I}}) h + \ddot{x}^{\mathrm{I}}(\xi ) \tfrac{h^2}{2} \\&\quad -\left[ x^{\mathrm{I}}_0 + h(I - h J)^{-1}(J x^{\mathrm{I}}_0 + b^{\mathrm{I}})\right] \\&= h^2 \left[ \tfrac{1}{2}\ddot{x}^{\mathrm{I}}(\xi ) + \tfrac{1}{h} (I - (I - h J)^{-1})(J x^{\mathrm{I}}_0 + b^{\mathrm{I}})\right] \end{aligned}$$
In the case \(h \le 1 / (2 ||J ||)\), the Neumann series allows us to deduce that
$$\begin{aligned} I - (I - h J)^{-1}&= (I - h J)(I - h J)^{-1} - (I - h J)^{-1} \\&= ((I - h J) - I)(I - h J)^{-1} \\&= - h J (I - h J)^{-1} \\&= - h J \sum _{k=0}^\infty (h J)^k \end{aligned}$$
with \(||\sum _{k=0}^\infty (h J)^k || \le \sum _{k = 0}^\infty 2^{-k} = 2\). Moreover, a differentiation of \(\partial _t{x}^{\mathrm{I}} = J x^{\mathrm{I}} + b^{\mathrm{I}}\) yields \(\ddot{x}^{\mathrm{I}} = J^2 x^{\mathrm{I}} + J b^{\mathrm{I}}\). This and the last statement imply the existence of constants \(\zeta _1, \zeta _2 \ge 0\) that neither depend on \(x^{\mathrm{I}}_0\) nor on h and that satisfy
$$\begin{aligned} ||x^{\mathrm{I}}(h) - x^{\mathrm{I}}_1 ||&\le h^2 \big (\zeta _1 + \zeta _2 ||x^{\mathrm{I}}_0 ||\big ) \end{aligned}$$
for all \(0 < h \le \min \{ 1, 1 / (2 ||J ||) \}\). This shows that \(x^{\mathrm{I}}_0 \mapsto \frac{1}{h}(x^{\mathrm{I}}_1 - x^{\mathrm{I}}_0)\) converges uniformly on any bounded set to \(x^{\mathrm{I}}_0 \mapsto J x^{\mathrm{I}}_0 + b^{\mathrm{I}}\). \(\square \)
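The uniform convergence of the backward Euler increment can be observed in the scalar case, where \(\tilde{F}_h(x) = (Jx + b) / (1 - hJ)\). The identity \(I - (I - hJ)^{-1} = -hJ(I - hJ)^{-1}\) and the Neumann series estimate give \(|\tilde{F}_h(x) - (Jx + b)| \le 2 h |J| \, |Jx + b|\) for \(h \le 1 / (2|J|)\). The sketch below evaluates the worst-case deviation on a grid of a bounded interval. The values \(J = 3\), \(b = 2\), the radius and the grid are assumptions chosen for illustration.

```python
# Sketch only: scalar instance with assumed J = 3, b = 2 and radius 10.

def backward_euler_field(x, h, J=3.0, b=2.0):
    """Scalar version of F_h(x) = (I - h J)^{-1} (J x + b)."""
    return (J * x + b) / (1.0 - h * J)

def sup_error(h, radius=10.0, J=3.0, b=2.0, samples=401):
    """Largest |F_h(x) - (J x + b)| over a uniform grid of [-radius, radius]."""
    worst = 0.0
    for k in range(samples):
        x = -radius + 2.0 * radius * k / (samples - 1)
        worst = max(worst, abs(backward_euler_field(x, h, J, b) - (J * x + b)))
    return worst

step_sizes = [0.125, 0.0125, 0.00125]  # all satisfy h <= 1 / (2 |J|)
errors = [sup_error(h) for h in step_sizes]
```

The worst-case deviation shrinks roughly linearly in h, and the rate depends on the radius of the bounded set only through the factor \(|Jx + b|\).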
We are in a position to prove Theorem 1.
Proof of Theorem 1
Let \(\hat{A} \in \mathbb {R}^{n \times n}\) and \(\hat{b} \in \mathbb {R}^n\) be as in Proposition 6 and fix \(T > 0\) and \(x(0) \in \mathcal {D}\). Since the solution of \(E \partial _t{x} = A x + b\) solves the linear ODE system \(\partial _t{x} = \hat{A} x + \hat{b}\), this implies that x exists and is bounded on [0; T]. Hence, there exists a closed ball \(B_\rho (0)\) centered at \(0 \in \mathbb {R}^n\) with radius \(\rho > 0\) such that \(x(t) \in B_\rho (0)\) for all \(0 \le t \le T\). Since \(B_\rho (0)\) is bounded, Proposition 6 ensures that x is contained in \(\mathcal {D}\) and that for any \(\eta > 0\) there exists an \(h > 0\) such that
$$\begin{aligned} \sup _{x \in B_\rho (0) \cap \mathcal {D}} || \hat{A} x + \hat{b} - F_h(x) || \le \eta \end{aligned}$$
Moreover, Proposition 5 ensures that the solution \(x_h\) of \(\partial _t{x}_h = F_h(x_h)\) is contained in \(\mathcal {D}\). By combining the foregoing statements, Proposition 4 yields the claim. \(\square \)
The following auxiliary results are needed for the proof of Theorem 2.
Proposition 7
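Fix \(E, A \in \mathbb {R}^{n \times n}\), \(B \in \mathbb {R}^{n \times (k + m)}\), \(D \in \mathbb {R}^{(k + m) \times (k + m)}\), \(d \in \mathbb {R}^{(k + m)}\) and consider the linear DAE system
$$\begin{aligned} \hat{E} \partial _t{\hat{x}} = \hat{A} \hat{x} + \hat{b}, \quad \hat{E} = \left( \begin{array}{cc} E & 0 \\ 0 & I \end{array} \right) , \ \hat{A} = \left( \begin{array}{cc} A & B \\ 0 & D \end{array} \right) , \ \hat{b} = \left( \begin{array}{c} 0 \\ d \end{array} \right) , \ \hat{x} = \left( \begin{array}{c} x \\ u \end{array} \right) \end{aligned}$$
Then, \((\hat{E}- h \hat{A})^{-1} (\hat{A}\hat{x}+ \hat{b})\) is given by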
$$\begin{aligned} \left( \begin{array}{c} (E - hA)^{-1} \big ( Ax + B u + h B (I - hD)^{-1} (D u + d) \big ) \\ (I - hD)^{-1} (D u + d) \end{array} \right) \end{aligned} \tag{21}$$
Proof
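By relying on the inversion formula for block matrices, we obtain
$$\begin{aligned} (\hat{E} - h \hat{A})^{-1} = \left( \begin{array}{cc} E - hA & -hB \\ 0 & I - hD \end{array} \right) ^{-1} = \left( \begin{array}{cc} (E - hA)^{-1} & h (E - hA)^{-1} B (I - hD)^{-1} \\ 0 & (I - hD)^{-1} \end{array} \right) \end{aligned}$$
Armed with this, we infer that
$$\begin{aligned} (\hat{E} - h \hat{A})^{-1} \hat{A} \hat{x} = \left( \begin{array}{c} (E - hA)^{-1} \big ( Ax + B u + h B (I - hD)^{-1} D u \big ) \\ (I - hD)^{-1} D u \end{array} \right) \end{aligned}$$
and
$$\begin{aligned} (\hat{E} - h \hat{A})^{-1} \hat{b} = \left( \begin{array}{c} h (E - hA)^{-1} B (I - hD)^{-1} d \\ (I - hD)^{-1} d \end{array} \right) \end{aligned}$$
A summation of the foregoing statements yields (21). \(\square \)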
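Formula (21) can be cross-checked in the scalar case \(n = k + m = 1\). The sketch below assumes that the DAE system of Proposition 7 has the block form \(\hat{E} = \mathrm{diag}(E, I)\), \(\hat{A} = \left( {\begin{array}{cc} A & B \\ 0 & D \end{array}}\right) \), \(\hat{b} = (0, d)\) and \(\hat{x} = (x, u)\), a structure inferred from (21) rather than quoted from the paper. It solves \((\hat{E} - h\hat{A}) y = \hat{A}\hat{x} + \hat{b}\) directly and compares the result with (21) in exact arithmetic. All numerical values and function names are hypothetical.

```python
from fractions import Fraction as Fr

def solve2(M, r):
    """Solve a 2x2 linear system M y = r by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    y0 = (r[0] * M[1][1] - M[0][1] * r[1]) / det
    y1 = (M[0][0] * r[1] - r[0] * M[1][0]) / det
    return [y0, y1]

def direct(E, A, B, D, d, x, u, h):
    """(E_hat - h A_hat)^{-1} (A_hat x_hat + b_hat) for the assumed block form."""
    M = [[E - h * A, -h * B], [Fr(0), 1 - h * D]]
    rhs = [A * x + B * u, D * u + d]
    return solve2(M, rhs)

def formula_21(E, A, B, D, d, x, u, h):
    """Scalar version of the closed-form expression (21)."""
    lower = (D * u + d) / (1 - h * D)
    upper = (A * x + B * u + h * B * lower) / (E - h * A)
    return [upper, lower]

# E = 0 turns the first equation into an algebraic constraint 0 = A x + B u.
args = [Fr(0), Fr(1), Fr(2), Fr(-1), Fr(3), Fr(5), Fr(-4), Fr(1, 10)]
agree = direct(*args) == formula_21(*args)
```

The choice \(E = 0\) is deliberate: the pencil remains regular because \(E - hA = -h \ne 0\), so the example covers a genuinely algebraic constraint.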
Corollary 1
Fix an arbitrary consistent initial condition \((x(0),u(0))^T \in \mathbb {R}^{n + k + m}\) of the DAE system from Proposition 7. The corresponding ODE approximation is then
$$\begin{aligned} \partial _t{x}_h&= (E - hA)^{-1} \big ( Ax_h + B u_h^{\langle 0 \rangle } + h B u_h^{\langle 1 \rangle } \big ) \\ \partial _t{u}_h^{\langle 0 \rangle }&= (I - hD)^{-1} D u_h^{\langle 0 \rangle } + (I - hD)^{-1} d \\ \partial _t{u}_h^{\langle 1 \rangle }&= (I - hD)^{-1} D u_h^{\langle 1 \rangle } \end{aligned}$$
with \(u^{\langle 0 \rangle }_h(0) = u(0)\) and \(u^{\langle 1 \rangle }_h(0) = (I - hD)^{-1} D u(0)\).
Proof
Follows directly from Proposition 7. \(\square \)
Proof of Theorem 2
In Theorem 2, replace u with \(\hat{u}\), z with \(\hat{z}\) and B with \(\hat{B}\). Afterwards, apply Corollary 1 to the case where \(u := \left( {\begin{array}{c}\hat{u}\\ \hat{z}\end{array}}\right) \in \mathbb {R}^{k + m}\) and
\(\square \)