Abstract
The present paper continues our investigation of an implementation of a least-squares collocation method for higher-index differential-algebraic equations. In earlier papers, we substantiated the choice of basis functions and collocation points for a robust implementation, as well as algorithms for the solution of the discrete system. The present paper is devoted to an analytic estimation of condition numbers for the different components of an implementation. We present error estimates that identify the sources of the different errors.
1 Introduction
In a series of papers [1,2,3,4], we have developed a new method for solving higher-index differential-algebraic equations (DAEs). In naturally given functional analytic settings, higher-index DAEs give rise to ill-posed problems [5, Section 3.9; 6]. Motivated by the well-known method of least squares, or discretization on the preimage space, for the approximation of ill-posed problems [7], this approach has been adapted to the case of higher-index DAEs. In particular, the ansatz spaces for the discrete least-squares problem have been chosen to be piecewise polynomials. Additionally, the integrals have been replaced by discrete versions based on simplified integration rules, in the simplest approach by a version resembling well-known collocation methods for solving boundary value problems for systems of ordinary differential equations (ODEs). The latter, extremely simplified version of the approach proposed in [7] has been motivated by the success of collocation methods for ODEs. This connection led us to coin the term least-squares collocation method and to call the integration nodes collocation points.
For our method, a number of convergence results for both linear and nonlinear DAEs have been proven. Even our first attempts showed surprisingly accurate results when applying the method to some linear examples [1]. More recently, we investigated the algorithmic ingredients of the method in more detail [8, 9]. Not surprisingly, the basis representation and the choice of the integration nodes showed an important influence on the accuracy of the method.
The present note is intended to further quantify the conditioning of the individual ingredients of the implementation of the proposed method and to better understand the (high) accuracy of the computational results obtained so far. Taking the ill-posedness of higher-index DAEs into account, we expect very sensitive discrete problems for sufficiently fine discretizations.
The practical implementation of a projection method consists of two steps for a given approximation space Xπ: the choice of a basis, and the formulation and solution of the arising discrete system by a suitable method. This in turn gives rise to two different operators, the first being the representation map connecting the elements x ∈ Xπ with their vectors of coefficients with respect to the chosen basis. The other operator is the discrete version of the least-squares collocation method, which becomes a linearly equality-constrained linear least-squares problem in our case. Both operators are investigated in detail, both analytically and numerically.
In particular, qualitative and quantitative estimations for the condition numbers and norms of the representation map are proven for bases whose usefulness in the present applications has been established earlier [8, 9].
For the constrained linear least-squares problem, a number of perturbation results are well-known, e.g., [10,11,12]. However, in the present application, the constraints play a special role: In the usual choices of the basis functions, some coefficient vectors do not represent a function in the approximation space. A coefficient vector represents a function in the approximation space if and only if the constraints are fulfilled. Therefore, a new error estimation is derived, which takes care of the exceptional role of the constraints. The important ingredients in this estimate are the condition number of the constraints and a restricted condition number for the least-squares functional. For the former, a complete analytical characterization for the chosen bases is provided. In a number of numerical examples, values for the restricted condition number are presented.
In Section 2, the least-squares method for approximating linear DAEs is introduced and the representation map is constructed. Section 3 is devoted to an in-depth investigation of the representation map. Then we derive a perturbation result for constrained linear least-squares problems in Section 4. Numerical examples for the condition numbers of the different ingredients are given in Section 5. Section 6 contains some conclusions.
2 The problem setting
2.1 The discrete functional
In this section, we repeat the problem setting from [8] for the reader’s convenience. Consider a linear boundary-value problem for a DAE with properly involved derivative,
with \([a,b]\subset \mathbb {R}\) being a compact interval, \(D=[I 0]\in \mathbb {R}^{k\times m}\), k < m, with the identity matrix \(I\in \mathbb {R}^{k\times k}\). Furthermore, \(A(t)\in \mathbb {R}^{m\times k}\), \(B(t)\in \mathbb {R}^{m\times m}\), and \(q(t)\in \mathbb {R}^{m}\) are assumed to be sufficiently smooth with respect to t ∈ [a,b]. Moreover, \(G_{a},G_{b}\in \mathbb {R}^{l_{{dyn}}\times m}\). Here, ldyn is the dynamical degree of freedom of the DAE, that is, the number of free parameters that can be fixed by initial and boundary conditions. We assume further that \(\ker D\subseteq \ker G_{a}\) and \(\ker D\subseteq \ker G_{b}\).
Unlike regular ODEs where ldyn = k = m, for DAEs, it holds that 0 ≤ ldyn ≤ k < m, in particular, ldyn = k for index-one DAEs, ldyn < k for higher-index DAEs, and ldyn = 0 can certainly happen.
The appropriate space in which to look for solutions of (1)–(2) is (cf. [2])
Let \(\mathfrak {P}_{K}\) denote the set of all polynomials of degree less than or equal to K ≥ 0. Given the partition π,
with the stepsizes hj = tj − tj− 1, \(h=\max \limits _{1\leq j\leq n}h_{j}\), and \(h_{\min \limits }=\min \limits _{1\leq j\leq n}h_{j}\). Let \(C_{\pi }([a,b],\mathbb {R}^{m})\) denote the space of piecewise continuous functions having breakpoints merely at the meshpoints of the partition π. Let N ≥ 1 be a fixed integer. We are looking for an approximate solution of our boundary value problem from the ansatz space \(X_{\pi }\subset {H_{D}^{1}}(a,b)\),
The continuous version of the least-squares method reads: Find an xπ ∈ Xπ that minimizes the functional
Here and in the following, \(\lvert \cdot \rvert \) denotes the Euclidean norm in the corresponding spaces \(\mathbb {R}^{\alpha }\) for the appropriate α. Let 〈⋅,⋅〉 denote the scalar product in \(\mathbb {R}^{\alpha }\).
The functional values Φ(x), which are needed when minimizing for x ∈ Xπ, cannot be evaluated exactly, and the integral must be discretized accordingly. Taking into account that the boundary-value problem is ill-posed in the higher-index case, perturbations of the functional may have a serious influence on the error of the approximate least-squares solution or even prevent convergence towards the exact solution. Therefore, careful approximations of the integral in Φ are required. We adopt the options provided in [8], in which M ≥ N + 1 so-called collocation points
are used, and further, on the subintervals of the partition π,
Introducing, for each x ∈ Xπ and w(t) = A(t)(Dx)′(t) + B(t)x(t) − q(t), the corresponding vector \(W\in \mathbb {R}^{mMn}\) by
we turn to an approximate functional of the form
with a positive definite symmetric matrix
As detailed in [8], we have different options for the positive definite symmetric matrix \(L\in \mathbb {R}^{M\times M}\), namely
see [8, Section 3] for details concerning the selection of the quadrature weights γ1,…,γM and the construction of the mass matrix V. We emphasize that the matrices LC,LI,LR depend only on M, the node sequence (6), and the quadrature weights, but do not depend on the partition π and its stepsizes at all.
In the context of the numerical experiments below, we denote each of the different versions of the functional by \({\Phi }_{\pi ,M}^{C}\), \({\Phi }_{\pi ,M}^{I}\), and \({\Phi }_{\pi ,M}^{R}\), respectively. The following convergence result is known [8, Theorem 2]:
Theorem 1
Let the DAE (1) be regular with index \(\mu \in \mathbb {N}\) and let the boundary condition (2) be accurately stated. Let x∗ be a solution of the boundary value problem (1)–(2), and let A,B,q and also x∗ be sufficiently smooth.
Let all partitions π be such that \(h/h_{\min \limits }\leq \rho \), with a global constant ρ. Then, with
the following statements are true:
(1) For sufficiently fine partitions π and each sequence of arbitrarily placed nodes (6), there exists exactly one \(x_{\pi }^{R}\in X_{\pi }\) minimizing the functional \({\Phi }_{\pi ,M}^{R}\) on Xπ, and
$$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{R}-x_{\ast}\|_{{H_{D}^{1}}(a,b)}\leq C_{R}h^{N-\mu+1}. \end{array} $$
(2) For each integration rule related to the interval [0,1], with M nodes (6) and positive weights γ1,…,γM, that is exact for polynomials of degree less than or equal to 2M − 2, and for sufficiently fine partitions π, there exists exactly one \(x_{\pi }^{I}\in X_{\pi }\) minimizing the functional \({\Phi }_{\pi ,M}^{I}\) on Xπ, and \(x_{\pi }^{I}=x_{\pi }^{R}\); thus
$$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{I}-x_{\ast}\|_{{H_{D}^{1}}(a,b)}\leq C_{R}h^{N-\mu+1}. \end{array} $$
A corresponding result for \({\Phi }_{\pi ,M}^{C}\) is not known. Numerical tests showed excellent convergence results even for cases not covered by Theorem 1. This holds in particular for any M ≥ N + 1 tested in all three cases of the functional Φπ,M. Thus, M = N + 1 seems to be the preferable choice.
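Part (2) of Theorem 1 admits any integration rule on [0,1] with positive weights that is exact for polynomials of degree up to 2M − 2. As an illustrative sketch (the concrete rule is our choice, not one prescribed by [8]), an M-point Gauss–Legendre rule mapped to [0,1] satisfies this requirement:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss

M = 4  # number of nodes; in the notation of the text, any M >= N + 1

# Gauss-Legendre nodes/weights on [-1, 1], mapped affinely to [0, 1].
# The resulting rule has positive weights and is exact for polynomials
# of degree <= 2M - 1, hence in particular <= 2M - 2 as Theorem 1 requires.
x, w = leggauss(M)
rho = 0.5 * (x + 1.0)      # nodes 0 < rho_1 < ... < rho_M < 1
gamma = 0.5 * w            # positive quadrature weights

# Exactness check on the monomial of degree 2M - 2: int_0^1 t^(2M-2) dt = 1/(2M-1).
approx = float(np.dot(gamma, rho ** (2 * M - 2)))
exact = 1.0 / (2 * M - 1)
```

Lobatto-type rules with M nodes are exact only up to degree 2M − 3 and would therefore not qualify; Gauss and Radau rules do.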
2.2 A basis representation of Φπ,M
By choosing an appropriate basis for Xπ, the minimization of the functional (8) will be reduced to a minimization problem for the coefficients of the elements x ∈ Xπ. For the subsequent considerations, it is appropriate to introduce the space
In particular, the elements x of \(\tilde {X}_{\pi }\) are no longer required to have continuous components Dx. Obviously, it holds \(X_{\pi }\subseteq \tilde {X}_{\pi }\). In general, \(\tilde {X}_{\pi }\) is not a subspace of \({H_{D}^{1}}(a,b)\). However, it holds
Based on the analysis in [8, Section 4], we provide a basis of the ansatz space \(\tilde {X}_{\pi }\) to begin with. Assume that {p0,…,pN− 1} is a basis of \(\mathfrak {P}_{N-1}\) defined on the reference interval [0,1]. Then, \(\{\bar {p}_{0},\ldots ,\bar {p}_{N}\}\) given by
form a basis of \(\mathfrak {P}_{N}\). The transformation to the interval (tj− 1,tj) of the partition π (3) yields
and in particular
Next, we form the matrix functions
such that
Following the discussions in [8], the following bases are suitable in applications:
- Legendre basis: Let Pi denote the Legendre polynomials. Then, pi is chosen to be the shifted Legendre polynomial, that is,
$$ p_{i}(\tau)=P_{i}(2\tau-1),\quad i=0,1,\ldots. $$
- Modified Legendre basis: In this case, we set
$$ \bar{p}_{0}(\tau)=1,\quad\bar{p}_{i}(\tau)=P_{i}(2\tau-1)-(-1)^{i},\quad i=1,2,\ldots, $$
such that \(p_{i}=\bar {p}_{i+1}^{\prime }\), i = 0,1,…. This basis has not been considered in [8], but later experiments indicated its usefulness. This is supported by the considerations below.
- Chebyshev basis: Let Ti denote the Chebyshev polynomials of the first kind. Then we define
$$ p_{i}(\tau)=T_{i}(2\tau-1),\quad i=0,1,\ldots. $$
- Runge-Kutta basis: Let 0 < τ1 < ⋯ < τN < 1 be interpolation nodes. Then we set
$$ p_{i}(\tau)=\frac{{\prod}_{\kappa\neq i+1}(\tau-\tau_{\kappa})}{{\prod}_{\kappa\neq i+1}(\tau_{i+1}-\tau_{\kappa})}. \qquad (18) $$
These are the usual Lagrange interpolation polynomials. In the implementation, it is advantageous to represent these polynomials in terms of Chebyshev polynomials [8]. The Runge-Kutta basis is of particular use if the shifted Chebyshev nodes \(\tau _{\kappa }=\frac {1}{2}\left (1+\cos \limits \left (\frac {2\kappa -1}{2N}\pi \right )\right )\) are chosen as interpolation nodes.
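For concreteness, all four bases can be evaluated with standard library routines. The following sketch (the function names are ours) evaluates them on the reference interval [0,1]; the Runge-Kutta basis uses the shifted Chebyshev nodes mentioned above:

```python
import numpy as np
from numpy.polynomial import legendre, chebyshev

def legendre_basis(i, tau):
    """Shifted Legendre polynomial p_i(tau) = P_i(2*tau - 1)."""
    c = np.zeros(i + 1); c[i] = 1.0
    return legendre.legval(2.0 * np.asarray(tau, dtype=float) - 1.0, c)

def modified_legendre_basis(i, tau):
    """pbar_0 = 1, pbar_i(tau) = P_i(2*tau - 1) - (-1)**i for i >= 1."""
    tau = np.asarray(tau, dtype=float)
    if i == 0:
        return np.ones_like(tau)
    return legendre_basis(i, tau) - (-1.0) ** i

def chebyshev_basis(i, tau):
    """Shifted Chebyshev polynomial p_i(tau) = T_i(2*tau - 1)."""
    c = np.zeros(i + 1); c[i] = 1.0
    return chebyshev.chebval(2.0 * np.asarray(tau, dtype=float) - 1.0, c)

def runge_kutta_basis(i, tau, nodes):
    """Lagrange polynomial (18) belonging to node tau_{i+1} (0-based index i)."""
    tau = np.asarray(tau, dtype=float)
    num = np.ones_like(tau)
    den = 1.0
    for kappa, t_k in enumerate(nodes):
        if kappa != i:
            num = num * (tau - t_k)
            den = den * (nodes[i] - t_k)
    return num / den

# Shifted Chebyshev nodes on (0, 1), sorted increasingly as in the text
N = 4
nodes = np.sort(0.5 * (1.0 + np.cos((2.0 * np.arange(1, N + 1) - 1.0)
                                    / (2.0 * N) * np.pi)))
```

Useful sanity checks are the endpoint values that matter for the continuity constraints: \(\bar{p}_i(0)=0\) for i ≥ 1 in the modified Legendre basis, and the cardinality property of the Lagrange polynomials.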
For \(x\in \tilde {X}_{\pi }\), we use the notation
Then, we develop each xj componentwise
with
Introducing still
with \(\mathcal {O}_{1}\in \mathbb {R}^{k\times kN}\) and \(\mathcal {O}_{2}\in \mathbb {R}^{(m-k)\times (m-k)(N+1)}\) being zero matrices, we represent, for t ∈ Ij, j = 1,…,n,
where \(\bar {\mathcal {P}}_{j}^{\prime }(t)=\begin {bmatrix}0 & p_{j0} & {\ldots } & p_{j,N-1}\end {bmatrix}\). Now we collect all coefficients cjκl in the vector c,
Definition 1
The mapping \(\mathcal {R}:\mathbb {R}^{n(mN+k)}\rightarrow \tilde {X}_{\pi }\) given by (20) is called the representation map of \(\tilde {X}_{\pi }\) with respect to the basis (15).
Fact 1
We observe that each \(x\in \tilde {X}_{\pi }\) has a representation of the kind (20) and each function of the form (20) is an element of \(\tilde {X}_{\pi }\). Since \(\dim \tilde {X}_{\pi }=n(mN+k)\), \(\mathcal {R}\) is a bijective mapping.
Consider an element \(x\in \tilde {X}_{\pi }\) with its representation (20). This element belongs to Xπ if and only if its first k components are continuous. Using the representation (19) we see that x ∈ Xπ if and only if
where \(\mathcal {C}\in \mathbb {R}^{k(n-1)\times n(mN+k)}\) and
Owing to the construction, \(\mathcal {C}\) has full row rank, cf. (16), (17).
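To illustrate how such a constraint matrix arises, the following toy sketch (our own construction for a single continuous scalar component in the modified Legendre basis; it is not the exact \(\mathcal {C}\) of (16), (17)) assembles the continuity conditions at the interior meshpoints from the endpoint values of the basis and confirms the full row rank:

```python
import numpy as np

def continuity_matrix(n, N, pbar_at_0, pbar_at_1):
    """Toy continuity constraints for one scalar component represented
    piecewise by the basis pbar_0, ..., pbar_N on each of n subintervals.

    Row j enforces  sum_l c_{j,l} pbar_l(1) - sum_l c_{j+1,l} pbar_l(0) = 0,
    i.e. continuity at the interior breakpoint t_j."""
    C = np.zeros((n - 1, n * (N + 1)))
    for j in range(n - 1):
        C[j, j * (N + 1):(j + 1) * (N + 1)] = pbar_at_1
        C[j, (j + 1) * (N + 1):(j + 2) * (N + 1)] = -pbar_at_0
    return C

# Endpoint values of the modified Legendre basis:
# pbar_0 = 1 everywhere; pbar_i(0) = 0 and pbar_i(1) = 1 - (-1)**i for i >= 1.
N = 3
pbar_at_0 = np.array([1.0] + [0.0] * N)
pbar_at_1 = np.array([1.0] + [1.0 - (-1.0) ** i for i in range(1, N + 1)])

C = continuity_matrix(n=4, N=N, pbar_at_0=pbar_at_0, pbar_at_1=pbar_at_1)
```

Each row couples only two neighboring coefficient blocks, which reflects the sparsity of \(\mathcal {C}\) noted later in the text.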
Fact 2
Let \(\tilde {\mathcal {R}}=\left .\mathcal {R}\right \rvert _{\ker \mathcal {C}}\) be the restriction of the representation map \(\mathcal {R}\) onto the kernel \(\ker \mathcal {C}\) of \(\mathcal {C}\). Since \(\mathcal {C}\) has full row rank, \(\dim \ker \mathcal {C}=n(mN+k)-k(n-1)=nmN+k=\dim X_{\pi }\), and \(\mathcal {R}\) is injective, \(\tilde {\mathcal {R}}\) is bijective. In particular, it holds also \(\tilde {\mathcal {R}}^{-1}=\left .\mathcal {R}^{-1}\right \rvert _{\text {im}\tilde {\mathcal {R}}}\).
The representations (20)–(21) can be inserted into the functional Φπ,M (8). The result becomes a least-squares functional of the form
where \(\mathcal {A}\) has the structure
where \(\mathcal {A}_{j}\in \mathbb {R}^{mM\times (mN+k)}\) and \(G_{a}{\Omega }_{1}(t_{0}),G_{b}{\Omega }_{n}(t_{n})\in \mathbb {R}^{l_{{dyn}}\times (mN+k)}\).
So the discrete version of the least-squares method (8) becomes the linear least-squares problem (23) under the linear equality constraint (22).
Note that it holds \(r\in \mathbb {R}^{nmM+l_{{dyn}}}\) and \(\mathcal {A}\in \mathbb {R}^{(nmM+l_{{dyn}})\times n(mN+k)}\). The matrices \(\mathcal {A}\) and \(\mathcal {C}\) are very sparse. More details of the construction of \(\mathcal {A}\) and \(\mathcal {C}\) can be found in [9].
2.3 Conditioning of the implementation
The implementation for solving the least-squares problem (8) consists of the following steps:
1. Form \(\mathcal {A}\), \(\mathcal {C}\), and r.
2. Solve the linear least-squares problem (23) under the linear equality constraint (22).
3. Form the approximation xπ.
What are the errors to be expected? Consider the individual steps:
1. The computation of \(\mathcal {C}\) is not critical. Depending on the chosen basis, the entries of \(\mathcal {C}\) may be available analytically, so we expect at most rounding errors for the representation of the analytical data. While the components of \(\mathcal {A}\) corresponding to the boundary conditions are only subject to truncation errors when representing real numbers in floating-point arithmetic, the DAE-related entries are subject to rounding errors as well as to certain amplification factors stemming from the multiplication by the square root of the matrix \({\mathscr{L}}\) (9). The conditioning of the versions (10) and (11) is easy to infer, while that of (12) has been discussed extensively in [8]. Under reasonable assumptions on the choice of collocation points, these amplification factors are rather small. Similar considerations apply to the computation of r.
2. This algorithmic step corresponds to the solution of a linearly constrained linear least-squares problem. A number of classical perturbation results are available, e.g., [11,12,13]. Further below, we present a modified version that takes into account the special role that the equality constraint \(\mathcal {C}c=0\) plays in our application.
3. This step is described by the representation map \(\mathcal {R}\), which assigns, to each solution c of the previous step, the corresponding approximation \(x_{\pi }=\mathcal {R}c\). If \(c\in \ker \mathcal {C}\), it holds that \(x_{\pi }\in X_{\pi }\subseteq {H_{D}^{1}}(a,b)\). However, due to the errors made in the previous step, the condition \(c\in \ker \mathcal {C}\) cannot be guaranteed, so that \(\mathcal {R}c\in \tilde {X}_{\pi }\) but not necessarily \(\mathcal {R}c\in {H_{D}^{1}}(a,b)\). In the next section, we will discuss the properties of \(\mathcal {R}\).
3 Properties of the representation map \(\mathcal {R}\)
In the present section, we will investigate the properties of the representation map \(\mathcal {R}:\mathbb {R}^{n(mN+k)}\rightarrow \tilde {X}_{\pi }\) in more detail. Previously, we have established a representation of \(\mathcal {R}\) on each subinterval; see (20). We intend to derive a representation of \(\mathcal {R}^{-1}\). The main tool will be interpolation.
Choose two sets of interpolation nodes
and shifted ones
such that the integration formulae
have positive weights and so that they are exact for polynomials up to degree 2N and 2N − 2, respectively. With matrices
and
we represent, for κ = 1,…,k,
and, for κ = k + 1,…,m,
The matrices \(\bar {V}\) and V are nonsingular. This amounts to the relation
Owing to the fact that polynomials of degree N and N − 1 are uniquely determined by their values at N + 1 and N distinct nodes, respectively, formula (28) provides \(c=\mathcal {R}^{-1}x\) for each arbitrarily given \(x\in \tilde {X}_{\pi }\).
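The interpolation argument behind \(c=\mathcal {R}^{-1}x\) can be mimicked numerically: the nodal values of a polynomial determine its coefficients via a nonsingular generalized Vandermonde matrix, of the same type as \(\bar {V}\) and V above. A sketch for a single degree-N piece in the shifted Legendre basis (the equidistant node choice is ours, for illustration only):

```python
import numpy as np
from numpy.polynomial import legendre

N = 4
# N + 1 distinct interpolation nodes on [0, 1] (equidistant, for illustration)
sigma = np.linspace(0.0, 1.0, N + 1)

def p(i, tau):
    """Shifted Legendre polynomial P_i(2*tau - 1)."""
    c = np.zeros(i + 1); c[i] = 1.0
    return legendre.legval(2.0 * np.asarray(tau, dtype=float) - 1.0, c)

# Generalized Vandermonde matrix with entries Vbar[a, i] = p_i(sigma_a);
# it is nonsingular for distinct nodes, so nodal values determine coefficients.
Vbar = np.array([[p(i, s) for i in range(N + 1)] for s in sigma])

# Round trip: pick coefficients, evaluate at the nodes, recover by solving.
c_true = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
values = Vbar @ c_true
c_rec = np.linalg.solve(Vbar, values)
```
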
Next, we equip \(\tilde {X}_{\pi }\) with the norms
The latter norm reduces, for x ∈ Xπ, to \(\lVert x\rVert _{H_{D,\pi }^{1}}=\|x\|_{{H_{D}^{1}}(a,b)}\). Moreover, \(\lVert \cdot \rVert _{L^{2}}=\lVert \cdot \rVert _{L^{2}((a,b),\mathbb {R}^{m})}\). On \(\mathbb {R}^{n(mN+k)}\), we use the Euclidean norm. Then \(\mathcal {R}\) becomes a homeomorphism in each case, and we are interested in the respective operator norms \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow L^{2}}\), \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow H_{D,\pi }^{1}}\), \(\|\mathcal {R}^{-1}\|_{L^{2}\rightarrow \mathbb {R}^{n(mN+k)}}\), and \(\|\mathcal {R}^{-1}\|_{H_{D,\pi }^{1}\rightarrow \mathbb {R}^{n(mN+k)}}\). Regarding the properties of the related integration formulae and introducing the diagonal matrices
we compute for any \(x=\mathcal {R}c\), and κ = 1,…,k,
and, in addition, for κ = k + 1,…,m,
Summarizing, the following representations result:
with matrices
and
with matrices
Proposition 1
The singular values of \(\mathcal {U}\) and \(\hat {\mathcal {U}}\) are independent of the choice of the nodes σi and \(\bar {\sigma }_{i}\). Moreover, all singular values are positive.
Proof
Uj and \(\hat {U}_{j}\) have full column rank. Consequently, \(\mathcal {U}^{T}\mathcal {U}\) and \(\hat {\mathcal {U}}^{T}\hat {\mathcal {U}}\) are symmetric and positive definite. Hence, their eigenvalues are all positive, and thus so are their singular values, which are the square roots of these eigenvalues. The eigenvalues are independent of the choice of the nodes σi and \(\bar {\sigma }_{i}\) since, owing to the properties of the involved integration formulae, it holds that
such that the entries of \(\mathcal {U}^{T}\mathcal {U}\) and \(\hat {\mathcal {U}}^{T}\hat {\mathcal {U}}\) are independent of the choice of the integration formulae. □
Theorem 2
Let \(\sigma _{{\min \limits }}(\mathcal {U})\) and \(\sigma _{{\max \limits }}(\mathcal {U})\) denote the minimal and maximal singular values of \(\mathcal {U}\). Similarly, let \(\sigma _{{\min \limits }}(\mathcal {\hat {U}})\) and \(\sigma _{{\max \limits }}(\mathcal {\hat {U}})\) denote the minimal and maximal singular values of \(\hat {\mathcal {U}}\). Then it holds
Proof
It holds \(\mathcal {\hat {U}}\in \mathbb {R}^{\nu \times \lambda }\) with ν = n(mN + k + k(N + 1)) and λ = n(mN + k). Let \(\hat {\mathcal {U}}=U{\Sigma } V^{T}\) be the singular value decomposition of \(\hat {\mathcal {U}}\). Here,
with \(s_{1}=\sigma _{{\max \limits }}(\hat {\mathcal {U}})\) and \(s_{\nu }=\sigma _{{\min \limits }}(\hat {\mathcal {U}})\). According to Proposition 1, \(\sigma _{{\min \limits }}(\hat {\mathcal {U}})>0\). By (34), this leads to
and
The statements concerning \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow L^{2}}\) and \(\|\mathcal {R}^{-1}\|_{H_{D,\pi }^{1}\rightarrow \mathbb {R}^{n(mN+k)}}\) follow similarly. □
Using the structure (33) of \(\mathcal {U}\), we obtain
The estimation of the singular values of \(\hat {\mathcal {U}}\) leads to slightly more involved expressions. Let \(U_{j,{red}}=\left [\begin {array}{c} h_{j}\bar {\Gamma }\bar {V}\\ \bar {\Gamma }\mathring {V} \end {array}\right ]\). Then, it holds
We note that \(\sigma _{\min \limits }(\bar {\Gamma }\mathring {V})=0\) and \(\sigma _{\max \limits }({\Gamma } V)=\sigma _{\max \limits }(\bar {\Gamma }\mathring {V})\). This follows immediately from the construction of the basis for the differential components (14). The definition of singular values and Weyl’s Theorem [14, Theorem III.2.1] provides us with
since \(\lambda _{\min \limits }(\mathring {V}^{T}\bar {\Gamma }^{2}\mathring {V})=0\). Then,
Moreover,
Collecting all estimates, Theorem 2 provides
Theorem 3
Let the grid (3) have the maximal stepsize h and the minimal stepsize \(h_{\min \limits }\). Furthermore, let Γ and \(\bar {\Gamma }\) be given by (31) and let V, \(\bar {V}\), and \(\mathring {V}\) be given by (26), (25), (27). Then it holds, for sufficiently small h,
and
In particular, \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow H_{D,\pi }^{1}}=\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow L^{2}}+O(h^{3/2})\).
In these estimates, we used the fact \(\sigma _{\min \limits }({\Gamma } V)>0\). Note that the constants hidden in the big-O notation in this theorem depend on both N and the chosen basis. For the restriction \(\tilde {\mathcal {R}}\) of \(\mathcal {R}\) onto \(\ker \mathcal {C}\) we obtain, obviously,
For some special cases, the singular values can be easily derived.
Proposition 2
Let V, \(\bar {V}\), and \(\mathring {V}\) be given by (25)–(27) and Γ, \(\bar {\Gamma }\) by (31). Then it holds:
-
(1)
Let p0,…,pN− 1 be an orthogonal basis in L2(0,1). Then
$$ \begin{array}{@{}rcl@{}} \sigma_{\min}({\Gamma} V) & =\min\left\{ \lVert p_{\alpha}\|_{L^{2}(0,1)}:\alpha=0,\ldots,N-1\right\} ,\\ \sigma_{\max}({\Gamma} V) & =\max\left\{ \lVert p_{\alpha}\|_{L^{2}(0,1)}:\alpha=0,\ldots,N-1\right\} . \end{array} $$
In particular, if p0,…,pN− 1 is the Legendre basis, \(\sigma _{\min \limits }({\Gamma } V)=(2N-1)^{-1/2}\) and \(\sigma _{\max \limits }({\Gamma } V)=1\).
-
(2)
For an orthonormal basis p0,…,pN− 1 in L2(0,1), \(\sigma _{\min \limits }({\Gamma } V)=\sigma _{\max \limits }({\Gamma } V)=1\).
-
(3)
If p0,…,pN− 1 is the modified Legendre basis, it holds \(\sigma _{\min \limits }(\bar {\Gamma }\bar {V})\geq (2N+1)^{-1/2}\) and \(\sigma _{\max \limits }(\bar {\Gamma }\bar {V})\leq (N+2)^{1/2}\). Furthermore, the estimates
$$ \sigma_{\min}({\Gamma} V)\geq\left( \frac{1}{2-2\cos\frac{N}{N+2}\pi}\right)^{1/2}\geq\frac{1}{2},\quad\sigma_{\max}({\Gamma} V)\leq\left( \frac{2N-1}{2-2\cos\frac{1}{N+2}\pi}\right)^{1/2} $$
hold true.
Proof
First, we observe that \((V^{T}{\Gamma }^{2}V)_{\alpha \upbeta }={{\int \limits }_{0}^{1}}p_{\alpha -1}(\rho )p_{\upbeta -1}(\rho ){d}\rho =\delta _{\alpha \upbeta }\lVert p_{\alpha -1}\rVert _{L^{2}(0,1)}^{2}\). This provides (1) and (2) as special cases.
Consider the modified Legendre basis now. It holds \({{\int \limits }_{0}^{1}}\bar {p}_{0}^{2}(\rho ){d}\rho =1\) and \({{\int \limits }_{0}^{1}}\bar {p}_{0}(\rho )\bar {p}_{\alpha }(\rho ){d}\rho ={{\int \limits }_{0}^{1}}(P_{\alpha }(2\rho -1)-(-1)^{\alpha }){d}\rho =(-1)^{\alpha +1}\) for α = 1,2,…. Moreover, for α,β= 1,2,…, we have
Collecting these expressions, we obtain the compact representation
with \(f^{T}=[1,1,-1,+1,-1,\ldots ,\pm 1]\in \mathbb {R}^{N+1}\). ffT is a rank-1 matrix having, therefore, the N-fold eigenvalue 0. Moreover, f is an eigenvector to the eigenvalue fTf = N + 1. In particular, ffT is positive semidefinite. Invoking Weyl’s theorem again, we obtain
This proves the first assertion of (3).
The relation \((\mathring {V}^{T}\bar {\Gamma }^{2}\mathring {V})_{\alpha \upbeta }={{\int \limits }_{0}^{1}}\bar {p}^{\prime }_{\alpha -1}(\sigma )\bar {p}^{\prime }_{\upbeta -1}(\sigma ){d}\sigma \) shows that \(K=\mathring {V}^{T}\bar {\Gamma }^{2}\mathring {V}\) is the stiffness matrix of the basis functions. For the modified Legendre basis, it has been investigated in [1, cf. Eq. (31)]. According to the proof of Proposition A.2 of [1], the nonvanishing eigenvalues can be estimated by
\(K^{\prime }=V^{T}{\Gamma }^{2}V\) is the submatrix of K obtained by omitting the first row and column of K, which consist entirely of zeros. This provides the final relations of assertion (3). □
An asymptotic analysis shows that \(\sigma _{\max \limits }({\Gamma } V)\leq \frac {2}{\sqrt {\pi }}N^{3/2}+O(N^{1/2})\) in the case of the modified Legendre basis.
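Assertion (1) of Proposition 2 is easy to verify numerically: for the shifted Legendre basis, the Gram matrix \(V^{T}{\Gamma }^{2}V\) equals \(\mathrm {diag}(1,1/3,\ldots ,1/(2N-1))\), so the singular values of ΓV range from \((2N-1)^{-1/2}\) to 1. A sketch using Gauss quadrature of sufficient exactness to evaluate the Gram matrix:

```python
import numpy as np
from numpy.polynomial import legendre

N = 5
# Gauss-Legendre rule on [0, 1]; with N + 2 nodes it integrates the products
# p_a * p_b (degree <= 2N - 2) exactly.
x, w = legendre.leggauss(N + 2)
tau = 0.5 * (x + 1.0)
gamma = 0.5 * w

def p(i, t):
    """Shifted Legendre polynomial P_i(2*t - 1)."""
    c = np.zeros(i + 1); c[i] = 1.0
    return legendre.legval(2.0 * t - 1.0, c)

# Gram matrix (V^T Gamma^2 V)_{ab} = int_0^1 p_{a-1} p_{b-1} d(tau);
# for the shifted Legendre basis it is diag(1, 1/3, ..., 1/(2N-1)).
P = np.array([p(i, tau) for i in range(N)])      # N x (number of nodes)
G = (P * gamma) @ P.T

s = np.sqrt(np.linalg.eigvalsh(G))               # singular values of Gamma V
```
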
Remark 1
We are able to estimate the size of the jump of elements of \(\tilde {X}_{\pi }\) at the grid points. For any \(\tilde {x}\in \tilde {X}_{\pi }\) and \(\tilde {c}=\mathcal {R}^{-1}\tilde {x}\), it holds
with \(C_{h_{j}}=\left (\max \limits \{2/h_{j},h_{j}\}\right )^{1/2}\). Here, we used [15, Lemma 3.2]. For sufficiently small hj, this estimate reduces to
Let x be any element of Xπ and \(c=\mathcal {R}^{-1}x\). Replacing \(\tilde {c}\) by \({\Delta } c=\tilde {c}-c\) in the last estimate, we obtain
Proposition 2 provides estimates for the factor \(\sigma _{\max \limits }({\Gamma } V)\). In particular, for some bases, it does not depend on the polynomial degree N.
4 Error estimation for the constrained minimization problem
The aim of this section is the derivation of bounds for perturbations of the solution c for the problem (23)–(22), that is,
under perturbations of the data \(\mathcal {A}\), \(\mathcal {C}\), r. Such bounds have been known for a long time, e.g., [11, 12]. However, we will provide different bounds in this section. The reason for this is that the constraint \(\mathcal {C}c=0\) has an exceptional meaning in the present context: It holds \(\mathcal {C}c=0\) if and only if \(\mathcal {R}c\in {H_{D}^{1}}(a,b)\). If a perturbation \({\Delta }\mathcal {C}\) of \(\mathcal {C}\) changes the kernel of \(\mathcal {C}\), then \(\mathcal {R}c\in {H_{D}^{1}}(a,b)\) no longer holds in general. Therefore, we will consider the two cases \(\ker (\mathcal {C}+{\Delta }\mathcal {C})=\ker \mathcal {C}\) and \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\neq \ker \mathcal {C}\) separately.
Let \(\tilde {c}\) be the solution of the perturbed problem
Then, let \({\Delta } c=c-\tilde {c}\) denote the error. We are interested in deriving an error bound on Δc in terms of the perturbations of the data.
For a matrix \({\mathscr{M}}\), let \({\mathscr{M}}^{+}\) denote its Moore-Penrose inverse and \(\lVert {\mathscr{M}}\rVert \) its spectral norm.
Let \(\mathcal {D}\) be an orthonormal basis of \(\ker \mathcal {C}\). Then, \(P=I_{n(mN+k)}-\mathcal {C}^{+}\mathcal {C}\) is the orthogonal projector onto \(\ker \mathcal {C}\) and \(P\mathcal {D}=\mathcal {D}\). Some more properties are collected in the following proposition.
Proposition 3
It holds, for any matrix \({\mathscr{M}}\in \mathbb {R}^{\nu \times n(mN+k)},\)\(\nu \in \mathbb {N}\),
1. \(\mathcal {D}^{T}\mathcal {D}=I_{nmN+k}\) and \(\mathcal {D}\mathcal {D}^{T}=P\).
2. If \(c=\mathcal {D}d\), then \(\lvert c\rvert =\lvert d\rvert \).
3. \(\lVert {\mathscr{M}}\mathcal {D}\rVert =\lVert {\mathscr{M}}P\rVert \).
4. \(({\mathscr{M}}P)^{+}=\mathcal {D}({\mathscr{M}}\mathcal {D})^{+}\).
5. \(\lVert ({\mathscr{M}}P)^{+}\rVert =\lVert ({\mathscr{M}}\mathcal {D})^{+}\rVert \).
The proofs are obvious. For the following, we note that the matrix \(\mathcal {A}\mathcal {D}\) has full column rank [9, Proposition 1].
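The statements of Proposition 3 are straightforward to confirm numerically for random data (the dimensions below are arbitrary illustration values):

```python
import numpy as np

rng = np.random.default_rng(1)
C = rng.standard_normal((4, 12))          # full row rank with probability 1
M = rng.standard_normal((7, 12))          # an arbitrary matrix

# Orthogonal projector onto ker C and an orthonormal kernel basis D.
P = np.eye(12) - np.linalg.pinv(C) @ C
_, _, Vt = np.linalg.svd(C)
D = Vt[4:].T                               # 12 x 8, orthonormal columns
```
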
4.1 \(\ker (\mathcal {C}+{\Delta }\mathcal {C})=\ker \mathcal {C}\)
Each element c of \(\ker \mathcal {C}\) has a unique representation \(c=\mathcal {D}d\) with \(d\in \mathbb {R}^{nmN+k}\). Therefore, (23)–(22) is equivalent to the unconstrained minimization problem
while (36) becomes the unconstrained minimization problem
Since \(\mathcal {A}\mathcal {D}\) has full column rank, standard perturbation results for unconstrained least squares problems apply. As a consequence of [16, Satz 8.2.7] and Proposition 3, we obtain
Theorem 4
Let \(\omega =\lVert (\mathcal {A}P)^{+}\rVert \lVert {\Delta }\mathcal {A}P\rVert <1\). Then it holds
and
Here, \(\mathfrak {r}=r-\mathcal {A}c\) and
Theorem 4 corresponds to classical results for unconstrained minimization problems (e.g., [10], [13, Theorem 9.12]) and is a small generalization of them. Let us emphasize that the estimate is independent of the perturbations of \(\mathcal {C}\) as long as the null space of \(\mathcal {C}\) is not changed by the perturbation.
Remark 2
In the case of the Legendre basis, the rows of \(\mathcal {C}\) contain only three nonzero entries each, equal to 1 and − 1, respectively, possibly scaled by the stepsizes; cf. (16), (17). So we expect \({\Delta }\mathcal {C}=0\), such that the estimates of this section apply.
4.2 \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\protect \neq \ker \mathcal {C}\)
The estimation of the error becomes much more involved than in the previous case. In a first step, we will construct a basis for the kernel of the perturbed constraint \((\mathcal {C}+{\Delta }\mathcal {C})z=0\).
Lemma 1
Let \({\varkappa }=\lVert \mathcal {C}^{+}\rVert \lVert {\Delta }\mathcal {C}\rVert <1/2.\) Then \(\mathcal {C}+{\Delta }\mathcal {C}\) has full rank and \(P_{\Delta }=I_{n(mN+k)}-(\mathcal {C}+{\Delta }\mathcal {C})^{+}(\mathcal {C}+{\Delta }\mathcal {C})\) is a projector onto \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\). Furthermore, \(\mathcal {D}_{\Delta }=P_{\Delta }\mathcal {D}\) is a basis of \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\). Moreover, the estimates
and
hold true.
Proof
The assertion that \(\mathcal {C}+{\Delta }\mathcal {C}\) has full rank, as well as the error estimates, follows from [16, Satz 8.2.5].
For showing that \(\mathcal {D}_{\Delta }\) is a basis of \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\) consider
It holds
Therefore, the assumptions of [17, Theorem I-6.34] are fulfilled. Since \(\dim \ker (\mathcal {C}+{\Delta }\mathcal {C})=\dim \ker \mathcal {C}\), the first alternative of that theorem applies and PΔ is a one-to-one mapping of \(\ker \mathcal {C}\) onto \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\). Hence, \(\mathcal {D}_{\Delta }\) is a basis of the latter space. □
By using the bases \(\mathcal {D}\) and \(\mathcal {D}_{\Delta }\), the unperturbed and the perturbed least squares problems become (37) and
In a first step, the deviations of the bases shall be estimated. It holds
Invoking Lemma 1, we obtain
with \(\kappa (\mathcal {C})=\lVert \mathcal {C}^{+}\rVert \lVert \mathcal {C}\rVert \). Consequently,
Let us transform (39) now. It holds
where \(\mathfrak {R}={\Delta }\mathcal {A}\mathcal {D}+(\mathcal {A}+{\Delta }\mathcal {A})(\mathcal {D}_{\Delta }-\mathcal {D})\). The representation of \(\mathfrak {R}\) provides the estimate
Denote \(\omega _{\Delta }=\lVert (\mathcal {A}P)^{+}\rVert \lVert \mathfrak {R}\rVert \). The condition ωΔ < 1 is obviously fulfilled if
Let d + Δd be the solution of (39). Using the fact that \(\mathcal {A}\mathcal {D}\) has full rank, Theorem 8.2.7 of [16] provides the estimates
and
with \(\mathfrak {r}=r-\mathcal {A}c\).
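The flavor of such bounds can be illustrated with a generic full-rank least-squares problem (a sketch of ours, not tied to the specific matrices \(\mathcal {A}\), \(\mathcal {D}\) of this paper): for a consistent system, first-order perturbation theory predicts a relative error in the solution of roughly the condition number times the relative perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 8, 3
A = rng.standard_normal((m, n))      # full column rank (almost surely)
d = rng.standard_normal(n)
r = A @ d                            # consistent system: r - A d = 0

dA = 1e-8 * rng.standard_normal((m, n))          # small perturbation of A
d_pert, *_ = np.linalg.lstsq(A + dA, r, rcond=None)

kappa = np.linalg.cond(A)
rel_err = np.linalg.norm(d_pert - d) / np.linalg.norm(d)
rel_pert = np.linalg.norm(dA, 2) / np.linalg.norm(A, 2)
# first-order theory for zero-residual problems: rel_err <~ 2*kappa*rel_pert;
# a modest safety factor absorbs higher-order terms and rounding
assert rel_err <= 10 * kappa * rel_pert
```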
Theorem 5
Let \(\lVert {\Delta }\mathcal {A}\rVert \) and \(\lVert {\Delta }\mathcal {C}\rVert \) be sufficiently small such that (42) and \({\varkappa }=\lVert \mathcal {C}^{+}\rVert \lVert {\Delta }\mathcal {C}\rVert <1/2\) hold true. Then it holds
and
Proof
It holds \(c=\mathcal {D}d\) and \({\Delta } c=\mathcal {D}_{\Delta }{\Delta } d+(\mathcal {D}_{\Delta }-\mathcal {D})d\) such that \(\lvert {\Delta } c\rvert \leq \lvert {\Delta } d\rvert +\lVert P_{\Delta }-P\rVert \lvert d\rvert \). Inserting this estimate in (43) and (44) and using \(\lvert c\rvert =\lvert \mathcal {D}d\rvert =\lvert d\rvert \) provides the claim. □
Remark 3
\(\lvert \mathfrak {r}\rvert \) is a measure for the accuracy of the discrete solution. Let \(x_{\pi }\in X_{\pi }\) denote the discrete solution obtained by minimizing \({\Phi }_{\pi ,M}\) (8). Its representation becomes \(c=\mathcal {R}^{-1}x_{\pi }\). Then it holds \(\lvert \mathfrak {r}\rvert ^{2}=\lvert \mathcal {A}c-r\rvert ^{2}={\Phi }_{\pi ,M}(x_{\pi })\). Hence, \({\Phi }_{\pi ,M}(x_{\pi })\leq 2({\Phi }_{\pi ,M}(x_{\ast })+{\Phi }_{\pi ,M}(x_{\pi }-x_{\ast }))\). Under the conditions of Theorem 1, it holds, therefore, \(\lvert \mathfrak {r}\rvert \leq ch^{N-\mu +1}\). □
The critical quantities to estimate the influence of perturbations are \(\kappa _{\mathcal {C}}(\mathcal {A})\) and \(\lVert \mathcal {C}^{+}\rVert \), \(\kappa (\mathcal {C})\) as well as \(\lVert (\mathcal {A}P)^{+}\rVert \). The norms of \(\mathcal {C}\) and its pseudoinverse depend only on the choice of Xπ and the basis chosen for it, but not on the DAE. It holds \(\lVert \mathcal {C}\rVert =\sigma _{\max \limits }(\mathcal {C})\) and \(\lVert \mathcal {C}^{+}\rVert =\sigma _{\min \limits }(\mathcal {C})^{-1}\) with \(\sigma _{\min \limits }(\mathcal {C})\) being the smallest nonvanishing singular value of \(\mathcal {C}\). Since \(\mathcal {C}\) has full row rank, \(\sigma _{\min \limits }(\mathcal {C})=\left (\lambda _{\min \limits }(\mathcal {C}\mathcal {C}^{T})\right )^{1/2}\) and \(\sigma _{\max \limits }(\mathcal {C})=\left (\lambda _{\max \limits }(\mathcal {C}\mathcal {C}^{T})\right )^{1/2}\).
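These identities translate directly into a small computation; the following sketch (the function name is ours) reads off \(\lVert \mathcal {C}\rVert \) and \(\lVert \mathcal {C}^{+}\rVert \) from the singular values of a full-row-rank matrix:

```python
import numpy as np

def norms_via_svd(C):
    """Return (||C||, ||C^+||) = (sigma_max, 1/sigma_min) for a matrix C
    of full row rank, so that all singular values are nonzero."""
    s = np.linalg.svd(C, compute_uv=False)   # sorted in descending order
    return s[0], 1.0 / s[-1]

# toy example with singular values 3 and 2
C = np.array([[3.0, 0.0, 0.0],
              [0.0, 2.0, 0.0]])
norm_C, norm_Cplus = norms_via_svd(C)
```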
With \(\mathcal {C}\) from (22) we observe that
with
and \(\mathcal {O}_{\mathrm {s}}\in \mathbb {R}^{k(n-1)\times nN(m-k)}\) consists entirely of zero elements. The permutation matrices \(\pi _{1}\) and \(\pi _{2}\) are constructed as follows: Let \(x=[x_{1},x_{2},\ldots ,x_{m}]^{T}\in \tilde {X}_{\pi }\). First, the equations in \(\mathcal {C}c=0\) are reordered such that all equations related to the first component \(x_{1}\) come first, then those of \(x_{2}\), and so on until \(x_{k}\). This reordering is expressed by \(\pi _{1}\). The column permutation \(\pi _{2}\) reorders the coefficients such that those describing the differential components are taken first, followed by those belonging to the algebraic components. In particular, the coefficients \(c_{\kappa }\) describing \(x_{\kappa }\) are given by \(c_{\kappa }=[c_{1\kappa 0},c_{1\kappa 1},\ldots ,c_{1\kappa N},c_{2\kappa 0},\ldots ,c_{n\kappa N}]^{T}\). Then we have
with
where e1 is the first unit vector and \(f=[1,{{\int \limits }_{0}^{1}}p_{0}(\sigma ){d}\sigma ,\ldots ,{{\int \limits }_{0}^{1}}p_{N-1}(\sigma ){d}\sigma ]\). This leads to
The eigenvalues of \(\mathcal {C}\mathcal {C}^{T}\) are those of (47).
Proposition 4
Let the grid (3) have the maximal stepsize h and the minimal stepsize \(h_{\min \limits }\). Then it holds
- (1) \(\lvert f\rvert >1\);
- (2) \(0<h_{\min \limits }^{2}(\lvert f\rvert ^{2}-1)\leq \lambda _{\min \limits }(\mathcal {C}_{s}\mathcal {C}_{s}^{T})\) and \(\lambda _{\max \limits }(\mathcal {C}_{s}\mathcal {C}_{s}^{T})\leq h^{2}(\lvert f\rvert ^{2}+3)\).
Proof
Since the first component of f is equal to 1, we have \(\lvert f\rvert \geq 1\), and \(\lvert f\rvert =1\) if and only if \({{\int \limits }_{0}^{1}}p_{0}(\sigma ){d}\sigma =\cdots ={{\int \limits }_{0}^{1}}p_{N-1}(\sigma ){d}\sigma =0\). Assume that the latter condition holds true. This means in particular that \(p_{0},\ldots ,p_{N-1}\) are orthogonal to the polynomial \(p(\tau )\equiv 1\in \mathfrak {P}_{N-1}\). The latter space has dimension N. Since \(p_{0},\ldots ,p_{N-1}\in \mathfrak {P}_{N-1}\) are N polynomials orthogonal to p, they must be linearly dependent, in contradiction to the assumption that they form a basis. This proves (1).
In order to prove (2), we observe that \(\mathcal {C}_{s}\mathcal {C}_{s}^{T}\) is symmetric such that all eigenvalues are real. Invoking Gershgorin’s circle theorem [16, Theorem 1.2.10], the eigenvalues λ of \(\mathcal {C}_{s}\mathcal {C}_{s}^{T}\) fulfill
This proves (2). □
We obtain immediately the following corollary. Note that f depends only on N and the chosen basis, but not on the grid.
Corollary 1
Let the grids (3) be quasiuniform, that is, \(h/h_{\min \limits }\leq \rho <\infty \) with ρ independent of π. Then it holds \(\kappa (\mathcal {C})\leq \rho \left (\frac {\lvert f\rvert ^{2}+3}{\lvert f\rvert ^{2}-1}\right )^{1/2}\) and \(\lVert \mathcal {C}^{+}\rVert \leq h_{\min \limits }^{-1}(\lvert f\rvert ^{2}-1)^{-1/2}\).
For constant stepsize h, we have \(\mathcal {C}_{\mathrm {s}}\mathcal {C}_{\mathrm {s}}^{T}=h^{2}C_{s}{C_{s}^{T}}\), which is a Toeplitz tridiagonal matrix. In this case, the eigenvalues of \(C_{s}{C_{s}^{T}}\) are given by [18, Theorem 2.2]
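For the Legendre basis treated in Proposition 5 below, the stated bounds are consistent with \(C_{s}{C_{s}^{T}}\) being the tridiagonal Toeplitz matrix tridiag(−1, 3, −1); this concrete form is our assumption for the illustration only. Its eigenvalues are \(3-2\cos (k\pi /(n+1))\) and lie strictly between 1 and 5:

```python
import numpy as np

n = 12
# assumed form of C_s C_s^T for the Legendre basis: tridiag(-1, 3, -1)
T = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam = np.linalg.eigvalsh(T)          # eigenvalues in ascending order

# closed-form eigenvalues of a tridiagonal Toeplitz matrix
k = np.arange(1, n + 1)
lam_exact = 3.0 - 2.0 * np.cos(k * np.pi / (n + 1))
```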
Proposition 5
Let the grid (3) be equidistant with stepsize h, and Cs be given by (46). Then it holds
-
For the Legendre basis \(1\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 5\);
-
For the modified Legendre basis \(2N\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 2N+6\);
-
For the Chebyshev basis \(1\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 4+2\ln 2\);
-
For the Runge-Kutta basis assume additionally that \({{\int \limits }_{0}^{1}}p_{i}(\sigma ){d}\sigma \geq 0\), i = 0,1,…,N − 1. Then \(N^{-1}\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 5\).
Proof
In the case of the Legendre basis, it holds \(f=[1,1,0,\ldots ,0]\). Hence, \(\lvert f\rvert ^{2}=2\) such that the statement follows.
For the modified Legendre basis, we have \(f=[1,2,0,2,0,\ldots ]\) such that
For the Chebyshev basis, we observe
This leads to \(f=[1,1,0,-\frac {1}{3},0,-\frac {1}{8},0,\ldots ]\). Hence,
For the sum of the series, cf. [19, p. 269, series 110.d]. This provides the estimate for the Chebyshev basis.
In case of the Runge-Kutta basis, it holds \({\sum }_{i=0}^{N-1}p_{i}(\sigma )\equiv 1\). With \(f=[1,f_{2},\ldots ,f_{N+1}]\) it holds then \(f_{i}\geq 0\) and \({\sum }_{i=2}^{N+1}f_{i}=1\). Hence,
This yields \(1+N^{-1}\leq \lvert f\rvert ^{2}\leq 2\) and the claim follows. □
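The vector f for the Legendre basis can be checked numerically. The sketch below integrates the shifted Legendre polynomials \(P_{i}(2\sigma -1)\) over [0, 1] (our reading of the basis, for illustration only) and recovers \(f=[1,1,0,\ldots ,0]\), hence \(\lvert f\rvert ^{2}=2\):

```python
import numpy as np
from numpy.polynomial import legendre

N = 6
f = [1.0]
for i in range(N):
    c = np.zeros(i + 1)
    c[i] = 1.0                       # coefficient vector of P_i
    # int_0^1 P_i(2s - 1) ds = (1/2) int_{-1}^{1} P_i(t) dt via t = 2s - 1
    antider = legendre.legint(c, lbnd=-1.0)
    f.append(0.5 * legendre.legval(1.0, antider))
f = np.array(f)                      # approximately [1, 1, 0, ..., 0]
```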
Remark 4
For the Runge-Kutta basis, the values \(f_{i}={{\int \limits }_{0}^{1}}p_{i-1}(\sigma ){d}\sigma \) are just the weights of the interpolatory quadrature rule corresponding to the nodes τ1,…,τN of (18). For a number of common choices of nodes, these weights are known to be positive. Examples are the Gauss-Legendre nodes, Radau nodes, and Lobatto nodes [20, Section 2.7]. The same holds true for Chebyshev nodes and many others; see, e.g., [20, pp. 85f]. □
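For the Gauss-Legendre nodes, this positivity is easy to confirm: numpy returns the weights for the interval [−1, 1], which we rescale to [0, 1], where the weights of an interpolatory rule sum to 1.

```python
import numpy as np

for N in (2, 3, 5, 8):
    nodes, w = np.polynomial.legendre.leggauss(N)   # rule on [-1, 1]
    w01 = w / 2.0                                   # rescaled weights on [0, 1]
    assert np.all(w01 > 0)                          # positivity of the weights
    assert abs(w01.sum() - 1.0) < 1e-12             # the rule integrates 1 exactly
```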
Note that the claims of Proposition 5 could also be shown using Gershgorin’s theorem. This indicates that the estimates of Proposition 4 are rather tight.
Corollary 2
For equidistant grids (3), it holds
-
For the Legendre basis \(\kappa (\mathcal {C})\leq \sqrt {5}\) and \(\lVert \mathcal {C}^{+}\rVert \leq h^{-1}\);
-
For the modified Legendre basis \(\kappa (\mathcal {C})\leq \left (\frac {2N+6}{2N}\right )^{1/2}\) and \(\lVert \mathcal {C}^{+}\rVert \leq (2N)^{-1/2}h^{-1}\);
-
For the Chebyshev basis \(\kappa (\mathcal {C})\leq (4+2\ln 2)^{1/2}\approx 2.32\) and \(\lVert \mathcal {C}^{+}\rVert \leq h^{-1}\).
-
For the Runge-Kutta basis \(\kappa (\mathcal {C})\leq (5N)^{1/2}\) and \(\lVert \mathcal {C}^{+}\rVert \leq N^{1/2}h^{-1}\) provided that \({{\int \limits }_{0}^{1}}p_{i}(\sigma ){d}\sigma \geq 0\), i = 0,1,…,N − 1.
It should be emphasized again that, if \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\neq \ker \mathcal {C}\), it cannot be guaranteed that the solution of the perturbed problem \(\mathcal {R}(c+{\Delta } c)\) belongs to Xπ. Instead, it belongs only to \(\tilde {X}_{\pi }\). Simple projection algorithms mapping elements of \(\tilde {X}_{\pi }\) onto Xπ can be derived. In our experiments so far, these projections did not lead to a better accuracy than the unprojected numerical solutions.
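One simple choice is the orthogonal projection onto \(\ker \mathcal {C}\), that is, \(c\mapsto c-\mathcal {C}^{+}\mathcal {C}c\). A minimal sketch (the helper name is ours, not the authors' implementation):

```python
import numpy as np

def project_onto_kernel(C, c):
    """Orthogonal projection of c onto ker(C): c - C^+ (C c).

    For C of full row rank the result satisfies C x = 0 up to rounding,
    i.e., the linear constraints characterizing X_pi hold again."""
    # lstsq returns the minimum-norm solution, which equals C^+ (C c)
    correction, *_ = np.linalg.lstsq(C, C @ c, rcond=None)
    return c - correction

rng = np.random.default_rng(1)
C = rng.standard_normal((3, 7))      # full row rank (almost surely)
c = rng.standard_normal(7)
x = project_onto_kernel(C, c)        # now C @ x vanishes up to rounding
```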
5 Some examples
5.1 Conditioning of the representation map \(\mathcal {R}\)
For each selection \(\{p_{0},\ldots ,p_{N-1}\}\) of basis polynomials, the conditioning of the representation map depends both on the grid and on N. For simplicity, we assume here that an equidistant grid with stepsize h is used for defining Xπ. Besides the bases introduced before, we will additionally consider the Runge-Kutta basis with uniform interpolation points as used in our very first paper on the subject [1].
The norms of the representation map and its inverse have been computed for both settings (mapping into \(L^{2}((a,b),\mathbb {R}^{m})\) and \({H_{D}^{1}}(a,b)\)), for polynomial degrees N = 3, 5, 10, 20, and for stepsizes \(h=n^{-1}\) with n = 10, 20, 40, 80, 160, 320. These are the first observations:
-
\(\sigma _{\min \limits }(\hat {\mathcal {U}})\) is independent of the chosen basis and independent of N for h ≤ 0.1. However, this is not true for larger stepsizes, cf. Table 2.
-
For every basis, \(\sigma _{\max \limits }(\mathcal {U})\approx \sigma _{\max \limits }(\hat {\mathcal {U}})\) up to a relative error below \(10^{-3}\). This coincides with the findings of Theorem 3.
In Tables 1, 2, 3, 4, 5, and 6, we present more detailed results. From these tables, we can draw the following conclusions:
-
The asymptotic behavior with respect to the stepsize h as indicated in Theorem 3 is clearly visible.
-
For both the Legendre and the Chebyshev bases, \(\sigma _{\max \limits }(\mathcal {U})\) and \(\sigma _{\max \limits }(\hat {\mathcal {U}})\) do not depend on N. This is reasonable for the Legendre basis if Proposition 2 is taken into account.
-
The asymptotics of \(\sigma _{\min \limits }(\mathcal {U})\) coincides with the results of Theorem 3 and Proposition 2 for the modified Legendre basis.
-
The norm of the representation map behaves similarly for all considered bases. Not unexpectedly, an exception is the Runge-Kutta basis for uniform nodes, which has a much larger norm than the other bases. When comparing \(\sigma _{\min \limits }(\mathcal {U})\) and \(\sigma _{\max \limits }(\mathcal {U})\) for different bases, we observe that the Legendre and Chebyshev bases on the one hand and the modified Legendre basis on the other hand seem to differ only in their scaling, while their conditioning (the product of the norms of the representation map and its inverse) is similar. A similar property holds for \(\hat {\mathcal {U}}\).
-
The Runge-Kutta basis has surprisingly good properties. However, this depends on its representation with respect to an orthogonal polynomial basis (in the present example, Chebyshev polynomials). Thus, it is much more expensive to work with than the Legendre or Chebyshev bases used directly. □
5.2 Conditioning of the constrained minimization problems
In order to provide a first insight into the conditioning of the constrained minimization problem (23)–(22), we computed the condition numbers \(\kappa _{\mathcal {C}}(\mathcal {A})\), which are of crucial importance for the behavior of the computational error. Discussions of \(\kappa (\mathcal {C})\) and \(\lVert \mathcal {C}^{+}\rVert \) have been provided earlier (Proposition 5 and Corollary 2). The examples below are chosen from our earlier investigations that led to surprisingly accurate results.
As done before, we use the bases introduced in Section 5.1. We abandon the use of the Runge-Kutta basis with uniform nodes since this basis is badly conditioned. We choose M = N + 1 and the Gauss-Legendre nodes as collocation points (6). For this choice, \({\Phi }_{\pi ,M}^{R}={\Phi }_{\pi ,M}^{I}\) (see (12), (11)) and \(\kappa _{\mathcal {C}}(\mathcal {A})\) is identical for both choices.
Example 1
The first example is an index-3 DAE without dynamic degrees of freedom. It has been used before in numerous papers, e.g., [1, 2, 8]. The problem is given by
For unique solvability, no boundary or initial conditions are necessary. We choose the exact solution
and adapt the right-hand side q accordingly. In Table 7, the values of \(\kappa _{\mathcal {C}}(\mathcal {A})\) for \({\Phi }_{\pi ,M}^{R}\) and \({\Phi }_{\pi ,M}^{C}\) are provided. It turns out that the behavior for different functionals is comparable. Therefore, in the following examples, we present only the values for \({\Phi }_{\pi ,M}^{R}\).
Example 2
We continue with an example of a Hessenberg index-2 system used previously in [1]. Consider the DAE system
with the right-hand side q chosen in such a way that
is a solution. It has one dynamical degree of freedom. We choose the special condition
The results for \(\eta =-25\) and \(\lambda =-1\) are provided in Table 8.
Example 3
Our next example is a linearized problem proposed by Campbell and Moore [21]. It has been used previously in the experiments in [2, 8, 9] and others. Let
where
subject to the initial conditions
This problem has index 3 and the dynamical degree of freedom \(l_{\mathrm {dyn}}=4\). The right-hand side q has been chosen in such a way that the exact solution becomes
The results are shown in Table 9. Note that, in the present example, h = 5/n in contrast to all previous computations where h = 1/n. □
The numerical experiments give rise to the following observations:
-
The condition numbers of the discrete problem have almost the same size for given polynomial degree N and stepsize h.
-
The experiments indicate that the Runge-Kutta basis provides the lowest condition numbers for smaller stepsizes. For higher-order ansatz functions and larger stepsizes, the modified Legendre basis seems to provide the smallest condition numbers.
-
In order to obtain a complete picture of the relative merits of the different bases, in the case discussed in Theorem 5, not only the condition number \(\kappa (\mathcal {C})\) of \(\mathcal {C}\) but also the term \(\lVert \mathcal {C}^{+}\rVert \kappa (\mathcal {C})\) has to be taken into account. Corollary 2 shows that the modified Legendre basis is well-suited for higher orders N.
-
If the perturbed solution \(\tilde {c}\) of (36) is projected back onto the nullspace \(\ker \mathcal {C}\), we can assume that the conditions of Theorem 4 are fulfilled. In this case, \(\mathcal {C}\) does not have any influence on the error estimation.
6 Conclusions
In this paper, we investigated the conditioning of the discrete problems arising in the least-squares collocation method for DAEs. In particular, the solution algorithm has been split into a representation map, which connects the coefficients of the basis representation to the function to be represented, and a linear least-squares problem with linear equality constraints. A careful investigation of the representation map allowed for a characterization of errors in the function spaces by those made in the solution of the discrete problem.
The perturbation estimates for the constrained least-squares problem have been derived with the application in mind: the approximation of a DAE. The constraints play an exceptional role. If they are satisfied, the resulting numerical solution belongs to the solution space \({H_{D}^{1}}(a,b)\). If this cannot be guaranteed, the convergence theory for the least-squares method does not apply. Some of the characterizing quantities could be estimated analytically for reasonable choices of bases while others have been estimated numerically in certain examples. We believe that these considerations contribute to a robust and efficient implementation of the proposed method, which seems to provide surprisingly accurate numerical solutions to higher-index DAEs.
Notes
⊗ denotes the Kronecker product.
In the case of the Legendre and modified Legendre bases, all entries are integers weighted by the stepsizes.
In [1], the stiffness matrix is scaled to the interval (− 1, 1) in contrast to the interval (0,1) used here. Therefore, an additional factor of 1/2 appears in the present estimations.
In case that \(\ker (\mathcal {C}+{\Delta }\mathcal {C})=\ker \mathcal {C}\) we obtain \(\mathcal {P}_{\Delta }-\mathcal {P}=0\) and \(\mathcal {D}_{\Delta }=\mathcal {D}\) such that the present estimations coincide with those of the previous section.
References
Hanke, M., März, R., Tischendorf, C., Weinmüller, E., Wurm, S.: Least-squares collocation for linear higher-index differential-algebraic equations. J. Comput. Appl. Math. 317, 403–431 (2017). https://doi.org/10.1016/j.cam.2016.12.017
Hanke, M., März, R., Tischendorf, C.: Least-squares collocation for higher-index linear differential-algebraic equations: Estimating the stability threshold. Math. Comp. 88(318), 1647–1683 (2019). https://doi.org/10.1090/mcom/3393
Hanke, M., März, R.: A reliable direct numerical treatment of differential-algebraic equations by overdetermined collocation: An operator approach. J. Comput. Appl. Math. 387, 112520 (2021)
Hanke, M., März, R.: Convergence analysis of least-squares collocation methods for nonlinear higher-index differential-algebraic equations. J. Comput. Appl. Math. 387, 112514 (2021)
Hanke, M.: Linear differential-algebraic equations in spaces of integrable functions. J. Differential Equations 79(1), 14–30 (1989)
Lamour, R., März, R., Tischendorf, C.: Differential-Algebraic Equations: A Projector Based Analysis. Differential-Algebraic Equations Forum. Springer, Berlin (2013). Series Editors: A. Ilchmann, T. Reis
Kaltenbacher, B., Offtermatt, J.A.: Convergence analysis of regularization by discretization in preimage space. Math. Comp. 81(280), 2049–2069 (2012)
Hanke, M., März, R.: Towards a reliable implementation of least-squares collocation for higher-index linear differential-algebraic equations. Part 1: Basics and ansatz choices. Numerical Algorithms 89, 931–963 (2022)
Hanke, M., März, R.: Towards a reliable implementation of least-squares collocation for higher-index linear differential-algebraic equations. Part 2: The discrete least-squares problem. Numerical Algorithms 89, 965–986 (2022)
Wedin, P.-A.: Perturbation theory for pseudo-inverses. BIT 13, 217–232 (1973)
Eldén, L.: Perturbation theory for the least squares problem with linear equality constraints. SIAM J. Numer Anal. 17(3), 338–350 (1980)
Cox, A.J., Higham, N.J.: Accuracy and stability of the nullspace method for solving the equality constrained least squares problem. BIT 39(1), 34–50 (1999)
Lawson, C.L., Hanson, R.J.: Solving Least Squares Problems. Prentice Hall, Englewood Cliffs, NJ (1974)
Bhatia, R.: Matrix Analysis. Graduate Texts in Mathematics. Springer, New York (1997)
Hanke, M., März, R.: Least-Squares Collocation for Higher-Index Daes: Global Approach and Attempts Towards a Time-Stepping Version. In: Reis, T., et al. (eds.) Progress in Differential-Algebraic Equations II, pp 91–136. Springer, Cham (2020)
Kiełbasiński, A., Schwetlick, H.: Numerische Lineare Algebra. Deutscher Verlag der Wissenschaften, Berlin (1988)
Kato, T.: Perturbation Theory for Linear Operators, 2nd edn. Classics in Mathematics. Springer, Berlin (1995)
Kulkarni, D., Schmidt, D., Tsui, S.-K.: Eigenvalues of tridiagonal pseudo-Toeplitz matrices. Lin. Alg. Appl. 297, 63–80 (1999)
Knopp, K.: Applications of the Theory of Infinite Series. Blackie & Son, London, Glasgow (1954)
Davis, P.J., Rabinowitz, P.: Methods of Numerical Integration, 2nd edn. Academic Press, San Diego, London (1984)
Campbell, S.L., Moore, E.: Constraint preserving integrators for general nonlinear higher index DAEs. Numer. Math. 69, 383–399 (1995)
Acknowledgements
The author wants to thank Roswitha März for many discussions that led to a great enhancement of the presentation. In particular, her contributions simplified the derivations leading to Theorem 2 considerably.
Funding
Open access funding provided by Royal Institute of Technology.
Ethics declarations
Conflict of interest
The author declares no competing interests.
Data availability
The code used for generating all datasets is available on request.
Cite this article
Hanke, M. On the sensitivity of implementations of a least-squares collocation method for linear higher-index differential-algebraic equations. Numer Algor 91, 1721–1754 (2022). https://doi.org/10.1007/s11075-022-01320-z