1 Introduction

An overdetermined least-squares collocation method for the solution of boundary value problems for higher index differential-algebraic equations (DAEs) has been introduced in [28] and further investigated in [25,26,27]. Several sufficient convergence conditions have been established, and numerical experiments indicate excellent behavior. Moreover, it is particularly noteworthy that the computational effort is not much higher than for standard collocation methods applied to boundary value problems for ordinary differential equations. However, the particular procedures are much more sensitive, which reflects the ill-posedness of higher index DAEs. The question of a reliable implementation is almost completely open. The method offers a number of parameters and options whose selection has not yet been backed by theoretical justification. Both parts of the present paper are devoted to a first investigation of this topic. We focus on the choice of collocation nodes, the representation of the ansatz function, as well as the shape and structure of the resulting discrete problem. We apply various theoretical arguments, among them new sufficient convergence conditions in Theorems 1, 2, and 3, and report on corresponding systematic and comprehensive numerical experiments.

Considering that, so far, the only methods available for the practical simulation of general unstructured DAEs with higher index are those connected with the construction, analysis, and evaluation of high-dimensional prolonged systems (derivative array systems), a reliable direct method, such as the one we aim at, would represent significant progress. This is confirmed by a first comparison with the classical derivative array-based method from [6] in [27, subsection 6.4] and [25, subsection 5.2]. A discussion of some earlier direct approaches can be found in [28, Section 5].

The present Part 1 of the note is organized as follows: Section 2 contains the information concerning the problem to be solved as well as the basics on the overdetermined least-squares approach, and, additionally, the new error estimates. Section 3 deals with the selection and calculation of collocation points and integration weights for the different functionals of interest and Section 4 provides a robust selection of bases of the ansatz space. It should also be mentioned at this point that the resulting discrete least-squares problem is treated in detail in Part 2 [24]. We conclude with Section 5, which contains a summary and further comments.

The algorithms have been implemented in C++11. All computations have been performed on a laptop running OpenSuSE Linux, release Leap 15.1, using the GNU g++ compiler (version 7.5.0) [40], the Eigen matrix library (version 3.3.7) [22], SuiteSparse (version 5.6.0) [7], in particular its sparse QR factorization [8], and Intel® MKL (version 2019.5-281), all in double precision with a rounding unit of εmach ≈ 2.22 × 10−16. The code is compiled with optimization level -O3.

2 Fundamentals of the problem and method

Consider a linear boundary value problem for a DAE with properly involved derivative,

$$ \begin{array}{@{}rcl@{}} A(t)(Dx)'(t)+B(t)x(t) & =&q(t),\quad t\in[a,b], \end{array} $$
(1)
$$ \begin{array}{@{}rcl@{}} G_{a}x(a)+G_{b}x(b) & =&d. \end{array} $$
(2)

with \([a,b]\subset \mathbb {R}\) being a compact interval, \(D=[I 0]\in \mathbb {R}^{k\times m}\), k < m, with the identity matrix \(I\in {\mathbb {R}}^{k\times k}\). Furthermore, \(A(t)\in {\mathbb {R}}^{m\times k}\), \(B(t)\in {\mathbb {R}}^{m\times m}\), and \(q(t)\in {\mathbb {R}}^{m}\) are assumed to be sufficiently smooth with respect to t ∈ [a,b]. Moreover, \(G_{a},G_{b}\in \mathbb {R}^{l\times m}\) and \(\ker G_{a}\supseteq \ker D\), \(\ker G_{b}\supseteq \ker D\). Here, l denotes the dynamical degree of freedom of the DAE, that is, the number of free parameters that can be fixed by initial and boundary conditions.

Unlike regular ordinary differential equations (ODEs), where l = k = m, for DAEs it holds that 0 ≤ l ≤ k < m; in particular, l = k for index-one DAEs, l < k for higher index DAEs, and l = 0 can certainly happen.

Supposing accurately stated initial and boundary conditions, index-one DAEs yield well-posed problems in natural settings and can be treated numerically much like ODEs [34]. In contrast, in the present paper, we are mainly interested in higher index DAEs, which lead to essentially ill-posed problems even if the boundary conditions are stated accurately [27, 33, 34]. The tractability index and projector-based analysis serve as the basis for our investigations. We refer to [33] for a detailed presentation and to [27, 34, 36] for corresponding short sketches.

We assume that the DAE is regular with arbitrarily high index \(\mu \in \mathbb {N}\) and the boundary conditions are stated accurately so that solutions of the problem (1)–(2) are unique. We also assume that a solution \(x_{\ast }:[a,b]\rightarrow {\mathbb {R}}^{m}\) actually exists and is sufficiently smooth.

For the construction of a regularization method to treat an essentially ill-posed problem a Hilbert space setting of the problem is most convenient. For this reason, as in [26,27,28], we apply the spaces

$$ \begin{array}{@{}rcl@{}} {H_{D}^{1}} & :=&{H_{D}^{1}}((a,b),\mathbb{R}^{m})=\{x\in L^{2}((a,b),\mathbb{R}^{m}):Dx\in H^{1}((a,b),\mathbb{R}^{k})\},\\ L^{2} & :=&L^{2}((a,b),{\mathbb{R}}^{m}), \end{array} $$

which are suitable for describing the underlying operators. In particular, let \({\mathscr{T}}:{H_{D}^{1}}\rightarrow L^{2}\times \mathbb {R}^{l}\) be given by

$$ \begin{array}{@{}rcl@{}} (\mathscr{T}x)(t)= & \left[\begin{array}{c} A(t)(Dx)'(t)+B(t)x(t)\\ G_{a}x(a)+G_{b}x(b) \end{array}\right]. \end{array} $$
(3)

Then the boundary value problem can be described by \({\mathscr{T}}x=(q,d)^{T}\).

For K ≥ 0, let \(\boldsymbol {\mathfrak {P}}_{K}\) denote the set of all polynomials of degree less than or equal to K. Next, we define a finite dimensional subspace \(X_{\pi }\subset {H_{D}^{1}}\) of piecewise polynomial functions which should serve as ansatz space for the least-squares approximation: Let the partition π be given by

$$ \pi: \quad a=t_{0}<t_{1}<\cdots<t_{n}=b, $$
(4)

with the stepsizes hj = tj − tj−1, \(h=\max \limits _{1\leq j\leq n}h_{j}\), and \(h_{min}=\min \limits _{1\leq j\leq n}h_{j}\).

Let \(C_{\pi }([a,b],\mathbb {R}^{m})\) denote the space of piecewise continuous functions having breakpoints merely at the meshpoints of the partition π. Let N ≥ 1 be a fixed integer. Then, we define

$$ \begin{array}{@{}rcl@{}} X_{\pi} &=&\{x\in C_{\pi}([a,b],\mathbb{R}^{m}):Dx\in C([a,b],\mathbb{R}^{k}), \\ && x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\boldsymbol{\mathfrak{P}}_{N}, \kappa=1,\ldots,k,\quad x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\boldsymbol{\mathfrak{P}}_{N-1},\\&&\kappa=k+1,\ldots,m, j=1,\ldots,n\}. \end{array} $$
(5)

The continuous version of the least-squares method reads: Find an xπ ∈ Xπ that minimizes the functional

$$ \boldsymbol{{\varPhi}}(x) = \|\mathscr{T}x\|^{2} = {{\int}_{a}^{b}}|A(t)(Dx)'(t)+B(t)x(t)-q(t)|^{2}\mathrm{d}t +|G_{a}x(a)+G_{b}x(b) -d|^{2}. $$
(6)

It is ensured by [27, Theorem 4.1] that, for all sufficiently fine partitions π with bounded ratios 1 ≤ h/hmin ≤ ρ, ρ being a global constant, there exists a unique solution xπ ∈ Xπ and the inequality

$$ \|x_{\pi}-x_{\ast}\|_{{H_{D}^{1}}}\leq Ch^{N-\mu+1} $$
(7)

is valid. The constant \(C\in \mathbb {R}\) depends on the solution x∗, the degree N, and the index μ, but it is independent of h. If N > μ − 1, then (7) yields convergence \(x_{\pi }\xrightarrow {h\rightarrow 0}x_{*}\) in \({H_{D}^{1}}\).

It is important to mention that, so far, we are aware of only sufficient conditions for convergence, and the error estimates may not be sharp. Not only practical questions of implementation remain open, but also several questions concerning the theoretical background. We are optimistic that much better estimates are possible, since the results of the numerical experiments have so far been impressively better than theoretically expected.

The following theorem can be understood as a specification of [27, Theorem 4.1] with a more detailed description of the ingredients of the constant C; in particular, the role of N is now better clarified, which could well be important for the practical realization. It suggests that smooth problems could perhaps be solved better with large N and coarser partitions.

Theorem 1

Let the DAE (1) be regular with index \(\mu \in \mathbb {N}\) and let the boundary condition (2) be accurately stated. Let x∗ be a solution of the boundary value problem (1)–(2), and let A, B, q and also x∗ be sufficiently smooth.

Let N ≥ 1 and all partitions π be such that h/hmin ≤ ρ, with a global constant ρ. Then, for all such partitions with sufficiently small h, the estimate (7) is valid with

$$ C=\frac{N!}{(2N)!\sqrt{2N+1}}C_{N}C_{*}\rho^{\mu-1} C_{data}, $$

where

$$ \begin{array}{@{}rcl@{}} &&C_{\ast}= \max \{\|x_{\ast}^{(N)}\|_{\infty},\|x_{\ast}^{(N+1)}\|_{\infty}\}(m+4k(b-a)^{3})^{1/2},\\ &&C_{data} \text{is independent of \textit{N} and \textit{h}, it depends only on the data } A,D,B, G_{a}, G_{b}, \end{array} $$

and CN is a rather involved function of N. In particular, there is an integer K with NK ≤ 2(μ − 1) + N such that, for \(N\rightarrow \infty \), CN does not grow faster than K2(μ− 1). If A and B are constant, it holds K = N.

At this point it should be mentioned that the estimate [38]

$$ \sqrt{2\pi N}\left( \frac{N}{e}\right)^{N} e^{1/(12N+1)}\leq N!\leq \sqrt{2\pi N}\left( \frac{N}{e}\right)^{N} e^{1/(12N)}, $$

or its slightly less sharp version,

$$ \sqrt{2\pi N}\left( \frac{N}{e}\right)^{N}\leq N!\leq \sqrt{2\pi N}\left( \frac{N}{e}\right)^{N} e^{1/12} $$

allow the growth estimate \(\frac {N!}{(2N)!}\leq \sqrt {\pi N}e^{1/6} \frac {1}{N!} \frac {1}{4^{N}}\), thus

$$ \begin{array}{@{}rcl@{}} C&\leq& \sqrt{\pi N} e^{1/6} \frac{1}{N!} \frac{1}{4^{N}\sqrt{2N+1}} C_{N}C_{*}\rho^{\mu-1} C_{data} \leq \sqrt{\frac{\pi}{2}} e^{1/6} \frac{1}{N!} \frac{1}{4^{N}} C_{N}C_{*}\rho^{\mu-1} C_{data} \\ &\leq& \frac{1}{N!} \frac{1}{4^{N}} C_{N}C_{*} \rho^{\mu-1} \sqrt{\frac{\pi}{2}} e^{1/6} C_{data}. \end{array} $$
(8)

However, it should be kept in mind that CN and C∗ also depend on N.
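The Stirling-based bound \(\frac{N!}{(2N)!}\leq \sqrt{\pi N}e^{1/6}\frac{1}{N!}\frac{1}{4^{N}}\) used in (8) is easy to check numerically. The following Python sketch is ours and purely illustrative; it verifies the inequality for small N:

```python
import math

# Check of the Stirling-based bound N!/(2N)! <= sqrt(pi*N) * e^(1/6) / (N! * 4^N);
# purely illustrative, not part of the proof.
for N in range(1, 16):
    lhs = math.factorial(N) / math.factorial(2 * N)
    rhs = math.sqrt(math.pi * N) * math.exp(1 / 6) / (math.factorial(N) * 4 ** N)
    assert lhs <= rhs, f"bound fails for N = {N}"
```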

Proof

We apply the estimate [27]

$$ \|x_{\pi}-x_{\ast}\|_{{H_{D}^{1}}}\leq\frac{\|\mathscr{T}\|\alpha_{\pi}}{\gamma_{\pi}}+\alpha_{\pi}, $$

in which the approximation error απ and the instability threshold γπ are given by

$$ \alpha_{\pi}=\underset{x\in X_{\pi}}{\inf} \|x-x_{\ast}\|_{{H_{D}^{1}}},\quad\gamma_{\pi}=\underset{x\in X_{\pi},x\neq0}{\inf} \frac{\|\mathscr{T}x\|}{\|x\|_{{H_{D}^{1}}}}. $$

Owing to [27, Theorem 4.1], there is a constant cγ > 0 independent of π so that the instability threshold γπ satisfies the inequality

$$ c_{\gamma}h_{min}^{\mu-1}\leq \gamma_{\pi}\leq \lVert \mathscr{T}\rVert, $$

for all partitions with sufficiently small h. This leads to

$$ \|x_{\pi}-x_{\ast}\|_{{H_{D}^{1}}}\leq \frac{\alpha_{\pi}}{\gamma_{\pi}}(\|\mathscr{T}\|+\gamma_{\pi})\leq 2\frac{\alpha_{\pi}}{\gamma_{\pi}} \|\mathscr{T}\|. $$

Choosing N interpolation points ρi with

$$ \begin{array}{@{}rcl@{}} 0&<&\rho_{1}<\cdots<\rho_{N}<1,\\ \tilde{\omega}(\rho)&=&(\rho-\rho_{1})\cdots(\rho-\rho_{N}), \end{array} $$
(9)

the approximation error can be estimated, by straightforward but elaborate computations, by constructing p∗ ∈ Xπ such that \(p_{*,s}^{\prime }(t_{j-1}+\rho _{i}h_{j})= x_{*,s}^{\prime }(t_{j-1}+\rho _{i}h_{j})\), p∗,s(a) = x∗,s(a), for s = 1,…,k, and p∗,s(tj−1 + ρihj) = x∗,s(tj−1 + ρihj), for s = k + 1,…,m, i = 1,…,N, j = 1,…,n, and regarding \(\alpha _{\pi }\leq \|p_{*}-x_{\ast }\|_{{H_{D}^{1}}}\). One obtains

$$ \begin{array}{@{}rcl@{}} \alpha_{\pi}&\leq& \frac{h^{N}}{N!} \lVert\tilde{\omega}\rVert_{L^{2}(0,1)} C_{\ast}, \\ C_{\ast}&=& \max \left\{\|x_{\ast}^{(N)}\|_{\infty},\|x_{\ast}^{(N+1)}\|_{\infty}\right\}\left( m+4k(b-a)^{3}\right)^{1/2}. \end{array} $$
(10)

Turning to shifted Gauss-Legendre nodes that minimize \(\lVert \tilde {\omega }\rVert _{L^{2}(0,1)}\) we obtain

$$ \lVert\tilde{\omega}\rVert_{L^{2}(0,1)}=\frac{(N!)^{2}}{(2N)!\sqrt{2N+1}}. $$

To verify this, we consider the polynomial

$$ \begin{array}{@{}rcl@{}} \omega(t)=2^{N}\tilde{\omega}\left( \frac{t+1}{2}\right)=(t-t_{1})\cdots(t-t_{N}) \end{array} $$

with zeros tj = 2ρj − 1, j = 1,…,N, which is nothing else but the standard Legendre polynomial with leading coefficient one. Using the Rodrigues formula and other arguments from [23, Section 5.4], one obtains

$$ \lVert\omega\rVert_{L^{2}(-1,1)}= 2^{N+\frac{1}{2}}\frac{(N!)^{2}}{(2N)!\sqrt{2N+1}}. $$

Finally, shifting back to the interval (0,1) leads to \(\lVert \tilde {\omega }\rVert _{L^{2}(0,1)} = 2^{-(N{}+{}\frac {1}{2})}\lVert \omega \rVert _{L^{2}(-1,1)}\).

Thus we have

$$ \alpha_{\pi}\leq \frac{h^{N}}{N!} \frac{(N!)^{2}}{(2N)!\sqrt{2N+1}}C_{\ast}= h^{N} \frac{N!}{(2N)!\sqrt{2N+1}}C_{\ast}. $$
(11)

Next, a careful review of the proof of [27, Theorem 4.1 (a)] results in the representation (in terms of [27])

$$ \begin{array}{@{}rcl@{}} \frac{1}{c_{\gamma}}&= &12 c_{Y}\sqrt{g_{\mu-1}}=12 c_{Y}\sqrt{d_{1,\mu-1}c^{*}_{\mu-1}\lVert D\mathscr{L}_{\mu-1}\rVert^{2}_{\infty}}\\ &=& 12 c_{Y}\sqrt{2}\lVert D{{\varPi}}_{0}Q_{1}{\cdots} Q_{\mu-1}D^{+}\rVert_{\infty}\lVert D\mathscr{L}_{\mu-1}\rVert_{\infty}\sqrt{c^{*}_{\mu-1}}. \end{array} $$

The factors \(\lVert D{{\varPi }}_{0}Q_{1}{\cdots } Q_{\mu -1}D^{+}\rVert _{\infty }\) and \(\lVert D{\mathscr{L}}_{\mu -1}\rVert _{\infty }\) depend only on the data A,D,B, likewise the bound cY introduced in [27, Proposition 4.3].

In contrast, the term \(c^{*}_{\mu -1}\) depends additionally on N besides the problem data. Let K denote the degree of the auxiliary polynomial \(q_{\mu -1}=\mathfrak A_{\mu -1}(Dp)'+\mathfrak B_{\mu -1}p\), p ∈ Xπ, in the proof of [27, Theorem 4.1]. Then we have N ≤ K ≤ N + 2(μ − 1) and, by [27, Lemma 4.2], \(c_{\mu -1}^{*}=4^{\mu -1}\lambda _{K}\cdots \lambda _{K-\mu +2}\), where each λS > 0 is the maximal eigenvalue of a certain symmetric, positive semidefinite matrix of size (S + 1) × (S + 1) [28, Lemma 3.3].

Owing to [28, Corollary A.3] it holds that \(\lambda _{S}\leq \frac {4}{\pi ^{2}}S^{4}+O(S^{2})\) for large S, and therefore

$$ \begin{array}{@{}rcl@{}} c_{\mu-1}^{*}&=&4^{\mu-1}\lambda_{K}\cdots\lambda_{K-\mu+2} \\ &\leq& 4^{\mu-1} \left( \frac{4}{\pi^{2}}\right)^{\mu-1}K^{4}(K-1)^{4}\cdots(K-\mu+2)^{4}+O\left( K^{4(\mu-1)-1}\right)\\ &=&4^{\mu-1} \left( \frac{4}{\pi^{2}}\right)^{\mu-1}K^{4(\mu-1)}+O\left( K^{4(\mu-1)-1}\right)\\ &\leq& 4^{\mu-1} \left( \frac{4}{\pi^{2}}\right)^{\mu-1}(N+2(\mu-1))^{4(\mu-1)} +O\left( (N+2(\mu-1))^{4(\mu-1)-1}\right). \end{array} $$

Finally, letting

$$ \begin{array}{@{}rcl@{}} C_{data}= 2\|\mathscr{T}\|12 c_{Y}\sqrt{2} \lVert D{{\varPi}}_{0}Q_{1}{\cdots} Q_{\mu-1}D^{+}\rVert_{\infty}\lVert D\mathscr{L}_{\mu-1}\rVert_{\infty},\quad C_{N}=\sqrt{c^{*}_{\mu-1}}, \end{array} $$

we are done. □

Observe that, for smooth problems, any fixed sufficiently fine partition π, and \(N\rightarrow \infty \), the growth rate of the error \(\lVert x_{\pi }-x_{*}\rVert _{{H^{1}_{D}}}\) is not greater than that of

$$ \begin{array}{@{}rcl@{}} C_{*}h^{N}\frac{(N+2(\mu-1))^{2(\mu-1)}}{4^{N} N!} =C_{*}\left( \frac{h}{4}\right)^{N}\frac{(N+2(\mu-1))^{2(\mu-1)}}{ N!} \end{array} $$
(12)

and, for constant matrix functions A and B,

$$ \begin{array}{@{}rcl@{}} C_{*}h^{N}\frac{N^{2(\mu-1)}}{4^{N} N!} =C_{*}\left( \frac{h}{4}\right)^{N}\frac{N^{2(\mu-1)}}{ N!}. \end{array} $$
(13)

Remember that C∗ is a function of N.
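To give a feeling for the decay of the bound (13), the following sketch evaluates the factor \((h/4)^{N}N^{2(\mu-1)}/N!\); the values h = 0.2 and μ = 3 are hypothetical illustrative choices, not taken from the paper:

```python
import math

# Decay of the factor in (13) for fixed h as N grows; h and mu below are
# hypothetical illustrative values.
h, mu = 0.2, 3
factors = [(h / 4) ** N * N ** (2 * (mu - 1)) / math.factorial(N)
           for N in range(1, 11)]
assert all(a > b for a, b in zip(factors, factors[1:]))  # strictly decreasing
```

For these parameters the factor already decreases from the very first N, supporting the observation that, for smooth problems, larger N with coarser partitions may pay off.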

Remark 1

The specific error analysis provided in [28] for DAEs in Jordan chain form on equidistant grids gives some further insight into the behavior of the instability threshold γπ. It is shown that

$$ \gamma_{\pi}\geq\bar{C}_{\mu}\left( \frac{h}{\sqrt{\lambda_{N}}}\right)^{\mu-1} $$

holds true for sufficiently small h where \(\bar {C}_{\mu }\) is a moderate constant depending only on μ [28, Theorem 3.6]. This leads to the dominant error term

$$ \begin{array}{@{}rcl@{}} \frac{\alpha_{\pi}}{\gamma_{\pi}}\leq \frac{C_{\ast}}{\bar{C}_{\mu}} \sqrt{\frac{\pi}{2}}e^{1/6}\frac{1}{2^{2N}}\frac{\lambda_{N}^{\frac{\mu-1}{2}}}{N!} h^{N-\mu+1}=\frac{1}{\bar{C}_{\mu}} \sqrt{\frac{\pi}{2}}e^{1/6}\frac{1}{h^{\mu-1}} C_{\ast} \left( \frac{h}{4}\right)^{N}\frac{\lambda_{N}^{\frac{\mu-1}{2}}}{N!}, \end{array} $$

indicating again that, for smooth problems, it seems reasonable to calculate with larger N and coarse partitions. Moreover, for sufficiently small \(\frac {h}{\sqrt {\lambda _{N}}}\), the estimate \(\lambda _{N}\leq \frac {4}{\pi ^{2}}N^{4}+O(N^{2})\) becomes valid [28, Remark 3.4], and hence the growth characteristic (13) for large N is confirmed once more.

The functional values Φ(x), which are needed when minimizing over x ∈ Xπ, cannot be evaluated exactly, and the integral must be discretized accordingly. Taking into account that the boundary value problem is ill-posed in the higher index case μ > 1, perturbations of the functional may have a serious influence on the error of the approximate least-squares solution or even prevent convergence towards the solution x∗. Therefore, careful approximations of the integral in Φ are required. We discuss the following three options:

$$ \begin{array}{@{}rcl@{}} \boldsymbol{{\varPhi}}_{\pi,M}^{C}(x)&=&\sum\limits_{j=1}^{n}\frac{h_{j}}{M}\sum\limits_{i=1}^{M} |A(t_{ji})(Dx)'(t_{ji})+B(t_{ji})x(t_{ji})-q(t_{ji})|^{2} \\&&+|G_{a}x(a)+G_{b}x(b)-d|^{2}, \end{array} $$
(14)
$$ \begin{array}{@{}rcl@{}} \boldsymbol{{\varPhi}}_{\pi,M}^{I}(x)&=&\sum\limits_{j=1}^{n}h_{j}\sum\limits_{i=1}^{M}\gamma_{i} |A(t_{ji})(Dx)'(t_{ji})+B(t_{ji})x(t_{ji})-q(t_{ji})|^{2} \\&&+|G_{a}x(a)+G_{b}x(b)-d|^{2}, \end{array} $$
(15)

and

$$ \begin{array}{@{}rcl@{}} \boldsymbol{{\varPhi}}_{\pi,M}^{R}(x)&=&\sum\limits_{j=1}^{n}{\int}_{t_{j-1}}^{t_{j}} \left|\sum\limits_{i=1}^{M}l_{ji}(t) (A(t_{ji})(Dx)'(t_{ji})+B(t_{ji})x(t_{ji})-q(t_{ji}))\right|^{2}\mathrm{d}t \\ &&+|G_{a}x(a)+G_{b}x(b)-d|^{2}, \end{array} $$
(16)

in which, from the DAE (1) and x ∈ Xπ, only data at the points

$$ t_{ji}=t_{j-1}+\tau_{i}h_{j},\quad i=1,\ldots,M, j=1,\ldots,n, $$

are included, with

$$ 0\leq\tau_{1}<\cdots<\tau_{M}\leq 1. $$
(17)

In the last functional \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\) Lagrange basis polynomials appear, i.e.,

$$ l_{ji}(t)=\frac{{\prod}_{\substack{\kappa=1\\ \kappa\neq i } }^{M}(t-t_{j\kappa})}{{\prod}_{\substack{\kappa=1\\ \kappa\neq i } }^{M}(t_{ji}-t_{j\kappa})} =\frac{{\prod}_{\substack{\kappa=1\\ \kappa\neq i } }^{M}(\tau-\tau_{\kappa})}{{\prod}_{\substack{\kappa=1\\ \kappa\neq i } }^{M}(\tau_{i}-\tau_{\kappa})}=:l_{i}(\tau),\quad \tau=(t-t_{j-1})/h_{j}. $$
(18)
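A one-subinterval toy computation may clarify how the quadrature parts of \(\boldsymbol{{\varPhi}}_{\pi ,M}^{C}\) and \(\boldsymbol{{\varPhi}}_{\pi ,M}^{I}\) differ. The scalar residual w(t) = t² on [0,1] is an illustrative choice of ours, not from the paper; with Gauss nodes and weights, the sum in \(\boldsymbol{{\varPhi}}^{I}\) reproduces the integral of |w|² exactly, while uniform weights 1/M (as in \(\boldsymbol{{\varPhi}}^{C}\), here evaluated at the same nodes, which is only one possible node choice) do not:

```python
import numpy as np

# Toy comparison of the quadrature parts of Phi^C and Phi^I on [0,1]:
# for w(t) = t^2, |w|^2 = t^4 has degree 4, so the Gauss-weighted sum with
# M = 3 nodes (exact up to degree 2M - 1 = 5) reproduces int_0^1 t^4 dt,
# whereas uniform weights 1/M give only a Riemann-type approximation.
M = 3
t, wgt = np.polynomial.legendre.leggauss(M)
tau = (t + 1) / 2                 # nodes shifted to (0,1)
gamma = wgt / 2                   # Gauss weights on (0,1)
w2 = tau ** 4                     # |w(tau_i)|^2
exact = 1 / 5                     # int_0^1 t^4 dt
phi_I_part = float(gamma @ w2)    # exact for this degree
phi_C_part = float(w2.sum() / M)  # uniform weights: noticeably off
assert abs(phi_I_part - exact) < 1e-14
assert abs(phi_C_part - exact) > 1e-3
```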

Remark 2

The direct numerical implementation of \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}(x)\) with the Lagrangian basis functions involves the mass matrix belonging to these functions. It is well known that this matrix may be very badly conditioned, thus leading to an amplification of rounding errors. In connection with the ill-posedness of higher index DAEs, this may render the numerical solutions useless. The solution of the least-squares problem with \(\boldsymbol {{\varPhi }}_{\pi ,M}^{I}\) is much less expensive than that with \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\), and in turn, solving system (21)–(22) below for x ∈ Xπ in a least-squares sense using the (diagonally weighted) Euclidean norm in \({\mathbb {R}}^{nMm+l}\) according to \(\boldsymbol {{\varPhi }}_{\pi ,M}^{C}\) is even less computationally expensive than using \(\boldsymbol {{\varPhi }}^{I}_{\pi ,M}\).

Introducing, for each x ∈ Xπ and w(t) = A(t)(Dx)′(t) + B(t)x(t) − q(t), the corresponding vector \(W\in {\mathbb {R}}^{mMn}\) by

$$ W=\left[\begin{array}{c} W_{1}\\ \vdots\\ W_{n} \end{array}\right]\in\mathbb{R}^{mMn},\quad W_{j}=h_{j}^{1/2}\left[\begin{array}{c} w(t_{j1})\\ \vdots\\ w(t_{jM}) \end{array}\right]\in\mathbb{R}^{mM}, $$
(19)

we obtain new representations of these functionals, namely

$$ \boldsymbol{{\varPhi}}_{\pi,M}^{C}(x)=W^{T}\mathscr{L}^{C}W+|G_{a}x(a)+G_{b}x(b)-d|^{2}, $$
$$ \boldsymbol{{\varPhi}}_{\pi,M}^{I}(x)=W^{T}\mathscr{L}^{I}W+|G_{a}x(a)+G_{b}x(b)-d|^{2}, $$

and

$$ \boldsymbol{{\varPhi}}_{\pi,M}^{R}(x)=W^{T}\mathscr{L}^{R}W+|G_{a}x(a)+G_{b}x(b)-d|^{2}, $$

whereby the first two formulae are evident, with \({\mathscr{L}}^{C}=\text {diag}(L^{C}\otimes I_{m},\ldots ,L^{C}\otimes I_{m})\), ⊗ denoting the Kronecker product, and \(L^{C}=M^{-1}I_{M}\), such that finally \({\mathscr{L}}^{C}=M^{-1}I_{nMm}\), and further, \({\mathscr{L}}^{I}={\text {diag}}(L^{I}\otimes I_{m},\ldots ,L^{I}\otimes I_{m})\) and LI = diag(γ1,…,γM). LC and thus \({\mathscr{L}}^{C}\) are positive definite. The matrices LI and \({\mathscr{L}}^{I}\) are positive definite if and only if all quadrature weights are positive.

The formula for \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}(x)\) can be established by straightforward evaluations following the lines of [28, Section 2.3], in which \({\mathscr{L}}^{R}={\text {diag}}(L^{R}\otimes I_{m},\ldots ,L^{R}\otimes I_{m})\) and LR is the mass matrix associated with the Lagrange basis functions li, i = 1,…,M, from (18) for the node sequence (17); more precisely,

$$ L^{R}=\left( L_{i\kappa}^{R}\right)_{i,\kappa=1,\ldots,M},\quad L_{i\kappa}^{R}={{\int}_{0}^{1}}l_{i}(\tau)l_{\kappa}(\tau)d\tau. $$
(20)

LR is symmetric and positive definite and, consequently, so is \({\mathscr{L}}^{R}\).

We emphasize that the matrices LC,LI,LR depend only on M, the node sequence (17), and the quadrature weights, but do not depend on the partition π and h at all.
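The structure of these matrices is easy to explore numerically. The sketch below is our own code (the helper name mass_matrix is not from the paper); it assembles \(L^{R}\) by high-order quadrature. For Gauss-Legendre nodes, \(L^{R}\) reduces exactly to diag(γ1,…,γM), since each product liℓκ has degree 2M − 2, whereas equidistant nodes yield a worse conditioned matrix, in line with Remark 2:

```python
import numpy as np
from numpy.polynomial import legendre

# Assemble the mass matrix L^R of (20) by high-order quadrature and compare
# equidistant with Gauss-Legendre nodes; mass_matrix is our own helper name.
def mass_matrix(tau):
    M = len(tau)
    t, w = legendre.leggauss(M + 2)        # exact well beyond degree 2M - 2
    s, ws = (t + 1) / 2, w / 2             # quadrature shifted to (0,1)
    # values of all Lagrange basis polynomials l_i at the quadrature points
    L = np.array([[np.prod([(x - tau[k]) / (tau[i] - tau[k])
                            for k in range(M) if k != i])
                   for x in s] for i in range(M)])
    return (L * ws) @ L.T                  # L^R_{ik} = int_0^1 l_i l_k

M = 7
equi = mass_matrix(np.linspace(0, 1, M))
t, w = legendre.leggauss(M)
gauss = mass_matrix((t + 1) / 2)
assert np.allclose(gauss, np.diag(w / 2))              # diagonal for Gauss nodes
assert np.linalg.cond(equi) > np.linalg.cond(gauss)    # equidistant: worse
```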

We always set M ≥ N + 1. Although the nodes (17) serve as interpolation points in the functional \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\), we still call them collocation nodes after [28]. It should be underlined here that minimizing each of the above functionals on Xπ can be viewed as a special least-squares method for solving the overdetermined collocation system W = 0, Gax(a) + Gbx(b) = d, with respect to x ∈ Xπ, that is, in detail, the collocation system

$$ \begin{array}{@{}rcl@{}} \!\!\!\!\!\!\!\!\!\!\!\!\!\!\!A(t_{ji})(Dx)'(t_{ji})+B(t_{ji})x(t_{ji}) & =&q(t_{ji}),\quad i=1,\ldots,M,\quad j=1,\ldots,n, \end{array} $$
(21)
$$ \begin{array}{@{}rcl@{}} G_{a}x(a)+G_{b}x(b) & =&d. \end{array} $$
(22)

The system (21)–(22) for x ∈ Xπ becomes overdetermined since Xπ has dimension mnN + k, whereas the system consists of mnM + l ≥ mnN + mn + l ≥ mnN + m + l > mnN + k + l ≥ mnN + k scalar equations.
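The dimension count can be replayed in a few lines; the parameter values below are an arbitrary hypothetical configuration, not taken from the paper's experiments:

```python
# Replaying the dimension count for (21)-(22); m, k, l, n, N are an arbitrary
# hypothetical configuration.
m, k, l = 3, 2, 1      # DAE size, rank of D, number of boundary conditions
n, N = 10, 4           # subintervals and polynomial degree
M = N + 1              # minimal number of collocation nodes

# per subinterval: k components of degree N, m - k of degree N - 1;
# continuity of Dx removes k unknowns at each of the n - 1 interior points
dim_X = n * (k * (N + 1) + (m - k) * N) - k * (n - 1)
assert dim_X == m * n * N + k          # = mnN + k, as stated

equations = m * n * M + l              # collocation rows plus boundary rows
assert equations > dim_X               # the system is overdetermined
```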

Remark 3

Based on collocation methods for index-1 DAEs, the first thought in [27, 28] was to turn to the functional \(\boldsymbol {{\varPhi }}^{C}_{\pi ,M}\) with nodes 0 < τ1 < ⋯ < τM < 1. However, the use of the special discretized norm in these papers for providing convergence results is in essence already the use of the functional \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\).

For a general set of nodes (17), \(\boldsymbol {{\varPhi }}^{C}_{\pi ,M}\) represents a simple Riemann approximation of the corresponding integral, which is only first-order accurate. If, however, the nodes are chosen as those of Chebyshev integration, the orders \(1,\dots ,7\) and 9 can be obtained for the corresponding number M of nodes [29, p 349]. The superscript C indicates that Chebyshev integration formulas are conceivable. As developed in [23, Section 7.5.2], integration formulas with uniform weights, i.e., Chebyshev formulas, are those for which random errors in the function values have the least effect on the quadrature result. This makes these formulas very interesting in our context. However, although a lot of test calculations run well, we are not aware of convergence statements for \(\boldsymbol {{\varPhi }}^{C}_{\pi ,M}\) so far.

Remark 4

The functional \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\) gets its superscript R from the restriction operator Rπ,M introduced in [26] with nodes 0 < τ1 < ⋯ < τM < 1. Note that [26, Theorem 2.3] generalizes convergence results from [27, 28] to a large extent. Theorem 2 below even allows arbitrary nodes satisfying (17).

Remark 5

The functional \(\boldsymbol {{\varPhi }}^{I}_{\pi ,M}\) takes its superscript I simply from the term integration formula. First convergence results for \(\boldsymbol {{\varPhi }}^{I}_{\pi ,M}\) are given in Theorem 2 below.

Intuitively, it seems reasonable to use a Gaussian quadrature rule for these purposes. However, it is not known whether such a rule is most robust against rounding errors, or whether other choices would make the overall process more robust.

Remark 6

By construction of the basic ansatz space Xπ, our approximations are discontinuous, with possible jumps at the grid points in certain components. In this respect it does not matter which of our functionals is selected. Since we always have overdetermined systems (21)–(22), it can no longer be expected that all components of the approximation are continuous, even in the case τ1 = 0, τM = 1. This is an important difference to the classical collocation methods for index-1 DAEs, which are based on classical uniquely solvable linear systems, e.g., [34].

Theorem 2

Let the DAE (1) be regular with index \(\mu \in \mathbb {N}\) and let the boundary condition (2) be accurately stated. Let x∗ be a solution of the boundary value problem (1)–(2), and let A, B, q and also x∗ be sufficiently smooth.

Let all partitions π be such that h/hmin ≤ ρ, with a global constant ρ. Then, with

$$ M\geq N+\mu, $$

the following statements are true:

  (1)

    For sufficiently fine partitions π and each sequence of arbitrarily placed nodes (17), there exists exactly one \(x_{\pi }^{R}\in X_{\pi }\) minimizing the functional \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\) on Xπ, and

    $$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{R}-x_{\ast}\|_{{H_{D}^{1}}}\leq C_{R}h^{N-\mu+1}. \end{array} $$
  (2)

    For each integration rule on the interval [0,1], with M nodes (17) and positive weights γ1,…,γM, that is exact for polynomials of degree less than or equal to 2M − 2, and sufficiently fine partitions π, there exists exactly one \(x_{\pi }^{I}\in X_{\pi }\) minimizing the functional \(\boldsymbol {{\varPhi }}^{I}_{\pi ,M}\) on Xπ, and \(x_{\pi }^{I}=x_{\pi }^{R}\), thus

    $$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{I}-x_{\ast}\|_{{H_{D}^{1}}}\leq C_{R}h^{N-\mu+1}. \end{array} $$

Since Gauss-Legendre and Gauss-Radau integration rules are exact for polynomials up to degree 2M − 1 and 2M − 2, respectively, with positive weights, they are well suited here, but Gauss-Lobatto rules, being exact only up to degree 2M − 3, do not meet the requirement of Theorem 2 (2).
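The exactness degrees quoted here are easy to confirm numerically for the Gauss-Legendre case. The following sketch (ours, illustrative only; Radau rules would require constructing their nodes and are not checked) verifies that the M-node rule on [0,1] integrates monomials exactly up to degree 2M − 1 and fails at degree 2M:

```python
import numpy as np

# Exactness check for the M-node Gauss-Legendre rule on [0,1]: monomials up
# to degree 2M - 1 are integrated exactly, degree 2M is not.
M = 4
t, w = np.polynomial.legendre.leggauss(M)
tau, gamma = (t + 1) / 2, w / 2
for d in range(2 * M):                       # degrees 0, ..., 2M - 1
    assert abs(gamma @ tau ** d - 1 / (d + 1)) < 1e-13
assert abs(gamma @ tau ** (2 * M) - 1 / (2 * M + 1)) > 1e-8
```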

Proof

  (1)

    In [26], additionally supposing 0 < τ1 < ⋯ < τM < 1, conditions are derived that ensure the existence and uniqueness of \(x_{\pi }^{R}\) minimizing \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\) on Xπ. It is shown that \(x_{\pi }^{R}\) has similar convergence properties as xπ minimizing Φ on Xπ; merely the constant CR is slightly larger than C in (7). A further careful check of the proofs in [26] shows that the assertion also holds for τ1 = 0 and/or τM = 1, possibly with a larger constant CR.

  (2)

    For each arbitrary x ∈ Xπ, the expression

    $$ \begin{array}{@{}rcl@{}} \theta_{j}(t)\!:=\!\left|\sum\limits_{i=1}^{M}l_{ji}(t)(A(t_{ji})(Dx)'(t_{ji}) + B(t_{ji})x(t_{ji}) - q(t_{ji}))\right|^{2},\quad t\in (t_{j-1},t_{j}), \end{array} $$

    shows that 𝜃j is a polynomial with degree less than or equal to 2M − 2, thus

    $$ \begin{array}{@{}rcl@{}} {\int}_{t_{j-1}}^{t_{j}}\theta_{j}(t)\text{dt} = h_{j}\sum\limits_{i=1}^{M} \gamma_{i}\theta_{j}(t_{ji}) = h_{j}\sum\limits_{i=1}^{M}\gamma_{i}\left|A(t_{ji})(Dx)'(t_{ji}) + B(t_{ji})x(t_{ji}) - q(t_{ji})\right|^{2} \end{array} $$

    Therefore, it follows that \(\boldsymbol {{\varPhi }}_{\pi ,M}^{I}(x)=\boldsymbol {{\varPhi }}_{\pi ,M}^{R}(x)\) for all x ∈ Xπ, i.e., \(\boldsymbol {{\varPhi }}_{\pi ,M}^{I}\) coincides with the special functional \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\) having the same nodes. Eventually, (2) is a particular case of (1). □

As already emphasized above, until now we are aware of only sufficient convergence conditions; this applies in particular to the size of M. So far, the applications have often run well with M = N + 1, and no significant difference to calculations with a larger M was visible, e.g., [27, Section 6] and [28, Section 4]. The experiments in Section 4 below are also carried out with M = N + 1. The following statement for A and B with polynomial entries allows choosing M independently of the index μ and confirms the choice M = N + 1 for constant A and B.

Theorem 3

Let the DAE (1) be regular with index \(\mu \in \mathbb {N}\) and let the boundary condition (2) be accurately stated. Let x∗ be a solution of the boundary value problem (1)–(2), and let q and also x∗ be sufficiently smooth. Let the entries of A and B be polynomials of degree less than or equal to NAB. Let all partitions π be such that h/hmin ≤ ρ, with a global constant ρ. Then, with

$$ M\geq N+1+N_{AB}, $$

the following statements are true:

  (1)

    For sufficiently fine partitions π and each sequence of arbitrarily placed nodes (17), there exists exactly one \(x_{\pi }^{R}\in X_{\pi }\) minimizing the functional \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\) on Xπ, and

    $$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{R}-x_{\ast}\|_{{H_{D}^{1}}}\leq C_{R}h^{N-\mu+1}. \end{array} $$
  (2)

    For each integration rule of interpolation type on the interval [0,1], with M nodes (17) and positive weights γ1,…,γM, that is exact for polynomials of degree less than or equal to 2M − 2, and sufficiently fine partitions π, there exists exactly one \(x_{\pi }^{I}\in X_{\pi }\) minimizing the functional \(\boldsymbol {{\varPhi }}^{I}_{\pi ,M}\) on Xπ, and \(x_{\pi }^{I}=x_{\pi }^{R}\), thus

    $$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{I}-x_{\ast}\|_{{H_{D}^{1}}}\leq C_{R}h^{N-\mu+1}. \end{array} $$
  (3)

    If A and B are constant matrices, then, for sufficiently fine partitions π and each sequence of arbitrarily placed nodes (17), there exists exactly one \(x_{\pi }^{R}\in X_{\pi }\) minimizing the functional \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\) on Xπ, and

    $$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{R}-x_{\ast}\|_{{H_{D}^{1}}}\leq C_{R}h^{\max\{0,N-\mu+1\}}. \end{array} $$

Proof

(1): This follows from [26, Proposition 2.2(iv)] and [27, Theorem 4.1]. (2): As in the proof of the previous theorem, this is again a consequence of (1). (3): The statement is a consequence of [26, Proposition 2.2(iv)] and [27, Theorem 4.7]. □

Remark 7

Observe a further interesting feature. Let A and B be constant matrices. Set N = 1, M = N + 1. Then, it holds that

$$ \begin{array}{@{}rcl@{}} \boldsymbol{{\varPhi}}_{\pi,M}^{C}(x)=\boldsymbol{{\varPhi}}_{\pi,M}^{R}(x)=\boldsymbol{{\varPhi}}_{\pi,M}^{I}(x),\quad x\in X_{\pi}, \end{array} $$

in which \(\boldsymbol {{\varPhi }}_{\pi ,M}^{I}\) is associated with the corresponding Gauss-Legendre or Gauss-Radau rules. This follows from the fact that the 2-point Chebyshev integration nodes are just the Gauss-Legendre nodes.
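The key fact of Remark 7, namely that the 2-point equal-weight (Chebyshev) rule coincides with the 2-point Gauss-Legendre rule, can be checked directly; this sketch is purely illustrative:

```python
import numpy as np

# The two-point Gauss-Legendre nodes on (0,1) are (1 -+ 1/sqrt(3))/2 with
# equal weights 1/2, so the equal-weight (Chebyshev) 2-point rule and the
# Gauss-Legendre rule coincide, as used in Remark 7.
t, w = np.polynomial.legendre.leggauss(2)
tau, gamma = np.sort((t + 1) / 2), w / 2
assert np.allclose(tau, [(1 - 1 / np.sqrt(3)) / 2, (1 + 1 / np.sqrt(3)) / 2])
assert np.allclose(gamma, [0.5, 0.5])       # uniform weights, as in Phi^C
```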

We underline that, by Theorem 3 (3), the approximate solutions stay bounded also for DAEs with larger index μ; for instance, [28, Table 6] confirms this for an index-four Jordan DAE.

Having in mind the implementation of such an overdetermined least-squares collocation, for given partition π and a given polynomial degree N, a number of parameters and options must be selected:

  • basis functions for Xπ;

  • number M of collocation points and their location 0 ≤ τ1 < ⋯ < τM ≤ 1;

  • setup and solution of the discrete least-squares problem.

We will discuss the first two issues below and refer to [24] for the third one. The main aim is implementations that are as stable as possible, not necessarily maximal computational efficiency.

3 Collocation nodes, mass matrix and integration weights

3.1 Collocation nodes for \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\)

The functional \(\boldsymbol {{\varPhi }}^{R}_{\pi ,M}\) in (16) is based on polynomial interpolation using M nodes (17). It seems reasonable to choose these nodes in such a way that, separately on each subinterval [tj− 1,tj] of the partition, the interpolation error is as small as possible in a certain sense. Without loss of generality, we can reduce the matter to the interval [0,1].

Consider functions \(q\in C([0,1],\mathbb {R}^{m})\) and define the interpolation operator \(R_{M}:C([0,1],\mathbb {R}^{m})\rightarrow C([0,1],{\mathbb {R}}^{m})\) by

$$ \begin{array}{@{}rcl@{}} R_{M}q=\sum\limits_{i=1}^{M}l_{i}q(\tau_{i}), \end{array} $$

with the Lagrange basis functions (18) such that (RMq)(τi) = q(τi), i = 1,…,M, and \(R_{M}q\in Y_{M}\), where \(Y_{M}\subset C([0,1],{\mathbb {R}}^{m})\) is the subspace of all functions whose components are polynomials up to degree M − 1. Introducing ω(τ) = (τ − τ1)(τ − τ2)⋯(τ − τM) and using componentwise the divided differences, we have the error representation, e.g., [23, Chapter 5],

$$ \begin{array}{@{}rcl@{}} q(\tau)-(R_{M}q)(\tau)=\omega(\tau) q[\tau_{1},\ldots,\tau_{M},\tau]. \end{array} $$

For smooth functions \(q\in C^{M}([0,1],\mathbb {R}^{m})\) it follows that

$$ \begin{array}{@{}rcl@{}} \lVert q-R_{M}q\rVert^{2}_{L^{2}}={{\int}_{0}^{1}}\omega(\tau)^{2} \lvert q[\tau_{1},\ldots,\tau_{M},\tau]\rvert^{2}\text{d}\tau\leq {{\int}_{0}^{1}}\omega(\tau)^{2}\text{d}\tau\frac{m}{(M!)^{2}}\lVert q^{(M)}\rVert^{2}_{\infty}. \end{array} $$

For the evaluation of \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\) (16), it seems reasonable to choose the collocation nodes in such a way that this bound is minimized for all functions \(q\in C^{M}([0,1],{\mathbb {R}}^{m})\). The optimal set of nodes is determined by the condition

$$ \underset{0\leq\tau_{1}<\cdots<\tau_{M}\leq1}{\min} \|\omega\|_{L^{2}(0,1)}. $$

It is well known that this functional is minimized if the collocation nodes are chosen to be the Gauss-Legendre nodes [23, Chapter 7.5.1 and 4.5.4].
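This criterion is easy to check numerically. The following sketch is our own illustration, not part of the paper's codes (all helper names are ours); it compares \(\int_0^1\omega(\tau)^2\,\mathrm{d}\tau\) for Gauss-Legendre nodes and midpoint-uniform nodes on [0,1]:

```python
# Sketch (assumption: helper names are ours): compare the objective
# int_0^1 omega(tau)^2 dtau for Gauss-Legendre versus uniform nodes.
import numpy as np

def omega_l2_sq(nodes, n_quad=200):
    """Approximate int_0^1 prod_i (tau - tau_i)^2 dtau on [0, 1]."""
    # high-order Gauss-Legendre quadrature for the integral itself
    t, w = np.polynomial.legendre.leggauss(n_quad)
    t = 0.5 * (t + 1.0)          # shift from [-1, 1] to [0, 1]
    w = 0.5 * w
    omega = np.prod(t[:, None] - nodes[None, :], axis=1)
    return float(np.sum(w * omega**2))

M = 5
gl, _ = np.polynomial.legendre.leggauss(M)
gl = 0.5 * (gl + 1.0)                      # Gauss-Legendre nodes on [0, 1]
uni = (np.arange(1, M + 1) - 0.5) / M      # midpoint-uniform nodes

assert omega_l2_sq(gl) < omega_l2_sq(uni)  # Gauss-Legendre nodes win
```

The Gauss-Legendre nodes give the smaller value, in line with the optimality statement above.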

On the other hand, the best polynomial approximation to a given function q in the L2-norm is obtained by computing the Fourier approximation with respect to the Legendre polynomials. However, to the best of our knowledge, no estimates of the interpolation error in \(L^{2}((0,1),{\mathbb {R}}^{m})\) are known. In the uniform norm, however, and with arbitrary node sequences, for each \(q\in C([0,1],\mathbb {R}^{m})\), the estimate

$$ \|R_{M}q-q\|_{\infty}\leq(1+{{\varLambda}}_{M})\text{dist}_{\infty}(q,Y_{M}) $$

holds true, where \(\text {dist}_{\infty }(q,Y_{M})=\min \limits \{\|q-y\|_{\infty }|y\in Y_{M}\}\) and ΛM is the so-called Lebesgue constant defined by

$$ {{\varLambda}}_{M}=\underset{\tau\in[0,1]}{\max} \sum\limits_{i=1}^{M}|l_{i}(\tau)| $$

in which li are again the Lagrange basis functions (18).

The Lebesgue constant \({{{\varLambda }}_{M}^{L}}\) for Gauss-Legendre nodes has the property \({{{\varLambda }}_{M}^{L}}=O(\sqrt {M})\). If instead Chebyshev nodes are used, the corresponding Lebesgue constant \({{{\varLambda }}_{M}^{C}}\) behaves like \({{{\varLambda }}_{M}^{C}}=O(\log M)\) ([12, p 206] and the references therein). For uniform polynomial approximations, these nodes are known to be optimal [9, Theorem 7.6]. Table 1 shows some values for the Lebesgue constants. Note that the Lebesgue constants \({{{\varLambda }}_{M}^{U}}\) for equidistant nodes grow exponentially (see, e.g., [44]).
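The growth rates quoted above can be reproduced by a brute-force sketch that samples \(\sum_{i}|l_i(\tau)|\) on a fine grid (the helper `lebesgue_constant` is our own illustration, following the definition above):

```python
# Sketch (assumption: helper names are ours): estimate the Lebesgue constant
# Lambda_M on [0, 1] by sampling sum_i |l_i(tau)| on a fine grid.
import numpy as np

def lebesgue_constant(nodes, n_samples=5000):
    tau = np.linspace(0.0, 1.0, n_samples)
    M = len(nodes)
    s = np.zeros_like(tau)
    for i in range(M):
        li = np.ones_like(tau)
        for k in range(M):
            if k != i:
                li *= (tau - nodes[k]) / (nodes[i] - nodes[k])
        s += np.abs(li)
    return float(s.max())

M = 12
# Chebyshev nodes shifted to [0, 1] and uniform nodes including the boundaries
cheb = 0.5 * (1.0 + np.cos((2*np.arange(1, M+1) - 1) * np.pi / (2*M)))
uni  = np.linspace(0.0, 1.0, M)

# Chebyshev nodes give O(log M) growth; uniform nodes grow exponentially.
assert lebesgue_constant(cheb) < lebesgue_constant(uni)
```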

Table 1 Lebesgue constants for Chebyshev nodes (\({{{\varLambda }}_{M}^{C}}\)), Gauss-Legendre nodes (\({{{\varLambda }}_{M}^{L}}\)), Gauss-Lobatto nodes (\({{\varLambda }}_{M}^{Lo}\)), Gauss-Radau nodes (\({{{\varLambda }}_{M}^{R}}\)), and uniform nodes including the boundaries (\({{{\varLambda }}_{M}^{U}}\)) and without boundaries (\({{{\varLambda }}_{M}^{O}}\))

Remark 8

Computation of nodes and weights for Gauss-type integration formulae

In the following, we will make heavy use of Gauss-Legendre, Gauss-Radau, and Gauss-Lobatto integration nodes and their corresponding weights. Since we do not have them available in tabular form for large M with sufficient accuracy, they will be computed on the fly. A severe concern is the accuracy of the nodes and weights. In the case of Gauss-Legendre integration rules, the computed nodes and weights are provided by the GNU Scientific Library routine glfixed.c [14]. It makes use of tabulated values for M = 1(1)20, 32, 64, 96, 100, 128, 256, 512, 1024 with an accuracy of 27 digits. Other values are computed on the fly, with an accuracy being a small multiple of the machine rounding unit, using an adapted version of the Newton method.
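The following sketch shows the kind of Newton iteration referred to above, applied to the roots of the Legendre polynomial PM on [−1,1]. It is our own minimal illustration, not the GSL code; it uses the classical starting guesses and the standard weight formula \(w_i = 2/((1-x_i^2)\,P_M'(x_i)^2)\):

```python
# Sketch (not the GSL code): Gauss-Legendre nodes as Newton-computed roots
# of P_M, with weights w_i = 2 / ((1 - x_i^2) * P_M'(x_i)^2).
import math

def legendre_and_derivative(M, x):
    """Evaluate P_M(x) and P_M'(x) via the three-term recursion."""
    p0, p1 = 1.0, x
    for k in range(2, M + 1):
        p0, p1 = p1, ((2*k - 1) * x * p1 - (k - 1) * p0) / k
    dp = M * (x * p1 - p0) / (x*x - 1.0)   # roots are interior, so x^2 != 1
    return p1, dp

def gauss_legendre(M, tol=1e-14):
    nodes, weights = [], []
    for i in range(1, M + 1):
        x = math.cos(math.pi * (i - 0.25) / (M + 0.5))  # classical initial guess
        for _ in range(100):
            p, dp = legendre_and_derivative(M, x)
            dx = p / dp
            x -= dx
            if abs(dx) < tol:
                break
        _, dp = legendre_and_derivative(M, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x*x) * dp*dp))
    return nodes, weights
```

For well-separated Legendre roots, this iteration converges quadratically from the cosine initial guesses; production codes such as the one in [14] additionally fall back to tabulated high-precision values.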

For computing the Gauss-Lobatto nodes and weights, the methods of [37] (using the Newton method) as well as [17] (a variant of the method in [21]) have been implemented. Table 2 contains some comparisons to the tabulated values in [37] that have 20 digits. The method of [37] provides slightly more accurate values than that of [17]. Therefore, the former has been used further on.

Table 2 Accuracy of the computed nodes and weights of the Gauss-Lobatto integration rules

We did not find sufficiently accurate tabulated values for the Gauss-Radau nodes and weights. Therefore, the method of [16] has been implemented. We assume that the results obtained have an accuracy similar to the values for the Gauss-Lobatto nodes and weights using the method in [17].

3.2 The mass matrix

In the following, we will make extensive use of Legendre polynomials. For the readers’ convenience, the necessary properties are collected in Appendix A.1.

Let us turn to \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\) (16) again. A critical ingredient for determining its properties is the mass matrix LR in (20). Denote as before by li(τ), i = 1,…,M, the Lagrange basis functions for the node sequence (17), that is, cf. (18),

$$ l_{i}(\tau)=\frac{{\prod}_{\kappa\neq i}(\tau-\tau_{\kappa})}{{\prod}_{\kappa\neq i}(\tau_{i}-\tau_{\kappa})}. $$

For evaluating LR, we will use the normalized shifted Legendre polynomials \(\hat {P}_{\nu }=(2\nu +1)^{1/2}\tilde {P}_{\nu }\) (cf. Appendix A.1). Assume the representation

$$ l_{i}(\tau)=\sum\limits_{\nu=1}^{M}\alpha_{i\nu}\hat{P}_{\nu-1}(\tau). $$
(23)

A short calculation shows

$$ L_{ij}^{R}=\sum\limits_{\lambda=1}^{M}\alpha_{i\lambda}\alpha_{j\lambda}. $$

Letting ai = (αi1,…,αiM)T, we obtain \(L_{ij}^{R}=(a^{i})^{T}a^{j}\). Collecting the vectors ai in a matrix A = (a1,…,aM), it holds that LR = ATA. The definition of the coefficients αiν provides us with \(\tilde {V}a^{i}=e^{i}\), where ei denotes the i th unit vector and

$$ \tilde{V}=\left[\begin{array}{ccc} \hat{P}_{0}(\tau_{1}) & {\ldots} & \hat{P}_{M-1}(\tau_{1})\\ {\vdots} & & \vdots\\ \hat{P}_{0}(\tau_{M}) & {\ldots} & \hat{P}_{M-1}(\tau_{M}) \end{array}\right]. $$
(24)

This gives \(A=\tilde {V}^{-1}\).

\(V=\tilde {V}^{T}\) is a so-called Vandermonde-like matrix [15]. It is nonsingular under the condition (17) [41, Theorem 3.6.11]. In [15], representations and estimates of the condition number of such matrices with respect to the Frobenius norm are derived. In particular, [15, Table 1] shows impressively small condition numbers if the collocation nodes are chosen to be the zeros of \(\tilde {P}_{M}\), that is, the Gauss-Legendre nodes. Moreover, this condition number is optimal among all scalings of the Legendre polynomials [15]. A consequence of the Christoffel-Darboux formula is that the rows of \(\tilde {V}\) are orthogonal for Gauss-Legendre nodes. Thus, we have the representation \(\tilde {V}={\mathscr{D}}U\) with an orthogonal matrix U and a diagonal matrix \({\mathscr{D}}\) with positive diagonal entries.

It is known that the Gauss-Legendre nodes are not the very best set of nodes. However, a comparison of Tables 1 and 2 in [15] as well as [18, Table 4] indicates that the gain of choosing optimal nodes for Legendre polynomials compared to the choice of Gauss-Legendre nodes is rather minor.

In Table 3 we provide condition numbers of \(\tilde {V}\) with respect to the Euclidean norm for different choices of nodes. Note that the condition number of LR is the square of that of \(\tilde {V}\).

Table 3 Spectral condition numbers of the Vandermonde-like matrices for different node choices

The condition numbers for all Gauss-type and Chebyshev nodes are remarkably small.
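The orthogonality of the rows of \(\tilde{V}\) for Gauss-Legendre nodes, and the resulting small spectral condition numbers, can be verified directly. The following sketch uses our own helper and assumes the normalization \(\hat{P}_{\nu}=(2\nu+1)^{1/2}\tilde{P}_{\nu}\) from above:

```python
# Sketch (assumption: helper names are ours): build V~ from (24) with the
# normalized shifted Legendre polynomials at Gauss-Legendre nodes on [0, 1].
import numpy as np

def shifted_legendre_matrix(nodes):
    """V~[i, v] = Phat_v(tau_i), v = 0, ..., M-1, via the recursion."""
    M = len(nodes)
    x = 2.0 * np.asarray(nodes) - 1.0      # map [0, 1] -> [-1, 1]
    P = np.zeros((M, M))
    P[:, 0] = 1.0
    if M > 1:
        P[:, 1] = x
    for k in range(2, M):
        P[:, k] = ((2*k - 1) * x * P[:, k-1] - (k - 1) * P[:, k-2]) / k
    # normalization: Phat_v = (2v + 1)^(1/2) * shifted Legendre
    return P * np.sqrt(2.0 * np.arange(M) + 1.0)

M = 20
t, _ = np.polynomial.legendre.leggauss(M)
V = shifted_legendre_matrix(0.5 * (t + 1.0))

# Gauss-Legendre nodes: rows of V~ are orthogonal, so cond_2(V~) stays modest.
assert np.linalg.cond(V) < 10.0
```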

3.3 Computation of quadrature weights for general \(\boldsymbol {{\varPhi }}^{I}_{\pi ,M}\)

In order to apply \(\boldsymbol {{\varPhi }}_{\pi ,M}^{I}\) (15), a numerical quadrature formula is necessary. For standard node sequences (Gauss-Legendre, Gauss-Lobatto, Gauss-Radau), its computation has been described above. However, for general node sequences, the weights must be computed. This can be done following the derivations in [41, p. 175]: Let \(\hat {P}_{\nu }(\tau )\) denote the normalized shifted Legendre polynomials as before. In particular, it then holds that

$$ {{\int}_{0}^{1}}\hat{P}_{0}(\tau)\mathrm{d}\tau=1,{\quad{\int}_{0}^{1}}\hat{P}_{\nu}(\tau)\mathrm{d}\tau=0,\quad\nu=1,2,\ldots $$

For a given function qC[0,1], the integral is approximated by the integral of its polynomial interpolation. Using the representation (23) of the Lagrange basis functions we obtain

$$ \begin{array}{@{}rcl@{}} {{\int}_{0}^{1}}q(\tau)\mathrm{d}\tau & \approx & {{\int}_{0}^{1}}\sum\limits_{i=1}^{M}q(\tau_{i})\sum\limits_{\nu=1}^{M}\alpha_{i\nu}\hat{P}_{\nu-1}(\tau)\mathrm{d}\tau\\ & =&\sum\limits_{i=1}^{M}q(\tau_{i})\sum\limits_{\nu=1}^{M}\alpha_{i\nu}{{\int}_{0}^{1}}\hat{P}_{\nu-1}(\tau)\mathrm{d}\tau\\ & =&\sum\limits_{i=1}^{M}q(\tau_{i})\alpha_{i1}. \end{array} $$

Consequently, for the weights it holds γi = αi1, i = 1,…,M. The definition (23) shows that the vector γ = (γ1,…,γM)T of weights fulfills the linear system

$$ V\gamma=e^{1} $$

where \(V=\tilde {V}^{T}\)with \(\tilde {V}\) from (24) and e1 = (1,0,…,0)T is the first unit vector.

The discussion of the condition number of V shows that we can expect reliable and accurate results at least for reasonable node sequences.
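As a sanity check of the procedure above, the following sketch (helper names are ours) solves \(V\gamma=e^{1}\) for Gauss-Legendre nodes and compares the result with the classical Gauss weights shifted to [0,1]:

```python
# Sketch (assumption: helper names are ours): recover quadrature weights for
# a node sequence by solving V gamma = e^1 with V = V~^T from (24).
import numpy as np

def quadrature_weights(nodes):
    M = len(nodes)
    x = 2.0 * np.asarray(nodes) - 1.0
    P = np.zeros((M, M))
    P[:, 0] = 1.0
    if M > 1:
        P[:, 1] = x
    for k in range(2, M):
        P[:, k] = ((2*k - 1) * x * P[:, k-1] - (k - 1) * P[:, k-2]) / k
    Vtilde = P * np.sqrt(2.0 * np.arange(M) + 1.0)   # V~; V = V~^T
    e1 = np.zeros(M)
    e1[0] = 1.0
    return np.linalg.solve(Vtilde.T, e1)

M = 6
t, w = np.polynomial.legendre.leggauss(M)
gamma = quadrature_weights(0.5 * (t + 1.0))
assert np.allclose(gamma, 0.5 * w)   # shifted Gauss weights on [0, 1]
```

For Gauss-Legendre nodes, the interpolatory rule recovered this way is exactly the Gauss rule, so the weights agree with the shifted classical ones.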

For general node sequences, the weights may become negative. This happens, for example, for uniformly distributed nodes and M > 7 (Newton-Cotes formulae) [41, p. 148]. So, for \(\boldsymbol {{\varPhi }}_{\pi ,M}^{I}\), only node sequences leading to positive quadrature weights γi are admitted, in order to ensure that LI is positive definite.

4 Choice of basis functions for the ansatz space Xπ

The ansatz space Xπ (5) consists of piecewise polynomials having the degree N − 1 for the algebraic components and the degree N for the differentiated ones on each subinterval of the partition π (4). For collocation methods applied to boundary value problems for ordinary differential equations, the analogous question led to the choice of a Runge-Kutta basis for stability reasons (see [2]). This was later also used successfully for boundary value problems for index-1 DAEs [3, 31, 32, 34]. However, this ansatz makes heavy use of the collocation nodes, which are at the same time used as the nodes for the Runge-Kutta basis. In our case, the number M of collocation nodes and the degree N of the polynomials for the differentiated components do not coincide, since M > N, such that the reasoning applied in the case of ordinary differential equations does not transfer to the least-squares case.

Taking into account the computational expense for solving the discretized system, bases with local support are preferable. Ideally, the support of each basis function consists of only one subinterval of (4). Note that the Runge-Kutta basis has this property. We consider the Runge-Kutta basis and further local bases built from orthogonal polynomials. A drawback of this strategy is the fact that the continuity of the piecewise polynomials approximating the differentiated components must be ensured explicitly. This in turn leads to a discrete least-squares problem with equality constraints. Details can be found in Part 2 [24].

Looking for a local basis we turn to the reference interval [0,1]. Once a basis on this reference interval is available it can be defined on any subinterval (tj− 1,tj) by a simple linear transformation.

Assume that {p0,…,pN− 1} is a basis of the set of polynomials of degree less than N defined on the reference interval [0,1]. Then, a basis \(\{\bar {p}_{0},\ldots ,\bar {p}_{N}\}\) for the ansatz functions for the differentiated components is given by

$$ \bar{p}_{i}(\rho)=\left\{\begin{array}{ll} 1, & i=0,\\ {\int}_{0}^{\rho}p_{i-1}(\sigma)\mathrm{d}\sigma, & i=1,\ldots,N,\quad\rho\in[0,1], \end{array}\right. $$
(25)

and the transformation to the interval (tj− 1,tj) of the partition π (4) yields

$$ \begin{array}{@{}rcl@{}} p_{ji}(t) & =&p_{i}((t-t_{j-1})/h_{j}), \\ \bar{p}_{ji}(t) & =&h_{j}\bar{p}_{i}((t-t_{j-1})/h_{j}). \end{array} $$
(26)

Additionally to this transformation, the continuity of the piecewise polynomials must be ensured. This gives rise to the additional conditions

$$ \bar{p}_{ji}(t_{j})=\bar{p}_{j+1,i}(t_{j}),\quad i=0,\ldots,N,\quad j=1,\ldots,n-1, $$
(27)

which must be imposed explicitly.
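For illustration, the construction (25)-(26) can be sketched with numpy's polynomial objects. This is only a demonstration under our own naming; the implementations discussed below use Legendre and Chebyshev recursions instead of coefficient arrays:

```python
# Sketch (for illustration only): the integrated basis (25) and its
# transformation (26) for the shifted Legendre basis p_i on [0, 1].
import numpy as np
from numpy.polynomial import legendre as L

def integrated_basis(N):
    """Return callables pbar_0, ..., pbar_N from (25), where
    p_i(rho) = P_i(2 rho - 1), i = 0, ..., N-1 (shifted Legendre)."""
    pbar = [lambda rho: np.ones_like(np.asarray(rho, dtype=float))]
    for i in range(N):
        c = np.zeros(i + 1)
        c[i] = 1.0
        p = L.Legendre(c, domain=[0.0, 1.0])   # shifted Legendre on [0, 1]
        pbar.append(p.integ(lbnd=0.0))          # antiderivative vanishing at 0
    return pbar

N = 4
pbar = integrated_basis(N)
rho = 0.3
# pbar_1 integrates p_0 = 1, so pbar_1(rho) = rho:
assert abs(pbar[1](rho) - rho) < 1e-12
# transformation (26): pbar_ji(t) = h_j * pbar_i((t - t_{j-1}) / h_j)
t0, h = 0.25, 0.5
assert abs(h * pbar[1]((t0 + h * rho - t0) / h) - h * rho) < 1e-12
```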

4.1 The Runge-Kutta basis

In order to define the Runge-Kutta basis, let the N interpolation points ρi with (9) be given. Then, the Lagrange basis functions are chosen,

$$ p_{i}(\rho)=\frac{{\prod}_{\kappa\neq i+1}(\rho-\rho_{\kappa})}{{\prod}_{\kappa\neq i+1}(\rho_{i+1}-\rho_{\kappa})},\quad i=0,\ldots,N-1. $$

Remark 9

Note that the interpolation nodes are only used to define the local basis functions. Thus, their selection is completely independent of the choice of collocation nodes. In view of the estimates (10) and (11) and the argument given there, we prefer Gauss-Legendre interpolation nodes. This choice is also supported by Experiments 2 and 5 below.

The numerical computation of \(\bar {p}\) is more involved. If not precalculated, the integrals must be available in closed form. This can surely be done by expressing the Lagrange basis functions in the monomial representation such that the integration can be carried out analytically. Once these coefficients are known, the evaluation of the basis functions at a given ρ ∈ [0,1] is easily done using the Horner method. However, this approach amounts to the inversion of the Vandermonde matrix for the nodes (9). This matrix is known to be extremely ill-conditioned; in particular, its condition number grows exponentially with N [5, 19]. Therefore, an orthogonal basis might be better suited. This leads to a representation

$$ p_{i}(\rho)=\sum\limits_{\kappa=1}^{N}\alpha_{i\kappa}Q_{\kappa}(\rho) $$
(28)

for some polynomials Q1,…,QN. If these polynomials fulfill a three-term recursion, the evaluation of function values can be performed using the Clenshaw algorithm [13], which is only slightly more expensive than the Horner method. In order to use this approach, the integrals of p0,…,pN− 1 must be easily representable in terms of the chosen basis. Here, the Legendre and Chebyshev polynomials are well-suited (cf. Appendix A.1 and (30) as well as Appendix A.2 and (32)).
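For reference, here is a minimal sketch of the Clenshaw recurrence specialized to the Legendre three-term recursion \((k+1)P_{k+1}(x)=(2k+1)xP_{k}(x)-kP_{k-1}(x)\); the function name is ours:

```python
# Sketch (assumption: function name is ours): Clenshaw evaluation of a
# Legendre series sum_{k=0}^{n} c_k P_k(x).
def clenshaw_legendre(c, x):
    # recurrence coefficients: alpha_k = (2k+1)/(k+1) * x,
    #                          beta_{k+1} = -(k+1)/(k+2)
    b1 = b2 = 0.0
    n = len(c) - 1
    for k in range(n, 0, -1):
        b1, b2 = c[k] + (2*k + 1) / (k + 1) * x * b1 \
                      - (k + 1) / (k + 2) * b2, b1
    # with P_0 = 1, P_1 = x and beta_1 = -1/2:
    return c[0] + x * b1 - 0.5 * b2
```

The cost is one pass over the coefficients, comparable to the Horner method, but the evaluation stays tied to the well-conditioned orthogonal basis.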

4.2 Orthogonal polynomials

A reasonable choice for the basis is orthogonal polynomials. We will consider Legendre polynomials first. A motivation is provided in the following example.

Example 1

Consider the index-1 DAE

$$ x=q(t),\quad t\in[0,1]. $$

Let \(\{\hat {P}_{0},\ldots ,\hat {P}_{N-1}\}\) be the normalized shifted Legendre polynomials. Then letting \(x={\sum }_{i=1}^{N}\alpha _{i}\hat {P}_{i-1}\) for some vector α = (α1,…,αN)T, the least-squares functional

$$ \boldsymbol{{\varPhi}}(x)={{\int}_{0}^{1}}(x(t)-q(t))^{2}\mathrm{d}t $$

corresponding to this DAE is minimized for α = b, where b = (b1,…,bN)T with \(b_{i}={{\int \limits }_{0}^{1}}q(t)\hat {P}_{i-1}(t)\mathrm{d}t\); this is just the best approximation of the solution in \({H_{D}^{1}}((0,1),{\mathbb {R}})=L^{2}((0,1),{\mathbb {R}})\).

Similar relations hold for the differential equation \(x^{\prime }=f\) if the basis functions for the differentiated components are constructed according to (25). Hence, these basis functions seem to qualify well for index-1 DAEs.
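The identity claimed in Example 1 is easy to verify numerically: with an orthonormal basis, the weighted discrete least-squares solution coincides with the Fourier-Legendre coefficients. The following sketch uses our own naming, and the test function q is an arbitrary choice:

```python
# Numerical sketch of Example 1 (names and test function are ours): the
# least-squares minimizer reproduces b_i = int_0^1 q(t) Phat_{i-1}(t) dt.
import numpy as np

N = 5
q = lambda t: np.exp(-t) * np.sin(3.0 * t)    # arbitrary smooth q

# quadrature on [0, 1], exact for all polynomial products appearing below
t, w = np.polynomial.legendre.leggauss(40)
t = 0.5 * (t + 1.0)
w = 0.5 * w

# Phat_{i-1}(t) sampled at the quadrature nodes
Phat = np.zeros((len(t), N))
x = 2.0 * t - 1.0
Phat[:, 0] = 1.0
Phat[:, 1] = x
for k in range(2, N):
    Phat[:, k] = ((2*k - 1) * x * Phat[:, k-1] - (k - 1) * Phat[:, k-2]) / k
Phat *= np.sqrt(2.0 * np.arange(N) + 1.0)

b = Phat.T @ (w * q(t))                       # Fourier-Legendre coefficients
alpha, *_ = np.linalg.lstsq(np.sqrt(w)[:, None] * Phat,
                            np.sqrt(w) * q(t), rcond=None)
assert np.allclose(alpha, b)                  # minimizer = best approximation
```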

The necessary ingredients for the efficient implementation of the Legendre polynomials are collected in Appendix A.1.

Another common choice is Chebyshev polynomials of the first kind. They have been used extensively in the context of spectral methods because of their excellent approximation properties (cf. [11, 43], see also [10]). The relations used for their implementation can be found in Appendix A.2.

4.3 Comparison of different basis representations

The choice of the basis function representations is dominated by the question of obtaining a most robust implementation. The computational complexity of the representations presented above does not differ much, such that this aspect plays a minor role.

The check for robustness can be subdivided into two questions:

  1. Which representation is most robust locally?

  2. Which representation is most robust globally?

In the following experiments, N will be varied. The functional used is \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\). The number of collocation nodes is M = N + 1. Table 3 motivates the choice of the Gauss-Legendre nodes as collocation nodes. In order to compute the norms of \(L^{2}((0,1),{\mathbb {R}}^{m})\) and \({H_{D}^{1}}((0,1),{\mathbb {R}}^{m})\), Gaussian quadrature with N + 2 integration nodes on each subinterval of π is used.

4.3.1 Local behavior of the basis representations

In order to answer the first question, it is reasonable to experiment first with a higher index example that does not have any dynamic components (that is, l = 0) on a grid π consisting only of one subinterval (that is, n = 1). In that case, we check the ability to interpolate functions and to numerically differentiate them.

For n = 1, there are no continuity conditions (27) involved. Therefore, the discrete problem becomes a linear least-squares problem. We will solve it by a Householder QR factorization with column pivoting as implemented in the Eigen library.

The following example is used in [27, 28].

Example 2

$$ \begin{array}{@{}rcl@{}} x^{\prime}_{2}(t)+x_{1}(t) & =&q_{1}(t),\\ t\eta x^{\prime}_{2}(t)+x^{\prime}_{3}(t)+(\eta+1)x_{2}(t) & =&q_{2}(t),\\ t\eta x_{2}(t)+x_{3}(t) & =&q_{3}(t),\quad t\in [0,1]. \end{array} $$

This is an index-3 example with dynamical degree of freedom l = 0 such that no additional boundary or initial conditions are necessary for unique solvability. We choose the exact solution

$$ \begin{array}{@{}rcl@{}} x_{\ast,1}(t) & =&e^{-t}\sin t,\\ x_{\ast,2}(t) & =&e^{-2t}\sin t,\\ x_{\ast,3}(t) & =&e^{-t}\cos t \end{array} $$

and adapt the right-hand side q accordingly. For the exact solution, it holds \(\|x_{\ast }\|_{L^{2}((0,1),{\mathbb {R}}^{3})}\approx 0.673\), \(\|x_{\ast }\|_{L^{\infty }((0,1),{\mathbb {R}}^{3})}=1\), and \(\|x_{\ast }\|_{{H_{D}^{1}}((0,1),{\mathbb {R}}^{3})}\approx 1.11\).

Experiment 1

Robustness of the representation of the Runge-Kutta basis

In a first experiment we intend to clarify the differences between different representations of the Runge-Kutta basis. The interpolation nodes (9) have been fixed to be the Gauss-Legendre nodes (cf. (10)). The Runge-Kutta basis has been represented with respect to the monomial, Legendre, and Chebyshev bases. The results are shown in Fig. 1. This test indicates that the monomial basis is much less robust than the others for N > 10, while the Legendre and Chebyshev representations behave very similarly.

Fig. 1
figure 1

Error of the approximate solution in Experiment 1 measured in the norm of \({H_{D}^{1}}((0,1),{\mathbb {R}}^{m})\). The abbreviations (M) for the monomial basis, (L) for the Legendre basis, and (C) for the Chebyshev basis are used

Experiment 2

Robustness of the Runge-Kutta basis with respect to the node sequence

In this experiment we are interested in understanding the influence of the interpolation nodes. For that, we compared the uniform node sequence to the Gauss-Legendre and Chebyshev nodes. The uniform nodes are given by \(\rho _{i}=(i-\frac {1}{2})/N\). In accordance with the results of the previous experiment, the representation of the Runge-Kutta basis in Legendre polynomials has been chosen. The results are shown in Fig. 2. Not unexpectedly, uniform nodes are inferior to the other choices, at least for N > 13. On the other hand, there is no significant difference between Gauss-Legendre and Chebyshev nodes.

Fig. 2
figure 2

Error of the approximate solution in Experiment 2 measured in the norm of \({H_{D}^{1}}((0,1),{\mathbb {R}}^{m})\). The abbreviations (U) for uniform nodes, (L) for the Gauss-Legendre nodes, and (C) for the Chebyshev nodes are used

Experiment 3

Robustness of different polynomial representations

In this experiment we intend to compare the robustness of different bases. Therefore, we have chosen the Runge-Kutta basis with Gauss-Legendre interpolation nodes, the Legendre polynomials, and the Chebyshev polynomials. The results are shown in Fig. 3. All representations show similar behavior.

Fig. 3
figure 3

Error of the approximate solution in Experiment 3 measured in the norm of \({H_{D}^{1}}((0,1),{\mathbb {R}}^{m})\). The abbreviations (R) for the Runge-Kutta basis in Legendre representation, (L) for the Legendre basis, and (C) for the Chebyshev basis are used

A general note is in order. The exact solution has approximately the norm 1. The machine accuracy is εmach ≈ 2.22 × 10− 16 in all computations. The best accuracy obtained is approximately 10− 12. Considering that there is a twofold differentiation involved in the problem of the example we would expect a much lower accuracy. This surprising behavior has also been observed in other experiments and when using the norms of \(L^{2}((0,1),{\mathbb {R}}^{m})\) and \(L^{\infty }((0,1),{\mathbb {R}}^{m})\).

The next example is an index-3 problem with l = 4 dynamical degrees of freedom. It is the linearized version of an example presented in [6] that has also been considered in [27].

Example 3

Consider the DAE

$$ A(Dx)'(t)+B(t)x(t)=q(t),\quad t\in[0,5], $$

where

$$ \begin{array}{@{}rcl@{}} A=\begin{bmatrix} 1&0&0&0&0&0\\ 0&1&0&0&0&0\\ 0&0&1&0&0&0\\ 0&0&0&1&0&0\\ 0&0&0&0&1&0\\ 0&0&0&0&0&1\\ 0&0&0&0&0&0 \end{bmatrix}, D=\begin{bmatrix} 1&0&0&0&0&0&0\\ 0&1&0&0&0&0&0\\ 0&0&1&0&0&0&0\\ 0&0&0&1&0&0&0\\ 0&0&0&0&1&0&0\\ 0&0&0&0&0&1&0 \end{bmatrix}, \end{array} $$

and the smooth coefficient matrix

$$ \begin{array}{@{}rcl@{}} B(t)&=& \begin{bmatrix} 0&0&0&-1&0&0&0\\ 0&0&0&0&-1&0&0\\ 0&0&0&0&0&-1&0\\ 0&0&\sin t&0&1&-\cos t&-2\rho \cos^{2}t\\ 0&0&-\cos t&-1&0&-\sin t&-2\rho \sin t\cos t\\ 0&0&1&0&0&0&2\rho \sin t\\ 2\rho \cos^{2}t&2\rho \sin t \cos t&-2\rho\sin t&0&0&0&0 \end{bmatrix},\\ \rho&=&5, \end{array} $$

subject to the initial conditions

$$ x_{2}(0)=1,\quad x_{3}(0)=2,\quad x_{5}(0)=0,\quad x_{6}(0)=0. $$

This problem has the tractability index 3 and dynamical degree of freedom l = 4. The right-hand side q has been chosen in such a way that the exact solution becomes

$$ \begin{array}{@{}rcl@{}} x_{\ast,1} &=& \sin t, \qquad\quad\quad\quad x_{\ast,4} = \cos t, \\ x_{\ast,2} &=& \cos t, \quad\qquad\quad\quad x_{\ast,5} = -\sin t, \\ x_{\ast,3} &=& 2\cos^{2} t, \quad\qquad\quad x_{\ast,6} = -2\sin 2t, \\ x_{\ast,7} &=& -\rho^{-1}\sin t. \end{array} $$

For the exact solution, it holds \(\|x_{\ast }\|_{L^{2}((0,5),{\mathbb {R}}^{7})}\approx 5.2\), \(\|x_{\ast }\|_{L^{\infty }((0,5),\mathbb {R}^{7})}=2\), and \(\|x_{\ast }\|_{{H_{D}^{1}}((0,5),\mathbb {R}^{7})}\approx 9.4\).

The following experiments with Example 3 are carried out under the same conditions as before when using Example 2.

Experiment 4

Robustness of the representation of the Runge-Kutta basis

In this experiment we intend to clarify the differences between different representations of the Runge-Kutta basis. The interpolation points have been fixed to be the Gauss-Legendre nodes. The Runge-Kutta basis has been represented with respect to the monomial, Legendre, and Chebyshev bases. The results are shown in Fig. 4. This test indicates that the monomial basis is much less robust than the others for N > 15, while the Legendre and Chebyshev representations behave very similarly.

Fig. 4
figure 4

Error of the approximate solution in Experiment 4 measured in the norm of \({H_{D}^{1}}((0,1),{\mathbb {R}}^{m})\). The abbreviations (M) for the monomial basis, (L) for the Legendre basis, and (C) for the Chebyshev basis are used

Experiment 5

Robustness of the Runge-Kutta basis with respect to the node sequence

In this experiment we are interested in understanding the influence of the interpolation nodes. For that, we compared the uniform node sequence to the Gauss-Legendre and Chebyshev nodes. The uniform nodes are given by \(\rho _{i}=(i-\frac {1}{2})/N\). In accordance with the results of the previous experiment, the representation of the Runge-Kutta basis in Legendre polynomials has been chosen. The results are shown in Fig. 5. Not unexpectedly, uniform nodes are inferior to the other choices, at least for N > 20. However, there is no real difference between Gauss-Legendre and Chebyshev nodes.

Fig. 5
figure 5

Error of the approximate solution in Experiment 5 measured in the norm of \({H_{D}^{1}}((0,1),{\mathbb {R}}^{m})\). The abbreviations (U) for uniform nodes, (L) for the Gauss-Legendre nodes, and (C) for the Chebyshev nodes are used

Experiment 6

Robustness of different polynomial representations

In this experiment we intend to compare the robustness of different bases. Therefore, we have chosen the Runge-Kutta basis with Gauss-Legendre interpolation nodes, the Legendre polynomials, and the Chebyshev polynomials. The results are shown in Fig. 6. All representations show similar behavior.

Fig. 6
figure 6

Error of the approximate solution in Experiment 6 measured in the norm of \({H_{D}^{1}}((0,1),{\mathbb {R}}^{m})\). The abbreviations (R) for the Runge-Kutta basis in Legendre representation, (L) for the Legendre basis, and (C) for the Chebyshev basis are used

As a conclusion, we can see that the results of Experiments 1–3 and 4–6 are largely consistent.

4.3.2 Global behavior of the basis representations

We are interested in understanding the global error, which corresponds to error propagation in the case of initial value problems. In order to understand the error propagation properties, we will investigate the accuracy of the computed solution with respect to an increasing number of subintervals n. This motivates the use of a rather low polynomial degree N. In the previous section we observed that, for low degrees N of the ansatz polynomials, there is no difference in the local properties between different basis representations.

In the following experiments, the functionals used are \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\) and \(\boldsymbol {{\varPhi }}_{\pi ,M}^{C}\). The number of collocation nodes is again M = N + 1. The basis functions are the shifted Legendre polynomials.

The discrete problem for n > 1 is an equality-constrained linear least-squares problem. The equality constraints consist just of the continuity requirements for the differentiated components of the elements in Xπ. The problem is solved by a direct method as described in Part 2 [24]. In short, the equality constraints are eliminated by a sparse QR decomposition with column pivoting as implemented in the code SPQR [8]. The resulting least-squares problem is then solved by the same code.

Experiment 7

Influence of selection of collocation nodes, approximation degree N, and number n of subintervals.

In this experiment, we use Example 3 and vary the choice of collocation nodes as well as the degree N of the polynomial basis and the number n of subintervals. We compare Gauss-Legendre, Radau IIA and Lobatto collocation nodes. Since this example is a pure initial value problem, the use of the Radau IIA collocation nodes is especially justified. The results using \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\) are collected in Table 4, those using \(\boldsymbol {{\varPhi }}_{\pi ,M}^{C}\) in Table 5. We observe no real difference between the different sets of collocation points. The results seem to confirm the conjecture that, in case of smooth problems, a higher degree N is preferable over a larger n or, equivalently, a smaller stepsize h. In addition, for the highest degree polynomials (N = 20), the use of \(\boldsymbol {{\varPhi }}_{\pi ,M}^{C}\) seems to produce more accurate results than that of \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\).

Table 4 Error of the approximate solution using Legendre basis functions and Gauss-Legendre (G), Radau IIA (R), and Lobatto (L) collocation nodes when using the functional \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\) in Example 3. The norm is that of \({H_{D}^{1}}((0,5),{\mathbb {R}}^{7})\)
Table 5 Error of the approximate solution using Legendre basis functions and Gauss-Legendre (G), Radau IIA (R), and Lobatto (L) collocation nodes when using the functional \(\boldsymbol {{\varPhi }}_{\pi ,M}^{C}\) in Example 3. The norm is that of \({H_{D}^{1}}((0,5),{\mathbb {R}}^{7})\)

5 Final remarks and conclusions

In summary, in the present paper, we investigated questions related to an efficient and reliable realization of a least-squares collocation method. These questions are particularly important since a higher index DAE is an essentially ill-posed problem in naturally given spaces, which is why we must be prepared for highly sensitive discrete problems. In order to obtain an overall procedure that is as robust as possible, we provided criteria which led to a robust selection of the collocation points and of the basis functions, whereby the latter is also useful for the shape of the resulting discrete problem. Additionally, a number of new, more detailed, error estimates have been given that support some of the design decisions. The following particular items are worth highlighting in this context:

  • The basis for the approximation space should be appropriately shifted and scaled orthogonal polynomials. We could not observe any larger differences between the behavior of Legendre and Chebyshev polynomials.

  • The collocation points should be chosen to be the Gauss-Legendre, Lobatto, or Radau nodes. This leads to discrete problems whose conditioning using the discretization by interpolation (\(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}\)) is not much worse than that resembling collocation methods for ordinary differential equations (\(\boldsymbol {{\varPhi }}_{\pi ,M}^{C}\)). A particularly efficient and stable implementation is obtained if Gauss-Legendre or Radau nodes are used since, in this case, diagonal weighting (\(\boldsymbol {{\varPhi }}_{\pi ,M}^{I}\)) coincides with the interpolation approach.

  • A critical ingredient for the implementation of the method is the algorithm used for the solution of the constrained linear least-squares problems. Given the expected bad conditioning of the least-squares problem, a QR factorization with column pivoting must lie at the heart of the algorithm. At the same time, the sparsity structure must be exploited as well as possible. This issue will be discussed in Part 2.

  • It seems as if, for problems with a smooth solution, a higher degree N of the ansatz polynomials with a low number of subintervals n in the mesh is preferable over a smaller degree with a larger number of subintervals with respect to accuracy. Some first theoretical justification has been provided for this claim.

  • So far, in all experiments of this and previously published papers, we did not observe any serious differences in the accuracy obtained depending on the choice of M > N for fixed n. The results for M = N + 1 are not much different from those obtained for a larger M.

  • While superconvergence in classical collocation for ODEs and index-1 DAEs is a very favorable phenomenon, we could not find anything analogous in our experiments.

  • The simple collocation procedure using \(\boldsymbol {{\varPhi }}_{\pi ,M}^{C}\) performs surprisingly well. In fact, the results are, in our experiments, on par with those using \(\boldsymbol {{\varPhi }}_{\pi ,M}^{R}=\boldsymbol {{\varPhi }}_{\pi ,M}^{I}\). However, we have no theoretical justification for this as yet.

  • Our method is designed for variable grids. However, so far we have only worked with constant step size. In order to be able to adapt the grid and the polynomial degree, or even select appropriate grids, it is important to understand the structure of the error, that is, how the global error depends on local errors. This is a very important open problem, for which we have no solution yet.

In conclusion, we note that earlier implementations, among others the one from the very first paper on this subject [28], which started from proven ingredients for ODE codes, are, from today's point of view and experience, rather crude versions of the least-squares collocation. Nevertheless, the test results computed with them were already very impressive. This strengthens our belief that a careful implementation of the method will give rise to a very efficient solver for higher index DAEs.