1 Introduction

In a series of papers [1,2,3,4], we have developed a new method for solving higher-index differential-algebraic equations (DAEs). In naturally given functional-analytic settings, higher-index DAEs give rise to ill-posed problems [5, Section 3.9; 6]. Motivated by the well-known method of least-squares, or discretization on the preimage space, for the approximation of ill-posed problems [7], this approach has been adapted to the case of higher-index DAEs. In particular, the ansatz spaces for the discrete least-squares problem have been chosen to be piecewise polynomials. Additionally, the integrals have been replaced by discrete versions based on simplified integration rules, in the simplest approach by a version resembling well-known collocation methods for solving boundary value problems for systems of ordinary differential equations (ODEs). The latter, extremely simplified version of the approach proposed in [7] has been motivated by the success of collocation methods for ODEs. This connection led us to coin the term least-squares collocation method and to call the integration nodes collocation points.

For our method, a number of convergence results for both linear and nonlinear DAEs have been proven. Even our first attempts showed surprisingly accurate results when applying the method to some linear examples [1]. More recently, we investigated the algorithmic ingredients of the method in more detail [8, 9]. Not surprisingly, the basis representation and the choice of the integration nodes have an important influence on the accuracy of the method.

The present note is intended to further quantify the conditioning of the individual ingredients of the implementation of the proposed method and to better understand the (high) accuracy of the computational results obtained so far. Taking the ill-posedness of higher-index DAEs into account, we expect very sensitive discrete problems for sufficiently fine discretizations.

The practical implementation of a projection method consists of two steps for a given approximation space Xπ: the choice of a basis, and the formulation and solution of the arising discrete system by a suitable method. This in turn gives rise to two different operators, the first being the representation map connecting the elements \(x\in X_{\pi}\) with their vectors of coefficients with respect to the chosen basis. The other operator is the discrete version of the least-squares collocation method, which becomes a linearly equality-constrained linear least-squares problem in our case. Both operators are investigated in detail, both analytically and numerically.

In particular, qualitative and quantitative estimations for the condition numbers and norms of the representation map are proven for bases whose usefulness in the present applications has been established earlier [8, 9].

For the constrained linear least-squares problem, a number of perturbation results are well-known, e.g., [10,11,12]. However, in the present application, the constraints play a special role: In the usual choices of the basis functions, some coefficient vectors do not represent a function in the approximation space. A coefficient vector represents a function in the approximation space if and only if the constraints are fulfilled. Therefore, a new error estimation is derived, which takes care of the exceptional role of the constraints. The important ingredients in this estimate are the condition number of the constraints and a restricted condition number for the least-squares functional. For the former, a complete analytical characterization for the chosen bases is provided. In a number of numerical examples, values for the restricted condition number are presented.

In Section 2, the least-squares method for approximating linear DAEs is introduced and the representation map is constructed. Section 3 is devoted to an in-depth investigation of the representation map. Then we derive a perturbation result for constrained linear least-squares problems in Section 4. Numerical examples for the condition numbers of the different ingredients are given in Section 5. Section 6 contains some conclusions.

2 The problem setting

2.1 The discrete functional

In this section, we repeat the problem setting from [8] for the reader’s convenience. Consider a linear boundary-value problem for a DAE with properly involved derivative,

$$ \begin{array}{@{}rcl@{}} A(t)(Dx)'(t)+B(t)x(t) & =&q(t),\quad t\in[a,b], \end{array} $$
(1)
$$ \begin{array}{@{}rcl@{}} G_{a}x(a)+G_{b}x(b) & =&d, \end{array} $$
(2)

with \([a,b]\subset \mathbb {R}\) being a compact interval, \(D=[I\ 0]\in \mathbb {R}^{k\times m}\), k < m, with the identity matrix \(I\in \mathbb {R}^{k\times k}\). Furthermore, \(A(t)\in \mathbb {R}^{m\times k}\), \(B(t)\in \mathbb {R}^{m\times m}\), and \(q(t)\in \mathbb {R}^{m}\) are assumed to be sufficiently smooth with respect to t ∈ [a,b]. Moreover, \(G_{a},G_{b}\in \mathbb {R}^{l_{{dyn}}\times m}\). Here, ldyn is the dynamical degree of freedom of the DAE, that is, the number of free parameters that can be fixed by initial and boundary conditions. We assume further that \(\ker D\subseteq \ker G_{a}\) and \(\ker D\subseteq \ker G_{b}\).

Unlike regular ODEs, where ldyn = k = m, for DAEs it holds that 0 ≤ ldyn ≤ k < m; in particular, ldyn = k for index-one DAEs, ldyn < k for higher-index DAEs, and ldyn = 0 can certainly happen.

The appropriate space for looking for solutions of (1)–(2) is (cf. [2])

$$ {H_{D}^{1}}(a,b):=\{x\in L^{2}((a,b),\mathbb{R}^{m}):Dx\in H^{1}((a,b),\mathbb{R}^{k})\}. $$

Let \(\mathfrak {P}_{K}\) denote the set of all polynomials of degree less than or equal to K ≥ 0. Given the partition π,

$$ \pi:\quad a=t_{0}<t_{1}<\cdots<t_{n}=b, $$
(3)

with the stepsizes \(h_{j}=t_{j}-t_{j-1}\), \(h=\max \limits _{1\leq j\leq n}h_{j}\), and \(h_{\min \limits }=\min \limits _{1\leq j\leq n}h_{j}\). Let \(C_{\pi }([a,b],\mathbb {R}^{m})\) denote the space of piecewise continuous functions having breakpoints merely at the meshpoints of the partition π. Let N ≥ 1 be a fixed integer. We are looking for an approximate solution of our boundary value problem from the ansatz space \(X_{\pi }\subset {H_{D}^{1}}(a,b)\),

$$ \begin{array}{@{}rcl@{}} X_{\pi} &=&\{x\in C_{\pi}([a,b],\mathbb{R}^{m}):Dx\in C([a,b],\mathbb{R}^{k}), \\ && \quad x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\mathfrak{P}_{N}, \kappa=1,\ldots,k,\\ && \quad x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\mathfrak{P}_{N-1}, \kappa=k+1,\ldots,m, j=1,\ldots,n\}. \end{array} $$
(4)

The continuous version of the least-squares method reads: Find an \(x_{\pi }\in X_{\pi }\) that minimizes the functional

$$ {\Phi}(x)={{\int}_{a}^{b}}\lvert A(t)(Dx)'(t)+B(t)x(t)-q(t)\rvert^{2}{d} t+\lvert G_{a}x(a)+G_{b}x(b)-d\rvert^{2}. $$
(5)

Here and in the following, \(\lvert \cdot \rvert \) denotes the Euclidean norm in the corresponding spaces \(\mathbb {R}^{\alpha }\) for the appropriate α. Let 〈⋅,⋅〉 denote the scalar product in \(\mathbb {R}^{\alpha }\).

The functional values Φ(x), which are needed when minimizing over \(x\in X_{\pi }\), cannot be evaluated exactly, and the integral must be discretized accordingly. Taking into account that the boundary-value problem is ill-posed in the higher-index case, perturbations of the functional may have a serious influence on the error of the approximate least-squares solution or even prevent convergence towards the exact solution. Therefore, careful approximations of the integral in Φ are required. We take over the options provided in [8], in which \(M\geq N+1\) so-called collocation points

$$ 0\leq\rho_{1}<\cdots<\rho_{M}\leq1. $$
(6)

are used, and further, on the subintervals of the partition π,

$$ t_{ji}=t_{j-1}+\rho_{i}h_{j},\quad i=1,\ldots,M, j=1,\ldots,n. $$

Introducing, for each \(x\in X_{\pi }\) and \(w(t)=A(t)(Dx)^{\prime }(t)+B(t)x(t)-q(t)\), the corresponding vector \(W\in \mathbb {R}^{mMn}\) by

$$ W=\left[\begin{array}{c} W_{1}\\ \vdots\\ W_{n} \end{array}\right]\in\mathbb{R}^{mMn},\quad W_{j}=h_{j}^{1/2}\left[\begin{array}{c} w(t_{j1})\\ \vdots\\ w(t_{jM}) \end{array}\right]\in\mathbb{R}^{mM}, $$
(7)

we turn to an approximate functional of the form

$$ \begin{array}{@{}rcl@{}} {\Phi}_{\pi,M}(x)=W^{T}\mathcal{L}W+\lvert G_{a}x(a)+G_{b}x(b)-d\rvert^{2},\quad x\in X_{\pi}, \end{array} $$
(8)

with a positive definite symmetric matrix

$$ \begin{array}{@{}rcl@{}} \mathcal{L}={\text{diag}}(L\otimes I_{m},\ldots,L\otimes I_{m}). \end{array} $$
(9)

As detailed in [8], we have different options for the positive definite symmetric matrix \(L\in \mathbb {R}^{M\times M}\), namely

$$ \begin{array}{@{}rcl@{}} L & =&L^{C}=M^{-1}I_{M}, \end{array} $$
(10)
$$ \begin{array}{@{}rcl@{}} L & =&L^{I}={\text{diag}}(\gamma_{1},\ldots,\gamma_{M}), \end{array} $$
(11)
$$ \begin{array}{@{}rcl@{}} L & =&L^{R}=(V^{-1})^{T}V^{-1}, \end{array} $$
(12)

see [8, Section 3] for details concerning the selection of the quadrature weights γ1,…,γM and the construction of the mass matrix V. We emphasize that the matrices LC,LI,LR depend only on M, the node sequence (6), and the quadrature weights, but do not depend on the partition π and its stepsizes at all.
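For illustration, the following minimal Python sketch (ours, not taken from [8]) assembles \(L^{C}\), \(L^{I}\), and the block matrix \({\mathscr{L}}\) of (9), assuming Gauss-Legendre nodes mapped to [0,1] as one admissible choice of the collocation points (6); the version \(L^{R}\) (12) is omitted since it requires the mass matrix V constructed in [8].

```python
import numpy as np

def collocation_rule(M):
    """One admissible choice for (6): Gauss-Legendre nodes mapped to [0,1].
    Returns nodes rho_1 < ... < rho_M and the corresponding weights gamma_i."""
    x, w = np.polynomial.legendre.leggauss(M)     # nodes/weights on [-1,1]
    return (x + 1.0) / 2.0, w / 2.0

def weight_matrices(M, m, n):
    """Assemble L^C (10), L^I (11), and cal_L = diag(L x I_m, ...) of (9)."""
    rho, gamma = collocation_rule(M)
    L_C = np.eye(M) / M                           # (10): L^C = M^{-1} I_M
    L_I = np.diag(gamma)                          # (11): diag(gamma_1,...,gamma_M)
    cal_L = np.kron(np.eye(n), np.kron(L_I, np.eye(m)))  # one block per subinterval
    return L_C, L_I, cal_L

L_C, L_I, cal_L = weight_matrices(M=4, m=3, n=5)
print(cal_L.shape)                                # (n*M*m, n*M*m) = (60, 60)
```

Note that, consistent with the remark above, none of these matrices involves the stepsizes of the partition π.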

In the context of the numerical experiments below, we denote each of the different versions of the functional by \({\Phi }_{\pi ,M}^{C}\), \({\Phi }_{\pi ,M}^{I}\), and \({\Phi }_{\pi ,M}^{R}\), respectively. The following convergence result is known [8, Theorem 2]:

Theorem 1

Let the DAE (1) be regular with index \(\mu \in \mathbb {N}\) and let the boundary condition (2) be accurately stated. Let \(x_{\ast }\) be a solution of the boundary value problem (1)–(2), and let A, B, q and also \(x_{\ast }\) be sufficiently smooth.

Let all partitions π be such that \(h/h_{\min \limits }\leq \rho \), with a global constant ρ. Then, with

$$ M\geq N+\mu, $$

the following statements are true:

  (1)

    For sufficiently fine partitions π and each sequence of arbitrarily placed nodes (6), there exists exactly one \(x_{\pi }^{R}\in X_{\pi }\) minimizing the functional \({\Phi }_{\pi ,M}^{R}\) on Xπ, and

    $$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{R}-x_{\ast}\|_{{H_{D}^{1}}(a,b)}\leq C_{R}h^{N-\mu+1}. \end{array} $$
  (2)

    For each integration rule related to the interval [0,1], with M nodes (6) and positive weights γ1,…,γM, that is exact for polynomials of degree less than or equal to 2M − 2, and for sufficiently fine partitions π, there exists exactly one \(x_{\pi }^{I}\in X_{\pi }\) minimizing the functional \({\Phi }_{\pi ,M}^{I}\) on Xπ, and \(x_{\pi }^{I}=x_{\pi }^{R}\), thus

    $$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{I}-x_{\ast}\|_{{H_{D}^{1}}(a,b)}\leq C_{R}h^{N-\mu+1}. \end{array} $$

A corresponding result for \({\Phi }_{\pi ,M}^{C}\) is not known. Numerical tests showed excellent convergence even in cases not covered by Theorem 1. This holds in particular for any \(M\geq N+1\) tested, in all three cases of the functional Φπ,M. Thus, M = N + 1 seems to be the preferable choice.

2.2 A basis representation of Φπ,M

By choosing an appropriate basis for Xπ, the minimization of the functional (8) is reduced to a minimization problem for the coefficients of the elements \(x\in X_{\pi }\). For the subsequent considerations, it is appropriate to introduce the space

$$ \begin{array}{@{}rcl@{}} \tilde{X}_{\pi} &=&\{x\in C_{\pi}([a,b],\mathbb{R}^{m}): x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\mathfrak{P}_{N}, \kappa=1,\ldots,k,\\ && \quad x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\mathfrak{P}_{N-1}, \kappa=k+1,\ldots,m, j=1,\ldots,n\}. \end{array} $$
(13)

In particular, the elements x of \(\tilde {X}_{\pi }\) are no longer required to have continuous components Dx. Obviously, it holds \(X_{\pi }\subseteq \tilde {X}_{\pi }\). In general, \(\tilde {X}_{\pi }\) is not a subspace of \({H_{D}^{1}}(a,b)\). However, it holds

$$ \begin{array}{@{}rcl@{}} X_{\pi} & =&\{x\in\tilde{X}_{\pi}:x_{\kappa}\in C[a,b],\quad\kappa=1,\ldots,k\}\\ & =&\tilde{X}_{\pi}\cap {H_{D}^{1}}(a,b). \end{array} $$

Based on the analysis in [8, Section 4], we provide a basis of the ansatz space \(\tilde {X}_{\pi }\) to begin with. Assume that {p0,…,pN− 1} is a basis of \(\mathfrak {P}_{N-1}\) defined on the reference interval [0,1]. Then, \(\{\bar {p}_{0},\ldots ,\bar {p}_{N}\}\) given by

$$ \bar{p}_{i}(\tau)=\begin{cases} 1, & i=0,\\ {\int}_{0}^{\tau}p_{i-1}(\sigma)\mathrm{d}\sigma, & i=1,\ldots,N,\quad\tau\in[0,1], \end{cases} $$
(14)

form a basis of \(\mathfrak {P}_{N}\). The transformation to the interval (tj− 1,tj) of the partition π (3) yields

$$ \begin{array}{@{}rcl@{}} p_{ji}(t)=p_{i}((t-t_{j-1})/h_{j}),\quad\bar{p}_{ji}(t)=h_{j}\bar{p}_{i}((t-t_{j-1})/h_{j}). \end{array} $$
(15)

and in particular

$$ \begin{array}{@{}rcl@{}} \bar{p}_{ji}(t_{j-1}) & =&h_{j}\bar{p}_{i}(0)=h_{j}\begin{cases} 1, & i=0,\\ 0, & i=1,\ldots,N, \end{cases}\\ \bar{p}_{ji}(t_{j}) & =&h_{j}\bar{p}_{i}(1)=h_{j}\begin{cases} 1, & i=0,\\ {{\int}_{0}^{1}}p_{i-1}(\sigma)\mathrm{d}\sigma, & i=1,\ldots,N. \end{cases} \end{array} $$

Next, we form the matrix functions

$$ \begin{array}{@{}rcl@{}} \bar{\mathcal{P}}_{j}=\begin{bmatrix}\bar{p}_{j0} & {\ldots} & \bar{p}_{jN}\end{bmatrix}:[t_{j-1},t_{j}]\rightarrow\mathbb{R}^{1\times(N+1)},\quad\mathcal{P}_{j}=\begin{bmatrix}p_{j0} & {\ldots} & p_{j,N-1}\end{bmatrix}:[t_{j-1},t_{j}]\rightarrow\mathbb{R}^{1\times N}, \end{array} $$

such that

$$ \begin{array}{@{}rcl@{}} \bar{\mathcal{P}}_{j}(t_{j-1}) & =&h_{j}\begin{bmatrix}1 & 0 & {\ldots} & 0\end{bmatrix},\quad j=1,\ldots,n, \end{array} $$
(16)
$$ \begin{array}{@{}rcl@{}} \bar{\mathcal{P}}_{j}(t_{j}) & =&h_{j}\begin{bmatrix}1 & {{\int}_{0}^{1}}p_{0}(\sigma)\mathrm{d}\sigma & {\ldots} & {{\int}_{0}^{1}}p_{N-1}(\sigma)\mathrm{d}\sigma\end{bmatrix},\quad j=1,\ldots,n. \end{array} $$
(17)

Following the discussions in [8], the following bases are suitable in applications:

Legendre basis:

Let Pi denote the Legendre polynomials. Then, pi is chosen to be the shifted Legendre polynomial, that is

$$ p_{i}(\tau)=P_{i}(2\tau-1),\quad i=0,1,\ldots. $$
Modified Legendre basis:

In this case, we set

$$ \bar{p}_{0}(\tau)=1,\quad\bar{p}_{i}(\tau)=P_{i}(2\tau-1)-(-1)^{i},\quad i=1,2,\ldots, $$

such that \(p_{i}=\bar {p}_{i+1}^{\prime }\), i = 0,1,…. This basis has not been considered in [8], but later experiments indicated its usefulness. This is supported by the considerations below.

Chebyshev basis:

Let Ti denote the Chebyshev polynomials of the first kind. Then we define

$$ p_{i}(\tau)=T_{i}(2\tau-1),\quad i=0,1,\ldots. $$
Runge-Kutta basis:

Let 0 < τ1 < ⋯ < τN < 1 be interpolation nodes. Then we set

$$ p_{i}(\tau)=\frac{{\prod}_{\kappa\neq i+1}(\tau-\tau_{\kappa})}{{\prod}_{\kappa\neq i+1}(\tau_{i+1}-\tau_{\kappa})}. $$
(18)

The latter are the usual Lagrange interpolation polynomials. In the implementation, it is advantageous to represent these polynomials in terms of Chebyshev polynomials [8]. The Runge-Kutta basis is particularly useful if the shifted Chebyshev nodes \(\tau _{\kappa }=\frac {1}{2}\left (1+\cos \limits \left (\frac {2\kappa -1}{2N}\pi \right )\right )\) are chosen as interpolation nodes.
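As a small illustration (ours, not from [8]), the following Python sketch constructs the shifted Legendre basis, the antiderivative basis (14), and the modified Legendre basis, and checks the defining relations \(\bar {p}_{i}(0)=0\) and \(\bar {p}_{i}^{\prime }=p_{i-1}\) numerically:

```python
import numpy as np
from numpy.polynomial import Legendre, Polynomial

def p_legendre(i):
    """Shifted Legendre polynomial p_i(tau) = P_i(2 tau - 1) on [0,1]."""
    return Legendre.basis(i, domain=[0, 1]).convert(kind=Polynomial)

def pbar(i):
    """bar{p}_i of (14): 1 for i = 0, else the antiderivative of p_{i-1}."""
    return Polynomial([1.0]) if i == 0 else p_legendre(i - 1).integ(lbnd=0)

def pbar_modified(i):
    """Modified Legendre basis: bar{p}_i(tau) = P_i(2 tau - 1) - (-1)^i."""
    return Polynomial([1.0]) if i == 0 else p_legendre(i) - (-1.0) ** i

tau = np.linspace(0.0, 1.0, 9)
for i in range(1, 5):
    assert abs(pbar(i)(0.0)) < 1e-12                        # bar{p}_i(0) = 0
    assert np.allclose(pbar(i).deriv()(tau), p_legendre(i - 1)(tau))
    assert abs(pbar_modified(i)(0.0)) < 1e-12               # also vanishes at 0
```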

For \(x\in \tilde {X}_{\pi }\) we use the notations

$$ \begin{array}{@{}rcl@{}} x(t)=x_{j}(t)=\begin{bmatrix}x_{j1}(t)\\ \vdots\\ x_{jm}(t) \end{bmatrix}\in\mathbb{R}^{m},\quad Dx_{j}(t)=\begin{bmatrix}x_{j1}(t)\\ \vdots\\ x_{jk}(t) \end{bmatrix}\in\mathbb{R}^{k},\quad t\in[t_{j-1},t_{j}). \end{array} $$

Then, we expand each xj componentwise

$$ \begin{array}{@{}rcl@{}} x_{j\kappa}(t) & =&\sum\limits_{l=0}^{N}c_{j\kappa l}\bar{p}_{jl}(t)=\bar{\mathcal{P}}_{j}(t)c_{j\kappa},\quad\kappa=1,\ldots,k,\\ x_{j\kappa}(t) & =&\sum\limits_{l=0}^{N-1}c_{j\kappa l}p_{jl}(t)=\mathcal{P}_{j}(t)c_{j\kappa},\quad\kappa=k+1,\ldots,m. \end{array} $$
(19)

with

$$ \begin{array}{@{}rcl@{}} c_{j\kappa}=\begin{bmatrix}c_{j\kappa0}\\ \vdots\\ c_{j\kappa N} \end{bmatrix}\in\mathbb{R}^{N+1},\quad\kappa=1,\ldots,k,\quad c_{j\kappa}=\begin{bmatrix}c_{j\kappa0}\\ \vdots\\ c_{j\kappa,N-1} \end{bmatrix}\in\mathbb{R}^{N},\quad\kappa=k+1,\ldots,m. \end{array} $$

Introducing still

$$ \begin{array}{@{}rcl@{}} {\Omega}_{j}(t)=\left[\begin{array}{cc} I_{k}\otimes\bar{\mathcal{P}}_{j}(t) & \mathcal{O}_{1}\\ \mathcal{O}_{2} & I_{m-k}\otimes\mathcal{P}_{j}(t) \end{array}\right]\in\mathbb{R}^{m\times(mN+k)},\quad c_{j}=\begin{bmatrix}c_{j1}\\ \vdots\\ c_{jm} \end{bmatrix}\in\mathbb{R}^{mN+k}, \end{array} $$

with \(\mathcal {O}_{1}\in \mathbb {R}^{k\times (m-k)N}\) and \(\mathcal {O}_{2}\in \mathbb {R}^{(m-k)\times k(N+1)}\) being matrices having only zero entries, we represent, for \(t\in [t_{j-1},t_{j})\), j = 1,…,n,

$$ \begin{array}{@{}rcl@{}} x_{j}(t) & =&{\Omega}_{j}(t)c_{j}, \end{array} $$
(20)
$$ \begin{array}{@{}rcl@{}} (Dx_{j})^{\prime}(t) & =&(D{\Omega}_{j})^{\prime}(t)c_{j}=\begin{bmatrix}I_{k}\otimes\bar{\mathcal{P}}_{j}^{\prime}(t) & \mathcal{O}_{1}\end{bmatrix}c_{j} \end{array} $$
(21)

where \(\bar {\mathcal {P}}_{j}^{\prime }(t)=\begin {bmatrix}0 & p_{j0} & {\ldots } & p_{j,N-1}\end {bmatrix}\). Now we collect all coefficients cjκl in the vector c,

$$ \begin{array}{@{}rcl@{}} c=\begin{bmatrix}c_{1}\\ \vdots\\ c_{n} \end{bmatrix}\in\mathbb{R}^{n(mN+k)}. \end{array} $$

Definition 1

The mapping \(\mathcal {R}:\mathbb {R}^{n(mN+k)}\rightarrow \tilde {X}_{\pi }\) given by (20) is called the representation map of \(\tilde {X}_{\pi }\) with respect to the basis (15).

Fact 1

We observe that each \(x\in \tilde {X}_{\pi }\) has a representation of the kind (20) and each function of the form (20) is an element of \(\tilde {X}_{\pi }\). Since \(\dim \tilde {X}_{\pi }=n(mN+k)\), \(\mathcal {R}\) is a bijective mapping.

Consider an element \(x\in \tilde {X}_{\pi }\) with its representation (20). This element belongs to Xπ if and only if its first k components are continuous. Using the representation (19), we see that \(x\in X_{\pi }\) if and only if

$$ \mathcal{C}c=0, $$
(22)

where \(\mathcal {C}\in \mathbb {R}^{k(n-1)\times n(mN+k)}\) and

$$ \mathcal{C}=\begin{bmatrix}I_{k}\otimes\bar{\mathcal{P}}_{1}(t_{1}) & \mathcal{O}_{1} & -I_{k}\otimes\bar{\mathcal{P}}{}_{2}(t_{1}) & \mathcal{O}_{1}\\ & & I_{k}\otimes\bar{\mathcal{P}}_{2}(t_{2}) & \mathcal{O}_{1} & -I_{k}\otimes\bar{\mathcal{P}}_{3}(t_{2}) & \mathcal{O}_{1}\\ & & & {\ddots} & & \ddots\\ \\ & & & & I_{k}\otimes\bar{\mathcal{P}}_{n-1}(t_{n-1}) & \mathcal{O}_{1} & -I_{k}\otimes\bar{\mathcal{P}}_{n}(t_{n-1}) & \mathcal{O}_{1} \end{bmatrix}. $$

Owing to the construction, \(\mathcal {C}\) has full row rank, cf. (16), (17).
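The following sketch (an illustration of ours, specialized to the Legendre basis, an equidistant partition, and toy dimensions) assembles \(\mathcal {C}\) according to (16), (17), and (22) and confirms the full row rank numerically:

```python
import numpy as np

def pbar_end_values(N):
    """End values from (16)-(17) for the Legendre basis on the reference
    interval: bar{P}(0) = [1,0,...,0] and bar{P}(1) = [1,1,0,...,0], since
    int_0^1 P_i(2s-1) ds = 1 for i = 0 and 0 otherwise."""
    left = np.zeros(N + 1); left[0] = 1.0
    right = np.zeros(N + 1); right[0] = right[1] = 1.0
    return left, right

def constraint_matrix(n, N, m, k, h):
    """cal_C in R^{k(n-1) x n(mN+k)} from (22); h holds the stepsizes h_j."""
    blk = m * N + k                                   # columns per subinterval
    C = np.zeros((k * (n - 1), n * blk))
    left, right = pbar_end_values(N)
    for j in range(n - 1):                            # continuity at t_{j+1}
        rows = slice(k * j, k * (j + 1))
        C[rows, j * blk: j * blk + k * (N + 1)] = np.kron(np.eye(k), h[j] * right)
        C[rows, (j + 1) * blk: (j + 1) * blk + k * (N + 1)] = \
            -np.kron(np.eye(k), h[j + 1] * left)
    return C

n, N, m, k = 5, 3, 3, 2
C = constraint_matrix(n, N, m, k, h=np.full(n, 1.0 / n))
print(C.shape, np.linalg.matrix_rank(C))              # (8, 55) 8: full row rank
```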

Fact 2

Let \(\tilde {\mathcal {R}}=\left .\mathcal {R}\right \rvert _{\ker \mathcal {C}}\) be the restriction of the representation map \(\mathcal {R}\) onto the kernel \(\ker \mathcal {C}\) of \(\mathcal {C}\). Since \(\mathcal {C}\) has full row rank, \(\dim \ker \mathcal {C}=n(mN+k)-k(n-1)=nmN+k=\dim X_{\pi }\), and \(\mathcal {R}\) is injective, \(\tilde {\mathcal {R}}\) is bijective. In particular, it holds also \(\tilde {\mathcal {R}}^{-1}=\left .\mathcal {R}^{-1}\right \rvert _{\text {im}\tilde {\mathcal {R}}}\).

The representations (20)–(21) can be inserted into the functional Φπ,M (8). The result becomes a least-squares functional of the form

$$ \varphi(c)=\lvert\mathcal{A}c-r\rvert_{\mathbb{R}^{nmM+l_{dyn}}}^{2}\rightarrow\min! $$
(23)

where \(\mathcal {A}\) has the structure

$$ \mathcal{A}=\left[\begin{array}{ccccc} \mathcal{A}_{1} & 0 & {\cdots} & & 0\\ 0 & {\ddots} & & & \vdots\\ {\vdots} & & \ddots\\ & & & {\ddots} & 0\\ 0 & & & & \mathcal{A}_{n}\\ G_{a}{\Omega}_{1}(t_{0}) & 0 & {\cdots} & 0 & G_{b}{\Omega}_{n}(t_{n}) \end{array}\right] $$

where \(\mathcal {A}_{j}\in \mathbb {R}^{mM\times (mN+k)}\) and \(G_{a}{\Omega }_{1}(t_{0}),G_{b}{\Omega }_{n}(t_{n})\in \mathbb {R}^{l_{{dyn}}\times (mN+k)}\).

So the discrete version of the least-squares method (8) becomes the linear least-squares problem (23) under the linear equality constraint (22).

Note that it holds \(r\in \mathbb {R}^{nmM+l_{{dyn}}}\) and \(\mathcal {A}\in \mathbb {R}^{(nmM+l_{{dyn}})\times n(mN+k)}\). The matrices \(\mathcal {A}\) and \(\mathcal {C}\) are very sparse. More details of the construction of \(\mathcal {A}\) and \(\mathcal {C}\) can be found in [9].

2.3 Conditioning of the implementation

The implementation for solving the least-squares problem (8) consists of the following steps:

  1. Form \(\mathcal {A}\), \(\mathcal {C}\), and r.

  2. Solve the constrained least-squares problem (23)–(22).

  3. Form the approximation xπ.

What are the errors to be expected? Consider the individual steps:

  1.

    The computation of \(\mathcal {C}\) is not critical. Depending on the chosen basis, the entries of \(\mathcal {C}\) may be available analytically. So we expect at most rounding errors from the representation of the analytical data. While the components of \(\mathcal {A}\) corresponding to the boundary conditions are only subject to truncation errors when representing real numbers in floating-point arithmetic, the DAE-related entries are subject to rounding errors as well as to certain amplification factors stemming from the multiplication by the square root of the matrix \({\mathscr{L}}\) (9). The conditioning of the versions (10) and (11) is easy to infer, while that of (12) has been discussed extensively in [8]. Under reasonable assumptions on the choice of collocation points, these amplification factors are rather small.

    Similar considerations apply to the computation of r.

  2.

    This algorithmic step corresponds to the solution of a linearly constrained linear least-squares problem. A number of classical perturbation results are available, e.g., [11,12,13]. Further below, we present a modified version that takes into account the special role that the equality constraint \(\mathcal {C}c=0\) plays in our application.

  3.

    This step is described by the representation map \(\mathcal {R}\), which assigns, to each solution c of the previous step, the corresponding solution \(x_{\pi }=\mathcal {R}c\). If \(c\in \ker \mathcal {C}\), it holds \(x_{\pi }\in X_{\pi }\subseteq {H_{D}^{1}}(a,b)\). However, due to the errors made in the previous step, the condition \(c\in \ker \mathcal {C}\) cannot be guaranteed; hence, \(\mathcal {R}c\in \tilde {X}_{\pi }\), but not necessarily \(\mathcal {R}c\in {H_{D}^{1}}(a,b)\)! In the next section, we will discuss the properties of \(\mathcal {R}\).

3 Properties of the representation map \(\mathcal {R}\)

In the present section, we will investigate the properties of the representation map \(\mathcal {R}:\mathbb {R}^{n(mN+k)}\rightarrow \tilde {X}_{\pi }\) in more detail. Previously, we have established a representation of \(\mathcal {R}\) on each subinterval; see (20). We intend to derive a representation of \(\mathcal {R}^{-1}\). The main tool will be interpolation.

Choose two sets of interpolation nodes

$$ \begin{array}{@{}rcl@{}} 0\leq\bar{\sigma}_{1}<\cdots<\bar{\sigma}_{N+1}\leq1\quad\text{and}\quad 0\leq\sigma_{1}<\cdots<\sigma_{N}\leq1, \end{array} $$
(24)

and shifted ones

$$ \bar{\tau}_{ji}=t_{j-1}+\bar{\sigma}_{i}h_{j},\quad\tau_{ji}=t_{j-1}+\sigma_{i}h_{j} $$

such that the integration formulae

$$ \begin{array}{@{}rcl@{}} {{\int}_{0}^{1}}f(\sigma){d}\sigma\approx\sum\limits_{i=1}^{N+1}\bar{\gamma}_{i}f(\bar{\sigma}_{i})\quad\text{and}\quad{{\int}_{0}^{1}}f(\sigma){d}\sigma\approx\sum\limits_{i=1}^{N}\gamma_{i}f(\sigma_{i}) \end{array} $$

have positive weights and are exact for polynomials up to degree 2N and 2N − 2, respectively. With matrices

$$ \begin{array}{@{}rcl@{}} \bar{V}_{j} & = &\begin{bmatrix}\bar{p}_{j0}(\bar{\tau}_{j1}) & {\cdots} & \bar{p}_{jN}(\bar{\tau}_{j1})\\ {\vdots} & & \vdots\\ \bar{p}_{j0}(\bar{\tau}_{j,N+1}) & {\cdots} & \bar{p}_{jN}(\bar{\tau}_{j,N+1}) \end{bmatrix}\!=h_{j}\begin{bmatrix}\bar{p}_{0}(\bar{\sigma}_{1}) & {\cdots} & \bar{p}_{N}(\bar{\sigma}_{1})\\ {\vdots} & & \vdots\\ \bar{p}_{0}(\bar{\sigma}_{N+1}) & {\cdots} & \bar{p}_{N}(\bar{\sigma}_{N+1}) \end{bmatrix}\!=:h_{j}\bar{V}, \end{array} $$
(25)
$$ \begin{array}{@{}rcl@{}} V_{j} & =&\begin{bmatrix}p_{j0}(\tau_{j1}) & {\cdots} & p_{j,N-1}(\tau_{j1})\\ {\vdots} & & \vdots\\ p_{j0}(\tau_{jN}) & {\cdots} & p_{j,N-1}(\tau_{jN}) \end{bmatrix}=\begin{bmatrix}p_{0}(\sigma_{1}) & {\cdots} & p_{N-1}(\sigma_{1})\\ {\vdots} & & \vdots\\ p_{0}(\sigma_{N}) & {\cdots} & p_{N-1}(\sigma_{N}) \end{bmatrix}=:V, \end{array} $$
(26)

and

$$ \begin{array}{@{}rcl@{}} \bar{V}_{j}^{\prime} & =\begin{bmatrix}\bar{p}^{\prime}_{j0}(\bar{\tau}_{j1}) & {\cdots} & \bar{p}^{\prime}_{j N}(\bar{\tau}_{j1})\\ {\vdots} & & \vdots\\ \bar{p}^{\prime}_{j0}(\bar{\tau}_{j,N+1}) & {\cdots} & \bar{p}^{\prime}_{jN}(\bar{\tau}_{j,N+1}) \end{bmatrix}=\begin{bmatrix}0 & p_{0}(\bar{\sigma}_{1}) & {\cdots} & p_{N-1}(\bar{\sigma}_{1})\\ {\vdots} & {\vdots} & & \vdots\\ 0 & p_{0}(\bar{\sigma}_{N+1}) & {\cdots} & p_{N-1}(\bar{\sigma}_{N+1}) \end{bmatrix}=:\mathring{V}, \end{array} $$
(27)

we represent, for κ = 1,…,k,

$$ \begin{array}{@{}rcl@{}} X_{j\kappa}:=\begin{bmatrix}x_{j\kappa}(\bar{\tau}_{j1})\\ \vdots\\ x_{j\kappa}(\bar{\tau}_{j,N+1}) \end{bmatrix}=\bar{V}_{j}c_{j\kappa}=h_{j}\bar{V}c_{j\kappa},\\ X^{\prime}_{j\kappa}:=\begin{bmatrix}x^{\prime}_{j\kappa}(\bar{\tau}_{j1})\\ \vdots\\ x^{\prime}_{j\kappa}(\bar{\tau}_{j,N+1}) \end{bmatrix}=\bar{V}^{\prime}_{j}c_{j\kappa}=\mathring{V}c_{j\kappa}, \end{array} $$

and, for κ = k + 1,…,m,

$$ \begin{array}{@{}rcl@{}} X_{j\kappa}:=\begin{bmatrix}x_{j\kappa}(\tau_{j1})\\ \vdots\\ x_{j\kappa}(\tau_{jN}) \end{bmatrix}=V_{j}c_{j\kappa}=Vc_{j\kappa}. \end{array} $$

The matrices \(\bar {V}\) and V are nonsingular. This amounts to the relation

$$ \begin{array}{@{}rcl@{}} c_{j}=\begin{bmatrix}c_{j1}\\ \vdots\\ c_{jk}\\ c_{j,k+1}\\ \vdots\\ c_{jm} \end{bmatrix}=\begin{bmatrix}I_{k}\otimes\bar{V}^{-1}\\ & I_{m-k}\otimes V^{-1} \end{bmatrix}\begin{bmatrix}\frac{1}{h_{j}}X_{j1}\\ \vdots\\ \frac{1}{h_{j}}X_{jk}\\ X_{j,k+1}\\ \vdots\\ X_{jm} \end{bmatrix}, j=1,\ldots,n. \end{array} $$
(28)

Owing to the fact that polynomials of degree N and N − 1 are uniquely determined by their values at N + 1 and N distinct nodes, respectively, formula (28) provides \(c=\mathcal {R}^{-1}x\) for each arbitrarily given \(x\in \tilde {X}_{\pi }\).
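A minimal sketch of this inversion for the differential components on a single subinterval may look as follows; for simplicity, we assume uniform nodes \(\bar {\sigma }_{i}\), which suffice for the interpolation argument even though the quadrature exactness required below would suggest Gauss-type nodes:

```python
import numpy as np
from numpy.polynomial import Legendre, Polynomial

def pbar(i):
    """Antiderivative basis (14) over the shifted Legendre polynomials."""
    if i == 0:
        return Polynomial([1.0])
    return Legendre.basis(i - 1, domain=[0, 1]).convert(kind=Polynomial).integ(lbnd=0)

N, h_j = 4, 0.1
sbar = np.linspace(0.0, 1.0, N + 1)          # assumed nodes bar{sigma}_i from (24)
Vbar = np.array([[pbar(i)(s) for i in range(N + 1)]
                 for s in sbar])             # bar{V} of (25), reference interval

c = np.random.default_rng(0).standard_normal(N + 1)   # coefficients c_{j,kappa}
X = h_j * Vbar @ c                           # nodal values X_{j,kappa} = h_j bar{V} c
c_back = np.linalg.solve(Vbar, X / h_j)      # (28): interpolation recovers c
assert np.allclose(c, c_back)
```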

Next, we equip \(\tilde {X}_{\pi }\) with the norms

$$ \begin{array}{@{}rcl@{}} \lVert x\rVert_{L^{2}}^{2} & =&\sum\limits_{j=1}^{n}\left\{ \sum\limits_{\kappa=1}^{k}{\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t+ \sum\limits_{\kappa=k+1}^{m}{\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t \right\} , \end{array} $$
(29)
$$ \begin{array}{@{}rcl@{}} \lVert x\rVert_{H_{D,\pi}^{1}}^{2} & =&\sum\limits_{j=1}^{n}\left\{ \sum\limits_{\kappa=1}^{k}{\int}_{t_{j-1}}^{t_{j}}(\lvert x_{j\kappa}(t)\rvert^{2}+\lvert x_{j\kappa}^{\prime}(t)\rvert^{2}){d} t+ \sum\limits_{\kappa=k+1}^{m}{\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t \right\} . \end{array} $$
(30)

The latter norm reduces, for \(x\in X_{\pi }\), to \(\lVert x\rVert _{H_{D,\pi }^{1}}=\|x\|_{{H_{D}^{1}}(a,b)}\). Moreover, \(\lVert \cdot \rVert _{L^{2}}=\lVert \cdot \rVert _{L^{2}((a,b),\mathbb {R}^{m})}\). On \(\mathbb {R}^{n(mN+k)}\), we use the Euclidean norm. Then \(\mathcal {R}\) becomes a homeomorphism in each case, and we are interested in the respective operator norms \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow L^{2}}\), \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow H_{D,\pi }^{1}}\), \(\|\mathcal {R}^{-1}\|_{L^{2}\rightarrow \mathbb {R}^{n(mN+k)}}\), and \(\|\mathcal {R}^{-1}\|_{H_{D,\pi }^{1}\rightarrow \mathbb {R}^{n(mN+k)}}\). Regarding the properties of the related integration formulae and introducing the diagonal matrices

$$ \bar{\Gamma}={\text{diag}}(\bar{\gamma}_{1}^{1/2},\cdots,\bar{\gamma}_{N+1}^{1/2}), {\Gamma}={\text{diag}}(\gamma_{1}^{1/2},\cdots,\gamma_{N}^{1/2}) $$
(31)

we compute for any \(x=\mathcal {R}c\), and κ = 1,…,k,

$$ \begin{array}{@{}rcl@{}} {\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t & =&h_{j}\sum\limits_{i=1}^{N+1}\bar{\gamma}_{i}\lvert x_{j\kappa}(\bar{\tau}_{ji})\rvert^{2}=h_{j}\sum\limits_{i=1}^{N+1}\lvert\bar{\gamma}_{i}^{1/2}x_{j\kappa}(\bar{\tau}_{ji})\rvert^{2}=h_{j}\lvert\bar{\Gamma}X_{j\kappa}\rvert^{2}\\ & =&h_{j}\lvert\bar{\Gamma}\bar{V}_{j}c_{j\kappa}\rvert^{2}=h_{j}\lvert\bar{\Gamma}h_{j}\bar{V}c_{j\kappa}\rvert^{2}, \end{array} $$
$$ \begin{array}{@{}rcl@{}} {\int}_{t_{j-1}}^{t_{j}} & &\!\!\!\!\!\!\!\!\!(\lvert x_{j\kappa}(t)\rvert^{2}+\lvert x_{j\kappa}^{\prime}(t)\rvert^{2}){d} t=h_{j}\sum\limits_{i=1}^{N+1}\bar{\gamma}_{i}(\lvert x_{j\kappa}(\bar{\tau}_{ji})\rvert^{2}+\lvert x_{j\kappa}^{\prime}(\bar{\tau}_{ji})\rvert^{2})\\ & =&h_{j}\sum\limits_{i=1}^{N+1}(\lvert\bar{\gamma}_{i}^{1/2}x_{j\kappa}(\bar{\tau}_{ji})\rvert^{2}+(\lvert\bar{\gamma}_{i}^{1/2}x_{j\kappa}^{\prime}(\bar{\tau}_{ji})\rvert^{2})=h_{j}\lvert\bar{\Gamma}X_{j\kappa}\rvert^{2}+h_{j}\lvert\bar{\Gamma}X_{j\kappa}^{\prime}\rvert^{2}\\ & =&h_{j}\lvert\bar{\Gamma}\bar{V}_{j}c_{j\kappa}\rvert^{2}+h_{j}\lvert\bar{\Gamma}\mathring{V}c_{j\kappa}\rvert^{2}=h_{j}\lvert\bar{\Gamma}h_{j}\bar{V}c_{j\kappa}\rvert^{2}+h_{j}\lvert\bar{\Gamma}\mathring{V}c_{j\kappa}\rvert^{2}\\ & =&h_{j}\left\lvert\begin{bmatrix}h_{j}\bar{\Gamma}\bar{V}\\ \bar{\Gamma}\mathring{V} \end{bmatrix}c_{j\kappa}\right\rvert^{2}, \end{array} $$

and, in addition, for κ = k + 1,…,m,

$$ \begin{array}{@{}rcl@{}} {\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t & =&h_{j}\sum\limits_{i=1}^{N}\gamma_{i}\lvert x_{j\kappa}(\tau_{ji})\rvert^{2}=h_{j}\sum\limits_{i=1}^{N}\lvert\gamma_{i}^{1/2}x_{j\kappa}(\tau_{ji})\rvert^{2}=h_{j}\lvert{\Gamma} X_{j\kappa}\rvert^{2}\\ & =&h_{j}\lvert{\Gamma} Vc_{j\kappa}\rvert^{2}. \end{array} $$

Summarizing, the following representations result:

$$ \begin{array}{@{}rcl@{}} \lVert x\rVert_{L^{2}}^{2}=\sum\limits_{j=1}^{n}\left\{ \sum\limits_{\kappa=1}^{k}\lvert h_{j}^{3/2}\bar{\Gamma}\bar{V}c_{j\kappa}\rvert^{2}+\sum\limits_{\kappa=k+1}^{m}\lvert h_{j}^{1/2}{\Gamma} Vc_{j\kappa}\rvert^{2}\right\} =\sum\limits_{j=1}^{n}\lvert U_{j}c_{j}\rvert^{2}=\lvert\mathcal{U}c\rvert^{2}, \end{array} $$
(32)

with matrices

$$ \begin{array}{@{}rcl@{}} \mathcal{U} & =&{\text{diag}}(U_{1},\cdots,U_{n})\in\mathbb{R}^{n(mN+k)\times n(mN+k)},\\ U_{j} & =&\begin{bmatrix}I_{k}\otimes h_{j}^{3/2}\bar{\Gamma}\bar{V}\\ & I_{m-k}\otimes h_{j}^{1/2}{\Gamma} V \end{bmatrix}\in\mathbb{R}^{(mN+k)\times(mN+k)}, \end{array} $$
(33)

and

$$ \begin{array}{@{}rcl@{}} \lVert x\rVert_{H_{D,\pi}^{1}}^{2}=\sum\limits_{j=1}^{n}\left\{ \sum\limits_{\kappa=1}^{k}\left\lvert\begin{bmatrix}h_{j}^{3/2}\bar{\Gamma}\bar{V}\\ h_{j}^{1/2}\bar{\Gamma}\mathring{V} \end{bmatrix}c_{j\kappa}\right\rvert^{2}+\sum\limits_{\kappa=k+1}^{m}\lvert h_{j}^{1/2}{\Gamma} Vc_{j\kappa}\rvert^{2}\right\} =\sum\limits_{j=1}^{n}\lvert\hat{U}_{j}c_{j}\rvert^{2}=\lvert\hat{\mathcal{U}}c\rvert^{2}, \end{array} $$
(34)

with matrices

$$ \begin{array}{@{}rcl@{}} \hat{\mathcal{U}} & =&{\text{diag}}(\hat{U}_{1},\cdots,\hat{U}_{n})\in\mathbb{R}^{n(mN+k+k(N+1))\times n(mN+k)},\\ \hat{U}_{j} & =&\begin{bmatrix}I_{k}\otimes\begin{bmatrix}h_{j}^{3/2}\bar{\Gamma}\bar{V}\\ h_{j}^{1/2}\bar{\Gamma}\mathring{V} \end{bmatrix}\\ & I_{m-k}\otimes h_{j}^{1/2}{\Gamma} V \end{bmatrix}\in\mathbb{R}^{(mN+k+k(N+1))\times(mN+k)}. \end{array} $$
(35)

Proposition 1

The singular values of \(\mathcal {U}\) and \(\hat {\mathcal {U}}\) are independent of the choice of the nodes σi and \(\bar {\sigma }_{i}\). Moreover, all singular values are positive.

Proof

Uj and \(\hat {U}_{j}\) have full column rank. Consequently, \(\mathcal {U}^{T}\mathcal {U}\) and \(\hat {\mathcal {U}}^{T}\hat {\mathcal {U}}\) are symmetric and positive definite. Hence, their eigenvalues are all positive, and thus so are their singular values, which are the square roots of these eigenvalues. The eigenvalues are independent of the choice of the nodes σi and \(\bar {\sigma }_{i}\) since, owing to the properties of the involved integration formulae, it holds that

$$ \begin{array}{@{}rcl@{}} (V^{T}{\Gamma}^{2}V)_{\alpha\upbeta} & =&{{\int}_{0}^{1}}p_{\alpha-1}(\sigma)p_{\upbeta-1}(\sigma){d}\sigma, \alpha,\upbeta=1,\cdots,N,\\ (\bar{V}^{T}\bar{\Gamma}^{2}\bar{V})_{\alpha\upbeta} & =&{{\int}_{0}^{1}}\bar{p}_{\alpha-1}(\sigma)\bar{p}_{\upbeta-1}(\sigma){d}\sigma, \alpha,\upbeta=1,\cdots,N+1,\\ (\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V})_{\alpha\upbeta} & =&{{\int}_{0}^{1}}\bar{p}^{\prime}_{\alpha-1}(\sigma)\bar{p}^{\prime}_{\upbeta-1}(\sigma){d}\sigma, \alpha,\upbeta=1,\cdots,N+1, \end{array} $$

such that the entries of \(\mathcal {U}^{T}\mathcal {U}\) and \(\hat {\mathcal {U}}^{T}\hat {\mathcal {U}}\) are independent of the choice of the integration formulae. □

Theorem 2

Let \(\sigma _{{\min \limits }}(\mathcal {U})\) and \(\sigma _{{\max \limits }}(\mathcal {U})\) denote the minimal and maximal singular values of \(\mathcal {U}\), respectively. Similarly, let \(\sigma _{{\min \limits }}(\mathcal {\hat {U}})\) and \(\sigma _{{\max \limits }}(\mathcal {\hat {U}})\) denote the minimal and maximal singular values of \(\hat {\mathcal {U}}\). Then it holds

$$ \begin{array}{@{}rcl@{}} \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow L^{2}} & =&\sigma_{{\max}}(\mathcal{U}),\quad\|\mathcal{R}^{-1}\|_{L^{2}\rightarrow\mathbb{R}^{n(mN+k)}}=\sigma_{{\min}}(\mathcal{U})^{-1},\\ \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow H_{D,\pi}^{1}} & =&\sigma_{{\max}}(\mathcal{\hat{U}}),\quad\|\mathcal{R}^{-1}\|_{H_{D,\pi}^{1}\rightarrow\mathbb{R}^{n(mN+k)}}=\sigma_{{\min}}(\mathcal{\hat{U}})^{-1}. \end{array} $$

Proof

It holds \(\mathcal {\hat {U}}\in \mathbb {R}^{\nu \times \lambda }\) with ν = n(mN + k + k(N + 1)) and λ = n(mN + k). Let \(\hat {\mathcal {U}}=U{\Sigma } V^{T}\) be the singular value decomposition of \(\hat {\mathcal {U}}\). Here,

$$ {\Sigma}=\left[\begin{array}{ccc} s_{1}\\ & \ddots\\ & & s_{\lambda}\\ 0 & {\cdots} & 0 \end{array}\right]\in\mathbb{R}^{\nu\times\lambda} $$

with \(s_{1}=\sigma _{{\max \limits }}(\hat {\mathcal {U}})\) and \(s_{\lambda }=\sigma _{{\min \limits }}(\hat {\mathcal {U}})\). According to Proposition 1, \(\sigma _{{\min \limits }}(\hat {\mathcal {U}})>0\). By (34), this leads to

$$ \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow H_{D,\pi}^{1}}=\sup_{c\neq0}\frac{\lVert\mathcal{R}c\rVert_{H_{D,\pi}^{1}}}{\lvert c\rvert_{\mathbb{R}^{\lambda}}}=\sup_{c\neq0}\frac{\lvert\hat{\mathcal{U}}c\rvert_{\mathbb{R}^{\nu}}}{\lvert c\rvert_{\mathbb{R}^{\lambda}}}=\sup_{\chi\neq0}\frac{\lvert{\Sigma}\chi\rvert_{\mathbb{R}^{\nu}}}{\lvert\chi\rvert_{\mathbb{R}^{\lambda}}}=\sigma_{{\max}}(\hat{\mathcal{U}}) $$

and

$$ \lVert\mathcal{R}^{-1}\rVert_{H_{D,\pi}^{1}\rightarrow\mathbb{R}^{n(mN+k)}}=\sup_{x\neq0}\frac{\lvert\mathcal{R}^{-1}x\rvert_{\mathbb{R}^{\lambda}}}{\lVert x\rVert_{H_{D,\pi}^{1}}}=\sup_{c\neq0}\frac{\lvert c\rvert_{\mathbb{R}^{\lambda}}}{\lVert\mathcal{R}c\rVert_{H_{D,\pi}^{1}}}=\sup_{\chi\neq0}\frac{\lvert\chi\rvert_{\mathbb{R}^{\lambda}}}{\lvert{\Sigma}\chi\rvert_{\mathbb{R}^{\nu}}}=\sigma_{{\min}}(\hat{\mathcal{U}})^{-1}. $$

The statements concerning \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow L^{2}}\) and \(\|\mathcal {R}^{-1}\|_{L^{2}\rightarrow \mathbb {R}^{n(mN+k)}}\) follow similarly. □

Using the structure (33) of \(\mathcal {U}\), we obtain

$$ \begin{array}{@{}rcl@{}} \sigma_{{\max}}(\mathcal{U}) & =&\max_{j=1,\ldots,n}\max\{h_{j}^{3/2}\sigma_{{\max}}(\bar{\Gamma}\bar{V}),h_{j}^{1/2}\sigma_{{\max}}({\Gamma} V)\}\\ & =&\max_{j=1,\ldots,n}h_{j}^{1/2}\max\{h_{j}\sigma_{{\max}}(\bar{\Gamma}\bar{V}),\sigma_{{\max}}({\Gamma} V)\},\\ \sigma_{{\min}}(\mathcal{U}) & =&\min_{j=1,\ldots,n}\min\{h_{j}^{3/2}\sigma_{{\min}}(\bar{\Gamma}\bar{V}),h_{j}^{1/2}\sigma_{{\min}}({\Gamma} V)\}\\ & =&\min_{j=1,\ldots,n}h_{j}^{1/2}\min\{h_{j}\sigma_{{\min}}(\bar{\Gamma}\bar{V}),\sigma_{{\min}}({\Gamma} V)\}. \end{array} $$

The estimation of the singular values of \(\hat {\mathcal {U}}\) leads to slightly more involved expressions. Let \(U_{j,{red}}=\left [\begin {array}{c} h_{j}\bar {\Gamma }\bar {V}\\ \bar {\Gamma }\mathring {V} \end {array}\right ]\). Then, it holds

$$ \begin{array}{@{}rcl@{}} \sigma_{{\max}}(\mathcal{\hat{U}}) & =&\max_{j=1,\ldots,n}h_{j}^{1/2}\max\{\sigma_{{\max}}(U_{j,{red}}),\sigma_{{\max}}({\Gamma} V)\},\\ \sigma_{{\min}}(\hat{\mathcal{U}}) & =&\min_{j=1,\ldots,n}h_{j}^{1/2}\min\{\sigma_{{\min}}(U_{j,{red}}),\sigma_{{\min}}({\Gamma} V)\}. \end{array} $$

We note that \(\sigma _{\min \limits }(\bar {\Gamma }\mathring {V})=0\) and \(\sigma _{\max \limits }({\Gamma } V)=\sigma _{\max \limits }(\bar {\Gamma }\mathring {V})\). This follows immediately from the construction of the basis for the differential components (14). The definition of singular values and Weyl’s Theorem [14, Theorem III.2.1] provides us with

$$ \begin{array}{@{}rcl@{}} \lambda_{{\max}}(\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V})&\leq & \lambda_{{\max}}({h_{j}^{2}}\bar{V}^{T}\bar{\Gamma}^{2}\bar{V}+\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V})=\sigma_{{\max}}(U_{j,{red}})^{2}\\ & \leq& {h_{j}^{2}}\lambda_{{\max}}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V})+\lambda_{{\max}}(\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V}),\\ {h_{j}^{2}}\lambda_{{\min}}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V})&\leq & \lambda_{{\min}}({h_{j}^{2}}\bar{V}^{T}\bar{\Gamma}^{2}\bar{V}+\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V})=\sigma_{{\min}}(U_{j,{red}})^{2}\leq {h_{j}^{2}}\lambda_{{\max}}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V}) \end{array} $$

since \(\lambda _{\min \limits }(\mathring {V}^{T}\bar {\Gamma }^{2}\mathring {V})=0\). Then,

$$ \begin{array}{@{}rcl@{}} \sigma_{\max}({\Gamma} V)&= & \lambda_{\max}(V^{T}{\Gamma}^{2}V)^{1/2}\\ &\leq & \max\{\sigma_{{\max}}(U_{j,{red}}),\sigma_{{\max}}({\Gamma} V)\}\\ &\leq & \sigma_{{\max}}({\Gamma} V)+O(h). \end{array} $$

Moreover,

$$ \begin{array}{@{}rcl@{}} \min\{h_{j}\sigma_{\min}(\bar{\Gamma}\bar{V}),\sigma_{\min}({\Gamma} V)\} & \leq&\min\{\sigma_{{\min}}(U_{j,{red}}),\sigma_{\min}({\Gamma} V)\}\\ & \leq&\min\{h_{j}\sigma_{\max}(\bar{\Gamma}\bar{V}),\sigma_{\min}({\Gamma} V)\} \end{array} $$

Collecting all estimates, Theorem 2 provides

Theorem 3

Let the grid (3) have the maximal stepsize h and the minimal stepsize \(h_{\min \limits }\). Furthermore, let Γ and \(\bar {\Gamma }\) be given by (31) and let V, \(\bar {V}\), and \(\mathring {V}\) be given by (26), (25), and (27), respectively. Then it holds, for sufficiently small h,

$$ \begin{array}{@{}rcl@{}} \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow L^{2}} & =&h^{1/2}\sigma_{\max}({\Gamma} V)=O(h^{1/2}),\\ \|\mathcal{R}^{-1}\|_{L^{2}\rightarrow\mathbb{R}^{n(mN+k)}} & =&h_{\min}^{-3/2}\sigma_{\min}(\bar{\Gamma}\bar{V})^{-1}=O(h_{\min}^{-3/2}),\\ \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow H_{D,\pi}^{1}} & =&h^{1/2}\sigma_{\max}({\Gamma} V)+O(h^{3/2})=O(h^{1/2}), \end{array} $$

and

$$ h_{\min}^{-3/2}\sigma_{\max}(\bar{\Gamma}\bar{V})^{-1}\leq\lVert\mathcal{R}^{-1}\rVert_{H_{D,\pi}^{1}\rightarrow\mathbb{R}^{n(mN+k)}}\leq h_{\min}^{-3/2}\sigma_{\min}(\bar{\Gamma}\bar{V})^{-1}. $$

In particular, \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow H_{D,\pi }^{1}}=\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow L^{2}}+O(h^{3/2})\).

In these estimates, we used the fact \(\sigma _{\min \limits }({\Gamma } V)>0\). Note that the constants hidden in the big-O notation in this theorem depend on both N and the chosen basis. For the restriction \(\tilde {\mathcal {R}}\) of \(\mathcal {R}\) onto \(\ker \mathcal {C}\) we obtain, obviously,

$$ \begin{array}{@{}rcl@{}} \lVert\tilde{\mathcal{R}}\rVert & \leq\lVert\mathcal{R}\rVert,\quad\lVert\tilde{\mathcal{R}}^{-1}\rVert\leq\lVert\mathcal{R}^{-1}\rVert. \end{array} $$

For some special cases, the singular values can be easily derived.

Proposition 2

Let V, \(\bar {V}\), and \(\mathring {V}\) be given by (25)–(27) and Γ, \(\bar {\Gamma }\) by (31). Then it holds:

  (1)

    Let p0,…,pN− 1 be an orthogonal basis in L2(0,1). Then

    $$ \begin{array}{@{}rcl@{}} \sigma_{\min}({\Gamma} V) & =\min\left\{ \lVert p_{\alpha}\|_{L^{2}(0,1)}:\alpha=0,\ldots,N-1\right\} ,\\ \sigma_{\max}({\Gamma} V) & =\max\left\{ \lVert p_{\alpha}\|_{L^{2}(0,1)}:\alpha=0,\ldots,N-1\right\} . \end{array} $$

    In particular, if p0,…,pN− 1 is the Legendre basis, \(\sigma _{\min \limits }({\Gamma } V)=(2N-1)^{-1/2}\) and \(\sigma _{\max \limits }({\Gamma } V)=1\).

  (2)

    For an orthonormal basis p0,…,pN− 1 in L2(0,1), \(\sigma _{\min \limits }({\Gamma } V)=\sigma _{\max \limits }({\Gamma } V)=1\).

  (3)

    If p0,…,pN− 1 is the modified Legendre basis, it holds \(\sigma _{\min \limits }(\bar {\Gamma }\bar {V})\geq (2N+1)^{-1/2}\) and \(\sigma _{\max \limits }(\bar {\Gamma }\bar {V})\leq (N+2)^{1/2}\). Furthermore, the estimates

    $$ \sigma_{\min}({\Gamma} V)\geq\left( \frac{1}{2-2\cos\frac{N}{N+2}\pi}\right)^{1/2}\geq\frac{1}{2},\quad\sigma_{\max}({\Gamma} V)\leq\left( \frac{2N-1}{2-2\cos\frac{1}{N+2}\pi}\right)^{1/2} $$

    hold true.

Proof

First, we observe that \((V^{T}{\Gamma }^{2}V)_{\alpha \upbeta }={{\int \limits }_{0}^{1}}p_{\alpha -1}(\rho )p_{\upbeta -1}(\rho ){d}\rho =\delta _{\alpha \upbeta }\lVert p_{\alpha -1}\rVert _{L^{2}(0,1)}^{2}\). This provides (1) and (2) as special cases.

Consider the modified Legendre basis now. It holds \({{\int \limits }_{0}^{1}}\bar {p}_{0}^{2}(\rho ){d}\rho =1\) and \({{\int \limits }_{0}^{1}}\bar {p}_{0}(\rho )\bar {p}_{\alpha }(\rho ){d}\rho ={{\int \limits }_{0}^{1}}(P_{\alpha }(2\rho -1)-(-1)^{\alpha }){d}\rho =(-1)^{\alpha +1}\) for α = 1,2,…. Moreover, for α,β= 1,2,…, we have

$$ \begin{array}{@{}rcl@{}} {{\int}_{0}^{1}}\bar{p}_{\alpha}(\rho)\bar{p}_{\upbeta}(\rho){d}\rho & =&{{\int}_{0}^{1}}(P_{\alpha}(2\rho-1)-(-1)^{\alpha})(P_{\upbeta}(2\rho-1)-(-1)^{\upbeta}){d}\rho\\ & =&{{\int}_{0}^{1}}P_{\alpha}(2\rho-1)P_{\upbeta}(2\rho-1){d}\rho+(-1)^{\alpha+\upbeta}\\ & =&(2\alpha+1)^{-1}\delta_{\alpha\upbeta}+(-1)^{\alpha+\upbeta}. \end{array} $$

Collecting these expressions, we obtain the compact representation

$$ \bar{V}^{T}\bar{\Gamma}^{2}\bar{V}={\text{diag}}(1,\frac{1}{3},\ldots,(2N+1)^{-1})+ff^{T} $$

with \(f^{T}=[1,1,-1,+1,-1,\ldots ,\pm 1]\in \mathbb {R}^{N+1}\). ffT is a rank-1 matrix having, therefore, the N-fold eigenvalue 0. Moreover, f is an eigenvector to the eigenvalue fTf = N + 1. In particular, ffT is positive semidefinite. Invoking Weyl’s theorem again, we obtain

$$ \begin{array}{@{}rcl@{}} (2N+1)^{-1} & =&\lambda_{\min}({\text{diag}}(1,\frac{1}{3},\ldots,(2N+1)^{-1}))\leq\lambda_{\min}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V})\\ \lambda_{\max}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V}) & \leq&\lambda_{\max}({\text{diag}}(1,\frac{1}{3},\ldots,(2N+1)^{-1}))+\lambda_{\max}(ff^{T})=N+2. \end{array} $$

This proves the first assertion of (3).

The relation \((\mathring {V}^{T}\bar {\Gamma }^{2}\mathring {V})_{\alpha \upbeta }={{\int \limits }_{0}^{1}}\bar {p}^{\prime }_{\alpha -1}(\sigma )\bar {p}^{\prime }_{\upbeta -1}(\sigma ){d}\sigma \) shows that \(K=\mathring {V}^{T}\bar {\Gamma }^{2}\mathring {V}\) is the stiffness matrix of the basis functions. For the modified Legendre basis, it has been investigated in [1, cf. Eq. (31)]. According to the proof of Proposition A.2 of [1], the nonvanishing eigenvalues can be estimated by

$$ \lambda_{\min}(K)\geq\frac{1}{2-2\cos\frac{N}{N+2}\pi},\quad\lambda_{\max}(K)\leq\frac{2N-1}{2-2\cos\frac{1}{N+2}\pi}. $$

\(K^{\prime }=V^{T}{\Gamma }^{2}V\) is the submatrix of K obtained by omitting the first row and column of K, which consist entirely of zeros. This provides the final relations of assertion (3). □

An asymptotic analysis shows that \(\sigma _{\max \limits }({\Gamma } V)\leq \frac {2}{\sqrt {\pi }}N^{3/2}+O(N^{1/2})\) in the case of the modified Legendre basis.
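The closed forms in part (1) of Proposition 2 are easy to check numerically; the following sketch (ours) uses an N-point Gauss-Legendre rule on [0,1], which meets the exactness requirement for (26):

```python
import numpy as np
from numpy.polynomial import Legendre, Polynomial
from numpy.polynomial.legendre import leggauss

def gauss01(M):
    """M-point Gauss-Legendre rule on [0,1]; exact up to degree 2M - 1."""
    x, w = leggauss(M)
    return (x + 1.0) / 2.0, w / 2.0

def p_legendre(i):
    """Shifted Legendre polynomial p_i(tau) = P_i(2 tau - 1)."""
    return Legendre.basis(i, domain=[0, 1]).convert(kind=Polynomial)

N = 6
sigma, gamma = gauss01(N)
V = np.array([[p_legendre(i)(s) for i in range(N)] for s in sigma])  # (26)
G = np.diag(np.sqrt(gamma))                                          # Gamma of (31)
sv = np.linalg.svd(G @ V, compute_uv=False)

# Proposition 2(1), Legendre basis: sigma_max = 1, sigma_min = (2N-1)^{-1/2}
assert np.isclose(sv.max(), 1.0)
assert np.isclose(sv.min(), (2 * N - 1) ** -0.5)
```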

Remark 1

We are able to estimate the size of the jump of elements of \(\tilde {X}_{\pi }\) at the grid points. For any \(\tilde {x}\in \tilde {X}_{\pi }\) and \(\tilde {c}=\mathcal {R}^{-1}\tilde {x}\), it holds

$$ \begin{array}{@{}rcl@{}} \lVert\tilde{x}_{j\kappa}\rVert_{C[t_{j-1},t_{j})}&\leq& C_{h_{j}}\lVert\tilde{x}_{j\kappa}\rVert_{H^{1}(t_{j-1},t_{j})}=C_{h_{j}}h_{j}^{1/2}\lvert U_{j,{red}}\tilde{c}_{j\kappa}\rvert\\ &\leq& C_{h_{j}}h_{j}^{1/2}\sigma_{\max}(U_{j,{red}})\lvert\tilde{c}\rvert \end{array} $$

with \(C_{h_{j}}=\left (\max \limits \{2/h_{j},h_{j}\}\right )^{1/2}\). Here, we used [15, Lemma 3.2]. For sufficiently small hj, this estimate reduces to

$$ \lVert\tilde{x}_{j\kappa}\rVert_{C[t_{j-1},t_{j})}\leq\sqrt{2}\sigma_{\max}(\bar{\Gamma}\mathring{V})\lvert\tilde{c}\rvert=\sqrt{2}\sigma_{\max}({\Gamma} V)\lvert\tilde{c}\rvert. $$

Let x be any element of Xπ and \(c=\mathcal {R}^{-1}x\). Replacing \(\tilde {c}\) by \({\Delta } c=\tilde {c}-c\) in the last estimate, we obtain

$$ \begin{array}{@{}rcl@{}} \lvert\tilde{x}_{\kappa}(t_{j-0})-\tilde{x}_{\kappa}(t_{j+0})\rvert & =&\lvert\tilde{x}_{\kappa}(t_{j-0})-x_{\kappa}(t_{j-0})+x_{\kappa}(t_{j+0})-\tilde{x}_{\kappa}(t_{j+0})\rvert\\ & \leq&\lVert\tilde{x}_{j\kappa}-x_{j\kappa}\rVert_{C[t_{j-1},t_{j})}+\lVert\tilde{x}_{j+1,\kappa}-x_{j+1,\kappa}\rVert_{C[t_{j},t_{j+1})}\\ & \leq&2\sqrt{2}\sigma_{\max}({\Gamma} V)\lvert{\Delta} c\rvert. \end{array} $$

Proposition 2 provides estimates for the factor \(\sigma _{\max \limits }({\Gamma } V)\). In particular, for some bases, it does not depend on the polynomial degree N.

4 Error estimation for the constrained minimization problem

The aim of this section is the derivation of bounds for perturbations of the solution c of the problem (23)–(22), that is,

$$ \begin{gathered}\varphi(z)=\lvert\mathcal{A}z-r\rvert^{2}\rightarrow\min!\\ \textrm{subject to }\mathcal{C}z=0, \end{gathered} $$

under perturbations of the data \(\mathcal {A}\), \(\mathcal {C}\), r. Such bounds have been known for a long time, e.g., [11, 12]. However, we will provide different bounds in this section. The reason for this is that the constraint \(\mathcal {C}c=0\) has an exceptional meaning in the present context: It holds \(\mathcal {C}c=0\) if and only if \(\mathcal {R}c\in {H_{D}^{1}}(a,b)\). If a perturbation \({\Delta }\mathcal {C}\) of \(\mathcal {C}\) changes the kernel of \(\mathcal {C}\), then \(\mathcal {R}c\in {H_{D}^{1}}(a,b)\) no longer holds in general! Therefore, we will consider the two cases \(\ker (\mathcal {C}+{\Delta }\mathcal {C})=\ker \mathcal {C}\) and \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\neq \ker \mathcal {C}\) separately.

Let \(\tilde {c}\) be the solution of the perturbed problem

$$ \min\{\lvert(\mathcal{A}+{\Delta}\mathcal{A})z-(r+{\Delta} r)\rvert^{2}:(\mathcal{C}+{\Delta}\mathcal{C})z=0\}. $$
(36)

Then, let \({\Delta } c=\tilde {c}-c\) denote the error. We are interested in deriving a bound on Δc in terms of the perturbations of the data.

For a matrix \({\mathscr{M}}\), denote by \({\mathscr{M}}^{+}\) its Moore-Penrose inverse and by \(\lVert {\mathscr{M}}\rVert \) its spectral norm.

Let the columns of \(\mathcal {D}\) form an orthonormal basis of \(\ker \mathcal {C}\). Then, \(P=I_{n(mN+k)}-\mathcal {C}^{+}\mathcal {C}\) is the orthogonal projector onto \(\ker \mathcal {C}\) and \(P\mathcal {D}=\mathcal {D}\). Some more properties are collected in the following proposition.

Proposition 3

It holds, for any matrix \({\mathscr{M}}\in \mathbb {R}^{\nu \times n(mN+k)},\)\(\nu \in \mathbb {N}\),

  1. \(\mathcal {D}^{T}\mathcal {D}=I_{nmN+k}\) and \(\mathcal {D}\mathcal {D}^{T}=P\).

  2. If \(c=\mathcal {D}d\), then \(\lvert c\rvert =\lvert d\rvert \).

  3. \(\lVert \mathcal {A}\mathcal {D}\rVert =\lVert \mathcal {A}P\rVert \).

  4. \((\mathcal {A}P)^{+}=\mathcal {D}(\mathcal {A}\mathcal {D})^{+}\).

  5. \(\lVert (\mathcal {A}P)^{+}\rVert =\lVert (\mathcal {A}\mathcal {D})^{+}\rVert \).

The proofs are obvious. For the following, we note that the matrix \(\mathcal {A}\mathcal {D}\) has full column rank [9, Proposition 1].

4.1 \(\ker (\mathcal {C}+{\Delta }\mathcal {C})=\ker \mathcal {C}\)

Each element c of \(\ker \mathcal {C}\) has a unique representation \(c=\mathcal {D}d\) with \(d\in \mathbb {R}^{nmN+k}\). Therefore, (23)–(22) is equivalent to the unconstrained minimization problem

$$ \min_{d\in\mathbb{R}^{nmN+k}}\lVert\mathcal{A}\mathcal{D}d-r\rVert $$
(37)

while (36) becomes the unconstrained minimization problem

$$ \min_{d\in\mathbb{R}^{nmN+k}}\lVert(\mathcal{A}+{\Delta}\mathcal{A})\mathcal{D}d-(r+{\Delta} r)\rVert. $$
(38)

Since \(\mathcal {A}\mathcal {D}\) has full column rank, standard perturbation results for unconstrained least squares problems apply. As a consequence of [16, Satz 8.2.7] and Proposition 3, we obtain

Theorem 4

Let \(\omega =\lVert (\mathcal {A}P)^{+}\rVert \lVert {\Delta }\mathcal {A}P\rVert <1\). Then it holds

$$ \lvert{\Delta} c\rvert\leq\frac{\lVert(\mathcal{A}P)^{+}\rVert}{1-\omega}\left\{ \lVert{\Delta}\mathcal{A}P\rVert\left[\lvert c\rvert+\lVert(\mathcal{A}P)^{+}\rVert\lvert\mathfrak{r}\rvert\right]+\lvert{\Delta} r\rvert\right\} $$

and

$$ \begin{array}{@{}rcl@{}} \frac{\lvert{\Delta} c\rvert}{\lvert c\rvert} & \leq\frac{1}{1-\omega}\left\{ \left[\kappa_{\mathcal{C}}(\mathcal{A})+\frac{\lvert\mathfrak{r}\rvert}{\lVert\mathcal{A}P\rVert\lvert c\rvert}\kappa_{\mathcal{C}}(\mathcal{A})^{2}\right]\frac{\lVert{\Delta}\mathcal{A}P\rVert}{\lVert\mathcal{A}P\rVert}\right.\\ & \left.+\frac{\lVert(\mathcal{A}P)^{+}\rVert\lvert r\rvert}{\lvert c\rvert}\cdot\frac{\lvert{\Delta} r\rvert}{\lvert r\rvert}\right\} . \end{array} $$

Here, \(\mathfrak {r}=r-\mathcal {A}c\) and

$$ \kappa_{\mathcal{C}}(\mathcal{A})=\lVert\mathcal{A}P\rVert\lVert(\mathcal{A}P)^{+}\rVert. $$

Theorem 4 corresponds to classical results for unconstrained minimization problems (e.g., [10], [13, Theorem 9.12]) and is a small generalization of them. Let us emphasize that the estimate is independent of the perturbations of \(\mathcal {C}\) as long as the null space of \(\mathcal {C}\) is not changed by the perturbation.
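The reduction to (37) also suggests a simple computational procedure, sketched below with random stand-ins for \(\mathcal {A}\) and \(\mathcal {C}\): compute an orthonormal basis \(\mathcal {D}\) of \(\ker \mathcal {C}\), solve the unconstrained problem, and read off \(\kappa _{\mathcal {C}}(\mathcal {A})\) from the singular values of \(\mathcal {A}\mathcal {D}\) (cf. Proposition 3):

```python
import numpy as np
from scipy.linalg import lstsq, null_space

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 25))     # random stand-in for cal_A
C = rng.standard_normal((5, 25))      # random stand-in for cal_C (full row rank)
r = rng.standard_normal(40)

D = null_space(C)                     # orthonormal basis of ker C: D^T D = I
d, *_ = lstsq(A @ D, r)               # unconstrained problem (37)
c = D @ d                             # solution of (23) subject to C c = 0
assert np.allclose(C @ c, 0.0, atol=1e-10)

s = np.linalg.svd(A @ D, compute_uv=False)
print("kappa_C(A) =", s.max() / s.min())  # = ||A P|| ||(A P)^+|| by Proposition 3
```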

Remark 2

In the case of the Legendre basis, each row of \(\mathcal {C}\) contains only three nonzero entries, equal to 1 and − 1, respectively, possibly scaled by the stepsizes, cf. (16), (17). So we expect \({\Delta }\mathcal {C}=0\), and the estimates of this section apply.

4.2 \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\neq \ker \mathcal {C}\)

The estimation of the error becomes much more involved than in the previous case. In a first step, we will construct a basis for the kernel of the perturbed constraint \((\mathcal {C}+{\Delta }\mathcal {C})z=0\).

Lemma 1

Let \({\varkappa }=\lVert \mathcal {C}^{+}\rVert \lVert {\Delta }\mathcal {C}\rVert <1/2.\) Then \(\mathcal {C}+{\Delta }\mathcal {C}\) has full rank and \(P_{\Delta }=I_{n(mN+k)}-(\mathcal {C}+{\Delta }\mathcal {C})^{+}(\mathcal {C}+{\Delta }\mathcal {C})\) is a projector onto \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\). Furthermore, \(\mathcal {D}_{\Delta }=P_{\Delta }\mathcal {D}\) is a basis of \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\). Moreover, the estimates

$$ \lVert(\mathcal{C}+{\Delta}\mathcal{C})^{+}\rVert\leq\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}} $$

and

$$ \lVert(\mathcal{C}+{\Delta}\mathcal{C})^{+}-\mathcal{C}^{+}\rVert\leq\frac{\sqrt{2}\lVert\mathcal{C}^{+}\rVert^{2}}{1-{\varkappa}}\lVert{\Delta}\mathcal{C}\rVert $$

hold true.

Proof

The assertion that \(\mathcal {C}+{\Delta }\mathcal {C}\) has full rank as well as the error estimates follow from [16, Satz 8.2.5].

To show that \(\mathcal {D}_{\Delta }\) is a basis of \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\), consider

$$ \begin{array}{@{}rcl@{}} (I-P_{\Delta})P & =&(\mathcal{C}+{\Delta}\mathcal{C})^{+}(\mathcal{C}+{\Delta}\mathcal{C})(I-\mathcal{C}^{+}\mathcal{C})\\ & =&(\mathcal{C}+{\Delta}\mathcal{C})^{+}{\Delta}\mathcal{C}(I-\mathcal{C}^{+}\mathcal{C}). \end{array} $$

It holds

$$ \lVert(I-P_{\Delta})P\rVert\leq\lVert(\mathcal{C}+{\Delta}\mathcal{C})^{+}\rVert\lVert{\Delta}\mathcal{C}\rVert\leq\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\lVert{\Delta}\mathcal{C}\rVert\leq\frac{{\varkappa}}{1-{\varkappa}}<1. $$

Therefore, the assumptions of [17, Theorem I-6.34] are fulfilled. Since \(\dim \ker (\mathcal {C}+{\Delta }\mathcal {C})=\dim \ker \mathcal {C}\), the first alternative of that theorem applies and PΔ is a one-to-one mapping of \(\ker \mathcal {C}\) onto \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\). Hence, \(\mathcal {D}_{\Delta }\) is a basis of the latter space. □

By using the bases \(\mathcal {D}\) and \(\mathcal {D}_{\Delta }\), the unperturbed and the perturbed least squares problems become (37) and

$$ \min_{d\in\mathbb{R}^{nmN+k}}\lVert(\mathcal{A}+{\Delta}\mathcal{A})\mathcal{D}_{\Delta}d-(r+{\Delta} r)\rVert. $$
(39)

In a first step, the deviations of the bases shall be estimated. It holds

$$ \begin{array}{@{}rcl@{}} P_{\Delta}-P & =&\mathcal{C}^{+}\mathcal{C}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}(\mathcal{C}+{\Delta}\mathcal{C})\\ & =&\mathcal{C}^{+}\mathcal{C}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}\mathcal{C}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}{\Delta}\mathcal{C}\\ & =&\left[\mathcal{C}^{+}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}\right]\mathcal{C}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}{\Delta}\mathcal{C}. \end{array} $$

Invoking Lemma 1, we obtain

$$ \lVert P_{\Delta}-P\rVert\leq\left[\frac{\sqrt{2}\lVert\mathcal{C}^{+}\rVert^{2}}{1-{\varkappa}}\lVert\mathcal{C}\rVert+\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\right]\lVert{\Delta}\mathcal{C}\rVert=\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert $$

with \(\kappa (\mathcal {C})=\lVert \mathcal {C}^{+}\rVert \lVert \mathcal {C}\rVert \). Consequently,

$$ \lVert\mathcal{D}_{\Delta}-\mathcal{D}\rVert=\lVert(P_{\Delta}-P)\mathcal{D}\rVert\leq\lVert P_{\Delta}-P\rVert\lVert\mathcal{D}\rVert\leq\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert. $$
(40)

Let us transform (39) now. It holds

$$ \begin{array}{@{}rcl@{}} (\mathcal{A}+{\Delta}\mathcal{A})\mathcal{D}_{\Delta} & =(\mathcal{A}+{\Delta}\mathcal{A})\mathcal{D}+(\mathcal{A}+{\Delta}\mathcal{A})(\mathcal{D}_{\Delta}-\mathcal{D})\\ & =\mathcal{A}\mathcal{D}+\mathfrak{R} \end{array} $$

where \(\mathfrak {R}={\Delta }\mathcal {A}\mathcal {D}+(\mathcal {A}+{\Delta }\mathcal {A})(\mathcal {D}_{\Delta }-\mathcal {D})\). The representation of \(\mathfrak {R}\) provides the estimate

$$ \lVert\mathfrak{R}\rVert\leq\lVert{\Delta}\mathcal{A}P\rVert+\lVert\mathcal{A}+{\Delta}\mathcal{A}\rVert\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert. $$
(41)

Denote \(\omega _{\Delta }=\lVert (\mathcal {A}P)^{+}\rVert \lVert \mathfrak {R}\rVert \). The condition ωΔ < 1 is obviously fulfilled if

$$ \lVert(\mathcal{A}P)^{+}\rVert\left\{ \lVert{\Delta}\mathcal{A}P\rVert+\lVert\mathcal{A}+{\Delta}\mathcal{A}\rVert\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert\right\} <1. $$
(42)

Let d + Δd be the solution of (39). Using the fact that \(\mathcal {A}\mathcal {D}\) has full rank, [16, Satz 8.2.7] provides the estimates

$$ \lvert{\Delta} d\rvert\leq\frac{\lVert(\mathcal{A}P)^{+}\rVert}{1-\omega_{\Delta}}\left\{ \lVert\mathfrak{R}\rVert\left[\vert d\rvert+\lVert(\mathcal{A}P)^{+}\rVert\lvert\mathfrak{r}\rvert\right]+\lvert{\Delta} r\rvert\right\} $$
(43)

and

$$ \begin{array}{@{}rcl@{}} \frac{\lvert{\Delta} d\rvert}{\lvert d\vert}\leq & \frac{1}{1-\omega_{\Delta}}\left\{ \left[\kappa_{\mathcal{C}}(\mathcal{A})+\frac{\lvert\mathfrak{r}\rvert}{\lVert\mathcal{A}\mathcal{D}\rVert\lvert d\rvert}\kappa_{\mathcal{C}}(\mathcal{A})^{2}\right]\frac{\lVert\mathfrak{R}\rVert}{\lVert\mathcal{A}\mathcal{D}\rVert}\right. \\ & \left.+\frac{\lVert(\mathcal{A}\mathcal{D})^{+}\rVert\lvert r\rvert}{\rvert d\rvert}\cdot\frac{\vert{\Delta} r\rvert}{\lvert r\rvert}\right\} . \end{array} $$
(44)

with \(\mathfrak {r}=r-\mathcal {A}c\).

Theorem 5

Let \(\lVert {\Delta }\mathcal {A}\rVert \) and \(\lVert {\Delta }\mathcal {C}\rVert \) be sufficiently small such that (42) and \({\varkappa }=\lVert \mathcal {C}^{+}\rVert \lVert {\Delta }\mathcal {C}\rVert <1/2\) hold true. Then it holds

$$ \lvert{\Delta} c\rvert\leq\frac{\lVert(\mathcal{A}P)^{+}\rVert}{1-\omega_{\Delta}}\left\{ \lVert\mathfrak{R}\rVert\left[\lvert c\rvert+\lVert(\mathcal{A}P)^{+}\rVert\lvert\mathfrak{r}\rvert\right]+\lvert{\Delta} r\rvert\right\} +\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert\lvert c\rvert $$

and

$$ \begin{array}{@{}rcl@{}} \frac{\lvert{\Delta} c\rvert}{\lvert c\rvert}&\leq & \frac{1}{1-\omega_{\Delta}}\left\{ \left[\kappa_{\mathcal{C}}(\mathcal{A})+\frac{\lvert\mathfrak{r}\rvert}{\lVert\mathcal{A}P\rVert\lvert c\rvert}\kappa_{\mathcal{C}}(\mathcal{A})^{2}\right]\frac{\lVert\mathfrak{R}\rVert}{\lVert\mathcal{A}P\rVert}\right.\\ && \qquad\qquad\left.+\frac{\lVert(\mathcal{A}P)^{+}\rVert\lvert r\rvert}{\lvert c\rvert}\cdot\frac{\vert{\Delta} r\rvert}{\lvert r\rvert}\right\} +\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert. \end{array} $$

Proof

It holds \(c=\mathcal {D}d\) and \({\Delta } c=\mathcal {D}_{\Delta }{\Delta } d+(\mathcal {D}_{\Delta }-\mathcal {D})d\) such that \(\lvert {\Delta } c\rvert \leq \lvert {\Delta } d\rvert +\lVert P_{\Delta }-P\rVert \lvert d\rvert \). Inserting this estimate in (43) and (44) and using \(\lvert c\rvert =\lvert \mathcal {D}d\rvert =\lvert d\rvert \) provides the claim. □
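The bounds of Theorem 5 are straightforward to check numerically on small dense problems. The following sketch is a toy verification only: the randomly generated A, C, and r merely stand in for the matrices of the discrete problem, and all names are ours. It solves the equality constrained least-squares problem by the nullspace method, applies random perturbations, and compares \(\lvert{\Delta}c\rvert\) with the first bound of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, p = 30, 12, 4                       # residuals, unknowns, constraints

A = rng.standard_normal((M, n))
C = rng.standard_normal((p, n))           # full row rank almost surely
r = rng.standard_normal(M)

def nullspace(C, tol=1e-12):
    """Orthonormal basis of ker C, computed via the SVD."""
    _, s, Vt = np.linalg.svd(C)
    return Vt[np.count_nonzero(s > tol * s[0]):].T

def solve_constrained_ls(A, C, r):
    """min |Ac - r| subject to Cc = 0, by the nullspace method."""
    D = nullspace(C)
    return D @ (np.linalg.pinv(A @ D) @ r), D

c, D = solve_constrained_ls(A, C, r)

eps = 1e-6                                # perturbation level
dA = eps * rng.standard_normal((M, n))
dC = eps * rng.standard_normal((p, n))
dr = eps * rng.standard_normal(M)
c_pert, _ = solve_constrained_ls(A + dA, C + dC, r + dr)

# ingredients of the bound
P = D @ D.T                               # orthogonal projector onto ker C
D_pert = nullspace(C + dC)
R_frak = dA @ D + (A + dA) @ (D_pert @ D_pert.T - P) @ D
Cplus = np.linalg.norm(np.linalg.pinv(C), 2)
kappaC = np.linalg.norm(C, 2) * Cplus     # kappa(C)
varkappa = Cplus * np.linalg.norm(dC, 2)  # must stay below 1/2
APplus = np.linalg.norm(np.linalg.pinv(A @ P), 2)
omega = APplus * np.linalg.norm(R_frak, 2)
r_frak = r - A @ c

bound = (APplus / (1 - omega)
         * (np.linalg.norm(R_frak, 2)
            * (np.linalg.norm(c) + APplus * np.linalg.norm(r_frak))
            + np.linalg.norm(dr))
         + Cplus / (1 - varkappa) * (np.sqrt(2) * kappaC + 1)
         * np.linalg.norm(dC, 2) * np.linalg.norm(c))

print(np.linalg.norm(c_pert - c), "<=", bound)
```

For perturbation levels small enough that (42) holds, the printed left-hand side stays below the bound.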

Remark 3

\(\lvert \mathfrak {r}\rvert \) is a measure for the accuracy of the discrete solution. Let \(x_{\pi}\in X_{\pi}\) denote the discrete solution obtained by minimizing \({\Phi}_{\pi,M}\) (8). Its representation becomes \(c=\mathcal {R}^{-1}x_{\pi }\). Then it holds \(\lvert \mathfrak {r}\rvert ^{2}=\lvert \mathcal {A}c-r\rvert ^{2}={\Phi }_{\pi ,M}(x_{\pi })\). Hence, \({\Phi}_{\pi,M}(x_{\pi})\leq 2\left({\Phi}_{\pi,M}(x_{\ast})+{\Phi}_{\pi,M}(x_{\pi}-x_{\ast})\right)\) with the exact solution \(x_{\ast}\). Under the conditions of Theorem 1, it holds, therefore, \(\lvert \mathfrak {r}\rvert \leq ch^{N-\mu +1}\). □

The critical quantities to estimate the influence of perturbations are \(\kappa _{\mathcal {C}}(\mathcal {A})\) and \(\lVert \mathcal {C}^{+}\rVert \), \(\kappa (\mathcal {C})\) as well as \(\lVert (\mathcal {A}P)^{+}\rVert \). The norms of \(\mathcal {C}\) and its pseudoinverse depend only on the choice of Xπ and the basis chosen for it, but not on the DAE. It holds \(\lVert \mathcal {C}\rVert =\sigma _{\max \limits }(\mathcal {C})\) and \(\lVert \mathcal {C}^{+}\rVert =\sigma _{\min \limits }(\mathcal {C})^{-1}\) with \(\sigma _{\min \limits }(\mathcal {C})\) being the smallest nonvanishing singular value of \(\mathcal {C}\). Since \(\mathcal {C}\) has full row rank, \(\sigma _{\min \limits }(\mathcal {C})=\left (\lambda _{\min \limits }(\mathcal {C}\mathcal {C}^{T})\right )^{1/2}\) and \(\sigma _{\max \limits }(\mathcal {C})=\left (\lambda _{\max \limits }(\mathcal {C}\mathcal {C}^{T})\right )^{1/2}\).

With \(\mathcal {C}\) from (22) we observe that

$$ \mathcal{C}={\Pi}_{1}\left[I_{k}\otimes\mathcal{C}_{\mathrm{s}}\vert\mathcal{O}_{\mathrm{s}}\right]{\Pi}_{2} $$

with

$$ \mathcal{C}_{\mathrm{s}}=\begin{bmatrix}\bar{\mathcal{P}}_{1}(t_{1}) & -\bar{\mathcal{P}}_{2}(t_{1})\\ & \bar{\mathcal{P}}_{2}(t_{2}) & -\bar{\mathcal{P}}_{3}(t_{2})\\ & & {\ddots} & {\ddots}\\ & & & \bar{\mathcal{P}}_{n-1}(t_{n-1}) & -\bar{\mathcal{P}}_{n}(t_{n-1}) \end{bmatrix}\in\mathbb{R}^{(n-1)\times n(N+1)} $$

and \(\mathcal{O}_{\mathrm{s}}\in\mathbb{R}^{k(n-1)\times nN(m-k)}\) consists entirely of zero elements. The permutation matrices \({\Pi}_{1}\) and \({\Pi}_{2}\) are constructed as follows: Let \(x=[x_{1},x_{2},\ldots,x_{m}]^{T}\in\tilde{X}_{\pi}\). First, the equations in \(\mathcal{C}c=0\) are reordered such that all equations related to the first component \(x_{1}\) come first, followed by those for \(x_{2}\), and so on up to \(x_{k}\). This reordering is expressed by \({\Pi}_{1}\). The column permutation \({\Pi}_{2}\) reorders the coefficients such that those describing the differential components are taken first, followed by those belonging to the algebraic components. In particular, the coefficients \(c_{\kappa}\) describing \(x_{\kappa}\) are ordered as \(c_{\kappa}=[c_{1\kappa 0},c_{1\kappa 1},\ldots,c_{1\kappa N},c_{2\kappa 0},\ldots,c_{n\kappa N}]^{T}\). Then we have

$$ \begin{array}{@{}rcl@{}} \mathcal{C}\mathcal{C}^{T} & =&{\Pi}_{1}\left[I_{k}\otimes\mathcal{C}_{\mathrm{s}}\vert\mathcal{O}_{\mathrm{s}}\right]{\Pi}_{2}{{\Pi}_{2}^{T}}\left[\begin{array}{c} I_{k}\otimes\mathcal{C}_{\mathrm{s}}^{T}\\ \mathcal{O}_{\mathrm{s}}^{T} \end{array}\right]{\Pi}_{1}^{T}={\Pi}_{1}(I_{k}\otimes\mathcal{C}_{\mathrm{s}}\mathcal{C}_{\mathrm{s}}^{T}){{\Pi}_{1}^{T}}. \end{array} $$
(45)

Using (16) and (17), it holds

$$ \mathcal{C}_{\mathrm{s}}=C_{\mathrm{s}}\left({\text{diag}}(h_{1},\ldots,h_{n})\otimes I_{N+1}\right) $$

with

$$ C_{\mathrm{s}}=\left[\begin{array}{cccccc} f & -{e_{1}^{T}}\\ & f & -{e_{1}^{T}}\\ & & {\ddots} & {\ddots}\\ & & & & f & -{e_{1}^{T}} \end{array}\right] $$
(46)

where \(e_{1}\) is the first unit vector and \(f=[1,{{\int }_{0}^{1}}p_{0}(\sigma ){d}\sigma ,\ldots ,{{\int }_{0}^{1}}p_{N-1}(\sigma ){d}\sigma ]\) is a row vector with N + 1 components. This leads to

$$ \mathcal{C}_{\mathrm{s}}\mathcal{C}_{\mathrm{s}}^{T}=\left[\begin{array}{ccccc} {h_{1}^{2}}\lvert f\rvert^{2}+{h_{2}^{2}} & -{h_{2}^{2}}\\ -{h_{2}^{2}} & {h_{2}^{2}}\lvert f\rvert^{2}+{h_{3}^{2}} & -{h_{3}^{2}}\\ & {\ddots} & {\ddots} & {\ddots}\\ & & -h_{n-1}^{2} & h_{n-1}^{2}\lvert f\rvert^{2}+{h_{n}^{2}} \end{array}\right]. $$
(47)
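The structure just derived is easy to confirm numerically. A minimal numpy sketch (assuming the blockwise scaling \({\text{diag}}(h_{1},\ldots,h_{n})\otimes I_{N+1}\) above and, for concreteness, the Legendre value f = [1,1,0,…,0] established in Proposition 5 below):

```python
import numpy as np

def Cs_matrix(f, n):
    """Block bidiagonal C_s from (46): row j carries f in block j and
    -e_1^T in block j+1; each block has width N + 1 = len(f)."""
    w = len(f)
    Cs = np.zeros((n - 1, n * w))
    for j in range(n - 1):
        Cs[j, j * w:(j + 1) * w] = f
        Cs[j, (j + 1) * w] = -1.0
    return Cs

f = np.array([1.0, 1.0, 0.0, 0.0])              # Legendre basis, N = 3
n = 6
h = np.array([0.1, 0.2, 0.15, 0.25, 0.1, 0.2])  # stepsizes h_1, ..., h_n

# blockwise scaling diag(h_1, ..., h_n) (x) I_{N+1}
CsH = Cs_matrix(f, n) @ np.kron(np.diag(h), np.eye(len(f)))

# compare C_s C_s^T with the tridiagonal matrix (47)
f2 = f @ f
T = (np.diag(h[:-1]**2 * f2 + h[1:]**2)
     - np.diag(h[1:-1]**2, 1) - np.diag(h[1:-1]**2, -1))
print(np.allclose(CsH @ CsH.T, T))              # True
```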

By (45), the eigenvalues of \(\mathcal {C}\mathcal {C}^{T}\) are those of (47), each with multiplicity k.

Proposition 4

Let the grid (3) have the maximal stepsize h and the minimal stepsize \(h_{\min \limits }\). Then it holds

(1):

\(\lvert f\rvert >1\).

(2):

\(0<h_{\min \limits }^{2}(\lvert f\rvert ^{2}-1)\leq \lambda _{\min \limits }(\mathcal {C}_{s}\mathcal {C}_{s}^{T})\) and \(\lambda _{\max \limits }(\mathcal {C}_{s}\mathcal {C}_{s}^{T})\leq h^{2}(\lvert f\rvert ^{2}+3)\).

Proof

Since the first component of f is equal to 1, we have \(\lvert f\rvert \geq 1\), and \(\lvert f\rvert =1\) if and only if \({{\int }_{0}^{1}}p_{0}(\sigma ){d}\sigma =\cdots ={{\int }_{0}^{1}}p_{N-1}(\sigma ){d}\sigma =0\). Assume that the latter condition holds true. This means in particular that p0,…,pN− 1 are orthogonal to the polynomial \(p(\tau )\equiv 1\in \mathfrak {P}_{N-1}\). The latter space has dimension N. Since \(p_{0},\ldots ,p_{N-1}\in \mathfrak {P}_{N-1}\) are N polynomials orthogonal to p, they span a subspace of dimension at most N − 1 and must be linearly dependent, in contradiction to the assumption that they form a basis. This proves (1).

In order to prove (2), we observe that \(\mathcal {C}_{s}\mathcal {C}_{s}^{T}\) is symmetric such that all eigenvalues are real. Invoking Gershgorin’s circle theorem [16, Theorem 1.2.10], the eigenvalues λ of \(\mathcal {C}_{s}\mathcal {C}_{s}^{T}\) fulfill

$$ \min_{j=1,\ldots,n-1}{h_{j}^{2}}(\lvert f\rvert^{2}-1)\leq\lambda\leq\max_{j=1,\ldots,n-1}\left[{h_{j}^{2}}(\lvert f\rvert^{2}+1)+2h_{j+1}^{2}\right]. $$

This proves (2). □

We obtain immediately the following corollary. Note that f depends only on N and the chosen basis, but not on the grid.

Corollary 1

Let the grids (3) be quasiuniform, that is, \(h/h_{\min \limits }\leq \rho <\infty \) with ρ independent of π. Then it holds \(\kappa (\mathcal {C})\leq \rho \left (\frac {\lvert f\rvert ^{2}+3}{\lvert f\rvert ^{2}-1}\right )^{1/2}\) and \(\lVert \mathcal {C}^{+}\rVert \leq h_{\min \limits }^{-1}(\lvert f\rvert ^{2}-1)^{-1/2}\).

For constant stepsize h, we have \(\mathcal {C}_{\mathrm {s}}\mathcal {C}_{\mathrm {s}}^{T}=h^{2}C_{s}{C_{s}^{T}}\), which is a Toeplitz tridiagonal matrix. In this case, the eigenvalues of \(C_{s}{C_{s}^{T}}\) are given by [18, Theorem 2.2]

$$ \lambda_{j}=1+\lvert f\rvert^{2}-2\cos\left( \frac{j\pi}{n}\right),\quad j=1,\ldots,n-1. $$
(48)
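A quick cross-check of (48), here with \(\lvert f\rvert^{2}=2\) (the Legendre value derived below) and n = 40; the eigenvalues of the tridiagonal Toeplitz matrix are compared with the closed formula:

```python
import numpy as np

n, f2 = 40, 2.0                            # |f|^2 = 2 for the Legendre basis
T = ((1 + f2) * np.eye(n - 1)
     - np.diag(np.ones(n - 2), 1) - np.diag(np.ones(n - 2), -1))
lam = 1 + f2 - 2 * np.cos(np.arange(1, n) * np.pi / n)   # formula (48)
print(np.allclose(np.linalg.eigvalsh(T), np.sort(lam)))  # True
```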

Proposition 5

Let the grid (3) be equidistant with stepsize h, and Cs be given by (46). Then it holds

  • For the Legendre basis \(1\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 5\);

  • For the modified Legendre basis \(2N\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 2N+6\);

  • For the Chebyshev basis \(1\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 4+2\ln 2\);

  • For the Runge-Kutta basis, assume additionally that \({{\int \limits }_{0}^{1}}p_{i}(\sigma ){d}\sigma \geq 0\), i = 0,1,…,N − 1. Then \(N^{-1}\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 5\).

Proof

In the case of the Legendre basis, it holds f = [1,1,0,…,0]. Hence \(\lvert f\rvert ^{2}=2\), and the statement follows.

For the modified Legendre basis, we have f = [1,2,0,2,0,…] such that

$$ \lvert f\rvert^{2}=\begin{cases} 2N+1, & N~\text{even},\\ 2N+3, & N~\text{odd}. \end{cases} $$

For the Chebyshev basis, we observe

$$ {{\int}_{0}^{1}}p_{i}(\sigma){d}\sigma=\begin{cases} \frac{1}{2}\frac{1+(-1)^{i}}{1-i^{2}}, & i\neq1,\\ 0, & i=1. \end{cases} $$

This leads to \(f=[1,1,0,-\frac {1}{3},0,-\frac {1}{15},0,\ldots ]\). Hence,

$$ 2\leq\lvert f\rvert^{2}\leq2+\sum\limits_{i=1}^{\infty}\left( \frac{1}{1-(2i)^{2}}\right)^{2}\leq2+\sum\limits_{i=1}^{\infty}\frac{1}{i(4i^{2}-1)}=1+2\ln2. $$

For the sum of the series, cf. [19, p. 269, series 110.d]. This provides the estimate for the Chebyshev basis.

In the case of the Runge-Kutta basis, it holds \({\sum }_{i=0}^{N-1}p_{i}(\sigma )\equiv 1\). With \(f=[1,f_{2},\ldots,f_{N+1}]\), the additional assumption gives \(f_{i}\geq 0\), and integrating the partition of unity yields \({\sum }_{i=2}^{N+1}f_{i}=1\). Hence, by the Cauchy-Schwarz inequality and \(0\leq f_{i}\leq 1\),

$$ \frac{1}{N}=\frac{1}{N}\left( \sum\limits_{i=2}^{N+1}f_{i}\right)^{2}\leq\sum\limits_{i=2}^{N+1}{f_{i}^{2}}\leq\sum\limits_{i=2}^{N+1}f_{i}=1. $$

This yields \(1+N^{-1}\leq \lvert f\rvert ^{2}\leq 2\) and the claim follows. □
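The values of \(\lvert f\rvert^{2}\) appearing in this proof can be reproduced numerically. The sketch below assumes that the Legendre and Chebyshev bases are the classical polynomials shifted to [0,1] and unnormalized, which reproduces the vectors f used above; the modified Legendre basis is omitted since its definition is not repeated here.

```python
import numpy as np
from numpy.polynomial import legendre, chebyshev

N = 6
x, w = legendre.leggauss(2 * N)            # Gauss rule, exact for the p_i
s, w = (x + 1) / 2, w / 2                  # mapped to [0, 1]

def f_vector(basis):
    """f = [1, int_0^1 p_0, ..., int_0^1 p_{N-1}]."""
    return np.array([1.0] + [w @ basis(i, s) for i in range(N)])

leg = lambda i, s: legendre.Legendre.basis(i)(2 * s - 1)
cheb = lambda i, s: chebyshev.Chebyshev.basis(i)(2 * s - 1)

for name, basis in [("Legendre", leg), ("Chebyshev", cheb)]:
    f = f_vector(basis)
    print(name, np.round(f, 6), "|f|^2 =", f @ f)
# Legendre: |f|^2 = 2;  Chebyshev: |f|^2 ~ 2.117, within [2, 1 + 2 ln 2]
```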

Remark 4

For the Runge-Kutta basis, the values \(f_{i}={{\int \limits }_{0}^{1}}p_{i-1}(\sigma ){d}\sigma \) are just the weights of the interpolatory quadrature rule corresponding to the nodes τ1,…,τN of (18). For a number of common choices of nodes, these weights are known to be positive. Examples are the Gauss-Legendre nodes, Radau nodes, and Lobatto nodes [20, Section 2.7]. The same holds true for Chebyshev nodes and many others; see, e.g., [20, pp. 85f]. □
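The positivity claim can be checked directly: for Gauss-Legendre nodes on [0,1], the interpolatory weights obtained from the moment equations coincide with the mapped Gauss weights and are positive. A small sketch (all names ours):

```python
import numpy as np

N = 7
x, w = np.polynomial.legendre.leggauss(N)
tau = (x + 1) / 2                          # Gauss-Legendre nodes on [0, 1]

# interpolatory weights: sum_j f_{j+1} tau_j^k = int_0^1 s^k ds = 1/(k+1)
V = np.vander(tau, N, increasing=True)     # V[j, k] = tau_j^k
f_weights = np.linalg.solve(V.T, 1 / np.arange(1, N + 1))

print(np.all(f_weights > 0))               # positivity, as stated above
print(np.allclose(f_weights, w / 2))       # they are the mapped Gauss weights
```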

Note that the claims of Proposition 5 could also be shown using Gershgorin’s theorem. This indicates that the estimates of Proposition 4 are rather tight.

Corollary 2

For equidistant grids (3), it holds

  • For the Legendre basis \(\kappa (\mathcal {C})\leq \sqrt {5}\) and \(\lVert \mathcal {C}^{+}\rVert \leq h^{-1}\);

  • For the modified Legendre basis \(\kappa (\mathcal {C})\leq \left (\frac {2N+6}{2N}\right )^{1/2}\) and \(\lVert \mathcal {C}^{+}\rVert \leq (2N)^{-1/2}h^{-1}\);

  • For the Chebyshev basis \(\kappa (\mathcal {C})\leq (4+2\ln 2)^{1/2}\approx 2.32\) and \(\lVert \mathcal {C}^{+}\rVert \leq h^{-1}\);

  • For the Runge-Kutta basis \(\kappa (\mathcal {C})\leq (5N)^{1/2}\) and \(\lVert \mathcal {C}^{+}\rVert \leq N^{1/2}h^{-1}\) provided that \({{\int \limits }_{0}^{1}}p_{i}(\sigma ){d}\sigma \geq 0\), i = 0,1,…,N − 1.

It should be emphasized again that, if \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\neq \ker \mathcal {C}\), it cannot be guaranteed that the solution of the perturbed problem, \(\mathcal {R}(c+{\Delta } c)\), belongs to Xπ; it belongs only to \(\tilde {X}_{\pi }\). Simple algorithms for projecting elements of \(\tilde {X}_{\pi }\) onto Xπ can be derived; a minimal sketch follows below. In our experiments so far, these projections did not lead to better accuracy than the unprojected numerical solutions.
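A minimal sketch of the orthogonal variant of such a projection (the function name is ours): the component of the coefficient vector violating the constraints is removed.

```python
import numpy as np

def project_onto_kernel(C, c_tilde):
    """Orthogonal projection of a coefficient vector onto ker C, i.e. onto
    the set of coefficient vectors representing functions in X_pi."""
    return c_tilde - np.linalg.pinv(C) @ (C @ c_tilde)
```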

5 Some examples

5.1 Conditioning of the representation map \(\mathcal {R}\)

For each selection {p0,…,pN− 1} of basis polynomials, the conditioning of the representation map depends both on the grid and on N. For simplicity, we assume here that an equidistant grid with stepsize h is used for defining Xπ. Besides the bases introduced before, we will additionally consider the Runge-Kutta basis with uniform interpolation points as used in our very first paper on the subject [1].

The norms of the representation map and its inverse have been computed for both settings (mapping into \(L^{2}((a,b),\mathbb {R}^{m})\) and \({H_{D}^{1}}(a,b)\)), for polynomial degrees N = 3, 5, 10, 20, and for stepsizes \(h=n^{-1}\) with n = 10, 20, 40, 80, 160, 320. These are the first observations:

  • \(\sigma _{\min \limits }(\hat {\mathcal {U}})\) is independent of the chosen basis and independent of N for h ≤ 0.1. However, this is not true for larger stepsizes, cf. Table 2.

  • For every basis, \(\sigma _{\max \limits }(\mathcal {U})\approx \sigma _{\max \limits }(\hat {\mathcal {U}})\) up to a relative error below \(10^{-3}\). This coincides with the findings of Theorem 3.

In Tables 1, 2, 3, 4, 5, and 6, we present more detailed results. From these tables, we can draw the following conclusions:

  • The asymptotic behavior with respect to the stepsize h as indicated in Theorem 3 is clearly visible.

  • For both the Legendre and the Chebyshev bases, \(\sigma _{\max \limits }(\mathcal {U})\) and \(\sigma _{\max \limits }(\hat {\mathcal {U}})\) do not depend on N. This is reasonable for the Legendre basis if Proposition 2 is taken into account.

  • The asymptotics of \(\sigma _{\min \limits }(\mathcal {U})\) coincides with the results of Theorem 3 and Proposition 2 for the modified Legendre basis.

  • The norm of the representation map behaves similarly for all considered bases. Not unexpectedly, an exception is the Runge-Kutta basis with uniform nodes, which has a much larger norm than the other bases. Comparing \(\sigma _{\min \limits }(\mathcal {U})\) and \(\sigma _{\max \limits }(\mathcal {U})\) for the different bases, the Legendre and Chebyshev bases on the one hand and the modified Legendre basis on the other hand appear to differ only in their scaling, while their conditioning (being the product of the norms of the representation map and its inverse) is similar. A similar property holds for \(\hat {\mathcal {U}}\).

  • The Runge-Kutta basis has surprisingly good properties. However, these properties depend on the representation with respect to an orthogonal polynomial basis (in the present example, Chebyshev polynomials). Thus, working with it is much more expensive than using the Legendre or Chebyshev bases directly.

Table 1 \(\sigma _{\min \limits }(\hat {\mathcal {U}})\)
Table 2 \(\sigma _{\min \limits }(\hat {\mathcal {U}})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)
Table 3 \(\sigma _{\min \limits }(\mathcal {U})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)
Table 4 \(\sigma _{\max \limits }(\hat {\mathcal {U}})=\sigma _{\max \limits }(\mathcal {U})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)
Table 5 \(\kappa (\hat {\mathcal {U}})=\sigma _{\max \limits }(\hat {\mathcal {U}})/\sigma _{\min \limits }(\hat {\mathcal {U}})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)
Table 6 \(\kappa (\mathcal {U})=\sigma _{\max \limits }(\mathcal {U})/\sigma _{\min \limits }(\mathcal {U})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)

5.2 Conditioning of the constrained minimization problems

In order to provide a first insight into the conditioning of the constrained minimization problem (23)–(22), we computed the condition numbers \(\kappa _{\mathcal {C}}(\mathcal {A})\), which are of crucial importance for the behavior of the computational error. Discussions of \(\kappa (\mathcal {C})\) and \(\lVert \mathcal {C}^{+}\rVert \) have been provided earlier (Proposition 5 and Corollary 2). The examples below are chosen from our earlier investigations that led to surprisingly accurate results.

As before, we use the bases introduced in Section 5.1. We dispense with the Runge-Kutta basis with uniform nodes since it is badly conditioned. We choose M = N + 1 and the Gauss-Legendre nodes as collocation points (6). For this choice, \({\Phi }_{\pi ,M}^{R}={\Phi }_{\pi ,M}^{I}\) (see (12), (11)) and \(\kappa _{\mathcal {C}}(\mathcal {A})\) is identical for both choices.
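The tables below can, in principle, be reproduced from the matrices \(\mathcal{A}\) and \(\mathcal{C}\) of the discrete problem. A minimal sketch (function name ours), using that, consistent with (44), \(\kappa_{\mathcal{C}}(\mathcal{A})=\lVert\mathcal{A}\mathcal{D}\rVert\lVert(\mathcal{A}\mathcal{D})^{+}\rVert\) with the columns of \(\mathcal{D}\) an orthonormal basis of \(\ker\mathcal{C}\):

```python
import numpy as np

def kappa_restricted(A, C, tol=1e-12):
    """Restricted condition number kappa_C(A) = |AD| |(AD)^+|, where the
    columns of D form an orthonormal basis of ker C."""
    _, s, Vt = np.linalg.svd(C)
    D = Vt[np.count_nonzero(s > tol * s[0]):].T   # basis of ker C
    sigma = np.linalg.svd(A @ D, compute_uv=False)
    return sigma[0] / sigma[-1]                   # assumes AD has full rank
```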

Example 1

The first example is an index-3 DAE without dynamic degrees of freedom. It has been used before in numerous papers, e.g., [1, 2, 8]. The problem is given by

$$ \begin{array}{@{}rcl@{}} x^{\prime}_{2}(t)+x_{1}(t) & =&q_{1}(t),\\ t\eta x^{\prime}_{2}(t)+x^{\prime}_{3}(t)+(\eta+1)x_{2}(t) & =&q_{2}(t),\\ t\eta x_{2}(t)+x_{3}(t) & =&q_{3}(t),\quad t\in[0,1]. \end{array} $$

For unique solvability, no boundary or initial conditions are necessary. We choose the exact solution

$$ \begin{array}{@{}rcl@{}} x_{\ast,1}(t) & =&e^{-t}\sin t,\\ x_{\ast,2}(t) & =&e^{-2t}\sin t,\\ x_{\ast,3}(t) & =&e^{-t}\cos t \end{array} $$

and adapt the right-hand side q accordingly. In Table 7, the values of \(\kappa _{\mathcal {C}}(\mathcal {A})\) for \({\Phi }_{\pi ,M}^{R}\) and \({\Phi }_{\pi ,M}^{C}\) are provided. It turns out that the behavior for different functionals is comparable. Therefore, in the following examples, we present only the values for \({\Phi }_{\pi ,M}^{R}\).
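For reproducibility, the right-hand side q can be generated symbolically from the exact solution; a small sympy sketch (the symbol names are ours):

```python
import sympy as sp

t, eta = sp.symbols("t eta")
x1 = sp.exp(-t) * sp.sin(t)
x2 = sp.exp(-2 * t) * sp.sin(t)
x3 = sp.exp(-t) * sp.cos(t)

# insert the exact solution into the left-hand side of the DAE
q1 = sp.diff(x2, t) + x1
q2 = t * eta * sp.diff(x2, t) + sp.diff(x3, t) + (eta + 1) * x2
q3 = t * eta * x2 + x3
print(sp.simplify(q1), sp.simplify(q2), sp.simplify(q3), sep="\n")
```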

Table 7 \(\kappa _{\mathcal {C}}(\mathcal {A})\) for \({\Phi }_{\pi ,M}^{R}\) and \({\Phi }_{\pi ,M}^{C}\). Here, L denotes the Legendre basis, mL the modified Legendre basis, Ch the Chebyshev basis, and RK the Runge-Kutta basis. The smallest values are set in boldface

Example 2

We continue with an example of a Hessenberg index-2 system used previously in [1]. Consider the DAE system

$$ \begin{array}{@{}rcl@{}} x^{\prime}_{1}(t)+\lambda x_{1}(t)-x_{2}(t)-x_{3}(t) & =&q_{1}(t),\\ x^{\prime}_{2}(t)+(\eta t(1-\eta t)-\eta)x_{1}(t)+\lambda x_{2}(t)-\eta tx_{3}(t) & =&q_{2}(t),\\ (1-\eta t)x_{1}(t)+x_{2}(t) & =&q_{3}(t),\quad t\in[0,1], \end{array} $$

with the right-hand side q chosen in such a way that

$$ \begin{array}{@{}rcl@{}} x_{1}(t) & =&e^{-t}\sin t,\\ x_{2}(t) & =&e^{-2t}\sin t,\\ x_{3}(t) & =&e^{-t}\cos t, \end{array} $$

is a solution. The system has one dynamical degree of freedom. We impose the initial condition

$$ x_{1}(0)=0. $$

The results for η = − 25 and λ = − 1 are provided in Table 8.

Table 8 \(\kappa _{\mathcal {C}}(\mathcal {A})\) for \({\Phi }_{\pi ,M}^{R}\). Here, L denotes the Legendre basis, mL the modified Legendre basis, Ch the Chebyshev basis, and RK the Runge-Kutta basis. The smallest values are set in boldface

Example 3

Our next example is a linearized problem proposed by Campbell and Moore [21]. It has been used previously in the experiments in [2, 8, 9] and elsewhere. Let

$$ A(Dx)'(t)+B(t)x(t)=q(t),\quad t\in[0,5], $$

where

$$ \begin{array}{@{}rcl@{}} A=\begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},D=\begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \end{array} $$
$$ \begin{array}{@{}rcl@{}} B(t)=\begin{bmatrix}0 & 0 & 0 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & -1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & -1 & 0\\ 0 & 0 & \sin t & 0 & 1 & -\cos t & -2\rho\cos^{2}t\\ 0 & 0 & -\cos t & -1 & 0 & -\sin t & -2\rho\sin t\cos t\\ 0 & 0 & 1 & 0 & 0 & 0 & 2\rho\sin t\\ 2\rho\cos^{2}t & 2\rho\sin t\cos t & -2\rho\sin t & 0 & 0 & 0 & 0 \end{bmatrix},\quad\rho=5, \end{array} $$

subject to the initial conditions

$$ x_{2}(0)=1,\quad x_{3}(0)=2,\quad x_{5}(0)=0,\quad x_{6}(0)=0. $$

This problem has index 3 and \(l_{\mathrm{dyn}}=4\) dynamical degrees of freedom. The right-hand side q has been chosen in such a way that the exact solution becomes

$$ \begin{array}{@{}rcl@{}} x_{\ast,1} & =&\sin t,\qquad\quad x_{\ast,4}=\cos t,\\ x_{\ast,2} & =&\cos t,\qquad\quad x_{\ast,5}=-\sin t,\\ x_{\ast,3} & =&2\cos^{2}t,\qquad x_{\ast,6}=-2\sin 2t,\\ x_{\ast,7} & =&-\rho^{-1}\sin t. \end{array} $$

The results are shown in Table 9. Note that, in the present example, h = 5/n in contrast to all previous computations where h = 1/n. □

Table 9 \(\kappa _{\mathcal {C}}(\mathcal {A})\) for \({\Phi }_{\pi ,M}^{R}\). Here, L denotes the Legendre basis, mL the modified Legendre basis, Ch the Chebyshev basis, and RK the Runge-Kutta basis. The smallest values are set in boldface

The numerical experiments give rise to the following observations:

  • For a given polynomial degree N and stepsize h, the condition numbers of the discrete problem have almost the same magnitude for all bases.

  • The experiments indicate that the Runge-Kutta basis provides the lowest condition numbers for smaller stepsizes, while for higher-order ansatz functions and larger stepsizes the modified Legendre basis provides the smallest condition numbers.

  • In order to obtain a complete picture of the relative merits of the different bases in the case discussed in Theorem 5, not only the condition number \(\kappa (\mathcal {C})\) of \(\mathcal {C}\) but also the term \(\lVert \mathcal {C}^{+}\rVert \kappa (\mathcal {C})\) has to be taken into account. Corollary 2 shows that the modified Legendre basis is well-suited for higher orders N.

  • If the perturbed solution \(\tilde {c}\) of (36) is projected back onto the nullspace \(\ker \mathcal {C}\), we can assume that the conditions of Theorem 4 are fulfilled. In this case, \(\mathcal {C}\) does not have any influence on the error estimation.

6 Conclusions

In this paper, we investigated the conditioning of the discrete problems arising in the least-squares collocation method for DAEs. In particular, the solution algorithm has been split into a representation map that connects the coefficients of the basis representation to the function to be represented, and a linearly equality constrained linear least-squares problem. A careful investigation of the representation map allowed us to characterize errors in the function spaces in terms of those made in the solution of the discrete problem.

The perturbation estimates for the constrained least-squares problem have been derived with the application in mind: the approximation of a DAE. The constraints play an exceptional role. If they are satisfied, the resulting numerical solution belongs to the solution space \({H_{D}^{1}}(a,b)\). If this cannot be guaranteed, the convergence theory for the least-squares method does not apply. Some of the characterizing quantities could be estimated analytically for reasonable choices of bases while others have been estimated numerically in certain examples. We believe that these considerations contribute to a robust and efficient implementation of the proposed method, which seems to provide surprisingly accurate numerical solutions to higher-index DAEs.