1 Introduction

In a series of papers [1,2,3,4], we have developed a new method for solving higher-index differential-algebraic equations (DAEs). In naturally given functional-analytic settings, higher-index DAEs give rise to ill-posed problems [5, Section 3.9; 6]. Motivated by the well-known method of least-squares, or discretization on the preimage space, for the approximation of ill-posed problems [7], this approach has been adapted to the case of higher-index DAEs. In particular, the ansatz spaces for the discrete least-squares problem have been chosen to be piecewise polynomials. Additionally, the integrals have been replaced by discrete versions based on simplified integration rules, in the simplest approach by a version resembling well-known collocation methods for solving boundary value problems for systems of ordinary differential equations (ODEs). The latter, extremely simplified version of the approach proposed in [7] has been motivated by the success of collocation methods for ODEs. This connection led us to coin the term least-squares collocation method and to call the integration nodes collocation points.

For our method, a number of convergence results for both linear and nonlinear DAEs have been proven. Even our first attempts showed surprisingly accurate results when applying the method to some linear examples [1]. More recently, we investigated the algorithmic ingredients of the method in more detail [8, 9]. Not surprisingly, the basis representation and the choice of the integration nodes have an important influence on the accuracy of the method.

The present note is intended to further quantify the conditioning of the individual ingredients of the implementation of the proposed method and to better understand the (high) accuracy of the computational results obtained so far. Taking the ill-posedness of higher-index DAEs into account, we expect very sensitive discrete problems for sufficiently fine discretizations.

The practical implementation of a projection method consists of two steps for a given approximation space Xπ: the choice of a basis, and the formulation and solution of the arising discrete system by a suitable method. This in turn gives rise to two different operators, the first being the representation map connecting the elements \(x\in X_{\pi}\) with their vectors of coefficients with respect to the chosen basis. The other operator is the discrete version of the least-squares collocation method, which becomes a linearly equality-constrained linear least-squares problem in our case. Both operators are investigated in detail, both analytically and numerically.

In particular, qualitative and quantitative estimations for the condition numbers and norms of the representation map are proven for bases whose usefulness in the present applications has been established earlier [8, 9].

For the constrained linear least-squares problem, a number of perturbation results are well-known, e.g., [10,11,12]. However, in the present application, the constraints play a special role: In the usual choices of the basis functions, some coefficient vectors do not represent a function in the approximation space. A coefficient vector represents a function in the approximation space if and only if the constraints are fulfilled. Therefore, a new error estimation is derived, which takes care of the exceptional role of the constraints. The important ingredients in this estimate are the condition number of the constraints and a restricted condition number for the least-squares functional. For the former, a complete analytical characterization for the chosen bases is provided. In a number of numerical examples, values for the restricted condition number are presented.

In Section 2, the least-squares method for approximating linear DAEs is introduced and the representation map is constructed. Section 3 is devoted to an in-depth investigation of the representation map. Then we derive a perturbation result for constrained linear least-squares problems in Section 4. Numerical examples for the condition numbers of the different ingredients are given in Section 5. Section 6 contains some conclusions.

2 The problem setting

2.1 The discrete functional

In this section, we repeat the problem setting from [8] for the reader’s convenience. Consider a linear boundary-value problem for a DAE with properly involved derivative,

$$ \begin{array}{@{}rcl@{}} A(t)(Dx)'(t)+B(t)x(t) & =&q(t),\quad t\in[a,b], \end{array} $$
(1)
$$ \begin{array}{@{}rcl@{}} G_{a}x(a)+G_{b}x(b) & =&d, \end{array} $$
(2)

with \([a,b]\subset \mathbb {R}\) being a compact interval, \(D=[I\ 0]\in \mathbb {R}^{k\times m}\), k < m, with the identity matrix \(I\in \mathbb {R}^{k\times k}\). Furthermore, \(A(t)\in \mathbb {R}^{m\times k}\), \(B(t)\in \mathbb {R}^{m\times m}\), and \(q(t)\in \mathbb {R}^{m}\) are assumed to be sufficiently smooth with respect to t ∈ [a,b]. Moreover, \(G_{a},G_{b}\in \mathbb {R}^{l_{{dyn}}\times m}\). Here, ldyn is the dynamical degree of freedom of the DAE, that is, the number of free parameters that can be fixed by initial and boundary conditions. We assume further that \(\ker D\subseteq \ker G_{a}\) and \(\ker D\subseteq \ker G_{b}\).

Unlike regular ODEs, where ldyn = k = m, for DAEs it holds that 0 ≤ ldyn ≤ k < m; in particular, ldyn = k for index-one DAEs, ldyn < k for higher-index DAEs, and ldyn = 0 can certainly happen.

The appropriate space for looking for solutions of (1)–(2) is (cf. [2])

$$ {H_{D}^{1}}(a,b):=\{x\in L^{2}((a,b),\mathbb{R}^{m}):Dx\in H^{1}((a,b),\mathbb{R}^{k})\}. $$

Let \(\mathfrak {P}_{K}\) denote the set of all polynomials of degree less than or equal to K ≥ 0. Given the partition π,

$$ \pi:\quad a=t_{0}<t_{1}<\cdots<t_{n}=b, $$
(3)

with the stepsizes \(h_{j}=t_{j}-t_{j-1}\), \(h=\max \limits _{1\leq j\leq n}h_{j}\), and \(h_{\min \limits }=\min \limits _{1\leq j\leq n}h_{j}\). Let \(C_{\pi }([a,b],\mathbb {R}^{m})\) denote the space of piecewise continuous functions having breakpoints merely at the meshpoints of the partition π. Let N ≥ 1 be a fixed integer. We are looking for an approximate solution of our boundary value problem from the ansatz space \(X_{\pi }\subset {H_{D}^{1}}(a,b)\),

$$ \begin{array}{@{}rcl@{}} X_{\pi} &=&\{x\in C_{\pi}([a,b],\mathbb{R}^{m}):Dx\in C([a,b],\mathbb{R}^{k}), \\ && \quad x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\mathfrak{P}_{N}, \kappa=1,\ldots,k,\\ && \quad x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\mathfrak{P}_{N-1}, \kappa=k+1,\ldots,m, j=1,\ldots,n\}. \end{array} $$
(4)

The continuous version of the least-squares method reads: Find an \(x_{\pi }\in X_{\pi }\) that minimizes the functional

$$ {\Phi}(x)={{\int}_{a}^{b}}\lvert A(t)(Dx)'(t)+B(t)x(t)-q(t)\rvert^{2}{d} t+\lvert G_{a}x(a)+G_{b}x(b)-d\rvert^{2}. $$
(5)

Here and in the following, \(\lvert \cdot \rvert \) denotes the Euclidean norm in the corresponding spaces \(\mathbb {R}^{\alpha }\) for the appropriate α. Let 〈⋅,⋅〉 denote the scalar product in \(\mathbb {R}^{\alpha }\).

The functional values Φ(x), which are needed when minimizing over \(x\in X_{\pi }\), cannot be evaluated exactly, and the integral must be discretized accordingly. Taking into account that the boundary-value problem is ill-posed in the higher-index case, perturbations of the functional may have a serious influence on the error of the approximate least-squares solution or even prevent convergence towards the exact solution. Therefore, careful approximations of the integral in Φ are required. We take over the options provided in [8], in which \(M\geq N+1\) so-called collocation points

$$ 0\leq\rho_{1}<\cdots<\rho_{M}\leq1. $$
(6)

are used, and further, on the subintervals of the partition π,

$$ t_{ji}=t_{j-1}+\rho_{i}h_{j},\quad i=1,\ldots,M, j=1,\ldots,n. $$

Introducing, for each \(x\in X_{\pi }\) and \(w(t)=A(t)(Dx)^{\prime }(t)+B(t)x(t)-q(t)\), the corresponding vector \(W\in \mathbb {R}^{mMn}\) by

$$ W=\left[\begin{array}{c} W_{1}\\ \vdots\\ W_{n} \end{array}\right]\in\mathbb{R}^{mMn},\quad W_{j}=h_{j}^{1/2}\left[\begin{array}{c} w(t_{j1})\\ \vdots\\ w(t_{jM}) \end{array}\right]\in\mathbb{R}^{mM}, $$
(7)

we turn to an approximate functional of the form

$$ \begin{array}{@{}rcl@{}} {\Phi}_{\pi,M}(x)=W^{T}\mathcal{L}W+\lvert G_{a}x(a)+G_{b}x(b)-d\rvert^{2},\quad x\in X_{\pi}, \end{array} $$
(8)

with a positive definite symmetric matrix

$$ \begin{array}{@{}rcl@{}} \mathcal{L}={\text{diag}}(L\otimes I_{m},\ldots,L\otimes I_{m}). \end{array} $$
(9)

As detailed in [8], we have different options for the positive definite symmetric matrix \(L\in \mathbb {R}^{M\times M}\), namely

$$ \begin{array}{@{}rcl@{}} L & =&L^{C}=M^{-1}I_{M}, \end{array} $$
(10)
$$ \begin{array}{@{}rcl@{}} L & =&L^{I}={\text{diag}}(\gamma_{1},\ldots,\gamma_{M}), \end{array} $$
(11)
$$ \begin{array}{@{}rcl@{}} L & =&L^{R}=(V^{-1})^{T}V^{-1}, \end{array} $$
(12)

see [8, Section 3] for details concerning the selection of the quadrature weights γ1,…,γM and the construction of the mass matrix V. We emphasize that the matrices LC,LI,LR depend only on M, the node sequence (6), and the quadrature weights, but do not depend on the partition π and its stepsizes at all.
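For illustration, the following minimal Python sketch (ours, not taken from [8]) assembles \(L^{C}\), \(L^{I}\), and the block matrix \({\mathscr{L}}\) of (9), assuming Gauss-Legendre nodes mapped to [0,1] as one admissible choice of the collocation points (6); the version \(L^{R}\) (12) is omitted since it requires the mass matrix V constructed in [8].

```python
import numpy as np

def collocation_rule(M):
    """One admissible choice for (6): Gauss-Legendre nodes mapped to [0,1].
    Returns nodes rho_1 < ... < rho_M and the corresponding weights gamma_i."""
    x, w = np.polynomial.legendre.leggauss(M)     # nodes/weights on [-1,1]
    return (x + 1.0) / 2.0, w / 2.0

def weight_matrices(M, m, n):
    """Assemble L^C (10), L^I (11), and cal_L = diag(L x I_m, ...) of (9)."""
    rho, gamma = collocation_rule(M)
    L_C = np.eye(M) / M                           # (10): L^C = M^{-1} I_M
    L_I = np.diag(gamma)                          # (11): diag(gamma_1,...,gamma_M)
    cal_L = np.kron(np.eye(n), np.kron(L_I, np.eye(m)))  # one block per subinterval
    return L_C, L_I, cal_L

L_C, L_I, cal_L = weight_matrices(M=4, m=3, n=5)
print(cal_L.shape)                                # (n*M*m, n*M*m) = (60, 60)
```

Note that, consistent with the remark above, none of these matrices involves the stepsizes of the partition π.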

In the context of the numerical experiments below, we denote each of the different versions of the functional by \({\Phi }_{\pi ,M}^{C}\), \({\Phi }_{\pi ,M}^{I}\), and \({\Phi }_{\pi ,M}^{R}\), respectively. The following convergence result is known [8, Theorem 2]:

Theorem 1

Let the DAE (1) be regular with index \(\mu \in \mathbb {N}\) and let the boundary condition (2) be accurately stated. Let \(x_{\ast }\) be a solution of the boundary value problem (1)–(2), and let A, B, q and also \(x_{\ast }\) be sufficiently smooth.

Let all partitions π be such that \(h/h_{\min \limits }\leq \rho \), with a global constant ρ. Then, with

$$ M\geq N+\mu, $$

the following statements are true:

  (1)

    For sufficiently fine partitions π and each sequence of arbitrarily placed nodes (6), there exists exactly one \(x_{\pi }^{R}\in X_{\pi }\) minimizing the functional \({\Phi }_{\pi ,M}^{R}\) on Xπ, and

    $$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{R}-x_{\ast}\|_{{H_{D}^{1}}(a,b)}\leq C_{R}h^{N-\mu+1}. \end{array} $$
  (2)

    For each integration rule related to the interval [0,1], with M nodes (6) and positive weights γ1,…,γM, that is exact for polynomials of degree less than or equal to 2M − 2, and for sufficiently fine partitions π, there exists exactly one \(x_{\pi }^{I}\in X_{\pi }\) minimizing the functional \({\Phi }_{\pi ,M}^{I}\) on Xπ, and \(x_{\pi }^{I}=x_{\pi }^{R}\), thus

    $$ \begin{array}{@{}rcl@{}} \|x_{\pi}^{I}-x_{\ast}\|_{{H_{D}^{1}}(a,b)}\leq C_{R}h^{N-\mu+1}. \end{array} $$

A corresponding result for \({\Phi }_{\pi ,M}^{C}\) is not known. Numerical tests showed excellent convergence even in cases not covered by Theorem 1. This holds in particular for any \(M\geq N+1\) tested, in all three cases of the functional Φπ,M. Thus, M = N + 1 seems to be the preferable choice.

2.2 A basis representation of Φπ,M

By choosing an appropriate basis for Xπ, the minimization of the functional (8) is reduced to a minimization problem for the coefficients of the elements \(x\in X_{\pi }\). For the subsequent considerations, it is appropriate to introduce the space

$$ \begin{array}{@{}rcl@{}} \tilde{X}_{\pi} &=&\{x\in C_{\pi}([a,b],\mathbb{R}^{m}): x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\mathfrak{P}_{N}, \kappa=1,\ldots,k,\\ && \quad x_{\kappa}\lvert_{[t_{j-1},t_{j})}\in\mathfrak{P}_{N-1}, \kappa=k+1,\ldots,m, j=1,\ldots,n\}. \end{array} $$
(13)

In particular, the elements x of \(\tilde {X}_{\pi }\) are no longer required to have continuous components Dx. Obviously, it holds \(X_{\pi }\subseteq \tilde {X}_{\pi }\). In general, \(\tilde {X}_{\pi }\) is not a subspace of \({H_{D}^{1}}(a,b)\). However, it holds

$$ \begin{array}{@{}rcl@{}} X_{\pi} & =&\{x\in\tilde{X}_{\pi}:x_{\kappa}\in C[a,b],\quad\kappa=1,\ldots,k\}\\ & =&\tilde{X}_{\pi}\cap {H_{D}^{1}}(a,b). \end{array} $$

Based on the analysis in [8, Section 4], we provide a basis of the ansatz space \(\tilde {X}_{\pi }\) to begin with. Assume that {p0,…,pN− 1} is a basis of \(\mathfrak {P}_{N-1}\) defined on the reference interval [0,1]. Then, \(\{\bar {p}_{0},\ldots ,\bar {p}_{N}\}\) given by

$$ \bar{p}_{i}(\tau)=\begin{cases} 1, & i=0,\\ {\int}_{0}^{\tau}p_{i-1}(\sigma)\mathrm{d}\sigma, & i=1,\ldots,N,\quad\tau\in[0,1], \end{cases} $$
(14)

form a basis of \(\mathfrak {P}_{N}\). The transformation to the interval (tj− 1,tj) of the partition π (3) yields

$$ \begin{array}{@{}rcl@{}} p_{ji}(t)=p_{i}((t-t_{j-1})/h_{j}),\quad\bar{p}_{ji}(t)=h_{j}\bar{p}_{i}((t-t_{j-1})/h_{j}). \end{array} $$
(15)

and in particular

$$ \begin{array}{@{}rcl@{}} \bar{p}_{ji}(t_{j-1}) & =&h_{j}\bar{p}_{i}(0)=h_{j}\begin{cases} 1, & i=0,\\ 0, & i=1,\ldots,N, \end{cases}\\ \bar{p}_{ji}(t_{j}) & =&h_{j}\bar{p}_{i}(1)=h_{j}\begin{cases} 1, & i=0,\\ {{\int}_{0}^{1}}p_{i-1}(\sigma)\mathrm{d}\sigma, & i=1,\ldots,N. \end{cases} \end{array} $$

Next, we form the matrix functions

$$ \begin{array}{@{}rcl@{}} \bar{\mathcal{P}}_{j}=\begin{bmatrix}\bar{p}_{j0} & {\ldots} & \bar{p}_{jN}\end{bmatrix}:[t_{j-1},t_{j}]\rightarrow\mathbb{R}^{1\times(N+1)},\quad\mathcal{P}_{j}=\begin{bmatrix}p_{j0} & {\ldots} & p_{j,N-1}\end{bmatrix}:[t_{j-1},t_{j}]\rightarrow\mathbb{R}^{1\times N}, \end{array} $$

such that

$$ \begin{array}{@{}rcl@{}} \bar{\mathcal{P}}_{j}(t_{j-1}) & =&h_{j}\begin{bmatrix}1 & 0 & {\ldots} & 0\end{bmatrix},\quad j=1,\ldots,n, \end{array} $$
(16)
$$ \begin{array}{@{}rcl@{}} \bar{\mathcal{P}}_{j}(t_{j}) & =&h_{j}\begin{bmatrix}1 & {{\int}_{0}^{1}}p_{0}(\sigma)\mathrm{d}\sigma & {\ldots} & {{\int}_{0}^{1}}p_{N-1}(\sigma)\mathrm{d}\sigma\end{bmatrix},\quad j=1,\ldots,n. \end{array} $$
(17)

Following the discussions in [8], the following bases are suitable in applications:

Legendre basis:

Let Pi denote the Legendre polynomials. Then, pi is chosen to be the shifted Legendre polynomial, that is

$$ p_{i}(\tau)=P_{i}(2\tau-1),\quad i=0,1,\ldots. $$
Modified Legendre basis:

In this case, we set

$$ \bar{p}_{0}(\tau)=1,\quad\bar{p}_{i}(\tau)=P_{i}(2\tau-1)-(-1)^{i},\quad i=1,2,\ldots, $$

such that \(p_{i}=\bar {p}_{i+1}^{\prime }\), i = 0,1,…. This basis has not been considered in [8], but later experiments indicated its usefulness. This is supported by the considerations below.

Chebyshev basis:

Let Ti denote the Chebyshev polynomials of the first kind. Then we define

$$ p_{i}(\tau)=T_{i}(2\tau-1),\quad i=0,1,\ldots. $$
Runge-Kutta basis:

Let 0 < τ1 < ⋯ < τN < 1 be interpolation nodes. Then we set

$$ p_{i}(\tau)=\frac{{\prod}_{\kappa\neq i+1}(\tau-\tau_{\kappa})}{{\prod}_{\kappa\neq i+1}(\tau_{i+1}-\tau_{\kappa})}. $$
(18)

The latter are the usual Lagrange interpolation polynomials. In the implementation, it is advantageous to represent these polynomials in terms of Chebyshev polynomials [8]. The Runge-Kutta basis is particularly useful if the shifted Chebyshev nodes \(\tau _{\kappa }=\frac {1}{2}\left (1+\cos \limits \left (\frac {2\kappa -1}{2N}\pi \right )\right )\) are chosen as interpolation nodes.
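As a small illustration (ours, not from [8]), the following Python sketch constructs the shifted Legendre basis, the antiderivative basis (14), and the modified Legendre basis, and checks the defining relations \(\bar {p}_{i}(0)=0\) and \(\bar {p}_{i}^{\prime }=p_{i-1}\) numerically:

```python
import numpy as np
from numpy.polynomial import Legendre, Polynomial

def p_legendre(i):
    """Shifted Legendre polynomial p_i(tau) = P_i(2 tau - 1) on [0,1]."""
    return Legendre.basis(i, domain=[0, 1]).convert(kind=Polynomial)

def pbar(i):
    """bar{p}_i of (14): 1 for i = 0, else the antiderivative of p_{i-1}."""
    return Polynomial([1.0]) if i == 0 else p_legendre(i - 1).integ(lbnd=0)

def pbar_modified(i):
    """Modified Legendre basis: bar{p}_i(tau) = P_i(2 tau - 1) - (-1)^i."""
    return Polynomial([1.0]) if i == 0 else p_legendre(i) - (-1.0) ** i

tau = np.linspace(0.0, 1.0, 9)
for i in range(1, 5):
    assert abs(pbar(i)(0.0)) < 1e-12                        # bar{p}_i(0) = 0
    assert np.allclose(pbar(i).deriv()(tau), p_legendre(i - 1)(tau))
    assert abs(pbar_modified(i)(0.0)) < 1e-12               # also vanishes at 0
```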

For \(x\in \tilde {X}_{\pi }\) we use the notations

$$ \begin{array}{@{}rcl@{}} x(t)=x_{j}(t)=\begin{bmatrix}x_{j1}(t)\\ \vdots\\ x_{jm}(t) \end{bmatrix}\in\mathbb{R}^{m},\quad Dx_{j}(t)=\begin{bmatrix}x_{j1}(t)\\ \vdots\\ x_{jk}(t) \end{bmatrix}\in\mathbb{R}^{k},\quad t\in[t_{j-1},t_{j}). \end{array} $$

Then, we expand each xj componentwise

$$ \begin{array}{@{}rcl@{}} x_{j\kappa}(t) & =&\sum\limits_{l=0}^{N}c_{j\kappa l}\bar{p}_{jl}(t)=\bar{\mathcal{P}}_{j}(t)c_{j\kappa},\quad\kappa=1,\ldots,k,\\ x_{j\kappa}(t) & =&\sum\limits_{l=0}^{N-1}c_{j\kappa l}p_{jl}(t)=\mathcal{P}_{j}(t)c_{j\kappa},\quad\kappa=k+1,\ldots,m. \end{array} $$
(19)

with

$$ \begin{array}{@{}rcl@{}} c_{j\kappa}=\begin{bmatrix}c_{j\kappa0}\\ \vdots\\ c_{j\kappa N} \end{bmatrix}\in\mathbb{R}^{N+1},\quad\kappa=1,\ldots,k,\quad c_{j\kappa}=\begin{bmatrix}c_{j\kappa0}\\ \vdots\\ c_{j\kappa,N-1} \end{bmatrix}\in\mathbb{R}^{N},\quad\kappa=k+1,\ldots,m. \end{array} $$

Introducing still

$$ \begin{array}{@{}rcl@{}} {\Omega}_{j}(t)=\left[\begin{array}{cc} I_{k}\otimes\bar{\mathcal{P}}_{j}(t) & \mathcal{O}_{1}\\ \mathcal{O}_{2} & I_{m-k}\otimes\mathcal{P}_{j}(t) \end{array}\right]\in\mathbb{R}^{m\times(mN+k)},\quad c_{j}=\begin{bmatrix}c_{j1}\\ \vdots\\ c_{jm} \end{bmatrix}\in\mathbb{R}^{mN+k}, \end{array} $$

with \(\mathcal {O}_{1}\in \mathbb {R}^{k\times (m-k)N}\) and \(\mathcal {O}_{2}\in \mathbb {R}^{(m-k)\times k(N+1)}\) being matrices having only zero entries, we represent, for \(t\in [t_{j-1},t_{j})\), j = 1,…,n,

$$ \begin{array}{@{}rcl@{}} x_{j}(t) & =&{\Omega}_{j}(t)c_{j}, \end{array} $$
(20)
$$ \begin{array}{@{}rcl@{}} (Dx_{j})^{\prime}(t) & =&(D{\Omega}_{j})^{\prime}(t)c_{j}=\begin{bmatrix}I_{k}\otimes\bar{\mathcal{P}}_{j}^{\prime}(t) & \mathcal{O}_{1}\end{bmatrix}c_{j} \end{array} $$
(21)

where \(\bar {\mathcal {P}}_{j}^{\prime }(t)=\begin {bmatrix}0 & p_{j0} & {\ldots } & p_{j,N-1}\end {bmatrix}\). Now we collect all coefficients cjκl in the vector c,

$$ \begin{array}{@{}rcl@{}} c=\begin{bmatrix}c_{1}\\ \vdots\\ c_{n} \end{bmatrix}\in\mathbb{R}^{n(mN+k)}. \end{array} $$

Definition 1

The mapping \(\mathcal {R}:\mathbb {R}^{n(mN+k)}\rightarrow \tilde {X}_{\pi }\) given by (20) is called the representation map of \(\tilde {X}_{\pi }\) with respect to the basis (15).

Fact 1

We observe that each \(x\in \tilde {X}_{\pi }\) has a representation of the kind (20) and each function of the form (20) is an element of \(\tilde {X}_{\pi }\). Since \(\dim \tilde {X}_{\pi }=n(mN+k)\), \(\mathcal {R}\) is a bijective mapping.

Consider an element \(x\in \tilde {X}_{\pi }\) with its representation (20). This element belongs to Xπ if and only if its first k components are continuous. Using the representation (19), we see that \(x\in X_{\pi }\) if and only if

$$ \mathcal{C}c=0, $$
(22)

where \(\mathcal {C}\in \mathbb {R}^{k(n-1)\times n(mN+k)}\) and

$$ \mathcal{C}=\begin{bmatrix}I_{k}\otimes\bar{\mathcal{P}}_{1}(t_{1}) & \mathcal{O}_{1} & -I_{k}\otimes\bar{\mathcal{P}}{}_{2}(t_{1}) & \mathcal{O}_{1}\\ & & I_{k}\otimes\bar{\mathcal{P}}_{2}(t_{2}) & \mathcal{O}_{1} & -I_{k}\otimes\bar{\mathcal{P}}_{3}(t_{2}) & \mathcal{O}_{1}\\ & & & {\ddots} & & \ddots\\ \\ & & & & I_{k}\otimes\bar{\mathcal{P}}_{n-1}(t_{n-1}) & \mathcal{O}_{1} & -I_{k}\otimes\bar{\mathcal{P}}_{n}(t_{n-1}) & \mathcal{O}_{1} \end{bmatrix}. $$

Owing to the construction, \(\mathcal {C}\) has full row rank, cf. (16), (17).
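The following sketch (an illustration of ours, specialized to the Legendre basis, an equidistant partition, and toy dimensions) assembles \(\mathcal {C}\) according to (16), (17), and (22) and confirms the full row rank numerically:

```python
import numpy as np

def pbar_end_values(N):
    """End values from (16)-(17) for the Legendre basis on the reference
    interval: bar{P}(0) = [1,0,...,0] and bar{P}(1) = [1,1,0,...,0], since
    int_0^1 P_i(2s-1) ds = 1 for i = 0 and 0 otherwise."""
    left = np.zeros(N + 1); left[0] = 1.0
    right = np.zeros(N + 1); right[0] = right[1] = 1.0
    return left, right

def constraint_matrix(n, N, m, k, h):
    """cal_C in R^{k(n-1) x n(mN+k)} from (22); h holds the stepsizes h_j."""
    blk = m * N + k                                   # columns per subinterval
    C = np.zeros((k * (n - 1), n * blk))
    left, right = pbar_end_values(N)
    for j in range(n - 1):                            # continuity at t_{j+1}
        rows = slice(k * j, k * (j + 1))
        C[rows, j * blk: j * blk + k * (N + 1)] = np.kron(np.eye(k), h[j] * right)
        C[rows, (j + 1) * blk: (j + 1) * blk + k * (N + 1)] = \
            -np.kron(np.eye(k), h[j + 1] * left)
    return C

n, N, m, k = 5, 3, 3, 2
C = constraint_matrix(n, N, m, k, h=np.full(n, 1.0 / n))
print(C.shape, np.linalg.matrix_rank(C))              # (8, 55) 8: full row rank
```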

Fact 2

Let \(\tilde {\mathcal {R}}=\left .\mathcal {R}\right \rvert _{\ker \mathcal {C}}\) be the restriction of the representation map \(\mathcal {R}\) onto the kernel \(\ker \mathcal {C}\) of \(\mathcal {C}\). Since \(\mathcal {C}\) has full row rank, \(\dim \ker \mathcal {C}=n(mN+k)-k(n-1)=nmN+k=\dim X_{\pi }\), and \(\mathcal {R}\) is injective, \(\tilde {\mathcal {R}}\) is bijective. In particular, it holds also \(\tilde {\mathcal {R}}^{-1}=\left .\mathcal {R}^{-1}\right \rvert _{\text {im}\tilde {\mathcal {R}}}\).

The representations (20)–(21) can be inserted into the functional Φπ,M (8). The result becomes a least-squares functional of the form

$$ \varphi(c)=\lvert\mathcal{A}c-r\rvert_{\mathbb{R}^{nmM+l_{dyn}}}^{2}\rightarrow\min! $$
(23)

where \(\mathcal {A}\) has the structure

$$ \mathcal{A}=\left[\begin{array}{ccccc} \mathcal{A}_{1} & 0 & {\cdots} & & 0\\ 0 & {\ddots} & & & \vdots\\ {\vdots} & & \ddots\\ & & & {\ddots} & 0\\ 0 & & & & \mathcal{A}_{n}\\ G_{a}{\Omega}_{1}(t_{0}) & 0 & {\cdots} & 0 & G_{b}{\Omega}_{n}(t_{n}) \end{array}\right] $$

where \(\mathcal {A}_{j}\in \mathbb {R}^{mM\times (mN+k)}\) and \(G_{a}{\Omega }_{1}(t_{0}),G_{b}{\Omega }_{n}(t_{n})\in \mathbb {R}^{l_{{dyn}}\times (mN+k)}\).

So the discrete version of the least-squares method (8) becomes the linear least-squares problem (23) under the linear equality constraint (22).

Note that it holds \(r\in \mathbb {R}^{nmM+l_{{dyn}}}\) and \(\mathcal {A}\in \mathbb {R}^{(nmM+l_{{dyn}})\times n(mN+k)}\). The matrices \(\mathcal {A}\) and \(\mathcal {C}\) are very sparse. More details of the construction of \(\mathcal {A}\) and \(\mathcal {C}\) can be found in [9].

2.3 Conditioning of the implementation

The implementation for solving the least-squares problem (8) consists of the following steps:

  1. Form \(\mathcal {A}\), \(\mathcal {C}\), and r.

  2. Solve the constrained least-squares problem (23)–(22).

  3. Form the approximation xπ.

What are the errors to be expected? Consider the individual steps:

  1.

    The computation of \(\mathcal {C}\) is not critical. Depending on the chosen basis, the entries of \(\mathcal {C}\) may be available analytically. So we expect at most rounding errors from the representation of the analytical data. While the components of \(\mathcal {A}\) corresponding to the boundary conditions are only subject to truncation errors when representing real numbers in floating-point arithmetic, the DAE-related entries are subject to rounding errors as well as to certain amplification factors stemming from the multiplication by the square root of the matrix \({\mathscr{L}}\) (9). The conditioning of the versions (10) and (11) is easy to infer, while that of (12) has been discussed extensively in [8]. Under reasonable assumptions on the choice of collocation points, these amplification factors are rather small.

    Similar considerations apply to the computation of r.

  2.

    This algorithmic step corresponds to the solution of a linearly constrained linear least-squares problem. A number of classical perturbation results are available, e.g., [11,12,13]. Further below, we present a modified version that takes into account the special role that the equality constraint \(\mathcal {C}c=0\) plays in our application.

  3.

    This step is described by the representation map \(\mathcal {R}\), which assigns, to each solution c of the previous step, the corresponding solution \(x_{\pi }=\mathcal {R}c\). If \(c\in \ker \mathcal {C}\), it holds \(x_{\pi }\in X_{\pi }\subseteq {H_{D}^{1}}(a,b)\). However, due to the errors made in the previous step, the condition \(c\in \ker \mathcal {C}\) cannot be guaranteed; hence, \(\mathcal {R}c\in \tilde {X}_{\pi }\), but not necessarily \(\mathcal {R}c\in {H_{D}^{1}}(a,b)\)! In the next section, we will discuss the properties of \(\mathcal {R}\).

3 Properties of the representation map \(\mathcal {R}\)

In the present section, we will investigate the properties of the representation map \(\mathcal {R}:\mathbb {R}^{n(mN+k)}\rightarrow \tilde {X}_{\pi }\) in more detail. Previously, we have established a representation of \(\mathcal {R}\) on each subinterval; see (20). We intend to derive a representation of \(\mathcal {R}^{-1}\). The main tool will be interpolation.

Choose two sets of interpolation nodes

$$ \begin{array}{@{}rcl@{}} 0\leq\bar{\sigma}_{1}<\cdots<\bar{\sigma}_{N+1}\leq1\quad\text{and}\quad 0\leq\sigma_{1}<\cdots<\sigma_{N}\leq1, \end{array} $$
(24)

and shifted ones

$$ \bar{\tau}_{ji}=t_{j-1}+\bar{\sigma}_{i}h_{j},\quad\tau_{ji}=t_{j-1}+\sigma_{i}h_{j} $$

such that the integration formulae

$$ \begin{array}{@{}rcl@{}} {{\int}_{0}^{1}}f(\sigma){d}\sigma\approx\sum\limits_{i=1}^{N+1}\bar{\gamma}_{i}f(\bar{\sigma}_{i})\quad\text{and}\quad{{\int}_{0}^{1}}f(\sigma){d}\sigma\approx\sum\limits_{i=1}^{N}\gamma_{i}f(\sigma_{i}) \end{array} $$

have positive weights and are exact for polynomials up to degree 2N and 2N − 2, respectively. With matrices

$$ \begin{array}{@{}rcl@{}} \bar{V}_{j} & = &\begin{bmatrix}\bar{p}_{j0}(\bar{\tau}_{j1}) & {\cdots} & \bar{p}_{jN}(\bar{\tau}_{j1})\\ {\vdots} & & \vdots\\ \bar{p}_{j0}(\bar{\tau}_{j,N+1}) & {\cdots} & \bar{p}_{jN}(\bar{\tau}_{j,N+1}) \end{bmatrix}\!=h_{j}\begin{bmatrix}\bar{p}_{0}(\bar{\sigma}_{1}) & {\cdots} & \bar{p}_{N}(\bar{\sigma}_{1})\\ {\vdots} & & \vdots\\ \bar{p}_{0}(\bar{\sigma}_{N+1}) & {\cdots} & \bar{p}_{N}(\bar{\sigma}_{N+1}) \end{bmatrix}\!=:h_{j}\bar{V}, \end{array} $$
(25)
$$ \begin{array}{@{}rcl@{}} V_{j} & =&\begin{bmatrix}p_{j0}(\tau_{j1}) & {\cdots} & p_{j,N-1}(\tau_{j1})\\ {\vdots} & & \vdots\\ p_{j0}(\tau_{jN}) & {\cdots} & p_{j,N-1}(\tau_{jN}) \end{bmatrix}=\begin{bmatrix}p_{0}(\sigma_{1}) & {\cdots} & p_{N-1}(\sigma_{1})\\ {\vdots} & & \vdots\\ p_{0}(\sigma_{N}) & {\cdots} & p_{N-1}(\sigma_{N}) \end{bmatrix}=:V, \end{array} $$
(26)

and

$$ \begin{array}{@{}rcl@{}} \bar{V}_{j}^{\prime} & =\begin{bmatrix}\bar{p}^{\prime}_{j0}(\bar{\tau}_{j1}) & {\cdots} & \bar{p}^{\prime}_{j N}(\bar{\tau}_{j1})\\ {\vdots} & & \vdots\\ \bar{p}^{\prime}_{j0}(\bar{\tau}_{j,N+1}) & {\cdots} & \bar{p}^{\prime}_{jN}(\bar{\tau}_{j,N+1}) \end{bmatrix}=\begin{bmatrix}0 & p_{0}(\bar{\sigma}_{1}) & {\cdots} & p_{N-1}(\bar{\sigma}_{1})\\ {\vdots} & {\vdots} & & \vdots\\ 0 & p_{0}(\bar{\sigma}_{N+1}) & {\cdots} & p_{N-1}(\bar{\sigma}_{N+1}) \end{bmatrix}=:\mathring{V}, \end{array} $$
(27)

we represent, for κ = 1,…,k,

$$ \begin{array}{@{}rcl@{}} X_{j\kappa}:=\begin{bmatrix}x_{j\kappa}(\bar{\tau}_{j1})\\ \vdots\\ x_{j\kappa}(\bar{\tau}_{j,N+1}) \end{bmatrix}=\bar{V}_{j}c_{j\kappa}=h_{j}\bar{V}c_{j\kappa},\\ X^{\prime}_{j\kappa}:=\begin{bmatrix}x^{\prime}_{j\kappa}(\bar{\tau}_{j1})\\ \vdots\\ x^{\prime}_{j\kappa}(\bar{\tau}_{j,N+1}) \end{bmatrix}=\bar{V}^{\prime}_{j}c_{j\kappa}=\mathring{V}c_{j\kappa}, \end{array} $$

and, for κ = k + 1,…,m,

$$ \begin{array}{@{}rcl@{}} X_{j\kappa}:=\begin{bmatrix}x_{j\kappa}(\tau_{j1})\\ \vdots\\ x_{j\kappa}(\tau_{jN}) \end{bmatrix}=V_{j}c_{j\kappa}=Vc_{j\kappa}. \end{array} $$

The matrices \(\bar {V}\) and V are nonsingular. This amounts to the relation

$$ \begin{array}{@{}rcl@{}} c_{j}=\begin{bmatrix}c_{j1}\\ \vdots\\ c_{jk}\\ c_{j,k+1}\\ \vdots\\ c_{jm} \end{bmatrix}=\begin{bmatrix}I_{k}\otimes\bar{V}^{-1}\\ & I_{m-k}\otimes V^{-1} \end{bmatrix}\begin{bmatrix}\frac{1}{h_{j}}X_{j1}\\ \vdots\\ \frac{1}{h_{j}}X_{jk}\\ X_{j,k+1}\\ \vdots\\ X_{jm} \end{bmatrix}, j=1,\ldots,n. \end{array} $$
(28)

Owing to the fact that polynomials of degree N and N − 1 are uniquely determined by their values at N + 1 and N distinct nodes, respectively, formula (28) provides \(c=\mathcal {R}^{-1}x\) for each arbitrarily given \(x\in \tilde {X}_{\pi }\).
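A minimal sketch of this inversion for the differential components on a single subinterval may look as follows; for simplicity, we assume uniform nodes \(\bar {\sigma }_{i}\), which suffice for the interpolation argument even though the quadrature exactness required below would suggest Gauss-type nodes:

```python
import numpy as np
from numpy.polynomial import Legendre, Polynomial

def pbar(i):
    """Antiderivative basis (14) over the shifted Legendre polynomials."""
    if i == 0:
        return Polynomial([1.0])
    return Legendre.basis(i - 1, domain=[0, 1]).convert(kind=Polynomial).integ(lbnd=0)

N, h_j = 4, 0.1
sbar = np.linspace(0.0, 1.0, N + 1)          # assumed nodes bar{sigma}_i from (24)
Vbar = np.array([[pbar(i)(s) for i in range(N + 1)]
                 for s in sbar])             # bar{V} of (25), reference interval

c = np.random.default_rng(0).standard_normal(N + 1)   # coefficients c_{j,kappa}
X = h_j * Vbar @ c                           # nodal values X_{j,kappa} = h_j bar{V} c
c_back = np.linalg.solve(Vbar, X / h_j)      # (28): interpolation recovers c
assert np.allclose(c, c_back)
```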

Next, we equip \(\tilde {X}_{\pi }\) with the norms

$$ \begin{array}{@{}rcl@{}} \lVert x\rVert_{L^{2}}^{2} & =&\sum\limits_{j=1}^{n}\left\{ \sum\limits_{\kappa=1}^{k}{\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t+ \sum\limits_{\kappa=k+1}^{m}{\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t \right\} , \end{array} $$
(29)
$$ \begin{array}{@{}rcl@{}} \lVert x\rVert_{H_{D,\pi}^{1}}^{2} & =&\sum\limits_{j=1}^{n}\left\{ \sum\limits_{\kappa=1}^{k}{\int}_{t_{j-1}}^{t_{j}}(\lvert x_{j\kappa}(t)\rvert^{2}+\lvert x_{j\kappa}^{\prime}(t)\rvert^{2}){d} t+ \sum\limits_{\kappa=k+1}^{m}{\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t \right\} . \end{array} $$
(30)

The latter norm reduces, for \(x\in X_{\pi }\), to \(\lVert x\rVert _{H_{D,\pi }^{1}}=\|x\|_{{H_{D}^{1}}(a,b)}\). Moreover, \(\lVert \cdot \rVert _{L^{2}}=\lVert \cdot \rVert _{L^{2}((a,b),\mathbb {R}^{m})}\). On \(\mathbb {R}^{n(mN+k)}\), we use the Euclidean norm. Then \(\mathcal {R}\) becomes a homeomorphism in each case, and we are interested in the respective operator norms \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow L^{2}}\), \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow H_{D,\pi }^{1}}\), \(\|\mathcal {R}^{-1}\|_{L^{2}\rightarrow \mathbb {R}^{n(mN+k)}}\), and \(\|\mathcal {R}^{-1}\|_{H_{D,\pi }^{1}\rightarrow \mathbb {R}^{n(mN+k)}}\). Regarding the properties of the related integration formulae and introducing the diagonal matrices

$$ \bar{\Gamma}={\text{diag}}(\bar{\gamma}_{1}^{1/2},\cdots,\bar{\gamma}_{N+1}^{1/2}), {\Gamma}={\text{diag}}(\gamma_{1}^{1/2},\cdots,\gamma_{N}^{1/2}) $$
(31)

we compute for any \(x=\mathcal {R}c\), and κ = 1,…,k,

$$ \begin{array}{@{}rcl@{}} {\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t & =&h_{j}\sum\limits_{i=1}^{N+1}\bar{\gamma}_{i}\lvert x_{j\kappa}(\bar{\tau}_{ji})\rvert^{2}=h_{j}\sum\limits_{i=1}^{N+1}\lvert\bar{\gamma}_{i}^{1/2}x_{j\kappa}(\bar{\tau}_{ji})\rvert^{2}=h_{j}\lvert\bar{\Gamma}X_{j\kappa}\rvert^{2}\\ & =&h_{j}\lvert\bar{\Gamma}\bar{V}_{j}c_{j\kappa}\rvert^{2}=h_{j}\lvert\bar{\Gamma}h_{j}\bar{V}c_{j\kappa}\rvert^{2}, \end{array} $$
$$ \begin{array}{@{}rcl@{}} {\int}_{t_{j-1}}^{t_{j}} & &\!\!\!\!\!\!\!\!\!(\lvert x_{j\kappa}(t)\rvert^{2}+\lvert x_{j\kappa}^{\prime}(t)\rvert^{2}){d} t=h_{j}\sum\limits_{i=1}^{N+1}\bar{\gamma}_{i}(\lvert x_{j\kappa}(\bar{\tau}_{ji})\rvert^{2}+\lvert x_{j\kappa}^{\prime}(\bar{\tau}_{ji})\rvert^{2})\\ & =&h_{j}\sum\limits_{i=1}^{N+1}(\lvert\bar{\gamma}_{i}^{1/2}x_{j\kappa}(\bar{\tau}_{ji})\rvert^{2}+(\lvert\bar{\gamma}_{i}^{1/2}x_{j\kappa}^{\prime}(\bar{\tau}_{ji})\rvert^{2})=h_{j}\lvert\bar{\Gamma}X_{j\kappa}\rvert^{2}+h_{j}\lvert\bar{\Gamma}X_{j\kappa}^{\prime}\rvert^{2}\\ & =&h_{j}\lvert\bar{\Gamma}\bar{V}_{j}c_{j\kappa}\rvert^{2}+h_{j}\lvert\bar{\Gamma}\mathring{V}c_{j\kappa}\rvert^{2}=h_{j}\lvert\bar{\Gamma}h_{j}\bar{V}c_{j\kappa}\rvert^{2}+h_{j}\lvert\bar{\Gamma}\mathring{V}c_{j\kappa}\rvert^{2}\\ & =&h_{j}\left\lvert\begin{bmatrix}h_{j}\bar{\Gamma}\bar{V}\\ \bar{\Gamma}\mathring{V} \end{bmatrix}c_{j\kappa}\right\rvert^{2}, \end{array} $$

and, in addition, for κ = k + 1,…,m,

$$ \begin{array}{@{}rcl@{}} {\int}_{t_{j-1}}^{t_{j}}\lvert x_{j\kappa}(t)\rvert^{2}{d} t & =&h_{j}\sum\limits_{i=1}^{N}\gamma_{i}\lvert x_{j\kappa}(\tau_{ji})\rvert^{2}=h_{j}\sum\limits_{i=1}^{N}\lvert\gamma_{i}^{1/2}x_{j\kappa}(\tau_{ji})\rvert^{2}=h_{j}\lvert{\Gamma} X_{j\kappa}\rvert^{2}\\ & =&h_{j}\lvert{\Gamma} Vc_{j\kappa}\rvert^{2}. \end{array} $$

Summarizing, the following representations result:

$$ \begin{array}{@{}rcl@{}} \lVert x\rVert_{L^{2}}^{2}=\sum\limits_{j=1}^{n}\left\{ \sum\limits_{\kappa=1}^{k}\lvert h_{j}^{3/2}\bar{\Gamma}\bar{V}c_{j\kappa}\rvert^{2}+\sum\limits_{\kappa=k+1}^{m}\lvert h_{j}^{1/2}{\Gamma} Vc_{j\kappa}\rvert^{2}\right\} =\sum\limits_{j=1}^{n}\lvert U_{j}c_{j}\rvert^{2}=\lvert\mathcal{U}c\rvert^{2}, \end{array} $$
(32)

with matrices

$$ \begin{array}{@{}rcl@{}} \mathcal{U} & =&{\text{diag}}(U_{1},\cdots,U_{n})\in\mathbb{R}^{n(mN+k)\times n(mN+k)},\\ U_{j} & =&\begin{bmatrix}I_{k}\otimes h_{j}^{3/2}\bar{\Gamma}\bar{V}\\ & I_{m-k}\otimes h_{j}^{1/2}{\Gamma} V \end{bmatrix}\in\mathbb{R}^{(mN+k)\times(mN+k)}, \end{array} $$
(33)

and

$$ \begin{array}{@{}rcl@{}} \lVert x\rVert_{H_{D,\pi}^{1}}^{2}=\sum\limits_{j=1}^{n}\left\{ \sum\limits_{\kappa=1}^{k}\left\lvert\begin{bmatrix}h_{j}^{3/2}\bar{\Gamma}\bar{V}\\ h_{j}^{1/2}\bar{\Gamma}\mathring{V} \end{bmatrix}c_{j\kappa}\right\rvert^{2}+\sum\limits_{\kappa=k+1}^{m}\lvert h_{j}^{1/2}{\Gamma} Vc_{j\kappa}\rvert^{2}\right\} =\sum\limits_{j=1}^{n}\lvert\hat{U}_{j}c_{j}\rvert^{2}=\lvert\hat{\mathcal{U}}c\rvert^{2}, \end{array} $$
(34)

with matrices

$$ \begin{array}{@{}rcl@{}} \hat{\mathcal{U}} & =&{\text{diag}}(\hat{U}_{1},\cdots,\hat{U}_{n})\in\mathbb{R}^{n(mN+k+k(N+1))\times n(mN+k)},\\ \hat{U}_{j} & =&\begin{bmatrix}I_{k}\otimes\begin{bmatrix}h_{j}^{3/2}\bar{\Gamma}\bar{V}\\ h_{j}^{1/2}\bar{\Gamma}\mathring{V} \end{bmatrix}\\ & I_{m-k}\otimes h_{j}^{1/2}{\Gamma} V \end{bmatrix}\in\mathbb{R}^{(mN+k+k(N+1))\times(mN+k)}. \end{array} $$
(35)

Proposition 1

The singular values of \(\mathcal {U}\) and \(\hat {\mathcal {U}}\) are independent of the choice of the nodes σi and \(\bar {\sigma }_{i}\). Moreover, all singular values are positive.

Proof

Uj and \(\hat {U}_{j}\) have full column rank. Consequently, \(\mathcal {U}^{T}\mathcal {U}\) and \(\hat {\mathcal {U}}^{T}\hat {\mathcal {U}}\) are symmetric and positive definite. Hence, their eigenvalues are all positive, and thus so are their singular values, which are the square roots of these eigenvalues. The eigenvalues are independent of the choice of the nodes σi and \(\bar {\sigma }_{i}\) since, owing to the properties of the involved integration formulae, it holds that

$$ \begin{array}{@{}rcl@{}} (V^{T}{\Gamma}^{2}V)_{\alpha\upbeta} & =&{{\int}_{0}^{1}}p_{\alpha-1}(\sigma)p_{\upbeta-1}(\sigma){d}\sigma, \alpha,\upbeta=1,\cdots,N,\\ (\bar{V}^{T}\bar{\Gamma}^{2}\bar{V})_{\alpha\upbeta} & =&{{\int}_{0}^{1}}\bar{p}_{\alpha-1}(\sigma)\bar{p}_{\upbeta-1}(\sigma){d}\sigma, \alpha,\upbeta=1,\cdots,N+1,\\ (\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V})_{\alpha\upbeta} & =&{{\int}_{0}^{1}}\bar{p}^{\prime}_{\alpha-1}(\sigma)\bar{p}^{\prime}_{\upbeta-1}(\sigma){d}\sigma, \alpha,\upbeta=1,\cdots,N+1, \end{array} $$

such that the entries of \(\mathcal {U}^{T}\mathcal {U}\) and \(\hat {\mathcal {U}}^{T}\hat {\mathcal {U}}\) are independent of the choice of the integration formulae. □

Theorem 2

Let \(\sigma _{{\min \limits }}(\mathcal {U})\) and \(\sigma _{{\max \limits }}(\mathcal {U})\) denote the minimal and maximal singular values of \(\mathcal {U}\), respectively. Similarly, let \(\sigma _{{\min \limits }}(\mathcal {\hat {U}})\) and \(\sigma _{{\max \limits }}(\mathcal {\hat {U}})\) denote the minimal and maximal singular values of \(\hat {\mathcal {U}}\). Then it holds

$$ \begin{array}{@{}rcl@{}} \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow L^{2}} & =&\sigma_{{\max}}(\mathcal{U}),\quad\|\mathcal{R}^{-1}\|_{L^{2}\rightarrow\mathbb{R}^{n(mN+k)}}=\sigma_{{\min}}(\mathcal{U})^{-1},\\ \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow H_{D,\pi}^{1}} & =&\sigma_{{\max}}(\mathcal{\hat{U}}),\quad\|\mathcal{R}^{-1}\|_{H_{D,\pi}^{1}\rightarrow\mathbb{R}^{n(mN+k)}}=\sigma_{{\min}}(\mathcal{\hat{U}})^{-1}. \end{array} $$

Proof

It holds \(\mathcal {\hat {U}}\in \mathbb {R}^{\nu \times \lambda }\) with ν = n(mN + k + k(N + 1)) and λ = n(mN + k). Let \(\hat {\mathcal {U}}=U{\Sigma } V^{T}\) be the singular value decomposition of \(\hat {\mathcal {U}}\). Here,

$$ {\Sigma}=\left[\begin{array}{ccc} s_{1}\\ & \ddots\\ & & s_{\lambda}\\ 0 & {\cdots} & 0 \end{array}\right]\in\mathbb{R}^{\nu\times\lambda} $$

with \(s_{1}=\sigma _{{\max \limits }}(\hat {\mathcal {U}})\) and \(s_{\lambda }=\sigma _{{\min \limits }}(\hat {\mathcal {U}})\). According to Proposition 1, \(\sigma _{{\min \limits }}(\hat {\mathcal {U}})>0\). By (34), this leads to

$$ \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow H_{D,\pi}^{1}}=\sup_{c\neq0}\frac{\lVert\mathcal{R}c\rVert_{H_{D,\pi}^{1}}}{\lvert c\rvert_{\mathbb{R}^{\lambda}}}=\sup_{c\neq0}\frac{\lvert\hat{\mathcal{U}}c\rvert_{\mathbb{R}^{\nu}}}{\lvert c\rvert_{\mathbb{R}^{\lambda}}}=\sup_{\chi\neq0}\frac{\lvert{\Sigma}\chi\rvert_{\mathbb{R}^{\nu}}}{\lvert\chi\rvert_{\mathbb{R}^{\lambda}}}=\sigma_{{\max}}(\hat{\mathcal{U}}) $$

and

$$ \lVert\mathcal{R}^{-1}\rVert_{H_{D,\pi}^{1}\rightarrow\mathbb{R}^{n(mN+k)}}=\sup_{x\neq0}\frac{\lvert\mathcal{R}^{-1}x\rvert_{\mathbb{R}^{\lambda}}}{\lVert x\rVert_{H_{D,\pi}^{1}}}=\sup_{c\neq0}\frac{\lvert c\rvert_{\mathbb{R}^{\lambda}}}{\lVert\mathcal{R}c\rVert_{H_{D,\pi}^{1}}}=\sup_{\chi\neq0}\frac{\lvert\chi\rvert_{\mathbb{R}^{\lambda}}}{\lvert{\Sigma}\chi\rvert_{\mathbb{R}^{\nu}}}=\sigma_{{\min}}(\hat{\mathcal{U}})^{-1}. $$

The statements concerning \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow L^{2}}\) and \(\|\mathcal {R}^{-1}\|_{L^{2}\rightarrow \mathbb {R}^{n(mN+k)}}\) follow similarly. □

Using the structure (33) of \(\mathcal {U}\), we obtain

$$ \begin{array}{@{}rcl@{}} \sigma_{{\max}}(\mathcal{U}) & =&\max_{j=1,\ldots,n}\max\{h_{j}^{3/2}\sigma_{{\max}}(\bar{\Gamma}\bar{V}),h_{j}^{1/2}\sigma_{{\max}}({\Gamma} V)\}\\ & =&\max_{j=1,\ldots,n}h_{j}^{1/2}\max\{h_{j}\sigma_{{\max}}(\bar{\Gamma}\bar{V}),\sigma_{{\max}}({\Gamma} V)\},\\ \sigma_{{\min}}(\mathcal{U}) & =&\min_{j=1,\ldots,n}\min\{h_{j}^{3/2}\sigma_{{\min}}(\bar{\Gamma}\bar{V}),h_{j}^{1/2}\sigma_{{\min}}({\Gamma} V)\}\\ & =&\min_{j=1,\ldots,n}h_{j}^{1/2}\min\{h_{j}\sigma_{{\min}}(\bar{\Gamma}\bar{V}),\sigma_{{\min}}({\Gamma} V)\}. \end{array} $$

The estimation of the singular values of \(\hat {\mathcal {U}}\) leads to slightly more involved expressions. Let \(U_{j,{red}}=\left [\begin {array}{c} h_{j}\bar {\Gamma }\bar {V}\\ \bar {\Gamma }\mathring {V} \end {array}\right ]\). Then, it holds

$$ \begin{array}{@{}rcl@{}} \sigma_{{\max}}(\mathcal{\hat{U}}) & =&\max_{j=1,\ldots,n}h_{j}^{1/2}\max\{\sigma_{{\max}}(U_{j,{red}}),\sigma_{{\max}}({\Gamma} V)\},\\ \sigma_{{\min}}(\hat{\mathcal{U}}) & =&\min_{j=1,\ldots,n}h_{j}^{1/2}\min\{\sigma_{{\min}}(U_{j,{red}}),\sigma_{{\min}}({\Gamma} V)\}. \end{array} $$

We note that \(\sigma _{\min \limits }(\bar {\Gamma }\mathring {V})=0\) and \(\sigma _{\max \limits }({\Gamma } V)=\sigma _{\max \limits }(\bar {\Gamma }\mathring {V})\). This follows immediately from the construction of the basis for the differential components (14). The definition of singular values and Weyl’s Theorem [14, Theorem III.2.1] provides us with

$$ \begin{array}{@{}rcl@{}} \lambda_{{\max}}(\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V})&\leq & \lambda_{{\max}}({h_{j}^{2}}\bar{V}^{T}\bar{\Gamma}^{2}\bar{V}+\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V})=\sigma_{{\max}}(U_{j,{red}})^{2}\\ & \leq& {h_{j}^{2}}\lambda_{{\max}}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V})+\lambda_{{\max}}(\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V}),\\ {h_{j}^{2}}\lambda_{{\min}}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V})&\leq & \lambda_{{\min}}({h_{j}^{2}}\bar{V}^{T}\bar{\Gamma}^{2}\bar{V}+\mathring{V}^{T}\bar{\Gamma}^{2}\mathring{V})=\sigma_{{\min}}(U_{j,{red}})^{2}\leq {h_{j}^{2}}\lambda_{{\max}}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V}) \end{array} $$

since \(\lambda _{\min \limits }(\mathring {V}^{T}\bar {\Gamma }^{2}\mathring {V})=0\). Then,

$$ \begin{array}{@{}rcl@{}} \sigma_{\max}({\Gamma} V)&= & \lambda_{\max}(V^{T}{\Gamma}^{2}V)^{1/2}\\ &\leq & \max\{\sigma_{{\max}}(U_{j,{red}}),\sigma_{{\max}}({\Gamma} V)\}\\ &\leq & \sigma_{{\max}}({\Gamma} V)+O(h). \end{array} $$

Moreover,

$$ \begin{array}{@{}rcl@{}} \min\{h_{j}\sigma_{\min}(\bar{\Gamma}\bar{V}),\sigma_{\min}({\Gamma} V)\} & \leq&\min\{\sigma_{{\min}}(U_{j,{red}}),\sigma_{\min}({\Gamma} V)\}\\ & \leq&\min\{h_{j}\sigma_{\max}(\bar{\Gamma}\bar{V}),\sigma_{\min}({\Gamma} V)\} \end{array} $$

Collecting all estimates, Theorem 2 provides

Theorem 3

Let the grid (3) have the maximal stepsize h and the minimal stepsize \(h_{\min \limits }\). Furthermore, let Γ and \(\bar {\Gamma }\) be given by (31) and let V, \(\bar {V}\), and \(\mathring {V}\) be given by (26), (25), and (27), respectively. Then it holds, for sufficiently small h,

$$ \begin{array}{@{}rcl@{}} \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow L^{2}} & =&h^{1/2}\sigma_{\max}({\Gamma} V)=O(h^{1/2}),\\ \|\mathcal{R}^{-1}\|_{L^{2}\rightarrow\mathbb{R}^{n(mN+k)}} & =&h_{\min}^{-3/2}\sigma_{\min}(\bar{\Gamma}\bar{V})^{-1}=O(h_{\min}^{-3/2}),\\ \lVert\mathcal{R}\rVert_{\mathbb{R}^{n(mN+k)}\rightarrow H_{D,\pi}^{1}} & =&h^{1/2}\sigma_{\max}({\Gamma} V)+O(h^{3/2})=O(h^{1/2}), \end{array} $$

and

$$ h_{\min}^{-3/2}\sigma_{\max}(\bar{\Gamma}\bar{V})^{-1}\leq\lVert\mathcal{R}^{-1}\rVert_{H_{D,\pi}^{1}\rightarrow\mathbb{R}^{n(mN+k)}}\leq h_{\min}^{-3/2}\sigma_{\min}(\bar{\Gamma}\bar{V})^{-1}. $$

In particular, \(\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow H_{D,\pi }^{1}}=\lVert \mathcal {R}\rVert _{\mathbb {R}^{n(mN+k)}\rightarrow L^{2}}+O(h^{3/2})\).

In these estimates, we used the fact \(\sigma _{\min \limits }({\Gamma } V)>0\). Note that the constants hidden in the big-O notation in this theorem depend on both N and the chosen basis. For the restriction \(\tilde {\mathcal {R}}\) of \(\mathcal {R}\) onto \(\ker \mathcal {C}\) we obtain, obviously,

$$ \begin{array}{@{}rcl@{}} \lVert\tilde{\mathcal{R}}\rVert & \leq\lVert\mathcal{R}\rVert,\quad\lVert\tilde{\mathcal{R}}^{-1}\rVert\leq\lVert\mathcal{R}^{-1}\rVert. \end{array} $$

For some special cases, the singular values can be easily derived.

Proposition 2

Let V, \(\bar {V}\), and \(\mathring {V}\) be given by (25)–(27) and Γ, \(\bar {\Gamma }\) by (31). Then it holds:

  (1)

    Let p0,…,pN− 1 be an orthogonal basis in L2(0,1). Then

    $$ \begin{array}{@{}rcl@{}} \sigma_{\min}({\Gamma} V) & =\min\left\{ \lVert p_{\alpha}\|_{L^{2}(0,1)}:\alpha=0,\ldots,N-1\right\} ,\\ \sigma_{\max}({\Gamma} V) & =\max\left\{ \lVert p_{\alpha}\|_{L^{2}(0,1)}:\alpha=0,\ldots,N-1\right\} . \end{array} $$

    In particular, if p0,…,pN− 1 is the Legendre basis, \(\sigma _{\min \limits }({\Gamma } V)=(2N-1)^{-1/2}\) and \(\sigma _{\max \limits }({\Gamma } V)=1\).

  (2)

    For an orthonormal basis p0,…,pN− 1 in L2(0,1), \(\sigma _{\min \limits }({\Gamma } V)=\sigma _{\max \limits }({\Gamma } V)=1\).

  (3)

    If p0,…,pN− 1 is the modified Legendre basis, it holds \(\sigma _{\min \limits }(\bar {\Gamma }\bar {V})\geq (2N+1)^{-1/2}\) and \(\sigma _{\max \limits }(\bar {\Gamma }\bar {V})\leq (N+2)^{1/2}\). Furthermore, the estimates

    $$ \sigma_{\min}({\Gamma} V)\geq\left( \frac{1}{2-2\cos\frac{N}{N+2}\pi}\right)^{1/2}\geq\frac{1}{2},\quad\sigma_{\max}({\Gamma} V)\leq\left( \frac{2N-1}{2-2\cos\frac{1}{N+2}\pi}\right)^{1/2} $$

    hold true.

Proof

First, we observe that \((V^{T}{\Gamma }^{2}V)_{\alpha \upbeta }={{\int \limits }_{0}^{1}}p_{\alpha -1}(\rho )p_{\upbeta -1}(\rho ){d}\rho =\delta _{\alpha \upbeta }\lVert p_{\alpha -1}\rVert _{L^{2}(0,1)}^{2}\). This provides (1) and (2) as special cases.

Consider the modified Legendre basis now. It holds \({{\int \limits }_{0}^{1}}\bar {p}_{0}^{2}(\rho ){d}\rho =1\) and \({{\int \limits }_{0}^{1}}\bar {p}_{0}(\rho )\bar {p}_{\alpha }(\rho ){d}\rho ={{\int \limits }_{0}^{1}}(P_{\alpha }(2\rho -1)-(-1)^{\alpha }){d}\rho =(-1)^{\alpha +1}\) for α = 1,2,…. Moreover, for α,β= 1,2,…, we have

$$ \begin{array}{@{}rcl@{}} {{\int}_{0}^{1}}\bar{p}_{\alpha}(\rho)\bar{p}_{\upbeta}(\rho){d}\rho & =&{{\int}_{0}^{1}}(P_{\alpha}(2\rho-1)-(-1)^{\alpha})(P_{\upbeta}(2\rho-1)-(-1)^{\upbeta}){d}\rho\\ & =&{{\int}_{0}^{1}}P_{\alpha}(2\rho-1)P_{\upbeta}(2\rho-1){d}\rho+(-1)^{\alpha+\upbeta}\\ & =&(2\alpha+1)^{-1}\delta_{\alpha\upbeta}+(-1)^{\alpha+\upbeta}. \end{array} $$

Collecting these expressions, we obtain the compact representation

$$ \bar{V}^{T}\bar{\Gamma}^{2}\bar{V}={\text{diag}}(1,\frac{1}{3},\ldots,(2N+1)^{-1})+ff^{T} $$

with \(f^{T}=[1,1,-1,+1,-1,\ldots ,\pm 1]\in \mathbb {R}^{N+1}\). ffT is a rank-1 matrix having, therefore, the N-fold eigenvalue 0. Moreover, f is an eigenvector to the eigenvalue fTf = N + 1. In particular, ffT is positive semidefinite. Invoking Weyl’s theorem again, we obtain

$$ \begin{array}{@{}rcl@{}} (2N+1)^{-1} & =&\lambda_{\min}({\text{diag}}(1,\frac{1}{3},\ldots,(2N+1)^{-1}))\leq\lambda_{\min}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V})\\ \lambda_{\max}(\bar{V}^{T}\bar{\Gamma}^{2}\bar{V}) & \leq&\lambda_{\max}({\text{diag}}(1,\frac{1}{3},\ldots,(2N+1)^{-1}))+\lambda_{\max}(ff^{T})=N+2. \end{array} $$

This proves the first assertion of (3).

The relation \((\mathring {V}^{T}\bar {\Gamma }^{2}\mathring {V})_{\alpha \upbeta }={{\int \limits }_{0}^{1}}\bar {p}^{\prime }_{\alpha -1}(\sigma )\bar {p}^{\prime }_{\upbeta -1}(\sigma ){d}\sigma \) shows that \(K=\mathring {V}^{T}\bar {\Gamma }^{2}\mathring {V}\) is the stiffness matrix of the basis functions. For the modified Legendre basis, it has been investigated in [1, cf. Eq. (31)]. According to the proof of Proposition A.2 of [1], the nonvanishing eigenvalues can be estimated by

$$ \lambda_{\min}(K)\geq\frac{1}{2-2\cos\frac{N}{N+2}\pi},\quad\lambda_{\max}(K)\leq\frac{2N-1}{2-2\cos\frac{1}{N+2}\pi}. $$

\(K^{\prime }=V^{T}{\Gamma }^{2}V\) is the submatrix of K obtained by omitting the first row and column of K, which consist entirely of zeros. This provides the final relations of assertion (3). □

An asymptotic analysis shows that \(\sigma _{\max \limits }({\Gamma } V)\leq \frac {2}{\sqrt {\pi }}N^{3/2}+O(N^{1/2})\) in the case of the modified Legendre basis.
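The closed forms in part (1) of Proposition 2 are easy to check numerically; the following sketch (ours) uses an N-point Gauss-Legendre rule on [0,1], which meets the exactness requirement for (26):

```python
import numpy as np
from numpy.polynomial import Legendre, Polynomial
from numpy.polynomial.legendre import leggauss

def gauss01(M):
    """M-point Gauss-Legendre rule on [0,1]; exact up to degree 2M - 1."""
    x, w = leggauss(M)
    return (x + 1.0) / 2.0, w / 2.0

def p_legendre(i):
    """Shifted Legendre polynomial p_i(tau) = P_i(2 tau - 1)."""
    return Legendre.basis(i, domain=[0, 1]).convert(kind=Polynomial)

N = 6
sigma, gamma = gauss01(N)
V = np.array([[p_legendre(i)(s) for i in range(N)] for s in sigma])  # (26)
G = np.diag(np.sqrt(gamma))                                          # Gamma of (31)
sv = np.linalg.svd(G @ V, compute_uv=False)

# Proposition 2(1), Legendre basis: sigma_max = 1, sigma_min = (2N-1)^{-1/2}
assert np.isclose(sv.max(), 1.0)
assert np.isclose(sv.min(), (2 * N - 1) ** -0.5)
```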

Remark 1

We are able to estimate the size of the jump of elements of \(\tilde {X}_{\pi }\) at the grid points. For any \(\tilde {x}\in \tilde {X}_{\pi }\) and \(\tilde {c}=\mathcal {R}^{-1}\tilde {x}\), it holds

$$ \begin{array}{@{}rcl@{}} \lVert\tilde{x}_{j\kappa}\rVert_{C[t_{j-1},t_{j})}&\leq& C_{h_{j}}\lVert\tilde{x}_{j\kappa}\rVert_{H^{1}(t_{j-1},t_{j})}=C_{h_{j}}h_{j}^{1/2}\lvert U_{j,{red}}\tilde{c}_{j\kappa}\rvert\\ &\leq& C_{h_{j}}h_{j}^{1/2}\sigma_{\max}(U_{j,{red}})\lvert\tilde{c}\rvert \end{array} $$

with \(C_{h_{j}}=\left (\max \limits \{2/h_{j},h_{j}\}\right )^{1/2}\). Here, we used [15, Lemma 3.2]. For sufficiently small hj, this estimate reduces to

$$ \lVert\tilde{x}_{j\kappa}\rVert_{C[t_{j-1},t_{j})}\leq\sqrt{2}\sigma_{\max}(\bar{\Gamma}\mathring{V})\lvert\tilde{c}\rvert=\sqrt{2}\sigma_{\max}({\Gamma} V)\lvert\tilde{c}\rvert. $$

Let x be any element of Xπ and \(c=\mathcal {R}^{-1}x\). Replacing \(\tilde {c}\) by \({\Delta } c=\tilde {c}-c\) in the last estimate, we obtain

$$ \begin{array}{@{}rcl@{}} \lvert\tilde{x}_{\kappa}(t_{j-0})-\tilde{x}_{\kappa}(t_{j+0})\rvert & =&\lvert\tilde{x}_{\kappa}(t_{j-0})-x_{\kappa}(t_{j-0})+x_{\kappa}(t_{j+0})-\tilde{x}_{\kappa}(t_{j+0})\rvert\\ & \leq&\lVert\tilde{x}_{j\kappa}-x_{j\kappa}\rVert_{C[t_{j-1},t_{j})}+\lVert\tilde{x}_{j+1,\kappa}-x_{j+1,\kappa}\rVert_{C[t_{j},t_{j+1})}\\ & \leq&2\sqrt{2}\sigma_{\max}({\Gamma} V)\lvert{\Delta} c\rvert. \end{array} $$

Proposition 2 provides estimates for the factor \(\sigma _{\max \limits }({\Gamma } V)\). In particular, for some bases, it does not depend on the polynomial degree N.

4 Error estimation for the constrained minimization problem

The aim of this section is the derivation of bounds for perturbations of the solution c of the problem (23)–(22), that is,

$$ \begin{gathered}\varphi(z)=\lvert\mathcal{A}z-r\rvert^{2}\rightarrow\min!\\ \textrm{subject to }\mathcal{C}z=0, \end{gathered} $$

under perturbations of the data \(\mathcal {A}\), \(\mathcal {C}\), r. Such bounds have been known for a long time, e.g., [11, 12]. However, we will provide different bounds in this section. The reason for this is that the constraint \(\mathcal {C}c=0\) has an exceptional meaning in the present context: It holds \(\mathcal {C}c=0\) if and only if \(\mathcal {R}c\in {H_{D}^{1}}(a,b)\). If a perturbation \({\Delta }\mathcal {C}\) of \(\mathcal {C}\) changes the kernel of \(\mathcal {C}\), then \(\mathcal {R}c\in {H_{D}^{1}}(a,b)\) no longer holds in general! Therefore, we will consider the two cases \(\ker (\mathcal {C}+{\Delta }\mathcal {C})=\ker \mathcal {C}\) and \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\neq \ker \mathcal {C}\) separately.

Let \(\tilde {c}\) be the solution of the perturbed problem

$$ \min\{\lvert(\mathcal{A}+{\Delta}\mathcal{A})z-(r+{\Delta} r)\rvert^{2}:(\mathcal{C}+{\Delta}\mathcal{C})z=0\}. $$
(36)

Then, let \({\Delta } c=\tilde {c}-c\) denote the error. We are interested in deriving a bound on Δc in terms of the perturbations of the data.

For a matrix \({\mathscr{M}}\), denote by \({\mathscr{M}}^{+}\) its Moore-Penrose inverse and by \(\lVert {\mathscr{M}}\rVert \) its spectral norm.

Let the columns of \(\mathcal {D}\) form an orthonormal basis of \(\ker \mathcal {C}\). Then, \(P=I_{n(mN+k)}-\mathcal {C}^{+}\mathcal {C}\) is the orthogonal projector onto \(\ker \mathcal {C}\) and \(P\mathcal {D}=\mathcal {D}\). Some more properties are collected in the following proposition.

Proposition 3

It holds, for any matrix \({\mathscr{M}}\in \mathbb {R}^{\nu \times n(mN+k)},\)\(\nu \in \mathbb {N}\),

  1. \(\mathcal {D}^{T}\mathcal {D}=I_{nmN+k}\) and \(\mathcal {D}\mathcal {D}^{T}=P\).

  2. If \(c=\mathcal {D}d\), then \(\lvert c\rvert =\lvert d\rvert \).

  3. \(\lVert \mathcal {A}\mathcal {D}\rVert =\lVert \mathcal {A}P\rVert \).

  4. \((\mathcal {A}P)^{+}=\mathcal {D}(\mathcal {A}\mathcal {D})^{+}\).

  5. \(\lVert (\mathcal {A}P)^{+}\rVert =\lVert (\mathcal {A}\mathcal {D})^{+}\rVert \).

The proofs are obvious. For the following, we note that the matrix \(\mathcal {A}\mathcal {D}\) has full column rank [9, Proposition 1].

4.1 \(\ker (\mathcal {C}+{\Delta }\mathcal {C})=\ker \mathcal {C}\)

Each element c of \(\ker \mathcal {C}\) has a unique representation \(c=\mathcal {D}d\) with \(d\in \mathbb {R}^{nmN+k}\). Therefore, (23)–(22) is equivalent to the unconstrained minimization problem

$$ \min_{d\in\mathbb{R}^{nmN+k}}\lVert\mathcal{A}\mathcal{D}d-r\rVert $$
(37)

while (36) becomes the unconstrained minimization problem

$$ \min_{d\in\mathbb{R}^{nmN+k}}\lVert(\mathcal{A}+{\Delta}\mathcal{A})\mathcal{D}d-(r+{\Delta} r)\rVert. $$
(38)

Since \(\mathcal {A}\mathcal {D}\) has full column rank, standard perturbation results for unconstrained least squares problems apply. As a consequence of [16, Satz 8.2.7] and Proposition 3, we obtain

Theorem 4

Let \(\omega =\lVert (\mathcal {A}P)^{+}\rVert \lVert {\Delta }\mathcal {A}P\rVert <1\). Then it holds

$$ \lvert{\Delta} c\rvert\leq\frac{\lVert(\mathcal{A}P)^{+}\rVert}{1-\omega}\left\{ \lVert{\Delta}\mathcal{A}P\rVert\left[\lvert c\rvert+\lVert(\mathcal{A}P)^{+}\rVert\lvert\mathfrak{r}\rvert\right]+\lvert{\Delta} r\rvert\right\} $$

and

$$ \begin{array}{@{}rcl@{}} \frac{\lvert{\Delta} c\rvert}{\lvert c\rvert} & \leq\frac{1}{1-\omega}\left\{ \left[\kappa_{\mathcal{C}}(\mathcal{A})+\frac{\lvert\mathfrak{r}\rvert}{\lVert\mathcal{A}P\rVert\lvert c\rvert}\kappa_{\mathcal{C}}(\mathcal{A})^{2}\right]\frac{\lVert{\Delta}\mathcal{A}P\rVert}{\lVert\mathcal{A}P\rVert}\right.\\ & \left.+\frac{\lVert(\mathcal{A}P)^{+}\rVert\lvert r\rvert}{\lvert c\rvert}\cdot\frac{\lvert{\Delta} r\rvert}{\lvert r\rvert}\right\} . \end{array} $$

Here, \(\mathfrak {r}=r-\mathcal {A}c\) and

$$ \kappa_{\mathcal{C}}(\mathcal{A})=\lVert\mathcal{A}P\rVert\lVert(\mathcal{A}P)^{+}\rVert. $$

Theorem 4 corresponds to classical results for unconstrained minimization problems (e.g., [10], [13, Theorem 9.12]) and is a small generalization of them. Let us emphasize that the estimate is independent of the perturbations of \(\mathcal {C}\) as long as the null space of \(\mathcal {C}\) is not changed by the perturbation.
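The reduction to (37) also suggests a simple computational procedure, sketched below with random stand-ins for \(\mathcal {A}\) and \(\mathcal {C}\): compute an orthonormal basis \(\mathcal {D}\) of \(\ker \mathcal {C}\), solve the unconstrained problem, and read off \(\kappa _{\mathcal {C}}(\mathcal {A})\) from the singular values of \(\mathcal {A}\mathcal {D}\) (cf. Proposition 3):

```python
import numpy as np
from scipy.linalg import lstsq, null_space

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 25))     # random stand-in for cal_A
C = rng.standard_normal((5, 25))      # random stand-in for cal_C (full row rank)
r = rng.standard_normal(40)

D = null_space(C)                     # orthonormal basis of ker C: D^T D = I
d, *_ = lstsq(A @ D, r)               # unconstrained problem (37)
c = D @ d                             # solution of (23) subject to C c = 0
assert np.allclose(C @ c, 0.0, atol=1e-10)

s = np.linalg.svd(A @ D, compute_uv=False)
print("kappa_C(A) =", s.max() / s.min())  # = ||A P|| ||(A P)^+|| by Proposition 3
```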

Remark 2

In the case of the Legendre basis, each row of \(\mathcal {C}\) contains only three nonzero entries, equal to 1 and − 1, respectively, possibly scaled by the stepsizes, cf. (16), (17). So we expect \({\Delta }\mathcal {C}=0\), and the estimates of this section apply.

4.2 \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\neq \ker \mathcal {C}\)

The estimation of the error becomes much more involved than in the previous case. In a first step, we will construct a basis for the kernel of the perturbed constraint \((\mathcal {C}+{\Delta }\mathcal {C})z=0\).

Lemma 1

Let \({\varkappa }=\lVert \mathcal {C}^{+}\rVert \lVert {\Delta }\mathcal {C}\rVert <1/2.\) Then \(\mathcal {C}+{\Delta }\mathcal {C}\) has full rank and \(P_{\Delta }=I_{n(mN+k)}-(\mathcal {C}+{\Delta }\mathcal {C})^{+}(\mathcal {C}+{\Delta }\mathcal {C})\) is a projector onto \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\). Furthermore, \(\mathcal {D}_{\Delta }=P_{\Delta }\mathcal {D}\) is a basis of \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\). Moreover, the estimates

$$ \lVert(\mathcal{C}+{\Delta}\mathcal{C})^{+}\rVert\leq\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}} $$

and

$$ \lVert(\mathcal{C}+{\Delta}\mathcal{C})^{+}-\mathcal{C}^{+}\rVert\leq\frac{\sqrt{2}\lVert\mathcal{C}^{+}\rVert^{2}}{1-{\varkappa}}\lVert{\Delta}\mathcal{C}\rVert $$

hold true.

Proof

The assertion that \(\mathcal {C}+{\Delta }\mathcal {C}\) has full rank as well as the error estimates follow from [16, Satz 8.2.5].

To show that \(\mathcal {D}_{\Delta }\) is a basis of \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\), consider

$$ \begin{array}{@{}rcl@{}} (I-P_{\Delta})P & =&(\mathcal{C}+{\Delta}\mathcal{C})^{+}(\mathcal{C}+{\Delta}\mathcal{C})(I-\mathcal{C}^{+}\mathcal{C})\\ & =&(\mathcal{C}+{\Delta}\mathcal{C})^{+}{\Delta}\mathcal{C}(I-\mathcal{C}^{+}\mathcal{C}). \end{array} $$

It holds

$$ \lVert(I-P_{\Delta})P\rVert\leq\lVert(\mathcal{C}+{\Delta}\mathcal{C})^{+}\rVert\lVert{\Delta}\mathcal{C}\rVert\leq\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\lVert{\Delta}\mathcal{C}\rVert\leq\frac{{\varkappa}}{1-{\varkappa}}<1. $$

Therefore, the assumptions of [17, Theorem I-6.34] are fulfilled. Since \(\dim \ker (\mathcal {C}+{\Delta }\mathcal {C})=\dim \ker \mathcal {C}\), the first alternative of that theorem applies and PΔ is a one-to-one mapping of \(\ker \mathcal {C}\) onto \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\). Hence, \(\mathcal {D}_{\Delta }\) is a basis of the latter space. □

By using the bases \(\mathcal {D}\) and \(\mathcal {D}_{\Delta }\), the unperturbed and the perturbed least squares problems become (37) and

$$ \min_{d\in\mathbb{R}^{nmN+k}}\lVert(\mathcal{A}+{\Delta}\mathcal{A})\mathcal{D}_{\Delta}d-(r+{\Delta} r)\rVert. $$
(39)

In a first step, the deviations of the bases shall be estimated. It holds

$$ \begin{array}{@{}rcl@{}} P_{\Delta}-P & =&\mathcal{C}^{+}\mathcal{C}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}(\mathcal{C}+{\Delta}\mathcal{C})\\ & =&\mathcal{C}^{+}\mathcal{C}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}\mathcal{C}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}{\Delta}\mathcal{C}\\ & =&\left[\mathcal{C}^{+}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}\right]\mathcal{C}-(\mathcal{C}+{\Delta}\mathcal{C})^{+}{\Delta}\mathcal{C}. \end{array} $$

Invoking Lemma 1, we obtain

$$ \lVert P_{\Delta}-P\rVert\leq\left[\frac{\sqrt{2}\lVert\mathcal{C}^{+}\rVert^{2}}{1-{\varkappa}}\lVert\mathcal{C}\rVert+\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\right]\lVert{\Delta}\mathcal{C}\rVert=\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert $$

with \(\kappa (\mathcal {C})=\lVert \mathcal {C}^{+}\rVert \lVert \mathcal {C}\rVert \). Consequently,

$$ \lVert\mathcal{D}_{\Delta}-\mathcal{D}\rVert=\lVert(P_{\Delta}-P)\mathcal{D}\rVert\leq\lVert P_{\Delta}-P\rVert\lVert\mathcal{D}\rVert\leq\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert. $$
(40)

Let us transform (39) now. It holds

$$ \begin{array}{@{}rcl@{}} (\mathcal{A}+{\Delta}\mathcal{A})\mathcal{D}_{\Delta} & =(\mathcal{A}+{\Delta}\mathcal{A})\mathcal{D}+(\mathcal{A}+{\Delta}\mathcal{A})(\mathcal{D}_{\Delta}-\mathcal{D})\\ & =\mathcal{A}\mathcal{D}+\mathfrak{R} \end{array} $$

where \(\mathfrak {R}={\Delta }\mathcal {A}\mathcal {D}+(\mathcal {A}+{\Delta }\mathcal {A})(\mathcal {D}_{\Delta }-\mathcal {D})\). The representation of \(\mathfrak {R}\) provides the estimate

$$ \lVert\mathfrak{R}\rVert\leq\lVert{\Delta}\mathcal{A}P\rVert+\lVert\mathcal{A}+{\Delta}\mathcal{A}\rVert\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert. $$
(41)

Denote \(\omega _{\Delta }=\lVert (\mathcal {A}P)^{+}\rVert \lVert \mathfrak {R}\rVert \). The condition ωΔ < 1 is obviously fulfilled if

$$ \lVert(\mathcal{A}P)^{+}\rVert\left\{ \lVert{\Delta}\mathcal{A}P\rVert+\lVert\mathcal{A}+{\Delta}\mathcal{A}\rVert\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert\right\} <1. $$
(42)

Let d + Δd be the solution of (39). Using the fact that \(\mathcal {A}\mathcal {D}\) has full rank, [16, Satz 8.2.7] provides the estimates

$$ \lvert{\Delta} d\rvert\leq\frac{\lVert(\mathcal{A}P)^{+}\rVert}{1-\omega_{\Delta}}\left\{ \lVert\mathfrak{R}\rVert\left[\vert d\rvert+\lVert(\mathcal{A}P)^{+}\rVert\lvert\mathfrak{r}\rvert\right]+\lvert{\Delta} r\rvert\right\} $$
(43)

and

$$ \begin{array}{@{}rcl@{}} \frac{\lvert{\Delta} d\rvert}{\lvert d\vert}\leq & \frac{1}{1-\omega_{\Delta}}\left\{ \left[\kappa_{\mathcal{C}}(\mathcal{A})+\frac{\lvert\mathfrak{r}\rvert}{\lVert\mathcal{A}\mathcal{D}\rVert\lvert d\rvert}\kappa_{\mathcal{C}}(\mathcal{A})^{2}\right]\frac{\lVert\mathfrak{R}\rVert}{\lVert\mathcal{A}\mathcal{D}\rVert}\right. \\ & \left.+\frac{\lVert(\mathcal{A}\mathcal{D})^{+}\rVert\lvert r\rvert}{\rvert d\rvert}\cdot\frac{\vert{\Delta} r\rvert}{\lvert r\rvert}\right\} . \end{array} $$
(44)

with \(\mathfrak {r}=r-\mathcal {A}c\).

Theorem 5

Let \(\lVert {\Delta }\mathcal {A}\rVert \) and \(\lVert {\Delta }\mathcal {C}\rVert \) be sufficiently small such that (42) and \({\varkappa }=\lVert \mathcal {C}^{+}\rVert \lVert {\Delta }\mathcal {C}\rVert <1/2\) hold true. Then it holds

$$ \lvert{\Delta} c\rvert\leq\frac{\lVert(\mathcal{A}P)^{+}\rVert}{1-\omega_{\Delta}}\left\{ \lVert\mathfrak{R}\rVert\left[\lvert c\rvert+\lVert(\mathcal{A}P)^{+}\rVert\lvert\mathfrak{r}\rvert\right]+\lvert{\Delta} r\rvert\right\} +\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert\lvert c\rvert $$

and

$$ \begin{array}{@{}rcl@{}} \frac{\lvert{\Delta} c\rvert}{\lvert c\rvert}&\leq & \frac{1}{1-\omega_{\Delta}}\left\{ \left[\kappa_{\mathcal{C}}(\mathcal{A})+\frac{\lvert\mathfrak{r}\rvert}{\lVert\mathcal{A}P\rVert\lvert c\rvert}\kappa_{\mathcal{C}}(\mathcal{A})^{2}\right]\frac{\lVert\mathfrak{R}\rVert}{\lVert\mathcal{A}P\rVert}\right.\\ && \qquad\qquad\left.+\frac{\lVert(\mathcal{A}P)^{+}\rVert\lvert r\rvert}{\lvert c\rvert}\cdot\frac{\vert{\Delta} r\rvert}{\lvert r\rvert}\right\} +\frac{\lVert\mathcal{C}^{+}\rVert}{1-{\varkappa}}\left[\sqrt{2}\kappa(\mathcal{C})+1\right]\lVert{\Delta}\mathcal{C}\rVert. \end{array} $$

Proof

It holds \(c=\mathcal {D}d\) and \({\Delta } c=\mathcal {D}_{\Delta }{\Delta } d+(\mathcal {D}_{\Delta }-\mathcal {D})d\) such that \(\lvert {\Delta } c\rvert \leq \lvert {\Delta } d\rvert +\lVert P_{\Delta }-P\rVert \lvert d\rvert \). Inserting this estimate in (43) and (44) and using \(\lvert c\rvert =\lvert \mathcal {D}d\rvert =\lvert d\rvert \) provides the claim. □
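The bounds of Theorem 5 are straightforward to check numerically on small dense problems. The following sketch is a toy verification only: the randomly generated A, C, and r merely stand in for the matrices of the discrete problem, and all names are ours. It solves the equality constrained least-squares problem by the nullspace method, applies random perturbations, and compares \(\lvert{\Delta}c\rvert\) with the first bound of the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, p = 30, 12, 4                       # residuals, unknowns, constraints

A = rng.standard_normal((M, n))
C = rng.standard_normal((p, n))           # full row rank almost surely
r = rng.standard_normal(M)

def nullspace(C, tol=1e-12):
    """Orthonormal basis of ker C, computed via the SVD."""
    _, s, Vt = np.linalg.svd(C)
    return Vt[np.count_nonzero(s > tol * s[0]):].T

def solve_constrained_ls(A, C, r):
    """min |Ac - r| subject to Cc = 0, by the nullspace method."""
    D = nullspace(C)
    return D @ (np.linalg.pinv(A @ D) @ r), D

c, D = solve_constrained_ls(A, C, r)

eps = 1e-6                                # perturbation level
dA = eps * rng.standard_normal((M, n))
dC = eps * rng.standard_normal((p, n))
dr = eps * rng.standard_normal(M)
c_pert, _ = solve_constrained_ls(A + dA, C + dC, r + dr)

# ingredients of the bound
P = D @ D.T                               # orthogonal projector onto ker C
D_pert = nullspace(C + dC)
R_frak = dA @ D + (A + dA) @ (D_pert @ D_pert.T - P) @ D
Cplus = np.linalg.norm(np.linalg.pinv(C), 2)
kappaC = np.linalg.norm(C, 2) * Cplus     # kappa(C)
varkappa = Cplus * np.linalg.norm(dC, 2)  # must stay below 1/2
APplus = np.linalg.norm(np.linalg.pinv(A @ P), 2)
omega = APplus * np.linalg.norm(R_frak, 2)
r_frak = r - A @ c

bound = (APplus / (1 - omega)
         * (np.linalg.norm(R_frak, 2)
            * (np.linalg.norm(c) + APplus * np.linalg.norm(r_frak))
            + np.linalg.norm(dr))
         + Cplus / (1 - varkappa) * (np.sqrt(2) * kappaC + 1)
         * np.linalg.norm(dC, 2) * np.linalg.norm(c))

print(np.linalg.norm(c_pert - c), "<=", bound)
```

For perturbation levels small enough that (42) holds, the printed left-hand side stays below the bound.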

Remark 3

\(\lvert \mathfrak {r}\rvert \) is a measure for the accuracy of the discrete solution. Let \(x_{\pi}\in X_{\pi}\) denote the discrete solution obtained by minimizing \({\Phi}_{\pi,M}\) (8). Its representation becomes \(c=\mathcal {R}^{-1}x_{\pi }\). Then it holds \(\lvert \mathfrak {r}\rvert ^{2}=\lvert \mathcal {A}c-r\rvert ^{2}={\Phi }_{\pi ,M}(x_{\pi })\). Hence, \({\Phi}_{\pi,M}(x_{\pi})\leq 2\left({\Phi}_{\pi,M}(x_{\ast})+{\Phi}_{\pi,M}(x_{\pi}-x_{\ast})\right)\) with the exact solution \(x_{\ast}\). Under the conditions of Theorem 1, it holds, therefore, \(\lvert \mathfrak {r}\rvert \leq ch^{N-\mu +1}\). □

The critical quantities to estimate the influence of perturbations are \(\kappa _{\mathcal {C}}(\mathcal {A})\) and \(\lVert \mathcal {C}^{+}\rVert \), \(\kappa (\mathcal {C})\) as well as \(\lVert (\mathcal {A}P)^{+}\rVert \). The norms of \(\mathcal {C}\) and its pseudoinverse depend only on the choice of Xπ and the basis chosen for it, but not on the DAE. It holds \(\lVert \mathcal {C}\rVert =\sigma _{\max \limits }(\mathcal {C})\) and \(\lVert \mathcal {C}^{+}\rVert =\sigma _{\min \limits }(\mathcal {C})^{-1}\) with \(\sigma _{\min \limits }(\mathcal {C})\) being the smallest nonvanishing singular value of \(\mathcal {C}\). Since \(\mathcal {C}\) has full row rank, \(\sigma _{\min \limits }(\mathcal {C})=\left (\lambda _{\min \limits }(\mathcal {C}\mathcal {C}^{T})\right )^{1/2}\) and \(\sigma _{\max \limits }(\mathcal {C})=\left (\lambda _{\max \limits }(\mathcal {C}\mathcal {C}^{T})\right )^{1/2}\).

With \(\mathcal {C}\) from (22) we observe that

$$ \mathcal{C}={\Pi}_{1}\left[I_{k}\otimes\mathcal{C}_{\mathrm{s}}\vert\mathcal{O}_{\mathrm{s}}\right]{\Pi}_{2} $$

with

$$ \mathcal{C}_{\mathrm{s}}=\begin{bmatrix}\bar{\mathcal{P}}_{1}(t_{1}) & -\bar{\mathcal{P}}_{2}(t_{1})\\ & \bar{\mathcal{P}}_{2}(t_{2}) & -\bar{\mathcal{P}}_{3}(t_{2})\\ & & {\ddots} & {\ddots}\\ & & & \bar{\mathcal{P}}_{n-1}(t_{n-1}) & -\bar{\mathcal{P}}_{n}(t_{n-1}) \end{bmatrix}\in\mathbb{R}^{(n-1)\times n(N+1)} $$

and \(\mathcal{O}_{\mathrm{s}}\in\mathbb{R}^{k(n-1)\times nN(m-k)}\) consists entirely of zero elements. The permutation matrices \({\Pi}_{1}\) and \({\Pi}_{2}\) are constructed as follows: Let \(x=[x_{1},x_{2},\ldots,x_{m}]^{T}\in\tilde{X}_{\pi}\). First, the equations in \(\mathcal{C}c=0\) are reordered such that all equations related to the first component \(x_{1}\) come first, followed by those for \(x_{2}\), and so on up to \(x_{k}\). This reordering is expressed by \({\Pi}_{1}\). The column permutation \({\Pi}_{2}\) reorders the coefficients such that those describing the differential components are taken first, followed by those belonging to the algebraic components. In particular, the coefficients \(c_{\kappa}\) describing \(x_{\kappa}\) are ordered as \(c_{\kappa}=[c_{1\kappa 0},c_{1\kappa 1},\ldots,c_{1\kappa N},c_{2\kappa 0},\ldots,c_{n\kappa N}]^{T}\). Then we have

$$ \begin{array}{@{}rcl@{}} \mathcal{C}\mathcal{C}^{T} & =&{\Pi}_{1}\left[I_{k}\otimes\mathcal{C}_{\mathrm{s}}\vert\mathcal{O}_{\mathrm{s}}\right]{\Pi}_{2}{{\Pi}_{2}^{T}}\left[\begin{array}{c} I_{k}\otimes\mathcal{C}_{\mathrm{s}}^{T}\\ \mathcal{O}_{\mathrm{s}}^{T} \end{array}\right]{\Pi}_{1}^{T}={\Pi}_{1}(I_{k}\otimes\mathcal{C}_{\mathrm{s}}\mathcal{C}_{\mathrm{s}}^{T}){{\Pi}_{1}^{T}}. \end{array} $$
(45)

Using (16) and (17), it holds

$$ \mathcal{C}_{\mathrm{s}}=C_{\mathrm{s}}\left({\text{diag}}(h_{1},\ldots,h_{n})\otimes I_{N+1}\right) $$

with

$$ C_{\mathrm{s}}=\left[\begin{array}{cccccc} f & -{e_{1}^{T}}\\ & f & -{e_{1}^{T}}\\ & & {\ddots} & {\ddots}\\ & & & & f & -{e_{1}^{T}} \end{array}\right] $$
(46)

where \(e_{1}\) is the first unit vector and \(f=[1,{{\int }_{0}^{1}}p_{0}(\sigma ){d}\sigma ,\ldots ,{{\int }_{0}^{1}}p_{N-1}(\sigma ){d}\sigma ]\) is a row vector with N + 1 components. This leads to

$$ \mathcal{C}_{\mathrm{s}}\mathcal{C}_{\mathrm{s}}^{T}=\left[\begin{array}{ccccc} {h_{1}^{2}}\lvert f\rvert^{2}+{h_{2}^{2}} & -{h_{2}^{2}}\\ -{h_{2}^{2}} & {h_{2}^{2}}\lvert f\rvert^{2}+{h_{3}^{2}} & -{h_{3}^{2}}\\ & {\ddots} & {\ddots} & {\ddots}\\ & & -h_{n-1}^{2} & h_{n-1}^{2}\lvert f\rvert^{2}+{h_{n}^{2}} \end{array}\right]. $$
(47)
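The structure just derived is easy to confirm numerically. A minimal numpy sketch (assuming the blockwise scaling \({\text{diag}}(h_{1},\ldots,h_{n})\otimes I_{N+1}\) above and, for concreteness, the Legendre value f = [1,1,0,…,0] established in Proposition 5 below):

```python
import numpy as np

def Cs_matrix(f, n):
    """Block bidiagonal C_s from (46): row j carries f in block j and
    -e_1^T in block j+1; each block has width N + 1 = len(f)."""
    w = len(f)
    Cs = np.zeros((n - 1, n * w))
    for j in range(n - 1):
        Cs[j, j * w:(j + 1) * w] = f
        Cs[j, (j + 1) * w] = -1.0
    return Cs

f = np.array([1.0, 1.0, 0.0, 0.0])              # Legendre basis, N = 3
n = 6
h = np.array([0.1, 0.2, 0.15, 0.25, 0.1, 0.2])  # stepsizes h_1, ..., h_n

# blockwise scaling diag(h_1, ..., h_n) (x) I_{N+1}
CsH = Cs_matrix(f, n) @ np.kron(np.diag(h), np.eye(len(f)))

# compare C_s C_s^T with the tridiagonal matrix (47)
f2 = f @ f
T = (np.diag(h[:-1]**2 * f2 + h[1:]**2)
     - np.diag(h[1:-1]**2, 1) - np.diag(h[1:-1]**2, -1))
print(np.allclose(CsH @ CsH.T, T))              # True
```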

By (45), the eigenvalues of \(\mathcal {C}\mathcal {C}^{T}\) are those of (47), each with multiplicity k.

Proposition 4

Let the grid (3) have the maximal stepsize h and the minimal stepsize \(h_{\min \limits }\). Then it holds

(1):

\(\lvert f\rvert >1\).

(2):

\(0<h_{\min \limits }^{2}(\lvert f\rvert ^{2}-1)\leq \lambda _{\min \limits }(\mathcal {C}_{s}\mathcal {C}_{s}^{T})\) and \(\lambda _{\max \limits }(\mathcal {C}_{s}\mathcal {C}_{s}^{T})\leq h^{2}(\lvert f\rvert ^{2}+3)\).

Proof

Since the first component of f is equal to 1, we have \(\lvert f\rvert \geq 1\), and \(\lvert f\rvert =1\) if and only if \({{\int }_{0}^{1}}p_{0}(\sigma ){d}\sigma =\cdots ={{\int }_{0}^{1}}p_{N-1}(\sigma ){d}\sigma =0\). Assume that the latter condition holds true. This means in particular that p0,…,pN− 1 are orthogonal to the polynomial \(p(\tau )\equiv 1\in \mathfrak {P}_{N-1}\). The latter space has dimension N. Since \(p_{0},\ldots ,p_{N-1}\in \mathfrak {P}_{N-1}\) are N polynomials orthogonal to p, they span a subspace of dimension at most N − 1 and must be linearly dependent, in contradiction to the assumption that they form a basis. This proves (1).

In order to prove (2), we observe that \(\mathcal {C}_{s}\mathcal {C}_{s}^{T}\) is symmetric such that all eigenvalues are real. Invoking Gershgorin’s circle theorem [16, Theorem 1.2.10], the eigenvalues λ of \(\mathcal {C}_{s}\mathcal {C}_{s}^{T}\) fulfill

$$ \min_{j=1,\ldots,n-1}{h_{j}^{2}}(\lvert f\rvert^{2}-1)\leq\lambda\leq\max_{j=1,\ldots,n-1}\left[{h_{j}^{2}}(\lvert f\rvert^{2}+1)+2h_{j+1}^{2}\right]. $$

This proves (2). □

We obtain immediately the following corollary. Note that f depends only on N and the chosen basis, but not on the grid.

Corollary 1

Let the grids (3) be quasiuniform, that is, \(h/h_{\min \limits }\leq \rho <\infty \) with ρ independent of π. Then it holds \(\kappa (\mathcal {C})\leq \rho \left (\frac {\lvert f\rvert ^{2}+3}{\lvert f\rvert ^{2}-1}\right )^{1/2}\) and \(\lVert \mathcal {C}^{+}\rVert \leq h_{\min \limits }^{-1}(\lvert f\rvert ^{2}-1)^{-1/2}\).

For constant stepsize h, we have \(\mathcal {C}_{\mathrm {s}}\mathcal {C}_{\mathrm {s}}^{T}=h^{2}C_{s}{C_{s}^{T}}\), which is a Toeplitz tridiagonal matrix. In this case, the eigenvalues of \(C_{s}{C_{s}^{T}}\) are given by [18, Theorem 2.2]

$$ \lambda_{j}=1+\lvert f\rvert^{2}-2\cos\left( \frac{j\pi}{n}\right),\quad j=1,\ldots,n-1. $$
(48)
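A quick cross-check of (48), here with \(\lvert f\rvert^{2}=2\) (the Legendre value derived below) and n = 40; the eigenvalues of the tridiagonal Toeplitz matrix are compared with the closed formula:

```python
import numpy as np

n, f2 = 40, 2.0                            # |f|^2 = 2 for the Legendre basis
T = ((1 + f2) * np.eye(n - 1)
     - np.diag(np.ones(n - 2), 1) - np.diag(np.ones(n - 2), -1))
lam = 1 + f2 - 2 * np.cos(np.arange(1, n) * np.pi / n)   # formula (48)
print(np.allclose(np.linalg.eigvalsh(T), np.sort(lam)))  # True
```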

Proposition 5

Let the grid (3) be equidistant with stepsize h, and Cs be given by (46). Then it holds

  • For the Legendre basis \(1\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 5\);

  • For the modified Legendre basis \(2N\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 2N+6\);

  • For the Chebyshev basis \(1\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 4+2\ln 2\);

  • For the Runge-Kutta basis, assume additionally that \({{\int \limits }_{0}^{1}}p_{i}(\sigma ){d}\sigma \geq 0\), i = 0,1,…,N − 1. Then \(N^{-1}\leq \lambda _{\min \limits }(C_{s}{C_{s}^{T}})\leq \lambda _{\max \limits }(C_{s}{C_{s}^{T}})\leq 5\).

Proof

In the case of the Legendre basis, it holds f = [1,1,0,…,0]. Hence \(\lvert f\rvert ^{2}=2\), and the statement follows.

For the modified Legendre basis, we have f = [1,2,0,2,0,…] such that

$$ \lvert f\rvert^{2}=\begin{cases} 2N+1, & N~\text{even},\\ 2N+3, & N~\text{odd}. \end{cases} $$

For the Chebyshev basis, we observe

$$ {{\int}_{0}^{1}}p_{i}(\sigma){d}\sigma=\begin{cases} \frac{1}{2}\frac{1+(-1)^{i}}{1-i^{2}}, & i\neq1,\\ 0, & i=1. \end{cases} $$

This leads to \(f=[1,1,0,-\frac {1}{3},0,-\frac {1}{15},0,\ldots ]\). Hence,

$$ 2\leq\lvert f\rvert^{2}\leq2+\sum\limits_{i=1}^{\infty}\left( \frac{1}{1-(2i)^{2}}\right)^{2}\leq2+\sum\limits_{i=1}^{\infty}\frac{1}{i(4i^{2}-1)}=1+2\ln2. $$

For the sum of the series, cf. [19, p. 269, series 110.d]. This provides the estimate for the Chebyshev basis.

In the case of the Runge-Kutta basis, it holds \({\sum }_{i=0}^{N-1}p_{i}(\sigma )\equiv 1\). With \(f=[1,f_{2},\ldots,f_{N+1}]\), the additional assumption gives \(f_{i}\geq 0\), and integrating the partition of unity yields \({\sum }_{i=2}^{N+1}f_{i}=1\). Hence, by the Cauchy-Schwarz inequality and \(0\leq f_{i}\leq 1\),

$$ \frac{1}{N}=\frac{1}{N}\left( \sum\limits_{i=2}^{N+1}f_{i}\right)^{2}\leq\sum\limits_{i=2}^{N+1}{f_{i}^{2}}\leq\sum\limits_{i=2}^{N+1}f_{i}=1. $$

This yields \(1+N^{-1}\leq \lvert f\rvert ^{2}\leq 2\) and the claim follows. □
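The values of \(\lvert f\rvert^{2}\) appearing in this proof can be reproduced numerically. The sketch below assumes that the Legendre and Chebyshev bases are the classical polynomials shifted to [0,1] and unnormalized, which reproduces the vectors f used above; the modified Legendre basis is omitted since its definition is not repeated here.

```python
import numpy as np
from numpy.polynomial import legendre, chebyshev

N = 6
x, w = legendre.leggauss(2 * N)            # Gauss rule, exact for the p_i
s, w = (x + 1) / 2, w / 2                  # mapped to [0, 1]

def f_vector(basis):
    """f = [1, int_0^1 p_0, ..., int_0^1 p_{N-1}]."""
    return np.array([1.0] + [w @ basis(i, s) for i in range(N)])

leg = lambda i, s: legendre.Legendre.basis(i)(2 * s - 1)
cheb = lambda i, s: chebyshev.Chebyshev.basis(i)(2 * s - 1)

for name, basis in [("Legendre", leg), ("Chebyshev", cheb)]:
    f = f_vector(basis)
    print(name, np.round(f, 6), "|f|^2 =", f @ f)
# Legendre: |f|^2 = 2;  Chebyshev: |f|^2 ~ 2.117, within [2, 1 + 2 ln 2]
```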

Remark 4

For the Runge-Kutta basis, the values \(f_{i}={{\int \limits }_{0}^{1}}p_{i-1}(\sigma ){d}\sigma \) are just the weights of the interpolatory quadrature rule corresponding to the nodes τ1,…,τN of (18). For a number of common choices of nodes, these weights are known to be positive. Examples are the Gauss-Legendre nodes, Radau nodes, and Lobatto nodes [20, Section 2.7]. The same holds true for Chebyshev nodes and many others; see, e.g., [20, pp. 85f]. □
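The positivity claim can be checked directly: for Gauss-Legendre nodes on [0,1], the interpolatory weights obtained from the moment equations coincide with the mapped Gauss weights and are positive. A small sketch (all names ours):

```python
import numpy as np

N = 7
x, w = np.polynomial.legendre.leggauss(N)
tau = (x + 1) / 2                          # Gauss-Legendre nodes on [0, 1]

# interpolatory weights: sum_j f_{j+1} tau_j^k = int_0^1 s^k ds = 1/(k+1)
V = np.vander(tau, N, increasing=True)     # V[j, k] = tau_j^k
f_weights = np.linalg.solve(V.T, 1 / np.arange(1, N + 1))

print(np.all(f_weights > 0))               # positivity, as stated above
print(np.allclose(f_weights, w / 2))       # they are the mapped Gauss weights
```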

Note that the claims of Proposition 5 could also be shown using Gershgorin’s theorem. This indicates that the estimates of Proposition 4 are rather tight.

Corollary 2

For equidistant grids (3), it holds

  • For the Legendre basis \(\kappa (\mathcal {C})\leq \sqrt {5}\) and \(\lVert \mathcal {C}^{+}\rVert \leq h^{-1}\);

  • For the modified Legendre basis \(\kappa (\mathcal {C})\leq \left (\frac {2N+6}{2N}\right )^{1/2}\) and \(\lVert \mathcal {C}^{+}\rVert \leq (2N)^{-1/2}h^{-1}\);

  • For the Chebyshev basis \(\kappa (\mathcal {C})\leq (4+2\ln 2)^{1/2}\approx 2.32\) and \(\lVert \mathcal {C}^{+}\rVert \leq h^{-1}\);

  • For the Runge-Kutta basis \(\kappa (\mathcal {C})\leq (5N)^{1/2}\) and \(\lVert \mathcal {C}^{+}\rVert \leq N^{1/2}h^{-1}\) provided that \({{\int \limits }_{0}^{1}}p_{i}(\sigma ){d}\sigma \geq 0\), i = 0,1,…,N − 1.

It should be emphasized again that, if \(\ker (\mathcal {C}+{\Delta }\mathcal {C})\neq \ker \mathcal {C}\), it cannot be guaranteed that the solution of the perturbed problem, \(\mathcal {R}(c+{\Delta } c)\), belongs to Xπ; it belongs only to \(\tilde {X}_{\pi }\). Simple algorithms for projecting elements of \(\tilde {X}_{\pi }\) onto Xπ can be derived; a minimal sketch follows below. In our experiments so far, these projections did not lead to better accuracy than the unprojected numerical solutions.
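A minimal sketch of the orthogonal variant of such a projection (the function name is ours): the component of the coefficient vector violating the constraints is removed.

```python
import numpy as np

def project_onto_kernel(C, c_tilde):
    """Orthogonal projection of a coefficient vector onto ker C, i.e. onto
    the set of coefficient vectors representing functions in X_pi."""
    return c_tilde - np.linalg.pinv(C) @ (C @ c_tilde)
```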

5 Some examples

5.1 Conditioning of the representation map \(\mathcal {R}\)

For each selection {p0,…,pN− 1} of basis polynomials, the conditioning of the representation map depends both on the grid and on N. For simplicity, we assume here that an equidistant grid with stepsize h is used for defining Xπ. Besides the bases introduced before, we will additionally consider the Runge-Kutta basis with uniform interpolation points as used in our very first paper on the subject [1].

The norms of the representation map and its inverse have been computed for both settings (mapping into \(L^{2}((a,b),\mathbb {R}^{m})\) and \({H_{D}^{1}}(a,b)\)), for polynomial degrees N = 3, 5, 10, 20, and for stepsizes \(h=n^{-1}\) with n = 10, 20, 40, 80, 160, 320. These are the first observations:

  • \(\sigma _{\min \limits }(\hat {\mathcal {U}})\) is independent of the chosen basis and independent of N for h ≤ 0.1. However, this is not true for larger stepsizes, cf. Table 2.

  • For every basis, \(\sigma _{\max \limits }(\mathcal {U})\approx \sigma _{\max \limits }(\hat {\mathcal {U}})\) up to a relative error below \(10^{-3}\). This coincides with the findings of Theorem 3.

In Tables 1, 2, 3, 4, 5, and 6, we present more detailed results. From these tables, we can draw the following conclusions:

  • The asymptotic behavior with respect to the stepsize h as indicated in Theorem 3 is clearly visible.

  • For both the Legendre and the Chebyshev bases, \(\sigma _{\max \limits }(\mathcal {U})\) and \(\sigma _{\max \limits }(\hat {\mathcal {U}})\) do not depend on N. This is reasonable for the Legendre basis if Proposition 2 is taken into account.

  • The asymptotics of \(\sigma _{\min \limits }(\mathcal {U})\) coincides with the results of Theorem 3 and Proposition 2 for the modified Legendre basis.

  • The norm of the representation map behaves similarly for all considered bases. Not unexpectedly, an exception is the Runge-Kutta basis with uniform nodes, which has a much larger norm than the other bases. Comparing \(\sigma _{\min \limits }(\mathcal {U})\) and \(\sigma _{\max \limits }(\mathcal {U})\) for the different bases, the Legendre and Chebyshev bases on the one hand and the modified Legendre basis on the other hand appear to differ only in their scaling, while their conditioning (being the product of the norms of the representation map and its inverse) is similar. A similar property holds for \(\hat {\mathcal {U}}\).

  • The Runge-Kutta basis has surprisingly good properties. However, these properties depend on the representation with respect to an orthogonal polynomial basis (in the present example, Chebyshev polynomials). Thus, working with it is much more expensive than using the Legendre or Chebyshev bases directly.

Table 1 \(\sigma _{\min \limits }(\hat {\mathcal {U}})\)
Table 2 \(\sigma _{\min \limits }(\hat {\mathcal {U}})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)
Table 3 \(\sigma _{\min \limits }(\mathcal {U})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)
Table 4 \(\sigma _{\max \limits }(\hat {\mathcal {U}})=\sigma _{\max \limits }(\mathcal {U})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)
Table 5 \(\kappa (\hat {\mathcal {U}})=\sigma _{\max \limits }(\hat {\mathcal {U}})/\sigma _{\min \limits }(\hat {\mathcal {U}})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)
Table 6 \(\kappa (\mathcal {U})=\sigma _{\max \limits }(\mathcal {U})/\sigma _{\min \limits }(\mathcal {U})\). The column headings denote the Legendre basis (L), the modified Legendre basis (mL), the Chebyshev basis (Ch), the Runge-Kutta basis (RK), and the Runge-Kutta basis with uniform nodes (RKu)

5.2 Conditioning of the constrained minimization problems

In order to provide a first insight into the conditioning of the constrained minimization problem (23)–(22), we computed the condition numbers \(\kappa _{\mathcal {C}}(\mathcal {A})\), which are of crucial importance for the behavior of the computational error. Discussions of \(\kappa (\mathcal {C})\) and \(\lVert \mathcal {C}^{+}\rVert \) have been provided earlier (Proposition 5 and Corollary 2). The examples below are chosen from our earlier investigations that led to surprisingly accurate results.

As before, we use the bases introduced in Section 5.1. We dispense with the Runge-Kutta basis with uniform nodes since it is badly conditioned. We choose M = N + 1 and the Gauss-Legendre nodes as collocation points (6). For this choice, \({\Phi }_{\pi ,M}^{R}={\Phi }_{\pi ,M}^{I}\) (see (12), (11)) and \(\kappa _{\mathcal {C}}(\mathcal {A})\) is identical for both choices.
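The tables below can, in principle, be reproduced from the matrices \(\mathcal{A}\) and \(\mathcal{C}\) of the discrete problem. A minimal sketch (function name ours), using that, consistent with (44), \(\kappa_{\mathcal{C}}(\mathcal{A})=\lVert\mathcal{A}\mathcal{D}\rVert\lVert(\mathcal{A}\mathcal{D})^{+}\rVert\) with the columns of \(\mathcal{D}\) an orthonormal basis of \(\ker\mathcal{C}\):

```python
import numpy as np

def kappa_restricted(A, C, tol=1e-12):
    """Restricted condition number kappa_C(A) = |AD| |(AD)^+|, where the
    columns of D form an orthonormal basis of ker C."""
    _, s, Vt = np.linalg.svd(C)
    D = Vt[np.count_nonzero(s > tol * s[0]):].T   # basis of ker C
    sigma = np.linalg.svd(A @ D, compute_uv=False)
    return sigma[0] / sigma[-1]                   # assumes AD has full rank
```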

Example 1

The first example is an index-3 DAE without dynamic degrees of freedom. It has been used before in numerous papers, e.g., [1, 2, 8]. The problem is given by

$$ \begin{array}{@{}rcl@{}} x^{\prime}_{2}(t)+x_{1}(t) & =&q_{1}(t),\\ t\eta x^{\prime}_{2}(t)+x^{\prime}_{3}(t)+(\eta+1)x_{2}(t) & =&q_{2}(t),\\ t\eta x_{2}(t)+x_{3}(t) & =&q_{3}(t),\quad t\in[0,1]. \end{array} $$

For unique solvability, no boundary or initial conditions are necessary. We choose the exact solution

$$ \begin{array}{@{}rcl@{}} x_{\ast,1}(t) & =&e^{-t}\sin t,\\ x_{\ast,2}(t) & =&e^{-2t}\sin t,\\ x_{\ast,3}(t) & =&e^{-t}\cos t \end{array} $$

and adapt the right-hand side q accordingly. In Table 7, the values of \(\kappa _{\mathcal {C}}(\mathcal {A})\) for \({\Phi }_{\pi ,M}^{R}\) and \({\Phi }_{\pi ,M}^{C}\) are provided. It turns out that the behavior for different functionals is comparable. Therefore, in the following examples, we present only the values for \({\Phi }_{\pi ,M}^{R}\).
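For reproducibility, the right-hand side q can be generated symbolically from the exact solution; a small sympy sketch (the symbol names are ours):

```python
import sympy as sp

t, eta = sp.symbols("t eta")
x1 = sp.exp(-t) * sp.sin(t)
x2 = sp.exp(-2 * t) * sp.sin(t)
x3 = sp.exp(-t) * sp.cos(t)

# insert the exact solution into the left-hand side of the DAE
q1 = sp.diff(x2, t) + x1
q2 = t * eta * sp.diff(x2, t) + sp.diff(x3, t) + (eta + 1) * x2
q3 = t * eta * x2 + x3
print(sp.simplify(q1), sp.simplify(q2), sp.simplify(q3), sep="\n")
```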

Table 7 \(\kappa _{\mathcal {C}}(\mathcal {A})\) for \({\Phi }_{\pi ,M}^{R}\) and \({\Phi }_{\pi ,M}^{C}\). Here, L denotes the Legendre basis, mL the modified Legendre basis, Ch the Chebyshev basis, and RK the Runge-Kutta basis. The smallest values are set in boldface

Example 2

We continue with an example of a Hessenberg index-2 system used previously in [1]. Consider the DAE system

$$ \begin{array}{@{}rcl@{}} x^{\prime}_{1}(t)+\lambda x_{1}(t)-x_{2}(t)-x_{3}(t) & =&q_{1}(t),\\ x^{\prime}_{2}(t)+(\eta t(1-\eta t)-\eta)x_{1}(t)+\lambda x_{2}(t)-\eta tx_{3}(t) & =&q_{2}(t),\\ (1-\eta t)x_{1}(t)+x_{2}(t) & =&q_{3}(t),\quad t\in[0,1], \end{array} $$

with the right-hand side q chosen in such a way that

$$ \begin{array}{@{}rcl@{}} x_{1}(t) & =&e^{-t}\sin t,\\ x_{2}(t) & =&e^{-2t}\sin t,\\ x_{3}(t) & =&e^{-t}\cos t, \end{array} $$

is a solution. The system has one dynamical degree of freedom. We impose the initial condition

$$ x_{1}(0)=0. $$

The results for η = − 25 and λ = − 1 are provided in Table 8.

Table 8 \(\kappa _{\mathcal {C}}(\mathcal {A})\) for \({\Phi }_{\pi ,M}^{R}\). Here, L denotes the Legendre basis, mL the modified Legendre basis, Ch the Chebyshev basis, and RK the Runge-Kutta basis. The smallest values are set in boldface

Example 3

Our next example is a linearized problem proposed by Campbell and Moore [21]. It has been used previously in the experiments in [2, 8, 9] and elsewhere. Let

$$ A(Dx)'(t)+B(t)x(t)=q(t),\quad t\in[0,5], $$

where

$$ \begin{array}{@{}rcl@{}} A=\begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix},D=\begin{bmatrix}1 & 0 & 0 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0 & 0 & 0\\ 0 & 0 & 1 & 0 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{bmatrix}, \end{array} $$
$$ \begin{array}{@{}rcl@{}} B(t)=\begin{bmatrix}0 & 0 & 0 & -1 & 0 & 0 & 0\\ 0 & 0 & 0 & 0 & -1 & 0 & 0\\ 0 & 0 & 0 & 0 & 0 & -1 & 0\\ 0 & 0 & \sin t & 0 & 1 & -\cos t & -2\rho\cos^{2}t\\ 0 & 0 & -\cos t & -1 & 0 & -\sin t & -2\rho\sin t\cos t\\ 0 & 0 & 1 & 0 & 0 & 0 & 2\rho\sin t\\ 2\rho\cos^{2}t & 2\rho\sin t\cos t & -2\rho\sin t & 0 & 0 & 0 & 0 \end{bmatrix},\quad\rho=5, \end{array} $$

subject to the initial conditions

$$ x_{2}(0)=1,\quad x_{3}(0)=2,\quad x_{5}(0)=0,\quad x_{6}(0)=0. $$

This problem has index 3 and \(l_{\mathrm{dyn}}=4\) dynamical degrees of freedom. The right-hand side q has been chosen in such a way that the exact solution becomes

$$ \begin{array}{@{}rcl@{}} x_{\ast,1} & =&\sin t,\qquad\quad x_{\ast,4}=\cos t,\\ x_{\ast,2} & =&\cos t,\qquad\quad x_{\ast,5}=-\sin t,\\ x_{\ast,3} & =&2\cos^{2}t,\qquad x_{\ast,6}=-2\sin 2t,\\ x_{\ast,7} & =&-\rho^{-1}\sin t. \end{array} $$

The results are shown in Table 9. Note that, in the present example, h = 5/n in contrast to all previous computations where h = 1/n. □

Table 9 \(\kappa _{\mathcal {C}}(\mathcal {A})\) for \({\Phi }_{\pi ,M}^{R}\). Here, L denotes the Legendre basis, mL the modified Legendre basis, Ch the Chebyshev basis, and RK the Runge-Kutta basis. The smallest values are set in boldface

The numerical experiments give rise to the following observations:

  • For a given polynomial degree N and stepsize h, the condition numbers of the discrete problem have almost the same magnitude for all bases.

  • The experiments indicate that the Runge-Kutta basis provides the lowest condition numbers for smaller stepsizes, while for higher-order ansatz functions and larger stepsizes the modified Legendre basis provides the smallest condition numbers.

  • In order to obtain a complete picture of the relative merits of the different bases in the case discussed in Theorem 5, not only the condition number \(\kappa (\mathcal {C})\) of \(\mathcal {C}\) but also the term \(\lVert \mathcal {C}^{+}\rVert \kappa (\mathcal {C})\) has to be taken into account. Corollary 2 shows that the modified Legendre basis is well-suited for higher orders N.

  • If the perturbed solution \(\tilde {c}\) of (36) is projected back onto the nullspace \(\ker \mathcal {C}\), we can assume that the conditions of Theorem 4 are fulfilled. In this case, \(\mathcal {C}\) does not have any influence on the error estimation.

6 Conclusions

In this paper, we investigated the conditioning of the discrete problems arising in the least-squares collocation method for DAEs. In particular, the solution algorithm has been split into a representation map that connects the coefficients of the basis representation to the function to be represented, and a linearly equality constrained linear least-squares problem. A careful investigation of the representation map allowed us to characterize errors in the function spaces in terms of those made in the solution of the discrete problem.

The perturbation estimates for the constrained least-squares problem have been derived with the application in mind: the approximation of a DAE. The constraints play an exceptional role. If they are satisfied, the resulting numerical solution belongs to the solution space \({H_{D}^{1}}(a,b)\). If this cannot be guaranteed, the convergence theory for the least-squares method does not apply. Some of the characterizing quantities could be estimated analytically for reasonable choices of bases while others have been estimated numerically in certain examples. We believe that these considerations contribute to a robust and efficient implementation of the proposed method, which seems to provide surprisingly accurate numerical solutions to higher-index DAEs.