Towards a reliable implementation of least-squares collocation for higher index differential-algebraic equations—Part 2: the discrete least-squares problem

In the two parts of the present note we discuss questions concerning the implementation of overdetermined least-squares collocation methods for higher index differential-algebraic equations (DAEs). Since higher index DAEs lead to ill-posed problems in natural settings, their discrete counterparts are expected to be very sensitive, which makes a careful implementation particularly important. Part 1 provides a robust selection of basis functions and collocation points for the design of the discrete problem, whereas Part 2 analyzes the discrete least-squares problem and substantiates a procedure for its numerical solution.


Introduction
This is Part 2 of our work entitled Towards a reliable implementation of least-squares collocation for higher index differential-algebraic equations, which is introduced and put into context in detail in Part 1. Here we briefly collect the ingredients needed for a fluent reading of this second part.
Consider a linear boundary value problem for a DAE with properly involved derivative,

A(t)(Dx)'(t) + B(t)x(t) = q(t),   t ∈ [a, b],   (1)

G_a x(a) + G_b x(b) = d,   (2)

with [a, b] ⊂ R being a compact interval, D = [I 0] ∈ R^{k×m}, k < m, with the identity matrix I ∈ R^{k×k}. Furthermore, A(t) ∈ R^{m×k}, B(t) ∈ R^{m×m}, and q(t) ∈ R^m are assumed to be sufficiently smooth with respect to t ∈ [a, b]. Moreover, G_a, G_b ∈ R^{l_dyn×m}. Thereby, l_dyn is the dynamical degree of freedom of the DAE, that is, the number of free parameters which can be fixed by initial and boundary conditions. We assume further that ker D ⊆ ker G_a and ker D ⊆ ker G_b. Unlike regular ordinary differential equations (ODEs), where l_dyn = k = m, for DAEs it holds that 0 ≤ l_dyn ≤ k < m; in particular, l_dyn = k for index-one DAEs, l_dyn < k for higher index DAEs, and l_dyn = 0 can certainly happen.
Let P_K denote the set of all polynomials of degree less than or equal to K ≥ 0. Given the partition

π : a = t_0 < t_1 < · · · < t_n = b,   (3)

with the stepsizes h_j = t_j − t_{j−1}, let C_π([a, b], R^m) denote the space of piecewise continuous functions having breakpoints merely at the meshpoints of the partition π. Let N ≥ 1 be a fixed integer. We are looking for an approximate solution of our boundary value problem from the ansatz space

X_π = {x ∈ C_π([a, b], R^m) : Dx ∈ C([a, b], R^k), x_κ|_{[t_{j−1},t_j)} ∈ P_N, κ = 1, . . . , k, x_κ|_{[t_{j−1},t_j)} ∈ P_{N−1}, κ = k + 1, . . . , m, j = 1, . . . , n}.   (4)

The continuous version of the least-squares method reads: Find an x_π ∈ X_π that minimizes the functional

Φ(x) = ∫_a^b |A(t)(Dx)'(t) + B(t)x(t) − q(t)|^2 dt + |G_a x(a) + G_b x(b) − d|^2.   (5)

Hanke and März [11, Theorem 1] provide sufficient conditions ensuring the existence and uniqueness of the approximate solution from X_π.
The functional values Φ(x), which are needed when minimizing over x ∈ X_π, cannot be evaluated exactly, and the integral must be discretized accordingly. Taking into account that the boundary value problem is ill-posed in the higher index case, perturbations of the functional may have a serious influence on the error of the approximate least-squares solution or even prevent convergence towards the exact solution. Therefore, careful approximations of the integral in Φ are required. We take over the options provided in [11], in which M ≥ N + 1 so-called collocation points

0 ≤ τ_1 < · · · < τ_M ≤ 1   (6)

are used, which yield, on the subintervals of the partition π, the points

t_ji = t_{j−1} + τ_i h_j,   i = 1, . . . , M,  j = 1, . . . , n.
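As a small illustration of the node construction, the following sketch (our own, assuming a uniform partition and Gauss-Legendre reference nodes; neither choice is mandated by the text) maps M reference nodes onto every subinterval:

```python
import numpy as np

def collocation_points(a, b, n, M):
    """Map M Gauss-Legendre nodes from the reference interval [0, 1]
    onto each subinterval of a uniform partition of [a, b]."""
    x, _ = np.polynomial.legendre.leggauss(M)   # nodes on [-1, 1]
    tau = 0.5 * (x + 1.0)                       # shifted to [0, 1]
    t = np.linspace(a, b, n + 1)                # meshpoints t_0 < ... < t_n
    h = np.diff(t)                              # stepsizes h_j
    # t_ji = t_{j-1} + tau_i * h_j, arranged as an (n, M) array
    return t[:-1, None] + tau[None, :] * h[:, None]

pts = collocation_points(0.0, 5.0, 4, 3)        # n = 4 subintervals, M = 3 nodes
```

Since the Gauss nodes are interior points of [0, 1], the resulting points t_ji are strictly increasing across the whole interval.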
Introducing, for each x ∈ X_π and w(t) = A(t)(Dx)'(t) + B(t)x(t) − q(t), the corresponding vector

W = (w(t_11)^T, . . . , w(t_1M)^T, . . . , w(t_n1)^T, . . . , w(t_nM)^T)^T ∈ R^{mMn},   (7)

we turn to an approximate functional of the form

Φ_π,M(x) = W^T L W + |G_a x(a) + G_b x(b) − d|^2,   (8)

with the positive definite symmetric block diagonal matrix

L = diag(L ⊗ I_m, . . . , L ⊗ I_m),   (9)

where ⊗ denotes the Kronecker product. As detailed in [11], we have different options for the positive definite symmetric matrix L ∈ R^{M×M}, namely the variants L_C, L_I, and L_R; see [11, Section 3] for details concerning the selection of the quadrature weights and the construction of the mass matrix. We emphasize that the matrices L_C, L_I, L_R depend only on M, the node sequence (6), and the quadrature weights, but do not depend on the partition π and its stepsizes at all. In the context of the experiments below, we denote the corresponding versions of the functional by Φ^C_π,M, Φ^I_π,M, and Φ^R_π,M, respectively. It should be underlined that minimizing each version of the functional Φ_π,M on X_π can be viewed as a special least-squares method for solving the overdetermined collocation system W = 0, G_a x(a) + G_b x(b) = d, with respect to x ∈ X_π, that is, in detail, the collocation system

A(t_ji)(Dx)'(t_ji) + B(t_ji)x(t_ji) = q(t_ji),   i = 1, . . . , M,  j = 1, . . . , n,   (13)

G_a x(a) + G_b x(b) = d.   (14)

The system (13)-(14) for x ∈ X_π is overdetermined since X_π has dimension mnN + k, whereas the system consists of mnM + l_dyn > mnN + k + l_dyn ≥ mnN + k scalar equations. We refer to [11, Theorem 2] for sufficient conditions which ensure the existence and uniqueness of the minimizing element

x_π ∈ argmin{Φ_π,M(x) : x ∈ X_π}.   (15)

Once the basis of the ansatz space X_π has been chosen and the collocation nodes are selected, the discrete problem (15) for the linear boundary value problem (1)-(2) leads to the constrained linear least-squares problem

ϕ(c) = |A c − r|^2 → min!   (16)
under the linear constraint

C c = 0.   (17)

The equality constraints consist of the k(n − 1) continuity conditions for the elements of X_π, while the functional ϕ(c) represents a reformulation of the functional (8). Here, c ∈ R^{n(mN+k)} is the vector of coefficients of the basis functions for X_π disregarding the continuity conditions. Furthermore, it holds that r ∈ R^{nmM+l_dyn}, A ∈ R^{(nmM+l_dyn)×n(mN+k)}, and C ∈ R^{(n−1)k×n(mN+k)}. The matrices A and C are very sparse. Owing to the construction, C has full row rank.
We specify the structure of A and C in detail in Section 2 below. Different approaches to solving the constrained optimization problem (16)-(17) have been tested. We report on related experiments in Section 4. The examples which are used in different places are collected in Section 3. The performance of the linear solvers is discussed in Section 5. Section 6 shows some additional experiments concerning the weighting of the boundary conditions. Section 7 contains final remarks.

The structure of the discrete problem (16), (17)
Based on the analysis in [11, Section 4] we provide a basis of the ansatz space X_π to begin with. Assume that {p_0, . . . , p_{N−1}} is a basis of P_{N−1} defined on the reference interval [0, 1]. From it, a basis {p̄_0, . . . , p̄_N} of P_N is constructed as in [11, Section 4]. The transformation to the interval (t_{j−1}, t_j) of the partition π (3) yields corresponding basis functions on each subinterval. For x ∈ X_π we denote by x_j the restriction of x to [t_{j−1}, t_j) and develop each x_j componentwise with respect to these basis functions, with coefficients c_{jκl}, l = 0, . . . , N for κ = 1, . . . , k and l = 0, . . . , N − 1 for κ = k + 1, . . . , m. Collecting all coefficients c_{jκl} in the vector c ∈ R^{n(mN+k)} yields the representation (22) of x in terms of c. It follows that the matrix C ∈ R^{k(n−1)×n(mN+k)} in (17) corresponding to the continuity requirement for Dx has a block bidiagonal form in which each row couples coefficients of two neighboring subintervals.
By construction the segments Dx j , j = 1, . . . , n, all together form a continuous function Dx on [a, b] exactly when C c = 0.
Regarding the structure of C ∈ R^{k(n−1)×n(mN+k)} we know that rank C = k(n − 1) and dim ker C = nmN + k = dim X_π, and formula (22) provides a one-to-one relation between X_π and ker C ⊂ R^{n(mN+k)}.
Now we turn to the detailed description of the functional value in (16). For this aim we factorize L = L̃^T L̃; the block diagonal matrix from (9) then factorizes accordingly, with diagonal blocks L̃ ⊗ I_m, and (8) rewrites as a plain sum of squares. Applying (22) and (23), we derive a blockwise representation of the weighted collocation residual with blocks A_ji ∈ R^{m×(mN+k)}. According to (7), we assemble the sparse matrix A ∈ R^{(nmM+l_dyn)×n(mN+k)} and the vector r ∈ R^{nmM+l_dyn} such that Φ_π,M(x) = |A c − r|^2 = ϕ(c), as desired. Eventually, each minimizer x_π ∈ argmin{Φ_π,M(x) : x ∈ X_π} corresponds to a minimizer c_min ∈ argmin{ϕ(c) : c ∈ R^{n(mN+k)}, C c = 0}, and vice versa. Recall that [11, Theorem 2] provides sufficient conditions for x_π to be unique.
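The factorization step can be checked numerically: any Cholesky-type factor L̃ of a symmetric positive definite L turns the quadratic form W^T L W into a plain sum of squares. A small self-contained sketch (with a random stand-in for L, not one of the matrices L_C, L_I, L_R from [11]):

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4
# random symmetric positive definite stand-in for the weight matrix L
G = rng.standard_normal((M, M))
L = G @ G.T + M * np.eye(M)

Cfac = np.linalg.cholesky(L)      # lower triangular, L = Cfac @ Cfac.T
Ltilde = Cfac.T                   # hence L = Ltilde.T @ Ltilde

W = rng.standard_normal(M)
quad_form = W @ L @ W                            # W^T L W
sum_of_squares = np.linalg.norm(Ltilde @ W)**2   # |Ltilde W|^2
```

Both quantities coincide up to rounding, which is exactly what allows the weighted functional to be handed to an ordinary least-squares solver.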

Proposition 1
Let the functional Φ_π,M have a unique minimizer x_π on X_π. Then the following assertions are valid: (1) There is exactly one minimizer c_min of the functional ϕ on ker C.
(2) If B ∈ R^{n(mN+k)×(nmN+k)} is a basis of ker C, then Ā := A B has full column rank nmN + k.
Proof (1) follows directly from the above representations of the related functionals.
For (2), assume that Āz = A B z = 0 for some z. Then ϕ(c_min + Bz) = |A c_min + A B z − r|^2 = ϕ(c_min) and C(c_min + Bz) = 0, so c_min + Bz is a minimizer of ϕ on ker C, too. Owing to the uniqueness of the minimizer it follows that Bz = 0, and in turn z = 0, since B has full column rank being a basis.

Test examples
The first test problem is often used in the literature to show that standard integration methods fail if applied to higher index DAEs; see, e.g., [13,14].

Example 1
The DAE has index 3 and dynamical degree of freedom l_dyn = 0, such that no additional boundary or initial conditions are necessary for unique solvability. We choose the exact solution and adapt the right-hand side q accordingly. The next example is the linearized version of a test problem presented in [6] that has also been discussed, e.g., in [12].

Example 2 We consider the DAE
subject to the initial conditions. This problem has index 3 and dynamical degree of freedom l_dyn = 4. The right-hand side q has been chosen in such a way that the exact solution becomes x_{*,2} = cos t, x_{*,3} = 2 cos^2 t, x_{*,4} = cos t, x_{*,5} = − sin t, x_{*,6} = −2 sin 2t. The following example is a boundary value problem, in contrast to Example 2, which is an initial value problem.
Example 3 We consider the DAE subject to the boundary conditions

This DAE can be brought into the proper form (1) by setting
This DAE has the tractability index μ = 4 and dynamical degree of freedom l_dyn = 2. The solution reads

Approaches to solve the constrained optimization problem (16)-(17)
Different approaches to solve the constrained optimization problem (16)-(17) have been tested, namely the direct elimination method, the weighting of the constraints, and a special deferred correction procedure, as specified in the following three subsections.

Direct elimination method
The matrix C has full row rank (n − 1)k. The solution manifold of (17), that is, ker C, forms an (nmN + k)-dimensional subspace of R^{n(mN+k)} which can be characterized by c = Bz, z ∈ R^{nmN+k}. Here, B ∈ R^{n(mN+k)×(nmN+k)} is an orthogonal basis of ker C. With this representation, the constrained minimization problem reduces to the unconstrained one

φ̃(z) = |A B z − r|^2 → min!

Owing to Proposition 1, the matrix product A B has full column rank nmN + k.
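A dense toy version of this elimination (our own sketch; the implementation described below works with sparse QR factorizations rather than the dense helpers used here) reads:

```python
import numpy as np
from scipy.linalg import null_space

def solve_eq_constrained_lsq(A, r, C):
    """Direct elimination (nullspace) method for
        min |A c - r|^2  subject to  C c = 0."""
    B = null_space(C)                           # orthonormal basis of ker C
    z, *_ = np.linalg.lstsq(A @ B, r, rcond=None)
    return B @ z                                # c = B z satisfies C c = 0 by construction

rng = np.random.default_rng(1)
A = rng.standard_normal((12, 8))
C = rng.standard_normal((3, 8))
r = rng.standard_normal(12)
c = solve_eq_constrained_lsq(A, r, C)
```

The returned c fulfills the constraint exactly (up to rounding), and the residual A c − r is orthogonal to the image of ker C under A, which is the first-order optimality condition.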
The implemented algorithm is that of [5] (see also [4, Section 5.1.2]) which is sometimes called the direct elimination method. In our tests below the direct method seems to be the most robust one.

Weighting of the constraints to solve the optimization problem (16)-(17).
In this approach, a sufficiently large parameter ω > 0 is chosen and the problem (16)-(17) is replaced by the free minimization problem

ϕ_ω(c) = |A c − r|^2 + ω^2 |C c|^2 → min!

It is known that the minimizer c_ω of ϕ_ω converges towards the solution of (16)-(17) for ω → ∞ (cf. [9, Section 12.1.5]). Two different orderings of the equations have been implemented, the second of which uses a block-bidiagonal structure as is common for collocation methods for ODEs, cf. [1]. It is known that the order of the equations in the weighting method may have a large impact on the accuracy of the solutions [16]. In our test examples, however, we did not observe a difference in the behavior of the two orderings. The results of the weighting method depend substantially on the choice of the parameter ω. In order to obtain an accurate approximation of the exact solution c_* of the problem (16)-(17), a large value of ω should be used (in the absence of rounding errors). However, if ω becomes too large, the algorithm may lack numerical stability. A discussion of this topic has been given in [16]. In particular, it turns out that the algorithm used for the QR decomposition and the pivoting strategies have a strong influence on the success of this method. In our implementation, we use the sparse QR implementation of [8]. On the other hand, driving the error of the linear-algebra solution far below the approximation error of x_π is not necessary. Therefore, a number of experiments have been done in order to obtain some insight into what reasonable choices might be.
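In matrix terms, the weighting approach amounts to stacking the scaled constraint rows under A and solving one free least-squares problem. A minimal dense sketch (our own; the stacking shown is just one of the two orderings mentioned above):

```python
import numpy as np

def weighted_lsq(A, r, C, omega):
    """Solve min |A c - r|^2 + omega^2 |C c|^2 by stacking omega*C under A."""
    A_aug = np.vstack([A, omega * C])
    r_aug = np.concatenate([r, np.zeros(C.shape[0])])
    c, *_ = np.linalg.lstsq(A_aug, r_aug, rcond=None)
    return c

rng = np.random.default_rng(1)
A = rng.standard_normal((12, 8))
C = rng.standard_normal((3, 8))
r = rng.standard_normal(12)
c = weighted_lsq(A, r, C, 1e6)   # large omega: constraint almost satisfied
```

For large ω the constraint residual C c becomes very small, but, as discussed above, too large a weight endangers the numerical stability of the factorization.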

Experiment 1 Influence of the choice of the weighting parameter ω
We use Example 2. Two sets of parameters are selected: (i) N = 5, n = 160 and (ii) N = 20, n = 20. The choice (i) corresponds to low degree polynomials with a correspondingly large number of subintervals, while (ii) uses higher degree polynomials with a correspondingly small number of subintervals. Both cases have been selected according to [11, Table 20] in such a way that a high accuracy can be obtained while at the same time having only a small influence of the problem conditioning. The other parameters chosen in this experiment are: M = N + 1, Gauss-Legendre collocation nodes, and Legendre polynomials as basis functions. The error in dependence on ω is measured both with respect to the exact solution and with respect to a reference solution obtained by the direct solution method. The results are provided in Tables 1 and 2. The results for Example 3 below are quite similar. They indicate that an optimal ω may vary considerably depending on the problem parameters. However, the accuracy with respect to the exact solution is rather insensitive to ω.

Deferred correction procedure
The direct solution method by eliminating the constraints often has the deficiency of generating a lot of fill-in in the intermediate matrices. An approach to overcome this situation has been proposed in [16]: the solutions of the weighting approach are iteratively enhanced by a defect correction process. This method is implemented in the form presented in [2,3], called by the authors the deferred correction procedure for constrained least-squares problems. As a stopping criterion, the estimate (i) in [3, p. 254] has been implemented. Additionally, a bound for the maximal number of iterations can be provided. Under reasonable conditions, at most 2 iterations should be sufficient for obtaining maximal (with respect to the sensitivity of the problem) accuracy of the discrete solution.

Table 1 Influence of parameter ω for the constraints in Example 2 using N = 5 and n = 160. The error of the solution with respect to the exact solution (A) and with respect to a discrete reference solution obtained by a direct method (B) is given in the norms of L^2((0, 5), R^7), L^∞((0, 5), R^7) and H^1_D((0, 5), R^7)

Table 2 Influence of parameter ω for the constraints in Example 2 using N = 20 and n = 20. The error of the solution with respect to the exact solution (A) and with respect to a discrete reference solution obtained by a direct method (B) is given in the same norms
The iterative solver using defect corrections may overcome the difficulties connected with a suitable choice of the parameter ω in the weighting method. According to Experiment 1, we would expect the optimal ω to be of the order of magnitude 10^{−3} . . . 10^{+2}, with an optimum around 10^{−2}. This is in contrast to the recommendations given in [3], where a choice of ω ≈ ε_mach^{−1/3} is recommended for the deferred correction algorithm. We test the performance of the deferred correction solver in the next experiment. Here, the tolerance in the convergence check is set to 10^{−15}. The iterations are considered not to converge if the convergence check has failed after two iterations.
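The idea of enhancing a weighted solution by defect correction can be sketched as follows (a simplified dense variant written by us; the actual procedure of [2,3] differs in its details and carries along the convergence estimate used for the stopping criterion). The constraint right-hand side is updated by the current constraint defect, so a fixed point of the loop satisfies the optimality conditions of (16)-(17) exactly, not merely in the penalty limit:

```python
import numpy as np

def deferred_correction(A, r, C, omega, maxit=5):
    """Defect-correction loop around the weighting method for
        min |A c - r|^2  subject to  C c = 0."""
    g = np.zeros(C.shape[0])     # running constraint right-hand side
    c = np.zeros(A.shape[1])
    A_aug = np.vstack([A, omega * C])
    for _ in range(maxit):
        rhs = np.concatenate([r, omega * g])
        c, *_ = np.linalg.lstsq(A_aug, rhs, rcond=None)
        g = g - C @ c            # feed the constraint defect back (target: C c = 0)
    return c

rng = np.random.default_rng(1)
A = rng.standard_normal((12, 8))
C = rng.standard_normal((3, 8))
r = rng.standard_normal(12)
c = deferred_correction(A, r, C, omega=100.0, maxit=5)
```

At a fixed point, g stops changing only when C c = 0, and the stationarity condition of the weighted problem then reduces to the exact KKT condition of the constrained problem, with ω^2 g playing the role of the Lagrange multiplier. This is why a moderate ω suffices here.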

Experiment 2
We check the performance of the deferred correction solver in dependence on the weighting parameter ω. Both Examples 2 and 3 are used. The results are presented in Tables 3, 4, 5 and 6. They indicate that a larger value for ω seems to be preferable.

Performance of the linear solvers
In this section, we intend to provide some insight into the behavior of the linear solvers. This concerns both the accuracy as well as the computational resources (computation time, memory consumption). All these data are highly implementation dependent. Also the hardware architecture plays an important role.
The linear solvers have been implemented using the standard strategy of subdividing them into a factorization step and a solve step. The price to pay is a larger memory consumption. However, their use in the context of, e.g., a modified Newton method may decrease the computation time considerably.
The tests have been run on a Linux laptop (Dell Latitude E5550). While the program itself is purely sequential, the MKL library may use shared-memory parallel versions of its BLAS and LAPACK routines. The CPU of the machine is an Intel(R) Core(TM) i7-5600U @ 2.60GHz providing two cores, each of them capable of hyperthreading. For the test runs, CPU throttling has been disabled such that all cores ran at roughly 3.2 GHz.
Experiment 3
The parameter for the weighting solver is ω = 1, while the corresponding parameter for the deferred correction solver is ω = ε_mach^{−1/3} (the choice recommended in [3]). The case characteristics are collected in Table 7. Due to the special properties of the Legendre polynomials, the matrix C representing the constraints is extremely sparse, featuring only three nonzero elements per row. The computational results are shown in Table 8.

Table 3 Influence of the parameter ω on the accuracy of the discrete solution for Example 2 using N = 5 and n = 160. The error of the solution with respect to the exact solution (A) and with respect to a discrete reference solution obtained by a direct method (B) is given in the norms of L^2((0, 5), R^7), L^∞((0, 5), R^7) and H^1_D((0, 5), R^7). 2 iterations are applied

Table 4 Influence of the parameter ω on the accuracy of the discrete solution for Example 2 using N = 20 and n = 20. Same error measures; 2 iterations are applied. For ω = 0.01 the iteration did not converge

Table 5 Influence of the parameter ω on the accuracy of the discrete solution for Example 3 using N = 20 and n = 5. The errors are given in the norms of L^2((0, 5), R^6), L^∞((0, 5), R^6) and H^1_D((0, 5), R^6). 2 iterations are applied

Table 6 Influence of the parameter ω on the accuracy of the discrete solution for Example 3 using N = 5 and n = 20

Table 7 Case characteristics for Experiment 3. The numbers of nonzero elements in the matrices A and C are provided as reported by the functions of the Eigen library. The columns denote: the number of rows of A (dimA), the number of rows of C (dimC), the number of unknowns (nun), the number of nonzero elements of C (nnzC), and the number of nonzero elements of A (nnzA) for the functionals Φ^R_π,M and Φ^C_π,M, respectively
In the next computations, the Chebyshev basis has been used, which leads to a slightly more densely occupied matrix C. The results are provided in Tables 9 and 10. The computing times are the average of 100 runs of each case; the error is measured in the norm of H^1_D((0, 5), R^7). The column headings denote: the upper bound on the number of nonzero elements of the QR factors as reported by SPQR (nWork), the time for the matrix assembly (tass), the time for the factorization (tfact), and the time for the solution (tslv), for both functionals Φ^R_π,M and Φ^C_π,M.

Table 9 Case characteristics for Experiment 3 using the Chebyshev basis. The numbers of nonzero elements in the matrices A and C are provided as reported by the functions of the Eigen library; columns as in Table 7

The previous example is an initial value problem. This structure may have consequences for the performance of the linear solvers. Therefore, in the next experiment, we consider a boundary value problem.

Experiment 4
We repeat Experiment 3 with Example 3. The problem characteristics and computational results are provided in Tables 11, 12, 13, and 14. It should be noted that the deferred correction solver returned normally (tolerance as before, 10^{−15}) after at most two iterations in all cases. However, in some cases the results are completely off. This happens, for example, in Tables 12 and 14, cases 1 and 2, for Φ^C_π,M.

Table 10 Computing times, permanent workspace needed, and error for the cases described in Table 9. The computing times are provided in milliseconds; they are the average of 100 runs of each case

Table 11 Case characteristics for Experiment 4. The numbers of nonzero elements in the matrices A and C are provided as reported by the functions of the Eigen library; the columns denote the number of rows of A (dimA), the number of rows of C (dimC), the number of unknowns (nun), and the numbers of nonzero elements of C (nnzC) and of A (nnzA) for the functionals Φ^R_π,M and Φ^C_π,M, respectively

Tables 12-14 Computing times (in milliseconds, averaged over 100 runs of each case) and errors in the norm of H^1_D((0, 1), R^6). The column headings denote: the upper bound on the number of nonzero elements of the QR factors as reported by SPQR (nWork), the time for the matrix assembly (tass), the time for the factorization (tfact), and the time for the solution (tslv) for both functionals Φ^R_π,M and Φ^C_π,M. Data in bold indicate results where the solver terminated normally but with a result being completely off
It should be noted that a considerable amount of memory for the QR factorizations is consumed by the internal representation of the Q-factor in SPQR. This can be avoided if the factorization and solution steps are interwoven.

Weighting of the boundary conditions

As already known for boundary value problems for ODEs and index-1 DAEs, a special problem is the scaling of the boundary conditions, and hence, here, the inclusion of the boundary conditions (2). Their scaling is independent of the scaling of the DAE (1). Therefore, it seems reasonable to provide an additional possibility for scaling the boundary conditions. We decided to enable this by introducing an additional parameter α to be chosen by the user. So Φ from (5) is replaced by the functional

Φ_α(x) = ∫_a^b |A(t)(Dx)'(t) + B(t)x(t) − q(t)|^2 dt + α^2 |G_a x(a) + G_b x(b) − d|^2.

Analogously, the discretized versions Φ^R_π,M, Φ^I_π,M and Φ^C_π,M are replaced by their counterparts Φ̃^R_π,M, Φ̃^I_π,M and Φ̃^C_π,M with weighted boundary conditions. The convergence theorems hold true for these modifications of the functional, too.
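In the discrete problem, this additional weight simply rescales those rows of A and r that stem from the boundary conditions. A minimal sketch (our own; we assume here that the boundary-condition rows are the trailing rows, which depends on the equation ordering actually used):

```python
import numpy as np

def apply_boundary_weight(A, r, n_bc, alpha):
    """Return copies of A and r with the trailing n_bc rows
    (assumed to hold the boundary conditions) scaled by alpha."""
    A2, r2 = A.copy(), r.copy()
    A2[-n_bc:, :] *= alpha
    r2[-n_bc:] *= alpha
    return A2, r2

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 6))
r = rng.standard_normal(10)
A2, r2 = apply_boundary_weight(A, r, n_bc=2, alpha=100.0)
```

Only the boundary rows are touched; the collocation rows, and hence the scaling of the DAE itself, remain unchanged.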

Experiment 5 Influence of α on the accuracy
We use the example and settings of Experiment 1. The results are provided in Table 15.

Experiment 6 Influence of α on the accuracy
We repeat the previous experiment with Example 3. The discretization parameters are (i) N = 5, n = 20 and (ii) N = 20, n = 5. All other settings correspond to those of Experiment 5. The results are presented in Table 16.
The results of Experiments 5 and 6 indicate that the final accuracy is rather insensitive to the choice of α. It should be noted that the coefficient matrices in Examples 2 and 3 are well-scaled.

Final remarks
In summary, we investigated questions related to an efficient and reliable realization of a least-squares collocation method. These questions are particularly important since a higher index DAE is an essentially ill-posed problem in naturally given spaces, which is why we must be prepared for highly sensitive discrete problems. In Part 1, in order to obtain an overall procedure that is as robust as possible, we provided criteria leading to a robust selection of the collocation points and of the basis functions, the latter also being beneficial for the shape of the resulting discrete problem. We refer to the corresponding Final remarks and conclusions in [11].
A critical ingredient for the implementation of the method is the algorithm used for the solution of the discrete linear least-squares problem. Given the expected bad conditioning of the least-squares problem, a QR factorization with column pivoting must lie at the heart of the algorithm. At the same time, the sparsity structure must be exploited as well as possible. In our tests, the direct solver seems to be the most robust one. With respect to efficiency and accuracy, the deferred correction solver is preferable; however, it failed in certain tests. The results for M = N + 1 are not much different from those obtained for a larger M, for which we do not yet have an explanation.
In conclusion, we note that earlier implementations, among others the one from the very first paper on this subject [13], which started from proven ingredients of ODE codes, are from today's point of view and experience rather crude versions of least-squares collocation. Nevertheless, the test results calculated with them were already very impressive. This strengthens our belief that a careful implementation of the method gives rise to a very efficient solver for higher index DAEs.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.