Towards a reliable implementation of least-squares collocation for higher-index differential-algebraic equations

In this note we discuss several questions concerning the implementation of overdetermined least-squares collocation methods for higher-index differential-algebraic equations (DAEs). Since higher-index DAEs lead to ill-posed problems in natural settings, their discrete counterparts are expected to be very sensitive, which attaches particular importance to their implementation. We provide a robust selection of basis functions and collocation points to design the discrete problem and substantiate a procedure for its numerical solution. Additionally, a number of new error estimates are proven that support some of the design decisions.


Introduction
An overdetermined least-squares collocation method for the solution of boundary-value problems for higher-index differential-algebraic equations (DAEs) has been introduced in [32] and further investigated in [31,30,29]. A couple of sufficient convergence conditions have been established. Numerical experiments indicate an excellent behavior. Moreover, it is particularly noteworthy that the computational effort is not much more expensive than for standard collocation methods applied to boundary-value problems for ordinary differential equations. However, the particular procedures are much more sensitive, which reflects the ill-posedness of higher-index DAEs. The question of a reliable implementation is almost completely open.
The method offers a number of parameters and options whose selection has not been backed up by any theoretical justification so far. The present paper is devoted to a first investigation of this topic. We focus on the choice of collocation nodes, the representation of the ansatz functions, as well as the shape and structure of the resulting discrete problem. We apply various theoretical arguments, among them also new sufficient convergence conditions in Theorems 1, 2, and 3, and report on corresponding systematic, comprehensive numerical experiments.
The paper is organized as follows: Section 2 contains the information concerning the problem to be solved as well as the basics of the overdetermined least-squares approach and, additionally, the new error estimates. Section 3 deals with the selection and calculation of collocation points and integration weights for the different functionals of interest, and Section 4 provides a robust selection of a basis for the ansatz space. The resulting discrete least-squares problem is treated in Section 5, where a number of experiments are reported. The more detailed structure of the discrete problem is described in the Appendix. We conclude with Section 6, which contains a summary and further comments.

Fundamentals of the problem and method
Consider a linear boundary-value problem for a DAE with properly involved derivative,

A(t)(Dx)'(t) + B(t)x(t) = q(t), t ∈ [a, b], (1)
G_a x(a) + G_b x(b) = d, (2)

with [a, b] ⊂ R being a compact interval, D = [I 0] ∈ R^{k×m}, k < m, with the identity matrix I ∈ R^{k×k}. Furthermore, A(t) ∈ R^{m×k}, B(t) ∈ R^{m×m}, and q(t) ∈ R^m are assumed to be sufficiently smooth with respect to t ∈ [a, b]. Moreover, G_a, G_b ∈ R^{l×m}. Thereby, l is the dynamical degree of freedom of the DAE, that is, the number of free parameters which can be fixed by initial and boundary conditions. Unlike regular ordinary differential equations (ODEs), where l = k = m, for DAEs it holds that 0 ≤ l ≤ k < m; in particular, l = k for index-one DAEs, l < k for higher-index DAEs, and l = 0 can certainly happen. Supposing accurately stated initial and boundary conditions, index-one DAEs yield well-posed problems in natural settings and can be treated numerically quite well, similarly to ODEs [38]. In contrast, in the present paper, we are mainly interested in higher-index DAEs, which lead to essentially ill-posed problems even if the boundary conditions are stated accurately [37,38,31]. The tractability index and projector-based analysis serve as the basis for our investigations. We refer to [37] for a detailed presentation and to [38,40,31] for corresponding short sketches.
We assume that the DAE is regular with arbitrarily high index µ ∈ N and the boundary conditions are stated accurately so that solutions of the problem (1)- (2) are unique. We also assume that a solution x * : [a, b] → R m actually exists and is sufficiently smooth.
For the construction of a regularization method to treat an essentially ill-posed problem, a Hilbert space setting of the problem is most convenient. For this reason, as in [32,31,30], we apply the spaces which are suitable for describing the underlying operators. In particular, let T : H^1_D → L^2 × R^l be given by

Tx = (A(Dx)' + Bx, G_a x(a) + G_b x(b)). (3)

Then the boundary-value problem can be described by Tx = (q, d)^T. For K > 0, let P_K denote the set of all polynomials of degree less than or equal to K. Next, we define a finite-dimensional subspace X_π ⊂ H^1_D of piecewise polynomial functions which should serve as ansatz space for the least-squares approximation: Let the partition π be given by

π : a = t_0 < t_1 < · · · < t_n = b, (4)

with the stepsizes h_j = t_j − t_{j−1}, h = max_{1≤j≤n} h_j, and h_min = min_{1≤j≤n} h_j. Let C_π([a, b], R^m) denote the space of piecewise continuous functions having breakpoints merely at the meshpoints of the partition π. Let N ≥ 1 be a fixed integer. Then, we define

X_π = {x ∈ C_π([a, b], R^m) : Dx ∈ C([a, b], R^k), x_κ|_{[t_{j−1},t_j)} ∈ P_N, κ = 1, . . . , k, x_κ|_{[t_{j−1},t_j)} ∈ P_{N−1}, κ = k + 1, . . . , m, j = 1, . . . , n}. (5)
The continuous version of the least-squares method reads: Find an x_π ∈ X_π which minimizes the functional

Φ(x) = ‖A(Dx)' + Bx − q‖²_{L²} + |G_a x(a) + G_b x(b) − d|². (6)

It is ensured by [31, Theorem 4.1] that, for all sufficiently fine partitions π with bounded ratios 1 ≤ h/h_min ≤ ρ, ρ being a global constant, there exists a unique solution x_π ∈ X_π and the inequality

‖x_π − x_*‖_{H^1_D} ≤ C h^{N−µ+1} (7)

is valid. The constant C ∈ R depends on the solution x_*, the degree N, and the index µ, but it is independent of h. If N > µ − 1, then (7) apparently indicates convergence x_π → x_* in H^1_D as h → 0. At this place it is important to mention that, so far, we are aware of only sufficient conditions for convergence, and the error estimates may not be sharp. Not only more practical questions of implementation are open, but also several questions about the theoretical background. We are optimistic that much better estimates are possible, since the numerical experiments have so far performed impressively better than theoretically expected.
The following theorem can be understood as a specification of [31, Theorem 4.1] by a more detailed description of the ingredients of the constant C; in particular, the role of N is now better clarified, which could well be important for the practical realization. In particular, it suggests that smooth problems could perhaps be solved better with large N and coarser partitions.
Theorem 1 Let the DAE (1) be regular with index µ ∈ N and let the boundary condition (2) be accurately stated. Let x * be a solution of the boundary value problem (1)- (2), and let A, B, q and also x * be sufficiently smooth.
Let N ≥ 1 and all partitions π be such that h/h_min ≤ ρ, with a global constant ρ. Then, for all such partitions with sufficiently small h, the estimate (7) is valid with a constant of the form C = C_data C_N, where C_data is independent of N and h, depending only on the data A, D, B, G_a, G_b, and C_N is a rather involved function of N. In particular, there is an integer K with N ≤ K ≤ 2(µ − 1) + N such that, for N → ∞, C_N does not grow faster than K^{2(µ−1)}. If A and B are constant, it holds that K = N. At this place it should be mentioned that the estimate is not necessarily sharp; moreover, it should be considered that C_N and C_* also depend on N.
Proof We apply the estimate from [31] for ‖x_π − x_*‖_{H^1_D}, in which the approximation error α_π and the instability threshold γ_π appear. Owing to [31, Theorem 4.1], there is a constant c_γ > 0 independent of π so that the instability threshold γ_π satisfies the inequality c_γ h_min^{µ−1} ≤ γ_π ≤ ‖T‖, for all partitions with sufficiently small h. This leads to a bound of the error in terms of α_π/γ_π. Choosing N interpolation points ρ_i with

0 ≤ ρ_1 < · · · < ρ_N ≤ 1, (9)

the approximation error can be estimated by straightforward but elaborate computations by constructing p_* ∈ X_π interpolating at these points. Turning to shifted Gauss-Legendre nodes, which minimize ‖ω‖_{L²(0,1)}, we obtain the stated bound. To verify this, we consider the polynomial ω̃ with zeros t_j = 2ρ_j − 1, j = 1, . . . , N, which is nothing else but the standard Legendre polynomial with leading coefficient one. Using the Rodrigues formula and other arguments from [28, Section 5.4], one obtains ‖ω̃‖_{L²(−1,1)}. Finally, shifting back to the interval (0, 1) leads to ‖ω‖_{L²(0,1)} = 2^{−(N+1/2)} ‖ω̃‖_{L²(−1,1)}. Next, a careful review of the proof of [31, Theorem 4.1(a)] results in the representation (in terms of [31]) of the constant. Most of its factors depend only on the problem data; in contrast, the term c*_{µ−1} depends additionally on N besides the problem data. Let K denote the degree of the auxiliary polynomial q_{µ−1} = A_{µ−1}(Dp)' + B_{µ−1} p, p ∈ X_π, in the proof of [31, Theorem 4.1]. Then we have N ≤ K ≤ N + 2(µ − 1) and, by [31, Lemma 4.2], c*_{µ−1} = 4^{µ−1} λ_K · · · λ_{K−µ+2}, where each λ_S > 0 is the maximal eigenvalue of a certain symmetric, positive semidefinite matrix of size (S + 1) × (S + 1) [32, Lemma 3.3].
Owing to [32, Corollary A.3] it holds that λ_S ≤ (4/π²)S⁴ + O(S²) for large S, and therefore we are done. ⊓⊔ Observe that, for smooth problems, any fixed sufficiently fine partition π, and N → ∞, the growth rate of the error ‖x_π − x_*‖_{H^1_D} is not greater than that of C_* K^{2(µ−1)} and, for constant matrix functions A and B, not greater than that of C_* N^{2(µ−1)}. Remember that C_* is a function of N.

Remark 1
The specific error estimation provided in [32] for the case of DAEs in Jordan chain form on equidistant grids may provide some further insight into the behavior of the instability threshold γ_π. It is shown that a lower bound of the form γ_π ≥ C̃_µ h^{µ−1} holds true for sufficiently small h, where C̃_µ is a moderate constant depending only on µ [32, Theorem 3.6]. This leads to the dominant error term indicating again that, for smooth problems, it seems reasonable to calculate with larger N and coarse partitions. Moreover, the corresponding bound for sufficiently small h [32, Remark 3.4] confirms once more the growth characteristic (13) for large N. ⊓⊔

The functional values Φ(x), which are needed when minimizing for x ∈ X_π, cannot be evaluated exactly, and the integral must be discretized accordingly. Taking into account that the boundary-value problem is ill-posed in the higher-index case µ > 1, perturbations of the functional may have a serious influence on the error of the approximate least-squares solution or even prevent convergence towards the solution x_*. Therefore, careful approximations of the integral in Φ are required. We discuss the following three options:

Φ^C_{π,M}(x) = Σ_{j=1}^n (h_j/M) Σ_{i=1}^M |w(t_{ji})|² + |G_a x(a) + G_b x(b) − d|², (14)
Φ^I_{π,M}(x) = Σ_{j=1}^n h_j Σ_{i=1}^M γ_i |w(t_{ji})|² + |G_a x(a) + G_b x(b) − d|², (15)
Φ^R_{π,M}(x) = Σ_{j=1}^n ∫_{t_{j−1}}^{t_j} |Σ_{i=1}^M w(t_{ji}) l_i((t − t_{j−1})/h_j)|² dt + |G_a x(a) + G_b x(b) − d|², (16)

in which w denotes the residual of the DAE (1) for x ∈ X_π, and in which from the DAE (1) and x ∈ X_π only data at the points

t_{ji} = t_{j−1} + τ_i h_j, 0 ≤ τ_1 < τ_2 < · · · < τ_M ≤ 1, (17)

are used. In the last functional Φ^R_{π,M}, Lagrange basis polynomials appear, i.e.,

l_i(τ) = Π_{s≠i} (τ − τ_s)/(τ_i − τ_s), i = 1, . . . , M. (18)
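To make the practical difference between the Riemann-type functional Φ^C_{π,M} and the quadrature-based Φ^I_{π,M} tangible, the following sketch compares both on a single subinterval [0, 1] for a scalar toy residual; the residual w(t) = t² and the 3-point Gauss-Legendre rule are illustrative assumptions, not data from the text:

```python
import math

# Toy residual on one subinterval [0, 1]; w(t)^2 has degree 4 = 2M - 2
# for M = 3 nodes, so a Gauss rule with positive weights integrates it
# exactly, while the uniformly weighted Riemann-type sum behind Phi^C does not.
def w(t):
    return t * t

M = 3
# 3-point Gauss-Legendre nodes and weights, shifted to (0, 1).
r = 0.5 * math.sqrt(0.6)
tau = [0.5 - r, 0.5, 0.5 + r]
gamma = [5.0 / 18.0, 8.0 / 18.0, 5.0 / 18.0]

phi_I = sum(g * w(t) ** 2 for g, t in zip(gamma, tau))   # quadrature value
phi_C = sum(w(t) ** 2 for t in tau) / M                  # Riemann-type value
exact = 1.0 / 5.0                                        # integral of t^4 on [0, 1]
```

Here phi_I reproduces the exact integral up to rounding, whereas phi_C carries a visible quadrature error on this single subinterval, which a finer partition reduces only at first order.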

Remark 2
The direct numerical implementation of Φ^R_{π,M}(x) with the Lagrangian basis functions includes the use of the mass matrix belonging to such functions. It is well known that this matrix may be very badly conditioned, thus leading to an amplification of rounding errors. In connection with the ill-posedness of higher-index DAEs, this may render the numerical solutions useless. The solution of the least-squares problem with Φ^I_{π,M} is much less expensive than that with Φ^R_{π,M}, and in turn, solving system (21)-(22) below for x ∈ X_π in a least-squares sense using the (diagonally weighted) Euclidean norm in R^{nMm+l} according to Φ^C_{π,M} is even less computationally expensive than using Φ^I_{π,M}(x). ⊓⊔

Introducing, for each x ∈ X_π, the residual w(t) = A(t)(Dx)'(t) + B(t)x(t) − q(t) and collecting its values at the points (17) into a vector W, we obtain new representations of these functionals as (diagonally weighted) Euclidean norms of W plus the boundary term. The representation (20) of Φ^R_{π,M}(x) can be established by straightforward evaluations following the lines of [32, Section 2.3], in which L^R = diag(L_R ⊗ I_m, . . . , L_R ⊗ I_m), where L_R is the mass matrix associated with the Lagrange basis functions l_i, i = 1, . . . , M, (18) for the node sequence (17); more precisely, L_R is symmetric and positive definite and, consequently, L^R is so. We emphasize that the matrices L^C, L^I, L^R depend only on M, the node sequence (17), and the quadrature weights, but do not depend on the partition π and h at all.
We always set M ≥ N + 1. Although the nodes (17) serve as interpolation points in the functional Φ^R_{π,M}, we still call them collocation nodes after [32]. It should be underlined here that minimizing each of the above functionals on X_π can be viewed as a special least-squares method to solve the overdetermined collocation system W = 0, G_a x(a) + G_b x(b) = d, with respect to x ∈ X_π, that is, in detail, the collocation system

A(t_{ji})(Dx)'(t_{ji}) + B(t_{ji})x(t_{ji}) = q(t_{ji}), i = 1, . . . , M, j = 1, . . . , n, (21)
G_a x(a) + G_b x(b) = d. (22)

The system (21)-(22) for x ∈ X_π becomes overdetermined since X_π has dimension mnN + k, whereas the system consists of nMm + l scalar equations. Remark 3 Based on collocation methods for index-1 DAEs, the first thought in [32,31] was to turn to the functional Φ^C_{π,M} with nodes 0 < τ_1 < · · · < τ_M < 1. However, the use of the special discretized norm in these papers for providing convergence results is in essence already the use of the functional Φ^R_{π,M}. For a general set of nodes (17), Φ^C_{π,M} represents a simple Riemann approximation of the corresponding integral, which has first order of accuracy only. If, however, the nodes are chosen as those of Chebyshev integration, the orders 1, . . . , 7 and 9 can be obtained for the corresponding number M of nodes [33, p. 349]. The marking with the upper index C indicates that Chebyshev integration formulas are conceivable. As developed in [28, Section 7.5.2], integration formulas with uniform weights, i.e., Chebyshev formulas, are those where random errors in the function values have the least effect on the quadrature result. This makes these formulas very interesting in our context. However, although a lot of test calculations run well, we are not aware of convergence statements going along with Φ^C_{π,M} so far. ⊓⊔

Remark 4
The functional Φ^R_{π,M} gets its upper index R from the restriction operator R_{π,M} introduced in [30] with nodes 0 < τ_1 < · · · < τ_M < 1. Note that [30, Theorem 2.3] generalizes convergence results from [32,31] to a large extent. Theorem 2 below even allows arbitrary nodes satisfying (17). ⊓⊔ Remark 5 The functional Φ^I_{π,M} has its upper index I simply from the words integration formula. We will see first convergence results going along with Φ^I_{π,M} in Theorem 2 below.
Intuitively, it seems reasonable to use a Gaussian quadrature rule for these purposes. However, it is not known whether such a rule is most robust against rounding errors and/or other choices of the overall process. ⊓⊔ Remark 6 Owing to the basic ansatz space X_π, our approximations are discontinuous, with possible jumps at the grid points in certain components. In this respect it does not matter which of our functionals is selected. Since we always have overdetermined systems (21)-(22), it can no longer be expected that all components of the approximation are continuous, even in the case τ_1 = 0, τ_M = 1. This is an important difference to the classical collocation methods for index-1 DAEs, which are based on classical uniquely solvable linear systems, e.g., [38].
Theorem 2 Let the DAE (1) be regular with index µ ∈ N and let the boundary condition (2) be accurately stated. Let x * be a solution of the boundary value problem (1)- (2), and let A, B, q and also x * be sufficiently smooth.
Let all partitions π be such that h/h_min ≤ ρ, with a global constant ρ. Then, with M ≥ N + µ, the following statements are true: (1) For sufficiently fine partitions π and each sequence of arbitrarily placed nodes (17), there exists exactly one x^R_π ∈ X_π minimizing the functional Φ^R_{π,M} on X_π, and x^R_π converges to x_* as in (7), with a possibly larger constant. (2) For each integration rule related to the interval [0, 1], with M nodes (17) and positive weights γ_1, . . . , γ_M, which is exact for polynomials of degree less than or equal to 2M − 2, and sufficiently fine partitions π, there exists exactly one x^I_π ∈ X_π minimizing the functional Φ^I_{π,M} on X_π, and x^I_π = x^R_π. Since Gauss-Legendre and Gauss-Radau integration rules are exact for polynomials up to degree 2M − 1 and 2M − 2, respectively, with positive weights, they are well suited here, but Gauss-Lobatto rules do not meet the requirement of Theorem 2(2).
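The exactness threshold separating the admissible rules can be checked directly. The following sketch, an illustration with the classical 3-point Gauss-Lobatto rule on [0, 1], confirms that Gauss-Lobatto is exact only up to degree 2M − 3 and therefore misses the degree 2M − 2 required in Theorem 2(2):

```python
# 3-point Gauss-Lobatto rule on [0, 1]: nodes 0, 1/2, 1 with weights 1/6, 4/6, 1/6.
nodes = [0.0, 0.5, 1.0]
weights = [1.0 / 6.0, 4.0 / 6.0, 1.0 / 6.0]

def quad(f):
    # Apply the Lobatto rule to f on [0, 1].
    return sum(w * f(t) for w, t in zip(weights, nodes))

# Exact for degree <= 2M - 3 = 3 ...
err_deg3 = abs(quad(lambda t: t ** 3) - 1.0 / 4.0)
# ... but not for degree 2M - 2 = 4, which Theorem 2(2) requires.
err_deg4 = abs(quad(lambda t: t ** 4) - 1.0 / 5.0)
```

The degree-3 moment is reproduced to machine accuracy, while the degree-4 moment is missed by a fixed fraction, so the rule cannot satisfy the hypothesis of Theorem 2(2).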
Proof (1): In [30], additionally supposing 0 < τ_1 < · · · < τ_M < 1, conditions are derived that ensure the existence and uniqueness of x^R_π minimizing Φ^R_{π,M} on X_π. It is shown that x^R_π has similar convergence properties as x_π minimizing Φ on X_π. Merely the constant C_R is slightly larger than C in (7). A further careful check of the proofs in [30] shows that the assertion also holds true for τ_1 = 0 and/or τ_M = 1, possibly with a larger constant C_R.
(2): For arbitrary x ∈ X_π, the integrand of Φ^R_{π,M} on each subinterval, θ_j(τ) = |Σ_{i=1}^M w(t_{ji}) l_i(τ)|², is a polynomial of degree less than or equal to 2M − 2; thus the integration rule evaluates its integral exactly. Therefore, it follows that Φ^I_{π,M}(x) = Φ^R_{π,M}(x) for all x ∈ X_π, and Φ^I_{π,M} coincides with the special functional Φ^R_{π,M} having the same nodes. Eventually, (2) is a particular case of (1). ⊓⊔ As already emphasized above, until now we are aware of only sufficient convergence conditions; this applies, in particular, to the size of M. So far, the applications often run well with M = N + 1, and no significant difference to calculations with a larger M was visible, e.g., [31, Section 6] and [32, Section 4]. Also the experiments in Section 4 below are carried out with M = N + 1. The following statement for A and B with polynomial entries allows choosing M independently of the index µ and confirms the choice M = N + 1 for constant A and B.
Theorem 3 Let the DAE (1) be regular with index µ ∈ N and let the boundary condition (2) be accurately stated. Let x_* be a solution of the boundary value problem (1)-(2), and let q and also x_* be sufficiently smooth. Let the entries of A and B be polynomials of degree less than or equal to N_AB. Let all partitions π be such that h/h_min ≤ ρ, with a global constant ρ. Then, with M sufficiently large depending on N and N_AB, but not on µ, the following statements are true: (1) For sufficiently fine partitions π and each sequence of arbitrarily placed nodes (17), there exists exactly one x^R_π ∈ X_π minimizing the functional Φ^R_{π,M} on X_π. (2) For each integration rule of interpolation type related to the interval [0, 1], with M nodes (17) and positive weights γ_1, . . . , γ_M, which is exact for polynomials of degree less than or equal to 2M − 2, and sufficiently fine partitions π, there exists exactly one x^I_π ∈ X_π minimizing the functional Φ^I_{π,M} on X_π, and x^I_π = x^R_π. (3) If A and B are even constant matrices, then, already for M ≥ N + 1, for sufficiently fine partitions π and each sequence of arbitrarily placed nodes (17), there exists exactly one x^R_π ∈ X_π minimizing the functional Φ^R_{π,M} on X_π.

Proof The argument for (3) uses the functional Φ^I_{π,M} associated to the corresponding Gauss-Legendre or Gauss-Radau rules; this follows from the fact that the 2-point Chebyshev integration nodes are just the Gauss-Legendre nodes. We underline that, by Theorem 3(3), the approximate solutions stay bounded also for DAEs with larger index µ; for instance, [32, Table 6] confirms that for an index-four Jordan DAE. ⊓⊔

Having in mind the implementation of such an overdetermined least-squares collocation, for a given partition π and a given polynomial degree N, a number of parameters and options must be selected:
- basis functions for X_π;
- number M of collocation points and their location 0 ≤ τ_1 < · · · < τ_M ≤ 1;
- setup and solution of the discrete least-squares problem.
Below we will discuss several issues in this context. The main aim is implementations that are as stable as possible, not necessarily best computational efficiency. The functional Φ^R_{π,M} (16) is based on polynomial interpolation using M nodes (17). It seems reasonable to choose these nodes in such a way that, separately on each subinterval [t_{j−1}, t_j] of the partition, the interpolation error is as small as possible in a certain sense. Without loss of generality, we can restrict the matter to the interval [0, 1].

Collocation nodes for Φ^R_{π,M}
Consider functions q ∈ C([0, 1], R^m) and define the interpolation operator R_M : C([0, 1], R^m) → P^m_{M−1},

(R_M q)(τ) = Σ_{i=1}^M q(τ_i) l_i(τ),

with the Lagrange basis functions (18), where P^m_{M−1} is the subspace of all functions whose components are polynomials up to degree M − 1. Introducing ω(τ) = (τ − τ_1)(τ − τ_2) · · · (τ − τ_M) and using componentwise the divided differences q[τ_1, . . . , τ_M, τ], we have the error representation, e.g., [28, Chapter 5],

q(τ) − (R_M q)(τ) = ω(τ) q[τ_1, . . . , τ_M, τ].

For the evaluation of Φ^R_{π,M} (16), it seems reasonable to choose the collocation nodes in such a way that this expression is minimized for all functions q ∈ C^{(M)}([0, 1], R^m). The optimal set of nodes is determined by the condition ‖ω‖_{L²(0,1)} → min. It is well known that this functional is minimized if the collocation nodes are chosen to be the Gauss-Legendre nodes [28, Chapters 7.5.1 and 4.5.4].
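The minimization property of the Gauss-Legendre nodes can be checked numerically. The sketch below, an illustration only, expands ω into monomial coefficients and integrates ω² exactly, comparing shifted Gauss-Legendre nodes with equidistant interior nodes for M = 3:

```python
import math

def poly_from_roots(roots):
    # Ascending monomial coefficients of prod (tau - r) over the given roots.
    c = [1.0]
    for r in roots:
        new = [0.0] * (len(c) + 1)
        for k, a in enumerate(c):
            new[k] += -r * a      # constant term of (tau - r)
            new[k + 1] += a       # tau term of (tau - r)
        c = new
    return c

def l2_norm_sq(c):
    # Exact integral over [0, 1] of the square of the polynomial with coeffs c.
    n = len(c)
    sq = [0.0] * (2 * n - 1)
    for i in range(n):
        for j in range(n):
            sq[i + j] += c[i] * c[j]
    return sum(a / (k + 1) for k, a in enumerate(sq))

r = 0.5 * math.sqrt(0.6)
gl_nodes = [0.5 - r, 0.5, 0.5 + r]   # shifted Gauss-Legendre nodes, M = 3
eq_nodes = [0.25, 0.5, 0.75]         # equidistant interior nodes

gl = l2_norm_sq(poly_from_roots(gl_nodes))   # analytically 1/2800
eq = l2_norm_sq(poly_from_roots(eq_nodes))
```

The Gauss-Legendre value equals 1/2800, in agreement with the shift relation ‖ω‖_{L²(0,1)} = 2^{−(N+1/2)}‖ω̃‖_{L²(−1,1)} used in the proof of Theorem 1 (with M nodes in place of N), and it is strictly smaller than the equidistant value.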
On the other hand, the best polynomial approximation to a given function q in the L²-norm is obtained if the Fourier approximation with respect to the Legendre polynomials is computed. However, to the best of our knowledge, no estimates of the interpolation error in L²((0, 1), R^m) are known. 4 However, in the uniform norm and with arbitrary node sequences, for each q ∈ C([0, 1], R^m), the estimate

‖q − R_M q‖_∞ ≤ (1 + Λ_M) inf_{p ∈ P^m_{M−1}} ‖q − p‖_∞

holds, where Λ_M denotes the Lebesgue constant of the node sequence. Table 1 shows some values for the Lebesgue constants. Note that the Lebesgue constants Λ^U_M for equidistant nodes grow exponentially, see e.g. [48]. 5

4 The interpolation operator R_M is bounded in the uniform norm, and the Lebesgue constant is a bound of the operator norm.
5 There is a set of interpolation nodes τ*_i which minimizes the corresponding Lebesgue constant Λ*_M. This constant is only slightly smaller than Λ^C_M [34].

Remark 8 Computation of nodes and weights for Gauss-type integration formulae
In the following we will make heavy use of Gauss-Legendre, Gauss-Radau, and Gauss-Lobatto integration nodes and their corresponding weights. Since we do not have them available in tabular form for large M with sufficient accuracy, they will be computed on the fly. A severe concern is the accuracy of the nodes and weights. In the case of Gauss-Legendre integration rules, the computed nodes and weights have been provided by the GNU Scientific Library routine glfixed.c [18]. For computing the Gauss-Lobatto nodes and weights, the methods of [41] (using the Newton method) as well as [21] (a variant of the method in [26]) have been implemented. Table 2 contains some comparisons to the tabulated values in [41] that have 20 digits. The method of [41] provides slightly more accurate values than that of [21]. Therefore, the former has been used further on.

Table 2 Accuracy of the computed nodes and weights of the Gauss-Lobatto integration rules. For each method, the absolute error of the nodes (A), the absolute error of the weights (B), and the maximum componentwise relative error of nodes (C) and weights (D) are shown, together with the machine accuracy (machine epsilon).
We did not find sufficiently accurate tabulated values for the Gauss-Radau nodes and weights. Therefore, the method of [20] has been implemented. We assume that the results obtained have an accuracy similar to the values for the Gauss-Lobatto nodes and weights using the method in [21]. ⊓ ⊔
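A minimal sketch of such an on-the-fly computation: the classical Newton iteration on the Legendre three-term recurrence (the textbook approach, not the specific routines of [41], [21], or [20]) yields Gauss-Legendre nodes and weights shifted to [0, 1]:

```python
import math

def gauss_legendre_01(M, tol=1e-15):
    # M-point Gauss-Legendre nodes and weights, shifted from [-1, 1] to [0, 1].
    nodes, weights = [], []
    for i in range(1, M + 1):
        x = math.cos(math.pi * (i - 0.25) / (M + 0.5))   # classical initial guess
        for _ in range(100):
            # Evaluate P_M(x) and P_{M-1}(x) by the three-term recurrence.
            p0, p1 = 1.0, x
            for k in range(2, M + 1):
                p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
            dp = M * (x * p1 - p0) / (x * x - 1.0)       # P_M'(x)
            dx = p1 / dp
            x -= dx                                      # Newton step towards a root
            if abs(dx) < tol:
                break
        w = 2.0 / ((1.0 - x * x) * dp * dp)              # weight on [-1, 1]
        nodes.append(0.5 * (x + 1.0))
        weights.append(0.5 * w)
    return nodes, weights

tau, gamma = gauss_legendre_01(5)
```

In double precision this reproduces the sum of the weights and low-order moments to machine accuracy; the Gauss-Radau and Gauss-Lobatto variants require analogous iterations on suitably modified polynomials.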

The mass matrix
In the following, we will make extensive use of Legendre polynomials. For the reader's convenience, the necessary properties are collected in Appendix A.1.
Let us turn to Φ^R_{π,M} (16) again. A critical ingredient for determining its properties is the mass matrix L_R in (20). Denote as before by l_i(τ), i = 1, . . . , M, the Lagrange basis functions for the node sequence (17), that is, cf. (18), l_i(τ_s) = δ_is. For evaluating L_R, we will use the normalized shifted Legendre polynomials P̂_ν = (2ν + 1)^{1/2} P̃_ν (cf. Appendix A.1). Assume the representation

l_i(τ) = Σ_{ν=1}^M α_iν P̂_{ν−1}(τ), i = 1, . . . , M. (23)

A short calculation, exploiting the orthonormality of the P̂_ν, shows (L_R)_{is} = ∫_0^1 l_i(τ) l_s(τ) dτ = Σ_{ν=1}^M α_iν α_sν. Letting

Ṽ = (P̂_{ν−1}(τ_i))_{i,ν=1,...,M}, (24)

and collecting the coefficients of l_i into the vector a_i = (α_i1, . . . , α_iM)^T, the definition of the coefficients α_iν provides us with Ṽ a_i = e_i, where e_i denotes the i-th unit vector. Hence, with A denoting the matrix with columns a_1, . . . , a_M, this gives A = Ṽ^{−1}.
V = Ṽ^T is a so-called Vandermonde-like matrix [19]. It is nonsingular under the condition (17) [45, Theorem 3.6.11]. In [19], representations and estimates of the condition number with respect to the Frobenius norm of such matrices are derived. In particular, [19, Table 1] shows impressively small condition numbers if the collocation nodes are chosen to be the zeros of P̂_M, that is, the Gauss-Legendre nodes. Moreover, this condition number is optimal among all scalings of the Legendre polynomials [19]. A consequence of the Christoffel-Darboux formula is that the rows of Ṽ are orthogonal for Gauss-Legendre nodes. 6 Thus, we have the representation Ṽ = DU with an orthogonal matrix U and a diagonal matrix D with positive diagonal entries. 7 It is known that the Gauss-Legendre nodes are not the very best set of nodes. However, a comparison of Tables 1 and 2 in [19] as well as [22, Table 4] indicates that the gain of choosing optimal nodes for Legendre polynomials compared to the choice of Gauss-Legendre nodes is rather minor.
In Table 3 we provide condition numbers of Ṽ with respect to the Euclidean norm for different choices of nodes. Note that the condition number of L_R is the square of that of Ṽ.
The condition numbers for all Gauss-type and Chebyshev nodes are remarkably small.

6 The Christoffel-Darboux formula for Legendre polynomials reads: If i ≠ κ, then

Σ_{ν=0}^{M−1} P̂_ν(τ_i) P̂_ν(τ_κ) = (µ_{M−1}/µ_M) (P̂_M(τ_i) P̂_{M−1}(τ_κ) − P̂_{M−1}(τ_i) P̂_M(τ_κ))/(τ_i − τ_κ),

where µ_M and µ_{M−1} are the leading coefficients of P̂_M and P̂_{M−1}, respectively. For the Gauss-Legendre nodes, it holds P̂_M(τ_i) = 0. Hence, the right-hand side vanishes.
7 The diagonal elements d_i, i = 1, . . . , M, of D can be evaluated analytically using the Christoffel-Darboux formula again.

In order to apply Φ^I_{π,M} (15), a numerical quadrature formula is necessary. For standard node sequences (Gauss-Legendre, Gauss-Lobatto, Gauss-Radau), their computation has been described above. However, for general node sequences, the weights must be evaluated. This can be done following the derivations in [45, p. 175]: Let P̂_ν(τ) denote the normalized shifted Legendre polynomials as before (cf. Appendix A.1). In particular, it holds then

∫_0^1 P̂_0(τ) dτ = 1, ∫_0^1 P̂_ν(τ) dτ = 0, ν ≥ 1.

For a given function q ∈ C[0, 1], the integral is approximated by the integral of its polynomial interpolation. Using the representation (23) of the Lagrange basis functions we obtain

∫_0^1 (R_M q)(τ) dτ = Σ_{i=1}^M q(τ_i) ∫_0^1 l_i(τ) dτ = Σ_{i=1}^M q(τ_i) α_i1.

Consequently, for the weights it holds γ_i = α_i1, i = 1, . . . , M. The definition (23) shows that the vector γ = (γ_1, . . . , γ_M)^T of weights fulfills the linear system

V γ = e_1,

where V = Ṽ^T with Ṽ from (24) and e_1 = (1, 0, . . . , 0)^T is the first unit vector.
The discussion of the condition number of V shows that we can expect reliable and accurate results at least for reasonable node sequences.
For general node sequences, the weights may become negative. This happens, for example, for uniformly distributed nodes and M > 7 (Newton-Cotes formulae) [45, p. 148]. So for Φ^I_{π,M}, only node sequences leading to positive quadrature weights γ_i are admitted, in order to ensure that L^I is positive definite.
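The weight computation via V γ = e_1 can be sketched as follows (the recurrence-based Legendre evaluation and the plain Gaussian elimination are illustrative choices): for M = 3 equidistant nodes it reproduces the Simpson weights, and for M = 9 equidistant nodes it exhibits the negative Newton-Cotes weights mentioned above.

```python
import math

def p_hat(nu, tau):
    # Normalized shifted Legendre polynomial: sqrt(2 nu + 1) * P_nu(2 tau - 1).
    x = 2.0 * tau - 1.0
    if nu == 0:
        return 1.0
    p0, p1 = 1.0, x
    for k in range(2, nu + 1):
        p0, p1 = p1, ((2 * k - 1) * x * p1 - (k - 1) * p0) / k
    return math.sqrt(2 * nu + 1) * p1

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(b)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def interpolatory_weights(nodes):
    # Solve sum_i gamma_i * P^_{nu}(tau_i) = delta_{nu,0}, nu = 0..M-1.
    M = len(nodes)
    a = [[p_hat(nu, t) for t in nodes] for nu in range(M)]
    e1 = [1.0] + [0.0] * (M - 1)
    return solve(a, e1)

simpson = interpolatory_weights([0.0, 0.5, 1.0])        # expect 1/6, 4/6, 1/6
nc9 = interpolatory_weights([i / 8 for i in range(9)])  # 9-point Newton-Cotes
```

Note that the 9-point weights still sum to one (the rule integrates constants exactly) although some of them are negative, which is exactly what disqualifies such node sequences for Φ^I_{π,M}.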

Choice of basis functions for the ansatz space X π
The ansatz space X_π (5) consists of piecewise polynomials having degree N − 1 for the algebraic components and degree N for the differentiated ones on each subinterval of the partition π (4). For collocation methods for boundary value problems for ordinary differential equations, this question has led to the choice of a Runge-Kutta basis for stability reasons, see [2]. This has later on also been used successfully for boundary value problems for index-1 DAEs [3,35,38,36]. However, this ansatz makes heavy use of the collocation nodes, which are at the same time used as the nodes for the Runge-Kutta basis. In our case, the number M of collocation nodes and the degree N of the polynomials for the differentiated components do not coincide since M > N, such that the reasoning applied in the case of ordinary differential equations does not transfer to the least-squares case.
Taking into account the computational expense for solving the discretized system, bases with local support are preferable. Ideally, the support of each basis function consists of only one subinterval of (4). 8 Note that the Runge-Kutta basis has this property. We consider the Runge-Kutta basis and further local bases built from orthogonal polynomials. A drawback of this strategy is the fact that the continuity of the piecewise polynomials approximating the differentiated components must be ensured explicitly. This in turn will lead to a discrete least-squares problem with equality constraints. Details can be found in Appendix A.3.
Looking for a local basis we turn to the reference interval [0, 1]. Once a basis on this reference interval is available it can be defined on any subinterval (t j−1 ,t j ) by a simple linear transformation.
Assume that {p_0, . . . , p_{N−1}} is a basis of the set of polynomials of degree less than N defined on the reference interval [0, 1]. Then, a basis {p̄_0, . . . , p̄_N} for the ansatz functions for the differentiated components is given by

p̄_0(ρ) = 1, p̄_i(ρ) = ∫_0^ρ p_{i−1}(σ) dσ, i = 1, . . . , N, (25)

and the transformation to the interval (t_{j−1}, t_j) of the partition π (4) yields the local basis functions. Additionally to this transformation, the continuity of the piecewise polynomials must be ensured. This gives rise to the additional conditions (27), which must be imposed explicitly. 9

The Runge-Kutta basis
In order to define the Runge-Kutta basis, let the N interpolation points ρ_i with (9) be given. Then, the Lagrange basis functions to these points are chosen, p_i(ρ) = Π_{s≠i} (ρ − ρ_s)/(ρ_i − ρ_s). Remark 9 Note that the interpolation nodes are only used to define the local basis functions. Thus, their selection is completely independent of the choice of collocation nodes. In view of the estimates (10) and (11) and the argumentation there, we prefer Gauss-Legendre interpolation nodes. This choice is also supported by Experiments 2 and 5 below.
The numerical computation of p̄_i is more involved. If not precalculated, the integrals must be available in a closed formula. This can surely be done by expressing the Lagrange basis functions in the monomial representation such that the integration can be carried out analytically. Once these coefficients are known, the evaluation of the values of the basis functions at a given ρ ∈ [0, 1] is easily done using the Horner method. However, this approach amounts to the inversion of the Vandermonde matrix for the nodes (9). This matrix is known to be extremely ill-conditioned; in particular, its condition number grows exponentially with N [23,7]. Therefore, an orthogonal basis might be better suited. This leads to a representation

p_i(ρ) = Σ_{s=1}^N β_is Q_s(ρ)

for some polynomials Q_1, . . . , Q_N. If these polynomials fulfil a three-term recursion, 10 the evaluation of function values can be performed using the Clenshaw algorithm [17], which is only slightly more expensive than the Horner method. In order to use this approach, the integrals of p_0, . . . , p_{N−1} must be easily representable in terms of the chosen basis. Here, the Legendre and Chebyshev polynomials are well suited (cf. Appendix A.1 and (32) as well as Appendix A.2 and (34)).
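A minimal sketch of the Clenshaw algorithm for a Legendre expansion (the coefficient vector and evaluation point are arbitrary illustrations): with the recurrence (k+1)P_{k+1}(x) = (2k+1)xP_k(x) − kP_{k−1}(x), the downward recursion returns b_0 directly, since P_1(x) = α_0(x)P_0(x):

```python
def clenshaw_legendre(c, x):
    # Evaluate sum_{k=0}^{n} c[k] * P_k(x) for Legendre polynomials P_k,
    # using b_k = c_k + alpha_k(x) * b_{k+1} + beta_{k+1} * b_{k+2}.
    b1 = b2 = 0.0
    for k in range(len(c) - 1, -1, -1):
        alpha = (2 * k + 1) / (k + 1) * x     # P_{k+1} = alpha * P_k + beta * P_{k-1}
        beta_next = -(k + 1) / (k + 2)        # beta_{k+1}
        b1, b2 = c[k] + alpha * b1 + beta_next * b2, b1
    return b1                                 # equals b_0

def direct_legendre(c, x):
    # Reference evaluation via the three-term recurrence, term by term.
    total, p0, p1 = 0.0, 1.0, x
    for k, ck in enumerate(c):
        pk = p0 if k == 0 else p1
        total += ck * pk
        if k >= 1:
            p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return total

val = clenshaw_legendre([1.0, 2.0, 3.0, 4.0], 0.3)
ref = direct_legendre([1.0, 2.0, 3.0, 4.0], 0.3)
```

Both evaluations agree to machine accuracy; the Clenshaw form needs only the expansion coefficients and the recursion parameters, which is why it pairs naturally with any three-term-recurrence basis.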

Orthogonal polynomials
A reasonable choice for the basis are orthogonal polynomials. We will consider Legendre polynomials first. A motivation is provided in the following example. Let {P̂_0, . . . , P̂_{N−1}} be the normalized shifted Legendre polynomials. Then, letting x = Σ_{i=1}^N α_i P̂_{i−1} for some vector α = (α_1, . . . , α_N)^T, the least-squares functional takes a particularly simple form thanks to the orthonormality of the basis. Similar relations hold for the differential equation x' = f if the basis functions for the differentiated components are constructed according to (25). Hence, these basis functions seem to qualify well for index-1 DAEs. ⊓⊔ The necessary ingredients for the efficient implementation of the Legendre polynomials are collected in Appendix A.1.
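The orthonormality underlying this motivation can be checked in exact coefficient arithmetic; the sketch below (an independent illustration of Appendix A.1-type relations, not code from the paper) builds the shifted Legendre polynomials by recurrence and verifies that the Gram matrix of the normalized polynomials is the identity:

```python
import math

def shifted_legendre_coeffs(n):
    # Ascending monomial coefficients of P~_k(tau) = P_k(2 tau - 1), k = 0..n.
    polys = [[1.0], [-1.0, 2.0]]
    for k in range(1, n):
        p, q = polys[k], polys[k - 1]
        # (k+1) P~_{k+1} = (2k+1)(2 tau - 1) P~_k - k P~_{k-1}
        t = [0.0] * (len(p) + 1)
        for i, a in enumerate(p):
            t[i + 1] += 2.0 * a    # 2 tau * P~_k
            t[i] -= a              # -1 * P~_k
        new = [((2 * k + 1) * t[i] - (k * q[i] if i < len(q) else 0.0)) / (k + 1)
               for i in range(len(t))]
        polys.append(new)
    return polys[: n + 1]

def inner(c, d):
    # Exact integral over [0, 1] of the product of two coefficient polynomials.
    return sum(a * b / (i + j + 1) for i, a in enumerate(c) for j, b in enumerate(d))

polys = shifted_legendre_coeffs(3)
gram = [[math.sqrt((2 * i + 1) * (2 * j + 1)) * inner(polys[i], polys[j])
         for j in range(4)] for i in range(4)]
```

With this normalization the Gram matrix is the identity up to rounding, so the L²-part of the least-squares functional acts on the coefficient vector like a plain Euclidean norm.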
Another common choice are Chebyshev polynomials of the first kind. They have been used extensively in the context of spectral methods because of their excellent approximation properties, cf. [15,47], see also [14]. The relations used for their implementation can be found in Appendix A.2. 11

Comparison of different basis representations
The choice of the basis function representation is dominated by the question of obtaining a most robust implementation. The computational complexity of the representations presented above does not differ much, such that this aspect plays a minor role.
The check for robustness can be subdivided into two questions:
1. Which representation is most robust locally?
2. Which representation is most robust globally?
In the following experiments, N will be varied. The functional used is Φ^R_{π,M}. The number of collocation nodes is M = N + 1. Table 3 motivates the choice of the Gauss-Legendre nodes as collocation nodes. In order to compute the norms in L^2((0,1),R^m) and H^1_D((0,1),R^m), Gaussian quadrature with N + 2 integration nodes on each subinterval of π is used.
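The composite quadrature used for the error norms can be sketched as follows (a hypothetical helper of our own; the H^1_D norm would additionally require the derivatives of the differentiated components):

```python
import numpy as np

def l2_norm_piecewise(f, grid, nquad):
    """|| f ||_{L^2} on [grid[0], grid[-1]], computed by Gauss-Legendre
    quadrature with nquad nodes on each subinterval of the mesh 'grid'."""
    x, w = np.polynomial.legendre.leggauss(nquad)
    acc = 0.0
    for a, b in zip(grid[:-1], grid[1:]):
        # map reference nodes from [-1, 1] to [a, b]
        t = (b - a) / 2 * x + (a + b) / 2
        acc += (b - a) / 2 * np.sum(w * f(t) ** 2)
    return np.sqrt(acc)
```

For instance, `l2_norm_piecewise(np.sin, np.linspace(0, np.pi, 5), 6)` reproduces ∥sin∥_{L²(0,π)} = √(π/2) to near machine accuracy.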

Local behavior of the basis representations
In order to answer the first question, it is reasonable to experiment first with a higher-index example that does not have any dynamic components (that is, l = 0) on a grid π consisting of only one subinterval (that is, n = 1). In that case, we check the ability to interpolate functions and to differentiate them numerically.

(Let us note in passing that the first routine for solving two-point boundary value problems in the NAG library (NAG® is a registered trademark of The Numerical Algorithms Group) besides shooting methods was just a least-squares collocation method corresponding to n = 1 and using a version of the functional Φ^C_{π,M}. The NAG routine D02AFF and its predecessor D02TGF (and its driver D02JBF) use Chebyshev polynomials as basis functions and Gauss-Legendre nodes as collocation points [24,1]. This routine appeared as early as 1970 in Mark 8 of the library and has survived to date (as of Mark 27 of 2019) [46].)
For n = 1, there are no continuity conditions (27) involved. Therefore, the discrete problem becomes a linear least-squares problem. We will solve it by a Householder QR-factorization with column pivoting as implemented in the Eigen library.
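A dense analogue of this solution step might look as follows, with SciPy's pivoted Householder QR standing in for Eigen's ColPivHouseholderQR; the function name is ours:

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def lstsq_colpiv(A, b):
    """Solve min ||A c - b||_2 via Householder QR with column pivoting,
    assuming A has full column rank."""
    # A[:, perm] = Q R with R upper triangular, perm a column permutation
    Q, R, perm = qr(A, mode="economic", pivoting=True)
    c = np.empty(A.shape[1])
    c[perm] = solve_triangular(R, Q.T @ b)
    return c
```

Column pivoting matters here because the discrete systems arising from higher-index DAEs are expected to be badly conditioned.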
The following example is used in [32,31].
Example 2 This is an index-3 example with dynamical degree of freedom l = 0, so that no additional boundary or initial conditions are necessary for unique solvability. We choose the exact solution and adapt the right-hand side q accordingly. All norms of the exact solution x_* used below are approximately 1. ⊓⊔

Experiment 1 Robustness of the representation of the Runge-Kutta basis
In a first experiment we intend to clarify the differences between the representations of the Runge-Kutta basis. The interpolation nodes (9) have been fixed to be the Gauss-Legendre nodes (cf. (10)). The Runge-Kutta basis has been represented with respect to the monomial, Legendre, and Chebyshev bases. The results are shown in Figure 1 (see appendix). This test indicates that the monomial basis is much less robust than the others for N > 10, while the other representations behave very similarly. ⊓⊔

Experiment 2 Robustness of the Runge-Kutta basis with respect to the node sequence

In this experiment we are interested in the influence of the interpolation nodes. For that, we compared the uniform node sequence to the Gauss-Legendre and Chebyshev nodes. The uniform nodes are given by ρ_i = (i − 1/2)/N. In accordance with the results of the previous experiment, the representation of the Runge-Kutta basis in Legendre polynomials has been chosen. The results are shown in Figure 2. Not unexpectedly, uniform nodes are inferior to the other choices, at least for N > 13. On the other hand, there is no significant difference between Gauss-Legendre and Chebyshev nodes. ⊓⊔
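The three node sequences compared here can be generated as follows (a small sketch of our own; all sequences live on the reference interval [0, 1]):

```python
import numpy as np

def nodes(N, kind):
    """Interpolation node sequences on [0, 1]:
    'uniform'  : rho_i = (i - 1/2)/N
    'gauss'    : Gauss-Legendre nodes mapped from [-1, 1]
    'chebyshev': roots of the Chebyshev polynomial T_N mapped to [0, 1]"""
    if kind == "uniform":
        return (np.arange(1, N + 1) - 0.5) / N
    if kind == "gauss":
        x, _ = np.polynomial.legendre.leggauss(N)
        return (x + 1) / 2
    if kind == "chebyshev":
        i = np.arange(1, N + 1)
        return (1 - np.cos((2 * i - 1) * np.pi / (2 * N))) / 2
    raise ValueError(kind)
```

All three sequences are symmetric about 1/2 and contain no endpoint; the Gauss-Legendre and Chebyshev nodes cluster towards the interval ends, which is what keeps the interpolation stable for larger N.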

Experiment 3 Robustness of different polynomial representations
In this experiment we compare the robustness of different bases: the Runge-Kutta basis with Gauss-Legendre interpolation nodes, the Legendre polynomials, and the Chebyshev polynomials. The results are shown in Figure 3. All representations show similar behavior. ⊓⊔

A general note is in order. The exact solution has norm approximately 1 in all norms used. The machine accuracy is ε_mach ≈ 2.22 × 10^{−16} in all computations. The best accuracy obtained is 10^{−12}–10^{−14}. Considering that a twofold differentiation is involved in this example, we would expect a much lower accuracy. This surprisingly good behavior has also been observed in other experiments.
The next example is an index-3 one with l = 4 dynamical degrees of freedom. It is the linearized version of an example presented in [10] that has also been considered in [31].

Example 3 Consider the DAE
subject to the initial conditions. This problem has tractability index 3 and dynamical degree of freedom l = 4. The right-hand side q has been chosen in such a way that the exact solution contains, among others, the components x_{*,3} = 2 cos² t and x_{*,6} = −2 sin 2t. For the exact solution, it holds ∥x_*∥_{L^2((0,5),R^7)} ≈ 5.2, ∥x_*∥_{L^∞((0,5),R^7)} = 2, and ∥x_*∥_{H^1_D((0,5),R^7)} ≈ 9.4. ⊓⊔

The following experiments with Example 3 are carried out under the same conditions as before for Example 2.

Experiment 4 Robustness of the representation of the Runge-Kutta basis
In this experiment we intend to clarify the differences between the representations of the Runge-Kutta basis. The interpolation points have been fixed to be the Gauss-Legendre nodes. The Runge-Kutta basis has been represented with respect to the monomial, Legendre, and Chebyshev bases. The results are shown in Figure 4. This test indicates that the monomial basis is much less robust than the others for N > 15, while the other representations behave very similarly. ⊓⊔

Experiment 5 Robustness of the Runge-Kutta basis with respect to the node sequence

In this experiment we are interested in the influence of the interpolation nodes. For that, we compared the uniform node sequence to the Gauss-Legendre and Chebyshev nodes. The uniform nodes are given by ρ_i = (i − 1/2)/N. In accordance with the results of the previous experiment, the representation of the Runge-Kutta basis in Legendre polynomials has been chosen. The results are shown in Figure 5. Not unexpectedly, uniform nodes are inferior to the other choices, at least for N > 20. However, there is no real difference between Gauss-Legendre and Chebyshev nodes. ⊓⊔

Experiment 6 Robustness of different polynomial representations
In this experiment we compare the robustness of different bases: the Runge-Kutta basis with Gauss-Legendre interpolation nodes, the Legendre polynomials, and the Chebyshev polynomials. The results are shown in Figure 6. All representations show similar behavior. ⊓⊔
In conclusion, the results of Experiments 1-3 and 4-6 are largely consistent.

Global behavior of the basis representations
We are interested in understanding the global error, which corresponds to error propagation in the case of initial value problems. In order to understand the error propagation properties, we investigate the accuracy of the computed solution for an increasing number of subintervals n. This motivates the use of a rather low polynomial degree N. In the previous section we observed that, for low degrees N of the ansatz polynomials, there is no difference in the local properties between the basis representations.
In the following experiments, the functionals used are Φ^R_{π,M} and Φ^C_{π,M}. The number of collocation nodes is again M = N + 1. The basis functions are the shifted Legendre polynomials.
The discrete problem for n > 1 is an equality-constrained linear least-squares problem. The equality constraints consist only of the continuity requirements for the differentiated components of the elements in X_π. The problem is solved by a direct solution method as described in Section 5. In short, the equality constraints are eliminated by a sparse QR-decomposition with column pivoting as implemented in the code SPQR [12]. The resulting least-squares problem has then been solved by the same code.
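The elimination of the equality constraints can be sketched in dense form; here SciPy stands in for the sparse SPQR code, and the function name is ours:

```python
import numpy as np
from scipy.linalg import qr

def lse_nullspace(A, r, C):
    """Equality-constrained least squares  min ||A c - r||  s.t.  C c = 0,
    solved by direct elimination: a full QR of C^T yields an orthonormal
    basis Ctilde of ker C (assuming C has full row rank), reducing the
    problem to an unconstrained one in z with c = Ctilde z."""
    p, n = C.shape
    Q, _ = qr(C.T)          # full QR of C^T; Q is n x n orthogonal
    Ctilde = Q[:, p:]       # last n - p columns span ker C
    z, *_ = np.linalg.lstsq(A @ Ctilde, r, rcond=None)
    return Ctilde @ z
```

The returned coefficient vector satisfies the constraints exactly (up to rounding) and minimizes the residual over the constraint manifold.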

Experiment 7
Influence of the selection of collocation nodes, the approximation degree N, and the number n of subintervals

In this experiment, we use Example 3 and vary the choice of collocation nodes as well as the degree N of the polynomial basis and the number n of subintervals. We compare Gauss-Legendre, Radau IIA, and Lobatto collocation nodes. Since this example is a pure initial value problem, the use of the Radau IIA collocation nodes is especially justified. The results using Φ^R_{π,M} are collected in Table 20, those using Φ^C_{π,M} in Table 21. We observe no real difference between the different sets of collocation points. The results seem to confirm the conjecture that, in the case of smooth problems, a higher degree N is preferable to a larger n or, equivalently, a smaller stepsize h. In addition, for the highest degree polynomials (N = 20), the use of Φ^C_{π,M} seems to produce more accurate results than that of Φ^R_{π,M}. ⊓⊔

The discrete least-squares problem
Once the basis and the collocation conditions have been selected, the discrete problems (16), (15), and (14) for a linear boundary value problem (1)-(2) lead to a constrained linear least-squares problem

ϕ(c) = ∥A c − r∥² → min!   (29)

under the constraint

C c = 0.   (30)

The equality constraints consist of the k(n − 1) continuity conditions for the approximation of the differentiated components, while the functional ϕ(c) represents a reformulation of the functionals (16), (15), and (14), respectively. Here, c ∈ R^{n(mN+k)} is the vector of coefficients of the basis functions for X_π, disregarding the continuity conditions. Furthermore, it holds r ∈ R^{nmM}, A ∈ R^{(nmM+l)×n(mN+k)}, and C ∈ R^{(n−1)k×n(mN+k)}. The matrices A and C are very sparse. For details about their structure we refer to Appendix A.3.

Approaches to solve the constrained optimization problem (29)-(30)

A number of approaches to solving the constrained optimization problem (29)-(30) have been tested.
1. Direct method. The solution set of (30) forms a subspace which can be characterized by

C c = 0 if and only if c = C̃ z for some z ∈ R^{nmN+k−l},

where C̃ ∈ R^{n(mN+k)×(nmN+k−l)} has orthonormal columns. With this representation, the constrained minimization problem can be reduced to the unconstrained one

φ̃(z) = ∥A C̃ z − r∥²_{R_W} → min!

The implemented algorithm is that of [9], see also [8, Section 5.1.2]; it is sometimes called the direct elimination method.

2. Weighting of the constraints. In this approach, a sufficiently large parameter ω > 0 is chosen and the problem (29)-(30) is replaced by the free minimization problem

ϕ_ω(c) = ∥A c − r∥²_{R_W} + ω²∥C c∥² → min!

It is known that the minimizer c_ω of ϕ_ω converges towards the solution of (29)-(30) for ω → ∞ (cf. [25, Section 12.1.5]). Two different orderings of the equations have been implemented: one simply appends the weighted constraints, while the other uses a block-bidiagonal structure as is common for collocation methods for ODEs, cf. [2]. It is known that the order of the equations in the weighting method may have a large impact on the accuracy of the solutions [49]. In our test examples, however, we did not observe a difference in the behavior of the two orderings.

3. Deferred correction. The direct solution method by eliminating the constraints often has the deficiency of generating a lot of fill-in in the intermediate matrices. An approach to overcome this has been proposed in [49]: the solutions of the weighting approach are iteratively enhanced by a defect correction process. This method is implemented in the form presented in [4,5], called the deferred correction procedure for constrained least-squares problems by the authors. As a stopping criterion, the estimate (i) in [5, p. 254] has been implemented. Additionally, a bound for the maximal number of iterations can be provided. Under reasonable conditions, at most 2 iterations should suffice to obtain maximal accuracy (with respect to the sensitivity of the problem) for the discrete solution.
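A dense sketch of the weighting approach (item 2) in the appended-row ordering; the function name is ours, and the weight enters squared in the functional because the constraint rows are scaled by ω before the stacked system is solved:

```python
import numpy as np

def lse_weighted(A, r, C, omega):
    """Weighting method for  min ||A c - r||_2  s.t.  C c = 0:
    append the constraint rows with weight omega and solve the free
    problem  min ||A c - r||^2 + omega^2 ||C c||^2.  The minimizer
    tends to the constrained solution as omega -> infinity."""
    M = np.vstack([A, omega * C])
    rhs = np.concatenate([r, np.zeros(C.shape[0])])
    c, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return c
```

Increasing ω drives the constraint residual ∥Cc_ω∥ towards zero, but, as discussed next, too large a value endangers the numerical stability of the QR factorization of the stacked matrix.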
The results of the weighting method depend substantially on the choice of the parameter ω. In order to obtain an accurate approximation of the exact solution c_* of the problem (29)-(30), a large value of ω should be used (in the absence of rounding errors). However, if ω becomes too large, the algorithm may lack numerical stability. A discussion of this topic is given in [49]. In particular, it turns out that the algorithm used for the QR decomposition and the pivoting strategies have a strong influence on the success of this method. In our implementation, we use the sparse QR implementation of [12]. On the other hand, an error in the solution of the discrete problem far below the approximation error of x_π is not necessary. Therefore, a number of experiments have been carried out in order to gain insight into what reasonable choices might be. The results indicate that the optimal ω may vary considerably depending on the problem parameters. However, the accuracy with respect to the exact solution is rather insensitive to ω. ⊓⊔

The following example is a boundary value problem, in contrast to Example 3, which is an initial value problem.
Example 4 Consider the following DAE subject to boundary conditions. This DAE has tractability index μ = 4 and dynamical degree of freedom l = 2.

Table 4 Influence of the parameter ω for the constraints in Example 3 using N = 5 and n = 160. The error of the solution with respect to the exact solution (A) and with respect to a discrete reference solution obtained by a direct method (B) is given in the norms of L^2((0,5),R^7), L^∞((0,5),R^7), and H^1_D((0,5),R^7).

Table 5 Influence of the parameter ω for the constraints in Example 3 using N = 20 and n = 20. The error measures are as in Table 4.

The iterative solver using defect corrections may overcome the difficulties connected with a suitable choice of the parameter ω in the weighting method. According to Experiment 10, we would expect the optimal ω to lie in the order of magnitude 10^{−3}...10^{+2}, with an optimum around 10^{−2}. This is in contrast to the recommendations given in [5], where a choice of ω ≈ ε_mach^{−1/3} is recommended for the deferred correction algorithm. We test the performance of the deferred correction solver in the next experiment. Here, the tolerance in the convergence check is set to 10^{−15}. The iterations are considered not to have converged if the convergence check has failed after two iterations.

Experiment 9
We check the performance of the deferred correction solver in dependence on the weight parameter ω. Both Examples 3 and 4 are used. The results are presented in Tables 6-9. They indicate that a larger value of ω seems to be preferable. ⊓⊔

Table 6 Influence of the parameter ω on the accuracy of the discrete solution for Example 3 using N = 5 and n = 160. The error of the solution with respect to the exact solution (A) and with respect to a discrete reference solution obtained by a direct method (B) is given in the norms of L^2((0,5),R^7), L^∞((0,5),R^7), and H^1_D((0,5),R^7); 2 iterations are applied. (a: iteration did not converge)

Table 7 As Table 6, but for Example 3 using N = 20 and n = 20.

Table 8 As Table 6, but for Example 4 using N = 20 and n = 5; the norms are those of L^2((0,5),R^6), L^∞((0,5),R^6), and H^1_D((0,5),R^6).

Table 9 As Table 8, but for Example 4 using N = 5 and n = 20. (a: iteration did not converge)

Table 10 Case characteristics for Experiment 10 using the Legendre basis. The numbers of nonzero elements in the matrices A and C are provided as reported by the functions of the Eigen library. The columns denote: the number of rows of A (dimA), the number of rows of C (dimC), the number of unknowns (nun), the number of nonzero elements of C (nnzC), and the number of nonzero elements of A (nnzA) for the functionals Φ^R_{π,M} and Φ^C_{π,M}, respectively.

case   N    n   dimA  dimC   nun  nnzC  nnzA(Φ^R)  nnzA(Φ^C)
   1   3  320   8964  1914  8640  5742     101124     101124
   2   5   80   3364   474  3280  1422      58964      59044
   3  10    5    389    24   380    72      12749      12334
   4  20    5    739    24   730    72      47509      46534

5.2 Performance of the linear solvers

In this section, we intend to provide some insight into the behavior of the linear solvers. This concerns the accuracy as well as the computational resources (computation time, memory consumption). All these data are highly implementation dependent, and the hardware architecture also plays an important role. The linear solvers have been implemented using the standard strategy of subdividing them into a factorization step and a solve step. The price to pay is a larger memory consumption. However, their use in the context of, e.g., a modified Newton method may decrease the computation time considerably.
The tests have been run on a Linux laptop Dell Latitude E5550. While the program is purely sequential, the MKL library may use shared-memory parallel versions of its BLAS and LAPACK routines. The CPU of the machine is an Intel(R) Core(TM) i7-5600U @ 2.60 GHz providing two cores, each capable of hyperthreading. For the test runs, CPU throttling has been disabled such that all cores ran at roughly 3.2 GHz.
The parameter for the weighting solver is ω = 1, while the corresponding parameter for the deferred correction solver is ω = ε_mach^{−1/3}. The case characteristics are summarized in Table 10. Owing to the special properties of the Legendre polynomials, the matrix C representing the constraints is extremely sparse, featuring only three nonzero elements per row. The computational results are shown in Table 11. In the next computations, the Chebyshev basis has been used, which leads to a slightly more occupied matrix C. The results are provided in Tables 12 and 13. The previous example is an initial value problem. This structure may have consequences for the performance of the linear solvers. Therefore, in the next experiment, we consider a boundary value problem.

Experiment 11
We repeat Experiment 10 with Example 4. The problem characteristics and computational results are provided in Tables 14-17. It should be noted that the deferred correction solver returned normally (tolerance as before 10^{−15}) after at most two iterations in all cases. However, in some cases, the results are completely off. This happens, for example, in Tables 15 and 17, cases 1 and 2, for Φ^C_{π,M}. It should also be noted that a considerable amount of memory for the QR-factorizations is consumed by the internal representation of the Q-factor in SPQR. This could be avoided if the factorization and solution steps were interwoven.

Table 11 Computing times, permanent workspace needed, and error for the cases described in Table 10. The computing times are provided in milliseconds and are the average of 100 runs of each case. The error is measured in the norm of H^1_D((0,5),R^7). The column headings denote: the upper bound on the number of nonzero elements of the QR-factors as reported by SPQR (nWork), the time for the matrix assembly (tass), the time for the factorization (tfact), and the time for the solution (tslv) for both functionals Φ^R_{π,M} and Φ^C_{π,M}.

Table 12 Case characteristics for Experiment 10 using the Chebyshev basis; columns as in Table 10.

case   N    n   dimA  dimC   nun  nnzC  nnzA(Φ^R)  nnzA(Φ^C)
   1   3  320   8964  1914  8640  7656     101128     101124
   2   5   80   3364   474  3280  3318      58851      59056
   3  10    5    389    24   380   360      12846      12626
   4  20    5    739    24   730   720      47581      47191

Table 13 Computing times, permanent workspace needed, and error for the cases described in Table 12; columns as in Table 11.

Table 14 Case characteristics for Experiment 11 using the Legendre basis; columns as in Table 10.

case   N    n   dimA  dimC   nun  nnzC  nnzA(Φ^R)  nnzA(Φ^C)
   1   4  320   9602  1595  9280  4785      86403      80643
   2   5  160   5762   795  5600  1422      63363      63363
   3  10    5    332    20   325    60       6933       6663
   4  20    5    632    20   625    60      25793      25263

Table 15 Computing times, permanent workspace needed, and error for the cases described in Table 14.

Table 16 Case characteristics for Experiment 11 using the Chebyshev basis; columns as in Table 10.

case   N    n   dimA  dimC   nun  nnzC  nnzA(Φ^R)  nnzA(Φ^C)
   1   4  320   9602  1595  9280  7656      86406      82566
   2   5  160   5762   795  5600  5565      63367      63367
   3  10    5    332    20   325   300       6945       6795
   4  20    5    632    20   625   600      25830      25560

Table 17 Computing times, permanent workspace needed, and error for the cases described in Table 16.

Sensitivity of boundary condition weighting
As is already known for boundary value problems for ODEs and index-1 DAEs, a special issue is the scaling of the boundary conditions, here the inclusion of the boundary conditions (2). Their scaling is independent of the scaling of the DAE (1). Therefore, it seems reasonable to provide an additional possibility for scaling the boundary conditions. We decided to enable this by introducing an additional parameter α to be chosen by the user, so that Φ from (6) is replaced by a functional in which the boundary-condition term carries the weight α. Analogously, the discretized versions Φ^R_{π,M}, Φ^I_{π,M}, and Φ^C_{π,M} are replaced by their counterparts Φ̃^R_{π,M}, Φ̃^I_{π,M}, and Φ̃^C_{π,M} with weighted boundary conditions. The convergence theorems hold true for these modifications of the functionals, too.
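In the discrete system, the weight α amounts to scaling the rows belonging to the boundary conditions; a minimal sketch (all names are ours, assuming the boundary conditions have already been discretized into rows A_bc, r_bc):

```python
import numpy as np

def assemble_with_bc_weight(A_dae, r_dae, A_bc, r_bc, alpha):
    """Hypothetical assembly: the rows discretizing the boundary
    conditions (A_bc, r_bc) are scaled by the user-chosen weight alpha
    before being appended to the collocation rows (A_dae, r_dae)."""
    A = np.vstack([A_dae, alpha * A_bc])
    r = np.concatenate([r_dae, alpha * r_bc])
    return A, r
```

Since both sides of the boundary-condition rows are scaled by the same α, the constraint they encode is unchanged; only its weight in the least-squares functional varies.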

Experiment 12 Influence of α on the accuracy
We use the example and settings of Experiment 8. The results are provided in Table 18. ⊓⊔

Experiment 13 Influence of α on the accuracy

We repeat the previous experiment with Example 4. The discretization parameters are (i) N = 5, n = 20 and (ii) N = 20, n = 5. All other settings correspond to those of Experiment 12. The results are presented in Table 19. ⊓⊔

Table 18 Influence of the weight parameter α for the boundary conditions in Example 3 (N = 5, n = 160 and N = 20, n = 20). The error of the solution is given in the norms of L^2((0,5),R^7), L^∞((0,5),R^7), and H^1_D((0,5),R^7).

Table 19 Influence of the weight parameter α for the boundary conditions in Example 4. The error of the solution is given in the norms of L^2((0,1),R^6), L^∞((0,1),R^6), and H^1_D((0,1),R^6).

The results of Experiments 12 and 13 indicate that the final accuracy is rather insensitive to the choice of α. It should be noted that the coefficient matrices in Examples 3 and 4 are well scaled.

Final remarks and conclusions
In summary, in the present paper, we investigated questions related to an efficient and reliable realization of a least-squares collocation method. These questions are particularly important since a higher-index DAE is an essentially ill-posed problem in naturally given spaces, which is why we must be prepared for highly sensitive discrete problems. In order to obtain an overall procedure that is as robust as possible, we provided criteria which led to a robust selection of the collocation points and of the basis functions, whereby the latter also influences the shape of the resulting discrete problem. Additionally, a number of new, more detailed error estimates have been given that support some of the design decisions. The following particular items are worth highlighting in this context:

- The basis for the approximation space should consist of appropriately shifted and scaled orthogonal polynomials. We could not observe any larger differences between the behavior of Legendre and Chebyshev polynomials.

- The collocation points should be chosen to be the Gauss-Legendre, Lobatto, or Radau nodes. This leads to discrete problems whose conditioning using the discretization by interpolation (Φ^R_{π,M}) is not much worse than that resembling collocation methods for ordinary differential equations (Φ^C_{π,M}). A particularly efficient and stable implementation is obtained if Gauss-Legendre or Radau nodes are used since, in this case, the diagonal weighting (Φ^I_{π,M}) coincides with the interpolation approach.
- A critical ingredient for the implementation of the method is the algorithm used for the solution of the constrained linear least-squares problems. Given the expected bad conditioning of the least-squares problem, a QR-factorization with column pivoting must lie at the heart of the algorithm. At the same time, the sparsity structure must be exploited as well as possible. In our tests, the direct solver seems to be the most robust one. With respect to efficiency and accuracy, the deferred correction solver is preferable; however, it failed in certain tests.

- It seems that, for problems with a smooth solution, a higher degree N of the ansatz polynomials with a low number of subintervals n in the mesh is preferable, with respect to accuracy, to a smaller degree with a larger number of subintervals. Some first theoretical justification has been provided for this claim.

- The simple collocation procedure using Φ^C_{π,M} performs surprisingly well. In fact, in our experiments, the results are on par with those using Φ^R_{π,M} = Φ^I_{π,M}. However, we have no theoretical justification for this as yet.
- Our method is designed for variable grids. However, so far we have only worked with constant step sizes. In order to adapt the grid and the polynomial degree, or even to select appropriate grids, it is important to understand the structure of the error, that is, how the global error depends on local errors. This is a very important open problem for which we have no solution yet.
In conclusion, we note that earlier implementations, among others the one from the very first paper on this subject [32], which started from proven ingredients of ODE codes, are, from today's point of view and experience, rather poor versions of the least-squares collocation. Nevertheless, the test results computed with them were already very impressive. This strengthens our belief that a careful implementation of the method gives rise to a very efficient solver for higher-index DAEs.
A Some facts about classical orthogonal polynomials

In the derivations, classical orthogonal polynomials have been used heavily. For the reader's convenience, their important properties are collected below.

A.2 Chebyshev polynomials
The Chebyshev polynomials of the first kind T_ν, ν = 0, 1, ..., are defined by the recurrence relation

T_0(τ) = 1, T_1(τ) = τ, T_{ν+1}(τ) = 2τ T_ν(τ) − T_{ν−1}(τ), ν = 1, 2, ....

On [−1, 1], they admit the closed-form representation T_ν(cos θ) = cos(νθ). The orthogonality property of the Chebyshev polynomials reads

∫_{−1}^{1} T_μ(τ) T_ν(τ) (1 − τ²)^{−1/2} dτ = 0 for μ ≠ ν, π for μ = ν = 0, and π/2 for μ = ν ≥ 1.

The normalized Chebyshev polynomials T̄_ν are given by the corresponding scaling, T̄_0 = π^{−1/2} T_0 and T̄_ν = (2/π)^{1/2} T_ν for ν ≥ 1.

A.3 The structure of the discrete problems

Consider the linear DAE (1). In order to simplify the notation slightly, define E = AD such that, for sufficiently smooth functions x ∈ X_π, (1) is equivalent to

E(t) x′(t) + B(t) x(t) = q(t).

Let, on (t_{j−1}, t_j), x(t) = (x_{j1}(t), ..., x_{jm}(t))^T. Then the components of x admit a representation in the chosen basis with coefficient vector c_j, which we write as x_j(t) = a_j(t) c_j. (P̄_ν is a standard notation.) Then, for W_j of (19), we have the representation

w_j(t_{ji}) = [E(t_{ji}) a′_j(t_{ji}) + B(t_{ji}) a_j(t_{ji})] c_j − q(t_{ji}) =: A_{ji} c_j − r_{ji}.

The functionals Φ^r_{π,M} have, for r = R, I, C, a representation of the kind (29). Moreover, the continuity conditions (27) can be represented by a sparse, block-bidiagonal matrix C.
The discrete minimization problem therefore becomes

ϕ_r(c) = ∥A c − r∥²_{R^{nmM+l}} → min! under the constraint C c = 0.

Table 20 Error of the approximate solution using Legendre basis functions and Gauss-Legendre (G), Radau IIA (R), and Lobatto (L) collocation nodes when using the functional Φ^R_{π,M} in Example 3. The norm is that of H^1_D((0,5),R^7).

Table 21 Error of the approximate solution using Legendre basis functions and Gauss-Legendre (G), Radau IIA (R), and Lobatto (L) collocation nodes when using the functional Φ^C_{π,M} in Example 3. The norm is that of H^1_D((0,5),R^7).