1 Introduction

Our interest lies in efficient and robust methods for solving large-scale linear least squares problems with linear equality constraints. We assume that \(A \in {{\mathbb {R}}}^{m \times n}\) and \(C \in {{\mathbb {R}}}^{p \times n}\), with \(m > n \gg p\). We further assume that A is large and sparse and C represents a few, possibly dense, linear constraints. Given \(b \in {{\mathbb {R}}}^{m}\) and \(d \in {{\mathbb {R}}}^{p}\), the least squares problem with equality constraints (the LSE problem) is

$$\begin{aligned}&\min _{x\in {{\mathbb {R}}}^{n}} \left\| A\, x -b \right\| ^2_2 \end{aligned}$$
(1.1)
$$\begin{aligned}&\text{ s.t. } \;\; C\, x = d. \end{aligned}$$
(1.2)

A solution exists if and only if (1.2) is consistent. For simplicity, we assume that C has full row rank (although the proposed approaches can be made more general). In this case, (1.2) is consistent for any d. A solution to the LSE problem (1.1)–(1.2) is unique if and only if \({{{\mathscr {N}}}}(A) \cap {{{\mathscr {N}}}}(C) =\{0\}\), where for any matrix B, \({{{\mathscr {N}}}}(B)\) denotes its null space. This is equivalent to the extended matrix

$$\begin{aligned} {{{\mathscr {A}}}} = \begin{pmatrix} A \\ C \end{pmatrix} \end{aligned}$$
(1.3)

having full column rank. In the case of non-uniqueness, there is a unique minimum-norm solution.

LSE problems arise in a variety of practical applications, including scattered data approximation [13], fitting curves to data [16], surface fitting problems [35], real-time signal processing, and control and communication leading to recursive problems [50], as well as nonlinear least squares problems and least squares problems with inequality constraints. For example, in fitting curves to data, equality constraints may arise from the need to interpolate some data or from a requirement for adjacent fitted curves to match with continuity of the curves and possibly of some derivatives. Motivations for LSE problems together with solution strategies are summarized in the research monographs [7, 8, 29].

Classical approaches for solving LSE problems derive an equivalent unconstrained linear least squares (LS) problem of lower dimension. There are two standard ways to perform this reduction: the null-space approach [21, 29] and the method of direct elimination [10], both of which, with suitable implementation, offer good numerical stability. These methods, termed constraint substitution methods, consider the constraints (1.2) as the primary data and substitute from them into the LS problem (1.1). The former performs a substitution using a null-space basis of C obtained from a QR factorization, while the latter is based on substituting an expression for selected solution components from the constraints into (1.1). This can be done using a QR factorization of C [7, 10]. If there are a large number of constraints, a pivoted LU factorization might also be an option [31]. Other solution methods, which may be regarded as complementary to the constraint substitution approaches, reverse the direction of the substitution, substituting from the LS problem into the constraints. These methods involve the use of an augmented system and include a Lagrange multiplier formulation [22], updating procedures that force the constraints to be satisfied a posteriori [6, 7], and a weighting approach [5, 36, 46].

Solving large-scale LS problems is typically much harder than solving systems of linear algebraic equations, in part because key issues such as ill-conditioning or dense structures within an otherwise sparse problem can vary significantly between different problem classes. Consequently, we do not expect that there will be a single method that is optimal for all LSE problems, and having a range of approaches available that target different problems is important. Our main objective is to revisit classical solution strategies and to propose new ideas and modifications that enable large-scale systems to be solved, with an emphasis first on the possibility that the constraints may be dense, and second on requiring that the constraints be tightly satisfied. In Sects. 2 and 3, we consider the null-space method and the direct elimination approach, respectively. We review the methods and show how they can be used for large-scale problems. In Sect. 4, we present complementary solution approaches within an augmented system framework. This allows us to treat the constraints and the least squares part of the problem using a single extended system of equations or via a global updating scheme. Both direct and iterative methods are discussed.

Much of the published literature related to LSE problems lacks numerical results. For instance, Björck [6] remarks “no attempt has yet been made to implement the (general updating LSE) algorithm”, and, as far as we are aware, this remains the case. We suspect this is because implementing the algorithms is far from straightforward. While it is not our intention here to offer a full general comparison of the different approaches, throughout our study we use numerical experiments on problems arising from real applications to highlight key features that may make a method attractive (or unsuitable) for particular problems and to illustrate the effectiveness of the different approaches. Our key findings and recommendations are summarized in Sect. 5.

We end this introduction by describing our test environment. The test matrices are taken from the SuiteSparse Matrix Collection [15] and comprise a subset of those used by Gould and Scott in their study of numerical methods for solving large-scale LS problems [20]. If necessary, the matrix is transposed to give an overdetermined system. Basic information on our test set is given in Table 1.

Table 1 Statistics for our test set. m, n and \(nnz({{\mathscr {A}}})\) are, respectively, the row and column counts and the number of entries in the matrix \({{{\mathscr {A}}}}\) given by (1.3). dratio is the ratio of the nonzero counts of the densest row to the sparsest row of \({{{\mathscr {A}}}}\). \(^\dag \) indicates at least one column was removed to ensure there are no null columns in A

The problems in the top half of the table contain rows that are identified as dense by Algorithm 1 of [42] (with the density parameter set to 0.05). These rows are taken to form the constraint matrix C and all other rows form A. For the other problems, we form A by removing the 20 densest rows of the SuiteSparse matrix; some or all of these rows are used to form C (and the rest are discarded). Table 1 reports data for \(p=5\) and 20 (denoted, for example, by deter_5 and deter_20, respectively). Although the densest rows are not necessarily very dense, we make this choice because it corresponds to the typical situation in which the constraints couple many of the solution components together. For some of our test examples, splitting the supplied matrix into a sparse part and a dense part results in the sparse part A containing a small number of null columns (at most 7 such columns for our test examples). For the purpose of our experiments, we remove the corresponding columns from the extended matrix (1.3) (the data in Table 1 is for the modified problem). In all our tests, we check that the norms of the computed solution x and least squares residual \(r = b- A\,x\) are consistent with the values given in Table 1.

In our experiments, we prescale the extended matrix \({\mathscr {A}}\) given by (1.3) by normalizing each of its columns. That is, we replace \({\mathscr {A}}\) by \({{\mathscr {A}}}{\mathscr {D}}\), where \({\mathscr {D}}\) is the diagonal matrix with entries \({\mathscr {D}}_{ii}\) satisfying \({\mathscr {D}}_{ii} =1/ \Vert {\mathscr {A}}e_i\Vert _2 \) (\(e_i\) denotes the i-th unit vector). The entries of \({{\mathscr {A}}}{\mathscr {D}}\) are at most one in absolute value. The vectors b and d are set to be vectors of 1’s (so that \(\Vert b\Vert _2\) and \(\Vert d\Vert _2\) are O(1)).
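For concreteness, this prescaling can be expressed in a few lines of Python/SciPy (the function name is ours and the sketch assumes the null columns mentioned above have already been removed, so that no column norm is zero):

import numpy as np
import scipy.sparse as sp

def prescale(A_ext):
    # Replace the extended matrix by A_ext * D, where D_ii = 1 / || column i of A_ext ||_2.
    A_ext = sp.csc_matrix(A_ext)
    col_norms = np.sqrt(np.asarray(A_ext.multiply(A_ext).sum(axis=0))).ravel()
    D = sp.diags(1.0 / col_norms)
    return A_ext @ D, D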

For the substitution approaches described in Sects. 2 and 3, we have developed prototype Fortran codes; in Sect. 4, the augmented system methods are implemented using the SuiteSparseQR package of Davis [14] and Fortran software from the HSL mathematical software library [26]. The prototype codes are not optimised for efficiency and so computational times are not reported. Developing library quality implementations is far from trivial and is outside the scope of the current study, which focuses rather on determining which approaches are sufficiently promising for sophisticated implementations to be considered in the future.

Notation All norms are 2-norms and in the rest of the paper, to simplify the notation, \(\Vert .\Vert _2\) is denoted by \(\Vert .\Vert \). I is used to denote the identity matrix of appropriate dimension. The entries of any matrix B are \((B)_{i,j}\) and its columns are denoted by \(b_1,b_2, \ldots \). The null space of B is \({{{\mathscr {N}}}}(B)\) and Z is used to denote a matrix whose columns form a basis for the null space (i.e., Z satisfies \(B\,Z = 0\)). Permutation matrices are denoted by P (possibly with a subscript). The normal matrix for (1.1) is \(H = A^T\,A\).

2 The null-space approach

The null-space approach is a standard technique for solving least squares problems. It is based on constructing a matrix \(Z \in {{\mathbb {R}}}^{n \times (n-p)}\) such that its columns form a basis for \({{{\mathscr {N}}}}(C)\). Any \(x \in {{\mathbb {R}}}^{n}\) satisfying the constraints can be written in the form

$$\begin{aligned} x=x_1 + Z\, x_2, \end{aligned}$$
(2.1)

where \(x_1 \in {{\mathbb {R}}}^{n}\) is a particular solution of the underdetermined system \( C\, x_1 = d.\) The minimum-norm solution can be obtained from the QR factorization of C, that is, \(C\,P_C = Q_C \begin{pmatrix}R_C&0 \end{pmatrix}\), where the permutation \(P_C \in {{\mathbb {R}}}^{n \times n}\) represents the pivoting, \(R_C \in {{\mathbb {R}}}^{p \times p}\) is an upper triangular matrix and \(Q_C \in {{\mathbb {R}}}^{p \times p}\) is an orthogonal matrix. \(x_1\) is then given by

$$\begin{aligned} x_1 = P_C \begin{pmatrix} R_C^{-1}Q_C^T\, d \\ 0 \end{pmatrix}. \end{aligned}$$

Substituting (2.1) into (1.1) gives the transformed LS problem

$$\begin{aligned} \min _{x_2} \left\| A \,Z\, x_2 - (b -A\, x_1) \right\| ^2. \end{aligned}$$
(2.2)

The method is summarized as Algorithm 1.

Algorithm 1 (Null-space method for the LSE problem). Step 1: compute a matrix Z whose columns form a basis for \({{{\mathscr {N}}}}(C)\). Step 2: compute a particular solution \(x_1\) of \(C\, x_1 = d\). Step 3: solve the reduced problem (2.2) for \(x_2\), for example via the normal equations \(Z^TH\,Z\, x_2 = (A\,Z)^T (b- A\, x_1)\). Step 4: set \(x=x_1 + Z\, x_2\).
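To make the structure of Algorithm 1 concrete, the following dense Python/SciPy sketch (the function name and the use of null_space and lstsq are ours, purely for illustration; for large sparse problems the basis Z would instead be constructed as described below) carries out the four steps:

import numpy as np
from scipy.linalg import null_space

def lse_nullspace(A, C, b, d):
    # Dense prototype of the null-space method (Algorithm 1).
    Z = null_space(C)                                        # columns span N(C), so C @ Z = 0
    x1, *_ = np.linalg.lstsq(C, d, rcond=None)               # minimum-norm particular solution of C x1 = d
    x2, *_ = np.linalg.lstsq(A @ Z, b - A @ x1, rcond=None)  # reduced LS problem (2.2)
    return x1 + Z @ x2                                       # solution (2.1)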

In the 1970s, the null-space method was developed and discussed by a number of authors, including in relation to quadratic programming [21, 29, 39, 45]. These and subsequent contributions formulate the approach via the orthogonal null-space basis obtained, for example, from the QR factorization of \(C^T\) given by

$$\begin{aligned} C^T = Q \begin{pmatrix} R \\ 0 \end{pmatrix}, \end{aligned}$$

where \(Q \in {{\mathbb {R}}}^{n \times n}\) is an orthogonal matrix. Z is equal to the last \(n-p\) columns of Q and consequently is dense. Note that although it is possible to store Q implicitly using, for example, Householder transformations, the memory demands and implied operation counts are generally too high. Our interest is in large LS problems and therefore it may not be practical to solve the \((n-p) \times (n-p)\) system in Step 3 if Z is dense. To make the approach feasible for large problems we can exploit our recent work [44] on constructing sparse null-space bases of “wide” matrices such as C that have many more columns than rows and may include some dense rows.

Scott and Tůma [44] propose a number of ways to construct sparse Z. In our experiments, we employ Algorithm 3 from Section 3 of [44]. This algorithm first computes a QR factorization of C with column pivoting. The chosen pivots correspond to p columns of R. Then each of the remaining \(n-p\) columns of C induces a column \(z \in Z\) that is computed independently of the other columns as follows. While in the trivial case of a zero column the corresponding z contains a single nonzero entry, for any nonzero column \(c \in C\) a linearly dependent set involving other columns of C is constructed. The smallest such set is called a circuit; circuits play an important role in the problem of the sparsest null-space basis [12]. The coefficients of the linear combination of c and other columns of C that sum to zero are the row entries of the column \(z \in Z\) corresponding to c. The linearly dependent sets are found using a partial pivoted QR factorization of C (with at most p steps) that involves the column c. To obtain Z with a narrow bandwidth so that \(Z^TH\,Z\) is sparse when H is sparse, a pivoting threshold \(\theta \in [0,1]\) is employed in these partial QR factorizations. The role of \(\theta \) is to balance the locality of the dependent sets (combining columns of C whose indices are close to c) with the stability of a QR factorization with column pivoting (which maximizes the absolute values of the diagonal entries of R). Small values of \(\theta \) result in Z having a narrow bandwidth.
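To illustrate the principle that each non-pivot column contributes one column of Z whose entries are the coefficients of such a linear combination, the following Python sketch builds a null-space basis from a single global pivoted QR factorization of C. This is a simplified stand-in for Algorithm 3 of [44]: it does not use the per-column partial factorizations or the threshold \(\theta \), so the resulting Z is generally not banded.

import numpy as np
from scipy.linalg import qr, solve_triangular

def simple_nullspace_basis(C):
    # Assumes C has full row rank p.
    p, n = C.shape
    Q, R, perm = qr(C, pivoting=True)        # C[:, perm] = Q [R1 R2]
    R1, R2 = R[:, :p], R[:, p:]
    T = solve_triangular(R1, R2)             # coefficients expressing each non-pivot column via the pivot columns
    Z = np.zeros((n, n - p))
    Z[perm[:p], :] = -T
    Z[perm[p:], :] = np.eye(n - p)
    return Z                                  # satisfies C @ Z = 0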

Fig. 1 The number of entries in \(Z^TH\,Z\) (left) and the constraints residual \(\left\| r_c\right\| \) (right) for problems deter3 (top) and gemat1 (bottom) as the threshold pivoting parameter \(\theta \) used in the computation of the null-space basis increases from 0.1 to 1. The four curves correspond to \(p=2\) (black dotted line), 5 (blue full line), 10 (red dashed line) and 20 (green dash-dotted line). Observe that using a small \(\theta \) can improve the sparsity of \(Z^TH\,Z\). For deter3, the constraint residuals are small for all the tested \(\theta \), and for gemat1, they are satisfactory for \(\theta > 0.2\) (colour figure online)

Our first results are for problems deter3 and gemat1. As discussed in the Introduction, we form the constraint matrix C by taking the \(p = 2\), 5, 10, 20 densest rows of \({{\mathscr {A}}}\). The sparse block A is the same for each case. In Fig. 1, we plot the number of entries \(nnz(Z^TH\,Z)\) in \(Z^TH\,Z\) and the norm of the constraints residual \(\left\| r_c\right\| = \left\| d - C \,x \right\| \). As expected, \(nnz(Z^TH\,Z)\) increases with \(\theta \), and this increase grows with p. This is illustrated further by the results in Table 2. We see that, independently of the choice of \(\theta \), for some problems (including lp_fit2p and sctap1-2r) the constraints are not tightly satisfied. This demonstrates an inherent limitation of the null-space approach of [44]: it constructs the columns of Z so as to keep \(Z^TH\,Z\) sparse, but the resulting Z does not have orthogonal columns.

Table 2 The density of \(Z^TH\,Z\) (that is, \(nnz(Z^TH\,Z)/(n-p)^2\)) and constraint residual \(\left\| r_c \right\| \) for two values of the threshold pivoting parameter \(\theta \) used in the computation of the null-space basis. \(\ddag \) indicates insufficient memory for the sparse direct solver HSL_MA87

The matrix \(Z^TH\,Z\) in Step 3 of Algorithm 1 is symmetric positive definite. In the above experiments, we employ the sparse direct solver HSL_MA87 [23] (combined with an approximate minimum degree ordering). However, for large problems, the memory demands mean it may not be possible to use a direct method; this is illustrated by problem south31 with \(\theta = 1\). If a preconditioned iterative solver is used instead, the solver memory requirements are much lower, explicitly forming the potentially ill-conditioned normal matrix H can be avoided and, because Z only needs to be applied implicitly, the need for sparsity can be relaxed. Currently, finding a good preconditioner for use in this case remains an open problem [32].

If a sequence of LSE problems is to be solved with the same set of constraints but different A, the null-space basis can be reused, substantially reducing the work required. But if the constraints are changed, then Z will also change. In [44], we present a strategy that allows Z to be updated when a row (or block of rows) is added to C.

3 The method of direct elimination

The second method we look at is direct elimination [29]. The basic idea is to express the dependency of p selected components of the vector x on the remaining \(n-p\) components and to substitute this into the LS problem (1.1). Here we propose how to choose the p components so as to retain sparsity in the transformed problem.

Consider the constraints (1.2). The method starts by permuting and splitting the solution components as follows:

$$\begin{aligned} C\,x = C\,P_c \,y = \begin{pmatrix} C_1&C_2 \end{pmatrix} \begin{pmatrix} y_{1} \\ y_{2} \end{pmatrix} = d, \end{aligned}$$

where \(P_c\in {{\mathbb {R}}}^{n \times n} \) is a permutation matrix chosen so that \(C_1 \in {{\mathbb {R}}}^{p \times p}\) is nonsingular. Let \(A\,P_c = \begin{pmatrix} A_1&A_2 \end{pmatrix}\) be a conformal partitioning of \(AP_c\). Substituting the expression

$$\begin{aligned} y_1 = C_1^{-1}(d - C_2\,y_2) \end{aligned}$$
(3.1)

into (1.1) gives the transformed LS problem

$$\begin{aligned} \min _{y_2} \left\| A_T y_2 -(b - A_1\,C_1^{-1}d)\right\| ^2, \end{aligned}$$
(3.2)

with the transformed matrix

$$\begin{aligned} A_T = A_2 - A_1\,C_1^{-1}C_2 \in {{\mathbb {R}}}^{m \times (n-p)}. \end{aligned}$$
(3.3)

Note that if \(C_1\) is irreducible, the transformation combines all the rows of \(C_2\). If C is composed of dense rows then \(A_T\) has more dense rows than A. We thus seek to introduce into \(A_T\) as few rows as possible that replicate the (possibly) dense pattern of C. If both A and C are sparse, the substitution leads to a sparse LS problem. We have the following straightforward result.

Lemma 3.1

Let \(A \in {{\mathbb {R}}}^{m \times n}\) be sparse. Let \(m>n > p\) and assume a conformal column splitting induced by the permutation \(P_c\) is such that \(CP_c = \begin{pmatrix}C_1 \,&C_2\end{pmatrix}\) and \(A\,P_c = \begin{pmatrix}A_1\,&A_2\end{pmatrix}\) with \(C_1 \in {{\mathbb {R}}}^{p \times p}\) nonsingular and \(A_1 \in {{\mathbb {R}}}^{m \times p}\). Define the index set

$$\begin{aligned} Occupied =\{i \; | \; (A_1)_{i,k} \ne 0 \text{ for } \text{ some } \text{ k, } 1 \le k \le p \}. \end{aligned}$$

Then the number of dense rows in \(A_T\) given by (3.3) is at most the number of entries in Occupied.

Proof

The result follows directly from the transformation. Assuming the rows of \(C_1^{-1}C_2\) are dense, the substitution step (3.1) of the direct elimination implies a dense row k in \(A_T\) only if there is a nonzero in the k-th row of \(A_1\). \(\square \)

A simple example is given in Fig. 2. Here we ignore cancellation of nonzeros during arithmetic operations. We see that the pattern of \(A_T\) satisfies Lemma 3.1. Note that, although in this example \(C_1^{-1}C_2\) is shown as dense, it need not be fully dense and the number of entries in Occupied represents an upper bound on the number of dense rows in \(A_T\).

Fig. 2 Example of the transformation in the direct elimination approach. Here \(m = 9\), \(p=3\), \(n=7\). The depicted matrices (from the left) represent the transformation \(A_T = A_2 - A_1\,C_1^{-1}C_2\). The matrix \(C_1^{-1}C_2 \in {{\mathbb {R}}}^{p \times (n-p)} \) is depicted as fully dense
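A dense Python sketch of the direct elimination transformation and back-substitution is given below. For simplicity, the column permutation here comes from a standard pivoted QR factorization of C alone; Algorithm 2 below additionally takes the sparsity pattern of A into account when choosing the pivot columns.

import numpy as np
from scipy.linalg import qr, lstsq

def lse_direct_elimination(A, C, b, d):
    p, n = C.shape
    _, _, perm = qr(C, pivoting=True)            # choose p well-conditioned pivot columns of C
    C1, C2 = C[:, perm[:p]], C[:, perm[p:]]
    A1, A2 = A[:, perm[:p]], A[:, perm[p:]]
    C1_inv_C2 = np.linalg.solve(C1, C2)
    C1_inv_d = np.linalg.solve(C1, d)
    A_T = A2 - A1 @ C1_inv_C2                    # transformed matrix (3.3)
    y2, *_ = lstsq(A_T, b - A1 @ C1_inv_d)       # transformed LS problem (3.2)
    y1 = C1_inv_d - C1_inv_C2 @ y2               # back-substitution (3.1)
    x = np.empty(n)
    x[perm[:p]], x[perm[p:]] = y1, y2            # undo the permutation: x = P_c y
    return x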

Lemma 3.1 implies that the LSE problem is transformed to a LS problem (3.2) that has some dense rows, which we refer to as a sparse-dense LS problem. Consequently, existing methods for sparse-dense LS problems can be used, including those recently proposed in [40, 41, 43] (see also the recent direct LS solver HSL_MA85 from the HSL library). A straightforward algorithmic implication of the lemma is that the permuting and splitting of C cannot be separated from considering the sparsity pattern of A because the splitting also determines \(A_1\) and \(A_2\). Thus we want to permute the columns of C to allow a sufficiently well-conditioned factorization of \(C_1\) while limiting the number of entries in Occupied and hence the number of dense rows in \(A_T\). The approach outlined in Algorithm 2 is one way to achieve this. There is an important difference between the pivoting used in Algorithm 3 of [44] (which we used in the previous section) and that of Algorithm 2 below. The former modifies the column pivoting that is considered as standard for QR factorizations by employing a threshold parameter \(\theta \) that ensures Z is banded and the transformed normal matrix \(Z^TH\,Z\) retains sparsity. The choice of \(\theta \) aims to balance the stability of the factorization with the sparsity of Z. The threshold parameter \(\tau \in (0,1]\) used in Algorithm 2 also guarantees the pivots in the QR factorization of C are sufficiently large but the selection of the candidate pivots is balanced with limiting the fill-in in the transformed matrix \(A_T\). A crucial role is played by the set of rows held in Occupied that potentially cause fill-in in \(A_T\). The use of different notation for the threshold parameters emphasises the difference between the two QR-based approaches and the distinct roles of the two thresholds.

Algorithm 2 (Selection of the pivot columns of C for the direct elimination approach). A QR factorization of C with threshold column pivoting (threshold parameter \(\tau \)) is computed; the squared column norms \(w_i\) determine the candidate pivots at each step, and the pivot is chosen from among the candidates so as to limit the growth of the set Occupied, and hence the number of dense rows in \(A_T\).

Observe that the pivoting strategy in Algorithm 2 considers C and A simultaneously and will not select a column as the pivot column if this column in A is dense (as it would lead to \(A_T\) being dense). While we do not discuss the implementation details, we remark that care is needed to ensure efficiency. For example, the QR factorization with pivoting of a wide matrix is relatively cheap but it may be necessary to store the squares of the column norms using a heap, which is why we emphasize their role in the algorithm by using the explicit notation \(w_i\) for these norms.
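The following Python sketch illustrates the flavour of this pivot selection. It is our own simplified, dense re-creation of the idea, not Algorithm 2 itself: the remaining columns are orthogonalised explicitly rather than via Householder updates, and no heap is used for the norms \(w_i\).

import numpy as np

def choose_pivot_columns(A, C, tau=0.5):
    # At each of the p steps, among the columns whose updated squared norm w_i is at
    # least tau times the largest, pick the one adding fewest new rows to Occupied
    # (the rows of A touched by the pivot columns chosen so far).
    p, n = C.shape
    W = C.astype(float).copy()
    occupied = np.zeros(A.shape[0], dtype=bool)
    free = list(range(n))
    pivots = []
    for _ in range(p):
        w = np.linalg.norm(W[:, free], axis=0) ** 2
        candidates = [j for j, wj in zip(free, w) if wj >= tau * w.max()]
        new_rows = [np.count_nonzero((A[:, j] != 0) & ~occupied) for j in candidates]
        k = candidates[int(np.argmin(new_rows))]
        pivots.append(k)
        occupied |= (A[:, k] != 0)
        free.remove(k)
        q = W[:, k] / np.linalg.norm(W[:, k])    # eliminate the chosen pivot direction
        W[:, free] -= np.outer(q, q @ W[:, free])
    return pivots, occupied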

The effects of increasing the pivoting parameter \(\tau \) on the number of dense rows in \(A_T\) are illustrated in Fig. 3 for problems deter3 and gemat1; results for the full test set are given in Table 3. The dense rows of the transformed matrix \(A_T\) are determined using Algorithm 1 of [42] and to solve the transformed LS problem (3.2) we use the sparse-dense preconditioned iterative approach of [40].

Fig. 3 The number of dense rows in the transformed matrix \(A_T\) as the parameter \(\tau \) increases from 0.05 to 1 for problems deter3 (left) and gemat1 (right). The four curves correspond to \(p=2\) (black dotted line), 5 (blue full line), 10 (red dashed line) and 20 (green dash-dotted line) (colour figure online)

Table 3 The number (ndense) of dense rows in \(A_T\) and norm of the constraints residual \(\left\| r_c \right\| \) for two values of the pivoting parameter \(\tau \)

This computes a Cholesky factorization of the normal matrix corresponding to the sparse part of \(A_T\) and uses it as a preconditioner within a conjugate gradient (CG) method; the CG convergence tolerance that measures the relative decrease of the transformed residual \(\Vert A_T^T\,r\Vert /\Vert r\Vert \) is set to \(10^{-11}\). For the problems in the top half of the table for which the rows of C are much denser than those of A (recall Table 1), reducing \(\tau \) leads to only a small reduction in the number ndense of dense rows in \(A_T\). However, when the constraints are not dense (the problems in the lower half of the table), ndense can be significantly decreased by choosing \(\tau < 1\), although if \(\tau \) is too small, the matrix \(C_1\) computed by Algorithm 2 can become highly ill-conditioned and \(A_T\) close to being singular. In our experiments we occasionally observed this for \(\tau < 10^{-5}\).

By comparing the pairs of problems in the lower half of the table (such as deter3_20 and deter3_5) and considering the plots in Fig. 3, we see that increasing the number p of constraints can lead to a sharp increase in ndense (even if these constraints are relatively sparse), which can result in the transformed problem being hard to solve. The constraints are very well satisfied in all the test cases, making this an attractive approach if a good sparse-dense LS solver is available and the number of dense rows in the transformed problem is not too large. Furthermore, it can be used, without modification, if the matrix A contains a (small) number of dense rows. However, for a sequence of problems, if A and/or C changes then, because direct elimination couples the two matrices, the computation must be completely restarted.

4 Approaches described via augmented systems

We now focus on complementary approaches that are based on substitution from the unconstrained least squares problem into the constraints. A useful way to describe this is via the augmented (or saddle-point) system

$$\begin{aligned} \begin{pmatrix} H\, &{} C^T \\ C &{} 0 \end{pmatrix} \begin{pmatrix} x \\ \lambda \end{pmatrix} = \begin{pmatrix} A^T \,b \\ d \end{pmatrix}, \qquad H = A^T\,A. \end{aligned}$$
(4.1)

Here \(\lambda \in {{\mathbb {R}}}^{p}\) is a vector of additional variables that are often called Lagrange multipliers [18, 22]. The solution x of (4.1) solves the LSE problem. Using (4.1) can be particularly useful if C is dense and p is small. As we see in the following discussions, this is because the work involved in the proposed algorithms that depends upon p is effectively independent of the density of C. Observe that because (4.1) has a zero (2, 2) block, the augmented system can be also used to give an alternative derivation of the null-space approach of Algorithm 1. For if Z is such that \(C\,Z = 0\) and \(x_1\) is a particular solution of the second equation of (4.1) so that \(Cx_1 = d\) (steps 1 and 2 of Algorithm 1), then if \(x = x_1 + {\hat{x}}\), (4.1) becomes

$$\begin{aligned} \begin{pmatrix} H\, &{} C^T \\ C &{} 0 \end{pmatrix} \begin{pmatrix} {\hat{x}} \\ \lambda \end{pmatrix} = \begin{pmatrix} A^T (b- A\,x_1) \\ 0 \end{pmatrix}. \end{aligned}$$

The second equation in this system is equivalent to finding \(x_2\) such that \({\hat{x}} = Z\,x_2\). Substituting this into the first equation gives \(H\,Z\,x_2 + C^T \lambda = A^T (b- A\,x_1)\), and premultiplying by \(Z^T\) (using \(C\,Z = 0\)) yields \(Z^TH\,Z\,x_2 = (A\,Z)^T (b- A\,x_1)\), as in Algorithm 1.

4.1 Direct use of Lagrange multipliers

Algorithm 3 presents a straightforward updating scheme for solving the LSE problem using Lagrange multipliers and (4.1). Any appropriate direct or iterative method can be used for Step 1, which is usually the most expensive part of the computation.

Algorithm 3 (Solution of the LSE problem via Lagrange multipliers and updating). Step 1: solve the unconstrained LS problem \(\min _y \Vert A\,y - b\Vert ^2\). Step 2: solve \(H\,W = C^T\) (a block of p right-hand sides). Step 3: form the \(p \times p\) matrix \(C\,W\). Step 4: solve \(C\,W\,\lambda = C\,y - d\). Step 5: set \(x = y - W\,\lambda \).

Step 1 has no dependence on C, so the solution y does not need to be recomputed when C changes. The method used to solve the system with a block of p right-hand sides in Step 2 can be chosen to exploit Step 1. For example, a sparse Cholesky factorization of H may be computed in Step 1 and the factors reused in Step 2. Using existing sparse LS solvers (and a dense linear solver for the \(p \times p\) system at Step 4), Algorithm 3 is straightforward to implement, and the solution y of the unconstrained LS problem obtained from Step 1 can be compared with the LSE solution computed in Step 5.
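As an illustration, a dense Python sketch of this updating scheme using a Cholesky factorization of H is given below. It is our own prototype (not the Fortran implementations discussed later) and assumes A has full column rank so that H is positive definite.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def lse_lagrange_update(A, C, b, d):
    H = A.T @ A
    factor = cho_factor(H)                      # Step 1: factorize H once ...
    y = cho_solve(factor, A.T @ b)              # ... and solve the unconstrained LS problem
    W = cho_solve(factor, C.T)                  # Step 2: H W = C^T (p right-hand sides, factors reused)
    lam = np.linalg.solve(C @ W, C @ y - d)     # Steps 3-4: p x p system for the multipliers
    return y - W @ lam                          # Step 5: constrained solution x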

As discussed by Golub [17] and Heath [22], a numerically superior direct method that avoids both forming the potentially ill-conditioned normal matrix H and computing the multipliers \(\lambda \) can be derived using a QR factorization of A. Following [43], we obtain Algorithm 4. Here P is a permutation matrix chosen to ensure sparsity of the R factor. Note that, unless b (and hence f) changes, the Q factor need not be retained and the R factor can be reused if the constraints change but A is fixed.

Algorithm 4 (Solution of the LSE problem via a sparse QR factorization of A with updating; neither the normal matrix H nor the Lagrange multipliers \(\lambda \) are formed explicitly).
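A dense Python sketch of this kind of QR-based updating follows. The identification \(K = C\,P\,R^{-1}\) (which we believe is the quantity denoted K in the discussion of Algorithm 5 below) and the precise ordering of the steps are our assumptions, included only to illustrate the idea; a sparse QR factorization would be used in practice.

import numpy as np
from scipy.linalg import qr, solve_triangular

def lse_qr_update(A, C, b, d):
    # Assumes A has full column rank so that R is nonsingular.
    m, n = A.shape
    Q, R, perm = qr(A, mode='economic', pivoting=True)   # A P = Q R (column pivoting for sparsity of R)
    f = Q.T @ b
    K = solve_triangular(R, C[:, perm].T, trans='T').T   # K = C P R^{-1}; the constraint reads K u = d
    u = f + K.T @ np.linalg.solve(K @ K.T, d - K @ f)    # closest point to f satisfying K u = d
    x = np.empty(n)
    x[perm] = solve_triangular(R, u)                     # recover x from R P^T x = u
    return x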

Results for Algorithm 4 presented in Table 4 confirm that the computed solution is such that the norm of the constraints residual \(\Vert r_c\Vert = \Vert d - C\,x\Vert \) is small. We omit results for problems such as deter_5 that have \(p=5\) constraints because they are similar (with \(\Vert r_c\Vert \) typically smaller than for the corresponding problems with \(p=20\)).

Table 4 Norm of the constraint residuals \(\Vert r_c\Vert \) for QR with updating (Algorithm 4)

4.2 An extended augmented system approach

An equivalent formulation of (4.1) is given by the 3-block saddle-point system (the first order optimality conditions)

$$\begin{aligned} {{{\mathscr {A}}}}_{aug}\, y= b_{aug}, \end{aligned}$$

where

$$\begin{aligned} {{{\mathscr {A}}}}_{aug} = \begin{pmatrix} I &{} 0 &{} A \\ 0 &{} 0 &{} C \\ A^T &{} C^T &{} 0 \end{pmatrix}, \qquad y = \begin{pmatrix} r \\ -\lambda \\ x \end{pmatrix}, \qquad b_{aug}= \begin{pmatrix} b \\ d \\ 0 \end{pmatrix}. \end{aligned}$$
(4.2)

Applying the analysis of Section 5 of [43] to this problem yields Algorithm 5. In exact arithmetic, the main difference between the work required by Algorithms 4 and 5 is that the former involves an additional solve with \(RP^T\). For both algorithms, K is independent of b and d.

Algorithm 5 (Solution of the LSE problem via the 3-block augmented system (4.2), following the analysis of Section 5 of [43]).

4.3 Augmented regularized normal equations

The next approach weights the constraints and uses a regularization parameter within an augmented system formulation and then aims to balance these two modifications. Consider the weighted least squares problem (WLS)

$$\begin{aligned} \min _{x_{\gamma }}\left\| A_{\gamma } \,x_{\gamma } - b_{\gamma } \right\| ^2 \;\; \text{ with } \;\; A_{\gamma } = \begin{pmatrix} A \\ \gamma \, C \end{pmatrix}, \;\; b_{\gamma } = \begin{pmatrix} b \\ \gamma \, d \end{pmatrix}, \end{aligned}$$
(4.3)

for some large \(\gamma \) (\(\gamma \gg 1\)). Let \(x_{LSE}\) be the solution of the LSE problem (1.1)–(1.2). Then because

$$\begin{aligned} \lim _{\gamma \rightarrow \infty } x_{\gamma } = x_{LSE}, \end{aligned}$$

the WLS problem can be used to solve the LSE problem approximately [28]. An obvious solution method is to solve the normal equations for (4.3):

$$\begin{aligned} H_\gamma \,x = A_\gamma ^T \,A_\gamma \,x = (A^T\,A + \gamma ^2\, C^TC)\, x = A^T\, b + \gamma ^2 \,C^T d= A_\gamma ^T \,b_\gamma . \end{aligned}$$

The appeal is that no special methods are required: software for solving standard normal equations can be used. However, for very large values of \(\gamma \), the normal matrix \(H_\gamma \) becomes extremely ill-conditioned; this is discussed in Section 4 of [9], where it is shown that the method of normal equations can break down if \(\gamma > \epsilon ^{-1/2}\) (\(\epsilon \) is the machine precision). Furthermore, if C contains dense rows then \(H_\gamma \) will be dense.
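For reference, the weighting approach amounts to nothing more than stacking the scaled constraints on top of A, as in the following Python sketch. Note that the breakdown caveat above concerns the normal equations; the SVD-based lstsq call is used here only to illustrate the weighted formulation (4.3).

import numpy as np

def lse_weighting(A, C, b, d, gamma=1.0e8):
    A_g = np.vstack([A, gamma * C])                # A_gamma in (4.3)
    b_g = np.concatenate([b, gamma * d])           # b_gamma in (4.3)
    x, *_ = np.linalg.lstsq(A_g, b_g, rcond=None)  # x_gamma -> x_LSE as gamma -> infinity
    return x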

Another possibility is to use the regularized normal equations

$$\begin{aligned} (H_\gamma + \omega ^2 I) \,x = A_\gamma ^T \,b_\gamma , \end{aligned}$$
(4.4)

where \(\omega > 0\) is a regularization parameter [49]. Solving (4.4) is equivalent to solving the \((m + p +n) \times (m + p +n)\) augmented regularized normal equations

$$\begin{aligned} {{{\mathscr {A}}}}(\omega ,\gamma ) \begin{pmatrix} y \\ x \end{pmatrix} = \begin{pmatrix} b_\gamma \\ 0 \end{pmatrix}, \qquad {{{\mathscr {A}}}}(\omega ,\gamma ) = \begin{pmatrix} \omega I &{} A_\gamma \\ A_\gamma ^T \;&{} -\omega I \end{pmatrix}, \end{aligned}$$
(4.5)

where \(y = \omega ^{-1}(b_\gamma - A_\gamma \,x)\in {{\mathbb {R}}}^{m+p}\). The spectral condition number of (4.5) is

$$\begin{aligned} \text {cond}({{{\mathscr {A}}}}(\omega ,\gamma )) = \sqrt{\text {cond}(H_\gamma + \omega ^2 \,I)} \end{aligned}$$

and Saunders [38] shows that \(\text {cond}({{{\mathscr {A}}}}(\omega ,\gamma )) \approx \Vert A_{\gamma }\Vert /\omega \) regardless of the condition of \(A_{\gamma }\). Thus using (4.5) potentially gives a significantly more accurate approximation to the pseudo solution \(x = A_\gamma ^+ \,b_\gamma \) (where \((.)^+\) denotes the Moore-Penrose pseudoinverse of a matrix) compared to the approximation provided by solving (4.4). In [48], the parameters are set to \(\omega = 10^{-q}\) and \(\gamma = 10^q\), where

$$\begin{aligned}q = \min \{ k: 10^{-2k} \le \nu ^{-t} \}. \end{aligned}$$

Here t-digit floating-point arithmetic with base \(\nu \) is used.

Rewriting (4.5) using (4.3) and a conformal partitioning of y gives

$$\begin{aligned} \begin{pmatrix} \omega I &{} 0 &{} A \\ 0 &{} \omega I &{} \gamma C \\ A^T \,&{} \gamma C^T \,&{} -\omega I \end{pmatrix} \begin{pmatrix} y_s \\ y_c \\ x \end{pmatrix} = \begin{pmatrix} b \\ \gamma d \\ 0 \end{pmatrix}. \end{aligned}$$
(4.6)

This system can be solved as in [43] using a modified version of Algorithm 5. Alternatively, eliminating \(y_s\) and setting \(\omega \gamma = 1\) yields

$$\begin{aligned} \begin{pmatrix} -H(\omega ) &{} C^T \\ C \;&{} \omega ^2 I \end{pmatrix} \begin{pmatrix} x \\ y_c \end{pmatrix} = \begin{pmatrix} -A^T\,b \\ d \end{pmatrix}, \qquad H(\omega ) = A^T\,A + \omega ^2 \,I. \end{aligned}$$
(4.7)

We can solve this system using a QR factorization of \(\begin{pmatrix}A \\ \omega I \end{pmatrix}\) and modifying Algorithm 4. Or, ignoring the block structure, we can treat it as a sparse symmetric indefinite linear system and compute an \(LDL^T\) factorization (with L unit lower triangular and D block diagonal with blocks of size 1 and 2) using a sparse direct solver such as HSL_MA97 [24] that incorporates pivoting for stability with a sparsity-preserving ordering. This factorization would have to be recomputed for each new set of constraints. Alternatively, a block signed Cholesky factorization of (4.7) can be used, that is,

$$\begin{aligned} \begin{pmatrix} -H(\omega ) &{} C^T \\ C \;&{} \omega ^2 I \end{pmatrix} = \begin{pmatrix} L &{} \\ B\; &{} \;L_{\omega } \end{pmatrix} \begin{pmatrix} -I &{} \\ &{} \;I \end{pmatrix} \begin{pmatrix} L^T \;&{} B^T \\ &{} L_{\omega }^T \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} H(\omega ) = L\,L^T, \quad L\,B^T = -C^T \quad \text{ and } \quad S = \omega ^2 I + B\,B^T = L_{\omega }\,L_{\omega }^T. \end{aligned}$$

We then obtain Algorithm 6. Note that B need not be computed explicitly. Rather, the Schur complement S may be computed as \(\omega ^2 I + C\,L^{-T}L^{-1}C^T\), a product \(w=B\,z\) may be computed by solving \(L^T\,v = z\) and then setting \(w=-C\,v\), and \(w= -B^T\,y_c\) may be obtained by solving \(L\,w = C^T\,y_c\).

Algorithm 6 (Solution of the regularized augmented system (4.7) via the block signed Cholesky factorization above).
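A dense Python sketch of the resulting solution process is given below (our own illustration; in practice L is a sparse Cholesky factor and B is applied implicitly as described above).

import numpy as np
from scipy.linalg import cholesky, solve_triangular

def lse_regularized(A, C, b, d, omega=1.0e-8):
    n = A.shape[1]
    p = C.shape[0]
    H_omega = A.T @ A + omega**2 * np.eye(n)
    L = cholesky(H_omega, lower=True)                      # H(omega) = L L^T
    Bt = -solve_triangular(L, C.T, lower=True)             # B^T from L B^T = -C^T
    S = omega**2 * np.eye(p) + Bt.T @ Bt                   # Schur complement omega^2 I + B B^T
    u = solve_triangular(L, A.T @ b, lower=True)
    y_c = np.linalg.solve(S, d + Bt.T @ u)                 # S y_c = d + B u
    x = solve_triangular(L.T, u - Bt @ y_c)                # L^T x = u - B^T y_c
    return x, y_c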

Results for Algorithm 6 for three of our test problems using a range of values of \(\omega \) are given in Table 5. Note that here \(\Vert r_c\Vert \) is computed using \(r_c = d - C \,x\) (rather than recovering it from \(y_c\) via \(r_c = \omega ^2\, y_c\)). We see that, provided \(\omega \) is sufficiently small, the values of \(\Vert x\Vert \) and \(\Vert r\Vert \) are consistent with those given in Table 1.

By replacing the Cholesky factorization of \(H(\omega )\) by an incomplete factorization \(H(\omega ) \approx {\tilde{L}}\,{\tilde{L}}^T\), we can obtain a preconditioner for solving (4.7). In particular, the right-preconditioned system is

$$\begin{aligned} \begin{pmatrix} -H(\omega ) &{} C^T \\ C\; &{} \omega ^2 I \end{pmatrix} M^{-1}\begin{pmatrix} w \\ w_c \end{pmatrix} = \begin{pmatrix} -A^T \,b \\ d \end{pmatrix}, \qquad M\begin{pmatrix} x \\ y_c \end{pmatrix} = \begin{pmatrix} w \\ w_c \end{pmatrix}, \end{aligned}$$
(4.8)

and we can take the preconditioner in factored form to be

$$\begin{aligned} M = \begin{pmatrix} {\tilde{L}} \\ {\tilde{B}} \;&{} \;\;\;I \end{pmatrix} \begin{pmatrix} -I &{} \\ &{} \; {\tilde{S}}_d \end{pmatrix} \begin{pmatrix} {\tilde{L}}^T \;&{} {\tilde{B}}^T \\ &{} I \end{pmatrix}, \end{aligned}$$
(4.9)

with

$$\begin{aligned} {\tilde{L}}\,{\tilde{B}}^T = -C^T \quad \text{ and } \quad {\tilde{S}} = \omega ^2 I + {\tilde{B}} \,{\tilde{B}}^T. \end{aligned}$$

As the preconditioner (4.9) is indefinite, it needs to be used with a general nonsymmetric iterative method such as GMRES [37]. A positive definite preconditioner for use with MINRES [33] can be obtained by replacing \(-I\) in (4.9) by I. MINRES has the important advantage of only requiring three vectors of length equal to the size of the linear system. GMRES results are included in Table 5. The GMRES convergence tolerance is taken to be \(10^{-11}\). We see that the GMRES iteration count is essentially independent of \(\omega \). We also ran MINRES with the same settings. For problems sctap1-2r, south31 and deter3_20 with \(\omega = 10^{-5}\) the counts were 17, 772 and 56 (approximately twice the GMRES counts). This would be of more interest if all counts were higher.

Table 5 Results for the augmented regularized normal equations approach (Algorithm 6) for problems sctap1-2r, south31, and deter3_20 using a range of values of \(\omega \). iters is the number of preconditioned GMRES iterations. The computed \(\Vert x\Vert \) and \(\Vert r\Vert \) are consistent for both approaches
Table 6 Convergence results for problems sctap1-2r with \(\omega =1.0\times 10^{-8}\) and stormg2-8_20 with \(\omega =1.0\times 10^{-6}\). tol and iters are the convergence tolerance and the iteration count for GMRES

Our findings in Sect. 4 suggest that, if we require the constraints to be satisfied with a small residual, then an augmented system based approach combined with a QR factorization performs better (in terms of \(\Vert r_c\Vert \)) than combining it with regularization and a Cholesky factorization. Unfortunately, QR factorizations are more expensive and, while strategies for computing incomplete orthogonal factorizations for use in building preconditioners have been proposed (see, for instance, [2, 3, 4, 27, 30, 34, 47]), the only available software is the MIQR package of Li and Saad [30] (probably because developing high quality implementations is non-trivial). In their study of preconditioners for LS problems, Gould and Scott [19, 20] found that MIQR generally performed less well than incomplete Cholesky factorization preconditioners, and so it is not considered here.

We have made the implicit assumption that A is sparse. However, it is straightforward to extend the augmented system-based approaches to the more general case that A contains rows that are dense. For example, if A is permuted and partitioned as

$$\begin{aligned} A = \begin{pmatrix} A_1 \\ A_2 \end{pmatrix}, \end{aligned}$$

where \(A_1\) is sparse and \(A_2\) is dense, then using a conformal partitioning of \(y_s\) and of b, (4.7) can be replaced by the augmented system

$$\begin{aligned} \begin{pmatrix} -H_{1}(\omega ) &{} C_d^T \\ C_d \;&{} \omega ^2 I \end{pmatrix} \begin{pmatrix} x \\ y_d \end{pmatrix} = \begin{pmatrix} -A_1^T\,b_1 \\ d \end{pmatrix} \end{aligned}$$

with

$$\begin{aligned} \quad H_{1}(\omega ) = A_1^T\,A_1 + \omega ^2 I, \quad y_d = \begin{pmatrix} y_c \\ y_2 \end{pmatrix}, \quad C_d = \begin{pmatrix} C \\ \omega \,A_2 \end{pmatrix}, \quad d = \begin{pmatrix} d \\ \omega \,b_2 \end{pmatrix}. \end{aligned}$$

Finally, we remark that, if we use the 3-block form (4.6) then we can follow [43], which in turn generalises the work of Carson, Higham and Pranesh [11], and obtain an augmented system approach with multi-precision refinement. This has the potential to reduce the computational cost in terms of time and/or memory, thus allowing larger problems to be solved.

5 Conclusions

We have considered a number of approaches for solving large-scale LSE problems in which the constraints may be dense. Our main findings can be summarized as follows:

  • The classical null-space method relies on computing a null-space basis matrix Z for the “wide” constraint matrix C such that \(Z^T A^T \,A \,Z\) is sparse. In recent work [44], we proposed how this can be achieved using a method based on a QR factorization of C with threshold pivoting. This is not straightforward to implement. Furthermore, our numerical experiments show that, in some cases, the norm \(\Vert r_c\Vert \) of the constraints residual can be larger than for other approaches considered in this study. Thus, although in some contexts null-space approaches are popular, we do not recommend the strategy of [44] for LSE problems.

  • The direct elimination approach couples the constraint matrix and the LS matrix, leading to a sparse-dense transformed least squares problem. Existing direct or iterative methods can be used to solve the transformed problem, and our experiments found that the computed constraint residuals are small. The approach can be used for problems for which A (as well as C) contains a small number of dense rows. A weakness is that, when solving a sequence of problems in which only one of A and C changes, the coupling of the two blocks in the solution process means that the computation must be restarted from scratch. Furthermore, the number of dense rows in the transformed problem can be relatively large, making it expensive to solve.

  • There are several options for using an augmented system formulation. This can be solved using standard building blocks, such as a sparse QR factorization, a sparse symmetric indefinite linear solver, or a block sparse Cholesky factorization. An attraction of each of these is that existing “black box” solvers can be exploited, thereby greatly reducing the effort required in developing robust and efficient implementations. The augmented system formulation can be generalised to handle dense rows in A and offers the potential for mixed-precision computation. Moreover, an incomplete Cholesky factorization can be used as a preconditioner with a Krylov subspace solver.

  • In the case of a series of LSE problems in which only the constraints change, both the null-space and direct elimination approaches have the disadvantage that the computation must be redone for each new set of constraints. For the augmented system approaches, a significant amount of work can be reused from the first problem in the sequence when solving subsequent problems.

Finally, we observe that there is a lack of iterative methods and preconditioners that can be used to extend the size of LSE problems that can be solved. We have shown that using an incomplete factorization within a block factorization of an augmented system can be effective, but most current incomplete factorizations that result in efficient preconditioners are serial in nature and not able to tackle extremely large problems (but see [1, 25] for novel approaches that are designed to exploit parallelism). Addressing the lack of iterative approaches is a challenging subject for future work.