A null-space approach for large-scale symmetric saddle point systems with a small and non zero (2, 2) block

Null-space methods have long been used to solve large sparse n × n symmetric saddle point systems of equations in which the (2, 2) block is zero. This paper focuses on the case where the (1, 1) block is ill conditioned or rank deficient and the k × k (2, 2) block is non zero and small (k ≪ n). Additionally, the (2, 1) block may be rank deficient. Such systems arise in a range of practical applications. A novel null-space approach is proposed that transforms the system matrix into a nicer symmetric saddle point matrix of order n that has a non zero (2, 2) block of order at most 2k and, importantly, the (1, 1) block is symmetric positive definite. Success of any null-space approach depends on constructing a suitable null-space basis. We propose methods for wide matrices having far fewer rows than columns with the aim of balancing stability of the transformed saddle point matrix with preserving sparsity in the (1, 1) block. Linear least squares problems that contain a small number of dense rows are an important motivation and are used to illustrate our ideas and to explore their potential for solving large-scale systems.


Introduction
Our interest is in solving symmetric saddle point systems of equations of the form in which H ∈ R n×n is large, sparse and symmetric positive semidefinite (SPSD), B ∈ R k×n (n > k) and C ∈ R k×k is SPSD. Our focus is on k n (the saddle point matrix may then be referred to as a "bordered" matrix) and the "wide" matrix B may contain one or more dense rows. Such systems arise in a range of scientific applications, including the finite element approximation of PDEs in which a constraint is enforced on a subset of unknowns via a global scalar Lagrange multiplier, numerical continuation methods for large nonlinear systems of equations, electronic circuit simulation, constrained optimization, and linear least squares problems (see the paper of Benzi, Golub and Liesen [1], which describes a wide range of real life problems that are dependent on the solution of saddle point systems and see also [2,3]). We are particularly concerned with large-scale least squares problems that have a few dense rows. They can be expressed in the form (1) and provide an important motivation for the null-space approach proposed in this paper. In this application, the (1, 1) block can be ill conditioned or rank deficient.
There has been considerable work over many years and in different research areas into null-space methods for saddle point problems with a zero (2,2) block. The basic idea is to characterize the null space of the constraint (off-diagonal) blocks and use that characterization to reduce the system to two smaller linear systems of order (n − k) × (n − k) and k × k that have nice properties and are thus straightforward to solve. Consequently, applications of null-space methods are usually geared towards problems for which n − k is small. An attractive feature of null-space methods that widens their applicability is that the (1, 1) block H need not be invertible. However, a key problem is the need to construct a null-space basis N (B) for the matrix B. In general, this is challenging, particularly if it is desirable that the matrix whose columns form a basis for N (B) has additional properties, such as bandedness, sparsity or orthogonality. Moreover, the classical null-space method is restricted to C = 0.
The objectives of this paper are twofold. Firstly, we present a null-space based approach for symmetric saddle point systems in which the (1, 1) block is SPSD and the small k×k block C is non zero. The new approach preserves symmetry and results in a transformed symmetric saddle point system of the same order but with favourable numerical properties, namely a sparse symmetric positive definite (SPD) (1, 1) block and a non zero (2, 2) block of size k + r, where r ≤ k is the rank of B. Having an SPD (1, 1) block allows the use of block (possibly incomplete) factorization methods to be used to solve the system. Secondly, we propose techniques for constructing a null-space basis N (B) when k is small, with an emphasis on balancing sparsity in the transformed problem with stability through the use of QR factorizations with threshold pivoting. Novel contributions are the underlying theory that we present in Section 2 (Theorem 1 and the transformation of the sparse-dense least squares problem described by Lemma 2) as well as the main algorithmic approach based on the threshold QR factorization of the off-diagonal block of the transformed saddlepoint system. While the composite approaches we propose for constructing the nullspace bases are a straightforward generalization of the idea of Howell [3], the pivoted algorithmic approach based on right oblique conjugation is new, building on work done in the 1990s by Benzi, Meyer and Tůma [4][5][6].
This paper is organized as follows. In Section 2, we first recall the standard nullspace approach for saddle point systems with a zero (2,2) block and the recent method of Howell [3] that allows for a non zero C but sacrifices symmetry. We then propose a new symmetry preserving null-space approach for the case of a non zero C and discuss how it can be used to solve large-scale linear least squares problems that have a small number of dense rows. Null basis construction is considered in Section 3. A brief review of methods for building null-space bases for sparse matrices is given before we focus on the case of interest to us: null-space bases for wide matrices that may include some dense rows. A number of approaches are proposed, with an emphasis on ensuring sparsity within the transformed saddle point system while also guaranteeing numerical stability through the use of pivoting. In Section 4, numerical examples from practical applications demonstrate the effectiveness of the approaches and illustrate their potential strengths and limitations. Finally, some concluding remarks and pointers to future research directions area given in Section 5.

Test environment
The numerical experiments reported on in this paper are performed on an Intel(R) Core(TM) i5-4590 CPU running at 3.30 GHz with 12 GB of internal memory. The software for the null-space computations is written in Fortran and compiled using GNU Fortran 6.3.0. The algorithms we present are complex to implement and the software developed for our experiments is not optimized for efficiency, but is written for robustness at the expense of speed. Having obtained a null-space basis, we employ the sparse direct solver HSL MA87 [7] from the HSL mathematical software library (available at http://www.hsl.rl.ac.uk) to compute the sparse Cholesky factorization within the block matrix factorization. This package is also written in Fortran. The experimental results are processed by MATLAB R2020a. In particular, the presented condition numbers are computed using the MATLAB 1-norm condition number estimator condest.

Notation
We end this section by introducing notation that we use throughout. Let x ∈ R n be a vector. We denote the i-th entry of x by (x) i and (x) j :l denotes entries j to l of x (1 ≤ j < l ≤ n). x ⊥ denotes the (n − 1)-dimensional space of all vectors w ∈ R n such that x T w = 0. e i denotes the i-th unit vector. Let B ∈ R k×n be a matrix. The denote the null space and range space of B, respectively. Z is used to denote a nullspace basis matrix (that is, its columns form a null-space basis). P (with or without a subscript and/or superscript) is used for a permutation matrix. I k denotes the k × k identity matrix and 0 n,k denotes the n × k null matrix.
2 Null-space approach for solving saddle point systems

Null-space approach for zero C
Consider the case that B is of full rank and C = 0 k,k , that is, Such systems frequently arise in practical applications, particularly when solving constrained optimization problems (where they are usually referred to as reduced Hessian methods); a null-space approach is one possible method of solution (see, for example, [1, Section 6]). Assume we have a matrix Z ∈ R n×(n−k) whose columns form a basis for N (B) i.e., BZ = 0 k,n , and that we have a particular solution for the second equation in (2), that is, a vectorû such that Then solving (2) is equivalent to solving The second equation in this system is equivalent to finding a vector z ∈ R n−k such thatū = Zz. Substituting this into the first equation gives Therefore, by solving the reduced system (3), it is possible to straightforwardly recover u =û+Zz. Finally, v can be obtained by solving the reduced k×k symmetric positive definite system The null-space method for solving (2) is summarized as Algorithm 1.
A recent study of null-space factorizations for saddle point systems of the form (2) has been given by Rees and Scott [8] and the advantages and disadvantages of null-space methods are summarized in [9]. They can be very efficient for solving a series of problems with the same block B but different H because the null-space basis N (B) needs to be computed only once. A further key advantage is that H −1 is not required. In fact, the method is applicable if H is singular, provided R(H )∩N (B) = {0}. Furthermore, the Schur complement is not needed. The null-space approach can also be useful when additional rows (dense or sparse) are added to the system matrix a posteriori [10]. However, a potential difficulty is the need to construct a null-space basis matrix Z and, even when Z can be computed efficiently, some columns of Z may contain a large number of entries so that Z T H Z is not sparse.
The null-space method can be generalized to unsymmetric saddle point systems with a zero (2, 2) block in which H is unsymmetric and the full rank (1, 2) and (2, 1) blocks are B T 1 and B 2 with B 1 = B 2 . In this case,û is required such that B 2û = g and matrices Z 1 and Z 2 ∈ R n×(n−k) whose columns form a basis for the null-spaces N (B 1 ) and N (B 2 ), respectively, must be constructed.

Symmetry-preserving null-space approach for non zero C
As Golub et al. observe [1,Section 6], the null-space method cannot be applied to solve saddle point systems with C = 0 k,k . In a recent paper, Howell [3] proposes what he terms a one-sided application of the null-space method to solve non-singular generalized saddle point systems of the form in which H and C = 0 k,k are possibly unsymmetric, B 1 ∈ R k×n and B 2 ∈ R k×n are of full row rank and either B 1 and/or B 2 is dense. Howell introduces an auxiliary variable w that is used to replace the (n + k) × (n + k) system (4) by a larger (n + 2k) × (n + 2k) system Interchanging the second and third equations leads to another unsymmetric saddle point system Because the coefficient matrix in (6) is non singular with a zero (2, 2) block, the nullspace method can be applied. Howell provides further details and presents results for a number of practical applications that have a small number of dense rows and columns. However, a major problem with this approach is that it fails to preserve symmetry. If B 1 = B 2 and H and C are symmetric, the symmetry of the original problem is lost because of the swapping of the second and third equations in (5); it leads to the need to solve an n × n unsymmetric system of the form Z T 1ĤẐ 2v =ĝ, whereẐ j is a null-space basis matrix forB j (j = 1, 2). Howell does not comment on this aspect and the use of MATLAB backslash for solving linear systems within his numerical experiments obscures the fact that a symmetric problem has been traded for a larger unsymmetric one.
Our interest is in developing a null-space approach for (1) when C = 0 k,k and k is small that retains symmetry and leads to a transformed saddle point system of the same dimension n + k that has favourable numerical properties, specifically, a symmetric positive definite (1, 1) block and a (2, 2) block of dimension at most 2k. We consider the general case that both H and C are only SPSD and B may not have full row rank. Lemma 1 gives sufficient conditions for A to be nonsingular. This extends results from [1]; see also [11].

Lemma 1. Let
where H ∈ R n×n and C ∈ R k×k are SPSD matrices, and the rectangular matrix B ∈ R k×n is not necessarily of full row rank.
Proof Consider a vector w = u T v T such that Aw = 0. This means that H u + B T v = 0 and Bu − Cv = 0. Consequently, w T Aw = 0 implies u T H u + v T Cv = 0.
Because H and C are SPSD, u T H u = 0 and v T Cv = 0 and it follows that H u = 0 and Cv = 0 [12]. Consequently, B T v = 0 and Bu = 0. The null-space assumptions then imply that w = 0 and hence A is nonsingular.
The following theorem presents our proposed transformation without B needing to be of full row rank. Proof Because the saddle-point problem satisfies the assumptions of Lemma 1, the system matrix A is nonsingular. Let E be the n × n matrix of full rank given by where Z ∈ R n×(n−r) has full column rank and is such Set E to be the (n + k) × (n + k) matrix Then the transformed saddle point matrix is symmetric and non singular. We can rewrite this transformation as is the SPD leading principal submatrix of E T H E of order n − r, and

The transformed system becomes
whereũ ∈ R n−r andṽ ∈ R r+k and To solve (11), consider a conformal partitioning of the vectors u and f . Onceũ and v have been computed, the solution of the original problem (1) is given by

Remark 1. The claim in Theorem 1 that the transformed saddle point matrix
A given by (9) is symmetric and nonsingular cannot be strengthened to it being quasi definite, even when B has full row rank. The problem is that the submatrix (E T H E) n−r+1:n,n−r+1:n is then only symmetric.
The (1, 1) block H in the transformed system (11) is SPD. This allows block factorizations to be employed, which are useful not only for direct methods but also in the construction of preconditioners for use with iterative methods. For example, using the Cholesky factorization H = L 1 L T 1 yields where where L S is unit lower triangular and D S is block diagonal with 1 × 1 and 2 × 2 blocks.

An application: sparse-dense least squares problems
Consider the linear least-squares (LS) problem where the system matrix A ∈ R m×n (m ≥ n) and the right-hand side vector b ∈ R m are given. The solution x satisfies the n × n normal equations where, provided A has full column rank, the normal matrix C is symmetric positive definite. Our interest lies in the case where A is large and mainly sparse and includes a relatively small number of rows that are regarded as dense. These rows may be fully dense or have significantly more entries compared to the other rows of A or may contain far fewer entries than n but nevertheless lead to large amounts of fill in C. The presence of such rows has long been recognized as a fundamental difficulty in the solution of large LS problems; see, for example, [13] for a discussion and references to past work on this problem. Assuming the rows of A that are to be treated as dense have been permuted to the end and assuming a conformal partitioning of the vector b (and omitting the row permutation matrix for simplicity), we have where m s denotes the number of sparse rows of A, and m d is the number of dense rows, with m = m s + m d , m s ≥ n > m d ≥ 1 and m d m s . The normal equations become where C s = A T s A s is the reduced normal matrix. The solution of (13) can be obtained from the equivalent (n + m d ) × (n + m d ) system This is of the form (1) If A s has full column rank, C s is SPD. But in practice A s may contain one or more null columns, and C s is then singular with a corresponding number of null rows and columns. Even if C s has no null columns, it can be singular or highly ill conditioned. Recent studies have proposed ways of circumventing the problem of null columns in A s . In [13,14], it was shown how they can be dealt with either by perturbing diagonal entries of C s before applying a direct solver or by solving a number of related sparse LS problems and combining their solutions to give the solution of the original problem. Unfortunately, both methods incur overheads, with the former requiring the sparse direct solver is combined with an iterative solver and the latter needing the solution of a number of LS problems. These disadvantages led us to consider employing matrix stretching [15,16]. The main aim of matrix stretching is to split each of the dense rows (that is, each row of A d ) into a number of sparser rows and to formulate a (larger) modified problem from which the solution to the original problem can be derived. This has the advantage that, provided A is of full rank, the issue of null columns does not arise. However, when A s is very sparse, the dimensions of the stretched system can grow rapidly with m d so that the stretched LS problem can be considerably larger than the original problem. It may also be highly ill conditioned. Thus, we remain interested in developing alternative strategies and this was a key motivation behind the current work on null-space approaches.
The following result shows that, in the case of the full rank LS problem (14), the assumptions of Theorem 1 are satisfied, with r ≤ m d the rank of A d .

Lemma 2. Consider
Thus, provided a basis for N (A d ) can be constructed, the proposed null-space approach offers an alternative way of solving sparse-dense LS problems, even in the case that C s is SPSD.

Null-space basis construction
Solving large-scale saddle point systems using null-space methods leads to the problem of finding null-space bases that preserve sparsity and lead to a stable transformed system. Before looking at null-space basis construction for wide matrices, we give a brief overview of the historical development of techniques for constructing null-space bases of sparse matrices.

Construction of null-space bases of sparse matrices
A null-space basis matrix Z for B ∈ R k×n of rank r > 0 can sometimes be obtained directly by analysing the problem [17] but more advanced methods are normally necessary. In structural analysis, early methods were based on factorizations of B and focused on the sparsity of Z, with emphasis on Z having a banded or skyline sparsity pattern. Computation of Z based on an initial LU factorization of B was proposed by Topçu [18] (see also [19,20]). A specific strategy of this kind, called the turnback method [18,19], attracted interest in the late 1970s/early 1980s (see also an early parallel implementation [21] and its recent use in practice [22]). The turnback (backward looking) method finds null vectors by expressing n − r chosen start columns of B as linear combinations of a small set of previously numbered columns. The initial LU factorization serves to determine these start columns. Linear independence of the columns of Z is guaranteed by not using the leftmost columns in these linear combinations from any set computed for the remaining start columns. An overview and modifications to the basic approach (such as replacing LU by a QR factorization) are described in [23]; see also [24] for further refinements. More recently, factorization approaches for the computation of null-space bases of sparse matrices are discussed in [25,26].
Theoretical and algorithmic breakthroughs based on bipartite graph matchings and matroid theory came with the 1984 thesis of Pothen [27] and subsequent papers. The contributions covered not only new algorithmic approaches but also complexity concepts for the related problem of finding the sparsest null-space basis of B, that is, for finding Z with the smallest possible number of non zero entries (see also [28]). It has been shown that such a basis can be constructed by a greedy algorithm that assembles the basis using a sequential choice of the sparsest vectors belonging to N (B). While this can provide extremely sparse Z, there are two important practical limitations. Firstly, finding the sparsest vectors of a null-space basis for a general constraint matrix is NP-hard. Secondly, sparsity of Z may not be enough and its numerical properties must also be considered. Sophisticated proposals given in [27,29,30] were designed to compute sparse null-space bases of sparse matrices and they led to efficient algorithms for computing so-called fundamental and triangular null-space bases.
Origins of another line of research appear in the 1969 seminal paper of Henderson and Maunder [31] (see also [32]). Subsequently, a number of related algorithms for specific applications have been developed independently. These exploit the graph of B and, in particular, cycles in the graph. The idea is most transparent if B is the vertex-edge incident matrix of some underlying graph. In structural mechanics, cycles of the graph that describes the interconnection of separate substructures of a skeletal structure can be considered. Using the graph cycles, a set of independent null-space vectors that form columns of Z can be computed [31,33]. This construction was developed and discussed in the context of other approaches in [34]. In some partial differential equation applications the cycle approach is relevant because a simple constraint structure may be implied by discretization schemes [35][36][37][38][39]. Many publications discuss using cycles in the graph: pointers to relevant literature can be found in [40,41]. There are heuristic ways of finding the cycle basis, a task that is straightforward if the underlying graph is planar [39]. More generally, construction may be based on a spanning tree of the graph [27]. Procedures to find sparse cycle bases in this way are given in [42] (see also [43,44]). However, discretizations of three-dimensional grids may not allow sparse cycle bases to be found [38]. As with sparse factorization approaches, those based solely on local computations of Z may suffer from ill conditioning.
An interesting motivation is to construct Z using structure, for example, by allowing a permutation to block angular form. Such an approach was introduced in [45]. Closely related proposals to compute Z while explicitly exploiting substructuring, motivated in part by parallel computing, were given in [46,47] (see also [48]). The block form can be obtained algebraically, as in the nested dissection ordering discussed in [44]; see also [45].

Null-space basis for wide matrices
In contrast to much of the previous work on constructing null-space bases, our focus is on wide matrices B ∈ R k×n with k n. B may be sparse but because k is small, we may want to treat B as dense (although the null-space basis construction may still involve sparse operations). As it can happen in practical applications, we allow for B being rank deficient (rank(B) = r < k). We want the nearly-square null-space basis matrix Z ∈ R n×(n−r) to be sparse but we are also concerned about conditioning. Our experiments consider the quality of Z and, as we employ a sparse direct solver to factorize the (1, 1) block Z T H Z given by (10) of the transformed system, we need Z to have no dense rows and be such that Z T H Z is sparse. Orthogonal transformations inevitably lead to dense Z and so, although useful in other situations (see, for instance, [49]), are not appropriate here. Because B is wide, each column of Z can be computed independently and such that each has small support (the entries can be computed as the coefficients of linear combinations of at most k other columns). Small support means that there is always some orthogonality but Z may still be ill conditioned. Indeed, if B is an adjacency matrix of a domain discretized by mixed hybrid finite elements, then the condition number of the null-space basis of the constraints that express the hybridization with small support can grow like h −2 , where h is the discretization parameter [38]. Thus, we propose employing a QR factorization that incorporates pivoting for stability and, because B has only a small number of rows, this is not prohibitively expensive.

Null-space bases with local support
This section discusses constructing (a permutation of) Z by finding linearly dependent sets of columns with close column indices. The coefficients of the linear combination of these columns that sum to zero are the values of the entries in Z. These bases are similar to those belonging to the category of triangular null-space  1 showing B ∈ R 1×5 , P ∈ R 5×5 , the permuted matrix BP and Z computed using Algorithm 2. E satisfies (8) bases [29], but employment of QR factorizations with pivoting leads to Z having a more general sparsity pattern. Our focus is on the numerical qualities of the basis combined with limiting the number of entries through the employment of a threshold pivoting strategy (thus, as is common with sparse matrix factorizations, there is a trade off between stability and sparsity).
We begin with the simple case in which B comprises a single fully dense row (k = 1) and consider two extreme cases for the structure of a sparse null-space basis Z ∈ R n×(n−1) . First, Z can be constructed to have non zeros only in its first row and on the diagonal of Z 2:n,1:n−1 . The entries in row 1 are obtained by dividing entries 2 to n of B by the entry in its (1, 1) position. Column pivoting is needed to ensure that this entry is large. The resulting Z is sparse but Z T H Z is dense and so is not of interest here.
The second extreme case constructs Z to be a permutation of a matrix with a banded structure; it is presented as Algorithm 2 and illustrated in Fig. 1. When B ∈ R 1×n we have Q = I 1 , but we include the Q factor in the algorithm description because it will be needed in subsequent more general algorithms. The R factor is a single row with its first entry β 1 = 0. The computed Y (recall (7)) is sparse, has full rank, is well conditioned because of the column pivoting and belongs to the space of vectors Z ⊥ [8,50]. In particular, BY = 1. We remark that a similar approach was recently proposed by Howell [3, Algorithm 3].

A dominant part of the saddle-point matrix (11) is its (1, 1) block Z T H Z. Its condition number cond(Z T H Z) is bounded by
where Z 0 is an orthogonal null-space matrix (see Lemma 10 in [51]). Consequently, it is desirable for cond(Z T Z) to be small. Constructing Z so that P T Z has a narrow band may not be sufficient to guarantee this, as can be deduced from the following result for the idealized example of B comprising a single row of all ones.

Lemma 3.
Let B ∈ R 1×n be a row vector of all ones and construct Z using Algorithm 2. Then the condition number of Z T Z ∈ R (n−1)×(n−1) is asymptotically of order n 2 .
Proof The permutation P is the identity and the computed Z is bidiagonal with diagonal entries equal to −1 and off diagonal nonzeros equal to 1. Z T Z is a well-known Laplacian matrix with eigenvalues given by Consequently, the largest eigenvalue goes asymptotically to 4 and the smallest one is 4 sin 2 π 2 n ≈ π 2 n 2 . The result follows.
Algorithm 3 considers general B with 1 ≤ k < n rows and rank r such that 0 < r ≤ k. The basic idea is to express columns of Z as linear combinations of previous columns with close indices. r is computed using a QR factorization with column pivoting. The columns of B corresponding to the first r columns of the R factor are permuted to the front and the remaining n−r columns are marked as dependent; they correspond to the columns of Z. They are computed independently while aiming to balance sparsity and numerical stability. As in other sparse numerical linear algebra algorithms that need to combine locality with ensuring stability, this is achieved using threshold column pivoting. The principle behind threshold pivoting is as follows. Assume a threshold parameter θ is given with 0 < θ ≤ 1. At a step of the QR factorization, assume that is the maximum norm of all the columns eligible to be chosen as the next pivot column. While full column pivoting selects the pivot column to be an eligible column with norm , threshold pivoting chooses the pivot column to be the eligible column with norm at least θ and such that its index is closest to the index of the first (starting) column of the factorized matrix. In Algorithm 3, we perform QR factorizations repeatedly in the loop commencing at Line 3 for a set of n − r starting columns. The initial permutation P guarantees to find r and determine the n − r starting columns, that is, the columns that are expressed in the loop as linear combinations of previous columns in the order determined by P . The local permutation P L reverses the column order so that threshold column pivoting chooses columns closest to the starting column first. The coefficients are then obtained by performing QR factorizations (with threshold partial pivoting) of submatrices of BP . While Algorithm 3 is somewhat technical, it essentially combines two steps: it first determines BP and then solves n−r subproblems for the entries of the n−r columns of Z. Permutations ensure the subproblems always find these entries. The matrix E computed by Algorithm 3 satisfies (8).
Incorporating threshold column pivoting in the QR factorizations in Algorithm 3 implies that, provided the parameter θ is not too small, the matrix BY should be reasonably well conditioned. Moreover, it guarantees that any dependent column ( B) :,l ofB can be stably expressed as a linear combination of previous columns. Note that the coefficients of this linear combination are expressed using the inverse ofR. Because each such column is used in the construction of just one column of Z, the columns of Z must be linearly independent. Observe that a null column (B) :,l implies a unit column in Z. Note also that Line 10 is well defined because the permutation P guarantees not only that there is a submatrix of rank r in each X l , l = r + 1, . . . , n, but also that Q T ( B) :,l cannot have non zeros in rows r + 1, . . . , k. Figure 2 shows the basis matrix Z constructed using Algorithm 3 for B ∈ R 2×6 of full rank; θ is set to 0.1. With this threshold choice, it is not necessary to permute column 6 of B (the one with largest norm) to the front and hence P = I and B =B. The loop starting in Line 3 considers the last n − r = 4 columns ofB and expresses each as a linear combination of the closest previous columns, successively accessing

Fundamental null-space basis
We next discuss constructing the so-called fundamental null-space basis, which was first mentioned in an unpublished 1962 paper by Wolfe [52] (see [29]).
Let B ∈ R k×n have full row rank and assume the columns have been permuted so that G = (B) 1:k,1:k is nonsingular. Then the fundamental null-space basis of B = G N is defined to be For rank(B) = r < k, let BP = Q R 1 R 2 0 k−r,r 0 k−r,n−r and P 1 BP 2 = L 1 L 2 I k−r U 1 U 2 0 k−r,r 0 k−r,n−r be the pivoted QR and LU factorizations of B, respectively. The fundamental nullspace basis of B may be expressed as either (17) or For the LU factorization, our experience is that complete pivoting should be used for stability. The biconjugation process of Hestenes [53] applied to a square matrix A ∈ R n×n yields a pair of biconjugate matrices (Ṽ , V ) such thatṼ AV is diagonal. Biconjugate factorizations were discussed in [54] and some practical extensions were given by Benzi [4]; see also the direct projection method [5,6]. Applying biconjugation approximately leads to so-called approximate inverse preconditioners [55]. We define a partial biconjugation factorization applied to B ∈ R k×n with rank(B) = r, 0 < r ≤ k ≤ n, to be a process that yields V ∈ R n×n and a lower trapezoidal L ∈ R k×n satisfying BV = L.
We term this the right oblique conjugation. Assuming r steps of an LDU factorization of B can be performed without pivoting, Algorithm 4 computes a null-space basis matrix Z ∈ R (n−r)×n by right oblique conjugation.
It initializes V to be the identity. The columns of V are then transformed so that, At the end of the algorithm, BV is lower trapezoidal. This means, in particular, that the last n − r columns form a null-space basis because they are linearly independent. To see the mutual relationship between oblique conjugation and LDU factorization, first consider the 'non-wide' square case (r = k = n). In this instance, V = U −1 , where U is the U factor of the LDU factorization of B without pivoting. This result follows from the fact that V is unit upper triangular by construction, the uniqueness of the LDU factorization without pivoting and (19) is satisfied (see [56]). Now consider r ≤ k < n. Because the construction commences with V = I n , and all the updates at Line 4 are well defined, the computed V is upper triangular and its last n − r columns have the shape of (16), where G represents the principal leading submatrix of B of dimension r. In the algorithm outline, for simplicity, we omit pivoting. Even with column pivoting, an assumption is needed to guarantee the factorization exists. It is sufficient to assume linear independence of the first r rows of B, which is weaker than requiring the first r steps of the LDU factorization of B can be performed without pivoting. We have the following straightforward result.
for some W ∈ R r×r and S ∈ R r×(n−r) . In particular, Z = S I n−r is the fundamental null-space basis corresponding to G N .
Proof Setting P to be the permutation matrix corresponding to column pivoting within the LU factorization and using the uniqueness of the first r steps of the LU factorization of BP , we have W = G −1 . Moreover, GS + N = 0 r,n−r , from which it follows that S I n−r ≡ −G −1 N I n−r is the fundamental null-space basis of the matrix B with permuted columns.
One reason why fundamental null-space bases using right oblique conjugation are of potential interest is the practical implementation of Algorithm 4 and relates to the fact that the one-sided factorization does not split the inverse of G into two factors. In practice, pivoting needs to be incorporated to maximize the magnitude of the quantity b T i v (i−1) i used in Line 4. While the rows of B can be stored in static data structures, the columns of V that are eligible to be pivot columns (those with indices j = i + 1, . . . , n) are updated and so need to be stored in a (single) dynamic data structure. By contrast, if the basis is constructed using a standard LU factorization, two sets of vectors (those that form the L and U factors) change dynamically and so must be stored using two dynamic data structures. This storage difference may be important when k is large (which is not the case in this paper). Note that the one-sided factorization is a useful way to explain relations among closely related computational approaches [57]. In exact arithmetic, the diagonal entries b T i v (i−1) i are inverses of the diagonal entries of the LDU factorization (see [4]). In finite precision arithmetic, there are other, and possibly more stable, ways of computing the pivots that we do not discuss here; theoretical properties of some biconjugation variants (with additional assumptions) are discussed in [58]. In the case k n, complete pivoting is not prohibitively expensive and may be needed for stability.
A further attraction of right oblique conjugation is its flexibility. It was introduced as a way to obtain the fundamental null-space basis, but it can be modified to give other null-space bases, including banded ones. To see this, consider the inner loop of where x 2 , . . . , x n are linearly independent. The real power to construct the null-space bases via oblique projections is apparent from the following result. It shows that Algorithm 2, which obtains the null-space basis Z by permuting the upper bidiagonal matrixZ, can be cast in the form of projections. We state the result without proof because it can be verified by direct checking.

Lemma 5. Algorithm 2 can be rephrased in terms of the general scheme of Algorithm 4 by using in Line 4 the oblique projections
with x j = e j −1 , j = 2, . . . , n.

Composite null-space basis for wide dense matrices
There are situations in which the null-space basis can be constructed from partial null-space bases in several steps, instead of the single pass of Algorithm 3. This may simplify the computation, allow it to be more parallel, or to be useful in specific cases, for example, when B has a particular block structure. The following is a straightforward extension of Theorem 6.4.1 of [59] for rank deficient blocks and non-orthonormal bases (a slightly less general version was described in [3]).

Theorem 2.
Consider the null-space basis Z F ∈ R n×(n−r 1 ) of the matrix F ∈ R k 1 ×n of rank r 1 and the null-space basis Z G ∈ R (n−r 1 )×(n−r 1 −r 2 ) of the matrix GZ F ∈ R k 2 ×(n−r 1 ) of rank r 2 , where G ∈ R k 2 ×n . Then the columns of the matrix This result allows Algorithms 2 and 3 to be generalized by adding rows (or blocks of rows) sequentially to B. An important application for such a construction is when a sequence of problems is generated by successively modifying B through the addition of further rows. A special case of a procedure of this type was proposed by Howell [3]. To demonstrate the mechanism, we introduce Algorithm 5 that applies Algorithm 3 repeatedly to τ ≥ 1 row blocks of B and the null-space basis construction exploits Theorem 2. For simplicity, we assume that the rows of B are dense and rank(B) = k (so that all the row blocks are also of full rank). In practice, B may contain zeros that fill in as the algorithm proceeds.
An important practical application of Algorithm 5 is B having a non-trivial block angular form as this can be exploited to reduce the work. Assume that B (of full row rank) can be ordered to the block angular form whereB i ∈ R k i ×n i andD i ∈ R k i ×n d . If N (B) can be composed from the null-space bases of theB i extended by zeros outside their domains, then the null-space basis matrix Z is of block angular form; see also the framework of substructuring in [47].
In our experiments, we consider the special case in which each row block contains a single row but B is not assumed to have full row rank. The approach is given in Algorithm 6 and illustrated in Fig. 3. It constructs the null-space basis Z as a product of bases Z i corresponding to the rows of B. Let ( B) 1

Numerical experiments
In this section, we present numerical results to illustrate the potential of the proposed approaches for computing null space bases for solving symmetric saddle point problems with a small non zero (2, 2) block. We start by presenting detailed results for a least squares problem that has 7 dense rows. In particular, we examine the effects of varying the threshold parameter θ on the construction of the null-space basis, reporting on its sparsity and orthogonality. Then for a wider set of least squares problems that have a few dense rows and using a fixed threshold, we compare the performances of Algorithms 3, 4 and 6 in terms of the sparsity of Z T C s Z and its condition number. Finally, we consider a problem coming from quadratic programming.
With the exception of PDE1 (which comes from the CUTEst linear program test set [60]), the test problems given in Table 1 are from the SuiteSparse Matrix Collection 1 . If necessary, a test matrix is transposed to give an over determined system (m > n). The problems are a subset of those used in the least squares study of Gould and Scott [61]. They were selected because they are real rectangular matrices of full rank that contain a small number of dense rows. Problem PDE1 was included because it is a large problem that illustrates the effects of a single dense row. The row block A d is identified using the variant of the approach of Meszaros [62] described in [16]. All reported norms are Euclidean norms. In our experiments, for any matrix

Results for problem scagr7-2b
Our initial experiments are for problem scagr7-2b. Figure 4 demonstrates the effect of varying the threshold parameter θ ∈ [0. 15,1] in Algorithm 3 on the sparsity of the null-space basis matrix Z and on the transformed normal matrix Z T C s Z. It confirms the significant advantage of using a small θ. Once θ is sufficiently small, there are sufficient degrees of freedom to find a suitable null-space basis Z and the sparsity of the matrices Z and Z T C s Z does not decrease. The sparsity patterns of Z and Z T C s Z for θ = 0.15, 0.5 and 1 are given in Figs. 5 and 6. The effects of varying θ on the orthogonality of the null-space basis (here and elsewhere this is measured as the norm of BZ) and on the condition number cond(Z T C s Z) are illustrated in Fig. 7. We see that, except for very small θ , BZ is small and there is little variation in cond(Z T C s Z). We also tested the fundamental null-space basis approaches discussed in Section 3.2.2. The standard approach based on the pivoted QR factorization (17) finds a well-conditioned submatrix and nnz(Z T C s Z) ≈ 7 × 10 6 for a range of values of θ. For right oblique conjugation (Algorithm 4), cond(Z T C s Z) = 2.8 × 10 9 (independently of θ ) and nnz(Z T C s Z) ≈ 1.4 × 10 6 . The greater density for the fundamental null-space approach is potentially an important disadvantage if it is used as here in combination with a sparse direct solver. However, we anticipate that the approach may be more attractive if Z T C s Z is applied implicitly, such as in the employment of an iterative solver; we plan to investigate this further in the future.
In Fig. 8, we plot cond(Z T Z) and ratio = A T res / res for Algorithms 3, 4 and 6 for the threshold parameter θ ∈ [0.15, 1] (here res = b − Ax is the least squares residual). We see that Algorithm 4 leads to the smallest condition numbers but the variation in ratio for the different approaches is much smaller.  proposed approaches for computing a null-space basis for wide B = A d ∈ R m d ×n .

Experiments on other matrices
In particular, we compare the standard pivoted QR computation of the fundamental null-space basis given by (17) with Algorithm 3 and the right conjugation approach of Algorithm 4.
The results demonstrate that, as expected, the different approaches lead to nullspace bases with complementary properties, with no single approach being uniformly advantageous.
In particular, we see that for the chosen threshold parameter, Algorithms 3 and 6 result in sparse transformed matrices but it can lead to a large condition number (as illustrated by problem lp agg). Furthermore, comparing the pivoted QR and Algorithm 4, which both construct fundamental bases, we see that the condition number estimate is similar but the latter produces a sparser transformed normal matrix.  Here nrow and cond denote the average number of nonzero entries in a row of Z T C s Z and the condition number estimate for Z T C s Z, respectively. ‡ indicates insufficient memory for condest; † indicates insufficient memory to construct Z T C s Z

Results for problem hues mod
The experiments presented so far targeted the saddle-point formulation of sparsedense least squares problems. The next example, hues mod, comes from a convex quadratic programming problem and is taken from the CUTEst test set [60]. While the (1, 1) Hessian block of size n = 10000 is well conditioned, the k = 2 rows that form the off-diagonal constraint block B are dense with entries that differ by 6 orders of magnitude. This is potentially challenging when constructing a sparse null space basis matrix Z. The (2, 2) block is C = 10 −6 × I 2 . In Fig. 9, we report results for Algorithm 3 for different values of the threshold pivoting parameter θ. We see that, even for small thresholds (θ ≈ 0.1), the orthogonality of Z is very good and both the number of entries nnz(Z T H Z) in the transformed (1, 1) block and its factor increase steadily with θ . Note that, for this problem, there is little fill in the factor of Z T H Z.
In Fig. 10, we illustrate how the solution time is influenced by the threshold parameter. This is the total solution time (in seconds). 2 We see that using a large threshold is expensive.

Concluding remarks and future directions
In this paper, we have proposed a new null-space approach for solving general symmetric saddle point systems with a small and non zero (2, 2) block and a (2, 1) block that may be rank deficient. An important motivation was solving large-scale linear least squares problems in which the system matrix has a small number of rows that are considered to be dense. Because the success of null-space approaches depends on being able to construct appropriate null-space bases, we have looked at how this can be done stably for our applications. In particular, our emphasis has been on null-space bases for k × n matrices that are wide (k n) and possibly dense. The standard QR-based fundamental null-space basis computation leads to a transformed matrix that is relatively dense but has the advantage of being generally well conditioned. If a sparse direct solver is employed, the blocks of the transformed matrix must be constructed explicitly and the factors will further fill in. In this case, the QR approach is not ideal; indeed, memory requirements limit the size of systems that can be tackled. Null-space bases computed using right conjugation are also well conditioned and offer the possibility of sparser transformed matrices.
Fundamental null-space bases are potentially attractive for iterative solvers if the basis can be efficiently applied implicitly and provided an effective preconditioner is available. In the future, we plan to develop preconditioners for use with an iterative solver for the solution of large-scale saddle-point systems with a small non zero (2, 2) block via our proposed null-space transformation. Possible lines of research are the left inverses proposed by Nash and Sofer [51] and the factorization behind the conjugation process. Preconditioning of the transformed system based on constructing Z using Algorithm 3 with a small threshold parameter may be more straightforward but possible ill conditioning must be taken into account.
A further goal will be to satisfy linear constraints with a small residual, that is, using the least squares notation of Section 2.3, if we require A d x = b d then we need to ensure res d = b d − A d x is small. Applying Algorithm 3 to the test example scagr7-2b gives res d ∞ / res ∞ ≈ 5.7×10 −2 , with little variation for different values of the threshold parameter θ. Thus the constraints are not tightly satisfied. In the future, we will explore how we can use the null-space approach presented here in combination with other techniques to reduce res d ∞ .