Some algorithms for maximum volume and cross approximation of symmetric semidefinite matrices

Finding the $r\times r$ submatrix of maximum volume of a matrix $A\in\mathbb R^{n\times n}$ is an NP-hard problem that arises in a variety of applications. We propose a new greedy algorithm of cost $\mathcal O(n)$ for the case where $A$ is symmetric positive semidefinite (SPSD), and we discuss its extension to related optimization problems such as the maximum ratio of volumes. In the second part of the paper we prove that any SPSD matrix admits a cross approximation built on a principal submatrix whose approximation error is bounded by $(r+1)$ times the error of the best rank-$r$ approximation in the nuclear norm. In the spirit of recent work by Cortinovis and Kressner, we derive deterministic algorithms capable of retrieving a quasi-optimal cross approximation with cost $\mathcal O(n^3)$.


Introduction
Given A ∈ R^{n×n} and r ∈ N, this work is mainly concerned with the selection of row and column subsets of indices I, J ⊂ {1, . . . , n} of cardinality r with one of the following features, stated for a low-degree polynomial p(·) and a matrix norm ‖·‖.
A connection between problems (i) and (iii) is given by a result of Goreinov and Tyrtyshnikov [16], which says that if A(I, J) has maximum volume then the cross approximation A_IJ satisfies the bound ‖A − A_IJ‖_max ≤ (r + 1)σ_{r+1}(A), with σ_k(·) indicating the k-th singular value and ‖·‖_max denoting the maximum magnitude among the entries of the matrix argument. We remark that, in general, being a quasi optimal cross approximation does not imply any connection between the volume of A(I, J) and the maximum volume. Indeed, while (i) is an NP-hard problem, it has been recently shown that a quasi optimal approximation with respect to the Frobenius norm always exists [31] and can be found in polynomial time [6].
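To make the quantities in the Goreinov-Tyrtyshnikov bound concrete, the following Python sketch (the paper's experiments use Matlab; the test matrix and the index sets here are illustrative choices, not maximum volume ones, so the bound is not guaranteed to hold for them) builds a cross approximation A_IJ and evaluates both sides of the inequality:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 5

# Illustrative SPSD test matrix with decaying spectrum.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q @ np.diag(0.9 ** np.arange(n)) @ Q.T

# Cross approximation on (arbitrarily chosen) index sets; with I = J it is
# built on a principal submatrix, as in the SPSD setting of the paper.
I = J = np.arange(r)
A_IJ = A[:, J] @ np.linalg.solve(A[np.ix_(I, J)], A[I, :])

err_max = np.max(np.abs(A - A_IJ))           # ||A - A_IJ||_max
sigma = np.linalg.svd(A, compute_uv=False)
bound = (r + 1) * sigma[r]                   # (r + 1) * sigma_{r+1}(A)
print(err_max, bound)  # the bound is only guaranteed for a maximum volume A(I, J)
```

By construction the cross approximation reproduces the selected rows and columns of A exactly, which is the interpolation property underlying these error bounds.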
Maximum volume. Problem (i) finds application in a varied range of fields that highlight how the maximum volume concept is multifaceted. For instance, identifying the optimal nodes for polynomial interpolation on a given domain, the so called Fekete points, can be recast as selecting the maximum volume submatrix of Vandermonde matrices on suitable discretization meshes [28]. In the optimal experimental design of linear regression models, it is of interest to select the subset of experiments which is influenced the least by the noise in the measurements.
To pursue this goal, the D-optimality criterion suggests looking at the covariance matrix of the model and finding its principal subblock of maximum volume [20]. Other fields where (i) arises are rank revealing factorizations [23], preconditioning [1] and tensor decompositions [24]. Finding a submatrix with either exact or approximate maximum volume is NP-hard [5,30]. Despite this downside, there has been quite some effort in the development of efficient heuristic algorithms for volume maximization. A central tool for our discussion is one of these methods: the Adaptive Cross Approximation (ACA) [2,18]. ACA is typically presented as a low-rank matrix approximation algorithm, but it can be interpreted as a greedy method for maximizing the volume. When used for low-rank approximation, ACA is equivalent to a Gaussian elimination process with either partial or total pivoting, and it returns an incomplete LU factorization. In particular, the approximant computed by ACA is of the form in (1), although there is no clear relation between the maximum volume submatrix and the submatrix selected by ACA. On the other hand, the latter can be used as a starting guess for procedures that "locally maximize" the volume, e.g., [15,22]. These algorithms guarantee that the volume of the submatrix that they return cannot be increased with a small cardinality change of either its row or column index set. See also [25] for an analysis of these techniques.
In many situations the matrix A is symmetric positive semidefinite (SPSD). For instance, this setting arises in kernel-based interpolation [13], low-rank approximation of covariance matrices [18,21] and discretization of operators involving convolution with a positive semidefinite kernel function [3]. The SPSD structure comes with a major benefit: the submatrix of maximum volume is always attained for a principal submatrix [7]. Although this does not cure the NP hardness of the task, it reduces significantly the search space by adding the constraint I = J.
In Section 2.2 we propose a new efficient procedure for the local maximization of the volume over the set of principal submatrices. More specifically, our algorithm returns an r × r principal submatrix whose volume is maximal over the set of principal submatrices that can be obtained with the replacement of one of the selected indices. Implementation details and complexity analysis are discussed in Section 2.2.2. Numerical tests are reported in Section 2.4.
Maximum ratio of volumes. To the best of our knowledge, there is no reference to problem (ii) in the literature, and there are no direct links with either (i) or (iii) when generic matrices A, B are considered. Nevertheless, we might think of the following situation: suppose that A is SPSD, that B is banded and symmetric positive definite, and that we want to compute a cross approximation of E := R_B^{-⊤} A R_B^{-1} (where R_B indicates the Cholesky factor of B) without forming E. Since E is SPSD it would make sense to apply ACA with diagonal pivoting. However, this requires evaluating the diagonal of E, which is as expensive as forming the whole matrix.
Our idea is to replace the diagonal pivoting with the solution of (ii), used as a heuristic strategy for finding a cross approximation of E.
Indeed, the Binet-Cauchy theorem relates each principal minor of E to minors of A and of R_B^{-1}. If B is banded and well conditioned, then R_B is banded and the magnitude of the entries of R_B^{-1} decays exponentially with the distance from the main diagonal [9]. Under these assumptions, the ratio of volumes in (ii) provides a computable proxy for the volume of the corresponding principal submatrix of E. Based on this argument we propose to select J via a greedy algorithm for (ii) and return E_J := E(:, J)E(J, J)^{-1}E(J, :) as an approximation of E. Note that forming the factors of E_J only requires solving r linear systems with R_B and computing r matrix-vector products with A. In Section 2.3 we describe how to extend the ACA-based techniques for (i) to deal with (ii). We conclude by testing the approximation properties of the approach in Section 2.4.
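The claim that the factors of E_J can be formed without assembling E can be sketched as follows. The matrices A, B and the index set J below are illustrative stand-ins (in practice J would come from the greedy scheme for (ii)), and generic solves are used where banded triangular solves would be employed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40

# Illustrative data: A SPSD, B banded (tridiagonal) and SPD.
G = rng.standard_normal((n, n)); A = G @ G.T
B = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L = np.linalg.cholesky(B)             # lower triangular, B = L @ L.T, i.e. R_B = L.T

J = np.array([3, 11, 25, 37])         # hypothetical index set of cardinality r = 4

# Columns E(:, J) of E = R_B^{-T} A R_B^{-1} = L^{-1} A L^{-T}, without forming E:
X = np.linalg.solve(L.T, np.eye(n)[:, J])   # r triangular solves with R_B
EJ_cols = np.linalg.solve(L, A @ X)         # r matvecs with A, then r more solves

# Cross approximation E_J = E(:, J) E(J, J)^{-1} E(J, :)
E_J = EJ_cols @ np.linalg.solve(EJ_cols[J, :], EJ_cols.T)
```

Only O(r) solves and matrix-vector products are needed, while assembling E itself would require n solves of the same kind.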
Quasi optimal cross approximations. In contrast to the typical robustness of ACA and its simple formulation, very little can be said a priori on the quality of the cross approximation that it returns. Even in structured cases, a priori bounds for the approximation error contain factors that grow exponentially with r [19,18], the only exception being the doubly diagonally dominant case [7].
Recently, Zamarashkin and Osinsky proved in [31] the existence of quasi optimal cross approximations with respect to the Frobenius norm by means of a probabilistic method. By derandomizing the proof of this result, Cortinovis and Kressner have shown in [6] how to design an algorithm that finds a quasi optimal cross approximation in polynomial time.
In Section 3.1 we describe how to modify the technique used in [31] to prove that for an SPSD matrix A there exists a quasi optimal cross approximation with respect to the nuclear norm which is built on a principal submatrix, i.e., I = J. This is of particular interest in uncertainty quantification: if A is the covariance matrix of a Gaussian process, then the nuclear norm of the error bounds the Wasserstein distance with respect to another Gaussian process that can be efficiently sampled [21].
In Sections 3.2-3.3 we propose two algorithms, obtained with the method of conditional expectations, which are able to retrieve quasi optimal cross approximations of SPSD matrices in polynomial time. We conclude by discussing the algorithmic implementation and reporting, in Section 3.4, numerical experiments illustrating the performance of the methods.
Notation. In this work we use Matlab-like notation for denoting submatrices. The identity matrix of dimension n is indicated with Id_n, and we use e_j to denote the j-th column of the identity matrix, whose dimension will be clear from the context. The symbols ‖·‖_*, ‖·‖_F indicate the nuclear and Frobenius norms, respectively.

Maximizing the volume and the ratio of volumes
Given r ∈ N, an SPSD matrix A ∈ R^{n×n} and a symmetric positive definite matrix B ∈ R^{n×n}, the ultimate goal of this section is to discuss numerical methods for dealing with the following optimization problems: When B = Id_n, (3) reduces to (2); moreover, (2) corresponds to the maximum volume problem because, for an SPSD matrix, the maximum is attained at a principal submatrix [7]. We start by recalling a well known greedy strategy for (2), the so-called Adaptive Cross Approximation (ACA) [18]. Then, we will see how to generalize ACA to address (3).

Adaptive cross approximation
The selection of high volume submatrices of A is intimately related to the low-rank approximation of A. The link is the cross approximation A_J. Cross approximations are attractive because building A_J only requires a partial evaluation of the entries of A, which is crucial when considering large scale matrices. Moreover, since the residual matrix R_J := A − A_J is SPSD, the approximation error can be cheaply estimated from the diagonal of R_J (for instance, ‖R_J‖_* = trace(R_J)). When J is a maximum point of (2), A_J yields a quasi optimal approximation error with respect to the maximum norm [16]. However, solving (2) is NP-hard, which paves the way to the use of heuristic approaches such as ACA.
The ACA method selects J with a process analogous to Gaussian elimination with complete pivoting. The algorithm begins by choosing j_1 = arg max_j A_jj and computes R_{J_1} = A − A(:, j_1) A_{j_1 j_1}^{-1} A(j_1, :). Then, the procedure is iterated on the residual matrices R_{J_i}, i = 1, . . . , r − 1, in order to retrieve r indices. The elements (R_{J_i})_{j_{i+1} j_{i+1}} correspond to the first r pivots selected by Gaussian elimination with complete pivoting on the matrix A, and we have the identity (6), where R_{J_0} := A. In particular, (6) shows that each step of ACA augments the set of selected indices by following a greedy strategy with respect to the volume of the selected submatrix. The whole procedure is reported in Algorithm 1. Note that, if one stores the vectors u_1, . . . , u_r, then only the diagonal and the columns j_1, . . . , j_r of A need to be evaluated. The efficient implementation of the algorithm replaces the computation of the residual matrix at line 8 with the update of the diagonal of R_J. In case the matrix A is not formed explicitly and its entries are evaluated with a given function handle, Algorithm 1 requires O(rn) storage and its computational cost is O((r + c_A)rn), where c_A denotes the cost of evaluating one entry of A.
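A minimal Python sketch of diagonal-pivoted ACA for an SPSD matrix may look as follows (this is an illustrative re-implementation, equivalent to a partially pivoted Cholesky factorization, not the paper's exact Algorithm 1):

```python
import numpy as np

def aca_spsd(A, r):
    """Diagonal-pivoted ACA for an SPSD matrix: returns the selected indices J
    and a factor U with A ≈ U @ U.T.  Only diag(A) and r columns of A are read."""
    n = A.shape[0]
    d = np.array(np.diag(A), dtype=float)   # diagonal of the current residual
    U = np.zeros((n, r))
    J = []
    for k in range(r):
        j = int(np.argmax(d))               # greedy pivot: largest residual diagonal
        if d[j] <= 0:                       # numerically zero residual: stop early
            return J, U[:, :k]
        U[:, k] = (A[:, j] - U[:, :k] @ U[j, :k]) / np.sqrt(d[j])
        d -= U[:, k] ** 2                   # O(n) update of the residual diagonal
        J.append(j)
    return J, U

rng = np.random.default_rng(2)
G = rng.standard_normal((60, 8))
A = G @ G.T                                 # exactly rank-8 SPSD matrix
J, U = aca_spsd(A, 8)
print(np.linalg.norm(A - U @ U.T))          # ≈ 0 for an exactly rank-8 matrix
```

The O(n) diagonal update in place of recomputing the residual mirrors the efficient implementation described above.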

Local maximization
Let us suppose that a certain index set J = {j_1, . . . , j_r} is given. Inspired by [15], we would like to know whether the volume of A(J, J) is locally optimal, in the sense that it cannot be increased by replacing just one of the indices in J. Practically, this requires checking the ratios in (7). In the following sections we describe an efficient procedure to iteratively increase V(A(J, J)) based on the evaluation of the r(n − r) ratios in (7). An algorithm for the analogous, yet simpler, task where the index replacement affects only the row or the column index set has been proposed in [15].

Updating the determinant
Let us remark that each A(J̃, J̃) in (7) is a rank-2 modification of the matrix A(J, J). More precisely, if the index set J̃ is obtained by replacing j_i ∈ J with h ∈ {1, . . . , n} \ J, then A(J̃, J̃) and A(J, J) differ only in their i-th row and column, where e_i indicates the i-th vector of the canonical basis. Applying the matrix determinant lemma yields an expression for the ratio of the two volumes. By denoting D := A(J, J)^{-1}, B := A(:, J)D and C := BA(J, :), and putting all pieces together, we obtain a closed formula for the ratios V_hi. Then, we might consider the following greedy scheme for increasing the volume of a starting submatrix A(J, J): identify V_ĥî = max_{h,i} V_hi; if V_ĥî > 1 + tol, for a prescribed tolerance tol, update J by replacing j_î with ĥ and repeat the procedure; otherwise stop the iteration.
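The determinant update behind the ratios V_hi can be checked numerically: the swap of j_i with h is written as a symmetric rank-2 modification M + e_i v^⊤ + v e_i^⊤ of M = A(J, J), and the matrix determinant lemma gives the volume ratio without recomputing any determinant from scratch. The matrix and the choice of i, h below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 30, 5                          # n: matrix size, m: cardinality of J
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)           # well-conditioned SPSD matrix (illustrative)

J = list(range(m))
i, h = 2, 17                          # hypothetical swap: replace J[i] by h
M = A[np.ix_(J, J)]
Minv = np.linalg.inv(M)

# The swap is the symmetric rank-2 update  M + e_i v^T + v e_i^T.
ei = np.zeros(m); ei[i] = 1.0
u = A[h, J] - A[J[i], J]                          # change of row/column i
s = A[h, h] + A[J[i], J[i]] - 2.0 * A[h, J[i]]    # correction of the (i, i) entry
v = u + 0.5 * s * ei

# Matrix determinant lemma:
# det(M + U V^T) = det(M) * det(I_2 + V^T M^{-1} U),  U = [e_i, v], V = [v, e_i].
U = np.column_stack([ei, v]); V = np.column_stack([v, ei])
ratio = np.linalg.det(np.eye(2) + V.T @ Minv @ U)  # vol(A(Jt, Jt)) / vol(A(J, J))

Jt = J.copy(); Jt[i] = h
print(ratio, np.linalg.det(A[np.ix_(Jt, Jt)]) / np.linalg.det(M))
```

Each ratio thus costs only a 2 × 2 determinant once D = M^{-1} is available, which is what makes scanning all r(n − r) candidates affordable.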
We will discuss possible improvements to this algorithm in the next section.

Updating the quantities B, C and D
The previously sketched procedure requires recomputing the quantities B, C and D whenever the index set J is updated. Here, we explain how to leverage the old information to decrease the iteration cost. In the following, we assume that the new index set J_new is obtained by replacing j_i ∈ J with h. The new matrix D is the inverse of a rank-2 modification of the old A(J, J); therefore it can be obtained with the Woodbury identity (8). Since the rank-2 modification can be written as the difference of two rank-1 SPSD matrices, the update can alternatively be carried out by performing a rank-1 update and a rank-1 downdate of the old Cholesky factor [29, Chapter 4, Section 3]. For instance, these routines are implemented in the Matlab command cholupdate.
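A Python sketch of the rank-1 Cholesky update underlying Matlab's cholupdate is given below (the downdate is analogous, with hyperbolic instead of circular rotations); this is a generic textbook variant, not the paper's implementation:

```python
import numpy as np

def chol_update(L, x):
    """Rank-1 update of a Cholesky factor: returns lower-triangular L1 with
    L1 @ L1.T = L @ L.T + x x^T, via a sweep of Givens-like rotations."""
    L = L.copy(); x = np.array(x, dtype=float)
    n = x.size
    for k in range(n):
        r = np.hypot(L[k, k], x[k])          # rotation eliminating x[k]
        c, s = r / L[k, k], x[k] / L[k, k]
        L[k, k] = r
        if k + 1 < n:
            L[k+1:, k] = (L[k+1:, k] + s * x[k+1:]) / c
            x[k+1:] = c * x[k+1:] - s * L[k+1:, k]
    return L

rng = np.random.default_rng(4)
G = rng.standard_normal((6, 6))
M = G @ G.T + 6 * np.eye(6)                  # SPD test matrix (illustrative)
L = np.linalg.cholesky(M)
x = rng.standard_normal(6)
L1 = chol_update(L, x)
print(np.linalg.norm(L1 @ L1.T - (M + np.outer(x, x))))  # ≈ 0
```

The sweep costs O(n^2) per rank-1 term, versus O(n^3) for refactorizing from scratch.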
The new matrix B is also a low-rank correction of the old B, given by (9). Performing the updates of D and B with (8) and (9), respectively, brings down the iteration cost to O(r^2 + (r + c_A)n), apart from the first iteration which remains O(r^3 + (r + c_A)rn). The procedure is reported in Algorithm 3.
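The Woodbury-based update of D = A(J, J)^{-1} after a single index replacement can be sketched as follows; the rank-2 factors U, V encode the swap, and the data and indices are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 25, 4
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)          # well-conditioned SPSD matrix (illustrative)

J = [0, 1, 2, 3]; i, h = 1, 10       # hypothetical index replacement
D = np.linalg.inv(A[np.ix_(J, J)])   # old D = A(J, J)^{-1}

# Write the swap as the symmetric rank-2 update  A(J, J) + e_i v^T + v e_i^T.
ei = np.zeros(m); ei[i] = 1.0
v = A[h, J] - A[J[i], J]
v = v + 0.5 * (A[h, h] + A[J[i], J[i]] - 2.0 * A[h, J[i]]) * ei
U = np.column_stack([ei, v]); V = np.column_stack([v, ei])

# Woodbury: (A(J,J) + U V^T)^{-1} = D - D U (I_2 + V^T D U)^{-1} V^T D
D_new = D - D @ U @ np.linalg.solve(np.eye(2) + V.T @ D @ U, V.T @ D)

J_new = J.copy(); J_new[i] = h
print(np.linalg.norm(D_new - np.linalg.inv(A[np.ix_(J_new, J_new)])))  # ≈ 0
```

Only a 2 × 2 system is solved per update, which is the source of the O(r^2) term in the iteration cost.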
Since the use of the Woodbury identity is sometimes prone to numerical instabilities, e.g., when the selected submatrix is nearly singular, we may switch off the updating mechanism by setting the boolean variable do update to false at line 3, in which case D and B are recomputed from scratch. Finally, we remark that updating the diagonal elements of C with a dedicated recurrence would reduce the cost of line 14 in Algorithm 3 by a factor r. However, since this does not change the complexity of the iteration and requires storing additional intermediate quantities, it is not incorporated in our implementation.

A new algorithm for the maximum volume of SPSD matrices
Quite naturally, we propose to apply Algorithm 3 to the index set returned by Algorithm 1 as a heuristic method for solving (2). The resulting procedure is guaranteed to return a locally optimal principal submatrix of A, in the sense of Section 2.2, whose volume is larger than or equal to the one returned by ACA. For completeness, we report the method in Algorithm 4.
By denoting with it the number of iterations performed by local maxvol, we have that the computational cost of Algorithm 4 is O((r + c_A)(r + it)n). The techniques of this section extend to problem (3) as well. For instance, the extension of ACA to (3) greedily selects, at each step, the index j maximizing the ratio of the diagonal entries of the two residual matrices; analogously, the local maximization scheme can be run on the ratios V_hi of the two volumes. We refer to the latter as local maxvol ratio and, due to its length, we refrain from writing its pseudocode. Finally, the extension of Algorithm 4 to (3) is reported in Algorithm 5.
By denoting with it the number of iterations performed by local maxvol ratio, we have that the computational cost of Algorithm 5 is O((r + c A + c B )(r + it)n), where c B indicates the cost of evaluating one entry of B.

Numerical results
Algorithms 1-5 have been implemented in Matlab version R2020a and all the numerical tests in this work have been executed on a laptop with a dual-core Intel Core i7-7500U 2.70 GHz CPU, 256 KB of level 2 cache, and 16 GB of RAM. The parameter tol used in Algorithm 4 and Algorithm 5 has been set to 5 · 10^{-2} for all the experiments reported in this manuscript. In the numerical tests involving the test matrix A_3 and Algorithm 3 the updating mechanism has been switched off by setting do update to false. Everywhere else, do update has been set to true.
The aforementioned test matrices are representative of various singular value distributions: A_1, A_2 have a subexponential decay, A_3, A_5 have an exponential decay, and A_4, taken from [17], is banded and well conditioned. We also indicate with R_4^⊤ R_4 = A_4 the Cholesky factorization of A_4. Test 1. As a first experiment we run Algorithm 1 and Algorithm 4 on A_1, A_2, A_3, setting n = 1020 and varying the size r of the sought submatrix. For the matrices A_1, A_2 we let r range in {1, . . . , 100}. When experimenting on A_3 we consider r ∈ {1, . . . , 20} because of the small numerical rank of the Hilbert matrix. We measure the timings required by the two methods and the gain factor | det(A(J_maxvol, J_maxvol))/ det(A(J_aca, J_aca))| which Algorithm 4 provides with respect to Algorithm 1. From the results reported in Figure 1, we see that the costs of both algorithms scale quadratically with respect to the parameter r. For small values of r, maxvol struggles to increase the volume of the submatrix returned by aca; improvements happen more often and more consistently for larger values of r. We mention that disabling the updates based on the Woodbury identity generally increases the timings of Algorithm 4 by about 20% for this test.
Test 2. The second numerical test considers maximizing the ratio of volumes (3). We keep n = 1020 and we run Algorithm 2 and Algorithm 5 using A_1, A_2, A_3 as numerator and A_4 as denominator. The time consumption as the size r of the submatrix increases is reported in Figure 2. Also in this case, quadratic complexity with respect to r is observed for the computational cost. The gain factor | det(A(J_maxvol ratio, J_maxvol ratio))/ det(A(J_aca ratio, J_aca ratio))| is shown as well in the bottom right part of Figure 2.
Test 3. Let us test the computational cost of aca, maxvol, aca ratio and maxvol ratio as the size of the target matrices increases. We fix r = 40 and we let n = 1020 · 2^t, t = 0, . . . , 5. Then, we run aca and maxvol on A_1, and aca ratio and maxvol ratio on the pair (A_1, A_4). The timings reported in Figure 3 confirm that the computational time scales linearly with respect to n.
Test 4. Finally, we test the quality of the cross approximations returned by aca ratio and maxvol ratio. More specifically, we compute the approximation error ‖E_i − (E_i)_J‖_2, i = 1, 2, 3, 5, with E_i := (R_4^⊤)^{-1} A_i R_4^{-1}, n = 1020 and J chosen as either J_aca ratio or J_maxvol ratio. In Figure 4 we compare the error curves, as r increases, of the cross approximations with those associated with the truncated SVD, which represents the best attainable error. We see that the decay rate of the error of aca ratio is very similar to that of the truncated SVD. maxvol ratio also performs well on the matrices which have a fast decay of the singular values, i.e., A_3, A_5. Its convergence slightly deteriorates for the matrices A_1 and A_2.

Quasi optimal cross approximation in the nuclear norm
Adaptive cross approximation has a much lower cost than the truncated SVD for low-rank matrix approximation, although the latter provides an optimal solution in any unitarily invariant norm. Empirically, ACA typically returns an approximant that is close, in terms of the associated approximation error, to the truncated SVD. However, it appears difficult to ensure this property theoretically; see, e.g., the quite pessimistic bounds in [19,18,7]. On the other hand, there are some recent results about cross approximations with quasi optimal approximation error.
Zamarashkin and Osinsky proved in [31, Theorem 1] that, given A ∈ C^{m×n} of rank k and any r < k, there exist index sets I and J of cardinality r whose cross approximation satisfies the quasi optimality bound (10) in the Frobenius norm. Cortinovis and Kressner proposed in [6] a polynomial time algorithm to find I and J such that A_IJ is quasi optimal with respect to the Frobenius norm. Their approach, inspired by [12], is based on the derandomization of the result by Zamarashkin and Osinsky with the method of conditional expectations. More precisely, for t ≤ r, and assuming to have already selected the first t − 1 indices {i_1, . . . , i_{t−1}}, {j_1, . . . , j_{t−1}} of I and J, the pair (i_t, j_t) is chosen as the one which minimizes the conditional expectation of the error (11). Incrementally selecting all the indices with this criterion ensures that (I, J) identifies a cross approximation which verifies (10). Interestingly, (11) can be shown to be (r − t + 1) times the ratio of two consecutive coefficients in the characteristic polynomial of the symmetrized residual matrix. In the next section, we analyze what can be achieved with cross approximations built on principal submatrices, when A is SPSD.

Existence result
In view of [7, Theorem 1] it is tempting to impose a symmetric choice of indices I = J in (10) when A is SPSD. However, such an error bound does not hold in general, as pointed out in [6, Section 3.2.3]. The following result shows that a quasi optimal error in the nuclear norm can be obtained by restricting the search space to principal submatrices. This also yields a quasi optimal error in the Frobenius norm, with a constant increased by a factor √(n − r).
Before going into the proof of Theorem 3.1, let us state and prove some properties regarding the volume of principal submatrices.
and specifically: where the second equality has been proved in [31, Lemma 1]. If J is generic and A is SPSD, then A − A_J is SPSD and its nuclear norm is the sum of its diagonal entries, which are all Schur complements of the form given in (13); this yields (i).
Proof of Theorem 3.1. Let us denote by Ω_r the set of r × r principal submatrices of A. We show that (r + 1) · Σ_{t ≥ r+1} σ_t(A) is larger than the expected value of the cross approximation error, with respect to the following probability distribution on Ω_r:
Indeed, we have: where we used that, once the enlarged index set J̃ of cardinality r + 1 is fixed, there are r + 1 possible choices for J. Finally, we have: where the last inequality follows from the Cauchy-Schwarz inequality.

Derandomizing Theorem 3.1
Following the approach in [6], we obtain a deterministic algorithm for computing a cross approximation which verifies (12), by derandomizing Theorem 3.1. In order to do so, we need to determine the conditional expectation of the cross approximation error, with respect to a partial choice of the indices in J.

Theorem 3.3 suggests designing an iterative scheme that, at each step, computes the characteristic polynomial of A − A_{J_t} for all the possible choices of the last index j_t and selects the minimizer of the resulting ratio of coefficients. Interpreting A − A_{J_t} as a rank-1 modification of A − A_{J_{t−1}}, we may look at the problem of updating the coefficients of the characteristic polynomial under a rank-1 change of the matrix. Since stable procedures, such as the Summation Algorithm [27, Algorithm 1], compute the characteristic polynomial from the eigenvalues, our task boils down to updating the eigenvalues of an SPSD matrix, and in turn to computing the eigenvalues of a real diagonal matrix minus a symmetric rank-1 matrix. The latter problem can be transformed into a symmetric tridiagonal eigenvalue problem with a standard bulge chasing procedure [14, Section 5] and finally solved with Cuppen's divide and conquer method [8]. Both tridiagonalization and Cuppen's method require O(n^2) flops.
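The step "compute the characteristic polynomial from the eigenvalues" can be illustrated with a simple coefficient recurrence that multiplies in one linear factor at a time; this is a simplified stand-in for the Summation Algorithm of [27], not a reproduction of it:

```python
import numpy as np

def charpoly_from_eigs(eigs):
    """Coefficients (leading first) of prod_i (x - eigs[i]), obtained by
    multiplying in one linear factor (x - lambda) at a time."""
    c = np.array([1.0])
    for lam in eigs:
        # (c(x)) * (x - lam): shift for the x-multiple, subtract lam * c(x)
        c = np.append(c, 0.0) - lam * np.append(0.0, c)
    return c

rng = np.random.default_rng(6)
G = rng.standard_normal((8, 8))
A = G @ G.T                                   # SPSD stand-in for a residual matrix
c = charpoly_from_eigs(np.linalg.eigvalsh(A))
print(np.allclose(c, np.poly(A)))             # True: matches numpy's np.poly
```

Working from the (nonnegative) eigenvalues avoids the cancellation issues of determinant-based expansions.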
The certified cross approximation (CCA) obtained from the derandomization of Theorem 3.1 is reported in Algorithm 6. Note that all the operations inside the inner loop have at most a quadratic cost and computing the eigendecomposition at line 4 is cubic. Therefore, the asymptotic computational cost is O(rn^3).

Updating the characteristic polynomial via trace of powers
Each iteration of Algorithm 6 requires updating the eigendecomposition of the residual matrix, resulting in a computational cost of O(rn^3). Here we discuss how, in principle, to reduce the complexity to O(r^2 n^ω), where 2 < ω < 3 is the exponent of the computational complexity of matrix-matrix multiplication. The idea is that, since we need to update only a (small) portion of the characteristic polynomial, we may avoid dealing with the eigendecomposition.
The coefficients of the characteristic polynomial of a matrix A can be expressed by means of the traces of its powers, via the Newton identities. Let H_k and H̃_k := H_k − ‖u‖^2 e_1 e_1^⊤ be the orthogonal projections of A and A − uu^⊤ on the Krylov subspace K_k(A, u); then the difference of the traces of the k-th powers of A and A − uu^⊤ coincides with the corresponding difference for H_k and H̃_k, see (16). Hence, to update the traces of the first k powers of A we may perform k steps of the Arnoldi method to get H_k and H̃_k, compute the traces of their powers (via their eigenvalues) and, finally, evaluate (16). Updating the traces for a single low-rank modification costs O(k · matvec(A) + k^2 n); so a procedure that naively applies this computation for the O(n) low-rank modifications still yields a cubic iteration cost with respect to n, unless matvec(A) has a subquadratic cost. In the case O(matvec(A)) = O(n^2), we propose to carry out the Arnoldi steps simultaneously for all the O(n) low-rank modifications u_i u_i^⊤. More specifically, if u_{i(h)} denotes the h-th vector computed by the Arnoldi process for K_k(A, u_i), then we perform all the Arnoldi steps together by computing the matrix-matrix multiplication A · [u_{1(h)} | . . . | u_{n−k+1(h)}]. Theoretically, this yields the iteration cost O(kn^ω). This also has practical benefits because of the use of highly optimized BLAS 3 operations. The procedure for updating the traces of powers is reported in Algorithm 8; the certified cross approximation method (CCA2) that relies on Algorithm 8 is reported in Algorithm 7.
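The link between traces of powers and characteristic-polynomial coefficients used here is given by the Newton identities; the following small Python check is illustrative only and does not reproduce the Arnoldi-based trace update of Algorithm 8:

```python
import numpy as np

def charpoly_from_traces(p):
    """Characteristic-polynomial coefficients (leading first) from the power
    sums p[k-1] = tr(A^k), via the Newton identities:
        k * e_k = sum_{j=1..k} (-1)^(j-1) * e_{k-j} * p_j,   c_k = (-1)^k * e_k."""
    e = [1.0]
    for k in range(1, len(p) + 1):
        e.append(sum((-1) ** (j - 1) * e[k - j] * p[j - 1]
                     for j in range(1, k + 1)) / k)
    return np.array([(-1) ** k * e[k] for k in range(len(p) + 1)])

rng = np.random.default_rng(7)
G = rng.standard_normal((5, 5))
A = G @ G.T                                   # small SPSD example
traces = [np.trace(np.linalg.matrix_power(A, k)) for k in range(1, 6)]
c = charpoly_from_traces(traces)
print(np.allclose(c, np.poly(A)))             # True, up to roundoff
```

This is why keeping the traces up to date under rank-1 modifications suffices to track the (leading) coefficients needed by the selection criterion.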
Unfortunately, Algorithm 7 suffers from the numerical instability of evaluating the determinant in (14). More specifically, when the matrix T^(k) becomes nearly singular, standard techniques provide small singular values which are accurate only in an absolute sense. Methods that guarantee relative accuracy for singular values apply only to particular classes of matrices [10,11], and T^(k) does not belong to any of such classes. On top of that, we often observe that the matrix T^(k) becomes nearly singular quite fast as k increases; typically, for k above 10, the computed ratio (15) has no reliable digits. In the next section we propose a strategy to partially circumvent this problem.

A restarted algorithm
In view of the instability issues related to evaluating (15), we propose to combine Algorithm 7 with a restarting mechanism. Let us assume that the rank of the sought cross approximation is r and that r̃ < r is a small value for which (15) can be computed with sufficient accuracy. We might think of forming the index set J by the incremental application of Algorithm 7 with input parameter r̃. This means that we first compute a certified cross approximation of rank r̃ of A. Then, we add to the latter a certified cross approximation of rank r̃ of the residual matrix, and so on. The procedure stops when we reach an index set J of cardinality r. We call this method quasi certified cross approximation and we report its pseudocode in Algorithm 9. Even though the cross approximation returned by Algorithm 9 is not guaranteed to verify (12), this is usually the case, as we will see in the numerical results.
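The restarting mechanism can be sketched generically as follows; the inner selection routine here is a greedy diagonal-pivoting stand-in for the certified selection of Algorithm 7, so the sketch shows only the restart/deflation structure, not the certified guarantees:

```python
import numpy as np

def greedy_diag(R, k):
    """Stand-in inner selection: k steps of diagonal-pivoted ACA on R."""
    Rk = R.copy(); out = []
    for _ in range(k):
        j = int(np.argmax(np.diag(Rk)))
        out.append(j)
        Rk = Rk - np.outer(Rk[:, j], Rk[j, :]) / Rk[j, j]
    return out

def restarted_cross(A, r, r_small, select):
    """Restarted index selection: repeatedly pick r_small indices on the
    current residual, then deflate.  `select(R, k)` plays the role of the
    inner certified selection; any rule returning k indices works here."""
    J = []
    R = A.copy()
    while len(J) < r:
        J.extend(select(R, min(r_small, r - len(J))))
        # residual of the cross approximation built on the indices so far
        R = A - A[:, J] @ np.linalg.solve(A[np.ix_(J, J)], A[J, :])
    return J

rng = np.random.default_rng(8)
G = rng.standard_normal((30, 30))
A = G @ G.T                                  # full-rank SPSD test matrix
J = restarted_cross(A, 8, 3, greedy_diag)
print(sorted(J))
```

Because the residual vanishes on the rows and columns already selected, each restart picks fresh indices, and the inner routine only ever needs to certify a small rank r_small.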

Numerical results
Let us compare the performance of Algorithm 6 and Algorithm 9 on the test matrices A_1, A_2, A_3, A_5 introduced in Section 2.4. The bulge chasing procedure used in Algorithm 6 has been implemented in Fortran and is called via a MEX interface. When executing Algorithm 9, the parameter r̃ has been set to 5.
Test 5. We set n = 100, ρ = 0.85 and we measure the nuclear norm of the cross approximation error, ‖A − A_J‖_*, obtained with cca and quasi cca as the parameter r increases. The results are shown in Figure 5, where we also report the upper bound provided by Theorem 3.1 and the lower bound g(r) := Σ_{j ≥ r+1} σ_j, corresponding to the approximation error of the truncated SVD (TSVD). We see that, on all examples, the accuracy of cca and quasi cca is very close and often the convergence curves are not distinguishable. In addition, in the examples where the decay of the singular values is slow, we notice that Theorem 3.1 tends to be pessimistic and the accuracy of cca and quasi cca is very close to that of the TSVD.
Test 6. Finally, we test the computational cost of the proposed numerical procedures. We fix r = 20, ρ = 0.85 and we run Algorithm 6 and Algorithm 9 on A_5 for n ∈ {100, 200, 400, 800, 1600}. The timings, reported in Figure 6, confirm the cubic complexity with respect to n of Algorithm 6. Although the complexity of our implementation of quasi cca is cubic as well (no fast matrix multiplication algorithm has been implemented), it results in a significant gain of computational time due to the more intense use of BLAS 3 operations.

Outlook
We have proposed several numerical methods for the solution of problems related to the selection of the maximum volume submatrix and the cross approximation of symmetric positive semidefinite matrices. We remark that the idea used for deriving Algorithm 2 and Algorithm 5 extends easily to combinatorial optimization problems of an analogous form, involving a multivariate function f and SPSD matrices A_1, . . . , A_p.
The second part of the manuscript can also inspire future work. For instance, the fact that the maximum volume submatrix of a diagonally dominant matrix is principal might suggest that a result analogous to Theorem 3.1 holds also for diagonally dominant matrices. However, it is not straightforward to adjust the proof of Theorem 3.1 to this case, because we lose the connection between the sum of the volumes of the principal submatrices and the coefficients of the characteristic polynomial.
Another interesting question is whether the ratio of determinants in (15) can be computed with high relative accuracy. This would pave the way to the use of cca2 without incorporating any restarting mechanism.
Finally, in the case of large-scale matrices, one might derive new scalable algorithms for computing cross approximations by combining Algorithms 6-9 with heuristic techniques for reducing the dependence on n in the computational cost.