Fast and accurate pseudoinverse with sparse matrix reordering and incremental approach

How can we compute the pseudoinverse of a sparse feature matrix efficiently and accurately for solving optimization problems? A pseudoinverse is a generalization of a matrix inverse, which has been extensively utilized as a fundamental building block for solving linear systems in machine learning. However, an approximate computation, let alone an exact computation, of pseudoinverse is very time-consuming due to its demanding time complexity, which limits it from being applied to large data. In this paper, we propose FastPI (Fast PseudoInverse), a novel incremental singular value decomposition (SVD) based pseudoinverse method for sparse matrices. Based on the observation that many real-world feature matrices are sparse and highly skewed, FastPI reorders and divides the feature matrix and incrementally computes low-rank SVD from the divided components. To show the efficacy of proposed FastPI, we apply them in real-world multi-label linear regression problems. Through extensive experiments, we demonstrate that FastPI computes the pseudoinverse faster than other approximate methods without loss of accuracy. Results imply that our method efficiently computes the low-rank pseudoinverse of a large and sparse matrix that other existing methods cannot handle with limited time and space.


Introduction
A pseudoinverse is a generalized inverse method for all types of matrices (Ben-Israel and Greville, 2003) that play a crucial role in obtaining best-fit solutions to the linear systems even when unique solutions do not exist (Strang, 2006).Pseudoinverses have been studied by many researchers in various domains, including mathematics and machine learning, from the viewpoint of theory (Ben-Israel and Greville, 2003), computational engineering (Golub and Van Loan, 2012), and applications (Guo et al., 2019;Xu and Guo, 2018;He et al., 2016;Spyromitros-Xioufis et al., 2016;Chen and Lin, 2012;Horata et al., 2013).
Although pseudoinverses have been widely applied, applications were limited to small data due to their high computational complexity.More specifically, the most widely applied pseudoinverse is the Moore-Penrose inverse; and the most elegant and precise solution for obtaining the Moore-Penrose inverse is by utilizing a singular value decomposition (SVD).However, calculating SVDs are impractical for large matrices, i.e., the time complexity of a full-rank SVD for an m × n matrix is min(O(n 2 m), O(nm 2 )) (Trefethen and Bau III, 1997).Low-rank approximation techniques (Halko et al., 2011;Feng et al., 2018) have been proposed to reduce the time complexity problem.However, costs can still be improved, especially for handling large matrices using larger rank approximations for higher accuracies; e.g., O(mn log(r)+(m+n)r 2 ) when a randomized algorithm is used, where r is the low-rank, is still large.
In this paper, we propose FastPI (Fast PseudoInverse), a novel approximation algorithm for computing pseudoinverse efficiently and accurately based on sparse matrix reordering and incremental low-rank SVD.FastPI was motivated by the observation that many real-world feature matrices are highly sparse and skewed.Based on this observation, FastPI reorders a feature matrix such that its non-zero elements are concentrated at the bottom right corner leaving a large sparse area at the top left of the feature matrix (Figure 3(e)).The reordered matrix is split into four submatrices, where one of the submatrices is the large and sparse rectangular block diagonal matrix, whose SVD is easy-to-compute.FastPI efficiently obtains the approximate pseudoinverse of the feature matrix by performing incremental low-rank SVD starting from the SVD of this block diagonal submatrix.Experiments show that FastPI successfully approximates the pseudoinverse faster than compared methods without loss of accuracy in the multi-label linear regression problem.Our contributions are the followings: -Observation.We observed that a sparse feature matrix can be transformed to a bipartite network (Definition 1) characteristic with a highly skewed node degree distribution (Figure 1).-Method.We propose FastPI, a novel method for efficiently and accurately obtaining the approximate pseudoinverse with sparse matrix reordering and incremental SVD (Algorithm 1).-Experiment.We show FastPI computes an approximate pseudoinverse faster than its competitors for most datasets without loss of accuracy in the multi-label linear regression experiments (Figures 5 and 6).

Preliminaries
We describe the preliminaries on pseudoinverse and singular value decomposition (SVD), and provide the formal definition of the problem and target application handled in this paper.Symbols used in the paper are summarized in Table 1.

Pseudoinverse and SVD
In many machine learning models, a training data is represented as a feature matrix denoted by A ∈ R m×n , where m is the number of training instances and n is the number of features.Learning optimal model parameters often involves pseudoinverse A † ∈ R n×m .The Moore-Penrose inverse is the most accurate and widely used generalized matrix inverse that can be solved using SVD as follows.
Problem 1 (Solving Moore-Penrose Inverse via low-rank SVD (Golub and Van Loan, 2012)) For feature matrix A ∈ R m×n , let A be decomposed into U m×r Σ r×r V r×n , where U m×r and V n×r are orthogonal matrices and Σ r×r is diagonal with r singular values.If r is the rank of A, the pseudoinverse A † of A is given by A † = V n×r Σ † r×r U r×m .Otherwise, for a given target rank r, it results in a best approximate pseudoinverse The state-of-the-art low-rank SVD is randomized-SVD with the computational complexity of O(mn log(r) + (m + n)r 2 ).Randomized-SVD utilizes randomized algorithm with oversampling technique (see the details in Section 4.1) for efficient computation (Halko et al., 2011).In cases of sparse matrices, Krylov subspace-based methods have also been shown to be efficient (Baglama and Reichel, 2005).However, both methods target problems that require very small ranks, while as accurate approximations of pseudoinverses require relatively large rank approximations of SVDs.Thus, the costs of existing low-rank SVDs are still too heavy for practical applications of pseudoinverses on large feature matrices.

Target Application of Pseudoinverse
We describe our target application, multi-label linear regression based on pseudoinverse as follows: Application 1 (Multi-label Linear Regression (Yu et al., 2014)) Given feature matrix A ∈ R m×n and label matrix Y ∈ R m×L where m > n, L is the number of labels, and each row of Y is a binary label vector of size L, the goal is to learn parameter Z ∈ R n×L satisfying AZ Y to estimate the score vector ŷ = Z a for a new feature vector a ∈ R n .
The linear system for unknown Z is over-determined when m > n; thus, the solution for Z is obtained by minimizing the least square error AZ − Y 2 F , which results in the closed form solution Z = A † Y (Chen and Lin, 2012).As described in Problem 1, A † V n×r Σ † r×r U r×m , where the equality holds when r is the rank of A. Hence, the SVD results can be used to compute pseudoinverse exactly or approximately in a multi-label linear regression.

Proposed Method
We propose FastPI (Fast PseudoInverse), a novel method for efficiently and accurately computing the approximate pseudoinverse for sparse matrices.We describe the overall procedure of FastPI in Algorithm 1.Our main ideas for accelerating the pseudoinverse computation are as follows: -Idea 1 (line 1).Many feature matrices collected from real-world domains are highly sparse and skewed as shown in Figure 1 (Section 3.1); and we show that these feature matrices can be reordered such that their non-zeros are concentrated as shown in Figure 3(e) (Section 3.2).-Idea 2 (line 2).The reordered matrix involves a large and sparse block diagonal submatrix whose SVD is easy-to-compute (Section 3.3).-Idea 3 (lines 3 and 4).The final SVD result of the feature matrix is efficiently obtained by incrementally updating the SVD result of the sparse submatrix (Section 3.3).

Observation from Real-world Feature Matrix
We first explain the skewness of feature matrices in the real-world datasets, which plays a key role in motivating the matrix reordering of FastPI.A notable characteristic of feature matrices collected from many real-world problems is that they are extremely sparse as shown in Table 3 (the details of the datasets are described in Section 4).This sparsity naturally leads us to interpret A as a sparse network.Moreover, rows of a feature matrix map training instances, columns map the features, and non-zero values map the relations between instance-to-feature pairs.Thus a feature matrix A naturally represents a bipartite network as follows: where V T is the set of instance nodes, V F is the set of feature nodes, and E is the set of edges between instance and feature nodes.For each non-zero entry a ij , an edge (i, j) is formed in G, where i ∈ V T and j ∈ V F .FastPI assigns the highest id to the feature node (id 7) and instance node (id 9), respectively.The nodes in spokes get the lowest ids and the GCC receives the remaining ids.We remove multiple hub nodes in each iteration of the algorithm to sufficiently shatter the graph.
Figure 1 depicts the degree distributions of instance and feature nodes in each bipartite network derived from the Amazon and RCV feature matrices, respectively.Note that the degree distributions of both bipartite networks are skewed, i.e., there are only a few high degree nodes.In network analysis, skewness of the degree distributions have been exploited to reorder association matrices for efficient analysis.In the network terminology, the high degree nodes are called hub nodes, or simply hubs, and the neighbor nodes of a hub node are called the spoke nodes, or simply spokes.There is no consistent threshold degree of a node for it to be considered a hub and often a relative proportion of high-degree nodes rather than an explicit threshold degree is used to select the hubs.
Previous works on real-world networks have shown that real-world networks can be shattered by removing sets of highest degree nodes (Kang and Faloutsos, 2011;Lim et al., 2014;Jung et al., 2016Jung et al., , 2017)).That is, when a set of hubs is removed from a connected component, a non-trivial portion of the nodes, i.e., the spokes, form small disconnected components, while the majority of the nodes remain in a giant connected component.In figure 2, we show that this shattering property of real-world networks also applies to bipartite graphs formed from feature matrices.We apply the shattering property to our feature matrix reordering (Algorithm 2 of FastPI).

Matrix Reordering of FastPI
Given a bipartite network G derived from feature matrix A (Definition 1), FastPI obtains permutation arrays π T : V T → {1, • • • , m} for instance nodes and π F : V F → {1, • • • , n} for feature nodes, such that the non-zero entries of the feature matrix are concentrated as seen in Figure 3(e).
The high-level mechanism of the matrix reordering procedure is summarized in Algorithm 2. The algorithm first selects hub instance and feature nodes at line 2 in the order of node degree; given a hub selection ratio 0 < k < 1, it chooses m hub ← k × |V T | hub instance nodes and n hub ← k × |V F | hub feature nodes, respectively.Then, it removes the selected hubs at line 3, such that the given network is split into three parts: 1) hubs, 2) giant connected Algorithm 2: Matrix Reordering of FastPI Input: bipartite network G = (V T , V F , E) derived from feature matrix A and hub selection ratio k Output: permutation arrays π T : component (GCC) (colored blue), and 3) spokes (colored green) to the hubs as shown in Figure 2.
After removing the hubs, we assign new nodes ids to each π T and π F according to their node types.In Figure 2, the initial number of instance and features nodes are |V T | = 9 and |V F | = 7.After line 3, the hub instance node gets the highest instance id 9 (= |V T |), and the hub feature node gets the highest feature id 7 (= |V F |).Note that those two hubs should be treated differently; the instance node id corresponds to a row index, and the feature node id corresponds to a column index in the feature matrix.At line 4, the nodes in spokes take the lowest ids as in Figure 2(b).The remaining ids are assigned to the GCC.The same procedure is recursively repeated on the new GCC at line 5.
Figure 3 depicts the matrix reordering in Algorithm 2 for the feature matrix of the Amazon dataset.The spy plot of the feature matrix before reordering is in Figure 3(a).Figures 3(b)∼3(d) shows the intermediate matrices of the reordering process.As iterations proceed, the non-zero entries of the feature matrix are concentrated at the bottom right corner of the feature matrix as shown in Figure 3(e).The final reordered matrix can be divided into four submatrices, or blocks, where the top left submatrix is a large and sparse block diagonal matrix.More specifically, the top and left submatrix contains small rectangular blocks at the diagonal area, where those blocks are formed by the spokes nodes; e.g., in Figure 2(b), the feature node with id 1 and instance nodes with ids 1-3 are grouped to form a tiny rectangular block on the diagonal area of the submatrix.

Incremental SVD Computation of FastPI
The reordered matrix A is divided into , where A 11 ∈ R m1×n1 , A 12 ∈ R m1×n2 , A 21 ∈ R m2×n1 , and A 22 ∈ R m2×n2 .Note that m 1 and n 1 are the number of spoke instance and feature nodes, respectively.m 2 and n 2 are the number of hub instance and feature nodes, respectively.

SVD Computation for A 11
This step computes the low-rank SVD result of A 11 .Note that A 11 is large and sparse, where many but small rectangular blocks are located at the diagonal area of A 11 as shown in Figure 3(e).In this case, SVD result of A 11 is efficiently obtained by computing SVD of each small block in A 11 instead of performing SVD on the whole submatrix.For ith-block is the low-rank approximated SVD with the target rank s i = αn 1i (let m 1i > n 1i without loss of generality).Then, the SVD result of A 11 is as follows: where B is the number of blocks and bdiag(•) is the function returning a rectangular block diagonal matrix with a valid block sequence.The obtained rank is s = B i=1 α n 1i ≈ αn 1 .Note that this is also a valid SVD result since U m1×s and V s×n1 are orthogonal matrices, and Σ s×s is diagonal, which follows the definition of SVD (Strang, 2006).

Incremental Update of the SVD result
The next step is to obtain the SVD result for [A 11 ; A 21 ], where ';' indicates a vertical concatenation.The SVD of [A 11 ; A 21 ] is calculated by incremental SVD (Brand, 2003;Ross et al., 2008) given the SVD of A 11 .The derivation for this incremental computation with the given target rank s = αn 1 is the followings: where Σ s×s = Σs×s , V s×n1 = Ṽ s×n1 , O is a zero matrix, and I is an identity matrix.Note that Ũ(s+m 2 )×s is orthogonal since the product of two orthogonal matrices is also orthogonal (Strang, 2006).Note also that any low-rank SVD algorithm can be used for this purpose; we use frPCA (Feng et al., 2018) for a given low target rank (r < 0.3n used), and the standard SVD otherwise since frPCA is optimized for very low ranks, and thus it is too slow for handling high ranks.
The final step is to incrementally update the SVD result in equation ( 2) for T = [A 12 ; A 22 ] with r = αn as follows: Low-rank approximation with r where Σ r×r = Σr×r , U m×r = Ũm×r , and V r×n = Ṽ r×(s+n 2 ) V s×n 1 O s×n 2 O n 2 ×n 1 I n 2 ×n 2 are also orthogonal.

Complexity Analysis
We analyze the computational complexity of Algorithm 1 in the following lemma: Lemma 1 (Computational Complexity of FastPI) Given a feature matrix A ∈ R m×n and a target rank r = αn , where α is the target rank ratio, the computational complexity of FastPI is O( prior to the final pseudoinverse construction (line 5 in Algorithm 1), where m 1 and n 1 are the number of spoke instance and feature nodes, respectively, m 2 and n 2 are the number of hub instance and feature nodes, respectively, B is the number of rectangular blocks, m 1i and n 1i are the height and the width of each rectangular block in A 11 , respectively, s i = αn 1i is the target rank of i-th block, |A| is the number of non-zeros of A, and T is the number of iterations in Algorithm 2.
Proof We summarize the complexity of each step of Algorithm 1 in Table 2.For this proof, we use the traditional complexity of the low-rank approximation as describe in (Gu and Eisenstat, 1996;Halko et al., 2011); for matrix A ∈ R p×q , the low-rank approximation takes O(pqk) time with target rank k.For a detailed comparison, we omit the cost of the final pseudoinverse construction (line 5 in Algorithm 1) because all SVD based methods should perform the construction as a common step.The complexity of each step of the algorithm is proved as follows: - -Line 3: in equation ( 2), the low-rank approximation takes O((m 2 +s)n 1 s) = O(m 2 n 1 s+n 1 s 2 ) time.The matrix multiplication for U m×s takes O(m 1 s 2 ) as follows: where Ũ(s+m 2 )×s = Ũs×s Ũm 2 ×s . Since s ≤ r, it is bounded by O(m 1 r 2 + n 1 r 2 + m 2 n 1 r) where s = αn 1 and r = αn and n 1 ≤ n. -Line 4: in equation ( 3), the low-rank approximation takes O(m(n 2 + s)r) = O(mn 2 r + msr), and the matrix multiplication takes O(n 1 s 2 ) as follows: The dominant factor is mr 2 in the complexity of the analysis of FastPI.As described in Section 2, O(mr 2 ) is faster than O(mnr) of traditional methods (Gu and Eisenstat, 1996).FastPI exhibits similar complexity to Randomized SVD, the state-of-the-art method with complexity of O(mr 2 + nr 2 + mn log(r)) (Halko et al., 2011).However, the actual running time of the Randomized SVD is slower than that of the FastPI for a reasonably high rank (see Figure 6).The reason is that Randomized SVD is based on oversampling  technique (see the details in Section 4.1); due to this point, Randomized SVD has a higher coefficient for the dominant factor compared to FastPI (i.e., Randomized SVD requires 4mr 2 operations while FastPI needs mr 2 ones for the same r).
Note that for a detailed comparison, we have omitted the cost of the final pseudoinverse construction (line 5 in Algorithm 1), which is a universal step in all SVD based methods.

Experiment
In this section, we aim to answer the following questions from experiments: -Q1.Reconstruction error (Section 4.2).Does FastPI correctly produce low-rank SVD results in terms of reconstruction error?-Q2.Accuracy (Section 4.3).How accurate is the pseudoinverse obtained by FastPI for the multi-label linear regression task compared to other methods?-Q3.Efficiency (Section 4.4).How quickly does FastPI compute the approximate pseudoinverse of sparse feature matrices compared to stateof-the-art methods?

Experimental Setting
Datasets.We use four real-world multi-label datasets, and their statistics are summarized in Table 3.The Bibtex dataset is from a social bookmarking system, where each instance consists of features from a bibtex item and labels are tags in the system (Katakis et al., 2008).The Eurlex dataset is from documents about European Union law, where each instance is formed by word features from a document and labels indicate categories (Mencia and Fürnkranz, 2008).The RCV dataset is randomly sampled from an archive of newswire stories made available by Reuters, Ltd., where each instance consists of features from a document, and labels are categories (Lewis et al., 2004).The Amazon dataset is randomly sampled from a set of reviews of Amazon, where each instance is formed by word features of a review and labels are items (McAuley and Leskovec, 2013).In Table 3, sparsity(A) indicates the sparsity of feature matrix A ∈ R m×n , which is defined as sp(A) = 1 − |A|/(mn), where |A| is the number of non-zero entries in A.
Machine and Implementation.We use a single thread in a machine with an Intel Xeon E5-2630 v4 2.2GHz CPU and 512GB RAM.All tested methods, including our proposed FastPI, are implemented using MATLAB which provides a state-of-the-art linear algebra package.
Competing Methods.We use the following methods as the competitors of FastPI: -RandPI is based on Randomized SVD (Halko et al., 2011), the state-ofthe-art low-rank SVD method for target rank r = αn using an oversampling technique for numerical stability.The main procedure of RandPI is as follows: - Ũ2r×r Σ r×r V r×n .
-Step 4: It computes U m×r = Q m×2r Ũ2r×r .-KrylovPI is based on a Krylov subspace iterative method for the lowrank SVD method (Baglama and Reichel, 2005).KrylovPI is specialized for computing a few singular values and vectors on a sparse matrix (used in svds of MATLAB).-frPCA (Feng et al., 2018) combines the randomized SVD and a power iteration method so that it controls trade-off between running time and accuracy for computing SVD of sparse data.This also exploits LU decomposition in the power iteration to improve accuracy.We set the oversampling parameter s to 5 and the number of iterations to 11 as in (Feng et al., 2018).Fig. 5: Accuracy of the multi-label linear regression task (Application 1) in terms of P@3 varying target rank ratio α (Section 4.3).Note that accuracies of all tested methods are almost the same for each α, implying that FastPI accurately computes the approximate pseudoinverse as other methods.

Reconstruction Error
We analyze the reconstruction error of the SVD result computed by each method to check if it computes the SVD result accurately.The reconstruction error of the SVD result U m×r Σ r×r V r×n from the original matrix A is defined as A − U m×r Σ r×r V r×n F , where r = αn is the target rank and • F is the Frobenius norm.We measure the error of each method varying the target rank ratio α from 0.01 to 1.0, respectively.
Figure 4 demonstrates the reconstruction error of all tested methods.The error of our FastPI is slightly better than that of RandPI, which is the state-ofthe-art SVD for low-rank SVD computation, especially when α is low.Another point is that the reconstruction error of our method is almost the same as that of KrylovPI.These show that reconstruction-wise, our method is near-optimal given rank ratio α for the SVD computation.

Accuracy of Multi-label Linear Regression
We examine the quality of the approximate pseudoinverse by measuring the predictive performance of each method in the multi-label linear regression task as described in the Application 1.For each experiment, we randomly split a multi-label dataset into a training set (90%) and a test set (10%); and compute the pseudoinverse on the training set varying the target rank ratio α from 0.01 to 1.0.Note that in the multi-label datasets, each instance has only a few positive (1) labels, i.e., the label matrix Y is sparse as shown in Table 3.Therefore, it is important to focus on the accurate prediction of the few positive labels.Due to this reason, many researchers (Chen and Lin, 2012;Prabhu and Varma, 2014;Yu et al., 2014) have evaluated the performance of this task using ranking based measures such as top-k precision, denoted by P@k, based on predicted scores.We also measure P@k as the accuracy of this task on the test set.Let y ∈ {0, 1} L be a ground truth label vector and ŷ ∈ R L be its predictive score vector, where L is the number of labels.Then, P@k = 1 k l∈rank k (ŷ) y l , where rank k (ŷ) returns the k largest indices of ŷ ranked in the descending order.
Figure 5 demonstrates the accuracy of each method for the task in terms of P@3.As expected, the accuracy depends on the rank ratio and there is little difference between the compared methods, which serves to verify our proposed FastPI is correctly derived.However, an interesting observation is made about the appropriate rank ratio for uses in multi-linear regression tasks: accuracy plots are curved indicating that there are underfittings when α are too small, and overfittings when α are too large.This implies that such lowrank approximation with a relatively large rank is effective in the viewpoint of the machine learning application.

Computational Performance
We investigate the computational performance of each method in terms of running time.For each experiment, we measure the wall-clock time of FastPI, RandPI, and KrylovPI varying the target rank ratio α from 0.01 to 1.0.
Figure 6 demonstrates the running time of methods on each dataset.First, the running time of KrylovPI, an iterative method, skyrockets since it requires more iterations for convergence as α increases.KrylovPI is specialized for computing a very few largest or smallest singular values and vectors on a large and sparse matrix (e.g., α = 0.01).Thus, it is impractical to use KrylovPI for obtaining relatively high-rank SVD results as shown in Figure 6.
We next compare FastPI to RandPI.As shown in Figure 6, FastPI is faster than RandPI over all datasets.Especially, the performance of RandPI becomes much worse as the target rank r = αn increases.The main reason is because of the oversampling technique of RandPI, i.e., it needs to perform several matrix operations on m × 2r matrices (see the details in Section 4.1).If r is very small, their computations are efficient.However, if r is large, and it is close to n, then RandPI needs to handle up to m × 2n matrices whose size is twice the size of original, thereby slowing down the execution speed for large α or high rank.
Finally, we look into the comparison between FastPI and frPCA.Overall, the performance of FastPI is better than that of frPCA, especially in the Eurlex and Bibtex datasets.In the Amazon and RCV datasets, although the running time of FastPI is similar to or slightly slower than frPCA for r ≤ 0.6, FastPI is faster than frPCA for r > 0.6.These results indicate that FastPI is competitive with frPCA for low ranks, and it is more efficient than frPCA for high ranks.

Conclusion
In this paper, we have shown how feature matrix reordering can speed up the approximate pseudoinverse calculation faster than the state-of-the-art lowrank approximation method for computing pseudoinverses.Our proposed approach FastPI (Fast PseudoInverse) is based on a crucial observation that many real-world feature matrices are considerably sparse and skewed, which have the possibility of being reordered.FastPI reorders the feature matrix such that the reordered matrix contains a large and sparse block diagonal matrix whose SVD is easily computed.FastPI then efficiently computes the approximate pseudoinverse through incremental low-rank SVD updates.We applied the FastPI on multi-label linear regression problem, a type of machine learning problem.Our experiments demonstrate that our FastPI computes SVD based pseudoinverse quickly for sufficiently large ranks compared to other methods.

Fig. 1 :
Fig.1: Degree distributions of instance and feature nodes in a bipartite network derived from real-world feature matrices.Note that there are few high degree nodes while the majority of nodes have low degrees, implying skewness on the degree distributions.
Fig.2:A bipartite network after one iteration of Algorithm 2, where a square indicates a feature node and a circle indicates an instance node.FastPI assigns the highest id to the feature node (id 7) and instance node (id 9), respectively.The nodes in spokes get the lowest ids and the GCC receives the remaining ids.We remove multiple hub nodes in each iteration of the algorithm to sufficiently shatter the graph.

Fig. 3 :
Fig. 3: The matrix reordering process of FastPI on the feature matrix of the Amazon dataset.(a) depicts the original matrix, (b-d) are the reordered matrix after several iterations in Algorithm 2, and (e) is the matrix after the final iteration.As shown in (e), the non-zero entries of the feature matrix are concentrated by the matrix reordering such that it is divided into four submatrices where A 11 is a large and sparse rectangular block diagonal matrix.
Step 1: It generates a Gaussian over-sampled random matrix X n×2r for constructing randomized data matrix B m×2r = AX n×2r .-Step 2: It finds a matrix Q m×2r with orthonormal columns as a proxy of an orthogonal matrix from B m×2r satisfying A QQ A. -Step 3: It constructs Y 2r×n = Q 2r×m A and compute SVD of Y 2r×n

Fig. 4 :
Fig.4: Reconstruction error of the SVD result of each method varying target rank ratio α (Section 4.2).Note that the error of our FastPI is almost the same as that of KrylovPI, indicating that our method computes the SVD result near optimally for any α. 0

Table 1 :
Table of symbols.
U m 1 ×s Σ s×s V s×n 1 ← compute the SVD result for A 11 with target rank s = αn 1 Line 1: for each iteration, FastPI sorts the degrees of instance and feature nodes; thus, it requires up to O(m log(m)) since m > n.Then, it searches connected components in G using the breadth first search (BFS) algorithm in O(|A|) indicating the number of edges in the network.Hence, each iteration demands O(m log(m) + |A|) time.-Line 2: FastPI computes the low-rank approximated SVD of each rectangular block in O(m 1i n 1i s i ) with target rank s i = αn 1i ; thus, it is O(

Table 3 :
Dataset statistics.m is the number of instances (rows), and n is the number of features (columns) of feature matrix A. L is the number of labels of label matrix Y. |A| is the number of non-zero entries of A, and sp(A) is the sparsity of A defined in Section 4.1.k is the hub selection ratio of Algorithm 2. m 2 and n 2 are the number of instance and feature hub nodes, respectively.