
Distance Preserving Model Order Reduction of Graph-Laplacians and Cluster Analysis


Abstract

Graph-Laplacians and their spectral embeddings play an important role in multiple areas of machine learning. This paper focuses on graph-Laplacian dimension reduction for the spectral clustering of data as the primary application; however, the approach also applies to data mining, data manifold learning, etc. Spectral embedding provides a low-dimensional parametrization of the data manifold, which makes the subsequent task (e.g., clustering with k-means or any of its approximations) much easier. However, despite the reduced dimensionality of the data, the overall computational cost may still be prohibitive for large data sets due to two factors. First, computing the partial eigendecomposition of the graph-Laplacian typically requires a large Krylov subspace. Second, after the spectral embedding is complete, one still has to operate with the same number of data points, which may ruin the efficiency of the approach. For example, clustering of the embedded data is typically performed with various relaxations of k-means whose computational cost scales poorly with the size of the data set. They also become prone to getting stuck in local minima, so their robustness depends on the choice of initial guess. In this work, we switch the focus from the entire data set to a subset of graph vertices (the target subset). We develop two novel algorithms for a low-dimensional representation of the original graph that preserves important global distances between the nodes of the target subset. In particular, this ensures that clustering of the target subset is consistent with the spectral clustering of the full data set, were the latter to be performed. That is achieved by a properly parametrized reduced-order model (ROM) of the graph-Laplacian that accurately approximates the diffusion transfer function of the original graph for inputs and outputs restricted to the target subset. Working with a small target subset greatly reduces the required dimension of the Krylov subspace and allows the conventional algorithms (such as approximations of k-means) to be used in the regimes where they are most robust and efficient. This was verified in numerical clustering experiments with both synthetic and real data. We also note that our ROM approach can be applied in a purely transfer-function-data-driven way, so it becomes the only feasible option for extremely large graphs that are not directly accessible. There are several uses for our algorithms. First, they can be employed on their own for representative-subset clustering in cases when handling the full graph is either infeasible or simply not required. Second, they may be used for quality control. Third, as they drastically reduce the problem size, they enable the application of more sophisticated algorithms for the task under consideration (such as more powerful approximations of k-means based on semi-definite programming (SDP) instead of the conventional Lloyd's algorithm). Finally, they can be used as building blocks of a multi-level divide-and-conquer type algorithm to handle the full graph. The latter will be reported in a separate article.


Notes

  1. In future work we plan to investigate the use of more powerful tangential multi-point Padé approximations, e.g., see [8, 25].

  2. Vectors \({{\mathbf {w}}}_k\) need only be computed at the sampling points, and only the rows \({{\mathbf {e}}}_{q_j}^T{{\mathbf {Q}}}_1\) need to be stored, which leads to significant savings in Algorithm A.2.

References

  1. Antoulas, A.C., Sorensen, D.C., Gugercin, S.: A survey of model reduction methods for large-scale systems. Contemp. Math. 280, 193–219 (2001)

  2. Arioli, M., Benzi, M.: A finite element method for quantum graphs. IMA J. Numer. Anal. 38(3), 1119–1163 (2018)

  3. Arthur, D., Vassilvitskii, S.: K-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’07, pp. 1027–1035. Society for Industrial and Applied Mathematics, Philadelphia, PA (2007)

  4. Asvadurov, S., Druskin, V., Knizhnerman, L.: Application of the difference Gaussian rules to solution of hyperbolic problems. J. Comput. Phys. 158(1), 116–135 (2000)

  5. Asvadurov, S., Druskin, V., Knizhnerman, L.: Application of the difference Gaussian rules to solution of hyperbolic problems: II. Global expansion. J. Comput. Phys. 175(1), 24–49 (2002)

  6. Bai, Z.: Krylov subspace techniques for reduced-order modeling of large-scale dynamical systems. Appl. Numer. Math. 43(1), 9–44 (2002). (19th Dundee Biennial Conference on Numerical Analysis)

  7. Baker, G.A., Graves-Morris, P.R.: Padé Approximants, 2nd edn. Cambridge University Press (1996)

  8. Beattie, C.A., Drmač, Z., Gugercin, S.: Quadrature-based IRKA for optimal H2 model reduction. IFAC-PapersOnLine 48(1), 5–6 (2015). (8th Vienna International Conference on Mathematical Modelling)

  9. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15, 1373–1396 (2003)

  10. Belkin, M., Niyogi, P.: Towards a theoretical foundation for Laplacian-based manifold methods. J. Comput. Syst. Sci. 74, 1289–1308 (2008)

  11. Borcea, L., Druskin, V., Mamonov, A., Zaslavsky, M.: A model reduction approach to numerical inversion for a parabolic partial differential equation. Inverse Probl. 30(12), 125011 (2014)

  12. Borcea, L., Druskin, V., Knizhnerman, L.: On the continuum limit of a discrete inverse spectral problem on optimal finite difference grids. Commun. Pure Appl. Math. 58(9), 1231–1279 (2005)

  13. Cheng, X., Kawano, Y., Scherpen, J.M.A.: Graph structure-preserving model reduction of linear network systems. In: 2016 European Control Conference (ECC), pp. 1970–1975 (2016)

  14. Chung, F.R.K.: Spectral Graph Theory. American Mathematical Society (1997)

  15. Coifman, R.R., Lafon, S.: Diffusion maps. Appl. Comput. Harmon. Anal. 21(1), 5–30 (2006). (Special Issue: Diffusion Maps and Wavelets)

  16. Damle, A., Minden, V., Ying, L.: Robust and efficient multi-way spectral clustering. CoRR. arXiv:1609.08251 (2016)

  17. Ding, C.H.Q., He, X., Zha, H., Gu, M., Simon, H.D.: A min–max cut algorithm for graph partitioning and data clustering. In: Proceedings IEEE International Conference on Data Mining, 2001. ICDM 2001, pp. 107–114. IEEE (2001)

  18. Dirac, P.A.M.: Bakerian lecture. The physical interpretation of quantum mechanics. Proc. R. Soc. Lond. Ser. A 180, 1–40 (1942)

  19. Druskin, V., Güttel, S., Knizhnerman, L.: Compressing Variable-Coefficient Exterior Helmholtz Problems via RKFIT. University of Manchester, Manchester Institute for Mathematical Sciences (2016)

  20. Druskin, V., Knizhnerman, L.: Two polynomial methods of calculating functions of symmetric matrices. USSR Comput. Math. Math. Phys. 29(6), 112–121 (1989)

  21. Druskin, V., Knizhnerman, L.: Krylov subspace approximation of eigenpairs and matrix functions in exact and computer arithmetic. Numer. Linear Algebra Appl. 2(3), 205–217 (1995)

  22. Druskin, V., Knizhnerman, L.: Gaussian spectral rules for the three-point second differences: I. A two-point positive definite problem in a semi-infinite domain. SIAM J. Numer. Anal. 37(2), 403–422 (1999)

  23. Druskin, V., Knizhnerman, L.: Gaussian spectral rules for second order finite-difference schemes. Numer. Algorithms 25(1–4), 139–159 (2000)

  24. Druskin, V., Mamonov, A., Zaslavsky, M.: Multiscale s-fraction reduced-order models for massive wavefield simulations. Multiscale Model. Simul. 15(1), 445–475 (2017)

  25. Druskin, V., Simoncini, V., Zaslavsky, M.: Adaptive tangential interpolation in rational Krylov subspaces for MIMO dynamical systems. SIAM J. Matrix Anal. Appl. 35(2), 476–498 (2014)

  26. Dyukarev, Y.M.: Indeterminacy criteria for the Stieltjes matrix moment problem. Math. Notes 75(1–2), 66–82 (2004)

  27. Fan, L., Shuman, D.I., Ubaru, S., Saad, Y.: Spectrum-adapted polynomial approximation for matrix functions. arXiv preprint arXiv:1808.09506 (2018)

  28. Flake, G.W., Tarjan, R.E., Tsioutsiouliklis, K.: Graph clustering and minimum cut trees. Internet Math. 1(4), 385–408 (2004)

  29. Ingerman, D., Druskin, V., Knizhnerman, L.: Optimal finite difference grids and rational approximations of the square root I. Elliptic problems. Commun. Pure Appl. Math. 53(8), 1039–1066 (2000)

  30. Johnson, E.L., Mehrotra, A., Nemhauser, G.L.: Min-cut clustering. Math. Program. 62(1–3), 133–151 (1993)

  31. Knyazev, A.V.: Signed Laplacian for spectral clustering revisited. CoRR. arXiv:1701.01394 (2017)

  32. Lehoucq, R.B., Sorensen, D.C., Yang, C.: ARPACK Users' Guide: Solution of Large-Scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM, Philadelphia (1998)

  33. Leskovec, J., Krevl, A.: SNAP datasets: Stanford large network dataset collection. http://snap.stanford.edu/data (2014)

  34. Newman, M.E.J.: Equivalence between modularity optimization and maximum likelihood methods for community detection. Phys. Rev. E 94(5), 052315 (2016)

  35. Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: NIPS (2001)

  36. Peng, J., Wei, Y.: Approximating k-means-type clustering via semidefinite programming. SIAM J. Optim. 18(1), 186–205 (2007)

  37. Reichel, L., Rodriguez, G., Tang, T.: New block quadrature rules for the approximation of matrix functions. Linear Algebra Appl. 502, 299–326 (2016)

  38. Shi, J., Malik, J.: Normalized cuts and image segmentation. In: CVPR (1997)

  39. Shi, P., He, K., Bindel, D., Hopcroft, J.: Local Lanczos spectral approximation for community detection. In: Proceedings of ECML-PKDD (2017)

  40. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007)

  41. Wu, Y., Xu, J., Hajek, B.: Achieving exact cluster recovery threshold via semidefinite programming under the stochastic block model. In: 2015 49th Asilomar Conference on Signals, Systems and Computers, pp. 1070–1074 (2015)

  42. Xie, W., Bindel, D., Demers, A., Gehrke, J.: Edge-weighted personalized PageRank: breaking a decade-old performance barrier. In: Proceedings of ACM KDD 2015 (2015)

Acknowledgements

This material is based upon research supported in part by the U.S. Office of Naval Research under award number N00014-17-1-2057 to Mamonov. Mamonov was also partially supported by the National Science Foundation Grant DMS-1619821. Druskin acknowledges support by Druskin Algorithms and by the Air Force Office of Scientific Research under award number FA9550-20-1-0079. The authors thank Andrew Knyazev, Alexander Lukyanov, Cyrill Muratov and Eugene Neduv for useful discussions.

Author information

Correspondence to Alexander V. Mamonov.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Two-Stage Model Reduction Algorithm

Here we introduce an algorithm for computing the orthonormal basis for \({{\mathcal {K}}}_{k_2}[{{{\mathbf {A}}}^\dagger },{{\mathbf {B}}}]\) defined in Proposition 3.2, along with all the quantities needed for the reduced-order transfer function (3.13) and the state-space embedding (3.21) (e.g., satisfying Assumption 3.1), via projection of the normalized symmetric graph-Laplacian \({{\mathbf {A}}}\) consecutively onto two Krylov subspaces. This approach follows the methodology of [24] for multiscale model reduction of the wave propagation problem.

At the first stage we use the deflated block-Lanczos process, Algorithm C.1, to compute an orthogonal matrix \({{\mathbf {Q}}}_1 \in {\mathbb {R}}^{N \times n_1}\), the columns of which span the block Krylov subspace

$$\begin{aligned} {{\mathcal {K}}}_{k_1}({{\mathbf {A}}}, {{\mathbf {B}}}) = \text{ colspan } \{{{\mathbf {B}}}, {{\mathbf {A}}}{{\mathbf {B}}}, \ldots , {{\mathbf {A}}}^{k_1-1} {{\mathbf {B}}}\}. \end{aligned}$$
(6.1)

After projecting onto \({{{\mathcal {K}}}}_{k_1}({{\mathbf {A}}}, {{\mathbf {B}}})\), the normalized symmetric graph-Laplacian takes the deflated block tridiagonal form

$$\begin{aligned} {{\mathbf {T}}}_1 = {{\mathbf {Q}}}_1^T {{\mathbf {A}}}{{\mathbf {Q}}}_1 \in {\mathbb {R}}^{n_1 \times n_1}, \end{aligned}$$
(6.2)

as detailed in Appendix C. Note that the input/output matrix is transformed simply to

$$\begin{aligned} {{\mathbf {E}}}_1 = {{\mathbf {Q}}}_1^T {{\mathbf {B}}}\in {\mathbb {R}}^{n_1\times m}. \end{aligned}$$
(6.3)

Observe also that \(n_1 = \text{ dim }[ {{\mathcal {K}}}_{k_1}({{\mathbf {A}}}, {{\mathbf {B}}}) ]\), the number of columns of \({{\mathbf {Q}}}_1\), satisfies \(n_1 \le k_1 m\) with a strict inequality in case of deflation.

Remark A.1

Since the input/output matrix \({{\mathbf {B}}}\) is supported at the vertices in the target subset \(G_m\), repeated applications of \({{\mathbf {A}}}\) cannot propagate \({{\mathbf {B}}}\) outside of the connected components of the graph that contain \(G_m\). Therefore, the support of the columns of \({{\mathbf {Q}}}_1\) is included in these connected components. As a result, projection (6.2) is only sensitive to the entries of \({{\mathbf {A}}}\) corresponding to graph vertices that can be reached from \(G_m\) with a path of at most \(k_1-1\) steps.
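A small sketch of this locality property, assuming the sparsity pattern of \({{\mathbf {A}}}\) coincides with the graph adjacency (plus the diagonal): computing the set of vertices reachable from \(G_m\) in at most \(k_1-1\) steps identifies the only rows and columns of \({{\mathbf {A}}}\) that can influence \({{\mathbf {T}}}_1\). The routine name and the masking convention below are illustrative, not part of the paper.

```python
import numpy as np
import scipy.sparse as sp

def reachable_within(A, target_idx, k):
    """Boolean mask of vertices reachable from target_idx in at most k edge steps,
    judged from the sparsity pattern of A (assumed to match the graph adjacency)."""
    pattern = (A != 0).astype(np.int8)
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[np.asarray(target_idx)] = True
    for _ in range(k):
        mask |= np.asarray(pattern @ mask.astype(np.int8)).ravel().astype(bool)
    return mask

# Only the submatrix A[mask][:, mask], with mask = reachable_within(A, G_m, k1 - 1),
# can affect the Stage 1 projection T_1 = Q_1^T A Q_1.
```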

The number of block-Lanczos steps \(k_1\) is chosen to attain the desired accuracy of the approximation of \({{\mathbf {Z}}}\) (the projection of \({{\mathbf {B}}}\) onto the nullspace of \({{\mathbf {A}}}\)) and of the requested lower eigenmodes via \({{\mathbf {Q}}}_1\) (i.e., to satisfy Assumption 3.1), which also gives a good approximation of the diffusion transfer function on the entire time interval.

While the first stage provides a certain level of graph-Laplacian compression, the approximation considerations presented above may force the number of block-Lanczos steps \(k_1\), and hence the resulting subspace dimension \(n_1\), to be relatively large. This stage corresponds to a Padé approximation of the transfer function \({{\mathcal {L}}}{{{\mathbf {F}}}}\) at \(\lambda =\infty \), whereas we are interested in \(\lambda =0\), as in (3.13), to obtain a good approximation in the lower part of the spectrum. Therefore, our approach includes a second stage to compress the ROM even further. This is achieved by another application of the deflated block-Lanczos process to construct an approximation to \({{\mathbf {Q}}}_{12}\) from Proposition 3.2.

Let the columns of matrix \({{\mathbf {Z}}}_1\in {\mathbb {R}}^{n_1\times m_0}\) form an orthonormal basis for the nullspace of \({{{\mathbf {T}}}}_1\). Then we apply the deflated block-Lanczos Algorithm from Appendix C to compute an orthogonal matrix \({{\mathbf {Q}}}_2 \in {\mathbb {R}}^{n_1 \times n}\) such that

$$\begin{aligned} \text{ colspan }\left( {{\mathbf {Q}}}_2\right) = {{\mathcal {K}}}_{k_2}\left[ {{\mathbf {T}}}_{1}^{-1},\left( {{\mathbf {I}}}-{{\mathbf {Z}}}_1{{\mathbf {Z}}}_1^T\right) {{\mathbf {E}}}_1\right] , \end{aligned}$$
(6.4)

where compression is achieved by choosing \(k_2 < k_1\). The total dimension n satisfies (3.17) with a strict inequality in case of deflation.

For large enough \(k_1\), matrices

$$\begin{aligned} {{\mathbf {Z}}}\approx {{\mathbf {Q}}}_1{{\mathbf {Z}}}_1, \quad {{\mathbf {Q}}}\approx {{\mathbf {Q}}}_1{{\mathbf {Q}}}_2, \quad {{\mathbf {Q}}}_{12} \approx [{{\mathbf {Q}}}, {{\mathbf {Z}}}] \end{aligned}$$
(6.5)

and

$$\begin{aligned} {\widetilde{{\mathbf {A}}}}_{12} = {{\mathbf {Q}}}_{12}^T {{\mathbf {A}}}{{\mathbf {Q}}}_{12} \approx \left[ {{\mathbf {Q}}}_2, {{\mathbf {Z}}}_1\right] ^T {{\mathbf {T}}}_1 \left[ {{\mathbf {Q}}}_2, {{\mathbf {Z}}}_1\right] , \end{aligned}$$
(6.6)

approximate their counterparts from Proposition 3.2 and thus we obtain \({{\widetilde{\lambda }}}_j\) and \({\widetilde{\mathbf{s}}}_j\in {\mathbb {R}}^n\) as the eigenpairs of \({{\mathbf {T}}}\) and also

$$\begin{aligned} {{\mathbf {w}}}_j ={{\mathbf {Q}}}_{12}{\widetilde{\mathbf{s}}}_j. \end{aligned}$$
(6.7)

We summarize the model reduction algorithm below.

Algorithm A.2

(Two-stage model reduction).

Input: normalized symmetric graph-Laplacian \({{\mathbf {A}}}\in {\mathbb {R}}^{N \times N}\), target subset \(G_m\), numbers of Lanczos steps \(k_1\), \(k_2\) for the first and second stage deflated block-Lanczos processes, respectively, and the truncation tolerance \(\varepsilon \).

Output: \(n\le N\), \(m_0\le m\), \({\widetilde{{\mathbf {A}}}}_{12}\), \({\widetilde{\mathbf{s}}}_j\) and \({{\mathbf {w}}}_j\) for \(j=1,\ldots , n\).

Stage 1: Form the input/output matrix \({{\mathbf {B}}}\) (2.5) and perform the deflated block-Lanczos process with \(k_1\) steps on \({{\mathbf {A}}}\) and \({{\mathbf {B}}}\) with truncation tolerance \(\varepsilon \), as described in Appendix C, to compute the orthonormal basis \({{\mathbf {Q}}}_1 \in {\mathbb {R}}^{N \times n_1}\) for the block Krylov subspace (6.1) and the deflated block tridiagonal matrix \({{\mathbf {T}}}_1\). Compute \(m_0\), \({{\mathbf {Z}}}_1\).

Stage 2: Perform \(k_2\) steps of the deflated block-Lanczos process using the matrix \({{\mathbf {T}}}_{1}^{-1}\) and the initial block \(({{\mathbf {I}}}-{{\mathbf {Z}}}_1{{\mathbf {Z}}}_1^T){{\mathbf {E}}}_1\), with \({{\mathbf {E}}}_1 \in {\mathbb {R}}^{n_1 \times m}\) and truncation tolerance \(\varepsilon \), as described in Appendix C, to compute n and the orthogonal matrix \({{\mathbf {Q}}}_2 \in {\mathbb {R}}^{n_1 \times n}\). Compute the remaining elements of the output using (6.5)–(6.7).
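The sketch below mirrors Algorithm A.2 in NumPy under simplifying assumptions: `deflated_block_lanczos` stands for a routine implementing Algorithm C.1 (a sketch of it is given at the end of Appendix C), the nullspace basis \({{\mathbf {Z}}}_1\) is extracted from the eigenvalues of \({{\mathbf {T}}}_1\) below a tolerance, and the pseudo-inverse of \({{\mathbf {T}}}_1\) stands in for \({{\mathbf {T}}}_{1}^{-1}\) acting on the orthogonal complement of the nullspace. Names and tolerances are illustrative, not taken from the paper.

```python
import numpy as np

def two_stage_rom(A, B, k1, k2, tol=1e-10):
    """Illustrative sketch of Algorithm A.2 (two-stage model reduction).
    Assumes B has orthonormal columns and deflated_block_lanczos implements
    Algorithm C.1, returning an orthonormal basis and the projected matrix."""
    # Stage 1: project A onto the block Krylov subspace K_{k1}(A, B), eqs. (6.1)-(6.3).
    Q1, T1 = deflated_block_lanczos(A, B, k1, tol)     # Q1: N x n1, T1: n1 x n1
    E1 = Q1.T @ B                                      # eq. (6.3)

    # Orthonormal basis Z1 of the (numerical) nullspace of T1.
    lam, V = np.linalg.eigh(T1)
    Z1 = V[:, np.abs(lam) < tol]                       # n1 x m0

    # Stage 2: block Krylov subspace of (pseudo-inverted) T1 started from the
    # deflated block (I - Z1 Z1^T) E1, eq. (6.4); the start block is re-orthonormalized.
    E1_defl = E1 - Z1 @ (Z1.T @ E1)
    C2, _ = np.linalg.qr(E1_defl)
    Q2, _ = deflated_block_lanczos(np.linalg.pinv(T1), C2, k2, tol)

    # Reduced operator and embedding vectors, eqs. (6.5)-(6.7).
    Q12_red = np.hstack([Q2, Z1])                      # [Q2, Z1] in eq. (6.6)
    A12_tilde = Q12_red.T @ T1 @ Q12_red               # eq. (6.6)
    lam_red, S = np.linalg.eigh(A12_tilde)             # reduced eigenpairs
    W = (Q1 @ Q12_red) @ S                             # columns approximate w_j, eqs. (6.5), (6.7)
    return A12_tilde, lam_red, S, W
```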

Remark A.3

Due to the good compression properties of Krylov subspaces, \(n \ll n_1 \ll N\); thus, the computational cost of Algorithm A.2 is dominated by the first-stage block-Lanczos process. In turn, assuming that no deflation occurs and each column of \({{\mathbf {A}}}\) has on average M nonzero entries, the cost of the first stage is driven by the matrix products of \({{\mathbf {A}}}\) with the blocks of \({{\mathbf {Q}}}_1\) (containing m columns each, see step (2a) of Algorithm C.1). Since \(k_1\) such products are computed, the computational cost of the first stage, and of the whole Algorithm A.2, can be estimated as \(O(k_1 M N m)\). Note that this analysis excludes the expensive reorthogonalization step (2j) of Algorithm C.1, which we do not perform in Stage 1 of Algorithm A.2, as mentioned in Appendix C.

To illustrate the compression properties of both stages of Algorithm A.2, we display in Fig. 9 the error of the transfer function for both Lanczos processes, corresponding to the late, nullspace-dominated part of the diffusion curve, for the AstroPhysics collaboration network data set with \(N=18{,}872\) described in Sect. 5.3, with \(m=20\). For the first stage we plot the dependence of the error on \(k_1\) for \(k_2=15\). The second stage was performed using \({{\mathbf {T}}}_1\) with \(k_1 = 30\), which adds a relative error of about \(10^{-13}\). Both curves exhibit superlinear (in logarithmic scale) convergence, in agreement with the bounds of [20, 21]. Even without accounting for deflation, the first stage provides more than 30-fold compression of the full graph, and due to the much faster convergence of \({{\mathbf {F}}}_2\), the second stage provides more than two-fold additional compression.

Fig. 9

Relative errors of Stage 1 and Stage 2 deflated block-Lanczos processes in Algorithm A.2 versus the numbers of Lanczos steps \(k_1\) and \(k_2\), respectively, for AstroPhysics collaboration network data set

The choice of parameters \(k_1\) and \(k_2\) depends strongly on the graph structure, and is normally made adaptively by a posteriori error control, e.g., by extrapolating the error from three consecutive iterations. For some scenarios we do not even need to perform Stage 2 of Algorithm A.2. For example, let us consider a family of graph Laplacians \({{\mathbf {L}}}\in {\mathbb {R}}^{N \times N}\) with random entries \(L_{ij} \in \{0, -1\}\), \(i \ne j\), chosen with probability

$$\begin{aligned} p \left( \{ L_{ij} = -1 \} \right) = 0.01, \quad i \ne j, \quad i,j = 1,\ldots ,N, \end{aligned}$$
(6.8)

where \(N=3000\), 6000, 12,000, 24,000 and 48,000.
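The following sketch shows one way to generate this family and to compute the Fiedler eigenvalue reported in the right plot of Fig. 10. It assumes a symmetrized edge pattern and a diagonal chosen so that every row of \({{\mathbf {L}}}\) sums to zero, and it takes the Fiedler value to be the second-smallest eigenvalue of the unnormalized Laplacian; these conventions are illustrative choices.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def random_graph_laplacian(N, p=0.01, seed=0):
    """Random graph Laplacian from the family (6.8): off-diagonal entries equal -1
    with probability p (pattern symmetrized), diagonal set for zero row sums."""
    rng = np.random.default_rng(seed)
    upper = sp.random(N, N, density=p, random_state=rng,
                      data_rvs=lambda size: np.ones(size))
    W = sp.triu(upper, k=1)          # strictly upper-triangular 0/1 pattern
    W = W + W.T                      # symmetric adjacency
    deg = np.asarray(W.sum(axis=1)).ravel()
    return sp.diags(deg) - W         # L = D - W, so L_ij in {0, -1} for i != j

for N in (3000, 6000):               # the experiment goes up to N = 48,000
    L = random_graph_laplacian(N).tocsr()
    # Fiedler eigenvalue: second-smallest eigenvalue of L (the smallest is 0)
    vals = eigsh(L, k=2, which='SM', return_eigenvectors=False)
    print(N, np.sort(vals)[-1])
```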

Fig. 10

Absolute errors of Stage 1 Lanczos process in Algorithm A.2 versus the numbers of Lanczos steps \(k_1\) for the family of random graph Laplacians (6.8) (left). Fiedler eigenvalue for the family of random graph Laplacians (right)

We show in the left plot of Fig. 10 how the \(l_2\) error of the global diffusion on the graph from a randomly chosen single node (a.k.a. the probability cloud) at a late time (here \(t=10\)) depends on \(k_1\) in the Stage 1 Lanczos process of Algorithm A.2. We also show an a priori error bound obtained via the Chebyshev series decomposition of the graph-Laplacian exponential [20]. This bound is uniform for all matrices with a given spectral interval (e.g., [0, 2] for normalized random-walk graph-Laplacians), and it is tight for large matrices with spectrum densely distributed over that interval.

Note that the convergence of the Lanczos process (as well as the error bound) slows down monotonically with time, hence late times give the worst-case scenario. As we observe in Fig. 10, the actual convergence rate is significantly faster than the bound, and for the family of graph Laplacians (6.8) we do not benefit from Stage 2 of Algorithm A.2. Also, surprisingly, the error decays faster for larger graph Laplacians. This is caused by the increase of the Fiedler eigenvalue with the size of the graph Laplacian within the family (6.8), as shown in the right plot of Fig. 10. Indeed, this eigenvalue determines the late-time asymptotics of the diffusion process on a graph, and the larger it is, the fewer steps the Lanczos process needs to converge.

Appendix B: Interpretation in Terms of Finite-Difference Gaussian Rules

To connect the clustering approaches presented here to the so-called finite-difference Gaussian rules, a.k.a. optimal grids [4, 5, 12, 22, 23, 29], we view the random-walk normalized graph-Laplacian \({{\mathbf {L}}}_{RW}\) as a finite-difference approximation of the positive-semidefinite elliptic operator

$$\begin{aligned} {{\mathcal {L}}} u(x) = - \frac{1}{\sigma (x)} \nabla \cdot \left[ \sigma (x) \nabla u(x) \right] , \end{aligned}$$
(6.9)

on a grid, uniform in some sense, defined on the data manifold, e.g., see [9]. Note that since we assume the grid to be “uniform”, all the variability of the weights of \({{\mathbf {L}}}_{RW}\) is absorbed into the coefficient \(\sigma (x) > 0\).

For simplicity, following the setting of [11], let us consider the single input/single output (SISO) 1D diffusion problem on \(x \in [0,1]\), \(t \in (0, \infty )\):

$$\begin{aligned} u_t(x, t) - \frac{1}{\sigma (x)} [\sigma (x) u_x(x, t)]_x = 0, \quad u(x, 0) = \delta (x), \quad u_x(0, t) = 0, \quad u_x(1, t) = 0, \end{aligned}$$
(6.10)

with a regular enough \(\sigma (x) > 0\), and the diffusion transfer function defined as

$$\begin{aligned} F(t) = u(0, t). \end{aligned}$$
(6.11)

Since both input and output are concentrated at \(x = 0\), the “target set” consists of a single “vertex” corresponding to \(x = 0\). Therefore, it does not make sense to talk about clustering; however, we can still use the SISO dynamical system (6.10)–(6.11) to give a geometric interpretation of the embedding properties of our reduced model and to provide the reasoning for Assumption 3.3.
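To make (6.10)–(6.11) concrete, here is a small finite-difference sketch that evaluates \(F(t)\) by exponentiating a standard three-point discretization of the operator in (6.10); the grid, the choice of \(\sigma\), and the dense matrix exponential are illustrative and unrelated to the ROM construction itself.

```python
import numpy as np
from scipy.linalg import expm

def transfer_function(sigma, N=200, t=1.0):
    """F(t) = u(0, t) for the 1D Neumann diffusion problem (6.10)-(6.11),
    discretized with N cells of width h on [0, 1]."""
    h = 1.0 / N
    xc = (np.arange(N) + 0.5) * h            # cell centers (primary grid)
    xe = np.arange(1, N) * h                 # interior edges (dual grid)
    sig_c, sig_e = sigma(xc), sigma(xe)

    # Operator u -> (1/sigma) d/dx (sigma du/dx) with homogeneous Neumann BCs
    M = np.zeros((N, N))
    for i in range(N - 1):
        flux = sig_e[i] / h**2               # conductance of the edge between cells i, i+1
        M[i, i]         -= flux / sig_c[i]
        M[i, i + 1]     += flux / sig_c[i]
        M[i + 1, i + 1] -= flux / sig_c[i + 1]
        M[i + 1, i]     += flux / sig_c[i + 1]

    u0 = np.zeros(N)
    u0[0] = 1.0 / h                          # discrete delta at x = 0
    u_t = expm(t * M) @ u0                   # u(., t)
    return u_t[0]                            # F(t) = u(0, t)

# Example with constant sigma
print(transfer_function(lambda x: np.ones_like(x), t=0.1))
```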

The ROM (3.31)–(3.29) constructed for the system (6.10) transforms it into

$$\begin{aligned} {{\widetilde{{{\mathbf {u}}}}}}_t - {{\widetilde{{{\mathbf {D}}}}}}^{-1} {{\widetilde{{{\mathbf {L}}}}}}{{\widetilde{{{\mathbf {u}}}}}}= 0, \quad {{\widetilde{{{\mathbf {u}}}}}}|_{t=0} = {{\widetilde{{{\mathbf {D}}}}}}^{-1} {{\mathbf {e}}}_1, \end{aligned}$$
(6.12)

where \({{\widetilde{{{\mathbf {u}}}}}}, {{\mathbf {e}}}_1 \in {\mathbb {R}}^{n}\), \({{\widetilde{{{\mathbf {D}}}}}}, {{\widetilde{{{\mathbf {L}}}}}}\in {\mathbb {R}}^{n \times n}\), \(n = k_2\), with \({{\widetilde{{{\mathbf {D}}}}}}= \text{ diag } \{ {\widehat{h}}_1 \widehat{\sigma }_1, \ldots , {\widehat{h}}_{n} \widehat{\sigma }_{n} \}\), and \({{\widetilde{{{\mathbf {L}}}}}}\) is the second order finite-difference operator defined by

$$\begin{aligned}{}[{{\widetilde{{{\mathbf {L}}}}}}{{\widetilde{{{\mathbf {u}}}}}}]_{i} = \frac{\sigma _{i}}{h_{i}}\left( {{\widetilde{u}}}_{i} - {{\widetilde{u}}}_{i-1}\right) - \frac{\sigma _{i+1}}{h_{i+1}}\left( {{\widetilde{u}}}_{i+1} - {{\widetilde{u}}}_{i}\right) , \qquad i = 1,\ldots ,n, \end{aligned}$$
(6.13)

with \({{\widetilde{u}}}_0\) and \({{\widetilde{u}}}_{n+1}\) defined to satisfy the discrete Neumann boundary conditions

$$\begin{aligned} {{\widetilde{u}}}_0 = {{\widetilde{u}}}_1, \quad {{\widetilde{u}}}_{n} = {{\widetilde{u}}}_{n+1}. \end{aligned}$$
(6.14)

As we expect from Appendix C and Sect. 4.2, \({{\widetilde{{{\mathbf {L}}}}}}\) is indeed a tridiagonal matrix. The parameters \(h_i, {\widehat{h}}_i\), \(i = 1,2,\ldots ,n\), can be interpreted as the steps of the primary and dual grids of a staggered finite-difference scheme, respectively, whereas \(\sigma _i, \widehat{\sigma }_i\) are the values of \(\sigma (x) > 0\) at the primary and dual grid points, respectively. Assumption 3.3 then follows from the positivity of the primary and dual grid steps (a.k.a. Stieltjes parameters) given by the Stieltjes theorem [22].
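A minimal sketch of the assembly (6.12)–(6.14): given positive primary and dual steps \(h_i\), \(\widehat{h}_i\) and coefficients \(\sigma_i\), \(\widehat{\sigma}_i\) (positivity of the steps is exactly Assumption 3.3), \({{\widetilde{{{\mathbf {D}}}}}}\) is diagonal and \({{\widetilde{{{\mathbf {L}}}}}}\) is the tridiagonal path-graph Laplacian with edge weights \(\sigma_{i+1}/h_{i+1}\). The routine below is an illustrative assembly under these assumptions, not the gridding algorithm of [19].

```python
import numpy as np

def assemble_rom_operator(h, h_hat, sigma, sigma_hat):
    """Assemble D_tilde = diag(h_hat_i * sigma_hat_i) and the tridiagonal L_tilde
    of (6.13)-(6.14) from length-n arrays of grid steps and coefficient values."""
    n = len(h)
    D = np.diag(h_hat * sigma_hat)
    # Edge weight between (1-based) neighbours i and i+1 is sigma_{i+1}/h_{i+1};
    # with 0-based arrays this is sigma[i+1]/h[i+1] for the edge (i, i+1).
    w = sigma[1:] / h[1:]
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i]         += w[i]
        L[i + 1, i + 1] += w[i]
        L[i, i + 1]     -= w[i]
        L[i + 1, i]     -= w[i]
    return D, L                              # L has zero row sums, consistent with (6.14)

# Example: uniform grid with sigma = 1 recovers the standard Neumann second-difference matrix.
n = 5
D, L = assemble_rom_operator(np.full(n, 0.2), np.full(n, 0.2), np.ones(n), np.ones(n))
```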

Fig. 11

Finite-difference interpretation of ROM graph-Laplacian realization for the 1D Laplace operator on [0, 1]. Primary and dual grid nodes are dots and stars, respectively

As an illustration, we display in Fig. 11 the optimal grid with steps \(h_i, {\widehat{h}}_i\), \(i = 1,\ldots ,10\), computed for \(\sigma \equiv 1\). The continuum operator \({{\mathcal {L}}}\) is discretized on an equidistant grid on [0, 1] with \(N = 100\) nodes, i.e., \({{\mathbf {L}}}_{RW} \in {\mathbb {R}}^{N \times N}\). The optimal grid steps were computed using the gridding algorithm from [19], which coincides with Algorithm A.2 and the subsequent transformations (3.29) and (3.31) for the 1D Laplacian operator. We observe that the grid is embedded in the domain [0, 1] and exhibits pronounced stretching away from the origin. We should point out that such stretching is inconsistent with the random-walk normalized graph-Laplacian formulation, which employs uniform grids. Grid non-uniformity is the price to pay for the spectral convergence of the transfer function approximation. One can view Algorithm 4.1 as an embedding of the reduced-order graph back into the space of the original normalized random-walk graph-Laplacian \({{\mathbf {L}}}_{RW}\). The randomized choice of sampling vertices provides a uniform graph sampling.

For the general MIMO problem, Proposition 3.4 yields a symmetric positive-semidefinite block-tridiagonal \({{\widetilde{{{\mathbf {L}}}}}}\) with zero row sums. The classical definition of graph-Laplacians requires non-positivity of the off-diagonal entries, which may not hold for our \({{\widetilde{{{\mathbf {L}}}}}}\). However, it is known that operators with oscillating off-diagonal elements still allow for efficient clustering as long as the zero row sum condition remains valid [31]. Matrix \({{\widetilde{{{\mathbf {L}}}}}}\) still fits into a more general PDE setting, i.e., when the continuum Laplacian is discretized in anisotropic media or using high-order finite-difference schemes. Such schemes appear naturally when one wants to employ upscaling or grid coarsening in the approximation of elliptic equations in multidimensional domains, which is how we can interpret the transformed ROM (3.31).

Appendix C: Deflated Block-Lanczos Tridiagonalization Process for Symmetric Matrices

Let \({{\mathbf {M}}}= {{\mathbf {M}}}^T \in {\mathbb {R}}^{n \times n}\) be a symmetric matrix and \({{\mathbf {C}}}\in {\mathbb {R}}^{n \times m}\) be a “tall” (\(n > m\)) matrix with orthonormal columns: \({{\mathbf {C}}}^T {{\mathbf {C}}}= {{\mathbf {I}}}_m\), where \({{\mathbf {I}}}_m\) is the \(m \times m\) identity matrix. The conventional block-Lanczos algorithm successively constructs an orthonormal basis, the columns of the orthogonal matrix \(\widetilde{{{\mathbf {Q}}}} \in {\mathbb {R}}^{n \times m k}\), for the block Krylov subspace

$$\begin{aligned} {{{\mathcal {K}}}}_k({{\mathbf {M}}}, {{\mathbf {C}}}) = \text{ colspan } \{{{\mathbf {C}}}, {{\mathbf {M}}}{{\mathbf {C}}}, {{\mathbf {M}}}^2 {{\mathbf {C}}}, \ldots , {{\mathbf {M}}}^{k-1} {{\mathbf {C}}}\}, \end{aligned}$$
(6.15)

such that

$$\begin{aligned} \widetilde{{{\mathbf {T}}}} = \widetilde{{{\mathbf {Q}}}}^T {{\mathbf {M}}}\widetilde{{{\mathbf {Q}}}} \end{aligned}$$
(6.16)

is block tridiagonal and the first m columns of \(\widetilde{{{\mathbf {Q}}}}\) are equal to \({{\mathbf {C}}}\). The deflation procedure allows the obtained basis to be truncated at each step.

Algorithm C.1

(Deflated block-Lanczos process).

Input: Symmetric matrix \({{\mathbf {M}}}= {{\mathbf {M}}}^T \in {\mathbb {R}}^{n \times n}\), a matrix \({{\mathbf {C}}}\in {\mathbb {R}}^{n \times m}\) of initial vectors with orthonormal columns, maximum number of Lanczos steps k such that \(m k \le n\), and truncation tolerance \(\varepsilon \).

Output: Deflated block tridiagonal matrix \(\widetilde{{{\mathbf {T}}}}\), orthogonal matrix \(\widetilde{{{\mathbf {Q}}}}\).

Steps of the algorithm:

  1. Set \(\widetilde{{{\mathbf {Q}}}}_1 = {{\mathbf {C}}}\), \({\varvec{\beta }}_1 = {{\mathbf {I}}}_m\), \(m_1 = m\).

  2. For \(j = 1, 2, \ldots , k\):

    (a) Compute \({{\mathbf {R}}}_j := {{\mathbf {M}}}\widetilde{{{\mathbf {Q}}}}_j\).

    (b) Compute \({\varvec{\alpha }}_j := \widetilde{{{\mathbf {Q}}}}_j^T {{\mathbf {R}}}_j\).

    (c) Compute \({{\mathbf {R}}}_j := {{\mathbf {R}}}_j - \widetilde{{{\mathbf {Q}}}}_j {\varvec{\alpha }}_j\).

    (d) If \(j > 1\) then set \({{\mathbf {R}}}_j := {{\mathbf {R}}}_j - \widetilde{{{\mathbf {Q}}}}_{j-1} {\varvec{\beta }}_{j}^T\).

    (e) Perform the SVD of \({{\mathbf {R}}}_j\):

      $$\begin{aligned} {{\mathbf {R}}}_j = {{\mathbf {U}}}{\varvec{\Sigma }}{{\mathbf {W}}}^T, \end{aligned}$$
      (6.17)

      with orthogonal \({{\mathbf {U}}}\in {\mathbb {R}}^{n \times m_j}\), \({{\mathbf {W}}}\in {\mathbb {R}}^{m_j \times m_j}\), and a diagonal matrix of singular values \({\varvec{\Sigma }}\).

    (f) Truncate \({{\mathbf {U}}}, {\varvec{\Sigma }}, {{\mathbf {W}}}\) by discarding the singular vectors corresponding to the singular values less than \(\varepsilon \). Denote the truncated matrices by \(\widetilde{{{\mathbf {U}}}} \in {\mathbb {R}}^{n \times m_{j+1}}\), \(\widetilde{{\varvec{\Sigma }}} \in {\mathbb {R}}^{m_{j+1} \times m_{j+1}}\), \(\widetilde{{{\mathbf {W}}}} \in {\mathbb {R}}^{m_{j} \times m_{j+1}}\), where \(m_{j+1} \le m_j\) is the number of the remaining, non-truncated singular modes.

    (g) If \(m_{j+1} = 0\) then exit the for loop.

    (h) Set \(\widetilde{{{\mathbf {Q}}}}_{j+1} := \widetilde{{{\mathbf {U}}}} \in {\mathbb {R}}^{n \times m_{j+1}} \).

    (i) Set \({\varvec{\beta }}_{j+1} := \widetilde{{\varvec{\Sigma }}} \widetilde{{{\mathbf {W}}}}^T \in {\mathbb {R}}^{m_{j+1} \times m_j} \).

    (j) Perform reorthogonalization \(\widetilde{{{\mathbf {Q}}}}_{j+1} := \widetilde{{{\mathbf {Q}}}}_{j+1} - \sum \nolimits _{i=1}^j \widetilde{{{\mathbf {Q}}}}_i (\widetilde{{{\mathbf {Q}}}}^T_i \widetilde{{{\mathbf {Q}}}}_{j+1})\) if needed.

  3. End for.

  4. Let \(\widetilde{k}\) be the number of performed steps, set

    $$\begin{aligned} \widetilde{{{\mathbf {Q}}}} := \left[ \widetilde{{{\mathbf {Q}}}}_1, \widetilde{{{\mathbf {Q}}}}_2, \ldots , \widetilde{{{\mathbf {Q}}}}_{\widetilde{k}} \right] \in {\mathbb {R}}^{n \times \widetilde{n}}, \end{aligned}$$
    (6.18)

    where \(\widetilde{n} = \sum \nolimits _{j=1}^{\widetilde{k}} m_j.\)

  5. Set

    $$\begin{aligned} \widetilde{{{\mathbf {T}}}} = \begin{bmatrix} {\varvec{\alpha }}_1 &{}\quad {\varvec{\beta }}_2^T &{}\quad &{}\quad &{}\quad \\ {\varvec{\beta }}_2 &{} \quad {\varvec{\alpha }}_2 &{} {\varvec{\beta }}_3^T &{}\quad &{}\quad \\ &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad \\ &{}\quad &{} \quad {\varvec{\beta }}_{\widetilde{k}-1} &{}\quad {\varvec{\alpha }}_{\widetilde{k}-1} &{} \quad {\varvec{\beta }}_{\widetilde{k}}^T \\ &{} \quad &{} \quad &{}\quad {\varvec{\beta }}_{\widetilde{k}} &{}\quad {\varvec{\alpha }}_{\widetilde{k}} \end{bmatrix} \in {\mathbb {R}}^{\widetilde{n} \times \widetilde{n}}. \end{aligned}$$
    (6.19)

Note that step (2j) of Algorithm C.1 is computationally expensive and is only needed for computations in finite precision. In practice, it is infeasible for large data sets in the first stage of Algorithm A.2. However, in the second stage of Algorithm A.2, and also when performing the third block-Lanczos process in the ROGL construction, we deal with relatively small matrices \({{\mathbf {T}}}_1 \in {\mathbb {R}}^{n_1 \times n_1}\), \({{\mathbf {T}}}_2 \in {\mathbb {R}}^{n \times n}\), \(n_1,n \ll N\), so the reorthogonalization in step (2j) becomes computationally feasible.

Given that the bulk of the computational effort of ROGL construction via Algorithm A.2 is spent in Algorithm C.1, one may consider its parallelization to boost the overall performance. The most straightforward way of doing so is to parallelize the matrix product computation in step (2a). For a system with p processors, one may store N/p rows of \({{\mathbf {M}}}\) on each processor, multiply these submatrices by \(\widetilde{{{\mathbf {Q}}}}_j\) in parallel and then communicate the results across the processors so that each one has access to its own copy of \({{\mathbf {R}}}_j = {{\mathbf {M}}}\widetilde{{{\mathbf {Q}}}}_j\). All other operations of Algorithm C.1 can be performed locally at each processor to avoid communicating anything else except for the rows of \({{\mathbf {R}}}_j\).
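For reference, below is a compact NumPy sketch of Algorithm C.1; it follows the steps above, makes the reorthogonalization of step (2j) optional, and recovers the projected matrix as \(\widetilde{{{\mathbf {Q}}}}^T {{\mathbf {M}}}\widetilde{{{\mathbf {Q}}}}\) rather than assembling (6.19) block by block. It is an illustrative sketch, not the authors' implementation.

```python
import numpy as np

def deflated_block_lanczos(M, C, k, tol=1e-10, reorthogonalize=True):
    """Sketch of Algorithm C.1: deflated block-Lanczos for symmetric M
    with an orthonormal start block C. Returns (Q_tilde, T_tilde)."""
    Qs = [C]                                   # step 1: Q_tilde_1 = C
    beta_prev = None
    for j in range(k):
        R = M @ Qs[-1]                         # step (2a)
        a = Qs[-1].T @ R                       # step (2b)
        R = R - Qs[-1] @ a                     # step (2c)
        if j > 0:
            R = R - Qs[-2] @ beta_prev.T       # step (2d)
        U, s, Wt = np.linalg.svd(R, full_matrices=False)   # step (2e)
        keep = s > tol                         # step (2f): deflation
        if not np.any(keep):                   # step (2g)
            break
        Qnext = U[:, keep]                     # step (2h)
        beta_prev = np.diag(s[keep]) @ Wt[keep, :]          # step (2i)
        if reorthogonalize:                    # step (2j), optional
            Qall = np.hstack(Qs)
            Qnext = Qnext - Qall @ (Qall.T @ Qnext)
        Qs.append(Qnext)
    Q = np.hstack(Qs)                          # step 4, eq. (6.18)
    T = Q.T @ (M @ Q)                          # block tridiagonal, cf. eq. (6.19)
    return Q, T
```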

Appendix D: Proof of Proposition 3.6

Proof

Similar to (3.34) and (3.36), for the vertex set \({{\widetilde{G}}}\) of the reduced-order graph we have

$$\begin{aligned} \left( D^p_{jk}({{\widetilde{G}}}) \right) ^2 = \left( \sqrt{\widetilde{D}_{jj}} {{\mathbf {e}}}_j^T - \sqrt{\widetilde{D}_{kk}} {{\mathbf {e}}}_k^T \right) ( {{\mathbf {I}}}- {{\mathbf {T}}}_3)^{2p} \left( \sqrt{\widetilde{D}_{jj}} {{\mathbf {e}}}_j - \sqrt{\widetilde{D}_{kk}} {{\mathbf {e}}}_k \right) \end{aligned}$$
(6.20)

and

$$\begin{aligned} C^2_{jk}({{\widetilde{G}}}) = \left( \frac{1}{\sqrt{\widetilde{D}_{jj}}} {{\mathbf {e}}}_j^T - \frac{1}{\sqrt{\widetilde{D}_{kk}}} {{\mathbf {e}}}_k^T \right) {{\mathbf {T}}}^\dagger _3 \left( \frac{1}{\sqrt{\widetilde{D}_{jj}}} {{\mathbf {e}}}_j - \frac{1}{\sqrt{\widetilde{D}_{kk}}} {{\mathbf {e}}}_k \right) . \end{aligned}$$
(6.21)

Here, in a slight abuse of notation, we let \({{\mathbf {e}}}_j\) and \({{\mathbf {e}}}_k\) denote unit vectors in \({\mathbb {R}}^{n}\).

Due to (3.30), for \({{\mathbf {D}}}\) given by (2.2) we obtain

$$\begin{aligned} \left( D^p_{i_j, i_k}(G) \right) ^2 - \left( D^p_{jk}({{\widetilde{G}}}) \right) ^2 = L_{i_j, i_j} \Delta P^{2p}_{jj} + L_{i_k, i_k} \Delta P^{2p}_{kk} - 2 \sqrt{L_{i_j, i_j} L_{i_k, i_k}} \Delta P^{2p}_{jk} \end{aligned}$$
(6.22)

and

$$\begin{aligned} C^2_{i_j, i_k}(G) - C^2_{jk}({{\widetilde{G}}}) = \frac{1}{L_{i_j, i_j}} \Delta J_{jj} + \frac{1}{L_{i_k, i_k}} \Delta J_{kk} - 2 \frac{1}{\sqrt{L_{i_j, i_j} L_{i_k, i_k}}} \Delta J_{jk} \end{aligned}$$
(6.23)

where

$$\begin{aligned} \Delta P^{2p}_{jk} = {{\mathbf {e}}}_{i_j}^T ( {{\mathbf {I}}}- {{\mathbf {A}}})^{2p} {{\mathbf {e}}}_{i_k} - {{\mathbf {e}}}_{j}^T ( {{\mathbf {I}}}- {\widetilde{{\mathbf {A}}}})^{2p} {{\mathbf {e}}}_{k}, \quad j, k = 1,\ldots ,m, \end{aligned}$$
(6.24)

and

$$\begin{aligned} \Delta J_{jk} = {{\mathbf {e}}}_{i_j}^T {{\mathbf {A}}}^\dagger {{\mathbf {e}}}_{i_k} - {{\mathbf {e}}}_j^T {\widetilde{{\mathbf {A}}}}^\dagger {{\mathbf {e}}}_k, \quad j, k = 1,\ldots ,m, \end{aligned}$$
(6.25)

are ROM errors of approximations of polynomials and the pseudo-inverse, respectively.

Since the ROGL is obtained via a three-stage process, we make the analysis more explicit by splitting the errors into three parts corresponding to each stage:

$$\begin{aligned} \Delta P^{2p}_{jk} = \Delta ^1 P^{2p}_{jk} + \Delta ^2 P^{2p}_{jk} + \Delta ^3 P^{2p}_{jk}, \end{aligned}$$
(6.26)

where

$$\begin{aligned} \Delta ^1 P^{2p}_{jk}&= {{\mathbf {e}}}_{i_j}^T\left( {{\mathbf {I}}}- {{\mathbf {A}}}\right) ^{2p} {{\mathbf {e}}}_{i_k} - {{\mathbf {e}}}_j^T \left( {{\mathbf {I}}}- {{\mathbf {T}}}_1\right) ^{2p} {{\mathbf {e}}}_k, \\ \Delta ^2 P^{2p}_{jk}&= {{\mathbf {e}}}_j^T \left( {{\mathbf {I}}}- {{\mathbf {T}}}_1\right) ^{2p} {{\mathbf {e}}}_k - {{\mathbf {e}}}_j^T \left( {{\mathbf {I}}}- {\widetilde{{\mathbf {A}}}}_{12}\right) ^{2p} {{\mathbf {e}}}_k, \\ \Delta ^3 P^{2p}_{jk}&= {{\mathbf {e}}}_j^T \left( {{\mathbf {I}}}- {\widetilde{{\mathbf {A}}}}_{12}\right) ^{2p} {{\mathbf {e}}}_k - {{\mathbf {e}}}_j^T \left( {{\mathbf {I}}}- {\widetilde{{\mathbf {A}}}}\right) ^{2p} {{\mathbf {e}}}_k. \end{aligned}$$

Similarly, for the pseudo-inverse error we have

$$\begin{aligned} \Delta J_{jk} = \Delta ^1 J_{jk} + \Delta ^2 J_{jk} + \Delta ^3 J_{jk}, \end{aligned}$$
(6.27)

where

$$\begin{aligned} \Delta ^1 J_{jk}&= {{\mathbf {e}}}_{i_j}^T {{\mathbf {A}}}^\dagger {{\mathbf {e}}}_{i_k} - {{\mathbf {e}}}_j^T {{\mathbf {T}}}^\dagger _1 {{\mathbf {e}}}_k, \\ \Delta ^2 J_{jk}&= {{\mathbf {e}}}_j^T {{\mathbf {T}}}^\dagger _1 {{\mathbf {e}}}_k - {{\mathbf {e}}}_j^T {\widetilde{{\mathbf {A}}}}^\dagger _{12} {{\mathbf {e}}}_k, \\ \Delta ^3 J_{jk}&= {{\mathbf {e}}}_j^T {\widetilde{{\mathbf {A}}}}^\dagger _{12} {{\mathbf {e}}}_k - {{\mathbf {e}}}_j^T {\widetilde{{\mathbf {A}}}}^\dagger {{\mathbf {e}}}_k. \end{aligned}$$

We note that \(\Delta ^3 P^{2p}_{jk} = \Delta ^3 J_{jk} = 0\) since the transformation from \({\widetilde{{\mathbf {A}}}}_{12}\) to \({\widetilde{{\mathbf {A}}}}\) is unitary. To finalize the proof, we refer to the known results in the theory of model reduction via Krylov subspace projection [21]. In particular, \(\Delta ^2 P^{2p}_{jk} \rightarrow 0\) and \(\Delta ^1 J_{jk} \rightarrow 0\) exponentially in n. Also, \(\Delta ^2 J_{jk}=0\) for \(k_2>1\) and \(\Delta ^1 P^{2p}_{jk} = 0\) for \(k_1 \ge p\). \(\square \)

It follows from the proof that if the second stage of Algorithm A.2 is exact then \(D^p_{i_j, i_k}(G) = D^p_{jk} ({{\widetilde{G}}})\) for \(k_1 \ge p\) (see [21]).
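To illustrate the quantities compared above, the sketch below evaluates the reduced-order diffusion and commute-time distances (6.20)–(6.21) for a given reduced operator \({{\mathbf {T}}}_3\) and diagonal \({\widetilde{{\mathbf {D}}}}\); the analogous full-graph quantities are obtained by replacing \({{\mathbf {T}}}_3\), \({\widetilde{{\mathbf {D}}}}\), and the indices accordingly. The function name and argument layout are illustrative.

```python
import numpy as np

def reduced_distances(T3, D_diag, j, k, p=1):
    """Diffusion distance D^p_{jk} and commute-time distance C_{jk}
    on the reduced-order graph, per (6.20)-(6.21).
    T3: n x n reduced normalized Laplacian; D_diag: length-n diagonal of D_tilde."""
    n = T3.shape[0]
    ej, ek = np.eye(n)[j], np.eye(n)[k]

    # Diffusion distance, eq. (6.20)
    v = np.sqrt(D_diag[j]) * ej - np.sqrt(D_diag[k]) * ek
    P = np.linalg.matrix_power(np.eye(n) - T3, 2 * p)
    D_p = np.sqrt(v @ P @ v)

    # Commute-time distance, eq. (6.21), via the pseudo-inverse of T3
    w = ej / np.sqrt(D_diag[j]) - ek / np.sqrt(D_diag[k])
    C = np.sqrt(w @ np.linalg.pinv(T3) @ w)
    return D_p, C
```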

Cite this article

Druskin, V., Mamonov, A.V. & Zaslavsky, M. Distance Preserving Model Order Reduction of Graph-Laplacians and Cluster Analysis. J Sci Comput 90, 32 (2022). https://doi.org/10.1007/s10915-021-01660-3
