Abstract
Stratified models depend in an arbitrary way on a selected categorical feature that takes K values, and depend linearly on the other n features. Laplacian regularization with respect to a graph on the feature values can greatly improve the performance of a stratified model, especially in the low-data regime. A significant issue with Laplacian-regularized stratified models is that the model is K times the size of the base model, which can be quite large. We address this issue by formulating eigen-stratified models, which are stratified models with an additional constraint that the model parameters are linear combinations of some modest number m of bottom eigenvectors of the graph Laplacian, i.e., those associated with the m smallest eigenvalues. With eigen-stratified models, we only need to store the m bottom eigenvectors and the corresponding coefficients as the stratified model parameters. This leads to a reduction, sometimes large, of model size when \(m\le n\) and \(m \ll K\). In some cases, the additional regularization implicit in eigen-stratified models can improve out-of-sample performance over standard Laplacian regularized stratified models.
Availability of data and material
All data is made available at www.github.com/cvxgrp/strat_models.
References
Ahmed N, Natarajan T, Rao K (1974) Discrete cosine transform. IEEE Trans Comput C-23(1):90–93
Aspvall B, Gilbert J (1984) Graph coloring using eigenvalue decomposition. SIAM J Algebraic Discrete Methods 5(4):526–538
Boyd S, Vandenberghe L (2004) Convex Optimization. Cambridge University Press, Cambridge
Boyd S, Diaconis P, Xiao L (2004) Fastest mixing Markov chain on a graph. SIAM Rev 46(4):667–689
Boyd S, Diaconis P, Parrilo P, Xiao L (2009) Fastest mixing Markov chain on graphs with symmetries. SIAM J Optim 20(2):792–819. https://doi.org/10.1137/070689413
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
Brélaz D (1979) New methods to color the vertices of a graph. Commun ACM 22(4):251–256
Brooks R (1941) On colouring the nodes of a network. Math Proc Cambridge Philos Soc 37:194–197
Brouwer A, Haemers W (2012) Spectra of Graphs. Springer, New York
Butler S (2008) Eigenvalues and structures of graphs. PhD thesis, University of California, San Diego
Chung F (1997) Spectral Graph Theory. American Mathematical Society, Providence
Cohen-Steiner D, Kong W, Sohler C, Valiant G (2018) Approximating the spectrum of a graph. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’18. pp 1263–1271
Diamond S, Boyd S (2016) CVXPY: A Python-embedded modeling language for convex optimization. J Mach Learn Res 17(1):2909–2913
Elman J (1990) Finding structure in time. Cognit Sci 14(2):179–211
Fu A, Narasimhan B, Boyd S (2019a) CVXR: an R package for disciplined convex optimization. J Stat Softw
Fu A, Zhang J, Boyd S (2019b) Anderson accelerated Douglas-Rachford splitting. http://stanford.edu/~boyd/papers/a2dr.html
Grant M, Boyd S (2014) CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx
Grant M, Boyd S, Ye Y (2006) Disciplined convex programming. In: Global optimization: from theory to implementation. Nonconvex optimization and its applications series. Springer, pp 155–210
Guo C, Berkhahn F (2016) Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737
Hallac D, Leskovec J, Boyd S (2015) Network lasso: Clustering and optimization in large graphs. In: Proceedings of the ACM international conference on knowledge discovery and data mining. ACM, pp 387–396
Hallac D, Park Y, Boyd S, Leskovec J (2017) Network inference via the time-varying graphical lasso. In: Proceedings of the ACM international conference on knowledge discovery and data mining. ACM, pp 205–213
Jacobi C (1846) Über ein leichtes Verfahren, die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen. Journal für die reine und angewandte Mathematik 1846(30):51–94
Anderson WN Jr, Morley TD (1985) Eigenvalues of the Laplacian of a graph. Linear Multilinear Algebra 18(2):141–145
Jung A, Tran N (2019) Localized linear regression in networked data. IEEE Signal Process Lett 26(7):1090–1094. https://doi.org/10.1109/LSP.2019.2918933
Kaggle (2019) Cardiovascular disease dataset. https://www.kaggle.com/sulianova/cardiovascular-disease-dataset
Lanczos C (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J Res Natl Bureau Standards 45
MacDonald J (1934) On the modified Ritz variation method. Phys Rev 46:828
Merris R (1998) Laplacian graph eigenvectors. Linear Algebra Appl 278(1):221–236
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv e-prints
Mises R, Pollaczek-Geiringer H (1929) Praktische Verfahren der Gleichungsauflösung. ZAMM J Appl Math Mech 9(2):152–164
Ng A, Jordan M, Weiss Y (2002) On spectral clustering: Analysis and an algorithm. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, pp 849–856
Ojalvo I, Newman M (1970) Vibration modes of large structures by an automatic matrix-reduction method. AIAA J 8(7):1234–1239
Oppenheim A, Schafer R (2009) Discrete-Time Signal Processing, 3rd edn. Prentice Hall Press, Upper Saddle River, NJ, USA
Paige C (1971) The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, University of London
Parikh N, Boyd S (2014) Proximal algorithms. Found Trends Optim 1(3):127–239
Spielman D (2010) Algorithms, graph theory, and linear equations in Laplacian matrices. In: Proceedings of the international congress of mathematicians. World Scientific, pp 2698–2722
Sun J, Boyd S, Xiao L, Diaconis P (2006) The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Rev 48(4):681–699
Tran N, Ambos H, Jung A (2020) Classifying partially labeled networked data via logistic network lasso. In: ICASSP 2020—2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 3832–3836. https://doi.org/10.1109/ICASSP40776.2020.9054408
Tuck J, Barratt S, Boyd S (2019) A distributed method for fitting Laplacian regularized stratified models. arXiv preprint arXiv:1904.12017
Tuck J, Hallac D, Boyd S (2019) Distributed majorization-minimization for Laplacian regularized problems. IEEE/CAA J Autom Sinica 6(1):45–52
Weinberger K, Sha F, Zhu Q, Saul L (2007) Graph Laplacian regularization for large-scale semidefinite programming. In: Schölkopf B, Platt JC, Hoffman T (eds) Advances in Neural Information Processing Systems 19. MIT Press, Cambridge, pp 1489–1496
Zhang Y, Liu X, Yong X (2009) Which wheel graphs are determined by their Laplacian spectra? Comput Math Appl 58(10):1887–1890
Acknowledgements
We gratefully acknowledge discussions with Shane Barratt and Peter Stoica, who provided us with useful suggestions that improved the paper.
Funding
J. Tuck is supported by the Stanford Graduate Fellowship.
Ethics declarations
Conflicts of interest
The authors declare that a possible conflict of interest is that S. Boyd is an author of this paper and an editor of this journal.
Code availability
All code is made available at www.github.com/cvxgrp/strat_models.
Appendix: Eigenvectors and eigenvalues of common graphs
The direct relation of a graph’s structure to the eigenvalues and eigenvectors of its graph Laplacian is well known (Anderson and Morley 1985). In some cases, described below, we can find them analytically, especially when the graph has many symmetries. The eigenvectors are given in normalized form (i.e., \(\Vert q_k\Vert _2 = 1\)). Beyond these common graphs, many other simple graphs can be analyzed analytically; see, e.g., Brouwer and Haemers (2012).
A note on complex graphs. If a graph is complex, i.e., there is no analytical form for the eigenvalues and eigenvectors of its graph Laplacian, the bottom eigenvalues and eigenvectors can still be computed very efficiently by, e.g., the Lanczos algorithm, variants of power iteration, the Jacobi eigenvalue algorithm, or the folded spectrum method; see Jacobi (1846), Mises and Pollaczek-Geiringer (1929), MacDonald (1934), Lanczos (1950), Ojalvo and Newman (1970), and Paige (1971) for these methods.
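In practice, a few lines of SciPy suffice for this. The sketch below (our own construction, not code from the paper) uses `scipy.sparse.linalg.eigsh`, an ARPACK Lanczos-type routine, in shift-invert mode to compute the bottom eigenpairs of a sparse path-graph Laplacian.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Build the Laplacian of a unit-weight path graph with K nodes.
K, m = 100, 4
off = np.ones(K - 1)
A = sp.diags([off, off], [1, -1])                      # adjacency matrix
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A    # graph Laplacian

# Shift-invert around a sigma just below zero makes the Lanczos iteration
# return the m eigenvalues nearest zero, i.e., the bottom of the spectrum
# of the positive semidefinite Laplacian.
vals, vecs = eigsh(L.tocsc(), k=m, sigma=-1e-6, which="LM")
vals = np.sort(vals)
```

For large K this avoids ever forming a dense K-by-K matrix, which is the regime where eigen-stratified models are most useful.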
Path graph. A path (or linear/chain) graph is a graph whose vertices can be listed in order, with edges between adjacent vertices in that order. The first and last vertices have only one edge, whereas the other vertices have two edges. Figure 8 shows a path graph with 8 vertices and unit weights.
Eigenvectors \(q_1, \ldots , q_{K}\) of a path graph Laplacian with K nodes and unit edge weights are given by \(q_{k+1} = c_k \cos\left(\pi k (v + \tfrac{1}{2}{\mathbf 1})/K\right)\), \(k = 0, \ldots , K-1\), where \(v = (0, \ldots , K-1)\), \(\cos (\cdot )\) is applied elementwise, and \(c_k\) normalizes \(q_{k+1}\) (\(c_0 = 1/\sqrt{K}\), \(c_k = \sqrt{2/K}\) for \(k \ge 1\)). The eigenvalues are \(2-2\cos (\pi k/K), k = 0, \ldots , K-1\).
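These formulas can be checked numerically. The sketch below (assuming NumPy, with our own variable names) verifies that the DCT-II-type vectors \(\cos (\pi k (v + \tfrac{1}{2}{\mathbf 1})/K)\), once normalized, are eigenvectors of the path-graph Laplacian with the stated eigenvalues.

```python
import numpy as np

# Path graph with K = 8 nodes and unit weights: check that
# q = cos(pi*k*(v + 1/2)/K) satisfies L q = (2 - 2 cos(pi*k/K)) q.
K = 8
A = np.diag(np.ones(K - 1), 1) + np.diag(np.ones(K - 1), -1)
L = np.diag(A.sum(axis=1)) - A
v = np.arange(K)
for k in range(K):
    q = np.cos(np.pi * k * (v + 0.5) / K)
    q /= np.linalg.norm(q)
    lam = 2 - 2 * np.cos(np.pi * k / K)
    assert np.allclose(L @ q, lam * q)
```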
Cycle graph. A cycle graph or circular graph is a graph where the vertices are connected in a closed chain. Every node in a cycle graph has two edges. Figure 9 shows a cycle graph with 10 vertices and unit weights.
Eigenvectors of a cycle graph Laplacian with K nodes and unit weights are given by \(\tfrac{1}{\sqrt{K}}{\mathbf 1}\) and, for \(k = 1, \ldots , \lfloor K/2 \rfloor\), the normalized vectors \(\cos (2\pi k v/K)\) and \(\sin (2\pi k v/K)\) (for even K and \(k = K/2\), only the cosine vector is nonzero), where \(v = (0, \ldots , K-1)\) and \(\cos (\cdot )\) and \(\sin (\cdot )\) are applied elementwise. The eigenvalues are \(2-2\cos (2\pi k/K), k = 0, \ldots , K-1\).
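A similar numerical check for the cycle graph (a sketch assuming NumPy):

```python
import numpy as np

# Cycle graph with K = 10 nodes and unit weights: the vectors
# cos(2*pi*k*v/K) and sin(2*pi*k*v/K) are Laplacian eigenvectors
# with eigenvalue 2 - 2*cos(2*pi*k/K).
K = 10
A = np.zeros((K, K))
for i in range(K):
    A[i, (i + 1) % K] = A[(i + 1) % K, i] = 1
L = np.diag(A.sum(axis=1)) - A
v = np.arange(K)
for k in range(K):
    lam = 2 - 2 * np.cos(2 * np.pi * k / K)
    for q in (np.cos(2 * np.pi * k * v / K), np.sin(2 * np.pi * k * v / K)):
        assert np.allclose(L @ q, lam * q)   # holds trivially when q == 0
```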
Star graph. A star graph is a graph in which every vertex is connected only to a single central vertex. Figure 10 shows an example of a star graph with 10 vertices (9 outer vertices) and unit weights.
Eigenvectors of a star graph with K vertices (i.e., \(K-1\) outer vertices) and unit edge weights are given by \(\tfrac{1}{\sqrt{K}}{\mathbf 1}\), vectors of the form \(\tfrac{1}{\sqrt{2}}(e_i - e_j)\) with i and j outer vertices, and \(\tfrac{1}{\sqrt{K(K-1)}}\left((K-1)e_c - \sum _{i \ne c} e_i\right)\), where \(e_i\) is the ith basis vector in \({\mathbf{R}}^K\) and c is the central vertex. The smallest eigenvalue of this graph is zero, the largest eigenvalue is K, and all other eigenvalues are 1.
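The star-graph spectrum can be verified directly; in this sketch (our construction, assuming NumPy) vertex 0 is taken as the hub.

```python
import numpy as np

# Star graph with K = 10 vertices; vertex 0 is the hub.
K = 10
A = np.zeros((K, K))
A[0, 1:] = A[1:, 0] = 1
L = np.diag(A.sum(axis=1)) - A
lam = np.sort(np.linalg.eigvalsh(L))
# Spectrum: 0 once, 1 with multiplicity K - 2, and K once.
expected = np.sort(np.concatenate(([0.0, float(K)], np.ones(K - 2))))
assert np.allclose(lam, expected)
```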
Wheel graph. A wheel graph with K nodes consists of a center (hub) vertex and a ring of \(K-1\) peripheral vertices, each connected to the hub (Boyd et al. 2009). Figure 11 shows a wheel graph with 11 vertices (10 peripheral vertices) and unit weights.
Eigenvectors of a wheel graph with K vertices (i.e., \(K-1\) peripheral vertices) are given by (Zhang et al. 2009)
where \(v = (0, \ldots , K-1)\) and \(\cos (\cdot )\) and \(\sin (\cdot )\) are applied elementwise. The smallest eigenvalue of the graph is zero, the largest eigenvalue is K, and the middle eigenvalues are given by \(3 - 2\cos (2 \pi i/(K-1)), i = 1, \ldots , (K-2)/2\), with multiplicity 2 (Butler 2008).
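The stated wheel-graph spectrum can also be checked numerically. In the sketch below (assuming NumPy) vertex 0 is the hub; the middle eigenvalues are generated as \(3 - 2\cos(2\pi i/(K-1))\) for \(i = 1, \ldots , K-2\), which lists each multiplicity-2 pair twice.

```python
import numpy as np

# Wheel graph with K = 11 vertices; vertex 0 is the hub and
# vertices 1..K-1 form the peripheral ring.
K = 11
A = np.zeros((K, K))
A[0, 1:] = A[1:, 0] = 1              # spokes
for i in range(1, K):
    j = i % (K - 1) + 1              # next ring vertex (wraps around)
    A[i, j] = A[j, i] = 1            # ring edges
L = np.diag(A.sum(axis=1)) - A
lam = np.sort(np.linalg.eigvalsh(L))
mid = 3 - 2 * np.cos(2 * np.pi * np.arange(1, K - 1) / (K - 1))
expected = np.sort(np.concatenate(([0.0, float(K)], mid)))
assert np.allclose(lam, expected)
```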
Complete graph. A complete graph contains every possible edge; we assume here the edge weights are all one. The first eigenvector of a complete graph Laplacian with K nodes is \(\frac{1}{\sqrt{K}} {\mathbf {1}}\), and the other \(K-1\) eigenvectors are any orthonormal vectors that complete the basis. The eigenvalues are 0 with multiplicity 1, and K with multiplicity \(K-1\).
Figure 12 shows an example of a complete graph with 8 vertices and unit weights.
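As a quick numerical confirmation of the complete-graph spectrum (a sketch assuming NumPy):

```python
import numpy as np

# Complete graph with K = 8 vertices and unit weights.
K = 8
A = np.ones((K, K)) - np.eye(K)
L = np.diag(A.sum(axis=1)) - A
lam = np.sort(np.linalg.eigvalsh(L))
assert np.isclose(lam[0], 0)      # eigenvalue 0, multiplicity 1
assert np.allclose(lam[1:], K)    # eigenvalue K, multiplicity K - 1
```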
Complete bipartite graph. A bipartite graph is a graph whose vertices can be decomposed into two disjoint sets such that no two vertices share an edge within a set. A complete bipartite graph is a bipartite graph such that every pair of vertices in the two sets share an edge. We denote a complete bipartite graph with \(\alpha\) vertices on the first set and \(\beta\) vertices on the second set as an \((\alpha , \beta )\)-complete bipartite graph. We have that \(\alpha +\beta =K\), and use the convention that \(\alpha \le \beta\). Figure 13 illustrates an example of a complete bipartite graph with \((\alpha , \beta )=(3,6)\) and unit weights.
Eigenvectors of an \((\alpha , \beta )\)-complete bipartite graph with unit edge weights are given by (Merris 1998):
The eigenvalues are zero (multiplicity 1), \(\alpha\) (multiplicity \(\beta -1\)), \(\beta\) (multiplicity \(\alpha -1\)), and \(K=\alpha +\beta\) (multiplicity 1).
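These eigenvalues and multiplicities can be checked numerically for the \((3,6)\) example (a sketch assuming NumPy):

```python
import numpy as np

# (3, 6)-complete bipartite graph with unit weights.
alpha, beta = 3, 6
K = alpha + beta
A = np.zeros((K, K))
A[:alpha, alpha:] = 1
A[alpha:, :alpha] = 1
L = np.diag(A.sum(axis=1)) - A
lam = np.sort(np.linalg.eigvalsh(L))
# Spectrum: 0 once, alpha with multiplicity beta - 1,
# beta with multiplicity alpha - 1, and K once.
expected = np.sort(np.concatenate(
    ([0.0, float(K)], np.full(beta - 1, alpha), np.full(alpha - 1, beta))))
assert np.allclose(lam, expected)
```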
Scaling and products of graphs. We can find the eigenvectors and eigenvalues of the graph Laplacians of more complex graphs using some simple relationships. First, if we scale the edge weights of a graph by \(\alpha \ge 0\), the eigenvectors remain the same, and the eigenvalues are scaled by \(\alpha\). Second, the eigenvectors of the Laplacian of a Cartesian product of graphs are given by the Kronecker products of the eigenvectors of the individual graph Laplacians; the eigenvalues are the sums of one eigenvalue from each graph. This can be seen by noting that the Laplacian matrix of the Cartesian product of two graphs with graph Laplacians \(L_1 \in {\mathbf{R}}^{P \times P}\) and \(L_2 \in {\mathbf{R}}^{Q \times Q}\) is given by \(L = L_1 \otimes I + I \otimes L_2\),
where L is the Laplacian matrix of the Cartesian product of the two graphs. With Cartesian products of graphs, we find it convenient to index the eigenvalues and eigenvectors of the Laplacian by two indices, i.e., the eigenvalues may be denoted as \(\lambda _{i,j}\) with corresponding eigenvector \(q_{i,j}\) for \(i = 0, \ldots , P-1\) and \(j = 0, \ldots , Q-1\). (The eigenvalues will need to be sorted, as explained below.)
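The Kronecker-sum structure of the product Laplacian can be verified directly. The sketch below (our construction, assuming NumPy) builds the Laplacian of the Cartesian product of a path graph and a cycle graph and checks that its spectrum consists of all pairwise sums of the factors' eigenvalues.

```python
import numpy as np

def path_laplacian(n):
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(A.sum(axis=1)) - A

def cycle_laplacian(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return np.diag(A.sum(axis=1)) - A

P, Q = 4, 5
L1, L2 = path_laplacian(P), cycle_laplacian(Q)
# Laplacian of the Cartesian product, as a Kronecker sum.
L = np.kron(L1, np.eye(Q)) + np.kron(np.eye(P), L2)
lam = np.sort(np.linalg.eigvalsh(L))
# Pairwise sums of the factor spectra, sorted.
sums = np.sort((np.linalg.eigvalsh(L1)[:, None]
                + np.linalg.eigvalsh(L2)[None, :]).ravel())
assert np.allclose(lam, sums)
```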
As an example, consider a graph that is the product of a chain graph with P vertices, edge weights \(\alpha ^\text {ch}\), and eigenvalues \(\lambda ^{\text {ch}} \in {\mathbf{R}}^P\), and a cycle graph with Q vertices, edge weights \(\alpha ^\text {cy}\), and eigenvalues \(\lambda ^{\text {cy}} \in {\mathbf{R}}^Q\). The eigenvalues have the form \(\lambda _{i,j} = \lambda ^{\text {ch}}_i + \lambda ^{\text {cy}}_j\), \(i = 0, \ldots , P-1\), \(j = 0, \ldots , Q-1\).
To find the m smallest of these, we sort them. The order depends on the ratio of the edge weights, \(\alpha ^\text {ch}/\alpha ^\text {cy}\).
As a very specific example, take \(P=4\), \(Q=5\), \(\alpha ^\text {ch} = 1\), and \(\alpha ^\text {cy} = 2\). The eigenvalues of the chain graph are \((0,\; 2-\sqrt{2},\; 2,\; 2+\sqrt{2}) \approx (0, 0.59, 2, 3.41)\), and the eigenvalues of the cycle graph are \((0,\; 4-4\cos (2\pi /5),\; 4-4\cos (2\pi /5),\; 4-4\cos (4\pi /5),\; 4-4\cos (4\pi /5)) \approx (0, 2.76, 2.76, 7.24, 7.24)\). The bottom six eigenvalues of the Cartesian product of these two graphs are then approximately \((0,\; 0.59,\; 2,\; 2.76,\; 2.76,\; 3.35)\).
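The bottom eigenvalues for such a weighted product can be reproduced in a few lines (a sketch assuming NumPy, with our own variable names):

```python
import numpy as np

# Chain graph with P = 4 nodes and weight 1; cycle graph with
# Q = 5 nodes and weight 2. Scaling edge weights by alpha scales
# the eigenvalues by alpha.
P, Q = 4, 5
a_ch, a_cy = 1.0, 2.0
lam_ch = 2 * a_ch * (1 - np.cos(np.pi * np.arange(P) / P))
lam_cy = 2 * a_cy * (1 - np.cos(2 * np.pi * np.arange(Q) / Q))
# Eigenvalues of the Cartesian product: all pairwise sums, sorted.
sums = np.sort((lam_ch[:, None] + lam_cy[None, :]).ravel())
bottom6 = sums[:6]
```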
Cite this article
Tuck, J., Boyd, S. Eigen-stratified models. Optim Eng 23, 397–419 (2022). https://doi.org/10.1007/s11081-020-09592-x