
Eigen-stratified models

Research Article · Optimization and Engineering

Abstract

Stratified models depend in an arbitrary way on a selected categorical feature that takes K values, and depend linearly on the other n features. Laplacian regularization with respect to a graph on the feature values can greatly improve the performance of a stratified model, especially in the low-data regime. A significant issue with Laplacian-regularized stratified models is that the model is K times the size of the base model, which can be quite large. We address this issue by formulating eigen-stratified models, which are stratified models with an additional constraint that the model parameters are linear combinations of some modest number m of bottom eigenvectors of the graph Laplacian, i.e., those associated with the m smallest eigenvalues. With eigen-stratified models, we only need to store the m bottom eigenvectors and the corresponding coefficients as the stratified model parameters. This leads to a reduction, sometimes large, of model size when \(m\le n\) and \(m \ll K\). In some cases, the additional regularization implicit in eigen-stratified models can improve out-of-sample performance over standard Laplacian regularized stratified models.

Availability of data and material

All data is made available at www.github.com/cvxgrp/strat_models.

References

  • Ahmed N, Natarajan T, Rao K (1974) Discrete cosine transform. IEEE Trans Comput C-23(1):90–93

  • Aspvall B, Gilbert J (1984) Graph coloring using eigenvalue decomposition. SIAM J Matrix Anal Appl 5(4):526–538

  • Boyd S, Vandenberghe L (2004) Convex Optimization. Cambridge University Press, Cambridge

  • Boyd S, Diaconis P, Xiao L (2004) Fastest mixing Markov chain on a graph. SIAM Rev 46(4):667–689

  • Boyd S, Diaconis P, Parrilo P, Xiao L (2009) Fastest mixing Markov chain on graphs with symmetries. SIAM J Optim 20(2):792–819. https://doi.org/10.1137/070689413

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122

  • Brélaz D (1979) New methods to color the vertices of a graph. Commun ACM 22(4):251–256

  • Brooks R (1941) On colouring the nodes of a network. Math Proc Cambridge Philos Soc 37:194–197

  • Brouwer A, Haemers W (2012) Spectra of Graphs. Springer, New York

  • Butler S (2008) Eigenvalues and structures of graphs. PhD thesis, University of California, San Diego

  • Chung F (1997) Spectral Graph Theory. American Mathematical Society, Providence

  • Cohen-Steiner D, Kong W, Sohler C, Valiant G (2018) Approximating the spectrum of a graph. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’18. pp 1263–1271

  • Diamond S, Boyd S (2016) CVXPY: A Python-embedded modeling language for convex optimization. J Mach Learn Res 17(1):2909–2913

  • Elman J (1990) Finding structure in time. Cognit Sci 14(2):179–211

  • Fu A, Narasimhan B, Boyd S (2019a) CVXR: an R package for disciplined convex optimization. J Stat Softw

  • Fu A, Zhang J, Boyd S (2019b) Anderson accelerated Douglas-Rachford splitting. http://stanford.edu/~boyd/papers/a2dr.html

  • Grant M, Boyd S (2014) CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx

  • Grant M, Boyd S, Ye Y (2006) Disciplined convex programming. In: Global optimization: from theory to implementation. Nonconvex optimization and its applications series. Springer, pp 155–210

  • Guo C, Berkhahn F (2016) Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737

  • Hallac D, Leskovec J, Boyd S (2015) Network lasso: Clustering and optimization in large graphs. In: Proceedings of the ACM international conference on knowledge discovery and data mining. ACM, pp 387–396

  • Hallac D, Park Y, Boyd S, Leskovec J (2017) Network inference via the time-varying graphical lasso. In: Proceedings of the ACM international conference on knowledge discovery and data mining. ACM, pp 205–213

  • Jacobi C (1846) Über ein leichtes Verfahren, die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen. Journal für die reine und angewandte Mathematik 1846(30):51–94

  • Anderson WN Jr, Morley TD (1985) Eigenvalues of the Laplacian of a graph. Linear and Multilinear Algebra 18(2):141–145

  • Jung A, Tran N (2019) Localized linear regression in networked data. IEEE Signal Process Lett 26(7):1090–1094. https://doi.org/10.1109/LSP.2019.2918933

  • Kaggle (2019) Cardiovascular disease dataset. https://www.kaggle.com/sulianova/cardiovascular-disease-dataset

  • Lanczos C (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J Res Natl Bureau Standards 45

  • MacDonald J (1934) On the modified Ritz variation method. Phys Rev 46:828

  • Merris R (1998) Laplacian graph eigenvectors. Linear Algebra Appl 278(1):221–236

  • Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv e-prints

  • Mises R, Pollaczek-Geiringer H (1929) Praktische Verfahren der Gleichungsauflösung. ZAMM J Appl Math Mech 9(2):152–164

  • Ng A, Jordan M, Weiss Y (2002) On spectral clustering: Analysis and an algorithm. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, pp 849–856

  • Ojalvo I, Newman M (1970) Vibration modes of large structures by an automatic matrix-reduction method. AIAA J 8(7):1234–1239

  • Oppenheim A, Schafer R (2009) Discrete-Time Signal Processing, 3rd edn. Prentice Hall Press, Upper Saddle River, NJ, USA

  • Paige C (1971) The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, University of London

  • Parikh N, Boyd S (2014) Proximal algorithms. Found Trends Optim 1(3):127–239

  • Spielman D (2010) Algorithms, graph theory, and linear equations in Laplacian matrices. In: Proceedings of the international congress of mathematicians. World Scientific, pp 2698–2722

  • Sun J, Boyd S, Xiao L, Diaconis P (2006) The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Rev 48(4):681–699

  • Tran N, Ambos H, Jung A (2020) Classifying partially labeled networked data via logistic network lasso. In: ICASSP 2020—2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 3832–3836. https://doi.org/10.1109/ICASSP40776.2020.9054408

  • Tuck J, Barratt S, Boyd S (2019) A distributed method for fitting Laplacian regularized stratified models. arXiv preprint arXiv:1904.12017

  • Tuck J, Hallac D, Boyd S (2019) Distributed majorization-minimization for Laplacian regularized problems. IEEE/CAA J Autom Sinica 6(1):45–52

  • Weinberger K, Sha F, Zhu Q, Saul L (2007) Graph Laplacian regularization for large-scale semidefinite programming. In: Schölkopf B, Platt JC, Hoffman T (eds) Advances in Neural Information Processing Systems 19. MIT Press, Cambridge, pp 1489–1496

  • Zhang Y, Liu X, Yong X (2009) Which wheel graphs are determined by their Laplacian spectra? Comput Math Appl 58(10):1887–1890

Acknowledgements

We gratefully acknowledge discussions with Shane Barratt and Peter Stoica, who provided us with useful suggestions that improved the paper. J. Tuck is supported by the Stanford Graduate Fellowship.

Funding

J. Tuck is supported by the Stanford Graduate Fellowship.

Author information

Corresponding author

Correspondence to Jonathan Tuck.

Ethics declarations

Conflicts of interest

The authors declare a possible conflict of interest: S. Boyd is both an author of this paper and an editor of this journal.

Code availability

All code is made available at www.github.com/cvxgrp/strat_models.

Appendix: Eigenvectors and eigenvalues of common graphs

The direct relation of a graph’s structure to the eigenvalues and eigenvectors of its corresponding graph Laplacian is well known (Anderson and Morley 1985). In some cases, mentioned below, we can find them analytically, especially when the graph has many symmetries. The eigenvectors are given in normalized form (i.e., \(\Vert q_k\Vert _2 = 1\)). Beyond these common graphs, many other simple graphs can be analyzed analytically; see, e.g., (Brouwer and Haemers 2012).

A note on complex graphs. If a graph is complex, i.e., there is no analytical form for the eigenvalues and eigenvectors of its graph Laplacian, the bottom eigenvalues and eigenvectors can still be computed very efficiently by, e.g., the Lanczos algorithm, variants of power iteration, the Jacobi eigenvalue algorithm, the folded spectrum method, and other methods. We refer the reader to (Jacobi 1846; Mises and Pollaczek-Geiringer 1929; MacDonald 1934; Lanczos 1950; Ojalvo and Newman 1970; Paige 1971) for these methods.
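
The following is a minimal sketch of this computation using SciPy's sparse Lanczos-type eigensolver (scipy.sparse.linalg.eigsh); the random sparse graph and the helper bottom_eigenpairs are illustrative assumptions, not part of the paper's code.

```python
# Sketch: bottom-m Laplacian eigenpairs via a Lanczos-type method (scipy.sparse.linalg.eigsh).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def bottom_eigenpairs(A, m):
    """Return the m smallest Laplacian eigenvalues/eigenvectors of a graph with
    symmetric, nonnegative adjacency matrix A (sparse)."""
    d = np.asarray(A.sum(axis=1)).ravel()      # weighted degrees
    L = sp.diags(d) - A                        # graph Laplacian L = D - A
    # which="SM" asks for the smallest eigenvalues; for very large graphs,
    # shift-invert (sigma slightly below 0) is a common alternative.
    vals, vecs = spla.eigsh(L, k=m, which="SM")
    order = np.argsort(vals)
    return vals[order], vecs[:, order]

# Example: an arbitrary sparse graph with K = 300 nodes, m = 5 bottom eigenpairs.
K, m = 300, 5
A = sp.random(K, K, density=0.01, random_state=0)
A = sp.triu(A, k=1)                            # keep one copy of each edge
A = ((A + A.T) > 0).astype(float)              # symmetrize, unit edge weights
vals, Q = bottom_eigenpairs(A, m)
print(vals)                                    # smallest eigenvalue is (numerically) 0
```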

Path graph. A path or linear/chain graph is a graph whose vertices can be listed in order, with edges between adjacent vertices in that order. The first and last vertices only have one edge, whereas the other vertices have two edges. Figure 8 shows a path graph with 8 vertices and unit weights.

Fig. 8 A path graph with 8 vertices and unit weights

Eigenvectors \(q_0, \ldots , q_{K-1}\) of a path graph Laplacian with K nodes and unit edge weights are given by

$$\begin{aligned} q_k = \cos (\pi k v / K + \pi k / (2K))/\Vert \cos (\pi k v / K + \pi k / (2K))\Vert _2, \quad k = 0, \ldots , K-1, \end{aligned}$$

where \(v = (0, \ldots , K-1)\) and \(\cos (\cdot )\) is applied elementwise. The eigenvalues are \(2-2\cos (\pi k/K), k = 0, \ldots , K-1\).
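
As a quick sanity check (assuming NumPy; not part of the paper's code), the snippet below verifies the path-graph formulas numerically.

```python
# Sketch: numerically verify the path-graph Laplacian eigenpairs above.
import numpy as np

K = 8
v = np.arange(K)
# Path-graph Laplacian with unit weights: tridiagonal, degrees (1, 2, ..., 2, 1).
L = (np.diag(np.r_[1.0, 2 * np.ones(K - 2), 1.0])
     - np.diag(np.ones(K - 1), 1) - np.diag(np.ones(K - 1), -1))

for k in range(K):
    q = np.cos(np.pi * k * v / K + np.pi * k / (2 * K))
    q /= np.linalg.norm(q)
    lam = 2 - 2 * np.cos(np.pi * k / K)
    assert np.allclose(L @ q, lam * q)        # q is an eigenvector with eigenvalue lam
```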

Cycle graph. A cycle graph or circular graph is a graph where the vertices are connected in a closed chain. Every node in a cycle graph has two edges. Figure 9 shows a cycle graph with 10 vertices and unit weights.

Fig. 9 A cycle graph with 10 vertices and unit weights

Eigenvectors of a cycle graph Laplacian with K nodes and unit weights are given by

$$\begin{aligned}&\frac{1}{\sqrt{K}}{\mathbf {1}}, \quad k=0\\&\cos (2 \pi k v / K)/\Vert \cos (2 \pi k v / K)\Vert _2 \ \mathrm {and} \ \sin (2 \pi k v / K)/\Vert \sin (2 \pi k v / K)\Vert _2, \quad k = 1, \ldots , \lfloor K/2 \rfloor , \end{aligned}$$

where \(v = (0, \ldots , K-1)\) and \(\cos (\cdot )\) and \(\sin (\cdot )\) are applied elementwise; for even K, the sine vector vanishes when \(k = K/2\) and is omitted. The eigenvalues are \(2-2\cos (2\pi k/K), k = 0, \ldots , K-1\).
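
A similar check (again assuming NumPy) for the cycle-graph formulas, which also skips the vanishing sine vector at \(k = K/2\):

```python
# Sketch: numerically verify the cycle-graph Laplacian eigenpairs above.
import numpy as np

K = 10
v = np.arange(K)
L = 2 * np.eye(K) - np.eye(K, k=1) - np.eye(K, k=-1)
L[0, -1] = L[-1, 0] = -1                        # wrap-around edge closes the cycle

for k in range(1, K // 2 + 1):
    lam = 2 - 2 * np.cos(2 * np.pi * k / K)
    for q in (np.cos(2 * np.pi * k * v / K), np.sin(2 * np.pi * k * v / K)):
        if np.linalg.norm(q) > 1e-8:            # sine vector vanishes at k = K/2
            q = q / np.linalg.norm(q)
            assert np.allclose(L @ q, lam * q)  # eigenvector with eigenvalue lam
```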

Star graph. A star graph is a graph in which all other vertices are connected only to a single central vertex. Figure 10 shows an example of a star graph with 10 vertices (9 outer vertices) and unit weights.

Fig. 10 A star graph with 10 vertices (9 outer vertices) and unit weights

Eigenvectors of a star graph with K vertices (i.e., \(K-1\) outer vertices) and unit edge weights, with the central vertex listed first, are given by

$$\begin{aligned}&q_0 = \frac{1}{\sqrt{K}}{\mathbf {1}}\\&q_k = \frac{1}{\sqrt{2}} (e_{k+1} - e_{k+2}), \quad 1 \le k \le K-2\\&q_{K-1} = \frac{1}{\sqrt{K(K-1)}}(K-1, -1, -1, \ldots , -1, -1), \end{aligned}$$

where \(e_i\) is the ith basis vector in \({\mathbf{R}}^K\), so that each \(q_k\), \(1 \le k \le K-2\), is a difference of two outer-vertex basis vectors. The smallest eigenvalue of this graph is zero, the largest eigenvalue is K, and all other eigenvalues are 1.
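
The following NumPy sketch (hub listed first, as above; not the paper's code) checks the star-graph eigenpairs:

```python
# Sketch: numerically verify the star-graph Laplacian eigenpairs above.
import numpy as np

K = 10
A = np.zeros((K, K))
A[0, 1:] = A[1:, 0] = 1                       # hub (vertex 0) connected to every leaf
L = np.diag(A.sum(axis=1)) - A

e = np.eye(K)
for k in range(1, K - 1):
    q = (e[:, k] - e[:, k + 1]) / np.sqrt(2)  # difference of two leaf basis vectors
    assert np.allclose(L @ q, 1.0 * q)        # eigenvalue 1

q_last = np.r_[K - 1.0, -np.ones(K - 1)] / np.sqrt(K * (K - 1))
assert np.allclose(L @ q_last, K * q_last)    # eigenvalue K
```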

Wheel graph. A wheel graph with K nodes consists of a center (hub) vertex and a cycle of \(K-1\) peripheral vertices, each of which is also connected to the hub (Boyd et al. 2009). Figure 11 shows a wheel graph with 11 vertices (10 peripheral vertices) and unit weights.

Fig. 11 A wheel graph with 11 vertices (10 peripheral vertices) and unit weights

Eigenvectors of a wheel graph with K vertices (i.e., \(K-1\) peripheral vertices), with the hub listed first, are given by (Zhang et al. 2009)

$$\begin{aligned}&q_0 = \frac{1}{\sqrt{K}}{\mathbf {1}}\\&(0, \cos (2 \pi j u / (K-1))) / \Vert \cdot \Vert _2 \ \mathrm {and} \ (0, \sin (2 \pi j u / (K-1))) / \Vert \cdot \Vert _2, \quad j = 1, \ldots , \lfloor (K-1)/2 \rfloor ,\\&q_{K-1} = \frac{1}{\sqrt{K(K-1)}}(K-1, -1, -1, \ldots , -1, -1), \end{aligned}$$

where \(u = (0, \ldots , K-2)\) indexes the peripheral vertices and \(\cos (\cdot )\) and \(\sin (\cdot )\) are applied elementwise; the middle eigenvectors are zero at the hub, and for even \(K-1\) the sine vector vanishes at \(j = (K-1)/2\) and is omitted. The smallest eigenvalue of the graph is zero, the largest eigenvalue is K, and the middle eigenvalues are given by \(3 - 2\cos (2 \pi j/(K-1)), j = 1, \ldots , K-2\) (Butler 2008).
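
The wheel-graph eigenpairs can be checked numerically as well; this NumPy sketch builds the wheel Laplacian for \(K = 11\) (hub listed first, matching Fig. 11) and verifies the middle eigenvectors:

```python
# Sketch: numerically verify the wheel-graph middle eigenpairs above.
import numpy as np

K = 11                                   # hub + K-1 = 10 peripheral vertices
u = np.arange(K - 1)
A = np.zeros((K, K))
A[0, 1:] = A[1:, 0] = 1                  # spokes: hub (vertex 0) to every ring vertex
for i in range(1, K):                    # ring edges
    j = i % (K - 1) + 1                  # next ring vertex (wraps K-1 -> 1)
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

# Middle eigenvectors: zero at the hub, cycle eigenvectors on the ring.
for j in range(1, K - 1):
    q = np.r_[0.0, np.cos(2 * np.pi * j * u / (K - 1))]
    q /= np.linalg.norm(q)
    lam = 3 - 2 * np.cos(2 * np.pi * j / (K - 1))
    assert np.allclose(L @ q, lam * q)
```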

Complete graph. A complete graph contains every possible edge; we assume here that the edge weights are all one. The first eigenvector of a complete graph Laplacian with K nodes is \(\frac{1}{\sqrt{K}} {\mathbf {1}}\), and the other \(K-1\) eigenvectors are any orthonormal vectors that complete the basis. The eigenvalues are 0 with multiplicity 1, and K with multiplicity \(K-1\).

Figure 12 shows an example of a complete graph with 8 vertices and unit weights.

Fig. 12 A complete graph with 8 vertices and unit weights

Complete bipartite graph. A bipartite graph is a graph whose vertices can be partitioned into two disjoint sets such that no two vertices within the same set share an edge. A complete bipartite graph is a bipartite graph in which every vertex in the first set shares an edge with every vertex in the second set. We denote a complete bipartite graph with \(\alpha\) vertices in the first set and \(\beta\) vertices in the second set as an \((\alpha , \beta )\)-complete bipartite graph. We have \(\alpha +\beta =K\), and use the convention that \(\alpha \le \beta\). Figure 13 illustrates an example of a complete bipartite graph with \((\alpha , \beta )=(3,6)\) and unit weights.

Fig. 13 A (3,6)-complete bipartite graph with unit weights

Eigenvectors of an \((\alpha , \beta )\)-complete bipartite graph with unit edge weights, with the first-set vertices listed first, are given by (Merris 1998):

$$\begin{aligned}&q_0 = \frac{1}{\sqrt{K}}{\mathbf {1}}\\&q_k = \frac{1}{\sqrt{2}}(e_k - e_{k+1}), \quad 1 \le k \le \alpha -1\\&q_k = \frac{1}{\sqrt{2}}(e_{k+1} - e_{k+2}), \quad \alpha \le k \le K-2\\&(q_{K-1})_i = {\left\{ \begin{array}{ll} \frac{-\beta }{\sqrt{\alpha ^2\beta + \beta ^2\alpha }} & 1 \le i \le \alpha \\ \frac{\alpha }{\sqrt{\alpha ^2\beta + \beta ^2\alpha }} & \alpha < i \le K \\ \end{array}\right. }, \end{aligned}$$

where \(e_i\) is the ith basis vector in \({\mathbf{R}}^K\), so that \(q_1, \ldots , q_{\alpha -1}\) are differences of first-set basis vectors and \(q_\alpha , \ldots , q_{K-2}\) are differences of second-set basis vectors.

The eigenvalues are zero (multiplicity 1), \(\alpha\) (multiplicity \(\beta -1\), with eigenvectors supported on the second set), \(\beta\) (multiplicity \(\alpha -1\), with eigenvectors supported on the first set), and \(K=\alpha +\beta\) (multiplicity 1).
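
A short NumPy check (illustrative, not the paper's code) of these eigenvalue multiplicities for the \((3, 6)\)-complete bipartite graph of Fig. 13:

```python
# Sketch: verify the complete-bipartite eigenvalue multiplicities above.
import numpy as np

alpha, beta = 3, 6
K = alpha + beta
A = np.zeros((K, K))
A[:alpha, alpha:] = 1                   # every cross pair of vertices is connected
A[alpha:, :alpha] = 1
L = np.diag(A.sum(axis=1)) - A

expected = np.sort(np.r_[0.0, np.full(beta - 1, alpha), np.full(alpha - 1, beta), float(K)])
assert np.allclose(np.linalg.eigvalsh(L), expected)   # 0, alpha (x5), beta (x2), K
```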

Scaling and products of graphs. We can find the eigenvectors and eigenvalues of the graph Laplacians of more complex graphs using some simple relationships. First, if we scale the edge weights of a graph by \(\alpha \ge 0\), the eigenvectors remain the same and the eigenvalues are scaled by \(\alpha\). Second, the eigenvectors of the Laplacian of a Cartesian product of graphs are Kronecker products of eigenvectors of the individual graph Laplacians; the eigenvalues are sums of one eigenvalue from each graph. This can be seen by noting that the Laplacian matrix of the Cartesian product of two graphs with graph Laplacians \(L_1 \in {\mathbf{R}}^{P \times P}\) and \(L_2 \in {\mathbf{R}}^{Q \times Q}\) is given by

$$\begin{aligned} L = (L_1 \otimes I) + (I \otimes L_2), \end{aligned}$$

where L is the Laplacian matrix of the Cartesian product of the two graphs. With Cartesian products of graphs, we find it convenient to index the eigenvalues and eigenvectors of the Laplacian by two indices, i.e., the eigenvalues may be denoted as \(\lambda _{i,j}\) with corresponding eigenvector \(q_{i,j}\) for \(i = 0, \ldots , P-1\) and \(j = 0, \ldots , Q-1\). (The eigenvalues will need to be sorted, as explained below.)
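
The following NumPy sketch (the helper names are illustrative assumptions) constructs the eigenpairs of a product graph from those of its factors and checks them against the product Laplacian:

```python
# Sketch: eigenpairs of a Cartesian product graph from its factors.
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A of a symmetric adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

def product_eigenpairs(L1, L2):
    """Eigenvalues/eigenvectors of L = kron(L1, I) + kron(I, L2)."""
    lam1, V1 = np.linalg.eigh(L1)
    lam2, V2 = np.linalg.eigh(L2)
    lam = np.add.outer(lam1, lam2).ravel()   # lam[i*Q + j] = lam1[i] + lam2[j]
    V = np.kron(V1, V2)                      # column i*Q + j is kron(V1[:, i], V2[:, j])
    return lam, V

# Check on two small arbitrary graphs (P = 4 and Q = 5 vertices).
rng = np.random.default_rng(0)
A1 = np.triu(rng.integers(0, 2, size=(4, 4)), 1).astype(float); A1 += A1.T
A2 = np.triu(rng.integers(0, 2, size=(5, 5)), 1).astype(float); A2 += A2.T
L1, L2 = laplacian(A1), laplacian(A2)
L = np.kron(L1, np.eye(5)) + np.kron(np.eye(4), L2)
lam, V = product_eigenpairs(L1, L2)
assert np.allclose(L @ V, V * lam)           # every column of V is an eigenvector of L
```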

As an example, consider a graph that is the Cartesian product of a chain (path) graph with P vertices, edge weights \(\alpha ^\text {ch}\), and eigenvalues \(\lambda ^{\text {ch}} \in {\mathbf{R}}^P\), and a cycle graph with Q vertices, edge weights \(\alpha ^\text {cy}\), and eigenvalues \(\lambda ^{\text {cy}} \in {\mathbf{R}}^Q\). The eigenvalues of the product have the form

$$\begin{aligned} \lambda ^{\text {ch}}_i + \lambda ^{\text {cy}}_j, \quad i=0, \ldots , P-1, \quad j=0, \ldots , Q-1. \end{aligned}$$

To find the m smallest of these, we sort them. The order depends on the ratio of the edge weights, \(\alpha ^\text {ch}/\alpha ^\text {cy}\).

As a very specific example, take \(P=4\) and \(Q=5\), \(\alpha ^\text {ch} = 1\), and \(\alpha ^\text {cy} = 2\). The eigenvalues of the chain and cycle graphs are

$$\begin{aligned} \lambda ^{\text {ch}} = (0, 0.586, 2, 3.414), \quad \lambda ^{\text {cy}} = (0, 2.764, 2.764, 7.236, 7.236). \end{aligned}$$

The bottom six eigenvalues of the Cartesian product of these two graphs are then

$$\begin{aligned} 0, 0.586, 2, 2.764, 2.764, 3.350. \end{aligned}$$
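
These numbers can be reproduced directly from the analytic formulas above (assuming NumPy):

```python
# Sketch: reproduce the bottom six eigenvalues of the chain-cycle product graph.
import numpy as np

P, Q = 4, 5
lam_ch = 1 * (2 - 2 * np.cos(np.pi * np.arange(P) / P))        # chain, edge weights 1
lam_cy = 2 * (2 - 2 * np.cos(2 * np.pi * np.arange(Q) / Q))    # cycle, edge weights 2
lam = np.sort(np.add.outer(lam_ch, lam_cy).ravel())
print(np.round(lam[:6], 3))   # approximately [0, 0.586, 2, 2.764, 2.764, 3.350]
```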

Cite this article

Tuck, J., Boyd, S. Eigen-stratified models. Optim Eng 23, 397–419 (2022). https://doi.org/10.1007/s11081-020-09592-x
