Abstract
Stratified models depend in an arbitrary way on a selected categorical feature that takes K values, and depend linearly on the other n features. Laplacian regularization with respect to a graph on the feature values can greatly improve the performance of a stratified model, especially in the low-data regime. A significant issue with Laplacian-regularized stratified models is that the model is K times the size of the base model, which can be quite large. We address this issue by formulating eigen-stratified models, which are stratified models with an additional constraint that the model parameters are linear combinations of some modest number m of bottom eigenvectors of the graph Laplacian, i.e., those associated with the m smallest eigenvalues. With eigen-stratified models, we only need to store the m bottom eigenvectors and the corresponding coefficients as the stratified model parameters. This leads to a reduction, sometimes large, of model size when \(m\le n\) and \(m \ll K\). In some cases, the additional regularization implicit in eigen-stratified models can improve out-of-sample performance over standard Laplacian regularized stratified models.
Availability of data and material
All data is made available at www.github.com/cvxgrp/strat_models.
References
Ahmed N, Natarajan T, Rao K (1974) Discrete cosine transform. IEEE Trans Comput C-23(1):90–93
Aspvall B, Gilbert J (1984) Graph coloring using eigenvalue decomposition. SIAM J Algebraic Discrete Methods 5(4):526–538
Boyd S, Vandenberghe L (2004) Convex Optimization. Cambridge University Press, Cambridge
Boyd S, Diaconis P, Xiao L (2004) Fastest mixing Markov chain on a graph. SIAM Rev 46(4):667–689
Boyd S, Diaconis P, Parrilo P, Xiao L (2009) Fastest mixing Markov chain on graphs with symmetries. SIAM J Optim 20(2):792–819. https://doi.org/10.1137/070689413
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3(1):1–122
Brélaz D (1979) New methods to color the vertices of a graph. Commun ACM 22(4):251–256
Brooks R (1941) On colouring the nodes of a network. Math Proc Cambridge Philos Soc 37:194–197
Brouwer A, Haemers W (2012) Spectra of Graphs. Springer, New York
Butler S (2008) Eigenvalues and structures of graphs. PhD thesis, University of California, San Diego
Chung F (1997) Spectral Graph Theory. American Mathematical Society, Providence
Cohen-Steiner D, Kong W, Sohler C, Valiant G (2018) Approximating the spectrum of a graph. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’18. pp 1263–1271
Diamond S, Boyd S (2016) CVXPY: A Python-embedded modeling language for convex optimization. J Mach Learn Res 17(1):2909–2913
Elman J (1990) Finding structure in time. Cognit Sci 14(2):179–211
Fu A, Narasimhan B, Boyd S (2019a) CVXR: an R package for disciplined convex optimization. J Stat Softw
Fu A, Zhang J, Boyd S (2019b) Anderson accelerated Douglas-Rachford splitting. http://stanford.edu/~boyd/papers/a2dr.html
Grant M, Boyd S (2014) CVX: Matlab software for disciplined convex programming, version 2.1. http://cvxr.com/cvx
Grant M, Boyd S, Ye Y (2006) Disciplined convex programming. In: Global optimization: from theory to implementation. Nonconvex optimization and its applications series. Springer, pp 155–210
Guo C, Berkhahn F (2016) Entity embeddings of categorical variables. arXiv preprint arXiv:1604.06737
Hallac D, Leskovec J, Boyd S (2015) Network lasso: Clustering and optimization in large graphs. In: Proceedings of the ACM international conference on knowledge discovery and data mining. ACM, pp 387–396
Hallac D, Park Y, Boyd S, Leskovec J (2017) Network inference via the time-varying graphical lasso. In: Proceedings of the ACM international conference on knowledge discovery and data mining. ACM, pp 205–213
Jacobi C (1846) Über ein leichtes Verfahren, die in der Theorie der Säcularstörungen vorkommenden Gleichungen numerisch aufzulösen. Journal für die reine und angewandte Mathematik 1846(30):51–94
Anderson WN Jr, Morley TD (1985) Eigenvalues of the Laplacian of a graph. Linear Multilinear Algebra 18(2):141–145
Jung A, Tran N (2019) Localized linear regression in networked data. IEEE Signal Process Lett 26(7):1090–1094. https://doi.org/10.1109/LSP.2019.2918933
Kaggle (2019) Cardiovascular disease dataset. https://www.kaggle.com/sulianova/cardiovascular-disease-dataset
Lanczos C (1950) An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. J Res Natl Bureau Standards 45
MacDonald J (1934) On the modified Ritz variation method. Phys Rev 46:828
Merris R (1998) Laplacian graph eigenvectors. Linear Algebra Appl 278(1):221–236
Mikolov T, Chen K, Corrado G, Dean J (2013) Efficient estimation of word representations in vector space. arXiv e-prints
Mises R, Pollaczek-Geiringer H (1929) Praktische Verfahren der Gleichungsauflösung. ZAMM J Appl Math Mech 9(2):152–164
Ng A, Jordan M, Weiss Y (2002) On spectral clustering: Analysis and an algorithm. In: Dietterich TG, Becker S, Ghahramani Z (eds) Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, pp 849–856
Ojalvo I, Newman M (1970) Vibration modes of large structures by an automatic matrix-reduction method. AIAA J 8(7):1234–1239
Oppenheim A, Schafer R (2009) Discrete-Time Signal Processing, 3rd edn. Prentice Hall Press, Upper Saddle River, NJ, USA
Paige C (1971) The computation of eigenvalues and eigenvectors of very large sparse matrices. PhD thesis, University of London
Parikh N, Boyd S (2014) Proximal algorithms. Found Trends Optim 1(3):127–239
Spielman D (2010) Algorithms, graph theory, and linear equations in Laplacian matrices. In: Proceedings of the international congress of mathematicians. World Scientific, pp 2698–2722
Sun J, Boyd S, Xiao L, Diaconis P (2006) The fastest mixing Markov process on a graph and a connection to a maximum variance unfolding problem. SIAM Rev 48(4):681–699
Tran N, Ambos H, Jung A (2020) Classifying partially labeled networked data via logistic network lasso. In: ICASSP 2020—2020 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp 3832–3836. https://doi.org/10.1109/ICASSP40776.2020.9054408
Tuck J, Barratt S, Boyd S (2019) A distributed method for fitting Laplacian regularized stratified models. arXiv preprint arXiv:1904.12017
Tuck J, Hallac D, Boyd S (2019) Distributed majorization-minimization for Laplacian regularized problems. IEEE/CAA J Autom Sinica 6(1):45–52
Weinberger K, Sha F, Zhu Q, Saul L (2007) Graph Laplacian regularization for large-scale semidefinite programming. In: Schölkopf B, Platt JC, Hoffman T (eds) Advances in Neural Information Processing Systems 19. MIT Press, Cambridge, pp 1489–1496
Zhang Y, Liu X, Yong X (2009) Which wheel graphs are determined by their Laplacian spectra? Comput Math Appl 58(10):1887–1890
Acknowledgements
We gratefully acknowledge discussions with Shane Barratt and Peter Stoica, who provided us with useful suggestions that improved the paper.
Funding
J. Tuck is supported by the Stanford Graduate Fellowship.
Ethics declarations
Conflicts of interest
The authors declare that a possible conflict of interest is that S. Boyd is an author of this paper and an editor of this journal.
Code availability
All code is made available at www.github.com/cvxgrp/strat_models.
Appendix: Eigenvectors and eigenvalues of common graphs
The direct relation of a graph’s structure to the eigenvalues and eigenvectors of its graph Laplacian is well known (Anderson and Morley 1985). In some cases, described below, we can find them analytically, especially when the graph has many symmetries. The eigenvectors are given in normalized form (i.e., \(\Vert q_k\Vert _2 = 1\)). Beyond these common graphs, many other simple graphs can be analyzed analytically; see, e.g., Brouwer and Haemers (2012).
A note on complex graphs. If a graph is complex, i.e., there is no analytical form for the eigenvalues and eigenvectors of its graph Laplacian, the bottom eigenvalues and eigenvectors can still be computed very efficiently by, e.g., the Lanczos algorithm, variants of power iteration, the Jacobi eigenvalue algorithm, or the folded spectrum method; see Jacobi (1846), Mises and Pollaczek-Geiringer (1929), MacDonald (1934), Lanczos (1950), Ojalvo and Newman (1970), and Paige (1971) for these methods.
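In practice, a few lines of SciPy suffice for this. The sketch below (our own construction, not code from the paper) uses `scipy.sparse.linalg.eigsh`, an ARPACK Lanczos-type routine, in shift-invert mode to compute the bottom eigenpairs of a sparse path-graph Laplacian.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Build the Laplacian of a unit-weight path graph with K nodes.
K, m = 100, 4
off = np.ones(K - 1)
A = sp.diags([off, off], [1, -1])                      # adjacency matrix
L = sp.diags(np.asarray(A.sum(axis=1)).ravel()) - A    # graph Laplacian

# Shift-invert around a sigma just below zero makes the Lanczos iteration
# return the m eigenvalues nearest zero, i.e., the bottom of the spectrum
# of the positive semidefinite Laplacian.
vals, vecs = eigsh(L.tocsc(), k=m, sigma=-1e-6, which="LM")
vals = np.sort(vals)
```

For large K this avoids ever forming a dense K-by-K matrix, which is the regime where eigen-stratified models are most useful.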
Path graph. A path (or linear/chain) graph is a graph whose vertices can be listed in order, with edges between adjacent vertices in that order. The first and last vertices have only one edge, whereas the other vertices have two edges. Figure 8 shows a path graph with 8 vertices and unit weights.
Eigenvectors \(q_1, \ldots , q_{K}\) of a path graph Laplacian with K nodes and unit edge weights are given by \(q_{k+1} = c_k \cos\left(\pi k (v + \tfrac{1}{2}{\mathbf 1})/K\right)\), \(k = 0, \ldots , K-1\), where \(v = (0, \ldots , K-1)\), \(\cos (\cdot )\) is applied elementwise, and \(c_k\) normalizes \(q_{k+1}\) (\(c_0 = 1/\sqrt{K}\), \(c_k = \sqrt{2/K}\) for \(k \ge 1\)). The eigenvalues are \(2-2\cos (\pi k/K), k = 0, \ldots , K-1\).
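These formulas can be checked numerically. The sketch below (assuming NumPy, with our own variable names) verifies that the DCT-II-type vectors \(\cos (\pi k (v + \tfrac{1}{2}{\mathbf 1})/K)\), once normalized, are eigenvectors of the path-graph Laplacian with the stated eigenvalues.

```python
import numpy as np

# Path graph with K = 8 nodes and unit weights: check that
# q = cos(pi*k*(v + 1/2)/K) satisfies L q = (2 - 2 cos(pi*k/K)) q.
K = 8
A = np.diag(np.ones(K - 1), 1) + np.diag(np.ones(K - 1), -1)
L = np.diag(A.sum(axis=1)) - A
v = np.arange(K)
for k in range(K):
    q = np.cos(np.pi * k * (v + 0.5) / K)
    q /= np.linalg.norm(q)
    lam = 2 - 2 * np.cos(np.pi * k / K)
    assert np.allclose(L @ q, lam * q)
```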
Cycle graph. A cycle graph or circular graph is a graph where the vertices are connected in a closed chain. Every node in a cycle graph has two edges. Figure 9 shows a cycle graph with 10 vertices and unit weights.
Eigenvectors of a cycle graph Laplacian with K nodes and unit weights are given by \(\tfrac{1}{\sqrt{K}}{\mathbf 1}\) and, for \(k = 1, \ldots , \lfloor K/2 \rfloor\), the normalized vectors \(\cos (2\pi k v/K)\) and \(\sin (2\pi k v/K)\) (for even K and \(k = K/2\), only the cosine vector is nonzero), where \(v = (0, \ldots , K-1)\) and \(\cos (\cdot )\) and \(\sin (\cdot )\) are applied elementwise. The eigenvalues are \(2-2\cos (2\pi k/K), k = 0, \ldots , K-1\).
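A similar numerical check for the cycle graph (a sketch assuming NumPy):

```python
import numpy as np

# Cycle graph with K = 10 nodes and unit weights: the vectors
# cos(2*pi*k*v/K) and sin(2*pi*k*v/K) are Laplacian eigenvectors
# with eigenvalue 2 - 2*cos(2*pi*k/K).
K = 10
A = np.zeros((K, K))
for i in range(K):
    A[i, (i + 1) % K] = A[(i + 1) % K, i] = 1
L = np.diag(A.sum(axis=1)) - A
v = np.arange(K)
for k in range(K):
    lam = 2 - 2 * np.cos(2 * np.pi * k / K)
    for q in (np.cos(2 * np.pi * k * v / K), np.sin(2 * np.pi * k * v / K)):
        assert np.allclose(L @ q, lam * q)   # holds trivially when q == 0
```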
Star graph. A star graph is a graph in which every vertex is connected only to a single central vertex. Figure 10 shows an example of a star graph with 10 vertices (9 outer vertices) and unit weights.
Eigenvectors of a star graph with K vertices (i.e., \(K-1\) outer vertices) and unit edge weights are given by \(\tfrac{1}{\sqrt{K}}{\mathbf 1}\), vectors of the form \(\tfrac{1}{\sqrt{2}}(e_i - e_j)\) with i and j outer vertices, and \(\tfrac{1}{\sqrt{K(K-1)}}\left((K-1)e_c - \sum _{i \ne c} e_i\right)\), where \(e_i\) is the ith basis vector in \({\mathbf{R}}^K\) and c is the central vertex. The smallest eigenvalue of this graph is zero, the largest eigenvalue is K, and all other eigenvalues are 1.
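The star-graph spectrum can be verified directly; in this sketch (our construction, assuming NumPy) vertex 0 is taken as the hub.

```python
import numpy as np

# Star graph with K = 10 vertices; vertex 0 is the hub.
K = 10
A = np.zeros((K, K))
A[0, 1:] = A[1:, 0] = 1
L = np.diag(A.sum(axis=1)) - A
lam = np.sort(np.linalg.eigvalsh(L))
# Spectrum: 0 once, 1 with multiplicity K - 2, and K once.
expected = np.sort(np.concatenate(([0.0, float(K)], np.ones(K - 2))))
assert np.allclose(lam, expected)
```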
Wheel graph. A wheel graph with K nodes consists of a center (hub) vertex and a ring of \(K-1\) peripheral vertices, each connected to the hub (Boyd et al. 2009). Figure 11 shows a wheel graph with 11 vertices (10 peripheral vertices) and unit weights.
Eigenvectors of a wheel graph with K vertices (i.e., \(K-1\) peripheral vertices) are given by (Zhang et al. 2009)
where \(v = (0, \ldots , K-1)\) and \(\cos (\cdot )\) and \(\sin (\cdot )\) are applied elementwise. The smallest eigenvalue of the graph is zero, the largest eigenvalue is K, and the middle eigenvalues are given by \(3 - 2\cos (2 \pi i/(K-1)), i = 1, \ldots , (K-2)/2\), with multiplicity 2 (Butler 2008).
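The stated wheel-graph spectrum can also be checked numerically. In the sketch below (assuming NumPy) vertex 0 is the hub; the middle eigenvalues are generated as \(3 - 2\cos(2\pi i/(K-1))\) for \(i = 1, \ldots , K-2\), which lists each multiplicity-2 pair twice.

```python
import numpy as np

# Wheel graph with K = 11 vertices; vertex 0 is the hub and
# vertices 1..K-1 form the peripheral ring.
K = 11
A = np.zeros((K, K))
A[0, 1:] = A[1:, 0] = 1              # spokes
for i in range(1, K):
    j = i % (K - 1) + 1              # next ring vertex (wraps around)
    A[i, j] = A[j, i] = 1            # ring edges
L = np.diag(A.sum(axis=1)) - A
lam = np.sort(np.linalg.eigvalsh(L))
mid = 3 - 2 * np.cos(2 * np.pi * np.arange(1, K - 1) / (K - 1))
expected = np.sort(np.concatenate(([0.0, float(K)], mid)))
assert np.allclose(lam, expected)
```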
Complete graph. A complete graph contains every possible edge; we assume here the edge weights are all one. The first eigenvector of a complete graph Laplacian with K nodes is \(\frac{1}{\sqrt{K}} {\mathbf {1}}\), and the other \(K-1\) eigenvectors are any orthonormal vectors that complete the basis. The eigenvalues are 0 with multiplicity 1, and K with multiplicity \(K-1\).
Figure 12 shows an example of a complete graph with 8 vertices and unit weights.
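As a quick numerical confirmation of the complete-graph spectrum (a sketch assuming NumPy):

```python
import numpy as np

# Complete graph with K = 8 vertices and unit weights.
K = 8
A = np.ones((K, K)) - np.eye(K)
L = np.diag(A.sum(axis=1)) - A
lam = np.sort(np.linalg.eigvalsh(L))
assert np.isclose(lam[0], 0)      # eigenvalue 0, multiplicity 1
assert np.allclose(lam[1:], K)    # eigenvalue K, multiplicity K - 1
```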
Complete bipartite graph. A bipartite graph is a graph whose vertices can be decomposed into two disjoint sets such that no two vertices share an edge within a set. A complete bipartite graph is a bipartite graph such that every pair of vertices in the two sets share an edge. We denote a complete bipartite graph with \(\alpha\) vertices on the first set and \(\beta\) vertices on the second set as an \((\alpha , \beta )\)-complete bipartite graph. We have that \(\alpha +\beta =K\), and use the convention that \(\alpha \le \beta\). Figure 13 illustrates an example of a complete bipartite graph with \((\alpha , \beta )=(3,6)\) and unit weights.
Eigenvectors of an \((\alpha , \beta )\)-complete bipartite graph with unit edge weights are given by (Merris 1998):
The eigenvalues are zero (multiplicity 1), \(\alpha\) (multiplicity \(\beta -1\)), \(\beta\) (multiplicity \(\alpha -1\)), and \(K=\alpha +\beta\) (multiplicity 1).
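These eigenvalues and multiplicities can be checked numerically for the \((3,6)\) example (a sketch assuming NumPy):

```python
import numpy as np

# (3, 6)-complete bipartite graph with unit weights.
alpha, beta = 3, 6
K = alpha + beta
A = np.zeros((K, K))
A[:alpha, alpha:] = 1
A[alpha:, :alpha] = 1
L = np.diag(A.sum(axis=1)) - A
lam = np.sort(np.linalg.eigvalsh(L))
# Spectrum: 0 once, alpha with multiplicity beta - 1,
# beta with multiplicity alpha - 1, and K once.
expected = np.sort(np.concatenate(
    ([0.0, float(K)], np.full(beta - 1, alpha), np.full(alpha - 1, beta))))
assert np.allclose(lam, expected)
```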
Scaling and products of graphs. We can find the eigenvectors and eigenvalues of the graph Laplacians of more complex graphs using some simple relationships. First, if we scale the edge weights of a graph by \(\alpha \ge 0\), the eigenvectors remain the same, and the eigenvalues are scaled by \(\alpha\). Second, the eigenvectors of the Laplacian of a Cartesian product of graphs are given by the Kronecker products of the eigenvectors of the individual graph Laplacians; the eigenvalues are the sums of one eigenvalue from each graph. This can be seen by noting that the Laplacian matrix of the Cartesian product of two graphs with graph Laplacians \(L_1 \in {\mathbf{R}}^{P \times P}\) and \(L_2 \in {\mathbf{R}}^{Q \times Q}\) is given by \(L = L_1 \otimes I + I \otimes L_2\),
where L is the Laplacian matrix of the Cartesian product of the two graphs. With Cartesian products of graphs, we find it convenient to index the eigenvalues and eigenvectors of the Laplacian by two indices, i.e., the eigenvalues may be denoted as \(\lambda _{i,j}\) with corresponding eigenvector \(q_{i,j}\) for \(i = 0, \ldots , P-1\) and \(j = 0, \ldots , Q-1\). (The eigenvalues will need to be sorted, as explained below.)
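The Kronecker-sum structure of the product Laplacian can be verified directly. The sketch below (our construction, assuming NumPy) builds the Laplacian of the Cartesian product of a path graph and a cycle graph and checks that its spectrum consists of all pairwise sums of the factors' eigenvalues.

```python
import numpy as np

def path_laplacian(n):
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(A.sum(axis=1)) - A

def cycle_laplacian(n):
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
    return np.diag(A.sum(axis=1)) - A

P, Q = 4, 5
L1, L2 = path_laplacian(P), cycle_laplacian(Q)
# Laplacian of the Cartesian product, as a Kronecker sum.
L = np.kron(L1, np.eye(Q)) + np.kron(np.eye(P), L2)
lam = np.sort(np.linalg.eigvalsh(L))
# Pairwise sums of the factor spectra, sorted.
sums = np.sort((np.linalg.eigvalsh(L1)[:, None]
                + np.linalg.eigvalsh(L2)[None, :]).ravel())
assert np.allclose(lam, sums)
```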
As an example, consider a graph that is the product of a chain graph with P vertices, edge weights \(\alpha ^\text {ch}\), and eigenvalues \(\lambda ^{\text {ch}} \in {\mathbf{R}}^P\), and a cycle graph with Q vertices, edge weights \(\alpha ^\text {cy}\), and eigenvalues \(\lambda ^{\text {cy}} \in {\mathbf{R}}^Q\). The eigenvalues have the form \(\lambda _{i,j} = \lambda ^{\text {ch}}_i + \lambda ^{\text {cy}}_j\), \(i = 0, \ldots , P-1\), \(j = 0, \ldots , Q-1\).
To find the m smallest of these, we sort them. The order depends on the ratio of the edge weights, \(\alpha ^\text {ch}/\alpha ^\text {cy}\).
As a very specific example, take \(P=4\), \(Q=5\), \(\alpha ^\text {ch} = 1\), and \(\alpha ^\text {cy} = 2\). The eigenvalues of the chain graph are \((0,\; 2-\sqrt{2},\; 2,\; 2+\sqrt{2}) \approx (0, 0.59, 2, 3.41)\), and the eigenvalues of the cycle graph are \((0,\; 4-4\cos (2\pi /5),\; 4-4\cos (2\pi /5),\; 4-4\cos (4\pi /5),\; 4-4\cos (4\pi /5)) \approx (0, 2.76, 2.76, 7.24, 7.24)\). The bottom six eigenvalues of the Cartesian product of these two graphs are then approximately \((0,\; 0.59,\; 2,\; 2.76,\; 2.76,\; 3.35)\).
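The bottom eigenvalues for such a weighted product can be reproduced in a few lines (a sketch assuming NumPy, with our own variable names):

```python
import numpy as np

# Chain graph with P = 4 nodes and weight 1; cycle graph with
# Q = 5 nodes and weight 2. Scaling edge weights by alpha scales
# the eigenvalues by alpha.
P, Q = 4, 5
a_ch, a_cy = 1.0, 2.0
lam_ch = 2 * a_ch * (1 - np.cos(np.pi * np.arange(P) / P))
lam_cy = 2 * a_cy * (1 - np.cos(2 * np.pi * np.arange(Q) / Q))
# Eigenvalues of the Cartesian product: all pairwise sums, sorted.
sums = np.sort((lam_ch[:, None] + lam_cy[None, :]).ravel())
bottom6 = sums[:6]
```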
Cite this article
Tuck, J., Boyd, S. Eigen-stratified models. Optim Eng 23, 397–419 (2022). https://doi.org/10.1007/s11081-020-09592-x