ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning

Chapter in: Splitting Methods in Communication, Imaging, Science, and Engineering

Part of the book series: Scientific Computation (SCIENTCOMP)

Abstract

Optimization approaches based on operator splitting are becoming popular for solving sparsity-regularized statistical machine learning models. While many fast algorithms have been proposed to solve these problems for a single regularization parameter, conspicuously less attention has been paid to computing regularization paths, that is, to solving the optimization problem over the full range of regularization parameters to obtain a sequence of sparse models. In this chapter, we aim to quickly approximate the sequence of sparse models associated with the regularization path for the purposes of statistical model selection, using the building blocks of a classical operator splitting method, the Alternating Direction Method of Multipliers (ADMM). We begin by proposing an ADMM algorithm that uses warm starts to quickly compute the regularization path. Then, by introducing approximations along this warm-started ADMM algorithm, we propose a novel concept that we term the ADMM Algorithmic Regularization Path. Our method can quickly outline the sequence of sparse models associated with the regularization path in computational time that is often less than that required by ADMM to solve the problem for a single regularization parameter. We demonstrate the applicability and substantial computational savings of our approach through three popular examples: sparse linear regression, reduced-rank multi-task learning, and convex clustering.
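
To make the warm-starting idea concrete, below is a minimal sketch of a path-following ADMM for the lasso, the first of the chapter's three examples. Everything here is an illustrative assumption rather than the authors' implementation: the function names, the fixed penalty parameter rho, and the stopping rule are all ours. The point is only the mechanics the abstract describes, namely that a cached matrix factorization and the iterates (z, u) are reused as warm starts while the regularization parameter lambda decreases along the path.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm_path(X, y, lambdas, rho=1.0, max_iter=500, tol=1e-6):
    """Warm-started ADMM sweep for the lasso over a decreasing lambda grid.

    For each lam, approximately solves
        min_beta 0.5 * ||y - X @ beta||^2 + lam * ||beta||_1,
    reusing (z, u) from the previous lambda as the starting point.
    """
    n, p = X.shape
    # One Cholesky factorization of X'X + rho*I is shared by every
    # beta-update at every lambda: the main per-path computational saving.
    Xty = X.T @ y
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))

    z = np.zeros(p)   # sparse copy of beta
    u = np.zeros(p)   # scaled dual variable
    path = []
    for lam in sorted(lambdas, reverse=True):   # sweep large -> small lambda
        for _ in range(max_iter):
            # beta-update: ridge-like linear solve via the cached factor
            beta = np.linalg.solve(L.T, np.linalg.solve(L, Xty + rho * (z - u)))
            # z-update: soft-thresholding enforces sparsity
            z_old = z
            z = soft_threshold(beta + u, lam / rho)
            # scaled dual ascent step
            u = u + beta - z
            if np.linalg.norm(z - z_old) <= tol * max(np.linalg.norm(z), 1.0):
                break   # warm-started runs typically exit here quickly
        path.append(z.copy())
    return np.array(path)

# Example usage on random data, with a logarithmic lambda grid anchored at
# lam_max = ||X'y||_inf (above which the lasso solution is exactly zero).
rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 20)), rng.standard_normal(100)
lam_max = np.max(np.abs(X.T @ y))
betas = lasso_admm_path(X, y, np.logspace(np.log10(lam_max), -3, 50))
```

As we read the abstract, the Algorithmic Regularization Path pushes this one step further: instead of iterating the inner loop to convergence at each lambda, one takes only a small fixed number of ADMM updates (even a single one) per parameter value, so that sketching the entire sequence of sparse models costs roughly as much as one fully converged ADMM solve.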


Author information

Correspondence to Genevera I. Allen.

Copyright information

© 2016 Springer International Publishing Switzerland

Cite this chapter

Hu, Y., Chi, E.C., Allen, G.I. (2016). ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning. In: Glowinski, R., Osher, S., Yin, W. (eds) Splitting Methods in Communication, Imaging, Science, and Engineering. Scientific Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-41589-5_13
