Abstract
Optimization approaches based on operator splitting are becoming popular for solving sparsity-regularized statistical machine learning models. While many fast algorithms have been proposed to solve these problems for a single regularization parameter, conspicuously less attention has been given to computing regularization paths, that is, solving the optimization problem over the full range of regularization parameters to obtain a sequence of sparse models. In this chapter, we aim to quickly approximate the sequence of sparse models associated with regularization paths, for the purposes of statistical model selection, using the building blocks of a classical operator splitting method, the Alternating Direction Method of Multipliers (ADMM). We begin by proposing an ADMM algorithm that uses warm starts to quickly compute the regularization path. Then, by employing approximations along this warm-starting ADMM algorithm, we propose a novel concept that we term the ADMM Algorithmic Regularization Path. Our method can quickly outline the sequence of sparse models associated with the regularization path, often in less computational time than solving the problem with ADMM for a single regularization parameter. We demonstrate the applicability and substantial computational savings of our approach through three popular examples: sparse linear regression, reduced-rank multi-task learning, and convex clustering.
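To make the two ingredients described above concrete, the following is a minimal sketch in Python/NumPy for the first of the three examples, sparse linear regression (the lasso). It illustrates the general recipe rather than the chapter's exact algorithm: the names `lasso_admm_path`, `lasso_algorithmic_path`, and `soft_threshold`, the penalty parameter `rho`, and the grid handling are all illustrative choices. The first function runs standard ADMM to approximate convergence at each regularization parameter, carrying the previous solution forward as a warm start; the second takes only a single ADMM iteration per grid point, which is one reading of the algorithmic-regularization-path idea.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm_path(A, b, lams, rho=1.0, max_iter=500, tol=1e-6):
    """Warm-started ADMM path for 0.5*||Ax - b||^2 + lam*||z||_1 s.t. x = z.

    `lams` should be a decreasing grid; each solve is warm-started from the
    previous (z, u). Returns one sparse coefficient vector per lam.
    """
    p = A.shape[1]
    Atb = A.T @ b
    # The x-update is a ridge solve; factor A^T A + rho*I once and reuse it.
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(p))
    z, u = np.zeros(p), np.zeros(p)
    path = []
    for lam in lams:
        for _ in range(max_iter):
            x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
            z_old = z
            z = soft_threshold(x + u, lam / rho)  # prox step yields sparsity
            u = u + x - z                         # scaled dual update
            if np.linalg.norm(z - z_old) <= tol * max(np.linalg.norm(z), 1.0):
                break
        path.append(z.copy())
    return path

def lasso_algorithmic_path(A, b, lams, rho=1.0):
    """One ADMM iteration per grid point: a sketch of the algorithmic
    regularization path idea, outlining the sequence of sparse models at
    roughly the cost of a single ADMM solve."""
    p = A.shape[1]
    Atb = A.T @ b
    L = np.linalg.cholesky(A.T @ A + rho * np.eye(p))
    z, u = np.zeros(p), np.zeros(p)
    path = []
    for lam in lams:
        x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))
        z = soft_threshold(x + u, lam / rho)
        u = u + x - z
        path.append(z.copy())
    return path
```

A standard grid choice would be `lams = np.geomspace(lam_max, 1e-3 * lam_max, 100)` with `lam_max = np.abs(A.T @ b).max()`, the smallest value at which the lasso solution is identically zero. The factor-once Cholesky solve is what makes the one-iteration-per-grid-point variant cheap: each grid point costs only two triangular solves and a soft-threshold.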
Keywords
- Regularization Parameter
- Sparse Solution
- Group Lasso
- Sparse Model
- Sparsity Level
Copyright information
© 2016 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Hu, Y., Chi, E.C., Allen, G.I. (2016). ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning. In: Glowinski, R., Osher, S., Yin, W. (eds) Splitting Methods in Communication, Imaging, Science, and Engineering. Scientific Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-41589-5_13
DOI: https://doi.org/10.1007/978-3-319-41589-5_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-41587-1
Online ISBN: 978-3-319-41589-5