ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning

Chapter in: Splitting Methods in Communication, Imaging, Science, and Engineering

Part of the book series: Scientific Computation (SCIENTCOMP)

Abstract

Optimization approaches based on operator splitting are becoming popular for solving sparsity-regularized statistical machine learning models. While many fast algorithms have been proposed to solve these problems for a single regularization parameter, conspicuously less attention has been paid to computing regularization paths, that is, to solving the optimization problem over the full range of regularization parameters to obtain a sequence of sparse models. In this chapter, we aim to quickly approximate the sequence of sparse models associated with the regularization path for the purposes of statistical model selection, using the building blocks of a classical operator splitting method, the Alternating Direction Method of Multipliers (ADMM). We begin by proposing an ADMM algorithm that uses warm starts to quickly compute the regularization path. Then, by introducing approximations along this warm-started ADMM algorithm, we propose a novel concept that we term the ADMM Algorithmic Regularization Path. Our method can quickly outline the sequence of sparse models associated with the regularization path in computational time that is often less than that required by ADMM to solve the problem for a single regularization parameter. We demonstrate the applicability and substantial computational savings of our approach through three popular examples: sparse linear regression, reduced-rank multi-task learning, and convex clustering.
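
To make the warm-starting idea concrete, below is a minimal sketch of a path-following ADMM for the lasso, the first of the chapter's three examples. Everything here is an illustrative assumption rather than the authors' implementation: the function names, the fixed penalty parameter rho, and the stopping rule are all ours. The point is only the mechanics the abstract describes, namely that a cached matrix factorization and the iterates (z, u) are reused as warm starts while the regularization parameter lambda decreases along the path.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_admm_path(X, y, lambdas, rho=1.0, max_iter=500, tol=1e-6):
    """Warm-started ADMM sweep for the lasso over a decreasing lambda grid.

    For each lam, approximately solves
        min_beta 0.5 * ||y - X @ beta||^2 + lam * ||beta||_1,
    reusing (z, u) from the previous lambda as the starting point.
    """
    n, p = X.shape
    # One Cholesky factorization of X'X + rho*I is shared by every
    # beta-update at every lambda: the main per-path computational saving.
    Xty = X.T @ y
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))

    z = np.zeros(p)   # sparse copy of beta
    u = np.zeros(p)   # scaled dual variable
    path = []
    for lam in sorted(lambdas, reverse=True):   # sweep large -> small lambda
        for _ in range(max_iter):
            # beta-update: ridge-like linear solve via the cached factor
            beta = np.linalg.solve(L.T, np.linalg.solve(L, Xty + rho * (z - u)))
            # z-update: soft-thresholding enforces sparsity
            z_old = z
            z = soft_threshold(beta + u, lam / rho)
            # scaled dual ascent step
            u = u + beta - z
            if np.linalg.norm(z - z_old) <= tol * max(np.linalg.norm(z), 1.0):
                break   # warm-started runs typically exit here quickly
        path.append(z.copy())
    return np.array(path)

# Example usage on random data, with a logarithmic lambda grid anchored at
# lam_max = ||X'y||_inf (above which the lasso solution is exactly zero).
rng = np.random.default_rng(0)
X, y = rng.standard_normal((100, 20)), rng.standard_normal(100)
lam_max = np.max(np.abs(X.T @ y))
betas = lasso_admm_path(X, y, np.logspace(np.log10(lam_max), -3, 50))
```

As we read the abstract, the Algorithmic Regularization Path pushes this one step further: instead of iterating the inner loop to convergence at each lambda, one takes only a small fixed number of ADMM updates (even a single one) per parameter value, so that sketching the entire sequence of sparse models costs roughly as much as one fully converged ADMM solve.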


Author information

Correspondence to Genevera I. Allen.

Copyright information

© 2016 Springer International Publishing Switzerland

Cite this chapter

Hu, Y., Chi, E.C., Allen, G.I. (2016). ADMM Algorithmic Regularization Paths for Sparse Statistical Machine Learning. In: Glowinski, R., Osher, S., Yin, W. (eds) Splitting Methods in Communication, Imaging, Science, and Engineering. Scientific Computation. Springer, Cham. https://doi.org/10.1007/978-3-319-41589-5_13
