Sublinear Algorithms for Penalized Logistic Regression in Massive Datasets

  • Haoruo Peng
  • Zhengyu Wang
  • Edward Y. Chang
  • Shuchang Zhou
  • Zhihua Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7523)


Penalized logistic regression (PLR) is a widely used supervised learning model. In this paper, we consider its application to large-scale data problems and solve PLR with a stochastic primal-dual approach. In particular, we employ random sampling in the primal step and a multiplicative weights method in the dual step. This yields an optimization method with sublinear dependence on both the volume and the dimensionality of the training data. We develop concrete algorithms for PLR with ℓ2-norm and ℓ1-norm penalties, respectively. Experimental results on several large-scale, high-dimensional datasets demonstrate both the efficiency and the accuracy of our algorithms.
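To make the primal-dual idea concrete, here is a minimal Python sketch of one way such a loop could look for the ℓ2-penalized case. It is an illustration under stated assumptions, not the algorithm from the paper: the function name `sketch_primal_dual_plr`, the step sizes, the uniform coordinate sampling, and the clipping of margin estimates are all choices made here for the example. Each iteration reads one row of the data (primal step) and one column (dual step), so it touches O(n + d) entries rather than all n·d.

```python
import numpy as np

def sketch_primal_dual_plr(X, y, lam=0.01, eta=0.1, eta_dual=0.05, T=1000, seed=0):
    """Illustrative stochastic primal-dual loop for l2-penalized logistic regression.

    X: (n, d) feature matrix, y: (n,) labels in {-1, +1}.
    All hyperparameters are hypothetical defaults chosen for this sketch.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)          # primal variable (weight vector)
    w_avg = np.zeros(d)      # running average of the primal iterates
    log_p = np.zeros(n)      # log-weights of the dual distribution over examples

    for t in range(1, T + 1):
        # Normalize the dual weights into a probability distribution.
        p = np.exp(log_p - log_p.max())
        p /= p.sum()

        # Primal step: sample ONE example from p and take a gradient step.
        i = rng.choice(n, p=p)
        margin = y[i] * (X[i] @ w)
        # 1 / (1 + exp(margin)), written with tanh for numerical stability.
        sigma = 0.5 * (1.0 - np.tanh(margin / 2.0))
        grad = -sigma * y[i] * X[i] + lam * w
        w -= (eta / np.sqrt(t)) * grad

        # Dual step: sample ONE coordinate uniformly and use it to estimate
        # every margin y_k * <x_k, w> (unbiased before clipping).
        j = rng.integers(d)
        est_margin = d * y * X[:, j] * w[j]
        # Multiplicative weights update: examples with small or negative
        # estimated margins gain weight and are sampled more often.
        log_p -= eta_dual * np.clip(est_margin, -1.0, 1.0)

        w_avg += (w - w_avg) / t

    return w_avg
```

A call such as `w = sketch_primal_dual_plr(X, y)` returns the averaged weight vector, and `np.sign(X @ w)` then gives predicted labels. Averaging the iterates is a common way to damp the noise of single-sample updates. An ℓ1-penalized variant would need a different primal step and is not covered by this sketch.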





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Haoruo Peng (1, 2)
  • Zhengyu Wang (1, 3)
  • Edward Y. Chang (1)
  • Shuchang Zhou (1)
  • Zhihua Zhang (1, 4)

  1. Google Research Beijing, Beijing, China
  2. Department of Computer Science and Technology, Tsinghua University, Beijing, China
  3. Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
  4. College of Computer Science and Technology, Zhejiang University, Zhejiang, China
