In this paper, we develop a subsampling Newton’s method to efficiently approximate the maximum likelihood estimate in logistic regression, which is especially useful for large-sample problems. One distinct feature of our algorithm is that matrix inversion is not explicitly performed. We propose two algorithms that iteratively construct a sequence of matrices converging to the Hessian of the log-likelihood on the subsample. Numerical examples show that the proposed method is efficient and robust.
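The abstract does not spell out the update rule, but a standard way to carry out a Newton step without explicit matrix inversion is to maintain an approximate inverse Hessian through Sherman-Morrison rank-one corrections, one subsample point at a time. The sketch below illustrates that idea for the logistic log-likelihood on a uniform subsample; the function names, the uniform sampling scheme, the ridge term `lam`, and the fixed iteration count are illustrative assumptions, not the authors’ algorithm.

```python
import numpy as np

def sigmoid(z):
    # numerically stable logistic function: 1 / (1 + exp(-z))
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def inverse_hessian_sm(Xs, p, lam=1e-6):
    """Build the inverse of H = Xs.T @ W @ Xs / r + lam*I without explicit
    inversion: add each rank-one term w_i * x_i x_i^T one at a time with
    the Sherman-Morrison formula. The ridge lam*I is an assumption made
    here so that the starting matrix is trivially invertible."""
    r, d = Xs.shape
    A_inv = np.eye(d) / lam            # inverse of the starting matrix lam*I
    w = p * (1.0 - p) / r              # logistic weights w_i = p_i (1 - p_i) / r
    for i in range(r):
        u = A_inv @ Xs[i]
        A_inv -= w[i] * np.outer(u, u) / (1.0 + w[i] * (Xs[i] @ u))
    return A_inv

def subsampling_newton(X, y, r, steps=25, seed=0):
    """Newton iterations for the logistic MLE restricted to a uniform
    subsample of size r; no call to np.linalg.inv or solve is made."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    idx = rng.choice(n, size=r, replace=False)    # uniform subsample (assumption)
    Xs, ys = X[idx], y[idx]
    beta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(Xs @ beta)
        grad = Xs.T @ (p - ys) / r                # gradient of the negative log-likelihood
        beta -= inverse_hessian_sm(Xs, p) @ grad  # Newton step via SM-built inverse
    return beta
```

Each Sherman-Morrison sweep costs O(rd²), so the per-iteration price of the explicit O(d³) inversion or factorization is avoided; this is a minimal sketch of the inversion-free idea, not a reproduction of the paper’s two algorithms.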
Using the code provided by the authors at https://github.com/Ossifragus/OSMAC.
Acknowledgements
We are grateful to the editors and the reviewer for their evaluation and for the detailed comments and suggestions on earlier versions of the manuscript, which have much improved the paper. Nhu N. Nguyen was supported in part by the National Science Foundation under Grant DMS-1710827.
Cite this article
Kirkby, J.L., Nguyen, D.H., Nguyen, D. et al. Inversion-free subsampling Newton’s method for large sample logistic regression. Stat Papers (2021). https://doi.org/10.1007/s00362-021-01263-y
Keywords
- Logistic regression
- Massive data
- Optimal subsampling
- Newton’s method
- Gradient descent
- Stochastic gradient descent
Mathematics Subject Classification