Abstract
Nowadays, the major challenge in machine learning is the 'big data' challenge: because datasets contain a large number of data points, a large number of features per data point, or both, the training of models has become very slow. The training time has two major components: the time to access the data and the time to process (learn from) the data. So far, research has focused only on the second part, i.e., learning from the data. In this paper, we propose one possible solution to big data problems in machine learning: reducing the training time by reducing the data access time, using systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To prove the effectiveness of the proposed sampling techniques, we use empirical risk minimization, a commonly used machine learning problem, for the strongly convex and smooth case. The problem is solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (mini-batched SGD), each with two step-size determination techniques, namely a constant step size and the backtracking line search method. Theoretical results prove that, in expectation, systematic and cyclic sampling achieve the same convergence as the widely used random sampling technique. Experimental results on benchmark datasets prove the efficacy of the proposed sampling techniques and show up to six times faster training.
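To make the sampling schemes concrete, the sketch below contrasts systematic sampling (a single random offset per pass, then every k-th index, following the classical definition of Madow) with cyclic/sequential sampling (contiguous blocks taken in their natural order), both plugged into a mini-batched SGD solver with a constant step size. This is a minimal illustration only: the function names, the least-squares objective, and the exact mini-batch construction are assumptions for exposition and may differ from the paper's implementation.

```python
import numpy as np

def systematic_batches(n, m, rng):
    """Systematic sampling: one random offset per pass, then every k-th index.
    Illustrative scheme; the paper's exact batch construction may differ."""
    k = n // m                               # sampling interval (assumes n divisible by m)
    r = rng.integers(0, k)                   # single random start for the pass
    for b in range(k):
        yield np.arange((r + b) % k, n, k)   # indices offset, offset+k, offset+2k, ...

def cyclic_batches(n, m):
    """Cyclic/sequential sampling: contiguous blocks in their natural order."""
    for start in range(0, n, m):
        yield np.arange(start, min(start + m, n))

def mbsgd(X, y, batch_iter, step=0.1, epochs=10, reg=1e-4):
    """Mini-batched SGD with a constant step size on an L2-regularized least-squares loss."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for idx in batch_iter():
            Xb, yb = X[idx], y[idx]
            grad = Xb.T @ (Xb @ w - yb) / len(idx) + reg * w   # mini-batch gradient
            w -= step * grad
    return w

# Toy usage: the two proposed samplers plugged into the same solver.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20)); y = X @ rng.normal(size=20)
w_sys = mbsgd(X, y, lambda: systematic_batches(1000, 100, rng))
w_cyc = mbsgd(X, y, lambda: cyclic_batches(1000, 100))
```

Both samplers access indices in a more regular pattern than fully random selection, which is the mechanism the paper targets for reducing data access time.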
Notes
The datasets used in the experiments are available at: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
The experimental results can be reproduced using the code available at the following link: https://sites.google.com/site/jmdvinodjmd/code
References
Byrd RH, Hansen SL, Nocedal J, Singer Y (2016) A stochastic quasi-Newton method for large-scale optimization. SIAM J Optim 26(2):1008–1031
Chang KW, Hsieh CJ, Lin CJ (2008) Coordinate descent method for large-scale L2-loss linear support vector machines. J Mach Learn Res 9:1369–1398
Chauhan VK, Dahiya K, Sharma A (2017) Mini-batch block-coordinate based stochastic average adjusted gradient methods to solve big data problems. In: Proceedings of the 9th Asian conference on machine learning, PMLR, vol 77, pp 49–64. http://proceedings.mlr.press/v77/chauhan17a.html
Chauhan VK, Dahiya K, Sharma A (2018) Problem formulations and solvers in linear SVM: a review. Artif Intell Rev. https://doi.org/10.1007/s10462-018-9614-6
Cotter A, Shamir O, Srebro N, Sridharan K (2011) Better mini-batch algorithms via accelerated gradient methods. In: Advances in neural information processing systems (NIPS), pp 1–9. arXiv:1106.4574
Csiba D, Richtárik P (2016) Importance sampling for minibatches, pp 1–19. arXiv:1602.02283v1
Csiba D, Richtárik P (2016) Coordinate descent face-off: primal or dual? pp 1–17. arXiv:1605.08982
Defazio A, Bach F, Lacoste-Julien S (2014) Saga: a fast incremental gradient method with support for non-strongly convex composite objectives. In: Proceedings of the 27th international conference on neural information processing systems. NIPS’14. MIT Press, Cambridge, pp 1646–1654
Gopal S (2016) Adaptive sampling for SGD by exploiting side information. In: Proceedings of the 33rd international conference on machine learning (ICML)
Johnson R, Zhang T (2013) Accelerating stochastic gradient descent using predictive variance reduction. In: Burges C J C, Bottou L, Welling M, Ghahramani Z, Weinberger K Q (eds) Advances in neural information processing systems, vol 26. Curran Associates, Inc., New York, pp 315–323
Konečný J, Richtárik P (2013) Semi-stochastic gradient descent methods, pp 1–19. arXiv:1312.1666
Li M, Zhang T, Chen Y, Smola AJ (2014) Efficient mini-batch training for stochastic optimization. In: Proceedings of the 20th ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’14. ACM, New York, pp 661–670
Madow WG (1949) On the theory of systematic sampling, ii. Ann Math Stat 20:333–354. http://www.jstor.org/stable/2236532
Madow WG, Madow LH (1944) On the theory of systematic sampling, i. Ann Math Stat 15:1–24. http://www.jstor.org/stable/2236209
Needell D, Ward R, Srebro N (2014) Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. In: Ghahramani Z, Welling M, Cortes C, Lawrence ND, Weinberger KQ (eds) Advances in neural information processing systems, vol 27. Curran Associates, Inc, New York, pp 1017–1025
Nemirovski A, Juditsky A, Lan G, Shapiro A (2009) Robust stochastic approximation approach to stochastic programming. SIAM J Optim 19(4):1574–1609
Qu Z, Richtárik P (2015) Randomized dual coordinate ascent with arbitrary sampling. In: Advances in neural information processing systems, pp 1–34. arXiv:1411.5873v1
Reddi SJ (2017) New optimization methods for modern machine learning. Dissertations. 1116. http://repository.cmu.edu/dissertations/1116
Schmidt M, Le Roux N, Bach F (2016) Minimizing finite sums with the stochastic average gradient. Math Program, pp 1–30
Shalev-Shwartz S, Singer Y, Srebro N, Cotter A (2011) Pegasos: primal estimated sub-gradient solver for SVM. Math Program 127(1):3–30
Takáč M, Bijral A, Richtárik P, Srebro N (2013) Mini-batch primal and dual methods for SVMs. In: Proceedings of the 30th international conference on machine learning, pp 537–552
Wang X, Wang S, Zhang H (2017) Inexact proximal stochastic gradient method for convex composite optimization. Computational Optimization and Applications. https://doi.org/10.1007/s10589-017-9932-7
Wright SJ (2015) Coordinate descent algorithms. Math Program 151(1):3–34
Xu Y, Yin W (2015) Block stochastic gradient iteration for convex and nonconvex optimization. SIAM J Optim 25(3):1686–1716
Yu HF, Hsieh CJ, Chang KW, Lin CJ (2010) Large linear classification when data cannot fit in memory. In: Proceedings of the 16th ACM SIGKDD international conference on knowledge discovery and data mining. KDD ’10. ACM, New York, pp 833–842
Zhang A, Gu Q (2016) Accelerated stochastic block coordinate descent with optimal sampling. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’16. ACM, New York, pp 2035–2044
Zhang T (2004) Solving large scale linear prediction problems using stochastic gradient descent algorithms. In: Proceedings of the 21st international conference on machine learning, ICML ’04. ACM, New York, p 116
Zhao P, Zhang T (2014) Accelerating minibatch stochastic gradient descent using stratified sampling, pp 1–13. arXiv:1405.3080
Zhao P, Zhang T (2015) Stochastic optimization with importance sampling for regularized loss minimization. ICML 37:1–9
Zhou L, Pan S, Wang J, Vasilakos AV (2017) Machine learning on big data: opportunities and challenges. Neurocomputing 237:350–361. https://doi.org/10.1016/j.neucom.2017.01.026, https://www.sciencedirect.com/science/article/pii/S0925231217300577
Acknowledgments
The first author is thankful to the Ministry of Human Resource Development, Government of India, for providing a fellowship (University Grants Commission - Senior Research Fellowship) to pursue his PhD.
We acknowledge Manish Goyal, Senior Research Fellow, Department of Statistics, Panjab University, for helping with the statistical insights around sampling and expectation.
Cite this article
Chauhan, V.K., Sharma, A. & Dahiya, K. Faster learning by reduction of data access time. Appl Intell 48, 4715–4729 (2018). https://doi.org/10.1007/s10489-018-1235-x