
Faster learning by reduction of data access time


Abstract

Nowadays, the major challenge in machine learning is the ‘Big Data’ challenge. In big data problems, the training of models has become very slow due to a large number of data points, a large number of features in each data point, or both. The training time has two major components: the time to access the data and the time to process (learn from) the data. So far, research has focused only on the second component, i.e., learning from the data. In this paper, we propose one possible solution to handle big data problems in machine learning: reducing the training time by reducing the data access time, using systematic sampling and cyclic/sequential sampling to select mini-batches from the dataset. To prove the effectiveness of the proposed sampling techniques, we use empirical risk minimization, a commonly used machine learning problem, for the strongly convex and smooth case. The problem has been solved using SAG, SAGA, SVRG, SAAG-II and MBSGD (mini-batched SGD), each with two step determination techniques, namely a constant step size and the backtracking line search method. Theoretical results prove, in expectation, convergence for systematic and cyclic sampling similar to that of the widely used random sampling technique. Experimental results with benchmarked datasets prove the efficacy of the proposed sampling techniques and show up to six times faster training.
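To make the sampling idea concrete, the sketch below pairs the two proposed mini-batch selection schemes with a plain mini-batch SGD (MBSGD) solver using a constant step size on an L2-regularized logistic-regression ERM objective (a strongly convex and smooth case). This is only an illustrative reading, not the authors' released code: cyclic/sequential sampling is taken as consecutive, in-order mini-batches, and systematic sampling as a random start followed by every k-th index (in the spirit of Madow's scheme); the synthetic data, batch size, regularization constant and step size are assumptions made solely for the example.

```python
# Illustrative sketch only: the sampling definitions and all hyperparameters
# below are assumptions for this example, not the authors' implementation.
import numpy as np


def cyclic_batches(n, batch_size):
    """Cyclic/sequential sampling: consecutive mini-batches taken in order."""
    for start in range(0, n, batch_size):
        yield np.arange(start, min(start + batch_size, n))


def systematic_batches(n, batch_size, rng):
    """Systematic sampling (Madow-style): a random start per mini-batch,
    then every k-th index, where k = n // batch_size."""
    k = max(n // batch_size, 1)
    for _ in range(n // batch_size):
        start = rng.integers(0, k)
        yield start + k * np.arange(batch_size)


def mbsgd_logistic(X, y, make_batches, lam=1e-2, eta=0.1, epochs=5):
    """Mini-batch SGD with a constant step size on
    f(w) = (1/n) sum_i log(1 + exp(-y_i x_i^T w)) + (lam/2) ||w||^2."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for idx in make_batches(n):
            Xb, yb = X[idx], y[idx]
            margin = yb * (Xb @ w)
            # gradient of the mini-batch logistic loss plus the L2 regularizer
            grad = -(Xb.T @ (yb / (1.0 + np.exp(margin)))) / len(idx) + lam * w
            w -= eta * grad  # constant step size
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((1000, 20))
    y = np.where(X @ rng.standard_normal(20) > 0, 1.0, -1.0)
    w_cyclic = mbsgd_logistic(X, y, lambda n: cyclic_batches(n, 50))
    w_system = mbsgd_logistic(X, y, lambda n: systematic_batches(n, 50, rng))
```

The intended benefit of both schemes, as described in the abstract, is that the selected indices follow a predictable, largely sequential pattern, so mini-batches can be read with far less random data access than uniformly random sampling requires.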


Notes

  1. The datasets used in the experiments are available at: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

  2. The experimental results can be reproduced using the code available at the following link: https://sites.google.com/site/jmdvinodjmd/code


Acknowledgments

The first author is thankful to the Ministry of Human Resource Development, Government of India, for providing a fellowship (University Grants Commission - Senior Research Fellowship) to pursue his PhD.

We acknowledge Manish Goyal, Senior Research Fellow, Department of Statistics, Panjab University, for his help with the statistical insights around sampling and expectation.

Author information


Corresponding author

Correspondence to Vinod Kumar Chauhan.


About this article


Cite this article

Chauhan, V.K., Sharma, A. & Dahiya, K. Faster learning by reduction of data access time. Appl Intell 48, 4715–4729 (2018). https://doi.org/10.1007/s10489-018-1235-x
