Machine Learning Algorithms for Big Data

Chapter in: Big Data Analytics: Systems, Algorithms, Applications

Abstract

The growth of data from varied sources has created an enormous amount of resources. However, using those resources for any useful task requires a deep understanding of the characteristics of the data. The goal of machine learning algorithms is to learn these characteristics and use them for future predictions. In the context of big data, however, applying machine learning algorithms relies on effective data-processing techniques, such as data parallelism, in which huge chunks of data are processed in parallel. Hence, to handle data at this scale, machine learning methodologies are increasingly becoming statistical and less rule-based.
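The data-parallel idea mentioned above can be illustrated with a minimal sketch. It assumes a simple linear model y ≈ w·x + b fitted by full-batch gradient descent, which is our choice for illustration and not necessarily the chapter's method. The data are split into chunks, each worker computes a partial gradient on its own chunk (a "map" step), and the partial results are summed and averaged before each parameter update (a "reduce" step). Threads from Python's `concurrent.futures` stand in for what would be separate machines in a real cluster. The names `chunk`, `partial_gradient`, and `data_parallel_gd` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(data, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal pieces."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    if not data:
        return []
    size = -(-len(data) // n_chunks)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_gradient(w, b, part):
    """Summed squared-error gradient of y ~ w*x + b over one chunk of (x, y) pairs."""
    gw = gb = 0.0
    for x, y in part:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(part)


def data_parallel_gd(data, n_workers=4, lr=0.05, epochs=500):
    """Full-batch gradient descent; each chunk's gradient is computed by a separate worker.

    Hypothetical illustration: threads stand in for cluster nodes.
    """
    parts = chunk(data, n_workers)
    if not parts:
        raise ValueError("data must be non-empty")
    w, b = 0.0, 0.0
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for _ in range(epochs):
            k = len(parts)
            # Map step: every worker sees the same (w, b) and only its own chunk.
            results = list(pool.map(partial_gradient, [w] * k, [b] * k, parts))
            # Reduce step: add the partial sums, then average over all examples.
            n = sum(r[2] for r in results)
            gw = sum(r[0] for r in results) / n
            gb = sum(r[1] for r in results) / n
            w -= lr * gw
            b -= lr * gb
    return w, b
```

For example, with `data = [(i / 10, 3 * (i / 10) + 1) for i in range(-20, 21)]`, calling `data_parallel_gd(data)` returns values very close to `(3.0, 1.0)`. Because the squared-error gradient is a sum over examples, adding the per-chunk sums reproduces the full-batch gradient up to floating-point rounding. The result therefore does not depend on how many chunks the data are split into, which is what makes this kind of workload easy to parallelize.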

Author information

Correspondence to C.S.R. Prabhu.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Prabhu, C., Chivukula, A., Mogadala, A., Ghosh, R., Livingston, L. (2019). Machine Learning Algorithms for Big Data. In: Big Data Analytics: Systems, Algorithms, Applications. Springer, Singapore. https://doi.org/10.1007/978-981-15-0094-7_6

  • DOI: https://doi.org/10.1007/978-981-15-0094-7_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0093-0

  • Online ISBN: 978-981-15-0094-7

  • eBook Packages: Computer Science, Computer Science (R0)