Abstract
The growth of data from varied sources has created an enormous pool of resources. However, putting those resources to any useful task requires a deep understanding of the characteristics of the data. The goal of machine learning algorithms is to learn these characteristics and use them for future predictions. In the context of big data, however, applying machine learning algorithms relies on effective data-processing techniques, such as exploiting data parallelism to work on large chunks of data concurrently. As a result, machine learning methodologies are becoming increasingly statistical and less rule-based in order to handle data at this scale. A minimal sketch of the data-parallel idea follows.
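The sketch below is an illustrative example, not taken from the chapter: it splits a synthetic training set into chunks, computes per-chunk gradients independently (each call could run on a separate worker or node), and averages them into a single synchronous update for a logistic-regression model.

```python
# Hedged sketch of data-parallel gradient descent; all data and names are illustrative.
import numpy as np

def chunk_gradient(X, y, w):
    """Logistic-regression gradient computed on one data chunk."""
    preds = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (preds - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))            # synthetic "big" design matrix
true_w = rng.normal(size=20)
y = (X @ true_w + rng.normal(size=10_000) > 0).astype(float)

w = np.zeros(20)
for step in range(200):
    # Data-parallel step: each chunk's gradient is independent of the others,
    # so the chunks could be distributed across machines and aggregated afterwards.
    grads = [chunk_gradient(Xc, yc, w)
             for Xc, yc in zip(np.array_split(X, 8), np.array_split(y, 8))]
    w -= 0.5 * np.mean(grads, axis=0)        # synchronous averaging and update
```

With equal-sized chunks, averaging the per-chunk gradients reproduces the full-batch gradient exactly; the same pattern underlies MapReduce-style and parameter-server training discussed in the literature.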
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Prabhu, C., Chivukula, A., Mogadala, A., Ghosh, R., Livingston, L. (2019). Machine Learning Algorithms for Big Data. In: Big Data Analytics: Systems, Algorithms, Applications. Springer, Singapore. https://doi.org/10.1007/978-981-15-0094-7_6
DOI: https://doi.org/10.1007/978-981-15-0094-7_6
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0093-0
Online ISBN: 978-981-15-0094-7