Unleashing Machine Learning onto Big Data: Issues, Challenges and Trends

  • Roheet Bhatnagar
Part of the Studies in Computational Intelligence book series (SCI, volume 801)


In the modern digital world we face a data deluge, yet we are still starved for information. The Big Data era is characterized by vast amounts of data, on the order of petabytes or even exabytes, arriving at high speed from a variety of sources. These unstructured data have tremendous potential, but Big Data by itself has no value unless it is processed to derive meaningful insights. This is where Machine Learning (ML) comes into the picture: it enables machines to learn and act on their own. Machine Learning can help us sift through enormous quantities of data, process them and obtain meaningful results. The confluence of Big Data and Machine Learning is allowing organizations to automate and improve complex descriptive, predictive and prescriptive analytical tasks and to arrive at informed decisions. In other words, harnessing the value and power of Big Data with the help of Machine Learning can offer companies great insights, increasing their revenues and providing a competitive advantage over their rivals. Machine Learning acts as a catalyst for deriving tangible value from Big Data and serves as a key to unlocking the potential of Big Data analytics. The management of Big Data raises concerns regarding the efficiency of data collection, data processing, analytics and security, thereby opening new avenues of research and innovation. This is a hot research area, and the amalgamation of Machine Learning with Big Data is proving to be a major performance booster, revealing information that was previously hidden. This chapter explores and discusses at length ML-based algorithms and developments in the area. It focuses on applications of Machine Learning to Big Data, along with the issues, challenges and most recent trends in the area.


Machine learning; Big data; Predictive and prescriptive analytical



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Computer Science & Engineering, Manipal University Jaipur, Jaipur, India
