A Novel Hybrid Technique for Big Data Classification Using Decision Tree Learning

  • Conference paper
Computational Intelligence, Communications, and Business Analytics (CICBA 2017)

Abstract

Machine learning techniques have proven very effective in classification scenarios. However, the learning approaches of these algorithms do not scale to massive data sets. To handle and classify voluminous big data efficiently, this paper introduces a hybrid algorithm that combines decision tree learning, a machine learning technique, with Hive, a big data analytics platform. The objective of the proposed solution is to improve the efficiency of the traditional decision tree classifier by using a MapReduce-based framework. Experiments on a big dataset using the proposed hybrid approach show promising results in terms of improved accuracy.
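
The abstract does not describe the algorithm in detail, so the following is only an illustrative sketch of how decision tree split statistics could be aggregated in a map-reduce style. It is not the authors' method and does not use Hive. All names (map_partition, reduce_counts, information_gain) and the toy data are assumptions made for illustration. Each partition is counted independently (map), the counts are merged (reduce), and information gain for one candidate attribute is computed from the merged counts.

```python
from collections import Counter
from functools import reduce
from math import log2


def map_partition(rows, attr_index, label_index):
    """Map step: count (attribute value, class label) pairs in one partition."""
    return Counter((row[attr_index], row[label_index]) for row in rows)


def reduce_counts(left, right):
    """Reduce step: merge the pair counts of two partitions."""
    return left + right


def entropy(class_counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    total = sum(class_counts.values())
    return -sum((n / total) * log2(n / total) for n in class_counts.values() if n)


def information_gain(pair_counts):
    """Information gain of splitting on the attribute, from merged counts."""
    label_totals = Counter()
    by_value = {}
    for (value, label), n in pair_counts.items():
        label_totals[label] += n
        by_value.setdefault(value, Counter())[label] += n
    total = sum(label_totals.values())
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(sum(c.values()) / total * entropy(c) for c in by_value.values())
    return entropy(label_totals) - remainder


# Hypothetical toy data: two partitions of (attribute value, class label) rows.
partitions = [
    [("sunny", "no"), ("sunny", "no"), ("rainy", "yes")],
    [("rainy", "yes"), ("sunny", "yes"), ("rainy", "yes")],
]
merged = reduce(reduce_counts, (map_partition(p, 0, 1) for p in partitions), Counter())
gain = information_gain(merged)
```

Because the per-partition counts are small and merge by simple addition, only the counts (not the raw rows) need to be combined across partitions, which is what makes this kind of statistic a natural fit for a map-reduce framework.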

Author information

Corresponding author

Correspondence to Khyati Ahlawat.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ahlawat, K., Singh, A.P. (2017). A Novel Hybrid Technique for Big Data Classification Using Decision Tree Learning. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 775. Springer, Singapore. https://doi.org/10.1007/978-981-10-6427-2_10

  • DOI: https://doi.org/10.1007/978-981-10-6427-2_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-6426-5

  • Online ISBN: 978-981-10-6427-2

  • eBook Packages: Computer Science (R0)
