A Novel Hybrid Technique for Big Data Classification Using Decision Tree Learning

Ahlawat, Khyati; Singh, Amit Prakash

doi:10.1007/978-981-10-6427-2_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 775)

Included in the following conference series:

International Conference on Computational Intelligence, Communications, and Business Analytics

Abstract

Machine learning techniques have proven very effective in classification tasks. However, the learning approaches of these algorithms fail to scale to massive volumes of data. To handle and classify voluminous big data efficiently, this paper introduces a hybrid algorithm that logically combines a machine learning technique, decision tree learning, with the big data analytics platform Hive. The objective of the proposed solution is to improve the efficiency of the traditional decision tree classifier by using a MapReduce-based framework. Experiments on a big dataset using the proposed hybrid approach show promising results in terms of improved accuracy.
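The abstract does not spell out how the decision tree and the MapReduce/Hive layer interact. One common way to scale decision tree induction (not necessarily the authors' design) is to compute the per-attribute class counts needed for split selection as a map-reduce aggregation, and then evaluate information gain on the small aggregated table. The sketch below simulates both phases in plain Python on a hypothetical toy dataset; the record layout and the names `map_record`, `reduce_counts`, `information_gain`, and `best_split` are illustrative assumptions, not the paper's API.

```python
from collections import Counter
from itertools import chain
from math import log2

# Hypothetical toy records: (attribute -> value mapping, class label).
RECORDS = [
    ({"outlook": "sunny", "windy": "false"}, "no"),
    ({"outlook": "sunny", "windy": "true"}, "no"),
    ({"outlook": "rain", "windy": "false"}, "yes"),
    ({"outlook": "rain", "windy": "true"}, "yes"),
]


def map_record(record):
    """Map phase: emit ((attribute, value, label), 1) for each attribute of a record."""
    attrs, label = record
    for attr, value in attrs.items():
        yield (attr, value, label), 1


def reduce_counts(pairs):
    """Reduce phase: sum the emitted ones per (attribute, value, label) key."""
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    return counts


def entropy(label_counts):
    """Shannon entropy (in bits) of a label -> count mapping."""
    total = sum(label_counts.values())
    return -sum((c / total) * log2(c / total) for c in label_counts.values() if c)


def information_gain(counts, attr, class_counts):
    """Gain of splitting on `attr`, computed only from the aggregated counts."""
    by_value = {}
    for (a, value, label), c in counts.items():
        if a == attr:
            by_value.setdefault(value, Counter())[label] += c
    total = sum(class_counts.values())
    remainder = sum(
        sum(lc.values()) / total * entropy(lc) for lc in by_value.values()
    )
    return entropy(class_counts) - remainder


def best_split(records):
    """Run map + reduce over all records, then pick the attribute with maximal gain."""
    counts = reduce_counts(chain.from_iterable(map_record(r) for r in records))
    class_counts = Counter(label for _, label in records)
    attrs = {a for (a, _, _) in counts}
    return max(attrs, key=lambda a: information_gain(counts, a, class_counts))


if __name__ == "__main__":
    print(best_split(RECORDS))
```

On the toy data, `outlook` separates the classes perfectly (gain 1.0 bit) while `windy` carries no information (gain 0.0), so `outlook` is chosen as the root split. In a Hive deployment the reduce step would roughly correspond to a `GROUP BY` over (attribute, value, label); only the aggregated counts have to reach the tree-building code, so the split search grows with the number of distinct attribute values rather than with the number of rows.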

Author information

Authors and Affiliations

University School of Information Communication and Technology, Guru Gobind Singh Indraprastha University, New Delhi, India
Khyati Ahlawat & Amit Prakash Singh

Corresponding author

Correspondence to Khyati Ahlawat.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, University of Kalyani, Kalyani, West Bengal, India
J. K. Mandal
Department of Computer and System Sciences, Visva Bharati University, Bolpur Santiniketan, West Bengal, India
Paramartha Dutta
Department of Information Technology, Calcutta Business School, Kolkata, India
Somnath Mukhopadhyay

About this paper

Cite this paper

Ahlawat, K., Singh, A.P. (2017). A Novel Hybrid Technique for Big Data Classification Using Decision Tree Learning. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 775. Springer, Singapore. https://doi.org/10.1007/978-981-10-6427-2_10

DOI: https://doi.org/10.1007/978-981-10-6427-2_10
Published: 24 September 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6426-5
Online ISBN: 978-981-10-6427-2
eBook Packages: Computer Science, Computer Science (R0)
