Abstract
Machine learning techniques have proven very effective in classification scenarios. However, the learning approaches of these algorithms fail to scale up to massive datasets. To handle and classify voluminous big data efficiently, this paper introduces a hybrid algorithm that combines a machine learning technique, decision tree learning, with the big data analytics platform Hive. The objective of the proposed solution is to improve the efficiency of the traditional decision tree classifier by using a MapReduce-based framework. Experiments on a big dataset using the proposed hybrid approach have shown promising results in terms of improved accuracy.
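The abstract does not specify how the decision tree is distributed over MapReduce or which split criterion is used. As a rough illustration only, the Python sketch below assumes an ID3-style information-gain criterion. It shows how the class-count statistics needed to choose a split can be emitted in a map phase and summed in a reduce phase, which is the same kind of aggregation a Hive `GROUP BY ... COUNT(*)` query would produce. The function names `map_record`, `reduce_counts` and `best_split` and the toy records are hypothetical. This is not the authors' implementation.

```python
# Hypothetical sketch: ID3-style split selection with MapReduce-style counting.
# It only illustrates the idea and is not the algorithm from the paper.
from collections import defaultdict
from itertools import chain
from math import log2


def map_record(features, label):
    """Map phase: emit one ((attribute, value, label), 1) pair per attribute."""
    for attr, value in features.items():
        yield (attr, value, label), 1


def reduce_counts(pairs):
    """Shuffle + reduce phase: sum the emitted counts for each key."""
    counts = defaultdict(int)
    for key, n in pairs:
        counts[key] += n
    return counts


def entropy(label_counts):
    """Shannon entropy (base 2) of a dict mapping label -> count."""
    total = sum(label_counts.values())
    return sum(-(c / total) * log2(c / total) for c in label_counts.values() if c)


def best_split(records):
    """Return the attribute with the highest information gain, plus all gains."""
    pairs = chain.from_iterable(map_record(f, y) for f, y in records)
    counts = reduce_counts(pairs)

    # Regroup the reduced counts as attribute -> value -> label -> count.
    table = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for (attr, value, label), n in counts.items():
        table[attr][value][label] += n

    # Class distribution of the whole node (a second, simple counting pass).
    class_counts = defaultdict(int)
    for _, y in records:
        class_counts[y] += 1
    base = entropy(class_counts)
    total = len(records)

    # Information gain = parent entropy - weighted entropy of the child partitions.
    gains = {}
    for attr, by_value in table.items():
        remainder = sum(
            sum(lc.values()) / total * entropy(lc) for lc in by_value.values()
        )
        gains[attr] = base - remainder
    return max(gains, key=gains.get), gains


# Toy, made-up records: (features, class label).
records = [
    ({"outlook": "sunny", "windy": "no"}, "no"),
    ({"outlook": "sunny", "windy": "yes"}, "no"),
    ({"outlook": "rain", "windy": "no"}, "yes"),
    ({"outlook": "rain", "windy": "yes"}, "yes"),
]
best, gains = best_split(records)
print(best, gains)
```

In this design only the per-(attribute, value, label) counts cross the shuffle, so the reduce output grows with the number of distinct attribute values rather than with the number of records, and the entropy arithmetic runs on that small table.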
Copyright information
© 2017 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ahlawat, K., Singh, A.P. (2017). A Novel Hybrid Technique for Big Data Classification Using Decision Tree Learning. In: Mandal, J., Dutta, P., Mukhopadhyay, S. (eds) Computational Intelligence, Communications, and Business Analytics. CICBA 2017. Communications in Computer and Information Science, vol 775. Springer, Singapore. https://doi.org/10.1007/978-981-10-6427-2_10
DOI: https://doi.org/10.1007/978-981-10-6427-2_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-6426-5
Online ISBN: 978-981-10-6427-2
eBook Packages: Computer Science, Computer Science (R0)