Naive Bayes and Decision Tree Classifier for Streaming Data Using HBase

Advanced Computing and Systems for Security

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 897)

Abstract

Classifying streaming data in a real-time environment is one of the most challenging research areas today. Data streaming arises in real-time environments where a massive volume of data is generated in small chunks that must be processed very quickly. HBase is a good option for storing such heterogeneous, massive collections of small data files while preserving scalability and availability. In a real-time environment, data volume grows exponentially, so storing the automatically growing data requires dynamic splitting, which HBase supports. We chose a data set of tobacco-affected student records and observed that the Naive Bayes classifier is less complex and more accurate than the decision tree. In a real-time environment, it also shows its efficacy relative to other classifiers when the training sample is very large, a case that HBase handles. The key-value store in HBase gives the classifiers an extra edge by improving their running time.
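To make the idea of training a Naive Bayes classifier on data that arrives in small chunks more concrete, here is a minimal sketch. It is not the chapter's implementation: the class name `StreamingNaiveBayes`, the feature names, and the labels are all hypothetical, and a plain in-memory dictionary stands in for a key-value table such as one HBase might hold. The point it illustrates is that the model's whole state is a set of counters addressed by key, so each new chunk only increments counters and never requires rescanning earlier data.

```python
from collections import defaultdict
import math


class StreamingNaiveBayes:
    """Categorical Naive Bayes whose state is a flat key -> count map.

    Assumption: self.counts is only a stand-in for a key-value store;
    no real HBase client is used here.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                 # Laplace smoothing constant
        self.counts = defaultdict(int)     # key -> running count
        self.classes = set()               # labels seen so far
        self.values = defaultdict(set)     # feature name -> values seen so far

    def update(self, chunk):
        """Fold one chunk of (features, label) pairs into the counters."""
        for features, label in chunk:
            self.classes.add(label)
            self.counts[("class", label)] += 1
            for name, value in features.items():
                self.values[name].add(value)
                self.counts[("feat", label, name, value)] += 1

    def predict(self, features):
        """Return the label with the highest smoothed log-posterior."""
        total = sum(self.counts[("class", c)] for c in self.classes)
        best_label, best_score = None, -math.inf
        for c in sorted(self.classes):
            n_c = self.counts[("class", c)]
            score = math.log(n_c / total)
            for name, value in features.items():
                k = len(self.values.get(name, ())) or 1
                n = self.counts.get(("feat", c, name, value), 0)
                score += math.log((n + self.alpha) / (n_c + self.alpha * k))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical records arriving in two small chunks.
model = StreamingNaiveBayes()
model.update([
    ({"age": "teen", "peer_use": "yes"}, "user"),
    ({"age": "teen", "peer_use": "no"}, "non_user"),
])
model.update([
    ({"age": "adult", "peer_use": "yes"}, "user"),
    ({"age": "adult", "peer_use": "no"}, "non_user"),
])
print(model.predict({"age": "teen", "peer_use": "yes"}))  # user
```

Because every counter is addressed by a composite key, the same update logic could in principle be mapped onto row keys in a distributed key-value store; how the chapter actually lays out its HBase tables is not stated in the abstract.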




Correspondence to Aradhita Mukherjee.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Mukherjee, A., Mondal, S., Chaki, N., Khatua, S. (2019). Naive Bayes and Decision Tree Classifier for Streaming Data Using HBase. In: Chaki, R., Cortesi, A., Saeed, K., Chaki, N. (eds) Advanced Computing and Systems for Security. Advances in Intelligent Systems and Computing, vol 897. Springer, Singapore. https://doi.org/10.1007/978-981-13-3250-0_8
