Holoentropy based Correlative Naive Bayes classifier and MapReduce model for classifying the big data

  • Special Issue

Evolutionary Intelligence

Abstract

Big data is a recent, emerging technology that can offer large benefits to business administration. Because of its sheer volume, effective analysis with existing techniques is very difficult; the difficulties concern capturing, analyzing, sharing, storing, and visualizing the data. To tackle these challenges, a novel classification technique, the Holoentropy based Correlative Naive Bayes classifier and MapReduce Model (HCNB-MRM), is proposed. The proposed HCNB, designed by combining the Holoentropy function with the correlative Naive Bayes classifier, handles both high-dimensional and very large datasets to improve on the benchmark, and classifies the data under a dependence assumption. HCNB-MRM is therefore used to simplify the process and to select the best features from a big dataset. Combined with the MapReduce Model, HCNB maximizes big data classification performance using a probability index table and the posterior probability of the test data samples. The performance of HCNB-MRM is evaluated using three metrics: accuracy, sensitivity, and specificity. The experimental results show that HCNB-MRM obtains classification accuracies of 93.5965% on the localization dataset and 94.3369% on the skin dataset, higher than those of the existing techniques.
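
The abstract names the main ingredients (MapReduce counting into a probability index table, a holoentropy function, and a Naive Bayes posterior) but not their exact formulas. The following is a minimal, hypothetical sketch of how such pieces could fit together for categorical data; it is not the HCNB-MRM algorithm. Assumptions: the "probability index table" is modeled as merged class and attribute-value counts; each attribute gets an entropy-based weight 2(1 − 1/(1 + e^(−H))), modeled on the weighting used in holoentropy-based outlier detection; these weights scale the log-likelihoods in the posterior; the correlative (dependence) part of the classifier is omitted. All function names are invented for this example.

```python
# Hypothetical sketch: MapReduce-style counting plus an entropy-weighted
# Naive Bayes for categorical data. NOT the HCNB-MRM algorithm from the paper;
# the "correlative" component is omitted and all names are invented.
import math
from collections import Counter, defaultdict
from functools import reduce


def map_partition(rows):
    """Map step: count classes and (attribute, value, class) triples in one partition."""
    counts = Counter()
    for features, label in rows:
        counts[("class", label)] += 1
        for j, value in enumerate(features):
            counts[("attr", j, value, label)] += 1
    return counts


def reduce_counts(left, right):
    """Reduce step: merge partial count tables from two mappers."""
    return left + right


def entropy_weights(n, value_counts):
    """Assumed weight 2 * (1 - sigmoid(H_j)) per attribute: low-entropy attributes weigh more."""
    weights = {}
    for j, counts in value_counts.items():
        h = -sum((c / n) * math.log(c / n) for c in counts.values())
        weights[j] = 2.0 * (1.0 - 1.0 / (1.0 + math.exp(-h)))
    return weights


def train(partitions):
    """Run map over partitions, reduce into one index table, then derive weights."""
    table = reduce(reduce_counts, map(map_partition, partitions), Counter())
    n = sum(c for key, c in table.items() if key[0] == "class")
    value_counts = defaultdict(Counter)  # attribute -> value -> count over all classes
    for key, c in table.items():
        if key[0] == "attr":
            _, j, value, _label = key
            value_counts[j][value] += c
    return table, n, value_counts, entropy_weights(n, value_counts)


def predict(model, features):
    """Return the class with the highest weighted log-posterior (Laplace-smoothed)."""
    table, n, value_counts, weights = model
    classes = {key[1]: c for key, c in table.items() if key[0] == "class"}
    best_label, best_score = None, -math.inf
    for label, n_c in classes.items():
        score = math.log(n_c / n)  # log prior
        for j, value in enumerate(features):
            num = table[("attr", j, value, label)] + 1  # Counter returns 0 for unseen keys
            den = n_c + len(value_counts[j])
            score += weights[j] * math.log(num / den)  # weighted log-likelihood
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def evaluate(y_true, y_pred, positive):
    """Accuracy, sensitivity (recall on `positive`) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    # Toy categorical records split across two "mapper" partitions (made-up data).
    part1 = [(("sunny", "hot"), "no"), (("rainy", "mild"), "yes")]
    part2 = [(("sunny", "mild"), "yes"), (("rainy", "hot"), "no")]
    model = train([part1, part2])
    rows = part1 + part2
    preds = [predict(model, x) for x, _ in rows]
    print(preds, evaluate([y for _, y in rows], preds, positive="yes"))
```

Running the script prints the toy predictions followed by the (accuracy, sensitivity, specificity) tuple. In a real deployment `map_partition` would run on cluster workers (for example Hadoop or Spark); the in-process `map`/`reduce` here only mimics that data flow.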

Author information

Corresponding author

Correspondence to Chitrakant Banchhor.

About this article

Cite this article

Banchhor, C., Srinivasu, N. Holoentropy based Correlative Naive Bayes classifier and MapReduce model for classifying the big data. Evol. Intel. 15, 1037–1050 (2022). https://doi.org/10.1007/s12065-019-00276-9

  • DOI: https://doi.org/10.1007/s12065-019-00276-9