A Comprehensive Analysis of Classification Methods for Big Data Stream

  • Conference paper

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Traditional tools for mining big data have become inadequate as the volume of data worldwide keeps growing. Real-time and distributed processing are therefore adopted to handle big data. With so many mining tools available, it can be difficult for a researcher to choose an efficient one. This paper is intended to aid researchers who are familiar with WEKA but inexperienced with big data. Classification, the preliminary stage of data mining, categorizes data into predefined groups. In this paper, WEKA with the MOA package is used to classify a big data stream with four different classifiers. The performance of these classifiers is analyzed in terms of accuracy (the numbers of correctly and incorrectly classified instances), the time taken to build the model, and the time taken to test the model. For this particular scenario, the results indicate that naive Bayes is the most accurate classifier and decision stump is the least effective classifier for big data.
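The three evaluation criteria named in the abstract (correctly and incorrectly classified instances, time to build the model, time to test the model) can be illustrated with a minimal sketch. This is not the WEKA or MOA API and not the authors' experimental setup: `MajorityClassifier`, `evaluate`, and the toy data are hypothetical names and values introduced only for illustration, and any object with `fit` and `predict` methods could be substituted.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Hypothetical baseline: always predicts the most frequent training label."""

    def fit(self, rows, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, row):
        return self.label_


def evaluate(classifier, train, test):
    """Return accuracy counts plus build and test times (seconds) for one classifier."""
    train_rows, train_labels = train
    test_rows, test_labels = test

    # Time taken to build (train) the model.
    start = time.perf_counter()
    classifier.fit(train_rows, train_labels)
    build_time = time.perf_counter() - start

    # Time taken to test the model on held-out instances.
    start = time.perf_counter()
    predictions = [classifier.predict(row) for row in test_rows]
    test_time = time.perf_counter() - start

    correct = sum(p == y for p, y in zip(predictions, test_labels))
    return {
        "correct": correct,
        "incorrect": len(test_labels) - correct,
        "accuracy": correct / len(test_labels),
        "build_time": build_time,
        "test_time": test_time,
    }


# Toy data (assumed for illustration only): one feature per row, two classes.
train = ([[0], [1], [2], [3]], ["a", "a", "a", "b"])
test = ([[4], [5], [6], [7]], ["a", "b", "a", "a"])
result = evaluate(MajorityClassifier(), train, test)
print(result["correct"], result["incorrect"], result["accuracy"])  # 3 1 0.75
```

Running the same `evaluate` function over several classifiers and comparing the returned dictionaries gives the kind of side-by-side comparison the abstract describes, although timings from a toy example like this carry no meaning for real data streams.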

Author information

Corresponding author

Correspondence to Amrinder Kaur.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Kaur, A., Kumar, R. (2020). A Comprehensive Analysis of Classification Methods for Big Data Stream. In: Sharma, H., Govindan, K., Poonia, R., Kumar, S., El-Medany, W. (eds) Advances in Computing and Intelligent Systems. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-0222-4_18