A Comprehensive Analysis of Classification Methods for Big Data Stream

Kaur, Amrinder; Kumar, Rakesh

doi:10.1007/978-981-15-0222-4_18

Amrinder Kaur & Rakesh Kumar

Conference paper
First Online: 03 January 2020

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Traditional tools for mining big data have become insufficient because the amount of data in the world keeps growing. Real-time and distributed processing are therefore adopted to handle big data. With so many mining tools available, it can be difficult for a researcher to choose an efficient one. This paper is intended to aid researchers who understand WEKA but are inexperienced with big data. Classification, the preliminary stage of data mining, categorizes data into predefined groups. In this paper, WEKA with the MOA package is used to classify a big data stream with four different classifiers. The performance of these classifiers is analyzed on the basis of accuracy (i.e., correctly and incorrectly classified instances), the time taken to build the model, and the time taken to test the model. For this particular scenario, the obtained results show that naive Bayes is the most accurate classifier and decision stump is the least effective classifier for big data.
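The evaluation the abstract describes (build a model, test it, count correctly and incorrectly classified instances, and time both phases) can be illustrated with a minimal plain-Python sketch. This is not WEKA or MOA code, and it is not the authors' setup: the synthetic stream (`make_stream`), the simple holdout split, and the names `NaiveBayes`, `DecisionStump`, and `evaluate` are hypothetical, and only the two classifiers named in the abstract are shown. Whether the paper used a holdout split or another protocol is not stated in the abstract.

```python
import math
import random
import time
from collections import defaultdict


def make_stream(n, seed=0):
    """Hypothetical synthetic stream: three binary features; the label is the
    majority vote of the features, flipped with 10% probability (label noise)."""
    rng = random.Random(seed)
    for _ in range(n):
        x = tuple(rng.randint(0, 1) for _ in range(3))
        y = int(sum(x) >= 2)
        if rng.random() < 0.1:
            y = 1 - y
        yield x, y


class NaiveBayes:
    """Incremental categorical naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.class_counts = defaultdict(int)  # class -> count
        self.feat_counts = defaultdict(int)   # (class, feature index, value) -> count
        self.values = defaultdict(set)        # feature index -> values seen so far

    def learn(self, x, y):
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.feat_counts[(y, i, v)] += 1
            self.values[i].add(v)

    def predict(self, x):
        total = sum(self.class_counts.values())
        best, best_score = None, float("-inf")
        for c, nc in self.class_counts.items():
            # Sum log-probabilities instead of multiplying to avoid underflow.
            score = math.log(nc / total)
            for i, v in enumerate(x):
                num = self.feat_counts.get((c, i, v), 0) + 1
                score += math.log(num / (nc + len(self.values[i])))
            if score > best_score:
                best, best_score = c, score
        return best


class DecisionStump:
    """One-level decision tree on categorical features: splits on the single
    feature whose per-value majority class fits the training data best."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))  # (i, v) -> {class: count}
        self.class_counts = defaultdict(int)
        self._split = None  # cached index of the chosen feature

    def learn(self, x, y):
        self.class_counts[y] += 1
        for i, v in enumerate(x):
            self.counts[(i, v)][y] += 1
        self._split = None  # new data may change the best split

    def _best_feature(self):
        n_features = max(i for i, _ in self.counts) + 1

        def fit(i):
            # Training instances a majority-per-value rule on feature i gets right.
            return sum(max(c.values()) for (j, _), c in self.counts.items() if j == i)

        return max(range(n_features), key=fit)

    def predict(self, x):
        if self._split is None:
            self._split = self._best_feature()
        branch = self.counts.get((self._split, x[self._split]))
        if not branch:
            branch = self.class_counts  # unseen value: fall back to overall majority
        return max(branch, key=branch.get)


def evaluate(model, stream, n_train):
    """Holdout evaluation: build on the first n_train instances, test on the rest."""
    data = list(stream)
    train, test = data[:n_train], data[n_train:]
    t0 = time.perf_counter()
    for x, y in train:
        model.learn(x, y)
    build_s = time.perf_counter() - t0
    t0 = time.perf_counter()
    correct = sum(model.predict(x) == y for x, y in test)
    test_s = time.perf_counter() - t0
    return {"correct": correct, "incorrect": len(test) - correct,
            "accuracy": correct / len(test), "build_s": build_s, "test_s": test_s}


if __name__ == "__main__":
    for model in (NaiveBayes(), DecisionStump()):
        r = evaluate(model, make_stream(20000, seed=1), n_train=15000)
        print(f"{type(model).__name__}: {r['correct']} correct, "
              f"{r['incorrect']} incorrect, build {r['build_s']:.3f}s, "
              f"test {r['test_s']:.3f}s")
```

On this toy stream, the label is a noisy majority vote of three binary features. Naive Bayes can represent that rule, while a stump that looks at only one feature cannot, so naive Bayes should come out ahead. This only illustrates the kind of comparison made in the paper and says nothing about the paper's actual datasets or results.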

Author information

Authors and Affiliations

Department of Computer Science and Applications, Maharishi Dayanand University, Rohtak, India
Amrinder Kaur
Department of Computer Science and Applications, Kurukshetra University, Thanesar, India
Rakesh Kumar

Corresponding author

Correspondence to Amrinder Kaur.

Editor information

Editors and Affiliations

Rajasthan Technical University, Kota, Rajasthan, India
Harish Sharma
University of Southern Denmark, Odense, Denmark
Kannan Govindan
Amity University, Jaipur, Rajasthan, India
Ramesh C. Poonia
Amity University, Jaipur, Rajasthan, India
Sandeep Kumar
University of Bahrain, Zallaq, Bahrain
Wael M. El-Medany

About this paper

Cite this paper

Kaur, A., Kumar, R. (2020). A Comprehensive Analysis of Classification Methods for Big Data Stream. In: Sharma, H., Govindan, K., Poonia, R., Kumar, S., El-Medany, W. (eds) Advances in Computing and Intelligent Systems. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-0222-4_18

DOI: https://doi.org/10.1007/978-981-15-0222-4_18
Published: 03 January 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0221-7
Online ISBN: 978-981-15-0222-4
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)
