Abstract
Traditional data mining tools have become insufficient for big data as the volume of data in the world keeps growing. Big data is therefore handled with real-time and distributed processing. With so many mining tools available, it can be difficult for a researcher to choose an efficient one. This paper is intended to aid researchers who understand WEKA but are inexperienced with big data. Classification, the preliminary stage of data mining, categorizes data into predefined groups. In this paper, WEKA with the MOA package is used to classify a big data stream with four different classifiers. The performance of these classifiers is analyzed on the basis of accuracy (the numbers of correctly and incorrectly classified instances), the time taken to build the model, and the time taken to test the model. For this particular scenario, the results obtained indicate that naive Bayes is the most accurate classifier and decision stump the least effective classifier for big data.
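The three evaluation criteria named in the abstract (correctly and incorrectly classified instances, model build time, and model test time) can be illustrated with a minimal Python sketch. This is not the WEKA/MOA setup used in the paper: the `MajorityClassifier` class, the `evaluate` helper, and the toy data are hypothetical stand-ins chosen only to show how the measurements relate to one another.

```python
import time
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in model: always predicts the most frequent training label."""

    def fit(self, rows, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.label_ for _ in rows]


def evaluate(model, train_rows, train_labels, test_rows, test_labels):
    """Return accuracy counts plus build and test times for one classifier."""
    # Time taken to build (train) the model.
    start = time.perf_counter()
    model.fit(train_rows, train_labels)
    build_seconds = time.perf_counter() - start

    # Time taken to test the model on held-out instances.
    start = time.perf_counter()
    predicted = model.predict(test_rows)
    test_seconds = time.perf_counter() - start

    # Accuracy is derived from correctly vs. incorrectly classified instances.
    correct = sum(p == t for p, t in zip(predicted, test_labels))
    incorrect = len(test_labels) - correct
    return {
        "correct": correct,
        "incorrect": incorrect,
        "accuracy": correct / len(test_labels),
        "build_seconds": build_seconds,
        "test_seconds": test_seconds,
    }


# Toy data, invented purely for illustration.
train_rows = [[0], [1], [2], [3]]
train_labels = ["a", "a", "a", "b"]
test_rows = [[4], [5], [6], [7]]
test_labels = ["a", "b", "a", "a"]

result = evaluate(MajorityClassifier(), train_rows, train_labels, test_rows, test_labels)
print(result["correct"], result["incorrect"], result["accuracy"])
```

Any model exposing the same assumed `fit`/`predict` interface could be passed to `evaluate`, so several classifiers can be compared on identical data with identical measurements.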
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kaur, A., Kumar, R. (2020). A Comprehensive Analysis of Classification Methods for Big Data Stream. In: Sharma, H., Govindan, K., Poonia, R., Kumar, S., El-Medany, W. (eds) Advances in Computing and Intelligent Systems. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-15-0222-4_18
DOI: https://doi.org/10.1007/978-981-15-0222-4_18
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0221-7
Online ISBN: 978-981-15-0222-4
eBook Packages: Intelligent Technologies and Robotics (R0)