Survey of Streaming Clustering Algorithms in Machine Learning on Big Data Architecture

Parekh, Madhuri; Shukla, Madhu

doi:10.1007/978-981-19-0095-2_48

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 400)

Abstract

Machine learning is becoming increasingly popular in a range of fields. Big data helps machine learning algorithms make more timely and accurate recommendations than ever before. The phases and subsystems of machine learning on big data point to shared issues and threats, as well as to associated research in a variety of previously unexplored areas. Recently, data stream mining has become a popular research topic. The biggest challenge in data stream mining is extracting important information directly from a vast, continuous, and changing data stream in just one scan. Clustering is a powerful method for addressing this issue. Financial transactions, electronic communications, and other fields can benefit from data stream clustering. This study examines approaches for summarizing enormous data streams, including projected clustering for high-dimensional data, scalability, and distributed computing, as well as the issues of big data and machine learning. It discusses data abstraction and the common characteristics of learning from streams, such as concept drift, data flow systems, and outlier detection. It also details AutoCloud, which builds on the recently introduced concept of typicality and eccentricity data analytics and is used to detect anomalies and resolve them through a clustering approach, together with its computational complexity and clustering accuracy. We present the MLAutoCloud algorithm, which will be used in machine learning PySpark frameworks. In future work, the MLAutoCloud algorithm will be implemented to address the problems of the AutoCloud algorithm.
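
The single-scan constraint mentioned in the abstract can be illustrated with a minimal sketch. This is a generic leader-style clustering loop, not AutoCloud or MLAutoCloud; the function name `one_pass_cluster`, the `radius` parameter, and the Euclidean distance are assumptions chosen only for illustration. Each point is read once, assigned to the nearest running centroid within `radius`, or used to open a new cluster.

```python
import math


def one_pass_cluster(stream, radius):
    """Cluster points in a single scan (illustrative sketch, not AutoCloud).

    A point joins the nearest existing centroid if it lies within `radius`
    (a hypothetical tuning parameter); otherwise it starts a new cluster.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of points absorbed by each cluster
    labels = []     # cluster index assigned to each point, in arrival order

    for x in stream:
        best, best_dist = None, radius
        for i, c in enumerate(centroids):
            d = math.dist(x, c)
            if d <= best_dist:
                best, best_dist = i, d

        if best is None:
            # No centroid is close enough: open a new cluster at this point.
            centroids.append(list(x))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Incremental mean update, so no past points need to be stored.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [m + (v - m) / n for m, v in zip(centroids[best], x)]

        labels.append(best)

    return centroids, counts, labels


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0), (0.1, 0.1)]
    centroids, counts, labels = one_pass_cluster(points, radius=1.0)
    print(labels, counts)
```

Because only centroids and counts are kept, memory grows with the number of clusters rather than the length of the stream. A fixed radius does not handle concept drift or outliers, which is where the approaches surveyed in the chapter differ from this sketch.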

Author information

Authors and Affiliations

Computer Engineering Department, Ph.D. Scholar, Marwadi University, Rajkot, India
Madhuri Parekh
Computer Engineering Department, Associate Professor & Head Computer Engineering-AI, Marwadi University, Rajkot, India
Madhu Shukla

Corresponding author

Correspondence to Madhuri Parekh.

Editor information

Editors and Affiliations

Global Knowledge Research Foundation, Ahmedabad, India
Amit Joshi
Nottingham Trent University, Nottingham, UK
Mufti Mahmud
University of Peradeniya, Kandy, Sri Lanka
Roshan G. Ragel

Cite this paper

Parekh, M., Shukla, M. (2023). Survey of Streaming Clustering Algorithms in Machine Learning on Big Data Architecture. In: Joshi, A., Mahmud, M., Ragel, R.G. (eds) Information and Communication Technology for Competitive Strategies (ICTCS 2021). Lecture Notes in Networks and Systems, vol 400. Springer, Singapore. https://doi.org/10.1007/978-981-19-0095-2_48

DOI: https://doi.org/10.1007/978-981-19-0095-2_48
Published: 23 June 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0094-5
Online ISBN: 978-981-19-0095-2
eBook Packages: Engineering, Engineering (R0)
