Abstract
Machine learning is increasingly popular across a range of fields. Big data enables machine learning algorithms to deliver more timely and accurate recommendations than before. The phases and subsystems of machine learning on big data point to common issues and threats, as well as to related research in previously unexplored areas. Data stream mining has recently become a popular research topic. Its biggest challenge is extracting important information from a vast, continuous, and changing data stream in a single scan. Clustering is a powerful method for addressing this problem, and fields such as financial transactions and electronic communications can benefit from data stream clustering. This survey examines approaches for summarizing massive data streams, including projected clustering for high-dimensional data, scalability, and distributed computing, together with the challenges of big data and machine learning. It also discusses the data abstraction and common characteristics of stream learning, such as concept drift, data stream systems, and outlier detection. The article describes AutoCloud, which is based on the recently introduced concept of typicality and eccentricity data analytics and is used to detect anomalies and resolve them through clustering, and considers its computational complexity and clustering accuracy. We propose the MLAutoCloud algorithm, intended for use in machine learning frameworks with PySpark. As future work, MLAutoCloud will be implemented to address the shortcomings of the AutoCloud algorithm.
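To make the single-scan anomaly detection idea concrete, the following is a minimal sketch of eccentricity-based outlier flagging in the spirit of typicality and eccentricity data analytics. It is not the chapter's MLAutoCloud or the full AutoCloud algorithm: the class name `EccentricityDetector`, the parameter `m`, and the update equations (a commonly published recursive formulation, reproduced here as an assumption) are illustrative only, and no cluster creation or merging is performed.

```python
import numpy as np


class EccentricityDetector:
    """Hypothetical single-pass outlier flagger (illustrative, not AutoCloud itself).

    Keeps only a running mean and variance, so each sample is seen once.
    """

    def __init__(self, m=3.0):
        self.m = m          # assumed sensitivity parameter (Chebyshev-style bound)
        self.k = 0          # number of samples seen so far
        self.mean = None    # running mean vector
        self.var = 0.0      # running scalar variance

    def update(self, x):
        """Consume one sample and return True if it looks eccentric."""
        x = np.asarray(x, dtype=float)
        self.k += 1
        k = self.k
        if k == 1:
            # The first sample only initialises the statistics.
            self.mean = x.copy()
            self.var = 0.0
            return False
        # Recursive mean and variance updates (assumed formulation).
        self.mean = ((k - 1) / k) * self.mean + x / k
        dist2 = float(np.sum((x - self.mean) ** 2))
        self.var = ((k - 1) / k) * self.var + dist2 / (k - 1)
        if self.var == 0.0:
            return False
        # Eccentricity of the current sample and its normalised form.
        ecc = 1.0 / k + dist2 / (k * self.var)
        norm_ecc = ecc / 2.0
        # Flag the sample when normalised eccentricity exceeds the bound.
        return norm_ecc > (self.m ** 2 + 1) / (2 * k)


if __name__ == "__main__":
    detector = EccentricityDetector(m=3.0)
    stream = [[0.1 * (-1) ** i] for i in range(1, 51)] + [[10.0]]
    flags = [detector.update(sample) for sample in stream]
    print("flagged indices:", [i for i, f in enumerate(flags) if f])
```

Because the detector stores only a count, a mean, and a variance, memory use stays constant regardless of stream length, which is the property the single-scan requirement asks for.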
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Parekh, M., Shukla, M. (2023). Survey of Streaming Clustering Algorithms in Machine Learning on Big Data Architecture. In: Joshi, A., Mahmud, M., Ragel, R.G. (eds) Information and Communication Technology for Competitive Strategies (ICTCS 2021). Lecture Notes in Networks and Systems, vol 400. Springer, Singapore. https://doi.org/10.1007/978-981-19-0095-2_48
DOI: https://doi.org/10.1007/978-981-19-0095-2_48
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-0094-5
Online ISBN: 978-981-19-0095-2
eBook Packages: Engineering (R0)