Survey of Streaming Clustering Algorithms in Machine Learning on Big Data Architecture

  • Conference paper
Information and Communication Technology for Competitive Strategies (ICTCS 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 400)

Abstract

Machine learning is becoming increasingly popular in a range of fields. Big data helps machine learning algorithms make more timely and accurate recommendations than ever before. The phases and subsystems of machine learning on big data point to common issues and threats, as well as to related research in a number of previously unexplored areas. In recent years, data stream mining has become a popular research topic. The biggest challenge in data stream mining is extracting important information directly from a vast, continuous, and changing data stream in a single scan. Clustering is a powerful method for addressing this problem. Financial transactions, electronic communications, and other fields can benefit from data stream clustering. This study examines approaches for summarizing enormous data streams, including projected clustering for high-dimensional data, scalability, and distributed computing, as well as the challenges of big data and machine learning. It discusses data abstraction and the common characteristics of learning from streams, such as concept drift, data stream systems, and outlier detection. It also details AutoCloud, which builds on the recently introduced concept of typicality and eccentricity data analytics and is used to detect anomalies and resolve them through a clustering approach, together with its computational complexity and clustering accuracy. We present the MLAutoCloud algorithm, which is intended for use in machine learning frameworks on PySpark. In future work, we will implement the MLAutoCloud algorithm to address the problems of the AutoCloud algorithm.
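To make the single-scan, eccentricity-based idea in the abstract more concrete, the following is a minimal sketch of one-pass outlier flagging. It assumes the recursive mean, variance, and eccentricity updates commonly given for typicality and eccentricity data analytics; it is not the AutoCloud or MLAutoCloud algorithm, and the class name `EccentricityDetector`, the parameter `m`, and the sample stream are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EccentricityDetector:
    """One-pass outlier flagging with recursive mean and variance.

    Assumption: uses the commonly cited recursive eccentricity updates;
    this illustrates the idea only and is not AutoCloud or MLAutoCloud.
    """
    m: float = 3.0  # hypothetical sensitivity parameter (an "m-sigma" rule)
    k: int = 0      # number of samples seen so far
    mean: List[float] = field(default_factory=list)
    var: float = 0.0  # scalar variance of distances to the running mean

    def update(self, x: List[float]) -> bool:
        """Consume one sample and return True if it looks eccentric."""
        self.k += 1
        k = self.k
        if k == 1:
            # The first sample only initializes the state.
            self.mean = list(x)
            self.var = 0.0
            return False
        # Recursive mean update: each sample is read exactly once.
        self.mean = [((k - 1) / k) * mu + xi / k
                     for mu, xi in zip(self.mean, x)]
        # Squared distance from the sample to the updated mean.
        dist2 = sum((xi - mu) ** 2 for xi, mu in zip(x, self.mean))
        # Recursive variance update.
        self.var = ((k - 1) / k) * self.var + dist2 / (k - 1)
        if self.var == 0.0:
            return False
        eccentricity = 1.0 / k + dist2 / (k * self.var)
        normalized = eccentricity / 2.0
        # Flag the sample when its normalized eccentricity exceeds the
        # threshold (m^2 + 1) / (2k).
        return normalized > (self.m ** 2 + 1) / (2.0 * k)


# Hypothetical one-dimensional stream: 20 values near 1.0, then a jump.
stream = [[1.0], [1.1], [0.9], [1.0]] * 5 + [[10.0]]
detector = EccentricityDetector()
flags = [detector.update(sample) for sample in stream]
```

The detector keeps only the sample count, the running mean, and a scalar variance, so memory use does not grow with the length of the stream. A full streaming clustering method would maintain such statistics per cluster and decide when to create or merge clusters; that logic is outside this sketch.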

Corresponding author

Correspondence to Madhuri Parekh.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Parekh, M., Shukla, M. (2023). Survey of Streaming Clustering Algorithms in Machine Learning on Big Data Architecture. In: Joshi, A., Mahmud, M., Ragel, R.G. (eds) Information and Communication Technology for Competitive Strategies (ICTCS 2021). Lecture Notes in Networks and Systems, vol 400. Springer, Singapore. https://doi.org/10.1007/978-981-19-0095-2_48

  • DOI: https://doi.org/10.1007/978-981-19-0095-2_48

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-0094-5

  • Online ISBN: 978-981-19-0095-2

  • eBook Packages: Engineering, Engineering (R0)
