Density-Based Mining Algorithms for Dynamic Data: An Incremental Approach

Chapter in: Intelligent Technologies: Concepts, Applications, and Future Directions

Part of the book series: Studies in Computational Intelligence (SCI, volume 1028)

Abstract

Traditionally, data mining algorithms operate on a static set of inputs. When the data change, this class of algorithms can suffer from redundant computation, higher latency, and increased consumption of available resources. Given the need to handle dynamic data, this work focuses on designing incremental mining algorithms for density-based clustering and outlier detection. Density-based algorithms were chosen because they are robust at finding clusters of varying granularity and at extracting outliers from regions of variable density. We propose incremental extensions to two density-based clustering algorithms, MBSCAN and SNN-DBSCAN, and to an outlier detection algorithm, KNNOD. The incremental extensions to MBSCAN (iMass) and KNNOD (KAGO) are approximate and support single-point insertions. For SNN-DBSCAN, we propose exact incremental algorithms (\(\text {BISDB}_\mathrm{add}\) and \(\text {BISDB}_\mathrm{del}\)) that support both insertion and removal of data in batch mode. iMass achieved a maximum efficiency of up to order 2.28 (\(\approx \)191 times) while maintaining a mean clustering accuracy of around 60.375%. \(\text {BISDB}_\mathrm{add}\) and \(\text {BISDB}_\mathrm{del}\) achieved efficiency of up to orders 3 and 4, respectively, and the clusters obtained through both algorithms were identical to those of SNN-DBSCAN. KAGO outperformed KNNOD with a maximum efficiency of up to order 3.9 (\(\approx \)8304 times) across two intrusion detection datasets and a bidding dataset from a search engine. When outliers were evaluated on these datasets, the Rand index and F1-score of KAGO showed an average accuracy improvement of around 3.3%.
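
The abstract reports efficiency both as an "order" and as a multiplicative factor (for example, order 2.28 alongside roughly 191 times). It does not define "order"; the sketch below assumes it means the base-10 logarithm of the speedup factor, which is consistent with the quoted pairs. The function names are hypothetical and chosen only for illustration.

```python
import math


def speedup_order(factor: float) -> float:
    """Base-10 order of magnitude of a speedup factor (assumed meaning of "order")."""
    if factor <= 0:
        raise ValueError("speedup factor must be positive")
    return math.log10(factor)


def speedup_factor(order: float) -> float:
    """Inverse mapping: the speedup factor corresponding to a base-10 order."""
    return 10.0 ** order


# Factors quoted in the abstract for iMass and KAGO.
for name, factor in [("iMass", 191), ("KAGO", 8304)]:
    print(f"{name}: {factor}x ~ order {speedup_order(factor):.2f}")
```

Under this assumption, 191 times corresponds to an order of about 2.28, and 8304 times to about 3.92, which the abstract rounds to 3.9.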

Notes

  1. Dataset upon which updates are made.

Author information

Corresponding author

Correspondence to Panthadeep Bhattacharjee.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this chapter

Bhattacharjee, P., Mitra, P. (2022). Density-Based Mining Algorithms for Dynamic Data: An Incremental Approach. In: Dash, S.R., Lenka, M.R., Li, KC., Villatoro-Tello, E. (eds) Intelligent Technologies: Concepts, Applications, and Future Directions. Studies in Computational Intelligence, vol 1028. Springer, Singapore. https://doi.org/10.1007/978-981-19-1021-0_13
