
ICA: An Incremental Clustering Algorithm Based on OPTICS

Wireless Personal Communications

Abstract

Clustering algorithms play an important role in data mining, whether they are used as a stand-alone tool or as a preprocessing step for further analysis of the data. With the arrival of the information era, data are generated at an ever-increasing speed. As a result, clustering algorithms such as OPTICS, which can only operate on static datasets, cannot meet the new requirements. Motivated by the demand for efficient clustering analysis on dynamic datasets, in this paper we propose an incremental clustering algorithm (ICA) based on OPTICS. The result of ICA is a cluster-ordering structure that is somewhat similar to the result of OPTICS. In ICA, we eliminate the parameters ɛ and MinPts, which must be preset by users in OPTICS, and replace reachability-distance with Distance, which is easier to compute and understand. As a result, ICA is much more efficient than OPTICS. In addition, we propose a method, named the automatically extract technique, to extract clusters from the cluster-ordering structure based on the users’ needs. Our performance evaluation through a series of experiments demonstrates the effectiveness and efficiency of our algorithm. In particular, we present a detailed comparison of ICA and OPTICS, and the results illustrate that ICA is much more suitable for clustering dynamic datasets, i.e., datasets to which new data objects are added as time goes on.
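The abstract does not give the details of ICA, so the following is only a minimal illustrative sketch of the general idea of maintaining a distance-annotated structure incrementally and extracting clusters from it afterwards. It is not the authors' algorithm: the class name `IncrementalOrdering`, the choice of "distance to the nearest earlier point" as the stored value, and the threshold-based `extract` method are all assumptions made for illustration, and they are not the paper's definitions of Distance or of the automatically extract technique.

```python
import math


class IncrementalOrdering:
    """Toy incremental structure (hypothetical; NOT the paper's ICA)."""

    def __init__(self):
        self.points = []  # points in arrival order
        self.parent = []  # index of the nearest earlier point (None for the first)
        self.dist = []    # distance to that point (inf for the first)

    def add(self, point):
        """Add one point without reprocessing the points already stored."""
        best_i, best_d = None, math.inf
        for i, q in enumerate(self.points):
            d = math.dist(point, q)  # Euclidean distance (Python 3.8+)
            if d < best_d:
                best_i, best_d = i, d
        self.points.append(tuple(point))
        self.parent.append(best_i)
        self.dist.append(best_d)

    def extract(self, threshold):
        """Cut every link longer than `threshold` (an assumed user parameter)."""
        labels = []
        next_label = 0
        for i in range(len(self.points)):
            if self.parent[i] is None or self.dist[i] > threshold:
                # Link too long (or first point): start a new cluster.
                labels.append(next_label)
                next_label += 1
            else:
                # Otherwise inherit the label of the nearest earlier point.
                labels.append(labels[self.parent[i]])
        return labels


ordering = IncrementalOrdering()
for p in [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0)]:
    ordering.add(p)
print(ordering.extract(threshold=2.0))
```

In this sketch, adding a point only computes distances from the new point to the stored ones, and the same stored structure can be cut at different thresholds without being rebuilt. One known limitation of this simplification is that a new point lying between two existing clusters joins only the nearer one and does not merge them.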





Acknowledgments

This research is supported by the National Natural Science Foundation under Grant 61371071, the Beijing Natural Science Foundation under Grant 4132057, and the Academic Discipline and Postgraduate Education Project of the Beijing Municipal Commission of Education.

Author information


Corresponding author

Correspondence to Yun Liu.


About this article


Cite this article

Fu, JS., Liu, Y. & Chao, HC. ICA: An Incremental Clustering Algorithm Based on OPTICS. Wireless Pers Commun 84, 2151–2170 (2015). https://doi.org/10.1007/s11277-015-2517-9


  • DOI: https://doi.org/10.1007/s11277-015-2517-9
