Scalable Hierarchical Clustering Method for Sequences of Categorical Values

  • Tadeusz Morzy
  • Marek Wojciechowski
  • Maciej Zakrzewicz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2035)

Abstract

Data clustering methods have many applications in the area of data mining. Traditional clustering algorithms deal with quantitative or categorical data points. However, many important databases store sequences of categorical data, in which significant knowledge is hidden in the sequential dependencies between data items. In this paper we introduce the problem of clustering categorical data sequences and present an efficient, scalable algorithm to solve it. Our algorithm implements the general idea of agglomerative hierarchical clustering and uses frequently occurring subsequences as features describing data sequences. The algorithm not only discovers a set of high-quality clusters containing similar data sequences but also provides descriptions of the discovered clusters.
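The abstract does not give the algorithm's details, so the following is only a minimal Python sketch of the general idea it names, not the authors' method. It rests on our own assumptions: features are frequent, not necessarily contiguous, subsequences up to a hypothetical length `max_len` with an absolute support threshold `min_support`; two sequences are compared by the Jaccard coefficient of their feature sets; clusters are merged by average linkage until `k` remain; and a cluster is described by the patterns that all of its members contain. All function and parameter names are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, seq):
    """True if pattern occurs in seq in order (gaps allowed)."""
    it = iter(seq)
    # `item in it` consumes the iterator, so order is enforced.
    return all(item in it for item in pattern)


def frequent_subsequences(sequences, min_support, max_len):
    """Apriori-style search for subsequences contained in >= min_support sequences."""
    items = sorted({x for s in sequences for x in s})
    frequent = []
    level = [(x,) for x in items]
    for _ in range(max_len):
        # Keep candidates whose support (number of containing sequences) is high enough.
        level = [p for p in level
                 if sum(is_subsequence(p, s) for s in sequences) >= min_support]
        frequent.extend(level)
        singles = [p[0] for p in frequent if len(p) == 1]
        # Extend each frequent pattern by one frequent item to get the next candidates.
        level = [p + (x,) for p in level for x in singles]
    return frequent


def jaccard(a, b):
    """Jaccard coefficient of two feature sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_sequences(sequences, k, min_support=2, max_len=2):
    """Agglomerative clustering with average linkage over pattern-set features."""
    patterns = frequent_subsequences(sequences, min_support, max_len)
    features = [frozenset(p for p in patterns if is_subsequence(p, s))
                for s in sequences]
    clusters = [[i] for i in range(len(sequences))]

    def linkage(c1, c2):
        # Average pairwise similarity between the members of two clusters.
        total = sum(jaccard(features[i], features[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Merge the most similar pair of clusters (a < b by construction).
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda ab: linkage(clusters[ab[0]], clusters[ab[1]]))
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters, features


def describe(cluster, features):
    """Patterns shared by every member of a cluster, used as its description."""
    shared = frozenset.intersection(*(features[i] for i in cluster))
    return sorted(shared)
```

For example, with `seqs = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "b"), ("x", "y", "z"), ("x", "y", "w"), ("x", "z", "y")]`, calling `cluster_sequences(seqs, k=2)` separates sequences 0 to 2 from 3 to 5, and `describe` labels the first group with the patterns `("a",)`, `("a", "b")` and `("b",)`. The exhaustive pairwise search is quadratic per merge and is meant only to show the idea; it makes no attempt at the scalability the paper claims.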

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Tadeusz Morzy (1)
  • Marek Wojciechowski (1)
  • Maciej Zakrzewicz (1)
  1. Institute of Computing Science, Poznan University of Technology, Poznan, Poland