A Scalable Approach to Balanced, High-Dimensional Clustering of Market-Baskets

  • Alexander Strehl
  • Joydeep Ghosh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1970)


This paper presents Opossum, a novel similarity-based clustering approach that relies on constrained, weighted graph partitioning. Opossum is particularly attuned to real-life market baskets, which are characterized by very high-dimensional, highly sparse customer-product matrices with positive ordinal attribute values and a significant number of outliers. Since it is built on top of Metis, a well-known and highly efficient graph-partitioning algorithm, it inherits the scalability and easy parallelizability of the latter. Results are presented on a real retail-industry data set of several thousand customers and products, with the help of Clusion, a cluster visualization tool.
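The abstract describes the pipeline only at a high level: build a similarity graph over customers, then cut it into balanced groups with a weighted graph partitioner. Below is a minimal sketch of those two steps under several assumptions. The extended Jaccard similarity is assumed here because the abstract does not name a measure. The function names `extended_jaccard`, `similarity_graph`, `cut_weight` and `balanced_bisection`, along with the toy basket data, are hypothetical. The pairwise-swap refinement is a simple stand-in for Metis's partitioner, not a reimplementation of it. Balance is enforced only as equal cluster sizes.

```python
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def extended_jaccard(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed similarity: x.y / (|x|^2 + |y|^2 - x.y); 0.0 for two all-zero rows."""
    dot = sum(a * b for a, b in zip(x, y))
    denom = sum(a * a for a in x) + sum(b * b for b in y) - dot
    return dot / denom if denom > 0 else 0.0


def similarity_graph(rows: Sequence[Sequence[float]]) -> Matrix:
    """Dense, symmetric edge-weight matrix with a zero diagonal (no self-loops)."""
    n = len(rows)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = extended_jaccard(rows[i], rows[j])
    return w


def cut_weight(w: Matrix, part_a: Set[int]) -> float:
    """Total weight of edges with one endpoint in part_a and the other outside it."""
    n = len(w)
    return sum(w[i][j] for i in part_a for j in range(n) if j not in part_a)


def balanced_bisection(w: Matrix) -> Tuple[List[int], List[int]]:
    """Hypothetical stand-in for Metis: start from halves whose sizes differ by
    at most one, then greedily swap one vertex from each side while the swap
    lowers the cut. Swaps keep both sizes fixed, so balance always holds."""
    n = len(w)
    part_a = set(range(n // 2))
    best = cut_weight(w, part_a)
    improved = True
    while improved:
        improved = False
        for u in sorted(part_a):
            for v in range(n):
                if v in part_a:
                    continue
                trial = (part_a - {u}) | {v}
                cost = cut_weight(w, trial)
                if cost < best - 1e-12:  # strict decrease guarantees termination
                    part_a, best, improved = trial, cost, True
                    break
            if improved:
                break  # rescan from the updated partition
    return sorted(part_a), sorted(set(range(n)) - part_a)


# Hypothetical customer x product matrix (rows = customers, values = purchase amounts).
baskets = [
    [5, 3, 0, 0],
    [0, 0, 3, 5],
    [4, 4, 0, 0],
    [0, 1, 4, 4],
    [6, 2, 0, 1],
    [0, 0, 5, 3],
]

print(balanced_bisection(similarity_graph(baskets)))  # -> ([0, 2, 4], [1, 3, 5])
```

Each evaluation here recomputes the full cut, so this local search scales very poorly. On data with several thousand customers, the weighted graph would instead go to Metis, as the abstract describes, which also handles more than two parts directly.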





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • Alexander Strehl (1)
  • Joydeep Ghosh (1)
  1. The University of Texas at Austin, Austin, USA
