Abstract
This paper presents Opossum, a novel similarity-based clustering approach based on constrained, weighted graph partitioning. Opossum is particularly attuned to real-life market baskets, which are characterized by very high-dimensional, highly sparse customer-product matrices with positive ordinal attribute values and a significant number of outliers. Because it is built on top of Metis, a well-known and highly efficient graph-partitioning algorithm, it inherits that algorithm's scalability and ease of parallelization. Results are presented on a real retail-industry data set of several thousand customers and products, with the help of Clusion, a cluster visualization tool.
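The graph that such a partitioner works on has one vertex per customer and edges weighted by pairwise similarity. The sketch below builds that weighted graph from sparse purchase vectors. It is an illustration only: it assumes the extended Jaccard coefficient, x·y / (|x|² + |y|² − x·y), as the similarity measure, and the names `extended_jaccard` and `similarity_graph` and the dict-based vector format are hypothetical. None of these are taken from the abstract, and the Metis partitioning step itself is not reproduced.

```python
from typing import Dict, List

# Hypothetical representation: a customer's basket as a sparse vector
# mapping product id -> positive value (e.g. quantity or spend).
SparseVec = Dict[str, float]


def extended_jaccard(x: SparseVec, y: SparseVec) -> float:
    """Extended Jaccard similarity: x.y / (|x|^2 + |y|^2 - x.y)."""
    # Loop over the smaller vector; only shared products contribute to x.y.
    if len(x) > len(y):
        x, y = y, x
    dot = sum(v * y[k] for k, v in x.items() if k in y)
    nx = sum(v * v for v in x.values())
    ny = sum(v * v for v in y.values())
    denom = nx + ny - dot
    # denom is 0 only when both baskets are empty; treat that as no similarity.
    return dot / denom if denom > 0 else 0.0


def similarity_graph(customers: List[SparseVec]) -> List[List[float]]:
    """Symmetric edge-weight matrix with a zero diagonal (no self-loops)."""
    n = len(customers)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = extended_jaccard(customers[i], customers[j])
            w[i][j] = w[j][i] = s
    return w
```

For two identical baskets the edge weight is 1.0, and for baskets with no product in common it is 0.0, so disjoint buyers are not linked. The dense n-by-n matrix is used only for readability. At several thousand customers, a sparse adjacency structure of the kind a graph partitioner consumes would be the practical choice.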
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Strehl, A., Ghosh, J. (2000). A Scalable Approach to Balanced, High-Dimensional Clustering of Market-Baskets. In: Valero, M., Prasanna, V.K., Vajapeyam, S. (eds) High Performance Computing — HiPC 2000. HiPC 2000. Lecture Notes in Computer Science, vol 1970. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44467-X_48
DOI: https://doi.org/10.1007/3-540-44467-X_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-41429-2
Online ISBN: 978-3-540-44467-1
eBook Packages: Springer Book Archive