A Scalable Approach to Balanced, High-Dimensional Clustering of Market-Baskets
This paper presents Opossum, a novel similarity-based clustering approach that relies on constrained, weighted graph partitioning. Opossum is particularly attuned to real-life market baskets, which are characterized by very high-dimensional, highly sparse customer-product matrices with positive ordinal attribute values and a significant number of outliers. Since it is built on top of Metis, a well-known and highly efficient graph-partitioning algorithm, it inherits Metis's scalability and ease of parallelization. Results are presented on a real retail-industry dataset of several thousand customers and products, with the help of Clusion, a cluster visualization tool.
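The abstract does not spell out the similarity measure or the partitioning step, so the following is only a minimal Python sketch under stated assumptions. Each customer is a sparse {product: value} dict. Pairwise similarity is the extended Jaccard coefficient (an assumption here, chosen because it suits sparse, non-negative vectors). In place of Metis's multilevel k-way partitioner, it uses a plain Kernighan-Lin-style swap heuristic to find a two-way split with equal cluster sizes. The names `extended_jaccard`, `similarity_graph`, `cut_weight`, and `balanced_bisection` are hypothetical.

```python
from itertools import product


def extended_jaccard(x, y):
    """Extended Jaccard similarity x.y / (|x|^2 + |y|^2 - x.y) of two sparse,
    non-negative vectors stored as {product_id: value} dicts (assumed measure)."""
    dot = sum(v * y[k] for k, v in x.items() if k in y)
    denom = sum(v * v for v in x.values()) + sum(v * v for v in y.values()) - dot
    return dot / denom if denom > 0 else 0.0


def similarity_graph(baskets):
    """Dense symmetric edge-weight matrix over customers (fine for a toy example)."""
    n = len(baskets)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = extended_jaccard(baskets[i], baskets[j])
    return w


def cut_weight(w, part):
    """Total weight of edges crossing between `part` and its complement."""
    return sum(w[i][j] for i in part for j in range(len(w)) if j not in part)


def balanced_bisection(w):
    """Hypothetical stand-in for Metis: start from an equal-size split, then
    greedily apply the vertex swap with the largest Kernighan-Lin gain while it
    still lowers the edge cut. Swapping keeps both sides the same size."""
    n = len(w)
    a, b = set(range(n // 2)), set(range(n // 2, n))
    while True:
        # d[v] = weight of v's edges to the other side minus weight to its own side
        d = {}
        for v in range(n):
            own = a if v in a else b
            d[v] = sum(w[v][u] for u in range(n) if u not in own) - sum(
                w[v][u] for u in own if u != v)
        best_gain, best_pair = 0.0, None
        for x, y in product(a, b):
            gain = d[x] + d[y] - 2 * w[x][y]  # cut reduction if x and y swap sides
            if gain > best_gain + 1e-12:
                best_gain, best_pair = gain, (x, y)
        if best_pair is None:  # no improving swap left: local optimum
            return a, b
        x, y = best_pair
        a.remove(x)
        b.remove(y)
        a.add(y)
        b.add(x)


# Toy data: customers 0 and 2 buy groceries, customers 1 and 3 buy snacks.
baskets = [
    {"milk": 2, "bread": 1},
    {"beer": 3, "chips": 2},
    {"milk": 1, "bread": 1},
    {"beer": 2, "chips": 2},
]
w = similarity_graph(baskets)
a, b = balanced_bisection(w)
print(sorted(a), sorted(b), cut_weight(w, a))
```

On this toy input the single improving swap separates the grocery customers from the snack customers, leaving no similarity weight on the cut. Swapping one vertex from each side at a time is the simplest way to enforce an equal-size balance constraint; Metis handles balanced k-way partitions of much larger graphs far more efficiently through multilevel coarsening and refinement.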