Transaction Clustering Using a Seeds Based Approach
Transaction clustering has received a great deal of attention in the past few years. Its functionality extends well beyond that of traditional clustering algorithms, which essentially perform a nearest-neighbour search to locate groups of similar instances. The basic concept underlying transaction clustering stems from that of large items, as defined by association rule mining algorithms: items or itemsets whose support exceeds a minimum threshold. Clusters formed on the basis of large items shared between instances offer an attractive alternative to association rule mining systems. However, none of the techniques proposed so far offers a good solution for scenarios where large items overlap across clusters. In this paper we overcome this limitation by using cluster seeds that serve as initial centroids. Seeds are generated from sets of transaction items that occur together above a certain support threshold, and such seeds may overlap in their itemsets across clusters.
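The seed idea can be illustrated with a small sketch. It is not the paper's algorithm. Several choices are assumptions made only for illustration: frequent itemsets are found with an Apriori-style level-wise search on relative support, the maximal frequent itemsets are taken as seeds, and each transaction is assigned in a single pass to the seed with the highest Jaccard similarity. The helpers `frequent_itemsets`, `select_seeds`, `jaccard` and `assign`, the `max_size` cap and the toy data are all hypothetical. Any iterative centroid refinement the method may perform after seeding is not shown.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Apriori-style level-wise search (an assumed choice) for itemsets whose
    relative support, i.e. the fraction of transactions containing them, is
    >= min_support. max_size is a hypothetical cap on itemset length."""
    n = len(transactions)
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    size = 1
    while candidates and size <= max_size:
        # Count relative support for every candidate of the current size
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        size += 1
        # Join pairs of frequent itemsets into candidates one item larger
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
    return frequent


def select_seeds(frequent):
    """Hypothetical seed choice: keep maximal frequent itemsets (those with no
    frequent proper superset). Different seeds are allowed to share items."""
    seeds = [s for s in frequent if not any(s < other for other in frequent)]
    # Deterministic order: larger seeds first, then alphabetical by items
    return sorted(seeds, key=lambda s: (-len(s), sorted(s)))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(transactions, seeds):
    """Assign each transaction index to the seed (initial centroid) with the
    highest Jaccard similarity; ties go to the earlier seed in the list."""
    clusters = {seed: [] for seed in seeds}
    for idx, t in enumerate(transactions):
        best = max(seeds, key=lambda s: jaccard(t, s))
        clusters[best].append(idx)
    return clusters


# Toy market-basket data (hypothetical)
transactions = [
    {"bread", "milk", "butter"},
    {"bread", "milk"},
    {"bread", "milk", "jam"},
    {"cereal", "milk"},
    {"cereal", "milk", "banana"},
    {"cereal", "milk"},
]
seeds = select_seeds(frequent_itemsets(transactions, min_support=0.5))
for seed, members in assign(transactions, seeds).items():
    print(sorted(seed), members)
# ['bread', 'milk'] [0, 1, 2]
# ['cereal', 'milk'] [3, 4, 5]
```

Both seeds contain `milk`. This mirrors the situation described above, in which a large item is shared by seeds of different clusters. A partitioning scheme that required each large item to belong to a single cluster could not express this.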
Keywords: Frequent Item, Relative Support, Cluster Centroid, Large Item, Traditional Cluster