Transaction Clustering Using a Seeds Based Approach

  • Yun Sing Koh
  • Russel Pears
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5012)

Abstract

Transaction clustering has received a great deal of attention in the past few years. Its functionality extends well beyond that of traditional clustering algorithms, which essentially perform a near-neighbour search to locate groups of similar instances. The basic concept underlying transaction clustering stems from that of large items, as defined by association rule mining algorithms. Clusters formed on the basis of large items shared between instances offer an attractive alternative to association rule mining systems. However, none of the techniques proposed so far offers a good solution for scenarios in which large items overlap across clusters. In this paper we overcome this limitation by using cluster seeds that represent initial centroids. Seeds are generated from sets of transaction items that occur together above a certain threshold, and such seeds may overlap in their itemsets across clusters.
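The seed idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. The function names (`large_items`, `generate_seeds`, `assign`), the use of a single `min_support` fraction for both large items and seeds, the restriction to item pairs, and the Jaccard-based assignment of transactions to seeds are all assumptions made for illustration. The sketch shows the two properties the abstract emphasises: seeds are built from items that co-occur above a threshold, and different seeds may share items.

```python
from collections import Counter
from itertools import combinations

# Illustrative sketch only: names, thresholds and the similarity measure are assumptions,
# not the method defined in the paper.


def large_items(transactions, min_support):
    """Return items whose support (fraction of transactions containing them) >= min_support."""
    n = len(transactions)
    counts = Counter(item for t in transactions for item in set(t))
    return {item for item, c in counts.items() if c / n >= min_support}


def generate_seeds(transactions, min_support, size=2):
    """Hypothetical seed generator: itemsets of `size` large items whose joint
    support is >= min_support. Seeds are NOT required to be disjoint."""
    n = len(transactions)
    large = large_items(transactions, min_support)
    counts = Counter()
    for t in transactions:
        present = sorted(large & set(t))  # sort so each itemset has one canonical tuple
        counts.update(combinations(present, size))
    return [frozenset(c) for c, k in sorted(counts.items()) if k / n >= min_support]


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| between two item sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(transactions, seeds):
    """Assign each transaction to its most similar seed (assumed: Jaccard similarity).
    Transactions that share no item with any seed are returned as unassigned."""
    clusters = {seed: [] for seed in seeds}
    unassigned = []
    for t in map(frozenset, transactions):
        best = max(seeds, key=lambda s: jaccard(t, s), default=None)  # ties: first seed wins
        if best is None or jaccard(t, best) == 0.0:
            unassigned.append(t)
        else:
            clusters[best].append(t)
    return clusters, unassigned


baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
]
seeds = generate_seeds(baskets, min_support=0.3)
clusters, unassigned = assign(baskets, seeds)
for seed, members in clusters.items():
    print(sorted(seed), len(members))
# ['beer', 'chips'] 2
# ['bread', 'milk'] 3
# ['eggs', 'milk'] 1
```

In this toy data the seeds {bread, milk} and {eggs, milk} share `milk`, which is the kind of overlap across clusters the abstract refers to. The basket {bread, milk, eggs} is equally similar to both, and `max` keeps the first seed in sorted order; that tie-breaking rule is likewise an assumption of the sketch.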

Keywords

Frequent Item, Relative Support, Cluster Centroid, Large Item, Traditional Cluster

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Yun Sing Koh, School of Computing and Mathematical Sciences, Auckland University of Technology, New Zealand
  • Russel Pears, School of Computing and Mathematical Sciences, Auckland University of Technology, New Zealand
