Clustering by Random Projections

  • Thierry Urruty
  • Chabane Djeraba
  • Dan A. Simovici
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4597)


Clustering algorithms for multidimensional numerical data must cope with irregularities in the data distribution. We present a clustering algorithm for numerical data that combines ideas from random projection techniques and density-based clustering. The algorithm has two phases: the first uses random projections to detect clusters, and the second post-processes the clusters obtained from several random projections. Experiments were performed on synthetic data consisting of randomly generated points in ℝⁿ, on synthetic images containing randomly distributed colored regions, and on real images. Our results suggest the potential of our algorithm for image segmentation.
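The two-phase idea can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give; it only shows one plausible reading. The function names, the `gap` threshold used as a crude one-dimensional density criterion, and the rule for combining projections (two points share a cluster only if they share a label in every projection) are all assumptions made for this illustration.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a random direction by normalizing a vector of Gaussian samples."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def labels_on_line(points, direction, gap):
    """Phase 1 (assumed form): project onto one direction and cut the
    sorted projections wherever two neighbors are more than `gap` apart."""
    proj = [sum(a * b for a, b in zip(p, direction)) for p in points]
    order = sorted(range(len(points)), key=lambda i: proj[i])
    labels = [0] * len(points)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if proj[nxt] - proj[prev] > gap:
            current += 1  # a sparse stretch starts a new 1-D cluster
        labels[nxt] = current
    return labels


def cluster_by_projections(points, n_projections=8, gap=1.0, seed=0):
    """Run phase 1 on several random directions, then combine the results."""
    rng = random.Random(seed)
    dim = len(points[0])
    per_projection = [
        labels_on_line(points, random_unit_vector(dim, rng), gap)
        for _ in range(n_projections)
    ]
    # Phase 2 (assumed form): the tuple of per-projection labels is a
    # point's signature; points with equal signatures form one cluster.
    signatures = zip(*per_projection)
    ids = {}
    return [ids.setdefault(sig, len(ids)) for sig in signatures]
```

A single projection can make two distinct clusters overlap on the line, which is why the sketch uses several directions: in this combination rule, one direction that separates two groups is enough to keep them apart in the final labeling.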





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Thierry Urruty (1)
  • Chabane Djeraba (1)
  • Dan A. Simovici (2)
  1. LIFL-UMR CNRS 8022, Laboratoire d’Informatique Fondamentale de Lille, Université de Lille 1, France
  2. University of Massachusetts Boston, Department of Computer Science, Boston, Massachusetts 02125, USA
