Random Projection for k-means Clustering

  • Sami Sieranoja
  • Pasi Fränti
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10841)


We study how much k-means can be improved when initialized by random projections. The first variant takes two random data points and projects all points onto the axis defined by these two points. The second variant uses a furthest-point heuristic to select the second point. When repeated 100 times, the cluster-level error of a single k-means run reduces from CI = 4.5 to 0.8 on average. We also propose a simple projective indicator that predicts when the projection heuristic is expected to work well.
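The two initialization variants described above can be sketched as follows. This is a minimal illustrative implementation, not the authors' exact method: the way the 1-D projection is split into k groups (equal-size slices here) and the tie-handling are assumptions made for the sketch.

```python
import numpy as np

def random_projection_init(X, k, furthest_point=False, rng=None):
    """Sketch of projection-based k-means initialization.

    Projects the data onto the axis defined by two reference points,
    then derives k initial centroids from the sorted 1-D projection.
    The slicing strategy below is an illustrative assumption, not
    necessarily the paper's exact procedure.
    """
    rng = np.random.default_rng(rng)
    n = len(X)
    a = X[rng.integers(n)]                       # first reference point: random
    if furthest_point:
        # Variant 2: second point is the one furthest from the first.
        b = X[np.argmax(np.linalg.norm(X - a, axis=1))]
    else:
        # Variant 1: second point is also chosen at random.
        b = X[rng.integers(n)]
    axis = b - a
    proj = (X - a) @ axis                        # scalar projection along the axis
    order = np.argsort(proj)
    # Assumption: take the mean of each of k equal-size slices of the
    # projection ordering as an initial centroid.
    slices = np.array_split(order, k)
    return np.array([X[s].mean(axis=0) for s in slices])
```

The resulting centroid array can be passed directly as the initial centers of a standard k-means implementation (e.g. `sklearn.cluster.KMeans(init=centers, n_init=1)`), which is how repeated runs such as the 100 repetitions mentioned above would be carried out.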


Keywords: Clustering · Random projection · K-means



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. School of Computing, University of Eastern Finland, Joensuu, Finland