Optimization Letters, Volume 13, Issue 8, pp 1837–1853

Attainable accuracy guarantee for the k-medians clustering in [0, 1]

  • Michael Khachay
  • Daniel Khachay
Original Paper

Abstract

We consider the well-known k-medians clustering problem as a zero-sum two-player game, defined as follows. For given integers \(n>1\) and \(k>1\), the strategy set of the first player consists of n-samples drawn from the unit segment [0, 1], and the strategy set of the second player consists of partitions of the index set \(\{1,\ldots , n\}\) into k nonempty subsets (clusters). The payoff is the loss function of k-medians clustering, evaluated for the sample chosen by the first player and the partition chosen by the second. In other words, the payoff is the sum of distances from the sample points to the nearest cluster center. It is easy to verify that this game has no value. In this paper, for any \(n>1\) and \(k>1\), we show that \(0.5n/(2k-1)\) is an upper bound for the lower value of this game. Furthermore, for any k, we prove that this bound is attained for some \({\bar{n}}={\bar{n}}(k)\) and an arbitrary \(n\ge {\bar{n}}\). As a consequence, any n-sample from [0, 1] can be partitioned into k clusters such that the value of the k-medians clustering criterion does not exceed the bound obtained, and this bound is tight for sufficiently large n.

Keywords

k-Medians clustering; Attainable accuracy guarantee; Farkas-Minkowski lemma

Notes

Acknowledgements

This research is supported by RFBR, grants no. 16-07-00266, 16-01-00505, and 17-08-01385.

Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Krasovsky Institute of Mathematics and Mechanics, Ekaterinburg, Russia
  2. Ural Federal University, Ekaterinburg, Russia
  3. Omsk State Technical University, Omsk, Russia
