An Approximation Algorithm for a Problem of Partitioning a Sequence into Clusters with Restrictions on Their Cardinalities

  • Alexander Kel’manov
  • Ludmila Mikhailova
  • Sergey Khamidullin
  • Vladimir Khandeev
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9869)

Abstract

We consider the problem of partitioning a finite sequence of points in Euclidean space into a given number of clusters (subsequences) so as to minimize the sum of squared distances between cluster elements and the corresponding cluster centers. It is assumed that the center of one of the desired clusters is the origin, while the centers of the other clusters are unknown and are determined as the mean values over the cluster elements. Additionally, there are structural restrictions on the elements of the clusters with unknown centers: (1) the clusters form non-overlapping subsequences of the input sequence, (2) the difference between two consecutive indices is bounded from below and above by prescribed constants, and (3) the total number of elements in these clusters is given as an input. It is shown that the problem is strongly NP-hard. A 2-approximation algorithm is proposed for this problem; it runs in polynomial time when the number of clusters is fixed.
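To make the objective and the three restrictions concrete, the following minimal Python sketch evaluates a candidate partition. It is an illustration only, not the authors' 2-approximation algorithm. The function names and the parameters `t_min`, `t_max` and `total_size` are hypothetical, and the reading of restriction (2) as a bound on the gap between consecutive chosen indices, taken across all clusters with unknown centers, is an assumption drawn from the wording of the abstract.

```python
from typing import List, Sequence

Point = Sequence[float]


def squared_norm(v: Point) -> float:
    """Squared Euclidean norm of a vector."""
    return sum(x * x for x in v)


def centroid(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    dim = len(points[0])
    return [sum(p[d] for p in points) / n for d in range(dim)]


def objective(points: List[Point], clusters: List[List[int]]) -> float:
    """Sum of squared distances to cluster centers.

    `clusters` lists the indices of each cluster with an unknown center
    (its center is the cluster mean). Every point not listed belongs to
    the remaining cluster, whose center is fixed at the origin.
    """
    assigned = set()
    total = 0.0
    for idx in clusters:
        if not idx:
            raise ValueError("clusters with unknown centers must be non-empty")
        members = [points[i] for i in idx]
        c = centroid(members)
        total += sum(
            squared_norm([a - b for a, b in zip(p, c)]) for p in members
        )
        assigned.update(idx)
    # Points left over are measured against the origin.
    total += sum(
        squared_norm(p) for i, p in enumerate(points) if i not in assigned
    )
    return total


def is_feasible(
    clusters: List[List[int]], t_min: int, t_max: int, total_size: int
) -> bool:
    """Check the restrictions under the assumed interpretation.

    Concatenating the clusters in order must give `total_size` indices
    whose consecutive differences lie in [t_min, t_max]. With t_min >= 1
    this also forces the clusters to be non-overlapping subsequences.
    """
    flat = [i for idx in clusters for i in idx]
    if len(flat) != total_size:
        return False
    return all(t_min <= b - a <= t_max for a, b in zip(flat, flat[1:]))


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
    parts = [[1, 2], [3, 4]]
    print(objective(pts, parts))
    print(is_feasible(parts, t_min=1, t_max=7, total_size=4))
```

In the usage example, the point at index 0 falls into the cluster centered at the origin, while the two listed clusters are each measured against their own mean.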

Keywords

Clustering; Structural constraints; Euclidean space; Minimum sum-of-squared distances; NP-hardness; Guaranteed approximation factor

Notes

Acknowledgments

This work was supported by the Russian Science Foundation, project no. 16-11-10041.


Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Alexander Kel’manov (1, 2)
  • Ludmila Mikhailova (1)
  • Sergey Khamidullin (1)
  • Vladimir Khandeev (1, 2)

  1. Sobolev Institute of Mathematics, Novosibirsk, Russia
  2. Novosibirsk State University, Novosibirsk, Russia
