Advertisement

Pattern Recognition and Image Analysis

, Volume 28, Issue 4, pp 703–711 | Cite as

An Exact Algorithm of Searching for the Largest Cluster in an Integer-Valued Problem of 2-Partitioning a Sequence

  • A. V. Kel’manov
  • S. A. Khamidullin
  • V. I. Khandeev
  • A. V. Pyatkin
Mathematical Method in Pattern Recognition
  • 3 Downloads

Abstract

We analyze mathematical aspects of one of the fundamental data analysis problems consisting in the search (selection) for the subset with the largest number of similar elements among a collection of objects. In particular, the problem appears in connection with the analysis of data in the form of time series (discrete signals). One of the problems in modeling this challenge is considered, namely, the problem of finding the cluster of the largest size (cardinality) in a 2-partition of a finite sequence of points in Euclidean space into two clusters (subsequences) under two constraints. The first constraint is on the choice of the indices of elements included in the clusters. This constraint simulates the set of time-admissible configurations of similar elements in the observed discrete signal. The second constraint is imposed on the value of the quadratic clustering function. This constraint simulates the level of intracluster proximity of objects. The clustering function under the second constraint is the sum (over both clusters) of the intracluster sums of squared distances between the cluster elements and its center. The center of one of the clusters is unknown and defined as the centroid (the arithmetic mean over all elements of this cluster). The center of the other cluster is the origin. Under the first constraint, the difference between any two subsequent indices of elements contained in a cluster with an unknown center is bounded above and below by some constants. It is established in the paper that the optimization problem under consideration, which models one of the simplest significant problems of data analysis, is strongly NP-hard. We propose an exact algorithm for the case of a problem with integer coordinates of its input points. If the dimension of the space is bounded by a constant, then the algorithm is pseudopolynomial.

Keywords

time-series analysis similar elements Euclidean space sequence 2-partition longest subsequence quadratic scattering NP-hard problem integer coordinates exact algorithm fixed space dimension pseudopolynomial running time 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    J. B. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proc. Fifth Berkeley Symposium on Mathematical Statistics and Probability (Univ. of California Press, Berkeley, 1967), Vol. 1, pp. 281–297.Google Scholar
  2. 2.
    M. Rao, “Cluster analysis and mathematical programming,” J. Amer. Stat. Assoc. 66, 622–626 (1971).zbMATHGoogle Scholar
  3. 3.
    P. Hansen, B. Jaumard, and N. Mladenović, “Minimum sum of squares clustering in a low dimensional space,” J. Classif. 15 (1), 37–55 (1998).MathSciNetzbMATHGoogle Scholar
  4. 4.
    P. Hansen and B. Jaumard, “Cluster analysis and mathematical programming,” Math. Program. 79 (1–3), 191–215 (1997).MathSciNetzbMATHGoogle Scholar
  5. 5.
    R. A. Fisher, Statistical Methods and Scientific Inference (Hafner Press, New York, 1956).zbMATHGoogle Scholar
  6. 6.
    D. Aloise, A. Deshpande, P. Hansen, and P. Popat, “NP–hardness of Euclidean sum–of–squares clustering,” Mach. Learn. 75 (2), 245–248 (2009).zbMATHGoogle Scholar
  7. 7.
    P. Drineas, A. Frieze, R. Kannan, S. Vempala, V. Vinay, “Clustering large graphs via the singular value decomposition,” Mach. Learn. 56 (1–3), 9–33 (2004).zbMATHGoogle Scholar
  8. 8.
    A. V. Dolgushev and A. V. Kel’manov, “On the algorithmic complexity of a problem in cluster analysis,” J. Appl. Ind. Math. 5 (2), 191–194 (2011).MathSciNetGoogle Scholar
  9. 9.
    M. Mahajan, P. Nimbhorkar, and K. Varadarajan, “The planar k–means problem is NP–hard,” Theor. Comput. Sci. 442, 13–21 (2012).MathSciNetzbMATHGoogle Scholar
  10. 10.
    A. V. Kel’manov and S. A. Khamidullin. “An approximating polynomial algorithm for a sequence partitioning problem,” J. Appl. Ind. Math. 8 (2), 236–244 (2014).MathSciNetzbMATHGoogle Scholar
  11. 11.
    A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev, “Exact pseudopolynomial algorithm for one sequence partitioning problem,” Autom. Remote Control, 78 (1), 66–73 (2017).zbMATHGoogle Scholar
  12. 12.
    A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev, “A fully polynomial–time approximation scheme for a sequence 2–cluster partitioning problem,” J. Appl. Ind. Math. 10 (2), 209–219 (2016).MathSciNetzbMATHGoogle Scholar
  13. 13.
    A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev, “A randomized algorithm for a sequence 2–clustering problem,” Comput. Math. and Math. Phys. (accepted) (2018).Google Scholar
  14. 14.
    A. V. Kel’manov, S. A. Khamidullin, and V. I. Khandeev, “A randomized algorithm for 2–partition of a sequence,” in Analysis of Images, Social Networks and Texts, AIST 2017, Ed. by W. M. P. van der Aalst, Lecture Notes in Computer Science (Springer, Cham, 2018), Vol. 10716, pp. 297–306.Google Scholar
  15. 15.
    C. M. Bishop, Pattern Recognition and Machine Learning (Springer Science + Business Media, New York, 2006).zbMATHGoogle Scholar
  16. 16.
    G. James, D. Witten, T. Hastie, and R. Tibshirani, An Introduction to Statistical Learning: With Application in R (Springer Science + Business Media, New York, 2013).zbMATHGoogle Scholar
  17. 17.
    T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. (Springer, New York, 2009).zbMATHGoogle Scholar
  18. 18.
    C. C. Aggarwal, Data Mining: The Textbook (Springer International Publishing, Cham, 2015).zbMATHGoogle Scholar
  19. 19.
    I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning, in Adaptive Computation and Machine Learning series (MIT Press, Cambridge, MA, 2016).zbMATHGoogle Scholar
  20. 20.
    A. S. Shirkhorshidi, S. Aghabozorgi, T. Y. Wah, and T. Herawan, “Big data clustering: A review,” in Computational Science and Its Applications—ICCSA 2014, Ed. by B. Murgante, Lecture Notes in Computer Science (Springer, Cham, 2014), Vol. 8583, pp. 707–720.Google Scholar
  21. 21.
    A. K. Jain, “Data clustering: 50 years beyond–means,” Pattern Recogn. Lett. 31 (8), 651–666 (2010).Google Scholar
  22. 22.
    J. Pach and P. K. Agarwal, Combinatorial Geometry (Wiley, New York, 1995).zbMATHGoogle Scholar
  23. 23.
    T.–C. Fu, “A review on time series data mining,” Eng. Appl. Artif. Intell. 24 (1), 164–181 (2011).Google Scholar
  24. 24.
    C. Künzer, S. Dech, and W. Wagner (Eds.), Remote Sensing Time Series, in Remote Sensing and Digital Image Processing (Springer, Cham, 2015), Vol. 22.Google Scholar
  25. 25.
    T. W. Liao, “Clustering of time series data–A survey,” Pattern Recogn. 38 (11), 1857–1874 (2005).zbMATHGoogle Scholar
  26. 26.
    A. V. Kel’manov and A. V. Pyatkin, “On the complexity of a search for a subset of “similar” vectors,” Dokl. Math. 78 (1), 574–575 (2008).MathSciNetzbMATHGoogle Scholar
  27. 27.
    A. V. Kel’manov and A. V. Pyatkin, “On a version of the problem of choosing a vector subset,” J. Appl. Ind. Math. 3 (4), 447–455 (2009).MathSciNetGoogle Scholar
  28. 28.
    A. V. Kel’manov and V. I. Khandeev, “A 2–approximation polynomial algorithm for a clustering problem,” J. Appl. Ind. Math. 7 (4), 515–521 (2013).MathSciNetzbMATHGoogle Scholar
  29. 29.
    E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, “A posteriori detection of a quasiperiodic fragment with a given number of repetitions in a numerical sequence,” Sib. Zh. Ind. Mat. 9 (1(25)), 55–74 (2006) [in Russian].Google Scholar
  30. 30.
    E. Kh. Gimadi, A. V. Kel’manov, M. A. Kel’manova, and S. A. Khamidullin, “A posteriori detecting a quasiperiodic fragment in a numerical sequence,” Pattern Recogn. Image Anal. 18 (1), 30–42 (2008).zbMATHGoogle Scholar
  31. 31.
    A. E. Baburin, E. Kh. Gimadi, N. I. Glebov, and A. V. Pyatkin, “The problem of finding a subset of vectors with the maximum total weight,” J. Appl. Ind. Math. 2 (1), 32–38 (2008).MathSciNetzbMATHGoogle Scholar
  32. 32.
    A. V. Kel’manov and V. I. Khandeev, “An exact pseudopolynomial algorithm for a problem of the two–cluster partitioning of a set of vectors,” J. Appl. Ind. Math. 9 (4), 497–502 (2015).MathSciNetzbMATHGoogle Scholar
  33. 33.
    E. Kh. Gimadi, A. V. Pyatkin, and I. A. Rykov, “On polynomial solvability of some problems of a vector subset choice in a Euclidean space of fixed dimension,” J. Appl. Ind. Math. 4 (1), 48–53 (2010).MathSciNetGoogle Scholar
  34. 34.
    V. V. Shenmaier, “Solving some vector subset problems by Voronoi diagrams,” J. Appl. Ind. Math. 10 (4), 560–566 (2016).MathSciNetzbMATHGoogle Scholar
  35. 35.
    A. V. Dolgushev and A. V. Kel’manov, “An approximation algorithm for solving a problem of cluster analysis,” J. Appl. Ind. Math. 5 (4), 551–558 (2011).MathSciNetzbMATHGoogle Scholar
  36. 36.
    A. V. Dolgushev, A. V. Kel’manov, and V. V. Shenmaier, “Polynomial–time approximation scheme for a problem of partitioning a finite set into two clusters,” Proc. Steklov Inst. Math. 295 (Suppl. 1), S47–S56 (2016).Google Scholar
  37. 37.
    A. V. Kel’manov and V. I. Khandeev, “Fully polynomial–time approximation scheme for a special case of a quadratic Euclidean 2–clustering problem,” Comput. Math. Math. Phys. 56 (2), 334–341 (2016).MathSciNetzbMATHGoogle Scholar
  38. 38.
    A. V. Kel’manov, A. V. Motkova, and V. V. Shenmaier, “An approximation scheme for a weighted two–cluster partition problem,” in Analysis of Images, Social Networks and Texts, AIST 2017, Ed. by W. M. P. van der Aalst, Lecture Notes in Computer Science (Springer, Cham, 2018), Vol. 10716, pp. 323–333.Google Scholar
  39. 39.
    A. V. Kel’manov and V. I. Khandeev, “A randomized algorithm for two–cluster partition of a set of vectors,” Comput. Math. Math. Phys. 55 (2), 330–339 (2015).MathSciNetzbMATHGoogle Scholar
  40. 40.
    A. V. Kel’manov, V. I. Khandeev, and A. V. Panasenko, “Exact algorithms for the special cases of two hard to solve problems of searching for the largest subset,” in Proc. 7th Intern. Conf. on Analysis of Images, Social Networks, and Texts (AIST 2018), Moscow, Russia, July 5–7, 2018, Lecture Notes in Computer Science (accepted) (Springer, Cham, 2019).Google Scholar
  41. 41.
    A. V. Kel’manov, V. I. Khandeev, and A. V. Panasenko, “Exact algorithms for two hard to solve 2–clustering problems,” Pattern Recogn. Image Anal. (accepted) (2019).Google Scholar
  42. 42.
    A. V. Kel’manov and A. V. Pyatkin, “On complexity of some problems of cluster analysis of vector sequences,” J. Appl. Ind. Math. 7 (3), 363–369 (2013).MathSciNetzbMATHGoogle Scholar
  43. 43.
    A. V. Kel’manov and A. V. Khamidullin, “An approximation polynomial–time algorithm for a sequence biclustering problem,” Comput. Math. Math. Phys. 55 (6), 1068–1076 (2015).MathSciNetzbMATHGoogle Scholar
  44. 44.
    A. V. Kel’manov and A. V. Pyatkin, “NP–completeness of some problems of choosing a vector subset,” J. Appl. Ind. Math. 5 (3), 352–357 (2011).MathSciNetGoogle Scholar
  45. 45.
    M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP–Completeness (W. H. Freeman and Co., San Francisco, 1979).zbMATHGoogle Scholar
  46. 46.
    A. V. Kel’manov and S. A. Khamidullin, “An approximating polynomial algorithm for a sequence partitioning problem,” J. Appl. Ind. Math. 8 (2), 236–244 (2014).MathSciNetzbMATHGoogle Scholar
  47. 47.
    A. V. Kel’manov and S. A. Khamidullin, “A posteriori detection of a given number of identical subsequences in a quasiperiodic sequence,” Comput. Math. Math. Phys. 41 (5), 762–774 (2001).MathSciNetzbMATHGoogle Scholar

Copyright information

© Pleiades Publishing, Ltd. 2018

Authors and Affiliations

  • A. V. Kel’manov
    • 1
    • 2
  • S. A. Khamidullin
    • 1
  • V. I. Khandeev
    • 1
    • 2
  • A. V. Pyatkin
    • 1
    • 2
  1. 1.Sobolev Institute of Mathematics, Siberian BranchRussian Academy of SciencesNovosibirskRussia
  2. 2.Novosibirsk State UniversityNovosibirskRussia

Personalised recommendations