An Exact Algorithm of Searching for the Largest Size Cluster in an Integer Sequence 2-Clustering Problem

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 974)

Abstract

A problem of partitioning a finite sequence of points in Euclidean space into two subsequences (clusters) maximizing the size of the first cluster subject to two constraints is considered. The first constraint concerns every two consecutive indices of elements of the first cluster: the difference between them is bounded from above and below by some constants. The second one restricts the value of a quadratic clustering function that is the sum of the intracluster sums over both clusters. The intracluster sum is the sum of squared distances between cluster elements and the cluster center. The center of the first cluster is unknown and is determined as the centroid (i.e., the mean of its elements), while the center of the second one is zero.
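One way to write this problem down is sketched here. The symbols are assumptions introduced for illustration, since the abstract names none of them: \(y_1,\dots,y_N\) is the input sequence, \(\mathcal{M}=\{n_1<\dots<n_{|\mathcal{M}|}\}\) is the index set of the first cluster, \(T_{\min}\) and \(T_{\max}\) are the bounds on the index differences, and \(A\) is the bound on the clustering function, read here as an upper bound.

\[
\begin{aligned}
&\text{maximize } |\mathcal{M}| \quad \text{over } \mathcal{M}\subseteq\{1,\dots,N\}\\
&\text{subject to } T_{\min}\le n_m-n_{m-1}\le T_{\max},\quad m=2,\dots,|\mathcal{M}|,\\
&\phantom{\text{subject to }} F(\mathcal{M})=\sum_{j\in\mathcal{M}}\|y_j-\bar{y}(\mathcal{M})\|^2+\sum_{i\notin\mathcal{M}}\|y_i\|^2\le A,\\
&\text{where } \bar{y}(\mathcal{M})=\frac{1}{|\mathcal{M}|}\sum_{j\in\mathcal{M}}y_j .
\end{aligned}
\]

The first sum is the intracluster sum of the first cluster around its centroid; the second is the intracluster sum of the second cluster around zero.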

The strong NP-hardness of the problem is shown, and an exact algorithm is suggested for the case of integer coordinates of the input points. If the space dimension is bounded by some constant, this algorithm runs in pseudopolynomial time.
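The following is a minimal sketch of the problem statement in Python, not the paper's pseudopolynomial algorithm (which the abstract does not describe). It is an exponential-time exhaustive search, useful only as a reference on tiny inputs. The names `clustering_cost`, `gaps_ok`, `largest_cluster_bruteforce`, `t_min`, `t_max`, and `bound` are hypothetical, and the clustering function is assumed to be bounded from above. Exact rational arithmetic is used so that integer inputs produce no rounding error.

```python
from fractions import Fraction
from itertools import combinations
from typing import Optional, Sequence, Tuple

Point = Tuple[int, ...]


def clustering_cost(points: Sequence[Point], cluster: Sequence[int]) -> Fraction:
    """Squared distances to the centroid for the first cluster,
    plus squared norms (distance to zero) for all remaining points."""
    dim = len(points[0])
    chosen = set(cluster)
    size = len(cluster)
    # Centroid of the first cluster, kept exact with Fraction.
    centroid = [Fraction(sum(points[i][k] for i in cluster), size) for k in range(dim)]
    zero = [Fraction(0)] * dim
    cost = Fraction(0)
    for i, p in enumerate(points):
        center = centroid if i in chosen else zero
        cost += sum((p[k] - center[k]) ** 2 for k in range(dim))
    return cost


def gaps_ok(cluster: Sequence[int], t_min: int, t_max: int) -> bool:
    """Every two consecutive chosen indices differ by at least t_min and at most t_max."""
    return all(t_min <= b - a <= t_max for a, b in zip(cluster, cluster[1:]))


def largest_cluster_bruteforce(
    points: Sequence[Point], t_min: int, t_max: int, bound: int
) -> Optional[Tuple[int, ...]]:
    """Return a largest non-empty feasible index set (0-based), or None if none exists."""
    n = len(points)
    for size in range(n, 0, -1):  # try the largest sizes first
        for cluster in combinations(range(n), size):
            if gaps_ok(cluster, t_min, t_max) and clustering_cost(points, cluster) <= bound:
                return cluster
    return None


if __name__ == "__main__":
    pts = [(5,), (0,), (5,), (1,), (6,)]
    print(largest_cluster_bruteforce(pts, t_min=2, t_max=2, bound=2))
```

On this one-dimensional example the indices 0, 2, 4 pick the values 5, 5, 6, whose spread around their mean is 2/3; the two remaining points contribute 0 + 1, so the total of 5/3 stays within the bound of 2.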

Acknowledgments

The study presented in Sects. 3 and 5 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sects. 2 and 4 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of basic research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.

Author information

Corresponding author

Correspondence to Vladimir Khandeev.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Kel’manov, A., Khamidullin, S., Khandeev, V., Pyatkin, A. (2019). An Exact Algorithm of Searching for the Largest Size Cluster in an Integer Sequence 2-Clustering Problem. In: Evtushenko, Y., Jaćimović, M., Khachay, M., Kochetov, Y., Malkova, V., Posypkin, M. (eds) Optimization and Applications. OPTIMA 2018. Communications in Computer and Information Science, vol 974. Springer, Cham. https://doi.org/10.1007/978-3-030-10934-9_10

  • DOI: https://doi.org/10.1007/978-3-030-10934-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-10933-2

  • Online ISBN: 978-3-030-10934-9

  • eBook Packages: Computer Science, Computer Science (R0)
