Exact Algorithms for Two Quadratic Euclidean Problems of Searching for the Largest Subset and Longest Subsequence

  • Alexander Kel’manov
  • Sergey Khamidullin
  • Vladimir Khandeev (email author)
  • Artem Pyatkin
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11353)

Abstract

The following two strongly NP-hard problems are considered. In the first problem, we need to find, in a given finite set of points in Euclidean space, a subset of the largest size such that the sum of squared distances between the elements of this subset and its unknown centroid (geometric center) does not exceed a given percentage of the sum of squared distances between the elements of the input set and its centroid. In the second problem, the input is a sequence (not a set), and there are additional constraints on the indices of the elements of the chosen subsequence, under the same restriction on the sum of squared distances as in the first problem. Both problems can be treated as data editing problems aimed at finding similar elements and removing extraneous (dissimilar) ones. We propose exact algorithms for the cases of both problems in which the input points have integer-valued coordinates. If the space dimension is bounded by a constant, our algorithms run in pseudopolynomial time. Some results of numerical experiments illustrating the performance of the algorithms are presented.
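To make the first problem statement concrete, the sketch below evaluates the sum of squared distances to the centroid for integer-coordinate points and finds a largest feasible subset by exhaustive search. It is an illustration of the problem definition only, not the pseudopolynomial algorithm proposed in the paper: the search is exponential in the number of points. The names `quadratic_variation`, `largest_subset_bruteforce`, and the parameter `alpha` (the "given percentage" expressed as a fraction between 0 and 1) are assumptions introduced here.

```python
from fractions import Fraction
from itertools import combinations


def quadratic_variation(points):
    """Sum of squared Euclidean distances from the points to their centroid.

    Uses the identity  sum ||y - c||^2 = sum ||y||^2 - ||sum y||^2 / n,
    which stays exact for integer coordinates when computed with Fraction.
    """
    n = len(points)
    dim = len(points[0])
    sum_sq = sum(x * x for p in points for x in p)
    total = [sum(p[k] for p in points) for k in range(dim)]
    return Fraction(sum_sq) - Fraction(sum(t * t for t in total), n)


def largest_subset_bruteforce(points, alpha):
    """Indices of a largest subset whose quadratic variation is at most
    alpha times that of the whole input set (hypothetical helper).

    Exhaustive search over subsets, from largest to smallest; exponential
    time, so suitable for tiny inputs only.
    """
    bound = alpha * quadratic_variation(points)
    for size in range(len(points), 0, -1):
        for idx in combinations(range(len(points)), size):
            if quadratic_variation([points[i] for i in idx]) <= bound:
                return list(idx)
    return []


# Three nearby points and one distant point; allow 10% of the total variation.
pts = [(0, 0), (1, 0), (0, 1), (10, 10)]
print(largest_subset_bruteforce(pts, Fraction(1, 10)))
```

On this input the distant point (10, 10) accounts for most of the total variation, so the search drops it and keeps the three nearby points. The second problem from the abstract would additionally restrict which index tuples are admissible; those constraints are not specified here and are therefore not modeled.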

Keywords

Euclidean space, Largest set, Longest subsequence, Quadratic variation, NP-hard problem, Integer coordinates, Exact algorithm, Fixed space dimension, Pseudopolynomial time

Notes

Acknowledgments

The study presented in Sects. 2, 4 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sects. 3, 5 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of basic research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Alexander Kel’manov (1, 2)
  • Sergey Khamidullin (1)
  • Vladimir Khandeev (1, 2), email author
  • Artem Pyatkin (1, 2)

  1. Sobolev Institute of Mathematics, Novosibirsk, Russia
  2. Novosibirsk State University, Novosibirsk, Russia
