Exact Algorithms for Two Quadratic Euclidean Problems of Searching for the Largest Subset and Longest Subsequence
The following two strongly NP-hard problems are considered. In the first problem, we need to find, in a given finite set of points in Euclidean space, a subset of the largest size such that the sum of squared distances between the elements of this subset and its unknown centroid (geometrical center) does not exceed a given percentage of the sum of squared distances between the elements of the input set and its centroid. In the second problem, the input is a sequence (not a set), and there are additional constraints on the indices of the elements of the chosen subsequence, under the same restriction on the sum of squared distances as in the first problem. Both problems can be treated as data editing problems aimed at finding similar elements and removing extraneous (dissimilar) elements. We propose exact algorithms for the cases of both problems in which the input points have integer-valued coordinates. If the space dimension is bounded by some constant, our algorithms run in pseudopolynomial time. Some results of numerical experiments illustrating the performance of the algorithms are presented.
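To make the constraint in the first problem concrete, the following is a minimal sketch that evaluates the sum of squared distances to the centroid and searches all subsets exhaustively. It is an illustration only and is not the pseudopolynomial algorithm proposed in the paper: the function names `quadratic_variation` and `largest_subset_bruteforce` and the parameter `alpha` (the given percentage, expressed as a fraction) are assumptions introduced here, and the search takes exponential time, so it is usable only on tiny inputs.

```python
from fractions import Fraction
from itertools import combinations


def quadratic_variation(points):
    """Sum of squared Euclidean distances from each point to the centroid.

    Points are tuples of integers; Fraction keeps the arithmetic exact.
    """
    n = len(points)
    dim = len(points[0])
    centroid = [Fraction(sum(p[k] for p in points), n) for k in range(dim)]
    return sum((p[k] - centroid[k]) ** 2 for p in points for k in range(dim))


def largest_subset_bruteforce(points, alpha):
    """Return a largest subset whose quadratic variation is at most
    alpha times that of the whole input set.

    Hypothetical helper: plain exhaustive search, exponential in len(points).
    """
    bound = alpha * quadratic_variation(points)
    # Try subset sizes from largest to smallest; the first feasible one wins.
    for size in range(len(points), 0, -1):
        for subset in combinations(points, size):
            if quadratic_variation(subset) <= bound:
                return list(subset)
    return []


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 1), (10, 10)]
    # With a 10% bound, the outlier (10, 10) has to be dropped.
    print(largest_subset_bruteforce(pts, Fraction(1, 10)))
```

A single point always has zero variation, so for any nonnegative `alpha` the search returns at least one point.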
Keywords: Euclidean space, Largest set, Longest subsequence, Quadratic variation, NP-hard problem, Integer coordinates, Exact algorithm, Fixed space dimension, Pseudopolynomial time
The study presented in Sects. 2, 4 was supported by the Russian Science Foundation, project 16-11-10041. The study presented in Sects. 3, 5 was supported by the Russian Foundation for Basic Research, projects 16-07-00168 and 18-31-00398, by the Russian Academy of Science (the Program of basic research), project 0314-2016-0015, and by the Russian Ministry of Science and Education under the 5-100 Excellence Programme.