A Randomized Algorithm for a Sequence 2-Clustering Problem
We consider a strongly NP-hard problem of partitioning a finite Euclidean sequence into two clusters of given cardinalities so as to minimize the sum, over both clusters, of the intracluster sums of squared distances from the cluster elements to their centers. The center of one cluster is unknown and is defined as the mean value of all points in that cluster. The center of the other cluster is the origin. Additionally, the difference between the indices of two consecutive points from the first cluster is bounded from below and above by some constants. A randomized algorithm is proposed that finds an approximate solution of the problem in polynomial time for given values of the relative error and failure probability and for an established parameter value. The conditions are established under which the algorithm is polynomial and asymptotically exact.
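To make the problem statement concrete, the following minimal Python sketch evaluates the objective for a candidate first cluster and checks the cardinality and index-gap constraints. The names `objective`, `is_feasible`, `brute_force`, `m`, `t_min`, and `t_max` are illustrative assumptions and do not come from the paper. The exhaustive search is exponential and is included only as a reference for tiny instances; it is not the randomized algorithm proposed in the paper.

```python
from itertools import combinations


def objective(points, cluster_idx):
    """Sum of squared distances: first cluster to its mean, the rest to the origin."""
    chosen = set(cluster_idx)
    first = [p for i, p in enumerate(points) if i in chosen]
    second = [p for i, p in enumerate(points) if i not in chosen]
    dim = len(points[0])
    # Center of the first cluster is the mean of its points.
    centroid = [sum(p[k] for p in first) / len(first) for k in range(dim)]
    cost_first = sum((p[k] - centroid[k]) ** 2 for p in first for k in range(dim))
    # Center of the second cluster is fixed at the origin.
    cost_second = sum(x * x for p in second for x in p)
    return cost_first + cost_second


def is_feasible(cluster_idx, m, t_min, t_max):
    """Check the cluster size and the bounds on gaps between consecutive indices."""
    idx = sorted(cluster_idx)
    if len(idx) != m:
        return False
    return all(t_min <= b - a <= t_max for a, b in zip(idx, idx[1:]))


def brute_force(points, m, t_min, t_max):
    """Exhaustive reference search (assumption: tiny inputs only)."""
    best = None
    for idx in combinations(range(len(points)), m):
        if is_feasible(idx, m, t_min, t_max):
            cost = objective(points, idx)
            if best is None or cost < best[0]:
                best = (cost, idx)
    return best


if __name__ == "__main__":
    sequence = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.0, 6.0), (0.0, -0.1)]
    print(brute_force(sequence, m=2, t_min=1, t_max=3))
```

In this toy sequence, the two points far from the origin sit at positions 1 and 3, so the index gap of 2 satisfies the assumed bounds and the search selects them as the first cluster.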
Keywords: partitioning, sequence, Euclidean space, minimum sum-of-squared distances, NP-hardness, randomized algorithm, asymptotic accuracy
This work was supported by the Russian Science Foundation, project no. 16-11-10041.