Abstract
We consider the computational complexity of one extremal problem of choosing a subset of \(p \) points from some given \(2 \)-clustering of a finite set in a metric space. The chosen subset of points has to describe the given clusters in the best way from the viewpoint of some geometric criterion. This is a formalization of an applied problem of data mining which consists in finding a subset of typical representatives of a dataset composed of two classes based on the function of rival similarity. The problem is proved to be NP-hard. To this end, we polynomially reduce to the problem one of the well-known problems NP-hard in the strong sense, the \(p \)-median problem.
REFERENCES
M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness (Freeman, San Francisco, 1979).
D. Aloise, A. Deshpande, P. Hansen, and P. Popat, “NP-Hardness of Euclidean Sum-of-Squares Clustering,” Machine Learning 75 (2), 245–248 (2009).
S. Dasgupta, “The Hardness of \(k \)-Means Clustering,” in Technical Report CS2007-0890 (University of California, San Diego, 2008), pp. 1–6.
C. H. Papadimitriou, “Worst-Case and Probabilistic Analysis of a Geometric Location Problem,” SIAM J. Comput. 10 (3), 542–557 (1981).
S. Har-Peled and S. Mazumdar, “Coresets for \(k \)-Means and \(k \)-Median Clustering and Their Applications,” inProceedings of 36th Annual ACM Symposium on Theory of Computing (Chicago, Illinois, USA, June 13-15, 2004), pp. 291–300.
L. Kaufman and P. J. Rousseeuw, “Clustering by Means of Medoids,” inStatistical Data Analysis Based on the \(L_1 \)-Norm and Related Methods, Ed. by Y. Dodge (North Holland, Amsterdam, 1987), pp. 405–416.
A. V. Kel’manov, A. V. Pyatkin, and V. I. Khandeev, “Complexity of Some Max-Min Clustering Problems,” Trudy Inst. Mat. Mekh. Ural. Otdel. Ross. Akad. Nauk24 (4), 189–198 (2018).
A. V. Kel’manov and A. V. Pyatkin, “NP-Hardness of Some Euclidean Problems of Partition of a Finite Point Set,” Zh. Vychisl. Mat. Mat. Fiz. 58 (5), 852–856 (2018).
H. Aggarwal, N. Imai, N. Katoh, and S. Suri, “Finding \(k \) Points with Minimum Diameter and Related Problems,” J. Algorithms 12 (1), 38–56 (1991).
S. Banerjee, S. Bhore, and R. Chitnis, “Algorithms and Hardness Results for Nearest Neighbor Problems in Bicolored Point Sets,” in Proceedings of the 13th Latin American Theoretical Informatics Symposium (LATIN) (Buenos Aires, Argentina, April 16-19, 2018), pp. 80–93.
A. V. Zukhba, “NP-Completeness of the Problem of Prototype Selection in the Nearest Neighbor Method,” Pattern Recognition and Image Analysis 20 (4), 484–494 (2010).
V. N. Vapnik, The Reconstruction of Dependencies from Empirical Data (Nauka, Moscow, 1974) [in Russian].
C. J. C. Burges, “A Tutorial on Support Vector Machines for Pattern Recognition,” Data Mining Knowl. Discov. 2 (2), 121–167 (1998).
N. G. Zagoruiko, I. A. Borisova, V. V. Dyubanov, and O. A. Kutnenko, “Methods of Recognition Based on the Function of Rival Similarity,” Pattern Recognition and Image Analysis 18 (1), 1–6 (2008).
O. Kariv and S. Hakimi, “An Algorithmic Approach to Network Location Problems. The p-Medians,” SIAM J. Appl. Math. 37, 539–560 (1979).
Funding
The author was supported by the State Task to the Sobolev Institute of Mathematics (project no. 0314–2019–0015).
Author information
Authors and Affiliations
Corresponding author
Additional information
Translated by Ya.A. Kopylov
Rights and permissions
About this article
Cite this article
Borisova, I.A. Computational Complexity of the Problem of Choosing Typical Representatives in a \(\boldsymbol 2\) -Clustering of a Finite Set of Points in a Metric Space. J. Appl. Ind. Math. 14, 242–248 (2020). https://doi.org/10.1134/S1990478920020039
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1134/S1990478920020039