Abstract
We propose a new approach to constructing \(k\)-means clustering algorithms in which the Mahalanobis distance replaces the Euclidean distance. The approach is based on minimizing differentiable, outlier-insensitive estimates of the mean. Illustrative examples suggest that the proposed algorithm is robust to a large number of outliers in the data.
Notes
\(d(\mathbf{x},\mathbf{c},\mathbf{S}) = -\ln\bigl(|\mathbf{S}|^{-1/2} \exp\bigl\{-\frac{1}{2}(\mathbf{x}-\mathbf{c})^{\prime}\mathbf{S}^{-1}(\mathbf{x}-\mathbf{c})\bigr\}\bigr)\).
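Expanding the logarithm in the note above gives the equivalent form \(d(\mathbf{x},\mathbf{c},\mathbf{S}) = \frac{1}{2}\ln|\mathbf{S}| + \frac{1}{2}(\mathbf{x}-\mathbf{c})^{\prime}\mathbf{S}^{-1}(\mathbf{x}-\mathbf{c})\), that is, half the squared Mahalanobis distance plus a log-determinant term. The following is a minimal numerical sketch of that expression, not the author's implementation. The function name `log_gaussian_distance` is hypothetical, and the sketch assumes that \(\mathbf{S}\) is a symmetric positive definite matrix.

```python
import numpy as np


def log_gaussian_distance(x, c, S):
    """Evaluate d(x, c, S) = 0.5*ln|S| + 0.5*(x - c)' S^{-1} (x - c).

    Hypothetical helper for illustration; S is assumed to be symmetric
    positive definite.
    """
    x = np.asarray(x, dtype=float)
    c = np.asarray(c, dtype=float)
    S = np.asarray(S, dtype=float)

    # slogdet returns the sign and the log of |det(S)|, which avoids
    # overflow or underflow of the determinant itself.
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:
        raise ValueError("S must have a positive determinant")

    diff = x - c
    # Solve S z = diff instead of forming the explicit inverse of S.
    quad = diff @ np.linalg.solve(S, diff)
    return 0.5 * logdet + 0.5 * quad


if __name__ == "__main__":
    x = np.array([1.0, 2.0])
    c = np.array([0.0, 0.0])
    # With S equal to the identity, the log-determinant term vanishes and
    # d reduces to half the squared Euclidean distance.
    print(log_gaussian_distance(x, c, np.eye(2)))
```

Working with the log-determinant and a linear solve, rather than the determinant and the explicit inverse, is a common choice for numerical stability; it does not change the value of the expression.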
Funding
This work was supported by the Russian Foundation for Basic Research, project no. 18-01-00050.
Additional information
Translated by V. Potapchouck
Cite this article
Shibzukhov, Z.M. On a Robust Approach to Search for Cluster Centers. Autom Remote Control 82, 1742–1751 (2021). https://doi.org/10.1134/S0005117921100118