Abstract
In this paper we review several metrics that can serve as allocation functions in the Dynamic Clustering Algorithm (DCA) when the data are distributions or histograms of values. The choice of the most suitable distance plays a central role in the DCA because it is tied to the criterion function that is optimized. Moreover, the distance has to be consistent with the prototype that represents each cluster. Accordingly, for each proposed metric we identify the corresponding prototype as the one that minimizes the criterion function, and hence gives the best fit between the partition and the representation of the clusters. Finally, we focus on a Wasserstein-based distance and show its optimality in partitioning a set of histogram data when each cluster is represented by its barycenter, itself expressed as a distribution.
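To make the allocation and representation steps concrete, here is a minimal Python sketch of a dynamic clustering loop on histograms with a squared L2 Wasserstein distance. It is an illustration under stated assumptions, not the authors' implementation. It assumes each histogram is given by bin edges and strictly positive bin weights with uniform density inside each bin, and it approximates the distance on a grid of probability levels. The function names (`histogram_quantiles`, `wasserstein2_sq`, `dynamic_clustering`) are hypothetical. The sketch relies on the standard one-dimensional fact that the squared L2 Wasserstein distance is the integral over (0, 1) of the squared difference of the quantile functions, so the barycenter of a cluster has as quantile function the average of its members' quantile functions.

```python
import numpy as np

def histogram_quantiles(edges, weights, levels):
    """Quantile function of a histogram evaluated at the given levels.

    Assumes uniform density inside each bin and strictly positive weights,
    so the CDF is piecewise linear and strictly increasing.
    """
    edges = np.asarray(edges, dtype=float)
    weights = np.asarray(weights, dtype=float)
    cdf = np.concatenate(([0.0], np.cumsum(weights / weights.sum())))
    return np.interp(levels, cdf, edges)

def wasserstein2_sq(q1, q2):
    """Approximate squared L2 Wasserstein distance between two distributions
    given their quantiles sampled on the same equally spaced grid of levels."""
    return float(np.mean((np.asarray(q1) - np.asarray(q2)) ** 2))

def dynamic_clustering(Q, k, n_iter=50, seed=0):
    """Alternate allocation and representation steps on a matrix Q whose rows
    are sampled quantile functions (one row per histogram)."""
    Q = np.asarray(Q, dtype=float)
    rng = np.random.default_rng(seed)
    # Initial prototypes: k distinct histograms drawn at random.
    prototypes = Q[rng.choice(len(Q), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Allocation step: assign each histogram to the closest prototype.
        dist = ((Q[:, None, :] - prototypes[None, :, :]) ** 2).mean(axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Representation step: the barycenter is the mean quantile function.
        for j in range(k):
            members = Q[labels == j]
            if len(members) > 0:
                prototypes[j] = members.mean(axis=0)
    return labels, prototypes

# Midpoint grid of probability levels used to sample every quantile function.
levels = (np.arange(200) + 0.5) / 200

# Four toy histograms: two located near 0 and two located near 10.
toy = [
    ([0.0, 0.5, 1.0], [0.4, 0.6]),
    ([0.2, 0.7, 1.2], [0.5, 0.5]),
    ([10.0, 10.5, 11.0], [0.3, 0.7]),
    ([10.2, 10.7, 11.2], [0.5, 0.5]),
]
Q_toy = np.array([histogram_quantiles(e, w, levels) for e, w in toy])
toy_labels, toy_prototypes = dynamic_clustering(Q_toy, k=2)
```

Working on sampled quantile functions turns both steps into plain Euclidean operations, which is why the barycenter update here is a simple row average. A finer grid of levels gives a closer approximation of the integral.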
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Verde, R., Irpino, A. (2007). Dynamic Clustering of Histogram Data: Using the Right Metric. In: Brito, P., Cucumel, G., Bertrand, P., de Carvalho, F. (eds) Selected Contributions in Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73560-1_12
DOI: https://doi.org/10.1007/978-3-540-73560-1_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73558-8
Online ISBN: 978-3-540-73560-1
eBook Packages: Mathematics and Statistics (R0)