Abstract
This paper addresses the clustering of complex data in which the elements to be clustered are linear models estimated on samples drawn from several sub-populations (typologies of individuals). We review the main approaches to computing metrics between linear models and propose, for the first time in this field, a Wasserstein-based metric. We show the properties of the proposed metric and present an application to real data using a dynamic clustering algorithm.
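As a rough illustration of what a Wasserstein-based comparison of linear models can look like, the sketch below fits two simple regressions and compares the sampling distributions of their slope estimators. It is not the metric defined in the chapter. It assumes each slope estimator is summarised by a univariate Gaussian (estimate as mean, standard error as standard deviation) and uses the known closed form of the 2-Wasserstein distance between two univariate Gaussians, sqrt((m1 - m2)^2 + (s1 - s2)^2). The function names `slope_summary` and `gaussian_w2` and the sample data are hypothetical.

```python
import math


def slope_summary(xs, ys):
    """Least-squares slope and its standard error for y = a + b*x (needs n > 2)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                      # slope estimate
    a = my - b * mx                    # intercept estimate
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    return b, se


def gaussian_w2(m1, s1, m2, s2):
    """2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


# Illustrative (made-up) samples from two sub-populations.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y_first = [1.1, 2.9, 5.2, 6.8, 9.1]
y_second = [0.2, 0.9, 1.1, 1.8, 2.1]

b1, se1 = slope_summary(x, y_first)
b2, se2 = slope_summary(x, y_second)

# Assumption: each model is represented only by the Gaussian summary of its slope.
distance = gaussian_w2(b1, se1, b2, se2)
print(f"distance between slope estimators: {distance:.4f}")
```

A distance of this kind accounts for both the difference between the estimates and the difference in their precision, so a dissimilarity matrix built from it could be passed to a clustering routine.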
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Irpino, A., Verde, R. (2010). Clustering Linear Models Using Wasserstein Distance. In: Palumbo, F., Lauro, C., Greenacre, M. (eds) Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03739-9_5
DOI: https://doi.org/10.1007/978-3-642-03739-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03738-2
Online ISBN: 978-3-642-03739-9
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)