Clustering Linear Models Using Wasserstein Distance

Irpino, Antonio; Verde, Rosanna

doi:10.1007/978-3-642-03739-9_5

Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

This paper deals with the clustering of complex data. The elements to be clustered are linear models estimated on samples drawn from several sub-populations (typologies of individuals). We review the main approaches to computing metrics between linear models and propose, for the first time in this field, a Wasserstein-based metric. We show the properties of the proposed metric and apply it to real data using a dynamic clustering algorithm.
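The abstract does not state the form of the proposed metric, so the following is only a minimal sketch of the general idea behind a Wasserstein-based distance, not the authors' method. It uses two standard facts: for one-dimensional distributions the squared 2-Wasserstein distance is the integral of the squared difference between quantile functions, and for two univariate normal distributions it reduces to the squared difference of means plus the squared difference of standard deviations. The function names and the example values are hypothetical, and treating a coefficient estimate as normally distributed around its estimate is an assumption made here for illustration.

```python
def w2_squared_normal(mean_a, sd_a, mean_b, sd_b):
    """Squared 2-Wasserstein distance between two univariate normals.

    Closed form: (difference of means)^2 + (difference of std devs)^2.
    """
    return (mean_a - mean_b) ** 2 + (sd_a - sd_b) ** 2


def w2_squared_empirical(sample_a, sample_b):
    """Squared 2-Wasserstein distance between two equal-size 1-D samples.

    With equal sample sizes, the empirical quantile functions are step
    functions on the same grid, so the integral becomes the mean squared
    difference between the sorted values.
    """
    a = sorted(float(x) for x in sample_a)
    b = sorted(float(x) for x in sample_b)
    if not a or len(a) != len(b):
        raise ValueError("this sketch assumes non-empty samples of equal size")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical example: slope estimates of two simple regressions, each
# summarised (by assumption) as normal(estimate, standard error).
slope_model_1 = (1.2, 0.3)  # (estimate, standard error), made-up values
slope_model_2 = (0.8, 0.5)
print(w2_squared_normal(*slope_model_1, *slope_model_2))
```

A distance of this kind compares whole distributions rather than point estimates alone, which is one reason it is a natural candidate when the objects to be clustered are estimated models.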

Author information

Authors and Affiliations

Dipartimento di Studi Europei e Mediterranei, Second University of Naples, Via del Setificio, 15, Belvedere di San Leucio, 81100, Caserta, Italy
Antonio Irpino

Corresponding author

Correspondence to Antonio Irpino.

Editor information

Editors and Affiliations

Fac. Economia, Università Macerata, Via Crescimbeni 20, Macerata, 62100, Italy
Francesco Palumbo
Dipto. Matematica e Statistica, Università Federico II di Napoli, Via Cinthia (Monte S. Angelo), Napoli, 80126, Italy
Carlo Natale Lauro
Depto. Economía y Empresa, Universitat Pompeu Fabra, Ramon Trias Fargas 25-27, Barcelona, 08005, Spain
Michael J. Greenacre

About this paper

Cite this paper

Irpino, A., Verde, R. (2010). Clustering Linear Models Using Wasserstein Distance. In: Palumbo, F., Lauro, C., Greenacre, M. (eds) Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03739-9_5

DOI: https://doi.org/10.1007/978-3-642-03739-9_5
Published: 25 November 2009
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03738-2
Online ISBN: 978-3-642-03739-9
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)