Clustering Linear Models Using Wasserstein Distance

  • Conference paper
Data Analysis and Classification

Abstract

This paper deals with the clustering of complex data. The input elements to be clustered are linear models estimated on samples drawn from several sub-populations (typologies of individuals). We review the main approaches to computing metrics between linear models and propose, for the first time in this field, a Wasserstein-based metric. We show the properties of the proposed metric and apply it to real data using a dynamic clustering algorithm.
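The abstract does not spell out the proposed metric, so the following is only a background sketch of the standard univariate L2 Wasserstein distance, not the authors' inter-model distance. It assumes two well-known facts: for two normal distributions the distance reduces to a closed form in the means and standard deviations, and for two equal-size samples it is the root mean squared difference of the sorted values (the empirical quantiles). The function names `w2_gaussian` and `w2_empirical` are hypothetical and chosen for illustration.

```python
import math


def w2_gaussian(mu1, sigma1, mu2, sigma2):
    """L2 Wasserstein distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Univariate closed form: sqrt((mu1 - mu2)^2 + (sigma1 - sigma2)^2).
    """
    return math.sqrt((mu1 - mu2) ** 2 + (sigma1 - sigma2) ** 2)


def w2_empirical(xs, ys):
    """L2 Wasserstein distance between two equal-size univariate samples.

    With equal sample sizes, the optimal coupling pairs the i-th smallest
    value of one sample with the i-th smallest value of the other.
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("samples must be non-empty and of equal size")
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    squared = sum((a - b) ** 2 for a, b in zip(xs_sorted, ys_sorted))
    return math.sqrt(squared / len(xs_sorted))


if __name__ == "__main__":
    print(w2_gaussian(0.0, 1.0, 3.0, 5.0))
    print(w2_empirical([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

In the closed form the distance separates into a location term (difference of means) and a scale term (difference of standard deviations). How the paper extends this idea to linear models is described in the full text, not in the abstract.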

Author information

Corresponding author

Correspondence to Antonio Irpino.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Irpino, A., Verde, R. (2010). Clustering Linear Models Using Wasserstein Distance. In: Palumbo, F., Lauro, C., Greenacre, M. (eds) Data Analysis and Classification. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03739-9_5
