A Metric Based Approach for the Least Square Regression of Multivariate Modal Symbolic Data

  • Conference paper

Abstract

In this paper we propose a linear regression model for multivariate modal symbolic data. The observed variables are probabilistic modal variables in the sense of Bock and Diday (2000), i.e. variables whose realizations are frequency or probability distributions. The parameters are estimated by a least squares method based on a suitable squared distance between the predicted and the observed modal symbolic data: the squared 2-Wasserstein distance. Measures of goodness of fit are also presented, and an application to real data corroborates the proposed method.
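As a rough illustration of the distance named in the abstract, the sketch below approximates the squared 2-Wasserstein distance between two one-dimensional empirical distributions by averaging the squared difference of their quantile functions over a grid on (0, 1). It is not the authors' code (theirs is in Matlab and available on request); the function name `squared_w2`, the parameter `n_grid`, and the use of raw samples instead of histograms are assumptions made for this example only.

```python
import numpy as np


def squared_w2(sample_a, sample_b, n_grid=1000):
    """Approximate the squared 2-Wasserstein distance between two
    1-D empirical distributions (hypothetical helper, for illustration).

    In one dimension the squared distance equals the integral over (0, 1)
    of the squared difference between the two quantile functions; here the
    integral is approximated with a midpoint rule on n_grid points.
    """
    # Midpoints of n_grid equal-width cells covering (0, 1).
    t = (np.arange(n_grid) + 0.5) / n_grid
    # Empirical quantile functions evaluated on the grid.
    q_a = np.quantile(np.asarray(sample_a, dtype=float), t)
    q_b = np.quantile(np.asarray(sample_b, dtype=float), t)
    # Mean of squared differences approximates the integral.
    return float(np.mean((q_a - q_b) ** 2))


if __name__ == "__main__":
    a = np.array([0.0, 1.0, 2.0, 5.0])
    # A pure location shift by 2 moves every quantile by 2,
    # so the result is approximately 4.0 (up to rounding).
    print(squared_w2(a, a + 2.0))
```

A least squares fit in this setting would minimize the sum of such squared distances between observed and predicted distributions; the estimation procedure itself is described in the full paper and is not reproduced here.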

Notes

  1. Note that if x is a vector of scalars, Eq. (8) becomes \(\mathbf{x}^{T}\mathbf{y} =\sum \limits _{i=1}^{n}x_{i} \cdot \bar{y}_{i}\).

  2. http://java.epa.gov/castnet/.

  3. We supply the full table of histogram data, the Matlab™ code and workspace upon request.

References

  • Billard, L., & Diday, E. (2006). Symbolic data analysis: conceptual statistics and data mining. New York: Wiley.

  • Bock, H., & Diday, E. (2000). Analysis of symbolic data: exploratory methods for extracting statistical information from complex data. New York: Springer.

  • Cuesta-Albertos, J. A., Matrán, C., & Tuero-Díaz, A. (1997). Optimal transportation plans and convergence in distribution. Journal of Multivariate Analysis, 60, 72–83.

  • Dias, S., & Brito, P. (2011). A new linear regression model for histogram-valued variables. In 58th ISI World Statistics Congress. Dublin, Ireland.

  • Diday, E., & Noirhomme-Fraiture, M. (2008). Symbolic data analysis and the SODAS software. New York: Wiley.

  • Dueñas, C., Fernández, M., Cañete, S., Carretero, J., & Liger, E. (2002). Assessment of ozone variations and meteorological effects in an urban area in the Mediterranean coast. Science of The Total Environment, 299(1–3), 97–113.

  • Gibbs, A., & Su, F. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419–435.

  • Irpino, A., & Romano, E. (2007). Optimal histogram representation of large data sets: Fisher vs piecewise linear approximation. Revue des Nouvelles Technologies de l’Information, RNTI-E-9, 99–110.

  • Lawson, C. L., & Hanson, R. J. (1974). Solving least squares problems. Englewood Cliffs, NJ: Prentice Hall.

  • Verde, R., & Irpino, A. (2007). Dynamic clustering of histogram data: Using the right metric. In P. Brito, G. Cucumel, P. Bertrand, & F. De Carvalho (Eds.) Selected contributions in data analysis and classification (Chap. 12, pp. 123–134). Berlin, Heidelberg: Springer.

  • Verde, R., & Irpino, A. (2008). Comparing histogram data using a Mahalanobis-Wasserstein distance. In P. Brito (Ed.) COMPSTAT 2008 (Chap. 7, pp. 77–89). Heidelberg: Physica-Verlag HD.

  • Verde, R., & Irpino, A. (2010). Ordinary least squares for histogram data based on Wasserstein distance. In Y. Lechevallier & G. Saporta (Eds.) Proceedings of COMPSTAT’2010 (Chap. 60, pp. 581–588). Heidelberg: Physica-Verlag HD.

Author information

Corresponding author

Correspondence to Antonio Irpino.

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Irpino, A., Verde, R. (2013). A Metric Based Approach for the Least Square Regression of Multivariate Modal Symbolic Data. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_19
