Abstract
In this paper we propose a linear regression model for multivariate modal symbolic data. The observed variables are probabilistic modal variables in the sense of Bock and Diday (2000), i.e. variables whose realizations are frequency or probability distributions. The parameters are estimated by a least squares method based on a suitable squared distance between the predicted and the observed modal symbolic data: the squared \(\ell_{2}\) Wasserstein distance. Measures of goodness of fit are also presented, and an application to real data corroborates the proposed method.
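The distance named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each univariate distribution is represented by a sample of equally weighted values, with both samples of the same size; in that case the squared ℓ2 Wasserstein distance reduces to the mean squared difference between the sorted values (the empirical quantiles). The function name `squared_wasserstein_2` is hypothetical.

```python
def squared_wasserstein_2(sample_a, sample_b):
    """Squared l2 Wasserstein distance between two univariate empirical
    distributions, each given as an equal-size list of equally weighted values.

    Assumption for this sketch: both samples have the same, non-zero length.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must have the same size")
    if not sample_a:
        raise ValueError("samples must not be empty")

    # Sorting gives the empirical quantile function of each sample.
    quantiles_a = sorted(sample_a)
    quantiles_b = sorted(sample_b)

    # Average the squared differences between matching quantiles.
    n = len(quantiles_a)
    return sum((u - v) ** 2 for u, v in zip(quantiles_a, quantiles_b)) / n


if __name__ == "__main__":
    # A distribution shifted by one unit lies at squared distance 1.
    print(squared_wasserstein_2([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))
```

The order of the values within each sample does not matter, since the distance compares distributions and not paired observations.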
Notes
- 1.
Note that if \(\mathbf{x}\) is a vector of scalars, Eq. (8) becomes \(\mathbf{x}^{T}\mathbf{y} = \sum_{i=1}^{n} x_{i} \cdot \bar{y}_{i}\).
- 2.
- 3.
We supply the full table of histogram data, the Matlab™ code and workspace upon request.
References
Billard, L., & Diday, E. (2006). Symbolic data analysis: conceptual statistics and data mining. New York: Wiley.
Bock, H., & Diday, E. (2000). Analysis of symbolic data: exploratory methods for extracting statistical information from complex data. New York: Springer.
Cuesta-Albertos, J. A., Matrán, C., & Tuero-Díaz, A. (1997). Optimal transportation plans and convergence in distribution. Journal of Multivariate Analysis, 60, 72–83.
Dias, S., & Brito, P. (2011). A new linear regression model for histogram-valued variables. In 58th ISI World Statistics Congress. Dublin, Ireland.
Diday, E., & Noirhomme-Fraiture, M. (2008). Symbolic data analysis and the SODAS software. New York: Wiley.
Dueñas, C., Fernández, M., Cañete, S., Carretero, J., & Liger, E. (2002). Assessment of ozone variations and meteorological effects in an urban area in the Mediterranean coast. Science of The Total Environment, 299(1–3), 97–113.
Gibbs, A., & Su, F. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419–435.
Irpino, A., & Romano, E. (2007). Optimal histogram representation of large data sets: Fisher vs piecewise linear approximation. Revue des Nouvelles Technologies de l’Information, RNTI-E-9, 99–110.
Lawson, C. L., & Hanson, R. J. (1974). Solving least squares problems. Englewood Cliffs, NJ: Prentice Hall.
Verde, R., & Irpino, A. (2007). Dynamic clustering of histogram data: Using the right metric. In P. Brito, G. Cucumel, P. Bertrand, & F. De Carvalho (Eds.) Selected contributions in data analysis and classification (Chap. 12, pp. 123–134). Berlin, Heidelberg: Springer.
Verde, R., & Irpino, A. (2008). Comparing histogram data using a Mahalanobis-Wasserstein distance. In P. Brito (Ed.) COMPSTAT 2008 (Chap. 7, pp. 77–89). Heidelberg: Physica-Verlag HD.
Verde, R., & Irpino, A. (2010). Ordinary least squares for histogram data based on Wasserstein distance. In Y. Lechevallier, & G. Saporta (Eds.) Proceedings of COMPSTAT’2010 (Chap. 60, pp. 581–588). Heidelberg: Physica-Verlag HD.
Copyright information
© 2013 Springer International Publishing Switzerland
About this paper
Cite this paper
Irpino, A., Verde, R. (2013). A Metric Based Approach for the Least Square Regression of Multivariate Modal Symbolic Data. In: Giudici, P., Ingrassia, S., Vichi, M. (eds) Statistical Models for Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Heidelberg. https://doi.org/10.1007/978-3-319-00032-9_19
DOI: https://doi.org/10.1007/978-3-319-00032-9_19
Publisher Name: Springer, Heidelberg
Print ISBN: 978-3-319-00031-2
Online ISBN: 978-3-319-00032-9
eBook Packages: Mathematics and Statistics (R0)