Exploratory data analysis for interval compositional data

  • Regular Article
  • Advances in Data Analysis and Classification

Abstract

Compositional data are data in which the relative contributions of parts to a whole, conveyed by (log-)ratios between them, are essential for the analysis. In Symbolic Data Analysis (SDA), interval data arise when elements are characterized by variables whose values are intervals on \(\mathbb{R}\) representing inherent variability. In this paper, we address the special problem of analysing interval compositions, i.e., interval data obtained by the aggregation of compositions. It is assumed that the interval information is represented by the respective midpoints and ranges, and both sources of information are considered as compositions. In this context, we introduce the representation of interval data as three-way data. Within the log-ratio approach to compositional data analysis, we outline how interval compositions can be treated in an exploratory context. The goal of the analysis is to represent the compositions by coordinates that are interpretable in terms of the original compositional parts. This is achieved by summarizing all relative information (log-ratios) about each part into one coordinate of the coordinate system. Based on an example from the European Union Statistics on Income and Living Conditions (EU-SILC), several possibilities for an exploratory data analysis of interval compositions are outlined and investigated.
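As an illustration of the ideas in the abstract, the following minimal sketch derives midpoints and ranges from interval bounds, treats both as compositions, and computes one coordinate that collects all log-ratios involving a chosen part. Several things here are assumptions rather than details taken from the abstract: the interval bounds are given as plain lists of strictly positive numbers with upper bounds strictly above lower bounds, the function names are hypothetical, and the coordinate used is the standard pivot coordinate \(z_1 = \sqrt{(D-1)/D}\,\ln\bigl(x_1 / (\prod_{j \ne 1} x_j)^{1/(D-1)}\bigr)\), which may differ in detail from the coordinate system used in the article.

```python
import math


def closure(x):
    """Rescale a vector of positive parts so that it sums to 1."""
    total = sum(x)
    return [v / total for v in x]


def pivot_coordinate(x, part=0):
    """Coordinate aggregating all log-ratios of x[part] to the other parts.

    Computes sqrt((D-1)/D) * (log x[part] - mean of log of remaining parts),
    i.e. the scaled log-ratio of the chosen part to the geometric mean
    of the remaining parts. All parts must be strictly positive.
    """
    d = len(x)
    logs = [math.log(v) for v in x]
    others = [value for j, value in enumerate(logs) if j != part]
    return math.sqrt((d - 1) / d) * (logs[part] - sum(others) / (d - 1))


def midpoints_and_ranges(lower, upper):
    """Return (midpoint composition, range composition) for one observation.

    Assumption of this sketch: each interval is given by its lower and
    upper bound, with upper > lower so that every range is positive.
    """
    mid = [(lo + up) / 2 for lo, up in zip(lower, upper)]
    rng = [up - lo for lo, up in zip(lower, upper)]
    return closure(mid), closure(rng)


# Hypothetical interval composition with three parts.
lower = [1.0, 2.0, 3.0]
upper = [3.0, 4.0, 5.0]
mid, rng = midpoints_and_ranges(lower, upper)

# One coordinate per source of information, both for the first part.
z_mid = pivot_coordinate(mid, part=0)
z_rng = pivot_coordinate(rng, part=0)
```

Because the coordinate depends only on ratios between parts, rescaling a composition leaves it unchanged, which is why the closure step does not affect the result. Stacking the midpoint and range compositions of all observations along a third mode gives one possible layout for the three-way representation mentioned above.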

Acknowledgments

Karel Hron gratefully acknowledges the support of the grant COST Action CRoNoS IC1408 and the grants IGA_PrF_2015_013, IGA_PrF_2016_025 Mathematical Models of the Internal Grant Agency of the Palacky University in Olomouc. Peter Filzmoser was supported by the K-project DEXHELPP through COMET - Competence Centers for Excellent Technologies, supported by BMVIT, BMWFI and the province Vienna. The COMET program is administrated by FFG. Finally, this work is financed also by the ERDF – European Regional Development Fund through the Operational Programme for Competitiveness and Internationalization – COMPETE 2020 Programme within project «POCI-01-0145-FEDER-006961», and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) as part of project UID/EEA/50014/2013.

Author information

Corresponding author

Correspondence to Karel Hron.

About this article

Cite this article

Hron, K., Brito, P. & Filzmoser, P. Exploratory data analysis for interval compositional data. Adv Data Anal Classif 11, 223–241 (2017). https://doi.org/10.1007/s11634-016-0245-y

  • DOI: https://doi.org/10.1007/s11634-016-0245-y
