Computational Statistics

Volume 28, Issue 5, pp 2117–2138

Principal component histograms from interval-valued observations

Original Paper

Abstract

This paper proposes an approach for constructing histogram values for the principal components of interval-valued observations. Le-Rademacher and Billard (J Comput Graph Stat 21:413–432, 2012) show that, for a principal component analysis of interval-valued observations, the resulting observations in principal component space are polytopes formed by the convex hulls of the linearly transformed vertices of the observed hyper-rectangles. We propose an algorithm that translates these polytopes into histogram-valued data, providing numerical values for the principal components that can be used as input in further analysis. Other existing methods of principal component analysis for interval-valued data construct the principal components themselves as intervals, which implicitly assumes that all values within an observation are uniformly distributed along the principal component axes. However, this assumption holds only in special cases where the variables in the dataset are mutually uncorrelated. The histogram-valued representation of the principal components proposed herein more accurately reflects the variation in the internal structure of the observations in principal component space. As a consequence, subsequent analyses that use histogram-valued principal components as input yield improved accuracy.
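
The idea in the abstract can be illustrated with a minimal sketch for a single interval-valued observation. This is not the algorithm proposed in the paper: it assumes points are uniformly distributed inside the observed hyper-rectangle and replaces the exact polytope computation with Monte Carlo sampling. The function names, the equal-width bins, and the parameters `n_bins`, `n_draws`, and `seed` are hypothetical choices made only for illustration.

```python
import itertools
import numpy as np


def rectangle_vertices(lower, upper):
    """Return all 2**p vertices of the hyper-rectangle [lower_j, upper_j]."""
    return np.array(list(itertools.product(*zip(lower, upper))), dtype=float)


def pc_histogram(lower, upper, loading, center, n_bins=5, n_draws=50_000, seed=0):
    """Approximate histogram of one principal component for one observation.

    Assumption (not from the paper): values are uniformly distributed
    inside the hyper-rectangle, and bins are of equal width.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    loading = np.asarray(loading, dtype=float)
    center = np.asarray(center, dtype=float)

    # A linear map attains its extremes over a box at the vertices, so the
    # transformed vertices give the range of the principal component.
    vertex_scores = (rectangle_vertices(lower, upper) - center) @ loading
    edges = np.linspace(vertex_scores.min(), vertex_scores.max(), n_bins + 1)

    # Monte Carlo stand-in for an exact computation on the polytope.
    rng = np.random.default_rng(seed)
    points = rng.uniform(lower, upper, size=(n_draws, lower.size))
    scores = (points - center) @ loading
    counts, _ = np.histogram(scores, bins=edges)
    return edges, counts / counts.sum()


# Unit square projected onto a diagonal axis and onto a coordinate axis.
diag = np.array([1.0, 1.0]) / np.sqrt(2.0)
edges_diag, probs_diag = pc_histogram([0, 0], [1, 1], diag, center=[0, 0])
edges_axis, probs_axis = pc_histogram([0, 0], [1, 1], [1.0, 0.0], center=[0, 0])
```

In this toy case the relative frequencies along the diagonal axis concentrate in the middle bins (the exact distribution is triangular), while those along the coordinate axis are close to uniform. This is the contrast the abstract draws between histogram-valued and interval-valued principal components.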

Keywords

Interval-valued input data, Histogram-valued output data, Principal component analysis, Linear transformation, Polytopes

Supplementary material

Supplementary material 1: 180_2013_399_MOESM1_ESM.docx (DOCX, 18 KB)
Supplementary material 2: 180_2013_399_MOESM1_ESM.txt (TXT, 17 KB)
Supplementary material 3: 180_2013_399_MOESM2_ESM.txt (TXT, 4 KB)

References

  1. Anderson TW (1963) Asymptotic theory for principal components analysis. Ann Math Stat 34:122–148
  2. Anderson TW (1984) An introduction to multivariate statistical analysis, 2nd edn. Wiley, New York
  3. Bertrand P, Goupil F (2000) Descriptive statistics for symbolic data. In: Bock H-H, Diday E (eds) Analysis of symbolic data: explanatory methods for extracting statistical information from complex data. Springer, Berlin, pp 106–124
  4. Billard L (2008) Sample covariance functions for complex quantitative data. In: Mizuta M, Nakano J (eds) Proceedings world conference of the international association for statistical computing. Japan, pp 157–163
  5. Billard L, Diday E (2003) From the statistics of data to the statistics of knowledge: symbolic data analysis. J Am Stat Assoc 98:470–487
  6. Billard L, Diday E (2006) Symbolic data analysis: conceptual statistics and data mining. Wiley, New York
  7. Bock H-H, Diday E (eds) (2000) Analysis of symbolic data: explanatory methods for extracting statistical information from complex data. Springer, Berlin
  8. Cazes P (2002) Analyse Factorielle d’un Tableau de Lois de Probabilité. Revue de Statistique Appliquée 50(3):5–24
  9. Cazes P, Chouakria A, Diday E, Schektman Y (1997) Extension de l’Analyse en Composantes Principales à des Données de Type Intervalle. Revue de Statistique Appliquée 45(3):5–24
  10. Chouakria A (1998) Extension des Méthodes d’analyse Factorielle a des Données de Type Intervalle. Université Paris, Dauphine, Doctoral Thesis
  11. Coppi R, Giordani P, D’Urso P (2006) Component models for fuzzy data. Psychometrika 71:733–761
  12. Davidson KR, Donsig AP (2002) Real analysis with real applications. Prentice Hall, New Jersey
  13. Diday E (1987) Introduction à l’Approache Symbolique en Analyse des Données. CEREMADE, Université Paris, Premières Journées Symbolic-Numérique, pp 21–56
  14. Douzal-Chouakria A, Billard L, Diday E (2011) Principal component analysis for interval-valued observations. Stat Anal Data Min 4:229–246
  15. Gioia F, Lauro NC (2006) Principal component analysis on interval data. Comput Stat 21:343–363
  16. Giordani P, Kiers HAL (2004) Principal component analysis of symmetric fuzzy data. Comput Stat Data Anal 45:519–548
  17. Ichino M (2011) The quantile method for symbolic principal component analysis. Stat Anal Data Min 4:184–198
  18. Irpino A, Lauro NC, Verde R (2003) Visualizing symbolic data by closed shapes. In: Schader M, Gaul W, Vichi M (eds) Between data science and applied data analysis. Springer, Berlin, pp 244–251
  19. Johnson RA, Wichern DW (2002) Applied multivariate statistical analysis, 5th edn. Prentice Hall, New Jersey
  20. Jolliffe IT (2004) Principal component analysis, 2nd edn. Springer, New York
  21. Lauro NC, Palumbo F (2000) Principal component analysis of interval data: a symbolic data analysis approach. Comput Stat 15:73–87
  22. Lauro NC, Verde R, Irpino A (2008) Principal component analysis of symbolic data described by intervals. In: Diday E, Noirhomme-Fraiture M (eds) Symbolic data analysis and the SODAS software. Wiley, Chichester, pp 279–311
  23. Leroy B, Chouakria A, Herlin I, Diday E (1996) Approche Géométrique et Classification pour la Reconnaissance de Visage. Reconnaissance des Forms et Intelligence Artificelle, INRIA and IRISA and CNRS, France, pp 548–557
  24. Le-Rademacher J, Billard L (2012) Symbolic-covariance principal component analysis and visualization for interval-valued data. J Comput Graph Stat 21:413–432
  25. Makosso Kallyth S, Diday E (2010) Analyse en Axes Principaux de Variables Symboliques de Type Histogrammes. Act. XLII Journées de Statistiques, Marseille, France, pp 1–6. http://hal.archives-ouvertes.fr/inria-00494681/
  26. Palumbo F, Lauro NC (2003) A PCA for interval-valued data based on midpoints and radii. In: Yanai H, Okada A, Shigemasu K, Kano Y, Meulman J (eds) New developments in psychometrics. Springer, Tokyo, pp 641–648

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Division of Biostatistics, Medical College of Wisconsin, Milwaukee, USA
  2. Department of Statistics, University of Georgia, Athens, USA
