Principal Curves and Surfaces to Interval Valued Variables

  • Jorge Arce G.
  • Oldemar Rodríguez R.
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10022)

Abstract

In this paper we propose a generalization to symbolic interval-valued variables of the Principal Curves and Surfaces method proposed by Hastie in [6]. Given a data set X with n observations and m continuous variables, the main idea of the Principal Curves and Surfaces method is to generalize the principal component line, providing a smooth one-dimensional curved approximation to a set of data points in \(\mathbb {R}^m\). A principal surface is more general, providing a curved manifold approximation of dimension 2 or more. In our case we are interested in finding the main principal curve that best approximates symbolic interval-valued data. In [3, 4], the authors proposed the Centers Method and the Vertices Method to extend the well-known principal component analysis method to a particular kind of symbolic object characterized by multi-valued variables of interval type. In this paper we generalize both the Centers Method and the Vertices Method by finding a smooth curve that passes through the middle of the data X in an orthogonal sense. The proposed method is compared with the Centers and Vertices Methods using the RSDA package and the Ichino data set, see [1, 10]. These comparisons use the correlation index.
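As a rough illustration of the two baseline ideas named in the abstract, the following Python sketch treats each observation as a hyperrectangle given by lower and upper bounds. It runs an ordinary principal component analysis on the interval midpoints (the idea behind the Centers Method) and enumerates the 2^m corners of one hyperrectangle (the raw material of the Vertices Method). The function names `centers_pca` and `vertices`, the toy data, and the choice to skip standardization are assumptions made for this example; this is not the authors' implementation and not the RSDA API.

```python
import itertools
import numpy as np

def centers_pca(lower, upper):
    """Sketch of the Centers Method idea: PCA on interval midpoints,
    then project every hyperrectangle onto the principal axes.
    Assumption: variables are centered but not standardized."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    centers = (lower + upper) / 2.0
    mean = centers.mean(axis=0)
    # Principal axes are the right singular vectors of the centered midpoints.
    _, _, vt = np.linalg.svd(centers - mean, full_matrices=False)
    axes = vt.T  # columns are principal directions
    # A linear function over a hyperrectangle attains its minimum by taking
    # the lower bound where the coefficient is positive and the upper bound
    # where it is negative (and the reverse for the maximum).
    pos = np.clip(axes, 0.0, None)
    neg = np.clip(axes, None, 0.0)
    lo_c, hi_c = lower - mean, upper - mean
    proj_min = lo_c @ pos + hi_c @ neg
    proj_max = hi_c @ pos + lo_c @ neg
    return axes, proj_min, proj_max

def vertices(lower_row, upper_row):
    """All 2^m corners of one observation's hyperrectangle."""
    return np.array(list(itertools.product(*zip(lower_row, upper_row))))

# Hypothetical toy data: 4 observations, 2 interval-valued variables.
lower = np.array([[0.0, 1.0], [2.0, 2.5], [4.0, 3.0], [5.0, 6.0]])
upper = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 5.0], [7.0, 6.5]])

axes, proj_min, proj_max = centers_pca(lower, upper)
corners = vertices(lower[0], upper[0])
print(corners.shape)       # one row per corner of the first hyperrectangle
print(proj_min[:, 0], proj_max[:, 0])  # interval scores on the first axis
```

Each observation is thus represented on every principal axis by an interval rather than a single score. A principal-curve generalization would replace the straight axes with a smooth curve fitted to the same kind of input.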

Keywords

Interval-valued variables, Principal curves and surfaces, Symbolic data analysis

References

  1. Billard, L., Diday, E.: Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley, Hoboken (2006)
  2. Bickel, P.J., Doksum, K.A.: Mathematical Statistics. Prentice Hall, Upper Saddle River (1977)
  3. Cazes, P., Chouakria, A., Diday, E., Schektman, Y.: Extension de l’analyse en composantes principales à des données de type intervalle. Rev. Statistique Appliquée XLV(3), 5–24 (1997)
  4. Douzal-Chouakria, A., Billard, L., Diday, E.: Principal component analysis for interval-valued observations. Stat. Anal. Data Min. 4(2), 229–246 (2011)
  5. Ichino, M.: General metrics for mixed features - the Cartesian space theory for pattern recognition. In: Conference on Systems, Man, and Cybernetics, pp. 494–497. Pergamon, Oxford (1988)
  6. Hastie, T.: Principal curves and surfaces. Ph.D. thesis, Stanford University (1984)
  7. Hastie, T., Weingessel, A.: Princurve - fits a principal curve in arbitrary dimension (2014). R package version 1.1-12. http://cran.r-project.org/web/packages/princurve/index.html
  8. Hastie, T., Stuetzle, W.: Principal curves. J. Am. Stat. Assoc. 84(406), 502–516 (1989)
  9. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning; Data Mining, Inference and Prediction. Springer, New York (2008)
  10. Rodríguez, O., with contributions from Olger Calderon, Roberto Zuñiga and Jorge Arce: RSDA - R to Symbolic Data Analysis (2015). R package version 1.3. http://CRAN.R-project.org/package=RSDA
  11. Rodríguez, O.: Classification et Modèles Linéaires en Analyse des Données Symboliques. Ph.D. thesis, Paris IX-Dauphine University (2000)
  12. Diday, E.: Introduction à l’approche Symbolique en Analyse des Données. Premières Journées Symbolique-Numérique, CEREMADE, Université Paris, pp. 21–56 (1987)

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  1. University of Costa Rica, San José, Costa Rica
  2. National Bank of Costa Rica, San José, Costa Rica
