# Principal Curves and Surfaces to Interval Valued Variables

• Jorge Arce G.
• Oldemar Rodríguez R.
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10022)

## Abstract

In this paper we propose a generalization, to symbolic interval-valued variables, of the Principal Curves and Surfaces method proposed by Hastie in [6]. Given a data set X with n observations and m continuous variables, the main idea of the Principal Curves and Surfaces method is to generalize the principal component line, providing a smooth one-dimensional curved approximation to a set of data points in $$\mathbb{R}^m$$. A principal surface is more general, providing a curved manifold approximation of dimension 2 or more. In our case we are interested in finding the principal curve that best approximates symbolic interval-valued data. In [3, 4], the authors proposed the Centers Method and the Vertices Method to extend the well-known principal component analysis method to a particular kind of symbolic object characterized by multi-valued variables of interval type. In this paper we generalize both the Centers Method and the Vertices Method by finding a smooth curve that passes through the middle of the data X in an orthogonal sense. The proposed method is compared with the Centers and Vertices Methods using the RSDA package on the Ichino data set, see [1, 10]. These comparisons are based on the correlation index.
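Both the Centers Method and the Vertices Method begin by turning each interval-valued observation into ordinary points in $$\mathbb{R}^m$$: the first uses the interval midpoints, the second uses all corners of the hyperrectangle spanned by the intervals. The following is a minimal Python sketch of that preprocessing step only. It is not the authors' implementation and not the RSDA API; the function names `centers` and `vertices` and the toy data are hypothetical, chosen for illustration.

```python
from itertools import product


def centers(intervals):
    """Return the midpoint of every interval: one point per observation."""
    return [[(lo + hi) / 2 for lo, hi in row] for row in intervals]


def vertices(intervals):
    """Return all 2**m hyperrectangle corners for each observation."""
    return [[list(corner) for corner in product(*row)] for row in intervals]


# Hypothetical toy data: 2 observations, m = 2 interval-valued variables,
# each interval given as a (lower, upper) pair.
X = [
    [(1.0, 3.0), (10.0, 14.0)],
    [(2.0, 6.0), (20.0, 22.0)],
]

C = centers(X)   # 2 points in R^2, input to a centers-style analysis
V = vertices(X)  # 2 groups of 4 corner points, input to a vertices-style analysis
```

A classical method (principal component analysis in [3, 4], or a principal curve fit in the setting described here) would then be applied to the rows of `C`, or to the stacked corner points in `V`.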

## Keywords

Interval-valued variables; Principal curves and surfaces; Symbolic data analysis

## References

1. Billard, L., Diday, E.: Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley, Hoboken (2006)
2. Bickel, P.J., Doksum, K.A.: Mathematical Statistics. Prentice Hall, Upper Saddle River (1977)
3. Cazes, P., Chouakria, A., Diday, E., Schektman, Y.: Extension de l’analyse en composantes principales à des données de type intervalle. Rev. Statistique Appliquée XLV(3), 5–24 (1997)
4. Douzal-Chouakria, A., Billard, L., Diday, E.: Principal component analysis for interval-valued observations. Stat. Anal. Data Min. 4(2), 229–246 (2011)
5. Ichino, M.: General metrics for mixed features - the Cartesian space theory for pattern recognition. In: Conference on Systems, Man, and Cybernetics, pp. 494–497. Pergamon, Oxford (1988)
6. Hastie, T.: Principal curves and surfaces. Ph.D. thesis, Stanford University (1984)
7. Hastie, T., Weingessel, A.: Princurve - fits a principal curve in arbitrary dimension (2014). R package version 1.1-12. http://cran.r-project.org/web/packages/princurve/index.html
8. Hastie, T., Stuetzle, W.: Principal curves. J. Am. Stat. Assoc. 84(406), 502–516 (1989)
9. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York (2008)
10. Rodríguez, O., with contributions from Olger Calderon, Roberto Zuñiga and Jorge Arce: RSDA - R to Symbolic Data Analysis (2015). R package version 1.3. http://CRAN.R-project.org/package=RSDA
11. Rodríguez, O.: Classification et Modèles Linéaires en Analyse des Données Symboliques. Ph.D. thesis, Paris IX-Dauphine University (2000)
12. Diday, E.: Introduction à l’Approche Symbolique en Analyse des Données. Premières Journées Symbolique-Numérique, CEREMADE, Université Paris, pp. 21–56 (1987)