Summary
In the last decade, factorial and clustering techniques have been developed to analyze multidimensional interval data (MIDs). In classic data analysis, PCA followed by clustering on the most significant components is commonly used to extract cluster structure from data: once the noise has been filtered out, the projected data are clustered in a subspace spanned by a few orthogonal variables. We propose the same strategy in the framework of interval data analysis. This generalization raises several computational questions. The first is how to represent the data on a factorial subspace: in classic data analysis projected points remain points, but projected MIDs do not remain MIDs. The second is the choice of a distance between the represented data: many distances between points are available, but few distances between convex sets of points have been defined. We propose optimized techniques for representing data by convex shapes, for computing the Hausdorff distance between convex shapes based on an L2 norm, and for performing a hierarchical clustering of the projected data.
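To illustrate why projected MIDs do not remain MIDs, the following minimal sketch projects the corners of a hyperrectangle onto a plane and takes their convex hull. It is not the optimized technique of the paper: the function names, the example interval datum and the two projection axes are all hypothetical, chosen only for illustration, and the hull is computed with the standard monotone chain method.

```python
from itertools import product


def box_vertices(intervals):
    """All 2^p corners of the hyperrectangle given as [(lo, hi), ...]."""
    return [tuple(v) for v in product(*intervals)]


def project(point, axes):
    """Coordinates of `point` on each projection axis (dot products)."""
    return tuple(sum(x * a for x, a in zip(point, axis)) for axis in axes)


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


# Hypothetical interval datum on three variables (a unit cube) and two
# hypothetical orthonormal axes standing in for the first two components.
mid = [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)]
axes = [
    (1 / 2 ** 0.5, -1 / 2 ** 0.5, 0.0),
    (1 / 6 ** 0.5, 1 / 6 ** 0.5, -2 / 6 ** 0.5),
]
hull = convex_hull([project(v, axes) for v in box_vertices(mid)])
print(len(hull), "hull vertices")
```

With these axes the cube projects to a hexagon rather than a rectangle, so the projected datum can no longer be described by one interval per axis. Enumerating all 2^p corners is only practical for small p.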
Notes
1 Analogously, it is possible to define Hausdorff points for the reverse distance.
2 The reverse distance can be calculated as the forward distance between B and A.
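The relation between forward and reverse distance can be sketched as follows for two convex polygons under the Euclidean (L2) norm. This is a simple quadratic-time illustration, not the optimized procedure proposed in the paper nor a linear-time algorithm. All function names are hypothetical, and the polygons are assumed to have at least three non-collinear vertices listed counter-clockwise. It relies on the fact that the distance to a convex set is a convex function, so its maximum over a convex polygon is attained at a vertex.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment [a, b]."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0:
        t = 0.0
    else:
        t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
        t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def inside_convex(p, poly):
    """True if p lies in the convex polygon (counter-clockwise vertices)."""
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        if (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) < 0:
            return False
    return True


def point_polygon_distance(p, poly):
    """Distance from p to the filled polygon: zero when p is inside."""
    if inside_convex(p, poly):
        return 0.0
    n = len(poly)
    return min(point_segment_distance(p, poly[i], poly[(i + 1) % n])
               for i in range(n))


def forward_distance(A, B):
    """Largest distance from a point of A to B, attained at a vertex of A."""
    return max(point_polygon_distance(a, B) for a in A)


def hausdorff(A, B):
    """Hausdorff distance: the larger of the forward and reverse distances."""
    return max(forward_distance(A, B), forward_distance(B, A))


# Hypothetical example: a unit square and a 2 x 1 rectangle to its right.
A = [(0, 0), (1, 0), (1, 1), (0, 1)]
B = [(2, 0), (4, 0), (4, 1), (2, 1)]
print(forward_distance(A, B), forward_distance(B, A), hausdorff(A, B))
```

In this example the forward distance from A to B is 2 and the reverse distance, computed as the forward distance from B to A, is 3, so the Hausdorff distance is 3.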
Cite this article
Irpino, A., Tontodonato, V. Clustering reduced interval data using Hausdorff distance. Computational Statistics 21, 271–288 (2006). https://doi.org/10.1007/s00180-006-0263-x
DOI: https://doi.org/10.1007/s00180-006-0263-x