Clustering reduced interval data using Hausdorff distance

Computational Statistics

Summary

In the last decade, factorial and clustering techniques have been developed to analyze multidimensional interval data (MIDs). In classic data analysis, cluster structure is usually extracted by running a PCA and then clustering the most significant components: once the noise is filtered out, the projected data are clustered in a subspace generated by a few orthogonal variables. We propose the same strategy in the framework of interval data analysis. This generalization raises several computational questions. The first is how to represent the data on a factorial subspace: in classic data analysis projected points remain points, but projected MIDs do not remain MIDs. The second is the choice of a distance between the represented data: many distances between points can be computed, but few distances between convex sets of points are defined. We propose optimized techniques for representing data by convex shapes, for computing the Hausdorff distance between convex shapes based on an L2 norm, and for performing a hierarchical clustering of the projected data.
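To illustrate why projected MIDs do not remain MIDs, the following minimal sketch projects a three-dimensional interval datum (a box) onto a plane and takes the convex hull of the projected corners. It is not the authors' optimized procedure: the helper names (`box_vertices`, `project`, `convex_hull`), the example axes, and the use of Andrew's monotone chain are assumptions chosen for illustration.

```python
from itertools import product
from math import sqrt

def box_vertices(intervals):
    """All 2^p corner points of a hyperrectangle given as (low, high) pairs."""
    return list(product(*intervals))

def project(points, axes):
    """Coordinates of each point on the given axes (plain dot products)."""
    return [tuple(sum(x * a for x, a in zip(pt, axis)) for axis in axes)
            for pt in points]

def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            # Small tolerance so nearly collinear points are dropped.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 1e-12:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half(pts) + half(reversed(pts))

# Hypothetical example: the unit cube seen along its main diagonal.
cube = [(0.0, 1.0)] * 3
axes = [(1 / sqrt(2), -1 / sqrt(2), 0.0),
        (1 / sqrt(6), 1 / sqrt(6), -2 / sqrt(6))]  # two orthonormal axes
hull = convex_hull(project(box_vertices(cube), axes))
print(len(hull))  # 6: the projected box is a hexagon, not a rectangle
```

The projected shape is a convex polygon with up to 2p sides for a p-dimensional box, which is why a distance between convex sets is needed in the factorial plane.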

Notes

  1. Analogously, it is possible to define Hausdorff points for the reverse distance.

  2. The reverse distance can be calculated as the forward distance between B and A.
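The relation between forward and reverse distances stated in the notes can be sketched as follows. This is a brute-force illustration under stated assumptions, not the optimized technique of the article: the shapes are taken to be filled convex polygons with vertices listed counter-clockwise, the norm is L2, and all function names are hypothetical. It relies on the fact that the distance to a convex set is a convex function, so its maximum over a convex polygon is reached at a vertex.

```python
from math import hypot, isclose, sqrt

def point_segment_distance(p, a, b):
    """Euclidean (L2) distance from point p to the segment ab."""
    ax, ay = b[0] - a[0], b[1] - a[1]
    px, py = p[0] - a[0], p[1] - a[1]
    seg_len2 = ax * ax + ay * ay
    # Parameter of the orthogonal projection of p on ab, clamped to [0, 1].
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, (px * ax + py * ay) / seg_len2))
    return hypot(px - t * ax, py - t * ay)

def inside_convex(p, poly):
    """True if p lies in the closed convex polygon (counter-clockwise vertices)."""
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        if (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) < 0:
            return False
    return True

def point_polygon_distance(p, poly):
    """Distance from p to the filled polygon: zero inside, else nearest edge."""
    if inside_convex(p, poly):
        return 0.0
    n = len(poly)
    return min(point_segment_distance(p, poly[i], poly[(i + 1) % n])
               for i in range(n))

def forward_distance(A, B):
    """Forward distance: largest distance from a vertex of A to the shape B."""
    return max(point_polygon_distance(a, B) for a in A)

def hausdorff(A, B):
    """Hausdorff distance; the reverse term is the forward distance from B to A."""
    return max(forward_distance(A, B), forward_distance(B, A))

# Hypothetical example: a unit square nested in a larger square.
small = [(0, 0), (1, 0), (1, 1), (0, 1)]
large = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(forward_distance(small, large))  # 0.0, since small lies inside large
print(isclose(hausdorff(small, large), sqrt(2)))  # True
```

The cost is proportional to the product of the two vertex counts, which is acceptable for the small polygons produced by projecting a few interval variables; a pairwise matrix of such distances could then be passed to any standard hierarchical clustering routine.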

About this article

Cite this article

Irpino, A., Tontodonato, V. Clustering reduced interval data using Hausdorff distance. Computational Statistics 21, 271–288 (2006). https://doi.org/10.1007/s00180-006-0263-x

  • DOI: https://doi.org/10.1007/s00180-006-0263-x
