Journal of Classification, Volume 29, Issue 2, pp 144–169

Dealing with Distances and Transformations for Fuzzy C-Means Clustering of Compositional Data

  • Javier Palarea-Albaladejo
  • Josep Antoni Martín-Fernández
  • Jesús A. Soto

Abstract

Clustering techniques are based upon a dissimilarity or distance measure between objects and clusters. This paper focuses on the simplex space, whose elements, called compositions, are subject to non-negativity and constant-sum constraints. Any data analysis involving compositions should fulfill two main principles: scale invariance and subcompositional coherence. Among fuzzy clustering methods, the fuzzy c-means (FCM) algorithm is broadly applied in a variety of fields, but it is not well-behaved when dealing with compositions. Here, the adequacy of different dissimilarities in the simplex, together with the behavior of the common log-ratio transformations, is discussed on the basis of compositional principles. As a result, a well-founded strategy for FCM clustering of compositions is suggested. Theoretical findings are accompanied by numerical evidence, and a detailed account of our proposal is provided. Finally, a case study is illustrated using a nutritional data set known in the clustering literature.
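
The abstract does not give formulas, so the following is only a minimal sketch of the scale invariance principle it mentions, using the standard textbook definitions of the centered log-ratio (clr) transformation and the Aitchison distance (named in the keywords). The function names `closure`, `clr`, and `aitchison_distance` and the example vectors are illustrative assumptions, not the authors' code or their proposed FCM strategy.

```python
import numpy as np

def closure(x):
    """Rescale strictly positive parts so that each composition sums to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centered log-ratio transform: log of each part minus the mean log.
    Assumes all parts are strictly positive (no zeros)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)

def aitchison_distance(x, y):
    """Aitchison distance: Euclidean distance between clr coordinates."""
    return float(np.linalg.norm(clr(x) - clr(y)))

# Two hypothetical 3-part compositions (illustrative values only).
x = np.array([0.1, 0.3, 0.6])
y = np.array([0.2, 0.5, 0.3])

# Rescaling either vector (e.g. changing units) leaves the Aitchison
# distance unchanged, whereas the plain Euclidean distance changes.
d_aitchison = aitchison_distance(x, y)
d_aitchison_scaled = aitchison_distance(10 * x, 250 * y)
d_euclid = float(np.linalg.norm(x - y))
d_euclid_scaled = float(np.linalg.norm(10 * x - 250 * y))

print(d_aitchison, d_aitchison_scaled)  # equal up to rounding error
print(d_euclid, d_euclid_scaled)        # differ
```

Under these assumptions, one simple way to make a Euclidean-based clustering routine respect scale invariance would be to run it on the clr coordinates rather than on the raw parts; whether this matches the strategy the paper recommends cannot be confirmed from the abstract alone.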

Keywords

Fuzzy clustering; FCM; Compositional data; Closed data; Simplex space; Aitchison distance

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Javier Palarea-Albaladejo (1)
  • Josep Antoni Martín-Fernández (2)
  • Jesús A. Soto (3)

  1. Biomathematics and Statistics Scotland, JCMB, Edinburgh, UK
  2. Universitat de Girona, Girona, Spain
  3. Universidad Católica San Antonio, Murcia, Spain