Advances in self-organizing maps for their application to compositional data

  • Original Paper
  • Stochastic Environmental Research and Risk Assessment

Abstract

A self-organizing map (SOM) is a non-linear projection of a D-dimensional data set onto a lower-dimensional space in which the distances among observations are approximately preserved. The SOM arranges multivariate data according to their mutual similarity, which supports pattern recognition and makes higher-dimensional data easier to interpret. The SOM algorithm allows for the selection of different map topologies, distances, and parameters, which determine how the data are organized on the map. In the particular case of compositional data (such as elemental, mineralogical, or maceral abundance), the sample space is governed by Aitchison geometry, and extra steps are required prior to SOM analysis. Following the principle of working on log-ratio coordinates, the simplicial operations and the Aitchison distance, which are the appropriate elements for the SOM, are presented. With this structure developed, a SOM using Aitchison geometry is applied to interpret elemental data from combustion products (bottom ash, fly ash, and economizer fly ash) from a Wyoming coal-fired power plant. The results provide knowledge about the differences among the ash compositions produced in the coal combustion process.
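
The role of the Aitchison distance in a SOM can be illustrated with a minimal Python sketch. It is not the implementation used in the paper. It assumes strictly positive parts (zeros and nondetects would need to be handled beforehand) and uses the centered log-ratio (clr) representation, in which the Aitchison distance equals the ordinary Euclidean distance. The function names closure, clr, aitchison_distance, best_matching_unit, and update_unit are hypothetical and chosen only for this illustration.

```python
import numpy as np


def closure(x):
    """Rescale each composition so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio transform (assumes strictly positive parts)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)


def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance of clr vectors."""
    return np.linalg.norm(clr(x) - clr(y), axis=-1)


def best_matching_unit(sample, codebook):
    """Index of the map unit (row of codebook) closest to the sample."""
    return int(np.argmin(aitchison_distance(sample, codebook)))


def update_unit(unit, sample, alpha):
    """Move a unit toward the sample by a fraction alpha.

    A linear step in clr space corresponds to the simplicial operations
    (perturbation and powering) on the original compositions.
    """
    z = (1.0 - alpha) * clr(unit) + alpha * clr(sample)
    return closure(np.exp(z))


if __name__ == "__main__":
    # Hypothetical 3-part compositions, for illustration only.
    codebook = np.array([[0.1, 0.3, 0.6], [0.6, 0.3, 0.1]])
    sample = np.array([0.2, 0.3, 0.5])
    winner = best_matching_unit(sample, codebook)
    codebook[winner] = update_unit(codebook[winner], sample, alpha=0.5)
    print(winner, codebook[winner])
```

Because the distance depends only on ratios between parts, rescaling a composition (for example, from proportions to percentages) does not change which unit is selected. A Euclidean distance on the raw parts does not have this property.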

Acknowledgements

This work has been supported by the project “CODA-RETOS” (Spanish Ministry of Economy and Competitiveness; Ref: MTM2015-65016-C2-1-R) and the project “Compositional Data Analysis Related to Energy Resources Modeling” (“Salvador de Madariaga” program; “Fulbright” distinction; MECD; Ref.: PRX16/00258). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. We are grateful to C.Ö. Karacan (USGS) and G. Mateu-Figueras (U. de Girona) for their insightful review of a previous version of the paper.

Author information

Corresponding author

Correspondence to Josep A. Martín-Fernández.

About this article

Cite this article

Martín-Fernández, J.A., Engle, M.A., Ruppert, L.F. et al. Advances in self-organizing maps for their application to compositional data. Stoch Environ Res Risk Assess 33, 817–826 (2019). https://doi.org/10.1007/s00477-019-01659-1

  • DOI: https://doi.org/10.1007/s00477-019-01659-1
