Abstract
A self-organizing map (SOM) is a non-linear projection of a D-dimensional data set onto a lower-dimensional space in which the distances among observations are approximately preserved. The SOM arranges multivariate data according to their similarity to each other, enabling pattern recognition and making high-dimensional data easier to interpret. The SOM algorithm allows the selection of different map topologies, distances and parameters, which determine how the data are organized on the map. In the particular case of compositional data (such as elemental, mineralogical, or maceral abundances), the sample space is governed by Aitchison geometry, and extra steps are required before SOM analysis. Following the principle of working on log-ratio coordinates, the simplicial operations and the Aitchison distance, which are the appropriate elements for the SOM, are presented. With this structure developed, a SOM using Aitchison geometry is applied to properly interpret elemental data from combustion products (bottom ash, fly ash, and economizer fly ash) from a Wyoming coal-fired power plant. The results provide insight into the compositional differences among the ash types produced in the coal combustion process.
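The following is a minimal Python sketch of the ingredients named above. It covers the simplicial operations (perturbation and powering), centered log-ratio (clr) coordinates, and the Aitchison distance, together with a plain online SOM trained on clr coordinates. Because the Euclidean distance between clr vectors equals the Aitchison distance, the best-matching-unit search is carried out in Aitchison geometry. Several choices here are illustrative assumptions and not the settings used in the paper: the function names, the rectangular grid, the Gaussian neighborhood, the linear decay schedules, and the use of clr rather than other log-ratio coordinates. All parts are assumed strictly positive; zeros would need to be replaced before taking logarithms.

```python
import math
import random

# --- Aitchison geometry on the simplex (all parts assumed strictly positive) ---

def closure(x):
    """Closure C(x): rescale the parts so that they sum to 1."""
    s = sum(x)
    return [xi / s for xi in x]

def perturb(x, y):
    """Perturbation: the simplicial counterpart of vector addition."""
    return closure([a * b for a, b in zip(x, y)])

def power(alpha, x):
    """Powering: the simplicial counterpart of scalar multiplication."""
    return closure([xi ** alpha for xi in x])

def clr(x):
    """Centered log-ratio coordinates: log of each part over the geometric mean."""
    logs = [math.log(xi) for xi in x]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]

def clr_inv(z):
    """Map clr coordinates back to a closed composition."""
    return closure([math.exp(zi) for zi in z])

def aitchison_distance(x, y):
    """Aitchison distance, computed as the Euclidean distance between clr vectors."""
    return math.dist(clr(x), clr(y))

# --- A plain online SOM trained on clr coordinates (hypothetical settings) ---

def _closest(z, codebook):
    """Index of the codebook vector nearest (Euclidean) to the clr vector z."""
    return min(range(len(codebook)), key=lambda k: math.dist(z, codebook[k]))

def best_matching_unit(x, codebook):
    """Map unit whose codebook vector is closest to composition x (Aitchison distance)."""
    return _closest(clr(x), codebook)

def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, radius0=1.5, seed=0):
    """Train a rows x cols rectangular SOM; return grid positions and the clr codebook."""
    rng = random.Random(seed)
    coords = [clr(x) for x in data]
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialize each codebook vector at a randomly chosen observation, so it
    # starts (and, under the convex updates below, stays) in the zero-sum clr plane.
    codebook = [list(rng.choice(coords)) for _ in nodes]
    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                    # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 0.5)  # shrinking neighborhood width
        for z in rng.sample(coords, len(coords)):  # visit samples in random order
            br, bc = nodes[_closest(z, codebook)]
            for k, (r, c) in enumerate(nodes):
                # Gaussian neighborhood on the grid, centered on the best-matching unit
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * radius ** 2))
                # Move the codebook vector a fraction lr * h (at most 0.5) toward z
                codebook[k] = [w + lr * h * (zi - w) for w, zi in zip(codebook[k], z)]
    return nodes, codebook

if __name__ == "__main__":
    # Toy three-part compositions (made up for illustration only)
    toy = [[0.70, 0.20, 0.10], [0.65, 0.25, 0.10], [0.10, 0.20, 0.70], [0.12, 0.23, 0.65]]
    grid, cb = train_som(toy)
    for comp in toy:
        k = best_matching_unit(comp, cb)
        print(comp, "->", grid[k], [round(p, 3) for p in clr_inv(cb[k])])
```

Mapping each sample to its best-matching unit gives its position on the map. Applying `clr_inv` to a unit's codebook vector turns it back into a composition that can be read in the original parts. Working on coordinates in this way means that any standard SOM implementation with a Euclidean metric can be reused unchanged once the data have been transformed.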
Acknowledgements
This work has been supported by the project “CODA-RETOS” (Spanish Ministry of Economy and Competitiveness; Ref: MTM2015-65016-C2-1-R) and the project “Compositional Data Analysis Related to Energy Resources Modeling” (“Salvador de Madariaga” program; “Fulbright” distinction; MECD; Ref.: PRX16/00258). Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government. We are grateful to C.Ö. Karacan (USGS) and G. Mateu-Figueras (U. de Girona) for their insightful review of a previous version of the paper.
About this article
Cite this article
Martín-Fernández, J.A., Engle, M.A., Ruppert, L.F. et al. Advances in self-organizing maps for their application to compositional data. Stoch Environ Res Risk Assess 33, 817–826 (2019). https://doi.org/10.1007/s00477-019-01659-1
DOI: https://doi.org/10.1007/s00477-019-01659-1