A software framework for data dimensionality reduction: application to chemical crystallography
Materials science research has witnessed an increasing use of data mining techniques in establishing process‐structure‐property relationships. Significant advances in high‐throughput experiments and computational capability have resulted in the generation of huge amounts of data. Various statistical methods are currently employed to reduce the noise, redundancy, and dimensionality of the data to make analysis more tractable. Popular reduction methods (like principal component analysis) assume a linear relationship between the input and output variables. Recent developments in non‐linear reduction (neural networks, self‐organizing maps), though successful, have computational issues associated with convergence and scalability. Another significant barrier to using dimensionality reduction techniques in materials science is their lack of ease of use, owing to their complex mathematical formulations. This paper reviews various spectral‐based techniques that efficiently unravel linear and non‐linear structures in the data, which can subsequently be used to tractably investigate process‐structure‐property relationships. In addition, we describe techniques (based on graph‐theoretic analysis) to estimate the optimal dimensionality of the low‐dimensional parametric representation. We show how these techniques can be packaged into a modular, computationally scalable software framework with a graphical user interface: the Scalable Extensible Toolkit for Dimensionality Reduction (SETDiR). This interface helps to separate the mathematical and computational aspects from the materials science applications, thus significantly enhancing utility to the materials science community. The applicability of this framework in constructing reduced order models of complicated materials datasets is illustrated with an example dataset of apatites described in structural descriptor space.
Cluster analysis of the low‐dimensional plots yielded insights into the correlation of several structural descriptors, like ionic radius and covalence, with characteristic properties like apatite stability. This information is crucial, as it can promote the use of apatite materials as a potential host system for immobilizing toxic elements.
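As a rough illustration of the linear, spectral style of reduction that the abstract mentions (principal component analysis), the following is a minimal sketch in plain Python. It is not SETDiR's code or interface: the function names (`center`, `covariance`, `leading_eigenpair`, `pca`) and the toy descriptor table are assumptions made only for this example, and power iteration with deflation stands in for whatever eigensolver a production framework would use.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean so every descriptor has zero mean."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(matrix, iterations=500, seed=0):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    d = len(matrix)
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]  # arbitrary positive start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # illustrative absolute cutoff: no variance left
            return 0.0, [0.0] * d
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    return sum(v[i] * mv[i] for i in range(d)), v  # Rayleigh quotient, eigenvector


def pca(rows, k):
    """Return (scores, components, variances) for the top-k principal components."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    components, variances = [], []
    for _ in range(k):
        lam, v = leading_eigenpair(cov)
        variances.append(lam)
        components.append(v)
        # Deflation: remove the direction just found before searching for the next.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[sum(r[j] * c[j] for j in range(d)) for c in components]
              for r in centered]
    return scores, components, variances


if __name__ == "__main__":
    # Hypothetical 3-descriptor table whose rows all lie on one straight line.
    data = [[float(t), 2.0 * t, 2.0 * t] for t in range(10)]
    scores, components, variances = pca(data, k=2)
    print("variance captured per component:", variances)
    print("first component direction:", components[0])
```

In this toy table all of the variance falls on the first component, so a single coordinate describes the data. Reading dimensionality off the eigenvalue decay in this way is only a common heuristic; it is not the graph‐theoretic estimator the paper describes, and a linear projection of this kind would not recover the non‐linear structures the abstract refers to.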
Keywords: Non‐linear dimensionality reduction; Process‐structure‐property; Apatites; Materials science; High‐throughput analysis
We gratefully acknowledge the support from the National Science Foundation (NSF) grant CDI‐ NSF‐CDI ‐PHY 09‐41576. KR acknowledges the support from NSF: DMR‐13‐07811 and DMS‐11‐25909; Department of Homeland Security/NSF‐ARI Program: CMMI 09‐389018; Army Research Office grant W911NF‐10‐0397; Air Force Office of Scientific Research SFA9550‐12‐1‐0456; and the Wilkinson Professorship of Interdisciplinary Engineering. BG also acknowledges the support from NSF CAREER CMMI‐11‐49365.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0), which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.