A software framework for data dimensionality reduction: application to chemical crystallography

  • Sai Kiranmayee Samudrala
  • Prasanna Venkataraman Balachandran
  • Jaroslaw Zola
  • Krishna Rajan
  • Baskar Ganapathysubramanian
Part of the following topical collections:
  1. Use of Digital Data in Materials Science and Engineering


Abstract

Materials science research has witnessed an increasing use of data mining techniques in establishing process‐structure‐property relationships. Significant advances in high‐throughput experiments and computational capability have resulted in the generation of huge amounts of data. Various statistical methods are currently employed to reduce the noise, redundancy, and dimensionality of the data to make analysis more tractable. Popular methods for reduction (like principal component analysis) assume a linear relationship between the input and output variables. Recent developments in non‐linear reduction (neural networks, self‐organizing maps), though successful, have computational issues associated with convergence and scalability. Another significant barrier to the use of dimensionality reduction techniques in materials science is their lack of ease of use, owing to their complex mathematical formulations. This paper reviews various spectral‐based techniques that efficiently unravel linear and non‐linear structures in the data, which can subsequently be used to tractably investigate process‐structure‐property relationships. In addition, we describe techniques (based on graph‐theoretic analysis) to estimate the optimal dimensionality of the low‐dimensional parametric representation. We show how these techniques can be packaged into a modular, computationally scalable software framework with a graphical user interface, the Scalable Extensible Toolkit for Dimensionality Reduction (SETDiR). This interface helps to separate the mathematical and computational aspects from the materials science applications, thus significantly enhancing utility to the materials science community. The applicability of this framework in constructing reduced order models of complicated materials datasets is illustrated with an example dataset of apatites described in structural descriptor space. Cluster analysis of the low‐dimensional plots yielded interesting insights into the correlation of several structural descriptors, such as ionic radius and covalence, with characteristic properties such as apatite stability. This information is crucial as it can promote the use of apatite materials as a potential host system for immobilizing toxic elements.
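
As a rough illustration of the linear baseline mentioned above (principal component analysis), the following is a minimal sketch in Python with NumPy. It is not taken from SETDiR: the function name `pca_embed`, the synthetic descriptor matrix, and the choice of two retained dimensions are assumptions made only for this example.

```python
import numpy as np

def pca_embed(X, d):
    """Project the rows of X (n samples x D descriptors) onto the top-d principal axes.

    Hypothetical helper for illustration; returns the low-dimensional coordinates
    and the fraction of total variance captured by the retained axes.
    """
    Xc = X - X.mean(axis=0)                 # center each descriptor
    cov = Xc.T @ Xc / (X.shape[0] - 1)      # D x D sample covariance matrix
    evals, evecs = np.linalg.eigh(cov)      # symmetric eigen-decomposition (ascending order)
    order = np.argsort(evals)[::-1]         # reorder to descending variance
    evals, evecs = evals[order], evecs[:, order]
    Y = Xc @ evecs[:, :d]                   # n x d low-dimensional representation
    explained = evals[:d].sum() / evals.sum()
    return Y, explained

# Synthetic stand-in for a descriptor table: 200 points on a 2D plane embedded in 5D.
rng = np.random.default_rng(0)
t = rng.uniform(-1.0, 1.0, size=(200, 2))   # hidden low-dimensional parameters
A = rng.normal(size=(2, 5))                 # assumed linear map into descriptor space
X = t @ A

Y, explained = pca_embed(X, 2)
print(Y.shape, explained)
```

Because the synthetic points lie on a flat plane, two principal axes capture essentially all of the variance. For data lying on a curved manifold this linear projection would generally not suffice, which is the situation the non‐linear spectral techniques reviewed in the paper are meant to address.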


Keywords

Non‐linear dimensionality reduction; Process‐structure‐property; Apatites; Materials science; High‐throughput analysis



Acknowledgements

We gratefully acknowledge the support from the National Science Foundation (NSF) grant CDI‐ NSF‐CDI ‐PHY 09‐41576. KR acknowledges the support from NSF: DMR‐13‐07811 and DMS‐11‐25909; Department of Homeland Security/NSF‐ARI Program: CMMI 09‐389018; Army Research Office grant W911NF‐10‐0397; Air Force Office of Scientific Research SFA9550‐12‐1‐0456; and the Wilkinson Professorship of Interdisciplinary Engineering. BG also acknowledges the support from NSF CAREER CMMI‐11‐49365.




Copyright information

© Samudrala et al. licensee Springer 2014

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.

Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits use, duplication, adaptation, distribution, and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. School of Mechanical Engineering, Georgia Tech, Atlanta, USA
  2. Department of Materials Science, Drexel University, Philadelphia, USA
  3. Rutgers Discovery Informatics Institute, Rutgers University, Piscataway, USA
  4. Department of Materials Science and Engineering, Iowa State University, Ames, USA
  5. Department of Mechanical Engineering, Iowa State University, Ames, USA
