Abstract
The ability to describe crystal structure features is crucial for understanding structure–property relationships in the solid state. In this paper, the authors introduce robocrystallographer, an open-source toolkit for analyzing crystal structures. This package combines new and existing open-source analysis tools to provide structural information, including the local coordination and polyhedral type, polyhedral connectivity, octahedral tilt angles, component dimensionality, and molecule-within-crystal and fuzzy prototype identification. Using this information, robocrystallographer can generate text-based descriptions of crystal structures that resemble descriptions written by human crystallographers. The authors use robocrystallographer to investigate the dimensionalities of all compounds in the Materials Project database and highlight its potential in machine learning studies.
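As a rough illustration of how structural information can be turned into a human-readable description, the toy sketch below fills sentence templates from hand-entered data. It does not use robocrystallographer and is not its API: the names `SiteSummary`, `describe_site`, and `describe_structure` are hypothetical, and the coordination, polyhedron, and connectivity values are typed in by hand rather than computed from a structure.

```python
# Toy illustration only: every name here is hypothetical and none of it
# is part of the robocrystallographer package.
from dataclasses import dataclass

# Spell out common coordination numbers for more natural sentences.
NUMBER_WORDS = {2: "two", 3: "three", 4: "four", 5: "five",
                6: "six", 8: "eight", 12: "twelve"}


@dataclass
class SiteSummary:
    element: str        # central atom, e.g. "Ti"
    neighbor: str       # bonded species, e.g. "O"
    coordination: int   # number of bonded neighbors
    polyhedron: str     # plural polyhedron name, e.g. "octahedra"
    connectivity: str   # how polyhedra link, e.g. "corner-sharing"


def describe_site(site: SiteSummary) -> str:
    """Fill a fixed sentence template for one symmetry-distinct site."""
    count = NUMBER_WORDS.get(site.coordination, str(site.coordination))
    motif = f"{site.element}{site.neighbor}{site.coordination}"
    return (f"{site.element} is bonded to {count} {site.neighbor} atoms "
            f"to form {site.connectivity} {motif} {site.polyhedron}.")


def describe_structure(formula, prototype, dimensionality, sites):
    """Join a structure-level sentence with one sentence per site."""
    sentences = [f"{formula} is {prototype} structured and its framework "
                 f"is {dimensionality}-dimensional."]
    sentences.extend(describe_site(site) for site in sites)
    return " ".join(sentences)


# Hand-entered example data for cubic SrTiO3 (assumed input, not computed).
description = describe_structure(
    "SrTiO3", "perovskite", 3,
    [SiteSummary("Ti", "O", 6, "octahedra", "corner-sharing")],
)
print(description)
```

In a real workflow the fields of each site summary would come from an automated analysis of the crystal structure; only the final templating step is sketched here.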
Acknowledgments
The authors acknowledge many useful discussions with Matt Horton regarding structure dimensionality. The authors additionally acknowledge Matt Horton for his work on the StructureGraph and BondedStructure components of pymatgen and for parsing the AFLOW prototype library. The authors acknowledge Evan Spotte-Smith for his work on the MoleculeGraph functionality in pymatgen. The authors acknowledge useful discussions with Leigh Weston regarding materials science text mining. The authors acknowledge Donny Winston for facilitating the running of robocrystallographer on all structures in the Materials Project database. The authors acknowledge useful conversations with Alex Dunn regarding machine learning model optimization. This work was intellectually led and funded by the U.S. Department of Energy (DOE) Basic Energy Sciences (BES) program (the Materials Project) under Grant No. KC23MP. Lawrence Berkeley National Laboratory is funded by the DOE under award DE-AC02-05CH11231.
Supplementary Material
The supplementary material for this article can be found at https://doi.org/10.1557/mrc.2019.94.
Cite this article
Ganose, A.M., Jain, A. Robocrystallographer: automated crystal structure text descriptions and analysis. MRS Communications 9, 874–881 (2019). https://doi.org/10.1557/mrc.2019.94
DOI: https://doi.org/10.1557/mrc.2019.94