Robocrystallographer: automated crystal structure text descriptions and analysis

Abstract

The ability to describe crystal structure features is crucial to understanding structure–property relationships in the solid state. In this paper, the authors introduce robocrystallographer, an open-source toolkit for analyzing crystal structures. The package combines new and existing open-source analysis tools to provide structural information, including local coordination and polyhedral type, polyhedral connectivity, octahedral tilt angles, component dimensionality, molecule-within-crystal identification, and fuzzy prototype identification. From this information, robocrystallographer generates text-based descriptions of crystal structures that resemble those written by human crystallographers. The authors use robocrystallographer to investigate the dimensionalities of all compounds in the Materials Project database and highlight its potential in machine learning studies.
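
To illustrate the last step described in the abstract, turning already-computed structural information into a human-readable sentence, the following is a minimal sketch in plain Python. It does not use robocrystallographer's own API or data model: the `SiteSummary` container, the `describe` function, and the sentence templates are all hypothetical names introduced here for illustration, and the rutile TiO2 input values are entered by hand rather than computed from a structure.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical container for structural features that have already been
# computed; the field names are illustrative, not robocrystallographer's.
@dataclass
class SiteSummary:
    element: str            # central atom, e.g. "Ti"
    neighbor: str           # element of the coordinating atoms, e.g. "O"
    coordination: int       # number of coordinating atoms
    geometry: str           # local coordination geometry, e.g. "octahedral"
    connectivity: List[str] = field(default_factory=list)  # e.g. ["corner"]


DIMENSIONALITY_WORDS = {
    0: "zero-dimensional",
    1: "one-dimensional",
    2: "two-dimensional",
    3: "three-dimensional",
}


def join_words(words):
    """Join words into an English list: 'a', 'a and b', 'a, b, and c'."""
    if len(words) == 1:
        return words[0]
    if len(words) == 2:
        return f"{words[0]} and {words[1]}"
    return ", ".join(words[:-1]) + f", and {words[-1]}"


def article(word):
    """Pick 'a' or 'an' from the first letter (a simple heuristic)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def describe(formula, dimensionality, sites):
    """Build a short text description from hand-supplied structural features."""
    sentences = [f"{formula} is {DIMENSIONALITY_WORDS[dimensionality]}."]
    for s in sites:
        sentence = (
            f"{s.element} is bonded to {s.coordination} {s.neighbor} atoms "
            f"in {article(s.geometry)} {s.geometry} geometry"
        )
        if s.connectivity:
            modes = join_words([f"{m}-sharing" for m in s.connectivity])
            sentence += (
                f", forming {modes} "
                f"{s.element}{s.neighbor}{s.coordination} polyhedra"
            )
        sentences.append(sentence + ".")
    return " ".join(sentences)


# Hand-entered features for rutile TiO2 (illustrative input, not computed).
rutile_sites = [
    SiteSummary("Ti", "O", 6, "octahedral", ["corner", "edge"]),
    SiteSummary("O", "Ti", 3, "trigonal planar"),
]
text = describe("TiO2", 3, rutile_sites)
print(text)
```

The sketch keeps the analysis and the text generation separate: the features are gathered into a small intermediate record first, and the wording is produced only from that record. This mirrors the general idea in the abstract of generating descriptions from extracted structural information, but the real toolkit covers far more (prototype matching, tilt angles, molecule identification) than this template does.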

Acknowledgments

The authors acknowledge many useful discussions with Matt Horton regarding structure dimensionality. The authors additionally acknowledge Matt Horton for his work on the StructureGraph and BondedStructure components of pymatgen and for parsing the AFLOW prototype library. The authors acknowledge Evan Spotte-Smith for his work on the MoleculeGraph functionality in pymatgen. The authors acknowledge useful discussions with Leigh Weston regarding materials science text mining. The authors acknowledge Donny Winston for facilitating the running of robocrystallographer on all structures in the Materials Project database. The authors acknowledge useful conversations with Alex Dunn regarding machine learning model optimization. This work was intellectually led and funded by the U.S. Department of Energy (DOE) Basic Energy Sciences (BES) program, the Materials Project, under Grant No. KC23MP. Lawrence Berkeley National Laboratory is funded by the DOE under award DE-AC02-05CH11231.

Author information

Corresponding author

Correspondence to Anubhav Jain.

Electronic supplementary material

Supplementary Material

The supplementary material for this article can be found at https://doi.org/10.1557/mrc.2019.94.

About this article

Cite this article

Ganose, A.M., Jain, A. Robocrystallographer: automated crystal structure text descriptions and analysis. MRS Communications 9, 874–881 (2019). https://doi.org/10.1557/mrc.2019.94

  • DOI: https://doi.org/10.1557/mrc.2019.94