
A picture is worth a thousand words: applying natural language processing tools for creating a quantum materials database map

  • Artificial Intelligence Research Letter
  • MRS Communications

Abstract

This paper demonstrates how Natural Language Processing (NLP) tools can be applied to explore large libraries of documents and to establish heuristic associations between the text of figure captions and the interpretation of the corresponding images and figures. Visualization tools built on NLP methods allow one to quickly assess the extent of the research literature on a specific topic. The authors show how applying NLP methods to the figure captions alone, without navigating the full text of each document, can accelerate the assessment of the literature in a given domain.
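The abstract does not state which text representation is used for the captions. As a minimal sketch, assuming each caption is represented as a TF-IDF vector (the term-specificity weighting introduced by Spärck Jones) and captions are compared by cosine similarity, the pure-Python code below shows how related captions could be linked without reading any full text. The function names and sample captions are hypothetical illustrations, not the authors' pipeline, and the projection of these vectors onto a 2-D map (for example with t-SNE) is omitted to keep the example dependency-free.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a caption and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(captions):
    """Return one sparse TF-IDF vector (dict: term -> weight) per caption.

    Hypothetical helper: uses raw term frequency and idf = log(N / df),
    so a term that appears in every caption gets weight 0.
    """
    docs = [Counter(tokenize(c)) for c in captions]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # count each term once per caption
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(captions, query_index):
    """Index of the caption most similar to captions[query_index]."""
    vecs = tfidf_vectors(captions)
    scores = [
        (cosine(vecs[query_index], v), i)
        for i, v in enumerate(vecs)
        if i != query_index
    ]
    return max(scores)[1]


if __name__ == "__main__":
    # Hypothetical captions, invented for illustration only.
    captions = [
        "Raman spectrum of MgB2 under pressure",
        "Pressure dependence of the Raman spectrum of lithium niobate",
        "Magnetoresistance of a Bi2Te3 topological insulator",
        "Temperature dependence of magnetoresistance in a topological insulator thin film",
    ]
    print(most_similar(captions, 2))
```

With these sample captions, the topological-insulator caption (index 2) is matched to the other magnetoresistance caption (index 3) rather than to the Raman captions, because the shared specific terms carry nonzero IDF weight while the ubiquitous "of" is weighted to zero.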


Figures 1–5



Acknowledgments

We gratefully acknowledge support from the National Science Foundation (NSF) DIBBs program, award number 1640867. The authors would also like to acknowledge support from the Toyota Research Institute Accelerated Materials Design and Discovery program. K.R. acknowledges the Erich Bloch Endowed Chair at the University at Buffalo.

Author information

Corresponding author

Correspondence to Krishna Rajan.


About this article


Cite this article

Venugopal, V., Broderick, S.R. & Rajan, K. A picture is worth a thousand words: applying natural language processing tools for creating a quantum materials database map. MRS Communications 9, 1134–1141 (2019). https://doi.org/10.1557/mrc.2019.136



  • DOI: https://doi.org/10.1557/mrc.2019.136
