Challenges and Advances in Information Extraction from Scientific Literature: a Review


Scientific articles have long been the primary means of disseminating scientific discoveries. Over the centuries, valuable data and potentially groundbreaking insights have been collected and buried deep in the mountain of publications. In materials engineering, such data are spread across technical handbooks, specification sheets, journal articles, and laboratory notebooks in myriad formats. Extracting information from papers on a large scale has been a tedious and time-consuming job to which few researchers have wanted to devote their limited time and effort, yet it is an activity that is essential for modern data-driven design practices. In recent years, however, the computer science community has made significant progress on techniques for automated information extraction from free text. Even so, transformative application of these techniques to scientific literature remains elusive, due not to a lack of interest or effort but to technical and logistical challenges. Using the challenges in the materials science literature as a driving motivation, we review the gaps between state-of-the-art information extraction methods and the practical application of such methods to scientific texts, and offer a comprehensive overview of work that can be undertaken to close these gaps.

This work was performed under financial assistance award 70NANB19H005 from the US Department of Commerce, National Institute of Standards and Technology, as part of the Center for Hierarchical Materials Design (CHiMaD), and was also supported in part by the US Department of Energy, Office of Science, Advanced Scientific Computing Research, under Contract DE-AC02-06CH11357, and by the Joint Center for Energy Storage Research (JCESR), an Energy Innovation Hub funded by the US Department of Energy (DOE), Office of Science, Office of Basic Energy Sciences.

Author information



Corresponding author

Correspondence to Logan Ward.

Ethics declarations

Conflict of Interest

On behalf of all authors, the corresponding author states that there are no conflicts of interest.

About this article

Cite this article

Hong, Z., Ward, L., Chard, K. et al. Challenges and Advances in Information Extraction from Scientific Literature: a Review. JOM (2021).


  • Information extraction
  • Text mining
  • Scientific data