
Automated Mining of Leaderboards for Empirical AI Research

Part of the Lecture Notes in Computer Science book series (LNISA, volume 13133)


With the rapid growth of research publications, empowering scientists to maintain an overview of scientific progress is of paramount importance. In this regard, the leaderboards facet of information organization provides an overview of the state of the art by aggregating empirical results from various studies addressing the same research challenge. Crowdsourcing efforts such as PapersWithCode, among others, are devoted to the construction of leaderboards, predominantly for various subdomains of Artificial Intelligence. Leaderboards provide machine-readable scholarly knowledge that has proven directly useful for scientists tracking research progress, and their construction could be greatly expedited with automated text mining.

This study presents a comprehensive approach for generating leaderboards for knowledge-graph-based scholarly information organization. Specifically, we investigate the problem of automated leaderboard construction using state-of-the-art transformer models, viz. BERT, SciBERT, and XLNet. Our analysis reveals an optimal approach that significantly outperforms existing baselines for the task, with evaluation scores above 90% in F1. This, in turn, offers new state-of-the-art results for leaderboard extraction. As a result, a vast share of empirical AI research can be organized in next-generation digital libraries as knowledge graphs.
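As a rough illustration of the F1 measure mentioned above, the following minimal sketch scores a set of predicted leaderboard entries against a gold set. It assumes each entry is a (task, dataset, metric) triple; the function name `f1_score` and the example triples are hypothetical and are not taken from the paper, whose exact evaluation protocol is not described on this page.

```python
def f1_score(predicted, gold):
    """Return (precision, recall, F1) for two collections of items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical (task, dataset, metric) triples, for illustration only.
gold = {
    ("Question Answering", "SQuAD", "F1"),
    ("Natural Language Inference", "SNLI", "Accuracy"),
}
predicted = {
    ("Question Answering", "SQuAD", "F1"),
    ("Question Answering", "SQuAD", "EM"),
}

precision, recall, f1 = f1_score(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here one of the two predictions matches one of the two gold triples, so precision, recall, and F1 are all 0.5.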


  • Table mining
  • Information extraction
  • Scholarly text mining
  • Knowledge graphs
  • Neural machine learning

  • DOI: 10.1007/978-3-030-91669-5_35
  • Chapter length: 18 pages


  4. They also evaluated extracting the best score as an automated task, which proved very challenging owing to the inconsistency with which best scores are reported and the resulting inability of PDF-to-text extractors to mine the data effectively.

  6. Our corpus was downloaded from the PwC GitHub repository and was constructed by combining the information in the files "All papers with abstracts" and "Evaluation tables", which included article URLs and crowdsourced TDM annotation metadata.
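The corpus construction described in the note on the PwC files can be sketched roughly as a join of two record lists on the article URL. This is only an illustrative sketch: the field names (`paper_url`, `title`, `abstract`, `task`, `dataset`, `metric`), the function `build_corpus`, and the sample records are assumptions, not the actual schema of the PwC files or the authors' code.

```python
def build_corpus(papers, evaluation_rows):
    """Attach (task, dataset, metric) annotations to papers by URL.

    Both inputs are lists of dicts with hypothetical field names.
    Papers without any annotation are left out of the corpus.
    """
    annotations = {}
    for row in evaluation_rows:
        triple = (row["task"], row["dataset"], row["metric"])
        annotations.setdefault(row["paper_url"], []).append(triple)

    corpus = []
    for paper in papers:
        tdm = annotations.get(paper["paper_url"])
        if tdm:
            corpus.append({
                "paper_url": paper["paper_url"],
                "title": paper["title"],
                "abstract": paper["abstract"],
                "tdm": tdm,
            })
    return corpus


# Made-up sample records, for illustration only.
papers = [
    {"paper_url": "https://example.org/paper-a", "title": "Paper A",
     "abstract": "We study question answering."},
    {"paper_url": "https://example.org/paper-b", "title": "Paper B",
     "abstract": "A paper without annotations."},
]
evaluation_rows = [
    {"paper_url": "https://example.org/paper-a",
     "task": "Question Answering", "dataset": "SQuAD", "metric": "F1"},
]

corpus = build_corpus(papers, evaluation_rows)
print(len(corpus), corpus[0]["tdm"])
```

Keying the annotations by URL keeps the join to a single pass over each list; papers that lack an annotation are dropped here, which is a design choice of this sketch rather than something the source states.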




This work was co-funded by the Federal Ministry of Education and Research (BMBF) of Germany for the project LeibnizKILabor (grant no. 01DD20003) and by the European Research Council for the project ScienceGRAPH (Grant agreement ID: 819536).

Author information



Corresponding authors

Correspondence to Salomon Kabongo, Jennifer D’Souza or Sören Auer.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Kabongo, S., D’Souza, J., Auer, S. (2021). Automated Mining of Leaderboards for Empirical AI Research. In: Ke, HR., Lee, C.S., Sugiyama, K. (eds) Towards Open and Trustworthy Digital Societies. ICADL 2021. Lecture Notes in Computer Science, vol 13133. Springer, Cham.


  • DOI:

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-91668-8

  • Online ISBN: 978-3-030-91669-5

  • eBook Packages: Computer Science, Computer Science (R0)