Domain-Independent Extraction of Scientific Concepts from Research Articles
We examine the novel task of domain-independent scientific concept extraction from abstracts of scholarly articles and present two contributions. First, we suggest a set of generic scientific concepts that have been identified in a systematic annotation process. This set of concepts is utilised to annotate a corpus of scientific abstracts from 10 domains of Science, Technology and Medicine at the phrasal level in a joint effort with domain experts. The resulting dataset is used in a set of benchmark experiments to (a) provide baseline performance for this task and (b) examine the transferability of concepts between domains. Second, we present a state-of-the-art deep learning baseline. Further, we propose an active learning strategy for an optimal selection of instances from among the various domains in our data. The experimental results show that (1) substantial agreement is achievable by non-experts after consultation with domain experts, (2) the baseline system achieves a fairly high F1 score, and (3) active learning enables us to nearly halve the amount of required training data.
Keywords: Sequence labelling, Information extraction, Scientific articles, Active learning, Scholarly communication, Research knowledge graph
1 Introduction
Scholarly communication as of today is a document-centric process. Research results are usually conveyed in written articles, as a PDF file with text, tables and figures. Automatic indexing of these texts is limited and generally does not access their semantic content. There are thus severe limitations on how current research infrastructures can support scientists in their work: finding relevant research works, comparing them, and compiling summaries is still tedious and error-prone manual work. The accelerating growth in the number of published research papers aggravates this situation.
Knowledge graphs are recognised as an effective approach to facilitate semantic search. For academic search engines, Xiong et al. have shown that exploiting knowledge bases like Freebase can improve search results. However, the introduction of new scientific concepts occurs at a faster pace than knowledge base curation, resulting in a large gap in knowledge base coverage of scientific entities. For example, the task geolocation estimation of photos from the Computer Vision field is present neither in Wikipedia nor in more specialised knowledge bases like the Computer Science Ontology (CSO) or “Papers with code”. Information extraction from text helps to identify emerging entities and to populate knowledge graphs. It is thus a vital first step towards a fine-grained research knowledge graph in which research articles are described and interconnected through entities like tasks, materials, and methods. Our work is motivated by the idea of the automatic construction of a research knowledge graph.
Information extraction from scientific texts, obviously, differs from its general domain counterpart: Understanding a research paper and determining its most important statements demands certain expertise in the article’s domain. Every domain is characterised by its specific terminology and phrasing which is hard to grasp for a non-expert reader. In consequence, extraction of scientific concepts from text would entail the involvement of domain experts and a specific design of an extraction methodology for each scientific discipline – both requirements are rather time-consuming and costly.
At present, a systematic study of these assumptions is missing. We thus present the task of domain-independent scientific concept extraction. We examine the intuition that most research papers share certain core concepts such as the mentions of research tasks or methods. If so, these would allow a domain-independent information extraction system to support populating a research knowledge graph, which does not reach all semantic depths of the analysed article, but still provides some science-specific structure.
In this paper, we introduce a set of common scientific concepts that we find are relevant over a set of 10 examined domains from Science, Technology, and Medicine (STM). These generic concepts have been identified in a systematic, joint effort of domain experts and non-domain experts. The inter-coder agreement is measured to ensure the adequacy and quality of concepts. A set of research abstracts has been annotated using these concepts, and the results are discussed with experts from the corresponding fields. The resulting dataset serves as a basis to train two baseline deep learning classifiers. In particular, we present an active learning approach to reduce the amount of required training data. The systems are evaluated in different experimental setups.
Our main contributions can be summarised as follows: (1) We introduce the novel task of domain-independent scientific concept extraction, which aims at automatically extracting scientific entities in a domain-independent manner. (2) We release a new corpus that comprises 110 abstracts from 10 STM domains annotated at the phrasal level. (3) We present and evaluate a state-of-the-art deep learning approach for this task. Additionally, we employ active learning for an optimal selection of instances, which, to our knowledge, is demonstrated for the first time on scholarly text. We find that strategic instance selection gives us the same performance with only about half of the training data. (4) We release a silver-labelled corpus with 62K automatically annotated Elsevier abstracts under a CC-BY license and 1.2 million extracted unique concepts, comprising 24 domains. (5) We make our corpora and source code publicly available to facilitate further research.
2 Related Work
This section gives a brief overview of existing annotated datasets for scientific information extraction, followed by related work on some exemplary applications for domain-independent information extraction from scientific papers.
2.1 Scientific Corpora
Sentence Level Annotation. Early approaches for semantic structuring of research papers focused on sentences as the basic unit of analysis. This enables, for instance, automatic highlighting of relevant paper passages to enable efficient assessment regarding quality and relevance. Several ontologies have been created that focus on the rhetorical [11, 19], argumentative [31, 46] or activity-based  structure of research papers.
Annotated datasets exist for several domains, e.g. PubMed200k from biomedical randomized controlled trials, NICTA-PIBOSO from evidence-based medicine, Dr. Inventor from Computer Graphics, Core Scientific Concepts (CoreSC) from Chemistry and Biochemistry, Argumentative Zoning (AZ) from Chemistry and Computational Linguistics, and Sentence Corpus from Biology, Machine Learning and Psychology. Most datasets cover only a single domain, while a few cover three domains. Several machine learning methods have been proposed for scientific sentence classification [12, 15, 24, 30].
Phrase Level Annotation. More recent corpora have been annotated at phrasal level (e.g. noun phrases). SciCite  and ACL ARC  are datasets for citation intent classification from Computer Science, Medicine, and Computational Linguistics. ACL RD-TEC  from Computational Linguistics aims at extracting scientific technology and non-technology terms. ScienceIE-17  from Computer Science, Material Sciences, and Physics contains three concepts Process, Task and Material. SciERC  from the machine learning domain contains six concepts Task, Method, Metric, Material, Other-ScientificTerm and Generic. Each corpus covers at most three domains.
Experts vs. Non-experts. The aforementioned datasets were usually annotated by domain experts [2, 12, 20, 26, 31, 32]. In contrast, Teufel et al.  explicitly use non-experts in their annotation tasks, arguing that text understanding systems can use general, rhetorical and logical aspects also when qualifying scientific text. According to this line of thought, more researchers used (presumably cheaper) non-expert annotation as an alternative [8, 15].
Snow et al. provide a study on expert versus non-expert performance for general, non-scientific annotation tasks. They state that about four non-experts (Mechanical Turk workers, in their case) were needed to rival the experts’ annotation quality. However, systems trained on data generated by non-experts were shown to benefit from annotation diversity and to suffer less from annotator bias. A recent study examines the agreement between experts and non-experts for visual concept classification and person recognition in historical video data. For the task of face recognition, training with expert annotations led to an increase of only 1.5% in classification accuracy.
Active Learning in Natural Language Processing (NLP). To the best of our knowledge, active learning has not yet been utilised in classification approaches for scientific text. Recent publications demonstrate the effectiveness of active learning for NLP tasks such as Named Entity Recognition (NER) and sentence classification. Siddhant and Lipton and Shen et al. compare several sampling strategies on NLP tasks and show that Maximum Normalized Log-Probability (MNLP) based on uncertainty sampling performs well in NER.
2.2 Applications for Domain-Independent Scientific Information Extraction
Academic Search Engines. Academic search engines such as Google Scholar , Microsoft Academic  and Semantic Scholar  specialise in search of scholarly literature. They exploit graph structures such as the Microsoft Academic Knowledge Graph , SciGraph , or the Semantic Scholar Corpus . These graphs interlink the papers through meta-data such as citations, authors, venues, and keywords, but not through deep semantic representation of the articles’ content.
However, first attempts towards a more semantic representation of article content exist: Ammar et al.  interlink the Semantic Scholar Corpus with DBpedia  and Unified Medical Language System (UMLS)  using entity linking techniques. Yaman et al.  connect SciGraph with DBpedia person entities. Xiong et al.  demonstrate that academic search engines can greatly benefit from exploiting general-purpose knowledge bases. However, the coverage of science-specific concepts is rather low .
Research Paper Recommendation Systems. Beel et al.  provide a comprehensive survey about research paper recommendation systems. Such systems usually employ different strategies (e.g. content-based and collaborative filtering) and several data sources (e.g. text in the documents, ratings, feedback, stereotyping). Graph-based systems, in particular, exploit citation graphs and genes mentioned in the papers . Beel et al. conclude that it is not possible to determine the most effective recommendation approach at the moment. However, we believe that a fine-grained research knowledge graph can improve such systems. Although “Papers with code”  is not a typical recommendation system, it allows researchers to browse easily for papers from the field of machine learning that address a certain task.
3 Corpus for Domain-Independent Scientific Concept Extraction
In this section, we introduce the novel task of domain-independent extraction of scientific concepts and present an annotated corpus. As the discussion of related work reveals, the annotation of scientific resources is not a novel task. However, most researchers focus on at most three scientific disciplines and on expert-level annotations. In this work, we explore the domain-independent annotation of lexical phrasal units indicating scientific knowledge, i.e. scientific concepts, in abstracts from ten different science domains. Since other studies have also shown that non-expert annotations are feasible for the scientific domain, we go for a cost-efficient middle course: annotations by non-experts with scientific proficiency, and consultation with domain-experts. Finally, we explore how well a state-of-the-art deep learning model performs on this novel information extraction task and whether active learning can help to reduce the amount of required training data. Our novel corpus and the annotation process are described below.
3.1 OA-STM Corpus
The OA-STM corpus is a set of open access (OA) articles from various domains in Science, Technology and Medicine (STM). It was published in 2017 as a platform for benchmarking methods in scholarly article processing, among them scientific information extraction. The dataset contains a selection of 110 articles from 10 domains, namely Agriculture (Agr), Astronomy (Ast), Biology (Bio), Chemistry (Che), Computer Science (CS), Earth Science (ES), Engineering (Eng), Materials Science (MS), Mathematics (Mat), and Medicine (Med). This first annotation cycle focuses on the articles’ abstracts as they contain a condensed summary of the article.
3.2 Annotation Process
Table 1. The four core scientific concepts that were derived in this study
Process: Natural phenomenon or activities, e.g. growing (Bio), reduction (Mat), flooding (ES)
Method: A commonly used procedure that acts on entities, e.g. powder X-ray (Che), the PRAM analysis (CS), magnetoencephalography (Med)
Material: A physical or abstract entity used in scientific experiments or proofs, e.g. soil (Agr), the moon (Ast), the carbonator (Che)
Data: The data themselves, measurements, or quantitative or qualitative characteristics of entities, e.g. rotational energy (Eng), tensile strength (MS), 3D time-lapse seismic data (ES)
Pre-annotation. A literature review of annotation schemes [2, 11, 30, 31] provided a seed set of potential candidate concepts. Both non-experts independently annotated a subset of the STM abstracts with these concepts (non-overlapping) and discussed the outcome. In a three-step process, the concept set was pruned to contain only those concepts that seemed suitably transferable between domains. Our set of generic scientific concepts consists of Process, Method, Material, and Data (see Table 1 for their definitions). We also identified Task, Object, and Results; however, we do not consider nested span concepts in this study and therefore leave them out, since they were almost always nested with the other scientific entities (e.g. a Result may be nested with Data).
Phase I. Five abstracts per domain (i.e. 50 abstracts) were annotated by both annotators and the inter-annotator agreement was computed using Cohen’s \(\kappa \)  at exact annotated spans. Results showed a moderate inter-annotator agreement of 0.52 \(\kappa \).
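The agreement statistic used above can be illustrated with a minimal sketch. It assumes that both annotators have labelled the same list of items (for instance, the same candidate spans, with "O" for unannotated ones); how the items were aligned in the actual study is not stated here, and the function name and the toy labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items that received identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, derived from each annotator's marginal label counts.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # degenerate case: both annotators used one identical label only
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical toy annotations, not taken from the corpus.
annotator_1 = ["Process", "Method", "Material", "Data", "O", "O"]
annotator_2 = ["Process", "Material", "Material", "Data", "O", "Data"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Because the chance term is subtracted, two annotators who agree on two thirds of the items, as in this toy example, obtain a kappa noticeably below their raw agreement.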
Phase II. The annotations were then presented to subject specialists who each reviewed (a) the choice of concepts and (b) annotation decisions on the respective domain corpus. The interviews mostly confirmed the concept candidates as generally applicable. The experts’ feedback on the annotation was even more valuable: The comments allowed for a more precise reformulation of the annotation guidelines, including illustrating examples from the corpus.
Per-domain and overall inter-annotator agreement (Cohen’s Kappa \(\kappa \)) for Process, Method, Material, and Data scientific concept annotation
3.3 Corpus Characteristics
Characteristics of the annotated corpus (11 abstracts per domain) in terms of size and the number of scientific concept phrases. Rows: Avg. # Tokens/Abstract; # Gold scientific concept phrases; # Unique gold scientific concept phrases
4 Automatic Domain-Independent Scientific Concept Extraction
The current state-of-the-art for scientific entity extraction is Beltagy et al.’s deep learning system with SciBERT word embeddings , which were pre-trained on scientific texts using the BERT  architecture. It consists of three components: (a) a token embedding layer comprising a per-sentence sequence of tokens, where each token is represented as a concatenation of SciBERT word embedding and CNN-based character embeddings , (b) a token-level encoder with two stacked bidirectional LSTMs , and (c) a Conditional Random Field (CRF) based tag decoder  with BILOU (beginning, inside, last, outside, unit) tagging scheme. This deep learning architecture is implemented in AllenNLP  and uses spaCy  for text preprocessing, i.e. for tokenisation and sentence-splitting.
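To make the BILOU scheme concrete, the following minimal sketch converts annotated token spans into per-token tags. The helper name, the inclusive span convention, and the example sentence are assumptions made for illustration; they are not taken from the AllenNLP implementation.

```python
def spans_to_bilou(num_tokens, spans):
    """Convert non-overlapping (start, end_inclusive, label) token spans to BILOU tags."""
    tags = ["O"] * num_tokens  # O: token outside any concept phrase
    for start, end, label in spans:
        if start == end:
            tags[start] = f"U-{label}"      # U: single-token (unit) phrase
        else:
            tags[start] = f"B-{label}"      # B: first token of a phrase
            for i in range(start + 1, end):
                tags[i] = f"I-{label}"      # I: tokens inside a phrase
            tags[end] = f"L-{label}"        # L: last token of a phrase
    return tags

# Hypothetical sentence using concept examples from Table 1.
tokens = ["We", "measure", "the", "tensile", "strength", "of", "soil"]
spans = [(3, 4, "Data"), (6, 6, "Material")]
tags = spans_to_bilou(len(tokens), spans)
# tags == ["O", "O", "O", "B-Data", "L-Data", "O", "U-Material"]
```

The CRF decoder then predicts one such tag per token, and phrases are recovered by reading off B...L and U segments.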
4.1 Supervised Learning with Full Training Dataset
Using the above-mentioned architecture, we train one model with data from all domains combined. We refer to this model as the domain-independent classifier. Similarly, we train one model for each of the 10 domains in our corpus; we refer to these as the domain-specific classifiers.
To obtain a robust evaluation of models, we perform five-fold cross-validation experiments. In each fold experiment, we train a model on 8 abstracts per domain (i.e. 80 abstracts), tune hyperparameters on 1 abstract per domain (i.e. 10 abstracts), and test on the remaining 2 abstracts per domain (i.e. 20 abstracts) ensuring that the data splits are not identical between the folds. All results reported in the paper are averaged over the five folds. We still obtain reliably trained domain-specific classifiers since on average they are trained on 400 concepts.
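The split sizes above can be reproduced with a small sketch. How the abstracts were actually assigned to folds is not specified, so the rotation used here is only one possible assumption; the function name and the abstract identifiers are hypothetical.

```python
def make_folds(abstracts_by_domain, n_folds=5, n_test=2, n_dev=1):
    """Build per-fold train/dev/test splits that are stratified by domain."""
    folds = []
    for k in range(n_folds):
        train, dev, test = [], [], []
        for domain, abstracts in sorted(abstracts_by_domain.items()):
            # Rotate the list so that every fold tests on a different pair of abstracts.
            offset = (k * n_test) % len(abstracts)
            rotated = abstracts[offset:] + abstracts[:offset]
            test.extend(rotated[:n_test])
            dev.extend(rotated[n_test:n_test + n_dev])
            train.extend(rotated[n_test + n_dev:])
        folds.append({"train": train, "dev": dev, "test": test})
    return folds

domains = ["Agr", "Ast", "Bio", "Che", "CS", "ES", "Eng", "MS", "Mat", "Med"]
# Placeholder identifiers for the 11 abstracts per domain.
corpus = {d: [f"{d}-{i:02d}" for i in range(11)] for d in domains}
folds = make_folds(corpus)
```

With 11 abstracts per domain, each fold contains 80 training, 10 development, and 20 test abstracts, and under this rotation the test sets of the five folds do not overlap.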
4.2 Active Learning with Training Data Subset
In this setting, we employ an active learning strategy [42, 49] to train a new domain-independent classifier. Active learning is usually applied to determine the optimal set of sufficiently distinct instances to minimise annotation costs. With our application of active learning, we determine which proportion of our annotations suffices for training a robust classifier. We decide to use the MNLP sampling strategy. We prefer it over its contemporary, Bayesian Active Learning by Disagreement (BALD), since it has lower computational requirements. The MNLP objective involves greedy sampling of sentences, preferring those with the lowest log-likelihood of the predicted tag sequence output by the CRF tag decoder, normalised by the number of tokens to avoid preferring longer sentences. In our experiments, we found that adding 4% of the data per iteration was the most discriminative setting for classifier performance. Therefore, we run 25 iterations of active learning, adding 4% of the training data in each iteration. We perform five-fold cross-validation as before, and the per-fold models are retrained after data resampling.
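The MNLP selection step can be sketched as follows. The sketch assumes that the log-probability of the predicted tag sequence is already available for every unlabelled sentence (in practice it would come from the CRF decoder); the function names and the pool values are hypothetical.

```python
def mnlp_score(sequence_log_prob, num_tokens):
    """Log-probability of the predicted tag sequence, normalised by sentence length."""
    return sequence_log_prob / num_tokens

def select_batch(candidates, batch_size):
    """Greedily pick the sentences the model is least confident about.

    candidates: list of (sentence_id, sequence_log_prob, num_tokens).
    """
    # Lowest normalised log-probability first, i.e. most uncertain first.
    ranked = sorted(candidates, key=lambda c: mnlp_score(c[1], c[2]))
    return [sentence_id for sentence_id, _, _ in ranked[:batch_size]]

# Hypothetical pool: (id, log-probability of the best tag sequence, token count).
pool = [("s1", -2.0, 10), ("s2", -6.0, 12), ("s3", -1.8, 3), ("s4", -0.4, 8)]
picked = select_batch(pool, batch_size=2)
# picked == ["s3", "s2"]
```

Without the length normalisation, the long sentences s2 and s1 would be ranked first simply because their raw log-probabilities are lower; with it, the short but uncertain sentence s3 is selected before them.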
5 Experimental Results and Discussion
In this section, we discuss the results obtained with our trained classifiers and the correlation analysis between inter-annotator agreement and performance of the classifiers.
5.1 Domain-Independent and Domain-Specific Classifiers: Full Training Dataset
The domain-independent classifier results in terms of Precision (P), Recall (R), and F1-score on scientific concepts, respectively, and Overall
Next, we compare and contrast the 10 domain-specific classifiers (see Fig. 1) by their capability to extract the concepts from their own domains and in other domains.
Most Robust Domain. Bio (third bar in each domain in Fig. 1) extracts scientific concepts from its own domain at the same performance as the domain-independent classifier with an F1 score of 71% (±9.0) demonstrating a robust domain. It comprises only 11% of the overall data, yet the domain-independent classifier trained on all data does not outperform it.
Most Specialised Domain. Mat (the second last bar in each domain in Fig. 1) shows the lowest performance in extracting scientific concepts from all domains except itself. Hence, it appears to be the most specialised domain in our corpus. Notably, a characteristic feature of this domain is that it has short abstracts (nearly a third of the size of the longest abstracts), so it is also the most underrepresented in our corpus. Also, distinct from the other domains, Mat has triple the number of Data entities compared to each of its other concepts, whereas in the other domains Process and Material are consistently predominant.
Medical and Life Science Domains. The Med, Agr, and Bio domains show strong domain relatedness. Their respective domain-specific classifiers show top five system performances among the three domains, when applied to another domain. For instance, the Med domain shows the strongest domain relatedness and is classified best by Med (last bar), followed by Bio (third bar) and Agr (first bar).
Domain-Independent vs. Domain-Specific Classifier. Except for Bio, the domain-independent classifier clearly outperforms the domain-specific ones in extracting concepts from their respective domains. We attribute this, in part, to the improved span-detection performance. Span detection relies merely on syntactic regularity, so the domain-independent classifier can benefit from more training data from other domains. For example, the CS classifier improves from an F1 score of 49.5% in the domain-specific setting to 65.9% in the domain-independent setting, which is supported by the enhanced span-detection performance from 73.4% to 82.0% in F1. Token-level performance also improves from 67.7% to 77.5% F1 for CS, that is, correct labelling of the tokens also benefits from other domains. This is also supported by the results in the confusion matrix depicted in Fig. 2 for the CS and the domain-independent classifier on token level.
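The difference between concept extraction and pure span detection can be illustrated with a short sketch. It assumes exact-match scoring over (start, end, label) spans, which may differ in detail from the evaluation script actually used; the function names and the toy spans are hypothetical.

```python
def prf1(gold, predicted):
    """Precision, recall and F1 over two collections of hashable items."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # items found in both gold and prediction
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

def unlabelled(spans):
    """Drop the concept label so that only span boundaries are compared."""
    return {(start, end) for start, end, _ in spans}

# Hypothetical gold and predicted spans: (start token, end token, concept).
gold = [(0, 1, "Method"), (4, 4, "Material"), (7, 9, "Data")]
pred = [(0, 1, "Process"), (4, 4, "Material"), (7, 8, "Data")]

concept_scores = prf1(gold, pred)                       # span and label must match
span_scores = prf1(unlabelled(gold), unlabelled(pred))  # span boundaries only
```

In this toy case the first predicted phrase has the right boundaries but the wrong label, so it counts for span detection but not for concept extraction, which is why span-detection scores are an upper bound on the labelled scores.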
5.2 Domain-Independent Classifier with Active Learning
The results of the active learning experiment over the full dataset plotted over the 25 iterations are depicted in Fig. 4, showing that MNLP clearly outperforms the random baseline. While using only 52% of the training data, the best result of the domain-independent classifier trained with all training data is surpassed with an F1 score of 65.5% (±1.0). The random baseline achieves an F1 score of only 62.5% (±2.6) with the same proportion of training data. When 76% of the data are sampled by MNLP, the best active learning performance across all steps is achieved with an F1 score of 69.0% on the validation set, having the best F1 of 66.4% (±2.0) on the test set. Thus, 76% of our annotated sentences suffice to train an optimal performing model.
Analysing the distribution of sentences in the training data sampled by MNLP shows Mat and CS as the most preferred domains and Eng and MS as the least preferred ones. Nonetheless, all domains are represented; that is, a non-uniform mix of sentences sampled by MNLP yields the most generic model with less training data. In contrast, the random sampling strategy samples sentences uniformly from all domains.
Performance of active learning with MNLP and random sampling strategy for the fraction of training data when the performance with entire training dataset is achieved; for SciERC and ScienceIE-17 results are reported across 5 random restarts
5.3 Correlations Between Inter-annotator Agreement and Performance
In this section, we analyse the correlations (Pearson’s R) of inter-coder agreement \(\kappa \) and the number of annotated concepts per domain (#) on (1) the performance F1 and (2) variance resp. standard deviation (std) of the classifiers across five-fold cross validation.
Inter-annotator agreement (\(\kappa \)) and the number of concept phrases (#) per domain; F1 and std of domain-specific classifiers on their domains; F1 and std of domain-independent and AL-trained classifier on each domain; the right side depicts correlation coefficients (R) of each row with \(\kappa \) and the number of concept phrases
The correlation values for the variance are different between the classifier types. For the domain-specific classifier the correlation between \(\kappa \) and std, and the number of concepts per domain and std are slightly positive (R 0.29, 0.28), i.e. the higher the agreement and the size of the domain, the higher the variance of the domain-specific classifier. For the domain-independent classifier, there is no correlation (R 0.11, −0.05) and for the AL-trained classifier, the correlations become negative (R −0.41, −0.72), i.e. higher agreement and more annotated concepts per domain lead to less variance for the AL-trained classifier. In summary, we hypothesise that more diverse training data from several domains lead to better performance and lower variance by introducing an inductive bias.
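The correlation coefficients reported in this section are Pearson's R values. As a minimal sketch of the computation, the snippet below correlates per-domain agreement with per-domain standard deviation; the numbers are invented placeholders and do not come from the results table.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of co-deviations and squared deviations (the 1/n factors cancel out).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values for four domains (not the reported results).
kappa_per_domain = [0.6, 0.7, 0.8, 0.9]
std_per_domain = [4.0, 3.0, 3.5, 2.0]
r = pearson_r(kappa_per_domain, std_per_domain)
```

A negative value, as in this toy example, corresponds to the pattern described for the AL-trained classifier: domains with higher agreement show lower variance across folds.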
6 Conclusions
In this paper, we have introduced the novel task of domain-independent concept extraction from scientific texts. During a systematic annotation procedure involving domain experts, we have identified four general core concepts that are relevant across the domains of Science, Technology and Medicine. To enable and foster research on these topics, we have annotated a corpus for the domains. We have verified the adequacy of the concepts by evaluating the human annotator agreement for our broad STM domain corpus. The results indicate that the identification of the generic concepts in a corpus covering 10 different scholarly domains is feasible by non-experts with moderate agreement and after consultation of domain experts with substantial agreement (0.76 \(\kappa \)).
We evaluated a state-of-the-art system on our annotated corpus, which achieved a fairly high F1 score (65.5% overall). The domain-independent system noticeably outperforms the domain-specific systems, which indicates that the model can generalise well across domains. We also observed a strong correlation between the number of annotated concepts per domain and classifier performance, and only a weak correlation between inter-annotator agreement per domain and the performance. We assume that more annotated data positively influence the performance in the respective domain.
Furthermore, we have suggested active learning for our novel task. We have shown that only approx. 5 annotated abstracts per domain serving as training data are sufficient to build a performant model. Our active learning results for SciERC  and ScienceIE17  datasets were similar. The promising results suggest that we do not need a large annotated dataset for scientific information extraction. Active learning can significantly save annotation costs and enable fast adaptation to new domains.
We make our annotated corpus, a silver-labelled corpus with 62K abstracts comprising 24 domains, and source code publicly available. Thereby, we hope to facilitate research on the task of scientific information extraction and its several applications, e.g. academic search engines or research paper recommendation systems.
In the future, we plan to extend and refine the concepts for certain domains. We also intend to apply and evaluate our automatic scientific concept extraction system to expand an open research knowledge graph. For this purpose, we plan to extend the corpus with additional relevant annotation layers such as coreference links and relations [16, 32].
References
1. Ammar, W., et al.: Construction of the literature graph in semantic scholar. In: NAACL-HLT (2018)
2. Augenstein, I., Das, M., Riedel, S., Vikraman, L., McCallum, A.: Semeval 2017 task 10: Scienceie - extracting keyphrases and relations from scientific publications. In: SemEval@ACL (2017)
5. Beltagy, I., Lo, K., Cohan, A.: SciBERT: pretrained language model for scientific text. In: EMNLP (2019)
6. Bodenreider, O.: The unified medical language system (UMLS): integrating biomedical terminology. Nucleic Acids Res. 32(Database issue), D267-70 (2004)
7. Bornmann, L., Mutz, R.: Growth rates of modern science: a bibliometric analysis based on the number of publications and cited references. J. Assoc. Inf. Sci. Technol. 66(11), 2215–2222 (2015)
8. Chambers, A.: Statistical models for text classification and clustering: applications and analysis. Ph.D. thesis, University of California, Irvine (2013)
9. Cohan, A., Ammar, W., van Zuylen, M., Cady, F.: Structural scaffolds for citation intent classification in scientific publications. In: NAACL-HLT (2019)
12. Dernoncourt, F., Lee, J.Y.: Pubmed 200k RCT: a dataset for sequential sentence classification in medical abstracts. In: IJCNLP (2017)
13. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. CoRR abs/1810.04805 (2018)
14. Elsevier OA STM Corpus. https://github.com/elsevierlabs/OA-STM-Corpus. Accessed 12 Apr 2019
15. Fisas, B., Saggion, H., Ronzano, F.: On the discoursive structure of computer graphics research papers. In: LAW@NAACL-HLT (2015)
16. Gábor, K., Buscaldi, D., Schumann, A.K., QasemiZadeh, B., Zargayouna, H., Charnois, T.: Semeval-2018 task 7: semantic relation extraction and classification in scientific papers. In: Proceedings of The 12th International Workshop on Semantic Evaluation, pp. 679–688 (2018)
17. Gardner, M., et al.: AllenNLP: a deep semantic natural language processing platform. arXiv preprint arXiv:1803.07640 (2018)
18. Google scholar. https://scholar.google.com/. Accessed 12 Sept 2019
19. Groza, T., Kim, H., Handschuh, S.: Salt: semantically annotated latex. In: SAAW@ISWC (2006)
20. Handschuh, S., Zadeh, B.Q.: The ACL RD-TEC: a dataset for benchmarking terminology extraction and classification in computational linguistics. In: COLING 2014: 4th International Workshop on Computational Terminology (2014)
22. Houlsby, N., Huszar, F., Ghahramani, Z., Lengyel, M.: Bayesian active learning for classification and preference learning. CoRR abs/1112.5745 (2011)
23. Jaradeh, M.Y., et al.: Open research knowledge graph: next generation infrastructure for semantic scholarly knowledge. In: K-CAP 2019 (2019)
24. Jin, D., Szolovits, P.: Hierarchical neural networks for sequential sentence classification in medical scientific abstracts. In: EMNLP (2018)
26. Kim, S., Martínez, D., Cavedon, L., Yencken, L.: Automatic classification of sentences to support evidence based medicine. In: BMC Bioinformatics (2011)
28. Lee, K., He, L., Lewis, M., Zettlemoyer, L.S.: End-to-end neural coreference resolution. In: EMNLP (2017)
31. Liakata, M., Teufel, S., Siddharthan, A., Batchelor, C.R.: Corpora for the conceptualisation and zoning of scientific papers. In: LREC (2010)
32. Luan, Y., He, L., Ostendorf, M., Hajishirzi, H.: Multi-task identification of entities, relations, and coreference for scientific knowledge graph construction. In: EMNLP (2018)
33. Ma, X., Hovy, E.H.: End-to-end sequence labeling via bi-directional LSTM-CNNS-CRF. CoRR abs/1603.01354 (2016)
34. Microsoft Academic. https://academic.microsoft.com/home. Accessed 12 Sept 2019
35. Microsoft Academic Knowledge Graph. http://ma-graph.org/. Accessed 12 Sept 2019
36. Papers with code. https://paperswithcode.com/. Accessed 12 Sept 2019
38. Pustu-Iren, K., et al.: Investigating correlations of inter-coder agreement and machine annotation performance for historical video data. In: TPDL (2019)
39. Salatino, A.A., Thanapalasingam, T., Mannocci, A., Osborne, F., Motta, E.: The computer science ontology: a large-scale taxonomy of research areas. In: International Semantic Web Conference (2018)
40. Semantic scholar. https://www.semanticscholar.org/. Accessed 12 Sept 2019
41. Shen, Y., Yun, H., Lipton, Z.C., Kronrod, Y., Anandkumar, A.: Deep active learning for named entity recognition. In: ICLR (2017)
42. Siddhant, A., Lipton, Z.C.: Deep Bayesian active learning for natural language processing: results of a large-scale empirical study. In: EMNLP (2018)
43. Snow, R., O’Connor, B.T., Jurafsky, D., Ng, A.Y.: Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. In: EMNLP (2008)
44. spaCy: Industrial-strength natural language processing. http://www.spacy.io. Accessed 02 Sep 2019
45. Springer Nature SciGraph. https://www.springernature.com/gp/researchers/scigraph. Accessed 12 Sept 2019
46. Teufel, S., Siddharthan, A., Batchelor, C.: Towards discipline-independent argumentative zoning: evidence from chemistry and computational linguistics. In: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3, vol. 3, pp. 1493–1502. Association for Computational Linguistics (2009)
47. Xiong, C., Power, R., Callan, J.P.: Explicit semantic ranking for academic search via knowledge graph embedding. In: WWW (2017)
48. Yaman, B., Pasin, M., Freudenberg, M.: Interlinking SciGraph and DBpedia datasets using link discovery and named entity recognition techniques. In: LDK (2019)
49. Zhang, Y., Lease, M., Wallace, B.C.: Active discriminative text representation learning. In: AAAI (2016)