A semantic annotation framework for scientific publications

Quality & Quantity

Abstract

As the volume of scientific literature grows, techniques that automatically detect informational entities in research articles can help extend scientific knowledge and support practical applications. Although several efforts have extracted informative entities from patents and biomedical research articles, few have addressed other scientific literature. In this paper, we introduce an automatic semantic annotation framework for research articles based on entity recognition techniques. Our approach comprises tag set modeling for semantic annotation, a semi-automatic annotation tool, manual annotation to prepare training data, and supervised machine learning to develop an entity type recognition module. For the experiments, we chose two domains, information and communication technology and chemical engineering, because of their high usage. In addition, we present three application scenarios showing how the annotation framework can be used and extended. These scenarios are intended to guide researchers who wish to link their own content with external data.
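To make the training-data preparation step more concrete, the following minimal Python sketch shows one common way manually annotated character spans can be turned into token-level BIO labels for a supervised sequence labeler. It is an illustration under stated assumptions, not the framework's actual implementation: the tag name `METHOD`, the helper functions `tokenize` and `to_bio`, and the example sentence are all hypothetical, and the article's real tag set is not shown in this preview.

```python
import re
from typing import List, Tuple

# Hypothetical annotation format: (start_char, end_char, tag).
# The tag names are invented for illustration only.
Span = Tuple[int, int, str]


def tokenize(text: str) -> List[Tuple[str, int, int]]:
    """Split text into (token, start, end) triples for words and punctuation."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]


def to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Label each token B-TAG, I-TAG, or O from non-overlapping spans.

    Assumes every span starts and ends on a token boundary.
    """
    labeled = []
    for token, start, end in tokenize(text):
        label = "O"
        for s, e, tag in spans:
            if start >= s and end <= e:
                # The first token of a span gets B-, later tokens get I-.
                label = ("B-" if start == s else "I-") + tag
                break
        labeled.append((token, label))
    return labeled


sentence = "We train a conditional random field on annotated abstracts."
phrase = "conditional random field"
begin = sentence.index(phrase)
spans = [(begin, begin + len(phrase), "METHOD")]  # hypothetical tag

bio = to_bio(sentence, spans)
for token, label in bio:
    print(f"{token}\t{label}")
```

Each printed line pairs a token with its label, which is the general shape of input that sequence labeling toolkits typically expect, although the exact column layout depends on the toolkit.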

Author information

Corresponding author

Correspondence to Yuchul Jung.

Cite this article

Jung, Y. A semantic annotation framework for scientific publications. Qual Quant 51, 1009–1025 (2017). https://doi.org/10.1007/s11135-016-0369-3

  • DOI: https://doi.org/10.1007/s11135-016-0369-3
