Quality & Quantity, Volume 51, Issue 3, pp 1009–1025

A semantic annotation framework for scientific publications

  • Yuchul Jung


Given the growing volume of scientific literature, techniques that automatically detect informational entities in research articles may help extend scientific knowledge and its practical uses. Although several efforts have extracted informative entities from patents and biomedical research articles, few have addressed other scientific literature. In this paper, we introduce an automatic semantic annotation framework for research articles based on entity recognition techniques. Our approach comprises tag set modeling for semantic annotation, a semi-automatic annotation tool, manual annotation to prepare training data, and supervised machine learning to build an entity type recognition module. For the experiments, we chose two domains, information and communication technology and chemical engineering, because of their high usage. In addition, we present three application scenarios showing how the annotation framework can be used and extended. These scenarios are intended to guide researchers who wish to link their own content with external data.


Entity type recognition · Research article · Structural support vector machine · Semantic annotation · Knowledge construction



Copyright information

© Springer Science+Business Media Dordrecht 2016

Authors and Affiliations

  1. Department of Information Convergence Research, Korea Institute of Science and Technology Information (KISTI), Daejeon, Korea
