Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications

Part of the Lecture Notes in Computer Science book series (LNISA, volume 11799)

Abstract

Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task because the extensive training and test data needed for fine-tuning NER algorithms is typically lacking. Recent approaches have presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited training data size. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific long-tail NER extractors to be trained at low cost, but with continually increasing performance while the algorithm is actively used in an application.
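
The feedback loop described in the abstract can be illustrated with a toy sketch. This is not the Coner implementation: the extractor below is a deliberately naive stand-in heuristic (capitalized tokens that co-occur with a known term), and all function names, the vote format, and the `min_votes` threshold are hypothetical choices made only for illustration. It shows how human relevance votes might stop a heuristic false positive from being fed back into the next training iteration.

```python
def extract_candidates(sentences, known_terms):
    """Toy stand-in for a weakly supervised extractor (assumption, not Coner's):
    propose capitalized tokens that share a sentence with a known term."""
    candidates = set()
    for sentence in sentences:
        tokens = sentence.split()
        if any(tok in known_terms for tok in tokens):
            candidates.update(
                t for t in tokens if t[0].isupper() and t not in known_terms
            )
    return candidates


def apply_feedback(candidates, votes, min_votes=2):
    """Split candidates into kept and rejected sets.

    `votes` maps a term to a list of booleans (True = relevant). A term is
    rejected only if it has at least `min_votes` votes and a strict majority
    of them are negative; terms without enough feedback are kept.
    """
    kept, rejected = set(), set()
    for term in candidates:
        term_votes = votes.get(term, [])
        if len(term_votes) >= min_votes and 2 * sum(term_votes) < len(term_votes):
            rejected.add(term)
        else:
            kept.add(term)
    return kept, rejected


def train_with_feedback(sentences, seeds, votes, iterations=3):
    """Iteratively expand the term set, filtering each round by human votes."""
    known = set(seeds)
    blocked = set()
    for _ in range(iterations):
        # Rejected terms are never proposed again in later iterations.
        candidates = extract_candidates(sentences, known) - blocked
        kept, rejected = apply_feedback(candidates, votes)
        blocked |= rejected
        known |= kept
    return known, blocked
```

For example, with the seed `{"MNIST"}` and the sentences "We evaluate on MNIST and CIFAR" and "CIFAR and ImageNet are benchmarks", the naive heuristic proposes "We" alongside "CIFAR"; passing `votes={"We": [False, False, True]}` keeps "We" out of the term set while "CIFAR" and, one iteration later, "ImageNet" are retained.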

Notes

  1. https://github.com/vliegenthart/coner_interactive_viewer

  2. http://wordnet.princeton.edu/

Author information

Corresponding author

Correspondence to Sepideh Mesbah.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Vliegenthart, D., Mesbah, S., Lofi, C., Aizawa, A., Bozzon, A. (2019). Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications. In: Doucet, A., Isaac, A., Golub, K., Aalberg, T., Jatowt, A. (eds) Digital Libraries for Open Knowledge. TPDL 2019. Lecture Notes in Computer Science, vol 11799. Springer, Cham. https://doi.org/10.1007/978-3-030-30760-8_1

  • DOI: https://doi.org/10.1007/978-3-030-30760-8_1

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-30759-2

  • Online ISBN: 978-3-030-30760-8

  • eBook Packages: Computer Science (R0)