
Improving Language-Dependent Named Entity Detection

  • Gerald Petz
  • Werner Wetzlinger
  • Dietmar Nedbal
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10410)

Abstract

Named Entity Recognition (NER) and Named Entity Linking (NEL) are two research areas that have made significant advances in recent years. Most of this research focuses on English, and some of the resulting improvements are language-dependent: they do not necessarily yield better results when applied to other languages. This paper therefore presents TOMO, an approach to language-aware named entity detection, and evaluates it for German. The evaluation required a German gold standard dataset, which we derived from the English dataset used in the OKE 2016 challenge. We evaluated the named entity detection task on the web-based benchmarking platform GERBIL, and the results show that our approach achieved higher F1 scores than the other annotators. This indicates that language-dependent features improve the overall quality of the spotter.
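The comparison above rests on precision, recall and F1 over detected entity mentions. The following minimal Python sketch shows how such scores can be computed for a single document, assuming each mention is represented as a (start, end) character-offset pair and that only exact span matches count as correct (in the spirit of a strong annotation match). The function name `precision_recall_f1` and the example sentence are illustrative assumptions, not part of TOMO or GERBIL, and the sketch does not reproduce GERBIL's own averaging over a whole dataset.

```python
from typing import Set, Tuple

# Hypothetical representation: a mention is a (start, end) character-offset pair.
Span = Tuple[int, int]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Score predicted mentions against gold mentions using exact span matching."""
    # A prediction is a true positive only if its offsets match a gold span exactly.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    text = "Angela Merkel besuchte Linz."
    gold = {(0, 13), (23, 27)}       # "Angela Merkel", "Linz"
    predicted = {(7, 13), (23, 27)}  # "Merkel" (boundary error), "Linz"
    print(precision_recall_f1(gold, predicted))  # (0.5, 0.5, 0.5)
```

Because `Merkel` alone does not match the gold span `Angela Merkel`, it counts as a false positive under exact matching; a more lenient, overlap-based criterion would accept such a boundary error.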

Keywords

Entity recognition, Entity detection, Language-dependent, Dataset development, Gold standard, NER

Notes

Acknowledgements

This research was supported by HC Solutions GesmbH, Linz, Austria. We express our appreciation to Florian Wurzer, Reinhard Schwab and Manfred Kain for discussing these topics with us.

TOMO Named Entity Linking is part of TOMO® (http://www.tomo-base.at), a big data platform for aggregating, analyzing and visualizing content.


Copyright information

© IFIP International Federation for Information Processing 2017

Authors and Affiliations

  1. University of Applied Sciences Upper Austria, Steyr, Austria
