
Multilingual Fine-Grained Entity Typing

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10318)

Abstract

Many entity recognition approaches classify recognised entities into a limited set of coarse-grained entity types. However, for deeper natural language analysis and end-user tasks, fine-grained entity types are more useful. For example, while standard named entity recognition may determine that an entity is a person, knowing whether that entity is a politician or an actor is important for deciding whether, in a subsequent relation extraction task, a relation should be 'acts' or 'governs'. Currently, fine-grained entity typing has only been investigated for English. In this paper, we present a fine-grained entity typing system for Dutch and Spanish using training data extracted from Wikipedia and DBpedia. Our system achieves performance comparable to that reported for English, with an F1 measure of .90 on over 40 types for both Dutch and Spanish.
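The difference between a coarse-grained label (person) and a fine-grained one (politician, actor) can be made concrete with slash-separated type paths of the kind listed in the notes, such as location/structure. The following is a minimal sketch, assuming each type is a path string whose first segment is the coarse type and whose depth is its level in the hierarchy. The helper names and the example types person/artist/actor and person/political_figure are hypothetical illustrations, not the paper's code or a confirmed part of its type set.

```python
from typing import List


def type_ancestors(type_path: str) -> List[str]:
    """Return every prefix of a slash-separated type path, coarsest first."""
    parts = type_path.strip("/").split("/")
    return ["/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def coarse_type(type_path: str) -> str:
    """First path segment: roughly the label a standard NER system would assign."""
    return type_ancestors(type_path)[0]


def type_level(type_path: str) -> int:
    """Depth in the hierarchy: person -> 1, person/artist -> 2, person/artist/actor -> 3."""
    return len(type_ancestors(type_path))


if __name__ == "__main__":
    # Hypothetical example types, used only for illustration.
    for t in ["person/artist/actor", "person/political_figure", "location/structure"]:
        print(t, "->", coarse_type(t), type_level(t), type_ancestors(t))
```

Both example person types collapse to person at level 1. That collapse is the information a downstream relation extractor loses when it sees only coarse labels.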

Notes

  1. https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/.

  2. https://wordnet.princeton.edu/.

  3. https://dumps.wikimedia.org/backup-index.html.

  4. https://github.com/attardi/wikiextractor.

  5. We used the wikilinks and instance types dumps from the then-latest DBpedia release, version 2016-04: http://wiki.dbpedia.org/downloads-2016-04.

  6. The types we could not map were the following: location/structure/government, organization/stock_exchange, other/health, other/living_thing, other/product/car, other/product/computer, person/education, person/education/student, person/education/teacher.

  7. Although the Spanish DBpedia contains more text, we included only a sample here to show that the approach can be adapted to other languages.

  8. https://github.com/facebookresearch/fastText.

  9. If an entity X has the types location/structure and organization/education assigned to it, two instances are generated, namely (X, location/structure) and (X, organization/education).

  10. The numbers of types at levels 1–3 do not add up to the total number of types, as some higher-level types, such as other, are not present on their own.

  11. dbpedia: is shorthand for http://dbpedia.org/resource/.

  12. dbo: is shorthand for http://dbpedia.org/ontology/.

  13. http://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/downloads/.
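Note 9 describes how an entity with several types yields one training instance per type. A minimal sketch of that expansion follows, written out in the input format of fastText's supervised mode, which reads one example per line with labels marked by the prefix `__label__` (fastText's default). Assumptions: the function names are hypothetical, and the entity string stands in for whatever text (mention, context, or both) the classifier actually sees; the paper's exact feature representation is not given here.

```python
from typing import Iterable, List, Tuple

LABEL_PREFIX = "__label__"  # fastText's default label prefix


def expand_instances(entity: str, types: Iterable[str]) -> List[Tuple[str, str]]:
    """One (entity, type) pair per assigned type, as described in note 9."""
    return [(entity, t) for t in types]


def to_fasttext_line(text: str, type_path: str) -> str:
    """Format one instance as a single-label fastText training line."""
    # fastText splits on whitespace, so a label token must not contain spaces.
    if any(ch.isspace() for ch in type_path):
        raise ValueError(f"type contains whitespace: {type_path!r}")
    return f"{LABEL_PREFIX}{type_path} {text}"


if __name__ == "__main__":
    # "X" is a placeholder entity, as in note 9.
    for entity, t in expand_instances("X", ["location/structure", "organization/education"]):
        print(to_fasttext_line(entity, t))
```

Emitting two single-label lines mirrors the expansion in note 9. fastText would also accept both labels on one line as a multi-label example, which is a different modelling choice.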
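Notes 11 and 12 define the prefixes dbpedia: and dbo:. Expanding such compact names into full IRIs is a simple lookup. The sketch below assumes the standard DBpedia namespace IRIs, each ending in a slash, and returns names without a known prefix unchanged. The function name `expand_curie` is hypothetical.

```python
from typing import Dict

# Namespace IRIs for the two prefixes used in the notes (assumed to end in "/").
PREFIXES: Dict[str, str] = {
    "dbpedia": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


def expand_curie(name: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local' to a full IRI; leave unknown or unprefixed names unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + local
    return name


if __name__ == "__main__":
    print(expand_curie("dbo:Politician"))
    print(expand_curie("dbpedia:Amsterdam"))
    print(expand_curie("http://example.org/x"))  # "http" is not a known prefix
```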

Acknowledgements

The research for this paper was made possible by the CLARIAH-CORE project financed by NWO.

Author information

Corresponding author

Correspondence to Marieke van Erp.

Appendix A: Results

Table 5. Precision, recall and F1 scores on the overall datasets (macro-averaged) and per class.
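Table 5 reports per-class scores and macro averages over classes. To show how macro-averaging works, the sketch below computes per-class precision, recall and F1 from gold and predicted labels and then takes the unweighted mean over classes. It assumes single-label predictions and that macro F1 is the mean of per-class F1 scores. Some work instead derives F1 from macro-averaged precision and recall, and the caption does not say which variant was used. The function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Scores = Dict[str, Tuple[float, float, float]]  # class -> (precision, recall, F1)


def per_class_prf(gold: Sequence[str], pred: Sequence[str]) -> Scores:
    """Per-class precision, recall and F1 for single-label classification."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[g] += 1  # gold class g was missed
    scores: Scores = {}
    for c in sorted(set(gold) | set(pred)):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        scores[c] = (prec, rec, f1)
    return scores


def macro_average(scores: Scores) -> Tuple[float, float, float]:
    """Unweighted mean of each metric over classes; every class counts equally."""
    if not scores:
        raise ValueError("no classes to average")
    n = len(scores)
    p = sum(s[0] for s in scores.values()) / n
    r = sum(s[1] for s in scores.values()) / n
    f = sum(s[2] for s in scores.values()) / n
    return p, r, f


if __name__ == "__main__":
    gold = ["person", "person", "location"]
    pred = ["person", "location", "location"]
    print(macro_average(per_class_prf(gold, pred)))
```

Because every class counts equally, rare fine-grained types influence a macro average as much as frequent ones do, unlike a micro average computed over pooled counts.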

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

van Erp, M., Vossen, P. (2017). Multilingual Fine-Grained Entity Typing. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer, Cham. https://doi.org/10.1007/978-3-319-59888-8_23

  • DOI: https://doi.org/10.1007/978-3-319-59888-8_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59887-1

  • Online ISBN: 978-3-319-59888-8

  • eBook Packages: Computer Science, Computer Science (R0)
