Semantic Web Evaluation Challenge

Semantic Web Evaluation Challenges, pp. 16–27

CETUS – A Baseline Approach to Type Extraction

  • Michael Röder
  • Ricardo Usbeck
  • René Speck
  • Axel-Cyrille Ngonga Ngomo
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 548)

Abstract

The concurrent growth of the Document Web and the Data Web demands accurate information extraction tools to bridge the gap between the two. In particular, extracting knowledge about real-world entities is indispensable for populating knowledge bases on the Web of Data. Here, we focus on recognizing the types of entities, both to populate knowledge bases and to enable subsequent knowledge extraction steps. We present CETUS, a baseline approach to entity type extraction. CETUS is based on a three-step pipeline comprising (i) offline, knowledge-driven extraction of type patterns from natural-language corpora based on grammar rules, (ii) an analysis of the input text to extract types, and (iii) the mapping of the extracted type evidence to a subset of the DOLCE+DnS Ultra Lite ontology classes. We implement and compare two approaches for the third step, one using the YAGO ontology and one using the FOX entity recognition tool.
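To make the shape of steps (ii) and (iii) concrete, the following is a minimal sketch and not the CETUS implementation. It assumes a single hand-written "X is a Y" pattern in place of the grammar-derived patterns of step (i), and a small hypothetical lookup table in place of the YAGO-based or FOX-based mapping. The names `TYPE_PATTERN`, `HEAD_TO_DUL`, `extract_type_phrase`, `map_to_dul` and `type_entity` are illustrative only.

```python
import re

# Assumption: one hand-written copula pattern stands in for the type
# patterns that the abstract says are extracted offline from corpora.
TYPE_PATTERN = re.compile(
    r"^(?P<entity>.+?)\s+(?:is|was)\s+(?:an?|the)\s+(?P<type>[^.,;]+)",
    re.IGNORECASE,
)

# Assumption: a toy noun-to-class table stands in for the mapping step.
# The class names are DOLCE+DnS Ultra Lite classes; the nouns are made up.
HEAD_TO_DUL = {
    "city": "dul:Place",
    "village": "dul:Place",
    "company": "dul:Organization",
    "university": "dul:Organization",
    "painter": "dul:Person",
    "scientist": "dul:Person",
}


def extract_type_phrase(sentence, entity):
    """Return the phrase following '<entity> is a/an/the', or None."""
    match = TYPE_PATTERN.match(sentence.strip())
    if match is None:
        return None
    if match.group("entity").strip().lower() != entity.lower():
        return None
    return match.group("type").strip()


def map_to_dul(type_phrase):
    """Return the class of the first known noun in the phrase, or None."""
    for token in type_phrase.lower().split():
        if token in HEAD_TO_DUL:
            return HEAD_TO_DUL[token]
    return None


def type_entity(sentence, entity):
    """Run extraction and mapping; return (type phrase, class) or (None, None)."""
    phrase = extract_type_phrase(sentence, entity)
    if phrase is None:
        return None, None
    return phrase, map_to_dul(phrase)


if __name__ == "__main__":
    print(type_entity("Leipzig is a city in Saxony, Germany.", "Leipzig"))
```

Keeping extraction and mapping as separate functions mirrors the pipeline described above: the mapping function can be swapped for a different strategy without touching the extraction step.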

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Michael Röder (1)
  • Ricardo Usbeck (1)
  • René Speck (1)
  • Axel-Cyrille Ngonga Ngomo (1)

  1. University of Leipzig, Leipzig, Germany
