CETUS – A Baseline Approach to Type Extraction

Röder, Michael; Usbeck, Ricardo; Speck, René; Ngomo, Axel-Cyrille Ngonga

doi:10.1007/978-3-319-25518-7_2

Part of the book series: Communications in Computer and Information Science (CCIS, volume 548)

Included in the following conference series:

Semantic Web Evaluation Challenges

Abstract

The concurrent growth of the Document Web and the Data Web demands accurate information extraction tools to bridge the gap between the two. In particular, the extraction of knowledge on real-world entities is indispensable for populating knowledge bases on the Web of Data. Here, we focus on the recognition of entity types to populate knowledge bases and enable subsequent knowledge extraction steps. We present CETUS, a baseline approach to entity type extraction. CETUS is based on a three-step pipeline comprising (i) offline, knowledge-driven type pattern extraction from natural-language corpora based on grammar rules, (ii) an analysis of the input text to extract types, and (iii) the mapping of the extracted type evidence to a subset of the DOLCE+DnS Ultra Lite ontology classes. We implement and compare two approaches for the third step: one based on the YAGO ontology and one based on the FOX entity recognition tool.
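The three-step pipeline can be illustrated with a deliberately small sketch. It is not the CETUS implementation: the single regular expression is a hypothetical stand-in for the grammar rules of steps (i) and (ii), the coarse labels PER, LOC and ORG are assumed to come from an external entity recognizer (in the spirit of the FOX-based variant of step (iii)), and their mapping to dul:Person, dul:Place and dul:Organization is an assumption. Treating the full type phrase as a subclass of its head noun is likewise an illustrative choice. The ex: and dul: prefixes follow notes 4 and 8; rdf: and rdfs: are the standard RDF and RDF Schema prefixes. All names (TYPE_PATTERN, COARSE_TO_DUL, type_triples, ...) are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical, much-simplified stand-in for a grammar rule:
# "<entity> is|was a|an|the <modifiers> <head noun>", where the type phrase
# ends before a preposition, conjunction, relative pronoun or punctuation.
TYPE_PATTERN = re.compile(
    r"\b(?:is|was)\s+(?:a|an|the)\s+"
    r"([\w-]+(?:\s+[\w-]+)*?)"
    r"(?=\s+(?:of|in|and|or|who|which|that|from|for|at|by|with)\b|\s*[.,;:]|\s*$)"
)
# Cardinal/ordinal tokens such as "3" or "44th" are dropped from type labels.
NUMBER_TOKEN = re.compile(r"\d+(?:st|nd|rd|th)?")

# Assumed mapping from coarse recognizer labels to DUL classes.
COARSE_TO_DUL = {"PER": "dul:Person", "LOC": "dul:Place", "ORG": "dul:Organization"}


def extract_type_tokens(sentence: str) -> Optional[List[str]]:
    """Return the tokens of the first type phrase in the sentence, or None."""
    match = TYPE_PATTERN.search(sentence)
    if match is None:
        return None
    tokens = [t for t in match.group(1).split() if not NUMBER_TOKEN.fullmatch(t)]
    return tokens or None


def to_class(tokens: List[str]) -> str:
    """Build an ex: class name by upper-casing the first letter of each token."""
    return "ex:" + "".join(t[:1].upper() + t[1:] for t in tokens)


def type_triples(entity: str, sentence: str, coarse_label: str) -> List[Triple]:
    """Type the entity with the extracted phrase and link its head noun to DUL."""
    tokens = extract_type_tokens(sentence)
    if tokens is None:
        return []
    full_type, head_type = to_class(tokens), to_class(tokens[-1:])
    triples: List[Triple] = [("ex:" + entity.replace(" ", "_"), "rdf:type", full_type)]
    if full_type != head_type:
        # The full phrase is treated as a subclass of its head noun.
        triples.append((full_type, "rdfs:subClassOf", head_type))
    dul_class = COARSE_TO_DUL.get(coarse_label)
    if dul_class is not None:
        triples.append((head_type, "rdfs:subClassOf", dul_class))
    return triples


if __name__ == "__main__":
    sentence = "Leonhard Euler was a pioneering Swiss mathematician and physicist."
    for triple in type_triples("Leonhard Euler", sentence, "PER"):
        print(*triple)
    # ex:Leonhard_Euler rdf:type ex:PioneeringSwissMathematician
    # ex:PioneeringSwissMathematician rdfs:subClassOf ex:Mathematician
    # ex:Mathematician rdfs:subClassOf dul:Person
```

A regular expression keeps the sketch self-contained; a real grammar (and part-of-speech information such as the ADJ and CD categories mentioned in note 6) would be needed to handle the variety of phrasings found in natural-language corpora.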

Notes

1.
http://nlp.cs.rpi.edu/kbp/2014/.
2.
http://www.scc.lancs.ac.uk/microposts2015/.
3.
http://2015.eswc-conferences.org/important-dates/call-OKEC.
4.
See http://stlab.istc.cnr.it/stlab/WikipediaOntology/. Throughout this paper, we use the prefix dul for types of this ontology.
5.
http://www.antlr.org/.
6.
Abbreviations in Listing 1.1: ADJ = adjective, CD = cardinal number.
7.
The complete grammar can be found in the project's source code repository.
8.
The prefix rdfs stands for http://www.w3.org/2000/01/rdf-schema#, while the prefix ex can stand for any user-defined vocabulary, e.g., http://example.com/.
9.
This mapping can be found inside the git repository of the project at https://github.com/AKSW/Cetus/blob/master/DOLCE_YAGO_links.nt.
10.
Throughout this paper, we use the prefix yago for http://yago-knowledge.org/resource/.
11.
The results of the challenge can be found at https://github.com/anuzzolese/oke-challenge#results.
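Notes 9 and 10 concern the YAGO-based variant of step (iii), which relies on links between YAGO classes and DOLCE+DnS Ultra Lite classes. One plausible way to use such links, assuming the extracted type has already been matched to a YAGO class, is to climb the YAGO class hierarchy until a class with a DUL link is reached. The sketch below does this with a breadth-first search so that the closest linked ancestor wins. The taxonomy excerpt, the link table and the function name nearest_dul_class are hypothetical and are not taken from the CETUS code base; the class names are illustrative, not real YAGO identifiers.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical excerpt of a YAGO-like taxonomy: class -> direct superclasses.
SUPERCLASSES: Dict[str, List[str]] = {
    "yago:Mathematician": ["yago:Scientist"],
    "yago:Scientist": ["yago:Person"],
    "yago:Person": ["yago:PhysicalEntity"],
    "yago:City": ["yago:Location"],
}

# Hypothetical YAGO -> DUL links, playing the role of a mapping file such as
# DOLCE_YAGO_links.nt (its actual predicates and link direction are not assumed).
DUL_LINKS: Dict[str, str] = {
    "yago:Person": "dul:Person",
    "yago:Location": "dul:Place",
}


def nearest_dul_class(
    yago_class: str,
    superclasses: Dict[str, List[str]],
    links: Dict[str, str],
) -> Optional[str]:
    """Walk up the taxonomy breadth-first and return the DUL class linked to the
    closest ancestor (the class itself included), or None if none is linked."""
    queue = deque([yago_class])
    seen = {yago_class}
    while queue:
        current = queue.popleft()
        if current in links:
            return links[current]
        for parent in superclasses.get(current, []):
            if parent not in seen:  # guards against cycles and shared ancestors
                seen.add(parent)
                queue.append(parent)
    return None


if __name__ == "__main__":
    print(nearest_dul_class("yago:Mathematician", SUPERCLASSES, DUL_LINKS))  # dul:Person
    print(nearest_dul_class("yago:City", SUPERCLASSES, DUL_LINKS))  # dul:Place
    print(nearest_dul_class("yago:PhysicalEntity", SUPERCLASSES, DUL_LINKS))  # None
```

Breadth-first order means that, when a class has several linked ancestors, the one reachable in the fewest subClassOf steps is preferred; ties are broken by the order in which superclasses are listed.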


Acknowledgements

This work has been supported by the FP7 project GeoKnow (GA No. 318159) and the BMWI Project SAKE (Project No. 01MD15006E).

Author information

Authors and Affiliations

University of Leipzig, Leipzig, Germany
Michael Röder, Ricardo Usbeck, René Speck & Axel-Cyrille Ngonga Ngomo

Corresponding author

Correspondence to Michael Röder.

Editor information

Editors and Affiliations

Inria, Sophia Antipolis, France
Fabien Gandon
INRIA Sophia-Antipolis Méditerranée, Sophia Antipolis, France
Elena Cabrio
Université Paris-Sorbonne, Paris, France
Milan Stankovic
École des Mines de Saint-Étienne, Saint-Étienne, France
Antoine Zimmermann

About this paper

Cite this paper

Röder, M., Usbeck, R., Speck, R., Ngomo, AC.N. (2015). CETUS – A Baseline Approach to Type Extraction. In: Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds) Semantic Web Evaluation Challenges. SemWebEval 2015. Communications in Computer and Information Science, vol 548. Springer, Cham. https://doi.org/10.1007/978-3-319-25518-7_2

DOI: https://doi.org/10.1007/978-3-319-25518-7_2
Published: 07 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25517-0
Online ISBN: 978-3-319-25518-7
eBook Packages: Computer Science, Computer Science (R0)