CETUS – A Baseline Approach to Type Extraction

  • Conference paper
Semantic Web Evaluation Challenges (SemWebEval 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 548)

Abstract

The concurrent growth of the Document Web and the Data Web demands accurate information extraction tools to bridge the gap between the two. In particular, extracting knowledge about real-world entities is indispensable for populating knowledge bases on the Web of Data. Here, we focus on recognizing the types of entities, both to populate knowledge bases and to enable subsequent knowledge extraction steps. We present CETUS, a baseline approach to entity type extraction. CETUS is based on a three-step pipeline comprising (i) offline, knowledge-driven extraction of type patterns from natural-language corpora based on grammar rules, (ii) an analysis of the input text to extract types, and (iii) the mapping of the extracted type evidence to a subset of the DOLCE+DnS Ultra Lite ontology classes. We implement and compare two approaches for the third step, one using the YAGO ontology and one using the FOX entity recognition tool.
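
As a rough illustration of the three-step pipeline outlined in the abstract, the following Python sketch types the subject of a simple defining sentence. It is not the CETUS implementation. The single regular expression is a hand-written stand-in for the patterns that step (i) would extract offline, and the noun-to-class table is a hypothetical placeholder for the YAGO- or FOX-based mapping of step (iii). The names TYPE_PATTERN, NOUN_TO_DUL, extract_type_phrase, map_to_dul and type_entity are assumptions made for this example.

```python
import re

# Step (i), assumed: one hand-written "<entity> is/was a/an/the <type phrase>"
# pattern standing in for patterns extracted offline from a corpus.
TYPE_PATTERN = re.compile(
    r"^(?P<entity>.+?)\s+(?:is|was)\s+(?:an|a|the)\s+(?P<type>[^.,;]+)"
)

# Step (iii), assumed: a tiny hypothetical lookup from nouns to dul classes.
NOUN_TO_DUL = {
    "city": "dul:Place",
    "company": "dul:Organization",
    "writer": "dul:Person",
}


def extract_type_phrase(sentence):
    """Step (ii): return (entity, type phrase) or None if the pattern fails."""
    match = TYPE_PATTERN.search(sentence)
    if match is None:
        return None
    return match.group("entity").strip(), match.group("type").strip()


def map_to_dul(type_phrase):
    """Return the class of the first word with a known mapping, else None."""
    for word in type_phrase.lower().split():
        if word in NOUN_TO_DUL:
            return NOUN_TO_DUL[word]
    return None


def type_entity(sentence):
    """Run steps (ii) and (iii) on a single sentence."""
    extracted = extract_type_phrase(sentence)
    if extracted is None:
        return None
    entity, type_phrase = extracted
    return entity, type_phrase, map_to_dul(type_phrase)


if __name__ == "__main__":
    print(type_entity("Leipzig is a city in Saxony."))
    print(type_entity("Franz Kafka was a German writer."))
```

In this sketch the extracted type phrase is kept alongside the mapped class, so evidence that has no entry in the lookup table is still returned, with None as its class.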

Notes

  1. http://nlp.cs.rpi.edu/kbp/2014/.

  2. http://www.scc.lancs.ac.uk/microposts2015/.

  3. http://2015.eswc-conferences.org/important-dates/call-OKEC.

  4. See http://stlab.istc.cnr.it/stlab/WikipediaOntology/. Throughout this paper, we use the prefix dul for types of this ontology.

  5. http://www.antlr.org/.

  6. Abbreviations in Listing 1.1: ADJ = adjective, CD = cardinal number.

  7. The complete grammar can be found in the project's source code repository.

  8. The rdfs prefix stands for http://www.w3.org/2000/01/rdf-schema# while the prefix ex could stand for any user-defined vocabulary, e.g., http://example.com/.

  9. This mapping can be found inside the git repository of the project at https://github.com/AKSW/Cetus/blob/master/DOLCE_YAGO_links.nt.

  10. Throughout this paper, we use the prefix yago for http://yago-knowledge.org/resource/.

  11. The results of the challenge can be found at https://github.com/anuzzolese/oke-challenge#results.

Acknowledgements

This work has been supported by the FP7 project GeoKnow (GA No. 318159) and the BMWI Project SAKE (Project No. 01MD15006E).

Author information

Corresponding author

Correspondence to Michael Röder.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Röder, M., Usbeck, R., Speck, R., Ngomo, AC.N. (2015). CETUS – A Baseline Approach to Type Extraction. In: Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds) Semantic Web Evaluation Challenges. SemWebEval 2015. Communications in Computer and Information Science, vol 548. Springer, Cham. https://doi.org/10.1007/978-3-319-25518-7_2

  • DOI: https://doi.org/10.1007/978-3-319-25518-7_2

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25517-0

  • Online ISBN: 978-3-319-25518-7

  • eBook Packages: Computer Science (R0)
