Exploiting Linked Open Data to Uncover Entity Types

  • Conference paper
Semantic Web Evaluation Challenges (SemWebEval 2015)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 548)

Abstract

Extracting structured information from text plays a crucial role in automatic knowledge acquisition and is at the core of any knowledge representation and reasoning system. Traditional methods rely on hand-crafted rules and are restricted by the performance of various linguistic pre-processing tools. More recent approaches rely on supervised learning of relations trained on labelled examples, which can be created manually or, in some cases, generated automatically (referred to as distant supervision). We propose a supervised method for entity typing and alignment. We argue that a rich feature space can improve extraction accuracy, and we propose to exploit Linked Open Data (LOD) for feature enrichment. Our approach is tested on Task 2 of the Open Knowledge Extraction challenge, which covers automatic entity typing and alignment. Our results demonstrate that combining evidence derived from LOD (e.g. DBpedia) and conventional lexical resources (e.g. WordNet) (i) improves the accuracy of the supervised induction method and (ii) enables easy matching with the Dolce+DnS Ultra Lite ontology classes.
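
The feature enrichment idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two lookup tables are hypothetical, hand-written stand-ins for WordNet and DBpedia lookups, and the function name `build_features` and the feature-name prefixes are assumptions made for illustration. The output is a feature dictionary of the kind that dictionary-based classifiers (for example those in NLTK) accept.

```python
from typing import Dict, List

# Hypothetical toy lookups standing in for real WordNet and DBpedia queries.
# The entries are illustrative assumptions, not data from the paper.
WORDNET_HYPERNYMS: Dict[str, List[str]] = {
    "violinist": ["musician", "performer", "person"],
}
LOD_TYPES: Dict[str, List[str]] = {
    "violinist": ["dbo:MusicalArtist", "dbo:Person"],
}


def build_features(head_word: str) -> Dict[str, bool]:
    """Combine lexical and LOD-derived evidence into one feature dictionary."""
    word = head_word.lower()
    # Baseline lexical feature: the word itself.
    features: Dict[str, bool] = {f"word={word}": True}
    # Evidence from a conventional lexical resource (WordNet-style hypernyms).
    for hypernym in WORDNET_HYPERNYMS.get(word, []):
        features[f"wn_hypernym={hypernym}"] = True
    # Evidence from Linked Open Data (DBpedia-style types).
    for lod_type in LOD_TYPES.get(word, []):
        features[f"lod_type={lod_type}"] = True
    return features


if __name__ == "__main__":
    for name in sorted(build_features("Violinist")):
        print(name)
```

Words that are missing from both lookups fall back to the single lexical feature, so a classifier trained on these dictionaries still receives some signal for unseen terms.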

Notes

  1. https://www.freebase.com/.

  2. http://wiki.dbpedia.org/About.

  3. https://github.com/anuzzolese/oke-challenge.

  4. Source code can be found at https://github.com/jerrygaoLondon/oke-extractor.

  5. http://stlab.istc.cnr.it/stlab/WikipediaOntology/.

  6. We use the training data encoded in NIF format provided by the challenge organisers in this experiment. The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.

  7. http://persistence.uni-leipzig.org/nlp2rdf/.

  8. The rdfs prefix stands for the RDF Schema namespace (http://www.w3.org/2000/01/rdf-schema#).

  9. RDFLib: https://pypi.python.org/pypi/rdflib.

  10. The SMART stop-word list built by Chris Buckley and Gerard Salton, which can be obtained from goo.gl/rBQNbO.

  11. https://gate.ac.uk/sale/tao/splitch13.html.

  12. https://gate.ac.uk/.

  13. http://www.nltk.org/howto/wordnet.html.

  14. SPARQLWrapper is a Python-based wrapper around a SPARQL service, available at http://rdflib.github.io/sparqlwrapper/.

  15. http://www.nltk.org/howto/classify.html.

  16. http://www.nltk.org/book/ch05.html.

  17. The dul prefix stands for http://www.ontologydesignpatterns.org/ont/dul/DUL.owl.

  18. The complete SPARQL query can be found in the project's source code repository.

  19. https://github.com/anuzzolese/oke-challenge.
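
The notes above mention querying a SPARQL service for type information. As a rough illustration only, the sketch below builds a query string that asks for the rdf:type values of a single resource. It is not the query used in the paper, which is available only in the project's repository; the function name `build_type_query`, the `limit` parameter and the example resource URI are assumptions.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Characters that must not appear inside a SPARQL IRI reference (<...>).
_FORBIDDEN = set('<>"{}|^`\\ \n\t\r')


def build_type_query(resource_uri: str, limit: int = 50) -> str:
    """Return a SPARQL SELECT query listing the rdf:type values of a resource."""
    if not resource_uri or any(ch in _FORBIDDEN for ch in resource_uri):
        raise ValueError(f"not a safe IRI: {resource_uri!r}")
    return (
        f"PREFIX rdf: <{RDF_NS}>\n"
        "SELECT DISTINCT ?type WHERE {\n"
        f"  <{resource_uri}> rdf:type ?type .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Example resource chosen for illustration only.
    print(build_type_query("http://dbpedia.org/resource/Berlin", limit=5))
```

The resulting string could then be passed to a SPARQL client such as SPARQLWrapper; which endpoint to query is left open here.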

References

  1. Agichtein, E., Gravano, L.: Snowball: extracting relations from large plain-text collections. In: Proceedings of the Fifth ACM Conference on Digital Libraries. DL 2000, pp. 85–94. ACM, New York, NY, USA (2000). http://doi.acm.org/10.1145/336597.336644

  2. Bizer, C., Heath, T., Ayers, D., Raimond, Y.: Interlinking open data on the web. Media 79(1), 31–35 (2007). http://people.kmi.open.ac.uk/tom/papers/bizer-heath-eswc2007-interlinking-open-data.pdf

  3. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia - a crystallization point for the web of data. Web Seman. Sci. Serv. Agents World Wide Web 7(3), 154–165 (2009)

  4. Bizer, C., Volz, J., Kobilarov, G., Gaedke, M.: Silk - a link discovery framework for the web of data. In: 18th International World Wide Web Conference, April 2009. http://www2009.eprints.org/227/

  5. Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka, E.R., Mitchell, T.M.: Toward an architecture for never-ending language learning. In: AAAI (2010)

  6. Daille, B., Habert, B., Jacquemin, C., Royauté, J.: Empirical observation of term variations and principles for their description. Terminology 3(2), 197–257 (1996)

  7. Fader, A., Soderland, S., Etzioni, O.: Identifying relations for open information extraction. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. EMNLP 2011, pp. 1535–1545. Association for Computational Linguistics, Stroudsburg, PA, USA (2011). http://dl.acm.org/citation.cfm?id=2145432.2145596

  8. Fader, A., Zettlemoyer, L., Etzioni, O.: Open question answering over curated and extracted knowledge bases. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 1156–1165. KDD 2014. ACM, New York, NY, USA (2014). http://doi.acm.org/10.1145/2623330.2623677

  9. Giunchiglia, F., Shvaiko, P., Yatskevich, M.: Discovering missing background knowledge in ontology matching. In: Proceedings of the 2006 Conference on ECAI 2006: 17th European Conference on Artificial Intelligence August 29 - September 1, 2006, Riva Del Garda, Italy. pp. 382–386. IOS Press, Amsterdam, The Netherlands (2006). http://dl.acm.org/citation.cfm?id=1567016.1567101

  10. Hearst, M.A.: Automatic acquisition of hyponyms from large text corpora (1992)

  11. Kachroudi, M., Moussa, E.B., Zghal, S., Ben, S.: LDOA results for OAEI 2011. Ontology Matching, p. 148 (2011)

  12. Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., Bizer, C.: DBpedia - a large-scale, multilingual knowledge base extracted from Wikipedia. Seman. Web J. 5, 1–29 (2014)

  13. Li, Y., Bontcheva, K., Cunningham, H.: Adapting SVM for data sparseness and imbalance: a case study in information extraction. Nat. Lang. Eng. 15(02), 241–271 (2009)

  14. Min, B., Shi, S., Grishman, R., Lin, C.Y.: Ensemble semantics for large-scale unsupervised relation extraction. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1027–1037. Association for Computational Linguistics (2012)

  15. Mintz, M., Bills, S., Snow, R., Jurafsky, D.: Distant supervision for relation extraction without labeled data. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL 2009, vol. 2, pp. 1003–1011. Association for Computational Linguistics, Stroudsburg, PA, USA (2009). http://dl.acm.org/citation.cfm?id=1690219.1690287

  16. Nadeau, D., Sekine, S.: A survey of named entity recognition and classification. Lingvisticae Investigationes 30(1), 3–26 (2007). http://dx.doi.org/10.1075/li.30.1.03nad

  17. Otero-Cerdeira, L., Rodríguez-Martínez, F.J., Gómez-Rodríguez, A.: Ontology matching: a literature review. Expert Syst. Appl. 42(2), 949–971 (2015). http://www.sciencedirect.com/science/article/pii/S0957417414005144

  18. Scharffe, F., Liu, Y., Zhou, C.: RDF-AI: an architecture for RDF datasets matching, fusion and interlink. In: Proceedings of IJCAI 2009 Workshop on Identity, Reference, and Knowledge Representation (IR-KR), Pasadena (CA US) (2009)

  19. Singhal, A.: Introducing the knowledge graph: things, not strings. Official Google Blog, May 2012

  20. Yao, X., Van Durme, B.: Information extraction over structured data: question answering with Freebase. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol. 1 (Long Papers), pp. 956–966. Association for Computational Linguistics (2014). http://aclweb.org/anthology/P14-1090

Acknowledgments

Part of this research has been sponsored by the EPSRC funded project LODIE: Linked Open Data for IE, EP/J019488/1.

Author information

Corresponding author

Correspondence to Jie Gao.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Gao, J., Mazumdar, S. (2015). Exploiting Linked Open Data to Uncover Entity Types. In: Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds) Semantic Web Evaluation Challenges. SemWebEval 2015. Communications in Computer and Information Science, vol 548. Springer, Cham. https://doi.org/10.1007/978-3-319-25518-7_5

  • DOI: https://doi.org/10.1007/978-3-319-25518-7_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25517-0

  • Online ISBN: 978-3-319-25518-7

  • eBook Packages: Computer Science (R0)
