Exploiting Linked Open Data to Uncover Entity Types

Gao, Jie; Mazumdar, Suvodeep

doi:10.1007/978-3-319-25518-7_5

Part of the book series: Communications in Computer and Information Science (CCIS, volume 548)

Included in the following conference series:

Semantic Web Evaluation Challenges

Abstract

Extracting structured information from text plays a crucial role in automatic knowledge acquisition and is at the core of any knowledge representation and reasoning system. Traditional methods rely on hand-crafted rules and are restricted by the performance of various linguistic pre-processing tools. More recent approaches rely on supervised learning of relations trained on labelled examples, which can be created manually or sometimes generated automatically (referred to as distant supervision). We propose a supervised method for entity typing and alignment. We argue that a rich feature space can improve extraction accuracy, and we propose to exploit Linked Open Data (LOD) for feature enrichment. Our approach is tested on Task 2 of the Open Knowledge Extraction challenge, which covers automatic entity typing and alignment. Our results demonstrate that combining evidence derived from LOD (e.g. DBpedia) and conventional lexical resources (e.g. WordNet) (i) improves the accuracy of the supervised induction method and (ii) enables easy matching with the Dolce+DnS Ultra Lite ontology classes.
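To make the idea of feature enrichment concrete, the following is a minimal sketch, not the authors' implementation. It assumes that evidence from a LOD source and from a lexical resource has already been looked up, and it merges both into one sparse feature dictionary of the kind accepted by dictionary-based classifiers such as those in NLTK. The function name `extract_features`, the two lookup tables and all values in them are hypothetical toy data introduced only for illustration.

```python
# Hypothetical stand-ins for real lookups. In practice the first table might be
# filled from a DBpedia SPARQL query and the second from WordNet hypernyms;
# here both are hard-coded toy values, not data from the paper.
LOD_TYPES = {
    "Sheffield": ["dbo:City", "dbo:Place"],
}
LEXICAL_HYPERNYMS = {
    "city": ["municipality", "region", "location"],
}


def extract_features(entity, head_noun):
    """Merge LOD-derived and lexical evidence into one feature dictionary."""
    head = head_noun.lower()
    # The head noun of the type description is always a feature.
    features = {"head_noun=" + head: True}
    # Evidence from the (toy) LOD lookup: known types of the entity.
    for lod_type in LOD_TYPES.get(entity, []):
        features["lod_type=" + lod_type] = True
    # Evidence from the (toy) lexical lookup: hypernyms of the head noun.
    for hypernym in LEXICAL_HYPERNYMS.get(head, []):
        features["hypernym=" + hypernym] = True
    return features


if __name__ == "__main__":
    for name in sorted(extract_features("Sheffield", "City")):
        print(name)
```

Keeping each piece of evidence as a separately named boolean feature means an entity with no LOD entry still yields a usable, smaller feature set rather than an error.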

Notes

1. https://www.freebase.com/
2. http://wiki.dbpedia.org/About
3. https://github.com/anuzzolese/oke-challenge
4. Source code can be found at https://github.com/jerrygaoLondon/oke-extractor
5. http://stlab.istc.cnr.it/stlab/WikipediaOntology/
6. We use the training data encoded in NIF format provided by the challenge organisers in this experiment. The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
7. http://persistence.uni-leipzig.org/nlp2rdf/
8. The prefix rdfs stands for the namespace of RDF Schema (http://www.w3.org/2000/01/rdf-schema#).
9. RDFLib: https://pypi.python.org/pypi/rdflib
10. The SMART stop-word list built by Chris Buckley and Gerard Salton, which can be obtained from goo.gl/rBQNbO.
11. https://gate.ac.uk/sale/tao/splitch13.html
12. https://gate.ac.uk/
13. http://www.nltk.org/howto/wordnet.html
14. SPARQLWrapper is a Python-based wrapper around a SPARQL service, accessible via http://rdflib.github.io/sparqlwrapper/
15. http://www.nltk.org/howto/classify.html
16. http://www.nltk.org/book/ch05.html
17. The prefix dul stands for http://www.ontologydesignpatterns.org/ont/dul/DUL.owl.
18. The complete SPARQL query can be found in the project's source code repository.
19. https://github.com/anuzzolese/oke-challenge

Acknowledgments

Part of this research has been sponsored by the EPSRC funded project LODIE: Linked Open Data for IE, EP/J019488/1.

Author information

Authors and Affiliations

OAK Group, Department of Computer Science, University of Sheffield, Sheffield, UK
Jie Gao & Suvodeep Mazumdar

Corresponding author

Correspondence to Jie Gao.

Editor information

Editors and Affiliations

Inria, Sophia Antipolis, France
Fabien Gandon
INRIA Sophia-Antipolis Méditerranée, Sophia Antipolis, France
Elena Cabrio
Université Paris-Sorbonne, Paris, France
Milan Stankovic
École des Mines de Saint-Étienne, Saint-Étienne, France
Antoine Zimmermann

Cite this paper

Gao, J., Mazumdar, S. (2015). Exploiting Linked Open Data to Uncover Entity Types. In: Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds) Semantic Web Evaluation Challenges. SemWebEval 2015. Communications in Computer and Information Science, vol 548. Springer, Cham. https://doi.org/10.1007/978-3-319-25518-7_5

DOI: https://doi.org/10.1007/978-3-319-25518-7_5
Published: 07 January 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25517-0
Online ISBN: 978-3-319-25518-7
eBook Packages: Computer Science, Computer Science (R0)
