Abstract
Extracting structured information from text plays a crucial role in automatic knowledge acquisition and is at the core of any knowledge representation and reasoning system. Traditional methods rely on hand-crafted rules and are restricted by the performance of various linguistic pre-processing tools. More recent approaches rely on supervised learning of relations from labelled examples, which can be created manually or sometimes generated automatically (referred to as distant supervision). We propose a supervised method for entity typing and alignment. We argue that a rich feature space can improve extraction accuracy, and we propose to exploit Linked Open Data (LOD) for feature enrichment. Our approach is evaluated on Task 2 of the Open Knowledge Extraction (OKE) challenge, which covers automatic entity typing and alignment. Our approach demonstrates that combining evidence derived from LOD (e.g. DBpedia) and conventional lexical resources (e.g. WordNet) (i) improves the accuracy of the supervised induction method and (ii) enables easy matching with the Dolce+DnS Ultra Lite (DUL) ontology classes.
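The central idea of the abstract, enriching a supervised learner's feature space with LOD and lexical evidence, can be illustrated with a minimal sketch. It assumes that DBpedia types and WordNet hypernyms for each entity and type term have already been retrieved; the lookup tables, toy training examples and DUL target labels are hypothetical, and the multi-class perceptron is a simple stand-in learner, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical, pre-fetched evidence. In a real system these would come from
# DBpedia (rdf:type of the linked resource) and WordNet (hypernyms of the type term).
DBPEDIA_TYPES = {
    "Barack Obama": ["dbo:Person", "dbo:Politician"],
    "Berlin": ["dbo:Place", "dbo:City"],
    "Siemens": ["dbo:Organisation", "dbo:Company"],
    "Angela Merkel": ["dbo:Person", "dbo:Politician"],
}
WORDNET_HYPERNYMS = {
    "president": ["head_of_state", "person"],
    "city": ["municipality", "location"],
    "company": ["institution", "organization"],
    "chancellor": ["head_of_state", "person"],
}


def features(entity, type_term):
    """Merge lexical, LOD and WordNet evidence into one set of binary features."""
    term = type_term.lower()
    feats = {"lex=" + term}
    feats.update("lod=" + t for t in DBPEDIA_TYPES.get(entity, []))
    feats.update("wn=" + h for h in WORDNET_HYPERNYMS.get(term, []))
    return feats


def score(weights, label, feats):
    """Sum of the label's weights over the active features."""
    return sum(weights[label][f] for f in feats)


def train_perceptron(examples, epochs=5):
    """Multi-class perceptron, used here only as a simple stand-in learner."""
    weights = defaultdict(lambda: defaultdict(float))  # label -> feature -> weight
    labels = sorted({label for _, label in examples})
    for _ in range(epochs):
        for feats, gold in examples:
            pred = max(labels, key=lambda lab: score(weights, lab, feats))
            if pred != gold:
                for f in feats:
                    weights[gold][f] += 1.0
                    weights[pred][f] -= 1.0
    return weights, labels


def predict(weights, labels, feats):
    """Return the highest-scoring label for a feature set."""
    return max(labels, key=lambda lab: score(weights, lab, feats))


# Toy training data with illustrative DUL target classes.
training = [
    (features("Barack Obama", "president"), "dul:Person"),
    (features("Berlin", "city"), "dul:Place"),
    (features("Siemens", "company"), "dul:Organization"),
]
weights, labels = train_perceptron(training)

# "chancellor" never occurs in training, but its LOD and WordNet features
# overlap with the "president" example.
print(predict(weights, labels, features("Angela Merkel", "chancellor")))  # dul:Person
```

With only the lexical feature (`{"lex=chancellor"}`) every class would score zero for this unseen type term, so it is the added LOD and WordNet features that carry the signal.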
Notes
- 4. Source code can be found at https://github.com/jerrygaoLondon/oke-extractor.
- 6. In this experiment we use the training data provided by the challenge organisers, encoded in NIF format. The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations.
- 8. The prefix rdfs stands for the RDF Schema namespace (http://www.w3.org/2000/01/rdf-schema#).
- 9. RDFLib: https://pypi.python.org/pypi/rdflib.
- 10. The SMART stop-word list, built by Chris Buckley and Gerard Salton, can be obtained from goo.gl/rBQNbO.
- 14. SPARQLWrapper is a Python-based wrapper around a SPARQL service, accessible via http://rdflib.github.io/sparqlwrapper/.
- 17. The prefix dul stands for http://www.ontologydesignpatterns.org/ont/dul/DUL.owl.
- 18. The complete SPARQL query can be found in the project's source code repository.
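Note 6 mentions the NIF format. In NIF 2.0, annotated substrings are commonly identified by offset-based URIs that append a `#char=begin,end` fragment (RFC 5147 style) to a document URI, and each annotation records `nif:beginIndex`, `nif:endIndex` and `nif:anchorOf` relative to the `nif:isString` of its `nif:referenceContext`. The following minimal sketch assumes the triples have already been parsed into plain Python values; the document URI, the sentence and the helper names are hypothetical.

```python
def nif_char_uri(doc_uri, begin, end):
    """Offset-based NIF 2.0 string URI: <document URI>#char=<begin>,<end>."""
    return f"{doc_uri}#char={begin},{end}"


def anchor_matches(context, begin, end, anchor):
    """Check that nif:anchorOf equals the context substring [beginIndex, endIndex)."""
    return context[begin:end] == anchor


# Hypothetical document URI and sentence, standing in for values already
# parsed from a NIF file (e.g. the nif:isString of a nif:Context).
DOC_URI = "http://example.org/oke/sentence-1"
context = "Ada Lovelace was an English mathematician."

term = "mathematician"
begin = context.index(term)
end = begin + len(term)

context_uri = nif_char_uri(DOC_URI, 0, len(context))  # the whole sentence
phrase_uri = nif_char_uri(DOC_URI, begin, end)        # the type term
print(context_uri)
print(phrase_uri)
print(anchor_matches(context, begin, end, term))
```

Because the offsets are encoded in the URI itself, a phrase annotation can be matched to its context without any extra identifier scheme.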
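Notes 14, 17 and 18 concern SPARQL access and the DUL namespace. Note 14 points to SPARQLWrapper; as a dependency-free illustration, the sketch below builds an equivalent SPARQL 1.1 Protocol GET request with the Python standard library. The DBpedia endpoint URL, the `DUL.owl#` term namespace, the choice of an `owl:equivalentClass` lookup and the query itself are assumptions for illustration; this is not the query from the project's repository.

```python
import urllib.parse
import urllib.request

# Assumed values for illustration only.
ENDPOINT = "https://dbpedia.org/sparql"
DUL_NS = "http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#"


def dul_alignment_query(dbpedia_class_iri):
    """Hypothetical query: classes in the DUL namespace declared
    owl:equivalentClass to the given DBpedia ontology class."""
    return (
        "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
        "SELECT DISTINCT ?dulClass WHERE {\n"
        f"  <{dbpedia_class_iri}> owl:equivalentClass ?dulClass .\n"
        f'  FILTER(STRSTARTS(STR(?dulClass), "{DUL_NS}"))\n'
        "}"
    )


def build_request(query, endpoint=ENDPOINT):
    """SPARQL 1.1 Protocol query via GET: the query text goes in the 'query'
    parameter, and the Accept header asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


query = dul_alignment_query("http://dbpedia.org/ontology/Person")
req = build_request(query)
print(req.get_method(), req.full_url)

# Sending the request needs network access; result rows sit under
# results -> bindings in the SPARQL JSON results format:
#   import json
#   with urllib.request.urlopen(req) as resp:
#       rows = json.load(resp)["results"]["bindings"]
#   print([row["dulClass"]["value"] for row in rows])
```

Keeping query construction separate from sending makes the request easy to inspect or reuse with a client library such as SPARQLWrapper.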
Acknowledgments
Part of this research has been sponsored by the EPSRC-funded project LODIE: Linked Open Data for Information Extraction (EP/J019488/1).
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gao, J., Mazumdar, S. (2015). Exploiting Linked Open Data to Uncover Entity Types. In: Gandon, F., Cabrio, E., Stankovic, M., Zimmermann, A. (eds) Semantic Web Evaluation Challenges. SemWebEval 2015. Communications in Computer and Information Science, vol 548. Springer, Cham. https://doi.org/10.1007/978-3-319-25518-7_5
DOI: https://doi.org/10.1007/978-3-319-25518-7_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25517-0
Online ISBN: 978-3-319-25518-7
eBook Packages: Computer Science, Computer Science (R0)