Towards Enriching DBpedia from Vertical Enumerative Structures Using a Distant Learning Approach

  • Conference paper
Knowledge Engineering and Knowledge Management (EKAW 2018)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11313)

Abstract

Automatic construction of semantic resources at large scale usually relies on general-purpose corpora such as Wikipedia. This resource, by nature rich in encyclopedic knowledge, exposes part of this knowledge through strongly structured elements (infoboxes, categories, etc.). Several extractors have targeted these structures in order to enrich or populate semantic resources such as DBpedia, YAGO, or BabelNet. The remaining semi-structured textual structures, such as vertical enumerative structures (those using typographic and dispositional layout), have however been under-exploited. Yet they are frequent in corpora and are rich sources of specific semantic relations, such as hypernymy. This paper presents a distant learning approach for extracting hypernym relations from vertical enumerative structures of Wikipedia, with the aim of enriching DBpedia. Our relation extraction approach achieves an overall precision of 62%, and 99% of the extracted relations can enrich DBpedia, with respect to a reference corpus.
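
To make the general idea concrete, the following is a minimal sketch of distant labelling applied to a vertical enumerative structure. It is not the authors' pipeline, since the abstract does not describe their features or classifier. It assumes that the term introducing the list is a candidate hypernym of each enumerated item, and that a small dictionary (the hypothetical `TOY_KB`) stands in for DBpedia. The function names `candidate_pairs` and `distant_label` are likewise illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy knowledge base standing in for DBpedia:
# maps a hyponym to the set of hypernyms already recorded for it.
TOY_KB: Dict[str, Set[str]] = {
    "Oak": {"Tree"},
    "Pine": {"Tree"},
}


def candidate_pairs(primer_term: str, items: List[str]) -> List[Tuple[str, str]]:
    """Pair the term from the introductory sentence with each enumerated item.

    Assumption: the introducing term is a candidate hypernym of every item.
    """
    return [(primer_term, item) for item in items]


def distant_label(
    pairs: List[Tuple[str, str]], kb: Dict[str, Set[str]]
) -> List[Tuple[str, str, bool]]:
    """Mark a pair as known when the knowledge base already holds the relation.

    Known pairs can serve as automatically labelled training examples;
    unknown pairs are the ones a trained model would have to classify.
    """
    labelled = []
    for hypernym, hyponym in pairs:
        known = hypernym in kb.get(hyponym, set())
        labelled.append((hypernym, hyponym, known))
    return labelled


# A vertical enumerative structure reduced to its introducing term and items.
pairs = candidate_pairs("Tree", ["Oak", "Pine", "Birch"])
labelled = distant_label(pairs, TOY_KB)

# Pairs absent from the knowledge base are candidates for enriching it.
new_candidates = [(h, x) for h, x, known in labelled if not known]
```

In this toy setting, the pair whose relation is missing from the knowledge base is the kind of relation that could enrich it, provided a classifier trained on the automatically labelled pairs accepts it.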

Notes

  1. A hypernym relation links two entities \(E_1\) and \(E_2\) when \(E_2\) (the hyponym) is subordinate to \(E_1\) (the hypernym). From a lexical point of view, this relation is called “isa”.

  2. http://www.irit.fr/Sempedia.

  3. http://en.wikipedia.org/wiki/Wikipedia:Manual_of_Style.

  4. http://opennlp.apache.org/.

  5. https://github.com/attardi/wikiextractor.

  6. https://gate.ac.uk/sale/tao/splitch6.html#chap:annie.

  7. https://www.DBpedia-spotlight.org/.

  8. https://github.com/DBpedia-spotlight/spotlight-docker/tree/master/nightly-build/french.

Author information

Corresponding author

Correspondence to Cassia Trojahn.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Kamel, M., Trojahn, C. (2018). Towards Enriching DBpedia from Vertical Enumerative Structures Using a Distant Learning Approach. In: Faron Zucker, C., Ghidini, C., Napoli, A., Toussaint, Y. (eds) Knowledge Engineering and Knowledge Management. EKAW 2018. Lecture Notes in Computer Science, vol 11313. Springer, Cham. https://doi.org/10.1007/978-3-030-03667-6_12

  • DOI: https://doi.org/10.1007/978-3-030-03667-6_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03666-9

  • Online ISBN: 978-3-030-03667-6

  • eBook Packages: Computer Science, Computer Science (R0)
