Automatic Typing of DBpedia Entities

  • Aldo Gangemi
  • Andrea Giovanni Nuzzolese
  • Valentina Presutti
  • Francesco Draicchio
  • Alberto Musetti
  • Paolo Ciancarini
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7649)

Abstract

We present Tìpalo, an algorithm and tool for automatically typing DBpedia entities. Tìpalo identifies the most appropriate types for an entity by interpreting its natural language definition, which is extracted from the abstract of the corresponding Wikipedia page. Types are identified by means of a set of heuristics based on graph patterns, disambiguated to WordNet, and aligned to two top-level ontologies: WordNet supersenses and a subset of DOLCE+DnS Ultra Lite classes. The algorithm has been tuned against a gold standard built online by a group of selected users, and further evaluated in a user study.
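The pipeline described in the abstract (take a definition from the page abstract, extract a type, align it to a top-level category) can be illustrated with a minimal sketch. This is not Tìpalo's method: the graph-pattern heuristics are replaced here by a single "X is a Y" regular expression, WordNet disambiguation is skipped, and the alignment table is a two-entry toy. All names (`DEFINITION_PATTERN`, `TOY_ALIGNMENT`, `type_entity`) are hypothetical.

```python
import re

# Assumption: one lexical "X is a/an/the Y" pattern stands in for the
# graph-pattern heuristics mentioned in the abstract.
DEFINITION_PATTERN = re.compile(
    r"\b(?:is|was|are|were)\s+(?:an?|the)\s+"
    r"(?P<type>[a-z][a-z\- ]*?)"
    r"(?=\s+(?:of|in|from|that|who|which|by|for)\b|[,.;]|$)",
    re.IGNORECASE,
)

# Assumption: a toy lookup from head noun to a WordNet supersense label.
# A real system would disambiguate the noun to a WordNet sense first.
TOY_ALIGNMENT = {
    "city": "noun.location",
    "grandmaster": "noun.person",
}


def first_sentence(abstract):
    """Use the text before the first '. ' as the entity's definition."""
    return abstract.split(". ")[0]


def candidate_type(definition):
    """Return the lower-cased type phrase after the copula, or None."""
    match = DEFINITION_PATTERN.search(definition)
    if match is None:
        return None
    return match.group("type").strip().lower()


def type_entity(abstract):
    """Extract a type phrase and align its head noun to a toy supersense."""
    phrase = candidate_type(first_sentence(abstract))
    if phrase is None:
        return None
    # Assumption: the last word of an English noun phrase is its head.
    head = phrase.split()[-1]
    return {"type": phrase, "head": head, "supersense": TOY_ALIGNMENT.get(head)}


result = type_entity("Rome is the capital city of Italy. It is also a region capital.")
print(result)
```

Because the sketch matches surface patterns only, it misses any definition that does not follow the "is a" form; handling such cases is the kind of problem the paper's graph-pattern heuristics address.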

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Aldo Gangemi (1)
  • Andrea Giovanni Nuzzolese (1, 2)
  • Valentina Presutti (1)
  • Francesco Draicchio (1)
  • Alberto Musetti (1)
  • Paolo Ciancarini (1, 2)

  1. STLab-ISTC, Consiglio Nazionale delle Ricerche, Rome, Italy
  2. Dipartimento di Scienze dell’Informazione, Università di Bologna, Italy
