A Probabilistic Approach for Integrating Heterogeneous Knowledge Sources

  • Arnab Dutta
  • Christian Meilicke
  • Simone Paolo Ponzetto
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8465)


Open Information Extraction (OIE) systems like Nell and ReVerb have achieved impressive results by harvesting massive amounts of machine-readable knowledge with minimal supervision. However, the knowledge bases they produce still lack a clean, explicit semantic data model. Such a model could be provided by full-fledged semantic networks like DBpedia or Yago, which, in turn, could benefit from the additional coverage provided by Web-scale IE. In this paper, we bring these two strands of research together and present a method to align terms from Nell with instances in DBpedia. Our approach is unsupervised and relies on two key components. First, we automatically acquire probabilistic type information for Nell terms given a set of matching hypotheses. Second, we view the mapping task as the statistical inference problem of finding the most likely coherent mapping (i.e., the maximum a posteriori (MAP) mapping), using the outcome of the first component as a soft constraint. These two steps are highly intertwined: accordingly, we propose an approach that iteratively refines type acquisition based on the output of the mapping generator, and vice versa. Experimental results on gold-standard data indicate that our approach outperforms a strong baseline and produces mappings that improve consistently across iterations.
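The interplay between type acquisition and mapping selection can be illustrated with a small toy sketch. It is not the paper's implementation: the greedy per-term argmax stands in for full MAP inference, all terms are assumed to share one argument position of a single Nell predicate, and every name, candidate weight, and type label below (`estimate_type_probs`, `select_mapping`, `align`, `alpha`, the sample data) is hypothetical.

```python
from collections import defaultdict


def estimate_type_probs(mapping, instance_types):
    """Relative frequency of each type among the currently mapped instances."""
    counts = defaultdict(float)
    for instance in mapping.values():
        for type_name in instance_types.get(instance, ()):
            counts[type_name] += 1.0
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def select_mapping(candidates, instance_types, type_probs, alpha):
    """Per term, pick the candidate maximizing prior + alpha * type support.

    This greedy choice is a simplification standing in for MAP inference.
    """
    mapping = {}
    for term, options in candidates.items():
        def score(item):
            instance, prior = item
            support = max(
                (type_probs.get(t, 0.0) for t in instance_types.get(instance, ())),
                default=0.0,
            )
            return prior + alpha * support

        mapping[term] = max(options.items(), key=score)[0]
    return mapping


def align(candidates, instance_types, alpha=1.0, max_iters=10):
    """Alternate type estimation and mapping selection until nothing changes."""
    # Start from the candidate with the highest prior weight for each term.
    mapping = {term: max(opts, key=opts.get) for term, opts in candidates.items()}
    for _ in range(max_iters):
        type_probs = estimate_type_probs(mapping, instance_types)
        new_mapping = select_mapping(candidates, instance_types, type_probs, alpha)
        if new_mapping == mapping:
            break
        mapping = new_mapping
    return mapping


# Hypothetical matching hypotheses: term -> {candidate instance: prior weight}.
candidates = {
    "apple": {"Apple": 0.55, "Apple_Inc.": 0.45},
    "microsoft": {"Microsoft": 0.9, "Microsoft_Windows": 0.1},
    "google": {"Google": 0.95, "Google_Search": 0.05},
}
# Hypothetical type assignments for the candidate instances.
instance_types = {
    "Apple": ["Plant"],
    "Apple_Inc.": ["Company"],
    "Microsoft": ["Company"],
    "Microsoft_Windows": ["Software"],
    "Google": ["Company"],
    "Google_Search": ["Website"],
}

result = align(candidates, instance_types)
print(result)
```

In this toy data the prior alone would map "apple" to the fruit, but the type distribution estimated from the other terms favors the company reading, so the mapping changes in the first refinement round and is stable afterwards. Setting `alpha=0` disables the type constraint and keeps the prior-only mapping.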







Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Arnab Dutta (1)
  • Christian Meilicke (1)
  • Simone Paolo Ponzetto (1)
  1. Research Data and Web Science, University of Mannheim, Germany
