An Evidence-Based Verification Approach to Extract Entities and Relations for Knowledge Base Population

  • Naimdjon Takhirov
  • Fabien Duchateau
  • Trond Aalberg
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7649)


This paper presents an approach for automatically extracting entities and relationships from textual documents. The main goal is to populate a knowledge base that hosts this structured information about domain entities. The extracted entities and their expected relationships are verified using two evidence-based techniques: classification and linking. The linking process also enables the connection of our knowledge base to other sources that are part of the Linked Open Data cloud. We demonstrate the benefit of our approach through a series of experiments with real-world datasets.


Keywords: Linked Data, Knowledge Extraction, Machine Learning



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Naimdjon Takhirov (1)
  • Fabien Duchateau (2)
  • Trond Aalberg (1)
  1. Norwegian University of Science and Technology, Trondheim, Norway
  2. Université Lyon 1, LIRIS, UMR5205, Lyon, France
