Enriching Knowledge Bases with Counting Quantifiers

  • Paramita Mirza
  • Simon Razniewski
  • Fariz Darari
  • Gerhard Weikum
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11136)

Abstract

Information extraction traditionally focuses on extracting relations between identifiable entities, such as \(\langle\)Monterey, locatedIn, California\(\rangle\). Yet texts often also contain counting information, stating that a subject is in a specific relation with a number of objects without mentioning the objects themselves, for example, “California is divided into 58 counties”. Such counting quantifiers can help in a variety of tasks, such as query answering or knowledge base curation, but have been neglected by prior work.

This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.
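The distant-supervision idea mentioned above can be illustrated with a minimal sketch. It is not the CINEX implementation: the function names, the `COUNT`/`O` label scheme, the tiny number-word table, and the assumed knowledge base count are all hypothetical, chosen only to show how a fact count from a knowledge base could serve as a training seed for labeling number mentions in text.

```python
import re

# Hypothetical, deliberately tiny number-word table (an assumption for this sketch).
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def token_to_int(token):
    """Return the integer value of a digit string or small number word, else None."""
    text = token.lower().replace(",", "")
    if text.isdecimal():
        return int(text)
    return NUMBER_WORDS.get(text)


def label_sentence(sentence, kb_count):
    """Label each token COUNT if it equals the knowledge base fact count, else O.

    kb_count is the number of objects the knowledge base currently lists
    for the (subject, relation) pair the sentence is about.
    """
    # Numbers (with optional thousands separators) or plain word tokens.
    tokens = re.findall(r"\d+(?:,\d{3})*|\w+", sentence)
    labeled = []
    for token in tokens:
        value = token_to_int(token)
        label = "COUNT" if value is not None and value == kb_count else "O"
        labeled.append((token, label))
    return labeled


if __name__ == "__main__":
    # Assume the knowledge base lists 58 counties for California.
    for token, label in label_sentence("California is divided into 58 counties.", 58):
        print(token, label)
```

Exact matching like this is the simplest possible seed strategy. When the knowledge base is incomplete, its fact count undercounts the true number, so a correct mention in the text would be labeled `O`; this is the kind of non-maximal seed problem the abstract lists as challenge (i).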

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Paramita Mirza (1)
  • Simon Razniewski (1)
  • Fariz Darari (2)
  • Gerhard Weikum (1)

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
  2. Universitas Indonesia, Depok, Indonesia