Enriching Knowledge Bases with Counting Quantifiers
Abstract
Information extraction traditionally focuses on extracting relations between identifiable entities, such as \(\langle\)Monterey, locatedIn, California\(\rangle\). Yet texts often also contain counting information, stating that a subject is in a specific relation with a number of objects without mentioning the objects themselves, for example, “California is divided into 58 counties”. Such counting quantifiers can help in a variety of tasks, such as query answering or knowledge base curation, but have been neglected by prior work.
This paper develops the first full-fledged system for extracting counting information from text, called CINEX. We employ distant supervision using fact counts from a knowledge base as training seeds, and develop novel techniques for dealing with several challenges: (i) non-maximal training seeds due to the incompleteness of knowledge bases, (ii) sparse and skewed observations in text sources, and (iii) high diversity of linguistic patterns. Experiments with five human-evaluated relations show that CINEX can achieve 60% average precision for extracting counting information. In a large-scale experiment, we demonstrate the potential for knowledge base enrichment by applying CINEX to 2,474 frequent relations in Wikidata. CINEX can assert the existence of 2.5M facts for 110 distinct relations, which is 28% more than the existing Wikidata facts for these relations.
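To make the idea of using knowledge base fact counts as training seeds concrete, the following is a minimal sketch and not the CINEX implementation. It assumes a toy set of triples, a hypothetical relation name `hasChild`, digit-only number mentions, and exact matching between a number in the text and the knowledge base count; all names and data are illustrative.

```python
import re
from collections import Counter

# Toy knowledge base of (subject, relation, object) triples.
# All entities and the relation name are hypothetical.
KB = [
    ("Alice", "hasChild", "Bob"),
    ("Alice", "hasChild", "Carol"),
]


def fact_counts(triples):
    """Count how many objects the KB lists per (subject, relation) pair."""
    return Counter((s, r) for s, r, _ in triples)


def label_sentence(sentence, subject, relation, counts):
    """Label digit tokens that equal the KB fact count as COUNT seeds.

    Assumption: exact matching. Because a KB can be incomplete, its count
    may be lower than the true count, so this rule can miss valid mentions.
    """
    target = counts.get((subject, relation))
    labeled = []
    for tok in sentence.split():
        # Accept a run of digits optionally followed by punctuation.
        m = re.fullmatch(r"(\d+)\W*", tok)
        is_seed = (
            m is not None
            and target is not None
            and int(m.group(1)) == target
        )
        labeled.append((tok, "COUNT" if is_seed else "O"))
    return labeled


counts = fact_counts(KB)
labeled = label_sentence(
    "Alice has 2 children and lived in 3 cities.", "Alice", "hasChild", counts
)
```

Here only the token `2` receives the `COUNT` label, since it equals the number of `hasChild` facts for the subject. The exact-match rule also illustrates challenge (i): if the knowledge base were missing one child, the correct mention in the text would no longer match the seed count.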