Capture Missing Values with Inference on Knowledge Base

  • Zhixin Qi
  • Hongzhi Wang
  • Fanshan Meng
  • Jianzhong Li
  • Hong Gao
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10179)

Abstract

Data imputation is a basic step in data cleaning. Traditional data imputation approaches lack accuracy in the absence of knowledge. Involving a knowledge base in imputation can overcome this shortcoming. A challenge is that the missing value can rarely be found directly in knowledge bases (KBs). To make full use of knowledge bases for imputation, we present FOKES, an inference algorithm on knowledge bases. The inference not only exploits the true facts in KBs, but also uses types to ensure the accuracy of captured missing values. Extensive experiments show that our proposed algorithm can capture missing values efficiently and effectively.
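
The idea of inferring a value that is not stored directly in the KB, and then checking its type, can be illustrated with a toy sketch. This is not the FOKES algorithm, whose details are not given in the abstract; it is a minimal breadth-first search over hypothetical triples, and the names `FACTS`, `TYPES`, `infer_missing`, and `max_hops` are assumptions made up for this illustration.

```python
from collections import defaultdict, deque

# Hypothetical toy knowledge base: (subject, predicate, object) triples.
FACTS = [
    ("Harbin", "locatedIn", "Heilongjiang"),
    ("Heilongjiang", "locatedIn", "China"),
    ("Harbin", "hasUniversity", "Harbin Institute of Technology"),
]

# Hypothetical type assignments for each entity.
TYPES = {
    "Harbin": {"city"},
    "Heilongjiang": {"province"},
    "China": {"country"},
    "Harbin Institute of Technology": {"university"},
}


def build_index(facts):
    """Map each subject to its outgoing (predicate, object) edges."""
    index = defaultdict(list)
    for subj, pred, obj in facts:
        index[subj].append((pred, obj))
    return index


def infer_missing(known_value, expected_type, facts=FACTS, types=TYPES, max_hops=3):
    """Search outward from a known value and return the nearest entity
    whose type matches the missing attribute, or None if none is found."""
    index = build_index(facts)
    queue = deque([(known_value, 0)])
    seen = {known_value}
    while queue:
        entity, hops = queue.popleft()
        # Type check: only accept candidates of the expected type.
        if hops > 0 and expected_type in types.get(entity, set()):
            return entity
        if hops == max_hops:
            continue
        for _pred, obj in index[entity]:
            if obj not in seen:
                seen.add(obj)
                queue.append((obj, hops + 1))
    return None


# The KB has no direct (Harbin, country, ?) fact; the value is reached in two hops.
record = {"city": "Harbin", "country": None}
record["country"] = infer_missing(record["city"], "country")
print(record)
```

In this sketch the type filter is what keeps the search from returning an unrelated neighbour such as the university; a real system would also need to weigh predicates and handle ambiguous candidates.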

Keywords

Knowledge base, Missing values, Inference, Imputation, Data quality

Notes

Acknowledgement

This paper was partially supported by NSFC grants U1509216 and 61472099, National Sci-Tech Support Plan 2015BAH10F01, the Scientific Research Foundation for the Returned Overseas Chinese Scholars of Heilongjiang Province LC2016026, and the MOE-Microsoft Key Laboratory of Natural Language Processing and Speech, Harbin Institute of Technology. Hongzhi Wang is the corresponding author of this paper.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Zhixin Qi (1)
  • Hongzhi Wang (1)
  • Fanshan Meng (1)
  • Jianzhong Li (1)
  • Hong Gao (1)

  1. Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, China