A Effective Truth Discovery Algorithm with Multi-source Sparse Data

  • Jiyuan Zhang
  • Shupeng Wang
  • Guangjun Wu
  • Lei Zhang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10862)


The problem of finding the truth among inconsistent information is known as truth discovery. The essence of truth discovery is estimating source quality, so the mechanism used to measure a data source strongly affects both the process and the result of truth discovery. However, state-of-the-art algorithms do not consider how source quality is affected when a source provides null. In this paper, we propose to measure source quality using the Silent Rate, True Rate and False Rate. In addition, we use a probabilistic graphical model to model the truth and source quality, which is measured through both null and real data. Our model makes full use of all claims and nulls to improve the accuracy of truth discovery. Compared with prevalent approaches, the effectiveness of our approach is verified on three real datasets, and recall improves significantly.
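As a rough illustration of the three source-quality measures named above, the sketch below computes them for one source against a set of objects whose true values are already known. The definitions are assumptions made for illustration, since the abstract does not give formulas: the silent rate is taken as the fraction of objects for which the source provides null (or nothing), the true rate as the fraction claimed correctly, and the false rate as the fraction claimed incorrectly. The function name `source_rates` and its arguments are hypothetical, and this is not the paper's probabilistic graphical model, which infers truths and source quality jointly.

```python
from typing import Dict, Hashable, Optional, Tuple


def source_rates(
    claims: Dict[Hashable, Optional[object]],
    truths: Dict[Hashable, object],
) -> Tuple[float, float, float]:
    """Return (silent_rate, true_rate, false_rate) for a single source.

    Assumed definitions (not taken from the paper):
      silent rate = share of objects where the source gives null or no claim
      true rate   = share of objects where the claim equals the known truth
      false rate  = share of objects where the claim differs from the truth
    """
    if not truths:
        raise ValueError("truths must contain at least one object")

    silent = correct = wrong = 0
    for obj, truth in truths.items():
        value = claims.get(obj)  # a missing key is treated like null
        if value is None:
            silent += 1
        elif value == truth:
            correct += 1
        else:
            wrong += 1

    n = len(truths)
    return silent / n, correct / n, wrong / n


# Toy data: four objects, one source that is right once, wrong once,
# and silent twice (one explicit null, one missing claim).
truths = {"a": 1, "b": 2, "c": 3, "d": 4}
claims = {"a": 1, "b": 5, "c": None}
silent_rate, true_rate, false_rate = source_rates(claims, truths)
print(silent_rate, true_rate, false_rate)
```

Under these assumed definitions the three rates sum to 1 for every source, so a source that is rarely wrong only because it is often silent can be told apart from one that is rarely wrong while answering often.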


Truth discovery, Data fusion, Multi-source data confliction



This work was supported by the National Natural Science Foundation of China (No. 61601458) and the National Key Research and Development Program of China (No. 2016YFB0801004, 2016YFB0801305).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Institute of Information Engineering, CAS, Beijing, China
  2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China
