An Effective Truth Discovery Algorithm with Multi-source Sparse Data
Truth discovery is the problem of finding the truth among inconsistent information. Its essence is estimating source quality, so the mechanism used to measure a data source strongly affects both the process and the result of truth discovery. However, state-of-the-art algorithms do not consider how source quality is affected when a source provides a null value. In this paper, we propose measuring source quality with the Silent Rate, True Rate, and False Rate. In addition, we use a probabilistic graphical model to model the truth and source quality, which is measured from both null and real data. Our model makes full use of all claims and nulls to improve the accuracy of truth discovery. We verify the effectiveness of our approach against prevalent approaches on three real datasets, where recall improves significantly.
Keywords: Truth discovery, Data fusion, Multi-source data confliction
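To make the three source-quality measures named in the abstract concrete, the following is a minimal sketch and not the paper's method. The abstract does not define the rates, so the definitions here are assumptions: for each source, the Silent Rate is taken as the fraction of objects for which the source gives null, the True Rate as the fraction for which its claim matches the truth, and the False Rate as the fraction for which its claim differs from the truth. The sketch also assumes the truths are already known, whereas a truth discovery algorithm would have to estimate truths and source quality jointly. The names `source_rates`, `claims`, and `truths` are hypothetical.

```python
def source_rates(claims, truths):
    """Compute assumed (silent, true, false) rates per source.

    claims: dict mapping source -> dict mapping object -> claimed value,
            where None (or a missing object) stands for a null claim.
    truths: dict mapping object -> true value (assumed known here).
    """
    total = len(truths)
    rates = {}
    for source, by_object in claims.items():
        silent = correct = wrong = 0
        for obj, truth in truths.items():
            value = by_object.get(obj)  # a missing object counts as null
            if value is None:
                silent += 1
            elif value == truth:
                correct += 1
            else:
                wrong += 1
        if total == 0:
            rates[source] = (0.0, 0.0, 0.0)
        else:
            rates[source] = (silent / total, correct / total, wrong / total)
    return rates


# Hypothetical toy data: two sources, four objects.
truths = {"o1": "a", "o2": "b", "o3": "c", "o4": "d"}
claims = {
    "s1": {"o1": "a", "o2": "b", "o3": None},           # sparse source
    "s2": {"o1": "x", "o2": "b", "o3": "c", "o4": "d"},  # dense source
}
rates = source_rates(claims, truths)
print(rates)
```

Under these assumed definitions the three rates of a source sum to 1, so a sparse source such as `s1` is characterized by a high Silent Rate rather than being penalized as if it had made false claims.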
This work was supported by the National Natural Science Foundation of China (No. 61601458) and the National Key Research and Development Program of China (Nos. 2016YFB0801004, 2016YFB0801305).