Crowdsourcing-Enhanced Missing Values Imputation Based on Bayesian Network
With the development of the Internet, data continue to grow in size and decline in quality. Various data problems arise during data collection, among which incompleteness is one of the most serious. Existing methods for missing value imputation rely mostly on statistics and machine learning. These methods are limited in efficiency and accuracy because of high-dimensional computation and the low quality of the initial data. In this paper, we propose a new method that combines a Bayesian network with crowdsourcing to handle missing values. We use the Bayesian network to infer missing values, which improves efficiency, and use crowdsourcing to obtain the additional information needed to improve accuracy. Experiments on real datasets show that our method achieves better performance than other imputation methods.
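The division of labor described above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a two-node network (one parent attribute, one child attribute), estimates the conditional probability table from complete rows, and uses a hypothetical confidence threshold to decide when to consult the crowd. The names `fit_cpt`, `impute`, `ask_crowd`, and `threshold`, as well as the toy data, are illustrative assumptions.

```python
from collections import Counter, defaultdict


def fit_cpt(rows, parent, child):
    """Estimate P(child | parent) from rows where both values are present."""
    counts = defaultdict(Counter)
    for row in rows:
        if row[parent] is not None and row[child] is not None:
            counts[row[parent]][row[child]] += 1
    return {
        p: {c: n / sum(cnt.values()) for c, n in cnt.items()}
        for p, cnt in counts.items()
    }


def impute(rows, parent, child, ask_crowd, threshold=0.8):
    """Fill missing child values; consult the crowd when the network is unsure.

    `threshold` is an assumed cutoff, not a value taken from the paper.
    """
    cpt = fit_cpt(rows, parent, child)
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are left unchanged
        if row[child] is None:
            dist = cpt.get(row[parent], {})
            best, prob = max(dist.items(), key=lambda kv: kv[1],
                             default=(None, 0.0))
            # Trust the inferred value only when its probability is high enough.
            row[child] = best if prob >= threshold else ask_crowd(row)
        filled.append(row)
    return filled


# Toy data: "city" is the parent attribute, "country" the child.
rows = [
    {"city": "Paris", "country": "France"},
    {"city": "Paris", "country": "France"},
    {"city": "Paris", "country": None},
    {"city": "Springfield", "country": "USA"},
    {"city": "Springfield", "country": "Canada"},
    {"city": "Springfield", "country": None},
]

asked = []  # records which rows were sent to the (simulated) crowd


def ask_crowd(row):
    """Stand-in for a real crowdsourcing task; always answers "USA"."""
    asked.append(row["city"])
    return "USA"


filled = impute(rows, "city", "country", ask_crowd)
```

In this toy run, the missing country for Paris is inferred directly because the estimated distribution is unambiguous, while the Springfield row is sent to the simulated crowd because the two candidate values are equally likely. A real system would replace `ask_crowd` with an actual task-posting and answer-aggregation step.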
Keywords: Missing values, Bayesian network, Crowdsourcing
This paper was supported by NGFR 973 grant 2012CB316200, NSFC grants U1509216, 61472099, and 61133002, and National Sci-Tech Support Plan 2015BAH10F01.