CrowdCleaner: A Data Cleaning System Based on Crowdsourcing

  • Chen Ye
  • Hongzhi Wang
  • Keli Li
  • Qian Chen
  • Jianhua Chen
  • Jiangduo Song
  • Weidong Yuan
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8709)

Abstract

Real-world data is often dirty, and data cleaning is a natural way to improve its quality. However, because they lack human knowledge, existing automatic data cleaning systems cannot find the proper values for dirty data. We therefore propose CrowdCleaner, an online data cleaning system based on crowdsourcing. CrowdCleaner provides a user-friendly interface for handling different data quality problems. In this demonstration, we present the architecture of CrowdCleaner, highlight a few of its key features, and show how it cleans data.

Keywords

Data cleaning, crowdsourcing, truth discovery

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Chen Ye
    • 1
  • Hongzhi Wang
    • 1
  • Keli Li
    • 1
  • Qian Chen
    • 1
  • Jianhua Chen
    • 1
  • Jiangduo Song
    • 1
  • Weidong Yuan
    • 1
  1. School of Computer Science, Harbin Institute of Technology, Harbin, China
