Leveraging Attributes and Crowdsourcing for Join

  • Jianhong Feng
  • Jianhua Feng
  • Huiqi Hu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8485)

Abstract

Join operations rarely achieve high quality when performed by machines alone, so we adopt crowdsourcing to improve join quality. However, the cost of hiring workers to verify candidate pairs grows with the number of pairs generated and can become expensive. We propose CSCER, a hybrid approach that leverages attributes to generate pairs by combining category, sorting, and clustering techniques. We also propose an adaptive attribute-selection strategy that generates pairs efficiently based on attributes. Experiments on a real crowdsourcing platform with real datasets indicate that our approaches reduce the overall cost compared with existing methods while achieving high-quality join results.
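
To illustrate why the number of generated pairs drives the verification cost, the following minimal Python sketch compares naive all-pairs generation with pairing restricted to records that share a categorical attribute value. It is not the CSCER algorithm, whose details are not given in this abstract; the function names, the `category` attribute, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def all_pairs(records):
    """Naive candidate set: every unordered pair of record ids."""
    return list(combinations(sorted(records), 2))


def pairs_within_category(records, attribute):
    """Pair only records that share the same value of a categorical attribute.

    This is a generic grouping heuristic used for illustration,
    not the authors' method.
    """
    groups = defaultdict(list)
    for rid, rec in records.items():
        groups[rec[attribute]].append(rid)
    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(sorted(ids), 2))
    return pairs


# Hypothetical sample data (not from the paper's datasets).
records = {
    1: {"name": "iPhone 4 16GB", "category": "phone"},
    2: {"name": "Apple iPhone 4", "category": "phone"},
    3: {"name": "ThinkPad X220", "category": "laptop"},
    4: {"name": "Lenovo X220 laptop", "category": "laptop"},
    5: {"name": "Galaxy S II", "category": "phone"},
}

naive = all_pairs(records)
grouped = pairs_within_category(records, "category")
print(len(naive), len(grouped))  # prints "10 4"
```

In this toy example, grouping by category reduces the pairs sent to workers from 10 to 4. The saving depends on the attribute being reliable: a record with a wrong or missing category value would never be paired with its true match.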

Keywords

Cluster Technique; Real Dataset; Inference Method; True Match; Transitive Relation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jianhong Feng (1)
  • Jianhua Feng (1)
  • Huiqi Hu (1)
  1. Department of Computer Science, Tsinghua University, Beijing, China