Leveraging Attributes and Crowdsourcing for Join
Join operation is usually hard to achieve high quality with machine alone. We adopt crowdsourcing to improve the quality of join. Depending on the number of generated pairs, the overall cost can be expensive for hiring workers to do the verification. We propose a hybrid approach to generate pairs by leveraging attributes, which combines category, sorting and clustering techniques, called CSCER. We also propose an adaptive attribute-selection strategy to efficiently generate pairs based on attributes. Experiments on a real crowdsourcing platform using real datasets indicate that our approaches save the overall cost compared to existing methods and achieve high quality of join results.
KeywordsCluster Technique Real Dataset Inference Method True Match Transitive Relation
Unable to display preview. Download preview PDF.
- 2.Marcus, A., Wu, E., Karger, D.R., Madden, S., Miller, R.C.: Human-powered sorts and joins. PVLDB 5(1), 13–24 (2011)Google Scholar
- 3.Wang, J., Kraska, T., Franklin, M.J., Feng, J.: Crowder: Crowdsourcing entity resolution. Proceedings of the VLDB Endowment 5(11), 1483–1494 (2012)Google Scholar
- 4.Wang, J., Li, G., Kraska, T., Franklin, M.J., Feng, J.: Leveraging transitive relations for crowdsourced joins. In: SIGMOD Conference, pp. 229–240 (2013)Google Scholar