Crowdsourced Entity Alignment: A Decision Theory Based Approach

  • Yan Zhuang
  • Guoliang Li
  • Jianhua Feng
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10570)


Crowdsourcing is a new computation paradigm that uses the wisdom of the crowd to solve problems that are difficult for computers (e.g., image annotation and entity alignment). In crowdsourced entity alignment tasks, there are usually large numbers of candidate pairs to be verified by crowd workers, and each pair is assigned to multiple workers to achieve high quality. This raises two fundamental problems: (1) question selection: which questions are the most beneficial to crowdsource, and (2) question assignment: which workers should be assigned to answer a selected question? In this paper, we address these two problems using decision theory. First, we define the problems under two budget constraints. The first takes the marginal gain into account, and the second focuses on the limited budget. Then, we formulate the decision-making problems under the different budget constraints and build an influence diagram to perform result inference. We propose two efficient algorithms to address these two problems. Finally, we conduct extensive experiments to validate the efficiency and effectiveness of our proposed algorithms on both synthetic and real data.
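
As a rough illustration of the kind of decision-theoretic reasoning the abstract describes, the following minimal sketch scores each candidate pair by the expected gain from one additional worker answer and then picks questions greedily under a budget. It is not the paper's algorithm: the `Question` fields, the single shared worker `accuracy`, the 0/1 utility for a correct label, the `min_gain` threshold, and the greedy gain-per-cost rule are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Question:
    pair_id: str   # hypothetical identifier of a candidate entity pair
    prior: float   # assumed prior probability that the pair is a match
    cost: float    # assumed cost of asking one worker about this pair


def expected_gain(prior: float, accuracy: float) -> float:
    """Expected increase in the probability of labelling the pair correctly
    after one answer from a worker with the given (assumed) accuracy."""
    # Without asking, the best decision is the more likely label.
    before = max(prior, 1.0 - prior)
    # Joint probabilities of (truth, answer); choose the best label per answer.
    after_yes = max(prior * accuracy, (1.0 - prior) * (1.0 - accuracy))
    after_no = max(prior * (1.0 - accuracy), (1.0 - prior) * accuracy)
    return after_yes + after_no - before


def select_questions(questions, accuracy, budget, min_gain=0.01):
    """Greedy selection by gain per unit cost (an illustrative heuristic).

    Questions whose expected gain does not exceed min_gain are skipped
    (a marginal-gain style rule), and selection never exceeds the budget
    (a limited-budget style rule).
    """
    ranked = sorted(
        questions,
        key=lambda q: expected_gain(q.prior, accuracy) / q.cost,
        reverse=True,
    )
    chosen, spent = [], 0.0
    for q in ranked:
        if expected_gain(q.prior, accuracy) <= min_gain:
            continue  # asking is not expected to change the decision enough
        if spent + q.cost <= budget:
            chosen.append(q.pair_id)
            spent += q.cost
    return chosen


if __name__ == "__main__":
    candidates = [
        Question("a", prior=0.50, cost=1.0),
        Question("b", prior=0.95, cost=1.0),
        Question("c", prior=0.60, cost=1.0),
        Question("d", prior=0.30, cost=2.0),
    ]
    print(select_questions(candidates, accuracy=0.8, budget=2.0))
```

Under these assumptions, uncertain pairs (prior near 0.5) are worth asking about, while a pair whose prior already exceeds the worker's accuracy yields no expected gain, because a single answer cannot change the chosen label.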


Keywords: Entity alignment, Crowdsourcing, Decision theory



This work was supported by 973 Program of China (2015CB358700), NSF of China (61632016, 61373024, 61602488, 61422205, 61472198), FDCT/007/2016/AFJ, and Key Projects of Military Logistics Research (BHJ14L010).



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Computer Science, Tsinghua University, Beijing, China
