Crowdsourcing High-Quality Structured Data

  • Harry Halpin
  • Ioanna Lykourentzou
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 898)


One of the most difficult problems faced by consumers of semi-structured and structured data on the Web is how to discover or create the data they need. Producers of Web data, in turn, have no (semi-)automated way to align their data production with consumer needs. In this paper we formalize the problem of a data marketplace and hypothesize that one can quantify the value of semi-structured and structured data given a set of consumers, and that this quantification can be applied both to existing data-sets and to data-sets that have yet to be created. Furthermore, we provide an algorithm showing how the production of this data can be crowd-sourced while assuring the consumer a certain level of quality. Using real-world empirical data collected from data producers and consumers, we simulate a crowd-sourced data marketplace with quality guarantees.


Crowdsourcing, Structured data, Resource allocation, Human computation



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Inria, Paris, France
  2. Utrecht University, Utrecht, Netherlands
