Crowdsourcing Linked Data Quality Assessment

  • Maribel Acosta
  • Amrapali Zaveri
  • Elena Simperl
  • Dimitris Kontokostas
  • Sören Auer
  • Jens Lehmann
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8219)


In this paper we look into the use of crowdsourcing as a means to handle Linked Data quality problems that are challenging to solve automatically. We analyzed the most common errors encountered in Linked Data sources and classified them according to the extent to which they are likely to be amenable to a specific form of crowdsourcing. Based on this analysis, we implemented a quality assessment methodology for Linked Data that leverages the wisdom of the crowds in two ways: (i) a contest targeting an expert crowd of researchers and Linked Data enthusiasts, complemented by (ii) paid microtasks published on Amazon Mechanical Turk. We empirically evaluated how efficiently this methodology could spot quality issues in DBpedia. We also investigated how the contributions of the two types of crowds could be optimally integrated into Linked Data curation processes. The results show that the two styles of crowdsourcing are complementary and that crowdsourcing-enabled quality assessment is a promising and affordable way to enhance the quality of Linked Data.


Data Type; Link Data; Quality Problem; Quality Issue; Quality Assessment Methodology
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Maribel Acosta (1)
  • Amrapali Zaveri (2)
  • Elena Simperl
  • Dimitris Kontokostas (2)
  • Sören Auer (3)
  • Jens Lehmann (2)
  1. Institute AIFB, Karlsruhe Institute of Technology, Germany
  2. Institut für Informatik, AKSW, Universität Leipzig, Germany
  3. Enterprise Information Systems and Fraunhofer IAIS, University of Bonn, Germany
