Assessing Trust with PageRank in the Web of Data

  • José M. Giménez-García
  • Harsh Thakkar
  • Antoine Zimmermann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9989)

Abstract

While a number of quality metrics have been successfully proposed for datasets in the Web of Data, there is a lack of trust metrics that can be computed for any given dataset. We argue that reuse of data can be seen as an act of trust. In the Semantic Web environment, datasets regularly include terms from other sources, and each of these connections expresses a degree of trust in that source. However, determining what a dataset is in this context is not straightforward. We study the concepts of dataset and dataset link, settle on the Pay-Level Domain to differentiate datasets, and consider the use of external terms as connections among them. Using these connections we compute the PageRank value for each dataset, and examine the influence of ignoring predicates in the computation. This process has been performed for more than 300 datasets, extracted from the LOD Laundromat. The results show that reuse of a dataset is not correlated with its size, and provide some insight into the limitations of the approach and ways to improve its efficacy.
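
The pipeline outlined in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each dataset is already keyed by its Pay-Level Domain, that the Pay-Level Domain can be approximated by the last two labels of the host name (a real implementation would consult the Public Suffix List), and that PageRank is computed by plain power iteration with a damping factor of 0.85. The names `naive_pld`, `dataset_links`, `pagerank`, and the example triples are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlparse


def naive_pld(iri):
    """Approximate the Pay-Level Domain as the last two host labels.

    Assumption: this shortcut is wrong for suffixes such as "co.uk";
    a real implementation would use the Public Suffix List.
    """
    host = urlparse(iri).hostname
    if not host:
        return None  # literals and relative IRIs have no host
    return ".".join(host.split(".")[-2:])


def dataset_links(triples_by_dataset, ignore_predicates=False):
    """Map each dataset to the set of external datasets whose terms it uses."""
    links = defaultdict(set)
    for source, triples in triples_by_dataset.items():
        links[source]  # register the dataset even if it has no outgoing link
        for s, p, o in triples:
            terms = (s, o) if ignore_predicates else (s, p, o)
            for term in terms:
                target = naive_pld(term)
                if target and target != source:
                    links[source].add(target)
    return links


def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank; dangling nodes spread their rank evenly."""
    nodes = set(links)
    for targets in links.values():
        nodes |= targets
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for source, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy input: two datasets, keyed by their Pay-Level Domain.
example = {
    "a.org": [("http://data.a.org/x", "http://xmlns.com/foaf/0.1/name", "Alice")],
    "b.org": [("http://b.org/y", "http://www.w3.org/2002/07/owl#sameAs",
               "http://data.a.org/x")],
}

ranks_with_predicates = pagerank(dataset_links(example))
ranks_without_predicates = pagerank(dataset_links(example, ignore_predicates=True))
```

Comparing `ranks_with_predicates` with `ranks_without_predicates` mirrors the abstract's question about the influence of predicates: in the first case vocabulary domains such as `xmlns.com` and `w3.org` receive links from every dataset that uses their terms as predicates, while in the second only subject and object reuse counts.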

Keywords

Linked data, Trust, Reuse, Interlinking, PageRank, Metric, Assessment

Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • José M. Giménez-García (1)
  • Harsh Thakkar (2)
  • Antoine Zimmermann (3)
  1. Univ Lyon, UJM-Saint-Étienne, CNRS, Laboratoire Hubert Curien UMR 5516, Saint-Étienne, France
  2. Enterprise Information Systems Lab, University of Bonn, Bonn, Germany
  3. Univ Lyon, MINES Saint-Étienne, CNRS, Laboratoire Hubert Curien UMR 5516, Saint-Étienne, France
