Automatic Discovery of High-Level Provenance Using Semantic Similarity

  • Tom De Nies
  • Sam Coppens
  • Davy Van Deursen
  • Erik Mannens
  • Rik Van de Walle
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7525)


As interest in provenance grows within the Semantic Web community, provenance is increasingly recognized as a useful tool across many domains. However, existing automatic provenance collection techniques are not universally applicable: most either rely on (low-level) observed provenance or require the user to disclose formal workflows. In this paper, we propose a new approach for the automatic discovery of provenance at multiple levels of granularity. To accomplish this, we detect entity derivations using clustering algorithms, linked data, and semantic similarity. The resulting derivations are structured in compliance with the Provenance Data Model (PROV-DM). While the proposed approach is purposely kept general, allowing adaptation to many use cases, we provide an implementation for one of them: discovering the sources of news articles. With this implementation, we were able to detect 73% of the original sources of 410 news stories, at 68% precision. Lastly, we discuss possible improvements and future work.
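The idea of detecting entity derivations from similarity can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in plain Jaccard token overlap for semantic similarity, omits clustering and linked data entirely, and assumes that the older of two similar documents is the source. The function names, the `(doc_id, timestamp, text)` input shape, the `ex:` identifiers, and the 0.5 threshold are all hypothetical. The output uses PROV-N style `wasDerivedFrom(derived, source)` statements.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two sets; a crude stand-in for semantic similarity."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def discover_derivations(docs, threshold=0.5):
    """Return PROV-N style wasDerivedFrom statements for similar documents.

    docs: list of (doc_id, timestamp, text) tuples (hypothetical input shape).
    Assumption: when two documents are similar, the newer one is treated
    as derived from the older one.
    """
    # Bag of lowercase whitespace-separated tokens per document.
    tokens = {doc_id: set(text.lower().split()) for doc_id, _, text in docs}
    statements = []
    # Sort by timestamp so each pair comes out as (older, newer).
    ordered = sorted(docs, key=lambda d: d[1])
    for (older_id, _, _), (newer_id, _, _) in combinations(ordered, 2):
        if jaccard(tokens[older_id], tokens[newer_id]) >= threshold:
            statements.append(f"wasDerivedFrom({newer_id}, {older_id})")
    return statements


docs = [
    ("ex:wire", 1, "minister resigns after budget vote"),
    ("ex:article", 2, "minister resigns after heated budget vote"),
    ("ex:sports", 3, "local team wins cup final"),
]
print(discover_derivations(docs))
```

In this toy input only the article and the wire report share enough tokens to pass the threshold, so a single derivation is reported. A realistic system would replace `jaccard` with a similarity measure over extracted entities or concepts.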


Provenance Data Model Semantic Web Linked Data Similarity News 



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tom De Nies (1)
  • Sam Coppens (1)
  • Davy Van Deursen (1)
  • Erik Mannens (1)
  • Rik Van de Walle (1)
  1. Department of Electronics and Information Systems, Multimedia Lab, Ghent University - IBBT, Ledeberg-Ghent, Belgium
