Towards the Domain Agnostic Generation of Natural Language Explanations from Provenance Graphs for Casual Users

  • Darren P. Richardson
  • Luc Moreau
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9672)


As more systems become PROV-enabled, there will be a corresponding increase in the need to communicate provenance data directly to users. Whilst there are several existing methods for doing this (formal, diagrammatic, and textual), there are currently no application-generic techniques for generating linguistic explanations of provenance. The principal reason is that transforming a provenance graph, such as one expressed in PROV, into a textual explanation requires a certain amount of linguistic information; if this information is not available as an annotation, the transformation is presently not possible.
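A minimal sketch of the problem, assuming a PROV graph reduced to (subject, relation, object) triples. The relation names `wasGeneratedBy`, `used` and `wasAssociatedWith` come from PROV, but the identifiers, the sentence templates and the `LEXICON` dictionary (standing in for a linguistic annotation) are hypothetical. With the annotation, a template yields a readable sentence; without it, the only fallback is to print the raw identifiers.

```python
# PROV relation names are real; these sentence templates are illustrative.
TEMPLATES = {
    "wasGeneratedBy": "{s} was generated by {o}.",
    "used": "{s} used {o}.",
    "wasAssociatedWith": "{s} was associated with {o}.",
}

# Hypothetical linguistic annotation: node identifier -> noun phrase.
LEXICON = {
    "ex:report1": "the report",
    "ex:compile1": "the compilation step",
}


def verbalise(triple, lexicon):
    """Render one (subject, relation, object) triple as a sentence,
    falling back to the raw identifier for any node the lexicon lacks."""
    s, rel, o = triple
    subj = lexicon.get(s)
    # Capitalise a lexicalised subject; leave a raw identifier untouched.
    subj = subj[0].upper() + subj[1:] if subj else s
    return TEMPLATES[rel].format(s=subj, o=lexicon.get(o, o))


triple = ("ex:report1", "wasGeneratedBy", "ex:compile1")
print(verbalise(triple, LEXICON))  # with the annotation
print(verbalise(triple, {}))       # without any annotation
```

Without the annotation, the second call can only produce `ex:report1 was generated by ex:compile1.`, which is grammatical scaffolding around opaque identifiers rather than an explanation a casual user could follow.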

In this paper, we describe how we have adapted the common ‘consensus’ architecture from the field of natural language generation to achieve this graph transformation, resulting in the novel PROVglish architecture. We then present an approach to garnering the necessary linguistic information from a PROV dataset, which involves exploiting the linguistic information informally encoded in the URIs denoting provenance resources. We finish by detailing an evaluation undertaken to assess the effectiveness of this approach to lexicalisation, demonstrating a significant improvement in terms of fluency, comprehensibility, and grammatical correctness.
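The idea of mining URIs for linguistic information can be illustrated with a small sketch. It is written under our own assumptions and is not the PROVglish implementation: it takes the local name of a URI (the part after the last `#` or `/`), splits it on camelCase, letter/digit boundaries, underscores, hyphens and dots, and uses the resulting words as a noun phrase inside a three-stage pipeline loosely modelled on the consensus architecture's stages (document planning, microplanning, surface realisation). The helper names, example URIs, ordering heuristic, determiner choice and templates are all hypothetical.

```python
import re

# --- Lexicalisation from URIs (hypothetical helper names) ---

def local_name(uri):
    """Return the fragment of a URI after its last '#' or '/'."""
    return re.split(r"[#/]", uri.rstrip("/#"))[-1]


def uri_to_words(uri):
    """Split a URI's local name into lowercase words, breaking on
    camelCase, letter/digit boundaries, underscores, hyphens and dots."""
    name = local_name(uri)
    name = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)  # camelCase boundary
    name = re.sub(r"(?<=[A-Za-z])(?=[0-9])|(?<=[0-9])(?=[A-Za-z])", " ", name)
    return [w.lower() for w in re.split(r"[\s_.\-]+", name) if w]


# Illustrative templates; the wording is an assumption, not PROVglish's.
TEMPLATES = {
    "wasAssociatedWith": "{s} was associated with {o}",
    "used": "{s} used {o}",
    "wasGeneratedBy": "{s} was generated by {o}",
}
# Hypothetical ordering heuristic: the dict insertion order above.
RELATION_ORDER = list(TEMPLATES)


def plan_document(triples):
    """Document planning: keep relations we can verbalise (content
    determination) and order them by relation type (structuring)."""
    kept = [t for t in triples if t[1] in TEMPLATES]
    return sorted(kept, key=lambda t: RELATION_ORDER.index(t[1]))


def noun_phrase(uri):
    # Microplanning assumption: every node becomes a definite noun phrase.
    return " ".join(["the"] + uri_to_words(uri))


def microplan(plan):
    """Microplanning: lexicalise each node from its URI."""
    return [(noun_phrase(s), rel, noun_phrase(o)) for s, rel, o in plan]


def realise(messages):
    """Surface realisation: fill templates, capitalise, punctuate."""
    sentences = []
    for s, rel, o in messages:
        text = TEMPLATES[rel].format(s=s, o=o)
        sentences.append(text[0].upper() + text[1:] + ".")
    return " ".join(sentences)


EX = "http://example.org/"  # hypothetical namespace
graph = [
    (EX + "data/finalReport", "wasGeneratedBy", EX + "run/reportCompilation"),
    (EX + "run/reportCompilation", "used", EX + "data/sensor-readings"),
]
print(realise(microplan(plan_document(graph))))
```

On the example graph this prints "The report compilation used the sensor readings. The final report was generated by the report compilation." The heuristic is only as good as the naming of the URIs: a verb-headed name such as `compile_report` would come out as "the compile report", so a fuller approach would also need part-of-speech information about the extracted words, which this sketch does not attempt.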





Research was sponsored by the US Army Research Laboratory and the UK Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the US Army Research Laboratory, the US Government, the UK Ministry of Defence, or the UK Government. The US and UK Governments are authorised to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon. The investigations and human experiment were subject to ethics approvals ERGO-FPSE-16722 and ERGO-FPSE-16731, and the source data used to generate the sentence pairs was drawn from the Southampton Provenance Store ( The research data can be found at and



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Electronics and Computer Science, University of Southampton, Southampton, UK
