Knowledge and Information Systems, Volume 50, Issue 2, pp 661–688

Developing provenance-aware query systems: an occurrence-centric approach

  • Eladio Domínguez
  • Beatriz Pérez
  • Ángel Luis Rubio
  • María A. Zapata
  • Alberto Allué
  • Antonio López
Regular Paper


In recent years, research on provenance has grown rapidly, and work in the field of business process monitoring has been especially notable. Business process monitoring deals with recording information about the actual execution of processes in order to extract valuable knowledge that can be used to improve business process quality. In prior research, we developed an occurrence-centric approach built on our notion of occurrence, which provides a holistic perspective of system dynamics. Based on this concept, more complex structures are defined herein, namely the Occurrence Base (OcBase) and the Occurrence Management System (OcSystem), which serve as scaffolding for developing business process monitoring systems. This paper focuses primarily on the critical provenance task of extracting valuable knowledge from such systems by proposing an Occurrence Query Framework that includes the definition of an Occurrence Base Metamodel and an Occurrence Query Language based on this metamodel. Our framework provides a way of working for the construction of provenance-aware business process monitoring systems. As a proof of concept, a tool implementing the various components of the framework is presented. This tool has been tested against a real system in the context of biobanks.


Provenance, Monitoring, Information retrieval, Protocol, History, Health



This work has been partially supported by the Spanish Ministry of Science and Innovation, project SMOTY (IPT-2011-1328-390000), the Univ. of La Rioja, project APPI15/02, and the Univ. of Zaragoza, project UZ2015-TEC-05.



Copyright information

© Springer-Verlag London 2016

Authors and Affiliations

  • Eladio Domínguez (1)
  • Beatriz Pérez (2, corresponding author)
  • Ángel Luis Rubio (2)
  • María A. Zapata (1)
  • Alberto Allué (3)
  • Antonio López (3)

  1. Departamento de Informática e Ingeniería de Sistemas, Universidad de Zaragoza, Zaragoza, Spain
  2. Departamento de Matemáticas y Computación, Universidad de La Rioja, La Rioja, Spain
  3. Infozara Consultoría Informática, Zaragoza, Spain
