Querying an Integrated Complex-Object Dataflow Database

  • Natalia Kwasnikowska
  • Jan Van den Bussche

Abstract

We consider an integrated complex-object dataflow database in which multiple dataflow specifications can be stored, together with multiple executions of these dataflows, including the complex-object data involved and any annotations. We focus on dataflow applications frequently encountered in the scientific community, which manipulate data with a complex-object structure and combine this with service calls that can be either internal or external. Internal services are dataflows acting as a subprogram of another dataflow, whereas external services are modeled as functions with possibly non-deterministic behavior. Dataflow specifications are expressed in a high-level programming language based on the nested relational calculus, the operators of which provide the right “glue” needed to combine different service calls into a complex-object dataflow. All entities involved, whether complex objects, dataflow executions or dataflow specifications, are first-class citizens of the integrated database: they are all data. We discuss how such dataflow repositories can be queried in a variety of ways, including provenance queries. We show that a modern SQL platform with support for (external) routines and SQL/XML suffices to support all types of dataflow repository queries.
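As a rough illustration of the idea that specifications, executions, and annotations are all data in one repository, the following is a minimal sketch and not the authors' implementation. It assumes SQLite in place of the SQL platform discussed in the paper, JSON text as a stand-in for SQL/XML, and a Python function registered with the database as a stand-in for an external routine. The table names, the `contains_value` routine, the helper functions, and the specification strings are all hypothetical.

```python
import json
import sqlite3


def contains_value(serialized, value):
    """Hypothetical external routine: 1 if `value` occurs anywhere in the
    serialized complex object (nested dicts and lists), else 0."""
    def walk(obj):
        if isinstance(obj, dict):
            return any(walk(v) for v in obj.values())
        if isinstance(obj, list):
            return any(walk(v) for v in obj)
        return obj == value
    return 1 if walk(json.loads(serialized)) else 0


def build_repository():
    """Create an in-memory repository holding specifications, executions,
    and annotations side by side (schema is an assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dataflow (name TEXT PRIMARY KEY, spec TEXT);
        CREATE TABLE execution (
            id INTEGER PRIMARY KEY,
            dataflow TEXT REFERENCES dataflow(name),
            input TEXT,    -- complex object serialized as JSON
            output TEXT    -- complex object serialized as JSON
        );
        CREATE TABLE annotation (
            execution_id INTEGER REFERENCES execution(id),
            note TEXT
        );
    """)
    # Register the Python function so SQL queries can call it.
    conn.create_function("contains_value", 2, contains_value)
    return conn


def store_dataflow(conn, name, spec):
    """Store a dataflow specification as ordinary data."""
    conn.execute("INSERT INTO dataflow VALUES (?, ?)", (name, spec))


def record_execution(conn, dataflow, input_obj, output_obj):
    """Store one execution with its input and output complex objects."""
    cur = conn.execute(
        "INSERT INTO execution (dataflow, input, output) VALUES (?, ?, ?)",
        (dataflow, json.dumps(input_obj), json.dumps(output_obj)),
    )
    return cur.lastrowid


def provenance(conn, value):
    """Simple provenance query: which executions produced `value`?"""
    return conn.execute(
        "SELECT id, dataflow FROM execution "
        "WHERE contains_value(output, ?) = 1 ORDER BY id",
        (value,),
    ).fetchall()


# Illustrative usage; the specification text is placeholder pseudo-syntax.
conn = build_repository()
store_dataflow(conn, "align", "for x in seqs return service(x)")
store_dataflow(conn, "filter", "for x in seqs where keep(x) return x")
record_execution(conn, "align", {"seqs": ["ACGT", "TTGA"]},
                 {"hits": [{"seq": "ACGT", "score": 0.9}]})
record_execution(conn, "filter", {"seqs": ["TTGA"]}, {"kept": ["TTGA"]})
hits = provenance(conn, "ACGT")
print(hits)
```

The sketch covers only one flat kind of provenance query over outputs. Queries that relate specifications to executions, or that trace intermediate steps inside a single execution, would need a richer schema than the one assumed here.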

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Natalia Kwasnikowska (1, 2)
  • Jan Van den Bussche (1, 2)

  1. Hasselt University, Belgium
  2. Transnational University of Limburg, Belgium
