Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering

  • Nithya N. Vijayakumar
  • Beth Plale
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4145)


Data streams flowing from the physical environment are as unpredictable as the environment itself. Radars go down, long-haul networks drop packets, and readings are corrupted on the wire. Yet data-driven scientific models and data mining algorithms do not necessarily account for these inaccuracies when assimilating the data. Low-overhead provenance collection partially solves this problem. We propose a data model and a collection model for near-real-time provenance collection. We define a system architecture for stream provenance tracking and motivate it with a real-world application in meteorological forecasting.


Data Stream · Input Stream · Execution Plan · Collection Model · Data Mining Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Nithya N. Vijayakumar (1)
  • Beth Plale (1)
  1. Department of Computer Science, Indiana University
