Abstract
Data streams flowing from the physical environment are as unpredictable as the environment itself. Radars go down, long-haul networks drop packets, and readings are corrupted on the wire. Yet data-driven scientific models and data mining algorithms do not necessarily account for these inaccuracies when assimilating the data. Low-overhead provenance collection partially solves this problem. We propose a data model and a collection model for near-real-time provenance collection. We define a system architecture for stream provenance tracking and motivate it with a real-world application in meteorological forecasting.
This work was supported under NSF cooperative agreement ATM-0331480 and DOE DE-FG02-04ER25600.
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Vijayakumar, N.N., Plale, B. (2006). Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering. In: Moreau, L., Foster, I. (eds) Provenance and Annotation of Data. IPAW 2006. Lecture Notes in Computer Science, vol 4145. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11890850_6
DOI: https://doi.org/10.1007/11890850_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-46302-3
Online ISBN: 978-3-540-46303-0
eBook Packages: Computer Science, Computer Science (R0)