Abstract
The inherent imprecision of data streams in many applications creates a need for real-time uncertainty management. Emerging Probabilistic Data Stream Management Systems (PDSMSs) are being developed to handle the uncertainty of data streams in real time. Many approaches have been proposed so far, but there is no way to compare them with respect to precision and efficiency. This problem motivated us to design an evaluation framework that compares the performance and accuracy of PDSMSs with each other and with probabilistic databases. In this paper, after a brief introduction to PDSMSs, we describe the requirements and challenges of designing a PDSMS benchmark. We then present the parts of our framework: a probabilistic data stream generator, queries, and a result evaluator. Finally, we focus on implementation aspects and use our framework to evaluate the effects of floating-point precision in our PDSMS prototype.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Karachi, A., Dezfuli, M.G., Haghjoo, M.S. (2012). PLR: A Benchmark for Probabilistic Data Stream Management Systems. In: Pan, JS., Chen, SM., Nguyen, N.T. (eds) Intelligent Information and Database Systems. ACIIDS 2012. Lecture Notes in Computer Science, vol 7198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28493-9_43
DOI: https://doi.org/10.1007/978-3-642-28493-9_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28492-2
Online ISBN: 978-3-642-28493-9
eBook Packages: Computer Science, Computer Science (R0)