PLR: A Benchmark for Probabilistic Data Stream Management Systems

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7198)

Abstract

The inherent imprecision of data streams in many applications creates a need for real-time uncertainty management. Emerging Probabilistic Data Stream Management Systems (PDSMSs) are being developed to handle the uncertainty of data streams in real time. Many approaches have been proposed so far, but there is no way to compare them with regard to precision and efficiency. This problem motivated us to design an evaluation framework that compares the performance and accuracy of PDSMSs with each other and with probabilistic databases. In this paper, after a brief introduction to PDSMSs, we describe the requirements and challenges of designing a PDSMS benchmark. We then present the parts of our framework: a probabilistic data stream generator, queries, and a result evaluator. Finally, we discuss implementation aspects and use our framework to evaluate the effects of floating-point precision in our PDSMS prototype.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Karachi, A., Dezfuli, M.G., Haghjoo, M.S. (2012). PLR: A Benchmark for Probabilistic Data Stream Management Systems. In: Pan, JS., Chen, SM., Nguyen, N.T. (eds) Intelligent Information and Database Systems. ACIIDS 2012. Lecture Notes in Computer Science (LNAI), vol 7198. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28493-9_43

  • DOI: https://doi.org/10.1007/978-3-642-28493-9_43

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28492-2

  • Online ISBN: 978-3-642-28493-9

  • eBook Packages: Computer Science (R0)
