Xtream: A System for Continuous Querying over Uncertain Data Streams

  • Conference paper
Scalable Uncertainty Management (SUM 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7520)

Abstract

Data streams and probabilistic data have each received considerable attention recently, but largely in isolation. However, many applications, including sensor data management systems and object monitoring systems, need to handle both together. Complex correlations and lineage prevent probabilistic DBMSs (PDBMSs) from continuously querying temporal positioning and sensed data. Our main contribution is a new system that continuously runs monitoring queries on probabilistic data streams at a satisfactory speed while remaining faithful to the correlations and uncertainty in the data. We design a new data model for probabilistic data streams and present new query operators that implement threshold SPJ queries with aggregation (SPJA queries). Most importantly, we build a working Java-based system, called Xtream, which supports uncertainty from the input data streams through to the final query results. Unlike probabilistic databases, the data-driven design of Xtream makes it possible to continuously query high-volume, bursty probabilistic data streams. In this paper, after reviewing the main characteristics of probabilistic data streams and their motivating applications, we present our new data model. We then focus on algorithms and approximations for the basic operators (select, project, join, and aggregate). Finally, we compare our prototype with Orion, the only existing probabilistic DBMS that supports continuous distributions. Our experiments show that Xtream outperforms Orion on efficiency metrics such as tuple latency (response time) and throughput, as well as on accuracy, all of which are critical parameters in any probabilistic data stream management system.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dezfuli, M.G., Haghjoo, M.S. (2012). Xtream: A System for Continuous Querying over Uncertain Data Streams. In: Hüllermeier, E., Link, S., Fober, T., Seeger, B. (eds) Scalable Uncertainty Management. SUM 2012. Lecture Notes in Computer Science, vol 7520. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33362-0_1

  • DOI: https://doi.org/10.1007/978-3-642-33362-0_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33361-3

  • Online ISBN: 978-3-642-33362-0

  • eBook Packages: Computer Science (R0)
