Abstract
Data streams and probabilistic data have each received considerable attention recently, but largely in isolation. However, many applications, including sensor data management systems and object monitoring systems, need to handle both together. Complex correlations and lineage prevent probabilistic DBMSs (PDBMSs) from continuously querying temporal positioning and sensed data. Our main contribution is a new system that continuously runs monitoring queries on probabilistic data streams at a satisfactory speed while remaining faithful to the correlations and uncertainty in the data. We design a new data model for probabilistic data streams and present new query operators that implement threshold select-project-join queries with aggregation (SPJA queries). Most importantly, we build a working Java-based system, called Xtream, which supports uncertainty from the input data streams through to the final query results. Unlike probabilistic databases, Xtream has a data-driven design that makes it possible to continuously query high volumes of bursty probabilistic data streams. In this paper, after reviewing the main characteristics of and motivating applications for probabilistic data streams, we present our new data model. We then focus on algorithms and approximations for the basic operators (select, project, join, and aggregate). Finally, we compare our prototype with Orion, the only existing probabilistic DBMS that supports continuous distributions. Our experiments demonstrate that Xtream outperforms Orion with respect to efficiency metrics such as tuple latency (response time) and throughput, as well as accuracy, all of which are critical parameters in any probabilistic data stream management system.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dezfuli, M.G., Haghjoo, M.S. (2012). Xtream: A System for Continuous Querying over Uncertain Data Streams. In: Hüllermeier, E., Link, S., Fober, T., Seeger, B. (eds) Scalable Uncertainty Management. SUM 2012. Lecture Notes in Computer Science, vol 7520. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33362-0_1
DOI: https://doi.org/10.1007/978-3-642-33362-0_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33361-3
Online ISBN: 978-3-642-33362-0
eBook Packages: Computer Science (R0)