Abstract
Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, the centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) with dissemination of queries that exploits the trees embedded in the underlying DHT, thereby incurring little overhead. DHTJoin also deals with skew in join attribute values, which may hurt load balancing and result completeness. We provide a performance evaluation of DHTJoin, which shows that it can achieve significant performance gains in terms of network traffic.
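To make the idea of hash-based tuple placement concrete, the following is a minimal sketch and not the DHTJoin algorithm itself. It assumes a toy ring in which a key belongs to the first peer whose identifier follows it, SHA-1 as the hash function, two streams named `R` and `S`, and a count-based window kept per join value. The names `key_id`, `Ring`, `responsible_peer`, and `insert` are hypothetical and chosen only for illustration.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict, deque


def key_id(value, bits=32):
    """Map a join attribute value to an identifier on a 2**bits ring (assumed SHA-1)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


class Ring:
    """Toy stand-in for a DHT: a key is owned by its successor peer on the ring."""

    def __init__(self, peer_names, window=100):
        self.peers = sorted((key_id(name), name) for name in peer_names)
        self.ids = [pid for pid, _ in self.peers]
        # Per peer, per stream, per join value: a bounded window of recent tuples.
        # The count-based window is an assumption made for this sketch.
        self.store = {
            name: {
                "R": defaultdict(lambda: deque(maxlen=window)),
                "S": defaultdict(lambda: deque(maxlen=window)),
            }
            for name in peer_names
        }

    def responsible_peer(self, value):
        """Return the peer whose identifier is the successor of the value's key."""
        idx = bisect_left(self.ids, key_id(value)) % len(self.ids)
        return self.peers[idx][1]

    def insert(self, stream, value, tup):
        """Place a tuple by hashing its join value, then probe the other stream."""
        peer = self.responsible_peer(value)
        other = "S" if stream == "R" else "R"
        # Results are always ordered as (R tuple, S tuple).
        matches = [
            (tup, o) if stream == "R" else (o, tup)
            for o in self.store[peer][other][value]
        ]
        self.store[peer][stream][value].append(tup)
        return peer, matches
```

Because both streams hash the same join attribute value to the same key, tuples that can join meet at one peer and the join is evaluated locally there. The sketch also shows why skew matters: a very frequent join value sends all of its tuples to a single peer.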
Cite this article
Palma, W., Akbarinia, R., Pacitti, E. et al. DHTJoin: processing continuous join queries using DHT networks. Distrib Parallel Databases 26, 291 (2009). https://doi.org/10.1007/s10619-009-7054-7
DOI: https://doi.org/10.1007/s10619-009-7054-7