
DHTJoin: processing continuous join queries using DHT networks

Distributed and Parallel Databases

Abstract

Continuous query processing in data stream management systems (DSMS) has received considerable attention recently. Many applications share the same need for processing data streams in a continuous fashion. For most distributed streaming applications, the centralized processing of continuous queries over distributed data is simply not viable. This paper addresses the problem of computing approximate answers to continuous join queries over distributed data streams. We present a new method, called DHTJoin, which combines hash-based placement of tuples in a Distributed Hash Table (DHT) and dissemination of queries by exploiting the embedded trees in the underlying DHT, thereby incurring little overhead. DHTJoin also deals with join attribute value skew which may hurt load balancing and result completeness. We provide a performance evaluation of DHTJoin which shows that it can achieve significant performance gains in terms of network traffic.
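To illustrate the idea of hash-based tuple placement mentioned in the abstract, here is a minimal sketch in Python. It is not the DHTJoin algorithm itself: the class `ToyRing`, the 16-bit identifier space, the successor-style lookup, and the fixed-size windows are all assumptions made for illustration, and query dissemination and skew handling are not modelled. The only point it shows is that hashing tuples on their join attribute value sends matching tuples from both streams to the same node, where they can be joined locally.

```python
import hashlib
from bisect import bisect_left
from collections import defaultdict, deque

ID_BITS = 16  # size of the toy identifier space (assumption, not from the paper)


def key_id(value):
    """Map a join attribute value to an identifier on the toy ring."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << ID_BITS)


class ToyRing:
    """Hypothetical in-memory stand-in for a DHT with successor-style lookup."""

    def __init__(self, node_ids, window_size=100):
        self.node_ids = sorted(node_ids)
        # For each node: stream name -> bounded window of (value, payload) pairs.
        self.windows = {
            n: defaultdict(lambda: deque(maxlen=window_size))
            for n in self.node_ids
        }

    def responsible_node(self, value):
        """Return the first node whose id is >= the key id, wrapping around."""
        i = bisect_left(self.node_ids, key_id(value))
        return self.node_ids[i % len(self.node_ids)]

    def insert(self, stream, other_stream, value, payload):
        """Place a tuple by its join attribute value and probe the other stream.

        Returns the node that stored the tuple and a list of
        (new payload, stored payload) pairs that matched on the join value.
        """
        node = self.responsible_node(value)
        store = self.windows[node]
        matches = [(payload, p) for v, p in store[other_stream] if v == value]
        store[stream].append((value, payload))
        return node, matches


ring = ToyRing([100, 20000, 45000, 60000])
node_r, matches_r = ring.insert("R", "S", 42, "r1")  # nothing to join with yet
node_s, matches_s = ring.insert("S", "R", 42, "s1")  # meets r1 at the same node
```

Because both tuples carry the join value 42, they hash to the same identifier and are stored at the same node, so the second insertion finds the first one in the local window without contacting any other node.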




Correspondence to Wenceslao Palma.


About this article

Cite this article

Palma, W., Akbarinia, R., Pacitti, E. et al. DHTJoin: processing continuous join queries using DHT networks. Distrib Parallel Databases 26, 291 (2009). https://doi.org/10.1007/s10619-009-7054-7


  • DOI: https://doi.org/10.1007/s10619-009-7054-7
