PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning

  • Yongluan Zhou
  • Ying Yan
  • Feng Yu
  • Aoying Zhou
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3882)


In emerging data stream applications, data sources are typically distributed. Evaluating multi-join queries over streams from different sources may incur a large communication cost. Because queries run continuously, scarce bandwidth is consumed rapidly unless operator ordering and placement are carefully optimized. In this paper, we focus on the optimization of continuous multi-join queries over distributed streams. We observe that partitioning streams into substreams can significantly reduce the communication cost, and we therefore propose a novel partition-based join scheme, PMJoin. Several partitioning techniques are studied. To generate the query plan for each substream, we propose a heuristic algorithm based on a rate-based model. Results from an extensive experimental study show that our techniques can significantly reduce the communication cost.


Keywords: Arrival Rate, Query Processing, Communication Cost, Query Plan, Continuous Query





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yongluan Zhou (1)
  • Ying Yan (2)
  • Feng Yu (1)
  • Aoying Zhou (2)
  1. National University of Singapore, Singapore
  2. Fudan University, China
