Distributed and Parallel Databases, Volume 30, Issue 5–6, pp 401–414

High-throughput query scheduling with spatial clustering based on distributed exponential moving average

  • Beomseok Nam
  • Deukyeon Hwang
  • Jinwoong Kim
  • Minho Shin

Abstract

In distributed scientific query processing systems, leveraging distributed cached data is becoming more important. In such systems, a front-end query scheduler distributes queries among many application servers rather than processing them on a few high-performance workstations. Although many query scheduling policies exist, such as round-robin and load-monitoring, they are not sophisticated enough to both exploit cached results and balance the workload. Efforts have been made to improve query processing performance using statistical methods such as the exponential moving average. However, existing methods have limitations for certain query patterns, namely queries with hotspots and dynamic query distributions. In this paper, we propose novel query scheduling policies that take into account both the contents of the distributed caching infrastructure and the load balance among the servers. Our experiments show that the proposed query scheduling policies outperform existing policies by producing better query plans in terms of load balance and cache-hit ratio.
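
To make the exponential moving average idea mentioned above concrete, the following is a minimal sketch and not the policy proposed in the paper. It assumes that each application server is summarized by a single point (an EMA of the centers of the queries recently sent to it), that a query goes to the server whose point is nearest, and that the chosen point is then pulled toward the query. The class name `EmaScheduler`, the method `assign`, and the smoothing factor `alpha` are hypothetical names introduced only for this illustration.

```python
import math


class EmaScheduler:
    """Hypothetical front-end scheduler keeping one EMA point per server."""

    def __init__(self, initial_points, alpha=0.3):
        # initial_points: one starting coordinate tuple per server (assumed).
        self.ema = [tuple(p) for p in initial_points]
        # alpha: weight given to the newest query (assumed smoothing factor).
        self.alpha = alpha

    def assign(self, query_center):
        """Return the index of the server whose EMA point is closest."""
        server = min(
            range(len(self.ema)),
            key=lambda i: math.dist(self.ema[i], query_center),
        )
        # Standard EMA update: new = alpha * sample + (1 - alpha) * old.
        a = self.alpha
        self.ema[server] = tuple(
            a * q + (1 - a) * e
            for q, e in zip(query_center, self.ema[server])
        )
        return server


if __name__ == "__main__":
    scheduler = EmaScheduler([(0.0, 0.0), (10.0, 10.0)], alpha=0.5)
    print(scheduler.assign((1.0, 1.0)))
    print(scheduler.assign((9.0, 9.0)))
```

A plain nearest-point rule like this one does nothing on its own to spread load when queries concentrate in a hotspot, which is the kind of limitation the abstract attributes to existing methods.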

Keywords

Distributed query scheduling; Multiple query optimization; Spatial clustering; Cache-aware load balancing

Notes

Acknowledgements

This research was supported by PLSI resources, 1.100027.01 Research Fund of the UNIST (Ulsan National Institute of Science and Technology), and 2.110147.01 National Research Foundation of Korea. This work was also supported by 2011 Research Fund of Myongji University.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Beomseok Nam (1)
  • Deukyeon Hwang (1)
  • Jinwoong Kim (1)
  • Minho Shin (2)

  1. Electrical and Computer Engineering, Ulsan National Inst. of Science and Technology, Ulsan, Korea
  2. Dept. of Computer Engineering, Myongji University, Yongin, Korea
