The Journal of Supercomputing, Volume 63, Issue 3, pp 722–736

A workload-driven approach to database query processing in the cloud

  • Adnene Guabtni
  • Rajiv Ranjan
  • Fethi A. Rabhi
Article

Abstract

This paper is concerned with data provisioning services (information search, retrieval, storage, etc.) dealing with a large and heterogeneous information repository. Increasingly, this class of services is being hosted and delivered through Cloud infrastructures. Although such systems are becoming popular, existing resource management methods (e.g. load-balancing techniques) neither consider workload patterns nor perform well when subjected to non-uniformly distributed datasets. If these problems can be solved, this class of services can be made to operate in a more scalable, efficient, and reliable manner.

The main contribution of this paper is an approach that combines proprietary cloud-based load-balancing techniques and density-based partitioning for efficient range query processing across relational database-as-a-service deployments in cloud computing environments. The study is conducted over a real-world data provisioning service that manages a large historical news database from Thomson Reuters. The proposed approach has been implemented and tested as a multi-tier web application suite consisting of load-balancing, application, and database layers. We have validated our approach by conducting a set of rigorous performance evaluation experiments using the Amazon EC2 infrastructure. The results show that augmenting a cloud-based load-balancing service (e.g. Amazon Elastic Load Balancer) with workload characterization intelligence (density and distribution of data; composition of queries) offers significant benefits with regard to overall system performance (i.e. query latency and database service throughput).
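The abstract does not spell out the partitioning algorithm, so the following is only a minimal sketch of the general idea behind density-based partitioning for range queries, not the authors' method. It assumes a single numeric key (for example, a timestamp) and a non-empty key sample; the function names `density_boundaries` and `partitions_for_range` are hypothetical. Split points are placed at quantiles of the observed keys, so each partition holds roughly the same number of rows even when the data is skewed, and a range query is routed only to the partitions it overlaps.

```python
from bisect import bisect_right


def density_boundaries(keys, n_partitions):
    """Return n_partitions - 1 split points taken at quantiles of the keys.

    Partition i holds keys k with boundaries[i-1] <= k < boundaries[i],
    so dense key regions get narrow partitions and sparse regions wide ones.
    Assumes keys is non-empty (illustrative sketch only).
    """
    ordered = sorted(keys)
    step = len(ordered) / n_partitions
    return [ordered[int(i * step)] for i in range(1, n_partitions)]


def partitions_for_range(boundaries, low, high):
    """Return the indices of the partitions that a range query [low, high] touches."""
    first = bisect_right(boundaries, low)
    last = bisect_right(boundaries, high)
    return list(range(first, last + 1))


# Skewed sample: most keys are small, a few are large.
sample_keys = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]
bounds = density_boundaries(sample_keys, 2)
```

With this sample, a fixed-width split at the midpoint of the key range would put eight keys in one partition and two in the other, whereas the quantile-based split places five keys on each side. A load balancer aware of these boundaries could forward a range query only to the database nodes returned by `partitions_for_range`.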

Keywords

Range query processing, Load balancing, Data density, Cloud computing

Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  1. School of Computer Science and Engineering, The University of New South Wales, Sydney, Australia
  2. Information Engineering Laboratory, CSIRO ICT Center, Building 108, Australian National University, Canberra, Australia
