A workload-driven approach to database query processing in the cloud
This paper is concerned with data provisioning services (information search, retrieval, storage, etc.) that deal with a large and heterogeneous information repository. Increasingly, this class of services is being hosted and delivered through Cloud infrastructures. Although such systems are becoming popular, existing resource management methods (e.g. load-balancing techniques) do not consider workload patterns, nor do they perform well when subjected to non-uniformly distributed datasets. If these problems can be solved, this class of services can be made to operate in a more scalable, efficient, and reliable manner.
The main contribution of this paper is an approach that combines proprietary cloud-based load-balancing techniques and density-based partitioning for efficient range query processing across relational database-as-a-service in cloud computing environments. The study is conducted over a real-world data provisioning service that manages a large historical news database from Thomson Reuters. The proposed approach has been implemented and tested as a multi-tier web application suite consisting of load-balancing, application, and database layers. We have validated our approach by conducting a set of rigorous performance evaluation experiments using the Amazon EC2 infrastructure. The results show that augmenting a cloud-based load-balancing service (e.g. Amazon Elastic Load Balancer) with workload characterization intelligence (density and distribution of data; composition of queries) offers significant benefits with regard to overall system performance (i.e. query latency and database service throughput).
Keywords: Range query processing; Load balancing; Data density; Cloud computing
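To make the idea of density-based partitioning for range queries concrete, the following is a minimal sketch and not the paper's actual method. It assumes an equi-depth scheme, in which partition boundaries are chosen from the data so that each partition holds roughly the same number of rows, and it routes a range query to the partitions it overlaps. The function names `equi_depth_boundaries` and `partitions_for_range` and the sample keys are hypothetical, introduced only for illustration.

```python
from bisect import bisect_right


def equi_depth_boundaries(keys, num_partitions):
    """Return split points so each partition holds roughly the same number of keys.

    Assumption for illustration: partition i holds keys k with
    boundaries[i - 1] <= k < boundaries[i].
    """
    ordered = sorted(keys)
    n = len(ordered)
    # Split point i is the key at the i-th quantile, so dense key regions
    # end up with narrower partitions than sparse ones.
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]


def partitions_for_range(boundaries, low, high):
    """Return the indices of the partitions that the range [low, high] overlaps."""
    first = bisect_right(boundaries, low)
    last = bisect_right(boundaries, high)
    return list(range(first, last + 1))


# Hypothetical skewed key set: dense below 10, sparse above 100.
keys = [1, 2, 3, 4, 5, 6, 7, 8, 100, 200, 300, 400]
boundaries = equi_depth_boundaries(keys, 4)
narrow_query = partitions_for_range(boundaries, 2, 5)
wide_query = partitions_for_range(boundaries, 150, 500)
```

In a setting like the one described above, a load balancer with this kind of workload knowledge could forward each range query only to the database nodes that hold the overlapping partitions, rather than distributing requests without regard to where the data lies.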