Query optimization in cloud environments: challenges, taxonomy, and techniques

The Journal of Supercomputing

Abstract

Improving query performance remains one of the most interesting and challenging goals for both the academic and industrial communities. Cloud computing has complicated the traditional query optimization process, because many new challenges must now be taken into account, and considerable effort has been devoted to addressing this problem. This article aims to provide a comprehensive view of query optimization in cloud computing. It presents a systematic survey of query processing in cloud environments in three main phases. First, it identifies the cloud-specific challenges facing query processing techniques. Second, it reviews and classifies current query optimization techniques according to a proposed taxonomy. Third, it compares and discusses the surveyed techniques with respect to those cloud-specific challenges. The article closes with recommendations to be considered in future work.

Author information

Corresponding author

Correspondence to Abderrazak Sebaa.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Sebaa, A., Tari, A. Query optimization in cloud environments: challenges, taxonomy, and techniques. J Supercomput 75, 5420–5450 (2019). https://doi.org/10.1007/s11227-019-02806-9

  • DOI: https://doi.org/10.1007/s11227-019-02806-9
