Query optimization in cloud environments: challenges, taxonomy, and techniques

  • Abderrazak Sebaa
  • Abdelkamel Tari

Abstract

Improving query performance remains a challenging goal for both the academic and industrial communities. Cloud computing has complicated the traditional query optimization process by introducing many new challenges, and considerable effort has gone into addressing them. This article aims to provide a comprehensive view of query optimization in cloud computing through a systematic survey of query processing in cloud environments, organized in three phases. First, it identifies the cloud-specific challenges facing query processing techniques. Second, it reviews and classifies current query optimization techniques according to a proposed taxonomy. Finally, it compares and discusses the surveyed techniques against those cloud-specific challenges. The article also offers readers recommendations to consider in future work.

Keywords

Query optimization, Query processing, Cloud computing, Index, Views

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflicts of interest.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. LIMED Laboratory, Faculty of Exact Sciences, University of Bejaia, Bejaia, Algeria