Skip to main content
Log in

Survey on performance optimization for database systems

  • Review
  • Published:
Science China Information Sciences Aims and scope Submit manuscript

Abstract

The performance optimization of database systems has been widely studied for years. From the perspective of the operation and maintenance personnel, it mainly includes three topics: prediction, diagnosis, and tuning. The prediction of future performance can guide the adjustment of configurations and resources. The diagnosis of anomalies can determine the root cause of performance regression. Tuning operations improve performance by adjusting influencing factors, e.g., knobs, indexes, views, resources, and structured query language (SQL) design. In this review, we focus on the performance optimization of database systems and review notable research work on the topics of prediction, diagnosis, and tuning. For prediction, we summarize the techniques, strengths, and limitations of several proposed systems for single and concurrent queries. For diagnosis, we categorize the techniques by the input data, i.e., monitoring metrics, logs, or time metrics, and analyze their abilities. For tuning, we focus on the approaches commonly adopted by the operation and maintenance personnel, i.e., knob tuning, index selection, view materialization, elastic resource, storage management, and SQL antipattern detection. Finally, we discuss some challenges and future work.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Ross R B, Amvrosiadis G, Carns P, et al. Mochi: composing data services for high-performance computing environments. J Comput Sci Technol, 2020, 35: 121–144

    Article  Google Scholar 

  2. Lan H, Bao Z, Peng Y. A survey on advancing the DBMS query optimizer: cardinality estimation, cost model, and plan enumeration. Data Sci Eng, 2021, 6: 86–101

    Article  Google Scholar 

  3. Dong Z Y, Tang C Z, Wang J C, et al. Optimistic transaction processing in deterministic database. J Comput Sci Technol, 2020, 35: 382–394

    Article  Google Scholar 

  4. Zhou X, Chai C, Li G, et al. Database meets artificial intelligence: a survey. IEEE Trans Knowl Data Eng, 2022, 34: 1096–1116

    Article  Google Scholar 

  5. Cooper B F, Silberstein A, Tam E, et al. Benchmarking cloud serving systems with YCSB. In: Proceedings of ACM Symposium on Cloud Computing, 2010. 143–154

  6. Alomari M, Cahill M J, Fekete A D, et al. The cost of serializability on platforms that use snapshot isolation. In: Proceedings of IEEE International Conference on Data Engineering, 2008. 576–585

  7. Leis V, Gubichev A, Mirchev A, et al. How good are query optimizers, really? In: Proceedings of the VLDB Endowment, 2015. 204–215

  8. Ma M, Yin Z, Zhang S, et al. Diagnosing root causes of intermittent slow queries in large-scale cloud databases. In: Proceedings of the VLDB Endowment, 2020. 1176–1189

  9. Mozafari B, Curino C, Jindal A, et al. Performance and resource modeling in highly-concurrent OLTP workloads. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2013. 301–312

  10. Pavlo A, Angulo G, Arulraj J, et al. Self-driving database management systems. In: Proceedings of Conference on Innovative Data Systems Research, 2017

  11. Ma L, Zhang W, Jiao J, et al. MB2: decomposed behavior modeling for self-driving database management systems. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2021. 1248–1261

  12. Ganapathi A, Kuno H A, Dayal U, et al. Predicting multiple metrics for queries: better decisions enabled by machine learning. In: Proceedings of IEEE International Conference on Data Engineering, 2009. 592–603

  13. Akdere M, Çetintemel U, Riondato M, et al. Learning-based query performance modeling and prediction. In: Proceedings of IEEE International Conference on Data Engineering, 2012. 390–401

  14. Wu W, Chi Y, Zhu S, et al. Predicting query execution time: are optimizer cost models really unusable? In: Proceedings of IEEE International Conference on Data Engineering, 2013. 1081–1092

  15. Marcus R C, Papaemmanouil O. Plan-structured deep neural network models for query performance prediction. In: Proceedings of the VLDB Endowment, 2019. 1733–1746

  16. Zhou X, Sun J, Li G, et al. Query performance prediction for concurrent queries using graph embedding. In: Proceedings of the VLDB Endowment, 2020. 1416–1428

  17. Bach F R, Jordan M I. Kernel independent component analysis. J Mach Learn Res, 2002, 3: 1–48

    MathSciNet  Google Scholar 

  18. Shawe-Taylor J, Cristianini N. Kernel Methods for Pattern Analysis. Cambridge: Cambridge University Press, 2004

    Book  Google Scholar 

  19. Sun J, Li G. An end-to-end learning-based cost estimator. In: Proceedings of the VLDB Endowment, 2019. 307–319

  20. Mozafari B, Curino C, Madden S. DBSeer: resource and performance prediction for building a next generation database cloud. In: Proceedings of Conference on Innovative Data Systems Research, 2013

  21. Yoon D Y, Mozafari B, Brown D P. DBSeer: pain-free database administration through workload intelligence. In: Proceedings of the VLDB Endowment, 2015. 2036–2039

  22. Ester M, Kriegel H, Sander J, et al. A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of International Conference on Knowledge Discovery and Data Mining, 1996. 226–231

  23. Thomasian A. On a more realistic lock contention model and its analysis. In: Proceedings of IEEE International Conference on Data Engineering, 1994. 2–9

  24. Scarselli F, Gori M, Tsoi A C, et al. The graph neural network model. IEEE Trans Neural Netw, 2009, 20: 61–80

    Article  Google Scholar 

  25. Yoon D Y, Niu N, Mozafari B. DBSherlock: a performance diagnostic tool for transactional databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2016. 1599–1614

  26. Liu P, Zhang S, Sun Y, et al. FluxInfer: automatic diagnosis of performance anomaly for online database system. In: Proceedings of IEEE International Performance Computing and Communications Conference, 2020. 1–8

  27. Samariya D, Ma J. A new dimensionality-unbiased score for efficient and effective outlying aspect mining. Data Sci Eng, 2022, 7: 120–135

    Article  Google Scholar 

  28. Dundjerski D, Tomasevic M. Automatic database troubleshooting of azure SQL databases. IEEE Trans Cloud Comput, 2022, 10: 1604–1619

    Article  Google Scholar 

  29. Nagaraj K, Killian C E, Neville J. Structured comparative analysis of systems logs to diagnose performance problems. In: Proceedings of USENIX Symposium on Networked Systems Design and Implementation, 2012. 353–366

  30. Glasbergen B, Abebe M, Daudjee K, et al. Sentinel: universal analysis and insight for data systems. In: Proceedings of the VLDB Endowment, 2020. 2720–2733

  31. Dias K, Ramacher M, Shaft U, et al. Automatic performance diagnosis and tuning in oracle. In: Proceedings of Conference on Innovative Data Systems Research, 2005. 84–94

  32. Kalmegh P, Babu S, Roy S. Analyzing query performance and attributing blame for contentions in a cluster computing framework. 2017. ArXiv:1708.08435

  33. Mogul J C, Wilkes J. Nines are not enough: meaningful metrics for clouds. In: Proceedings of ACM Workshop on Hot Topics in Operating Systems, 2019. 136–141

  34. Cao W, Gao Y, Lin B, et al. TcpRT: instrument and diagnostic analysis system for service quality of cloud databases at massive scale in real-time. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2018. 615–627

  35. Pettitt A N. A non-parametric approach to the change-point problem. Appl Stat, 1979, 28: 126–135

    Article  Google Scholar 

  36. Agrawal R, Imielinski T, Swami A N. Mining association rules between sets of items in large databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 1993. 207–216

  37. Kim M, Sumbaly R, Shah S. Root cause detection in a service-oriented architecture. In: Proceedings of ACM SIGMETRICS Performance Evaluation Review, 2013. 93–104

  38. Bentley J L. Multidimensional binary search trees used for associative searching. Commun ACM, 1975, 18: 509–517

    Article  Google Scholar 

  39. Kim B, Rudin C, Shah J A. The Bayesian case model: a generative approach for case-based reasoning and prototype classification. In: Proceedings of Conference and Workshop on Neural Information Processing Systems, 2014. 1952–1960

  40. Xing W, Ghorbani A A. Weighted PageRank algorithm. In: Proceedings of IEEE Conference on Communication Networks and Services Research, 2004. 305–314

  41. Neapolitan R E, et al. Learning Bayesian Networks. Upper Saddle River: Pearson Prentice Hall, 2004

    Google Scholar 

  42. Bernstein P A, Cseri I, Dani N, et al. Adapting Microsoft SQL server for cloud computing. In: Proceedings of IEEE International Conference on Data Engineering, 2011. 1255–1263

  43. Han J, Jia T, Wu Y, et al. Feedback-aware anomaly detection through logs aware anomaly detection through logs for large for large-scale software systems scale software systems. ZTE commun, 2021, 19: 88–94

    Google Scholar 

  44. Heckerman D, Chickering D M, Meek C, et al. Dependency networks for inference, collaborative filtering, and data visualization. J Mach Learn Res, 2000, 1: 49–75

    Google Scholar 

  45. Pele O, Werman M. Fast and robust earth mover’s distances. In: Proceedings of IEEE International Conference on Computer Vision, 2009. 460–467

  46. Kalmegh P, Babu S, Roy S. iQCAR: inter-query contention analyzer for data analytics frameworks. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2019. 918–935

  47. Storm A J, Garcia-Arellano C, Lightstone S, et al. Adaptive self-tuning memory in DB2. In: Proceedings of the VLDB Endowment, 2006. 1081–1092

  48. Zhu Y, Liu J, Guo M, et al. BestConfig: tapping the performance potential of systems via automatic configuration tuning. In: Proceedings of ACM Symposium on Cloud Computing, 2017. 338–350

  49. Duan S, Thummala V, Babu S. Tuning database configuration parameters with iTunes. In: Proceedings of the VLDB Endowment, 2009. 1246–1257

  50. Aken D V, Pavlo A, Gordon G J, et al. Automatic database management system tuning through large-scale machine learning. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2017. 1009–1024

  51. Fekry A, Carata L, Pasquier T F J, et al. To tune or not to tune? In search of optimal configurations for data analytics. In: Proceedings of ACM KDD Conference on Knowledge Discovery & Data Mining, 2020. 2494–2504

  52. Kunjir M, Babu S. Black or white? How to develop an autotuner for memory-based analytics. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2020. 1667–1683

  53. Zhang X, Wu H, Chang Z, et al. ResTune: resource oriented tuning boosted by meta-learning for cloud databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2021. 2102–2114

  54. Zhang J, Liu Y, Zhou K, et al. An end-to-end automatic cloud database tuning system using deep reinforcement learning. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2019. 415–432

  55. Li G, Zhou X, Li S, et al. Qtune: a query-aware database tuning system with deep reinforcement learning. In: Proceedings of the VLDB Endowment, 2019. 2118–2130

  56. Whang K. Index selection in relational databases. In: Proceedings of Foundations of Data Organization, 1985. 487–500

  57. Chaudhuri S, Narasayya V R. An efficient cost-driven index selection tool for Microsoft SQL server. In: Proceedings of the VLDB Endowment, 1997. 146–155

  58. Chaudhuri S, Narasayya V. Anytime algorithm of database tuning advisor for Microsoft SQL server. 2020. https://www.microsoft.com/en-us/research/publication/anytime-algorithm-of-database-tuning-advisor-for-microsoft-sql-server/

  59. Valentin G, Zuliani M, Zilio D C, et al. DB2 advisor: an optimizer smart enough to recommend its own indexes. In: Proceedings of IEEE International Conference on Data Engineering, 2000. 101–110

  60. Bruno N, Chaudhuri S. Automatic physical database tuning: a relaxation-based approach. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2005. 227–238

  61. Dash D, Polyzotis N, Ailamaki A. CoPhy: a scalable, portable, and interactive index advisor for large workloads. In: Proceedings of the VLDB Endowment, 2011. 362–372

  62. Schlosser R, Kossmann J, Boissier M. Efficient scalable multi-attribute index selection using recursive strategies. In: Proceedings of IEEE International Conference on Data Engineering, 2019. 1238–1249

  63. Basu D, Lin Q, Chen W, et al. Regularized cost-model oblivious database tuning with reinforcement learning. In: Transactions on Large-Scale Data- and Knowledge-Centered Systems XXVIII, 2016. 28: 96–132

  64. Sadri Z, Gruenwald L, Leal E. DRLindex: deep reinforcement learning index advisor for a cluster database. In: Proceedings of ACM Symposium on International Database Engineering & Applications, 2020. 1–8

  65. Sadri Z, Gruenwald L, Leal E. Online index selection using deep reinforcement learning for a cluster database. In: Proceedings of IEEE International Conference on Data Engineering Workshops, 2020. 158–161

  66. Sharma A, Schuhknecht F M, Dittrich J. The case for automatic database administration using deep reinforcement learning. 2018. ArXiv:1801.05643

  67. Lan H, Bao Z, Peng Y. An index advisor using deep reinforcement learning. In: Proceedings of International Conference on Information and Knowledge Management, 2020. 2105–2108

  68. Ding B, Das S, Marcus R, et al. AI meets AI: leveraging query executions to improve index recommendations. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2019. 1241–1258

  69. Dökeroglu T, Bayir M A, Cosar A. Robust heuristic algorithms for exploiting the common tasks of relational cloud database queries. Appl Soft Computing, 2015, 30: 72–82

    Article  Google Scholar 

  70. Zilio D C, Zuzarte C, Lightstone S, et al. Recommending materialized views and indexes with the IBM DB2 design advisor. In: Proceedings of International Conference on Autonomic Computing, 2004. 180–187

  71. Jindal A, Karanasos K, Rao S, et al. Selecting subexpressions to materialize at datacenter scale. In: Proceedings of the VLDB Endowment, 2018. 800–812

  72. Jindal A, Qiao S, Patel H, et al. Computation reuse in analytics job service at Microsoft. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2018. 191–203

  73. Yuan H, Li G, Feng L, et al. Automatic view generation with deep learning and reinforcement learning. In: Proceedings of IEEE International Conference on Data Engineering, 2020. 1501–1512

  74. Liang X, Elmore A J, Krishnan S. Opportunistic view materialization with deep reinforcement learning. 2019. ArXiv:1903.01363

  75. Serafini M, Mansour E, Aboulnaga A, et al. Accordion: elastic scalability for database systems supporting distributed transactions. In: Proceedings of the VLDB Endowment, 2014. 1035–1046

  76. Taft R, Mansour E, Serafini M, et al. E-Store: fine-grained elastic partitioning for distributed transaction processing systems. In: Proceedings of the VLDB Endowment, 2014. 245–256

  77. Serafini M, Taft R, Elmore A J, et al. Clay: fine-grained adaptive partitioning for general database schemas. In: Proceedings of the VLDB Endowment, 2016. 445–456

  78. Marcus R, Papaemmanouil O, Semenova S, et al. NashDB: an end-to-end economic method for elastic database fragmentation, replication, and provisioning. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2018. 1253–1267

  79. Taft R, El-Sayed N, Serafini M, et al. P-Store: an elastic database system with predictive provisioning. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2018. 205–219

  80. Das S, Nishimura S, Agrawal D, et al. Albatross: lightweight elasticity in shared storage databases for the cloud using live data migration. In: Proceedings of the VLDB Endowment, 2011. 494–505

  81. Elmore A J, Das S, Agrawal D, et al. Zephyr: live migration in shared nothing databases for elastic cloud platforms. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2011. 301–312

  82. Elmore A J, Arora V, Taft R, et al. Squall: fine-grained live reconfiguration for partitioned main memory databases. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2015. 299–313

  83. Lin Y, Pi S, Liao M, et al. MgCrab: transaction crabbing for live migration in deterministic database systems. In: Proceedings of the VLDB Endowment, 2019. 597–610

  84. Ding X, Chen L, Gao Y, et al. UlTraMan: a unified platform for big trajectory data management and analytics. In: Proceedings of the VLDB Endowment, 2018. 787–799

  85. Fang Z, Chen L, Gao Y, et al. Dragoon: a hybrid and efficient big trajectory management system for offline and online analytics. VLDB J, 2021, 30: 287–310

    Article  Google Scholar 

  86. Shao S, Qiu Z, Yu X, et al. Database-access performance antipatterns in database-backed web applications. In: Proceedings of IEEE International Conference on Software Maintenance and Evolution (ICSME), 2020. 58–69

  87. Khumnin P, Senivongse T. SQL antipatterns detection and database refactoring process. In: Proceedings of IEEE/ACIS International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing (SNPD), 2017. 199–205

  88. Dintyala P, Narechania A, Arulraj J. SQLCheck: automated detection and diagnosis of SQL anti-patterns. In: Proceedings of ACM SIGMOD International Conference on Management of Data, 2020. 2331–2345

  89. Ge J K, Chai Y F, Chai Y P. WATuning: a workload-aware tuning system with attention-based deep reinforcement learning. J Comput Sci Technol, 2021, 36: 741–761

    Article  Google Scholar 

  90. Sullivan D G, Seltzer M I, Pfeffer A. Using probabilistic reasoning to automate software tuning. In: Proceedings of the Joint International Conference on Measurement and Modeling of Computer Systems, 2004. 404–405

  91. Zhang X, Chang Z, Li Y, et al. Facilitating database tuning with hyper-parameter optimization: a comprehensive experimental evaluation. In: Proceedings of the VLDB Endowment, 2022. 1808–1821

  92. Tian W, Martin P, Powley W. Techniques for automatically sizing multiple buffer pools in DB2. In: Proceedings of Conference of the Centre for Advanced Studies on Collaborative Research, 2003. 294–302

  93. Narayanan D, Thereska E, Ailamaki A. Continuous resource monitoring for self-predicting DBMS. In: Proceedings of IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, 2005. 239–248

  94. Hutter F, Hoos H H, Leyton-Brown K. Sequential model-based optimization for general algorithm configuration. In: Proceedings of International Conference on Learning and Intelligent Optimization, 2011. 507–523

  95. McKay M D. Latin hypercube sampling as a tool in uncertainty analysis of computer models. In: Proceedings of Conference on Winter Simulation, 1992. 557–564

  96. Zaharia M, Chowdhury M, Das T, et al. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of USENIX Symposium on Networked Systems Design and Implementation, 2012. 15–28

  97. Li Y, Shen Y, Zhang W, et al. OpenBox: a generalized black-box optimization service. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021. 3209–3219

  98. Zhang X, Wu H, Li Y, et al. Towards dynamic and safe configuration tuning for cloud databases. In: Proceedings of International Conference on Management of Data, 2022

  99. Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning. In: Proceedings of the International Conference on Learning Representations, 2016

  100. Bellman R E. A Markov decision process. J Math Fluid Mech, 1957, 6: 679–684

    Google Scholar 

  101. Schnaitter K, Polyzotis N, Getoor L. Index interactions in physical design tuning: modeling, analysis, and applications. In: Proceedings of the VLDB Endowment, 2009. 1234–1245

  102. Kossmann J, Halfpap S, Jankrift M, et al. Magic mirror in my hand, which is the best in the land? An experimental evaluation of index selection algorithms. In: Proceedings of the VLDB Endowment, 2020. 2382–2395

  103. Puterman M L. Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: John Wiley & Sons, Inc., 1994

    Book  Google Scholar 

  104. Lagoudakis M G, Parr R. Least-squares policy iteration. J Mach Learn Res, 2003, 4: 1107–1149

    MathSciNet  Google Scholar 

  105. Mnih V, Kavukcuoglu K, Silver D, et al. Playing atari with deep reinforcement learning. 2013. ArXiv:1312.5602

  106. Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning. Nature, 2015, 518: 529–533

    Article  Google Scholar 

  107. Cosar A, Lim E, Srivastava J. Multiple query optimization with depth-first branch-and-bound and dynamic query ordering. In: Proceedings of International Conference on Information and Knowledge Management, 1993. 433–438

  108. Mitchell M, Holland J H, Forrest S. When will a genetic algorithm outperform hill climbing. In: Proceedings of Conference and Workshop on Neural Information Processing Systems, 1993. 51–58

  109. Lozano M, Herrera F, Krasnogor N, et al. Real-coded memetic algorithms with crossover hill-climbing. Evolary Computation, 2004, 12: 273–302

    Article  Google Scholar 

  110. Tao F, Feng Y, Zhang L, et al. CLPS-GA: a case library and Pareto solution-based hybrid genetic algorithm for energy-aware cloud service scheduling. Appl Soft Computing, 2014, 19: 264–279

    Article  Google Scholar 

  111. Martella C, Logothetis D, Loukas A, et al. Spinner: scalable graph partitioning in the cloud. In: Proceedings of IEEE International Conference on Data Engineering, 2017. 1083–1094

  112. Chaiken R, Jenkins B, Larson P, et al. SCOPE: easy and efficient parallel processing of massive data sets. In: Proceedings of the VLDB Endowment, 2008. 1265–1276

  113. Zhou J, Bruno N, Wu M C, et al. SCOPE: parallel databases meet MapReduce. VLDB J, 2012, 21: 611–636

    Article  Google Scholar 

  114. Ji Y, Chai Y, Zhou X, et al. Smart intra-query fault tolerance for massive parallel processing databases. Data Sci Eng, 2020, 5: 65–79

    Article  Google Scholar 

  115. Mehta M, DeWitt D J. Data placement in shared-nothing parallel database systems. VLDB J, 1997, 6: 53–72

    Article  Google Scholar 

  116. Chen G, He W, Liu J, et al. Energy-aware server provisioning and load dispatching for connection-intensive internet services. In: Proceedings of USENIX Symposium on Networked Systems Design and Implementation, 2012. 337–350

  117. Kallman R, Kimura H, Natkins J, et al. H-store: a high-performance, distributed main memory transaction processing system. In: Proceedings of the VLDB Endowment, 2008. 1496–1499

  118. Hao D, Luo S M, Zhang H S. A distributed in-memory database solution for mass data applications. ZTE Commun, 2020, 8: 45–48

    Google Scholar 

Download references

Acknowledgements

This work was supported by National Natural Science Foundation of China (Grant No. 61832001) and ZTE-PKU Joint Program (Grant No. HV-CN-20210407026).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Bin Cui.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Huang, S., Qin, Y., Zhang, X. et al. Survey on performance optimization for database systems. Sci. China Inf. Sci. 66, 121102 (2023). https://doi.org/10.1007/s11432-021-3578-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11432-021-3578-6

Keywords

Navigation