The VLDB Journal, Volume 20, Issue 4, pp 589–615

Interaction-aware scheduling of report-generation workloads

  • Mumtaz Ahmad
  • Ashraf Aboulnaga
  • Shivnath Babu
  • Kamesh Munagala
Regular Paper

Abstract

The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing database performance requires reasoning about query mixes rather than considering queries individually. Current database systems lack the ability to do such reasoning. We propose a new approach based on planning experiments and statistical modeling to capture the impact of query interactions. Our approach requires no prior assumptions about the internal workings of the database system or the nature and cause of query interactions, making it portable across systems. To demonstrate the potential of modeling and exploiting query interactions, we have developed a novel interaction-aware query scheduler for report-generation workloads. Our scheduler, called QShuffler, uses two query scheduling algorithms that leverage models of query interactions. The first algorithm is optimized for workloads where queries are submitted in large batches. The second algorithm targets workloads where queries arrive continuously, and scheduling decisions have to be made online. We report an experimental evaluation of QShuffler using TPC-H workloads running on IBM DB2. The evaluation shows that QShuffler, by modeling and exploiting query interactions, can consistently outperform (up to 4x) query schedulers in current database systems.

Keywords

Business intelligence, Report generation, Query interactions, Scheduling, Experiment-driven performance modeling, Workload management

Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Mumtaz Ahmad (1)
  • Ashraf Aboulnaga (1)
  • Shivnath Babu (2)
  • Kamesh Munagala (2)
  1. D.R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Canada
  2. Department of Computer Science, Duke University, Durham, USA
