Abstract
Data center operators face a bewildering set of choices when provisioning resources on machines with complex I/O subsystems. Modern I/O subsystems often combine fast, high-performing, but expensive SSDs with cheaper traditional hard disk drives that are relatively slow for random accesses. Data center operators need to determine how to provision the I/O resources for specific workloads so as to abide by existing service level agreements (SLAs) while minimizing the total operating cost (TOC) of running the workload, where the TOC includes the amortized hardware costs and the run-time energy costs. The focus of this paper is to introduce this new problem of TOC-based storage allocation, cast in a framework that is compatible with traditional DBMS query optimization and query processing architecture. We also present a heuristic-based solution to this problem, called DOT. We have implemented DOT in PostgreSQL, and experiments using TPC-H and TPC-C demonstrate significant TOC reduction by DOT in various settings.
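To make the TOC objective concrete, the following is a minimal Python sketch of TOC-based storage allocation solved by exhaustive search. It is not DOT, which the paper describes as heuristic; it is only a brute-force baseline illustrating what such a solution optimizes. Everything in it is an assumption for illustration: the hypothetical `StorageClass`, `toc`, and `cheapest_placement` helpers, the per-object I/O time estimates (a stand-in for an optimizer cost model), the additive runtime model, the electricity price, and all example numbers.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical electricity price (USD per kWh); not taken from the paper.
ENERGY_PRICE_PER_KWH = 0.10


@dataclass(frozen=True)
class StorageClass:
    hw_cost_per_hour: float  # purchase price amortized over the device lifespan (assumed)
    power_watts: float       # average power draw while the workload runs (assumed)


def toc(placement, io_hours, classes):
    """Return (runtime_hours, TOC_usd) for one run of the workload.

    placement: object name -> storage class name
    io_hours:  (object name, class name) -> estimated I/O hours, a stand-in
               for an optimizer cost model (assumed additive across objects)
    """
    runtime = sum(io_hours[(obj, cls)] for obj, cls in placement.items())
    used = [classes[name] for name in set(placement.values())]
    # Amortized hardware cost of every storage class the placement touches.
    hardware = sum(c.hw_cost_per_hour for c in used) * runtime
    # Run-time energy cost: kW * hours * price per kWh.
    energy = sum(c.power_watts for c in used) / 1000 * runtime * ENERGY_PRICE_PER_KWH
    return runtime, hardware + energy


def cheapest_placement(objects, classes, io_hours, sla_hours):
    """Try every placement; return (placement, TOC) or None if none meets the SLA."""
    best = None
    for choice in product(classes, repeat=len(objects)):
        placement = dict(zip(objects, choice))
        runtime, cost = toc(placement, io_hours, classes)
        if runtime <= sla_hours and (best is None or cost < best[1]):
            best = (placement, cost)
    return best


# Illustrative inputs only (hypothetical numbers).
classes = {"HDD": StorageClass(0.01, 10.0), "SSD": StorageClass(0.05, 5.0)}
io_hours = {
    ("lineitem", "HDD"): 2.0, ("lineitem", "SSD"): 0.5,
    ("orders", "HDD"): 1.0, ("orders", "SSD"): 0.2,
}
best = cheapest_placement(["lineitem", "orders"], classes, io_hours, sla_hours=2.0)
print(best)
```

Because the search enumerates |classes|^|objects| placements, it is only practical for a handful of objects, which suggests why a heuristic approach is attractive at realistic scale.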
Notes
A complementary problem is to pick the “right” server hardware from a range of options for a predefined workload. Our framework can be adapted to this problem; Sect. 5 explores this variant in full.
This problem was first studied in our PVLDB paper [32] and is expanded upon in this extended journal paper.
The queries in this subset include: Q1, Q3, Q4, Q6, Q12, Q13, Q14, Q17, Q18, Q19, Q22.
When the lineitem table is placed on the SSD RAID 0 device or on an even slower storage class (e.g., SSD), the query optimizer chooses the query plan in Table 7 to execute Query 17, which means that the lineitem table is accessed sequentially.
In Table 8, we list the costs of each component in the data center. We sum the costs of Networking Equipment, Power Distribution and Cooling, Other Infrastructure and Management, so the total cost is 294,943 + 626,211 + 137,461 + 105,927 = $1,164,542 per month for 46,000 servers. Assuming a 36-month hardware lifespan, the cost per server over those 36 months is \(1{,}164{,}542 \div 46{,}000 \times 36 \approx \$911\).
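The per-server arithmetic in this note can be re-checked with a few lines of Python. The variable names are ours; the only inputs are the figures quoted in the note.

```python
# The four monthly component costs (USD) summed in the note, for 46,000 servers.
monthly_costs = [294_943, 626_211, 137_461, 105_927]
SERVERS = 46_000
LIFESPAN_MONTHS = 36  # hardware lifespan used in the note

total_per_month = sum(monthly_costs)
# Monthly cost per server, multiplied by the lifespan in months.
per_server = total_per_month / SERVERS * LIFESPAN_MONTHS

print(f"total per month: ${total_per_month:,}")
print(f"per server over {LIFESPAN_MONTHS} months: ${per_server:,.2f}")
```

This prints a total of $1,164,542 per month and about $911.38 per server, which the note rounds to $911.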
References
Database Test Suite. http://osdldbt.sourceforge.net/
Oracle SPARC SuperCluster with T3-4 servers, TPC-C 5.11.0. Retrieved 19 May 2011. http://www.tpc.org/results/individual_results/Oracle/Oracle_SPARC_SuperCluster_with_T3-4s_TPC-C_ES_120210.pdf
Overall data center costs. http://perspectives.mvdirona.com/2010/09/18/OverallDataCenterCosts.aspx
SQL Azure service level agreement (SLA). Retrieved 27 October 2010. http://go.microsoft.com/fwlink/?LinkId=159706
Towards cost-effective storage provisioning for DBMSs: addendum, query templates and examples. http://pages.cs.wisc.edu/~nzhang/pubs/actual_queries.pdf
TPC-H homepage. http://www.tpc.org/tpch/
Agrawal, D., Ganesan, D., Sitaraman, R.K., Diao, Y., Singh, S.: Lazy-adaptive tree: An optimized index structure for flash devices. PVLDB 2(1), 361–372 (2009)
Agrawal, S., Chu, E., Narasayya, V.R.: Automatic physical design tuning: workload as a sequence. In: SIGMOD Conference, pp. 683–694 (2006)
Bobroff, N., Kochut, A., Beaty, K.A.: Dynamic placement of virtual machines for managing SLA violations. In: Integrated Network Management, pp. 119–128 (2007)
Bruno, N., Chaudhuri, S.: Automatic physical database tuning: a relaxation-based approach. In: SIGMOD Conference, pp. 227–238 (2005)
Bruno, N., Chaudhuri, S.: An online approach to physical design tuning. In: ICDE, pp. 826–835 (2007)
Canim, M., Bhattacharjee, B., Mihaila, G.A., Lang, C.A., Ross, K.A.: An object placement advisor for DB2 using solid state storage. PVLDB 2(2), 1318–1329 (2009)
Canim, M., Mihaila, G.A., Bhattacharjee, B., Ross, K.A., Lang, C.A.: SSD bufferpool extensions for database systems. PVLDB 3(2) (2010)
Chaisiri, S., Lee, B.-S., Niyato, D.: Optimal virtual machine placement across multiple cloud providers. In: APSCC, pp. 103–110 (2009)
Chaudhuri, S., Narasayya, V.R.: Self-tuning database systems: a decade of progress. In: VLDB, pp. 3–14 (2007)
Chi, Y., Moon, H.J., Hacigumus, H.: iCBS: incremental cost-based scheduling under piecewise linear SLAs. PVLDB (2011)
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: Amazon’s highly available key-value store. In: SOSP, pp. 205–220 (2007)
Graefe, G.: The five-minute rule twenty years later, and how flash memory changes the rules. In: DaMoN, p. 6 (2007)
Hamilton, J.R.: Cooperative expendable micro-slice servers (CEMS): low cost, low power servers for internet-scale services. In: CIDR (2009)
Hyser, C., McKee, B., Gardner, R., Watson, B.J.: Autonomic virtual machine placement in the data center. Technical Report HPL-2007-189, HP Laboratories (2008)
Koltsidas, I., Viglas, S.: Flashing up the storage layer. PVLDB 1(1), 514–525 (2008)
Lee, S.-W., Moon, B.: Design of flash-based DBMS: an in-page logging approach. In: SIGMOD Conference, pp. 55–66 (2007)
Lee, S.-W., Moon, B., Park, C., Kim, J.-M., Kim, S.-W.: A case for flash memory SSD in enterprise database applications. In: SIGMOD Conference, pp. 1075–1086 (2008)
Li, Y., He, B., Yang, J., Luo, Q., Yi, K.: Tree indexing on solid state drives. PVLDB 3(1), 1195–1206 (2010)
Ozmen, O., Salem, K., Schindler, J., Daniel, S.: Workload-aware storage layout for database systems. In: SIGMOD Conference, pp. 939–950 (2010)
Polte, M., Simsa, J., Gibson, G.: Enabling enterprise solid state disks performance. In: Workshop on Integrating Solid-state Memory into the Storage Hierarchy (2009)
Ross, K.A.: Modeling the performance of algorithms on flash memory devices. In: DaMoN, pp. 11–16 (2008)
Shah, M.A., Harizopoulos, S., Wiener, J.L., Graefe, G.: Fast scans and joins using flash drives. In: DaMoN, pp. 17–24 (2008)
Soror, A.A., Minhas, U.F., Aboulnaga, A., Salem, K., Kokosielis, P., Kamath, S.: Automatic virtual machine configuration for database workloads. In: SIGMOD Conference, pp. 953–966 (2008)
Tsirogiannis, D., Harizopoulos, S., Shah, M.A., Wiener, J.L., Graefe, G.: Query processing techniques for solid state drives. In: SIGMOD Conference, pp. 59–72 (2009)
Xiong, P., Chi, Y., Zhu, S., Moon, H.J., Pu, C., Hacigumus, H.: Intelligent management of virtualized resources for database management systems in cloud environment. In: ICDE (2011)
Zhang, N., Tatemura, J., Patel, J.M., Hacigümüs, H.: Towards cost-effective storage provisioning for DBMSs. PVLDB 5(4), 274–285 (2011)
Acknowledgments
We would like to thank the reviewers of this paper for their insightful feedback on an earlier draft of this paper. This work was supported in part by a gift donation from NEC and by the National Science Foundation under Grant IIS-0963993.
Cite this article
Zhang, N., Tatemura, J., Patel, J.M. et al. Toward cost-effective storage provisioning for DBMSs. The VLDB Journal 23, 329–354 (2014). https://doi.org/10.1007/s00778-013-0334-x