DASFAA 2017: Database Systems for Advanced Applications, pp. 361-375
Query Optimization on Hybrid Storage
Abstract
Thanks to the rapid growth of memory capacity, it is now feasible to perform query processing entirely in memory. Nevertheless, because main memory is substantially more expensive than most secondary storage devices, including HDDs and SSDs, it is not suitable for storing cold data. Hybrid data storage, composed of both memory and secondary storage, is therefore expected to remain popular for the foreseeable future. In this paper, we introduce a query optimization model for hybrid data storage. Unlike traditional query processors, which treat either main memory as a cache or secondary storage as an anti-cache, our model partitions data semantically between memory and secondary storage. Query optimization can thus take the partitioning of data into account to achieve better performance. We conducted an experimental evaluation on a columnar query engine to demonstrate the advantage of the proposed approach.
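To make the idea of partition-aware optimization concrete, the following is a minimal sketch and not the model proposed in the paper. It assumes that each partition is described by a value range on a single attribute and is placed on one storage tier. The names `Partition`, `COST_PER_ROW`, and `plan_range_scan`, as well as the cost figures, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    name: str
    lo: int    # inclusive lower bound of the partitioning attribute
    hi: int    # exclusive upper bound of the partitioning attribute
    tier: str  # "memory" or "disk"
    rows: int  # number of rows stored in this partition


# Hypothetical per-row scan costs; real values depend on the hardware.
COST_PER_ROW = {"memory": 1.0, "disk": 50.0}


def plan_range_scan(partitions, q_lo, q_hi):
    """Pick the partitions overlapping [q_lo, q_hi) and estimate scan cost.

    Partitions whose range cannot satisfy the predicate are pruned, so a
    query on hot data never touches secondary storage.
    """
    touched = [p for p in partitions if p.lo < q_hi and q_lo < p.hi]
    cost = sum(p.rows * COST_PER_ROW[p.tier] for p in touched)
    return touched, cost


# Assumed layout: recent (hot) data in memory, older (cold) data on disk.
partitions = [
    Partition("hot_2016", 2016, 2017, "memory", 1000),
    Partition("cold_2010_2015", 2010, 2016, "disk", 5000),
]

hot_plan, hot_cost = plan_range_scan(partitions, 2016, 2017)
mixed_plan, mixed_cost = plan_range_scan(partitions, 2015, 2017)
```

In this sketch the first query is answered from the in-memory partition alone, while the second also has to scan the disk partition and is estimated to be far more expensive. An optimizer that knows the partition predicates can make this distinction before execution.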
Notes
Acknowledgement
This work is partially supported by the Chinese National High-tech R&D Program (863 Program) (2015AA015307) and the NSFC Project (No. 61272138).