Abstract
Window functions have been a part of the SQL standard since 2003 and have been well studied during the past decade. As the demand increases in analytics tools, window functions have seen an increasing amount of potential applications. Although the current mainstream commercial databases support window functions, the existing implementation strategies are inefficient for the real-time processing of big data. Recently, some algorithms based on sampling (e.g., online aggregation) have been proposed to deal with large and complex data in relational databases, which offer us a flexible tradeoff between accuracy and efficiency. However, sampling techniques have not been considered for window functions in databases. In this paper, we first propose two algorithms to deal with window functions based on two sampling techniques, Naive Random Sampling and Incremental Random Sampling. The proposed algorithms are highly efficient and are general enough to aggregate other existing algorithms of window functions. In particular, we evaluated our algorithms in the latest version of PostgreSQL, which demonstrated superior performance over the TPC-H benchmark.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Zuzarte, C., Pirahesh, H., Ma, W., Cheng, Q., Liu, L., Wong, V.: Winmagic: subquery elimination using window aggregation. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), pp. 652–656. ACM (2003)
Bellamkonda, S., Ahmed, R., Witkowski, A., Amor, A., Zait, M., Lin, C.-C.: Enhanced subquery optimizations in oracle. Proc. VLDB Endow. 2(2), 1366–1377 (2009)
Ben-Gan, I.: Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions. Microsoft Press (2012)
Cao, Y., Bramandia, R., Chan, C.Y., Tan, K.-L.: Optimized query evaluation using cooperative sorts. In: Proceedings of the 26th International Conference on Data Engineering, ICDE, Long Beach, California, USA, pp. 601–612. IEEE (2010)
Cao, Y., Chan, C.-Y., Li, J., Tan, K.-L.: Optimization of analytic window functions. Proc. VLDB Endow. 5(11), 1244–1255 (2012)
Cao, Y., Bramandia, R., Chan, C.-Y., Tan, K.-L.: Sort-sharing-aware query processing. VLDB J. 21(3), 411–436 (2012)
Neumann, T., Moerkotte, G.: A combined framework for grouping and order optimization. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), pp. 960–971. VLDB Endowment (2004)
Simmen, D., Shekita, E., Malkemus, T.: Fundamental techniques for order optimization. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD 1996), pp. 57–67. ACM (1996)
Wang, X.Y., Chernicak, M.: Avoiding sorting and grouping in processing queries. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), pp. 826–837. VLDB Endowment (2003)
Leis, V., Kan, K., Kemper, A., et al.: Efficient processing of window functions in analytical SQL queries. Proc. VLDB Endow. 8(10), 1058–1069 (2015)
Wesley, R., Xu, F.: Incremental computation of common windowed holistic aggregates. Proc. VLDB Endow. 9(12), 1221–1232 (2016)
Hellerstein, J.M., Haas, P.J., Wang, H.J.: Online aggregation. ACM SIGMOD Rec. 26(2), 171–182 (1997)
Xu, F., Jermaine, C.M., Dobra, A.: Confidence bounds for sampling-based group by estimates. ACM Trans. Database Syst. 33(3), 16 (2015)
Li, F., Wu, B., Yi, K., Join, W., et al.: Online aggregation via random walks. In: Proceedings of 35th ACM SIGMOD International Conference on Management of Data (SIGMOD 2016), pp. 615–629. ACM (2016)
Haas, P.J.: Large-sample and deterministic confidence intervals for online aggregation. In: Proceedings of International Conference on Scientific and Statistical Database Management, pp. 51–63. IEEE (1997)
Wu, S., Ooi, B.C., Tan, K.: Continuous sampling for online aggregation over multiple queries. In: SIGMOD, pp. 651–662. ACM (2010)
Olken, F.: Random sampling from databases. Ph.D. thesis, University of California at Berkeley (1993)
Wang, L., Christensen, R., Li, F., et al.: Spatial online sampling and aggregation. Proc. VLDB Endow. 9(3), 84–95 (2016)
Haas, P.J., Hellerstein, J.M.: Ripple joins fro online aggregation. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 287–298. ACM (1999)
Qin, C., Rusu, F.: PF-OLA: a high-performance framework for parallel online aggregation. Distrib. Parallel Databases 32(3), 337–375 (2014)
Window functions for postgreSQL desgin overview (2008). http://www.umitanuki.net/pgsql/wfv08/design.html
Bellamkonda, S., Bozkaya, T., Ghosh, B., Gupta, A., Haydu, J., Subramanian, S., Witkowski, A.: Analytic functions in oracle 8i. Technical report (2000)
Jin, R., Glimcher, L., Jermaine, C., et al.: New sampling-based estimators for OLAP queries. In: International Conference on Data Engineering, p. 18. IEEE (2016)
Joshi, S., Jermaine, C.M.: Sampling-based estimators for subset-based queries. VLDB J. 18(1), 181–202 (2009)
Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985)
Murgai, S.R.: Reference Use Statistics: Statistical Sampling Method Works (University of Tennessee at Chattanooga), p. 54. Southeastern Librarian (2006)
Adcock, B., Hansen, A.C.: Generalized sampling and infinite-dimensional compressed sensing. Found. Comput. Math. 16(5), 1263–1323 (2016)
Bengio, S., Vinyals, O., Jaitly, N., et al.: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Computer Science (2015)
Acknowledgement
This work was supported by NSFC grants (No. 61532021 and 61472141), Shanghai Knowledge Service Platform Project (No. ZF1213), Shanghai Leading Academic Discipline Project (Project Number: B412) and Shanghai Agriculture Applied Technology Development Program (Grant No. G20160201).
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Song, G., Qu, W., Wang, Y., Wang, X. (2017). Optimizing Window Aggregate Functions via Random Sampling. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science(), vol 10367. Springer, Cham. https://doi.org/10.1007/978-3-319-63564-4_19
Download citation
DOI: https://doi.org/10.1007/978-3-319-63564-4_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63563-7
Online ISBN: 978-3-319-63564-4
eBook Packages: Computer ScienceComputer Science (R0)