Skip to main content

Optimizing Window Aggregate Functions via Random Sampling

  • Conference paper
  • First Online:
Web and Big Data (APWeb-WAIM 2017)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 10367))

Abstract

Window functions have been a part of the SQL standard since 2003 and have been well studied during the past decade. As the demand increases in analytics tools, window functions have seen an increasing amount of potential applications. Although the current mainstream commercial databases support window functions, the existing implementation strategies are inefficient for the real-time processing of big data. Recently, some algorithms based on sampling (e.g., online aggregation) have been proposed to deal with large and complex data in relational databases, which offer us a flexible tradeoff between accuracy and efficiency. However, sampling techniques have not been considered for window functions in databases. In this paper, we first propose two algorithms to deal with window functions based on two sampling techniques, Naive Random Sampling and Incremental Random Sampling. The proposed algorithms are highly efficient and are general enough to aggregate other existing algorithms of window functions. In particular, we evaluated our algorithms in the latest version of PostgreSQL, which demonstrated superior performance over the TPC-H benchmark.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Zuzarte, C., Pirahesh, H., Ma, W., Cheng, Q., Liu, L., Wong, V.: Winmagic: subquery elimination using window aggregation. In: Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data (SIGMOD 2003), pp. 652–656. ACM (2003)

    Google Scholar 

  2. Bellamkonda, S., Ahmed, R., Witkowski, A., Amor, A., Zait, M., Lin, C.-C.: Enhanced subquery optimizations in oracle. Proc. VLDB Endow. 2(2), 1366–1377 (2009)

    Article  Google Scholar 

  3. Ben-Gan, I.: Microsoft SQL Server 2012 High-Performance T-SQL Using Window Functions. Microsoft Press (2012)

    Google Scholar 

  4. Cao, Y., Bramandia, R., Chan, C.Y., Tan, K.-L.: Optimized query evaluation using cooperative sorts. In: Proceedings of the 26th International Conference on Data Engineering, ICDE, Long Beach, California, USA, pp. 601–612. IEEE (2010)

    Google Scholar 

  5. Cao, Y., Chan, C.-Y., Li, J., Tan, K.-L.: Optimization of analytic window functions. Proc. VLDB Endow. 5(11), 1244–1255 (2012)

    Article  Google Scholar 

  6. Cao, Y., Bramandia, R., Chan, C.-Y., Tan, K.-L.: Sort-sharing-aware query processing. VLDB J. 21(3), 411–436 (2012)

    Article  Google Scholar 

  7. Neumann, T., Moerkotte, G.: A combined framework for grouping and order optimization. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB 2004), pp. 960–971. VLDB Endowment (2004)

    Google Scholar 

  8. Simmen, D., Shekita, E., Malkemus, T.: Fundamental techniques for order optimization. In: Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data (SIGMOD 1996), pp. 57–67. ACM (1996)

    Google Scholar 

  9. Wang, X.Y., Chernicak, M.: Avoiding sorting and grouping in processing queries. In: Proceedings of the 29th International Conference on Very Large Data Bases (VLDB 2003), pp. 826–837. VLDB Endowment (2003)

    Google Scholar 

  10. Leis, V., Kan, K., Kemper, A., et al.: Efficient processing of window functions in analytical SQL queries. Proc. VLDB Endow. 8(10), 1058–1069 (2015)

    Article  Google Scholar 

  11. Wesley, R., Xu, F.: Incremental computation of common windowed holistic aggregates. Proc. VLDB Endow. 9(12), 1221–1232 (2016)

    Article  Google Scholar 

  12. Hellerstein, J.M., Haas, P.J., Wang, H.J.: Online aggregation. ACM SIGMOD Rec. 26(2), 171–182 (1997)

    Article  Google Scholar 

  13. Xu, F., Jermaine, C.M., Dobra, A.: Confidence bounds for sampling-based group by estimates. ACM Trans. Database Syst. 33(3), 16 (2015)

    Google Scholar 

  14. Li, F., Wu, B., Yi, K., Join, W., et al.: Online aggregation via random walks. In: Proceedings of 35th ACM SIGMOD International Conference on Management of Data (SIGMOD 2016), pp. 615–629. ACM (2016)

    Google Scholar 

  15. Haas, P.J.: Large-sample and deterministic confidence intervals for online aggregation. In: Proceedings of International Conference on Scientific and Statistical Database Management, pp. 51–63. IEEE (1997)

    Google Scholar 

  16. Wu, S., Ooi, B.C., Tan, K.: Continuous sampling for online aggregation over multiple queries. In: SIGMOD, pp. 651–662. ACM (2010)

    Google Scholar 

  17. Olken, F.: Random sampling from databases. Ph.D. thesis, University of California at Berkeley (1993)

    Google Scholar 

  18. Wang, L., Christensen, R., Li, F., et al.: Spatial online sampling and aggregation. Proc. VLDB Endow. 9(3), 84–95 (2016)

    Article  Google Scholar 

  19. Haas, P.J., Hellerstein, J.M.: Ripple joins fro online aggregation. In: Proceedings of ACM SIGMOD International Conference on Management of Data, pp. 287–298. ACM (1999)

    Google Scholar 

  20. Qin, C., Rusu, F.: PF-OLA: a high-performance framework for parallel online aggregation. Distrib. Parallel Databases 32(3), 337–375 (2014)

    Article  Google Scholar 

  21. Window functions for postgreSQL desgin overview (2008). http://www.umitanuki.net/pgsql/wfv08/design.html

  22. Bellamkonda, S., Bozkaya, T., Ghosh, B., Gupta, A., Haydu, J., Subramanian, S., Witkowski, A.: Analytic functions in oracle 8i. Technical report (2000)

    Google Scholar 

  23. Jin, R., Glimcher, L., Jermaine, C., et al.: New sampling-based estimators for OLAP queries. In: International Conference on Data Engineering, p. 18. IEEE (2016)

    Google Scholar 

  24. Joshi, S., Jermaine, C.M.: Sampling-based estimators for subset-based queries. VLDB J. 18(1), 181–202 (2009)

    Article  Google Scholar 

  25. Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985)

    Article  MathSciNet  MATH  Google Scholar 

  26. Murgai, S.R.: Reference Use Statistics: Statistical Sampling Method Works (University of Tennessee at Chattanooga), p. 54. Southeastern Librarian (2006)

    Google Scholar 

  27. Adcock, B., Hansen, A.C.: Generalized sampling and infinite-dimensional compressed sensing. Found. Comput. Math. 16(5), 1263–1323 (2016)

    Article  MathSciNet  MATH  Google Scholar 

  28. Bengio, S., Vinyals, O., Jaitly, N., et al.: Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks. Computer Science (2015)

    Google Scholar 

Download references

Acknowledgement

This work was supported by NSFC grants (No. 61532021 and 61472141), Shanghai Knowledge Service Platform Project (No. ZF1213), Shanghai Leading Academic Discipline Project (Project Number: B412) and Shanghai Agriculture Applied Technology Development Program (Grant No. G20160201).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Xiaoling Wang .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Song, G., Qu, W., Wang, Y., Wang, X. (2017). Optimizing Window Aggregate Functions via Random Sampling. In: Chen, L., Jensen, C., Shahabi, C., Yang, X., Lian, X. (eds) Web and Big Data. APWeb-WAIM 2017. Lecture Notes in Computer Science(), vol 10367. Springer, Cham. https://doi.org/10.1007/978-3-319-63564-4_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-63564-4_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63563-7

  • Online ISBN: 978-3-319-63564-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics