Bounded Evaluation: Querying Big Data with Bounded Resources

  • Research Article
  • Open Access
  • Published: 04 July 2020
  • volume 17, pages 502–526 (2020)

International Journal of Automation and Computing
  • Yang Cao (1), ORCID: orcid.org/0000-0001-7984-3219,
  • Wen-Fei Fan (1,2,3), ORCID: orcid.org/0000-0001-5149-2656 &
  • Teng-Fei Yuan (1)

Abstract

This work aims to reduce queries on big data to computations on small data, and hence make querying big data possible under bounded resources. A query Q is boundedly evaluable if, when posed on any big dataset \({\cal D}\), there exists a fraction \({{\cal D}_Q}\) of \({\cal D}\) such that \(Q({\cal D}) = Q({{\cal D}_Q})\), and the cost of identifying \({{\cal D}_Q}\) is independent of the size of \({\cal D}\). It has been shown that with an auxiliary structure known as an access schema, many queries in relational algebra (RA) are boundedly evaluable under the set semantics of RA. This paper extends the theory of bounded evaluation to \({\rm RA}_{\rm aggr}\), i.e., RA extended with aggregation, under the bag semantics. (1) We extend access schema to bag access schema, to help us identify \({{\cal D}_Q}\) for \({\rm RA}_{\rm aggr}\) queries Q. (2) While it is undecidable to determine whether an \({\rm RA}_{\rm aggr}\) query is boundedly evaluable under a bag access schema, we identify special cases that are decidable and practical. (3) In addition, we develop an effective syntax for bounded \({\rm RA}_{\rm aggr}\) queries, i.e., a core subclass of boundedly evaluable \({\rm RA}_{\rm aggr}\) queries without sacrificing their expressive power. (4) Based on the effective syntax, we provide efficient algorithms to check the bounded evaluability of \({\rm RA}_{\rm aggr}\) queries and to generate query plans for bounded \({\rm RA}_{\rm aggr}\) queries. (5) As a proof of concept, we extend PostgreSQL to support bounded evaluation. We experimentally verify that the extended system improves performance by orders of magnitude.
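The condition \(Q({\cal D}) = Q({{\cal D}_Q})\) can be illustrated with a small Python sketch. It is not the paper's algorithm or its PostgreSQL extension. The relation `orders`, the bound `N`, and all function names are hypothetical, and the index is only a toy stand-in for an access constraint of the form "each customer has at most N order tuples, retrievable through an index". The sketch assumes the index is built once offline, so that answering a query afterwards touches at most N tuples.

```python
from collections import defaultdict

# Hypothetical relation orders(customer, item, price), kept as a bag (duplicates allowed).
ORDERS = [
    ("alice", "pen", 2),
    ("alice", "pen", 2),  # duplicate tuple, significant under bag semantics
    ("alice", "ink", 5),
    ("bob", "pad", 3),
] + [(f"user{i}", "misc", 1) for i in range(10_000)]

N = 3  # assumed bound: every customer has at most N order tuples


def build_index(rows, bound):
    """Map each customer to its bag of (item, price) and check the assumed bound."""
    index = defaultdict(list)
    for customer, item, price in rows:
        index[customer].append((item, price))
    if any(len(bag) > bound for bag in index.values()):
        raise ValueError("dataset violates the assumed cardinality bound")
    return dict(index)


def total_full_scan(rows, customer):
    """Q(D): SUM(price) for one customer, touching every tuple in the dataset."""
    return sum(price for c, _, price in rows if c == customer)


def total_bounded(index, customer):
    """Q(D_Q): the same aggregate over at most N tuples fetched through the index.

    Returns the sum together with the number of tuples accessed.
    """
    fetched = index.get(customer, [])
    return sum(price for _, price in fetched), len(fetched)


INDEX = build_index(ORDERS, N)
```

For the customer `"alice"`, `total_full_scan(ORDERS, "alice")` and `total_bounded(INDEX, "alice")[0]` return the same sum, while the bounded version accesses only three tuples no matter how many other customers the dataset holds. Collapsing the duplicate `("pen", 2)` tuple, as set semantics would, changes the sum, which is one reason aggregation calls for tracking multiplicities.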



Acknowledgements

The authors are supported in part by Royal Society Wolfson Research Merit Award WRM/R1/180014, ERC 652976, EPSRC EP/M025268/1, Shenzhen Institute of Computing Sciences, and Beijing Advanced Innovation Center for Big Data and Brain Computing.

Author information

Authors and Affiliations

  1. University of Edinburgh, Edinburgh, EH8 9AB, UK

    Yang Cao, Wen-Fei Fan & Teng-Fei Yuan

  2. Shenzhen Institute of Computing Sciences, Shenzhen University, Shenzhen, 518060, China

    Wen-Fei Fan

  3. Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, Beijing, 100191, China

    Wen-Fei Fan

Corresponding author

Correspondence to Yang Cao.

Additional information

Recommended by Editor-in-Chief Guo-Ping Liu

Yang Cao received the B.Sc. degree from Beihang University, China. He received the Ph.D. degree from University of Edinburgh, UK. He is a faculty member in the School of Informatics, University of Edinburgh, UK. He is the recipient of SIGMOD Research Highlight Award 2018, SIGMOD Best Paper Award 2017, and Microsoft Research Asia Fellowship. His research has been invited for publication in TODS special issues on “Best of SIGMOD 2017” and “Best of PODS 2016”, and in the Computer Journal special issue on “Best of BICOD 2015”.

His research interests include query processing, graph data management and distributed databases.

Wen-Fei Fan received the B.Sc. degree and M.Sc. degree from Peking University, China. He received the Ph.D. degree from University of Pennsylvania, USA. He is the Chair of Web Data Management at the University of Edinburgh, UK, the Chief Scientist of Shenzhen Institute of Computing Sciences, and the Chief Scientist of Beijing Advanced Innovation Center for Big Data and Brain Computing, China. He is a Fellow of the Royal Society (FRS), a Fellow of the Royal Society of Edinburgh (FRSE), a Member of the Academy of Europe (MAE), an ACM Fellow (FACM), and a Foreign Member of Chinese Academy of Sciences. He is a recipient of Royal Society Wolfson Research Merit Award in 2018, ERC Advanced Fellowship in 2015, the Roger Needham Award, UK in 2008, Yangtze River Scholar, China in 2007, the Outstanding Overseas Young Scholar Award, China in 2003, the Career Award, USA in 2001, and several Test-of-Time and Best Paper Awards USA (Alberto O. Mendelzon Test-of-Time Award of ACM PODS 2015 and 2010, Best Paper Awards for SIGMOD 2017, VLDB 2010, ICDE 2007 and Computer Networks 2002).

His research interests include database theory and systems, in particular big data, data quality, data sharing, distributed query processing, query languages, recommender systems and social media marketing.

Teng-Fei Yuan received the B.Eng. degree from Shandong University, China. He is a Ph.D. degree candidate in LFCS, School of Informatics, University of Edinburgh, UK.

His research interest is the development of BEAS, a system for bounded evaluation of SQL queries.

Rights and permissions

Open access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Cao, Y., Fan, WF. & Yuan, TF. Bounded Evaluation: Querying Big Data with Bounded Resources. Int. J. Autom. Comput. 17, 502–526 (2020). https://doi.org/10.1007/s11633-020-1236-1

  • Received: 21 February 2020

  • Accepted: 20 April 2020

  • Published: 04 July 2020

  • Issue Date: August 2020

  • DOI: https://doi.org/10.1007/s11633-020-1236-1

Keywords

  • Bounded evaluation
  • Resource-bounded query processing
  • Effective syntax
  • Access schema
  • Boundedness