TPC-H benchmark (2017). http://www.tpc.org/tpch/
Agarwal, S., Milner, H., Kleiner, A., Talwalkar, A., Jordan, M.I., Madden, S., Mozafari, B., Stoica, I.: Knowing when you’re wrong: building fast and reliable approximate query processing systems. In: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, 22–27 June 2014, pp. 481–492 (2014). https://doi.org/10.1145/2588555.2593667
Agarwal, S., Mozafari, B., Panda, A., Milner, H., Madden, S., Stoica, I.: BlinkDB: queries with bounded errors and bounded response times on very large data. In: Eighth Eurosys Conference 2013, EuroSys ’13, Prague, Czech Republic, 14–17 April 2013, pp. 29–42 (2013). https://doi.org/10.1145/2465351.2465355
Alabi, D., Wu, E.: PFunk-H: approximate query processing using perceptual models. In: Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA@SIGMOD 2016, San Francisco, CA, USA, 26 June–01 July 2016, p. 10 (2016). https://doi.org/10.1145/2939502.2939512
Amaran, S., Sahinidis, N.V., Sharda, B., Bury, S.J.: Simulation optimization: A review of algorithms and applications. CoRR (2017). arXiv:1706.08591
Bhatia, R., Davis, C.: A better bound on the variance. Am. Math. Mon. 107(4), 353–357 (2000). http://www.jstor.org/stable/2589180
Box, G.E.P., Draper, N.R.: Empirical Model-Building and Response Surfaces. Wiley, New York (1986)
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
Casella, G., Berger, R.L.: Statistical Inference. Duxbury Advanced Series in Statistics and Decision Sciences. Thomson Learning, Pacific Grove (2002)
Chen, X.: A New Generalization of Chebyshev Inequality for Random Vectors. arXiv e-prints (2007)
Chung, F., Lu, L.: Concentration inequalities and martingale inequalities: a survey. Internet Math. 3(1), 79–127 (2006)
Cormode, G.: Data sketching. Commun. ACM 60(9), 48–55 (2017). https://doi.org/10.1145/3080008
DiCiccio, T.J., Efron, B.: Bootstrap confidence intervals. Stat. Sci. 11(3), 189–228 (1996). https://doi.org/10.1214/ss/1032280214
Ding, B., Huang, S., Chaudhuri, S., Chakrabarti, K., Wang, C.: Sample + seek: approximating aggregates with distribution precision guarantee. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–01 July 2016, pp. 679–694 (2016). https://doi.org/10.1145/2882903.2915249
Erlandson, E.: Faster random samples with gap sampling (2014). http://erikerlandson.github.io/blog/2014/09/11/faster-random-samples-with-gap-sampling/
Gryz, J., Guo, J., Liu, L., Zuzarte, C.: Query sampling in DB2 universal database. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004, pp. 839–843 (2004). https://doi.org/10.1145/1007568.1007664
Hall, P.: The Bootstrap and Edgeworth Expansion. Springer Series in Statistics. Springer, New York (1997)
Hellerstein, J.M., Haas, P.J., Wang, H.J.: Online aggregation. In: SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, 13–15 May 1997, Tucson, AZ, USA, pp. 171–182 (1997). https://doi.org/10.1145/253260.253291
Hoeffding, W.: A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19(3), 293–325 (1948). https://doi.org/10.1214/aoms/1177730196
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58(301), 13–30 (1963)
Huber, P.J.: Robust estimation of a location parameter. Ann. Math. Stat. 35(1), 73–101 (1964). https://doi.org/10.1214/aoms/1177703732
SnappyData Inc.: Sample selection (2017). https://snappydatainc.github.io/snappydata/sde/sample_selection/
Kerrisk, M.: The Linux Programming Interface. No Starch Press Series. No Starch Press, San Francisco (2010)
Kim, A., Blais, E., Parameswaran, A.G., Indyk, P., Madden, S., Rubinfeld, R.: Rapid sampling for visualizations with ordering guarantees. PVLDB 8(5), 521–532 (2015). http://www.vldb.org/pvldb/vol8/p521-kim.pdf
Kreyszig, E.: Introductory Functional Analysis with Applications. Wiley Classics Library. Wiley, New York (1989)
Krishnan, S., Wang, J., Franklin, M.J., Goldberg, K., Kraska, T.: PrivateClean: Data cleaning and differential privacy. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–01 July 2016, pp. 937–951 (2016). https://doi.org/10.1145/2882903.2915248
Li, F., Wu, B., Yi, K., Zhao, Z.: Wander join: Online aggregation via random walks. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–01 July 2016, pp. 615–629 (2016). https://doi.org/10.1145/2882903.2915235
Lohr, S.L.: Sampling: Design and Analysis. Advanced (Cengage Learning). Brooks/Cole, Boston (2009). https://books.google.com/books?id=aSXKXbyNlMQC
Mozafari, B.: Approximate query engines: Commercial challenges and research opportunities. In: Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, 14–19 May 2017, pp. 521–524 (2017). https://doi.org/10.1145/3035918.3056098
Mozafari, B., Niu, N.: A handbook for building an approximate query engine. IEEE Data Eng. Bull. 38(3), 3–29 (2015)
Mozafari, B., Ramnarayan, J., Menon, S., Mahajan, Y., Chakraborty, S., Bhanawat, H., Bachhav, K.: SnappyData: A unified cluster for streaming, transactions and interactive analytics. In: CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, 8–11 January 2017, Online Proceedings (2017). http://cidrdb.org/cidr2017/papers/p28-mozafari-cidr17.pdf
Nocedal, J., Wright, S.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering. Springer, New York (2006)
van der Vaart, A.W.: Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (2000)
Wang, J., Krishnan, S., Franklin, M.J., Goldberg, K., Kraska, T., Milo, T.: A sample-and-clean framework for fast and accurate query processing on dirty data. In: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, 22–27 June 2014, pp. 469–480 (2014). https://doi.org/10.1145/2588555.2610505
Wang, J., Lin, C., He, R., Chae, M., Papakonstantinou, Y., Swanson, S.: MILC: inverted list compression in memory. PVLDB 10(8), 853–864 (2017)
Wasserman, L.: All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics. Springer, New York (2004)
Wasserman, L.: All of Nonparametric Statistics. Springer Texts in Statistics. Springer, New York (2006)
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M.J., Shenker, S., Stoica, I.: Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In: Proceedings of the 9th USENIX Symposium on Networked Systems Design and Implementation, NSDI 2012, San Jose, CA, USA, 25–27 April 2012, pp. 15–28 (2012). https://www.usenix.org/conference/nsdi12/technical-sessions/presentation/zaharia
Zeng, K., Agarwal, S., Stoica, I.: iOLAP: Managing uncertainty for efficient incremental OLAP. In: Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, 26 June–01 July 2016, pp. 1347–1361 (2016). https://doi.org/10.1145/2882903.2915240
Zeng, K., Gao, S., Mozafari, B., Zaniolo, C.: The analytical bootstrap: a new method for fast error estimation in approximate query processing. In: International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, 22–27 June 2014, pp. 277–288 (2014). https://doi.org/10.1145/2588555.2588579