Abstract
Software engineering includes the runtime evaluation of a prototype through experiments with carefully selected sample inputs. When developing software intended to operate on relation instances of a given relational schema with a functional dependency, we are thus challenged to generate appropriate sample instances of increasing size. Moreover, studying the impact of varying the sizes of the attribute domains might be important. We consider uniformly distributed collections of sample instances to be appropriate. Based on a combinatorial analysis and exploiting an algorithm for restricted integer partitions, we develop a sophisticated probabilistic procedure of high computational complexity that generates any element of such a collection with equal probability. Moreover, we demonstrate that simpler approaches based on uniform local selections fail to achieve global uniformity.
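The claim that uniform local selections can miss global uniformity can be checked by brute force on a toy case. The sketch below is an illustration only and is not taken from the chapter: it assumes a hypothetical schema (A, B, C) with the functional dependency A -> B, tiny two-element domains, and a hypothetical naive procedure that first picks the set of (A, C) value pairs uniformly and then picks a B-value uniformly for each A-value that occurs. All function and variable names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb


def instances(dom_a, dom_b, dom_c, n):
    """Enumerate all n-tuple relation instances over (A, B, C) satisfying A -> B."""
    result = []
    # Tuples of a valid instance are distinct and B is fixed by A,
    # so an instance is determined by n distinct (A, C) pairs
    # plus one B-value for every A-value that occurs.
    for pairs in combinations(product(dom_a, dom_c), n):
        xs = sorted({x for x, _ in pairs})
        for ys in product(dom_b, repeat=len(xs)):
            y_of = dict(zip(xs, ys))
            result.append(frozenset((x, y_of[x], z) for x, z in pairs))
    return result


def naive_probability(instance, dom_a, dom_b, dom_c):
    """Exact probability of `instance` under the hypothetical naive procedure:
    uniform choice of the (A, C) pair set, then a uniform B-value per A-value."""
    n = len(instance)
    k = len({x for x, _, _ in instance})  # number of distinct A-values
    pair_sets = comb(len(dom_a) * len(dom_c), n)
    return Fraction(1, pair_sets) * Fraction(1, len(dom_b) ** k)


dom_a, dom_b, dom_c = (0, 1), (0, 1), (0, 1)
all_instances = instances(dom_a, dom_b, dom_c, 2)
probs = {naive_probability(r, dom_a, dom_b, dom_c) for r in all_instances}

print(len(all_instances))            # number of valid instances
print(Fraction(1, len(all_instances)))  # probability a uniform generator would give
print(sorted(probs))                 # probabilities the naive procedure gives
```

In this toy setting there are 20 valid instances, so a uniform generator would assign each one probability 1/20. The naive procedure instead assigns 1/12 to instances whose two tuples share an A-value and 1/24 to the others, because the number of distinct A-values changes how many B-choices are made. Every local step is uniform, yet the resulting distribution over instances is not.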
Acknowledgements
We would like to thank Sebastian Link, Bernhard Thalheim and Jan Van den Bussche for stimulating conversations about the problem.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Biskup, J., Preuß, M. (2020). Can We Probabilistically Generate Uniformly Distributed Relation Instances Efficiently? In: Darmont, J., Novikov, B., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2020. Lecture Notes in Computer Science, vol 12245. Springer, Cham. https://doi.org/10.1007/978-3-030-54832-2_8
DOI: https://doi.org/10.1007/978-3-030-54832-2_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-54831-5
Online ISBN: 978-3-030-54832-2
eBook Packages: Computer Science (R0)