
Can We Probabilistically Generate Uniformly Distributed Relation Instances Efficiently?

  • Conference paper
Advances in Databases and Information Systems (ADBIS 2020)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 12245)

Abstract

Software engineering includes the runtime evaluation of a prototype by experiments with carefully selected sample inputs. For the development of software intended to operate on relation instances of a given relational schema with a functional dependency, we are then challenged to generate appropriate sample instances of increasing size. Moreover, studying the impact of varying the sizes of the attribute domains might be important. We regard uniformly distributed collections of sample instances as appropriate. Based on a combinatorial analysis and exploiting an algorithm for restricted integer partitions, we develop a sophisticated probabilistic procedure, of high computational complexity, for generating any element of such a collection with equal probability. We also demonstrate that simpler approaches based on uniform local selections fail to achieve global uniformity.
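The two claims of the abstract (uniform generation via counting over restricted integer partitions, and the failure of uniform local selection) can be illustrated on a toy case. The following Python sketch is our own illustration under assumed parameters, not the authors' procedure. It fixes a hypothetical schema R(A, B, C) with the single functional dependency A → B and domains of sizes dA, dB, dC. Grouping an instance by its A-values, the multiset of group sizes is a partition of the instance size n into at most dA parts, each of size at most dC, so instances can be counted partition by partition. The hypothetical function `sample_uniform` picks a partition with probability proportional to the number of instances it accounts for and then fills in values uniformly. The hypothetical function `local_distribution` computes, with exact fractions, the output distribution of a naive generator that repeatedly adds one uniformly chosen admissible tuple.

```python
# Toy illustration under assumed parameters, NOT the paper's procedure:
# schema R(A, B, C) with the single FD A -> B, domains range(dA), range(dB), range(dC).
import random
from collections import Counter
from fractions import Fraction
from itertools import combinations, product
from math import comb, factorial, prod


def satisfies_fd(instance):
    """True if no two tuples agree on A but differ on B."""
    b_of = {}
    for a, b, _ in instance:
        if b_of.setdefault(a, b) != b:
            return False
    return True


def all_instances(dA, dB, dC, n):
    """Brute force (tiny cases only): every FD-satisfying instance with n tuples."""
    tuples = list(product(range(dA), range(dB), range(dC)))
    return [frozenset(s) for s in combinations(tuples, n) if satisfies_fd(s)]


def partitions(n, k, max_part):
    """Partitions of n into exactly k parts, each in 1..max_part, non-increasing."""
    if k == 0:
        if n == 0:
            yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, k - 1, first):
            yield (first,) + rest


def weight(dA, dB, dC, parts):
    """Number of instances whose A-groups have the sizes in `parts`: choose the
    A-values, order the group sizes over them, one B-value and a C-subset per group."""
    k = len(parts)
    arrangements = factorial(k) // prod(factorial(m) for m in Counter(parts).values())
    return comb(dA, k) * dB**k * arrangements * prod(comb(dC, s) for s in parts)


def shapes(dA, dC, n):
    """All admissible group-size partitions for instances of size n."""
    return [p for k in range(min(dA, n) + 1) for p in partitions(n, k, dC)]


def count_instances(dA, dB, dC, n):
    return sum(weight(dA, dB, dC, p) for p in shapes(dA, dC, n))


def sample_uniform(dA, dB, dC, n, rng=random):
    """Draw one instance with probability 1 / count_instances(dA, dB, dC, n)."""
    options = shapes(dA, dC, n)
    parts = rng.choices(options, weights=[weight(dA, dB, dC, p) for p in options])[0]
    sizes = list(parts)
    rng.shuffle(sizes)  # uniform over the distinct orderings of the group sizes
    a_values = sorted(rng.sample(range(dA), len(sizes)))
    instance = set()
    for a, s in zip(a_values, sizes):
        b = rng.randrange(dB)  # one B-value per A-group, as A -> B requires
        instance.update((a, b, c) for c in rng.sample(range(dC), s))
    return frozenset(instance)


def local_distribution(dA, dB, dC, n):
    """Exact output distribution of a naive generator that repeatedly adds a
    tuple chosen uniformly among those that keep A -> B satisfied."""
    tuples = list(product(range(dA), range(dB), range(dC)))
    dist = {frozenset(): Fraction(1)}
    for _ in range(n):
        nxt = Counter()
        for inst, p in dist.items():
            cands = [t for t in tuples if t not in inst and satisfies_fd(inst | {t})]
            for t in cands:
                nxt[inst | {t}] += p / len(cands)
        dist = dict(nxt)
    return dist


if __name__ == "__main__":
    print(count_instances(3, 2, 2, 3), len(all_instances(3, 2, 2, 3)))  # 112 112
    print(sorted(set(local_distribution(3, 2, 2, 3).values())))
    print(sorted(sample_uniform(3, 2, 2, 3, random.Random(0))))
```

For dA = 3, dB = 2, dC = 2 and n = 3 there are 112 instances, so a uniform generator gives each one probability 1/112. The naive local generator instead gives instances with A-group sizes (1, 1, 1) probability 12/1296 and those with sizes (2, 1) probability 11/1296, although every individual step was uniform. The brute-force enumeration is included only to check the counting on tiny cases; it grows combinatorially with the domain sizes.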

Acknowledgements

We would like to thank Sebastian Link, Bernhard Thalheim and Jan Van den Bussche for stimulating conversations about the problem.

Author information

Correspondence to Joachim Biskup.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Biskup, J., Preuß, M. (2020). Can We Probabilistically Generate Uniformly Distributed Relation Instances Efficiently? In: Darmont, J., Novikov, B., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2020. Lecture Notes in Computer Science, vol 12245. Springer, Cham. https://doi.org/10.1007/978-3-030-54832-2_8

  • DOI: https://doi.org/10.1007/978-3-030-54832-2_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-54831-5

  • Online ISBN: 978-3-030-54832-2

  • eBook Packages: Computer Science, Computer Science (R0)
