A Data Generator for Cloud-Scale Benchmarking

Rabl, Tilmann; Frank, Michael; Sergieh, Hatem Mousselly; Kosch, Harald

doi:10.1007/978-3-642-18206-8_4

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 6417)

Included in the following conference series:

Technology Conference on Performance Evaluation and Benchmarking

Abstract

In many fields of research and business, data sizes are breaking the petabyte barrier. This poses new problems and opens new research possibilities for the database community. Data of this size is usually stored in large clusters or clouds. Although clouds have become very popular in recent years, there is little work on benchmarking cloud applications. In this paper we present a data generator for cloud-sized applications. Its architecture makes the data generator easy to extend and configure. A key feature is its high degree of parallelism, which allows linear scaling for arbitrary numbers of nodes. We show how distributions, relationships, and dependencies in data can be computed in parallel with linear speedup.
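
The abstract does not say how relationships and dependencies are computed in parallel. One way such a property can be obtained, sketched here purely as an assumption, is to seed a pseudo-random number generator from each value's position (table, row, column). Any node can then recompute any value, including the target of a reference, without communicating with other nodes. All names below (TABLE_SEEDS, field_rng, customer_row, order_row, generate_partition) and the two-table schema are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical per-table seeds and table size, chosen only for this sketch.
TABLE_SEEDS = {"customer": 1001, "orders": 2002}
NUM_CUSTOMERS = 1000


def field_rng(table, row_id, column):
    """Return a PRNG whose state depends only on (table, row, column)."""
    seed = (TABLE_SEEDS[table] * 1_000_003 + row_id) * 1_000_003 + column
    return random.Random(seed)


def customer_row(row_id):
    """Generate one customer row from its row id alone."""
    balance = round(field_rng("customer", row_id, 0).uniform(0, 10_000), 2)
    return (row_id, balance)


def order_row(row_id):
    """Generate one order row; the foreign key is computed, not looked up."""
    customer_id = field_rng("orders", row_id, 0).randrange(NUM_CUSTOMERS)
    amount = round(field_rng("orders", row_id, 1).gauss(100, 15), 2)
    return (row_id, customer_id, amount)


def generate_partition(row_fn, total_rows, node, num_nodes):
    """Generate the contiguous slice of rows assigned to one node."""
    start = total_rows * node // num_nodes
    end = total_rows * (node + 1) // num_nodes
    return [row_fn(i) for i in range(start, end)]


if __name__ == "__main__":
    # Each "node" works independently; concatenating the partitions
    # yields the same table as a single sequential run.
    sequential = [order_row(i) for i in range(100)]
    parallel = []
    for node in range(4):
        parallel.extend(generate_partition(order_row, 100, node, 4))
    print(parallel == sequential)
```

Because no row depends on shared state, adding nodes only shrinks each node's slice, which is the kind of design that would permit the linear scaling the abstract claims. A node holding an order can also call customer_row on the referenced id to recompute that customer's attributes locally.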

Author information

Authors and Affiliations

Chair of Distributed Information Systems, University of Passau, Germany
Tilmann Rabl, Michael Frank, Hatem Mousselly Sergieh & Harald Kosch

Editor information

Editors and Affiliations

Access and Virtualization Business Unit, Cisco Systems, Inc., 3800 Zankar Road, 95134, San Jose, CA, USA
Raghunath Nambiar
Server Technologies, Oracle Corporation, 500 Oracle Parkway, 94065, Redwood Shores, CA, USA
Meikel Poess

About this paper

Cite this paper

Rabl, T., Frank, M., Sergieh, H.M., Kosch, H. (2011). A Data Generator for Cloud-Scale Benchmarking. In: Nambiar, R., Poess, M. (eds) Performance Evaluation, Measurement and Characterization of Complex Systems. TPCTC 2010. Lecture Notes in Computer Science, vol 6417. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-18206-8_4

DOI: https://doi.org/10.1007/978-3-642-18206-8_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-18205-1
Online ISBN: 978-3-642-18206-8
eBook Packages: Computer Science, Computer Science (R0)
