BDMS Performance Evaluation: Practices, Pitfalls, and Possibilities

Carey, Michael J.

doi:10.1007/978-3-642-36727-4_8

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 7755)

Included in the following conference series:

Technology Conference on Performance Evaluation and Benchmarking

Abstract

Much of the IT world today is buzzing about Big Data, and we are witnessing the emergence of a new generation of data-oriented platforms aimed at storing and processing all of the anticipated Big Data. The current generation of Big Data Management Systems (BDMSs) can largely be divided into two kinds of platforms: systems for Big Data analytics, which today tend to be batch-oriented and based on MapReduce (e.g., Hadoop), and systems for Big Data storage and front-end request serving, which are usually based on key-value (a.k.a. NoSQL) stores. In this paper we ponder the problem of evaluating the performance of such systems. After taking a brief historical look at Big Data management and DBMS benchmarking, we begin by reviewing several key recent efforts to measure and compare the performance of BDMSs. Next we discuss a series of potential pitfalls that such evaluation efforts should watch out for, drawn mostly from the author's own experiences with past benchmarking efforts. Finally, we discuss some of the unmet needs and future possibilities for BDMS performance characterization and assessment.

References

Alsubaiee, S., Behm, A., Grover, R., Vernica, R., Borkar, V., Carey, M., Li, C.: ASTERIX: Scalable Warehouse-Style Web Data Integration. In: Proc. Int’l. Workshop on Information Integration on the Web (IIWeb), Phoenix, AZ (May 2012)
Arasu, A., Cherniack, M., Galvez, E., Maier, D., Maskey, A., Ryvkina, E., Stonebraker, M., Tibbetts, R.: Linear Road: A Stream Data Management Benchmark. In: Proc. VLDB Conf., Toronto, Canada (August 2004)
Apache GridMix, http://hadoop.apache.org/mapreduce/docs/current/gridmix.html
Apache Hadoop, http://hadoop.apache.org/
Apache Hive, https://cwiki.apache.org/confluence/display/Hive/Home
Apache Pig, http://pig.apache.org/
Apache PigMix, https://cwiki.apache.org/confluence/display/PIG/PigMix
ASTERIX Project, http://asterix.ics.uci.edu/
Behm, A., Borkar, V., Carey, M., Grover, R., Li, C., Onose, N., Vernica, R., Deutsch, A., Papakonstantinou, Y., Tsotras, V.: ASTERIX: Towards a Scalable, Semistructured Data Platform for Evolving-World Models. Distrib. Parallel Databases 29(3) (June 2011)
Borkar, V., Carey, M., Grover, R., Onose, N., Vernica, R.: Hyracks: A Flexible and Extensible Foundation for Data-Intensive Computing. In: Proc. IEEE ICDE Conf., Hanover, Germany (April 2011)
Borkar, V., Carey, M., Li, C.: Inside "Big Data Management": Ogres, Onions, or Parfaits? In: Proc. EDBT Conf., Berlin, Germany (March 2012)
Bu, Y., Borkar, V., Carey, M., Rosen, J., Polyzotis, N., Condie, T., Weimer, M., Ramakrishnan, R.: Scaling Datalog for Machine Learning on Big Data. arXiv:1203.0160v2 (cs.DB) (March 2012)
Carey, M., Muhanna, W.: The Performance of Multiversion Concurrency Control Algorithms. ACM Trans. on Comp. Sys. 4(4) (November 1986)
Carey, M., DeWitt, D., Naughton, J.: The OO7 Benchmark. In: Proc. ACM SIGMOD Conf., Washington, DC (May 1993)
Carey, M., DeWitt, D., Kant, C., Naughton, J.: A Status Report on the OO7 OODBMS Benchmarking Effort. In: Proc. ACM OOPSLA Conf., Portland, OR (October 1994)
Carey, M., DeWitt, D., Naughton, J., Asgarian, M., Brown, P., Gehrke, J., Shah, D.: The BUCKY Object-Relational Benchmark. In: Proc. ACM SIGMOD Conf., Tucson, AZ (May 1997)
Carey, M.J., Ling, L., Nicola, M., Shao, L.: EXRT: Towards a Simple Benchmark for XML Readiness Testing. In: Nambiar, R., Poess, M. (eds.) TPCTC 2010. LNCS, vol. 6417, pp. 93–109. Springer, Heidelberg (2011)
Cattell, R.: Scalable SQL and NoSQL Data Stores. ACM SIGMOD Rec. 39(4) (December 2010)
Chaiken, R., Jenkins, B., Larson, P., Ramsey, B., Shakib, D., Weaver, S., Zhou, J.: SCOPE: Easy and Efficient Parallel Processing of Massive Data Sets. Proc. VLDB Endow. 1(2) (August 2008)
Cooper, B., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking Cloud Serving Systems with YCSB. In: Proc. ACM Symp. on Cloud Computing, Indianapolis, IN (May 2010)
Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. In: Proc. OSDI Conf. (December 2004)
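DeWitt, D.: The Wisconsin Benchmark: Past, Present, and Future. In: Gray, J. (ed.) Benchmark Handbook for Database and Transaction Systems, 2nd edn. Morgan Kaufmann Publishers, San Francisco (1993)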
DeWitt, D., Gray, J.: Parallel Database Systems: The Future of High Performance Database Systems. Comm. ACM 35(6) (June 1992)
Gray, J.: Benchmark Handbook for Database and Transaction Systems, 2nd edn. Morgan Kaufmann Publishers, San Francisco (1993)
Grover, R., Carey, M.: Extending Map-Reduce for Efficient Predicate-Based Sampling. In: Proc. IEEE ICDE Conf., Washington, DC (April 2012)
Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.: GraphLab: A New Parallel Framework for Machine Learning. In: Proc. Conf. on Uncertainty in Artificial Intelligence (UAI), Catalina Island, CA (July 2010)
Malewicz, G., Austern, M., Bik, A., Dehnert, J., Horn, I., Leiser, N., Czajkowski, G.: Pregel: A System for Large-Scale Graph Processing. In: Proc. ACM SIGMOD Conf., Indianapolis, IN (May 2010)
Nicola, M., Kogan, I., Schiefer, B.: An XML Transaction Processing Benchmark. In: Proc. ACM SIGMOD Conf., Beijing, China (June 2007)
NSF Workshop on Big Data Benchmarking, http://clds.ucsd.edu/wbdb2012/
Pavlo, A., Paulson, E., Rasin, A., Abadi, D., DeWitt, D., Madden, S., Stonebraker, M.: A Comparison of Approaches to Large-Scale Data Analysis. In: Proc. ACM SIGMOD Conf., Providence, RI (June 2009)
Schmidt, A., Waas, F., Kersten, M., Carey, M., Manolescu, I., Busse, R.: XMark: A Benchmark for XML Data Management. In: Proc. VLDB Conf., Hong Kong, China (August 2002)
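Serlin, O.: The History of DebitCredit and the TPC. In: Gray, J. (ed.) Benchmark Handbook for Database and Transaction Systems, 2nd edn. Morgan Kaufmann Publishers, San Francisco (1993)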
Stonebraker, M., Brown, P., Poliakov, A., Raman, S.: The Architecture of SciDB. In: Proc. SSDBM Conf., Portland, OR (July 2011)

Author information

Authors and Affiliations

Information Systems Group, Computer Sciences Department, University of California, Irvine, Irvine, CA, 92697-3435, USA
Michael J. Carey

Editor information

Editors and Affiliations

Data Center Group, Cisco Systems, Inc., 3800 Zanker Road, 95134, San Jose, CA, USA
Raghunath Nambiar
Server Technologies, Oracle Corporation, 500 Oracle Parkway, 94065, Redwood Shores, CA, USA
Meikel Poess

About this paper

Cite this paper

Carey, M.J. (2013). BDMS Performance Evaluation: Practices, Pitfalls, and Possibilities. In: Nambiar, R., Poess, M. (eds) Selected Topics in Performance Evaluation and Benchmarking. TPCTC 2012. Lecture Notes in Computer Science, vol 7755. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36727-4_8

DOI: https://doi.org/10.1007/978-3-642-36727-4_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36726-7
Online ISBN: 978-3-642-36727-4
eBook Packages: Computer Science, Computer Science (R0)
