A Benchmark for Multidimensional Statistical Data

  • Philipp Baumgärtel
  • Gregor Endler
  • Richard Lenz
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8133)

Abstract

ProHTA (Prospective Health Technology Assessment) is a simulation project that aims to estimate the outcome of new medical innovations at an early stage. To this end, hybrid and modular simulations are employed. For this large-scale simulation project, efficient management of multidimensional statistical data is important. Therefore, we propose a benchmark to evaluate query processing of this kind of data in relational and non-relational databases. We compare our benchmark with existing approaches and point out differences. This paper presents a mapping of such data to a flexible relational model, JSON documents, and RDF. The queries defined for our benchmark are mapped to SQL, SPARQL, the MongoDB query language, and MapReduce. Using our benchmark, we evaluate these different systems and discuss the differences between them.
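
To make the idea of mapping one multidimensional fact to several data models more concrete, the following is a minimal sketch only. It is not the schema, data, or query set from the paper: the table names `fact` and `fact_dim`, the dimensions `year`, `region`, and `sex`, and all values are hypothetical. It stores the same facts once in a generic two-table relational layout and once as JSON-style documents, then runs the same aggregation over both.

```python
import json
import sqlite3

# Hypothetical facts: a numeric value qualified by dimension/member pairs.
facts = [
    {"value": 120.0, "dims": {"year": "2010", "region": "north", "sex": "f"}},
    {"value": 80.0, "dims": {"year": "2010", "region": "south", "sex": "f"}},
    {"value": 95.0, "dims": {"year": "2011", "region": "north", "sex": "m"}},
]


def load_relational(conn, facts):
    """Store facts in a generic layout: one row per fact, one row per dimension."""
    conn.execute("CREATE TABLE fact (id INTEGER PRIMARY KEY, value REAL)")
    conn.execute("CREATE TABLE fact_dim (fact_id INTEGER, dim TEXT, member TEXT)")
    for fact_id, fact in enumerate(facts, start=1):
        conn.execute("INSERT INTO fact VALUES (?, ?)", (fact_id, fact["value"]))
        conn.executemany(
            "INSERT INTO fact_dim VALUES (?, ?, ?)",
            [(fact_id, dim, member) for dim, member in fact["dims"].items()],
        )


def sum_by_dimension_sql(conn, dim):
    """Aggregate values per member of one dimension using SQL."""
    rows = conn.execute(
        "SELECT d.member, SUM(f.value) "
        "FROM fact f JOIN fact_dim d ON d.fact_id = f.id "
        "WHERE d.dim = ? GROUP BY d.member ORDER BY d.member",
        (dim,),
    ).fetchall()
    return dict(rows)


def sum_by_dimension_docs(docs, dim):
    """Same aggregation over JSON-style documents, in plain Python."""
    totals = {}
    for doc in docs:
        member = doc["dims"][dim]
        totals[member] = totals.get(member, 0.0) + doc["value"]
    return totals


conn = sqlite3.connect(":memory:")
load_relational(conn, facts)

# Round-trip through JSON text to mimic documents held in a document store.
docs = [json.loads(json.dumps(fact)) for fact in facts]

by_year_sql = sum_by_dimension_sql(conn, "year")
by_year_docs = sum_by_dimension_docs(docs, "year")
print(by_year_sql)
print(by_year_docs)
```

Both representations yield the same totals per year. The relational variant needs a join per dimension involved in the query, whereas the document variant reads each dimension directly from the nested object; a benchmark of the kind described above would be one way to measure what such differences cost in practice.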

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Philipp Baumgärtel (1)
  • Gregor Endler (1)
  • Richard Lenz (1)
  1. Institute of Computer Science 6, University of Erlangen-Nuremberg, Germany
