Abstract
Parallel computing in R has been widely used to analyse microarray data, and existing applications combine a variety of data distribution and calculation approaches. Newer data storage systems, such as MySQL Cluster and HBase, have been proposed for R data storage, while parallel computation frameworks, including MPI and MapReduce, have been applied to R computation. As a result, it is difficult to understand complete analysis workflows and to judge which tool kits suit a specific environment. In this paper we propose DSIMBench, a benchmark containing two classic microarray analysis functions with eight different parallel R workflows, and evaluate the benchmark on the IC Cloud testbed platform.
Acknowledgment
This research was partially supported by the Innovative R&D Team Support Program of Guangdong Province (No. 201001D0104726115), China; Johnson & Johnson Pharmaceutical and Research Comp; and the Innovative Medicines Initiative (IMI), EU Grant Code 115446.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Wang, S. et al. (2014). DSIMBench: A Benchmark for Microarray Data Using R. In: Zhan, J., Han, R., Weng, C. (eds) Big Data Benchmarks, Performance Optimization, and Emerging Hardware. BPOE 2014. Lecture Notes in Computer Science, vol 8807. Springer, Cham. https://doi.org/10.1007/978-3-319-13021-7_4
DOI: https://doi.org/10.1007/978-3-319-13021-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13020-0
Online ISBN: 978-3-319-13021-7
eBook Packages: Computer Science (R0)