Parallelizing Biochemical Stochastic Simulations: A Comparison of GPUs and Intel Xeon Phi Processors
Abstract
Stochastic simulations of biochemical reaction networks can be computationally expensive on Central Processing Units (CPUs), especially when a large number of simulations is required to compute the distribution of the system states or to carry out advanced model analyses. However, since all simulations are independent, parallel architectures can be exploited to reduce the overall running time. The purpose of this work is to compare the computational performance of CPUs, general-purpose Graphics Processing Units (GPUs) and Intel Xeon Phi coprocessors based on the Many Integrated Core (MIC) architecture, for the execution of Gillespie's Stochastic Simulation Algorithm (SSA). To this aim, we consider an ad hoc implementation of SSA on GPUs, while on MICs we exploit their distinctive capability of reusing existing CPU source code. We measure the running time needed to execute several batches of simulations, for various biochemical models of increasing size. Our results show that GPUs outperform the other architectures in all tested cases, and that simply reusing available CPU code on MICs is not an effective strategy to fully leverage the computing power of the Xeon Phi.
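As an illustration of why batches of SSA runs parallelize so naturally, the following is a minimal Python sketch of Gillespie's direct method together with a batch driver. It is not the implementation benchmarked in this work: the function names (`ssa_direct`, `run_batch`), the single-reaction decay model, and the per-run seeding scheme are all assumptions introduced for this example.

```python
import math
import random

# Hypothetical toy model (an assumption for illustration): X -> 0 with rate constant C.
C = 1.0
X0 = [100]                            # initial molecule count of species X
STOICH = [[-1]]                       # state change vector of the single reaction
PROPENSITIES = [lambda x: C * x[0]]   # propensity function a_0(x) = C * X


def ssa_direct(x0, stoich, propensities, t_end, rng):
    """Run one trajectory of Gillespie's direct method until time t_end."""
    x = list(x0)
    t = 0.0
    while True:
        a = [f(x) for f in propensities]
        a0 = sum(a)
        if a0 <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next reaction is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # Select reaction j with probability a[j] / a0.
        threshold = rng.random() * a0
        acc = 0.0
        j = len(a) - 1
        for idx, aj in enumerate(a):
            acc += aj
            if threshold < acc:
                j = idx
                break
        # Apply the state change of the selected reaction.
        for i, change in enumerate(stoich[j]):
            x[i] += change
    return x


def run_batch(n_runs, t_end, seed=0):
    """Run n_runs independent trajectories, each with its own random stream.

    No state is shared between runs, so each iteration of this loop could be
    assigned to a separate thread or core.
    """
    return [
        ssa_direct(X0, STOICH, PROPENSITIES, t_end, random.Random(seed + k))
        for k in range(n_runs)
    ]


if __name__ == "__main__":
    finals = run_batch(n_runs=10, t_end=0.5, seed=42)
    print([f[0] for f in finals])
```

Because every trajectory depends only on its own state and random stream, the loop in `run_batch` is the part that a GPU or MIC implementation would distribute across threads; the inner loop of `ssa_direct` remains inherently sequential.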