Fully-Asynchronous Cache-Efficient Simulation of Detailed Neural Networks
Modern asynchronous runtime systems allow the re-thinking of large-scale scientific applications. Using a simulator of morphologically detailed neural networks as an example, we show how detaching from the commonly used bulk-synchronous parallel (BSP) execution increases prefetching capabilities, improves cache locality, and overlaps computation with communication, consequently leading to a lower time to solution. Our strategy removes the collective synchronization of the ODEs' coupling information and exploits the pairwise time dependencies between equations, leading to a fully-asynchronous, exhaustive yet non-speculative stepping model. Combined with fully linear data structures, communication reduction at the compute-node level, and an earliest-equation-steps-first scheduler, we accelerate execution at the cache level, reducing communication and time to solution by maximizing the number of timesteps taken per neuron at each iteration.
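The earliest-equation-steps-first scheduling with pairwise time dependencies can be illustrated with a minimal sketch. This is a hypothetical simplification (the actual ODE solve, coupling data, and HPX machinery are omitted): each neuron is keyed in a min-heap by its current simulation time, and a neuron may run ahead of any neuron it depends on by at most the synaptic delay, which guarantees progress whenever the delay is at least one timestep.

```python
import heapq

def simulate(n_neurons, t_end, dt, delay, deps):
    """Sketch of earliest-steps-first asynchronous stepping.

    deps[i] lists the neurons whose state neuron i reads (assumed
    inputs for illustration); a neuron may step ahead of any
    dependency by at most the synaptic delay."""
    # delay >= dt guarantees the globally earliest neuron can always step
    assert delay >= dt
    t = [0.0] * n_neurons
    heap = [(0.0, i) for i in range(n_neurons)]  # keyed by simulation time
    heapq.heapify(heap)
    total_steps = 0
    while heap:
        ti, i = heapq.heappop(heap)
        if ti >= t_end:
            continue  # this neuron is done
        # pairwise constraint: never step past min dependency time + delay
        horizon = min((t[j] for j in deps[i]), default=t_end) + delay
        # ti is globally minimal and delay >= dt, so the step is always legal
        assert ti + dt <= horizon
        t[i] = ti + dt  # advance one timestep (ODE solve would happen here)
        total_steps += 1
        heapq.heappush(heap, (t[i], i))
    return total_steps

# three neurons in a dependency ring, 4 steps each
print(simulate(3, 1.0, 0.25, 0.25, [[1], [2], [0]]))  # → 12
```

Because no collective barrier exists, a neuron whose dependencies have run ahead can take several consecutive steps while its state is still cache-resident, which is the locality effect the abstract describes.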
Our methods were implemented on the core kernel of the NEURON scientific application. Asynchronicity and a distributed memory space are provided by the HPX runtime system for the ParalleX execution model. Benchmark results demonstrate a superlinear speed-up that leads to a reduced runtime compared to the bulk-synchronous execution, yielding a speed-up between 25% and 65% across different compute architectures, and on the order of 15% to 40% for distributed executions.
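The "fully linear data structures" mentioned above refer to contiguous per-field layouts. A minimal sketch (hypothetical field names; NumPy stands in for the flat C arrays a production kernel would use) contrasts an array-of-structures layout with the cache-friendly structure-of-arrays layout, where a solver sweep over one field streams through contiguous memory and prefetches well:

```python
import numpy as np

N = 1000  # hypothetical compartment count

# Array-of-structures-like layout: one record per compartment,
# so a sweep over a single field strides across scattered memory
aos = [{"v": -65.0, "rhs": 0.0, "d": 1.0} for _ in range(N)]

# Fully linear (structure-of-arrays) layout: each field is one
# contiguous array, giving unit-stride access during a sweep
v   = np.full(N, -65.0)   # membrane voltages
rhs = np.zeros(N)         # right-hand side of the linear system
d   = np.ones(N)          # diagonal coefficients

# a solver-like sweep touches only contiguous memory in this layout
rhs += v * d
print(rhs[0])  # → -65.0
```

The same contrast applies to the simulator's state vectors: keeping each quantity linear lets hardware prefetchers hide memory latency during the per-neuron stepping loop.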
The work was supported by funding from the ETH Domain for the Blue Brain Project (BBP). The supercomputing infrastructures were provided by the Blue Brain Project at EPFL and Indiana University. A portion of Michael Hines's effort was supported by NINDS grant R01NS11613.