Fully-Asynchronous Cache-Efficient Simulation of Detailed Neural Networks

  • Bruno R. C. Magalhães
  • Thomas Sterling
  • Michael Hines
  • Felix Schürmann
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11538)

Abstract

Modern asynchronous runtime systems allow large-scale scientific applications to be rethought. Using a simulator of morphologically detailed neural networks as an example, we show how detaching from the commonly used bulk-synchronous parallel (BSP) execution model increases prefetching capabilities, improves cache locality, and enables an overlap of computation and communication, consequently lowering the time to solution. Our strategy removes the collective synchronization of the ODEs' coupling information and instead exploits the pairwise time dependency between equations, leading to a fully-asynchronous, exhaustive yet non-speculative stepping model. Combined with fully linear data structures, communication reduction at the compute-node level, and an earliest-equation-steps-first scheduler, we achieve an acceleration at the cache level that reduces communication and time to solution by maximizing the number of timesteps taken per neuron at each iteration.

Our methods were implemented in the core kernel of the NEURON scientific application. Asynchronicity and a distributed memory space are provided by the HPX runtime system, an implementation of the ParalleX execution model. Benchmark results demonstrate a superlinear speed-up that leads to a reduced runtime compared to the bulk-synchronous execution, yielding a speed-up of 25% to 65% across different compute architectures, and on the order of 15% to 40% for distributed executions.
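The earliest-equation-steps-first scheduling described in the abstract can be illustrated with a toy model. This is a minimal sketch under our own assumptions, not the paper's implementation: the `run_async` helper, the neuron names, and the use of integer timestep units are all hypothetical. The idea it demonstrates is the speculation-free stepping interval: a neuron may safely advance to the earliest time instant of its presynaptic dependencies plus the synaptic delay, since no spike can reach it before then, and the neuron furthest behind in time is always stepped first.

```python
import heapq

def run_async(times, deps, delay_steps, end_step):
    """Earliest-neuron-steps-first scheduler (illustrative toy model).

    times: dict neuron -> current time, in integer timestep units
    deps:  dict neuron -> list of presynaptic neurons it receives spikes from
    delay_steps: minimum synaptic transmission delay, in timesteps (>= 1)
    end_step: final simulation time, in timesteps

    Returns the total number of single-timestep advances performed.
    """
    # min-heap keyed by each neuron's current time: furthest-behind first
    heap = [(t, n) for n, t in times.items()]
    heapq.heapify(heap)
    total = 0
    while heap:
        t, n = heapq.heappop(heap)
        if t != times[n] or t >= end_step:
            continue  # stale entry, or neuron already finished
        # Safe horizon: earliest dependency time plus the synaptic delay.
        # No spike emitted by a dependency can arrive before this instant,
        # so stepping up to it is exhaustive yet not speculative.
        horizon = min((times[d] for d in deps.get(n, [])),
                      default=end_step) + delay_steps
        target = min(horizon, end_step)
        total += target - times[n]
        times[n] = target
        if times[n] < end_step:
            heapq.heappush(heap, (times[n], n))
    return total

# Toy pair of mutually connected neurons: each one's safe window is
# bounded only by the other's progress plus the synaptic delay.
deps = {"A": ["B"], "B": ["A"]}
times = {"A": 0, "B": 0}
total = run_async(times, deps, delay_steps=2, end_step=10)
# Both neurons reach step 10 after 20 single-timestep advances in total.
```

Because the synaptic delay is strictly positive, the neuron at the globally minimum time always has a non-empty safe window, so the scheduler makes progress without any collective synchronization step.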

Acknowledgements

This work was supported by funding from the ETH Domain for the Blue Brain Project (BBP). Supercomputing infrastructure was provided by the Blue Brain Project at EPFL and by Indiana University. A portion of Michael Hines' effort was supported by NINDS grant R01NS11613.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Bruno R. C. Magalhães (1)
  • Thomas Sterling (2)
  • Michael Hines (3)
  • Felix Schürmann (1)
  1. Blue Brain Project, École polytechnique fédérale de Lausanne, Biotech Campus, Geneva, Switzerland
  2. CREST - Center for Research in Extreme Scale Technologies, Indiana University, Bloomington, USA
  3. Department of Neuroscience, Yale University, New Haven, USA