Abstract
This paper presents Cyme, a C++ library that abstracts the use of SIMD instructions while maximizing utilization of the underlying hardware. Unlike similar efforts such as Boost.simd or Vc, Cyme provides users with generic high-level containers that hide SIMD complexity. Cyme accomplishes this by 1) optimizing the Abstract Syntax Tree with expression templates to prevent temporary copies and maximize the use of Fused Multiply-Add instructions, and 2) creating a data layout in memory (AoS or AoSoA) that minimizes data addressing and manipulation across all SIMD registers. The Cyme library has been implemented on the IBM Blue Gene/Q architecture using the 256-bit SIMD extensions (QPX) of the PowerPC A2 processor. The library is demonstrated on a computationally intensive kernel of a neuroscientific application, where a 6.72-fold increase in GFlop/s performance over the original implementation is observed using the Clang compiler.
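The two techniques named in the abstract can be illustrated with a minimal, generic C++ sketch. This is not Cyme's API: the names `Expr`, `Mul`, `MulAdd`, `Vec`, `aosoa_index`, and `run_demo` are hypothetical and chosen only for illustration. The first part shows how expression templates can turn `a * b + c` into a single loop with one fused multiply-add per element and no temporary vector. The second part shows one possible AoSoA index mapping, assuming a lane width of 4 doubles to match a 256-bit register.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CRTP base: lets the operators below match only expression types.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Node for l * r. Nothing is computed when the node is built.
template <class L, class R>
struct Mul : Expr<Mul<L, R>> {
    const L& l; const R& r;
    Mul(const L& l_, const R& r_) : l(l_), r(r_) {}
    double operator[](std::size_t i) const { return l[i] * r[i]; }
};

// Node for l * r + a, evaluated element-wise with std::fma.
template <class L, class R, class A>
struct MulAdd : Expr<MulAdd<L, R, A>> {
    const L& l; const R& r; const A& a;
    MulAdd(const L& l_, const R& r_, const A& a_) : l(l_), r(r_), a(a_) {}
    double operator[](std::size_t i) const { return std::fma(l[i], r[i], a[i]); }
};

// Hypothetical container; assigning an expression runs one loop.
struct Vec : Expr<Vec> {
    std::vector<double> d;
    explicit Vec(std::size_t n, double v = 0.0) : d(n, v) {}
    double operator[](std::size_t i) const { return d[i]; }
    double& operator[](std::size_t i) { return d[i]; }
    std::size_t size() const { return d.size(); }
    template <class E>
    Vec& operator=(const Expr<E>& e) {
        for (std::size_t i = 0; i < d.size(); ++i) d[i] = e.self()[i];
        return *this;
    }
};

template <class L, class R>
Mul<L, R> operator*(const Expr<L>& l, const Expr<R>& r) {
    return Mul<L, R>(l.self(), r.self());
}

// Pattern match on the tree: (l * r) + a becomes a single MulAdd node.
template <class L, class R, class A>
MulAdd<L, R, A> operator+(const Mul<L, R>& m, const Expr<A>& a) {
    return MulAdd<L, R, A>(m.l, m.r, a.self());
}

// AoSoA mapping (assumed layout): items are grouped in blocks of W lanes,
// and within a block each field stores its W lanes contiguously.
template <std::size_t NFields, std::size_t W>
constexpr std::size_t aosoa_index(std::size_t item, std::size_t field) {
    return (item / W) * (NFields * W) + field * W + (item % W);
}

// Small usage example: y = a * b + c in one pass.
double run_demo() {
    Vec a(4, 2.0), b(4, 3.0), c(4, 1.0), y(4);
    y = a * b + c;          // one loop, one fma per element, no temporaries
    assert(y.size() == 4);
    return y[0];            // 2 * 3 + 1
}
```

In this sketch the expression nodes hold only references, so the full expression must be assigned within the same statement in which it is built. A production library would additionally map each block of `W` lanes onto a hardware SIMD register rather than looping over scalars.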
Keywords
- SIMD
- Vectorization
- Memory layout
- C++
- Generic Programming
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ewart, T., Delalondre, F., Schürmann, F. (2014). Cyme: A Library Maximizing SIMD Computation on User-Defined Containers. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds) Supercomputing. ISC 2014. Lecture Notes in Computer Science, vol 8488. Springer, Cham. https://doi.org/10.1007/978-3-319-07518-1_29
DOI: https://doi.org/10.1007/978-3-319-07518-1_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07517-4
Online ISBN: 978-3-319-07518-1
eBook Packages: Computer Science, Computer Science (R0)