MCSTL: The Multi-core Standard Template Library

  • Johannes Singler
  • Peter Sanders
  • Felix Putze
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4641)


Future gains in computing performance will come not from higher clock rates but from more cores per processor. Since automatic parallelization is still limited to easily parallelizable sections of code, most applications will soon have to support parallelism explicitly. The Multi-Core Standard Template Library (MCSTL) simplifies parallelization by providing efficient parallel implementations of the algorithms in the C++ Standard Template Library. Thus, simple recompilation will provide partial parallelization of applications that make consistent use of the STL. We present performance measurements on several architectures. For example, our sorter achieves a speedup of 21 on an 8-core, 32-thread SUN T1.
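As an illustration of what "consistent use of the STL" can look like, the following minimal sketch expresses all heavy work through standard algorithm calls (`std::sort`, `std::accumulate`), which are the kind of calls a parallel STL implementation can replace behind the scenes. The helper names `make_input` and `sort_and_sum` are hypothetical and not taken from the paper. The abstract does not say which headers or compiler options MCSTL requires, so none are shown; as far as we are aware, the library was later integrated into GCC's libstdc++ as its "parallel mode", but that is outside the scope of this abstract.

```cpp
// Plain, sequential STL code. Nothing here is MCSTL-specific: the point is
// only that the work is delegated to standard algorithms instead of
// hand-written loops, so a parallel STL could take over those calls.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: fills a vector with deterministic pseudo-random
// values using a simple linear congruential generator.
std::vector<std::uint32_t> make_input(std::size_t n) {
    std::vector<std::uint32_t> v(n);
    std::uint32_t x = 12345u;
    for (auto& e : v) {
        x = x * 1664525u + 1013904223u;  // one LCG step (wraps modulo 2^32)
        e = x;
    }
    return v;
}

// Hypothetical helper: sorts the data in place and returns the sum of all
// elements. Both steps are single STL algorithm calls.
std::uint64_t sort_and_sum(std::vector<std::uint32_t>& v) {
    std::sort(v.begin(), v.end());
    assert(std::is_sorted(v.begin(), v.end()));
    return std::accumulate(v.begin(), v.end(), std::uint64_t{0});
}
```

A caller would build the input once with `make_input(n)` and pass it to `sort_and_sum`. Code written with explicit index loops instead of algorithm calls would not benefit from recompilation in the way the abstract describes, which is why the abstract speaks of partial parallelization.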


Keywords (machine-generated, not supplied by the authors): Load Balance, Parallel Algorithm, Sequential Algorithm, Global Rank, Automatic Parallelization





Copyright information

© Springer-Verlag Berlin Heidelberg 2007

Authors and Affiliations

  • Johannes Singler (Universität Karlsruhe)
  • Peter Sanders (Universität Karlsruhe)
  • Felix Putze (Universität Karlsruhe)
