Parallel Fully Vectorized Marsa-LFIB4: Algorithmic and Language-Based Optimization of Recursive Computations

  • Przemysław Stpiczyński
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12044)


This paper presents a new high-performance implementation of Marsa-LFIB4, a high-quality multiple recursive pseudorandom number generator. We propose a new algorithmic approach that combines language-based vectorization techniques with a new divide-and-conquer method, which exploits the special sparse structure of the matrix obtained from the recursive formula that defines the generator. We also show how intrinsics for the Intel AVX2 and AVX512 vector extensions can improve performance. Our new implementation achieves good performance on several multicore architectures and is much more energy-efficient than simple SIMD-optimized implementations.
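
To make the recursive formula concrete, the following minimal C sketch fills an array using a four-term lagged Fibonacci recurrence modulo 2^32. The lags 256, 179, 119 and 55 are an assumption taken from Marsaglia's description of LFIB4; the abstract itself does not list them. The function names and the LCG-based seeding are hypothetical and chosen only for illustration. The blocked variant shows only the simple observation that, with a smallest lag of 55, any 55 consecutive outputs are mutually independent and can be computed in one vectorizable inner loop; it is not the divide-and-conquer method proposed in the paper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed lags (Marsaglia's description of LFIB4); not stated in the abstract. */
#define LAG_A 256u
#define LAG_B 179u
#define LAG_C 119u
#define LAG_D 55u

/* Hypothetical seeding helper: fills x[0..255] with a 32-bit LCG.
   Any other seeding scheme could be substituted. */
void lfib4_seed(uint32_t *x, uint32_t seed)
{
    for (size_t i = 0; i < LAG_A; ++i) {
        seed = seed * 1664525u + 1013904223u;
        x[i] = seed;
    }
}

/* Scalar reference: x[i] = x[i-256] + x[i-179] + x[i-119] + x[i-55] (mod 2^32).
   Unsigned 32-bit addition wraps around, which gives the modulus for free. */
void lfib4_fill_scalar(uint32_t *x, size_t n)
{
    for (size_t i = LAG_A; i < n; ++i)
        x[i] = x[i - LAG_A] + x[i - LAG_B] + x[i - LAG_C] + x[i - LAG_D];
}

/* Blocked variant: each chunk of at most 55 elements reads only values
   produced before the chunk started, so the inner loop carries no
   dependence and a compiler is free to vectorize it. */
void lfib4_fill_blocked(uint32_t *x, size_t n)
{
    for (size_t base = LAG_A; base < n; base += LAG_D) {
        size_t len = (n - base < LAG_D) ? (n - base) : LAG_D;
        for (size_t j = 0; j < len; ++j) {
            size_t i = base + j;
            x[i] = x[i - LAG_A] + x[i - LAG_B] + x[i - LAG_C] + x[i - LAG_D];
        }
    }
}
```

A caller would seed the first 256 entries with `lfib4_seed` and then call either fill function on the whole array; both produce the same sequence, since the blocked version only reorders the loop structure.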


Pseudorandom numbers, Recursive generators, Language-based vectorization, Intrinsics, Algorithmic approach, OpenMP



The use of computer resources installed at Maria Curie-Skłodowska University in Lublin and Czestochowa University of Technology is kindly acknowledged.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Institute of Computer Science, Maria Curie–Skłodowska University, Lublin, Poland
