Abstract
The aim of this paper is to present a new high-performance implementation of Marsa-LFIB4, an example of a high-quality multiple recursive pseudorandom number generator. We propose a new algorithmic approach that combines language-based vectorization techniques with a new divide-and-conquer method that exploits the special sparse structure of the matrix obtained from the recursive formula defining the generator. We also show how the use of intrinsics for the Intel AVX2 and AVX-512 vector extensions can improve performance. Our new implementation achieves good performance on several multicore architectures and is much more energy-efficient than simple SIMD-optimized implementations.
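The sketch below shows why a recursion of this kind leaves room for vectorization at all. It assumes the lags commonly quoted for Marsaglia's LFIB4, $x_i = (x_{i-55} + x_{i-119} + x_{i-179} + x_{i-256}) \bmod 2^{32}$, which the abstract itself does not state. Because the smallest lag is 55, any 55 consecutive new values depend only on values computed earlier, so each block of 55 can be evaluated as one dependence-free loop that the compiler may vectorize. This is a simple SIMD-friendly baseline under those assumptions, not the divide-and-conquer algorithm proposed in the paper. The names `lfib4_seed`, `lfib4_scalar` and `lfib4_blocked` are hypothetical, and the seeding with Marsaglia's congruential generator is an illustrative choice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed LFIB4 lags: x[i] = x[i-55] + x[i-119] + x[i-179] + x[i-256] mod 2^32 */
enum { LAG_P = 55, LAG_Q = 119, LAG_S = 179, LAG_R = 256 };

/* Hypothetical seeding helper (not part of LFIB4 itself): fills the first
   LAG_R words using the congruential step s = 69069*s + 1234567 mod 2^32. */
void lfib4_seed(uint32_t *x, uint32_t s)
{
    for (size_t i = 0; i < LAG_R; i++) {
        s = 69069u * s + 1234567u;
        x[i] = s;
    }
}

/* Reference version: one element at a time. x[0..LAG_R-1] holds the seed,
   x[LAG_R..n-1] is filled. Unsigned 32-bit arithmetic wraps modulo 2^32. */
void lfib4_scalar(uint32_t *x, size_t n)
{
    for (size_t i = LAG_R; i < n; i++)
        x[i] = x[i - LAG_P] + x[i - LAG_Q] + x[i - LAG_S] + x[i - LAG_R];
}

/* Blocked version: inside a block of at most LAG_P consecutive indices, every
   operand index lies before the block start, so the inner loop carries no
   dependence. The OpenMP simd pragma states this to the compiler; compilers
   built without OpenMP support simply ignore it. */
void lfib4_blocked(uint32_t *x, size_t n)
{
    assert(n >= LAG_R); /* the seed occupies the first LAG_R entries */
    for (size_t b = LAG_R; b < n; b += LAG_P) {
        size_t len = (n - b < LAG_P) ? (n - b) : LAG_P;
        uint32_t *out = x + b;
        const uint32_t *a = out - LAG_P, *c = out - LAG_Q;
        const uint32_t *d = out - LAG_S, *e = out - LAG_R;
#pragma omp simd
        for (size_t j = 0; j < len; j++)
            out[j] = a[j] + c[j] + d[j] + e[j];
    }
}
```

Both functions produce the same sequence for the same seed. With GCC or Clang, compiling with `-O3 -mavx2 -fopenmp-simd` (or `-mavx512f`) lets the compiler emit 8- or 16-lane 32-bit additions for the inner loop. Note that the block width is capped by the smallest lag, which is one reason a simple approach like this leaves room for the more elaborate algorithmic restructuring the paper describes.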
Acknowledgements
The use of computer resources installed at Maria Curie-Skłodowska University in Lublin and Czestochowa University of Technology is kindly acknowledged.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Stpiczyński, P. (2020). Parallel Fully Vectorized Marsa-LFIB4: Algorithmic and Language-Based Optimization of Recursive Computations. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol. 12044. Springer, Cham. https://doi.org/10.1007/978-3-030-43222-5_1
DOI: https://doi.org/10.1007/978-3-030-43222-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43221-8
Online ISBN: 978-3-030-43222-5
eBook Packages: Computer Science, Computer Science (R0)