Parallel Fully Vectorized Marsa-LFIB4: Algorithmic and Language-Based Optimization of Recursive Computations

Stpiczyński, Przemysław

doi:10.1007/978-3-030-43222-5_1

Przemysław Stpiczyński (ORCID: orcid.org/0000-0001-8661-414X)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12044)

Included in the following conference series:

International Conference on Parallel Processing and Applied Mathematics

Abstract

This paper presents a new high-performance implementation of Marsa-LFIB4, a high-quality multiple recursive pseudorandom number generator. We propose a new algorithmic approach that combines language-based vectorization techniques with a new divide-and-conquer method exploiting the special sparse structure of the matrix obtained from the recursive formula that defines the generator. We also show how intrinsics for the Intel AVX2 and AVX512 vector extensions can improve performance. The new implementation achieves good performance on several multicore architectures and is much more energy-efficient than simple SIMD-optimized implementations.
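
For orientation, the recurrence behind the generator can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation described in the paper. It assumes the lags 256, 179, 119 and 55 and arithmetic modulo 2^32 from Marsaglia's published description of LFIB4, since this page does not state the formula. The function names and the seeding helper `demo_seed` are hypothetical. The blocked variant shows the simple observation that up to 55 consecutive outputs depend only on earlier values, which is what makes plain SIMD vectorization possible. It does not reproduce the divide-and-conquer method the abstract refers to.

```python
# Assumption: lags from Marsaglia's description of LFIB4, not from this page.
LAGS = (256, 179, 119, 55)
MASK = 0xFFFFFFFF  # keeps results modulo 2**32


def demo_seed(start):
    """Hypothetical helper: fill the 256-word table with a simple LCG."""
    table, s = [], start & MASK
    for _ in range(LAGS[0]):
        s = (1664525 * s + 1013904223) & MASK
        table.append(s)
    return table


def lfib4_scalar(seed_table, count):
    """Generate `count` values one at a time from a 256-word seed table."""
    if len(seed_table) != LAGS[0]:
        raise ValueError("seed table must hold exactly 256 words")
    x = list(seed_table)
    for n in range(LAGS[0], LAGS[0] + count):
        x.append(sum(x[n - p] for p in LAGS) & MASK)
    return x[LAGS[0]:]


def lfib4_blocked(seed_table, count, block=55):
    """Same sequence, produced in blocks of up to 55 independent values."""
    if len(seed_table) != LAGS[0]:
        raise ValueError("seed table must hold exactly 256 words")
    if not 1 <= block <= min(LAGS):
        raise ValueError("block must be between 1 and the smallest lag")
    x = list(seed_table)
    produced = 0
    while produced < count:
        n = len(x)
        width = min(block, count - produced)
        # Each value in the block reads only entries with index < n,
        # so these `width` additions could run as independent SIMD lanes.
        new_values = [
            sum(x[n + i - p] for p in LAGS) & MASK for i in range(width)
        ]
        x.extend(new_values)
        produced += width
    return x[LAGS[0]:]
```

Both functions return the same sequence for the same seed table. The block size is capped at 55 because a wider block would need a value that has not been computed yet.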

Acknowledgements

The use of computer resources installed at Maria Curie-Skłodowska University in Lublin and Czestochowa University of Technology is kindly acknowledged.

Author information

Authors and Affiliations

Institute of Computer Science, Maria Curie–Skłodowska University, Akademicka 9/519, 20-033, Lublin, Poland
Przemysław Stpiczyński

Corresponding author

Correspondence to Przemysław Stpiczyński.

Editor information

Editors and Affiliations

Czestochowa University of Technology, Czestochowa, Poland
Roman Wyrzykowski
University of Southern California, Marina del Rey, CA, USA
Ewa Deelman
University of Tennessee, Knoxville, TN, USA
Jack Dongarra
Czestochowa University of Technology, Czestochowa, Poland
Konrad Karczewski

About this paper

Cite this paper

Stpiczyński, P. (2020). Parallel Fully Vectorized Marsa-LFIB4: Algorithmic and Language-Based Optimization of Recursive Computations. In: Wyrzykowski, R., Deelman, E., Dongarra, J., Karczewski, K. (eds) Parallel Processing and Applied Mathematics. PPAM 2019. Lecture Notes in Computer Science, vol 12044. Springer, Cham. https://doi.org/10.1007/978-3-030-43222-5_1

DOI: https://doi.org/10.1007/978-3-030-43222-5_1
Published: 19 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-43221-8
Online ISBN: 978-3-030-43222-5
eBook Packages: Computer Science, Computer Science (R0)