Parallel implementations of streamline simulators

Abstract

We discuss strategies for parallelizing streamline simulators and present a single-phase shared memory implementation. The choice of a shared memory programming model is motivated by its suitability for streamline simulation, as well as by the rapid advance of multicore processors, which are readily available at low cost. We show that streamline-based methods are easily parallelized on shared memory architectures because they decompose the multidimensional transport equations into a large set of independent 1D transport solves. We tested both a specialized explicit load balancing algorithm, which optimizes the streamline distribution across threads to minimize thread idle time, and the dynamic load balancing schedulers provided by OpenMP. Our results clearly indicate that the built-in schedulers are competitive with specialized load balancing strategies as long as the number of streamlines per thread is sufficiently high, which is the case in field applications. The average workload per thread is then largely insensitive to workload variations between individual streamlines, and any load balancing advantage offered by explicit strategies is not sufficient to overcome the associated computational and parallel overhead. For the allocation of streamlines or streamline segments to threads, we investigated both the distributed approach, in which threads are assigned streamline segments, and the owner approach, in which threads own complete streamlines. We found the owner approach most suitable: the slight load balancing advantage of the distributed approach is not enough to compensate for its additional overheads. Moreover, the owner approach allows straightforward reuse of existing sequential codes, which is not the case for the distributed approach with implicit or adaptive-implicit solution strategies. The tracing and mapping stages in streamline simulation have low parallel efficiency, but in real-field models the computational burden of the streamline solves is significantly heavier than that of tracing and mapping, so the impact of these stages is limited. We tested the parallelization on three shared memory systems: a Sun SPARC server with 24 dual-core processors; an eight-way Sun Opteron server, representative of the state-of-the-art shared memory systems in use in the industry; and the recently released Sun Niagara II multicore machine, which has eight floating-point compute units on a single chip. We ran a single-phase flow problem on three heterogeneous reservoirs with varying well placements; this problem represents a worst-case scenario because the tracing and mapping costs are not negligible compared to the transport costs. For the SPARC and Opteron systems, we find parallel efficiencies between 60% and 75% for the tracer flow problems. The sublinear speedup is mostly due to communication overheads in the tracing and mapping stages. In applications with more complex physics, the relative contributions of these stages will decrease significantly, and we expect the parallel performance to become nearly linear. On the Niagara II, we obtain almost perfectly linear scalability even for the single-phase flow problem, thanks to the lower communication costs on this architecture, which has a shared on-chip cache. This result is all the more satisfactory because future server designs are expected to be akin to this system.
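As a concrete illustration of the owner approach combined with OpenMP's built-in scheduling, the sketch below shows how each global time step reduces to a parallel loop over independent 1D solves, with the dynamic scheduler handing complete streamlines to idle threads. This is a minimal sketch in C++/OpenMP, not the authors' implementation; Streamline, solve_1d_transport, and advance_streamlines are hypothetical names standing in for the data structures and solvers described above.

```cpp
// Minimal sketch of the "owner" approach: each OpenMP thread owns complete
// streamlines and advances them independently. All names below are
// illustrative placeholders, not the simulator's actual API.
#include <vector>

struct Streamline {
    std::vector<double> tof;         // time of flight along the streamline
    std::vector<double> saturation;  // 1D solution variable carried along it
};

// Hypothetical 1D transport solve along a single streamline; in the owner
// approach this can be the unmodified sequential solver.
void solve_1d_transport(Streamline& s, double dt) {
    (void)s; (void)dt;
    // ... advance the 1D conservation law along the streamline for one step ...
}

void advance_streamlines(std::vector<Streamline>& streamlines, double dt) {
    // The built-in dynamic scheduler hands out whole streamlines to idle
    // threads; with many streamlines per thread this balances the load about
    // as well as a specialized explicit partitioning.
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < static_cast<long>(streamlines.size()); ++i) {
        solve_1d_transport(streamlines[i], dt);  // independent: no communication
    }
    // Mapping the 1D solutions back onto the 3D grid happens after this loop
    // and requires synchronization, since several streamlines can contribute
    // to the same cell; this is one source of the overhead noted above for
    // the tracing and mapping stages.
}
```

In the distributed approach, by contrast, the loop would run over streamline segments rather than whole streamlines, which improves load balance slightly but requires extra bookkeeping to stitch segments together and prevents direct reuse of a sequential 1D solver.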

Author information

Correspondence to Margot G. Gerritsen.

Cite this article

Gerritsen, M.G., Löf, H. & Thiele, M.R. Parallel implementations of streamline simulators. Comput Geosci 13, 135–149 (2009). https://doi.org/10.1007/s10596-008-9113-y

Keywords

  • Streamline simulator
  • Parallel implementation
  • Shared memory architecture