Computational Geosciences, Volume 13, Issue 1, pp 135–149

Parallel implementations of streamline simulators

  • Margot G. Gerritsen
  • Henrik Löf
  • Marco R. Thiele
Original paper

Abstract

We discuss various strategies for parallelizing streamline simulators and present a single-phase shared memory implementation. The choice of a shared memory programming model is motivated by its suitability for streamline simulation, as well as by the rapid advance of multicore processors, which are readily available at low cost. We show that streamline-based methods are easily parallelizable on shared memory architectures through their decomposition of the multidimensional transport equations into a large set of independent 1D transport solves. We tested both a specialized explicit load balancing algorithm, which optimizes the streamline load distribution across threads to minimize the time that any thread is idle, and the dynamic load balancing algorithms provided by OpenMP on the shared memory machines. Our results clearly indicate that built-in schedulers are competitive with specialized load balancing strategies as long as the number of streamlines per thread is sufficiently high, which is the case in field applications. The average workload per thread is then largely insensitive to workload variations between individual streamlines, and any load balancing advantage offered by explicit strategies is not sufficient to overcome the associated computational and parallel overhead. In terms of the allocation of streamlines or streamline segments to threads, we investigated both the distributed approach, in which threads are assigned streamline segments, and the owner approach, in which threads own complete streamlines. We found that the owner approach is the most suitable. The slight advantage that the distributed approach has in terms of load balancing is not enough to compensate for its additional overheads. Moreover, the owner approach allows straightforward reuse of existing sequential codes, which is not the case for the distributed approach when implicit or adaptive implicit solution strategies are used. The tracing and mapping stages in streamline simulation have low parallel efficiency. However, in real-field models, the computational burden of the streamline solves is significantly heavier than that of the tracing and mapping stages, and therefore the impact of these stages is limited. We tested the parallelization on three shared memory systems: a Sun SPARC server with 24 dual-core processors; an eight-way Sun Opteron server, representative of the state-of-the-art shared memory systems in use in the industry; and the recently released Sun Niagara II multicore machine, which has eight floating point compute units on the chip. We tested a single-phase flow problem on three heterogeneous reservoirs with varying well placements (this problem gives the worst-case scenario, as the tracing and mapping costs are not negligible compared to the transport costs). For the SPARC and Opteron systems, we find parallel efficiencies ranging between 60% and 75% for the tracer flow problems. The sublinear speedup is mostly due to communication overheads in the tracing and mapping stages. In applications with more complex physics, the relative contributions of these stages will decrease significantly, and we predict the parallel performance to be nearly linear. On the Niagara II, we obtain almost perfect linear scalability even for the single-phase flow problem, thanks to the lower communication costs on this architecture, which has a shared cache. This result is all the more satisfactory considering that future server designs will be akin to this system.
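
To make the owner approach concrete, the following minimal sketch (C with OpenMP, not the authors' code) distributes complete streamlines over threads and relies on OpenMP's built-in dynamic scheduler for load balancing; the Streamline type and the functions solve_1d_transport() and map_back_to_grid() are hypothetical placeholders.

#include <omp.h>

typedef struct {
    int     n_nodes;   /* number of points along the streamline            */
    double *tau;       /* time-of-flight coordinate at each point          */
    double *sat;       /* 1D transported quantity (e.g. tracer/saturation) */
} Streamline;

/* Hypothetical placeholders for the per-streamline 1D solver and for the
 * mapping of the 1D solution back onto the underlying grid. */
void solve_1d_transport(Streamline *sl, double dt);
void map_back_to_grid(const Streamline *sl);

void advance_transport(Streamline *lines, int n_lines, double dt)
{
    /* Each iteration is an independent 1D solve, so the loop parallelizes
     * directly; schedule(dynamic) lets OpenMP absorb uneven per-streamline
     * cost, which the abstract reports is competitive with explicit load
     * balancing once the streamline count per thread is large. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < n_lines; i++)
        solve_1d_transport(&lines[i], dt);

    /* The mapping is kept serial in this sketch because several streamlines
     * may contribute to the same grid cell; a parallel version would need
     * per-cell synchronization, one reason the mapping stage shows lower
     * parallel efficiency. */
    for (int i = 0; i < n_lines; i++)
        map_back_to_grid(&lines[i]);
}

A larger dynamic chunk size (e.g. schedule(dynamic, 16)) or schedule(guided) would reduce scheduling overhead when the streamline count is very large; the observation in the abstract is simply that such built-in schedulers suffice once each thread handles many streamlines.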

Keywords

Streamline simulator · Parallel implementation · Shared memory architecture

Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Margot G. Gerritsen (1)
  • Henrik Löf (1)
  • Marco R. Thiele (1, 2)

  1. Department of Energy Resources Engineering, Stanford University, Stanford, USA
  2. Streamsim Technologies, Inc., San Francisco, USA
