
The Journal of Supercomputing, Volume 59, Issue 1, pp 361–391

Dynamic-CoMPI: dynamic optimization techniques for MPI parallel applications

  • Rosa Filgueira
  • Jesús Carretero
  • David E. Singh
  • Alejandro Calderón
  • Alberto Núñez
Article

Abstract

This work presents an optimization of MPI communications, called Dynamic-CoMPI, which uses two techniques to reduce the impact of communications and of non-contiguous I/O requests in parallel applications. These techniques are independent of the application and complementary to each other. The first technique is an optimization of the Two-Phase collective I/O technique from ROMIO, called Locality-Aware strategy for Two-Phase I/O (LA-Two-Phase I/O). In order to increase the locality of file accesses, LA-Two-Phase I/O solves the Linear Assignment Problem (LAP) to find an optimal I/O data communication schedule. The main purpose of this technique is to reduce the number of communication operations involved in the collective I/O operation. The second technique, called Adaptive-CoMPI, is based on run-time compression of the MPI messages exchanged by applications. Both techniques can be applied to any application, since they are transparent to the user. Dynamic-CoMPI has been validated using several MPI benchmarks and real HPC applications. The results show that, for many of the considered scenarios, important reductions in execution time are achieved by reducing the size and number of messages. Additional benefits of our approach are the reduction of the total communication time and of the network contention, thus enhancing not only performance but also scalability.
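To make the LAP-based scheduling idea concrete, the following minimal Python sketch (our illustration, not the Dynamic-CoMPI implementation) assigns one aggregator process to each file domain of Two-Phase I/O so that the data already held locally by the chosen aggregator is maximized. The cost matrix, its toy values, and the variable names are assumptions introduced only for this example.

```python
# Sketch: choosing aggregators for Two-Phase I/O file domains by solving a
# Linear Assignment Problem (LAP). Illustrative only; not the authors' code.
import numpy as np
from scipy.optimize import linear_sum_assignment

# local_bytes[p, d] = bytes of file domain d already held by process p
# (toy values for 4 processes and 4 file domains).
local_bytes = np.array([
    [90, 10,  0,  0],
    [ 5, 80, 15,  0],
    [ 0, 20, 70, 10],
    [ 0,  0, 25, 75],
])

# LAP minimizes total cost; to maximize locality we minimize the bytes that
# would have to be communicated: (total bytes of the domain - local bytes).
domain_size = local_bytes.sum(axis=0)              # bytes per file domain
comm_cost = domain_size[np.newaxis, :] - local_bytes

procs, domains = linear_sum_assignment(comm_cost)
for p, d in zip(procs, domains):
    print(f"process {p} aggregates file domain {d} "
          f"({local_bytes[p, d]} of {domain_size[d]} bytes already local)")
```

Similarly, the run-time message compression behind Adaptive-CoMPI can be pictured as a thin wrapper around point-to-point operations that compresses a payload only when doing so actually shrinks it. The sketch below uses mpi4py and zlib purely as stand-ins and is an assumption about the general mechanism: the real layer is implemented inside the MPI library, selects compressors adaptively at run time, and remains transparent to the application.

```python
# Sketch: send an MPI message compressed only when compression pays off.
# Illustrative only; not the Adaptive-CoMPI implementation.
import zlib
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

def send_maybe_compressed(buf: bytes, dest: int, tag: int = 0) -> None:
    packed = zlib.compress(buf)
    if len(packed) < len(buf):            # compression reduces message size
        comm.send((True, packed), dest=dest, tag=tag)
    else:                                 # otherwise send the raw buffer
        comm.send((False, buf), dest=dest, tag=tag)

def recv_maybe_compressed(source: int, tag: int = 0) -> bytes:
    compressed, payload = comm.recv(source=source, tag=tag)
    return zlib.decompress(payload) if compressed else payload

if rank == 0:
    send_maybe_compressed(b"A" * 1_000_000, dest=1)   # highly compressible
elif rank == 1:
    data = recv_maybe_compressed(source=0)
    print(f"rank 1 received {len(data)} bytes")
```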

Keywords

MPI library · Parallel techniques · Cluster architectures · Compression algorithms · Collective I/O · Adaptive systems · Heuristics



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Rosa Filgueira (1)
  • Jesús Carretero (1)
  • David E. Singh (1)
  • Alejandro Calderón (1)
  • Alberto Núñez (1)
  1. Department of Computer Science, University Carlos III of Madrid, Madrid, Spain
