Abstract
Large message latencies often lead to poor performance of parallel applications. In this paper, we investigate a latency-tolerating technique that immediately releases all blocking receives, even when the message has not yet (completely) arrived, and enforces execution correctness through page protection. This approach eliminates false dependencies on incoming message data and allows the computation to proceed as early as possible. We implement and evaluate our early-release technique in the context of an MPI runtime library. The results show that the execution speed of MPI applications improves by up to 60% when early release is enabled. Our approach also makes parallel programming faster and easier, as it frees programmers from adopting more complex nonblocking receives and from tuning message sizes to explicitly reduce false message data dependencies.
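To make the mechanism in the abstract concrete, the following C sketch shows how a blocking receive can be released early while page protection preserves correctness. It is a simplified illustration under stated assumptions, not the authors' implementation: the names early_release_recv, page_arrived, and on_fault are hypothetical, the buffer is assumed page-aligned, the handler spins where a real runtime would block, and the runtime is assumed to write arriving data through a separate mapping before unprotecting each page.

```c
/*
 * Minimal sketch of the early-release idea, not the paper's actual code:
 * a blocking receive read/write-protects its buffer and returns at once;
 * the first touch of a page whose data has not yet arrived faults, and
 * the handler stalls until the runtime has filled that page.  Assumes a
 * POSIX system, a page-aligned buffer, and single-threaded application
 * code; early_release_recv, page_arrived, and on_fault are illustrative
 * names, not part of MPI or of the authors' runtime.
 */
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static volatile sig_atomic_t *page_arrived; /* set by the runtime's progress
                                               engine once a page is filled */
static uintptr_t buf_start;                 /* receive buffer base address  */
static long      page_size;

/* Fault handler: the application touched message data that is still in
 * flight; wait for it, unprotect the page, and let the access retry. */
static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t page = (uintptr_t)si->si_addr & ~(uintptr_t)(page_size - 1);
    size_t    idx  = (page - buf_start) / (size_t)page_size;
    while (!page_arrived[idx])
        ;  /* a real runtime would block or poll the network, not spin */
    mprotect((void *)page, (size_t)page_size, PROT_READ | PROT_WRITE);
}

/* "Blocking" receive that is released early: protect the buffer, install
 * the fault handler, and return to the caller immediately.  The runtime
 * is assumed to deposit arriving fragments (e.g., through a separate
 * mapping of the same pages) and to set page_arrived[i] per full page. */
void early_release_recv(void *buf, size_t len)
{
    page_size    = sysconf(_SC_PAGESIZE);
    buf_start    = (uintptr_t)buf;        /* assumed page-aligned */
    page_arrived = calloc((len + (size_t)page_size - 1) / (size_t)page_size,
                          sizeof *page_arrived);

    struct sigaction sa = {0};
    sa.sa_sigaction = on_fault;
    sa.sa_flags     = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);

    mprotect(buf, len, PROT_NONE);        /* any touch now traps */
    /* computation resumes here and overlaps with message arrival */
}
```

The essential design point is that the receive call no longer waits for the whole message: only the parts of the buffer the computation actually touches early can stall it, which removes the false dependency on data that arrives later.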
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Ke, J., Burtscher, M., Speight, E. (2005). Tolerating Message Latency Through the Early Release of Blocked Receives. In: Cunha, J.C., Medeiros, P.D. (eds.) Euro-Par 2005 Parallel Processing. Lecture Notes in Computer Science, vol. 3648. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11549468_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28700-1
Online ISBN: 978-3-540-31925-2