Non-strict Evaluation of the FFT Algorithm in Distributed Memory Systems

  • Alfredo Cristóbal-Salas
  • Andrei Tchernykh
  • Jean-Luc Gaudiot
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2840)


This paper focuses on the partial evaluation of local and remote memory accesses in distributed applications, not only to remove much of the excess overhead of message-passing implementations, but also to reduce the number of messages when some information about the input data set is known. The use of split-phase memory operations, the exploitation of spatial data locality, and non-strict information processing are described. Through a detailed performance analysis, we establish conditions under which the technique is beneficial. We show that by incorporating non-strict information processing into an MPI implementation of the FFT, a significant reduction in the number of messages can be achieved, and the overall system performance can be improved.
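
As a rough illustration of the split-phase and non-strict ideas named in the abstract, the Python sketch below models a write-once I-structure cell: a read issued before the value exists is deferred rather than blocked, and completes when the write arrives. The names `IStructureCell`, `butterfly`, and `demo` are hypothetical and chosen only for this illustration; the sketch runs in a single process, involves no MPI, and is not the authors' implementation.

```python
class IStructureCell:
    """Write-once cell; reads issued before the write are deferred."""

    def __init__(self):
        self._full = False
        self._value = None
        self._deferred = []  # callbacks waiting for the value

    def read(self, on_value):
        # Split-phase read: the request returns immediately and the
        # callback runs once the value is available.
        if self._full:
            on_value(self._value)
        else:
            self._deferred.append(on_value)

    def write(self, value):
        # Single assignment: a second write is an error.
        if self._full:
            raise RuntimeError("I-structure cell written twice")
        self._full = True
        self._value = value
        pending, self._deferred = self._deferred, []
        for on_value in pending:
            on_value(value)


def butterfly(a, b, w):
    """One radix-2 FFT butterfly: returns (a + w*b, a - w*b)."""
    t = w * b
    return a + t, a - t


def demo():
    """Set up a consumer before its operands exist, then supply them."""
    results = []
    a_cell, b_cell = IStructureCell(), IStructureCell()

    def on_a(a):
        # Once a is known, request b; the butterfly fires when both exist.
        b_cell.read(lambda b: results.append(butterfly(a, b, 1)))

    a_cell.read(on_a)   # deferred: a has not been written yet
    b_cell.write(3)     # no reader is waiting on b yet
    a_cell.write(5)     # triggers on_a, which finds b already present
    return results


if __name__ == "__main__":
    print(demo())
```

The consumer is registered before either operand has been produced, which is the sense in which the computation is non-strict: work proceeds as values become available instead of waiting for the whole input.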





Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Alfredo Cristóbal-Salas (1)
  • Andrei Tchernykh (2)
  • Jean-Luc Gaudiot (3)
  1. School of Chemistry Sciences and Engineering, University of Baja California, Tijuana, Mexico
  2. Computer Science Department, CICESE Research Center, Ensenada, Mexico
  3. Electrical Engineering and Computer Science, University of California, Irvine, USA
