A methodology for generating efficient disk-based algorithms from tensor product formulas

  • S. D. Kaushik
  • C.-H. Huang
  • R. W. Johnson
  • P. Sadayappan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 768)


This paper addresses the automatic generation of disk-based algorithms from tensor product formulas. Disk-based algorithms are required in scientific applications whose data sets are too large to fit entirely in main memory. Tensor products have been used to design and implement block recursive algorithms on shared-memory, vector, and distributed-memory multiprocessors. We extend this theory to generate disk-based code from tensor product formulas. The methodology rests on generating algebraically equivalent tensor product formulas that have better disk performance. We demonstrate it by generating disk-based code for the fast Fourier transform.
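
To make the idea of "algebraically equivalent tensor product formulas" concrete, here is a minimal in-memory sketch in Python. It is not the code generated in the paper. It uses two standard identities from the tensor-product FFT literature (see, for example, Van Loan's book in the reference list): the Cooley-Tukey factorization F_rs = (F_r ⊗ I_s) T (I_r ⊗ F_s) L^rs_r, and the commutation rule (F_r ⊗ I_s) = L^rs_r (I_s ⊗ F_r) L^rs_s, which trades strided access for contiguous block access plus stride permutations. All function names are hypothetical, and plain lists stand in for data that would live on disk.

```python
import cmath

def dft(x):
    """Direct O(n^2) discrete Fourier transform, used here as the reference F_n."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(x[m] * w ** (m * k) for m in range(n)) for k in range(n)]

def stride_permutation(x, r):
    """L^n_r: gather elements at stride r, so y[i*s + j] = x[j*r + i] with n = r*s."""
    s = len(x) // r
    return [x[j * r + i] for i in range(r) for j in range(s)]

def i_tensor_f(x, r, s):
    """(I_r (x) F_s): apply F_s to each of r contiguous blocks of length s."""
    out = []
    for i in range(r):
        out.extend(dft(x[i * s:(i + 1) * s]))
    return out

def twiddle(x, r, s):
    """Diagonal twiddle factor: scale entry i*s + j by w_n^(i*j), n = r*s."""
    w = cmath.exp(-2j * cmath.pi / (r * s))
    return [x[i * s + j] * w ** (i * j) for i in range(r) for j in range(s)]

def f_tensor_i_strided(x, r, s):
    """(F_r (x) I_s) applied directly: F_r on s subvectors accessed at stride s."""
    out = [0j] * (r * s)
    for j in range(s):
        col = dft(x[j::s])
        for k in range(r):
            out[k * s + j] = col[k]
    return out

def f_tensor_i_contiguous(x, r, s):
    """Equivalent form L^rs_r (I_s (x) F_r) L^rs_s: permute, work on blocks, permute back."""
    y = stride_permutation(x, s)
    y = i_tensor_f(y, s, r)
    return stride_permutation(y, r)

def fft_by_formula(x, r, s, contiguous=True):
    """Evaluate (F_r (x) I_s) T (I_r (x) F_s) L^rs_r, applying factors right to left."""
    y = stride_permutation(x, r)
    y = i_tensor_f(y, r, s)
    y = twiddle(y, r, s)
    last = f_tensor_i_contiguous if contiguous else f_tensor_i_strided
    return last(y, r, s)
```

Calling `fft_by_formula(x, 3, 4)` on a length-12 list should agree with `dft(x)` up to rounding, with either setting of `contiguous`. The two variants compute the same result but touch the data in different patterns. Choosing among such variants is the kind of decision the methodology aims to automate. The cost model that would drive the choice for a real disk system is not sketched here.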


Keywords: tensor product, stride permutation, disk-based algorithm, fast Fourier transform





Copyright information

© Springer-Verlag Berlin Heidelberg 1994

Authors and Affiliations

  • S. D. Kaushik (1)
  • C.-H. Huang (1)
  • R. W. Johnson (2)
  • P. Sadayappan (1)

  1. Department of Computer and Information Science, Ohio State University, USA
  2. Department of Computer Science, St. Cloud State University, USA
