
JuliusC: A Practical Approach for the Analysis of Divide-and-Conquer Algorithms

  • Conference paper
Languages and Compilers for High Performance Computing (LCPC 2004)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3602)


Abstract

The development of divide-and-conquer (D&C) algorithms for matrix computations has led to the widespread use of high-performance scientific applications and libraries. D&C algorithms can be implemented using either loop nests or recursion. Recursion is extremely appealing because it is an intuitive way to deploy top-down techniques, which naturally exploit data locality and parallelism. However, recursion has been considered impractical for high-performance codes, mostly because of the inherent overhead of dividing a problem into small subproblems.
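To give a sense of the division overhead, the following Python sketch (an illustration, not code from the paper) counts the calls made by a hypothetical recursive matrix multiplication C += A*B that splits every dimension larger than 1 into its ceiling and floor halves until all operands are 1 x 1. The helper names `halves` and `count_calls` are invented for this example. For n = 2^k the count is (8^(k+1) - 1)/7, so under this scheme the number of calls grows as fast as the number of scalar multiply-adds.

```python
def halves(d):
    """Split dimension d into its ceiling and floor halves; a size-1 dimension is kept whole."""
    return [d] if d == 1 else [(d + 1) // 2, d // 2]


def count_calls(m, n, p):
    """Count the calls made by a hypothetical recursive product C[m x p] += A[m x n] * B[n x p]
    that halves every dimension until all three are 1."""
    if m == n == p == 1:
        return 1  # leaf call: one scalar multiply-add
    # One call for this node plus the calls of every subproblem it creates
    return 1 + sum(
        count_calls(mi, ni, pi)
        for mi in halves(m)
        for ni in halves(n)
        for pi in halves(p)
    )


print(count_calls(8, 8, 8))  # 585
```

Here an 8 x 8 product makes 585 calls for only 512 scalar multiply-adds; by the same formula a 1024 x 1024 product would make 1,227,133,513 calls.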

In this work, we develop techniques to model the behavior of recursive algorithms in a form that a compiler can use to estimate and reduce the overhead of the division process. We describe these techniques and JuliusC, a (lite) C compiler that we developed to exploit them. JuliusC partially unfolds the application's call graph and extracts the relations among function calls. Its final result is a directed acyclic graph (DAG) that models the function calls concisely. The approach combines compile-time and run-time analysis, both of negligible complexity.
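JuliusC's actual DAG construction is not reproduced here. The following Python sketch only illustrates the general idea of a concise call DAG under a stated assumption: that a call's subdivision depends solely on its (m, n, p) shape, so all calls with the same shape can be merged into one node whose edges record how many times each child shape is called. The functions `build_call_dag` and `total_calls` are hypothetical names, and the halving scheme is the same illustrative one used for matrix multiplication above.

```python
def halves(d):
    """Split dimension d into its ceiling and floor halves; a size-1 dimension is kept whole."""
    return [d] if d == 1 else [(d + 1) // 2, d // 2]


def build_call_dag(m, n, p):
    """Map each call shape (m, n, p) to {child_shape: multiplicity}.

    Assumption: a call's subdivision depends only on its shape, so all calls
    with the same shape are merged into a single DAG node.
    """
    dag = {}
    pending = [(m, n, p)]
    while pending:
        node = pending.pop()
        if node in dag:
            continue  # shape already expanded: reuse the existing node
        a, b, c = node
        children = {}
        if node != (1, 1, 1):
            for ai in halves(a):
                for bi in halves(b):
                    for ci in halves(c):
                        child = (ai, bi, ci)
                        children[child] = children.get(child, 0) + 1
        dag[node] = children
        pending.extend(children)  # queue each distinct child shape
    return dag


def total_calls(dag, root):
    """Calls made by the fully unfolded recursion, computed once per DAG node."""
    memo = {}

    def calls(node):
        if node not in memo:
            memo[node] = 1 + sum(k * calls(child) for child, k in dag[node].items())
        return memo[node]

    return calls(root)


dag = build_call_dag(1024, 1024, 1024)
print(len(dag), total_calls(dag, (1024, 1024, 1024)))  # 11 1227133513
```

In this toy model the DAG for a 1024 x 1024 x 1024 product has 11 nodes, one per recursion level, yet it describes all 1,227,133,513 calls. For sizes that are not powers of two, repeated ceiling/floor halving keeps each dimension at floor(n/2^i) or ceil(n/2^i) on level i, so the DAG still has only a few nodes per level.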

We illustrate the applicability of our approach by studying six test cases. We present the analysis results and show how our (optimizing) compiler can use them to increase the efficiency of the division process by a factor of 14 to 20 million for our codes.




Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

D’Alberto, P., Nicolau, A. (2005). JuliusC: A Practical Approach for the Analysis of Divide-and-Conquer Algorithms. In: Eigenmann, R., Li, Z., Midkiff, S.P. (eds) Languages and Compilers for High Performance Computing. LCPC 2004. Lecture Notes in Computer Science, vol 3602. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11532378_10


  • DOI: https://doi.org/10.1007/11532378_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28009-5

  • Online ISBN: 978-3-540-31813-2

  • eBook Packages: Computer Science, Computer Science (R0)
