Extensible Recognition of Algorithmic Patterns in DSP Programs for Automatic Parallelization

Abstract

We introduce an extensible, knowledge-based tool for idiom (pattern) recognition in DSP (digital signal processing) programs. The tool uses functionality provided by the Cetus compiler infrastructure to detect computation patterns that frequently occur in DSP code. We focus on recognizing patterns for for-loops and the statements in their bodies, as these are often the performance-critical constructs in DSP applications, for which replacement by highly optimized, target-specific parallel algorithms will be most profitable. For better structuring and efficiency of pattern recognition, we classify patterns by level of complexity, such that patterns at higher levels are defined in terms of lower-level patterns. The tool works statically on the intermediate representation. For better extensibility and abstraction, most of the structural part of the recognition rules is specified in XML, which separates the tool implementation from the pattern specifications. Information about detected patterns will later be used for optimized code generation by local algorithm replacement, e.g., for the low-power, high-throughput multicore DSP architecture ePUMA.
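
As a rough illustration of the layered idea only, the following Python sketch matches a level-1 pattern on a tiny expression tree and then defines a level-2 pattern in terms of it. It is not the tool's implementation, which builds on the Cetus intermediate representation and XML rule specifications. The tuple-based tree encoding and the pattern names `MUL` and `ACCUMULATE_MUL` are assumptions made for this example.

```python
# Toy expression trees (hypothetical encoding, not the Cetus IR):
#   ("var", name), ("index", array, index_var),
#   ("mul", lhs, rhs), ("add", lhs, rhs), ("assign", target, value)


def match_mul(node):
    """Level 1 (hypothetical): recognize a product of two operands."""
    if node[0] == "mul":
        return {"pattern": "MUL", "operands": [node[1], node[2]]}
    return None


def match_accumulate_mul(stmt):
    """Level 2 (hypothetical): recognize `s = s + <MUL>`.

    The higher-level rule does not inspect the product itself; it
    delegates that part to the level-1 matcher.
    """
    if stmt[0] != "assign":
        return None
    target, value = stmt[1], stmt[2]
    # The right-hand side must be `target + something`.
    if value[0] != "add" or value[1] != target:
        return None
    inner = match_mul(value[2])
    if inner is None:
        return None
    return {"pattern": "ACCUMULATE_MUL", "target": target, "child": inner}


# Loop body of: for (i = 0; i < n; i++) s = s + a[i] * b[i];
body = ("assign", ("var", "s"),
        ("add", ("var", "s"),
         ("mul", ("index", "a", "i"), ("index", "b", "i"))))

result = match_accumulate_mul(body)
```

Because the level-2 rule reuses the level-1 result, adding a new higher-level pattern in this sketch only requires describing how already recognized children are combined.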

Notes

  1. Certain transformations, such as loop distribution, are applied to enhance recognition when a for-loop body contains multiple statements that together do not match any single defined pattern. Loop distribution factors statements of the loop body that are not involved in a cyclic data dependency out into separate loops, so that each resulting loop can be matched separately. The details of the whole process can be found in [24].

  2. Although arrays are treated as variables in most programming languages, Cetus handles them differently by defining a specific type (ArrayAccess) for them.

  3. The ADDMULTIMUL pattern supports an arbitrary number of factors.

  4. The non-structural part of the recognition rules is handled by separate auxiliary matching functions that are called by reflection, as explained in Sect. 4.4.

  5. Generally, such merging of siblings is only possible if interference from read or write accesses by other siblings “in between” can be statically excluded; see [13] for details.
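
The loop distribution mentioned in note 1 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not code from the tool: the function names `fused` and `distributed` and the two statements S1 and S2 are chosen for this example. The two statements share no cyclic data dependency, so the loop can be split into two loops that each contain a single statement.

```python
def fused(a, b, n):
    """Original loop: two unrelated statements share one loop body."""
    c = [0.0] * n
    s = 0.0
    for i in range(n):
        c[i] = a[i] + b[i]      # S1: element-wise vector addition
        s = s + a[i] * b[i]     # S2: dot-product style reduction
    return c, s


def distributed(a, b, n):
    """After loop distribution: one statement per loop.

    S1 and S2 do not depend on each other, so splitting the loop
    preserves the result, and each loop body is now a single statement.
    """
    c = [0.0] * n
    for i in range(n):
        c[i] = a[i] + b[i]      # S1 alone
    s = 0.0
    for i in range(n):
        s = s + a[i] * b[i]     # S2 alone
    return c, s


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
before = fused(a, b, 3)
after = distributed(a, b, 3)
```

In this example both versions compute the same values; the distributed form simply exposes each statement in its own loop, which is the situation a single-statement loop pattern would need.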

References

  1. Arenaz, M., Touriño, J., Doallo, R.: Xark: an extensible framework for automatic recognition of computational kernels. ACM Trans. Program. Lang. Syst. 30, 32:1–32:56 (2008)

  2. Bacry, E.: Lastwave (software). http://www.cmap.polytechnique.fr/~bacry/LastWave/index.html (1997–2009)

  3. Blume, W., Eigenmann, R., Faigin, K., Grout, J., Hoeflinger, J., Padua, D., Petersen, P., Pottenger, W., Rauchwerger, L., Tu, P., Weatherford, S.: Polaris: improving the effectiveness of parallelizing compilers. In: Proceedings of the Seventh Workshop on Languages and Compilers for Parallel Computing, pp. 141–154. Springer (1994)

  4. Borsboom, E.: Vocoder (software). http://www.epiphyte.ca/proj/vocoder/ (1995–2011)

  5. Chapman, B., Mehrotra, P., Zima, H.: Programming in Vienna Fortran (1992)

  6. de Castro, E.: Secret Rabbit Code (aka libsamplerate) (software). http://www.mega-nerd.com/SRC/index.html (2005)

  7. Di Martino, B., Kessler, C.W.: Two program comprehension tools for automatic parallelization. IEEE Concurr. 8(1), 37–47 (2000)

  8. Franchetti, F., de Mesmay, F., McFarlin, D., Püschel, M.: Operator language: a program generation framework for fast kernels. In: IFIP Working Conference on Domain Specific Languages (DSL WC), volume 5658 of Lecture Notes in Computer Science, pp. 385–410. Springer (2009)

  9. Hansson, E., Sohl, J., Kessler, C., Liu, D.: Case study of efficient parallel memory access programming for the embedded heterogeneous multicore DSP architecture ePUMA. In: Proceedings of International Workshop on Multi-Core Computing Systems (MuCoCoS-2011), June 2011, IEEE CS Press, Seoul (2011)

  10. Hind, M.: Pointer analysis: haven’t we solved this problem yet? In: Proceedings of the 2001 ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering, PASTE ’01, pp. 54–61. ACM, New York (2001)

  11. Horwitz, S.: Precise flow-insensitive may-alias analysis is NP-hard. ACM Trans. Program. Lang. Syst. 19, 1–6 (1997)

  12. Johnson, T.A., Lee, S.I., Fei, L., Basumallik, A., Eigenmann, R., Midkiff, S.P.: Experiences in using Cetus for source-to-source transformations. In: Proceedings of 17th International Workshop on Languages and Compilers for Parallel Computing (LCPC) (2004)

  13. Kessler, C.W.: Pattern-driven automatic parallelization. Sci. Program. 5(3), 251–274 (1996)

  14. Kuck, D.J., Kuhn, R.H., Padua, D.A., Leasure, B., Wolfe, M.: Dependence graphs and compiler optimizations. In: Proceedings of the 8th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’81, pp. 207–218. ACM, New York (1981)

  15. Landi, W.: Undecidability of static analysis. ACM Lett. Program. Lang. Syst. 1, 323–337 (1992)

  16. Di Martino, B., Iannello, G.: Pap recognizer: a tool for automatic recognition of parallelizable patterns. In: Proceedings of 4th International Workshop on Program Comprehension. IEEE Computer Society, Los Alamitos (1996)

  17. Metzger, R., Wen, Z.: Automatic algorithm recognition and replacement: a new approach to program optimization. MIT Press, Cambridge (2000)

  18. Parr, T.J., Quong, R.W.: Antlr: a predicated-LL(k) parser generator. Softw. Pract. Exper. 25(7), 789–810 (1995)

  19. Peters, J.: Fiview: a digital filter design viewing and comparison tool (software). http://uazu.net/fiview/ (1997–2007)

  20. Phillips, D.: Image processing in C: analyzing and enhancing digital images. R&D Publications, Inc., Lawrence (1994)

  21. Pottenger, B., Eigenmann, R.: Idiom recognition in the Polaris parallelizing compiler. In: Proceedings of 9th International Conference on Supercomputing, pp. 444–448. ACM (1995)

  22. Püschel, M., Moura, J.M.F., Johnson, J.R., Padua, D., Veloso, M.M., Singer, B.W., Xiong, J., Franchetti, F., Gacic, A., Voronenko, Y., Chen, K., Johnson, R.W., Rizzolo, N.: Spiral: code generation for DSP transforms. Proc. IEEE 93(2) (2005)

  23. Sbragion, D.: DRC: digital room correction (software). http://drc-fir.sourceforge.net/ (2002–2006)

  24. Sarvestani, A.S.: Automated Recognition of Algorithmic Patterns in DSP Programs. Master’s thesis, Linköping University, Department of Computer and Information Science (2011) LIU-IDA/LITH-EX-11/052-SE. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-73934

  25. Torger, A.: BruteFIR (software). http://www.ludd.luth.se/~torger/brutefir.html (2001–2006, 2009)

  26. Torger, A.: AlmusVCU (software). http://www.ludd.luth.se/~torger/almusvcu.html (2006)

  27. Vinod, U.V., Baruah, P.K.: Mpiimgen - a code transformer that parallelizes image processing codes to run on a cluster of workstations. In: IEEE International Conference on Cluster Computing, pp. 5–12 (2004)

Acknowledgments

This project was supported by SSF and SeRC. We would also like to thank the anonymous reviewers for their constructive comments.

Author information

Corresponding author

Correspondence to Erik Hansson.

Additional information

This work was supported by SSF and SeRC.

About this article

Cite this article

Shafiee Sarvestani, A., Hansson, E. & Kessler, C. Extensible Recognition of Algorithmic Patterns in DSP Programs for Automatic Parallelization. Int J Parallel Prog 41, 806–824 (2013). https://doi.org/10.1007/s10766-012-0229-2

Keywords

  • Automatic parallelization
  • Algorithmic pattern recognition
  • Cetus
  • DSP
  • DSP code parallelization
  • Compiler frameworks