
Mapping Streaming Languages to General Purpose Processors through Vectorization

Conference paper in Languages and Compilers for Parallel Computing (LCPC 2009)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 5898)


Abstract

Streaming languages were originally aimed at streaming architectures, but recent work has shown that the stream programming model is also useful for exploiting parallelism on general-purpose processors (GPPs). Current research on mapping stream code onto GPPs focuses on load balancing and on generating threads according to hardware features. We instead address two problems that arise on GPPs: stream data locality and stream data parallelism. We propose automatically generating vectorized code for these streaming operations as a potential solution. We use the Brook stream language as our syntax base and extend it to generate vector intrinsics targeting the x86 architecture. The compiler combines existing and new strategies to transform high-level streaming kernel code into vector instructions without requiring additional annotations. We compare our system's results with existing strategies for mapping stream code onto GPPs. Speedups range from a few percent to over 2x, and we discuss how effective vector code is relative to its scalar equivalent in specific application domains.
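To make the idea of "transforming streaming kernel code into vector instructions" concrete, consider an element-wise Brook kernel in the style of the classic Brook example, `kernel void saxpy(float a, float x<>, float y<>, out float result<>) { result = a * x + y; }`. The C sketch below is an illustration only, not the paper's compiler output. It assumes SSE intrinsics from `<xmmintrin.h>`, and the function names `saxpy_scalar` and `saxpy_vector` are hypothetical. It contrasts a scalar per-element loop with a hand-vectorized form that processes four stream elements per instruction and falls back to scalar code when SSE is unavailable.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_*_ps */
#define HAVE_SSE 1
#endif

/* Scalar form: one kernel invocation per stream element, as a
   non-vectorizing backend might emit the kernel body. */
static void saxpy_scalar(float a, const float *x, const float *y,
                         float *result, size_t n) {
    for (size_t i = 0; i < n; i++)
        result[i] = a * x[i] + y[i];
}

/* Vector form (hypothetical): four stream elements per SSE instruction,
   plus a scalar remainder loop when n is not a multiple of 4. Without
   SSE support, the whole stream is handled by the scalar loop. */
static void saxpy_vector(float a, const float *x, const float *y,
                         float *result, size_t n) {
    assert(n == 0 || (x != NULL && y != NULL && result != NULL));
    size_t i = 0;
#ifdef HAVE_SSE
    const __m128 va = _mm_set1_ps(a); /* broadcast the uniform argument a */
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i); /* unaligned loads: no alignment assumed */
        __m128 vy = _mm_loadu_ps(y + i);
        __m128 vr = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_storeu_ps(result + i, vr);
    }
#endif
    for (; i < n; i++) /* remainder elements (or all of them without SSE) */
        result[i] = a * x[i] + y[i];
}
```

With `n = 7`, the vector loop handles elements 0 to 3 and the remainder loop handles elements 4 to 6. For these inputs, both functions should produce the same results. The sketch vectorizes across stream elements, which is the natural fit for element-wise kernels. Kernels that use gathers, reductions, or `float4`-typed elements would need further transformations, which this example does not attempt.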




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Manley, R., Gregg, D. (2010). Mapping Streaming Languages to General Purpose Processors through Vectorization. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_7


  • DOI: https://doi.org/10.1007/978-3-642-13374-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13373-2

  • Online ISBN: 978-3-642-13374-9

  • eBook Packages: Computer Science, Computer Science (R0)
