Mapping Streaming Languages to General Purpose Processors through Vectorization

Manley, Raymond; Gregg, David

doi:10.1007/978-3-642-13374-9_7


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5898)

Conference series: International Workshop on Languages and Compilers for Parallel Computing (LCPC)


Abstract

Streaming languages were originally aimed at streaming architectures, but recent work has shown that the stream programming model is also useful for exploiting parallelism on general purpose processors (GPPs). Current research on mapping stream code onto GPPs focuses on load balancing and on generating threads based on hardware features. We instead address the problems of stream data locality and stream data parallelism on GPPs, and suggest that automatically generating vectorized code for these streaming operations is a potential solution. We use the Brook stream language as our syntax base and extend it to generate vector intrinsics targeting the x86 architecture. Our compiler uses both existing and new strategies to transform high-level streaming kernel code into vector instructions without requiring additional annotations. We compare our system's results with existing strategies for mapping stream code onto GPPs. Our evaluation shows speedups ranging from a few percent to over 2x, and we discuss how effective vector code is compared with its scalar equivalent in specific application domains.
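As a rough illustration of the kind of transformation described above, the sketch below takes a hypothetical Brook-style kernel (the `saxpy` kernel and the function names `saxpy_scalar` and `saxpy_sse` are assumptions for illustration, not taken from the paper) and maps it to x86 in two ways: a scalar loop that applies the kernel body once per stream element, and an SSE version that uses intrinsics from `<xmmintrin.h>` to process four `float` elements per instruction. It is not the authors' compiler output; it only shows why element-wise kernels, whose per-element invocations are independent, are natural candidates for vector code.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_set1_ps, _mm_loadu_ps, ... */

/* Hypothetical Brook-style kernel used for this illustration:
 *
 *   kernel void saxpy(float a, float x<>, float y<>, out float result<>) {
 *       result = a * x + y;
 *   }
 */

/* Scalar mapping: the kernel body runs once per stream element. */
void saxpy_scalar(float a, const float *x, const float *y,
                  float *result, size_t n) {
    for (size_t i = 0; i < n; ++i)
        result[i] = a * x[i] + y[i];
}

/* Vectorized mapping: four consecutive stream elements share one SSE register. */
void saxpy_sse(float a, const float *x, const float *y,
               float *result, size_t n) {
    __m128 va = _mm_set1_ps(a);           /* broadcast the scalar kernel argument */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);  /* unaligned loads keep the sketch simple */
        __m128 vy = _mm_loadu_ps(y + i);
        __m128 vr = _mm_add_ps(_mm_mul_ps(va, vx), vy);
        _mm_storeu_ps(result + i, vr);
    }
    for (; i < n; ++i)                    /* scalar epilogue for leftover elements */
        result[i] = a * x[i] + y[i];
}
```

The scalar epilogue handles stream lengths that are not a multiple of four; a compiler could instead pad streams or switch to aligned loads (`_mm_load_ps`) when it can prove 16-byte alignment.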



Author information

Authors and Affiliations

Trinity College Dublin, Dublin, Ireland
Raymond Manley & David Gregg


Editor information

Editors and Affiliations

Department of Electrical and Computer Engineering, University of Delaware, Newark, DE 19716, USA
Guang R. Gao & Xiaoming Li
Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
Lori L. Pollock & John Cavazos


About this paper

Cite this paper

Manley, R., Gregg, D. (2010). Mapping Streaming Languages to General Purpose Processors through Vectorization. In: Gao, G.R., Pollock, L.L., Cavazos, J., Li, X. (eds) Languages and Compilers for Parallel Computing. LCPC 2009. Lecture Notes in Computer Science, vol 5898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13374-9_7


Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13373-2
Online ISBN: 978-3-642-13374-9
eBook Packages: Computer Science, Computer Science (R0)
