PALLAS: Mapping Applications onto Manycore

Anderson, Michael; Catanzaro, Bryan; Chong, Jike; Gonina, Ekaterina; Keutzer, Kurt; Lai, Chao-Yue; Murphy, Mark; Su, Bor-Yiing; Sundaram, Narayanan

doi:10.1007/978-1-4419-6460-1_4

Abstract

Parallel programming with current state-of-the-art software engineering techniques is hard. Delivering good application performance requires expertise in parallel programming, yet domain experts very often lack that expertise. To drive computer science research toward effective use of the available parallel hardware platforms, it is important to make parallel programming systematic and productive. We believe that the key to designing parallel programs systematically is software architecture, and the key to improving the productivity of developing parallel programs is software frameworks. The basis of both is design patterns and a pattern language.

We illustrate how we can use design patterns to architect a wide variety of real applications, including image recognition, speech recognition, optical flow computation, video background subtraction, compressed sensing MRI, computational finance, video games, and machine translation. By exploring the software architectures of our applications, we achieved 10x-140x speedups in each of the applications. We illustrate how we can develop parallel programs productively using application frameworks and programming frameworks. We achieve 50%-100% of the performance of hand-optimized code while using four times fewer lines of code.

Author information

Authors and Affiliations

University of California, Berkeley, CA, USA
Kurt Keutzer

Corresponding author

Correspondence to Kurt Keutzer.

Editor information

Editors and Affiliations

Fak. Elektrotechnik, Inst. Technik der, Universität Karlsruhe, Engesser Str. 5, Karlsruhe, 76128, Germany
Michael Hübner
Institut für Technik der, Informationsverarbeitung, Karlsruhe Institute of Technology (KIT), Vincenz-Prießnitz-Straße 1, Karlsruhe, 76131, Germany
Jürgen Becker

About this chapter

Cite this chapter

Anderson, M. et al. (2011). PALLAS: Mapping Applications onto Manycore. In: Hübner, M., Becker, J. (eds) Multiprocessor System-on-Chip. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6460-1_4

DOI: https://doi.org/10.1007/978-1-4419-6460-1_4
Published: 09 November 2010
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-6459-5
Online ISBN: 978-1-4419-6460-1
eBook Packages: Engineering, Engineering (R0)
