PALLAS: Mapping Applications onto Manycore

Multiprocessor System-on-Chip

Abstract

Parallel programming with current state-of-the-art software engineering techniques is hard. Delivering good application performance requires expertise in parallel programming, yet domain experts very commonly lack it. To steer computer science research toward using the available parallel hardware platforms effectively, it is important to make parallel programming systematic and productive. We believe that the key to designing parallel programs systematically is software architecture, and the key to improving the productivity of developing parallel programs is software frameworks. The basis of both is design patterns and a pattern language.

We illustrate how design patterns can be used to architect a wide variety of real applications, including image recognition, speech recognition, optical flow computation, video background subtraction, compressed sensing MRI, computational finance, video games, and machine translation. By exploring the software architectures of these applications, we achieved speedups of 10x-140x in each of them. We also illustrate how parallel programs can be developed productively using application frameworks and programming frameworks. We achieve 50%-100% of the performance of hand-optimized code while using four times fewer lines of code.

Corresponding author

Correspondence to Kurt Keutzer.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Anderson, M. et al. (2011). PALLAS: Mapping Applications onto Manycore. In: Hübner, M., Becker, J. (eds) Multiprocessor System-on-Chip. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-6460-1_4

  • DOI: https://doi.org/10.1007/978-1-4419-6460-1_4

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-1-4419-6459-5

  • Online ISBN: 978-1-4419-6460-1

  • eBook Packages: Engineering, Engineering (R0)
