Skip to main content

CellCilk: Extending Cilk for Heterogeneous Multicore Platforms

  • Conference paper
  • 927 Accesses

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7146))

Abstract

The potential of heterogeneous multicores, like the Cell BE, can only be exploited if the host and the accelerator cores are used in parallel and if the specific features of the cores are considered. Parallel programming, especially when applied to irregular task-parallel problems, is challenging itself. However, heterogeneous multicores add to that complexity due to their memory hierarchy and specialized accelerators. As a solution for these issues we present CellCilk, a prototype implementation of Cilk for heterogeneous multicores with a host/accelerator design, using the Cell BE in particular. CellCilk introduces a new keyword (spu_spawn) for task creation on the accelerator cores. Task scheduling and load balancing are done by a novel dynamic cross-hierarchy work-stealing regime. Furthermore, the CellCilk runtime employs a garbage collection mechanism for distributed data structures that are created during scheduling. On benchmarks we achieve a good speedup and reasonable runtimes, even when compared to manually parallelized codes.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bellens, P., Pérez, J.M., Cabarcas, F., Ramírez, A., Badia, R.M., Labarta, J.: CellSs: Scheduling techniques to better exploit memory hierarchy. Scientific Programming 17(1-2), 77–95 (2009)

    Google Scholar 

  2. Blumofe, R.D., Frigo, M., Joerg, C.F., Leiserson, C.E., Randall, K.H.: An Analysis of Dag-Consistent Distributed Shared-Memory Algorithms. In: SPAA 1996: Proc. Symp. Parallel Algorithms and Architectures, Padua, Italy, pp. 297–308 (June 1996)

    Google Scholar 

  3. Cao, Q., Hu, C., He, H., Huang, X., Li, S.: Support for OpenMP Tasks on Cell Architecture. In: Hsu, C.-H., Yang, L.T., Park, J.H., Yeo, S.-S. (eds.) ICA3PP 2010, Part II. LNCS, vol. 6082, pp. 308–317. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  4. Cooper, P., Dolinsky, U., Donaldson, A.F., Richards, A., Riley, C., Russell, G.: Offload – Automating Code Migration to Heterogeneous Multicore Systems. In: Patt, Y.N., Foglia, P., Duesterwald, E., Faraboschi, P., Martorell, X. (eds.) HiPEAC 2010. LNCS, vol. 5952, pp. 337–352. Springer, Heidelberg (2010)

    Chapter  Google Scholar 

  5. Dinan, J., Larkins, D.B., Sadayappan, P., Krishnamoorthy, S., Nieplocha, J.: Scalable work stealing. In: SC 2009: Proc. Conf. High Performance Computing Networking, Storage and Analysis, Portland, OR, pp. 53:1–53:11. ACM (November 2009)

    Google Scholar 

  6. Duff, T.: Duff’s device. Usenet posting (November 1983), http://www.lysator.liu.se/c/duffs-device.html

  7. Frigo, M., Leiserson, C.E., Randall, K.H.: The implementation of the Cilk-5 multithreaded language. In: PLDI 1998: Proc. Conf. Programming Language Design and Impl., Montreal, Canada, pp. 212–223 (June 1998)

    Google Scholar 

  8. Frigo, M., Strumpen, V.: Cache oblivious stencil computations. In: ICS 2005: Proc. Intl. Conf. Supercomputing, Cambridge, MA, pp. 361–366 (June 2005)

    Google Scholar 

  9. Frigo, M., Strumpen, V.: The cache complexity of multithreaded cache oblivious algorithms. In: SPAA 2006: Proc. Symp. Parallel Algorithms and Architectures, Cambridge, MA, pp. 271–280 (July 2006)

    Google Scholar 

  10. Hackenberg, D.: Fast Matrix Multiplication on Cell (SMP) Systems (2009), http://www.tu-dresden.de/zih/cell/matmul

  11. Jones, R., Lins, R.: Garbage Collection: Algorithms for Automatic Dynamic Memory Management. Wiley & Sons (1996)

    Google Scholar 

  12. Leiserson, C.E.: Programming Irregular Parallel Applications in Cilk. In: Lüling, R., Bilardi, G., Ferreira, A., Rolim, J.D.P. (eds.) IRREGULAR 1997. LNCS, vol. 1253, pp. 61–71. Springer, Heidelberg (1997)

    Chapter  Google Scholar 

  13. Mendes, R., Whately, L., de Castro, M.C.S., Bentes, C., de Amorim, C.L.: Runtime System Support for Running Applications with Dynamic and Asynchronous Task Parallelism in Software DSM Systems. In: SBAC-PAD 2006: Symp. Computer Architecture and High Performance Computing, Ouro Preto, Brasil, pp. 159–166 (October 2006)

    Google Scholar 

  14. O’Brien, K., O’Brien, K., Sura, Z., Chen, T., Zhang, T.: Supporting OpenMP on Cell. In: Chapman, B., Zheng, W., Gao, G.R., Sato, M., Ayguadé, E., Wang, D. (eds.) IWOMP 2007. LNCS, vol. 4935, pp. 65–76. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  15. Peng, L., Wong, W.F., Yuen, C.K.: The Performance Model of SilkRoad - A Multithreaded DSM System for Clusters. In: CCGRID 2003: Intl. Symp. Cluster Computing and the Grid, Tokyo, Japan, pp. 495–501 (May 2003)

    Google Scholar 

  16. Randall, K.H.: Cilk: Efficient Multithreaded Computing. Ph.D. thesis, Massachusetts Institute of Technology (June 1998)

    Google Scholar 

  17. Seo, S., Lee, J., Sura, Z.: Design and implementation of software-managed caches for multicores with local memory. In: HPCA 2009: Intl. Conf. High-Performance Computer Architecture, Raleigh, NC, pp. 55–66. IEEE (February 2009)

    Google Scholar 

  18. Werth, T., Floßmann, T., Klemm, M., Schell, D., Weigand, U., Philippsen, M.: Dynamic Code Footprint Optimization for the IBM Cell Broadband Engine. In: IWMSE 2009: Proc. ICSE Workshop on Multicore Software Engineering, Vancouver, Canada, pp. 64–72 (May 2009)

    Google Scholar 

  19. Zeiser, T., Wellein, G., Iglberger, K., Nitsure, A., Rüde, U., Hager, G.: Introducing a parallel cache oblivious blocking approach for the Lattice Boltzmann Method. Progress in Computational Fluid Dynamics 8(1-4), 179–188 (2008)

    Article  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Werth, T., Schreier, S., Philippsen, M. (2013). CellCilk: Extending Cilk for Heterogeneous Multicore Platforms. In: Rajopadhye, S., Mills Strout, M. (eds) Languages and Compilers for Parallel Computing. LCPC 2011. Lecture Notes in Computer Science, vol 7146. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36036-7_7

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-36036-7_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-36035-0

  • Online ISBN: 978-3-642-36036-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics