Accelerating Code on Multi-cores with FastFlow

  • Marco Aldinucci
  • Marco Danelutto
  • Peter Kilpatrick
  • Massimiliano Meneghin
  • Massimo Torquati
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6853)


FastFlow is a programming framework specifically targeting cache-coherent shared-memory multi-cores. It is implemented as a stack of C++ template libraries built on top of lock-free (and memory-fence-free) synchronization mechanisms. Its philosophy is to combine programmability with performance. This paper presents a new FastFlow programming methodology that supports the parallelization of existing sequential code by offloading work onto a dynamically created software accelerator. The methodology has been validated on a set of simple micro-benchmarks and on some real applications.
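The abstract names two mechanisms: lock-free synchronization and offloading work from sequential code onto a software accelerator. The following is a minimal sketch of both ideas under stated assumptions; it does not reproduce FastFlow's actual API or queue implementation, and `SpscQueue`, `SquareSumAccelerator` and `offload_sum_of_squares` are hypothetical names. The queue is a bounded single-producer/single-consumer ring buffer: only the producer writes `tail_` and only the consumer writes `head_`, so neither locks nor atomic read-modify-write operations are needed. It uses C++11 acquire/release atomics, which on x86-64 typically compile to plain loads and stores with no fence instruction. The accelerator is a worker thread created on demand: the calling thread offloads tasks into the queue and keeps running, then sends an end-of-stream marker and joins the worker to collect the result.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Hypothetical bounded SPSC ring buffer (holds at most N - 1 items).
// Sketch of the general lock-free technique, not FastFlow's own queue.
template <typename T, std::size_t N>
class SpscQueue {
 public:
  // Producer thread only. Returns false if the queue is full.
  bool push(const T& v) {
    const std::size_t t = tail_.load(std::memory_order_relaxed);
    const std::size_t next = (t + 1) % N;
    if (next == head_.load(std::memory_order_acquire)) return false;
    buf_[t] = v;
    tail_.store(next, std::memory_order_release);  // publish the filled slot
    return true;
  }
  // Consumer thread only. Returns false if the queue is empty.
  bool pop(T& out) {
    const std::size_t h = head_.load(std::memory_order_relaxed);
    if (h == tail_.load(std::memory_order_acquire)) return false;
    out = buf_[h];
    head_.store((h + 1) % N, std::memory_order_release);  // hand the slot back
    return true;
  }

 private:
  T buf_[N];
  std::atomic<std::size_t> head_{0};  // written only by the consumer
  std::atomic<std::size_t> tail_{0};  // written only by the producer
};

// Hypothetical "software accelerator": one worker thread that squares each
// offloaded value and accumulates the sum. finish() must be called after start().
class SquareSumAccelerator {
 public:
  void start() { worker_ = std::thread([this] { loop(); }); }

  // Offload one non-negative task; yields while the queue is full.
  void offload(long x) {
    while (!in_.push(x)) std::this_thread::yield();
  }

  // Send end-of-stream, wait for the worker, and return the accumulated sum.
  long finish() {
    assert(worker_.joinable());
    offload(kEndOfStream);
    worker_.join();  // join() makes sum_ safely visible to the caller
    return sum_;
  }

 private:
  static constexpr long kEndOfStream = -1;  // assumed sentinel: tasks are >= 0

  void loop() {
    long x;
    for (;;) {
      if (!in_.pop(x)) { std::this_thread::yield(); continue; }
      if (x == kEndOfStream) return;
      sum_ += x * x;
    }
  }

  SpscQueue<long, 64> in_;
  long sum_ = 0;  // touched only by the worker until join()
  std::thread worker_;
};

// Sequential-looking caller: offloads 1..n, then collects the result.
long offload_sum_of_squares(long n) {
  SquareSumAccelerator acc;
  acc.start();
  for (long i = 1; i <= n; ++i) acc.offload(i);  // caller keeps running
  return acc.finish();
}
```

Calling `offload_sum_of_squares(100)` should return 338350, the sum of the squares of 1 to 100. Busy-waiting with `std::this_thread::yield()` keeps the sketch short; a production runtime would likely tune how idle threads wait.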


Keywords: offload, patterns, multi-core, lock-free synchronization, C++





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Marco Aldinucci (1)
  • Marco Danelutto (2)
  • Peter Kilpatrick (3)
  • Massimiliano Meneghin (4)
  • Massimo Torquati (2)

  1. Computer Science Department, University of Torino, Italy
  2. Computer Science Department, University of Pisa, Italy
  3. Computer Science Department, Queen’s University Belfast, UK
  4. IBM Dublin Research Lab, Ireland
