Usage of the TRACO Compiler for Neural Network Parallelization

  • Marek Palkowski
  • Wlodzimierz Bielecki
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8467)


Artificial neural networks (ANNs) are often used to solve a wide variety of problems with high performance computing. This paper presents automatic loop parallelization of selected ANN programs by means of the TRACO compiler, which extracts loop dependences and produces synchronization-free slices comprising loop statement instances. Coarse-grained parallelism of nested program loops is obtained by assigning each slice to a thread of computations that runs independently on its own processor. Program loops of recurrent and back-propagation networks are analysed, and the speed-up and efficiency of the parallel programs produced by TRACO are studied. Related compilers and ANN parallelization techniques are considered, and future work is outlined.


Keywords: artificial neural networks · automatic loop parallelization · iteration space slicing · multi-core processing





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Marek Palkowski (1)
  • Wlodzimierz Bielecki (1)
  1. Faculty of Computer Science and Information Systems, West Pomeranian University of Technology in Szczecin, Szczecin, Poland
