Productivity-Aware Design and Implementation of Distributed Tree-Based Search Algorithms

  • Tiago Carneiro
  • Nouredine Melab
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11536)


Parallel tree search algorithms offer viable solutions to problems in different areas, such as operations research, machine learning and artificial intelligence. This class of algorithms is highly compute-intensive and irregular, and usually relies on context-specific data structures and hand-tuned code optimizations. C and C++ are therefore the languages often employed, owing to their low-level features and performance. In this work, we investigate the use of the Chapel high-productivity language for the design and implementation of distributed tree search algorithms for solving combinatorial problems. The experimental results show that Chapel is a suitable language for this purpose, in terms of both performance and productivity. Despite its use of high-level features, the distributed tree search in Chapel is on average \(16\%\) slower than its MPI+OpenMP counterpart and reaches up to \(85\%\) of that counterpart's scalability.


Keywords: Tree search algorithms, High productivity, PGAS, Chapel, MPI+OpenMP



The experiments presented in this paper were carried out on the Grid’5000 testbed [4], hosted by INRIA and including several other organizations. We thank Bradford Chamberlain and Elliot Ronaghan (Cray Inc.) and Paul Hargrove (Berkeley Lab) for helping us to run GASNet on Grid’5000. We also thank Paul Hargrove for the modifications to the GASNet InfiniBand implementation that were necessary to run GASNet on the Grid’5000 MXM InfiniBand networks.


  1. Almasi, G.: PGAS (partitioned global address space) languages. In: Padua, D. (ed.) Encyclopedia of Parallel Computing, pp. 1539–1545. Springer, Boston (2011)
  2. Asanovic, K., et al.: The landscape of parallel computing research: a view from Berkeley. Technical Report UCB/EECS-2006-183, EECS Department, University of California (2006)
  3. Bell, J., Stevens, B.: A survey of known results and research areas for n-queens. Discrete Math. 309(1), 1–31 (2009)
  4. Bolze, R., et al.: Grid’5000: a large scale and highly reconfigurable experimental grid testbed. Int. J. High Perform. Comput. Appl. 20(4), 481–494 (2006)
  5. Carneiro, T., de Carvalho Júnior, F.H., Arruda, N.G.P.B., Pinheiro, A.B.: Um levantamento na literatura sobre a resolução de problemas de otimização combinatória através do uso de aceleradores gráficos. In: Proceedings of the XXXV Ibero-Latin American Congress on Computational Methods in Engineering (CILAMCE), Fortaleza-CE, Brasil (2014)
  6. Carneiro Pessoa, T., Gmys, J., de Carvalho Junior, F.H., Melab, N., Tuyttens, D.: GPU-accelerated backtracking using CUDA dynamic parallelism. Concurr. Comput. Pract. Exp. 30, e4374-n/a (2017)
  7. Chamberlain, B.L., Choi, S.E., Deitz, S.J., Navarro, A.: User-defined parallel zippered iterators in Chapel. In: Proceedings of Fifth Conference on Partitioned Global Address Space Programming Models, pp. 1–11 (2011)
  8. Chamberlain, B.L., et al.: Chapel comes of age: making scalable programming productive. Cray User Group (2018)
  9. Crainic, T., Le Cun, B., Roucairol, C.: Parallel branch-and-bound algorithms. Parallel combinatorial optimization, pp. 1–28 (2006)
  10. Cray Inc.: Chapel language specification, vol. 986. Cray Inc. (2018)
  11. Da Costa, G., et al.: Exascale machines require new programming paradigms and runtimes. Supercomput. Front. Innov. 2(2), 6–27 (2015)
  12. Feinbube, F., Rabe, B., von Löwis, M., Polze, A.: NQueens on CUDA: optimization issues. In: 2010 Ninth International Symposium on Parallel and Distributed Computing (ISPDC), pp. 63–70. IEEE (2010)
  13. Fiore, S., Bakhouya, M., Smari, W.W.: On the road to exascale: advances in high performance computing and simulations–an overview and editorial. Future Gener. Comput. Syst. 82, 450–458 (2018)
  14. Gmys, J., Mezmaz, M., Melab, N., Tuyttens, D.: IVM-based parallel branch-and-bound using hierarchical work stealing on multi-GPU systems. Concurr. Comput. Pract. Exp. 29(9), e4019 (2017)
  15. Grama, A.Y., Kumar, V.: A survey of parallel search algorithms for discrete optimization problems. ORSA J. Comput. 7 (1993)
  16. Mezmaz, M., Melab, N., Talbi, E.G.: A grid-enabled branch and bound algorithm for solving challenging combinatorial optimization problems. In: IEEE International Parallel and Distributed Processing Symposium, IPDPS 2007, pp. 1–9. IEEE (2007)
  17. San Segundo, P., Rossi, C., Rodriguez-Losada, D.: Recent developments in bit-parallel algorithms. INTECH Open Access Publisher (2008)
  18. Tschoke, S., Lubling, R., Monien, B.: Solving the traveling salesman problem with a distributed branch-and-bound algorithm on a 1024 processor network. In: 9th International Parallel Processing Symposium. Proceedings, pp. 182–189. IEEE (1995)
  19. Zhang, W.: Branch-and-bound search algorithms and their computational complexity. Technical report, DTIC Document (1996)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. INRIA Lille - Nord Europe, Lille, France
  2. Université de Lille, CNRS/CRIStAL, Lille, France
