Abstract
Divide-and-conquer is one of the most important patterns of parallelism, applicable to a large variety of problems. Moreover, the most powerful parallel systems available nowadays are clusters of distributed-memory nodes, each containing an increasing number of cores that share a common memory. Exploiting these systems optimally often requires a hybrid model that mimics the underlying hardware by combining a distributed-memory and a shared-memory parallel programming model, which results in longer development times and higher maintenance costs. In this paper we present a very general skeleton library that makes it possible to parallelize any divide-and-conquer problem on hybrid distributed/shared-memory systems with little effort, while providing great flexibility and good performance. Our proposal combines a message-passing paradigm at the process level with a threaded model inside each process, hiding the related complexity from the user. The evaluation shows that this skeleton provides performance comparable to, and often better than, that of manually optimized codes, while requiring considerably less effort when parallelizing applications on multi-core clusters.
Notes
One might think that in 2016 every major compiler distribution should support OpenMP, but as a representative example, during the development of this work we found that the compilers of the standard development environment for the current version of Mac OS X do not support OpenMP.
Acknowledgements
This research was supported by the Ministry of Economy and Competitiveness of Spain and by FEDER funds (80%) of the EU (Refs. TIN2013-42148-P and TIN2016-75845-P), and by the Galician Government under the Consolidation Program of Competitive Reference Groups (Ref. GRC2013/055). We also thank the Centro de Supercomputación de Galicia (CESGA) for the use of its computers, as well as the 16 anonymous reviewers for their valuable feedback and suggestions.
Cite this article
González, C.H., Fraguela, B.B. A general and efficient divide-and-conquer algorithm framework for multi-core clusters. Cluster Comput 20, 2605–2626 (2017). https://doi.org/10.1007/s10586-017-0766-y