AceMesh: a structured data driven programming language for high performance computing

  • Regular Paper
CCF Transactions on High Performance Computing

Abstract

Asynchronous task-based programming models are gaining popularity as a way to address the programmability and performance challenges of contemporary large-scale high performance computing systems. In this paper we present AceMesh, a task-based, data-driven language extension targeting legacy MPI applications. Its language features include data-centric parallelizing templates and aggregated task dependences for parallel loops. These features not only relieve the programmer of tedious refactoring details but also enable structured execution of complex task graphs, exploitation of data locality through data tile templates, and a reduction in the system complexity incurred by complex array sections. We present the prototype implementation, including task shifting, data management, and communication-related analyses and transformations. The language extension is evaluated on two supercomputing platforms. We compare the performance of AceMesh with existing programming models. The results show that NPB/MG achieves speedups of up to 1.2X and 1.85X on TaihuLight and TH-2, respectively, and that the Tend_lin benchmark attains more than 2X speedup on average, with maximum speedups of 3.0X and 2.2X on the two platforms, respectively.

Acknowledgements

This work was supported by the National Key R&D Program of China (Grant No. 2017YFB02-02002) and the Innovation Research Group of NSFC (Grant No. 61521092).

Author information

Correspondence to Li Chen.

About this article

Cite this article

Chen, L., Tang, S., Fu, Y. et al. AceMesh: a structured data driven programming language for high performance computing. CCF Trans. HPC 2, 309–322 (2020). https://doi.org/10.1007/s42514-020-00047-4

  • DOI: https://doi.org/10.1007/s42514-020-00047-4