
Accelerating low-fidelity aerodynamic codes on multi- and many-core architectures


Vortex lattice and panel methods belong to a broad family of aerodynamic codes based on potential flow theory. They are used in preliminary aerodynamic studies in the early stages of aircraft design, where hundreds of thousands of candidate configurations are analyzed. In this paper, we describe their efficient implementation on modern multi- and many-core architectures. We show how to bridge the ‘ninja gap’, defined as the performance gap between unoptimized C/C++ code and the best-optimized CPU code. We port the Vortex Lattice Method to a Graphics Processing Unit (GPU) using the OpenACC standard. We also present an elegant solution for implementing data movement for C++ classes.
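The abstract does not reproduce any code, so the following is only a minimal sketch, under stated assumptions, of the two ingredients it names: a Vortex Lattice Method influence-coefficient loop offloaded with OpenACC, and data movement for a C++ class. All names (`VortexLattice`, `segment_w`, `set_element`, `assemble`, `coeff`) are hypothetical. Each element is simplified to a single bound vortex segment (a full VLM uses horseshoe vortices with trailing legs), and the wing is assumed planar in z = 0 so only the z-velocity is needed. The class pattern (copy the object with `enter data copyin(this)`, then create its member arrays) is a commonly used OpenACC idiom for C++ classes, not necessarily the authors' solution. Compiled without OpenACC, the pragmas are ignored and the loops run serially on the host.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// z-velocity induced at point p by a straight vortex segment a -> b of unit
// circulation (Biot-Savart law for a finite segment, cf. Katz & Plotkin):
//   q = (r1 x r2) / |r1 x r2|^2 * r0 . (r1/|r1| - r2/|r2|) / (4 pi)
// Only the z component is returned (assumed planar wing, normal = +z).
#pragma acc routine seq
inline double segment_w(const double* p, const double* a, const double* b) {
    const double r1[3] = {p[0] - a[0], p[1] - a[1], p[2] - a[2]};
    const double r2[3] = {p[0] - b[0], p[1] - b[1], p[2] - b[2]};
    const double r0[3] = {b[0] - a[0], b[1] - a[1], b[2] - a[2]};
    const double cx = r1[1] * r2[2] - r1[2] * r2[1];
    const double cy = r1[2] * r2[0] - r1[0] * r2[2];
    const double cz = r1[0] * r2[1] - r1[1] * r2[0];
    const double c2 = cx * cx + cy * cy + cz * cz;
    if (c2 < 1e-12) return 0.0;  // p (nearly) on the segment's line: cut off
    const double n1 = std::sqrt(r1[0] * r1[0] + r1[1] * r1[1] + r1[2] * r1[2]);
    const double n2 = std::sqrt(r2[0] * r2[0] + r2[1] * r2[1] + r2[2] * r2[2]);
    double k = 0.0;
    for (int d = 0; d < 3; ++d) k += r0[d] * (r1[d] / n1 - r2[d] / n2);
    const double pi = 3.14159265358979323846;
    return cz / c2 * k / (4.0 * pi);
}

// Hypothetical class owning the geometry and the dense influence matrix.
class VortexLattice {
public:
    explicit VortexLattice(std::size_t n)
        : n_(n), xc_(new double[3 * n]), xa_(new double[3 * n]),
          xb_(new double[3 * n]), aic_(new double[n * n]) {
        assert(n > 0);
        // Shallow-copy the object (n_ and pointer members) to the device,
        // then allocate each member array there; the runtime attaches the
        // device arrays to the device copy of the object.
        #pragma acc enter data copyin(this)
        #pragma acc enter data create(xc_[0:3*n], xa_[0:3*n], xb_[0:3*n], aic_[0:n*n])
    }
    ~VortexLattice() {
        // Release device memory in reverse order, then host memory.
        #pragma acc exit data delete(xc_[0:3*n_], xa_[0:3*n_], xb_[0:3*n_], aic_[0:n_*n_])
        #pragma acc exit data delete(this)
        delete[] xc_; delete[] xa_; delete[] xb_; delete[] aic_;
    }
    VortexLattice(const VortexLattice&) = delete;
    VortexLattice& operator=(const VortexLattice&) = delete;

    // Element i: collocation point c and bound-vortex segment a -> b (host side).
    void set_element(std::size_t i, const double c[3], const double a[3], const double b[3]) {
        for (int d = 0; d < 3; ++d) {
            xc_[3 * i + d] = c[d]; xa_[3 * i + d] = a[d]; xb_[3 * i + d] = b[d];
        }
    }

    // Push geometry to the device, fill aic(i, j) = w induced at collocation
    // point i by unit-strength segment j, then copy the matrix back to the host.
    void assemble() {
        const std::size_t n = n_;
        double *xc = xc_, *xa = xa_, *xb = xb_, *aic = aic_;
        #pragma acc update device(xc[0:3*n], xa[0:3*n], xb[0:3*n])
        #pragma acc parallel loop collapse(2) present(xc[0:3*n], xa[0:3*n], xb[0:3*n], aic[0:n*n])
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t j = 0; j < n; ++j)
                aic[i * n + j] = segment_w(&xc[3 * i], &xa[3 * j], &xb[3 * j]);
        #pragma acc update self(aic[0:n*n])
    }

    double coeff(std::size_t i, std::size_t j) const { return aic_[i * n_ + j]; }

private:
    std::size_t n_;
    double *xc_, *xa_, *xb_, *aic_;
};
```

As a check on the kernel, a unit segment from (0, -1, 0) to (0, 1, 0) induces w = -√2/(4π) ≈ -0.1125 at the point (1, 0, 0), which is what `coeff(0, 0)` returns for a one-element lattice built from that geometry. Keeping the arrays resident on the device for the object's lifetime means repeated `assemble()` calls move only the geometry and the resulting matrix, rather than re-copying the whole object.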






This work was supported by an NSERC Engage Grant with Cray Canada ULC as an industrial partner. M. Chrust and E. Laurendeau would like to thank Cray Canada ULC for providing access to the computing resources.

Author information


Correspondence to Marcin Chrust.


About this article


Cite this article

Chrust, M., Laurendeau, E. & Ostiguy, L. Accelerating low-fidelity aerodynamic codes on multi- and many-core architectures. J Supercomput 71, 3456–3481 (2015).
