Structural and Multidisciplinary Optimization, Volume 48, Issue 3, pp 473–485

Toward GPU accelerated topology optimization on unstructured meshes

  • Tomás Zegard
  • Glaucio H. Paulino
Research Paper


This work investigates the feasibility of finite element methods and topology optimization for unstructured meshes on massively parallel computer architectures, specifically Graphics Processing Units (GPUs). Challenges in the parallel implementation, such as the race condition in parallel assembly, are discussed and solved with simple algorithms, in this case greedy graph coloring. The parallel implementation of every step in the topology optimization process is benchmarked against an equivalent sequential implementation. The ultimate goal of this work is to speed up the topology optimization process by means of parallel computing on off-the-shelf hardware. Each example is run with both a standard sequential version of the implementation and a massively parallel version to better illustrate the advantages and disadvantages of this approach.
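The role of greedy graph coloring in avoiding the assembly race condition can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the element ordering, and the small triangle mesh are assumptions made for illustration. Two elements that share a node would write to the same global stiffness entries, so they receive different colors; all elements of one color can then be assembled concurrently without conflicting writes.

```python
from collections import defaultdict


def greedy_element_coloring(elements):
    """Give each element the smallest color not used by an element sharing a node.

    `elements` is a list of node-index tuples (hypothetical connectivity format).
    """
    # Map every node to the elements that touch it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)  # -1 marks "not yet colored"
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbors (elements sharing at least one node).
        used = {
            colors[nb]
            for n in nodes
            for nb in node_to_elems[n]
            if colors[nb] >= 0
        }
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def groups_by_color(colors):
    """Collect element indices per color; each group is safe to assemble in parallel."""
    groups = defaultdict(list)
    for e, c in enumerate(colors):
        groups[c].append(e)
    return [groups[c] for c in sorted(groups)]


# Hypothetical unstructured mesh: three connected triangles and one isolated triangle.
elements = [(0, 1, 2), (1, 3, 2), (2, 3, 4), (5, 6, 7)]
colors = greedy_element_coloring(elements)
groups = groups_by_color(colors)
```

In a GPU setting, one assembly kernel launch per color group would be one way to use this result, with the groups processed one after another. Greedy coloring does not guarantee the minimum number of colors, but it is simple and only needs one pass over the elements.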


Topology optimization, Graphics processing units, Finite element method, FEM, GPU, CUDA



We also thank Dr. Cameron Talischi for his help in the preparation of this manuscript. We acknowledge support from the National Science Foundation (NSF) under grant 1321661, and from the Donald B. and Elizabeth M. Willett endowment at the University of Illinois at Urbana-Champaign (UIUC).



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Civil and Environmental Engineering, Newmark Laboratory, University of Illinois at Urbana-Champaign, Urbana, USA
