Toward GPU accelerated topology optimization on unstructured meshes

  • Research Paper
Structural and Multidisciplinary Optimization

Abstract

The present work investigates the feasibility of finite element methods and topology optimization for unstructured meshes on massively parallel computer architectures, specifically Graphics Processing Units (GPUs). Challenges in the parallel implementation, such as the race condition in parallel assembly, are discussed and solved with simple algorithms, in this case greedy graph coloring. The parallel implementation of every step in the topology optimization process is benchmarked against an equivalent sequential implementation. The ultimate goal of this work is to speed up the topology optimization process by means of parallel computing on off-the-shelf hardware. Examples are run with both a standard sequential version of the implementation and a massively parallel version to better illustrate the advantages and disadvantages of this approach.
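The greedy graph coloring mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python for readability rather than in CUDA, and it is not the authors' implementation: the function name `greedy_element_coloring`, the node-list mesh format, and the sample mesh are all assumptions made for illustration. Two elements conflict when they share a node, and each element receives the smallest color not already used by a conflicting element.

```python
from collections import defaultdict


def greedy_element_coloring(elements):
    """Give each element the smallest color unused by elements sharing a node.

    `elements` is a list of node-index tuples (hypothetical mesh format).
    """
    # Map every node to the elements that touch it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)  # -1 marks "not colored yet"
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbors (elements sharing any node).
        used = {
            colors[nb]
            for n in nodes
            for nb in node_to_elems[n]
            if colors[nb] >= 0
        }
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


# A strip of four quadrilaterals: nodes 0..4 on the bottom row, 5..9 on top.
strip = [(0, 1, 6, 5), (1, 2, 7, 6), (2, 3, 8, 7), (3, 4, 9, 8)]
colors = greedy_element_coloring(strip)
print(colors)  # -> [0, 1, 0, 1]
```

Elements with the same color share no nodes, so they could be processed concurrently without writing to the same global entries. The greedy strategy does not guarantee the minimum number of colors; it only guarantees a valid coloring.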


Notes

  1. A thread is an independent unit of processing that can handle and process a task.

  2. A race condition, or race hazard, is a flaw in which the result of a process is wrong because two events that must not take place at the same time race against each other to influence the result. In FEM this typically occurs when two local stiffness matrices \([\mathbf {k^e}]\) are added at the same time to the same positions in the global \([\mathbf {K}]\).

  3. The hardware used for all benchmarks consists of a dual-socket dual-core AMD Opteron 2216, 8 GB of RAM and an NVIDIA Tesla T10 GPU with 4 GB of RAM.
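The race condition described in note 2 can be avoided by assembling one color at a time. The following minimal sketch is a sequential Python illustration under stated assumptions, not the authors' GPU code: the function name `assemble_by_color`, the two-node bar elements with one DOF per node, the unit local stiffness, and the precomputed coloring are all hypothetical. Within one color no two elements share a node, so their additions into \([\mathbf {K}]\) touch disjoint entries and could be issued by independent threads.

```python
def assemble_by_color(elements, colors, k_local, n_dofs):
    """Add the local matrix of every element into a dense global matrix,
    processing one color batch at a time (one DOF per node assumed)."""
    K = [[0.0] * n_dofs for _ in range(n_dofs)]
    for color in sorted(set(colors)):
        batch = [e for e, c in enumerate(colors) if c == color]

        # Check that the batch is conflict-free: no node appears twice.
        touched = [n for e in batch for n in elements[e]]
        assert len(touched) == len(set(touched)), "coloring is not conflict-free"

        # This loop is the part that could run in parallel, because the
        # elements of one batch write to disjoint entries of K.
        for e in batch:
            for a, i in enumerate(elements[e]):
                for b, j in enumerate(elements[e]):
                    K[i][j] += k_local[a][b]
    return K


# Four bar elements in a row; neighbors share a node, so colors alternate.
bars = [(0, 1), (1, 2), (2, 3), (3, 4)]
bar_colors = [0, 1, 0, 1]           # assumed to come from a prior coloring
k_e = [[1.0, -1.0], [-1.0, 1.0]]    # unit-stiffness local matrix (assumption)
K = assemble_by_color(bars, bar_colors, k_e, n_dofs=5)
print(K[2])  # -> [0.0, -1.0, 2.0, -1.0, 0.0]
```

The price of this approach is one synchronization point per color, so fewer colors generally means fewer sequential passes over the mesh.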

Acknowledgments

We thank Dr. Cameron Talischi for his help in the preparation of this manuscript. We acknowledge support from the National Science Foundation (NSF) under grant 1321661 and from the Donald B. and Elizabeth M. Willett endowment at the University of Illinois at Urbana-Champaign (UIUC).

Author information

Corresponding author

Correspondence to Glaucio H. Paulino.

Appendix A: Nomenclature

A.1 Symbols

\({[\mathbf {B}]}\): Strain-displacement matrix

\(b_w\): Bandwidth of a matrix

c: Compliance

\({[\mathbf {D}]}\): Constitutive matrix

E: Young’s modulus

f: Volume fraction

\({\{\mathbf {f}\}}\): Global force vector

\(\hat {H}_j\): Convolution function

\({[\mathbf {J}]}\): Transformation Jacobian

\({[\mathbf {K}]}\): Global stiffness matrix

\({[\mathbf {k}]}\): Local stiffness matrix

\({[\mathbf {L}]}\): Lower triangular matrix

\({\mathbf {L} ( G^C )}\): Communication matrix for graph G

m: Density move limit

n: Number of elements, matrix size or other depending on the context

p: Penalization factor for SIMP

R: Filter radius

u: Displacement

V: Volume

\({[\mathbf {XY}]}\): Nodal coordinates of an element

\({[\boldsymbol {\Gamma }]}\): Inverse of the transformation Jacobian \([\mathbf {J}]\)

\(\eta \): Numerical damping parameter

\(\mathcal {L}\): Lagrangian function

\(\lambda \): Lagrange multiplier

\(\nu \): Poisson’s ratio

\(\rho \): Density

\({\chi ( G )}\): Chromatic number for graph G

A.2 Abbreviations

ALU: Arithmetic Logic Unit

CAMD: Continuous Approximation of Material Distribution

CPU: Central Processing Unit

CUDA: NVIDIA’s Compute Unified Device Architecture

DOF: Degree Of Freedom

DRAM: Dynamic Random-Access Memory

FEM: Finite Element Method

GPU: Graphics Processing Unit

MBB: Messerschmitt–Bölkow–Blohm

OC: Optimality Criteria

PCGS: Pre-conditioned Conjugate Gradient Solver

SIMP: Solid Isotropic Material with Penalization

TOP: Topology Optimization

About this article

Cite this article

Zegard, T., Paulino, G.H. Toward GPU accelerated topology optimization on unstructured meshes. Struct Multidisc Optim 48, 473–485 (2013). https://doi.org/10.1007/s00158-013-0920-y

  • DOI: https://doi.org/10.1007/s00158-013-0920-y
