Toward GPU accelerated topology optimization on unstructured meshes

Zegard, Tomás; Paulino, Glaucio H.

doi:10.1007/s00158-013-0920-y


Research Paper
Published: 12 April 2013

Volume 48, pages 473–485 (2013)

Structural and Multidisciplinary Optimization

Tomás Zegard¹ & Glaucio H. Paulino¹


Abstract

The present work investigates the feasibility of finite element methods and topology optimization on unstructured meshes using massively parallel computer architectures, specifically Graphics Processing Units (GPUs). Challenges in the parallel implementation, such as the race condition in parallel assembly, are discussed and resolved with simple algorithms, in this case greedy graph coloring. Every step of the topology optimization process is implemented in parallel, benchmarked, and compared against an equivalent sequential implementation. The ultimate goal of this work is to speed up topology optimization through parallel computing on off-the-shelf hardware. Examples are solved with both a standard sequential implementation and a massively parallel one to illustrate the advantages and disadvantages of this approach.
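The coloring idea mentioned in the abstract can be sketched in a few lines. The Python below is a minimal illustration, not the authors' GPU code: it assumes each element is given as a tuple of node indices and treats two elements as adjacent when they share a node, and therefore share degrees of freedom. Elements of the same color never write to the same rows and columns of the global stiffness matrix, so each color class could be assembled by parallel threads without conflicts, with the color classes processed one after another. The function names are hypothetical.

```python
from collections import defaultdict


def greedy_element_coloring(elements):
    """First-fit greedy coloring of the element adjacency graph.

    Two elements are adjacent if they share at least one node. Each element
    receives the smallest color not already used by a colored neighbor.
    """
    # Map each node to the list of elements that contain it.
    node_to_elems = defaultdict(list)
    for e, nodes in enumerate(elements):
        for n in nodes:
            node_to_elems[n].append(e)

    colors = [-1] * len(elements)  # -1 marks "not yet colored"
    for e, nodes in enumerate(elements):
        # Colors already taken by neighbors that have been colored.
        used = {colors[nb] for n in nodes for nb in node_to_elems[n]
                if nb != e and colors[nb] >= 0}
        c = 0
        while c in used:
            c += 1
        colors[e] = c
    return colors


def color_groups_are_race_free(elements, colors):
    """Return True if no two elements of the same color share a node."""
    touched = defaultdict(set)  # color -> nodes written by that color
    for nodes, c in zip(elements, colors):
        if touched[c] & set(nodes):
            return False
        touched[c] |= set(nodes)
    return True


# Hypothetical strip of four quadrilaterals: bottom nodes 0..4, top nodes 5..9.
strip = [(i, i + 1, i + 6, i + 5) for i in range(4)]
colors = greedy_element_coloring(strip)
print(colors)  # [0, 1, 0, 1]
print(color_groups_are_race_free(strip, colors))  # True
```

Greedy first-fit coloring does not in general reach the chromatic number \(\chi(G)\), which is NP-hard to compute, but it needs only a single pass, and the number of colors it produces only determines how many sequential batches the assembly requires.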


Notes

A thread is an independent unit of processing that can handle and process a task.
A race condition (or race hazard) is a flaw in which the result of a process is wrong because two events that must not occur at the same time race against each other to influence it. In FEM this typically occurs when two local stiffness matrices \([\mathbf{k}^e]\) are added simultaneously to the same positions of the global stiffness matrix \([\mathbf{K}]\).
The hardware used for all benchmarks consists of a dual-socket dual-core AMD Opteron 2216, 8 GB of RAM, and an NVIDIA Tesla T10 GPU with 4 GB of RAM.
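The race condition described in the notes above can be reproduced with a toy Python model of two threads that each perform a non-atomic read-modify-write on the same entry of \([\mathbf{K}]\). This is an illustration under stated assumptions, not GPU code: the interleaving in which every thread reads before any thread writes is forced deterministically here, whereas on real hardware it depends on thread scheduling. The function names are hypothetical.

```python
def sequential_assembly(K, i, j, contributions):
    """Reference behavior: contributions are added to K[i][j] one at a time."""
    for k in contributions:
        K[i][j] += k
    return K[i][j]


def interleaved_assembly(K, i, j, contributions):
    """Simulate threads doing a non-atomic read-modify-write of K[i][j].

    Every thread loads the old value before any thread stores its result,
    which is one possible (worst-case) interleaving on parallel hardware.
    """
    reads = [K[i][j] for _ in contributions]   # all threads read the old value
    for old, k in zip(reads, contributions):   # each stores old + its own term
        K[i][j] = old + k
    return K[i][j]


# Two element matrices both contribute to global entry (0, 0).
k_e1, k_e2 = 2.0, 3.0
print(sequential_assembly([[0.0]], 0, 0, [k_e1, k_e2]))   # 5.0
print(interleaved_assembly([[0.0]], 0, 0, [k_e1, k_e2]))  # 3.0, k_e1's term is lost
```

Either atomic additions or an element coloring that keeps concurrently assembled elements from sharing degrees of freedom prevents this lost update.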


Acknowledgments

We also thank Dr. Cameron Talischi for his help in the preparation of this manuscript. We acknowledge support from the National Science Foundation (NSF) under grant 1321661, and from the Donald B. and Elizabeth M. Willett endowment at the University of Illinois at Urbana-Champaign (UIUC).

Author information

Authors and Affiliations

Department of Civil and Environmental Engineering, Newmark Laboratory, University of Illinois at Urbana-Champaign, 205 N. Mathews Avenue, Urbana, IL, 61801, USA
Tomás Zegard & Glaucio H. Paulino


Corresponding author

Correspondence to Glaucio H. Paulino.

Appendix A: Nomenclature

A.1 Symbols

\([\mathbf{B}]\): Strain-displacement matrix
\(b_w\): Bandwidth of a matrix
\(c\): Compliance
\([\mathbf{D}]\): Constitutive matrix
\(E\): Young’s modulus
\(f\): Volume fraction
\(\{\mathbf{f}\}\): Global force vector
\(\hat{H}_j\): Convolution function
\([\mathbf{J}]\): Transformation Jacobian
\([\mathbf{K}]\): Global stiffness matrix
\([\mathbf{k}]\): Local stiffness matrix
\([\mathbf{L}]\): Lower triangular matrix
\(\mathbf{L}(G^C)\): Communication matrix for graph \(G\)
\(m\): Density move limit
\(n\): Number of elements, matrix size, or another quantity depending on the context
\(p\): Penalization factor for SIMP
\(R\): Filter radius
\(u\): Displacement
\(V\): Volume
\([\mathbf{XY}]\): Nodal coordinates of an element
\([\boldsymbol{\Gamma}]\): Inverse of the transformation Jacobian \([\mathbf{J}]\)
\(\eta\): Numerical damping parameter
\(\mathcal{L}\): Lagrangian function
\(\lambda\): Lagrange multiplier
\(\nu\): Poisson’s ratio
\(\rho\): Density
\(\chi(G)\): Chromatic number for graph \(G\)
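The filter radius \(R\) and convolution function \(\hat{H}_j\) listed above typically enter a density filter. As a hedged sketch, the function below assumes the common linear ("cone") weight \(\hat{H}_j = \max(0, R - \mathrm{dist}(i,j))\) and a volume-weighted average \(\tilde{\rho}_i = \sum_j \hat{H}_j V_j \rho_j / \sum_j \hat{H}_j V_j\); the exact filter used in the paper may differ. Element centroids are assumed to be known, the function name is hypothetical, and the brute-force double loop is for clarity only.

```python
import math


def density_filter(centroids, volumes, rho, R):
    """Linear (cone) density filter, assumed form for illustration.

    For element i, each element j within distance R receives the weight
    H_j = R - dist(i, j), scaled by its volume V_j; the filtered density is
    the weighted average of rho_j. R must be positive so that H_i = R > 0.
    """
    n = len(rho)
    filtered = []
    for i in range(n):
        num = den = 0.0
        for j in range(n):
            d = math.dist(centroids[i], centroids[j])
            H = max(0.0, R - d)  # zero outside the filter radius
            num += H * volumes[j] * rho[j]
            den += H * volumes[j]
        filtered.append(num / den)
    return filtered


# Three unit elements on a line with a void element in the middle.
print(density_filter([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],
                     [1.0, 1.0, 1.0], [1.0, 0.0, 1.0], R=1.5))
# [0.75, 0.4, 0.75]
```

A practical implementation would precompute each element's neighbor list within \(R\) once, since the mesh does not change between optimization iterations. Each filtered density is then independent of the others and could be computed by one thread per element.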

A.2 Abbreviations

ALU: Arithmetic Logic Unit
CAMD: Continuous Approximation of Material Distribution
CPU: Central Processing Unit
CUDA: NVIDIA’s Compute Unified Device Architecture
DOF: Degree Of Freedom
DRAM: Dynamic Random-Access Memory
FEM: Finite Element Method
GPU: Graphics Processing Unit
MBB: Messerschmitt–Bölkow–Blohm
OC: Optimality Criteria
PCGS: Preconditioned Conjugate Gradient Solver
SIMP: Solid Isotropic Material with Penalization
TOP: Topology Optimization
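SIMP and OC, listed above, form the core of the standard minimum-compliance loop. The sketch below assumes the textbook form popularized by Sigmund's 99-line code rather than the paper's own implementation: SIMP penalizes stiffness as \(E(\rho) = \rho^p E_0\), and the OC update multiplies each density by \(B_e^{\eta}\) with \(B_e = -(\partial c/\partial \rho_e)/(\lambda\, \partial V/\partial \rho_e)\), clipped by the move limit \(m\) and the bounds \([\rho_{min}, 1]\), while \(\lambda\) is found by bisection to meet the volume fraction \(f\). The defaults \(p = 3\), \(m = 0.2\), \(\eta = 0.5\), and \(\rho_{min} = 10^{-3}\) are common choices, not values taken from this paper, and the function names are hypothetical. For SIMP the sensitivity is \(\partial c/\partial \rho_e = -p\,\rho_e^{p-1}\{\mathbf{u}_e\}^T[\mathbf{k}_0]\{\mathbf{u}_e\}\), where \([\mathbf{k}_0]\) (an assumed symbol) is the local stiffness matrix for unit density.

```python
def simp_young_modulus(rho, E0, p=3.0):
    """SIMP interpolation: penalized stiffness E(rho) = rho**p * E0."""
    return rho ** p * E0


def oc_update(rho, dc, volumes, f, m=0.2, eta=0.5, rho_min=1e-3):
    """One optimality-criteria (OC) density update (textbook form, assumed).

    rho     -- current element densities
    dc      -- compliance sensitivities dc/drho_e (expected <= 0)
    volumes -- element volumes, i.e. dV/drho_e
    f       -- target volume fraction
    m, eta  -- move limit and numerical damping exponent
    lambda is found by bisection so that sum(rho_e * V_e) = f * sum(V_e).
    """
    target = f * sum(volumes)
    lo, hi = 1e-9, 1e9
    new = list(rho)
    while (hi - lo) / (hi + lo) > 1e-6:
        lam = 0.5 * (lo + hi)
        new = []
        for r, d, v in zip(rho, dc, volumes):
            # Guard: clamp positive sensitivities to 0 so B**eta stays real.
            B = max(-d, 0.0) / (lam * v)
            cand = r * B ** eta
            # Apply the move limit m and the box bounds [rho_min, 1].
            lower = max(rho_min, r - m)
            upper = min(1.0, r + m)
            new.append(min(upper, max(lower, cand)))
        if sum(x * v for x, v in zip(new, volumes)) > target:
            lo = lam  # too much material: increase lambda
        else:
            hi = lam  # too little material: decrease lambda
    return new


print(simp_young_modulus(0.5, 1.0))  # 0.125
new = oc_update([0.5] * 4, [-4.0, -1.0, -1.0, -4.0], [1.0] * 4, f=0.5)
print([round(x, 3) for x in new])  # [0.667, 0.333, 0.333, 0.667]
```

For a fixed \(\lambda\), each element's candidate density depends only on its own data; only the volume sum inside the bisection couples the elements. The update is therefore a parallel map followed by a reduction.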


About this article

Cite this article

Zegard, T., Paulino, G.H. Toward GPU accelerated topology optimization on unstructured meshes. Struct Multidisc Optim 48, 473–485 (2013). https://doi.org/10.1007/s00158-013-0920-y


Received: 18 November 2011
Revised: 25 February 2013
Accepted: 02 March 2013
Published: 12 April 2013
Issue Date: September 2013
DOI: https://doi.org/10.1007/s00158-013-0920-y
