1 Introduction

The finite element method (FEM) was first introduced by A. Hrennikoff [1], who discretized the domain as a lattice structure, and was then formulated with a variational method by R. Courant [2]. Since then, it has advanced considerably thanks to discoveries in mathematics, particularly in variational methods [4, 6] and numerical analysis [5], and to the development of computers [3]. As a result, the simulation of increasingly complex problems, long out of reach, is progressively becoming possible. However, despite these major strides in FEM [4], new challenges have emerged, such as sustainability [7] and integrative design [8]. These challenges call for multi-physics simulations [9, 10, 18], which are essential for the optimal and rational use of natural resources. Sustainability and integrative design require new models that bring together all the physical phenomena and driving forces behind the behavior of materials or systems, leading to more realistic results. This inevitably comes with a wide range of bottlenecks, such as model complexity [14, 16]. When models cannot be simplified without losing consistency and accuracy, their implementation can still be improved by making them easier to interpret and more efficient to execute on computing resources.

Early in the development of simulation tools in university research centers, MATLAB [11] became one of the most widely used programming languages. It is used not only for teaching in STEM fields but also in research and industry, because it is a high-level, vector-based language that operates on whole vectors or matrices rather than on individual elements. Concretely, this feature makes it possible to eliminate for-loops by rewriting the expressions involved so that operations are executed in fewer instructions. Using this vector feature in FEM implementations [17, 19] is therefore a tremendous asset in two ways. It helps in developing more advanced packages for engineers who run simulations, and it helps researchers and lecturers introduce students to the programming aspects of FEM in a practical way.

In this regard, one of the first attempts to vectorize FEM codes in MATLAB is due to Koko [12]. Koko presented a strategy to vectorize the computation of the global matrices and force vectors arising from the discretized weak form of linear elasticity. However, this strategy is restricted to 2D finite elements. Dabrowski et al. [13] introduced MILAMIN, an optimized MATLAB implementation of FEM for large problems. Although vectorization is not particularly explored in that work, the authors developed a new scheme to ease two tasks: the construction of the triplets (row index, column index, matrix or vector component) and the assembly of the global matrix and force vector. Contrary to common belief, the authors showed that, in linear elasticity, their scheme outperforms many C++- or Fortran-based packages such as OOFEM and FEAPpv.

A few years later, Cecka et al. [20] proposed an efficient strategy for implementation on GPUs with optimal use of memory. The authors of [22] proposed a vectorized scheme for linear elasticity, limited to problems meshed with \(P_{1}\) triangular elements (in 2D) and \(P_{1}\) tetrahedral elements (in 3D). Although this strategy is free from the for-loop over elements, it is computationally expensive because many smaller for-loops are introduced in its functions. Building on this work, [23] extended it to elasticity problems meshed with linear quadrilateral elements. Later on, [15] introduced the first MATLAB-based vectorized algorithm for triangular and tetrahedral elements of any order in linear elasticity. Besides outperforming the standard assembly method, this approach requires less than a quarter of the memory (RAM) needed by the standard method.

Marcinkowski et al. [21] addressed the problem in two stages. First, they used multidimensional arrays to compute and store all element matrices (stiffness and mass) at once. Then, they solved the global system with an iterative method, without assembling it. However, for meshes with a large number of elements, the iterative nature of this approach becomes a major drawback and limiting factor.

In [24], the author paid particular attention to good MATLAB programming style. The aim was to vectorize the matrix and vector computations for the Poisson equation on a domain discretized with \(P_{1}\) triangular elements, with one degree of freedom per node. Combining sparse storage [25,26,27] with vectorization is a major asset. Even so, a good balance must be struck between the speed gained and the memory required, since optimal performance demands careful memory allocation, as discussed extensively in [28, 29].

Recent MATLAB releases, starting with R2020b, provide built-in functions such as pagemtimes and pagetranspose, later complemented by pagemldivide. These functions multiply, transpose or solve linear systems page by page, i.e., on every 2D slice of a multidimensional array in a single call.
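To make the page-wise semantics concrete, the following minimal sketch mimics pagemtimes and pagemldivide in Python/NumPy. The helpers are hypothetical stand-ins written for illustration only, not MATLAB's implementation. They follow MATLAB's conventions: pages run along the third axis, and a 2D operand is expanded over all pages.

```python
import numpy as np

# Hypothetical NumPy stand-ins for MATLAB's page-wise functions.
# A "page" is the 2D slice X[:, :, k]; a 2D operand applies to every page.


def _pages_first(X):
    """(r, c) or (r, c, p) -> (p, r, c) so that NumPy broadcasts over pages."""
    return np.moveaxis(np.atleast_3d(X), 2, 0)


def pagemtimes(A, B, transpA=False, transpB=False):
    """Page-wise product op(A)[:, :, k] @ op(B)[:, :, k], op = optional transpose."""
    A, B = _pages_first(A), _pages_first(B)
    if transpA:
        A = np.swapaxes(A, 1, 2)
    if transpB:
        B = np.swapaxes(B, 1, 2)
    return np.moveaxis(A @ B, 0, 2)                 # back to (r, c, p)


def pagemldivide(A, B):
    """Page-wise solution X[:, :, k] of A[:, :, k] @ X[:, :, k] = B[:, :, k]."""
    A, B = _pages_first(A), _pages_first(B)
    p = max(A.shape[0], B.shape[0])
    A = np.broadcast_to(A, (p,) + A.shape[1:])      # expand single pages explicitly
    B = np.broadcast_to(B, (p,) + B.shape[1:])
    return np.moveaxis(np.linalg.solve(A, B), 0, 2)


rng = np.random.default_rng(0)
A = rng.random((2, 2, 5)) + 2 * np.eye(2)[:, :, None]   # 5 well-conditioned pages
B = rng.random((2, 3))                                  # one matrix shared by all pages
P = pagemtimes(A, B)                                    # shape (2, 3, 5)
X = pagemldivide(A, P)                                  # recovers B on every page
```

Moving the page axis to the front lets NumPy's `@` operator and `numpy.linalg.solve`, which broadcast over leading axes, process all pages in one call.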

Building on these developments, this paper presents a generalized MATLAB-based vectorized algorithm for computing global matrices and force vectors in linear elasticity, for finite elements of any type and order. The scheme offers the following new features:

  • With the help of MATLAB built-in functions operating on multidimensional arrays, the for-loop over elements is removed;

  • At each integration point, the element matrices/vectors of all elements of the mesh are computed at once and then summed over the integration points;

  • For any type of elements and nodal degree of freedom, the computation of column indices and row indices is obtained from degrees of freedom connectivity matrix using only basic operations and a single for-loop;

  • Generalization to elements of any type in 2D (triangular and quadrilateral) and 3D (tetrahedral, hexahedral, pentahedral, etc.) and of any approximation order (linear, quadratic, cubic, etc.);

  • Extension to Mindlin plate problems and functionally graded materials.

The paper is organized as follows. Section 2 briefly presents the classical and discretized variational formulations of a general boundary value problem in elastodynamics. It also presents the standard algorithm for computing the global stiffness/mass matrices and force vector. After examining the shortcomings of the standard scheme, Sect. 3 describes the methodology used to develop the proposed algorithm and implement it in MATLAB. In Sect. 4, an L-shaped elastic structure is simulated with the proposed algorithm, and the results are validated against ABAQUS results for the same structure. Finally, a performance analysis is carried out on a set of meshes for a 2D beam problem and a 3D plate problem. It compares the computational cost and memory usage of the proposed scheme with those of the standard algorithm and of the vectorized algorithms of Anjam et al. [29] and Cuvelier et al. [15].

2 Problem formulation

For clarity, three types of notation are used in the remainder to distinguish MATLAB commands and syntax from the variables and expressions defined in this study. Words in monospace typewriter font \((\texttt {pagemtimes}, \texttt {zeros},\texttt {transpose},\texttt {none},\mathrm {etc.})\) refer to MATLAB built-in functions or syntax. Bold camel-case words are names of user-defined functions (GaussQuadrature, ShapeFunctions, etc.), while italic characters and words denote variables or operators (nu, etc.).

Consider a continuum body (see Fig. 1) that initially (at \(t=0\)) occupies the domain \({\Omega }_0\subset {\mathbb {R}}^d\) (\(d\in \{1,2,3\}\)) in the reference configuration and \({\Omega }_t\subset {\mathbb {R}}^d\) in the current configuration. The body is subjected to body forces \(f\) per unit volume and to a traction vector t applied on the surface \({\partial \Omega }_1\).

Fig. 1 Schematic drawing of the continuum body

At any time, the displacement of a point X of the body is expressed in Cartesian coordinates as \(u={u}_i{E}_i\), using the summation convention on repeated indices, where \(i \in <1,2,3>\) and the \(E_i\) are the orthonormal basis vectors. Let the boundary of the bounded domain \({\Omega }\) (see Fig. 1) be the union of three non-overlapping parts:

$$\begin{aligned} {\partial \Omega }=\partial \Omega _1\cup \partial \Omega _2\cup \Gamma \end{aligned}$$
(2.1)

where \({\partial \Omega }_1\) is the part of the boundary on which the traction vector \(t=\sigma {\textbf {n}}\) acts, and \(\Gamma \) is the set of points where Dirichlet boundary conditions are enforced (\(u={u_p}\)). The remaining part of \(\partial {\Omega }\) is \(\partial {\Omega }_2\). Here, \({\textbf {n}}\) is the outward unit normal to the surface \(\partial {\Omega }_1\), \(\sigma \) is the Cauchy stress tensor and \(u_p\) are the prescribed displacement values. Dirichlet boundary conditions are sufficient for the purpose of this paper. Other types of boundary conditions (Neumann, Robin or mixed) can be enforced as well, since boundary conditions are enforced after the global matrices and vectors have been assembled. Under these conditions, the equilibrium of the body is governed by the following boundary value problem (strong form):

$$\begin{aligned} \left\{ \begin{array}{l} {\mathrm{{div}}\left( \sigma \right) +f=\rho \ddot{u}} \\ {\sigma {\textbf {n}}=t} \\ {u=u_p} \end{array} \ \right. \ \begin{array}{l} \ \textrm{in}\ {[0,T]\times \Omega } \\ \textrm{on}\ {\partial }{\Omega }_1 \\ \textrm{on}\ {\Gamma } \end{array} \end{aligned}$$
(2.2)

For simplicity, we assume small strains (\(\varepsilon \)) and linear elasticity, so that the constitutive law reads

$$\begin{aligned} \sigma =\mathcal {C}:\varepsilon \end{aligned}$$
(2.3)

where \(\mathcal {C}\) is the fourth-order elasticity tensor and \({\varepsilon }\) is the small strain tensor, given by

$$\begin{aligned} {\varepsilon ={\frac{1}{2}}(\nabla u +{}^T\nabla u)} \end{aligned}$$
(2.4)

Variational formulation and discretization of the problem:

Before going any further, we define the space of kinematically admissible displacements as

$$\begin{aligned} V=\left\{ u\in H^1\left( \mathrm {\Omega }\right) \ |\ u=u_p\ on\ \mathrm {\Gamma }\right\} \end{aligned}$$
(2.5)

where \( H ^1({\Omega })\) is the Sobolev space of square-integrable functions whose first-order derivatives are also square-integrable. We multiply both sides of the first line of Eq. (2.2) by a virtual displacement \(\delta u\) and integrate over the domain. The virtual displacement is kinematically admissible and therefore vanishes on \(\Gamma \):

$$\begin{aligned} \int \limits _{\mathrm {\Omega }}{div\left( \sigma \right) \ \bullet \delta u\ \mathrm{{d}}V}+\int \limits _{\mathrm {\Omega }}{f\bullet \delta u \mathrm{{d}}V}=\int \limits _{\mathrm {\Omega }}{\rho \ddot{u}\bullet \delta u\ \mathrm{{d}}V} \end{aligned}$$
(2.6)

In the previous equation, the symbol \((\bullet )\) denotes the scalar product. Focusing now on the first term on the left-hand side of Eq. (2.6), we apply the Stokes–Green–Ostrogradski formula to rewrite it. Only the boundary term on \(\partial \Omega _1\) remains, because \(\delta u\) vanishes on \(\Gamma \) and no traction is applied on \(\partial \Omega _2\):

$$\begin{aligned} \int \limits _{\mathrm {\Omega }}{div\left( \sigma \right) \bullet \ \delta u\ \mathrm{{d}}V}=\int \limits _{\mathrm {\partial }{\mathrm {\Omega }}_1}{t\ \bullet \delta u\ \mathrm{{d}}S}-\int \limits _{\mathrm {\Omega }}{\sigma :\mathrm {\nabla }\ \delta u\ \mathrm{{d}}V} \end{aligned}$$
(2.7)

where \(\mathrm {\nabla }\) denotes the gradient operator. Owing to the symmetry of the stress tensor, \(\sigma :\mathrm {\nabla }\delta u=\sigma :\delta \varepsilon \) with \(\delta \varepsilon =\frac{1}{2}\left( \mathrm {\nabla }\delta u+{}^T{\mathrm {\nabla }\delta u}\right) \). Substituting Eq. (2.7) into Eq. (2.6) and rearranging, the variational formulation of the boundary value problem reads:

$$\begin{aligned} \int \limits _{\mathrm {\Omega }}{\rho \ddot{u}\bullet \delta u\ \mathrm{{d}}V}+\int \limits _{\mathrm {\Omega }}{\sigma :\delta \varepsilon \ \mathrm{{d}}V}=\int \limits _{\mathrm {\Omega }}{f\bullet \delta u\ \mathrm{{d}}V}+\int \limits _{\mathrm {\partial }{\mathrm {\Omega }}_1}{t\bullet \delta u\ \mathrm{{d}}S} \end{aligned}$$
(2.8)

Let \({\Omega }_h=\bigcup ^{nel}_{e=1}{{\Omega }^{(e)}_{h}}\) be the discretized domain, described by the mesh data listed in Table 1. In MATLAB, the function \(\texttt {struct}\) is used to create a structure array that groups the mesh parameters and their values into a single variable, \(FE\_model\). The fields of \(FE\_model\) can be retrieved using dot notation (e.g., \( \texttt {nel\ =\ FE\_model.nel}\)).

Table 1 Mesh data structure

Let \(V_h\) be the space of kinematically admissible displacements associated with \({\mathrm {\Omega }}_h\). The variational formulation of the problem in the discretized space then reads

$$\begin{aligned} \left\{ \begin{array}{l} \mathrm{{find}}\ u_h\in \ V_h\ \mathrm{{such}}\ \mathrm{{that}} \\ \int \limits _{{\mathrm {\Omega }}_h}{\rho {\ddot{u}}_h\bullet {\delta u}_h\ \mathrm{{d}}V}+\int \limits _{{\mathrm {\Omega }}_h}{{\sigma }_h:\delta {\varepsilon }_h\ \mathrm{{d}}V}=\int \limits _{{\mathrm {\Omega }}_h}{f\bullet \delta u_hdV}+\int \limits _{\mathrm {\partial }{\mathrm {\Omega }}_1}{t\bullet \delta u_h\ \mathrm{{d}}S} \end{array} \right. \end{aligned}$$
(2.9)

In the discretized space, the approximate displacement field within element \(e \in <1,\ldots ,nel>\) can be written in terms of the nodal displacements \({\overline{u}}^{(e)}_{i\ \in <1,\ldots ,n>}\) and the nodal shape functions \(N_{i\ \in <1,\ldots ,n>}\):

$$\begin{aligned} u^{(e)}_h=\sum ^{n}_{i=1}{N_i{\overline{u}}^{(e)}_i}={\mathcal {N}_{e}}{{u}}_e, \end{aligned}$$
(2.10)

where the shape function matrix \({\mathcal {N}_{e}}\) of element e is given by

$$\begin{aligned} {\mathcal {N}_{e}}\mathrm {=}\left[ \begin{array}{cccccccccc} N_{1}&{} 0 &{} 0 &{} N_{2}&{}0 &{} 0&{} \dots &{} N_{n} &{} 0 &{}0 \\ 0 &{} N_{1}&{} 0 &{} 0 &{}N_{2}&{}0 &{} \dots &{} 0 &{} N_{n} &{}0\\ 0 &{} 0 &{} N_{1} &{} 0 &{} 0 &{}N_{2}&{} \dots &{} 0 &{} 0 &{}N_{n}\\ \end{array}\right] \begin{array}{c}\mathrm {{in}}\ \mathrm {{3D}}\end{array} \nonumber \\ \begin{array}{c} \mathrm {{and}}\ \end{array}\, {\mathcal {N}_{e}}\mathrm {=}\left[ \begin{array}{ccccccccc} N_{1}&{} 0 &{} N_{2}&{}0 &{} \dots &{} N_{n} &{} 0 \\ 0 &{} N_{1}&{} 0 &{}N_{2}&{} \dots &{} 0 &{} N_{n}\\ \end{array}\right] \begin{array}{c} \mathrm {{in}}\ \mathrm {{2D}} \end{array} \end{aligned}$$
(2.11)
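For a concrete instance of Eqs. (2.10) and (2.11), the sketch below evaluates the bilinear shape functions of a 4-node quadrilateral and assembles the interleaved matrix \({\mathcal {N}_{e}}\) in Python/NumPy. The node ordering (counter-clockwise from (-1, -1)) and the function names are assumptions made for illustration.

```python
import numpy as np

# Assumed node ordering of the 4-node quadrilateral in the parent square
# [-1, 1]^2: counter-clockwise starting from (-1, -1).
XI_NODES = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])


def quad4_shape(xi, eta):
    """Values N_a(xi, eta), a = 1..4, of the bilinear shape functions."""
    return 0.25 * (1 + XI_NODES[:, 0] * xi) * (1 + XI_NODES[:, 1] * eta)


def shape_matrix(N, m):
    """Interleaved shape-function matrix N_e of Eq. (2.11), size m x (m*n)."""
    n = N.size
    Ne = np.zeros((m, m * n))
    for j in range(m):            # row j interpolates displacement component j
        Ne[j, j::m] = N           # columns j, j+m, j+2m, ... hold N_1, ..., N_n
    return Ne


N = quad4_shape(0.2, -0.5)        # shape functions at an arbitrary point
Ne = shape_matrix(N, m=2)         # 2 x 8 in 2D
ue = np.tile([1.0, 0.0], 4)       # nodal values of a rigid translation (1, 0)
u = Ne @ ue                       # interpolated displacement, Eq. (2.10)
```

Because the shape functions sum to one, any rigid translation is interpolated exactly. `np.kron(N, np.eye(m))` builds the same matrix in a single call.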

With the definition of u in Eq. (2.10), it is convenient for implementation purposes to use Voigt notation. This notation represents the symmetric second-order tensors \({\sigma }\) and \({\varepsilon }\) as vectors, and the fourth-order tensor \(\mathcal {C}\) as a matrix. Accordingly, \(\tilde{\sigma }\) and \(\tilde{\varepsilon }\) denote the stress and strain vectors associated with \({\sigma }\) and \({\varepsilon }\):

$$\begin{aligned} \tilde{\sigma }^T\mathrm {=}\left[ \begin{array}{cccccc} \sigma _{xx}&\sigma _{yy}&\sigma _{zz}&\sigma _{yz}&\sigma _{zx}&\sigma _{xy} \end{array}\right] \begin{array}{c}\mathrm {{in}}\ \mathrm {{3D,}}\end{array} \tilde{\sigma }^T\mathrm {=}\left[ \begin{array}{ccc} \sigma _{xx}&\sigma _{yy}&\sigma _{xy} \end{array}\right] \begin{array}{c} \mathrm {{in}}\ \mathrm {{2D}} \end{array}\nonumber \\ \tilde{\varepsilon }^T\mathrm {=}\left[ \begin{array}{cccccc} \varepsilon _{xx}&\varepsilon _{yy}&\varepsilon _{zz}&\varepsilon _{yz}&\varepsilon _{zx}&\varepsilon _{xy} \end{array}\right] \begin{array}{c}\mathrm {{in}}\ \mathrm {{3D,}}\end{array} \tilde{\varepsilon }^T\mathrm {=}\left[ \begin{array}{ccc} \varepsilon _{xx}&\varepsilon _{yy}&\varepsilon _{xy} \end{array}\right] \begin{array}{c} \mathrm {{in}}\ \mathrm {{2D}} \end{array} \end{aligned}$$
(2.12)

Under this notation, the strain vector of each element e can be written in terms of the nodal displacement vector as follows:

$$\begin{aligned} \tilde{\varepsilon }_e={B_e}{u_e} \end{aligned}$$
(2.13)

where \({B_e}\) is the strain matrix of the element, whose components are

$$\begin{aligned} \ B_{e}\mathrm {=}\left[ \begin{array}{cccccccccc} N_{1,x}&{} 0 &{} 0&{}N_{2,x}&{}0 &{} 0 &{}\dots &{} N_{n,x} &{}0&{}0 \\ 0 &{} N_{1,y}&{} 0 &{}0 &{}N_{2,y}&{}0 &{}\dots &{} 0 &{} N_{n,y} &{}0\\ 0 &{} 0 &{} N_{1,z} &{} 0 &{} 0 &{}N_{2,z}&{} \dots &{} 0 &{} 0 &{}N_{n,z}\\ 0 &{} N_{1,z}&{} N_{1,y} &{}0 &{}N_{2,z}&{} N_{2,y}&{}\dots &{} 0 &{}N_{n,z}&{}N_{n,y}\\ N_{1,z}&{} 0 &{} N_{1,x}&{}N_{2,z}&{}0 &{}N_{2,x} &{}\dots &{}N_{n,z}&{} 0 &{}N_{n,x}\\ N_{1,y}&{} N_{1,x}&{}0 &{} N_{2,y}&{}N_{2,x}&{}0 &{}\dots &{} N_{n,y}&{} N_{n,x} &{}0\\ \end{array}\right] \begin{array}{c}\mathrm {{in}}\ \mathrm {{3D}}\end{array}\nonumber \\ \begin{array}{c} \mathrm {{and}}\ \end{array} \ B_{e}\mathrm {=}\left[ \begin{array}{ccccccc} N_{1,x}&{} 0 &{}N_{2,x}&{}0 &{}\dots &{} N_{n,x} &{}0\\ 0 &{} N_{1,y} &{}0 &{}N_{2,y} &{}\dots &{} 0 &{} N_{n,y}\\ N_{1,y}&{}N_{1,x}&{}N_{2,y}&{}N_{2,x}&{}\dots &{}N_{n,y}&{}N_{n,x}\\ \end{array}\right] \begin{array}{c} \mathrm {{in}}\ \mathrm {{2D}} \end{array} \end{aligned}$$
(2.14)
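The sketch below builds \(B_e\) of Eq. (2.14) from the matrix of physical derivatives dN_dX (d×n). Rows are ordered as in Eq. (2.12), with engineering shear strains. The sketch then checks Eq. (2.13) on a linear displacement field, for which the computed strain must be exact. Python/NumPy, the function name strain_matrix and the interleaved dof ordering are illustrative assumptions.

```python
import numpy as np


def strain_matrix(dN_dX):
    """Strain matrix B_e of Eq. (2.14) from dN_dX (d x n, d = 2 or 3).

    Rows follow the Voigt order of Eq. (2.12) with engineering shear strains:
    2D: [xx, yy, xy];  3D: [xx, yy, zz, yz, zx, xy].
    Columns are interleaved nodal dofs [u1x, u1y, (u1z), u2x, ...].
    """
    d, n = dN_dX.shape
    # (i, j): the row holds du_i/dx_j, plus du_j/dx_i for shear rows (i != j)
    pairs = {2: [(0, 0), (1, 1), (0, 1)],
             3: [(0, 0), (1, 1), (2, 2), (1, 2), (2, 0), (0, 1)]}[d]
    B = np.zeros((len(pairs), d * n))
    for r, (i, j) in enumerate(pairs):
        B[r, i::d] += dN_dX[j]
        if i != j:
            B[r, j::d] += dN_dX[i]
    return B


# Unit linear tetrahedron: N1 = 1 - x - y - z, N2 = x, N3 = y, N4 = z
dN_dX = np.array([[-1.0, 1, 0, 0], [-1.0, 0, 1, 0], [-1.0, 0, 0, 1]])
X = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])  # nodal coordinates
G = np.array([[1.0, 2, 3], [4, 5, 6], [7, 8, 9]])  # constant displacement gradient
ue = (X @ G.T).reshape(-1)          # nodal displacements u_a = G x_a, interleaved
eps = strain_matrix(dN_dX) @ ue     # Eq. (2.13)
```

The computed strain is [1, 5, 9, 14, 10, 6]. This is the Voigt form of the symmetric part of G, with shear terms doubled.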

Substituting Eqs. (2.3), (2.10) and (2.13) into Eq. (2.9) and simplifying yields Eq. (2.15):

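$$\begin{aligned} \mathop {\mathbf {A}}\limits ^{nel}_{e=1}\delta u_e^T\left( \int \limits _{{\Omega }^{(e)}_h}{\rho \,\mathcal {N}_e^T\mathcal {N}_e\ \mathrm{{d}}V}\ \ddot{u}_e+\int \limits _{{\Omega }^{(e)}_h}{B_e^T\,\mathcal {C}\,B_e\ \mathrm{{d}}V}\ u_e\right) =\mathop {\mathbf {A}}\limits ^{nel}_{e=1}\delta u_e^T\left( \int \limits _{{\Omega }^{(e)}_h}{\mathcal {N}_e^T f\ \mathrm{{d}}V}+\int \limits _{\partial {\Omega }_1\cap \partial {\Omega }^{(e)}_h}{\mathcal {N}_e^T t\ \mathrm{{d}}S}\right) \end{aligned}$$
(2.15)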

where \(\mathop {\mathbf {A}}\) denotes the assembly operator, which is preferred over the union operator because assembly is performed at the nodal level. Since Eq. (2.15) holds for any admissible virtual displacement, it can be recast in matrix form as

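$$\begin{aligned} \mathop {\mathbf {A}}\limits ^{nel}_{e=1}\left( M_e\,\ddot{u}_e+K_e\,u_e\right) =\mathop {\mathbf {A}}\limits ^{nel}_{e=1}\left( f_e+t_e\right) ,\quad \mathrm {with}\quad M_e=\int \limits _{{\Omega }^{(e)}_h}{\rho \,\mathcal {N}_e^T\mathcal {N}_e\ \mathrm{{d}}V},\ \ K_e=\int \limits _{{\Omega }^{(e)}_h}{B_e^T\,\mathcal {C}\,B_e\ \mathrm{{d}}V},\ \ f_e=\int \limits _{{\Omega }^{(e)}_h}{\mathcal {N}_e^T f\ \mathrm{{d}}V},\ \ t_e=\int \limits _{\partial {\Omega }_1\cap \partial {\Omega }^{(e)}_h}{\mathcal {N}_e^T t\ \mathrm{{d}}S} \end{aligned}$$
(2.16)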

where \(M_e\), \(K_e\), \(f_e\) and \(t_e\) are the element mass matrix, stiffness matrix, body force vector and applied (traction) force vector, respectively. The integrals over elements in Eq. (2.16) are usually approximated numerically with the Gauss quadrature rule. This approximation uses the mapping between the isoparametric (parent) space and the physical space, as illustrated in Fig. 2 for a linear quadrilateral element.
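As an illustration of the Gauss rule mentioned above, the sketch below builds a tensor-product Gauss–Legendre rule on the parent square or cube with `numpy.polynomial.legendre.leggauss`. It stores the points in the d×q layout later used for Gq in Eq. (3.1). This is a sketch under assumptions. It covers quadrilateral and hexahedral parent domains only, since triangles and tetrahedra require different rules, and the function name is hypothetical.

```python
import numpy as np


def gauss_quadrature(d, p):
    """Tensor-product Gauss-Legendre rule on the parent cube [-1, 1]^d.

    Returns Gq (d x q, one column per point, as in Eq. (3.1)) and the
    q weights, with q = p**d.  Valid for quadrilaterals/hexahedra only.
    """
    x, w = np.polynomial.legendre.leggauss(p)      # 1D points and weights
    pts = np.meshgrid(*([x] * d), indexing='ij')
    wts = np.meshgrid(*([w] * d), indexing='ij')
    Gq = np.vstack([g.ravel() for g in pts])
    weight = np.prod(np.vstack([g.ravel() for g in wts]), axis=0)
    return Gq, weight


Gq, weight = gauss_quadrature(d=2, p=2)            # 2 x 2 rule, q = 4
# Integral of xi^2 * eta^2 over [-1, 1]^2; the exact value is 4/9
integral = np.sum(weight * Gq[0] ** 2 * Gq[1] ** 2)
```

A p-point rule integrates polynomials up to degree 2p - 1 exactly in each direction, so the 2×2 rule reproduces 4/9 here.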

Fig. 2 Isoparametric mapping for a quadrilateral element

Let \(\mathcal {U}\) be the global vector of nodal displacements of all nn nodes of the mesh, such that

$$\begin{aligned} {\mathcal {U}^T}\mathrm {=}\left[ \begin{array}{ccccc} u_{1}&u_{2}&u_{3}&\dots&u_{m*nn} \end{array}\right] ,\ \mathcal {U}\in {{\mathbb {R}}^{m*nn}} \end{aligned}$$
(2.17)

After rearranging and assembling the contributions of all elements of the mesh, we obtain the global semi-discrete system of equations of the boundary value problem:

$$\begin{aligned} {M_g}\ {\ddot{\mathcal {U}}}+{K_g}\ {\mathcal {U}}={F_{ext}},\ \mathcal {U}\in {{\mathbb {R}}^{m*nn}} \end{aligned}$$
(2.18)

where \({F_{ext}}\) is the global external force vector (sum of body force and traction force), while \({K_{g}}\) and \({M_{g}}\) are, respectively, the global stiffness matrix and mass matrix of the body. For further details about this type of variational formulation, the reader is referred to works in [30, 31].

In conventional implementations [17, 32,33,34,35], \({K_{g}}\), \({M_{g}}\) and \({F_{ext}}\) are obtained by computing the element matrices or vectors one element at a time. For each element, the row indices IndexI, column indices IndexJ and values \({K_{values}}\) are stored. Once all elements have been processed, the global matrices or vectors are assembled with MATLAB’s built-in function \(\texttt {sparse}\), which sums the values associated with repeated index pairs. This workflow is shown in Fig. 3.
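The following is a minimal sketch of the conventional element-by-element workflow, written in Python/NumPy. To keep the example short, it assumes a chain of 1D two-node bar elements with one degree of freedom per node. Here `np.add.at` plays the role of \(\texttt {sparse}\) in summing values that share the same (row, column) pair. A dense array is used only for readability; `scipy.sparse.coo_matrix`, which also sums duplicate entries when converted to another format, would be the sparse counterpart.

```python
import numpy as np


def standard_assembly(conn, element_matrix, ndof):
    """Loop-based triplet assembly (sketch; one dof per node).

    conn           : (nel, n) 0-based node numbers of each element
    element_matrix : callable e -> (n x n) element matrix K_e
    ndof           : number of global degrees of freedom
    """
    I, J, V = [], [], []
    for e, nodes in enumerate(conn):              # loop over elements
        Ke = element_matrix(e)
        for a in range(len(nodes)):               # nested index loops
            for b in range(len(nodes)):
                I.append(nodes[a])
                J.append(nodes[b])
                V.append(Ke[a, b])
    K = np.zeros((ndof, ndof))
    np.add.at(K, (np.array(I), np.array(J)), np.array(V))  # sums duplicates
    return K


# Three bar elements in series, each with axial stiffness EA/L = 2
conn = np.array([[0, 1], [1, 2], [2, 3]])
K = standard_assembly(conn, lambda e: 2.0 * np.array([[1.0, -1.0], [-1.0, 1.0]]), ndof=4)
```

The interior diagonal entries receive contributions from two elements each, which is exactly what the duplicate summation provides.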

The standard algorithm for computing the stiffness matrix according to the workflow of Fig. 3 is outlined in Algorithm 1 as the function StandardGlobalStiffness.

Algorithm 1 Standard algorithm for computing Kg

Examining the structure of Algorithm 1 reveals two main limitations:

  • The non-vectorized for-loop over elements of the mesh: The elementwise-based computation of element stiffness matrices and their indices (IndexI and IndexJ) is computationally inefficient since it does not make optimal use of MATLAB’s vector operation capabilities;

  • Apart from the outermost for-loop, four nested for-loops are needed to compute row and column indices.

Several assembly procedures, such as those in [15, 17, 29], have been proposed to remove at least one of the above limitations. Specifically, the former use matrix rearrangements to reduce memory usage and achieve better performance, whereas the latter take advantage of affine transformations to formulate their algorithms.

3 Proposed vectorized algorithm

3.1 Formulation

First, recall that the serial calculation of finite element terms falls into two categories: non-vectorized and vectorized serial calculation. Because of the integrals involved in finite element terms, each category can be carried out in one of two ways, as shown in Fig. 4. It can be done implicitly, by means of numerical integration schemes such as the Gauss quadrature rule, or explicitly (exact integration). However, the latter approach is tedious and impractical for finite elements with approximation order higher than 1 (\(p>1\)).

Fig. 3 Standard workflow for computing the stiffness matrix

Fig. 4 Types of serial computation in FEM

We now introduce an implicit vectorized algorithm based on the branch highlighted in orange in Fig. 4. Its workflow in linear elasticity is depicted in Fig. 5.

Fig. 5 Workflow of the proposed algorithm

In the following, we describe the whole procedure and the steps it comprises.

  (a)

    First, the values of the nodal shape functions and their derivatives in the isoparametric space are gathered so that they are computed only once. In the subroutine GaussQuadrature, the Gauss quadrature data are computed in the isoparametric space: the number of Gauss points, their coordinates and their weights. These are stored in the variables nPoint, Gq and weight, respectively, with

    $$\begin{aligned} {Gq=}\left[ \begin{array}{cccc} \xi _1&{}\xi _2&{}\dots &{}\xi _q\\ \eta _1&{} \eta _2&{}\dots &{}\eta _q\\ \kappa _1&{} \kappa _2&{}\dots &{}\kappa _q\\ \end{array} \right] \begin{array}{c} \mathrm {{in}}\ \mathrm {{3D}} \end{array} \end{aligned}$$
    (3.1)

    where Gq is of size \(d{\times }q\). For 1D elements, only the first row of Gq is used, whereas only the first two rows are relevant for 2D elements.

  (b)

    With the Gauss quadrature data at hand, we compute the values of the shape functions and of their derivatives at each integration point, and store them in NI and \(dN\_d{\theta }\), respectively. In the subroutine ShapeFunction, the basis functions of some common 2D and 3D isoparametric elements are defined with MATLAB symbolic operations. This makes it easy to obtain their derivatives and evaluate them at any integration point. The outputs read

    $$\begin{aligned} {NI=}\left[ \begin{array}{ccc} N_1({\theta }_1)&{}\dots &{}N_1({\theta }_q)\\ N_2({\theta }_1)&{}\dots &{}N_2({\theta }_q)\\ \dots &{}\dots &{}\dots \\ N_n({\theta }_1)&{}\dots &{}N_n({\theta }_q)\\ \end{array} \right] \end{aligned}$$
    (3.2)

    where q is the number of Gauss quadrature points, \({dim\left( NI\right) }=n{\times }q\) and \({\theta }_{i\ \in \ <1,2,\dots ,\ q>}\) are the coordinates of the quadrature points in the isoparametric space

    $$\begin{aligned} {\theta }_{i\ }\mathrm {=}\left\{ \begin{array}{c} {\xi }_i \\ {(\xi ,\eta )}_i \\ {(\xi ,\eta ,\kappa )}_i \end{array} \right. \begin{array}{c} \mathrm {{in}}\ \mathrm {{1D}}\\ \mathrm {{in}}\ \mathrm {{2D}}\\ \mathrm {{in}}\ \mathrm {{3D}} \end{array} \end{aligned}$$
    (3.3)

    Similarly, \(dN\_d{\theta }\) is a multidimensional array that stores the derivatives of the basis functions with respect to the isoparametric coordinates at all quadrature points. The values at integration point i are stored in page i of \(dN\_d{\theta }\) and are given, for 3D problems, by

    $$\begin{aligned} {dN\_d{\theta }(:,:,i)=}\left[ \begin{array}{cccc} \frac{\partial N_1}{\partial \xi }({\theta }_i)&{}\frac{\partial N_2}{\partial \xi }({\theta }_i)&{}\dots &{}\frac{\partial N_n}{\partial \xi }({\theta }_i) \\ \frac{\partial N_1}{\partial \eta }({\theta }_i)&{}\frac{\partial N_2}{\partial \eta }({\theta }_i)&{}\dots &{}\frac{\partial N_n}{\partial \eta }({\theta }_i) \\ \frac{\partial N_1}{\partial \kappa }({\theta }_i)&{}\frac{\partial N_2}{\partial \kappa }({\theta }_i)&{}\dots &{}\frac{\partial N_n}{\partial \kappa }({\theta }_i) \end{array} \right] \end{aligned}$$
    (3.4)

    Hence \({{dim} (dN\_d{\theta })}{=d{\times }n{\times }q}\), where n is the number of basis functions of each finite element and d is the dimension of the problem. This layout, with one row per isoparametric direction, is the one required by Eqs. (3.5)–(3.7). The 1D and 2D forms of \(dN\_d{\theta }\) follow by keeping only the first row or the first two rows, respectively.

  (c)

    The next step is to eliminate the for-loop over elements. To this end, we define the multidimensional strain matrix \(\mathcal {B}\), in which the strain matrix of each element e is stored in page e. Since \(dN\_d{\theta }\) contains derivatives of the nodal basis functions in the isoparametric space, the Jacobian J of the transformation has to be computed. For every integration point i, the following steps compute \(\mathcal {B}\) and update the corresponding multidimensional element stiffness matrix \(\mathcal {K}_e\):

    (i)

      From \(dN\_d{\theta }\), we extract the derivatives of the shape functions at the current Gauss point i. We then call MATLAB’s built-in function pagemtimes to compute the Jacobian J of the transformation for all elements at once, with Pts holding the nodal coordinates of the elements (one page per element):

      $$\begin{aligned} J=\texttt {pagemtimes}(dN{\mathrm {\_}}d{\theta }(:,:,i),\mathrm {\texttt {'none'}},Pts,\mathrm {\texttt {'transpose'}}) \end{aligned}$$
      (3.5)

      where J is of dimension \(d{\times }d{\times }nel\).

    (ii)

      Next, the derivatives of the basis functions with respect to the physical coordinates, \(dN\_dX\), are computed for all elements:

      $$\begin{aligned} dN\_dX=\texttt {pagemldivide}(J,dN\_d{\theta }(:,:,i)) \end{aligned}$$
      (3.6)

      where \(dN\_dX\) is of dimension \(d{\times }n{\times }nel\).

    (iii)

      After pre-allocating memory for \(\mathcal {B}\), its entries are computed in a single for-loop of length n. For node \(s \in \ <1,2,\dots ,\ n>\) of all elements, the block \(\mathcal {B}(:,d*(s-1)+1:d*s,:)\) is given in 3D, with rows ordered as in Eq. (2.12), by

      $$\begin{aligned} \mathcal {B}(:,d*(s-1)+1:d*s,:)\mathrm {=}\left[ \begin{array}{ccc} dN{\_}dX(1,s,:)&{}0_{1{\times }1{\times }nel}&{}0_{1{\times }1{\times }nel}\\ 0_{1{\times }1{\times }nel}&{} dN{\_}dX(2,s,:)&{} 0_{1{\times }1{\times }nel}\\ 0_{1{\times }1{\times }nel}&{}0_{1{\times }1{\times }nel}&{} dN{\_}dX(3,s,:)\\ 0_{1{\times }1{\times }nel}&{}dN{\_}dX(3,s,:)&{} dN{\_}dX(2,s,:)\\ dN{\_}dX(3,s,:)&{} 0_{1{\times }1{\times }nel}&{} dN{\_}dX(1,s,:)\\ dN{\_}dX(2,s,:)&{} dN{\_}dX(1,s,:)&{}0_{1{\times }1{\times }nel} \\ \end{array}\right] \end{aligned}$$
      (3.7)

      The corresponding block for 2D problems is obtained in the same way as Eq. (3.7):

      $$\begin{aligned} \mathcal {B}(:,d*(s-1)+1:d*s,:)\mathrm {=}\left[ \begin{array}{cc} dN{\_}dX(1,s,:)&{}0_{1{\times }1{\times }nel}\\ 0_{1{\times }1{\times }nel}&{} dN{\_}dX(2,s,:)\\ dN{\_}dX(2,s,:)&{} dN{\_}dX(1,s,:)\\ \end{array}\right] \end{aligned}$$
      (3.8)

      The matrix \(\mathcal {B}\) is thus fully constructed in a single for-loop by the function Bs given in Algorithm 4.

    (iv)

      Let detJ be the determinant of the Jacobian matrix J. To avoid computing detJ element by element, we use the vectorized form implemented in the function DetJacobian (Algorithm 2).

    (v)

      The multidimensional element stiffness matrix \(\mathcal {K}_{e}\) is then updated, in 3D, as

      $$\begin{aligned} \mathcal {K}_{e}=\mathcal {K}_{e}+\texttt {pagemtimes}(\mathcal {B},\mathrm {\texttt {'transpose'}}, \texttt {pagemtimes}(\mathcal {C},\mathcal {B}).*detJ*weight(i),\mathrm {\texttt {'none'}}) \end{aligned}$$
      (3.9)

      where detJ is of size \(1{\times }1{\times }nel\), \(\mathcal {K}_{e}\) is of size \((m*n){\times }(m*n){\times }nel\), and the row vector weight, of size \(1{\times }q\), contains the Gauss quadrature weights. In 2D, the second term on the right-hand side of Eq. (3.9) is additionally multiplied by the element thickness.

  (d)

    Once the stiffness matrices \(\mathcal {K}_{e}\) of all elements have been accumulated over all integration points, the next step is to construct the row and column indices (IndexI and IndexJ). In Algorithm 1, and in other schemes in the literature, the row and column indices are computed concurrently with the element stiffness values. In the proposed scheme (Algorithm 3), by contrast, IndexI and IndexJ are calculated from the degree-of-freedom connectivity matrix \(Matrix\_dof\), which is obtained in a single for-loop over m. To save memory, \(Matrix\_dof\), IndexI and IndexJ are stored as 32-bit integers. Note that in Algorithm 3, the operations \(Matrix\_dof(:)\) and \(Matrix\_dof(:)'\) convert the matrix \(Matrix\_dof\) into a column vector and a row vector, respectively.
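Steps (i)–(v) can be illustrated with the following NumPy sketch for 4-node quadrilaterals in 2D (plane stress, unit thickness, 2×2 Gauss rule). It is a translation under stated assumptions, not the authors' Algorithm 4. The correspondences are as follows:

- Pages stay on the last axis, as in MATLAB.
- `numpy.einsum` plays the role of pagemtimes.
- `numpy.linalg.solve`, applied with the element axis moved to the front, plays the role of pagemldivide.
- det2 is a hypothetical closed-form stand-in for DetJacobian.

The only remaining loops run over Gauss points and element nodes, never over elements.

```python
import numpy as np

# Sketch for 4-node quadrilaterals in 2D (unit thickness, 2 x 2 Gauss rule).
# Pages are kept on the LAST axis, as in MATLAB: Pts has shape (d, n, nel).
XI = np.array([[-1.0, -1.0], [1.0, -1.0], [1.0, 1.0], [-1.0, 1.0]])


def dN_dtheta_quad4(xi, eta):
    """d x n derivatives of the bilinear shape functions w.r.t. (xi, eta)."""
    return 0.25 * np.vstack([XI[:, 0] * (1 + XI[:, 1] * eta),
                             XI[:, 1] * (1 + XI[:, 0] * xi)])


def det2(J):
    """Closed-form determinant of every 2 x 2 page of J, shape (nel,)."""
    return J[0, 0] * J[1, 1] - J[0, 1] * J[1, 0]


def B_pages(dN_dX):
    """Strain matrices of all elements, shape (3, 2n, nel), rows [xx, yy, xy]."""
    d, n, nel = dN_dX.shape
    B = np.zeros((3, d * n, nel))
    for s in range(n):                       # single loop over nodes, step (iii)
        B[0, 2 * s] = dN_dX[0, s]
        B[1, 2 * s + 1] = dN_dX[1, s]
        B[2, 2 * s] = dN_dX[1, s]
        B[2, 2 * s + 1] = dN_dX[0, s]
    return B


def stiffness_pages(Pts, C):
    """All element stiffness matrices at once, shape (2n, 2n, nel)."""
    d, n, nel = Pts.shape
    g = 1.0 / np.sqrt(3.0)                   # 2 x 2 Gauss points, unit weights
    Ke = np.zeros((d * n, d * n, nel))
    for xi, eta in [(-g, -g), (g, -g), (g, g), (-g, g)]:
        dNt = dN_dtheta_quad4(xi, eta)                         # d x n
        J = np.einsum('an,bne->abe', dNt, Pts)                 # step (i): d x d x nel
        rhs = np.broadcast_to(dNt, (nel, d, n))
        dN_dX = np.linalg.solve(np.moveaxis(J, 2, 0), rhs)    # step (ii), pages first
        B = B_pages(np.moveaxis(dN_dX, 0, 2))                  # step (iii)
        CB = np.einsum('ij,jke->ike', C, B)
        Ke += np.einsum('ike,ile->kle', B, CB) * det2(J)       # steps (iv)-(v)
    return Ke


E, nu = 1.0, 0.3
C = E / (1 - nu**2) * np.array([[1, nu, 0], [nu, 1, 0], [0, 0, (1 - nu) / 2]])  # plane stress
sq = np.array([[0.0, 1, 1, 0], [0.0, 0, 1, 1]])                # unit square (d x n)
skew = np.array([[0.0, 2, 2.5, -0.5], [0.0, 0.2, 1.5, 1.0]])   # convex, distorted quad
Pts = np.stack([sq, 2 * sq + 5, skew], axis=2)                  # 3 elements
Ke = stiffness_pages(Pts, C)                                    # shape (8, 8, 3)
```

Two checks are built into the example. First, the unit square and its scaled, translated copy yield identical matrices, since in 2D with unit thickness \(\mathcal {K}_e\) is invariant under uniform scaling. Second, rigid-body translations and rotations lie in the null space of every element matrix.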

Remark 3.1

For any element e with domain \({ \Omega }^{(e)}_h \subset { \Omega }_h\subset \ {\mathbb {R}}^d\), the row and column indices (\({IndexI}_{e}\) and \({IndexJ}_{e}\)) of its stiffness components can be obtained directly from its degree-of-freedom connectivity matrix, after a single for-loop over the number of degrees of freedom per node.
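Remark 3.1 can be illustrated with a sketch that builds a degree-of-freedom connectivity matrix with a single loop over m and then expands it into IndexI and IndexJ. The layout assumed here is an illustrative choice and may differ from Algorithm 3: one column per element, 0-based numbering, and local dof m*a + j of node a mapped to global dof m*conn[e, a] + j. The expansion uses broadcast views, so no index data is copied until the arrays are flattened.

```python
import numpy as np


def dof_connectivity(conn, m):
    """Matrix_dof of shape (m*n, nel): global dofs of each element (one column each).

    conn : (nel, n) 0-based node numbers; m : degrees of freedom per node.
    Local dof m*a + j of node a maps to global dof m*conn[e, a] + j.
    """
    nel, n = conn.shape
    Matrix_dof = np.empty((m * n, nel), dtype=np.int32)    # 32-bit, as in the text
    for j in range(m):                                     # single loop over m
        Matrix_dof[j::m, :] = (m * conn + j).T
    return Matrix_dof


def triplet_indices(Matrix_dof):
    """IndexI, IndexJ of shape (m*n, m*n, nel), aligned entry by entry with K_e."""
    k, nel = Matrix_dof.shape
    IndexI = np.broadcast_to(Matrix_dof[:, None, :], (k, k, nel))  # row index
    IndexJ = np.broadcast_to(Matrix_dof[None, :, :], (k, k, nel))  # column index
    return IndexI, IndexJ


# Two 4-node quadrilaterals sharing the edge between nodes 1 and 4 (m = 2)
conn = np.array([[0, 1, 4, 3], [1, 2, 5, 4]])
Matrix_dof = dof_connectivity(conn, m=2)
IndexI, IndexJ = triplet_indices(Matrix_dof)
```

With this layout, page e of IndexI and IndexJ matches page e of the element stiffness array entry by entry.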

Algorithm 2 Determinant of the Jacobian J

Algorithm 3 Computation of \(Matrix\_dof\)

  (e)

    As a final step, after IndexI, IndexJ and \(\mathcal {K}_e\) have been constructed, the global stiffness matrix is assembled with MATLAB’s built-in function \(\texttt {sparse}\):

    $$\begin{aligned} \mathcal {K}_g\leftarrow \texttt {sparse}(IndexI{\texttt {(:)}},IndexJ{\texttt {(:)}},\texttt {reshape}(\mathcal {K}_e,[ \ ],1)) \end{aligned}$$
    (3.10)
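The pairing in Eq. (3.10) relies on IndexI, IndexJ and \(\mathcal {K}_e\) being flattened in the same order; MATLAB's (:) and reshape both use column-major order. Below is a NumPy sketch of this step. Here `np.add.at` emulates the summation of duplicates performed by \(\texttt {sparse}\), and a dense matrix is used only for readability. The two-element bar example is an assumption for illustration.

```python
import numpy as np


def assemble(IndexI, IndexJ, Ke, ndof):
    """NumPy analogue of sparse(IndexI(:), IndexJ(:), reshape(Ke, [], 1)).

    The three arrays are flattened in the SAME (column-major, 'F') order as
    MATLAB's (:) and reshape; values sharing an (i, j) pair are summed.
    """
    I = np.ravel(IndexI, order='F')
    J = np.ravel(IndexJ, order='F')
    V = np.ravel(Ke, order='F')
    K = np.zeros((ndof, ndof))            # dense only for readability
    np.add.at(K, (I, J), V)
    return K


# Two 1D bar elements (m = 1, n = 2): element e has dofs Matrix_dof[:, e]
Matrix_dof = np.array([[0, 1], [1, 2]])
k, nel = Matrix_dof.shape
IndexI = np.broadcast_to(Matrix_dof[:, None, :], (k, k, nel))
IndexJ = np.broadcast_to(Matrix_dof[None, :, :], (k, k, nel))
Ke = np.repeat(np.array([[1.0, -1.0], [-1.0, 1.0]])[:, :, None], nel, axis=2)
K = assemble(IndexI, IndexJ, Ke, ndof=3)
```

Any flattening order gives the same global matrix, as long as it is applied identically to all three arrays. Mixing orders would pair values with the wrong indices.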

In summary, the proposed method is outlined in Algorithm 4 as the function \({\textbf {PropGlobalStiffness}}\). To minimize memory usage, some variables are cleared at certain points during execution (see lines 13, 15, 18, 26 and 27).

Algorithm 4 Proposed algorithm for computing the stiffness matrix \(\mathcal {K}_g\)

For the sake of simplicity, some lines of Algorithm 4 have been intentionally omitted. These are in particular the lines that retrieve mesh data (nel, n, d, m, etc.) from the structure array \(FE\_model\) using dot notation (e.g., \( \texttt {nel\ =\ FE\_model.nel}\)).

3.2 Extension of the proposed vectorized algorithm for other types of problems

3.2.1 For functionally graded materials

Recall that in functionally graded materials, the material properties vary in space owing to spatial changes in composition and/or microstructure. Young’s modulus E and/or Poisson’s ratio \(\nu \) can therefore be described as functions of the position of the observation point \(M \in \Omega \). The same holds for the elasticity tensor \(\mathcal {C}\left( M\right) \), so that the constitutive law reads

$$\begin{aligned} \sigma =\mathcal {C}\left( M\right) :\varepsilon \end{aligned}$$
(3.11)

The element stiffness matrix of such materials is computed in a straightforward manner once the element elasticity tensor has been evaluated at the current Gauss quadrature point g. The spatial position of each Gauss point can be computed from the element nodal coordinates using the shape functions, as given in Eq. (3.12):

$$\begin{aligned} x^{(e)}_g=\sum ^{n}_{i=1}{N_i{x}^{(e)}_i}={\mathcal {N}_{e}}{{x}}_e;\qquad y^{(e)}_g=\sum ^{n}_{i=1}{N_i{y}^{(e)}_i}={\mathcal {N}_{e}}{{y}}_e;\qquad z^{(e)}_g=\sum ^{n}_{i=1}{N_i{z}^{(e)}_i}={\mathcal {N}_{e}}{{z}}_e \end{aligned}$$
(3.12)
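Eq. (3.12) can be evaluated for every element at once by contracting the shape-function vector with a stack of nodal coordinates, in the same spirit as the page-wise products used in this section. The following NumPy sketch assumes bilinear 4-node quadrilaterals with counterclockwise node ordering and a hypothetical two-element mesh. The names `q4_shape` and `coords`, and the chosen Gauss point, are illustrative and are not taken from the authors' code.

```python
import numpy as np


def q4_shape(xi, eta):
    """Bilinear 4-node shape functions at (xi, eta); nodes ordered counterclockwise."""
    return 0.25 * np.array([(1 - xi) * (1 - eta),
                            (1 + xi) * (1 - eta),
                            (1 + xi) * (1 + eta),
                            (1 - xi) * (1 + eta)])


# Nodal coordinates of all elements, shape (nel, nen, ndim):
# a hypothetical mesh of two unit squares side by side.
coords = np.array([[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]],
                   [[1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]])

# Current Gauss point: first point of the 2x2 Gauss-Legendre rule.
g = -1.0 / np.sqrt(3.0)
Ng = q4_shape(g, g)                       # shape (nen,)

# Eq. (3.12) for every element at once: row e of xg is (x_g, y_g) of element e.
xg = np.einsum("i,eid->ed", Ng, coords)   # shape (nel, ndim)
```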

Since the spatial expressions of E and/or \(\nu \) are known parameters of the problem, their values at the current Gauss point g can be computed in 3D as per Eq. (3.13):

$$\begin{aligned} E\left( M_g\right) =E\left( x^{(e)}_g,y^{(e)}_g,z^{(e)}_g\right) ;\qquad \nu \left( M_g\right) =\nu \left( x^{(e)}_g,y^{(e)}_g,z^{(e)}_g\right) \end{aligned}$$
(3.13)

For a basic illustration of the vectorized algorithm, let us assume that \(\nu \) is constant and that \(E\left( M\right) \) varies only with x. The elasticity tensor is then given in 3D by

$$\begin{aligned} \mathcal {C}\left( M_g \right) =\frac{E\left( M_g\right) }{\left( 1+\nu \right) \left( 1-2\nu \right) }\left[ \begin{array}{cccccc} 1-\nu &{}\nu &{}\nu &{}0&{}0&{}0\\ \nu &{} 1-\nu &{} \nu &{}0&{}0&{}0\\ \nu &{}\nu &{} 1-\nu &{}0&{}0&{}0\\ 0&{} 0&{}0&{}\frac{1}{2}\left( 1-2\nu \right) &{}0&{}0 \\ 0&{}0&{}0&{}0&{}\frac{1}{2}\left( 1-2\nu \right) &{}0\\ 0&{}0&{}0&{}0&{}0&{} \frac{1}{2}\left( 1-2\nu \right) \\ \end{array}\right] \end{aligned}$$
(3.14)

or simply

$$\begin{aligned} \mathcal {C}\left( x \right) =E\left( x\right) \mathcal {C}_b \end{aligned}$$
(3.15)

where \(\mathcal {C}_b\) is the material tensor for a unit Young’s modulus. The elasticity tensor at the Gauss point g is then computed for all elements of the model with a single command:

$$\begin{aligned} \mathcal {C}\left( x_g \right) =\texttt {pagemtimes}(E\left( \texttt {pagemtimes} \left( Pts\left( 1,:,:\right) ,Ng\right) \right) ,\mathcal {C}_b ) \end{aligned}$$
(3.16)

In the previous expression, the user defines the analytical expression of \(E\left( x\right) \) as a MATLAB function handle. \(Pts\left( 1,:,:\right) \) holds the nodal x-coordinates of all elements, one page per element. Ng is a column vector containing the element shape functions evaluated at the current integration point. By the same token, the material tensor of any functionally graded material can be computed in a vectorized manner without much effort, provided the analytical expressions of \(E\left( x\right) \) and/or \(\nu \left( x\right) \) are known in advance.
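The same broadcasting idea gives \(\mathcal {C}\left( x_g\right) =E\left( x_g\right) \mathcal {C}_b\) (Eq. (3.15)) for all elements in one operation. The NumPy sketch below builds \(\mathcal {C}_b\) from Eq. (3.14) with E = 1 and scales it by a hypothetical graded modulus. The function `young_modulus` merely stands in for the user-defined MATLAB function handle, and the Gauss-point coordinates are made-up values.

```python
import numpy as np

nu = 0.3  # constant Poisson's ratio

# Unit-modulus 3D isotropic elasticity matrix C_b (Voigt notation), Eq. (3.14) with E = 1.
lam = nu / ((1 + nu) * (1 - 2 * nu))   # Lame's first parameter for E = 1
mu = 1.0 / (2 * (1 + nu))               # shear modulus for E = 1
C_b = np.zeros((6, 6))
C_b[:3, :3] = lam
C_b[:3, :3] += 2 * mu * np.eye(3)
C_b[3:, 3:] = mu * np.eye(3)


def young_modulus(x):
    """Hypothetical graded law E(x); stands in for the user's function handle."""
    return 210e3 * np.exp(0.5 * x)


# x-coordinates of the current Gauss point in each element (e.g. from Eq. (3.12)).
xg = np.array([0.0, 0.5, 1.0])

# C(x_g) = E(x_g) * C_b for all elements at once; result has shape (nel, 6, 6).
C_g = young_modulus(xg)[:, None, None] * C_b
```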

3.2.2 For plate problems with Mindlin formulation

The Mindlin plate formulation improves on the Kirchhoff–Love plate theory by accounting for shear deformation through the thickness of thick plates. In this formulation, a section initially perpendicular to the middle surface does not remain perpendicular to it after deformation, so the displacement field can be written as follows:

$$\begin{aligned} u=z{\theta _x};\qquad v=z{\theta _y};\qquad w=w_0; \end{aligned}$$
(3.17)

where \(\theta _x\) and \(\theta _y\) are the rotations with respect to the x and y axes, respectively. As in the Kirchhoff–Love plate theory, the problem is described from the middle surface, as illustrated in Fig. 6.

Fig. 6

Illustration of Mindlin plate formulation

With this approximation, the bending component \(\epsilon _f\) of the strain tensor reads

$$\begin{aligned} \epsilon _x&=\displaystyle \frac{\partial {u}}{\partial {x}}=z\displaystyle \frac{\partial {\theta _x}}{\partial {x}} \end{aligned}$$
(3.18a)
$$\begin{aligned} \epsilon _y&=\displaystyle \frac{\partial {v}}{\partial {y}}=z\displaystyle \frac{\partial {\theta _y}}{\partial {y}} \end{aligned}$$
(3.18b)
$$\begin{aligned} \gamma _{xy}&=\displaystyle \frac{\partial {u}}{\partial {y}}+\displaystyle \frac{\partial {v}}{\partial {x}}=z\left( \displaystyle \frac{\partial {\theta _y}}{\partial {x}}+\displaystyle \frac{\partial {\theta _x}}{\partial {y}}\right) , \end{aligned}$$
(3.18c)

while the shear component \(\epsilon _c\) is given by

$$\begin{aligned} \gamma _{xz}&=\displaystyle \frac{\partial {w}}{\partial {x}}+\displaystyle \frac{\partial {u}}{\partial {z}}=\displaystyle \frac{\partial {w}}{\partial {x}}+\theta _x \end{aligned}$$
(3.19a)
$$\begin{aligned} \gamma _{yz}&=\displaystyle \frac{\partial {w}}{\partial {y}}+\displaystyle \frac{\partial {v}}{\partial {z}}=\displaystyle \frac{\partial {w}}{\partial {y}}+\theta _y \end{aligned}$$
(3.19b)

Under the assumptions of small strains and a homogeneous material, the bending part of the Cauchy stress \(\sigma \) reads

$$\begin{aligned} \sigma _f={\mathcal {C}_f}{\epsilon _{f}} \end{aligned}$$
(3.20)

where \({\mathcal {C}_f}\) is defined as

$$\begin{aligned} {\mathcal {C}_f}=\frac{E}{\left( 1-\nu ^2\right) }\left[ \begin{array}{ccc} 1&{}\nu &{}0\\ \nu &{} 1&{} 0\\ 0&{}0&{} \displaystyle \frac{1-\nu }{2}\\ \end{array}\right] \end{aligned}$$
(3.21)

Shear stress components are given by

$$\begin{aligned} \sigma _{c}={\mathcal {C}_c}{\epsilon _{c}} \end{aligned}$$
(3.22)

where \({\mathcal {C}_c}\) is defined as

$$\begin{aligned} {\mathcal {C}_c}=\kappa \frac{E}{2\left( 1+\nu \right) }\left[ \begin{array}{ccc} 1&{}0 \\ 0 &{} 1 \\ \end{array}\right] \end{aligned}$$
(3.23)

where \(\kappa \) is the shear correction factor. Since the displacement field \(U=\left( u,v, w\right) \) is fully determined by \(d^T=\left( \theta _x,\theta _y, w\right) \) through Eq. (3.17), the following interpolation is assumed in the discretized space:

$$\begin{aligned} \theta _x=\sum ^{n}_{i=1}{N_i{\theta _x}_i};\qquad \theta _y=\sum ^{n}_{i=1}{N_i{\theta _y}_i};\qquad w=\sum ^{n}_{i=1}{N_i{w_i}} \end{aligned}$$
(3.24)

By virtue of this approximation, the bending and shear strain matrices, \(B_f\) and \(B_c\), can be written as

$$\begin{aligned} B_{f}\mathrm {=}\left[ \begin{array}{cccccccccc} N_{1,x}&{}0&{}0&{}\dots &{} N_{n,x} &{}0&{}0 \\ 0&{} N_{1,y}&{}0 &{} \dots &{} 0 &{} N_{n,y} &{}0\\ N_{1,y}&{} N_{1,x}&{}0 &{} \dots &{} N_{n,y} &{} N_{n,x} &{}0\\ \end{array}\right] ; B_{c}\mathrm {=}\left[ \begin{array}{cccccccccc} N_{1}&{} 0 &{}N_{1,x} &{}\dots &{}N_{n}&{} 0 &{}N_{n,x}\\ 0 &{} N_{1} &{}N_{1,y} &{}\dots &{}0&{} N_{n} &{}N_{n,y}\\ \end{array}\right] \end{aligned}$$
(3.25)
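To make the column layout of Eq. (3.25) concrete, the sketch below assembles \(B_f\) and \(B_c\) at one integration point for the nodal ordering \(d=(\theta _x,\theta _y,w)\), repeated node by node. It is written in Python/NumPy. The helper `mindlin_B` is hypothetical and is not the authors' getBplate function. The shape-function derivatives are assumed to be given with respect to x and y.

```python
import numpy as np


def mindlin_B(N, dNdx, dNdy):
    """Bending (3 x 3n) and shear (2 x 3n) strain matrices of a Mindlin plate element.

    Hypothetical helper. Nodal DOFs are ordered (theta_x, theta_y, w) node by node;
    N, dNdx, dNdy hold the n shape functions and their x/y derivatives at one point.
    """
    n = N.size
    Bf = np.zeros((3, 3 * n))
    Bc = np.zeros((2, 3 * n))
    tx = slice(0, None, 3)  # columns of theta_x
    ty = slice(1, None, 3)  # columns of theta_y
    w = slice(2, None, 3)   # columns of w
    # Bending part (z factored out): [d(theta_x)/dx, d(theta_y)/dy, d(theta_x)/dy + d(theta_y)/dx]
    Bf[0, tx] = dNdx
    Bf[1, ty] = dNdy
    Bf[2, tx] = dNdy
    Bf[2, ty] = dNdx
    # Shear part: [dw/dx + theta_x, dw/dy + theta_y]
    Bc[0, tx] = N
    Bc[0, w] = dNdx
    Bc[1, ty] = N
    Bc[1, w] = dNdy
    return Bf, Bc


# Example: unit-square Q4 element (nodes counterclockwise from the origin) at its center.
N = np.full(4, 0.25)
dNdx = np.array([-0.5, 0.5, 0.5, -0.5])
dNdy = np.array([-0.5, -0.5, 0.5, 0.5])
Bf, Bc = mindlin_B(N, dNdx, dNdy)

# Nodal values theta_x = x, theta_y = 0, w = 0.
d = np.zeros(12)
d[0::3] = [0.0, 1.0, 1.0, 0.0]
curvatures = Bf @ d    # expected (1, 0, 0)
shear = Bc @ d         # expected (0.5, 0)
```

Imposing \(\theta _x=x\) gives a unit curvature \(\partial \theta _x/\partial x\) and a shear strain equal to \(\theta _x\) at the element center, which is a quick way to check the column pattern.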

Substituting Eqs. (3.20), (3.22), (3.24) and (3.25) into Eq. (2.9) and simplifying yields Eq. (3.26):

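$$\begin{aligned} \mathcal {K}_e=\int _{\Omega _e}\frac{h^3}{12}B_f^T\mathcal {C}_fB_f\,d\Omega +\int _{\Omega _e}h\,B_c^T\mathcal {C}_cB_c\,d\Omega ;\qquad \mathcal {M}_e=\int _{\Omega _e}\rho \,\mathcal {N}_e^T\overline{M}\,\mathcal {N}_e\,d\Omega \end{aligned}$$
(3.26)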

where \(\overline{M}\) reads as

$$\begin{aligned} \overline{M}=\displaystyle \left[ \begin{array}{ccc} \frac{h^3}{12}&{}0&{}0 \\ 0 &{} \frac{h^3}{12}&{}0 \\ 0 &{} 0&{}h \\ \end{array}\right] \end{aligned}$$
(3.27)
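The factors \(h^3/12\) and h in Eq. (3.27) come from integrating through the thickness, assuming the middle surface lies at z = 0 and the density \(\rho \) is uniform. With the displacement field of Eq. (3.17), twice the kinetic energy per unit area of the middle surface is

$$\begin{aligned} \int _{-h/2}^{h/2}\rho \left( \dot{u}^2+\dot{v}^2+\dot{w}^2\right) dz=\rho \left( \dot{\theta }_x^2\int _{-h/2}^{h/2}z^2\,dz+\dot{\theta }_y^2\int _{-h/2}^{h/2}z^2\,dz+\dot{w}^2\int _{-h/2}^{h/2}dz\right) =\rho \left( \frac{h^3}{12}\dot{\theta }_x^2+\frac{h^3}{12}\dot{\theta }_y^2+h\,\dot{w}^2\right) =\rho \,\dot{d}^T\overline{M}\,\dot{d} \end{aligned}$$

The same factor \(\int _{-h/2}^{h/2}z^2\,dz=h^3/12\) multiplies the bending term of the stiffness matrix, because the bending strains of Eq. (3.18) are proportional to z.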

After assembly, we obtain the following system of ordinary differential equations (in time):

$$\begin{aligned} {M_g}{{\ddot{d}}}+{K_g}{d}={F_{ext}}\,\quad d \in {{\mathbb {R}}^{nn*m}} \end{aligned}$$
(3.28)

Note that, in Eq. (3.28), the shear contribution to the stiffness matrix is computed with reduced integration, which prevents shear locking.
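Reduced integration means evaluating the shear term with fewer Gauss points than the bending term. For bilinear 4-node elements, a common choice is a full 2×2 Gauss-Legendre rule for bending and a single point for shear. This choice is assumed here and is not stated by the authors. The NumPy sketch below builds both tensor-product rules with `numpy.polynomial.legendre.leggauss`; the helper `gauss_rule_2d` is hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def gauss_rule_2d(n):
    """Tensor-product Gauss-Legendre rule with n x n points on [-1, 1]^2."""
    pts, wts = leggauss(n)                      # 1D points and weights
    xi, eta = np.meshgrid(pts, pts, indexing="ij")
    weights = np.outer(wts, wts)                # products of 1D weights
    return np.column_stack([xi.ravel(), eta.ravel()]), weights.ravel()


# Assumed choice for Q4 Mindlin elements: full 2x2 rule for the bending term,
# reduced 1-point rule for the shear term to avoid shear locking.
bend_pts, bend_wts = gauss_rule_2d(2)
shear_pts, shear_wts = gauss_rule_2d(1)
```

Each element matrix is then a weighted sum over the points of its rule, times the Jacobian determinant. Because the reduced rule has a single point at the element center, the shear stiffness of a Q4 element is sampled only there.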

Eq. (3.28) is vectorized in the same way as the general formulation of the proposed algorithm presented in the previous section, in particular for the construction of \(B_{f}\) and \(B_{c}\). The functions \({\textbf {getBplate}}\), \({\textbf {PropStiffnessPlate}}\) and \({\textbf {PropGlobalForce}}\) were created to implement the Mindlin plate theory. Their detailed structure is provided in the Appendix.

4 Numerical experiments

This section is devoted to the verification and validation of the proposed vectorized algorithm for linear elastic boundary value problems.

  • For verification and performance analysis, the commercial software Simulia ABAQUS [38] is used to model and analyze a linear elastic L-shaped domain (see Sect. 4.1), a clamped plate based on the Mindlin theory (see Sect. 4.3), a functionally graded membrane in tension (see Sect. 4.4) and a functionally graded 3D beam (see Sect. 4.5). Its results and performance are then compared with those of the proposed method.

  • To demonstrate numerically the advantages of the proposed algorithm over two existing MATLAB-based vectorized algorithms [15, 29], a performance test based on computational time and memory usage is conducted in Sect. 4.2. For each method, the global stiffness matrix of a 2D beam problem and then of a 3D plate problem is computed on a set of meshes (from coarse to fine), and the associated execution time and memory usage are recorded and compared.

4.1 Example 1: Elastic L-shaped structure

We consider a 5-cm-thick linear elastic L-shaped structure with the geometry shown in Fig. 7. Its material properties are Young’s modulus E = 210 GPa, Poisson’s ratio \(\nu \) = 0.3 and density \(\rho \) = 7850 kg/m\(^3\). The structure is loaded by a concentrated force of 200 kN directed along the negative y-axis, as represented in Fig. 7. For simplicity, the problem is restricted to a plane-stress analysis.

Fig. 7

Schematic drawing of the L-shaped structure

The computing resource used for this simulation is a gaming desktop equipped with an AMD Ryzen 3 1200 processor (clock frequency 3.1 GHz), 24 GB of RAM and the Windows 10 Pro 64-bit operating system. Simulia ABAQUS/CAE 2020 [38] and MATLAB R2024a, which we used to carry out the performance comparisons, are installed on this machine.

First, ABAQUS was used to create the geometry, define the material properties and load, enforce the boundary condition on the edge y = 0, and generate the set of \(P_{1}\) triangle meshes (linear triangles) whose data are reported in Table 2.

Table 2 Set of linear triangle element meshes

Subsequently, the meshes created in Simulia ABAQUS were imported into MATLAB as topology inputs. The simulations were then carried out with ABAQUS and with the proposed algorithm in MATLAB. The vertical displacement \(U_y\) obtained with the first mesh is displayed in Fig. 8.

Fig. 8

Visualization of the displacement along y-axis: a in ABAQUS/CAE; b with proposed algorithm in MATLAB

Both simulations yield the same vertical displacement. Having verified the accuracy of the proposed method, we compare the computational time of ABAQUS with that of the proposed algorithm on the set of meshes listed in Table 2.

Fig. 9

Runtime performance analysis of the L-shaped structure: a CPU time for assembling \(K_g\) as per the \( Standard \) and \( proposed \) algorithm; b Total CPU time of the analysis (global matrix/force construction + solving the system of equations)

The results show that the proposed method reduces the global matrix construction time by a factor of 10 compared with the standard algorithm (see Fig. 9a). As a result, the total runtime of the FEM analysis is reduced by a factor of 5 compared with ABAQUS (see Fig. 9b).

4.2 Example 2: Comparison with the existing vectorized algorithms

In this part, we report only the execution times and memory usage for computing the global stiffness matrix with each algorithm under consideration, for a selection of 2D and 3D problems. For both examples, the simulations were conducted on a computer with the specifications listed in Table 3.

Table 3 Computer specifications

4.2.1 Case 1: 2D beam problem

To assess the performance of the proposed vectorized algorithm, we compared it with two other vectorized algorithms, using Algorithm 1 as the basis for comparison. For this purpose, a set of six \(P_{1}\) triangle element meshes was used as topology inputs. The corresponding data are reported in Table 4.

Table 4 Set of linear triangle element meshes

Here, the number of degrees of freedom per node is \(m=2\), so \(ndof=m*nn\) for each mesh ID in Table 4.

The codes implementing the two other vectorized algorithms considered in this study [15, 29] were obtained from the authors’ platforms [39, 40]. Hereafter, these algorithms are referred to as \({Cuvelier\ et\ al}\) and \({Anjam\ et\ al}\), respectively. We also used the external function fsparse, provided with the FAST package [41] and available on the GitHub repository of that package’s first author [42]. This function constructs a sparse matrix from the triplet (IndexI, IndexJ, Kvalues) faster than MATLAB’s built-in sparse.

We ran the standard algorithm, the proposed method (first with sparse, then with fsparse), and the \({Cuvelier\ et\ al}\) and \({Anjam\ et\ al}\) algorithms on this set of 2D meshes. The recorded computational times are depicted in Fig. 10.

Note that the proposed method assembled with sparse, labeled “proposed”, is represented by the cyan dashed line with star markers.

Fig. 10

Visualization of \(\mathcal {K}_g\) computational time for a set of 2D meshes with \(P_1\) triangle elements: a runtime versus the number of degrees of freedom; b runtime versus ndof in log–log scale

Fig. 10b shows that, in log–log scale, the lines fitting the data points of the different algorithms have nearly the same slope, which means their runtimes scale similarly with ndof. It is therefore more convenient to compare the algorithms through their speedup ratio with respect to the standard algorithm (see Fig. 11).
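Both quantities used in this comparison are easy to compute from recorded timings. The first is the slope of log t versus log ndof, which is the empirical scaling exponent. The second is the speedup ratio \(t_{standard}/t_{algorithm}\). The NumPy sketch below uses synthetic power-law timings, not the measured data of Fig. 10. It shows that two curves with the same log–log slope differ only by a constant speedup.

```python
import numpy as np

# Synthetic timings (NOT measured data): power laws t = c * ndof**p with the same p.
ndof = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
t_standard = 2e-5 * ndof**1.1
t_vectorized = 2e-6 * ndof**1.1

# Slope of the runtime curve in log-log scale (empirical scaling exponent).
slope, _ = np.polyfit(np.log10(ndof), np.log10(t_vectorized), 1)

# Speedup with respect to the standard algorithm, mesh by mesh.
speedup = t_standard / t_vectorized
```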

Fig. 11

Visualization of the computational speedup with respect to the standard algorithm for a set of 2D meshes with \(P_1\) triangle elements: a speedup versus the number of degrees of freedom; b boxplot of the speedup of the algorithms

The boxplot in Fig. 11b shows that the dispersion of the speedup of the proposed algorithm is nearly the same with the assembly functions sparse and fsparse. With sparse, the proposed scheme is about 9x faster than the standard algorithm, whereas \({Cuvelier\ et\ al}\)’s approach is about 6x faster and \({Anjam\ et\ al}\)’s method about 4x faster. When executed with fsparse instead, the proposed algorithm is about 15x faster.

Moreover, the memory analysis carried out with the MATLAB Profiler reveals that \({Cuvelier\ et\ al}\)’s approach is less memory-intensive than the proposed method, the standard method and \({Anjam\ et\ al}\)’s approach, as illustrated in Fig. 12.

Fig. 12

Memory usage for computing \(\mathcal {K}_g\) per algorithm with respect to mesh ID for the selected 2D problem

4.2.2 Case 2: 3D plate problem

Proceeding as in Sect. 4.2.1, we consider a simply supported plate discretized with a set of six \(P_{1}\) tetrahedral element meshes, whose data are reported in Table 5.

Table 5 Set of linear tetrahedral element meshes

Here, the number of degrees of freedom per node is \(m=3\), so \(ndof=m*nn\) for each mesh ID in Table 5.

Running each algorithm as in the 2D case yields the runtimes depicted in Fig. 13.

Fig. 13

Visualization of computational time for a set of 3D meshes with \(P_{1}\) tetrahedral elements: a runtime versus the number of degrees of freedom; b runtime versus ndof in log-log scale

Fig. 14a shows that the speedup of the proposed algorithm with sparse decreases from mesh 1 to mesh 3 and then stabilizes at about 3.8 for larger meshes. Even so, this speedup is at least 1.5x that of \({Cuvelier\ et\ al}\) and 2x that of \({Anjam\ et\ al}\). The use of fsparse, in contrast, maintains an approximately constant speedup of about 8 regardless of mesh size. The boxplot in Fig. 14b gives a more comprehensive view of the speedup with respect to the standard algorithm.

Fig. 14

Visualization of the computational speedup with respect to the standard algorithm for a set of 3D meshes with \(P_1\) tetrahedral elements: a speedup versus the number of degrees of freedom; b boxplot of the speedup of the algorithms

On the memory front, it can be seen in Fig. 15 that the proposed method uses almost as much memory as the standard method.

Fig. 15

Memory usage for computing \(\mathcal {K}_g\) per algorithm with respect to mesh ID for the selected 3D plate problem

4.3 Example 3: Clamped thick plate

Consider the isotropic and homogeneous thick plate depicted in Fig. 16, with mechanical properties \(E=10920\) MPa and \(\nu =0.3\). The plate is clamped on its four edges and subjected to a uniformly distributed transverse pressure of intensity 1 MN/m\(^2\) applied on its top surface. This example is taken from the reference work by Ferreira [43].

Fig. 16

Sketch of the plate

To carry out a comprehensive comparison between the performance of the proposed algorithm and that of ABAQUS for the Mindlin plate theory, six meshes of 4-node quadrilateral elements (QUA4 or Q4) were created in ABAQUS and imported into MATLAB as topology inputs. The properties of these meshes are summarized in Table 6.

Table 6 Set of 4-node quadrilateral element meshes

First, mesh 1 was used to verify the vectorized algorithm given in Appendix 1 against the results obtained in ABAQUS. The simulations were conducted on the computer specified in Sect. 4.1. Figure 17 compares the components of the solution vector \(d=(\theta _x,\theta _y,w=U_z)\) of the Mindlin plate formulation for mesh 1.

Fig. 17

Static result of the deformation of the plate: in ABAQUS/CAE (on the left-hand side) and in MATLAB with proposed algorithm (on the right-hand side)

For simplicity, a shear correction factor \(\kappa =\kappa _x=\kappa _y=\frac{5}{6}\) was used in the MATLAB implementation of this example. Both sets of results agree with a relative error below 1\(\%\). Furthermore, the displacement \(U_z=-1.503\ mm\) at the center of the plate is consistent with the value obtained by Ferreira [43] for the same value of \(\kappa \).

Under the same boundary conditions, we have conducted a modal analysis of the plate (with mesh 2) based on the vectorized form of the Mindlin plate formulation outlined in Eq. (3.26). Table 7 provides a comparison of the six smallest natural frequencies obtained using ABAQUS and our MATLAB algorithm, respectively.

Table 7 Comparison of obtained frequencies

Table 7 shows that the relative error again remains below \(1\%\). To perform a modal analysis, the parameter AnalysisType is set to “modal” with the command FE_model.AnalysisType=“modal” in the “Material parameters, analysis parameters and applied load” section of the main file MainProgram. The mode shapes associated with the six smallest natural frequencies, depicted in Fig. 18, were generated with the file ModeShape.m, created for this specific task.
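Modal analysis amounts to solving the generalized eigenproblem \(K_g\phi =\omega ^2M_g\phi \), obtained from Eq. (3.28) with \(F_{ext}=0\) and \(d=\phi e^{i\omega t}\). The natural frequencies are then \(f=\omega /(2\pi )\). The dense NumPy sketch below reduces the problem to a symmetric standard one through a Cholesky factorization of the mass matrix. Its helper `natural_frequencies` is hypothetical and is neither the authors' ModeShape.m nor their MATLAB solver. For real FE models, a sparse eigensolver that computes only the smallest modes would be used instead.

```python
import numpy as np


def natural_frequencies(K, M):
    """Solve K phi = omega^2 M phi (M symmetric positive definite).

    Returns the frequencies f = omega / (2 pi) in ascending order and the mode shapes.
    """
    L = np.linalg.cholesky(M)            # M = L L^T
    Linv = np.linalg.inv(L)
    A = Linv @ K @ Linv.T                # equivalent symmetric standard eigenproblem
    lam, y = np.linalg.eigh(A)           # eigenvalues omega^2, ascending
    phi = Linv.T @ y                     # back-transform to mode shapes
    omega = np.sqrt(np.clip(lam, 0.0, None))
    return omega / (2 * np.pi), phi


# Hypothetical 2-DOF system with unit masses.
K = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
M = np.eye(2)
f, phi = natural_frequencies(K, M)
```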

Fig. 18

Visualization of the six smallest mode shapes of the square plate

As in the previous example, we analyzed the performance of the static analysis using the meshes listed in Table 6. Figure 19 compares the computational costs.

Fig. 19

Runtime performance analysis of the thick square plate: a CPU time for assembling \(K_g\) as per the Standard and proposed algorithm; b Total CPU time of the analysis (global matrix/force construction + solving the system of equations)

Fig. 20

Sketch of the FG membrane

Fig. 19 shows that, in this example as well, the proposed method reduces the matrix/force construction time by a factor of at least 10. As a result, the whole FEM process is accelerated by a factor of about 5 compared with ABAQUS.

4.4 Example 4: Functionally graded membrane

In this example, we assess the performance of the proposed method on a functionally graded membrane problem. For simplicity, we consider the example investigated by Martínez-Pañeda [44], with the dimensions sketched in Fig. 20.

Poisson’s ratio is constant, \(\nu =0.30\), whereas Young’s modulus E varies along the x-direction according to Eq. (4.1).

$$\begin{aligned} E\left( x\right) =E_0{e^{{\beta }x}} \end{aligned}$$
(4.1)

where \(\beta =\ln \left( 8\right) \) and \(E_0=1\). The membrane is \(1\ m\) thick and is loaded by a uniform tensile stress \(\sigma _0=2\) MPa applied on its top edge.
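Since \(\beta =\ln (8)\), Eq. (4.1) can be rewritten in a form that makes the grading explicit:

$$\begin{aligned} E\left( x\right) =E_0e^{x\ln 8}=E_0\,8^{x},\qquad \frac{E\left( x+1\right) }{E\left( x\right) }=8 \end{aligned}$$

In other words, starting from \(E\left( 0\right) =E_0\), Young’s modulus increases eightfold over every unit length along x, in the length unit used for x.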

Fig. 21

Exponential variation of Young’s modulus E along the x-axis: a Plot of E versus x; b Distribution of E within the functionally graded membrane

Table 8 Set of 4-node quadrilateral element meshes

The analytical solution for this problem is given by

$$\begin{aligned} U_x\left( x,y\right) =-\nu \left( \frac{A}{2}x^2+Bx\right) -\frac{A}{2}y^2;\qquad U_y\left( x,y\right) = \left( Ax+B\right) y,\qquad \text {with } E\left( x\right) =E_0{e^{{\beta }x}} \end{aligned}$$
(4.2)

where A and B read as

$$\begin{aligned} A=\frac{\beta {N}}{2E_0}\frac{\beta ^2e^{\beta }-2{\beta }e^{\beta }+\beta ^2+2\beta }{\beta ^2e^{\beta }-e^{2\beta }+2e^{\beta }-1};\quad B=\frac{\beta {N}}{2E_0}\frac{e^{\beta } \left[ e^{\beta }\left( -{\beta }^2+3\beta -4\right) +{\beta }^2-2\beta +8\right] -\beta -4}{\left( e^{\beta }-1\right) \left( \beta ^2e^{\beta }-e^{2\beta }+2e^{\beta }-1\right) } \end{aligned}$$
(4.3)

A set of six 4-node quadrilateral meshes was created in ABAQUS as topology inputs for this analysis. Table 8 presents the size of these meshes.

Fig. 22

Visualization of FG membrane deformation for a scaling factor of 0.0428 according to the analytical equation (left column), the proposed algorithm (middle column) and ABAQUS (right column)

Fig. 23

Runtime performance analysis of the FG membrane: a CPU time for assembling \(K_g\) as per the Standard and proposed algorithm; b Total CPU time of the analysis (global matrix/force construction + solving the system of equations)

Fig. 24

Young’s modulus E as a function of x

Fig. 25

Components of the applied pressure \(f_s\): a representation of the function \(f_y\left( x,y\right) \); b representation of the function \(f_z\left( x,y\right) \)

To verify the accuracy of our algorithm, mesh 1 was used for a static analysis with the definitions given above. In ABAQUS, we used the so-called nodal (temperature-based) definition of Young’s modulus along the x-axis, in which the value of E at each Gauss integration point is interpolated from the values of E at the element’s nodes. In the exact method used by the proposed algorithm, by contrast, \(E_g\) at each Gauss point is computed directly from the analytical expression of E, using the spatial position obtained from Eq. (3.12).

Simulations were conducted on the same computer specified in Sect. 4.1. Figure 22 depicts the deformations (\(U_x\) and \(U_y\)) obtained with the analytical solution, the proposed algorithm and the nodal/temperature-based approximation in ABAQUS.

The result of the performance analysis conducted on the set of meshes above is depicted in Fig. 23.

This example shows a performance similar to that of the previous ones. The proposed algorithm remains at least ten times faster than the standard algorithm for constructing the global matrix, and using it for the finite element analysis reduces the CPU time fivefold compared with ABAQUS.

Fig. 26

Vertical displacement \(U_y\) of the beam

Fig. 27

Visualization of the six smallest mode shapes of the beam

4.5 Example 5: Functionally graded 3D cantilever beam

While the previous example evaluated the performance of the proposed algorithm on a 2D functionally graded membrane, this example focuses on its behavior for a 3D functionally graded problem. The material has a Poisson’s ratio \(\nu \) equal to 0.3 and a Young’s modulus E that decreases exponentially along the x-direction, as illustrated in Fig. 24 and given by

$$\begin{aligned} E\left( x\right) =E_0{e^{-{\beta }x}} \end{aligned}$$
(4.4)

where \(\beta =\displaystyle \frac{1}{2}\ln \left( 8\right) \) and \(E_0=20000\) MPa. The density of the beam is \(\rho =500\) kg m\(^{-3}\).

This functionally graded beam is loaded non-symmetrically by a spatially varying pressure \(f_s=\left( 0,f_y,f_z\right) \) with components \(f_y=-1.32x\) and \(f_z=1.2xz^2\), depicted in Fig. 25.

Simulations were conducted on the same computer specified in Sect. 4.1. After the analysis, we obtained the deformation depicted in Fig. 26.

We have also performed the modal analysis of the beam and obtained mode shapes associated with the six smallest natural frequencies of the beam as shown in Fig. 27.

Finally, we focused on the CPU time required for the static and modal analyses (see Fig. 28).

Fig. 28

Performance analysis for static and modal analyses: a CPU time taken to compute the matrix; b CPU time taken to compute the matrix and solve the system of equations

Note that, in the function proposed in the Appendix, only the stiffness matrix is computed for a static analysis, whereas both the stiffness and mass matrices are computed for a modal analysis. Computing both matrices together reduces the computational cost, although this may not be advisable when the available memory is a limiting factor.

5 Conclusion

In this work, we have proposed an efficient MATLAB-based vectorized algorithm for computing the global matrices and force vectors deriving from the discretized weak form of boundary value problems in linear elasticity. This scheme introduces a set of new features, detailed in Sect. 3. By leveraging vectorization and other optimizations, the algorithm efficiently handles large-scale problems and delivers accurate results, as verified by the numerical experiments in Sects. 4.1, 4.3, 4.4 and 4.5. The performance analysis also revealed two gains. First, the proposed method is at least ten times faster than the standard approach in constructing the global matrices. Second, its use reduces the total runtime by a factor of about five compared with ABAQUS. These gains stem from the fact that the approach builds on existing vectorized methods in the literature and improves on them in several important ways: the use of advanced programming techniques, the integration of recent MATLAB built-in functions to improve computational performance, the generalization to meshes with elements of any order and type, and the full vectorization of the construction of row indices, column indices and matrix/vector components. Moreover, the vectorized algorithm was extended to special problems, such as the Mindlin plate theory and functionally graded materials. Alongside its improved performance, the analyses carried out in Sects. 4.2.1 and 4.2.2 showed that this algorithm requires almost the same amount of memory as the standard algorithm. Although the proposed scheme requires more memory than that of Cuvelier et al. [15], a reference in the literature, it is at least 1.5x (with sparse) and 2.5x (with fsparse) faster. This makes it attractive for large-scale simulations.

Overall, this work represents a significant contribution to the field of linear computational mechanics and has important implications for the design of structures and materials in various engineering and scientific applications.

Using the findings of this work, the following aspects will be investigated in forthcoming research:

  • Optimization of memory usage;

  • Extension to nonlinear elasticity, J2 plasticity and contact problems;

  • Implementation on graphics processing units (GPUs);

  • Development of an all-in-one fully vectorized MATLAB package intended for practical teaching of FEM to undergraduate and graduate students.