1 Introduction

The three-dimensional analysis is a standard technique in structural mechanics. However, it has yet to be used in the design, safety assessment, and maintenance of large-scale structures. Instead, frame and lumped-mass models have long been used owing to the historical background and the limitations of computer power and computational techniques. In recent years, however, three-dimensional analyses have been gradually but widely adopted for nuclear power plants (NPPs) [1], high-rise buildings [2], and bridges [3]. For example, Japan’s practice standard for the seismic probabilistic risk assessment (PRA) of NPPs requires a detailed evaluation of the damage limits of buildings and structures using three-dimensional seismic response analysis [4]. In a study of high-rise buildings [2], three-dimensional elastoplastic seismic response damage analysis was conducted with as many as 74 million degrees-of-freedom (DOFs), a level of detail and complexity of damage that is impossible to evaluate through frame and lumped-mass analysis.

We need to shorten the calculation time and improve the stability of three-dimensional analyses. The present study considers these two properties in the development of our solver.

It is widely known that small eigenvalues close to zero hamper the convergence of iterative methods, mainly because the condition number increases with a decrease in the smallest eigenvalue. Many studies have focused on avoiding eigenvalues close to zero [5,6,7,8]. The deflation algorithm mainly aims to eliminate eigenvectors corresponding to eigenvalues close to zero [7,8,9,10,11,12,13,14].

In the case of structural problems discretized by finite element analysis (FEA), the stiffness matrices \(\varvec{A}\) are symmetric positive semi-definite. There are three parallel translations and three infinitesimal rotations, a total of six rigid-body motions, which implies that \(\varvec{A}\) has a zero eigenvalue of multiplicity six. We must, therefore, impose at least six constraint conditions to eliminate the zero eigenvalues. Let \(\varvec{\bar{A}}\) be the stiffness matrix after imposing the constraint conditions, as will be discussed in Sect. 3. The eigenvalues of \(\varvec{A}\) and \(\varvec{\bar{A}}\) are then different from each other. However, the eigenvalues of these two matrices have the so-called interlacing property [15]. The rigid-body mode is widely used in structural analysis for different objectives [10, 16,17,18,19,20, 22, 23].

We assume that the analysis models are constructed with solid elements. The rigid-body motions, which are the bases of \(\textrm{Ker}\varvec{A}\) (the kernel of \(\varvec{A}\)), relate to the lower or lowest eigenvalue(s) of \(\varvec{\bar{A}}\), owing to the interlacing property. We can construct the near rigid-body modes using domain decomposition in which the subdomains have six rigid-body modes. We can set a basis of a coarse space by aggregating the subdomains through a partition of unity. We use the above-noted six rigid-body motions to simulate low-frequency motions described in the literature [22, 24].

This paper discusses the distance between the eigenvectors of \(\varvec{\bar{A}}\) and \(\textrm{Ker}\varvec{A}\). The distance describes how the lower-frequency modes are segregated from the higher modes.

In our analysis, we use the parallelized conjugate gradient (CG) method as our basic framework on the entire domain instead of the simplest domain decomposition method (DDM) [25], which means that we do not apply a direct method to the decomposed subdomains in our method. The performances of the DDM and the CG method are compared in the next section, showing the reason for using the CG method.

Our solver uses two sets of domain decomposition. One decomposition is for parallelizing the CG method, and the other is for generating the coarse space. The former corresponds to the subdomains of the DDM, and the latter generates the projection in the deflation algorithm.

Near-rigid-body motions generate a projection from the entire space to the coarse space. The projection space is so small that the reduced equation can be solved using direct methods. In contrast, we can apply the CG method to the large complementary space, which is much easier to solve than the entire space containing “rough” components generated by lower-frequency modes. Given the direct sum decomposition of the solution space, our method is a two-level method [13, 26, 27].

Our projection is constructed from the coarse motion, in contrast with the definitions adopted in many studies (e.g., [7,8,9,10,11,12,13,14]), as will be described in Sect. 5.2.

The deflated domain decomposition method (DDDM) algorithm, which we describe in this paper, shares similarities with the algorithm given in [7] that expands the Krylov subspace, adding approximated eigenvectors corresponding to eigenvalues close to zero using the orthogonality of the residuals, which we describe in Sect. 6.

The DDDM solver outperforms a symmetric successive overrelaxation (SSOR) preconditioned solver, which we take as a standard solver, as will be discussed in Sect. 6.3. Moreover, in an article [28], an elastic seismic response analysis of an NPP building on the ground, with a model consisting of one million DOFs with hexahedral and plate elements and involving 2700 steps for 54 s, took 1.1 h using the DDDM solver. In contrast, the SSOR solver took 13.8 h. We used sixteen parallel processes in both cases. The stability of the DDDM solver relative to the SSOR solver with respect to the convergence tolerance range was also discussed in that article.

2 Basic framework of the linear solver

2.1 Displacement-based finite element equations

We outline the FEA method used in this paper. Only a static equilibrium state is assumed. We start from the principle of virtual work and omit the commonly used isoparametric discretization here; refer to [21] for a detailed discussion.

(a) Discretization

We consider only solid elements. We approximate the target body as an assembly of a finite number of discrete finite elements, e.g., tetrahedral, hexahedral, or other elements. Each element m has nodal points with coordinates (x, y, z) in a local coordinate system. Let the number of DOFs be n.

We use indicial notation and the summation convention. \(x_i\), where \(i = 1, 2, 3\) in the three-dimensional analysis, denotes the coordinates given to each nodal point, and \(u_i\), where \(i = 1, 2, 3\), denotes the displacement components.

(b) Principle of Virtual Work

We take virtual displacements \(\bar{u}_i\) and the corresponding virtual strains \(\bar{\epsilon }_{ij}\) obtained by differentiating \(\bar{u}_i\). We then have the following principle of virtual displacements, which is the fundamental equation for the equilibrium of a general three-dimensional body:

$$\begin{aligned} \int _\Omega \tau _{ij}\bar{\epsilon }_{ij}\, d\Omega =\int _\Omega X_i \bar{u}_i\, d\Omega + \int _S Y_i \bar{u}_i^{(S)}\, dS, \end{aligned}$$
(1)

where \(X_i\) are the components of the body force \(\varvec{X}\), and \(Y_i\) are the components of the surface traction \(\varvec{Y}\) on the surface S of the body. \(u_i^{(S)}\) are the displacements on the surface S. The displacement (essential) boundary conditions are given by

$$\begin{aligned} u_i=u_i^{(S)}. \end{aligned}$$
(2)

The natural boundary condition is given by

$$\begin{aligned} \tau _{ij}n_j=Y_i, \end{aligned}$$
(3)

on the surface S, where \(n_j\) are the components of the unit normal vector to the surface S. The left side of (1) corresponds to the internal virtual work, and the right side represents the external virtual work.

(c) Finite element equations

Let \(\hat{\varvec{u}}\) be the unknown vector of all the global displacements, where the unknown displacements \(u_i, i = 1, 2, 3\) of the nodal points are globally aligned.

$$\begin{aligned} \hat{\varvec{u}}^T=(\hat{u}_1~\hat{u}_2~\hat{u}_3~\cdots ~\hat{u}_n). \end{aligned}$$
(4)

Let \(\varvec{N}^{(m)}\) be a displacement interpolation matrix and \(\varvec{B}^{(m)}\) be the strain–displacement matrix for element m. We write

$$\begin{aligned} \begin{array}{c} \varvec{u}(x,y,z)=\varvec{N}^{(m)}(x,y,z)\hat{\varvec{u}},\\ \varvec{\epsilon }(x,y,z)=\varvec{B}^{(m)}(x,y,z)\hat{\varvec{u}}. \end{array} \end{aligned}$$
(5)

We assume the virtual displacements and strains are

$$\begin{aligned} \begin{array}{c} \bar{\varvec{u}}(x,y,z)=\varvec{N}^{(m)}(x,y,z)\bar{\hat{\varvec{u}}},\\ \bar{\varvec{\epsilon }}(x,y,z)=\varvec{B}^{(m)}(x,y,z)\bar{\hat{\varvec{u}}}.\\ \end{array} \end{aligned}$$
(6)

We further need the relation between \(\varvec{\epsilon }\) and \(\varvec{\tau }\):

$$\begin{aligned} \varvec{\tau }^{(m)}(x,y,z)=\varvec{D}^{(m)}(x,y,z){\varvec{\epsilon }}^{(m)}(x,y,z), \end{aligned}$$
(7)

where \(\varvec{D}^{(m)}\) is the elasticity matrix of element m. Substituting \(\varvec{u}^{(m)}\), \(\varvec{\epsilon }^{(m)}\), \(\bar{\varvec{u}}^{(m)}\), \(\bar{\varvec{\epsilon }}^{(m)}\), and \(\varvec{\tau }^{(m)}\) into (1), we have a summation form of the virtual equation for the unknown displacements:

$$\begin{aligned} \begin{aligned}&\bar{\hat{\varvec{u}}}^T\left( \sum _m\int _{\Omega ^{(m)}}{\varvec{B}^{(m)}}^T{\varvec{D}}^{(m)}{\varvec{B}}^{(m)}d\Omega ^{(m)} \right) {\hat{\varvec{u}}}\\&=\bar{\hat{\varvec{u}}}^T\left( \sum _m\int _{\Omega ^{(m)}}{\varvec{N}^{(m)}}^T\varvec{X}^{(m)}d\Omega ^{(m)} +\sum _m\int _{S^{(m)}} {\varvec{N}^{(m)}}^T\varvec{Y}^{(m)}dS^{(m)}\right) . \end{aligned} \end{aligned}$$
(8)

This equation leads to the finite element equation for the displacements \(\hat{\varvec{u}}\) of DOFs of the discretized entire body:

$$\begin{aligned} \varvec{K}\hat{\varvec{u}}=\varvec{F}. \end{aligned}$$
(9)

The second term on the right side of (8) corresponds to the constraint condition of the linear equation, which we will describe in the following sections; it ensures that this equation is solvable.
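To make the assembly in (8) and (9) concrete, the following is a minimal sketch, not the authors' code, of how the element contributions \(\int {\varvec{B}^{(m)}}^T\varvec{D}^{(m)}\varvec{B}^{(m)}d\Omega ^{(m)}\) are summed into the global matrix \(\varvec{K}\). Two-node bar elements are used as the simplest one-dimensional analog of the solid elements above; all names and values are illustrative.

```python
import numpy as np

def assemble_global_stiffness(n_nodes, elements, EA, L):
    """Assemble K by summing element matrices B^T D B * (element volume),
    as in (8)-(9), for two-node bar elements of length L and stiffness EA."""
    K = np.zeros((n_nodes, n_nodes))
    for (i, j) in elements:
        B = np.array([[-1.0 / L, 1.0 / L]])   # strain-displacement matrix
        ke = L * (B.T * EA) @ B               # integral of B^T D B over the element
        dofs = [i, j]
        for a in range(2):
            for b in range(2):
                K[dofs[a], dofs[b]] += ke[a, b]
    return K

# Three bar elements in a chain: nodes 0-1-2-3, end load at node 3.
K = assemble_global_stiffness(4, [(0, 1), (1, 2), (2, 3)], EA=1.0, L=1.0)
F = np.array([0.0, 0.0, 0.0, 1.0])
u = np.zeros(4)
# Impose the essential boundary condition u_0 = 0 by removing row/column 0,
# then solve the reduced form of (9).
u[1:] = np.linalg.solve(K[1:, 1:], F[1:])
print(u)   # [0. 1. 2. 3.]: linear displacement along the bar
```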

As noted above, the explanation here describes the static equilibrium state. However, we can describe the dynamic or nonlinear dynamic state similarly. The corresponding dynamic equation is:

$$\begin{aligned} \varvec{M}\ddot{\hat{\varvec{u}}}+\varvec{C}\dot{\hat{\varvec{u}}}+\varvec{K}\hat{\varvec{u}}=\varvec{F}, \end{aligned}$$
(10)

where \(\varvec{C}\) is a damping matrix. In many FEA cases, the dynamic equation is solved using, e.g., Newmark’s \(\beta \) method, which converts the dynamic equation into a static equation of the form of (9). A nonlinear equation is linearized by, e.g., Newton’s method and is thereby reduced to a sequence of linear equations.
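As an illustration of how Newmark's \(\beta \) method converts (10) into a static-form system like (9), here is a minimal sketch with the standard average-acceleration parameters \(\beta =1/4\) and \(\gamma =1/2\); the matrices, load, and step count are illustrative stand-ins.

```python
import numpy as np

def newmark_step(M, C, K, F_next, u, v, a, dt, beta=0.25, gamma=0.5):
    """One Newmark-beta step: solve an effective static system K_eff u = F_eff."""
    K_eff = K + (gamma / (beta * dt)) * C + (1.0 / (beta * dt**2)) * M
    F_eff = (F_next
             + M @ (u / (beta * dt**2) + v / (beta * dt) + (0.5 / beta - 1.0) * a)
             + C @ ((gamma / (beta * dt)) * u
                    + (gamma / beta - 1.0) * v
                    + dt * (0.5 * gamma / beta - 1.0) * a))
    u_next = np.linalg.solve(K_eff, F_eff)          # the static-form solve, cf. (9)
    a_next = ((u_next - u) / (beta * dt**2)
              - v / (beta * dt) - (0.5 / beta - 1.0) * a)
    v_next = v + dt * ((1.0 - gamma) * a + gamma * a_next)
    return u_next, v_next, a_next

# Toy 2-DOF system under a constant load; damping makes it settle.
M = np.eye(2)
C = 0.5 * np.eye(2)
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
F = np.array([0.0, 1.0])
u = v = a = np.zeros(2)
for _ in range(5000):
    u, v, a = newmark_step(M, C, K, F, u, v, a, dt=0.01)
print(u)   # tends toward the static solution K^{-1} F = [1/3, 2/3]
```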

We use the symbol \(\varvec{A}\) for the stiffness matrix corresponding to \(\varvec{K}\) in (9), and we focus on the linear equation in this paper:

$$\begin{aligned} \varvec{A}\varvec{x}=\varvec{b},\quad \varvec{A}\in \textbf{R}^{n\times n},\quad \varvec{b}\in \textbf{R}^n,\quad \varvec{x}\in \textbf{R}^n. \end{aligned}$$
(11)

2.2 Comparison of the DDM and CG method

As noted in Sect. 1, in our DDDM algorithm, we use the CG method on almost all of the solution space instead of the DDM. The DDM here refers to the simplest iterative substructuring method using a domain decomposition in which subdomains overlap only at the boundaries of neighboring subdomains. A direct method is applied within each subdomain, with displacement boundary conditions on its boundaries. It is not easy to compare the performance of the DDM with that of the CG method, even though the two methods were compared in previous work [29], wherein the advantages of the CG method were demonstrated under some conditions. The performances depend on the models, elements, boundary conditions, multi-point constraints (MPCs), materials, and especially the sparsity of the stiffness matrix. We therefore compare the two methods through simple static analyses using simple models.

Note that the results in this section are not general. The objective of this section is to show that the CG method performs better than the DDM on our restricted but simple and standard examples, and to explain why we apply the CG method on the entire space. Significantly, the simple structure of the CG method helps the DDDM algorithm.

We used the open-source structural analysis code Adventure [30] to compare the DDM and CG methods. Adventure offers (a) the simplest DDM with diagonal scaling, (b) the balancing domain decomposition (BDD) method, and (c) the parallel CG method in its simplest form with diagonal scaling. Here, we used options (a) and (c). The CG method is parallelized, but only a single process was used. The CG method shares the same code base as the DDM and can thus be fairly compared with it. The computer used was a cluster with Intel Xeon Platinum 9242 processors operating at 2.3 GHz, six nodes with 96 cores per node, 384 GB of RAM (16 GB DDR4-2933 \(\times \) 24), and InfiniBand EDR networking (100 Gbps). We used a single thread in all the following cases.

Fig. 1 Thin-plate model with dimensions of \(6\textrm{mm}\times 1\textrm{mm}\times 6\textrm{mm}\). We refer to it as the “\(6\times 1\times 6\)” model, used in comparing the performance of the CG method and the DDM for cantilever-type dead-weight analysis. We use this model in later sections in various dimensions

We use plate models built by arranging hexahedral linear elements with dimensions of \(1\textrm{mm}\times 1\textrm{mm}\times 1\textrm{mm}\), as shown in Fig. 1, and we conducted cantilever-type analyses as follows. A plate with x, y, and z elements aligned in the directions of the x, y, and z axes, respectively, is referred to as “\(x\times y\times z\).” The material is standard steel with a Young’s modulus of 200 GPa and a Poisson ratio of 0.3. The plate was rotated \(90^\circ \) around the x axis in the direction of \(-y\). The plate’s xy surface (\(z=0\)) was then fully constrained, and we applied a dead weight in the direction of \(-y\). Here, y corresponds to the thickness of the plate, whereas z corresponds to its length.

The eight models had dimensions from \(50 \times 5 \times 50\) up to \(200 \times 10 \times 200\), as shown in Table 1.

We used up to 16 cores in the calculation. For each model, we conducted DDM analyses with 2, 4, 8, and 16 processes, each with the same number of processor cores. The results were compared with those of the CG method without parallelization, giving five cases in total. The elapsed times until convergence, measured in wall-clock time, are compared in Table 1. We set the tolerance of the residual error for convergence to \(1.0\times 10^{-7}\).

There are irregularities in the DDM results in Table 1. For example, the calculation time increases from two to four processes for the \(50\times 5\times 50\) and \(100\times 5\times 100\) models. Figure 2 shows this irregularity for the \(100\times 5\times 100\) model.

Table 1 Performances of the CG method and DDM. The DDM is parallelized with 2, 4, 8, and 16 processes, whereas the CG method uses a single process. The single process CG method outperforms the DDM
Fig. 2 Typical convergence curves of the \(100\times 5\times 100\) model. The single-process CG method is faster than the DDM with two, four, and eight processes

Fig. 3 Typical analysis result: displacement of the \(150\times 5\times 150\) model. Color represents the displacement norms. We applied a deformation factor of 1000

2.3 Basic framework of the linear solver

The previous section showed that the CG method can serve as a basic framework. The well-known solver algorithms FETI (Finite Element Tearing and Interconnecting) [17, 18] and BDD [19, 20] both depend on the DDM in the sense that the entire domain is decomposed into non-overlapping subdomains, a direct method is applied to each subdomain, and the solution is divided into the rigid-body motion and the motion that includes the strain. The rigid-body motions of the subdomains are solved globally. The subdomains are taken to be structural objects. The FETI method takes the surface tractions on the inner boundaries between neighboring subdomains as the unknowns, whereas the BDD method takes the displacements on the boundaries between neighboring subdomains. The CG method removes the gaps in the displacements between the boundaries in the FETI method and the gaps in the surface tractions between the boundaries in the BDD method.

As noted in Sect. 1, our approach uses the CG method on the entire space in contrast to the DDM. Our solver uses two classes of domain decomposition: one parallelizes the CG method, whereas the other generates the coarse space based on the rigid-body motions. The number of subdomains for the parallel CG method corresponds to the number of parallel processes. The number of subdomains used in the deflation algorithm should, in our experience, be much greater than the number used in the parallel CG method.

In the FETI and BDD methods, the rigid-body motions are used to build the approximated motions of the subdomains in the iteration processes. In our method, by contrast, we use the rigid-body motions to construct the lower-frequency modes.

2.4 DDDM: deflated domain decomposition method

The strategy of our algorithm is summarized as follows.

  (a) We apply the parallel CG method to the entire domain as the basic framework, and we decompose the entire domain into subdomains that correspond to the parallel processes. This domain decomposition is the “nodal-point”-based decomposition.

  (b) Separately and independently from the domain decomposition in (a), the entire domain is decomposed into non-overlapping subdomains, which approximate the lower-frequency modes (i.e., the coarse grid modes) using the rigid-body motions. This domain decomposition is the “element”-based decomposition.

  (c) The deflation algorithm removes the lower-frequency components from the solution based on the domain decomposition in (b). As will be explained in Sect. 4, there are lower-frequency eigenvectors close to the kernel of the unconstrained stiffness matrix, and we construct the lower-frequency modes using the basis of the kernel. Removal of the lower-frequency modes gives rise to the high speed and stability of the solver.

  (d) In the parallel CG method, we distribute the coarse grid motion generation process among the parallel processes of the CG method.

3 Rewriting of the stiffness matrix

Let \(\varvec{A} \in \textbf{R}^{n\times n}\) be a stiffness matrix without constraint conditions, i.e., a symmetric positive semi-definite matrix. \(\varvec{A}\) has six zero eigenvalues, whose corresponding eigenvectors are the rigid-body motions. Our problem is to solve

$$\begin{aligned} \varvec{A}\varvec{x}=\varvec{b}, \end{aligned}$$
(12)

by imposing necessary constraint conditions. Let \(r\ge 6\) be the number of constraint conditions. By relocating the DOF numbers of the constraint conditions to the upper position and taking the first r DOFs as the constraint conditions in ascending order, \(\varvec{A}\) is rewritten as:

$$\begin{aligned} \varvec{A}'=\left( \begin{array}{cc} \varvec{0}&{}\varvec{0}\\ \varvec{0}&{}\varvec{\bar{A}} \end{array} \right) \in \textbf{R}^{n\times n}. \end{aligned}$$
(13)

where \(\varvec{\bar{A}}\) is a symmetric positive definite matrix. We rewrite (12) in the form:

$$\begin{aligned} \varvec{A}'\varvec{x}'=\varvec{b}', \end{aligned}$$
(14)

where

$$\begin{aligned} \varvec{x}'=\left( \begin{array}{c} \varvec{0}\\ x_{r+1}\\ \vdots \\ x_n \end{array} \right) \in \textbf{R}^n, \quad ~\varvec{b}'=\left( \begin{array}{c} \varvec{0}\\ b_{r+1}\\ \vdots \\ b_n \end{array} \right) \in \textbf{R}^n. \end{aligned}$$
(15)

We further rewrite the components \(x_{r+1},\dots ,x_n\) as \(x_1,\dots ,x_{n-r}\) and rewrite \(b_{r+1},\dots ,b_n\) as \(b_1,\dots ,b_{n-r}\). We thus write

$$\begin{aligned} \varvec{x}'=\left( \begin{array}{c} \varvec{0}\\ \varvec{\bar{x}} \end{array} \right) ,\quad \varvec{\bar{x}}=\left( \begin{array}{c} x_1\\ \vdots \\ x_{n-r} \end{array} \right) ;\quad \varvec{b}'=\left( \begin{array}{c} \varvec{0}\\ \varvec{\bar{b}} \end{array} \right) ,\quad \varvec{\bar{b}}=\left( \begin{array}{c} b_1\\ \vdots \\ b_{n-r} \end{array} \right) . \end{aligned}$$
(16)

The original constrained components \(x_1,\dots ,x_r\) are incorporated into \(\varvec{\bar{b}}\) in the form of sums of products with the first to r-th components of \(\varvec{A}\varvec{x}\) in (12). We identify \(\varvec{\bar{x}}\in \textbf{R}^{n-r}\) with \(\varvec{x}'\in \textbf{R}^n\) given above to evaluate the distance between \(\varvec{\bar{x}}\) and the kernel of \(\varvec{A}\) in the following sections. Our problem (12) is then rewritten as

$$\begin{aligned} \varvec{\bar{A}}\varvec{\bar{x}}=\varvec{\bar{b}}. \end{aligned}$$
(17)

These procedures give a structure of \(\textbf{R}^{n-r}\) as a subspace of \(\textbf{R}^n\).
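The following minimal sketch mirrors this rewriting numerically: the constrained DOFs are relocated to the top, the symmetric positive definite block \(\varvec{\bar{A}}\) of (13) is extracted, and the prescribed components fold into \(\varvec{\bar{b}}\) as products with the corresponding entries of \(\varvec{A}\). The rank-deficient stand-in for \(\varvec{A}\) and the prescribed values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 12, 6
C = rng.standard_normal((n, n - 6))
A = C @ C.T                                   # PSD stand-in with a 6-dim kernel
b = rng.standard_normal(n)

con = np.arange(r)                            # constrained DOFs, relocated to the top
free = np.arange(r, n)
A_bar = A[np.ix_(free, free)]                 # the SPD block A_bar in (13)

x_c = np.ones(r)                              # prescribed values on the constrained DOFs
b_bar = b[free] - A[np.ix_(free, con)] @ x_c  # constrained terms fold into b_bar

x_bar = np.linalg.solve(A_bar, b_bar)         # problem (17)
print(np.linalg.norm(A_bar @ x_bar - b_bar))  # ~ 0
```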

Let

$$\begin{aligned} \lambda _1=\cdots = \lambda _6=0,\quad (0<)\,\lambda _{7}\le \cdots \le \lambda _n \end{aligned}$$
(18)

be the eigenvalues of \(\varvec{A}\), and let

$$\begin{aligned} (0<)\,\bar{\lambda }_1\le \cdots \le \bar{\lambda }_{n-r} \end{aligned}$$
(19)

be the eigenvalues of \(\varvec{\bar{A}}\).

The relationship between \(\lambda _i~(1\le i\le n)\) and \(\bar{\lambda }_i~(1\le i\le n-r)\) is known from the so-called interlacing properties [15]:

$$\begin{aligned} \begin{array}{c} \lambda _1^{(0)}\le \lambda _1^{(1)}\le \lambda _2^{(0)}\le \lambda _2^{(1)}\le \cdots \le \lambda _{n-1}^{(0)}\le \lambda _{n-1}^{(1)}\le \lambda _n^{(0)},\\ \lambda _1^{(1)}\le \lambda _1^{(2)}\le \lambda _2^{(1)}\le \lambda _2^{(2)}\le \cdots \le \lambda _{n-2}^{(1)}\le \lambda _{n-2}^{(2)}\le \lambda _{n-1}^{(1)},\\ \cdots \cdots \\ \lambda _1^{(r-1)}\le \lambda _1^{(r)}\le \lambda _2^{(r-1)}\le \lambda _2^{(r)}\le \cdots \le \lambda _{n-r}^{(r-1)}\le \lambda _{n-r}^{(r)}\le \lambda _{n-r+1}^{(r-1)}, \end{array} \end{aligned}$$
(20)

where \(\lambda _1^{(0)},\cdots ,\lambda _n^{(0)}\) are the eigenvalues of \(\varvec{A}\), which are aliases of \(\lambda _1,\cdots ,\lambda _n\) shown in (18), and \(\lambda _1^{(r)},\cdots ,\lambda _{n-r}^{(r)}\) are those of \(\varvec{\bar{A}}\), which are aliases of the eigenvalues shown in (19). From this relation, we obtain the following relationship for the intervals between the minimum and maximum eigenvalues as the number of constraint conditions increases:

$$\begin{aligned}{}[\lambda _1^{(0)},\lambda _n^{(0)}]\supset [\lambda _1^{(1)},\lambda _{n-1}^{(1)}]\supset \cdots \supset [\lambda _1^{(r)},\lambda _{n-r}^{(r)}], \end{aligned}$$
(21)

(21) shows that the existence ranges of the eigenvalues decrease monotonically, each included in the range with the smaller number of constraints. Significantly, the range of the eigenvalues (19) is included in the range of the eigenvalues (18).

We let the eigenvectors corresponding to the eigenvalues (19) be

$$\begin{aligned} \varvec{\bar{x}}_1,\,\dots ,\,\varvec{\bar{x}}_{n-r}. \end{aligned}$$
(22)

The eigenvalues of \(\varvec{A}\) and \(\varvec{\bar{A}}\) are close in the sense of (21), which gives the fundamentals for considering the distance between the eigenvectors of \(\varvec{\bar{A}}\) and the kernel of \(\varvec{A}\).
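A quick numerical check of the interlacing chains (20) and (21), under the usual identification of one constraint with the deletion of one row and column (Cauchy's interlacing theorem); the symmetric test matrix is an illustrative stand-in for \(\varvec{A}\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
B = rng.standard_normal((n, n))
A = B + B.T                                    # symmetric test matrix

lam0 = np.linalg.eigvalsh(A)                   # lambda^{(0)}: eigenvalues of A, ascending
lam1 = np.linalg.eigvalsh(A[1:, 1:])           # lambda^{(1)}: one constraint imposed

# First chain of (20): lambda_i^{(0)} <= lambda_i^{(1)} <= lambda_{i+1}^{(0)}.
assert all(lam0[i] <= lam1[i] <= lam0[i + 1] for i in range(n - 1))
# (21): the eigenvalue range shrinks monotonically with each constraint.
assert lam0[0] <= lam1[0] and lam1[-1] <= lam0[-1]
print("interlacing holds")
```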

4 Distance between the eigenvector and the kernel of the unconstrained stiffness matrix

4.1 Coarse grid matrix

Let \(\varvec{A}\in \textbf{R}^{n\times n}\) be the unconstrained stiffness matrix, as noted above. In this section, we discuss the distances between \(\textrm{Ker}\varvec{A}\) and the eigenvectors of \(\varvec{\bar{A}}\).

Let \(\Omega \) be the target analysis model and \((x_i,y_i,z_i),~1\le i\le n/3\), be the nodal points of \(\Omega \). All the elements are assumed to be solid elements. Following the descriptions given in the literature [22, 24], let

$$\begin{aligned} \varvec{\Phi } = \left( {\begin{array}{cccccc} 1 &{} 0 &{} 0 &{} 0 &{} z_1 &{} -y_1 \\ 0 &{} 1 &{} 0 &{} -z_1 &{} 0 &{} x_1 \\ 0 &{} 0 &{} 1 &{} y_1 &{} -x_1 &{} 0 \\ 1 &{} 0 &{} 0 &{} 0 &{} z_2 &{} -y_2 \\ 0 &{} 1 &{} 0 &{} -z_2 &{} 0 &{} x_2 \\ 0 &{} 0 &{} 1 &{} y_2 &{} -x_2 &{} 0 \\ {\cdots } &{}&{}&{}&{}&{} \\ 1 &{} 0 &{} 0 &{} 0 &{} z_{n/3} &{} -y_{n/3} \\ 0 &{} 1 &{} 0 &{} -z_{n/3} &{} 0 &{} x_{n/3} \\ 0 &{} 0 &{} 1 &{} y_{n/3} &{} -x_{n/3} &{} 0 \\ \end{array} } \right) \in \textbf{R}^{n\times 6}, \end{aligned}$$
(23)

where \((x_i,y_i,z_i)\) denotes the coordinates of nodal point i. Each column of \(\varvec{\Phi }\) is normalized. We refer to \(\varvec{\Phi }\) as a coarse-grid matrix.

The first three columns correspond to the parallel translations of the structure, and the last three are infinitesimal rotations around each coordinate axis. These six columns are independent and constitute a basis of \(\textrm{Ker}\varvec{A}\), which means that the six vectors are the basis of the rigid-body motions of \(\Omega \). We write these as \(\varvec{f}_1,\,\dots ,\,\varvec{f}_6\):

$$\begin{aligned} \varvec{\Phi } =(\varvec{f}_1~\varvec{f}_2~\varvec{f}_3~\varvec{f}_4~\varvec{f}_5~\varvec{f}_6)\in \textbf{R}^{n\times 6}. \end{aligned}$$
(24)

We replace the first r rows of these vectors, which correspond to the constraint conditions, with zero values, and we write the resulting vectors with those r rows removed as \(\varvec{\bar{f}}_1,\,\dots ,\,\varvec{\bar{f}}_6\). According to the rule described in Sect. 3, we identify \(\varvec{\bar{f}}_i\in \textbf{R}^{n-r}\) with \(\varvec{f}_i'\in \textbf{R}^n\):

$$\begin{aligned} \varvec{f}_i'=\left( \begin{array}{c} \varvec{0}\\ \varvec{\bar{f}}_i \end{array} \right) . \end{aligned}$$
(25)

\(\varvec{\Phi }\) is rewritten as \(\varvec{\Phi }'\in \textbf{R}^{n\times 6}\) or \(\varvec{\bar{\Phi }}\in \textbf{R}^{(n-r)\times 6}\) corresponding to (14) or (17):

$$\begin{aligned} \varvec{\Phi }'= & {} (\varvec{f}'_1~\cdots ~\varvec{f}'_6)=\left( \begin{array}{cccccc} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ {\cdots } &{}&{}&{}&{}&{} \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0\\ 1 &{} 0 &{} 0 &{} 0 &{} z_1 &{} -y_1\\ 0 &{} 1 &{} 0 &{} -z_1 &{} 0 &{} x_1 \\ 0 &{} 0 &{} 1 &{} y_1 &{} -x_1 &{} 0 \\ {\cdots } &{}&{}&{}&{}&{} \\ 1 &{} 0 &{} 0 &{} 0 &{} z_{n/3-[r/3]} &{} -y_{n/3-[r/3]} \\ 0 &{} 1 &{} 0 &{} -z_{n/3-[r/3]} &{} 0 &{} x_{n/3-[r/3]} \\ 0 &{} 0 &{} 1 &{} y_{n/3-[r/3]} &{} -x_{n/3-[r/3]} &{} 0 \\ \end{array} \right) , \end{aligned}$$
(26)
$$\begin{aligned} \varvec{\bar{\Phi }}= & {} (\varvec{\bar{f}}_1~\cdots ~\varvec{\bar{f}}_6)=\left( \begin{array}{cccccc} 1 &{} 0 &{} 0 &{} 0 &{} z_1 &{} -y_1\\ 0 &{} 1 &{} 0 &{} -z_1 &{} 0 &{} x_1 \\ 0 &{} 0 &{} 1 &{} y_1 &{} -x_1 &{} 0 \\ {\cdots } &{}&{}&{}&{}&{} \\ 1 &{} 0 &{} 0 &{} 0 &{} z_{n/3-[r/3]} &{} -y_{n/3-[r/3]} \\ 0 &{} 1 &{} 0 &{} -z_{n/3-[r/3]} &{} 0 &{} x_{n/3-[r/3]} \\ 0 &{} 0 &{} 1 &{} y_{n/3-[r/3]} &{} -x_{n/3-[r/3]} &{} 0 \\ \end{array} \right) , \end{aligned}$$
(27)

where [r/3] represents the integer quotient of r/3. In the expression for \(\varvec{\Phi }'\), the first r rows form a zero matrix in \(\textbf{R}^{r\times 6}\), and in \(\varvec{\bar{\Phi }}\), these r rows are excluded. Although we assume for simplicity that the active elements in both (26) and (27) start from the row \((1~~ 0~~ 0~~ 0~~ z_1~~-y_1)\), the expression of the first three rows depends on whether r is a multiple of 3. In other words, although n is a multiple of 3, r is not necessarily a multiple of 3. We also refer to \(\varvec{\Phi }'\) and \(\varvec{\bar{\Phi }}\) as coarse grid matrices, as we do \(\varvec{\Phi }\) defined by (23).

In the following, we assume the form of the equation for our problem to be (17), and accordingly, the form of the coarse grid matrix is taken to be (27); i.e., \(\varvec{\bar{\Phi }}\).
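A minimal sketch of constructing the coarse grid matrices from nodal coordinates: \(\varvec{\Phi }\) of (23) with normalized columns, \(\varvec{\Phi }'\) of (26) with the first r rows zeroed, and \(\varvec{\bar{\Phi }}\) of (27) with those rows removed. The coordinates and the value of r are illustrative.

```python
import numpy as np

def coarse_grid_matrix(coords):
    """Phi in (23): three translations and three infinitesimal rotations per
    nodal point; each column is normalized as noted after (23)."""
    Phi = np.zeros((3 * len(coords), 6))
    for i, (x, y, z) in enumerate(coords):
        Phi[3*i:3*i+3, 0:3] = np.eye(3)                    # translations
        Phi[3*i:3*i+3, 3:6] = [[0.0,  z, -y],
                               [ -z, 0.0,  x],
                               [  y,  -x, 0.0]]            # rotations
    return Phi / np.linalg.norm(Phi, axis=0)

coords = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
Phi = coarse_grid_matrix(coords)                # (23), shape (3*4, 6)

r = 3                                           # illustrative constraint count
Phi_prime = Phi.copy()
Phi_prime[:r, :] = 0.0                          # (26): first r rows zeroed
Phi_bar = Phi[r:, :]                            # (27): first r rows removed
print(Phi.shape, Phi_prime.shape, Phi_bar.shape)
```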

4.2 Distance between the eigenvector and the kernel of the unconstrained stiffness matrix

In this section, we consider the \(L^2\) distance between the eigenvector \(\bar{\varvec{x}}\) of \(\varvec{\bar{A}}\), which we identify with \(\varvec{x}'\), and \(\textrm{Ker}\varvec{A}\).

Let U be a subspace of \(\textbf{R}^n\). The distance between a point \(\varvec{x}\in \textbf{R}^n\) and U is then the distance between \(\varvec{x}\) and the closest point of U. We denote this distance as

$$\begin{aligned} \textrm{dist}\,(\varvec{x},U)=\min _{\varvec{y}\in U} \Vert \varvec{x}-\varvec{y}\Vert _2. \end{aligned}$$
(28)

Let \(\varvec{P}\) be the orthogonal projection from \(\textbf{R}^n\) onto U. The distance (28) is then equal to

$$\begin{aligned} \textrm{dist}\,(\varvec{x},U)=\Vert \varvec{x}-\varvec{P}\varvec{x}\Vert _2. \end{aligned}$$
(29)

In particular, if \(\Vert \varvec{x}\Vert _2=1\), then

$$\begin{aligned} \Vert \varvec{x}-\varvec{P} \varvec{x}\Vert _2^2=1-\varvec{x}^T\varvec{P}\varvec{x},\quad 0\le \textrm{dist}\,(\varvec{x},\,U) \le 1. \end{aligned}$$
(30)

The projection from \(\textbf{R}^n\) to \(\textrm{Ker}\varvec{A}\) is given by

$$\begin{aligned} \varvec{P}=\varvec{\Phi }(\varvec{\Phi }^T \varvec{\Phi })^{-1}\varvec{\Phi }^T=\varvec{\Phi }\varvec{\Phi }^T \in \textbf{R}^{n\times n}, \end{aligned}$$
(31)

where \(\varvec{\Phi }\) is defined by (23).

In the following, we evaluate the distance between the eigenvector \(\bar{\varvec{x}}\in \textbf{R}^{n-r}\) of \(\varvec{\bar{A}}\) and \(\textrm{Ker} \varvec{A}\). According to the rule described in Sect. 3, we identify \(\bar{\varvec{x}}\in \textbf{R}^{n-r}\) with \(\varvec{x}'\in \textbf{R}^n\). We consider the distance to be

$$\begin{aligned} \textrm{dist}(\bar{\varvec{x}},\,\textrm{Ker} \varvec{A})=\Vert \varvec{x}'-\varvec{P} \varvec{x}'\Vert _2. \end{aligned}$$
(32)

Additionally, we discuss, in the following theorem, the angle (minimal angle) between a point \(\varvec{x}\in \textbf{R}^n\) with the unit norm and a subspace U, which is defined as in [31]

$$\begin{aligned} \theta =\arccos \max _{\varvec{u}\in U,~\Vert \varvec{u}\Vert _2=1} {\varvec{x}}^T \varvec{u}. \end{aligned}$$
(33)
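A sketch of evaluating (29)-(32) numerically. Since (31) assumes the columns of \(\varvec{\Phi }\) are orthonormal, the sketch orthonormalizes them by QR before forming the projection; the coordinates and test vectors are illustrative.

```python
import numpy as np

def rigid_body_modes(coords):
    """Columns of Phi in (23): translations and infinitesimal rotations."""
    Phi = np.zeros((3 * len(coords), 6))
    for i, (x, y, z) in enumerate(coords):
        Phi[3*i:3*i+3, 0:3] = np.eye(3)
        Phi[3*i:3*i+3, 3:6] = [[0., z, -y], [-z, 0., x], [y, -x, 0.]]
    return Phi

rng = np.random.default_rng(1)
Phi = rigid_body_modes(rng.random((10, 3)))    # 10 nodal points, 30 DOFs

Q, _ = np.linalg.qr(Phi)                       # orthonormal basis of Ker A
P = Q @ Q.T                                    # orthogonal projection, cf. (31)

def dist_to_kernel(x):
    """dist(x, Ker A) = ||x - P x||_2 for a unit vector x, as in (29)-(30)."""
    x = x / np.linalg.norm(x)
    return np.linalg.norm(x - P @ x)

print(dist_to_kernel(Phi[:, 0]))               # a kernel vector: distance ~ 0
print(dist_to_kernel(rng.standard_normal(30))) # a generic vector: typically near 1
```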

Theorem 1

(Distance between the eigenvector and the kernel of the unconstrained stiffness matrix)

Let the eigenvalues and the corresponding eigenvectors of \(\varvec{\bar{A}}\) be given by (19) and (22), respectively. We then have the following properties.

  (1) For arbitrary \(1\le k\le n-r\), \(\varvec{x}_k'\notin \textrm{Ker}\varvec{A}\), and accordingly,

    $$\begin{aligned} \textrm{dist}\,(\varvec{x}_k',\,\textrm{Ker}\varvec{A})>0. \end{aligned}$$
    (34)
  (2) If \(k<n-r\) is large enough, then \(\varvec{x}_k'\) is nearly orthogonal to \(\textrm{Ker}\varvec{A}\), and accordingly,

    $$\begin{aligned} \textrm{dist}\,(\varvec{x}_k',\,\textrm{Ker}\, \varvec{A})\approx 1. \end{aligned}$$
    (35)
  (3) If \(k\ge 1\) is small enough, then

    $$\begin{aligned} \textrm{dist}\,(\varvec{x}_k',\,\textrm{Ker}\varvec{A})\approx 0. \end{aligned}$$
    (36)

    In particular, if \(\varvec{x}_k'\) is the fundamental mode \(\varvec{x}_1'\), then \(\textrm{dist}\,(\varvec{x}_1',\,\textrm{Ker} \varvec{A})\) takes the smallest value.

Proof

As noted previously, the indices of the eigenvalues and eigenvectors range over \(1\le k\le n-r\) after imposing the r constraint conditions.

We prove (1). If \(\varvec{x}_k'\in \textrm{Ker}\varvec{A}\), then \(\varvec{x}_k'\) would be a rigid-body mode. However, the r zeros placed above \(\bar{\varvec{x}}_k\) cannot be represented by the rigid-body motions. This proves (34).

We prove (2) and (3). It is known from modal analysis that the modal displacement components of a structural body are dominant in the lower frequency range, where the whole body vibrates with large amplitude, whereas in the range of higher frequencies, the vibrations become minute in proportion to the frequencies and depending on the shape of the body.

Take an arbitrary \(1\le k\le n-r\). Referring to (33), the angle (minimal angle) between \(\varvec{x}_k'\) and \(\textrm{Ker}\varvec{A}\) is given by

$$\begin{aligned} \theta =\arccos \max _{1\le j\le 6} {\varvec{x}_k'}^T \varvec{g}_j, \end{aligned}$$
(37)

where \(\varvec{g}_1,\cdots ,\varvec{g}_6\) are the appropriate basis of \(\textrm{Ker}\varvec{A}\).

We write the components of the i-th nodal point of \(\varvec{x}_k'\) as \(\{x_{ki}'~y_{ki}'~z_{ki}'\}\). The first r components of \(\varvec{x}_k'\) are zeros, and \(\varvec{x}_k'\) can thus be written as

$$\begin{aligned} \varvec{x}_k'=\frac{1}{\Vert \varvec{x}_k'\Vert }\left( 0~\cdots ~0~\{x_{k,[r/3]+1}'~y_{k,[r/3]+1}'~z_{k,[r/3]+1}'\} ~\cdots ~ \{x_{k,{n/3-[r/3]}}'~y_{k,{n/3-[r/3]}}'~z_{k,{n/3-[r/3]}}'\} \right) ^T. \end{aligned}$$
(38)

Although \(\varvec{g}_i\) in (37) does not coincide with \(\varvec{f}_i\) in general, by representing appropriate \(\varvec{g}_i\) as a linear combination of \(\varvec{f}_i\), we can take \(\varvec{f}_i\) instead of \(\varvec{g}_i\).

The representations of \(\varvec{f}_1,\varvec{f}_2,\varvec{f}_3\) are easily obtained according to their forms. \(\varvec{f}_4,\varvec{f}_5,\varvec{f}_6\) are given as

$$\begin{aligned} \begin{array}{l} \varvec{f}_4=\displaystyle {\frac{1}{\Vert \varvec{f}_4\Vert }}\left( \{0~~-z_1~~y_1\}~\cdots ~\{0~~-z_{[r/3]}~~y_{[r/3]}\}~~\{0~~-z_{[r/3]+1}~~y_{[r/3]+1}\}\right. \\ \hspace{50mm}\left. ~\cdots ~\{0~~-z_{n/3-[r/3]}~~y_{n/3-[r/3]}\}\right) ^T,\\ \varvec{f}_5=\displaystyle {\frac{1}{\Vert \varvec{f}_5\Vert }}\left( \{z_1~~0~~-x_1\}~\cdots ~\{z_{[r/3]}~~0~~-x_{[r/3]}\}~~\{z_{[r/3]+1}~~0~~-x_{[r/3]+1}\}\right. \\ \hspace{50mm}\left. ~\cdots ~\{z_{n/3-[r/3]}~~0~~-x_{n/3-[r/3]}\}\right) ^T,\\ \varvec{f}_6=\displaystyle {\frac{1}{\Vert \varvec{f}_6\Vert }}\left( \{-y_1~~x_1~~0\}~\cdots ~\{-y_{[r/3]}~~x_{[r/3]}~~0\}~~\{-y_{[r/3]+1}~~x_{[r/3]+1}~~0\}\right. \\ \hspace{50mm}\left. ~\cdots ~\{-y_{n/3-[r/3]}~~x_{n/3-[r/3]}~~0\} \right) ^T. \end{array} \end{aligned}$$
(39)

Assuming that all the vectors are normalized, the inner products of \(\varvec{x}_k'\) and \(\varvec{f}_1,\,\dots ,\,\varvec{f}_6\) are

$$\begin{aligned} \begin{array}{c} {\varvec{x}_k'}^T \varvec{f}_1=\displaystyle {\sum _{i=1}^{n/3-[r/3]}} x_{ki}',\\ {\varvec{x}_k'}^T \varvec{f}_2=\displaystyle {\sum _{i=1}^{n/3-[r/3]}} y_{ki}',\\ {\varvec{x}_k'}^T \varvec{f}_3=\displaystyle {\sum _{i=1}^{n/3-[r/3]}} z_{ki}',\\ {\varvec{x}_k'}^T \varvec{f}_4=\displaystyle {\sum _{i=1}^{n/3-[r/3]}}(0~-z_{[r/3]+i}~~y_{[r/3]+i}) \left( \begin{array}{c} x_{k,[r/3]+i}'\\ y_{k,[r/3]+i}'\\ z_{k,[r/3]+i}'\\ \end{array} \right) ,\\ {\varvec{x}_k'}^T \varvec{f}_5=\displaystyle {\sum _{i=1}^{n/3-[r/3]}}(z_{[r/3]+i}~0~-x_{[r/3]+i}) \left( \begin{array}{c} x_{k,[r/3]+i}'\\ y_{k,[r/3]+i}'\\ z_{k,[r/3]+i}'\\ \end{array} \right) ,\\ {\varvec{x}_k'}^T \varvec{f}_6=\displaystyle {\sum _{i=1}^{n/3-[r/3]}}(-y_{[r/3]+i}~~x_{[r/3]+i}~0) \left( \begin{array}{c} x_{k,[r/3]+i}'\\ y_{k,[r/3]+i}'\\ z_{k,[r/3]+i}'\\ \end{array} \right) . \end{array} \end{aligned}$$
(40)

We can evaluate the variation of the six inner products \({\varvec{x}_k'}^T \varvec{f}_i\) with k using these representations. The first three equations involve only the displacements, whereas in the last three equations, the displacements are multiplied by the constant coordinates of the nodal points. If \(1\le k\le n-r\) is sufficiently small, then the modal shape of the target model deforms greatly from the static state. In particular, the deformation is largest for the fundamental mode with \(k=1\), which corresponds to the maximum of the six values of (40), and \(\textrm{dist}\,(\varvec{x}_1',\,\textrm{Ker}\varvec{A})\) is accordingly the smallest. This proves (36) and the last statement of property (3).

Meanwhile, if k becomes large, the vibration state of the model becomes minute, and the variations of the components of \(\varvec{x}_k'\) decrease. As a result, fixing \(1\le l \le n-r\) and taking sufficiently large \(k>l\), we have

$$\begin{aligned} \max _{1\le j\le 6} {\varvec{x}_l'}^T \varvec{f}_j \ge \max _{1\le j\le 6} {\varvec{x}_k'}^T \varvec{f}_j, \end{aligned}$$
(41)

and a larger k results in \(\varvec{x}_k'\) being closer to the direction orthogonal to \(\textrm{Ker} \varvec{A}\), which means \(\textrm{dist}(\varvec{x}_k',\,\textrm{Ker}\, \varvec{A})\) approaches 1. This proves (35).

4.3 Example of the distance between eigenvectors and the kernel of the unconstrained matrix

In this section, we present actual curves of the distance between the eigenvectors and the kernel of the unconstrained matrix using simple examples.

We show three cuboid models in Fig. 4. On the left and at the center are \(1\times 1\times 6\) and \(4\times 3\times 12\) models with hexahedral linear elements, respectively. On the right is a model of the same size as the \(4\times 3\times 12\) model but with tetrahedral elements, in both linear and quadratic versions. Additionally, we include a \(2\times 2\times 12\) model with hexahedral linear elements in our testing. The constraint conditions have 9, 12, and 18 DOFs for the \(1\times 1\times 6\), \(2\times 2\times 12\), and \(4\times 3\times 12\) models, respectively. The cuboid models are summarized in Table 2.

Moreover, we use long and short perforated plates and a pipe model. We present these three models in Fig. 5 and Table 3. We assume linear and quadratic hexahedral elements for these models.

We assume the physical properties of standard steel: a Young’s modulus of 200 GPa and a Poisson ratio of 0.3.

Table 2 Cuboid models for evaluating the distance between the eigenvectors and the kernel. See Fig. 4
Table 3 Other models for evaluating the distance between the eigenvectors and the kernel. See Fig. 5
Fig. 4 Cuboid models used to evaluate the distance between the eigenvectors and the kernel

Fig. 5 Long and short perforated plates and the pipe model used to evaluate the distance. The long perforated plate model with quadratic tetrahedral elements has the largest number of DOFs among all our models (see Fig. 4 and Table 3)

Curves of the distance between the eigenvectors and the kernel for the cuboid models, the short perforated plate, the long perforated plate, and the pipe model are shown in Figs. 6, 7, 8, and 9, respectively.

The eigenvalues and the distance values up to \(k=15\) are given in Table 4 as examples; (a) corresponds to the cuboid \(1\times 1\times 6\) model with linear hexahedral elements, (b) to the cuboid \(4\times 3\times 12\) model with quadratic tetrahedral elements, (c) to the long plate model with quadratic tetrahedral elements, and (d) to the pipe model with quadratic tetrahedral elements.

All the results are in accordance with properties (1), (2), and (3) of Theorem 1.

Table 4 Example of the eigenvalues and the distances from the kernel up to \(k=15\)
Fig. 6 Distance between the eigenvector and the kernel for the cuboid models. The distance is small at smaller eigenvalues and rapidly approaches a value of 1 at higher eigenvalues

Fig. 7 Distance between the eigenvector and the kernel for the short perforated plate. The same tendency as in Fig. 6 is seen. A finer mesh decomposition reduces the distance between the first eigenvector and the kernel

Fig. 8 Distance between the eigenvector and the kernel for the long perforated plate

Fig. 9 Distance between the eigenvector and the kernel for the pipe model. In our examples, the pipe model with quadratic elements has the smallest first eigenvalue and the smallest distance between the eigenvector and the kernel

5 Introduction to the DDDM

All the elements used in this section are solid elements.

5.1 Expression of the coarse grid by domain decomposition

We decompose \(\Omega \) into N non-overlapping subdomains \(\Omega _J,1\le J\le N\), except that the boundaries between subdomains overlap one another. Each \(\Omega _J\) is a closed domain that includes boundary surfaces. We write \(\Omega \) as

$$\begin{aligned} \Omega =\bigcup _J \Omega _J. \end{aligned}$$
(42)

The domain decomposition here is the “element-based” decomposition, as opposed to the “nodal-point-based” decomposition used for parallelization, which we describe in Sect. 7.1.

The domain decomposition is represented by diagonal matrices \(\varvec{\bar{D}}_J\in \textbf{R}^{(n-r)\times (n-r)}\) with diagonal entries \(d_{Ji}\):

$$\begin{aligned} \varvec{\bar{D}}_J=\textrm{diag}\,(0,\,\dots ,\,0,\,d_{J*},\,\dots ,\,d_{J*},\,0,\,\dots ,\,0), \end{aligned}$$
(43)
$$\begin{aligned} d_{Ji}= \left\{ \begin{array}{ll} 0,&{}\quad i\notin \Omega _J,\\ 1,&{}\quad i\in \Omega _J^\circ ,\\ 1/(\mathrm {number~of~overlaps}),&{}\quad \mathrm {where}~ i~ \mathrm {is~a~point~of~the~boundary,} \end{array} \right. \end{aligned}$$
(44)

where \(\Omega _J^\circ \) is the interior of the subdomain \(\Omega _J\). Explicitly representing the indices i of the components of \(\varvec{\bar{D}}_J\) is cumbersome, and thus the abbreviated expression \(d_{J*}\) is used in (43). Note that in (43), there are cases in which the non-zero components are not contiguously aligned.

From the construction of \(\varvec{\bar{D}}_J\), the domain decomposition (42) corresponds to

$$\begin{aligned} \sum _{J=1}^N \varvec{\bar{D}}_J=\varvec{I}. \end{aligned}$$
(45)

This means that \(\{\varvec{\bar{D}}_J\}\) is a partition of unity on the space \(\textbf{R}^{n-r}\). We then have

$$\begin{aligned} \sum _{J=1}^N \varvec{\bar{D}}_J\varvec{\bar{\Phi }}=\varvec{\bar{\Phi }}. \end{aligned}$$
(46)

Let

$$\begin{aligned} \varvec{\bar{F}}_J=\varvec{\bar{D}}_J\varvec{\bar{\Phi }}\in \textbf{R}^{(n-r)\times 6},\quad 1\le J\le N. \end{aligned}$$
(47)

We align (47) and let

$$\begin{aligned} \varvec{\bar{F}}=(\varvec{\bar{F}}_1~~\cdots ~~\varvec{\bar{F}}_N)\in \textbf{R}^{(n-r)\times 6N}. \end{aligned}$$
(48)

We refer to this matrix as the extension matrix. We write \(W\equiv \textbf{R}^{6N}\) and \(V\equiv \textbf{R}^{n-r}\). W is the coarse space, whereas V is the global solution space. \(\varvec{\bar{F}}\) embeds W into V. The column vectors of \(\varvec{\bar{F}}_1,\cdots ,\varvec{\bar{F}}_N\) are the images of a basis of W. Although W is not a subspace of V, \(\varvec{\bar{F}}W\) is a subspace of V. Since \(\varvec{\bar{F}}^T\) is of full rank, \(\textrm{Im}\varvec{\bar{F}}^T\) is isomorphic to \(W\): \(\varvec{\bar{F}}^T V\cong W\).
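A minimal sketch of (43)-(48): the diagonal partition-of-unity matrices \(\varvec{\bar{D}}_J\) are built from the number of subdomains sharing each nodal point, and the blocks \(\varvec{\bar{F}}_J=\varvec{\bar{D}}_J\varvec{\bar{\Phi }}\) are concatenated into the extension matrix \(\varvec{\bar{F}}\). The node-to-subdomain assignment and coordinates are illustrative.

```python
import numpy as np

def rigid_body_modes(coords):
    """Columns of Phi_bar: translations and infinitesimal rotations, cf. (23)/(27)."""
    Phi = np.zeros((3 * len(coords), 6))
    for i, (x, y, z) in enumerate(coords):
        Phi[3*i:3*i+3, 0:3] = np.eye(3)
        Phi[3*i:3*i+3, 3:6] = [[0., z, -y], [-z, 0., x], [y, -x, 0.]]
    return Phi

# Illustrative: 8 free nodal points; two subdomains share nodes 3 and 4 on their boundary.
coords = np.random.default_rng(0).random((8, 3))
subdomains = [{0, 1, 2, 3, 4}, {3, 4, 5, 6, 7}]

overlap = np.zeros(len(coords))
for nodes in subdomains:
    for i in nodes:
        overlap[i] += 1.0                     # number of subdomains containing node i

Phi_bar = rigid_body_modes(coords)
F_blocks = []
for nodes in subdomains:
    d = np.zeros(3 * len(coords))             # diagonal of D_J in (43)-(44)
    for i in nodes:
        d[3*i:3*i+3] = 1.0 / overlap[i]       # 1 inside, 1/(number of overlaps) on boundary
    F_blocks.append(d[:, None] * Phi_bar)     # F_J = D_J Phi_bar, (47)
F_bar = np.hstack(F_blocks)                   # extension matrix (48), (n-r) x 6N

assert np.allclose(sum(F_blocks), Phi_bar)    # partition of unity, (45)-(46)
print(F_bar.shape)
```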

5.2 Framework of the DDDM algorithm

We describe the DDDM algorithm as follows. Construct a coarse grid using the rigid-body modes \(\varvec{f}_1,\,\dots ,\,\varvec{f}_6\) of the original problem without a constraint condition; remove the corresponding low-frequency modes from the entire space V using the deflation algorithm; and apply the CG method to the segregated complementary space.

Let

$$\begin{aligned} \varvec{\bar{A}}_{\varvec{\bar{F}}}=\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}}\in \textbf{R}^{6N\times 6N}. \end{aligned}$$
(49)

We refer to this matrix as a contraction matrix. Because the size of this matrix is small, we can obtain the inverse matrix \(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1}\) using a direct method or LU decomposition (lower-upper decomposition). Moreover, we extend \(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1}\) onto V and express \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\) as

$$\begin{aligned} (\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*=\varvec{\bar{F}}\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1}\varvec{\bar{F}}^T=\varvec{\bar{F}}(\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}})^{-1}\varvec{\bar{F}}^T \in \textbf{R}^{(n-r)\times (n-r)}. \end{aligned}$$
(50)

We refer to this matrix as a pullback of \(\varvec{\bar{A}}\) under \(\varvec{\bar{F}}\). Figure 10 is a diagram of \(\varvec{\bar{A}}_{\varvec{\bar{F}}}\) and \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\).

Furthermore, let \(\varvec{\bar{P}}_{\varvec{\bar{A}}}\) be the matrix obtained by multiplying the pullback by \(\varvec{\bar{A}}\) from the right side:

$$\begin{aligned} \varvec{\bar{P}}_{\varvec{\bar{A}}}=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}}=\varvec{\bar{F}}(\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}})^{-1}\varvec{\bar{F}}^T\varvec{\bar{A}}\in \textbf{R}^{(n-r)\times (n-r)}. \end{aligned}$$
(51)

Proposition 2

(Contraction projection)

The following properties hold:

$$\begin{aligned} \varvec{\bar{P}}_{\varvec{\bar{A}}}^2=\varvec{\bar{P}}_{\varvec{\bar{A}}},\quad \varvec{\bar{A}}\varvec{\bar{P}}_{\varvec{\bar{A}}}=\varvec{\bar{P}}_{\varvec{\bar{A}}}^T\varvec{\bar{A}}. \end{aligned}$$
(52)

Therefore, \(\varvec{\bar{P}}_{\varvec{\bar{A}}}\) is a projection.

Proof

Easily shown.

We refer to \(\varvec{\bar{P}}_{\varvec{\bar{A}}}\) as a contraction projection obtained from the pullback of \(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1}\), or simply a contraction projection. The image of V obtained by the contraction projection is called a contraction projection space, or simply a contraction space.
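A numerical check of the contraction matrix (49), the pullback (50), the contraction projection (51), and Propositions 2 and 3, using random stand-ins for \(\varvec{\bar{A}}\) (symmetric positive definite) and \(\varvec{\bar{F}}\) (full rank).

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 30, 6                                  # n-r and 6N in the paper's notation
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                   # SPD stand-in for A_bar
F = rng.standard_normal((n, m))               # full-rank stand-in for F_bar

A_F = F.T @ A @ F                             # contraction matrix (49)
pullback = F @ np.linalg.solve(A_F, F.T)      # (A_F^{-1})^* in (50)
P = pullback @ A                              # contraction projection (51)

# Proposition 2: P is a projection and A P = P^T A.
assert np.allclose(P @ P, P)
assert np.allclose(A @ P, P.T @ A)
# Proposition 3: the image of F coincides with the contraction space (P F = F).
assert np.allclose(P @ F, F)
print("contraction projection verified")
```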

Remark 1

(Definition of the deflation projection)

In many articles (e.g., [7,8,9,10,11,12,13,14]), the deflation projection \(\varvec{\bar{P}}\) is defined as

$$\begin{aligned} \varvec{\bar{P}}=\varvec{I}-\varvec{\bar{A}}\varvec{\bar{F}}(\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}})^{-1}\varvec{\bar{F}}^T \end{aligned}$$
(53)

in our notation.

  (a) The second term on the right side of (53) corresponds to (51), but in (51), the matrix \(\varvec{\bar{A}}\) is multiplied from the right side, whereas in (53), \(\varvec{\bar{A}}\) is multiplied from the left.

  (b) Additionally, the projection \(\varvec{\bar{P}}\) here is defined as the complementary projection of the contraction projection (in our sense).

  (c) Our definition (51) is needed, as will be described in Remark 2 in Sect. 6.2, to reduce the calculation cost in each CG step.

Proposition 3

(Contraction space and the image of the extension matrix)

The image of the extension matrix coincides with the contraction projection space:

$$\begin{aligned} \varvec{\bar{F}}W=\varvec{\bar{P}}_{\varvec{\bar{A}}}V. \end{aligned}$$
(54)

Proof

Easily shown.

Because \(\varvec{\bar{F}} \in \textbf{R}^{(n-r)\times 6N}\) is of full rank and one-to-one, Proposition 3 shows that \(\varvec{\bar{F}}\) is an isomorphism from the coarse space W onto the contraction space. We therefore refer to \(\varvec{\bar{F}}W\equiv \varvec{\bar{P}}_{\varvec{\bar{A}}}V\) as the coarse space, like W.

Proposition 2 leads to the direct sum decomposition of V:

$$\begin{aligned} \begin{aligned} V&=\varvec{\bar{P}}_{\varvec{\bar{A}}}V\oplus (\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})V\\&=\{\varvec{\bar{x}}\in V\mid \varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{x}}=\varvec{\bar{x}}\,\}\oplus \{\varvec{\bar{x}}\in V\,\mid \,\varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{x}}=\varvec{0}\}. \end{aligned} \end{aligned}$$
(55)

Here, \(\varvec{\bar{P}}_{\varvec{\bar{A}}}V=\textrm{Im}\varvec{\bar{P}}_{\varvec{\bar{A}}}\) is a space spanned by the eigenvectors with the smaller eigenvalues including the lowest one. Meanwhile, \((\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})V=\textrm{Ker}\varvec{\bar{P}}_{\varvec{\bar{A}}}\) is a space obtained by eliminating those eigenvectors with the small eigenvalues. In other words, \(\varvec{\bar{P}}_{\varvec{\bar{A}}}V\) is a coarse space and \((\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})V\) is a fine space.

Fig. 10 Restriction of \(\varvec{\bar{A}}\) to \(\varvec{\bar{A}}_{\varvec{\bar{F}}}\) and the extension of \(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1}\) to its pullback

Theorem 4

(Restriction of the stiffness matrix to the contraction space)

The following three properties hold.

  (i) The restriction of \(\varvec{\bar{A}}^{-1}\) onto \(\varvec{\bar{P}}_{\varvec{\bar{A}}}V\) by the contraction projection \(\varvec{\bar{P}}_{\varvec{\bar{A}}}\) coincides with the pullback of \(\varvec{\bar{A}}\) under \(\varvec{\bar{F}}\):

    $$\begin{aligned} \varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{A}}^{-1}\varvec{\bar{P}}_{\varvec{\bar{A}}}^T=\varvec{\bar{F}}(\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}})^{-1}\varvec{\bar{F}}^T=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*. \end{aligned}$$
    (56)
  (ii) The image of the pullback coincides with the contraction space:

    $$\begin{aligned} (\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*V=\varvec{\bar{P}}_{\varvec{\bar{A}}}V. \end{aligned}$$
    (57)
  (iii) The restriction of the stiffness matrix \(\varvec{\bar{A}}\) onto the contraction space \(\varvec{\bar{P}}_{\varvec{\bar{A}}}V\) coincides with the pullback:

    $$\begin{aligned} (\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}}(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*. \end{aligned}$$
    (58)

Proof

(i) and (ii) can be easily seen. (58) is a rewriting of (56).

(56) presents the restriction of \(\varvec{\bar{A}}^{-1}\) onto the contraction space by \(\varvec{\bar{P}}_{\varvec{\bar{A}}}\), and (58) presents the restriction onto the contraction space by the pullback \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\), showing that the two restrictions give rise to representations of the same \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\) and a reciprocal relation, as seen in Fig. 11.

Fig. 11 Reciprocal relation between the restrictions of \(\varvec{\bar{A}}^{-1}\) and \(\varvec{\bar{A}}\). Panel (a) shows that the pullback \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\) on the coarse space \(\varvec{\bar{P}}_{\varvec{\bar{A}}}V\) hides the inverse \(\varvec{\bar{A}}^{-1}\) on the global space V

The condition number of the problem (17) is

$$\begin{aligned} \kappa (\varvec{\bar{A}})=\frac{\bar{\lambda }_{n-r}}{\bar{\lambda }_1}. \end{aligned}$$
(59)

In the DDDM algorithm, we remove the lower modes, say the lowest m, and the condition number changes to

$$\begin{aligned} \frac{\bar{\lambda }_{n-r}}{\bar{\lambda }_{m+1}}\quad \left( \le \kappa (\varvec{\bar{A}})\right) , \end{aligned}$$
(60)

which means that the DDDM algorithm reduces the condition number to

$$\begin{aligned} \frac{\bar{\lambda }_1}{\bar{\lambda }_{m+1}}\kappa (\varvec{\bar{A}}). \end{aligned}$$
(61)

5.3 Examples of deformation modes obtained using the DDDM

The target object \(\Omega \) deforms in the coarse space, as represented by the coarse motion of the subdomains \(\Omega _J,1\le J\le N\). The coarse motion of \(\Omega \) is close to the rigid-body motion of \(\Omega \), although the imposed constraint conditions fix some part of the deformation of \(\Omega \). Each \(\Omega _J\) itself has rigid-body modes, and the boundaries of neighboring subdomains or the constrained boundaries restrict the deformation of \(\Omega \).

We present an example of how the target object deforms in the coarse problem. The stiffness matrix in the coarse space is given by (49). We use here the \(1\times 1 \times 6\) model shown in Fig. 4 in Sect. 4.3. The material is standard steel. We impose the constraint conditions on nodal points 1, 2, and 3, corresponding to DOFs 1 through 9, and eliminate these DOFs. Thus, \(n=84\), \(r=9\), and \(n-r=75\), as noted in Table 2 in Sect. 4.3.

Our problem is to solve the following equation for the displacement \(\varvec{\bar{x}}\) given an external force \(\varvec{\bar{b}}\):

$$\begin{aligned} \varvec{\bar{A}}_{\varvec{\bar{F}}}\varvec{\bar{x}}=\varvec{\bar{b}}. \end{aligned}$$
(62)

The results are shown in Fig. 12. In the figure, we show the nodal point numbers 25 and 28 in Case (a), and the element numbers 1, 2, 3, 4, 5, and 6 in Case (c). The number of decompositions N is two in Case (c) and three in Case (b), whereas Case (a) has no decomposition (fine analysis). In Case (b), we decomposed the model between elements 2 and 3 and between elements 4 and 5. In Case (c), we decomposed between elements 3 and 4.

Fig. 12 Deformation of the \(1\times 1 \times 6\) model in the coarse problem setting. Case (a) shows the deformation in the fine space. There are three and two subdomains in Cases (b) and (c), respectively. In Case (c), the model body is decomposed only between elements 3 and 4, whereas in Case (b), the body is decomposed between elements 2 and 3 and between elements 4 and 5. In Case (c), elements 3 and 4 are dragged by their joining surface, and both elements are distorted. The result of Case (b) is close to the fine result (a) because the decomposition is finer than in Case (c)

We apply the external force to the first DOF (the x coordinate) of nodal points 25 and 28 in the negative and positive directions of the x axis, respectively, so that the model twists clockwise. The force strength is taken to be as high as 3000 N, which creates an artificially large displacement. The swelling seen in the upper part of the figures, especially in Cases (a) and (b), is presumably caused by a violation of the assumption of infinitesimal deformation in the FEA. In Case (c), elements 3 and 4 are dragged by the joining surface between them, and both elements are distorted. In Case (b), elements 2, 3, 4, and 5 are distorted. In particular, Case (c) has a deformation near the rigid-body motion of the model. The result of Case (b), which has a finer decomposition than Case (c), is close to the fine result of Case (a).

6 DDDM algorithm

The DDDM algorithm shares similarities with the algorithm given in [7] that expands the Krylov subspace, adding approximated eigenvectors corresponding to eigenvalues close to zero using the orthogonality of the residuals. In our method, instead of adding approximated eigenvectors to the Krylov subspace, we choose an appropriate domain decomposition, which we defined in Sect. 5.1, to eliminate eigenvectors corresponding to eigenvalues close to zero determining the contraction projection space; see Sect. 5.2.

We describe the DDDM algorithm below. The DDDM algorithm is the same as the standard deflation algorithm. We define the algorithm within the standard preconditioned CG algorithm. The two-level algorithm described below has a much shorter calculation time owing to the form of the contraction projection matrix (51).

6.1 Preconditioned CG method

For use in Theorem 6 below, we note here the general preconditioned CG algorithm. Let \(\varvec{\bar{M}}\in \textbf{R}^{(n-r)\times (n-r)}\) be an arbitrary precondition matrix, and let \(\varvec{\bar{x}}^0\in \textbf{R}^{n-r}\) be an arbitrary initial guess of the solution vector. Let the residual vector \(\varvec{\bar{r}}^0\in \textbf{R}^{n-r}\) and the gradient vector \(\varvec{\bar{p}}^0\in \textbf{R}^{n-r}\) be

$$\begin{aligned}&\varvec{\bar{r}}^0=\varvec{\bar{b}}-\varvec{\bar{A}}\varvec{\bar{x}}^0, \end{aligned}$$
(63)
$$\begin{aligned}&\varvec{\bar{p}}^0=\varvec{\bar{M}}\varvec{\bar{r}}^0. \end{aligned}$$
(64)

Then, repeat for \(k=0,1,2,\cdots \)

$$\begin{aligned}&\alpha ^k=\displaystyle {\frac{(\varvec{\bar{r}}^k,\varvec{\bar{M}}\varvec{\bar{r}}^k)}{(\varvec{\bar{p}}^k,\varvec{\bar{A}}\varvec{\bar{p}}^k)}}, \end{aligned}$$
(65)
$$\begin{aligned}&\varvec{\bar{x}}^{k+1}=\varvec{\bar{x}}^k+\alpha ^k\varvec{\bar{p}} ^k, \end{aligned}$$
(66)
$$\begin{aligned}&\varvec{\bar{r}}^{k+1}=\varvec{\bar{r}}^k-\alpha ^k \varvec{\bar{A}}\varvec{\bar{p}}^k, \end{aligned}$$
(67)
$$\begin{aligned}&\beta ^k=\displaystyle {\frac{(\varvec{\bar{r}}^{k+1},\varvec{\bar{M}}\varvec{\bar{r}}^{k+1})}{(\varvec{\bar{r}}^k,\varvec{\bar{M}}\varvec{\bar{r}}^k)}},\end{aligned}$$
(68)
$$\begin{aligned}&\varvec{\bar{p}}^{k+1}=\varvec{\bar{M}}\varvec{\bar{r}}^{k+1}+\beta ^k \varvec{\bar{p}}^k. \end{aligned}$$
(69)
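A direct transcription of (63)-(69) as a sketch in Python. The precondition matrix \(\varvec{\bar{M}}\) is passed as a function applying it to a vector; here it is an illustrative Jacobi (diagonal-scaling) preconditioner rather than the DDDM preconditioner of Sect. 6.2, and the test matrix is a random SPD stand-in.

```python
import numpy as np

def preconditioned_cg(A, b, M_apply, x0, tol=1e-10, max_iter=1000):
    """Preconditioned CG, following (63)-(69); M_apply(r) applies M to r."""
    x = x0.copy()
    r = b - A @ x                    # (63)
    z = M_apply(r)
    p = z.copy()                     # (64)
    rz = r @ z
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)        # (65)
        x += alpha * p               # (66)
        r -= alpha * Ap              # (67)
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = M_apply(r)
        rz_new = r @ z
        beta = rz_new / rz           # (68)
        p = z + beta * p             # (69)
        rz = rz_new
    return x

rng = np.random.default_rng(0)
n = 50
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)          # SPD test matrix
b = rng.standard_normal(n)
diag_inv = 1.0 / np.diag(A)          # Jacobi preconditioner (illustrative)
x = preconditioned_cg(A, b, lambda r: diag_inv * r, np.zeros(n))
print(np.linalg.norm(A @ x - b))     # ~ 0 at convergence
```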

6.2 Definition of the precondition matrix and the DDDM algorithm

Our problem is (17), which we obtained by imposing the constraint condition on (12). Let the precondition matrix be \(\varvec{\bar{M}}\). We multiply both sides of (17) by \(\varvec{\bar{M}}\) to obtain

$$\begin{aligned} \varvec{\bar{M}}\varvec{\bar{A}}\varvec{\bar{x}}=\varvec{\bar{M}}\varvec{\bar{b}}. \end{aligned}$$
(70)

In accordance with the direct sum decomposition (55), we take \(\varvec{\bar{M}}\) to be

$$\begin{aligned} \varvec{\bar{M}}=\varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{A}}^{-1}\varvec{\bar{P}}_{\varvec{\bar{A}}}^T+(\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})\varvec{\bar{R}}(\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}}^T), \end{aligned}$$
(71)

where \(\varvec{\bar{R}}\in \textbf{R}^{(n-r)\times (n-r)}\) is a symmetric positive definite matrix that plays the role of a precondition matrix in the fine space. The relationship between the decomposition (55) and the precondition matrix (71) is shown in Fig. 13.

Fig. 13 Relationship between the direct sum decomposition and the precondition matrix. \(\varvec{\bar{R}}\) is an arbitrary precondition matrix in the fine space V

The second term of (71) is a preconditioner on the fine space. The first term on the right side of (71), which eliminates the residual components of the lower modes, coincides with the pullback \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\) by (56) in Theorem 4, and thus

$$\begin{aligned} \varvec{\bar{M}}=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*+(\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})\varvec{\bar{R}}(\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}}^T). \end{aligned}$$
(72)

The first term on the right side of (71) or (72), which we have already seen in (50), is

$$\begin{aligned} \varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{A}}^{-1}\varvec{\bar{P}}_{\varvec{\bar{A}}}^T=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*=\varvec{\bar{F}}(\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}})^{-1}\varvec{\bar{F}}^T. \end{aligned}$$
(73)

This expression hides \(\varvec{\bar{A}}^{-1}\) by the pullback; see Fig. 13. \(\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}}\in \textbf{R}^{6N\times 6N}\) is a symmetric positive definite matrix on \(W\equiv \textbf{R}^{6N}\) and is so small that \((\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}})^{-1}\) can be obtained using a direct method (LU decomposition).
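To illustrate how (73) can be applied without ever forming \(\varvec{\bar{A}}^{-1}\), the following sketch factorizes the small coarse matrix once and applies the pullback with two small triangular solves per call; `A_bar` and `F_bar` are placeholder names for the quantities above, assumed here to be a sparse matrix and a dense basis matrix.

```python
import numpy as np
import scipy.linalg as la

def build_pullback(A_bar, F_bar):
    """Action of (A_F^{-1})^* = F (F^T A F)^{-1} F^T, cf. (73).

    A_bar: (sparse or dense) SPD matrix; F_bar: dense (n-r) x 6N
    coarse-basis matrix. Both names are placeholders.
    """
    A_c = F_bar.T @ (A_bar @ F_bar)   # F^T A F: small 6N x 6N matrix
    lu_piv = la.lu_factor(A_c)        # factorized once, by a direct method

    def apply_pullback(r):
        # only the small coarse solve is needed, never A_bar^{-1}
        return F_bar @ la.lu_solve(lu_piv, F_bar.T @ r)

    return apply_pullback
```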

Proposition 5

(Product of the projection and the precondition matrix)

It holds that

$$\begin{aligned} \varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{M}}=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*. \end{aligned}$$
(74)

Proof

Multiplying both sides of (72) by \(\varvec{\bar{P}}_{\varvec{\bar{A}}}=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}}\) yields

$$\begin{aligned} \varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{M}}=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}}(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*+\varvec{\bar{P}}_{\varvec{\bar{A}}}(\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})\varvec{\bar{R}}(\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}}^T). \end{aligned}$$
(75)

From (58) in Theorem 4, the first term on the right side of this equation is \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\), and the second term becomes zero because \(\varvec{\bar{P}}_{\varvec{\bar{A}}}\) is a projection. Thus, (74) is obtained.

Theorem 6

(Range of the pullback and the gradient vector)

In the CG iteration steps, (63) to (69), let

$$\begin{aligned} \varvec{\bar{x}}^0=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{b}}. \end{aligned}$$
(76)

Using this vector, we define

$$\begin{aligned}&\varvec{\bar{r}}^0=\varvec{\bar{b}}-\varvec{\bar{A}}\varvec{\bar{x}}^0 \in \textbf{R}^{n-r}, \end{aligned}$$
(77)
$$\begin{aligned}&\varvec{\bar{p}}^0=\varvec{\bar{M}}\varvec{\bar{r}}^0 \in \textbf{R}^{n-r}. \end{aligned}$$
(78)

For \(k\ge 0\), the residual vector \(\varvec{\bar{r}}^k\) belongs to the kernel of \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\), and the gradient vector \(\varvec{\bar{p}}^k\) belongs to the kernel of \(\varvec{\bar{P}}_{\varvec{\bar{A}}}\), which is equal to the conjugate space \((\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})V\); see (55). Two equations thus hold for arbitrary \(k\ge 0\):

$$\begin{aligned}&(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{r}}^k=\varvec{0}, \end{aligned}$$
(79)
$$\begin{aligned}&\varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{p}}^k=\varvec{0}. \end{aligned}$$
(80)

Proof

We show (79) and (80) simultaneously by induction. First, we have

$$\begin{aligned} \varvec{\bar{r}}^0=\varvec{\bar{b}}-\varvec{\bar{A}}\varvec{\bar{x}}^0=\varvec{\bar{b}}-\varvec{\bar{A}}(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{b}}. \end{aligned}$$
(81)

Multiplying both sides of this equation from the left by \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\) and using (58) in Theorem 4, we have

$$\begin{aligned} (\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{r}}^0=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{b}}-(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}}(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{b}} =(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{b}}-(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{b}}=\varvec{0}. \end{aligned}$$
(82)

This result shows that

$$\begin{aligned} \varvec{\bar{p}}^0&=\varvec{\bar{M}}\varvec{\bar{r}}^0=\left((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*+(\varvec{I}-(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}})\varvec{\bar{R}}(\varvec{I}-\varvec{\bar{A}}(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*)\right)\varvec{\bar{r}}^0 \\ &=(\varvec{I}-(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}})\varvec{\bar{R}}(\varvec{I}-\varvec{\bar{A}}(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*)\varvec{\bar{r}}^0=(\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})\varvec{\bar{R}}\varvec{\bar{r}}^0. \end{aligned}$$
(83)

We then have

$$\begin{aligned} \varvec{\bar{P}}_{\varvec{\bar{A}}} \varvec{\bar{p}}^0=\varvec{\bar{P}}_{\varvec{\bar{A}}} (\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})\varvec{\bar{R}}\varvec{\bar{r}}^0=\varvec{0}. \end{aligned}$$
(84)

Equations (82) and (84) show (79) and (80) for \(k=0\).

We next assume that (79) and (80) hold for some \(k\ge 0\). Multiplying both sides of (67) of the CG iteration steps by \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\) and using (79) and (80) for k, we have

$$\begin{aligned} (\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{r}}^{k+1}=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{r}}^k-\alpha ^k(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}}\varvec{\bar{p}}^k=-\alpha ^k(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}}\varvec{\bar{p}}^k=-\alpha ^k \varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{p}}^k=\varvec{0}. \end{aligned}$$
(85)

This shows (79) for \(k+1\). Next, multiplying both sides of (69) of the CG iteration steps by \(\varvec{\bar{P}}_{\varvec{\bar{A}}}=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}}\) and using (80) for k, we obtain

$$\begin{aligned} \varvec{\bar{P}}_{\varvec{\bar{A}}} \varvec{\bar{p}}^{k+1}=\varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{M}} \varvec{\bar{r}}^{k+1}+\beta ^k \varvec{\bar{P}}_{\varvec{\bar{A}}} \varvec{\bar{p}}^k=\varvec{\bar{P}}_{\varvec{\bar{A}}}\varvec{\bar{M}} \varvec{\bar{r}}^{k+1}. \end{aligned}$$
(86)

From (74) in Proposition 5, we obtain

$$\begin{aligned} \varvec{\bar{P}}_{\varvec{\bar{A}}} \varvec{\bar{p}}^{k+1}=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{r}}^{k+1}. \end{aligned}$$
(87)

By (79) for \(k+1\), which we have already proved, the right side of this equation is zero, which proves (80) for \(k+1\).

Theorem 6 shows that the residual vectors \(\varvec{\bar{r}}^k,~k\ge 1\), that appear in the iteration steps stay in the kernel of \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\), and the gradient vectors \(\varvec{\bar{p}}^k,~k\ge 1\), stay in the fine space \((\varvec{I}-\varvec{\bar{P}}_{\varvec{\bar{A}}})V\); neither ever strays into the complementary subspace.
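The two identities are straightforward to verify numerically on a toy problem. The following sketch uses a random symmetric positive definite matrix and a random coarse basis as hypothetical stand-ins for \(\varvec{\bar{A}}\) and \(\varvec{\bar{F}}\), takes \(\varvec{\bar{R}}=\varvec{I}\), and checks (79) and (80) over a few iterations; it is a toy check, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 6                                   # toy sizes (hypothetical)
G = rng.standard_normal((n, n))
A = G @ G.T + n * np.eye(n)                    # SPD stand-in for A_bar
F = rng.standard_normal((n, m))                # stand-in for F_bar
b = rng.standard_normal(n)

pull = F @ np.linalg.solve(F.T @ A @ F, F.T)   # (A_F^{-1})^*, cf. (73)
P = pull @ A                                   # projection P_A
I = np.eye(n)
M = pull + (I - P) @ (I - P.T)                 # (71) with R = I

x = pull @ b                                   # initial guess (76)
r = b - A @ x                                  # (77)
p = M @ r                                      # (78)
for _ in range(5):
    Ap = A @ p
    alpha = (r @ (M @ r)) / (p @ Ap)           # (65)
    x, r_new = x + alpha * p, r - alpha * Ap   # (66), (67)
    beta = (r_new @ (M @ r_new)) / (r @ (M @ r))  # (68)
    p, r = M @ r_new + beta * p, r_new         # (69)
    assert np.allclose(pull @ r, 0, atol=1e-8)  # (79)
    assert np.allclose(P @ p, 0, atol=1e-8)     # (80)
```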

Remark 2

(Advantageous effect of the definition of the deflation projection. See also Remark 1 in Sect. 5.2)

  a)

    In the CG iteration from (65) to (69), the matrix–vector products necessary for updating the steps from k to \(k+1\) are \(\varvec{\bar{A}}\varvec{\bar{p}}^k\) and \(\varvec{\bar{M}}\varvec{\bar{r}}^k\), and \(\varvec{\bar{p}}^k\) is determined by \(\varvec{\bar{M}}\varvec{\bar{r}}^k\). Theorem 6 shows that \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{r}}^k=\varvec{0}\) for all \(k\ge 0\); therefore, we have

    $$\begin{aligned} \varvec{\bar{M}}\varvec{\bar{r}}^k&=(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{r}}^k +(\varvec{I}-(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}})\varvec{\bar{R}}(\varvec{I}-\varvec{\bar{A}}(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*)\varvec{\bar{r}}^k \\ &=(\varvec{I}-(\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{A}})\varvec{\bar{R}}\varvec{\bar{r}}^k. \end{aligned}$$
    (88)

    Thus, the property \((\varvec{\bar{A}}_{\varvec{\bar{F}}}^{-1})^*\varvec{\bar{r}}^k=\varvec{0}\) eliminates one matrix–vector product in each CG iteration step.

  b)

    The reduction in computation shown in (88) follows from the definition of the contraction projection (51) and the complementary projection in the precondition matrix (71); a minimal sketch exploiting this reduction follows this remark.
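The following sketch shows one way an implementation can exploit (88); `apply_A`, `apply_pullback`, and `apply_R` are placeholder callables in the sense of the earlier sketches, not the paper's code.

```python
def make_dddm_apply_M(apply_A, apply_pullback, apply_R):
    """Preconditioning step specialized by (88).

    Because (A_F^{-1})^* r^k = 0 throughout the iteration (Theorem 6),
    the general form (72) collapses to
        M r^k = (I - (A_F^{-1})^* A) R r^k,
    saving one coarse solve per CG step.
    """
    def apply_M(r):
        z = apply_R(r)                         # fine-space precondition R
        return z - apply_pullback(apply_A(z))  # subtract the coarse part
    return apply_M
```

Combined with the initial guess (76) as `x0`, this `apply_M` can be passed directly to the `preconditioned_cg` sketch in Sect. 6.1.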

6.3 Performance of the DDDM algorithm

We discuss the performance of the DDDM here. Although the DDDM is parallelized, we assume a single-process calculation throughout this section.

In general, convergence is difficult to achieve for thin-plate models when using iterative methods, and the difficulty increases as the thickness decreases. We thus use thin-plate models for our benchmarks.

Remark 3

(Precondition matrix for the fine space in the DDDM)

We give the precondition matrix for the fine space in the DDDM as \(\varvec{\bar{R}}\) in the definition of \(\varvec{\bar{M}}\) in (71) or (72). Although \(\varvec{\bar{R}}\) can be any precondition matrix, we take \(\varvec{\bar{R}}\) to be the matrix of the symmetric successive overrelaxation (SSOR) for the following reasons.

The DDDM solver for our benchmarks is incorporated into FrontISTR [32], an open-source structural analysis code. FrontISTR provides an SSOR-preconditioned CG solver as a standard solver. Choosing the SSOR precondition for the DDDM enables a fair comparison between the SSOR-preconditioned CG solver on the entire space and the DDDM solver with SSOR preconditioning on the fine space. In the following benchmarks, we switch between these two solvers; SSOR below refers to the SSOR-preconditioned CG method, and DDDM refers to the DDDM with SSOR preconditioning on the fine space.
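For reference, a minimal sketch of the action of an SSOR precondition matrix follows, in the standard textbook form \(\varvec{M}^{-1}=\omega (2-\omega )(\varvec{D}+\omega \varvec{U})^{-1}\varvec{D}(\varvec{D}+\omega \varvec{L})^{-1}\) for \(\varvec{A}=\varvec{L}+\varvec{D}+\varvec{U}\). It illustrates the role of \(\varvec{\bar{R}}\) only; it is not the FrontISTR implementation.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve_triangular

def make_ssor_preconditioner(A, omega=1.0):
    """Action of the SSOR precondition matrix for a sparse SPD matrix A.

    A minimal sketch under textbook assumptions, usable as apply_R
    in the earlier sketches.
    """
    A = A.tocsr()
    D = sp.diags(A.diagonal())
    L = sp.tril(A, k=-1, format="csr")      # strict lower triangle
    lower = (D + omega * L).tocsr()          # D + omega*L
    upper = (D + omega * L.T).tocsr()        # D + omega*L^T
    scale = omega * (2.0 - omega)

    def apply_R(r):
        y = spsolve_triangular(lower, r, lower=True)    # forward sweep
        return scale * spsolve_triangular(upper, D @ y, lower=False)

    return apply_R
```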

In this section, the computer used was a cluster with Intel Xeon E5-2670 processors (2.6 GHz, 16 cores per node), 13 nodes, and 128 GB of RAM per node. We used neither process- nor thread-level parallelization. We used METIS [33] for the domain decomposition in the DDDM solver.

We built the plate model by arranging \(1\,\textrm{mm}\times 1\,\textrm{mm}\times 1\,\textrm{mm}\) hexahedral linear elements in the same way as in Sect. 2.2, aligning x, y, and z elements in the directions of the x-, y-, and z-axes, respectively, except for some models noted below. We refer to the resulting plate as the “\(x\times y\times z\)” plate, in the same manner as in Sect. 2.2. The orientation of the plate, constraint conditions, and loading conditions were likewise the same as in Sect. 2.2; see Fig. 1.

The analysis is the same cantilever-type dead-weight analysis as conducted in Sect. 2.2. We ran the analysis starting with the \(300\times 10\times 300\) model and then reduced the thickness (i.e., the number of hexahedral elements in the y-axis direction) to 9, 8, 7, 6, 5, 4, 3, 2, and 1. The DDDM converged even with \(y=1\). We then made the plate thinner still, setting \(y=\) 0.5, 0.2, and 0.1.

We set the tolerance as \(1.0\times 10^{-7}\) in all cases and the maximum number of iterations as 10,000 for both the SSOR and DDDM. We deemed an iteration not to have converged when it reached the maximum number of iterations with no tendency for the relative residual error to decrease.

We summarize the benchmark results in Table 5.

Stability of the DDDM solver. Table 5 demonstrates the stability of the DDDM solver: the SSOR converged with the \(300\times 10\times 300\) model but failed to converge with the \(300\times 9\times 300\) model, which shows the difficulty of handling thin plates with the SSOR-preconditioned CG method. Meanwhile, the DDDM converged from the \(300\times 10\times 300\) model down to the \(300\times 0.2\times 300\) model, failing only at the \(300\times 0.1\times 300\) model. The number of iterations and the calculation time of the DDDM decreased from the \(300\times 10\times 300\) model to the \(300\times 1\times 300\) model, presumably because of the decrease in the number of DOFs. However, the trend reversed at the \(300\times 0.5\times 300\) model, which shows that convergence becomes difficult as the thickness decreases, even with the DDDM.

Performance of the DDDM solver. Table 6 compares the performance of the DDDM solver with that of the SSOR solver, taken from the first row of Table 5 for \(y=10\). The DDDM solver requires 85.9 times fewer iterations than the SSOR solver and is 17.2 times faster in calculation time.

Table 5 Stability of the DDDM solver. As y decreased from 10, the SSOR diverged at \(y=9\), whereas the DDDM converged down to \(y=0.2\). The number of iterations for the DDDM was as low as approximately 40 to 80, whereas that for the SSOR was 4127. When the thickness was as small as \(y=1\) or less, the number of iterations increased. The number of subdomains N was determined by preliminary benchmark tests
Table 6 Performance of the DDDM solver for the \(300\times 10\times 300\) model. The result is taken from the first row of Table 5

We show the analysis results of the \(300\times 10\times 300\), \(300\times 5\times 300\), \(300\times 2\times 300\), \(300\times 1\times 300\), \(300\times 0.5\times 300\), and \(300\times 0.2\times 300\) models in Fig. 14. The color represents the displacement norms, and we enlarged the deformations by a factor of 10 in all cases.

Fig. 14 Results of the thin model benchmark. Six cases from Table 5 are shown. We enlarged the deformations by a factor of 10 in all cases. The color contours show the displacement norms with the same range

7 Parallelization

As stated in Sects. 1 and 2.3, our solver uses two domain decompositions. One generates the coarse grid and corresponds to the structure of the DDDM; we use the other to parallelize the CG method in the DDDM.

7.1 Overview of the parallelization

We decompose the entire body \(\Omega \) into non-overlapping subdomains:

$$\begin{aligned} \Omega =\bigcup _{I=1}^M \Omega _I, \end{aligned}$$

as in Fig. 15, where M is the number of parallel processes and each \(\Omega _I\) corresponds to one parallel process. The domain decomposition here is “nodal-point-based,” whereas the domain decomposition used for generating the coarse grid is “element-based.” \(\Omega _{IJ}\) is the interface region consisting of the nodal points neighboring both \(\Omega _I\) and \(\Omega _J\); the points in the figure show the nodal points \(\varvec{\bar{x}}\). We fix I and J \((I\ne J)\); although both \(\Omega _I\) and \(\Omega _J\) have other neighboring subdomains, we omit them to simplify the explanation. We apply the CG method to the extended subdomains \(\Omega _I\cup \Omega _{IJ}\) and \(\Omega _J\cup \Omega _{IJ}\) in parallel. The CG iteration refers to the nodal values in \(\Omega _I\cup \Omega _{IJ}\) and in \(\Omega _J\cup \Omega _{IJ}\), updating them during the iteration through communication between the parallel processes I and J. During this process, the points in \(\Omega _{IJ}\) have different values in each of the parallel processes I and J. After a number of CG iterations, all nodal values, including those at the points in \(\Omega _{IJ}\), converge within a given residual error range.

Fig. 15 Example of domain decomposition for parallelization
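The interface update described above can be sketched with mpi4py as follows. The index arrays marking the points in \(\Omega _{IJ}\) and the accumulation rule (summing partial contributions, as in a distributed matrix–vector product) are illustrative assumptions, not FrontISTR's code.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD

def exchange_interface(v, neighbors, send_idx, recv_idx):
    """Exchange nodal values on the interface regions Omega_IJ.

    v: local nodal vector of process I (inner plus interface points);
    neighbors: ranks J of the neighboring subdomains;
    send_idx / recv_idx: dicts mapping J to index arrays into v.
    All names are illustrative.
    """
    send_buf = {J: v[send_idx[J]].copy() for J in neighbors}
    recv_buf = {J: np.empty_like(v[recv_idx[J]]) for J in neighbors}
    reqs = [comm.Isend(send_buf[J], dest=J) for J in neighbors]
    reqs += [comm.Irecv(recv_buf[J], source=J) for J in neighbors]
    MPI.Request.Waitall(reqs)
    for J in neighbors:
        v[recv_idx[J]] += recv_buf[J]   # accumulate shared contributions
    return v
```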

In the DDDM algorithm, we parallelize the coarse-grid generation process within the parallel CG method. As noted in Sect. 5.2, we use the contraction matrix \(\varvec{\bar{A}}_{\varvec{\bar{F}}}=\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}}\in \textbf{R}^{6N\times 6N}\) defined by Equation (49) on the whole space. The components of \(\varvec{\bar{A}}\) and \(\varvec{\bar{F}}\) are distributed among the parallel processes, and each process I calculates its contribution \((\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}})_I\) independently. We gather these contributions onto the parent MPI process, apply the LU decomposition to \(\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}}\), and distribute the factorization so that each process can apply \((\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}})^{-1}\) during the parallel CG process. Note that the parallel process I covers not only the inner points of \(\Omega _I\) but also the points of \(\Omega _{IJ}\) outside \(\Omega _I\).
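The gather-factorize-distribute flow for \(\varvec{\bar{F}}^T\varvec{\bar{A}}\varvec{\bar{F}}\) might be realized as in the following mpi4py sketch; the function name, the dense \(6N\times 6N\) format of the local contributions, and the use of rank 0 as the parent are our assumptions.

```python
from mpi4py import MPI
import scipy.linalg as la

comm = MPI.COMM_WORLD

def setup_coarse_factorization(A_F_local):
    """Assemble and factorize F^T A F in parallel (sketch).

    A_F_local: this process's dense 6N x 6N contribution
    (F^T A F)_I; the name is illustrative.
    """
    # Gather the per-process contributions on the parent process ...
    parts = comm.gather(A_F_local, root=0)
    # ... sum and factorize the small coarse matrix there (LU) ...
    lu_piv = la.lu_factor(sum(parts)) if comm.rank == 0 else None
    # ... and distribute the factorization to every process.
    return comm.bcast(lu_piv, root=0)
```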

7.2 Parallel performance

We present here a cantilever-type cuboid model under dead-weight conditions, the same analysis as in Sect. 6.3 but with a larger model of dimensions \(500\times 30\times 500\). We compared the result of the DDDM with that of the SSOR, choosing the configuration with reference to Sect. 6.3. The configuration \(500\times 30\times 500\) is almost the lower bound for convergence of the SSOR with respect to the thickness; i.e., the SSOR does not converge if the thickness is set to less than 30 in this configuration. The computer used was the Oakbridge-CX system [34] installed at the University of Tokyo.

The number of elements, DOFs, and constraints were 7,500,000, \(n=23,343,093\), and \(r=15,531\), respectively. We conducted the analyses using 1, 2, 4, 8, 16, 24, and 32 processes. The number of subdomains N for the DDDM (see Sect. 5.1) was set at 2500 for all process counts on the basis of preliminary benchmarks.

The performance results are given in Table 7. Although the superiority of the DDDM is clear, the parallel performance of the DDDM decreases as the number of processes increases, in contrast to the parallel performance of the SSOR. This tendency arises because a larger number of parallel processes increases the communication traffic.

Table 7 Parallel performance of the DDDM and comparison with the SSOR. The configuration \(500\times 30\times 500\), with thickness \(y=30\), is close to the lower limit for which the SSOR converges

8 Conclusions

Static structural problems discretized by the FEA without constraint conditions have singular stiffness matrices, whose kernels are spanned by six eigenvectors with a zero eigenvalue of multiplicity six. The static equations become solvable by imposing at least six forced displacement conditions (i.e., constraint conditions).

The starting point of the DDDM algorithm is that we apply the (parallel) CG method to the entire domain as the basic framework rather than using the direct method as in the DDM. We decompose the entire domain in two ways. One decomposition parallelizes the CG method, and the other generates the coarse space based on the rigid-body modes.

We discussed the distance between the eigenvectors and the kernel of the unconstrained stiffness matrix, whose basis represents the rigid-body motions. The small distance between this kernel and the eigenvectors with small eigenvalues enables the generation of the space of coarse motions together with the domain decomposition.

We constructed the solver algorithm to remove from the solution space the coarse space obtained from the coarse motions. The iteration is confined to the fine space by the deflation and the two-level algorithm, and the two-level method contributes to the high-speed performance.

We conducted benchmark tests of the elastic static analysis for thin plate models, and the results showed the high-speed performance and stability of the DDDM. We also presented a brief overview of the parallelization and its performance.