1 Introduction

In the nuclear engineering domain, the fundamental mode solution of the K-eigenvalue problem based on the steady-state multigroup neutron diffusion theory is crucial for simulation and analysis of nuclear reactors. The eigenvalue equation can be expressed as follows:

$$\begin{aligned} \begin{aligned} - \nabla \cdot (D_g \nabla \phi _g) + \Sigma ^R_g \phi _g - \sum _{g' \ne g} \Sigma _{g' \rightarrow g} \phi _{g'}\\ = \frac{1}{K} \chi _g \sum _{g'} \nu \Sigma ^\textrm{f}_{g'} \phi _{g'} ~~~~ g=1,2,...,G, \end{aligned} \end{aligned}$$
(1)

where \(\phi _g \equiv \phi _g(\textbf{r})\) denotes the neutron flux at spatial point \(\textbf{r}\) in the g-th energy group, and \(D_g,\Sigma ^R_g, \Sigma _{g' \rightarrow g}, \chi _g\) and \(\nu \Sigma ^\textrm{f}_{g'}\) denote spatially dependent (possibly discontinuous) parameters that reflect the material properties of the reactor core [1]. Nuclear engineers and analysts must numerically determine the fundamental mode eigenvalue (commonly called \(\textrm{K}_\textrm{eff}\)) and the corresponding eigenvector for a given geometry/material configuration. For numerical solution, Eq. (1) must be discretized and reduced to a set of G coupled algebraic equations, which can be expressed using matrices as follows:

$$\begin{aligned} \begin{aligned} M \Phi = \frac{1}{K} F \Phi . \end{aligned} \end{aligned}$$
(2)

This is often termed the generalized eigenvalue problem because coefficient matrices appear on both sides of the equation. Many mature numerical methods, such as the finite difference method [2], nodal collocation method [3], finite-element method [4,5,6], and nodal expansion method [7,8,9], have been proposed to solve neutron diffusion equations. Among these methods, the nodal expansion method is widely used because it is easier to implement and requires less computational effort than the others. The power iteration method is the most well-known numerical method for computing the principal K-eigenvalue [10]. A detailed review of conventional numerical methods for solving K-eigenvalue problems is available in recent nuclear reactor physics textbooks [1, 11, 12]. Although traditional, mature numerical methods are currently widely used in nuclear reactor physics to solve K-eigenvalue problems with acceptable engineering accuracy and computational cost, the use of state-of-the-art neural networks to solve K-eigenvalue problems is still in its infancy. However, with the progress of new algorithms and hardware, neural networks exhibit the potential to provide another option for solving nuclear engineering problems.
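For readers less familiar with the power iteration mentioned above, the following NumPy sketch illustrates how the generalized eigenvalue problem in Eq. (2) is typically iterated; the matrices, tolerance, and toy numbers are illustrative assumptions rather than data from this study.

```python
import numpy as np

def power_iteration(M, F, tol=1e-10, max_iter=10000):
    """Classic power (source) iteration for M @ phi = (1/k_eff) F @ phi (Eq. (2))."""
    phi = np.ones(M.shape[0])
    phi /= np.linalg.norm(phi)
    k = 1.0
    for _ in range(max_iter):
        raw = np.linalg.solve(M, F @ phi) / k          # M phi_new = (1/k) F phi
        k_new = k * (F @ raw).sum() / (F @ phi).sum()  # update the eigenvalue estimate
        phi_new = raw / np.linalg.norm(raw)
        if abs(k_new - k) < tol:
            return k_new, phi_new
        phi, k = phi_new, k_new
    return k, phi

# Toy 2x2 example with made-up numbers (illustration only).
M = np.array([[2.0, -0.5], [-0.5, 2.0]])
F = np.array([[1.2, 0.0], [0.0, 0.8]])
k_eff, phi = power_iteration(M, F)
print(k_eff)   # equals the largest eigenvalue of inv(M) @ F
```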

As mentioned above, many traditional numerical methods have been proposed to solve neutron diffusion equations. However, a dense mesh is required to ensure highly accurate results: an extremely dense mesh consumes considerable computational resources, whereas a coarse mesh yields inaccurate solutions. Moreover, for high-dimensional problems, classical methods are either less efficient or unsuccessful owing to the curse of dimensionality. A neural network can potentially mitigate these issues because it offers an expressive hypothesis space with relatively low statistical error [13].

Neural networks exhibit multiple potential benefits in solving nuclear engineering problems when compared with conventional numerical methods.

  • They provide mesh-free solutions to approximate physics fields in nuclear reactors, in which mesh generation is significantly complex due to high heterogeneity of geometry and material.

  • They provide a general framework to solve high-dimensional problems governed by parameterized PDEs, particularly the original neutron transport equation. This equation comprises seven variables: three spatial variables, two directional variables, one energy variable, and one temporal variable.

  • They are able to seamlessly incorporate prior data (potentially with noise) into existing algorithms given that data assimilation is necessary as a post-process for conventional numerical methods [14,15,16,17,18,19].

  • They provide a general framework to solve inverse problems with hidden physics, which are typically prohibitively expensive for conventional numerical methods and require different formulations and elaborate computer codes.

In recent years, neural networks have been widely used to solve Partial Differential Equations (PDEs) [20] and achieved remarkable success.

Based on neural networks, a large number of methods have been proposed to solve PDEs, such as the deep backward stochastic differential equation (BSDE) method [21, 22], deep Galerkin method (DGM) [23], deep Ritz method (DRM) [24], and Physics-Informed Neural Network (PINN) [25]. The deep BSDE method reformulates PDEs using backward stochastic differential equations and approximates the gradient of the unknown solution using neural networks. Although DGM and PINN appear independently under two names in the literature, they are similar: both train neural networks by minimizing the mean squared error loss of the residual equation. In contrast, DRM reformulates the original problem into a variational problem and trains neural networks by minimizing the energy functional of the variational problem. DRM and PINN [25] in particular have attracted widespread attention, and extensive studies have applied them to a variety of problems, including fiber optics [26], hyperelasticity [27], solid mechanics [28], heat transfer problems [29,30,31], inverse problems [32,33,34,35], and uncertainty quantification [36,37,38,39]. However, only a few studies have focused on solving eigenvalue problems [24, 40,41,42,43,44,45].

The use of neural networks to solve eigenvalue problems can be traced back to 2018 [24]. A deep Ritz method to solve variational problems was proposed, and several examples elucidated how to use DRM to solve eigenvalue problems [24]. First, the original eigenvalue problem was transformed into a variational problem. Then, a specially defined loss function, termed the Rayleigh quotient, was constructed using the variational principle [46]. The Rayleigh quotient is a well-known approximation of the eigenvalue of a matrix \(\varvec{A}\), defined by \(R=\frac{\varvec{x}^{\textrm{T}} \varvec{Ax}}{\varvec{x}^{\textrm{T}} \varvec{x}}\). Finally, they minimized the loss function and obtained the smallest eigenvalue. Similarly, some studies [41, 42] directly used PINN to solve eigenvalue problems. In contrast to DRM, which transforms the original problem into a variational problem, PINN solves eigenvalue problems without variation: neural networks represent the eigenfunction, automatic differentiation (AD) [47] is used to apply the differential operators to the network output, and the loss function is built from the Rayleigh quotient. The smallest eigenvalue and corresponding eigenvector are then solved for using optimization tools. Moreover, the deep forward-backward stochastic differential equation (FBSDE) method [48], an extension of the deep BSDE method, was proposed to solve eigenvalue problems; in this method, the eigenvalue problem is reformulated as a fixed-point problem of the semigroup flow induced by the operator. Other studies [43, 44] proposed an alternative method to learn the eigenvalue problem by adding one or two regularization terms to the loss function. More recently, in Ref. [49], a neural network framework based on the power method [50] was presented to solve eigenvalue problems, including the smallest eigenvalue problem, where the eigenfunction is expressed by the neural network and solved iteratively following the idea of the power method or inverse power method. However, the scope was limited to linear operators and certain special eigenvalue problems.
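As a small numerical illustration of the Rayleigh quotient used as a loss term in these works, the snippet below evaluates R for an arbitrary symmetric matrix; the matrix and test vectors are illustrative only.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])          # arbitrary symmetric example matrix

def rayleigh_quotient(A, x):
    return (x @ A @ x) / (x @ x)

eigvals, eigvecs = np.linalg.eigh(A)
x_min = eigvecs[:, 0]                      # eigenvector of the smallest eigenvalue
print(rayleigh_quotient(A, x_min))         # equals eigvals[0] = 2 - sqrt(2)
print(rayleigh_quotient(A, np.ones(3)))    # any other x gives a value >= eigvals[0]
```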

In a recent study, PINN was applied to solve neutron diffusion equations [45, 51], where the authors used a free learnable parameter to approximate the eigenvalue and a novel regularization technique to exclude null solutions from the PINN framework. A conservative physics-informed neural network (cPINN) was proposed in discrete domains for nonlinear conservation laws [52]. Moreover, cPINN [53] was applied to solve heterogeneous neutron diffusion problems in one-dimensional cases; it develops a PINN for each subdomain and considers additional conservation laws along the interfaces of the subdomains (a general consideration in reactor physics [11]), with the eigenvalue involved in the neural network training as a variable to be optimized.

More recently, a data-enabled physics-informed neural network (DEPINN) [40] was proposed to solve neutron diffusion eigenvalue problems. To achieve acceptable engineering accuracy for complex engineering problems, it is suggested that a very small amount of prior data from physical experiments be used to improve the training accuracy and efficiency. In contrast to PINN, which solves the neutron diffusion eigenvalue problem directly, an autoencoder-based machine learning method in combination with the reduced-order method [54, 55] was proposed [56] to solve the same problem. However, it still relies on solving governing equations with traditional numerical methods such as the finite difference method.

Although DRM provides a way to solve eigenvalue problems with neural networks, as shown in Sect. 4.2.2, our results indicate that DRM is not stable when solving two-dimensional cases. First, DRM learns the eigenvalue and eigenfunction at the early stage of the training process. Subsequently, DRM attempts to learn a smaller eigenvalue after it is close to the true eigenvalue. Finally, DRM ends up with a smaller eigenvalue that may still be close to the true eigenvalue, but with an incorrect eigenfunction that is meaningless and far from the true eigenfunction. Additionally, the neural network framework based on the power method [50] is unsuitable for generalized eigenvalue problems. Therefore, it is necessary to propose a new algorithm to solve K-eigenvalue problems.

The study focuses on eigenvalue problems that are also interface problems, in which the eigenfunctions are continuous at the interface but their derivatives are not. Specifically, in the nuclear reactor physics domain, this is a general situation because the reactor core is composed of fuel assemblies with different fissile nuclide enrichments [1]. Several studies have focused on the use of neural networks to solve elliptic interface problems. Some researchers [57] used the idea of DRM and formulated the PDEs as variational problems, which can be solved using a deep learning approach. A mesh-free numerical method for solving elliptic interface problems based on deep learning was presented in [58], where different neural networks were employed in different subdomains and the problem was reformulated as a least-squares problem. Similarly, the authors of [53] enforced the interface conditions using a piecewise neural network. In contrast to these methods, a discontinuity-capturing shallow neural network (DCSNN) [59] has been proposed for elliptic interface problems. The crucial concept of DCSNN is that a d-dimensional piecewise continuous function can be extended to a continuous function defined in \((d+1)\)-dimensional space, where the augmented coordinate variable labels the pieces of each subdomain. To the best of the authors' knowledge, however, only a few studies have focused on using neural networks to solve eigenvalue problems that incorporate interface problems involving regions of different materials. Challenges exist on at least three fronts.

  • Designing a neural network that is more suitable for K-eigenvalue problems for more complicated/medium-size test problems.

  • Dealing with the interface problem in a more general and understandable manner when designing the neural network.

  • Proposing a framework that effectively enhances the robustness of the neural network and improves the efficiency of utilizing noisy prior data.

To address the aforementioned challenges and advance beyond the state-of-the-art research [45, 51, 53], we first introduced the study [40], which served as a preliminary demonstration of the applicability of the PINN approach to reactor physics eigenvalue problems in complex engineering scenarios. The contributions of this study are as follows.

  • Firstly, we extend the Inverse Power Method Neural Network (IPMNN) [49] to the so-called Generalized Inverse Power Method Neural Network (GIPMNN) to solve for the smallest eigenvalue and the related eigenfunction of generalized K-eigenvalue problems. Compared with DEPINN from our previous study [40], we omit the prior data in the training process and attempt to solve the K-eigenvalue problems from a data-free mathematical/numerical perspective.

  • Then, we propose a Physics Constrained GIPMNN (PC-GIPMNN) to address the interface problem in a more general and understandable manner with respect to previous studies [45, 51, 53].

  • Finally, we conduct a thorough comparative study of GIPMNN, PC-GIPMNN, and DRM using a variety of numerical tests. We evaluate the applicability and accuracy of these three methods using typical 1D and 2D test cases in reactor physics, particularly accounting for material discontinuities in different geometries. In the 1D example, we determine the optimal ratio of outer to inner iterations, a finding that may be particularly relevant for GIPMNN. Additionally, we observe the failure of DRM in the 2D experiments, whereas PC-GIPMNN consistently outperforms GIPMNN and DRM over a set number of epochs.

The rest of this paper is organized as follows. The governing equations for the eigenvalue problems are presented in Sect. 2. In Sect. 3, we propose GIPMNN and PC-GIPMNN and introduce DRM in our cases. In Sect. 4, the results of 1D and 2D test cases are listed to verify the three methods. Finally, conclusions and future research are discussed in Sect. 5.

2 K-eigenvalue problems

This section introduces the equations that govern the neutron criticality over a spatial domain. We recall Eqs. (1) and (2). The generalized K-eigenvalue problem can be formulated as follows:

$$\begin{aligned} \left\{ \begin{aligned}&\mathcal {L} \phi =\lambda \mathcal {Q} \phi ,{} & {} \text {in} \hspace{5.0pt}\Omega , \\&\mathcal {B} \phi = g,{} & {} \text {on} \hspace{5.0pt}\partial \Omega , \end{aligned} \right. \end{aligned}$$
(3)

where \(\Omega \subset \mathbb {R}^\textrm{d}\) denotes the domain, and \(\mathcal {L}, \mathcal {Q}\), and \(\mathcal {B}\) denote the differential operators acting on functions defined in the interior of \(\Omega \) and on its boundary \(\partial \Omega \). Furthermore, \(\phi \) denotes the eigenfunction of the system and \(\lambda \) denotes the associated eigenvalue. In this preliminary study, inspired by the notable work in [56], we utilize the one-group steady-state diffusion equation for criticality, framing it as a generalized eigenvalue problem. It is expressed as:

$$\begin{aligned} \begin{aligned} - \nabla \cdot (D \nabla \phi ) + \Sigma ^\text {a} \phi = \lambda v\Sigma ^\text {f} \phi , \end{aligned} \end{aligned}$$
(4)

where eigenfunction \(\phi \) denotes the neutron flux, which is a scalar quantity used in nuclear reactor physics. It corresponds to the total length travelled by all free neutrons per unit time and volume. \(\Sigma ^\text {a}\), \(\Sigma ^\text {f}\), and v denote the absorption cross section, fission cross section, and average number of neutrons produced per fission event, respectively. We follow [56] and use the diffusion coefficient approximation

$$\begin{aligned} \begin{aligned} D=\frac{1}{3(\Sigma ^\text {a}+\Sigma ^\text {s})}. \end{aligned} \end{aligned}$$
(5)

Here, \(\Sigma ^\textrm{s}\) denotes the scattering cross section, i.e., the cross section for a neutron to scatter into a different direction. The eigenvalue \(\lambda \) multiplies the neutron source term in Eq. (4); its value balances the terms that produce neutrons against those that account for the losses. It is defined as the reciprocal of \(k_\text {eff}\), i.e., \(\lambda =\frac{1}{k_\text {eff}}\), where

$$\begin{aligned} \begin{aligned} k_\text {eff}=\frac{\text{ number } \text{ of } \text{ neutrons } \text{ in } \text{ one } \text{ generation }}{\text{ number } \text{ of } \text{ neutrons } \text{ in } \text{ the } \text{ preceding } \text{ generation }}. \end{aligned} \end{aligned}$$
(6)

Two main boundary conditions are imposed on the diffusion equation. One represents a surface on which neutrons are reflected back into the domain (reflective condition), and the other represents surfaces that allow neutrons to escape from the system (vacuum or bare condition). Both conditions are satisfied by relating the flux solution to its gradient on the boundary, i.e.,

$$\begin{aligned} \begin{aligned} -\frac{1}{2}D \nabla \phi \cdot \varvec{n}= \left\{ \begin{aligned}&\frac{1}{4}\phi{} & {} \text{ bare } \ \text{ surface }, \\&0{} & {} \text{ reflective } \ \text{ surface }, \end{aligned} \right. \end{aligned} \end{aligned}$$
(7)

where \(\varvec{n}\) denotes the outward-pointing normal to the surface.

2.1 PINN as an eigenvalue solver

In this subsection, we discuss using PINN to solve the generalized eigenvalue problem (Eq. (3)).

The eigenfunction of the operator \(\mathcal {L}\) is approximated by \(\mathcal {N}^{\theta }\), i.e., \(\phi (\varvec{x})=\mathcal {N}^{\theta }(\varvec{x})\). Then, \(\mathcal {L}\phi \) and \(\mathcal {B}\phi \) can be computed using AD. For the boundary conditions, a penalty term (Eq. (8)) is added to the loss function in PINN, which penalizes the discrepancy between the approximated value on the boundary and the exact boundary condition, where \(N_\textrm{b}\) denotes the number of sampling points on the boundary \(\partial \Omega \) and \(\varvec{x}_i\) is one point in the sampling set \(\{\varvec{x}_i\}_{i=1}^{N_\textrm{b}}\).

$$\begin{aligned}{} & {} \begin{aligned} {{Loss}}_\textrm{b} = \sum _{{i}=1}^{N_\textrm{b}} |\mathcal {B} \Phi (\varvec{x}_{i}) - g (\varvec{x}_{i})|^2. \end{aligned} \end{aligned}$$
(8)
$$\begin{aligned}{} & {} \begin{aligned} \lambda&= \frac{\langle \mathcal {L}\phi , \phi \rangle }{\langle \phi , \phi \rangle }.\\ \end{aligned} \end{aligned}$$
(9)

As mentioned previously, we are concerned with the smallest eigenvalue and associated eigenfunction. Hence, the eigenvalue (Rayleigh quotient) is viewed as a loss term in PINN to attain the lowest eigenvalue.

Finally, the total loss function (Eq. (10)) in PINN corresponds to the weighted sum of the two objectives (Eqs. (9) and (8)), where \(\alpha \) and \(\beta \) denote the weights.

$$\begin{aligned} \begin{aligned} {{Loss}}_\text {total}= \alpha \lambda ^2 + \beta {{Loss}}_\textrm{b}. \end{aligned} \end{aligned}$$
(10)

Unfortunately, PINN does not work for the cases considered in this study and even performs worse than DRM. Therefore, we only compare the results of our methods with those of DRM.

Remark 1

It should be noted that the square of the eigenvalue is used in the loss function of PINN. The smallest eigenvalue here is the one with the smallest absolute value; without the square term, minimizing the eigenvalue directly would drive PINN toward negative infinity.

3 Methodologies

In this section, we extend our previous study [49] and discuss the use of a neural network to numerically solve the smallest eigenvalue problem. The main concept is to use a neural network to approximate the eigenfunction and to compute the eigenvalue as a Rayleigh quotient evaluated on the vector formed by the values of the eigenfunction at the sampled points.

3.1 Neural network architecture

Next, we discuss the neural network structure used to approximate the eigenfunction \(\phi (\varvec{x})\). The neural network architecture employed in this study is the same as that of ResNet [60], which is built by stacking several residual modules. It is one of the most popular models in deep learning and is also commonly used for solving PDEs with neural networks [24, 61, 62]. Each module has one skip connection, and each block consists of two fully connected layers, as shown in Fig. 1.

In the network, let \(\varvec{x},\varvec{x}_{k}\in \mathbb {R}^d\) be the inputs and let \(\varvec{W}_l\) and \(\varvec{b}_l\), \(l=1, 2, 3, 4, 5\), be the parameters in the fully connected layers. We use \(\varvec{W}_{\textrm{r}k}\) and \(\varvec{b}_{\textrm{r}k}\), \(k=1, 2\), to denote the parameters in the residual connections. The outputs \(\varvec{s}_1\) and \(\varvec{s}_2\) of the two modules can be expressed as follows:

$$\begin{aligned}{} & {} \varvec{s}_1 = \sigma (\varvec{W}_2 (\sigma (\varvec{W}_1 \varvec{x} + \varvec{b}_1)) + \varvec{b}_2) + \varvec{r}_1(\varvec{x}), \end{aligned}$$
(11)
$$\begin{aligned}{} & {} \varvec{s}_2 = \sigma (\varvec{W}_4 (\sigma (\varvec{W}_3 \varvec{s}_1 + \varvec{b}_3)) + \varvec{b}_4) + \varvec{r}_2(\varvec{s}_1), \end{aligned}$$
(12)

where \(\sigma \) denotes the activation function, chosen as tanh. Furthermore, \(\varvec{r}_{k}\), \(k=1, 2\), are the functions in the residual connections, which can be represented as follows:

$$\begin{aligned} \begin{aligned} \varvec{r}_{k}(\varvec{x}_{k}) = \sigma (\varvec{W}_{\textrm{r}k} \varvec{x}_{k} + \varvec{b}_{\textrm{r}k}). \end{aligned} \end{aligned}$$
(13)

Subsequently, the neural network \(\mathcal {N}^{\theta }(\varvec{x})\) is expressed as

$$\begin{aligned} \begin{aligned} \mathcal {N}^{\theta }(\varvec{x}) = \varvec{W}_5 \varvec{s}_2(\varvec{s}_1(\varvec{x})) + \varvec{b}_5. \end{aligned} \end{aligned}$$
(14)

Therefore, the eigenfunction is \(\phi (\varvec{x})=\mathcal {N}^{\theta }(\varvec{x})\), where \(\theta \) denotes the neural network parameters.
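A minimal PyTorch sketch of this architecture (Eqs. (11)-(14)) is given below; the layer widths, input dimension, and the final squaring (see Remark 2 below) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ResidualModule(nn.Module):
    """One module of Fig. 1: two fully connected layers plus a skip connection (Eqs. (11)-(13))."""
    def __init__(self, dim_in, dim_hidden):
        super().__init__()
        self.fc1 = nn.Linear(dim_in, dim_hidden)      # W_1, b_1 (or W_3, b_3)
        self.fc2 = nn.Linear(dim_hidden, dim_hidden)  # W_2, b_2 (or W_4, b_4)
        self.skip = nn.Linear(dim_in, dim_hidden)     # W_rk, b_rk of Eq. (13)

    def forward(self, x):
        return torch.tanh(self.fc2(torch.tanh(self.fc1(x)))) + torch.tanh(self.skip(x))

class EigenNet(nn.Module):
    """N^theta(x) = W_5 s_2(s_1(x)) + b_5, Eq. (14)."""
    def __init__(self, dim_in=1, dim_hidden=20, dim_out=1):
        super().__init__()
        self.block1 = ResidualModule(dim_in, dim_hidden)
        self.block2 = ResidualModule(dim_hidden, dim_hidden)
        self.out = nn.Linear(dim_hidden, dim_out)     # W_5, b_5

    def forward(self, x):
        return self.out(self.block2(self.block1(x)))

net = EigenNet()
phi = net(torch.rand(64, 1)) ** 2    # squaring keeps phi(x) >= 0, cf. Remark 2
```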

Remark 2

Given that \(\phi (\varvec{x})\) is a scalar measure of the neutron population and denotes the density of the free-moving neutron distribution over the spatial domain, we have \(\phi (\varvec{x}) \ge 0\). Therefore, the eigenfunction can be expressed as \(\phi (\varvec{x})=(\mathcal {N}^{\theta }(\varvec{x}))^2\).

Fig. 1

Neural network architecture consists of two modules and one linear output layer. Each module contains two fully-connected layers and a skip connection

3.2 Recap of inverse power method neural network

We consider the following linear eigenvalue problem:

$$\begin{aligned} \mathcal {L} \phi = \lambda \phi , \ \text{ in } \hspace{5.0pt}\Omega , \end{aligned}$$
(15)

which differs from the generalized eigenvalue problem \(\mathcal {L} \phi = \lambda \mathcal {Q} \phi \).

Inspired by the idea of the inverse power method, IPMNN [49] was proposed to solve for the smallest eigenvalue and the associated eigenfunction. Equation (16) depicts the key step of the inverse power method, and Eq. (17) is its analog in IPMNN, where \({\varvec{A}}\) denotes a matrix, and \(\lambda _{k-1}\) and \(\varvec{\phi }_{k-1}\) denote the results of the previous iteration. Therefore, \(\lambda _k\) and \(\varvec{\phi }_k\) are obtained using Eq. (16).

$$\begin{aligned}{} & {} \left\{ \begin{aligned} \varvec{p}_{k}&= \varvec{A}^{-1}\varvec{\phi }_{{k}-1}, \\ \varvec{\phi }_{k}&= \frac{\varvec{p}_{k}}{\Vert \varvec{p}_{k} \Vert }, \\ \lambda _k&= \frac{\langle \varvec{A}\varvec{\phi }_{{k}}, \varvec{\phi }_{k}\rangle }{\langle \varvec{\phi }_{k}, \varvec{\phi }_{k}\rangle }. \end{aligned} \right. \end{aligned}$$
(16)
$$\begin{aligned}{} & {} \begin{aligned} \frac{\mathcal {L}\Phi _{k}}{\Vert \mathcal {L}\Phi _{k} \Vert } = \Phi _{k-1}. \\ \end{aligned} \end{aligned}$$
(17)

Here, the neural network \(\mathcal {N}^{\theta }\) represents the approximated eigenfunction \(\Phi _{k}\) at the kth iteration of Eq. (17). Given \(\Phi _{k-1}\), obtained from the previous iterative step, the main idea of the inverse power method would be to solve for \(\Phi _{k}\) via \(\mathcal {P}_k=\mathcal {L}^{-1}\Phi _{k-1}\) and \(\Phi _{k} = \mathcal {P}_{k} / \Vert \mathcal {P}_{k} \Vert \). However, it is difficult to obtain the inverse operator \(\mathcal {L}^{-1}\) of the differential operator \(\mathcal {L}\). Therefore, \(\Phi _{k}\) is obtained without knowing the inverse operator: the term \(\mathcal {L}\Phi _k\) is computed using AD in Eq. (17), and the eigenfunction is approximated iteratively by minimizing the loss (18), which drives the network toward satisfying Eq. (17). Here, \(\varvec{x}_i \in S\) is a point in the dataset S, and N denotes the number of points in S. The eigenvalue in the kth iteration is obtained using Eq. (19).

$$\begin{aligned}{} & {} \begin{aligned} {{Loss}}_\text {ipmnn}(\theta ) = \sum _{i=1}^{N}\left( \frac{\mathcal {L}\Phi _{k}({\varvec{x}}_i)}{\Vert \mathcal {L}\Phi _{k} \Vert }-\Phi _{k-1}({\varvec{x}}_i)\right) ^2. \end{aligned} \end{aligned}$$
(18)
$$\begin{aligned}{} & {} \begin{aligned} \lambda _k&= \frac{\langle \mathcal {L}\Phi _{k}, \Phi _{k}\rangle }{\langle \Phi _{k}, \Phi _{k}\rangle }.\\ \end{aligned} \end{aligned}$$
(19)
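A sketch of one IPMNN training step implementing the loss (18) and the Rayleigh quotient (19) is given below; it assumes a 1D toy operator \(\mathcal {L}\phi = -\phi ''\), a network such as that of Fig. 1, collocation points x with requires_grad enabled, and phi_prev holding the normalized \(\Phi _{k-1}\) from the previous iteration (detached from the graph).

```python
import torch

def ipmnn_step(net, opt, x, phi_prev):
    phi = net(x)
    dphi = torch.autograd.grad(phi.sum(), x, create_graph=True)[0]
    L_phi = -torch.autograd.grad(dphi.sum(), x, create_graph=True)[0]  # L Phi_k via AD
    loss = ((L_phi / torch.norm(L_phi) - phi_prev) ** 2).sum()         # Eq. (18)
    opt.zero_grad()
    loss.backward()
    opt.step()
    lam = (L_phi * phi).sum() / (phi * phi).sum()                      # Eq. (19)
    phi_next = (phi / torch.norm(phi)).detach()   # becomes Phi_{k-1} of the next iteration
    return float(lam), phi_next
```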

3.3 Generalized inverse power method neural network

In standard nuclear engineering procedures, the inverse power method is normally employed to solve for the smallest eigenvalue and the associated eigenvector, which relies on the discretization of Eq. (3). IPMNN [49] was proposed to solve the smallest eigenvalue problem in a mesh-free manner realized by a neural network. However, that method is restricted to equations of the simple form \(\mathcal {L} \phi = \lambda \phi \). Therefore, we propose GIPMNN to solve Eq. (3). Details of the GIPMNN algorithm are presented in Algorithm 1.

First, recall how the generalized inverse power method solves Eq. (20). The key step is given in Eq. (21), where \({\varvec{A}}\) and \({\varvec{B}}\) are two matrices, and \(\lambda _{k-1}\) and \(\varvec{\phi }_{k-1}\) are the results of the previous iteration. Therefore, \(\lambda _k\) and \(\varvec{\phi }_k\) are obtained using Eq. (21).

$$\begin{aligned}{} & {} \begin{aligned} \varvec{A} \varvec{\phi } = \lambda \varvec{B} \varvec{\phi }. \end{aligned} \end{aligned}$$
(20)
$$\begin{aligned}{} & {} \left\{ \begin{aligned}&\varvec{A} \varvec{\phi }_k = \lambda _{k-1} \varvec{B} \varvec{\phi }_{k-1}, \\&\lambda _k = \frac{\langle \varvec{A}\varvec{\phi }_{k}, \varvec{\phi }_{k}\rangle }{\langle \varvec{B}\varvec{\phi }_{k}, \varvec{\phi }_{k}\rangle }. \end{aligned} \right. \end{aligned}$$
(21)

We use a neural network \(\mathcal {N}^{\theta }\) to represent the approximated eigenfunction \(\Phi \).

$$\begin{aligned} \left\{ \begin{aligned}&\mathcal {L} \Phi _{k} = \lambda _{k-1} \mathcal {Q} \Phi _{k-1}, \\&\lambda _k = \frac{\langle \mathcal {L}\Phi _{k}, \Phi _{k}\rangle }{\langle \mathcal {Q}\Phi _{k}, \Phi _{k}\rangle }. \end{aligned} \right. \end{aligned}$$
(22)

In GIPMNN, Eq. (22) is an analog of Eq. (21), where \(\mathcal {L}\) and \(\mathcal {Q}\) denote linear differential operators implemented by AD as opposed to specially discretized matrices. In a manner similar to the generalized inverse power method, we record the result \(\lambda _{k-1}\) of the previous iteration. In contrast to the generalized inverse power method, instead of recording \(\varvec{\phi }_{k-1}\), we record \(\mathcal {Q}\Phi _{k-1}\). It should be noted that \(\Phi _{k-1}\) denotes the eigenfunction represented by the neural network in the \((k-1)\)th iteration, and \(\mathcal {Q}\Phi _{k-1}\) is realized by AD. In the kth iteration, we directly compute \(\Phi _{k}\) using the neural network, i.e., \(\Phi _k = \mathcal {N}^{\theta }\), and calculate \(\mathcal {L}\Phi _{k}\) using AD. Because we obtain \(\Phi _k\) directly through the neural network instead of solving the equation \(\mathcal {L} \Phi _{k} = \lambda _{k-1} \mathcal {Q} \Phi _{k-1}\), we define the loss function \({{Loss}}_\text {gipmnn}\) in Eq. (23) to propel the neural network to learn \(\Phi _k\). When the neural network converges, we obtain the smallest eigenvalue, and the associated eigenfunction is expressed by the neural network.

$$\begin{aligned} \begin{aligned} {{Loss}}_\text {gipmnn} = \sum _{i=1}^N |\mathcal {L} \Phi _k({\varvec{x}}_i) - \lambda _{k-1} \mathcal {Q} \Phi _{k-1}({\varvec{x}}_i) |^2. \end{aligned} \end{aligned}$$
(23)
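The following sketch shows one GIPMNN update for a one-group diffusion problem, i.e., \(\mathcal {L}\phi = -\nabla \cdot (D\nabla \phi ) + \Sigma ^\textrm{a}\phi \) and \(\mathcal {Q}\phi = v\Sigma ^\textrm{f}\phi \) in 1D with constant coefficients; the boundary and interface losses are omitted for brevity, and the first call requires an arbitrary \(\mathcal {Q}\Phi _0\) (cf. Remark 3 below). The coefficient handling and bookkeeping are illustrative assumptions.

```python
import torch

def gipmnn_step(net, opt, x, lam_prev, Q_phi_prev, D, Sigma_a, nu_Sigma_f):
    phi = net(x)
    dphi = torch.autograd.grad(phi.sum(), x, create_graph=True)[0]
    d2 = torch.autograd.grad((D * dphi).sum(), x, create_graph=True)[0]
    L_phi = -d2 + Sigma_a * phi                          # L Phi_k via AD
    Q_phi = nu_Sigma_f * phi                             # Q Phi_k
    loss = ((L_phi - lam_prev * Q_phi_prev) ** 2).sum()  # Eq. (23)
    # the boundary loss (Eq. (8)) would be added here in the full algorithm
    opt.zero_grad()
    loss.backward()
    opt.step()
    lam = (L_phi * phi).sum() / (Q_phi * phi).sum()      # Eq. (22)
    return float(lam), Q_phi.detach()                    # lam_k and Q Phi_k for the next step
```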
Algorithm 1

GIPMNN for Finding the Smallest Eigenvalue

Remark 3

In the GIPMNN algorithm, the initial function \(\Phi _0\) is not represented by the neural network; thus, it is not possible to obtain \(\mathcal {Q}\Phi _0\) using AD. Therefore, we choose an arbitrary function as \(\mathcal {Q}\Phi _0\).

Neumann and Robin boundary conditions are used for this eigenvalue problem. It is difficult to enforce them by encoding the boundary conditions into the neural network, as done in [49, 63, 64] for Dirichlet and periodic boundary conditions.

We form the loss function \({{Loss}}_\textrm{b}\) in Eq. (8), where \(N_\textrm{b}\) denotes the number of sampling points on \(\partial \Omega \), and \(\varvec{x}_i\) is a point in the sampling set \(\{\varvec{x}_i\}_{i=1}^{N_\textrm{b}}\).

Fig. 2

Illustration of PC-GIPMNN architecture diagram. There are multiple neurons in the output layer, which denote the eigenfunctions in different subdomains

Therefore, the boundary loss functions are defined in Eqs. (24) and (25), according to whether the surface reflects neutrons back into the domain or allows neutrons to escape from the system:

[Eqs. (24) and (25): penalty terms enforcing the reflective and bare surface conditions of Eq. (7), respectively]

The total loss function (Eq. (26)) is the weighted sum of the objectives in Eqs. (23) and (8). For the full procedure of GIPMNN, refer to Algorithm 1.

$$\begin{aligned} \begin{aligned} {{Loss}}_\text {total}= \alpha {{Loss}}_\text {gipmnn} + \beta {{Loss}}_\textrm{b}. \end{aligned} \end{aligned}$$
(26)

Remark 4

In Eq. (26), \(\alpha \) and \(\beta \) denote the weights of the two losses. It is noted that \(\beta \ge \alpha \) when training the neural network, particularly for GIPMNN. With this setting, it is easier for the neural network to determine the eigenvalue and eigenfunction.

3.4 Physics constrained generalized inverse power method neural network

Although GIPMNN can solve the eigenvalue problems of Eq. (3), it remains difficult to solve eigenvalue problems with coefficients that are discontinuous across regions. These are interface problems: the eigenfunction is continuous at the interface, but its derivatives may not be. The enforcement of interface conditions is therefore very important for GIPMNN.

In the study, inspired by the idea of a piecewise deep neural network [58], we propose a PC-GIPMNN to solve eigenvalue problems with interface conditions. However, instead of employing different neural networks in different subdomains, we use only one neural network and multiple neurons in the output layer, as shown in Fig. 2. It should be noted that each neuron in the output layer corresponds to a subdomain. Therefore, we can obtain outputs in different subdomains that can be used to enforce the conditions at the interface.

Suppose that there are two domains \(\Omega _l\) and \(\Omega _r\) with an interface \(\Gamma \), the common boundary between the two domains. Given the properties of the neutron population \(\phi (\varvec{x})\), the eigenfunction satisfies the two interface conditions in Eqs. (27) and (28), where \(\phi _l\) and \(\phi _r\) represent the eigenfunctions defined in \(\Omega _l\) and \(\Omega _r\), respectively, \(\varvec{n}\) denotes the normal vector pointing from \(\Omega _r\) to \(\Omega _l\), and \(D_l\) and \(D_r\) are the coefficients defined in \(\Omega _l\) and \(\Omega _r\), respectively, which are discontinuous at the interface. Equation (27) indicates that the eigenfunction, i.e., the neutron flux, is continuous at the interface. Equation (28) indicates that the neutron current is continuous at the interface.

$$\begin{aligned} \phi _l = \phi _r \quad \text{ on } \hspace{5.0pt}\Gamma , \end{aligned}$$
(27)
$$\begin{aligned} -D_l \nabla \phi _l \cdot \varvec{n} = - D_r \nabla \phi _r \cdot \varvec{n} \quad \text{ on } \hspace{5.0pt}\Gamma . \end{aligned}$$
(28)

Assume that \(S_{\Gamma }\) is the set of points at the interface \(\Gamma \) and \(|S_{\Gamma } |\) denotes the number of points in \(S_{\Gamma }\). We then introduce two penalty terms, Eqs. (29) and (30), to enforce the two interface conditions, where \(\varvec{x_i} \in S_{\Gamma }\).

$$\begin{aligned} {{Loss}}_{i1}= & {} \sum _{i=1}^{|S_{\Gamma } |} |\Phi _l(\varvec{x_i}) - \Phi _r(\varvec{x_i}) |^2, \end{aligned}$$
(29)
$$\begin{aligned} {{Loss}}_{i2}= & {} \sum _{i=1}^{|S_{\Gamma } |} |(-D_l \nabla \Phi _l(\varvec{x_i}) \cdot \varvec{n})- (- D_r \nabla \Phi _r(\varvec{x_i}) \cdot \varvec{n}) |^2. \nonumber \\ \end{aligned}$$
(30)

Combining Eqs. (26), (29), and (30) yields Eq. (31), the total loss defined in our method PC-GIPMNN, where \(\alpha \), \(\beta \), \(\gamma \), and \(\delta \) are the weights of the different losses. In this study, we focus on the proposed algorithms and neglect the influence of the weights, expecting the algorithms to be universal and to achieve good results even without tuning the weights; therefore, all weights are set to 1 in the subsequent experiments.

$$\begin{aligned} \begin{aligned} {{Loss}}_\textrm{total}= \alpha {{Loss}}_\text {gipmnn} + \beta {{Loss}}_\textrm{b} + \gamma {{Loss}}_{i1} + \delta {{Loss}}_{i2}. \end{aligned} \end{aligned}$$
(31)

Moreover, the eigenfunction can be represented as:

$$\begin{aligned} \begin{aligned} \phi = l_l \phi _l + l_r \phi _r, \end{aligned} \end{aligned}$$
(32)

where \(l_l\) and \(l_r\) denote indicator functions: \(l_l = 1\) in \(\Omega _l\) and \(l_l = 0\) in \(\Omega _r\), whereas \(l_r = 1\) in \(\Omega _r\) and \(l_r = 0\) in \(\Omega _l\).
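A sketch of how a single network with one output neuron per subdomain can realize Eqs. (29), (30), and (32) for two subdomains is given below; the interface normal, diffusion coefficients, and tensor shapes are illustrative assumptions, and the full PC-GIPMNN loss additionally includes the residual and boundary terms of Eq. (31).

```python
import torch

def interface_losses(net, x_gamma, D_l, D_r, n):
    """Eqs. (29)-(30) for points x_gamma on the interface (requires_grad enabled);
    the network output has two columns, [phi_l, phi_r]."""
    out = net(x_gamma)
    phi_l, phi_r = out[:, 0:1], out[:, 1:2]
    loss_i1 = ((phi_l - phi_r) ** 2).sum()                        # flux continuity
    grad_l = torch.autograd.grad(phi_l.sum(), x_gamma, create_graph=True)[0]
    grad_r = torch.autograd.grad(phi_r.sum(), x_gamma, create_graph=True)[0]
    cur_l = -D_l * (grad_l * n).sum(dim=1, keepdim=True)
    cur_r = -D_r * (grad_r * n).sum(dim=1, keepdim=True)
    loss_i2 = ((cur_l - cur_r) ** 2).sum()                        # current continuity
    return loss_i1, loss_i2

def assemble_phi(net, x, in_left):
    """Eq. (32): phi = l_l * phi_l + l_r * phi_r with indicator functions."""
    out = net(x)
    l_l = in_left.float().reshape(-1, 1)      # 1 inside Omega_l, 0 inside Omega_r
    return l_l * out[:, 0:1] + (1.0 - l_l) * out[:, 1:2]
```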

Remark 5

Conservative PINN (cPINN) [53] develops a PINN for each subdomain and considers additional conservation laws along the interfaces of the subdomains (a general consideration in reactor physics [11]). However, in that work the eigenvalue is involved in the neural network training as a variable to be optimized, and the numerical examples presented are only one-dimensional; the relative errors of \(k_\text {eff}\) in those cases are \(4.4800\,\times \,10^{-04}\) and \(3.3500\,\times \,10^{-04}\). Similarly, we use Eqs. (29) and (30) to enforce the interface conditions. As shown below, our methods are more generic and yield better results.

Remark 6

In [45], the impact of the interface conditions was ignored, the relative errors of \(k_\text {eff}\) in their cases were \(1.3\,\times \,10^{-03}\) and \(4.4\,\times \,10^{-03}\), respectively, and the study did not involve the smallest eigenvalue problem. In this study, we obtain the smallest eigenvalue using the inverse power method instead of using a free learnable parameter to approximate the eigenvalue. Our numerical results demonstrate that accurate results can be obtained in more complicated cases.

3.5 Deep Ritz method

DRM is a deep-learning-based method for numerically solving variational problems [24]. It reformulates the original PDEs into equivalent variational equations and defines the loss function based on the variational formulation. The solution of the PDE is represented by a neural network, and the derivatives are calculated using AD. DRM [24] can also be used to solve eigenvalue problems. Below, we specify how to use DRM to solve Eq. (3).

We consider the variational principle for the smallest eigenvalue:

$$\begin{aligned} \left\{ \begin{aligned}&\min&\frac{\int _{\Omega } \mathcal {L} \phi \cdot \phi \text{ d }x}{\int _{\Omega } \mathcal {Q} \phi \cdot \phi \text{ d }x}, \\&\text{ s.t. }&\mathcal {B} \phi |_{\partial \Omega } = g, \end{aligned} \right. \end{aligned}$$
(33)

where the Rayleigh quotient is used. The boundary conditions are enforced by adding a penalty term,

$$\begin{aligned} \begin{aligned}&\min&\int _{\partial \Omega } |\mathcal {B} \phi - g |^2 \text {d}\,s, \end{aligned} \end{aligned}$$
(34)

and the total loss function \({{Loss}}_\mathrm{{drm}}\) is defined as:

$$\begin{aligned} \begin{aligned}&{{Loss}}_\text {drm} = \alpha \frac{\int _{\Omega } \mathcal {L} \phi \cdot \phi \textrm{d}x}{\int _{\Omega } \mathcal {Q} \phi \cdot \phi \textrm{d}x} + \beta \int _{\partial \Omega } |\mathcal {B} \phi - g |^2 \textrm{d}s, \end{aligned} \end{aligned}$$
(35)

where \(\alpha \) and \(\beta \) denote the weights of different losses. We chose \(\alpha =1\) and \(\beta =1\) for our experiments.

After the optimal approximation is obtained by solving the optimization problem (35), we obtain the smallest eigenvalue \(\lambda = \frac{\int _{\Omega } \mathcal {L} \phi \cdot \phi \text{ d }x}{\int _{\Omega } \mathcal {Q} \phi \cdot \phi \text{ d }x}\) and eigenfunction represented by the trained neural network. It should be noted that \(\mathcal {L} \phi \), \(\mathcal {Q} \phi \), and \(\mathcal {B} \phi \) are computed using the AD. For the process of DRM, refer to Algorithm 2.

Remark 7

We enforce the boundary condition by adding the penalty term (34). However, if the boundary condition is a Neumann or Robin condition, we do not use the penalty term (34) because the boundary condition is incorporated into the Rayleigh quotient via Green's first identity [65].

Algorithm 2

DRM for Finding the Smallest Eigenvalue

4 Experiments

In this section, we present numerical experiments to compare the applicability and accuracy of GIPMNN, PC-GIPMNN, and DRM for solving the smallest eigenvalue problems in reactor physics. In all the experiments below, we chose the Adam optimizer with an initial learning rate of \(10^{-3}\) to minimize the loss function. Furthermore, we trained the neural networks with the ResNet architecture on a server equipped with a CentOS 7 system, one Intel Xeon Platinum 8358 2.60-GHz CPU, and one NVIDIA A100 80-GB GPU. Moreover, unless otherwise specified, the activation function was the tanh function.

4.1 One-dimensional slab reactor

We consider a one-dimensional case with a simple slab reactor consisting of a domain bounded by vacuum or bare surfaces, as shown in Fig. 3. The length of the slab reactor was 10 cm. It consists of fuel and control rod regions labeled 1, 2, and 3. The control rods were located between 2.2 and 2.5 cm and between 7.5 and 7.8 cm on the x-axis.

As shown in Fig. 3, there are two control rods in the one-dimensional slab reactor. Either device can be withdrawn or inserted. Three scenarios were designed to model the reactor and fully simulate the actions of the control rods. These three situations, labeled F1, F2, and F3 in Table 1, indicate that both rods are withdrawn, only the left rod is inserted, and only the right rod is inserted, respectively. We also considered three other problems, R1, R2, and R3 in Table 1. Problem R1 is the same as F1, which is designed to simulate the withdrawal of both control rods; we use R1 to facilitate comparison with R2 and R3. Although problems R2 and R3 both have all control rods inserted, problem R2 resembles heavily inserted control rods, whereas problem R3 resembles slightly inserted control rods.

Fig. 3

Macroscopic cross-sections for the 1D slab reactor, which is modified based on the figure reported in [66]. There are three regions with different materials and functions

Table 1 Six sets of material cross-sections in the work [66] are used to test GIPMNN and DRM

To generate the data needed to validate the accuracy of GIPMNN, PC-GIPMNN, and DRM, FreeFEM [67,68,69,70,71,72,73] is utilized to solve the problems in Table 1. FreeFEM is a partial differential equation solver for non-linear multi-physics systems using the finite element method. We choose the number of cells in the x-direction as \(N = 1001\) with mesh size \(\Delta x=10^{-3}\). The baseline solution is obtained by FEM, and the same uniform cells are used to train GIPMNN, PC-GIPMNN, and DRM.

Remark 8

When solving all the parameter-dependent problems below, the parameters \(\Sigma ^\textrm{a}\), \(\Sigma ^\textrm{s}\), and \(v\Sigma ^\textrm{f}\) are selected according to the region to which the data point \(\varvec{x}\) belongs. For PC-GIPMNN, data points on an interface are considered to belong to the adjacent regions simultaneously; therefore, we can enforce the interface conditions using the data points on the interface.

Remark 9

As unsupervised algorithms, our methods use only collocation points to train the neural network, without any prior knowledge. Therefore, there is no test set for the proposed algorithms.

4.1.1 Using GIPMNN to solve for Eigenvalue

In this one-dimensional example, we select 20 neurons for each hidden layer in ResNet and \(N_\text {epoch}=50000\). Without loss of generality, the weights \(\alpha \) and \(\beta \) of the different losses are not tuned to achieve better results; both are set to one.

In Eq. (21), we must solve for \(\varvec{\phi }_{k}\) using the eigenvalue \(\lambda _{k-1}\) and the eigenvector \(\varvec{\phi }_{k-1}\). To accelerate the training process and obtain more accurate results with Algorithm 1, we split the iterations in the original Algorithm 1 into inner and outer iterations, as shown in Algorithm 3, where \(N_\text {inner}\) and \(N_\text {outer}\) denote the numbers of inner and outer iterations, respectively.
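One plausible organization of this split, reusing the gipmnn_step sketch from Sect. 3.3, is shown below; the exact bookkeeping of Algorithm 3 may differ. The outer loop freezes the targets \(\lambda _{k-1}\) and \(\mathcal {Q}\Phi _{k-1}\) of Eq. (23), and the inner loop performs \(N_\text {inner}\) gradient updates against them.

```python
def train_gipmnn(net, opt, x, lam0, Q_phi0, n_outer, n_inner,
                 D, Sigma_a, nu_Sigma_f):
    lam, Q_phi = lam0, Q_phi0
    for _ in range(n_outer):
        lam_prev, Q_phi_prev = lam, Q_phi        # freeze the targets of Eq. (23)
        for _ in range(n_inner):                 # ratio 1 : n_inner
            lam, Q_phi = gipmnn_step(net, opt, x, lam_prev, Q_phi_prev,
                                     D, Sigma_a, nu_Sigma_f)
    return lam
```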

We chose ratios of outer to inner iterations of 1 : 1, 1 : 10, 1 : 100, 1 : 1000, and 1 : 10000 to investigate the effect of the ratio on the results. A ratio of 1 : n implies that the outer code is executed once for every n executions of the inner code. For comparison, the total number of iterations was fixed at \(N_\text {total}=100000\) with \(N_\text {total} = N_\text {inner} \times N_\text {outer}\). The relative errors of \(k_\text {eff}\) and the eigenfunction for different ratios of outer to inner iterations during the training process are shown in Fig. 4. The results for \(k_\text {eff}^\text {rel}\) are shown in the first row and those for \(\phi ^\text {rel}\) in the second row, where the relative errors of \(k_\text {eff}\) and the eigenfunction are calculated by

$$\begin{aligned} k_\text {eff}^\text {rel}= & {} \frac{|k_\text {eff}(FEM)-k_\text {eff}(NN) |}{k_\text {eff}(FEM)}, \end{aligned}$$
(36)
$$\begin{aligned} \phi ^\text {rel}= & {} \frac{\mathop \textrm{max}\limits _{\varvec{x}}( |\phi (FEM) - \phi (NN) |)}{\mathop \textrm{max}\limits _{\varvec{x}}(|\phi (FEM) |)}, \end{aligned}$$
(37)

respectively. As shown in the figures, the best ratio is 1 : 1 because the relative errors of \(k_\text {eff}\) and the eigenfunction for the ratio 1 : 1 are smaller than those for the other ratios during training. The convergence worsens as the ratio of outer to inner iterations changes from 1 : 1 to 1 : 10000. Therefore, we trained GIPMNN with a ratio of 1 : 1 to solve the 1D and 2D problems.
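For reference, Eqs. (36) and (37) can be evaluated directly once the FEM and network solutions are sampled on the same grid; the function below is a straightforward transcription.

```python
import numpy as np

def relative_errors(k_fem, k_nn, phi_fem, phi_nn):
    """Eqs. (36)-(37): relative errors of k_eff and of the eigenfunction."""
    k_rel = abs(k_fem - k_nn) / k_fem
    phi_rel = np.max(np.abs(phi_fem - phi_nn)) / np.max(np.abs(phi_fem))
    return k_rel, phi_rel
```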

Algorithm 3

Iterations in Algorithm 1 are split into inner iterations and outer iterations.

Fig. 4

(Color online) Relative error of \(k_\text {eff}\) and \(\phi \) for problems F1, F2, and F3 (from (a) to (c)) with respect to different ratios of outer and inner iterations during training process. The first row shows the results of \(k_\text {eff}^\text {rel}\) and the second row shows the results of \(\phi ^\text {rel}\). The neural network is trained with \(N_\text {total}=100000\)

Fig. 5

(Color online) Relative error of \(k_\text {eff}\) and \(\phi \) for problems F1, F2, and F3 (from (a) to (c)) with respect to different ratios of outer and inner iterations during training process. The first row shows the results of \(k_\text {eff}^\text {rel}\) and second row shows the results of \(\phi ^\text {rel}\). The neural network is trained with \(N_\textrm{outer}=1000\)

Moreover, we fixed the number of outer iterations at \(N_\textrm{outer}=1000\) and retrained the neural network using the same ratios. The results are shown in Fig. 5. The opposite trend is observed compared with the results in Fig. 4: the ratio 1 : 1 is now the worst, because increasing the number of inner iterations leads to a better approximation of the eigenfunction for the next outer iteration. This is consistent with the behavior of the inverse power method.

4.1.2 Using PC-GIPMNN to solve for Eigenvalue

We divided the slab reactor into five parts from left to right. The output layer includes five neurons, and five functions are defined in the different subdomains: u, w, and q denote the functions defined in the three portions of Region 1, whereas v and p denote those defined in Regions 2 and 3, respectively.

As shown in Fig. 3, the four interface points are denoted as \(x_{i1}=2.2\), \(x_{i2}=2.5\), \(x_{i3}=7.5\), and \(x_{i4}=7.8\). The interface conditions (29) and (30) are implemented as the loss functions defined in Eqs. (38) and (39) for this 1D example.

$$\begin{aligned} {{Loss}}_{i1}= & {} |u(x_{i1})-v(x_{i1}) |^2 + |v(x_{i2})-w(x_{i2}) |^2 \nonumber \\{} & {} + |w(x_{i3})-p(x_{i3}) |^2 + |p(x_{i4})-q(x_{i4}) |^2, \end{aligned}$$
(38)
$$\begin{aligned} {{Loss}}_{i2}= & {} |(-D_1 u_x(x_{i1})) - (- D_2 v_x(x_{i1})) |^2 \nonumber \\{} & {} + |(-D_2 v_x(x_{i2})) - (- D_1 w_x(x_{i2})) |^2 \nonumber \\{} & {} + |(-D_1 w_x(x_{i3})) - (- D_3 p_x(x_{i3})) |^2 \nonumber \\{} & {} + |(-D_3 p_x(x_{i4})) - (- D_1 q_x(x_{i4})) |^2. \end{aligned}$$
(39)

The eigenfunction can be expressed as follows:

$$\begin{aligned} \begin{aligned} \phi = ul_1 + vl_2 + wl_3 + pl_4 + ql_5, \end{aligned} \end{aligned}$$
(40)

where \(l_1\), \(l_2\), \(l_3\), \(l_4\), and \(l_5\) are indicator functions that are 1 or 0 depending on whether the input \(\varvec{x}\) lies inside or outside the corresponding subdomain.

Table 2 Results obtained by GIPMNN, PC-GIPMNN, and DRM compared with those obtained by FEM for problems in Table 1. \(k_\text {eff}^\text {rel}\) denotes the relative error of \(k_\text {eff}\)
Table 3 Results obtained by GIPMNN, PC-GIPMNN, and DRM compared with those obtained by FEM for problems in Table 1. \(\phi ^\text {rel}\) denotes the relative error of eigenfunction
Fig. 6

(Color online) Eigenfunctions obtained by GIPMNN, PC-GIPMNN, and DRM are compared with those obtained by FEM for one-dimensional example. The eigenfunctions are normalized. The first row shows the results of problems F1, F2, and F3, and the second row shows the results of problems R1, R2, and R3

4.1.3 Using DRM to solve for Eigenvalue

The configuration of DRM is identical to that of GIPMNN. Given that DRM uses a variational formulation in which the boundary condition is incorporated into the Rayleigh quotient, we do not use a penalty term to enforce the boundary condition. The loss function \({{Loss}}_\text {drm}\) is defined as:

$$\begin{aligned} \begin{aligned}&{{Loss}}_\text {drm} = \frac{\frac{{Length}_\text {slab}}{N} \sum _{i=1}^{N} D |\nabla \phi (x_i) |^2 + \frac{1}{2}(\phi (x_l)^2 + \phi (x_r)^2) + \frac{{Length}_\text {slab}}{N} \sum _{i=1}^{N} \Sigma ^\textrm{a} \phi (x_i)^2}{\frac{{Length}_\text {slab}}{N} \sum _{i=1}^{N} v\Sigma ^\textrm{f} \phi (x_i)^2}, \end{aligned} \end{aligned}$$
(41)

where \({Length}_\text {slab}\) denotes the length of the slab reactor, \(x_l\) and \(x_r\) denote the positions of the two endpoints of the slab reactor, and N denotes the number of points used to approximate the integrals. The smallest eigenvalue \(\lambda \) is obtained after the neural network converges.
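A sketch of the discrete loss of Eq. (41) is given below; it assumes a network net as in Fig. 1, interior points x with requires_grad enabled, endpoint tensors x_l and x_r, and coefficient arrays D, Sigma_a, nu_Sigma_f evaluated at the same points (illustrative names, not the authors' code).

```python
import torch

def drm_loss_1d(net, x, x_l, x_r, length_slab, D, Sigma_a, nu_Sigma_f):
    phi = net(x)
    dphi = torch.autograd.grad(phi.sum(), x, create_graph=True)[0]
    w = length_slab / x.shape[0]                           # quadrature weight Length/N
    numerator = (w * D * dphi ** 2).sum() \
        + 0.5 * (net(x_l) ** 2 + net(x_r) ** 2).sum() \
        + (w * Sigma_a * phi ** 2).sum()
    denominator = (w * nu_Sigma_f * phi ** 2).sum()
    return numerator / denominator      # this quotient is also the estimate of lambda
```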

4.1.4 Results

The results for the one-dimensional example are listed in Tables 2 and 3. The values of \(k_\text {eff}\) and their relative errors are listed in Table 2. For problems F1 and R1, which have continuous coefficients, the results obtained by GIPMNN are better than those obtained by PC-GIPMNN and DRM. For problem F2, the relative error of \(k_\text {eff}\) obtained by DRM is better than those obtained by the other methods. For the other problems with discontinuous coefficients, the results obtained by PC-GIPMNN are better than those obtained by GIPMNN and DRM. The relative error of \(k_\text {eff}\) computed by PC-GIPMNN is approximately \(10^{-5}\), which is lower than the approximately \(10^{-4}\) computed by DRM. The relative errors of the eigenfunctions are listed in Table 3. For all problems, PC-GIPMNN attains better results than GIPMNN and DRM.

In Fig. 6, the eigenfunctions of GIPMNN, PC-GIPMNN, and DRM are compared with those of FEM. Specifically, we follow the conventional normalization process in nuclear reactor physics [1], where the normalization constant is generally computed to make the average reactor power equal to unity; thus, the eigenfunctions are normalized by Eq. (42), where N denotes the number of training points in the entire domain.

$$\begin{aligned} \begin{aligned}&\phi _{\text{ norm }} = \frac{N}{\sum _{i=1}^N \phi (x_i)}\phi . \end{aligned} \end{aligned}$$
(42)
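As a quick illustration, the normalization of Eq. (42) amounts to rescaling the sampled eigenfunction so that its average over the N training points equals one:

```python
import numpy as np

def normalize_power(phi):
    """Eq. (42): phi_norm = (N / sum_i phi(x_i)) * phi."""
    return phi * (phi.shape[0] / phi.sum())

phi = np.array([0.5, 1.0, 1.5])
print(normalize_power(phi).mean())   # 1.0
```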

The figures in the first row show the results for problems F1, F2, and F3, and those in the second row show the results for problems R1, R2, and R3. In each figure, we plot the eigenfunction obtained by FEM and compare the relative errors of the eigenfunctions computed by GIPMNN, PC-GIPMNN, and DRM. Evidently, the results obtained by PC-GIPMNN are better than those obtained by the other methods, which is consistent with the results in Table 3.

As reported in a previous study [49], IPMNN can attain more accurate eigenvalues than DRM for linear eigenvalue problems without interfaces. IPMNN and GIPMNN are thus suited to finding the eigenfunction of a problem in strong form, whereas DRM, like FEM, is applicable to finding the eigenfunction of a problem in weak form. Consequently, DRM is better than GIPMNN in this one-dimensional example with discontinuous coefficients. However, because the interface conditions are well implemented in PC-GIPMNN, it successfully learns the eigenvalue and eigenfunction and achieves better results than GIPMNN and DRM, as shown in Tables 2 and 3.

Remark 10

Specifically, DRM finds the eigenfunction of a problem in weak form, which permits the eigenfunction to exhibit low regularity. In contrast, as shown in the implementation of GIPMNN, the eigenfunction obtained from GIPMNN must exhibit high regularity; the learned eigenfunction is in strong form. Hence, GIPMNN is unable to obtain accurate values at the interface. PC-GIPMNN, however, does not require the eigenfunction to exhibit higher regularity at the interface; instead, it guarantees continuity and the physical constraints by enforcing the interface conditions. Therefore, PC-GIPMNN successfully learns the eigenvalue and eigenfunction and obtains better results than GIPMNN and DRM.

Fig. 7

Geometry of a 2D reactor core with graphite, fuel, and four control-rod regions labeled as 1, 2, 3, and 4. The figure is similar to the figure in a previous study [66]

Table 4 Cross section data for various materials of the 2D reactor. These data are similar to those reported in a previous study [66]
Table 5 Configurations for reactor problems with different materials. The configurations are similar to those in a previous study [66]. C-R denotes the control rod region, CR denotes a control rod material, and W denotes water

4.2 Two-dimensional reactor

As shown in Fig. 7, the two-dimensional reactor is modeled in a square domain with 90-cm sides. The reactor is surrounded by a neutron reflector made of graphite, for which a Robin boundary condition is selected. The main bulk of the reactor is the fuel region. Within its central region, four control rods can be inserted or withdrawn. When a control rod is withdrawn, the material in its region is replaced with water, which corresponds to common practice in many reactor designs. Table 4 lists the five different materials used in the mock reactor. The two types of fuel in Table 4 are designed to simulate different fuel materials: fuel type 1 is the standard fuel used in most problems, and fuel type 2 is an adjusted fuel composition that differs from fuel type 1. Fuel type 2 is used in problem R7 to test whether our methods are affected by different types of fuel.

As shown in Table 5, 12 problems are used to validate the accuracy of the proposed methods. Five full models, labeled F1-F5, simulate the reactor with all control rods removed and then with only one control rod inserted in each of the control rod regions. As previously discussed, when a control rod is removed, the material in its region is replaced with water; therefore, W denotes water in Table 5. The other seven reactor configurations, denoted R1-R7, simulate cases in which more control rods are inserted. Problems R1 and R2 are equivalent to the full model problems F1 and F2, and problems R3-R6 utilize different combinations of inserted and withdrawn control rods. It should be noted that in problem R7 the material configuration differs from that of the other problems: the fuel is replaced with fuel type 2, and the control rods are assumed to be partially inserted, which implies that the materials in the control rod regions correspond to a mix of control rod and water materials.

In the two-dimensional case, we use FreeFEM to solve the problems listed in Table 5. We chose uniform grids with \(\Delta x=\frac{1}{90}\) and \(\Delta y=\frac{1}{90}\). We trained GIPMNN, PC-GIPMNN, and DRM with \(N_x=91\) and \(N_y=91\).

4.2.1 Using GIPMNN and PC-GIMPNN to solve for Eigenvalue

The number of points \(N=N_xN_y\) is used to calculate \({{Loss}}_\text {gipmnn}\), and the number of points \(N_\textrm{b}=2(N_x-2)+2(N_y-2)+4\) is used to calculate \({{Loss}}_\textrm{b}\). There are 20 neurons in each hidden layer of ResNet, and \(N_\textrm{epoch}=500000\) for both GIPMNN and PC-GIPMNN. Without loss of generality, \(\alpha \) and \(\beta \) are set to one. As mentioned previously, we use the optimal ratio of outer to inner iterations, 1 : 1.

The 2D reactor is divided into six parts, as shown in Fig. 7. The output layer includes six neurons, and six functions are defined in the different subdomains, labeled u, v, w, r, p, and q, where u, v, w, and r denote the functions defined in CR1, CR2, CR3, and CR4, and p and q denote the functions defined for the fuel and graphite regions, respectively. \(S_\mathrm{{CR}1}\), \(S_\mathrm{{CR}2}\), \(S_\mathrm{{CR}3}\), \(S_\mathrm{{CR}4}\), and \(S_\mathrm{{GF}}\) denote the datasets of points at the different interfaces.

Fig. 8

(Color online) Failure of DRM during the 2D experiments. DRM fails to learn the eigenfunctions of F1-F5 and R3-R7

For PC-GIPMNN, interface conditions (29) and (30) are implemented as loss functions (43) and (44) in the 2D example.

$$\begin{aligned}{} & {} {{Loss}}_{i1} = \sum _{i=1}^{|S_\mathrm{{CR}1} |} |u(\varvec{x}_i)-p(\varvec{x}_i) |^2 \nonumber \\{} & {} \qquad \qquad \ + \sum _{i=1}^{|S_\mathrm{{CR}2} |} |v(\varvec{x}_i)-p(\varvec{x}_i) |^2 + \sum _{i=1}^{|S_\mathrm{{CR}3} |} |w(\varvec{x}_i)-p(\varvec{x}_i) |^2 \nonumber \\{} & {} \qquad \qquad \ + \sum _{i=1}^{|S_\mathrm{{CR}4} |} |r(\varvec{x}_i)-p(\varvec{x}_i) |^2 \nonumber \\{} & {} + \sum _{i=1}^{|S_\textrm{GF} |} |p(\varvec{x}_i)-q(\varvec{x}_i) |^2, \end{aligned}$$
(43)
$$\begin{aligned}{} & {} {{Loss}}_{i2}\nonumber \\ {}= & {} \sum _{i=1}^{|S_\mathrm{{CR}1} |} |(-D_\mathrm{{CR}1} \nabla u(\varvec{x}_i) \cdot \varvec{n}) - (-D_\textrm{Fuel} \nabla p(\varvec{x}_i) \cdot \varvec{n}) |^2 \nonumber \\{} & {} + \sum _{i=1}^{|S_\mathrm{{CR}2} |} |(-D_\mathrm{{CR}2} \nabla v(\varvec{x}_i) \cdot \varvec{n}) - (-D_\textrm{Fuel} \nabla p(\varvec{x}_i) \cdot \varvec{n}) |^2 \nonumber \\{} & {} + \sum _{i=1}^{|S_\mathrm{{CR}3} |} |(-D_\mathrm{{CR}3} \nabla w(\varvec{x}_i) \cdot \varvec{n}) - (-D_\textrm{Fuel} \nabla p(\varvec{x}_i) \cdot \varvec{n}) |^2 \nonumber \\{} & {} + \sum _{i=1}^{|S_\mathrm{{CR}4} |} |(-D_\mathrm{{CR}4} \nabla r(\varvec{x}_i) \cdot \varvec{n}) - (-D_\textrm{Fuel} \nabla p(\varvec{x}_i) \cdot \varvec{n}) |^2 \nonumber \\{} & {} + \sum _{i=1}^{|S_{\textrm{GF}} |} |(-D_\textrm{Fuel} \nabla p(\varvec{x}_i) \cdot \varvec{n}) - (-D_\textrm{Graphite} \nabla q(\varvec{x}_i) \cdot \varvec{n}) |^2. \nonumber \\ \end{aligned}$$
(44)

The eigenfunction is expressed as follows:

$$\begin{aligned} \begin{aligned} \phi = ul_1 + vl_2 + wl_3 + rl_4 + pl_5 + ql_6, \end{aligned} \end{aligned}$$
(45)

where \(l_1\), \(l_2\), \(l_3\), \(l_4\), \(l_5\) and \(l_6\) denote indicator functions.

Remark 11

In the experiments, it was important to normalize using Eq. (42) instead of the original L2 norm, under which the eigenfunction values are too small to optimize the neural network effectively. Specifically, the L2 norm of the eigenfunction is one if we seek a normalized eigenfunction; if the total number of points N is very large, the value of each component of the eigenvector becomes so small that it is difficult for the neural network to learn the eigenfunction.

4.2.2 Using DRM to solve for Eigenvalue and the failure of DRM during the 2D experiments

Given that the homogeneous Neumann boundary condition is used, the loss function in DRM can be defined as:

$$\begin{aligned} {{Loss}}_\textrm{drm} = \frac{\frac{Area_\text {square}}{N} \sum _{i=1}^{N} D |\nabla \phi (\varvec{x}_i)|^2 + \frac{Area_\text {square}}{N} \sum _{i=1}^{N} \Sigma ^a \phi (\varvec{x}_i)^2}{\frac{Area_\text {square}}{N} \sum _{i=1}^{N} \nu \Sigma ^f \phi (\varvec{x}_i)^2}, \end{aligned}$$
(46)

where \(Area_\text {square}\) denotes the area of the square domain as shown in Fig. 7.
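A minimal sketch of how the quotient in Eq. (46) could be evaluated on the grid points is given below, assuming PyTorch; `drm_loss`, the per-point cross-section tensors `D`, `sigma_a`, and `nu_sigma_f`, and the quadrature weight `area / N` are illustrative assumptions, and `net(x)` is assumed to return the assembled flux \(\phi \).

```python
import torch

def drm_loss(net, x, D, sigma_a, nu_sigma_f, area):
    """Discrete Rayleigh quotient of Eq. (46) evaluated on N interior points.
    `D`, `sigma_a`, `nu_sigma_f` are per-point tensors following the material map."""
    x = x.clone().requires_grad_(True)
    phi = net(x)                                                 # shape (N, 1)
    grad_phi = torch.autograd.grad(phi.sum(), x, create_graph=True)[0]

    w = area / x.shape[0]                                        # quadrature weight Area_square / N
    numerator = w * (D * (grad_phi ** 2).sum(dim=1, keepdim=True)
                     + sigma_a * phi ** 2).sum()
    denominator = w * (nu_sigma_f * phi ** 2).sum()
    return numerator / denominator                               # the value minimized by DRM
```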

We used \(N_\text {epoch}=500000\) epochs for DRM. We found that DRM can learn the eigenvalue and eigenfunction at an early stage of the training process. Subsequently, DRM attempts to learn a smaller eigenvalue once it is close to the true eigenvalue. Finally, DRM settles on a smaller eigenvalue that may still be close to the true eigenvalue, together with an eigenfunction that is meaningless and far from the true eigenfunction. This phenomenon is illustrated in Fig. 8.

As listed in Table 6, although the neural network in DRM fails to learn the eigenfunction, the eigenvalue is close to the true value. This phenomenon may be caused by the minimization of the Rayleigh quotient. As mentioned in [49], the loss approaches zero, whereas the eigenvalue may not reach zero.

Table 6 Failure of DRM during the 2D experiments. The eigenfunctions of F1-F5 and R1-R7 are not correct, but the eigenvalues are close to the true values
Table 7 Results obtained via GIPMNN, PC-GIPMNN, and DRM compared with FEM for the problems in Table 5. \(k_\text {eff}^\text {rel}\) denotes the relative error of \(k_\text {eff}\). Note that DRM is trained with \(N_\text {epoch}=50000\)

In Fig. 8, we observe that the failure of DRM in the 2D experiments occurs once \(N_\text {epoch} \ge 60000\). Therefore, we select \(N_\text {epoch}=50000\) to train DRM again and record the results. In a previous study [40], the stopping criterion of the training process for PINNs was investigated. In future studies, we will follow the technique discussed in [40] to determine the stopping criterion for DRM. In the next section, we compare the results of DRM trained with \(N_\text {epoch}=50000\) with those of GIPMNN and PC-GIPMNN.
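In the absence of a principled stopping criterion, one simple heuristic consistent with this choice is to train for 50000 epochs and keep the snapshot with the lowest loss. The sketch below is our own illustration of such a heuristic, reusing the placeholder `drm_loss` and `net` from the earlier sketches, with an assumed Adam optimizer; it is not the authors' procedure.

```python
import copy
import torch

optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)   # assumed optimizer and learning rate
best_loss, best_state = float("inf"), None

for _ in range(50000):                                     # N_epoch = 50000
    optimizer.zero_grad()
    loss = drm_loss(net, x, D, sigma_a, nu_sigma_f, area)
    loss.backward()
    optimizer.step()
    if loss.item() < best_loss:                            # remember the best snapshot seen so far
        best_loss = loss.item()
        best_state = copy.deepcopy(net.state_dict())

net.load_state_dict(best_state)                            # restore the best snapshot for evaluation
```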

Table 8 Results obtained via GIPMNN, PC-GIPMNN, and DRM when compared with FEM for the problems in Table 5. \(\phi ^\text {rel}\) denotes the relative error of the eigenfunction. Note that DRM is trained with \(N_\text {epoch}=50000\)

Remark 12

The numerical results in Fig. 8 are not due to hardware problems or code errors but can be attributed to the nature of the neural network. In DRM, the eigenvalue is approximated by constructing a Rayleigh quotient, which is then treated as the loss function used to optimize the neural network. However, this mechanism of directly minimizing the eigenvalue leads to overfitting, because the neural network always attempts to find a point where the loss function tends toward zero.

4.2.3 Results

Similar to the one-dimensional case, the relative errors of \(k_\text {eff}\) and of the eigenfunction \(\phi \) are shown in Tables 7 and 8. The relative errors of \(k_\text {eff}\) obtained via all the methods are small; the DRM results are those obtained with \(N_\textrm{epoch}=50000\).

For problems F1, F2, F4, and R7, the relative error of \(k_\text {eff}\) obtained by DRM was smaller than that obtained by GIPMNN, whereas the opposite holds for problems F3, F5, R3, R4, R5, and R6. However, even where the relative error of \(k_\text {eff}\) obtained by GIPMNN is small, the relative error of \(\phi \) computed by GIPMNN is larger than that obtained by DRM. We observe that \(k_\text {eff}^\text {rel}\) obtained by PC-GIPMNN is smaller than that obtained by GIPMNN and DRM for all problems. Furthermore, the relative errors of \(\phi \) computed by PC-GIPMNN are smaller than those obtained by GIPMNN and DRM for all problems. Therefore, PC-GIPMNN successfully learns both eigenvalues and eigenfunctions.

Fig. 9 (Color online) The first column shows the heatmap of the eigenfunction computed by FEM, and the other columns show the heatmaps of the relative errors of GIPMNN (second column), PC-GIPMNN (third column), and DRM (fourth column) for problems F1, F2, F3, F4, and F5. Evidently, GIPMNN yields less favorable results than DRM. However, by enforcing the interface conditions, PC-GIPMNN outperforms both GIPMNN and DRM, as shown in the third column

In Figs. 9 and 10, the eigenfunctions computed by FEM are shown in the first column, and the relative errors of the eigenfunctions obtained by GIPMNN, PC-GIPMNN, and DRM are shown in the other columns for the different problems. The relative errors of the eigenfunctions computed by PC-GIPMNN and DRM are smaller than those obtained by GIPMNN, which fails to capture some details. Compared with the FEM eigenfunctions, the results obtained by PC-GIPMNN are the best among the three methods.

Fig. 10 (Color online) The first column shows the heatmap of the eigenfunction computed by FEM, and the other columns show the heatmaps of the relative errors of GIPMNN (second column), PC-GIPMNN (third column), and DRM (fourth column) for problems R3, R4, R5, R6, and R7. Obviously, GIPMNN yields less favorable results than DRM. However, by enforcing the interface conditions, PC-GIPMNN outperforms both GIPMNN and DRM, as shown in the third column

4.3 2D IAEA benchmark problem

We also considered the classical 2D IAEA benchmark problem reported in the study by Yang et al. [40], which was modeled using two-dimensional, two-group diffusion equations. Here, the one-group neutron diffusion equation defined in Eq. (4) is considered; multigroup problems will be addressed in future studies. The geometry is shown in Fig. 11. The main bulk of the reactor consists of two fuel regions, labeled 1 and 2, representing the two types of fuel materials. Within the central region, there are four control rods, all labeled 3. The last region, labeled 4, is composed of water. Cross-sectional data for the 2D IAEA benchmark problem are presented in Table 9. Note that only one quarter of the reactor is shown in the figure because the rest can be inferred by symmetry along the x- and y-axes. Therefore, this 2D IAEA benchmark problem is subject to the two types of boundary conditions defined in Eq. (7): the Neumann boundary condition on the x- and y-axes and the Robin boundary condition on the other boundaries.

In this two-dimensional case, we used FreeFEM to solve the 2D IAEA benchmark problem with the parameters listed in Table 9. We selected uniform grids with \(\Delta x=\frac{1}{170}\) and \(\Delta y=\frac{1}{170}\). We trained GIPMNN, PC-GIPMNN and DRM with \(N_x=171\) and \(N_y=171\).

4.3.1 Using GIPMNN and PC-GIPMNN to solve for Eigenvalue

The number of points \(N=N_xN_y\) is used to calculate \({{Loss}}_\text {gipmnn}\). The number of points \(N_\text {Nb}\) on the x- and y-axes and number of points \(N_\text {Rb}\) on the other boundaries were used to calculate \({{Loss}}_\text {Nb}\) and \({{Loss}}_\text {Rb}\), which enforced the Neumann and Robin boundary conditions. The number of neurons was 20 for each hidden layer in ResNet and \(N_\text {epoch}=500000\) for GIPMNN and PC-GIPMNN. As mentioned previously, we used the optimal ratio of the outer and inner iterations, which was 1 : 1.
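A minimal sketch of the two boundary losses is given below, assuming PyTorch and the Robin form \(-D\,\partial \phi /\partial n = \phi /2\) implied by the surface term in Eq. (50); the function names, the unit normals `n_nb` and `n_rb`, and the per-point diffusion coefficient `D_b` are illustrative assumptions, and `net` is assumed to return the assembled flux \(\phi \).

```python
import torch

def neumann_loss(net, x_nb, n_nb):
    """Loss_Nb: enforce the homogeneous Neumann condition d(phi)/dn = 0 on the x- and y-axes."""
    x_nb = x_nb.clone().requires_grad_(True)
    phi = net(x_nb)
    grad_phi = torch.autograd.grad(phi.sum(), x_nb, create_graph=True)[0]
    return ((grad_phi * n_nb).sum(dim=1, keepdim=True) ** 2).sum()

def robin_loss(net, x_rb, n_rb, D_b):
    """Loss_Rb: enforce the Robin condition -D d(phi)/dn = phi / 2 on the outer boundary."""
    x_rb = x_rb.clone().requires_grad_(True)
    phi = net(x_rb)
    grad_phi = torch.autograd.grad(phi.sum(), x_rb, create_graph=True)[0]
    current = -D_b * (grad_phi * n_rb).sum(dim=1, keepdim=True)
    return ((current - 0.5 * phi) ** 2).sum()
```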

The 2D reactor is divided into seven parts, as shown in Fig. 11. The output layer has seven neurons, corresponding to seven functions defined on the different subdomains and labeled u, v, w, r, p, q, and h: u, v, w, and r are defined on the four control-rod regions, p and q on the two fuel regions, and h on the water region. \(S_{up}\), \(S_{vp}\), \(S_{wp}\), \(S_{rp}\), \(S_{rq}\), \(S_{pq}\), and \(S_{qh}\) denote the datasets of points on the different material interfaces.

For the PC-GIPMNN, the interface conditions (29) and (30) are implemented as the loss functions (47) and (48), respectively, in the 2D example.

$$\begin{aligned} {{Loss}}_{i1} =&\sum _{i=1}^{|S_{up}|} |u(\varvec{x}_i)-p(\varvec{x}_i)|^2 + \sum _{i=1}^{|S_{vp}|} |v(\varvec{x}_i)-p(\varvec{x}_i)|^2 \\&+ \sum _{i=1}^{|S_{wp}|} |w(\varvec{x}_i)-p(\varvec{x}_i)|^2 + \sum _{i=1}^{|S_{rp}|} |r(\varvec{x}_i)-p(\varvec{x}_i)|^2 \\&+ \sum _{i=1}^{|S_{rq}|} |r(\varvec{x}_i)-q(\varvec{x}_i)|^2 + \sum _{i=1}^{|S_{pq}|} |p(\varvec{x}_i)-q(\varvec{x}_i)|^2 \\&+ \sum _{i=1}^{|S_{qh}|} |q(\varvec{x}_i)-h(\varvec{x}_i)|^2, \end{aligned}$$
(47)
$$\begin{aligned} {{Loss}}_{i2} =&\sum _{i=1}^{|S_{up}|} |(-D_{3} \nabla u(\varvec{x}_i) \cdot \varvec{n}) - (-D_{2} \nabla p(\varvec{x}_i) \cdot \varvec{n})|^2 \\&+ \sum _{i=1}^{|S_{vp}|} |(-D_{3} \nabla v(\varvec{x}_i) \cdot \varvec{n}) - (-D_{2} \nabla p(\varvec{x}_i) \cdot \varvec{n})|^2 \\&+ \sum _{i=1}^{|S_{wp}|} |(-D_{3} \nabla w(\varvec{x}_i) \cdot \varvec{n}) - (-D_{2} \nabla p(\varvec{x}_i) \cdot \varvec{n})|^2 \\&+ \sum _{i=1}^{|S_{rp}|} |(-D_{3} \nabla r(\varvec{x}_i) \cdot \varvec{n}) - (-D_{2} \nabla p(\varvec{x}_i) \cdot \varvec{n})|^2 \\&+ \sum _{i=1}^{|S_{rq}|} |(-D_{3} \nabla r(\varvec{x}_i) \cdot \varvec{n}) - (-D_{1} \nabla q(\varvec{x}_i) \cdot \varvec{n})|^2 \\&+ \sum _{i=1}^{|S_{pq}|} |(-D_{2} \nabla p(\varvec{x}_i) \cdot \varvec{n}) - (-D_{1} \nabla q(\varvec{x}_i) \cdot \varvec{n})|^2 \\&+ \sum _{i=1}^{|S_{qh}|} |(-D_{1} \nabla q(\varvec{x}_i) \cdot \varvec{n}) - (-D_{4} \nabla h(\varvec{x}_i) \cdot \varvec{n})|^2. \end{aligned}$$
(48)

The eigenfunction can be expressed as follows:

$$\begin{aligned} \begin{aligned} \phi = ul_1 + vl_2 + wl_3 + rl_4 + pl_5 + ql_6 + hl_7, \end{aligned} \end{aligned}$$
(49)

where \(l_1\), \(l_2\), \(l_3\), \(l_4\), \(l_5\), \(l_6\), and \(l_7\) are the indicator functions.

Fig. 11 (Color online) Geometry of the 2D IAEA benchmark problem with two fuel regions, four control-rod regions, and a water region, labeled 1, 2, 3, and 4. This figure is similar to the figure reported in a previous study [40]

4.3.2 Using DRM to solve for Eigenvalue

Given that the homogeneous Neumann boundary condition is used, the loss function in DRM omits the impact of the Neumann boundary condition and focuses on the Robin boundary condition. The loss function is defined as follows:

$$\begin{aligned} J = \frac{\frac{Area}{N} \sum _{i=1}^{N} D |\nabla \phi (\varvec{x}_i)|^2 + \frac{Length}{N_\text {Rb}} \sum _{i=1}^{N_\text {Rb}} \frac{1}{2} \phi (\varvec{x}_i)^2 + \frac{Area}{N} \sum _{i=1}^{N} \Sigma ^a \phi (\varvec{x}_i)^2}{\frac{Area}{N} \sum _{i=1}^{N} \nu \Sigma ^f \phi (\varvec{x}_i)^2}, \end{aligned}$$
(50)

where Area denotes the area of all regions, Length indicates the length of the boundaries other than the x- and y-axes in Fig. 11, and \(N_\text {Rb}\) denotes the number of points on the Robin boundary.
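As a sketch (an assumed form, not the authors' code), the only change relative to the earlier `drm_loss` is the Robin surface term added to the numerator, weighted by Length / N_Rb; `net(x)` is again assumed to return the assembled flux, for example via the indicator-function combination of Eq. (49).

```python
import torch

def drm_loss_robin(net, x, x_rb, D, sigma_a, nu_sigma_f, area, length):
    """Eq. (50): Rayleigh quotient with the Robin surface term in the numerator."""
    x = x.clone().requires_grad_(True)
    phi = net(x)
    grad_phi = torch.autograd.grad(phi.sum(), x, create_graph=True)[0]
    w_in, w_rb = area / x.shape[0], length / x_rb.shape[0]      # Area/N and Length/N_Rb

    volume = w_in * (D * (grad_phi ** 2).sum(dim=1, keepdim=True)
                     + sigma_a * phi ** 2).sum()
    surface = w_rb * (0.5 * net(x_rb) ** 2).sum()               # (1/2) phi^2 on the Robin boundary
    fission = w_in * (nu_sigma_f * phi ** 2).sum()
    return (volume + surface) / fission
```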

Table 9 Cross section data for the 2D IAEA benchmark problem
Table 10 Results obtained via GIPMNN, PC-GIPMNN, and DRM when compared with FEM for the 2D IAEA benchmark problem. \(k_\text {eff}^\text {rel}\) denotes the relative error of \(k_\text {eff}\). Note that DRM is trained with \(N_\text {epoch}=50000\)

4.3.3 Results

As discussed above, we also trained DRM with \(N_\text {epoch}=500000\) epochs and found that it again failed to learn the eigenfunction. In this case, DRM attained a good \(k_\text {eff}=0.9750\), and the relative errors of \(k_\text {eff}\) and \(\phi \) were \(6.4023\times 10^{-3}\) and 0.9557, respectively. Therefore, we retrained DRM with \(N_\text {epoch}=50000\) and stored the best results.

The relative errors of \(k_\text {eff}\) and of the eigenfunction \(\phi \) are shown in Tables 10 and 11. All three methods obtained good results, and the relative error of \(k_\text {eff}\) obtained by DRM was small. However, DRM learned the eigenfunction less accurately than GIPMNN and PC-GIPMNN, and its relative error of \(\phi \) was the largest. Thus, it can be concluded that PC-GIPMNN successfully learned both the eigenvalue and the eigenfunction. The same conclusion can be drawn from the plots in Fig. 12.

Table 11 Results obtained via GIPMNN, PC-GIPMNN, and DRM when compared with FEM for the 2D IAEA benchmark problem. \(\phi ^\text {rel}\) denotes the relative error of the eigenfunction. Note that DRM is trained with \(N_\text {epoch}=50000\)
Fig. 12 (Color online) The first column shows the heatmap of the eigenfunction computed by FEM, and the other columns show the heatmaps of the relative errors of GIPMNN (second column), PC-GIPMNN (third column), and DRM (fourth column) for the 2D IAEA benchmark problem. By enforcing the interface conditions, PC-GIPMNN outperforms GIPMNN and DRM, as shown in the third column

For the 1D slab reactor, the computation time of FEM is 1.25 s, and the training times of DRM, GIPMNN, and PC-GIPMNN are 4788.37 s, 10614.57 s, and 16052.16 s, respectively. For the 2D reactor, the computation time of FEM is 3.64 s, and the training times of DRM, GIPMNN, and PC-GIPMNN are 5827.91 s, 18444.39 s, and 108072.41 s, respectively. For the 2D IAEA benchmark, the computation time of FEM is 5.22 s, and the training times of DRM, GIPMNN, and PC-GIPMNN are 29352.83 s, 64546.74 s, and 137812.92 s, respectively.

Although PC-GIPMNN performed better than DRM and GIPMNN, it required significantly more training time to obtain accurate results. Compared with FEM, all the neural network methods require far more time. However, neural networks are an emerging approach, and we believe that they will achieve better results in the near future.

5 Conclusion

In this study, we proposed two methods, GIPMNN and PC-GIPMNN, to solve generalized K-eigenvalue problems in nuclear reactor physics. We also conducted a comprehensive study of GIPMNN, PC-GIPMNN, and DRM. GIPMNN follows the main idea of the inverse power method to find the smallest eigenvalue. PC-GIPMNN additionally enforces interface conditions through multiple neurons in the output layer. The idea of DRM is to define the Rayleigh quotient as a functional and form an optimization problem. Unlike DRM, which solves for the smallest eigenvalue by directly minimizing the eigenvalue (Rayleigh quotient), GIPMNN and PC-GIPMNN attain the smallest eigenvalue using an iterative method. All the methods use neural networks to represent functions, and differentiation is implemented using automatic differentiation (AD). Finally, we applied the three methods to problems in reactor physics.

Three numerical experiments were conducted to verify the applicability and accuracy of GIPMNN, PC-GIPMNN, and DRM. In the first, 1D example, we used inner and outer iterations for the simulation. According to our tests, the best ratio of outer to inner iterations was 1 : 1. Furthermore, we compared the results of GIPMNN, PC-GIPMNN, and DRM with those of FEM. For the continuous problem, the solution learned by GIPMNN was more accurate than those learned by DRM and PC-GIPMNN. For the interface problem, the eigenvalue and eigenfunction learned by PC-GIPMNN were better than those learned by DRM and GIPMNN, owing to the interface conditions implemented in the loss function of PC-GIPMNN.

In the 2D examples, we observed the failure of DRM. DRM can learn the eigenvalue and eigenfunction at an early stage of the training process, but the results deteriorate as \(N_\text {epoch}\) increases. Therefore, we selected \(N_\text {epoch}=50000\) to train DRM and compared the results obtained by GIPMNN, PC-GIPMNN, and DRM with those obtained by FEM. The results show that PC-GIPMNN outperforms GIPMNN and DRM in terms of both the eigenvalue and the eigenfunction. Moreover, GIPMNN and PC-GIPMNN are more stable than DRM.

Although good results were obtained, some aspects still need to be examined in the future. First, given that the architecture of a neural network significantly influences the accuracy of our methods, it is important to design a universal network architecture to achieve high accuracy. For example, the MLP is widely used for solving PDEs with neural networks, and ResNet [60] is effective in improving the convergence rate and may even obtain better results than the MLP. Recently, a transformer [74] was used for operator learning and obtained better results. Wang et al. [62] investigated the effect of the MLP on operator learning and proposed a modified MLP [61], a new MLP architecture that improves accuracy. Furthermore, although GIPMNN and PC-GIPMNN are more stable than DRM, and PC-GIPMNN is more accurate than DRM, they require a large number of iterations and a long training time to achieve good results; therefore, improving convergence and reducing training time will be investigated in our future studies. In addition, the failure of DRM in the 2D eigenvalue experiments should be studied and clarified. Finally, for the interface problem, a suitable sampling algorithm could facilitate the training process and provide a better approximation. Future work could also involve implementing the proposed networks as core solvers in emerging reactor digital twins [75,76,77].