Support vector machine-based importance sampling for rare event estimation

  • Research Paper
  • Published in Structural and Multidisciplinary Optimization

Abstract

Structural reliability analysis aims at computing the failure probability with respect to a prescribed performance function. To estimate the structural failure probability efficiently, a novel two-stage meta-model importance sampling method based on the support vector machine (SVM) is proposed. First, a quasi-optimal importance sampling density function is approximated by an SVM model. To construct the SVM model, a multi-point enrichment algorithm that allows several training points to be added in each iteration is employed. The augmented failure probability and the quasi-optimal importance sampling samples are then obtained from the trained SVM model. Second, the current SVM model is further refined by selecting informative training points from the quasi-optimal importance sampling samples until it can accurately recognize the states of the samples, and the correction factor is estimated by the well-trained SVM model. Finally, the failure probability is obtained as the product of the augmented failure probability and the correction factor. The proposed method provides an algorithm to efficiently deal with multiple failure regions and rare events. Several examples illustrate the feasibility of the proposed method.

References

  • Alibrandi U, Alani AM, Ricciardi G (2015) A new sampling strategy for SVM-based response surface for structural reliability analysis. Probabilistic Eng Mech 41:1–12

  • Basudhar A (2015) Multi-objective optimization using adaptive explicit non-dominated region sampling. In: 11th world congress on structural and multidisciplinary optimization

  • Basudhar A, Missoum S (2007) Parallel update of failure domain boundaries constructed using support vector machines. In: 7th World Congress on Structural and Multidisciplinary Optimization, Seoul, Korea

  • Basudhar A, Missoum S (2008) Adaptive explicit decision functions for probabilistic design and optimization using support vector machines. Comput Struct 86(19–20):1904–1917

  • Basudhar A, Missoum S (2009) Local update of support vector machine decision boundaries. In: 50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference 17th AIAA/ASME/AHS Adaptive Structures Conference 11th AIAA No 2009 (p. 2189)

  • Basudhar A, Missoum S (2010) An improved adaptive sampling scheme for the construction of explicit boundaries. Struct Multidiscip Optim 42(4):517–529

  • Basudhar A, Missoum S, Sanchez AH (2007) Limit state function identification using support vector machines for discontinuous responses and disjoint failure domains. Probabilistic Eng Mech 23(1):1–11

  • Basudhar A, Dribusch C, Lacaze S, Missoum S (2012) Constrained efficient global optimization with support vector machines. Struct Multidiscip Optim 46(2):201–221

  • Basudhar A, Witowski K, Gandikota I (2020) Sequential optimization & probabilistic analysis using adaptively refined constraints in LS-OPT®. 16th International LS-DYNA® Users Conference

  • Bichon BJ, Eldred MS, Swiler LP, Mahadevan S, McFarland JM (2008) Efficient global reliability analysis for nonlinear implicit performance functions. AIAA J 46:2459–2468

  • Bourinet JM (2016) Rare-event probability estimation with adaptive support vector regression surrogates. Reliab Eng Syst Saf 150:210–221

  • Bourinet JM, Deheeger F, Lemaire M (2011) Assessing small failure probabilities by combined subset simulation and support vector machines. Struct Saf 33(6):343–353

  • Cadini F, Santos F, Zio E (2014) An improved adaptive kriging-based importance technique for sampling multiple failure regions of low probability. Reliab Eng Syst Saf 131:109–117

  • Cheng K, Lu ZZ (2018) Adaptive sparse polynomial chaos expansions for global sensitivity analysis based on support vector regression. Comput Struct 194:86–96

  • Cheng K, Lu ZZ, Zhou YC, Shi Y, Wei YH (2017) Global sensitivity analysis using support vector regression. Appl Math Model 49:587–598

  • Derennes P, Morio J, Simatos F (2019) A nonparametric importance sampling estimator for moment independent importance measures. Reliab Eng Syst Saf 187:3–16

  • Dubourg V, Sudret B, Deheeger F (2013) Metamodel-based importance sampling for structural reliability analysis. Probabilistic Eng Mech 33:47–57

  • Echard B, Gayton N, Lemaire M (2011) AK-MCS: an active learning reliability method combining Kriging and Monte Carlo simulation. Struct Saf 33:145–154

  • Echard B, Gayton N, Lemaire M, Relun N (2013) A combined importance sampling and Kriging reliability method for small failure probabilities with time-demanding numerical methods. Reliab Eng Syst Saf 111:232–240

  • He W, Zeng Y, Li G (2020) An adaptive polynomial chaos expansion for high-dimensional reliability analysis. Struct Multidiscip Optim. https://doi.org/10.1007/s00158-020-02594-4

  • Hurtado JE (2004) An examination of methods for approximating implicit limit state functions from the viewpoint of statistical learning theory. Struct Saf 26(3):271–293

  • Hurtado JE (2007) Filtered importance sampling with support vector margin: a powerful method for structural reliability analysis. Struct Saf 29(1):2–15

  • Lacaze S, Missoum S (2014) A generalized “max-min” sample for surrogate update. Struct Multidiscip Optim 49(4):683–687

  • Ling CY, Lu ZZ, Zhu XM (2019) Efficient methods by active learning Kriging coupled with variance reduction based sampling methods for time-dependent failure probability. Reliab Eng Syst Saf 188:23–35

  • Ling CY, Lu ZZ, Sun B, Wang MJ (2020) An efficient method combining active learning Kriging and Monte Carlo simulation for profust failure probability. Fuzzy Sets Syst 387:89–107

  • MacKay D (1992) Information-based objective functions for active data selection. Neural Comput 4(4):590–604

  • Misaka T (2020) Image-based fluid data assimilation with deep neural network. Struct Multidiscip Optim. https://doi.org/10.1007/s00158-020-02537-z

  • Pan QJ, Dias D (2017) An efficient reliability method combining adaptive support vector machine and Monte Carlo simulation. Struct Saf 67:85–95

  • Rocco CM, Moreno JA (2002) Fast Monte Carlo reliability evaluation using support vector machine. Reliab Eng Syst Saf 76(3):237–243

  • Song H, Choi KK, Lee I, Zhao L, Lamb D (2013) Adaptive virtual support vector machine for reliability analysis for high-dimensional problems. Struct Multidiscip Optim 47(4):479–491

  • Tharwat A (2019) Parameter investigation of support vector machine classifier with kernel functions. Knowl Inf Syst. https://doi.org/10.1007/s10115-019-01335-4

  • Vapnik VN (2000) The nature of statistical learning theory. Springer Verlag, New York

  • Vapnik VN (1998) Statistical learning theory. Wiley, New York

  • Wang ZQ, Wang PF (2014) A maximum confidence enhancement based sequential sampling scheme for simulation-based design. J Mech Des 136:021006–021001

  • Wang ZQ, Wang PF (2016) Accelerated failure identification sampling for probability analysis of rare events. Struct Multidiscip Optim 54(1):137–149

  • Xing J, Luo Y, Gao Z (2020) A global optimization strategy based on the Kriging surrogate model and parallel computing. Struct Multidiscip Optim 62:405–417

Funding

This work was supported by the National Natural Science Foundation of China (Grant no. NSFC 52075442), and National Science and Technology Major Project (Grant no. 2017-IV-0009-0046).

Author information

Corresponding author

Correspondence to Zhenzhou Lu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This manuscript is approved by all authors for publication. We declare that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part.

Replication of results

The MATLAB codes used to generate the results are available in the Supplementary information.

Additional information

Responsible Editor: Erdem Acar

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

ESM 1

(RAR 1 kb)

Appendices

Appendix 1. Support vector machine

This appendix reviews the SVM in detail, covering the linear SVM, the nonlinear SVM, and the imperfect (soft-margin) SVM.

1.1 Linear SVM

Given a two-class problem, suppose we have Nt sets of labeled training data \( \left\{{\mathbf{x}}_i^t,{y}_i^t\right\}\left(i=1,2,\cdots, {N}_t\right) \), where \( {\mathbf{x}}_i^t\in {R}^n \) is a training sample of inputs and \( {y}_i^t\in \left\{-1,+1\right\} \) is the label (sign) of \( {\mathbf{x}}_i^t \). SVM searches for an optimal hyperplane, i.e., a decision function (also called a separating function), for which all vectors labeled "− 1" lie on one side and all vectors labeled "+ 1" on the other side. The optimal hyperplane is the one with the largest distance to the nearest training samples of either class (maximum margin).

Consider a candidate hyperplane (the SVM decision boundary), given by (28), which divides the space into two half-spaces (as shown in Fig. 17): (1) the positive half-space, where the samples of the positive class (+ 1) are located, and (2) the negative half-space, where the samples of the negative class (− 1) are located (Tharwat 2019).

$$ H:\quad {\mathbf{w}}^{\mathrm{T}}\mathbf{x}+b=0 $$
(28)

in which the weight vector w is perpendicular to the hyperplane and b is a scalar bias (threshold) parameter.

Fig. 17 Decision hyperplane generated by a linear SVM

The goal of SVM training is to determine the values of w and b that orient the hyperplane as far as possible from the closest samples. Two hyperplanes (H1 and H2) parallel to the decision boundary H are defined as follows,

$$ \begin{cases} {H}_1:\ {\mathbf{w}}^{\mathrm{T}}\mathbf{x}+b=+1 \\ {H}_2:\ {\mathbf{w}}^{\mathrm{T}}\mathbf{x}+b=-1 \end{cases} $$
(29)

There are no data points between H1 and H2. Let d+ (d−) denote the shortest distance from the decision boundary to the closest positive (negative) point. Since points on H1 and H2 satisfy wTx + b = ± 1, their distance to H is 1/‖w‖, so d+ = d− = 1/‖w‖ and the margin, i.e., the distance between H1 and H2, is d+ + d− = 2/‖w‖.

All training points \( \left\{{\mathbf{x}}_i^t,{y}_i^t\right\}\left(i=1,2,\cdots, {N}_t\right) \) should satisfy the following constraints,

$$ {y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)\ge 1\kern0.5em $$
(30)

This constraint ensures that no sample lies inside the margin. Determining the optimal hyperplane with maximum margin therefore reduces to finding the pair of hyperplanes that gives the maximum margin,

$$ \begin{cases} \min\ \dfrac{1}{2}{\left\Vert \mathbf{w}\right\Vert}^2 \\ \mathrm{s.t.}\ \ {y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)\ge 1 \end{cases}\qquad \left(i=1,2,\cdots,{N}_t\right) $$
(31)

Introducing Lagrange multipliers αi ≥ 0(i = 1, 2, ⋯, Nt), a Lagrangian function for the above optimization problem can be defined,

$$ L\left(\mathbf{w},b,\boldsymbol{\upalpha}\right)=\frac{1}{2}{\mathbf{w}}^{\mathrm{T}}\mathbf{w}-\sum \limits_{i=1}^{N_t}\left[{\alpha}_i{y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)-{\alpha}_i\right] $$
(32)

Now, L(w, b, α) must be minimized with respect to w and b and maximized with respect to the multipliers αi ≥ 0 (i = 1, 2, ⋯, Nt). Setting the derivatives of L(w, b, α) with respect to w and b to zero gives the following conditions,

$$ \begin{cases} \mathbf{w}=\sum \limits_{i=1}^{N_t}{\alpha}_i{y}_i^t{\mathbf{x}}_i^t \\ \sum \limits_{i=1}^{N_t}{\alpha}_i{y}_i^t=0 \end{cases} $$
(33)

Substituting (33) into (32), the Wolfe dual formulation is obtained,

$$ {L}_D\left(\boldsymbol{\upalpha} \right)=\sum \limits_{i=1}^{N_t}{\alpha}_i-\frac{1}{2}\sum \limits_{i,j=1}^{N_t}{\alpha}_i{\alpha}_j{y}_i^t{y}_j^t{\left({\mathbf{x}}_i^t\right)}^{\mathrm{T}}{\mathbf{x}}_j^t $$
(34)

When the maximal margin hyperplanes (H1 and H2) are found, only those sample points which lie closest to the decision boundary (H) satisfy αi > 0; these points are termed support vectors (the "*" points in Fig. 17). That is, the Lagrange multipliers associated with the support vectors are positive, while all other samples have Lagrange multipliers equal to zero. An SVM model trained using only the support vectors is identical to the one obtained using all the data samples. Typically, the number of support vectors is much smaller than Nt (Basudhar and Missoum 2010).

The classification of an arbitrary point x is then predicted by the following function,

$$ s\left(\mathbf{x}\right)=s\left[\sum \limits_{j=1}^{N_{\mathrm{SV}}}{\alpha}_j^{\ast }{y}_j^{\ast }{\left({\mathbf{x}}_j^{\ast}\right)}^{\mathrm{T}}\mathbf{x}+b\right] $$
(35)

where s(⋅) is the sign function, \( {\mathbf{x}}_j^{\ast}\left(j=1,2,\cdots, {N}_{\mathrm{SV}}\right) \) are the NSV support vectors, \( {y}_j^{\ast} \) are their labels, and \( {\alpha}_j^{\ast} \) are the Lagrange multipliers corresponding to the support vectors.
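
To make (35) concrete, the following minimal sketch (not the authors' implementation; it assumes scikit-learn's SVC and a toy linearly separable data set purely for illustration) reconstructs the decision function from the support vectors and their dual coefficients \( {\alpha}_j^{\ast}{y}_j^{\ast} \) alone, and also reports the margin 2/‖w‖:

```python
# Minimal sketch: linear SVM prediction built only from the support vectors,
# as in (35). Assumes scikit-learn; the data set is an illustrative toy.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)            # linearly separable labels

svm = SVC(kernel="linear", C=1e6).fit(X, y)            # large C approximates a hard margin

# f(x) = sum_j alpha_j* y_j* (x_j*)^T x + b ; dual_coef_ stores alpha_j* y_j*
x_new = np.array([[0.3, -0.1]])
f_manual = svm.dual_coef_ @ (svm.support_vectors_ @ x_new.T) + svm.intercept_
print("manual sign:", np.sign(f_manual.ravel()), "library prediction:", svm.predict(x_new))
print("number of support vectors:", len(svm.support_vectors_), "out of", len(X), "samples")
print("margin 2/||w|| =", 2.0 / np.linalg.norm(svm.coef_))
```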

For the primal L(w, b, α), the Karush-Kuhn-Tucker (KKT) conditions are

$$ \begin{cases} \dfrac{\partial L\left(\mathbf{w},b,\boldsymbol{\upalpha}\right)}{\partial {w}_l}=0 \\ \dfrac{\partial L\left(\mathbf{w},b,\boldsymbol{\upalpha}\right)}{\partial b}=0 \\ {y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)\ge 1 \\ {\alpha}_i\left[{y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)-1\right]=0,\quad {\alpha}_i\ge 0 \end{cases}\qquad l=1,2,\cdots,n;\ \ i=1,2,\cdots,{N}_t $$
(36)

The KKT conditions are necessary and sufficient for w, b, and α to be optimal solutions. Note that, while w is explicitly determined by the training procedure, the threshold b is found from the KKT conditions.
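
As a concrete consequence of these conditions (a standard textbook step stated here for completeness, not quoted from the paper): for any support vector \( {\mathbf{x}}_j^{\ast} \) with \( {\alpha}_j^{\ast}>0 \), the complementarity condition forces the corresponding constraint to be active, \( {y}_j^{\ast}\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_j^{\ast}+b\right)=1 \), and since \( {\left({y}_j^{\ast}\right)}^2=1 \),

$$ b={y}_j^{\ast}-{\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_j^{\ast}={y}_j^{\ast}-\sum \limits_{i=1}^{N_{\mathrm{SV}}}{\alpha}_i^{\ast}{y}_i^{\ast}{\left({\mathbf{x}}_i^{\ast}\right)}^{\mathrm{T}}{\mathbf{x}}_j^{\ast} $$

In practice, b is often averaged over all support vectors for numerical stability.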

1.2 Nonlinear SVM

This section presents how SVM works in the case of nonlinearly separable samples. The main idea is to map the input data into a higher-dimensional feature space in which the problem becomes linearly separable, as shown in Fig. 18.

Fig. 18 A nonlinear separating region transformed into a linear one

Denoting the nonlinear mapping function by Φ(⋅), the dual Lagrangian in the higher-dimensional feature space is

$$ {L}_D\left(\boldsymbol{\upalpha}\right)=\sum \limits_{i=1}^{N_t}{\alpha}_i-\frac{1}{2}\sum \limits_{i,j=1}^{N_t}{\alpha}_i{\alpha}_j{y}_i^t{y}_j^t\,\varPhi \left({\mathbf{x}}_i^t\right)\cdot \varPhi \left({\mathbf{x}}_j^t\right) $$
(37)

Suppose \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right)=\varPhi \left({\mathbf{x}}_i^t\right)\cdot \varPhi \left({\mathbf{x}}_j^t\right) \), i.e., the dot product in the higher-dimensional feature space defines a kernel function on the input space. It is therefore not necessary to specify the mapping function Φ(⋅) explicitly, as long as the kernel function \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right) \) corresponds to a dot product in some higher-dimensional feature space.

There are many kernel functions that can be used, for example,

  1. (1)

    Linear kernel \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right)={\left({\mathbf{x}}_i^t\right)}^{\mathrm{T}}{\mathbf{x}}_j^t \)

  2. (2)

    Polynomial kernel \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right)={\left({\left({\mathbf{x}}_i^t\right)}^{\mathrm{T}}{\mathbf{x}}_j^t+1\right)}^q \)

  3. (3)

    Gaussian kernel \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right)=\exp \left(-\gamma {\left\Vert {\mathbf{x}}_i^t-{\mathbf{x}}_j^t\right\Vert}^2\right) \)

where q and γ are parameters to be selected. The linear kernel is suitable for linear classification, whereas the polynomial and Gaussian kernels are applicable to nonlinear classification. It should be noted that the nature of the optimization problem makes the SVM relatively insensitive to the choice of kernel: in empirical applications in very high-dimensional spaces, a large proportion of the support vectors determined with different kernels has been found to coincide (Hurtado 2004; Vapnik 2000).

The prediction of classification of a point x is then expressed as follows,

$$ s\left(\mathbf{x}\right)=s\left[\sum \limits_{j=1}^{N_{SV}}{\alpha}_j^{\ast }{y}_j^{\ast}\varPsi \left(\mathbf{x},{\mathbf{x}}_j^{\ast}\right)+b\right] $$
(38)
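
As an illustration of the kernel substitution in (37) and (38), the sketch below (an assumption-laden example: scikit-learn's SVC stands in for the SVM implementation, and the circular class boundary is a toy) reproduces the Gaussian-kernel decision function from the support vectors and dual coefficients alone; switching kernels changes only Ψ, not the rest of the machinery.

```python
# Minimal sketch of (38) with the Gaussian kernel Psi(x, x') = exp(-gamma ||x - x'||^2).
# Assumes scikit-learn; the data set is an illustrative toy.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.where(np.sum(X**2, axis=1) > 1.5, -1, 1)        # nonlinearly separable classes

gamma = 0.5
svm = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

def gaussian_kernel(a, b, gamma):
    """Pairwise Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d2)

# Evaluate (38) manually: s(x) = sign( sum_j alpha_j* y_j* Psi(x, x_j*) + b )
x_new = rng.normal(size=(5, 2))
K = gaussian_kernel(svm.support_vectors_, x_new, gamma)          # shape (N_SV, 5)
f_manual = (svm.dual_coef_ @ K + svm.intercept_).ravel()
print("manual and library classifications agree:",
      bool(np.all(np.sign(f_manual) == svm.predict(x_new))))
```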

1.3 Imperfect SVM

SVM can be extended to allow imperfect separation; that is, data points lying between H1 and H2 (or on the wrong side of the decision boundary) are penalized, and the penalty P is finite.

Introducing nonnegative slack variables ζi ≥ 0 such that

$$ \begin{cases} {\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\ge +1-{\zeta}_i & \mathrm{for}\ {y}_i^t=+1 \\ {\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\le -1+{\zeta}_i & \mathrm{for}\ {y}_i^t=-1 \end{cases} $$
(39)

and adding a penalty term to the objective function in (31), the problem is reformulated as

$$ \begin{cases} \min\ \left\{\dfrac{1}{2}{\left\Vert \mathbf{w}\right\Vert}^2+P\sum \limits_{i=1}^{N_t}{\zeta}_i\right\} \\ \mathrm{s.t.}\ \ {y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)-1+{\zeta}_i\ge 0 \end{cases}\qquad {\zeta}_i\ge 0,\ i=1,2,\cdots,{N}_t $$
(40)

Using Lagrange multipliers and the Wolfe dual formulation, the problem becomes (Rocco and Moreno 2002),

$$ \begin{cases} \max\ \ {L}_D\left(\boldsymbol{\upalpha}\right)=\sum \limits_{i=1}^{N_t}{\alpha}_i-\frac{1}{2}\sum \limits_{i,j=1}^{N_t}{\alpha}_i{\alpha}_j{y}_i^t{y}_j^t\,\varPhi \left({\mathbf{x}}_i^t\right)\cdot \varPhi \left({\mathbf{x}}_j^t\right) \\ \mathrm{s.t.}\ \ \ 0\le {\alpha}_i\le P,\quad \sum \limits_{i=1}^{N_t}{\alpha}_i{y}_i^t=0 \end{cases} $$
(41)

The only difference from the perfectly separable case is that the αi (i = 1, 2, ⋯, Nt) are now bounded above by P. The soft-margin parameter P, which permits misclassification, must be specified by the user. Increasing P enforces a stricter separation between classes: reducing P towards 0 makes misclassification less costly, whereas letting P tend to infinity allows no misclassification at all.

In summary, two parameters must be tuned by the user when applying SVM: the penalty P, which controls the trade-off between minimizing the training error and maximizing the classification margin, and the kernel parameter, which determines the distances between patterns in the transformed space, the dimension of that space, and the complexity of the classification model.
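
A common way to select these two parameters, sketched below under the assumption that scikit-learn is used (its penalty parameter C plays the role of P, gamma is the Gaussian kernel parameter, and the data set and grid values are purely illustrative), is a cross-validated grid search:

```python
# Minimal sketch: tuning the penalty (C, i.e., P) and the Gaussian kernel
# parameter gamma by 5-fold cross-validated grid search. Assumes scikit-learn.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2))
y = np.where(X[:, 0] ** 3 + X[:, 1] > 0.5, 1, -1)

param_grid = {
    "C": [0.1, 1.0, 10.0, 100.0],      # penalty P: larger values enforce stricter separation
    "gamma": [0.01, 0.1, 1.0, 10.0],   # Gaussian kernel parameter
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print("selected parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```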

Appendix 2. Calculation of the variation coefficient

The estimator in (18) is defined as the product of two unbiased independent estimators. The calculation of the variation coefficient of the final estimator \( \hat{P}\left\{F\right\}={\hat{P}}_{\varepsilon}\left\{F\right\}{\hat{\alpha}}_C \) proceeds as follows.

First of all, according to its definition, the variance reads

$$ \mathrm{Var}\left[\hat{P}\left\{F\right\}\right]=\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}{\hat{\alpha}}_C\right]=E\left[{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2\right]-{E}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}{\hat{\alpha}}_C\right] $$
(42)

Since the two estimators \( {\hat{P}}_{\varepsilon}\left\{F\right\} \) and \( {\hat{\alpha}}_C \) are independent, the variance also reads

$$ \mathrm{Var}\left[\hat{P}\left\{F\right\}\right]=E\left[{\hat{P}}_{\varepsilon}^2\left\{F\right\}\right]E\left[{\hat{\alpha}}_C^2\right]-{E}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]{E}^2\left[{\hat{\alpha}}_C\right] $$
(43)

According to the König-Huygens theorem, \( \mathrm{Var}\left[\hat{P}\left\{F\right\}\right] \) can be further expanded as

$$ \mathrm{Var}\left[\hat{P}\left\{F\right\}\right]=\left({E}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]+\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\right)\left({E}^2\left[{\hat{\alpha}}_C\right]+\mathrm{Var}\left[{\hat{\alpha}}_C\right]\right)-{E}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]{E}^2\left[{\hat{\alpha}}_C\right] $$
(44)

Taking advantage of the unbiasedness of the estimators, we can obtain the following result,

$$ \begin{aligned} \mathrm{Var}\left[\hat{P}\left\{F\right\}\right]&=\left({\hat{P}}_{\varepsilon}^2\left\{F\right\}+\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\right)\left({\hat{\alpha}}_C^2+\mathrm{Var}\left[{\hat{\alpha}}_C\right]\right)-{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2 \\ &=\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\mathrm{Var}\left[{\hat{\alpha}}_C\right]+{\hat{P}}_{\varepsilon}^2\left\{F\right\}\mathrm{Var}\left[{\hat{\alpha}}_C\right]+{\hat{\alpha}}_C^2\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right] \end{aligned} $$
(45)

Then, the variation coefficient of the estimator is expressed as follows,

$$ \begin{aligned} \delta \left[\hat{P}\left\{F\right\}\right]&=\frac{\sqrt{\mathrm{Var}\left[\hat{P}\left\{F\right\}\right]}}{\hat{P}\left\{F\right\}} \\ &=\frac{\sqrt{\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\mathrm{Var}\left[{\hat{\alpha}}_C\right]+{\hat{P}}_{\varepsilon}^2\left\{F\right\}\mathrm{Var}\left[{\hat{\alpha}}_C\right]+{\hat{\alpha}}_C^2\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]}}{{\hat{P}}_{\varepsilon}\left\{F\right\}{\hat{\alpha}}_C} \\ &=\sqrt{\frac{\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\mathrm{Var}\left[{\hat{\alpha}}_C\right]}{{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2}+\frac{{\hat{P}}_{\varepsilon}^2\left\{F\right\}\mathrm{Var}\left[{\hat{\alpha}}_C\right]}{{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2}+\frac{{\hat{\alpha}}_C^2\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]}{{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2}} \\ &=\sqrt{{\delta}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]{\delta}^2\left[{\hat{\alpha}}_C\right]+{\delta}^2\left[{\hat{\alpha}}_C\right]+{\delta}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]} \end{aligned} $$
(46)

In practice, the target variation coefficient is usually smaller than 10%, so that

$$ \delta \left[\hat{P}\left\{F\right\}\right]\approx \sqrt{\delta^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]+{\delta}^2\left[{\hat{\alpha}}_C\right]}\kern1.3em \mathrm{for}\kern0.6em \delta \left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\ll 1,\delta \left[{\hat{\alpha}}_C\right]\ll 1 $$
(47)
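
As a quick numerical illustration of (46) and (47) (the variation coefficients below are illustrative values, not results from the paper), the cross term \( {\delta}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]{\delta}^2\left[{\hat{\alpha}}_C\right] \) is indeed negligible when both variation coefficients are below 10%:

```python
# Minimal numerical check of (46) against the approximation (47).
# The two variation coefficients are illustrative, not taken from the paper.
import math

delta_P_eps = 0.04   # variation coefficient of the augmented failure probability estimator
delta_alpha = 0.05   # variation coefficient of the correction factor estimator

exact = math.sqrt(delta_P_eps**2 * delta_alpha**2 + delta_P_eps**2 + delta_alpha**2)
approx = math.sqrt(delta_P_eps**2 + delta_alpha**2)
print(f"exact  (46): {exact:.6f}")    # ~0.064062
print(f"approx (47): {approx:.6f}")   # ~0.064031
```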

Cite this article

Ling, C., Lu, Z. Support vector machine-based importance sampling for rare event estimation. Struct Multidisc Optim 63, 1609–1631 (2021). https://doi.org/10.1007/s00158-020-02809-8
