Support vector machine-based importance sampling for rare event estimation

  • Research Paper
  • Published in Structural and Multidisciplinary Optimization

Abstract

Structural reliability analysis aims at computing the failure probability with respect to a prescribed performance function. To estimate the structural failure probability efficiently, a novel two-stage meta-model importance sampling method based on the support vector machine (SVM) is proposed. First, a quasi-optimal importance sampling density function is approximated by an SVM model. To construct the SVM model, a multi-point enrichment algorithm that allows several training points to be added in each iteration is employed. The augmented failure probability and the quasi-optimal importance sampling samples are then obtained from the trained SVM model. Second, the current SVM model is further refined by selecting informative training points from the quasi-optimal importance sampling samples until it can accurately recognize the states of the samples, and the correction factor is estimated by the well-trained SVM model. Finally, the failure probability is obtained as the product of the augmented failure probability and the correction factor. The proposed method provides an algorithm to efficiently deal with multiple failure regions and rare events. Several examples illustrate the feasibility of the proposed method.

References

  • Alibrandi U, Alani AM, Ricciardi G (2015) A new sampling strategy for SVM-based response surface for structural reliability analysis. Probabilistic Eng Mech 41:1–12

  • Basudhar A (2015) Multi-objective optimization using adaptive explicit non-dominated region sampling. In: 11th world congress on structural and multidisciplinary optimization

  • Basudhar A, Missoum S (2007) Parallel update of failure domain boundaries constructed using support vector machines. In: 7th World Congress on Structural and Multidisciplinary Optimization, Seoul, Korea

  • Basudhar A, Missoum S (2008) Adaptive explicit decision functions for probabilistic design and optimization using support vector machines. Comput Struct 86(19–20):1904–1917

  • Basudhar A, Missoum S (2009) Local update of support vector machine decision boundaries. In: 50th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics, and Materials Conference 17th AIAA/ASME/AHS Adaptive Structures Conference 11th AIAA No 2009 (p. 2189)

  • Basudhar A, Missoum S (2010) An improved adaptive sampling scheme for the construction of explicit boundaries. Struct Multidiscip Optim 42(4):517–529

  • Basudhar A, Missoum S, Sanchez AH (2007) Limit state function identification using support vector machines for discontinuous responses and disjoint failure domains. Probabilistic Eng Mech 23(1):1–11

  • Basudhar A, Dribusch C, Lacaze S, Missoum S (2012) Constrained efficient global optimization with support vector machines. Struct Multidiscip Optim 46(2):201–221

  • Basudhar A, Witowski K, Gandikota I (2020) Sequential optimization & probabilistic analysis using adaptively refined constraints in LS-OPT®. 16th International LS-DYNA® Users Conference

  • Bichon BJ, Eldred MS, Swiler LP, Mahadevan S, McFarland JM (2008) Efficient global reliability analysis for nonlinear implicit performance functions. AIAA J 46:2459–2468

  • Bourinet JM (2016) Rare-event probability estimation with adaptive support vector regression surrogates. Reliab Eng Syst Saf 150:210–221

  • Bourinet JM, Deheeger F, Lemaire M (2011) Assessing small failure probabilities by combined subset simulation and support vector machines. Struct Saf 33(6):343–353

  • Cadini F, Santos F, Zio E (2014) An improved adaptive kriging-based importance technique for sampling multiple failure regions of low probability. Reliab Eng Syst Saf 131:109–117

  • Cheng K, Lu ZZ (2018) Adaptive sparse polynomial chaos expansions for global sensitivity analysis based on support vector regression. Comput Struct 194:86–96

  • Cheng K, Lu ZZ, Zhou YC, Shi Y, Wei YH (2017) Global sensitivity analysis using support vector regression. Appl Math Model 49:587–598

  • Derennes P, Morio J, Simatos F (2019) A nonparametric importance sampling estimator for moment independent importance measures. Reliab Eng Syst Saf 187:3–16

  • Dubourg V, Sudret B, Deheeger F (2013) Metamodel-based importance sampling for structural reliability analysis. Probabilistic Eng Mech 33:47–57

  • Echard B, Gayton N, Lemaire M (2011) AK-MCS: an active learning reliability method combining Kriging and Monte Carlo simulation. Struct Saf 33:145–154

  • Echard B, Gayton N, Lemaire M, Relun N (2013) A combined importance sampling and Kriging reliability method for small failure probabilities with time-demanding numerical methods. Reliab Eng Syst Saf 111:232–240

  • He W, Zeng Y, Li G (2020) An adaptive polynomial chaos expansion for high-dimensional reliability analysis. Struct Multidiscip Optim. https://doi.org/10.1007/s00158-020-02594-4

  • Hurtado JE (2004) An examination of methods for approximating implicit limit state functions from the viewpoint of statistical learning theory. Struct Saf 26(3):271–293

  • Hurtado JE (2007) Filtered importance sampling with support vector margin: a powerful method for structural reliability analysis. Struct Saf 29(1):2–15

  • Lacaze S, Missoum S (2014) A generalized “max-min” sample for surrogate update. Struct Multidiscip Optim 49(4):683–687

  • Ling CY, Lu ZZ, Zhu XM (2019) Efficient methods by active learning Kriging coupled with variance reduction based sampling methods for time-dependent failure probability. Reliab Eng Syst Saf 188:23–35

  • Ling CY, Lu ZZ, Sun B, Wang MJ (2020) An efficient method combining active learning Kriging and Monte Carlo simulation for profust failure probability. Fuzzy Sets Syst 387:89–107

  • MacKay D (1992) Information-based objective functions for active data selection. Neural Comput 4(4):590–604

  • Misaka T (2020) Image-based fluid data assimilation with deep neural network. Struct Multidiscip Optim. https://doi.org/10.1007/s00158-020-02537-z

  • Pan QJ, Dias D (2017) An efficient reliability method combining adaptive support vector machine and Monte Carlo simulation. Struct Saf 67:85–95

  • Rocco CM, Moreno JA (2002) Fast Monte Carlo reliability evaluation using support vector machine. Reliab Eng Syst Saf 76(3):237–243

  • Song H, Choi KK, Lee I, Zhao L, Lamb D (2013) Adaptive virtual support vector machine for reliability analysis for high-dimensional problems. Struct Multidiscip Optim 47(4):479–491

  • Tharwat A (2019) Parameter investigation of support vector machine classifier with kernel functions. Knowl Inf Syst. https://doi.org/10.1007/s10115-019-01335-4

  • Vapnik VN (2000) The nature of statistical learning theory. Springer Verlag, New York

  • Vapnik VN (1998) Statistical learning theory. Wiley, New York

  • Wang ZQ, Wang PF (2014) A maximum confidence enhancement based sequential sampling scheme for simulation-based design. J Mech Des 136:021006–021001

  • Wang ZQ, Wang PF (2016) Accelerated failure identification sampling for probability analysis of rare events. Struct Multidiscip Optim 54(1):137–149

  • Xing J, Luo Y, Gao Z (2020) A global optimization strategy based on the Kriging surrogate model and parallel computing. Struct Multidiscip Optim 62:405–417

Funding

This work was supported by the National Natural Science Foundation of China (Grant no. NSFC 52075442), and National Science and Technology Major Project (Grant no. 2017-IV-0009-0046).

Author information

Corresponding author

Correspondence to Zhenzhou Lu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This manuscript is approved by all authors for publication. We declare that the work described is original research that has not been published previously and is not under consideration for publication elsewhere, in whole or in part.

Replication of results

The MATLAB codes used to generate the results are available in the Supplementary information.

Additional information

Responsible Editor: Erdem Acar

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

ESM 1

(RAR 1 kb)

Appendices

Appendix 1. Support vector machine

This appendix reviews the SVM in detail, covering the linear SVM, the nonlinear SVM, and the imperfect (soft-margin) SVM.

1.1 Linear SVM

Given a two-class problem, suppose we have Nt sets of labeled training data \( \left\{{\mathbf{x}}_i^t,{y}_i^t\right\}\left(i=1,2,\cdots, {N}_t\right) \), where \( {\mathbf{x}}_i^t\in {R}^n \) is a training sample of inputs and \( {y}_i^t\in \left\{-1,+1\right\} \) is the label (sign) of \( {\mathbf{x}}_i^t \). SVM searches for an optimal hyperplane, i.e., a decision function (also called a separating function), for which all vectors labeled "− 1" lie on one side and all vectors labeled "+ 1" on the other side. The optimal hyperplane is the one with the largest distance to the nearest training samples of either class (maximum margin).

Consider a candidate hyperplane (the SVM decision boundary), given by (28), which divides the space into two half-spaces (as shown in Fig. 17): (1) the positive half-space, where the samples of the positive class (+ 1) are located, and (2) the negative half-space, where the samples of the negative class (− 1) are located (Tharwat 2019).

$$ H:\quad {\mathbf{w}}^{\mathrm{T}}\mathbf{x}+b=0 $$
(28)

in which the weight vector w is perpendicular to the hyperplane and b is a scalar bias (threshold) parameter.

Fig. 17 Decision hyperplane generated by a linear SVM

The goal of SVM training is to determine the values of w and b that orient the hyperplane as far as possible from the closest samples. Two hyperplanes (H1 and H2) parallel to the decision boundary H are defined as follows,

$$ \begin{cases} {H}_1:\ {\mathbf{w}}^{\mathrm{T}}\mathbf{x}+b=+1 \\ {H}_2:\ {\mathbf{w}}^{\mathrm{T}}\mathbf{x}+b=-1 \end{cases} $$
(29)

There are no data points between H1 and H2. Let d+ (d−) denote the shortest distance from the decision boundary to the closest positive (negative) point. Since points on H1 and H2 satisfy wTx + b = ± 1, their distance to H is 1/‖w‖, so d+ = d− = 1/‖w‖ and the margin, i.e., the distance between H1 and H2, is d+ + d− = 2/‖w‖.

All training points \( \left\{{\mathbf{x}}_i^t,{y}_i^t\right\}\left(i=1,2,\cdots, {N}_t\right) \) should satisfy the following constraints,

$$ {y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)\ge 1\kern0.5em $$
(30)

This constraint ensures that no sample lies inside the margin. Determining the optimal hyperplane with maximum margin therefore reduces to finding the pair of hyperplanes that gives the maximum margin,

$$ \begin{cases} \min\ \dfrac{1}{2}{\left\Vert \mathbf{w}\right\Vert}^2 \\ \mathrm{s.t.}\ \ {y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)\ge 1 \end{cases}\qquad \left(i=1,2,\cdots,{N}_t\right) $$
(31)

Introducing Lagrange multipliers αi ≥ 0(i = 1, 2, ⋯, Nt), a Lagrangian function for the above optimization problem can be defined,

$$ L\left(\mathbf{w},b,\boldsymbol{\upalpha}\right)=\frac{1}{2}{\mathbf{w}}^{\mathrm{T}}\mathbf{w}-\sum \limits_{i=1}^{N_t}\left[{\alpha}_i{y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)-{\alpha}_i\right] $$
(32)

Now, L(w, b, α) must be minimized with respect to w and b and maximized with respect to the multipliers αi ≥ 0 (i = 1, 2, ⋯, Nt). Setting the derivatives of L(w, b, α) with respect to w and b to zero gives the following conditions,

$$ \begin{cases} \mathbf{w}=\sum \limits_{i=1}^{N_t}{\alpha}_i{y}_i^t{\mathbf{x}}_i^t \\ \sum \limits_{i=1}^{N_t}{\alpha}_i{y}_i^t=0 \end{cases} $$
(33)

Substituting (33) into (32), the Wolfe dual formulation is obtained,

$$ {L}_D\left(\boldsymbol{\upalpha} \right)=\sum \limits_{i=1}^{N_t}{\alpha}_i-\frac{1}{2}\sum \limits_{i,j=1}^{N_t}{\alpha}_i{\alpha}_j{y}_i^t{y}_j^t{\left({\mathbf{x}}_i^t\right)}^{\mathrm{T}}{\mathbf{x}}_j^t $$
(34)

When the maximal margin hyperplanes (H1 and H2) are found, only those sample points which lie closest to the decision boundary (H) satisfy αi > 0; these points are termed support vectors (the "*" points in Fig. 17). That is, the Lagrange multipliers associated with the support vectors are positive, while all other samples have Lagrange multipliers equal to zero. An SVM model trained using only the support vectors is identical to the one obtained using all the data samples. Typically, the number of support vectors is much smaller than Nt (Basudhar and Missoum 2010).

The classification of an arbitrary point x is then predicted by the following function,

$$ s\left(\mathbf{x}\right)=s\left[\sum \limits_{j=1}^{N_{\mathrm{SV}}}{\alpha}_j^{\ast }{y}_j^{\ast }{\left({\mathbf{x}}_j^{\ast}\right)}^{\mathrm{T}}\mathbf{x}+b\right] $$
(35)

where s(⋅) is the sign function, \( {\mathbf{x}}_j^{\ast}\left(j=1,2,\cdots, {N}_{\mathrm{SV}}\right) \) are the NSV support vectors, \( {y}_j^{\ast} \) are their labels, and \( {\alpha}_j^{\ast} \) are the Lagrange multipliers corresponding to the support vectors.
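
To make (35) concrete, the following minimal sketch (not the authors' implementation; it assumes scikit-learn's SVC and a toy linearly separable data set purely for illustration) reconstructs the decision function from the support vectors and their dual coefficients \( {\alpha}_j^{\ast}{y}_j^{\ast} \) alone, and also reports the margin 2/‖w‖:

```python
# Minimal sketch: linear SVM prediction built only from the support vectors,
# as in (35). Assumes scikit-learn; the data set is an illustrative toy.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)            # linearly separable labels

svm = SVC(kernel="linear", C=1e6).fit(X, y)            # large C approximates a hard margin

# f(x) = sum_j alpha_j* y_j* (x_j*)^T x + b ; dual_coef_ stores alpha_j* y_j*
x_new = np.array([[0.3, -0.1]])
f_manual = svm.dual_coef_ @ (svm.support_vectors_ @ x_new.T) + svm.intercept_
print("manual sign:", np.sign(f_manual.ravel()), "library prediction:", svm.predict(x_new))
print("number of support vectors:", len(svm.support_vectors_), "out of", len(X), "samples")
print("margin 2/||w|| =", 2.0 / np.linalg.norm(svm.coef_))
```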

For the primal L(w, b, α), the Karush-Kuhn-Tucker (KKT) conditions are

$$ \begin{cases} \dfrac{\partial L\left(\mathbf{w},b,\boldsymbol{\upalpha}\right)}{\partial {w}_l}=0 \\ \dfrac{\partial L\left(\mathbf{w},b,\boldsymbol{\upalpha}\right)}{\partial b}=0 \\ {y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)\ge 1 \\ {\alpha}_i\left[{y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)-1\right]=0,\quad {\alpha}_i\ge 0 \end{cases}\qquad l=1,2,\cdots,n;\ \ i=1,2,\cdots,{N}_t $$
(36)

The KKT conditions are necessary and sufficient for w, b, and α to be optimal solutions. Note that, while w is explicitly determined by the training procedure, the threshold b is found from the KKT conditions.
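
As a concrete consequence of these conditions (a standard textbook step stated here for completeness, not quoted from the paper): for any support vector \( {\mathbf{x}}_j^{\ast} \) with \( {\alpha}_j^{\ast}>0 \), the complementarity condition forces the corresponding constraint to be active, \( {y}_j^{\ast}\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_j^{\ast}+b\right)=1 \), and since \( {\left({y}_j^{\ast}\right)}^2=1 \),

$$ b={y}_j^{\ast}-{\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_j^{\ast}={y}_j^{\ast}-\sum \limits_{i=1}^{N_{\mathrm{SV}}}{\alpha}_i^{\ast}{y}_i^{\ast}{\left({\mathbf{x}}_i^{\ast}\right)}^{\mathrm{T}}{\mathbf{x}}_j^{\ast} $$

In practice, b is often averaged over all support vectors for numerical stability.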

1.2 Nonlinear SVM

This section presents how SVM works in the case of nonlinearly separable samples. The main idea is to map the input data into a higher-dimensional feature space in which the problem becomes linearly separable, as shown in Fig. 18.

Fig. 18 A nonlinear separating region transformed into a linear one

Denoting the nonlinear mapping function by Φ(⋅), the dual Lagrangian in the higher-dimensional feature space is

$$ {L}_D\left(\boldsymbol{\upalpha}\right)=\sum \limits_{i=1}^{N_t}{\alpha}_i-\frac{1}{2}\sum \limits_{i,j=1}^{N_t}{\alpha}_i{\alpha}_j{y}_i^t{y}_j^t\,\varPhi \left({\mathbf{x}}_i^t\right)\cdot \varPhi \left({\mathbf{x}}_j^t\right) $$
(37)

Suppose \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right)=\varPhi \left({\mathbf{x}}_i^t\right)\cdot \varPhi \left({\mathbf{x}}_j^t\right) \), i.e., the dot product in the higher-dimensional feature space defines a kernel function on the input space. It is therefore not necessary to specify the mapping function Φ(⋅) explicitly, as long as the kernel function \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right) \) corresponds to a dot product in some higher-dimensional feature space.

There are many kernel functions that can be used, for example,

  1. (1)

    Linear kernel \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right)={\left({\mathbf{x}}_i^t\right)}^{\mathrm{T}}{\mathbf{x}}_j^t \)

  2. (2)

    Polynomial kernel \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right)={\left({\left({\mathbf{x}}_i^t\right)}^{\mathrm{T}}{\mathbf{x}}_j^t+1\right)}^q \)

  3. (3)

    Gaussian kernel \( \varPsi \left({\mathbf{x}}_i^t,{\mathbf{x}}_j^t\right)=\exp \left(-\gamma {\left\Vert {\mathbf{x}}_i^t-{\mathbf{x}}_j^t\right\Vert}^2\right) \)

where q and γ are parameters to be selected. The linear kernel is suitable for linear classification, whereas the polynomial and Gaussian kernels are applicable to nonlinear classification. It should be noted that the nature of the optimization problem makes the SVM relatively insensitive to the choice of kernel: in empirical applications in very high-dimensional spaces, a large proportion of the support vectors determined with different kernels has been found to coincide (Hurtado 2004; Vapnik 2000).

The prediction of classification of a point x is then expressed as follows,

$$ s\left(\mathbf{x}\right)=s\left[\sum \limits_{j=1}^{N_{SV}}{\alpha}_j^{\ast }{y}_j^{\ast}\varPsi \left(\mathbf{x},{\mathbf{x}}_j^{\ast}\right)+b\right] $$
(38)
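
As an illustration of the kernel substitution in (37) and (38), the sketch below (an assumption-laden example: scikit-learn's SVC stands in for the SVM implementation, and the circular class boundary is a toy) reproduces the Gaussian-kernel decision function from the support vectors and dual coefficients alone; switching kernels changes only Ψ, not the rest of the machinery.

```python
# Minimal sketch of (38) with the Gaussian kernel Psi(x, x') = exp(-gamma ||x - x'||^2).
# Assumes scikit-learn; the data set is an illustrative toy.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.where(np.sum(X**2, axis=1) > 1.5, -1, 1)        # nonlinearly separable classes

gamma = 0.5
svm = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)

def gaussian_kernel(a, b, gamma):
    """Pairwise Gaussian kernel matrix between the rows of a and the rows of b."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * d2)

# Evaluate (38) manually: s(x) = sign( sum_j alpha_j* y_j* Psi(x, x_j*) + b )
x_new = rng.normal(size=(5, 2))
K = gaussian_kernel(svm.support_vectors_, x_new, gamma)          # shape (N_SV, 5)
f_manual = (svm.dual_coef_ @ K + svm.intercept_).ravel()
print("manual and library classifications agree:",
      bool(np.all(np.sign(f_manual) == svm.predict(x_new))))
```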

1.3 Imperfect SVM

SVM can be extended to allow imperfect separation; that is, data points lying between H1 and H2 (or on the wrong side of the decision boundary) are penalized, and the penalty P is finite.

Introducing nonnegative slack variables ζi ≥ 0 such that

$$ \begin{cases} {\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\ge +1-{\zeta}_i & \mathrm{for}\ {y}_i^t=+1 \\ {\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\le -1+{\zeta}_i & \mathrm{for}\ {y}_i^t=-1 \end{cases} $$
(39)

and adding a penalty term to the objective function in (31), the problem is reformulated as

$$ \begin{cases} \min\ \left\{\dfrac{1}{2}{\left\Vert \mathbf{w}\right\Vert}^2+P\sum \limits_{i=1}^{N_t}{\zeta}_i\right\} \\ \mathrm{s.t.}\ \ {y}_i^t\left({\mathbf{w}}^{\mathrm{T}}{\mathbf{x}}_i^t+b\right)-1+{\zeta}_i\ge 0 \end{cases}\qquad {\zeta}_i\ge 0,\ i=1,2,\cdots,{N}_t $$
(40)

Using Lagrange multipliers and the Wolfe dual formulation, the problem becomes (Rocco and Moreno 2002),

$$ \begin{cases} \max\ \ {L}_D\left(\boldsymbol{\upalpha}\right)=\sum \limits_{i=1}^{N_t}{\alpha}_i-\frac{1}{2}\sum \limits_{i,j=1}^{N_t}{\alpha}_i{\alpha}_j{y}_i^t{y}_j^t\,\varPhi \left({\mathbf{x}}_i^t\right)\cdot \varPhi \left({\mathbf{x}}_j^t\right) \\ \mathrm{s.t.}\ \ \ 0\le {\alpha}_i\le P,\quad \sum \limits_{i=1}^{N_t}{\alpha}_i{y}_i^t=0 \end{cases} $$
(41)

The only difference from the perfectly separable case is that the αi (i = 1, 2, ⋯, Nt) are now bounded above by P. The soft-margin parameter P, which permits misclassification, must be specified by the user. Increasing P enforces a stricter separation between classes: reducing P towards 0 makes misclassification less costly, whereas letting P tend to infinity allows no misclassification at all.

In summary, two parameters must be tuned by the user when applying SVM: the penalty P, which controls the trade-off between minimizing the training error and maximizing the classification margin, and the kernel parameter, which determines the distances between patterns in the transformed space, the dimension of that space, and the complexity of the classification model.
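
A common way to select these two parameters, sketched below under the assumption that scikit-learn is used (its penalty parameter C plays the role of P, gamma is the Gaussian kernel parameter, and the data set and grid values are purely illustrative), is a cross-validated grid search:

```python
# Minimal sketch: tuning the penalty (C, i.e., P) and the Gaussian kernel
# parameter gamma by 5-fold cross-validated grid search. Assumes scikit-learn.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 2))
y = np.where(X[:, 0] ** 3 + X[:, 1] > 0.5, 1, -1)

param_grid = {
    "C": [0.1, 1.0, 10.0, 100.0],      # penalty P: larger values enforce stricter separation
    "gamma": [0.01, 0.1, 1.0, 10.0],   # Gaussian kernel parameter
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print("selected parameters:", search.best_params_)
print("cross-validated accuracy:", round(search.best_score_, 3))
```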

Appendix 2. Calculation of the variation coefficient

The estimator in (18) is defined as the product of two unbiased independent estimators. The calculation of the variation coefficient of the final estimator \( \hat{P}\left\{F\right\}={\hat{P}}_{\varepsilon}\left\{F\right\}{\hat{\alpha}}_C \) proceeds as follows.

First of all, according to its definition, the variance reads

$$ \mathrm{Var}\left[\hat{P}\left\{F\right\}\right]=\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}{\hat{\alpha}}_C\right]=E\left[{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2\right]-{E}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}{\hat{\alpha}}_C\right] $$
(42)

Since the two estimators \( {\hat{P}}_{\varepsilon}\left\{F\right\} \) and \( {\hat{\alpha}}_C \) are independent, the variance also reads

$$ \mathrm{Var}\left[\hat{P}\left\{F\right\}\right]=E\left[{\hat{P}}_{\varepsilon}^2\left\{F\right\}\right]E\left[{\hat{\alpha}}_C^2\right]-{E}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]{E}^2\left[{\hat{\alpha}}_C\right] $$
(43)

According to the König-Huygens theorem, \( \mathrm{Var}\left[\hat{P}\left\{F\right\}\right] \) can be further expanded as

$$ \mathrm{Var}\left[\hat{P}\left\{F\right\}\right]=\left({E}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]+\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\right)\left({E}^2\left[{\hat{\alpha}}_C\right]+\mathrm{Var}\left[{\hat{\alpha}}_C\right]\right)-{E}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]{E}^2\left[{\hat{\alpha}}_C\right] $$
(44)

Taking advantage of the unbiasedness of the estimators, we can obtain the following result,

$$ \begin{aligned} \mathrm{Var}\left[\hat{P}\left\{F\right\}\right]&=\left({\hat{P}}_{\varepsilon}^2\left\{F\right\}+\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\right)\left({\hat{\alpha}}_C^2+\mathrm{Var}\left[{\hat{\alpha}}_C\right]\right)-{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2 \\ &=\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\mathrm{Var}\left[{\hat{\alpha}}_C\right]+{\hat{P}}_{\varepsilon}^2\left\{F\right\}\mathrm{Var}\left[{\hat{\alpha}}_C\right]+{\hat{\alpha}}_C^2\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right] \end{aligned} $$
(45)

Then, the variation coefficient of the estimator is expressed as follows,

$$ \begin{aligned} \delta \left[\hat{P}\left\{F\right\}\right]&=\frac{\sqrt{\mathrm{Var}\left[\hat{P}\left\{F\right\}\right]}}{\hat{P}\left\{F\right\}} \\ &=\frac{\sqrt{\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\mathrm{Var}\left[{\hat{\alpha}}_C\right]+{\hat{P}}_{\varepsilon}^2\left\{F\right\}\mathrm{Var}\left[{\hat{\alpha}}_C\right]+{\hat{\alpha}}_C^2\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]}}{{\hat{P}}_{\varepsilon}\left\{F\right\}{\hat{\alpha}}_C} \\ &=\sqrt{\frac{\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\mathrm{Var}\left[{\hat{\alpha}}_C\right]}{{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2}+\frac{{\hat{P}}_{\varepsilon}^2\left\{F\right\}\mathrm{Var}\left[{\hat{\alpha}}_C\right]}{{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2}+\frac{{\hat{\alpha}}_C^2\mathrm{Var}\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]}{{\hat{P}}_{\varepsilon}^2\left\{F\right\}{\hat{\alpha}}_C^2}} \\ &=\sqrt{{\delta}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]{\delta}^2\left[{\hat{\alpha}}_C\right]+{\delta}^2\left[{\hat{\alpha}}_C\right]+{\delta}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]} \end{aligned} $$
(46)

In practice, the target variation coefficient is usually smaller than 10%, so that

$$ \delta \left[\hat{P}\left\{F\right\}\right]\approx \sqrt{\delta^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]+{\delta}^2\left[{\hat{\alpha}}_C\right]}\kern1.3em \mathrm{for}\kern0.6em \delta \left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]\ll 1,\delta \left[{\hat{\alpha}}_C\right]\ll 1 $$
(47)
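
As a quick numerical illustration of (46) and (47) (the variation coefficients below are illustrative values, not results from the paper), the cross term \( {\delta}^2\left[{\hat{P}}_{\varepsilon}\left\{F\right\}\right]{\delta}^2\left[{\hat{\alpha}}_C\right] \) is indeed negligible when both variation coefficients are below 10%:

```python
# Minimal numerical check of (46) against the approximation (47).
# The two variation coefficients are illustrative, not taken from the paper.
import math

delta_P_eps = 0.04   # variation coefficient of the augmented failure probability estimator
delta_alpha = 0.05   # variation coefficient of the correction factor estimator

exact = math.sqrt(delta_P_eps**2 * delta_alpha**2 + delta_P_eps**2 + delta_alpha**2)
approx = math.sqrt(delta_P_eps**2 + delta_alpha**2)
print(f"exact  (46): {exact:.6f}")    # ~0.064062
print(f"approx (47): {approx:.6f}")   # ~0.064031
```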

Cite this article

Ling, C., Lu, Z. Support vector machine-based importance sampling for rare event estimation. Struct Multidisc Optim 63, 1609–1631 (2021). https://doi.org/10.1007/s00158-020-02809-8
