Estimation of conditional distribution functions from data with additional errors applied to shape optimization

We study the problem of estimating conditional distribution functions (codfs) from data containing additional errors. The only assumption on these errors is that a weighted sum of the absolute errors tends to zero with probability one as the sample size tends to infinity. We prove sufficient conditions on the weights (fulfilled, e.g., by kernel weights) of a local averaging estimate of the codf, based on data with errors, which ensure strong pointwise consistency (spc). We show that two of the three sufficient conditions on the weights and a weaker version of the third one are also necessary for the spc. We also give sufficient conditions on the weights which ensure a certain rate of convergence. As an application, we estimate the codf of the number of cycles until failure based on data from experimental fatigue tests and use it as the objective function in a shape optimization of a component.


Introduction
Let (X, Y) be a random vector such that X is $\mathbb{R}^d$-valued and Y is real-valued, with conditional distribution function (codf) F, i.e.,
$$F(y, x) = \mathbf{P}(Y \le y \mid X = x) \quad (y \in \mathbb{R},\ x \in \mathbb{R}^d).$$
One idea to construct estimates that approach the codf asymptotically for some fixed $y \in \mathbb{R}$ and $P_X$-almost all $x \in \mathbb{R}^d$ (where $P_X$ is the measure induced by X on $(\mathbb{R}^d, \mathcal{B}^d)$, i.e., $P_X(B) = \mathbf{P}(X \in B)$ for every $B \in \mathcal{B}^d$) is to use an independent and identically distributed (i.i.d.) sample $(X_1, Y_1), \dots, (X_n, Y_n)$ of (X, Y) to compute a local averaging estimate of the codf
$$F_n(y, x) = \sum_{i=1}^n W_{n,i}(x) \cdot I_{\{Y_i \le y\}}. \quad (1)$$
Here $W_{n,i}(x)$ for $i = 1, \dots, n$ are nonnegative weights, which can depend on the samples $X_1, \dots, X_n$. A commonly used example for those weights are the weights of the so-called kernel estimate, which are defined by
$$W_{n,i}(x) = \frac{K\big((x - X_i)/h_n\big)}{\sum_{j=1}^n K\big((x - X_j)/h_n\big)}, \quad (2)$$
where 0/0 = 0 by definition (cf., e.g., Nadaraya (1964) and Watson (1964)). Here $h_n > 0$ is the so-called bandwidth and $K: \mathbb{R}^d \to \mathbb{R}$ is a so-called kernel function, e.g., the so-called naive kernel defined by $K(u) = I_{\{\|u\| \le 1\}}$. For a fixed $y \in \mathbb{R}$, the estimate introduced in (1) is a special case of an estimate of a regression function $m(x) = \mathbf{E}\{Y \mid X = x\}$ (with $I_{\{Y \le y\}}$ as dependent variable). Thus, all known results on estimates of the regression function also apply to the corresponding estimates of the codf. The regression estimate corresponding to the codf estimate in (1), i.e.,
$$m_n(x) = \sum_{i=1}^n W_{n,i}(x) \cdot Y_i,$$
was first considered in the seminal paper of Stone (1977). In particular, in his Theorem 1, Stone gave necessary and sufficient conditions on the weights such that the regression estimate $m_n$ is weakly consistent in $L_r$ for every real $r \ge 1$, i.e.,
$$\mathbf{E} \int |m_n(x) - m(x)|^r \, P_X(dx) \to 0 \quad (n \to \infty).$$
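The local averaging estimate (1) with the kernel weights (2) is straightforward to compute. The following Python sketch is a minimal, unoptimized illustration (O(n) work per evaluation point) assuming the naive kernel; the function names are hypothetical and not taken from the paper. Passing erroneous responses $\bar Y_{i,n}$ instead of $Y_i$ gives the estimate used later for data with errors.

```python
from typing import List, Sequence


def naive_kernel(u: Sequence[float]) -> float:
    """Naive kernel K(u) = I{||u|| <= 1} with the Euclidean norm."""
    return 1.0 if sum(c * c for c in u) <= 1.0 else 0.0


def kernel_weights(x: Sequence[float], xs: Sequence[Sequence[float]], h: float) -> List[float]:
    """Kernel weights W_{n,i}(x) as in (2); all weights are 0 if the denominator vanishes (0/0 = 0)."""
    k = [naive_kernel([(a - b) / h for a, b in zip(x, xi)]) for xi in xs]
    total = sum(k)
    if total == 0.0:
        return [0.0] * len(xs)
    return [ki / total for ki in k]


def codf_estimate(y: float, x: Sequence[float], xs: Sequence[Sequence[float]],
                  ys: Sequence[float], h: float) -> float:
    """Local averaging estimate (1): F_n(y, x) = sum_i W_{n,i}(x) * I{Y_i <= y}."""
    weights = kernel_weights(x, xs, h)
    return sum(w for w, yi in zip(weights, ys) if yi <= y)


# Usage with d = 1: three of the four covariates lie within h = 0.6 of x = 0.
xs = [(0.0,), (0.1,), (0.5,), (2.0,)]
ys = [1.0, 2.0, 3.0, 4.0]
print(codf_estimate(2.0, (0.0,), xs, ys, 0.6))
```

Since the weights sum to one whenever at least one $X_i$ is close to x, the estimate is itself a (random) distribution function in y; far away from all data it returns 0 by the convention 0/0 = 0.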
These conditions are fulfilled, for example, by the weights of the partitioning, kernel and nearest neighbor estimates; for details we refer to Chapters 4, 5 and 6 in Györfi et al. (2002).
Since the work of Stone (1977), several authors have also dealt with the strong pointwise consistency of particular regression function estimates for dependent variables Y that are almost surely bounded by some constant. In the case of the estimation of the codf, we say that the estimate $F_n$ of the codf F is strongly pointwise consistent for some fixed $y \in \mathbb{R}$ if
$$F_n(y, x) \to F(y, x) \quad \text{a.s. for } P_X\text{-almost every } x. \quad (3)$$
In the context of nonparametric regression, Devroye (1981) showed the strong pointwise consistency of the kernel regression estimate, presuming that K is a so-called window kernel and that the bandwidth $h_n$ fulfills some mild asymptotic conditions. Greblicki et al. (1984) generalized this consistency result to a broader class of kernels with possibly unbounded support. Stute (1986) showed a result concerning the uniform pointwise consistency of the kernel estimate of the conditional distribution function. A proof of the strong pointwise consistency of the partitioning regression estimate can be found in (Györfi et al. 2002, Theorem 25.6). Györfi (1981a) and Devroye (1981) independently showed results concerning the strong pointwise consistency of nearest neighbor regression estimates. Devroye (1982) also gave necessary and sufficient conditions for the strong pointwise consistency of nearest neighbor regression estimates.
In order to obtain consistency results for all distributions of (X, Y) with $\mathbf{E}|Y| < \infty$ (so-called universal consistency), some authors considered modified versions of the above-mentioned estimates [cf., e.g., Walk (2001), Algoet and Györfi (1999)]. See also (Györfi et al. 2002, Chapter 25) and the literature cited therein for other estimates and further results on strong universal pointwise consistency.
Rates of convergence in probability have been obtained for the kernel regression estimate by Krzyzak and Pawlak (1987) and for the nearest neighbor regression estimate by Györfi (1981b). Uniform almost sure rates of convergence for regression estimates have been shown by Härdle et al. (1988) in the more general setting of kernel-type estimators of conditional functionals. Optimal global rates of convergence for nonparametric regression estimates have been derived by Stone (1982).
Other estimates of the codf have been proposed by Hall et al. (1999), who studied the rate of convergence of a weighted kernel estimator. Cai (2002) showed the asymptotic normality of this estimate. Furthermore, Hall and Yao (2005) used a dimension reduction technique to approximate the codf and studied its asymptotic properties. Preadjusted local averaging estimates of the codf were proposed by Veraverbeke et al. (2014), who proved results concerning the uniform rate of convergence.
So far, only Liero (1989) and  studied a local averaging regression estimate with generalized weights and formulated conditions for consistency results. Liero (1989) assumed the weights to have a special form involving Borel-measurable functions $\varphi_{i,n}$ on $\mathbb{R}^d \times \mathbb{R}^d$, so that the i-th weight only depends on $X_i$, and formulated conditions that ensure a certain uniform strong rate of convergence.  gave conditions on the above-introduced general weights $W_{n,i}$ of a local averaging regression estimate which imply strong universal consistency.
To the authors' knowledge, there is no result so far which characterizes necessary and sufficient conditions on the above-mentioned general weights $W_{n,i}$ that ensure the strong pointwise consistency or a certain rate of convergence in probability of the corresponding local averaging estimate of the conditional distribution function. One of the main goals of this paper is to present these two results. A further aspect investigated in this paper is the presence of additional errors in the data, which is motivated by an application in the context of shape optimization with respect to the fatigue life of a component. A short overview of the method used to assess the fatigue behavior is given in the following section.

Application in the context of experimental fatigue tests
In order to predict the fatigue life of a certain material, we use data from so-called strain-controlled fatigue tests, in which a material sample is repeatedly elongated by a fixed strain amplitude ε. The repetitions until the material fails, the so-called number of cycles N, are counted and the corresponding stress amplitude τ is measured. Repeating this experiment yields data $\big(\varepsilon_i^{(m)}, N_i^{(m)}, \tau_i^{(m)}\big)$ for each material m. Since the mentioned strain-controlled fatigue tests are very time-consuming, we only have 12 data points for the material of interest, which is not enough for a nonparametric estimation of the conditional distribution function of the number of cycles given a certain strain amplitude ε. Thus, we assume the model
$$N^{(m)} = \mu^{(m)}(\varepsilon) + \sigma^{(m)}(\varepsilon) \cdot \delta^{(m)} \quad (4)$$
to hold, where $\mu^{(m)}(\varepsilon)$ is the expected number of cycles until failure, $\sigma^{(m)}(\varepsilon)$ is the standard deviation and $\delta^{(m)}$ is an error term with expectation 0 and variance 1. In Sect. 3.1 we describe a suitable method to estimate $\mu^{(m)}$ and $\sigma^{(m)}$ by $\hat\mu^{(m)}$ and $\hat\sigma^{(m)}$, respectively, such that we finally obtain data
$$\hat\delta_i^{(m)} = \frac{N_i^{(m)} - \hat\mu^{(m)}\big(\varepsilon_i^{(m)}\big)}{\hat\sigma^{(m)}\big(\varepsilon_i^{(m)}\big)}. \quad (5)$$
Due to assumption (4), the conditional distribution function of the number of cycles given a strain amplitude ε can be determined by a simple linear transformation from the distribution function of $\delta^{(m)}$. Since only 4 to 35 of the above data samples per material are available to estimate the distribution function of $\delta^{(m)}$, we also use data samples from other materials that have similar static material properties. To this end, we use an estimate of the conditional distribution function with the vector of five static material properties (Young's modulus, the yield limit for 0.2% residual elongation, the tensile strength, the static strength coefficient and the static strain hardening exponent) as covariate $X^{(m)}$ and the samples $\hat\delta_i^{(m)}$ as dependent variable.
More precisely, we apply a nonparametric estimate of the codf to the data
$$\big\{\big(X^{(m)}, \hat\delta_i^{(m)}\big) : m \text{ is a material in our database}\big\}.$$
Furthermore, the above data points contain errors in the dependent variable, since we only estimated $\mu^{(m)}$ and $\sigma^{(m)}$. This leads to the topic of this paper: we investigate how additional errors in the dependent variable influence an estimate of the codf and show theoretical results concerning strong pointwise consistency and the rate of convergence in probability.

Data with errors
Motivated by the application described in the previous subsection, we generalize our mathematical setting and assume that, instead of the i.i.d. data $(X_1, Y_1), \dots, (X_n, Y_n)$, we only have data $(X_1, \bar Y_{1,n}), \dots, (X_n, \bar Y_{n,n})$ available, with errors in the samples of the dependent variable.
In our above-mentioned application, we do not know anything explicit about the errors $\bar Y_{i,n} - Y_i$ ($i = 1, \dots, n$). Thus, we are not able to impose a structure on those errors. In particular, we cannot assume that these errors are random, and in case they are random, they need not be independent or identically distributed and need not have expectation zero, so estimates for deconvolution problems (see, e.g., Meister (2009) and the literature cited therein) are not applicable in the context of this paper. But we can assume that, with increasing number n of total samples, we also get more samples from the strain-controlled fatigue tests for each of the materials. Thus, our estimates $\hat\mu^{(m)}$ and $\hat\sigma^{(m)}$, and therefore our data $\hat\delta_i^{(m)}$ of (5), become more reliable for all materials m. Consequently, with increasing n, our errors $\hat\delta_i^{(m)} - \delta_i^{(m)}$ become small for all materials m. Since the $\delta_i^{(m)}$ are the samples of our dependent variable, it seems natural in our application to assume that the absolute errors between $Y_i$ and $\bar Y_{i,n}$ converge to zero uniformly almost surely, i.e., to assume that
$$\max_{1 \le i \le n} |\bar Y_{i,n} - Y_i| \to 0 \quad \text{a.s.} \quad (n \to \infty).$$
In our theoretical results in Section 2, it will turn out that we only have to assume the weaker condition
$$\sum_{i=1}^n W_{n,i}(x) \cdot |\bar Y_{i,n} - Y_i| \to 0 \quad \text{a.s.} \quad (n \to \infty) \text{ for } P_X\text{-almost all } x, \quad \text{(E1)}$$
where $W_{n,i}$ are the weights of the local averaging estimate above. Note also that our set-up is triangular, which is necessary in our application, since the estimates $\hat\mu^{(m)}$ and $\hat\sigma^{(m)}$ can change with the number of data points n and can therefore lead in (5) to completely new samples, with errors, of the random variable $\delta^{(m)}$.
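That (E1) is indeed weaker than uniform convergence of the errors follows from a one-line bound. The sketch below assumes that the weights have sums bounded by one, which holds for the kernel weights in (2) (they sum to one, or to zero under the convention 0/0 = 0):

```latex
\[
\sum_{i=1}^n W_{n,i}(x)\,|\bar Y_{i,n} - Y_i|
\;\le\; \Big(\sum_{i=1}^n W_{n,i}(x)\Big) \max_{1 \le i \le n} |\bar Y_{i,n} - Y_i|
\;\le\; \max_{1 \le i \le n} |\bar Y_{i,n} - Y_i| .
\]
```

Hence uniform almost sure convergence of the errors implies (E1). The converse fails in general: a single sample with a large error that receives an asymptotically vanishing weight violates the uniform condition but need not violate (E1).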
Since we do not assume anything about the nature of the errors besides that they are pointwise asymptotically negligible in the sense that (E1) holds, it seems natural to ignore them completely and to use the same estimates as in the case where an independent and identically distributed sample is given.

Main Results
In Theorem 2.1, we present sufficient conditions on the weights and prove that they ensure that the estimate $\bar F_n(y, x)$ applied to data $(X_1, \bar Y_{1,n}), \dots, (X_n, \bar Y_{n,n})$ with errors in the samples of the dependent variable is pointwise consistent in the sense that it asymptotically approaches the interval $[F(y-, x), F(y, x)]$ for $P_X$-almost all x, provided that the errors fulfill (E1). As we show in Corollary 2.1, these assumptions on the weights are fulfilled, for example, by the weights of the kernel estimate.
Furthermore, we show that two of the three sufficient conditions and a weaker version of the third condition are also necessary for the above pointwise consistency (see Theorem 2.2).
We also investigate the rate of convergence of the estimate $\bar F_n$ and present conditions on the weights which ensure a pointwise rate of convergence in probability for $\bar F_n$. Finally, we present an application to simulated and real data (see Section 3). In the real data application, we use the considered method to estimate the distribution function of the number of cycles until failure in the context of the fatigue behavior of steel under cyclic loading. This estimate is utilized as the objective in a shape optimization procedure, which is embedded in an algorithm-based product development approach to determine an optimal profile geometry with respect to the fatigue behavior.

Notation
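Throughout this paper, the following notation is used: We write $U_n = O_P(V_n)$ if the nonnegative random variables $U_n$ and $V_n$ satisfy
$$\lim_{c \to \infty} \limsup_{n \to \infty} \mathbf{P}(U_n > c \cdot V_n) = 0.$$
The sets of positive integers, nonnegative integers and real numbers are denoted by $\mathbb{N}$, $\mathbb{N}_0$ and $\mathbb{R}$, respectively. We write $\to_P$ as an abbreviation for convergence in probability and $I_A$ for the indicator function of the set A. We denote the Euclidean norm on $\mathbb{R}^d$ by $\|\cdot\|$. For $z \in \mathbb{R}$ and a set $A \subseteq \mathbb{R}$, we define the distance from z to A as
$$\operatorname{dist}(z, A) = \inf_{a \in A} |z - a|.$$
Furthermore, we write
$$G(y-) = \lim_{t \uparrow y} G(t)$$
for the left-sided limit of a function $G: \mathbb{R} \to \mathbb{R}$ at $y \in \mathbb{R}$.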

Outline
The outline of the paper is as follows: The main results are formulated in Section 2 and proven in the supplemental material. In Section 3, we present an application to simulated and real data.

Main results
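Let
$$\bar F_n(y, x) = \sum_{i=1}^n W_{n,i}(x) \cdot I_{\{\bar Y_{i,n} \le y\}} \quad (6)$$
be the local averaging estimate of the codf F(y, x) corresponding to the data with errors $(X_1, \bar Y_{1,n}), \dots, (X_n, \bar Y_{n,n})$.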

Consistency
First of all, we give sufficient conditions on the sequence of weights $W_{n,i}$ such that the estimate $\bar F_n$ is pointwise consistent for all distributions of (X, Y) and all $y \in \mathbb{R}$.
The following result is proven in Section S2 of the supplemental material.
for $P_X$-almost every $x \in \mathbb{R}^d$. Furthermore, let $\bar Y_{1,n}, \dots, \bar Y_{n,n}$ be random variables which fulfill (E1), and let $\bar F_n$ be the local averaging estimate defined in (6) with weights $W_{n,i}$. Then $\bar F_n$ is pointwise consistent in the sense that for all $y \in \mathbb{R}$
$$\operatorname{dist}\big(\bar F_n(y, x), [F(y-, x), F(y, x)]\big) \to 0 \quad \text{a.s.} \quad (n \to \infty) \text{ for } P_X\text{-almost every } x. \quad (7)$$
In the following corollary, which is proven in Section S2 of the supplemental material, we formulate sufficient conditions for the strong pointwise consistency of the kernel estimate of the codf, defined by the weights in (2).
Assume that K is the naive kernel and that the bandwidth $h_n > 0$ fulfills Furthermore, let $\bar Y_{1,n}, \dots, \bar Y_{n,n}$ be random variables which fulfill Let $W_{n,i}$ be the weights of the kernel estimate with kernel K and bandwidth $h_n$. Then the kernel estimate $\bar F_n$ of the codf as defined in (6)
Remark 2.1 Analogous results can be shown for estimates of the codf corresponding to the partitioning and nearest neighbor weights, assuming the conditions from Theorems 25.6 and 25.17, respectively, in Györfi et al. (2002). Furthermore, Corollary 2.1 can be extended to a more general class of kernels, which has been considered by Greblicki et al. (1984).
In Theorem 2.1, we formulated sufficient conditions on the sequence of weights that imply pointwise consistency in the sense of (7). In the following theorem, we show that two of these three conditions and a weaker version of (A3) are also necessary if $F_n$ is pointwise consistent in the sense that (7) holds for all distributions of (X, Y) and all $y \in \mathbb{R}$. As we will see in the following theorem, it is sufficient to
Theorem 2.2 Assume that $W_{n,i}$ is a sequence of nonnegative weights such that the corresponding estimate $F_n$ from (1) is strongly pointwise consistent in the sense that (7) holds for all $y \in \mathbb{R}$, for all distributions of (X, Y) and all sequences $(X_1, Y_1), (X_2, Y_2), \dots$ of independent random vectors distributed as (X, Y). Then (A1), (A2) and which is a weaker version of (A3), are fulfilled.

Rate of convergence
Next, we investigate the rate of convergence. To this end, we assume that for a fixed $y_0 \in \mathbb{R}$ and $P_X$-almost all x, the codf F(y, x) is locally Hölder continuous in x with exponent $0 < p \le 1$, locally uniformly in y. More precisely, we assume that for $P_X$-almost every x there exist finite constants $C(x), \kappa_1(x), \kappa_2(x) > 0$ such that
$$\sup_{y \in \mathbb{R}:\, |y - y_0| \le \kappa_1(x)} |F(y, z) - F(y, x)| \le C(x) \cdot \|z - x\|^p \quad (10)$$
for all $z \in \mathbb{R}^d$ with $\|z - x\| \le \kappa_2(x)$. In the following result, we present conditions on the weights which ensure a certain pointwise rate of convergence in probability for $\bar F_n$.

Theorem 2.3
Let $(X, Y), (X_1, Y_1), (X_2, Y_2), \dots$ be i.i.d. $\mathbb{R}^d \times \mathbb{R}$-valued random vectors and let $y_0 \in \mathbb{R}$ be fixed. Furthermore, let the conditional distribution function F fulfill the Hölder assumption (10) at $y_0$ for some $0 < p \le 1$, and let $F(\cdot, x)$ be continuous and differentiable at $y_0$ for $P_X$-almost every x. Furthermore, let $a_n(x)$, $b_n(x)$ and $c_n(x)$ for every $x \in \mathbb{R}^d$ be positive real sequences which tend to zero as $n \to \infty$ for $P_X$-almost every $x \in \mathbb{R}^d$. Let $W_{n,i}(x) := W_{n,i}(x, X_1, \dots, X_n)$ be nonnegative weights which fulfill for $P_X$-almost every Furthermore, let $\bar Y_{1,n}, \dots, \bar Y_{n,n}$ be random variables which fulfill and let $\bar F_n$ be the local averaging estimate defined in (6) with weights $W_{n,i}$. Then for
It can be shown that the conditions of Theorem 2.3 are fulfilled by the kernel estimate for some appropriate sequences $a_n$, $b_n$ and $c_n$, which leads to the following corollary, proven in Section S2 of the supplemental material.
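Let $(X, Y), (X_1, Y_1), (X_2, Y_2), \dots$ be i.i.d. $\mathbb{R}^d \times \mathbb{R}$-valued random vectors and let $y_0 \in \mathbb{R}$ be fixed. Assume that the conditional distribution function F fulfills the assumptions of Theorem 2.3. Assume furthermore that K is the naive kernel and that the bandwidth $h_n > 0$ fulfills
$$h_n \to 0 \quad \text{and} \quad n \cdot h_n^d \to \infty \quad (n \to \infty).$$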
Furthermore, let $\bar Y_{1,n}, \dots, \bar Y_{n,n}$ be random variables which fulfill Let $\bar F_n$ be the kernel estimate of the codf with kernel K and bandwidth $h_n$ as defined in (6). Then for $P_X$-almost every x In particular, the choice $h_n = c \cdot n^{-1/(2p+d)}$ leads to the rate of convergence $O_P\big(n^{-p/(2p+d)} + \sqrt{\eta_n(x)}\big)$ for $P_X$-almost every x.
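The bandwidth choice $h_n = c \cdot n^{-1/(2p+d)}$ can be motivated by the usual bias-variance balancing. The following is a heuristic sketch, not the proof from the supplemental material: it assumes that, for the naive kernel, the smoothing bias is of order $h_n^p$ (from the Hölder condition (10)) and the stochastic term is of order $(n h_n^d)^{-1/2}$, and it ignores the term $\eta_n(x)$.

```latex
\[
h_n^p \asymp (n h_n^d)^{-1/2}
\iff h_n^{2p+d} \asymp n^{-1}
\iff h_n \asymp n^{-1/(2p+d)},
\qquad\text{so}\qquad
h_n^p \asymp (n h_n^d)^{-1/2} \asymp n^{-p/(2p+d)} .
\]
```

This reproduces the first summand of the rate above; the additional summand $\sqrt{\eta_n(x)}$ presumably accounts for the errors in the dependent variable.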

Remark 2.2
The rate $n^{-p/(2p+d)} + \sqrt{\eta_n(x)}$ can also be achieved for the weights of the nearest neighbor and partitioning estimates of the conditional distribution function by choosing an appropriate number of nearest neighbors and appropriate cubic partitions, respectively.

Application to simulated and real data
In this section, we apply the methods described above to simulated and real data and estimate the codfs. We choose $W_{n,i}$ as kernel weights with the naive kernel. The bandwidth $h_n$ is chosen data-dependently from the set {0.05, 0.1, 0.2, 0.3} by cross-validation with respect to the corresponding regression estimate
$$\hat m_h(x) = \sum_{i=1}^n W_{n,i}(x) \cdot \bar Y_{i,n}$$
[cf. Section 8 in Györfi et al. (2002)]. More precisely, we try to find an $\hat h \in \{0.05, 0.1, 0.2, 0.3\}$ that minimizes
$$\frac{1}{n} \sum_{j=1}^n \big(\hat m_{\hat h, -j}(X_j) - \bar Y_{j,n}\big)^2,$$
where $\hat m_{\hat h, -j}$ is the above-mentioned regression estimate with kernel weights and bandwidth $\hat h$ computed from all n data points with additional errors in the dependent variable except $(X_j, \bar Y_{j,n})$. In order to get an impression of the convergence of our estimates, we first consider distributions with known codfs; afterwards, we apply our estimator to real data in the context of experimental fatigue tests. The latter estimate is then utilized as the objective in a shape optimization procedure, which is embedded in an algorithm-based product development approach to determine an optimal profile geometry with respect to the fatigue behavior.
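The leave-one-out bandwidth selection described above can be sketched as follows. This is a minimal illustration, assuming a one-dimensional covariate and the naive kernel; the function names are hypothetical, and the quadratic-time loop is meant for clarity rather than speed.

```python
from typing import List, Sequence


def naive_weights(x: float, xs: Sequence[float], h: float) -> List[float]:
    """Naive-kernel weights W_{n,i}(x) for d = 1; |x - X_i| <= h is |(x - X_i)/h| <= 1 for h > 0."""
    k = [1.0 if abs(x - xi) <= h else 0.0 for xi in xs]
    total = sum(k)
    return [ki / total for ki in k] if total > 0.0 else [0.0] * len(xs)  # 0/0 = 0


def loo_cv_score(xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """(1/n) * sum_j (m_{h,-j}(X_j) - Y_j)^2, where the j-th pair is left out of m_{h,-j}."""
    n = len(xs)
    total = 0.0
    for j in range(n):
        rest_x = list(xs[:j]) + list(xs[j + 1:])
        rest_y = list(ys[:j]) + list(ys[j + 1:])
        m_minus_j = sum(w * y for w, y in zip(naive_weights(xs[j], rest_x, h), rest_y))
        total += (m_minus_j - ys[j]) ** 2
    return total / n


def select_bandwidth(xs: Sequence[float], ys: Sequence[float],
                     candidates: Sequence[float] = (0.05, 0.1, 0.2, 0.3)) -> float:
    """Return the candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loo_cv_score(xs, ys, h))
```

Here the responses passed in play the role of the erroneous $\bar Y_{j,n}$; no correction for the errors is attempted, in line with the approach of simply ignoring them.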

Application to simulated data
Motivated by the application in the context of experimental fatigue tests, where we have 1222 data points (see Sect. 3.2), we consider sample sizes of n = 500, 1000 and 2000 in order to evaluate our estimate of the codf. The quality of our estimate of the codf is assessed by the maximum absolute error
$$\mathrm{err}_{\max} = \max_{1 \le k \le I,\ 1 \le l \le J} \big|\bar F_n(y_k, x_l) - F(y_k, x_l)\big|$$
on a grid determined by equidistant $y_1, \dots, y_I$ and $x_1, \dots, x_J$ for some fixed numbers $I, J \in \mathbb{N}$. Due to the random number generation in our simulated data, our estimates of the codf are random; therefore, we repeat the codf estimation 100 times with new random numbers and label the resulting maximum absolute errors by an upper index i. We compare our estimates by considering the average value $\frac{1}{100}\sum_{i=1}^{100} \mathrm{err}_{\max}^i$ of the maximum absolute error. As a first example, we choose $(X, Y), (X_1, Y_1), (X_2, Y_2), \dots$ as i.i.d. random vectors such that X is F-distributed with 5 numerator and 2 denominator degrees of freedom and, given X, Y is normally distributed with mean $X \cdot (X - 1)$ and variance 1. As data with errors, we set $\bar Y_{i,n} = Y_i + 100/n$. Observe that we get completely new samples when n changes. For comparison, we also consider $\bar Z_{i,n} = Y_i + 100/i$, where the samples with bigger errors are kept. The values $x_1, \dots, x_{20}$ and $y_1, \dots, y_{20}$ for the grid for the maximum absolute error are  F (y, x). The mentioned result of Corollary 2.1 is confirmed by the average values of the maximum absolute error in Table 1. Since the samples with bigger errors are kept for $\bar Z_{i,n}$, the estimator $\bar F_{\bar Y, n}$ yields smaller average maximum absolute errors than the estimator $\bar F_{\bar Z, n}$, in particular for small sample sizes n.
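A scaled-down version of the first simulation example can be sketched in Python. This is an illustration under stated assumptions, not the code behind Table 1: the bandwidth is fixed instead of cross-validated, the evaluation grid is hypothetical (the grid used above is not reproduced here), and only the standard library is used (an F(5, 2) variate is generated as a ratio of scaled chi-square variates, and the normal cdf via `math.erf`).

```python
import math
import random
from typing import List, Sequence, Tuple


def f_variate(rng: random.Random, d1: int, d2: int) -> float:
    """F(d1, d2) variate as (chi2_d1 / d1) / (chi2_d2 / d2); chi2_k = Gamma(k/2, scale 2)."""
    c1 = rng.gammavariate(d1 / 2.0, 2.0)
    c2 = rng.gammavariate(d2 / 2.0, 2.0)
    return (c1 / d1) / (c2 / d2)


def true_codf(y: float, x: float) -> float:
    """F(y, x) = P(Y <= y | X = x) for Y | X = x ~ N(x(x - 1), 1)."""
    return 0.5 * (1.0 + math.erf((y - x * (x - 1.0)) / math.sqrt(2.0)))


def simulate(n: int, rng: random.Random) -> Tuple[List[float], List[float], List[float]]:
    """Covariates plus the two error-contaminated responses Y_i + 100/n and Y_i + 100/i."""
    xs = [f_variate(rng, 5, 2) for _ in range(n)]
    ys = [rng.gauss(x * (x - 1.0), 1.0) for x in xs]
    y_bar = [y + 100.0 / n for y in ys]
    z_bar = [y + 100.0 / (i + 1) for i, y in enumerate(ys)]  # i + 1: 1-based index
    return xs, y_bar, z_bar


def kernel_codf(y: float, x: float, xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Naive-kernel estimate (6) for d = 1, with 0/0 = 0."""
    k = [1.0 if abs(x - xi) <= h else 0.0 for xi in xs]
    total = sum(k)
    if total == 0.0:
        return 0.0
    return sum(ki for ki, yi in zip(k, ys) if yi <= y) / total


def max_abs_error(xs, ys, h, grid_x, grid_y) -> float:
    """err_max over the grid, compared with the known codf."""
    return max(abs(kernel_codf(y, x, xs, ys, h) - true_codf(y, x))
               for x in grid_x for y in grid_y)


def average_max_error(n, reps, h, grid_x, grid_y, seed=0) -> Tuple[float, float]:
    """Average err_max over `reps` repetitions for Y_bar and Z_bar, respectively."""
    rng = random.Random(seed)
    errs_bar, errs_z = [], []
    for _ in range(reps):
        xs, y_bar, z_bar = simulate(n, rng)
        errs_bar.append(max_abs_error(xs, y_bar, h, grid_x, grid_y))
        errs_z.append(max_abs_error(xs, z_bar, h, grid_x, grid_y))
    return sum(errs_bar) / reps, sum(errs_z) / reps


# Hypothetical 20 x 20 grid, for illustration only.
GRID_X = [0.5 + k / 19 for k in range(20)]
GRID_Y = [-2.0 + 4.0 * k / 19 for k in range(20)]
```

Calling, e.g., `average_max_error(500, 100, 0.2, GRID_X, GRID_Y)` mimics one cell of the experiment; increasing n should make the first returned value shrink, while the second is affected by the large errors of the first samples.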
As a second example, we choose $(X, Y), (X_1, Y_1), (X_2, Y_2), \dots$ as i.i.d. random vectors such that X is t(5)-distributed and, given X, Y is exponentially distributed with mean $\sqrt{|X|}$. As data with errors, we choose $\bar Y_{i,n} = Y_i + U_{i,n}$, where $U_{1,n}, \dots, U_{n,n}$ are independent, uniformly distributed on (0, 100/n) and also independent of $(X_1, Y_1), \dots, (X_n, Y_n)$. The values $x_1, \dots, x_{20}$ and $y_1, \dots, y_{20}$ for the grid for the maximum absolute error are chosen equidistantly on [−1, 1] and [0.5, 1], respectively. Again, Corollary 2.1 yields the pointwise consistency of $\bar F_{\bar Y, n}$ for all $y \in \mathbb{R}$, which is confirmed by the average maximum errors in Table 2.
As a third example, we choose $(X, Y), (X_1, Y_1), (X_2, Y_2), \dots$ as i.i.d. random vectors with a discrete covariate X, which is uniformly distributed on {1, 2, 3, 4, 5}. Given X, the dependent variable Y is $\chi^2$-distributed with X degrees of freedom. As data with errors, we set $\bar Y_{i,n} = Y_i + \epsilon_{i,n}$, where $\epsilon_{1,n}, \dots, \epsilon_{n,n}$ are independent, normally distributed with mean and variance 100/n and also independent of $(X_1, Y_1), \dots, (X_n, Y_n)$. Since the covariate is discrete in this setting and the distance between neighboring values is one, the set {0.5, 1, 2} is used for the choice of the bandwidth. A choice of $\hat h = 0.5$ would, for example, mean that for x = j only those samples of Y are used in the estimate whose corresponding covariate samples are equal to j. In this example, the x-values of the grid for the maximum absolute error can be chosen as $x_j = j$ for j = 1, 2, 3, 4, 5. The values $y_1, \dots, y_{20}$ are chosen equidistantly on [0.5, 5]. Again, Corollary 2.1 yields the pointwise consistency of $\bar F_{\bar Y, n}$ for all $y \in \mathbb{R}$, which is confirmed by the average maximum errors in Table 3.
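With the naive kernel, observation i enters the estimate at x = j if and only if $|j - X_i| \le \hat h$. The small sketch below (the helper name is hypothetical) lists, for x = 3 and the integer support {1, ..., 5}, which covariate values receive positive weight for each candidate bandwidth. Because the kernel uses "≤", $\hat h = 1$ already includes the neighbors j ± 1, and $\hat h = 2$ uses all five values at x = 3.

```python
from typing import List, Sequence


def covariate_values_used(j: int, support: Sequence[int], h: float) -> List[int]:
    """Covariate values v with positive naive-kernel weight at x = j, i.e. |j - v| / h <= 1."""
    return [v for v in support if abs(j - v) / h <= 1.0]


SUPPORT = [1, 2, 3, 4, 5]
for h in (0.5, 1.0, 2.0):
    print(h, covariate_values_used(3, SUPPORT, h))
# 0.5 [3]
# 1.0 [2, 3, 4]
# 2.0 [1, 2, 3, 4, 5]
```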

Application to real data
In the following, we provide an application of the methods above in the context of shape optimization of steel profiles with respect to the fatigue behavior under cyclic loading. From a practical point of view, the robustness against cyclic loading is one of the major aspects to guarantee a long product lifetime, which results in increased sustainability. Usually, the design process of such a profile geometry takes a lot of time and (human) resources, with no guarantee that the resulting profile geometry is optimal. The new method described in this paper automates this process, which reduces the development effort and leads to a provably optimal profile design. For further details on the algorithm-based development process, we refer to Roos et al. (2016) and Groche et al. (2017).
Our focus in the subsequent steps lies on the optimization of an integral sheet metal profile (made of the material HC 480 LA) with respect to the fatigue behavior. In particular, we study a three-chambered profile, which is continuously produced in an integral way and can, for example, be used to separate oil, water and power supply. The manufacturing technology used has been developed within the Collaborative Research Center 666 (CRC 666) at the Technische Universität Darmstadt. One main aspect of this technology is to produce such integral structures from one part by linear flow splitting and bend splitting. On the one hand, this production technique requires fewer joining operations, which involve, for example, stress concentrations or the action of heat; on the other hand, linear flow splitting leads to an ultrafine-grained (UFG) microstructure at the upper side of the steel flanges (cf., e.g., Bohn et al. (2008)). Both points yield significant advantages concerning the material properties, which can be utilized to produce lightweight structures with an improved fatigue life. Due to the significantly changed material properties in comparison to the material in the as-received state, we model the linear flow split and bent parts of the structure as a different material.
In order to assess the fatigue behavior, we use data from experimental fatigue tests, in which a material sample is repeatedly elongated by a fixed strain amplitude ε. The repetitions until the material fails, the so-called number of cycles N, are counted and the corresponding stress amplitude τ is measured. Based on these data, we estimate the conditional distribution function of the number of cycles $N^{(m)}$ until failure given a fixed strain amplitude ε for both considered materials m (as-received and linear flow split state). This estimation is described in detail in Section 3.1. Finally, both estimates of the codf are evaluated at $N_{\min} = 50{,}000$ in order to determine the approximate probability of a failure before $N_{\min}$ cycles for a fixed strain amplitude ε, which is used as the objective of the optimization. Details on the shape optimization can be found in Sect. 3.2.

Estimation of the conditional distribution function
For the estimation of the codf, we use a database that contains for each material m data $\big(\varepsilon_i^{(m)}, N_i^{(m)}, \tau_i^{(m)}\big)$. Since the experimental fatigue tests for obtaining one of the above data points are very time-consuming, there are only 12 and 8 data points available for the considered material HC 480 LA in the as-received and linear flow split state, respectively, which is not enough for a nonparametric estimation of the conditional distribution function. In order to nevertheless estimate the codf of the number of cycles until failure, we assume the model
$$N^{(m)} = \mu^{(m)}(\varepsilon) + \sigma^{(m)}(\varepsilon) \cdot \delta^{(m)}$$
to hold, where $\mu^{(m)}(\varepsilon)$ is the expected number of cycles until failure and $\sigma^{(m)}(\varepsilon)$ is the standard deviation for each material m and strain amplitude ε, and $\delta^{(m)}$ is an error term with expectation 0 and variance 1 for each material m. We estimate the conditional distribution function of $\delta^{(m)}$ as well as $\mu^{(m)}(\varepsilon)$ and $\sigma^{(m)}(\varepsilon)$, so that we can obtain an estimate of the codf of $N^{(m)}$ by a simple linear transformation. For this purpose, we use an approach similar to Bott and Kohler (2017): In order to obtain an estimate $\hat\mu^{(m)}(\varepsilon)$ of the expected number of cycles $\mu^{(m)}(\varepsilon)$, we apply a standard method from the literature (cf. Williams et al. (2003)), which uses the measured data of material m to estimate the coefficients $p = (\sigma_f', \varepsilon_f', b, c)$ of the strain-life curve according to Coffin-Morrow-Manson (cf. Manson (1965)) by linear regression, and we estimate $\mu^{(m)}(\varepsilon)$ from the corresponding strain-life curve.
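To illustrate how an estimate of $\mu^{(m)}(\varepsilon)$ can be read off a strain-life curve, the following sketch assumes the common Coffin-Manson-Basquin form $\varepsilon_a = \frac{\sigma_f'}{E}(2N)^b + \varepsilon_f'(2N)^c$ with $b, c < 0$, where E is Young's modulus, and inverts it numerically for N by bisection on $\log_{10} N$. The coefficient values below are made up for illustration; above, they are fitted by linear regression from the measured data of material m, and the exact procedure of Williams et al. (2003) is not reproduced here.

```python
import math


def strain_amplitude(n_cycles: float, sigma_f: float, eps_f: float,
                     b: float, c: float, e_modulus: float) -> float:
    """Strain-life curve: elastic (Basquin) plus plastic (Coffin-Manson) part, in 2N reversals."""
    reversals = 2.0 * n_cycles
    return sigma_f / e_modulus * reversals ** b + eps_f * reversals ** c


def cycles_for_strain(eps_a: float, sigma_f: float, eps_f: float, b: float, c: float,
                      e_modulus: float, n_lo: float = 1.0, n_hi: float = 1e12,
                      iters: int = 200) -> float:
    """Solve strain_amplitude(N) = eps_a for N; the curve is strictly decreasing when b, c < 0."""

    def residual(log_n: float) -> float:
        return strain_amplitude(10.0 ** log_n, sigma_f, eps_f, b, c, e_modulus) - eps_a

    lo, hi = math.log10(n_lo), math.log10(n_hi)
    if residual(lo) < 0.0 or residual(hi) > 0.0:
        raise ValueError("strain amplitude outside the range covered by [n_lo, n_hi]")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0.0:
            lo = mid  # curve still above the target here, so the solution lies at larger N
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))


# Hypothetical steel-like coefficients (stresses in MPa), for illustration only.
params = dict(sigma_f=900.0, eps_f=0.5, b=-0.09, c=-0.6, e_modulus=210000.0)
eps = strain_amplitude(50000.0, **params)
print(eps, cycles_for_strain(eps, **params))
```

Bisecting on $\log_{10} N$ rather than on N keeps the search well-conditioned over the many orders of magnitude that fatigue lives span.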
The estimation of the standard deviation $\sigma^{(m)}(\varepsilon)$ is more complicated, since we need to apply a nonparametric estimator to the squared deviations for each material m, which usually requires more samples. Therefore, we augment our data points per material m by 100 artificial ones as in Furer and Kohler (2015): First, we interpolate the squared deviations $Y_i^{(k)}$ for each material k on a grid of 100 equidistant strain amplitudes ε. In order to generate an artificial data point at a fixed grid point, we also use interpolated values from materials that are similar to the material m, assuming that similar materials show similar fatigue behavior. Observe that we use the whole database consisting of 132 materials in order to obtain more interpolated values and to improve the statistical power of our estimation. The similarity is measured using 5 static material properties, namely Young's modulus, the yield limit for 0.2% residual elongation, the tensile strength, the static strength coefficient and the static strain hardening exponent. In order to ensure that we only use interpolated values from materials with similar static material properties, we apply Nadaraya-Watson kernel regression estimates with the static material properties as covariate and the interpolated data as dependent variable. In this way, we obtain 100 artificial data points (one at each grid point) per material m. Finally, the estimation of the standard deviation $\sigma^{(m)}(\varepsilon)$ is done by weighting the Nadaraya-Watson kernel regression estimates applied to the real and the artificial data, with the squared deviations as dependent variable and the corresponding ε-values as covariate.
Thus, we finally determine the data samples
$$\hat\delta_i^{(m)} = \frac{N_i^{(m)} - \hat\mu^{(m)}\big(\varepsilon_i^{(m)}\big)}{\hat\sigma^{(m)}\big(\varepsilon_i^{(m)}\big)}$$
of the random variables $\delta^{(m)}$ for each material m. Notice that these samples contain errors because we only estimated $\mu^{(m)}(\varepsilon)$ and $\sigma^{(m)}(\varepsilon)$. Since only 12 and 8 of the above data samples are available for the two material states of HC 480 LA, we also use data samples from other materials in the database with similar static material properties (with the same justification as above) in order to estimate the codf of $\delta^{(m)}$. This consideration of similar materials in the estimation of the conditional distribution function is done by using the kernel estimate of the codf with the static material properties as covariate $X_i$ and the data samples of $\delta^{(m)}$ for all 122 materials m as dependent variable. The bandwidth h of the kernel weights is determined by cross-validation of the corresponding regression estimate, as described at the beginning of this section.
As described in Sect. 1.2, it can be assumed that (E1) holds for the errors $\hat\delta_i^{(m)} - \delta_i^{(m)}$. Thus, evaluating the mentioned estimate of the codf at the static material properties $x = X^{(m)}$ of some material m leads to an estimate $\hat G_{\delta^{(m)}}$ of the codf of $\delta^{(m)}$ (see Corollary 2.1 for a theoretical justification). This estimate $\hat G_{\delta^{(m)}}$ can be transformed into an estimate $\hat F_{N^{(m)}}$ of the codf of $N^{(m)}$ given a strain amplitude ε by
$$\hat F_{N^{(m)}}(y, \varepsilon) = \hat G_{\delta^{(m)}}\left(\frac{y - \hat\mu^{(m)}(\varepsilon)}{\hat\sigma^{(m)}(\varepsilon)}\right).$$
This estimate of the conditional distribution function is evaluated at $y = N_{\min} = 50{,}000$ cycles to obtain an approximate probability of a failure before $N_{\min}$ cycles for a fixed strain amplitude ε. In Fig. 1, we illustrate this estimated probability $\hat F_{N^{(m)}}(N_{\min}, \varepsilon)$ for the considered material HC 480 LA in the as-received and linear flow split state and $\varepsilon \in [0\%, 1\%]$. Here the strain amplitude ε is given relative to the length of the material sample used in the experiments. As expected, $\hat F_{N^{(m)}}(N_{\min}, \varepsilon)$ is increasing in ε.
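The standardization of the data and the back-transformation to the codf of the number of cycles can be sketched as follows. The back-transformation is valid because $\hat\sigma^{(m)}(\varepsilon) > 0$, so that $N \le y$ is equivalent to $\delta \le (y - \hat\mu^{(m)}(\varepsilon))/\hat\sigma^{(m)}(\varepsilon)$ under the model. In this sketch, a plain unweighted empirical cdf stands in for the kernel codf estimate $\hat G_{\delta^{(m)}}$, and the constant $\hat\mu$, $\hat\sigma$ and the cycle counts are hypothetical values chosen only for illustration.

```python
from bisect import bisect_right
from typing import Callable, List, Sequence


def standardize(n_cycles: Sequence[float], eps: Sequence[float],
                mu_hat: Callable[[float], float],
                sigma_hat: Callable[[float], float]) -> List[float]:
    """Data with errors: delta_hat_i = (N_i - mu_hat(eps_i)) / sigma_hat(eps_i)."""
    return [(n - mu_hat(e)) / sigma_hat(e) for n, e in zip(n_cycles, eps)]


def empirical_cdf(samples: Sequence[float]) -> Callable[[float], float]:
    """Unweighted empirical cdf; a stand-in for the kernel codf estimate G_hat."""
    ordered = sorted(samples)
    return lambda t: bisect_right(ordered, t) / len(ordered)


def codf_of_cycles(g_delta: Callable[[float], float],
                   mu_hat: Callable[[float], float],
                   sigma_hat: Callable[[float], float]) -> Callable[[float, float], float]:
    """F_N(y, eps) = G_delta((y - mu_hat(eps)) / sigma_hat(eps)); requires sigma_hat(eps) > 0."""
    def f_n(y: float, eps: float) -> float:
        return g_delta((y - mu_hat(eps)) / sigma_hat(eps))
    return f_n


# Hypothetical constant estimates and cycle counts, for illustration only.
mu_hat = lambda e: 60000.0
sigma_hat = lambda e: 10000.0
deltas = standardize([40000, 50000, 60000, 70000, 80000], [0.005] * 5, mu_hat, sigma_hat)
f_cycles = codf_of_cycles(empirical_cdf(deltas), mu_hat, sigma_hat)
print(deltas, f_cycles(50000.0, 0.005))
```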
Since we also needed the derivative of $\hat F_{N^{(m)}}(N_{\min}, \varepsilon)$ with respect to ε, we interpolated the function $\varepsilon \mapsto \hat F_{N^{(m)}}(N_{\min}, \varepsilon)$ by a piecewise cubic smoothing spline, using 200 equidistant values of ε and the corresponding function values.

Fatigue Strength Shape Optimization
The previously presented estimate of the failure probability $\hat F_{N^{(m)}}(N_{\min}, \cdot)$ is in the following applied to the shape optimization of a multi-chambered profile. Our aim is to find the optimal geometry for a specific load scenario and a given starting geometry under certain design constraints, i.e., to reach minimal failure probability as defined above. In order to calculate the failure probability at every point of the profile, we model the physical behavior of the considered geometry under applied loads at each point. For this purpose, we describe the mechanical system in terms of the linear elasticity equations; for further details on the elasticity equations, we refer to Section S1 of the supplemental material. For the numerical treatment, the elasticity equations are discretized in the sense of isogeometric analysis. The discretized linear elasticity equations are denoted by
$$A_h(u)\, y_h = b_h(u).$$
The discretization by methods of the isogeometric approach is explained in detail in the supplemental material.

Shape Optimization
In this section we briefly describe the shape optimization problem governed by the linear elasticity problem as defined in Section S1 of the supplementary material. The finite-dimensional shape optimization problem can be written as
$$\min_{u \in \mathbb{R}^n,\; y_h \in \mathbb{R}^{\tilde n}} J(u, y_h) \quad \text{subject to} \quad A_h(u)\,y_h = b_h(u).$$
The design variables are denoted by $u \in \mathbb{R}^n$, where $n \in \mathbb{N}$ is the dimension of the design space. The displacement is described by $y_h \in \mathbb{R}^{\tilde n}$, which is determined by the linear elasticity equations $A_h(u)\,y_h = b_h(u)$. The number of control points of the isogeometric mesh is denoted by $\tilde n \in \mathbb{N}$. An introduction to shape optimization is given in Haslinger and Mäkinen (2003). Since the elasticity problem has a unique solution, we can define a Lipschitz continuous solution operator $u \mapsto y_h(u)$, so that the reduced form of the objective function can be written as
$$\hat J(u) := J(u, y_h(u)).$$
The corresponding shape gradient $g_h(u)$ can be determined efficiently by the adjoint approach as described in Hinze et al. (2009). The reduced shape optimization problem is stated as
$$\min_{u \in U_{ad}} \hat J(u).$$
Here the set of admissible designs $U_{ad} \subset \mathbb{R}^n$ is defined by design constraints, for example angle or length restrictions. In this work we use the accumulated failure probability defined above, see (17), as objective function, with $N_{\min} = 50{,}000$ fixed, where the sum is calculated over all control points (the coefficients of the basis functions in $y_h$ above). If the failure of the whole profile is defined as the failure of at least one of its parts, then, by the union bound $P(\bigcup_j A_j) \le \sum_j P(A_j)$ with $A_j$ the event that part $j$ fails, the accumulated failure probability over all parts is a (discretized) upper bound on the failure probability of the whole profile and thus a reasonable objective function.

Fig. 2 For simplicity the bending radii are neglected in this draft. The right side of the profile is clamped at the bottom and the top. A uniform surface load $q_1 \in \mathbb{R}^3$ with $\|q_1\| = 70$ N is applied at 45° to the surface at the upper left part, and a load $q_2 \in \mathbb{R}^3$ with $\|q_2\| = 70$ N is applied at 45° to the lower left side of the profile. The load scenario is constant in the third dimension.
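The adjoint approach can be illustrated on a small dense toy problem. In this hedged sketch, $A(u)$, $b(u)$ and $J$ are hypothetical stand-ins, not the isogeometric elasticity system or the failure-probability objective. Differentiating the state equation $A(u)\,y = b(u)$ gives the reduced gradient $g_k = \partial J/\partial u_k + \lambda^\top(\partial b/\partial u_k - (\partial A/\partial u_k)\,y)$, where $\lambda$ solves the adjoint equation $A(u)^\top \lambda = \partial J/\partial y$. Thus one extra linear solve yields the whole gradient, independently of the number of design variables; the sketch checks this against central finite differences.

```python
import numpy as np


def reduced_gradient(u, A_of, dA, b_of, db, J_y, J_u):
    """Adjoint gradient of J_hat(u) = J(u, y(u)) subject to A(u) y = b(u)."""
    A = A_of(u)
    y = np.linalg.solve(A, b_of(u))          # state equation
    lam = np.linalg.solve(A.T, J_y(u, y))    # adjoint equation A^T lam = dJ/dy
    g = np.array(J_u(u, y), dtype=float)     # explicit dependence of J on u
    for k in range(u.size):
        g[k] += lam @ (db(u, k) - dA(u, k) @ y)
    return g


# Hypothetical toy problem with two design variables.
K0 = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
K1 = np.diag([1.0, 0.0, 0.0])
K2 = np.diag([0.0, 0.0, 2.0])
b0 = np.array([1.0, 2.0, 3.0])
e1 = np.array([0.0, 1.0, 0.0])


def A_of(u):
    return K0 + u[0] * K1 + u[1] * K2


def dA(u, k):
    return (K1, K2)[k]


def b_of(u):
    return b0 + u[0] * e1


def db(u, k):
    return e1 if k == 0 else np.zeros(3)


def J(u, y):
    return 0.5 * y @ y + 0.1 * u @ u


def J_y(u, y):
    return y


def J_u(u, y):
    return 0.2 * u


def J_hat(u):
    return J(u, np.linalg.solve(A_of(u), b_of(u)))


u = np.array([0.3, 0.7])
g = reduced_gradient(u, A_of, dA, b_of, db, J_y, J_u)
h = 1e-6
g_fd = np.array([(J_hat(u + h * e) - J_hat(u - h * e)) / (2 * h) for e in np.eye(2)])
print(g, g_fd)  # both gradients agree up to finite-difference error
```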
It can be shown that the objective function is nonconvex with respect to the design. The main principal strain is the eigenvalue of the linearized strain tensor $\varepsilon$ with the largest absolute value; it can be computed by Cardano's formula. The optimization is performed using a sequential quadratic programming (SQP) method [see, e.g., Nocedal and Wright (2006)].
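For a symmetric $3 \times 3$ strain tensor all three eigenvalues are real, so Cardano's formula can be used in its trigonometric form. The following is a minimal sketch of that computation (function names hypothetical); it keeps the sign of the eigenvalue with the largest absolute value, and whether the sign or only the magnitude enters the objective is left open here.

```python
import math

import numpy as np


def principal_strains(eps):
    """Eigenvalues l1 >= l2 >= l3 of a symmetric 3x3 tensor (trigonometric Cardano form)."""
    eps = np.asarray(eps, dtype=float)
    p1 = eps[0, 1] ** 2 + eps[0, 2] ** 2 + eps[1, 2] ** 2
    if p1 == 0.0:  # tensor is already diagonal
        return np.sort(np.diag(eps))[::-1]
    q = np.trace(eps) / 3.0  # mean normal strain
    p2 = (eps[0, 0] - q) ** 2 + (eps[1, 1] - q) ** 2 + (eps[2, 2] - q) ** 2 + 2.0 * p1
    p = math.sqrt(p2 / 6.0)
    B = (eps - q * np.eye(3)) / p
    r = min(1.0, max(-1.0, np.linalg.det(B) / 2.0))  # clamp round-off to [-1, 1]
    phi = math.acos(r) / 3.0
    l1 = q + 2.0 * p * math.cos(phi)
    l3 = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    l2 = 3.0 * q - l1 - l3  # the trace equals the sum of the eigenvalues
    return np.array([l1, l2, l3])


def main_principal_strain(eps):
    """Eigenvalue with the largest absolute value (sign kept)."""
    return max(principal_strains(eps), key=abs)


print(main_principal_strain(np.diag([1e-3, -2e-3, 5e-4])))  # -0.002
```

A closed-form expression avoids an iterative eigensolver per evaluation point and is differentiable away from repeated eigenvalues, which suits a gradient-based method such as SQP.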

Numerical Result
We apply the methods described above to the shape optimization of a three-chambered profile with respect to fatigue strength. We assume a static load scenario, as shown in Fig. 2. The profile is clamped at the boundary on the right-hand side, and surface loads are applied at the upper and lower left of the geometry. The loads act on the surface at an angle of 45°.
The geometry is modeled as a tricubic NURBS solid with 25,920 degrees of freedom and 1350 elements. Its outer dimensions are 50 cm × 50 cm × 2 cm. To avoid the need for numerous additional constraints, we use a parametrization with only twelve degrees of freedom. For this purpose, we subdivide the profile into four parts and determine the barycentric coordinates of each control point.
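The text does not spell out the parametrization in detail. One plausible reading, assumed here purely for illustration, is that each control point is stored in barycentric coordinates with respect to a triangle of design points, so that moving a few design points moves all control points of that part affinely. The sketch below is two-dimensional (the load scenario is constant in the third dimension), and all names are hypothetical.

```python
import numpy as np


def barycentric(p, tri):
    """Barycentric coordinates of the 2D point p w.r.t. the triangle tri (3 x 2 vertices)."""
    a, b, c = np.asarray(tri, dtype=float)
    edges = np.column_stack((b - a, c - a))  # 2 x 2 matrix of edge vectors
    l2, l3 = np.linalg.solve(edges, np.asarray(p, dtype=float) - a)
    return np.array([1.0 - l2 - l3, l2, l3])


def control_points_from_design(coords, tri_new):
    """Map fixed barycentric coordinates (m x 3) to control points (m x 2)."""
    return np.asarray(coords, dtype=float) @ np.asarray(tri_new, dtype=float)


# Hypothetical example: one part, one control point, one design vertex moved.
tri = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
coords = np.array([barycentric((0.25, 0.25), tri)])  # computed once, for the start geometry
tri_moved = [[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]]  # design update: stretch in x
print(control_points_from_design(coords, tri_moved))  # control point moves to (0.5, 0.25)
```

Since the barycentric coordinates are computed once, the control points depend linearly on the design points, which keeps their derivatives with respect to the design variables constant.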
As constraints, we impose an upper bound on the total volume and fix the volume of the Neumann and Dirichlet boundaries. For technical reasons, we also add a lower bound on the volume of each element to prevent negative element volumes. After 58 iterations with 375 function evaluations, the SQP method found the solution depicted in Fig. 3 (middle). The accumulated failure probability was reduced by 53.58%. We used the standard SQP implementation of MATLAB R2018a. Additionally, we compare the result to the optimization with respect to the compliance
$$\hat J_C(u) = f_h^\top M_h\, y_h(u) + q_h^\top \tilde M_h\, y_h(u),$$
where $f_h$ and $q_h$ are the discretized volume force and surface load acting on the geometry, respectively, and $M_h$ and $\tilde M_h$ are the mass matrices of the interior and of the boundary of the considered geometry, respectively. In this case the compliance was reduced by 58.34% after 29 iterations and 106 function evaluations. The optimal solution is shown in Fig. 3 (right). The accumulated failure probability of this geometry is reduced by 37.82% compared to the starting solution. This example shows that optimizing the accumulated failure probability cannot, in general, be replaced by the classical compliance optimization. All calculations were performed on an Intel Core i7-4790 CPU with 3.60 GHz and 16 GB RAM, using MathWorks MATLAB R2018a in single-thread mode.

Fig. 3 Starting solution (left) compared to the optimal geometries with respect to the accumulated failure probability (middle) and the compliance (right). The color represents the von Mises stress in MPa. The displacement is neglected. The accumulated failure probabilities of the optimal solutions were reduced by 53.58% (middle) and 37.82% (right), respectively.